

Analysis II: Continuity and Differentiability HT 2010

Janet Dyson

Contents

0. Introduction

1. Limits of Functions

2. Continuity of functions

3. Continuity and Uniform Continuity

4. Boundedness of continuous functions on a closed and bounded interval

5. Intermediate value theorem

6. Monotonic Functions and Inverse Function Theorem

7. Limits at infinity and infinite limits

8. Uniform Convergence

9. Uniform Convergence: Examples and Applications

10. Differentiation: definitions and elementary results

11. The elementary functions

12. The Mean Value Theorem

13. Applications of the MVT

14. L’Hospital’s Rule

15. Taylor’s Theorem

16. The Binomial Theorem


Introduction

Acknowledgement

These lectures have been developed by a number of lecturers over the years. I would particularly like to thank Professor Roger Heath-Brown, who gave the first lectures for this course in its present form in 2003, and Dr Brian Stewart and Dr Zhongmin Qian, who allowed me to adapt their lecture notes and use their LaTeX files.

Lectures

To get the most out of the course you must attend the lectures. There will be more explanation in the lectures than there is in the notes.

On the other hand, I will not put everything on the board which is in the printed notes. In some places I have put in extra examples which I will not have time to demonstrate in the lectures. There is also some extra material which I have put in for interest but which I do not regard as central to the course.

Numbering system:

In the printed notes there are 16 sections. Within each section there are subsections. Theorems, definitions, etc. are numbered consecutively within each subsection. So, for example, Theorem 1.2.3 is the third result in Section 1, Subsection 2. I will use the numbering in the printed notes, even though I will omit some subsections in the lectures, so the numbering will no longer be consecutive.

Exercise sheets

The weekly problem sheets which accompany the lectures are an integral part of the course. In Analysis above all, you will only understand the definitions and theorems by using them.

I assume that week 1 tutorials are being devoted to the final sheets from the Michaelmas Term courses.

I suggest that the problem sheets for this course are tackled in tutorials in weeks 2–8, with the 8th sheet used as vacation work for a tutorial in the first week of Trinity Term.

Corrections

Please email any corrections to me at [email protected]


Notation:

I will use this notation (which was used in the Michaelmas Term courses) throughout.

• C: set of all complex numbers - the complex plane.

• R: set of all real numbers - the real line; R ⊂ C.

• Q: the rational numbers, Q ⊂ R.

• N: the natural numbers 1, 2, . . . ; N ⊂ Q.

• ∀: “for all” or “for every” or “whenever”.

• ∃: “there exist(s)” or “there is (are)”.

• Sometimes I will write “s.t.” for “such that”, “resp.” for “respectively”, and “iff” for “if and only if”.

Recall the following definition from ‘Introduction to Pure Mathematics’ last term.

If a, b ∈ R then we define intervals as follows:

(a, b) := {x ∈ R : a < x < b}

[a, b] := {x ∈ R : a ≤ x ≤ b}

(−∞, a) := {x ∈ R : x < a},

etc.


1 Limits of Functions

1.1 Sequence limits and completeness

This course builds on the ideas from Analysis I and also uses many of the results from that course. I have put some of the most important results from Analysis I in these notes, but I will not write them on the board in the lecture. However, I will begin by recalling the definition of limits for sequences.

Definition 1.1.1. A sequence {zn} of real (or complex) numbers has limit l if ∀ε > 0, ∃N ∈ N such that

|zn − l| < ε ∀n > N.

We denote this by ‘zn → l as n → ∞’ or by ‘limn→∞ zn = l’.

Definition 1.1.2. A sequence {zn} of real (or complex) numbers converges if it has a limit l.

Often we prove things by contradiction. We start by assuming that what we want is not true. That means we have to be able to write down the contrapositive of a proposition. We can do this mechanically: working from the left, change every ∀ into ∃ and every ∃ into ∀, and negate the simple proposition at the end.

For example, by the first definition, a sequence {zn} does not converge to l¹ if and only if ∃ε > 0 such that ∀k ∈ N, ∃nₖ > k s.t.

|z_{n_k} − l| ≥ ε.

Definition 1.1.3. {zn} is called a Cauchy sequence if ∀ε > 0 ∃N ∈ N such that ∀n, m > N

|zn − zm| < ε.

Here is the key theorem, sometimes called The General Principle for Convergence:

Theorem (Cauchy’s Criterion). A sequence {zn} of real (or complex) numbers converges if and only if it is a Cauchy sequence.

When mathematicians say that the real number system R and the complex number system C are complete, what they mean is that this theorem is true. There are no sequences which look as though they converge but don’t; there are no ‘gaps’ in the real number line.

According to Cauchy’s criterion, {zn} diverges [i.e. has no finite limit] if and only if ∃ε > 0 such that ∀k ∈ N there exist [at least] two integers n_{k,1}, n_{k,2} > k s.t.

|z_{n_{k,1}} − z_{n_{k,2}}| ≥ ε.

¹ i.e. either {zn} diverges, or zn → a ≠ l.


1.2 Compactness

The following theorem demonstrates the “compactness” of a bounded subset.

Theorem (The Bolzano–Weierstrass Theorem). Any bounded sequence in R (or in C) has a subsequence which converges to a point in R (in C).

The proofs of the theorems about continuous functions in this course rely on the following:

Corollary 1.2.1. A bounded sequence {zn} in R (or in C) converges to a limit l if and only if all convergent subsequences of {zn} have the same limit l.

Proof. =⇒: This was proved in Analysis I: any subsequence of a convergent sequence tends to the same limit.

⇐=: We argue by contradiction. Suppose {zn} is divergent.

Since {zn} is bounded, there exists a subsequence {z_{n_k}} converging to some limit l₁, by the Bolzano–Weierstrass Theorem. Notice that N \ {nₖ : k ∈ N} can’t be finite: if that were to happen then {zn} and {z_{n_k}} would have a common tail and so zn → l₁ after all.

We can therefore let {z_{m_k}} be the subsequence got by omitting the terms labelled by nₖ. If this subsequence converges to l₁ then it is easy to see that {zn} converges to l₁. (In Analysis I we did this for special cases like z_{2n} → l and z_{2n+1} → l; the argument is easily modified.)

So we have that {z_{m_k}} does not converge to l₁. Writing down the contrapositive, then, we have: there exists ε₀ > 0 such that for every natural number j there exists a natural number, which we denote by r_j, such that r_j > j and |z_{m_{r_j}} − l₁| ≥ ε₀.

Now {z_{m_{r_j}}} is bounded, so by the Bolzano–Weierstrass Theorem there exists a convergent subsequence {z_{m_{r_{j_s}}}}, with z_{m_{r_{j_s}}} → l₂ say. Then letting s → ∞ in

0 < ε₀ ≤ |z_{m_{r_{j_s}}} − l₁|

we get, by the preservation of weak inequalities, that

0 < ε₀ ≤ |l₂ − l₁|,

and so l₁ ≠ l₂. We have thus found two subsequences of {zn} which converge to distinct limits.

1.3 Limit points

Before we talk about limits of functions ‘f(x) → l as x → a’ we need to say something about the sort of points a in a set that we are interested in: we want to exclude points which x can’t get near!

Definition 1.3.1. Let E ⊆ R (or C). A point p ∈ R (or C) is called a limit point (or an accumulation point, or a cluster point) of E if ∀δ > 0 there is at least one point z ∈ E other than p such that

0 < |z − p| < δ.


Definition 1.3.2. A point which is not a limit point of E is called an isolated point of E.

There are all sorts of exotic examples of limit points, but most sets we will consider are intervals, so the following result is crucial:

Theorem 1.3.3. p ∈ R is a limit point of an interval [a, b] (or (a, b], or [a, b), or (a, b)) if and only if p ∈ [a, b].

Proof for the interval (a, b]. There are (by trichotomy) only three cases: p < a, p ∈ [a, b], and p > b. In the first take δ := (a − p)/2 and get a contradiction; in the third take δ := (p − b)/2. The case p ∈ [a, b] is an exercise.

1.4 Functions

Let f : X → Y , where X and Y are subsets of C or R.

Although there’s no such thing as a typical function, here are three examples, which are often useful as test cases when we formulate definitions and make conjectures.

Example 1.4.1. f(x) = √(1 − x²) with domain E = [−1, 1]. What is its graph? Its graph looks continuous . . . .

Example 1.4.2. Consider the function f on E = (0, 1] given by

f(x) := p/q if x = p/q in lowest terms, and f(x) := 0 when x is irrational.

This time our sketch of the graph is a bit more sketchy. [Try with Maple.]

Example 1.4.3. The function f(x) = x sin(1/x) with domain R \ {0} is an important test case in our work. As x gets close to 0, the values of f oscillate, but they do get close to 0. Once we have formalised this we will see that f has limit 0 as x goes to 0.

1.5 Limits of Functions

Having looked at these examples we make a definition.

Definition 1.5.1. Let E ⊆ R (or C), and let f : E → R (or C) be a real (or complex) function. Let p be a limit point of E and let l be a number. We say that f tends to l as x tends to p if ∀ε > 0 ∃δ > 0 such that

|f(x) − l| < ε ∀x ∈ E such that 0 < |x − p| < δ.

In symbols we write this as ‘limx→p f(x) = l’ or ‘f(x) → l as x → p.’

Remark 1.5.2. (i) Note that p is not necessarily in E.

(ii) Note that in the definition δ may depend on p and ε.


Example 1.5.3. Let α > 0 be a constant. Consider the function f(x) = |x|^α sin(1/x) on the domain E = R \ {0}. Show that f(x) → 0 as x → 0.

Since |sin θ| ≤ 1 we have that ||x|^α sin(1/x)| ≤ |x|^α for any x ≠ 0. Therefore, given ε > 0, choose δ = ε^{1/α}. Then

||x|^α sin(1/x) − 0| ≤ |x|^α < ε whenever 0 < |x − 0| < δ.

According to the definition, |x|^α sin(1/x) → 0 as x → 0.
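The notes suggest experimenting with Maple; here is an analogous numerical sanity check in Python (a sketch, not a proof; the helper names `f` and `check` are mine): it samples points inside (0, δ) with δ = ε^{1/α} and confirms that |f(x) − 0| < ε, exactly the inequality the proof rests on.

```python
# Sanity check (not a proof) of Example 1.5.3: with delta = eps**(1/alpha),
# sampled points 0 < x < delta satisfy | |x|**alpha * sin(1/x) | < eps.
import math

def f(x, alpha):
    return abs(x) ** alpha * math.sin(1.0 / x)

def check(alpha, eps, samples=1000):
    delta = eps ** (1.0 / alpha)
    # sample strictly inside (0, delta) and test the epsilon bound
    xs = [delta * k / (samples + 1) for k in range(1, samples + 1)]
    return all(abs(f(x, alpha) - 0.0) < eps for x in xs)

print(check(alpha=0.5, eps=1e-3), check(alpha=2.0, eps=1e-2))
```

The check passes for the same reason the proof works: |f(x)| ≤ |x|^α < δ^α = ε on the sampled range.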

Example 1.5.4. Consider the function f(x) = x² on the domain E = R. Let a ∈ R. Show that f(x) → a² as x → a.

Note that |x² − a²| = |x − a||x + a| ≤ |x − a|(|x| + |a|). So we want to get a bound on |x|. Suppose that |x − a| < 1; then

|x| = |x − a + a| ≤ |x − a| + |a| < 1 + |a|.

So ∀ε > 0, choose δ = min{1, ε/(1 + 2|a|)}. Then

|x² − a²| ≤ |x − a|((1 + |a|) + |a|) < ε whenever |x − a| < δ,

as required.
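The δ chosen above can likewise be spot-checked numerically (a sketch, not a proof; `delta_for` and `check` are hypothetical helper names, not from the notes):

```python
# Sanity check (not a proof) of Example 1.5.4: with
# delta = min(1, eps / (1 + 2|a|)), every sampled x with |x - a| < delta
# satisfies |x**2 - a**2| < eps.
def delta_for(a, eps):
    return min(1.0, eps / (1.0 + 2.0 * abs(a)))

def check(a, eps, samples=1000):
    d = delta_for(a, eps)
    # sample strictly inside (a - d, a + d)
    xs = [a - d + 2.0 * d * k / (samples + 1) for k in range(1, samples + 1)]
    return all(abs(x * x - a * a) < eps for x in xs)

print(check(a=3.0, eps=1e-4), check(a=-10.0, eps=1e-2))
```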

Theorem 1.5.5. Let f : E → R (or C) and let p be a limit point of E. If f has a limit as x → p, then the limit is unique.

Proof. Suppose f(x) → l₁ and also f(x) → l₂ as x → p, where l₁ ≠ l₂. Then ½|l₁ − l₂| > 0, so by definition ∃δ₁ > 0 such that

|f(x) − l₁| < ½|l₁ − l₂| ∀x ∈ E such that 0 < |x − p| < δ₁.

Similarly, ∃δ₂ > 0 such that

|f(x) − l₂| < ½|l₁ − l₂| ∀x ∈ E such that 0 < |x − p| < δ₂.

Let δ = min{δ₁, δ₂}. Since p is a limit point of E and δ > 0, ∃x₀ ∈ E such that 0 < |x₀ − p| < δ. However

|l₁ − l₂| = |(f(x₀) − l₁) − (f(x₀) − l₂)|   [add and subtract technique]
≤ |f(x₀) − l₁| + |f(x₀) − l₂|   [Triangle Law]
< ½|l₁ − l₂| + ½|l₁ − l₂| = |l₁ − l₂|,

a contradiction.

Remark 1.5.6. An exercise in contrapositives: f does not converge to l as x → p (i.e. either f has no limit or f(x) → a ≠ l as x → p) means that ∃ε > 0 such that ∀δ > 0, ∃x ∈ E such that 0 < |x − p| < δ but |f(x) − l| ≥ ε.


The following theorem translates questions about function limits into questions about sequence limits, and so we can make use of results in Analysis I.

Theorem 1.5.7. Let f : E → R (or C) where E ⊆ R (or C), let p be a limit point of E, and let l ∈ C. Then the following two statements are equivalent:

(a) f(x) → l as x → p;

(b) For every sequence {pₙ} in E such that pₙ ≠ p and limn→∞ pₙ = p we have that f(pₙ) → l as n → ∞.

We might say informally that limx→p f(x) = l if and only if f tends to the same limit l along any sequence in E going to p.

Proof. =⇒: Suppose limx→p f(x) = l. Then ∀ε > 0, ∃δ > 0 such that

|f(x) − l| < ε ∀x ∈ E such that 0 < |x − p| < δ.

Now suppose {pₙ} is a sequence in E, with pₙ → p and pₙ ≠ p. Then ∃N ∈ N such that

|pn − p| < δ ∀n > N .

So, since pₙ ≠ p,

|f(pₙ) − l| < ε ∀n > N.

Hence, limn→∞ f(pn) = l.

⇐=: Argue by contradiction. Suppose limx→p f(x) = l is not true. Then ∃ε₀ > 0 such that ∀δ > 0 (which we choose to be 1/n for arbitrary n) ∃xₙ ∈ E with 0 < |xₙ − p| < 1/n but

|f(xₙ) − l| ≥ ε₀.

Therefore we have found a sequence {xₙ} which converges to p, with xₙ ≠ p, but such that {f(xₙ)} does not tend to l. Contradiction.

1.6 Example

The above result is very useful when we want to prove that limits do not exist.

Example 1.6.1. Show that limx→0 sin(1/x) doesn’t exist.

Let xₙ = 1/(πn) and yₙ = 1/(2πn + π/2). Then both sequences xₙ and yₙ tend to 0, but

lim_{n→∞} sin(1/xₙ) = lim_{n→∞} sin(πn) = 0

and

lim_{n→∞} sin(1/yₙ) = lim_{n→∞} sin(2πn + π/2) = 1.

So limx→0 sin(1/x) cannot exist.
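A quick numerical check of the two sequences (illustration only, not a proof; the 1e-6 tolerance merely absorbs floating-point error):

```python
# Numeric illustration of Example 1.6.1: along xn = 1/(pi*n) the values
# sin(1/xn) = sin(pi*n) are (numerically) 0, while along
# yn = 1/(2*pi*n + pi/2) the values sin(1/yn) are 1, so sin(1/x) cannot
# have a limit as x -> 0.
import math

n = 10**6
xn = 1 / (math.pi * n)
yn = 1 / (2 * math.pi * n + math.pi / 2)

print(abs(math.sin(1 / xn)) < 1e-6, abs(math.sin(1 / yn) - 1) < 1e-6)
```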


1.7 Algebra of Limits

We can use the theorem of the previous subsection together with the Algebra of Limits of Sequences to prove the corresponding results: we get the Algebra of Limits of Functions. We state the theorem for C but it also holds for R.

Theorem 1.7.1. Let E ⊆ C and let p be a limit point of E. Let f, g : E → C, and let α, β ∈ C. Suppose that f(x) → A and g(x) → B as x → p. Then the following limits exist and have the values stated:

(Addition) limx→p (f(x) + g(x)) = A + B;

(Negation) limx→p (−f(x)) = −A;

(Linear Combination) limx→p (α · f + β · g) (x) = αA + βB;

(Product) limx→p (f(x)g(x)) = AB;

(Quotient) if B ≠ 0 then ∃δ > 0 s.t. g(x) ≠ 0 ∀x ∈ E such that 0 < |x − p| < δ, and limx→p (f(x)/g(x)) = A/B;

(Weak Inequality) if f(x) ≥ 0 for all x ∈ E then A ≥ 0.

It is a good exercise to prove these results directly from the definitions; just mimic the sequence proofs.

Example of proof: if B ≠ 0 then ∃δ > 0 s.t. g(x) ≠ 0 ∀x ∈ E such that 0 < |x − p| < δ, and limx→p (1/g(x)) = 1/B.

I will do it both ways:

(i) Deduction from AOL for sequences: Suppose first that there is no such δ. Then for each n, ∃pₙ ∈ E such that 0 < |pₙ − p| < 1/n and g(pₙ) = 0. But then pₙ → p, so g(pₙ) → B, giving B = 0, a contradiction. So δ > 0 exists.

Now let {xₙ} be any sequence in E with xₙ → p and xₙ ≠ p. We may assume xₙ ∈ (p − δ, p + δ) (by tails). Hence g(xₙ) ≠ 0 and g(xₙ) → B. Thus by the AOL for sequences, 1/g(xₙ) → 1/B. Thus 1/g(x) → 1/B as required.

(ii) Direct proof: Take ε = |B|/2 > 0. Then ∃δ₁ > 0 such that

|g(x) − B| < |B|/2 ∀x ∈ E such that 0 < |x − p| < δ₁.

Thus by the Triangle Law

|g(x)| = |B + (g(x) − B)| ≥ |B| − |g(x) − B| > |B| − |B|/2 = |B|/2

∀x ∈ E such that 0 < |x − p| < δ₁. (So in particular, g(x) ≠ 0 whenever 0 < |x − p| < δ₁.)

Now, given ε > 0, ∃δ₂ > 0 such that

|g(x) − B| < |B|²ε/2 ∀x ∈ E such that 0 < |x − p| < δ₂.


Take δ = min{δ₁, δ₂}. Then if x ∈ E is such that 0 < |x − p| < δ,

|1/g(x) − 1/B| = |g(x) − B| / (|g(x)||B|) < (|B|²ε/2) / (|B|²/2) = ε

as required.

Remark 1.7.2. Note we have also proved above that if limx→p g(x) = B ≠ 0, then there is a positive number δ > 0 such that

|g(x)| > |B|/2 ∀x ∈ E such that 0 < |x − p| < δ.

In particular, |g(x)| > 0 ∀x ∈ E such that 0 < |x − p| < δ.

It can be proved similarly that if g : E → R and B > 0, then ∃δ > 0 such that g(x) > B/2 > 0 ∀x ∈ E such that 0 < |x − p| < δ.

1.8 An extension

Sometimes we want to extend the notion ‘f(x) → l as x → p’ to cover ‘infinity’. Here is one such extension. Note that although ∞ appears in the language, we have not given it the status of a number: it can only appear in certain phrases in our mathematical language which are shorthand for quite complicated statements about real numbers.

Definition 1.8.1. Suppose that E ⊆ R is a set which is unbounded above and f : E → R. Then we write f(x) → ∞ as x → ∞ to mean that ∀B > 0, ∃D > 0 such that

f(x) > B ∀x ∈ E s.t. x > D.

Extra result added 26th January 2010:

Here is a version of the sandwich theorem.

Proposition 1.8.2. Let E ⊆ R and let p be a limit point of E. Let f, m, M : E → R. Suppose that there exists δ > 0 s.t. m(x) ≤ f(x) ≤ M(x) for all x ∈ E such that 0 < |x − p| < δ, and that m(x) → l and M(x) → l as x → p. Then limx→p f(x) exists and equals l.

Proof. Since m(x) → l and M(x) → l as x → p we have:

∀ε > 0 ∃δ₁ > 0 s.t. l − ε < m(x) < l + ε ∀x ∈ E s.t. 0 < |x − p| < δ₁, and

∀ε > 0 ∃δ₂ > 0 s.t. l − ε < M(x) < l + ε ∀x ∈ E s.t. 0 < |x − p| < δ₂.

So if we take δ₃ = min{δ, δ₁, δ₂}, then for the given ε > 0,

l − ε < m(x) ≤ f(x) ≤ M(x) < l + ε ∀x ∈ E s.t. 0 < |x − p| < δ₃,

and we are done.
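For instance, combined with the bounds −|x| ≤ x sin(1/x) ≤ |x| (valid for all x ≠ 0, since |sin θ| ≤ 1), the sandwich proposition immediately gives the limit promised in Example 1.4.3:

```latex
-|x| \;\le\; x\sin\tfrac1x \;\le\; |x| \quad (x \neq 0), \qquad
\pm|x| \to 0 \ \text{as}\ x \to 0
\quad\Longrightarrow\quad
\lim_{x\to 0} x\sin\tfrac1x = 0.
```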

2 Continuity of functions

We all have a good informal idea of what it means to say that a function has a continuous graph: we can draw it without lifting the pencil from the paper. But we want now to use our precise definition of ‘f(x) → l as x → p’ to discuss the idea of continuity. That is, we want to discuss the precise question of whether f is continuous at a particular point p.


2.1 Definition

In the definition of limx→p f(x), the point p need not belong to the domain E of f. But even if it does, and f(p) is well-defined, the limit of f at p may have nothing to do with f(p).

The classic example is the function

f(x) := 0 if x ≠ 0, and f(x) := 1 otherwise.

Then limx→0 f(x) = 0 ≠ 1 = f(0).

This example motivates our definition.

Definition 2.1.1. Let f : E → R (or C), where E ⊆ R (or C), and let p ∈ E. If ∀ε > 0 ∃δ > 0 such that

|f(x) − f(p)| < ε ∀x ∈ E such that |x − p| < δ,

then we say that f is continuous at p.

We continue with the notation of the definition for a moment and see what this means for isolated points and limit points.

Proposition 2.1.2. f is continuous at any isolated point of E.

Proof. As p is isolated there exists δ > 0 such that there are no points x ∈ E with 0 < |x − p| < δ. The inequality required is therefore vacuously true.

Proposition 2.1.3. If p ∈ E is a limit point of E, then f is continuous at p if and only if limx→p f(x) exists and limx→p f(x) = f(p).

Proof. It’s clear that the continuity definition implies the limit one at once. The limit one, provided the limit is f(p), delivers all that we need for continuity except that the inequality |f(x) − l| < ε holds for x = p as well as the other points x with |x − p| < δ. But this is immediate.

2.2 Examples

Example 2.2.1. Let α > 0 be a constant. The function f(x) = |x|^α sin(1/x) is not defined at x = 0, so it makes no sense to ask if it is continuous there. In such circumstances we modify f in some suitable way. So we look at

g(x) := |x|^α sin(1/x) if x ≠ 0, and g(x) := 0 if x = 0.

Then 0 is a limit point of the domain, and we calculated before that limx→0 g(x) = 0 = g(0), so g is continuous at 0.


Example 2.2.2. Let f : (0, 1] → R be defined by

f(x) := 1/n if x = m/n in lowest terms, and f(x) := 0 if x is irrational.

At which points of (0, 1] is f continuous?

This is very like a problem on the Exercise Sheets, so I won’t give a full proof here, only indicate how I would tackle it.

Every p ∈ (0, 1] is a limit point, so we need to work out limx→p f(x) for each p. We know that we can do this by looking at limn→∞ f(pₙ) for each sequence {pₙ} converging to p.

We know that there is always a sequence of irrationals {xₙ} converging to p. (Because, from Analysis I, for every n ∈ N the interval (p, p + 1/n) contains an irrational number xₙ.) Then the sequence {f(xₙ)} is just the null sequence (0, 0, . . . ) with limit 0.

So perhaps we need to distinguish between rational and irrational points?

Suppose p is rational, p = m/n in lowest terms. Then, with {xₙ} as above, f(xₙ) → 0 but f(p) = 1/n ≠ 0. Therefore f is not continuous at rational points.

Now let p be irrational. Some sequences (for example irrational ones) tend to 0 = f(p). But do all sequences have this property? Let pₙ → p, and consider f(pₙ). If this does not tend to zero, then for some ε > 0 we can find a subsequence such that f(p_{n_j}) ≥ ε. These p_{n_j} must be rational and have denominators at most 1/ε. There are only a finite number of such points in the interval, and so (since pₙ → p) we can find an N such that for n > N all the pₙ are irrational or have denominators greater than 1/ε. Hence we cannot have the claimed subsequence.

Therefore f is continuous at irrational points, since for all sequences {pₙ} we have that f(pₙ) → f(p).
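The finiteness step in the irrational case can be checked concretely (a sketch, not a proof; `big_value_points` is my own name, and p = 1/√2 is just an arbitrary irrational in the interval):

```python
# Illustration of the key step in Example 2.2.2: the points x in (0,1] where
# f(x) = 1/n >= eps are exactly the finitely many rationals with (reduced)
# denominator at most 1/eps, so an irrational p keeps a positive distance
# from all of them.
from fractions import Fraction
import math

def big_value_points(eps):
    """All x = m/n in (0,1], in lowest terms, with denominator <= 1/eps."""
    nmax = int(1 / eps)
    # Fraction reduces to lowest terms; the set removes duplicates
    return {Fraction(m, n) for n in range(1, nmax + 1) for m in range(1, n + 1)}

p = 1 / math.sqrt(2)   # an irrational point of (0, 1]
pts = big_value_points(0.1)
gap = min(abs(float(q) - p) for q in pts)
print(len(pts), gap > 0)
```

With ε = 0.1 there are only 32 such points (the Farey fractions of order 10 in (0, 1]), and the nearest one to 1/√2 is about 0.007 away.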

2.3 Algebraic properties

We can use our characterisation of continuity at limit points in terms of limx→p f(x), together with the Algebra of Function Limits, to prove that the class of functions continuous at p is closed under all the usual operations. We state the theorem for C but it also holds for R.

Theorem 2.3.1. Let E ⊆ C and let p ∈ E. Let f, g : E → C, and let α, β ∈ C. Suppose that f, g are continuous at p. Then the following functions are also continuous at p:

(Addition) f(x) + g(x);

(Negation) −f(x);

(Linear Combination) (α · f + β · g) (x);

(Product) (f(x)g(x)); and

(Quotient) (f(x)/g(x)) provided g(p) ≠ 0 (which guarantees that there exists δ such that f(x)/g(x) is defined ∀x ∈ E such that |x − p| < δ).


Proof. Follows directly from the Algebra of Function Limits. However, it is a good² exercise to write out a proof from the definition: again, just mimic what was done for the AOL for sequences.

Example 2.3.2. Let f : C → C (or R → R) be a polynomial. Then f is continuous at every point of C (or R).

Further, suppose f(x) = r(x)/q(x), where r, q : C → C (or R → R) are polynomials. Then if q(p) ≠ 0, f is continuous at p.

This follows immediately from the above theorem because the function f(x) = x with domain C (or R) is continuous.

2.4 Composition of continuous functions

However we can do more than these trivial algebraic results.

Theorem 2.4.1. Let f : E → C and g : f(E) → C, and define h : E → C by

h(x) = (g ◦ f)(x) ≡ g(f(x)) for x ∈ E.

If f is continuous at p ∈ E and g is continuous at f(p), then h is continuous at p.

Proof. For any ε > 0, since g is continuous at f(p), ∃δ1 > 0 such that

|g(y) − g(f(p))| < ε ∀y ∈ f(E) such that |y − f(p)| < δ1.

That is,

|g(f(x)) − g(f(p))| < ε ∀x ∈ E such that |f(x) − f(p)| < δ₁.

However, f is continuous at p, so ∃δ > 0 such that

|f(x) − f(p)| < δ1 ∀x ∈ E such that |x − p| < δ.

Hence

|g(f(x)) − g(f(p))| < ε ∀x ∈ E such that |x − p| < δ,

so that h is continuous at p.

Extra result added 26th January 2010:

The following theorem follows immediately from Proposition 2.1.3 and the proof of Theorem 1.5.7. In this case we do not need to avoid sequences which hit the point:

Theorem 1.5.7′. Let f : E → R (or C) where E ⊆ R (or C) and p ∈ E. Then the following two statements are equivalent:

(a) f(x) is continuous at p;

(b) For every sequence {pₙ} in E such that limn→∞ pₙ = p we have that f(pₙ) → f(p) as n → ∞.

² Doing this will reinforce the definitions, but also consolidate your understanding of sequences.


3 Continuity and Uniform Continuity

3.1 Continuous functions on sets

Having made our definition of ‘continuity’, we will see that actually what usually matters is not continuity at a point, but continuity at all points of a set, and the interesting sets are usually intervals or disks. In the later lectures we are going to establish several important theorems about continuous functions on bounded intervals.

But here is the definition of continuity on a set.

Definition 3.1.1. Let f : E → R (or C). We say that f is continuous on E if f is continuous at every point of E.

For later use we decode this in terms of εs and δs.

Proposition 3.1.2. Let f : E → R (or C). Then f is continuous on E if,

∀p ∈ E and ∀ε > 0, ∃δ > 0

such that

|f(x) − f(p)| < ε ∀x ∈ E such that |x − p| < δ.

Note that the δ may depend on ε and on the point p.

We are about to look at uniform continuity, in which δ does not depend on p. First we will consider an example which is not uniformly continuous.

3.2 An Example

We look at an example of a function continuous on a set.

Example 3.2.1. Let f : (0, ∞) → R be given by f(x) := 1/x.

Show that for every p ≠ 0, limx→p 1/x = 1/p, and thus f is continuous on (0, ∞).

By the algebra of limits this is all clear. But we want to analyse what is going on more carefully, to see how the δ is related to ε and to the point p in question.

First,

|f(x) − f(p)| = |1/x − 1/p| = |x − p| / (|x||p|),

and we can see that the problem term is 1/x.

However, |p| > 0, and so when |x − p| < ½|p| we have by the Triangle Law that

|x| ≥ |p| − |x − p| > ½|p|;

so we’re going to have to pick δ ≤ ½|p|.


For these x, then, we have that

|f(x) − f(p)| ≤ (2/|p|²)|x − p|

and if we make sure (2/|p|²)|x − p| < ε we will be done.

This can be achieved by choosing

δ := min(½|p|, ½ε|p|²),

which is indeed positive.

Note that for small ε (the interesting ones) the values of δ we need depend heavily on p. Near p = 1, choosing ½ε will do, but at p = 10⁻⁶ we need δ = ½ · 10⁻¹² ε. Our function is certainly continuous at every point, but there is no way of controlling, over the whole interval, how far it strays in a small neighbourhood.
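The dependence of δ on p can be made vivid numerically (a sketch, not a proof; `delta_for` and `works` are my own names): the δ that suffices at p = 1 is about a million times larger than the one needed at p = 10⁻³.

```python
# Illustration of Example 3.2.1: delta = min(p/2, eps*p**2/2) works for
# f(x) = 1/x at each p > 0, but shrinks rapidly as p approaches 0.
def delta_for(p, eps):
    return min(0.5 * p, 0.5 * eps * p * p)

def works(p, eps, samples=1000):
    d = delta_for(p, eps)
    # sample strictly inside (p - d, p + d); note x > p/2 > 0 here
    xs = [p - d + 2.0 * d * k / (samples + 1) for k in range(1, samples + 1)]
    return all(abs(1 / x - 1 / p) < eps for x in xs)

eps = 0.1
print(works(1.0, eps), works(1e-3, eps),
      delta_for(1.0, eps) / delta_for(1e-3, eps))
```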

3.3 Uniform Continuity

Sometimes we want to be able to control what happens over a set more ‘uniformly’.

Definition 3.3.1. Let f : E → R (or C). Then f is uniformly continuous on E if,

∀ε > 0, ∃δ > 0

such that

|f(p) − f(x)| < ε ∀p ∈ E and ∀x ∈ E such that |p − x| < δ.

Note the difference³ between this and the definition of ‘continuous on E’. In this, the uniform case, we must find δ on the basis of ε alone, and we have to choose one that will give the inequality for all x ∈ E and p ∈ E. Obviously if we can do this it is very nice: it gives us a way of controlling what happens on a set all at once.

Of course, if f : E → R (or C) is uniformly continuous on E then f is continuous on E.

Here is one class of functions that satisfy the uniform continuity condition.

Example 3.3.2. Suppose that f is Lipschitz continuous on E: that is, assume that there is a constant M such that

|f(x) − f(y)| ≤ M|x − y| ∀x, y ∈ E.

Then f is uniformly continuous on E.

³ For those who like pure formulae,

Continuity on E: ∀p ∈ E ∀ε > 0 ∃δ > 0 ∀x ∈ E [|x − p| < δ =⇒ |f(x) − f(p)| < ε]

Uniform continuity on E: ∀ε > 0 ∃δ > 0 ∀p ∈ E ∀x ∈ E [|x − p| < δ =⇒ |f(x) − f(p)| < ε]

Swapping ∀s doesn’t give problems, but swapping the ∀p and ∃δ is the crunch.


Take x, y ∈ E. Given ε > 0, choose δ = ε/(M + 1) > 0. Then

|f(x) − f(y)| ≤ M|x − y| ≤ M · ε/(M + 1) < ε

whenever |y − x| < δ.

Note that our choice of δ does not depend on x or y. For a given ε > 0 we can find a δ that works for all x and y.

Example 3.3.3. f(x) = √x is Lipschitz continuous on [1, ∞), so it is uniformly continuous.

To see the Lipschitz condition note that

|√x − √y| ≤ |x − y|/(√x + √y) ≤ ½|x − y|

for all x, y ≥ 1.
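A grid-based spot check of this Lipschitz bound (illustration only; the tiny slack term absorbs floating-point rounding, and `lipschitz_ok` is my own name):

```python
# Spot check (not a proof) of Example 3.3.3:
# |sqrt(x) - sqrt(y)| <= |x - y| / 2 for x, y >= 1, sampled on a grid.
import math

def lipschitz_ok(lo=1.0, hi=100.0, steps=200):
    pts = [lo + (hi - lo) * k / steps for k in range(steps + 1)]
    return all(abs(math.sqrt(x) - math.sqrt(y)) <= 0.5 * abs(x - y) + 1e-12
               for x in pts for y in pts)

print(lipschitz_ok())
```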

3.4 Continuity implies Uniform Continuity on [a, b]

Our first real theorem is:

Theorem 3.4.1 (Uniform Continuity on [a, b]). If f : [a, b] → R (or C) is continuous, then f is uniformly continuous.

More generally, a continuous function on a closed and bounded set (‘compact set’, as we’ll say next year) is uniformly continuous.

Proof. Suppose that f were not uniformly continuous. By the contrapositive of ‘uniform continuity’ there would exist ε > 0 such that for any δ > 0 (which we choose as δ = 1/n for arbitrary n) there exists a pair of points xₙ, yₙ ∈ [a, b] such that

|xₙ − yₙ| < 1/n but |f(xₙ) − f(yₙ)| ≥ ε.

Since {xₙ : n ∈ N} ⊆ [a, b] is bounded, by the Bolzano–Weierstrass Theorem there exists a subsequence {x_{n_k}} which converges to some p. Hence p must be a limit point of [a, b], so p ∈ [a, b]. But

|y_{n_k} − p| ≤ |x_{n_k} − y_{n_k}| + |x_{n_k} − p| < 1/n_k + |x_{n_k} − p| → 0.

Thus x_{n_k} → p and y_{n_k} → p, so that by continuity at p we have

0 < ε ≤ lim_{k→∞} |f(x_{n_k}) − f(y_{n_k})| = |f(p) − f(p)| = 0,

which is impossible.


3.5 An example on an unbounded interval

Example 3.5.1. f(x) = √x is uniformly continuous on the unbounded interval [0, +∞).

We do this in three steps: we prove uniform continuity on [0, 1], we prove uniform continuity on [1, +∞), and we patch these together.

It is easy to get that √x is continuous on [0, 1]; provided |x − p| < ½p we will get

|√x − √p| ≤ |x − p|/(√x + √p) ≤ (2/(3√p))|x − p|

and can argue from there. Thus it must be uniformly continuous by Theorem 3.4.1.

Secondly, we have already shown that √x is Lipschitz continuous on [1, ∞), so it is uniformly continuous on [1, ∞).

Now we have to patch these together. This is a standard sort of argument, which we do this time as an example.

We have that for all ε > 0, ∃δ₁ > 0 such that

|√x − √y| < ½ε ∀x, y ∈ [0, 1] such that |x − y| < δ₁,

and ∃δ₂ > 0 such that

|√x − √y| < ½ε ∀x, y ∈ [1, ∞) such that |x − y| < δ₂.

Choose δ = min{δ₁, δ₂, ½} > 0. Then suppose that |x − y| < δ. If x, y ≥ 1 or x, y ≤ 1 we are done.

So suppose that x ∈ [0, 1] and y > 1. Then |x − 1| < δ and |y − 1| < δ, so that

|√x − √y| ≤ |√x − √1| + |√y − √1| < ½ε + ½ε = ε.

Hence we have

|√x − √y| < ε

whenever x, y ∈ [0, ∞) are such that |x − y| < δ. By definition, f(x) = √x is uniformly continuous on the unbounded interval [0, +∞).

3.6 A counterexample on a half open interval

The condition that the interval [a, b] is closed cannot be relaxed.

Example 3.6.1. f(x) = 1/x is not uniformly continuous on the half-open interval (0, 1]. (See also Example 3.2.1.)

Take ε = 1. We show that there is no δ > 0 such that Definition 3.3.1 holds.

Take sequences xₙ = 1/n and yₙ = 1/(n + 1). Then |f(xₙ) − f(yₙ)| = 1, but |xₙ − yₙ| → 0. So for any δ > 0 there exists n such that |xₙ − yₙ| < δ but |f(xₙ) − f(yₙ)| ≥ 1. So f is not uniformly continuous.


4 Continuous functions on a closed and bounded interval

4.1 Boundedness

We begin with some definitions.

Definition 4.1.1. Let f : E → R (or C), and let M be a non-negative real number. We say that f is bounded by M on E if

|f(z)| ≤ M ∀z ∈ E.

We also say that M is a bound for f on E. If there is a bound for f on E we say that f is bounded (on E).

Here is one of the important theorems of the course:

Theorem 4.1.2 (Continuous functions on [a, b] are bounded). If f : [a, b] → R (or C) is continuous, then f is bounded.

Proof. Argue by contradiction. Suppose f were unbounded; then for any n ∈ N there is at least one point xₙ ∈ [a, b] such that |f(xₙ)| > n. Since {xₙ} is bounded, by the Bolzano–Weierstrass Theorem there exists a subsequence {x_{n_k}} converging to p, say. Then p is a limit point of the interval [a, b], so p ∈ [a, b]. Note that |f(x_{n_k})| > n_k ≥ k. Now f is continuous at p and so we have that

f(p) = lim_{k→∞} f(x_{n_k}),

so in particular the sequence {f(x_{n_k})} is convergent. Hence, by an Analysis I result, this sequence is bounded. As its k-th term exceeds k, we have a contradiction.

Therefore f must be bounded.

We will now show that these bounds are ‘attained’.

Notation 4.1.3. Let f : E → R be a bounded real-valued function, with E ≠ ∅. Then write

sup_{x∈E} f(x) := sup{f(t) | t ∈ E} and inf_{x∈E} f(x) := inf{f(t) | t ∈ E},

noting that these exist by the Completeness Axiom.

Corollary 4.1.4. Let f : [a, b] → R be continuous; then sup_{x∈[a,b]} f(x) and inf_{x∈[a,b]} f(x) exist.

Proof. Immediate.


Note 4.1.5. Recall that the supremum is precisely this: an upper bound such that nothing smaller is an upper bound. It is convenient to translate this into ε-language about functions as follows:

M = sup_{x∈E} f(x) if and only if: ∀x ∈ E, f(x) ≤ M; and ∀ε > 0 ∃x_ε ∈ E such that f(x_ε) > M − ε.

We have a similar characterisation of the infimum:

m = inf_{x∈E} f(x) if and only if: ∀x ∈ E, f(x) ≥ m; and ∀ε > 0 ∃x_ε ∈ E such that f(x_ε) < m + ε.

Here now is our second important theorem; note that it is only for real-valued functions.

Theorem 4.1.6 (Continuous functions on [a, b] attain their bounds). Let f : [a, b] →R be continuous, then f attains (or achieves) its supremum and infimum. That is, thereexist points4 x1 and x2 in [a, b] such that f(x1) = supx∈[a,b] f(x) and f(x2) = infx∈[a,b] f(x).

Proof. (1st Proof: by contradiction.) Let us prove by contradiction that the supremum Mof f is attained.

Assume the contrary, that is

f(t) < M for all t ∈ [a, b].

Consider the function g defined on [a, b] by

g(x) =1

M − f(x)

which is positive and continuous on [a, b]. Therefore g is, as we have proved, bounded on[a, b], by M0 say:

1

M − f(x)= g(x) 6 M0.

It follows that

f(x) 6 M − 1

M0

for all x ∈ [a, b] which is a contradiction to the fact that M is the least upper bound.

A similar argument deals with the infimum. 5

As this is such an important theorem we give a different proof.

4Note that x1, x2 may be not unique.5Or we may apply what we have done to −f and get the result at once since

inf{t | t ∈ E} = − sup{−t | t ∈ E}.


Proof. (2nd Proof: direct.) The continuous function f is bounded by our earlier theorem, so m := inf_{x∈[a,b]} f(x) exists by the Completeness Axiom of the real number system [Analysis I]. Apply the characterisation of the infimum given above, taking ε := 1/n, to find a point xn ∈ [a, b] such that

m ≤ f(xn) < m + 1/n.

Now {xn} is bounded, so we may use the Bolzano–Weierstrass Theorem to extract a convergent subsequence {x_{n_k}}; suppose we have x_{n_k} → p. Then p is a limit point of [a, b], so p ∈ [a, b]. Since f is continuous at p, we have that f(x_{n_k}) → f(p). From the inequality

m ≤ f(x_{n_k}) < m + 1/n_k

we can deduce, as limits preserve weak inequalities, that

m ≤ lim_{k→∞} f(x_{n_k}) = f(p) ≤ lim_{k→∞} (m + 1/n_k) = m,

so that f(p) = m = inf_{x∈[a,b]} f(x).

A similar argument will deal with the supremum.

4.2 A Generalisation

In the proofs we have used only:
(i) [a, b] is bounded;
(ii) [a, b] is closed (i.e. [a, b] contains all limit points of [a, b]);
(iii) f is continuous.

This prompts us to make the following definition:

Definition 4.2.1. A subset A of R (or of C) is compact if it is bounded, and if it contains all its limit points.

Our proofs would then give the more general result:

Theorem. Let f : E → R be a real-valued function on a compact subset E of R or C. Then f is bounded, uniformly continuous, and attains its bounds.

5 The Intermediate Value Theorem

So far we have concentrated on extreme values, the supremum and the infimum. What can we say about possible values between these?

Theorem 5.0.2 (IVT). Let f : [a, b] → R be continuous, and let c be a number between f(a) and f(b). Then there is at least one ξ ∈ [a, b] such that f(ξ) = c.

This is one of the most important theorems in this course.


Proof. By considering −f instead of f if necessary, we may assume that f(a) ≤ c ≤ f(b). The cases c = f(a) and c = f(b) are trivial, so assume f(a) < c < f(b).

Define g(x) = f(x) − c. Then g(a) < 0 < g(b). Let x1 = a and y1 = b. Divide the interval [x1, y1] into two equal parts. If g((x1 + y1)/2) = 0 then ξ := (x1 + y1)/2 will do. Otherwise, we choose x2 = x1 and y2 = (x1 + y1)/2 if g((x1 + y1)/2) > 0, or x2 = (x1 + y1)/2 and y2 = y1 if g((x1 + y1)/2) < 0. Then

g(x2)g(y2) < 0;  [x2, y2] ⊂ [x1, y1];  and |y2 − x2| = (y1 − x1)/2.

Applying the same argument to [x2, y2] instead of [x1, y1], we find that either g((x2 + y2)/2) = 0, in which case we can take ξ := (x2 + y2)/2, or there exist x3, y3 such that

g(x3)g(y3) < 0;  [x3, y3] ⊂ [x2, y2];  and |y3 − x3| = (y2 − x2)/2.

By repeating the same procedure, we find two sequences xn, yn such that

(i) either g((x_{n−1} + y_{n−1})/2) = 0, in which case we can take ξ := (x_{n−1} + y_{n−1})/2, or g(xn)g(yn) < 0;

(ii) [xn, yn] ⊂ [x_{n−1}, y_{n−1}] for every n = 2, 3, . . . ;

(iii) |yn − xn| = |y_{n−1} − x_{n−1}|/2 = · · · = |y1 − x1|/2^{n−1} = (b − a)/2^{n−1}.

Obviously {xn} is a bounded increasing sequence and {yn} is a bounded decreasing sequence. Bounded monotone sequences converge, and so xn → ξ and yn → ξ′ for some ξ, ξ′ ∈ [a, b]. Since, by the Algebra of Limits,

|ξ′ − ξ| = lim_{n→∞} |yn − xn| = lim_{n→∞} (b − a)/2^{n−1} = 0,

we get ξ = ξ′. Since g is continuous at ξ, we have by the Algebra of Limits and the preservation of weak inequalities that

0 ≥ lim_{n→∞} g(xn)g(yn) = lim_{n→∞} g(xn) · lim_{n→∞} g(yn) = g(ξ)².

Hence g(ξ)² = 0 as we are dealing with real numbers, so that g(ξ) = 0.

That is, f(ξ) = c.

Remark 5.0.3. The above proof of the IVT also provides a method of finding roots of f(ξ) = c, though other methods may find roots faster if additional information about f (e.g. that f is differentiable) is available.
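The halving construction in the proof is exactly the bisection method, and it is easy to run numerically. Here is a minimal Python sketch (Python, the test function x³ and the value c = 5 are illustrative choices, not part of the notes):

```python
def bisect(g, x, y, tol=1e-12):
    """Halve [x, y] repeatedly, keeping the half on which g changes sign.

    Mirrors the construction in the proof of the IVT; assumes g is
    continuous with g(x) < 0 < g(y).
    """
    if not (g(x) < 0 < g(y)):
        raise ValueError("need g(x) < 0 < g(y)")
    while y - x > tol:
        mid = (x + y) / 2
        v = g(mid)
        if v == 0:
            return mid
        if v > 0:
            y = mid  # as in the proof: keep the left half
        else:
            x = mid  # as in the proof: keep the right half
    return (x + y) / 2

# Solve f(xi) = c for f(x) = x**3 and c = 5 on [0, 2], via g(x) = f(x) - c.
xi = bisect(lambda x: x ** 3 - 5, 0.0, 2.0)
```

Each iteration halves the interval, so roughly 41 halvings of [0, 2] suffice for the tolerance above; this is the "method of finding roots" the remark refers to.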

Corollary 5.0.4. Let {[xn, yn]} be a decreasing net⁶ of closed intervals of R such that the lengths yn − xn → 0. Then ⋂_{n=1}^∞ [xn, yn] contains exactly one point.

Proof. Just extract the relevant lines of the IVT proof.

Remark 5.0.5. The proof of the IVT requires more than what we needed for boundedness and the attainment of bounds. We have used the fact that [a, b] is unbroken. That is, we have used the fact that [a, b] is “connected”.

⁶ That is, [x_{n+1}, y_{n+1}] ⊂ [xn, yn] for each n.


Here (for interest) is a sketch of an alternative proof, which identifies ξ as the supremum of a certain set.

Sketch of alternative proof of the IVT. As in the original proof it is sufficient to prove that if g : [a, b] → R is continuous and g(a) < 0 < g(b), then there exists ξ ∈ (a, b) such that g(ξ) = 0.

Define E = {x ∈ [a, b] : g(x) < 0}. Then E ≠ ∅ (why?) and E is bounded (why?). So, by the Completeness Axiom, ξ = sup E exists. We prove g(ξ) = 0.

Suppose first that g(ξ) < 0 (so ξ ∈ [a, b)). But then, as g is continuous, there exists h > 0 s.t. (ξ, ξ + h) ⊂ [a, b] and g(t) < 0 for t ∈ (ξ, ξ + h). (Proof?) But then ξ + h/2 ∈ E, so ξ is not the sup, a contradiction.

Suppose now that g(ξ) > 0 (so ξ ∈ (a, b]). But then, again because g is continuous, there exists h > 0 s.t. (ξ − h, ξ) ⊂ [a, b] and g(t) > 0 for t ∈ (ξ − h, ξ). (Proof?) But there exists t ∈ (ξ − h, ξ] such that g(t) < 0, which is also a contradiction.

Hence g(ξ) = 0 as required.

This proof may seem familiar from Analysis I, where a similar argument was used to prove the existence of √2. In fact we can now prove this directly from the IVT.

Example 5.0.6. There exists a unique positive number ξ s.t. ξ² = 2.

Proof: Consider f(x) = x² − 2. Note that f(0) = −2 and f(2) = 2. So f : [0, 2] → R, f(0) < 0 < f(2), and also, as f is a polynomial, it is continuous. Thus, by the IVT, there exists ξ ∈ (0, 2) such that f(ξ) = 0, as required.

More generally the IVT is often used to show that algebraic equations have solutions. In the following, if you draw the graphs of y = eˣ and y = αx, you will see that if α = e the curves touch, if α < e they do not meet, but if α > e then they meet twice. The following example shows how to make this graphical argument rigorous using the IVT. It shows that if α > e there exist two solutions. Once we have covered differentiability you will be able to prove that there are exactly two solutions, by using the fact that f′(x) < 0 if x < log α, but f′(x) > 0 if x > log α.

Example 5.0.7. Let α > e. Show that there exist two distinct points x_i > 0, i = 1, 2, such that e^{x_i} = αx_i.

Proof: Consider f(x) = eˣ − αx. We will prove later that eˣ is continuous for all x. Hence f(x) is continuous on [0, ∞). Since eˣ is defined by its power series, eˣ > x²/2 for x > 0. Thus e^X > αX for any X > 2α. Fix such an X (> log α).

Then f(0) = 1 > 0, f(log α) = α(1 − log α) < 0, and f(X) > 0. So we can apply the IVT to the two intervals [0, log α] and [log α, X] to find that there exist x1 ∈ [0, log α] such that f(x1) = 0, and x2 ∈ [log α, X] such that f(x2) = 0, as required.
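For a concrete check, take the illustrative value α = 4 (> e, not part of the notes). A short Python sketch refines the two sign changes predicted above by bisection:

```python
import math

# Example 5.0.7 with the illustrative choice alpha = 4 (> e).
alpha = 4.0
f = lambda x: math.exp(x) - alpha * x

def refine(a, b, tol=1e-12):
    # Bisection on [a, b], assuming f(a) and f(b) have opposite signs.
    while b - a > tol:
        m = (a + b) / 2
        if (f(a) < 0) == (f(m) < 0):
            a = m
        else:
            b = m
    return (a + b) / 2

# Signs as in the proof: f(0) > 0 > f(log alpha), and f(X) > 0 for X = 2*alpha.
x1 = refine(0.0, math.log(alpha))        # first root, in (0, log alpha)
x2 = refine(math.log(alpha), 2 * alpha)  # second root, in (log alpha, 2*alpha)
```

Both roots satisfy e^{x_i} = 4 x_i to machine accuracy, one on each side of log 4, exactly as the two applications of the IVT predict.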


5.1 Closed bounded intervals map to closed bounded intervals

We can reformulate the theorems of sections 4 and 5 as follows.

Theorem 5.1.1. Let f : [a, b] → R be a real-valued continuous map. Then f([a, b]) = [m, M] for some m, M ∈ R.

That is, a continuous real-valued function maps a closed and bounded interval onto a closed and bounded interval.

Proof. Let m := inf_{x∈[a,b]} f(x) and M := sup_{x∈[a,b]} f(x). These exist by the theorem on boundedness. Clearly f([a, b]) ⊆ [m, M].

By the theorem on the attainment of bounds, there exist ξ ∈ [a, b] and η ∈ [a, b] such that f(ξ) = m and f(η) = M; hence m, M ∈ f([a, b]).

Now let y ∈ [m, M], so f(ξ) ≤ y ≤ f(η). By applying the IVT to f restricted to the interval [ξ, η] (or [η, ξ] as the case may be) we find an x ∈ [ξ, η] ⊆ [a, b] such that f(x) = y; hence y ∈ f([a, b]). Hence [m, M] ⊆ f([a, b]).

6 Monotonic Functions and Inverse Function Theorem

6.1 Monotone Functions

The following definitions require the ordered structure of the real numbers, and so are not available for functions on a subset of the complex plane.

Definition 6.1.1. Let f be a real function on E ⊆ R.

(a) (i) We say that f is increasing if f(x) ≤ f(y) whenever x ≤ y.

(ii) We say that f is strictly increasing if f(x) < f(y) whenever x < y.

(b) (i) We say that f is decreasing if f(x) ≥ f(y) whenever x ≤ y.

(ii) We say that f is strictly decreasing if f(x) > f(y) whenever x < y.

A function is called monotone on E if it is increasing or decreasing on E.

6.2 Continuity of the Inverse Function

Recall that the inverse function was defined in ‘Introduction to Pure Mathematics’ last term.

Definition 6.2.1. Let f : A → B be a function. We say that ‘f is invertible’ if there exists a function g : B → A such that g(f(x)) = x for all x ∈ A and f(g(y)) = y for all y ∈ B. We then call g an inverse of f.


We have seen that continuous functions map intervals to intervals. We want to say something about the inverse function when it exists. Note that any result about increasing functions f can be translated into a result about decreasing functions by the simple expedient of considering the function −f.

We will prove:

Theorem 6.2.2 (Inverse Function Theorem (IFT)). Let f be a strictly increasing and continuous real function on [a, b]. Then f has a well-defined continuous inverse on [f(a), f(b)].

This is contained in the following theorem.

Theorem 6.2.3. Let f : [a, b] → R be strictly increasing and continuous on [a, b]. Then

(i) f([a, b]) = [f(a), f(b)];

(ii) there exists a unique function g : [f(a), f(b)] → R such that g(f(x)) = x for all x ∈ [a, b] and f(g(y)) = y for all y ∈ [f(a), f(b)];

(iii) g is strictly monotone increasing;

(iv) g is continuous.

Proof. The first assertion is just Theorem 5.1.1, as in this case m = f(a) and M = f(b).

The second is straightforward: f : [a, b] → [f(a), f(b)] is now 1–1 and onto. So given y ∈ [f(a), f(b)] there exists a unique x ∈ [a, b] such that f(x) = y. Define g(y) = x. So the inverse function exists and is unique.

The third assertion is also straightforward. Assume there exist u, v ∈ [f(a), f(b)] with u < v but g(u) ≥ g(v). As f is increasing this implies u = f(g(u)) ≥ f(g(v)) = v, a contradiction.

It is the fourth assertion that needs our attention. We must prove that for any y0 ∈ [f(a), f(b)] the function g is continuous at y0.

Take y0 ∈ (f(a), f(b)). Given ε > 0, shrink ε if necessary so that g(y0) + ε ∈ [a, b] and g(y0) − ε ∈ [a, b]. Choose δ = min{f(g(y0) + ε) − y0, y0 − f(g(y0) − ε)}. (Draw the graph of g(y) to see why we choose δ like this.) Then

y0 − δ < y < y0 + δ
⟹ f(g(y0) − ε) < y < f(g(y0) + ε)
⟹ g(f(g(y0) − ε)) < g(y) < g(f(g(y0) + ε))
⟹ g(y0) − ε < g(y) < g(y0) + ε,

and g is continuous at y0 as required. The endpoint cases y0 = f(a) and y0 = f(b) are similar.

Remark 6.2.4. Note that, from Q1 on problem sheet 3, if f : [a, b] → R is a continuous, 1–1 function with f(a) < f(b), then f is strictly increasing on [a, b]. So for the Inverse Function Theorem (IFT) it is sufficient to assume that f : [a, b] → R is continuous and 1–1.


Note 6.2.5. I have not used the notation f⁻¹ for the inverse function. If you do choose to use it then you must make very clear what you intend the domains of f and f⁻¹ to be. It is not for nothing that the special notations ‘arcsin’ etc. exist! For example, sine and cosine are only invertible on a part of their domain where they are increasing or decreasing.

6.3 Exponentials, Logarithms, Powers etc.

In the following I will consider the functions only on real domains. Some of the results extend to complex domains.

Recall from Analysis I that functions such as exp(x), sin(x), cos(x), sinh(x) and cosh(x) are defined by their power series, each of which has infinite radius of convergence. Later we will see that a power series is continuous within its radius of convergence, so each of these functions is continuous on R. For each of them, if we take as domain a closed interval on which the function is strictly monotone, then we can use the IFT to show the function has a continuous inverse. (See also Problem sheet 3, Q3.)

In particular we can therefore define the exponential function exp : R → R as exp(x) = Σ xⁿ/n!. The following properties were proved in Analysis I (though some used results to be proved in this course):

1. exp′(x) = exp(x);

2. exp(x) exp(y) = exp(x + y);

3. exp 0 = 1 and exp(−x) = 1/exp(x);

4. exp(x) > 0;

5. exp is strictly increasing and exp : R → (0, ∞) is a bijection and hence invertible. The inverse is denoted by log : (0, ∞) → R;

6. log(xy) = log(x) + log(y);

7. Let e denote the real number e = exp(1) = Σ 1/n!; then log e = 1;

8. For any a > 0 and any x ∈ R define aˣ = exp(x log a). Then a^{x+y} = aˣaʸ; also eˣ = exp(x).

In addition:

9. As noted above, exp is continuous. But we can also prove this directly.

Lemma 6.3.1. The function exp is continuous.

Proof. We have

|exp(x + h) − exp(x)| = exp(x) |exp(h) − 1|,

so for |h| < 1 we have, by the Triangle Law and the preservation of ≤ under limits,

|exp(x + h) − exp(x)| ≤ exp(x) Σ_{n≥1} |h|ⁿ/n! ≤ exp(x) Σ_{n≥1} |h|ⁿ = (|h|/(1 − |h|)) exp(x),


which tends to 0 as h → 0.
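The estimate in the proof is easy to probe numerically. A minimal Python sketch (the sample points x and h are arbitrary choices, not from the notes):

```python
import math

# Check |exp(x+h) - exp(x)| <= exp(x) * |h| / (1 - |h|)
# from the proof of Lemma 6.3.1, at a few arbitrary points with |h| < 1.
checks = []
for x in (-2.0, 0.0, 1.5):
    for h in (-0.5, -1e-3, 1e-3, 0.5):
        lhs = abs(math.exp(x + h) - math.exp(x))
        rhs = math.exp(x) * abs(h) / (1 - abs(h))
        checks.append(lhs <= rhs)
```

Every check passes, consistent with the inequality derived from the power series above.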

10. We can obtain numerous inequalities. For example, if x > 0,

exp(x) = 1 + x + x²/2! + lim_{n→∞} Σ_{r=3}^{n} xʳ/r! > 1 + x,

and hence also, if x > 0,

exp(−x) < 1/(1 + x).

Note: We can also define exp : C → C by exp(z) = Σ zⁿ/n!. The first three of the above properties also hold in C, and also exp(z) ≠ 0.

We can now apply the Inverse Function Theorem to get:

Lemma 6.3.2. For every y > 0 the function log is continuous at y.

Proof. We apply the theorem by finding an A > 0 such that 1/(1 + A) < y < 1 + A and considering exp : [−A, A] → [exp(−A), exp(A)], as, from (10) above, the image interval then contains y.

6.4 Left-hand and Right-hand limits

For functions defined on an interval, we may talk about right-hand and left-hand limits.

Definition 6.4.1. (i) Let f : [a, b) → R (or C) and p ∈ [a, b); and let l ∈ R (or l ∈ C). We say that l is the right-hand limit of f at p if, ∀ε > 0, ∃δ > 0 such that

|f(x) − l| < ε ∀x ∈ [a, b) such that 0 < x − p < δ.

We write this as

lim_{x→p+} f(x) = l; or as lim_{x→p, x>p} f(x) = l; or sometimes as f(p+) = l.

Similarly we have:

(ii) Let f : (a, b] → R (or C) and p ∈ (a, b]; and let l ∈ R (or l ∈ C). We say that l is the left-hand limit of f at p if, ∀ε > 0, ∃δ > 0 such that

|f(x) − l| < ε ∀x ∈ (a, b] such that −δ < x − p < 0.

We write this as

lim_{x→p−} f(x) = l; or as lim_{x→p, x<p} f(x) = l; or sometimes as f(p−) = l.

The following provides good practice in using the definitions.


Proposition 6.4.2. Let f : (a, b) → C and let p ∈ (a, b). Then the following are equivalent:

(i) limx→p f(x) = l;

(ii) Both limx→p+ f(x) = l and limx→p− f(x) = l.

Example 6.4.3. Consider the function f : R → R given by

f(x) = x if x ≥ 0;  f(x) = x + 1 if x < 0.

Then f(0+) = 0 and f(0−) = 1. But limx→0 f(x) does not exist.
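The two one-sided limits in Example 6.4.3 can be watched numerically. A short Python sketch (the sequences 10^{−k} are an arbitrary choice of points approaching 0):

```python
def f(x):
    # Example 6.4.3: f(x) = x for x >= 0, and f(x) = x + 1 for x < 0.
    return x if x >= 0 else x + 1

right = [f(10.0 ** (-k)) for k in range(1, 8)]     # approaches f(0+) = 0
left = [f(-(10.0 ** (-k))) for k in range(1, 8)]   # approaches f(0-) = 1
```

The values from the right shrink towards 0 while those from the left approach 1, so no single two-sided limit exists at 0.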

6.5 Left-continuity and Right-continuity

We translate the above definitions into ‘continuity’ language.

Definition 6.5.1. (i) We say f is right continuous at p if f(p+) = f(p).7

(ii) We say f is left continuous at p if f(p−) = f(p).

Again, for practice prove the following.

Proposition 6.5.2. Let f : (a, b) → R and let p ∈ (a, b). Then the following are equivalent:

(i) f is continuous at p;

(ii) f is both left-continuous at p and right-continuous at p.

Example 6.5.3. Again consider the function

f(x) = x if x ≥ 0;  f(x) = x + 1 if x < 0.

Then at 0, f is right continuous but not left continuous. It is not continuous at 0.

6.6 Continuity of Monotone Functions

This will probably be omitted from lectures for lack of time.

We now discuss the continuity of monotone functions. Remember that any result about increasing functions f can be translated into a result about decreasing functions by the simple expedient of considering the function −f.

Theorem 6.6.1. Let f : (a, b) → R be an increasing function. Then for every x0 ∈ (a, b) the right-hand limit f(x0+) and the left-hand limit f(x0−) of f at x0 exist.

Moreover, f(x0−) = sup_{a<x<x0} f(x), f(x0+) = inf_{x0<x<b} f(x), and

f(x0−) ≤ f(x0) ≤ f(x0+).

⁷ Note that we are saying that the limit exists and that it equals f(p).


Proof. By hypothesis {f(x) : a < x < x0} is non-empty and is bounded above by f(x0), and therefore has a least upper bound A := sup_{a<x<x0} f(x). Then A ≤ f(x0). We have to show that f(x0−) = A.

Let ε > 0 be given. It follows from the definition of sup_{a<x<x0} f(x) that there is an x_ε ∈ (a, x0) such that

A − ε < f(x_ε) ≤ A.

As x0 − x_ε > 0, choose δ := x0 − x_ε. Then x ∈ (x_ε, x0) if and only if 0 < x0 − x < δ, and thus, as f is increasing,

A − ε < f(x_ε) ≤ f(x) ≤ A for all 0 < x0 − x < δ.

By definition f(x0−) = A and we are done.

The other inequality can be obtained by a similar argument (a good exercise); or by applying what we have done to the function −f(b − x) on (0, b − a) and juggling with the inequalities.

Remark 6.6.2. Informally we call the difference f(x0+) − f(x0−) the “jump” of f at x0.

7 Limits at infinity and infinite limits

7.1 Limits at infinity: functions of a real variable

We want to extend our definition of the limit ‘limx→a f(x)’ to allow us to talk about the endpoints of infinite intervals like (0,∞).

Definition 7.1.1. Let f be a real or complex valued function defined on a subset E of R, and let l ∈ R or l ∈ C as the case may be. Suppose that for every b ∈ R the set E ∩ (b, +∞) is non-empty. We say that f(x) → l as x → +∞ if, ∀ε > 0, ∃B > 0 such that

|f(x) − l| < ε ∀x ∈ E such that x > B.

We write this as lim_{x→+∞} f(x) = l.

Exercise 7.1.2. Make a similar definition for lim_{x→−∞} f(x) = l.

Note 7.1.3. We will often just write ‘f(x) → l as x → ∞’ for ‘f(x) → l as x → +∞’. There is a slight danger of confusion (see what we say below about functions of a complex variable), but if we take care it will be all right.

7.3 Limits at infinity: functions of a complex variable

Definition 7.3.1. Let f be a real or complex valued function defined on a subset E of C, and let l ∈ R or l ∈ C as the case may be. Suppose that for every b ∈ R there are points z ∈ E such that |z| > b. We say that f(z) → l as z → ∞ if, ∀ε > 0, ∃B > 0 such that

|f(z) − l| < ε ∀z ∈ E such that |z| > B.

We write this as lim_{z→∞} f(z) = l.

Note that there may be a mild inconsistency with the previous definition if E ⊆ R. If we are thinking ‘complex’ we’ll need both the real limits at ±∞ to be equal.


Example 7.3.2. Consider sin z / z as z → ∞. For real values z = x we get that

|sin x / x| ≤ 1/|x| → 0 as x → ∞.

But for purely imaginary values like z_k = 2πik, with k ∈ Z, we get that

|sin z_k / z_k| = (e^{2πk} − e^{−2πk})/(4πk) → ∞ as k → ∞.

Exercise 7.3.3. Write down the contrapositive of ‘f tends to a limit as z → ∞’.

7.4 Tending to infinity. . .

Very briefly we discuss ‘infinite limits’. We must take great care not to deceive ourselves: in neither R nor C is there a number ∞.

Definition 7.4.1. Let f : E → R be a real valued function on a subset of R or C and let p be a limit point of E. We say that f(z) tends to +∞ as z → p if ∀B > 0, ∃δ > 0 such that

f(z) > B ∀z ∈ E such that 0 < |z − p| < δ.

We may write this as f(z) → +∞ as z → p.

Exercise 7.4.2. Make a similar definition for f(z) → −∞ as z → p.

For complex valued functions things are easier:

Definition 7.4.3. Let f : E → C be a complex valued function on a subset of R or C and let p be a limit point of E. We say that f(z) tends to ∞ as z → p if ∀B > 0 ∃δ > 0 such that

|f(z)| > B ∀z ∈ E such that 0 < |z − p| < δ.

We may write this as f(z) → ∞ as z → p.

7.5 Euler’s Limit

We prove the following result.

Proposition 7.5.1. The limits lim_{x→∞} (1 + 1/x)ˣ and lim_{x→−∞} (1 + 1/x)ˣ exist and are both equal to e.

Proof. First limit: By the continuity of exp, from Problem Sheet 3, Q4b, it is enough to prove that lim_{x→∞} x log(1 + 1/x) = 1, or by AOL that lim_{x→∞} 1/(x log(1 + 1/x)) = 1. Write y = log(1 + 1/x); then

1/(x log(1 + 1/x)) − 1 = (exp(y) − 1 − y)/y.

Note that as 1 + 1/x > 1 for x > 0, we have y > 0, and then

0 ≤ (exp(y) − 1 − y)/y = (Σ_{n≥2} yⁿ/n!)/y ≤ (Σ_{n≥2} yⁿ)/y = y/(1 − y).


So if we can show that y → 0 as x → ∞ we are done. Let ε > 0. By continuity of log at 1 we can find δ such that |log t| < ε for t ∈ (1, 1 + δ). Take K = 1/δ. Then, as x > K implies 1/x < δ,

|y| = |log(1 + 1/x)| < ε ∀x > K.

A similar argument will deal with the other limit.
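Numerically the convergence in Proposition 7.5.1 is slow (the error at x behaves roughly like e/2x). A minimal Python sketch illustrating both limits (the sample points 10^k and −10⁶ are arbitrary choices):

```python
import math

def euler(x):
    # (1 + 1/x)**x, defined for x > 0 and for x < -1.
    return (1 + 1 / x) ** x

# The error at x = 10**k shrinks roughly by a factor 10 per step.
errors = [abs(euler(10.0 ** k) - math.e) for k in range(1, 6)]
# The limit as x -> -infinity is also e.
err_neg = abs(euler(-1e6) - math.e)
```

The errors decrease monotonically towards 0, consistent with both limits equalling e.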

8 Uniform Convergence

8.1 Motivation

Let E ⊆ R (or C), and let p ∈ E be a limit point, so that p = lim_{x→p} x. We have seen that ‘continuity at p’ is exactly the right condition to ensure that

lim_{x→p} f(x) = f(lim_{x→p} x),

that is, to ensure that ‘taking the limit lim_{x→p}’ and ‘finding the value under f’ can be interchanged.

There are many other situations in which we would like to understand whether the order in which we perform two mathematical operations is significant or not:

(i) Suppose we have not just a single function f on E but a whole sequence {fn}. When is lim_{n→∞} lim_{x→p} fn(x) = lim_{x→p} lim_{n→∞} fn(x)?

(ii) Similarly, when is lim_{x→p} Σ_{n=0}^∞ fn(x) = Σ_{n=0}^∞ lim_{x→p} fn(x)?

(iii) Once we have defined derivatives and integrals (as limits) we will want to know when lim_{n→∞} f′n(x) = (lim_{n→∞} fn(x))′, and when lim_{n→∞} ∫_a^b fn(t) dt = ∫_a^b lim_{n→∞} fn(t) dt.

The answers to some of these questions are given in this lecture and the next.

The answers to some of these questions are given in this lecture and the next.

To see that there are non-trivial problems we look at one typical example.

Example 8.1.1. Consider the sequence of functions {fn}, where fn : [0, 1] → R is given by

fn(x) = 1 − nx if 0 ≤ x < 1/n;  fn(x) = 0 if x ≥ 1/n.

Consider also the function f : [0, 1] → R given by

f(x) = 1 if x = 0;  f(x) = 0 if x > 0.

Sketch their graphs, and note that for all x ∈ [0, 1] we have that f(x) = lim_{n→∞} fn(x).


Note that although all the fn are continuous, the limit function f is not continuous at 0.

Also,

lim_{x→0} lim_{n→∞} fn(x) = lim_{x→0} f(x) = 0,

while

lim_{n→∞} lim_{x→0} fn(x) = lim_{n→∞} 1 = 1,

so that

lim_{x→0} lim_{n→∞} fn(x) ≠ lim_{n→∞} lim_{x→0} fn(x).

8.2 Definition

Just as when we defined ‘uniform continuity’ as a stronger version of ‘continuous at all points’ by insisting on being able to choose one ‘δ’ to deal with all points, so we now strengthen our definition of ‘convergent’.

So let E ⊆ R (or C) and let fn : E → R (or C) be a sequence of functions. Then for each (fixed) x ∈ E, {fn(x)} is a sequence of real (or complex) numbers. If this sequence converges for every x ∈ E, then the limit will depend on x, so we call it f(x). Thus f : E → R (or C) is a function. Hence we have the definition (using Analysis I):

Definition 8.2.1. By fn converges to f on E we mean that ∀x ∈ E and ∀ε > 0, ∃N ∈ N such that

|fn(x) − f(x)| < ε ∀n > N.

So, of course, in general N depends on x. For ‘uniform convergence’ we insist that one N works for all x.

Definition 8.2.2. By fn converges uniformly to f on E we mean that ∀ε > 0, ∃N ∈ N such that

|fn(x) − f(x)| < ε ∀n > N and ∀x ∈ E.

We write this as ‘fn → f uniformly on E’ or ‘fn →ᵘ f’.

It is trivial to see that:

Proposition 8.2.3. If the sequence {fn} converges uniformly to f on E then at every point x ∈ E we have that the sequence {fn(x)} converges to f(x).

There is one special case which we should single out. Suppose that for each n ∈ N we have sn(x) = Σ_{k=0}^n fk(x), and suppose that s : E → R (or C). If we apply the definition to the sequence {sn} and the function s we get:

Remark 8.2.4. We say that the series Σ fn converges uniformly to s on E if ∀ε > 0, ∃N ∈ N such that

|Σ_{k=0}^n fk(x) − s(x)| < ε ∀n > N and ∀x ∈ E.

We may write this as ‘Σ_{n=0}^∞ fn(x) = s(x) (uniformly on E)’.


8.3 Test for Uniform Convergence

We can re-express the definition in a more practical way:

Theorem 8.3.1. Let E be a non-empty subset of R or C. Let fn, f : E → R (or C). Then the following are equivalent:

(i) fn → f uniformly on E;

(ii) ∃N s.t. ∀n > N the real numbers mn := sup_{x∈E} |fn(x) − f(x)| exist, and moreover mn → 0 as n → ∞.

Proof. (⟹) Suppose fn → f uniformly on E; then (by definition) ∀ε > 0, ∃N ∈ N such that

|fn(x) − f(x)| < ε/2 ∀x ∈ E and ∀n > N.

Hence, for each such n, ε/2 is an upper bound of the set {|fn(x) − f(x)| : x ∈ E}. Then the least upper bounds satisfy

mn = sup_{x∈E} |fn(x) − f(x)| ≤ ε/2 < ε ∀n > N.

By the definition of sequence limits, lim_{n→∞} mn = 0.

(⟸) Suppose the mn exist for all n > N1, and that lim_{n→∞} sup_{x∈E} |fn(x) − f(x)| = 0. Then ∀ε > 0 ∃N > N1 such that

sup_{x∈E} |fn(x) − f(x)| < ε ∀n > N.

Therefore

|fn(x) − f(x)| ≤ sup_{x∈E} |fn(x) − f(x)| < ε ∀x ∈ E and ∀n > N.

This is the definition of fn → f uniformly on E.

Example 8.3.2. Let E = [0, 1) and let fn(x) = xⁿ. Clearly lim_{n→∞} fn(x) = 0, so f(x) = 0. Then mn = sup_{x∈E} |xⁿ − 0| = sup_{x∈E} xⁿ. But xn := (1/2)^{1/n} ∈ E and fn(xn) = 1/2, so that

mn ≥ fn(xn) = 1/2 ↛ 0 as n → ∞,

so fn is not uniformly convergent on [0, 1).

However, suppose instead we consider E = [0, r], where 0 < r < 1 is a fixed constant. Then xⁿ → 0 uniformly on E, because now

mn = sup_{[0,r]} xⁿ ≤ rⁿ → 0 as n → ∞.
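Both halves of Example 8.3.2 can be seen numerically. A Python sketch (the choices r = 0.9 and the grid size are arbitrary):

```python
# Example 8.3.2 numerically: f_n(x) = x**n.
ns = (1, 10, 100, 1000)

# On [0, 1): the witness x_n = (1/2)**(1/n) lies in [0, 1) and keeps
# m_n = sup x**n at least 1/2 for every n.
sups_01 = [(0.5 ** (1.0 / n)) ** n for n in ns]

# On [0, r] with r = 0.9: m_n = sup x**n = r**n -> 0, so convergence is uniform.
grid = [i * 0.9 / 100 for i in range(101)]
sups_0r = [max(x ** n for x in grid) for n in (1, 10, 100)]
```

On [0, 1) the supremum is pinned at 1/2 or more for every n, while on [0, 0.9] it decays geometrically, matching the two conclusions above.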

Remark 8.3.3. The test is practical on a closed and bounded interval E = [a, b] in cases where the functions fn and f are differentiable. In such cases the supremum will be achieved either at a, or at b, or at some interior point where d(fn(x) − f(x))/dx = 0. We will prove this later in the course; for the moment use it in exercises.⁸

⁸ Of course we will not use it in building up the theory.


8.4 Cauchy’s Criterion

Just as we found for sequences of numbers, there is a characterisation of uniform convergence which does not depend on knowing the limit function.

Theorem 8.4.1 (Cauchy’s Criterion for Uniform Convergence). Let E ⊆ R (or C) and let fn : E → R (or C). Then fn converges uniformly on E if and only if ∀ε > 0, ∃N ∈ N such that

|fn(x) − fm(x)| < ε ∀n, m > N and ∀x ∈ E. (∗)

Proof. (⟹) Suppose fn converges uniformly on E with limit function f; then ∀ε > 0, ∃N ∈ N such that

|fn(x) − f(x)| < ε/2 ∀n > N and ∀x ∈ E.

So, ∀x ∈ E and ∀n, m > N,

|fn(x) − fm(x)| ≤ |fn(x) − f(x)| + |fm(x) − f(x)| < ε/2 + ε/2 = ε.

(⟸) Conversely, suppose (∗) holds. Then for any x ∈ E, {fn(x)} is a Cauchy sequence, so it is convergent. Let us denote its limit by f(x). For every ε > 0, choose N ∈ N such that

|fn(x) − fm(x)| < ε/2 ∀n, m > N and ∀x ∈ E.

For any fixed n > N and x ∈ E, letting m → ∞ in the above inequality we obtain, by the preservation of weak inequalities, that

|fn(x) − f(x)| = lim_{m→∞} |fn(x) − fm(x)| ≤ ε/2 < ε.

According to the definition, fn → f uniformly on E.

Corollary 8.4.2 (Cauchy’s criterion for uniform convergence of series). The series Σ_{n=0}^∞ fn is uniformly convergent on E if and only if ∀ε > 0, ∃N ∈ N such that

|Σ_{k=m+1}^n fk(x)| < ε ∀n > m > N and ∀x ∈ E.

8.5 The M-test

As a consequence, we prove the following simple but important uniform convergence test forseries.

Theorem 8.5.1 (The Weierstrass M-Test). Let E ⊆ R (or C) and fn : E → R (or C). Suppose that there is a sequence {Mn} of real numbers such that

|fn(x)| ≤ Mn ∀x ∈ E.

If Σ_{n=0}^∞ Mn converges then Σ_{n=0}^∞ fn converges uniformly on E.


Note that the Mn must be independent of x.

Proof. By Cauchy’s Criterion for the convergence of Σ Mn we have that ∀ε > 0, ∃N ∈ N such that

Σ_{k=m+1}^n Mk < ε ∀n > m > N.

Now by the Triangle Law

|Σ_{k=m+1}^n fk(x)| ≤ Σ_{k=m+1}^n |fk(x)| ≤ Σ_{k=m+1}^n Mk < ε ∀n > m > N and ∀x ∈ E,

which is Cauchy’s criterion for the uniform convergence of the series.

Corollary 8.5.2. Suppose the conditions for the M-test hold, and Σ Mn is convergent. Then

|Σ_{n=0}^∞ fn(x)| ≤ Σ_{n=0}^∞ |fn(x)| ≤ Σ_{n=0}^∞ Mn ∀x ∈ E.

Proof. Apply the preservation of weak inequalities as N → ∞ to the obvious inequalities

|Σ_{n=0}^N fn(x)| ≤ Σ_{n=0}^N |fn(x)| ≤ Σ_{n=0}^N Mn ∀x ∈ E.

9 Uniform Convergence: Examples and Applications

9.1 Examples

Example 9.1.1. Let E = [0, 1] and let

fn(x) = x/(1 + n²x²).

Then clearly lim_{n→∞} fn(x) = 0 for every x ∈ E.

Using (1 − nx)² ≥ 0 we can see that

0 ≤ fn(x) = (1/2n) · 2nx/(1 + n²x²) ≤ 1/2n → 0,

and so we get (by looking at ‘sups’) that fn → 0 uniformly on [0, 1].

Example 9.1.2. Let E = [0, 1] and let

fn(x) = nx/(1 + n²x²).

Then clearly lim_{n→∞} fn(x) = 0 for every x ∈ [0, 1].


But fn(1/n) = 1/2, so that

sup_{x∈[0,1]} |fn(x) − f(x)| ≥ 1/2 ↛ 0 as n → ∞,

and so fn converges to 0, but not uniformly on [0, 1].
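Examples 9.1.1 and 9.1.2 differ only by a factor of n, yet their suprema behave very differently; a Python sketch makes this visible (the grid size is an arbitrary choice):

```python
grid = [i / 10000 for i in range(10001)]  # sample points in [0, 1]

def sup_on_grid(g):
    return max(abs(g(x)) for x in grid)

# Example 9.1.1: sup |x / (1 + n^2 x^2)| <= 1/(2n) -> 0, so uniform.
sups_uniform = [sup_on_grid(lambda x, n=n: x / (1 + n * n * x * x))
                for n in (10, 100, 1000)]

# Example 9.1.2: f_n(1/n) = 1/2, so the sup never drops below 1/2.
sups_stuck = [sup_on_grid(lambda x, n=n: n * x / (1 + n * n * x * x))
              for n in (10, 100, 1000)]
```

The first sequence of suprema decays like 1/(2n) while the second stays pinned at 1/2, mirroring the bump at x = 1/n that pointwise convergence cannot see.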

Example 9.1.3. Σ_{n=0}^∞ xⁿ converges to 1/(1 − x) on (−1, 1), but not uniformly.

From Analysis I, sn(x) = Σ_{k=0}^n xᵏ = (1 − x^{n+1})/(1 − x) tends to 1/(1 − x) for any |x| < 1. On the other hand

|sn(x) − 1/(1 − x)| = |x|^{n+1}/|1 − x|,

so that (look at x = (n+1)/(n+2))

sup_{x∈(−1,1)} |sn(x) − 1/(1 − x)| ≥ ((n+1)/(n+2))^{n+1} / (1 − (n+1)/(n+2)) = (n + 2)/(1 + 1/(n+1))^{n+1} → ∞.

Hence Σ_{n=0}^∞ xⁿ doesn’t converge uniformly.

Example 9.1.4. Σ_{n=0}^∞ xⁿ converges uniformly on [−r, r] for any 0 < r < 1.

This follows from the M-test with Mn := rⁿ.
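The M-test bound for Example 9.1.4 is quantitative: the uniform error of the partial sums on [−r, r] is at most the tail Σ_{k>n} rᵏ = r^{n+1}/(1 − r). A Python sketch checking this (r = 0.5 and the grid are arbitrary choices):

```python
def s_n(n, x):
    # Partial sum sum_{k=0}^{n} x**k of the geometric series.
    return sum(x ** k for k in range(n + 1))

r = 0.5
grid = [-r + i * 2 * r / 200 for i in range(201)]  # sample points in [-r, r]

# M-test tail bound with M_k = r**k: the uniform error on [-r, r] is at
# most sum_{k>n} r**k = r**(n+1) / (1 - r).
ns = (5, 10, 20)
errors = [max(abs(s_n(n, x) - 1 / (1 - x)) for x in grid) for n in ns]
bounds = [r ** (n + 1) / (1 - r) for n in ns]
```

The observed errors sit right at the tail bound (the worst point is x = r), and both decay geometrically, which is the uniform convergence the M-test guarantees.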

9.2 Uniform Convergence preserves continuity

We have already seen that the limit of a sequence of continuous functions may not be continuous. This theorem tells us that ‘uniformity’ gives us the extra condition we need.

Theorem 9.2.1. Let fn, f : E → R (or C), and fn → f uniformly on E. Suppose all fn are continuous at x0 ∈ E. Then the limit function f is also continuous at x0, so that

lim_{x→x0} lim_{n→∞} fn(x) = lim_{n→∞} fn(x0) = lim_{n→∞} lim_{x→x0} fn(x).

Proof. ∀ε > 0, ∃N ∈ N s.t.

|fn(x) − f(x)| < ε/3 ∀n > N and ∀x ∈ E.

Since fN+1 is continuous at x0, ∃δ > 0 (depending on x0 and ε) such that

|fN+1(x) − fN+1(x0)| < ε/3 for all |x − x0| < δ.

Hence, if |x − x0| < δ, then by the Triangle Law

|f(x) − f(x0)| ≤ |f(x) − fN+1(x)| + |fN+1(x) − fN+1(x0)| + |fN+1(x0) − f(x0)| < ε/3 + ε/3 + ε/3 = ε.

By definition, f is continuous at x0.


Note it is very important that N + 1 is fixed, so that δ does not depend on n.

Remark 9.2.2 (Version for series). If Σ_{n=0}^∞ fn converges uniformly on E and every fn is continuous at x0 ∈ E, then

lim_{x→x0} Σ_{n=0}^∞ fn(x) = Σ_{n=0}^∞ fn(x0).

In particular, if fn is continuous on E for all n and Σ_{n=0}^∞ fn converges uniformly on E, then Σ_{n=0}^∞ fn is continuous on E.

9.3 Power Series

We can apply the results of the previous subsection to the important case of power series.

Theorem 9.3.1 (Continuity of Power Series). Suppose the radius of convergence of the power series Σ_{n=0}^∞ a_n x^n is R, where 0 ≤ R ≤ ∞. Then for every 0 ≤ r < R, Σ_{n=0}^∞ a_n x^n converges uniformly on the closed disk {x : |x| ≤ r}. Therefore, Σ_{n=0}^∞ a_n x^n is continuous on the open disk {x : |x| < R}.

Proof. According to the definition of ‘radius of convergence’, Σ_{n=0}^∞ a_n x^n is absolutely convergent for |x| < R. In particular, Σ_{n=0}^∞ |a_n| r^n is convergent. Since

    |a_n x^n| ≤ |a_n| r^n   for all x such that |x| ≤ r

we have, by the Weierstrass M-test, that Σ_{n=0}^∞ a_n x^n converges uniformly on {x : |x| ≤ r}. But a_n x^n is continuous for any n ∈ ℕ. So, for any r < R, Σ_{n=0}^∞ a_n x^n is continuous for |x| ≤ r, and hence on the open disk {x : |x| < R}.

Note 9.3.2. Note that this says nothing about convergence or continuity at the end-points. If you are interested, subsection 9.5 deals with this in the real case.

Corollary 9.3.3. The functions exp x, sin x, cos x, cosh x and sinh x can all be defined by power series with infinite radius of convergence, so are all continuous on C.
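As a quick numerical sketch of the M-test argument (mine, not part of the notes): for the exponential series with M_n = r^n/n!, a single tail bound controls the truncation error at every point of [−r, r] simultaneously.

```python
# Sketch (not from the notes): the Weierstrass M-test bound for the
# exponential series on |x| <= r.  With M_n = r^n / n!, the tail of
# sum M_n dominates the truncation error uniformly on [-r, r].

import math

def exp_partial(N, x):
    """Partial sum sum_{n=0}^{N} x^n / n! of the exponential series."""
    return sum(x**n / math.factorial(n) for n in range(N + 1))

def tail_bound(N, r):
    """Bound for sum_{n>N} r^n/n! by geometric comparison (ratio r/(N+2))."""
    first = r**(N + 1) / math.factorial(N + 1)
    ratio = r / (N + 2)
    return first / (1 - ratio)

r, N = 2.0, 15
grid = [-r + 2*r*k/400 for k in range(401)]
worst = max(abs(exp_partial(N, x) - math.exp(x)) for x in grid)

# The same bound works at every point of [-r, r]: uniform convergence.
assert worst <= tail_bound(N, r)
```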

9.4 Integrals and derivatives of sequences

Next term, in the course Analysis III, you will learn how to define integrals, and the proofs of the following theorems will be given.

Theorem 9.4.1. If f_n → f uniformly on [a, b] and if every f_n is continuous, then

    ∫_a^b f = ∫_a^b lim_{n→∞} f_n = lim_{n→∞} ∫_a^b f_n.

Similarly, if the series Σ_{n=1}^∞ f_n converges uniformly on [a, b] and if all f_n are continuous, then we may integrate the series term by term:

    ∫_a^b Σ_{n=1}^∞ f_n = Σ_{n=1}^∞ ∫_a^b f_n.

Note 9.4.2. However, uniform convergence is not the ‘right’ condition for integrating a series term by term: we can exchange the order of integration ∫_a^b (which involves a limiting procedure) and lim_{n→∞} under much weaker conditions. The search for correct conditions for term-by-term integration led to the discovery of Lebesgue integration [Part A option: Integration].

Theorem 9.4.3. Let f_n(x) → f(x) for each x ∈ [a, b]. Suppose f_n′ exists and is continuous on [a, b] for every n, and that f_n′ → g uniformly on [a, b]. Then f′ exists and is continuous on [a, b], and

    d/dx lim_{n→∞} f_n(x) = lim_{n→∞} d/dx f_n(x).

Similarly, if Σ f_n converges on [a, b], and if every f_n′ exists and is continuous on [a, b] and if Σ f_n′ converges uniformly on [a, b], then

    d/dx Σ_{n=1}^∞ f_n = Σ_{n=1}^∞ f_n′.
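Term-by-term integration can be illustrated numerically. The following sketch is mine, not from the notes: on [0, 1/2] the geometric series converges uniformly (Example 9.1.4), so integrating the sum 1/(1 − x) should agree with summing the integrals of x^n; both give log 2.

```python
# Illustration (mine, not the notes'): integrating the geometric series
# term by term on [0, 1/2], where it converges uniformly.  Both sides
# should give  integral_0^{1/2} 1/(1-x) dx = log 2.

import math

def integral_of_limit(a=0.0, b=0.5, steps=100000):
    """Trapezoid approximation of the integral of 1/(1-x) over [a, b]."""
    h = (b - a) / steps
    xs = [a + k*h for k in range(steps + 1)]
    ys = [1.0 / (1.0 - x) for x in xs]
    return h * (sum(ys) - 0.5*ys[0] - 0.5*ys[-1])

def sum_of_integrals(N=60, b=0.5):
    """sum_{n=0}^{N} integral_0^{b} x^n dx = sum b^{n+1}/(n+1)."""
    return sum(b**(n + 1) / (n + 1) for n in range(N + 1))

lhs = integral_of_limit()
rhs = sum_of_integrals()
assert abs(lhs - math.log(2)) < 1e-6
assert abs(rhs - math.log(2)) < 1e-12
```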

9.5 The end points

This section is likely to be omitted for lack of time.

When 0 < R < ∞ the points where |z| = R need to be handled differently. We only deal with the real case, so there are two such points, R and −R. Scaling (replacing x by x/R or −x/R) lets us deal only with power series where the radius is 1 and describe what happens at x = 1.

Theorem 9.5.1 (Abel’s Continuity Theorem). Suppose that the series Σ_{n=0}^∞ a_n x^n has radius of convergence R = 1. Suppose further that Σ_{n=0}^∞ a_n converges.

Then Σ_{n=0}^∞ a_n x^n converges uniformly on [0, 1].

Consequently, Σ_{n=0}^∞ a_n x^n is continuous on (−1, 1], and in particular

    lim_{x→1−} Σ_{n=0}^∞ a_n x^n = Σ_{n=0}^∞ a_n.

Proof. First note that our general result gives continuity on (−1, 1); it is only the point x = 1 we have to deal with. We will get continuity provided we get uniform convergence on [0, 1].

By Cauchy’s Criterion for the convergent Σ_{n=0}^∞ a_n we have that, for every ε > 0, there is N such that, for every n ≥ m > N we have

    |Σ_{k=m}^n a_k| < ε.

Now fix m > N, and for the partial sums from m use the notation

    A_k = Σ_{j=m}^k a_j for k ≥ m;   and A_{m−1} = 0,

noting that subtracting consecutive sums gives us back the original sequence⁹

    a_k = A_k − A_{k−1}.

⁹ Think ‘Differentiation undoes Integration’.

By what we have from the Cauchy Criterion above, |A_k| < ε whenever k ≥ m − 1. We have by elementary algebra the following formula¹⁰

    Σ_{k=m}^n a_k x^k = Σ_{k=m}^n (A_k − A_{k−1}) x^k
                      = Σ_{k=m}^n A_k x^k − Σ_{k=m}^n A_{k−1} x^k
                      = Σ_{k=m}^{n−1} A_k (x^k − x^{k+1}) + A_n x^n.

Hence, by the Triangle Law we have that

    |Σ_{k=m}^n a_k x^k| ≤ Σ_{k=m}^{n−1} |A_k| (x^k − x^{k+1}) + |A_n| x^n
                        < ε Σ_{k=m}^{n−1} (x^k − x^{k+1}) + ε x^n
                        = ε x^m
                        ≤ ε

for any x ∈ [0, 1].

The Cauchy Criterion yields that Σ_{n=0}^∞ a_n x^n is uniformly convergent on [0, 1].

¹⁰ This is called Abel’s summation formula—think ‘integration by parts’.

9.6 Monotone Sequences of Continuous Functions

This section is likely to be omitted for lack of time.

The theorem of this subsection is a partial converse of our theorem that ‘uniform convergence preserves continuity’; if the sequence is monotone then the continuity of the limit will give uniformity of convergence.

Theorem 9.6.1 (The Dini Theorem). Let f_n be a sequence of real continuous functions on [a, b]; and let f be a real continuous function on [a, b].

Suppose that

    lim_{n→∞} f_n(x) = f(x) for every x ∈ [a, b]

and that

    f_n(x) ≥ f_{n+1}(x) for all n and for all x ∈ [a, b].

Then f_n → f uniformly on [a, b].

Proof. Let g_n(x) = f_n(x) − f(x). Then g_n is continuous for every n, g_n ≥ 0 and lim_{n→∞} g_n(x) = 0 for any x ∈ [a, b]. Suppose {g_n} were not uniformly convergent on [a, b]. Write down the contrapositive to see that for some ε > 0, and every natural number k, there exists a natural number n_k > k and a point x_k ∈ [a, b] such that

    |g_{n_k}(x_k)| = g_{n_k}(x_k) ≥ ε.

We may choose n_k so that k ↦ n_k is increasing. We may assume that x_k → p—otherwise use the Bolzano–Weierstrass theorem to extract a convergent subsequence of {x_k} and use it instead. Then p ∈ [a, b]. For any (fixed) k, since {g_n} is decreasing,

    ε ≤ g_{n_l}(x_l) ≤ g_{n_k}(x_l)

for all l ≥ k. Letting l → ∞ in the above inequality, we obtain

    ε ≤ lim_{l→∞} g_{n_k}(x_l) = g_{n_k}(p)

as g_{n_k} is continuous at p. This contradicts the assumption that lim_{k→∞} g_{n_k}(p) = 0.

Example 9.6.2. Let f_n(x) = 1/(1 + nx) for x ∈ (0, 1). Then lim_{n→∞} f_n(x) = 0 for every x ∈ (0, 1), f_n is decreasing in n, but f_n does not converge uniformly. Dini’s theorem doesn’t apply, as (0, 1) is not compact.
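Example 9.6.2 can be checked numerically; this sketch is mine, not from the notes. The pointwise limit at each fixed x is 0, but evaluating near the missing endpoint 0 (e.g. at x = 1/n) keeps f_n(x) at 1/2, so the convergence cannot be uniform.

```python
# Quick check (mine): for f_n(x) = 1/(1 + n x) on (0, 1) the pointwise
# limit is 0, yet evaluating near 0 at x = 1/n keeps f_n large, so the
# sup over (0, 1) does not shrink and convergence is not uniform.

def f(n, x):
    return 1.0 / (1.0 + n * x)

# Pointwise decay at a fixed x...
assert f(10, 0.5) > f(100, 0.5) > f(1000, 0.5)
assert f(10**6, 0.5) < 1e-5

# ...yet the sup over (0, 1) stays bounded away from 0:
for n in (10, 100, 1000):
    assert abs(f(n, 1.0 / n) - 0.5) < 1e-9
```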

10 Differentiation: definitions and elementary results

10.1 Definitions

In this course we only study differentiability for real (or complex)-valued functions on E, where E is a subset of the real line R. The theory of the differentiability of complex-valued functions on the complex plane C is very different from the real case and requires another theory—see Complex Analysis [Part A: Analysis].

Definition 10.1.1. Let f : (a, b) → R (or C), and let x0 ∈ (a, b). By f is differentiable at x0 we mean that the following limit exists:

    lim_{x→x0} (f(x) − f(x0)) / (x − x0).

When it exists we denote the limit by f′(x0), which we call the derivative of f at x0.

[That is, ∀ε > 0 ∃δ > 0 such that

    |(f(x) − f(x0))/(x − x0) − f′(x0)| < ε   ∀x ∈ (a, b) such that 0 < |x − x0| < δ.]

For example, it is easy to see that the function x ↦ x is differentiable at every point of R and has derivative f′(x0) = 1 at every point; and the function t ↦ e^{2πit} is differentiable at every point, although we can’t yet prove that.
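The definition can be watched in action numerically. This sketch is mine, not part of the notes: for f(x) = x², where f′(3) = 6, the difference quotient at x = x0 + h is exactly 2·x0 + h, so its distance from the derivative is |h| and shrinks as h → 0.

```python
# Sketch (not from the notes): the difference quotient
# (f(x) - f(x0)) / (x - x0) approaching f'(x0) as x -> x0,
# here for f(x) = x^2 at x0 = 3, where f'(x0) = 6.

def diff_quotient(f, x0, h):
    """(f(x0 + h) - f(x0)) / h, i.e. the quotient at x = x0 + h."""
    return (f(x0 + h) - f(x0)) / h

f = lambda x: x * x
x0, target = 3.0, 6.0

errors = [abs(diff_quotient(f, x0, 10.0**(-k)) - target) for k in range(1, 6)]

# For f(x) = x^2 the quotient is exactly 2*x0 + h, so the error is |h|.
assert all(e1 > e2 for e1, e2 in zip(errors, errors[1:]))
assert errors[-1] < 1e-4
```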

Sometimes it is helpful to also define ‘left-hand’ and ‘right-hand’ versions of these.

Definition 10.1.2. (i) Let f : [a, b) → R (or C), and let x0 ∈ [a, b). We say that f has a right-derivative at x0 if the following limit exists:

    lim_{x→x0+} (f(x) − f(x0)) / (x − x0).

If the limit exists we denote it by f′+(x0).

(ii) Let f : (a, b] → R (or C), and let x0 ∈ (a, b]. We say that f has a left-derivative at x0 if the following limit exists:

    lim_{x→x0−} (f(x) − f(x0)) / (x − x0).

If the limit exists we denote it by f′−(x0).

The following result is easily proved (compare what we did for left- and right-continuity).

Proposition 10.1.3. Let f : (a, b) → R (or C). Then the following are equivalent:

(a) f is differentiable at x0 and f′(x0) = l;

(b) f has both left- and right-derivatives at x0, and f′−(x0) = l = f′+(x0).

Definition 10.1.4. (i) Suppose that f : (a, b) → R (or C). Then we say that f is differentiable on (a, b) if f is differentiable at every point of (a, b).

(ii) Suppose that f : [a, b] → R (or C). Then we say that f is differentiable on [a, b] if f is differentiable at every point of (a, b), and if f′+(a) and f′−(b) exist.

If you wish you can define differentiable on (a, b] and [a, b) as well.

Remark 10.1.5. Let y = f(x). There are other notations for derivatives:

    dy/dx or df(x0)/dx   [G. W. Leibnitz]
    y′ or f′(x0)   [J. L. Lagrange]
    Dy or Df(x0)   [A. L. Cauchy, in particular for vector-valued functions of several variables].

10.2 An Example

Define a function f : R → R by

    f(x) = { x² sin(1/x)   for x > 0,
           { 0             for x ≤ 0.

Then we can show that

    f′(x) = { 0                         when x < 0,
            { 0                         when x = 0,
            { 2x sin(1/x) − cos(1/x)    when x > 0.

The derivative for x ≤ 0 can be found directly from the definition. Later we will see that we can use the chain rule to find the derivative for x > 0.

Note that the derivative is not continuous at the origin. (See problem sheet 5.)

We can get other interesting examples by replacing the ‘x²’ by x^α and the ‘1/x’ by 1/x^β.
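The discontinuity of f′ at the origin can be seen numerically; this sketch is mine, not part of the notes. Along x_k = 1/(kπ) we have sin(1/x_k) = 0 and cos(1/x_k) = (−1)^k, so f′(x_k) = −(−1)^k oscillates between values near +1 and −1 as x → 0+, while f′(0) = 0.

```python
# Illustration (mine): along x_k = 1/(k*pi) the derivative
# f'(x) = 2x sin(1/x) - cos(1/x) equals -cos(k*pi) = (-1)^{k+1},
# so f' oscillates between +-1 near 0 even though f'(0) = 0.

import math

def fprime(x):
    """Derivative of x^2 sin(1/x) for x > 0."""
    return 2*x*math.sin(1/x) - math.cos(1/x)

samples = [fprime(1/(k*math.pi)) for k in range(100, 110)]

# Values alternate in sign and stay near magnitude 1 near the origin.
assert all(abs(abs(v) - 1.0) < 0.01 for v in samples)
assert all(s1 * s2 < 0 for s1, s2 in zip(samples, samples[1:]))
```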

10.3 Derivatives and differentials

By looking at the definition of ‘limit’ in terms of ε and δ (see problem sheet) we can easily prove that:

Proposition 10.3.1. Suppose that f : (a, b) → R is differentiable at x0 ∈ (a, b) and that f′(x0) > 0. Then there exists a δ > 0 such that for all x ∈ (x0, x0 + δ) we have that f(x) > f(x0), and for all x ∈ (x0 − δ, x0) we have that f(x) < f(x0).

We have corollaries like:

Corollary 10.3.2. Suppose that f : [a, b) → R is right-differentiable at x0 ∈ [a, b) and that f′+(x0) > 0. Then there exists a δ > 0 such that for all x ∈ (x0, x0 + δ) we have that f(x) > f(x0).

In fact, if f is differentiable at x0, then the ‘increment’ of f near x0 can be expressed

    f(x) − f(x0) = f′(x0)(x − x0) + o(x, x0)

where o is a function of x and x0 satisfying

    lim_{x→x0} o(x, x0)/(x − x0) = 0.

That is, the ‘linear part’ of the increment f(x) − f(x0) is f′(x0)(x − x0); all the rest is small in comparison. This is sometimes called the differential of f at x0. It is the first approximation to f near x0.

10.4 Differentiability and Continuity

Theorem 10.4.1 (Differentiability =⇒ Continuity). Let f : (a, b) → R (or C). If f is differentiable at x0 ∈ (a, b) then f is continuous at x0.

Proof. Since

    lim_{x→x0} (f(x) − f(x0)) = lim_{x→x0} [(f(x) − f(x0))/(x − x0)] (x − x0)
                              = lim_{x→x0} (f(x) − f(x0))/(x − x0) · lim_{x→x0} (x − x0)   by AOL
                              = f′(x0) × 0
                              = 0.

Therefore lim_{x→x0} f(x) = f(x0), so that, by definition, f is continuous at x0.

Note: The converse is not true. For example |x| is continuous but is not differentiable at 0. In fact there exist functions which are continuous everywhere, but not differentiable at any point! (See Bartle and Sherbert.)

10.5 Algebraic properties

The following results are straightforward consequences of the Algebra of Limits. They let us build up at once all the calculus we learned at school—provided we can differentiate a few standard functions (constants, linear functions, exp, sin and cos).

Theorem 10.5.1. Suppose f, g : (a, b) → R (or C) are both differentiable at x0 ∈ (a, b), and λ, µ ∈ R (or C).

(i) [Linearity of differentiation] λ·f + µ·g is differentiable at x0 and

    (λ·f + µ·g)′(x0) = λ·f′(x0) + µ·g′(x0).

(ii) [The Product Rule] fg : x ↦ f(x)g(x) is differentiable at x0 and

    (fg)′(x0) = f(x0)g′(x0) + f′(x0)g(x0).

(iii) [The Quotient Rule] Suppose g(x0) ≠ 0. Then x ↦ f(x)/g(x) is differentiable at x0 and

    (f/g)′(x0) = (f′(x0)g(x0) − f(x0)g′(x0)) / g²(x0).

Proof. (ii) Apply AOL to

    (f(x)g(x) − f(x0)g(x0))/(x − x0) = f(x)·(g(x) − g(x0))/(x − x0) + g(x0)·(f(x) − f(x0))/(x − x0).

Let x → x0 and use the definitions of f′(x0), g′(x0), and the continuity of f, so f(x) → f(x0).

(iii) See problem sheet 5.

10.6 The Chain Rule

Theorem 10.6.1 (The Chain Rule). Suppose f : (a, b) → R, and that g : (c, d) → R. Suppose that f((a, b)) ⊆ (c, d), so that g ◦ f : (a, b) → R is defined.

Suppose further that f is differentiable at x0 ∈ (a, b), and that g is differentiable at f(x0). Then g ◦ f is differentiable at x0 and

    (g ◦ f)′(x0) = g′(f(x0)) f′(x0).

Proof. Write y0 = f(x0), and define a function v on (c, d) by

    v(y) = { (g(y) − g(y0))/(y − y0) − g′(y0)   for all y ≠ y0,
           { 0                                  for y = y0.

Note that v(y) → 0 as y → y0, so that v is continuous at y0.

Rewriting the definition of v we see that we have an expression for the increment

    g(y) − g(y0) = (y − y0)(g′(y0) + v(y))

valid for any y ∈ (c, d). In particular

    g(f(x)) − g(f(x0)) = (f(x) − f(x0))(g′(y0) + v(f(x)))

so that

    (g(f(x)) − g(f(x0)))/(x − x0) = g′(y0)·(f(x) − f(x0))/(x − x0) + v(f(x))·(f(x) − f(x0))/(x − x0).

Since f is differentiable at x0, f is continuous at x0. But v is continuous at y0 = f(x0) and hence v(f(x)) is continuous at x0. Thus v(f(x)) → 0 as x → x0. Letting x → x0 we obtain, using AOL,

    lim_{x→x0} (g(f(x)) − g(f(x0)))/(x − x0) = g′(y0) lim_{x→x0} (f(x) − f(x0))/(x − x0)
                                               + lim_{x→x0} v(f(x)) · lim_{x→x0} (f(x) − f(x0))/(x − x0)
                                             = g′(y0)f′(x0) + 0 × f′(x0)
                                             = f′(x0)g′(y0).
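A quick numerical sanity check of the chain rule (mine, not part of the notes), with f(x) = x² and g(y) = sin y, comparing a central-difference approximation of (g ◦ f)′ against g′(f(x0))·f′(x0).

```python
# Numerical sanity check (mine, not the notes'): the chain rule
# (g o f)'(x0) = g'(f(x0)) * f'(x0) for f(x) = x^2, g(y) = sin y.

import math

def num_deriv(fn, x0, h=1e-6):
    """Central difference approximation of fn'(x0)."""
    return (fn(x0 + h) - fn(x0 - h)) / (2 * h)

f = lambda x: x * x
g = math.sin
x0 = 0.7

lhs = num_deriv(lambda x: g(f(x)), x0)   # (g o f)'(x0), numerically
rhs = math.cos(f(x0)) * 2 * x0           # g'(f(x0)) * f'(x0), exactly

assert abs(lhs - rhs) < 1e-8
```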

10.7 Higher Derivatives

Suppose that f : (a, b) → R (or C) is differentiable at every point of some (x0 − δ, x0 + δ). Then it makes sense to ask if f′ is differentiable at x0. If it is differentiable then we denote its derivative by f′′(x0).

More generally we can define in a recursive way (n + 1)-th derivatives f^(n+1).

Definition 10.7.1. Suppose that f : (a, b) → R (or C) is such that f, f′, . . . , f^(n) exist at every point of (a, b). Suppose that x0 ∈ (a, b). By f is (n + 1)-times differentiable at x0 we mean that f^(n) is differentiable at x0. We write f^(n+1)(x0) := (f^(n))′(x0).

The following is proved by an easy induction using Linearity and the Product Rule.

Theorem 10.7.2 (The Leibnitz Formula). Let f, g : (a, b) → R (or C) be n-times differentiable on (a, b). Then x ↦ f(x)g(x) is n-times differentiable and

    (fg)^(n)(x) = Σ_{j=0}^n (n choose j) f^(j)(x) g^(n−j)(x).
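For polynomials the Leibnitz Formula can be verified exactly by working with coefficient lists. This verification sketch is mine, not from the notes; `deriv`, `mul` and `leibnitz` are illustrative helpers.

```python
# Verification sketch (mine): the Leibnitz formula for polynomials,
# represented as coefficient lists [c0, c1, ...] for c0 + c1 x + ... .

from math import comb

def deriv(p, k=1):
    """k-th derivative of a polynomial given by its coefficient list."""
    for _ in range(k):
        p = [i * c for i, c in enumerate(p)][1:] or [0]
    return p

def mul(p, q):
    """Product of two polynomials as coefficient lists."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def leibnitz(p, q, n):
    """sum_{j=0}^{n} C(n, j) p^(j) q^(n-j), computed coefficient-wise."""
    terms = [mul(deriv(p, j), deriv(q, n - j)) for j in range(n + 1)]
    width = max(len(t) for t in terms)
    return [sum(comb(n, j) * t[i] for j, t in enumerate(terms) if i < len(t))
            for i in range(width)]

p = [1, 2, 0, 3]      # 1 + 2x + 3x^3
q = [0, 1, 5]         # x + 5x^2
n = 2

direct = deriv(mul(p, q), n)          # differentiate the product directly
via_leibnitz = leibnitz(p, q, n)      # use the Leibnitz formula
assert direct == via_leibnitz[:len(direct)]
assert all(c == 0 for c in via_leibnitz[len(direct):])
```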

11 The elementary functions

11.1 Differentiating power series

The elementary functions—exp x, cos x, sin x, log x, arctan x—are defined as power series, or are got as inverse functions of real functions defined by power series.

We start with a lemma:

Lemma 11.1.1. The power series Σ_{n=0}^∞ a_n x^n and Σ_{n=0}^∞ (n+1)a_{n+1} x^n have the same radius of convergence.

Proof. Let the radii be R and R′; we will show R ≥ R′ and R′ ≥ R.

First suppose that |x1| < R′; then Σ_{n=0}^∞ (n + 1)a_{n+1} x^n is absolutely convergent at x = x1. That is, Σ_{n=0}^∞ (n + 1)|a_{n+1}||x1|^n converges. Now note that |a_n x1^n| ≤ n|a_n||x1|^n for n ≥ 1. Hence by the comparison test Σ_{n=0}^∞ |a_n||x1|^n converges. Therefore, by definition of ‘radius of convergence’, we have that R ≥ R′.

Now suppose that |x1| < R; and choose x2 so that |x1| < |x2| < R. Then Σ_{n=0}^∞ |a_n||x2|^n converges, and so (Analysis I) |a_n||x2|^n → 0 as n → ∞. But a convergent sequence is bounded (Analysis I), so there exists M such that |a_n||x2|^n < M for all n. Now

    |(n + 1)a_{n+1} x1^n| ≤ (n + 1) (M/|x2|) |x1/x2|^n

and as, by the Ratio Test, Σ (n + 1)|x1/x2|^n is convergent, we have by the Comparison Test that Σ_{n=0}^∞ (n + 1)|a_{n+1}||x1|^n is convergent. By the definition of ‘radius of convergence’ we have that R′ ≥ R.

Theorem 11.1.2 (Term-by-term differentiation). The power series f(x) := Σ_{n=0}^∞ a_n x^n and g(x) := Σ_{n=0}^∞ (n + 1)a_{n+1} x^n have the same radius of convergence R, and for any x such that |x| < R we have that f is differentiable at x and moreover that f′(x) = g(x).

Proof. (not examinable)

The first part is done by the lemma.

Suppose |x| < R; choose some r such that |x| < r < R. (For example, r := (|x| + R)/2 if R < ∞, or r = |x| + 1 if R = ∞.)

For any point w such that |w| < r, consider

    (f(w) − f(x))/(w − x) − g(x) = Σ_{n=1}^∞ a_n ((w^n − x^n)/(w − x) − n x^{n−1})
                                 = Σ_{n=2}^∞ a_n ((w^n − x^n)/(w − x) − n x^{n−1});

where we have added the series f(w), f(x) and g(x) term by term, which is justified by AOL. Our aim is to show that

    (f(w) − f(x))/(w − x) − g(x) → 0 as w → x.

The binomial identity

    (w^n − x^n)/(w − x) = x^{n−1} + x^{n−2}w + · · · + x w^{n−2} + w^{n−1}

is easily proved by induction; then we have that for any w ≠ x and n ≥ 2

    (w^n − x^n)/(w − x) − n x^{n−1} = x^{n−1} + x^{n−2}w + · · · + x w^{n−2} + w^{n−1}
                                        − x^{n−1} − x^{n−1} − · · · − x^{n−1} − x^{n−1}
                                    = Σ_{k=1}^{n−1} (x^{n−1−k} w^k − x^{n−1})
                                    = Σ_{k=1}^{n−1} x^{n−1−k} (w^k − x^k).

Let

    h_n(w) = a_n Σ_{k=1}^{n−1} x^{n−1−k} (w^k − x^k)   for n = 2, 3, · · · .

Then

    (f(w) − f(x))/(w − x) − g(x) = Σ_{n=2}^∞ h_n(w).

All h_n are continuous in R as they are polynomials in w; and h_n(x) = 0 for all n ≥ 2. We claim that Σ_{n=2}^∞ h_n(w) converges uniformly in |w| ≤ r. In fact

    |h_n(w)| ≤ |a_n| Σ_{k=1}^{n−1} |x|^{n−1−k} (|w|^k + |x|^k) ≤ 2n|a_n| r^{n−1}.

Now Σ n|a_n| r^{n−1} is convergent, so that Σ_{n=2}^∞ h_n(w) converges uniformly in the closed disk {w : |w| ≤ r} by the Weierstrass M-test. Hence Σ_{n=2}^∞ h_n(w) is continuous in the disk |w| ≤ r, as the uniform limit of continuous functions is continuous. Therefore

    lim_{w→x} Σ_{n=2}^∞ h_n(w) = Σ_{n=2}^∞ h_n(x) = 0

so that

    lim_{w→x} (f(w) − f(x))/(w − x) = lim_{w→x} ((f(w) − f(x))/(w − x) − g(x)) + g(x)
                                    = lim_{w→x} Σ_{n=2}^∞ h_n(w) + g(x)
                                    = g(x).
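The theorem can be illustrated numerically on the geometric series (my sketch, not from the notes): differentiating Σ x^n = 1/(1 − x) term by term gives Σ (n+1)x^n, which should agree with 1/(1 − x)².

```python
# Numerical sketch (mine): term-by-term differentiation of the geometric
# series.  For |x| < 1, f(x) = sum x^n = 1/(1-x), and the differentiated
# series g(x) = sum (n+1) x^n should equal f'(x) = 1/(1-x)^2.

def g(x, N=400):
    """Partial sum of the term-by-term derivative sum_{n=0}^{N} (n+1) x^n."""
    return sum((n + 1) * x**n for n in range(N + 1))

for x in (-0.5, 0.0, 0.3, 0.7):
    exact = 1.0 / (1.0 - x)**2
    assert abs(g(x) - exact) < 1e-9
```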

11.2 The Exponential Function, Trigonometric Functions, Hyperbolic Functions

The following result follows immediately.

Proposition 11.2.1. The functions exp x, sin x, cos x, cosh x and sinh x can all be defined by power series with infinite radius of convergence, so are all differentiable on R. Further:

(i) exp′ x = exp x.

(ii) cos′ x = − sin x and sin′ x = cos x.

(iii) cosh′ x = sinh x and sinh′ x = cosh x.

Note 11.2.2. The other trigonometric and hyperbolic functions are defined in terms of cos and sin or cosh and sinh. For example tan x := sin x / cos x is defined for those x such that cos x ≠ 0. Then by the quotient rule it is differentiable wherever it is defined, and

    tan′ x = (cos² x + sin² x)/cos² x.

We will soon give an easy proof that cos² x + sin² x = 1.¹¹

11.3 The Inverse Function

Theorem 11.3.1. Let f : [a, b] → [m, M] be a strictly increasing continuous function from [a, b] on to [m, M], with inverse function g : [m, M] → [a, b]. Suppose that f is differentiable at x0 ∈ (a, b) and that f′(x0) ≠ 0. Then g is differentiable at f(x0), and

    g′(f(x0)) = 1/f′(x0).

Proof. We have already proved that g is continuous. Write y0 = f(x0). Then for y ≠ y0

    (g(y) − g(y0))/(y − y0) = (x − x0)/(f(x) − f(x0)) = 1 / [(f(x) − f(x0))/(x − x0)]

where x = g(y), and so y = f(x).

Since g is continuous, x = g(y) → g(y0) = x0 as y → y0. Hence

    (f(x) − f(x0))/(x − x0) → f′(x0) as y → y0.

As f′(x0) ≠ 0 we use AOL to see that

    lim_{y→y0} (g(y) − g(y0))/(y − y0) = 1/f′(x0)

exists. That is, g is differentiable at y0, and

    g′(y0) = 1/f′(x0) = 1/f′(f⁻¹(y0)).

¹¹ The Pythagoras Theorem.

11.4 Logarithms

We continue to deal only with the real case where, in section 6, we defined log : (0, ∞) → R as the inverse function of the real exponential function.

To see that log is differentiable at any y > 0, proceed as we did when we discussed continuity, by finding an A such that exp(−A) < y < exp(A) and then using the Inverse Function Theorem on the differentiable function exp : [−A, A] → [exp(−A), exp(A)]. We will find that

    log′ y = 1/exp′(log y) = 1/exp(log y) = 1/y

which may not be surprising—nevertheless it is good to have a proof.

11.5 Powers

For any x > 0 and any α ∈ R in section 6 we defined x^α = exp(α log x). From the Chain Rule and the properties of exponentials and logarithms we therefore have that

    d/dx x^α = α x^{α−1}.
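A numerical check of the power rule through this definition (my sketch, not from the notes): differentiate x^α = exp(α log x) by a central difference and compare with α x^{α−1}.

```python
# Numerical check (mine): differentiating x^alpha = exp(alpha * log x)
# via a central difference and comparing with alpha * x^(alpha - 1).

import math

def num_deriv(fn, x0, h=1e-6):
    """Central difference approximation of fn'(x0)."""
    return (fn(x0 + h) - fn(x0 - h)) / (2 * h)

alpha = 1.7
power = lambda x: math.exp(alpha * math.log(x))   # x^alpha, as defined in the notes

for x0 in (0.5, 1.0, 2.0, 5.0):
    assert abs(num_deriv(power, x0) - alpha * x0**(alpha - 1)) < 1e-6
```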

12 The Mean Value Theorem

12.1 Local maxima and minima

Definition 12.1.1. Let E ⊆ R and f : E → R.

(i) x0 ∈ E is a local maximum if for some δ > 0, f(x) ≤ f(x0) whenever x ∈ (x0 − δ, x0 + δ) ∩ E.

(ii) x0 ∈ E is a local minimum if for some δ > 0, f(x) ≥ f(x0) whenever x ∈ (x0 − δ, x0 + δ) ∩ E.

A local maximum or minimum is called a local extremum. If the inequality is strict we will say that the extremum is strict.

Here is the crucial property (which, of course, you knew long before you started the course).

Proposition 12.1.2 (Fermat). Let f : (a, b) → R. Suppose that x0 ∈ (a, b) is a local extremum and f is differentiable at x0. Then f′(x0) = 0.

Proof. If x0 is a local maximum, then there exists δ > 0 such that (f(x) − f(x0))/(x − x0) ≤ 0 whenever 0 < x − x0 < δ and x ∈ (a, b), so that

    f′+(x0) = lim_{x→x0+} (f(x) − f(x0))/(x − x0) ≤ 0.

On the other hand, (f(x) − f(x0))/(x − x0) ≥ 0 whenever −δ < x − x0 < 0 and x ∈ (a, b), so that

    f′−(x0) = lim_{x→x0−} (f(x) − f(x0))/(x − x0) ≥ 0.

Since f is differentiable at x0, f′(x0) = f′−(x0) = f′+(x0) and hence f′(x0) = 0.

Similarly if x0 is a local minimum.

Remark 12.1.3. It is essential that the interval (a, b) is open. Why?

12.2 Rolle’s Theorem

Theorem 12.2.1 (Rolle, 1691). Let f : [a, b] → R be continuous, and suppose that f is differentiable on (a, b). Suppose further that f(a) = f(b). Then there exists a point ξ ∈ (a, b) such that f′(ξ) = 0.

Proof. If f is constant in [a, b], then f′(x) = 0 for every x ∈ (a, b), so that any point—say ξ = ½(a + b)—will do.

As f is continuous on [a, b] it attains its maximum and minimum on [a, b] (by Theorems 4.1.2 and 4.1.6). Suppose ξ1 is the minimum and ξ2 is the maximum. As f(a) = f(b), either ξ1 or ξ2 lies in the open interval (a, b), or else f is constant and we are done. Suppose then that ξ ∈ (a, b) gives either the maximum or minimum. It is then a local extremum, and by Fermat’s result f′(ξ) = 0.

We can express this informally by saying ‘between any two roots of f there is a root of f′’.

Note 12.2.2. (i) Remember that f is differentiable implies that f is continuous. Thus the hypotheses of Rolle would be satisfied if f was differentiable on [a, b] and f(a) = f(b). However, often it is important that Rolle holds under the given weaker conditions.

(ii) When using these theorems remember to check ALL conditions, including the continuity and differentiability conditions. For example f : [−1, 1] → R given by f(x) = |x| satisfies all conditions of Rolle except that f is not differentiable at x = 0. But there is no ξ such that f′(ξ) = 0.

12.3 The Mean Value Theorem

This is one of the most important results in this course. It is a rotated version of Rolle.¹²

Theorem 12.3.1 (MVT). Let f : [a, b] → R be continuous, and suppose that f is differentiable on (a, b). Then there exists a point ξ ∈ (a, b) such that

    f(b) − f(a) = f′(ξ)(b − a).

¹² If in an examination you are asked to prove the Mean Value Theorem, then you need to provide also proofs of Fermat’s result and Rolle’s Theorem.

Proof. Apply Rolle’s theorem to the function

    F(x) = f(x) − k(x − a),

where k is a constant to be determined. F : [a, b] → R is continuous, and is differentiable on (a, b). We choose k so that F(a) = F(b), that is k = (f(b) − f(a))/(b − a). Thus Rolle’s theorem applies, so for some number ξ between a and b, F′(ξ) = 0. But F′(x) = f′(x) − k, so f′(ξ) = k = (f(b) − f(a))/(b − a), as required.
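The ξ guaranteed by the MVT can be located numerically in concrete cases. The following sketch is mine, not from the notes: for f(x) = x³ on [0, 2] the mean slope is 4 and f′(x) = 3x², so ξ = √(4/3); bisection on f′(x) − k recovers it (`find_xi` is an illustrative helper).

```python
# Sketch (mine): locating a point xi guaranteed by the MVT for
# f(x) = x^3 on [0, 2].  Here f'(x) = 3x^2 and the mean slope is
# (f(2) - f(0)) / 2 = 4, so xi = sqrt(4/3); bisection recovers it.

def find_xi(fprime, k, lo, hi, tol=1e-12):
    """Bisect for a root of fprime(x) - k, assuming a sign change on [lo, hi]."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if (fprime(lo) - k) * (fprime(mid) - k) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

fprime = lambda x: 3 * x * x
k = (2**3 - 0**3) / (2 - 0)           # mean slope of x^3 over [0, 2]
xi = find_xi(fprime, k, 0.0, 2.0)

assert abs(fprime(xi) - k) < 1e-9
assert abs(xi - (4/3)**0.5) < 1e-9
```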

Note 12.3.2. Suppose we have the hypotheses of the MVT. Then for any a ≤ a1 < b1 ≤ b we can apply the MVT to f restricted to [a1, b1] and get

    f(b1) − f(a1) = f′(ξ1)(b1 − a1) for some ξ1 ∈ (a1, b1).

Note that (for a given function f) the value of ξ1 may depend on a1 and b1.

Corollary 12.3.3 (The Taylor Theorem, mark 1). Suppose that we have the hypotheses of the MVT and that x, x + h ∈ [a, b]. Then

    f(x + h) − f(x) = f′(x + θh)h for some θ ∈ (0, 1).

Proof. Suppose h < 0; then a ≤ x + h < x ≤ b. From the MVT applied to f on the interval [x + h, x] there exists ξ ∈ (x + h, x) such that

    f(x) − f(x + h) = f′(ξ)(−h).

Write ξ = x + θh, and note that x + h < x + θh < x implies—as h < 0—that 0 < θ < 1.

The cases h = 0 and h > 0 are left as exercises.

12.4 The Identity Theorem

Here is one of the most useful consequences of the MVT.

Corollary 12.4.1 (Identity Theorem). Let f : (a, b) → R be differentiable, and satisfy f′(t) = 0 for all t ∈ (a, b). Then f is constant on (a, b).

Proof. Apply the MVT to f on [x, y] where x, y are any two points in (a, b). (Note that f is differentiable on (a, b) implies that f is continuous on (a, b) and hence f is continuous on [x, y].) Then f(x) − f(y) = f′(ξ)(x − y) for some ξ ∈ (x, y). But f′(ξ) = 0, so that f(x) = f(y). Therefore f is constant in (a, b).

Example 12.4.2. Suppose that φ is a function whose derivative is x². Then we have, for all x, that φ(x) = x³/3 + A for some constant A.

Proof. Let f(x) := φ(x) − x³/3; then f is differentiable and we can calculate that f′(x) = x² − (1/3)·3x² = 0. By the Identity Theorem f(x) = A for some constant A. You can justify other ‘integrations’ similarly. Just guess the ‘integral’ and proceed as above.

Note 12.4.3. Often when applying mathematics we have to solve a differential equation. Last term you learned methods for guessing solutions. The Identity Theorem gives us a tool to prove the uniqueness of solutions of DEs, and lets us justify these clever tricks (see Section 13.3). Those who do PDEs have already seen this idea this term, where you showed that E′(t) = 0 and then deduced that E(t) is a constant (which then turned out to be zero).

12.5 Derivatives and monotonicity

Corollary 12.5.1. Let f : (a, b) → R be differentiable.

(i) If f′(x) ≥ 0 for all x ∈ (a, b) then f is increasing on (a, b).

Proof: Apply the MVT to any [x, y] ⊂ (a, b) to get f(y) − f(x) = f′(ξ)(y − x), a product of non-negative numbers. Hence f(y) ≥ f(x) and we are done.

(ii) If f′(x) ≤ 0 for all x ∈ (a, b) then f is decreasing on (a, b).

(iii) If f′(x) > 0 for all x ∈ (a, b) then f is strictly increasing on (a, b).

(iv) If f′(x) < 0 for all x ∈ (a, b) then f is strictly decreasing on (a, b).

12.6 The Cauchy Mean Value Theorem

Sometimes we are concerned with more than one function, and would like to use the MVT or a MVT-type argument. The following is what we need: except in the most trivial cases it never helps to apply the MVT to the functions separately—we generate too many distinct ξ’s.

Corollary 12.6.1 (Cauchy’s Mean Value Theorem).¹³ Let f, g : [a, b] → R be continuous on [a, b] and differentiable on (a, b). Suppose that g′(x) ≠ 0 for all x ∈ (a, b). Then for some ξ ∈ (a, b) we have that

    f′(ξ)/g′(ξ) = (f(b) − f(a))/(g(b) − g(a)).

Proof. To be supplied later when we need the result.

13 Applications of the MVT

13.1 Exponential and Logarithm

Proposition 13.1.1. exp(x + y) = exp(x) exp(y) for all x, y ∈ R.

Proof. We have to use the Identity Theorem—but on what function? Fixing y and looking at f(x) = exp(x + y) − exp(x) exp(y) leads to f′ = f and f(0) = 0, which we could now solve to get f(x) = 0 (see section 13.3).

¹³ This is where the result belongs logically, but in the lectures it will not appear until later, when we do L’Hospital’s Rule.

However a much better (more direct) way is to fix x + y instead. So, fix c ∈ R, and put g(t) = exp c − exp t exp(c − t). Then we have that g′(t) = 0, so that g(t) = g(0) by the Identity Theorem. Now g(0) = exp c − exp 0 exp c = 0. So for any c, t we have that exp c − exp t exp(c − t) = 0. Put c := x + y, and t := x to get the result.

Corollary 13.1.2. log(uv) = log(u) + log(v) for all u, v ∈ (0, ∞).

Proof. From above

    exp(log(u) + log(v)) = exp(log(u)) exp(log(v)) = uv = exp(log(uv))

and take logs.

We can also use the MVT to prove the monotonicity of the exponential function.

Proposition 13.1.3. The function exp : R → (0, ∞) is strictly increasing.

Proof. As exp′ x = exp x > 0, the derivative is positive everywhere, so exp is strictly increasing by Corollary 12.5.1(iii).

13.2 Trigonometric Functions

Proposition 13.2.1 (The Pythagoras Theorem). For all real x we have that

    cos² x + sin² x = 1.

Proof. Let f(x) := cos² x + sin² x − 1. Then, by what we have proved about derivatives of trigonometric functions,

    f′(x) = 2 cos x(− sin x) + 2 sin x cos x − 0 = 0

for all x.

By the Identity Theorem applied to f on some interval (−A, A) we see that for x ∈ (−A, A) we have that

    f(x) = f(0) = cos² 0 + sin² 0 − 1 = 1 + 0 − 1 = 0.

As this is true for any A we get that f(x) = 0 on R.

Proposition 13.2.2 (Addition Formulae). For all real x, y we have that

(i) cos(x + y) = cos x cos y − sin x sin y;

(ii) sin(x + y) = sin x cos y + cos x sin y.

Proof. It is enough to prove one; the other is got by fixing y and taking the derivative of the resulting function of x.

We recall what we did for exponentials: let

    h(x) = cos c − cos x cos(c − x) + sin x sin(c − x)

whose derivative is

    h′(x) = 0 + sin x cos(c − x) − cos x sin(c − x) + cos x sin(c − x) − sin x cos(c − x) = 0

so that by the Identity Theorem h(x) = h(c) = 0.

Proposition 13.2.3. The function cos x := Σ_{k=0}^∞ (−1)^k x^{2k}/(2k)! has a least positive zero, which we denote (for the moment) by α.

Proof. First we need to see that there are positive zeros. Note that

    cos 0 = Σ_{k=0}^∞ (−1)^k 0^{2k}/(2k)! = 1 > 0

and (by looking at pairs of terms)

    cos x = 1 − x²/2! + x⁴/4! − Σ_{k=1}^∞ [x^{4k+2}/(4k+2)!] (1 − x²/((4k+4)(4k+3)))
          ≤ 1 − x²/2! + x⁴/4!

provided x² ≤ (4 + 4)(4 + 3). As 1 − x²/2! + x⁴/4! = (1/24)[(x² − 6)² − 12] we see that cos √6 < 0. By the IVT, cos x has at least one zero in [0, √6].

Now let

    S = {t > 0 : cos t = 0}.

Then S ≠ ∅ and S is bounded below, so that α = inf S exists. By definition of inf S, given n there exists t_n such that α ≤ t_n < α + 1/n. Thus t_n → α as n → ∞. But cos x is continuous, so that cos t_n → cos α, and hence cos α = 0. But cos 0 = 1, so that α > 0 and α is the minimum positive zero required.

Proposition 13.2.4. sin α = 1.

Proof. By Pythagoras, sin α = ±1. Suppose sin α = −1; then by the MVT there would be some ξ ∈ (0, α) such that

    cos ξ = (sin α − sin 0)/(α − 0) = −1/α < 0.

However, cos 0 = 1, and α is the first root, so by the IVT cos ξ cannot be negative.

Proposition 13.2.5 (Periodicity). For all real x we have that

(i) cos(x + α) = − sin x and sin(x + α) = cos x;

(ii) cos(x + 2α) = − cos x and sin(x + 2α) = − sin x;

(iii) cos(x + 4α) = cos x and sin(x + 4α) = sin x.

Proof. We just use the addition formulae repeatedly, inserting the values cos α = 0 and sin α = 1.

Now that we have proved these results, and the danger of using ‘obvious’ but unproved properties of π has passed, we can make the following definition:

Definition 13.2.6. π := 2 · inf{β | β > 0 and cos β = 0} (= 2α).

We need one more result, and then we have established “all” the usual facts about the trigonometric functions.

Proposition 13.2.7. The zeros of cos x are at precisely the points {(k + ½)π : k ∈ Z}.

Proof. By Proposition 13.2.5(ii) and (iii), for k ∈ Z, cos(½π + kπ) = (−1)^k cos ½π = 0, so these are all zeros. If β were such that cos β = 0, then as above we have a k such that β0 = β + kπ ∈ (0, π] is a zero of cos x. Clearly β0 is not less than ½π, by the definition of α. Using

    cos(π − x) = −cos(−x) = −cos(x)

we see that if β0 > ½π then π − β0 < ½π is a zero of cos x, which cannot be. Hence β0 = ½π, and β has the required form.

13.3 Differential Equations

Here is a very fundamental example of how we use the MVT to establish the uniqueness of a solution of a differential equation.

Example 13.3.1. Show that the general solution of f′(x) = f(x), x ∈ (0, +∞), is f(x) = A exp(x) where A is a constant.

The ‘trick’ for solving differential equations is to manipulate them so that they look like d/dx F = 0 for some F, and then ‘integrate’. We often achieve this by multiplying by ‘integrating factors’, for which there are recipe books. The same ‘trick’ lets us apply the MVT (or Identity Theorem) to prove that we have a solution.

Given this differential equation df/dx − f = 0 we would multiply it by e^{−∫1 dx}, rewrite it as

    d/dx (e^{−x} f(x)) = 0

and deduce that e^{−x} f(x) = A.

Let’s write this as a piece of pure mathematics!

Consider g(x) := f(x) exp(−x). Then g′(x) = f′(x) exp(−x) − f(x) exp(−x) = 0. Hence, by the Identity Theorem, g(x) is constant; that is, f(x) exp(−x) = A say, and so f(x) = A exp(x).

Example 13.3.2. This example is based on a Calculus question from a Mods Collection. Find all solutions of
\[ y'' - \frac{2}{1+x^2}\,y = 0. \]
(The emphasis for us is on "all".)

I will probably not do this in the lectures.

The following is all motivated by the method for finding a second solution for second-order linear ordinary differential equations when one solution is known, which you learnt in the 'Calculus of One Variable' course last term. We use the methods from this course to show that these are all the solutions.

We can check easily that $(1+x^2)$ is a solution; so we write
\[ z(x) = \frac{y(x)}{1+x^2}. \]

An easy calculation yields
\[ y' = z'(1+x^2) + 2xz \quad\text{and}\quad y'' = z''(1+x^2) + 4xz' + 2z, \]
so that $z$ must satisfy
\[ z''(1+x^2) + 4xz' = 0 \]
and hence
\[ \bigl[z'(1+x^2)^2\bigr]' = \bigl(z''(1+x^2) + 4xz'\bigr)(1+x^2) = 0. \]

By the Identity Theorem
\[ z'(1+x^2)^2 = A \]
for some constant $A$, and so
\[ z'(x) = \frac{A}{(1+x^2)^2}. \]

Although of course we can't 'integrate up' yet (we don't know what that means), we can take the hint and look at what the integral would be, namely
\[ w(x) = \frac12\left[\arctan x + \frac{x}{1+x^2}\right]; \]
here $\arctan$ is the inverse function of $\tan$. So by the Inverse Function Theorem and the other rules of differentiation which we have established we can check that
\[ w'(x) = \frac{1}{(1+x^2)^2} = z'(x)/A. \]

Hence by the Identity Theorem $z(x) - Aw(x) = B$ for some constant $B$, and so the only solutions are
\[ y(x) = \frac{A}{2}\bigl[(1+x^2)\arctan x + x\bigr] + B(1+x^2). \]
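The claimed general solution can be spot-checked numerically: with arbitrary constants (the values $A = 1.7$, $B = -0.4$ below are illustrative choices, not from the notes), a central-difference approximation of $y''$ should agree with $2y/(1+x^2)$.

```python
import math

# Hedged numerical check of the claimed general solution
#   y(x) = (A/2) * ((1 + x^2) * atan(x) + x) + B * (1 + x^2):
# approximate y'' by a central difference and compare with 2y/(1+x^2).
def y(x, A=1.7, B=-0.4):  # arbitrary constants for the check
    return (A / 2) * ((1 + x * x) * math.atan(x) + x) + B * (1 + x * x)

h = 1e-4
for x in (0.3, 1.0, 2.5):
    y2 = (y(x + h) - 2 * y(x) + y(x - h)) / (h * h)  # ~ y''(x)
    residual = y2 - 2 * y(x) / (1 + x * x)
    assert abs(residual) < 1e-5
print("ODE satisfied at all sample points")
```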

13.4 The function $\dfrac{\sin x}{x}$

This is a good example of how the Mean Value Theorem and its various corollaries are used practically. I will probably not do this in lectures.

Proposition 13.4.1. Let $0 < x < \tfrac12\pi$. Then

(i) $\sin x < x < \tan x$ and so $\cos x < \dfrac{\sin x}{x} < 1$;

(ii) $\displaystyle\lim_{x\to 0}\frac{\sin x}{x} = 1$;

(iii) $\dfrac{2}{\pi} < \dfrac{\sin x}{x} < 1$ (Jordan's inequality).

We therefore have the following bounds:
\[ \max\left\{\cos x, \frac{2}{\pi}\right\} < \frac{\sin x}{x} < 1. \]

Proof. To prove the first inequality, consider $f(x) = \tan x - x$, for $x \in [0, \tfrac12\pi)$. Then $f$ is differentiable on $(0, \tfrac12\pi)$ and
\[ f'(x) = \frac{1}{\cos^2 x} - 1 > 0 \quad\text{for all } x \in (0, \tfrac12\pi). \]
Hence $f$ is strictly increasing on $[0, \tfrac12\pi)$; in particular $f(x) > f(0)$ for any $x \in (0, \tfrac12\pi)$, which yields $\tan x > x$. Considering $x - \sin x$ in the same way will give $x > \sin x$.

The second inequality in (i) is got by inverting and multiplying by $\sin x$; this is justified since $\sin x > 0$ until the smallest positive zero of $\cos x$.

For (ii) we use a version of the sandwich theorem and the continuity of $\cos x$ to get that $\lim_{x\to 0+}\frac{\sin x}{x}$ exists and
\[ 1 = \lim_{x\to 0+}\cos x \le \lim_{x\to 0+}\frac{\sin x}{x} \le 1. \]
As $\frac{\sin x}{x}$ is an even function this gives that $\lim_{x\to 0}\frac{\sin x}{x} = 1$.

Now consider
\[ h(x) = \frac{\sin x}{x} \quad\text{for } x \in (0, \tfrac12\pi]. \]
Then
\[ h'(x) = \frac{\cos x\,(x - \tan x)}{x^2} < 0 \quad\text{for all } x \in (0, \tfrac12\pi), \]
so that $h$ is strictly decreasing, and hence $h(x) > h(\tfrac12\pi) = \frac{2}{\pi}$ for any $x \in (0, \tfrac12\pi)$; this gives the first inequality of (iii). The second is already included in (i).
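The two-sided bound of Proposition 13.4.1 can be illustrated on a grid of sample points; a numerical spot-check, not a proof.

```python
import math

# Spot-check the bounds max(cos x, 2/pi) < sin(x)/x < 1 on (0, pi/2).
for k in range(1, 50):
    x = k * (math.pi / 2) / 50          # grid points in (0, pi/2)
    ratio = math.sin(x) / x
    lower = max(math.cos(x), 2 / math.pi)
    assert lower < ratio < 1, (x, ratio)
print("bounds hold at all sample points")
```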

14 L'Hopital's Rule

This section is devoted to a variety of rules and techniques for calculating limits of quotients. They derive from results of Guillaume de l'Hopital; perhaps they are really due to Johann Bernoulli, whose lecture notes l'Hopital published in 1696.

14.1 The Cauchy Mean Value Theorem

As promised earlier, here is the proof of Cauchy's symmetric form of the MVT. (At first sight one might think we could just apply the MVT to $f$ and $g$ separately. However, a moment's reflection will show that we would then get two different $\xi$.)

Theorem 14.1.1 (Cauchy's Mean Value Theorem). Let $f, g : [a,b] \to \mathbb{R}$ be continuous on $[a,b]$ and differentiable on $(a,b)$. Suppose that $g'(x) \neq 0$ for all $x \in (a,b)$. Then for some $\xi \in (a,b)$ we have that
\[ \frac{f'(\xi)}{g'(\xi)} = \frac{f(b)-f(a)}{g(b)-g(a)}. \]

Proof. First, this makes sense: we cannot have $g(b) - g(a) = 0$, or by Rolle's Theorem there would be a point $\eta \in (a,b)$ with $g'(\eta) = 0$.

Now let the function $F$ be defined on $[a,b]$ by
\[ F(x) := \begin{vmatrix} 1 & 1 & 1 \\ f(x) & f(a) & f(b) \\ g(x) & g(a) & g(b) \end{vmatrix}, \]
that is,
\[ F(x) = \bigl(f(a)g(b) - f(b)g(a)\bigr) + f(x)\bigl(g(a) - g(b)\bigr) + g(x)\bigl(f(b) - f(a)\bigr), \]
which, being a linear combination of $f$ and $g$, is continuous on $[a,b]$ and differentiable on $(a,b)$. Clearly $F(a) = F(b) = 0$; so Rolle's Theorem applies and yields a $\xi \in (a,b)$ such that $F'(\xi) = 0$. But
\[ 0 = F'(\xi) = 0 + f'(\xi)\bigl(g(a) - g(b)\bigr) + g'(\xi)\bigl(f(b) - f(a)\bigr) \]
and we are done after dividing by the non-zero $g'(\xi)\bigl(g(b) - g(a)\bigr)$.
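The auxiliary function in the proof suggests a computation: for concrete $f$, $g$ (here $\sin$ and $\exp$ on $[0,1]$, an illustrative choice) one can locate a $\xi$ with $F'(\xi) = 0$ by bisection and confirm the Cauchy MVT identity.

```python
import math

# Locate a point xi with f'(xi)/g'(xi) = (f(b)-f(a))/(g(b)-g(a))
# by bisecting F'(x) = f'(x)(g(a)-g(b)) + g'(x)(f(b)-f(a)),
# for the sample pair f = sin, g = exp on [0, 1].
a, b = 0.0, 1.0
f, fp = math.sin, math.cos
g, gp = math.exp, math.exp

def Fprime(x):
    return fp(x) * (g(a) - g(b)) + gp(x) * (f(b) - f(a))

lo, hi = a, b              # F'(a) < 0 < F'(b) for this pair
for _ in range(60):
    mid = (lo + hi) / 2
    if Fprime(lo) * Fprime(mid) <= 0:
        hi = mid
    else:
        lo = mid
xi = (lo + hi) / 2
lhs = fp(xi) / gp(xi)
rhs = (f(b) - f(a)) / (g(b) - g(a))
print(abs(lhs - rhs) < 1e-9)  # the Cauchy MVT identity holds at xi
```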

14.2 The L'Hopital Rule

Proposition 14.2.1. Suppose $f$, $g$ are continuous on $[a, a+\delta]$ (for some $\delta > 0$) and differentiable in $(a, a+\delta)$, and that $f(a) = g(a) = 0$. Suppose further that $l := \lim_{x\to a+}\frac{f'(x)}{g'(x)}$ exists. Then
\[ \lim_{x\to a+}\frac{f(x)}{g(x)} = \lim_{x\to a+}\frac{f'(x)}{g'(x)}. \]

Proof. Note that there must exist a $\delta' < \delta$ such that on $(a, a+\delta']$ we have $g'(x) \neq 0$, for otherwise the function $f'(x)/g'(x)$ would not be defined near $a$ and so this limit could not be defined.

For every $x \in (a, a+\delta')$, apply Cauchy's MVT to $f$, $g$ on the interval $[a, x]$: there is $\xi_x \in (a, x)$ such that
\[ \frac{f(x)}{g(x)} = \frac{f(x)-f(a)}{g(x)-g(a)} = \frac{f'(\xi_x)}{g'(\xi_x)}. \]

But if $x \to a+$, then $\xi_x \to a$ with $\xi_x > a$, so that
\[ \lim_{x\to a+}\frac{f'(\xi_x)}{g'(\xi_x)} = l. \]

Hence
\[ \lim_{x\to a+}\frac{f(x)}{g(x)} = \lim_{x\to a+}\frac{f'(\xi_x)}{g'(\xi_x)} = l. \]

Similarly we prove


Corollary 14.2.2. Suppose $f$, $g$ are continuous on $[a-\delta, a]$ (for some $\delta > 0$) and differentiable in $(a-\delta, a)$, and that $f(a) = g(a) = 0$. Suppose further that $l := \lim_{x\to a-}\frac{f'(x)}{g'(x)}$ exists. Then
\[ \lim_{x\to a-}\frac{f(x)}{g(x)} = \lim_{x\to a-}\frac{f'(x)}{g'(x)}. \]

The proof of the following is now immediate.

Corollary 14.2.3 (L'Hopital's Rule (L'HR)). Suppose $f$, $g$ are continuous on $[a-\delta, a+\delta]$ (for some $\delta > 0$) and differentiable in $(a-\delta, a+\delta)\setminus\{a\}$, and that $f(a) = g(a) = 0$. Suppose further that $l := \lim_{x\to a}\frac{f'(x)}{g'(x)}$ exists. Then
\[ \lim_{x\to a}\frac{f(x)}{g(x)} = \lim_{x\to a}\frac{f'(x)}{g'(x)}. \]

Note 14.2.4. Sometimes this is called the $\frac00$ case of L'HR.

14.3 Some Applications

Example 14.3.1. Prove that
\[ \lim_{x\to 0}\frac{1-\cos x}{x^2} = \frac{1}{2}. \]

We argue like this:
\begin{align*} \lim_{x\to 0}\frac{1-\cos x}{x^2} &= \lim_{x\to 0}\frac{\sin x}{2x} && \text{by L'HR, provided this limit exists} \\ &= \lim_{x\to 0}\frac{\cos x}{2} && \text{by L'HR, provided this limit exists} \\ &= \frac{1}{2} && \text{and this limit exists by the continuity of } \cos x; \end{align*}
so the above equalities hold.

To justify this we need to see that L'HR, which we have used twice, is actually applicable. But by standard results we have already proved:

$1-\cos x$ and $x^2$ are continuous on $[-\tfrac12\pi, \tfrac12\pi]$, zero at zero, and differentiable on $(-\tfrac12\pi, \tfrac12\pi)\setminus\{0\}$ with derivatives $\sin x$ and $2x$;

$\sin x$ and $2x$ are continuous on $[-\tfrac12\pi, \tfrac12\pi]$, zero at zero, and differentiable on $(-\tfrac12\pi, \tfrac12\pi)\setminus\{0\}$ with derivatives $\cos x$ and $2$.

Exercise: Prove similarly that $\lim_{x\to 0}\frac{\sin x}{x} = 1$.

Example 14.3.2.
\[ \lim_{x\to 0}\frac{\log(1+x)}{x} = 1. \]

Again we argue:
\begin{align*} \lim_{x\to 0}\frac{\log(1+x)}{x} &= \lim_{x\to 0}\frac{\log'(1+x)}{x'} && \text{by L'HR, provided this limit exists} \\ &= \lim_{x\to 0}\frac{1}{1+x} && \text{derivative of } \log t \text{ is } \tfrac1t \\ &= 1 && \text{by continuity of } \tfrac{1}{1+x}; \end{align*}
as this exists, the previous equalities hold.

To justify the use of L'HR we need to see that $\log(1+x)$ and $x$ are continuous on $[-\tfrac12, \tfrac12]$, $0$ at $0$, and differentiable on $(-\tfrac12, \tfrac12)\setminus\{0\}$.

Example 14.3.3.
\[ \lim_{x\to 0}(1+x)^{1/x} = e. \]

Recall that by definition $(1+x)^{1/x} := \exp\bigl(\tfrac1x\log(1+x)\bigr)$. So consider first $\frac{\log(1+x)}{x}$. By the previous example this has limit $1$. Now by the continuity of $\exp(x)$ we see that
\[ (1+x)^{1/x} = \exp\left(\frac{\log(1+x)}{x}\right) \to \exp(1) = e \quad\text{as } x \to 0. \]

14.4 L'Hopital's Rule: infinite limits

If we have all the hypotheses for L'Hopital's rule, except that we have
\[ \frac{f'(x)}{g'(x)} \to +\infty \quad\text{as } x \to a, \]
then we swap $f$ and $g$, use L'HR and conclude that
\[ \frac{g(x)}{f(x)} \to 0 \quad\text{as } x \to a. \]

14.5 L'Hopital's Rule at $\infty$

Suppose $f, g : (a, +\infty) \to \mathbb{R}$ are continuous and differentiable, with $f(x) \to 0$ and $g(x) \to 0$ as $x \to \infty$. If $g'(x) \neq 0$ on $(a, +\infty)$ and $\frac{f'(x)}{g'(x)} \to l$ as $x \to \infty$, then we can deduce that $\lim_{x\to\infty}\frac{f(x)}{g(x)} = l$.

All we need do is apply L'HR to the functions $F(x) = f(\tfrac1x)$ and $G(x) = g(\tfrac1x)$, with $F(0) = 0 = G(0)$, checking carefully that the hypotheses hold.

14.6 L'Hopital's Rule: the $\frac{\infty}{\infty}$ case

There is one important variant which we cannot obtain by algebraic manipulation, or by taking logarithms or exponentials or similar tricks. This will probably not be covered in the lectures.


Proposition 14.6.1 (L'HR, the $\frac{\infty}{\infty}$ case). Let $f, g : (a, a+\delta) \to \mathbb{R}$ be differentiable for some $\delta > 0$. Suppose further that $f(x) \to \infty$ and $g(x) \to \infty$ as $x \to a+$ and that $\lim_{x\to a+}\frac{f'(x)}{g'(x)}$ exists. Then
\[ \lim_{x\to a+}\frac{f(x)}{g(x)} = \lim_{x\to a+}\frac{f'(x)}{g'(x)}. \]

Note 14.6.2. We do not want to make too much heavy weather of this proof; checking all the details is a good exercise.

Proof. Write $K := \lim_{x\to a+}\frac{f'(x)}{g'(x)}$. Let $\varepsilon > 0$; then there exists a $\delta_1 > 0$ such that $\delta_1 < \delta$ and
\[ \left|\frac{f'(x)}{g'(x)} - K\right| < \tfrac12\varepsilon \quad\text{for all } x \in (a, a+\delta_1). \]

Now fix some $c$ in $(a, a+\delta_1)$.

For any $x \in (a, c)$ we apply Cauchy's MVT to $f$, $g$ on $[x, c]$: there is a number $\xi_x \in (x, c)$ such that
\[ \frac{f(c)-f(x)}{g(c)-g(x)} = \frac{f'(\xi_x)}{g'(\xi_x)}. \]

Since $\xi_x \in (x, c) \subset (a, a+\delta_1)$, we have that
\[ \left|\frac{f(x)-f(c)}{g(x)-g(c)} - K\right| = \left|\frac{f'(\xi_x)}{g'(\xi_x)} - K\right| < \tfrac12\varepsilon \quad\text{for all } x \in (a, c). \]

(Unlike the $\frac00$ case we cannot conclude immediately that $\frac{f(x)-f(c)}{g(x)-g(c)} \to K$ as $x \to a+$ (although it does!), as there is no guarantee that $\xi_x$ will tend to $a$ as $x \to a+$.)

Clearing the fraction we have that
\[ \bigl|f(x) - f(c) - Kg(x) + Kg(c)\bigr| < \tfrac12\varepsilon\,\bigl|g(x) - g(c)\bigr|, \]
so that the Triangle Law gives us
\[ \bigl|f(x) - Kg(x)\bigr| < \tfrac12\varepsilon\,\bigl|g(x) - g(c)\bigr| + \bigl|f(c) - Kg(c)\bigr| \]
or
\[ \left|\frac{f(x)}{g(x)} - K\right| < \tfrac12\varepsilon\left|1 - \frac{g(c)}{g(x)}\right| + \frac{|f(c) - Kg(c)|}{|g(x)|}. \]

Now use the fact that $g(x) \to \infty$: we can find a $\delta_2 > 0$, such that $\delta_2 < \delta_1$ and such that
\[ \left|1 - \frac{g(c)}{g(x)}\right| < \tfrac32 \quad\text{and}\quad \frac{|f(c) - Kg(c)|}{|g(x)|} < \tfrac14\varepsilon, \]
so that for $a < x < a+\delta_2$ we have
\[ \left|\frac{f(x)}{g(x)} - K\right| < \tfrac12\cdot\tfrac32\varepsilon + \tfrac14\varepsilon = \varepsilon \]
as required.


14.7 More applications

These examples might be better done by using the standard limit from Analysis I that, if $\alpha > 0$, $x\exp(-\alpha x) \to 0$ as $x \to \infty$.

Example 14.7.1. $\lim_{x\to+\infty}\frac{\log x}{x^\mu} = 0$ for any $\mu > 0$.

Let $g(x) = x^\mu = \exp(\mu\log x)$. Then $g'(x) = \mu x^{\mu-1}$. So by L'Hopital's rule ($\frac{\infty}{\infty}$ case) we have
\begin{align*} \lim_{x\to+\infty}\frac{\log x}{x^\mu} &= \lim_{x\to+\infty}\frac{1/x}{\mu x^{\mu-1}} && \text{provided this limit exists} \\ &= \lim_{x\to+\infty}\frac{1}{\mu x^\mu} = 0, && \text{which does exist.} \end{align*}

Example 14.7.2. For any $\mu > 0$, $\lim_{x\to 0+} x^\mu\log x = 0$.

We transform this into $\frac{\infty}{\infty}$ form and then by L'HR
\begin{align*} \lim_{x\to 0+} x^\mu\log x = \lim_{x\to 0+}\frac{\log x}{x^{-\mu}} &= \lim_{x\to 0+}\frac{\log' x}{(x^{-\mu})'} && \text{if this limit exists} \\ &= \lim_{x\to 0+}\frac{1/x}{(-\mu)x^{-\mu-1}} = \lim_{x\to 0+}\frac{x^\mu}{-\mu} = 0, && \text{which does exist.} \end{align*}
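Both limits can be sampled numerically; a small exponent $\mu$ makes the convergence slowest, so it is a reasonable stress test (the specific values below are illustrative).

```python
import math

# Check log(x)/x^mu -> 0 (x -> infinity) and x^mu * log x -> 0 (x -> 0+)
# for a small exponent mu, where the convergence is slowest.
mu = 0.1
assert abs(math.log(1e30) / 1e30 ** mu) < 0.07   # ~ 69.08 / 1000
assert abs(1e-30 ** mu * math.log(1e-30)) < 0.07
print("both quantities are already small far out / near 0")
```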

Finally

Example 14.7.3. Show that
\[ \lim_{x\to 0}\left(\frac{\sin x}{x}\right)^{\frac{1}{1-\cos x}} = e^{-1/3}. \]

Since $f(x) = \left(\frac{\sin x}{x}\right)^{\frac{1}{1-\cos x}}$ is an even function, we only need to show that $\lim_{x\to 0+} f(x) = e^{-1/3}$.

According to the definition
\[ f(x) = \exp\left(\frac{1}{1-\cos x}\log\frac{\sin x}{x}\right) = \exp\left(\frac{\log\sin x - \log x}{1-\cos x}\right). \]

By the L'Hopital Rule,
\begin{align*} \lim_{x\to 0+}\frac{\log\sin x - \log x}{1-\cos x} &= \lim_{x\to 0+}\frac{\frac{\cos x}{\sin x} - \frac1x}{\sin x} && \text{[provided it exists; recall } \tfrac{\sin x}{x} \to 1\text{]} \\ &= \lim_{x\to 0+}\frac{x\cos x - \sin x}{x\sin^2 x} \\ &= \lim_{x\to 0+}\frac{\cos x - x\sin x - \cos x}{\sin^2 x + 2x\sin x\cos x} && \text{[if it exists, using L'Hopital]} \\ &= -\lim_{x\to 0+}\frac{x}{\sin x + 2x\cos x} \\ &= -\lim_{x\to 0+}\frac{1}{\cos x + 2\cos x - 2x\sin x} && \text{[if it exists, using L'Hopital]} \\ &= -\frac13 && \text{[continuity]}. \end{align*}

Since $\exp$ is continuous at $-\frac13$,
\begin{align*} \lim_{x\to 0+}\left(\frac{\sin x}{x}\right)^{\frac{1}{1-\cos x}} &= \lim_{x\to 0+}\exp\left(\frac{\log\sin x - \log x}{1-\cos x}\right) \\ &= \exp\left(\lim_{x\to 0+}\frac{\log\sin x - \log x}{1-\cos x}\right) && \text{[by continuity of exp]} \\ &= \exp\left(-\frac13\right). \end{align*}
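A quick numerical check of this limit (sample points chosen for illustration):

```python
import math

# Numerical check that (sin x / x)^(1/(1 - cos x)) -> e^(-1/3) as x -> 0.
target = math.exp(-1.0 / 3.0)
for x in (0.1, 0.01, -0.01):
    val = (math.sin(x) / x) ** (1.0 / (1.0 - math.cos(x)))
    assert abs(val - target) < x * x      # the error is O(x^2)
print("limit is exp(-1/3)")
```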

14.8 Health Warning

L'Hopital's Rule is very seductive. But it is often not the best way to evaluate limits. Taylor's Theorem, to which we turn next, is often more useful, and indeed more informative.

If you doubt this, then use L'HR to work out $\lim_{x\to 0}\dfrac{\sinh x^4 - x^4}{(x - \sin x)^4}$, and then later use Taylor's Theorem to write it down at sight, and decide which is better.

15 Taylor's Theorem

15.1 Motivation

Suppose that $f : (a-\delta, a+\delta) \to \mathbb{R}$ and that for some $n \ge 1$ the derivatives $f', f'', \ldots, f^{(n)}$ exist on the interval. For convenience write $f^{(0)} := f$.

We can then form the Taylor polynomials
\[ P_n(x) := f(a) + f'(a)(x-a) + \frac{f''(a)}{2!}(x-a)^2 + \cdots + \frac{f^{(n)}(a)}{n!}(x-a)^n, \]
a polynomial of degree $n$ in $x$. This polynomial 'agrees with $f$' to the extent that $P_n^{(k)}(a) = f^{(k)}(a)$ for $k = 0, \ldots, n$.

We have
\begin{align*} P_0(x) &= f(a) && \text{constant approximation, not very interesting;} \\ P_1(x) &= f(a) + f'(a)(x-a) && \text{linear approximation;} \\ P_2(x) &= f(a) + f'(a)(x-a) + \frac{f''(a)}{2!}(x-a)^2 && \text{quadratic approximation;} \end{align*}
and so on.

We might hope, on the basis of our experience, that $P_n(x)$ is a good approximation to $f(x)$; we'd like to justify that intuition.

We'd also like to consider the power series
\[ P(x) := \sum_{k=0}^{\infty}\frac{f^{(k)}(a)}{k!}(x-a)^k, \]
which is called the Taylor expansion of $f$ at $a$. Our previous experience leads us to conjecture that this must equal $f(x)$.

To investigate these questions we will look at the 'error term'
\[ E_n(x) := f(x) - P_n(x). \]

(Clearly, if $f$ has derivatives of all orders, $P_n(x) \to f(x)$ as $n \to \infty$ if and only if $E_n(x) \to 0$.) Unfortunately, even if $f$ has derivatives of all orders, it need not be true that $E_n(x) \to 0$ as $n \to \infty$, so we have to move more carefully. First, we will prove Taylor's Theorem, which will give us information about $E_n(x)$. Secondly, in individual cases we have to consider whether $E_n(x) \to 0$ as $n \to \infty$.
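For a well-behaved function the approximation does improve with the degree; a minimal sketch for $f = \sin$ about $a = 0$ (where only odd powers contribute):

```python
import math

# Taylor polynomials of sin about a = 0 approximate better as n grows:
# P(x) = sum over j <= n of (-1)^j x^(2j+1) / (2j+1)!.
def taylor_sin(x, n):
    return sum((-1) ** j * x ** (2 * j + 1) / math.factorial(2 * j + 1)
               for j in range(n + 1))

x = 1.0
errs = [abs(taylor_sin(x, n) - math.sin(x)) for n in range(5)]
assert all(errs[i + 1] < errs[i] for i in range(4))  # errors decrease
print(errs[-1] < 1e-7)  # the degree-9 polynomial is already very close
```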

15.2 A cautionary example

Our intuition, built on experience of polynomials, trigonometric and exponential functions, is misleading. The following example shows us that there are functions $f(x)$, with derivatives of all orders at every point of $\mathbb{R}$, such that $\sum\frac{f^{(k)}(0)}{k!}x^k$ is convergent for every $x$, but for which $E_n(x) \not\to 0$.

Consider $f : \mathbb{R} \to \mathbb{R}$ defined by
\[ f(x) = \begin{cases} \exp(-\frac{1}{x^2}) & \text{whenever } x \neq 0, \\ 0 & \text{for } x = 0. \end{cases} \]

A little experimentation shows that the $k$-th derivative must look like
\[ f^{(k)}(x) = \begin{cases} Q_k(\frac1x)\exp(-\frac{1}{x^2}) & \text{whenever } x \neq 0, \\ 0 & \text{for } x = 0, \end{cases} \]
for some polynomial $Q_k$ of degree $3k$. We can prove this by induction. At points $x \neq 0$ this is routine use of linearity, the product rule and the chain rule. But at $x = 0$ we need to take more care, and use the definition:
\[ \frac{f^{(k)}(x) - f^{(k)}(0)}{x - 0} = \frac1x\,Q_k\!\left(\frac1x\right)\exp\left(-\frac{1}{x^2}\right) = \sum_{s=1}^{3k+1} a_s\,\frac{\exp(-\frac{1}{x^2})}{x^s}, \]
which we must prove tends to zero as $x \to 0$; if we change the variable to $t = \frac1x$ then we have a finite sum of terms like $t^s\exp(-t^2)$, which we know tend to zero as $|t|$ tends to infinity.

So for this function $f$ the series $\sum\frac{f^{(k)}(0)}{k!}x^k = 0$, so it converges to $0$ at every $x$. But the error term $E_n(x)$ is the same for all $n$ (it equals $f(x)$) and so does not tend to $0$ at any point except $0$.

Note that we can add this function to $\exp x$ and $\sin x$ and so on, and get functions with the same set of derivatives at $0$ as these functions, so that they will have the same Taylor polynomials, but are different functions.

Remark 15.2.1. Functions defined and differentiable on $\mathbb{C}$ are very different: for them, our naive intuition is a good guide. But that is next year's Analysis course.
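The flatness of $\exp(-1/x^2)$ at $0$ is easy to see numerically: it is crushed by any power of $x$, which is why every derivative (hence every Taylor coefficient) at $0$ vanishes. A small illustration (the power $x^5$ and the sample points are arbitrary choices):

```python
import math

# The ratio f(x)/x^5 for f(x) = exp(-1/x^2) collapses to 0 as x -> 0;
# the same holds for f(x)/x^n for any n, so every Taylor coefficient
# of f at 0 is 0.
def f(x):
    return math.exp(-1.0 / (x * x)) if x != 0 else 0.0

ratios = [f(x) / x ** 5 for x in (0.3, 0.2, 0.1, 0.05)]
assert ratios[0] > ratios[1] > ratios[2] > ratios[3]
assert ratios[-1] < 1e-100
print("f is flatter at 0 than any power of x")
```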


15.3 Taylor's Theorem with Lagrange Remainder

We now concentrate on the Taylor polynomial and investigate its difference from the function.

Theorem 15.3.1 (Taylor's Theorem). Let $f : [a,b] \to \mathbb{R}$. Suppose that for some $n \ge 1$ we have that $f, f', f'', \ldots, f^{(n-1)}$ exist and are continuous on $[a,b]$ and that $f^{(n)}$ exists on $(a,b)$. Then there is a number $\xi \in (a,b)$ such that
\[ f(b) = \sum_{k=0}^{n-1}\frac{f^{(k)}(a)}{k!}(b-a)^k + \frac{f^{(n)}(\xi)}{n!}(b-a)^n. \]

Note 15.3.2. Recall that at the end points $a$ and $b$ 'differentiable' means 'right-' (or 'left-') differentiable.

Note 15.3.3. The term $\frac{f^{(n)}(\xi)}{n!}(b-a)^n$ is called Lagrange's form of the remainder. Note that the crucial parameter $\xi$ may depend on (i) the function $f$; (ii) the degree $n$; (iii) the end points $a$ and $b$.¹⁴

Note 15.3.4. If we set $b - a = h$, then Taylor's theorem may be stated as
\[ f(a+h) = \sum_{k=0}^{n-1}\frac{f^{(k)}(a)}{k!}h^k + \frac{f^{(n)}(a+\theta h)}{n!}h^n \tag{$T_n$} \]
where $\theta$ is some number between $0$ and $1$.

Proof. We look at the Taylor polynomial part,
\[ \sum_{k=0}^{n-1}\frac{f^{(k)}(a)}{k!}(b-a)^k, \]
and we use the method of "varying a constant": that is, we look at the following function defined on $[a,b]$:
\[ F(x) := \sum_{k=0}^{n-1}\frac{f^{(k)}(x)}{k!}(b-x)^k. \]
This is clearly continuous on $[a,b]$, and on $(a,b)$ we have that
\begin{align*} F'(x) &= \sum_{k=0}^{n-1}\frac{f^{(k+1)}(x)}{k!}(b-x)^k - \sum_{k=1}^{n-1}\frac{f^{(k)}(x)}{(k-1)!}(b-x)^{k-1} \\ &= \sum_{k=1}^{n}\frac{f^{(k)}(x)}{(k-1)!}(b-x)^{k-1} - \sum_{k=1}^{n-1}\frac{f^{(k)}(x)}{(k-1)!}(b-x)^{k-1} = \frac{f^{(n)}(x)}{(n-1)!}(b-x)^{n-1}. \end{align*}
Note also that $F(a) = \sum_{k=0}^{n-1}\frac{f^{(k)}(a)}{k!}(b-a)^k$ and $F(b) = f(b)$.

¹⁴When applying Taylor's Theorem to different functions (perhaps as similar as $f(x)$ and $f(-x)$) or different ranges (perhaps as similar as $[0,b]$ and $[-b,0]$) it is essential to use a different letter for each $\xi$ that is introduced.

Let $G(x)$ be continuous on $[a,b]$ and differentiable on $(a,b)$. We use Cauchy's Mean Value Theorem on this pair of functions to see that there exists a $\xi \in (a,b)$ such that
\[ \frac{F(a)-F(b)}{G(a)-G(b)} = \frac{F'(\xi)}{G'(\xi)}. \]
That is,
\[ \frac{\sum_{k=0}^{n-1}\frac{f^{(k)}(a)}{k!}(b-a)^k - f(b)}{G(a)-G(b)} = \frac{\frac{f^{(n)}(\xi)}{(n-1)!}(b-\xi)^{n-1}}{G'(\xi)}. \tag{$*$} \]

But if we take
\[ G(x) := (b-x)^n, \]
which is clearly continuous and differentiable on $(a,b)$ with derivative $-n(b-x)^{n-1} < 0$, then $(*)$ simplifies at once to
\[ f(b) = \sum_{k=0}^{n-1}\frac{f^{(k)}(a)}{k!}(b-a)^k + \frac{f^{(n)}(\xi)}{n!}(b-a)^n. \]

We have proved the strongest theorem we could. But often we know a bit more, and can get, for example, this symmetric version:

Corollary 15.3.5 (Taylor's Theorem). Let $f : (a-\delta, a+\delta) \to \mathbb{R}$ for some $\delta > 0$. Suppose that for some $n \ge 1$ we have that $f', f'', \ldots, f^{(n)}$ exist. Let $x \in (a-\delta, a+\delta)$. Then there is a number $\xi$ between $a$ and $x$ such that
\[ f(x) = \sum_{k=0}^{n-1}\frac{f^{(k)}(a)}{k!}(x-a)^k + \frac{f^{(n)}(\xi)}{n!}(x-a)^n. \]

Proof. If $x > a$ then this is just the Taylor Theorem we have proved. If $x < a$ we just use the Taylor Theorem we have proved on the function $f(-x)$ and sort out the signs and inequalities. If $x = a$ then take $\xi = a$.
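For $f = \exp$ every derivative is again $\exp$, so the $\xi$ promised by the theorem can be solved for explicitly and checked to lie in $(a,b)$; a small sketch with $[a,b] = [0,1]$, $n = 5$ (illustrative choices):

```python
import math

# For f = exp on [0, 1], solve for the xi in the Lagrange remainder
#   exp(1) = P + exp(xi)/n! * (1-0)^n
# and check that it lies in (a, b), as Taylor's Theorem promises.
a, b, n = 0.0, 1.0, 5
P = sum((b - a) ** k / math.factorial(k) for k in range(n))  # Taylor part
remainder = math.exp(b) - P
xi = math.log(remainder * math.factorial(n) / (b - a) ** n)
print(a < xi < b)  # True
```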

15.4 Other forms of the remainder

In the proof of Taylor's Theorem we may use any function $G$ which is continuous in $[a,b]$, differentiable in $(a,b)$, and such that $G' \neq 0$. Then we will have a $\xi \in (a,b)$ such that
\[ f(b) = P_{n-1}(b) + \frac{f^{(n)}(\xi)}{(n-1)!}(b-\xi)^{n-1}\,\frac{G(b)-G(a)}{G'(\xi)}. \]

By choosing different functions $G$, you may prove Taylor's Theorem with the remainder in different forms. For example, if we choose $G(x) = x - a$, then $\frac{G(b)-G(a)}{G'(\xi)} = b - a$. Thus
\[ f(b) = P_{n-1}(b) + \frac{f^{(n)}(\xi)}{(n-1)!}(b-a)(b-\xi)^{n-1} \quad\text{for some } \xi \in (a,b). \]

Exercise 15.4.1. Try $G(x) = (x-a)^m$ for a power $m > 1$ to see what kind of Taylor's formula you can get.

15.5 The error estimate

Taylor's Theorem also provides us with an explicit estimate of the difference between $f(x)$ and its $n$-term Taylor approximation $\sum_{k=0}^{n-1}\frac{f^{(k)}(a)}{k!}(x-a)^k$:

Corollary 15.5.1. Let $f : [a,b] \to \mathbb{R}$ satisfy the conditions in Taylor's Theorem, and let
\[ E_n := \frac{|b-a|^n}{n!}\sup_{\xi\in(a,b)}\bigl|f^{(n)}(\xi)\bigr|. \]
Then
\[ \left|f(x) - \sum_{k=0}^{n-1}\frac{f^{(k)}(a)}{k!}(x-a)^k\right| \le E_n \quad\text{for all } x \in [a,b]. \]

Of course this may not be useful, as the supremum may be infinite. If however in a given situation we know a bit more (for example, that $f^{(n)}$ is differentiable on $[a,b]$), then we can use standard calculus to evaluate $E_n$.

15.6 Example: the function log(1 + x)

By way of an example we prove the following:

Proposition 15.6.1. We have
\[ \log(1+x) = \sum_{n=1}^{\infty}(-1)^{n-1}\frac{x^n}{n} \quad\text{for all } x \in (-1, 1]. \]

Before we start, note that there is no point in trying to prove that these are equal on any larger real domain, as the radius of convergence of the series is by the ratio test equal to $1$, and at the other end point $x = -1$ the series is the notoriously divergent Harmonic Series $\sum\frac1n$.

But we will prove equality on all of $(-1, 1]$, in particular that $\log 2 = \sum_{n=1}^{\infty}\frac{(-1)^{n-1}}{n}$.

Proof. Consider $f(x) = \log(1+x)$. We have already proved that on $(-1,\infty)$ the function $f$ is differentiable with $f'(x) = \frac{1}{1+x}$; and so, by the usual rules, we have
\[ f^{(n)}(x) = \frac{(-1)^{n+1}(n-1)!}{(1+x)^n} \quad\text{for all } n \ge 1. \]

Hence, by Taylor's Theorem (the symmetric version),
\[ \log(1+x) - \sum_{k=1}^{n-1}\frac{(-1)^{k-1}x^k}{k} = (-1)^{n-1}\,\frac1n\left(\frac{x}{1+\xi_n}\right)^n \]
for some $\xi_n$ between $0$ and $x$.

To get our result it will be enough to show that
\[ \left|\frac{x}{1+\xi_n}\right| \le 1 \]
for every $n$ and $x \in (-1, 1]$.

For $x > 0$ this is no problem: $0 < \xi_n < 1$ and so $1 + \xi_n > 1$; hence $\frac{x}{1+\xi_n} < x \le 1$.

For negative $x$ it is not so easy; the nearer $x$ is to $-1$, the nearer $1+\xi_n$ may get to $0$. However, if $x \ge -\tfrac12$ we have
\[ -\tfrac12 \le x \le \xi_n \le 0 \]
and so
\[ \tfrac12 \le 1 + x \le 1 + \xi_n \le 1, \]
which implies
\[ 2 \ge \frac{1}{1+x} \ge \frac{1}{1+\xi_n} \ge 1; \]
multiplying by $x < 0$ yields
\[ 2x \le \frac{x}{1+x} \le \frac{x}{1+\xi_n} \le x. \]
Now $2x \ge -1$ and $x \le 1$, so we have
\[ \left|\frac{x}{1+\xi_n}\right| \le 1 \]
as required.

That is, the functions $\log(1+x)$ and $\sum_{k=1}^{\infty}(-1)^{k-1}\frac{x^k}{k}$ are equal on $[-\tfrac12, 1]$.

What about $(-1, -\tfrac12)$? We must use a different argument.

Consider the functions $f(x) = \log(1+x)$ and $g(x) = \sum_{k=1}^{\infty}(-1)^{k-1}\frac{x^k}{k}$ on $(-1, 1)$. Both are differentiable there; we have proved $f'(x) = \frac{1}{1+x}$, and by the theorem on term-by-term differentiation of power series
\[ g'(x) = \sum_{k=1}^{\infty}\frac{(-1)^{k-1}k\,x^{k-1}}{k} = \sum_{k=1}^{\infty}(-1)^{k-1}x^{k-1} = \frac{1}{1+x}. \]

Hence $f'(x) - g'(x) = 0$, so by the Identity Theorem,
\[ f(x) - g(x) = f(0) - g(0) = 0. \]

That is, on the whole of $(-1, 1]$ we have the required series expansion.

Remark 15.6.2. The last part has actually proved the result for $x \in (-1, 1)$. It is only at $x = 1$ that we have to prove that the error tends to zero.
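In particular $\log 2 = \sum(-1)^{n-1}/n$ can be watched converging; for an alternating series with decreasing terms the error is at most the first omitted term, which the check below uses.

```python
import math

# Partial sums of the alternating series sum (-1)^(n-1)/n approach log 2;
# the alternating-series error bound is the first omitted term.
def partial(N):
    return sum((-1) ** (n - 1) / n for n in range(1, N + 1))

for N in (10, 100, 1000):
    assert abs(partial(N) - math.log(2)) < 1 / (N + 1)
print("series sums to log 2")
```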

16 The Binomial Theorem

In this section we use many of the theorems we have proved about uniform convergence and continuity, power series, and monotonicity, as well as Taylor's Theorem. As well as proving an important result we are showing off the techniques we now have available to us.

16.1 Motivation and Preliminary Algebra

By simple induction we can prove that for any natural number $n$ (including $0$) we have, for all real or complex $x$, that
\[ (1+x)^n = \sum_{k=0}^{n}\binom{n}{k}x^k, \]
where the coefficient $\binom{n}{k}$ of $x^k$ can be proved to be
\[ \binom{n}{k} = \frac{n!}{k!(n-k)!} = \frac{n(n-1)\cdots(n-k+1)}{k(k-1)\cdots 1}. \]

We have also seen in our work on sequences and series that
\[ (1+x)^{-1} = \sum_{k=0}^{\infty}(-1)^k x^k \quad\text{for all } |x| < 1, \]
and here the coefficient of $x^k$ is
\[ (-1)^k = \frac{(-1)(-2)\cdots(-k)}{k(k-1)\cdots 1}; \]
and we can prove by induction (for example using differentiation term by term) that for all natural numbers $n \ge 1$ we have that
\[ (1+x)^{-n} = \sum_{k=0}^{\infty}\frac{(-n)(-(n+1))\cdots(-(n+k-1))}{k(k-1)\cdots 1}\,x^k \quad\text{for all } |x| < 1. \]

In this section we are going to generalise these, in the case of some real values of $x$, to arbitrary real exponents, not just integers. Note that this is altogether deeper: $(1+x)^p$ is defined, for non-integral $p$ and (real) $x > -1$, to be the function $\exp(p\log(1+x))$.

Definition 16.1.1. For all $p \in \mathbb{R}$ and all $k \in \mathbb{N}$ we extend the definition of binomial coefficient as follows:
\[ \binom{p}{0} := 1; \qquad \binom{p}{k} := \frac{p(p-1)\cdots(p-k+1)}{k!}. \]

We now make sure that the key properties of binomial coefficients are still true in this more general setting.

Lemma 16.1.2.
\[ k\binom{p}{k} = p\binom{p-1}{k-1} \quad\text{for all } k \ge 1. \]

Proof. If $k = 1$ then by the definition we must see $1\cdot\frac{p}{1} = p\cdot 1$, which is clear. Otherwise
\[ k\binom{p}{k} = k\,\frac{p(p-1)\cdots(p-k+1)}{k!} = p\,\frac{(p-1)\cdots(p-k+1)}{(k-1)!} = p\binom{p-1}{k-1}. \]


Lemma 16.1.3.
\[ \binom{p}{k} + \binom{p}{k-1} = \binom{p+1}{k} \quad\text{for all } k \ge 1. \]

Proof. When $k = 1$ we must prove $\frac{p}{1} + 1 = \frac{p+1}{1}$, which is clear. Otherwise
\begin{align*} \binom{p}{k} + \binom{p}{k-1} &= \frac{p(p-1)\cdots(p-k+1)}{k!} + \frac{p(p-1)\cdots(p-k+2)}{(k-1)!} \\ &= \frac{p(p-1)\cdots(p-k+2)}{k!}\,\bigl[(p-k+1) + k\bigr] \\ &= \frac{(p+1)p(p-1)\cdots(p-k+2)}{k!} \\ &= \binom{p+1}{k}. \end{align*}
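The extended Pascal identity can be checked numerically straight from the product definition, for a non-integer $p$ (the value $p = -\tfrac12$ below is just a sample):

```python
from math import isclose

# Check the extended Pascal identity C(p,k) + C(p,k-1) = C(p+1,k)
# for a non-integer exponent p, using the product definition.
def binom(p, k):
    out = 1.0
    for j in range(k):          # p (p-1) ... (p-k+1) / k!
        out *= (p - j) / (j + 1)
    return out

p = -0.5
for k in range(1, 10):
    assert isclose(binom(p, k) + binom(p, k - 1), binom(p + 1, k),
                   rel_tol=1e-12, abs_tol=1e-12)
print("identity holds numerically")
```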

16.2 The Real Binomial Theorem

Theorem 16.2.1 (The Binomial Expansion). Let $p$ be a real number. Then
\[ (1+x)^p = \sum_{k=0}^{\infty}\binom{p}{k}x^k \quad\text{for all } |x| < 1. \]

Note that the coefficients are non-zero provided $p$ is not a natural number or zero; as we have a proof of the expansion in that case we may assume that $p \notin \mathbb{N}\cup\{0\}$.

Lemma 16.2.2. The function $f$ defined on $(-1,1)$ by $f(x) := (1+x)^p$ is differentiable, and satisfies $(1+x)f'(x) = pf(x)$. Also, $f(0) = 1$.

Proof. The derivative is easily got by the chain rule from the definition of $f$; it is $f'(x) = p(1+x)^{p-1}$. Multiply by $(1+x)$ and get the required relationship. The value at $0$ is clear.

Lemma 16.2.3. The radius of convergence of $\sum_{k=0}^{\infty}\binom{p}{k}x^k$ is $R = 1$.

Proof. Use the ratio test; we have that $\bigl|a_{k+1}x^{k+1}/a_k x^k\bigr|$ is
\[ \left|\frac{p(p-1)\cdots(p-k)}{(k+1)k(k-1)\cdots 1}\cdot\frac{k(k-1)\cdots 1}{p(p-1)\cdots(p-k+1)}\,x\right| = \left|\frac{p-k}{k+1}\,x\right| \to |x| \quad\text{as } k \to \infty. \]

Lemma 16.2.4. The function $g$ defined on $(-1,1)$ by $g(x) = \sum_{k=0}^{\infty}\binom{p}{k}x^k$ is differentiable, with derivative satisfying $(1+x)g'(x) = pg(x)$. Also, $g(0) = 1$.


Proof.
\begin{align*} (1+x)g'(x) &= (1+x)\sum_{k=0}^{\infty}\binom{p}{k}k x^{k-1} \\ &= (1+x)\sum_{k=1}^{\infty}\binom{p}{k}k x^{k-1} \\ &= p(1+x)\sum_{k=1}^{\infty}\binom{p-1}{k-1}x^{k-1} \\ &= p\left\{\sum_{k=1}^{\infty}\binom{p-1}{k-1}x^{k-1} + \sum_{k=1}^{\infty}\binom{p-1}{k-1}x^{k}\right\} \\ &= p\left\{\sum_{m=0}^{\infty}\binom{p-1}{m}x^{m} + \sum_{m=1}^{\infty}\binom{p-1}{m-1}x^{m}\right\} \\ &= p\left\{1 + \sum_{m=1}^{\infty}\binom{p-1}{m}x^{m} + \sum_{m=1}^{\infty}\binom{p-1}{m-1}x^{m}\right\} \\ &= p\left\{1 + \sum_{m=1}^{\infty}\left[\binom{p-1}{m} + \binom{p-1}{m-1}\right]x^{m}\right\} \\ &= p\left\{1 + \sum_{m=1}^{\infty}\binom{p}{m}x^{m}\right\} \\ &= p\sum_{m=0}^{\infty}\binom{p}{m}x^{m} = pg(x). \end{align*}

Proof of the Binomial Theorem. Consider $\varphi(x) = \frac{g(x)}{f(x)}$, which is well-defined on $(-1,1)$ as $f(x) > 0$. By the Quotient Rule we can calculate $\varphi'(x)$, and then use the lemmas:
\[ \varphi'(x) = \frac{f(x)g'(x) - f'(x)g(x)}{f(x)^2} = \frac{p}{1+x}\cdot\frac{f(x)g(x) - f(x)g(x)}{f(x)^2} = 0. \]

Hence by the Identity Theorem, $\varphi(x)$ is constant, $\varphi(x) = \varphi(0) = 1$. This implies that $f(x) = g(x)$ on $(-1,1)$.
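The expansion itself is easy to test numerically inside the radius of convergence, accumulating the coefficients by the recurrence $\binom{p}{k+1} = \binom{p}{k}\frac{p-k}{k+1}$ (sample values of $p$ and $x$ below are illustrative):

```python
# Check the binomial expansion (1+x)^p = sum C(p,k) x^k numerically
# for non-integer p and |x| < 1, via C(p,k+1) = C(p,k) * (p-k)/(k+1).
def binom_series(p, x, terms=200):
    total, coeff = 0.0, 1.0
    for k in range(terms):
        total += coeff * x ** k
        coeff *= (p - k) / (k + 1)
    return total

for p in (0.5, -1.5, 2.7):
    for x in (0.3, -0.6):
        assert abs(binom_series(p, x) - (1 + x) ** p) < 1e-10
print("expansion matches (1+x)^p")
```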

16.3 The end points: preliminary issue

This section will probably be omitted from the lectures.

The existence of these functions and their equality at the end points requires more sophisticated argument. The following sections should be viewed as illustrations of the way Taylor's Theorem can be exploited, rather than theorems to be learnt.

The cases $x = 1$ or $x = -1$ need to be considered separately. But there is a difference between these! For $x = -1$ we have not yet defined $(1+x)^p$; recall our definition for arbitrary real $p$, that $(1+x)^p := \exp\bigl(p\log(1+x)\bigr)$. For integral $p$ (such as $p = 0$) we have the usual algebraic definition, which is consistent with the exp-log definition when both apply. Can we define $0^p$ sensibly for any other values of $p$? For $p > 0$ we'd clearly like to define $0^p = 0$. But if we do so, then to preserve the rule of exponents $A^pA^q = A^{p+q}$ we cannot define negative powers; if $p > 0$ then $0^{-p}$ makes no sense.

So let us extend our definition of $(1+x)^p$ in this way, in the case when $p > 0$.

But we need to take care.

Lemma 16.3.1. If $p > 0$ then the function $(1+x)^p$ is continuous on $[-1,\infty)$.

Lemma 16.3.2. If $p > 1$ then the function $(1+x)^p$ is differentiable on $[-1,\infty)$ with derivative $p(1+x)^{p-1}$.

Proofs. Exercises.

16.4 The end points: $p \le -1$

Let $p \le -1$. Then, as remarked above, the function $(1+x)^p$ is not defined at $x = -1$. Further, the expansion is not valid at $x = 1$:

Proposition 16.4.1. The series $\displaystyle\sum_{k=0}^{\infty}\frac{p(p-1)\cdots(p-k+1)}{k(k-1)\cdots 1}$ is divergent.

Proof. Write $q = -p \ge 1$; then the modulus of the $k$-th term is
\[ \left|\frac{p(p-1)\cdots(p-k+1)}{k(k-1)\cdots 1}\right| = \left|(-1)^k\,\frac{q}{1}\cdots\frac{q+s}{s+1}\cdots\frac{q+k-1}{k}\right| \ge 1; \]
the terms alternate in sign, but as they do not tend to $0$ the series diverges.

16.5 The end points: $-1 < p < 0$

Let $-1 < p < 0$; note that $p + 1 > 0$. Again the function $(1+x)^p$ is not defined at $x = -1$. However, now the expansion is valid at $x = 1$:

Proposition 16.5.1. The series $\displaystyle\sum_{k=0}^{\infty}\frac{p(p-1)\cdots(p-k+1)}{k(k-1)\cdots 1}$ is convergent with sum $2^p$.

Proof. We apply Taylor's Theorem to $(1+x)^p$ on the interval $[0,1]$ and find, for each $n \ge 1$, a point $\xi_n \in (0,1)$ such that
\[ 2^p = \sum_{k=0}^{n-1}\frac{p(p-1)\cdots(p-k+1)}{k(k-1)\cdots 1} + E_n \quad\text{where}\quad E_n = \frac{p(p-1)\cdots(p-n+1)}{n(n-1)\cdots 1}\,(1+\xi_n)^{p-n}. \]

We have then that
\[ |E_n| \le \left|\frac{p(p-1)\cdots(p-n+1)}{n(n-1)\cdots 1}\right|, \]
and we will have the result if we prove that this tends to $0$ as $n \to \infty$. We rewrite the part depending on $n$ as
\[ \left|\frac{[(p+1)-1]\cdots[(p+1)-s]\cdots[(p+1)-n]}{1\cdot 2\cdots n}\right| = \left(1-\frac{p+1}{1}\right)\cdots\left(1-\frac{p+1}{s}\right)\cdots\left(1-\frac{p+1}{n}\right). \]

Now $\exp(-x) + x - 1$ has positive derivative on $(0,1)$, so by the MVT we have that
\[ \left(1-\frac{p+1}{s}\right) \le \exp\left(-\frac{p+1}{s}\right), \]
so that
\[ |E_n| \le \exp\left(-(p+1)\sum_{s=1}^{n}\frac{1}{s}\right). \]

As the harmonic series diverges and $p+1 > 0$, we get that $E_n \to 0$ as $n \to \infty$.
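The slow, harmonic-sum-driven decay of the remainder is visible numerically: for $p = -\tfrac12$ (a sample value) the partial sums at $x = 1$ creep towards $2^p$.

```python
# Partial sums of sum C(p,k) at x = 1 approach 2^p for -1 < p < 0;
# the error shrinks only like a power of n, matching the estimate above.
p = -0.5
total, coeff = 0.0, 1.0
sums = []
for k in range(5000):
    total += coeff
    coeff *= (p - k) / (k + 1)     # next binomial coefficient C(p,k+1)
    sums.append(total)
target = 2 ** p
assert abs(sums[-1] - target) < 1e-2
assert abs(sums[-1] - target) < abs(sums[99] - target)  # error decreases
print("partial sums approach 2^p")
```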

16.6 The end points: $0 < p$

Let $0 < p$. In this case the expansion is valid at both $x = 1$ and $x = -1$.

Proposition 16.6.1. The series $\displaystyle\sum_{k=0}^{\infty}\frac{p(p-1)\cdots(p-k+1)}{k(k-1)\cdots 1}$ is convergent with sum $2^p$; and the series $\displaystyle\sum_{k=0}^{\infty}\frac{p(p-1)\cdots(p-k+1)}{k(k-1)\cdots 1}(-1)^k$ is convergent with sum $0$.

Proof. The end point $x = +1$ is straightforward; use Taylor's Theorem as before and consider the error estimate
\[ E_n = \frac{p(p-1)\cdots(p-n+1)}{n(n-1)\cdots 1}\,(1+\xi_n)^{p-n} \]
for some $\xi_n \in (0,1)$. Since $1 < 1+\xi_n < 2$ and $p - n < 0$ eventually, we have $(1+\xi_n)^{p-n} \le 2^p$, and so
\[ |E_n| \le \frac{p}{n}\left|\frac{(p-1)\cdots(p-n+1)}{1\cdot 2\cdots(n-1)}\right|\,2^p. \]
Now $\left|\frac{p-s}{s}\right| \le 1$ whenever $2s \ge p$; so we get that
\[ |E_n| \le \frac{p}{n}\left|\frac{(p-1)\cdots\bigl(p-\bigl[\tfrac p2\bigr]\bigr)}{1\cdot 2\cdots\bigl[\tfrac p2\bigr]}\right|\,2^p \to 0 \quad\text{as } n \to \infty, \]
as required.

The end point $x = -1$ is more difficult. What we do is prove that the sum converges. Noting that as soon as $k > p+1$ all the terms have the same sign, we see that this means we have proved that the series is absolutely convergent. Now by the properties of power series $\sum_{k=0}^{\infty}\frac{p(p-1)\cdots(p-k+1)}{k(k-1)\cdots 1}\,x^k$ is absolutely convergent on $(-1,1)$. In particular we have that the series is absolutely convergent on the closed interval $[-1,0]$. Hence the series is uniformly convergent on that interval; and so the series is continuous on $[-1,0]$. As the series is equal to $(1+x)^p$ on $(-1,0]$ we have by continuity that there is equality at $-1$ as well.

So we must prove that the series converges. We claim that if we can prove this for any $p$ then we can prove it for $p+1$. This is because for all $n \ge 2p+2$ we have that $\left|\frac{p+1}{p-n+1}\right| \le 1$; this allows us to compare the $n$-th terms and see that those for $p+1$ are smaller in modulus. As both series are ultimately series of terms of constant sign, the comparison test will yield that convergence for $p$ yields convergence for $p+1$. So assume from now on that $0 < p < 1$; it will suffice to deal with this case.

The modulus of the $n$-th term can then be written
\[ |u_n| = \frac{p}{n}\left(1-\frac{p}{1}\right)\cdots\left(1-\frac{p}{s}\right)\cdots\left(1-\frac{p}{n-1}\right), \]
and so, using again $(1-t) \le \exp(-t)$, we have that
\begin{align*} |u_n| &\le \frac{p}{n}\exp\left(-p\sum_{s=1}^{n-1}\frac{1}{s}\right) = \frac{p}{n}\exp\left(-p\left(\sum_{s=1}^{n-1}\frac{1}{s}-\log n\right)\right)\exp\bigl(-p\log n\bigr) \\ &= \frac{p}{n}\cdot\frac{1}{n^p}\exp\left(-p\left(\sum_{s=1}^{n-1}\frac{1}{s}-\log n\right)\right). \end{align*}

Now we have (by an Integral Test argument) that
\[ \sum_{s=1}^{n-1}\frac{1}{s} - \log n \to \gamma \quad\text{as } n \to \infty \]
($\gamma$ is Euler's constant). Hence we have a constant $C$ such that
\[ |u_n| \le C\,\frac{1}{n\cdot n^p} \quad\text{for sufficiently large } n, \]
and so, by the Comparison Test, $\sum|u_n|$ converges.¹⁵

¹⁵$\sum\frac{1}{n^s}$ is convergent for $s > 1$ by the Integral Test.