Executive Summary: Calculus

Paul E. Johnson

August 10, 2015


What is Calculus for?

1. Vocabulary of Relationships.

Calculus is the study of curves, change, and accumulation.

Calculus gives us words to describe relationships.

[Figure: four panels, a) through d), each plotting a different curve f(x) with X on the horizontal axis and Y on the vertical axis.]



2. Optimization (Find Top or Bottom)

An optimization problem usually has 3 parts.

1 An objective function, say f.

2 Although we often write about f(x) as a matter of habit, that’s misleading and probably confusing to most new students. Almost all statistical calculations take the data, usually called x, as fixed quantities, and instead they adjust coefficients β or θ (or whatever). But I write about f(x) as well, just because it is traditional.

3 Constraints that restrict the extent to which we can adjust the inputs.

Finding the top of a hill is a primary objective in math & engineering after high school.



Sometimes, We Wander With Purpose

Choosing the horizontal variable to maximize the vertical.

We are getting “higher” if the result is increasing.

At the top, we are at a point where we can’t move in either direction without reducing the outcome.

This is differential calculus!

An important characteristic of an optimum is that the slope of the objective function is 0 at the optimum point. We are at the top of the hill.

[Figure: “Adjust left and right to find the maximum.” An objective function plotted on axes from 0 to 30, annotated with hints for wandering to the top: “I'll give you some hints!”, “You are getting warmer”, “hot”, “colder”.]



Why is that so Difficult?

Some functions are easy to optimize & visualize.

Some are difficult, even with a computer.

Some are analytically solvable, so no numerical approximation is needed.

But most of “maximum likelihood” and advanced stats are not “solvable”.

[Figure: a surface over the coefficients b̂1 and b̂2, with a tangent plane.]

When an optimum point is found, it has the property that the slope of the function at that point is 0.



3. Understanding Diversity

This is a probability model for 40 yard dash times.

Very rare to find a 300 pound man who can run 40 yards in 5 seconds or less.

How rare is it? Integral calculus is used to find the area of the shaded region.

[Figure: probability density of 40 Yard Dash Times of 300 Pound Men (seconds), horizontal axis from 4 to 12, vertical axis from 0.00 to 0.30, with a shaded area.]


Necessary Terminology

Real Numbers, R

Does it go without saying that we are studying the “real numbers”?

Includes all fractions, decimal numbers, integers, and 0.

Familiar to most school children!

Definition

The Real Numbers

A math book will give a definition about a closed group of values, a set where we can add and multiply the members and get back values that are also in the same set. For a set of values (x1, x2, x3, ...):

1 x1 − x2 and x1 + x2 are in R

2 x1 · x2 and x1/x2 are in R (as long as x2 is not 0)

3 There exists a value 1 such that 1 · x1 = x1

4 There exists a value 0 that can nullify any value: 0 · x1 = 0.



Cartesian Plane

Cartesian plane: All pairs you can get by choosing 2 numbers from R.

A point is an ordered pair that is represented by a dot in the plane.

Anticipate confusion because various fields re-name the axes. In political science rational choice models, we often refer to them as X1 and X2 and points are (x1, x2), (y1, y2) and so forth. This notation may be nicer because, in a 10 dimensional model, it is easier to remember (x1, x2, . . . , x10) than to give special letters for every dimension.



Critical Terminology

Critical point: value of input at which the output is either at the top of a peak or in the bottom of a valley.

Local vs Global Maximum.

[Figure: Output plotted against Input, both axes from 0 to 30, with labeled Critical Points: Local Maximum, Inflection Point, Global Maximum, Global Minimum.]



We study Relationships

concavity: curvature

“concave down”

“concave up”

Important because we use concavity to know if we are at a minimum or a maximum.

[Figure: two panels, each plotting a curve f(x) on axes X and Y, one “concave down” and one “concave up”.]



Detour: Convex Combinations

Concave down means “peaked”, generally.

A more formal definition takes us into a detour on the idea of convex combination.

Consider a weighting coefficient λ that varies from 0 to 1.

A convex combination of two points a and b is

$$\lambda a + (1 - \lambda) b$$

Please notice the beautiful illustration, which took about 5 hours of fiddling about.

[Figure: the segment from a to b, with the convex combination λa + (1 − λ)b marked at λ = 1 (the point a), λ = 0.7 (0.7a + (1 − 0.7)b), λ = 0.3 (0.3a + (1 − 0.3)b), and λ = 0 (the point b).]



Convex Combination in 2 Dimensions

Points are pairs, like (x1, y1) and (x2, y2).

In other things I’ve written, I’ve used notation like (x1, x2), (y1, y2), (z1, z2), so let’s be cautious about confusing the presenter.

Calculate the value of the combination

$$\lambda (x_1, y_1) + (1 - \lambda)(x_2, y_2) = (\lambda x_1 + (1 - \lambda) x_2,\ \lambda y_1 + (1 - \lambda) y_2)$$

As λ varies from 1 to 0, the combination “traces” a straight line between the two points.

[Figure: on axes from 0.0 to 1.0, the calculation of λ(x1, y1) + (1 − λ)(x2, y2) traces a line from (x1, y1) to (x2, y2), with the points for λ = 1, λ = 0.7, λ = 0.4, and λ = 0 marked.]
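The calculation above can be sketched in a few lines of Python. This is only an illustration: the function name `convex_combination` and the two endpoints are made up for the example and do not come from the slides.

```python
def convex_combination(p, q, lam):
    """Return lam * p + (1 - lam) * q for two points given as (x, y) pairs."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie between 0 and 1")
    return (lam * p[0] + (1 - lam) * q[0],
            lam * p[1] + (1 - lam) * q[1])


# Hypothetical endpoints, chosen only for illustration.
p1 = (0.2, 0.8)
p2 = (0.9, 0.1)

# As lam falls from 1 to 0, the result moves from p1 toward p2.
for lam in (1.0, 0.7, 0.4, 0.0):
    print(lam, convex_combination(p1, p2, lam))
```

At λ = 1 the function returns the first point, at λ = 0 the second, and every value in between lands on the straight segment joining them.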



Convexity of Sets Defined Similarly

A convex set has to be round or egg shaped. It can’t have any lacunae or kidney shapes.

Formally, we say that every convex combination of any pair of points in the set lies entirely within the set.

[Figure: “A Convex Set” X, drawn on axes from 0 to 10, with three pairs of points: (x1, y1) and (x2, y2), (x3, y3) and (x4, y4), (x5, y5) and (x6, y6). Any convex combination (λxi + (1 − λ)xj, λyi + (1 − λ)yj) belongs to X.]



And We Redefine Concave Down

A function is concave down if we can select any two points on the function’s graph and the line segment connecting them is entirely below the function.

Phrase that as convex combination of points.

[Figure: “A Concave Down Function” on axes from 0.0 to 1.0, with several pairs of points on the curve joined by line segments. All line segments are below f(x).]
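Phrased with convex combinations, the definition says that for any two inputs a and b and any λ between 0 and 1, f(λa + (1 − λ)b) is at least λf(a) + (1 − λ)f(b). The sketch below checks that inequality on a grid of points. It is a rough numerical check under assumed example functions (`peaked` and `bowl` are hypothetical names), not a proof of concavity.

```python
def is_concave_down_on_grid(f, lo, hi, n=50):
    """Check f(lam*a + (1-lam)*b) >= lam*f(a) + (1-lam)*f(b) on a grid."""
    xs = [lo + (hi - lo) * i / n for i in range(n + 1)]
    lams = [j / 10 for j in range(11)]
    tol = 1e-12  # allowance for floating point rounding
    for a in xs:
        for b in xs:
            for lam in lams:
                on_curve = f(lam * a + (1 - lam) * b)
                on_chord = lam * f(a) + (1 - lam) * f(b)
                if on_curve < on_chord - tol:
                    return False
    return True


def peaked(x):
    # Hypothetical concave down example: a hill with its top at x = 0.5.
    return x * (1 - x)


def bowl(x):
    # Hypothetical concave up example.
    return x * x


print(is_concave_down_on_grid(peaked, 0.0, 1.0))
print(is_concave_down_on_grid(bowl, 0.0, 1.0))
```

A grid check like this can only find counterexamples; passing it suggests, but does not establish, that the chord never rises above the curve.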



We study relationships.

Inflection Point: where concavity changes.

[Figure: Output plotted against Input, both axes from 0 to 30, with labeled Critical Points: Local Maximum, Inflection Point, Global Maximum, Global Minimum.]



Continuity

Continuous: informal def: Can draw without lifting pencil from paper.

Note how the value of the function “hops” at one point.

The open loop indicates that the value of the function f(x) is undefined.

“Pinholes” like this interfere with analysis, so we (usually) assume functions are continuous.

[Figure: a curve f(x) on axes X and Y that hops at one point, with an open circle where f(x) is undefined.]



Notation: Open & Closed Intervals

Closed Interval: [0, 1] all real values between 0 and 1, INCLUDING 0 and 1

Open Interval: (0, 1) all real values between 0 and 1, NOT INCLUDING 0 and 1

As long as we know a function is continuous and its domain is a closed (and bounded) interval, then we know (for sure!) that the function has a well defined maximum and minimum.

On open intervals, a minimum may not exist: try to pick the smallest value inside (0, 1). Undefined!


Differential Calculus

The Big ideas

Outline

1 What is Calculus for?

2 Necessary Terminology

3 Differential Calculus: The Big ideas; Derivatives I can Remember

4 Optimization: First Order Conditions; Second-order conditions; Multivariate

5 Integration

6 Conclusion



We are Aiming to Understand This

[Figure: the objective function f(x) plotted against x, both axes from 0 to 30. On the way up, Slope > 0. At the top, the slope of the tangent line is 0. On the way down, Slope < 0.]



Slope of a Straight Line

Comparing two points (x1, y1) and (x2, y2):

difference horizontally ∆x = x2 − x1.

difference vertically ∆y = y2 − y1.

Slope is the ratio of the two changes.

$$\text{slope} = \frac{\Delta y}{\Delta x} = \frac{y_2 - y_1}{x_2 - x_1} \tag{1}$$

“the rise over the run.”

[Figure: a straight line on axes X and Y, both from 0 to 30, with ∆x and ∆y marked between (x1, y1) and (x2, y2). Annotation: “The slope is the same, no matter how you pick (x1, y1) or (x2, y2).”]



Slope is a Local Concept

A “local” property is a property that is not “true everywhere.” It is not a “global” property.

The calculation of the slope depends on “where you are,” either on the left or the right of the break point.

[Figure: a line with a break point on axes X and Y, both from 0 to 30, with x1, x2, x3 and y1, y2, y3 marked, and ∆x and ∆y shown between (x1, y1) and (x2, y2).]



Derivative: the slope of a “smooth curve”.

Consider the smooth curve.

1 Start at x1

2 Step to x2

3 Step to x3

[Figure: a smooth curve f(x) on axes from 0 to 30, with the points (x1, y1), (x2, y2), and (x3, y3) stepping toward the point (x, y).]



Approaching the tangent

“x” is a particular point and values x1, x2, x3 get closer and closer to x.

As xi gets closer to x, the slope of the dotted line, ∆y/∆x, gets steeper.

[Figure: the curve f(x) with dotted lines drawn from (x, y) to (x3, y3), (x4, y4), and (x5, y5). Annotation: “Keep making ∆x smaller and smaller and ∆y shrinks. But ∆y/∆x converges to the slope of the line that is tangent to f(x) at (x, y).”]



Derivative Definition

The Derivative is the value that ∆y/∆x approaches when ∆x shrinks to nothing.

The dotted line, the tangent line, is “just barely” touching f(x). The derivative is, formally speaking, the slope of the tangent line.

Very Important: This is a smooth, “differentiable” function. It is not jagged, there are no gaps.



Optimization from a Derivative’s point of View

Optimization: Find a point at which the tangent line’s slope is 0

[Figure: the objective function f(x) plotted against x, both axes from 0 to 30. Slope > 0 on the way up, the slope of the tangent line is 0 at the top, and Slope < 0 on the way down.]



Various Notations for Derivatives

One classic notation is:

$$\frac{df(x)}{dx} \tag{2}$$

or, if y = f(x), then:

$$\frac{dy}{dx} \tag{3}$$

More succinct: “f prime of x”, as in:

$$f'(x) \tag{4}$$

My engineering friends like:

$$Df(x) \tag{5}$$



Detour: Limits

Derivatives cannot be defined unless we have the tool known as “the limit”.

Consider the function f(x) = 4 − 5.3 × (1/x).

As x tends to infinity, what does f(x) tend towards? It tends toward 4.0, but it never “quite gets there”.

As x tends to 0, f(x) tends to −∞, but it never “gets there”.

For 0 < x < ∞, f(x) can be calculated. It is finite.

For x = 50, f(x) is defined: it is 3.894.

[Figure: y = 4 − 5.3 × (1/x) plotted for x from 0 to 500 and y from 3.0 to 5.0, with the point (x1 = 50, y1 = 3.894) marked.]
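To see the limit numerically, one can simply evaluate the function from the figure, y = 4 − 5.3 × (1/x), at larger and larger x. The particular x values below are arbitrary choices for illustration.

```python
def f(x):
    # The function plotted on this slide: y = 4 - 5.3 * (1 / x).
    return 4 - 5.3 * (1 / x)


# The gap between f(x) and 4 keeps shrinking but never reaches zero.
for x in (50, 500, 5_000, 50_000):
    print(x, f(x), 4 - f(x))

# Close to 0 (from the right) the values plunge toward minus infinity.
for x in (0.1, 0.01, 0.001):
    print(x, f(x))
```

The first value printed reproduces the marked point in the figure, f(50) = 3.894.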



Limit Definition

Notation for limits. When x is tending to any value x0, the limit of f(x) is written

$$\lim_{x \to x_0} f(x)$$

The arrow → is pronounced “goes to” or “approaches”.

Easy Case: if f(x0) exists (is defined, finite) and f is continuous at x0, then f(x0) is the limit.

More Difficult: f(x0) may not exist, but a limit may still exist.



As x goes to ∞, 1/x goes to...

1/x gets smaller as x gets larger, but never reaches 0. However

$$\lim_{x \to \infty} \frac{1}{x} = 0$$

[Figure: f(x) = 1/x plotted for x and y from 0 to 10, labeled with the limit as x → ∞, which is 0.]



Derivative: Formal Definition

Consider a continuous function f(x).

1 Add “just a bit” to x, we arrive at x + ∆x.

2 Resulting change in y:

$$\Delta y = f(x + \Delta x) - f(x) \tag{6}$$

3 The derivative is defined only if the following limit exists:

$$\lim_{\Delta x \to 0} \frac{\Delta y}{\Delta x} = \lim_{\Delta x \to 0} \frac{f(x + \Delta x) - f(x)}{\Delta x} \tag{7}$$
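The three steps above can be tried out numerically. The sketch below computes the ratio ∆y/∆x for shrinking values of ∆x; the helper name `difference_quotient` and the example function `square` are assumptions made for the illustration.

```python
def difference_quotient(f, x, dx):
    """Return (f(x + dx) - f(x)) / dx, the ratio inside the limit."""
    return (f(x + dx) - f(x)) / dx


def square(x):
    # Hypothetical example function, y = x^2.
    return x * x


# At x = 1 the ratio works out to 2 + dx, so it settles toward 2.
for dx in (1.0, 0.1, 0.01, 0.001):
    print(dx, difference_quotient(square, 1.0, dx))
```

On a computer ∆x cannot be made arbitrarily small, because rounding error eventually swamps the ratio, so this is a picture of the limit rather than a way of taking it.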



Linearity: A Vital Property of Derivatives

In some optimization problems in statistics, we are trying to find the smallest sum of squared errors or to maximize the likelihood of a sample by adjusting 10s or 100s of coefficients.

If we can manipulate the formula so that it is a sum of terms, we can get the separate derivatives and simplify the work.

Additivity: The derivative of the sum is the sum of the derivatives.

For functions f and g, their combined slope breaks down to the sum of the separate slopes.

$$\frac{d}{dx}\{f(x) + g(x)\} = \frac{d}{dx} f(x) + \frac{d}{dx} g(x) = f'(x) + g'(x) \tag{8}$$



Scaling Factors

Also in statistics, we have data that is rescaled, from Euros to Dollars, Fahrenheit to Celsius, and so forth.

It is enormously simplifying to know that

Proportionality: the derivative of a · f(x) is a times the derivative of f.

$$\frac{d}{dx}\{a \cdot f(x)\} = a \cdot \frac{d}{dx} f(x) = a \cdot f'(x) \tag{9}$$



Break Apart Complicated Problems

Complicated expressions can be “broken down”

$$\frac{d}{dx}\{a \cdot f(x) + b \cdot g(x)\} = a \frac{d}{dx} f(x) + b \frac{d}{dx} g(x) \tag{10}$$
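A numerical spot check of this rule is sketched below. Everything specific in it is an assumption for illustration: the constants a and b, the choice of f(x) = x² and g(x) = ln(x), the evaluation point, and the central-difference helper `slope`, which only approximates a derivative.

```python
import math


def slope(f, x, h=1e-6):
    """Central-difference approximation to the derivative of f at x."""
    return (f(x + h) - f(x - h)) / (2 * h)


a, b = 3.0, -2.0  # hypothetical scaling constants


def f(x):
    return x ** 2


def g(x):
    return math.log(x)


def combo(x):
    return a * f(x) + b * g(x)


x0 = 1.5
lhs = slope(combo, x0)                     # slope of the whole expression
rhs = a * slope(f, x0) + b * slope(g, x0)  # slopes of the pieces, recombined
print(lhs, rhs)
```

Both numbers should agree, up to approximation error, with 3 · 2x − 2 · (1/x) evaluated at x = 1.5.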



Derivatives I can Remember





Powers of x.

1. Slope of line. If

$$y = b \cdot x \tag{11}$$

then (is this too obvious?)

$$\frac{dy}{dx} = b \tag{12}$$

Digression: Think backwards for a minute. You usually think of b as a constant, but change gears for a minute to notice

$$\frac{dy}{db} = x$$




2a. Slope of x squared. If $y = x^2$ then

$$\frac{dy}{dx} = 2x \tag{13}$$

[Figure: y = x² plotted for x from −4 to 4, with the points (−2, 4), (−1.25, 1.5625), (1, 1), and (2, 4) marked. The tangent at x = 1 has slope = 2; the tangent at x = −1.25 has slope = −2.5. Caption: “The slope of x² is 2x.”]




Another view of that

[Figure: two panels. Left: y = x² with the points (−2, 4), (−1.25, 1.5625), (1, 1), and (2, 4), and the tangent at x = −1.25, which has slope = −2.5. Right: y = x² together with a red line showing the slope of the curve, 2x.]

We worked so hard on that one figure...

[Figure: a single plot of y = x² with its marked points, the tangent at x = −1.25 (slope = −2.5), and the red line showing the slope of the curve, 2x.]




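Proof of d(x²)/dx = 2x

That is one of the easiest ones to prove and “really believe.” Use the definition:

$$
\begin{aligned}
\frac{dy}{dx} &= \lim_{\Delta x \to 0} \frac{(x + \Delta x)^2 - x^2}{\Delta x} && (14) \\
&= \lim_{\Delta x \to 0} \frac{x^2 + 2x\Delta x + (\Delta x)^2 - x^2}{\Delta x} && (15) \\
&= \lim_{\Delta x \to 0} \frac{2x\Delta x + (\Delta x)^2}{\Delta x} && (16) \\
&= \lim_{\Delta x \to 0} \frac{(2x + \Delta x) \cdot \Delta x}{\Delta x} && (17) \\
&= \lim_{\Delta x \to 0} (2x + \Delta x) && (18) \\
&= 2x && (19)
\end{aligned}
$$

Anyway, I believe in that, and take a lot of the rest on faith (insert smiley face here, please).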




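More Powers of x.

2b. Slope of a cube. If $y = x^3$ then

$$\frac{dy}{dx} = 3x^2$$

2c. Slope of $y = x^4$.

$$\frac{dy}{dx} = 4x^3$$

2d. Slope of $y = x^{-1} = 1/x$

$$\frac{dy}{dx} = -1 \cdot x^{-2}$$

2e. Slope of $y = \sqrt{x} = x^{1/2}$

$$\frac{dy}{dx} = \frac{1}{2} x^{-1/2}$$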




You start to notice a general pattern? Here’s the most important derivative rule:

$$\frac{d}{dx} x^N = N \cdot x^{N-1} \tag{20}$$

That is true, whether N is a whole number or a fraction.
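The pattern can be spot-checked numerically for the powers listed above (2, 3, 4, −1, and 1/2). The sketch compares the formula N · x^(N−1) against a central-difference approximation; the helper names and the evaluation point x = 2 are assumptions for illustration.

```python
def slope(f, x, h=1e-6):
    """Central-difference approximation to the derivative of f at x."""
    return (f(x + h) - f(x - h)) / (2 * h)


def power_rule(n, x):
    """The rule from equation (20): N * x^(N - 1)."""
    return n * x ** (n - 1)


x0 = 2.0  # hypothetical evaluation point
for n in (2, 3, 4, -1, 0.5):
    numeric = slope(lambda x, n=n: x ** n, x0)
    print(n, numeric, power_rule(n, x0))
```

The two columns should match to several decimal places, including for the negative and fractional powers.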




Logarithms

3. The natural logarithm.

$$\frac{d}{dx} \ln(x) = \frac{1}{x} \tag{21}$$

If you have a log to a different base, say log10(), the derivative involves a “constant of proportionality.”

$$\frac{d}{dx} \log_b(x) = \frac{1}{\ln(b)} \cdot \frac{1}{x} \tag{22}$$

[Figure: y = ln(x) for x from 0 to 8, with dy/dx = 1/x. Slope at 0.25 is 4; slope at 1 is 1; slope at 3 is 1/3; slope at 6 is 1/6.]
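The “constant of proportionality” for other bases can be traced to the change-of-base identity for logarithms. A short sketch of that step, using only the rule for ln(x) and the scaling rule for constants:

```latex
\log_b(x) = \frac{\ln(x)}{\ln(b)}
\quad\Longrightarrow\quad
\frac{d}{dx}\log_b(x)
  = \frac{1}{\ln(b)} \cdot \frac{d}{dx}\ln(x)
  = \frac{1}{\ln(b)} \cdot \frac{1}{x}
```

For base 10, ln(10) is about 2.3026, so the slope of log10(x) is roughly 0.4343/x.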




Exponentials

Recall Euler’s number, e, the base of the natural logarithm.

$$e^x = \exp(x) \tag{23}$$

4. The derivative is:

$$\frac{d}{dx} e^x = \frac{d}{dx} \exp(x) = \exp(x) \tag{24}$$

In other words, you “get the same thing back”.

As in the case of the logarithm, if you are taking powers of some number besides e then a constant of proportionality enters the picture. The book says

$$\frac{d}{dx} b^x = \ln(b) \cdot b^x \tag{25}$$

Note that ln(e) = 1, so this is consistent if e is the base.




Derivative of a product

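5. Product rule

$$\frac{d}{dx}\{g(x) \cdot h(x)\} = \left[\frac{d}{dx} g(x)\right] \cdot h(x) + g(x) \cdot \left[\frac{d}{dx} h(x)\right] \tag{26}$$

or, if you like the prime notations,

$$\frac{d}{dx}\{g(x) \cdot h(x)\} = g'(x) \cdot h(x) + g(x) \cdot h'(x)$$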
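A quick worked example may make the product rule concrete. The choice g(x) = x² and h(x) = ln(x) is only an illustration, built from derivatives already listed in these notes:

```latex
\frac{d}{dx}\{x^2 \cdot \ln(x)\}
  = 2x \cdot \ln(x) + x^2 \cdot \frac{1}{x}
  = 2x\ln(x) + x
```

The first term differentiates x² and leaves ln(x) alone; the second leaves x² alone and differentiates ln(x).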




Function of a function

6. The chain rule states that:

$$\frac{d}{dx}\{f(g(x))\} = \left.\frac{df}{dx}\right|_{g(x)} \cdot \frac{dg(x)}{dx} \tag{27}$$

That’s the derivative of f(x) calculated at the location given by the value g(x), multiplied by the derivative of g(x). Confusing enough? Probably. Suppose, for example, you had

$$g(x) = x^2$$

and

$$f(x) = \ln(x)$$

so

$$f(g(x)) = \ln(x^2)$$

$$\frac{d}{dx} f(g(x)) = \frac{1}{x^2} \cdot 2x = \frac{2}{x}$$
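The ln(x²) example can be checked numerically. The sketch below compares the chain rule answer, (1/x²) · 2x, with a central-difference approximation; the helper names and the evaluation point x = 3 are assumptions for illustration.

```python
import math


def slope(f, x, h=1e-6):
    """Central-difference approximation to the derivative of f at x."""
    return (f(x + h) - f(x - h)) / (2 * h)


def composite(x):
    # f(g(x)) with g(x) = x^2 and f(x) = ln(x), as in the example above.
    return math.log(x ** 2)


def chain_rule_slope(x):
    # Derivative of ln evaluated at g(x), times the derivative of g.
    return (1 / x ** 2) * (2 * x)


x0 = 3.0  # hypothetical evaluation point
print(slope(composite, x0), chain_rule_slope(x0))
```

Both values should be close to 2/3, since (1/x²) · 2x simplifies to 2/x.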


Optimization

First Order Conditions



Find where the Derivative equals 0

In many statistical problems, we can exactly solve to find the optimal parameter estimates.

This is done by finding where the derivative equals 0 (the “first order condition”) and doing some followup checking.


Peaks And Valleys

We suppose we are adjusting a parameter, the thing we have been calling x in these notes. If we find a point where

$$f'(x) = 0 \tag{28}$$

it means we have found the exact “top of the hill.” (Yeah!)

Or we are in the “bottom of the bowl.” (Booh!)

[Figure: a curve on axes x and y with one peak and one valley. Maximum: slope of tangent line = 0. Minimum: slope of tangent line = 0.]
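A small worked example of a first order condition that can be solved exactly. The objective below, a sum of squared errors around one adjustable value β with the data x_1, ..., x_n held fixed, is an assumption chosen for illustration and is not taken from these slides:

```latex
f(\beta) = \sum_{i=1}^{n} (x_i - \beta)^2
\qquad
f'(\beta) = -2 \sum_{i=1}^{n} (x_i - \beta) = 0
\quad\Longrightarrow\quad
\hat{\beta} = \frac{1}{n} \sum_{i=1}^{n} x_i
```

So in this case the first order condition is solved by the sample mean. Whether that point is the top of a hill or the bottom of a bowl is the followup check.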


Second-order conditions


Second Derivative

Have we found a maximum or a minimum?

Second Derivative: The derivative of the first derivative. How much is the slope changing?

Literally, that would be

$$\frac{d}{dx}\left\{\frac{dy}{dx}\right\}$$

but they usually don’t write it out like that.

Meaning: you start at a critical point x and go an “itty bitty” amount to the right; how much does the slope change?

Notation:

$$f''(x) = D^2 f(x) = \frac{d^2 y}{dx^2} = \frac{d^2 f(x)}{dx^2} \tag{29}$$

I was trained by people who prefer this kind of notation, f′′(x). But the other is nice too.


Telling Your Hill from Your Bowl (Minima and Maxima)

Minima: If the slope is getting bigger around xmin, that means the impact of x is “accelerating”. You are in a bowl.

If f′(x) = 0 and f′′(x) > 0, then x is a local minimum. f is “concave up” at that point.

Maxima: If the slope is getting smaller around xmax, you are heading downward at the top.

If f′(x) = 0 and f′′(x) < 0, then x is a local maximum. f is “concave down” at x.

[Figure: a curve plotted for x from −3 to 3 and y from −1.5 to 1.5, with two critical points marked.]
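The two rules can be written as a small classifier. The example function, f(x) = x³ − 3x, is a hypothetical choice (it is not the curve in the figure); its derivatives are entered by hand from the power rule.

```python
def f_prime(x):
    # First derivative of the hypothetical f(x) = x^3 - 3x.
    return 3 * x ** 2 - 3


def f_double_prime(x):
    # Second derivative of the same function.
    return 6 * x


def classify(x, tol=1e-12):
    """Apply the first and second order conditions at the point x."""
    if abs(f_prime(x)) > tol:
        return "not a critical point"
    if f_double_prime(x) > 0:
        return "local minimum"
    if f_double_prime(x) < 0:
        return "local maximum"
    return "inconclusive"


for x in (-1.0, 0.0, 1.0):
    print(x, classify(x))
```

The “inconclusive” branch covers the case f′′(x) = 0, where the second derivative alone does not settle whether the point is a hill, a bowl, or neither.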


Multivariate

Calculus with many variables

Suppose we have 3 input variables, x1, x2, and x3.

$$y = f(x_1, x_2, x_3)$$

A partial derivative is the rate of change in f(x1, x2, x3) that results when one variable is adjusted and all of the other variables are held constant. The most common notations for the partial derivative “with respect to x1” are

$$\frac{\partial y}{\partial x_1} \tag{30}$$

or:

$$f_1(x_1, x_2, x_3)$$

Note the subscript indicates the variable under consideration.


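m First Order Conditions

f(x1, x2, x3) has 3 variables. That means it has 3 first order conditions. We require that ALL OF THE PARTIAL DERIVATIVES equal 0. That is, simultaneously solve

$$\frac{\partial y}{\partial x_1} = 0, \qquad \frac{\partial y}{\partial x_2} = 0, \qquad \frac{\partial y}{\partial x_3} = 0$$

Note, you could as well think of this as a vector of derivatives:

$$Df = \begin{bmatrix} \partial y / \partial x_1 \\ \partial y / \partial x_2 \\ \partial y / \partial x_3 \end{bmatrix} = 0 \tag{31}$$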


One tires quickly of writing down 3 rows of derivatives over and over, so one often just refers to this condition for a maximum or minimum as Df = 0.

It is very common in maximum likelihood analysis to call this the “score equation”.
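As an illustration of the condition Df = 0, the sketch below approximates the vector of partial derivatives numerically. The objective function (a hill with its peak at (1, −2, 0.5)) and the helper name `gradient` are hypothetical, and the central differences only approximate the partials.

```python
def f(x1, x2, x3):
    # Hypothetical objective with its peak at (1, -2, 0.5).
    return -((x1 - 1) ** 2 + (x2 + 2) ** 2 + (x3 - 0.5) ** 2)


def gradient(func, point, h=1e-6):
    """Approximate each partial derivative, holding the other inputs fixed."""
    grad = []
    for i in range(len(point)):
        up = list(point)
        down = list(point)
        up[i] += h      # nudge only variable i upward
        down[i] -= h    # and downward
        grad.append((func(*up) - func(*down)) / (2 * h))
    return grad


print(gradient(f, (0.0, 0.0, 0.0)))   # away from the peak: not all zero
print(gradient(f, (1.0, -2.0, 0.5)))  # at the peak: all three near zero
```

Only at the second point do all three first order conditions hold at once.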


Second order conditions

This is the point at which calculus with many variables becomes frustrating.

It is easy to calculate a second partial derivative of f() with respect to x1

$$\frac{\partial^2 y}{\partial x_1 \partial x_1} = \frac{\partial^2 y}{\partial x_1^2}$$

And one can also find the “cross partial” of ∂y/∂x1 with respect to another variable, say x2.

$$\frac{\partial^2 y}{\partial x_1 \partial x_2}$$

I prefer short hand notation like f11() or f12() for these.


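Suppose you begin with the vector of first partials. Differentiate each item by each of the 3 variables. You end up with a 3x3 matrix of second partial derivatives like so:

$$
D' =
\begin{bmatrix}
f_{11} & f_{12} & f_{13} \\
f_{21} & f_{22} & f_{23} \\
f_{31} & f_{32} & f_{33}
\end{bmatrix}
=
\begin{bmatrix}
\frac{\partial^2 y}{\partial x_1 \partial x_1} & \frac{\partial^2 y}{\partial x_1 \partial x_2} & \frac{\partial^2 y}{\partial x_1 \partial x_3} \\
\frac{\partial^2 y}{\partial x_2 \partial x_1} & \frac{\partial^2 y}{\partial x_2 \partial x_2} & \frac{\partial^2 y}{\partial x_2 \partial x_3} \\
\frac{\partial^2 y}{\partial x_3 \partial x_1} & \frac{\partial^2 y}{\partial x_3 \partial x_2} & \frac{\partial^2 y}{\partial x_3 \partial x_3}
\end{bmatrix}
\tag{32}
$$

This is the so-called Hessian matrix.

In practical applications, we find often that

1 It is time consuming to calculate the Hessian

2 Calculations are numerically unstable because of digital computing limitations

3 The matrix has some flaw which indicates that we can’t tell if we are at a maximum or not. Software will return an error about the Hessian matrix being “non-positive definite.” That is to say, we can’t say.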


Positive Definite Hessians (or the lack thereof)

I have not done a lot of this kind of work, so take this as a “back of the envelope sketch.”

Here’s the idea of a “positive definite” matrix. We are “at” (x1, x2, x3), as indicated by the first order condition.

We’ve got the second derivative matrix

$$
D' =
\begin{bmatrix}
f_{11} & f_{12} & f_{13} \\
f_{21} & f_{22} & f_{23} \\
f_{31} & f_{32} & f_{33}
\end{bmatrix}
$$

Take a small perturbation vector z. It is supposed to represent a small change from the location dictated by the first order condition. Calculate the quantity:

$$
\begin{bmatrix} z_1 & z_2 & z_3 \end{bmatrix}
\begin{bmatrix}
f_{11} & f_{12} & f_{13} \\
f_{21} & f_{22} & f_{23} \\
f_{31} & f_{32} & f_{33}
\end{bmatrix}
\begin{bmatrix} z_1 \\ z_2 \\ z_3 \end{bmatrix}
\tag{33}
$$


There’s a theorem that says:

If z′D′z > 0 for every nonzero z, then D′ is a positive definite matrix. Congratulations, you have a minimum.

If z′D′z < 0 for every nonzero z, then D′ is a negative definite matrix. It’s a maximum!

This is similar to the univariate case, where f′′(x) < 0 means you have a maximum.
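The quantity z′D′z is easy to compute directly. The sketch below evaluates it for three hypothetical Hessians (a bowl, a hill, and a saddle), using randomly drawn perturbation vectors. Sampling z this way is only a rough probe and an assumption of this example; it cannot prove definiteness, which requires the sign to hold for every nonzero z.

```python
import random


def quadratic_form(z, H):
    """Return z' H z for a vector z and a square matrix H (list of rows)."""
    n = len(z)
    return sum(z[i] * H[i][j] * z[j] for i in range(n) for j in range(n))


# Hypothetical Hessians, chosen only for illustration.
H_bowl = [[2, 0, 0], [0, 2, 0], [0, 0, 2]]      # from x1^2 + x2^2 + x3^2
H_hill = [[-2, 0, 0], [0, -2, 0], [0, 0, -2]]   # from -(x1^2 + x2^2 + x3^2)
H_saddle = [[2, 0, 0], [0, -2, 0], [0, 0, 2]]   # mixed signs

rng = random.Random(0)
zs = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(1000)]

print(all(quadratic_form(z, H_bowl) > 0 for z in zs))  # always positive
print(all(quadratic_form(z, H_hill) < 0 for z in zs))  # always negative
print(quadratic_form([1, 0, 0], H_saddle),             # positive one way,
      quadratic_form([0, 1, 0], H_saddle))             # negative another
```

The saddle is the troublesome case: the sign of the quantity depends on the direction of z, so the matrix is neither positive nor negative definite.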


Integration

Horrible Scary S symbol

The elongated S symbol represents integration.

$$\int_a^b f(x)\,dx$$

This means “the total area under the curve f(x) between a and b.”

The symbol dx represents the “dummy variable of integration.” It is a signal that you are supposed to move along the x axis when you sum up from a to b.

[Figure: a curve f(x) on axes x and y, with a and b marked. “Area under f(x) between a and b is the integral.”]


Easy: Area under a flat curve

You always drive 50 miles per hour. After 2 hours, you have gone 2 × 50 = 100 miles.

[Figure: speed f(t) (miles per hour) against time (hours), a flat line at 50 for time from 0 to 20. Distance traveled is the area under the curve: time × speed (mph).]


If you always drive exactly 50 miles per hour, the distance traveled, F(t), is easy to calculate.

[Figure: two panels. Left: speed f(t) (miles per hour) against time (hours), flat at 50. Right: distance traveled F(t) (miles) against time (hours), a straight line with the values 250, 500, 750, and 1000 marked.]

We know, for example, that at time 10, the distance traveled is 500 miles. And it is also easy to see that the distance traveled between hours 5 and 10 is F(10) − F(5) = 250.


Not So Easy: Variable Speeds

Speed, f(t), is not constant.

[Figure: speed f(t) (miles per hour), from 0 to 80, against time (hours), from 0 to 20, for a trip with variable speed.]


Variable Speed

[Figure: two panels. Left: variable speed f(t) (miles per hour) against time (hours). Right: distance traveled F(t) (miles) against time (hours), with the values 249.07, 446.02, 543.61, and 760.03 marked.]

The distance traveled after 10 hours, F(10), is 446.02, and we can see that the trip’s progress from 5 to 10 hours, 446.02 − 249.07 = 196.95 miles, is quite a bit better than the progress between 10 and 15 hours, 543.61 − 446.02 = 97.59 miles.


Between Hours 5 and 10

The distance traveled between hours 5 and 10 is the area under f(t) between those times. Using the integral notation,

$$\int_5^{10} f(t)\,dt = 196.95 \tag{34}$$

And that area is equal to F(10) − F(5).

[Figure: speed f(t) (miles per hour) against time (hours), showing the area between hours 5 and 10.]


I Just Snuck In the FTOC (without you noticing)

The functions f(t) and F(t) are linked together. For points a and b,

$$\int_a^b f(t)\,dt = F(b) - F(a) \tag{35}$$

The accumulator function F(t) is the key.

The Fundamental Theorem of Calculus states that the previous expression is true if F(t) is differentiable and

$$\frac{dF(t)}{dt} = F'(t) = f(t)$$

[Figure: the curve f(t) with points a and b marked. F(a) is the area on the left of a. F(b) is the area on the left of b. The area we want to measure, F(b) − F(a), is what’s left after the part to the left of a is removed from the part to the left of b.]
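Equation (35) can be checked numerically by adding up thin rectangles and comparing with F(b) − F(a). In the sketch below, the constant speed of 50 and F(t) = 50t come from the driving example in these slides; the second pair, g(t) = 3t² with G(t) = t³, and the helper `midpoint_area` are assumptions added for illustration.

```python
def midpoint_area(f, a, b, n=10_000):
    """Approximate the area under f between a and b with n thin rectangles,
    each topped at the midpoint of its base."""
    width = (b - a) / n
    return sum(f(a + (i + 0.5) * width) for i in range(n)) * width


def speed(t):
    return 50.0          # constant speed from the driving example


def distance(t):
    return 50.0 * t      # its accumulator function F(t)


def g(t):
    return 3 * t ** 2    # hypothetical second example


def G(t):
    return t ** 3        # an anti-derivative of g


print(midpoint_area(speed, 5, 10), distance(10) - distance(5))
print(midpoint_area(g, 1, 2), G(2) - G(1))
```

In both cases the summed area and the difference F(b) − F(a) should agree to several decimal places.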


I’m Unlikely to Prove That

But I can wave my hands wildly

The “area under the curve” and the “area of the rectangle below it” are compared.

Outer edges: a and b = a + ∆t.

The area of the rectangle (∆t · f(b)) is easily seen to be smaller than the area under the curve.

[Figure: the curve f(t) between a and b = a + ∆t, with heights f(a) and f(b) marked. Area of rectangle is ∆t · f(b). The mismatch between the area under the curve and the area of the rectangle shrinks as ∆t shrinks.]


Could Adjust the Area Measurement

Suppose we wanted to leave a and b far apart and use an approximating rectangle.

We could! Just “top” the rectangle at the correct spot (known as the “Mean Value Theorem”)

Goldilocks

If f(a) gives a rectangle that is “too big,” and

if f(b) gives a rectangle that is too small,

f(c) is just right!

[Figure: the curve f(t) between a and b = a + ∆t, with heights f(a), f(c), and f(b) marked. Area of rectangle is ∆t · f(a).]


Make ∆t → 0.

As ∆t → 0, all three of the rectangles would converge to the same size. See?

b → a and c → a.

$$\lim_{\Delta t \to 0} f(c) = f(a)$$

$$\lim_{\Delta t \to 0} f(b) = f(a).$$

Now consider the “just right” area measurement:

$$F(a + \Delta t) - F(a) = f(c) \times \Delta t \tag{36}$$

Divide both sides by ∆t.


$$\frac{F(a + \Delta t) - F(a)}{\Delta t} = f(c) \tag{37}$$

The left hand side is starting to look like a derivative, so take limits.

$$\lim_{\Delta t \to 0} \frac{F(a + \Delta t) - F(a)}{\Delta t} = \lim_{\Delta t \to 0} f(c) \tag{38}$$

$$\frac{dF(a)}{dt} = f(a)$$

In words, the derivative of F(t) at point a equals the value of f(t) at a.

If we only had F(t), life would be sweet!

And we do, sorta...

F(t) is the function which, when differentiated, would be equal to f(t). The slope of F(t) is f(t). For this reason, F(t) is known as an “anti-derivative” of f(t).


If we were doing this numerically

We sometimes have no analytical method to calculate F(x), especially in multiple dimensions. So there is a high priority on developing numerical methods that can approximate it.

[Figure: speed f(t) (miles per hour), from 0 to 80, against time (hours), from 0 to 20.]


Finer Rectangles = More Calculation

Waste more computing time, get a better answer.

This “brute force” numerical approach only works in low-dimensional problems

Advanced optimization jargon in your future

Gaussian quadrature: smart ways to choose the rectangle positions

Monte Carlo math: estimate area by random sampling (GHK algorithm)

[Figure: speed f(t) (miles per hour), from 0 to 80, against time (hours), from 0 to 20.]
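The trade-off between rectangle count and accuracy can be seen in a short experiment. Everything specific here is an assumption for illustration: the speed function is made up (it is not the curve in the figure), it was chosen so that an exact anti-derivative is available for comparison, and the Monte Carlo estimate is plain random sampling, not the GHK algorithm mentioned above.

```python
import math
import random


def speed(t):
    # Hypothetical variable speed in miles per hour.
    return 40 + 30 * math.sin(t / 3)


def exact_distance(a, b):
    # Uses the anti-derivative 40 t - 90 cos(t / 3) of the hypothetical speed.
    return (40 * b - 90 * math.cos(b / 3)) - (40 * a - 90 * math.cos(a / 3))


def rectangle_area(f, a, b, n):
    """Sum n rectangles of equal width, each topped at its left edge."""
    width = (b - a) / n
    return sum(f(a + i * width) for i in range(n)) * width


def monte_carlo_area(f, a, b, draws, seed=0):
    """Average f at random points, then multiply by the interval length."""
    rng = random.Random(seed)
    total = sum(f(rng.uniform(a, b)) for _ in range(draws))
    return (b - a) * total / draws


truth = exact_distance(0, 20)
for n in (10, 100, 1000):
    print(n, rectangle_area(speed, 0, 20, n) - truth)
print("monte carlo error:", monte_carlo_area(speed, 0, 20, 100_000) - truth)
```

The rectangle error shrinks as n grows, at the price of more function evaluations. Random sampling is usually less accurate in one dimension, but its cost does not explode when the number of dimensions goes up, which is why it appears in the jargon list.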


Conclusion

Key ideas: Derivative

There are many details, but the essence of the differential calculus boils down to this.

Critical Point

Derivative: where slope is 0

Local Maximum

Second Order Conditions: Are we at the top of a hill?

[Figure: Output plotted against Input, both axes from 0 to 30, with labeled Critical Points: Local Maximum, Inflection Point, Global Maximum, Global Minimum.]


Key ideas: Integrals

Possible to calculate (analytically or numerically) the area under a curve.

Analytically, the Fundamental Theorem of Calculus gives conditions under which the area can be calculated exactly, without numerical approximation.

Many applications in probability have analytical solutions.

In practice, the models we want to solve are generally not analytically solvable, so computer approximation is vital.

[Figure: probability density of 40 Yard Dash Times of 300 Pound Men (seconds), horizontal axis from 4 to 12, vertical axis from 0.00 to 0.30.]


Where to go from here

Step number: don’t let your skin crawl when somebody says “derivative”, “integral”, or “Calculus”

If you are going into a field where you will be deriving statistical estimators, or using “cutting edge tools,” NOW is the time to refresh your Calculus skills.
