executive summary: calculus -...
TRANSCRIPT
Executive Summary: Calculus 1 / 81
Executive Summary: Calculus
Paul E. Johnson
August 10, 2015
Executive Summary: Calculus 2 / 81
What is Calculus for?
1. Vocabulary of Relationships.
Calculus is the studyof curves, change, andaccumulation.
Calculus gives uswords to describerelationships.
a)
f (x)
X
Y
b) X
Y
f (x)
c) X
f (x)
Y
d) X
f (x)
Y
Executive Summary: Calculus 3 / 81
What is Calculus for?
2. Optimization (Find Top or Bottom)
An optimization problem usually has 3 parts.
1 An objective function, say f .
2 Although we often write about f (x) as a matter of habit, that’smisleading and probably confusing to most new students. Almost allstatistical calculations take the data, usually called x , as fixedquantities, and instead they adjust coefficients β or θ (or whatever).But I write about f (x) as well, just because it is traditional.
3 Constraints that restrict the extent to which we can adjust theinputs.
Finding the top of a hill is a primary objective in math & engineeringafter high school.
Executive Summary: Calculus 4 / 81
What is Calculus for?
Sometimes, We Wander With Purpose
Choosing the horizontal variableto maximize the vertical.
We are getting “higher” if theresult is increasing
At the top, we are at a pointwhere we can’t move ineither direction withoutreducing the outcome.
This is differential calculus!.
An important characteristicof an optimum is that theslope of the objectivefunction is 0 at the optimumpoint. We are at the top ofthe hill. 0 5 10 15 20 25 30
05
1015
2025
30
Adjust left and right to find the maximum
function: wanderObjective
to the top
warmer
warmerYou are
some hints!
hot
warmer
colder
colder
I'll give you
getting
Executive Summary: Calculus 5 / 81
What is Calculus for?
Why is that so Difficult?
Some functions are easy to optimize& visualize
Some are difficult, even with acomputer
Some are analytically solvable, sono numerical approximation isneeded
But most of “maximum likelihood”and advanced stats are not“solvable”
b̂1 b̂2
Tangent plane
When an optimum point is found, it has the property that the slope ofthe function at that point is 0.
Executive Summary: Calculus 6 / 81
What is Calculus for?
3: Understanding Diversity
This is a probability modelfor 40 yard dash times
Very rare to find a 300pound man who can run 40yards in 5 seconds or less.
How rare is it? IntegralCalculus is used to find thearea of the shaded area.
4 6 8 10 12
0.0
00
.05
0.1
00
.15
0.2
00
.25
0.3
0
40 Yard Dash Times of 300 Pound Men (seconds)
Executive Summary: Calculus 7 / 81
Necessary Terminology
Real Numbers, R
Does it go without saying that we are studying the “real numbers”?
Includes all fractions, decimal numbers, integers, and 0Familiar to most school children!
Definition
The Real Numbers
Math book will give a definition about a closed group of values, a setwhere we can add and multiply the members and get back values that arealso in the same set. A set of values (x1, x2, x3,)
1 x1 − x2 and x1 + x2 are in R1 x1 · x2 and x1/x2 are in R2 There exists a value 1 such that 1 · x1 = x1
3 There exists a value 0 that can nullify any value:0 · x1 = 0.
Executive Summary: Calculus 8 / 81
Necessary Terminology
Cartesian Plane
Cartesian plane: All pairs you can get bychoosing 2 numbers from R.
A point is an ordered pair that isrepresented by a dot in the plane.
Anticipate confusion because various fieldsre-name the axes. In political sciencerational choice models, we often refer tothem as X1 and X2 and points are (x1, x2),(y1, y2) and so forth. This notation maybe nicer because, in a 10 dimensionalmodel, it is easier to remember(x1, x2, . . . , x10) than to give special lettersfor every dimension.
Executive Summary: Calculus 9 / 81
Necessary Terminology
Critical Terminology
Critical point value of input atwhich the outputis either at the topof a peak or in thebottom of a valley.
Local vs Global Maximum.
0 5 10 15 20 25 30
05
10
15
20
25
30
Input
Outp
ut
Critical
Points
Local Maximum
Inflection
Point
Global Maximum
Minimum
Global
Executive Summary: Calculus 10 / 81
Necessary Terminology
We study Relationships
concavity: curvature
“concave down”
“concave up”
Important because we useconcavity to knowif we are at aminimum or amaximum.
f (x)
X
Y
X
Y
f (x)
Executive Summary: Calculus 11 / 81
Necessary Terminology
Detour: Convex Combinations
Concave down means“peaked”, generally.
A more formal definitiontakes us into a detour on theidea of convex combination.
Consider a weightingcoefficient λ that varies froma to b
A convex combination of twopoints is
λa + (1− λ)b
a b
The convex combination
λa+ (1 − λ)b
λ = 1
Executive Summary: Calculus 12 / 81
Necessary Terminology
Detour: Convex Combinations
A convex combination of twopoints is
λa + (1− λ)b
Please notice the beautifulillustration, which tookabout 5 hours of fiddlingabout
a b
The convex combination
λa+ (1 − λ)b
λ = 1
λ = 0.7
0.7a+ (1 − 0.7)b
Executive Summary: Calculus 13 / 81
Necessary Terminology
Detour: Convex Combinations
A convex combination of twopoints is
λa + (1− λ)b
Please notice the beautifulillustration, which tookabout 5 hours of fiddlingabout
a b
The convex combination
λa+ (1 − λ)b
λ = 1
λ = 0.7
0.7a+ (1 − 0.7)b
λ = 0.3
0.3a+ (1 − 0.3)b
Executive Summary: Calculus 14 / 81
Necessary Terminology
Detour: Convex Combinations
A convex combination of twopoints is
λa + (1− λ)b
Please notice the beautifulillustration, which tookabout 5 hours of fiddlingabout
a b
The convex combination
λa+ (1 − λ)b
λ = 1
λ = 0.7
0.7a+ (1 − 0.7)b
λ = 0.3
0.3a+ (1 − 0.3)b
λ = 0
Executive Summary: Calculus 15 / 81
Necessary Terminology
Convex Combination in 2 Dimensions
Points are pairs, like (x1, y1),and (x2, y2)
In other things I’ve written,I’ve used notation like(x1, x2), (y1, y2), (z1, z2), solets be cautious aboutconfusing the presenter.
Calculate the value of thecombination
λ(x1, y1) + (1− λ)(x2, y2)
= (λx1 + (1− λ)x2, λy1 + (1− λ)y2)
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.2
0.4
0.6
0.8
1.0 Calculation of
λ(x1, y1) + (1 − λ)(x2, y2)Traces a line from (x1, y1)
●
●
(x1, y1)
(x2, y2)
λ = 1
Executive Summary: Calculus 16 / 81
Necessary Terminology
Convex Combination in 2 Dimensions
λ(x1, y1) + (1− λ)(x2, y2)
= (λx1+(1−λ)x2, λy1+(1−λ)y2)
As λ varies from 1 to 0, thecombination “traces” astraight line between the twopoints
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.2
0.4
0.6
0.8
1.0 Calculation of
λ(x1, y1) + (1 − λ)(x2, y2)Traces a line from (x1, y1) to (x2, y2)
●
●
(x1, y1)
(x2, y2)
λ = 1λ = 0.7
Executive Summary: Calculus 17 / 81
Necessary Terminology
Convex Combination in 2 Dimensions
λ(x1, y1) + (1− λ)(x2, y2)
= (λx1+(1−λ)x2, λy1+(1−λ)y2)
As λ varies from 1 to 0, thecombination “traces” astraight line between the twopoints
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.2
0.4
0.6
0.8
1.0 Calculation of
λ(x1, y1) + (1 − λ)(x2, y2)Traces a line from (x1, y1) to (x2, y2)
●
●
(x1, y1)
(x2, y2)
λ = 1λ = 0.7
λ = 0.4
Executive Summary: Calculus 18 / 81
Necessary Terminology
Convex Combination in 2 Dimensions
λ(x1, y1) + (1− λ)(x2, y2)
= (λx1+(1−λ)x2, λy1+(1−λ)y2)
As λ varies from 1 together0, the combination “traces” astraight line between the twopoints
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.2
0.4
0.6
0.8
1.0 Calculation of
λ(x1, y1) + (1 − λ)(x2, y2)Traces a line to (x2, y2)
●
●
(x1, y1)
(x2, y2)
λ = 1λ = 0.7
λ = 0.4
λ = 0
Executive Summary: Calculus 19 / 81
Necessary Terminology
Convexity of Sets Defined Similarly
A convex set has to be roundor egg shaped. It can’t haveany lacunae or kidneyshapes.
Formally, we say any theconvex combination of anypair of points in the set liesentirely within the set.
0 2 4 6 8 10
02
46
810
A Convex Set
X
Any convex combination
(λxi + (1 − λ)xj, λyi + (1 − λ)yj)belongs to X
(x1, y1)
(x2, y2)
●
●
(x3, y3)
(x4, y4)
●
●
(x5, y5)
(x6, y6)
●
●
Executive Summary: Calculus 20 / 81
Necessary Terminology
And We Redefine Concave Down
A function is concave down ifwe can select any two pointson the function’s image andthe line connecting them isentirely below the function
Phrase that as convexcombination of points.
A Concave Down Function
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.2
0.4
0.6
0.8
1.0
All line segments are below f(x)
●
●
●
●●
●
Executive Summary: Calculus 21 / 81
Necessary Terminology
We study relationships.
Inflection Point: where concavitychanges.
0 5 10 15 20 25 30
05
10
15
20
25
30
Input
Outp
ut
Critical
Points
Local Maximum
Inflection
Point
Global Maximum
Minimum
Global
Executive Summary: Calculus 22 / 81
Necessary Terminology
Continuity
Continuous: informal def: Candraw withoutlifting pencil frompaper
Note how the value of thefunction “hops” at one point.
The open loop indicates that thevalue of thefunction f (x) isundefined.
“Pinholes” like this interfere withanalysis, we (usually) assumefunctions are continuous.
X
f (x)
Y
Executive Summary: Calculus 23 / 81
Necessary Terminology
Notation: Open & Closed Intervals
Closed Interval: [0, 1] all real values between 0 and 1, INCLUDING 0and 1
Open Interval: (0, 1) all real values between 0 and 1, NOTINCLUDING 0 and 1
As long as we know a function’s domain is closed, then we know (forsure!) that the function has a well defined maxima and minima
On open intervals, minimum may not exist: pick the smallest valueinside (0, 1). Undefined!
Executive Summary: Calculus 24 / 81
Differential Calculus
The Big ideas
Outline
1 What is Calculus for?
2 Necessary Terminology
3 Differential CalculusThe Big ideasDerivatives I can Remember
4 OptimizationFirst Order ConditionsSecond-order conditionsMultivariate
5 Integration
6 Conclusion
Executive Summary: Calculus 25 / 81
Differential Calculus
The Big ideas
We are Aiming to Understand This
0 5 10 15 20 25 30
05
10
15
20
25
30
Slope of this tangent line is 0
Slope > 0
Slope < 0
Objective
x
functionf (x)
Executive Summary: Calculus 26 / 81
Differential Calculus
The Big ideas
Slope of a Straight Line
Comparing two points (x1, y1)and (x2, y2):
differencehorizontally∆x = x2 − x1.
difference vertically∆y = y2 − y1.
Slope is the ratio of the twochanges.
slope =∆y
∆x=
y2 − y1
x2 − x1(1)
“the rise over the run.”
0 5 10 15 20 25 30
05
10
15
20
25
30
X
Y
is the same,
no matter how
you pick
The slope
or
(x1,y1)
∆x
∆y(x2,y2)
x1
x1 x2
y1
y2
x2
Executive Summary: Calculus 27 / 81
Differential Calculus
The Big ideas
Slope is a Local Concept
A “local” property is a property thatis not “true everywhere.” It is not a“global” property.
The calculation of the slopedepends on “where you are,” eitheron the left or the right of the breakpoint.
0 5 10 15 20 25 30
05
10
15
20
25
30
X
Y
$y_1$
y2
y3
∆y
∆x(x2,y2)
x1 x2 x3
(x1,y1)
Executive Summary: Calculus 28 / 81
Differential Calculus
The Big ideas
Derivative: the slope of a “smooth curve”.
Consider the smoothcurve.
1 Start at x1
2 Step to x2
3 Stem to x3
0 5 10 15 20 25 30
05
10
15
20
25
30
(x2, y2)
(x1, y1)
(x3, y3)
x x3 x2 x1
y3
y2
y1
(x , y)y
f (x)
Executive Summary: Calculus 29 / 81
Differential Calculus
The Big ideas
Approaching the tangent
Keep making
But
converges to the slope
and
smaller and smaller
shrinks.
at
of the line
that is tangent to
(x , y)
∆y∆x
∆x
∆y
f (x) (x , y) (x5, y5)
(x4, y4)
(x3, y3)
“x” is a particularpoint and valuesx1, x2, x3 get closerand closer to x .
As xi gets closer to x ,the slope of thedotted line, ∆y
∆x , getssteeper.
Executive Summary: Calculus 30 / 81
Differential Calculus
The Big ideas
Derivative Definition
The Derivative is the value of ∆y∆x when ∆x shrinks to nothing.
The dotted line–the tangent line–is “just barely” touching f (x).The derivative is, formally speaking, the slope of the tangent line.
Very Important: This is a smooth, “differentiable” function. It is notjagged, there are no gaps.
Executive Summary: Calculus 31 / 81
Differential Calculus
The Big ideas
Optimization from a Derivative’s point of View
0 5 10 15 20 25 30
05
10
15
20
25
30
Slope of this tangent line is 0
Slope > 0
Slope < 0
Objective
x
functionf (x)
Optimization: Find a point at which thetangent line’s slope is 0
Executive Summary: Calculus 32 / 81
Differential Calculus
The Big ideas
Various Notations for Derivatives
One classic notation is:
df (x)
dx(2)
or y = f (x) then:
dy
dx(3)
More succinct: “f prime of x”, as in:
f ′(x) (4)
My engineering friends like:
Df (x) (5)
Executive Summary: Calculus 33 / 81
Differential Calculus
The Big ideas
Detour: Limits
Derivatives cannot be defined unless we have the tool known as “thelimit”.
Consider a function f (x)
As x tends to infinity, what doesf (x) tend towards?It tends toward 4.0, but it never“quite gets there”
As x tends to 0, f (x) tends to −∞,but it never “gets there”.
For 0 < x <∞, f (x) can becalculated. It is finite.
for x = 50, f (x) is defined, 3.894 x
y
0 100 200 300 400 500
3.0
3.5
4.0
4.5
5.0
(x1 = 50, y1 = 3.894)
y = 4 − 5.3× (1 x)
Executive Summary: Calculus 34 / 81
Differential Calculus
The Big ideas
Limit Definition
Notation for limits. When x is tending to any value x0 is
limx→x0 f (x)
The arrow → is pronounced “goes to” or “approaches”.
Easy Case: if f (x0) exists (is defined, finite) that is the limit.
More Difficult: f (x0) may not exist, but a limit may still exist.
Executive Summary: Calculus 35 / 81
Differential Calculus
The Big ideas
As x goes to ∞, 1/x goes to...
1/x gets smaller as x gets larger, butnever reaches 0. However
limx→∞1
x= 0
x
y
0 2 4 6 8 100
24
68
10
f(x) = 1 xlimitx→∞ = 0
Executive Summary: Calculus 36 / 81
Differential Calculus
The Big ideas
Derivative: Formal Definition
Consider a continuous function f (x).
1 Add “just a bit” to x , we arrive at x + ∆x .
2 Resulting change in y :
∆y = f (x + ∆x)− f (x) (6)
3 The derivative–is defined only if the following limit exists:
lim∆x→0∆y
∆x= lim∆x→0
f (x + ∆x)− f (x)
∆x(7)
Executive Summary: Calculus 37 / 81
Differential Calculus
The Big ideas
Linearity: A Vital Property of Derivatives
In some optimization problems in statistics, we are trying to find thesmallest sum of squared errors or to maximize the likelihood of asample by adjusting 10s or 100s of coefficients.
If we can manipulate the formula so that it is a sum of terms, wecan get the separate derivatives and simplify the work.
Additivity: The derivative of the sum is the sum of the derivatives.
For functions f and g , their combined slope breaks down to the sum ofthe separate slopes.
d
dx{f (x) + g(x)} =
d
dxf (x) +
d
dxg(x) = f ′(x) + g ′(x) (8)
Executive Summary: Calculus 38 / 81
Differential Calculus
The Big ideas
Scaling Factors
Also in statistics, we have data that is rescaled, from Euro toDollars, Fahrenheit to Celsius, and so forth.
It is enormously simplifying to know that
Proportionality: the derivative of a · f (x) is a times the derivative of f .
d
dxa · f (x) = a · d
dxf (x) = a · f ′(x) (9)
Executive Summary: Calculus 39 / 81
Differential Calculus
The Big ideas
Break Apart Complicated Problems
Complicated expressions can be “broken down”
d
dx{a · f (x) + b · g(x)} = a
d
dxf (x) + b
d
dxg(x) (10)
Executive Summary: Calculus 40 / 81
Differential Calculus
Derivatives I can Remember
Outline
1 What is Calculus for?
2 Necessary Terminology
3 Differential CalculusThe Big ideasDerivatives I can Remember
4 OptimizationFirst Order ConditionsSecond-order conditionsMultivariate
5 Integration
6 Conclusion
Executive Summary: Calculus 41 / 81
Differential Calculus
Derivatives I can Remember
Powers of x.
1. Slope of line.If
y = b · x (11)
then (is this too obvious?)
dy
dx= b (12)
Digression: Think backwards for a minute. You usually think of b asa constant, but change gears for a minute to noticedydb = x
Executive Summary: Calculus 42 / 81
Differential Calculus
Derivatives I can Remember
Powers of x.
2a. Slope of x squared. Ify = x2 then
dy
dx= 2x (13)
x
y
−4 −2 2 4
−6
−4
−2
24
6
y = x2
●
●(− 2, 4)
●(1, 1)
●(2, 4)
tanget at x = 1 has slope = 2
●(− 1.25, 1.5625)tanget at x = −1.25
has slope = −2.5
The slope of x2 is 2x
Executive Summary: Calculus 43 / 81
Differential Calculus
Derivatives I can Remember
Another view of that
x
y
−4 −2 2 4
−6
−4
−2
24
6
y = x2
●
●(− 2, 4)
●(1, 1)
●(2, 4)
tanget at x = 1 has slope = 2
●(− 1.25, 1.5625)tanget at x = −1.25
has slope = −2.5
The slope of x2 is 2x
x
y
−4 −2 0 2 4
−6
−4
−2
12
46
y = x2
slope of curve is 2x (the red line)
●
●
Executive Summary: Calculus 44 / 81
Differential Calculus
Derivatives I can Remember
We worked so hard on that one figure...
x
y
−4 −2 2 4
−6
−4
−2
24
6
y = x2
●
●(− 2, 4)
●(1, 1)
●(2, 4)
tanget at x = 1 has slope = 2
●(− 1.25, 1.5625)tanget at x = −1.25
has slope = −2.5
The slope of x2 is 2x
slope of curve is 2x (the red line)
●
●
Executive Summary: Calculus 45 / 81
Differential Calculus
Derivatives I can Remember
Proof of d(x2)/dx = 2x
That is one of the easiest ones to prove and “really believe.” Use thedefinition:
dy
dx= lim∆x→0
(x + ∆x)2 − x2
∆x(14)
= lim∆x→0x2 + 2x∆x + (∆x)2 − x2
∆x(15)
= lim∆x→02x∆x + (∆x)2
∆x(16)
= lim∆x→0(2x + ∆x) ·∆x
∆x(17)
= lim∆x→02x + ∆x (18)
= 2x (19)
Anyway, I believe in that, and take a lot of the rest on faith (insert smileyface here, please).
Executive Summary: Calculus 46 / 81
Differential Calculus
Derivatives I can Remember
More Powers of x.
2b. Slope of a cube. Ify = x3
thendy
dx= 3x2
2c. Slope of y = x4.dy
dx= 4x3
2d. Slope of y = x−1 = 1/x
dy
dx= −1x−2
2e. Slope of√
x = x1/2
dy
dx=
1
2x−1/2
Executive Summary: Calculus 47 / 81
Differential Calculus
Derivatives I can Remember
More Powers of x. ...
You start to notice a general pattern? Here’s the most importantderivative rule:
d
dxxN = N · xN−1 (20)
That is true, whether N is a whole number or a fraction.
Executive Summary: Calculus 48 / 81
Differential Calculus
Derivatives I can Remember
Logarithms
3. The natural logarithm.
d
dxln(x) =
1
x(21)
If you have a log to a differentbase, say log10(), the derivativeinvolves a “constant ofproportionality.”
d
dxlogb(x) =
1
ln(b)· 1
x(22)
0 2 4 6 8
−1
01
23
x
y
● Slope at 1 is 1
● Slope at 0.25 is 4
● Slope at 3 is 1/3
●
Slope at 6 is 1/6
y = ln(x)
dy
dx= 1 x
Executive Summary: Calculus 49 / 81
Differential Calculus
Derivatives I can Remember
Exponentials
Recall Euler’s constant, e, the base of the natural logarithm.
ex = exp(x) (23)
4. The derivative is:
d
dxex =
d
dxexp(x) = exp(x) (24)
In other words, you “get the same thing back”.As in the case of the logarithm, if you are taking powers of some numberbesides e then a constant of proportionality enters the picture. The booksays
d
dxbx = ln(b) · bx (25)
Note that ln(e) = 1, so this is consistent if e is the base.
Executive Summary: Calculus 50 / 81
Differential Calculus
Derivatives I can Remember
Derivative of a product
5. Product rule
d
dx{g(x) · h(x)} =
d
dxg(x) · h(x) + g(x)
d
dx· h(x) (26)
or, if you like the prime notations,
d
dx{g(x) · h(x)} = g ′(x)h(x) + g(x) · h′(x)
Executive Summary: Calculus 51 / 81
Differential Calculus
Derivatives I can Remember
Function of a function
6. The chain rule states that:
d
dx{f (g(x))} =
df
dx|g(x) ·
dg(x)
dx(27)
That’s the derivative of f (x) calculated at the location given by the valueg(x), multiplied by the derivative of g(x). Confusing enough? Probably.Suppose, for example, you had
g(x) = x2
andf (x) = ln(x)
so
f (g(x)) = ln(x2)
d
dxf (g(x)) =
1
x2· 2x
Executive Summary: Calculus 52 / 81
Optimization
First Order Conditions
Outline
1 What is Calculus for?
2 Necessary Terminology
3 Differential CalculusThe Big ideasDerivatives I can Remember
4 OptimizationFirst Order ConditionsSecond-order conditionsMultivariate
5 Integration
6 Conclusion
Executive Summary: Calculus 53 / 81
Optimization
First Order Conditions
Find where the Derivative equals 0
In many statistical problems, we can exactly solve to find theoptimal parameter estimates.
This is done by finding a solution for the derivative–the “first ordercondition”–and doing some followup checking.
Executive Summary: Calculus 54 / 81
Optimization
First Order Conditions
Peaks And Valleys
We suppose we are adjusting aparameter, the thing we have beencalling x in these notes. We find a pointwhere
f ′(x) = 0 (28)
it means we have found the exact “topof the hill.” (Yeah!)Or in the “bottom of the bowl.” (Booh!)
x
y
Maximum: slope tangent line = 0
Minimum: slope tangent line = 0
Executive Summary: Calculus 55 / 81
Optimization
Second-order conditions
Outline
1 What is Calculus for?
2 Necessary Terminology
3 Differential CalculusThe Big ideasDerivatives I can Remember
4 OptimizationFirst Order ConditionsSecond-order conditionsMultivariate
5 Integration
6 Conclusion
Executive Summary: Calculus 56 / 81
Optimization
Second-order conditions
Second Derivative
Have found a maximum or minimum?
Second Derivative: The derivative of the first derivative. How much isthe slope changing?
Literally, that would bed
dx
{dy
dx
}but they usually don’t write it out like that.
Meaning you start at a critical point x and go an “itty bitty” amountto the right, how much does the slope change.
Notation:
f ′′(x) = D2f (x) =d2y
dx2=
d2f (x)
dx2(29)
I was trained by people who prefer this kind of notation, f ′′(x). But theother is nice too
Executive Summary: Calculus 57 / 81
Optimization
Second-order conditions
Telling Your Hill from Your Bowl (Minima and Maxima)
Minima: If the slope is gettingbigger around xmin, thatmeans the impact of x is“accelerating”. You are ina bowl.
If f ′(x) = 0 and f ′′(x) > 0, then xis a local minimum. f is “concaveup” at that point.
Maxima: If the slope is gettingsmaller around xmax , youare heading downward atthe top.
If f ′(x) = 0 and f ′′(x) < 0, then xis a local maximum. f is “concavedown” at x .
−3 −2 −1 0 1 2 3
−1.
5−
1.0
−0.
50.
00.
51.
01.
5
x
y
●
●
Executive Summary: Calculus 58 / 81
Optimization
Multivariate
Outline
1 What is Calculus for?
2 Necessary Terminology
3 Differential CalculusThe Big ideasDerivatives I can Remember
4 OptimizationFirst Order ConditionsSecond-order conditionsMultivariate
5 Integration
6 Conclusion
Executive Summary: Calculus 59 / 81
Optimization
Multivariate
Calculus with many variables
Suppose we have 3 input variables, x1, x2, and x3.
y = f (x1, x2, x3)
A partial derivative is the change in f (x1, x2, x3) that results when all ofthe variables are being held constant except one. The most commonnotations for the partial derivative “with respect to x1” are
∂y
∂x1(30)
or:f1(x1, x2, x3)
Note the subscript indicates the variable under consideration
Executive Summary: Calculus 60 / 81
Optimization
Multivariate
m First Order Conditions
f (x1, x2, x3) has 3 variables. That means it has 3 first order conditions.We require that ALL OF THE PARTIAL DERIVATIVES equal 0. That is,simultaneously solve
∂y
∂x1= 0
∂y
∂x2= 0
∂y
∂x3= 0
Note, you could as well think of this as a vector of derivatives:
Df =
∂y∂x1∂y∂x2∂y∂x3
= 0 (31)
Executive Summary: Calculus 61 / 81
Optimization
Multivariate
m First Order Conditions ...
One tires quickly of writing down 3 rows of derivatives over and over, soone often just refers to this condition for a maximum or minimum asDf = 0.It is very common in maximum likelihood analysis to call this the “scoreequation”.
Executive Summary: Calculus 62 / 81
Optimization
Multivariate
Second order conditions
This is the point at which calculus with many variables turns to hellbecomes frustrating.It is easy to calculate a second partial derivative of f ()with respect to x1
∂2y
∂x1∂x1=
∂2y
∂2x1
And one can also find the “cross partial” of ∂y∂x1
with respect to anothervariable, say x2.
∂2y
∂x1∂x2
I prefer short hand notation like f11() or f12() for these.
Executive Summary: Calculus 63 / 81
Optimization
Multivariate
Second order conditions ...
Suppose you begin with the vector of first partials. Differentiate eachitem by each of the 3 variables. You end up a 3x3 matrix of secondpartial derivatives like so:
D ′ =
f11 f12 f13
f21 f22 f23
f31 f32 f33
=
∂2y
∂x1∂x1
∂2y∂x1∂x2
∂2y∂x1∂x3
∂2y∂x2∂x1
∂2y∂x2∂x2
∂2y∂x2∂x3
∂2y∂x3∂x1
∂2y∂x3∂x2
∂2y∂x3∂x3
(32)
This is the so-called Hessian matrix.In practical applications, we find often that
1 Time consuming to calculate the Hessian
2 Calculations are numerically unstable because of digital computinglimitations
3 The matrix has some flaw which indicates that we can’t tell if we areat a maximum or not. Software will return an error about theHessian matrix being “non-positive definite.” That is to say, we can’tsay.
Executive Summary: Calculus 64 / 81
Optimization
Multivariate
Positive Definite Hessians (or the lack thereof)
I have not done a lot of this kind of work, so take this as a “back of theenvelope sketch.”Here’s the idea of a “positive definite” matrix. We are “at” (x1, x2, x3), asindicated by the first order condition.We’ve got the second derviative matrix
D ′ =
f11 f12 f13
f21 f22 f23
f31 f32 f33
Take a small perturbation vector z. It is supposed to represent a smallchange from the location dictated by the first order condition. Calculatethe quantity:
[z1, z2, z3]
f11 f12 f13
f21 f22 f23
f31 f32 f33
z1
z2
z3
(33)
Executive Summary: Calculus 65 / 81
Optimization
Multivariate
Positive Definite Hessians (or the lack thereof) ...
There’s a theorem that says:If z ′D ′z > 0, then D ′ is a positive definite matrix. Congratulations, youhave a minimum.If z ′D ′z < 0, then D ′ is a negative definite matrix. Its a maximum!This is similar to the univariate case, where f ′′(x) < 0 means you have amaximum.
Executive Summary: Calculus 66 / 81
Integration
Horrible Scary S symbol
The elongated S symbol representsintegration.
ˆ b
a
f (x)dx
This means “the total area under thecurve f (x) between a and b.”
The symbol dx represents the“dummy variable of integration.” Itis a signal that you are supposed tomove along the x axis when yousum up from a to b.
a b x
f(x)
y
Area under f(x) between a and b is the integral
Executive Summary: Calculus 67 / 81
Integration
Easy: Area under a flat curve
You always drive 50 miles per hour. After 2 hours, you have gone2× 50 = 100 miles.
Time (hours)
Spe
ed (
mile
s pe
r ho
ur)
0 5 10 15 20
010
2030
4050
60
f(t)
Distance traveled is the areaunder the curve.time × speed (mph).
Executive Summary: Calculus 68 / 81
Integration
Easy: Area under a flat curve
If you always drive exactly 50 miles per hour, the distance traveled, F(t),is easy to calculate.
Time (hours)
Spe
ed (
mile
s pe
r ho
ur)
0 5 10 15 20
010
2030
4050
60
f(t)
Time (hours)
Dis
tanc
e T
rave
led
(mile
s)
0 5 10 15 200
200
400
600
800
250 ●
500 ●
750 ●
1000 ●
F(t)
We know, for example, that at time 10, the distance traveled is 500miles. And it is also easy to see that the distance traveled between hours5 and 10 is F (10)− F (5) = 250.
Executive Summary: Calculus 69 / 81
Integration
Not So Easy: Variable Speeds
Time (hours)
Spe
ed (
mile
s pe
r ho
ur)
0 5 10 15 20
020
4060
80
f(t)
Speed, f (t), is not constant.
Executive Summary: Calculus 70 / 81
Integration
Variable Speed
Time (hours)
Spe
ed (
mile
s pe
r ho
ur)
0 5 10 15 20
020
4060
80
f(t)
Time (hours)
Dis
tanc
e T
rave
led
(mile
s)
0 5 10 15 20
020
040
060
080
0
249.07 ●
446.02 ●
543.61 ●
760.03 ●
F(t)
The distance traveled after 10 hours, F (10), is 446.02, and we can seethat the trip’s progress from 5 to 10 hours, 446.02− 249.07 = 196.95miles, is quite a bit better than the progress between 10 and 15 hours,543.61− 446.02 = 97.59 miles.
Executive Summary: Calculus 71 / 81
Integration
Between Hours 5 and 10
Time (hours)
Spe
ed (
mile
s pe
r ho
ur)
0 5 10 15 20
020
4060
80
f(t)
The distance traveled between hours 5and 10. Using the integral notation,
ˆ 10
5
f (t)dt = 196.95 (34)
And that area is equal toF (10)− F (5).
Executive Summary: Calculus 72 / 81
Integration
I Just Snuck In the FTOC (without you noticing)
The functions f (t) and F (t) arelinked together. For points a andb,
ˆ b
a
f (t)dt = F (b)− F (a) (35)
The accumulator function F (t) isthe key.
The Fundamental Theorem ofCalculus states that the previousexpression is true if F (t) isdifferentiable and
dF (t)
dt= F ′(t) = f (t)
b
The areawe want tomeasure is
removed from the part to the left
f (t)
the part to the left of a is
of b
F (b)− F (a)
F (b) is the
what’s left after
a
F (a) is the area
area on the left of b
area on the left of a
Executive Summary: Calculus 73 / 81
Integration
I’m Unlikely to Prove That
But I can wave my handswildly
The differencebetween the “areaunder the curve” andthe “area of therectangle below it” arecompared.
Outer edges: a andb = a + ∆t.
The area of therectangle (∆t · f (b))is easily seen to besmaller than the areaunder the curve.
Area of rectangle is
under curve and area of
a b = a + ∆t
f (a)
∆t · f (b)
f (b) f (t)
Mismatch between area
rectangle shrinks as ∆t shrinks
∆t
Executive Summary: Calculus 74 / 81
Integration
Could Adjust the Area Measurement
Suppose we wanted to leavea and b far apart and use anapproximating rectangle.
We could! Just “top” therectangle at the correct spot(known as the “Mean ValueTheorem”)
Goldilocks
If f (a) gives a rectangle thatis “too big,” and
if f (b) gives a rectangle thatis too small,
f (c) is just right!
Area of rectangle is
b = a + ∆t
f (a)
∆t · f (a)
f (b) f (t)
∆t
f (c)
a
Executive Summary: Calculus 75 / 81
Integration
Make ∆t→ 0.
As ∆t → 0, All three of the rectangles would converge to the samesize. See?
b → a and
c → a.
lim∆t→0f (c) = f (a)
lim∆t→0f (b) = f (a).
Now consider the “just right” area measurement:
F (a + ∆t)− F (a) = f (c)×∆t (36)
Divide both sides by ∆t.
Executive Summary: Calculus 76 / 81
Integration
Make ∆t→ 0. ...
F (a + ∆t)− F (a)
∆t= f (c) (37)
The left hand side is starting to look like a derivative, so take limits.
lim∆t→0F (a + ∆t)− F (a)
∆t= lim∆t→0f (c) (38)
dF (a)
dt= f (a)
In words, the derivative of F (t) at point a equals the value of f (t)at a.
If we only had F (t), life would be sweet!
And we do, sorta...
F (t) is the function which, when differentiated, would be equal tof (t). The slope of F (t) is f (t). For this reason, F (t) is known as an“anti-derivative” of f (t).
Executive Summary: Calculus 77 / 81
Integration
If we were doing this numerically
We sometimes have no analyticalmethod to calculate F (x), especially inmultiple dimensions. So there is a highpriority on developing numericalmethods that can approximate.
Time (hours)
Spe
ed (
mile
s pe
r ho
ur)
0 5 10 15 200
2040
6080
f(t)
Executive Summary: Calculus 78 / 81
Integration
Finer Rectangles = More Calculation
Waste more computing time, get abetter answer.
This “brute force” numericalapproach only works inlow-dimensional problems
Advanced optimization jargon inyour future
Gaussian quadrature: smart waysto choose the rectangle positionsMonte Carlo math: estimate areaby random sampling (GHKalgorithm)
Time (hours)
Spe
ed (
mile
s pe
r ho
ur)
0 5 10 15 200
2040
6080
f(t)
Executive Summary: Calculus 79 / 81
Conclusion
Key ideas: Derivative
There are many details, butthe essence of the differentialcalculus boils down to this.
Critical Point
Derivative: where slope is 0
Local Maximum
Second Order Conditions: Are weat the top of a hill?
0 5 10 15 20 25 30
05
10
15
20
25
30
Input
Outp
ut
Critical
Points
Local Maximum
Inflection
Point
Global Maximum
Minimum
Global
Executive Summary: Calculus 80 / 81
Conclusion
Key ideas: Integrals
Possible to calculate(analytically or numerically)the area under a curve.
Analytically, theFundamental Theorem ofCalculus gives conditionsunder which the area can becalculated exactly, withoutnumerical approximation.
Many applications inprobability have analyticalsolutions.
In practice, the models wewant to solve are generallynot analytically solvable, socomputer approximation isvital.
4 6 8 10 12
0.0
00.0
50.1
00.1
50.2
00.2
50.3
0
40 Yard Dash Times of 300 Pound Men (seconds)
Executive Summary: Calculus 81 / 81
Conclusion
Where to go from here
Step number: don’t let your skin crawl when somebody says“derivative”, “integral”, or “Calculus”
If you are going into a field where you will be deriving statisticalestimators, or using “cutting edge tools,” NOW is the time to refreshyour Calculus skills.