Lecture 3: The Normal Distribution and Statistical Inference
Sandy [email protected]
24 April 2008
A Review and Some Connections
The Normal Distribution
The Central Limit Theorem
Estimates of means and proportions: uses and properties
Confidence intervals and Hypothesis tests
The Normal Distribution
A probability distribution for continuous data
Characterized by a symmetric bell-shaped curve (Gaussian curve)
Symmetric about its mean µ
Under certain conditions, can be used to approximate the Binomial(n, p) distribution
np > 5 and n(1 − p) > 5
Normal Distribution
[Figure: normal density curve f(x) over x from −∞ to +∞, centered at µ]
Takes on values between −∞ and +∞
Mean = Median = Mode
Area under curve equals 1
Notation for Normal random variable: X ∼ N(µ, σ2)
Parametersµ = meanσ = standard deviation
Formula: Normal Probability Density Function (pdf)
[Figure: normal density curve f(x) over x from −∞ to +∞, centered at µ]
The normal probability density function for X ∼ N(µ, σ2) is:
f(x) = (1/(σ√(2π))) · e^(−(x−µ)²/(2σ²)),  −∞ < x < +∞
Note: π ≈ 3.14 and e ≈ 2.72 are mathematical constants
Standard Normal
Definition: a Normal distribution N(µ, σ²) with parameters µ = 0 and σ = 1
Its density function is written as:
f(x) = (1/√(2π)) · e^(−x²/2),  −∞ < x < +∞
We typically use the letter Z to denote a standard normal random variable (Z ∼ N(0, 1))
Important! We use the standard normal all the time because if X ∼ N(µ, σ²), then (X − µ)/σ ∼ N(0, 1)
This process is called “standardizing” a normal random variable
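As a quick sketch of standardizing (a Python illustration with the birthweight-style values µ = 3000, σ = 1000 used later in this lecture; the course software is R):

```python
# Standardizing: if X ~ N(mu, sigma^2), then Z = (X - mu)/sigma ~ N(0, 1).
mu, sigma = 3000.0, 1000.0

def standardize(x, mu, sigma):
    """Convert a value x from N(mu, sigma^2) to its standard normal z-score."""
    return (x - mu) / sigma

print(standardize(5000.0, mu, sigma))  # 2.0: 5000 is two standard deviations above the mean
```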
68-95-99.7 Rule I
68% of the density is within one standard deviation of the mean
[Figure: normal curve with the region between µ − 1σ and µ + 1σ shaded; shaded area 0.68, each tail 0.16]
68-95-99.7 Rule II
95% of the density is within two standard deviations of the mean
[Figure: normal curve with the region between µ − 2σ and µ + 2σ shaded; shaded area 0.95, each tail 0.025]
68-95-99.7 Rule III
99.7% of the density is within three standard deviations of the mean
[Figure: normal curve with the region between µ − 3σ and µ + 3σ shaded; shaded area 0.997, each tail 0.0015]
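The 68-95-99.7 rule can be checked numerically: the standard normal CDF is Φ(x) = (1 + erf(x/√2))/2, so the central area within k standard deviations of the mean is Φ(k) − Φ(−k). A Python sketch using only the standard library:

```python
from math import erf, sqrt

def phi(x):
    """Standard normal CDF, computed via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

for k in (1, 2, 3):
    central = phi(k) - phi(-k)  # area within k standard deviations of the mean
    print(k, round(central, 4))
# 1 -> 0.6827, 2 -> 0.9545, 3 -> 0.9973
```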
Different Means
[Figure: three normal curves centered at µ1, µ2, µ3]
Three normal distributions with different means µ1 < µ2 < µ3
Different Standard Deviations
[Figure: three normal curves with spreads σ1, σ2, σ3]
Three normal distributions with different standard deviations σ1 < σ2 < σ3
Standard Normal N(0,1)
[Figure: standard normal density with µ = 0 and σ = 1, plotted from −4 to 4]
Example: Birthweights (in grams) of infants in a population
[Figure: density of birthweights from 0 to 6000 g, centered at 3000]
Continuous data
Mean = Median = Mode = 3000 = µ
Standard deviation = 1000 = σ
The area under the curve represents the probability (proportion) of infants with birthweights between certain values
Normal Probabilities
We are often interested in the probability that Z takes on values between z0 and z1
P(z0 ≤ Z ≤ z1) = ∫_{z0}^{z1} (1/√(2π)) · e^(−z²/2) dz
How do we calculate this probability?
Equivalent to finding the area under the curve
Continuous distribution, so we cannot use sums to find probabilities
Performing the integration is not necessary since tables and computers are available
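To see that a computer really is just doing this integral, here is a rough trapezoid-rule sketch in Python (not how statistical software actually computes it, but illustrative):

```python
from math import exp, pi, sqrt

def std_normal_pdf(z):
    """Standard normal density f(z) = (1/sqrt(2*pi)) * exp(-z^2/2)."""
    return exp(-z * z / 2.0) / sqrt(2.0 * pi)

def normal_area(z0, z1, steps=100_000):
    """Trapezoid-rule approximation of P(z0 <= Z <= z1)."""
    h = (z1 - z0) / steps
    total = 0.5 * (std_normal_pdf(z0) + std_normal_pdf(z1))
    for i in range(1, steps):
        total += std_normal_pdf(z0 + i * h)
    return total * h

print(round(normal_area(-1.0, 1.0), 4))  # ~0.6827, matching the 68% rule
```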
Z Tables
But...we’ll use R
For standard normal random variables Z ∼ N(0, 1) we’ll use
1. pnorm(?) to find P(Z ≤ ?)
2. pnorm(?, lower.tail=F) to find P(Z ≥ ?)
[Figure: two standard normal curves, one with the lower tail P(Z ≤ ?) shaded, one with the upper tail P(Z ≥ ?) shaded]
For any normal random variable X ∼ N(µ, σ²) (but taking X ∼ N(2, 3²) as an example) we’ll use
1. pnorm(?, mean=2, sd=3) to find P(X ≤ ?)
2. pnorm(?, mean=2, sd=3, lower.tail=F) to find P(X ≥ ?)
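For readers following along without R, pnorm’s role can be played by a small helper built from the error function in Python’s standard library (a hypothetical stand-in, not part of any course package):

```python
from math import erf, sqrt

def pnorm(q, mean=0.0, sd=1.0, lower_tail=True):
    """Mimic R's pnorm: P(X <= q) for X ~ N(mean, sd^2),
    or P(X >= q) when lower_tail is False."""
    p = 0.5 * (1.0 + erf((q - mean) / (sd * sqrt(2.0))))
    return p if lower_tail else 1.0 - p

print(round(pnorm(2, lower_tail=False), 4))  # ~0.0228, i.e. P(Z >= 2)
print(round(pnorm(5, mean=2, sd=3), 4))      # P(X <= 5) for X ~ N(2, 3^2), i.e. P(Z <= 1) ~ 0.8413
```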
Example: Birthweights (in grams)
Weights
[Figure: density of birthweights from 0 to 6000 g, centered at 3000]
µ = 3000
σ = 1000
X = birthweight
Z = (X − µ)/σ
Question I
What is the probability of an infant weighing more than 5000g?
P(X > 5000) = P((X − µ)/σ > (5000 − 3000)/1000)
            = P(Z > 2)
            = 0.0228
Get this using pnorm(2, lower.tail=F) (since we standardized)
Question II
What is the probability of an infant weighing less than 3500g?
P(X < 3500) = P((X − µ)/σ < (3500 − 3000)/1000)
            = P(Z < 0.5)
            = 0.6915
Question III
What is the probability of an infant weighing between 2500 and 4000g?
P(2500 < X < 4000) = P((2500 − 3000)/1000 < (X − µ)/σ < (4000 − 3000)/1000)
                   = P(−0.5 < Z < 1)
                   = 1 − P(Z > 1) − P(Z < −0.5)
                   = 1 − 0.1587 − 0.3085
                   = 0.5328
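The three birthweight answers can be reproduced in one go with the standard normal CDF (a Python sketch; the numbers match the slides to rounding):

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

mu, sigma = 3000.0, 1000.0  # birthweight parameters from the example

p_over_5000 = 1.0 - phi((5000 - mu) / sigma)                      # P(X > 5000) = P(Z > 2)
p_under_3500 = phi((3500 - mu) / sigma)                           # P(X < 3500) = P(Z < 0.5)
p_between = phi((4000 - mu) / sigma) - phi((2500 - mu) / sigma)   # P(2500 < X < 4000)

print(round(p_over_5000, 4), round(p_under_3500, 4), round(p_between, 4))
# 0.0228 0.6915 0.5328
```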
Statistical Inference
Populations and samples
Sampling distributions
Definitions
Statistical inference is “the attempt to reach a conclusion concerning all members of a class from observations of only some of them.” (Runes 1959)
A population is the entire collection of observations of interest
A parameter is a numerical descriptor of a population
A sample is a part or subset of a population
A statistic is a numerical descriptor of the sample
Population vs. Sample
Population
population size = N
µ = mean, a measure of center
σ2 = variance, a measure of dispersion
σ = standard deviation
A sample from the population is used to calculate sample estimates (statistics) that approximate population parameters
sample size = n
X̄ = sample mean
s2 = sample variance
s = sample standard deviation
Population: parameters
Sample: statistics
Estimating the population mean, µ
Usually µ is unknown and we would like to estimate it
We use X̄ to estimate µ
We know the sampling distribution of X̄
Definition: Sampling distribution
The distribution of all possible values of some statistic, computed from samples of the same size randomly drawn from the same population, is called the sampling distribution of that statistic
Sampling Distribution of X̄
[Figure: population distribution X ∼ N(µ, σ²) alongside sampling distributions of X̄ ∼ N(µ, σ²/n) for n = 10, 30, 100; the sampling distribution narrows as n grows]
When sampling from a normally distributed population
X̄ will be normally distributed
The mean of the distribution of X̄ is equal to the true mean µ of the population from which the samples were drawn
The variance of the distribution is σ²/n, where σ² is the variance of the population and n is the sample size
We can write: X̄ ∼ N(µ, σ²/n)
When sampling from a population whose distribution is not normal and the sample size is large, use the Central Limit Theorem
The Central Limit Theorem (CLT)
Given a population of any distribution with mean µ and variance σ², the sampling distribution of X̄, computed from samples of size n from this population, will be approximately N(µ, σ²/n) when the sample size is large
In general, this applies when n ≥ 25
The approximation of normality becomes better as n increases
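A small simulation illustrates the CLT: sample means drawn from a very non-normal (exponential) population still end up with mean ≈ µ and variance ≈ σ²/n (a Python sketch; the population and sample sizes are arbitrary choices):

```python
import random
import statistics

random.seed(1)
n, reps = 30, 20_000  # sample size and number of simulated samples

# Exponential population with rate 1: mu = 1, sigma^2 = 1 (strongly skewed, not normal)
means = [statistics.fmean(random.expovariate(1.0) for _ in range(n))
         for _ in range(reps)]

print(round(statistics.fmean(means), 2))     # close to mu = 1
print(round(statistics.variance(means), 3))  # close to sigma^2 / n = 1/30 ~ 0.033
```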
What if a random variable has a Binomial distribution?
First, recall that a Binomial variable is just the sum of n Bernoulli variables: Sn = Σ_{i=1}^{n} Xi
Notation:
Sn ∼ Binomial(n, p)
Xi ∼ Bernoulli(p) = Binomial(1, p) for i = 1, . . . , n
In this case, we want to estimate p by p̂ where
p̂ = Sn/n = (Σ_{i=1}^{n} Xi)/n = X̄
p̂ is just a sample mean!
So we can use the central limit theorem when n is large
Binomial CLT
For a Bernoulli variable
µ = mean = p
σ² = variance = p(1 − p)
X̄ ≈ N(µ, σ²/n) as before
Equivalently, p̂ ≈ N(p, p(1 − p)/n)
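The approximation p̂ ≈ N(p, p(1 − p)/n) can be compared against the exact binomial computation (a Python sketch; the values p = 0.3 and n = 100 are arbitrary illustration choices):

```python
from math import comb, erf, sqrt

p, n = 0.3, 100

# Exact: P(p_hat <= 0.35) = P(S_n <= 35), summing the Binomial(n, p) pmf
exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(36))

# Normal approximation: p_hat ~ N(p, p(1-p)/n)
se = sqrt(p * (1 - p) / n)
approx = 0.5 * (1.0 + erf((0.35 - p) / (se * sqrt(2.0))))

print(round(exact, 3), round(approx, 3))  # the two probabilities are close
```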
Distribution of Differences
Often we are interested in detecting a difference between two populations
Differences in average income by neighborhood
Differences in disease cure rates by age
Distribution of Differences: Notation
Population 1:
Size = N1
Mean = µ1
Standard deviation = σ1
Population 2:
Size = N2
Mean = µ2
Standard deviation = σ2
Samples of size n1 from Population 1:
Mean = µ_{X̄1} = µ1
Standard deviation = σ_{X̄1} = σ1/√n1
Samples of size n2 from Population 2:
Mean = µ_{X̄2} = µ2
Standard deviation = σ_{X̄2} = σ2/√n2
Distribution of Differences: CLT result
Now by the CLT, for large n:
X̄1 ∼ N(µ1, σ1²/n1)
X̄2 ∼ N(µ2, σ2²/n2)
and X̄1 − X̄2 ≈ N(µ1 − µ2, σ1²/n1 + σ2²/n2)
Difference in proportions?
We’re done if the underlying variable is continuous. What if the underlying variable is Binomial?
Then X̄1 − X̄2 ≈ N(µ1 − µ2, σ1²/n1 + σ2²/n2) is replaced by:
p̂1 − p̂2 ≈ N(p1 − p2, p1(1 − p1)/n1 + p2(1 − p2)/n2)
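Putting the two-proportion result to work (a Python sketch with invented numbers: cure rates p1 = 0.6 and p2 = 0.5 with n1 = n2 = 200 are hypothetical, chosen only to illustrate the formula):

```python
from math import erf, sqrt

# Hypothetical cure rates and sample sizes for two populations
p1, n1 = 0.6, 200
p2, n2 = 0.5, 200

# By the CLT, p1_hat - p2_hat ~ N(p1 - p2, p1(1-p1)/n1 + p2(1-p2)/n2)
diff = p1 - p2
se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# Probability the observed difference in sample proportions comes out negative
p_negative = phi((0.0 - diff) / se)

print(round(se, 4), round(p_negative, 4))
```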
Summary of Sampling Distributions
Sampling Distribution
Statistic     Mean        Variance
X̄             µ           σ²/n
X̄1 − X̄2       µ1 − µ2     σ1²/n1 + σ2²/n2
p̂             p           pq/n
np̂            np          npq
p̂1 − p̂2       p1 − p2     p1q1/n1 + p2q2/n2
(where q = 1 − p)
Statistical inference
Two methods
Estimation (Confidence intervals)
Hypothesis testing
Both make use of sampling distributions
Remember to use the CLT
Rest of material moved to lecture 4
We didn’t get a chance to cover the rest of the material, so it hasbeen moved to lecture 4.
Lecture 3 Summary
The Normal Distribution
The Central Limit Theorem
Sampling distributions
Next time, we’ll discuss
Confidence intervals for population parameters
The t-distribution
Hypothesis testing (p-values)