lecture on sampling distributions
TRANSCRIPT
-
8/8/2019 Lecture on Sampling Distributions
1/31
Sampling Distributions
Stat 515 Lecture
-
12/18/2010 2
Inching Towards Inference
Recall that one of our main goals is to make
inference about the unknown parameters of the
population or the distribution, such as the mean $\mu$,
the standard deviation $\sigma$, or some other summary
measures such as the median, etc.
We now have possible models for the population,
which are provided by the probability distributions
(Binomial, Poisson, Normal, Uniform, others).
We also know how to compute sample statistics such
as the sample mean, sample standard deviation, and
others, with these sample statistics to be used for
making inference about the parameters.
-
Sampling as a Random Experiment
To understand the notion of a sampling distribution of
a sample statistic, it is important to realize that the
process of taking a sample from a population could
be viewed as a random experiment.
To illustrate this idea, consider a population taking 3
values: 2, 4, 5 according to the following probability
distribution.
Probability Function: p(2) = .4, p(4) = .5, p(5) = .1
You may imagine that 40% of all the values in the
population equal 2; 50% equal 4; and 10% equal
5.
-
The Population
[Figure: the population shown as a collection of 2s, 4s, and 5s]
-
Characteristics of the Population
For this population, we have the parameters:
$$\mu = (2)(.4) + (4)(.5) + (5)(.1) = .8 + 2 + .5 = 3.3$$
$$\sigma^2 = (2 - 3.3)^2(.4) + (4 - 3.3)^2(.5) + (5 - 3.3)^2(.1) = 1.21$$
$$\sigma = (1.21)^{1/2} = 1.1$$
Its shape is given by the bar graph below:
[Bar graph: probability (vertical axis, 0 to 0.6) against the population values (horizontal axis, 2 to 5)]
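The parameter calculations on this slide can be checked numerically. The following is a minimal Python sketch, not part of the original lecture; the function name `population_parameters` is an illustrative choice.

```python
from math import sqrt

# Probability function from the slide: p(2) = .4, p(4) = .5, p(5) = .1
population = {2: 0.4, 4: 0.5, 5: 0.1}

def population_parameters(pmf):
    """Return (mean, variance, standard deviation) of a discrete distribution."""
    mean = sum(value * prob for value, prob in pmf.items())
    variance = sum((value - mean) ** 2 * prob for value, prob in pmf.items())
    return mean, variance, sqrt(variance)

mu, sigma_sq, sigma = population_parameters(population)
print(f"mean = {mu:.2f}, variance = {sigma_sq:.2f}, sd = {sigma:.2f}")
```

The variance is computed as the probability-weighted average of squared deviations from the mean, the same formula used on the slide.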
-
Possible Outcomes of Sampling Process
Now, consider the sampling process of taking n = 2
observations (with replacement) from this population
or distribution. Below is a table of possibilities.
Possible Sample   Probability of Sample   Sample Mean   Sample Variance
(2, 2)            (.4)(.4) = .16          2             0
(2, 4)            (.4)(.5) = .20          3             2
(2, 5)            (.4)(.1) = .04          3.5           4.5
(4, 2)            (.5)(.4) = .20          3             2
(4, 4)            (.5)(.5) = .25          4             0
(4, 5)            (.5)(.1) = .05          4.5           .5
(5, 2)            (.1)(.4) = .04          3.5           4.5
(5, 4)            (.1)(.5) = .05          4.5           .5
(5, 5)            (.1)(.1) = .01          5             0
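The table of possible samples can also be generated by enumeration. This is a sketch in Python, which the lecture itself does not use; `enumerate_samples` is a hypothetical helper name, and the sample variance uses the usual n - 1 divisor via `statistics.variance`.

```python
from itertools import product
from statistics import mean, variance

population = {2: 0.4, 4: 0.5, 5: 0.1}

def enumerate_samples(pmf, n):
    """List every ordered sample of size n with its probability, mean, and variance."""
    rows = []
    for sample in product(pmf, repeat=n):
        prob = 1.0
        for value in sample:
            prob *= pmf[value]  # sampling with replacement: multiply the probabilities
        rows.append((sample, prob, mean(sample), variance(sample)))
    return rows

rows = enumerate_samples(population, 2)
for sample, prob, xbar, s2 in rows:
    print(sample, round(prob, 2), xbar, s2)
```

Each printed row corresponds to one elementary event of the sampling experiment, and the probabilities across all rows add up to 1.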
-
Some Points about the Preceding Table
Since we are sampling with replacement, to obtain
the probability of each possible sample, we simply
multiply the probabilities of each of the observations
(Think of a tree diagram!).
The 9 possible samples represent the elementary
events of the experiment of taking a sample of size 2
from the population or distribution.
The sample mean ($\bar{X}$) is obtained the usual way.
The sample variance is computed the usual way. For
example, for the second sample, we have
$$S^2 = [(2-3)^2 + (4-3)^2]/(2-1) = [1 + 1]/1 = 2$$
-
Sample Statistics as Random Variables
Since the sample mean and the sample variance are
numerical characteristics of each of the possible
samples, they can be viewed as random variables in
this sampling experiment.
Therefore, we could obtain the probability
distributions of the sample mean and sample
variance.
These probability distributions are called sampling
distributions.
Thus we will have the sampling distribution of the
sample mean, as well as the sample variance.
-
Sampling Distribution of the Sample Mean
From the earlier table, we could construct the
probability distribution of the sample mean, now
called the sampling distribution of the sample mean.
This is given by the following table.
$\bar{X}$   $P(\bar{X})$        $\bar{X} \cdot P(\bar{X})$   $(\bar{X} - 3.3)^2 \cdot P(\bar{X})$
2           .16                 0.32                         .2704
3           .20 + .20 = .40     1.20                         .0360
3.5         .04 + .04 = .08     0.28                         .0032
4           .25                 1.00                         .1225
4.5         .05 + .05 = .10     0.45                         .1440
5           .01                 0.05                         .0289
Sums        1.00                3.3                          .6050
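The step from the table of samples to the sampling distribution amounts to adding up the probabilities of all samples that share the same mean. A minimal Python sketch of that aggregation follows; the function name is an assumption made for illustration.

```python
from collections import defaultdict
from itertools import product

population = {2: 0.4, 4: 0.5, 5: 0.1}

def sampling_distribution_of_mean(pmf, n):
    """Map each possible sample mean to its probability, by full enumeration."""
    dist = defaultdict(float)
    for sample in product(pmf, repeat=n):
        prob = 1.0
        for value in sample:
            prob *= pmf[value]
        dist[sum(sample) / n] += prob  # pool samples that have the same mean
    return dict(sorted(dist.items()))

dist = sampling_distribution_of_mean(population, 2)
for xbar, prob in dist.items():
    print(xbar, round(prob, 2))
```

For example, the samples (2, 4) and (4, 2) both have mean 3, so their probabilities .20 and .20 are pooled into P(mean = 3) = .40.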
-
Graph of the Sampling Distribution of
the Sample Mean
Note that it has become more concentrated near the
population mean of 3.3, compared to the original
distribution.
[Figure: Sampling Distribution of the Sample Mean; horizontal axis: Xbar]
-
Parameters of the Sampling Distribution
Because the sampling distribution is just like any
other probability distribution, we are also able to
obtain its mean, variance, and standard deviation.
Thus, for the sampling distribution of the sample
mean, we find the mean to be 3.3, which coincides
with the original population mean; while
the variance of the sampling distribution of the
sample mean turns out to be equal to .605, which is
equal to (1.21)/2, the population variance divided by
the sample size.
The standard deviation of the sample mean, now
called the standard error (SE), is $(.605)^{1/2} = .7778$.
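These three numbers can be recomputed directly from the sampling distribution table. The Python below is an illustrative sketch with the probabilities typed in from that table; the variable names are assumptions.

```python
from math import sqrt

# Sampling distribution of the sample mean for n = 2, copied from the earlier table.
xbar_dist = {2: 0.16, 3: 0.40, 3.5: 0.08, 4: 0.25, 4.5: 0.10, 5: 0.01}

mean_xbar = sum(x * p for x, p in xbar_dist.items())
var_xbar = sum((x - mean_xbar) ** 2 * p for x, p in xbar_dist.items())
se_xbar = sqrt(var_xbar)  # standard error = sd of the sampling distribution

population_variance, n = 1.21, 2
print(round(mean_xbar, 4), round(var_xbar, 4), round(se_xbar, 4))
print(round(population_variance / n, 4))  # population variance divided by n
```

The second printed line gives the population variance divided by the sample size, for comparison with the variance computed from the table.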
-
Recapitulation
Sampling from a probability distribution or population
could be viewed as a random experiment, and the
elementary outcomes are the possible samples.
Sample statistics, such as the sample mean, could
be viewed as random variables, and as such have
their associated probability distributions, which are
called sampling distributions.
The sampling distribution also has a mean.
And it also has a variance.
The standard deviation of the sampling distribution is
called the standard error (SE).
-
Sampling Distribution of the Sample Mean
The mean of the sampling distribution of the sample
mean equals the population mean.
The variance of the sampling distribution of the
sample mean equals the population variance divided
by the sample size.
These two characteristics are always true for the
sampling distribution of the sample mean when
sampling with replacement.
-
Obtaining Sampling Distributions
In the example considered, we obtained the sampling
distribution of the sample mean by enumerating all
the possible samples that could arise.
However, such a method is not feasible if the sample
size is large. For instance, if n = 10, then there will
be a total of (3)(3)...(3) = $3^{10}$ = 59049 possible
samples, and complete enumeration is no longer
practical.
How do we obtain sampling distributions?
-
Some Methods for Obtaining Sampling
Distributions of Statistics
Complete enumeration, if possible.
Computer simulation via the Monte Carlo method.
In this method the computer generates many, many
samples, and then constructs the probability
histogram of the values of the statistic of interest.
This will provide an empirical approximation.
Using theoretical results such as, for instance, when
sampling from a Bernoulli population the number of
successes is binomially distributed.
Using theoretical approximations such as the Central
Limit Theorem or the de Moivre approximation.
-
Illustrating the Monte Carlo Method
We illustrate the use of the simulation or Monte Carlo
method by approximating the sampling distribution of
the sample mean based on n = 10 observations from
the population considered earlier, which has:
p(2) = .4, p(4) = .5, p(5) = .1
We generate 500 samples of size n = 10 from this
population, and for each sample we compute the
sample mean.
This simulation was done using Minitab.
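The lecture's simulation was run in Minitab. As a stand-in, the same procedure can be sketched with Python's standard library. The seed value and the function name `simulate_sample_means` are arbitrary choices made here, so the numbers it produces will not match the Minitab run exactly.

```python
import random
from statistics import mean, stdev

values, probs = [2, 4, 5], [0.4, 0.5, 0.1]

def simulate_sample_means(num_samples, n, seed=515):
    """Draw num_samples samples of size n (with replacement); return their means."""
    rng = random.Random(seed)  # fixed seed (arbitrary) so the run is repeatable
    return [mean(rng.choices(values, weights=probs, k=n)) for _ in range(num_samples)]

sample_means = simulate_sample_means(500, 10)
print("mean of the sample means:", round(mean(sample_means), 4))
print("sd of the sample means:  ", round(stdev(sample_means), 4))
```

A histogram of `sample_means` would give the empirical approximation to the sampling distribution that the Monte Carlo method describes.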
-
First 10 of the 500 Generated Samples
The table below shows the first 10 samples of size n
= 10 that were generated from the population.
Also included are their corresponding sample means.
Population distribution:
y    p(y)
2    0.4
4    0.5
5    0.1

x1   x2   x3   x4   x5   x6   x7   x8   x9   x10   Sample Mean
4    2    2    5    4    2    4    2    2    4     3.1
4    4    2    2    5    4    4    4    2    2     3.3
4    2    2    2    2    2    4    4    4    5     3.1
2    2    4    2    2    2    2    2    2    2     2.2
2    5    2    4    4    4    2    2    5    2     3.2
4    4    4    2    2    4    2    2    2    4     3.0
4    4    4    4    2    2    5    2    2    4     3.3
2    2    2    4    2    2    4    2    2    2     2.4
2    4    2    2    4    5    5    2    2    2     3.0
2    5    4    4    2    2    4    4    4    2     3.3
-
Relative Frequency Histogram of the 500
Sample Means
[Figure: relative frequency histogram; vertical axis: Relative Frequency (in %). Title: Simulated Sampling Distribution of the Sample Mean Based on 10 Observations when Sampling from the Population p(2) = .4, p(4) = .5, and p(5) = .1]
-
Points to Ponder
This relative frequency histogram of the simulated
sample means serves as an approximation to the
sampling distribution of the sample mean when n =
10 and when sampling from the given population.
Notice that the values of the sample means are now
clustered around the population mean of 3.3, and
furthermore, the histogram is almost
bell-shaped.
The histogram also shows that the
chances of getting a sample of size n = 10 whose
sample mean is less than 2.5 or greater than 4.5 are
rather small.
-
When the mean of the 500 sample means is
computed, it turns out to be 3.3094. [Their median is
exactly 3.30!]
Recall that the population mean is 3.30.
The standard deviation of the 500 sample means
turns out to be 0.3497.
Recall that the population standard deviation is
$(1.21)^{1/2} = 1.1$, so
$$\frac{\sigma}{\sqrt{n}} = \frac{1.1}{\sqrt{10}} = \frac{1.1}{3.1622} = .3478$$
-
We therefore note that the mean of the simulated
sample means is very close to the population mean,
and
the standard deviation of the simulated sample
means is also very close to the population standard
deviation divided by the square root of the sample
size.
Indeed, we always have the theoretical results:
$$\text{Mean of } \bar{X} = \mu_{\bar{X}} = \mu$$
$$\text{Std. Error of } \bar{X} = \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$
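A short derivation shows why these two results hold. It is a sketch that assumes the observations $X_1, \ldots, X_n$ are independent, each with mean $\mu$ and variance $\sigma^2$, which is the case when sampling with replacement; the lecture states the results without proof.

```latex
\begin{align*}
E(\bar{X}) &= E\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
            = \frac{1}{n}\sum_{i=1}^{n} E(X_i)
            = \frac{1}{n}(n\mu) = \mu, \\
\operatorname{Var}(\bar{X}) &= \frac{1}{n^2}\sum_{i=1}^{n} \operatorname{Var}(X_i)
            = \frac{1}{n^2}(n\sigma^2) = \frac{\sigma^2}{n}
            \quad\Longrightarrow\quad
            \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}.
\end{align*}
```

Independence is what allows the variance of the sum to be written as the sum of the variances in the second line.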
-
An Important Result About the Sampling
Distribution of the Sample Mean
When the population being sampled is a
normal population with mean $\mu$ and standard
deviation $\sigma$, then the sampling distribution of
the sample mean is also normal with mean $\mu$
and standard error of $\sigma/n^{1/2}$, for any sample
size n.
When the population is not normal, however,
then the sampling distribution of the sample
mean need not be normal. But we have:
-
Central Limit Theorem
If a random sample of size n is taken from a
population or distribution with mean $\mu$ and standard
deviation $\sigma$, and if the sample size is large (n > 30),
then the sampling distribution of the sample mean is
approximately normal with mean $\mu$ and standard
deviation (or standard error) of $\sigma/n^{1/2}$. That is,
$$\bar{X} \text{ is approx. } N\left(\mu, \frac{\sigma^2}{n}\right).$$
-
Uses of the Central Limit Theorem
Because of this approximation, when computing
probabilities associated with the sample mean, we
can use the approximation given below which uses
the standard normal distribution.
Note: Z ~ N(0,1), the standard normal variable.
$$P\{a \le \bar{X} \le b\} \approx P\left\{\frac{a - \mu}{\sigma/\sqrt{n}} \le Z \le \frac{b - \mu}{\sigma/\sqrt{n}}\right\}$$
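The approximation can be wrapped in a small helper. This is a sketch using `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the function name `clt_prob_between` and the numbers in the usage lines are hypothetical and do not come from the lecture.

```python
from math import sqrt
from statistics import NormalDist

def clt_prob_between(a, b, mu, sigma, n):
    """Approximate P(a <= Xbar <= b) via the Central Limit Theorem."""
    se = sigma / sqrt(n)   # standard error of the sample mean
    z = NormalDist()       # standard normal variable, N(0, 1)
    return z.cdf((b - mu) / se) - z.cdf((a - mu) / se)

# Hypothetical inputs: mu = 50, sigma = 10, n = 100, so SE = 1 and the
# interval (49, 51) is one standard error on either side of the mean.
example = clt_prob_between(49, 51, mu=50, sigma=10, n=100)
print(round(example, 4))
```

Standardizing both endpoints and differencing the standard normal CDF is exactly the right-hand side of the displayed approximation.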
-
Applications of the CLT
Situation 1: Suppose we take a sample of
size n = 30 from the population described by
the probability function p(2) = 0.4, p(4) = 0.5,
p(5) = 0.1. This is the population we were
using earlier.
Question 1: We seek the approximate
probability that the sample mean is between
3.1 and 3.5.
Question 2: Find the approximate probability
that the sample mean is less than 2.6.
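The slide leaves both questions open. One possible way to work them is sketched below; it assumes n = 30 is treated as large enough for the normal approximation and uses the population values $\mu$ = 3.3 and $\sigma$ = 1.1 found earlier. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 3.3, 1.1, 30

# Approximate sampling distribution of the sample mean: N(mu, sigma^2 / n).
xbar = NormalDist(mu, sigma / sqrt(n))

p_between = xbar.cdf(3.5) - xbar.cdf(3.1)   # Question 1: P(3.1 < Xbar < 3.5)
p_below = xbar.cdf(2.6)                     # Question 2: P(Xbar < 2.6)

print(f"Question 1: {p_between:.4f}")
print(f"Question 2: {p_below:.6f}")
```

Under these assumptions the first probability comes out at roughly two in three, and the second is very small, since 2.6 lies about 3.5 standard errors below the mean.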
-
Applications continued
Situation 2: The systolic blood pressure
population data set has mean $\mu$ = 114.58 and
standard deviation of $\sigma$ = 14.06. Its
distribution is not normal as it is right-skewed.
Suppose we take a random sample of n = 50
people, and obtain the sample mean of their
systolic blood pressures.
Question 1: What is the approximate
probability that this sample mean will exceed
120?
-
Continued ...
Question 2: What would be the value of A
such that the probability that the sample
mean of the systolic blood pressures of a
sample of size 50 is greater than A is 0.95?
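Both blood pressure questions can be approached with the same normal approximation. The sketch below is one possible working, not the lecture's own solution. For Question 2 it uses the fact that P(Xbar > A) = 0.95 is the same as P(Xbar <= A) = 0.05, so A is the 5th percentile of the approximate sampling distribution.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 114.58, 14.06, 50

# Approximate sampling distribution of the sample mean by the CLT.
xbar = NormalDist(mu, sigma / sqrt(n))

p_exceeds_120 = 1 - xbar.cdf(120)   # Question 1: P(Xbar > 120)
a_value = xbar.inv_cdf(0.05)        # Question 2: A with P(Xbar > A) = 0.95

print(f"Question 1: {p_exceeds_120:.4f}")
print(f"Question 2: A = {a_value:.2f}")
```

The value of A falls below the population mean, as it must for the sample mean to exceed it 95% of the time.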
-
Sampling a Bernoulli Population
A Bernoulli population is one where there are only
two possible values or outcomes, called a Success,
denoted by the value of X = 1, and a Failure,
denoted by a value of X = 0. The probability of a
Success is denoted by p.
For such a population we have:
Mean = $\mu$ = p;
Variance = $\sigma^2$ = p(1-p).
Consider now taking a sample of size n from this
population and letting $\hat{p}$ equal the proportion of
successes in the sample. That is, $\hat{p}$ = (number of successes)/n.
-
Sample Proportion
Because the Bernoulli observations are either
0 or 1 (with 1 representing success), the
sample proportion could be defined via:
$$\hat{p} = \frac{\text{Number of ``Successes''}}{n} = \frac{1}{n}\sum_{i=1}^{n} X_i = \bar{X}.$$
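A tiny numerical check makes the identity between the sample proportion and the sample mean concrete. The ten 0/1 observations below are made up for illustration and are not data from the lecture.

```python
from statistics import mean

# Hypothetical Bernoulli observations: 1 = Success, 0 = Failure.
observations = [1, 0, 0, 1, 1, 0, 1, 0, 0, 1]

number_of_successes = sum(observations)          # summing 0/1 values counts the 1s
p_hat = number_of_successes / len(observations)  # sample proportion
x_bar = mean(observations)                       # sample mean of the same data

print(p_hat, x_bar)
```

Because each success contributes 1 to the sum and each failure contributes 0, the two quantities are computed from the same total.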
-
Sampling Distribution of the Sample
Proportion
Since the sample proportion is the sample mean of
the observations from a Bernoulli population, by the
Central Limit Theorem, it follows that the sampling
distribution of the sample proportion, when the
sample size is large (that is n > 30), is approximately
normal with mean of p and SE of $[p(1-p)/n]^{1/2}$.
$$\hat{p} \text{ is approx. } N\left(\mu_{\hat{p}} = p,\ \sigma^2_{\hat{p}} = \frac{pq}{n}\right), \text{ where } q = 1 - p.$$
-
An Application
Situation: One of the ways most Americans relieve
stress is to reward themselves with sweets.
According to one study, 46% admit to overeating
sweet foods when stressed. Suppose that the 46%
figure is correct and we take a random sample of size
n = 100 Americans and ask them if they overeat
sweets when they are stressed out.
Question 1: What is the probability that the
proportion who overeat sweets in this sample
exceeds 0.50?
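The question is left open on the slide. A possible working under the normal approximation for the sample proportion is sketched below, taking p = 0.46 and n = 100 from the situation described; the variable names are illustrative, and no continuity correction is applied.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.46, 100

se = sqrt(p * (1 - p) / n)     # standard error of the sample proportion
p_hat = NormalDist(p, se)      # approximate sampling distribution of p-hat

prob_exceeds_half = 1 - p_hat.cdf(0.50)   # P(p-hat > 0.50)

print(f"SE = {se:.4f}")
print(f"P(p-hat > 0.50) = {prob_exceeds_half:.4f}")
```

With a standard error of about 0.05, the value 0.50 lies less than one standard error above 0.46, so the resulting probability is not negligible.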