Statistics and Quantitative Analysis U4320
Segment 5: Sampling and Inference
Prof. Sharyn O’Halloran
Sampling
A. Basics
1. Ways to Describe Data
  Histograms
  Frequency tables, etc.
2. Ways to Characterize Data
  Central tendency: mode, median, mean
  Dispersion: variance, standard deviation
Sampling (cont.)
3. Probability of Events
  If discrete: rely on relative frequency.
  If continuous: rely on the distribution of events. Example: the standard normal distribution.
4. Samples
  We can take a sample of the population and make inferences about the population.
5. Central Question
  How well does the sample represent the underlying population?
Sampling (cont.)
B. Random Sampling
1. Problems with Sample Bias
  The way we collect our data may bias our results. That is, the average response in our sample may not represent the average response in the whole population.
  Examples:
    Literary Digest phone book poll
    Primaries
    Relation between economic growth and education, looking only at OECD countries
2. Solution: Random Sampling
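The effect of sample bias can be illustrated with a small simulation. The sketch below is not from the lecture: it invents a hypothetical population in which an assumed 30% of people own a phone and hold a higher average score on some opinion scale, then compares a sample drawn only from phone owners with a simple random sample of the same size.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population: 30% own a phone and score higher on average.
population = []
for _ in range(100_000):
    has_phone = rng.random() < 0.30
    score = rng.gauss(60 if has_phone else 40, 10)
    population.append((has_phone, score))

all_scores = [score for _, score in population]
true_mean = statistics.fmean(all_scores)

# Biased sample: only people who appear in the phone book.
phone_owners = [score for has_phone, score in population if has_phone]
biased_mean = statistics.fmean(rng.sample(phone_owners, 1000))

# Simple random sample: every member is equally likely to be drawn.
random_mean = statistics.fmean(rng.sample(all_scores, 1000))

print(f"population mean   : {true_mean:.1f}")
print(f"phone-book sample : {biased_mean:.1f}")
print(f"random sample     : {random_mean:.1f}")
```

Under these made-up numbers the phone-book sample lands far from the population mean no matter how large it is, while the random sample stays close.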
Sampling (cont.)
C. Moments of the Sample
1. Characteristics of Sample Mean
  $\sigma^2$ = variance
  $\mu$ = mean
Sampling (cont.)
Example
  Draw a single observation $X$.
  Draw two observations $X$ and mark their mean $\bar{X}$.
  Draw four observations $X$ and mark their mean $\bar{X}$.
Sampling (cont.)
2. Generalization
  Every sample has an expected mean of $\mu$.
  But as our sample size increases, we are more confident of our results. That is, the standard deviation (or standard error, as we will call it) of our results is decreasing.
  So as $N$ increases, $\bar{X}$ converges toward $\mu$.
Sampling (cont.)
3. Hat Experiment
  Population mean $\mu = 10.5$; standard deviation $\sigma = 5.77$.
  Now let's take a sample of size 1 (with replacement).
  Now one of size 2.
  Now one of size 6.
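The stated mean of 10.5 and standard deviation of 5.77 match a hat holding slips numbered 1 through 20, one of each. That content is an assumption here, since the transcript does not say what is in the hat. Under that assumption, a minimal sketch of the experiment looks like this:

```python
import random
import statistics

# Assumption: the hat holds the integers 1..20, one slip each.
hat = list(range(1, 21))

mu = statistics.mean(hat)       # population mean
sigma = statistics.pstdev(hat)  # population standard deviation (divides by N)
print(f"mu = {mu}, sigma = {sigma:.2f}")

rng = random.Random(1)  # fixed seed for a repeatable run
for n in (1, 2, 6):
    draws = rng.choices(hat, k=n)  # sampling with replacement
    print(f"n = {n}: draws = {draws}, sample mean = {statistics.mean(draws):.2f}")
```

Rerunning with different seeds shows the sample means of size 6 clustering more tightly around 10.5 than the single draws do.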
Sampling (cont.)
4. Equations
  For a sample of size $n$ from a population of mean $\mu$ and standard deviation $\sigma$, the sample mean $\bar{X}$ has:
    $E(\bar{X}) = \mu$
    $SE(\bar{X}) = \sigma / \sqrt{n}$
  $SE(\bar{X})$ is called the standard error of the sampling process.
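For reference, both results follow in a few lines when the $n$ draws $X_1, \dots, X_n$ are independent, each with mean $\mu$ and variance $\sigma^2$. Independence is an assumption stated here for the derivation; random sampling with replacement provides it.

```latex
\begin{aligned}
E(\bar{X}) &= E\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
            = \frac{1}{n}\sum_{i=1}^{n} E(X_i)
            = \frac{n\mu}{n} = \mu, \\
\operatorname{Var}(\bar{X}) &= \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(X_i)
            = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}, \\
SE(\bar{X}) &= \sqrt{\operatorname{Var}(\bar{X})} = \frac{\sigma}{\sqrt{n}}.
\end{aligned}
```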
Inference
We make inferences about a population from a given sample.
A. Population and Sampling Parameters
  We have a population with parameters $\mu$ and $\sigma$.
  We then take a sample with parameters $\bar{X}$ and $s$.
  We want to know how well the sample mean $\bar{X}$ approximates the population mean $\mu$.
Inference (cont.)
On average the sample mean equals the population mean.
[Diagram: Population ($\mu$, $\sigma$) → draw sample → Sample ($\bar{X}$, $s$) → make inference about how good an estimate $\bar{X}$ is of $\mu$, using $SE(\bar{X}) = \sigma / \sqrt{n}$.]
Inference (cont.)
B. Referring Back to the Hat Experiment
1. Standard error decreases as n increases
  For instance, earlier we drew samples of sizes 1, 2, and 6 from the hat.
  The first sample, of size 1, had standard error $5.77/\sqrt{1} = 5.77$.
  The second sample, of size 2, had standard error $5.77/\sqrt{2} = 4.08$.
  The third sample, of size 6, had standard error $5.77/\sqrt{6} = 2.36$.
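These three standard errors can be recomputed from the formula and checked against a simulation. The sketch below assumes, as an illustration only, that the hat holds slips numbered 1 through 20 (which reproduces the stated mean and standard deviation); the function name `standard_error` is likewise just a label chosen here.

```python
import math
import random
import statistics

sigma = 5.77  # population standard deviation from the hat experiment

def standard_error(sigma, n):
    """Standard error of the mean of n independent draws."""
    return sigma / math.sqrt(n)

hat = list(range(1, 21))  # assumption: slips numbered 1..20
rng = random.Random(2)    # fixed seed for a repeatable run

formula = {}
simulated = {}
for n in (1, 2, 6):
    formula[n] = standard_error(sigma, n)
    # Draw many samples of size n (with replacement) and record each mean.
    means = [statistics.fmean(rng.choices(hat, k=n)) for _ in range(20_000)]
    simulated[n] = statistics.pstdev(means)
    print(f"n = {n}: formula SE = {formula[n]:.2f}, simulated SE = {simulated[n]:.2f}")
```

The simulated spread of the sample means should sit close to $\sigma/\sqrt{n}$ for each sample size.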
Inference (cont.)
C. Shape of the Sampling Distribution
  If you take a sample and find its mean, then take another sample and find its mean, and repeat this process a large number of times, then $\bar{X}$ is a random variable with its own mean and standard error.
Inference (cont.)
1. Central Limit Theorem
  Take a large number of samples, each of size $n$. Then, for large enough $n$, the sample mean $\bar{X}$ is approximately normally distributed with mean $\mu$ and standard error $\sigma / \sqrt{n}$.
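A simulation makes the theorem concrete. The population below is an assumption chosen for illustration: an exponential distribution with mean 1 and standard deviation 1, which is strongly skewed rather than bell-shaped. If the sample means are close to normal, about 95% of them should fall within 1.96 standard errors of $\mu$.

```python
import math
import random
import statistics

rng = random.Random(3)  # fixed seed for a repeatable run

# Assumed population: exponential with mean 1 and standard deviation 1.
mu, sigma = 1.0, 1.0
n = 50         # size of each sample
reps = 20_000  # number of samples drawn

means = [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
         for _ in range(reps)]

se = sigma / math.sqrt(n)
mean_of_means = statistics.fmean(means)
sd_of_means = statistics.pstdev(means)
within = sum(abs(m - mu) <= 1.96 * se for m in means) / reps

print(f"mean of sample means : {mean_of_means:.3f} (mu = {mu})")
print(f"sd of sample means   : {sd_of_means:.3f} (sigma/sqrt(n) = {se:.3f})")
print(f"share within 1.96 SE : {within:.3f} (normal curve: 0.950)")
```

Lowering `n` to a small value such as 2 shows the approximation degrading, since the skew of the population is then still visible in the sample means.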
Inference (cont.)
2. Example: 3 different distributions
Example 1:
  A population of men on a small, Eastern campus has a mean height $\mu$ = 69" and a standard deviation $\sigma$ = 3.22". If a random sample of $n = 10$ men is drawn, what is the chance that the sample mean will be within 2" of the population mean?
Inference (cont.)
Answer:
  From the Central Limit Theorem, we know that $\bar{X}$ is normally distributed, with mean 69 and standard error
    $SE(\bar{X}) = \sigma / \sqrt{n} = 3.22 / \sqrt{10} = 1.02$.
  [Figure: sampling distribution of $\bar{X}$ with standard error 1.02; $\bar{X} = 67$ and $\bar{X} = 71$ marked.]
  Find the z-score: $z = (71 - 69)/1.02 = 1.96$, and $P(Z > 1.96) = 0.025$.
  Since there are two tails, the area in the middle is $1 - .025 - .025 = .95$.
  So there's a 95% probability that the sample mean falls between 67 and 71.
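The table lookup can be cross-checked numerically. This sketch uses `statistics.NormalDist` from the Python standard library in place of the Z-table; the variable names are illustrative.

```python
import math
from statistics import NormalDist

mu, sigma, n = 69.0, 3.22, 10
se = sigma / math.sqrt(n)  # standard error of the sample mean

# Sampling distribution of the mean: normal with mean mu and sd equal to SE.
sampling_dist = NormalDist(mu, se)
p_within = sampling_dist.cdf(mu + 2) - sampling_dist.cdf(mu - 2)

print(f"SE = {se:.2f}")
print(f"z  = {2 / se:.2f}")
print(f"P(67 < sample mean < 71) = {p_within:.3f}")
```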
Inference (cont.)
Example 2:
  Suppose a large class in statistics has marks normally distributed around $\mu = 72$ with $\sigma = 9$. Find the probability that
  a) An individual student drawn at random will have a mark over 80.
Inference (cont.)
Answer: The Z-score is $(80 - 72)/9 = .89$. Looking this up in the table gives $P(Z > .89) = .187$, or about 19%.
  b) Now, what's the probability that a sample of size 10 has an average of over 80?
Answer: The standard error is $\sigma / \sqrt{n} = 9 / \sqrt{10} = 2.85$. So the Z-score becomes $(80 - 72)/2.85 = 2.81$, and $P(Z > 2.81) = .002$.
  [Figure: sampling distribution with SE = 2.85; the area to the right of 80 is .002.]
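Parts (a) and (b) differ only in which spread is used: the population standard deviation for one student, the standard error for a mean of ten. A short sketch, using `statistics.NormalDist` instead of the printed table, shows the two side by side:

```python
import math
from statistics import NormalDist

mu, sigma = 72.0, 9.0
std_normal = NormalDist()  # mean 0, standard deviation 1

# (a) One student: the spread is the population standard deviation.
z_one = (80 - mu) / sigma
p_one = 1 - std_normal.cdf(z_one)

# (b) Mean of n = 10 students: the spread is the standard error.
n = 10
se = sigma / math.sqrt(n)
z_mean = (80 - mu) / se
p_mean = 1 - std_normal.cdf(z_mean)

print(f"(a) z = {z_one:.2f}, P = {p_one:.3f}")
print(f"(b) SE = {se:.2f}, z = {z_mean:.2f}, P = {p_mean:.4f}")
```

The probability for the sample mean is much smaller because averaging ten marks shrinks the spread by a factor of $\sqrt{10}$.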
Inference (cont.)
Example 3:
  If the number of miles per gallon achieved by all cars of a particular model has $\mu = 25$ and $\sigma = 2$, what is the probability that, for a random sample of 20 such cars, the average miles per gallon will be less than 24? (Assume that the population is normally distributed.)
Step 1: Standardize $\bar{X}$
  $P(\bar{X} < 24) = P\left[ \frac{\bar{X} - \mu}{SE} < \frac{24 - 25}{SE} \right]$
  $SE = \sigma / \sqrt{n} = 2 / \sqrt{20} = .4472$
  $P(\bar{X} < 24) = P\left[ \frac{\bar{X} - \mu}{SE} < \frac{24 - 25}{.4472} \right]$, where $\frac{24 - 25}{.4472} = -2.24$
Inference (cont.)
Step 2: Then find the Z-scores (from the standard normal tables)
  $P(\bar{X} < 24) = P[Z < -2.24] = P[Z > 2.24] = 0.013$ (by symmetry)
  So there is about a 1.3 percent chance that, from a sample of 20, the average will be less than 24.
  [Figure: sampling distribution with SE = 0.4472, axis marked at 24 and 26; the area to the left of 24 is .013.]
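The two steps can be reproduced in a few lines. As before, `statistics.NormalDist` stands in for the standard normal table; this is a check on the arithmetic, not a different method.

```python
import math
from statistics import NormalDist

mu, sigma, n = 25.0, 2.0, 20

# Step 1: standardize the sample mean.
se = sigma / math.sqrt(n)
z = (24 - mu) / se

# Step 2: area to the left of z under the standard normal curve.
p = NormalDist().cdf(z)

print(f"SE = {se:.4f}")
print(f"z  = {z:.2f}")
print(f"P(sample mean < 24) = {p:.3f}")
```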
Inference (cont.)
D. Proportions
1. Proportions as Means
  A proportion ($P$) is just the mean of a dichotomous variable.
  Example: Ask 50 people what they think of Clinton; code the answer 0 if they think he's doing a poor job, and 1 if they think he is doing a good job.
  Suppose 30 of the 50 respondents say he's doing a good job.
  Then the sample mean $P$ is 30/50 = .60. This is just another way of saying that 60% of those surveyed approved of his job performance.
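Coding the answers as 0 and 1 and averaging them shows the point directly. The list below simply encodes the 30 approvals and 20 disapprovals from the example; the variable names are illustrative.

```python
import math
import statistics

# 1 = "good job", 0 = "poor job": 30 approvals out of 50 respondents.
responses = [1] * 30 + [0] * 20

p_hat = statistics.mean(responses)    # sample proportion as a mean
sd_01 = statistics.pstdev(responses)  # standard deviation of the 0/1 data

print(f"sample proportion P = {p_hat}")
print(f"sd of the 0/1 data  = {sd_01:.4f}")
print(f"sqrt(P * (1 - P))   = {math.sqrt(p_hat * (1 - p_hat)):.4f}")
```

The last two lines agree: for a 0/1 variable, the ordinary standard deviation reduces to $\sqrt{P(1 - P)}$.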
Inference (cont.)
2. Formula for Standard Error
  For a large enough sample of size $n$, $P$ (the proportion) will be normally distributed with mean $\pi$ and standard deviation $\sqrt{\pi(1 - \pi)/n}$.
    Population mean = population proportion $\pi$
    Sample mean = sample proportion $P$
    Population SD = $\sqrt{\pi(1 - \pi)}$
    $SE = \sqrt{\pi(1 - \pi)/n}$
Inference (cont.)
3. Example: Polling
  Suppose that the true approval rating for Clinton is .50. That is, 50 percent of the population believe he is doing a good job: $\pi = .5$.
  If we sample 50 people, what is the probability that we will observe an approval rating of 60 percent or above?
Inference (cont.)
  We know that the true population mean is $\pi = .5$.
  The standard error is $\sqrt{.5(1 - .5)/50} = 0.0707$.
  Then the Z-score is $(.6 - .5)/0.0707 = 1.414$.
  Looking this up in the Z-table, $P(Z > 1.414) = .079$, or about 8%.
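The polling calculation follows the same pattern as the examples for means, with the standard error now computed from the proportion. A minimal sketch, with `pi_true` as an illustrative name for the population proportion:

```python
import math
from statistics import NormalDist

pi_true = 0.5    # assumed true approval rating in the population
n = 50           # number of people polled
observed = 0.60  # approval rating we ask about

se = math.sqrt(pi_true * (1 - pi_true) / n)
z = (observed - pi_true) / se
p = 1 - NormalDist().cdf(z)  # area to the right of z

print(f"SE = {se:.4f}")
print(f"z  = {z:.3f}")
print(f"P(sample proportion >= 0.60) = {p:.3f}")
```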
Inference (cont.)
4. Example
  Of your first 15 grandchildren, what is the chance that there will be more than 10 boys?
Inference (cont.)
Answer:
  We want the probability that the proportion of boys is at least 10/15 = 2/3.
  We know that the population mean is $\pi = 1/2$.
  The standard error is $\sqrt{.5(1 - .5)/15} = 0.129$.
  Then the Z-score is $(.667 - .5)/0.129 = 1.29$.
  Looking this up in the table, $P(Z > 1.29) = .099$, or about 10%.
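With only 15 trials the normal curve is a rough approximation, so it can be useful to set the answer beside the exact binomial probabilities. The sketch below reproduces the normal calculation and then, as an addition not in the slides, sums the binomial terms for both readings of the question ("at least 10" and "more than 10"). The helper `binom_tail` is a name introduced here.

```python
import math
from statistics import NormalDist

pi_boy, n = 0.5, 15

# Normal approximation, following the worked answer.
se = math.sqrt(pi_boy * (1 - pi_boy) / n)
z = (10 / n - pi_boy) / se
p_normal = 1 - NormalDist().cdf(z)

def binom_tail(k_min, n, p):
    """Exact P(X >= k_min) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(k_min, n + 1))

p_at_least_10 = binom_tail(10, n, pi_boy)
p_more_than_10 = binom_tail(11, n, pi_boy)

print(f"normal approximation : {p_normal:.3f}")
print(f"exact P(X >= 10)     : {p_at_least_10:.3f}")
print(f"exact P(X > 10)      : {p_more_than_10:.3f}")
```

The approximation falls between the two exact values, which is a reminder that "large enough sample" matters for the proportion formula.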
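Point Estimation: Properties
A. Unbiased Estimators
  When an estimator's expected value equals the correct value, we say that it is unbiased: $E(\bar{X}) = \mu$.
  When an estimator converges to the correct value as the sample grows, we say that it is consistent: as $N \to \infty$, $\bar{X}$ converges toward $\mu$.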
Point Est. Properties (cont.)
B. Efficient Estimators
  Def of Efficient: One estimator is more efficient than another if its standard error is lower.
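One familiar comparison is the sample mean against the sample median as estimators of the center of a normal population. The simulation below is an illustration under that assumed population (standard normal, samples of size 25), not an example from the lecture; for other populations the ranking can differ.

```python
import random
import statistics

rng = random.Random(4)  # fixed seed for a repeatable run
mu, sigma, n, reps = 0.0, 1.0, 25, 5_000

means, medians = [], []
for _ in range(reps):
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    means.append(statistics.fmean(sample))
    medians.append(statistics.median(sample))

# The standard error of each estimator is the spread of its values
# across the repeated samples.
se_mean = statistics.pstdev(means)
se_median = statistics.pstdev(medians)

print(f"SE of sample mean   : {se_mean:.3f}")
print(f"SE of sample median : {se_median:.3f}")
```

Both estimators center on $\mu$, but the mean has the smaller standard error here, so by the definition above it is the more efficient of the two for this population.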
Point Est. Properties (cont.)
C. N-1 Problem
1. Known $\mu$
  The population variance is $\sigma^2 = \frac{\sum (X_i - \mu)^2}{N}$.
  When we take a sample of size $n$, if we had the real $\mu$ from the population, we could calculate $s^2 = \frac{\sum (X_i - \mu)^2}{n}$.
  Then there wouldn't be a problem; $s^2$ would be a consistent estimator of $\sigma^2$, if we knew $\mu$.
Point Est. Properties (cont.)
2. Unknown $\mu$
  But we usually don't have $\mu$, so we have to use the sample mean $\bar{X}$ instead.
  What's the difference? Why don't we just say that $s^2 = \frac{\sum (X_i - \bar{X})^2}{n}$?
  It turns out that we can show that $\bar{X}$ minimizes the expression $\sum (X_i - c)^2$ over all values $c$ that could fill the blank.
  So if we used $\mu$ instead, then the expression would be bigger.
  The right way to correct for this is to multiply by $\frac{n}{n-1}$, so
    $s^2 = \frac{\sum (X_i - \bar{X})^2}{n} \cdot \frac{n}{n-1} = \frac{\sum (X_i - \bar{X})^2}{n-1}$.
  The bottom line is that we use $n - 1$ to make a consistent, unbiased estimate of the population variance.
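The size of the correction can be verified exactly on a small case by listing every possible sample. The sketch below assumes a population of the integers 1 through 20 (chosen to match the hat experiment's mean and standard deviation; the hat's actual contents are not given) and enumerates all 400 equally likely samples of size 2 drawn with replacement.

```python
import itertools
import statistics

# Assumption: the population is the integers 1..20.
population = list(range(1, 21))
mu = statistics.mean(population)
sigma2 = statistics.pvariance(population)  # population variance, divides by N

n = 2
# Every possible ordered sample of size n, drawn with replacement.
samples = list(itertools.product(population, repeat=n))

def avg(values):
    """Plain average of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)

def ss(sample, center):
    """Sum of squared deviations of the sample from a given center."""
    return sum((x - center) ** 2 for x in sample)

# Average value of each candidate estimator over all possible samples.
known_mu = avg(ss(s, mu) / n for s in samples)
divide_n = avg(ss(s, statistics.mean(s)) / n for s in samples)
divide_n_1 = avg(ss(s, statistics.mean(s)) / (n - 1) for s in samples)

print(f"population variance           : {sigma2}")
print(f"known mu, divide by n         : {known_mu}")
print(f"sample mean, divide by n      : {divide_n}")
print(f"sample mean, divide by n - 1  : {divide_n_1}")
```

Dividing by $n$ around $\bar{X}$ averages to $\sigma^2 (n-1)/n$, which is too small; multiplying by $n/(n-1)$ restores $\sigma^2$ on average.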
IV. Review Homework