Lecture 3: Review
Review of Point and Interval Estimators, Statistical Significance, Hypothesis Testing
Point Estimates
An estimator (as opposed to an estimate) is a sample statistic that predicts a value of a parameter. For example, the sample mean and sample standard deviation are estimators.
A point estimate is a particular value of an estimator used to predict the population value. For example, an estimate of the population mean from the sample mean may be “12”.
Point Estimates
Two characteristics of point estimators are their efficiency and bias.
An efficient estimator is one that has the lowest standard error relative to other estimators.
An estimator is biased to the extent that the sampling distribution is not centred around the population parameter.
Efficiency and Bias
[Figure: Sampling distributions of three estimators: a biased estimator; an unbiased but inefficient estimator; an efficient estimator.]
Point Estimates
Population Mean: $\mu = \frac{\sum Y_i}{N}$
Sample Mean: $\bar{y} = \frac{\sum y_i}{n}$
Sample Mean as a point estimate of population mean: $\hat{\mu} = \bar{y} = \frac{\sum y_i}{n}$
Point Estimates
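Sample Standard Deviation: $s = \sqrt{\frac{\sum (y_i - \bar{y})^2}{n}}$
Population Standard Deviation: $\sigma = \sqrt{\frac{\sum (Y_i - \mu)^2}{N}}$
Sample Standard Deviation as estimate of Population Standard Deviation: $\hat{\sigma} = s = \sqrt{\frac{\sum (y_i - \bar{y})^2}{n - 1}}$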
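The difference between dividing by n and dividing by n - 1 can be checked numerically. The sketch below is illustrative only: the data values and the helper name `sd_with_divisor` are hypothetical and do not come from the lecture.

```python
import math
import statistics


def sd_with_divisor(values, divisor):
    """Square root of the summed squared deviations from the mean, over `divisor`."""
    mean = sum(values) / len(values)
    squared_deviations = sum((v - mean) ** 2 for v in values)
    return math.sqrt(squared_deviations / divisor)


sample = [4.0, 7.0, 6.0, 3.0, 5.0]  # hypothetical data, for illustration only
n = len(sample)

s_n = sd_with_divisor(sample, n)              # divides by n
s_n_minus_1 = sd_with_divisor(sample, n - 1)  # divides by n - 1

# The standard library implements the same two formulas.
assert math.isclose(s_n, statistics.pstdev(sample))
assert math.isclose(s_n_minus_1, statistics.stdev(sample))
print(s_n, s_n_minus_1)
```

The n - 1 version is always slightly larger, and the gap shrinks as n grows.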
Interval Estimates
Confidence intervals for a mean
A confidence interval is an interval estimate around a mean. It is a range of values within which the population mean (or another parameter, such as a proportion) is expected to fall with a stated level of confidence.
95% Confidence Interval of a Mean
[Figure: Sampling distribution of means. The 95% confidence interval covers the central 95% of the area, between z = -1.96 and z = +1.96, leaving 2.5% of the area in each tail.]
Confidence Interval of a Mean
$C.I. = \bar{y} \pm Z \hat{\sigma}_{\bar{y}}$
Z is the z-score corresponding to the confidence probability;
$\hat{\sigma}_{\bar{y}} = \frac{s}{\sqrt{n}}$ is an estimate of the standard error of the mean (for large samples);
and s is the sample standard deviation.
Confidence Interval of a Mean
Example: you wish to estimate the average height for all Canadian men 18-21 from a sample of 50 with a mean height of 166 cm and a standard deviation of 30 cm. Calculate the 95% confidence interval for the mean.
$\hat{\sigma}_{\bar{y}} = \frac{s}{\sqrt{n}} = \frac{30}{\sqrt{50}} = 4.24$
$C.I._{(95)} = \bar{y} \pm Z \hat{\sigma}_{\bar{y}}$
$C.I._{(95)} = 166 \pm 1.96(4.24) = 166 \pm 8.31$
$C.I._{(95)} = 157.7$ to $174.3$
We are 95% confident that the true population mean falls in the interval from about 157.7 cm to 174.3 cm.
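The height example can be reproduced in a few lines. This is a minimal sketch using only the Python standard library; the function name `mean_confidence_interval` is an assumption made for illustration, and the z-score is taken from `statistics.NormalDist` rather than from a printed table.

```python
import math
from statistics import NormalDist


def mean_confidence_interval(sample_mean, sample_sd, n, confidence=0.95):
    """Large-sample confidence interval for a mean: y-bar +/- Z * s / sqrt(n)."""
    standard_error = sample_sd / math.sqrt(n)
    # Two-sided interval: put half of (1 - confidence) in each tail.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * standard_error
    return sample_mean - margin, sample_mean + margin


# Values from the height example: mean 166 cm, s.d. 30 cm, n = 50.
lower, upper = mean_confidence_interval(166, 30, 50)
print(round(lower, 1), round(upper, 1))
```

Changing `confidence` to 0.99 widens the interval, because a larger z-score is needed to cover more of the sampling distribution.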
Point Estimates for Proportions
The sample proportion of individuals in category x: $p = \frac{X}{n}$
The population proportion of individuals with characteristic x: $\pi = \frac{X}{N}$
Sample proportion is an estimator of the population proportion: $\hat{\pi} = p = \frac{X}{n}$
Confidence Intervals for Proportions
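The standard deviation of the sample probability distribution: $s = \sqrt{p(1 - p)}$
The standard error of the sample proportion: $\hat{\sigma}_{\hat{\pi}} = \frac{s}{\sqrt{n}} = \sqrt{\frac{\hat{\pi}(1 - \hat{\pi})}{n}}$
Confidence interval for a proportion: $C.I. = \hat{\pi} \pm Z \hat{\sigma}_{\hat{\pi}} = \hat{\pi} \pm Z \sqrt{\frac{\hat{\pi}(1 - \hat{\pi})}{n}}$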
Confidence Intervals for Proportions
Example: Consider a poll predicting election results with a sample size of 1500. Of those who responded to a question about their voting intentions in an upcoming election, 840 (56%) answered that they would vote for the current governing party.
If we use the sample proportion as an estimator of the population proportion, then
$\hat{\pi} = 840 / 1500 = 0.56$
and the proportion that would not vote for the governing party is
$1 - \hat{\pi} = 1 - 0.56 = 0.44$
The standard error of the estimated proportion is then:
$\hat{\sigma}_{\hat{\pi}} = \sqrt{\frac{\hat{\pi}(1 - \hat{\pi})}{n}} = \sqrt{\frac{(0.56)(0.44)}{1500}} = \sqrt{0.000164} = 0.0128$
The 99% confidence interval is:
$C.I._{99} = \hat{\pi} \pm 2.58 \hat{\sigma}_{\hat{\pi}}$
$C.I._{99} = 0.56 \pm 2.58(0.0128)$
$C.I._{99} = 0.56 \pm 0.033$
We are 99% confident that the true population proportion lies between 0.527 and 0.593.
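The polling example follows the same pattern as the interval for a mean, with the standard error computed from the sample proportion. The sketch below is illustrative; the function name `proportion_confidence_interval` is an assumption, not something taken from the lecture.

```python
import math
from statistics import NormalDist


def proportion_confidence_interval(successes, n, confidence=0.99):
    """Large-sample confidence interval for a proportion."""
    p_hat = successes / n
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    # Two-sided interval: put half of (1 - confidence) in each tail.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * standard_error
    return p_hat - margin, p_hat + margin


# Values from the polling example: 840 of 1500 respondents.
lower, upper = proportion_confidence_interval(840, 1500)
print(round(lower, 3), round(upper, 3))
```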
Significance Tests
Elements of significance tests:
(1) Assumptions:
- Type of data
- Distributions
- Sampling
- Sample size
(2) Hypotheses:
- H0: There is no effect, relationship, or difference
- Ha: There is an effect, difference, or relationship
(3) Test Statistic
(4) Significance level, critical value
(5) Decision whether or not to reject H0
Large Sample Z-Test for a Mean
Example:
You sample 400 high school teachers and submit them to an IQ test, which is known to have a population mean of 100. You wish to know whether the sample mean IQ of 115 is significantly different from the population mean. The sample s.d. is 18.
Large Sample Z-Test for a Mean
Assumptions: A z-test for a mean assumes random sampling and n greater than 30.
Hypotheses:
$H_0: \mu = 100$
$H_a: \mu \neq 100$
Large Sample Z-Test for a Mean
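Test Statistic:
$z = \frac{\bar{y} - \mu_0}{\hat{\sigma}_{\bar{y}}} = \frac{\bar{y} - \mu_0}{s / \sqrt{n}}$
$z = \frac{115 - 100}{18 / \sqrt{400}} = \frac{15}{0.9}$
$z_{test} = 16.67$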
Large Sample Z-Test for a Mean
Significance Level and Critical Value:
α = .05, .01, or .001
Find the appropriate critical values (z-scores) from the table of areas under the normal curve.
For a 2-tailed z-test at α = .05, $z_{critical}$ = 1.96.
Decision:
Because $z_{test}$ is greater than $z_{critical}$, we reject the null hypothesis and conclude that high school teachers have significantly higher IQ scores than the general population.
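The five steps of the teacher IQ example can be sketched as a short function. This is an illustrative sketch only: the name `z_test_mean` is an assumption, and the critical value is computed with `statistics.NormalDist` instead of being read from a table.

```python
import math
from statistics import NormalDist


def z_test_mean(sample_mean, null_mean, sample_sd, n, alpha=0.05):
    """Two-tailed large-sample z-test for a mean."""
    standard_error = sample_sd / math.sqrt(n)
    z = (sample_mean - null_mean) / standard_error
    # Two-tailed test: alpha is split evenly between the two tails.
    z_critical = NormalDist().inv_cdf(1 - alpha / 2)
    reject_null = abs(z) > z_critical
    return z, z_critical, reject_null


# Values from the example: n = 400, sample mean 115, s.d. 18, H0 mean 100.
z, z_critical, reject_null = z_test_mean(115, 100, 18, 400)
print(round(z, 2), round(z_critical, 2), reject_null)
```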
Critical Values
While in 231 we always found a critical value for a test statistic given our chosen α-level and compared it to our observed value, we can instead find the probability of observing a test statistic at least as extreme as ours (the p-value) and compare it to our chosen α-level.
Computer output gives us the exact probability, rather than only a test statistic.
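The two approaches lead to the same decision: a test statistic beyond the critical value always has a p-value below α. A minimal sketch, assuming a two-tailed z-test; the function name `two_tailed_p_value` and the z values 2.5 and 1.5 are hypothetical, chosen only to show one case on each side of the cutoff.

```python
from statistics import NormalDist


def two_tailed_p_value(z):
    """Probability of a z-score at least as extreme as `z`, in either tail."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


alpha = 0.05

# At the critical value itself, the p-value is (almost exactly) alpha.
p_at_critical = two_tailed_p_value(1.96)

# Hypothetical test statistics on either side of the critical value.
p_beyond = two_tailed_p_value(2.5)   # |z| > 1.96, so p < alpha: reject H0
p_inside = two_tailed_p_value(1.5)   # |z| < 1.96, so p > alpha: fail to reject H0
print(p_at_critical, p_beyond, p_inside)
```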
Large-Sample Z-tests for proportions
Example: You randomly sample 500 cattle and dairy farmers in Saskatchewan, asking them whether they have enough stored feed to last through a summer of severe drought. 275 report that they do not. You know that of the total population (all Canadian farmers), 50% do not have adequate stored feed. Is the proportion of Saskatchewan farmers that is unprepared significantly higher?
Large-Sample Z-tests for proportions
Assumptions:
Z-tests assume that the sample size is large enough that the sampling distribution will be approximately normal. In practice this means 30 cases or more. Random sampling is also assumed.
Hypotheses:
$H_0: \pi = \pi_0$, that is, $H_0: \pi = 0.50$
$H_a: \pi > \pi_0$, that is, $H_a: \pi > 0.50$
Large-Sample Z-tests for proportions
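Test Statistic:
$z = \frac{\hat{\pi} - \pi_0}{\sigma_{\hat{\pi}}} = \frac{\hat{\pi} - \pi_0}{\sqrt{\pi_0 (1 - \pi_0) / n}}$
$z = \frac{0.55 - 0.50}{\sqrt{0.50 (1 - 0.50) / 500}} = \frac{0.05}{\sqrt{0.25 / 500}}$
$z_{test} = 2.236$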
Large-Sample Z-tests for proportions
Significance level or critical value:
If we look in the table of normal curve probabilities for a z value of 2.24, we find that the one-tail probability is about 0.0125.
This is greater than .01, so we would fail to reject the null hypothesis at α = .01. However, it is less than .05, so we would reject it at α = .05.
Conclusion:
A significantly higher proportion of Saskatchewan farmers reported that they did not have enough stored feed to withstand a drought (one-tailed p ≈ .0125).
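The farmer example can be checked with a one-tailed test in code. This is a sketch under the assumption that the alternative hypothesis is "greater than"; the function name `z_test_proportion_upper` is hypothetical. Note that the standard error uses the null proportion, not the sample proportion.

```python
import math
from statistics import NormalDist


def z_test_proportion_upper(successes, n, null_proportion):
    """One-tailed (upper) large-sample z-test for a proportion."""
    p_hat = successes / n
    # Under H0 the standard error is based on the hypothesized proportion.
    standard_error = math.sqrt(null_proportion * (1 - null_proportion) / n)
    z = (p_hat - null_proportion) / standard_error
    p_value = 1 - NormalDist().cdf(z)  # area in the upper tail only
    return z, p_value


# Values from the example: 275 of 500 farmers, H0 proportion 0.50.
z, p_value = z_test_proportion_upper(275, 500, 0.50)
print(round(z, 3), round(p_value, 4))
```

The exact tail probability differs slightly from a table lookup, because the table requires rounding z to two decimals first.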
Small-Sample Inference for Means
The t-distribution
The t-distribution is another bell-shaped symmetrical distribution centred on 0. The t-distribution differs from the normal distribution in that as sample size decreases, the tails of the t-distribution become thicker than normal. However, when n ≥ 30, the t-distribution is practically the same as the normal distribution.
Small-Sample Inference for Means
Example: Suppose we have a sample of 25 young offenders whom we have interviewed. We are interested in the effects of being born to young parents. The average age of the mothers at the birth of the future young offenders was 20.8, with a s.d. of 3.5 years. The average age at first birth for women in Canada in 1990 was 23.2 years. Is the average age of the mothers of young offenders different than that of mothers in the general Canadian population?
Small-Sample Inference for Means
Assumptions:
It is assumed that the sample is a simple random sample (SRS) from a population that is normally distributed (but the test is fairly robust to violations of this assumption).
Hypotheses:
$H_0: \mu = \mu_0$, that is, $H_0: \mu = 23.2$
$H_a: \mu \neq \mu_0$, that is, $H_a: \mu \neq 23.2$
Small-Sample Inference for Means
Test Statistic:
$t = \frac{\bar{y} - \mu_0}{\hat{\sigma}_{\bar{y}}} = \frac{\bar{y} - \mu_0}{s / \sqrt{n}}$
$t = \frac{20.8 - 23.2}{3.5 / \sqrt{25}} = \frac{-2.4}{0.7}$
$t_{actual} = -3.43$
Small-Sample Inference for Means
Significance level or critical value: We look for the probability of finding a t-value at least as extreme as -3.43, with (n-1) = 24 degrees of freedom.
If our chosen α-level is .05, we look for the $t_{critical}$ value associated with a single-tail probability of .025 (half of the total 2-tailed α).
$t_{critical}$ at α = .05 (2-tailed), 24 df = 2.064.
Small-Sample Inference for Means
Decision: We decide to reject the null hypothesis, because the absolute value of our actual t is greater than our critical value of t (at α = .05). This means that the difference between the observed sample mean and the hypothesized population mean is sufficiently great that we are willing to conclude that the sample comes from a population with a different mean age at first birth than the Canadian population.
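The young-offender example can be sketched with the standard library alone. Because the standard library has no t-distribution, the critical value 2.064 is hard-coded from the table lookup quoted in the lecture (α = .05, 2-tailed, 24 df); the function and constant names are hypothetical.

```python
import math


def one_sample_t(sample_mean, null_mean, sample_sd, n):
    """t statistic and degrees of freedom for a one-sample test of a mean."""
    standard_error = sample_sd / math.sqrt(n)
    t = (sample_mean - null_mean) / standard_error
    return t, n - 1


# Critical value taken from a t-table: alpha = .05, 2-tailed, 24 df.
T_CRITICAL_05_DF24 = 2.064

# Values from the example: n = 25, mean 20.8, s.d. 3.5, H0 mean 23.2.
t, df = one_sample_t(20.8, 23.2, 3.5, 25)
reject_null = abs(t) > T_CRITICAL_05_DF24
print(round(t, 2), df, reject_null)
```

For a two-tailed test the sign of t does not matter, which is why the comparison uses the absolute value.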
Significance Tests
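Note that the forms of the z- and t-statistics are similar:
Large-sample test for means: $z = \frac{\bar{y} - \mu_0}{\hat{\sigma}_{\bar{y}}} = \frac{\bar{y} - \mu_0}{s / \sqrt{n}}$
Large-sample test for proportions: $z = \frac{\hat{\pi} - \pi_0}{\sigma_{\hat{\pi}}} = \frac{\hat{\pi} - \pi_0}{\sqrt{\pi_0 (1 - \pi_0) / n}}$
Small-sample test for means: $t = \frac{\bar{y} - \mu_0}{\hat{\sigma}_{\bar{y}}} = \frac{\bar{y} - \mu_0}{s / \sqrt{n}}$, with $df = n - 1$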
Decisions and Types of Errors
“Confusion” Matrix
                     H0 is true                H0 is false
Reject H0            Type I error (α)          Correct decision (1-β), or “Power”
Fail to Reject H0    Correct decision (1-α)    Type II error (β)
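The meaning of α as the Type I error rate can be illustrated by simulation: when H0 is true, a test at α = .05 should wrongly reject in roughly 5% of repeated samples. The sketch below is illustrative only; the population values (mean 100, s.d. 15), the number of trials, and the function name are all hypothetical choices, not taken from the lecture.

```python
import math
import random
import statistics
from statistics import NormalDist


def type_one_error_rate(trials=2000, n=100, alpha=0.05, seed=0):
    """Share of samples drawn under a true H0 in which a z-test rejects H0."""
    rng = random.Random(seed)
    z_critical = NormalDist().inv_cdf(1 - alpha / 2)
    null_mean, population_sd = 100.0, 15.0  # hypothetical population; H0 is true
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(null_mean, population_sd) for _ in range(n)]
        standard_error = statistics.stdev(sample) / math.sqrt(n)
        z = (statistics.mean(sample) - null_mean) / standard_error
        if abs(z) > z_critical:
            rejections += 1  # H0 is true, so every rejection is a Type I error
    return rejections / trials


rate = type_one_error_rate()
print(rate)
```

The printed rate varies with the seed and the number of trials, but it should stay close to the chosen α.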
Type I and Type II Errors
[Figure: The distribution under H0 and the distribution under Ha, with α and the P-level marked.]