bullard assumptions talk[1]
TRANSCRIPT
-
8/3/2019 Bullard Assumptions Talk[1]
Assumptions for Statistical Inference
Floyd Bullard
The NC School of Science & Mathematics
26-27 January 2007

Overview
Random samples
Limiting distributions of statistics
Modeling assumptions: linear regression
When assumptions aren't met
Conclusion
Assumptions in the sciences
Some assumptions we might make when solving problems in the other sciences:
Physics: There is no air resistance.
Ecology: Foxes and rabbits are the only animals.
Epidemiology: People only die of disease or old age.
Oceanography: Seawater has the same composition everywhere.
Archaeology: At a given site, older objects are deeper in the ground than younger objects.
etc.
Assumptions in (AP) Statistics
In AP Statistics, nearly all assumptions are of three types.
The sample is representative of the population.
The sample is large enough that the distribution of some statistic is approximately equal to its limiting distribution.
Modeling assumptions. (In AP Statistics, these arise in the regression context.)
When we extrapolate information from a sample to a population, we are naturally assuming that the sample is representative of the population in some way. In particular, let's suppose that X is some random variable whose distribution over the population is f(x).
We will be observing data whose distribution is not f(x), but rather g(x|S), the conditional distribution of X given membership in the sample.
Is it fair to observe g(x|S) and treat it as if it were f(x)?
Under what conditions are the conditional and unconditional distributions of X the same?
The distributions f(x) and g(x|S) will be the same if and only if X and S are independent, that is, if the value of the random variable and the element's membership in the sample are completely unrelated to one another. Can we guarantee that?
Of course we can. If membership in the sample is completely random, then it is independent of anything we can think of. That's why we like random samples so much. They allow us to treat the X's in our sample as if they had the same distribution as those in the population. Random sampling permits inference.
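A small simulation can make this concrete. The sketch below is only an illustration: the population of 10,000 values and the two selection rules are assumptions made up for this example, not anything from the talk. One rule picks a simple random sample, so membership is independent of X; the other makes larger values of X more likely to be included.

```python
import random
import statistics

rng = random.Random(2007)  # fixed seed so the run is repeatable

# Hypothetical population: X has some distribution f(x) over 10,000 elements.
population = [rng.gauss(50, 10) for _ in range(10_000)]

# Membership independent of X: a simple random sample of 500 elements.
random_sample = rng.sample(population, 500)

# Membership related to X: larger values are more likely to be included.
lo, hi = min(population), max(population)
biased_sample = [
    x for x in population if rng.random() < 0.1 * (x - lo) / (hi - lo)
]

pop_mean = statistics.mean(population)
srs_mean = statistics.mean(random_sample)
biased_mean = statistics.mean(biased_sample)

print(f"population mean    : {pop_mean:.2f}")
print(f"random sample mean : {srs_mean:.2f}")
print(f"biased sample mean : {biased_mean:.2f}")
```

Under these assumptions the random sample's mean should land close to the population mean, while the biased sample's mean should sit noticeably above it, because g(x|S) is no longer the same as f(x).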
A Problem
But there's a problem. Random samples are hard to come by. So we often assume for the sake of inference that our sample is random even though we know for a fact it isn't.
Is that okay? What will happen if the assumption is really quite wrong?
Alice's project
A student named Alice wants to estimate the proportion of students in her school who can name her state's two U.S. Senators. She plans to sample 100 students and ask them to name the two senators. She'll use the sample proportion she gets to construct a confidence interval estimate of the population proportion.
How should she get her sample?
Alice's project (continued)
Here are some ways she might sample 100 students.
Include all the students in her classes until she gets 100.
Include her friends and her friends' friends.
Send out an all-school email and include the first 100 students who reply.
Stand outside the school in the morning and include every fifth student until she has 100.
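Whichever way the 100 students are chosen, the interval Alice plans to compute would look the same. The sketch below assumes the usual normal-approximation interval for a proportion; the count of 37 correct answers is a made-up number for illustration, and the function name `proportion_ci` is hypothetical. The arithmetic says nothing about whether the sample was representative.

```python
import math
from statistics import NormalDist


def proportion_ci(successes, n, confidence=0.95):
    """Normal-approximation confidence interval for a population proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Hypothetical result: 37 of Alice's 100 students name both senators.
low, high = proportion_ci(37, 100)
print(f"95% CI for p: ({low:.3f}, {high:.3f})")
```

The same numbers come out for every sampling method in the list above; only the sampling method determines whether the interval means anything about the whole school.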
Robert's project
Robert's school is considering starting school a half hour later in the morning and ending a half hour later in the afternoon. Robert wants to estimate the proportion of students in the school who would be in favor of this. Would Alice's sampling method work for him?
A popular class project
You plan to guide your students through a class project in which they will estimate the quality of five brands of paper towels. (The students will determine how to define quality.) You buy one roll of each of five brands of paper towels and bring them to class. The students take six towels of each brand and measure each one's quality. Parallel boxplots of the brands' quality scores give an idea of which brands are better than others.
What assumption are you and your students making? Is it justified?
Capture/recapture
Forty squirrels are captured in a park and tagged. A month later, fifty squirrels in the park are captured, and ten are found to be tagged. That's 20% of the second sample, so we might assume that $\hat{N} = 5 \times 40 = 200$ is a good estimate of the number of squirrels in the park.
What assumptions are being made here? Are they reasonable? What will the effect be on the population size estimator $\hat{N}$ if they are not reasonable?
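The estimate above can be written as a short function. This is a sketch, with a hypothetical function name, of the reasoning in the slide: it assumes the tagged fraction of the second sample equals the tagged fraction of the whole population. The second call uses an invented scenario (tagged squirrels avoid traps, so only five are recaptured) to show which way the estimate moves when that assumption fails.

```python
def estimate_population(tagged_first, caught_second, tagged_in_second):
    """Capture/recapture estimate of population size.

    Assumes tagged animals make up the same fraction of the second sample
    as they do of the whole population.
    """
    if tagged_in_second == 0:
        raise ValueError("no tagged animals recaptured; estimate is undefined")
    return tagged_first * caught_second / tagged_in_second


# The squirrel example from the slide: 40 tagged, 50 caught later, 10 tagged.
n_hat = estimate_population(tagged_first=40, caught_second=50, tagged_in_second=10)
print(n_hat)  # 200.0

# Hypothetical: tagged squirrels are trap-shy, so only 5 are recaptured.
print(estimate_population(40, 50, 5))  # 400.0
```

If tagged animals are less likely to be recaptured than untagged ones, the estimator overstates the population; if they are more likely to be recaptured, it understates it.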
The upshot:
In practice, we often do not have the luxury of true random samples. We may make the assumption that a sample is a simple random sample (SRS) so that we may extrapolate its properties to the population. Whether this is reasonable or not depends on whether we believe that sample membership and the properties of interest are more or less independent of one another.
Reasonable people may disagree about whether the assumption is justified.
What do all of the following statements have in common?
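$\hat{p} \sim N$, $\hat{p}_1 - \hat{p}_2 \sim N$, $\bar{X} \sim N$
$\frac{\bar{X} - \mu}{s/\sqrt{n}} \sim t(n-1)$
$\frac{\bar{X}_1 - \bar{X}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}} \sim t(n)$
$\sum \frac{(O_i - E_i)^2}{E_i} \sim \chi^2(\mathrm{df})$
They're all limiting distributions.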
We rely on sample sizes being large enough to justify using a limiting distribution. How do we know what's large enough?
Proportions
For proportions, we often require that $np$ and $n(1-p)$ both be at least 10 (or sometimes 5). At least one text requires the single condition that $np(1-p) > 5$.
Where did these come from?
Let's require that the mean of $\hat{p}$ (which is $p$) be at least three standard deviations (one standard deviation is $\sqrt{p(1-p)/n}$) above 0.
$p > 3\sqrt{p(1-p)/n}$
$p^2 > 9p(1-p)/n$
$np^2 > 9p(1-p)$
$np > 9(1-p)$
Note that this is guaranteed by $np > 10$. (Do you see why?) And $np > 5$ would guarantee that $p > 2\sqrt{p(1-p)/n}$.
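One way to see why: since $1 - p \le 1$, we have $9(1-p) \le 9 < 10 < np$. The sketch below checks the same claim numerically over a grid of (n, p) pairs. The grid limits and the function names are arbitrary choices made for this illustration.

```python
import math


def sds_above_zero(n, p):
    """Number of standard deviations of p-hat between 0 and its mean p."""
    return p / math.sqrt(p * (1 - p) / n)


def smallest_margin(threshold, max_n=300):
    """Smallest margin over a grid of (n, p) pairs satisfying n * p > threshold."""
    return min(
        sds_above_zero(n, k / 1000)
        for n in range(1, max_n + 1)
        for k in range(1, 1000)
        if n * k / 1000 > threshold
    )


print(smallest_margin(10))  # stays above 3, as the derivation predicts
print(smallest_margin(5))   # stays above 2
```

The margin simplifies to $\sqrt{np/(1-p)}$, so it can never fall below $\sqrt{10}$ when $np > 10$, or below $\sqrt{5}$ when $np > 5$.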
The requirement $n(1-p) > 10$ will similarly ensure that $p$ is at least three standard deviations below 1.
If we are comparing two proportions, then both must obey this rule of thumb.
means
$\bar{X}$ will have an approximately normal distribution (and hence $\frac{\bar{X} - \mu}{s/\sqrt{n}}$ will have an approximately $t(n-1)$ distribution) if the sample size $n$ is large enough.
Here's a common rule of thumb.
If $n \le 10$ and the data display no obvious outliers or skew, then continue with inference using the t distribution; but the inference still relies on the assumption that the population is approximately normal.
If $10 < n \le 40$ and the data display at most one or two outliers and no severe skew, then continue with inference using the t distribution; the population need not be approximately normal.
If $n > 40$, then except for extraordinarily severe skew (which would be indicated by numerous outliers), inference using the t distribution is okay. (But you might question whether inference on such a population's mean is what you really want to be doing.)
Linear regression
Our third type of assumption is the modeling assumption. We choose a mathematical model that we think will describe the underlying phenomenon that generated our data. If the model is very poor, then our inference will be meaningless.
The only example of this students see in AP Statistics is the linear regression model, which is:
$y_i = \beta_0 + \beta_1 x_i + e_i$, where $e_i \overset{\text{iid}}{\sim} N(0, \sigma)$.
In this model there are three parameters to be estimated: $\beta_0$, $\beta_1$, and $\sigma$.
Another way of stating the model is:
$y_i \sim N(\beta_0 + \beta_1 x_i, \sigma)$
In other words, the means of the y's have a linear relationship with the x's, but there is variability in the actual y data about those means: normally distributed errors with constant variability across all values of x.
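To make the model concrete, the sketch below generates data from it and then estimates the three parameters by least squares. The "true" values of the parameters and the x grid are arbitrary assumptions chosen for this illustration; nothing here comes from the talk's own data.

```python
import math
import random

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical "true" parameters, chosen only for this illustration.
beta0, beta1, sigma = 2.0, 0.5, 1.0

# Generate data from y_i = beta0 + beta1 * x_i + e_i, with e_i iid N(0, sigma).
xs = [i / 10 for i in range(200)]
ys = [beta0 + beta1 * x + rng.gauss(0, sigma) for x in xs]

# Least-squares estimates of the slope and intercept.
n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Residual standard error s estimates sigma (n - 2 degrees of freedom).
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
s = math.sqrt(sum(r ** 2 for r in residuals) / (n - 2))

print(f"b0 = {b0:.2f}, b1 = {b1:.2f}, s = {s:.2f}")
```

Because the data really do come from the linear model here, the estimates should land near the values used to generate them.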
To check whether the model is reasonable, we:
Look at the residuals from the linear regression to see whether there is a pattern.
Verify that the residuals are of roughly constant magnitude for all x's.
Check to see whether the residuals appear to be approximately normally distributed.
Sample is not random
If a sample is assumed to be random when in fact there is an association between sample membership and a measured variable of interest, then the sampling procedure is biased. Conclusions will tend to systematically overestimate or underestimate the parameters of interest.
Proportions: small samples
At each lattice point in this graph, 10,000 random samples were simulated and a 95% confidence interval estimate of $p$ constructed under the assumption that $\hat{p}$ has a normal distribution. Coverage rates are shown.
Figure: Confidence level accuracy when assuming normality. Horizontal axis: sample size (10 to 100); vertical axis: true population proportion. Reference curves: $np = 10$, $np = 5$, $n(1-p) = 10$, $n(1-p) = 5$, $np(1-p) > 5$.
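A simulation of this kind can be sketched in a few lines. The version below is an assumption about how such a study might be set up, not the code behind the figure: it uses 2,000 trials per point rather than 10,000 to keep the run short, it evaluates three (n, p) pairs chosen for illustration rather than a full lattice, and the function name `coverage_rate` is hypothetical.

```python
import math
import random
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # about 1.96


def coverage_rate(n, p, trials=2000, seed=0):
    """Fraction of simulated 95% normal-approximation intervals containing p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Draw one sample of size n and count the successes.
        successes = sum(rng.random() < p for _ in range(n))
        p_hat = successes / n
        margin = Z95 * math.sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - margin <= p <= p_hat + margin:
            hits += 1
    return hits / trials


for n, p in [(10, 0.1), (50, 0.1), (100, 0.5)]:
    rate = coverage_rate(n, p)
    print(f"n = {n:3d}, p = {p}: np = {n * p:g}, coverage = {rate:.3f}")
```

When np is far below the rule-of-thumb thresholds, many samples contain no successes at all, the interval collapses to a single point, and the coverage rate falls well short of the nominal 95%.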
Means: small samples
Figure: These four distributions were used to simulate random samples of different sizes. (Four panels, each showing a density f(x) for x from 0 to 5.)
10,000 random samples were simulated for each sample size from 2 to 50. Confidence intervals were computed assuming $\bar{X} \sim N$. Coverage rates are shown.
Figure: four panels, each plotting estimated coverage probability (0.85 to 1) against sample size (0 to 50).
Chi-square goodness-of-fit test: small samples.
Here we tested the claimed distribution: (0.45, 0.25, 0.20, 0.05, 0.05). Rejection rates from 10,000 samples are shown.
Figure: Chi-square rejection rates when null is true. Horizontal axis: sample size (0 to 120); vertical axis: rejection rate when $\alpha = 0.05$ (0.03 to 0.11).
Here we tested the claimed distribution: (0.45, 0.25, 0.20, 0.09, 0.01). Rejection rates from 10,000 samples are shown.
Figure: Chi-square rejection rates when null is true. Horizontal axis: sample size (0 to 120); vertical axis: rejection rate when $\alpha = 0.05$ (0.03 to 0.11).
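The two experiments above can be imitated with a short simulation. The sketch below is an assumption about the setup, not the code that produced the figures: it uses 2,000 samples per point instead of 10,000, only three sample sizes, a critical value taken from a chi-square table, and a hypothetical function name `rejection_rate`.

```python
import random

CRITICAL_DF4 = 9.488  # upper 5% point of chi-square with 4 degrees of freedom


def rejection_rate(probs, n, trials=2000, seed=0):
    """Share of samples drawn from the claimed distribution (null true)
    whose chi-square statistic exceeds the 5% critical value."""
    rng = random.Random(seed)
    categories = list(range(len(probs)))
    expected = [n * p for p in probs]
    rejections = 0
    for _ in range(trials):
        draws = rng.choices(categories, weights=probs, k=n)
        observed = [0] * len(probs)
        for c in draws:
            observed[c] += 1
        stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
        if stat > CRITICAL_DF4:
            rejections += 1
    return rejections / trials


for probs in [(0.45, 0.25, 0.20, 0.05, 0.05), (0.45, 0.25, 0.20, 0.09, 0.01)]:
    for n in (20, 60, 120):
        print(probs, n, rejection_rate(probs, n))
```

If the limiting chi-square distribution were exact, every rate would be close to 0.05; departures from that at small n, or when an expected count is very small, show where the approximation is strained.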
Modeling assumptions: linear model is a bad model.
If data do not come from a linear model, but you impose one anyway, then estimates of $\beta_0$, $\beta_1$, and $\sigma$ won't mean much.
Figure: The orbiting planet of 51 Pegasi (radial velocity against time), with fitted values $\hat{\beta}_0 = 23.8$, $\hat{\beta}_1 = 33.4$, $s = 37.7$.
But note that nothing about your software will prevent you from estimating them anyway! Without checking the modeling assumptions, you won't know whether the model is any good or not!
Figure: The orbiting planet of 51 Pegasi (radial velocity against time).
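The point can be illustrated with synthetic data. The sketch below does not use the 51 Pegasi measurements; it assumes an invented sine wave with an arbitrary amplitude and period, fits a least-squares line to it without complaint, and then applies one crude residual check (counting sign changes) that is only one of many ways to look for a pattern.

```python
import math

# Synthetic periodic data (NOT the 51 Pegasi measurements): a sine wave
# with an arbitrary amplitude of 60 and period of 4, sampled at 60 times.
times = [i * 0.15 for i in range(60)]
velocity = [60 * math.sin(2 * math.pi * t / 4.0) for t in times]

# Least-squares line, computed exactly as for any other data set.
n = len(times)
t_bar = sum(times) / n
v_bar = sum(velocity) / n
sxx = sum((t - t_bar) ** 2 for t in times)
sxy = sum((t - t_bar) * (v - v_bar) for t, v in zip(times, velocity))
b1 = sxy / sxx
b0 = v_bar - b1 * t_bar
residuals = [v - (b0 + b1 * t) for t, v in zip(times, velocity)]

# Crude pattern check: count sign changes between consecutive residuals.
# Patternless residuals would change sign roughly half the time.
sign_changes = sum(1 for r0, r1 in zip(residuals, residuals[1:]) if r0 * r1 < 0)

print(f"b0 = {b0:.1f}, b1 = {b1:.1f}")
print(f"sign changes: {sign_changes} of {n - 1} consecutive pairs")
```

The fit produces numbers either way; it is the long runs of same-signed residuals that reveal the line is the wrong model.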
A sample problem
Suppose the following is an AP exam question.
Eight apples are randomly sampled from an orchard, and their weights in pounds are measured to be 0.44, 0.43, 0.33, 0.56, 0.50, 0.50, 0.45, 0.38. Estimate the mean weight of apples in the orchard, including a reasonable margin of error.
A sample problem (continued)
The rubric requires students to check inferential assumptions. Suppose one student writes the following:
$np > 10$ and $n(1-p) > 10$
$n < 40$, assume normality.
How do you think the rubric will score this response for the check of assumptions?
The following would make an AP reader happy.
Random sample: yes, given in problem. Small data set ($n = 8$) requires population to be approximately normal. Reasonable?
Figure: normal probability plot of the data (horizontal axis from -2 to 2, vertical axis from 0.35 to 0.65).
Normal probability plot looks roughly linear, so assumption is reasonable. We continue with the construction of a 95% t-confidence interval...
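For completeness, the interval itself could be computed as in the sketch below. The eight weights are the ones given in the problem; the critical value 2.365 is the 95% value for 7 degrees of freedom as read from a t table, hard-coded here as an assumption rather than computed.

```python
import math
import statistics

# Apple weights in pounds, as given in the problem.
weights = [0.44, 0.43, 0.33, 0.56, 0.50, 0.50, 0.45, 0.38]

n = len(weights)
x_bar = statistics.mean(weights)
s = statistics.stdev(weights)  # sample standard deviation (n - 1 divisor)

T_STAR = 2.365  # t critical value, 95% confidence, 7 degrees of freedom
margin = T_STAR * s / math.sqrt(n)

print(f"mean = {x_bar:.3f} lb, s = {s:.3f} lb")
print(f"95% CI: ({x_bar - margin:.3f}, {x_bar + margin:.3f})")
```

The calculation is only as trustworthy as the two assumptions checked above: that the sample is random and that the population of apple weights is approximately normal.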
Three types of assumptions in AP statistics:
Sample is random. (Reasonableness cannot be checked with the data.)
Sample is large enough to assume a limiting distribution for a statistic.
Linear model is appropriate for bivariate data.
Checking assumptions is a big part of all statistics. This should be a part of every inference problem students do all year, not just something they study in an isolated unit.
Thank you for coming!