bullard assumptions talk[1]

Upload: beylerbeyi

Post on 06-Apr-2018


  • 8/3/2019 Bullard Assumptions Talk[1]


    Assumptions for Statistical Inference

    Floyd Bullard

    The NC School of Science & Mathematics

    26-27 January 2007

    Overview

    Random samples

    Limiting distributions of statistics

    Modeling assumptions: linear regression

    When assumptions aren't met

    Conclusion

    Assumptions in the sciences

    Some assumptions we might make when solving problems in the other sciences:

    Physics: There is no air resistance.

    Ecology: Foxes and rabbits are the only animals.

    Epidemiology: People only die of disease or old age.

    Oceanography: Seawater has the same composition everywhere.

    Archaeology: At a given site, older objects are deeper in the ground than younger objects.

    etc.


    Assumptions in (AP) Statistics

    In AP Statistics, nearly all assumptions are of three types.

    The sample is representative of the population.

    The sample is large enough that the distribution of some statistic is approximately equal to its limiting distribution.

    Modeling assumptions. (In AP Statistics, these arise in the regression context.)


    When we extrapolate information from a sample to a population, we are naturally assuming that the sample is representative of the population in some way. In particular, let's suppose that X is some random variable whose distribution over the population is f(x).

    We will be observing data whose distribution is not f(x), but rather g(x|S), the conditional distribution of X given membership in the sample.

    Is it fair to observe g(x|S) and treat it as if it were f(x)?

    Under what conditions are the conditional and unconditional distributions of X the same?


    The distributions f(x) and g(x|S) will be the same if and only if X and S are independent, that is, if the value of the random variable and the element's membership in the sample are completely unrelated to one another. Can we guarantee that?

    Of course we can. If membership in the sample is completely random, then it is independent of anything we can think of. That's why we like random samples so much. They allow us to treat the X's in our sample as if they had the same distribution as those in the population. Random sampling permits inference.
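    A small simulation can make the point concrete. This is only an illustrative sketch: the uniform population, the `biased_sample` scheme (where the chance of inclusion grows with the value of X), and every name in it are assumptions made for the example, not part of the talk.

```python
import random
import statistics

def simple_random_sample(population, n, rng):
    # Membership is independent of the values: every subset of size n is equally likely.
    return rng.sample(population, n)

def biased_sample(population, n, rng):
    # Hypothetical biased scheme: larger values are more likely to be drawn
    # (with replacement), so sample membership S is NOT independent of X.
    return rng.choices(population, weights=population, k=n)

rng = random.Random(0)
population = [rng.uniform(0, 10) for _ in range(10_000)]
pop_mean = statistics.mean(population)

# Average the sample mean over many repeated samples of size 100.
srs_means = [statistics.mean(simple_random_sample(population, 100, rng))
             for _ in range(500)]
biased_means = [statistics.mean(biased_sample(population, 100, rng))
                for _ in range(500)]

print("population mean:     ", round(pop_mean, 2))
print("average SRS mean:    ", round(statistics.mean(srs_means), 2))
print("average biased mean: ", round(statistics.mean(biased_means), 2))
```

    Under these assumptions the random samples should track the population mean closely, while the biased scheme should overestimate it systematically, no matter how many samples are averaged.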


    A Problem

    But there's a problem. Random samples are hard to come by. So we often assume for the sake of inference that our sample is random even though we know for a fact it isn't.

    Is that okay? What will happen if the assumption is really quite wrong?


    Alice's project

    A student named Alice wants to estimate the proportion of students in her school who can name her state's two U.S. Senators. She plans to sample 100 students and ask them to name the two senators. She'll use the sample proportion she gets to construct a confidence interval estimate of the population proportion.

    How should she get her sample?


    Alice's project (continued)

    Here are some ways she might sample 100 students.

    Include all the students in her classes until she gets 100.

    Include her friends and her friends' friends.

    Send out an all-school email and include the first 100 students who reply.

    Stand outside the school in the morning and include every fifth student until she has 100.


    Robert's project

    Robert's school is considering starting school a half hour later in the morning and ending a half hour later in the afternoon. Robert wants to estimate the proportion of students in the school who would be in favor of this. Would Alice's sampling method work for him?


    A popular class project

    You plan to guide your students through a class project in which they will estimate the quality of five brands of paper towels. (The students will determine how to define "quality.") You buy one roll of each of five brands of paper towels and bring them to class. The students take six towels of each brand and measure each one's quality. Parallel boxplots of the brands' quality scores give an idea of which brands are better than others.

    What assumption are you and your students making? Is it justified?


    Capture/recapture

    Forty squirrels are captured in a park and tagged. A month later, fifty squirrels in the park are captured, and ten are found to be tagged. That's 20% of the second sample, so we might assume that $\hat{N} = 5 \times 40 = 200$ is a good estimate of the number of squirrels in the park.

    What assumptions are being made here? Are they reasonable? What will the effect be on the population size estimator $\hat{N}$ if they are not reasonable?
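    The arithmetic behind the estimate can be written as a one-line function. This is a minimal sketch; the function name is hypothetical, and the formula (tagged × second sample size ÷ recaptured, commonly called the Lincoln-Petersen estimator) is simply the proportion argument from the slide rearranged.

```python
def lincoln_petersen(n_tagged, n_second_sample, n_recaptured):
    """Estimate population size as tagged * second_sample / recaptured."""
    if n_recaptured <= 0:
        raise ValueError("need at least one recaptured animal")
    return n_tagged * n_second_sample / n_recaptured

# The squirrel example: 40 tagged, 50 caught later, 10 of those tagged.
print(lincoln_petersen(40, 50, 10))  # 200.0

# Hypothetical violation: tagged squirrels become trap-shy, so only 5 are recaptured.
print(lincoln_petersen(40, 50, 5))   # 400.0
```

    The second call hints at one answer to the last question: if tagged animals are less likely to be caught again, the recapture count shrinks and the estimate is pushed upward.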


    The upshot:

    In practice, we often do not have the luxury of true random samples. We may make the assumption that a sample is a simple random sample (SRS) so that we may extrapolate its properties to the population. Whether this is reasonable or not depends on whether we believe that sample membership and the properties of interest are more or less independent of one another.

    Reasonable people may disagree about whether the assumption is justified.


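    What do all of the following statements have in common?

    $\hat{p} \sim N$

    $\hat{p}_1 - \hat{p}_2 \sim N$

    $\bar{X} \sim N$

    $\dfrac{\bar{X} - \mu}{s/\sqrt{n}} \sim t(n-1)$

    $\dfrac{\bar{X}_1 - \bar{X}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}} \sim t(n)$

    $\sum \dfrac{(O_i - E_i)^2}{E_i} \sim \chi^2(\mathrm{df})$

    They're all limiting distributions.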


    We rely on sample sizes being "large enough" to justify using a limiting distribution. How do we know what's large enough?


    Proportions

    For proportions, we often require that $np$ and $n(1-p)$ both be at least 10 (or sometimes 5). At least one text requires the single condition that $np(1-p) > 5$.

    Where did these come from?


    Let's require that the mean of $\hat{p}$ (which is $p$) be at least three standard deviations (one standard deviation is $\sqrt{p(1-p)/n}$) above 0.

    $p > 3\sqrt{p(1-p)/n}$

    $p^2 > 9p(1-p)/n$

    $np^2 > 9p(1-p)$

    $np > 9(1-p)$

    Note that this is guaranteed by $np > 10$. (Do you see why?)

    And $np > 5$ would guarantee that $p > 2\sqrt{p(1-p)/n}$.
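    One way to see why, written as a chain of inequalities. It uses only the fact that $0 < p \le 1$, so that $1 - p < 1$; the same steps with 2 in place of 3 give the second claim.

```latex
% Three-standard-deviation version
np > 10 \;\Longrightarrow\; np > 9 > 9(1-p)
\;\Longrightarrow\; p^2 > \frac{9\,p(1-p)}{n}
\;\Longrightarrow\; p > 3\sqrt{\frac{p(1-p)}{n}}

% Two-standard-deviation version
np > 5 \;\Longrightarrow\; np > 4 > 4(1-p)
\;\Longrightarrow\; p^2 > \frac{4\,p(1-p)}{n}
\;\Longrightarrow\; p > 2\sqrt{\frac{p(1-p)}{n}}
```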


    The requirement $n(1-p) > 10$ will similarly ensure that $p$ is at least three standard deviations below 1.

    If we are comparing two proportions, then both must obey this rule of thumb.
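    The rule of thumb and the "three standard deviations" reading of it can be checked numerically. The helper names below are hypothetical and the default threshold of 10 is just the most common choice mentioned above; treat this as a sketch rather than a standard routine.

```python
import math

def normal_approx_ok(n, p, threshold=10):
    """Rule of thumb: both np and n(1 - p) must be at least `threshold`."""
    return n * p >= threshold and n * (1 - p) >= threshold

def sds_from_boundaries(n, p):
    """Number of standard deviations of p-hat separating p from 0 and from 1."""
    sd = math.sqrt(p * (1 - p) / n)
    return p / sd, (1 - p) / sd

print(normal_approx_ok(100, 0.2))      # np = 20, n(1 - p) = 80
print(normal_approx_ok(30, 0.1))       # np = 3
print(sds_from_boundaries(100, 0.2))   # distance from 0 and from 1, in SDs
```

    For a comparison of two proportions, the same check would be applied to each sample separately.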

    Means


    $\bar{X}$ will have an approximately normal distribution (and hence $\dfrac{\bar{X} - \mu}{s/\sqrt{n}}$ will have an approximately $t(n-1)$ distribution) if the sample size $n$ is large enough.


    Here's a common rule of thumb.

    If $n \le 10$ and the data display no obvious outliers or skew, then continue with inference using the t distribution; but the inference still relies on the assumption that the population is approximately normal.

    If $10 < n \le 40$ and the data display at most one or two outliers and no severe skew, then continue with inference using the t distribution; the population need not be approximately normal.

    If $n > 40$, then except for extraordinarily severe skew (which would be indicated by numerous outliers), inference using the t distribution is okay. (But you might question whether inference on such a population's mean is what you really want to be doing.)

    Modeling assumptions: linear regression


    Our third type of assumption is the modeling assumption. We choose a mathematical model that we think will describe the underlying phenomenon that generated our data. If the model is very poor, then our inference will be meaningless.

    The only example of this that students see in AP Statistics is the linear regression model, which is:

    $y_i = \beta_0 + \beta_1 x_i + e_i$,

    where $e_i \overset{\text{iid}}{\sim} N(0, \sigma)$.

    In this model there are three parameters to be estimated: $\beta_0$, $\beta_1$, and $\sigma$.


    Another way of stating the model is:

    $y_i \sim N(\beta_0 + \beta_1 x_i, \sigma)$

    In other words, the means of the y's have a linear relationship with the x's, but there is variability in the actual y data about those means: normally distributed errors with constant variability across all values of x.


    To check whether the model is reasonable, we:

    Look at the residuals from the linear regression to see whether there is a pattern.

    Verify that the residuals are of roughly constant magnitude for all x's.

    Check to see whether the residuals appear to be approximately normally distributed.
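    In practice these checks are usually done by eye, with a residual plot and a normal probability plot. As a rough numerical companion, the sketch below generates data from the model itself, fits the least-squares line, and compares the residual spread in the lower and upper halves of the x range. The parameter values (2, 0.5, 1), the helper name `fit_line`, and the half-versus-half comparison are all assumptions made for illustration; it does not attempt the normality check.

```python
import random
import statistics

def fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1 for y = b0 + b1 * x."""
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

rng = random.Random(1)
# Hypothetical data that really do follow the model: beta0 = 2, beta1 = 0.5, sigma = 1.
xs = [i / 10 for i in range(200)]
ys = [2 + 0.5 * x + rng.gauss(0, 1) for x in xs]

b0, b1 = fit_line(xs, ys)
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

# s estimates sigma; divide by n - 2 because two coefficients were estimated.
s = (sum(e ** 2 for e in residuals) / (len(xs) - 2)) ** 0.5

# Crude constant-spread check: residual SD for small x versus large x.
half = len(xs) // 2
spread_low = statistics.stdev(residuals[:half])
spread_high = statistics.stdev(residuals[half:])

print(f"b0 = {b0:.2f}, b1 = {b1:.2f}, s = {s:.2f}")
print(f"residual SD, low x: {spread_low:.2f}; high x: {spread_high:.2f}")
```

    When the model holds, the two spreads should be similar and the estimates should land near the values used to generate the data; a large difference between the two spreads would be a warning sign about the constant-variability assumption.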

    Sample is not random


    If a sample is assumed to be random when in fact there is an association between sample membership and a measured variable of interest, then the sampling procedure is biased. Conclusions will tend to systematically overestimate or underestimate the parameters of interest.

    Proportions: small samples


    At each lattice point in this graph, 10,000 random samples were simulated and a 95% confidence interval estimate of $p$ constructed under the assumption that $\hat{p}$ has a normal distribution. Coverage rates are shown.

    Figure: Confidence level accuracy when assuming normality. Axes: sample size (10 to 100) and true population proportion (0.1 to 0.9). Reference curves: $np = 10$, $np = 5$, $n(1-p) = 10$, $n(1-p) = 5$, $np(1-p) > 5$.
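    A simulation of this kind is easy to sketch for a single lattice point. The version below is an assumption about how such a study could be set up, not the code behind the figure: it uses 2,000 replications rather than 10,000 to keep it quick, the critical value 1.96, and the interval $\hat{p} \pm 1.96\sqrt{\hat{p}(1-\hat{p})/n}$.

```python
import math
import random

def coverage_rate(n, p, n_sims=2000, z=1.96, seed=0):
    """Fraction of simulated normal-approximation 95% intervals that contain p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        successes = sum(rng.random() < p for _ in range(n))
        p_hat = successes / n
        margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - margin <= p <= p_hat + margin:
            hits += 1
    return hits / n_sims

print(coverage_rate(100, 0.5))  # np = n(1 - p) = 50: conditions comfortably met
print(coverage_rate(10, 0.1))   # np = 1: conditions badly violated
```

    Looping `coverage_rate` over a grid of sample sizes and proportions would produce a table comparable in spirit to the graph described above.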

    Means: small samples


    Figure: These four distributions were used to simulate random samples of different sizes. (Four panels, each showing a density f(x) against x for x from 0 to 5.)


    10,000 random samples were simulated for each sample size from 2 to 50. Confidence intervals were computed assuming $\bar{X} \sim N$. Coverage rates are shown.

    Figure: Four panels, one per distribution, each showing estimated coverage probability (0.85 to 1) against sample size (0 to 50).
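    The same idea can be sketched for means. The four densities used in the talk are not recoverable from the transcript, so this example substitutes an exponential population with mean 1 as a stand-in for a skewed distribution; that choice, the 2,000 replications, and the use of the normal critical value 1.96 with the sample standard deviation are all assumptions.

```python
import math
import random
import statistics

def mean_coverage(n, n_sims=2000, z=1.96, seed=0):
    """Coverage of x_bar +/- z * s / sqrt(n) when sampling from Exponential(1), mean 1."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        x_bar = statistics.mean(sample)
        margin = z * statistics.stdev(sample) / math.sqrt(n)
        if abs(x_bar - 1.0) <= margin:
            hits += 1
    return hits / n_sims

for n in (5, 20, 50):
    print(n, mean_coverage(n))
```

    With a skewed population like this one, coverage should fall well short of 95% for very small samples and move toward the nominal level as the sample size grows.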

    Chi-square goodness-of-fit test: small samples


    Here we tested the claimed distribution (0.45, 0.25, 0.20, 0.05, 0.05). Rejection rates from 10,000 samples are shown.

    Figure: Chi-square rejection rates when the null is true. Horizontal axis: sample size (0 to 120). Vertical axis: rejection rate when $\alpha = 0.05$ (0.03 to 0.11).


    Here we tested the claimed distribution (0.45, 0.25, 0.20, 0.09, 0.01). Rejection rates from 10,000 samples are shown.

    Figure: Chi-square rejection rates when the null is true. Horizontal axis: sample size (0 to 120). Vertical axis: rejection rate when $\alpha = 0.05$ (0.03 to 0.11).
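    A rejection-rate study like the two above can be sketched as follows. The setup is an assumption about how one might reproduce it: samples are drawn from the claimed distribution itself, so the null is true, and the statistic is compared with 9.488, the upper 5% point of the chi-square distribution with 4 degrees of freedom. Only 2,000 replications are used here.

```python
import random

CRITICAL_VALUE_DF4 = 9.488  # upper 5% point, chi-square with 4 degrees of freedom

def rejection_rate(probs, n, n_sims=2000, seed=0):
    """Share of null-generated samples whose chi-square statistic exceeds the cutoff."""
    rng = random.Random(seed)
    categories = range(len(probs))
    rejections = 0
    for _ in range(n_sims):
        draws = rng.choices(categories, weights=probs, k=n)
        observed = [0] * len(probs)
        for c in draws:
            observed[c] += 1
        # Sum of (observed - expected)^2 / expected over the categories.
        stat = sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, probs))
        if stat > CRITICAL_VALUE_DF4:
            rejections += 1
    return rejections / n_sims

print(rejection_rate([0.45, 0.25, 0.20, 0.05, 0.05], 100))
print(rejection_rate([0.45, 0.25, 0.20, 0.09, 0.01], 20))
```

    If the chi-square approximation is good, the rejection rate should sit near 0.05; departures from that level at small sample sizes or with very small expected counts are what the two graphs were examining.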

    Modeling assumptions: linear model is a bad model.

    If data do not come from a linear model, but you impose one anyway, then estimates of $\beta_0$, $\beta_1$, and $\sigma$ won't mean much.

    Figure: The orbiting planet of 51 Pegasi. Radial velocity plotted against time, with $\hat{\beta}_0 = 23.8$, $\hat{\beta}_1 = 33.4$, $s = 37.7$.


    But note that nothing about your software will prevent you from estimating them anyway! Without checking the modeling assumptions, you won't know whether the model is any good or not!

    Figure: The orbiting planet of 51 Pegasi. Radial velocity plotted against time.

    A sample problem


    Suppose the following is an AP exam question.

    8 apples are randomly sampled from an orchard, and their weights in pounds are measured to be 0.44, 0.43, 0.33, 0.56, 0.50, 0.50, 0.45, 0.38. Estimate the mean weight of apples in the orchard, including a reasonable margin of error.

    A sample problem (continued)


    The rubric requires students to check inferential assumptions. Suppose one student writes the following:

    $np > 10$ and $n(1-p) > 10$

    $n < 40$, assume normality.

    How do you think the rubric will score this response for the check of assumptions?


    The following would make an AP reader happy.

    Random sample: yes, given in problem. Small data set ($n = 8$) requires population to be approximately normal. Reasonable?

    Figure: Normal probability plot of the eight apple weights.

    Normal probability plot looks roughly linear, so assumption is reasonable. We continue with the construction of a 95% t-confidence interval...
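    For completeness, here is one way the interval itself might be computed from the eight weights. This is a sketch, not the official solution: the critical value 2.365 is the standard table value of $t^*$ for 95% confidence with 7 degrees of freedom, hard-coded so that only the standard library is needed.

```python
import math
import statistics

weights = [0.44, 0.43, 0.33, 0.56, 0.50, 0.50, 0.45, 0.38]  # pounds, from the problem
n = len(weights)
x_bar = statistics.mean(weights)
s = statistics.stdev(weights)      # sample standard deviation (divides by n - 1)

T_CRIT_DF7 = 2.365                 # t* for 95% confidence, 7 degrees of freedom (table value)
margin = T_CRIT_DF7 * s / math.sqrt(n)

print(f"{x_bar:.3f} +/- {margin:.3f} pounds")
```

    The result is an estimate of roughly 0.45 pounds with a margin of error of about 0.06 pounds, which is only as trustworthy as the two assumptions checked above.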


    Three types of assumptions in AP Statistics:

    Sample is random. (Reasonableness cannot be checked with the data.)

    Sample is large enough to assume a limiting distribution for a statistic.

    Linear model is appropriate for bivariate data.

    Checking assumptions is a big part of all statistics. This should be a part of every inference problem students do all year, not just something they study in an isolated unit.


    Thank you for coming!
