7/27/2019 Comparing CA
Larry Zhang
AP Statistics, Mr. Thill
Case Study: Comparing Two Groups
In this case study, I compare 500 high school students from New York and California whose
data were collected in 2013. The students' travel times to school (in minutes) were compared
by state. The data were obtained via the CensusAtSchool random sampling form (http://www.amstat.org/censusatschool/RandomSampleForm.cfm).
To the left is a dot plot of the two groups.
Note that there was an outlier in the New York
group of 11,000 minutes, more time than in an
entire day, which is obviously impossible. This
outlier was removed. Additionally, 10 students
from California neglected to answer this question,
as did 3 from New York, leaving a total of 334 cases
from California and 152 from New York.
It is quite evident that the New York group
(bottom) is more spread out and centered farther
to the right than the California group (top). The
mean commute time is 20.6 minutes for the
California group and 40.2 minutes for the New York
group, with standard deviations of 15.2 and 22.9
minutes respectively. The IQRs, though, are 20 and
22.5 minutes respectively, so while the overall
spread is large, the middle half of the students
in each group have commute times spanning a range of approximately the same width,
only centered at different times. However, the students who live far away from their schools
have a longer commute in New York than they do in California.
Case #24, an individual from California, had an average travel time of 28 minutes. With the
mean and standard deviation for Californians
being 20.6 minutes and 15.2 minutes
respectively, this individual's z-score is
(28 - 20.6)/15.2 = 0.487. This means that she is 0.487
standard deviations (of 15.2 minutes each), or 7.4 minutes, above the
mean commute time of 20.6 minutes for all
California resident cases in this study.
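The z-score arithmetic above can be checked with a short Python sketch. It uses only the summary figures reported in the text; the function and variable names are illustrative and do not come from the original study.

```python
# Summary statistics as reported in the text for the California group.
mean_ca = 20.6   # mean commute time, in minutes
sd_ca = 15.2     # standard deviation, in minutes
commute = 28.0   # Case #24's travel time, in minutes

def z_score(value, mean, sd):
    """Number of standard deviations that value lies above the mean."""
    return (value - mean) / sd

z = z_score(commute, mean_ca, sd_ca)
print(round(z, 3))  # 0.487
```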
A percentile plot of all Californians vs
their commute time shows this individual at the
71.1st percentile. This means that this individual
has a commute time longer than 71.1% of all
California student cases in this study.
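As a rough illustration of how such a percentile can be computed, the sketch below counts the share of cases with a strictly shorter commute, which matches the interpretation given above. The sample list is hypothetical and is not the study's data; statistical software may use slightly different percentile definitions.

```python
def percentile_rank(values, x):
    """Percent of values strictly less than x."""
    below = sum(1 for v in values if v < x)
    return 100.0 * below / len(values)

# Hypothetical commute times in minutes, NOT the study's data.
sample = [5, 5, 10, 10, 15, 15, 20, 28, 30, 45]
rank = percentile_rank(sample, 28)
print(rank)  # 70.0, since 7 of the 10 values are below 28
```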
Were a normal model used to approximate the California group, it would predict a percentile
of normalcdf(-∞, 0.487) = 68.7% for a commute time of 28 minutes (z-score of 0.487). This
means that the normal model predicts that about 68.7% of students would have a commute time shorter
than that of the individual with a z-score of 0.487, or a commute time of 28 minutes.
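The same prediction can be reproduced outside a calculator. The sketch below uses Python's standard-library statistics.NormalDist (available in Python 3.8 and later) as a stand-in for normalcdf, with the mean and standard deviation reported in the text.

```python
from statistics import NormalDist

# Normal model with the California group's reported mean and standard deviation.
model = NormalDist(mu=20.6, sigma=15.2)

# Proportion of the model lying below a 28-minute commute,
# the counterpart of normalcdf with a lower bound of negative infinity.
p_below = model.cdf(28)
print(f"{p_below:.1%}")  # 68.7%
```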
A normal model does approximate the majority of this group to a certain degree. It
doesn't fit the group down to every case, but neither can it be thrown out as a possibility. We
first look at the normal quantile plot.
Looking at the entire group shows us
immediately that a normal model is nowhere near
appropriate. However, eliminating a few
outliers on the edges of the group gives us the
next graph.
In this trimmed group, a normal model
fits the data much better. Note that the large
group at about 5 minutes is to be expected, as
we generally divide shorter periods of time into 5-
and 15-minute pieces; students with a short
commute time most likely gravitate
towards answering with a simple 5
minutes instead of trying to figure out the actual
time. That being said, this is guesswork.
Still, it cannot be ignored that the normal
quantile plot of the trimmed group exhibits
distinctly normal-like behavior.
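For readers who want to build a normal quantile plot themselves, here is a minimal sketch of how its points can be computed. The sample values are hypothetical, not the study's data, and the plotting position (i + 0.5)/n is one common convention that is assumed here; the software used for the plots above may use a different one.

```python
from statistics import NormalDist

def normal_quantile_points(values):
    """Pair each sorted value with the z-score expected at its rank."""
    ordered = sorted(values)
    n = len(ordered)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Assumed plotting position: (i + 0.5) / n for the i-th smallest value.
    return [(std_normal.inv_cdf((i + 0.5) / n), v)
            for i, v in enumerate(ordered)]

# Hypothetical commute times in minutes, NOT the study's data.
sample = [5, 5, 10, 15, 15, 20, 25, 30, 45, 90]
points = normal_quantile_points(sample)
for z, minutes in points:
    print(f"{z:6.2f}  {minutes}")
```

If the points fall close to a straight line, a normal model is plausible; a sharp upward bend at the high end, as a value like 90 would produce here, indicates a long right tail.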
Looking at summary statistics, we see that the mean of the California group is distinctly different
from its median (20.6 as compared to 15). This is most likely due to the large group of
students with long commute times at the upper edge of the group, which pulls the mean upward.
In a normal model, the mean is equal to the median; this isn't even close to being true in the general California group. However, looking
at the percentiles calculated in parts 2b and 3 (71.1% actual vs. 68.7% predicted by the normal model), again we see that
a normal model (almost) accurately describes part of the graph.
Therefore, the California group is NOT described well by a normal model in general. However, if
some data points at the higher end of the group are trimmed off, the group is now described
moderately well by the normal model.