comparing ca

7/27/2019 Comparing CA

Larry Zhang

AP Statistics, Mr. Thill

Case Study: Comparing Two Groups

In this case study, I compare 500 high school students in New York and California from whom various data were collected in 2013. The students' travel times to school (in minutes) were compared based on which state they came from. Data were obtained via the CensusAtSchool random sampling form (http://www.amstat.org/censusatschool/RandomSampleForm.cfm).

To the left is a dot plot of the two groups. Note that there was an outlier in the New York group of 11,000 minutes, more than the 1,440 minutes in an entire day, which is obviously impossible. This outlier was removed. Additionally, 10 students from California neglected to answer this question, as did 3 from New York, leaving a total of 334 cases from California and 152 from New York.
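The cleaning step described above can be sketched in Python. This is only a minimal sketch on made-up responses, not the CensusAtSchool data; the function name `clean_times` and the 1,440-minute cutoff (the number of minutes in a day) are assumptions made for illustration.

```python
MINUTES_PER_DAY = 24 * 60  # 1,440 minutes; a commute cannot exceed this


def clean_times(raw_times):
    """Drop unanswered questions (None) and physically impossible times."""
    return [t for t in raw_times
            if t is not None and 0 <= t <= MINUTES_PER_DAY]


# Hypothetical responses: one blank answer and one impossible value.
raw = [15, None, 45, 11000, 30]
cleaned = clean_times(raw)
print(cleaned)  # [15, 45, 30]
```

A fixed cutoff is a simple choice; a different rule for flagging impossible values would work equally well here.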

It is quite evident that the New York group (bottom) is more spread out and centered farther to the right than the California group (top). The mean commute time is 20.6 minutes for the California group and 40.2 minutes for the New York group, with standard deviations of 15.2 and 22.9 minutes respectively. With IQRs of 20 and 22.5 minutes respectively, though, it seems that, while the overall spread is large, most students within each group fall in approximately the same range of commute times, only centered at different values. However, the students who live far away from their schools have a longer commute in New York than they do in California.
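The three summary statistics used above (mean, standard deviation, IQR) could be computed as in the sketch below. The sample values are hypothetical, not the study's data, and `summarize` is an illustrative name. Note that software packages use different quartile conventions, so an IQR computed this way may differ slightly from the one a graphing calculator reports.

```python
import statistics


def summarize(times):
    """Return the mean, sample standard deviation, and IQR of a list."""
    # "inclusive" treats the data as the whole group of interest;
    # this is one of several common quartile conventions.
    q1, _, q3 = statistics.quantiles(times, n=4, method="inclusive")
    return {
        "mean": statistics.mean(times),
        "sd": statistics.stdev(times),
        "iqr": q3 - q1,
    }


# Hypothetical commute times in minutes (not the study's data).
sample = [5, 10, 15, 20, 25, 30, 60]
print(summarize(sample))
```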

Case #24, an individual from California, had an average travel time of 28 minutes. With the mean and standard deviation for Californians being 20.6 minutes and 15.2 minutes respectively, this individual's z-score is (28 - 20.6)/15.2 = 0.487. This means that her commute time is 0.487 standard deviations (of 15.2 minutes each) above the mean commute time of 20.6 minutes for all California cases in this study.
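As a check on the arithmetic, the z-score can be reproduced in a few lines of Python using the figures reported above. The function name `z_score` is just an illustrative choice.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies above the mean."""
    return (value - mean) / sd


# Figures reported in the text for Case #24 and the California group.
z = z_score(28, 20.6, 15.2)
print(round(z, 3))  # 0.487
```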

A percentile plot of commute time for all Californians shows this individual at the 71.1st percentile. This means that this individual has a commute time longer than that of 71.1% of all California student cases in this study.
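The percentile as interpreted above (the percent of cases with a shorter commute) can be sketched as follows. The list of times is hypothetical, not the California data, and `percentile_rank` is an illustrative name; other definitions of percentile treat ties differently and may give slightly different values.

```python
def percentile_rank(values, x):
    """Percent of values strictly less than x."""
    below = sum(1 for v in values if v < x)
    return 100 * below / len(values)


# Hypothetical commute times in minutes (not the study's data).
times = [5, 10, 15, 20, 25, 28, 30, 45, 60, 90]
print(percentile_rank(times, 28))  # 50.0
```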

Were a normal model used to approximate the California group, it would predict a percentile for a commute time of 28 minutes (z-score of 0.487) of normalcdf(-∞, 0.487) = 68.7%. This means that the normal model predicts that about 68.7% of students would have a commute time lower than that of the individual with a z-score of 0.487, that is, lower than 28 minutes.
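The same normal-model prediction can be reproduced without a calculator. The sketch below uses `NormalDist` from Python's standard `statistics` module as a stand-in for the calculator's normalcdf; it is one way to check the figure, not the method used in the original work.

```python
from statistics import NormalDist

# Standard normal model: mean 0, standard deviation 1.
p = NormalDist().cdf(0.487)
print(round(100 * p, 1))  # 68.7

# Equivalent calculation in minutes, using the reported mean and SD.
p_minutes = NormalDist(mu=20.6, sigma=15.2).cdf(28)
```

Both calls give the area under the normal curve to the left of the individual's commute time; the second simply skips the manual conversion to a z-score.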

A normal model actually does approximate the majority of this group, to a certain degree. It doesn't fit the group down to every case, but neither can it be thrown out as a possibility. We first look at the normal quantile plot.

Looking at the entire group shows us immediately that a normal model is nowhere near appropriate. However, eliminating a few outliers on the edges of the group gives us the next graph.

In this trimmed group, a normal model fits the data much better. Note that the large group at about 5 minutes is to be expected, as we generally divide shorter periods of time into 5 and 15 minute pieces; students with a short commute time most likely gravitate psychologically towards answering with a simple 5 minutes instead of trying to figure out the actual time. That being said, this is guesswork. However, it cannot be ignored that this normal quantile plot of the trimmed group exhibits distinctly normal-like behavior.
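For readers who want to see what a normal quantile plot is built from, here is a minimal sketch. It pairs each sorted data value with the standard normal quantile at the plotting position (i - 0.5)/n, which is one common convention and an assumption here; the software used for the original plots may use a different one. The data are hypothetical, and `normal_quantile_points` is an illustrative name.

```python
from statistics import NormalDist


def normal_quantile_points(values):
    """Pair each sorted value with its theoretical standard normal quantile."""
    ordered = sorted(values)
    n = len(ordered)
    std_normal = NormalDist()
    # Plotting position (i - 0.5) / n keeps every probability inside (0, 1).
    return [(std_normal.inv_cdf((i - 0.5) / n), v)
            for i, v in enumerate(ordered, start=1)]


# Hypothetical commute times in minutes (not the study's data).
times = [5, 5, 10, 15, 20, 25, 30, 45]
for z, t in normal_quantile_points(times):
    print(f"{z:6.2f}  {t}")
```

If the data followed a normal model, these points would fall close to a straight line when plotted.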

Looking at summary statistics, we see that the mean of the California group is distinctly different from its median (20.6 minutes as compared to 15). Again, though, this is most likely due to the large group of students with long commute times at the edge of the group. In a normal model, the mean is equal to the median; this isn't even close to being true in the general California group. However, looking at the percentiles calculated in parts 2b and 3 (71.1% actual vs. 68.7% predicted by the normal model), again we see that a normal model (almost) accurately describes part of the graph.
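The effect of a few long commutes on the mean, and of trimming them, can be illustrated with a small made-up example. The values, the 60-minute cutoff, and the name `mean_median_gap` are all assumptions for illustration and do not come from the study.

```python
import statistics


def mean_median_gap(times):
    """Return (mean, median, mean - median) for a list of commute times."""
    mean = statistics.mean(times)
    median = statistics.median(times)
    return mean, median, mean - median


# Hypothetical commute times in minutes, with one long commute at the high end.
times = [5, 10, 10, 15, 15, 20, 25, 90]
print(mean_median_gap(times))    # mean is pulled well above the median

# Trim the high end; the 60-minute cutoff is an arbitrary assumption.
trimmed = [t for t in times if t < 60]
print(mean_median_gap(trimmed))  # mean and median are now close
```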

Therefore, the California group is NOT described well by a normal model in general. However, if some data points at the higher end of the group are trimmed off, the remaining group is described moderately well by the normal model.
