Andrew Chuang STAT 234 Pset 4


Upload: andrew-chuang

Post on 11-Jul-2016



Andrew Chuang, Susan Yan, Eileen Li
STAT 234 Pset 4

4. A. The summary statistics are very close to what we would expect from a uniform distribution on [0, 2]. The first quartile is 0.566, which is close to the theoretical first quartile we get by multiplying the right endpoint by the 25th percentile: 2(0.25) = 0.5. The median, 1.05, is close to what we get if we multiply the right endpoint by 50%: 2(0.5) = 1. The mean, 1.036758, is similarly close to 1. The third quartile, 1.533, is close to 2(0.75) = 1.5. The sample values still differ slightly from the theoretical quartiles, as expected for only 100 randomly generated numbers.
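As a check on the reference values quoted above, the theoretical quartiles of a uniform distribution on [0, 2] can be computed with base R's `qunif`. This is a minimal sketch rather than part of the original submission; the names `probs`, `theory`, and `theory_mean` are illustrative.

```r
# Theoretical quartiles of Uniform(0, 2), for comparison with favstats(y).
# The variable names are illustrative and do not come from the assignment.
probs  <- c(0.25, 0.50, 0.75)
theory <- qunif(probs, min = 0, max = 2)  # quantile = min + p * (max - min)
theory_mean <- (0 + 2) / 2                # mean of Uniform(a, b) is (a + b) / 2

print(theory)       # 0.5 1.0 1.5
print(theory_mean)  # 1
```

Because the left endpoint is 0, the general formula min + p(max - min) reduces to multiplying the right endpoint by the percentile, which is the shortcut used in the text.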

> require(mosaic)
> n <- 100
> y <- runif(n, min=0, max=2)
> favstats(y)

> bwplot(~ y)

This box-and-whisker plot is very close to symmetric. The endpoint of the left whisker is a little farther from the median than the endpoint of the right whisker, and the right endpoint does not quite reach 2.0. The first and third quartiles are very close to where we would expect them to be, as the summary statistics from the R console already showed.

> bins <- seq(from=0, to=2, by=0.2)
> bins
 [1] 0.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6 1.8 2.0

This is the sequence of numbers from 0 to 2, with a separation of 0.2.

> histogram(~ y, breaks=bins, type="density")
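To judge how flat a density histogram of this kind should look, it may help to compute the expected bar height and how much a single bar can vary by chance. The sketch below assumes each bin count follows a Binomial(n, 0.1) distribution, which holds for independent uniform draws and ten equal bins. The variable names are illustrative.

```r
# Expected bar height and its chance variability for a density histogram
# of n draws from Uniform(0, 2) with bins of width 0.2.
# Assumption: each bin count is Binomial(n, p_bin); names are illustrative.
n     <- 100
width <- 0.2
p_bin <- width / (2 - 0)                        # chance a draw lands in one bin: 0.1

expected_height <- dunif(1, min = 0, max = 2)   # flat density height: 0.5
sd_count  <- sqrt(n * p_bin * (1 - p_bin))      # sd of a bin count: 3
sd_height <- sd_count / (n * width)             # sd of a bar height: 0.15

print(c(expected_height, sd_height))
```

Under these assumptions, bars at 0.6 or 0.7 sit roughly one standard deviation above the flat height of 0.5, so some unevenness is ordinary for n = 100.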


B. The histogram is not entirely consistent with my drawing of the histogram in exercise 4.63. The area is not distributed evenly: there is a gap in the middle of the histogram, where I would have expected the height to be around 0.5. (We got 0.5 by dividing the total area under the density curve, 1, by the right endpoint, 2.) Some bins contain many entries and reach heights of 0.6 and 0.7, well above the flat height in my drawing for 4.63.

C.
> sum(y <= 1.6) / n
[1] 0.83

D.
> ( sum(y < 1.7) - sum(y <= 0.5)) / n
[1] 0.6
> sum(y >= 0.95) / n
[1] 0.56
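For reference, the sample proportions in parts C and D can be compared with the exact probabilities under a uniform distribution on [0, 2], using base R's `punif`. The events are read off the code above, since the exercise wording is not reproduced here. The names `p_c`, `p_d1`, and `p_d2` are illustrative.

```r
# Exact probabilities for Y ~ Uniform(0, 2), matching the events in C and D.
# The events are inferred from the sum() expressions in the write-up.
p_c  <- punif(1.6, min = 0, max = 2)            # P(Y <= 1.6)
p_d1 <- punif(1.7, min = 0, max = 2) -
        punif(0.5, min = 0, max = 2)            # P(0.5 < Y < 1.7)
p_d2 <- 1 - punif(0.95, min = 0, max = 2)       # P(Y >= 0.95)

print(c(p_c, p_d1, p_d2))   # 0.800 0.600 0.525
```

The sample values 0.83, 0.6, and 0.56 are each within a few hundredths of these exact probabilities.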

E. The summary statistics are again very close to what we would expect from a uniform distribution on [0, 2]. The first quartile is 0.555, which is close to the theoretical first quartile we get by multiplying the right endpoint by the 25th percentile: 2(0.25) = 0.5. The median, 1.03, is close to 2(0.5) = 1. The mean, 1.029, is similarly close to 1. The third quartile, 1.519, is close to 2(0.75) = 1.5. The sample values still differ from the theoretical quartiles, but the differences are smaller than those we saw with a sample size of only 100. This can be attributed to the larger sample size in this round of random generation.

> n <- 1000
> y <- runif(n, min=0, max=2)
> favstats(y)

> bwplot(~ y)


> bins <- seq(from=0, to=2, by=0.2)
> bins
 [1] 0.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6 1.8 2.0

This is the sequence of numbers from 0 to 2, with a separation of 0.2.

> histogram(~ y, breaks=bins, type="density")

The histogram is more consistent with the histogram I drew for 4.63: the area is spread more evenly across the whole range, and even the tallest bars are not higher than 0.6. In the other histogram, some bars reached 0.6 and even 0.7.

> sum(y <= 1.6) / n
[1] 0.802
> ( sum(y < 1.7) - sum(y <= 0.5)) / n
[1] 0.624
> sum(y >= 0.95) / n
[1] 0.537
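One way to judge whether the reported proportions are plausible is to express each one's distance from the exact uniform probability in standard errors. The sketch below copies the sample proportions from the console output in this write-up. The helper `se_prop` is a hypothetical name, not a function from mosaic or base R.

```r
# Exact probabilities under Uniform(0, 2) for the three events in C and D.
theory  <- c(1.6 / 2, (1.7 - 0.5) / 2, (2 - 0.95) / 2)   # 0.8, 0.6, 0.525

# Sample proportions copied from the console output in this write-up.
obs100  <- c(0.83, 0.60, 0.56)      # n = 100
obs1000 <- c(0.802, 0.624, 0.537)   # n = 1000

# Hypothetical helper: standard error of a sample proportion.
se_prop <- function(p, n) sqrt(p * (1 - p) / n)

# Distance from the exact value, measured in standard errors.
z100  <- (obs100  - theory) / se_prop(theory, 100)
z1000 <- (obs1000 - theory) / se_prop(theory, 1000)
print(round(rbind(z100, z1000), 2))
```

All six proportions fall within two standard errors of their exact values, which is consistent with ordinary sampling variation at both sample sizes.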

F. First, with the larger sample size, both the mean and the median of the sample are closer to 1, which is the true mean and median of the uniform distribution from 0 to 2. Second, in the histogram for n=1000, the density values vary less from interval to interval than in the histogram for n=100, which means the sample gives a more accurate picture of the population. However, the two sets of results are still very close to each other, which shows that a large increase in sample size buys only a small gain in accuracy.
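The modest gain from a tenfold increase in sample size can be illustrated with the standard error of the sample mean, which shrinks in proportion to 1/sqrt(n). This is a sketch under the assumption of independent draws from a uniform distribution on [0, 2]. The names `sd_unif` and `se_mean` are illustrative.

```r
# Standard error of the sample mean for Uniform(0, 2) at both sample sizes.
# Assumes independent draws; names are illustrative.
sd_unif <- (2 - 0) / sqrt(12)                  # sd of Uniform(a, b) is (b - a) / sqrt(12)
se_mean <- function(n) sd_unif / sqrt(n)

se_100  <- se_mean(100)     # about 0.058
se_1000 <- se_mean(1000)    # about 0.018
ratio   <- se_100 / se_1000 # sqrt(10), about 3.16

print(c(se_100, se_1000, ratio))
```

Ten times as many observations reduce the typical error of the mean only by a factor of about 3.16, which matches the observation that accuracy improves slowly as n grows.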
