labs 2.0

Upload: derk9012

Post on 04-Apr-2018

  • 7/29/2019 Labs 2.0

    1/22

    Descriptive Statistics

                         N   Minimum   Maximum   Mean     Std. Deviation
    Derek                7   1.00      9.00      4.7143   2.69037
    Olinger              7   1.00      8.00      5.1429   2.54484
    Valid N (listwise)   7

    Descriptive Statistics

                         N     Minimum   Maximum   Mean      Std. Deviation
    Bathrooms            362   1         4         1.61      .543
    Annual taxes         362   1000      6470      3390.91   890.435
    Valid N (listwise)   362

    The shape of V1 is unusual. It peaks early and then comes back down, only to peak again. A normally

    distributed graph should rise to a peak then fall down to the starting point.

    The shape of this histogram slightly resembles a bell curve. It starts low and peaks, then falls back down. It is not quite smooth, though, as there are some outliers and its rise and fall are choppy rather than gradual. The shape of Mean1 does resemble a bell-shaped curve, though. Since the data in Mean1 was formed from the means of many samples, the Central Limit Theorem says that its histogram should resemble a bell-shaped curve.
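
    The effect described above can be sketched with a short simulation. This is only an illustration, not the lab's SPSS procedure: the exponential population, the sample size of 30, and the 2000 repetitions are all assumed values chosen for the example.

```python
import random
import statistics

def sample_means(n_samples, sample_size, seed=0):
    """Means of n_samples samples drawn from a skewed (exponential) population.

    The exponential population with mean 1 is an assumption for illustration.
    """
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(sample_size)])
        for _ in range(n_samples)
    ]

means = sample_means(n_samples=2000, sample_size=30)

# The means cluster around the population mean (1) with a spread near 1 / sqrt(30),
# even though the population itself is strongly skewed.
print(round(statistics.fmean(means), 3))
print(round(statistics.stdev(means), 3))
```

    A histogram of `means` would look roughly bell-shaped, while a histogram of the raw exponential draws would not.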

    Page 61 #5:

    Yes, the mean of Difference was what I expected, because if the mean of Mean1 is expected to be 1 and

    the mean of Mean2 is expected to be .5, then the mean of Difference should have been around .5.
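
    The reasoning that the mean of Difference should land near 1 - .5 = .5 can be checked with a small simulation. The normal populations, the standard deviation of 1, and the sample size of 25 are assumptions made for this sketch; only the two expected means come from the answer above.

```python
import random
import statistics

rng = random.Random(1)

def mean_of_sample(mu, sigma, n):
    """Mean of one sample of size n from a normal population (assumed shape)."""
    return statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])

# Difference between a Mean1-style sample mean (expected 1) and a
# Mean2-style sample mean (expected .5), repeated many times.
differences = [
    mean_of_sample(1.0, 1.0, 25) - mean_of_sample(0.5, 1.0, 25)
    for _ in range(2000)
]
mean_difference = statistics.fmean(differences)
print(round(mean_difference, 2))
```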

    It is very unlikely that the difference between the means of samples from any two populations will be more than three standard errors above or below the difference between the two population means.
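
    To put a number on "very low", here is a minimal sketch that assumes the sampling distribution of the difference is approximately normal (the usual large-sample assumption, not something stated in the lab).

```python
from statistics import NormalDist

def prob_beyond(k):
    """Two-sided probability of landing more than k standard errors from the mean."""
    return 2 * (1 - NormalDist().cdf(k))

p3 = prob_beyond(3)
print(round(p3, 4))  # 0.0027
```

    Under that assumption, fewer than 3 in 1000 sample differences would fall that far out.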

    Most of the sample differences lie in the middle of the graph.

    The histogram would better resemble a bell-shaped curve, and the mean would become closer to .5.

    Descriptive Statistics

                         N    Minimum   Maximum   Mean     Std. Deviation
    Grade Average        10   2.20      4.00      3.5040   .60476
    Valid N (listwise)   10

    There are 14 intervals.

    The interval width is 2.

    There are 50 data points in the entire 26 mile interval.

    Question on page 24:

    The interval width expanded to 3.

    Page 14:

    If you increase the interval size in a histogram, the number of intervals will decrease.

    You have too many intervals and should decrease the number of them.
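
    The trade-off between interval width and interval count is simple arithmetic, sketched below. The axis span of 28 is an assumption, picked so that a width of 2 gives 14 intervals; the lab does not state the exact span SPSS used.

```python
import math

def interval_count(data_range, width):
    """Number of equal-width intervals needed to cover data_range."""
    return math.ceil(data_range / width)

print(interval_count(28, 2))  # 14
print(interval_count(28, 3))  # 10
```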

    Lab 4: Part 2

    Pg35:

    The mean is 24.35, the median is 7.15, and the standard deviation is 98.379.

    The Channel Catfish had the highest mean level of DDT. The lowest was the Large-Mouth Bass.

    325 had the highest mean level

    The Small-mouth Buffalo fish is the heaviest on average. The lightest is the Large-Mouth Bass.

    The longest on average is the Channel Catfish. The shortest is the Large-mouth Buffalo.

    The different standard deviations of DDT levels among the species of fish tell us how widely each species' data is spread. For Channel Catfish, the standard deviation is about 119.475. This means that values typically differ from the group mean by around 119. A spread this large suggests that there are likely a lot of outliers. For the Small-mouth Buffalo, the standard deviation is 11.283, which means that the values typically differ from the group's average by only about 11. This can be attributed to having few outliers. For the Large-mouth Bass the standard deviation is 2.043; therefore, the amount of DDT typically differs from the average by only about 2. The Large-mouth Bass has the tightest grouping of data.
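
    The link between outliers and a large standard deviation can be seen on a toy example. The readings below are hypothetical, not the lab's DDT data; the second list only adds one extreme value to the first.

```python
import statistics

tight = [10, 11, 12, 11, 10, 12]    # hypothetical readings close together
spread = [10, 11, 12, 11, 10, 300]  # same kind of readings with one extreme value

sd_tight = statistics.stdev(tight)
sd_spread = statistics.stdev(spread)
mean_spread = statistics.fmean(spread)
median_spread = statistics.median(spread)

# One outlier inflates the standard deviation and pulls the mean
# well above the median, as with the overall DDT mean and median.
print(round(sd_tight, 2), round(sd_spread, 2))
print(mean_spread, median_spread)
```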

    Lab 4: Part 3

    Lab 4: Part 4

    The probability of rejection is .07548.

    About 11 runs got rejected for having more than two bad protectors.

    About 5.5% of all runs were rejected.
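
    A rejection probability of this kind is a binomial tail: the chance of more than two bad protectors in a run. The run size and defect rate are not given here, so n = 20 and p = 0.05 below are assumptions, chosen because they happen to reproduce the .07548 figure reported above.

```python
from math import comb

def prob_more_than(k, n, p):
    """P(X > k) for X ~ Binomial(n, p)."""
    return 1 - sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

# Assumed run size (20) and defect rate (0.05); not stated in the lab answers.
p_reject = prob_more_than(2, n=20, p=0.05)
print(round(p_reject, 5))  # 0.07548
```

    The simulated rejection rate of about 5.5% is in the same neighborhood as this theoretical value, which is what one would expect from a modest number of simulated runs.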

    Page 25:

    No, they will not. Ideally this would be the case, but in reality it does not work out this way. Although there are 4 sides of a dreidel, if you spin it 4 times you may not get all the possible outcomes; in fact, the probability of doing so is rather low. According to the central limit theorem, the greater the sample size the closer to a normal distribution you will get. We can see this by comparing the one20 histogram to the one1000 histogram. The first does not appear to have a normal distribution; in other words, it is not very symmetrical. As we increase the sample size to 1000, though, the histogram shows a more normal distribution and is therefore more symmetrical.
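
    The claim that four spins rarely show all four sides can be made exact, assuming a fair dreidel with four equally likely sides: there are 4! orderings that show every side once, out of 4^4 possible sequences.

```python
from math import factorial

def prob_all_sides(sides):
    """Chance that `sides` spins of a fair `sides`-sided top show every side once."""
    return factorial(sides) / sides**sides

print(prob_all_sides(4))  # 0.09375
```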

    Page 26:

    The probability of an event that is certain to occur is 1. When you roll two dice you are guaranteed to get

    a face value sum of 2-12. That is to say, there is a 100% chance of getting a face value sum of 2-12.

    Therefore the .0225 probability of getting a 2 is a portion of that total 100%. Loaded dice skew the distribution of the outcomes. The face value sum of 2 had a higher probability than the rest, which means that you had a greater chance of rolling a two than any other number. Therefore, the face value sums that included a roll of 2 have a higher probability of occurring.
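
    Both points, that the sum probabilities add up to 1 and that loading shifts probability toward some sums, can be checked by enumerating the 36 outcomes. The loading used below (face 2 comes up half the time) is a hypothetical choice for illustration, not the weighting used in the lab.

```python
from collections import Counter
from fractions import Fraction

def sum_distribution(weights):
    """Distribution of the sum of two identical dice.

    weights[i] is the probability of face i + 1 on each die.
    """
    dist = Counter()
    for a, pa in enumerate(weights, start=1):
        for b, pb in enumerate(weights, start=1):
            dist[a + b] += pa * pb
    return dist

fair = sum_distribution([Fraction(1, 6)] * 6)
# Hypothetical loading: face 2 has probability 1/2, the other faces 1/10 each.
loaded = sum_distribution([Fraction(1, 10), Fraction(1, 2)] + [Fraction(1, 10)] * 4)

print(sum(fair.values()), sum(loaded.values()))  # both totals are 1
print(fair[4], loaded[4])                        # a sum that can include a 2
```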

    You would load the dice a little bit more. The number 2 is more likely to appear every time you roll the dice. In the unfair experiment the number of face value sums that contained a 2 increased rather dramatically compared to the fair trial.

    Descriptive Statistics

                         N    Minimum   Maximum   Mean     Std. Deviation
    n                    10   3.21      6.43      4.7630   1.12129
    Valid N (listwise)   10

    Descriptive Statistics

                         N      Minimum   Maximum   Mean     Std. Deviation
    VAR00002             1000   2.21      8.23      5.0441   1.00464
    Valid N (listwise)   1000

    Page 48:

    1. Yes, the mean is relatively close. The sample mean is 4.76 and the population mean is 5. The standard deviation is also close. The sample standard deviation is 1.12 and the population standard deviation is 1. The sample graph doesn't really resemble the population graph because it does not resemble a normal bell curve at all. Although the sample mean and standard deviation are similar to the population, the histogram does not represent the overall population well.

    2. Yes, the sample mean is 5.04, which is very close to the population mean of 5. The standard deviation is 1.005, which is also very close to the population standard deviation of 1. In addition, the histogram has a normal bell curve shape. The sample of 1000 does a good job of representing the population.

    3. The sample mean of the 1000 was only .04 away from the population mean of 5, whereas the sample size of 10 had a mean that was .24 away from 5. The sample standard deviation of the 1000 was only .005 away from the population standard deviation of 1, whereas the sample size of 10 had a standard deviation that was .121 away from 1. The 1000 sample graph looks more like the population graph than the smaller sample's graph did. In comparison, the larger sample size represents the overall population better than the smaller sample, which is consistent with the central limit theorem.
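
    The comparison between the samples of 10 and 1000 can be reproduced in outline with a seeded simulation. This sketch assumes the population is normal with mean 5 and standard deviation 1, as the answers above describe; the exact values SPSS generated will differ from these.

```python
import random
import statistics

def describe_sample(n, mu=5.0, sigma=1.0, seed=0):
    """Mean and standard deviation of one random sample of size n."""
    rng = random.Random(seed)
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    return statistics.fmean(sample), statistics.stdev(sample)

for n in (10, 1000):
    mean, sd = describe_sample(n)
    # sigma / sqrt(n) is the typical distance between a sample mean and 5.
    print(n, round(mean, 2), round(sd, 2), "standard error:", round(1.0 / n**0.5, 3))
```

    The standard error shrinks from about .316 at n = 10 to about .032 at n = 1000, which is why the larger sample's mean is expected to sit closer to 5.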

    Part 2:

    Page 52:

    1. The expected mean is 5.

    2. The expected standard deviation is 3.

    3. The histograms differ because the means histogram more closely resembles a bell curve than the V1 histogram.

    4. The mean of the means is 4.99, which is very close to 5. The mean of the means should be very close to 5, because the mean for each variable should be somewhat close anyway.

    5. The standard deviation for the mean of the means histogram is .44. This tells us that the variable means only stray about .44 away from the population mean of 5, indicating they are not very spread out.

    Part 3:

    Page 53:

    1) The means graph began to take shape as I pressed animate to add more data. In the beginning it was skewed and varied widely. As I continued to add data the graph began to take on a more uniform bell shape. As I continued to press animate the graph got closer and closer to the population graph mean of 16. The graph standard deviation also approached the population graph standard deviation of 5 as I added more data.

    2) With each increase in the number of samples, the sample data graph's mean, standard deviation, and median came closer and closer to the parent population's. This is consistent with the central limit theorem in that as the number of samples increases the sample graph begins to look like the parent population graph.

    Sample Size         Mean    Standard Deviation
    Parent Population   16      5
    5                   15.22   .96
    1000                15.89   2.05
    10000               15.99   2.23

    3) Increasing the sample size made the graph more symmetrical and it resembled the parent graph more than the smaller sample size. Changing the sample size didn't really affect the mean for me. I got a mean of 16 for both the 1000 and 10000 sample sizes, which could be due to the fact that with 1000 the sample size is large enough to accurately resemble the parent population. The standard deviation went down as I increased the sample size, which means that there are fewer outliers.

    Sample Size         Mean    Standard Deviation
    Parent Population   16      5
    5                   16.10   .69
    1000                16      .96
    10000               16      1.01

    4) Changing the sample size made the graph start to resemble the skewed parent population graph more and more. Changing the sample size increased the mean of the means, ending up only being .01 off of the parent graph after 10,000 samples. The standard deviation increased after jumping to 1,000 samples, but then slowly started decreasing as it became more uniform.

    Sample Size         Mean   Standard Deviation
    Parent Population   8.08   6.22
    5                   5.5    0
    1000                8.03   2.74
    10000               8.07   2.77
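
    The standard deviations in the last rows of the three tables line up with the usual standard error formula, population standard deviation divided by the square root of the number of observations per sample. The per-sample sizes of 5 and 25 used below are assumptions inferred from those numbers; the tables themselves do not state them.

```python
import math

def standard_error(sigma, n):
    """Standard deviation of sample means for samples of n observations."""
    return sigma / math.sqrt(n)

# Assumed observations per sample: 5, 25, and 5 for the three tables.
print(round(standard_error(5, 5), 2))     # 2.24, table 1 reports 2.23
print(round(standard_error(5, 25), 2))    # 1.0,  table 2 reports 1.01
print(round(standard_error(6.22, 5), 2))  # 2.78, table 3 reports 2.77
```

    If those sizes are right, the standard deviation of the means settles near this value as more samples are added, rather than approaching the parent population's standard deviation.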

    Part 4:

    Page 46:

    1) The shape of the histogram will take a smooth bell shape because it only shows the means of the samples, which are relatively close together. Since there are fewer outliers, the histogram will be regular in shape.

    2) If you have a normal distribution and take the mean of the means, you will get a graph that resembles a bell shape. The graph isn't as widely spread and becomes very tight and regular. If enough data was added, it could be symmetrical.

    3) If the original population is normal then the graph will have a normal distribution after adding enough data. A non-symmetric population distribution does not mean that the mean of means graph will be non-symmetric as well. We didn't put in normal binomials for our means variable graph, and though it is not 100% symmetric and bell-shaped, you can see that it is starting to take that shape. That means that regardless of whether or not the population is normal, the graph of the means will end up being approximately normally distributed.

    4) The central limit theorem states that as the sample size increases, the distribution of the sample means more closely resembles a normal distribution, whatever the shape of the parent population.

    5) The standard deviation is affected by the range of values. If the values range widely then the standard deviation will be higher. If the values are all relatively similar then the standard deviation will be lower. If you make a graph of the sample means, the standard deviation is much lower than that of the original population. This is because there isn't that much variation within the means. Outliers will still somewhat affect the standard deviation, but not nearly as much as in the original population, because it's based on means rather than on every individual value.

    6) This example utilizes the Central Limit Theorem because the population is positively skewed. The central limit theorem says that the larger the sample, the more normally distributed the sample mean becomes. You can then standardize the sample mean using the formula that shifts the mean to 0 and scales the standard deviation to 1.

    Then using this you can find the areas, which are the probabilities, below the z scores 2.94 and 3.06. You then subtract these to get only the segment that you want.
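
    The subtraction step can be sketched as follows, taking 2.94 and 3.06 as z scores exactly as the answer states them. The function name is made up for this example.

```python
from statistics import NormalDist

def area_between(z_low, z_high):
    """Area under the standard normal curve between two z scores."""
    standard_normal = NormalDist(mu=0, sigma=1)
    return standard_normal.cdf(z_high) - standard_normal.cdf(z_low)

area = area_between(2.94, 3.06)
print(round(area, 5))
```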

    Page 63 #5:

    1. The second data set has a wider confidence interval. It ranges from about 19 to about 34, while the first goes from about 25 to about 27.

    2. The data in number two has a greater range than number one. The first one goes from 24 to 28. The second data set goes from 14 to 37.

    3. Standard deviation, because the data in the second one has a larger standard deviation than that of the first one. This widens the confidence interval.

    One-Sample Test (Test Value = 13)

                                                                95% Confidence Interval of the Difference
              t       df   Sig. (2-tailed)   Mean Difference    Lower      Upper
    Sample1   1.103   9    .299              1.50000            -1.5769    4.5769

    One-Sample Test (Test Value = 25)

                                                                95% Confidence Interval of the Difference
              t        df   Sig. (2-tailed)   Mean Difference   Lower       Upper
    Sample1   -7.720   9    .000              -10.50000         -13.5769    -7.4231
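
    The confidence limits in the two One-Sample Test tables can be recovered from the other columns: the standard error is the mean difference divided by t, and the limits are the mean difference plus or minus the critical t value times that standard error. The sketch below hardcodes 2.262, the standard two-sided 95% critical value for 9 degrees of freedom from a t table, because Python's standard library has no t distribution; the function name is hypothetical.

```python
T_CRIT_DF9 = 2.262  # two-sided 95% critical value of t with 9 degrees of freedom

def ci_from_t(mean_difference, t_statistic, t_crit=T_CRIT_DF9):
    """Rebuild a confidence interval from a mean difference and its t statistic."""
    standard_error = mean_difference / t_statistic
    margin = t_crit * standard_error
    return mean_difference - margin, mean_difference + margin

lower, upper = ci_from_t(1.5, 1.103)        # Test Value = 13
lower2, upper2 = ci_from_t(-10.5, -7.720)   # Test Value = 25
print(round(lower, 2), round(upper, 2))
print(round(lower2, 2), round(upper2, 2))
```

    The first interval contains 0, which matches the non-significant p-value of .299; the second does not, which matches the p-value reported as .000.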

    Page 75:

    1. The tails are infinite, so they never actually reach zero; they just get so small that SPSS says 0.0.

    Part 2:

    One-Sample Test (Test Value = 25)

                                                                95% Confidence Interval of the Difference
              t        df   Sig. (2-tailed)   Mean Difference   Lower       Upper
    Sample1   -7.720   9    .000              -10.50000         -13.5769    -7.4231

    Page 70:

    1. Is the mean weight of the Florida birds less than the population weight? H0: μ = 12.3. H1 is

    Page 63 #7:

    1. If the data had a very large range it could contribute to a wider confidence interval for the difference of means. If the data had a smaller range it would have a narrower confidence interval.

    Page 69:

    1. The interval got wider because, in order to say something with a larger amount of confidence, the interval must cover a wider range of values, and 99% confidence leaves very little out.

    2. The confidence interval got narrower because less has to be covered at 90% confidence.
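
    How the width responds to the confidence level can be sketched with normal critical values. The standard deviation of 10 and sample size of 25 are hypothetical, and SPSS would use t rather than z critical values for small samples, so this only shows the direction of the effect.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided critical value of the standard normal for a confidence level."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)

def interval_width(confidence, sd, n):
    """Full width of a z-based confidence interval for a mean."""
    return 2 * z_critical(confidence) * sd / n**0.5

# Hypothetical data summary: sd = 10, n = 25.
widths = {level: interval_width(level, sd=10.0, n=25) for level in (0.90, 0.95, 0.99)}
for level, width in widths.items():
    print(level, round(width, 2))
```

    The width grows as the confidence level rises from 90% to 99%, which is the pattern both answers describe.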

    Part 3:

    Page 63 #7:

    1. Group0 has a wider confidence interval than Group1. Group1 has more data than Group0, and all of the data in Group1 are relatively close together, which gives it a narrower confidence interval.

    2. The width increased when the confidence level increased.

    3. The interval for the difference of two means for Group3 and Group4 is wider than that of Group1 and Group2. The standard deviation is greater for Group3 and Group4, which means the data has a higher variance. The standard deviation and variance are different between the two data sets. Group3 and Group4 are more spread out and cover a larger range than Group1 and Group2.

    4. The confidence interval of a population mean is a range of values that is likely, at the stated confidence level, to contain the population mean. The standard deviation affects the width at any given confidence level. If there is a huge variation in the data, the sample mean is a less precise estimate of the center. Therefore, the confidence interval has to become wider.

    5. The confidence interval for the difference between two means is a range that is likely, at the stated confidence level, to contain the difference between the two population means. This is different from the interval for a single data set because it describes a difference of two means. Like the confidence interval for a population mean, it is affected by the standard deviation. If one data set has a high standard deviation, the standard error of the difference in means will be large. This will cause the confidence interval to be wider.
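
    A rough sketch of how spread drives the width of an interval for a difference of means is given below. The four groups are hypothetical numbers, not the lab's Group data, and a normal critical value is used as an approximation where SPSS would use a t value.

```python
import statistics
from statistics import NormalDist

def diff_interval(a, b, confidence=0.95):
    """Approximate confidence interval for mean(a) - mean(b), z-based."""
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    diff = statistics.fmean(a) - statistics.fmean(b)
    return diff - z * se, diff + z * se

# Hypothetical groups: both pairs differ by 2 on average, but the second pair is more spread out.
tight_a, tight_b = [20, 21, 22, 21, 20], [18, 19, 20, 19, 18]
wide_a, wide_b = [10, 21, 32, 25, 16], [8, 19, 30, 23, 14]

lo_t, hi_t = diff_interval(tight_a, tight_b)
lo_w, hi_w = diff_interval(wide_a, wide_b)
print(round(hi_t - lo_t, 2), round(hi_w - lo_w, 2))
```

    Both intervals are centered on the same difference, but the more variable pair produces the wider interval.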

    Page 85:

    1. Non-runners have a higher median.

    2. Non-runners have a larger range.

    3. The whiskers aren't symmetric, which indicates that the pulse rates of the non-runners have a wider range than those of runners.

    Page 87:

    1. H0: μ1 - μ2 = 0

    2. H1: μ1 - μ2 ≠ 0

    3. If the p-value is less than alpha, then the null hypothesis would be rejected.

    4. We are assuming that there are two normal populations, that these populations are independent, and that they have equal variances.
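
    Under the equal-variance assumption listed above, the test statistic pools the two sample variances. A minimal sketch follows; the function name and the tiny data sets are hypothetical, and the p-value lookup is left out because it needs a t distribution that Python's standard library does not provide.

```python
import statistics

def pooled_t(a, b):
    """Equal-variance two-sample t statistic and its degrees of freedom."""
    na, nb = len(a), len(b)
    pooled_var = (
        (na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)
    ) / (na + nb - 2)
    se = (pooled_var * (1 / na + 1 / nb)) ** 0.5
    return (statistics.fmean(a) - statistics.fmean(b)) / se, na + nb - 2

# Hypothetical samples, only to exercise the formula.
t_stat, df = pooled_t([1, 2, 3], [2, 3, 4])
print(round(t_stat, 4), df)
```

    The resulting t and degrees of freedom are what SPSS reports on the "Equal variances assumed" row of its independent-samples output.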

    Page 89:

    1. Asdf

    Page 78 & 79: #4:

    A.

    1. Site 1: Median = 215, 1st Quartile = 185, 3rd Quartile = 220

    2. Site 2: Median = 210, 1st Quartile = 182, 3rd Quartile = 225

    3. Site 3: Median = 215, 1st Quartile = 175, 3rd Quartile = 230

    4. Site 4: Median = 195, 1st Quartile = 170, 3rd Quartile = 217

    5. Site 5: Median = 212, 1st Quartile = 190, 3rd Quartile = 248

    B. No, there are no outliers.

    C. Out of the 5 sites, site 4 was the only site whose median was under 200.
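
    One way to check for outliers from the quartiles alone is the common 1.5 × IQR boxplot rule. Using that rule here is an assumption about how outliers were judged; the quartiles are the ones listed in part A, and the raw data would still be needed to confirm that no value falls outside the fences.

```python
def outlier_fences(q1, q3):
    """Lower and upper fences under the 1.5 * IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# (1st quartile, 3rd quartile) per site, taken from part A.
quartiles = {1: (185, 220), 2: (182, 225), 3: (175, 230), 4: (170, 217), 5: (190, 248)}
fences = {site: outlier_fences(q1, q3) for site, (q1, q3) in quartiles.items()}

for site, (low, high) in fences.items():
    print(site, low, high)
```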

    #5:

    A. H0: μ1 = 200, H1: μ1 > 200

    B. P-Value

    C. If the p-value is less than alpha, then we would reject the null hypothesis.

    D.