
Statistical Analysis

Null vs. Alternative Hypothesis

Null hypothesis: observed differences are due to chance (no causal relationship)
• Ex: If light intensity increases, then the rate of photosynthesis will not be affected

Alternative hypothesis: states that a causal relationship exists between the independent variable and the observed data
• Ex: If light intensity increases, then the rate of photosynthesis will increase

In statistics, the world is null until proven alternative

Mean, Median, Mode, % Difference, & Standard Deviation

A mean is an average of all data points in a set

A median is the middle value in a data set

A mode is the most common value in a data set

Percent difference shows the difference between the means of the experimental and control groups
• % difference = (│experimental – control│ / control) x 100

Standard deviation is the average measure of how much each value differs, or deviates, from the mean

With 2 data sets, you could have the same mean but very different standard deviations.

A small standard deviation shows more consistency
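The mean, median, mode, and percent difference defined on this slide can be computed in a few lines. This is a minimal sketch using Python's standard `statistics` module. The `control` and `experimental` values are hypothetical numbers chosen for illustration and do not come from the slides.

```python
from statistics import mean, median, mode

# Hypothetical measurements (invented for illustration, not from the slides).
control = [10, 12, 12, 14, 12]
experimental = [13, 15, 15, 17, 15]

control_mean = mean(control)            # average of all data points
experimental_mean = mean(experimental)

control_median = median(control)        # middle value of the sorted data
control_mode = mode(control)            # most common value

# % difference = (|experimental - control| / control) x 100
percent_difference = abs(experimental_mean - control_mean) / control_mean * 100

print("Control mean, median, mode:", control_mean, control_median, control_mode)
print(f"Percent difference between means: {percent_difference:.1f}%")
```

The percent difference is taken relative to the control mean, matching the formula on the slide.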

Formula for Standard Deviation

[Formula image not preserved]

What does this mean?

Labels on the formula:
• The mean
• N = total # of values
• Each individual value

SD example

Data Set 1: 4,4,4,4,4,6,6,6,6,6,5,5,5,5,5

Data Set 2: 5,5,5,4,4,6,6,3,3,7,7,1,1,9,9

Both sets have an identical mean. Which data set has the smaller standard deviation?

Set 1 has less spread around the mean, which would give it a lower standard deviation

Mean and SD

For our data sets:
Set 1: Mean = 5, SD = 0.8
Set 2: Mean = 5, SD = 2.4

What these numbers really mean is that, given a normal (bell curve) distribution, about 68% of data points fall within 1 SD of the mean, and about 95% fall within 2 SDs
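The means and SDs quoted for the two data sets can be checked with a short script. The slides do not say whether the SD formula divides by N (population SD) or N – 1 (sample SD), so this sketch computes both as an assumption. For these data, both versions round to the values given.

```python
from statistics import mean, pstdev, stdev

# Data sets from the SD example.
set_1 = [4, 4, 4, 4, 4, 6, 6, 6, 6, 6, 5, 5, 5, 5, 5]
set_2 = [5, 5, 5, 4, 4, 6, 6, 3, 3, 7, 7, 1, 1, 9, 9]

for name, data in (("Set 1", set_1), ("Set 2", set_2)):
    sd_population = pstdev(data)  # divides by N
    sd_sample = stdev(data)       # divides by N - 1
    print(
        f"{name}: mean = {mean(data)}, "
        f"SD (N) = {sd_population:.1f}, SD (N - 1) = {sd_sample:.1f}"
    )
```

Set 1 clusters tightly around 5 while Set 2 ranges from 1 to 9, which is why Set 2's SD is about three times larger even though the means are identical.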

Precision of Data: BE CONSISTENT

Which data set is more useful? Why?

Error Bars

When we graph our data, we can use error bars to show the SD for each mean

What is the approximate standard deviation of mealworms per tray in the canopy cover group at 4 m from cover?

Answer: ~1 mealworm per tray

Chi-square Analysis

A chi-square analysis tests the significance of results

Answers the question: were the differences between the observed and expected values large enough to reject the null hypothesis (and support the alternative hypothesis)?

Tests how likely it is that the observed differences are random and NOT due to the independent variable

In the chi-square formula, the expected (e) values are those that you would expect if the null hypothesis were true.

χ² = Σ (o – e)² / e

o = observed values

e = expected values

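A p-value of .05 means that, if the null hypothesis were true, there would be only a 5% chance of seeing a difference between observed and expected data at least this large through random variation alone. By convention, p-values at or below .05 are treated as a significant difference.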

Critical value – predetermined value establishing the boundary for rejecting/accepting the null hypothesis
• Maximum chi-square value that would fail to reject the null hypothesis (i.e., a chi-square value higher than the critical value shows support for the alternative hypothesis)
• Critical values will be provided in a chi-square table
• Dependent on degrees of freedom: number of possible outcomes minus 1 (d = N – 1)

Null vs. Alternative Hypothesis

CHI-SQUARE DISTRIBUTION TABLE: Critical values

Columns p = 0.95 to 0.10: Accept Null Hypothesis (difference due to chance)
Columns p = 0.05 to 0.001: Reject Null Hypothesis

Rows: Degrees of Freedom (df). Columns: Probability (p-value).

df        0.95   0.90   0.80   0.70   0.50   0.30   0.20   0.10   0.05   0.01   0.001
1         0.004  0.02   0.06   0.15   0.46   1.07   1.64   2.71   3.84   6.64   10.83
2         0.10   0.21   0.45   0.71   1.39   2.41   3.22   4.60   5.99   9.21   13.82
3         0.35   0.58   1.01   1.42   2.37   3.66   4.64   6.25   7.82   11.34  16.27
4         0.71   1.06   1.65   2.20   3.36   4.88   5.99   7.78   9.49   13.28  18.47
5         1.14   1.61   2.34   3.00   4.35   6.06   7.29   9.24   11.07  15.09  20.52
6         1.63   2.20   3.07   3.83   5.35   7.23   8.56   10.64  12.59  16.81  22.46
7         2.17   2.83   3.82   4.67   6.35   8.38   9.80   12.02  14.07  18.48  24.32
8         2.73   3.49   4.59   5.53   7.34   9.52   11.03  13.36  15.51  20.09  26.12
9         3.32   4.17   5.38   6.39   8.34   10.66  12.24  14.68  16.92  21.67  27.88
10        3.94   4.86   6.18   7.27   9.34   11.78  13.44  15.99  18.31  23.21  29.59

Chi-square Analysis

For example, using a p-value of .05 and 3 degrees of freedom, a chi-square value must be greater than __________ (the critical value) to reject the null hypothesis and support the alternative hypothesis.

Put another way, a calculated chi-square value that is greater than 7.82 means that, if the null hypothesis were true, a difference this large between the observed and expected data would arise by chance less than 5% of the time, so the difference is considered significant.
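The comparison against the critical value can be sketched in code. The example below sums (o – e)² / e over all outcomes, looks up the p = .05 critical value for the matching degrees of freedom, and decides whether to reject the null hypothesis. The `observed` and `expected` counts are hypothetical and not from the slides. The critical values are the p = .05 column of the chi-square table for 1 to 5 degrees of freedom.

```python
# Critical values at p = 0.05 for 1-5 degrees of freedom (from the chi-square table).
CRITICAL_VALUES_P05 = {1: 3.84, 2: 5.99, 3: 7.82, 4: 9.49, 5: 11.07}


def chi_square(observed, expected):
    """Return the sum of (o - e)^2 / e over all outcomes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts for 4 possible outcomes (invented for illustration).
observed = [35, 18, 22, 25]
expected = [25, 25, 25, 25]  # what we would expect if the null hypothesis were true

chi2 = chi_square(observed, expected)
degrees_of_freedom = len(observed) - 1  # number of possible outcomes minus 1
critical_value = CRITICAL_VALUES_P05[degrees_of_freedom]
reject_null = chi2 > critical_value

print(f"chi-square = {chi2:.2f}, df = {degrees_of_freedom}, critical value = {critical_value}")
print("Reject null hypothesis" if reject_null else "Fail to reject null hypothesis")
```

With these invented counts the chi-square value stays below the critical value for 3 degrees of freedom, so the differences are treated as due to chance.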

1.1.5 T-test

A T-test determines whether or not there is a significant difference between 2 samples

Assume we’re measuring wing span of 2 populations of eagles, 1 wild and 1 captive bred

We want to know if the difference between the lengths is significant (as opposed to being due to chance)

Captive (cm): 180, 187, 212, 196, 200, 204, 194, 189
Wild (cm): 188, 205, 201, 214, 194, 189, 206, 203

Degrees of Freedom = 8 + 8 – 2 = 14

When we apply the T-test, and use a T value chart, we obtain a 66% confidence level that the differences are significant. Not enough.

We need a confidence level of 95%, with a minimum sample size of 5.
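The eagle example can be reproduced with a short calculation. The slides do not name the T-test variant, so this sketch assumes the pooled-variance two-sample (Student's) T-test, which is the version that uses 8 + 8 – 2 = 14 degrees of freedom. The critical value 2.145 is the two-tailed 95% entry for 14 degrees of freedom from a standard t table. It is not taken from the slides.

```python
from math import sqrt
from statistics import mean, variance

# Wing spans in cm, from the example above.
captive = [180, 187, 212, 196, 200, 204, 194, 189]
wild = [188, 205, 201, 214, 194, 189, 206, 203]

n1, n2 = len(captive), len(wild)
degrees_of_freedom = n1 + n2 - 2

# Pooled variance: assumes both populations have similar variance.
pooled_variance = (
    (n1 - 1) * variance(captive) + (n2 - 1) * variance(wild)
) / degrees_of_freedom
standard_error = sqrt(pooled_variance * (1 / n1 + 1 / n2))

t_value = (mean(wild) - mean(captive)) / standard_error

# Two-tailed critical t at the 95% level for df = 14 (standard t table).
T_CRITICAL_95_DF14 = 2.145
significant = abs(t_value) > T_CRITICAL_95_DF14

print(f"t = {t_value:.2f}, df = {degrees_of_freedom}")
print("Significant at 95%" if significant else "Not significant at 95%")
```

The t value falls well below the 95% critical value, which agrees with the slide's conclusion that the difference between the two populations is not significant.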

1.1.6 Correlation and Causality

A correlation in the data does not, by itself, imply causation.

Causation requires that one variable causes the other to occur.

The number of cavities in children shows a strong positive correlation with their vocabulary level.

We should not assume that well-spoken children will have dentures by college.
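A small numerical sketch shows how two variables can be strongly correlated without one causing the other. Every number below is invented for illustration. The choice of age as the hidden third variable is an assumption that the slides do not state. `pearson_r` is a hypothetical helper that computes the standard Pearson correlation coefficient.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)


# Hypothetical children (all values invented): both measures rise with age.
ages = [5, 6, 7, 8, 9, 10]
cavities = [0, 1, 1, 2, 3, 3]
vocabulary = [2000, 2600, 3100, 3900, 4400, 5000]

r_cavities_vocabulary = pearson_r(cavities, vocabulary)
r_age_cavities = pearson_r(ages, cavities)
r_age_vocabulary = pearson_r(ages, vocabulary)

print(f"cavities vs vocabulary: r = {r_cavities_vocabulary:.2f}")
print(f"age vs cavities:        r = {r_age_cavities:.2f}")
print(f"age vs vocabulary:      r = {r_age_vocabulary:.2f}")
```

In this made-up data set, cavities and vocabulary are strongly correlated only because both increase as children get older. Neither one causes the other.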

Stats Quiz

1. Define standard deviation as required on the syllabus. (2)

2. State the usefulness of knowing a standard deviation. (2)

3. Give the minimum confidence level for results to be significant in science. (1)

4. If I told you, based on measurements in a previous class, that blue-haired people are hard of hearing, how would you respond (regarding the relationship)? (2)

Stats Quiz Answers

1. Summarizes the spread of values around the mean; 68% of data lies within 1 SD of the mean (95% within 2)

2. Comparing samples/data points: a large SD means widely spread (less consistent) data, a small SD means consistent data

3. Greater than 95%

4. Just because there is a correlation does not imply causation.
