POPULATION DYNAMICS
Required background knowledge:
• Data and variability concepts
Data collection
• Measures of central tendency (mean, median, mode, variance, stdev)
• Normal distribution and SE
• Student’s t-test and 95% confidence intervals
• Chi-Square tests
• MS Excel
STATISTICS: z-DISTRIBUTION
IF n is very, very large: we use the Z distribution to calculate normal deviates
$$Z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}}$$
If n is not large, we must use the t distribution:
$$t = \frac{\bar{x} - \mu}{s_{\bar{x}}} \qquad \text{(Equation 3)}$$
But first..WHY do we do all this??
Integral part of science…
HYPOTHESIS TESTING
Model: Explanation or theory (maybe >1)
Hypothesis: Prediction deduced from model. Generate null hypothesis (H0): falsification test
Test: Experiment
• IF H0 rejected – model supported
• IF H0 accepted – model wrong
Pattern: Observation, rigorously described
HYPOTHESIS TESTING
You can say with 95% certainty that
the pattern you have observed is
not due to chance alone
You can say with 99% certainty that
the pattern you have observed is
not due to chance alone
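[Figure: p-value as a measure of certainty, on a scale running from 1.0 to 0. The significance level α (0.05 or 0.01) separates "Not significant" (p above α) from "Significant" (p below α).]
These are proportions…if expressed as %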
1. Collect data
2. Analyse data
3. Set up hypotheses:
• H0 = results are due to CHANCE alone
• H1 = results are significant and are not due to chance alone
4. Test hypotheses:
Determine significance level for hypothesis testing (α) ~ termed ‘Alpha’
Usually either α = 0.05 or α = 0.01
Calculate probability value (p)
If p < α then reject H0 ; accept H1 (i.e. results are significant and are NOT due to chance alone)
If p ≥ α then do not reject H0 (i.e. results are not significant and may be due to chance alone)
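The decision rule in step 4 can be sketched in a few lines of Python. This is only an illustration: the function name `decide` and the p-value 0.03 are hypothetical and do not come from the slides.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Compare a p-value with the significance level alpha."""
    if p_value < alpha:
        return "reject H0"        # result is significant at this alpha
    return "do not reject H0"     # result is not significant at this alpha


# The same (hypothetical) p-value can be significant at one level and not at another.
print(decide(0.03, alpha=0.05))
print(decide(0.03, alpha=0.01))
```

The example shows why α must be fixed before p is calculated: the conclusion depends on which threshold was chosen.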
First, some important concepts about t-tests…
Because it is based on the normal distribution, the t distribution has all the attributes of the normal distribution:
• Completely symmetrical
• Area under any part of the curve reflects proportion of t values involved
• etc….
STATISTICS: t-DISTRIBUTION
[Figure: histogram of Frequency (%) against Height (mm).]
Shape of the t distribution varies with v (Degrees of Freedom: n – 1): the bigger the n, the less spread the distribution
[Figure: t distributions for v = 100, v = 10, v = 5 and v = 1, plotted against t from -9 to 9.]
Tails of the t-distribution
[Figure: One-Tailed hypothesis testing. α (1) = 0.1 lies entirely in one tail (either the lower or the upper tail); t axis from -4 to 4.]
[Figure: Two-Tailed hypothesis testing. α (2) = 0.1 is split as 0.05 in each tail; t axis from -4 to 4.]
STATISTICS: t-DISTRIBUTION CONCEPTS
Example: if our sample size is 11 (v = 10), what is the value of t beyond which 10% (0.1) of the curve is enclosed? – Two possible t-values
One-tailed:
H0 : μ = 25
H1 : μ < 25
OR
Two-tailed:
H0 : μ = 25
H1 : μ ≠ 25
[Figure: the t-statistic on a measure-of-certainty scale from 1.0 to 0 (marks at 0.05 and 0.01). The critical t-value separates "Not significant" from "Significant".]
STATISTICS: T-DISTRIBUTION: CONCEPTS
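Critical values
[Figure: p-value as a measure of certainty, on a scale from 1.0 to 0, with α (0.05 or 0.01) separating "Not significant" from "Significant".]
[Figure: two-tailed t distribution, α (2) = 0.05, with critical values at t = -2.064 and t = 2.064.]
$$t = \frac{\bar{x} - \mu}{s_{\bar{x}}}$$
α = 0.05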
T-statistic compared with critical value
If t-statistic > 2.064 OR < -2.064 then reject H0 ; accept H1
(i.e results are significant and are NOT due to chance alone)
Critical values
v     α (2):  0.5     0.2     0.1     0.05
      α (1):  0.25    0.1     0.05    0.025
1             1.000   3.078   6.314   12.706
2             0.816   1.886   2.920   4.303
3             0.765   1.638   2.353   3.182
4             0.741   1.533   2.132   2.776
5             0.727   1.476   2.015   2.571
6             0.718   1.440   1.943   2.447
7             0.711   1.415   1.895   2.365
8             0.706   1.397   1.860   2.306
9             0.703   1.383   1.833   2.262
10            0.700   1.372   1.812   2.228
11            0.697   1.363   1.796   2.201
12            0.695   1.356   1.782   2.179
13            0.694   1.350   1.771   2.160
14            0.692   1.345   1.761   2.145
15            0.691   1.341   1.753   2.131
16            0.690   1.337   1.746   2.120
17            0.689   1.333   1.740   2.110
18            0.688   1.330   1.734   2.101
19            0.688   1.328   1.729   2.093
20            0.687   1.325   1.725   2.086
21            0.686   1.323   1.721   2.080
22            0.686   1.321   1.717   2.074
23            0.685   1.319   1.714   2.069
24            0.685   1.318   1.711   2.064
25            0.684   1.316   1.708   2.060
If our sample size is 11 (v = 10), what is the value of t beyond which 10% (0.1) of the curve is enclosed (i.e. what is the critical value of t)?
[Figure: One-Tailed, v = 10. α (1) = 0.1 in one tail; critical value t = -1.372.]
[Figure: Two-Tailed, v = 10. α (2) = 0.1, split as 0.05 in each tail; critical values t = -1.812 and t = 1.812.]
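The two answers to this question come from different columns of the same table row. A minimal sketch of that lookup, assuming only the v = 10 row of the t-table is needed (the names `T_ROW_V10` and `critical_t_v10` are hypothetical):

```python
# Row v = 10 of the t-table, keyed by the two-tailed level alpha(2).
T_ROW_V10 = {0.5: 0.700, 0.2: 1.372, 0.1: 1.812, 0.05: 2.228}


def critical_t_v10(alpha: float, tails: int) -> float:
    """Critical t for v = 10.

    A one-tailed level alpha(1) is read from the column headed
    alpha(2) = 2 * alpha(1), because the table pairs the two headers.
    """
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    alpha_two = alpha if tails == 2 else round(2 * alpha, 3)
    return T_ROW_V10[alpha_two]


print(critical_t_v10(0.1, tails=1))  # one-tailed: 10% in a single tail
print(critical_t_v10(0.1, tails=2))  # two-tailed: 5% in each tail
```

For a one-tailed test with H1: μ < 25 the critical value is taken as negative (-1.372); the table itself lists magnitudes only.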
STATISTICS: T-DISTRIBUTION: CONCEPTS
Critical values are found on the t-tables
Steps of Student t-tests:
1. Establish hypotheses (determine if one-tailed or two-tailed test)
• One tail: H1 has > or < in it
• Two tail: H1 has ≠ in it
2. Determine: n, x̄, μ, s and v (n – 1)
3. Calculate the t-statistic using
$$t = \frac{\bar{x} - \mu}{s_{\bar{x}}}$$
4. Determine significance level for hypothesis testing (α), termed 'Alpha'
• Usually either α = 0.05 or α = 0.01 (total area in the tail or tails)
5. Look up the critical value of t
• use the t-table, looking up the value for t at the chosen significance level (α 1 or 2) and v
6. Compare t-statistic with critical value to know if you should accept or reject H0
STATISTICS: T-DISTRIBUTION: EXAMPLE
OBSERVATION MADE:
The mean nitrate concentration of water in all the upstream tributaries of a large river prior to intensive agriculture is 22 mg.l-1. Afterwards the mean nitrate concentration in 25 of these tributaries is 24.23 mg.l-1 and s = 4.24 mg.l-1
Based on this observation we want to determine if the intensification of agricultural practices has resulted in a significant change to the nitrate concentration of the freshwater resources.
HOW? …Need to determine the probability that the sample (n = 25, x̄ = 24.23 mg.l-1) could be randomly generated from a population with μ = 22 mg.l-1
Nitrate (before agriculture): μ = 22 mg.l-1, n = ALL tributaries
Nitrate (after agriculture): x̄ = 24.23 mg.l-1, n = 25 sample tributaries
Student t-tests: steps for calculation
1. Establish hypotheses
H0: μ = 22
H1: μ ≠ 22
What is the probability that the sample (n = 25, x̄ = 24.23 mg.l-1) could be randomly generated from a population with μ = 22 mg.l-1?
2. Determine: n, x̄, μ, s and v (n – 1)
n = 25, x̄ = 24.23, μ = 22.00, s = 4.24, v = 24
3. Calculate the t-statistic
$$s_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{4.24}{\sqrt{25}} = \frac{4.24}{5} = 0.848$$
$$t = \frac{\bar{x} - \mu}{s_{\bar{x}}} = \frac{24.23 - 22}{0.848} = \frac{2.23}{0.848} = 2.629$$
t = 2.629
4. Determine significance level (α)
Either α = 0.05 or α = 0.01; here α = 0.05
5. Look up the critical value of t
• use the t-table, looking up the value for t at the chosen significance level (α 1 or 2) and v
• One tail or two tail?
t 0.05 (α 2), 24
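[Figure: One-Tailed test with α (1) = 0.05 in a single tail; Two-Tailed test with α (2) = 0.05 split as 0.025 in each tail.]
Go to the hypotheses: H0: μ = 22, H1: μ ≠ 22 (two-tailed)
The critical value of t 0.05 (α 2), 24 = 2.064
[Figure: t distribution with 0.025 in each tail, beyond t = -2.064 and t = 2.064.]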
t = 2.629 > critical value
STATISTICS: T-DISTRIBUTION: EXAMPLE
1. Establish hypotheses: H0: μ = 22; H1: μ ≠ 22
2. Determine n, x̄, μ, s and v (n – 1): n = 25, x̄ = 24.23, μ = 22.00, s = 4.24, v = 24
3. Calculate the t-statistic: t = 2.629
4. Determine significance level (α): α = 0.05
5. Calculate the critical value of t: critical value = 2.064
6. Compare t-statistic with critical value: 2.629 > 2.064
[Figure: t distribution with 0.025 in each tail, beyond t = -2.064 and t = 2.064; the t-statistic 2.629 falls in the upper rejection region.]
SO…means it is very unlikely that a random sample (size 25)
would generate a mean of 24.23 mg.l-1 from a population with a
mean of 22 mg.l-1
So unlikely, in fact, that we don’t believe it can happen by
chance…Reject H0 and accept H1
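The nitrate calculation can be checked with a short Python sketch. It uses only the summary values quoted above; the function name `one_sample_t` is hypothetical, and the critical value 2.064 is copied from the t-table rather than computed.

```python
import math


def one_sample_t(sample_mean: float, mu: float, s: float, n: int) -> float:
    """t = (sample mean - mu) / standard error, with standard error = s / sqrt(n)."""
    standard_error = s / math.sqrt(n)
    return (sample_mean - mu) / standard_error


# Summary values from the nitrate example (mg.l-1).
t_stat = one_sample_t(sample_mean=24.23, mu=22.0, s=4.24, n=25)

# Two-tailed critical value for alpha = 0.05 and v = 24, read from the t-table.
critical_value = 2.064
reject_h0 = abs(t_stat) > critical_value

print(f"t = {t_stat:.3f}, reject H0: {reject_h0}")
```

Taking the absolute value of t implements the two-tailed rule "t > 2.064 OR t < -2.064". Unrounded, 2.23 / 0.848 is about 2.6297, which the slides quote as 2.629.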
STATISTICS: T-DISTRIBUTION: EXAMPLES
What we can then say, is that the before and after nitrate levels in the water are (statistically) significantly different from each other
(p < 0.05)
We are not making any judgment about whether there is more nitrate in the water after than before, only that the
concentrations are different…though some things are self evident!
Now you try…25 intertidal crabs were exposed to air at 24.3 C, and their body temperatures were measured.
Student-t steps to follow:
1. Establish hypotheses
2. Determine: n, x̄, μ, s and v (n – 1)
3. Calculate the t-statistic
4. Determine significance level (α)
5. Calculate the critical value of t
6. Compare t-statistic with critical value
H0: μ = 24.3 C
i.e crab body temp is NOT different from ambient temp
H1: μ ≠ 24.3 C
i.e crab body temp IS different from ambient temp
Q: Is the mean body temperature of this species of crab the same as the ambient air temperature of 24.3 C
Switch to Excel and do the calculations
Crab ID   Body temp (C)
1         25.80
2         24.60
3         26.10
4         22.90
5         25.10
6         27.30
7         24.00
8         24.50
9         23.90
10        26.20
11        24.30
12        24.60
13        23.30
14        25.50
15        28.10
16        24.80
17        23.50
18        26.30
19        25.40
20        25.50
21        23.90
22        27.00
23        24.80
24        22.90
25        25.40
α = 0.05
t = 2.7128
t significance level (α 1 or 2), v
t 0.05 (α 2), v
α = 0.05
t = 2.713
Critical value = 2.064
t = 2.7128 > Critical value = 2.064
H0: μ = 24.3 C [i.e. crab body temp is NOT different from ambient temp]: REJECT
H1: μ ≠ 24.3 C [i.e. crab body temp IS different from ambient temp]
[Figure: t distribution with 0.025 in each tail, beyond t = -2.064 and t = 2.064; the t-statistic 2.713 falls in the upper rejection region.]
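For readers who prefer a script to Excel, the crab exercise can be reproduced with Python's standard library. This is a sketch under the assumption that the 25 body temperatures are those listed for crab IDs 1 to 25; the variable names are hypothetical, and the critical value is copied from the t-table.

```python
import math
import statistics

# Body temperatures (C) for crab IDs 1 to 25.
body_temps = [
    25.8, 24.6, 26.1, 22.9, 25.1, 27.3, 24.0, 24.5, 23.9, 26.2,
    24.3, 24.6, 23.3, 25.5, 28.1, 24.8, 23.5, 26.3, 25.4, 25.5,
    23.9, 27.0, 24.8, 22.9, 25.4,
]
ambient = 24.3  # hypothesised population mean (air temperature, C)

n = len(body_temps)
sample_mean = statistics.mean(body_temps)
s = statistics.stdev(body_temps)        # sample standard deviation (n - 1 denominator)
standard_error = s / math.sqrt(n)
t_stat = (sample_mean - ambient) / standard_error

critical_value = 2.064                  # t 0.05 (alpha 2), v = 24, from the t-table
reject_h0 = abs(t_stat) > critical_value

print(f"n = {n}, mean = {sample_mean:.3f}, s = {s:.3f}, t = {t_stat:.4f}")
print("reject H0" if reject_h0 else "do not reject H0")
```

`statistics.stdev` divides by n – 1, which matches v = n – 1 in the steps above; `statistics.pstdev` would divide by n and give a slightly different t.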
STATISTICS: 95 % CONFIDENCE INTERVALS
The t-Distribution allows us to calculate the 95% (or 99%) confidence intervals around an estimate of the population mean
In other words, what are the limits around our estimate of the population mean, WITHIN which we can be 95% (or 99%) confident that the REAL value of the population mean lies?
[Figure: Two-Tailed t distribution, α (2) = 0.05, with 0.025 in each tail.]
When we express dispersion around some measure of central tendency, we normally use Standard Deviation: x̄ ± s
To do this, we need a set of t-tables, v (n – 1) and the standard error s_x̄
IF
x̄ = 42.3 mm
n = 26 (v = 25)
s_x̄ = 2.15
Then the 95% Confidence Interval (CI) around the mean is calculated as:
$$s_{\bar{x}} \times t_{\alpha(2),\,v} = 2.15 \times 2.06 = 4.429$$
The Confidence Interval expression is then written as: 42.3 mm ± 4.43 mm, i.e. we are 95% confident that μ lies between 37.87 and 46.73 mm
[Figure: two-tailed t distribution centred on x̄ = 42.3 mm, with limits at - 4.43 mm and + 4.43 mm and 0.025 in each tail (α (2) = 0.05).]
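The interval arithmetic can be sketched as follows, using the values from this example. The function name `confidence_interval` is hypothetical, and the critical value 2.060 (α (2) = 0.05, v = 25) is read from the t-table rather than computed.

```python
def confidence_interval(sample_mean: float, standard_error: float, t_critical: float):
    """Return (lower limit, upper limit, half-width) of the confidence interval."""
    half_width = standard_error * t_critical
    return sample_mean - half_width, sample_mean + half_width, half_width


# x-bar = 42.3 mm, standard error = 2.15, t 0.05 (alpha 2), v = 25 -> 2.060
lower, upper, half_width = confidence_interval(42.3, 2.15, 2.060)
print(f"42.3 mm ± {half_width:.2f} mm  ({lower:.2f} to {upper:.2f} mm)")
```

For a 99% interval only the critical value changes; it would have to be looked up under α (2) = 0.01, a column not included in the table shown here.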
Types of Data
The type of data collected influences their statistical analysis
Nominal data – gender, colour, species, genus, class, town, country, model etc
Continuous data – concentration, depth, height, weight, temperature, rate etc
Discrete data – numbers per unit space, numbers per entity etc
Examples: Male, Female; Blue, Red, Black, White; 100 g, 200 g; 121.34 g, 162.18 g, 180.01 g; 5 people
Understanding stats…
1. DATA type: Nominal, Continuous, Discrete
+
2. Distribution: Normal, Binomial, Poisson…etc
3. Choice of statistical test: z-tests, t-tests, ANOVA…etc; Chi - squared
Understanding stats…
Data do NOT have to be normally distributed
Testing Patterns in Discrete (count) Data: the Chi-Square Test
Examples of count data:
• Number of petals per flower
• Number of segments per insect leg
• Number of worms per quadrat
• Number of white cars on campus…etc
You can convert continuous data to discrete data, by assigning data to data classes
1.85 1.65 1.55 1.9 1.6 1.95 1.7 1.7
1.95 1.75 1.8 1.7 1.65 1.55 1.65 1.75 1.45 1.85 1.85 1.8 1.9 1.75 1.7 2.05 1.4 2 1.35 2 1.8 1.65 1.5 1.8 1.9 2.1 1.8 1.5
1.75 1.2 1.5 2.15 1.3 1.7 1.6 1.55
1.85 1.45 1.8 1.85 1.5 1.75 1.75 1.25 1.8 1.95 1.75 2 1.9 1.7 1.8 1.9
1.75 1.85 1.8 1.75 1.7 1.9 1.45 1.65
1.35 1.65 1.7 1.6 1.75 1.5 1.55 1.55 1.6 1.8 1.75 1.85
2.05 1.6 1.85 1.7 1.65 1.7 1.4 1.75 1.95 1.9 1.65 1.6 1.75 1.65 1.7 1.85 1.8 1.75 1.95 1.65
1.55 2.2 1.75 1.7 1.6 1.6
[Figure: histogram of Frequency (0 to 16) against Height (m), for heights from 1.2 to 2.2 m.]
Height (m)   Frequency
1.2          1
1.25         1
1.3          1
1.35         2
1.4          2
1.45         3
1.5          5
1.55         6
1.6          8
1.65         10
1.7          12
1.75         15
1.8          11
1.85         9
1.9          7
1.95         5
2            3
2.05         2
2.1          1
2.15         1
2.2          1
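Assigning continuous measurements to classes is a counting exercise. A minimal sketch, using only the first eight heights listed above and an assumed class width of 0.1 m (the slides tabulate each 0.05 m value separately; the names below are hypothetical):

```python
from collections import Counter

# First eight heights (m) from the data listing.
heights = [1.85, 1.65, 1.55, 1.9, 1.6, 1.95, 1.7, 1.7]


def height_class(height_m: float, width_cm: int = 10) -> float:
    """Lower bound (m) of the class containing height_m.

    Works in whole centimetres to avoid floating-point rounding surprises.
    """
    cm = round(height_m * 100)
    return (cm // width_cm) * width_cm / 100


frequency = Counter(height_class(h) for h in heights)
for lower_bound in sorted(frequency):
    print(f"{lower_bound:.1f} m class: {frequency[lower_bound]}")
```

The resulting counts are discrete data, so they can be analysed with a Chi-squared test even though the original heights were continuous.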
STATISTICS: CHI-SQUARED TESTS
Often want to determine if the population from which you have obtained count data conforms to a certain prediction
A geneticist raises a progeny of 134 flowers from this cross:
Hypothesised (EXPECTED) ratio: 3 : 1 OR ¾ : ¼ OR 0.75 : 0.25
n = 134
Observed numbers: 113 yellow, 21 green
OBSERVED ratio: 113 : 21 OR 5.4 : 1
Expected numbers: 100.5 yellow (= 134 * 0.75), 33.5 green (= 134 * 0.25)
Q: Does the OBSERVED ratio differ (SIGNIFICANTLY) from the EXPECTED ratio?
$$\chi^2 = \sum \left[ \frac{(O - E)^2}{E} \right] \qquad \text{(Equation 4)}$$
Where O = Observed, E = Expected
The bigger the difference between O and E, the greater the χ2
When there is no difference χ2 will be ZERO = Goodness of Fit
STATISTICS: CHI-SQUARED TESTS
Steps of X2 tests:
1.Establish hypotheses
2.Determine Observed and Expected frequencies
3.Calculate the X2-statistic using
$$\chi^2 = \sum \left[ \frac{(O - E)^2}{E} \right]$$
4.Determine significance level for hypothesis testing (α = 0.05 or α = 0.01)
5.Calculate the critical value of X2
• use X2-statistic table
• Critical value: X2 at the chosen significance level and v, where v = Number of categories (K) - 1
6.Compare X2-statistic with critical value
7.If X2-statistic > critical value reject H0 (significant differences between O and E)
8.If X2-statistic < critical value accept H0 (no significant differences between O and E)
NB: must always use counts (frequencies) NOT percentages or proportions
STATISTICS: CHI-SQUARED TESTS
Does the OBSERVED ratio (113:21) differ (SIGNIFICANTLY) from the Expected (100.5:33.5) ratio?
1.Establish hypotheses
• H0: Observed and expected ratios are not significantly different
• H1: Observed and expected ratios are significantly different
2.Determine Observed and Expected frequencies
• Yellow flowers: Observed = 113 ; Expected = 100.5
• Green flowers: Observed = 21 ; Expected = 33.5
3.Calculate the X2-statistic using
$$\chi^2 = \left[ \frac{(113 - 100.5)^2}{100.5} \right] + \left[ \frac{(21 - 33.5)^2}{33.5} \right] = 1.55 + 4.66 = 6.22$$
(first term: Yellow flowers; second term: Green flowers)
4.Determine significance level for hypothesis testing (α = 0.05 or α = 0.01)
5.Calculate the critical value of X2
Critical value: X2 at the chosen significance level and v
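The flower calculation can be reproduced with a few lines of Python. This is a sketch of Equation 4 applied to the counts above; the function name `chi_square` is hypothetical.

```python
def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all categories (Equation 4)."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [113, 21]                      # yellow, green (counts, not percentages)
n = sum(observed)                         # 134 flowers
expected = [n * 0.75, n * 0.25]           # 3 : 1 ratio -> 100.5 and 33.5

contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi2 = chi_square(observed, expected)

print([round(c, 2) for c in contributions], round(chi2, 2))
```

Expected counts need not be whole numbers (100.5 and 33.5 here), but the observed values must be raw counts.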
v α = 0.999 0.995 0.99 0.975 0.95 0.9 0.75 0.5 0.25 0.1 0.05 0.025 0.01
1 0.000 0.000 0.000 0.001 0.004 0.016 0.102 0.455 1.323 2.706 3.841 5.024 6.635
2 0.002 0.010 0.020 0.051 0.103 0.211 0.575 1.386 2.773 4.605 5.991 7.378 9.21
3 0.024 0.072 0.115 0.216 0.352 0.584 1.213 2.366 4.108 6.251 7.815 9.348 11.345
4 0.091 0.207 0.297 0.484 0.711 1.064 1.923 3.357 5.385 7.779 9.488 11.143 13.277
5 0.210 0.412 0.554 0.831 1.145 1.610 2.675 4.351 6.626 9.236 11.07 12.833 15.086
6 0.381 0.676 0.872 1.237 1.635 2.204 3.455 5.364 7.841 10.645 12.592 14.449 16.812
7 0.590 0.989 1.239 1.690 2.167 2.833 4.255 6.346 9.037 12.017 14.067 16.013 18.475
8 0.857 1.344 1.646 2.180 2.733 3.490 5.071 7.344 10.219 13.362 15.507 17.535 20.09
9 1.152 1.735 2.088 2.700 3.325 4.168 5.899 8.343 11.389 14.684 16.919 19.023 21.666
10 1.479 2.156 2.558 3.247 3.940 4.865 6.737 9.342 12.549 15.987 18.307 20.483 23.209
11 1.834 2.603 3.053 3.816 4.575 5.578 7.584 10.341 13.701 17.275 19.675 21.92
12 2.214 3.074 3.571 4.404 5.226 6.304 8.438 11.340 14.845 18.549 21.026
13 2.617 3.565 4.107 5.009 5.892 7.042 9.299 12.340 15.984 19.812
14 3.041 4.075 4.660 5.629 6.571 7.790 10.165 13.339 17.117
15 3.483 4.601 5.229 6.262 7.261 8.547 11.037 14.339
16 3.942 5.142 5.812 6.908 7.962 9.312 11.912
17 4.416 5.697 6.408 7.564 8.672 10.085
18 4.905 6.265 7.015 8.231 9.390
19 5.407 6.844 7.633 8.907
20 5.921 7.434 8.260
21 6.447 8.034
22 6.983
• Degrees of Freedom (v) = K – 1, where K = number of categories
• in this case two categories (yellow-flowering and green-flowering) = (2 – 1)
• …therefore v = 1
Critical value: X2 0.05, v = X2 0.05, 1
Critical value = 3.841
STATISTICS: CHI-SQUARED TESTS
1.Establish hypotheses
• H0: Observed and expected ratios are not significantly different
• H1: Observed and expected ratios are significantly different
2.Determine Observed and Expected frequencies
• Yellow flowers: Observed = 113 ; Expected = 100.5
• Green flowers: Observed = 21 ; Expected = 33.5
3.X2-statistic = 6.22
4.Determine significance level for hypothesis testing (α = 0.05 or α = 0.01)
5.Critical value = 3.841
6.X2-statistic > critical value therefore reject H0
Q: Does the OBSERVED ratio (113:21) differ (SIGNIFICANTLY) from the Expected (100.5:33.5) ratio?
A: the observed ratio is significantly different from the expected ratio
STATISTICS: CHI-SQUARED TESTS
Now you try…
A plant geneticist has done some crossing between plants and come up with the following numbers of different seeds
Seed Type           Observed Count   Expected Ratio   Expected Count   (O-E)    (O-E)2   (O-E)2 / E
Yellow - Smooth     152              9                140.63           11.38    129.39   0.92
Yellow - Wrinkled   39               3                46.88            -7.88    62.02    1.32
Green - Smooth      53               3                46.88            6.13     37.52    0.80
Green - Wrinkled    6                1                15.63            -9.63    92.64    5.93
Total               250              16               250                                8.97
Q: Has the geneticist sampled from a population having a ratio of 9:3:3:1 ?
H0: Population sampled has YS:YW:GS:GW seeds in the ratio 9:3:3:1
H1: Population sampled does not have YS:YW:GS:GW seeds in the ratio 9:3:3:1
Switch to Excel
χ2 = 8.97
α = 0.05
What is the critical value of χ2?
Critical value: X2 0.05, 3
1.Establish hypotheses
2.Determine Observed and Expected frequencies
3.Calculate the X2-statistic: χ2 = 8.97
4.Determine significance level for hypothesis testing: α = 0.05
5.Critical value = 7.815
6.Compare X2-statistic with critical value: 8.97 > 7.815
7.X2-statistic > critical value, therefore reject H0
χ2 = 8.97
Reject the Null Hypothesis that the sample was drawn from a population showing a 9:3:3:1 ratio of YS:YW:GS:GW
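The whole seed exercise, from ratio to decision, can be sketched in Python as an alternative to the Excel sheet. The variable names are hypothetical, and the critical value 7.815 (α = 0.05, v = 3) is copied from the χ2 table rather than computed.

```python
observed = {"Yellow - Smooth": 152, "Yellow - Wrinkled": 39,
            "Green - Smooth": 53, "Green - Wrinkled": 6}
ratio = {"Yellow - Smooth": 9, "Yellow - Wrinkled": 3,
         "Green - Smooth": 3, "Green - Wrinkled": 1}

n = sum(observed.values())                 # 250 seeds
ratio_total = sum(ratio.values())          # 16 parts

# Expected count = total count * (ratio part / sum of ratio parts).
expected = {seed: n * part / ratio_total for seed, part in ratio.items()}

chi2 = sum((observed[seed] - expected[seed]) ** 2 / expected[seed] for seed in observed)

degrees_of_freedom = len(observed) - 1     # v = K - 1 = 3
critical_value = 7.815                     # chi-square table: alpha = 0.05, v = 3
reject_h0 = chi2 > critical_value

print(f"chi2 = {chi2:.2f}, v = {degrees_of_freedom}, reject H0: {reject_h0}")
```

Most of the χ2 total comes from the Green - Wrinkled class (5.93 of 8.97), which is where the sample departs most from 9:3:3:1.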
STATISTICS: CHI-SQUARED TESTS…final word…
IF Expected Counts are LESS than ONE, then you must combine the categories
Seed Type   Observed Count   Expected Ratio   Expected Count   (O-E)     (O-E)2   (O-E)2 / E
A           17               50               43.17            -26.17    684.80   15.86
B           21               25               21.58            -0.58     0.34     0.02
C           21               12.5             10.79            10.21     104.20   9.66
D           23               6.25             5.40             17.60     309.90   57.43
F           2                3.125            2.70             -0.70     0.49     0.18
G           1                1.563            1.35             -0.35     0.12     0.09
H           1                0.781            0.67             0.33      0.11     0.16
I           0                0.391            0.34             -0.34     0.11     0.34
Total       86               100              86                                  83.73

Seed Type   Observed Count   Expected Ratio   Expected Count   (O-E)     (O-E)2   (O-E)2 / E
A           17               50               43.17            -26.17    684.79   15.86
B           21               25               21.58            -0.58     0.34     0.02
C           21               12.5             10.79            10.21     104.20   9.66
D           23               6.25             5.40             17.60     309.90   57.43
F           2                3.125            2.70             -0.70     0.49     0.18
G           1                1.563            1.35             -0.35     0.12     0.09
H + I       1                1.172            1.01             -0.01     0.00     0.00
Total       86               100              86                                  83.24
NB: By combining data you reduce value of K and also v
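The effect of combining categories can be sketched in Python. The pooling rule below (merge every category whose expected count is below 1 into a single combined category) is one simple reading of the slide, not necessarily the only valid way to combine classes; the function name `pool_small_expected` is hypothetical.

```python
def pool_small_expected(observed, expected, minimum=1.0):
    """Merge all categories with expected count < minimum into one combined category."""
    kept_o, kept_e = [], []
    pooled_o, pooled_e = 0.0, 0.0
    for o, e in zip(observed, expected):
        if e < minimum:
            pooled_o += o
            pooled_e += e
        else:
            kept_o.append(o)
            kept_e.append(e)
    if pooled_e > 0:
        kept_o.append(pooled_o)
        kept_e.append(pooled_e)
    return kept_o, kept_e


def chi_square(observed, expected):
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [17, 21, 21, 23, 2, 1, 1, 0]                      # A, B, C, D, F, G, H, I
ratio = [50, 25, 12.5, 6.25, 3.125, 1.563, 0.781, 0.391]     # expected ratio column
n = sum(observed)
expected = [n * r / sum(ratio) for r in ratio]

chi2_before = chi_square(observed, expected)
pooled_o, pooled_e = pool_small_expected(observed, expected)
chi2_after = chi_square(pooled_o, pooled_e)

print(f"K = {len(observed)}: chi2 = {chi2_before:.2f}")
print(f"K = {len(pooled_o)}: chi2 = {chi2_after:.2f}, v = {len(pooled_o) - 1}")
```

After pooling, K drops from 8 to 7 and v from 7 to 6, so the critical value must be looked up again. If the combined category still had an expected count below 1, it would need to be merged with a further category; this sketch does not do that automatically.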
Which stats test to use?
DATA
• Continuous: looking for probabilities: Z-TESTS; comparing two means: T-TESTS
• Discrete: Chi - squared
Use Getting started with data.xls for further advice