Chi-squared
Goodness of fit
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
What does it do?
Tests whether data you’ve collected are in line with national or regional statistics.
Are there similar numbers of hot and cold days in town X as in the region generally?
Are the frequencies with which local households recycle in line with national statistics?
NB: Do NOT use this test to compare, for example, two towns. Chi-squared Association is the test for that.
Planning to use it?
Make sure that…
• You are working with numbers of people/things, not, e.g., area, weight, length, %…
• You have an average of at least 5 people/things in each category
• You have some national/regional/global data to compare your data to.
How does it work?
• You assume (null hypothesis) that local figures are in accordance with national figures
• It compares observed values (the data you collected) with expected values (what you’d get if the local data really did match the national data)
Doing the test
These are the stages in doing the test:
1. Write down your hypotheses
2. Work out the expected values
3. Use the chi-squared formula to get a chi-squared value
4. Work out your degrees of freedom
5. Look at the tables
6. Make a decision
Hypotheses
H0: Data collected is in accordance with national/regional data
H1: Data collected is not in accordance with national/regional data
Be specific about what data you are collecting, and what data you are comparing them to!
Expected Values
• Use your national data to work out the percentage of people/things in each category.
• Find the total number of people/things in your sample.
• Work out the numbers you’d expect in each category by doing:
$$\text{expected number} = \frac{\text{national \% for category} \times \text{total in your sample}}{100}$$
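The expected-value step can be sketched in a few lines of Python. This is only an illustration: the function name `expected_counts` and the recycling figures are made up for the example, not taken from any real survey.

```python
def expected_counts(national_percentages, sample_total):
    """Expected count per category = national % x sample total / 100."""
    return {
        category: percent * sample_total / 100
        for category, percent in national_percentages.items()
    }


# Hypothetical figures, for illustration only:
# 60% of households nationally recycle, and we surveyed 50 households.
national = {"recycles": 60.0, "does not recycle": 40.0}
expected = expected_counts(national, sample_total=50)
print(expected)
```

Note that the expected values always add up to the total in your sample, which is a quick way to check your working.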
Chi-Squared Formula
• For each category, work out $\frac{(O - E)^2}{E}$
O = Observed value (your data)
E = Expected value (which you’ve calculated)
• Then add all your values up. This gives the chi-squared value:
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
$\sum$ = “Sum of”
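As a rough sketch, the formula translates directly into Python. The function name `chi_squared` and the observed/expected numbers below are hypothetical, chosen only to show the arithmetic.

```python
def chi_squared(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need one value per category")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical two-category example, for illustration only
observed = [35, 15]
expected = [30, 20]

# (35 - 30)^2 / 30 + (15 - 20)^2 / 20 = 25/30 + 25/20
value = chi_squared(observed, expected)
print(round(value, 3))
```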
Degrees of freedom
The formula here for degrees of freedom is
degrees of freedom = n – 1
Where n is the number of categories
You do not need to worry about what this means; just make sure you know the formula!
But in case you’re interested – the more categories you have, the more likely you are to get a “strange” result in one or more of them. The degrees of freedom is a way of allowing for this in the test.
Tables
[Figure: a chi-squared table, with labels pointing out the degrees of freedom (df) and the significance levels, e.g. 0.05 = 5%]
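Looking up the tables can also be sketched in Python. The small table below holds the standard 5% and 1% critical values for 1 to 5 degrees of freedom only; the names `CRITICAL_VALUES` and `critical_value` are invented for this sketch, and a printed table covers many more rows and columns.

```python
# Standard chi-squared critical values: {degrees of freedom: {significance: value}}
CRITICAL_VALUES = {
    1: {0.05: 3.841, 0.01: 6.635},
    2: {0.05: 5.991, 0.01: 9.210},
    3: {0.05: 7.815, 0.01: 11.345},
    4: {0.05: 9.488, 0.01: 13.277},
    5: {0.05: 11.070, 0.01: 15.086},
}


def critical_value(number_of_categories, significance=0.05):
    """Look up the critical value, using degrees of freedom = n - 1."""
    degrees_of_freedom = number_of_categories - 1
    return CRITICAL_VALUES[degrees_of_freedom][significance]


# Three categories give 2 degrees of freedom
print(critical_value(3))
print(critical_value(3, significance=0.01))
```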
Make a decision
• If the value you calculated is bigger than the critical value from the tables, you reject your null hypothesis – so your figures do not fit national data/predictions
• If the value you calculated is smaller than the critical value from the tables, you accept your null hypothesis – so your figures do fit national data/predictions.
Example: Comparing Birmingham weather to the West Midlands overall
A student decided to investigate whether Birmingham had comparable numbers of hot and cold days to the West Midlands in general.
Hypotheses:
H0: The number of hot and cold days in Birmingham is in accordance with the West Midlands generally
H1: The number of hot and cold days in Birmingham is not in accordance with the West Midlands generally
The Data Obtained
Between 01/09/2002 and 31/08/2003, in Birmingham there were:
16 hot days (mean temperature > 20°C)
11 cold days (mean temperature < 0°C)
338 neither hot nor cold days
The West Midlands Data
Over the ten-year period 1993-2002, percentages of hot and cold days in the West Midlands were:
2.47% hot days
2.37% cold days
95.16% neither hot nor cold days
Finding Expected Values
Use the regional % figures to find the expected values:
• Find the total number of days
• Work out the expected number of days in each category using:
$$\text{expected number} = \frac{\text{regional \% for category} \times \text{total number of days}}{100}$$
The Expected Values
Total number of days = 365
Category Expected
Hot 2.47 × 365 ÷ 100 = 9.02
Cold 2.37 × 365 ÷ 100 = 8.65
Neither 95.16 × 365 ÷ 100 = 347.33
The calculations: $(O - E)^2 / E$
Category O E $(O - E)^2 / E$
Hot 16 9.02 5.401
Cold 11 8.65 0.638
Neither 338 347.33 0.251
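As a check on the first row of the table, the Hot category works out as follows, using the observed and expected values given above:

$$\frac{(O - E)^2}{E} = \frac{(16 - 9.02)^2}{9.02} = \frac{48.7204}{9.02} \approx 5.401$$

The Cold and Neither rows follow the same pattern.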
The test
$\chi^2 = 5.401 + 0.638 + 0.251$
$\chi^2 = 6.290$
Degrees of freedom = 3 – 1 = 2
Critical value (5%) = 5.991
Since 6.290 is bigger than 5.991, we reject H0 – the number of hot and cold days in Birmingham is not in accordance with the West Midlands generally.
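The whole Birmingham example can be reproduced with a short Python sketch. The variable names are made up for illustration, the 5% critical value is typed in from the tables rather than computed, and the expected values are rounded to two decimal places to match the working above (keeping them unrounded shifts the result slightly, to about 6.30, with the same decision).

```python
# Observed days in Birmingham, 01/09/2002 to 31/08/2003
observed = {"Hot": 16, "Cold": 11, "Neither": 338}

# West Midlands percentages, 1993-2002
regional_percent = {"Hot": 2.47, "Cold": 2.37, "Neither": 95.16}

total_days = sum(observed.values())

# Expected number = regional % x total number of days / 100,
# rounded to 2 decimal places as in the worked example
expected = {
    category: round(percent * total_days / 100, 2)
    for category, percent in regional_percent.items()
}

# Chi-squared value = sum of (O - E)^2 / E over the categories
chi_squared_value = sum(
    (observed[c] - expected[c]) ** 2 / expected[c] for c in observed
)

degrees_of_freedom = len(observed) - 1
critical_value_5_percent = 5.991  # from the tables, for 2 degrees of freedom

reject_h0 = chi_squared_value > critical_value_5_percent

print(f"chi-squared = {chi_squared_value:.3f}, df = {degrees_of_freedom}")
print("Reject H0" if reject_h0 else "Accept H0")
```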