chi-squared goodness of fit. what does it do? tests whether data you’ve collected are in line with...

Upload: kristopher-byrd

Post on 28-Dec-2015


Chi-squared Goodness of fit

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

What does it do?

Tests whether data you’ve collected are in line with national or regional statistics.

Are there similar numbers of hot and cold days in town X as in the region generally?

Are the frequencies with which local households recycle in line with national statistics?

NB: Do NOT use this test to compare, for example, two towns. Chi-squared Association is the test for that.

Planning to use it?

Make sure that:

• You are working with counts of people/things, not measurements such as area, weight, length or percentages.

• You expect at least 5 people/things in each category (every expected value should be at least 5).

• You have some national/regional/global data to compare your data to.
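The checklist above can be turned into a quick pre-test check. This is a minimal sketch, not part of the original slides: `check_assumptions` is a hypothetical name, and it applies the common rule of thumb that every expected count should be at least 5.

```python
def check_assumptions(observed, expected):
    """Return a list of warnings about the data (an empty list means no obvious problem).

    observed: counts you collected, one per category
    expected: expected counts for the same categories
    """
    warnings = []
    if len(observed) != len(expected):
        warnings.append("observed and expected need the same number of categories")
    # The test works on counts of people/things, not on measurements or percentages.
    if any(not isinstance(o, int) or o < 0 for o in observed):
        warnings.append("observed values should be whole-number counts")
    # Rule of thumb (assumption of this sketch): every expected count should be at least 5.
    too_small = [e for e in expected if e < 5]
    if too_small:
        warnings.append(f"{len(too_small)} categories have an expected count below 5")
    return warnings


# Birmingham weather example from these slides: no warnings.
print(check_assumptions([16, 11, 338], [9.02, 8.65, 347.33]))  # []
```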

How does it work?

• You assume (null hypothesis) that local figures are in accordance with national figures

• It compares observed values (the data you collected) with expected values (what you’d get if the local data really did match the national data)

Doing the test

These are the stages in doing the test:

1. Write down your hypotheses

2. Work out the expected values

3. Use the chi-squared formula to get a chi-squared value

4. Work out your degrees of freedom

5. Look at the tables

6. Make a decision
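Steps 2 to 6 can be chained together in a short Python sketch. It is an illustration rather than the slides' own method: `goodness_of_fit` and `CRITICAL_VALUES` are hypothetical names, and the critical values are the standard 5% and 1% entries for 1 to 5 degrees of freedom, typed in by hand instead of read from a full table. Step 1 (writing the hypotheses) stays on paper.

```python
# Standard chi-squared critical values, keyed by significance level, then degrees of freedom.
# Only 1-5 df are included here; extend from a full table if you need more.
CRITICAL_VALUES = {
    0.05: {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070},
    0.01: {1: 6.635, 2: 9.210, 3: 11.345, 4: 13.277, 5: 15.086},
}


def goodness_of_fit(observed, national_percent, significance=0.05):
    """Return (chi_squared, df, critical_value, reject_h0) for one set of counts."""
    total = sum(observed)
    # Step 2: expected values = national % for category x total in sample / 100.
    expected = [p * total / 100 for p in national_percent]
    # Step 3: chi-squared = sum over categories of (O - E)^2 / E.
    chi_squared = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Step 4: degrees of freedom = number of categories - 1.
    df = len(observed) - 1
    # Step 5: look up the critical value for this df and significance level.
    critical = CRITICAL_VALUES[significance][df]
    # Step 6: reject H0 if the calculated value is bigger than the critical value.
    return chi_squared, df, critical, chi_squared > critical


# Hypothetical data: 80 households in 3 categories, national split 50% / 30% / 20%.
chi_sq, df, critical, reject = goodness_of_fit([45, 20, 15], [50, 30, 20])
print(f"chi-squared = {chi_sq:.3f}, df = {df}, critical = {critical}, reject H0: {reject}")
```

With these made-up counts the expected values are 40, 24 and 16, the chi-squared value is well below 5.991, and H0 is not rejected.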

Hypotheses

H0: Data collected is in accordance with national/regional data

H1: Data collected is not in accordance with national/regional data

Be specific about what data you are collecting, and what data you are comparing them to!

Expected Values

• Use your national data to work out the percentage of people/things in each category.

• Find the total number of people/things in your sample.

• Work out the numbers you’d expect in each category by doing:

$$E = \frac{\text{national \% for category} \times \text{total in your sample}}{100}$$
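The expected-value rule can be written as a one-line calculation per category. This is a minimal sketch: `expected_values` is a hypothetical name, and the 0.5 tolerance on the percentages adding up to 100 is an assumption of the sketch (to allow for rounding in published figures), not something the slides specify.

```python
def expected_values(national_percent, sample_total):
    """Expected count for each category: national % x total in your sample / 100."""
    # The percentages should cover every category, so they should add up to (about) 100.
    if abs(sum(national_percent) - 100) > 0.5:
        raise ValueError("national percentages should add up to 100")
    return [p * sample_total / 100 for p in national_percent]


# Hypothetical example: national split of 50% / 30% / 20%, sample of 80 households.
print(expected_values([50, 30, 20], 80))  # [40.0, 24.0, 16.0]
```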

Chi-Squared Formula

• For each category, work out $\frac{(O - E)^2}{E}$, where:

O = Observed value – your data

E = Expected value – which you’ve calculated

• Then add all your values up. This gives the chi-squared value:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

where $\sum$ means “sum of”.
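The formula translates directly into a sum over categories. This is a minimal sketch: `chi_squared` is a hypothetical name, and the counts in the usage line are made up for illustration.

```python
def chi_squared(observed, expected):
    """Return (total, contributions): each category's (O - E)^2 / E and their sum."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need the same number of categories")
    # One (O - E)^2 / E value per category.
    contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
    # The chi-squared value is the sum of all the category values.
    return sum(contributions), contributions


# Hypothetical observed counts compared with expected values of 40, 24 and 16.
total, parts = chi_squared([45, 20, 15], [40.0, 24.0, 16.0])
print(parts, total)
```

Keeping the individual contributions is useful: the largest one shows which category is furthest from what the national data predict.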

Degrees of freedom

The formula for degrees of freedom here is

degrees of freedom = n – 1

Where n is the number of categories

You do not need to worry about what this means; just make sure you know the formula!

But in case you’re interested: the more categories you have, the more likely you are to get a “strange” result in one or more of them, and the degrees of freedom allow for this in the test. It is n – 1 rather than n because, once the total is fixed, the count in the last category is determined by the counts in all the others.

Tables

[Figure: a chi-squared table. Each row is a number of degrees of freedom (df); each column is a significance level, e.g. 0.05 = 5%.]
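The degrees-of-freedom formula and the table lookup can be combined in a small sketch. `critical_value` and `CRITICAL_VALUES` are hypothetical names; the dictionary holds only the standard 5% and 1% critical values for 1 to 5 degrees of freedom, so it stands in for a small corner of a full printed table.

```python
# 5% and 1% critical values for 1-5 degrees of freedom, from a standard chi-squared table.
CRITICAL_VALUES = {
    0.05: {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070},
    0.01: {1: 6.635, 2: 9.210, 3: 11.345, 4: 13.277, 5: 15.086},
}


def critical_value(n_categories, significance=0.05):
    """Look up the critical value using degrees of freedom = n - 1."""
    df = n_categories - 1
    try:
        return CRITICAL_VALUES[significance][df]
    except KeyError:
        raise ValueError(f"no table entry for df = {df} at significance {significance}") from None


print(critical_value(3))        # 3 categories, so df = 2, at the 5% level
print(critical_value(3, 0.01))  # same df at the 1% level
```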

Make a decision

• If the value you calculated is bigger than the critical value in the tables (for your degrees of freedom and significance level), you reject your null hypothesis, so your figures do not fit national data/predictions.

• If the value you calculated is smaller than the critical value, you cannot reject your null hypothesis, so there is no significant evidence that your figures differ from national data/predictions.

Example: Comparing Birmingham weather to the West Midlands overall

A student decided to investigate whether Birmingham had comparable numbers of hot and cold days to the West Midlands in general.

Hypotheses:

H0: The number of hot and cold days in Birmingham is in accordance with the West Midlands generally

H1: The number of hot and cold days in Birmingham is not in accordance with the West Midlands generally

The Data Obtained

Between 01/09/2002 and 31/08/2003, in Birmingham there were:

• 16 hot days (mean temperature > 20 °C)

• 11 cold days (mean temperature < 0 °C)

• 338 neither hot nor cold days

The West Midlands Data

Over the ten-year period 1993-2002, percentages of hot and cold days in the West Midlands were:

• 2.47% hot days

• 2.37% cold days

• 95.16% neither hot nor cold days

Finding Expected Values

Use the regional % figures to find the expected values:

• Find the total number of days

• Work out the expected number of days in each category using:

$$\text{Expected number} = \frac{\text{regional \% for category} \times \text{total number of days}}{100}$$

The Expected Values

Total number of days = 365

Category: Expected value

Hot: 2.47 × 365 / 100 = 9.02

Cold: 2.37 × 365 / 100 = 8.65

Neither: 95.16 × 365 / 100 = 347.33
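The expected values for Birmingham can be checked with a few lines of Python. This is a sketch using only the counts and regional percentages given in the example; the dictionary names are illustrative, and the rounding to two decimal places mirrors the figures shown above.

```python
observed = {"Hot": 16, "Cold": 11, "Neither": 338}
regional_percent = {"Hot": 2.47, "Cold": 2.37, "Neither": 95.16}

total_days = sum(observed.values())  # 16 + 11 + 338 = 365
# Expected number = regional % for category x total number of days / 100,
# rounded to 2 decimal places as in the worked example.
expected = {c: round(regional_percent[c] * total_days / 100, 2) for c in observed}
print(total_days, expected)
```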

The calculations: (O – E)²/E

Category O E (O – E)²/E

Hot 16 9.02 5.401

Cold 11 8.65 0.638

Neither 338 347.33 0.251
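Each row of the calculation table can be reproduced from its O and E values. This sketch uses exactly the rounded expected values from the table; `rows` and `contributions` are illustrative names.

```python
# (category, observed O, expected E) as in the table above.
rows = [("Hot", 16, 9.02), ("Cold", 11, 8.65), ("Neither", 338, 347.33)]

# (O - E)^2 / E for each category.
contributions = {category: (o - e) ** 2 / e for category, o, e in rows}
for category, value in contributions.items():
    print(f"{category}: {value:.3f}")
print(f"Total: {sum(contributions.values()):.3f}")
```

To three decimal places this gives 5.401, 0.638 and 0.251, matching the table, with a total of 6.290.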

The test

$$\chi^2 = 5.401 + 0.638 + 0.251 = 6.290$$

Degrees of freedom = 3 – 1 = 2

Critical value (5% significance level, df = 2) = 5.991

Since 6.290 is bigger than 5.991, we reject H0: the number of hot and cold days in Birmingham is not in accordance with the West Midlands generally.
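The worked example rounds each expected value to two decimal places before using it. As a sanity check, this sketch repeats the test on the same data with the expected values left unrounded; the variable names are illustrative, and the 5.991 critical value is taken from the example rather than computed.

```python
observed = [16, 11, 338]                # hot, cold, neither (Birmingham, 2002-03)
regional_percent = [2.47, 2.37, 95.16]  # West Midlands, 1993-2002

total = sum(observed)
expected = [p * total / 100 for p in regional_percent]  # not rounded this time
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1
critical_5pc = 5.991  # table value for df = 2 at the 5% level, as in the example

print(f"chi-squared = {chi_sq:.3f}, df = {df}")
print("reject H0" if chi_sq > critical_5pc else "do not reject H0")
```

This gives roughly 6.30 instead of 6.290. The small difference comes only from rounding, and the decision (reject H0) is the same.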
