SADC Course in Statistics Numerical summaries for quantitative data Module I3 Sessions 4 and 5

Upload: julian-hughes

Post on 28-Mar-2015

TRANSCRIPT

Page 1: SADC Course in Statistics Numerical summaries for quantitative data Module I3 Sessions 4 and 5

SADC Course in Statistics

Numerical summaries for quantitative data

Module I3 Sessions 4 and 5

Page 2

Learning objectives

• Students should be able to:
  • Explain why it is important to summarise the variability of a dataset
  • Provide from first principles and explain the role of the common summary statistics for average and spread for a simple dataset
  • Visualise a dataset to estimate the standard deviation from a graph of the data
  • Visualise a dataset to construct a histogram or boxplot, given a numerical summary
  • Explain the formulae for the variance, standard deviation and mean deviation

Page 3

Contents

Activity 1: PowerPoint presentation
  • To stress the importance of understanding summary statistics

Activity 2: Practical 1
  • Calculate averages and measures of variation

Activity 3: Practical 2
  • Interpret and explain averages and measures of variation

Activity 4: Review of key points and concepts

Page 4

Why variation is SO important

• From D. S. Moore
  • In Statistics: A Guide to the Unknown – 4th Edition
• “Variation is everywhere
  • Individuals vary.
  • Repeated measurements on the same individual vary.
• The science of statistics provides tools for dealing with variation”
• Give examples of the two statements in blue:
  • time of arrival at a lecture,
  • blood pressure,
  • reaction times,
  • penalty taking in football.

Page 5

Look at the wide range of situations!

• Record some examples on the board or flip chart.

• How many people said the same thing?

• How many areas of application can be considered?

Page 6

CAST and summary statistics

CAST will be used extensively in one of the practicals

Page 7

DFID and climate – was this area mentioned?

Reducing the vulnerability of the poor to current climate variability is the starting point for adaptation to climate change.

Climatic variability is a fundamental driver of poverty in poor countries. The climate is changing and it is highly likely that it will worsen poverty and hinder efforts to achieve the Millennium Development Goals.

The poor cannot cope with current climatic variation in many parts of the world, but this issue is often ignored in poverty assessments or national development planning.

Responses to existing climatic variability should be mainstreamed into national development plans and processes.

Current responses by individuals and governments to the impacts of climate variability can be used as the basis for adaptation to the increasing climate variability that will be associated with longer-term climate change.

Page 8

So

• To practice statistics
  • You must be able to summarise sets of data
  • Including giving a measure of “average”
  • And particularly to summarise the variability
• The simple summaries of variability are easy
  • The extremes (maximum and minimum) and the range
  • The quartiles
• But the most used measure of variation
  • Is called the standard deviation
  • You can calculate it easily – in Excel!!!
  • But you must understand and be able to interpret it
  • And that is what you need to learn from these sessions
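The summaries named on this slide can be sketched in a few lines of Python, as an alternative to Excel. The marks below are invented for illustration and are not from the course materials. Note that packages differ slightly in how they define quartiles, so Excel or CAST may report slightly different quartile values for the same data.

```python
import statistics

# Hypothetical test marks, invented for illustration only.
marks = [12, 15, 18, 20, 22, 25, 28]

# The simple summaries of variability: extremes and range.
minimum, maximum = min(marks), max(marks)
data_range = maximum - minimum

# quantiles(..., n=4) returns the three cut points Q1, median, Q3.
# Other software may use a slightly different quartile convention.
q1, median, q3 = statistics.quantiles(marks, n=4)

# A measure of "average" and the most used measure of variation.
mean = statistics.mean(marks)
sd = statistics.stdev(marks)  # sample s.d., divisor n - 1

print("min, max, range:", minimum, maximum, data_range)
print("quartiles:", q1, median, q3)
print("mean:", mean, "s.d.:", round(sd, 2))
```

Computing the numbers is the easy part; the sessions are about interpreting them.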

Page 9

Activity 2: Practical 1

• Trivial data sets
  • By hand – for understanding
  • And using Excel
• To explain the formulae
  • So you can also use them
• Including the coefficient of variation (cv)
  • Which provides a good initial test of your understanding
• The cv is useful, but also overused
  • We ask you to explain when it should NOT be used
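To mirror the "by hand" calculations, here is a minimal Python sketch of the formulae written out from first principles. It assumes the sample variance uses the divisor n - 1 (which matches the s.d. of 1.58 quoted later for the data 1 to 5) and that the mean deviation uses the divisor n; the function names are illustrative only, and the practical handout may use a different convention.

```python
import math

def mean(xs):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    return sum(xs) / len(xs)

def variance(xs):
    """Sample variance: sum of squared deviations from the mean, divided by n - 1."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def std_dev(xs):
    """Standard deviation: the square root of the variance."""
    return math.sqrt(variance(xs))

def mean_deviation(xs):
    """Mean deviation: average absolute deviation from the mean (divisor n assumed)."""
    m = mean(xs)
    return sum(abs(x - m) for x in xs) / len(xs)

data = [1, 2, 3, 4, 5]
print("mean:", mean(data))
print("variance:", variance(data))
print("s.d.:", round(std_dev(data), 2))
print("mean deviation:", mean_deviation(data))
```

Writing the deviations out explicitly, as in `variance`, is the same working you would do by hand in a table with columns for x, x minus the mean, and the squared deviation.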

Page 10

Activity 3: Using CAST for help

• You work in pairs
• Learning from CAST
• and then taking on a teacher’s role
  • You need to understand a topic well
  • To be able to explain it to someone else
• CAST also gives exercises
  • To estimate the variability from a histogram or boxplot
  • To draw the histogram or boxplot, given the summary values
• You also try these tasks
  • With your partner to help – or hinder!

Page 11

Discussion

• From practical 1:
  • Suppose marks in a test are
  • 12, 15, … so the mean = 20 and the s.d. = 8
  • Students are all given 15 marks bonus for attending
  • They all attended, so all get the extra 15
  • What is the mean and what is the standard deviation?

Page 12

A possible problem with Excel

• Software should give the right answer
  • We show that Excel standard functions did not – though SSC-Stat is OK
• Give the mean and standard deviation of:
  1 2 3 4 5
  mean = 3, s.d. = 1.58
• What is the mean and s.d. if we add 10?
  11 12 13 14 15
  mean = ???, s.d. = ???

Page 13

A possible problem with Excel

• Software should give the right answer
  • We show that Excel standard functions did not – though SSC-Stat is OK
• Give the mean and standard deviation of:
  1 2 3 4 5
  mean = 3, s.d. = 1.58
• What is the mean and s.d. if we add 10?
  11 12 13 14 15
  mean = 13, s.d. = 1.58 again
• Check you are absolutely clear that this is true
  • And if you add 100 the s.d. = ???
  • And if you add 1000 the s.d. = ???
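One way to check the claim for yourself is the short Python sketch below. It uses the standard library's `statistics.stdev` (sample s.d., divisor n - 1) on the slide's data with several constants added; the variable names are illustrative only.

```python
import statistics

base = [1, 2, 3, 4, 5]
results = {}

for shift in (0, 10, 100, 1000):
    shifted = [x + shift for x in base]
    # Adding a constant moves every value, and hence the mean, by that constant.
    # The deviations from the mean are unchanged, so the s.d. is unchanged too.
    results[shift] = (statistics.mean(shifted), statistics.stdev(shifted))
    print(shift, results[shift][0], round(results[shift][1], 2))
```

The reason is visible in the formula: each deviation (x + c) minus (mean + c) equals x minus the mean, so the constant c cancels before anything is squared.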

Page 14

Standard deviation in Excel 2000

Same as previous slide ooops!

Page 15

This problem with Excel

• It was fixed in Excel 2003
• But it should make you worry
  • that other answers might still be wrong

• We return to this point in Session 13

• Now the key idea is your understanding of the measures of variation
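The slides do not say what went wrong inside Excel. One well-known way for a standard deviation routine to fail on data like this is the one-pass "calculator" formula, which subtracts two very large, nearly equal numbers and so loses the small difference to rounding. The sketch below is an illustration of that effect in Python under this assumption, not a reconstruction of Excel's code; the function names and the shift of one thousand million are chosen for illustration.

```python
def one_pass_variance(xs):
    """'Calculator' formula: (sum of squares - (sum)^2 / n) / (n - 1).
    Algebraically correct, but numerically fragile for large values."""
    n = len(xs)
    sum_x = sum(xs)
    sum_sq = sum(x * x for x in xs)
    return (sum_sq - sum_x * sum_x / n) / (n - 1)

def two_pass_variance(xs):
    """First find the mean, then sum the squared deviations from it."""
    n = len(xs)
    m = sum(xs) / n
    return sum((x - m) ** 2 for x in xs) / (n - 1)

small = [1.0, 2.0, 3.0, 4.0, 5.0]
large = [x + 1e9 for x in small]  # same spread, huge constant added

for label, xs in (("small", small), ("large", large)):
    print(label, one_pass_variance(xs), two_pass_variance(xs))
```

Both formulae agree on the small data, where the variance is 2.5. On the large data the two-pass version still gives 2.5, while the one-pass version does not, even though adding a constant should leave the variance unchanged.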

Page 16

The coefficient of variation – (cv)

• It is popular in some areas of application• And easy to misuse

• It is given by
  • cv = 100 * s.d./mean
• When should it NOT be used
  1. When the s.d. should not be used. When is that?
  2. When it is not sensible to divide by the mean. When is that?
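As one illustration for the second question (not a full answer), the sketch below computes the cv for the same hypothetical temperatures recorded in degrees Celsius and in kelvin. The readings are invented for this example. Because the Celsius zero is arbitrary, the mean has no fixed meaning as a divisor, and the cv changes with the choice of scale even though the variability is identical.

```python
import statistics

def cv(xs):
    """Coefficient of variation, as on the slide: 100 * s.d. / mean."""
    return 100 * statistics.stdev(xs) / statistics.mean(xs)

# Hypothetical temperature readings, invented for illustration only.
celsius = [18, 20, 22, 24, 26]
kelvin = [c + 273.15 for c in celsius]

sd_c = statistics.stdev(celsius)
sd_k = statistics.stdev(kelvin)

print("s.d. (Celsius, kelvin):", round(sd_c, 3), round(sd_k, 3))
print("cv   (Celsius, kelvin):", round(cv(celsius), 2), round(cv(kelvin), 2))
```

The two standard deviations agree, but the two cv values do not, so a cv quoted for data on a scale with an arbitrary zero says more about the scale than about the data. A mean at or near zero causes a related problem, since the division then fails or produces an enormous value.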

Page 17

Training – how did it go?

• Did you get good marks as trainers?

• What suggestions did you have for improvements?

Page 18

Exercises – how did you do?

Page 19

My reasoning was as follows:

In the figure, everything is between 100 and 300

Most data (not quite all) are within 2 * s.d. of the mean, so the spread of 200 from 100 to 300 covers a little more than 4 * s.d., and the s.d. must be less than 50. So I said 45!
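The arithmetic behind this reasoning can be written as a small rule-of-thumb function. It assumes, as the slide does, that nearly all of the data lie within 2 standard deviations either side of the mean; the function name is illustrative only.

```python
def sd_upper_bound(low, high, sds_each_side=2):
    """Rough upper bound on the s.d. from the visible extent of a graph.

    If (almost) all data lie within `sds_each_side` standard deviations
    of the mean, the full extent high - low spans about
    2 * sds_each_side standard deviations.
    """
    return (high - low) / (2 * sds_each_side)

bound = sd_upper_bound(100, 300)
print("s.d. is at most about", bound)
```

This gives a ceiling rather than an estimate. Because a few points usually fall outside 2 standard deviations, a guess slightly below the bound, such as 45, is sensible.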

Page 20

Learning objectives

• Are you now able to:
  • Explain why it is important to summarise the variability of a dataset
  • Provide from first principles and explain the role of the common summary statistics for average and spread for a simple dataset
  • Visualise a dataset to estimate the standard deviation from a graph of the data
  • Visualise a dataset to construct a histogram or boxplot, given a numerical summary
  • Explain the formulae for the variance, standard deviation and mean deviation

Page 21

Now that you know about the common summary statistics, the next sessions put them to use.