topic 8a basic statistics

Data Analysis and Interpretation 1: Basic Statistics Lecturer: Yee Bee Choo IPGKTHO Topic 8

Upload: yee-bee-choo

Post on 11-Jun-2015


Page 1: Topic 8a Basic Statistics

Data Analysis and Interpretation 1:

Basic Statistics

Lecturer: Yee Bee Choo

IPGKTHO

Topic 8

Page 2: Topic 8a Basic Statistics

Basic Statistics

1. Measure of Central Tendency: Mode, Median, Mean

2. Measure of Dispersion: Range, Variance & Standard Deviation

3. Standard Score: Z Score, T Score

Page 3: Topic 8a Basic Statistics

Two kinds of measures:

1. Measures of central tendency

2. Measures of dispersion

Both types of measures are useful in score reporting. They are frequently used to describe data, which is why they are often called descriptive statistics.

Page 4: Topic 8a Basic Statistics

A measure of central tendency indicates the value around which a set of scores tends to cluster.

There are three major measures of central tendency:

1. Mode

2. Median

3. Mean

Measure of Central Tendency

Page 5: Topic 8a Basic Statistics

Mode

◦ The “mode” for a set of data is the number (or item) that occurs most frequently.

◦ Sometimes data can have more than one mode. This happens when two or more numbers (or items) occur an equal number of times in the data.

◦ A data set with two modes is called bimodal.

◦ A data set with three modes is called trimodal.

◦ It is also possible to have a set of data with no mode.

Page 6: Topic 8a Basic Statistics

Mode

The mode is the most common number. To find the mode, put the numbers in order and choose the number that appears most frequently.

Data: 3, 5, 5, 6, 4, 3, 2, 1, 5, 6

Put in order: 1, 2, 3, 3, 4, 5, 5, 5, 6, 6

The mode is 5.
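The counting procedure for the mode can be sketched in a few lines of Python. This is only an illustration: the helper name `find_modes` is hypothetical, and the rule that a data set in which every value ties has "no mode" is the convention assumed here.

```python
from collections import Counter

def find_modes(data):
    """Return every value that occurs most often, or [] when all values tie."""
    counts = Counter(data)
    highest = max(counts.values())
    modes = sorted(value for value, count in counts.items() if count == highest)
    # Assumed convention: if every distinct value ties, report no mode.
    if len(modes) == len(counts) and len(counts) > 1:
        return []
    return modes

scores = [3, 5, 5, 6, 4, 3, 2, 1, 5, 6]
print(sorted(scores))      # the data put in order
print(find_modes(scores))  # [5]
```

Because the function returns a list, the same call also handles data with two or three modes.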

Page 7: Topic 8a Basic Statistics

Mode

Bimodal

Data: 2, 5, 2, 3, 5, 4, 7

Put in order: 2, 2, 3, 4, 5, 5, 7

Modes = 2 and 5

Trimodal

Data: 2, 5, 2, 7, 5, 4, 7

Put in order: 2, 2, 4, 5, 5, 7, 7

Modes = 2, 5 and 7

Page 8: Topic 8a Basic Statistics

Mode

Data: 3, 5, 6, 4, 7, 8, 9, 2, 1, 0

What is the mode?

Put in order: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9

Is the mode 0? No. Every value occurs exactly once, so this data set has no mode.

Page 9: Topic 8a Basic Statistics

Mode

The mode can be useful for dealing with categorical data. For example, if a sandwich shop sells 10 different types of sandwiches, the mode would represent the most popular sandwich.

The mode can be useful for summarising survey data or election votes.


Page 10: Topic 8a Basic Statistics

Median

A median is a measure of the "middle" value of a set of data.

To find the median, put the numbers in order and find the middle number.

If the total number of values in the sample is even, the median is calculated by finding the mean of the two values in the middle.

Data: 45, 47, 50, 51, 52, 54, 65

Median = 51

Page 11: Topic 8a Basic Statistics

Median

Data: 45, 47, 50, 51, 52, 54, 65

Median = 51

Data: 45, 47, 50, 51, 52, 53, 54, 65

Median = (51 + 52)/2 = 51.5
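The odd and even cases can be sketched as follows. The function name `median` is chosen here for illustration; Python's standard library also provides `statistics.median`, which applies the same rule.

```python
def median(data):
    """Middle value of the ordered data; mean of the two middle values if the count is even."""
    ordered = sorted(data)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2

print(median([45, 47, 50, 51, 52, 54, 65]))      # 51
print(median([45, 47, 50, 51, 52, 53, 54, 65]))  # 51.5
```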

Page 12: Topic 8a Basic Statistics

Mean

The ‘Mean’ is the ‘Average’ value of numerical data.

The Mean (or average) is found by adding all scores together and dividing by the number of scores.


Page 13: Topic 8a Basic Statistics

Mean

Data: 3, 5, 5, 6, 4, 3, 2, 1, 5, 6

Add up the numbers:

3 + 5 + 5 + 6 + 4 + 3 + 2 + 1 + 5 + 6 = 40

Divide by how many numbers:

40 ÷ 10 = 4

Mean = 4
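The same two steps (add up, then divide by how many) look like this in Python. The variable names are illustrative only.

```python
scores = [3, 5, 5, 6, 4, 3, 2, 1, 5, 6]

total = sum(scores)   # add up the numbers
count = len(scores)   # how many numbers there are
mean = total / count  # divide the total by the count

print(total, count, mean)  # 40 10 4.0
```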


Page 14: Topic 8a Basic Statistics

Exercise 1

Below is a set of marks obtained by 7 students:

82 55 73 48 88 67 67

Find the mean, mode and median.


Page 15: Topic 8a Basic Statistics

Exercise 2

On a standardised reading test, the nationwide average for Year 3 pupils is 7.0. A teacher is interested in comparing class reading scores with the national average. The scores for the 16 pupils in this class are as follows:

8, 6, 5, 10, 5, 6, 8, 9,

7, 6, 9, 5, 14, 4, 7, 6

a) Find the mean and the median reading scores for this class.

b) If the mean is used to define the class average, how does this class compare with the national norm?

c) If the median is used to define the class average, how does this class compare with the national norm?


Page 16: Topic 8a Basic Statistics

A measure of dispersion describes the spread of scores in a data set.

There are three major measures of dispersion:

1. Range

2. Standard deviation

3. Variance

Measure of Dispersion

Page 17: Topic 8a Basic Statistics

Consider these means for weekly candy bar consumption.

$X = \{7, 8, 6, 7, 7, 6, 8, 7\}$

$\bar{X} = (7+8+6+7+7+6+8+7)/8 = 7$

$X = \{12, 2, 0, 14, 10, 9, 5, 4\}$

$\bar{X} = (12+2+0+14+10+9+5+4)/8 = 7$

What is the difference?
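One way to see the difference is to list how far each value lies from the shared mean of 7. The sketch below uses a hypothetical helper, `deviations_from_mean`, applied to the two candy bar data sets.

```python
def deviations_from_mean(data):
    """Return each value's distance from the mean (negative when below the mean)."""
    mean = sum(data) / len(data)
    return [x - mean for x in data]

group_1 = [7, 8, 6, 7, 7, 6, 8, 7]
group_2 = [12, 2, 0, 14, 10, 9, 5, 4]

print(deviations_from_mean(group_1))  # [0.0, 1.0, -1.0, 0.0, 0.0, -1.0, 1.0, 0.0]
print(deviations_from_mean(group_2))  # [5.0, -5.0, -7.0, 7.0, 3.0, 2.0, -2.0, -3.0]
```

The first group never strays more than 1 from the mean, while the second strays by as much as 7, even though both means equal 7.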

Page 18: Topic 8a Basic Statistics

Measure of Dispersion

Page 19: Topic 8a Basic Statistics

How well does the mean represent the scores in a distribution?

The logic here is to determine how much spread is in the scores. How much do the scores "deviate" from the mean? Think of the mean as the true score or as your best guess. If every X were very close to the Mean, the mean would be a very good predictor.

If the distribution is very sharply peaked then the mean is a good measure of central tendency and if you were to use the mean to make predictions you would be right or close much of the time.


Page 20: Topic 8a Basic Statistics

Range

A range represents the distance on a numeric scale from the minimum to the maximum.

You can calculate the range by subtracting the minimum value from the maximum value.

Range = maximum - minimum

If the maximum grade was 100 and the minimum was 55, the range would be:

Range = 100 - 55 = 45
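A minimal sketch of the range calculation follows. The list of grades is hypothetical: only the minimum (55) and the maximum (100) come from the example above, and the function name `value_range` is an illustrative choice.

```python
def value_range(data):
    """Range = maximum - minimum."""
    return max(data) - min(data)

# Hypothetical grades; only 55 and 100 are taken from the slide's example.
grades = [55, 68, 74, 91, 100]
print(value_range(grades))  # 45
```

The range depends only on the two extreme values, so the grades in between do not affect the result.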

Page 21: Topic 8a Basic Statistics

Variance & Standard Deviation

The variance and standard deviation describe how far the observations in a data set lie from the mean (or average).

Variance is the average of the squared deviations from the mean: the sum of the squared deviations of each data point from the mean, divided by the number of data points.

Standard deviation is the square root of the variance. Because it is expressed in the same units as the data, it indicates the typical extent to which scores deviate from the mean.

Page 22: Topic 8a Basic Statistics

Variance & Standard Deviation

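Formulae:

Variance:

$$s^2 = \frac{\sum (X_i - \bar{X})^2}{N}$$

Standard Deviation:

$$s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{N}}$$

Here $X_i$ is each score, $\bar{X}$ is the mean and $N$ is the number of scores. The worked values later in these slides (for example, a standard deviation of 5 for the scores 20, 25 and 30) correspond to dividing by $N - 1$ rather than $N$.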
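As a sketch of how these formulae behave, the code below applies the divide-by-N form to the two candy bar data sets from earlier, which share a mean of 7. The function names are illustrative, and the choice of N (rather than N - 1) as the denominator is an assumption.

```python
import math

def variance(data):
    """Average of the squared deviations from the mean (divides by N)."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** 2 for x in data) / len(data)

def standard_deviation(data):
    """Square root of the variance, in the same units as the data."""
    return math.sqrt(variance(data))

group_1 = [7, 8, 6, 7, 7, 6, 8, 7]
group_2 = [12, 2, 0, 14, 10, 9, 5, 4]

print(variance(group_1), round(standard_deviation(group_1), 2))  # 0.5 0.71
print(variance(group_2), round(standard_deviation(group_2), 2))  # 21.75 4.66
```

Both groups have the same mean, but the second group's standard deviation is several times larger, which captures the difference in spread.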

Page 23: Topic 8a Basic Statistics

Standard Deviation

Standard deviation refers to how much the scores deviate from the mean.

There are two methods of calculating standard deviation: the deviation method and the raw score method. Both are illustrated by the following formulae.

Page 24: Topic 8a Basic Statistics

Standard Deviation (Deviation Method)

To illustrate this, we will use the scores 20, 25 and 30.

Using the deviation method, we come up with the following table:

Page 25: Topic 8a Basic Statistics

Standard Deviation (Raw Score Method)

Using the raw score method, we can come up with the following:

Page 26: Topic 8a Basic Statistics

Standard Deviation

Both methods result in the same final value of 5. This value is obtained when the sum of squared deviations (50) is divided by N - 1 = 2, giving a variance of 25; dividing by N = 3 would instead give a standard deviation of about 4.08.

If you are calculating standard deviation with a calculator, it is suggested that the deviation method be used when there are only a few scores and the raw score method be used when there are many scores.

This is because, when there are many scores, it is tedious to calculate the squares of the deviations and their sum.
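The two methods can be compared with the sketch below. The slide formulae and tables are not available in this transcript, so the code assumes the standard sample forms (denominator n - 1), chosen because they reproduce the stated value of 5. The function names are hypothetical.

```python
import math

def sd_deviation_method(scores):
    """Sum the squared deviations from the mean, divide by n - 1, take the square root."""
    n = len(scores)
    mean = sum(scores) / n
    sum_sq_dev = sum((x - mean) ** 2 for x in scores)
    return math.sqrt(sum_sq_dev / (n - 1))

def sd_raw_score_method(scores):
    """Use only the sum of scores and the sum of squared scores (no deviations needed)."""
    n = len(scores)
    sum_x = sum(scores)
    sum_x_sq = sum(x * x for x in scores)
    return math.sqrt((sum_x_sq - sum_x ** 2 / n) / (n - 1))

scores = [20, 25, 30]
print(sd_deviation_method(scores))  # 5.0
print(sd_raw_score_method(scores))  # 5.0
```

The raw score method needs only two running totals, which is why it is easier with many scores on a calculator.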

Page 27: Topic 8a Basic Statistics

Exercise 3

Calculate the range, variance and standard deviation for the following sample.

41, 17, 25, 34, 14, 40, 27, 19, 50, 39

26, 22, 28, 18, 42, 33, 25, 28, 27, 33

34, 7, 12, 36, 34, 16, 49, 19, 40, 28,

26, 30, 48, 33, 33, 25, 50, 29, 26, 30


Page 28: Topic 8a Basic Statistics

Standard Score

Standardised scores are necessary when we want to make comparisons across tests and measurements.

Z scores and T scores are the more common forms of standardised scores.

A standardised score can be computed for every raw score in a set of scores for a test.

Page 29: Topic 8a Basic Statistics

Exercise 4

Consider the two sets of scores below:

A = 10, 36, 38, 40, 42, 44, 70

B = 10, 12, 14, 40, 66, 68, 70

Find the range and mean.

Page 30: Topic 8a Basic Statistics

Standard Score

Both set A and set B have the same range and mean.

However, set B is more dispersed. As a result, the value 70 stands out from the other values more in set A than it does in set B.

To make the comparison clearer, we can standardise the scores by transforming them into another distribution.
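A short sketch makes the comparison concrete by expressing the score 70 as a number of standard deviations from the mean of each set. The helper name `standardise` is hypothetical, and using `statistics.stdev` (which divides by n - 1) is an assumption.

```python
import statistics

set_a = [10, 36, 38, 40, 42, 44, 70]
set_b = [10, 12, 14, 40, 66, 68, 70]

def standardise(score, data):
    """Express a score as a number of standard deviations from the mean of its set."""
    # statistics.stdev uses the sample (n - 1) denominator; this choice is an assumption.
    return (score - statistics.mean(data)) / statistics.stdev(data)

z_a = standardise(70, set_a)
z_b = standardise(70, set_b)
print(round(z_a, 2), round(z_b, 2))  # roughly 1.71 and 1.07
```

The same raw score of 70 lies further from the mean, in standard deviation units, in the less dispersed set A.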

Page 31: Topic 8a Basic Statistics

Standard Score

i. Z score

The Z score is the basic standardised score. It is referred to as the basic form because other standardised scores are computed from the Z score. The formula is as follows:

$$Z = \frac{X - \bar{X}}{SD}$$

where $X$ is the raw score, $\bar{X}$ is the mean and $SD$ is the standard deviation.

Page 32: Topic 8a Basic Statistics

Standard Score

i. Z score

Calculate the Z score for the set of scores below:

25, 34, 40, 45

The mean for this set of scores is 36 and the SD is 8.6.

Table 1:

Raw Score    Application of Formula: (Raw score - Mean)/SD    Z Score

25           (25 - 36)/8.6                                    -1.28

34           (34 - 36)/8.6                                    -0.23

40           (40 - 36)/8.6                                    0.47

45           (45 - 36)/8.6                                    1.05
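The table can be reproduced with a short sketch. It first checks that the quoted mean (36) and SD (8.6) match the data when the sample standard deviation (n - 1 denominator) is used, then applies (raw score - mean)/SD with those quoted values. The function name `z_score` is illustrative.

```python
import statistics

scores = [25, 34, 40, 45]

# The quoted SD of 8.6 matches the sample standard deviation (n - 1 denominator).
print(statistics.mean(scores), round(statistics.stdev(scores), 1))

def z_score(raw, mean, sd):
    """(raw score - mean) / SD"""
    return (raw - mean) / sd

z_scores = [round(z_score(x, 36, 8.6), 2) for x in scores]
print(z_scores)
```

Note that 9/8.6 is approximately 1.0465, so the Z score for 45 rounds to 1.05 at two decimal places.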

Page 33: Topic 8a Basic Statistics

Standard Score

i. Z score

Page 34: Topic 8a Basic Statistics

Exercise 5

Ahmad obtained 90 marks (out of a total of 100) in an English test. The mean for the whole class is 70 and the standard deviation (SD) is 25. In a Mathematics test, Ahmad obtained 60 marks. The mean for the whole class in Mathematics is 40 and the SD is 15. In which subject did Ahmad perform better relative to the class?

Page 35: Topic 8a Basic Statistics

Exercise 6

A distribution of scores has a mean of 70. In this distribution, a score of x = 80 is located 10 points above the mean.

a) Calculate the z-scores for x = 80 when the standard deviation is 5 and when it is 20.

b) Sketch each distribution and locate the position of x = 80. Compare the two z-scores that correspond to x = 80.

Page 36: Topic 8a Basic Statistics

Exercise 6

[Figure: two sketched distributions, each with a mean of 70 and X = 80 marked. With SD = 5, the z-score is 2; with SD = 20, the z-score is 0.5.]
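The two z-scores follow directly from z = (x - mean)/SD, as this minimal sketch shows. The function name `z_score` is illustrative.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (or below) the mean."""
    return (x - mean) / sd

print(z_score(80, 70, 5))   # 2.0
print(z_score(80, 70, 20))  # 0.5
```

The same 10-point gap is two standard deviations when the spread is small (SD = 5) but only half a standard deviation when the spread is large (SD = 20).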

Page 37: Topic 8a Basic Statistics

Standard Score

i. Z score

Z score values are very small and usually range only from -2 to 2.

Such small values make Z scores inappropriate for score reporting, especially for those unaccustomed to the concept.

Imagine what a parent may say if his child comes home with a report card showing a Z score of 0.80 in English Language!

Fortunately, there is another form of standardised score, the T score, with values that are more palatable to the relevant parties.

Page 38: Topic 8a Basic Statistics

Standard Score

ii. T score

The T score is a standardised score which can be computed using the formula 10(Z) + 50.

As such, the T scores for students A, B, C and D in Table 1 are as below:

Raw Score    Application of Formula: 10(Z) + 50    T Score

25           10(-1.28) + 50                        37.2

34           10(-0.23) + 50                        47.7

40           10(0.47) + 50                         54.7

45           10(1.05) + 50                         60.5
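The conversion can be sketched by chaining the two formulae: first (raw score - mean)/SD, then 10(Z) + 50. The mean of 36 and SD of 8.6 are the values quoted for Table 1. The function names are illustrative.

```python
def z_score(raw, mean, sd):
    """(raw score - mean) / SD"""
    return (raw - mean) / sd

def t_score(z):
    """T = 10(Z) + 50"""
    return 10 * z + 50

scores = [25, 34, 40, 45]
t_scores = [round(t_score(z_score(x, 36, 8.6)), 1) for x in scores]
print(t_scores)  # [37.2, 47.7, 54.7, 60.5]
```

A Z score of 0 always maps to a T score of 50, and each standard deviation corresponds to 10 T-score points.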

Page 39: Topic 8a Basic Statistics

Standard Score

ii. T score

These values are easier to report and interpret than Z scores.

The T score mean is always 50 (corresponding to a Z score of 0, i.e. 0 standard deviations from the mean), which connotes average ability and the midpoint of a 100-point scale.

Page 40: Topic 8a Basic Statistics

Interpretation of data

The standardised score is a very important score if we want to compare performance across tests and between students. Let us take the following scenario as an example:

Page 41: Topic 8a Basic Statistics

Interpretation of data

How can En. Abu solve this problem? He would have to have standardised scores in order to decide. This would require the following information:

Test 1: $\bar{X}$ = 42, standard deviation = 7

Test 2: $\bar{X}$ = 47, standard deviation = 8

Using the information above, En. Abu can find the Z score for each raw score reported as follows:

Table 2: Z Score for Form 2A

Page 42: Topic 8a Basic Statistics

Interpretation of data

Based on Table 2, both Ali and Chong have a negative Z score as their total score for both tests.

However, Chong has a higher Z score total (i.e. -1.07 compared to -1.34) and therefore performed better when we take the performance of all the other students into consideration.

Page 43: Topic 8a Basic Statistics

Interpretation of data

THE NORMAL CURVE

The normal curve is a theoretical curve that is often used to model naturally occurring phenomena.

Test scores that measure a characteristic such as intelligence, language proficiency or writing ability in a specific population are also expected to approximate a normal curve.

The following diagram illustrates what the normal curve looks like.

Page 44: Topic 8a Basic Statistics

Interpretation of data

THE NORMAL CURVE

Figure 1: The normal distribution or bell curve

Page 45: Topic 8a Basic Statistics

Interpretation of data

THE NORMAL CURVE

The normal curve in Figure 1 is partitioned according to standard deviations (from -4s and -3s through to +3s and +4s), which are indicated on the horizontal axis.

The area of the curve between standard deviations is indicated as a percentage on the diagram.

For example, the area between the mean (0 standard deviations) and +1 standard deviation is 34.13%.

Similarly, the area between the mean and -1 standard deviation is also 34.13%. As such, the area between -1 and +1 standard deviations is 68.26%.
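These percentages can be checked numerically. The sketch below assumes the standard normal curve and computes its cumulative area with the error function `math.erf` from Python's standard library. The helper names `normal_cdf` and `area_between` are hypothetical.

```python
import math

def normal_cdf(z):
    """Proportion of the standard normal curve lying below z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def area_between(z_low, z_high):
    """Proportion of the curve between two standard deviation values."""
    return normal_cdf(z_high) - normal_cdf(z_low)

print(round(100 * area_between(0, 1), 2))   # 34.13
print(round(100 * area_between(-1, 1), 2))  # 68.27
```

The second result is 68.27 when computed directly; the figure of 68.26 comes from doubling the already rounded 34.13.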

Page 46: Topic 8a Basic Statistics

Interpretation of data

THE NORMAL CURVE

In using the normal curve, it is important to make a distinction between standard deviation values and standard deviation scores.

A standard deviation value is a constant and is shown on the horizontal axis of the diagram above.

Page 47: Topic 8a Basic Statistics

Interpretation of data

THE NORMAL CURVE

The standard deviation score, on the other hand, is the score obtained when we use the standard deviation formula provided earlier.

So, if we find the standard deviation score to be 5, as in the earlier example, then the score for the standard deviation value of 1 is 5, for the value of 2 it is 5 x 2 = 10, for the value of 3 it is 15, and so on. Standard deviation values of -1, -2 and -3 have corresponding negative scores of -5, -10 and -15.
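The mapping from standard deviation values to scores is a simple multiplication, sketched below. The helper name `sd_score` is hypothetical. The mean of 25 is taken from the earlier example scores 20, 25 and 30, and is used only to show where each point would fall on the raw score scale.

```python
def sd_score(sd_value, sd):
    """Score corresponding to a standard deviation value, e.g. value 2 with SD 5 gives 10."""
    return sd_value * sd

mean = 25  # mean of the earlier example scores 20, 25, 30
sd = 5     # standard deviation score found in the earlier example

for sd_value in range(-3, 4):
    offset = sd_score(sd_value, sd)  # distance from the mean in score units
    raw_score = mean + offset        # position on the raw score scale
    print(sd_value, offset, raw_score)
```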