6.describing a distribution


Upload: sonu-kumar

Post on 18-Jul-2015


TRANSCRIPT

Page 1: 6.describing a distribution

DESCRIBING A DISTRIBUTION

• A good way to describe the distribution of a quantitative variable is to take the following three steps:
o Report the center of the distribution. [Measures of Central Tendency]
o Report any significant deviations from the center. [Measures of Variation]
o Report the general shape of the distribution. [Measures of Skewness and Peakedness]

Page 2: 6.describing a distribution

MEASURES OF CENTRAL TENDENCY: CENTER OF DISTRIBUTION

Page 3: 6.describing a distribution

AVERAGE

The central tendency is measured by averages. These describe the point about which the various observed values cluster.

In mathematics, an average, or central tendency, of a data set is a measure of the "middle" or "expected" value of the data set.

An average is a single value that is considered the most representative or typical value for a given set of data.

Objectives of averaging
o To get one single value that describes the characteristics of the entire data.
o To facilitate comparison.

Page 4: 6.describing a distribution

CHARACTERISTICS OF A GOOD AVERAGE

• Easy to understand
• Simple to compute
• Based on all observations
• Capable of further algebraic treatment
• Should not be unduly affected by the presence of extreme values

Page 5: 6.describing a distribution

TOOLS TO COMPUTE THE AVERAGE

• Mean
• Median
• Mode

Page 6: 6.describing a distribution

MEAN

• It is a commonly used measure of central tendency.
• The mean is obtained by adding together all observations and dividing the total by the number of observations.
• The mean, in most cases, is not an actual data value.

Page 7: 6.describing a distribution

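CALCULATION OF MEAN

For ungrouped series

$\bar{X} = \dfrac{\sum X}{N}$

For grouped series
o Direct method

$\bar{X} = \dfrac{\sum fX}{N}$

o Short cut method

$\bar{X} = A + \dfrac{\sum fd}{N} \times i$

Where,
o $\bar{X}$ = Mean
o X = Observation
o N = Number of observations
o f = Frequency of the observation
o A = Assumed mean
o i = Class interval
o d = (X − A)/i, the deviation from the assumed mean in units of the class interval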
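The direct and short-cut methods should agree on the same data. The following is a minimal Python sketch of both, using the Section B frequencies from the problem at the end of the deck; the function names and the choice of assumed mean A = 35 are illustrative assumptions, not part of the slides.

```python
# Class midpoints (X) and frequencies (f), taken from Section B of the
# worked problem at the end of the deck.
midpoints = [5, 15, 25, 35, 45, 55, 65]
frequencies = [6, 15, 20, 10, 5, 3, 1]


def mean_ungrouped(values):
    """Mean of raw observations: sum(X) / N."""
    return sum(values) / len(values)


def mean_direct(xs, fs):
    """Direct method for grouped data: sum(f * X) / N."""
    return sum(f * x for x, f in zip(xs, fs)) / sum(fs)


def mean_shortcut(xs, fs, assumed_mean, class_interval):
    """Short-cut method: A + (sum(f * d) / N) * i, with d = (X - A) / i."""
    n = sum(fs)
    total_fd = sum(f * (x - assumed_mean) / class_interval for x, f in zip(xs, fs))
    return assumed_mean + total_fd / n * class_interval


direct = mean_direct(midpoints, frequencies)
# A = 35 is an arbitrary choice of assumed mean; any midpoint would do.
shortcut = mean_shortcut(midpoints, frequencies, assumed_mean=35, class_interval=10)
print(direct, shortcut)
```

Both calls should give a mean of about 26 for this data; the short-cut method only saves arithmetic when working by hand.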

Page 8: 6.describing a distribution

MATHEMATICAL PROPERTIES OF MEAN

• The algebraic sum of the deviations of all the observations from the mean is always zero.
• The sum of the squared deviations of all the observations from the mean is a minimum, i.e. less than the sum of squared deviations from any value other than the mean.
• If we have the mean and number of observations of two or more related groups, the combined mean is:

$\bar{X}_{12\ldots} = \dfrac{N_1\bar{X}_1 + N_2\bar{X}_2 + \ldots}{N_1 + N_2 + \ldots}$
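These three properties can be checked numerically. The sketch below uses two small hypothetical groups of scores (the values and function names are illustrative assumptions) and compares the combined-mean formula with the mean of the pooled data.

```python
# Hypothetical scores for two related groups (illustrative values only).
group_1 = [72, 76, 80, 80, 81]
group_2 = [83, 84, 85, 85, 89, 90, 95]


def mean(values):
    return sum(values) / len(values)


def sum_squared_deviations(values, center):
    """Sum of squared deviations of the values from an arbitrary center."""
    return sum((x - center) ** 2 for x in values)


def combined_mean(means, sizes):
    """(N1*X1 + N2*X2 + ...) / (N1 + N2 + ...)."""
    return sum(n * m for m, n in zip(means, sizes)) / sum(sizes)


m1, m2 = mean(group_1), mean(group_2)

# Property 1: deviations from the mean sum to zero (up to rounding error).
deviation_sum = sum(x - m1 for x in group_1)

# Property 2: squared deviations are smallest when taken from the mean.
at_mean = sum_squared_deviations(group_1, m1)
off_mean = sum_squared_deviations(group_1, m1 + 1)

# Property 3: combined mean from group means and sizes only.
pooled = combined_mean([m1, m2], [len(group_1), len(group_2)])
print(deviation_sum, at_mean, off_mean, pooled)
```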

Page 9: 6.describing a distribution

MERITS AND DEMERITS OF MEAN

Merits
o It possesses the first four of the five characteristics of a good average.

Demerits
o Mean is unduly affected by the presence of extreme values.
o In continuous series, it is difficult to compute the mean without assuming the mid point of each class.
o Applicable only to quantitative data.
o Sometimes the mean may not be an observation in the data.

Page 10: 6.describing a distribution

MEDIAN

Median is the measure of central tendency which appears in the ‘middle’ of an ordered sequence (either in ascending or descending order) of values.

It divides the whole data into two equal parts. In other words, 50% of the observations are smaller than the median and 50% are larger than it.

Page 11: 6.describing a distribution

CALCULATION OF MEDIAN

Individual Series

M = Size of the $\left(\dfrac{N+1}{2}\right)$th item

Discrete Series

M = Size of the $\left(\dfrac{N}{2}\right)$th item

Continuous Series

M = Size of the $\left(\dfrac{N}{2}\right)$th item

$M = L + \dfrac{\frac{N}{2} - cf}{f} \times i$

Where,
o M = Median
o N = Number of observations
o L = Lower limit of the median class
o cf = Cumulative frequency of the class preceding the median class
o f = Frequency of the median class
o i = Class interval
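The individual-series and continuous-series rules can be sketched in Python as follows. The function names are illustrative assumptions; for an even number of raw observations the ((N+1)/2)th position falls between two items, and this sketch assumes the usual convention of averaging them.

```python
def median_individual(values):
    """Size of the ((N + 1) / 2)th item of the ordered values."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        return ordered[n // 2]
    # Even N: the position lies between two items, so average them.
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


def median_grouped(lower_limits, freqs, class_interval):
    """M = L + ((N / 2 - cf) / f) * i for a continuous series."""
    n = sum(freqs)
    cf = 0  # cumulative frequency of the classes before the current one
    for lower, f in zip(lower_limits, freqs):
        if cf + f >= n / 2:  # this is the median class
            return lower + (n / 2 - cf) / f * class_interval
        cf += f
    raise ValueError("no observations")


# Class A quiz scores and Section B marks, both taken from the deck.
class_a = [72, 76, 80, 80, 81, 83, 84, 85, 85, 89]
section_b = [6, 15, 20, 10, 5, 3, 1]
lower_limits = [0, 10, 20, 30, 40, 50, 60]

print(median_individual(class_a))
print(median_grouped(lower_limits, section_b, class_interval=10))
```

For Section B the median class is 20-30 (cf = 21, f = 20), which leads to the 24.50 reported in the worked problem.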

Page 12: 6.describing a distribution

MERITS AND DEMERITS OF MEDIAN

Merits
o It is especially useful in case of continuous series because the mid point is not used for calculation.
o It is not influenced by the presence of extreme values.
o Applicable for quantitative and qualitative data.

Demerits
o Not based on every observation
o Not capable of algebraic treatment
o Tends to be a rather unstable value if the number of observations is small.

Page 13: 6.describing a distribution

MODE

• Mode is defined as the value which occurs the maximum number of times, i.e. the value having the maximum frequency.
• A data set can have more than one mode.
• A data set is said to have no mode if all values occur with equal frequency.

Page 14: 6.describing a distribution

CALCULATION OF MODE

Individual Series
Z = The item which is repeated the greatest number of times

Discrete Series
Z = The item which is repeated the greatest number of times, i.e. the item with the highest frequency

Continuous Series

$Z = L + \dfrac{f_1 - f_0}{2f_1 - f_0 - f_2} \times i$

Where,
o Z = Mode
o L = Lower limit of the modal class
o $f_1$ = Frequency of the modal class
o $f_0$ = Frequency of the class preceding the modal class
o $f_2$ = Frequency of the class succeeding the modal class
o i = Class interval
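A minimal Python sketch of the individual-series and continuous-series rules is given below. The function names are illustrative assumptions, and the grouped version assumes a single modal class that is not the first or last class.

```python
from collections import Counter


def modes_individual(values):
    """All values with the highest frequency; empty if every value ties."""
    counts = Counter(values)
    top = max(counts.values())
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []  # all values equally frequent: no mode
    return sorted(v for v, c in counts.items() if c == top)


def mode_grouped(lower_limits, freqs, class_interval):
    """Z = L + (f1 - f0) / (2*f1 - f0 - f2) * i for a continuous series."""
    k = freqs.index(max(freqs))  # index of the modal class
    if k == 0 or k == len(freqs) - 1:
        raise ValueError("modal class is at the extreme; formula not applicable")
    f1, f0, f2 = freqs[k], freqs[k - 1], freqs[k + 1]
    return lower_limits[k] + (f1 - f0) / (2 * f1 - f0 - f2) * class_interval


# Class A quiz scores and Section B marks, both taken from the deck.
class_a = [72, 76, 80, 80, 81, 83, 84, 85, 85, 89]
section_b = [6, 15, 20, 10, 5, 3, 1]
lower_limits = [0, 10, 20, 30, 40, 50, 60]

print(modes_individual(class_a))
print(mode_grouped(lower_limits, section_b, class_interval=10))
```

Class A turns out to be bimodal (80 and 85 each occur twice), which illustrates that a data set can have more than one mode.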

Page 15: 6.describing a distribution

MERITS AND DEMERITS OF MODE

Merits
o Not affected by extreme values
o Applicable for quantitative and qualitative data.
o Can be obtained in continuous series without assuming the mid point.

Demerits
o Limited utility compared to mean and median
o Mode cannot be determined if the modal class is at the extreme.
o Difficult to compute in case of a bimodal distribution
o Possibility of a ‘no mode’ distribution

Page 16: 6.describing a distribution

GENERAL LIMITATION OF AN AVERAGE

• Since an average is a single value representing a group of values, it must be properly interpreted; otherwise, there is every possibility of jumping to a wrong conclusion.
• An average may give us a value that does not exist in the data.
• Sometimes an average may give an absurd result.
• Measures of central value fail to give us any idea about the formation of the series. Two or more series may have the same central value but may differ widely in composition.


Page 18: 6.describing a distribution

MEASURE OF DISPERSION/VARIATION: DEVIATIONS FROM THE CENTER

Page 19: 6.describing a distribution

TODAY’S QUESTION

Two classes took a recent quiz. There were 10 students in each class, and their scores are as follows:

Class A: 72 76 80 80 81 83 84 85 85 89
Class B: 57 65 83 94 95 96 98 93 71 63

Each class had an average score of 81.5.

Since the averages are the same, can we assume that the students in both classes all did pretty much the same on the exam?

The answer is… No. The average (mean) does not tell us anything about the distribution or variation in the grades.

Page 20: 6.describing a distribution

TODAY’S QUESTION

[Figure: chart with the mean marked]

Page 21: 6.describing a distribution

TODAY’S QUESTION

• So, we need to come up with some way of measuring not just the average, but also the spread of the distribution of our data, i.e. variation or dispersion.
• Variation/dispersion means how spread out the scores are around the mean.
• If many observations are “bunched up” around the mean, the distribution is narrowly spread; otherwise it is widely spread.
• The more narrowly spread the distribution, the better your ability to make accurate predictions.

Page 22: 6.describing a distribution

MEASURE OF VARIATION

• A measure of variation/dispersion is designed to state the extent to which the individual observations differ from the mean.
• The measure of variation gives the degree of variation, i.e. the amount of variation.

Page 23: 6.describing a distribution

SIGNIFICANCE OF MEASURING VARIATION

• To determine the reliability of an average
• To compare two or more series with regard to their variability
• To facilitate the use of other statistical measures

Page 24: 6.describing a distribution

HOW CAN WE QUANTIFY DISPERSION?

• The mean deviation
• The standard deviation

Page 25: 6.describing a distribution

COEFFICIENT OF VARIATION (CV)

• All the tools for measuring variation quantify the variation/deviation in the units of the data. The CV indicates the degree of variation on a relative scale.
• CV is a measure of relative variability used to:
o measure changes that have occurred in a population over time
o compare the variability of two populations that are expressed in different units of measurement
• It is expressed as a fraction rather than in terms of the units of the particular data.
• It usually lies between 0 and 1, although it exceeds 1 whenever the standard deviation is larger than the mean.
• If CV is near 0, the degree of variation is low; if it is near 1, the degree of variation is high.
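As a rough illustration, the sketch below computes the CV for the two classes from the quiz example, assuming CV = σ / mean expressed as a fraction (the deck's later formula multiplies the same ratio by 100). The helper name is an illustrative assumption, and the population standard deviation (dividing by N) is used to match the slides.

```python
from statistics import mean, pstdev

# Quiz scores of the two classes, as given in the deck.
class_a = [72, 76, 80, 80, 81, 83, 84, 85, 85, 89]
class_b = [57, 65, 83, 94, 95, 96, 98, 93, 71, 63]


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (a fraction)."""
    return pstdev(values) / mean(values)


cv_a = coefficient_of_variation(class_a)
cv_b = coefficient_of_variation(class_b)
print(round(cv_a, 4), round(cv_b, 4))
```

Both classes share the same mean, so the larger CV of Class B reflects only its wider spread.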

Page 26: 6.describing a distribution

RANGE

• Range is defined as the difference between the values of the largest and the smallest observation in the distribution.

Range = L − S, where L is the largest and S the smallest observation

Coefficient of Range = $\dfrac{L - S}{L + S}$

• Useful for: daily temperature fluctuations or share price movements.
• It is considered primitive because it takes into account only the extreme values, which may not be useful indicators of the bulk of the population.
• An outlier is an extremely high or an extremely low data value when compared with the rest of the data values.
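A short Python sketch of both quantities, applied to the two classes from the quiz example (the function names are illustrative assumptions):

```python
# Quiz scores of the two classes, as given in the deck.
class_a = [72, 76, 80, 80, 81, 83, 84, 85, 85, 89]
class_b = [57, 65, 83, 94, 95, 96, 98, 93, 71, 63]


def value_range(values):
    """Range = L - S (largest minus smallest observation)."""
    return max(values) - min(values)


def coefficient_of_range(values):
    """(L - S) / (L + S)."""
    largest, smallest = max(values), min(values)
    return (largest - smallest) / (largest + smallest)


for name, scores in (("A", class_a), ("B", class_b)):
    print(name, value_range(scores), round(coefficient_of_range(scores), 4))
```

Class A has a range of 17 and Class B a range of 41, even though both have the same mean.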

Page 27: 6.describing a distribution

MERITS AND DEMERITS OF RANGE

Merits
o Simple to understand and easy to compute
o Less time consuming

Demerits
o Not based on each and every observation of the distribution
o Cannot be calculated in case of an open end distribution
o Fails to reveal the character of the distribution

Page 28: 6.describing a distribution

INTERQUARTILE RANGE OR QUARTILE DEVIATION

• Measures the range of the middle 50% of the values only
• Is defined as the difference between the upper and lower quartiles

Interquartile range = $Q_3 - Q_1$

Quartile Deviation: $Q.D. = \dfrac{Q_3 - Q_1}{2}$

Coefficient of Q.D. = $\dfrac{Q_3 - Q_1}{Q_3 + Q_1}$
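The sketch below computes these three quantities for the two classes from the quiz example. It relies on Python's `statistics.quantiles` with its default "exclusive" method; the slides do not say which quartile convention they use, so the exact quartile values are an assumption of this sketch and other conventions give slightly different numbers.

```python
from statistics import quantiles

# Quiz scores of the two classes, as given in the deck.
class_a = [72, 76, 80, 80, 81, 83, 84, 85, 85, 89]
class_b = [57, 65, 83, 94, 95, 96, 98, 93, 71, 63]


def quartile_measures(values):
    """Return (IQR, quartile deviation, coefficient of Q.D.)."""
    q1, _, q3 = quantiles(values, n=4)  # default method: "exclusive"
    iqr = q3 - q1
    return iqr, iqr / 2, iqr / (q3 + q1)


iqr_a, qd_a, coeff_a = quartile_measures(class_a)
iqr_b, qd_b, coeff_b = quartile_measures(class_b)
print(iqr_a, qd_a, round(coeff_a, 4))
print(iqr_b, qd_b, round(coeff_b, 4))
```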

Page 29: 6.describing a distribution

MERITS AND DEMERITS OF QD

Merits
o Superior to the range
o Can be calculated for open end classes also
o Not affected by the presence of extreme values

Demerits
o Considers only 50% of the observations
o Not capable of mathematical manipulation
o Does not show the scatter around an average

Page 30: 6.describing a distribution

AVERAGE DEVIATION

• Average deviation is obtained by calculating the absolute deviations of each observation from the mean or median and then averaging these deviations by taking their arithmetic mean.
• Measures the ‘average’ distance of each observation away from the mean of the data
• Gives an equal weight to each observation
• Generally more sensitive than the range or interquartile range, since a change in any value will affect it

Page 31: 6.describing a distribution

CALCULATION OF AVERAGE DEVIATION

For ungrouped series

$AD = \dfrac{\sum |d|}{N}$

For grouped series

$AD = \dfrac{\sum f|d|}{N}$

Coefficient of Average Deviation

$\dfrac{AD}{\bar{X}}$

Where
o AD = Average Deviation
o $d = X - \bar{X}$
o $\bar{X}$ = Mean
o f = Frequency of observation
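A minimal Python sketch of the ungrouped formula and its coefficient, applied to the two classes from the quiz example (the function names are illustrative assumptions, and deviations are taken from the mean rather than the median):

```python
# Quiz scores of the two classes, as given in the deck.
class_a = [72, 76, 80, 80, 81, 83, 84, 85, 85, 89]
class_b = [57, 65, 83, 94, 95, 96, 98, 93, 71, 63]


def average_deviation(values):
    """AD = sum(|X - mean|) / N."""
    center = sum(values) / len(values)
    return sum(abs(x - center) for x in values) / len(values)


def coefficient_of_ad(values):
    """AD divided by the mean."""
    return average_deviation(values) / (sum(values) / len(values))


ad_a = average_deviation(class_a)
ad_b = average_deviation(class_b)
print(ad_a, ad_b)
print(round(coefficient_of_ad(class_a), 4), round(coefficient_of_ad(class_b), 4))
```

On average a Class A score lies about 3.7 marks from the mean, while a Class B score lies about 14 marks from it.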

Page 32: 6.describing a distribution

MERITS AND DEMERITS OF AD

Merits
o Relatively simple to calculate.
o Based on each and every observation of the data
o Less affected by the values of extreme observations
o Since deviations are taken from a central value, comparisons about the formation of different distributions can easily be made.

Demerits
o Algebraic signs are ignored
o May not give an accurate result

Page 33: 6.describing a distribution

STANDARD DEVIATION

• Most popular tool for measuring variation.
• It was introduced by Karl Pearson in 1893.
• It is the square root of the mean of the squared deviations from the arithmetic mean.
• Measures the variation of observations from the mean
• Works with squares of residuals, not absolute values
• If the Standard Deviation is large, it means the observations are spread out from their mean.
• If the Standard Deviation is small, it means the observations are close to their mean.

Page 34: 6.describing a distribution

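CALCULATION OF STANDARD DEVIATION

For ungrouped series

$\sigma = \sqrt{\dfrac{\sum d^2}{N}}$, where $d = X - \bar{X}$

For grouped series
o Direct method

$\sigma = \sqrt{\dfrac{\sum fd^2}{N}}$

o Short cut method, with d = X − A for an assumed mean A

$\sigma = \sqrt{\dfrac{\sum fd^2}{N} - \left(\dfrac{\sum fd}{N}\right)^2}$

Coefficient of Variation

$CV = \dfrac{\sigma}{\bar{X}} \times 100$

Where
o $\sigma$ = Standard Deviation
o $\bar{X}$ = Mean
o f = Frequency of observation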
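The direct and short-cut methods should give the same standard deviation. The sketch below checks this on the Section B frequencies from the worked problem; the function names and the assumed mean A = 35 are illustrative assumptions.

```python
from math import sqrt

# Class midpoints (X) and frequencies (f) for Section B of the worked problem.
midpoints = [5, 15, 25, 35, 45, 55, 65]
frequencies = [6, 15, 20, 10, 5, 3, 1]


def sd_direct(xs, fs):
    """sqrt(sum(f * d^2) / N) with d = X - mean."""
    n = sum(fs)
    mean = sum(f * x for x, f in zip(xs, fs)) / n
    return sqrt(sum(f * (x - mean) ** 2 for x, f in zip(xs, fs)) / n)


def sd_shortcut(xs, fs, assumed_mean):
    """sqrt(sum(f * d^2) / N - (sum(f * d) / N)^2) with d = X - A."""
    n = sum(fs)
    sum_fd = sum(f * (x - assumed_mean) for x, f in zip(xs, fs))
    sum_fd2 = sum(f * (x - assumed_mean) ** 2 for x, f in zip(xs, fs))
    return sqrt(sum_fd2 / n - (sum_fd / n) ** 2)


direct = sd_direct(midpoints, frequencies)
shortcut = sd_shortcut(midpoints, frequencies, assumed_mean=35)
mean_b = sum(f * x for x, f in zip(midpoints, frequencies)) / sum(frequencies)
cv_percent = direct / mean_b * 100  # coefficient of variation, in percent
print(round(direct, 2), round(shortcut, 2), round(cv_percent, 1))
```

Both methods give roughly 13.63 for Section B, in line with the value reported in the worked problem.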

Page 35: 6.describing a distribution

MATHEMATICAL PROPERTIES OF STANDARD DEVIATION

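• Combined Standard Deviation

$\sigma_{12} = \sqrt{\dfrac{N_1\sigma_1^2 + N_2\sigma_2^2 + N_1 d_1^2 + N_2 d_2^2}{N_1 + N_2}}$

where $d_1 = \bar{X}_1 - \bar{X}_{12}$ and $d_2 = \bar{X}_2 - \bar{X}_{12}$

• Standard Deviation of the first N natural numbers

$\sigma = \sqrt{\dfrac{1}{12}(N^2 - 1)}$

• The sum of the squares of the deviations of all the observations from their arithmetic mean is minimum.
• Standard Deviation is independent of change of origin but not of scale.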
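These properties can be checked numerically. The sketch below uses two small hypothetical groups (the values and function names are illustrative assumptions) and compares each formula with a direct computation via `statistics.pstdev`.

```python
from math import sqrt
from statistics import mean, pstdev

# Two hypothetical groups (illustrative values only).
group_1 = [72, 76, 80, 80, 81]
group_2 = [83, 84, 85, 85, 89, 90, 95]


def combined_sd(n1, m1, s1, n2, m2, s2):
    """Combined SD from group sizes, means and standard deviations."""
    m12 = (n1 * m1 + n2 * m2) / (n1 + n2)  # combined mean
    d1, d2 = m1 - m12, m2 - m12
    return sqrt((n1 * s1**2 + n2 * s2**2 + n1 * d1**2 + n2 * d2**2) / (n1 + n2))


from_formula = combined_sd(
    len(group_1), mean(group_1), pstdev(group_1),
    len(group_2), mean(group_2), pstdev(group_2),
)
from_pooled = pstdev(group_1 + group_2)

# SD of the first N natural numbers: sqrt((N^2 - 1) / 12).
n = 10
natural_sd = pstdev(list(range(1, n + 1)))

# Change of origin (add a constant) leaves SD unchanged; change of scale does not.
shifted_sd = pstdev([x + 100 for x in group_1])
scaled_sd = pstdev([2 * x for x in group_1])
print(from_formula, from_pooled, natural_sd, shifted_sd, scaled_sd)
```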

Page 36: 6.describing a distribution

MERITS AND DEMERITS OF STANDARD DEVIATION

Merits
o Based on every item of the distribution
o Possible to calculate the combined standard deviation
o For comparing the variability of two or more distributions, the coefficient of variation is considered to be most appropriate
o It is the measure used most prominently in further statistical work.

Demerits
o Compared to the others, it is difficult to compute
o It gives more weight to extreme values and less to those which are near the mean.


Page 38: 6.describing a distribution

MEASURES OF SKEWNESS AND PEAKEDNESS: SHAPE OF THE DISTRIBUTION

Page 39: 6.describing a distribution

DISTRIBUTION OF DATA

Data can be "distributed" (spread out) in different ways:
o spread out more on the left
o spread out more on the right
o all jumbled up
o around a central value with no bias left or right

Page 40: 6.describing a distribution

NORMAL DISTRIBUTION CURVE [BELL SHAPED CURVE]


Page 41: 6.describing a distribution

CHARACTERISTICS OF THE NORMAL DISTRIBUTION

• The normal distribution curve is bell-shaped.
• It is symmetrical about the mean: 50% of the observations are on one side of the center and the other 50% on the other side.
• The curve never touches the X-axis.
• The height of the normal curve is at its maximum at the mean.
• The distribution is single peaked, not bimodal or multi-modal.
• Most of the cases fall in the center portion of the curve, and as values of the variable become more extreme they become less frequent, with “outliers” at each of the “tails” of the distribution few in number.
• The Mean, Median, and Mode are the same.

Page 42: 6.describing a distribution

NORMAL DISTRIBUTION & OTHER TOOLS

Symmetrical distribution and Mean/Median/Mode
o Mode = 3 Median − 2 Mean

Symmetrical distribution and standard deviation
o $\bar{X} \pm 1\sigma$ covers 68.27% of observations
o $\bar{X} \pm 2\sigma$ covers 95.45% of observations
o $\bar{X} \pm 3\sigma$ covers 99.73% of observations
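For a normal distribution, the share of observations within k standard deviations of the mean equals erf(k / √2). The sketch below uses this standard identity to reproduce the three percentages; the function name is an illustrative assumption.

```python
from math import erf, sqrt


def coverage_within(k):
    """Fraction of a normal distribution lying within mean ± k standard deviations."""
    return erf(k / sqrt(2))


for k in (1, 2, 3):
    print(k, round(coverage_within(k) * 100, 2))
```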

Page 43: 6.describing a distribution

SKEWNESS

• The term skewness refers to lack of symmetry or departure from symmetry. When a distribution is not symmetrical, it is called a skewed distribution.
• In a symmetrical distribution, the values of mean, median and mode are alike.
• If the value of the mean is greater than the mode, skewness is said to be positive. A positively skewed distribution contains some values that are much larger than the majority of observations.
• If the value of the mode is greater than the mean, skewness is said to be negative. A negatively skewed distribution contains some values that are much smaller than the majority of observations.
• It is important to emphasize that the skewness of a distribution cannot be determined simply by inspection.
• Point to remember: zero skewness does not mean that the distribution is a normal distribution! [A normal distribution should have skewness of zero and peakedness of 3.]

Page 44: 6.describing a distribution

SKEWNESS

• If Mean = Mode, the skewness is zero.
• If Mean > Mode, the skewness is positive.
• If Mean < Mode, the skewness is negative.

Page 45: 6.describing a distribution

SKEWNESS DISTRIBUTIONS


Page 46: 6.describing a distribution

SKEWNESS DISTRIBUTION


Page 47: 6.describing a distribution

MEASURES OF SKEWNESS

Karl Pearson’s Coefficient of Skewness

$Sk_p = \dfrac{\text{Mean} - \text{Mode}}{\sigma}$

Bowley’s Coefficient of Skewness

$Sk_B = \dfrac{Q_3 + Q_1 - 2\,\text{Median}}{Q_3 - Q_1}$
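Both coefficients can be sketched in Python for the Section B marks from the worked problem. The quartiles here are obtained by applying the continuous-series median formula at N/4 and 3N/4, which is an assumption of this sketch (the slides do not give a quartile formula), and the function names are illustrative.

```python
# Section B of the worked problem: class lower limits and frequencies.
LOWER_LIMITS = [0, 10, 20, 30, 40, 50, 60]
FREQS_B = [6, 15, 20, 10, 5, 3, 1]
CLASS_INTERVAL = 10


def pearson_skewness(mean, mode, sd):
    """Sk_p = (Mean - Mode) / sigma."""
    return (mean - mode) / sd


def bowley_skewness(q1, median, q3):
    """Sk_B = (Q3 + Q1 - 2 * Median) / (Q3 - Q1)."""
    return (q3 + q1 - 2 * median) / (q3 - q1)


def grouped_quantile(lower_limits, freqs, interval, fraction):
    """L + ((fraction * N - cf) / f) * i, the median formula generalised."""
    target = fraction * sum(freqs)
    cf = 0  # cumulative frequency before the current class
    for lower, f in zip(lower_limits, freqs):
        if cf + f >= target:
            return lower + (target - cf) / f * interval
        cf += f
    raise ValueError("no observations")


q1 = grouped_quantile(LOWER_LIMITS, FREQS_B, CLASS_INTERVAL, 0.25)
q2 = grouped_quantile(LOWER_LIMITS, FREQS_B, CLASS_INTERVAL, 0.50)
q3 = grouped_quantile(LOWER_LIMITS, FREQS_B, CLASS_INTERVAL, 0.75)

sk_b = bowley_skewness(q1, q2, q3)
# Mean, mode and SD of Section B as reported in the worked problem.
sk_p = pearson_skewness(mean=26.00, mode=23.33, sd=13.63)
print(round(sk_b, 3), round(sk_p, 3))
```

Both coefficients come out positive for Section B, consistent with its longer right tail, although the two measures are on different scales and need not agree in size.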

Page 48: 6.describing a distribution

COEFFICIENT OF SKEWNESS

• The coefficient of skewness measures the degree of skewness and generally lies between −1 and +1 (Bowley’s coefficient always does).
• If the answer is 0, it indicates a symmetrical distribution.
• If the answer is negative, then the distribution is negatively skewed.
o If the answer is close to −1 (say −0.90), then the distribution is highly negatively skewed.
o If the answer is close to 0 (say −0.20), then the distribution is slightly negatively skewed.
• If the answer is positive, then the distribution is positively skewed.
o If the answer is close to 1 (say 0.90), then the distribution is highly positively skewed.
o If the answer is close to 0 (say 0.20), then the distribution is slightly positively skewed.

Page 49: 6.describing a distribution

A PROBLEM

The following data relate to the marks scored by three different sections in statistics.

Compute the Mean, Median, Mode, Standard deviation and skewness, and interpret the results.

Number of students by marks:

Marks    0-10  10-20  20-30  30-40  40-50  50-60  60-70
Sec A       3      5     11     22     11      5      3
Sec B       6     15     20     10      5      3      1
Sec C       1      3      5     10     20     15      6

Page 50: 6.describing a distribution

SECTION A

Marks    X    f   cf    fX     d    fd    fd²
0-10     5    3    3    15   -30   -90   2700
10-20   15    5    8    75   -20  -100   2000
20-30   25   11   19   275   -10  -110   1100
30-40   35   22   41   770     0     0      0
40-50   45   11   52   495    10   110   1100
50-60   55    5   57   275    20   100   2000
60-70   65    3   60   195    30    90   2700
Total        60       2100     0     0  11600

MEAN 35
MEDIAN 35
MODE 35
SD 13.9
SKEWNESS 0

Page 51: 6.describing a distribution

SECTION B

Marks    X    f   cf    fX     d    fd    fd²
0-10     5    6    6    30   -21  -126   2646
10-20   15   15   21   225   -11  -165   1815
20-30   25   20   41   500    -1   -20     20
30-40   35   10   51   350     9    90    810
40-50   45    5   56   225    19    95   1805
50-60   55    3   59   165    29    87   2523
60-70   65    1   60    65    39    39   1521
Total        60       1560    63     0  11140

MEAN 26.00
MEDIAN 24.50
MODE 23.33
SD 13.63
SKEWNESS 0.11

Page 52: 6.describing a distribution

SECTION C

Marks    X    f   cf    fX     d    fd    fd²
0-10     5    1    1     5   -39   -39   1521
10-20   15    3    4    45   -29   -87   2523
20-30   25    5    9   125   -19   -95   1805
30-40   35   10   19   350    -9   -90    810
40-50   45   20   39   900     1    20     20
50-60   55   15   54   825    11   165   1815
60-70   65    6   60   390    21   126   2646
Total        60       2640   -63     0  11140

MEAN 44.00
MEDIAN 45.50
MODE 46.67
SD 13.63
SKEWNESS -0.11
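The whole problem can be re-computed with a short Python sketch built from the grouped-data formulas in the deck. The function and variable names are illustrative assumptions, and the sketch assumes the modal class is never the first or last class, which holds for all three sections.

```python
from math import sqrt

LOWER_LIMITS = [0, 10, 20, 30, 40, 50, 60]
CLASS_INTERVAL = 10
SECTIONS = {
    "A": [3, 5, 11, 22, 11, 5, 3],
    "B": [6, 15, 20, 10, 5, 3, 1],
    "C": [1, 3, 5, 10, 20, 15, 6],
}


def summarize(freqs):
    """Mean, median, mode, SD and Pearson skewness of one section."""
    n = sum(freqs)
    mids = [low + CLASS_INTERVAL / 2 for low in LOWER_LIMITS]
    mean = sum(f * x for f, x in zip(freqs, mids)) / n

    # Median: L + ((N/2 - cf) / f) * i
    cf = 0
    for low, f in zip(LOWER_LIMITS, freqs):
        if cf + f >= n / 2:
            median = low + (n / 2 - cf) / f * CLASS_INTERVAL
            break
        cf += f

    # Mode: L + (f1 - f0) / (2*f1 - f0 - f2) * i (interior modal class assumed)
    k = freqs.index(max(freqs))
    f1, f0, f2 = freqs[k], freqs[k - 1], freqs[k + 1]
    mode = LOWER_LIMITS[k] + (f1 - f0) / (2 * f1 - f0 - f2) * CLASS_INTERVAL

    # SD: sqrt(sum(f * d^2) / N) with d = X - mean
    sd = sqrt(sum(f * (x - mean) ** 2 for f, x in zip(freqs, mids)) / n)

    return {"mean": mean, "median": median, "mode": mode, "sd": sd,
            "skewness": (mean - mode) / sd}


results = {name: summarize(freqs) for name, freqs in SECTIONS.items()}
for name, stats in results.items():
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

The means, medians, modes and standard deviations agree with the tables above. With (Mean − Mode)/σ this sketch gives a skewness of roughly 0.20 for Section B and −0.20 for Section C; a value near 0.11 results if the median is used in place of the mode. Either way the interpretation is the same: Section A is symmetrical, Section B is slightly positively skewed and Section C is slightly negatively skewed.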

Page 53: 6.describing a distribution

GRAPHS OF SECTION A, B AND C
