chapter 3 : descriptive statistic : numerical measures (statistics)


Upload: della-robertson

Post on 27-Dec-2015


3.1 Measures of Central Tendency/ Location

There are three popular measures of central tendency: the mean, the median and the mode.

1) Mean

The mean of a sample is the sum of the measurements divided by the number of measurements in the set. The mean is denoted by $\bar{x}$.

Mean = Sum of all values / Number of values

The mean can be obtained as follows :-

- For raw data, the mean is defined by

$$\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} \quad \text{or} \quad \bar{x} = \frac{\sum x}{n}$$

where $n$ is the number of observations.

Example 3.1:

The sample mean of the 2002 total payrolls (raw/ungrouped data) in Table 3.1 is:

$$\bar{x} = \frac{\sum x}{n} = \frac{62 + 93 + 126 + 75 + 34}{5} = \frac{390}{5} = 78$$

MLB Team                2002 Total Payroll (millions of dollars)
Anaheim Angels          62
Atlanta Braves          93
New York Yankees        126
St. Louis Cardinals     75
Tampa Bay Devil Rays    34
Total                   390

Table 3.1
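The raw-data mean can be checked with a few lines of Python. This is only a minimal sketch: the function name `sample_mean` is an illustrative choice, and the figures are the payrolls listed in Table 3.1.

```python
# Payroll figures (millions of dollars) copied from Table 3.1.
payrolls = [62, 93, 126, 75, 34]


def sample_mean(values):
    """Return the sum of the values divided by the number of values."""
    return sum(values) / len(values)


mean_payroll = sample_mean(payrolls)
print(mean_payroll)  # 78.0
```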

- For tabular/grouped data, the mean is defined by:

$$\bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i} \quad \text{or} \quad \bar{x} = \frac{\sum fx}{\sum f}$$

where
$f$ = class frequency;
$x$ = class mark (midpoint).

Example 3.2 :

The sample mean for Table 3.2 is:

$$\bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i} = \frac{161.75}{50} = 3.235$$

CGPA (Class)    Frequency, f    Class Mark (Midpoint), x    fx
2.50 - 2.75     2               2.625                       5.250
2.75 - 3.00     10              2.875                       28.750
3.00 - 3.25     15              3.125                       46.875
3.25 - 3.50     13              3.375                       43.875
3.50 - 3.75     7               3.625                       25.375
3.75 - 4.00     3               3.875                       11.625
Total           50                                          161.750

Table 3.2
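The grouped-data mean can be sketched in Python as follows. The tuple layout (lower boundary, upper boundary, frequency) and the name `grouped_mean` are assumptions made for illustration; the class limits and frequencies are those of Table 3.2.

```python
# (lower boundary, upper boundary, frequency) rows copied from Table 3.2.
cgpa_classes = [
    (2.50, 2.75, 2),
    (2.75, 3.00, 10),
    (3.00, 3.25, 15),
    (3.25, 3.50, 13),
    (3.50, 3.75, 7),
    (3.75, 4.00, 3),
]


def grouped_mean(classes):
    """Return sum(f * x) / sum(f), where x is the midpoint of each class."""
    total_f = sum(f for _, _, f in classes)
    total_fx = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
    return total_fx / total_f


print(round(grouped_mean(cgpa_classes), 3))  # 3.235
```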

2) Median

The median is the middle value of a set of observations arranged in order of magnitude and is normally denoted by $\tilde{x}$.

i) The median for ungrouped data.

- The median depends on the number of observations in the data, $n$.

- If $n$ is odd, then the median is the $\left(\frac{n+1}{2}\right)$th observation of the ordered observations.

- If $n$ is even, then the median is the arithmetic mean of the $\left(\frac{n}{2}\right)$th observation and the $\left(\frac{n}{2}+1\right)$th observation.

ii) The median of grouped data / frequency distribution.

The median of a frequency distribution is defined by:

$$\tilde{x} = L + c\left(\frac{\frac{\sum f}{2} - F_{j-1}}{f_j}\right)$$

where,
• $L$ = the lower class boundary of the median class;
• $c$ = the size of the median class interval;
• $F_{j-1}$ = the sum of frequencies of all classes lower than the median class;
• $f_j$ = the frequency of the median class.

Example 3.3 for ungrouped data :-

The median of the data 4, 6, 3, 1, 2, 5, 7, 3 is 3.5.

Proof :-
Rearranging the data in order of magnitude gives 1, 2, 3, 3, 4, 5, 6, 7. As n = 8 (even), the median is the mean of the 4th and 5th observations, that is (3 + 4)/2 = 3.5.
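The odd/even rule for the ungrouped median can be written out directly. The sketch below is one possible implementation (the function name `median` is an illustrative choice) applied to the data of Example 3.3.

```python
def median(values):
    """Middle value of the ordered data; mean of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the ((n + 1) / 2)th observation (index mid when counting from 0).
        return ordered[mid]
    # Even n: mean of the (n / 2)th and (n / 2 + 1)th observations.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([4, 6, 3, 1, 2, 5, 7, 3]))  # 3.5
```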

Example 3.4 for grouped data :-

CGPA (Class)    Frequency, f    Cum. frequency
2.50 - 2.75     2               2
2.75 - 3.00     10              12
3.00 - 3.25     15              27
3.25 - 3.50     13              40
3.50 - 3.75     7               47
3.75 - 4.00     3               50
Total           50

Table 3.3

The median class is 3.00 - 3.25, the first class whose cumulative frequency reaches $\frac{\sum f}{2} = 25$.

$$\text{Median, } \tilde{x} = L + c\left(\frac{\frac{\sum f}{2} - F_{j-1}}{f_j}\right) = 3.00 + 0.25\left(\frac{25 - 12}{15}\right) = 3.217$$
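The grouped-median formula can be sketched as a short Python function. The tuple layout and the name `grouped_median` are assumptions for illustration; the function walks the cumulative frequencies to locate the median class, then applies the interpolation formula to the data of Table 3.3.

```python
# (lower boundary, upper boundary, frequency) rows copied from Table 3.3.
cgpa_classes = [
    (2.50, 2.75, 2),
    (2.75, 3.00, 10),
    (3.00, 3.25, 15),
    (3.25, 3.50, 13),
    (3.50, 3.75, 7),
    (3.75, 4.00, 3),
]


def grouped_median(classes):
    """Apply L + c * ((sum(f) / 2 - F_prev) / f_j) to the median class."""
    n = sum(f for _, _, f in classes)
    if n == 0:
        raise ValueError("frequency table is empty")
    half = n / 2
    cumulative = 0  # F_prev: total frequency of the classes below the current one
    for lower, upper, f in classes:
        if cumulative + f >= half:
            # First class whose cumulative frequency reaches n / 2.
            return lower + (upper - lower) * (half - cumulative) / f
        cumulative += f


print(round(grouped_median(cgpa_classes), 3))  # 3.217
```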

3) Mode

• The mode of a set of observations is the observation with the highest frequency and is usually denoted by $\hat{x}$. The mode can also be used to describe qualitative data.

i) Mode of ungrouped data :-

- Defined as the value which occurs most frequently.

- The mode has the advantage that it is easy to calculate and is not affected by extreme values.

- However, the mode may not exist, and even if it does exist, it may not be unique.

*Note: If two measurements in a set of data share the highest frequency, both measurements are taken as modes and the data are known as bimodal data.

If more than two measurements in a set of data share the highest frequency, the data are regarded as having no mode.

ii) The mode for grouped data/frequency distribution data.

- When data have been grouped in classes and a frequency curve is drawn to fit the data, the mode is the value of $x$ corresponding to the maximum point on the curve.

- Determining the mode using the formula:

$$\hat{x} = L + c\left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right)$$

where
$L$ = the lower class boundary of the modal class;
$c$ = the size of the modal class interval;
$\Delta_1$ = the difference between the modal class frequency and the frequency of the class before it; and
$\Delta_2$ = the difference between the modal class frequency and the frequency of the class after it.

*Note:

- The class which has the highest frequency is called the modal class.

Example 3.5 for ungrouped data :

The mode for the observations 4, 6, 3, 1, 2, 5, 7, 3 is 3.

Example 3.6 for grouped data based on Table 3.4 :

CGPA (Class)    Frequency
2.50 - 2.75     2
2.75 - 3.00     10
3.00 - 3.25     15    (Modal Class)
3.25 - 3.50     13
3.50 - 3.75     7
3.75 - 4.00     3
Total           50

Table 3.4

Proof :-

$$\hat{x} = L + c\left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) = 3.00 + 0.25\left(\frac{5}{5 + 2}\right) = 3.179$$
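The grouped-mode formula can be sketched in Python as below. The name `grouped_mode` and the tuple layout are illustrative assumptions. Two further assumptions are made that the slides do not state: ties for the highest frequency are resolved by taking the first such class, and a missing neighbouring class (when the modal class is first or last) is treated as having frequency 0.

```python
# (lower boundary, upper boundary, frequency) rows copied from Table 3.4.
cgpa_classes = [
    (2.50, 2.75, 2),
    (2.75, 3.00, 10),
    (3.00, 3.25, 15),
    (3.25, 3.50, 13),
    (3.50, 3.75, 7),
    (3.75, 4.00, 3),
]


def grouped_mode(classes):
    """Apply L + c * (d1 / (d1 + d2)) to the modal class."""
    freqs = [f for _, _, f in classes]
    j = freqs.index(max(freqs))  # assumption: first class with the highest frequency
    lower, upper, f_j = classes[j]
    # Assumption: a missing neighbour counts as frequency 0.
    before = freqs[j - 1] if j > 0 else 0
    after = freqs[j + 1] if j < len(freqs) - 1 else 0
    d1 = f_j - before  # modal frequency minus frequency of the class before it
    d2 = f_j - after   # modal frequency minus frequency of the class after it
    return lower + (upper - lower) * d1 / (d1 + d2)


print(round(grouped_mode(cgpa_classes), 3))  # 3.179
```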

3.2 Measure of Dispersion

The measure of dispersion/spread is the degree to which a set of data tends to spread around the average value. It shows whether the data set is concentrated around the mean or scattered. The common measures of dispersion are:

1) range

2) variance

3) standard deviation

The standard deviation is the square root of the variance. The sample variance is denoted by $s^2$ and the sample standard deviation is denoted by $s$.

1) Range

The range is the simplest measure of dispersion to calculate.

Range = Largest value - Smallest value

Example 3.7:-

Table 3.5 gives the total areas in square miles of the four West South Central states of the United States.

State        Total Area (square miles)
Arkansas     53,182
Louisiana    49,651
Oklahoma     69,903
Texas        267,277

Table 3.5

Solution:
Range = Largest value - Smallest value
      = 267,277 - 49,651 = 217,626 square miles.

2) Variance

i) Variance for ungrouped data

The variance of a sample (also known as the mean square) for raw (ungrouped) data is denoted by $s^2$ and defined by:

$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$

ii) Variance for grouped data

The variance for a frequency distribution is defined by:

$$s^2 = \frac{\sum f(x - \bar{x})^2}{n - 1} = \frac{\sum fx^2 - \frac{(\sum fx)^2}{n}}{n - 1}, \qquad n = \sum f$$

Example 3.8: Ungrouped data

7, 6, 8, 5, 9, 4, 7, 7, 6, 6

Range = 9 - 4 = 5

Mean:

$$\bar{x} = \frac{\sum x}{n} = \frac{65}{10} = 6.5$$

Variance:

$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$

$$= \frac{(4-6.5)^2 + (5-6.5)^2 + (6-6.5)^2 + (6-6.5)^2 + (6-6.5)^2 + (7-6.5)^2 + (7-6.5)^2 + (7-6.5)^2 + (8-6.5)^2 + (9-6.5)^2}{10 - 1}$$

$$= \frac{18.5}{9} = 2.0556$$

Standard deviation:

$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{2.0556} = 1.4337$$
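The ungrouped variance and standard deviation can be reproduced with a short Python sketch. The function name `sample_variance` is an illustrative choice; the divisor n - 1 follows the sample formula given in these notes, and the data are the ten observations of the example.

```python
import math

# The ten ungrouped observations used in the example.
data = [7, 6, 8, 5, 9, 4, 7, 7, 6, 6]


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


variance = sample_variance(data)
std_dev = math.sqrt(variance)  # the standard deviation is the square root of the variance

print(max(data) - min(data))                  # 5 (the range)
print(round(variance, 4), round(std_dev, 4))  # 2.0556 1.4337
```

Python's standard library offers `statistics.variance` and `statistics.stdev`, which use the same n - 1 divisor and can serve as a cross-check.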

Example 3.9 for grouped data :

The variance for the frequency distribution in Table 3.6 is:

CGPA (Class)    Frequency, f    Class Mark, x    fx         $fx^2$
2.50 - 2.75     2               2.625            5.250      13.781
2.75 - 3.00     10              2.875            28.750     82.656
3.00 - 3.25     15              3.125            46.875     146.484
3.25 - 3.50     13              3.375            43.875     148.078
3.50 - 3.75     7               3.625            25.375     91.984
3.75 - 4.00     3               3.875            11.625     45.047
Total           50                               161.750    528.031

Table 3.6

$$s^2 = \frac{\sum fx^2 - \frac{(\sum fx)^2}{n}}{n - 1} = \frac{528.031 - \frac{(161.75)^2}{50}}{49} = 0.0973$$

3) Standard Deviation

i) Standard deviation for ungrouped data :-

$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$

ii) Standard deviation for grouped data :-

$$s = \sqrt{\frac{\sum f(x - \bar{x})^2}{n - 1}} = \sqrt{\frac{\sum fx^2 - \frac{(\sum fx)^2}{n}}{n - 1}}, \qquad n = \sum f$$

Example 3.10 (Based on Example 3.8) for ungrouped data:

*Refer to Example 3.8: $s = \sqrt{2.0556} = 1.4337$.

Example 3.11 (Based on Example 3.9) for grouped data:

$$s = \sqrt{\frac{\sum fx^2 - \frac{(\sum fx)^2}{n}}{n - 1}} = \sqrt{\frac{528.031 - \frac{(161.75)^2}{50}}{49}} = \sqrt{0.0973} = 0.3119$$
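The computational formula for the grouped variance and standard deviation can be sketched in Python as follows. The name `grouped_variance` and the tuple layout are illustrative assumptions; the class limits and frequencies are those of the CGPA frequency table.

```python
import math

# (lower boundary, upper boundary, frequency) rows of the CGPA frequency table.
cgpa_classes = [
    (2.50, 2.75, 2),
    (2.75, 3.00, 10),
    (3.00, 3.25, 15),
    (3.25, 3.50, 13),
    (3.50, 3.75, 7),
    (3.75, 4.00, 3),
]


def grouped_variance(classes):
    """(sum(f x^2) - (sum(f x))^2 / n) / (n - 1), with x the class midpoint."""
    n = sum(f for _, _, f in classes)
    sum_fx = sum(f * (lo + hi) / 2 for lo, hi, f in classes)
    sum_fx2 = sum(f * ((lo + hi) / 2) ** 2 for lo, hi, f in classes)
    return (sum_fx2 - sum_fx ** 2 / n) / (n - 1)


variance = grouped_variance(cgpa_classes)
std_dev = math.sqrt(variance)
print(round(variance, 4))  # 0.0973
```

Taking the square root of the unrounded variance gives a standard deviation of about 0.3120; the value 0.3119 arises when the variance is first rounded to 0.0973.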

3.3 Rules of Data Dispersion

By using the mean and standard deviation, we can find the percentage of total observations that fall within a given interval about the mean.

i) Chebyshev’s Theorem

At least $\left(1 - \frac{1}{k^2}\right)$ of the observations will be within $k$ standard deviations of the mean, where $k$ is a positive number exceeding 1 ($k > 1$).

Applicable to any distribution, including distributions that are not normal.

Steps:

1) Determine the interval $\bar{x} \pm ks$

2) Find the value of $\left(1 - \frac{1}{k^2}\right)$

3) Change the value in step 2 to a percent

4) Write the statement: at least the percent of data found in step 3 is in the interval found in step 1

Example 3.12 :

Consider a distribution of test scores that is badly skewed to the right, with a sample mean of 80 and a sample standard deviation of 5. If k = 2, what percentage of the data falls in the interval about the mean?

Solution:

1) Determine the interval:

$$\bar{x} \pm ks = 80 \pm (2)(5) = (70, 90)$$

2) Find

$$1 - \frac{1}{k^2} = 1 - \frac{1}{2^2} = \frac{3}{4}$$

3) Convert into a percentage: $\frac{3}{4} = 75\%$

4) Conclusion: At least 75% of the data is found in the interval from 70 to 90.
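The four steps of Chebyshev's theorem can be sketched as a small Python function. The name `chebyshev_bound` and its return layout are illustrative assumptions; the call uses the mean, standard deviation and k of Example 3.12.

```python
def chebyshev_bound(mean, std_dev, k):
    """Return the interval mean ± k*s and the minimum proportion 1 - 1/k^2."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    interval = (mean - k * std_dev, mean + k * std_dev)  # step 1
    proportion = 1 - 1 / k ** 2                          # step 2
    return interval, proportion


interval, proportion = chebyshev_bound(mean=80, std_dev=5, k=2)
# Step 3 converts the proportion to a percent; step 4 states the conclusion.
print(f"At least {proportion:.0%} of the data lies in {interval}")
# At least 75% of the data lies in (70, 90)
```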

ii) Empirical Rule

Applicable to a symmetric bell-shaped distribution / normal distribution.

k is a constant; for the Empirical Rule, k is 1, 2 or 3.

There are 3 rules:

i. Approximately 68% of the observations lie in the interval $(\bar{x} - s, \bar{x} + s)$

ii. Approximately 95% of the observations lie in the interval $(\bar{x} - 2s, \bar{x} + 2s)$

iii. Approximately 99.7% of the observations lie in the interval $(\bar{x} - 3s, \bar{x} + 3s)$

If k is not given, then:

$$k = \frac{\text{distance between the mean and each end point}}{\text{standard deviation}}$$

Example

The age distribution of a sample of 5000 persons is bell-shaped with a mean of 40 yrs and a standard deviation of 12 yrs. Determine the approximate percentage of people who are 16 to 64 yrs old.

Solution:

$$k = \frac{40 - 16}{12} = \frac{24}{12} = 2$$

Approximately 95% of the people in the sample are 16 to 64 yrs old.
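The Empirical Rule lookup can be sketched in Python as below. The name `empirical_percentage` and the dictionary `EMPIRICAL_PERCENT` are illustrative assumptions. The sketch only handles intervals of the form mean ± k·s with k equal to 1, 2 or 3, as in the three rules, and raises an error otherwise.

```python
import math

# Approximate percentages stated by the Empirical Rule for k = 1, 2, 3.
EMPIRICAL_PERCENT = {1: 68.0, 2: 95.0, 3: 99.7}


def empirical_percentage(mean, std_dev, lower, upper):
    """Return (k, percent) for an interval of the form mean ± k * std_dev."""
    k_lower = (mean - lower) / std_dev  # distance from mean to lower end, in s units
    k_upper = (upper - mean) / std_dev  # distance from mean to upper end, in s units
    k = round(k_lower)
    symmetric = math.isclose(k_lower, k_upper)
    whole = math.isclose(k_lower, k)
    if not (symmetric and whole and k in EMPIRICAL_PERCENT):
        raise ValueError("interval is not mean ± k*s with k = 1, 2 or 3")
    return k, EMPIRICAL_PERCENT[k]


k, percent = empirical_percentage(mean=40, std_dev=12, lower=16, upper=64)
print(k, percent)  # 2 95.0
```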

Exercise for summarizing data

The following data give the total number of iPods sold by a mail order company on each of 30 days. Construct a frequency table.

23  14  19  23  20  16  27   9  21  14
22  13  26  16  18  12   9  26  20  16
 8  25  11  15  28  22  10   5  17  21

Find the mean, variance and standard deviation, mode and median.

Institut Matematik Kejuruteraan, UniMAP