business statistics spring 2005 summarizing and describing numerical data

Post on 05-Jan-2016


Business Statistics

Spring 2005

Summarizing and Describing Numerical Data

Topics
• Measures of Central Tendency: Mean, Median, Mode, Midrange, Midhinge
• Quartiles
• Measures of Variation: The Range, Interquartile Range, Variance and Standard Deviation, Coefficient of Variation
• Shape: Symmetric, Skewed, using Box-and-Whisker Plots

Numerical Data Properties

Central Tendency (Location)

Variation (Dispersion)

Shape

Measures of Central Tendency for Ungrouped Data

Summary Measures of Raw Data
• Central Tendency: Mean, Median, Mode, Midrange, Quartile, Midhinge
• Variation: Range, Variance, Standard Deviation, Coefficient of Variation

Measures of Central Tendency

• Mean: $\bar{X} = \dfrac{\sum_{i=1}^{n} X_i}{n}$
• Median
• Mode
• Midrange
• Midhinge

Population Mean

For ungrouped data, the population mean is the sum of all the population values divided by the total number of population values:

$$\mu = \frac{\sum X}{N}$$

where µ stands for the population mean,
N is the total number of observations in the population,
X stands for a particular value, and
Σ indicates the operation of adding.

Population Mean Example

Parameter: a measurable characteristic of a population.

The Kane family owns four cars. The mileage attained by each car is 56,000, 23,000, 42,000, and 73,000. Find the mean mileage per car.

The mean is (56,000 + 23,000 + 42,000 + 73,000)/4 = 194,000/4 = 48,500.
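A minimal Python sketch of the same calculation. It assumes the four cars make up the entire population of interest, and the variable names are illustrative:

```python
# Mileage of the Kane family's four cars, treated as the whole population
mileage = [56_000, 23_000, 42_000, 73_000]

N = len(mileage)          # number of values in the population
mu = sum(mileage) / N     # population mean: (sum of X) / N

print(f"mu = {mu:,.0f}")  # mu = 48,500
```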


Sample Mean

For ungrouped data, the sample mean is the sum of all the sample values divided by the number of sample values:

$$\bar{X} = \frac{\sum X}{n}$$

where $\bar{X}$ stands for the sample mean and
n is the total number of values in the sample.

Return on Stock

Year    Stock X   Stock Y
1998      10%       17%
1997       8%       -2%
1996      12%       16%
1995       2%        1%
1994       8%        8%
Total     40%       40%

Average return on each stock = 40% / 5 = 8%
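The same averages can be checked with a short Python sketch. `sample_mean` is a hypothetical helper, and the five years are treated as a sample of each stock's returns:

```python
# Annual returns (%) from the table, listed from 1998 down to 1994
stock_x = [10, 8, 12, 2, 8]
stock_y = [17, -2, 16, 1, 8]

def sample_mean(values):
    """Sample mean: sum of the sample values divided by n."""
    return sum(values) / len(values)

print(sample_mean(stock_x), sample_mean(stock_y))  # 8.0 8.0
```

Both stocks average 8%, so the mean alone does not tell them apart, even though Stock Y's yearly returns swing much more widely.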

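The Mean (Arithmetic Average)

• It is the arithmetic average of the data values
• The most common measure of central tendency
• Affected by extreme values (outliers)

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n}$$

where $\bar{X}$ is the sample mean.

[Figure: two dot plots illustrating the effect of an extreme value: Mean = 5 for the first data set, Mean = 6 for the second, whose axis extends to 14]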

Properties of the Arithmetic Mean

Every set of interval-level and ratio-level data has a mean.

All the values are included in computing the mean.

A set of data has a unique mean.

The mean is affected by unusually large or small data values.

The mean is relatively reliable.

The arithmetic mean is the only measure of central tendency where the sum of the deviations of each value from the mean is zero.


EXAMPLE

Consider the set of values 3, 8, and 4. The mean is 5.

Illustrating the last property, (3 - 5) + (8 - 5) + (4 - 5) = -2 + 3 - 1 = 0. In other words,

$$\sum (X - \bar{X}) = 0$$
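This quick Python check applies the property to the same three values, using the standard-library `statistics` module. For contrast, it also sums the deviations from the median, which need not be zero:

```python
from statistics import mean, median

values = [3, 8, 4]
x_bar = mean(values)                          # 5
deviations = [x - x_bar for x in values]      # [-2, 3, -1]
print(x_bar, deviations, sum(deviations))     # 5 [-2, 3, -1] 0

med = median(values)                          # 4
print(sum(x - med for x in values))           # 3, not 0
```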

The Median

Median: the midpoint of the values after they have been ordered from the smallest to the largest, or from the largest to the smallest. There are as many values above the median as below it in the data array.

Note: For an even number of values, the median is the arithmetic average of the two middle values.

Position of Median in Sequence

$$\text{Median positioning point} = \frac{n+1}{2}$$

The Median

[Figure: two dot plots; Median = 5 for both data sets, even though the second extends to 14]

• Important measure of central tendency
• In an ordered array, the median is the “middle” number.
• If n is odd, the median is the middle number.
• If n is even, the median is the average of the 2 middle numbers.
• Not affected by extreme values
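The positioning-point rule can be sketched in Python as follows. `median_by_position` is a hypothetical helper written for illustration, and the standard library's `statistics.median` gives the same results. The third call replaces 24 with 240, a made-up extreme value, to show that the median does not move:

```python
from statistics import median

def median_by_position(data):
    """Median located at the positioning point (n + 1) / 2 of the ordered array."""
    ordered = sorted(data)
    n = len(ordered)
    position = (n + 1) / 2               # 1-based position in the ordered array
    lower = ordered[int(position) - 1]   # value at the whole-number part of the position
    if position.is_integer():
        return lower                     # n is odd: one middle value
    upper = ordered[int(position)]       # n is even: position ends in .5
    return (lower + upper) / 2           # average of the two middle values

print(median_by_position([11, 12, 13, 16, 16, 17, 18, 21, 22]))   # 16
print(median_by_position([10, 12, 14, 15, 17, 18, 18, 24]))       # 16.0
print(median_by_position([10, 12, 14, 15, 17, 18, 18, 240]))      # 16.0 (hypothetical outlier)
print(median([10, 12, 14, 15, 17, 18, 18, 24]))                   # 16.0
```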

Properties of the Median
• There is a unique median for each data set.
• It is not affected by extremely large or small values and is therefore a valuable measure of central tendency when such values occur.
• It can be computed for ratio-level, interval-level, and ordinal-level data.
• It can be computed for an open-ended frequency distribution if the median does not lie in an open-ended class.
• It lacks the arithmetic properties of the mean (for example, deviations from the median need not sum to zero).

The Mode

• A measure of central tendency
• The value that occurs most often
• Not affected by extreme values
• There may not be a mode
• There may be several modes
• Used for either numerical or categorical data

[Figure: a dot plot on a 0–14 axis with Mode = 9, and a dot plot on a 0–6 axis with No Mode]
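This is a sketch of a mode finder that follows the slide's conventions. It returns every most-frequent value, and an empty list when no value repeats. The helper name `modes` and the small example lists (other than the document's data sets) are illustrative assumptions:

```python
from collections import Counter

def modes(data):
    """Every value that occurs most often; an empty list when no value repeats."""
    counts = Counter(data)                 # value -> number of occurrences
    top = max(counts.values())             # assumes data is non-empty
    if top == 1:
        return []                          # all values occur once: no mode
    return sorted(value for value, count in counts.items() if count == top)

print(modes([10, 12, 14, 15, 17, 18, 18, 24]))  # [18]
print(modes([3, 8, 4]))                         # []
print(modes([1, 1, 2, 2, 3]))                   # [1, 2]  (hypothetical data, two modes)
print(modes(["red", "blue", "red"]))            # ['red']  (hypothetical categorical data)
```

Python's `statistics.multimode` behaves similarly, but it returns every value when none repeats, so the "no mode" case needs the explicit check above.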

Midrange

• A measure of central tendency
• Average of the smallest and largest observations:

$$\text{Midrange} = \frac{X_{\text{largest}} + X_{\text{smallest}}}{2}$$

• Affected by extreme values

[Figure: two dot plots on a 0–10 axis, each with Midrange = 5]
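A one-function Python sketch. The second call swaps the largest value for a hypothetical outlier of 220 to show how strongly the midrange reacts:

```python
def midrange(data):
    """Average of the smallest and largest observations."""
    return (min(data) + max(data)) / 2

print(midrange([11, 12, 13, 16, 16, 17, 18, 21, 22]))   # 16.5
print(midrange([11, 12, 13, 16, 16, 17, 18, 21, 220]))  # 115.5 (hypothetical outlier)
```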

Quartiles

• Not a measure of central tendency
• Split ordered data into 4 quarters (25% of the data between successive cut points Q1, Q2, Q3)
• Position of the i-th quartile:

$$\text{Position of } Q_i = \frac{i(n+1)}{4}$$

Data in ordered array: 11 12 13 16 16 17 18 21 22

Position of Q1 = 1(9 + 1)/4 = 2.50, so Q1 = (12 + 13)/2 = 12.5

Quartiles

• See text page 107 for the “rounding rules” for the position of the i-th quartile.
• The formula i(n + 1)/4 gives the position (not the value) of the i-th quartile.
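The position formula can be turned into a small Python function. The handling of fractional positions is an assumption: this sketch interpolates linearly between the two neighbouring ordered values. That reproduces Q1 = 12.5 above, but it may differ from the textbook's rounding rules for positions such as 2.25 or 2.75. For comparison, the standard library's `statistics.quantiles` (default `method='exclusive'`) uses the same (n + 1)-based positions:

```python
import statistics

def quartile(data, i):
    """i-th quartile (i = 1, 2, 3) at position i(n + 1)/4 of the ordered data.

    Assumption: a fractional position is resolved by linear interpolation
    between the two neighbouring ordered values.
    """
    ordered = sorted(data)
    n = len(ordered)
    pos = i * (n + 1) / 4          # 1-based position, e.g. 2.5 for Q1 when n = 9
    k = int(pos)                   # whole-number part of the position
    frac = pos - k                 # fractional part
    if k < 1:                      # position falls before the first value
        return ordered[0]
    if k >= n:                     # position falls at or after the last value
        return ordered[-1]
    return ordered[k - 1] + frac * (ordered[k] - ordered[k - 1])

data = [11, 12, 13, 16, 16, 17, 18, 21, 22]
print([quartile(data, i) for i in (1, 2, 3)])   # [12.5, 16.0, 19.5]
print(statistics.quantiles(data, n=4))          # [12.5, 16.0, 19.5]
```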

Midhinge

• A measure of central tendency
• The midpoint of the 1st and 3rd quartiles:

$$\text{Midhinge} = \frac{Q_1 + Q_3}{2}$$

• Used to overcome the effect of extreme values

Data in ordered array: 11 12 13 16 16 17 18 21 22

$$\text{Midhinge} = \frac{Q_1 + Q_3}{2} = \frac{12.5 + 19.5}{2} = 16$$
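A short Python sketch using `statistics.quantiles`, whose default `'exclusive'` method places the quartiles at positions i(n + 1)/4. The second call replaces 22 with a hypothetical 220 to show that the midhinge is unaffected:

```python
from statistics import quantiles

def midhinge(data):
    """Midpoint of the first and third quartiles, (Q1 + Q3) / 2."""
    # The default 'exclusive' method uses (n + 1)-based positions with interpolation
    q1, _, q3 = quantiles(data, n=4)
    return (q1 + q3) / 2

data = [11, 12, 13, 16, 16, 17, 18, 21, 22]
print(midhinge(data))                                  # 16.0
print(midhinge([11, 12, 13, 16, 16, 17, 18, 21, 220])) # 16.0 (hypothetical outlier)
```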

Summary Measures

• Central Tendency: Mean, $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$; Median; Mode; Midrange; Quartile; Midhinge
• Variation: Range; Variance, $s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}$; Standard Deviation; Coefficient of Variation

Measures of Variation

• Variance: Population Variance, Sample Variance
• Standard Deviation: Population Standard Deviation, Sample Standard Deviation
• Coefficient of Variation: $CV = \left(\frac{S}{\bar{X}}\right) 100\%$
• Range
• Interquartile Range

The Range

• Measure of variation
• Difference between the largest and smallest observations:

$$\text{Range} = X_{\text{largest}} - X_{\text{smallest}}$$

• Ignores how the data are distributed:

[Figure: two differently distributed data sets on a 7–12 axis, each with Range = 12 - 7 = 5]

Return on Stock (same data as the Return on Stock table above)

Range on Stock X = 12% - 2% = 10%

Range on Stock Y = 17% - (-2%) = 19%
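A minimal Python sketch of the range calculation for both stocks. The function is named `value_range` only to avoid shadowing Python's built-in `range`:

```python
# Annual returns (%) from the Return on Stock table, 1998 back to 1994
stock_x = [10, 8, 12, 2, 8]
stock_y = [17, -2, 16, 1, 8]

def value_range(data):
    """Range = largest observation minus smallest observation."""
    return max(data) - min(data)

print(value_range(stock_x), value_range(stock_y))  # 10 19
```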

Interquartile Range

• Measure of variation
• Also known as the midspread: the spread in the middle 50% of the data
• Difference between the third and first quartiles:

$$\text{Interquartile Range} = Q_3 - Q_1$$

• IQR = 75th percentile - 25th percentile
• The IQR is useful for checking for outliers
• Not affected by extreme values

Data in ordered array: 11 12 13 16 16 17 17 18 21

Q3 - Q1 = 17.5 - 12.5 = 5
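The IQR can be sketched in Python with `statistics.quantiles`, whose default method uses the same i(n + 1)/4 quartile positions. Replacing the largest value with a hypothetical 210 leaves the IQR unchanged:

```python
from statistics import quantiles

def iqr(data):
    """Interquartile range Q3 - Q1 (default 'exclusive' method, positions i(n + 1)/4)."""
    q1, _, q3 = quantiles(data, n=4)
    return q3 - q1

data = [11, 12, 13, 16, 16, 17, 17, 18, 21]
print(iqr(data))                                  # 5.0
print(iqr([11, 12, 13, 16, 16, 17, 17, 18, 210])) # 5.0 (hypothetical outlier)
```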

Variance & Standard Deviation

• Measures of dispersion
• Most common measures
• Consider how data are distributed
• Show variation about the mean ($\bar{X}$ or $\mu$)

[Figure: dot plot on a 4–12 axis with $\bar{X}$ = 8.3]

Variance

• Important measure of variation
• Shows variation about the mean:
• For the population:

$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$$

• For the sample:

$$s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}$$

For the population, use N in the denominator. For the sample, use n - 1 in the denominator.

Population Variance

The population variance for ungrouped data is the arithmetic mean of the squared deviations from the population mean:

$$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$$

Population Variance EXAMPLE

The ages of the Dunn family are 2, 18, 34, and 42 years. What is the population variance?

$$\mu = \frac{\sum X}{N} = \frac{96}{4} = 24$$

X     µ     X - µ    (X - µ)²
2     24    -22      484
18    24    -6        36
34    24    10       100
42    24    18       324
Total                944

$$\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{944}{4} = 236$$
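The same computation in Python, treating the four ages as the whole population. `statistics.pvariance` (which divides by N) is used only as a cross-check:

```python
from statistics import pvariance

ages = [2, 18, 34, 42]               # the whole Dunn family, i.e. the population
N = len(ages)
mu = sum(ages) / N                   # 96 / 4 = 24
squared_devs = [(x - mu) ** 2 for x in ages]   # 484, 36, 100, 324
sigma2 = sum(squared_devs) / N       # 944 / 4 = 236
print(mu, sigma2)                    # 24.0 236.0
print(pvariance(ages) == sigma2)     # True
```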

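Population Standard Deviation

$$\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$$

Population Standard Deviation EXAMPLE

The ages of the Dunn family are 2, 18, 34, and 42 years. What is the population standard deviation?

$$\mu = \frac{\sum X}{N} = \frac{96}{4} = 24$$

Using the same squared deviations as in the population variance example, $\sum (X - \mu)^2 = 944$, so

$$\sigma = \sqrt{\frac{944}{4}} = \sqrt{236} \approx 15.36$$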
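A Python sketch of the population standard deviation, cross-checked against `statistics.pstdev`:

```python
import math
from statistics import pstdev

ages = [2, 18, 34, 42]
N = len(ages)
mu = sum(ages) / N
sigma = math.sqrt(sum((x - mu) ** 2 for x in ages) / N)   # sqrt(944 / 4) = sqrt(236)
print(round(sigma, 2))                     # 15.36
print(math.isclose(sigma, pstdev(ages)))   # True
```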

Standard Deviation

• Most important measure of variation
• Shows variation about the mean:
• For the population:

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$$

• For the sample:

$$s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}}$$

For the population, use N in the denominator. For the sample, use n - 1 in the denominator.

Sample Variance and Standard Deviation

The sample variance estimates the population variance.

$$s^2 = \frac{\sum (X - \bar{X})^2}{n-1} = \frac{\sum X^2 - \frac{(\sum X)^2}{n}}{n-1}$$

NOTE: the second expression is an important computational form.

The sample standard deviation is $s = \sqrt{s^2}$.

Example of Standard Deviation

Amount (X)   X̄     Deviation from mean (X - X̄)   (X - X̄)²
600          435    600 - 435 = 165                27,225
350          435    350 - 435 = -85                 7,225
275          435    275 - 435 = -160               25,600
430          435    430 - 435 = -5                     25
520          435    520 - 435 = 85                  7,225
Total               0                              67,300

$$s = \sqrt{\frac{\sum (X - \bar{X})^2}{n-1}} = \sqrt{\frac{67{,}300}{4}} = \sqrt{16{,}825} \approx 129.71$$

Example of Standard Deviation (Computational Version)

Amount (X)   X̄     X - X̄    (X - X̄)²    X²
600          435    165       27,225      360,000
350          435    -85        7,225      122,500
275          435    -160      25,600       75,625
430          435    -5            25      184,900
520          435    85         7,225      270,400
Total 2,175                   67,300    1,013,425

$$s = \sqrt{\frac{\sum X^2 - \frac{(\sum X)^2}{n}}{n-1}} = \sqrt{\frac{1{,}013{,}425 - \frac{2{,}175^2}{5}}{5-1}} \approx 129.71$$
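Both forms of the sum of squares can be compared in a short Python sketch using the five amounts above. Both give 67,300 here. With very large values, the computational form can lose precision in floating-point arithmetic, which is a common reason numerical software prefers the deviation form:

```python
import math
from statistics import stdev

amounts = [600, 350, 275, 430, 520]
n = len(amounts)
x_bar = sum(amounts) / n                                    # 2175 / 5 = 435

# Definitional form: squared deviations from the sample mean
ss_def = sum((x - x_bar) ** 2 for x in amounts)             # 67,300

# Computational form: sum of X^2 minus (sum of X)^2 / n
ss_comp = sum(x * x for x in amounts) - sum(amounts) ** 2 / n   # 1,013,425 - 946,125

s = math.sqrt(ss_def / (n - 1))                             # sqrt(16,825)
print(ss_def, ss_comp, round(s, 2))                         # 67300.0 67300.0 129.71
print(math.isclose(s, stdev(amounts)))                      # True
```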

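Sample Standard Deviation

$$s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}}$$

NOTE: For the sample, use n - 1 in the denominator.

Data ($X_i$): 10 12 14 15 17 18 18 24, with n = 8 and Mean = 16

$$s = \sqrt{\frac{(10-16)^2 + (12-16)^2 + \cdots + (24-16)^2}{8-1}} = \sqrt{\frac{130}{7}} \approx 4.3095$$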
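A Python sketch of this calculation that shows the intermediate sum of squared deviations:

```python
import math

data = [10, 12, 14, 15, 17, 18, 18, 24]
n = len(data)
x_bar = sum(data) / n                           # 128 / 8 = 16
ss = sum((x - x_bar) ** 2 for x in data)        # 36 + 16 + 4 + 1 + 1 + 4 + 4 + 64 = 130
s = math.sqrt(ss / (n - 1))                     # sqrt(130 / 7)
print(x_bar, ss, round(s, 4))                   # 16.0 130.0 4.3095
```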

Interpretation and Uses of the Standard Deviation

Chebyshev’s theorem: For any set of observations, the proportion of the values that lie within k standard deviations of the mean is at least $1 - \frac{1}{k^2}$, where k is any constant greater than 1.

Multiply by 100% to get the percentage of values within k standard deviations of the mean. For example, with k = 2, at least 1 - 1/4 = 75% of the values lie within two standard deviations of the mean.
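A small Python sketch of the bound, plus a check against the sample data used earlier (10 12 14 15 17 18 18 24). The function name is illustrative:

```python
from statistics import mean, stdev

def chebyshev_min_proportion(k):
    """Chebyshev's lower bound 1 - 1/k**2 on the share of values within k SDs of the mean."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2

for k in (1.5, 2, 3):
    print(k, f"{chebyshev_min_proportion(k):.1%}")
# 1.5 55.6%
# 2 75.0%
# 3 88.9%

# Check the k = 2 bound on the document's sample data
data = [10, 12, 14, 15, 17, 18, 18, 24]
m, s = mean(data), stdev(data)
share = sum(abs(x - m) <= 2 * s for x in data) / len(data)
print(share)  # 1.0, which is at least 0.75
```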

Interpretation and Uses of the Standard Deviation

Empirical Rule: For any symmetrical, bell-shaped distribution, approximately 68% of the observations will lie within 1 standard deviation of the mean ($\mu \pm 1\sigma$); approximately 95% will lie within 2 standard deviations of the mean ($\mu \pm 2\sigma$); and approximately 99.7% will lie within 3 standard deviations of the mean ($\mu \pm 3\sigma$).
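The rounded percentages in the Empirical Rule come from the normal distribution. Assuming a normal (bell-shaped) model, the share of values within k standard deviations is erf(k/√2), which a short Python sketch can evaluate with `math.erf`:

```python
import math

def normal_share_within(k):
    """Share of a normal distribution within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, f"{normal_share_within(k):.1%}")
# 1 68.3%
# 2 95.4%
# 3 99.7%
```

Chebyshev's guaranteed minimum of 1 - 1/k² is only 75% for k = 2 and about 88.9% for k = 3, so the normal model concentrates far more of the data near the mean. The Empirical Rule relies on the bell-shape assumption; Chebyshev's theorem does not.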

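Comparing Standard Deviations

Data ($X_i$): 10 12 14 15 17 18 18 24, with N = 8 and Mean = 16

$$s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}} = \sqrt{\frac{130}{7}} \approx 4.3095$$

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}} = \sqrt{\frac{130}{8}} \approx 4.0311$$

The value of the standard deviation is larger when the data are treated as a sample.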
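Python's `statistics` module offers both versions: `stdev` divides by n - 1 and `pstdev` divides by N. In this sketch the ratio s/σ equals √(N/(N - 1)), which is why the sample value is larger whenever the values are not all equal:

```python
import math
from statistics import pstdev, stdev

data = [10, 12, 14, 15, 17, 18, 18, 24]
s = stdev(data)        # sample: divides the sum of squares by n - 1
sigma = pstdev(data)   # population: divides by N
print(round(s, 4), round(sigma, 4))   # 4.3095 4.0311

# s / sigma = sqrt(N / (N - 1)) > 1
print(math.isclose(s / sigma, math.sqrt(len(data) / (len(data) - 1))))   # True
```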

Comparing Standard Deviations

[Figure: three data sets plotted on an 11–21 axis, all with Mean = 15.5: Data A, s = 3.338; Data B, s = 0.9258; Data C, s = 4.57]

Coefficient of Variation

• Measure of relative variation
• Always expressed as a percentage (%)
• Shows variation relative to the mean
• Used to compare 2 or more groups
• Formula (for a sample):

$$CV = \left(\frac{S}{\bar{X}}\right) 100\%$$

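Comparing Coefficient of Variation

Stock A: average price last year = $50, standard deviation = $5
Stock B: average price last year = $100, standard deviation = $5

CV = (S / X̄) × 100%

Coefficient of Variation:
Stock A: CV = (5 / 50) × 100% = 10%
Stock B: CV = (5 / 100) × 100% = 5%

Both stocks have the same standard deviation, but relative to its average price, Stock B's price varies half as much as Stock A's.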
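A minimal Python sketch of the coefficient of variation for the two stocks. `coefficient_of_variation` is an illustrative helper name:

```python
def coefficient_of_variation(mean, sd):
    """CV = (standard deviation / mean) * 100%, returned as a percentage."""
    return 100 * sd / mean

print(coefficient_of_variation(50, 5))   # 10.0  (Stock A)
print(coefficient_of_variation(100, 5))  # 5.0   (Stock B)
```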

Shape

• Describes how data are distributed
• Measures of shape: symmetric or skewed

Left-Skewed: Mean < Median < Mode
Symmetric: Mean = Median = Mode
Right-Skewed: Mode < Median < Mean

Box-and-Whisker Plot

Graphical display of data using the 5-number summary: $X_{\text{smallest}}$, $Q_1$, Median, $Q_3$, $X_{\text{largest}}$

[Figure: box-and-whisker plot drawn over a 4–12 axis]
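A Python sketch of the 5-number summary behind the plot. It uses the quartile data from earlier (11 12 13 16 16 17 18 21 22) and the `statistics` module's default `'exclusive'` quartile method. The helper name is illustrative:

```python
from statistics import median, quantiles

def five_number_summary(data):
    """X_smallest, Q1, median, Q3, X_largest: the values a box-and-whisker plot draws."""
    q1, _, q3 = quantiles(data, n=4)   # default 'exclusive' method, positions i(n + 1)/4
    return min(data), q1, median(data), q3, max(data)

print(five_number_summary([11, 12, 13, 16, 16, 17, 18, 21, 22]))
# (11, 12.5, 16, 19.5, 22)
```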

Distribution Shape & Box-and-Whisker Plots

[Figure: box-and-whisker plots for left-skewed, symmetric, and right-skewed distributions, each marked with Q1, Median, and Q3]

Summary
• Discussed measures of central tendency: mean, median, mode, midrange, midhinge
• Quartiles
• Addressed measures of variation: the range, interquartile range, variance, standard deviation, coefficient of variation
• Determined the shape of distributions: symmetric, skewed, box-and-whisker plot
