chapter 3 descriptive statistics -...

37
Chapter 3 Descriptive Statistics Statistics for Busi ness and Eco nomi cs, 6 e © 2007 Pearson Education, Inc. Chap 3-2 After completing this chapter, you should be able to: Compute and interpret the mean, median, and mode for a set of data Find the range, variance, standard deviation, and coefficient of variation and know what these values mean Apply the empirical rule to describe the variation of population values around the mean Explain the weighted mean and when to use it Chapter Goals Figure 3.1: 1

Upload: doantu

Post on 28-May-2018

258 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Chapter 3 Descriptive Statistics - QEDqed.econ.queensu.ca/walras/custom/200/250A/250W2009/lect3.pdf · 8 CHAPTER 3. DESCRIPTIVE STATISTICS • Notice that the unit for the sample

Chapter 3

Descriptive Statistics

Statistics for Business and Economics, 6e © 2007 Pearson Education, Inc. Chap 3-2

After completing this chapter, you should be able to:Compute and interpret the mean, median, andmode for a set of dataFind the range, variance, standard deviation, andcoefficient of variation and know what these values meanApply the empirical rule to describe the variation of population values around the meanExplain the weighted mean and when to use it

Chapter Goals

Figure 3.1:

1

Page 2: Chapter 3 Descriptive Statistics - QEDqed.econ.queensu.ca/walras/custom/200/250A/250W2009/lect3.pdf · 8 CHAPTER 3. DESCRIPTIVE STATISTICS • Notice that the unit for the sample

2 CHAPTER 3. DESCRIPTIVE STATISTICS

Statistics for Business and Economics, 6e © 2007 Pearson Education, Inc. Chap 3-3

Chapter Topics

Measures of central tendency, variation, and shape

Mean, median, mode, geometric meanQuartilesRange, interquartile range, variance and standard deviation, coefficient of variationSymmetric and skewed distributions

Population summary measuresMean, variance, and standard deviation

Figure 3.2:

Page 3: Chapter 3 Descriptive Statistics - QEDqed.econ.queensu.ca/walras/custom/200/250A/250W2009/lect3.pdf · 8 CHAPTER 3. DESCRIPTIVE STATISTICS • Notice that the unit for the sample

3.1. MEASUREOF CENTRAL TENDENCYORMEASURES OF LOCATION 3

Statistics for Business and Economics, 6e © 2007 Pearson Education, Inc. Chap 3-5

Describing Data Numerically

Arithmetic Mean

Median

Mode

Describing Data Numerically

Variance

Standard Deviation

Coefficient of Variation

Range

Interquartile Range

Central Tendency Variation

Figure 3.3:

• We will consider the description of a data setXi, with n and N observationsfor the sample and population respectively

• Often referred to as data reduction

3.1 Measure of Central Tendency or Measures of

Location

Page 4: Chapter 3 Descriptive Statistics - QEDqed.econ.queensu.ca/walras/custom/200/250A/250W2009/lect3.pdf · 8 CHAPTER 3. DESCRIPTIVE STATISTICS • Notice that the unit for the sample

4 CHAPTER 3. DESCRIPTIVE STATISTICS

Statistics for Business and Economics, 6e © 2007 Pearson Education, Inc. Chap 3-6

Measures of Central Tendency

Central Tendency

Mean Median Mode

n

xx

n

1ii∑

==

Overview

Midpoint of ranked values

Most frequently observed value

Arithmetic average

Figure 3.4:

3.1.1 Population Mean

μ =1

N

NXi=1

Xi

Page 5: Chapter 3 Descriptive Statistics - QEDqed.econ.queensu.ca/walras/custom/200/250A/250W2009/lect3.pdf · 8 CHAPTER 3. DESCRIPTIVE STATISTICS • Notice that the unit for the sample

3.1. MEASUREOF CENTRAL TENDENCYORMEASURES OF LOCATION 5

Statistics for Business and Economics, 6e © 2007 Pearson Education, Inc. Chap 3-7

Arithmetic MeanThe arithmetic mean (mean) is the most common measure of central tendency

For a population of N values:

For a sample of size n:

Sample sizen

xxxn

xx n21

n

1ii +++==

∑= L Observed

values

Nxxx

N

xμ N21

N

1ii +++==

∑= L

Population size

Population values

Figure 3.5:

Page 6: Chapter 3 Descriptive Statistics - QEDqed.econ.queensu.ca/walras/custom/200/250A/250W2009/lect3.pdf · 8 CHAPTER 3. DESCRIPTIVE STATISTICS • Notice that the unit for the sample

6 CHAPTER 3. DESCRIPTIVE STATISTICSParameter: A characteristic of a population

• Usually we do not have the population available

• The Kiers family owns four vehicles. Following is the current mileage for each:56,000; 23,000; 42,000; and 73,000.

μ = (56, 000 + 23, 000 + 42, 000 + 73, 000)/4 = 48, 500

3.1.2 Sample Mean

X =1

n

nXi=1

Xi

Statistic: A characteristic of a sample

• Sample mean is an estimator of the population mean μ.

Page 7: Chapter 3 Descriptive Statistics - QEDqed.econ.queensu.ca/walras/custom/200/250A/250W2009/lect3.pdf · 8 CHAPTER 3. DESCRIPTIVE STATISTICS • Notice that the unit for the sample

3.1. MEASUREOF CENTRAL TENDENCYORMEASURES OF LOCATION 7

Statistics for Business and Economics, 6e © 2007 Pearson Education, Inc. Chap 3-8

Arithmetic Mean

The most common measure of central tendencyMean = sum of values divided by the number of valuesAffected by extreme values (outliers)

(continued)

0 1 2 3 4 5 6 7 8 9 10

Mean = 3

0 1 2 3 4 5 6 7 8 9 10

Mean = 4

3515

554321

==++++ 4

520

5104321

==++++

Figure 3.6:

Page 8: Chapter 3 Descriptive Statistics - QEDqed.econ.queensu.ca/walras/custom/200/250A/250W2009/lect3.pdf · 8 CHAPTER 3. DESCRIPTIVE STATISTICS • Notice that the unit for the sample

8 CHAPTER 3. DESCRIPTIVE STATISTICS• Notice that the unit for the sample mean is the same as the observationsthemselves.

• Mean is sensitive to Outliers!

3.1.3 Combining Sample Means with different Sample Sizes:Weighted Means

Suppose the sample mean in Economics 01 is 65 and that the mean male score is62 and the female was 82. What is the proportion of female students in the class?

Xm =1

nm

nmXi=1

Xi Xf =1

nf

nfXi=1

Xi

and

X =1

(nm + nf)

nm + nfXi=1

Xi

.We know that:

X =nm

(nm + nf)Xm +

nf(nm + nf)

Xf

If we let α be the proportion of females then:

X = (1− α)Xm + αXf

65 = (1− α) × 62 + α × 82

Solving for α gives .15.

• Notice carefully to get the overall sum for the sample we must weight each ofthe respective sample means (male and female) by their relative proportion(α i) of the sample.

• Weighted mean::

Xw =kXi=1

α iXi whereX

αi = 1

• Question: If given unemployment rates for each of the 10 provinces how wouldyou calculate the unemployment rate for Canada?

Page 9: Chapter 3 Descriptive Statistics - QEDqed.econ.queensu.ca/walras/custom/200/250A/250W2009/lect3.pdf · 8 CHAPTER 3. DESCRIPTIVE STATISTICS • Notice that the unit for the sample

3.1. MEASUREOF CENTRAL TENDENCYORMEASURES OF LOCATION 9

3.1.4 Grouped Data

For grouped data we can calculate the sample mean as

X =1

n

kXi=1

fimi,

where we have k classes and mi is themidpoint (class mark or midpoint) for classi with frequency fi.

Statistics for Business and Economics, 6e © 2007 Pearson Education, Inc. Chap 3-43

Approximations for Grouped DataSuppose a data set contains values m1, m2, . . ., mk,

occurring with frequencies f1, f2, . . . fK

For a population of N observations the mean is

For a sample of n observations, the mean isN

mfμ

K

1iii∑

==

n

mfx

K

1iii∑

==

∑=

=K

1iifNwhere

∑=

=K

1iifnwhere

Figure 3.7:

Page 10: Chapter 3 Descriptive Statistics - QEDqed.econ.queensu.ca/walras/custom/200/250A/250W2009/lect3.pdf · 8 CHAPTER 3. DESCRIPTIVE STATISTICS • Notice that the unit for the sample

10 CHAPTER 3. DESCRIPTIVE STATISTICS3.1.5 Median-Md (Q2)

The median Md of a set of data is a number such that half of the observationsare below the number and half of the observations are above that number when thedata are ordered in an array from lowest to highest.

Statistics for Business and Economics, 6e © 2007 Pearson Education, Inc. Chap 3-9

Median

In an ordered list, the median is the “middle”number (50% above, 50% below)

Not affected by extreme values

0 1 2 3 4 5 6 7 8 9 10

Median = 3

0 1 2 3 4 5 6 7 8 9 10

Median = 3

Figure 3.8:

Page 11: Chapter 3 Descriptive Statistics - QEDqed.econ.queensu.ca/walras/custom/200/250A/250W2009/lect3.pdf · 8 CHAPTER 3. DESCRIPTIVE STATISTICS • Notice that the unit for the sample

3.1. MEASUREOF CENTRAL TENDENCYORMEASURES OF LOCATION 11

Statistics for Business and Economics, 6e © 2007 Pearson Education, Inc. Chap 3-10

Finding the Median

The location of the median:

If the number of values is odd, the median is the middle numberIf the number of values is even, the median is the average of the two middle numbers

Note that is not the value of the median, only the

position of the median in the ranked data

dataorderedtheinposition2

1npositionMedian +=

21n+

Figure 3.9:

Page 12: Chapter 3 Descriptive Statistics - QEDqed.econ.queensu.ca/walras/custom/200/250A/250W2009/lect3.pdf · 8 CHAPTER 3. DESCRIPTIVE STATISTICS • Notice that the unit for the sample

12 CHAPTER 3. DESCRIPTIVE STATISTICS

Figure 3.10: Number of Movies Showing at Theater

M o v ie ss h o w in g

F r e q u e n c y C u m u la t iv eF r e q u e n c y

1 -2 1 1

3 -4 2 3

5 -6 3 6

7 -8 1 7

9 -1 0 3 1 0

3.1.6 Median with Grouped Data

For grouped data we get the rather complex equation (recall ogive approximationmethod)

Md = L +n2− CF

fmd×Width

where

L = lower boundary of class containing the median

Width = width of class containing median

fmd = frequency of class containing mediann2− CF = n

2minus the cumulative frequency of the class preceding the class

containing the median and n being the total number of observations.

The median class is 5-6, since it contains the 5th value (n/2 =5)

3.1.7 Notes

• If n is odd then the median is the middle value of the ordered array.

• If n is even, then add the two central values and divide by two

• Suppose we have data n = 10 (even) that has been ordered smallest tolargest:

2 5 6 7 10 15 21 21 23 25

Median= (10+15)/2=12.5.

• Median is not so sensitive to outliers in the data

• Unique median for each data set.

Page 13: Chapter 3 Descriptive Statistics - QEDqed.econ.queensu.ca/walras/custom/200/250A/250W2009/lect3.pdf · 8 CHAPTER 3. DESCRIPTIVE STATISTICS • Notice that the unit for the sample

3.1. MEASUREOF CENTRAL TENDENCYORMEASURES OF LOCATION 13

3.1.8 Mode - Mo

The mode Mo of a set of numbers most frequently occurring value of the data set.

Ungrouped Data

2 4 2 8 7 mode is 2

Grouped Data: The midpoint (class mark) of the class containing the largestnumber of class frequencies.

• Notice the mode need not be unique (i.e. bimodal)

• Not a good measure of central tendency

• Depends on frequency of observation rather than a balance over all observa-tions.

Statistics for Business and Economics, 6e © 2007 Pearson Education, Inc. Chap 3-11

ModeA measure of central tendencyValue that occurs most oftenNot affected by extreme valuesUsed for either numerical or categorical dataThere may may be no modeThere may be several modes

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14

Mode = 9

0 1 2 3 4 5 6

No Mode

Figure 3.11:

Page 14: Chapter 3 Descriptive Statistics - QEDqed.econ.queensu.ca/walras/custom/200/250A/250W2009/lect3.pdf · 8 CHAPTER 3. DESCRIPTIVE STATISTICS • Notice that the unit for the sample

14 CHAPTER 3. DESCRIPTIVE STATISTICS3.1.9 Relationship between Mean, Median and Mode

• Symmetric Distribution:

Median=Mean=(Mode)

• Right Skewed Distribution (Positively skewed):

Mean and Median are to the right of the Mode⇒ Mode<Median<Mean

• Left Skewed Distribution (Negatively Skewed):

Mean and Median are to the left of the Mode⇒ Mean<Median<Mode

Page 15: Chapter 3 Descriptive Statistics - QEDqed.econ.queensu.ca/walras/custom/200/250A/250W2009/lect3.pdf · 8 CHAPTER 3. DESCRIPTIVE STATISTICS • Notice that the unit for the sample

3.2. MEASURES OF VARIABILITY OR DISPERSIONS 15

3.2 Measures of Variability or Dispersions

Statistics for Business and Economics, 6e © 2007 Pearson Education, Inc. Chap 3-16

Same center, different variation

Measures of Variability

Variation

Variance Standard Deviation

Coefficient of Variation

Range Interquartile Range

Measures of variation give information on the spread or variability of the data values.

Figure 3.12:

Page 16: Chapter 3 Descriptive Statistics - QEDqed.econ.queensu.ca/walras/custom/200/250A/250W2009/lect3.pdf · 8 CHAPTER 3. DESCRIPTIVE STATISTICS • Notice that the unit for the sample

16 CHAPTER 3. DESCRIPTIVE STATISTICS3.2.1 Range

• The range is the difference between the largest and the smallest values of thedata.

RANGE = Xlargest − Xsmallest

• A sample of five accounting graduates revealed the following starting salaries:$22,000, $28,000, $31,000, $23,000, $24,000. The range is $31,000 - $22,000 =$9,000.

• Problem with this measure of dispersion is that it depends upon only twonumbers and is particularly sensitive to outliers in the data.

• Interquartile Range The interquartile range is Q3 - Q1, this is the range ofthe middle 50 % of the data.

Statistics for Business and Economics, 6e © 2007 Pearson Education, Inc. Chap 3-19

Interquartile Range

Can eliminate some outlier problems by using the interquartile range

Eliminate high- and low-valued observations and calculate the range of the middle 50% of the data

Interquartile range = 3rd quartile – 1st quartileIQR = Q3 – Q1

Figure 3.13:

Page 17: Chapter 3 Descriptive Statistics - QEDqed.econ.queensu.ca/walras/custom/200/250A/250W2009/lect3.pdf · 8 CHAPTER 3. DESCRIPTIVE STATISTICS • Notice that the unit for the sample

3.2. MEASURES OF VARIABILITY OR DISPERSIONS 17

Statistics for Business and Economics, 6e © 2007 Pearson Education, Inc. Chap 3-20

Interquartile Range

Median(Q2)

XmaximumX

minimum Q1 Q3

Example:

25% 25% 25% 25%

12 30 45 57 70

Interquartile range = 57 – 30 = 27

Figure 3.14:

Page 18: Chapter 3 Descriptive Statistics - QEDqed.econ.queensu.ca/walras/custom/200/250A/250W2009/lect3.pdf · 8 CHAPTER 3. DESCRIPTIVE STATISTICS • Notice that the unit for the sample

18 CHAPTER 3. DESCRIPTIVE STATISTICS

Statistics for Business and Economics, 6e © 2007 Pearson Education, Inc. Chap 3-21

Quartiles

Quartiles split the ranked data into 4 segments with an equal number of values per segment

25% 25% 25% 25%

The first quartile, Q1, is the value for which 25% of the observations are smaller and 75% are largerQ2 is the same as the median (50% are smaller, 50% are larger)Only 25% of the observations are greater than the third quartile

Q1 Q2 Q3

Figure 3.15:

Page 19: Chapter 3 Descriptive Statistics - QEDqed.econ.queensu.ca/walras/custom/200/250A/250W2009/lect3.pdf · 8 CHAPTER 3. DESCRIPTIVE STATISTICS • Notice that the unit for the sample

3.2. MEASURES OF VARIABILITY OR DISPERSIONS 19

Statistics for Business and Economics, 6e © 2007 Pearson Education, Inc. Chap 3-22

Quartile Formulas

Find a quartile by determining the value in the appropriate position in the ranked data, where

First quartile position: Q1 = 0.25(n+1)

Second quartile position: Q2 = 0.50(n+1)(the median position)

Third quartile position: Q3 = 0.75(n+1)

where n is the number of observed values

Figure 3.16:

Page 20: Chapter 3 Descriptive Statistics - QEDqed.econ.queensu.ca/walras/custom/200/250A/250W2009/lect3.pdf · 8 CHAPTER 3. DESCRIPTIVE STATISTICS • Notice that the unit for the sample

20 CHAPTER 3. DESCRIPTIVE STATISTICS

Statistics for Business and Economics, 6e © 2007 Pearson Education, Inc. Chap 3-23

(n = 9)Q1 = is in the 0.25(9+1) = 2.5 position of the ranked dataso use the value half way between the 2nd and 3rd values,

so Q1 = 12.5

Quartiles

Sample Ranked Data: 11 12 13 16 16 17 18 21 22

Example: Find the first quartile

Figure 3.17:

3.2.2 Deviation from the Mean

The deviation from the sample mean is

Xi − X, i = 1, ...n.

We might consider one measure of variation as the Average Deviation:

1

n

nXi=1

[Xi − X]

except this is equal to 0 (recall proof in first chapter)

Page 21: Chapter 3 Descriptive Statistics - QEDqed.econ.queensu.ca/walras/custom/200/250A/250W2009/lect3.pdf · 8 CHAPTER 3. DESCRIPTIVE STATISTICS • Notice that the unit for the sample

3.3. POPULATION VARIANCE 21

3.2.3 Mean (Absolute) Deviations (MAD)

M.A.D . =1

n

nXi=1

|Xi − X|

• Example: The weights of a sample of crates containing books for the book-store are (in lbs.) 103, 97, 101, 106, 103. MAD = 12/5 = 2.4

3.3 Population Variance

σ2 =1

N

NXi=1

[Xi − μ]2

=1

N

NXi=1

Xi2 − μ2

Notes

• σ2 ≥ 0

• The square root of the variance, σ is called the population standard deviationor root-mean squared deviation of X.

• σ is measured in the same units as X.

Sample Variance

s2 = 1n−1

Pni=1[Xi − X]2

Alternatively and try to derive this

s2 = 1n−1 [

Pni=1X

2i − nX2]

s2 =Pn

i=1X2i −(P

Xi)2

n

n−1

Grouped data

s2 =1

n− 1kXi=1

fi[mi − X]2

Page 22: Chapter 3 Descriptive Statistics - QEDqed.econ.queensu.ca/walras/custom/200/250A/250W2009/lect3.pdf · 8 CHAPTER 3. DESCRIPTIVE STATISTICS • Notice that the unit for the sample

22 CHAPTER 3. DESCRIPTIVE STATISTICS

Statistics for Business and Economics, 6e © 2007 Pearson Education, Inc. Chap 3-44

Approximations for Grouped DataSuppose a data set contains values m1, m2, . . ., mk,

occurring with frequencies f1, f2, . . . fK

For a population of N observations the variance is

For a sample of n observations, the variance isN

μ)(mfσ

K

1i

2ii

2∑=

−=

1n

)x(mfs

K

1i

2ii

2

−=∑=

Figure 3.18:

• We can have two data sets with the same means but quite different samplevariances.

• s2 is measured in units2

• s is measured in the same units as the observations themselves.

Page 23: Chapter 3 Descriptive Statistics - QEDqed.econ.queensu.ca/walras/custom/200/250A/250W2009/lect3.pdf · 8 CHAPTER 3. DESCRIPTIVE STATISTICS • Notice that the unit for the sample

3.3. POPULATION VARIANCE 23

Statistics for Business and Economics, 6e © 2007 Pearson Education, Inc. Chap 3-28

Calculation Example:Sample Standard Deviation

Sample Data (xi) : 10 12 14 15 17 18 18 24

n = 8 Mean = x = 16

4.24267

126

1816)(2416)(1416)(1216)(10

1n)x(24)x(14)x(12)X(10s

2222

2222

==

−−++−+−+−=

−−++−+−+−=

L

L

A measure of the “average”scatter around the mean

Figure 3.19:

Page 24: Chapter 3 Descriptive Statistics - QEDqed.econ.queensu.ca/walras/custom/200/250A/250W2009/lect3.pdf · 8 CHAPTER 3. DESCRIPTIVE STATISTICS • Notice that the unit for the sample

24 CHAPTER 3. DESCRIPTIVE STATISTICS

Statistics for Business and Economics, 6e © 2007 Pearson Education, Inc. Chap 3-29

Measuring variation

Small standard deviation

Large standard deviation

Figure 3.20:

Page 25: Chapter 3 Descriptive Statistics - QEDqed.econ.queensu.ca/walras/custom/200/250A/250W2009/lect3.pdf · 8 CHAPTER 3. DESCRIPTIVE STATISTICS • Notice that the unit for the sample

3.3. POPULATION VARIANCE 25

Empirical Rule for Bell Shaped (Normal) Distributions

For data that give rise to histograms with the familiar (symmetric) bell-shapedappearance then:

• 68% of data lies within the interval μ± σ

• 95% of data lies within the interval μ± 2σ

• 99.7% of data lies within the interval μ± 3σ

Page 26: Chapter 3 Descriptive Statistics - QEDqed.econ.queensu.ca/walras/custom/200/250A/250W2009/lect3.pdf · 8 CHAPTER 3. DESCRIPTIVE STATISTICS • Notice that the unit for the sample

26 CHAPTER 3. DESCRIPTIVE STATISTICS3.3.1 Empirical Rule for Standard Deviations

Statistics for Business and Economics, 6e © 2007 Pearson Education, Inc. Chap 3-34

If the data distribution is bell-shaped, then the interval:

contains about 68% of the values in the population or the sample

The Empirical Rule

1σμ ±

μ

68%

1σμ±

Figure 3.21:

Page 27: Chapter 3 Descriptive Statistics - QEDqed.econ.queensu.ca/walras/custom/200/250A/250W2009/lect3.pdf · 8 CHAPTER 3. DESCRIPTIVE STATISTICS • Notice that the unit for the sample

3.3. POPULATION VARIANCE 27

Statistics for Business and Economics, 6e © 2007 Pearson Education, Inc. Chap 3-35

contains about 95% of the values in the population or the samplecontains about 99.7% of the values in the population or the sample

The Empirical Rule

2σμ ±

3σμ ±

3σμ±

99.7%95%

2σμ±

Figure 3.22:

Page 28: Chapter 3 Descriptive Statistics - QEDqed.econ.queensu.ca/walras/custom/200/250A/250W2009/lect3.pdf · 8 CHAPTER 3. DESCRIPTIVE STATISTICS • Notice that the unit for the sample

28 CHAPTER 3. DESCRIPTIVE STATISTICS

Statistics for Business and Economics, 6e © 2007 Pearson Education, Inc. Chap 3-35

contains about 95% of the values in the population or the samplecontains about 99.7% of the values in the population or the sample

The Empirical Rule Normal

2σμ ±

3σμ ±

3σμ±

99.7%95%

2σμ±

Figure 3.23:

Page 29: Chapter 3 Descriptive Statistics - QEDqed.econ.queensu.ca/walras/custom/200/250A/250W2009/lect3.pdf · 8 CHAPTER 3. DESCRIPTIVE STATISTICS • Notice that the unit for the sample

3.3. POPULATION VARIANCE 29

For most data, 90% or more of the values lie within plus or minus three standarddeviations from the mean

Page 30: Chapter 3 Descriptive Statistics - QEDqed.econ.queensu.ca/walras/custom/200/250A/250W2009/lect3.pdf · 8 CHAPTER 3. DESCRIPTIVE STATISTICS • Notice that the unit for the sample

30 CHAPTER 3. DESCRIPTIVE STATISTICS3.4 Coefficient of Variation (CV )

CV =σ

μ× 100% for population

CV =s

X× 100% for sample

• Adjusts the standard deviation (population or sample) by dividing by themean again either by population or sample)

• Unit free and allows comparisons of variations of similar objects that controlfor size.

Statistics for Business and Economics, 6e © 2007 Pearson Education, Inc. Chap 3-37

Comparing Coefficient of Variation

Stock A:Average price last year = $50Standard deviation = $5

Stock B:Average price last year = $100Standard deviation = $5

Both stocks have the same standard deviation, but stock B is less variable relative to its price

10%100%$50$5100%

xsCVA =⋅=⋅⎟⎟⎠

⎞⎜⎜⎝

⎛=

5%100%$100$5100%

xsCVB =⋅=⋅⎟⎟⎠

⎞⎜⎜⎝

⎛=

Figure 3.24:

Page 31: Chapter 3 Descriptive Statistics - QEDqed.econ.queensu.ca/walras/custom/200/250A/250W2009/lect3.pdf · 8 CHAPTER 3. DESCRIPTIVE STATISTICS • Notice that the unit for the sample

3.5. COVARIANCES: POPULATION AND SAMPLE 31

3.5 Covariances: Population and Sample

Statistics for Business and Economics, 6e © 2007 Pearson Education, Inc. Chap 3-45

The Sample CovarianceThe covariance measures the strength of the linear relationship between two variables

The population covariance:

The sample covariance:

Only concerned with the strength of the relationship No causal effect is implied

N

))(y(xy),(xCov

N

1iyixi

xy

∑=

−−==

μμσ

1n

)y)(yx(xsy),(xCov

n

1iii

xy −

−−==∑=

Figure 3.25:

Page 32: Chapter 3 Descriptive Statistics - QEDqed.econ.queensu.ca/walras/custom/200/250A/250W2009/lect3.pdf · 8 CHAPTER 3. DESCRIPTIVE STATISTICS • Notice that the unit for the sample

32 CHAPTER 3. DESCRIPTIVE STATISTICS

Statistics for Business and Economics, 6e © 2007 Pearson Education, Inc. Chap 3-46

Covariance between two variables:

Cov(x,y) > 0 x and y tend to move in the same direction

Cov(x,y) < 0 x and y tend to move in opposite directions

Cov(x,y) = 0 x and y are independent

Interpreting Covariance

Figure 3.26:

Page 33: Chapter 3 Descriptive Statistics - QEDqed.econ.queensu.ca/walras/custom/200/250A/250W2009/lect3.pdf · 8 CHAPTER 3. DESCRIPTIVE STATISTICS • Notice that the unit for the sample

3.6. CORRELATION: POPULATION AND SAMPLE 33

3.6 Correlation: Population and Sample

Statistics for Business and Economics, 6e © 2007 Pearson Education, Inc. Chap 3-47

Coefficient of Correlation

Measures the relative strength of the linear relationship between two variables

Population correlation coefficient:

Sample correlation coefficient:

YX ssy),(xCovr =

YXσσy),(xCovρ =

Figure 3.27:

Page 34: Chapter 3 Descriptive Statistics - QEDqed.econ.queensu.ca/walras/custom/200/250A/250W2009/lect3.pdf · 8 CHAPTER 3. DESCRIPTIVE STATISTICS • Notice that the unit for the sample

34 CHAPTER 3. DESCRIPTIVE STATISTICS

Statistics for Business and Economics, 6e © 2007 Pearson Education, Inc. Chap 3-48

Features of Correlation Coefficient, r

Unit freeRanges between –1 and 1The closer to –1, the stronger the negative linear relationshipThe closer to 1, the stronger the positive linear relationshipThe closer to 0, the weaker any positive linear relationship

Figure 3.28:

3.7 Exploratory Data Analysis

3.7.1 Some Remarks

• Often we want to display graphically a measure of central tendency anddispersion and the simplest way to do this is by a box plot.

3.8 BOX-and-Whisker PLOTS

• Plots 5 numbers along a single axis: (Five Number Summary)

• Minimum and Maximum Values

• Q1, Q2 and Q3

• Example:

Based on a sample of 20 deliveries, Marco’s Pizza determined the following in-formation: minimum value = 13 minutes, Q1 = 15 minutes, median = 18 minutes,Q3 = 22 minutes, maximum value = 30 minutes

Page 35: Chapter 3 Descriptive Statistics - QEDqed.econ.queensu.ca/walras/custom/200/250A/250W2009/lect3.pdf · 8 CHAPTER 3. DESCRIPTIVE STATISTICS • Notice that the unit for the sample

3.9. MATHEMATICAL NOTE: LINEAR TRANSFORMATIONS 35

Figure 3.29: Vertical Box Plots

Usually Box-and-Whisker Plots are horizontal, but for some reason many sta-tistical programs (STATA and Excel) plot them vertically.

3.9 Mathematical Note: Linear Transformations

• Let a and b be two constants and Xi i = 1, . . . n be a variable.

• An example would be with Xi kilometers, and Yi miles, then a = 0 andb = 0.6.

A linear transformation is defined as:

Yi = a + bXi

Y is said to be defined by a linear combination of X .

• We note that a is the intercept and b is the slope of the equation.

Page 36: Chapter 3 Descriptive Statistics - QEDqed.econ.queensu.ca/walras/custom/200/250A/250W2009/lect3.pdf · 8 CHAPTER 3. DESCRIPTIVE STATISTICS • Notice that the unit for the sample

36 CHAPTER 3. DESCRIPTIVE STATISTICS• Notice that a changes the origin and that b changes the scale.

• How is the mean and variance of the variable Y related to the mean and thevariance of X?

• First consider the population case:

Mean of a Linear Transformation

μy = 1N

PNi=1 Yi

= 1N

PNi=1[a + bXi]

= a + b 1N

PNi=1Xi

= a + bμx

Variance of a Linear Transformation

σ2y = 1N

PNi=1[Yi − μy]

2

= 1N

PNi=1[a + bXi − {a + bμx}]2

= 1N

PNi=1[b{Xi − μx}]2

= b2 1N

PNi=1[Xi − μx]

2

= b2 σ2x

This implies

σy = |b|σxIt also follow that for the sample counterparts that

Y = a + bX

s2y = b2 s2x

sy = |b| sx

3.10 Standardized Variable (Zi)

• Finally we may define a standardized variable (in terms of the sample) as

Zi =Xi − X

sx

Notes on Standarizing

Page 37: Chapter 3 Descriptive Statistics - QEDqed.econ.queensu.ca/walras/custom/200/250A/250W2009/lect3.pdf · 8 CHAPTER 3. DESCRIPTIVE STATISTICS • Notice that the unit for the sample

3.10. STANDARDIZED VARIABLE (ZI) 37

— This variable Zi has a mean of zero (prove this using the results aboveon linear transformations) and a sample standard deviation and varianceequal to 1 (a = − X

s, b = 1

s).

— Variables defined in this way are sometimes referred to as , Z-scores.—

Z = 0 s2 = s = 1

— We can do the same type of transformation in terms of the populationmagnitudes μ and σ.

— Notice that if an observation is below the mean then the Z − score isnegative and above if it is positive.

Questions

There are many problems on means, variances and so on in LMM and youshould make sure you are comfortable with the topics by doings as many asyou can.

— NCT 2.28 a-c, 2.21, 2.29a,b,d,

— Linear Transformations Show the relationship between centigrade andFahrenheit.

— What temperature measure shows greater variation and hence providesmore detail on temperature.

— In general comment upon the relationship between the variance of thetransformed variable and the original as it relates to the slope parameter.