chapter 2 – descriptive statistics
1
Chapter 2 – Descriptive Statistics
Tabular and Graphical Presentations
2
Chapter Outline
Summarize Qualitative Data
• Frequency Distribution
• Bar Charts and Pie Charts
Summarize Quantitative Data
• Frequency Distribution
• Histogram
• Cumulative Distributions
Crosstabulations
Scatter Diagrams
3
A Note
An important aspect of statistics is to present the data in an informative way so as to reveal any patterns in the data (no pattern is a pattern!).
Different types of data require different summarization methods and statistical analyses.
4
Summarize Qualitative Data
Check out the following data. What pattern can you detect from the raw data?
NBC  CBS  NBC  ABC  FOX
NBC  NBC  CNN  NBC  CBS
CBS  FOX  NBC  CNN  ABC
FOX  NBC  CBS  FOX  ABC
CNN  FOX  CBS  CBS  CNN
NBC  NBC  CBS  FOX  ABC
CBS  NBC  FOX  NBC  FOX
NBC  CNN  NBC  CBS  CBS
ABC  NBC  CNN  FOX  CBS
FOX  CBS  ABC  NBC  CNN
Table 2.1 Data from a sample of 50 individual responses to the question 'Which network's evening news do you prefer to watch?'
5
Summarize Qualitative Data: Frequency Distribution
The raw data in Table 2.1 do not directly reveal any meaningful information, such as a pattern. For qualitative data, we can summarize and present the raw data with a frequency distribution.
A frequency distribution is a tabular summary of data showing the number (frequency) of items in each nonoverlapping class.
• Please refer to the Excel demonstration (Chapter 2) on how to construct the frequency distribution for the data in Table 2.1.
• The outcome is shown on the next slide.
6
Frequency Distribution for Data in Table 2.1
Network   Frequency
ABC        6
CBS       12
CNN        7
FOX       10
NBC       15
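The slides build this table in Excel. As a rough equivalent, the sketch below tallies the 50 responses from Table 2.1 with Python's standard `collections.Counter`. The variable names `responses` and `frequency` are illustrative choices, not part of the original material.

```python
from collections import Counter

# The 50 responses from Table 2.1, read row by row.
responses = """
NBC CBS NBC ABC FOX
NBC NBC CNN NBC CBS
CBS FOX NBC CNN ABC
FOX NBC CBS FOX ABC
CNN FOX CBS CBS CNN
NBC NBC CBS FOX ABC
CBS NBC FOX NBC FOX
NBC CNN NBC CBS CBS
ABC NBC CNN FOX CBS
FOX CBS ABC NBC CNN
""".split()

# Each distinct network is one nonoverlapping class; count its occurrences.
frequency = Counter(responses)

for network in sorted(frequency):
    print(f"{network:<8}{frequency[network]:>3}")
```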
7
Relative Frequency
To obtain the relative frequency, simply divide the frequency of each class by the total number of observations (n). For the data in Table 2.1, n equals 50.
Network   Frequency   Relative Frequency   Percent Frequency
ABC        6          0.12                 12
CBS       12          0.24                 24
CNN        7          0.14                 14
FOX       10          0.20                 20
NBC       15          0.30                 30
Example: for NBC, 15/50 = 0.30.
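The division described on this slide can be sketched in a few lines of Python. The frequencies are the ones reported for Table 2.1; the dictionary names are illustrative assumptions.

```python
# Frequencies reported on the slides for the data in Table 2.1.
frequency = {"ABC": 6, "CBS": 12, "CNN": 7, "FOX": 10, "NBC": 15}

n = sum(frequency.values())  # total number of observations

# Relative frequency = class frequency / n; percent frequency = 100 * relative frequency.
relative = {network: count / n for network, count in frequency.items()}
percent = {network: 100 * share for network, share in relative.items()}

for network in frequency:
    print(f"{network:<6}{frequency[network]:>4}{relative[network]:>8.2f}{percent[network]:>6.0f}")
```

The relative frequencies always add up to 1, which is a quick check on the table.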
8
Bar Charts and Pie Charts
A frequency distribution is often presented in a graph (a bar chart or a pie chart) to communicate information visually.
Please refer to the Excel demonstration (Chapter 2) on how to create a bar chart and a pie chart for the frequency distribution from the previous slide.
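[Bar chart: frequency by network. Horizontal axis: Network (ABC, CBS, CNN, FOX, NBC). Vertical axis: frequency, from 0 to 16. Bar heights: 6, 12, 7, 10, 15.]
[Pie chart: percent frequency by network. ABC 12%, CBS 24%, CNN 14%, FOX 20%, NBC 30%.]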
Both charts show that NBC's evening news is the most popular among the respondents.
9
Summarize Quantitative Data
Check out the following data. Can you quickly decide how many classes there should be in the construction of a frequency distribution?
 95   77   97   99   89
108  120   78   79   88
 67   97   97   79   93
 99  103  106   82   93
 93   97   95   61  109
 77   88  100  109   90
 86   89   97   93   88
 93  105   87   82   98
119  104   93  104  101
118  105   82   73  101
Table 2.2 Data of average monthly sales volume ($1000) of a sample of 50 Starbucks stores in New York City in 2012
10
Summarize Quantitative Data: Frequency Distribution
Unlike the qualitative data in Table 2.1, the quantitative data in Table 2.2 do not come with obvious classes, so the number of classes has to be chosen.
Apply the following procedure to construct a frequency distribution for quantitative data.
• Determine the number of non-overlapping classes;
• Determine the class width;
• Determine the class limits;
• Count the number of items in each class.
11
Summarize Quantitative Data: Frequency Distribution
Step one – Determine the number of non-overlapping classes
• As a guideline, you can use the ‘2 to the power of k’ rule: find the smallest integer k such that 2^k > n, where n is the sample size. Applying the rule to the data in Table 2.2 (n = 50), we find k = 6, since 2^5 = 32 < 50 and 2^6 = 64 > 50. Thus, we set the # of classes to 6. (Note that this is only a suggestion, not an absolute rule.)
• Empirically speaking, the # of classes is usually between 5 and 20.
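A minimal sketch of the ‘2 to the power of k’ rule follows. It assumes the rule is read as a strict inequality (2^k greater than n); the function name `suggested_class_count` is hypothetical and introduced only for illustration.

```python
def suggested_class_count(n: int) -> int:
    """Return the smallest integer k such that 2**k > n (assumed strict inequality)."""
    k = 1
    while 2 ** k <= n:
        k += 1
    return k


# Table 2.2 has n = 50 observations: 2**5 = 32 is too small, 2**6 = 64 is enough.
print(suggested_class_count(50))
```

As the slide notes, the result is only a suggestion; the final number of classes is a judgment call.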
12
Summarize Quantitative Data: Frequency Distribution
Step two – Determine the class width
• Use equal class widths to avoid misinterpretation.
• Approximately, class width = (Largest value - Smallest value) / (# of classes)
• For the data in Table 2.2, class width = (120 - 61)/6 ≈ 9.83. We can round it up to 10, which is a much more convenient value to work with for class width.
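The class-width arithmetic can be checked with a short sketch. The largest value (120), smallest value (61), and number of classes (6) come from the slides; rounding up with `math.ceil` is one way to reach a convenient width and is an assumption of this sketch. Note that (120 - 61)/6 works out to about 9.83.

```python
import math

largest, smallest = 120, 61   # extremes of the data in Table 2.2
number_of_classes = 6         # from the '2 to the power of k' rule

# Approximate class width = (largest value - smallest value) / number of classes.
raw_width = (largest - smallest) / number_of_classes

# Round up so that the classes are guaranteed to cover the whole range.
class_width = math.ceil(raw_width)

print(f"raw width = {raw_width:.2f}, class width used = {class_width}")
```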
13
Summarize Quantitative Data: Frequency Distribution
Step three – Determine the class limits
• Class limits should be set so that each data point belongs to one and only one class, and no data point is left out.
• Similar to class width, class limits can use values that are convenient to work with.
- In our example, the smallest value is 61 and the class width is set to 10. So, the lowest class can be set as 61 – 70. Note that the class width is calculated as 70 - 61 + 1 = 10.
14
Summarize Quantitative Data: Frequency Distribution
Step four – Count the # of items in each class
• For the data in Table 2.2, the frequency distribution is constructed as follows:
Sales Volume ($1000)   Frequency
61-70                   2
71-80                   6
81-90                  11
91-100                 17
101-110                11
111-120                 3
Total                  50
• Please refer to the Excel demonstration (Chapter 2) on how to construct the frequency distribution for the data in Table 2.2.
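The counting step can be sketched in Python as follows. The data are the 50 values from Table 2.2 and the class limits (lowest class 61-70, width 10, six classes) are the ones chosen in the previous steps; the index arithmetic is just one possible way to assign values to classes.

```python
# The 50 observations from Table 2.2 (average monthly sales, $1000).
sales = [
    95, 77, 97, 99, 89,
    108, 120, 78, 79, 88,
    67, 97, 97, 79, 93,
    99, 103, 106, 82, 93,
    93, 97, 95, 61, 109,
    77, 88, 100, 109, 90,
    86, 89, 97, 93, 88,
    93, 105, 87, 82, 98,
    119, 104, 93, 104, 101,
    118, 105, 82, 73, 101,
]

lowest_limit, class_width, number_of_classes = 61, 10, 6

# Each value falls in exactly one class: 61-70 is class 0, 71-80 is class 1, and so on.
counts = [0] * number_of_classes
for value in sales:
    counts[(value - lowest_limit) // class_width] += 1

for i, count in enumerate(counts):
    lower = lowest_limit + i * class_width
    upper = lower + class_width - 1
    print(f"{lower}-{upper:<6}{count:>3}")
print(f"Total    {sum(counts):>3}")
```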
15
Relative Frequency
Example: Monthly Sales Volume of 50 Starbucks Stores
Sales Volume ($1000)   Frequency   Relative Frequency   Percent Frequency
61-70                   2          0.04                  4
71-80                   6          0.12                 12
81-90                  11          0.22                 22
91-100                 17          0.34                 34
101-110                11          0.22                 22
111-120                 3          0.06                  6
Example: for the 111-120 class, 3/50 = 0.06.
16
Interpretation of Frequency Distribution
The frequency distribution of the monthly sales volume of 50 Starbucks stores in NYC reveals that
39 stores had average monthly sales in 2012 between $81,000 and $110,000.
4% of the sample stores had average monthly sales of no more than $70,000.
6% of the sample stores had average monthly sales of $111,000 or more.
17
Histogram
Like a bar chart, a histogram is a graphical presentation of a frequency distribution.
The height of a rectangle (a bar) drawn above each class interval corresponds to that class’ frequency or relative frequency.
Unlike a bar chart, a histogram has no gap between rectangles of adjacent classes.
• Please refer to the Excel demonstration (Chapter 2) on how to create a histogram for the frequency distribution of sales volume of Starbucks stores.
18
Histogram: Monthly Sales Volume of 50 Starbucks Stores in NYC
[Histogram: Average Monthly Sales Volume of a Sample of 50 Starbucks Stores in NYC in 2012. Horizontal axis: Sales Volume ($1000), with classes 61-70, 71-80, 81-90, 91-100, 101-110, 111-120. Vertical axis: Frequency, from 0 to 20. Bar heights: 2, 6, 11, 17, 11, 3.]
19
Histogram
Skewness – the lack of symmetry.
Symmetric distribution – no skewness. Examples: the heights or weights of a human population.
[Histogram of a symmetric distribution. Vertical axis: Relative Frequency, from 0 to .35.]
20
Histogram
Negative Skewness – a longer tail to the left. An example: exam scores
[Histogram of a negatively skewed distribution. Vertical axis: Relative Frequency, from 0 to .35.]
21
Histogram
Positive Skewness – a longer tail to the right. An example: home values
[Histogram of a positively skewed distribution. Vertical axis: Relative Frequency, from 0 to .35.]
22
Cumulative Distributions
Cumulative frequency distribution – shows the # of items with values less than or equal to the upper limit of each class.
Cumulative relative frequency distribution – shows the proportion (percentage) of items with values less than or equal to the upper limit of each class.
23
Cumulative Distributions
Monthly sales volume of 50 Starbucks stores
Sales Volume ($1000)   Cumulative Frequency   Cumulative Relative Frequency
≤ 70                    2                     0.04
≤ 80                    8                     0.16
≤ 90                   19                     0.38
≤ 100                  36                     0.72
≤ 110                  47                     0.94
≤ 120                  50                     1.00
Example: for the class with upper limit 90, the cumulative frequency is 2 + 6 + 11 = 19 and the cumulative relative frequency is 19/50 = 0.38.
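The running totals in the cumulative table can be reproduced with `itertools.accumulate` from the Python standard library. The class frequencies and upper limits are taken from the slides; the variable names are illustrative.

```python
from itertools import accumulate

upper_limits = [70, 80, 90, 100, 110, 120]   # upper class limits ($1000)
frequency = [2, 6, 11, 17, 11, 3]            # class frequencies for the Starbucks sample

# Cumulative frequency: number of items less than or equal to each upper limit.
cumulative = list(accumulate(frequency))

# Cumulative relative frequency: divide by the total number of observations.
n = cumulative[-1]
cumulative_relative = [c / n for c in cumulative]

for limit, c, r in zip(upper_limits, cumulative, cumulative_relative):
    print(f"<= {limit:<5}{c:>4}{r:>7.2f}")
```

The last cumulative frequency always equals n, and the last cumulative relative frequency always equals 1.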
24
Crosstabulations and Scatter Diagrams
So far, we have studied methods of summarizing the data of one variable at a time.
In business, it is important to understand the relationships among different variables. For instance, the relationship between sales volume and advertising expenditure.
Crosstabulations and scatter diagrams are two methods of descriptive statistics that are used to summarize the data to reveal the relationship between two variables.
25
Crosstabulations
A crosstabulation is a tabular summary of data for two variables.
The two variables can be either qualitative or quantitative or one of each.
The left and top margin labels show the classes for the two variables.
26
Crosstabulations
Example: Finger Lakes Homes
The number of Finger Lakes homes sold for each style and price for the past two years is shown below.
                        Home Style
Price Range    Colonial   Log   Split   A-Frame   Total
< $200,000        18        6     19       12       55
≥ $200,000        12       14     16        3       45
Total             30       20     35       15      100
Price Range is a quantitative variable; Home Style is a categorical variable.
27
Crosstabulations
Example: Finger Lakes Homes
Insights Gained from Preceding Crosstabulation
• Only three homes in the sample are an A-Frame style and priced at $200,000 or more.
• The greatest number of homes (19) in the sample are a split-level style and priced at less than $200,000.
29
Crosstabulations
Example: Finger Lakes Homes
                        Home Style
Price Range    Colonial   Log   Split   A-Frame   Total
< $200,000        18        6     19       12       55
≥ $200,000        12       14     16        3       45
Total             30       20     35       15      100
The Total column (55, 45) is the frequency distribution for the price range variable.
The Total row (30, 20, 35, 15) is the frequency distribution for the home style variable.
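The margin totals of the Finger Lakes crosstabulation can be recomputed from the cell counts, as in the sketch below. Only the eight cell counts come from the slides; the dictionary layout and variable names are assumptions made for illustration.

```python
styles = ["Colonial", "Log", "Split", "A-Frame"]

# Cell counts from the Finger Lakes Homes crosstabulation, one row per price range.
cells = {
    "< $200,000": [18, 6, 19, 12],
    ">= $200,000": [12, 14, 16, 3],
}

# Row totals give the frequency distribution of the price range variable.
row_totals = {price: sum(row) for price, row in cells.items()}

# Column totals give the frequency distribution of the home style variable.
column_totals = dict(zip(styles, (sum(col) for col in zip(*cells.values()))))

grand_total = sum(row_totals.values())

print(row_totals)
print(column_totals)
print(grand_total)
```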
30
Crosstabulations: Simpson’s Paradox
Data in two or more crosstabulations are often aggregated to produce a summary crosstabulation.
We must be careful in drawing conclusions about the relationship between the two variables in the aggregated crosstabulation.
In some cases the conclusions based upon an aggregated crosstabulation can be completely reversed if we look at the unaggregated data. The reversal of conclusions based on aggregate and unaggregated data is called Simpson’s paradox.
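To make the reversal concrete, here is a small sketch built on made-up counts. None of these numbers come from the slides: the two sales reps, the account sizes, and every count are hypothetical, chosen only so that the paradox shows up.

```python
# Made-up counts for illustration only (NOT from the slides):
# (deals closed, deals attempted) for two hypothetical sales reps,
# split by a hypothetical account size.
results = {
    "Rep A": {"small accounts": (29, 32), "large accounts": (100, 118)},
    "Rep B": {"small accounts": (90, 100), "large accounts": (20, 25)},
}


def success_rate(closed: int, attempted: int) -> float:
    return closed / attempted


# Unaggregated view: compare the reps within each account size.
by_group = {
    rep: {group: success_rate(*pair) for group, pair in groups.items()}
    for rep, groups in results.items()
}

# Aggregated view: pool the counts across account sizes first.
aggregated = {
    rep: success_rate(
        sum(closed for closed, _ in groups.values()),
        sum(attempted for _, attempted in groups.values()),
    )
    for rep, groups in results.items()
}

for rep in results:
    print(rep, {g: round(r, 3) for g, r in by_group[rep].items()}, round(aggregated[rep], 3))
```

In this invented example Rep A has the higher success rate within each account size, yet Rep B has the higher rate once the two tables are pooled, because the reps handled very different mixes of small and large accounts.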
31
Scatter Diagrams
A scatter diagram is a graphical presentation of the relationship between two quantitative variables.
One variable is shown on the horizontal axis and the other variable is shown on the vertical axis.
The general pattern of the plotted points suggests the overall relationship between the variables.
A trendline provides a linear approximation of the relationship.
32
Scatter Diagrams: A Positive Relationship
[Scatter diagram of y against x showing a positive relationship.]
33
Scatter Diagrams: A Negative Relationship
[Scatter diagram of y against x showing a negative relationship.]
34
Scatter Diagrams: No Relationship
[Scatter diagram of y against x showing no apparent relationship.]
35
Scatter Diagrams: An Example
Is there a relationship between gas prices and stock prices?
• For the gas price variable, let us use the U.S. retail gas price;
• For the stock price variable, let us use the price of SPY (ticker symbol), an ETF that tracks the S&P 500 Index, as a proxy for the index;
• Weekly data are used for both variables.
The data are shown on the next slide.
36
Data of U.S. Retail Gas Price and S&P 500 Proxy Price (SPY)
Date            U.S. Retail Gas Price   SPY
Jan 28, 2013    3.296                   151.24
Feb 04, 2013    3.471                   151.80
Feb 11, 2013    3.537                   152.11
Feb 18, 2013    3.690                   151.89
Feb 25, 2013    3.722                   152.11
Mar 04, 2013    3.698                   155.44
Mar 11, 2013    3.644                   155.83
Mar 18, 2013    3.633                   155.60
Mar 25, 2013    3.616                   156.67
Apr 01, 2013    3.572                   155.86
37
Scatter Diagrams: The relationship between gas prices and stock prices
[Scatter diagram. Horizontal axis: U.S. Retail Gas Price ($/gallon), from 3.25 to 3.75. Vertical axis: SPY, from 150 to 157.]
38
Scatter Diagrams
The relationship between gas prices and stock prices
The points in the previous scatter diagram suggest a positive relationship between the U.S. retail gas price and the value of SPY.
The relationship is only rough: when the gas price is high, the S&P 500 Index tends to be high.
We need to be cautious in drawing conclusions from a scatter diagram. In the example, there are only 10 data points. Much more data are required to rigorously examine the relationship between gas prices and stock prices.
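The trendline mentioned earlier can be sketched for these 10 weekly observations with an ordinary least-squares fit. The slides do not say how the trendline is computed, so the least-squares formulas here are an assumption; the data are the gas price and SPY values listed in the data table.

```python
# Weekly observations from the data slide (Jan 28 to Apr 01, 2013).
gas = [3.296, 3.471, 3.537, 3.690, 3.722, 3.698, 3.644, 3.633, 3.616, 3.572]
spy = [151.24, 151.80, 152.11, 151.89, 152.11, 155.44, 155.83, 155.60, 156.67, 155.86]

n = len(gas)
mean_x = sum(gas) / n
mean_y = sum(spy) / n

# Least-squares slope = sum of cross-deviations / sum of squared x-deviations.
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(gas, spy))
s_xx = sum((x - mean_x) ** 2 for x in gas)

slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

print(f"Trendline: SPY = {intercept:.2f} + {slope:.2f} * gas price")
```

The fitted slope is positive, in line with the pattern read off the scatter diagram, but with only 10 points it should be treated as a description of this sample and not as evidence of a general relationship.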