QBM117 Business Statistics
Descriptive Statistics
Objectives
• To distinguish between a variable and data
• To distinguish between quantitative and qualitative data
• To discuss the different levels of measurement
• To summarise quantitative data using frequency distributions and histograms
• To learn how to produce a histogram in Excel
Introduction
• Managers, economists and business analysts frequently have access to large masses of potentially useful data.
• Before the data can be used to support a decision (inferential statistics), they must be organised and summarised (descriptive statistics).
Descriptive Statistics
• Descriptive Statistics involves collecting, organising, summarising and presenting numerical data.
• Once the data are collected and organised, they need to be summarised and presented in such a way that their important features are highlighted.
• Descriptive statistics methods can be applied to data from an entire population and data from a sample.
Variables and Data
• A variable is any characteristic of a population or sample that is of interest to us.
• The term data refers to the actual values of variables.
Example 1
Information concerning a magazine’s readership is of interest to both the publisher and to the magazine’s advertisers. A survey of 100 subscribers included the following questions:
What is your age?
What is your sex?
What is your marital status?
What is your annual income?
What are the variables?
The variables are age, sex, marital status and annual income.
What are the data?
The data are the actual values of the variables:
- for the age variable, the data are the actual ages of the 100 subscribers sampled, e.g. 34 years;
- for the sex variable, the data are the sexes of the 100 subscribers sampled, e.g. Male or Female.
Types of Data
• Data may be either quantitative (numerical) or qualitative (categorical).
• Quantitative data are numerical observations.
• Qualitative data are categorical observations.
Example 1 revisited
Information concerning a magazine’s readership is of interest to both the publisher and to the magazine’s advertisers. A survey of 100 subscribers included the following questions:
What is your age?
What is your sex?
What is your marital status?
What is your annual income?
For each of the questions determine the data type of the possible responses.
What is your age?
quantitative
What is your sex?
qualitative
What is your marital status?
qualitative
What is your annual income?
quantitative
Levels of Measurement
• Data can also be described in terms of the level of measurement attained.
• All data are generated by one of four scales of measurement:
- nominal
- ordinal
- interval
- ratio
Levels of Measurement of Qualitative Data
• Qualitative data are considered to be measured on a nominal scale or an ordinal scale.
• A nominal scale classifies data into distinct categories in which no ordering is implied.
• An ordinal scale classifies data into distinct categories in which ordering is implied.
Example 2
For each of the following examples of qualitative data, determine the level of measurement.
1. Type of stocks owned (Growth, Income, Technology, Other, None)
Nominal
2. Product satisfaction (Very unsatisfied, Unsatisfied, Neutral, Satisfied, Very satisfied)
Ordinal
3. Student Grades (HD, DI, CR, PS, FL)
Ordinal
4. Personal Notebook (Compaq, Toshiba, IBM, Apple, ACER, Other)
Nominal
5. Commodities (Gold, Oil, Aluminium, Copper, Zinc, Wheat, Wool, Cotton, Sugar)
Nominal
6. Faculty rank (Professor, Associate Professor, Senior Lecturer, Lecturer, Associate Lecturer)
Ordinal
Levels of Measurement of Quantitative Data
• Quantitative data are considered to be measured on an interval scale or a ratio scale.
• An interval scale is an ordered scale in which the difference between measurements is a meaningful quantity, but which has no true zero point.
• A ratio scale is an ordered scale in which the difference between measurements is a meaningful quantity and which has a true zero point, so that ratios of measurements are also meaningful.
Example 3
For each of the following examples of quantitative data, determine the level of measurement.
1. Temperature (degrees Celsius or Fahrenheit)
Interval
2. Height (centimeters or inches)
Ratio
3. Calendar Years
Interval
4. Annual income
Ratio
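A short numerical check may help show why the true zero point matters. The sketch below is only an illustration: the two temperatures and two heights are made-up values, not data from the lecture. It compares ratios before and after a change of units. On a ratio scale (height) the ratio survives the change of units; on an interval scale (temperature) it does not.

```python
def celsius_to_fahrenheit(c):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return c * 9 / 5 + 32

def cm_to_inches(cm):
    """Convert centimetres to inches."""
    return cm / 2.54

# Illustrative (assumed) values, not taken from the lecture.
temp_a_c, temp_b_c = 10.0, 20.0
height_a_cm, height_b_cm = 150.0, 180.0

# Interval scale: the ratio depends on the unit, so it is not meaningful.
temp_ratio_c = temp_b_c / temp_a_c
temp_ratio_f = celsius_to_fahrenheit(temp_b_c) / celsius_to_fahrenheit(temp_a_c)

# Ratio scale: the ratio is the same whichever unit is used.
height_ratio_cm = height_b_cm / height_a_cm
height_ratio_in = cm_to_inches(height_b_cm) / cm_to_inches(height_a_cm)

print(temp_ratio_c, round(temp_ratio_f, 2))
print(round(height_ratio_cm, 2), round(height_ratio_in, 2))
```

Differences, by contrast, remain comparable on both scales, which is what makes an interval scale more informative than an ordinal one.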
Example 4
For each of the following examples of data, determine the data type and the level of measurement.
1. Name of Internet provider
qualitative, nominal
2. Monthly charge for Internet service
quantitative, ratio
3. Amount of time spent on the Internet per week
quantitative, ratio
4. Primary purpose for using the Internet
qualitative, nominal
5. Number of emails received per week
quantitative, ratio
6. Number of on-line purchases made in a month
quantitative, ratio
7. Total amount spent on on-line purchases in a month
quantitative, ratio
8. Whether the personal computer has a rewritable CD drive
qualitative, nominal
Graphical and Tabular Methods for Quantitative Data
• The best way to examine large amounts of data is to present it in summary form by constructing appropriate tables and graphs.
• We can then extract the important features from the data from these tables and graphs.
• Often, the first step taken towards summarising a mass of numbers is to form what is known as a frequency distribution.
Frequency Distribution
• A frequency distribution is a tabular summary of a set of data showing the number (frequency) of observations in each of several non-overlapping classes.
• When constructing a frequency distribution you need to
- select an appropriate number of classes
- select an appropriate width for each class
- make sure that classes are non-overlapping and contain all observations
The following table is a guide to the appropriate number of classes for different numbers of observations.
Number of observations    Number of classes
Less than 50              5-7
50-200                    7-9
200-500                   9-10
500-1000                  10-11
1000-5000                 11-13
5000-50000                13-17
More than 50000           17-20
• An alternative rough guide to selecting the appropriate number of classes K required to accommodate n observations is given by Sturges' formula:
K = 1 + 3.3 log10(n)
• Once the number of classes to be used has been chosen, the approximate class width is calculated using the following formula:
Class width = (largest value – smallest value) / number of classes
• The class width chosen should allow for convenient and easy reading.
• You need to ensure that the classes do not overlap and that each observation is contained in a class.
• The classes should then be listed in a column.
• You then need to count the number of observations that fall into each class interval.
• The counts (frequencies) are then listed next to their respective classes.
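The two formulas used in this procedure can be sketched in a few lines of Python. This is only an illustration under the definitions given above; the function names are hypothetical and do not come from the text or from Excel.

```python
import math

def sturges_classes(n):
    """Rough number of classes for n observations: K = 1 + 3.3 log10(n)."""
    return 1 + 3.3 * math.log10(n)

def approx_class_width(values, num_classes):
    """(largest value - smallest value) / number of classes."""
    return (max(values) - min(values)) / num_classes

# Example: 25 observations ranging from 6 to 29, using 6 classes.
k = sturges_classes(25)
width = approx_class_width([6, 29], 6)
print(round(k, 2), round(width, 2))
```

Both results are rough guides only: the class width finally chosen should still be rounded to a value that is convenient and easy to read.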
Example 5 (Exercise 2.41, page 50 of text)
The numbers of items returned to a leading Brisbane retailer by its customers on each of the last 25 days are as follows:
21 8 17 22 19
18 19 14 17 11
6 21 25 19 9
12 16 16 10 29
24 6 21 20 25
Construct a frequency distribution for these data.
There are n=25 observations.
The table suggests that 5-7 classes would be appropriate.
A rough guide to an appropriate number of classes is
K = 1 + 3.3 log10(25) = 5.61 (2 d.p.)
Approximate class width = (29 - 6) / 6 = 3.83 (using 6 classes)
Round this up to 5 as a class width of 5 is easy and convenient.
Now we need to choose non-overlapping intervals of width 5 so that each observation falls into one interval.
Number of items               Tally        Frequency
>5 up to and including 10     IIII         5
>10 up to and including 15    III          3
>15 up to and including 20    IIII IIII    9
>20 up to and including 25    IIII II      7
>25 up to and including 30    I            1
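The tally for Example 5 can be checked with a short Python sketch. It assumes the classes chosen above (width 5, starting at 5, each class being ">lower up to and including upper"); the function name is hypothetical.

```python
import math

returns = [21, 8, 17, 22, 19,
           18, 19, 14, 17, 11,
           6, 21, 25, 19, 9,
           12, 16, 16, 10, 29,
           24, 6, 21, 20, 25]

def frequency_distribution(values, start, width, num_classes):
    """Count observations in classes that are > lower and <= upper."""
    counts = [0] * num_classes
    for v in values:
        # ceil puts a value equal to an upper limit into the class it closes
        index = math.ceil((v - start) / width) - 1
        if not 0 <= index < num_classes:
            raise ValueError(f"{v} falls outside the classes")
        counts[index] += 1
    return [(start + i * width, start + (i + 1) * width, c)
            for i, c in enumerate(counts)]

table = frequency_distribution(returns, start=5, width=5, num_classes=5)
for lower, upper, freq in table:
    print(f">{lower} up to and including {upper}: {freq}")
```

The function raises an error if an observation falls outside every class, which is a simple way of checking that the classes contain all observations.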
Histograms
• The information in a frequency distribution is often grasped more easily if the distribution is graphed.
• The most common graphical technique used for representing a frequency distribution for quantitative data is the frequency histogram.
Frequency Histograms
A frequency histogram is constructed by placing the variable of interest on the horizontal axis, and the frequency on the vertical axis.
The frequency of each class is shown by drawing a rectangle whose base is the class interval on the horizontal axis and whose height is the corresponding frequency.
Example 5 revisited
The numbers of items returned to a leading Brisbane retailer by its customers on each of the last 25 days are as follows:
21 8 17 22 19
18 19 14 17 11
6 21 25 19 9
12 16 16 10 29
24 6 21 20 25
Construct a frequency histogram for these data.
[Figure: Histogram of the Number of Items Returned By Customers. Horizontal axis: Number of Items Returned by Customers (0 to 30 in steps of 5). Vertical axis: Frequency (0 to 8 in steps of 2).]
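As a rough stand-in for the chart, the frequency distribution from Example 5 can be drawn as a text histogram. The sketch below is an assumption-laden illustration, not the method used in the lecture: it turns the histogram on its side, so each class is a row and the length of the bar of `#` characters is the frequency.

```python
classes = [(5, 10), (10, 15), (15, 20), (20, 25), (25, 30)]
frequencies = [5, 3, 9, 7, 1]

def text_histogram(classes, frequencies):
    """Return one text row per class; bar length equals the class frequency."""
    lines = []
    for (lower, upper), freq in zip(classes, frequencies):
        label = f">{lower:2d} to {upper:2d}"
        lines.append(f"{label} | {'#' * freq} {freq}")
    return lines

lines = text_histogram(classes, frequencies)
for line in lines:
    print(line)
```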
Relative Frequency Histograms
• Instead of showing the absolute frequency of observations in each class, it is often preferable to show the proportion of observations falling into each class.
• To do this we replace the class frequency by the relative class frequency, which is calculated as follows:
class relative frequency = class frequency / total number of observations
• We start by forming a relative frequency distribution.
• The frequencies in the frequency distribution are replaced by the relative frequencies.
• We then construct a relative frequency histogram.
• The relative frequency histogram is constructed by placing the relative frequency on the vertical axis (in place of the frequency).
Example 5 revisited
The numbers of items returned to a leading Brisbane retailer by its customers on each of the last 25 days are as follows:
21 8 17 22 19
18 19 14 17 11
6 21 25 19 9
12 16 16 10 29
24 6 21 20 25
Construct a relative frequency distribution for these data.
Number of items               Frequency    Relative Frequency
>5 up to and including 10     5            0.20
>10 up to and including 15    3            0.12
>15 up to and including 20    9            0.36
>20 up to and including 25    7            0.28
>25 up to and including 30    1            0.04
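The relative frequencies for Example 5 follow directly from the class frequencies. A minimal Python sketch, using the frequencies from the frequency distribution (variable names are illustrative):

```python
frequencies = [5, 3, 9, 7, 1]   # class frequencies from Example 5
total = sum(frequencies)        # total number of observations

# class relative frequency = class frequency / total number of observations
relative = [f / total for f in frequencies]

print([round(r, 2) for r in relative])
```

The relative frequencies must sum to 1, which is a quick check on the arithmetic.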
Construct a relative frequency histogram for these data.
[Figure: Relative Frequency Histogram of the Number of Items Returned By Customers. Horizontal axis: Number of Items Returned by Customers (0 to 30 in steps of 5). Vertical axis: Relative Frequency (ticks at 0.08, 0.16, 0.24, 0.32).]
Shapes of Histograms
• The purpose of drawing histograms is to acquire information.
• We describe the shape of a histogram on the basis of the following four characteristics.
- symmetry
- skewness
- number of modes
- bell-shaped
Symmetry
• A histogram is said to be symmetric if, when we draw a vertical line down the centre of the histogram, the two sides are identical in shape and size.
Skewness
• A histogram with a long tail extending to the right is positively skewed.
• A histogram with a long tail extending to the left is negatively skewed.
Number of Modes
• A unimodal histogram is one with a single peak.
• A bimodal histogram is one with two peaks.
• A multimodal histogram is one with several peaks.
Bell-shaped
• A special type of symmetric unimodal histogram is one that is bell-shaped.
• You will discover the importance of this in the next topic.
Cumulative Frequency Distribution
• A variation of the frequency distribution that provides another tabular summary of quantitative data is the cumulative frequency distribution.
• The cumulative frequency distribution contains the same number of classes as the frequency distribution.
• However, the cumulative frequency distribution shows the number of observations less than or equal to the upper class limit of each class.
Cumulative Relative Frequency Distribution
• The cumulative relative frequency distribution shows the proportion of observations with values less than or equal to the upper limit of each class.
• The cumulative relative frequency distribution can be computed either by summing the relative frequencies in the relative frequency distribution, or by dividing the cumulative frequencies by the total number of observations.
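The two ways of computing the cumulative relative frequencies can be compared in a short Python sketch. It uses the class frequencies from Example 5; the variable names are illustrative.

```python
from itertools import accumulate

frequencies = [5, 3, 9, 7, 1]   # class frequencies from Example 5
n = sum(frequencies)

# Method 1: sum the relative frequencies.
relative = [f / n for f in frequencies]
by_summing = list(accumulate(relative))

# Method 2: divide the cumulative frequencies by the number of observations.
cumulative = list(accumulate(frequencies))
by_dividing = [c / n for c in cumulative]

print(cumulative)
print([round(x, 2) for x in by_summing])
print([round(x, 2) for x in by_dividing])
```

Both methods give the same values once rounded; any tiny differences come only from floating-point arithmetic.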
Ogives
• A graph of the cumulative relative frequency is called an ogive.
• The cumulative relative frequency of each class is plotted above the upper limit of the corresponding class, and the points representing the cumulative relative frequencies are then joined by straight lines.
• The ogive is closed at the lower end by extending a straight line to the lower limit of the first class.
Example 5 revisited
The numbers of items returned to a leading Brisbane retailer by its customers on each of the last 25 days are as follows:
21 8 17 22 19
18 19 14 17 11
6 21 25 19 9
12 16 16 10 29
24 6 21 20 25
Construct a cumulative relative frequency distribution for these data.
Number of items               Relative Frequency    Cumulative Relative Frequency
>5 up to and including 10     0.20                  0.20
>10 up to and including 15    0.12                  0.32
>15 up to and including 20    0.36                  0.68
>20 up to and including 25    0.28                  0.96
>25 up to and including 30    0.04                  1.00
Construct an ogive for these data.
[Figure: Ogive of the Number of Items Returned by Customers. Horizontal axis: Number of Items Returned by Customers (5 to 30 in steps of 5). Vertical axis: Cumulative Relative Frequency (0 to 1 in steps of 0.2).]
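The points that make up the ogive can be listed with a short Python sketch. It follows the construction described for ogives: one point above each upper class limit, plus a closing point at the lower limit of the first class. The function name is hypothetical.

```python
from itertools import accumulate

def ogive_points(lower_limit, upper_limits, frequencies):
    """Return (x, cumulative relative frequency) pairs to be joined by straight lines."""
    n = sum(frequencies)
    # Close the ogive at the lower limit of the first class.
    points = [(lower_limit, 0.0)]
    for upper, cum in zip(upper_limits, accumulate(frequencies)):
        points.append((upper, cum / n))
    return points

points = ogive_points(5, [10, 15, 20, 25, 30], [5, 3, 9, 7, 1])
for x, y in points:
    print(x, round(y, 2))
```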
Histograms for Large Data Sets
We have constructed a frequency distribution and histogram for a small data set by hand.
We are now going to construct a frequency distribution and histogram for a large data set.
To do this by hand would be very time consuming.
Excel
There are many computer software packages available which make dealing with large data sets quite manageable.
We will use Excel rather than a statistical package as most students are familiar with Excel.
However, some of the things Excel does are not “statistically” correct.
Defining Class Intervals
Note that the method we use to define class intervals for frequency distributions is slightly different to the method described in the text.
On page 20 of the text (page 19 of the abridged version) the class intervals for the frequency distribution for Example 2.1 are
0 up to but not including 15
15 up to but not including 30
and so on
Using our method the class intervals would be
>0 up to and including 15
>15 up to and including 30
and so on
We use this method as it is consistent with the method of defining intervals used by Excel.
This way manually prepared frequency distributions will be the same as frequency distributions prepared using Excel.
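The difference between the two conventions only shows up for observations that fall exactly on a class limit. The Python sketch below illustrates this with the standard `bisect` module. The third upper limit (45) is an assumed continuation of "and so on", and the function names are hypothetical.

```python
import bisect

# Upper class limits; 45 is an assumed continuation of the pattern.
upper_limits = [15, 30, 45]

def class_index_our_method(value, upper_limits):
    """'>lower up to and including upper' (the convention used in these notes).

    No range checking is done: values are assumed to lie inside the classes.
    """
    return bisect.bisect_left(upper_limits, value)

def class_index_text_method(value, upper_limits):
    """'lower up to but not including upper' (the convention used in the text)."""
    return bisect.bisect_right(upper_limits, value)

for value in (14, 15, 16, 30):
    print(value,
          class_index_our_method(value, upper_limits),
          class_index_text_method(value, upper_limits))
```

An observation of 15 lands in the first class (index 0) under our method but in the second class (index 1) under the text's method, while 14 and 16 are classified identically by both.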
Histograms in Excel
There are instructions on how to produce a histogram in Excel on page 23 of the text (page 21 of the abridged version).
We will modify some of these instructions.
Detailed instructions will be given in Tutorial 1.
The histogram produced by Excel needs some editing.
Excel produces histograms with gaps between the columns.
We need to remove these gaps.
We need to change the horizontal axis label.
We need to remove the legend.
And we need to add an appropriate title to the plot.
Excel allows you to specify the upper limits of the intervals.
However when it creates the histogram, it puts the upper limit in the center of the interval.
The upper limit should be at the extreme right of the interval.
We will use the Chart Wizard to edit the histogram produced by Excel.
As Excel places the upper limit in the middle of the column, we will determine the midpoint of each class and use the Chart Wizard to plot these values instead of the upper limits.
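The class midpoints are simply the upper limits shifted back by half a class width. A minimal Python sketch, using the Example 5 classes as an illustration (the function name is hypothetical, and equal class widths are assumed):

```python
def class_midpoints(upper_limits, width):
    """Midpoint of each class, assuming every class has the same width."""
    return [upper - width / 2 for upper in upper_limits]

# Example 5 classes: upper limits 10, 15, ..., 30 with class width 5.
midpoints = class_midpoints([10, 15, 20, 25, 30], 5)
print(midpoints)
```

Plotting these midpoints as the axis labels puts each label under the centre of its column, which matches where Excel positions the label.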
Example 2.1 from text
We are going to produce a histogram of the salary data from Exercise 2.5 from the text.
The data are stored in the file XR02-46.
Histogram from Excel
[Figure: Histogram as produced by Excel. Horizontal axis: upper limit (labelled 30, 50, 70, 90, 110). Vertical axis: Frequency (0 to 200). Legend: Frequency.]
Edited Histogram
[Figure: Histogram of Annual Salaries of University Academics. Horizontal axis: Salary ($000's) (25 to 105 in steps of 10). Vertical axis: Frequency (0 to 120 in steps of 20).]
Reading for next lecture
• Chapter 2 Section 2.5
• Chapter 3 Sections 3.1-3.2
Exercises
• 2.3
• 2.9 omit part a and revise parts b and c to read “…>20 as the lower limit…”