qbm117 business statistics descriptive statistics

Post on 17-Dec-2015


QBM117 Business Statistics

Descriptive Statistics

Objectives

• To distinguish between a variable and data

• To distinguish between quantitative and qualitative data

• To discuss the different levels of measurement

• To summarise quantitative data using frequency distributions and histograms

• To learn how to produce a histogram in Excel

Introduction

• Managers, economists and business analysts frequently have access to large masses of potentially useful data.

• Before the data can be used to support a decision (inferential statistics), they must be organised and summarised (descriptive statistics).

Descriptive Statistics

• Descriptive Statistics involves collecting, organising, summarising and presenting numerical data.

• Once the data are collected and organised, they need to be summarised and presented in such a way that their important features are highlighted.

• Descriptive statistics methods can be applied to data from an entire population and data from a sample.

Variables and Data

• A variable is any characteristic of a population or sample that is of interest to us.

• The term data refers to the actual values of variables.

Example 1

Information concerning a magazine’s readership is of interest to both the publisher and to the magazine’s advertisers. A survey of 100 subscribers included the following questions:

What is your age?

What is your sex?

What is your marital status?

What is your annual income?

What are the variables?

The variables are age, sex, marital status and annual income.

What are the data?

The data are the actual values of the variables;

for the age variable, the data are the actual ages of the 100 subscribers sampled, e.g. 34 years.

for the sex variable, the data are the sexes of the 100 subscribers sampled, e.g. Male or Female.

Types of Data

• Data may be either quantitative (numerical) or qualitative (categorical).

• Quantitative data are numerical observations.

• Qualitative data are categorical observations.

Example 1 revisited

Information concerning a magazine’s readership is of interest to both the publisher and to the magazine’s advertisers. A survey of 100 subscribers included the following questions:

What is your age?

What is your sex?

What is your marital status?

What is your annual income?

For each of the questions determine the data type of the possible responses.

What is your age?

quantitative

What is your sex?

qualitative

What is your marital status?

qualitative

What is your annual income?

quantitative

Levels of Measurement

• Data can also be described in terms of the level of measurement attained.

• All data are generated by one of four scales of measurement:

- nominal

- ordinal

- interval

- ratio

Levels of Measurement of Qualitative Data

• Qualitative data are considered to be measured on a nominal scale or an ordinal scale.

• A nominal scale classifies data into distinct categories in which no ordering is implied.

• An ordinal scale classifies data into distinct categories in which ordering is implied.

Example 2

For each of the following examples of qualitative data, determine the level of measurement.

1. Type of stocks owned (Growth, Income, Technology, Other, None)

Nominal

2. Product satisfaction (Very unsatisfied, Unsatisfied, Neutral, Satisfied, Very satisfied)

Ordinal

3. Student Grades (HD, DI, CR, PS, FL)

Ordinal

4. Personal Notebook (Compaq, Toshiba, IBM, Apple, ACER, Other)

Nominal

5. Commodities (Gold, Oil, Aluminium, Copper, Zinc, Wheat, Wool, Cotton, Sugar)

Nominal

6. Faculty rank (Professor, Associate Professor, Senior Lecturer, Lecturer, Associate Lecturer)

Ordinal
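The practical difference between the two scales can be sketched in a few lines of Python. This is only an illustration: the function name `is_higher` and the list of stock responses are hypothetical, not part of the lecture. On an ordinal scale it makes sense to ask whether one category ranks above another; on a nominal scale all we can sensibly do is count how often each category occurs.

```python
from collections import Counter

# Ordinal scale: the categories are listed from lowest to highest.
SATISFACTION_SCALE = [
    "Very unsatisfied", "Unsatisfied", "Neutral", "Satisfied", "Very satisfied",
]

def is_higher(first, second, ordered_scale):
    """Return True if `first` ranks above `second` on an ordinal scale."""
    return ordered_scale.index(first) > ordered_scale.index(second)

# Nominal scale: no ordering is implied, so we only count the categories.
# These six responses are hypothetical, made up for the illustration.
stocks_owned = ["Growth", "Income", "Growth", "None", "Technology", "Growth"]
stock_counts = Counter(stocks_owned)

print(is_higher("Satisfied", "Neutral", SATISFACTION_SCALE))
print(stock_counts["Growth"])
```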

Levels of Measurement of Quantitative Data

• Quantitative data are considered to be measured on an interval scale or a ratio scale.

• An interval scale is an ordered scale in which the difference between measurements is a meaningful quantity that does not involve a true zero point.

• A ratio scale is an ordered scale in which the difference between measurements is a meaningful quantity and there is a true zero point, so that ratios of measurements are also meaningful.

Example 3

For each of the following examples of quantitative data, determine the level of measurement.

1. Temperature (degrees Celsius or Fahrenheit)

Interval

2. Height (centimeters or inches)

Ratio

3. Calendar Years

Interval

4. Annual income

Ratio
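One way to see why temperature is only interval while height is ratio is to change units and check what survives. The sketch below is illustrative only: the temperatures (10 and 20 degrees Celsius) and heights (90 and 180 cm) are made-up values, and the function names are hypothetical. The "twice as much" statement holds for height in any unit, but not for temperature, because 0 degrees is not a true zero.

```python
def celsius_to_fahrenheit(celsius):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32

def cm_to_inches(centimetres):
    """Convert a length from centimetres to inches."""
    return centimetres / 2.54

# Interval scale: the ratio of two temperatures depends on the unit used.
ratio_celsius = 20 / 10
ratio_fahrenheit = celsius_to_fahrenheit(20) / celsius_to_fahrenheit(10)

# Ratio scale: the ratio of two heights is the same in any unit.
ratio_cm = 180 / 90
ratio_inches = cm_to_inches(180) / cm_to_inches(90)

print(ratio_celsius, ratio_fahrenheit)  # the two ratios differ
print(ratio_cm, ratio_inches)           # the two ratios agree
```

Differences, on the other hand, are meaningful on both scales: a gap of 10 degrees Celsius is always a gap of 18 degrees Fahrenheit.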

Example 4

For each of the following examples of data, determine the data type and the level of measurement.

1. Name of Internet provider

qualitative, nominal

2. Monthly charge for Internet service

quantitative, ratio

3. Amount of time spent on the Internet per week

quantitative, ratio

4. Primary purpose for using the Internet

qualitative, nominal

5. Number of emails received per week

quantitative, ratio

6. Number of on-line purchases made in a month

quantitative, ratio

7. Total amount spent on on-line purchases in a month

quantitative, ratio

8. Whether the personal computer has a rewritable CD drive

qualitative, nominal

Graphical and Tabular Methods for Quantitative Data

• The best way to examine large amounts of data is to present it in summary form by constructing appropriate tables and graphs.

• We can then extract the important features from the data from these tables and graphs.

• Often, the first step taken towards summarising a mass of numbers is to form what is known as a frequency distribution.

Frequency Distribution

• A frequency distribution is a tabular summary of a set of data showing the number (frequency) of observations in each of several non-overlapping classes.

• When constructing a frequency distribution you need to

- select an appropriate number of classes

- select an appropriate width for each class

- make sure that classes are non-overlapping and contain all observations

The following table is a guide to the appropriate number of classes for different numbers of observations.

Number of observations    Number of classes
Less than 50              5-7
50-200                    7-9
200-500                   9-10
500-1000                  10-11
1000-5000                 11-13
5000-50000                13-17
More than 50000           17-20

• An alternative rough guide to selecting the appropriate number of classes K required to accommodate n observations is given by Sturges' formula:

K = 1 + 3.3 log10(n)

• Once the number of classes to be used has been chosen, the approximate class width is calculated using the following formula:

Class width = (largest value – smallest value) / number of classes

• The class width chosen should allow for convenient and easy reading.

• You need to ensure that the classes do not overlap and that each observation is contained in a class.

• The classes should then be listed in a column.

• You then need to count the number of observations that fall into each class interval.

• The counts (frequencies) are then listed next to their respective classes.
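The two formulas above are easy to check with a short Python sketch. The function names `sturges_classes` and `approximate_class_width` are hypothetical, chosen for this illustration, and the sample size of 100 is only an example. Both results are rough guides: the number of classes still has to be rounded to a whole number, and the width to a convenient value.

```python
import math

def sturges_classes(n):
    """Rough guide to the number of classes: K = 1 + 3.3 * log10(n)."""
    return 1 + 3.3 * math.log10(n)

def approximate_class_width(values, number_of_classes):
    """(largest value - smallest value) / number of classes."""
    return (max(values) - min(values)) / number_of_classes

# For n = 100 observations the formula gives about 7.6,
# which agrees with the 7-9 classes suggested by the table for 50-200 observations.
print(round(sturges_classes(100), 2))
```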

Example 5 (Exercise 2.41, page 50 of text)

The number of items returned to a leading Brisbane retailer by its customers recorded for the last 25 days are as follows:

21 8 17 22 19

18 19 14 17 11

6 21 25 19 9

12 16 16 10 29

24 6 21 20 25

Construct a frequency distribution for these data.

There are n=25 observations.

The table suggests that 5-7 classes would be appropriate.

A rough guide to an appropriate number of classes is

K = 1 + 3.3 log10(25) = 5.61 (2 d.p.), which we round up to 6 classes.

Approximate class width = (29 – 6) / 6 = 3.83 (2 d.p.)

Round this up to 5 as a class width of 5 is easy and convenient.

Now we need to choose non-overlapping intervals of width 5 so that each observation falls into one interval.

Number of items               Tally        Frequency
>5 up to and including 10     IIII         5
>10 up to and including 15    III          3
>15 up to and including 20    IIII IIII    9
>20 up to and including 25    IIII II      7
>25 up to and including 30    I            1
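The tallying can also be checked in Python. This is a minimal sketch, not the method used in the tutorials: the function name `frequency_distribution` is hypothetical, and the classes are written as a list of limits so that each class is "greater than the lower limit, up to and including the upper limit".

```python
# The 25 daily counts of returned items from Example 5.
items_returned = [
    21, 8, 17, 22, 19,
    18, 19, 14, 17, 11,
    6, 21, 25, 19, 9,
    12, 16, 16, 10, 29,
    24, 6, 21, 20, 25,
]

# Class limits: >5 up to and including 10, >10 up to and including 15, ...
class_limits = [5, 10, 15, 20, 25, 30]

def frequency_distribution(values, limits):
    """Count the observations with lower < value <= upper for each class."""
    counts = [0] * (len(limits) - 1)
    for value in values:
        for i in range(len(counts)):
            if limits[i] < value <= limits[i + 1]:
                counts[i] += 1
                break
        else:
            # Every observation must fall into exactly one class.
            raise ValueError(f"{value} is not contained in any class")
    return counts

frequencies = frequency_distribution(items_returned, class_limits)
for lower, upper, frequency in zip(class_limits, class_limits[1:], frequencies):
    print(f">{lower} up to and including {upper}: {frequency}")
```

The frequencies should add up to the number of observations, which is a quick check that no observation was missed or counted twice.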


Histograms

• The information in a frequency distribution is often grasped more easily if the distribution is graphed.

• The most common graphical technique used for representing a frequency distribution for quantitative data is the frequency histogram.

Frequency Histograms

A frequency histogram is constructed by placing the variable of interest on the horizontal axis, and the frequency on the vertical axis.

The frequency of each class is shown by drawing a rectangle whose base is the class interval on the horizontal axis and whose height is the corresponding frequency.

Example 5 revisited

The number of items returned to a leading Brisbane retailer by its customers recorded for the last 25 days are as follows:

21 8 17 22 19

18 19 14 17 11

6 21 25 19 9

12 16 16 10 29

24 6 21 20 25

Construct a frequency histogram for these data.

[Figure: Histogram of the Number of Items Returned By Customers. Horizontal axis: Number of Items Returned by Customers (0 to 30 in steps of 5). Vertical axis: Frequency (ticks at 0, 2, 4, 6, 8).]

Relative Frequency Histograms

• Instead of showing the absolute frequency of observations in each class, it is often preferable to show the proportion of observations falling into each class.

• To do this we replace the class frequency by the relative class frequency, which is calculated as follows:

class relative frequency = class frequency / total number of observations

• We start by forming a relative frequency distribution.

• The frequencies in the frequency distribution are replaced by the relative frequencies.

• We then construct a relative frequency histogram.

• The relative frequency histogram is constructed by placing the relative frequency on the vertical axis (in place of the frequency).
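As a small sketch of the calculation, the function below (the name `relative_frequencies` is hypothetical) divides each class frequency by the total number of observations. It is applied here to the class frequencies found for Example 5.

```python
def relative_frequencies(frequencies):
    """Divide each class frequency by the total number of observations."""
    total = sum(frequencies)
    return [frequency / total for frequency in frequencies]

# Class frequencies from the frequency distribution in Example 5.
example_frequencies = [5, 3, 9, 7, 1]
example_relative = relative_frequencies(example_frequencies)

print([round(r, 2) for r in example_relative])
```

The relative frequencies always sum to 1, so the shape of the histogram is unchanged; only the scale on the vertical axis differs.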

Example 5 revisited

The number of items returned to a leading Brisbane retailer by its customers recorded for the last 25 days are as follows:

21 8 17 22 19

18 19 14 17 11

6 21 25 19 9

12 16 16 10 29

24 6 21 20 25

Construct a relative frequency distribution for these data.

Number of items               Frequency    Relative Frequency
>5 up to and including 10     5            0.20
>10 up to and including 15    3            0.12
>15 up to and including 20    9            0.36
>20 up to and including 25    7            0.28
>25 up to and including 30    1            0.04

Construct a relative frequency histogram for these data.

[Figure: Relative Frequency Histogram of the Number of Items Returned By Customers. Horizontal axis: Number of Items Returned by Customers (0 to 30 in steps of 5). Vertical axis: Relative Frequency (ticks at 0.08, 0.16, 0.24, 0.32).]

Shapes of Histograms

• The purpose of drawing histograms is to acquire information.

• We describe the shape of a histogram on the basis of the following four characteristics.

- symmetry

- skewness

- number of modes

- bell-shaped

Symmetry

• A histogram is said to be symmetric if, when we draw a vertical line down the centre of the histogram, the two sides are identical in shape and size.

Skewness

• A histogram with a long tail extending to the right is positively skewed.

• A histogram with a long tail extending to the left is negatively skewed.

Number of Modes

• A unimodal histogram is one with a single peak.

• A bimodal histogram is one with two peaks.

• A multimodal histogram is one with several peaks.

Bell-shaped

• A special type of symmetric unimodal histogram is one that is bell-shaped.

• You will discover the importance of this in the next topic.

Cumulative Frequency Distribution

• A variation of the frequency distribution that provides another tabular summary of quantitative data is the cumulative frequency distribution.

• The cumulative frequency distribution contains the same number of classes as the frequency distribution.

• However, the cumulative frequency distribution shows the number of observations less than or equal to the upper class limit of each class.

Cumulative Relative Frequency Distribution

• The cumulative relative frequency distribution shows the proportion of observations with values less than or equal to the upper limit of each class.

• The cumulative relative frequency distribution can be computed either by summing the relative frequencies in the relative frequency distribution, or by dividing the cumulative frequencies by the total number of observations.
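The two ways of computing the cumulative relative frequencies can be compared with a short sketch. It uses the class frequencies from Example 5 and the standard library function `itertools.accumulate` for the running totals; the variable names are chosen for this illustration only.

```python
from itertools import accumulate

# Class frequencies from the frequency distribution in Example 5.
frequencies = [5, 3, 9, 7, 1]
n = sum(frequencies)

# Method 1: sum the relative frequencies class by class.
relative = [frequency / n for frequency in frequencies]
by_summing_relative = list(accumulate(relative))

# Method 2: divide the cumulative frequencies by the number of observations.
cumulative_frequencies = list(accumulate(frequencies))
by_dividing_cumulative = [cf / n for cf in cumulative_frequencies]

print(cumulative_frequencies)
print([round(value, 2) for value in by_dividing_cumulative])
```

Both methods give the same values (up to tiny rounding differences in computer arithmetic), and the last value is always 1.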

Ogives

• A graph of the cumulative relative frequency is called an ogive.

• The cumulative relative frequency of each class is plotted above the upper limit of the corresponding class, and the points representing the cumulative relative frequencies are then joined by straight lines.

• The ogive is closed at the lower end by extending a straight line to the lower limit of the first class.
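The points that make up an ogive can be listed with a few lines of Python. This is a sketch under one stated assumption: the closing point at the lower limit of the first class is given a height of 0, since no observations lie at or below that limit. The function name `ogive_points` is hypothetical, and the numbers are the class limits and cumulative relative frequencies for Example 5.

```python
def ogive_points(class_limits, cumulative_relative_frequencies):
    """Return the (x, y) points of an ogive.

    The first point closes the ogive at the lower limit of the first class
    (height assumed to be 0); each later point sits above a class upper limit.
    """
    closing_point = (class_limits[0], 0.0)
    upper_limits = class_limits[1:]
    return [closing_point] + list(zip(upper_limits, cumulative_relative_frequencies))

points = ogive_points([5, 10, 15, 20, 25, 30], [0.20, 0.32, 0.68, 0.96, 1.00])
for x, y in points:
    print(x, y)
```

Joining consecutive points with straight lines gives the ogive.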

Example 5 revisited

The number of items returned to a leading Brisbane retailer by its customers recorded for the last 25 days are as follows:

21 8 17 22 19

18 19 14 17 11

6 21 25 19 9

12 16 16 10 29

24 6 21 20 25

Construct a cumulative relative frequency distribution for these data.

Number of items               Relative Frequency    Cumulative Relative Frequency
>5 up to and including 10     0.20                  0.20
>10 up to and including 15    0.12                  0.32
>15 up to and including 20    0.36                  0.68
>20 up to and including 25    0.28                  0.96
>25 up to and including 30    0.04                  1.00

Construct an ogive for these data.

[Figure: Ogive of the Number of Items Returned by Customers. Horizontal axis: Number of Items Returned by Customers (5 to 30 in steps of 5). Vertical axis: Cumulative Relative Frequency (0 to 1 in steps of 0.2).]

Histograms for Large Data Sets

We have constructed a frequency distribution and histogram for a small data set by hand.

We are now going to construct a frequency distribution and histogram for a large data set.

To do this by hand would be very time consuming.

Excel

There are many computer software packages available which make dealing with large data sets quite manageable.

We will use Excel rather than a statistical package as most students are familiar with Excel.

However, some of the things Excel does are not “statistically” correct.

Defining Class Intervals

Note that the method we use to define class intervals for frequency distributions is slightly different to the method described in the text.

On page 20 of the text (page 19 of the abridged version) the class intervals for the frequency distribution for Example 2.1 are

0 up to but not including 15

15 up to but not including 30

and so on

Using our method the class intervals would be

>0 up to and including 15

>15 up to and including 30

and so on

We use this method as it is consistent with the method of defining intervals used by Excel.

This way manually prepared frequency distributions will be the same as frequency distributions prepared using Excel.
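The two conventions differ only for observations that fall exactly on a class limit. The sketch below illustrates this for the limits 0, 15 and 30; the function names are hypothetical and the code is not taken from the text or from Excel.

```python
def class_in_text_method(value, limits):
    """Text's convention: lower limit up to but NOT including the upper limit."""
    for i in range(len(limits) - 1):
        if limits[i] <= value < limits[i + 1]:
            return i
    return None  # the value is not contained in any class

def class_in_our_method(value, limits):
    """Our convention: greater than the lower limit, up to AND including the upper limit."""
    for i in range(len(limits) - 1):
        if limits[i] < value <= limits[i + 1]:
            return i
    return None  # the value is not contained in any class

limits = [0, 15, 30]

# An observation of exactly 15 lands in different classes:
# class 1 (15 up to but not including 30) under the text's convention,
# class 0 (>0 up to and including 15) under ours.
print(class_in_text_method(15, limits), class_in_our_method(15, limits))
```

For any observation that is not exactly equal to a class limit, the two conventions place it in the same class.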

Histograms in Excel

There are instructions on how to produce a histogram in Excel on page 23 of the text (page 21 of the abridged version).

We will modify some of these instructions.

Detailed instructions will be given in Tutorial 1.

The histogram produced by Excel needs some editing.

Excel produces histograms with gaps between the columns.

We need to remove these gaps.

We need to change the horizontal axis label.

We need to remove the legend.

And we need to add an appropriate title to the plot.

Excel allows you to specify the upper limits of the intervals.

However when it creates the histogram, it puts the upper limit in the center of the interval.

The upper limit should be at the extreme right of the interval.

We will use the Chart Wizard to edit the histogram produced by Excel.

As Excel places the upper limit in the middle of the column, we will determine the midpoint of each class and use the Chart Wizard to plot these values instead of the upper limits.
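The midpoints are simple to work out: each one lies half a class width below the upper limit of its class. The sketch below assumes equal class widths; the function name `class_midpoints` is hypothetical, and the upper limits of 30, 40, ..., 110 (in $000's) with a class width of 10 are illustrative values assumed for this example.

```python
def class_midpoints(upper_limits, class_width):
    """Midpoint of each class: upper limit minus half the class width."""
    return [upper - class_width / 2 for upper in upper_limits]

# Assumed upper limits in $000's: 30, 40, ..., 110, with class width 10.
upper_limits = list(range(30, 111, 10))
midpoints = class_midpoints(upper_limits, 10)

print(midpoints)
```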

Example 2.1 from text

We are going to produce a histogram of the salary data from Exercise 2.5 from the text.

The data are stored in the file XR02-46.

Histogram from Excel

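[Figure: Excel's unedited output, titled "Histogram". Horizontal axis: upper limit (labels at 30, 50, 70, 90, 110). Vertical axis: Frequency (0 to 200). Legend: Frequency.]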

Edited Histogram

[Figure: Histogram of Annual Salaries of University Academics. Horizontal axis: Salary ($000's), class midpoints 25 to 105 in steps of 10. Vertical axis: Frequency (0 to 120 in steps of 20).]

Reading for next lecture

• Chapter 2 Section 2.5

• Chapter 3 Sections 3.1-3.2

Exercises

• 2.3

• 2.9 omit part a and revise parts b and c to read “…>20 as the lower limit…”
