QBM117 Business Statistics Descriptive Statistics

Upload: dorothy-hoover

Post on 17-Dec-2015


Page 1: QBM117 Business Statistics Descriptive Statistics

QBM117 Business Statistics

Descriptive Statistics

Page 2: QBM117 Business Statistics Descriptive Statistics

Objectives

• To distinguish between a variable and data

• To distinguish between quantitative and qualitative data

• To discuss the different levels of measurement

• To summarise quantitative data using frequency distributions and histograms

• To learn how to produce a histogram in Excel

Page 3: QBM117 Business Statistics Descriptive Statistics

Introduction

• Managers, economists and business analysts frequently have access to large masses of potentially useful data.

• Before the data can be used to support a decision (inferential statistics), they must be organised and summarised (descriptive statistics).

Page 4: QBM117 Business Statistics Descriptive Statistics

Descriptive Statistics

• Descriptive Statistics involves collecting, organising, summarising and presenting numerical data.

• Once the data are collected and organised, they need to be summarised and presented in such a way that their important features are highlighted.

• Descriptive statistics methods can be applied to data from an entire population and data from a sample.

Page 5: QBM117 Business Statistics Descriptive Statistics

Variables and Data

• A variable is any characteristic of a population or sample that is of interest to us.

• The term data refers to the actual values of variables.

Page 6: QBM117 Business Statistics Descriptive Statistics

Example 1

Information concerning a magazine’s readership is of interest to both the publisher and to the magazine’s advertisers. A survey of 100 subscribers included the following questions:

What is your age?

What is your sex?

What is your marital status?

What is your annual income?

Page 7: QBM117 Business Statistics Descriptive Statistics

What are the variables?

The variables are age, sex, marital status and annual income.

What are the data?

The data are the actual values of the variables;

for the age variable, the data are the actual ages of the 100 subscribers sampled, e.g. 34 years.

for the sex variable, the data are the sexes of the 100 subscribers sampled, e.g. Male or Female.

Page 8: QBM117 Business Statistics Descriptive Statistics

Types of Data

• Data may be either quantitative (numerical) or qualitative (categorical).

• Quantitative data are numerical observations.

• Qualitative data are categorical observations.

Page 9: QBM117 Business Statistics Descriptive Statistics

Example 1 revisited

Information concerning a magazine’s readership is of interest to both the publisher and to the magazine’s advertisers. A survey of 100 subscribers included the following questions:

What is your age?

What is your sex?

What is your marital status?

What is your annual income?

Page 10: QBM117 Business Statistics Descriptive Statistics

For each of the questions determine the data type of the possible responses.

What is your age?

quantitative

What is your sex?

qualitative

What is your marital status?

qualitative

What is your annual income?

quantitative

Page 11: QBM117 Business Statistics Descriptive Statistics

Levels of Measurement

• Data can also be described in terms of the level of measurement attained.

• All data are generated by one of four scales of measurement:

- nominal

- ordinal

- interval

- ratio

Page 12: QBM117 Business Statistics Descriptive Statistics

Levels of Measurement of Qualitative Data

• Qualitative data are considered to be measured on a nominal scale or an ordinal scale.

• A nominal scale classifies data into distinct categories in which no ordering is implied.

• An ordinal scale classifies data into distinct categories in which ordering is implied.

Page 13: QBM117 Business Statistics Descriptive Statistics

Example 2

For each of the following examples of qualitative data, determine the level of measurement.

1. Type of stocks owned (Growth, Income, Technology, Other, None)

Nominal

2. Product satisfaction (Very unsatisfied, Unsatisfied, Neutral, Satisfied, Very satisfied)

Ordinal

Page 14: QBM117 Business Statistics Descriptive Statistics

3. Student Grades (HD, DI, CR, PS, FL)

Ordinal

4. Personal Notebook (Compaq, Toshiba, IBM, Apple, ACER, Other)

Nominal

5. Commodities (Gold, Oil, Aluminium, Copper, Zinc, Wheat, Wool, Cotton, Sugar)

Nominal

6. Faculty rank (Professor, Associate Professor, Senior Lecturer, Lecturer, Associate Lecturer)

Ordinal

Page 15: QBM117 Business Statistics Descriptive Statistics

Levels of Measurement of Quantitative Data

• Quantitative data are considered to be measured on an interval scale or a ratio scale.

• An interval scale is an ordered scale in which the difference between measurements is a meaningful quantity, but there is no true zero point.

• A ratio scale is an ordered scale in which the difference between measurements is a meaningful quantity and there is a true zero point, so ratios of measurements are also meaningful.

Page 16: QBM117 Business Statistics Descriptive Statistics

Example 3

For each of the following examples of quantitative data, determine the level of measurement.

1. Temperature (degrees Celsius or Fahrenheit)

Interval

2. Height (centimeters or inches)

Ratio

3. Calendar Years

Interval

4. Annual income

Ratio
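One way to see the difference between an interval scale and a ratio scale is to check whether the ratio of two measurements survives a change of units. The sketch below is illustrative only: the function names and the sample values (10 and 20 degrees, 90 and 180 cm) are assumptions chosen for the demonstration, not part of the lecture.

```python
def celsius_to_fahrenheit(c):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return c * 9 / 5 + 32

def cm_to_inches(cm):
    """Convert centimetres to inches (1 inch = 2.54 cm)."""
    return cm / 2.54

# Interval scale: the zero point is arbitrary, so the ratio depends on the unit.
ratio_c = 20 / 10
ratio_f = celsius_to_fahrenheit(20) / celsius_to_fahrenheit(10)

# Ratio scale: zero means "none", so the ratio is the same in any unit.
ratio_cm = 180 / 90
ratio_in = cm_to_inches(180) / cm_to_inches(90)

print("temperature ratios:", ratio_c, ratio_f)
print("height ratios:", ratio_cm, ratio_in)
```

The temperature ratio changes when the unit changes, so "twice as hot" is not a meaningful statement, whereas "twice as tall" is.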

Page 17: QBM117 Business Statistics Descriptive Statistics

Example 4

For each of the following examples of data, determine the data type and the level of measurement.

1. Name of Internet provider

qualitative, nominal

2. Monthly charge for Internet service

quantitative, ratio

Page 18: QBM117 Business Statistics Descriptive Statistics

3. Amount of time spent on the Internet per week

quantitative, ratio

4. Primary purpose for using the Internet

qualitative, nominal

5. Number of emails received per week

quantitative, ratio

6. Number of on-line purchases made in a month

quantitative, ratio

Page 19: QBM117 Business Statistics Descriptive Statistics

7. Total amount spent on on-line purchases in a month

quantitative, ratio

8. Whether the personal computer has a rewritable CD drive

qualitative, nominal

Page 20: QBM117 Business Statistics Descriptive Statistics

Graphical and Tabular Methods for Quantitative Data

• The best way to examine large amounts of data is to present it in summary form by constructing appropriate tables and graphs.

• We can then extract the important features from the data from these tables and graphs.

• Often, the first step taken towards summarising a mass of numbers is to form what is known as a frequency distribution.

Page 21: QBM117 Business Statistics Descriptive Statistics

Frequency Distribution

• A frequency distribution is a tabular summary of a set of data showing the number (frequency) of observations in each of several non-overlapping classes.

• When constructing a frequency distribution you need to

- select an appropriate number of classes

- select an appropriate width for each class

- make sure that classes are non-overlapping and contain all observations

Page 22: QBM117 Business Statistics Descriptive Statistics

The following table is a guide to the appropriate number of classes for different numbers of observations.

Number of observations    Number of classes
Less than 50              5-7
50-200                    7-9
200-500                   9-10
500-1000                  10-11
1000-5000                 11-13
5000-50000                13-17
More than 50000           17-20

Page 23: QBM117 Business Statistics Descriptive Statistics

• An alternative rough guide to selecting the appropriate number of classes K required to accommodate n observations is given by Sturges' formula:

$K = 1 + 3.3 \log_{10} n$

• Once the number of classes to be used has been chosen, the approximate class width is calculated using the following formula:

$\text{Class width} = \dfrac{\text{largest value} - \text{smallest value}}{\text{number of classes}}$
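The two formulas above can be sketched in a few lines of Python. This is a minimal illustration rather than a prescribed method: the function names and the sample observations are made up for the example.

```python
import math

def sturges_classes(n):
    """Suggested number of classes K = 1 + 3.3 * log10(n) for n observations."""
    return 1 + 3.3 * math.log10(n)

def approximate_class_width(values, number_of_classes):
    """(largest value - smallest value) / number of classes."""
    return (max(values) - min(values)) / number_of_classes

sample = [12, 47, 33, 8, 25, 40, 19, 30]   # made-up observations
k = round(sturges_classes(len(sample)))     # round K to a whole number of classes
width = approximate_class_width(sample, k)
print(k, width)
```

In practice the computed width is only a starting point; it would normally be rounded to a convenient value before the classes are listed.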

Page 24: QBM117 Business Statistics Descriptive Statistics

• The class width chosen should allow for convenient and easy reading.

• You need to ensure that the classes do not overlap and that each observation is contained in a class.

• The classes should then be listed in a column.

• You then need to count the number of observations that fall into each class interval.

• The counts (frequencies) are then listed next to their respective classes.

Page 25: QBM117 Business Statistics Descriptive Statistics

Example 5 (Exercise 2.41, page 50 of text)

The numbers of items returned to a leading Brisbane retailer by its customers on each of the last 25 days are as follows:

21 8 17 22 19

18 19 14 17 11

6 21 25 19 9

12 16 16 10 29

24 6 21 20 25

Page 26: QBM117 Business Statistics Descriptive Statistics

Construct a frequency distribution for these data.

There are n=25 observations.

The table suggests that 5-7 classes would be appropriate.

A rough guide to an appropriate number of classes is

$K = 1 + 3.3 \log_{10} 25 = 5.61$ (2 d.p.)

Page 27: QBM117 Business Statistics Descriptive Statistics

$\text{Approximate class width} = \dfrac{29 - 6}{6} = 3.83$

Round this up to 5 as a class width of 5 is easy and convenient.

Now we need to choose non-overlapping intervals of width 5 so that each observation falls into one interval.

Page 28: QBM117 Business Statistics Descriptive Statistics

Number of items               Tally        Frequency
>5 up to and including 10     IIIII        5
>10 up to and including 15    III          3
>15 up to and including 20    IIIII IIII   9
>20 up to and including 25    IIIII II     7
>25 up to and including 30    I            1

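The tally above can be checked with a short Python sketch. The function name and its parameters are assumptions made for illustration; the data and the classes (each class excludes its lower limit and includes its upper limit) are those of Example 5.

```python
returns = [21, 8, 17, 22, 19,
           18, 19, 14, 17, 11,
           6, 21, 25, 19, 9,
           12, 16, 16, 10, 29,
           24, 6, 21, 20, 25]

def frequency_distribution(values, lowest, width, number_of_classes):
    """Count the values falling in each class of the form (lower, upper]."""
    classes = [(lowest + i * width, lowest + (i + 1) * width)
               for i in range(number_of_classes)]
    counts = [sum(1 for v in values if lower < v <= upper)
              for lower, upper in classes]
    return classes, counts

classes, counts = frequency_distribution(returns, lowest=5, width=5,
                                         number_of_classes=5)
for (lower, upper), count in zip(classes, counts):
    print(f">{lower} up to and including {upper}: {count}")
```

A useful sanity check is that the frequencies add up to the number of observations, here 25.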

Page 29: QBM117 Business Statistics Descriptive Statistics

Histograms

• The information in a frequency distribution is often grasped more easily if the distribution is graphed.

• The most common graphical technique used for representing a frequency distribution for quantitative data is the frequency histogram.

Page 30: QBM117 Business Statistics Descriptive Statistics

Frequency Histograms

A frequency histogram is constructed by placing the variable of interest on the horizontal axis, and the frequency on the vertical axis.

The frequency of each class is shown by drawing a rectangle whose base is the class interval on the horizontal axis and whose height is the corresponding frequency.
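As a rough illustration of the idea, the sketch below prints a text version of a frequency histogram. It is turned on its side, so the classes run down the page and each bar's length, rather than its height, equals the class frequency. The function name, class labels and frequencies are hypothetical.

```python
def text_histogram(class_labels, frequencies, symbol="#"):
    """Return one line per class; the bar length equals the class frequency."""
    label_width = max(len(label) for label in class_labels)
    return [f"{label.rjust(label_width)} | {symbol * freq} {freq}"
            for label, freq in zip(class_labels, frequencies)]

labels = ["0-10", "10-20", "20-30", "30-40"]   # hypothetical classes
freqs = [2, 6, 4, 1]                            # hypothetical frequencies
lines = text_histogram(labels, freqs)
for line in lines:
    print(line)
```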

Page 31: QBM117 Business Statistics Descriptive Statistics

Example 5 revisited

The numbers of items returned to a leading Brisbane retailer by its customers on each of the last 25 days are as follows:

21 8 17 22 19

18 19 14 17 11

6 21 25 19 9

12 16 16 10 29

24 6 21 20 25

Page 32: QBM117 Business Statistics Descriptive Statistics

Construct a frequency histogram for these data.

[Figure: Histogram of the Number of Items Returned By Customers. Horizontal axis: Number of Items Returned by Customers (0 to 30 in steps of 5). Vertical axis: Frequency (0 to 8).]

Page 33: QBM117 Business Statistics Descriptive Statistics

Relative Frequency Histograms

• Instead of showing the absolute frequency of observations in each class, it is often preferable to show the proportion of observations falling into each class.

• To do this we replace the class frequency by the relative class frequency, which is calculated as follows:

$\text{class relative frequency} = \dfrac{\text{class frequency}}{\text{total number of observations}}$

Page 34: QBM117 Business Statistics Descriptive Statistics

• We start by forming a relative frequency distribution.

• The frequencies in the frequency distribution are replaced by the relative frequencies.

• We then construct a relative frequency histogram.

• The relative frequency histogram is constructed by placing the relative frequency on the vertical axis (in place of the frequency).

Page 35: QBM117 Business Statistics Descriptive Statistics

Example 5 revisited

The numbers of items returned to a leading Brisbane retailer by its customers on each of the last 25 days are as follows:

21 8 17 22 19

18 19 14 17 11

6 21 25 19 9

12 16 16 10 29

24 6 21 20 25

Page 36: QBM117 Business Statistics Descriptive Statistics

Construct a relative frequency distribution for these data.

Number of items               Frequency   Relative Frequency
>5 up to and including 10     5           0.20
>10 up to and including 15    3           0.12
>15 up to and including 20    9           0.36
>20 up to and including 25    7           0.28
>25 up to and including 30    1           0.04
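The relative frequencies can be reproduced by dividing each class frequency by the total number of observations. The following is a minimal sketch; the function name is an assumption, and the frequencies are those found for Example 5.

```python
def relative_frequencies(frequencies):
    """Divide each class frequency by the total number of observations."""
    total = sum(frequencies)
    return [f / total for f in frequencies]

frequencies = [5, 3, 9, 7, 1]          # class frequencies from Example 5
rel = relative_frequencies(frequencies)
print([round(r, 2) for r in rel])
```

The relative frequencies should always sum to 1, which gives a quick check on the arithmetic.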

Page 37: QBM117 Business Statistics Descriptive Statistics

Construct a relative frequency histogram for these data.

[Figure: Relative Frequency Histogram of the Number of Items Returned By Customers. Horizontal axis: Number of Items Returned by Customers (0 to 30 in steps of 5). Vertical axis: Relative Frequency (ticks at 0.08, 0.16, 0.24, 0.32).]

Page 38: QBM117 Business Statistics Descriptive Statistics

Shapes of Histograms

• The purpose of drawing histograms is to acquire information.

• We describe the shape of a histogram on the basis of the following four characteristics.

- symmetry

- skewness

- number of modes

- bell-shaped

Page 39: QBM117 Business Statistics Descriptive Statistics

Symmetry

• A histogram is said to be symmetric if, when we draw a vertical line down the centre of the histogram, the two sides are identical in shape and size.

Page 40: QBM117 Business Statistics Descriptive Statistics

Skewness

• A histogram with a long tail extending to the right is positively skewed.

• A histogram with a long tail extending to the left is negatively skewed.

Page 41: QBM117 Business Statistics Descriptive Statistics

Number of Modes

• A unimodal histogram is one with a single peak.

• A bimodal histogram is one with two peaks.

• A multimodal histogram is one with several peaks.
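A crude way to count peaks from a list of class frequencies is sketched below. It rests on an assumption that is not part of the lecture: a "peak" is taken to be a class whose frequency is strictly greater than that of both neighbouring classes, with a missing neighbour at either end treated as frequency 0. Flat-topped peaks and small random bumps are not handled, so a visual judgement of the histogram is still needed. The function names and frequencies are hypothetical.

```python
def count_peaks(frequencies):
    """Count classes whose frequency exceeds both neighbours (edges compare with 0)."""
    padded = [0] + list(frequencies) + [0]
    return sum(1 for i in range(1, len(padded) - 1)
               if padded[i] > padded[i - 1] and padded[i] > padded[i + 1])

def describe_modes(frequencies):
    """Label a histogram by its number of peaks under the rule above."""
    peaks = count_peaks(frequencies)
    if peaks == 0:
        return "no distinct peak"
    if peaks == 1:
        return "unimodal"
    if peaks == 2:
        return "bimodal"
    return "multimodal"

print(describe_modes([1, 4, 9, 4, 1]))      # hypothetical frequencies
print(describe_modes([2, 8, 3, 7, 1]))
print(describe_modes([1, 6, 2, 7, 3, 8, 2]))
```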

Page 42: QBM117 Business Statistics Descriptive Statistics

Bell-shaped

• A special type of symmetric unimodal histogram is one that is bell-shaped.

• You will discover the importance of this in the next topic.

Page 43: QBM117 Business Statistics Descriptive Statistics

Cumulative Frequency Distribution

• A variation of the frequency distribution that provides another tabular summary of quantitative data is the cumulative frequency distribution.

• The cumulative frequency distribution contains the same number of classes as the frequency distribution.

• However, the cumulative frequency distribution shows the number of observations less than or equal to the upper class limit of each class.

Page 44: QBM117 Business Statistics Descriptive Statistics

Cumulative Relative Frequency Distribution

• The cumulative relative frequency distribution shows the proportion of observations with values less than or equal to the upper limit of each class.

• The cumulative relative frequency distribution can be computed either by summing the relative frequencies in the relative frequency distribution, or by dividing the cumulative frequencies by the total number of observations.
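The two routes to the cumulative relative frequencies give the same answer, as the sketch below suggests. The variable names are assumptions; the class frequencies are those obtained for Example 5.

```python
from itertools import accumulate

frequencies = [5, 3, 9, 7, 1]          # class frequencies from Example 5
n = sum(frequencies)                   # total number of observations

# Method 1: sum the relative frequencies.
relative = [f / n for f in frequencies]
cumulative_relative_1 = list(accumulate(relative))

# Method 2: divide the cumulative frequencies by the number of observations.
cumulative = list(accumulate(frequencies))
cumulative_relative_2 = [c / n for c in cumulative]

print(cumulative)
print([round(c, 2) for c in cumulative_relative_1])
print([round(c, 2) for c in cumulative_relative_2])
```

The last cumulative frequency equals the number of observations, and the last cumulative relative frequency equals 1.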

Page 45: QBM117 Business Statistics Descriptive Statistics

Ogives

• A graph of the cumulative relative frequency is called an ogive.

• The cumulative relative frequency of each class is plotted above the upper limit of the corresponding class, and the points representing the cumulative relative frequencies are then joined by straight lines.

• The ogive is closed at the lower end by extending a straight line to the lower limit of the first class.
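The construction just described amounts to building a list of points and joining them with straight lines. The sketch below builds the points and reads a value off the lines by linear interpolation; the function names and the three classes used are hypothetical. Reading between points assumes that observations are spread evenly within each class, so the result is only an approximation.

```python
def ogive_points(lower_limit_of_first_class, upper_limits, cumulative_relative):
    """Return (x, y) points; the first point closes the ogive at height 0."""
    points = [(lower_limit_of_first_class, 0.0)]
    points.extend(zip(upper_limits, cumulative_relative))
    return points

def read_ogive(points, x):
    """Linearly interpolate the ogive at x (x must lie within the plotted range)."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("x is outside the range of the ogive")

# Hypothetical classes: >0 to 10, >10 to 20, >20 to 30.
points = ogive_points(0, [10, 20, 30], [0.25, 0.75, 1.0])
print(points)
print(read_ogive(points, 15))   # approximate proportion of observations <= 15
```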

Page 46: QBM117 Business Statistics Descriptive Statistics

Example 5 revisited

The numbers of items returned to a leading Brisbane retailer by its customers on each of the last 25 days are as follows:

21 8 17 22 19

18 19 14 17 11

6 21 25 19 9

12 16 16 10 29

24 6 21 20 25

Page 47: QBM117 Business Statistics Descriptive Statistics

Construct a cumulative relative frequency distribution for these data.

Number of items               Relative Frequency   Cumulative Relative Frequency
>5 up to and including 10     0.20                 0.20
>10 up to and including 15    0.12                 0.32
>15 up to and including 20    0.36                 0.68
>20 up to and including 25    0.28                 0.96
>25 up to and including 30    0.04                 1.00

Page 48: QBM117 Business Statistics Descriptive Statistics

Construct an ogive for these data.

[Figure: Ogive of the Number of Items Returned by Customers. Horizontal axis: Number of Items Returned by Customers (5 to 30 in steps of 5). Vertical axis: Cumulative Relative Frequency (0 to 1).]

Page 49: QBM117 Business Statistics Descriptive Statistics

Histograms for Large Data Sets

We have constructed a frequency distribution and histogram for a small data set by hand.

We are now going to construct a frequency distribution and histogram for a large data set.

To do this by hand would be very time consuming.

Page 50: QBM117 Business Statistics Descriptive Statistics

Excel

There are many computer software packages available which make dealing with large data sets quite manageable.

We will use Excel rather than a statistical package as most students are familiar with Excel.

However, some of the things Excel does are not “statistically” correct.

Page 51: QBM117 Business Statistics Descriptive Statistics

Defining Class Intervals

Note that the method we use to define class intervals for frequency distributions is slightly different to the method described in the text.

On page 20 of the text (page 19 of the abridged version) the class intervals for the frequency distribution for Example 2.1 are

0 up to but not including 15

15 up to but not including 30

and so on

Page 52: QBM117 Business Statistics Descriptive Statistics

Using our method the class intervals would be

>0 up to and including 15

>15 up to and including 30

and so on

We use this method as it is consistent with the method of defining intervals used by Excel.

This way manually prepared frequency distributions will be the same as frequency distributions prepared using Excel.
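The two conventions differ only in where a value that sits exactly on a class limit is placed. The sketch below shows this for the value 15, using Python's `bisect` module. The function names are assumptions, and the limit 45 is a hypothetical continuation of the "and so on" in the listings above.

```python
from bisect import bisect_left, bisect_right

limits = [0, 15, 30, 45]   # class limits; 45 is an assumed continuation

def class_index_upper_inclusive(value):
    """Index of the class '>lower up to and including upper' containing value."""
    return bisect_left(limits, value) - 1

def class_index_lower_inclusive(value):
    """Index of the class 'lower up to but not including upper' containing value."""
    return bisect_right(limits, value) - 1

# 15 lies on a class limit, so the two conventions place it in different classes.
print(class_index_upper_inclusive(15))   # class 0: >0 up to and including 15
print(class_index_lower_inclusive(15))   # class 1: 15 up to but not including 30

# 20 lies strictly inside a class, so both conventions agree.
print(class_index_upper_inclusive(20), class_index_lower_inclusive(20))
```

An index below 0 or above 2 means the value falls outside all three classes under that convention; for example, 0 itself is not in any class when lower limits are excluded.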

Page 53: QBM117 Business Statistics Descriptive Statistics

Histograms in Excel

There are instructions on how to produce a histogram in Excel on page 23 of the text (page 21 of the abridged version).

We will modify some of these instructions.

Detailed instructions will be given in Tutorial 1.

Page 54: QBM117 Business Statistics Descriptive Statistics

The histogram produced by Excel needs some editing.

Excel produces histograms with gaps between the columns.

We need to remove these gaps.

We need to change the horizontal axis label.

We need to remove the legend.

And we need to add an appropriate title to the plot.

Page 55: QBM117 Business Statistics Descriptive Statistics

Excel allows you to specify the upper limits of the intervals.

However, when it creates the histogram, it puts the upper limit in the centre of the interval.

The upper limit should be at the extreme right of the interval.

Page 56: QBM117 Business Statistics Descriptive Statistics

We will use the Chart Wizard to edit the histogram produced by Excel.

As Excel places the upper limit in the middle of the column, we will determine the midpoint of each class and use the Chart Wizard to plot these values instead of the upper limits.
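For classes of equal width, the midpoint of each class is its upper limit minus half the class width. The sketch below assumes equal-width classes; the function name and the upper limits are hypothetical.

```python
def class_midpoints(upper_limits, class_width):
    """Midpoint of each class = upper limit minus half the class width."""
    return [upper - class_width / 2 for upper in upper_limits]

upper_limits = [30, 40, 50, 60]     # hypothetical upper limits, class width 10
midpoints = class_midpoints(upper_limits, class_width=10)
print(midpoints)
```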

Page 57: QBM117 Business Statistics Descriptive Statistics

Example 2.1 from text

We are going to produce a histogram of the salary data from Exercise 2.5 from the text.

The data are stored in the file XR02-46.

Page 58: QBM117 Business Statistics Descriptive Statistics

Histogram from Excel

[Figure: Excel's default output, titled "Histogram". Horizontal axis: upper limit (labels 30, 50, 70, 90, 110). Vertical axis: Frequency (0 to 200). Legend: Frequency.]

Page 59: QBM117 Business Statistics Descriptive Statistics

Edited Histogram

[Figure: Histogram of Annual Salaries of University Academics. Horizontal axis: Salary ($000's), labels 25 to 105 in steps of 10. Vertical axis: Frequency (0 to 120).]

Page 60: QBM117 Business Statistics Descriptive Statistics

Reading for next lecture

• Chapter 2 Section 2.5

• Chapter 3 Sections 3.1-3.2

Exercises

• 2.3

• 2.9 omit part a and revise parts b and c to read “…>20 as the lower limit…”