business statistics


Upload: derickmwansa

Post on 06-Nov-2015


  • 1

    CHAPTER 1

    INTRODUCTION TO STATISTICAL ANALYSIS

    Reading

    Newbold 1.1, 1.3, parts of 1.2.

    Anderson, Sweeney, and Williams Chapter 1

    Wonnacott and Wonnacott Chapter 1

    James T Mc Clave, P. George Benson Chapter 1

    Introductory Comments

    This chapter sets the framework for the book. Read it carefully, because the ideas

    introduced here are basic to this subject and to research methodology.

    1. Random Sampling, Deductive and Inductive Statistics.

    Random Sampling

    Only in exceptional circumstances is it possible to consider every member of the

    population. In most cases only a sample of the population can be considered and

    the results obtained from this sample must be generalized to apply to the

    population.

    In order that these generalizations should be accurate the sample must be random,

    that is, every possible sample has an equal chance of selection and the choice of a

    member of the sample must not be influenced by previous selection; this is simple

    random sampling.

  • 2

    Example 1

    Suppose that a population consists of six measurements, 1, 2, 3, 4, 5, and 7. List

    all possible different samples of two measurements that could be selected from

    the population. Give the probability associated with each sample in a random

    sample of n = 2 measurements selected from the population.

    Solution

    All possible samples are listed below

    Sample   Measurements
    1        1, 2
    2        1, 3
    3        1, 4
    4        1, 5
    5        1, 7
    6        2, 3
    7        2, 4
    8        2, 5
    9        2, 7
    10       3, 4
    11       3, 5
    12       3, 7
    13       4, 5
    14       4, 7
    15       5, 7

    Now suppose that we draw a single sample of n = 2 measurements from the 15

    possible samples of two measurements. The sample selected is called a random sample if

    every sample had an equal probability (1/15) of being selected.
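    The enumeration in Example 1 can be reproduced with a short Python sketch. It assumes samples are unordered and drawn without replacement, as in the table above; the variable names are illustrative only.

```python
from fractions import Fraction
from itertools import combinations

# The six-measurement population from Example 1.
population = [1, 2, 3, 4, 5, 7]

# Every unordered sample of n = 2 distinct measurements.
samples = list(combinations(population, 2))

# Under simple random sampling every sample is equally likely.
probability = Fraction(1, len(samples))

for number, sample in enumerate(samples, start=1):
    print(number, sample, probability)
```

    Each of the listed samples receives the same probability, one over the number of possible samples.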

    It is rather unlikely that we would ever achieve a truly random sample, because the

    probabilities of selection will not always be exactly equal. But we do the best we can.

    One of the simplest and most reliable ways to select a random sample of n measurements

    from a population is to use a table of random numbers (See Appendix B). Random

    number tables are constructed in such a way that, no matter where you start in the tables

    and no matter what direction you move, the digits occur randomly and with equal probability.

    Thus if we wished to choose a random sample of n = 10 measurements from a population

    containing 100 measurements, we could label the measurements in the population from

    0 to 99 (or 1 to 100). Then referring to Appendix B and choosing a random starting

    point, the next 10 two-digit numbers going across the page would indicate the labels of

    the particular measurements to be included in the random sample. Similarly, by moving

    up or down the page, we would also obtain a random sample.

  • 3

    Example 2

    A small community consists of 850 families. We wish to obtain a random sample of 20

    families to ascertain public acceptance of a wage and price freeze. Refer to Appendix B

    to determine which families should be sampled.

    Solution

    Assuming that a list of all families in the community is available (such as a telephone

    directory), we could label the families from 0 to 849 (or equivalently, from 1 to 850).

    Then referring to the Appendix, we choose a starting point. Suppose we have decided to

    start at line 1, column 4. Going down the page, we choose the first 20 three-digit

    numbers between 000 and 849. From Table B, we have

    511 791 099 671 152

    584 045 783 301 568

    754 750 059 498 701

    258 266 105 469 160

    These 20 numbers identify the 20 families that are to be included in our sample.
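    Where no random number table is at hand, a pseudo-random number generator can play the same role. The sketch below is an assumption-laden substitute for Table B, not the method of the text: the function name draw_family_labels and the seed value are hypothetical, and the labels drawn will differ from the 20 numbers listed above.

```python
import random


def draw_family_labels(population_size=850, sample_size=20, seed=None):
    """Return sample_size distinct labels from 0 .. population_size - 1."""
    rng = random.Random(seed)
    # random.sample draws without replacement, so no family is chosen twice.
    return rng.sample(range(population_size), sample_size)


# The seed is arbitrary; it only makes the draw repeatable.
labels = draw_family_labels(seed=1)
print(sorted(labels))
```

    Every family label has the same chance of entering the sample, which is the property the random number table is meant to guarantee.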

    Deductive and Inductive Statistics.

    The reasoning that is used in statistics hinges on understanding two types of logic,

    namely deductive and inductive logic. The type of logic that reasons from the particular

    (sample) to the general (Population) is known as inductive logic, while the type that

    reasons from the general to the particular is known as deductive logic.

    Learning Objectives

    After working through this chapter, you should be able to:

    Explain what random sampling is

    Explain the difference between a population and a sample

  • 4

    CHAPTER 2

    METHODS OF ORGANISING AND PRESENTING DATA

    Reading

    Newbold Chapter 2

    James T Mc Clave and P George Benson Chapter 2

    Tailoka Frank P Chapter 3

    Introductory Comments

    This chapter deals with understanding data. We construct graphical

    representations of the data, which allow one to see its most important

    characteristics easily. Most of the graphical representations are very tedious to construct

    without the use of a computer. However, one understands much more if one tries a few

    with pencil and paper.

    Graphical Representations Of Data

    Types of business data; methods of representation of qualitative data, cumulative

    frequency distribution.

    Types of business data. Although the number of business phenomena that can be

    measured is almost limitless, business data can generally be classified as one of two

    types: quantitative or qualitative.

    Quantitative data are observations that are measured on a numerical scale. Examples of

    quantitative business data are:

    i. The monthly unemployment percentage.

    ii. Last year's sales for selected firms.

    iii. The number of women executives in an industry.

    Qualitative data are observations that are not measurable, in the sense that height is measured, or

    countable, as people entering a store are counted. Such characteristics can only be classified into one

    of a set of categories. Examples of qualitative business data are:

  • 5

    i) The political party affiliations of fifty randomly selected business executives.

    Each executive would have one and only one political party affiliation.

    ii) The brand of petrol last purchased by seventy four randomly selected car owners.

    Again, each measurement would fall into one and only one category.

    Notice that each of the examples has nonnumerical or qualitative measurements.

    Graphical methods for describing qualitative data.

    (a) The Bar Graph

    For example, suppose a women's clothing store located in the downtown area of a large city wants to open a branch in the suburbs. To obtain some information

    about the geographical distribution of its present customers, the store manager

    conducts a survey in which each customer is asked to identify her place of

    residence with regard to the city's four quadrants: Northwest (NW), Northeast (NE), Southwest (SW), or Southeast (SE). Out-of-town customers are excluded

    from the survey. The responses of n = 30 randomly selected resident customers

    might appear as in Table 1.1 (note that the symbol n is used here and throughout

    this course to represent the sample size, i.e. the number of measurements in a

    sample). You can see that each of the thirty measurements falls in one and only

    one of the four possible categories representing the four quadrants of the city.

    Table 1.1. Customer residence survey: n = 30

    Customer  Residence    Customer  Residence    Customer  Residence
    1         NW           11        NW           21        NE
    2         SE           12        SE           22        NW
    3         SE           13        SW           23        SW
    4         NW           14        NW           24        SE
    5         SW           15        SW           25        SW
    6         NW           16        NE           26        NW
    7         NE           17        NE           27        NW
    8         SW           18        NW           28        SE
    9         NW           19        NW           29        NE
    10        SE           20        SW           30        SW

    A natural and useful technique for summarizing qualitative data is to tabulate the

    frequency or relative frequency of each category.

    Definition:

  • 6

    The frequency for a category is the total number of measurements that fall in the

    category. The frequency for a particular category, say category i, will be denoted by the

    symbol $f_i$.

    The relative frequency for a category is the frequency of that category divided by the

    total number of measurements; that is, the relative frequency for category i is

    $$\text{Relative frequency} = \frac{f_i}{n}$$

    where n = total number of measurements in the sample

    $f_i$ = frequency for the ith category.

    The frequency for a category is the total number of measurements in that category,

    whereas the relative frequency for a category is the proportion of measurements in the

    category. Table 1.2 shows the frequency and relative frequency for the customer

    residences listed in Table 1.1. Note that the sum of the frequencies should always equal

    the total number of measurements in the sample and the sum of the relative frequencies

    should always equal 1 (except for rounding errors) as in Table 1.2.

    Category   Frequency   Relative Frequency
    NE         5           5/30 = .167
    NW         11          11/30 = .367
    SE         6           6/30 = .200
    SW         8           8/30 = .267
    Total      30          1
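    The tabulation of frequencies and relative frequencies can be sketched in Python as follows. The list reproduces the thirty responses of Table 1.1 in customer order; the variable names are illustrative.

```python
from collections import Counter

# Residences of customers 1..30, copied from Table 1.1.
residences = [
    "NW", "SE", "SE", "NW", "SW", "NW", "NE", "SW", "NW", "SE",
    "NW", "SE", "SW", "NW", "SW", "NE", "NE", "NW", "NW", "SW",
    "NE", "NW", "SW", "SE", "SW", "NW", "NW", "SE", "NE", "SW",
]

n = len(residences)

# f_i: number of measurements falling in each category.
frequency = Counter(residences)

# f_i / n: proportion of measurements in each category.
relative_frequency = {cat: f / n for cat, f in frequency.items()}

for cat in sorted(frequency):
    print(f"{cat}  {frequency[cat]:2d}  {relative_frequency[cat]:.3f}")
```

    The frequencies sum to n and the unrounded relative frequencies sum to 1, as the text requires.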

    A common means of graphically presenting the frequencies or relative frequencies for

    qualitative data is the bar chart. For this type of chart, the frequencies (or relative

    frequencies) are represented by bars, one bar for each category.

    The height of the bar for a given category is proportional to the category frequency (or

    relative frequency). Usually the bars are placed in a vertical position with the base of the

    bar on the horizontal axis of the graph. The order of the bars on the horizontal axis is

    unimportant. Both a frequency bar chart and a relative frequency bar chart for the

    customers' residences are shown in Figure 1.1.

  • 7

    [Figure 1.1: (a) a frequency bar chart and (b) a relative frequency bar chart of the customer residence data. Horizontal axis: residential quadrant (NE, NW, SE, SW); vertical axis: frequency in (a), relative frequency in (b).]

    (b) The Pie Chart

  • 8

    The second method of describing qualitative data sets is the pie chart. This is

    often used in newspaper and magazine articles to depict budgets and other

    economic information. A complete circle (the pie) represents the total number of

    measurements. This is partitioned into a number of slices with one slice for each

    category. For example, since a complete circle spans 360°, if the relative

    frequency for a category is .30, the slice assigned to that category is 30% of 360°,

    or (.30)(360°) = 108°.

    [Figure 1.2: the portion of a pie chart corresponding to a relative frequency of .30, a slice of 108°.]
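    As an illustration of the slice calculation, the sketch below converts category frequencies into pie-chart angles. It applies the rule to the customer residence frequencies (5, 11, 6, 8 out of 30); that pairing is our own choice of example and is not worked in the text.

```python
# Category frequencies for the customer residence survey (n = 30).
frequency = {"NE": 5, "NW": 11, "SE": 6, "SW": 8}
n = sum(frequency.values())

# Slice angle in degrees = relative frequency x 360.
angles = {cat: 360 * f / n for cat, f in frequency.items()}

for cat, angle in angles.items():
    print(f"{cat}: {angle:.0f} degrees")
```

    The angles add up to the full circle, so the slices partition the pie exactly.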

    Graphical Methods for Describing Quantitative Data.

    The Frequency Histogram and Polygon.

    The histogram (often called a frequency distribution) is the most popular graphical

    technique for depicting quantitative data. To introduce the histogram we will use thirty

    companies selected randomly from the 1980 Financial Magazine (the top 500 companies

    in sales for calendar year 1979). The variable X we will be interested in is the earnings

    per share (E/S) for these thirty companies. The earnings per share is computed by

    dividing the year's net profit by the total number of shares of common stock outstanding. This figure is of interest to the economic community because it reflects the economic

    health of the company.

    The earnings per share figures for the thirty companies are shown (to the nearest ngwee)

    in Table 1.3.

  • 9

    Table 1.3. Earnings per share (E/S) for thirty companies

    Company  E/S     Company  E/S     Company  E/S
    1        1.85    11       2.80    21       2.75
    2        3.42    12       3.46    22       6.58
    3        9.11    13       8.32    23       3.54
    4        1.96    14       4.62    24       4.65
    5        6.48    15       3.27    25       0.75
    6        5.72    16       1.35    26       2.01
    7        1.72    17       3.28    27       5.36
    8        8.56    18       3.75    28       4.40
    9        0.72    19       5.23    29       6.49
    10       6.28    20       2.92    30       1.12

    How to construct a Histogram

    1. Arrange the data in increasing order, from smallest to largest measurement.

    2. Divide the interval from the smallest to the largest measurement into between five

    and twenty equal sub-intervals, making sure that:

    a) Each measurement falls into one and only one measurement class.

    b) No measurement falls on a measurement class boundary.

    Use a small number of measurement classes if you have a small amount of

    data; use a larger number of classes for a large amount of data.

    3. Compute the frequency (or relative frequency) of measurements in each

    measurement class.

    4. Using a vertical axis of about three-fourths the length of the horizontal axis, plot

    each frequency (or relative frequency) as a rectangle over the corresponding

    measurement class.

    Since the number of measurements, n = 30, is not large, we will use six classes to

    span the distance between the smallest measurement, 0.72, and the largest

    measurement, 9.11. This distance divided by 6 is equal to

    $$\frac{\text{Largest measurement} - \text{smallest measurement}}{\text{Number of intervals}} = \frac{9.11 - 0.72}{6} \approx 1.4$$

    By locating the lower boundary of the first class interval at 0.715 (slightly below the

    smallest measurement) and adding 1.4, we find the upper boundary to be 2.115. Adding

  • 10

    1.4 again, we find the upper boundary of the second class to be 3.515. Continuing this

    process, we obtain the six class intervals shown in the table below. Note that each

    boundary falls on a 0.005 value (one significant digit more than the measurement), which

    guarantees that no measurement will fall on a class boundary.

    The next step is to find the class frequency and calculate the class relative frequencies

    Class   Measurement Class   Class Frequency   Class Relative Frequency
    1       0.715 - 2.115       8                 8/30 = .267
    2       2.115 - 3.515       7                 7/30 = .233
    3       3.515 - 4.915       5                 5/30 = .167
    4       4.915 - 6.315       4                 4/30 = .133
    5       6.315 - 7.715       3                 3/30 = .100
    6       7.715 - 9.115       3                 3/30 = .100
    Total                       30                1.00

    Table 1.4
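    The construction of the class intervals and class frequencies can be sketched in Python. This is a minimal illustration under the choices made in the text (six classes, width rounded to 1.4, first boundary 0.005 below the smallest measurement); the variable names are our own.

```python
# Earnings per share for the thirty companies (Table 1.3).
eps = [
    1.85, 3.42, 9.11, 1.96, 6.48, 5.72, 1.72, 8.56, 0.72, 6.28,
    2.80, 3.46, 8.32, 4.62, 3.27, 1.35, 3.28, 3.75, 5.23, 2.92,
    2.75, 6.58, 3.54, 4.65, 0.75, 2.01, 5.36, 4.40, 6.49, 1.12,
]

num_classes = 6
n = len(eps)

# Class width: (largest - smallest) / number of classes, rounded to 1.4.
width = round((max(eps) - min(eps)) / num_classes, 1)

# Start slightly below the smallest measurement so that no value
# can fall exactly on a class boundary.
lower = min(eps) - 0.005
boundaries = [round(lower + k * width, 3) for k in range(num_classes + 1)]

# Count the measurements strictly between consecutive boundaries.
frequencies = [
    sum(1 for x in eps if lo < x < hi)
    for lo, hi in zip(boundaries, boundaries[1:])
]
relative_frequencies = [f / n for f in frequencies]

for lo, hi, f, rf in zip(boundaries, boundaries[1:], frequencies, relative_frequencies):
    print(f"{lo:.3f} - {hi:.3f}  {f:2d}  {rf:.3f}")
```

    Because every boundary carries one more decimal digit than the data, each measurement lands in exactly one class.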

    Definition

    The class frequency for a given class, say class i, is equal to the total number of

    measurements that fall in that class. The class frequency for class i is denoted by the

    symbol $f_i$.

    Definition

    The class relative frequency for a given class, say class i, is equal to the class frequency

    divided by the total number n of measurements, i.e.

    $$\text{Relative frequency for class } i = \frac{f_i}{n}$$

  • 11

    [Figure: (a) frequency histogram and (b) relative frequency histogram of the earnings per share data. Horizontal axis: earnings per share, with class boundaries 0.715, 2.115, 3.515, 4.915, 6.315, 7.715, 9.115.]

    Cumulative Frequency Distribution

  • 12

    It is often useful to know the number or the proportion of the total number of

    measurements that are less than or equal to those contained in a particular class. These

    quantities are called the class cumulative frequency and the class cumulative relative

    frequency respectively.

    For example, if the classes are numbered from the smallest to the largest values of x, 1, 2,

    3, 4, . . . , then the cumulative frequency for the third class would equal the sum of the

    class frequencies corresponding to classes 1, 2, and 3.

    $$\text{Cumulative frequency for class 3} = f_1 + f_2 + f_3$$

    Similarly,

    $$\text{Cumulative relative frequency for class 3} = \frac{f_1 + f_2 + f_3}{n}$$

    where n is the total number of measurements in the sample.

    Cumulative frequencies and cumulative relative frequencies for the earnings per share data.

    Class   Measurement      Class       Cumulative       Class Relative   Class Cumulative
    No.     Class            Frequency   Frequency        Frequency        Relative Frequency
    1       0.715 - 2.115    8           8                8/30 = .267      8/30 = .267
    2       2.115 - 3.515    7           (8 + 7) = 15     7/30 = .233      15/30 = .500
    3       3.515 - 4.915    5           (15 + 5) = 20    5/30 = .167      20/30 = .667
    4       4.915 - 6.315    4           (20 + 4) = 24    4/30 = .133      24/30 = .800
    5       6.315 - 7.715    3           (24 + 3) = 27    3/30 = .100      27/30 = .900
    6       7.715 - 9.115    3           (27 + 3) = 30    3/30 = .100      30/30 = 1.00
    Total                    30
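    The running totals in the table can be checked with a short Python sketch. It starts from the six class frequencies of the earnings per share data; the variable names are illustrative.

```python
from itertools import accumulate

# Class frequencies f_1 .. f_6 for the earnings per share data.
class_frequencies = [8, 7, 5, 4, 3, 3]
n = sum(class_frequencies)

# Cumulative frequency for class k = f_1 + f_2 + ... + f_k.
cumulative = list(accumulate(class_frequencies))

# Cumulative relative frequency = cumulative frequency / n.
cumulative_relative = [c / n for c in cumulative]

for k, (c, cr) in enumerate(zip(cumulative, cumulative_relative), start=1):
    print(f"class {k}: {c:2d}  {cr:.3f}")
```

    The last cumulative frequency always equals n, and the last cumulative relative frequency equals 1.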

    Cumulative relative frequency Distribution for earnings per share data.

  • 13

    [Figure: cumulative relative frequency (vertical axis, 0 to 1.0) plotted against earnings per share (horizontal axis, class boundaries 0.715 to 9.115).]

    Learning Objective

    After working through this Chapter you should be able to:

    Draw a pie chart, bar chart and also construct frequency tables, relative frequencies, and histogram.

    Interpret the diagrams. You will understand the importance of captions, axis labels and graduation of axes.

    CHAPTER 3

  • 14

    DESCRIPTIVE MEASURES

    Reading

    Newbold Chapter 2

    Wonnacott and Wonnacott Chapter 2

    Tailoka Frank P. Chapter 4

    James T McClave , Lawrence Lapin L and P George Benson Chapter 3

    Introductory Comments

    This chapter deals with measures that allow one to see the most important

    characteristics of data easily. The idea is to find simple numbers, such as the mean and the variance, which

    summarize those characteristics.

    3. Numerical Description of Data.

    The Mode; A measure of Central tendency.

    Definition.

    The mode is the measurement that occurs with the greatest frequency in the data set.

    Because it emphasizes data concentration, the mode has application in marketing

    as well as in the description of large data sets collected by state and federal agencies.

    Unless the data set is rather large, the mode may not be very meaningful. For

    example, consider the earnings per share measurements for the thirty financial

    companies we used in the previous chapter. If you were to re-examine these data,

    you would find that none of the thirty measurements is duplicated in this sample.

    Thus, strictly speaking, all thirty measurements are modes for this sample.

    Obviously, this information is of no practical use for data description. We can

    calculate a more meaningful mode by constructing a relative frequency histogram

    for the data. The interval containing the most measurements is called the modal

    class and the mode is taken to be the midpoint of this class interval.

    The modal class, the one corresponding to the interval 0.715 - 2.115, lies to the left side of the distribution. The mode is the midpoint of this interval; that is

  • 15

    $$\text{Mode} = \frac{0.715 + 2.115}{2} = 1.415$$

    In the sense that the mode measures data concentration, it provides a measure of central

    tendency of the data.

    The Arithmetic mean

    A Measure of Central Tendency

    The most popular and best understood measure of central tendency for a quantitative

    data set is the arithmetic mean (or simply the mean).

    Definition

    The mean of a set of quantitative data is equal to the sum of the measurements divided by

    the number of measurements contained in the data set. The mean of a sample is denoted

    by $\bar{x}$ (read x bar), and the formula for this calculation is:

    $$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

    Example 1

    Calculate the mean of the following five sample measurements: 5, 3, 8, 5, 6.

    Solution

    Using the definition of the sample mean and the shorthand notation, we find

    $$\bar{x} = \frac{\sum_{i=1}^{5} x_i}{5} = \frac{5 + 3 + 8 + 5 + 6}{5} = \frac{27}{5} = 5.4$$

    The mean of this sample is 5.4

    The sample mean will play an important role in accomplishing our objective of making

    inferences about populations based on sample information. For this reason it is important

    to use a different symbol when we want to discuss the mean of a population of

    measurements, i.e. the mean of the entire set of measurements in which we are interested.

    We use the Greek letter μ (mu) for the population mean.

    The Median: Another measure of Central Tendency

  • 16

    The median of a data set is the number such that half the measurements fall below the

    median and half fall above. The median is of most value in describing large data sets. If

    the data set is characterized by a relative frequency histogram, the median is the point on

    the x-axis such that half the area under the histogram lies above the median and half lies

    below. For a small, or even a large but finite, number of measurements, there may be

    many numbers that satisfy the property indicated in the figure on the next page. For this

    reason, we will arbitrarily calculate the median of a data set as follows.

    Calculating a median

    1. If the number n of measurements in a data set is odd, the median is the middle

    number when the measurements are arranged in ascending (or descending) order.

    2. If the number n of measurements is even, the median is the mean of the two

    middle measurements when the measurements are arranged in ascending (or

    descending) order.

    Example 2

    Consider the following sample of n = 7 measurements.

    5, 7, 4, 5, 20, 6, 2

    a) Calculate the median of this sample

    b) Eliminate the last measurement (the 2) and calculate the median of the remaining n = 6 measurements.

    Solution

    a) The seven measurements in the sample are first arranged in ascending order

    2, 4, 5, 5, 6, 7, 20

    Since the number of measurements is odd, the median is the middle measurement.

    Thus, the median of this sample is 5.

    b) After removing the 2 from the set of measurements, we arrange the sample

    measurements in ascending order as follows:

    4, 5, 5, 6, 7, 20

    Now the number of measurements is even, and so we average the middle two

    measurements. The median is (5+6)/2 = 5.5.
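    The mean of Example 1 and the two medians of Example 2 can be verified with a small Python sketch. The helper names mean and median are our own; the median function simply applies the odd/even rule stated above.

```python
def mean(values):
    """Sum of the measurements divided by their number."""
    return sum(values) / len(values)


def median(values):
    """Middle value if n is odd, mean of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


print(mean([5, 3, 8, 5, 6]))            # Example 1
print(median([5, 7, 4, 5, 20, 6, 2]))   # Example 2(a), n = 7
print(median([5, 7, 4, 5, 20, 6]))      # Example 2(b), n = 6
```

    Note how little the extreme value 20 affects the median, whereas it would pull the mean upward.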

  • 17

    Comparing the mean and the median

    1. If the median is less than the mean, the data set is skewed to the right.

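    [Figure: a relative frequency distribution with rightward skewness. Horizontal axis: measurement units; the median lies to the left of the mean.]

    $$\text{Skewness} = \frac{\text{Mean} - \text{Mode}}{\text{standard deviation}} \approx \frac{3(\text{mean} - \text{median})}{\text{standard deviation}}$$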

    2. The median will equal the mean when the data set is symmetric.

    [Figure: a symmetric distribution. Horizontal axis: measurement units; the median and the mean coincide.]

  • 18

    3. If the median is greater than the mean, the data set is skewed to the left.

    [Figure: a distribution with leftward skewness; the mean lies to the left of the median.]

    Measures of Variation

    The Range: A Measure of Variability

    Definition:

    The range of a data set is equal to the largest measurement minus the smallest measurement.

    When dealing with grouped data, there are two procedures which are adopted for

    determining the range.

    1. Range = class mark of highest class - class mark of lowest class.

    2. Range = upper class boundary of highest class - lower class boundary of lowest class.

    Variance and Standard Deviation

    The sample variance for a sample of n measurements is equal to the sum of the squared distances

    from the mean divided by (n - 1). In symbols, using $S^2$ to represent the sample variance,

    $$S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

    The second step in finding a meaningful measure of data variability is to calculate the

    standard deviation of the data set.

  • 19

    The sample standard deviation, S, is defined as the positive square root of the sample

    variance, $S^2$. Thus,

    $$S = \sqrt{S^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$

    The corresponding quantity, the population standard deviation, measures the variability of

    the measurements in the population and is denoted by σ (sigma). The population

    variance will therefore be denoted by $\sigma^2$.

    Example 3

    Calculate the standard deviation of the following sample. 2, 3, 3, 3, 4.

    Solution

    For this set of data, $\bar{x} = 3$. Then

    $$S = \sqrt{\frac{(2-3)^2 + (3-3)^2 + (3-3)^2 + (3-3)^2 + (4-3)^2}{5 - 1}} = \sqrt{\frac{2}{4}} = \sqrt{0.5} = 0.71$$

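    Shortcut formula for the sample variance

    $$S^2 = \frac{\text{sum of squares of sample measurements} - \dfrac{(\text{sum of sample measurements})^2}{n}}{n - 1} = \frac{\sum_{i=1}^{n} x_i^2 - \dfrac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n - 1}$$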

  • 20

    Example 4

    Use the shortcut formula to compute the variances of these two samples of five measurements

    each.

    Sample 1: 1, 2, 3, 4, 5

    Sample 2: 2, 3, 3, 3, 4

    Solution

    We first work with sample 1. The quantities needed are:

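    $$\sum_{i=1}^{5} x_i = 1 + 2 + 3 + 4 + 5 = 15$$

    and

    $$\sum_{i=1}^{5} x_i^2 = 1^2 + 2^2 + 3^2 + 4^2 + 5^2 = 1 + 4 + 9 + 16 + 25 = 55$$

    $$S^2 = \frac{\sum_{i=1}^{5} x_i^2 - \dfrac{\left(\sum_{i=1}^{5} x_i\right)^2}{5}}{5 - 1} = \frac{55 - \dfrac{(15)^2}{5}}{4} = \frac{55 - 45}{4} = \frac{10}{4} = 2.5$$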

    Similarly, for sample 2 we get

    $$\sum_{i=1}^{5} x_i = 2 + 3 + 3 + 3 + 4 = 15$$

    and

    $$\sum_{i=1}^{5} x_i^2 = 2^2 + 3^2 + 3^2 + 3^2 + 4^2 = 4 + 9 + 9 + 9 + 16 = 47$$

  • 21

    Then the variance for sample 2 is

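    $$S^2 = \frac{\sum_{i=1}^{5} x_i^2 - \dfrac{\left(\sum_{i=1}^{5} x_i\right)^2}{5}}{5 - 1} = \frac{47 - \dfrac{(15)^2}{5}}{4} = \frac{47 - 45}{4} = \frac{2}{4} = 0.5$$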
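    The definition of the sample variance and the shortcut formula should give the same result. The following Python sketch checks this on the two samples of Example 4; the function names are hypothetical and chosen only for this illustration.

```python
import math


def variance_definition(values):
    """Sum of squared distances from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def variance_shortcut(values):
    """(sum of squares - (sum)^2 / n) / (n - 1)."""
    n = len(values)
    total = sum(values)
    total_of_squares = sum(x * x for x in values)
    return (total_of_squares - total ** 2 / n) / (n - 1)


sample_1 = [1, 2, 3, 4, 5]
sample_2 = [2, 3, 3, 3, 4]

for sample in (sample_1, sample_2):
    s2 = variance_shortcut(sample)
    print(sample, variance_definition(sample), s2, round(math.sqrt(s2), 2))
```

    Both samples have the same mean, yet the first is five times as variable as the second when measured by the variance.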

    Example 5

    The earnings per share measurements for thirty companies selected randomly from 1980

    Financial/Daily Mail are listed here. Calculate the sample variance, $S^2$, and the standard

    deviation, S, from these measurements.

    1.85   3.42   9.11   1.96   6.48   5.72   1.72   8.56   0.72   6.28
    2.80   3.46   8.32   4.62   3.27   1.35   3.28   3.75   5.23   2.92
    2.75   6.58   3.54   4.65   0.75   2.01   5.36   4.40   6.49   1.12

    Solution

    The calculation of the sample variance, $S^2$, would be very tedious for this example if we

    tried to use the formula

    $$S^2 = \frac{\sum_{i=1}^{30} (x_i - \bar{x})^2}{30 - 1}$$

    because it would be necessary to compute all thirty squared distances from the mean.

    However, for the shortcut formula we need only compute:

  • 22

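    $$\sum_{i=1}^{30} x_i = 1.85 + 3.42 + \dots + 1.12 = 122.47$$

    and

    $$\sum_{i=1}^{30} x_i^2 = (1.85)^2 + (3.42)^2 + \dots + (1.12)^2 = 657.5239$$

    Then

    $$S^2 = \frac{\sum_{i=1}^{30} x_i^2 - \dfrac{\left(\sum_{i=1}^{30} x_i\right)^2}{30}}{30 - 1} = \frac{657.5239 - \dfrac{(122.47)^2}{30}}{29} = 5.4331$$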

    Notice that we retained four decimal places in the calculation of $S^2$ to reduce rounding

    errors, even though the original data were accurate to only two decimal places.

    The standard deviation is

    $$S = \sqrt{S^2} = \sqrt{5.4331} = 2.33$$
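    The arithmetic of Example 5 is easy to reproduce in Python. The sketch below applies the shortcut formula to the thirty earnings per share figures; the variable names are illustrative.

```python
import math

# Earnings per share for the thirty companies.
eps = [
    1.85, 3.42, 9.11, 1.96, 6.48, 5.72, 1.72, 8.56, 0.72, 6.28,
    2.80, 3.46, 8.32, 4.62, 3.27, 1.35, 3.28, 3.75, 5.23, 2.92,
    2.75, 6.58, 3.54, 4.65, 0.75, 2.01, 5.36, 4.40, 6.49, 1.12,
]

n = len(eps)
total = sum(eps)                            # sum of the measurements
total_of_squares = sum(x * x for x in eps)  # sum of the squared measurements

# Shortcut formula for the sample variance, then its positive square root.
variance = (total_of_squares - total ** 2 / n) / (n - 1)
std_dev = math.sqrt(variance)

print(round(total, 2), round(total_of_squares, 4))
print(round(variance, 4), round(std_dev, 2))
```

    Only two running sums are needed, which is why the shortcut formula is convenient for hand or calculator work.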

    Interpreting the Standard Deviation

    If we are comparing the variability of two samples selected from a population, the sample

    with the larger standard deviation is the more variable of the two. Thus, we know how to

    interpret the standard deviation on a relative or comparative basis, but we have not

    explained how it provides a measure of variability for a single sample.

    One way to interpret the standard deviation as a measure of variability of a data set would

    be to answer questions such as the following. How many measurements are within 1

    standard deviation of the mean? How many measurements are within 2 standard

    deviations of the mean? For a specific data set, we can answer the questions by counting

    the number of measurements in each of the intervals. However, if we are interested in

    obtaining a general answer to these questions, the problem is more difficult. There are

    two sets of guidelines to help answer the questions of how many measurements fall within 1, 2,

    and 3 standard deviations of the mean. The first set, which applies to any sample, is

    derived from a theorem proved by the Russian mathematician Chebyshev. The second

    set, the Empirical Rule, is based on empirical evidence that has accumulated over time

    and applies to samples that possess mound-shaped frequency distributions: those that are

    approximately symmetric, with a clustering of measurements about the midpoint of the

    distribution (the mean, median and mode should all be about the same), and that tail off

    as we move away from the center of the histogram.

  • 23

    Aids to the Interpretation of a Standard deviation.

    1. A rule (from Chebyshev's theorem) that applies to any sample of measurements, regardless of the shape of the frequency distribution:

    a. It is possible that none of the measurements will fall within 1 standard

    deviation of the mean ($\bar{x} - S$ to $\bar{x} + S$).

    b. At least 3/4 of the measurements will fall within 2 standard deviations of the

    mean ($\bar{x} - 2S$ to $\bar{x} + 2S$).

    c. At least 8/9 of the measurements will fall within 3 standard deviations of

    the mean ($\bar{x} - 3S$ to $\bar{x} + 3S$).

    2. A rule of thumb, called the Empirical Rule, that applies to samples with frequency

    distributions that are mound-shaped:

    a) Approximately 68% of the measurements will fall within 1 standard

    deviation of the mean ($\bar{x} - S$ to $\bar{x} + S$).

    b) Approximately 95% of the measurements will fall within 2 standard

    deviations of the mean ($\bar{x} - 2S$ to $\bar{x} + 2S$).

    c) Essentially all the measurements will fall within 3 standard deviations of

    the mean ($\bar{x} - 3S$ to $\bar{x} + 3S$).

    Example 6

    Refer to the data for earnings per share for thirty companies selected randomly from the

    1980 Financial/Daily Mail, for which $\bar{x} = 4.08$ and $S = 2.33$. Calculate the fraction of the thirty

    measurements that lie within the intervals $\bar{x} \pm S$, $\bar{x} \pm 2S$, and $\bar{x} \pm 3S$, and compare the

    results with those of the Chebyshev and Empirical rules.

  • 24

    Solution

    The 1 standard deviation interval is

    $$(\bar{x} - S, \bar{x} + S) = (4.08 - 2.33, 4.08 + 2.33) = (1.75, 6.41)$$

    A check of the measurements shows that 19 of the 30 measurements, i.e. approximately

    63%, are within 1 standard deviation of the mean. The interval

    $$(\bar{x} - 2S, \bar{x} + 2S) = (4.08 - 4.66, 4.08 + 4.66) = (-0.58, 8.74)$$

    contains 29 measurements, or approximately 97% of the n = 30 measurements. Finally,

    the 3 standard deviation interval around $\bar{x}$,

    $$(\bar{x} - 3S, \bar{x} + 3S) = (4.08 - 6.99, 4.08 + 6.99) = (-2.91, 11.07),$$

    contains all the measurements. These 1, 2 and 3 standard deviation percentages (63, 97,

    and 100) agree fairly well with the approximations of 68%, 95% and 100% given by the

    Empirical Rule for mound-shaped distributions.
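    The counting in Example 6 can be automated. The Python sketch below uses the standard library's statistics.mean and statistics.stdev (the latter divides by n - 1) and compares each observed fraction with the Chebyshev lower bound 1 - 1/k^2; it works with the unrounded mean and standard deviation, which is an implementation choice of ours.

```python
import statistics

# Earnings per share for the thirty companies.
eps = [
    1.85, 3.42, 9.11, 1.96, 6.48, 5.72, 1.72, 8.56, 0.72, 6.28,
    2.80, 3.46, 8.32, 4.62, 3.27, 1.35, 3.28, 3.75, 5.23, 2.92,
    2.75, 6.58, 3.54, 4.65, 0.75, 2.01, 5.36, 4.40, 6.49, 1.12,
]

n = len(eps)
x_bar = statistics.mean(eps)
s = statistics.stdev(eps)  # sample standard deviation (divisor n - 1)

counts = []
for k in (1, 2, 3):
    # Number of measurements within k standard deviations of the mean.
    inside = sum(1 for x in eps if abs(x - x_bar) <= k * s)
    counts.append(inside)
    chebyshev_bound = 1 - 1 / k ** 2  # equals 0 for k = 1
    print(f"k={k}: {inside}/{n} = {inside / n:.2f} "
          f"(Chebyshev: at least {chebyshev_bound:.2f})")
```

    In every case the observed fraction is at least as large as the Chebyshev bound, as the theorem guarantees for any sample.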

    Example 7

    The aid for interpreting the value of a standard deviation can be put to an immediate

    practical use as a check on the calculation of the standard deviation. Suppose you have a

    data set for which the smallest measurement is 20 and the largest is 80. You have

    calculated the standard deviation of the data set to be S = 190.

    How can you use the Chebyshev or empirical rule to provide a rough check on your

    calculated value of S?

    Solution

    The larger the number of measurements in a data set, the greater will be the tendency for very large or very small measurements (extreme values) to appear in the data set. But from the Rules, you know that most of the measurements (approximately 95% if the distribution is mound-shaped) will be within 2 standard deviations of the mean, and regardless of how many measurements are in the data set, almost all of them will fall within 3 standard deviations of the mean. Consequently, we would expect the range to be between 4 and 6 standard deviations, i.e. between 4S and 6S.

    Range = largest measurement − smallest measurement = 80 − 20 = 60.

    [Figure: The relation between the range and the standard deviation. The interval from $\bar{x} - 2S$ to $\bar{x} + 2S$ spans a range of approximately 4S.]

    Then if we let the range equal 6S, we obtain

    Range = 6S

    60 = 6S

    S = 10

    Or, if we let the range equal 4S, we obtain a larger (and more conservative) value for S, namely

    Range = 4S

    60 = 4S

    S = 15

    Now you can see that it does not make much difference whether you let the range equal 4S (which is more realistic for most data sets) or 6S (which is reasonable for large data sets). It is clear that your calculated value, S = 190, is too large, and you should check your calculations.
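    The rough check in Example 7 can be written as a small helper. The function name is hypothetical, and the bounds are only the rule-of-thumb values range/6 and range/4 described above, not exact limits.

```python
def plausible_s_range(smallest, largest):
    """Rule-of-thumb bounds on S: range/6 <= S <= range/4."""
    data_range = largest - smallest
    return data_range / 6, data_range / 4

low, high = plausible_s_range(20, 80)
print(low, high)  # 10.0 15.0

calculated_s = 190  # the suspicious value from Example 7
print(low <= calculated_s <= high)  # False, so the calculation should be rechecked
```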


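    Calculating a Mean and Standard Deviation from Grouped Data

    If your data have been grouped in classes of equal width and arranged in a frequency table, you can use the following formulas to calculate $\bar{x}$, $S^2$, and $S$:

    $x_i$ = Midpoint of the ith class

    $f_i$ = Frequency of the ith class

    $K$ = Number of classes

    $n = \sum_{i=1}^{K} f_i$ = Total number of measurements

    $$\bar{x} = \frac{\sum_{i=1}^{K} x_i f_i}{n}$$

    $$S^2 = \frac{\sum_{i=1}^{K} x_i^2 f_i - \dfrac{\left(\sum_{i=1}^{K} x_i f_i\right)^2}{n}}{n - 1}$$

    $$S = \sqrt{S^2}$$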

    Example 8

    Compute the mean and standard deviation for the earnings per share data using the

    grouping shown in the frequency Table 1.4.

    Solution

    The six class intervals, midpoints, and frequencies are shown in the accompanying table.

    Table 1.4 Earnings per share

    Class            Class midpoint $x_i$    Class frequency $f_i$
    0.715 - 2.115    1.415                   8
    2.115 - 3.515    2.815                   7
    3.515 - 4.915    4.215                   5
    4.915 - 6.315    5.615                   4
    6.315 - 7.715    7.015                   3
    7.715 - 9.115    8.415                   3
                                             $n = \sum f_i = 30$

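    $$\bar{x} = \frac{\sum_{i=1}^{K} x_i f_i}{n} = \frac{(1.415)(8) + (2.815)(7) + (4.215)(5) + \cdots + (8.415)(3)}{30} = \frac{120.85}{30} = 4.03$$

    $$S^2 = \frac{\sum_{i=1}^{K} x_i^2 f_i - \dfrac{\left(\sum_{i=1}^{K} x_i f_i\right)^2}{n}}{n - 1}$$

    We found $\sum_{i=1}^{K} x_i f_i = 120.85$ when we calculated $\bar{x}$; therefore

    $$S^2 = \frac{\left[(1.415)^2(8) + (2.815)^2(7) + \cdots + (8.415)^2(3)\right] - (120.85)^2/30}{30 - 1} = \frac{646.49875 - 486.82408}{29} = 5.5060$$

    $$S = \sqrt{5.5060} = 2.35$$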

    You will notice that the values of $\bar{x}$, $S^2$, and $S$ from the formulas for grouped data usually do not agree with those obtained for the raw data ($\bar{x} = 4.03$ and $S = 2.311$). This is because we have substituted the value of the class midpoint for each value of x in a class interval. Only when every value of x in each class is equal to its respective class midpoint will the formulas for grouped and for ungrouped data give exactly the same answers for $\bar{x}$, $S^2$, and $S$. Otherwise, the formulas for grouped data will give only approximations to these numerical descriptive measures.
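    As a cross-check on Example 8, the grouped-data formulas can be applied directly to the midpoints and frequencies of Table 1.4. The function name `grouped_stats` is an illustrative choice; the numbers come from the table.

```python
import math

# Midpoints and frequencies from Table 1.4
midpoints = [1.415, 2.815, 4.215, 5.615, 7.015, 8.415]
frequencies = [8, 7, 5, 4, 3, 3]

def grouped_stats(midpoints, frequencies):
    """Mean, variance and standard deviation from a frequency table."""
    n = sum(frequencies)
    sum_xf = sum(x * f for x, f in zip(midpoints, frequencies))
    sum_x2f = sum(x * x * f for x, f in zip(midpoints, frequencies))
    mean = sum_xf / n
    variance = (sum_x2f - sum_xf**2 / n) / (n - 1)
    return mean, variance, math.sqrt(variance)

mean, variance, s = grouped_stats(midpoints, frequencies)
print(round(mean, 2), round(variance, 4), round(s, 2))  # 4.03 5.506 2.35
```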

    Measures of Relative Standing

    Descriptive measures of the relationship of a measurement to the rest of the data are called measures of relative standing.

    One measure of relative standing of a particular measurement is its percentile ranking.

    Definition

    Let $x_1, x_2, \ldots, x_n$ be a set of n measurements arranged in increasing (or decreasing) order. The pth percentile is a number x such that p% of the measurements fall below the pth percentile and (100 − p)% fall above it.

    For example, if oil company A reports that its yearly sales are in the 90th percentile of all companies in the industry, the implication is that 90% of all oil companies have yearly sales less than A's, and only 10% have yearly sales exceeding company A's.

    [Figure: Relative frequency distribution of yearly sales, with company A's sales marked so that an area of .90 lies below it and .10 above it.]

    Another measure of relative standing in popular use is the Z-score. The Z-score makes

    use of the mean and standard deviation of the data set in order to specify the location of a

    measurement.

    Definition

    The sample Z-score for a measurement x is

    $$Z = \frac{x - \bar{x}}{S}$$

    The population Z-score for a measurement x is

    $$Z = \frac{x - \mu}{\sigma}$$

    The Z-score represents the distance between a given measurement x and the mean, expressed in standard units.


    Example 9

    Suppose 200 steel workers are selected, and the annual income of each is determined. The mean and standard deviation are $\bar{x} = K14,000$ and $S = K2,000$.

    Suppose Chipo's annual income is K12,000. What is his sample Z-score?

    [Figure: Annual income of steel workers, with K8,000 ($\bar{x} - 3S$), K12,000 (x), K14,000 ($\bar{x}$) and K20,000 ($\bar{x} + 3S$) marked on the axis.]

    Solution

    Chipo's annual income lies below the mean income of the 200 steel workers. We compute

    $$Z = \frac{x - \bar{x}}{S} = \frac{12000 - 14000}{2000} = -1.0$$

    which tells us that Chipo's annual income is 1.0 standard deviation below the sample mean; in short, his sample Z-score is −1.0.
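    The Z-score calculation in Example 9 is a one-line function. The sketch below uses a hypothetical helper name, `z_score`, and the figures from the example.

```python
def z_score(x, mean, sd):
    """Distance of x from the mean, in units of the standard deviation."""
    return (x - mean) / sd

# Chipo's income against the sample of 200 steel workers
print(z_score(12_000, 14_000, 2_000))  # -1.0
```

    A negative result means the measurement lies below the mean; a positive one means it lies above.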

    Example 10

    Suppose a female bank executive believes that her salary is low as a result of sex discrimination. To try to substantiate her belief, she collects information on the salaries of her male counterparts in the banking business. She finds that their salaries have a mean of K17,000 and a standard deviation of K1,000. Her salary is K13,500. Does this information support her claim of sex discrimination?

    Solution

    The analysis might proceed as follows: First, we calculate the Z-score for the woman's salary with respect to those of her male counterparts. Thus

    $$Z = \frac{13500 - 17000}{1000} = -3.5$$

    The implication is that the woman's salary is 3.5 standard deviations below the mean of the male distribution. Furthermore, if a check of the male salary data shows that the frequency distribution is mound-shaped, we can infer that very few salaries in this distribution should have a Z-score less than −3, as shown in the figure.

    [Figure: Male salary distribution, relative frequency against salary (K), with the mean at K17,000 and the woman's salary of K13,500 marked at Z-score = −3.5.]

    Therefore, a Z-score of −3.5 represents either a measurement from a distribution different from the male salary distribution or a very unusual (highly improbable) measurement for the male salary distribution.

    Well, which of the two situations do you think prevails? Do you think the woman's salary is simply an unusually low one in the distribution of salaries, or do you think her claim of salary discrimination is justified? Most people would probably conclude that her salary does not come from the male salary distribution.

    However, the careful investigator should require more information before inferring sex discrimination as the cause. We would want to know more about the data collection technique the woman used, and more about her competence at her job. Also, perhaps other factors like the length of employment should be considered in the analysis.

    Learning Objectives

    After working through this Chapter you should be able to

    Calculate the arithmetic mean, standard deviation, variance, median, and quartiles for grouped or ungrouped data.

    Explain the use of all the above quantities.


    Sample Examination Questions

    1. (a) Briefly state, with reasons, the type of chart which would best convey the information for each of the following:

    (i) Students at the University classified by programme of study.

    (ii) Members of a professional association classified by age.

    (iii) Numbers of cars taxed for 2002, 2003 and 2004 in areas A, B and C of a city.

    (b) The weekly cost (K) of rented accommodation was recorded for 100 students living in an area.

    Amount (thousands of Kwacha)    Frequency
    0 - 4                           3
    5 - 9                           17
    10 - 14                         24
    15 - 19                         31
    20 - 24                         19
    25 - 29                         6

    (i) Draw a histogram.

    (ii) Give the median and the interquartile range.

    (iii) Calculate the mean, mode, and standard deviation.

    (iv) What conclusions can you draw from the data?


    2. The data below are per capita per week numbers of cigarettes sold for 38 states in a country.

    19.20 26.82 19.24 27.18 25.96 30.14

    29.27 21.10 28.91 29.92 29.64 21.94

    22.58 29.92 26.91 43.40 30.18 23.86

    28.56 24.75 24.32 24.78 22.17

    20.96 27.38 24.44 26.89 41.46

    21.08 23.57 15.80 32.10 24.44

    29.04 31.34 29.60 23.12 17.08

    (a) Plot the data using an appropriate graphical method.

    (b) Give the mean, the median and the mode.

    (c) Assuming this is a normal distribution, and given a standard deviation of these figures of 4.387, what proportion of the states would you expect to have more than 20 cigarettes smoked per capita per week?

    (d) How does this compare with the actual situation as shown in the table above?

    3. (a) Briefly state, with reasons, the type of chart which would best convey the information in each of the following:

    (i) A country's total import of cigarettes by source.

    (ii) Students in higher education classified by age.

    (iii) Number of students registered for secondary school in the years 2001, 2002 and 2003 for areas X, Y, and Z of a country.

    (b) The weekly cost (K'000) of rented accommodation was recorded for 40 students living in an area.

    35 56 33 30 31 55 29 27

    21 32 43 33 29 27 30 29

    26 26 27 26 35 32 28 27

    31 27 33 24 27 28 33 49

    22 19 46 36 26 38 36 55


    (i) Summarize the data in a frequency distribution table.

    (ii) Calculate the mean and the standard deviation from your frequency table.

    (iii) Plot a histogram for these data. What is the value of the median?

    (iv) What conclusions can you draw from these data?

    4. (a) Given below is a sample of 25 observations, calculate:

    (i) The range (ii) The arithmetic mean

    (iii) The median (iv) The lower quartile

    (v) The upper quartile (vi) The quartile deviation

    (vii) The mean deviation (viii) The standard deviation

    5 18 29 42 50 61

    8 20 33 43 54 63

    10 21 35 46 56 67

    11 25 39 48 58 69

    14

    (b) Explain the term measure of dispersion and state briefly the advantage and disadvantage of using the following measures of dispersion:

    (i) Range

    (ii) Mean deviation

    (iii) Standard deviation


    5. A machine produces the following number of rejects in each successive period of five minutes.

    (a) Construct a frequency distribution from these data, using seven class

    intervals of equal width.

    (b) Using the frequency distribution, calculate:

    (i) the mean (ii) the standard deviation

    (c) Briefly explain the meaning of your calculated measures.

    20 55 58 40 15 28 21 29 30 17

    84 58 7 40 41 67 28 19 26 26

    16 25 55 43 22 66 32 29 11 21

    26 42 57 73 27 66 7 23 17 35

    27 42 13 28 24 37 34 27 24 12


    CHAPTER 4

    PROBABILITY

    Reading

    Newbold Chapter 3

    Tailoka Frank P Chapter 8

    Wonnacott and Wonnacott Chapter 3

    Introductory Comments

    Probability is more abstract than other parts of this subject, and solving the problems may

    be difficult. The concepts are very important for statistics because it is the rules of

    probability that allow one to reason about uncertainty. Independence and conditional

    probability are important to understand clearly for the purpose of statistical investigation.

    4. Elementary Probability

    Counting Techniques. Introduction of the probability concept. The event and the

    event relationships. Probability trees, conditional probability and statistical

    independence.

    Counting techniques: In calculating probabilities, it is essential to be able to work out n(S) and n(E) as straightforwardly as possible. Permutations and combinations are very helpful here. We begin with the following basic principle.

    Fundamental principle of counting. If two operations A and B are carried out, and there are m different ways of carrying out A and k different ways of carrying out B, then the combined operation A and B may be carried out in m × k different ways.

    Example 1

    Suppose a license plate contains two distinct letters followed by three digits, with the first digit not zero. How many different license plates can be printed?

    The first letter can be printed in 26 different ways, the second letter in 25 different ways (since the letter printed first cannot be chosen for the second letter), the first digit in 9 ways and each of the other two digits in 10 ways. Hence

    26 · 25 · 9 · 10 · 10 = 585,000

    different plates can be printed.
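    The license plate count follows from the fundamental principle of counting. As a small check, the sketch below counts the ordered pairs of distinct letters by enumeration and multiplies by the number of digit choices; the variable names are illustrative.

```python
from itertools import permutations
import string

# Ordered pairs of distinct letters: 26 choices, then 25
letter_pairs = sum(1 for _ in permutations(string.ascii_uppercase, 2))

# First digit 1-9, the other two digits 0-9
digit_triples = 9 * 10 * 10

total = letter_pairs * digit_triples
print(letter_pairs, total)  # 650 585000
```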

    Example 2.

    A toy manufacturer makes a wooden toy in two parts. The top part may be coloured red, white or blue, and the bottom part brown, orange, yellow or green. How many differently coloured toys can be produced?

    A red top part may be combined with a bottom part of any of the four possible colours. Similarly, either a white or a blue top part may be combined with each of the four differently coloured bottom parts. Hence the number of differently coloured toys is

    3 × 4 = 12

    Permutations: An arrangement of a set of n objects in a given order is called a permutation of the objects (taken all at a time). An arrangement of any $r \le n$ of these objects in a given order is called an r-permutation, or a permutation of the n objects taken r at a time.

    Example 3

    Consider the set of letters a, b, c and d. Then

    i) bdca, dcba and acdb are permutations of the 4 letters (taken all at a time).

    ii) bad, adb and bca are permutations of 4 letters taken 3 at a time.

    iii) ad, ca, da and bd are permutations of the 4 letters taken 2 at a time.


    Example 4

    The telephone switchboard in the company requires two operators whose chairs (positions) are side by side. When the telephone operators go to lunch, two of the four secretaries take their places. If we make a distinction between the two operators' positions, in how many ways can the four secretaries fill them?

    We can answer this question by determining the number of possible permutations of 4 things taken 2 at a time. There are 4 secretaries, A, B, C and D, to fill the first position. Once this position has been filled, there are only 3 secretaries to fill the second position.

    The figure below lists the possibilities.

    Ways to fill first position    Ways to fill second position    Counting the number of permutations
    A                              B                               1
    A                              C                               2
    A                              D                               3
    B                              A                               4
    B                              C                               5
    B                              D                               6
    C                              A                               7
    C                              B                               8
    C                              D                               9
    D                              A                               10
    D                              B                               11
    D                              C                               12

    The tree diagram illustrates that there are 4 · 3 = 12 possible permutations of four things taken two at a time. Suppose that n is the number of distinct objects from which an ordered arrangement is to be derived, and r is the number of objects in the arrangement. The number of possible ordered arrangements is the number of permutations of n things taken r at a time. This is written symbolically as $P(n, r)$ in general, or ${}_nP_r$.

    $$P(n, r) = n(n-1)(n-2)\cdots(n-r+1) \qquad (1)$$

    We multiply the right-hand side of equation (1) by

    $$\frac{(n-r)!}{(n-r)!}$$

    This is equivalent to multiplying by 1, and we obtain

    $$P(n, r) = \frac{n(n-1)(n-2)\cdots(n-r+1)(n-r)!}{(n-r)!} = \frac{n!}{(n-r)!}$$

    Example 5

    i) In a stock room, 5 adjacent bins are available for storing 5 different items. The

    stock of each item can be stored satisfactorily in any bin. In how many ways can

    we assign the 5 items to the 5 bins?

    We get the answer by evaluating P(5, 5), which is

    $$P(5, 5) = \frac{5!}{(5-5)!} = 5 \cdot 4 \cdot 3 \cdot 2 \cdot 1 = 120$$

    ii) Suppose that there are 6 different parts to be stocked, but only 4 bins are

    available.

    To find the number of possible arrangements, we need to determine the number of permutations of 6 things taken 4 at a time, which is

    $$P(6, 4) = \frac{6!}{(6-4)!} = \frac{6 \cdot 5 \cdot 4 \cdot 3 \cdot 2 \cdot 1}{2!} = 360$$

    Example 6

    How many permutations are there of 3 objects, say, a, b and c?

    There are $P(3, 3) = \frac{3!}{(3-3)!} = 3! = 1 \cdot 2 \cdot 3 = 6$ such permutations.

    These are abc, acb, bac, bca, cab, cba.
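    The permutation counts in Examples 4 to 6 can be checked with Python's standard library. `math.perm` (available from Python 3.8) computes n!/(n − r)!, and `itertools.permutations` lists the arrangements themselves.

```python
import math
from itertools import permutations

print(math.perm(4, 2))  # 12  (Example 4: four secretaries, two positions)
print(math.perm(5, 5))  # 120 (Example 5 i: five items, five bins)
print(math.perm(6, 4))  # 360 (Example 5 ii: six parts, four bins)

# Example 6: all arrangements of a, b, c
arrangements = ["".join(p) for p in permutations("abc")]
print(arrangements)  # ['abc', 'acb', 'bac', 'bca', 'cab', 'cba']
```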

    Permutations with repetitions:

    The number of permutations of n objects of which $n_1$ are alike, $n_2$ are alike of another kind, ..., $n_r$ are alike of a further kind, is given by

    $$\frac{n!}{n_1! \, n_2! \cdots n_r!}, \quad \text{where } n_1 + n_2 + \cdots + n_r = n$$

    Example 7

    Find the number of permutations of the letters of the word ACCOUNTANTS.

    The total number of letters in ACCOUNTANTS is 11, of which there are two A's, two C's, two N's, and two T's. So the required number of permutations is

    $$\frac{11!}{2! \, 2! \, 2! \, 2!} = 2{,}494{,}800$$
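    The formula for permutations with repeated letters can be sketched as follows. The function name `distinct_arrangements` is a hypothetical helper; it divides n! by the factorial of each letter's count.

```python
from collections import Counter
from math import factorial

def distinct_arrangements(word):
    """n! divided by the factorial of each repeated letter's count."""
    result = factorial(len(word))
    for count in Counter(word).values():
        result //= factorial(count)
    return result

print(Counter("ACCOUNTANTS"))                # A, C, N and T each appear twice
print(distinct_arrangements("ACCOUNTANTS"))  # 2494800
```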

    Combinations

    A combination is a selection of objects without regard to order.

    Example 8

    The combinations of the letters a, b, c, d taken 3 at a time are

    {a, b, c}, {a, b, d}, {a, c, d}, {b, c, d}, or simply

    abc, abd, acd, bcd. Observe that the following arrangements are the same combination:

    abc, acb, bac, bca, cab, cba.

    That is, each denotes the same set {a, b, c}.

    The number of combinations of n objects taken r at a time will be denoted by $C(n, r)$ or ${}_nC_r$.

    Example 9

    We determine the number of combinations of the four letters, a, b, c, d taken 3 at a time.

    Note that each combination consisting of three letters determines 3! = 6 permutations of the letters in the combination.

    Combinations    Permutations
    abc             abc, acb, bac, bca, cab, cba
    abd             abd, adb, bad, bda, dab, dba
    acd             acd, adc, cad, cda, dac, dca
    bcd             bcd, bdc, cbd, cdb, dbc, dcb

    Thus the number of combinations multiplied by 3! equals the number of permutations:

    $$C(4, 3) \cdot 3! = P(4, 3) \quad \text{or} \quad C(4, 3) = \frac{P(4, 3)}{3!}$$

    Now $P(4, 3) = 4 \cdot 3 \cdot 2 = 24$ and $3! = 6$; hence $C(4, 3) = 4$, as noted above.

    Thus

    $$C(n, r) = \frac{n!}{r!\,(n-r)!}$$

    Example 10

    A perfume manufacturer who makes 10 fragrances wants to prepare a gift package

    containing 6 fragrances. How many combinations of fragrances are available?

    The answer is

    $$C(10, 6) = \frac{10!}{6!\,(10-6)!} = \frac{10 \cdot 9 \cdot 8 \cdot 7 \cdot 6!}{6! \cdot 4 \cdot 3 \cdot 2 \cdot 1} = 210$$
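    The combination counts in Examples 9 and 10 can be verified with `math.comb`, and the relation between combinations and permutations can be checked directly. This is only a quick sketch using the standard library.

```python
import math
from itertools import combinations

# Example 9: combinations of a, b, c, d taken 3 at a time
triples = ["".join(c) for c in combinations("abcd", 3)]
print(triples)          # ['abc', 'abd', 'acd', 'bcd']
print(math.comb(4, 3))  # 4

# Each combination of 3 letters gives 3! permutations
print(math.comb(4, 3) * math.factorial(3) == math.perm(4, 3))  # True

# Example 10: 6 fragrances chosen from 10
print(math.comb(10, 6))  # 210
```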

    Tree Diagrams

    A tree diagram is a device used to enumerate all the possible outcomes of a sequence of

    experiments where each experiment can occur in a finite number of ways. The

    construction of tree diagrams is illustrated in the following examples.

    Example 11

    Find the product A × B × C, where A = {1, 2}, B = {a, b, c} and C = {3, 4}. The tree diagram follows:

    0 -- 1 -- a -- 3    (1, a, 3)
                   4    (1, a, 4)
              b -- 3    (1, b, 3)
                   4    (1, b, 4)
              c -- 3    (1, c, 3)
                   4    (1, c, 4)
         2 -- a -- 3    (2, a, 3)
                   4    (2, a, 4)
              b -- 3    (2, b, 3)
                   4    (2, b, 4)
              c -- 3    (2, c, 3)
                   4    (2, c, 4)

    Observe that the tree is constructed from left to right, and that the number of branches at each point corresponds to the number of possible outcomes of the next experiment.
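    The twelve end points of the tree in Example 11 are exactly the elements of the product A × B × C. A minimal sketch with `itertools.product`, which walks the tree in the same left-to-right order:

```python
from itertools import product

A = [1, 2]
B = ["a", "b", "c"]
C = [3, 4]

triples = list(product(A, B, C))
print(len(triples))  # 12 = 2 * 3 * 2, by the fundamental principle of counting
for t in triples:
    print(t)         # (1, 'a', 3), (1, 'a', 4), ..., (2, 'c', 4)
```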

    Example 12

    Mumba and Ened are to play a tennis tournament. The first person to win two games in a

    row or who wins a total of three games wins the tournament. The following diagram

    shows the possible outcomes of the tournament.

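    [Figure: Tree diagram of the tournament. Starting from the root 0, each branch is labelled M (Mumba wins the game) or E (Ened wins the game).]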

    Observe that there are 10 end points which correspond to the 10 possible outcomes of the

    tournament.

    MM, MEMM, MEMEM, MEMEE, MEE, EMM, EMEMM, EMEME, EMEE, EE

    The path from the beginning of the tree to the end point indicates who won which game

    in the individual tournament.
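    The ten outcomes of Example 12 can be generated by following the tree recursively. The function below is a sketch with a hypothetical name; it stops a branch as soon as the last winner has won two games in a row or three games in total.

```python
def tournament_outcomes(history=""):
    """List every game sequence (M = Mumba wins, E = Ened wins) that ends the tournament."""
    if history:
        last = history[-1]
        # Only the winner of the latest game can have just won the tournament
        if history.endswith(last * 2) or history.count(last) == 3:
            return [history]
    return tournament_outcomes(history + "M") + tournament_outcomes(history + "E")

outcomes = tournament_outcomes()
print(len(outcomes))  # 10
print(outcomes)
```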

    Basics of Probability

    Given a sample space S, we need to assign to each event that can be obtained from S a number, called the probability of the event. This number will indicate the relative likelihood of the various events.

    For events that are equally likely, the probability of the event can be found from the following basic probability principle. Let a sample space S have n possible outcomes, all equally likely, and let event E contain m of these outcomes. Then the probability that event E occurs, written P(E), is

    $$P(E) = \frac{m}{n} \qquad (1)$$

    This same result can also be given in terms of the cardinal number of a set, where n(E) represents the number of elements in a finite set E. With the same assumptions given above,

    $$P(E) = \frac{n(E)}{n(S)} \qquad (2)$$

    Example 1

    Suppose a fair coin is tossed twice. The sample space is S = {HH, HT, TH, TT}. Set S contains 4 outcomes, all of which are equally likely. (This makes n = 4 in formula (1) above.) Find the probability of the following outcomes.

    a) E = {HT, TH}

    Event E contains two elements, so

    $$P(E) = \frac{2}{4} = \frac{1}{2}$$

    By this result, exactly one head and one tail will show up 1/2 of the time when a fair coin is tossed twice.

    b) Two heads

    Let event F = {HH} be the event that two heads are observed when a fair coin is tossed twice. Event F contains one element, so

    $$P(F) = \frac{1}{4}$$

    c) Three heads

    A fair coin tossed twice can never show three heads. If G is the event, then $G = \emptyset$, and $P(G) = \frac{0}{4} = 0$.

    The event is impossible.
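    Formula (2), P(E) = n(E)/n(S), can be applied by listing the sample space and counting. The sketch below does this for the two-toss experiment of Example 1; the helper name `probability` is illustrative, and it is only valid when all outcomes are equally likely.

```python
from fractions import Fraction
from itertools import product

# Sample space for two tosses of a fair coin: HH, HT, TH, TT
sample_space = ["".join(t) for t in product("HT", repeat=2)]

def probability(event):
    """n(E) / n(S) for equally likely outcomes; `event` is a true/false test on an outcome."""
    favourable = [o for o in sample_space if event(o)]
    return Fraction(len(favourable), len(sample_space))

print(probability(lambda o: o in ("HT", "TH")))  # 1/2
print(probability(lambda o: o == "HH"))          # 1/4
print(probability(lambda o: o.count("H") == 3))  # 0 (impossible event)
```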

    Example 2

    If a single playing card is drawn at random from an ordinary 52-card bridge deck, find the probability of each of the following events.

    a) An ace is drawn

    There are four aces in the deck, out of 52 cards, so

    $$P(\text{ace}) = \frac{4}{52} = \frac{1}{13}$$

    b) A face card is drawn

    Since there are 12 face cards,

    $$P(\text{face card}) = \frac{12}{52} = \frac{3}{13}$$

    c) A spade is drawn

    The deck contains 13 spades, so

    $$P(\text{spade}) = \frac{13}{52} = \frac{1}{4}$$

    d) A spade or heart is drawn

    Besides the 13 spades, the deck contains 13 hearts, so

    $$P(\text{spade or heart}) = \frac{26}{52} = \frac{1}{2}$$

    Example 3

    The Manager of a department store has decided to make a study of the size of purchases made by people coming into the store. To begin, he chooses a day that seems fairly typical and gathers the following data. (Purchases have been rounded to the nearest Kwacha, with sales tax ignored.)

    Amount of purchase              Number of customers    Probability (relative frequency)
    K0 and under K2250              160                    0.280
    K2250 and under K11250          84                     0.147
    K11250 and under K13500         50                     0.088
    K13500 and under K20250         136                    0.239
    K20250 and under K22500         77                     0.135
    K22500 and over                 63                     0.111
    Total                           570                    1.000

    Probability Distributions.

    In Example 3 the outcomes were various purchase amounts, and a probability was

    assigned to each outcome. By this process, a probability distribution can be set up; that is

    to each possible outcome of an experiment, a number, called the probability of that

    outcome, is assigned.

    Example 4

    Set up a probability distribution for the number of heads observed when a fair coin is tossed twice.

    Number of heads    Probability
    0                  1/4
    1                  2/4
    2                  1/4
    Total              1

    The probability distribution that was set up suggests the following properties of

    probability.

    Let $S = \{S_1, S_2, S_3, \ldots, S_n\}$ be the sample space obtained from the union of n distinct simple events $S_1, S_2, S_3, \ldots, S_n$ with associated probabilities $P_1, P_2, P_3, \ldots, P_n$. Then

    1. $0 \le P_1 \le 1$, $0 \le P_2 \le 1$, ..., $0 \le P_n \le 1$

    (All probabilities are between 0 and 1 inclusive);

    2. $P_1 + P_2 + P_3 + \cdots + P_n = 1$

    (The sum of all probabilities for a sample space is 1);

    3. $P(S) = 1$

    4. $P(\emptyset) = 0$

    Addition Principle

    Suppose $E = \{S_1, S_2, \ldots, S_n\}$, where $S_1, S_2, \ldots, S_n$ are distinct simple events. Then

    $$P(E) = P(S_1) + P(S_2) + \cdots + P(S_n)$$

    Example 5

    Refer to Example 3 and find the probability that a customer spends at least K11,250 but less than K20,250.

    This event is the union of two simple events: spending K11,250 and under K13,500, and spending K13,500 and under K20,250. The probability of spending at least K11,250 but less than K20,250 can thus be found by the addition principle. Let this event be A; then

    P(A) = P(spending K11,250 and under K13,500) + P(spending K13,500 and under K20,250) = 0.088 + 0.239 = 0.327

    Addition for Mutually Exclusive Events .

    For mutually exclusive events E and F

    $$P(E \cup F) = P(E) + P(F)$$

    Example 6

    Use the probability distribution of Example 4 to find the probability that we get at least one head on tossing a fair coin twice.

    Event E, at least one head, is the union of three mutually exclusive events: two heads, one head then one tail, and one tail then one head.

    $$P(E) = P(\text{2 heads}) + 2P(\text{one head, one tail in a given order}) = \frac{1}{4} + \frac{2}{4} = \frac{3}{4}$$

    Complement: $P(E) = 1 - P(E')$ and $P(E') = 1 - P(E)$

    In a particular experiment, $P(E) = \frac{3}{8}$. Find $P(E')$.

    $$P(E') = 1 - P(E) = 1 - \frac{3}{8} = \frac{5}{8}$$

    Example 7

    In Example 3 above, find the probability that a customer spends less than K22,500. Let E be the event that a customer spends less than K22,500.

    P(E) = 0.280 + 0.147 + 0.088 + 0.239 + 0.135 = 0.889

    Alternatively, E' is the event that a customer spends K22,500 and over. From the table,

    P(E') = 0.111, and P(E) = 1 − P(E') = 1 − 0.111 = 0.889

    Odds

    The odds in favor of an event E are defined as the ratio of P(E) to P(E'), or $\dfrac{P(E)}{P(E')}$.

    Example 8

    Suppose the weather forecaster says that the probability of rain tomorrow is $\frac{2}{5}$. Find the odds in favor of rain tomorrow.

    Let E be the event rain tomorrow. Then E' is the event no rain tomorrow. Since $P(E) = \frac{2}{5}$, we have $P(E') = \frac{3}{5}$. By the definition of odds,

    $$\text{odds in favor of rain} = \frac{2/5}{3/5} = \frac{2}{3},$$

    written 2 to 3 or 2:3.

    In general, if the odds favoring event E are m to n, then

    $$P(E) = \frac{m}{m+n} \quad \text{and} \quad P(E') = \frac{n}{m+n}$$

    Example 9

    The odds that a particular bid will be the low bid are 8 to 13. Find the probability that the

    bid will be the low bid.

    Solution

    Odds of 8 to 13 show 8 favorable chances out of 8 + 13 = 21 chances altogether.

    $$P(\text{bid will be low bid}) = \frac{8}{8 + 13} = \frac{8}{21}$$

    There is a $\frac{13}{21}$ chance that the bid will not be the low bid.
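    Converting between odds and probabilities is simple arithmetic, and exact fractions avoid rounding. The two helper names below are hypothetical; the figures are those of Examples 8 and 9.

```python
from fractions import Fraction

def odds_in_favour(p):
    """Return (m, n) such that the odds in favour are m to n, given P(E) = p."""
    ratio = p / (1 - p)  # P(E) / P(E')
    return ratio.numerator, ratio.denominator

def probability_from_odds(m, n):
    """P(E) = m / (m + n) when the odds in favour of E are m to n."""
    return Fraction(m, m + n)

print(odds_in_favour(Fraction(2, 5)))     # (2, 3): odds of rain are 2 to 3
print(probability_from_odds(8, 13))       # 8/21
print(1 - probability_from_odds(8, 13))   # 13/21
```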

    Extended Addition Principle

    For any two events E and F from a sample space S,

    $$P(E \cup F) = P(E) + P(F) - P(E \cap F)$$


    Example 10.

    If a single card is drawn from an ordinary deck, find the probability that it will be red or a

    face card.

    Let R and F represent the events red and face card respectively. Then

    $$P(R) = \frac{26}{52}, \quad P(F) = \frac{12}{52}, \quad \text{and} \quad P(R \cap F) = \frac{6}{52}$$

    (There are six red face cards in a deck.) By the extended addition principle,

    $$P(R \cup F) = P(R) + P(F) - P(R \cap F) = \frac{26}{52} + \frac{12}{52} - \frac{6}{52} = \frac{32}{52} = \frac{8}{13}$$

    Example 11

    Suppose two fair dice are rolled. Find each of the following probabilities.

    a) The first die shows a 2 or the sum is 6

    Let A be the event that the first die shows a 2 and B the event that the sum is 6. The 36 equally likely outcomes, written (first die, second die), are:

    (1,1) (2,1) (3,1) (4,1) (5,1) (6,1)

    (1,2) (2,2) (3,2) (4,2) (5,2) (6,2)

    (1,3) (2,3) (3,3) (4,3) (5,3) (6,3)

    (1,4) (2,4) (3,4) (4,4) (5,4) (6,4)

    (1,5) (2,5) (3,5) (4,5) (5,5) (6,5)

    (1,6) (2,6) (3,6) (4,6) (5,6) (6,6)

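    $$P(A) = \frac{6}{36}, \quad P(B) = \frac{5}{36}, \quad P(A \cap B) = \frac{1}{36}$$

    By the extended addition principle,

    $$P(A \cup B) = P(A) + P(B) - P(A \cap B) = \frac{6}{36} + \frac{5}{36} - \frac{1}{36} = \frac{10}{36} = \frac{5}{18}$$

    b) The sum is 5 or the second die is 4.

    $$P(\text{sum is 5}) = \frac{4}{36}, \quad P(\text{second die is 4}) = \frac{6}{36}, \quad P(\text{sum is 5 and second die is 4}) = \frac{1}{36}$$

    $$P(\text{sum is 5 or second die is 4}) = \frac{4}{36} + \frac{6}{36} - \frac{1}{36} = \frac{9}{36} = \frac{1}{4}$$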
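    The extended addition principle can be checked against a direct count over the 36 outcomes of two dice. The sketch below uses illustrative names and represents each roll as a pair (first die, second die).

```python
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 7), repeat=2))  # 36 equally likely outcomes

def prob(event):
    """Probability of an event given as a true/false test on a roll."""
    return Fraction(sum(1 for r in rolls if event(r)), len(rolls))

def union_by_addition(e, f):
    """P(E or F) = P(E) + P(F) - P(E and F)."""
    return prob(e) + prob(f) - prob(lambda r: e(r) and f(r))

first_is_2 = lambda r: r[0] == 2
sum_is_6 = lambda r: r[0] + r[1] == 6
sum_is_5 = lambda r: r[0] + r[1] == 5
second_is_4 = lambda r: r[1] == 4

part_a = union_by_addition(first_is_2, sum_is_6)
part_b = union_by_addition(sum_is_5, second_is_4)
print(part_a, part_b)  # 5/18 1/4

# Direct count of the union gives the same answer
print(part_a == prob(lambda r: first_is_2(r) or sum_is_6(r)))  # True
```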

    Often we are interested in how certain events are related to the occurrence of

    other events. In particular, we may be interested in the probability of the

    occurrence of an event given that another related event has occurred. Such

    probabilities are referred to as Conditional Probabilities.

    The conditional probability of event E given event F, written $P(E \mid F)$, is

    $$P(E \mid F) = \frac{P(E \cap F)}{P(F)}, \quad P(F) \ne 0$$


    Example 11

    The Training Manager for a large stockbrokerage firm has noticed that some of the firm's brokers use the firm's research advice, while other brokers tend to go with their own feelings of which stocks will go up. To see if the research department is better than just the feelings of the brokers, the manager conducted a survey of 100 brokers, with results as shown in the following table.

                           Picked stocks     Didn't pick stocks    Total
                           that went up      that went up
    Used research          30                15                    45
    Didn't use research    30                25                    55
    Totals                 60                40                    100

    Letting A represent the event picked stocks that went up, and letting B represent the event used research, we can find the following probabilities.

    $$P(A) = \frac{60}{100} = 0.6 \qquad P(A') = \frac{40}{100} = 0.4$$

    $$P(B) = \frac{45}{100} = 0.45 \qquad P(B') = \frac{55}{100} = 0.55$$

    Suppose we want to find the probability that a broker using research will pick stocks that go up. From the table above, of the 45 brokers who use research, 30 picked stocks that went up, so

    $$P(\text{broker who uses research picks stocks that go up}) = \frac{30}{45} = 0.667$$

    This is a different number from the probability that a broker picks stocks that go up, 0.6, since we have additional information (the broker uses research) which reduced the sample space. In other words, we found the probability that a broker picks stocks that go up, A, given the additional information that the broker uses research, B. This is called the conditional probability of event A, given that event B has occurred, written $P(A \mid B)$. In the example above,

    $$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{30/100}{45/100} = \frac{30}{45} = 0.667$$
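    The conditional probability in this example can be computed straight from the survey table. The dictionary keys below are made-up labels for the four cells; the counts are those of the table.

```python
from fractions import Fraction

# Survey counts: (research use, stock result) -> number of brokers
counts = {
    ("research", "up"): 30,
    ("research", "not_up"): 15,
    ("no_research", "up"): 30,
    ("no_research", "not_up"): 25,
}
total = sum(counts.values())  # 100 brokers

p_b = Fraction(counts[("research", "up")] + counts[("research", "not_up")], total)
p_a_and_b = Fraction(counts[("research", "up")], total)

# P(A | B) = P(A and B) / P(B)
p_a_given_b = p_a_and_b / p_b
print(p_b, p_a_and_b, p_a_given_b)   # 9/20 3/10 2/3
print(round(float(p_a_given_b), 3))  # 0.667
```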

    Product Rule: For any events E and F,

    $$P(E \cap F) = P(F) \cdot P(E \mid F)$$

    Example 12.

    A class is $\frac{2}{5}$ women and $\frac{3}{5}$ men. Of the women, 25% are business majors. Find the probability that a student chosen at random is a woman business major.

    Solution

    Let B and W represent the events business major and woman, respectively. We want to find $P(B \cap W)$. By the product rule,

    $$P(B \cap W) = P(W) \cdot P(B \mid W)$$

    Using the given information, $P(W) = \frac{2}{5} = 0.4$ and $P(B \mid W) = 0.25$.

    Thus $P(B \cap W) = 0.4(0.25) = 0.10$

    Example 13

    Suppose an investment firm is interested in the following events:

    A = Common stock in XYZ Corporation gains 10% next year


    B = Gross National Product gains 10% next year

    The firm has assigned the following probabilities on the basis of available information.

    $P(A \mid B) = 0.8$, $P(B) = 0.3$

    That is, the Investment Company believes the probability is 0.8 that the XYZ common

    stock will gain 10% in the next year assuming that the GNP gains 10% in the same time

    period. In addition, the company believes the probability is only 0.3 that the GNP will

    gain 10% in the next year. Use the formula for calculating the probability of an

    intersection to calculate the probability that XYZ common stock and the GNP gain 10%

    in the next year.

    Solution.

    We want to calculate $P(A \cap B)$. The formula is

    $$P(A \cap B) = P(B)\,P(A \mid B) = (0.3)(0.8) = 0.24$$

    Thus, the probability, according to this investment firm, is 0.24 that both XYZ common

    stock and the GNP will gain 10% in the next year.
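The product rule used in Examples 12 and 13 can be checked with a few lines of Python. This is only a minimal sketch: the function name `joint_probability` is an illustrative choice, and exact fractions are used so that no rounding error creeps in.

```python
from fractions import Fraction


def joint_probability(p_f, p_e_given_f):
    """Product rule: P(E and F) = P(F) * P(E | F)."""
    return p_f * p_e_given_f


# Example 12: P(W) = 2/5 and P(B | W) = 25% = 1/4
p_woman_business = joint_probability(Fraction(2, 5), Fraction(1, 4))

# Example 13: P(B) = 0.3 and P(A | B) = 0.8
p_stock_and_gnp = joint_probability(Fraction(3, 10), Fraction(8, 10))

print(float(p_woman_business))  # 0.1
print(float(p_stock_and_gnp))   # 0.24
```

Both results agree with the hand calculations, 0.10 and 0.24.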

    In the previous section we showed that the probability of an event A may be substantially

    altered by the assumption that the event B has occurred. However, this will not always

    be the case. In some instances the assumption that event B has occurred will not alter the

    probability of event A at all. When this is true, we call events A and B independent.

    Events A and B are independent if the assumption that B has occurred does not alter the probability that A has occurred, i.e.

    $$P(A \mid B) = P(A)$$

    When events A and B are independent it will also be true that

    $$P(B \mid A) = P(B)$$

    Events that are not independent are said to be dependent.

    Example 14

    The probability that interest rates will rise has been assessed as 0.8. If they do rise, the

    probability that the stock market index will drop is estimated to be 0.9. If the interest

    rates do not rise, the probability that the stock market index will still drop is estimated as

    0.4. What is the probability that the stock market index will drop?

    Solution

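    Let A be the event that interest rates rise and B the event that the stock market index drops. We are given $P(A) = 0.8$ and want $P(B)$.

    The complement of A, written $\bar{A}$ (interest rates do not rise), has probability $P(\bar{A}) = 1 - 0.8 = 0.2$.

    $P(B \mid A)$ = P(stock market index drops | interest rates rise) = 0.9

    $P(B \mid \bar{A})$ = P(stock market index drops | interest rates do not rise) = 0.4

    By the multiplication rule,

    $P(B \cap A) = P(A)\,P(B \mid A) = 0.8 \times 0.9 = 0.72$ and

    $P(B \cap \bar{A}) = P(\bar{A})\,P(B \mid \bar{A}) = 0.2 \times 0.4 = 0.08$.

    Since B must occur either together with A or together with $\bar{A}$, $P(B) = 0.72 + 0.08 = 0.80$.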
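The calculation in Example 14 adds the joint probabilities over the two mutually exclusive cases (rates rise, rates do not rise). A minimal Python sketch of that step follows; the helper name `total_probability` is an assumption made for illustration.

```python
def total_probability(priors, conditionals):
    """Sum of P(case) * P(event | case) over mutually exclusive, exhaustive cases."""
    return sum(p * c for p, c in zip(priors, conditionals))


# Cases: interest rates rise (0.8) or do not rise (0.2)
# Conditional probabilities that the index drops in each case: 0.9 and 0.4
p_drop = total_probability([0.8, 0.2], [0.9, 0.4])

print(round(p_drop, 2))  # 0.8
```

The same function works for any number of cases, provided the case probabilities sum to 1.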

    Example 15

    Suppose we toss a fair die. Let B be the event that a number less than or equal to 4 is observed and A the event that an even number is observed. Are events A and B independent?

    $P(B) = 4/6 = 2/3$, since B = {1, 2, 3, 4}

    $P(A) = 3/6 = 1/2$, since A = {2, 4, 6}

    $P(A \cap B) = 2/6 = 1/3$, where $A \cap B$ = {2, 4}

    Now, given that A has occurred,

    $$P(B \mid A) = \frac{P(A \cap B)}{P(A)} = \frac{1/3}{1/2} = \frac{2}{3} = P(B)$$

    Similarly,

    $$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{1/3}{2/3} = \frac{1}{2} = P(A)$$

    Therefore the events A and B are independent.

    If events A and B are independent, the probability of the intersection of A and B equals the product of the probabilities of A and B, i.e.,

    $$P(A \cap B) = P(A)\,P(B).$$

    In the die-toss experiment,

    $$P(A \cap B) = P(A)\,P(B) = \frac{1}{2} \cdot \frac{2}{3} = \frac{1}{3}.$$
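The independence check for the die example can also be done by listing the sample space. The sketch below assumes equally likely outcomes; the helper `prob` is an illustrative name.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}
A = {x for x in sample_space if x % 2 == 0}  # even number observed
B = {x for x in sample_space if x <= 4}      # number less than or equal to 4


def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(sample_space))


p_a, p_b, p_ab = prob(A), prob(B), prob(A & B)

# A and B are independent exactly when P(A and B) = P(A) * P(B)
independent = p_ab == p_a * p_b

print(p_a, p_b, p_ab, independent)  # 1/2 2/3 1/3 True
```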

    Bayes Theorem

    A posteriori Probabilities

    Suppose three machines, A, B and C, produce similar engine components. Machine A produces 45 percent of the total components, machine B produces 30 percent, and machine C, 25 percent. For the usual production schedule, 6 percent of the components produced by machine A do not meet established specifications; for machines B and C, the corresponding figures are 4 percent and 3 percent. One component is selected at random from the total output and is found to be defective. What is the probability that the component selected was produced by machine A?

    The answer to this question is found by calculating the probability after the outcomes of the experiment have been observed. Such probabilities are called a posteriori probabilities, as opposed to a priori probabilities, which give the likelihood that an event will occur.

    Let D be the event that a defective component is produced by machine A, machine B or machine C.

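    [Figure: diagram of the sample space partitioned into A, B and C, with the event D overlapping each region as D ∩ A, D ∩ B and D ∩ C.]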

    The three mutually exclusive events A, B and C form a partition of the sample space S: apart from being mutually exclusive, their union is precisely S.

    1. The event D may be expressed as $D = (A \cap D) \cup (B \cap D) \cup (C \cap D)$.

    2. The event that a component is defective and is produced by machine A is given by $A \cap D$.

    Thus, the a posteriori probability that a defective component selected was produced by machine A is given by $P(A \mid D) = n(A \cap D)/n(D)$, that is,

    $$P(A \mid D) = \frac{P(A \cap D)}{P(D)} = \frac{P(A \cap D)}{P(A \cap D) + P(B \cap D) + P(C \cap D)} \qquad (1)$$

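    Next, using the product rule, we may express

    $$P(A \cap D) = P(A)\,P(D \mid A), \quad P(B \cap D) = P(B)\,P(D \mid B), \quad \text{and} \quad P(C \cap D) = P(C)\,P(D \mid C)$$

    so that (1) may be expressed in the form

    $$P(A \mid D) = \frac{P(A)\,P(D \mid A)}{P(A)\,P(D \mid A) + P(B)\,P(D \mid B) + P(C)\,P(D \mid C)} \qquad (2)$$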

    which is a special case of a result known as Bayes' Theorem.

    Observe that the expression on the right of (2) involves the probabilities P(A), P(B), P(C) and the conditional probabilities $P(D \mid A)$, $P(D \mid B)$ and $P(D \mid C)$, all of which may be calculated in the usual fashion. In fact, by displaying these quantities on a tree diagram, we obtain Figure 1.0. We may compute the required probability by substituting the relevant quantities into (2), or we may make use of the following device.

    $$P(A \mid D) = \frac{\text{product of probabilities along the limb through A terminating at D}}{\text{sum of products of the probabilities along each limb terminating at D}}$$

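    Figure 1.0: Tree diagram for the machine example ($\bar{D}$ denotes a non-defective component).

    Step 1 (Machine)    Step 2 (Condition)            Probability of outcome

    P(A) = 0.45         P(D | A) = 0.06               P(A ∩ D) = P(A)·P(D | A) = 0.027
                        P($\bar{D}$ | A) = 0.94       P(A ∩ $\bar{D}$) = P(A)·P($\bar{D}$ | A) = 0.423

    P(B) = 0.30         P(D | B) = 0.04               P(B ∩ D) = P(B)·P(D | B) = 0.012
                        P($\bar{D}$ | B) = 0.96       P(B ∩ $\bar{D}$) = P(B)·P($\bar{D}$ | B) = 0.288

    P(C) = 0.25         P(D | C) = 0.03               P(C ∩ D) = P(C)·P(D | C) = 0.0075
                        P($\bar{D}$ | C) = 0.97       P(C ∩ $\bar{D}$) = P(C)·P($\bar{D}$ | C) = 0.2425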

    In either case, we obtain

    $$P(A \mid D) = \frac{(0.45)(0.06)}{(0.45)(0.06) + (0.3)(0.04) + (0.25)(0.03)} = \frac{0.027}{0.027 + 0.012 + 0.0075} = \frac{0.027}{0.0465} = 0.581$$

    Before looking at any further examples, let us state the general form of Bayes' Theorem.

    Let $A_1, A_2, \ldots, A_n$ be a partition of a sample space S and let E be an event of the experiment such that $P(E) \neq 0$. Then the posterior probability $P(A_i \mid E)$ $(1 \le i \le n)$ is given by

    $$P(A_i \mid E) = \frac{P(A_i)\,P(E \mid A_i)}{P(A_1)\,P(E \mid A_1) + P(A_2)\,P(E \mid A_2) + \cdots + P(A_n)\,P(E \mid A_n)} \qquad (3)$$
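The general form of Bayes' Theorem translates directly into a short routine. The following Python sketch is an illustration only: the function name `posterior` and the list-based inputs are assumptions, and the data are those of the three-machine example.

```python
def posterior(priors, likelihoods):
    """Return P(A_i | E) for each A_i in a partition of the sample space.

    priors      -- P(A_1), ..., P(A_n)
    likelihoods -- P(E | A_1), ..., P(E | A_n)
    """
    joint = [p * l for p, l in zip(priors, likelihoods)]  # P(A_i) * P(E | A_i)
    p_e = sum(joint)                                      # denominator, P(E)
    if p_e == 0:
        raise ValueError("P(E) must be non-zero")
    return [j / p_e for j in joint]


# Machines A, B, C: shares of output and defect rates
machine_posteriors = posterior([0.45, 0.30, 0.25], [0.06, 0.04, 0.03])

print([round(p, 3) for p in machine_posteriors])  # [0.581, 0.258, 0.161]
```

The first entry reproduces the value 0.581 found above for machine A; the three posterior probabilities sum to 1, as they must for a partition.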

    Problems

    1) In a certain city, 40 percent of the people consider themselves supporters of the Movement for Multiparty Democracy (MMD), 35 percent consider themselves supporters of the United Party for National Development (UPND) and 25 percent consider themselves independents (I). During a particular election, 45 percent of the MMD supporters voted, 40 percent of the UPND supporters voted and 60 percent of the independents voted. Suppose a person is randomly selected:

    a) Find the probability that the person voted.

    b) If the person voted, find the probability that the voter is i) MMD ii) UPND iii) Independent.

    2) Three girls, Chanda, Mumba and Chileshe, pack okra in a factory. From the batch allotted to them, Chanda packs 55%, Mumba 30% and Chileshe 15%. The probability that Chanda breaks some okra in a packet is 0.7, and the respective probabilities for Mumba and Chileshe are 0.2 and 0.1. What is the probability that a packet with broken okra found by the checker was packed by

    a) Chanda? b) Mumba? c) Chileshe?

    3) A publisher sends advertising material for an accounting text to 80% of all professors teaching the appropriate accounting courses. Thirty percent of the professors who received this material adopted the book, as did 10% of the professors who did not receive the material. What is the probability that a professor who adopts the book has received the advertising material?

    Solutions

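    1. Let M, U and I denote the events that the person is MMD, UPND and independent, respectively, and let V be the event that the person voted.

    $P(M) = 0.40$, $P(U) = 0.35$, $P(I) = 0.25$

    $P(V \mid M) = 0.45$, $P(V \mid U) = 0.40$, $P(V \mid I) = 0.60$

    a) $P(V) = P(M)\,P(V \mid M) + P(U)\,P(V \mid U) + P(I)\,P(V \mid I)$

    = 0.40(0.45) + 0.35(0.40) + 0.25(0.60)

    = 0.18 + 0.14 + 0.15 = 0.47

    b) i) $$P(M \mid V) = \frac{P(M \cap V)}{P(V)} = \frac{P(M)\,P(V \mid M)}{P(M)\,P(V \mid M) + P(U)\,P(V \mid U) + P(I)\,P(V \mid I)} = \frac{0.18}{0.47} = 0.383$$

    ii) $$P(U \mid V) = \frac{P(U \cap V)}{P(V)} = \frac{P(U)\,P(V \mid U)}{P(V)} = \frac{0.14}{0.47} = 0.298$$

    iii) $$P(I \mid V) = \frac{0.15}{0.47} = 0.319$$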

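    2. Let D, M and H denote the events that the packet was packed by Chanda, Mumba and Chileshe, respectively, and let B be the event that the packet contains broken okra.

    $P(D) = 0.55$, $P(M) = 0.30$, $P(H) = 0.15$

    $P(B \mid D) = 0.7$, $P(B \mid M) = 0.2$, $P(B \mid H) = 0.1$

    $P(B) = P(D)\,P(B \mid D) + P(M)\,P(B \mid M) + P(H)\,P(B \mid H)$

    = 0.55(0.7) + 0.30(0.2) + 0.15(0.1)

    = 0.385 + 0.06 + 0.015 = 0.46

    a) $$P(D \mid B) = \frac{P(D)\,P(B \mid D)}{P(B)} = \frac{0.385}{0.46} = 0.837$$

    b) $$P(M \mid B) = \frac{P(M)\,P(B \mid M)}{P(B)} = \frac{0.06}{0.46} = 0.1304$$

    c) $$P(H \mid B) = \frac{P(H)\,P(B \mid H)}{P(B)} = \frac{0.015}{0.46} = 0.0326$$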

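    3. Let R be the event that the professor received the material and A the event that the professor adopted the book. Then

    $P(R) = 0.8$, $P(A \mid R) = 0.30$, $P(\bar{A} \mid R) = 0.70$

    $P(\bar{R}) = 0.2$, $P(A \mid \bar{R}) = 0.10$, $P(\bar{A} \mid \bar{R}) = 0.90$

    $$P(R \mid A) = \frac{P(R \cap A)}{P(A)} = \frac{P(R)\,P(A \mid R)}{P(R)\,P(A \mid R) + P(\bar{R})\,P(A \mid \bar{R})} = \frac{0.8(0.30)}{0.8(0.30) + 0.2(0.10)} = \frac{0.24}{0.24 + 0.02} = \frac{0.24}{0.26} = 0.923.$$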

    Learning Objectives

    After working through this Chapter, you should be able to

    List the rules of probability.

    Explain conditional probability, independent events and mutually exclusive events.

    Apply Bayes' Theorem to find conditional probabilities.

    Define combinations and permutations and apply these results to problems.

    CHAPTER 5

    PROBABILITY DISTRIBUTION

    Reading

    Newbold Chapters 4 (not 4.4) and only 5.5 in Chapter 5

    Wonnacott and Wonnacott Chapter 4

    Tailoka Frank P Chapter 9

    Introductory Comments

    This chapter introduces three useful standard distributions: two for counts (discrete probability distributions) and one continuous probability distribution. These are so often used that everyone should be familiar with them. We need to know the mean, the variance and how to find simple probabilities.

    5.0 Discrete Random Variables

    A random variable may be defined roughly as a variable that takes on different numerical values because of chance. Random variables are classified as either discrete or continuous. A discrete random variable is one that can take on only a finite or countable number of distinct values. For example, the number of people entering a shop is countable (the values are 0, 1, 2, etc.), and the outcomes on one roll of a fair die are limited to 1, 2, 3, 4, 5 and 6.

    A random variable is said to be continuous in a given range if the variable can assume any value in any given interval. A continuous variable can be measured with any degree of accuracy by using smaller and smaller units of measurement. Examples of continuous variables include weight, length, velocity, distance, time and temperature. While discrete variables can be counted, continuous variables can only be measured with some degree of accuracy.

    A probability distribution of a discrete random variable X, whose value at x is $f(x)$, possesses the following properties.

    1. $f(x) \ge 0$ for all real values of x

    2. $\sum_{x} f(x) = 1$

    Property 1 simply states that probabilities are greater than or equal to zero. Property 2 states that the sum of the probabilities in a probability distribution is equal to 1. The notation $\sum_{x} f(x)$ means the sum of the values of f over all the values that x takes on. We will ordinarily use the term probability distribution to refer to both discrete and continuous variables; other terms, such as probability function, are sometimes used as well.

    Probability distributions of discrete random variables are often referred to as

    probability mass functions or simply mass functions because the probabilities are

    massed at distinct points, for example along the x axis.

    Probability distributions of continuous random variables are referred to as

    probability density functions or density functions.

    5.1 Cumulative Distribution Functions

    Given a random variable X, the value of the cumulative distribution function at x, denoted $F(x)$, is the probability that X takes on values less than or equal to x. Hence

    $$F(x) = P(X \le x) \qquad (1)$$

    In the case of a discrete random variable, it is clear that

    $$F(c) = \sum_{x \le c} f(x) \qquad (2)$$

    The symbol $\sum_{x \le c} f(x)$ means the sum of the values of $f(x)$ for all values of x less than or equal to c.

    Example 1

    Shoprite is interested in diversifying its product line into the soft goods market. Mr Phiri, Vice President in charge of mergers and acquisitions, is negotiating the acquisition of Quicksave, a discount shop. To determine the price Shoprite would have to pay per share for Quicksave, he sets up the probability distribution for the stock price shown in the table below.

    Probability distribution and cumulative distribution for the price of Quicksave common stock.

    Price of Quicksave      Probability     Cumulative probability
    common stock, x         f(x)            F(x)

    K74 250                 0.08            0.08
    K76 500                 0.15            0.23
    K78 750                 0.53            0.76
    K81 000                 0.20            0.96
    K83 250                 0.04            1.00

    The probability that the price would be K78 750 or less is

    $$P(x \le K78\,750) = F(K78\,750) = 0.08 + 0.15 + 0.53 = 0.76$$

    Similarly,

    $$P(x \le K76\,500) = F(K76\,500) = 0.23$$
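The cumulative column of the table is simply a running total of the probability column. A minimal Python sketch under that reading follows; the variable names and the helper `cdf` are illustrative, and results are rounded to two decimals to avoid floating-point noise.

```python
from itertools import accumulate

prices = [74250, 76500, 78750, 81000, 83250]  # price of Quicksave stock, in kwacha
probs = [0.08, 0.15, 0.53, 0.20, 0.04]        # f(x)

# Running totals of f(x) give F(x) at each listed price
cumulative = [round(c, 2) for c in accumulate(probs)]


def cdf(c):
    """F(c) = sum of f(x) over all listed prices x <= c."""
    return round(sum(p for x, p in zip(prices, probs) if x <= c), 2)


print(cumulative)  # [0.08, 0.23, 0.76, 0.96, 1.0]
print(cdf(78750))  # 0.76
```

Note that `cdf` is constant between listed prices, for example `cdf(80000)` equals `cdf(78750)`, which is why the graph of F(x) is a step function.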

    A graph of the cumulative distribution function is a step function; that is, the values change in discrete steps at the indicated values of the random variable x.

    [Figure: Graph of the cumulative distribution of the price of Quicksave common stock. Vertical axis: F(x), from 0.00 to 1.00; horizontal axis: price of stock x, from K74 250 to K83 250.]

    5.2 Probability Distribution of Discrete Random Variables

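    We will discuss the binomial and Poisson probability distributions of discrete random variables. First, the mean (expected value) of a discrete random variable x is

    $$\mu = E(x) = \sum_{\text{all } x} x\,P(x)$$

    The variance of a discrete random variable x is

    $$\sigma^2 = E[(x - \mu)^2] = \sum_{\text{all } x} (x - \mu)^2\,p(x)$$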

    In general, if g(x) is any function of the discrete random variable x, then

    $$E[g(x)] = \sum_{\text{all } x} g(x)\,P(X = x)$$

    For example

    $$E(20x) = \sum 20x\,P(X = x)$$

    $$E(x^2) = \sum x^2\,P(X = x)$$

    $$E(X - 5) = \sum (x - 5)\,P(X = x)$$

    Example 2

    The random variable X has the following distribution for x = 1, 2, 3, 4.

    x            1       2       3       4
    P(X = x)     0.02    0.35    0.53    0.10

    Calculate:

    a) $E(x)$

    b) $E(5x - 3)$

    c) $E(X^2)$

    d) $6E(x) + 8$

    e) $E(5x^2 + 2)$

    Solution

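    a) $E(x) = \sum x\,P(X = x)$

    = 1(0.02) + 2(0.35) + 3(0.53) + 4(0.10)

    = 0.02 + 0.70 + 1.59 + 0.40

    = 2.71

    b) $E(5x - 3) = 5E(x) - 3 = 5\sum x\,P(X = x) - 3$

    = 5[1(0.02) + 2(0.35) + 3(0.53) + 4(0.10)] − 3

    = 5(2.71) − 3

    = 13.55 − 3

    = 10.55

    c) $E(X^2) = \sum x^2\,P(X = x)$

    = $1^2(0.02) + 2^2(0.35) + 3^2(0.53) + 4^2(0.10)$

    = 0.02 + 1.4 + 4.77 + 1.6

    = 7.79

    d) $6E(x) + 8 = 6\sum x\,P(X = x) + 8$ = 6(2.71) + 8 = 16.26 + 8

    = 24.26

    e) $E(5x^2 + 2) = 5E(x^2) + 2 = 5\sum x^2\,P(X = x) + 2$

    = 5(7.79) + 2

    = 40.95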
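The expectations in Example 2 all follow the same pattern: multiply g(x) by P(X = x) and add. The Python sketch below illustrates this with a hypothetical helper `expectation`; the dictionary holds the distribution given in the example.

```python
def expectation(dist, g=lambda x: x):
    """E[g(X)] = sum of g(x) * P(X = x) over all values x."""
    return sum(g(x) * p for x, p in dist.items())


dist = {1: 0.02, 2: 0.35, 3: 0.53, 4: 0.10}

e_x = expectation(dist)                                  # part a)
e_5x_minus_3 = expectation(dist, lambda x: 5 * x - 3)    # part b)
e_x2 = expectation(dist, lambda x: x ** 2)               # part c)
e_5x2_plus_2 = expectation(dist, lambda x: 5 * x ** 2 + 2)  # part e)

print(round(e_x, 2), round(e_5x_minus_3, 2), round(e_x2, 2), round(e_5x2_plus_2, 2))
# 2.71 10.55 7.79 40.95
```

Part d) is then `6 * e_x + 8`, which gives 24.26.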

    In general, the following results hold when X is a discrete random variable.

    1) $E(a) = a$, where a is any constant.

    2) $E(aX) = aE(X)$, where a is any constant.

    3) $E(aX + b) = aE(X) + b$, where a and b are any constants.

    4) $E[f_1(x) + f_2(x)] = E[f_1(x)] + E[f_2(x)]$, where $f_1$ and $f_2$ are functions of X.

    Variance, Var(x)

    As for the variance, the following results are useful.

    1) $Var(a) = 0$, where a is any constant.

    2) $Var(ax) = a^2\,Var(x)$, where a is any constant.

    3) $Var(ax + b) = a^2\,Var(x)$, where a and b are any constants.

    Example 3

    For the data in Example 2, calculate the following:

    a) $Var(5x + 3)$

    b) $Var(4x)$

    c) $Var(3x + 2)$

    Solution

    a) $Var(5x + 3) = 25\,Var(x)$

    We will need to find $Var(x) = E(x^2) - [E(x)]^2$.

    $E(X) = \sum x\,P(X = x) = 2.71$

    $E(X^2) = \sum x^2\,P(X = x) = 7.79$

    $Var(x) = E(X^2) - [E(X)]^2 = 7.79 - (2.71)^2 = 0.4459$

    Therefore $Var(5x + 3) = 25\,Var(x) = 25(0.4459) = 11.1475$

    b) $Var(4x) = 16\,Var(x) = 16(0.4459) = 7.1344$

    c) $Var(3x + 2) = 9\,Var(x) = 9(0.4459) = 4.0131$
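The rule Var(ax + b) = a² Var(x) can be checked numerically by computing the variance of the transformed variable directly. The sketch below is illustrative only; the function names are assumptions, and the additive constant b is arbitrary because it does not affect the variance.

```python
def variance(dist):
    """Var(X) = E(X^2) - [E(X)]^2 for a discrete distribution {x: P(X = x)}."""
    mean = sum(x * p for x, p in dist.items())
    mean_sq = sum(x ** 2 * p for x, p in dist.items())
    return mean_sq - mean ** 2


def variance_linear(dist, a, b=0.0):
    """Var(aX + b), computed directly from the definition E[(Y - E(Y))^2]."""
    mean_y = sum((a * x + b) * p for x, p in dist.items())
    return sum(((a * x + b) - mean_y) ** 2 * p for x, p in dist.items())


dist = {1: 0.02, 2: 0.35, 3: 0.53, 4: 0.10}

var_x = variance(dist)
print(round(var_x, 4))                          # 0.4459
print(round(variance_linear(dist, 5, 3), 4))    # 11.1475, i.e. 25 * Var(x)
print(round(variance_linear(dist, 4), 4))       # 7.1344, i.e. 16 * Var(x)
```

Changing b leaves every result unchanged, which is exactly what rule 3 states.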

    Example 4

    A risky investment involves paying K300,000 that will return K2,700,000 (for a net profit of K2,400,000) with probability 0.3 or K0.00 (for a net loss of K300,000) with probability 0.7. What is your expected net profit from this investment?

    Solution

    x              P(x)

    2,400,000      0.3

    −300,000       0.7

    (Note that a loss is treated as a negative profit.)

    Then $E(x) = \sum x\,P(x) = 2{,}400{,}000(0.3) + (-300{,}000)(0.7) = 720{,}000 - 210{,}000 = 510{,}000$.

    Your expected net profit on an investment of this kind is K510,000. If you were to make a very large number of such investments, some would result in a net profit of K2,400,000 and others in a net loss of K300,000. However, in the long run, your average net profit per investment would be K510,000.
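The long-run interpretation of the expected value can be illustrated by simulation. The following Python sketch is only an illustration: the number of trials and the random seed are arbitrary choices, and the simulated average will differ slightly from run to run if the seed is changed.

```python
import random

outcomes = [2_400_000, -300_000]  # net profit and net loss, in kwacha
probabilities = [0.3, 0.7]

# Exact expected net profit: sum of x * P(x)
expected_profit = sum(x * p for x, p in zip(outcomes, probabilities))

# Simulate many independent investments and average the results
rng = random.Random(0)  # fixed seed, chosen arbitrarily for repeatability
trials = 100_000
simulated = rng.choices(outcomes, weights=probabilities, k=trials)
average_profit = sum(simulated) / trials

print(round(expected_profit))  # 510000
print(round(average_profit))   # close to 510000
```

No single investment ever yields K510,000; the figure describes the average over many repetitions.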

    5.3 The Binomial Distribution

    The binomial distribution, in which there are two possible outcomes on each experimental trial, is undoubtedly the most widely applied probability distribution of a discrete random variable. It has been used to describe a large variety of processes in business and the social sciences as well as other areas. The Bernoulli process, named after James Bernoulli (1654–1705), gives rise to the binomial distribution.

    The Bernoulli process has the following characteristics.

    a) On each trial, there are two mutually exclusive possible outcomes, which are referred to as success and failure. In somewhat different language, the sample space of possible outcomes on each experimental trial is S = {failure, success}.

    b) The probability of a success will be denoted by p; p remains constant from trial to trial. The probability of a failure will be denoted by q; q is always equal to 1 − p.

    c) The trials are independent. That is, the outcome of any given trial or sequence of trials does not affect the outcomes of subsequent trials.

    Suppose we toss a coin 3 times; then we may treat each toss as one Bernoulli trial. The possible outcomes on any particular trial are a head and a tail. Assume that the appearance of a head is a success. Which outcome is labelled a success is a matter of convention: for example, we may choose to refer to the appearance of a defective item in a production process as a success, and if a series of births is treated as a Bernoulli process, the appearance of a female (male) may be classified as a success.

    Consider the experiment of tossing a fair coin three times. The possible sequences of outcomes are

    HTH, HHH, HHT, THH, TTT, THT, TTH, HTT
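The eight sequences listed above can be generated mechanically, which becomes useful when the number of tosses grows. The Python sketch below is illustrative; the variable names are assumptions.

```python
from collections import Counter
from itertools import product

# All sequences of heads (H) and tails (T) in three tosses
outcomes = ["".join(seq) for seq in product("HT", repeat=3)]

# How many sequences contain 0, 1, 2 or 3 heads (successes)
heads_count = Counter(seq.count("H") for seq in outcomes)

print(outcomes)
# ['HHH', 'HHT', 'HTH', 'HTT', 'THH', 'THT', 'TTH', 'TTT']
print(sorted(heads_count.items()))
# [(0, 1), (1, 3), (2, 3), (3, 1)]
```

There are 2 × 2 × 2 = 8 sequences in all, and three of them contain exactly two heads.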

    Since the probabilities of a success and a failure on a given trial are, respectively, p

    and q, the probability of the outcome HTH, for instance, is $P\{HTH\} = pqp = p^2 q$, where p is

    the probability of observing a head and q is the probabilit