basic statistic chap03

54
Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. Chap 3-1 Chapter 3 Numerical Descriptive Measures Statistics for Managers Using Microsoft ® Excel 4 th Edition

Upload: irving-adrian-santoso

Post on 16-Dec-2015

60 views

Category:

Documents


2 download

TRANSCRIPT

  • Chapter 3

    Numerical Descriptive MeasuresStatistics for ManagersUsing Microsoft Excel 4th Edition

  • Chapter GoalsAfter completing this chapter, you should be able to: Compute and interpret the mean, median, mode, geometric mean, and quartiles for a set of dataFind the range, variance, standard deviation, and coefficient of variation and know what these values meanConstruct and interpret a box and whiskers plot Compute and explain the correlation coefficientUse numerical measures along with graphs, charts, and tables to describe data

  • Chapter TopicsMeasures of central tendency, variation, and shapeMean, median, mode, geometric meanQuartilesRange, interquartile range, variance and standard deviation, coefficient of variationSymmetric and skewed distributionsPopulation summary measuresMean, variance, and standard deviationThe empirical ruleFive number summary and box-and-whisker plotsCoefficient of correlationEthical considerations in numerical descriptive measures

  • Summary MeasuresArithmetic MeanMedianModeDescribing Data NumericallyVarianceStandard DeviationCoefficient of VariationRangeInterquartile RangeGeometric MeanSkewnessCentral TendencyVariationShapeQuartiles

  • Measures of Central TendencyCentral TendencyArithmetic MeanMedianModeGeometric MeanOverviewMidpoint of ranked valuesMost frequently observed value

  • Arithmetic MeanThe arithmetic mean (mean) is the most common measure of central tendency

    For a sample of size n:Sample sizeObserved values

  • Arithmetic MeanMean = sum of values divided by the number of valuesAffected by extreme values (outliers)

    (continued)0 1 2 3 4 5 6 7 8 9 10Mean = 3 0 1 2 3 4 5 6 7 8 9 10Mean = 4

  • MedianNot affected by extreme values

    In an ordered array, the median is the middle number (50% above, 50% below)

    0 1 2 3 4 5 6 7 8 9 10Median = 3 0 1 2 3 4 5 6 7 8 9 10Median = 3

  • Finding the MedianThe location of the median:

    If the number of values is odd, the median is the middle numberIf the number of values is even, the median is the average of the two middle numbers

    Note that is not the value of the median, only the position of the median in the ranked data

  • ModeA measure of central tendencyValue that occurs most oftenNot affected by extreme valuesMainly used for grouped numerical data or categorical dataThere may may be no modeThere may be several modes

    0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 Mode = 90 1 2 3 4 5 6No Mode

  • Review ExampleFive houses on a hill by the beachHouse Prices: $2,000,000 500,000 300,000 100,000 100,000

  • Review Example:Summary StatisticsMean: ($3,000,000/5) = $600,000

    Median: middle value of ranked data = $300,000

    Mode: most frequent value = $100,000House Prices: $2,000,000 500,000 300,000 100,000 100,000Sum 3,000,000

  • Which measure of location is the best?Mean is generally used, unless extreme values (outliers) existThen median is often used, since the median is not sensitive to extreme values.Example: Median home prices may be reported for a region less sensitive to outliers

  • Geometric MeanGeometric meanUsed to measure the rate of change of a variable over time

    Geometric mean rate of returnMeasures the status of an investment over time

    Where Ri is the rate of return in time period i

  • Geometric Mean ExampleAn investment of $100,000 declined to $50,000 at the end of year one and rebounded to $100,000 at end of year two:50% decrease 100% increaseThe overall two-year return is zero, since it started and ended at the same level.

  • Geometric Mean ExampleUse the 1-year returns to compute the arithmetic mean and the geometric mean:Arithmetic mean rate of return:Geometric mean rate of return:Misleading resultMore accurate result(continued)

  • Geometric Mean ExampleAn investment of $100,000 declined to $50,000 at the end of year one and rebounded to $100,000 at end of year two:Returns as percents: 50% and 100% are converted to decimals -.5 and 1.00 Add 1 to each decimal yields .5 and 2 Find the geometric mean using the geomean functionSubtract 1 from the answer to get a rate of return of 0

  • QuartilesQuartiles split the ranked data into 4 segments with an equal number of values per segment25%25%25%25%The first quartile, Q1, is the value for which 25% of the observations are smaller and 75% are largerQ2 is the same as the median (50% are smaller, 50% are larger)Only 25% of the observations are greater than the third quartile

    Q1Q2Q3

  • Quartile FormulasFind a quartile by determining the value in the appropriate position in the ranked data, where

    First quartile position: Q1 = (n+1)/4

    Second quartile position: Q2 = (n+1)/2 (the median position)

    Third quartile position: Q3 = 3(n+1)/4

    where n is the number of observed values

  • Quartiles (n = 9) Q1 = is in the (9+1)/4 = 2.5 position of the ranked dataso use the value half way between the 2nd and 3rd values, Q1 = 12.5Sample Data in Ordered Array: 11 12 13 16 16 17 18 21 22 Example: Find the first quartile Q1 and Q3 are measures of noncentral location Q2 = median, a measure of central tendency

  • Measures of VariationSame center, different variationVariationVarianceStandard DeviationCoefficient of VariationRangeInterquartile RangeMeasures of variation give information on the spread or variability of the data values.

  • RangeSimplest measure of variationDifference between the largest and the smallest observations:

    Range = Xlargest Xsmallest

    0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 Range = 14 - 1 = 13Example:

  • Disadvantages of the RangeIgnores the way in which data are distributed

    Sensitive to outliers

    7 8 9 10 11 12Range = 12 - 7 = 57 8 9 10 11 12Range = 12 - 7 = 51,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,51,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120Range = 5 - 1 = 4Range = 120 - 1 = 119

  • Interquartile Range

    You can eliminate some outlier problems by using the interquartile range Difference between the first and third quartiles

    Interquartile range = 3rd quartile 1st quartile = Q3 Q1

  • Interquartile RangeMedian(Q2)XmaximumXminimumQ1Q3Example:25% 25% 25% 25%12 30 45 57 70Interquartile range = 57 30 = 27

  • VarianceAverage of squared deviations of each value from the mean

    Sample variance:Where = arithmetic meann = sample sizeXi = ith value of the variable X

  • Standard DeviationMost commonly used measure of variationShows variation about the meanHas the same units as the original data

    Sample standard deviation:

  • Calculation Example:Sample Standard DeviationSample Data (Xi) : 10 12 14 15 17 18 18 24 n = 8 Mean = X = 16

  • Measuring variationSmall standard deviation

    Large standard deviation

  • Comparing Standard DeviationsMean = 15.5 S = 3.338 11 12 13 14 15 16 17 18 19 20 2111 12 13 14 15 16 17 18 19 20 21Data BData AMean = 15.5 S = .925811 12 13 14 15 16 17 18 19 20 21Mean = 15.5 S = 4.57Data C

  • Coefficient of VariationMeasures relative variationAlways a percentage (%)Shows variation relative to meanIs used to compare two or more sets of data measured in different units

  • Comparing Coefficients of VariationStock A:Average price last year = $50Standard deviation = $5

    Stock B:Average price last year = $100Standard deviation = $5Both stocks have the same standard deviation, but stock B is less variable relative to its price

  • Shape of a DistributionDescribes how data is distributedShape - Symmetric or skewedMean = Median Mean < Median Median < MeanRight-SkewedLeft-SkewedSymmetric

  • Microsoft ExcelDescriptive Statistics can be obtained from Microsoft Excel

    Use menu choice: tools / data analysis / descriptive statistics

    Enter details in dialog box

  • Using ExcelUse menu choice: tools / data analysis / descriptive statistics

  • Using ExcelEnter dialog box details

    Check box for summary statistics

    Click OK(continued)

  • Excel outputMicrosoft Excel descriptive statistics output, using the house price data:House Prices: $2,000,000 500,000 300,000 100,000 100,000

  • Population Summary MeasuresThe population mean is the sum of the values in the population divided by the population size, N = population meanN = population sizeXi = ith value of the variable XWhere

  • Population VarianceAverage of squared deviations of values from the mean

    Population variance:Where = population meanN = population sizeXi = ith value of the variable X

  • Population Standard DeviationMost commonly used measure of variationShows variation about the meanHas the same units as the original data

    Population standard deviation:

  • If the data distribution is bell-shaped, then the interval:contains about 68% of the values in the population or the sampleThe Empirical Rule68%

  • contains about 95% of the values in the population or the samplecontains about 99.7% of the values in the population or the sampleThe Empirical Rule99.7%95%

  • Exploratory Data AnalysisBox-and-Whisker Plot: A Graphical display of data using 5-number summary:Minimum -- Q1 -- Median -- Q3 -- MaximumExample:25% 25% 25% 25%

  • Shape of Box and Whisker PlotsThe Box and central line are centered between the endpoints if data is symmetric around the median

    A Box and Whisker plot can be shown in either vertical or horizontal formatMin Q1 Median Q3 Max

  • Distribution Shape and Box and Whisker PlotRight-SkewedLeft-SkewedSymmetricQ1Q2Q3Q1Q2Q3Q1Q2Q3

  • Box-and-Whisker Plot ExampleBelow is a Box-and-Whisker plot for the following data: 0 2 2 2 3 3 4 5 5 10 27

    This data is right skewed, as the plot depicts0 2 3 5 27Min Q1 Q2 Q3 Max

  • Coefficient of CorrelationMeasures the relative strength of the linear relationship between two variables

    Sample coefficient of correlation:

  • Features of Correlation Coefficient, rUnit freeRanges between 1 and 1The closer to 1, the stronger the negative linear relationshipThe closer to 1, the stronger the positive linear relationshipThe closer to 0, the weaker any linear relationship

  • Scatter Plots of Data with Various Correlation CoefficientsYXYXYXYXYXr = -1r = -.6r = 0r = +.3r = +1YXr = 0

  • Using Excel to Find the Correlation CoefficientSelect Tools/Data AnalysisChoose Correlation from the selection menuClick OK . . .

  • Using Excel to Find the Correlation CoefficientInput data range and select appropriate optionsClick OK to get output(continued)

  • Interpreting the Resultr = .733

    There is a relatively strong positive linear relationship between test score #1 and test score #2

    Chart1

    82

    88

    91

    90

    92

    85

    89

    81

    96

    77

    Test #2 Score

    Test #1 Score

    Test #2 Score

    Scatter Plot of Test Scores

    Sheet4

    Test #1 ScoreTest #2 Score

    Test #1 Score1

    Test #2 Score0.73324370471

    Sheet1

    Test #1 ScoreTest #2 Score

    7882

    9288

    8691

    8390

    9592

    8585

    9189

    7681

    8896

    7977

    Sheet1

    Test #2 Score

    Test #1 Score

    Test #2 Score

    Scatter Plot of Test Scores

    Sheet2

    Sheet3

  • Ethical ConsiderationsNumerical descriptive measures:

    Should document both good and bad resultsShould be presented in a fair, objective and neutral mannerShould not use inappropriate summary measures to distort facts

  • Chapter SummaryDescribed measures of central tendencyMean, median, mode, geometric meanDiscussed quartilesDescribed measures of variationRange, interquartile range, variance and standard deviation, coefficient of variationIllustrated shape of distributionSymmetric, skewed, box-and-whisker plotsDiscussed correlation coefficientAddressed pitfalls in numerical descriptive measures and ethical considerations