chapter 2
TRANSCRIPT
Presenting Data in Tables and Charts
USING STATISTICS @ Choice Is Yours, Part I
2.1 TABLES AND CHARTS FOR CATEGORICAL DATA
  The Summary Table
  The Bar Chart
  The Pie Chart
  The Pareto Diagram
2.2 ORGANIZING NUMERICAL DATA
  The Ordered Array
  The Stem-and-Leaf Display
2.3 TABLES AND CHARTS FOR NUMERICAL DATA
  The Frequency Distribution
  The Relative Frequency Distribution and the Percentage Distribution
  The Cumulative Distribution
  The Histogram
  The Polygon
  The Cumulative Percentage Polygon (Ogive)
2.4 CROSS TABULATIONS
  The Contingency Table
  The Side-by-Side Bar Chart
2.5 SCATTER PLOTS AND TIME-SERIES PLOTS
  The Scatter Plot
  The Time-Series Plot
2.6 MISUSING GRAPHS AND ETHICAL ISSUES
  Microsoft Excel Graphs
EXCEL COMPANION TO CHAPTER 2
  E2.1 Creating Summary Tables
  E2.2 Creating Charts
  E2.3 Creating Bar and Pie Charts from Summary Tables
  E2.4 Creating Pareto Diagrams from Summary Tables
  E2.5 Creating an Ordered Array
  E2.6 Creating Stem-and-Leaf Displays
  E2.7 Creating Frequency Distributions and Histograms
  E2.8 Creating a Histogram from Summarized Data
  E2.9 Creating Polygons
  E2.10 Creating Contingency Tables
  E2.11 Creating Side-by-Side Charts
  E2.12 Creating Scatter Plots
  E2.13 Creating Time-Series Plots
In this chapter, you learn:
• To develop tables and charts for categorical data
• To develop tables and charts for numerical data
• The principles of properly presenting graphs
USING STATISTICS @ Choice Is Yours, Part I
Choice Is Yours is a service that helps customers make wise investment choices. You've been hired to assist investors interested in mutual funds, a market basket of securities. According to investopedia.com, "A mutual fund is nothing more than a collection of stocks and/or bonds. You can think of a mutual fund as a company that brings together a group of people and invests their money in stocks, bonds, and other securities. Each investor owns shares, which represent a portion of the holdings of the fund." (You can learn more about mutual funds at www.investopedia.com/university/mutualfunds/.)
The Choice Is Yours company previously selected a sample of 838 mutual funds that it believes might be of interest to its customers. You have been asked to present data about these funds in a way that will help customers make good investment choices. What facts about each mutual fund would you collect to help customers compare and contrast the many funds?
A good starting point would be to collect data that would help customers classify mutual funds into various categories. You could research such things as the amount of risk involved in a fund's investment strategy and whether the fund focuses on growth securities, those companies that are expected to grow quickly in the next year, or on value securities, those companies whose stock prices are currently considered undervalued. You might also investigate whether a mutual fund specializes in a certain size of company and whether the fund charges management fees that would reduce the percentage return earned by an investor.
Of course, you would want to know how well the fund performed in the past. You would also want to supply the customer with several measures of each fund's past performance. While past performance is no assurance of future performance, past data could give customers insight into how well each mutual fund has been managed.
As you further think about your task, you realize that all these data for all 838 mutual funds would be a lot for anyone to review. How could you "get your hands around" such data and explore them in a comprehensible manner?
To get your hands around the data described in this chapter's Using Statistics scenario, you need to use methods of descriptive statistics, defined in Chapter 1 as the branch of statistics that collects, summarizes, and presents data. In this scenario, you need to use descriptive techniques for both categorical variables (to help investors classify the mutual funds) and numerical variables (to help show the return each fund has achieved). Reading this chapter will help you to prepare tables and charts that are appropriate for both types of variables. You'll also learn techniques to help answer questions that require two variables, such as "Do growth-oriented mutual funds have lower returns than 'value' mutual funds?" and "Do growth funds tend to be riskier investments than value funds?"
Many examples in this chapter use a sample of 838 real mutual funds, the data for which you can find in the Data worksheet of the accompanying data file on the Student CD-ROM.
2.1 TABLES AND CHARTS FOR CATEGORICAL DATA
When you have categorical data, you tally responses into categories and then present the frequency or percentage in each category in tables and charts.
The Summary Table
A summary table indicates the frequency, amount, or percentage of items in a set of categories so that you can see differences between categories. A summary table lists the categories in one column and the frequency, amount, or percentage in a different column or columns. Table 2.1 illustrates a summary table based on a recent survey that asked people what they would do if they had an extra $1,000 to spend ("If You Had an Extra $1,000 to Spend, What Would You Do with the Money?" USA Today, January 11, 2006, p. A1). In Table 2.1, the most common choices are saving and paying debt, followed by buying a luxury item, vacation, or gift and spending on essentials. Very few respondents mentioned giving to charity or other uses.

TABLE 2.1 What People Would Do with an Extra $1,000
What You Would Do with the Money        Percentage (%)
Buy a luxury item, vacation, or gift    20
Give it to charity                       2
Pay debt                                24
Save                                    31
Spend on essentials                     16
Other                                    7

EXAMPLE 2.1 SUMMARY TABLE OF LEVELS OF RISK OF MUTUAL FUNDS
The 838 mutual funds that are part of the Using Statistics scenario (see page 32) are classified according to their risk level, categorized as low, average, and high. Construct a summary table of the mutual funds, categorized by risk.
SOLUTION The mutual funds are fairly evenly divided by risk (see Table 2.2). There are more high-risk funds (346, or 41%) than low-risk or average-risk funds. There are about the same number of average-risk funds as low-risk funds (29%).

TABLE 2.2 Frequency and Percentage Summary Table Pertaining to Risk Level for 838 Mutual Funds
Fund Risk Level    Number of Funds    Percentage of Funds (%)
Low                247                 29.47
Average            245                 29.24
High               346                 41.29
Total              838                100.00
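The percentages in a summary table such as Table 2.2 are each category's frequency divided by the total, times 100. The following is a minimal sketch of that arithmetic in Python (the chapter itself uses Microsoft Excel). The function name `summary_table` is illustrative only, and the low-risk count of 247 is taken as 838 minus the average-risk and high-risk counts.

```python
# Frequencies by risk level (Low is 838 - 245 - 346 = 247).
risk_counts = {"Low": 247, "Average": 245, "High": 346}

def summary_table(counts):
    """Return (category, frequency, percentage) rows and the total frequency."""
    total = sum(counts.values())
    rows = [(name, freq, round(100 * freq / total, 2))
            for name, freq in counts.items()]
    return rows, total

rows, total = summary_table(risk_counts)
for name, freq, pct in rows:
    print(f"{name:<8}{freq:>5}{pct:>8.2f}")
print(f"{'Total':<8}{total:>5}{100:>8.2f}")
```

Each printed row holds the category, its frequency, and its percentage of the 838 funds, which you can compare with Table 2.2.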
The Bar Chart
In a bar chart, a bar shows each category, the length of which represents the amount, frequency, or percentage of values falling into a category. Figure 2.1 on page 34 displays the bar chart for spending an extra $1,000 presented in Table 2.1.
Bar charts allow you to compare percentages in different categories. In Figure 2.1, respondents are most likely to save or pay debt, followed by buy a luxury item, vacation, or gift and spend on essentials. Very few respondents mentioned giving to charity or other uses.

FIGURE 2.1 Microsoft Excel bar chart for spending an extra $1,000
See Sections E2.2 and E2.3 to create this.

EXAMPLE 2.2 BAR CHART OF LEVELS OF RISK OF MUTUAL FUNDS
Construct a bar chart for the levels of risk of mutual funds (based on the information in Table 2.2) and interpret the results.
SOLUTION The mutual funds are fairly evenly divided by risk (see Figure 2.2 on page 33). There are more high-risk funds (346, or 41%) than low-risk or average-risk funds. There are about the same number of average-risk funds as low-risk funds (29%).

FIGURE 2.2 Microsoft Excel bar chart of the levels of risk of mutual funds
The Pie Chart
The pie chart is a circle broken up into slices that represent categories. The size of each slice of the pie varies according to the percentage in each category. In Table 2.1 on page 33, for example, 31% of the respondents stated that they would save the $1,000. Thus, in constructing the pie chart, the 360 degrees that makes up a circle is multiplied by 0.31, resulting in a slice of the pie that takes up 111.6 degrees of the 360 degrees of the circle. From Figure 2.3, you can see that the pie chart lets you visualize the portion of the entire pie that is in each category. In this figure, saving takes 31% of the pie and giving to charity takes only 2%.
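The slice-angle arithmetic described above (360 degrees multiplied by the category's proportion) can be sketched in a few lines of Python. This is an illustration only: the helper name `slice_angles` is hypothetical, and only the two percentages quoted in the text are used.

```python
def slice_angles(percentages):
    """Convert category percentages to pie-slice angles in degrees."""
    return {name: round(360 * pct / 100, 1) for name, pct in percentages.items()}

# The two percentages quoted in the text: saving (31%) and charity (2%).
angles = slice_angles({"Save": 31, "Give it to charity": 2})
for name, degrees in angles.items():
    print(f"{name}: {degrees} degrees")
```

For the 31% who would save, the calculation is 360 × 0.31 = 111.6 degrees, which matches the figure given in the text.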
FIGURE 2.3 Microsoft Excel pie chart for spending an extra $1,000

Which chart should you use: a bar chart or a pie chart? The selection of a particular chart often depends on your intention. If a comparison of categories is most important, you should use a bar chart. If observing the portion of the whole that is in a particular category is most important, you should use a pie chart.

EXAMPLE 2.3 PIE CHART OF LEVELS OF RISK OF MUTUAL FUNDS
Construct a pie chart for the levels of risk of mutual funds (see Table 2.2 on page 33) and interpret the results.
SOLUTION (See Figure 2.4.) The mutual funds are fairly evenly divided by risk. There are more high-risk funds (346, or 41%) than low-risk or average-risk funds. There are about the same number of average-risk funds as low-risk funds (29%).

FIGURE 2.4 Microsoft Excel pie chart of the levels of risk of mutual funds

The Pareto Diagram
In a Pareto diagram, the categorized responses are plotted in descending order, according to their frequencies, and are combined with a cumulative percentage line on the same chart. The Pareto diagram can identify situations in which the Pareto principle occurs.
PARETO PRINCIPLE
The Pareto principle exists when the majority of items in a set of data occur in a small number of categories and the few remaining items are spread out over a large number of categories. These two groups are often referred to as the "vital few" and the "trivial many."

The Pareto diagram has the ability to separate the "vital few" from the "trivial many," enabling you to focus on the important categories. In situations in which the data involved consist of defective or nonconforming items, the Pareto diagram is a powerful tool for prioritizing improvement efforts.
Table 2.3 presents data for a large injection-molding company that manufactures plastic molded components used in computer keyboards, washing machines, automobiles, and television sets (see the accompanying data file). The data presented in Table 2.3 consist of all computer keyboards with defects produced during a three-month period.

TABLE 2.3 Summary Table of Causes of Defects in Computer Keyboards in a Three-Month Period
Cause            Frequency    Percentage
Black spot         413          6.53
Damage           1,039         16.43
Jetting            258          4.08
Pin mark           834         13.19
Scratches          442          6.99
Shot mold          275          4.35
Silver streak      413          6.53
Sink mark          371          5.87
Spray mark         292          4.62
Warpage          1,987         31.42
Total            6,324        100.01*
*Result differs slightly from 100.00 due to rounding.
Source: Extracted from U. H. Acharya and C. Mahesh, "Winning Back the Customer's Confidence: A Case Study on the Application of Design of Experiments to an Injection-Molding Process," Quality Engineering, 11, 1999, pp. 357-363.

Table 2.4 presents a summary table for the computer keyboard data in which the categories are ordered based on the percentage of defects present (rather than arranged alphabetically). The cumulative percentages for the ordered categories are also included as part of the table.

TABLE 2.4 Ordered Summary Table of Causes of Defects in Computer Keyboards in a Three-Month Period
Cause            Frequency    Percentage    Cumulative Percentage
Warpage          1,987         31.42          31.42
Damage           1,039         16.43          47.85
Pin mark           834         13.19          61.04
Scratches          442          6.99          68.03
Black spot         413          6.53          74.56
Silver streak      413          6.53          81.09
Sink mark          371          5.87          86.96
Spray mark         292          4.62          91.58
Shot mold          275          4.35          95.93
Jetting            258          4.08         100.00
Total            6,324        100.01*
*Result differs slightly from 100.00 due to rounding.
In Table 2.4, the first category listed is warpage (with 31.42% of the defects), followed by damage (with 16.43%), followed by pin mark (with 13.19%). The two most frequently occurring categories, warpage and damage, account for 47.85% of the defects; the three most frequently occurring categories (warpage, damage, and pin mark) account for 61.04% of the defects, and so on. Figure 2.5 is a Pareto diagram based on the results displayed in Table 2.4.

FIGURE 2.5 Microsoft Excel Pareto diagram for the keyboard defects data
See Sections E2.2 and E2.4 to create this.

Figure 2.5 presents the bars vertically, along with a cumulative percentage line. The cumulative line is plotted at the midpoint of each bar, at a height equal to the cumulative percentage. If you follow the line, you see that these first three categories account for about 60% of the defects. Because the categories in the Pareto diagram are ordered by the frequency of occurrences, decision makers can see where to concentrate efforts to improve the process. Attempts to reduce defects due to warpage, damage, and pin marks should produce the greatest payoff. Then efforts can be made to reduce scratches, black spots, and silver streaks.
In order for a Pareto diagram to include all categories, even those with few defects, in some situations you need to include a category labeled Other or Miscellaneous. In these situations, the bar representing these categories is placed to the right of the other bars.

EXAMPLE 2.4 PARETO DIAGRAM OF SPENDING AN EXTRA $1,000
Construct a Pareto diagram of what respondents would do with an extra $1,000 (see Table 2.1 on page 33).
SOLUTION In Figure 2.6, saving and paying debt account for 55% of what respondents would do with the extra $1,000; 91% of the respondents would save; pay debt; buy a luxury item, vacation, or gift; or spend on essentials.

FIGURE 2.6 Microsoft Excel Pareto diagram for spending an extra $1,000
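The ordering and cumulative percentages that underlie a Pareto diagram can be sketched as follows, using the keyboard defect frequencies from Table 2.3. This is an illustrative Python sketch, not the Excel procedure of Section E2.4, and the function name `pareto_table` is hypothetical. The cumulative percentage here is computed from the running count, so a value can differ in the last digit from Table 2.4, which adds the rounded percentages.

```python
# Defect frequencies from Table 2.3, in alphabetical order.
defects = {
    "Black spot": 413, "Damage": 1039, "Jetting": 258, "Pin mark": 834,
    "Scratches": 442, "Shot mold": 275, "Silver streak": 413,
    "Sink mark": 371, "Spray mark": 292, "Warpage": 1987,
}

def pareto_table(counts):
    """Order categories by descending frequency and add cumulative percentages."""
    total = sum(counts.values())
    # sorted() is stable, so tied categories keep their original order.
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    rows, running = [], 0
    for cause, freq in ordered:
        running += freq
        rows.append((cause, freq, 100 * freq / total, 100 * running / total))
    return rows

rows = pareto_table(defects)
for cause, freq, pct, cum_pct in rows:
    print(f"{cause:<15}{freq:>6}{pct:>8.2f}{cum_pct:>9.2f}")
```

Reading down the last column shows where the "vital few" end: the first three causes already account for about 61% of all defects.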
Usually when you learn something, there is eventually some sort of feedback. Providing feedback is one reason instructors give tests. When you get the results of a test, you can reflect on how well you have properly learned the subject matter. Unfortunately, feedback opportunities are rare when you are learning to use a computer program such as Microsoft Excel as you study statistics. You either get the worksheet to calculate what you want or get a worksheet with misleading or inaccurate results. This good/bad, pass/fail type of feedback cannot help you refine your Microsoft Excel skills. We hope the following list will.
You are using Microsoft Excel properly if you can:
1. Understand how to operate the program. Understanding how to operate the program does not mean that you need to know everything about Microsoft Excel. Don't waste your time teaching yourself every command or trying to memorize every function. Use the Excel Companion to Chapter 1 as your guide for what you need to know to get started using Excel effectively. Learn about additional features only just in time, when you need to use the feature. Excel features that you learn but don't immediately use are features that you will soon forget. However, if you learn an Excel feature when you need to use it, such as discovering that you can use Data → Sort to sort data in Excel 97-2003, you are more likely to remember the proper command.
2. Understand the underlying statistical concepts. Make sure you understand what a worksheet is doing as it produces its information. As a minimum, be informed about what each function you use calculates, or returns, as its value. Knowing this information will minimize your confusion and help you avoid making mistakes such as using the NORMSDIST function described in Chapter 6 when you meant to use the similar-looking but different NORMDIST function.
3. Understand how to format and present information in a worksheet. Section 1.6 on page 11 has already discussed how to organize your worksheets. With that as a starting point, you should plan how you are going to use the rows and columns of a worksheet to enhance the contents of your worksheets. For almost every worksheet illustrated in this text, the authors changed column widths, font characteristics, and background tints of selected cells and added borders to highlight various cell ranges. Use the authors' techniques (reviewed in the Excel Companion to Chapter 1) or invent your own, but always remember to have a plan that will present information effectively.
4. Know how to review formulas for errors. Never assume that any worksheet you create or that was created for you is free of error. Always examine all formulas that have been entered into a worksheet. (See the Excel Companion to Chapter 1 for one way to do this.) You need to do this even if you use worksheets created by PHStat2 or retrieved from the Student CD-ROM.
5. Take control of your copy of Microsoft Excel. Customize the settings of Microsoft Excel (and Microsoft Windows) to best suit your style, abilities, and needs. For example, many overlook increasing the default 10-point Arial font to a more readable size when examining the contents of a worksheet. If you own or control the computer on which you use Excel, you can make such customizations permanent so that they are automatically available to you every time you use Microsoft Excel.
In addition, if you own or control your own computer, you should regularly apply the security updates to Microsoft Office that Microsoft puts on its Web site. And if you plan to use PHStat2 on your own computer, you should read all the technical documentation, including the PHStat readme file on the Student CD-ROM.
Visit the online forum to continue this discussion online.
Learning the Basics
2.1 A categorical variable has three categories with the following frequencies of occurrence:

Category    Frequency
A           13
B           28
C            9

a. Compute the percentage of values in each category.
b. Construct a bar chart.
c. Construct a pie chart.
d. Construct a Pareto diagram.

2.2 A categorical variable has four categories with the following percentages of occurrence:

Category    Percentage
A           12
B           29
C           35
D           24

a. Construct a bar chart.
b. Construct a pie chart.
c. Construct a Pareto diagram.

Applying the Concepts

2.3 A survey of 705 workers asked how much they used the Internet at work. The results (USA Today Snapshots, ... 21, 2006) were as follows:

Use of the Internet at Work    %
Too much                        5
More than I should              4
Within limits                  60
Very little                     5
Do not use                     26

a. Construct a bar chart, a pie chart, and a Pareto diagram.
b. Which graphical method do you think is best to portray these data?
c. Based on this survey, what conclusions can you reach concerning the use of the Internet at work?

2.4 An article (R. Richmond, "Anatomy of a Threat," The Wall Street Journal, February 13, 2006, pp. R5, R6) discussed the costs companies face in defending their networks from attack. The following table provides the breakdown in costs:

Cost                            Percentage (%)
Consulting                       7.6
Hardware tools                   8.2
Labor                           25.9
Lost business/revenue           23.6
Nonproductive employee time     15.5
Software tools                  14.2
Other                            5.0

a. Construct a bar chart, a pie chart, and a Pareto diagram.
b. Which graphical method do you think is best to portray these data?
c. What conclusions can you reach concerning the costs companies face in defending their networks from attack?

2.5 When do Americans decide what to make for dinner? An online survey (N. Hellmich, "Americans Go for the Quick Fix for Dinner," USA Today, February 14, 2005, p. 1B) indicated the following:

When Americans Decide What to Make for Dinner    Percentage (%)
At the last minute                               37
Plan in advance                                  25
That day
Don't know

a. Construct a bar chart, a pie chart, and a Pareto diagram.
b. Which graphical method do you think is best to portray these data?

2.6 The following table represents the U.S. sources of electric energy in a recent year:

Source         Percentage (%)
Coal           51
Hydropower      6
Natural gas    16
Nuclear        21
Petroleum       3
Oil             3
Source: U.S. Department of Energy.

a. Construct a Pareto diagram.
b. What percentage of electricity is derived from either coal, nuclear energy, or natural gas?
c. Construct a pie chart.
d. Which chart do you prefer to use for these data, the Pareto diagram or the pie chart? Why?
2.7 An article (P. Kitchen, "Retirement Plan: To Keep Working," Newsday, September 24, 2003) discussed the results of a sample of 2,001 Americans ages 50 to 70 who were employed full time or part time. The following table represents their plans for retirement:
2.9 Medication errors are a serious problem in hospitals. The following data represent the root causes of pharmacy errors at a hospital during a recent time period:

Reason for Failure                          Frequency
Additional instructions
Dose
Drug
Duplicate order entry
Frequency
Omission
Order not discontinued when received
Order not received
Patient
Route
Other

a. Construct a Pareto diagram.
b. Discuss the "vital few" and "trivial many" reasons for the root causes of pharmacy errors.
2.10 The following data represent complaints about hotel rooms:

Reason                                   Number
Room dirty
Room not stocked
Room not ready
Room too noisy
Room needs maintenance
Room has too few beds
Room doesn't have promised features
No special accommodations

a. Construct a Pareto diagram.
b. What reasons for complaints do you think the hotel should focus on if it wants to reduce the number of complaints? Explain.
Plans                        Percentage (%)
Not work for pay at all
Start own business
Work full time
Work part time
Don't know
Other

a. Construct a bar chart and a pie chart.
b. Which graphical method do you think is best to portray these data?
2.8 U.S. companies spent more than $250 billion in advertising in 2005 (K. Delaney, "In Latest Deal, Google Steps Further into World of Old Media," The Wall Street Journal, January 18, 2006, pp. A1, A6). The spending was as follows:
Media Amount ($bil l ions) Percentage (7o) Reason
29l 07
463
l 6z.)
t 422412 ll 252548
Media          Amount ($ billions)    Percentage (%)
Cinema          0.4                    0.16
Direct mail    44.5                   17.35
Internet       10.0                    3.90
Magazines      23.9                    9.32
Newspapers     50.2                   19.57
Outdoor         5.7                    2.22
Radio          20.6                    8.03
TV             55.4                   21.60
Other          45.8                   17.86
J L
1 11 2l 01 1972
a. Construct a bar chart and a pie chart.
b. Which graphical method do you think is best to portray these data?
2.2 ORGANIZING NUMERICAL DATA
When the number of data values is large, you can organize numerical data into an ordered array or a stem-and-leaf display to help understand the information you have. Suppose you decide to undertake a study that compares the cost for a restaurant meal in a major city to the cost of a similar meal in the suburbs outside the city. The data file contains the data for 50 city restaurants and 50 suburban restaurants, as shown in Table 2.5. The data are not arranged in order from lowest to highest. This arrangement makes it difficult to make conclusions about the price of meals in the two geographical areas.
TABLE 2.5 Price per Person at 50 City Restaurants and 50 Suburban Restaurants
TABLE 2.6 Ordered Array of Price per Person at 50 City Restaurants and 50 Suburban Restaurants
Citv
Suburban
4t 4422 4550 3532 6339 53
255044263 7
43491 44853
503444a 1J I
36
26J U- AJ+
3855
2529) z
3844
) tA A++
+)z o
5 l
z )
21303639
3839383438
37271 tJ t
5 l30
z )
21a tJ I
3139
2924262621
3930)./..., A
J J
J J
3539353 l
zo30J J
385 l
56 51 3637 40 5044 51 2748 30 4223 39 45
38 37 3834 44 2334 23 4148 39 5538 26 28
29) L
303838
3 t36404853
36 3825 2928 3331 3032 25
The Ordered Array
An ordered array is a sequence of data, in rank order, from the smallest value to the largest value. Table 2.6 contains ordered arrays for the price of meals at city restaurants and suburban restaurants. From Table 2.6 you can see that the price of a meal at the city restaurants is between $14 and $63 and the price of a meal at the suburban restaurants is between $23 and $55.
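Forming an ordered array amounts to sorting the values and reading the smallest and largest from the two ends. The sketch below shows this in Python. The ten prices are illustrative values made up for this sketch, not the Table 2.5 data.

```python
# Illustrative meal prices in dollars (hypothetical, not the Table 2.5 data).
prices = [50, 38, 43, 56, 51, 36, 25, 33, 41, 44]

# The ordered array runs from the smallest value to the largest value.
ordered = sorted(prices)
print(ordered)
print("lowest:", ordered[0], "highest:", ordered[-1])
```

Once the values are in rank order, the range of the data can be read directly from the first and last entries, as the text does with Table 2.6.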
City1 4J J
384350
Suburban
263539445 l
2234384450
z)
34384450
1/,1
28a l
) I
4 l
2535394450
2428) za -) t
43
213 53 9
5 l
-tt,
36394553
26293 33 848
3 l. ) l
4 14856
263 01 AJ +
385 l
) z
) t
4249b-i
2529) z
3844
The Stem-and-Leaf Display
A stem-and-leaf display organizes data into groups (called stems) so that the values within each group (the leaves) branch out to the right on each row. The resulting display allows you to see how the data are distributed and where concentrations of data exist. To see how to construct a stem-and-leaf display, suppose that 15 students from your class eat lunch at a fast-food restaurant. The following data are the amounts spent for lunch:
5.40 4.30 4.80 5.50 7.30 8.50 6.10 4.80 4.90 4.90 5.50 3.50 5.90 6.30 6.60
To form the stem-and-leaf display, you use the units as the stems and round the decimals (the leaves) to one decimal place. For example, the first value is 5.40. Its stem (row) is 5, and its leaf is 4. The second value is 4.30. Its stem (row) is 4, and its leaf is 3. You continue with the remainder of the 15 values and then reorder the leaves within each stem as follows:
3 | 5
4 | 38899
5 | 4559
6 | 136
7 | 3
8 | 5
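The construction just described (units digit as the stem, tenths digit as the leaf, leaves reordered within each stem) can be sketched in Python as follows. The function name `stem_and_leaf` is illustrative, and the sketch assumes nonnegative values recorded to one decimal place, as in the lunch data.

```python
from collections import defaultdict

# Amounts spent for lunch by the 15 students, in dollars.
lunch = [5.40, 4.30, 4.80, 5.50, 7.30, 8.50, 6.10, 4.80,
         4.90, 4.90, 5.50, 3.50, 5.90, 6.30, 6.60]

def stem_and_leaf(values):
    """Group values by units digit (stem) with the tenths digit as the leaf."""
    groups = defaultdict(list)
    for value in values:
        tenths = round(value * 10)        # for example, 5.40 becomes 54
        stem, leaf = divmod(tenths, 10)   # 54 becomes stem 5, leaf 4
        groups[stem].append(leaf)
    # Order the stems, and reorder the leaves within each stem.
    return {stem: sorted(leaves) for stem, leaves in sorted(groups.items())}

display = stem_and_leaf(lunch)
for stem, leaves in display.items():
    print(stem, "|", "".join(str(leaf) for leaf in leaves))
```

Note that this sketch prints only stems that contain at least one value; a stem with no leaves would need to be added separately if you want every row shown.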
EXAMPLE 2.5 STEM-AND-LEAF DISPLAY OF THE THREE-YEAR ANNUALIZED RETURN OF MUTUAL FUNDS
In this chapter's Using Statistics scenario, you are interested in studying the past performance of mutual funds. One measure of past performance is the three-year annualized return (2003-2005), that is, the average percentage return over the past three years. Construct a stem-and-leaf display of the three-year annualized returns.
SOLUTION From Figure 2.7, you can conclude that:
• The lowest three-year annualized return was 6.7.
• The highest three-year annualized return was 42.3.
• The three-year annualized returns were concentrated between 11 and 23.
• Only six mutual funds had three-year annualized returns below 8, and only two mutual funds had three-year annualized returns above 40.

FIGURE 2.7 Microsoft Excel stem-and-leaf display of the three-year annualized returns (2003-2005)
See Section E2.6 to create this.
[Figure 2.7: Stem-and-Leaf Display of Three-Year Returns]
Learning the Basics
2.11 Form an ordered array, given the following data from a sample of n = 7 midterm exam scores in accounting:
68 94 63 15 ] t 88 64
2.12 Form a stem-and-leaf display, given the following data from a sample of n = 7 midterm exam scores in finance:
80 s4 69 98 93 53 t4
2.13 Form an ordered array, given the following data from a sample of n = 7 midterm exam scores in marketing:
88 18 18 73 91 78 85
2.14 Form an ordered array, given the following stem-and-leaf display from a sample of n = 7 midterm exam scores in information systems:
5
o
1
8o
0
446
l 9
2
Applying the Concepts
ff i 2.15 The fol lowing is a stem-and-leaf displaylAsslsTl represent ing the amount of gasol ine purchased.
in gallons (with leaves in tenths of gallons), for asample of 25 cars that use a particular service station on theNew Jersey Turnpike:
a. Place the data into an ordered array.b. Construct a stem-and-leaf display for these data.c. Which of these two d isplays seems to prov ide more
information'l Discuss.d. Around what value, if any, are the bounced check fees
concentrated ' . ' Expla i n.
2,"17 The fil. EIEE@IEIE contains data on the price fortwo t ickets, wi th onl ine serv ice charges, large popcorn.
and two medium sof t dr inks at a samole of s ix theaterchains:
s36 . l 5 $3 r . 00 $35 .05 $40 .25 S33 .75 S43 .00Source: Extruttad./i'om K. Kell.t', "Thc Mtrltiplex Under Siege."The'Wall Strect.lournal, Decerrtbar 24 25, 2005. pp. Pl, P5.
Place the data into an ordered array.Construct a stem-and-leaf display for these data.Which of these two d isplays seems to prov ide moreinformation'l Discuss.
d. Around what value, i f any, are the movie pr ices con-centrated'? Explain.
2.18 The accompanying data file contains data on the total fat, in grams per serving, for a sample of 20 chicken sandwiches from fast-food chains. The data are as follows:
7 8 4 5 16 20 20 24 19 30
23 30 25 19 29 29 30 30 40 56
Source: Extracted from "Fast Food: Adding Health to the Menu," Consumer Reports, September 2004, pp. 28-31.
a. Place the data into an ordered array.
b. Construct a stem-and-leaf display.
c. Does the ordered array or the stem-and-leaf display provide more information? Discuss.
2.19 The data in the accompanying file represent the battery life, in shots, for three-pixel digital cameras:
300 180 85 170 380 460 260 35 380 120 110 240
Source: Extracted from "Cameras: More Features in the Mix," Consumer Reports, July 2005, pp. 14-18.
a. Place the data into an ordered array.
b. Construct a stem-and-leaf display.
c. Does the ordered array or the stem-and-leaf display provide more information? Discuss.
d. Around what value, if any, is the battery life concentrated? Explain.
 9 | 147
10 | 02238
11 | 125566777
12 | 223489
13 | 02
a. Place the data into an ordered array.
b. Which of these two displays seems to provide more information? Discuss.
c. What amount of gasoline (in gallons) is most likely to be purchased?
d. Is there a concentration of the purchase amounts in the center of the distribution?
2.16 The data in the accompanying file contain the bounced check fees, in dollars, for a sample of 23 banks for direct-deposit customers who maintain a $100 balance:
26 28 20 20 21 22 25 25 18 25 15 20
18 20 25 25 22 30 30 30 15 20 29
Source: Extracted from "The New Face of Banking," June 2000. Copyright 2000 by Consumers Union of U.S., Inc., Yonkers, NY 10703-1057.
2.3 TABLES AND CHARTS FOR NUMERICAL DATA
When you have a data set that contains a large number of values, reaching conclusions from an ordered array or a stem-and-leaf display can be difficult. In such circumstances, you need to use tables and charts. There are many types of tables and charts that you can use to visually present numerical data. These include the frequency and percentage distributions, histogram, polygon, and cumulative percentage polygon (ogive).
The Frequency Distribution
The frequency distribution is a summary table in which the data are arranged into numerically ordered class groupings. In constructing a frequency distribution, you must give attention to selecting the appropriate number of class groupings for the table, determining a suitable width of a class grouping, and establishing the boundaries of each class grouping to avoid overlapping.
The number of class groupings you use depends on the number of values in the data. Larger numbers of values allow for a larger number of class groupings. In general, the frequency distribution should have at least 5 class groupings but no more than 15. Having too few or too many class groupings provides little new information.
When developing a frequency distribution, you define each class grouping by class intervals of equal width. To determine the width of a class interval, you divide the range (highest value - lowest value) of the data by the number of class groupings desired.
DETERMINING THE WIDTH OF A CLASS INTERVAL
\[
\text{Width of interval} = \frac{\text{Range}}{\text{Number of desired class groupings}} \tag{2.1}
\]
Because the city restaurant data consist of a sample of 50 restaurants, 10 class groupings are acceptable. From the ordered array in Table 2.6 on page 41, the range of the data is $63 - $14 = $49. Using Equation (2.1), you approximate the width of the class interval as follows:
\[
\text{Width of interval} = \frac{49}{10} = 4.9
\]
You should choose an interval width that simplifies reading and interpretation. Therefore,instead of using an interval width of $4.90, you should select an interval width of $5.00.
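The calculation above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original text: the function names are hypothetical, and rounding the raw width up to the next multiple of a convenient step (here $5) is an assumed convention for "simplifying reading and interpretation."

```python
import math

def class_interval_width(lowest, highest, n_classes):
    """Raw width from Equation (2.1): range divided by the number of class groupings."""
    return (highest - lowest) / n_classes

def round_up_to_step(width, step):
    """Round a raw width up to the next multiple of `step` (an assumed convention)."""
    return math.ceil(width / step) * step

# City restaurant data: costs range from $14 to $63, with 10 class groupings desired.
raw_width = class_interval_width(14, 63, 10)
chosen_width = round_up_to_step(raw_width, 5)
print(raw_width, chosen_width)
```

With the city restaurant values, the raw width is 4.9 and the chosen width is 5, matching the choice made in the text.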
To construct the frequency distribution table, you should establish clearly defined class boundaries for each class grouping so that the values can be properly tallied into the classes. You place each value in one and only one class. You must avoid overlapping of classes.
Because you have set the width of each class interval for the restaurant data at $5, you need to establish the boundaries of the various class groupings so as to include the entire range of values. Whenever possible, you should choose these boundaries to simplify reading and interpretation. Thus, for the city restaurants, because the cost ranges from $14 to $63, the first class interval ranges from $10 to less than $15, the second from $15 to less than $20, and so on, until they have been tallied into 11 classes. Each class has an interval width of $5, without overlapping. The center of each class, the class midpoint, is halfway between the lower boundary of the class and the upper boundary of the class. Thus, the class midpoint for the class from $10 to under $15 is $12.50, the class midpoint for the class from $15 to under $20 is $17.50, and so on. Table 2.7 is a frequency distribution of the cost per meal for the 50 city restaurants and the 50 suburban restaurants.
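The tallying rule (each value goes into exactly one class of the form "lower but less than upper") can be sketched as follows. The function names and the ten sample costs are hypothetical and chosen only for illustration; they are not the restaurant data from Table 2.6.

```python
def frequency_distribution(values, lower, width, n_classes):
    """Tally each value into exactly one class [lower + i*width, lower + (i+1)*width)."""
    counts = [0] * n_classes
    for v in values:
        index = int((v - lower) // width)
        if not 0 <= index < n_classes:
            raise ValueError(f"{v} falls outside the class boundaries")
        counts[index] += 1
    return counts

def class_midpoints(lower, width, n_classes):
    """Midpoint of each class: halfway between its lower and upper boundary."""
    return [lower + i * width + width / 2 for i in range(n_classes)]

# Hypothetical meal costs, used only to show the mechanics.
sample_costs = [14, 22, 27, 33, 38, 38, 41, 49, 50, 63]
counts = frequency_distribution(sample_costs, lower=10, width=5, n_classes=11)
midpoints = class_midpoints(lower=10, width=5, n_classes=11)
print(counts)
print(midpoints[:2])
```

Because a boundary value such as 15 is assigned to the class that starts at 15, no value can fall into two classes.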
TABLE 2.7
Frequency Distribution of the Cost per Meal for 50 City Restaurants and 50 Suburban Restaurants

Cost per Meal ($)      City Frequency   Suburban Frequency
10 but less than 15           1                  0
15 but less than 20           0                  0
20 but less than 25           2                  4
25 but less than 30           3                 13
30 but less than 35           7                 13
35 but less than 40          14                 12
40 but less than 45           8                  4
45 but less than 50           5                  1
50 but less than 55           8                  2
55 but less than 60           1                  1
60 but less than 65           1                  0
Total                        50                 50

The frequency distribution allows you to draw conclusions about the major characteristics of the data. For example, Table 2.7 shows that the cost of meals at city restaurants is concentrated between $30 and $55, and the cost of meals at suburban restaurants is clustered between $25 and $40.
If the data set does not contain many values, one set of class boundaries may provide a different picture than another set. For example, for the restaurant cost data, using a class-interval width of 4.0 instead of 5.0 (as was used in Table 2.7) may cause shifts in the way the values distribute among the classes.
You can also get shifts in data concentration when you choose different lower and upper class boundaries. Fortunately, as the sample size increases, alterations in the selection of class boundaries affect the concentration of data less and less.

EXAMPLE 2.6
FREQUENCY DISTRIBUTION OF THE THREE-YEAR ANNUALIZED RETURN FOR GROWTH AND VALUE MUTUAL FUNDS
In the Using Statistics scenario, you are interested in comparing the three-year annualized return (2003-2005) of growth and value mutual funds. Construct frequency distributions for the growth funds and the value funds.
SOLUTION The three-year annualized returns of the growth funds and the value funds are highly concentrated between 10 and 25 (see Table 2.8). You should not directly compare the frequencies of the growth funds and the value funds because there are 480 growth funds and 358 value funds in the sample. Relative frequencies and percentages are introduced in the next section.

TABLE 2.8
Frequency Distribution of the Three-Year Annualized Return for Growth and Value Mutual Funds

Three-Year Annualized Return   Growth Frequency   Value Frequency
5 but less than 10                    21                  7
10 but less than 15                  148                 85
15 but less than 20                  172                132
20 but less than 25                  113                 97
25 but less than 30                   21                 28
30 but less than 35                    5                  7
35 but less than 40                    0                  0
40 but less than 45                    0                  2
Total                                480                358
The Relative Frequency Distribution and the Percentage Distribution
Because you usually want to know the proportion or the percentage of the total that is in each group, the relative frequency distribution or the percentage distribution is preferred to the frequency distribution. When you are comparing two or more groups that differ in sample size, you must use either a relative frequency distribution or a percentage distribution.
You form the relative frequency distribution by dividing the frequencies in each class of the frequency distribution (see Table 2.7 on page 45) by the total number of values. You form the percentage distribution by multiplying each relative frequency by 100%. Thus, the relative frequency of meals at city restaurants that cost between $30 and $35 is 7 divided by 50, or 0.14, and the percentage is 14%. Table 2.9 presents the relative frequency distribution and percentage distribution of the cost of restaurant meals at city and suburban restaurants.
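The two steps just described (divide by the total, then multiply by 100) can be sketched as below. The function names are hypothetical; the city frequencies are the ones implied by the relative frequencies in Table 2.9 for a sample of 50 restaurants.

```python
# City frequencies for the classes $10-15, $15-20, ..., $60-65 (sample of 50 restaurants).
city_freq = [1, 0, 2, 3, 7, 14, 8, 5, 8, 1, 1]

def relative_frequencies(freqs):
    """Divide each class frequency by the total number of values."""
    total = sum(freqs)
    return [f / total for f in freqs]

def percentages(freqs):
    """Multiply each relative frequency by 100."""
    return [100 * r for r in relative_frequencies(freqs)]

city_rel = relative_frequencies(city_freq)
city_pct = percentages(city_freq)
print(city_rel[4], city_pct[4])  # the $30 to less than $35 class
```

The same functions apply unchanged to groups of different sizes, which is what makes the resulting proportions comparable across groups.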
TABLE 2.9
Relative Frequency Distribution and Percentage Distribution of the Cost of Restaurant Meals at City and Suburban Restaurants

                                 City                      Suburban
                         Relative                   Relative
Cost per Meal ($)        Frequency   Percentage     Frequency   Percentage
10 but less than 15        0.02         2.0           0.00         0.0
15 but less than 20        0.00         0.0           0.00         0.0
20 but less than 25        0.04         4.0           0.08         8.0
25 but less than 30        0.06         6.0           0.26        26.0
30 but less than 35        0.14        14.0           0.26        26.0
35 but less than 40        0.28        28.0           0.24        24.0
40 but less than 45        0.16        16.0           0.08         8.0
45 but less than 50        0.10        10.0           0.02         2.0
50 but less than 55        0.16        16.0           0.04         4.0
55 but less than 60        0.02         2.0           0.02         2.0
60 but less than 65        0.02         2.0           0.00         0.0
Total                      1.00       100.0           1.00       100.0
From Table 2.9, you conclude that meals cost more at city restaurants than at suburban restaurants: 16% of the meals at city restaurants cost between $40 and $45 as compared to 8% of the suburban restaurants; 16% of the meals at city restaurants cost between $50 and $55 as compared to 4% of the suburban restaurants; and only 6% of the meals at city restaurants cost between $25 and $30 as compared to 26% of the suburban restaurants.
EXAMPLE 2.7
RELATIVE FREQUENCY DISTRIBUTION AND PERCENTAGE DISTRIBUTION OF THE THREE-YEAR ANNUALIZED RETURN FOR GROWTH AND VALUE MUTUAL FUNDS
In the Using Statistics scenario, you are interested in comparing the three-year annualized return (2003-2005) for growth and value mutual funds. Construct relative frequency distributions and percentage distributions for the growth funds and the value funds.
SOLUTION You conclude (see Table 2.10) that the three-year annualized return (2003-2005) for the growth funds is slightly lower than for the value funds. Of the growth funds, 4.38% have returns below 10, while only 1.96% of value funds have returns below 10. Of the growth funds, 30.83% have returns between 10 and 15 as compared to 23.74% of the value funds. Also, more of the value funds have higher returns. For example, 34.91% of the value funds made between 20 and 30, while 27.92% of the growth funds made between 20 and 30.
TABLE 2.10
Relative Frequency Distribution and Percentage Distribution of the Three-Year Annualized Return (2003-2005) for Growth and Value Mutual Funds

                                        Growth                      Value
Three-Year Annualized Return    Proportion   Percentage     Proportion   Percentage
5 but less than 10                0.0438        4.38          0.0196        1.96
10 but less than 15               0.3083       30.83          0.2374       23.74
15 but less than 20               0.3583       35.83          0.3687       36.87
20 but less than 25               0.2354       23.54          0.2709       27.09
25 but less than 30               0.0438        4.38          0.0782        7.82
30 but less than 35               0.0104        1.04          0.0196        1.96
35 but less than 40               0.0000        0.00          0.0000        0.00
40 but less than 45               0.0000        0.00          0.0056        0.56
Total                             1.0000      100.00          1.0000      100.00
The Cumulative Distribution
The cumulative percentage distribution provides a way of presenting information about the percentage of items that are less than a certain value. For example, you might want to know what percentage of the city restaurant meals cost less than $20, less than $30, less than $50, and so on. The percentage distribution is used to form the cumulative percentage distribution. Table 2.11 illustrates how to develop the cumulative percentage distribution for the cost of meals at city restaurants: 0% of the meals cost less than $10, 2% cost less than $15, 2% also cost less than $20 (because none of the meals cost between $15 and $20), 6% (2% + 4%) cost less than $25, and so on, until all 100% of the meals cost less than $65.
TABLE 2.11
Developing the Cumulative Percentage Distribution for the Cost of Meals at City Restaurants

                                     Percentage of Meals Less Than
Cost per Meal ($)      Percentage    Lower Boundary of Class Interval
10 but less than 15         2          0
15 but less than 20         0          2
20 but less than 25         4          2 = 2 + 0
25 but less than 30         6          6 = 2 + 0 + 4
30 but less than 35        14         12 = 2 + 0 + 4 + 6
35 but less than 40        28         26 = 2 + 0 + 4 + 6 + 14
40 but less than 45        16         54 = 2 + 0 + 4 + 6 + 14 + 28
45 but less than 50        10         70 = 2 + 0 + 4 + 6 + 14 + 28 + 16
50 but less than 55        16         80 = 2 + 0 + 4 + 6 + 14 + 28 + 16 + 10
55 but less than 60         2         96 = 2 + 0 + 4 + 6 + 14 + 28 + 16 + 10 + 16
60 but less than 65         2         98 = 2 + 0 + 4 + 6 + 14 + 28 + 16 + 10 + 16 + 2
65 but less than 70         0        100 = 2 + 0 + 4 + 6 + 14 + 28 + 16 + 10 + 16 + 2 + 2
Table 2.12 on page 48 summarizes the cumulative percentages of the cost of city and suburban restaurant meals. The cumulative distribution clearly shows that the cost of meals is lower in suburban restaurants than in city restaurants: 34% of the suburban restaurants cost less than $30 as compared to only 12% of the city restaurants; 60% of the suburban restaurants cost less than $35 as compared to only 26% of the city restaurants; 84% of the suburban restaurants cost less than $40 as compared to only 54% of the city restaurants.
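The running-sum construction of the cumulative percentage distribution can be sketched as follows. The function name is hypothetical; the percentages are those of Table 2.9, and each result is the percentage of meals costing less than the corresponding lower class boundary ($10, $15, ..., $65).

```python
from itertools import accumulate

# Percentage distributions for the classes $10-15, ..., $60-65 (from Table 2.9).
city_pct = [2, 0, 4, 6, 14, 28, 16, 10, 16, 2, 2]
suburban_pct = [0, 0, 8, 26, 26, 24, 8, 2, 4, 2, 0]
lower_bounds = list(range(10, 70, 5))  # 10, 15, ..., 65

def cumulative_less_than(pcts):
    """Percentage of values below each lower boundary: 0 first, then running sums."""
    return [0] + list(accumulate(pcts))

city_cum = cumulative_less_than(city_pct)
suburban_cum = cumulative_less_than(suburban_pct)
for bound, c, s in zip(lower_bounds, city_cum, suburban_cum):
    print(f"less than ${bound}: city {c}%, suburban {s}%")
```

The entry for $30, for instance, gives 12% for city restaurants and 34% for suburban restaurants, the comparison quoted in the text.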
TABLE 2.12
Cumulative Percentage Distributions of the Cost of City and Suburban Restaurant Meals

            City Percentage of        Suburban Percentage of
            Restaurants Less Than     Restaurants Less Than
Cost ($)    Indicated Value           Indicated Value
10                   0                         0
15                   2                         0
20                   2                         0
25                   6                         8
30                  12                        34
35                  26                        60
40                  54                        84
45                  70                        92
50                  80                        94
55                  96                        98
60                  98                       100
65                 100                       100

EXAMPLE 2.8
CUMULATIVE PERCENTAGE DISTRIBUTION OF THE THREE-YEAR ANNUALIZED RETURN FOR GROWTH AND VALUE MUTUAL FUNDS
In the Using Statistics scenario, you are interested in comparing the three-year annualized return (2003-2005) of growth and value mutual funds. Construct cumulative percentage distributions for the growth funds and the value funds.
SOLUTION The cumulative distribution in Table 2.13 indicates that more of the growth funds have lower returns than the value funds: 35.21% of the growth funds have returns below 15 as compared to 25.70% of the value funds; 71.04% of the growth funds have returns below 20 as compared to 62.57% of the value funds; 94.58% of the growth funds have returns below 25 as compared to 89.66% of the value funds.
TABLE 2.13
Cumulative Percentage Distributions of the Three-Year Annualized Return for Growth and Value Funds

Annual     Growth Fund Percentage        Value Fund Percentage
Return     Less Than Indicated Value     Less Than Indicated Value
 5                  0.00                          0.00
10                  4.38                          1.96
15                 35.21                         25.70
20                 71.04                         62.57
25                 94.58                         89.66
30                 98.96                         97.48
35                100.00                         99.44
40                100.00                         99.44
45                100.00                        100.00
The Histogram
A histogram is a bar chart for grouped numerical data in which the frequencies or percentages of each group of numerical data are represented as individual vertical bars. In a histogram, there are no gaps between adjacent bars as there are in a bar chart of categorical data. You display the variable of interest along the horizontal (X) axis. The vertical (Y) axis represents either the frequency or the percentage of values per class interval.
Figure 2.8 displays a Microsoft Excel frequency histogram for the cost of restaurant meals at city restaurants. The histogram indicates that the cost of restaurant meals at city restaurants is concentrated between approximately $30 and $50. Very few meals cost less than $20 or more than $60.
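As a rough stand-in for the Excel chart, the sketch below prints a text rendering of the city frequency histogram, one bar per class. It is only an illustration: the bars run horizontally rather than vertically, the function name is hypothetical, and the frequencies are the city values implied by Table 2.9 for 50 restaurants.

```python
# City frequencies for the classes $10-15, $15-20, ..., $60-65.
city_freq = [1, 0, 2, 3, 7, 14, 8, 5, 8, 1, 1]

def text_histogram(freqs, lower=10, width=5, mark="#"):
    """Return one line per class: class midpoint, a bar of `mark` characters, the frequency."""
    lines = []
    for i, f in enumerate(freqs):
        midpoint = lower + i * width + width / 2
        lines.append(f"{midpoint:5.1f} | {mark * f} {f}")
    return lines

histogram_lines = text_histogram(city_freq)
print("\n".join(histogram_lines))
```

The longest bars sit at the midpoints from 32.5 to 52.5, consistent with the concentration between roughly $30 and $50 described above.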
FIGURE 2.8
Microsoft Excel histogram for the cost of restaurant meals at city restaurants
See Sections E2.7 and E2.8 to create this.
[Figure 2.8: Histogram for the Cost of Meals at City Restaurants; X axis: Cost of Meal]

EXAMPLE 2.9

FIGURE 2.9
Histogram of the three-year annualized return (2003-2005) (Panel A: growth funds and Panel B: value funds)
HISTOGRAM OF THE THREE-YEAR ANNUALIZED RETURN FOR GROWTHAND VALUE MUTUAL FUNDS
In the Using Statistics scenario, you are interested in comparing the three-year annualized return (2003-2005) of growth and value mutual funds. Construct histograms for the growth funds and the value funds.
SOLUTION Figure 2.9 shows that the distribution of the growth funds (shown below) has more low returns as compared to the value funds (shown on p. 50), which have more high returns.
[Figure 2.9, Panel A: Histogram for Three-Year Annualized Return (Growth Funds)]
[Figure 2.9, Panel B: Histogram for Three-Year Annualized Return (Value Funds)]
FIGURE 2.10
Microsoft Excel percentage polygons of the cost of restaurant meals for city and suburban restaurants
See Section E2.9 to create this.
The Polygon
Constructing multiple histograms on the same graph when comparing two or more sets of data is confusing. Superimposing the vertical bars of one histogram on another histogram makes interpretation difficult. When there are two or more groups, you should use a percentage polygon.

PERCENTAGE POLYGON
A percentage polygon is formed by having the midpoint of each class represent the data in that class and then connecting the sequence of midpoints at their respective class percentages.

Figure 2.10 displays percentage polygons for the cost of restaurant meals for city and suburban restaurants. The polygon for the suburban restaurants is concentrated to the left of (corresponding to lower cost) the polygon for city restaurants. The highest percentages of cost for the suburban restaurants are for class midpoints of $27.50 and $32.50, while the highest percentages of cost for the city restaurants are for a class midpoint of $37.50.
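The points that a percentage polygon connects are simply (class midpoint, class percentage) pairs. The sketch below builds those pairs from the percentages in Table 2.9 and reports where each polygon peaks; the function names are hypothetical and no plotting library is assumed.

```python
# Percentage distributions for the classes $10-15, ..., $60-65 (from Table 2.9).
city_pct = [2, 0, 4, 6, 14, 28, 16, 10, 16, 2, 2]
suburban_pct = [0, 0, 8, 26, 26, 24, 8, 2, 4, 2, 0]

def polygon_points(pcts, lower=10, width=5):
    """Pair each class midpoint with its class percentage."""
    return [(lower + i * width + width / 2, p) for i, p in enumerate(pcts)]

def peak_midpoints(points):
    """Midpoints at which the polygon reaches its highest percentage."""
    top = max(p for _, p in points)
    return [m for m, p in points if p == top]

city_points = polygon_points(city_pct)
suburban_points = polygon_points(suburban_pct)
print(peak_midpoints(city_points))
print(peak_midpoints(suburban_points))
```

Plotting both point lists on the same axes and joining consecutive points with line segments would give the two polygons; the peaks fall at $37.50 for city restaurants and at $27.50 and $32.50 for suburban restaurants.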
[Figure 2.10: Percentage Polygon for the Cost of Meals at City and Suburban Restaurants]

EXAMPLE 2.10

FIGURE 2.11
Microsoft Excel percentage polygons of the three-year annualized return
The polygons in Figure 2.10 have points whose values on the X axis represent the midpoint of the class interval. For example, look at the points plotted on the X axis at 22.5 ($22.50). The point for the suburban restaurants (the higher one) represents the fact that 8% of these restaurants have meal costs between $20 and $25. The point for the city restaurants (the lower one) represents the fact that 4% of these restaurants have meal costs between $20 and $25.
When you construct polygons or histograms, the vertical (Y) axis should show the true zero, or "origin," so as not to distort the character of the data. The horizontal (X) axis does not need to show the zero point for the variable of interest, although the range of the variable should constitute the major portion of the axis.
PERCENTAGE POLYGONS OF THE THREE-YEAR ANNUALIZED RETURN (2003-2005) FOR GROWTH AND VALUE MUTUAL FUNDS
In the Using Statistics scenario, you are interested in comparing the three-year annualized return (2003-2005) of growth and value mutual funds. Construct percentage polygons for the growth funds and the value funds.
SOLUTION Figure 2.11 shows that the distribution of the growth funds has slightly more low returns as compared to the distribution of value funds, which has more high returns.
[Figure 2.11: Percentage Polygon for Three-Year Annualized Return]
The Cumulative Percentage Polygon (Ogive)
The cumulative percentage polygon, or ogive, displays the variable of interest along the X axis, and the cumulative percentages along the Y axis.
Figure 2.12 illustrates the Microsoft Excel cumulative percentage polygons of the cost of restaurant meals at city and suburban restaurants. Most of the curve for the city restaurants is located to the right of the curve for the suburban restaurants. This indicates that the city restaurants have fewer meals that cost below a particular value. For example, 12% of the city restaurant meals cost less than $30 as compared to 34% of the suburban restaurant meals.
FIGURE 2.12
Microsoft Excel cumulative percentage polygons of the cost of restaurant meals at city and suburban restaurants
[Figure 2.12: Cumulative Percentage Polygon for the Cost of Meals at City and Suburban Restaurants; X axis: 9.99 to 64.99 in steps of 5]

EXAMPLE 2.11

FIGURE 2.13
Microsoft Excel cumulative percentage polygons of the three-year annualized return
CUMULATIVE PERCENTAGE POLYGONS OF THE THREE-YEAR ANNUALIZED RETURN (2003-2005) FOR GROWTH AND VALUE MUTUAL FUNDS
In the Using Statistics scenario, you are interested in comparing the three-year annualized return (2003-2005) of growth and value mutual funds. Construct cumulative percentage polygons for the growth funds and the value funds.
SOLUTION Figure 2.13 illustrates the Microsoft Excel cumulative percentage polygons of the three-year annualized return for growth and value funds. The curve for the value funds is located slightly to the right of the curve for the growth funds. This indicates that the value funds have fewer returns below a particular value. For example, 62.57% of the value funds have returns less than 20 as compared to 71.04% of the growth funds.
[Figure 2.13: Cumulative Percentage Polygon for Three-Year Annualized Return; Y axis: 0% to 100%]
Learning the Basics

2.20 The values for a set of data vary from 11.6 to 97.8.
a. If these values are grouped into nine classes, indicate the class boundaries.
b. What class-interval width did you choose?
c. What are the nine class midpoints?

2.21 The GMAT scores from a sample of 50 applicants to an MBA program indicate that none of the applicants scored below 450. A frequency distribution was formed by choosing class intervals 450 to 499, 500 to 549, and so on, with the last class grouping being 700 to 749. Two applicants scored in the interval 450 to 499, and 16 applicants scored in the interval 500 to 549. Construct a cumulative percentage polygon.
a. What percentage of applicants scored below 500?
b. What percentage of applicants scored between 500 and 549?
c. What percentage of applicants scored below 550?
d. What percentage of applicants scored below 750?
Applying the Concepts
2.22 The following data (contained in the accompanying data file) represent the cost of electricity during July 2006 for a random sample of 50 one-bedroom apartments in a large city:

Raw Data on Utility Charges ($)
 96  171  202  178  147  102  153  197  127   82
157  185   90  116  172  111  148  213  130  165
141  149  206  175  123  128  144  168  109  167
 95  163  150  154  130  143  187  166  139  149
108  119  183  151  114  135  191  137  129  158

a. Form a frequency distribution and a percentage distribution that have class intervals with the upper class limits $99, $119, and so on.
b. Construct a histogram and a percentage polygon.
c. Form a cumulative percentage distribution and plot a cumulative percentage polygon.
d. Around what amount does the monthly electricity cost seem to be concentrated?

2.23 One operation of a mill is to cut pieces of steel into parts that will later be used as the frame for front seats in an automobile. The steel is cut with a diamond saw and requires the resulting parts to be within ±0.005 inch of the length specified by the automobile company. The following table (contained in the accompanying data file) comes from a sample of 100 steel parts. The measurement reported is the difference in inches between the actual length of the steel part, as measured by a laser measurement device, and the specified length of the steel part. For example, the first value, -0.002, represents a steel part that is 0.002 inch shorter than the specified length.

-0.002   0.0005  0.0025  0.001   0.002   0.001   0.005  -0.002   0       0.001
-0.0025 -0.003   0.001  -0.0005  0      -0.003  -0.001   0.0005  0.0025 -0.0025
 0.002   0.001   0.001   0.001  -0.002  -0.003  -0.0015 -0.0005  0      -0.0025
-0.003  -0.001   0.002   0.0025 -0.002   0      -0.0005 -0.001   0      -0.002
 0       0       0.002  -0.003   0.0005  0.001   0.0005  0.001   0.0025  0.001
-0.001   0      -0.0015 -0.0025  0.0025 -0.001   0.001   0.0035  0.0005  0
 0      -0.003   0.002  -0.0015 -0.001  -0.0005 -0.001  -0.0015  0.0005 -0.0005
-0.0025 -0.0005  0.0025  0.002  -0.002  -0.0005 -0.001  -0.001   0.0005  0
-0.0015  0.0005  0.003  -0.0015  0       0.0015 -0.002  -0.0005  0       0.001
 0.002  -0.0005  0      -0.0025  0.001  -0.002  -0.0025 -0.0005 -0.0015 -0.002

a. Construct a frequency distribution and a percentage distribution.
b. Construct a histogram and a percentage polygon.
c. Plot a cumulative percentage polygon.
d. Is the steel mill doing a good job in meeting the requirements set by the automobile company? Explain.
2.24 A manufacturing company produces steel housings for electrical equipment. The main component part of the housing is a steel trough that is made out of a 14-gauge steel coil. It is produced using a 250-ton progressive punch press with a wipe-down operation that puts two 90-degree forms in the flat steel to make the trough. The distance from one side of the form to the other is critical because of weatherproofing in outdoor applications. The company requires that the width of the trough be between 8.31 inches and 8.61 inches. The following (contained in the accompanying data file) are the widths of the troughs, in inches, for a sample of 49 troughs.

8.312 8.343 8.317 8.383 8.348 8.410 8.351 8.373
8.481 8.422 8.476 8.382 8.484 8.403 8.414 8.419
8.385 8.465 8.498 8.447 8.436 8.413 8.489 8.414
8.481 8.415 8.479 8.429 8.458 8.462 8.460 8.444
8.429 8.460 8.412 8.420 8.410 8.405 8.323 8.420
8.396 8.447 8.405 8.439 8.411 8.427 8.420 8.498
8.409

a. Construct a frequency distribution and a percentage distribution.
b. Construct a histogram and a percentage polygon.
c. Plot a cumulative percentage polygon.
d. What can you conclude about the number of troughs that will meet the company's requirements of troughs being between 8.31 and 8.61 inches wide?
2.25 The manufacturing company in Problem 2.24 also produces electric insulators. If the insulators break when in use, a short circuit is likely to occur. To test the strength of the insulators, destructive testing in high-powered labs is carried out to determine how much force is required to break the insulators. Force is measured by observing how many pounds must be applied to the insulator before it breaks. The strengths of 30 insulators (contained in the accompanying data file) are as follows:

1,870 1,728 1,656 1,610 1,634 1,784 1,522 1,696
1,592 1,662 1,866 1,764 1,734 1,662 1,734 1,774
1,550 1,756 1,762 1,866 1,820 1,744 1,788 1,688
1,810 1,752 1,680 1,810 1,652 1,736

a. Construct a frequency distribution and a percentage distribution.
b. Construct a histogram and a percentage polygon.
c. Plot a cumulative percentage polygon.
d. What can you conclude about the strength of the insulators if the company requires a force measurement of at least 1,500 pounds before breaking?
2.26 The ordered arrays in the accompanying table (and contained in the accompanying data file) deal with the life (in hours) of a sample of 40 100-watt light bulbs produced by Manufacturer A and a sample of 40 100-watt light bulbs produced by Manufacturer B.

Manufacturer A
  684   697   720   773   821
  831   835   848   852   852
  859   860   868   870   876
  893   899   905   909   911
  922   924   926   926   938
  939   943   946   954   971
  972   977   984 1,005 1,014
1,016 1,041 1,052 1,080 1,093

Manufacturer B
  819   836   888   897   903
  907   912   918   942   943
  952   959   962   986   992
  994 1,004 1,005 1,007 1,015
1,016 1,018 1,020 1,022 1,034
1,038 1,072 1,077 1,077 1,082
1,096 1,100 1,113 1,113 1,116
1,153 1,154 1,174 1,188 [illegible]

a. Form a frequency distribution and a percentage distribution for each manufacturer, using the following class-interval widths for each distribution:
1. Manufacturer A: 650 but less than 750, 750 but less than 850, and so on.
2. Manufacturer B: 750 but less than 850, 850 but less than 950, and so on.
b. Construct percentage histograms on separate graphs and plot the percentage polygons on one graph.
c. Form cumulative percentage distributions and plot cumulative percentage polygons on one graph.
d. Which manufacturer has bulbs with a longer life, Manufacturer A or Manufacturer B? Explain.

2.27 The following data (contained in the accompanying data file) represent the amount of soft drink in a sample of 50 2-liter bottles:

2.109 2.086 2.066 2.075 2.065 2.057 2.052 2.044 2.036 2.038
2.031 2.029 2.025 2.029 2.023 2.020 2.015 2.014 2.013 2.014
2.012 2.012 2.012 2.010 2.005 2.003 1.999 1.996 1.997 1.992
1.994 1.986 1.984 1.981 1.973 1.975 1.971 1.969 1.966 1.967
1.963 1.957 1.951 1.951 1.947 1.941 1.941 1.938 1.908 1.894

a. Construct a frequency distribution and a percentage distribution.
b. Construct a histogram and a percentage polygon.
c. Form a cumulative percentage distribution and plot a cumulative percentage polygon.
d. On the basis of the results of (a) through (c), does the amount of soft drink filled in the bottles concentrate around specific values?
2.4 CROSS TABULATIONS
The study of patterns that may exist between two or more categorical variables is common in business. These patterns are explained by cross-tabulating the data. You can present cross tabulations in tabular form (contingency tables) or graphical form (side-by-side charts).

The Contingency Table
A contingency table presents the results of two categorical variables. The joint responses are classified so that the categories of one variable are located in the rows and the categories of the other variable are located in the columns. The values located at the intersections of the rows and columns are called cells. Depending on the type of contingency table constructed, the cells for each row-column combination contain the frequency, the percentage of the overall total, the percentage of the row total, or the percentage of the column total.
Suppose that in the Using Statistics scenario, you want to examine whether there is any pattern or relationship between the level of risk and the objective of the mutual fund (growth versus value). Table 2.14 summarizes this information for all 838 mutual funds.
TABLE 2.14
Contingency Table Displaying Fund Objective and Fund Risk

                     RISK LEVEL
OBJECTIVE     High   Average   Low   Total
Growth         332     132      16     480
Value           14     113     231     358
Total          346     245     247     838

See Section E2.10 to create this.
You construct this contingency table by tallying the joint responses for each of the 838 mutual funds with respect to objective and risk into one of the six possible cells in the table. The first fund listed in the data file is classified as a growth fund with an average risk. Thus, you tally this joint response into the cell that is the intersection of the first row and second column. The remaining 837 joint responses are recorded in a similar manner. Each cell contains the frequency for the row-column combination.
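The tallying of joint responses can be sketched with a counter keyed by (objective, risk) pairs. The six records below are hypothetical and stand in for rows of the data file; the function and constant names are likewise illustrative.

```python
from collections import Counter

OBJECTIVES = ["Growth", "Value"]      # row categories
RISKS = ["High", "Average", "Low"]    # column categories

# Hypothetical (objective, risk) records, not the actual 838 funds.
funds = [
    ("Growth", "Average"),
    ("Growth", "High"),
    ("Value", "Low"),
    ("Value", "Average"),
    ("Growth", "High"),
    ("Value", "Low"),
]

def contingency_table(records):
    """Tally each joint response into the cell for its row and column."""
    cells = Counter(records)
    return [[cells[(obj, risk)] for risk in RISKS] for obj in OBJECTIVES]

table = contingency_table(funds)
for obj, row in zip(OBJECTIVES, table):
    print(obj, row, sum(row))
```

The first record, a growth fund with average risk, lands in the first row and second column, as in the description above.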
In order to further explore any possible pattern or relationship between objective and fund risk, you can construct contingency tables based on percentages. You first convert these results into percentages based on the following three totals:

1. The overall total (i.e., the 838 mutual funds)
2. The row totals (i.e., 480 growth funds and 358 value funds)
3. The column totals (i.e., 346 high, 245 average, and 247 low)

Tables 2.15, 2.16, and 2.17 summarize these percentages.
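The three conversions listed above differ only in the denominator. The sketch below applies each one to the cell frequencies of Table 2.14; the variable and function names are hypothetical, and percentages are rounded to two decimals as in the tables.

```python
RISKS = ["High", "Average", "Low"]

# Cell frequencies from Table 2.14 (rows: fund objective; columns: risk level).
counts = {
    "Growth": {"High": 332, "Average": 132, "Low": 16},
    "Value": {"High": 14, "Average": 113, "Low": 231},
}

def pct(part, whole):
    """Percentage of `whole` represented by `part`, rounded to two decimals."""
    return round(100 * part / whole, 2)

grand_total = sum(sum(row.values()) for row in counts.values())
row_totals = {obj: sum(row.values()) for obj, row in counts.items()}
col_totals = {risk: sum(counts[obj][risk] for obj in counts) for risk in RISKS}

# Same cells, three different denominators.
overall = {o: {r: pct(n, grand_total) for r, n in row.items()} for o, row in counts.items()}
by_row = {o: {r: pct(n, row_totals[o]) for r, n in row.items()} for o, row in counts.items()}
by_col = {o: {r: pct(n, col_totals[r]) for r, n in row.items()} for o, row in counts.items()}

print(overall["Growth"]["High"], by_row["Growth"]["High"], by_col["Growth"]["High"])
```

For the growth, high-risk cell this gives the percentage of all funds, the percentage of growth funds, and the percentage of high-risk funds, respectively.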
TABLE 2.15
Contingency Table Displaying Fund Objective and Fund Risk, Based on Percentage of Overall Total

                     RISK LEVEL
OBJECTIVE     High    Average    Low      Total
Growth        39.62    15.75     1.91     57.28
Value          1.67    13.48    27.57     42.72
Total         41.29    29.23    29.48    100.00

TABLE 2.16
Contingency Table Displaying Fund Objective and Fund Risk, Based on Percentage of Row Total

                     RISK LEVEL
OBJECTIVE     High    Average    Low      Total
Growth        69.17    27.50     3.33    100.00
Value          3.91    31.56    64.53    100.00
Total         41.29    29.24    29.47    100.00

TABLE 2.17
Contingency Table Displaying Fund Objective and Fund Risk, Based on Percentage of Column Total

                     RISK LEVEL
OBJECTIVE     High    Average    Low      Total
Growth        95.95    53.88     6.48     57.28
Value          4.05    46.12    93.52     42.72
Total        100.00   100.00   100.00    100.00
Table 2.15 shows that 41.29% of the mutual funds sampled are high risk, 57.28% are growth funds, and 39.62% are high-risk funds that are growth funds. Table 2.16 shows that 69.17% of the growth funds are high risk and 3.33% are low risk. Table 2.17 shows that 95.95% of the high-risk funds and only 6.48% of the low-risk funds are growth funds. The tables reveal that growth funds are more likely to be high risk, while value funds are more likely to be low risk.
The Side-by-Side Bar Chart
A useful way to visually display the results of cross-classification data is by constructing a side-by-side bar chart. Figure 2.14, which uses the data from Table 2.14, is a Microsoft Excel side-by-side bar chart that compares the three fund risk levels, based on their objectives. An examination of Figure 2.14 reveals results consistent with those of Tables 2.15, 2.16, and 2.17: Growth funds are more likely to be high risk, while value funds are more likely to be low risk.

FIGURE 2.14
Microsoft Excel side-by-side bar chart for fund objective and risk
See Section E2.11 to create this.
Learning the Basics
2.28 The following data represent the responses to two questions asked in a survey of 40 college students majoring in business: What is your gender? (male = M; female = F) and What is your major? (accounting = A; computer information systems = C; marketing = M):
M F
C M
M M
A A
Gender:
F
1F
C
F
vF
A
F
A
M
C
M
A
F
C
F
IM
A
M
C
M
A
M
M
M
A
M
A
M
A
M
A
M
C
M
qM
A
M
A
F
M
M
C
M
C
M
A
M
C
F F
C A
M F
M C
M F
A C
F M
A AMajor:
a. Tally the data into a contingency table where the two rows represent the gender categories and the three columns represent the academic-major categories.
b. Form contingency tables based on percentages of all 40 student responses, based on row percentages, and based on column percentages.
c. Using the results from (a), construct a side-by-side bar chart of gender based on student major.
2.29 Given the following contingency table, construct a side-by-side bar chart comparing A and B for each of the three column categories:

             1      2      3    Total
A           20     40     40      100
B           80     80     40      200

Applying the Concepts
2.30 The results of a study made as part of a yield improvement effort at a semiconductor manufacturing facility provided defect data for a sample of 450 wafers. The following table presents a summary of the responses to two questions: Was a particle found on the die that produced the wafer? and Is the wafer good or bad?

                        CONDITION OF DIE
QUALITY OF WAFER    No Particles    Particles    Totals
Good                         320           14       334
Bad                           80           36       116
Totals                       400           50       450

Source: Extracted from S. W. Hall, "Analysis of Defectivity of Semiconductor Wafers by Contingency Table," Proceedings Institute of Environmental Sciences, Vol. 1 (1994), pp. 177-183.
a. Construct contingency tables based on total percentages, row percentages, and column percentages.
b. Construct a side-by-side bar chart of quality of wafers based on condition of die.
c. What conclusions do you draw from these analyses?
2.31 Each day at a large hospital, several hundred laboratory tests are performed. The rate at which these tests are done improperly (and therefore need to be redone) seems steady, at about 4%. In an effort to get to the root cause of these nonconformances (that is, tests that need to be redone), the director of the lab decided to keep records over a period of one week. The laboratory tests were subdivided by the shift of workers who performed the lab tests. The results are as follows:

                             SHIFT
LAB TESTS PERFORMED     Day    Evening    Total
Nonconforming            16         24       40
Conforming              654        306      960
Total                   670        330    1,000

a. Construct contingency tables based on total percentages, row percentages, and column percentages.
b. Which type of percentage (row, column, or total) do you think is most informative for these data? Explain.
c. What conclusions concerning the pattern of nonconforming laboratory tests can the laboratory director reach?
2.32 A sample of 500 shoppers was selected in a large metropolitan area to determine various information concerning consumer behavior. Among the questions asked was "Do you enjoy shopping for clothing?" The results are summarized in the following cross-classification table:

                                   GENDER
ENJOY SHOPPING FOR CLOTHING    Male    Female    Total
Yes                             136       224      360
No                              104        36      140
Total                           240       260      500

a. Construct contingency tables based on total percentages, row percentages, and column percentages.
b. Construct a side-by-side bar chart of enjoying shopping for clothing based on gender.
c. What conclusions do you draw from these analyses?
2.33 As more Americans use cell phones, they question where it is okay to talk on cell phones. The following is a table of results, in percentages, for 2000 and 2006 (extracted from W. Koch, "Business Put a Lid on Chatterboxes," USA Today, February 7, 2006, p. 3A):

                                                   YEAR
OK TO TALK ON A CELL PHONE IN A RESTAURANT     2000    2006
Yes                                              21      31
No                                               79      69
Total                                           100     100

a. Construct a side-by-side bar chart.
b. Discuss the changes in attitude concerning the use of cell phones in restaurants between 2000 and 2006.
2.34 An experiment was conducted by James Choi, David Laibson, and Brigitte Madrian to study the choices made in fund selection. When presented with four S&P 500 index funds that were identical except for their fees, undergraduate and MBA students chose the funds as follows (in percentages):

                                STUDENT GROUP
FUND                        Undergraduate    MBA
Lowest-cost fund
Second-lowest-cost fund
Third-lowest-cost fund
Highest-cost fund

Source: Extracted from J. Choi, D. Laibson, and B. Madrian, "Why Does the Law of One Practice Fail? An Experiment on Mutual Funds," www.som.yale.edu/faculty/jjc83/fees.pdf.
a. Construct a side-by-side bar chart for the two student groups.
b. What do these results tell you about the differences between undergraduate and MBA students in their ability to choose S&P 500 index funds?
2.35 Where people turn to for news is different for various age groups. A study indicated where different age groups primarily get their news.
Age Group
Under 36l 9a a3 l
1 127
1940z)
1 8
t0773755295
1 3 31211 0 910176
Local TV
National TV
Radio
Local newspaper
Internet
1 1 9t02977983
a. Construct a side-by-side bar chart for the three age groups.
b. What differences are there in the age groups?
2.5 SCATTER PLOTS AND TIME-SERIES PLOTS
When analyzing a single numerical variable such as the cost of a restaurant meal or the three-year annualized return, you use histograms, polygons, and cumulative percentage polygons, developed in Section 2.3. This section discusses scatter plots and time-series plots, which are used when you have two numerical variables.
The Scatter Plot
You use a scatter plot to examine possible relationships between two numerical variables. For each observation, you plot one variable on the horizontal X axis and the other variable on the vertical Y axis. For example, a marketing analyst could study the effectiveness of advertising by comparing weekly sales volumes and weekly advertising expenditures. Or a human resources director interested in the salary structure of the company could compare the employees' years of experience with their current salaries.
To demonstrate a scatter plot, you can examine the relationship between the cost of different items in various cities (extracted from K. Spors, "Keeping Up with . . . Yourself," The Wall Street Journal, April 11, 2005, p. R4). Table 2.18 provides the cost of a fast-food hamburger meal and the cost of two movie tickets in 10 cities around the world. The accompanying data file contains the complete data set.
TABLE 2.18
Cost of a Fast-Food Hamburger Meal and Cost of Two Movie Tickets in 10 Cities

City               Hamburger    Movie Tickets
Tokyo                   5.99            32.66
London                  7.62            28.41
New York                5.75            20.00
Sydney                  4.45            20.71
Chicago                 4.99            18.00
San Francisco           5.29            19.50
Boston                  4.39            18.00
Atlanta                 3.70            16.00
Toronto                 4.62            18.05
Rio de Janeiro          2.99             9.90
FIGURE 2.15
Microsoft Excel scatter plot of the cost of a fast-food hamburger meal and the cost of two movie tickets
See Section E2.12 to create this.
TABLE 2.19
Movie Attendance, in Billions, from 1999 to 2005
For each city, you plot the cost of a fast-food hamburger meal on the X axis, and the cost of two movie tickets on the Y axis. Figure 2.15 presents a Microsoft Excel scatter plot for these two variables.
[Figure 2.15 chart title: Scatter Plot of the Cost of a Fast-Food Hamburger Meal and the Cost of Two Movie Tickets]
Although there is some variation, there appears to be a clearly increasing (positive) relationship between the cost of a fast-food hamburger meal and the cost of two movie tickets. In other words, cities in which the cost of a fast-food hamburger meal is low seem to also have a low cost of two movie tickets. Other pairs of variables may have a decreasing (negative) relationship in which one variable decreases as the other increases. The scatter plot will be studied again in Chapter 3, when the coefficient of correlation and the covariance are presented, and in Chapter 13, when regression analysis is developed.
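The direction of the relationship can also be checked numerically. The sketch below is a hedged illustration in Python rather than the Excel procedure this chapter uses; it previews the sample covariance and coefficient of correlation that Chapter 3 presents, computed here with the standard textbook formulas on the Table 2.18 values. The variable names are illustrative.

```python
import math

# Data from Table 2.18, in the same city order as the table.
hamburger = [5.99, 7.62, 5.75, 4.45, 4.99, 5.29, 4.39, 3.70, 4.62, 2.99]
movie_tickets = [32.66, 28.41, 20.00, 20.71, 18.00, 19.50, 18.00, 16.00, 18.05, 9.90]

n = len(hamburger)
mean_x = sum(hamburger) / n
mean_y = sum(movie_tickets) / n

# Sums of cross products and squared deviations about the means.
sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hamburger, movie_tickets))
sum_xx = sum((x - mean_x) ** 2 for x in hamburger)
sum_yy = sum((y - mean_y) ** 2 for y in movie_tickets)

sample_covariance = sum_xy / (n - 1)
correlation = sum_xy / math.sqrt(sum_xx * sum_yy)

direction = "increasing (positive)" if sample_covariance > 0 else "not increasing"
print(f"sample covariance = {sample_covariance:.2f}")
print(f"coefficient of correlation = {correlation:.2f}")
print(f"direction: {direction}")
```

A positive covariance agrees with the upward drift of the points in the scatter plot; the plot itself remains the better tool for spotting individual cities that depart from the pattern.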
The Time-Series Plot
A time-series plot is used to study patterns in the values of a numerical variable over time. Each value is plotted as a point in two dimensions with the time period on the horizontal X axis and the variable of interest on the Y axis.
To demonstrate a time-series plot, you can examine the yearly movie attendance, in billions, from 1999 to 2005 (extracted from C. Passy, "Good Night and Good Luck," Palm Beach Post, February 5, 2006, p. 1J). Table 2.19 presents the data for the yearly movie attendance (see the accompanying data file). Figure 2.16 is a time-series plot of the movie attendance (in billions) from 1999 to 2005. You can see that although movie attendance rose from 1999 to 2002, apart from a dip in 2000, it has declined since then. Attendance in 2005 was below attendance in 1999.
Year    Attendance
1999          1.47
2000          1.42
2001          1.49
2002          1.63
2003          1.57
2004          1.53
2005          1.41
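The pattern described for movie attendance can be confirmed from the Table 2.19 values alone. The following is a small Python sketch, given only as an illustration alongside the Excel time-series plot; the variable names are illustrative.

```python
# Yearly movie attendance, in billions, from Table 2.19.
attendance = {
    1999: 1.47, 2000: 1.42, 2001: 1.49, 2002: 1.63,
    2003: 1.57, 2004: 1.53, 2005: 1.41,
}

years = sorted(attendance)

# Year-over-year change, keyed by the later year of each pair.
changes = {
    year: round(attendance[year] - attendance[previous], 2)
    for previous, year in zip(years, years[1:])
}

peak_year = max(attendance, key=attendance.get)
declined_every_year_after_peak = all(changes[y] < 0 for y in years if y > peak_year)

for year, change in changes.items():
    print(f"{year}: {change:+.2f}")
print("peak year:", peak_year)
print("declined every year after the peak:", declined_every_year_after_peak)
print("2005 below 1999:", attendance[2005] < attendance[1999])
```

Listing the year-over-year changes makes the turning point explicit, which is the same information the eye picks up from the peak in the time-series plot.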
FIGURE 2.16
Microsoft Excel time-series plot of movie attendance from 1999 to 2005
See Section E2.13 to create this.
Learning the Basics
2.36 The following is a set of data from a sample of n = 11 items:

X     7    5    8    3    6   10   12    4    9   15   18
Y    21   15   24    9   18   30   36   12   27   45   54

a. Construct a scatter plot.
b. Is there a relationship between X and Y? Explain.
2.37 The following is a series of annual sales (in millions of dollars) over an 11-year period (1996 to 2006):

Year     1996   1997   1998   1999   2000   2001   2002   2003   2004   2005   2006
Sales    13.0   17.0   19.0   20.0   20.5   20.5   20.5   20.0   19.0   17.0   13.0

a. Construct a time-series plot.
b. Does there appear to be any change in real annual sales over time? Explain.
Applying the Concepts
2.38 There are several methods for calculating fuel economy. The following table (contained in the accompanying data file) indicates the mileage as calculated by owners and by current government standards:

Car                         Owner    Government
2004 Honda Civic Hybrid      48.8          47.6
2002 Ford Explorer           16.8          18.3
2005 Toyota Camry            23.7          28.5
2003 Toyota Corolla          32.8          33.1
2005 Toyota Prius            37.3          56.0

a. Construct a scatter plot with owner mileage on the X axis and current government standards mileage on the Y axis.
b. Does there appear to be a relationship between owner and current government standards mileage? If so, is the relationship positive or negative?
2.39 The accompanying data file contains data on the calories and total fat, in grams per serving, for a sample of 20 chicken sandwiches from fast-food chains:
Sandwich Calories Fat
Wendy's Ultimate Chicken GrillBaja Fresh Original Baja Taco
with Charbroiled ChickenBurger King Smoky BBQ Fire
Grilled Chicken BaguetteQuiznos Sub Honey Bourbon
Chicken on wheat breadMcDonald's Chicken McGrillBlimpie Grilled Chicken Hot SubSubway Oven Roasted Chicken BreastBlimpie Buffalo Chicken Hot SubWendy's Spicy Chicken Fillet
360
370
380
Car                         Owner    Government
2005 Ford F-150              14.3          16.8
2005 Chevrolet Silverado     15.0          17.8
2002 Honda Accord LX         27.8          26.2
2002 Honda Civic             27.9          34.2
400400470470s00510
(
5l6202024l9
Calories Fat
Bell Ranchero Chicken Soft Tacotacos)
Oven Roasted Chicken Tender
Buffalo Chicken SubKing Chicken WhopperPainArizona ChickenMarket Rotisserie Chicken
SoftTacos with Chickentacos) 660 29
Chicken with Tomato,and Mozzarella
2.5: Scatter Diagrams and Time-Series Plots 6l
2.42 The U.S. Bureau of Labor Statistics compiles data on a wide variety of workforce issues. The following table (contained in the accompanying data file) gives the monthly seasonally adjusted civilian unemployment rate for the United States from 2000 to 2005:
Month 2000 2001 2002 2003 2004 2005
540 30
5s0 23550 30570 25580 19
January      4.0   4.2   5.7   5.8
February     4.1   4.2   5.7   5.9
March        4.0   4.3   5.7   5.9
April        3.8   4.4   5.9   6.0
May          4.0   4.3   5.8   6.1
June         4.0   4.5   5.8   6.3
July         4.0   4.6   5.8   6.2
August       4.1   4.9   5.7   6.1
September    3.9   5.0   5.7   6.1
October      3.9   5.3   5.7   6.0
November     3.9   5.5   5.9   5.9
December     3.9   5.7   6.0   5.7
BreadTuscan Chicken on& Onion Focaccia
Extractedfrom "Fastfood: Adding Hettlth to the Menu,"Reports, September 2004, pp. 28-3L
a scatter plot with calories on the X axis andfat on the I'axis.conclusions can you reach about the relationship
the calories and total fat in chicken sandwiches?
Iege basketball is big business, with coachesrevenues, and expenses in millions of dollars.file EE!$E!$EIEE contains the coaches'
and revenue for collese basketball at selectedin a recent year (extracted from R. Adams, "Pay
" The Wall Street Journal. March 1l-12.P l , P8) .
a scatter plot with coaches' salarieson the Xandrevenue on the Iaxis.conclusions can you reach about the relationship
the coach's salary and the revenue?would expect a school with a higher revenue to have
coach's salary? Is this borne out by the data?
f,ootball players trying out for the NFL areWonderlic standardized intellisence test. The data
contains the averase Wonderlic scores ofplayers trying out for the NFL and the graduationfootball players at selected schools (extracted from
"The NFLs Smartest Team." The Wall StreetSeptember 30, 2005, pp.Wl, Wl0).
a scatter plot with average Wonderlic score onaxis and graduation rate on the Iaxis.conclusions can you reach about the relationship
graduation
29640
Month       2004   2005
January      5.7    5.2
February     5.6    5.4
March        5.7    5.1
April        5.5    5.1
May          5.6    5.1
June         5.6    5.0
July         5.5    5.0
August       5.4    4.9
September    5.4    5.1
October      5.4    4.9
November     5.4    5.0
December     5.4    4.9
Bread Company ChargrilledPesto Panini
Bakery Caf6 Chicken Pomodori
'720 30
740 30
910 40
950 56
Source: U.S. Bureau of Labor Statistics, www.bls.gov, Sept. 19, 2006.
a. Construct a time-series plot of the U.S. unemployment rate.
b. Does there appear to be any pattern?
2.43 In 2005, five million people in the United States subscribed to online dating services. The subscribers spent, on average, $99 for the services provided, thus generating nearly $500 million for the online dating companies. The following table (contained in the accompanying data file) gives the number of subscribers (in millions) to U.S. dating Web sites from 2000 to 2005:

Year    Number of Subscribers (millions)
2000    0.7
2001    1.6
2002    2.8
2003    4.3
2004    4.7
2005    5.0

Source: Extracted from "Making an E-Match," National Geographic, February 2006, p. 128.
a. Construct a time-series plot.
b. Do you think the number of subscribers is increasing or decreasing? Do you think the rate of growth is increasing or decreasing? Explain.
the average Wonderlic score and
2.44 The following table, contained in the accompanying data file (extracted from "Price of Sickness Rising," USA Today, April 7, 2006, p. A1), shows the average price of prescription drugs from 2000 to 2004:
Year Price ($)
Year
2000200 I200220032004
l 995t996I O O 7
I 998I 9992000200 r20022003200420052006
7088 6 193r939
1,03 I893735759
I , 0 1 3622
7 t 5865
465 0
5 560o+
a. Construct a time-series plot for the average price of prescription drugs from 2000 to 2004.
b. What pattern, if any, is present in the data?
c. If you had to make a prediction of the average price of prescription drugs in 2005, what would you predict?
2.45 The following data, contained in the accompanying data file, provide the number of mergers and acquisitions made during January 1 through January 11 of each year from 1995 to 2006 (extracted from "Back of the Envelope," The New York Times, January 13, 2006, p. C7):
FIGURE 2.17
"Improper" display of Australian wine exports to the United States, in millions of gallons
Source: Adapted from S. Watterson, "Liquid Gold-Australians Are Changing the World of Wine. Even the French Seem Grateful," Time, November 22, 1999, p. 68.
a. Construct a time-series plot.
b. What pattern, if any, is present in the data?
c. If you had to make a prediction of the mergers and acquisitions made during January 1 through January 11, 2007, what would you predict?
2.6 MISUSING GRAPHS AND ETHICAL ISSUES
Good graphical displays reveal what the data convey. Unfortunately, many graphs presented in newspapers and magazines, as well as graphs that can be developed using the Chart Wizard of Microsoft Excel, are incorrect, misleading, or so unnecessarily complicated that they should never be used. To illustrate the misuse of graphs, the graph presented in Figure 2.17 is similar to one that was printed in Time magazine as part of an article on increasing exports of wine from Australia to the United States.
[Figure 2.17: "We're drinking more. Australian wine exports to the U.S. in millions of gallons," drawn as wineglass icons; surviving year labels: 1989, 1992, 1995]
In Figure 2.17, the wineglass icon representing the 6.77 million gallons for 1997 does not appear to be almost twice the size of the wineglass icon representing the 3.67 million gallons for 1995, nor does the wineglass icon representing the 2.25 million gallons for 1992 appear to be about twice the size of the wineglass icon representing the 1.04 million gallons for 1989. Part of the reason for this is that the three-dimensional wineglass icon is used to represent the two dimensions of exports and time. Although the wineglass presentation may catch the eye, the data should instead be presented in a summary table or a time-series plot.
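The ratios behind this criticism are easy to check. The sketch below, in Python and offered only as an illustration, computes the two data ratios from the export figures quoted above. The area and volume figures rest on an assumption made for illustration (an icon that keeps its proportions while its height is scaled by the data ratio); they do not describe how the Time graphic was actually drawn. They show why a pictorial icon has no single "size": a reader judging height, area, or volume sees three different ratios.

```python
# Exports in millions of gallons, as quoted in the discussion of Figure 2.17.
exports = {1989: 1.04, 1992: 2.25, 1995: 3.67, 1997: 6.77}


def ratio(later, earlier):
    """How many times larger the later year's exports are than the earlier year's."""
    return exports[later] / exports[earlier]


for later, earlier in [(1997, 1995), (1992, 1989)]:
    r = ratio(later, earlier)
    # Assumption for illustration: the icon keeps its proportions and its height
    # is scaled by r, so its area grows by r**2 and its implied volume by r**3.
    print(f"{later} vs. {earlier}: data ratio {r:.2f}, "
          f"area ratio {r ** 2:.2f}, volume ratio {r ** 3:.2f}")
```

A summary table or a time-series plot avoids the ambiguity because each value is encoded by a single length or position.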
In addition to the type of distortion created by the wineglass icons in the Time magazine graph displayed in Figure 2.17, improper use of the vertical and horizontal axes leads to distortions. Figure 2.18 presents another graph used in the same Time magazine article.
FIGURE 2.18
"Improper" display of amount of land planted with grapes for the wine industry
Source: Adapted from S. Watterson, "Liquid Gold-Australians Are Changing the World of Wine. Even the French Seem Grateful," Time, November 22, 1999, pp. 68-69.
FIGURE 2.19
... of market share ... soft drinks in 1999
Source: Adapted from Anne B. ... and Sam Ward, "Coke Still Has Most Fizz," USA Today, May 10, 2000, p. 1B.
[Figure 2.18: "...they're growing more... Amount of land planted with grapes for the wine industry." Surviving data labels: 1949-1950, 135,326 acres; 1969-1970, 150,300 acres; 1979-1980, 172,075 acres; 1989-1990, 146,204 acres; 1997-1998, 243,644 acres]
There are several problems in this graph. First, there is no zero point on the vertical axis. Second, the acreage of 135,326 for 1949 to 1950 is plotted above the acreage of 150,300 for 1969 to 1970. Third, it is not obvious that the difference between 1979 to 1980 and 1997 to 1998 (71,569 acres) is approximately 3.3 times the difference between 1979 to 1980 and 1969 to 1970 (21,775 acres). Fourth, there are no scale values on the horizontal axis. Years are plotted next to the acreage totals, not on the horizontal axis. Fifth, the values for the time dimension are not properly spaced along the horizontal axis. The value for 1979-1980 is much closer to 1989-1990 than it is to 1969-1970.
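The differences cited in the third problem can be recomputed from the acreage labels in Figure 2.18. The following Python sketch is a hedged illustration with illustrative variable names. Working from the labels, the later difference is 243,644 - 172,075 = 71,569 acres, so the ratio of the two differences comes out near 3.3.

```python
# Acreage labels quoted in the discussion of Figure 2.18.
acres = {
    "1949-1950": 135_326,
    "1969-1970": 150_300,
    "1979-1980": 172_075,
    "1997-1998": 243_644,
}

later_gap = acres["1997-1998"] - acres["1979-1980"]
earlier_gap = acres["1979-1980"] - acres["1969-1970"]
gap_ratio = later_gap / earlier_gap

# On a properly scaled vertical axis the smaller value must sit lower,
# so 1949-1950 should be plotted below 1969-1970, not above it.
earliest_is_smaller = acres["1949-1950"] < acres["1969-1970"]

print(f"1979-1980 to 1997-1998: {later_gap:,} acres")
print(f"1969-1970 to 1979-1980: {earlier_gap:,} acres")
print(f"ratio of the two differences: {gap_ratio:.2f}")
print("1949-1950 acreage is below 1969-1970 acreage:", earliest_is_smaller)
```

On a graph with a scaled vertical axis, the vertical distances between the plotted points would show this ratio directly.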
Other types of eye-catching displays that you typically see in magazines and newspapers often include information that is not necessary and just adds excessive clutter. Figure 2.19 represents one such display. The graph in Figure 2.19 shows those products with the largest market share for soft drinks in 1999. The graph suffers from too much clutter, although it is designed to show the differences in market share among the soft drinks. The display of the fizz for each soft drink takes up too much of the graph relative to the data. The same information could have been conveyed with a bar chart or pie chart.
[Figure 2.19: "Coke still has most fizz. Carbonated soft drinks with the biggest share of the $58 billion market last year": Coke Classic 20%, Pepsi-Cola 14%, Diet Coke 9%, Mountain Dew 7%, Sprite 7%, Dr Pepper 6%]
Some guidelines for developing good graphs are as follows:
• The graph should not distort the data.
• The graph should not contain unnecessary adornments (sometimes referred to as chartjunk).
• Any two-dimensional graph should contain a scale for each axis.
• The scale on the vertical axis should begin at zero.
• All axes should be properly labeled.
• The graph should contain a title.
• The simplest possible graph should be used for a given set of data.
Often these guidelines are unknowingly violated by individuals unaware of how to construct appropriate graphs. However, ethical issues arise when these guidelines are purposely violated in an effort to mislead the reader.
Microsoft Excel Graphs
Unfortunately, to some extent, Microsoft Excel encourages you to violate these guidelines when you use the Chart feature to create graphs. Microsoft Excel offers you many types of unusual charts, such as doughnut, radar, surface, bubble, cone, and pyramid charts, that obscure the data for most analyses. Even for the simple graphs discussed in this chapter, Microsoft Excel creates graphs with unnecessary adornments or other examples of poor style. Fortunately, for the types of graphs discussed in this text, you will find instructions in Excel Companion sections that correct the style errors that Excel makes.
Oddly, many of the sample graphs used in marketing or educational materials by Microsoft for Microsoft Excel 2007 feature "three-dimensional" charts, such as the 3-D pie chart shown in Figure 2.20. Although you may see many examples of such "3-D" charts in Microsoft-supplied materials, you should avoid creating this type of chart, as the perspective of the faked third dimension serves only to distort the relative proportions of each category shown on the chart.
FIGURE 2.20
"Poor-style" 3-D pie chart (chart title: Total Sales By Category)
Applying the Concepts
2.46 (Student Project) Bring to class a chart from a newspaper or magazine that you believe to be a poorly drawn representation of a numerical variable. Be prepared to submit the chart to the instructor with comments as to why you believe it is inappropriate. Do you believe that the intent of the chart is to purposely mislead the reader? Also, be prepared to present and comment on this in class.
2.47 (Student Project) Bring to class a chart from a newspaper or magazine that you believe to be a poorly drawn representation of a categorical variable. Be prepared to submit the chart to the instructor with comments as to why you consider it inappropriate. Do you believe that the intent of the chart is to purposely mislead the reader? Also, be prepared to present and comment on this in class.
2.48 (Student Project) Bring to class a chart from a newspaper or magazine that you believe to contain many unnecessary adornments (i.e., chartjunk) that cloud the message given by the data. Be prepared to submit the chart to the instructor with comments about why you think it is inappropriate. Also, be prepared to present and comment on this in class.
2.49 The following visual display contains an overembellished chart similar to one that appeared in USA Today, dealing with the number of deaths from lightning strikes in the United States:
[Figure: "Lightning strike fatalities decline. Number of deaths from lightning strikes"; surviving period label: 1982-91]
a. Describe at least one good feature of this visual display.
b. Describe at least one bad feature of this visual display.
c. Redraw the graph, using the guidelines given on pages 63 and 64.
2.50 The following visual display is similar to one that appeared in USA Today concerning the relative size of police departments in major U.S. cities:
[Figure: "Highest police-resident ratio. Of the USA's 50 largest police forces, these cities have the highest number of full-time officers per 10,000 residents"; only surviving value: 67]
a. Indicate a feature of this chart that violates the principles of good graphs.
b. Set up an alternative graph for the data provided in this figure.
2.51 The following visual display concerning where the United States gets its electricity is similar to one that appeared in USA Today:
[Figure: category labels Coal, Nuclear, Gas, Hydro, Oil, Wind, Other; a wrapped label fragment "Power" also survives]
a. Describe at least one good feature of this visual display.
b. Describe at least one bad feature of this visual display.
c. Redraw the graph, using the guidelines given on pages 63 and 64.
2.52 An article in The New York Times (Donna Rosato, "Worried About the Numbers? How About the Charts?" The New York Times, September 15, 2002, p. 87) reported on research done on annual reports of corporations by Professor Deanna Oxender Burgess of Florida Gulf Coast University. Professor Burgess found that even slight distortions in a chart changed readers' perception of the information. The article displayed sales information from the annual report of Zale Corporation and showed how results were exaggerated.
Go to the World Wide Web or the library and study the most recent annual report of a selected corporation. Find at least one chart in the report that you think needs improvement and develop an improved version of the chart. Explain why you believe the improved chart is better than the one included in the annual report.
2.53 Figures 2.1, 2.3, and 2.6 consist of a bar chart, a pie chart, and a Pareto diagram for how respondents would spend $1,000.
[Displaced figure labels: "Coal is top electricity source. Where USA gets its electricity" (Problem 2.51); 1972-81 (Problem 2.49); Washington, New York, Newark, N.J., Chicago, Philadelphia, St. Louis, Baltimore (Problem 2.50)]
a. Use the Chart Wizard of Microsoft Excel to construct a doughnut chart, a cone chart, and a pyramid chart for how respondents spend $1,000.
b. Which graphs do you prefer: the bar chart, pie chart, and Pareto diagram or the doughnut chart, cone chart, and pyramid chart? Explain.
2.54 Figures 2.2 and 2.4 consist of a bar chart and a pie chart for the risk level for the mutual fund data.
a. Use the Chart Wizard of Microsoft Excel to develop a doughnut chart, a cone chart, and a pyramid chart for the risk level of the mutual funds.
b. Which graphs do you prefer: the bar chart and pie chart or the doughnut chart, cone chart, and pyramid chart? Explain.
As you can see in Table 2.20, this chapter discusses data presentation. You have used various tables and charts to draw conclusions about spending an extra $1,000, about the cost of restaurant meals in a city and its suburbs, and about the set of mutual funds that were first introduced in the Using Statistics scenario at the beginning of the chapter. Now that you have studied tables and charts, in Chapter 3 you will learn about a variety of numerical descriptive measures useful for data analysis and interpretation.
TABLE 2.20
Roadmap for Selecting Tables and Charts

Type of Analysis: Tabulating, organizing, and graphically presenting the values of a variable
  Numerical data: Ordered array, stem-and-leaf display, frequency distribution, relative frequency distribution, percentage distribution, cumulative percentage distribution, histogram, polygon, cumulative percentage polygon (Sections 2.2 and 2.3)
  Categorical data: Summary table, bar chart, pie chart, Pareto diagram (Section 2.1)

Type of Analysis: Graphically presenting the relationship between two variables
  Numerical data: Scatter plot, time-series plot (Section 2.5)
  Categorical data: Contingency table, side-by-side bar chart (Section 2.4)
bar chart 33
cells 55
chartjunk 63
class boundaries 44
class grouping 44
class midpoint 44
contingency table 55
cross tabulations 54
cumulative percentage distribution 47
cumulative percentage polygon (ogive) 51
frequency distribution 44
histogram 48
ogive (cumulative percentage polygon) 51
ordered array 41
Pareto diagram 35
Pareto principle 36
percentage distribution 46
percentage polygon 50
pie chart 34
range 44
relative frequency distribution 46
scatter plot 58
side-by-side bar chart 56
stem-and-leaf display 41
summary table 33
time-series plot 59
width of class interval 44
Checking Your Understanding
2.55 How do histograms and polygons differ in terms of construction and use?
2.56 Why would you construct a summary table?
2.57 What are the advantages and/or disadvantages of using a bar chart, a pie chart, or a Pareto diagram?
2.58 Compare and contrast the bar chart for categorical data with the histogram for numerical data.
2.59 What is the difference between a time-series plot and a scatter plot?
2.60 Why is it said that the main feature of the Pareto diagram is its ability to separate the "vital few" from the "trivial many"? Discuss.
2.61 What are the three different ways to break down the percentages in a contingency table?
Applying the Concepts
2.62 The following data represent the breakdown of the price of a new college textbook:

Revenue Categories                     Percentage
Publisher                                    64.8
  Manufacturing costs                        32.3
  Marketing and promotion                    15.4
  Administrative costs and taxes             10.0
  After-tax profit                            7.1
Bookstore                                    22.4
  Employee salaries and benefits             11.3
  Operations                                  6.6
  Pretax profit                               4.5
Author                                       11.6
Freight                                       1.2

Source: Extracted from T. Lewin, "When Books Break the Bank," The New York Times, September 16, 2003, pp. B1, B4.
a. Using the four categories publisher, bookstore, author, and freight, construct a bar chart, a pie chart, and a Pareto diagram.
b. Using the four subcategories of publisher and three subcategories of bookstore along with the author and freight categories, construct a Pareto diagram.
c. Based on the results of (a) and (b), what conclusions can you reach concerning who gets the revenue from the sales of new college textbooks? Do any of these results surprise you? Explain.
2.63 The following data represent the global market share for plasma TVs and LCD TVs in the fourth quarter of 2005:

Company       Plasma TVs (%)
Hitachi                  7.3
LGE                     15.4
Panasonic               26.0
Philips                 13.3
Samsung                 14.4
Others                  23.6

Company              LCD TVs (%)
LGE                          6.5
Philips/Magnavox            14.2
Samsung                     11.6
Sharp                       13.6
Sony                        14.6
Others                      39.5

Source: Extracted from Y. Kageyama, "Flat-Panel TVs Proving Savior of Japanese Electronics Makers," The Palm Beach Post, April 23, 2006, p. F3.
a. For plasma TVs and LCD TVs, separately construct a bar chart, a pie chart, and a Pareto diagram.
b. Based on the results of (a), explain how the market share of plasma TVs differs from the market share of LCD TVs.
2.64 The following data represent energy consumption and renewable energy consumption in the United States in 2005:

Energy Source               %*
Coal                      23.0
Hydroelectric power        2.7
Natural gas               22.2
Nuclear electric power     8.1
Petroleum                 40.5
Renewable fuels            3.3

*Percentages do not add to 100% due to rounding.
Source: Energy Information Administration, 2006.

Renewable Energy Source     %*
Ethanol                   10.1
Geothermal                10.8
Solar                      1.9
Waste                     17.1
Wind                       4.5
Wood                      55.5

*Percentages do not add to 100% due to rounding.
Source: Energy Information Administration, 2006.
a. For energy consumption and renewable energy consumption, separately construct a bar chart, a pie chart, and a Pareto diagram.
b. Based on the results of (a), what conclusions can you reach about energy consumption and renewable energy consumption in the United States in 2005?
2.65 The data on page 68 represent proven conventional oil reserves, in billions of barrels, subdivided by region and country:
Region and Country
Proven ConventionalReserves
(Billions of Barrels)
2.66 In the aftermath of the attacks of September 11, 2001, statisticians at the National Center for Health Statistics became more concerned with their ability to track and classify victims of terrorism (E. Weinstein, "Tracking Terror's Rising Toll," The Wall Street Journal, January 25, 2002, p. A13). The following data represent deaths due to terrorism on U.S. soil from 1990 to 2001 and also the deaths in the United States in 2000 due to various causes:
YearDeaths Due to Terrorism
in the United States
Norlh AnrcricaMexicoU.S .Canada
Cenlrul anel South Americ'aVenezuelaBrazilOther Central and
South AmericaWestern Europe
NorwayBritainOther Western Europe
Alric'aLibyaNiger iaAlgeriaAngolaOther Africa
Middle EastSaudiArabiaIraqUnited Arab EmiratesKuwaitIran
QatarOmanOther Middle East
Far East qnd OceaniaChinaIndonesiaIndia
54.8
9s.2
t 7 . 2
74.9
28.32 1 . 84 .1
16.98 . 1
10.2
9 .55 . 02 .7
29.s22.s9.25.48 .3
259.2I t 2 . 597.894.089.713.25 . 5
n .7
24.05.04.7
10 .3
48.65.4
5 .0
I 9901 9 9 11992t9931994I 99519961997l 998r9992000200 I
683.6
44.0
0006l
t6920IaJ
02 ;717
CauseDeaths, inThousands
Other Far East and OceaniaEastern Europe and Former USSR 59.0
RussiaKazakhstanOther Eastern Europe and
Former USSR
Source: U.S. Department of Energy.
Smoke and fire                      3.3
Accidental drowning                 3.3
Alcohol-induced deaths             18.5
Alzheimer's disease                49.0
Assault by firearms                10.4
Assault by non-firearms             5.7
Asthma                              4.4
Cancer                            551.8
Strokes and related diseases      166.0
Emphysema                          16.9
Diabetes                           68.1
Heart disease                     710.0
Falls                              12.0
HIV                                14.4
Influenza and pneumonia            67.0
Injuries at work                    5.3
Motor vehicle accidents            41.8
Suicide                            28.3
Drug-related deaths                15.9
a. Using the set of countries, construct a bar chart, a piechart, and a Pareto diagrarn.
b. Using the set of regions, construct a bar char t , a p iechart, and a Pareto diagram.
c. Which graphical method do you think is best to portraythese data'/
d. Based on the results of (a) and (b), what conclusions canyou make concerning the proven convent ional o i lreserves for the different countries and resions'l
Soru'r' a : Fede ruI Bu rcau of' C ri nt i nu I .ht.s ti t'c St u ti s t i c s, Nu t i onalCenter.fir Health Stutistics, Nutionul Highv,u.r TrunsportutionSu I e t-r A tl m i n i.s tru t i o n. D e p u r tm c nt o.f- D eJe n s e.
a. Construct a time-series plot of deaths due to teron U.S. soil. Is there any pattern to the deaths due toror ism on U.S. soi lbetween 1990 and 2001' l
b. For the deaths, in thousands, due to different causes,construct a bar chart, a pie chart, and a Pareto diagram.
c. Which graphical method do you think is best to portraythese data?
d. Based on the results of (c), what conclusions can youmake concerning the deaths in the United States in 2000due to various causes?
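The ordering step behind a Pareto diagram can be sketched in a few lines of Python. This is only an illustrative sketch, not the Excel procedure described in the chapter: the function name pareto_table is an assumption made here, and the input uses the regional totals from the Problem 2.65 table. Categories are sorted from largest to smallest, and a running cumulative percentage is attached to each.

```python
# Regional totals (billions of barrels) as read from the Problem 2.65 table.
reserves = {
    "North America": 54.8,
    "Central and South America": 95.2,
    "Western Europe": 17.2,
    "Africa": 74.9,
    "Middle East": 683.6,
    "Far East and Oceania": 44.0,
    "Eastern Europe and Former USSR": 59.0,
}


def pareto_table(counts):
    """Return (category, value, percent, cumulative percent) rows, largest first.

    pareto_table is a hypothetical helper name used only for this sketch.
    """
    total = sum(counts.values())
    rows = []
    cumulative = 0.0
    # Sort categories in descending order of their values, as a Pareto diagram requires.
    for name, value in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        percent = 100.0 * value / total
        cumulative += percent
        rows.append((name, value, percent, cumulative))
    return rows


for name, value, pct, cum in pareto_table(reserves):
    print(f"{name:32s}{value:8.1f}{pct:8.1f}%{cum:8.1f}%")
```

The bars of the Pareto diagram follow the sorted values, and the cumulative percentage column supplies the points for the cumulative line drawn on the same chart.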
2.67 The owner of a restaurant serving Continental-style entrées is interested in studying patterns of patron demand for the Friday-to-Sunday weekend time period. Records are maintained that indicate the type of entrée ordered. The data are as follows:
Type of Entrée    Number Served
Beef                  187
Chicken               103
Duck                   25
Fish                  122
Pasta                  63
Shellfish              74
Veal                   26
a. Construct a percentage summary table for the types of entrées ordered.
b. Construct a bar chart, a pie chart, and a Pareto diagram for the types of entrées ordered.
c. Do you prefer a Pareto diagram or a pie chart for these data? Why?
d. What conclusions can the restaurant owner draw concerning demand for different types of entrées?
2.68 Suppose that the owner of the restaurant in Problem 2.67 is also interested in studying the demand for dessert during the same time period. She decided that two other variables, along with whether a dessert was ordered, are to be studied: the gender of the individual and whether a beef entrée is ordered. The results are as follows:
                        GENDER
DESSERT ORDERED    Male    Female    Total
Yes                  96       40      136
No                  224      240      464
Total               320      280      600
                        BEEF ENTRÉE
DESSERT ORDERED    Yes      No       Total
Yes                  71       65      136
No                  116      348      464
Total               187      413      600
a. For each of the two contingency tables, construct a contingency table of row percentages, column percentages, and total percentages.
b. Which type of percentage (row, column, or total) do you think is most informative for each gender? for beef entrée? Explain.
c. What conclusions concerning the pattern of dessert ordering can the owner of the restaurant reach?
2.69 An article in The New York Times (William McNulty and Hugh K. Truslow, "How It Looked Inside the Booth," The New York Times, November 6, 2002) provided the following data on the method for recording votes in 1980, 2000, and 2002, broken down by percentage of counties in the United States using each method and the percentage of registered voters using each method. The results are as follows:
                  Percentage of Counties Using Method
Method              1980     2000     2002
Punch cards         18.5     18.5     15.5
Lever machines      36.7     14.4     10.6
Paper ballots       40.7     11.9     10.5
Optical scan         0.8     41.5     43.0
Electronic           0.2      9.3     16.3
Mixed                3.1      4.4      4.1
                  Percentage of Registered Voters Using Method
Method              1980     2000     2002
Punch cards     [illegible]  31.4     22.6
Lever machines      42.9     17.4     15.5
Paper ballots       10.5      1.5      1.3
Optical scan         2.1     30.8     31.8
Electronic           0.7     12.2     19.6
Mixed               12.0      6.7      9.3
a. Construct separate pie charts for each year for the percentage of counties and the percentage of registered voters using the various methods.
b. Construct side-by-side bar charts, by year, for the percentage of counties and the percentage of registered voters using the various methods.
c. Which type of graphical display is more helpful in depicting the data? Explain.
d. What differences are there in the results for the counties and the registered voters?
2.70 In summer 2000, a growing number of warranty claims on Firestone tires sold on Ford SUVs prompted Firestone and Ford to issue a major recall. An analysis of warranty-claims data helped identify which models to recall. A breakdown of 2,504 warranty claims based on tire size is given in the following table:
Tire Size      Warranty Claims
23575R15           2,030
311050R15            137
30950R15              82
23570R16              81
331250R15             58
25570R16              54
Others                62
Source: Extracted from Robert L. Simison, "Ford Steps Up Recall Without Firestone," The Wall Street Journal, August 14, 2000, p. A3.
The 2,030 warranty claims for the 23575R15 tires can be categorized into ATX models and Wilderness models. The type of incident leading to a warranty claim, by model type, is summarized in the following table:
Incident            ATX Model          Wilderness
                    Warranty Claims    Warranty Claims
Tread separation        1,365               59
Blowout                    77               41
Other/unknown             422               66
Total                   1,864              166
Source: Extracted from Robert L. Simison, "Ford Steps Up Recall Without Firestone," The Wall Street Journal, August 14, 2000, p. A3.
a. Construct a Pareto diagram for the number of warranty claims by tire size. What tire size accounts for most of the claims?
b. Construct a pie chart to display the percentage of the total number of warranty claims for the 23575R15 tires that come from the ATX model and Wilderness model. Interpret the chart.
c. Construct a Pareto diagram for the type of incident causing the warranty claim for the ATX model. Does a certain type of incident account for most of the claims?
d. Construct a Pareto diagram for the type of incident causing the warranty claim for the Wilderness model. Does a certain type of incident account for most of the claims?
2.71 One of the major measures of the quality of service provided by any organization is the speed with which the organization responds to customer complaints. A large family-held department store selling furniture and flooring, including carpet, had undergone a major expansion in the past several years. In particular, the flooring department had expanded from 2 installation crews to an installation supervisor, a measurer, and 15 installation crews. During a recent year, the company got 50 complaints concerning carpet installation. The following data (contained in the data file) represent the number of days between the receipt of the complaint and the resolution of the complaint:
54   5  35 137  31  27 152   2 123  81  74  27
11  19 126 110 110  29  61  35  94  31  26   5
12   4 165  32  29  28  29  26  25   1  14  13
13  10   5  27   4  52  30  22  36  26  20  23
33  68
a. Construct a frequency distribution and a percentage distribution.
b. Construct a histogram and a percentage polygon.
c. Form a cumulative percentage distribution and plot a cumulative percentage polygon (ogive).
d. On the basis of the results of (a) through (c), if you had to tell the president of the company how long a customer should expect to wait to have a complaint resolved, what would you say? Explain.
2.72 Data concerning 58 of the best-selling domestic beers in the United States are located in the data file. The values for three variables are included: percentage alcohol, number of calories per 12 ounces, and number of carbohydrates (in grams) per 12 ounces.
Source: Extracted from www.Beer100.com, March 31, 2006.
a. Construct a histogram for each of the three variables.
b. Construct three scatter plots: percentage alcohol versus calories, percentage alcohol versus carbohydrates, and calories versus carbohydrates.
c. Discuss the information you learned from studying the graphs in (a) and (b).
2.73 The data file contains the per-capita spending, in thousands of dollars, for each state in 2004.
a. Develop an ordered array.
b. Construct a frequency distribution and a percentage distribution.
c. Plot a percentage histogram.
d. What conclusions can you reach about the differences in federal per-capita spending between the states?
2.74 The data file contains the yields for a money market account, a one-year certificate of deposit (CD), and a five-year CD for 40 banks in south Florida, as of December 20, 2005 (extracted from Bankrate.com, December 20, 2005).
a. Construct a histogram for each of the three variables.
b. Construct three scatter plots: money market account versus one-year CD, money market account versus five-year CD, and one-year CD versus five-year CD.
c. Discuss the information you learned from studying the graphs in (a) and (b).
2.75 The data file contains the total compensation (in $millions) of CEOs of the 100 largest companies, by revenue (extracted from "Special Report: Executive Compensation," USA Today, April 10, 2006, pp. 3B, 4B).
a. Construct a frequency distribution and a percentage distribution.
b. Construct a histogram and a percentage polygon.
c. Construct a cumulative percentage distribution and plot a cumulative percentage polygon (ogive).
d. Based on (a) through (c), what conclusions can you reach concerning CEO compensation in 2005?
2.76 Studies conducted by a manufacturer of "Boston" and "Vermont" asphalt shingles have shown product weight to be a major factor in customers' perception of quality. Moreover, the weight represents the amount of raw materials being used and is therefore very important to the company from a cost standpoint. The last stage of the assembly line packages the shingles before the packages are placed on wooden pallets. When a pallet is full (a pallet for most brands holds 16 squares of shingles), it is weighed, and the measurement is recorded. The company expects pallets of its "Boston" brand-name shingles to weigh at least 3,050 pounds but less than 3,260 pounds. For the company's "Vermont" brand-name shingles, pallets should weigh at least 3,600 pounds but less than 3,800. The data file contains the weights (in pounds) from a sample of 368 pallets of "Boston" shingles and 330 pallets of "Vermont" shingles.
a. For the "Boston" shingles, construct a frequency distribution and a percentage distribution having eight class intervals, using 3,015, 3,050, 3,085, 3,120, 3,155, 3,190, 3,225, 3,260, and 3,295 as the class boundaries.
b. For the "Vermont" shingles, construct a frequency distribution and a percentage distribution having seven class intervals, using 3,550, 3,600, 3,650, 3,700, 3,750, 3,800, 3,850, and 3,900 as the class boundaries.
c. Construct histograms for the "Boston" shingles and for the "Vermont" shingles.
d. Comment on the distribution of pallet weights for the "Boston" and "Vermont" shingles. Be sure to identify the percentage of pallets that are underweight and overweight.
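The "at least ... but less than ..." convention for class intervals can be made concrete with a short Python sketch. It is an illustration only: the function name frequency_distribution and the ten sample weights are hypothetical and are not taken from the data file, while the class boundaries are the ones given for the "Boston" shingles in part (a). Each value is counted in the class whose lower boundary it meets or exceeds and whose upper boundary it falls below.

```python
from bisect import bisect_right


def frequency_distribution(values, boundaries):
    """Count values in classes [b0, b1), [b1, b2), ... defined by sorted boundaries.

    Hypothetical helper for this sketch; values outside the overall range are ignored.
    """
    counts = [0] * (len(boundaries) - 1)
    for v in values:
        if boundaries[0] <= v < boundaries[-1]:
            # bisect_right places a value equal to a boundary in the class that starts there.
            counts[bisect_right(boundaries, v) - 1] += 1
    return counts


# Class boundaries from Problem 2.76(a); nine boundaries give eight class intervals.
boston_boundaries = [3015, 3050, 3085, 3120, 3155, 3190, 3225, 3260, 3295]

# Hypothetical pallet weights, invented only to exercise the boundary cases.
sample_weights = [3040, 3050, 3084, 3085, 3150, 3200, 3259, 3260, 3100, 3120]

counts = frequency_distribution(sample_weights, boston_boundaries)
n = sum(counts)
for low, high, count in zip(boston_boundaries, boston_boundaries[1:], counts):
    print(f"{low} but less than {high}: {count:2d}  {100 * count / n:5.1f}%")
```

With these boundaries, the first class (below 3,050) holds the underweight pallets and the last class (3,260 and above) holds the overweight ones, so their percentages can be read directly from the percentage distribution.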
2.77 Do marketing promotions, such as bobble-head giveaways, increase attendance at Major League Baseball games? An article in Sport Marketing Quarterly reported on the effectiveness of marketing promotions (extracted from T. C. Boyd and T. C. Krehbiel, "An Analysis of the Effects of Specific Promotion Types on Attendance at Major League Baseball Games," Mid-American Journal of Business, Vol. 21, 2006, pp. 21-32). The data file includes the following variables for the Kansas City Royals during the 2002 baseball season:
GAME: Home games in the order they were played
ATTENDANCE: Paid attendance for the game
PROMOTION: 1 if a promotion was held; 0 if no promotion was held
a. Construct a percentage histogram for the attendance variable. Interpret the histogram.
b. Construct a percentage polygon for the attendance variable. Interpret the polygon.
c. Which graphical display do you prefer, the one in (a) or (b)? Explain.
d. Construct a graphical display containing two percentage polygons for attendance, one for the 43 games with promotions and the second for the 37 games without promotions. Compare the two attendance distributions.
2.78 The data file contains fat and cholesterol information concerning popular protein foods (fresh red meats, poultry, and fish).
Source: U.S. Department of Agriculture.
For the data relating to the number of calories and the amount of cholesterol for the popular protein foods:
a. Construct a frequency distribution and a percentage distribution.
b. Construct a histogram and a percentage polygon.
c. Form a cumulative percentage distribution and plot a cumulative percentage polygon.
d. What conclusions can you draw from these analyses?
2.79 The data file contains the results of the American Community Survey, a sampling of households taken in all states during the 2000 U.S. Census. For each of the variables average travel-to-work time in minutes, percentage of homes with eight or more rooms, median household income, and percentage of mortgage-paying homeowners whose housing costs exceed 30% of income:
a. Construct a frequency distribution and a percentage distribution.
b. Construct a histogram and a percentage polygon.
c. Construct a cumulative percentage distribution and plot a cumulative percentage polygon.
d. What conclusions about these four variables can you make based on the results of (a) through (c)?
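The step from a frequency distribution to a cumulative percentage distribution, which several of these problems request, can be sketched as follows. This is a minimal illustration under stated assumptions: the function name cumulative_percentages, the class boundaries, and the frequencies are all hypothetical and do not come from any of the data files.

```python
def cumulative_percentages(counts):
    """Return the cumulative percentage reached at the end of each class.

    Hypothetical helper for this sketch; counts are class frequencies in order.
    """
    total = sum(counts)
    running = 0
    result = []
    for count in counts:
        running += count
        result.append(100.0 * running / total)
    return result


# Hypothetical upper class boundaries and frequencies, invented for illustration.
upper_boundaries = [10, 20, 30, 40, 50]
class_counts = [4, 10, 16, 8, 2]

for upper, cum in zip(upper_boundaries, cumulative_percentages(class_counts)):
    print(f"less than {upper}: {cum:5.1f}%")
```

The cumulative percentage polygon (ogive) plots each cumulative percentage against the upper boundary of its class, starting from 0% at the lower boundary of the first class.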
2.80 The economics of baseball has caused a great deal of controversy, with owners arguing that they are losing money, players arguing that owners are making money, and fans complaining about how expensive it is to attend a game and watch games on cable television. In addition to data related to team statistics for the 2001 season, the data file contains team-by-team statistics on ticket prices; the fan cost index; regular-season gate receipts; local television, radio, and cable receipts; all other operating revenue; player compensation and benefits; national and other local expenses; and income from baseball operations. For each of these variables,
a. Construct a frequency distribution and a percentage distribution.
b. Construct a histogram and a percentage polygon.
c. Construct a cumulative percentage distribution and plot a cumulative percentage polygon.
d. Construct a scatter plot to predict the number of wins on the Y axis from the player compensation and benefits on the X axis. What conclusions can you reach based on this scatter plot?
e. What conclusions about these variables can you reach based on the results of (a) through (d)?
2.81 In Section 2.5 on page 59, a scatter plot of the relationship between the cost of a fast-food hamburger meal and the cost of movie tickets in 10 different cities was constructed. The data file also includes the overall cost index, the monthly rent for a two-bedroom apartment, the cost of a cup of coffee with service, the cost of dry cleaning a men's blazer, and the cost of toothpaste.
a. Construct six separate scatter plots. For each, use the overall cost index as the Y axis. Use the monthly rent for a two-bedroom apartment, the costs of a cup of coffee with service, a fast-food hamburger meal, dry cleaning a men's blazer, toothpaste, and movie tickets as the X axis.
b. What conclusions can you reach about the relationship of the overall cost index to these six variables?
2.82 In Problem 2.39 on page 60, using the data file, you constructed a scatter plot of calories with the total fat content of chicken sandwiches.
a. Construct a scatter plot of calories on the Y axis and carbohydrates on the X axis.
b. Construct a scatter plot of calories on the Y axis and sodium on the X axis.
c. Which variable (total fat, carbohydrates, or sodium) seems to be most closely related to calories? Explain.
2.83 The data file contains the weekly average price of gasoline in the United States from March 1, 2004, to March 6, 2006. Prices are in dollars per gallon.
Source: U.S. Department of Energy, www.eia.doe.gov.
a. Construct a time-series plot.
b. What pattern, if any, is present in the data?
2.84 The data contained in the data file represent the amount of soft drink filled in a sample of 50 consecutive 2-liter bottles. The results are listed horizontally in the order of being filled:
2.109 2.086 2.066 2.075 2.065 2.057 2.052 2.044 2.036 2.038
2.031 2.029 2.025 2.029 2.023 2.020 2.015 2.014 2.013 2.014
2.012 2.012 2.012 2.010 2.005 2.003 1.999 1.996 1.997 1.992
1.994 1.986 1.984 1.981 1.973 1.975 1.971 1.969 1.966 1.967
1.963 1.957 1.951 1.951 1.947 1.941 1.941 1.938 1.908 1.894
a. Construct a time-series plot for the amount of soft drink on the Y axis and the bottle number (going consecutively from 1 to 50) on the X axis.
b. What pattern, if any, is present in these data?
c. If you had to make a prediction of the amount of soft drink filled in the next bottle, what would you predict?
d. Based on the results of (a) through (c), explain why it is important to construct a time-series plot and not just a histogram, as was done in Problem 2.27 on page 54.
2.85 The S&P 500 Index tracks the overall movement of the stock market by considering the stock prices of 500 large corporations. The data file contains weekly data for this index as well as the weekly closing stock price for three companies during 2005. The variables included are:
WEEK: Week ending on date given
S&P: Weekly closing value for the S&P 500 Index
SEARS: Weekly closing stock price for Sears
TARGET: Weekly closing stock price for Target
SARA LEE: Weekly closing stock price for Sara Lee
Source: Extracted from finance.yahoo.com.
a. Construct a time-series plot for the weekly closing values of the S&P 500 Index, Sears, Target, and Sara Lee.
b. Explain any patterns present in the plots.
c. Write a short summary of your findings.
2.86 (Class Project) Let each student in the class respond to the question "Which carbonated soft drink do you prefer?" so that the teacher can tally the results into a summary table.
a. Convert the data to percentages and construct a Pareto diagram.
b. Analyze the findings.
2.87 (Class Project) Let each student in the class be cross-classified on the basis of gender (male, female) and current employment status (yes, no) so that the teacher can tally the results.
a. Construct a table with either row or column percentages, depending on which you think is more informative.
b. What would you conclude from this study?
c. What other variables would you want to know regarding employment in order to enhance your findings?
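The difference between row, column, and total percentages in a contingency table can be sketched with a few lines of Python. This is an illustrative sketch only: the function name percentages is an assumption made here, and the counts are the dessert-by-gender counts as read from the table in Problem 2.68 (rows: dessert ordered Yes, No; columns: Male, Female).

```python
# Dessert ordered (rows: Yes, No) by gender (columns: Male, Female),
# as read from the Problem 2.68 table.
table = [
    [96, 40],
    [224, 240],
]


def percentages(table, kind):
    """Return the table as percentages of row totals, column totals, or the grand total.

    Hypothetical helper for this sketch; kind must be "row", "column", or "total".
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    result = []
    for i, row in enumerate(table):
        out = []
        for j, cell in enumerate(row):
            if kind == "row":
                base = row_totals[i]
            elif kind == "column":
                base = col_totals[j]
            elif kind == "total":
                base = grand_total
            else:
                raise ValueError("kind must be 'row', 'column', or 'total'")
            out.append(round(100 * cell / base, 1))
        result.append(out)
    return result


for kind in ("row", "column", "total"):
    print(kind, percentages(table, kind))
```

Row percentages describe how each dessert outcome splits between the genders, column percentages describe how each gender splits between ordering and not ordering dessert, and total percentages express every cell as a share of all 600 patrons.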
Report Writing Exercises
2.88 Referring to the results from Problem 2.76 concerning the weight of "Boston" and "Vermont" shingles, write a report that evaluates whether the weights of the pallets of the two types of shingles are what the company expects. Be sure to incorporate tables and charts into the report.
2.89 Referring to the results from Problem 2.70 on page 69 concerning the warranty claims on Firestone tires, write a report that evaluates warranty claims on Firestone tires sold on Ford SUVs. Be sure to incorporate tables and charts into the report.
Team Project
The data file contains information regarding nine variables from a sample of 838 mutual funds. The variables are:
Category: Type of stocks comprising the mutual fund (small cap, mid cap, large cap)
Objective: Objective of stocks comprising the mutual fund (growth or value)
Assets: In millions of dollars
Fees: Sales charges (no or yes)
Expense ratio: Ratio of expenses to net assets in percentage
2005 return: Twelve-month return in 2005
Three-year return: Annualized return, 2003-2005
Five-year return: Annualized return, 2001-2005
Risk: Risk-of-loss factor of the mutual fund (low, average, or high)
2.90 For the expense ratio:
a. Construct a histogram.
b. Plot percentage polygons of the expense ratio for mutual funds that have fees and mutual funds that do not have fees on the same graph.
c. What conclusions about the expense ratio can you reach based on the results of (a) and (b)?
2.91 For the five-year annualized return from 2001 to 2005:
a. Construct a histogram.
b. Plot percentage polygons of the five-year annualized return from 2001 to 2005 for growth mutual funds and value mutual funds on the same graph.
c. What conclusions about the five-year annualized return from 2001 to 2005 can you reach based on the results of (a) and (b)?
2.92 For the return in 2005:
a. Construct a histogram.
b. Plot percentage polygons of the return in 2005 for growth mutual funds and value mutual funds on the same graph.
c. What conclusions about the return in 2005 can you reach based on the results of (a) and (b)?
Student Survey Database
2.93 Problem 1.27 on page 15 describes a survey of 50 undergraduate students (see the data file). For these data, construct all the appropriate tables and charts and write a report summarizing your conclusions.
2.94 Problem 1.27 on page 15 describes a survey of 50 undergraduate students (see the data file).
a. Select a sample of 50 undergraduate students at your school and conduct a similar survey for those students.
b. For the data collected in (a), construct all the appropriate tables and charts and write a report summarizing your conclusions.
c. Compare the results of (b) to those of Problem 2.93.
2.95 Problem 1.28 on page 15 describes a survey of 50 MBA students (see the data file). For these data, construct all appropriate tables and charts and write a report summarizing your conclusions.
2.96 Problem 1.28 on page 15 describes a survey of 50 MBA students (see the data file).
a. Select a sample of 50 MBA students in your MBA program and conduct a similar survey for those students.
b. For the data collected in (a), construct all the appropriate tables and charts and write a report summarizing your conclusions.
c. Compare the results of (b) to those of Problem 2.93.
Managing the Springville Herald
Advertising fees are an important source of revenue for any newspaper. In an attempt to boost these revenues and to minimize costly errors, the management of the Herald has established a task force charged with improving customer service in the advertising department. Open a Web browser and link to www.prenhall.com/HeraldCase/Ad_Errors.htm (or open the Ad_Errors.htm file in the Student CD-ROM's HeraldCase folder) to review the task force's data collection. Identify the data that are important in describing the customer service problems. For each set of data you identify, construct the graphical presentation you think is most appropriate for the data and explain your choice. Also, suggest what other information concerning the different types of errors would be useful to examine. Offer possible courses of action for either the task force or management to take that would support the goal of improving customer service.