statistics-17 by keller


Upload: cookiehacker

Post on 24-Dec-2015


CHAPTER 17

SIMPLE LINEAR REGRESSION AND CORRELATION

SECTIONS 1 - 2

MULTIPLE CHOICE QUESTIONS

In the following multiple-choice questions, please circle the correct answer.

1. The regression line ŷ = 3 + 2x has been fitted to the data points (4, 8), (2, 5), and (1, 2). The sum of the squared residuals will be:
a. 7
b. 15
c. 8
d. 22
ANSWER: d
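The arithmetic behind Question 1 can be checked with a short Python sketch. The helper name `sum_squared_residuals` is illustrative only and is not part of the test bank.

```python
def sum_squared_residuals(points, intercept, slope):
    """Sum of (y - y_hat)^2 over all points for a fixed line y_hat = intercept + slope * x."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in points)

# Question 1: line y_hat = 3 + 2x, points (4, 8), (2, 5), (1, 2)
points = [(4, 8), (2, 5), (1, 2)]
sse = sum_squared_residuals(points, intercept=3, slope=2)
print(sse)  # residuals are -3, -2, -3, so 9 + 4 + 9 = 22
```

The same helper applies to any question that fixes the line in advance and asks for the sum of squares for error.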

2. If an estimated regression line has a y-intercept of 10 and a slope of 4, then when x = 2 the actual value of y is:
a. 18
b. 15
c. 14
d. unknown
ANSWER: d

3. Given the least squares regression line ŷ = 5 – 2x:
a. the relationship between x and y is positive
b. the relationship between x and y is negative
c. as x increases, so does y
d. as x decreases, so does y
ANSWER: b

4. A regression analysis between weight (y in pounds) and height (x in inches) resulted in the following least squares line: ŷ = 120 + 5x. This implies that if the height is increased by 1 inch, the weight, on average, is expected to:
a. increase by 1 pound
b. decrease by 1 pound
c. increase by 5 pounds
d. increase by 24 pounds
ANSWER: c

5. A regression analysis between sales (in $1000) and advertising (in $100) resulted in the following least squares line: ŷ = 75 + 6x. This implies that if advertising is $800, then the predicted amount of sales (in dollars) is:
a. $4875
b. $123,000
c. $487,500
d. $12,300
ANSWER: b
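Question 5 turns on unit conversion rather than on the regression itself. A minimal Python sketch of the two conversions follows; the function name is hypothetical.

```python
def predicted_sales_dollars(advertising_dollars):
    """Predict sales in dollars from advertising in dollars for the line y_hat = 75 + 6x."""
    x = advertising_dollars / 100   # the model measures advertising in units of $100
    y_hat = 75 + 6 * x              # the model measures sales in units of $1000
    return y_hat * 1000             # convert back to dollars

print(predicted_sales_dollars(800))  # x = 8, y_hat = 123, so $123,000
```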

6. A regression analysis between sales (in $1000) and advertising (in $) resulted in the following least squares line: ŷ = 80,000 + 5x. This implies that an:
a. increase of $1 in advertising is expected, on average, to result in an increase of $5 in sales
b. increase of $5 in advertising is expected, on average, to result in an increase of $5,000 in sales
c. increase of $1 in advertising is expected, on average, to result in an increase of $80,005 in sales
d. increase of $1 in advertising is expected, on average, to result in an increase of $5,000 in sales
ANSWER: d

7. Which of the following techniques is used to predict the value of one variable on the basis of other variables?
a. Correlation analysis
b. Coefficient of correlation
c. Covariance
d. Regression analysis
ANSWER: d

8. The residual is defined as the difference between:
a. the actual value of y and the estimated value of y
b. the actual value of x and the estimated value of x
c. the actual value of y and the estimated value of x
d. the actual value of x and the estimated value of y
ANSWER: a

9. In the simple linear regression model, the y-intercept represents the:
a. change in y per unit change in x
b. change in x per unit change in y
c. value of y when x = 0
d. value of x when y = 0
ANSWER: c

10. In the first order linear regression model, the population parameters of the y-intercept and the slope are estimated respectively, by:a. and b. and c. and d. and ANSWER: a

11. In the simple linear regression model, the slope represents the:
a. value of y when x = 0
b. average change in y per unit change in x
c. value of x when y = 0
d. average change in x per unit change in y
ANSWER: b

12. In regression analysis, the residuals represent the:
a. difference between the actual y values and their predicted values
b. difference between the actual x values and their predicted values
c. square root of the slope of the regression line
d. change in y per unit change in x
ANSWER: a

13. In the first-order linear regression model, the population parameters of the y-intercept and the slope are, respectively,a. and b. and c. and d. and ANSWER: d

14. In a simple linear regression problem, the following statistics are calculated from a sample of 10 observations: Σ(x – x̄)(y – ȳ) = 2250, s_x = 10, Σx = 50, Σy = 75. The least squares estimates of the slope and y-intercept are respectively:
a. 1.5 and 0.5
b. 2.5 and 1.5
c. 1.5 and 2.5
d. 2.5 and –5.0
ANSWER: d
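The symbols in Question 14 were lost in extraction. The sketch below assumes the four statistics are Σ(x – x̄)(y – ȳ) = 2250, s_x = 10, Σx = 50, and Σy = 75. This reading is an assumption, but it reproduces answer (d) exactly.

```python
n = 10
sum_cross_dev = 2250.0      # assumed: sum of (x - x_bar)(y - y_bar)
s_x = 10.0                  # assumed: sample standard deviation of x
sum_x, sum_y = 50.0, 75.0   # assumed: sums of x and y

cov_xy = sum_cross_dev / (n - 1)         # sample covariance = 250
slope = cov_xy / s_x ** 2                # b1 = cov(x, y) / s_x^2
intercept = sum_y / n - slope * sum_x / n  # b0 = y_bar - b1 * x_bar

print(slope, intercept)
```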

15. If a simple linear regression model has no y-intercept, then:
a. all values of x are zero
b. all values of y are zero
c. when y = 0 so does x
d. when x = 0 so does y
ANSWER: d

16. In the least squares regression line ŷ = 3 – 2x, the predicted value of y equals:
a. 1.0 when x = –1.0
b. 2.0 when x = 1.0
c. 2.0 when x = –1.0
d. 1.0 when x = 1.0
ANSWER: d

17. The least squares method for determining the best fit minimizes:
a. total variation in the dependent variable
b. sum of squares for error
c. sum of squares for regression
d. All of the above
ANSWER: b

18. What do we mean when we say that a simple linear regression model is “statistically” useful?
a. All the statistics computed from the sample make sense
b. The model is an excellent predictor of y
c. The model is “practically” useful for predicting y
d. The model is a better predictor of y than the sample mean ȳ
ANSWER: d


TRUE / FALSE QUESTIONS

19. An inverse relationship between an independent variable x and a dependent variable y means that as x increases, y decreases, and vice versa.
ANSWER: T

20. A direct relationship between an independent variable x and a dependent variable y means that the variables x and y increase or decrease together.
ANSWER: T

21. Another name for the residual term in a regression equation is random error.
ANSWER: T

22. A simple linear regression equation is given by . The point estimate of when = 4 is 20.45.ANSWER: T

23. The vertical spread of the data points about the regression line is measured by the y-intercept.
ANSWER: F

24. The method of least squares requires that the sum of the squared deviations between actual y values in the scatter diagram and y values predicted by the regression line be minimized.
ANSWER: T

25. A regression analysis between sales (in $1000) and advertising (in $) resulted in the following least squares line: ŷ = 60 + 5x. This implies that an increase of $1 in advertising is expected to result in an increase of $65 in sales.
ANSWER: F

26. A regression analysis between weight (y in pounds) and height (x in inches) resulted in the following least squares line: ŷ = 135 + 6x. This implies that if the height is increased by 1 inch, the weight is expected to increase by an average of 6 pounds.
ANSWER: T

27. The residual is defined as the difference between the actual value y and the estimated value ŷ.
ANSWER: T

28. The regression line ŷ = 2 + 3x has been fitted to the data points (4, 11), (2, 7), and (1, 5). The sum of squares for error will be 10.0.
ANSWER: T

29. A regression analysis between sales (in $1000) and advertising (in $100) resulted in the following least squares line: ŷ = 77 + 8x. This implies that if advertising is $600, then the predicted amount of sales (in dollars) is $125,000.
ANSWER: T

30. The residuals are observations of the error variable ε. Consequently, the minimized sum of squared deviations is called the sum of squares for error, denoted SSE.
ANSWER: T

31. Statisticians have shown that the sample y-intercept b0 and sample slope coefficient b1 are unbiased estimators of the population regression parameters β0 and β1, respectively.
ANSWER: T

32. If cov(x, y) = 7.5075 and s_x² = 3.5, then the sample slope coefficient is 2.145.
ANSWER: T

33. The first-order linear model is sometimes called the simple linear regression model.
ANSWER: T

34. To create a deterministic model, we start with a probabilistic model that approximates the relationship we want to model.
ANSWER: F

35. The residual represents the discrepancy between the observed dependent variable and its predicted or estimated average value.
ANSWER: T


STATISTICAL CONCEPTS & APPLIED QUESTIONS

FOR QUESTIONS 36 AND 37, USE THE FOLLOWING NARRATIVE:
Narrative: Car Speed and Gas Mileage
An economist wanted to analyze the relationship between the speed of a car (x) and its gas mileage (y). As an experiment a car is operated at several different speeds and for each speed the gas mileage is measured. These data are shown below.

Speed          25  35  45  50  60  65  70
Gas Mileage    40  39  37  33  30  27  25

36. {Car Speed and Gas Mileage Narrative} Determine the least squares regression line.

ANSWER: ŷ = 50.6563 – 0.3531x

37. {Car Speed and Gas Mileage Narrative} Estimate the gas mileage of a car traveling 70 mph.

ANSWER:
When x = 70, ŷ = 25.9393 mpg
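The least squares estimates for the Car Speed and Gas Mileage data can be reproduced with a short Python sketch. The function name `least_squares` is illustrative. Note that the answer key's 25.9393 comes from plugging x = 70 into the rounded coefficients; the unrounded fit gives 25.9375.

```python
def least_squares(xs, ys):
    """Return (intercept, slope) of the least squares line through the points."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope

speed = [25, 35, 45, 50, 60, 65, 70]
mileage = [40, 39, 37, 33, 30, 27, 25]

b0, b1 = least_squares(speed, mileage)
print(f"y_hat = {b0:.4f} + ({b1:.4f})x")
print(f"Prediction at 70 mph: {b0 + b1 * 70:.4f} mpg")
```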

38. The following 10 observations of variables x and y were collected.

x    1   2   3   4   5   6   7   8   9   10
y    25  22  21  19  14  15  12  10  6   2

Find the least squares regression line, and the estimated value of y when x = 3.

ANSWER: ŷ = 27.733 – 2.389x. When x = 3, ŷ = 20.566

39. A scatter diagram includes the following data points:

x    3   2   5   4   5
y    8   6   12  10  14

Two regression models are proposed: (1) ŷ = 1.2 + 2.5x, and (2) ŷ = 5.5 + 4.0x. Using the least squares method, which of these regression models provides the better fit to the data? Why?

ANSWER:
SSE = 4.95 and 593.25 for models 1 and 2, respectively. Therefore, model (1) fits the data better than model (2).
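The comparison in Question 39 can be sketched in Python by computing SSE for each proposed line. The names below are illustrative.

```python
xs = [3, 2, 5, 4, 5]
ys = [8, 6, 12, 10, 14]

def sse(intercept, slope):
    """Sum of squares for error of the line y_hat = intercept + slope * x on (xs, ys)."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

sse_model_1 = sse(1.2, 2.5)   # model (1): y_hat = 1.2 + 2.5x
sse_model_2 = sse(5.5, 4.0)   # model (2): y_hat = 5.5 + 4.0x

better = "model (1)" if sse_model_1 < sse_model_2 else "model (2)"
print(f"SSE(1) = {sse_model_1:.2f}, SSE(2) = {sse_model_2:.2f}, better fit: {better}")
```

The smaller SSE identifies the better fit under the least squares criterion, which is why model (1) is preferred.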

40. Consider the following data values of variables x and y.

x    2   4   6   8   10  13
y    7   11  17  21  27  36

a. Determine the least squares regression line.
b. Find the predicted value of y for x = 9.
c. What does the value of the slope of the regression line tell you?

ANSWER:
a. ŷ = 0.934 + 2.637x
b. When x = 9, ŷ = 24.667
c. If x increases by one unit, y on average will increase by 2.637.

FOR QUESTIONS 41 THROUGH 45, USE THE FOLLOWING NARRATIVE:
Narrative: Sunshine and Skin Cancer
A medical statistician wanted to examine the relationship between the amount of sunshine (x) in hours, and incidence of skin cancer (y). As an experiment he found the number of skin cancers detected per 100,000 of population and the average daily sunshine in eight counties around the country. These data are shown below.

Average Daily Sunshine     5   7   6   7   8   6   4   3
Skin Cancer per 100,000    7   11  9   12  15  10  7   5

41. {Sunshine and Skin Cancer Narrative} Determine the least squares regression line.

ANSWER: ŷ = –1.115 + 1.846x

42. {Sunshine and Skin Cancer Narrative} Draw a scatter diagram of the data and plot the least squares regression line on it.

ANSWER:


43. {Sunshine and Skin Cancer Narrative} Estimate the number of skin cancers per 100,000 of population for 6 hours of sunshine.

ANSWER:
When x = 6, ŷ = 9.961

44. {Sunshine and Skin Cancer Narrative} What does the value of the slope of the regression line tell you?

ANSWER:
If the amount of sunshine x increases by one hour, the amount of skin cancer y increases by an average of 1.846 per 100,000 of population.

45. {Sunshine and Skin Cancer Narrative} Calculate the residual corresponding to the pair (x, y) = (8, 15).

ANSWER:
e = y – ŷ = 15 – 13.653 = 1.347
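The residual in Question 45 can be reproduced from the raw Sunshine and Skin Cancer data. The Python sketch below (variable names are illustrative) also shows that the residuals of a least squares line with an intercept sum to zero. Without intermediate rounding the residual is about 1.346 rather than the key's 1.347.

```python
sunshine = [5, 7, 6, 7, 8, 6, 4, 3]
cancer = [7, 11, 9, 12, 15, 10, 7, 5]

n = len(sunshine)
x_bar = sum(sunshine) / n
y_bar = sum(cancer) / n
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(sunshine, cancer))
         / sum((x - x_bar) ** 2 for x in sunshine))
intercept = y_bar - slope * x_bar

# Residual e = y - y_hat for every observation
residuals = [y - (intercept + slope * x) for x, y in zip(sunshine, cancer)]

# The only observation with x = 8 is the pair (8, 15)
residual_8_15 = residuals[sunshine.index(8)]
print(f"Residual at (8, 15): {residual_8_15:.3f}")
print(f"Sum of all residuals: {sum(residuals):.2e}")
```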

FOR QUESTIONS 46 THROUGH 49, USE THE FOLLOWING NARRATIVE:
Narrative: Sales and Experience
The general manager of a chain of furniture stores believes that experience is the most important factor in determining the level of success of a salesperson. To examine this belief she records last month’s sales (in $1,000s) and the years of experience of 10 randomly selected salespeople. These data are listed below.

Salesperson    Years of Experience    Sales
1              0                      7
2              2                      9
3              10                     20
4              3                      15
5              8                      18
6              5                      14
7              12                     20
8              7                      17
9              20                     30
10             15                     25

46. {Sales and Experience Narrative} Draw a scatter diagram of the data to determine whether a linear model appears to be appropriate.

ANSWER:

It appears that a linear model is appropriate.

47. {Sales and Experience Narrative} Determine the least squares regression line.

ANSWER:
ŷ = 8.63 + 1.0817x

48. {Sales and Experience Narrative} Interpret the value of the slope of the regression line.

ANSWER:
For each additional year of experience, monthly sales of a salesperson increase by an average of $1,081.70.

49. {Sales and Experience Narrative} Estimate the monthly sales for a salesperson with 16 years of experience.

ANSWER:
When x = 16, ŷ = 25.94 (in $1,000s), or $25,940

FOR QUESTIONS 50 THROUGH 53, USE THE FOLLOWING NARRATIVE:
Narrative: Income and Education
A professor of economics wants to study the relationship between income (y in $1000s) and education (x in years). A random sample of eight individuals is taken and the results are shown below.

Education    16  11  15  8   12  10  13  14
Income       58  40  55  35  43  41  52  49

50. {Income and Education Narrative} Draw a scatter diagram of the data to determine whether a linear model appears to be appropriate.

ANSWER:

It appears that a linear model is appropriate.

51. {Income and Education Narrative} Determine the least squares regression line.

ANSWER:
ŷ = 10.6165 + 2.9098x

52. {Income and Education Narrative} Interpret the value of the slope of the regression line.

ANSWER:
For each additional year of education, the income increases by an average of $2,909.80.

53. {Income and Education Narrative} Estimate the income of an individual with 15 years of education.

ANSWER:
When x = 15, ŷ = 54.264 (in $1000s), or $54,264

FOR QUESTIONS 54 THROUGH 57, USE THE FOLLOWING NARRATIVE:
Narrative: Game Winnings and Education
An ardent fan of television game shows has observed that, in general, the more educated the contestant, the less money he or she wins. To test her belief she gathers data about the last eight winners of her favorite game show. She records their winnings in dollars and the number of years of education. The results are as follows.

Contestant    Years of Education    Winnings
1             11                    750
2             15                    400
3             12                    600
4             16                    350
5             11                    800
6             16                    300
7             13                    650
8             14                    400

54. {Game Winnings and Education Narrative} Draw a scatter diagram of the data to determine whether a linear model appears to be appropriate.

ANSWER:

It appears that a linear model is appropriate.

55. {Game Winnings and Education Narrative} Determine the least squares regression line.

ANSWER:
ŷ = 1735 – 89.1667x

56. {Game Winnings and Education Narrative} Interpret the value of the slope of the regression line.

ANSWER:
For each additional year of education a contestant has, his or her winnings on TV game shows decrease by an average of approximately $89.20.

57. {Game Winnings and Education Narrative} Estimate the game winnings for a contestant with 15 years of education.

ANSWER:
When x = 15, ŷ = $397.50

FOR QUESTIONS 58 THROUGH 61, USE THE FOLLOWING NARRATIVE:
Narrative: Movie Revenues
A financier whose specialty is investing in movie productions has observed that, in general, movies with “big-name” stars seem to generate more revenue than those movies whose stars are less well known. To examine his belief he records the gross revenue and the payment (in $ millions) given to the two highest-paid performers in the movie for ten recently released movies.

Movie    Cost of Two Highest Paid Performers    Gross Revenue
1        5.3                                    48
2        7.2                                    65
3        1.3                                    18
4        1.8                                    20
5        3.5                                    31
6        2.6                                    26
7        8.0                                    73
8        2.4                                    23
9        4.5                                    39
10       6.7                                    58

58. {Movie Revenues Narrative} Draw a scatter diagram of the data to determine whether a linear model appears to be appropriate.

ANSWER: It appears that a linear model is appropriate.


59. {Movie Revenues Narrative} Determine the least squares regression line.

ANSWER:
ŷ = 4.225 + 8.285x

60. {Movie Revenues Narrative} Interpret the value of the slope of the regression line.

ANSWER:
For each additional million dollars paid to the two highest paid performers, the gross revenue of the movie increases by an average of $8.285 million.

61. {Movie Revenues Narrative} Estimate the gross revenue of a movie if the two highest paid performers received 6 million dollars.

ANSWER:
When x = 6, ŷ = $53.935 million

FOR QUESTIONS 62 THROUGH 65, USE THE FOLLOWING NARRATIVE:
Narrative: Cost of Books
The editor of a major academic book publisher claims that a large part of the cost of books is the cost of paper. This implies that larger books will cost more money. As an experiment to analyze the claim, a university student visits the bookstore and records the number of pages and the selling price of twelve randomly selected books. These data are listed below.

Book    Number of Pages    Selling Price ($)
1       844                55
2       727                50
3       360                35
4       915                60
5       295                30
6       706                50
7       410                40
8       905                53
9       1058               65
10      865                54
11      677                42
12      912                58

62. {Cost of Books Narrative} Determine the least squares regression line.

ANSWER: ŷ = 19.387 + 0.0414x

63. {Cost of Books Narrative} Draw a scatter diagram of the data and plot the least squares regression line on it.

ANSWER:

64. {Cost of Books Narrative} Interpret the value of the slope of the regression line.

ANSWER:
For every additional page, the price of a book increases by an average of about 4 cents.

65. {Cost of Books Narrative} Estimate the selling price for a 650-page book.

ANSWER:
When x = 650, ŷ = 19.387 + 0.0414(650) = $46.297
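The prediction in Question 65 can be checked against the raw Cost of Books data. The Python sketch below (names are illustrative) compares the value obtained from the rounded coefficients in Question 62 with the value from the unrounded least squares fit. The two differ by about two cents.

```python
pages = [844, 727, 360, 915, 295, 706, 410, 905, 1058, 865, 677, 912]
price = [55, 50, 35, 60, 30, 50, 40, 53, 65, 54, 42, 58]

n = len(pages)
x_bar = sum(pages) / n
y_bar = sum(price) / n
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(pages, price))
         / sum((x - x_bar) ** 2 for x in pages))
intercept = y_bar - slope * x_bar

# Prediction with the rounded coefficients quoted in the answer to Question 62
rounded_prediction = 19.387 + 0.0414 * 650
# Prediction with the unrounded least squares estimates
full_prediction = intercept + slope * 650

print(f"Fitted line: y_hat = {intercept:.3f} + {slope:.4f}x")
print(f"Rounded coefficients: ${rounded_prediction:.3f}")
print(f"Unrounded fit:        ${full_prediction:.3f}")
```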

FOR QUESTIONS 66 THROUGH 68, USE THE FOLLOWING NARRATIVE:
Narrative: Accidents and Precipitation
A statistician investigating the relationship between the amount of precipitation (in inches) and the number of automobile accidents gathered data for 10 randomly selected days. The results are shown below.

Day    Precipitation    Number of Accidents
1      0.05             5
2      0.12             6
3      0.05             2
4      0.08             4
5      0.10             8
6      0.35             14
7      0.15             7
8      0.30             13
9      0.10             7
10     0.20             10

66. {Accidents and Precipitation Narrative} Find the least squares regression line.

ANSWER: ŷ = 2.3704 + 34.864x

67. {Accidents and Precipitation Narrative} Estimate the number of accidents in a day with 0.25 inches of precipitation.

ANSWER:
When x = 0.25, ŷ = 11.09, or about 11 accidents

68. {Accidents and Precipitation Narrative} What does the slope of the least squares regression line tell you?

ANSWER:
For each additional inch of precipitation, the number of accidents on average increases by 34.864 (about 35 accidents).

FOR QUESTIONS 69 THROUGH 73, USE THE FOLLOWING NARRATIVE:
Narrative: Willie Nelson Concert
At a recent Willie Nelson concert, a survey was conducted that asked a random sample of 20 people their age and how many concerts they have attended since the first of the year. The following data were collected:

Age                   62  57  40  49  67  54  43  65  54  41
Number of Concerts    6   5   4   3   5   5   2   6   3   1

Age                   44  48  55  60  59  63  69  40  38  52
Number of Concerts    3   2   4   5   4   5   4   2   1   3

An Excel output follows:

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.80203
R Square             0.64326
Adjusted R Square    0.62344
Standard Error       0.93965
Observations         20

DESCRIPTIVE STATISTICS
                      Age        Concerts
Mean                  53         3.65
Standard Error        2.1849     0.3424
Standard Deviation    9.7711     1.5313
Sample Variance       95.4737    2.3447
Count                 20         20

SPEARMAN RANK CORRELATION COEFFICIENT = 0.8306

ANOVA
              df    SS          MS          F           Significance F
Regression    1     28.65711    28.65711    32.45653    2.1082E-05
Residual      18    15.89289    0.88294
Total         19    44.55

             Coefficients    Standard Error    t Stat      P-value    Lower 95%    Upper 95%
Intercept    -3.01152        1.18802           -2.53491    0.02074    -5.50746     -0.5156
Age          0.12569         0.02206           5.69706     0.00002    0.07934      0.1720


69. {Willie Nelson Concert Narrative} Draw a scatter diagram of the data to determine whether a linear model appears to be appropriate to describe the relationship between the age and number of concerts attended by the respondents.

ANSWER:

A linear model appears to be appropriate to describe the relationship between the age and number of concerts attended by the respondents.

70. {Willie Nelson Concert Narrative} Determine the least squares regression line.

ANSWER:
ŷ = –3.0115 + 0.1257x
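The coefficients and R Square reported in the Excel output can be cross-checked from the 20 raw observations in the Willie Nelson Concert narrative. This Python sketch uses only the data given in the narrative; the variable names are illustrative.

```python
age = [62, 57, 40, 49, 67, 54, 43, 65, 54, 41,
       44, 48, 55, 60, 59, 63, 69, 40, 38, 52]
concerts = [6, 5, 4, 3, 5, 5, 2, 6, 3, 1,
            3, 2, 4, 5, 4, 5, 4, 2, 1, 3]

n = len(age)
x_bar = sum(age) / n
y_bar = sum(concerts) / n
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(age, concerts))
s_xx = sum((x - x_bar) ** 2 for x in age)
s_yy = sum((y - y_bar) ** 2 for y in concerts)   # total variation in y

slope = s_xy / s_xx
intercept = y_bar - slope * x_bar
r_squared = s_xy ** 2 / (s_xx * s_yy)            # coefficient of determination

print(f"y_hat = {intercept:.4f} + {slope:.4f}x, R^2 = {r_squared:.5f}")
```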

71. {Willie Nelson Concert Narrative} Plot the least squares regression line on the scatter diagram.

ANSWER:


72. {Willie Nelson Concert Narrative} Interpret the value of the slope of the regression line.

ANSWER:
For every additional year of age, the number of concerts attended increases on average by 0.1257. Equivalently, for every additional 20 years of age, the number of concerts attended increases on average by about 2.5.

73. {Willie Nelson Concert Narrative} Estimate the number of Willie Nelson concerts attended by a 64-year-old person.

ANSWER:
When x = 64, ŷ = 5.03 (about 5 concerts)

FOR QUESTIONS 74 THROUGH 77, USE THE FOLLOWING NARRATIVE:
Narrative: Oil Quality and Price
Quality of oil is measured in API gravity degrees: the higher the degrees API, the higher the quality. The table shown below is produced by an expert in the field who believes that there is a relationship between quality and price per barrel.

Oil degrees API    Price per barrel (in $)
27.0               12.02
28.5               12.04
30.8               12.32
31.3               12.27
31.9               12.49
34.5               12.70
34.0               12.80
34.7               13.00
37.0               13.00
41.0               13.17
41.0               13.19
38.8               13.22
39.3               13.27

A partial Minitab output follows:

Descriptive Statistics
Variable    N     Mean      StDev    SE Mean
Degrees     13    34.60     4.613    1.280
Price       13    12.730    0.457    0.127

Covariances
           Degrees      Price
Degrees    21.281667
Price      2.026750     0.208833

Regression Analysis

Predictor    Coef        StDev       T        P
Constant     9.4349      0.2867      32.91    0.000
Degrees      0.095235    0.008220    11.59    0.000

S = 0.1314    R-Sq = 92.46%    R-Sq(adj) = 91.7%

Analysis of Variance

Source            DF    SS        MS        F         P
Regression        1     2.3162    2.3162    134.24    0.000
Residual Error    11    0.1898    0.0173
Total             12    2.5060

74. {Oil Quality and Price Narrative} Draw a scatter diagram of the data to determine whether a linear model appears to be appropriate to describe the relationship between the quality of oil and price per barrel.

ANSWER:

A linear model appears to be appropriate to describe the relationship between the quality of oil and price per barrel.

75. {Oil Quality and Price Narrative} Determine the least squares regression line.

ANSWER:
ŷ = 9.4349 + 0.095235x
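The regression coefficients in the Minitab output can also be recovered from the descriptive statistics and covariance table alone, using b1 = cov(x, y) / s_x² and b0 = ȳ – b1·x̄. A minimal Python sketch, with the numbers copied from the output and illustrative variable names:

```python
# Values read from the partial Minitab output
mean_degrees = 34.60
mean_price = 12.730
var_degrees = 21.281667          # variance of Degrees (diagonal of the covariance table)
cov_degrees_price = 2.026750     # covariance of Degrees and Price

slope = cov_degrees_price / var_degrees          # b1 = cov(x, y) / s_x^2
intercept = mean_price - slope * mean_degrees    # b0 = y_bar - b1 * x_bar

print(f"y_hat = {intercept:.4f} + {slope:.6f}x")
```

Small differences in the last digit can arise because the printed means are themselves rounded.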

[Figure: Scatter Diagram of Price (vertical axis, 11.8 to 13.4) against Degrees (horizontal axis, 20 to 45)]

76. {Oil Quality and Price Narrative} Plot the least squares regression line on the scatter diagram.

ANSWER:

77. {Oil Quality and Price Narrative} Interpret the value of the slope of the regression line.

ANSWER:
For every additional API gravity degree, the price of oil per barrel increases by an average of 9.52 cents.

[Figure: Scatter Diagram of Price (vertical axis, 11.8 to 13.6) against Degrees (horizontal axis, 20 to 45)]

SECTIONS 3 - 4

MULTIPLE CHOICE QUESTIONS

In the following multiple-choice questions, please circle the correct answer.

78. In a simple linear regression problem, the following sums of squares are produced: , , and . The percentage of the variation in y that is explained by the variation in x is:
a. 25%
b. 75%
c. 33%
d. 50%
ANSWER: b

79. In simple linear regression, most often we perform a two-tail test of the population slope to determine whether there is sufficient evidence to infer that a linear relationship

exists. The null hypothesis is stated as:a.b.c.d.ANSWER: a

80. Testing whether the slope of the population regression line could be zero is equivalent to testing whether the:
a. sample coefficient of correlation could be zero
b. standard error of estimate could be zero
c. population coefficient of correlation could be zero
d. sum of squares for error could be zero
ANSWER: c

81. Given that 500, , cov (x, y) = 100, and n = 6, the standard error of estimate is:a. 12.247b. 24.933c. 30.2076d. 11.180ANSWER: c

82. The symbol for the population coefficient of correlation is:a. r


b. c. r d.ANSWER: b

83. Given that the sum of squares for error is 60 and the sum of squares for regression is 140, then the coefficient of determination is:
a. 0.429
b. 0.300
c. 0.700
d. 0.837
ANSWER: c

84. A regression line using 25 observations produced SSR = 118.68 and SSE = 56.32. The standard error of estimate was:
a. 2.1788
b. 1.5648
c. 1.5009
d. 2.2716
ANSWER: b
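Questions 83 and 84 rest on two standard formulas: the coefficient of determination R² = SSR / (SSR + SSE) and the standard error of estimate s_ε = √(SSE / (n – 2)). A brief Python sketch with illustrative function names:

```python
import math

def coefficient_of_determination(ssr, sse):
    """R^2 = explained variation / total variation, where total = SSR + SSE."""
    return ssr / (ssr + sse)

def standard_error_of_estimate(sse, n):
    """s_e = sqrt(SSE / (n - 2)) for simple linear regression with n observations."""
    return math.sqrt(sse / (n - 2))

r2_q83 = coefficient_of_determination(ssr=140, sse=60)   # Question 83
se_q84 = standard_error_of_estimate(sse=56.32, n=25)     # Question 84

print(f"Q83: R^2 = {r2_q83:.3f}")
print(f"Q84: s_e = {se_q84:.4f}")
```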

85. The symbol for the sample coefficient of correlation is:a. ra.b. c.ANSWER: a

86. Given the least squares regression line ŷ = –2.48 + 1.63x, and a coefficient of determination of 0.81, the coefficient of correlation is:
a. –0.85
b. 0.85
c. –0.90
d. 0.90
ANSWER: d
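In simple linear regression the coefficient of correlation has magnitude √R² and takes the sign of the slope, which is the reasoning behind Question 86. A small Python sketch (the function name is hypothetical):

```python
import math

def correlation_from_r_squared(r_squared, slope):
    """Return r: magnitude sqrt(R^2), sign copied from the least squares slope."""
    return math.copysign(math.sqrt(r_squared), slope)

r_q86 = correlation_from_r_squared(0.81, slope=1.63)
print(f"r = {r_q86:.2f}")

# With a negative slope the same R^2 would give a negative correlation
r_negative = correlation_from_r_squared(0.81, slope=-1.63)
print(f"r = {r_negative:.2f}")
```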

87. Which value of the coefficient of correlation r indicates a stronger correlation than 0.65?
a. 0.55
b. –0.75
c. 0.60
d. –0.45
ANSWER: b

88. If the coefficient of determination is 0.975, then the slope of the regression line:
a. must be positive
b. must be negative
c. could be either positive or negative
d. None of the above.
ANSWER: c

89. In regression analysis, if the coefficient of determination is 1.0, then:
a. the sum of squares for error must be 1.0
b. the sum of squares for regression must be 1.0
c. the sum of squares for error must be 0.0
d. the sum of squares for regression must be 0.0
ANSWER: c

90. The sample correlation coefficient between x and y is 0.375. It has been found that the p-value is 0.744 when testing H0: ρ = 0 against the one-sided alternative H1: ρ < 0. To test H0: ρ = 0 against the two-sided alternative H1: ρ ≠ 0 at a significance level of 0.193, the p-value is:
a. 0.372
b. 1.488
c. 0.256
d. 0.512
ANSWER: d
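The hypotheses in Question 90 are missing from the extracted text. The sketch below assumes H0: ρ = 0 and a test statistic with a symmetric distribution (the usual t-test), so that a two-sided p-value is twice the smaller one-sided tail. Under that assumption the same conversion also accounts for the one-sided values 0.128 and 0.872 that appear in Questions 111 and 122.

```python
def two_sided_from_one_sided(p_one_sided):
    """Two-sided p-value for a symmetric test statistic: twice the smaller tail area."""
    return 2 * min(p_one_sided, 1 - p_one_sided)

def one_sided_from_two_sided(p_two_sided, same_direction_as_sample):
    """One-sided p-value: half the two-sided value if the alternative points the way
    the sample statistic does, otherwise one minus that half."""
    half = p_two_sided / 2
    return half if same_direction_as_sample else 1 - half

p_q90 = two_sided_from_one_sided(0.744)
p_agreeing = one_sided_from_two_sided(0.256, same_direction_as_sample=True)
p_opposing = one_sided_from_two_sided(0.256, same_direction_as_sample=False)

print(f"{p_q90:.3f} {p_agreeing:.3f} {p_opposing:.3f}")
```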

91. Correlation analysis is used to determine:
a. the strength of the relationship between x and y
b. the least squares estimates of the regression parameters
c. the predicted value of y for a given value of x
d. the coefficient of determination
ANSWER: a

92. If the coefficient of correlation is –0.80, then the percentage of the variation in y that is explained by the variation in x is:
a. 80%
b. 64%
c. –80%
d. –64%
ANSWER: b

93. If all the points in a scatter diagram lie on the least squares regression line, then the coefficient of correlation must be:
a. 1.0
b. –1.0
c. either 1.0 or –1.0
d. 0.0
ANSWER: c

94. If the coefficient of correlation is –0.60, then the coefficient of determination is:
a. –0.60
b. –0.36
c. 0.36
d. 0.40
ANSWER: c

95. In regression analysis, if the coefficient of correlation is –1.0, then:
a. the sum of squares for error is –1.0
b. the sum of squares for regression is 1.0
c. the sum of squares for error and sum of squares for regression are equal
d. the sum of squares for regression and total variation in y are equal
ANSWER: d

96. If the coefficient of correlation between x and y is close to 1.0, this indicates that:
a. y causes x to happen
b. x causes y to happen
c. both (a) and (b)
d. there may or may not be any causal relationship between x and y
ANSWER: d

97. For the values of the coefficient of determination listed below, which one implies the greatest value of the sum of squares for regression given that the total variation in y is 1800?
a. 0.69
b. 0.96
c. 0.58
d. 0.85
ANSWER: b

98. When all the actual and predicted values of y are equal, the standard error of estimate will be:
a. 1.0
b. –1.0
c. 0.0
d. 2.0
ANSWER: c

99. Which of the following statistics and procedures can be used to determine whether a linear model should be employed?
a. The standard error of estimate
b. The coefficient of determination
c. The t-test of the slope
d. All of the above
ANSWER: d

100. In testing the hypotheses: vs. , the following statistics are available:

n = 10, , , = 1.20, and = 6. The value of the test statistic is:a. 2.042


b. 0.306c. –1.50d. -0.300ANSWER: a

101. The standard error of estimate is given by:a. SSE/(n – 2)b.c.d. SSE/ANSWER: c

102. If the standard error of estimate = 20 and n = 10, then the sum of squares for error, SSE, is:
a. 400
b. 3200
c. 4000
d. 40000
ANSWER: b
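Question 102 inverts the standard error formula s_ε = √(SSE / (n – 2)) to give SSE = s_ε² (n – 2). A minimal Python sketch with a hypothetical function name:

```python
def sse_from_standard_error(s_e, n):
    """Invert s_e = sqrt(SSE / (n - 2)) to recover SSE."""
    return s_e ** 2 * (n - 2)

sse_q102 = sse_from_standard_error(s_e=20, n=10)
print(sse_q102)  # 400 * 8 = 3200
```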

103. The smallest value that the standard error of estimate can assume is:
a. –1
b. 0
c. 1
d. –2
ANSWER: b

104. If cov(x, y) = 1260, and then the coefficient of determination is:a. 0.7875b. 1.0286c. 0.8100d. 0.7656ANSWER: c

105. The standard error of estimate is a measure of the:
a. variation of y around the regression line
b. variation of x around the regression line
c. variation of y around the mean
d. variation of x around the mean
ANSWER: a

106. The Pearson coefficient of correlation r equals 1 when there is no:
a. explained variation
b. unexplained variation
c. y-intercept in the model
d. outliers
ANSWER: b

107. In regression analysis, the coefficient of determination measures the amount of variation in y that is:
a. caused by the variation in x
b. explained by the variation in x
c. unexplained by the variation in x
d. None of the above
ANSWER: b

108. If we are interested in determining whether two variables are linearly related, it is necessary to:
a. perform the t-test of the slope
b. perform the t-test of the coefficient of correlation
c. either (a) or (b) since they are identical
d. calculate the standard error of estimate
ANSWER: c

109. In a regression problem the following pairs of (x, y) are given: (3, 1), (3, –1), (3, 0), (3, –2) and (3, 2). That indicates that the:
a. correlation coefficient is –1
b. correlation coefficient is 0
c. correlation coefficient is 1
d. coefficient of determination is between –1 and 1
ANSWER: b

110. In a regression problem, if the coefficient of determination is 0.95, this means that:
a. 95% of the y values are positive
b. 95% of the variation in y can be explained by the variation in x
c. 95% of the x values are equal
d. 95% of the variation in x can be explained by the variation in y
ANSWER: b

111. The sample correlation coefficient between x and y is 0.375. It has been found that the p-value is 0.256 when testing H0: ρ = 0 against the two-sided alternative H1: ρ ≠ 0. To test H0: ρ = 0 against the one-sided alternative H1: ρ > 0 at a significance level of 0.193, the p-value will be equal to:
a. 0.128
b. 0.512
c. 0.744
d. 0.872
ANSWER: a

112. In simple linear regression, which of the following statements indicates no linear relationship between the variables x and y?
a. Coefficient of determination is 1.0
b. Coefficient of correlation is 0.0
c. Sum of squares for error is 0.0
d. Sum of squares for regression is relatively large
ANSWER: b

113. If the sum of squared residuals is zero, then the:
a. coefficient of determination must be 1.0
b. coefficient of correlation must be 1.0
c. coefficient of determination must be 0.0
d. coefficient of correlation must be 0.0
ANSWER: a

114. In a regression problem, if all the values of the independent variable are equal, then the coefficient of determination must be:
a. 1.0
b. 0.5
c. 0.0
d. –1.0
ANSWER: c

115. The standard error of the estimate is a measure of:
a. total variation of the y variable
b. the variation around the sample regression line
c. explained variation
d. the variation of the x variable
ANSWER: b

116. In simple linear regression, the coefficient of correlation r and the least squares estimate b1 of the population slope β1:
a. must be equal
b. must have opposite signs
c. must have the same sign
d. may have opposite signs or the same sign
ANSWER: c


117. The coefficient of determination (R²) tells us:
a. that the coefficient of correlation is larger than 1
b. whether r has any significance
c. that we should not partition the total variation
d. the proportion of total variation in y that is explained by x
ANSWER: d

118. In performing a regression analysis involving two numerical variables, we are assuming:
a. the variances of x and y are equal
b. the variation around the line of regression is the same for each x value
c. that x and y are independent
d. All of the above
ANSWER: b

119. Which of the following assumptions concerning the probability distribution of the random error term is stated incorrectly?
a. The distribution is normal
b. The mean of the distribution is 0
c. The variance of the distribution increases as x increases
d. The errors are independent
ANSWER: c

120. If the correlation coefficient (r) = 1.00, then:
a. The y-intercept (b0) must equal 0
b. The explained variation equals the unexplained variation
c. There is no unexplained variation
d. There is no explained variation
ANSWER: c

121. In a simple linear regression problem, r and b1:
a. may have opposite signs
b. must have the same sign
c. must have opposite signs
d. must be equal
ANSWER: b

122. The sample correlation coefficient between x and y is 0.375. The p-value is 0.256 when testing $H_0: \rho = 0$ against the two-sided alternative $H_1: \rho \neq 0$. To test $H_0: \rho = 0$ against the one-sided alternative $H_1: \rho < 0$ at a significance level of 0.193, the p-value will be equal to:
a. 0.128
b. 0.512
c. 0.744
d. 0.872
ANSWER: d

123. Which of the following is not a required condition for the error variable $\varepsilon$ in the simple linear regression model?
a. The probability distribution of $\varepsilon$ is normal.
b. The mean of the probability distribution of $\varepsilon$ is zero.
c. The standard deviation of $\varepsilon$ is a constant no matter what the value of x.
d. The values of $\varepsilon$ are autocorrelated.
ANSWER: d

124. Testing for existence of correlation is equivalent to:
a. testing for the existence of the slope ($\beta_1$)
b. testing for the existence of the Y-intercept ($\beta_0$)
c. the confidence interval estimate for predicting Y
d. None of the above
ANSWER: a

125. The coefficient of determination measures the amount of:
a. variation in y that is explained by variation in x
b. variation in x that is explained by variation in y
c. variation in y that is unexplained by variation in x
d. variation in x that is unexplained by variation in y
ANSWER: a

126. If the coefficient of correlation is 0.90, then the percentage of the variation in the dependent variable y that is explained by the variation in the independent variable x is:
a. 90%
b. 81%
c. 0.90%
d. 0.81%
ANSWER: b

127. If a researcher wanted to find out if alcohol consumption and grade point average on a 4-point scale are linearly related, he would perform a:
a. test for the difference in two proportions
b. test for independence
c. z test for the difference in two proportions
d. t test for no linear relationship between the two variables
ANSWER: d


TRUE / FALSE QUESTIONS

128. If the value of the sum of squares for error SSE equals zero, then the coefficient of determination must equal zero.
ANSWER: F

129. When the actual values y of a dependent variable and the corresponding predicted values $\hat{y}$ are the same, the standard error of the estimate will be 1.0.
ANSWER: F

130. The value of the sum of squares for regression SSR can never be smaller than 0.0.
ANSWER: T

131. The value of the sum of squares for regression SSR can never be smaller than 1.
ANSWER: F

132. If all the values of an independent variable x are equal, then regressing a dependent variable y on x will result in a coefficient of determination of zero.
ANSWER: T

133. In a simple linear regression model, testing whether the slope of the population regression line could be zero is the same as testing whether or not the population coefficient of correlation equals zero.
ANSWER: T

134. When the actual values y of a dependent variable and the corresponding predicted values $\hat{y}$ are the same, the standard error of estimate will be 0.0.
ANSWER: T

135. If there is no linear relationship between two variables x and y, the coefficient of determination must be 1.0.
ANSWER: F

136. The value of the sum of squares for regression SSR can never be larger than the value of sum of squares for error SSE.
ANSWER: F

137. When the actual values y of a dependent variable and the corresponding predicted values $\hat{y}$ are the same, the standard error of estimate will be -1.0.
ANSWER: F

138. In a simple linear regression problem, the least squares line is $\hat{y}$ = -3.75 + 1.25x, and the coefficient of determination is 0.81. The coefficient of correlation must be –0.90.
ANSWER: F

139. In simple linear regression, the divisor of the standard error of estimate is n – 2.
ANSWER: T

140. In a regression problem the following pairs of (x, y) are given: (4,-2), (4,-1), (4,0), (4,1) and (4,2). That indicates that the coefficient of correlation is –1.
ANSWER: F

141. The value of the sum of squares for regression SSR can never be larger than the value of total sum of squares SST.
ANSWER: T

142. In regression analysis, if the coefficient of determination is 1.0, then the coefficient of correlation must be 1.0.
ANSWER: F

143. Correlation analysis is used to determine the strength of the relationship between an independent variable x and dependent variable y.
ANSWER: T

144. If the coefficient of correlation is –0.81, then the percentage of the variation in y that is explained by the regression line is 81%.
ANSWER: F

145. If all the points in a scatter diagram lie on the least squares regression line, then the coefficient of correlation must be 1.0.
ANSWER: F

146. If the standard error of estimate $s_\varepsilon$ = 20 and n = 8, then the sum of squares for error SSE is 2,400.
ANSWER: T

147. The probability distribution of the error variable $\varepsilon$ is normal, with mean $E(\varepsilon)$ = 0, and standard deviation $\sigma_\varepsilon$ = 1.
ANSWER: F

148. In a simple linear regression problem, if the coefficient of determination is 0.95, this means that 95% of the variation in the independent variable x can be explained by the regression line.
ANSWER: F

149. Given that cov(x, y) = 10, $s_y^2$ = 15, $s_x^2$ = 8, and n = 12, the value of the standard error of estimate is 2.75.
ANSWER: F

150. If the error variable is normally distributed, the test statistic for testing is Student t distributed with n – 2 degrees of freedom.
ANSWER: T

151. Given that cov(x, y) = 8.5, $s_y^2$ = 8, and $s_x^2$ = 10, then the value of the coefficient of determination is 0.95.
ANSWER: F

152. The coefficient of determination is the coefficient of correlation squared. That is, $R^2 = r^2$.
ANSWER: T

153. Given that SSE = 60 and SSR = 540, the proportion of the variation in y that is explained by the variation in x is 0.90.
ANSWER: T

154. Given that SSE = 84 and SSR = 358.12, the coefficient of correlation (also called the Pearson coefficient of correlation) must be 0.90.
ANSWER: F

155. Except for the values r = -1, 0, and 1, we cannot be specific in our interpretation of the coefficient of correlation r. However, when we square it we produce a more meaningful statistic.
ANSWER: T

156. A zero population correlation coefficient between a pair of random variables means that there is no linear relationship between the random variables.
ANSWER: T

157. Given that cov(x, y) = 8, $s_y^2$ = 14, $s_x^2$ = 10, and n = 6, the value of the sum of squares for error SSE is 38.
ANSWER: T
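
Several of the true/false items above (146, 149, 151 and 157) reduce to the same summary-statistic shortcuts. The sketch below is illustrative only; the function names are hypothetical, and it assumes the two unlabeled numbers in each item are the sample variances $s_y^2$ and $s_x^2$, in that order, which is the reading that reproduces the keyed answers.

```python
import math


def sse_from_summary(cov_xy, var_y, var_x, n):
    """SSE = (n - 1) * (s_y^2 - cov(x, y)^2 / s_x^2)."""
    return (n - 1) * (var_y - cov_xy ** 2 / var_x)


def standard_error_of_estimate(sse, n):
    """s_eps = sqrt(SSE / (n - 2)); the divisor is n - 2 in simple regression."""
    return math.sqrt(sse / (n - 2))


def sse_from_standard_error(s_eps, n):
    """Invert the formula above: SSE = s_eps^2 * (n - 2)."""
    return s_eps ** 2 * (n - 2)


def r_squared_from_summary(cov_xy, var_y, var_x):
    """R^2 = cov(x, y)^2 / (s_x^2 * s_y^2)."""
    return cov_xy ** 2 / (var_x * var_y)


sse_146 = sse_from_standard_error(20, 8)        # item 146
sse_149 = sse_from_summary(10, 15, 8, 12)       # item 149
s_eps_149 = standard_error_of_estimate(sse_149, 12)
r2_151 = r_squared_from_summary(8.5, 8, 10)     # item 151
sse_157 = sse_from_summary(8, 14, 10, 6)        # item 157
```

Under these assumptions, item 149 is false because 2.75 is the squared standard error of estimate, and item 151 is false because 0.95 is approximately r rather than $R^2$.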

158. A store manager gives a pre-employment examination to new employees. The test is scored from 1 to 100. He has data on their sales at the end of one year measured in dollars. He wants to know if there is any linear relationship between pre-employment examination score and sales. An appropriate test to use is the t test on the population correlation coefficient.
ANSWER: T


STATISTICAL CONCEPTS & APPLIED QUESTIONS

FOR QUESTIONS 159 THROUGH 164, USE THE FOLLOWING NARRATIVE:
Narrative: Car Speed and Gas Mileage
An economist wanted to analyze the relationship between the speed of a car (x) and its gas mileage (y). As an experiment a car is operated at several different speeds and for each speed the gas mileage is measured. These data are shown below.

Speed        25  35  45  50  60  65  70
Gas Mileage  40  39  37  33  30  27  25

159. {Car Speed and Gas Mileage Narrative} Calculate the standard error of estimate, and describe what this statistic tells you about the regression line.

ANSWER:1.448; the model’s fit to these data is good.

160. {Car Speed and Gas Mileage Narrative} Do these data provide sufficient evidence at the 5% significance level to infer that a linear relationship exists between higher speeds and lower gas mileage?

ANSWER:
$H_0: \beta_1 = 0$ vs. $H_1: \beta_1 \neq 0$
Rejection region: | t | > 2.571 (n – 2 = 5 degrees of freedom)
Test statistic: t = -9.754
Conclusion: Reject the null hypothesis. Yes, these data provide sufficient evidence at the 5% significance level to infer that a linear relationship exists between higher speeds and lower gas mileage.

161. {Car Speed and Gas Mileage Narrative} Predict with 99% confidence the gas mileage of a car traveling 55 mph.

ANSWER:
31.236 ± 6.284. Thus, LCL = 24.952 and UCL = 37.520.

162. {Car Speed and Gas Mileage Narrative} Calculate the Pearson coefficient of correlation.

ANSWER:r = -0.975

163. {Car Speed and Gas Mileage Narrative} What does the coefficient of correlation tell you about the direction and strength of the relationship between the two variables?

ANSWER:There is a very strong negative linear relationship between car speed and gas mileage.


164. {Car Speed and Gas Mileage Narrative} Calculate the coefficient of determination and interpret its value.

ANSWER:
$R^2$ = 0.95. This means that 95% of the total variation in gas mileage can be explained by the speed of the car.
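
The answers to the Car Speed and Gas Mileage questions can be reproduced from the raw data with the usual least squares formulas. The following is a minimal sketch in plain Python rather than the method used to prepare the answer key; the helper name is hypothetical, and the critical value 4.032 ($t_{0.005}$ with 5 degrees of freedom) is assumed to be read from a t table.

```python
import math


def simple_linear_regression(x, y):
    """Least squares fit of y on x, returning the pieces used below."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                      # slope
    b0 = y_bar - b1 * x_bar             # intercept
    sse = syy - sxy ** 2 / sxx          # sum of squares for error
    s_eps = math.sqrt(sse / (n - 2))    # standard error of estimate
    return {"n": n, "x_bar": x_bar, "sxx": sxx, "syy": syy, "sxy": sxy,
            "b0": b0, "b1": b1, "s_eps": s_eps}


speed = [25, 35, 45, 50, 60, 65, 70]
mileage = [40, 39, 37, 33, 30, 27, 25]
fit = simple_linear_regression(speed, mileage)

# t statistic for the slope, then r and R^2.
t_stat = fit["b1"] / (fit["s_eps"] / math.sqrt(fit["sxx"]))
r = fit["sxy"] / math.sqrt(fit["sxx"] * fit["syy"])
r_squared = r ** 2

# 99% prediction interval for an individual y at x = 55 mph.
T_CRIT_99 = 4.032  # assumed table value: t_{0.005} with n - 2 = 5 df
x_given = 55
y_hat = fit["b0"] + fit["b1"] * x_given
half_width = T_CRIT_99 * fit["s_eps"] * math.sqrt(
    1 + 1 / fit["n"] + (x_given - fit["x_bar"]) ** 2 / fit["sxx"]
)
```

The point prediction computed this way is about 31.234; the keyed 31.236 presumably reflects rounded coefficients.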

165. The following 10 observations of variables x and y were collected.

x   1   2   3   4   5   6   7   8   9  10
y  25  22  21  19  14  15  12  10   6   2

a. Calculate the standard error of estimate.
b. Test to determine if there is enough evidence at the 5% significance level to indicate that x and y are negatively linearly related.
c. Calculate the coefficient of correlation, and describe what this statistic tells you about the regression line.

ANSWER:
a. $s_\varepsilon$ = 1.322
b. $H_0: \beta_1 = 0$ vs. $H_1: \beta_1 < 0$
Rejection region: t < -1.860
Test statistic: t = -16.402
Conclusion: Reject the null hypothesis. Yes, there is enough evidence at the 5% significance level to indicate that x and y are negatively linearly related.
c. r = -0.9854. This indicates a very strong negative linear relationship between the two variables.

166. Consider the following data values of variables x and y.

x  2   4   6   8  10  13
y  7  11  17  21  27  36

a. Calculate the coefficient of determination, and describe what this statistic tells you about the relationship between the two variables.
b. Calculate the Pearson coefficient of correlation. What sign does it have? Why?
c. What does the coefficient of correlation calculated tell you about the direction and strength of the relationship between the two variables?

ANSWER:
a. $R^2$ = 0.995. This means that 99.5% of the variation in the dependent variable y is explained by the variation in the independent variable x.
b. r = 0.9975. It is positive since the slope of the regression line is positive.
c. There is a very strong (almost perfect) positive linear relationship between the two variables.

FOR QUESTIONS 167 THROUGH 171, USE THE FOLLOWING NARRATIVE:
Narrative: Sunshine and Skin Cancer
A medical statistician wanted to examine the relationship between the amount of sunshine (x) and incidence of skin cancer (y). As an experiment he found the number of skin cancers detected per 100,000 of population and the average daily sunshine in eight counties around the country. These data are shown below.

Average Daily Sunshine    5   7   6   7   8   6   4   3
Skin Cancer per 100,000   7  11   9  12  15  10   7   5

167. {Sunshine and Skin Cancer Narrative} Calculate the standard error of estimate, and describe what this statistic tells you about the regression line.

ANSWER:0.9608; the model’s fit to these data is good.

168. {Sunshine and Skin Cancer Narrative} Can we conclude at the 1% significance level that there is a linear relationship between sunshine and skin cancer?

ANSWER:
$H_0: \beta_1 = 0$ vs. $H_1: \beta_1 \neq 0$
Rejection region: | t | > 3.707
Test statistic: t = 8.485
Conclusion: Reject the null hypothesis. Yes, we conclude at the 1% significance level that there is a linear relationship between sunshine and skin cancer.

169. {Sunshine and Skin Cancer Narrative} Calculate the coefficient of determination and interpret it.

ANSWER:
$R^2$ = 0.9231. This means that 92.31% of the variation in the incidence of skin cancer is explained by the variation in the amount of sunshine.

170. {Sunshine and Skin Cancer Narrative} Calculate the Pearson coefficient of correlation. What sign does it have? Why?

ANSWER:
r = 0.9608. It is positive since the slope of the regression line ($b_1$ = 1.846) is positive.

171. {Sunshine and Skin Cancer Narrative} What does the coefficient of correlation calculated tell you about the direction and strength of the relationship between the two variables?

ANSWER:There is a very strong (almost perfect) positive linear relationship between the two variables.

FOR QUESTIONS 172 THROUGH 177, USE THE FOLLOWING NARRATIVE:


Narrative: Sales and Experience
The general manager of a chain of furniture stores believes that experience is the most important factor in determining the level of success of a salesperson. To examine this belief she records last month’s sales (in $1,000s) and the years of experience of 10 randomly selected salespeople. These data are listed below.

Salesperson  Years of Experience  Sales
1            0                    7
2            2                    9
3            10                   20
4            3                    15
5            8                    18
6            5                    14
7            12                   20
8            7                    17
9            20                   30
10           15                   25

172. {Sales and Experience Narrative} Determine the standard error of estimate and describe what this statistic tells you about the regression line.

ANSWER:1.5724; the model’s fit is good.

173. {Sales and Experience Narrative} Determine the coefficient of determination and discuss what its value tells you about the two variables.

ANSWER:
$R^2$ = 0.9536, which means that 95.36% of the variation in sales is explained by the variation in years of experience of the salesperson.

174. {Sales and Experience Narrative} Calculate the Pearson correlation coefficient. What sign does it have? Why?

ANSWER:
r = 0.9765. It has a positive sign since the slope of the regression line ($b_1$ = 1.0817) is positive.


175. {Sales and Experience Narrative} Conduct a test of the population coefficient of correlation to determine at the 5% significance level whether a linear relationship exists between years of experience and sales.

ANSWER:
$H_0: \rho = 0$ vs. $H_1: \rho \neq 0$
Rejection region: | t | > 2.306
Test statistic: t = 12.8258
Conclusion: Reject the null hypothesis. Yes, a linear relationship exists between years of experience and sales.

176. {Sales and Experience Narrative} Conduct a test of the population slope to determine at the 5% significance level whether a linear relationship exists between years of experience and sales.

ANSWER:
$H_0: \beta_1 = 0$ vs. $H_1: \beta_1 \neq 0$
Rejection region: | t | > 2.306
Test statistic: t = 12.8258
Conclusion: Reject the null hypothesis. Yes, a linear relationship exists between years of experience and sales.

177. {Sales and Experience Narrative} Do the tests of $\rho$ and $\beta_1$ in the previous two questions provide the same results? Explain.

ANSWER:Yes; both tests have the same value of the test statistic, the same rejection region, and of course the same conclusion. This is not a coincidence; the two tests are identical.
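
The claim that the two tests are identical can be checked numerically on the Sales and Experience data. The sketch below is only an illustration, with variable names chosen here: it computes the slope statistic $t = b_1 / s_{b_1}$ and the correlation statistic $t = r\sqrt{(n-2)/(1-r^2)}$ separately and compares them.

```python
import math

experience = [0, 2, 10, 3, 8, 5, 12, 7, 20, 15]
sales = [7, 9, 20, 15, 18, 14, 20, 17, 30, 25]

n = len(experience)
x_bar = sum(experience) / n
y_bar = sum(sales) / n
sxx = sum((x - x_bar) ** 2 for x in experience)
syy = sum((y - y_bar) ** 2 for y in sales)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(experience, sales))

# Test of the slope: t = b1 / s_b1, with s_b1 = s_eps / sqrt(Sxx).
b1 = sxy / sxx
s_eps = math.sqrt((syy - sxy ** 2 / sxx) / (n - 2))
t_slope = b1 / (s_eps / math.sqrt(sxx))

# Test of the correlation: t = r * sqrt((n - 2) / (1 - r^2)).
r = sxy / math.sqrt(sxx * syy)
t_corr = r * math.sqrt((n - 2) / (1 - r ** 2))
```

Both expressions simplify algebraically to the same quantity, so any difference between `t_slope` and `t_corr` is floating-point noise.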

FOR QUESTIONS 178 THROUGH 183, USE THE FOLLOWING NARRATIVE:
Narrative: Income and Education
A professor of economics wants to study the relationship between income (y in $1000s) and education (x in years). A random sample of eight individuals is taken and the results are shown below.

Education  16  11  15   8  12  10  13  14
Income     58  40  55  35  43  41  52  49

178. {Income and Education Narrative} Determine the standard error of estimate and describe what this statistic tells you about the regression line.

ANSWER:2.436; the model’s fit to these data is good.


179. {Income and Education Narrative} Determine the coefficient of determination and discuss what its value tells you about the two variables.

ANSWER:
$R^2$ = 0.9223, which means that 92.23% of the variation in income is explained by the variation in years of education.

180. {Income and Education Narrative} Calculate the Pearson correlation coefficient. What sign does it have? Why?

ANSWER:
r = 0.9604. It has a positive sign since the slope of the regression line ($b_1$ = 2.9098) is positive.

181. {Income and Education Narrative} Conduct a test of the population coefficient of correlation to determine at the 5% significance level whether a linear relationship exists between years of education and income.

ANSWER:
$H_0: \rho = 0$ vs. $H_1: \rho \neq 0$
Rejection region: | t | > 2.447
Test statistic: t = 8.439
Conclusion: Reject the null hypothesis. Yes, a linear relationship exists between years of education and income.

182. {Income and Education Narrative} Conduct a test of the population slope to determine at the 5% significance level whether a linear relationship exists between years of education and income.

ANSWER:
$H_0: \beta_1 = 0$ vs. $H_1: \beta_1 \neq 0$
Rejection region: | t | > 2.447
Test statistic: t = 8.439
Conclusion: Reject the null hypothesis. Yes, a linear relationship exists between years of education and income.

183. {Income and Education Narrative} Do the tests of $\rho$ and $\beta_1$ in the previous two questions provide the same results? Explain.

ANSWER:Yes; both tests have the same value of the test statistic, the same rejection region, and of course the same conclusion. This is not a coincidence; the two tests are identical.


FOR QUESTIONS 184 THROUGH 189, USE THE FOLLOWING NARRATIVE:
Narrative: Game Winnings and Education
An ardent fan of television game shows has observed that, in general, the more educated the contestant, the less money he or she wins. To test her belief she gathers data about the last eight winners of her favorite game show. She records their winnings in dollars and the number of years of education. The results are as follows.

Contestant  Years of Education  Winnings
1           11                  750
2           15                  400
3           12                  600
4           16                  350
5           11                  800
6           16                  300
7           13                  650
8           14                  400

184. {Game Winnings and Education Narrative} Determine the standard error of estimate and describe what this statistic tells you about the regression line.

ANSWER:59.395; the model’s fit to these data is good.

185. {Game Winnings and Education Narrative} Determine the coefficient of determination and discuss what its value tells you about the two variables.

ANSWER:
$R^2$ = 0.9185, which means that 91.85% of the variation in TV game shows’ winnings is explained by the variation in years of education.

186. {Game Winnings and Education Narrative} Calculate the Pearson correlation coefficient. What sign does it have? Why?

ANSWER:
r = -0.9584. It has a negative sign since the slope of the regression line ($b_1$ = -89.1667) is negative.

187. {Game Winnings and Education Narrative} Conduct a test of the population coefficient of correlation to determine at the 5% significance level whether a linear relationship exists between years of education and TV game shows’ winnings.

ANSWER:
$H_0: \rho = 0$ vs. $H_1: \rho \neq 0$
Rejection region: | t | > 2.447
Test statistic: t = -8.2227
Conclusion: Reject the null hypothesis. Yes, a linear relationship exists between years of education and TV game shows’ winnings.

188. {Game Winnings and Education Narrative} Conduct a test of the population slope to determine at the 5% significance level whether a linear relationship exists between years of education and TV game shows’ winnings.

ANSWER:
$H_0: \beta_1 = 0$ vs. $H_1: \beta_1 \neq 0$
Rejection region: | t | > 2.447
Test statistic: t = -8.2227
Conclusion: Reject the null hypothesis. Yes, a linear relationship exists between years of education and TV game shows’ winnings.

189. {Game Winnings and Education Narrative} Do the tests of $\rho$ and $\beta_1$ in the previous two questions provide the same results? Explain.

ANSWER:Yes. This is not a coincidence; the two tests are identical.

FOR QUESTIONS 190 THROUGH 195, USE THE FOLLOWING NARRATIVE:
Narrative: Movie Revenues
A financier whose specialty is investing in movie productions has observed that, in general, movies with “big-name” stars seem to generate more revenue than those movies whose stars are less well known. To examine his belief he records the gross revenue and the payment (in $ millions) given to the two highest-paid performers in the movie for ten recently released movies.

Movie  Cost of Two Highest Paid Performers  Gross Revenue
1      5.3                                  48
2      7.2                                  65
3      1.3                                  18
4      1.8                                  20
5      3.5                                  31
6      2.6                                  26
7      8.0                                  73
8      2.4                                  23
9      4.5                                  39
10     6.7                                  58

190. {Movie Revenues Narrative} Determine the standard error of estimate and describe what this statistic tells you about the regression line.

ANSWER:
2.0247; the model’s fit to these data is good.

191. {Movie Revenues Narrative} Determine the coefficient of determination and discuss what its value tells you about the two variables.

ANSWER:
$R^2$ = 0.9908, which means that 99.08% of the variation in gross revenue is explained by the variation in payment to the two highest-paid performers.

192. {Movie Revenues Narrative} Calculate the Pearson correlation coefficient. What sign does it have? Why?

ANSWER:
r = 0.9954. It has a positive sign since the slope of the regression line ($b_1$ = 8.285) is positive.

193. {Movie Revenues Narrative} Conduct a test of the population coefficient of correlation to determine at the 5% significance level whether a linear relationship exists between payment to the two highest-paid performers and gross revenue.

ANSWER:
$H_0: \rho = 0$ vs. $H_1: \rho \neq 0$
Rejection region: | t | > 2.306
Test statistic: t = 29.304
Conclusion: Reject the null hypothesis. Yes, a linear relationship exists between payment to the two highest-paid performers and gross revenue.

194. {Movie Revenues Narrative} Conduct a test of the population slope to determine at the 5% significance level whether a linear relationship exists between payment to the two highest-paid performers and gross revenue.

ANSWER:
$H_0: \beta_1 = 0$ vs. $H_1: \beta_1 \neq 0$
Rejection region: | t | > 2.306
Test statistic: t = 29.304
Conclusion: Reject the null hypothesis. Yes, a linear relationship exists between payment to the two highest-paid performers and gross revenue.

195. {Movie Revenues Narrative} Do the $\rho$ and $\beta_1$ tests in the previous two questions provide the same results? Explain.

ANSWER:
Yes; both tests have the same value of the test statistic, the same rejection region, and of course the same conclusion. This is not a coincidence; the two tests are identical.

FOR QUESTIONS 196 AND 197, USE THE FOLLOWING NARRATIVE:
Narrative: Cost of Books
The editor of a major academic book publisher claims that a large part of the cost of books is the cost of paper. This implies that larger books will cost more money. As an experiment to analyze the claim, a university student visits the bookstore and records the number of pages and the selling price of twelve randomly selected books. These data are listed below.

Book  Number of Pages  Selling Price ($)
1     844              55
2     727              50
3     360              35
4     915              60
5     295              30
6     706              50
7     410              40
8     905              53
9     1058             65
10    865              54
11    677              42
12    912              58

196. {Cost of Books Narrative} Determine the coefficient of determination and discuss what its value tells you.

ANSWER:
$R^2$ = 0.9378, which means that 93.78% of the variation in the price of books is explained by the variation in the number of pages.

197. {Cost of Books Narrative} Can we infer at the 5% significance level that the editor is correct?

ANSWER:
$H_0: \beta_1 = 0$ vs. $H_1: \beta_1 \neq 0$
Rejection region: | t | > 2.228
Test statistic: t = 12.2814
Conclusion: Reject the null hypothesis. Yes, we can infer at the 5% significance level that the editor is correct.

FOR QUESTIONS 198 THROUGH 202, USE THE FOLLOWING NARRATIVE:
Narrative: Automobile Accidents and Precipitation
A statistician investigating the relationship between the amount of precipitation (in inches) and the number of automobile accidents gathered data for 10 randomly selected days. The results are shown below.

Day  Precipitation  Number of Accidents
1    0.05           5
2    0.12           6
3    0.05           2
4    0.08           4
5    0.10           8
6    0.35           14
7    0.15           7
8    0.30           13
9    0.10           7
10   0.20           10

198. {Automobile Accidents and Precipitation Narrative} Calculate the standard error of estimate, and describe what this statistic tells you about the regression line.

ANSWER:
1.3207; the model’s fit to these data is good.

199. {Automobile Accidents and Precipitation Narrative} Determine the coefficient of determination and discuss what its value tells you about the two variables.

ANSWER:
$R^2$ = 0.893, which means that 89.3% of the variation in the number of accidents is explained by the variation in the amount of precipitation.

200. {Automobile Accidents and Precipitation Narrative} Conduct a test of the population slope to determine whether these data allow us to conclude at the 10% significance level that the amount of precipitation and the number of accidents are linearly related.

ANSWER:
$H_0: \beta_1 = 0$ vs. $H_1: \beta_1 \neq 0$
Rejection region: | t | > 1.86
Test statistic: t = 8.1709
Conclusion: Reject the null hypothesis. Yes, these data allow us to conclude at the 10% significance level that the amount of precipitation and the number of accidents are linearly related.

201. {Automobile Accidents and Precipitation Narrative} Conduct a test of the population coefficient of correlation to determine whether these data allow us to conclude at the 10% significance level that the amount of precipitation and the number of accidents are linearly related.

ANSWER:
$H_0: \rho = 0$ vs. $H_1: \rho \neq 0$
Rejection region: | t | > 1.86
Test statistic: t = 8.1709
Conclusion: Reject the null hypothesis. Yes, these data allow us to conclude at the 10% significance level that the amount of precipitation and the number of accidents are linearly related.

202. {Automobile Accidents and Precipitation Narrative} Do the $\beta_1$ and $\rho$ tests in the previous two questions provide the same results? Explain.

ANSWER:
Yes, the two tests are identical to each other.

FOR QUESTIONS 203 THROUGH 208, USE THE FOLLOWING NARRATIVE:
Narrative: Willie Nelson Concert
At a recent Willie Nelson concert, a survey was conducted that asked a random sample of 20 people their age and how many concerts they have attended since the first of the year. The following data were collected:

Age                 62  57  40  49  67  54  43  65  54  41
Number of Concerts   6   5   4   3   5   5   2   6   3   1

Age                 44  48  55  60  59  63  69  40  38  52
Number of Concerts   3   2   4   5   4   5   4   2   1   3

An Excel output follows:

SUMMARY OUTPUT

Regression Statistics
Multiple R         0.80203
R Square           0.64326
Adjusted R Square  0.62344
Standard Error     0.93965
Observations       20

DESCRIPTIVE STATISTICS
                    Age      Concerts
Mean                53       3.65
Standard Error      2.1849   0.3424
Standard Deviation  9.7711   1.5313
Sample Variance     95.4737  2.3447
Count               20       20

SPEARMAN RANK CORRELATION COEFFICIENT = 0.8306

ANOVA
            df  SS        MS        F         Significance F
Regression  1   28.65711  28.65711  32.45653  2.1082E-05
Residual    18  15.89289  0.88294
Total       19  44.55

           Coefficients  Standard Error  t Stat    P-value  Lower 95%  Upper 95%
Intercept  -3.01152      1.18802         -2.53491  0.02074  -5.50746   -0.5156
Age        0.12569       0.02206         5.69706   0.00002  0.07934    0.1720

203. {Willie Nelson Concert Narrative} Determine the standard error of estimate and describe what this statistic tells you about the model’s fit.

ANSWER:
$s_\varepsilon$ = 0.9396, and since the sample mean $\bar{y}$ = 3.65, we would have to admit that the standard error of estimate is not very small. On the other hand, it is not a large number either. Because there is no predefined upper limit on $s_\varepsilon$, it is difficult in this problem to assess the model in this way. However, using other criteria, it seems that the model’s fit to these data is reasonable.

204. {Willie Nelson Concert Narrative} Determine the coefficient of determination and discuss what its value tells you about the two variables.

ANSWER:
$R^2$ = 0.64326, which means that 64.326% of the variation in number of concerts attended is explained by the variation in age of the attendees.

205. {Willie Nelson Concert Narrative} Calculate the Pearson correlation coefficient. What sign does it have? Why?

ANSWER:
r = 0.80203. It has a positive sign since the slope of the regression line, $b_1$ = 0.12569, is positive.

206. {Willie Nelson Concert Narrative} Conduct a test of the population coefficient of correlation to determine at the 5% significance level whether a linear relationship exists between age and number of concerts attended.

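ANSWER:
$H_0: \rho = 0$ vs. $H_1: \rho \neq 0$
Rejection region: | t | > 2.101
Test statistic: $t = r\sqrt{(n-2)/(1-r^2)}$ = 5.6971
Conclusion: Reject the null hypothesis. Yes, we can infer at the 5% significance level that a linear relationship exists between age and number of concerts attended.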

207. {Willie Nelson Concert Narrative} Conduct a test of the population slope to determine at the 5% significance level whether a linear relationship exists between age and number of concerts attended.

ANSWER:
$H_0: \beta_1 = 0$ vs. $H_1: \beta_1 \neq 0$
Rejection region: | t | > 2.101
Test statistic: t = 5.6971
Conclusion: Reject the null hypothesis. Yes, we can infer at the 5% significance level that a linear relationship exists between age and number of concerts attended.

208. {Willie Nelson Concert Narrative} Do the $\rho$ and $\beta_1$ tests in the previous two questions provide the same results? Explain.

ANSWER:
Yes; both tests have the same value of the test statistic, the same rejection region, and of course the same conclusion. This is not a coincidence; the two tests are identical.
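
The figures quoted in the Willie Nelson answers are tied together by a few identities that can be checked directly against the ANOVA table in the Excel output. The sketch below is a hedged illustration with variable names chosen here; it uses only numbers printed in that output.

```python
import math

# Values copied from the ANOVA table and the coefficient row for Age.
ssr = 28.65711      # regression sum of squares
sse = 15.89289      # residual sum of squares
sst = 44.55         # total sum of squares
df_residual = 18    # n - 2 with n = 20
t_age = 5.69706     # t Stat for the Age coefficient

r_squared = ssr / sst                      # R Square
s_eps = math.sqrt(sse / df_residual)       # Standard Error (of estimate)
f_stat = (ssr / 1) / (sse / df_residual)   # F = MSR / MSE
multiple_r = math.sqrt(r_squared)          # |r| in simple regression
```

In simple linear regression the F statistic equals the square of the slope's t statistic, which is why the F test and the two t tests lead to the same conclusion.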

FOR QUESTIONS 209 THROUGH 214, USE THE FOLLOWING NARRATIVE:
Narrative: Oil Quality and Price
Quality of oil is measured in API gravity degrees – the higher the degrees API, the higher the quality. The table shown below is produced by an expert in the field who believes that there is a relationship between quality and price per barrel.

Oil degrees API  Price per barrel (in $)
27.0             12.02
28.5             12.04
30.8             12.32
31.3             12.27
31.9             12.49
34.5             12.70
34.0             12.80
34.7             13.00
37.0             13.00
41.0             13.17
41.0             13.19
38.8             13.22
39.3             13.27

A partial statistical software output follows:

Descriptive Statistics
Variable  N   Mean    StDev  SE Mean
Degrees   13  34.60   4.613  1.280
Price     13  12.730  0.457  0.127

Covariances
         Degrees    Price
Degrees  21.281667
Price    2.026750   0.208833

Regression Analysis
Predictor  Coef      StDev     T      P
Constant   9.4349    0.2867    32.91  0.000
Degrees    0.095235  0.008220  11.59  0.000

S = 0.1314   R-Sq = 92.46%   R-Sq(adj) = 91.7%

Analysis of Variance
Source          DF  SS      MS      F       P
Regression      1   2.3162  2.3162  134.24  0.000
Residual Error  11  0.1898  0.0173
Total           12  2.5060

209. {Oil Quality and Price Narrative} Determine the standard error of estimate and describe what this statistic tells you.

ANSWER:
$s_\varepsilon$ = 0.1314. Since the sample mean $\bar{y}$ = 12.73, the standard error of estimate is judged to be small, and we may say that the model fits the data well.

210. {Oil Quality and Price Narrative} Determine the coefficient of determination and discuss what its value tells you about the two variables.

ANSWER:
$R^2$ = 0.9246, which means that 92.46% of the variation in the oil price per barrel is explained by the variation in the API degrees.

211. {Oil Quality and Price Narrative} Calculate the Pearson correlation coefficient. What sign does it have? Why?

ANSWER:
r = 0.9616. It has a positive sign since the slope of the regression line, $b_1$ = 0.095235, is positive.

212. {Oil Quality and Price Narrative} Conduct a test of the population coefficient of correlation to determine at the 5% significance level whether a linear relationship exists between the quality of oil and price per barrel.

ANSWER:
$H_0: \rho = 0$ vs. $H_1: \rho \neq 0$
Rejection region: | t | > 2.201
Test statistic: $t = r\sqrt{(n-2)/(1-r^2)}$ = 11.61
Conclusion: Reject the null hypothesis. Yes, we can infer at the 5% significance level that a linear relationship exists between the quality of oil and price per barrel.

213. {Oil Quality and Price Narrative} Conduct a test of the population slope to determine at the 5% significance level whether a linear relationship exists between the quality of oil and price per barrel.

ANSWER: $H_0: \beta_1 = 0$ vs. $H_1: \beta_1 \neq 0$

Rejection region: | t | > 2.201

Test statistic: t = 11.59 (from Minitab output)

Conclusion: Reject the null hypothesis. Yes, we can infer at the 5% significance level that a linear relationship exists between the quality of oil and price per barrel.

214. {Oil Quality and Price Narrative} Do the tests in the previous two questions provide the same results? Explain.

ANSWER:Yes; both tests have the same value of the test statistic (the small difference between 11.61 and 11.59 is due to rounding in Minitab output), the same rejection region, and of course the same conclusion. This is not a coincidence; the two tests are identical.
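As a rough check on this equivalence, the sketch below recomputes the slope test statistic from the printed coefficient and its standard deviation, and compares its square with the F statistic in the analysis of variance table. It assumes only the figures shown in the output; the names are hypothetical.

```python
# Figures copied from the regression output for the Degrees predictor.
slope = 0.095235
slope_std_error = 0.008220
f_stat = 134.24        # F from the analysis of variance table
t_critical = 2.201     # t-table value, alpha = 0.05 two-tailed, 11 d.f.

# The t statistic for H0: beta1 = 0 is the slope over its standard error.
t_slope = slope / slope_std_error

# In simple linear regression, F equals t squared up to rounding.
t_squared = t_slope ** 2

print(f"t = {t_slope:.2f}, t^2 = {t_squared:.2f}, F = {f_stat}")
print("reject H0:", abs(t_slope) > t_critical)
```

The agreement between t² and F is another way to see that the slope test, the correlation test, and the F test carry the same information when there is a single predictor.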

SECTION 6

MULTIPLE CHOICE QUESTIONS

In the following multiple-choice questions, please circle the correct answer.

215. In order to estimate with 95% confidence the expected value of y for a given value of x in a simple linear regression problem, a random sample of 10 observations is taken. Which of the following t-table values listed below would be used?
a. 2.228
b. 2.306
c. 1.860
d. 1.812
ANSWER: b

216. Given a specific value of x and confidence level, which of the following statements is correct?
a. The confidence interval estimate of the expected value of y can be calculated but the prediction interval of y for the given value of x cannot be calculated.
b. The confidence interval estimate of the expected value of y will be wider than the prediction interval.
c. The prediction interval of y for the given value of x can be calculated but the confidence interval estimate of the expected value of y cannot be calculated.
d. The confidence interval estimate of the expected value of y will be narrower than the prediction interval.
ANSWER: d

217. In order to predict with 90% confidence the expected value of y for a given value of x in a simple linear regression problem, a random sample of 10 observations is taken. Which of the following t-table values listed below would be used?
a. 2.228
b. 2.306
c. 1.860
d. 1.812
ANSWER: c

218. The confidence interval estimate of the expected value of y for a given value of x, compared to the prediction interval of y for the same given value of x and confidence level, will be
a. wider
b. narrower
c. the same
d. impossible to know
ANSWER: b

219. In order to predict with 99% confidence the expected value of y for a given value of x in a simple linear regression problem, a random sample of 10 observations is taken. Which of the following t-table values listed below would be used?
a. 1.860
b. 2.306
c. 2.896
d. 3.355
ANSWER: d

220. The width of the confidence interval estimate for the predicted value of y depends on
a. the standard error of the estimate
b. the value of x for which the prediction is being made
c. the sample size
d. All of the above
ANSWER: d

221. In order to predict with 80% confidence the expected value of y for a given value of x in a simple linear regression problem, a random sample of 15 observations is taken. Which of the following t-table values listed below would be used?
a. 1.350
b. 1.771
c. 2.160
d. 2.650
ANSWER: a

222. In order to predict with 98% confidence the expected value of y for a given value of x in a simple linear regression problem, a random sample of 15 observations is taken. Which of the following t-table values listed below would be used?
a. 1.350
b. 1.771
c. 2.160
d. 2.650
ANSWER: d

TRUE / FALSE QUESTIONS

223. In developing a 95% confidence interval for the expected value of y from a simple linear regression problem involving a sample of size 10, the appropriate table value would be 1.86.
ANSWER: F

224. In developing an 80% prediction interval for the particular value of y from a simple linear regression problem involving a sample of size 12, the appropriate table value would be 1.372.
ANSWER: T

225. In developing a 90% prediction interval for the particular value of y from a simple linear regression problem involving a sample of size 14, the appropriate table value would be 2.179.
ANSWER: F

226. In order to predict with 95% confidence a particular value of y for a given value of x in a simple linear regression problem, a random sample of 20 observations is taken. The appropriate table value that would be used is 2.101.
ANSWER: T

227. The confidence interval estimate of the expected value of y will be narrower than the prediction interval for the same given value of x and confidence level. This is because there is less error in estimating a mean value as opposed to predicting an individual value.
ANSWER: T

228. The confidence interval estimate of the expected value of y will be wider than the prediction interval for the same given value of x and confidence level. This is because there is more error in estimating a mean value as opposed to predicting an individual value.
ANSWER: F

229. In developing a 90% confidence interval for the expected value of y from a simple linear regression problem involving a sample of size 15, the appropriate table value would be 1.761.
ANSWER: F

230. In developing a 99% confidence interval for the expected value of y from a simple linear regression problem involving a sample of size 25, the appropriate table value would be 2.807.
ANSWER: T

231. The prediction interval for a particular value of y is always wider than the confidence interval for the mean value of y, given the same data set, x value, and confidence level.
ANSWER: T
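For reference, the two intervals compared in these questions have the standard forms sketched below. The notation is an assumption, since the symbols did not survive in this text: $x_g$ is the given value of x, $s_\varepsilon$ is the standard error of estimate, and $s_x^2$ is the sample variance of x.

```latex
% Prediction interval for a particular value of y at x = x_g
\hat{y} \pm t_{\alpha/2,\,n-2}\; s_{\varepsilon}
\sqrt{1 + \frac{1}{n} + \frac{(x_g - \bar{x})^2}{(n-1)s_x^2}}

% Confidence interval estimate of the expected value of y at x = x_g
\hat{y} \pm t_{\alpha/2,\,n-2}\; s_{\varepsilon}
\sqrt{\frac{1}{n} + \frac{(x_g - \bar{x})^2}{(n-1)s_x^2}}
```

The extra 1 under the radical is the only difference, which is why the prediction interval is wider for any given data set, x value, and confidence level. Both intervals use n − 2 degrees of freedom, so the table values in the questions above are read at d.f. = n − 2.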

BASIC TECHNIQUES & APPLIED QUESTIONS

232. A medical statistician wanted to examine the relationship between the amount of sunshine (x) and incidence of skin cancer (y). As an experiment he found the number of skin cancers detected per 100,000 of population and the average daily sunshine in eight counties around the country. These data are shown below.

Average Daily Sunshine      5    7    6    7    8    6    4    3
Skin Cancer per 100,000     7   11    9   12   15   10    7    5

Predict with 95% confidence the skin cancers per 100,000 in a county with a daily average of 6.5 hours of sunshine.

ANSWER: 10.884 ± 2.525. Thus, LCL = 8.359, and UCL = 13.409
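The interval above can be reproduced from the eight data points with a minimal sketch. It assumes the standard least squares formulas and the t-table value 2.447 (two-tailed 5%, 6 degrees of freedom); the function name `prediction_interval` is hypothetical.

```python
import math

def prediction_interval(xs, ys, x_given, t_critical):
    """Return (fitted value, half-width) of a prediction interval at x_given."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx                      # least squares slope
    b0 = y_bar - b1 * x_bar             # least squares intercept
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s_e = math.sqrt(sse / (n - 2))      # standard error of estimate
    fitted = b0 + b1 * x_given
    half = t_critical * s_e * math.sqrt(1 + 1 / n + (x_given - x_bar) ** 2 / sxx)
    return fitted, half

sunshine = [5, 7, 6, 7, 8, 6, 4, 3]
cancers = [7, 11, 9, 12, 15, 10, 7, 5]

# 2.447 is the t-table value assumed for alpha = 0.05 (two-tailed), 6 d.f.
fitted, half_width = prediction_interval(sunshine, cancers, 6.5, t_critical=2.447)
print(f"{fitted:.3f} +/- {half_width:.3f}")
print(f"LCL = {fitted - half_width:.3f}, UCL = {fitted + half_width:.3f}")
```

The answer's 10.884 appears to be truncated rather than rounded, so the limits printed here can differ from the listed ones by one unit in the last place.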

FOR QUESTIONS 233 THROUGH 235, USE THE FOLLOWING NARRATIVE:
Narrative: Sales and Experience
The general manager of a chain of furniture stores believes that experience is the most important factor in determining the level of success of a salesperson. To examine this belief she records last month’s sales (in $1,000s) and the years of experience of 10 randomly selected salespeople. These data are listed below.

Salesperson    Years of Experience    Sales
1              0                      7
2              2                      9
3              10                     20
4              3                      15
5              8                      18
6              5                      14
7              12                     20
8              7                      17
9              20                     30
10             15                     25

233. {Sales and Experience Narrative} Predict with 95% confidence the monthly sales of a salesperson with 10 years of experience.

ANSWER: 19.447 ± 3.819. Thus, LCL = 15.628 (in $1000s), and UCL = 23.266 (in $1000s)

234. {Sales and Experience Narrative} Estimate with 95% confidence the average monthly sales of all salespersons with 10 years of experience.

ANSWER: 19.447 ± 1.199. Thus, LCL = 18.248 (in $1000s), and UCL = 20.646 (in $1000s)

235. {Sales and Experience Narrative} Which interval in the previous two questions is narrower: the confidence interval estimate of the expected value of y or the prediction interval for the same given value of x (10 years) and same confidence level? Why?

ANSWER: The confidence interval estimate of the expected value of y is narrower than the prediction interval for the same given value of x (10 years) and same confidence level. This is because there is less error in estimating a mean value as opposed to predicting an individual value.
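A sketch of the comparison, assuming the standard formulas and the t-table value 2.306 (two-tailed 5%, 8 degrees of freedom), computes both half-widths from the Sales and Experience data. The helper name `interval_half_widths` is illustrative only.

```python
import math

def interval_half_widths(xs, ys, x_given, t_critical):
    """Return (fitted value, CI half-width, PI half-width) at x_given."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s_e = math.sqrt(sse / (n - 2))
    # Term shared by both intervals: 1/n plus the distance of x_given from x_bar.
    shared = 1 / n + (x_given - x_bar) ** 2 / sxx
    fitted = b0 + b1 * x_given
    ci_half = t_critical * s_e * math.sqrt(shared)        # mean of y
    pi_half = t_critical * s_e * math.sqrt(1 + shared)    # individual y
    return fitted, ci_half, pi_half

experience = [0, 2, 10, 3, 8, 5, 12, 7, 20, 15]
sales = [7, 9, 20, 15, 18, 14, 20, 17, 30, 25]

fitted, ci_half, pi_half = interval_half_widths(experience, sales, 10, t_critical=2.306)
print(f"fitted = {fitted:.3f}, CI +/- {ci_half:.3f}, PI +/- {pi_half:.3f}")
```

The two half-widths differ only through the extra 1 under the square root, so the prediction interval is the wider one whatever the data.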

FOR QUESTIONS 236 THROUGH 238, USE THE FOLLOWING NARRATIVE:
Narrative: Income and Education
A professor of economics wants to study the relationship between income (y in $1000s) and education (x in years). A random sample of eight individuals is taken and the results are shown below.

Education    16    11    15     8    12    10    13    14
Income       58    40    55    35    43    41    52    49

236. {Income and Education Narrative} Predict with 95% confidence the income of an individual with 10 years of education.

ANSWER: 39.715 ± (2.447)(2.4356)(1.1127) = 39.715 ± 6.632. Thus, LCL = 33.083 (in $1000s), and UCL = 46.347 (in $1000s)

237. {Income and Education Narrative} Estimate with 95% confidence the average income of all individuals with 10 years of education.

ANSWER: 39.715 ± (2.447)(2.4356)(0.4880) = 39.715 ± 2.908. Thus, LCL = 36.807 (in $1000s), and UCL = 42.623 (in $1000s)

238. {Income and Education Narrative} Which interval in the previous two questions is narrower: the confidence interval estimate of the expected value of y or the prediction interval for the same given value of x (10 years) and same confidence level? Why?

ANSWER: The confidence interval estimate of the expected value of y is narrower than the prediction interval for the same given value of x (10 years) and same confidence level. This is because there is less error in estimating a mean value as opposed to predicting an individual value.

FOR QUESTIONS 239 THROUGH 242, USE THE FOLLOWING NARRATIVE:
Narrative: Game Winnings and Education
An ardent fan of television game shows has observed that, in general, the more educated the contestant, the less money he or she wins. To test her belief she gathers data about the last eight winners of her favorite game show. She records their winnings in dollars and the number of years of education. The results are as follows.

Contestant    Years of Education    Winnings
1             11                    750
2             15                    400
3             12                    600
4             16                    350
5             11                    800
6             16                    300
7             13                    650
8             14                    400

239. {Game Winnings and Education Narrative} Predict with 95% confidence the winnings of a contestant who has 15 years of education.

ANSWER: 397.500 ± 159.213. Thus, LCL = $238.287, and UCL = $556.713

240. {Game Winnings and Education Narrative} Predict with 95% confidence the winnings of a contestant who has 10 years of education.

ANSWER: 843.333 ± 179.971. Thus, LCL = $663.362, and UCL = $1,023.304

241. {Game Winnings and Education Narrative} Estimate with 95% confidence the average winnings of all contestants who have 15 years of education.

ANSWER: 397.500 ± 64.998. Thus, LCL = $332.502, and UCL = $462.498

242. {Game Winnings and Education Narrative} Estimate with 95% confidence the average winnings of all contestants who have 10 years of education.

ANSWER: 843.333 ± 106.141. Thus, LCL = $737.192, and UCL = $949.474
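Questions 239 through 242 evaluate the same fitted line at two values of x, which makes the effect of distance from the sample mean visible. The sketch below, assuming the standard formulas and the t-table value 2.447 (two-tailed 5%, 6 degrees of freedom), computes the fitted value and both half-widths at 15 and at 10 years of education; the helper and variable names are hypothetical.

```python
import math

def interval_half_widths(xs, ys, x_given, t_critical):
    """Return (fitted value, CI half-width, PI half-width) at x_given."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s_e = math.sqrt(sse / (n - 2))
    shared = 1 / n + (x_given - x_bar) ** 2 / sxx
    fitted = b0 + b1 * x_given
    return (fitted,
            t_critical * s_e * math.sqrt(shared),
            t_critical * s_e * math.sqrt(1 + shared))

education = [11, 15, 12, 16, 11, 16, 13, 14]
winnings = [750, 400, 600, 350, 800, 300, 650, 400]

results = {}
for years in (15, 10):
    results[years] = interval_half_widths(education, winnings, years, t_critical=2.447)
    fitted, ci_half, pi_half = results[years]
    print(f"x = {years}: fitted = {fitted:.3f}, "
          f"CI +/- {ci_half:.3f}, PI +/- {pi_half:.3f}")
```

The sample mean of education is 13.5 years, so 10 years lies farther from it than 15 years does, and both intervals are wider there. Note also that 10 years is below the smallest observed value (11), so the interval at 10 years is an extrapolation.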

FOR QUESTIONS 243 THROUGH 245, USE THE FOLLOWING NARRATIVE:
Narrative: Movie Revenues
A financier whose specialty is investing in movie productions has observed that, in general, movies with “big-name” stars seem to generate more revenue than those movies whose stars are less well known. To examine his belief he records the gross revenue and the payment (in $ millions) given to the two highest-paid performers in the movie for ten recently released movies.

Movie    Cost of Two Highest Paid Performers    Gross Revenue
1        5.3                                    48
2        7.2                                    65
3        1.3                                    18
4        1.8                                    20
5        3.5                                    31
6        2.6                                    26
7        8.0                                    73
8        2.4                                    23
9        4.5                                    39
10       6.7                                    58

243. {Movie Revenues Narrative} Predict with 95% confidence the gross revenue of a movie whose top two stars earn $5.0 million.

ANSWER: 45.65 ± 4.916. Thus, LCL = 40.734 (in $1,000,000s), and UCL = 50.566 (in $1,000,000s)

244. {Movie Revenues Narrative} Estimate with 95% confidence the average gross revenue of a movie whose top two stars earn $5.0 million.

ANSWER: 45.65 ± 1.54. Thus, LCL = 44.11 (in $1,000,000s), and UCL = 47.19 (in $1,000,000s)

245. {Movie Revenues Narrative} Which interval in the previous two questions is narrower: the confidence interval estimate of the expected value of y or the prediction interval for the same given value of x ($5.0 million) and same confidence level? Why?

ANSWER: The confidence interval estimate of the expected value of y is narrower than the prediction interval for the same given value of x ($5.0 million) and same confidence level. This is because there is less error in estimating a mean value as opposed to predicting an individual value.

FOR QUESTIONS 246 THROUGH 248, USE THE FOLLOWING NARRATIVE:
Narrative: Cost of Books
The editor of a major academic book publisher claims that a large part of the cost of books is the cost of paper. This implies that larger books will cost more money. As an experiment to analyze the claim, a university student visits the bookstore and records the number of pages and the selling price of twelve randomly selected books. These data are listed below.

Book    Number of Pages    Selling Price ($)
1       844                55
2       727                50
3       360                35
4       915                60
5       295                30
6       706                50
7       410                40
8       905                53
9       1058               65
10      865                54
11      677                42
12      912                58

246. {Cost of Books Narrative} Predict with 90% confidence the selling price of a book with 900 pages.

ANSWER: 56.647 ± 5.311. Thus, LCL = $51.336, and UCL = $61.958

247. {Cost of Books Narrative} Estimate with 90% confidence the average selling price of all books with 900 pages.

ANSWER: 56.647 ± 1.803. Thus, LCL = $54.844, and UCL = $58.450

248. {Cost of Books Narrative} Which interval in the previous two questions is narrower: the confidence interval estimate of the expected value of y or the prediction interval for the same given value of x (900 pages) and same confidence level? Why?

ANSWER: The confidence interval estimate of the expected value of y is narrower than the prediction interval for the same given value of x (900 pages) and same confidence level. This is because there is less error in estimating a mean value as opposed to predicting an individual value.

FOR QUESTIONS 249 THROUGH 251, USE THE FOLLOWING NARRATIVE:
Narrative: Automobile Accidents and Precipitation
A statistician investigating the relationship between the amount of precipitation (in inches) and the number of automobile accidents gathered data for 10 randomly selected days. The results are shown below.

Day    Precipitation    Number of Accidents
1      0.05             5
2      0.12             6
3      0.05             2
4      0.08             4
5      0.10             8
6      0.35             14
7      0.15             7
8      0.30             13
9      0.10             7
10     0.20             10

249. {Automobile Accidents and Precipitation Narrative} Predict with 95% confidence the number of accidents that occur when there is 0.40 inches of rain.

ANSWER: 16.316 ± 4.032. Thus, LCL = 12.284, and UCL = 20.348

250. {Automobile Accidents and Precipitation Narrative} Estimate with 95% confidence the average daily number of accidents when the daily precipitation is 0.25 inches.

ANSWER: 11.086 ± 1.377. Thus, LCL = 9.709, and UCL = 12.463

251. {Automobile Accidents and Precipitation Narrative} Which interval in the previous two questions is narrower: the confidence interval estimate of the expected value of y or the prediction interval? Why?

ANSWER: The confidence interval estimate of the expected value of y is narrower than the prediction interval. This is because there is less error in estimating a mean value as opposed to predicting an individual value. In addition, the two questions use different values of x, and 0.25 inches is closer to the sample mean precipitation (0.15 inches) than 0.40 inches is, which narrows the interval further.

FOR QUESTIONS 252 THROUGH 254, USE THE FOLLOWING NARRATIVE:
Narrative: Willie Nelson Concert
At a recent Willie Nelson concert, a survey was conducted that asked a random sample of 20 people their age and how many concerts they have attended since the first of the year. The following data were collected:

Age                   62   57   40   49   67   54   43   65   54   41
Number of Concerts     6    5    4    3    5    5    2    6    3    1

Age                   44   48   55   60   59   63   69   40   38   52
Number of Concerts     3    2    4    5    4    5    4    2    1    3

An Excel output follows:

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.80203
R Square             0.64326
Adjusted R Square    0.62344
Standard Error       0.93965
Observations         20

DESCRIPTIVE STATISTICS
                      Age        Concerts
Mean                  53         3.65
Standard Error        2.1849     0.3424
Standard Deviation    9.7711     1.5313
Sample Variance       95.4737    2.3447
Count                 20         20

SPEARMAN RANK CORRELATION COEFFICIENT = 0.8306

ANOVA
              df    SS          MS          F           Significance F
Regression    1     28.65711    28.65711    32.45653    2.1082E-05
Residual      18    15.89289    0.88294
Total         19    44.55

             Coefficients    Standard Error    t Stat      P-value    Lower 95%    Upper 95%
Intercept    -3.01152        1.18802           -2.53491    0.02074    -5.50746     -0.5156
Age          0.12569         0.02206           5.69706     0.00002    0.07934      0.1720

252. {Willie Nelson Concert Narrative} Predict with 95% confidence the number of concerts attended by a 45-year-old individual.

ANSWER: 2.645 ± 2.057. Thus, LCL = 0.588, and UCL = 4.702

253. {Willie Nelson Concert Narrative} Estimate with 95% confidence the average number of concerts attended by all 45-year-old individuals.

ANSWER: 2.645 ± 0.577. Thus, LCL = 2.068, and UCL = 3.222

254. {Willie Nelson Concert Narrative} Which interval in the previous two questions is narrower: the confidence interval estimate of the expected value of y or the prediction interval for the same given value of x (45 years of age) and same confidence level? Why?

ANSWER: The confidence interval estimate of the expected value of y is narrower than the prediction interval for the same given value of x (45 years of age) and same confidence level. This is because there is less error in estimating a mean value as opposed to predicting an individual value.
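When only summary output is available, both intervals can still be computed, because the sum of squared deviations of x equals (n − 1) times the sample variance of x. The sketch below assumes the figures printed in the Excel output for this narrative and the t-table value 2.101 (two-tailed 5%, 18 degrees of freedom); the variable names are illustrative.

```python
import math

# Summary figures copied from the Excel output for this narrative.
n = 20
intercept, slope = -3.01152, 0.12569
s_e = 0.93965            # "Standard Error" under Regression Statistics
age_mean = 53
age_variance = 95.4737   # sample variance of Age
t_critical = 2.101       # assumed t-table value, alpha = 0.05, 18 d.f.

age = 45
fitted = intercept + slope * age

# (n - 1) * s_x^2 recovers the sum of squared deviations of Age.
distance_term = (age - age_mean) ** 2 / ((n - 1) * age_variance)

ci_half = t_critical * s_e * math.sqrt(1 / n + distance_term)
pi_half = t_critical * s_e * math.sqrt(1 + 1 / n + distance_term)

print(f"fitted = {fitted:.3f}")
print(f"CI: +/- {ci_half:.3f}   PI: +/- {pi_half:.3f}")
```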

FOR QUESTIONS 255 THROUGH 257, USE THE FOLLOWING NARRATIVE:
Narrative: Oil Quality and Price
Quality of oil is measured in API gravity degrees: the higher the degrees API, the higher the quality. The table shown below is produced by an expert in the field who believes that there is a relationship between quality and price per barrel.

Oil degrees API    Price per barrel (in $)
27.0               12.02
28.5               12.04
30.8               12.32
31.3               12.27
31.9               12.49
34.5               12.70
34.0               12.80
34.7               13.00
37.0               13.00
41.0               13.17
41.0               13.19
38.8               13.22
39.3               13.27

A partial Minitab output follows:

Descriptive Statistics
Variable    N     Mean      StDev    SE Mean
Degrees     13    34.60     4.613    1.280
Price       13    12.730    0.457    0.127

Covariances
           Degrees      Price
Degrees    21.281667
Price      2.026750     0.208833

Regression Analysis
Predictor    Coef        StDev       T        P
Constant     9.4349      0.2867      32.91    0.000
Degrees      0.095235    0.008220    11.59    0.000

S = 0.1314    R-Sq = 92.46%    R-Sq(adj) = 91.7%

Analysis of Variance
Source            DF    SS        MS        F         P
Regression        1     2.3162    2.3162    134.24    0.000
Residual Error    11    0.1898    0.0173
Total             12    2.5060

255. {Oil Quality and Price Narrative} Predict with 95% confidence the oil price per barrel for an API degree of 35.

ANSWER: 12.768 ± (2.201)(0.1314)(1.038) = 12.768 ± 0.300. Thus, LCL = 12.468, and UCL = 13.068

256. {Oil Quality and Price Narrative} Estimate with 95% confidence the average oil price per barrel for an API degree of 35.

ANSWER: 12.768 ± (2.201)(0.1314)(0.2785) = 12.768 ± 0.081. Thus, LCL = 12.687, and UCL = 12.849

257. {Oil Quality and Price Narrative} Which interval in the previous two questions is narrower: the confidence interval estimate of the expected value of y or the prediction interval for the same given value of x (35 degrees API) and same confidence level? Why?

ANSWER: The confidence interval estimate of the expected value of y is narrower than the prediction interval for the same given value of x (35 degrees API) and same confidence level. This is because there is less error in estimating a mean value as opposed to predicting an individual value.

SECTION 7

MULTIPLE CHOICE QUESTIONS

In the following multiple-choice questions, please circle the correct answer.

258. The standardized residual is defined as:
a. residual divided by the standard error of estimate
b. residual multiplied by the square root of the standard error of estimate
c. residual divided by the square of the standard error of estimate
d. residual multiplied by the standard error of estimate
ANSWER: a

259. The least squares method requires that the variance of the error variable is a constant no matter what the value of x is. When this requirement is violated, the condition is called:
a. non-independence of $\varepsilon$
b. homoscedasticity
c. heteroscedasticity
d. influential observation
ANSWER: c

260. When the variance of the error variable is a constant no matter what the value of x is, this condition is called:
a. homocausality
b. heteroscedasticity
c. homoscedasticity
d. heterocausality
ANSWER: c

261. If the plot of the residuals is fan shaped, which assumption of regression analysis is violated?
a. Normality
b. Homoscedasticity
c. Independence of errors
d. No assumptions are violated, the graph should resemble a fan
ANSWER: b

262. In regression analysis we use the Spearman rank correlation coefficient to measure and test to determine whether a relationship exists between the two variables if
a. one or both variables may be ordinal
b. both variables are interval but the normality requirement is not met
c. both (a) and (b)
d. neither (a) nor (b)
ANSWER: c

263. The sample Spearman rank correlation coefficient, where a and b are the ranks of x and y, respectively, is given by
a.
b.
c.
d.
ANSWER: d

TRUE / FALSE QUESTIONS

264. The variance of the error variable is required to be constant. When this requirement is satisfied, the condition is called homoscedasticity.
ANSWER: T

265. The variance of the error variable is required to be constant. When this requirement is violated, the condition is called heteroscedasticity.
ANSWER: T

266. We standardize residuals in the same way we standardize all variables, by subtracting the mean and dividing by the variance.
ANSWER: F

267. An outlier is an observation that is unusually small or unusually large.
ANSWER: T

268. One method of diagnosing heteroscedasticity is to plot the residuals against the predicted values of y, then look for a change in the spread of the plotted values.
ANSWER: T

269. Regardless of the value of x, the standard deviation of the distribution of y values about the regression line is the same. This assumption of equal standard deviations about the regression line is called residual analysis.
ANSWER: F

270. Data that exhibit an autocorrelation effect violate the regression assumption of independence.
ANSWER: T

271. When n is greater than 30, the sample Spearman rank correlation coefficient is approximately normally distributed with mean of 0 and standard deviation of 1.
ANSWER: F

272. Given that n = 37, and the value of the sample Spearman rank correlation coefficient = 0.35, the value of the test statistic for testing $H_0: \rho_s = 0$ is z = 2.10.
ANSWER: T
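The last two statements rest on the large-sample result that, for n greater than 30, the sample Spearman coefficient is approximately normal with mean 0 and standard deviation 1/sqrt(n − 1), not 1. A minimal sketch of the resulting z statistic, with illustrative variable names, is:

```python
import math

n = 37       # sample size from the question
r_s = 0.35   # sample Spearman rank correlation coefficient

# Approximate standard deviation of r_s under H0 when n > 30.
std_dev = 1 / math.sqrt(n - 1)

# Standardizing r_s gives z = r_s * sqrt(n - 1).
z = r_s / std_dev

print(f"std dev = {std_dev:.4f}, z = {z:.2f}")
```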

273. Another name for the Pearson coefficient of correlation is the Spearman rank correlation coefficient.
ANSWER: F

STATISTICAL CONCEPTS & APPLIED QUESTIONS

FOR QUESTIONS 274 THROUGH 278, USE THE FOLLOWING NARRATIVE:
Narrative: Sales and Experience
The general manager of a chain of furniture stores believes that experience is the most important factor in determining the level of success of a salesperson. To examine this belief she records last month’s sales (in $1,000s) and the years of experience of 10 randomly selected salespeople. These data are listed below.

Salesperson    Years of Experience    Sales
1              0                      7
2              2                      9
3              10                     20
4              3                      15
5              8                      18
6              5                      14
7              12                     20
8              7                      17
9              20                     30
10             15                     25

274. {Sales and Experience Narrative} Use the regression equation to determine the predicted values of y.

ANSWER: The predicted values are: 8.630, 10.793, 19.447, 11.875, 17.284, 14.039, 21.610, 16.202, 30.264, and 24.856

275. {Sales and Experience Narrative} Use the predicted and actual values of y to calculate the residuals.

ANSWER: The residuals are: -1.630, -1.793, 0.553, 3.125, 0.716, -0.039, -1.610, 0.798, -0.264, and 0.144

276. {Sales and Experience Narrative} Plot the residuals against the predicted values of y. Does the variance appear to be constant?

ANSWER:

It appears that heteroscedasticity is not a problem.

277. {Sales and Experience Narrative} Compute the standardized residuals.

ANSWER: -1.100, -1.210, 0.373, 2.108, 0.483, -0.026, -1.086, 0.538, -0.178, and 0.097

278. {Sales and Experience Narrative} Identify possible outliers.

ANSWER: The point (3, 15) is a possible outlier since its standardized residual 2.108 exceeds 2.0.
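The standardized residuals listed for this narrative are reproduced when each residual is divided by the sample standard deviation of the residuals, sqrt(SSE/(n − 1)). That divisor is an assumption inferred from the numbers, not something the source states; dividing by the standard error of estimate, sqrt(SSE/(n − 2)), would give slightly smaller magnitudes (about 1.99 for the point (3, 15)). A sketch under that assumption:

```python
import math

experience = [0, 2, 10, 3, 8, 5, 12, 7, 20, 15]
sales = [7, 9, 20, 15, 18, 14, 20, 17, 30, 25]

n = len(experience)
x_bar = sum(experience) / n
y_bar = sum(sales) / n
sxx = sum((x - x_bar) ** 2 for x in experience)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(experience, sales))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

residuals = [y - (b0 + b1 * x) for x, y in zip(experience, sales)]
sse = sum(e ** 2 for e in residuals)

# ASSUMPTION: the divisor is the sample standard deviation of the
# residuals, sqrt(SSE / (n - 1)), chosen because it matches the listed values.
residual_sd = math.sqrt(sse / (n - 1))
standardized = [e / residual_sd for e in residuals]

# Flag observations whose standardized residual exceeds 2.0 in absolute value.
possible_outliers = [
    (x, y) for x, y, z in zip(experience, sales, standardized) if abs(z) > 2.0
]

print([round(z, 3) for z in standardized])
print("possible outliers:", possible_outliers)
```

Under either divisor the point (3, 15) has by far the largest standardized residual, but whether it crosses the 2.0 threshold depends on which definition is used.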

FOR QUESTIONS 279 THROUGH 283, USE THE FOLLOWING NARRATIVE:
Narrative: Income and Education
A professor of economics wants to study the relationship between income (y in $1000s) and education (x in years). A random sample of eight individuals is taken and the results are shown below.

Education    16    11    15     8    12    10    13    14
Income       58    40    55    35    43    41    52    49

279. {Income and Education Narrative} Use the regression equation to determine the predicted values of y.

ANSWER: The predicted values are: 57.173, 42.624, 54.263, 33.895, 45.534, 39.714, 48.444, and 51.353

280. {Income and Education Narrative} Use the predicted and actual values of y to calculate the residuals.

ANSWER: The residuals are: 0.827, -2.624, 0.737, 1.105, -2.534, 1.286, 3.556, and -2.353.

281. {Income and Education Narrative} Plot the residuals against the predicted values of y. Does the variance appear to be constant?

ANSWER:

It appears that heteroscedasticity is not a problem.

282. {Income and Education Narrative} Compute the standardized residuals.

ANSWER: 0.367, -1.164, 0.327, 0.490, -1.124, 0.570, 1.577, and -1.044

283. {Income and Education Narrative} Identify possible outliers.

ANSWER: No outliers exist, since no observation has a standardized residual whose absolute value exceeds 2.0.

FOR QUESTIONS 284 THROUGH 288, USE THE FOLLOWING NARRATIVE:
Narrative: Game Winnings and Education
An ardent fan of television game shows has observed that, in general, the more educated the contestant, the less money he or she wins. To test her belief she gathers data about the last eight winners of her favorite game show. She records their winnings in dollars and the number of years of education. The results are as follows.

Contestant    Years of Education    Winnings
1             11                    750
2             15                    400
3             12                    600
4             16                    350
5             11                    800
6             16                    300
7             13                    650
8             14                    400

284. {Game Winnings and Education Narrative} Use the regression equation to determine the predicted values of y.

ANSWER: The predicted values are: 754.167, 397.500, 665.000, 308.333, 754.167, 308.333, 575.833, and 486.667

285. {Game Winnings and Education Narrative} Use the predicted and actual values of y to calculate the residuals.

ANSWER: The residuals are: -4.167, 2.500, -65.000, 41.667, 45.833, -8.333, 74.167, and -86.667

286. {Game Winnings and Education Narrative} Plot the residuals against the predicted values. Does the variance appear to be constant?

ANSWER:

The variance appears to be constant.

287. {Game Winnings and Education Narrative} Compute the standardized residuals.

ANSWER: The standardized residuals are: -0.076, 0.045, -1.182, 0.758, 0.833, -0.152, 1.349, and -1.576.

288. {Game Winnings and Education Narrative} Identify possible outliers.

ANSWER: No outliers exist, since no observation has a standardized residual whose absolute value exceeds 2.0.

FOR QUESTIONS 289 THROUGH 293, USE THE FOLLOWING NARRATIVE:
Narrative: Movie Revenues
A financier whose specialty is investing in movie productions has observed that, in general, movies with “big-name” stars seem to generate more revenue than those movies whose stars are less well known. To examine his belief he records the gross revenue and the payment (in $ millions) given to the two highest-paid performers in the movie for ten recently released movies.

Movie    Cost of Two Highest Paid Performers    Gross Revenue
1        5.3                                    48
2        7.2                                    65
3        1.3                                    18
4        1.8                                    20
5        3.5                                    31
6        2.6                                    26
7        8.0                                    73
8        2.4                                    23
9        4.5                                    39
10       6.7                                    58

289. {Movie Revenues Narrative} Use the regression equation to determine the predicted values of y.

ANSWER: The predicted values are: 48.137, 63.878, 14.996, 19.139, 33.223, 25.767, 70.506, 24.110, 41.508, and 59.736.

290. {Movie Revenues Narrative} Use the predicted and actual values of y to calculate the residuals.

ANSWER: The residuals are: -0.137, 1.122, 3.004, 0.861, -2.223, 0.233, 2.494, -1.110, -2.508, and -1.736

291. {Movie Revenues Narrative} Plot the residuals against the predicted values of y. Does the variance appear to be constant?

ANSWER:

It appears that heteroscedasticity is not a problem.

292. {Movie Revenues Narrative} Compute the standardized residuals.

ANSWER: The standardized residuals are: -0.072, 0.588, 1.574, 0.451, -1.165, 0.122, 1.306, -0.581, -1.314, and -0.909.

293. {Movie Revenues Narrative} Identify possible outliers.

ANSWER: No outliers exist, since no observation has a standardized residual whose absolute value exceeds 2.0.

FOR QUESTIONS 294 THROUGH 301, USE THE FOLLOWING NARRATIVE:
Narrative: Willie Nelson Concert
At a recent Willie Nelson concert, a survey was conducted that asked a random sample of 20 people their age and how many concerts they have attended since the first of the year. The following data were collected:

Age                   62   57   40   49   67   54   43   65   54   41
Number of Concerts     6    5    4    3    5    5    2    6    3    1

Age                   44   48   55   60   59   63   69   40   38   52
Number of Concerts     3    2    4    5    4    5    4    2    1    3

An Excel output follows:

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.80203
R Square             0.64326
Adjusted R Square    0.62344
Standard Error       0.93965
Observations         20

DESCRIPTIVE STATISTICS
                      Age        Concerts
Mean                  53         3.65
Standard Error        2.1849     0.3424
Standard Deviation    9.7711     1.5313
Sample Variance       95.4737    2.3447
Count                 20         20

SPEARMAN RANK CORRELATION COEFFICIENT = 0.8306

ANOVA
              df    SS          MS          F           Significance F
Regression    1     28.65711    28.65711    32.45653    2.1082E-05
Residual      18    15.89289    0.88294
Total         19    44.55

             Coefficients    Standard Error    t Stat      P-value    Lower 95%    Upper 95%
Intercept    -3.01152        1.18802           -2.53491    0.02074    -5.50746     -0.5156
Age          0.12569         0.02206           5.69706     0.00002    0.07934      0.1720

294. {Willie Nelson Concert Narrative} Use the regression equation to determine the predicted values of y.

ANSWER: The predicted values are:
4.781 4.153 2.016 3.147 5.410 3.776 2.393 5.158 3.776 2.142
2.519 3.022 3.901 4.530 4.404 4.907 5.661 2.016 1.765 3.524

295. {Willie Nelson Concert Narrative} Use the predicted values and the actual values of y to calculate the residuals.

ANSWER: The residuals are:
1.219 0.847 1.984 -0.147 -0.410 1.224 -0.393 0.842 -0.776 -1.142
0.481 -1.022 0.099 0.470 -0.404 0.093 -1.661 -0.016 -0.765 -0.524

296. {Willie Nelson Concert Narrative} Plot the residuals against the predicted values $\hat{y}$.

ANSWER:

297. {Willie Nelson Concert Narrative} Does it appear that heteroscedasticity is a problem? Explain.

ANSWER: The variance of the error variable appears to be constant; therefore heteroscedasticity is not a problem.

298. {Willie Nelson Concert Narrative} Draw a histogram of the residuals.

ANSWER:

[Histogram of the residuals: horizontal axis "Residuals" with classes labeled -1, 0, 1, 2; vertical axis "Frequency" scaled from 0 to 10]

299. {Willie Nelson Concert Narrative} Does it appear that the errors are normally distributed? Explain.

ANSWER: The histogram is positively skewed. The errors may not be normally distributed.

300. {Willie Nelson Concert Narrative} Use the residuals to compute the standardized residuals.

ANSWER: The standardized residuals are:1.297 0.902 2.111 -0.157 -0.436 1.303 -0.418 0.896 -0.826 -1.2150.512 -1.087 0.105 0.500 -0.430 0.099 -1.768 -0.017 -0.814 -0.558

301. {Willie Nelson Concert Narrative} Identify possible outliers.

ANSWER: Observation 3 is a possible outlier: its standardized residual of 2.111 is the only one of the 20 whose absolute value exceeds 2.0.
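The standardized residuals of question 300 and the screening rule of question 301 can be reproduced by dividing each residual by the standard error of estimate, s_e = sqrt(MSE), and flagging values with |z| > 2.0. The sketch below takes MSE = 0.88294 from the Residual row of the ANOVA table. The variable names are illustrative, and small differences in the third decimal come from working with rounded residuals.

```python
import math

# Residuals as listed in question 295; MSE from the ANOVA table (Residual row).
residuals = [1.219, 0.847, 1.984, -0.147, -0.410, 1.224, -0.393, 0.842, -0.776, -1.142,
             0.481, -1.022, 0.099, 0.470, -0.404, 0.093, -1.661, -0.016, -0.765, -0.524]
ms_residual = 0.88294

s_e = math.sqrt(ms_residual)                   # standard error of estimate
standardized = [e / s_e for e in residuals]    # residual divided by s_e

# Observation numbers count from 1 in the order the residuals are listed.
THRESHOLD = 2.0
flagged = [(i, round(z, 3)) for i, z in enumerate(standardized, start=1)
           if abs(z) > THRESHOLD]

print(f"s_e = {s_e:.4f}")
print("flagged observations (number, standardized residual):", flagged)
```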

FOR QUESTIONS 302 THROUGH 309, USE THE FOLLOWING NARRATIVE:
Narrative: Oil Quality and Price
Quality of oil is measured in API gravity degrees: the higher the degrees API, the higher the quality. The table shown below is produced by an expert in the field who believes that there is a relationship between quality and price per barrel.

Oil degrees API   Price per barrel (in $)
27.0              12.02
28.5              12.04
30.8              12.32
31.3              12.27
31.9              12.49
34.5              12.70
34.0              12.80
34.7              13.00
37.0              13.00
41.0              13.17
41.0              13.19
38.8              13.22
39.3              13.27

A partial Minitab output follows:

Descriptive Statistics
Variable   N    Mean     StDev   SE Mean
Degrees    13   34.60    4.613   1.280
Price      13   12.730   0.457   0.127

Covariances
           Degrees     Price
Degrees    21.281667
Price       2.026750   0.208833

Regression Analysis
Predictor   Coef       StDev      T       P
Constant    9.4349     0.2867     32.91   0.000
Degrees     0.095235   0.008220   11.59   0.000

S = 0.1314   R-Sq = 92.4%   R-Sq(adj) = 91.7%

Analysis of Variance

Source           DF   SS       MS       F        P
Regression        1   2.3162   2.3162   134.24   0.000
Residual Error   11   0.1898   0.0173
Total            12   2.5060
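The regression quantities in the Minitab output follow from the descriptive statistics and the covariance matrix alone: b1 = cov(x, y) / var(x), b0 = ȳ - b1·x̄, SST = (n - 1)·var(y) and SSR = b1·(n - 1)·cov(x, y). The sketch below recomputes them from the printed summary values. The variable names are illustrative, and results agree with the output only up to rounding.

```python
import math

# Summary values copied from the Minitab output.
n = 13
mean_x, mean_y = 34.60, 12.730        # means of Degrees and Price
var_x, var_y = 21.281667, 0.208833    # diagonal of the covariance matrix
cov_xy = 2.026750                     # covariance of Degrees and Price

slope = cov_xy / var_x                        # b1
intercept = mean_y - slope * mean_x           # b0

ss_total = (n - 1) * var_y                    # SST
ss_regression = slope * (n - 1) * cov_xy      # SSR
ss_residual = ss_total - ss_regression        # SSE

s = math.sqrt(ss_residual / (n - 2))          # standard error of estimate
se_slope = s / math.sqrt((n - 1) * var_x)     # standard error of b1
t_slope = slope / se_slope
r_squared = ss_regression / ss_total

print(f"b1 = {slope:.6f}, b0 = {intercept:.4f}")
print(f"SSR = {ss_regression:.4f}, SSE = {ss_residual:.4f}, SST = {ss_total:.4f}")
print(f"S = {s:.4f}, SE(b1) = {se_slope:.6f}, t = {t_slope:.2f}, R^2 = {r_squared:.4f}")
```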

302. {Oil Quality and Price Narrative} Use the regression equation to determine the predicted values of y.

ANSWER: The predicted values are: 12.006, 12.149, 12.368, 12.416, 12.473, 12.721, 12.673, 12.740, 12.959, 13.340, 13.340, 13.130, and 13.178.
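Each predicted value comes from substituting a degrees-API reading into the fitted line ŷ = 9.4349 + 0.095235x, with the coefficients taken from the Minitab output. A minimal sketch (the constant and list names are illustrative):

```python
# Degrees API from the data table, in the order listed.
degrees = [27.0, 28.5, 30.8, 31.3, 31.9, 34.5, 34.0, 34.7, 37.0, 41.0, 41.0, 38.8, 39.3]

# Coefficients from the Minitab regression output.
INTERCEPT = 9.4349
SLOPE = 0.095235

# Predicted price per barrel for each observation, rounded to three decimals.
predicted = [round(INTERCEPT + SLOPE * x, 3) for x in degrees]

for x, y_hat in zip(degrees, predicted):
    print(f"x = {x:4.1f}  predicted price = {y_hat:.3f}")
```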

303. {Oil Quality and Price Narrative} Use the predicted values and the actual values of y to calculate the residuals.

ANSWER: The residuals are: 0.014, -0.109, -0.048, -0.146, 0.017, -0.021, 0.127, 0.260, 0.041, -0.170, -0.150, 0.090, and 0.092.
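Each residual is the actual price minus the predicted price, e = y - ŷ, and the squared residuals should add up to the Residual Error sum of squares in the ANOVA table (0.1898). The sketch below uses the prices from the data table and the predicted values from question 302 (the list names are illustrative):

```python
# Actual prices from the data table and predicted values from question 302.
prices = [12.02, 12.04, 12.32, 12.27, 12.49, 12.70, 12.80,
          13.00, 13.00, 13.17, 13.19, 13.22, 13.27]
predicted = [12.006, 12.149, 12.368, 12.416, 12.473, 12.721, 12.673,
             12.740, 12.959, 13.340, 13.340, 13.130, 13.178]

# Residual = actual - predicted, rounded to three decimals.
residuals = [round(y - y_hat, 3) for y, y_hat in zip(prices, predicted)]

# Sum of squared residuals (SSE), to compare with the ANOVA table.
sse = sum(e * e for e in residuals)

print("residuals:", residuals)
print(f"SSE = {sse:.4f}")
```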

304. {Oil Quality and Price Narrative} Plot the residuals against the predicted values ŷ.


ANSWER:

305. {Oil Quality and Price Narrative} Does it appear that heteroscedasticity is a problem? Explain.

ANSWER: The variance of the error variable appears to be constant; therefore heteroscedasticity is not a problem.

306. {Oil Quality and Price Narrative} Draw a histogram of the residuals.

ANSWER:

307. {Oil Quality and Price Narrative} Does it appear that the errors are normally distributed? Explain.

[Figure: Residuals Versus the Fitted Values (response is Price); horizontal axis: Fitted Value (12.0 to 13.4), vertical axis: Residual (-0.2 to 0.3)]

[Figure: Histogram of the Residuals (response is Price); horizontal axis: Residual (-0.2 to 0.3), vertical axis: Frequency (0 to 5)]


ANSWER: The histogram is fairly symmetric; therefore we may conclude that the errors are normally distributed.

308. {Oil Quality and Price Narrative} Use the residuals to compute the standardized residuals.

ANSWER: The standardized residuals are: 0.105, -0.830, -0.366, -1.109, 0.130, -0.156, 0.967, 1.982, 0.315, -1.290, -1.138, 0.685, and 0.703.

309. {Oil Quality and Price Narrative} Identify possible outliers.

ANSWER: There are no outliers since none of the 13 observations has a standardized residual whose absolute value exceeds 2.0. However, observation 8, with a standardized residual of 1.982, is close to the threshold and may be an outlier.
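To see which observation lies closest to the cutoff, the standardized residuals from question 308 can be scanned for the largest absolute value. In the sketch below, positions count from 1 in the order the values are listed, and the variable names are illustrative.

```python
# Standardized residuals as listed in question 308.
standardized = [0.105, -0.830, -0.366, -1.109, 0.130, -0.156, 0.967,
                1.982, 0.315, -1.290, -1.138, 0.685, 0.703]

THRESHOLD = 2.0

# Positions (1-based) whose absolute standardized residual exceeds the threshold.
beyond = [i for i, z in enumerate(standardized, start=1) if abs(z) > THRESHOLD]

# Position and value of the largest absolute standardized residual.
largest_pos, largest_z = max(enumerate(standardized, start=1), key=lambda p: abs(p[1]))

print("beyond threshold:", beyond)
print(f"largest |z|: position {largest_pos}, value {largest_z}")
```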