technology - winona state universitycourse1.winona.edu/cmalone/stat360/notes/handout6.docx · web...


Handout #6: Confidence and Prediction Intervals for Predictions

Section 6.1 : Modeling Used Car Prices

Example 6.1: The first example in this handout will use the CarPrices dataset from our course website. This dataset includes various variables that are thought to influence the price of a vehicle. A snippet of the data is provided here.

Simple Linear Regression Setup

Model to be fit using only used cars, i.e., New=No.
Response Variable: Price
Predictor Variable: Miles
Assume the following structure for the mean and variance functions:

o $E(\text{Price} \mid \text{Miles}, \text{New}=\text{No}) = \beta_0 + \beta_1 \cdot \text{Miles}$
o $Var(\text{Price} \mid \text{Miles}, \text{New}=\text{No}) = \sigma^2$

The first step in running an analysis in JMP is to subset or filter the original dataset to include only the used vehicles. This can be done easily in JMP. First, select Analyze > Distribution. Place New in the Y, Columns box and click OK.

Select Analyze > Distribution, place New in Y, Columns box.

Double-clicking on New=No will create a new dataset that includes only the New=No vehicles.
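For readers working outside JMP, the same subsetting idea can be sketched in Python. The rows below are made up purely for illustration (only the column names Price, Miles, and New come from the handout), so this is a sketch of the filtering step rather than a reproduction of the CarPrices data.

```python
# Made-up rows for illustration only; the real CarPrices file has more rows and columns.
rows = [
    {"Price": 24500, "Miles": 12, "New": "Yes"},
    {"Price": 15200, "Miles": 48500, "New": "No"},
    {"Price": 9800, "Miles": 101300, "New": "No"},
]

# Keep only the used vehicles, mirroring the New=No subset created in JMP.
used = [row for row in rows if row["New"] == "No"]

print(len(used), "used vehicles")
```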


JMP clearly names this new subset of the original data. This new subset is called CarPrices (New=No). We can see below this dataset includes 170 of the observations from the original CarPrices dataset.

First, consider a scatterplot to visualize the relationship between Price | Miles, New=No.

Questions

1. What is the general pattern/trend/relationship between Price and Miles?

2. Do you think the assumed mean function stated above is appropriate? How about the assumed form for the variance function? Discuss.


Lowess smoother for Price | Miles, New=No

Lowess smoother of the variability

Consider again the much simpler form for the mean function.

$E(\text{Price} \mid \text{Miles}, \text{New}=\text{No}) = \beta_0 + \beta_1 \cdot \text{Miles}$

To fit this model in JMP, select Analyze > Fit Model. Put Price, i.e. the response, in the Y box, and Miles, i.e. the predictor variable, in the Construct Model Effect box. Click Run.
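As a rough illustration of what a simple linear regression fit computes, the sketch below implements ordinary least squares for one predictor in plain Python. The function name `fit_simple_linear` and the (Miles, Price) pairs are hypothetical and are not taken from CarPrices or from JMP; the root mean square error here uses n − 2 in the denominator, which is the usual convention for simple linear regression.

```python
import math

def fit_simple_linear(x, y):
    """Least-squares estimates for E(y | x) = b0 + b1 * x, plus the RMSE."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                 # estimated slope
    b0 = y_bar - b1 * x_bar        # estimated y-intercept
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    rmse = math.sqrt(sse / (n - 2))  # square root of the estimated variance
    return b0, b1, rmse

# Hypothetical (Miles, Price) pairs, not taken from CarPrices.
miles = [20000, 45000, 60000, 90000, 120000]
price = [19500, 16000, 15500, 11000, 9000]
b0, b1, rmse = fit_simple_linear(miles, price)
print(b0, b1, rmse)
```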


The distribution of Price | Miles, New=No with the estimated mean function and the summary of fit output as provided by JMP.

Questions

3. Does a linear mean function appear to fit this data well? Discuss.

4. The New=No filtered dataset had 170 observations (see JMP spreadsheet on p2), but it appears only 169 were used in our regression analysis (see Observations (or Sum Wgts) = 169). Why is this the case?

5. What is the best estimate for the variance in the conditional distribution, i.e., $Var(\text{Price} \mid \text{Miles}, \text{New}=\text{No}) = \sigma^2$?

6. The standard deviation is simply the square root of the variance. Compute this quantity for the conditional distribution given here. That is, compute $\sqrt{Var(\text{Price} \mid \text{Miles}, \text{New}=\text{No})} = \sqrt{\sigma^2}$. Does this match the Root Mean Square Error computed by JMP?

7. What is the interpretation of the Root Mean Square Error quantity in the context of this problem?

8. What is the interpretation of the $R^2$ value for this model?


Consider again the form of the mean function and the parameter estimate portion of the output provided by JMP.

$E(\text{Price} \mid \text{Miles}, \text{New}=\text{No}) = \beta_0 + \beta_1 \cdot \text{Miles}$

Questions

9. What is the best estimate for the slope of the true mean function? That is, what is $\hat{\beta}_1$?

10. What is the best estimate for the y-intercept of the true mean function? That is, what is $\hat{\beta}_0$?

11. Write out the estimated mean function using the estimated parameters.

12. Interpret, in context and using layman's language, the slope in the above equation.

13. Interpret, in context and using layman's language, the y-intercept in the above equation.

The 95% confidence intervals for the parameters in the true mean function can be obtained by selecting Show All Confidence Intervals under Regression Reports from the red drop-down menu in JMP.


Questions

14. Interpret, in context and using layman's language, the 95% confidence interval for $\beta_1$.

15. Interpret, in context and using layman's language, the 95% confidence interval for $\beta_0$.

The estimated mean function is given by the quantity

$\hat{E}(\text{Price} \mid \text{Miles}, \text{New}=\text{No}) = \hat{\beta}_0 + \hat{\beta}_1 \cdot \text{Miles}$

This quantity can be used to estimate the average Price of a used car with 50,000 Miles. The math is shown here.

$\hat{E}(\text{Price} \mid \text{Miles}=50000, \text{New}=\text{No}) = 20889 - 0.10 \cdot 50000 = 15889$

The $15,889 value appears to be reasonable when we isolate vehicles near 50,000 miles in our original scatterplot.
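The arithmetic for this estimate can be checked with a few lines of Python. The function name `estimated_mean_price` is a hypothetical helper, and the default coefficients are the rounded estimates quoted in the handout (20889 and −0.10), so results from JMP's unrounded estimates may differ slightly.

```python
def estimated_mean_price(miles, b0=20889.0, b1=-0.10):
    """Estimated mean Price for used cars, using the rounded estimates from the handout."""
    return b0 + b1 * miles

print(estimated_mean_price(50000))
```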


Confidence Interval for Predictions

Akin to all other estimated quantities, we can expect variation to exist in this estimate. That is, a different random sample will produce a different estimated price for a vehicle with 50,000 miles.

A 95% confidence interval for the quantity

$E(\text{Price} \mid \text{Miles}=50000, \text{New}=\text{No})$

can be obtained directly in JMP by selecting Mean Confidence Interval from the Save Columns menu item.


A 95% confidence interval for every observation in the dataset is provided in the JMP spreadsheet.

The JMP spreadsheet can be sorted by Miles in order to more easily find a 95% confidence interval for vehicles with 50,000 miles.

From a review of the table above, we can see there are no observations that have exactly 50,000 miles. That is, none of the confidence intervals provided are correct for the quantity

$E(\text{Price} \mid \text{Miles}=50000, \text{New}=\text{No})$

Obtaining Output for a New Observation

To obtain a prediction, standard error, and confidence interval for a new observation, use the Formula versions of these quantities as provided in JMP.


Next, create a pseudo observation in JMP with the desired characteristics. For our example here, this pseudo observation will have Miles = 50000.

Questions

16. Does the predicted price (aside from rounding) provided by JMP agree with what we computed above? Discuss.

17. What is the standard error for $E(\text{Price} \mid \text{Miles}=50000, \text{New}=\text{No})$? Give a practical interpretation of this quantity.

The formula for the 95% normal-based confidence interval for the average predicted value is given by

o Lower Limit = Predicted Value − c × Standard Error
o Upper Limit = Predicted Value + c × Standard Error

where c is the 97.5th percentile from a t-distribution with n − 2 degrees of freedom.

Task: Verify the calculations for the 95% confidence interval for the average predicted value here.

Lower Limit:

Upper Limit:

t-distribution with df = 169 – 2 = 167

In Excel:
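For readers without a spreadsheet at hand, the critical value c can also be approximated in plain Python. The sketch below is one possible approach, not the handout's method: it integrates the t density numerically with Simpson's rule and finds the percentile by bisection. The helper names `t_pdf`, `t_cdf`, and `t_quantile` are hypothetical, and the routine assumes a percentile above 0.5.

```python
import math

def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log(1 + x * x / df))

def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0, using Simpson's rule on [0, x] (steps must be even)."""
    h = x / steps
    total = t_pdf(0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_quantile(p, df):
    """Percentile of the t-distribution for p > 0.5, found by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# 97.5th percentile with df = 169 - 2 = 167
c = t_quantile(0.975, 169 - 2)
print(round(c, 4))
```

The printed value should agree, up to rounding, with the critical value used in the interval calculations in this handout.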


18. Interpret the 95% confidence interval for the average predicted value for a used vehicle with 50,000 miles, i.e., the 95% confidence interval for the quantity $E(\text{Price} \mid \text{Miles}=50000, \text{New}=\text{No})$.

Comment: The above confidence interval is not appropriate when attempting to predict the Price for a single vehicle; instead, it is a reasonable range of values for the average predicted price for vehicles with 50,000 miles.

Confidence Interval: Average Predicted Price (all vehicles that have 50,000 Miles)

Prediction Interval: Single Predicted Price (a single vehicle with 50,000 Miles)

An overlay plot of the data, the estimated mean function, and the 95% confidence interval for the average predicted price for vehicles with 50,000 miles.


Consider the plot of the estimated mean function over repeated samples. These plots were introduced in a previous handout.

Notice that the variation in the estimated mean function is smaller for some values of Miles and larger for others. In particular, the variation is smallest near the average miles and increases as miles moves away from the average in either direction. The reason for this is that all estimated linear mean functions must pass through the point (Average Miles, Average Price). The average number of miles for our dataset is 78,442; thus, the standard error for the average prediction will be smallest when making predictions for vehicles near 78,000 miles.
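This pattern can also be read off the standard textbook formula for the standard error of the estimated mean at Miles = $x_0$ in simple linear regression. The formula is offered here as a supplement and is not taken from the handout; $\hat{\sigma}$ denotes the Root Mean Square Error, $x_i$ the observed Miles values, and $\bar{x}$ the average miles.

$$SE\left(\hat{E}(\text{Price} \mid \text{Miles}=x_0, \text{New}=\text{No})\right) = \hat{\sigma}\sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}}$$

The second term under the square root is zero when $x_0 = \bar{x}$ and grows as $x_0$ moves away from $\bar{x}$ in either direction, which matches the widening seen in the repeated-sample plot.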

The 95% confidence interval bands for an average prediction across all miles.


Prediction Interval for Predictions

Recall that a confidence interval is the appropriate quantity when interested in the average predicted price for vehicles with 50,000 miles.

$E(\text{Price} \mid \text{Miles}=50000, \text{New}=\text{No})$

However, a prediction interval is necessary when attempting to make predictions for a single vehicle.

$\text{Price} \mid \text{Miles}=50000, \text{New}=\text{No}$

The 95% prediction intervals can be obtained in JMP by selecting Indiv Confidence Interval from the Save Columns menu.

The prediction intervals are placed in the spreadsheet in JMP.


Again, sort the spreadsheet by Miles to find the prediction intervals for used vehicles with close to 50,000 miles.

Similar to what was done above, in order to obtain a 95% prediction interval for a new observation not currently in the dataset, you must use the Formula versions of these quantities.

Obtaining a prediction interval for a vehicle with 50,000 miles.


The formula for the 95% normal-based prediction interval for the single value is given by

o Lower Limit = Predicted Value − c × Standard Error
o Upper Limit = Predicted Value + c × Standard Error

where c is the 97.5th percentile from a t-distribution with n − 2 degrees of freedom.

For some reason, JMP does not compute the standard error for an individual prediction – notice the dot in the StdErr Indiv Price column for our new observation. The standard error for a prediction interval is computed as follows.

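$$\text{Standard Error} = \sqrt{\underbrace{Var\left(\hat{E}(\text{Price} \mid \text{Miles}=50000, \text{New}=\text{No})\right)}_{\text{Variability in Mean Function}} + \underbrace{Var(\text{Price} \mid \text{Miles}, \text{New}=\text{No})}_{\text{Variability in individual observations}}} = 4869.22$$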
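The way the two sources of variability combine can be sketched in Python. The function name `prediction_standard_error` is a hypothetical helper, and the inputs below are made-up round numbers chosen only to show the arithmetic; they are not the values behind the 4869.22 reported in the handout.

```python
import math

def prediction_standard_error(se_mean, rmse):
    """Combine variability in the mean function with variability in individual observations."""
    return math.sqrt(se_mean ** 2 + rmse ** 2)

# Hypothetical inputs: standard error of the mean prediction and Root Mean Square Error.
print(prediction_standard_error(300.0, 400.0))
```

Because the Root Mean Square Error term is typically much larger than the standard error of the mean prediction, the prediction interval is usually far wider than the confidence interval at the same Miles value.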

The actual calculations for the 95% prediction interval.

Lower Limit: $15,882.72 − (1.9743 × 4869.22) = $6,269.42

Upper Limit: $15,882.72 + (1.9743 × 4869.22) = $25,496.02

t-distribution with df = 169 – 2 = 167

In Excel:
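The prediction interval limits can be verified with a short Python check. All three inputs are the values quoted in this handout; the variable names are just illustrative.

```python
predicted = 15882.72   # predicted Price at Miles = 50000, from the handout
c = 1.9743             # 97.5th percentile of the t-distribution with 167 df, from the handout
std_error = 4869.22    # standard error for an individual prediction, from the handout

margin = c * std_error
lower = predicted - margin
upper = predicted + margin
print(round(lower, 2), round(upper, 2))
```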

Questions

19. Interpret the above 95% prediction interval for a single used vehicle with 50,000 miles, i.e., the 95% prediction interval for $\text{Price} \mid \text{Miles}=50000, \text{New}=\text{No}$.

20. In the context of this example, explicitly explain the difference in the scope of inference between the 95% confidence interval and the 95% prediction interval.


Visual contrast of the 95% confidence interval and 95% prediction interval for used vehicles with 50,000 miles.

Plotting the 95% confidence and prediction bands across all observations in our dataset in JMP can be done using an overlay plot.

Overlay plot in JMP
