Department of Applied Mathematics
Marjorie Chiu, 2009

Chapter 4. Linear regression and correlation analysis
4.1. Linear regression
Linear regression is applied to infer the relationship between the dependent variable
and the regressors, using a set of sample data. Predictions can be made from the
fitted model. Generally, the true model is

$$Y = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p + E = \mu_{x_1,\ldots,x_p} + E,$$

where $Y$ is the dependent (response) variable that depends on the $x_i$'s
(independent variables, regressors or predictors) and $E$ is the error, which is
normally distributed with zero mean and independent of the regressors. Given a set
of data of size n, the model becomes

$$y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} + \varepsilon_i, \quad i = 1, \ldots, n, \quad \varepsilon_i \sim N(0, \sigma^2) \text{ independently};$$
or in matrix form

$$\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} =
\begin{pmatrix} 1 & x_{11} & \cdots & x_{1p} \\ 1 & x_{21} & \cdots & x_{2p} \\ \vdots & \vdots & & \vdots \\ 1 & x_{n1} & \cdots & x_{np} \end{pmatrix}
\begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \end{pmatrix} +
\begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix},$$

that is, $\mathbf{y} = X\boldsymbol{\beta} + \boldsymbol{\varepsilon}$.
The vector of least squares estimates $\mathbf{b} = (b_0, b_1, \ldots, b_p)'$ is given by

$$\mathbf{b} = (X'X)^{-1} X' \mathbf{y}.$$
The fitted regression equation is

$$\hat{y}_i = b_0 + b_1 x_{i1} + \cdots + b_p x_{ip}$$

with residual $e_i = y_i - \hat{y}_i$. The fit of the regression line is measured by the
coefficient of determination $R^2$, where $0 \le R^2 \le 1$.
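The least squares computation above is easy to sketch outside SPSS. The following Python/NumPy fragment (not part of the original SPSS workflow; the data are hypothetical) forms the design matrix, solves the normal equations for b, and computes R²:

```python
import numpy as np

# Hypothetical data: n = 5 observations, one regressor.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.0, 9.9])

# Design matrix X with a leading column of ones for the intercept b0.
X = np.column_stack([np.ones_like(x), x])

# Least squares estimate b = (X'X)^{-1} X'y, via the normal equations.
b = np.linalg.solve(X.T @ X, X.T @ y)

# Fitted values, residuals and the coefficient of determination R^2.
y_hat = X @ b
e = y - y_hat
R2 = 1 - np.sum(e**2) / np.sum((y - y.mean())**2)
print(b, R2)
```

For larger problems, `np.linalg.lstsq` is numerically preferable to forming X'X explicitly, but the normal-equation form mirrors the formula in the text.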
Example 1: (car.sav)
• Fit a multiple linear regression model of miles on weight and temperature
Analyze
Regression
Linear
Dependent: miles
Independents: weight, temperature
Method: enter
• Analyze the results
Remark: When there are a number of candidate variables that might be considered as
regressors in the regression model, we can use the Enter, Stepwise, Backward or
Forward method to choose an appropriate subset of variables for the model.
Regression (Enter)

Variables Entered/Removed(b)
Model  Variables Entered                          Variables Removed  Method
1      Temperature in Fahrenheit, Weight in tons  .                  Enter
a. All requested variables entered.
b. Dependent Variable: Miles per gallon

Model Summary
Model  R      R Square  Adjusted R Square  Std. Error of the Estimate
1      .795a  .632      .527               .7412
a. Predictors: (Constant), Temperature in Fahrenheit, Weight in tons

Notes: R is the absolute value of the correlation coefficient. R Square is the
proportion of variability in y explained by the linear model. Adjusted R Square
measures the goodness of fit, useful in multiple regression, adjusted by the
degrees of freedom (number of regressors).
63.2% of the variation of “mile” is explained by the regression model.

The ANOVA (Analysis of Variance) approach is used to test the hypotheses

$H_0: \beta_1 = \beta_2 = \cdots = \beta_p = 0$
$H_1:$ at least one of the $\beta_i$'s is non-zero.

The significance of an individual regressor is tested by the t-test:

$H_0: \beta_i = 0$
$H_1: \beta_i \neq 0$
The fitted regression equation is

$$\widehat{\text{mile}} = 21.994 - 2.848 \times \text{weight} - 0.00762 \times \text{temperature}.$$

If the temperature is fixed and the weight of an automobile is increased by 1 ton, the
gasoline mileage is expected to decrease by 2.848 miles per gallon. If the weight of an
automobile is unchanged, an increase in temperature of one degree Fahrenheit will
decrease the gasoline mileage by 0.00762 miles per gallon. From the t-test results
below, only “weight” is significant; “temperature” is not significant at the 5%
level of significance.
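As a quick check of the interpretation above, the fitted equation can be evaluated directly. The sketch below uses the coefficients from the Enter-method output; the 1.5-ton car at 60 °F is a hypothetical illustration, not a data point from car.sav:

```python
# Coefficients from the fitted equation above.
b0, b_weight, b_temp = 21.994, -2.848, -0.00762

def predict_mile(weight, temperature):
    """Evaluate mile-hat = b0 + b1*weight + b2*temperature."""
    return b0 + b_weight * weight + b_temp * temperature

# Hypothetical car: 1.5 tons, operated at 60 degrees Fahrenheit.
mile_hat = predict_mile(1.5, 60)
print(mile_hat)
```

Adding one ton to the weight lowers the prediction by 2.848, exactly as the slope interpretation states.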
Coefficients(a)
Model 1                    B           Std. Error  Beta   t       Sig.
(Constant)                 21.994      1.487              14.794  .000
Weight in tons             -2.848      .845        -.774  -3.372  .012
Temperature in Fahrenheit  -7.620E-03  .012        -.147  -.642   .541
a. Dependent Variable: Miles per gallon

Note: The t-tests assess the significance of the individual regressors in the model.
The variable “temperature” is not significant at the 5% level of significance.

ANOVA(b)
Model 1     Sum of Squares  df  Mean Square  F      Sig.
Regression  6.614           2   3.307        6.020  .030a
Residual    3.846           7   .549
Total       10.460          9
a. Predictors: (Constant), Temperature in Fahrenheit, Weight in tons
b. Dependent Variable: Miles per gallon

Note: F is the test statistic for the significance of the regression model. The
probability of F greater than 6.020 equals 0.030; therefore the regression model is
significant at the 5% level of significance.
Regression (Stepwise)

Variables Entered/Removed(a)
Model  Variables Entered  Variables Removed  Method
1      Weight in tons     .                  Stepwise (Criteria: Probability-of-F-to-enter <= .050, Probability-of-F-to-remove >= .100)
a. Dependent Variable: Miles per gallon

Model Summary
Model  R      R Square  Adjusted R Square  Std. Error of the Estimate
1      .781a  .611      .562               .7134
a. Predictors: (Constant), Weight in tons
ANOVA(b)
Model 1     Sum of Squares  df  Mean Square  F       Sig.
Regression  6.388           1   6.388        12.550  .008a
Residual    4.072           8   .509
Total       10.460          9
a. Predictors: (Constant), Weight in tons
b. Dependent Variable: Miles per gallon

Coefficients(a)
Model 1         B       Std. Error  Beta   t       Sig.
(Constant)      21.639  1.329              16.284  .000
Weight in tons  -2.876  .812        -.781  -3.543  .008
a. Dependent Variable: Miles per gallon
The fitted regression equation is $\widehat{\text{mile}} = 21.639 - 2.876 \times \text{weight}$.
If the weight of an automobile is increased by 1 ton, the gasoline mileage is expected
to decrease by 2.876 miles per gallon.
Regression (Backward)

Excluded Variables(b)
Model 1                    Beta In  t      Sig.  Partial Correlation  Tolerance
Temperature in Fahrenheit  -.147a   -.642  .541  -.236                .997
a. Predictors in the Model: (Constant), Weight in tons
b. Dependent Variable: Miles per gallon

Notes: Tolerance is the proportion of a variable's variance not accounted for by the
other independent variables in the equation (weight). Partial correlation is the
correlation that remains between two variables (mile and temperature) after removing
the correlation that is due to their mutual association with the other variables
(weight).
Variables Entered/Removed(b)
Model  Variables Entered                          Variables Removed          Method
1      Temperature in Fahrenheit, Weight in tons  .                          Enter
2      .                                          Temperature in Fahrenheit  Backward (criterion: Probability of F-to-remove >= .100)
a. All requested variables entered.
b. Dependent Variable: Miles per gallon

Model Summary
Model  R      R Square  Adjusted R Square  Std. Error of the Estimate
1      .795a  .632      .527               .7412
2      .781b  .611      .562               .7134
a. Predictors: (Constant), Temperature in Fahrenheit, Weight in tons
b. Predictors: (Constant), Weight in tons
The fitted regression equation is $\widehat{\text{mile}} = 21.639 - 2.876 \times \text{weight}$.
ANOVA(c)
Model 1     Sum of Squares  df  Mean Square  F       Sig.
Regression  6.614           2   3.307        6.020   .030a
Residual    3.846           7   .549
Total       10.460          9
Model 2
Regression  6.388           1   6.388        12.550  .008b
Residual    4.072           8   .509
Total       10.460          9
a. Predictors: (Constant), Temperature in Fahrenheit, Weight in tons
b. Predictors: (Constant), Weight in tons
c. Dependent Variable: Miles per gallon

Coefficients(a)
Model 1                    B           Std. Error  Beta   t       Sig.
(Constant)                 21.994      1.487              14.794  .000
Weight in tons             -2.848      .845        -.774  -3.372  .012
Temperature in Fahrenheit  -7.620E-03  .012        -.147  -.642   .541
Model 2
(Constant)                 21.639      1.329              16.284  .000
Weight in tons             -2.876      .812        -.781  -3.543  .008
a. Dependent Variable: Miles per gallon

Excluded Variables(b)
Model 2                    Beta In  t      Sig.  Partial Correlation  Tolerance
Temperature in Fahrenheit  -.147a   -.642  .541  -.236                .997
a. Predictors in the Model: (Constant), Weight in tons
b. Dependent Variable: Miles per gallon
Regression (Forward)

The fitted regression equation is $\widehat{\text{mile}} = 21.639 - 2.876 \times \text{weight}$.

Variables Entered/Removed(a)
Model  Variables Entered  Variables Removed  Method
1      Weight in tons     .                  Forward (Criterion: Probability-of-F-to-enter <= .050)
a. Dependent Variable: Miles per gallon

Model Summary
Model  R      R Square  Adjusted R Square  Std. Error of the Estimate
1      .781a  .611      .562               .7134
a. Predictors: (Constant), Weight in tons

ANOVA(b)
Model 1     Sum of Squares  df  Mean Square  F       Sig.
Regression  6.388           1   6.388        12.550  .008a
Residual    4.072           8   .509
Total       10.460          9
a. Predictors: (Constant), Weight in tons
b. Dependent Variable: Miles per gallon

Coefficients(a)
Model 1         B       Std. Error  Beta   t       Sig.
(Constant)      21.639  1.329              16.284  .000
Weight in tons  -2.876  .812        -.781  -3.543  .008
a. Dependent Variable: Miles per gallon

Excluded Variables(b)
Model 1                    Beta In  t      Sig.  Partial Correlation  Tolerance
Temperature in Fahrenheit  -.147a   -.642  .541  -.236                .997
a. Predictors in the Model: (Constant), Weight in tons
b. Dependent Variable: Miles per gallon
Using these variable selection methods, only one regressor, weight, is significant
in the regression model.
4.2. Diagnosis of the regression model
In fitting the regression model, the error terms are assumed to follow a normal
distribution with zero mean, have constant variance, and be independent. We
should check whether these assumptions are fulfilled, as violations may affect the
validity of the regression analysis.
Analyze
Regression
Linear
Select the Durbin-Watson statistic to test for first-order autocorrelation of the
errors. Use the residual plots (normal probability plot or histogram) to examine the
distribution of the errors and check whether they follow a normal distribution. Plot
the standardized residuals against the standardized fitted values of the dependent
variable to see whether the error variance is constant. Save the standardized
residuals to detect outliers. Use scatter plots to obtain further residual plots of
the standardized residuals against the regressors.
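The Durbin-Watson statistic itself is simple to compute outside SPSS. The following Python sketch (the residual series is hypothetical, chosen to show strong positive autocorrelation) implements the definition:

```python
import numpy as np

def durbin_watson(residuals):
    # DW = sum_{t=2..n} (e_t - e_{t-1})^2 / sum_t e_t^2.
    # Values near 2 suggest no first-order autocorrelation;
    # values near 0 suggest positive first-order autocorrelation.
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

# Hypothetical residual series drifting smoothly from positive to negative:
# successive residuals are close together, so DW is far below 2.
e_series = [1.0, 0.9, 0.7, 0.3, -0.2, -0.6, -0.9, -1.0]
dw = durbin_watson(e_series)
print(dw)
```

The value is then compared with the dL and dU bounds from a Durbin-Watson table, as done in Example 2 below.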
Remark: Remedial action should be taken if the assumptions about the error are
violated, for example
• Polynomial regression
• Reciprocal transformation of the x variable
• Log transformation of the x variable
• Log transformation of both the x and y variables
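These transformations can be tried quickly outside SPSS. In the sketch below the data are hypothetical, constructed so that y is exactly linear in log(x); the listed transformations are formed and y is fitted on log(x) with NumPy:

```python
import numpy as np

# Hypothetical data: y is exactly linear in log2(x), so a straight line in x
# would show curvature, while y on log(x) fits perfectly.
x = np.array([1.0, 2.0, 4.0, 8.0, 16.0])
y = np.array([3.0, 5.0, 7.0, 9.0, 11.0])

x_recip = 1.0 / x    # reciprocal transformation of the x variable
x_log = np.log(x)    # log transformation of the x variable
y_log = np.log(y)    # log transformation of y (pair with x_log for a log-log fit)

# Fit y on log(x): polyfit of degree 1 returns (slope, intercept).
slope, intercept = np.polyfit(x_log, y, 1)
print(slope, intercept)
```

Since y = 3 + 2·log2(x) here, the fit recovers slope 2/ln 2 and intercept 3; on real data one would compare residual plots before and after the transformation.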
Example 2: (tele)
A company that sells transportation services uses a telemarketing division to help sell
its services. The division manager is interested in the time spent on the phone by the
telemarketers in the division. Data on the number of months of employment and the
number of calls placed per day (an average over 20 working days) are recorded for 20
employees. These data are saved in the file “tele.sav”. Performing regression
analysis yields the following results.
Regression
Variables Entered/Removed(b)
Model  Variables Entered  Variables Removed  Method
1      MONTHSa            .                  Enter
a. All requested variables entered.
b. Dependent Variable: CALLS

Model Summary(b)
Model  R      R Square  Adjusted R Square  Std. Error of the Estimate  Durbin-Watson
1      .935a  .874      .867               1.79                        .570
a. Predictors: (Constant), MONTHS
b. Dependent Variable: CALLS
ANOVA(b)
Model 1     Sum of Squares  df  Mean Square  F        Sig.
Regression  397.446         1   397.446      124.409  .000a
Residual    57.504          18  3.195
Total       454.950         19
a. Predictors: (Constant), MONTHS
b. Dependent Variable: CALLS

Coefficients(a)
Model 1     B       Std. Error  Beta  t       Sig.  95% CI for B: Lower  Upper
(Constant)  13.671  1.427             9.580   .000  10.673               16.669
MONTHS      .744    .067        .935  11.154  .000  .603                 .884
a. Dependent Variable: CALLS
The Durbin-Watson statistic equals 0.570, which is less than dL = 1.20 at the 5%
level of significance; therefore the error terms have first-order autocorrelation.
The last two columns of the Coefficients table give confidence intervals for the
regression coefficients.
Residuals Statistics(a)
                      Minimum  Maximum  Mean      Std. Deviation  N
Predicted Value       21.11    35.98    28.95     4.57            20
Residual              -3.11    2.97     8.88E-16  1.74            20
Std. Predicted Value  -1.715   1.536    .000      1.000           20
Std. Residual         -1.738   1.663    .000      .973            20
a. Dependent Variable: CALLS
Charts

[Histogram of the regression standardized residuals. Dependent Variable: CALLS.
Std. Dev = .97, Mean = 0.00, N = 20.]
[Scatterplot of the regression standardized residuals against the regression
standardized predicted values. Dependent Variable: CALLS.]
A systematic pattern can be observed in both residual plots. The standardized
residuals fall in a curvilinear pattern, suggesting that a curvilinear component may
be omitted from the equation expressing the relationship between CALLS and MONTHS.
The plots of the standardized residuals versus the fitted values and versus MONTHS
show identical patterns in this case.
[Normal P-P plot of the regression standardized residuals. Dependent Variable:
CALLS.]

If the normal probability plot is close to a straight line, the error terms follow a
normal distribution.
A scatter plot is used to produce a residual plot of the standardized residuals
against the regressor.
Graphs
Legacy Dialogs
Scatter
Simple
y-axis: zre_1
x-axis: months
[Scatter plot of the standardized residuals against MONTHS.]
A second order polynomial regression will be tried.
Regression
Variables Entered/Removed(b)
Model  Variables Entered  Variables Removed  Method
1      MONTHS2, MONTHSa   .                  Enter
a. All requested variables entered.
b. Dependent Variable: CALLS
Model Summary(b)
Model  R      R Square  Adjusted R Square  Std. Error of the Estimate  Durbin-Watson
1      .981a  .962      .958               1.00                        2.092
a. Predictors: (Constant), MONTHS2, MONTHS
b. Dependent Variable: CALLS

ANOVA(b)
Model 1     Sum of Squares  df  Mean Square  F        Sig.
Regression  437.839         2   218.920      217.503  .000a
Residual    17.111          17  1.007
Total       454.950         19
a. Predictors: (Constant), MONTHS2, MONTHS
b. Dependent Variable: CALLS
Coefficients(a)
Model 1     B          Std. Error  Beta    t       Sig.  95% CI for B: Lower  Upper
(Constant)  -.140      2.323               -.060   .952  -5.041               4.760
MONTHS      2.310      .250        2.904   9.236   .000  1.782                2.838
MONTHS2     -4.01E-02  .006        -1.992  -6.335  .000  -.053                -.027
a. Dependent Variable: CALLS

The fitted model becomes
$$\widehat{\text{calls}} = -0.14 + 2.310 \times \text{months} - 0.0401 \times \text{months}^2.$$
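The fitted quadratic can be evaluated directly; in the Python sketch below, the employee with 20 months of employment is a hypothetical illustration:

```python
def predict_calls(months):
    """Evaluate the fitted second-order model
    calls-hat = -0.14 + 2.310*months - 0.0401*months^2."""
    return -0.14 + 2.310 * months - 0.0401 * months ** 2

# Hypothetical employee with 20 months of employment.
calls_hat = predict_calls(20)
print(calls_hat)
```

Note that the negative quadratic coefficient means the predicted number of calls eventually levels off and declines as months of employment grow, which a straight line cannot capture.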
Residuals Statistics(a)
                      Minimum  Maximum  Mean      Std. Deviation  N
Predicted Value       18.95    33.12    28.95     4.80            20
Residual              -1.54    1.73     3.73E-15  .95             20
Std. Predicted Value  -2.083   .868     .000      1.000           20
Std. Residual         -1.536   1.728    .000      .946            20
a. Dependent Variable: CALLS
The Durbin-Watson statistic has value 2.092, which is greater than dU = 1.54 at the
5% level of significance. Hence the error terms of this regression model do not have
first-order autocorrelation.
Charts

[Histogram of the regression standardized residuals. Dependent Variable: CALLS.
Std. Dev = .95, Mean = 0.00, N = 20.]
[Normal P-P plot of the regression standardized residuals. Dependent Variable:
CALLS.]
[Scatterplot of the regression standardized residuals against the regression
standardized predicted values. Dependent Variable: CALLS.]
The residual plots do not show any systematic pattern: the residuals are scattered
around zero within a horizontal band. Therefore, the error terms seem to have zero
mean and constant variance.
[Partial regression plot of CALLS against MONTHS.]
[Partial regression plot of CALLS against MONTHS2.]
[Scatter plot of the standardized residuals against MONTHS.]
[Scatter plot of the standardized residuals against MONTHS2.]
When a sample data point has a y value that is much different from the y values of the
other points in the sample, it is called an outlier.
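A common rule of thumb (used later in this chapter) flags observations whose standardized residual exceeds 2 in absolute value. The following Python sketch, with hypothetical residuals, illustrates the idea:

```python
import numpy as np

def flag_outliers(residuals, threshold=2.0):
    """Return indices of observations whose standardized residual exceeds
    the threshold in absolute value (a rule-of-thumb outlier screen)."""
    e = np.asarray(residuals, dtype=float)
    z = (e - e.mean()) / e.std(ddof=1)
    return np.where(np.abs(z) > threshold)[0]

# Hypothetical residuals: the fifth observation (index 4) is far from the rest.
e = [0.2, -0.3, 0.1, -0.2, 3.5, 0.0, -0.1, 0.3, -0.2, 0.1]
print(flag_outliers(e))
```

A flagged point deserves further examination rather than automatic deletion, as the text below emphasizes.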
Example 3: (car.sav)
Regression
Variables Entered/Removed(b)
Model  Variables Entered  Variables Removed  Method
1      Weight in tons     .                  Enter
a. All requested variables entered.
b. Dependent Variable: Miles per gallon

Model Summary
Model  R      R Square  Adjusted R Square  Std. Error of the Estimate
1      .781a  .611      .562               .7134
a. Predictors: (Constant), Weight in tons
ANOVA(b)
Model 1     Sum of Squares  df  Mean Square  F       Sig.
Regression  6.388           1   6.388        12.550  .008a
Residual    4.072           8   .509
Total       10.460          9
a. Predictors: (Constant), Weight in tons
b. Dependent Variable: Miles per gallon

Coefficients(a)
Model 1         B       Std. Error  Beta   t       Sig.  95% CI for B: Lower  Upper
(Constant)      21.639  1.329              16.284  .000  18.575               24.704
Weight in tons  -2.876  .812        -.781  -3.543  .008  -4.748               -1.004
a. Dependent Variable: Miles per gallon

Coefficient Correlations(a)
Model 1                      Weight in tons
Correlations  Weight in tons 1.000
Covariances   Weight in tons .659
a. Dependent Variable: Miles per gallon

Residuals Statistics(a)
                      Minimum  Maximum  Mean    Std. Deviation  N
Predicted Value       15.743   18.245   17.000  .8425           10
Residual              -1.445   .900     .000    .6726           10
Std. Predicted Value  -1.492   1.478    .000    1.000           10
Std. Residual         -2.026   1.261    .000    .943            10
a. Dependent Variable: Miles per gallon
Charts

[Normal P-P plot of the regression standardized residuals. Dependent Variable:
MILE.]

[Histogram of the regression standardized residuals. Dependent Variable: Miles per
gallon. Std. Dev = .94, Mean = 0.00, N = 10.]
An outlier is an unusual observation that deserves further examination. This does not
mean that it is useless or that it should be deleted from the analysis. When an
outlier should be deleted from the regression model, filter the observation and
perform the linear regression again.
4.3. Correlation analysis
Linear association between two quantitative variables can be analyzed by correlation
analysis. The Pearson correlation coefficient r ($-1 \le r \le 1$) is used to measure
the strength of linear association between two sets of sample data (say x and y). The
population correlation coefficient is estimated by the sample correlation coefficient

$$r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}}.$$
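The formula translates directly into code; a minimal Python/NumPy sketch with hypothetical data (chosen to have a perfect negative linear relationship) is:

```python
import numpy as np

def pearson_r(x, y):
    """Sample Pearson correlation coefficient, computed from the definition:
    r = sum(dx*dy) / sqrt(sum(dx^2) * sum(dy^2)), with dx, dy mean-centered."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    return np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2))

# Hypothetical data lying exactly on a line with negative slope.
x = [1, 2, 3, 4, 5]
y = [10, 8, 6, 4, 2]
r = pearson_r(x, y)
print(r)
```

Since the points lie exactly on a decreasing line, r attains its lower bound of -1; `np.corrcoef(x, y)[0, 1]` gives the same value.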
Example 4: (car.sav)
• Compute the correlation coefficients for the variables and test for their
significance
Analyze
Correlate
[Scatterplot of the regression standardized residuals against the regression
standardized predicted values. Dependent Variable: Miles per gallon. One point lies
below -2: it is an outlier, as its studentized residual is greater than 2 in absolute
value.]
Bivariate
Variables: miles, weight, temperature
Correlation coefficients: Pearson
Statistics such as the mean, standard deviation, cross-product deviations and
covariances can also be produced.
Correlations
Let $\rho_1$ and $\rho_2$ be the correlation coefficients of gasoline mileage with
the weight of the automobile and with the temperature at the time of operation,
respectively. We can test whether the correlation coefficients are significantly
different from zero. The hypotheses for the correlation coefficient of gasoline
mileage and weight are

$H_0: \rho_1 = 0$
$H_1: \rho_1 \neq 0$
Descriptive Statistics
                           Mean    Std. Deviation  N
Miles per gallon           17.000  1.0781          10
Weight in tons             1.6130  .29292          10
Temperature in Fahrenheit  52.50   20.850          10

Correlations
                                                   Miles per  Weight   Temperature
                                                   gallon     in tons  in Fahrenheit
Miles per gallon     Pearson Correlation           1          -.781**  -.188
                     Sig. (2-tailed)               .          .008     .603
                     Sum of Squares and
                     Cross-products                10.460     -2.221   -38.000
                     Covariance                    1.162      -.247    -4.222
                     N                             10         10       10
Weight in tons       Pearson Correlation           -.781**    1        .052
                     Sig. (2-tailed)               .008       .        .886
                     Sum of Squares and
                     Cross-products                -2.221     .772     2.875
                     Covariance                    -.247      .086     .319
                     N                             10         10       10
Temperature in       Pearson Correlation           -.188      .052     1
Fahrenheit           Sig. (2-tailed)               .603       .886     .
                     Sum of Squares and
                     Cross-products                -38.000    2.875    3912.500
                     Covariance                    -4.222     .319     434.722
                     N                             10         10       10
**. Correlation is significant at the 0.01 level (2-tailed).
The sample correlation coefficient is -0.781, with significance level equal to 0.008,
which is smaller than 0.05. Therefore, the null hypothesis is rejected and the
correlation coefficient of gasoline mileage and weight is significantly different
from zero. Gasoline mileage and weight have a significant linear association at the
5% level of significance. Similarly, the linear association between gasoline mileage
and temperature at the time of operation is tested with the hypotheses

$H_0: \rho_2 = 0$
$H_1: \rho_2 \neq 0$

The sample correlation coefficient is -0.188, with significance level equal to 0.603,
which is greater than 0.05. Therefore, the null hypothesis cannot be rejected and the
correlation coefficient of gasoline mileage and temperature is not significantly
different from zero. Gasoline mileage and temperature at the time of operation do not
have a significant linear association at the 5% level of significance.
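Outside SPSS, the significance of a sample correlation is usually checked with the statistic $t = r\sqrt{n-2}/\sqrt{1-r^2}$ on $n-2$ degrees of freedom. The sketch below reproduces the weight test (r = -0.781, n = 10 from the output above) approximately:

```python
import math

def corr_t_statistic(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom,
    for testing H0: rho = 0 against H1: rho != 0."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

# Values from the car.sav output: r = -0.781 between mileage and weight, n = 10.
t_weight = corr_t_statistic(-0.781, 10)
print(t_weight)
```

The result is close to the t = -3.543 reported for weight in the simple regression output; the small discrepancy comes from r being rounded to three decimals.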
Exercise 4
Question 1. (R&D)
A company is interested in the relationship between profit on a number of projects and two
explanatory variables. These variables are the expenditure on research and development for
the project (RD) and a measure of risk assigned at the outset of the project (RISK). The
following table shows the data on the three variables PROFIT, RISK, and RD. PROFIT is
measured in thousands of dollars and RD is measured in hundreds of dollars. Fit a multiple
linear regression model for the PROFIT, using RISK and RD as regressors.
RD       RISK  PROFIT    RD       RISK  PROFIT
132.580  8.5   396       74.816   7.5   102
81.928   7.5   130       108.752  6.0   214
145.992  10.0  508       92.372   8.5   200
90.020   8.0   172       92.260   7.0   158
114.408  7.0   256       60.732   6.5   32
53.704   7.5   32        78.120   7.5   116
76.244   7.0   102       90.000   5.5   120
71.680   8.0   102       105.532  9.0   270
151.592  9.5   536       111.832  8.0   270
Interpret the results and check whether the error terms have first-order
autocorrelation and follow a normal distribution with zero mean and constant
variance. Modify the model if necessary.
Question 2. (marketing)
The marketing department of a firm desires to develop a one-factor linear model to forecast
the sales of a product. It is believed that the most prominent factors that affect sales are
“Advertising costs” and “Price”. The following semi-annual data are collected:
Sales (in $,000)  Advertising costs (in $,000)  Price (in $,000)
1,320             38                            6
1,440             42                            6
1,480             46                            6
1,520             48                            5
1,440             50                            5
1,560             52                            5
By calculating correlation coefficients, determine the factor (i.e. advertising costs
or price) that is more relevant to sales. Find the regression equation for the factor
that you have determined.