dadm-correlation and regression
TRANSCRIPT
-
8/3/2019 DADM-Correlation and Regression
1/138
ACE INSTITUTE OF MANAGEMENT
MASTER OF BUSINESS ADMINISTRATION (MBAE)
Semester: III
Credits: 2
Course Name: DATA ANALYSIS AND DECISION MODELING
Effective Date: April, 2011
Class Schedule: Wednesday and Thursday
Time: (6:00-9:00 P.M.)
-
CORRELATION ANALYSIS
-
PURPOSE OF CORRELATION ANALYSIS
Population Correlation Coefficient $\rho$ (Rho) is Used to Measure the Strength of the Relationship between the Variables
Sample Correlation Coefficient $r$ is an Estimate of $\rho$ and is Used to Measure the Strength of the Linear Relationship in the Sample Observations
-
CORRELATION
Mutual relationship between two or more variables
Variables under consideration are said to be correlated if a change in one variable tends to be accompanied by a change in another variable
Examples:
height and weight of persons
weight and blood pressure
price and supply
demand and commodity
sales of a company and Earnings per Share or Price-Earnings Ratio of its stock
income and house value
We are interested in what kind of relationship exists and in the degree (strength) of the relationship between the variables
-
TYPES OF CORRELATION
Positive and Negative
Simple correlation
Partial correlation
Multiple correlation
Linear and Non-linear
-
MEASUREMENT OF CORRELATION
SCATTER DIAGRAM METHOD: A scatter plot is a graph of the ordered pairs (x, y) of numbers consisting of the independent variable, x, and the dependent variable, y.
KARL PEARSON'S COEFFICIENT OF CORRELATION
RANK METHOD for finding the coefficient of correlation of qualitative characteristics such as beauty, intelligence, honesty
-
SCATTER DIAGRAM METHOD
The scatter shows the joint variation among the pairs of values and gives an idea about the degree and direction of the relationship between the variables x and y
The greater the scatter of points over the graph, the weaker the relationship between the variables
If all the points lie on a straight line, there is either perfect positive or perfect negative correlation
-
The nearer the points are to the straight line, the higher the degree of correlation; the farther the points are from the straight line, the lower the degree of correlation
If the points are widely scattered and no trend is revealed, the variables may be uncorrelated
The scatter diagram does not provide an exact measure of the extent of the relationship between the variables
-
GRAPHICAL EXPLORATION: SCATTER PLOT (THE COLLECTION OF DOTS CORRESPONDING TO $(x_i, y_i)$)
-
PERFECT POSITIVE CORRELATION
[Figure: scatter plot of y against x, r = 1]
PERFECT NEGATIVE CORRELATION
[Figure: scatter plot of y against x, r = -1]
-
EXAMPLES OF r VALUES:
-
EXAMPLE
The independent variable in this example is the number of hours studied.
The grade the student receives is the dependent variable.
The grade a student receives depends upon the number of hours he or she studies.
Are these two variables related?
Student   Hours studied   % Grade
A         6               82
B         2               63
C         1               57
D         5               88
E         3               68
F         2               75
-
SCATTER PLOT
The independent variable is plotted on the horizontal x-axis. The dependent variable is plotted on the vertical y-axis.
[Figure: scatter plot of Grade (%) against Hours Studied]
-
RANGE OF CORRELATION COEFFICIENT
In case of an exact positive linear relationship the value of r is +1.
In case of a strong positive linear relationship, the value of r will be close to +1.
[Figure: Correlation = +1; dependent variable plotted against independent variable]
-
RANGE OF CORRELATION COEFFICIENT
In case of an exact negative linear relationship the value of r is -1.
In case of a strong negative linear relationship, the value of r will be close to -1.
[Figure: Correlation = -1; dependent variable plotted against independent variable]
-
RANGE OF CORRELATION COEFFICIENT
In case of a weak relationship the value of r will be close to 0, i.e. absence of a linear relationship.
A low or zero value of r means that the relationship is not linear, but there could be another type of relationship.
[Figure: Correlation = 0; dependent variable plotted against independent variable]
Example: the four points below lie exactly on the circle $x^2 + y^2 = 1$, yet r = 0.
x    y
1    0
0    1
-1   0
0    -1
-
RANGE OF CORRELATION COEFFICIENT
In case of a nonlinear relationship the value of r may be close to 0.
[Figure: Correlation = 0; dependent variable plotted against independent variable]
-
KARL PEARSON CORRELATION COEFFICIENT
$\rho$ (rho) for population values, r for sample values
Usually denoted by r(x,y), $r_{xy}$, or simply r
r is a numerical measure of the linear relationship between x and y
-
(PEARSON PRODUCT-MOMENT) SAMPLE CORRELATION
$r = \frac{\mathrm{Cov}(x, y)}{\sqrt{\mathrm{Var}(x)\,\mathrm{Var}(y)}} = \frac{S_{xy}}{\sqrt{S_{xx}\,S_{yy}}}$
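As a rough illustration of the product-moment formula, the sketch below computes r for the hours-studied and grade table from the earlier EXAMPLE slide. The function name `pearson_r` and the variable names are assumptions made for this sketch; they do not come from the slides.

```python
import math

# Hours studied (x) and % grade (y) for students A-F, from the EXAMPLE table.
hours = [6, 2, 1, 5, 3, 2]
grades = [82, 63, 57, 88, 68, 75]


def pearson_r(x, y):
    """Sample correlation r = S_xy / sqrt(S_xx * S_yy)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


r = pearson_r(hours, grades)
print(f"r = {r:.3f}")
```

For these six students r comes out at about 0.86, a fairly strong positive linear relationship between hours studied and grade.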
-
FEATURES OF $\rho$ AND r
Unit Free
Range between -1 and 1
The Closer to -1, the Stronger the Negative Linear Relationship
The Closer to 1, the Stronger the Positive Linear Relationship
The Closer to 0, the Weaker the Linear Relationship
-
EXAMPLE:
Number of weeks (in the program)   Speed gain (words per minute)
3                                  86
5                                  118
2                                  49
8                                  493
6                                  164
9                                  232
-
r = 0.991
-
EXAMPLE: COMPUTE COEFFICIENT OF CORRELATION
X    Y
6    9
2    11
10   ?
4    8
8    7
Arithmetic means of the X and Y series are 6 and 8
-
EXAMPLE: THE FOLLOWING DATA PERTAIN TO THE DEMAND FOR A PRODUCT (IN THOUSANDS OF UNITS) AND ITS PRICE (IN RS.) CHARGED IN FIVE DIFFERENT AREAS:
Price (x)   Demand (y)
20          22
16          41
10          141
11          89
14          56
Draw a scatter diagram
Calculate the coefficient of correlation.
-
EXAMPLE: THE ANNUAL LABOR WELFARE FUNDS (LAKHS OF RUPEES) AND THE CORRESPONDING ANNUAL PRODUCTION (IN CRORES OF RUPEES) FOR THE PAST 8 YEARS OF A COMPANY ARE PRESENTED BELOW.
Year   Labor welfare funds (x)   Annual production (y)
1      8                         18
2      10                        28
3      12                        35
4      14                        45
5      16                        50
6      18                        70
7      20                        85
8      22                        95
Draw a scatter diagram
Calculate the coefficient of correlation between the annual labor welfare funds and the corresponding annual production. Also test the significance of the correlation coefficient at a significance level of 0.05
-
HYPOTHESIS TESTING
Null hypothesis: $\rho = 0$ (two variables are not associated)
Alternative hypothesis: $\rho \neq 0$ (two variables are associated)
Level of significance $\alpha = 0.05$
Test statistic: $t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$ with $n-2$ degrees of freedom
Decision: if the null hypothesis is rejected, there is a relationship between the two variables.
-
t-TEST FOR CORRELATION
Hypotheses
H0: $\rho = 0$ (No Correlation)
H1: $\rho \neq 0$ (Correlation)
Test Statistic
$t = \frac{r}{\sqrt{\frac{1-r^2}{n-2}}}$
where $r = \frac{\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i-\bar{X})^2 \sum_{i=1}^{n}(Y_i-\bar{Y})^2}}$
-
HYPOTHESIS TESTING
Null hypothesis: $\rho = \rho_0$ (with $\rho_0 = 0$: two variables are not associated)
Alternative hypothesis: $\rho \neq \rho_0$ (with $\rho_0 = 0$: two variables are associated)
Level of significance $\alpha = 0.05$
Test statistic: $Z = \frac{z_r - z_\rho}{\sigma_z} \sim N(0, 1)$
where $z_r = \frac{1}{2}\ln\frac{1+r}{1-r}$, $z_\rho = \frac{1}{2}\ln\frac{1+\rho_0}{1-\rho_0}$ and $\sigma_z = \frac{1}{\sqrt{n-3}}$
Decision: if the null hypothesis is rejected, there is a relationship between the two variables.
-
EXAMPLE: A COEFFICIENT OF CORRELATION BASED ON A SAMPLE OF SIZE 18 WAS COMPUTED TO BE 0.32. CAN WE CONCLUDE AT SIGNIFICANCE LEVELS OF A) 0.05 B) 0.01 THAT THE POPULATION CORRELATION COEFFICIENT DIFFERS FROM ZERO?
Null hypothesis: $\rho = 0$ (two variables are not associated)
Alternative hypothesis: $\rho > 0$ (one-tail test)
Alternative hypothesis: $\rho \neq 0$ (two-tail test)
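One way to work this example is with the t statistic $t = r\sqrt{n-2}/\sqrt{1-r^2}$ from the hypothesis-testing slides. The sketch below is only illustrative: the function name is made up, and the critical values are assumptions copied from a standard Student's t table for 16 degrees of freedom rather than taken from the slides.

```python
import math


def t_statistic_for_r(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r**2), with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


r, n = 0.32, 18
t = t_statistic_for_r(r, n)

# Critical values of Student's t for 16 d.f. (assumed, from a standard t table).
critical = {
    ("one-tail", 0.05): 1.746,
    ("one-tail", 0.01): 2.583,
    ("two-tail", 0.05): 2.120,
    ("two-tail", 0.01): 2.921,
}

print(f"t = {t:.3f} with {n - 2} d.f.")
for (tail, alpha), t_crit in critical.items():
    decision = "reject H0" if abs(t) > t_crit else "do not reject H0"
    print(f"{tail}, alpha = {alpha}: critical value {t_crit}, {decision}")
```

With t of roughly 1.35, the null hypothesis is not rejected at either level, for the one-tail or the two-tail alternative.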
-
EXAMPLE: A COEFFICIENT OF CORRELATION BASED ON A SAMPLE OF SIZE 24 WAS COMPUTED TO BE 0.75. CAN WE CONCLUDE AT SIGNIFICANCE LEVELS OF A) 0.05 B) 0.01 THAT THE POPULATION CORRELATION COEFFICIENT DIFFERS FROM 0.60?
Null hypothesis: $\rho = 0.60$
Alternative hypothesis: $\rho > 0.60$ (one-tail test)
Alternative hypothesis: $\rho \neq 0.60$ (two-tail test)
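Because the hypothesized value is not zero, this example calls for the Fisher z statistic rather than the t statistic. The following is a minimal sketch under that reading; the function names are hypothetical and the normal critical values are assumptions taken from a standard z table.

```python
import math


def fisher_z(r):
    """Fisher transformation z = 0.5 * ln((1 + r) / (1 - r))."""
    return 0.5 * math.log((1 + r) / (1 - r))


def z_statistic(r, rho0, n):
    """Z = (z_r - z_rho0) * sqrt(n - 3), approximately N(0, 1) under H0."""
    return (fisher_z(r) - fisher_z(rho0)) * math.sqrt(n - 3)


Z = z_statistic(r=0.75, rho0=0.60, n=24)
print(f"Z = {Z:.3f}")

# Standard normal critical values (assumed, from a z table).
for label, z_crit in [("one-tail 0.05", 1.645), ("one-tail 0.01", 2.326),
                      ("two-tail 0.05", 1.960), ("two-tail 0.01", 2.576)]:
    decision = "reject H0" if abs(Z) > z_crit else "do not reject H0"
    print(f"{label}: critical value {z_crit}, {decision}")
```

Here Z is about 1.28, below every listed critical value, so the sample does not give evidence that the population correlation differs from 0.60.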
-
CONFIDENCE INTERVAL FOR $\rho$
$z_r - \frac{z_{\alpha/2}}{\sqrt{n-3}} < z_\rho < z_r + \frac{z_{\alpha/2}}{\sqrt{n-3}}$
-
EXAMPLE: IF r = 0.7 FOR THE MATHEMATICS AND STATISTICS GRADES OF 30 STUDENTS, CONSTRUCT A 95% CONFIDENCE INTERVAL FOR THE POPULATION CORRELATION COEFFICIENT.
r = 0.70, n = 30, and $z_{0.025} = 1.96$
The z value that corresponds to r = 0.70 from the table is 0.867
95% confidence interval for the population correlation coefficient:
$z_r - \frac{z_{\alpha/2}}{\sqrt{n-3}} < z_\rho < z_r + \frac{z_{\alpha/2}}{\sqrt{n-3}}$
$0.867 - \frac{1.96}{\sqrt{27}} < z_\rho < 0.867 + \frac{1.96}{\sqrt{27}}$
$0.45 < \rho < 0.85$
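The table lookup in this example can be replaced by the hyperbolic functions, since the Fisher z value is `atanh(r)` and the back-transformation is `tanh(z)`. The sketch below reproduces the interval under that assumption; the function name is made up for illustration.

```python
import math


def correlation_confidence_interval(r, n, z_crit):
    """Confidence interval for rho via the Fisher z transformation."""
    z_r = math.atanh(r)                      # same as 0.5 * ln((1 + r) / (1 - r))
    half_width = z_crit / math.sqrt(n - 3)   # z_{alpha/2} / sqrt(n - 3)
    # Transform the limits for z_rho back to the correlation scale.
    return math.tanh(z_r - half_width), math.tanh(z_r + half_width)


lower, upper = correlation_confidence_interval(r=0.70, n=30, z_crit=1.96)
print(f"{lower:.2f} < rho < {upper:.2f}")
```

The same function can be reused for the exercises that follow by changing `r`, `n` and, for 99% intervals, setting `z_crit` to 2.576.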
-
Construct a 95% confidence interval for the population correlation coefficient when
a) r = 0.72, n = 30
b) r = 0.35, n = 40
c) r = -0.87, n = 35
d) r = 0.16, n = 42
-
Construct a 99% confidence interval for the population correlation coefficient when
a) r = 0.72, n = 30
b) r = 0.35, n = 40
c) r = -0.87, n = 35
d) r = 0.16, n = 42
-
STRENGTH VS. SIGNIFICANCE OF THE CORRELATION:
The significance, given by the P-value, depends on the statistical evidence. When the P-value is small, there is evidence that the correlation exists.
The strength, given by the r value, is meaningful only if it is supported by statistical significance.
-
$R^2$ = 12.70%
Means that the variables in the model explain about 12.70% of the total variation in age
-
SAMPLE OF OBSERVATIONS FROM VARIOUS r VALUES
[Figure: five scatter plots of Y against X, for r = -1, r = -.6, r = 0, r = .6 and r = 1]
-
EXAMPLE: PRODUCE STORES
Is there any evidence of a linear relationship between the Annual Sales of a store and its Square Footage at the .05 level of significance?
Store   Square Feet   Annual Sales ($000)
1       1,726         3,681
2       1,542         3,395
3       2,816         6,653
4       5,555         9,543
5       1,292         3,318
6       2,208         5,563
7       1,313         3,760
From Excel Printout:
Regression Statistics
Multiple R          0.9705572   (this is r)
R Square            0.94198129
Adjusted R Square   0.93037754
Standard Error      611.751517
Observations        7
H0: ρ = 0 (No association)
H1: ρ ≠ 0 (Association)
α = .05
df = 7 - 2 = 5
-
EXAMPLE: PRODUCE STORES SOLUTION
Test Statistic: $t = \frac{r}{\sqrt{\frac{1-r^2}{n-2}}} = \frac{.9706}{\sqrt{\frac{1-.9420}{5}}} = 9.0099$
Critical Value(s): ±2.5706 (.025 in each rejection tail)
Decision: Reject H0
Conclusion: There is evidence of a linear relationship at the 5% level of significance
The value of the t statistic is exactly the same as the t statistic value for the test on the slope coefficient
-
SIMPLE REGRESSION
-
TOPICS
Introduction
Types of Regression Models
Determining the Simple Linear Regression Equation
Interpretation of regression coefficients
-
INTRODUCTION
Decisions based on forecasts
Relationship between variables: between what is known and what is to be estimated
e.g. relationship between annual sales and size of store
e.g. relationship between annual profits and investment in R&D
Regression and Correlation Analyses determine the nature and strength of the relationship
Simple Regression Model develops a relationship between a response variable and ONE explanatory variable (independent variable)
Simple Regression Analysis determines the degree to which the variables are related and how well the model describes the relationship
-
PURPOSE OF REGRESSION ANALYSIS
Regression Analysis is Used Primarily to Model Causality and Provide Prediction
Predict the values of a dependent (response) variable based on values of at least one independent (explanatory) variable, e.g. predict annual sales based on expenditure in advertising
Explain the effect of the independent variables on the dependent variable
-
TYPES OF RELATIONSHIPS
[Figure: four scatter plots: Positive Linear Relationship, Negative Linear Relationship, Relationship NOT Linear, No Relationship]
-
SIMPLE LINEAR REGRESSION MODEL
Relationship Between Variables is Described by a Linear Function
The Change of One Variable Causes the Other Variable to Change
A Dependency of One Variable on the Other
-
SIMPLE LINEAR REGRESSION MODEL (continued)
Population regression line is a straight line that describes the dependence of the average value (conditional mean) of one variable on the other
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$, where $\mu_{Y|X} = \beta_0 + \beta_1 X_i$
$Y_i$: Dependent (Response) Variable
$\beta_0$: Population Y intercept
$\beta_1$: Population Slope Coefficient
$X_i$: Independent (Explanatory) Variable
$\varepsilon_i$: Random Error
$\mu_{Y|X}$: Population Regression Line (conditional mean)
-
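SIMPLE LINEAR REGRESSION MODEL (continued)
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
[Figure: observed value of Y plotted against X, with the population regression line $\mu_{Y|X} = \beta_0 + \beta_1 X_i$ (conditional mean) and the random error $\varepsilon_i$ as the vertical distance between the observed value and the line]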
-
LINEAR REGRESSION EQUATION
Sample regression line provides an estimate of the population regression line as well as a predicted value of Y
$Y_i = b_0 + b_1 X_i + e_i$, where $b_0$ is the Sample Y Intercept, $b_1$ the Sample Slope Coefficient and $e_i$ the Residual
$\hat{Y} = b_0 + b_1 X$: Simple Regression Equation (Fitted Regression Line, Predicted Value)
-
LINEAR REGRESSION EQUATION (continued)
$b_0$ and $b_1$ are obtained by finding the values of $b_0$ and $b_1$ that minimize the sum of the squared residuals (Least Squares Method):
$\sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} e_i^2$
$b_0$ provides an estimate of $\beta_0$
$b_1$ provides an estimate of $\beta_1$
-
LEAST SQUARES METHOD
$b_1 = \frac{\sum xy - (\sum x)(\sum y)/n}{\sum x^2 - (\sum x)^2/n}$
$b_0 = \bar{Y} - b_1 \bar{X}$
$\bar{Y} = \sum y / n$
$\bar{X} = \sum x / n$
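The least squares formulas can be checked numerically on the produce-store data used throughout these slides. The sketch below is one possible implementation; the function and variable names are assumptions for illustration.

```python
# Produce-store data from the slides: store size and annual sales.
square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
annual_sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]  # in $000


def least_squares(x, y):
    """Return (b0, b1) from the computational least squares formulas."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1


b0, b1 = least_squares(square_feet, annual_sales)
print(f"b0 = {b0:.3f}, b1 = {b1:.4f}")
```

The result should agree with the Excel coefficients quoted in the example slides (intercept about 1636.4, slope about 1.487).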
-
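LINEAR REGRESSION EQUATION (continued)
[Figure: observed values of Y against X, showing the population line $\mu_{Y|X} = \beta_0 + \beta_1 X_i$ with random error $\varepsilon_i$ ($Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$) and the sample line $\hat{Y}_i = b_0 + b_1 X_i$ with intercept $b_0$, slope $b_1$ and residual $e_i$ ($Y_i = b_0 + b_1 X_i + e_i$)]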
-
INTERPRETATION OF THE SLOPE AND INTERCEPT
$\beta_0 = \mu_{Y|X=0}$ is the average value of Y when the value of X is zero.
$\beta_1 = \frac{\Delta \mu_{Y|X}}{\Delta X}$ measures the change in the average value of Y as a result of a one-unit change in X.
-
LINEAR REGRESSION EQUATION: EXAMPLE
You wish to examine the linear dependency of the annual sales of produce stores on their sizes in square footage. Sample data for 7 stores were obtained. Find the equation of the straight line that fits the data best.
Store   Square Feet   Annual Sales ($1000)
1       1,726         3,681
2       1,542         3,395
3       2,816         6,653
4       5,555         9,543
5       1,292         3,318
6       2,208         5,563
7       1,313         3,760
-
SCATTER DIAGRAM: EXAMPLE
[Figure: Excel scatter plot of Annual Sales ($000) against Square Feet]
-
SIMPLE LINEAR REGRESSION EQUATION: EXAMPLE
$\hat{Y}_i = b_0 + b_1 X_i = 1636.415 + 1.487 X_i$
From Excel Printout:
             Coefficients
Intercept    1636.414726
X Variable   1.486633657
-
GRAPH OF THE SIMPLE LINEAR REGRESSION EQUATION: EXAMPLE
[Figure: scatter plot of Annual Sales ($000) against Square Feet with the fitted regression line]
-
INTERPRETATION OF RESULTS: EXAMPLE
$\hat{Y}_i = 1636.415 + 1.487 X_i$
The slope of 1.487 means that for each increase of one unit in X, we predict the average of Y to increase by an estimated 1.487 units.
The equation estimates that for each increase of 1 square foot in the size of the store, the expected annual sales are predicted to increase by $1,487.
-
TOPICS
Measures of Variation
Coefficient of Determination
Coefficient of Correlation
-
MEASURES OF VARIATION: THE SUM OF SQUARES
SST = SSR + SSE
Total Sample Variability = Explained Variability + Unexplained Variability
To examine the ability of the independent variable to predict the dependent variable
-
MEASURES OF VARIATION: THE SUM OF SQUARES (continued)
SST = Total Sum of Squares
Measures the variation of the $Y_i$ values around their mean, $\bar{Y}$
SSR = Regression Sum of Squares
Explained variation attributable to the relationship between X and Y, between predicted value and mean value
SSE = Error Sum of Squares
Variation attributable to factors other than the relationship between X and Y, between observed value and predicted value
-
MEASURES OF VARIATION: THE SUM OF SQUARES (continued)
$SST = \sum (Y_i - \bar{Y})^2 = \sum Y_i^2 - (\sum Y_i)^2/n$
$SSR = \sum (\hat{Y}_i - \bar{Y})^2 = b_0 \sum Y_i + b_1 \sum X_i Y_i - (\sum Y_i)^2/n$
$SSE = \sum (Y_i - \hat{Y}_i)^2 = \sum Y_i^2 - b_0 \sum Y_i - b_1 \sum X_i Y_i$
-
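MEASURES OF VARIATION: THE SUM OF SQUARES (continued)
[Figure: Y against X with the fitted line; at a given $X_i$ the vertical distances illustrate $SST = \sum (Y_i - \bar{Y})^2$, $SSE = \sum (Y_i - \hat{Y}_i)^2$ and $SSR = \sum (\hat{Y}_i - \bar{Y})^2$]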
-
MEASURES OF VARIATION, THE SUM OF SQUARES: EXAMPLE
Excel Output for Produce Stores
ANOVA
             df   SS            MS           F            Significance F
Regression   1    30380456.12   30380456.1   81.1790902   0.000281201
Residual     5    1871199.595   374239.919
Total        6    32251655.71
The df column gives the degrees of freedom: Regression (explained) df, Error (residual) df and Total df. The SS column gives SSR, SSE and SST.
-
THE COEFFICIENT OF DETERMINATION
Measures the proportion of variation in Y that is explained by the independent variable X in the regression model
$r^2 = \frac{SSR}{SST} = \frac{\text{Regression Sum of Squares}}{\text{Total Sum of Squares}}$
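To see how SST, SSR, SSE and $r^2$ fit together, the sketch below recomputes them for the produce-store data. Variable names are assumptions; the figures it prints can be compared with the Excel ANOVA output for the produce stores.

```python
# Produce-store data from the slides.
square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
annual_sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]  # in $000

n = len(square_feet)
mean_x = sum(square_feet) / n
mean_y = sum(annual_sales) / n

# Least squares slope and intercept.
b1 = (sum((x - mean_x) * (y - mean_y) for x, y in zip(square_feet, annual_sales))
      / sum((x - mean_x) ** 2 for x in square_feet))
b0 = mean_y - b1 * mean_x
predicted = [b0 + b1 * x for x in square_feet]

sst = sum((y - mean_y) ** 2 for y in annual_sales)                    # total
ssr = sum((p - mean_y) ** 2 for p in predicted)                       # explained
sse = sum((y - p) ** 2 for y, p in zip(annual_sales, predicted))      # unexplained
r_squared = ssr / sst

print(f"SST = {sst:.2f}, SSR = {ssr:.2f}, SSE = {sse:.2f}")
print(f"r^2 = {r_squared:.5f}")
```

About 94% of the variation in annual sales is explained by store size, and SSR + SSE adds up to SST apart from floating-point rounding.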
-
COEFFICIENTS OF DETERMINATION ($r^2$) AND CORRELATION (r)
[Figure: four scatter plots of Y against X with the fitted line $\hat{Y}_i = b_0 + b_1 X_i$: $r^2 = 1$, r = +1; $r^2 = 1$, r = -1; $r^2 = .81$, r = +0.9; $r^2 = 0$, r = 0]
-
TOPICS
Standard Error of Estimate
Assumptions of Simple Linear Regression Model
Residual Analysis
-
STANDARD ERROR OF ESTIMATE
The standard deviation of the variation of observations around the regression equation
$S_{YX} = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{\sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2}{n-2}}$
-
Finding the Standard Error of Estimate
$S_{YX} = \sqrt{\frac{\sum Y^2 - b_0 \sum Y - b_1 \sum XY}{n-2}}$
X = values of the independent variable
Y = values of the dependent variable
$b_0$ = Y-intercept
$b_1$ = slope of the estimating equation
n = number of data points
-
INFERENCE ABOUT THE SLOPE: t-TEST
t Test for a Population Slope
Is there a linear dependency of Y on X?
Null and Alternative Hypotheses
H0: $\beta_1 = 0$ (No Linear Dependency)
H1: $\beta_1 \neq 0$ (Linear Dependency)
Test Statistic
$t = \frac{b_1 - \beta_1}{S_{b_1}}$ where $S_{b_1} = \frac{S_{YX}}{\sqrt{\sum_{i=1}^{n}(X_i - \bar{X})^2}}$
d.f. = n - 2
-
EXAMPLE: PRODUCE STORE
Data for 7 Stores:
Store   Square Feet   Annual Sales ($000)
1       1,726         3,681
2       1,542         3,395
3       2,816         6,653
4       5,555         9,543
5       1,292         3,318
6       2,208         5,563
7       1,313         3,760
Estimated Regression Equation:
$\hat{Y}_i = 1636.415 + 1.487 X_i$
The slope of this model is 1.487.
Does Square Footage Affect Annual Sales?
-
INFERENCES ABOUT THE SLOPE: t TEST EXAMPLE
H0: $\beta_1 = 0$
H1: $\beta_1 \neq 0$
α = .05
df = 7 - 2 = 5
Critical Value(s): ±2.5706 (.025 in each rejection tail)
Test Statistic: $t = b_1 / S_{b_1} = 9.0099$
Decision: Reject H0
Conclusion: There is evidence that square footage affects annual sales.
-
INFERENCES ABOUT THE SLOPE: F TEST
F Test for a Population Slope
Is there a linear dependency of Y on X?
Null and Alternative Hypotheses
H0: $\beta_1 = 0$ (No Linear Dependency)
H1: $\beta_1 \neq 0$ (Linear Dependency)
Test Statistic
$F = \frac{SSR/1}{SSE/(n-2)}$
Numerator d.f. = 1, denominator d.f. = n - 2
-
INFERENCES ABOUT THE SLOPE: CONFIDENCE INTERVAL EXAMPLE
Confidence Interval Estimate of the Slope:
$b_1 \pm t_{n-2} S_{b_1}$
At the 95% level of confidence the confidence interval for the slope is (1.062, 1.911). It does not include 0.
Conclusion: There is a significant linear dependency of annual sales on the size of the store.
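The slope t test and the confidence interval for the slope can both be reproduced from the produce-store data. The sketch below is illustrative only: names are assumptions, and the critical value 2.5706 for 5 degrees of freedom is taken from the slides rather than computed.

```python
import math

# Produce-store data from the slides.
square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
annual_sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]  # in $000

n = len(square_feet)
mean_x = sum(square_feet) / n
mean_y = sum(annual_sales) / n
ssx = sum((x - mean_x) ** 2 for x in square_feet)
b1 = sum((x - mean_x) * (y - mean_y)
         for x, y in zip(square_feet, annual_sales)) / ssx
b0 = mean_y - b1 * mean_x

sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(square_feet, annual_sales))
s_yx = math.sqrt(sse / (n - 2))      # standard error of estimate
s_b1 = s_yx / math.sqrt(ssx)         # standard error of the slope

t_stat = b1 / s_b1                   # test of H0: beta_1 = 0
t_crit = 2.5706                      # t table value for 5 d.f., as on the slides
lower = b1 - t_crit * s_b1
upper = b1 + t_crit * s_b1

print(f"S_YX = {s_yx:.2f}, S_b1 = {s_b1:.4f}, t = {t_stat:.4f}")
print(f"95% CI for the slope: ({lower:.3f}, {upper:.3f})")
```

The t statistic matches the value from the correlation t test, and since the interval excludes 0 the two approaches lead to the same decision.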
-
ESTIMATION OF MEAN VALUES
Confidence Interval Estimate for $\mu_{Y|X=X_i}$: The Mean of Y given a particular $X_i$
$\hat{Y}_i \pm t_{n-2}\, S_{YX} \sqrt{\frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum_{i=1}^{n}(X_i - \bar{X})^2}}$
$t_{n-2}$: t value from table with df = n - 2
$S_{YX}$: Standard error of the estimate
Size of the interval varies according to the distance away from the mean, $\bar{X}$
-
PREDICTION OF INDIVIDUAL VALUES
Prediction Interval for Individual Response $Y_i$ at a Particular $X_i$
$\hat{Y}_i \pm t_{n-2}\, S_{YX} \sqrt{1 + \frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum_{i=1}^{n}(X_i - \bar{X})^2}}$
Addition of 1 increases the width of the interval from that for the mean of Y
-
EXAMPLE: PRODUCE STORES
Data for 7 Stores:
Store   Square Feet   Annual Sales ($000)
1       1,726         3,681
2       1,542         3,395
3       2,816         6,653
4       5,555         9,543
5       1,292         3,318
6       2,208         5,563
7       1,313         3,760
Regression Equation Obtained:
$\hat{Y}_i = 1636.415 + 1.487 X_i$
Consider a store with 2000 square feet.
-
ESTIMATION OF MEAN VALUES: EXAMPLE
Confidence Interval Estimate for $\mu_{Y|X=X_i}$
Find the 95% confidence interval for the average annual sales for stores of 2,000 square feet
Predicted Sales (in thousands of dollars):
$\hat{Y}_i = 1636.415 + 1.487 X_i = 4610.45$
$\bar{X} = 2350.29$, $S_{YX} = 611.75$, $t_{n-2} = t_5 = 2.5706$
$\hat{Y}_i \pm t_{n-2}\, S_{YX} \sqrt{\frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum_{i=1}^{n}(X_i - \bar{X})^2}} = 4610.45 \pm 612.66$
-
PREDICTION INTERVAL FOR Y: EXAMPLE
Prediction Interval for Individual $Y_{X=X_i}$
Find the 95% prediction interval for annual sales of one particular store of 2,000 square feet
Predicted Sales (in thousands of dollars):
$\hat{Y}_i = 1636.415 + 1.487 X_i = 4610.45$
$\bar{X} = 2350.29$, $S_{YX} = 611.75$, $t_{n-2} = t_5 = 2.5706$
$\hat{Y}_i \pm t_{n-2}\, S_{YX} \sqrt{1 + \frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum_{i=1}^{n}(X_i - \bar{X})^2}} = 4610.45 \pm 1687.68$
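Both interval half-widths for a 2,000 square foot store can be recomputed from the raw data. The sketch below assumes the same critical value, 2.5706, that the slides use; all names are illustrative. Because it keeps the unrounded coefficients, its point prediction is about 4609.7 rather than the 4610.45 obtained on the slides with the rounded slope.

```python
import math

# Produce-store data from the slides.
square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
annual_sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]  # in $000

n = len(square_feet)
mean_x = sum(square_feet) / n
mean_y = sum(annual_sales) / n
ssx = sum((x - mean_x) ** 2 for x in square_feet)
b1 = sum((x - mean_x) * (y - mean_y)
         for x, y in zip(square_feet, annual_sales)) / ssx
b0 = mean_y - b1 * mean_x
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(square_feet, annual_sales))
s_yx = math.sqrt(sse / (n - 2))

x_new = 2000
t_crit = 2.5706                                  # t table value for 5 d.f.
y_hat = b0 + b1 * x_new                          # point prediction
leverage = 1 / n + (x_new - mean_x) ** 2 / ssx   # term under the square root

ci_half = t_crit * s_yx * math.sqrt(leverage)        # mean of Y at x_new
pi_half = t_crit * s_yx * math.sqrt(1 + leverage)    # one individual Y at x_new

print(f"predicted sales = {y_hat:.2f}")
print(f"95% confidence interval: +/- {ci_half:.2f}")
print(f"95% prediction interval: +/- {pi_half:.2f}")
```

The prediction interval is much wider than the confidence interval because it also carries the variability of a single store around the mean.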
-
MULTIPLE REGRESSION
-
TOPICS
The Multiple Regression Model
Residual Analysis
Coefficient of Multiple Determination
-
THE MULTIPLE REGRESSION MODEL
Relationship between 1 dependent & 2 or more independent variables is a linear function
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i$
$Y_i$: Dependent (Response) variable
$\beta_0$: Population Y-intercept
$\beta_1, \ldots, \beta_k$: Population slopes
$X_{1i}, \ldots, X_{ki}$: Independent (Explanatory) variables
$\varepsilon_i$: Random Error
-
MULTIPLE REGRESSION EQUATION
The coefficients of the multiple regression model are estimated using sample data
Multiple regression equation with k independent variables:
$\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + \cdots + b_k X_{ki}$
$\hat{Y}_i$: Estimated (or predicted) value of Y
$b_0$: Estimated intercept
$b_1, \ldots, b_k$: Estimated slope coefficients
-
MULTIPLE REGRESSION EQUATION
Example with two independent variables:
$\hat{Y} = b_0 + b_1 X_1 + b_2 X_2$
[Figure: fitted regression plane for Y over the $X_1$ and $X_2$ axes]
-
INTERPRETATION OF ESTIMATED COEFFICIENTS
Slope ($b_i$)
The average value of Y is estimated to change by $b_i$ for each 1 unit increase in $X_i$, holding all other variables constant
Example: If $b_1 = -2$, then fuel oil usage (Y) is expected to decrease by an estimated 2 gallons for each 1 degree increase in temperature ($X_1$), given the inches of insulation ($X_2$)
Y-Intercept ($b_0$)
The estimated average value of Y when all $X_i = 0$
-
MULTIPLE REGRESSION MODEL: EXAMPLE
Develop a model for estimating heating oil used for a single family home in the month of January based on average temperature and amount of insulation in inches.
Oil (Gal)   Temp (°F)   Insulation
275.30      40          3
363.80      27          3
164.30      40          10
40.80       73          6
94.30       64          6
230.90      34          6
366.70      9           6
300.60      8           10
237.80      23          10
121.40      63          3
31.40       65          10
203.50      41          6
441.10      21          3
323.00      38          3
52.50       58          10
-
MULTIPLE REGRESSION EQUATION: EXAMPLE
$\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + \cdots + b_k X_{ki}$
Excel Output:
               Coefficients
Intercept      562.1510092
X Variable 1   -5.436580588
X Variable 2   -20.01232067
$\hat{Y}_i = 562.151 - 5.437 X_{1i} - 20.012 X_{2i}$
For each degree increase in temperature, the estimated average amount of heating oil used decreases by 5.437 gallons, holding insulation constant.
For each increase of one inch of insulation, the estimated average use of heating oil decreases by 20.012 gallons, holding temperature constant.
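The Excel coefficients for the heating-oil example can be reproduced by solving the normal equations $(X^T X)\,b = X^T Y$. The sketch below does this with plain Python lists and Gauss-Jordan elimination; the slides themselves rely on Excel, so the helper functions `solve` and `fit` are assumptions introduced only for illustration.

```python
# Heating-oil data from the example table: gallons, average temperature
# in degrees F, and inches of insulation.
oil = [275.3, 363.8, 164.3, 40.8, 94.3, 230.9, 366.7, 300.6,
       237.8, 121.4, 31.4, 203.5, 441.1, 323.0, 52.5]
temp = [40, 27, 40, 73, 64, 34, 9, 8, 23, 63, 65, 41, 21, 38, 58]
insulation = [3, 3, 10, 6, 6, 6, 6, 10, 10, 3, 10, 6, 3, 3, 10]


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit(y, *xs):
    """Least squares coefficients [b0, b1, ..., bk] from the normal equations."""
    cols = [[1.0] * len(y)] + [[float(v) for v in x] for x in xs]
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(p * q for p, q in zip(ci, y)) for ci in cols]
    return solve(xtx, xty)


b0, b1, b2 = fit(oil, temp, insulation)
print(f"Y-hat = {b0:.3f} + ({b1:.3f}) X1 + ({b2:.3f}) X2")

# Predicted oil use at 30 degrees F with 6 inches of insulation.
prediction = b0 + b1 * 30 + b2 * 6
print(f"prediction = {prediction:.2f} gallons")
```

With unrounded coefficients the prediction for 30 degrees and 6 inches of insulation is about 278.98 gallons, slightly different from a hand calculation with rounded coefficients.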
-
STANDARD ERROR OF ESTIMATE FOR MULTIPLE REGRESSION
The standard error of estimate of dependent variable Y on the independent variables
$s_e = \sqrt{\frac{\sum (Y - \hat{Y})^2}{n-k-1}}$
-
COEFFICIENT OF MULTIPLE DETERMINATION
Proportion of Total Variation in Y Explained by All X Variables Taken Together
$r^2_{Y.12\cdots k} = \frac{SSR}{SST} = \frac{\text{Explained Variation}}{\text{Total Variation}}$
Never Decreases When a New X Variable is Added to Model
-
COEFFICIENT OF MULTIPLE DETERMINATION
Regression Statistics
Multiple R          0.72213
R Square            0.52148
Adjusted R Square   0.44172
Standard Error      47.46341
Observations        15

ANOVA        df   SS          MS          F         Significance F
Regression   2    29460.027   14730.013   6.53861   0.01201
Residual     12   27033.306   2252.776
Total        14   56493.333

              Coefficients   Standard Error   t Stat     P-value   Lower 95%   Upper 95%
Intercept     306.52619      114.25389        2.68285    0.01993   57.58835    555.4640
Price         -24.97509      10.83213         -2.30565   0.03979   -48.57626   -1.37392
Advertising   74.13096       25.96732         2.85478    0.01449   17.55303    130.7088

$R^2 = \frac{SSR}{SST} = \frac{29460.0}{56493.3} = .52148$
52.1% of the variation in pie sales is explained by the variation in price and advertising
-
ADJUSTED COEFFICIENT OF MULTIPLE DETERMINATION
Adding additional variables will reduce the SSE (or leave it unchanged) and so will not decrease $r^2$. To account for this, the adjusted coefficient of determination is given by
$r^2_{adj} = 1 - \left(1 - r^2_{Y.12\cdots k}\right)\frac{n-1}{n-k-1}$
Proportion of Variation in Y Explained by All X Variables, Adjusted for the Number of X Variables Used and Sample Size
Penalizes Excessive Use of Independent Variables
Smaller than $r^2_{Y.12\cdots k}$
Useful in Comparing among Models having different explanatory variables
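The adjustment is a one-line calculation. The sketch below applies it to the two R Square values quoted in the Excel outputs of these slides (pie sales and heating oil, each with n = 15 and k = 2); the function name is an assumption.

```python
def adjusted_r2(r2, n, k):
    """Adjusted r^2 = 1 - (1 - r^2) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


pie_sales = adjusted_r2(r2=0.52148, n=15, k=2)
heating_oil = adjusted_r2(r2=0.965610371, n=15, k=2)

print(f"pie sales:   adjusted r^2 = {pie_sales:.5f}")
print(f"heating oil: adjusted r^2 = {heating_oil:.5f}")
```

Both results should match the "Adjusted R Square" lines of the corresponding Excel outputs, and each is smaller than the unadjusted value.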
-
ADJUSTED $R^2$
Regression Statistics
Multiple R          0.72213
R Square            0.52148
Adjusted R Square   0.44172
Standard Error      47.46341
Observations        15

ANOVA        df   SS          MS         F         Significance F
Regression   2    29460.027   14730.01   6.53861   0.01201
Residual     12   27033.306   2252.776
Total        14   56493.333

              Coefficients   Standard Error   t Stat     P-value   Lower 95%   Upper 95%
Intercept     306.52619      114.25389        2.68285    0.01993   57.58835    555.46404
Price         -24.97509      10.83213         -2.30565   0.03979   -48.57626   -1.37392
Advertising   74.13096       25.96732         2.85478    0.01449   17.55303    130.70888

$r^2_{adj} = .44172$
44.2% of the variation in pie sales is explained by the variation in price and advertising, taking into account the sample size and number of independent variables
-
COEFFICIENT OF MULTIPLE DETERMINATION
Excel Output
Regression Statistics
Multiple R          0.982654757
R Square            0.965610371
Adjusted R Square   0.959878766
Standard Error      26.01378323
Observations        15
$r^2_{Y.12} = \frac{SSR}{SST}$ (R Square)
Adjusted $r^2$ reflects the number of explanatory variables and sample size, and is smaller than $r^2$
-
INTERPRETATION OF COEFFICIENT OF MULTIPLE DETERMINATION
$r^2_{Y.12} = \frac{SSR}{SST} = .9656$
96.56% of the total variation in heating oil can be explained by temperature and amount of insulation
$r^2_{adj} = .9599$
95.99% of the total variation in heating oil can be explained by temperature and amount of insulation after adjusting for the number of explanatory variables and sample size
-
USING THE REGRESSION EQUATION TO MAKE PREDICTIONS
Predict the amount of heating oil used for a home if the average temperature is 30°F and the insulation is 6 inches.
$\hat{Y}_i = 562.151 - 5.437 X_{1i} - 20.012 X_{2i} = 562.151 - 5.437(30) - 20.012(6) = 278.969$
The predicted heating oil used is 278.97 gallons
-
RESIDUAL PLOTS
Residuals Vs $\hat{Y}$
Residuals Vs $X_1$
Residuals Vs $X_2$
Residuals Vs Time (May have autocorrelation)
-
RESIDUAL PLOTS: EXAMPLE
[Figure: Temperature Residual Plot, residuals against temperature. Maybe some non-linear relationship]
[Figure: Insulation Residual Plot, residuals against insulation. No Discernible Pattern]
-
TESTING FOR OVERALL SIGNIFICANCE
Shows if there is a Linear Relationship between all of the X Variables together and Y
Use F test Statistic
Hypotheses:
H0: $\beta_1 = \beta_2 = \cdots = \beta_k = 0$ (No linear relationship)
H1: At least one $\beta_i \neq 0$ (At least one independent variable affects Y)
The Null Hypothesis is a Very Strong Statement
The Null Hypothesis is Almost Always Rejected
-
TESTING FOR OVERALL SIGNIFICANCE (continued)
Test Statistic:
$F = \frac{MSR}{MSE} = \frac{SSR(\text{all})/k}{MSE(\text{all})}$
where F has k numerator and (n-k-1) denominator degrees of freedom
-
TEST FOR OVERALL SIGNIFICANCE: EXCEL OUTPUT EXAMPLE
ANOVA
             df   SS         MS         F          Significance F
Regression   2    228014.6   114007.3   168.4712   1.65411E-09
Residual     12   8120.603   676.7169
Total        14   236135.2
Regression df: k = 2, the number of explanatory variables; Total df: n - 1
Test Statistic: F = MSR / MSE (the F column); Significance F is the p value
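The F statistic in the ANOVA output follows directly from the sums of squares. The sketch below recomputes it from the SS column of the heating-oil output; the critical value 3.89 for 2 and 12 degrees of freedom is the one quoted in the example solution, and the variable names are assumptions.

```python
# Sums of squares from the heating-oil ANOVA output.
ssr = 228014.6     # regression (explained)
sse = 8120.603     # residual (unexplained)
n, k = 15, 2       # observations and explanatory variables

msr = ssr / k                # mean square regression
mse = sse / (n - k - 1)      # mean square error
f_stat = msr / mse

f_crit = 3.89                # F critical value for (2, 12) d.f. at alpha = 0.05
decision = "reject H0" if f_stat > f_crit else "do not reject H0"
print(f"MSR = {msr:.1f}, MSE = {mse:.4f}, F = {f_stat:.2f}: {decision}")
```

The computed F of about 168.47 is far above the critical value, so the null hypothesis that all slopes are zero is rejected.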
-
TEST FOR OVERALL SIGNIFICANCE: EXAMPLE SOLUTION
H0: $\beta_1 = \beta_2 = \cdots = \beta_k = 0$
H1: At least one $\beta_i \neq 0$
α = .05
df = 2 and 12
Critical Value: F = 3.89
Test Statistic: F = 168.47 (Excel Output)
Decision: Reject H0 at α = 0.05
Conclusion: There is evidence that at least one independent variable affects Y
-
TEST FOR SIGNIFICANCE: INDIVIDUAL VARIABLES
Shows if There is a Linear Relationship Between the Variable $X_i$ and Y
Use t Test Statistic
Hypotheses:
H0: $\beta_i = 0$ (No linear relationship)
H1: $\beta_i \neq 0$ (Linear relationship between $X_i$ and Y)
-
t TEST STATISTIC OUTPUT: EXAMPLE
               Coefficients    Standard Error   t Stat
Intercept      562.1510092     21.09310433      26.65093769
X Variable 1   -5.436580588    0.336216167      -16.16989642
X Variable 2   -20.01232067    2.342505227      -8.543127434
The t Stat column gives the t Test Statistic for $X_1$ (Temperature) and for $X_2$ (Insulation)
$t = \frac{b_i}{S_{b_i}}$
-
t TEST: EXAMPLE SOLUTION
Does temperature have a significant effect on monthly consumption of heating oil? Test at α = 0.05.
H0: $\beta_1 = 0$
H1: $\beta_1 \neq 0$
df = 12
Critical Values: ±2.1788 (.025 in each rejection tail)
Test Statistic: t = -16.1699
Decision: Reject H0 at α = 0.05
Conclusion: There is evidence of a significant effect of temperature on oil consumption.
-
CONFIDENCE INTERVAL ESTIMATE FOR THE SLOPE
Confidence interval for the population slope $\beta_i$:
$b_i \pm t_{n-k-1} S_{b_i}$, where t has (n - k - 1) d.f.
              Coefficients   Standard Error
Intercept     306.52619      114.25389
Price         -24.97509      10.83213
Advertising   74.13096       25.96732
Here, t has (15 - 2 - 1) = 12 d.f.
Example: Form a 95% confidence interval for the effect of changes in price ($X_1$) on pie sales, holding constant the effects of advertising:
$-24.975 \pm (2.1788)(10.832)$, so the interval is (-48.576, -1.374)
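The interval arithmetic for the price coefficient can be sketched as follows. The coefficient, its standard error and the critical value 2.1788 are the figures quoted on the slide; the function name is an assumption made for this illustration.

```python
def slope_confidence_interval(b, se, t_crit):
    """Return (lower, upper) for b +/- t_crit * se."""
    margin = t_crit * se
    return b - margin, b + margin


# Price coefficient and standard error from the pie-sales Excel output;
# t critical value for 12 d.f. at 95% confidence as quoted on the slide.
lower, upper = slope_confidence_interval(b=-24.97509, se=10.83213, t_crit=2.1788)
print(f"95% CI for the price slope: ({lower:.3f}, {upper:.3f})")
```

The same call with the Advertising row (74.13096 and 25.96732) gives the interval for the advertising slope, which can be compared with the Lower 95% and Upper 95% columns of the Excel output.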
-
ASSUMPTIONS OF REGRESSION
L.I.N.E
Linearity
The relationship between X and Y is linear
Independence of Errors
Error values are statistically independent
Normality of Error
Error values are normally distributed for any given value of X
Equal Variance (also called homoscedasticity)
The probability distribution of the errors has constant variance
-
VARIATION OF ERRORS AROUND THE REGRESSION LINE
Y values are normally distributed around the regression line.
For each X value, the spread or variance around the regression line is the same.
[Figure: error distributions $f(\varepsilon)$ at $X_1$ and $X_2$, centered on the sample regression line]

PURPOSES OF RESIDUAL ANALYSIS

Examine for linearity assumption
Examine for constant variance for all levels of x
Evaluate normal distribution assumption

GRAPHICAL ANALYSIS OF RESIDUALS
Can plot residuals vs. x
Can create histogram of residuals to check for normality

RESIDUAL ANALYSIS

The residual for observation i, $e_i$, is the difference between its observed and predicted value:

$e_i = Y_i - \hat{Y}_i$

Check the assumptions of regression by examining the residuals
Examine for Linearity assumption
Evaluate Independence assumption
Evaluate Normal distribution assumption
Examine Equal variance for all levels of X
Graphical Analysis of Residuals
Can plot residuals vs. X

RESIDUAL ANALYSIS FOR LINEARITY

[Figure: two panels, "Not Linear" and "Linear", each showing Y vs. x above the corresponding plot of residuals vs. x]

RESIDUAL ANALYSIS FOR INDEPENDENCE

[Figure: plots of residuals vs. X, labeled "Not Independent" and "Independent"]

CHECKING FOR NORMALITY

Examine the Stem-and-Leaf Display of the Residuals
Examine the Box-and-Whisker Plot of the Residuals
Examine the Histogram of the Residuals
Construct a Normal Probability Plot of the Residuals

RESIDUAL ANALYSIS FOR EQUAL VARIANCE

[Figure: two panels, "Unequal variance" and "Equal variance", each showing Y vs. x above the corresponding plot of residuals vs. x]

LINEAR REGRESSION EXAMPLE: EXCEL RESIDUAL OUTPUT

RESIDUAL OUTPUT

Observation   Predicted House Price   Residuals
1             251.92316               -6.923162
2             273.87671               38.12329
3             284.85348               -5.853484
4             304.06284               3.937162
5             218.99284               -19.99284
6             268.38832               -49.38832
7             356.20251               48.79749
8             367.17929               -43.17929
9             254.6674                64.33264
10            284.85348               -29.85348

[Figure: House Price Model Residual Plot, Residuals (axis from -60 to 80) vs. Square Feet (axis from 0 to 3000)]

Does not appear to violate any regression assumptions
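
As a quick check on the residual output, the sketch below recovers each observed price from $e_i = Y_i - \hat{Y}_i$ and sums the residuals. The two lists are copied from the table; the variable names are illustrative. It relies on the standard property that least-squares residuals sum to zero when the model includes an intercept.

```python
# Predicted values and residuals copied from the RESIDUAL OUTPUT table.
predicted = [251.92316, 273.87671, 284.85348, 304.06284, 218.99284,
             268.38832, 356.20251, 367.17929, 254.6674, 284.85348]
residuals = [-6.923162, 38.12329, -5.853484, 3.937162, -19.99284,
             -49.38832, 48.79749, -43.17929, 64.33264, -29.85348]

# e_i = Y_i - Yhat_i, so the observed value is Yhat_i + e_i.
observed = [y_hat + e for y_hat, e in zip(predicted, residuals)]

# With an intercept in the model, least-squares residuals sum to zero
# (up to the rounding in the printed output).
residual_sum = sum(residuals)

print([round(y, 2) for y in observed])
print(f"sum of residuals = {residual_sum:.6f}")
```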

AUTOCORRELATION

One of the assumptions of the regression model is that the errors $E_i$ and $E_j$, associated with the ith and jth observations, are uncorrelated.
Autocorrelation is correlation of the errors (residuals) over time.

DURBIN-WATSON TEST FOR AR(1) AUTOCORRELATION

$$d = \frac{\sum_{t=2}^{T} (e_t - e_{t-1})^2}{\sum_{t=1}^{T} e_t^2}$$

The standard test statistic for autocorrelation of the AR(1) type is the Durbin-Watson d statistic, computed from the residuals as shown above. Most regression applications calculate it automatically and present it as one of the standard regression diagnostics.
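
The d statistic is simple to compute by hand. The following is a minimal sketch of the formula in plain Python; the function name `durbin_watson` is illustrative, and the two residual series are made-up toy values chosen so the arithmetic is easy to follow.

```python
def durbin_watson(residuals):
    """d = sum_{t=2..T} (e_t - e_{t-1})^2 / sum_{t=1..T} e_t^2."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2
        for t in range(1, len(residuals))
    )
    denominator = sum(e * e for e in residuals)
    if denominator == 0:
        raise ValueError("all residuals are zero")
    return numerator / denominator


# Toy residuals (not from the slides).
slow_changing = [1.0, 1.0, -1.0, -1.0]   # neighbours mostly alike
alternating = [1.0, -1.0, 1.0, -1.0]     # neighbours always opposite

print(durbin_watson(slow_changing))   # (0 + 4 + 0) / 4
print(durbin_watson(alternating))     # (4 + 4 + 4) / 4
```

Residuals that tend to keep their sign from one period to the next give a small d, while residuals that flip sign give a large d.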

It can be shown that in large samples d tends to $2 - 2\rho$, where $\rho$ is the parameter in the AR(1) relationship $u_t = \rho u_{t-1} + \varepsilon_t$.

If there is no autocorrelation, $\rho$ is 0 and d should be distributed randomly around 2.

If there is severe positive autocorrelation, $\rho$ will be near 1 and d will be near 0.

Likewise, if there is severe negative autocorrelation, $\rho$ will be near $-1$ and d will be near 4.

In large samples: $d \to 2 - 2\rho$
No autocorrelation: $d \to 2$
Severe positive autocorrelation: $d \to 0$
Severe negative autocorrelation: $d \to 4$
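
The large-sample result can be illustrated by simulation. The sketch below is an illustration under stated assumptions, not part of the original slides: it generates an AR(1) series with standard normal innovations, treats the simulated disturbances themselves as the residuals (rather than residuals from a fitted regression), and compares d with $2 - 2\rho$. The names `simulate_ar1` and `durbin_watson`, the sample size, and the seed are all arbitrary choices.

```python
import random


def durbin_watson(residuals):
    """d = sum (e_t - e_{t-1})^2 / sum e_t^2."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2
        for t in range(1, len(residuals))
    )
    return numerator / sum(e * e for e in residuals)


def simulate_ar1(rho, n, seed):
    """Generate u_t = rho * u_{t-1} + eps_t with standard normal eps_t."""
    rng = random.Random(seed)
    u = [rng.gauss(0.0, 1.0)]
    for _ in range(n - 1):
        u.append(rho * u[-1] + rng.gauss(0.0, 1.0))
    return u


# Positive, zero, and negative autocorrelation.
results = {
    rho: durbin_watson(simulate_ar1(rho, n=20000, seed=1))
    for rho in (0.9, 0.0, -0.9)
}
for rho, d in results.items():
    print(f"rho = {rho:+.1f}: d = {d:.2f}, 2 - 2*rho = {2 - 2 * rho:.2f}")
```

With a sample this large, the simulated d should land close to 0.2, 2, and 3.8 respectively.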

[Figure: d scale from 0 to 4, with positive autocorrelation toward 0, no autocorrelation around 2, and negative autocorrelation toward 4; later builds mark a critical value $d_{crit}$ on each side of 2, with the bounds $d_L$ and $d_U$ bracketing the lower one]

Thus d behaves as illustrated graphically above.

To perform the Durbin-Watson test, we define critical values of d. The null hypothesis is $H_0: \rho = 0$ (no autocorrelation). If d lies between these values, we do not reject the null hypothesis.

The critical values, at any significance level, depend on the number of observations in the sample and the number of explanatory variables.

Unfortunately, they also depend on the actual data for the explanatory variables in the sample, and thus vary from sample to sample.

However, Durbin and Watson determined upper and lower bounds, $d_U$ and $d_L$, for the critical values, and these are presented in standard tables.

If d is less than $d_L$, it must also be less than the critical value of d for positive autocorrelation, and so we would reject the null hypothesis and conclude that there is positive autocorrelation.

If d is above $d_U$, it must also be above the critical value of d, and so we would not reject the null hypothesis. (Of course, if it were above 2, we should consider testing for negative autocorrelation instead.)

If d lies between $d_L$ and $d_U$, we cannot tell whether it is above or below the critical value and so the test is indeterminate.

Here are $d_L$ and $d_U$ for 45 observations and two explanatory variables, at the 5% significance level.

[Figure: d scale from 0 to 4 with $d_L = 1.43$ and $d_U = 1.62$ marked, together with their mirror images 2.38 and 2.57 to the right of 2 (n = 45, k = 3, 5% level)]

There are similar bounds for the critical value in the case of negative autocorrelation. They are not given in the standard tables because negative autocorrelation is uncommon, but it is easy to calculate them because they are located symmetrically to the right of 2: $4 - d_U = 2.38$ and $4 - d_L = 2.57$.

So if d < 1.43, we reject the null hypothesis and conclude that there is positive autocorrelation.
If 1.43 < d < 1.62, the test is indeterminate and we do not come to any conclusion.
If 1.62 < d < 2.38, we do not reject the null hypothesis of no autocorrelation.
If 2.38 < d < 2.57, we do not come to any conclusion.
If d > 2.57, we conclude that there is significant negative autocorrelation.
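
The five regions above can be written as one small function. This is a sketch rather than a standard library routine: `durbin_watson_decision` is a hypothetical name, the bounds are the 5% values quoted on the slides for n = 45 and k = 3, and how a value exactly equal to a bound is classified is an assumption here, since the slides only give strict inequalities.

```python
def durbin_watson_decision(d, d_lower, d_upper):
    """Classify d using the tabulated bounds dL and dU.

    The bounds for negative autocorrelation are the mirror images
    4 - dU and 4 - dL. Ties at a bound are resolved arbitrarily.
    """
    if not 0.0 <= d <= 4.0:
        raise ValueError("d must lie between 0 and 4")
    if d < d_lower:
        return "positive autocorrelation"
    if d < d_upper:
        return "indeterminate"
    if d <= 4 - d_upper:
        return "no autocorrelation (do not reject H0)"
    if d <= 4 - d_lower:
        return "indeterminate"
    return "negative autocorrelation"


D_L, D_U = 1.43, 1.62  # 5% level, n = 45, k = 3 (from the slides)

# Made-up values of d, one per region.
for d in (1.0, 1.5, 2.0, 2.5, 3.0):
    print(d, "->", durbin_watson_decision(d, D_L, D_U))
```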

Here are the bounds for the critical values for the 1% test, again with 45 observations and two explanatory variables.

[Figure: d scale from 0 to 4 with $d_L = 1.24$, $d_U = 1.42$, $4 - d_U = 2.58$, and $4 - d_L = 2.76$ marked (n = 45, k = 3, 1% level)]

Here is a plot of the residuals from a logarithmic regression of expenditure on housing services on income and the relative price of housing services. The residuals exhibit strong positive autocorrelation.

[Figure: residuals plotted against year, 1959 to 2003; vertical axis from -0.04 to 0.04]

============================================================
Dependent Variable: LGHOUS
Method: Least Squares
Sample: 1959 2003
Included observations: 45
============================================================
Variable      Coefficient   Std. Error   t-Statistic   Prob.
============================================================
C             0.005625      0.167903     0.033501      0.9734
LGDPI         1.031918      0.006649     155.1976      0.0000
LGPRHOUS      -0.483421     0.041780     -11.57056     0.0000
============================================================
R-squared            0.998583   Mean dependent var   6.359334
Adjusted R-squared   0.998515   S.D. dependent var   0.437527
S.E. of regression   0.016859   Akaike info criter   -5.263574
Sum squared resid    0.011937   Schwarz criterion    -5.143130
Log likelihood       121.4304   F-statistic          14797.05
Durbin-Watson stat   0.633113   Prob(F-statistic)    0.000000
============================================================

$d_L = 1.24$, $d_U = 1.42$ (n = 45, k = 3, 1% level)
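
The transcript shows the statistic and the 1% bounds but not the conclusion drawn from them. Applying the rule given earlier (reject the null hypothesis when d is below $d_L$) gives the sketch below. The estimate of $\rho$ is only the rough value implied by the large-sample relationship $d \to 2 - 2\rho$, not a figure reported in the output; the variable names are illustrative.

```python
d = 0.633113            # Durbin-Watson stat from the regression output
D_L, D_U = 1.24, 1.42   # 1% bounds for n = 45, k = 3 (from the slide)

# d below dL: reject H0 (no autocorrelation) in favour of
# positive autocorrelation.
reject_h0 = d < D_L

# Rough implied value of rho from d ~ 2 - 2*rho (large-sample relationship).
rho_hat = 1 - d / 2

print(f"reject H0 at the 1% level: {reject_h0}")
print(f"implied rho is roughly {rho_hat:.2f}")
```

A d this far below $d_L$ is in line with the residual plot, which was described as showing strong positive autocorrelation.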