EC339: Lecture 7, Chapters 4-5: Analytical Solutions to OLS
TRANSCRIPT
The Linear Regression Model
Postulate: The dependent variable, Y, is a function of the explanatory variable, X, or Y_i = f(X_i).
However, the relationship is not deterministic: the value of Y is not completely determined by the value of X.
Thus, we incorporate an error term (residual) into the model, which gives a statistical relationship: Y_i = f(X_i) + u_i.
The Simple Linear Regression Model (SLR)
Remember, we are trying to predict Y for a given X. We assume a relationship that is linear in the parameters (i.e., the BETAS).
Ceteris paribus: all else held equal.
To account for our ERROR in prediction, we add an error term to our prediction. Errors are typically written as u, or epsilon, representing ANYTHING ELSE that might cause a deviation between actual and predicted values.
We are interested in determining the intercept (\beta_0) and slope (\beta_1):
\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X
Y = \hat{Y} + \hat{u}
Y = \beta_0 + \beta_1 X + u
SLR Uses Multivariate Expectations
Univariate distributions: means, variances, standard deviations.
Multivariate distributions: correlation, covariance; marginal, joint, and conditional probabilities.
E[Y|x] = \sum_{j=1}^{m} y_j f_{Y|X}(y_j|x) = \sum_{j=1}^{m} y_j \frac{f_{X,Y}(x, y_j)}{f_X(x)}
Here f_X(x) is the marginal probability density function, f_{X,Y}(x, y) is the joint probability density function, f_{Y|X}(y|x) is the conditional probability density function, and E[Y|x] is the conditional expectation.
Joint Distributions: Joint Probability Density Functions
Now we want to consider how Y and X are distributed when considered together:
f_{X,Y}(x, y) = P(X = x, Y = y)
INDEPENDENCE: When the outcomes of X and Y have no influence on one another, the joint probability density function is equal to the product of the marginal probability density functions:
f_{X,Y}(x, y) = f_X(x) f_Y(y)
Think about BINOMIAL DISTRIBUTIONS: each TRIAL is INDEPENDENT and has no effect on the subsequent trial. Also, think of marginal distributions much like a histogram of a single variable.
Conditional Distributions: Conditional Probability Density Functions
Now we want to consider how Y is distributed GIVEN a certain value of X. The conditional probability of Y occurring given X is equal to the joint probability of X and Y, divided by the marginal probability of X occurring in the first place:
f_{Y|X}(y|x) = \frac{f_{X,Y}(x, y)}{f_X(x)}
INDEPENDENCE: If X and Y are independent, the conditional distributions reduce to the marginal distributions, just as if conditioning provides no new information:
f_{Y|X}(y|x) = \frac{f_X(x) f_Y(y)}{f_X(x)} = f_Y(y)
f_{X|Y}(x|y) = \frac{f_X(x) f_Y(y)}{f_Y(y)} = f_X(x)
A joint probability is like finding the probability of a “high school graduate” with an hourly wage between “$8 and $10” if looking at education and wage data.
Discrete Bivariate Distributions: Joint Probability Function
For example, assume we flip a coin 3 times, recording the number of heads (H). Let X = the number of heads on the last (3rd) flip and Y = the total number of heads in three flips.
S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}; X takes on the values {0,1}; Y takes on the values {0,1,2,3}.
There are 8 possible different joint outcomes: (X=0,Y=0), (X=0,Y=1), (X=0,Y=2), (X=0,Y=3), (X=1,Y=0), (X=1,Y=1), (X=1,Y=2), (X=1,Y=3).
Attaching a probability to each of the different joint outcomes gives us a discrete bivariate probability distribution, or joint probability function.
Thus, f(x,y) gives us the probability that the random variables X and Y assume the joint outcome (x,y).
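As a concrete illustration, here is a minimal Python sketch that enumerates the eight equally likely flip sequences and tabulates the joint probability function f(x, y) for this example:

```python
from itertools import product
from fractions import Fraction

# All 8 equally likely outcomes of three fair coin flips.
outcomes = list(product("HT", repeat=3))

# Joint probability function f(x, y):
#   x = number of heads on the 3rd flip (0 or 1)
#   y = total number of heads in three flips (0, 1, 2, or 3)
joint = {}
for o in outcomes:
    x = 1 if o[2] == "H" else 0
    y = o.count("H")
    joint[(x, y)] = joint.get((x, y), Fraction(0)) + Fraction(1, 8)

for (x, y), p in sorted(joint.items()):
    print(f"f(X={x}, Y={y}) = {p}")
# Impossible pairs such as (X=1, Y=0) never appear, so f = 0 there.
```

Note that, say, (X=1, Y=0) has probability zero: a head on the third flip guarantees at least one head in total.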
Properties of Covariance
Cov(X, Y) = E[XY] - E[X]E[Y]
If X and Y are independent, then E[XY] = E[X]E[Y], and therefore Cov(X, Y) = 0.
Since this expectation is a "function" of X, if X and Y are discrete,
E[g(X)] = \sum_{j=1}^{k} g(x_j) f_X(x_j)
and, remembering that g(X, Y) is a "function" of X and Y,
E[g(X, Y)] = \sum_{h=1}^{k} \sum_{j=1}^{m} g(x_h, y_j) f_{X,Y}(x_h, y_j)
If X and Y are continuous, the sums are replaced by integrals.
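A quick illustrative check of Cov(X, Y) = E[XY] - E[X]E[Y], applied to the coin-flip joint distribution built above (the covariance is nonzero, so X and Y are not independent there):

```python
from fractions import Fraction

# Joint pmf from the coin-flip example (X = heads on 3rd flip, Y = total heads).
F = Fraction
joint = {(0, 0): F(1, 8), (0, 1): F(2, 8), (0, 2): F(1, 8),
         (1, 1): F(1, 8), (1, 2): F(2, 8), (1, 3): F(1, 8)}

EX  = sum(x * p for (x, y), p in joint.items())      # E[X]  = 1/2
EY  = sum(y * p for (x, y), p in joint.items())      # E[Y]  = 3/2
EXY = sum(x * y * p for (x, y), p in joint.items())  # E[XY] = 1

print("Cov(X,Y) =", EXY - EX * EY)  # 1/4 != 0, so X and Y are dependent
```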
Properties of Conditional Expectations
E[Y|x] = \sum_{j=1}^{m} y_j f_{Y|X}(y_j|x) = \sum_{j=1}^{m} y_j \frac{f_{X,Y}(x, y_j)}{f_X(x)}
We are summing over all possible values of Y_i in the conditional expectation.
Conditional Expectations (see Wooldridge):
E[WAGE|EDUC] = 1.05 + 0.45 \, EDUC
This is a regression of wages (Y) on education (X). E[WAGE|EDUC = 12] is the expected value of a wage given that years of education is 12, giving a value of 1.05 + 0.45(12) = $6.45, much like the predictions we have seen.
The Linear Regression Model
Ceteris paribus: all else held equal. Conditional expectations can be linear or nonlinear; we will only examine LINEAR functions here.
E[Y|x] = \sum_{j=1}^{m} y_j f_{Y|X}(y_j|x) = \sum_{j=1}^{m} y_j \frac{f_{X,Y}(x, y_j)}{f_X(x)}
The Linear Regression Model
For any given level of X, many possible values of Y can exist. If Y is a linear function of X, then
Y_i = \beta_0 + \beta_1 X_i + u_i
where u represents the deviation between the actual value of Y and the predicted value of Y (that is, \beta_0 + \beta_1 X_i).
We are interested in determining the intercept (\beta_0) and slope (\beta_1).
The Simple Linear Regression Model (SLR)
Thus, what we are looking for is the conditional expectation of Y given values of X. This is what we have called Y-hat thus far. We are trying to predict values of Y given values of X. To do this we must hold ALL OTHER FACTORS FIXED (ceteris paribus).
E[Y|X] = \hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X
Y = \hat{Y} + \hat{u}
The Simple Linear Regression Model (SLR): LINEAR POPULATION REGRESSION FUNCTION
We can assume that the EXPECTED VALUE of our error term is zero. If the value were NOT equal to zero, we could make it zero by altering the INTERCEPT to absorb the nonzero mean.
This makes no statement about how X and the errors are related. IF u and X are unrelated linearly, their CORRELATION will equal zero. Correlation is not sufficient, though, since they could be related NONLINEARLY. The conditional mean gives a sufficient condition, as it looks at ALL values of u given a value for X. This is the zero conditional mean error assumption:
E[u] = 0
E[u|x] = E[u] = 0
The Linear Regression Model: Assumptions
Several assumptions must be made about the random error term. The mean error is zero, or E(u_i) = 0: errors above and below the regression line tend to balance out. Errors can arise from human behavior (which may be unpredictable), from the large number of explanatory variables omitted from the model, and from imperfect measurement of the dependent variable.
The Simple Linear Regression Model (SLR)
Beginning with the simple linear regression, taking conditional expectations, and using our current assumptions gives us the POPULATION REGRESSION FUNCTION. (Notice: no hats over the betas, and y equals the predicted value plus an error.)
y = \beta_0 + \beta_1 x + u
Taking the expected value, conditional on x:
E[y|x] = \beta_0 + \beta_1 x + E[u|x]
and using the assumption that E[u|x] = 0:
E[y|x] = \beta_0 + \beta_1 x
The Linear Regression Model
The regression model asserts that the expected value of Y is a linear function of X: E(Y_i) = \beta_0 + \beta_1 X_i. This is known as the population regression function.
From a practical standpoint, not all of a population's observations are available; thus we typically estimate the slope and intercept using sample data.
The Simple Linear Regression Model (SLR)
Knowing that E[u|x] = 0, we can also make the following assumptions. Using the assumption that x and u are uncorrelated and that E[u] = 0:
Cov(x, u) = E[xu] - E[x]E[u] = E[xu] = 0
Since y = \beta_0 + \beta_1 x + u, using E[u] = 0 gives
E[y] = \beta_0 + \beta_1 E[x]
and using E[xu] = 0 gives
E[xy] = \beta_0 E[x] + \beta_1 E[x^2]
WE NOW HAVE TWO EQUATIONS IN TWO UNKNOWNS! (The betas are the unknowns.) This is how the Method of Moments is constructed.
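To make the two-equations-in-two-unknowns idea concrete, here is a minimal illustrative sketch that replaces the population moments with their sample analogues on simulated data (true coefficients 2 and 3 are assumed purely for illustration) and solves the resulting 2x2 system:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated population model: y = 2 + 3x + u, with E[u] = 0 and Cov(x, u) = 0.
n = 1000
x = rng.normal(10, 2, n)
u = rng.normal(0, 1, n)
y = 2 + 3 * x + u

# Sample analogues of the two moment conditions:
#   mean(y)   = b0 + b1 * mean(x)
#   mean(x*y) = b0 * mean(x) + b1 * mean(x**2)
A = np.array([[1.0, x.mean()],
              [x.mean(), (x**2).mean()]])
m = np.array([y.mean(), (x * y).mean()])
b0, b1 = np.linalg.solve(A, m)

print(b0, b1)  # close to 2 and 3; identical to the OLS estimates
```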
Additional assumptions are necessary to develop confidence intervals and perform hypothesis tests
The Linear Regression Model: Assumptions
var(u_i) = \sigma^2 for all i: the errors are drawn from a distribution with a constant variance (heteroskedasticity exists if this assumption fails).
Cov(u_i, u_j) = 0 for all i \neq j: u_i and u_j are independent; one observation's error does not influence another observation's error, so the errors are uncorrelated (serial correlation of the errors exists if this assumption fails).
Cov(X_i, u_i) = 0 for all i: the error term is uncorrelated with the explanatory variable, X.
u_i \sim N(0, \sigma^2): the error term follows a normal distribution.
Ordinary Least Squares: Fit
SST = \sum_{i=1}^{n} (y_i - \bar{y})^2, the Total Sum of Squares
SSE = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2, the Explained Sum of Squares
SSR = \sum_{i=1}^{n} \hat{u}_i^2, the Sum of Squared Residuals
SST = SSE + SSR
The OLS residuals satisfy
\sum_{i=1}^{n} \hat{u}_i = 0 and \sum_{i=1}^{n} x_i \hat{u}_i = 0, where \hat{u}_i = y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i
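These two residual properties are just the first-order conditions of the OLS problem, and they can be verified numerically. A small illustrative sketch on simulated data (coefficients assumed):

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.uniform(0, 1, 40)
y = 3 + 2 * x + rng.normal(0, 0.2, 40)   # assumed coefficients, for illustration

# Closed-form SLR estimates
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
b0 = y.mean() - b1 * x.mean()
uhat = y - (b0 + b1 * x)

# Both first-order conditions hold (up to floating-point error):
print(np.isclose(uhat.sum(), 0.0))        # sum of residuals is zero
print(np.isclose((x * uhat).sum(), 0.0))  # residuals orthogonal to x
```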
Ordinary Least Squares: Fit
Showing that SST = SSE + SSR:
SST = \sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} [(y_i - \hat{y}_i) + (\hat{y}_i - \bar{y})]^2 = \sum_{i=1}^{n} [\hat{u}_i + (\hat{y}_i - \bar{y})]^2
= \sum_{i=1}^{n} \hat{u}_i^2 + 2 \sum_{i=1}^{n} \hat{u}_i (\hat{y}_i - \bar{y}) + \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 = SSR + 2 \sum_{i=1}^{n} \hat{u}_i (\hat{y}_i - \bar{y}) + SSE
where 2 \sum_{i=1}^{n} \hat{u}_i (\hat{y}_i - \bar{y}) = 0, since the residuals and the predicted values are uncorrelated. Thus SST = SSE + SSR.
Ordinary Least Squares: Fit
R^2 = \frac{SSE}{SST} = 1 - \frac{SSR}{SST} = \frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} = 1 - \frac{\sum_{i=1}^{n} \hat{u}_i^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
GENERALIZED R-SQUARED: In multiple regression, you cannot simply square a correlation. The interpretation is exactly the same: R^2 is equal to the squared value of the correlation between y and \hat{y}. When the prediction depends on only one independent variable, this boils down to the correlation between x and y.
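A short illustrative sketch on simulated data (assumed coefficients) verifying both the decomposition and the generalized R-squared interpretation:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x = rng.normal(5, 2, n)
y = 1 + 0.5 * x + rng.normal(0, 1, n)    # assumed coefficients, for illustration

# OLS fit via the closed-form SLR formulas
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
b0 = y.mean() - b1 * x.mean()
yhat = b0 + b1 * x
uhat = y - yhat

SST = np.sum((y - y.mean())**2)
SSE = np.sum((yhat - y.mean())**2)
SSR = np.sum(uhat**2)

print(np.isclose(SST, SSE + SSR))      # True: SST = SSE + SSR
print(SSE / SST)                       # R^2
print(np.corrcoef(y, yhat)[0, 1]**2)   # equals corr(y, yhat)^2
```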
Estimation (Three Ways; we will not discuss Maximum Likelihood)
We need a formal method to determine the line that "fits" the data well: the distance of the line from the observations should be minimized. Let \hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i. The deviation of an observation from the line is the estimated error, or residual:
\hat{u}_i = Y_i - \hat{Y}_i
Ordinary Least Squares
Designed to minimize the magnitude of the estimated residuals by selecting an estimated slope and an estimated intercept that minimize the sum of the squared errors. This is the most popular method, known as Ordinary Least Squares.
Ordinary Least Squares: Minimize the Sum of Squared Errors
Identifying the parameters (estimated slope and estimated y-intercept) that minimize the sum of the squared errors is a standard optimization problem in multivariable calculus: take first derivatives with respect to the estimated slope and intercept coefficients, set both equations equal to zero, and solve the two equations.
Ordinary Least Squares
Using a sample of data on X and Y, we want to minimize the value of the squared errors, which are themselves a function of the parameters (\hat{\beta}_0, \hat{\beta}_1). We estimate this function using the calculus of optimization and the chain rule:
\min_{\hat{\beta}_0, \hat{\beta}_1} Q(\hat{\beta}_0, \hat{\beta}_1) = \min_{\hat{\beta}_0, \hat{\beta}_1} \sum_{i=1}^{n} \hat{u}_i^2 = \min_{\hat{\beta}_0, \hat{\beta}_1} \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2
where \hat{u}_i = y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i. The first-order conditions are
\frac{\partial Q(\hat{\beta}_0, \hat{\beta}_1)}{\partial \hat{\beta}_0} = -2 \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0
\frac{\partial Q(\hat{\beta}_0, \hat{\beta}_1)}{\partial \hat{\beta}_1} = -2 \sum_{i=1}^{n} x_i (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0
which allow us to solve the system for our parameters. These are called the NORMAL EQUATIONS.
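Rearranged, the normal equations form a 2x2 linear system in the two estimated parameters. A minimal illustrative sketch (simulated data, assumed coefficients) that builds and solves that system directly:

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(0, 1, 30)
y = 1 + 2 * x + rng.normal(0, 0.3, 30)   # assumed coefficients, for illustration

# Normal equations in matrix form:
#   [   n       sum(x)  ] [b0]   [ sum(y)   ]
#   [ sum(x)  sum(x**2) ] [b1] = [ sum(x*y) ]
n = len(x)
A = np.array([[n, x.sum()],
              [x.sum(), (x**2).sum()]])
c = np.array([y.sum(), (x * y).sum()])
b0, b1 = np.linalg.solve(A, c)

print(b0, b1)  # close to 1 and 2
```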
Ordinary Least Squares: Derived
From the first derivative equation,
-2 \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0 \implies \hat{\beta}_0 = \frac{1}{n} \sum_{i=1}^{n} y_i - \hat{\beta}_1 \frac{1}{n} \sum_{i=1}^{n} x_i = \bar{y} - \hat{\beta}_1 \bar{x}
Substituting this value for \hat{\beta}_0 into the second derivative equation,
-2 \sum_{i=1}^{n} x_i (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0 \implies \sum_{i=1}^{n} x_i y_i - (\bar{y} - \hat{\beta}_1 \bar{x}) \sum_{i=1}^{n} x_i - \hat{\beta}_1 \sum_{i=1}^{n} x_i^2 = 0
Multiplying through by n and collecting the \hat{\beta}_1 terms,
n \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i = \hat{\beta}_1 \left[ n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2 \right]
so that
\hat{\beta}_1 = \frac{n \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
Ordinary Least Squares
This results in the normal equations, which suggest an estimator for the intercept,
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}
(the means of X and Y are ALWAYS on the regression line), and which yield an estimator for the slope of the line,
\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
No other estimators will result in a smaller sum of squared errors.
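For reference, a compact illustrative sketch wrapping these two closed-form estimators in a function and cross-checking them against NumPy's polynomial fit (the data and coefficients are simulated assumptions):

```python
import numpy as np

def ols_slr(x, y):
    """Closed-form SLR estimates from the normal equations."""
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
    b0 = y.mean() - b1 * x.mean()
    return b0, b1

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, 50)
y = 4 - 0.8 * x + rng.normal(0, 0.5, 50)   # assumed coefficients, for illustration

b0, b1 = ols_slr(x, y)
print(b0, b1)
print(np.polyfit(x, y, 1))                       # [slope, intercept]: same answer
print(np.isclose(b0 + b1 * x.mean(), y.mean()))  # (x-bar, y-bar) lies on the line
```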
SLR Assumption 1: Linear in Parameters (SLR.1)
Defines the POPULATION model. The dependent variable y is related to the independent variable x and the error (or disturbance) u as
y = \beta_0 + \beta_1 x + u
where \beta_0 and \beta_1 are population parameters.
SLR Assumption 2: Random Sampling (SLR.2)
Use a random sample of size n, \{(x_i, y_i) : i = 1, 2, \ldots, n\}, from the population model. This allows a redefinition of SLR.1; we want to use DATA to estimate our parameters:
y_i = \beta_0 + \beta_1 x_i + u_i, \quad i = 1, 2, \ldots, n
where \beta_0 and \beta_1 are the population parameters to be estimated.
SLR Assumption 3: Sample Variation in the Independent Variable (SLR.3)
The X values must vary; the sample variance of X cannot equal zero:
\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2 > 0
SLR Assumption 4: Zero Conditional Mean (SLR.4)
E[u|x] = 0
For a random sample,
E[u_i | x_i] = 0 for all i = 1, 2, \ldots, n
The implication is that NO independent variable is correlated with ANY unobservable (remember, the error includes unobservable factors).
SLR Theorem 1: Unbiasedness of OLS
Estimators should equal the population value in expectation:
E[\hat{\beta}_0] = \beta_0, \quad E[\hat{\beta}_1] = \beta_1
Start from
\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{\sum_{i=1}^{n} (x_i - \bar{x}) y_i}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \quad \text{and} \quad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}
Examining the numerator, substitute y_i = \beta_0 + \beta_1 x_i + u_i:
\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(\beta_0 + \beta_1 x_i + u_i)}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{\beta_0 \sum_{i=1}^{n} (x_i - \bar{x}) + \beta_1 \sum_{i=1}^{n} (x_i - \bar{x}) x_i + \sum_{i=1}^{n} (x_i - \bar{x}) u_i}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
Since \sum_{i=1}^{n} (x_i - \bar{x}) = 0 and \sum_{i=1}^{n} (x_i - \bar{x}) x_i = \sum_{i=1}^{n} (x_i - \bar{x})^2, this reduces to
\hat{\beta}_1 = \beta_1 + \frac{\sum_{i=1}^{n} (x_i - \bar{x}) u_i}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
and therefore E[\hat{\beta}_1] = \beta_1.
This holds because x and u are assumed to be uncorrelated. Thus our estimator equals the actual value of beta.
SLR Theorem 1: Unbiasedness of OLS (continued)
For the intercept, averaging y_i = \beta_0 + \beta_1 x_i + u_i gives \bar{y} = \beta_0 + \beta_1 \bar{x} + \bar{u}, so
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = \beta_0 + \beta_1 \bar{x} + \bar{u} - \hat{\beta}_1 \bar{x} = \beta_0 + (\beta_1 - \hat{\beta}_1) \bar{x} + \bar{u}
E[\hat{\beta}_0] = \beta_0 + E[(\beta_1 - \hat{\beta}_1)] \bar{x} + E[\bar{u}] = \beta_0
The expected value of the residuals is zero, and E[\hat{\beta}_1] = \beta_1. Thus our estimator equals the actual value of beta.
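Unbiasedness is a statement about the average of the estimator across repeated samples, which a small Monte Carlo sketch (true coefficients 2 and 3 assumed for illustration) can make concrete:

```python
import numpy as np

rng = np.random.default_rng(3)
beta0, beta1, n, reps = 2.0, 3.0, 100, 5000

b1_draws = np.empty(reps)
for r in range(reps):
    x = rng.normal(10, 2, n)
    u = rng.normal(0, 1, n)          # E[u|x] = 0 by construction
    y = beta0 + beta1 * x + u
    b1_draws[r] = (np.sum((x - x.mean()) * (y - y.mean()))
                   / np.sum((x - x.mean())**2))

print(b1_draws.mean())  # approximately 3.0, i.e., E[beta1_hat] = beta1
```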
SLR Assumption 5: Homoskedasticity (SLR.5)
The variance of the errors is INDEPENDENT of the values of X:
Var(u|x) = \sigma^2
Rewriting with SLR.3, SLR.4, and SLR.5, this also implies that
E[y|x] = \beta_0 + \beta_1 x \quad \text{and} \quad Var(y|x) = \sigma^2
Method of Moments
Seeks to equate the moments implied by a statistical model of the population distribution to the actual moments found in the sample. Certain restrictions are implied in the population: E(u) = 0 and Cov(X_i, u_j) = 0 for all i, j. This results in the same estimators as the least squares method.
Interpretation of the Regression Slope Coefficient
The coefficient \beta_1 tells us the effect X has on Y: increasing X by one unit will change the mean value of Y by \beta_1 units.
Units of Measurement and Regression Coefficients
The magnitude of regression coefficients depends upon the units in which the dependent and explanatory variables are measured. For example, measuring the explanatory variable in cents rather than dollars will result in a smaller coefficient. Scaling both the Y and X variables by the same amount will not affect the slope, although it will impact the y-intercept, as the sketch below illustrates.
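A quick simulated check (assumed coefficients) of how rescaling changes the slope:

```python
import numpy as np

rng = np.random.default_rng(6)
x = rng.normal(50, 10, 100)              # say, a price measured in dollars
y = 5 + 0.4 * x + rng.normal(0, 1, 100)  # assumed coefficients, for illustration

def slope(x, y):
    return np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)

print(slope(x, y))              # ~0.4   (x in dollars)
print(slope(100 * x, y))        # ~0.004 (x in cents: coefficient shrinks 100-fold)
print(slope(100 * x, 100 * y))  # ~0.4   (rescaling both leaves the slope unchanged)
```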
Models Including Logarithms
For a log-linear model, the slope represents the proportionate change (like a percentage change) in Y arising from a unit change in X; the coefficient in your regression is the SEMI-elasticity of Y with respect to X.
For a log-log model, the slope represents the proportionate change in Y arising from a proportionate change in X; the coefficient in your regression is the elasticity of Y with respect to X. This is the CONSTANT ELASTICITY MODEL.
For a linear-log model, the slope is the unit change in Y arising from a proportionate change in X.
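For the log-linear case, a one-line calculation shows how the semi-elasticity reading works; the coefficient below is hypothetical, assumed purely for illustration:

```python
import numpy as np

# Hypothetical log-linear fit: log(wage) = b0 + b1 * educ, with b1 = 0.083.
b1 = 0.083

# One more year of education raises the wage by about 100*b1 percent;
# the exact proportionate change is exp(b1) - 1.
print(100 * b1)                 # approximate: 8.30% per year
print((np.exp(b1) - 1) * 100)   # exact:       8.65% per year
```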
Regression in Excel
Step 1: Reorganize the data so that the variables are right next to one another in columns.
Step 2: Data > Data Analysis > Regression.
Regression in Excel: Example 2.11
(The slides show the Excel regression dialog and output for this example.)
Your estimated equation is as follows:
log(salary) = 6.5055 + 0.0097 ceoten
The t-statistics show that the coefficient on ceoten is insignificant at the 5% level. The p-value for ceoten is 0.128368, which is greater than .05, meaning that you could see a value this large about 13% of the time under the null. You are inherently testing the null hypothesis that each coefficient is equal to ZERO. YOU FAIL TO REJECT THE NULL HYPOTHESIS HERE ON BETA-1.
Regression in Excel: X Variable 1 Line Fit Plot
(Figure: scatter of Y and Predicted Y against X Variable 1, with fitted line y = 0.0097x + 6.5055 and R² = 0.0132.)