Economics Revision Guide II
Post on 04-Sep-2015
-
Chapter 11: Regression with a Binary Dependent Variable
So far the dependent variable Y has been continuous:
traffic fatality rate
cigarette consumption
test scores
What if Y is binary?
whether a person gets into college, or not
whether a person smokes, or not
whether a mortgage application is denied or accepted
1
-
Example: Mortgage Denial and Race, The Boston Fed HMDA Dataset
Individual applications for single-family mortgages made in 1990 in the greater Boston area
2,380 observations collected under the Home Mortgage Disclosure Act (HMDA)
Variables
Dependent variable:
Is the mortgage denied or accepted?
Independent variables:
income, wealth, employment status
other loan, property characteristics
race of applicant
2
-
Scatter plot of mortgage denial and the ratio of debt payments to income (P/I ratio) for a subset of the data set (n = 127)
[Figure: linear probability model fit, HMDA data subset; mortgage denial v. P/I ratio]
3
-
Section 11.1 Binary Dependent Variables and the Linear Probability Model
The regression line plots the predicted value of deny as a linear function of P/I ratio
For example, when P/I ratio = 0.3, the predicted value of deny is 0.2
But what exactly does it mean for the predicted value of a binary variable to be 0.2?
When Y is binary,
E(Y | X) = 1 × Pr(Y = 1 | X) + 0 × Pr(Y = 0 | X) = Pr(Y = 1 | X)
so
E(Y | X) = Pr(Y = 1 | X)
That is, when Y is binary, the predicted value Ŷ is the probability that Y = 1 given X = x:
Ŷ = Pr(Y = 1 | X = x) = E(Y | X = x)
4
-
For the linear regression model, given the OLS assumption that E(u | X) = 0:
Ŷ = Pr(Y = 1 | X = x) = E(Y | X) = E(β0 + β1X + u | X) = β0 + β1X
This model is called the linear probability model
It is simply the linear regression model with a binary dependent variable
Back to our example: when P/I ratio = 0.3, the predicted probability of deny is 0.2:
Pr(Deny | P/I ratio = 0.3) = β0 + β1 × 0.3 = 0.2
In other words, if there were many applications with P/I ratio = 0.3, then 20% of them would be denied
Note that β1 is the change in the predicted probability that Y = 1 for a unit increase in X
5
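The linear probability model is estimated by ordinary least squares with the binary Y on the left-hand side. A minimal sketch in Python using numpy's least-squares solver; the four-observation data set here is made up for illustration, not from HMDA:

```python
import numpy as np

# Made-up toy data: X is a regressor, Y is binary (0 or 1)
X = np.array([0.0, 1.0, 2.0, 3.0])
Y = np.array([0.0, 0.0, 1.0, 1.0])

# OLS with an intercept: regress Y on [1, X]
design = np.column_stack([np.ones_like(X), X])
(b0, b1), *_ = np.linalg.lstsq(design, Y, rcond=None)

print(b0, b1)         # intercept -0.1, slope 0.4 for this data
print(b0 + b1 * 3.0)  # fitted "probability" at X = 3 is 1.1, i.e. above 1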
-
Ex: full HMDA data set
Deny = -0.080 + 0.604 × P/I ratio
        (0.032)  (0.098)
Measuring the effect of increasing P/I ratio by 1 doesn't make much sense
Instead, what is the effect of increasing P/I ratio from .3 to .4?
The predicted value for P/I ratio = .3 is
Pr(Deny | P/I ratio = .3) = -.080 + .604 × .3 = 0.101
The predicted value for P/I ratio = .4 is
Pr(Deny | P/I ratio = .4) = -.080 + .604 × .4 = 0.162
Thus, the effect of increasing the P/I ratio from .3 to .4 is to increase the probability of denial by 0.061, that is, by 6.1 percentage points
More simply, we can calculate the effect as β1 × 0.1 = 0.0604 ≈ 0.061
6
-
Linear probability model: HMDA data, ctd.
Next include black as a regressor:
Deny = -0.091 + 0.559 × P/I ratio + 0.177 × black
        (0.033)  (0.098)            (0.025)
What is the difference in the probability of denial for a black applicant versus a white applicant?
For a black applicant with P/I ratio = .3:
Pr(Deny = 1) = -.091 + .559 × .3 + .177 × 1 = .254
For a white applicant with P/I ratio = .3:
Pr(Deny = 1) = -.091 + .559 × .3 + .177 × 0 = .077
The difference = 0.177 = 17.7 percentage points (the value of β2, the coefficient on black)
7
-
The linear probability model, ctd.
The linear probability model is easy to estimate and to interpret
But the LPM says that the change in the predicted probability for a given change in X is the
same for all values of X
Is this reasonable?
Further, the predicted probabilities of the LPM can be < 0 or > 1!
To overcome these shortcomings, people use the nonlinear probability models: probit and logit
8
-
Section 11.2 Probit and logit regression
The probit and logit models satisfy the following conditions:
0 ≤ Pr(Y = 1 | X) ≤ 1 for all X
The effect of X on Pr(Y = 1 | X) is nonlinear, and Pr(Y = 1 | X) is increasing in X (for β1 > 0)
9
-
The probit regression models the probability that Y = 1 using the cumulative standard normal distribution function, Φ(z), evaluated at z = β0 + β1X:
Pr(Y = 1 | X) = Φ(β0 + β1X)
where Φ is the cumulative standard normal distribution function and z = β0 + β1X is the z-value
Ex. Suppose β0 = -2, β1 = 3, X = .4:
Pr(Y = 1 | X = .4) = Φ(-2 + 3 × .4) = Φ(-0.8) = 0.2119
10
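This example can be checked in a few lines of Python; math.erf gives the standard normal c.d.f. Φ without any extra libraries (a sketch, not part of the original slides):

```python
from math import erf, sqrt

def Phi(z: float) -> float:
    """Cumulative standard normal distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# Slide example: beta0 = -2, beta1 = 3, X = 0.4
p = Phi(-2.0 + 3.0 * 0.4)  # Phi(-0.8)
print(round(p, 4))         # 0.2119
```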
-
STATA Example: HMDA data

. probit deny p_irat, r;

Iteration 0:   log likelihood =  -872.0853     (we'll discuss this later)
Iteration 1:   log likelihood =  -835.6633
Iteration 2:   log likelihood = -831.80534
Iteration 3:   log likelihood = -831.79234

Probit estimates                              Number of obs =   2380
                                              Wald chi2(1)  =  40.68
                                              Prob > chi2   = 0.0000
Log likelihood = -831.79234                   Pseudo R2     = 0.0462

------------------------------------------------------------------------------
             |               Robust
        deny |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      p_irat |   2.967908   .4653114     6.38   0.000     2.055914    3.879901
       _cons |  -2.194159   .1649721   -13.30   0.000    -2.517499    -1.87082
------------------------------------------------------------------------------

Pr(Deny = 1 | P/I ratio) = Φ(-2.19 + 2.97 × P/I ratio)
                              (0.16)   (0.47)
11
-
STATA Example: HMDA data, ctd.
Pr(Deny = 1 | P/I ratio) = Φ(-2.19 + 2.97 × P/I ratio)
                              (0.16)   (0.47)
Positive coefficient: does this make sense?
Standard errors have the usual interpretation
Predicted probabilities:
Pr(Deny = 1 | P/I ratio = .3) = Φ(-2.19 + 2.97 × .3) = Φ(-1.30) = .097
Pr(Deny = 1 | P/I ratio = .4) = Φ(-2.19 + 2.97 × .4) = Φ(-1.00) = .159
The effect of increasing P/I ratio from 0.3 to 0.4 on the probability of denial is .159 - .097 = 0.062 (≠ β1 × 0.1!)
12
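The effect computed on this slide can be verified numerically; as before, Φ is built from math.erf (a sketch using the slide's estimated coefficients):

```python
from math import erf, sqrt

def Phi(z: float) -> float:
    """Cumulative standard normal c.d.f."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

b0, b1 = -2.19, 2.97     # estimated probit coefficients from the slide

p3 = Phi(b0 + b1 * 0.3)  # about .097
p4 = Phi(b0 + b1 * 0.4)  # about .158
effect = p4 - p3         # about .061; the slide's .062 uses rounded z-values
print(round(p3, 3), round(p4, 3), round(effect, 3))
```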
-
STATA Example: HMDA data, multiple regressors

. probit deny p_irat black, r;

Iteration 0:   log likelihood =  -872.0853
Iteration 1:   log likelihood = -800.88504
Iteration 2:   log likelihood =  -797.1478
Iteration 3:   log likelihood = -797.13604

Probit estimates                              Number of obs =   2380
                                              Wald chi2(2)  = 118.18
                                              Prob > chi2   = 0.0000
Log likelihood = -797.13604                   Pseudo R2     = 0.0859

------------------------------------------------------------------------------
             |               Robust
        deny |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      p_irat |   2.741637   .4441633     6.17   0.000     1.871092    3.612181
       black |   .7081579   .0831877     8.51   0.000      .545113    .8712028
       _cons |  -2.258738   .1588168   -14.22   0.000    -2.570013   -1.947463
------------------------------------------------------------------------------

We'll go through the estimation details later
13
-
STATA Example, ctd.: predicted probit probabilities

. probit deny p_irat black, r;

Probit estimates                              Number of obs =   2380
                                              Wald chi2(2)  = 118.18
                                              Prob > chi2   = 0.0000
Log likelihood = -797.13604                   Pseudo R2     = 0.0859

------------------------------------------------------------------------------
             |               Robust
        deny |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      p_irat |   2.741637   .4441633     6.17   0.000     1.871092    3.612181
       black |   .7081579   .0831877     8.51   0.000      .545113    .8712028
       _cons |  -2.258738   .1588168   -14.22   0.000    -2.570013   -1.947463
------------------------------------------------------------------------------

. sca z1 = _b[_cons]+_b[p_irat]*.3+_b[black]*0;
. display "Pred prob, p_irat=.3, white: " normprob(z1);
Pred prob, p_irat=.3, white: .07546603

NOTE:
_b[_cons] is the estimated intercept (-2.258738)
_b[p_irat] is the coefficient on p_irat (2.741637)
sca creates a new scalar which is the result of a calculation
display prints the indicated information to the screen
14
-
STATA Example, ctd.
Pr(Deny = 1 | P/I ratio, black) = Φ(-2.26 + 2.74 × P/I ratio + 0.71 × black)
                                     (0.16)   (0.44)            (0.08)
Is the coefficient on black statistically significant?
Predicted probabilities:
Pr(Deny = 1 | P/I ratio = .3, black = 1) = Φ(-2.26 + 2.74 × .3 + .71 × 1) = .233
Pr(Deny = 1 | P/I ratio = .3, black = 0) = Φ(-2.26 + 2.74 × .3 + .71 × 0) = .075
Difference in rejection probabilities is 0.158 (15.8 percentage points)
15
-
Logit Regression
Logit regression models the probability that Y = 1 given X using the logistic distribution function, F, evaluated at z = β0 + β1X:
Pr(Y = 1 | X) = F(β0 + β1X)
The logistic distribution function is:
F(β0 + β1X) = 1 / (1 + e^-(β0 + β1X))
Ex. β0 = -3, β1 = 2, X = .4:
Pr(Y = 1 | X = .4) = 1 / (1 + e^2.2) = .0998
16
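The logit calculation on this slide is equally easy to verify (a sketch; the coefficients are the slide's made-up example values):

```python
from math import exp

def logistic(z: float) -> float:
    """Logistic c.d.f.: F(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + exp(-z))

# Slide example: beta0 = -3, beta1 = 2, X = 0.4, so z = -2.2
p = logistic(-3.0 + 2.0 * 0.4)  # 1 / (1 + e^2.2)
print(round(p, 4))              # 0.0998
```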
-
Why bother with logit if we have probit?
The main reason is historical: logit is computationally faster and easier, but this doesn't matter so much nowadays
In practice, logit and probit are very similar; since empirical results typically don't hinge on the logit/probit choice, both tend to be used in practice
In more complicated situations, though, extensions of the logit model work better than extensions of the probit model
17
-
The predicted probabilities from the probit and logit models are (as is usual) very close in these HMDA regressions
[Figure: predicted probabilities from the estimated probit and logit models]
18
-
STATA Example: HMDA data

. logit deny p_irat black, r;

Iteration 0:   log likelihood =  -872.0853
Iteration 1:   log likelihood =  -806.3571
Iteration 2:   log likelihood = -795.74477
Iteration 3:   log likelihood = -795.69521
Iteration 4:   log likelihood = -795.69521

Logit estimates                               Number of obs =   2380
                                              Wald chi2(2)  = 117.75
                                              Prob > chi2   = 0.0000
Log likelihood = -795.69521                   Pseudo R2     = 0.0876

------------------------------------------------------------------------------
             |               Robust
        deny |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      p_irat |   5.370362   .9633435     5.57   0.000     3.482244    7.258481
       black |   1.272782   .1460986     8.71   0.000     .9864339     1.55913
       _cons |  -4.125558    .345825   -11.93   0.000    -4.803362   -3.447753
------------------------------------------------------------------------------

. dis "Pred prob, p_irat=.3, white: "
>   1/(1+exp(-(_b[_cons]+_b[p_irat]*.3+_b[black]*0)));
Pred prob, p_irat=.3, white: .07485143
The predicted probability from the probit model was 0.075
19
-
Section 11.3 Estimation and Inference in the Logit and Probit Models
Probit estimation by nonlinear least squares
Nonlinear least squares extends the idea of OLS to models in which the parameters enter nonlinearly:
min over (b0, b1) of Σ_{i=1}^n [Yi - Φ(b0 + b1Xi)]²
How can we solve this minimization problem?
There is no explicit solution; we can't write the estimators as a function of the sample data
The estimators are found by solving the problem numerically on a computer (using specialized
minimization algorithms)
The estimators are consistent and asymptotically normally distributed
In practice, nonlinear least squares isn't used, since a more efficient estimator (one with a smaller variance) exists
20
-
The Maximum Likelihood Estimator
The likelihood function is the conditional density of Y1, . . . , Yn given X1, . . . , Xn, treated as a function of the unknown parameters (β0 and β1 in the probit model)
The maximum likelihood estimator (MLE) is the value of (β0, β1) that maximizes the likelihood function; that is, the value of (β0, β1) that best describes the distribution of the sample data
In large samples, the MLE is:
consistent
normally distributed
efficient (has the smallest variance of all estimators)
Inference is as usual: hypothesis testing via the t-statistic, and a 95% confidence interval as the estimate ± 1.96 SE
21
-
MLE for a binary dependent variable (no X)
Y = 1 with probability p, and Y = 0 with probability 1 - p
That is, Y has a Bernoulli distribution. The goal is to estimate the unknown parameter p.
Data: Y1, . . . , Yn, i.i.d.
Let's start by deriving the density function of Y1:
Pr(Y1 = 1) = p and Pr(Y1 = 0) = 1 - p
so
Pr(Y1 = y1) = p^y1 × (1 - p)^(1 - y1)
22
-
Now let's find the joint density of (Y1, Y2). Because Y1 and Y2 are independent:
Pr(Y1 = y1, Y2 = y2) = Pr(Y1 = y1) × Pr(Y2 = y2)
= p^y1 (1 - p)^(1 - y1) × p^y2 (1 - p)^(1 - y2)
= p^(y1 + y2) (1 - p)^(2 - y1 - y2)
Generally, the joint density of (Y1, Y2, . . . , Yn) is:
Pr(Y1 = y1, Y2 = y2, . . . , Yn = yn) = Pr(Y1 = y1) × Pr(Y2 = y2) × · · · × Pr(Yn = yn)
= p^y1 (1 - p)^(1 - y1) × p^y2 (1 - p)^(1 - y2) × · · · × p^yn (1 - p)^(1 - yn)
= p^(Σ_{i=1}^n yi) × (1 - p)^(n - Σ_{i=1}^n yi)
23
-
The likelihood function is the joint density, treated as a function of the unknown parameter, which here is p:
f(p; Y1, Y2, . . . , Yn) = p^(Σ_{i=1}^n yi) × (1 - p)^(n - Σ_{i=1}^n yi)
The MLE maximizes this likelihood function.
In practice, it's easier to work with the logarithm of the likelihood, ln f(p; Y1, Y2, . . . , Yn):
ln f(p; Y1, Y2, . . . , Yn) = (Σ_{i=1}^n yi) ln(p) + (n - Σ_{i=1}^n yi) ln(1 - p)
Maximize the log likelihood by setting the derivative with respect to p equal to 0:
d ln f(p; Y1, Y2, . . . , Yn) / dp = (1/p) Σ_{i=1}^n yi - (1/(1 - p)) (n - Σ_{i=1}^n yi) = 0
Solving for p yields the MLE, p̂MLE
24
-
(1/p̂MLE) Σ_{i=1}^n yi - (1/(1 - p̂MLE)) (n - Σ_{i=1}^n yi) = 0
or
Ȳ / (1 - Ȳ) = p̂MLE / (1 - p̂MLE)
So
p̂MLE = Ȳ = the fraction of observations with Yi = 1
Whew... a lot of work to get back to the first thing you might think of using... but the nice thing is that this whole approach generalizes to more complicated models.
Now we apply MLE to probit
25
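The algebra above can be confirmed numerically: maximize the Bernoulli log likelihood over a grid of p values and check that the argmax equals the sample mean. A sketch with made-up data (3 ones in 5 draws):

```python
import math

ys = [1, 0, 1, 1, 0]   # hypothetical Bernoulli sample; Ybar = 0.6
n, s = len(ys), sum(ys)

def log_lik(p: float) -> float:
    """Bernoulli log likelihood: (sum yi) ln p + (n - sum yi) ln(1 - p)."""
    return s * math.log(p) + (n - s) * math.log(1.0 - p)

# Brute-force maximization over a fine grid of p in (0, 1)
grid = [i / 1000.0 for i in range(1, 1000)]
p_mle = max(grid, key=log_lik)
print(p_mle)  # 0.6, exactly the sample mean, as the derivation predicts
```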
-
The Probit and Logit MLE
The derivation starts with the density of Y1 given X1:
Pr(Y1 = 1 | X1) = Φ(β0 + β1X1) and Pr(Y1 = 0 | X1) = 1 - Φ(β0 + β1X1)
so
Pr(Y1 = y1 | X1) = Φ(β0 + β1X1)^y1 × [1 - Φ(β0 + β1X1)]^(1 - y1)
The probit likelihood function is the joint density of Y1, . . . , Yn given X1, . . . , Xn:
f(β0, β1; Y1, . . . , Yn | X1, . . . , Xn) = Φ(β0 + β1X1)^y1 [1 - Φ(β0 + β1X1)]^(1 - y1) × · · · × Φ(β0 + β1Xn)^yn [1 - Φ(β0 + β1Xn)]^(1 - yn)
β̂0(MLE) and β̂1(MLE) maximize this likelihood function
But we can't solve for the estimators explicitly... the likelihood is maximized using numerical methods
To find the logit MLE, simply take the probit likelihood function and replace Φ with the logistic distribution function F
26
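Since neither MLE has a closed form, software maximizes the likelihood numerically. A bare-bones sketch of the logit MLE via Newton's method on made-up data (production routines, such as those in Stata, add convergence checks and safeguards this toy omits):

```python
import numpy as np

# Made-up data: five observations of (x, y); not the HMDA data
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.0, 0.0, 1.0, 0.0, 1.0])
X = np.column_stack([np.ones_like(x), x])  # design matrix with intercept

b = np.zeros(2)                             # start at (beta0, beta1) = (0, 0)
for _ in range(25):                         # Newton-Raphson iterations
    p = 1.0 / (1.0 + np.exp(-X @ b))        # fitted probabilities F(Xb)
    grad = X.T @ (y - p)                    # score: gradient of log likelihood
    W = p * (1.0 - p)                       # logistic density weights
    hess = X.T @ (W[:, None] * X)           # negative Hessian
    b = b + np.linalg.solve(hess, grad)

print(b)  # (beta0_hat, beta1_hat); the gradient is ~0 at the maximum
```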
-
Measures of Fit for Logit and Probit
R² doesn't work well in binary dependent variable models: it tells us very little about how well the model explains behavior
Reason: Yi can take on only 0 or 1, but Ŷi is continuous, so Ŷi is likely very different from Yi
Two other measures that are used:
1. The fraction correctly predicted equals the fraction of Yi's for which the predicted probability is > 50% when Yi = 1 or is < 50% when Yi = 0
2. The pseudo-R² measures the improvement in the value of the log likelihood relative to the Bernoulli log likelihood (i.e., no X's):
pseudo-R² = 1 - ln(f_max^probit) / ln(f_max^Bernoulli),
where ln(f_max^probit) is the value of the maximized probit log likelihood and ln(f_max^Bernoulli) is the value of the maximized Bernoulli log likelihood
27
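Plugging the log likelihoods from the STATA probit output earlier into this formula reproduces STATA's reported Pseudo R2 (the numbers below are copied from that output):

```python
# From the probit output: "Iteration 0" is the constant-only (Bernoulli) fit,
# and "Log likelihood" is the maximized probit log likelihood
ll_bernoulli = -872.0853
ll_probit = -797.13604

pseudo_r2 = 1.0 - ll_probit / ll_bernoulli
print(round(pseudo_r2, 4))  # 0.0859, matching "Pseudo R2 = 0.0859"
```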
-
Ex. fraction correctly predicted

obs   Yi    p̂i     correctly predicted?
1     0     0.40    yes
2     1     0.72    yes
3     0     0.55    no
4     1     0.44    no
5     1     0.55    yes

number correctly predicted: 3
number of observations: 5
fraction correctly predicted: 0.6
28
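The table's bookkeeping is a one-liner; a sketch reproducing the fraction correctly predicted from the five observations above:

```python
# Actual outcomes and predicted probabilities from the table above
y     = [0, 1, 0, 1, 1]
p_hat = [0.40, 0.72, 0.55, 0.44, 0.55]

# Predict Yi = 1 whenever the predicted probability exceeds 50%
correct = [(p > 0.5) == bool(yi) for yi, p in zip(y, p_hat)]
fraction = sum(correct) / len(correct)
print(fraction)  # 0.6
```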
-
pseudo-R²
It's a bit hard to see how the pseudo-R² works, so let's rewrite the formula in a slightly different way
Note that ln(f_max^probit) < 0 and ln(f_max^Bernoulli) < 0
Thus, we can rewrite the pseudo-R² as
pseudo-R² = 1 - ln(f_max^probit) / ln(f_max^Bernoulli) = 1 - |ln(f_max^probit)| / |ln(f_max^Bernoulli)|
29
-
Section 11.4 Application to the Boston HMDA Data
Mortgages (home loans) are an essential part of buying a home
Question: is it harder for a black person to get a loan than a white person?
Specifically: if two otherwise identical individuals, one white and one black, applied for a home
loan, is there a difference in the probability of denial?
The mortgage application process in the US circa 1990-1991:
Go to a bank or mortgage company
Fill out an application (personal and financial info)
Meet with the loan officer
Then the loan officer decides; by law, the decision must be made without considering race. Presumably, the bank wants to make profitable loans, and the loan officer doesn't want to originate loans that will default
30
-
The Loan Officer's Decision
Loan officer uses key financial variables:
P/I ratio
housing expense-to-income ratio
loan-to-value ratio
personal credit history
The decision rule is nonlinear:
loan-to-value ratio > 80%
loan-to-value ratio > 95% (what happens in default?)
credit score
31
-
Regression Specifications
Pr (deny = 1 | black, other Xs) = . . .
linear probability model
probit
logit
Main problem with the regressions so far: potential omitted variable bias. The following variables (i) enter the loan officer's decision and (ii) are correlated with race:
wealth, type of employment
credit history
family status
Fortunately, the HMDA data set is very rich, containing data on individual characteristics,
property characteristics, and loan denial/acceptance
32
-
[Table 11.2 (regression results), continued over several slides; the table images (slides 33-37) are not reproduced in this transcript]
-
Summary of Empirical Results
Coefficients on the financial variables make sense
The coefficient on black is statistically significant in all specifications
Race-financial variable interactions aren't significant
Including the covariates sharply reduces the effect of race on denial probability
LPM, probit, logit: similar estimates of effect of race on the probability of denial
Estimated effects are large in a real world sense
38