Dummy X-variables and Dummy Y-variables (WordPress.com, 2011-05-09)


CDS M Phil Econometrics

Vijayamohanan Pillai N

3 March 2014

Dummy Variable Models

• Dummy X-variables
• Dummy Y-variables

Dummy X-variables

Dummy variable: a variable assuming the values 0 and 1 to indicate some attribute, used to classify data into mutually exclusive categories.

Also called: indicator variable, binary variable, dichotomous variable, categorical variable, qualitative variable.

$$Y_i = \alpha + \beta D_i + u_i$$

where $Y_i$ = wage rate of an agricultural labourer, and $D_i = 1$ if male worker, $0$ otherwise.

Mean wage of a male agricultural worker?

$$E(Y_i \mid D_i = 1) = \alpha + \beta$$

Mean wage of a female agricultural worker?

$$E(Y_i \mid D_i = 0) = \alpha$$

$H_0$: no sex discrimination $\Rightarrow$ $H_0: \beta = 0$.

Analysis of Variance (ANOVA) model: a mean difference test.
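As a small illustration of why the dummy regression is a mean difference test, the sketch below fits $Y_i = \alpha + \beta D_i + u_i$ by least squares on made-up wages. The data are hypothetical and not from the lecture. The intercept equals the female mean and the slope equals the male-female difference in means.

```python
import numpy as np

# Hypothetical wages (not from the lecture): D = 1 for male, 0 for female.
wage = np.array([210.0, 250.0, 230.0, 180.0, 170.0, 190.0, 200.0])
male = np.array([1, 1, 1, 0, 0, 0, 0])

# Design matrix with a constant and the dummy: Y_i = alpha + beta * D_i + u_i
X = np.column_stack([np.ones_like(wage), male])
(alpha_hat, beta_hat), *_ = np.linalg.lstsq(X, wage, rcond=None)

mean_female = wage[male == 0].mean()
mean_male = wage[male == 1].mean()

print(alpha_hat, mean_female)            # intercept vs. female mean
print(alpha_hat + beta_hat, mean_male)   # intercept + slope vs. male mean
```

Testing $\beta = 0$ in this regression is therefore the same question as asking whether the two group means differ.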

In economic applications, we control for other socio-economic factors: caste, nature of work, experience, ...

With both quantitative and qualitative variables, we have the Analysis of Covariance (ANCOVA) model:

$$Y_i = \alpha_0 + \alpha_1 D_i + \beta X_i + u_i$$

where $D_i = 1$ if male worker, $= 0$ otherwise.

Mean wage of a female agricultural worker?

$$E(Y_i \mid D_i = 0) = \alpha_0 + \beta X_i$$

Mean wage of a male agricultural worker?

$$E(Y_i \mid D_i = 1) = (\alpha_0 + \alpha_1) + \beta X_i$$

[Figure: agricultural wage rate plotted against X. Two parallel lines, labelled f and m, with common slope $\beta$ and intercepts $\alpha_0$ and $(\alpha_0 + \alpha_1)$. Differential intercept: $\alpha_1$.]

$$Y_i = \alpha_0 + \alpha_1 D_i + \beta_1 X_i + \beta_2 D_i X_i + u_i$$

where $D_i = 1$ if male worker, $= 0$ otherwise, and $D_i X_i$ is the interaction term.

Mean wage of a female agricultural worker?

$$E(Y_i \mid D_i = 0) = \alpha_0 + \beta_1 X_i$$

Mean wage of a male agricultural worker?

$$E(Y_i \mid D_i = 1) = (\alpha_0 + \alpha_1) + (\beta_1 + \beta_2) X_i$$
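The interaction model can be sketched numerically as follows. The parameter values, the regressor X and the noise-free wages are all hypothetical, chosen only so that least squares recovers the two lines exactly. The female line has intercept $\alpha_0$ and slope $\beta_1$, and the male line has intercept $\alpha_0 + \alpha_1$ and slope $\beta_1 + \beta_2$.

```python
import numpy as np

# Hypothetical parameters, chosen only for illustration.
a0, a1, b1, b2 = 100.0, 20.0, 5.0, 2.0

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 1.5, 2.5, 3.5, 4.5, 5.5])  # e.g. experience
D = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])                      # 1 = male
Y = a0 + a1 * D + b1 * X + b2 * D * X                             # no error term

# Regressors: constant, D, X and the interaction D*X
Z = np.column_stack([np.ones_like(X), D, X, D * X])
coef, *_ = np.linalg.lstsq(Z, Y, rcond=None)
alpha0, alpha1, beta1, beta2 = coef

female_line = (alpha0, beta1)                  # intercept, slope when D = 0
male_line = (alpha0 + alpha1, beta1 + beta2)   # intercept, slope when D = 1
print(female_line, male_line)
```

Dropping the `D * X` column gives back the ANCOVA model, in which the two lines are forced to be parallel.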

[Figure: agricultural wage rate plotted against X. One line has intercept $\alpha_0$ and slope $\beta_1$; the other has intercept $(\alpha_0 + \alpha_1)$ and slope $\beta_1 + \beta_2$.]

Dummy Y-variables

Discrete Choice Models

There are many situations in which the dependent variable is not a continuous variable, but discrete or qualitative.

In general there are 2 types:

1. dependent variables which take one of two values (binary/dichotomous choice), and

2. dependent variables which can take more than two values but are finite (polychotomous; multiple choice).

Binary Choice Model

Individual i either chooses A or does not choose A, for example:

• to be, or not to be
• by car, or not

Multinomial Choice Model

Individual i chooses among J alternatives: 1 walk, 2 cycle, 3 car, ..., bus, train.

Examples

• Labour force participation:
  – occupational choice (multiple choice)
  – employed or unemployed (binary choice)
  – to be employed full-time, part-time or unemployed (multiple choice)

• Voting behaviour:
  – to vote or not to vote (binary choice)
  – to vote Congress, BJP, Communists, or abstain (multiple choice)

• Censored data (limited dependent variable):
  – housing expenditure: some may not have purchased, so zero

Two questions:

(i) Can we still use OLS to estimate such outcomes?

(ii) If not, how do we model such outcomes?

Binary choice:

(i) Linear Probability Model (LPM)

(ii) Logit/Probit model

Censored/limited dependent variable regression model:

(i) Tobit model

Linear Probability Model

We focus on single equation binary outcomes:

$$y_i \in \{0, 1\}$$

A fundamental difference between a quantitative response model and a qualitative response model: the latter is a probability model.

In general:

Prob(event j occurs) = P(Y = j) = f(relevant variables; parameters) = $f(x_i, \beta)$,

where $x_i = [x_{i1}, \ldots, x_{ik}]$ are the variables and $\beta$ is a vector of parameters.

Given $y_i \in \{0, 1\}$, $y_i$ follows a Bernoulli probability distribution:

$$P(y_i = 1 \mid x_i) = P_i = f(x_i, \beta);$$

$$P(y_i = 0 \mid x_i) = 1 - P_i = 1 - f(x_i, \beta).$$

How do we specify $f(x_i, \beta)$?

• Linear Probability Model

An obvious choice is the familiar least squares procedure:

$$f(x_i, \beta) = x_i\beta \;\Rightarrow\; y_i = x_i\beta + u_i$$

This leads to the linear probability model (LPM).

Assuming E(u) = 0, it follows that

$$E(y_i \mid x_i) = 0 \cdot P(y_i = 0 \mid x_i) + 1 \cdot P(y_i = 1 \mid x_i)$$

$$E(y_i \mid x_i) = P(y_i = 1 \mid x_i) = x_i\beta$$

Conditional expectation = conditional probability.

The regression equation describes the probability that $y_i = 1$ given information on $x_i$.

The case of a single explanatory variable:

$$y_i = \beta_1 + \beta_2 X_i + u_i$$

The probability of the event occurring, p, is assumed to be a linear function of the variable X.

[Figure: y, P on the vertical axis (from 0 to 1) against X; a straight line with intercept $\beta_1$ that reaches $\beta_1 + \beta_2 X_k$ at $X = X_k$.]

Now an example: Ray Fair's data on extramarital affairs.

Source: http://fairmodel.econ.yale.edu/rayfair/worksd.htm

The dependent variable is changed into binary: 0 = none; 1 = one or more.

• Affairs: 0 → 451; 1 → 150
• Sex: F → 315; M → 286

LPM: Ray Fair Model

Descriptive statistics

| Variable | N | Min | Max | Mean | Std. Dev | Skewness | Skewness Std. Error | Kurtosis | Kurtosis Std. Error |
|---|---|---|---|---|---|---|---|---|---|
| Have extramarital affairs | 601 | 0 | 1 | 0.250 | 0.433 | 1.160 | 0.100 | -0.656 | 0.199 |
| Sex | 601 | 0 | 1 | 0.476 | 0.500 | 0.097 | 0.100 | -1.997 | 0.199 |
| Age group | 601 | 17.5 | 57 | 32.488 | 9.289 | 0.889 | 0.100 | 0.232 | 0.199 |
| Married years group | 601 | 0.125 | 15 | 8.178 | 5.571 | 0.078 | 0.100 | -1.571 | 0.199 |
| Have children | 601 | 0 | 1 | 0.715 | 0.452 | -0.958 | 0.100 | -1.087 | 0.199 |
| Religiosity group | 601 | 1 | 5 | 3.116 | 1.168 | -0.089 | 0.100 | -1.008 | 0.199 |
| Education group | 601 | 9 | 20 | 16.166 | 2.403 | -0.250 | 0.100 | -0.302 | 0.199 |
| Occupation group | 601 | 1 | 7 | 4.195 | 1.819 | -0.741 | 0.100 | -0.776 | 0.199 |
| Marriage rating group | 601 | 1 | 5 | 3.932 | 1.103 | -0.836 | 0.100 | -0.204 | 0.199 |

Interpretation?

LPM: Ray Fair Model

Regression coefficients

| Variable | B (unstandardized) | Std. Error | Beta (standardized) | t | Sig. |
|---|---|---|---|---|---|
| (Constant) | 0.736 | 0.152 | | 4.859 | 0.000 |
| Sex | 0.045 | 0.040 | 0.052 | 1.129 | 0.259 |
| Age | -0.007 | 0.003 | -0.159 | -2.463 | 0.014 |
| Married years | 0.016 | 0.005 | 0.206 | 2.911 | 0.004 |
| Have children | 0.054 | 0.047 | 0.057 | 1.168 | 0.243 |
| Religiosity | -0.054 | 0.015 | -0.145 | -3.608 | 0.000 |
| Education | 0.003 | 0.009 | 0.017 | 0.360 | 0.719 |
| Occupation | 0.006 | 0.012 | 0.025 | 0.499 | 0.618 |
| Marriage rating | -0.087 | 0.016 | -0.223 | -5.472 | 0.000 |

Dependent variable: Extramarital affairs

Interpretation?

LPM: Ray Fair Model

Predicted probabilities

| Sex | Age | Years married | Children | Religious | Education | Occupation | Marriage rating | Pred. Prob. |
|---|---|---|---|---|---|---|---|---|
| 1 | 57 | 15 | 1 | 1 | 20 | 7 | 1 | 0.614 |
| 0 | 57 | 15 | 1 | 1 | 20 | 7 | 1 | 0.569 |
| 1 | 57 | 15 | 1 | 1 | 20 | 7 | 5 | 0.265 |
| 0 | 57 | 15 | 1 | 1 | 20 | 7 | 5 | 0.219 |
| 1 | 27 | 4 | 0 | 5 | 9 | 1 | 5 | -0.027 |
| 0 | 27 | 4 | 0 | 5 | 9 | 1 | 5 | -0.072 |

The last two rows show a negative probability!
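The predicted probabilities are simply $x_i\beta$ evaluated at the estimated coefficients. The sketch below uses the coefficients as printed in the regression table. Because those are rounded to three decimals, the results (about 0.64 and about -0.02) are close to, but not identical with, the 0.614 and -0.027 on the slide. The dictionary keys and the function name are illustrative choices, not part of the original material.

```python
# Coefficients as printed in the LPM table (rounded to three decimals).
coef = {
    "const": 0.736, "sex": 0.045, "age": -0.007, "years_married": 0.016,
    "children": 0.054, "religiosity": -0.054, "education": 0.003,
    "occupation": 0.006, "marriage_rating": -0.087,
}

def lpm_prob(profile):
    """Fitted LPM 'probability' x_i * beta for one profile (dict of regressors)."""
    return coef["const"] + sum(coef[k] * v for k, v in profile.items())

older = dict(sex=1, age=57, years_married=15, children=1, religiosity=1,
             education=20, occupation=7, marriage_rating=1)
younger = dict(sex=1, age=27, years_married=4, children=0, religiosity=5,
               education=9, occupation=1, marriage_rating=5)

p_older, p_younger = lpm_prob(older), lpm_prob(younger)
print(round(p_older, 3), round(p_younger, 3))
```

Nothing in the linear form stops the second value from falling below zero, which is the shortcoming the lecture turns to next.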

The Linear Probability Model

Some serious shortcomings.

(i) The distribution of the disturbance is non-normal.

As y can take only one of two values, the error term also has a discrete (non-normal) distribution. With $u_i = y_i - x_i\beta$, the probability distribution of u is:

$$y_i = 1 \;\Rightarrow\; u_i = 1 - x_i\beta \quad \text{with probability } P_i$$

$$y_i = 0 \;\Rightarrow\; u_i = -x_i\beta \quad \text{with probability } (1 - P_i)$$

In effect, u follows a Bernoulli distribution.

[Figures, LPM: Ray Fair Model: histogram and normal P-P plot of the regression standardized residuals.]

The Linear Probability Model

(ii) The error term is heteroskedastic:

$$\operatorname{Var}(u_i) = (x_i\beta)(1 - x_i\beta)$$

which clearly varies with the value of $x_i$.
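The variance formula follows directly from the two-point distribution of u given under (i). A short derivation, writing $P_i = x_i\beta$ as the LPM assumes:

```latex
\begin{aligned}
E(u_i) &= (1 - x_i\beta)\,P_i + (-x_i\beta)\,(1 - P_i) = P_i - x_i\beta = 0, \\
\operatorname{Var}(u_i) &= (1 - x_i\beta)^2\,P_i + (x_i\beta)^2\,(1 - P_i) \\
  &= (1 - P_i)^2 P_i + P_i^2 (1 - P_i) \\
  &= P_i (1 - P_i) = (x_i\beta)(1 - x_i\beta).
\end{aligned}
```

The variance is largest where $x_i\beta = 0.5$ and shrinks towards zero as the fitted probability approaches 0 or 1.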

These problems are not insurmountable:

• The problem of non-normality can be circumvented provided we have a large sample size (invoke the central limit theorem).

• The problem of heteroskedasticity can be dealt with by using White's heteroskedasticity-consistent standard errors.
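A minimal sketch of White's standard errors for an LPM, assuming the simplest (HC0) form of the estimator, $(X'X)^{-1} X' \operatorname{diag}(e_i^2) X (X'X)^{-1}$. The data are simulated, not the Fair data, and the true probability line is a hypothetical choice.

```python
import numpy as np

rng = np.random.default_rng(0)           # synthetic data, not the Fair data
n = 500
x = rng.uniform(0.0, 1.0, n)
p = 0.2 + 0.6 * x                        # true linear probability (hypothetical)
y = (rng.uniform(size=n) < p).astype(float)

X = np.column_stack([np.ones(n), x])
XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y                 # OLS estimate of the LPM
resid = y - X @ beta

# Conventional OLS covariance: s^2 (X'X)^-1
s2 = resid @ resid / (n - X.shape[1])
se_ols = np.sqrt(np.diag(s2 * XtX_inv))

# White (HC0) covariance: (X'X)^-1 X' diag(e_i^2) X (X'X)^-1
meat = X.T @ (X * resid[:, None] ** 2)
se_white = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))

print(beta, se_ols, se_white)
```

The coefficient estimates are unchanged; only the standard errors, and hence the t-tests, are adjusted.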

The Linear Probability Model

(iii) The main problem is the non-fulfilment of $0 \le E(Y_i) \le 1$.

There is no guarantee that the predicted values of Y will all lie between 0 and 1.

| Sex | Age | Years married | Children | Religious | Education | Occupation | Marriage rating | Pred. Prob. |
|---|---|---|---|---|---|---|---|---|
| 1 | 27 | 4 | 0 | 5 | 9 | 1 | 5 | -0.027 |
| 0 | 27 | 4 | 0 | 5 | 9 | 1 | 5 | -0.072 |

Negative probability!

What we require therefore is a way of constraining the LPM so that the predicted probabilities do lie in the [0, 1] range.

In general we use alternative estimation models to do this.

The Solution

The usual way of avoiding this problem is to hypothesize that the probability is a sigmoid (S-shaped) function of Z, F(Z), where Z is a function of the explanatory variables.

Several mathematical functions are sigmoid in character.

[Figure: sigmoid curve F(Z), rising from 0.00 to 1.00 on the vertical axis as Z runs from -8 to 6.]

Alternatives to the Linear Probability Model

Underlying Probability Distributions for Binary Choice

• The distribution

– Normal: PROBIT, natural for behavior

– Logistic: LOGIT, allows “thicker tails”

– Gompertz: asymmetric, underlies the basic logit model for multiple choice

The Logit Model

Several mathematical functions are sigmoid in character. One is the logistic function:

$$F(Z) = \frac{1}{1 + e^{-Z}} = \frac{e^{Z}}{1 + e^{Z}}, \quad \text{where } Z = x_i\beta$$

[Figure: F(Z) against Z, rising from 0.00 to 1.00 as Z runs from -8 to 6.]

As $Z \to \infty$, $e^{-Z} \to 0$ and $p \to 1$ (but cannot exceed 1).

As $Z \to -\infty$, $e^{-Z} \to \infty$ and $p \to 0$ (but cannot be below 0).
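The limiting behaviour can be checked numerically. The sketch below evaluates the logistic function at a few values of Z across the range shown on the axis; switching between the two equivalent forms of F(Z) is an implementation choice made here to avoid overflow, not something taken from the slides.

```python
import math

def logistic(z):
    """F(Z) = 1 / (1 + e^{-Z}), written to avoid overflow for large |Z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)              # equivalent form e^Z / (1 + e^Z)
    return ez / (1.0 + ez)

for z in (-8, -4, 0, 4, 6):
    print(z, round(logistic(z), 4))
```

The values stay strictly between 0 and 1 and are symmetric about Z = 0, where F(Z) = 0.5.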

Normal distribution vs. logistic distribution

[Figure: comparison of the normal and logistic distributions.]

Logistic distribution

The logistic distribution has density function:

$$f(z) = \frac{(1/b)\, e^{-(z-a)/b}}{\left(1 + e^{-(z-a)/b}\right)^{2}}$$

where

a is the mean of the distribution,

b is the scale parameter,

e is the base of the natural logarithm, Euler's e (2.71...).

[Figure: logistic densities with a = 0; b = 1, 2, and 3.]

With a = 0 and b = 1, the logistic distribution has density function:

$$f(z) = \frac{e^{-z}}{\left(1 + e^{-z}\right)^{2}}, \quad -\infty < z < \infty$$

Integrating the pdf gives the distribution function:

$$F(z) = \frac{1}{1 + e^{-z}}, \quad -\infty < z < \infty$$

The Logit Model: Odds Ratio

$$P_i = E(y_i \mid X_i) = P(y_i = 1 \mid X_i) = F(Z_i) = \frac{1}{1 + e^{-Z_i}} = \frac{e^{Z_i}}{1 + e^{Z_i}}, \quad \text{where } Z_i = x_i\beta$$

$$1 - P_i = \frac{1}{1 + e^{Z_i}}$$

Odds ratio:

$$\frac{P_i}{1 - P_i} = \frac{1 + e^{Z_i}}{1 + e^{-Z_i}} = e^{Z_i}$$

Taking the log of the odds ratio,

$$L_i = \ln\!\left(\frac{P_i}{1 - P_i}\right) = Z_i = x_i\beta$$

L is called the logit. Hence the logit model.

Now an example...

Logit: Ray Fair Model

Data: http://fairmodel.econ.yale.edu/rayfair/worksd.htm

The dependent variable is changed into binary: 0 = none; 1 = one or more (0 → 451; 1 → 150; F → 315; M → 286).

Regression variables estimates

| Variable | B | S.E. | Wald | df | Sig. | Exp(B) |
|---|---|---|---|---|---|---|
| Sex | 0.280 | 0.239 | 1.374 | 1 | 0.241 | 1.324 |
| Age | -0.044 | 0.018 | 5.881 | 1 | 0.015 | 0.957 |
| Married years | 0.095 | 0.032 | 8.655 | 1 | 0.003 | 1.099 |
| Have children | 0.398 | 0.292 | 1.861 | 1 | 0.173 | 1.488 |
| Religiosity | -0.325 | 0.090 | 13.089 | 1 | 0.000 | 0.723 |
| Education | 0.021 | 0.051 | 0.174 | 1 | 0.677 | 1.021 |
| Occupation | 0.031 | 0.072 | 0.186 | 1 | 0.667 | 1.031 |
| Marriage rating | -0.468 | 0.091 | 26.555 | 1 | 0.000 | 0.626 |
| Constant | 1.377 | 0.888 | 2.407 | 1 | 0.121 | 3.964 |

Exp(B) gives the odds ratio estimates.

Wald = $(B/SE)^2 = t^2$

Only 4 variables (age, married years, religiosity and marriage rating) are significantly different from zero at $\alpha = 0.05$.

Logit: Ray Fair Model

If B > 0, OR > 1; if B < 0, OR < 1; if B = 0, the odds are unchanged.

$$\frac{P_i}{1 - P_i} = e^{Z_i}$$

Exp($B_i$) = odds ratio = factor by which the odds change when the i-th independent variable increases by one unit.

e.g., when the number of years married increases by 1 unit, the log of the odds for affairs increases by 0.095, so the odds are multiplied by 1.099, i.e. they rise by 9.9%, ceteris paribus.

e.g., for Sex, $e^{0.280} \approx 1.32$, reported as Exp(B) = 1.324.

e.g., religiosity significantly reduces the incidence of extramarital affairs!

'Married years' significantly contributes to extramarital affairs! Each additional year of marriage multiplies the odds of an extramarital affair by about 1.1, ceteris paribus, so those who are married longer are more likely to have extramarital affairs than those recently married.
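The Exp(B) column can be reproduced, up to rounding, by exponentiating the B column. The sketch below uses the B values as printed (three decimals), so the third decimal of a few odds ratios differs slightly from the table; the variable names are illustrative.

```python
import math

# B coefficients as printed in the logit table (rounded to three decimals).
B = {"sex": 0.280, "age": -0.044, "years_married": 0.095, "children": 0.398,
     "religiosity": -0.325, "education": 0.021, "occupation": 0.031,
     "marriage_rating": -0.468}

odds_ratio = {name: math.exp(b) for name, b in B.items()}
pct_change = {name: 100.0 * (r - 1.0) for name, r in odds_ratio.items()}

for name in B:
    print(f"{name:16s} OR = {odds_ratio[name]:.3f}  ({pct_change[name]:+.1f}% in the odds)")
```

A coefficient of 0.095 on years married thus corresponds to odds roughly 10% higher per additional year, while the negative coefficients on religiosity and marriage rating give odds ratios below 1.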

Logit: Ray Fair Model

Model Discrimination: Goodness of Fit

Compares the observed and predicted group memberships. Cases with a predicted probability above the cut value of 0.5 are classified as having extramarital affairs:

$$y_i^{*} = \begin{cases} 0 & \text{if } \hat{y}_i \le 0.5 \\ 1 & \text{if } \hat{y}_i > 0.5 \end{cases}$$

Observed vs. predicted:

• Observed 1: 25 of 150 predicted as 1 (25/150)
• Observed 0: 435 of 451 predicted as 0 (435/451)
• Predicted 1: 25 of 41 observed as 1 (25/41)
• Predicted 0: 435 of 560 observed as 0 (435/560)

Goodness of Fit

1. The likelihood (L): probability of the observed results, given the parameter estimates.

As L is a small number, < 1, we use -2 times the log of L (-2LL).

A good model is one with a high L of the observed results ⇒ a small value for -2LL.

(If a model fits perfectly, L = 1; -2LL = 0.)

Usually compare with the -2LL of a model with only the constant.

The model is good if -2LL(with all variables) < -2LL(with only the constant).

Logit: Ray Fair Model, Goodness of Fit

Pseudo $R^2$: similar to $R^2$; used to quantify the proportion of explained 'variation' in the logistic regression model.

1. Cox & Snell $R^2$ and Nagelkerke $R^2$ (in SPSS)

(a) Cox & Snell $R^2$:

$$R^2_{CS} = 1 - \left[\frac{L(0)}{L(\text{Max})}\right]^{2/N}$$

L(0) = L(with only the constant): constrained

L(Max) = L(with all variables): unconstrained

N = sample size.

Cannot achieve a maximum value of 1.

(b) Nagelkerke $R^2$:

$$R^2_{N} = \frac{R^2_{CS}}{R^2_{\max}}, \quad \text{where } R^2_{\max} = 1 - [L(0)]^{2/N}$$

and $R^2_{CS}$ = Cox & Snell $R^2$.

Count $R^2$:

No. of correct predictions = 435 + 25 = 460

N (no. of observations) = 601

Count $R^2$ = (No. of correct predictions)/N = 460/601 = 0.765
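These fit measures are easy to compute once the counts and log-likelihoods are known. In the sketch below the classification counts are those on the slide, but the two log-likelihood values are hypothetical placeholders, since the slides do not report -2LL for either model. The function names are illustrative.

```python
import math

# Classification counts from the slide (cut value 0.5).
correct_0, observed_0 = 435, 451     # observed 0, predicted 0
correct_1, observed_1 = 25, 150      # observed 1, predicted 1
n = observed_0 + observed_1

count_r2 = (correct_0 + correct_1) / n

def cox_snell_r2(ll_null, ll_full, n):
    """1 - [L(0)/L(Max)]^(2/N), computed from log-likelihoods."""
    return 1.0 - math.exp((2.0 / n) * (ll_null - ll_full))

def nagelkerke_r2(ll_null, ll_full, n):
    """Cox & Snell R2 divided by its maximum, 1 - [L(0)]^(2/N)."""
    r2_max = 1.0 - math.exp((2.0 / n) * ll_null)
    return cox_snell_r2(ll_null, ll_full, n) / r2_max

# Hypothetical log-likelihoods, for illustration only (not reported in the slides).
ll_null, ll_full = -338.0, -305.0
cs = cox_snell_r2(ll_null, ll_full, n)
nk = nagelkerke_r2(ll_null, ll_full, n)
print(round(count_r2, 3), cs, nk)
```

Working with log-likelihoods rather than L itself avoids underflow, because L is the product of 601 probabilities and is extremely small.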

Logit: Ray Fair Model

Predicted Probabilities

$$P(y_i = 1) = \frac{e^{Z_i}}{1 + e^{Z_i}} \quad \text{or} \quad P_i = \frac{1}{1 + e^{-Z_i}}$$

with $Z_i = x_i\beta$ evaluated at the estimated coefficients B (Sex 0.280; Age -0.044; Married years 0.095; Have children 0.398; Religious -0.325; Education 0.021; Occupation 0.031; Marriage rating -0.468; Constant 1.377).

| Sex | Age | Married years | Children | Religious | Education | Occupation | Marriage rating | $Z_i$ | $P_i$ |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 57 | 15 | 1 | 1 | 20 | 7 | 1 | 0.799 | 0.690 |
| 0 | 57 | 15 | 1 | 1 | 20 | 7 | 1 | 0.518 | 0.627 |
| 1 | 57 | 15 | 1 | 1 | 20 | 7 | 5 | -1.075 | 0.254 |
| 0 | 57 | 15 | 1 | 1 | 20 | 7 | 5 | -1.356 | 0.205 |
| 1 | 27 | 4 | 0 | 5 | 9 | 1 | 5 | -2.904 | 0.052 |
| 0 | 27 | 4 | 0 | 5 | 9 | 1 | 5 | -3.184 | 0.040 |
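The last column of the table follows from the $Z_i$ column through $P_i = 1/(1 + e^{-Z_i})$. A quick check, using the $Z_i$ and $P_i$ values exactly as printed:

```python
import math

# (Z_i, P_i) pairs as printed in the predicted-probabilities table.
table = [(0.799, 0.690), (0.518, 0.627), (-1.075, 0.254),
         (-1.356, 0.205), (-2.904, 0.052), (-3.184, 0.040)]

probs = [1.0 / (1.0 + math.exp(-z)) for z, _ in table]

for (z, p_slide), p in zip(table, probs):
    print(f"Z = {z:+.3f}  P = {p:.3f}  (slide: {p_slide:.3f})")
```

Unlike the LPM, the profiles with very negative Z still receive small positive probabilities rather than negative ones.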

PROBIT MODEL

Another commonly used distribution: the probit.

Here the sigmoid function is the cumulative standardized normal distribution.

$$f(Z) = \phi(Z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2} Z^{2}}$$

$$F(Z) = \Phi(Z) = \int_{-\infty}^{Z} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2} t^{2}}\, dt, \quad -\infty < Z < \infty$$

COMPARISON OF LOGIT AND PROBIT

How do logit and probit models compare?

• Results are quite similar, although the logistic distribution has slightly fatter tails.

• Variance of the probit is 1 (standard normal distribution). For the logit it is $\pi^2/3$.

• Amemiya (1981): the relationship between probit and logit models:

$$\beta_{probit} = 0.625\, \beta_{logit} \quad \text{and} \quad \beta_{logit} = 1.6\, \beta_{probit}$$

ESTIMATION OF BINARY PROBIT AND LOGIT MODELS

The logit and probit are non-linear. The parameters enter the regression model in a non-linear fashion.

We can no longer use OLS.

Hence the method of Maximum Likelihood.
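As an illustration of what maximum likelihood involves for the logit model, the sketch below maximises the log-likelihood by Newton-Raphson on simulated data. The choice of Newton-Raphson, the function name `fit_logit`, the sample size and the "true" coefficients are all assumptions made for this example; the slides do not specify an algorithm, and statistical packages may use other iterative schemes.

```python
import numpy as np

def fit_logit(X, y, iters=25):
    """Maximum likelihood for the logit model by Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))            # P_i = F(x_i beta)
        score = X.T @ (y - p)                          # gradient of log-likelihood
        hessian = -(X * (p * (1 - p))[:, None]).T @ X  # second derivatives
        beta = beta - np.linalg.solve(hessian, score)
    return beta

rng = np.random.default_rng(1)                         # synthetic data only
n = 2000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
true_beta = np.array([-0.5, 1.0])                      # hypothetical values
y = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-X @ true_beta))).astype(float)

beta_hat = fit_logit(X, y)
p_hat = 1.0 / (1.0 + np.exp(-X @ beta_hat))
print(beta_hat)
```

At the maximum the score is zero, so with a constant in the model the average fitted probability equals the sample proportion of ones.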
