
Part I: Discrete Choice Models (Theory and Applications)

Mauricio Sarrias

Universidad Catolica del Norte
Workshop SOCHER 2017

Fondecyt Project N 11160104, Individual-specific inference for choice models

October 5, 2017


1 Binary outcomes
    Introduction to Linear Probability Model (LPM)
    WLS
    Example
    Non-Linear Probability Model
    Probit and Logit
    Estimation
    Marginal Effects
    Goodness-of-Fit
    Stata Example

2 Ordinal outcomes
    Motivation
    Latent Approach
    ML Estimation
    Interpretation of Parameters
    Stata Example

3 Multinomial Logit Model
    Introduction
    RUM
    Probabilities and Estimation
    Formulas using gmnl package
    An example


Outline of this workshop

Part I: Discrete choice models (Theory and Applications):
- Binary outcomes.
- Ordered outcomes.
- Multinomial Logit Model.

Part II: Discrete choice models and individual heterogeneity (Theory and Applications):
- Binary outcomes.
- Multinomial Logit Model.


Binary Dependent Variable

We will study methods for estimating models with a binary dependent variable:

$$ y_i = \begin{cases} 1 & \text{if some event occurs} \\ 0 & \text{if the event does not occur} \end{cases} $$

Some examples:
- Is an adult a member of the labor force?
- Did a citizen vote in the last election?
- Does a high school student decide to go to college?
- Is a consumer more likely to buy the same brand or try a new brand?
- Does the individual migrate?


Binary Dependent Variable

So, the question is: how do we estimate a model when the dependent variable is binary?

The first approach is to apply OLS as if the dependent variable were continuous.


What if OLS ... ?

The linear probability model (LPM) is the linear regression model applied to a binary dependent variable. The structural model is:

$$ y_i = x_i'\beta + \varepsilon_i, $$

where

$$ y_i = \begin{cases} 1 & \text{if some event occurs} \\ 0 & \text{if the event does not occur} \end{cases} $$

When $y$ is a binary random variable:

$$ E(y_i|x_i) = [1 \times \Pr(y_i = 1|x_i)] + [0 \times \Pr(y_i = 0|x_i)] = \Pr(y_i = 1|x_i) = x_i'\beta $$


Linear Probability Model: For a Single Independent Variable


Problems with the LPM

- Heteroskedasticity: If a binary random variable has mean $\mu$, then its variance is $\mu(1 - \mu)$. Then:

$$ \mathrm{Var}(y_i|x_i) = \Pr(y_i = 1|x_i)[1 - \Pr(y_i = 1|x_i)] = x_i'\beta(1 - x_i'\beta), $$

  which implies that the variance of the errors depends on the $x$'s and is not constant.
- Nonsensical predictions: The LPM can predict values of $y$ that are negative or greater than 1.
- Functional form: Since the model is linear, a unit increase in $x_k$ results in a change of $\beta_k$ in the probability of the event. The increase is the same regardless of the current value of $x$.


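Problems with the LPM: More on Functional Form

Consider the following two specifications:

$$ y_i = \alpha + \beta x_i + \delta d_i + \varepsilon_i $$

$$ y_i = F(\alpha + \beta x_i + \delta d_i) $$

where $d_i$ is a dummy variable.

In the first specification, the discrete change in $y$ as $d$ changes from 0 to 1, holding $x$ constant, is:

$$ \frac{\Delta y}{\Delta d} = (\alpha + \beta x_i + \delta \cdot 1) - (\alpha + \beta x_i + \delta \cdot 0) = \delta $$

For the second specification, what is the discrete change?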
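The question can be explored numerically. The following is a minimal sketch, assuming that F is the standard logistic cdf and using hypothetical values for α, β and δ (none of them come from the workshop data). It evaluates F(α + βx + δ) − F(α + βx) at several values of x:

```python
import math

def logistic_cdf(v):
    """Standard logistic cdf, used here as one possible choice of F."""
    return 1.0 / (1.0 + math.exp(-v))

# Hypothetical parameter values, chosen only for illustration.
alpha, beta, delta = -1.0, 0.5, 0.8

def discrete_change(x):
    """Change in F(alpha + beta*x + delta*d) as d goes from 0 to 1, holding x fixed."""
    return logistic_cdf(alpha + beta * x + delta) - logistic_cdf(alpha + beta * x)

changes = {x: discrete_change(x) for x in (-4.0, 0.0, 2.0, 8.0)}
for x, change in changes.items():
    print(f"x = {x:5.1f}  discrete change = {change:.4f}")
```

Unlike the linear specification, where the change is always δ, the change under a nonlinear F depends on where x places the index.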


Problems with the LPM: More on Functional Form


WLS

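Since we know the exact form of the heteroskedasticity function, we can use Weighted Least Squares (WLS).

In this case:

$$ E(\varepsilon_i^2 | X) = \mathrm{Var}(\varepsilon_i | X) = \sigma_0^2 h_i(X) \qquad (1) $$

where

$$ h_i(X) = x_i'\beta \, (1 - x_i'\beta) \qquad (2) $$

Then we can compute the WLS estimator:

$$ \hat{\beta}_{WLS} = \left[ \sum_{i=1}^{n} w_i x_i x_i' \right]^{-1} \left[ \sum_{i=1}^{n} w_i x_i y_i \right], \qquad w_i = 1/h_i \qquad (3) $$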


Determinants of Personal Computer Ownership

Consider the following model:

$$ PC_i = \beta_0 + \beta_1 hsGPA_i + \beta_2 ACT_i + \beta_3 parcoll_i + \varepsilon_i \qquad (4) $$

where:
- PC: binary indicator equal to unity if the student owns a computer, zero otherwise.
- hsGPA: high school GPA.
- ACT: achievement test score.
- parcoll: binary indicator equal to unity if at least one parent attended college. [1]

We use the data in GPA1.dta to estimate the probability of owning a computer.

[1] Separate college indicators for the mother and father do not yield individually significant results, as these are pretty highly correlated.


* Open Data
cd "/Users/mauriciosarrias/Documents/Clases/Discrete Choice Models/Examples/Binary"
use "GPA1.DTA", clear

* Gen parcoll
quietly {
    gen parcoll = 0
    replace parcoll = 1 if fathcoll == 1 | mothcoll == 1
    reg PC hsGPA ACT parcoll
}

* Plot Predicted Values
quietly margins, at(hsGPA = (16(0.5)33)) atmeans
marginsplot, yline(0, lcolor(red)) yline(1, lcolor(red))


* Predicted value
qui reg PC hsGPA ACT parcoll
predict yhat, xb
sum yhat

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        yhat |       141    .3971631    .1000667   .1700624   .4974409

* Weights
gen h_hat = yhat * (1 - yhat)


* Models
quietly eststo ols : reg PC hsGPA ACT parcoll
quietly eststo olsr: reg PC hsGPA ACT parcoll, robust
quietly eststo wls : reg PC hsGPA ACT parcoll [aweight = 1/h_hat]
esttab ols olsr wls, b se ///
    mtitle("OLS" "OLS Rob" "WLS")

------------------------------------------------------------
                      (1)             (2)             (3)
                      OLS         OLS Rob             WLS
------------------------------------------------------------
hsGPA              0.0654          0.0654          0.0327
                  (0.137)         (0.141)         (0.130)

ACT              0.000565        0.000565         0.00427
                 (0.0155)        (0.0161)        (0.0155)

parcoll            0.221*          0.221*          0.215*
                 (0.0930)        (0.0880)        (0.0863)

_cons           -0.000432       -0.000432          0.0262
                  (0.491)         (0.496)         (0.477)
------------------------------------------------------------
N                     141             141             141
------------------------------------------------------------
Standard errors in parentheses
* p<0.05, ** p<0.01, *** p<0.001


WLS: Final Remarks

- WLS and robust standard errors help us deal with heteroskedasticity...
- ... but they do not solve the problems related to interpretation.


Latent Variable Approach

The latent $y^*$ is assumed to be linearly related to the observed $x$'s through the structural model:

$$ y_i^* = x_i'\beta + \varepsilon_i $$

The latent variable $y^*$ is linked to the observed binary variable $y$ by the measurement equation:

$$ y_i = \begin{cases} 1, & \text{if } y_i^* > \tau \\ 0, & \text{if } y_i^* \leq \tau \end{cases} \qquad (5) $$


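Probit: Error term distributed as Normal

When $\varepsilon$ is normal with $E(\varepsilon|X) = 0$ and $\mathrm{Var}(\varepsilon|X) = 1$, the pdf is

$$ \phi(\varepsilon) = \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{\varepsilon^2}{2} \right) $$

and the cumulative distribution function (cdf) is:

$$ \Phi(\varepsilon) = \int_{-\infty}^{\varepsilon} \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{t^2}{2} \right) dt $$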


Logit: Error term distributed as Logistic

In the logistic model, the errors are assumed to have a standard logistic distribution with mean 0 and variance $\pi^2/3$. This unusual variance is chosen because it results in a particularly simple equation for the pdf:

$$ \lambda(\varepsilon) = \frac{\exp(\varepsilon)}{[1 + \exp(\varepsilon)]^2} $$

and an even simpler equation for the cdf:

$$ \Lambda(\varepsilon) = \frac{\exp(\varepsilon)}{1 + \exp(\varepsilon)} $$
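As a quick numerical companion to the two distributions, the sketch below codes the normal and logistic pdf and cdf exactly as written in the formulas and checks the well-known identity λ(ε) = Λ(ε)[1 − Λ(ε)]. The evaluation grid is an arbitrary choice for illustration.

```python
import math

def normal_pdf(e):
    """phi(e) = exp(-e^2 / 2) / sqrt(2 * pi)"""
    return math.exp(-0.5 * e * e) / math.sqrt(2.0 * math.pi)

def normal_cdf(e):
    """Phi(e), written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(e / math.sqrt(2.0)))

def logistic_pdf(e):
    """lambda(e) = exp(e) / [1 + exp(e)]^2"""
    return math.exp(e) / (1.0 + math.exp(e)) ** 2

def logistic_cdf(e):
    """Lambda(e) = exp(e) / [1 + exp(e)]"""
    return math.exp(e) / (1.0 + math.exp(e))

grid = [-3.0, -1.0, 0.0, 0.7, 2.5]  # arbitrary evaluation points

# Largest deviation from the identity lambda = Lambda * (1 - Lambda).
identity_gap = max(
    abs(logistic_pdf(e) - logistic_cdf(e) * (1.0 - logistic_cdf(e))) for e in grid
)

for e in grid:
    print(f"e = {e:5.1f}  Phi = {normal_cdf(e):.4f}  Lambda = {logistic_cdf(e):.4f}")
```

Both cdfs equal 0.5 at zero; away from zero the logistic cdf approaches 0 and 1 more slowly because its variance is larger.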


Normal and Logistic Distribution


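Probit

If the $\varepsilon_i$'s are independently and normally distributed, $\varepsilon_i \sim N(0, \sigma^2)$, then (taking the threshold to be $\tau = 0$):

$$ \Pr(y_i = 1|x_i) = \Pr\left( \frac{\varepsilon_i}{\sigma} > -\frac{x_i'\beta}{\sigma} \,\Big|\, x_i \right) = 1 - \Phi\left( -\frac{x_i'\beta}{\sigma} \right) = \Phi\left( \frac{x_i'\beta}{\sigma} \right) $$

where the last equality follows from the symmetry of the standard normal distribution.

- The probability depends on $\beta$ and $\sigma$, but only the ratio $\beta/\sigma$ is identified, not the individual parameters $\beta$ and $\sigma$.
- For instance, if $\beta$ and $\sigma$ are each multiplied by a constant $c$, then the probability remains unchanged.
- Typically, we set $\sigma = 1$ as a normalization.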
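The identification argument can be checked numerically. This is a minimal sketch with a hypothetical covariate vector, coefficient vector, scale and constant c (all made up for illustration); it shows that rescaling β and σ by the same constant leaves the probit probability unchanged.

```python
import math

def normal_cdf(v):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))

def probit_prob(x, beta, sigma):
    """Pr(y = 1 | x) = Phi(x'beta / sigma)."""
    index = sum(xk * bk for xk, bk in zip(x, beta))
    return normal_cdf(index / sigma)

# Hypothetical values; the first element of x plays the role of a constant.
x = [1.0, 0.5, -1.2]
beta = [0.2, 1.0, 0.4]
sigma = 1.5
c = 3.0

p_original = probit_prob(x, beta, sigma)
p_rescaled = probit_prob(x, [c * b for b in beta], c * sigma)

print(p_original, p_rescaled)
```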


ML Estimation

The outcome is Bernoulli distributed, that is, binomial with just one trial. A very convenient compact notation for the density of $y_i$, or more formally its probability mass function, is:

$$ f(y_i|x_i) = P_i^{y_i} (1 - P_i)^{1 - y_i}, \qquad y_i = 0, 1 $$

where $P_i = F(x_i'\beta)$. This yields probabilities $P_i$ and $(1 - P_i)$, since $f(1) = P^1(1 - P)^0 = P$ and $f(0) = P^0(1 - P)^1 = 1 - P$. Assuming that the observations are independent of each other, the joint probability function (or likelihood function) is:

$$ \Pr(Y_1 = y_1, Y_2 = y_2, \ldots, Y_n = y_n | X) = \prod_{i=1}^{n} f(y_i|x_i) $$

The likelihood function for a sample of $n$ observations can be written as

$$ L(\beta | \underbrace{y, X}_{\text{data}}) = \prod_{i=1}^{n} [F(x_i'\beta)]^{y_i} [1 - F(x_i'\beta)]^{1 - y_i} $$


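Log-Likelihood Function

Taking logs, we obtain the log-likelihood function, which must be maximized:

$$ \log L(\beta|\text{data}) = \log\left( \prod_{i=1}^{n} [F(x_i'\beta)]^{y_i} [1 - F(x_i'\beta)]^{1 - y_i} \right) $$

$$ = \sum_{i=1}^{n} \left\{ \log\left( [F(x_i'\beta)]^{y_i} \right) + \log\left( [1 - F(x_i'\beta)]^{1 - y_i} \right) \right\} $$

$$ = \sum_{i=1}^{n} \left\{ y_i \log[F(x_i'\beta)] + (1 - y_i) \log[1 - F(x_i'\beta)] \right\} $$

Useful Trick

If the distribution is symmetric, as the normal and logistic are, then $1 - F(x_i'\beta) = F(-x_i'\beta)$. Let $q_i = 2y_i - 1$. Then:

$$ \log L(\beta|\text{data}) = \sum_{i=1}^{n} \log F(q_i x_i'\beta) $$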
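The two forms of the log-likelihood can be compared on a toy example. The sketch below assumes a logit model and uses a small hypothetical data set and parameter vector (invented for illustration only); it evaluates the log-likelihood both in the standard form and with the q = 2y − 1 shortcut.

```python
import math

def logistic_cdf(v):
    """Standard logistic cdf (symmetric around zero)."""
    return 1.0 / (1.0 + math.exp(-v))

# Hypothetical data: each row of X is (constant, regressor).
X = [[1.0, 0.5], [1.0, -1.0], [1.0, 2.0], [1.0, 0.0]]
y = [1, 0, 1, 0]
beta = [-0.3, 0.8]

def index(x_i, beta):
    """Linear index x_i'beta."""
    return sum(xk * bk for xk, bk in zip(x_i, beta))

def loglik_standard(beta, y, X, F):
    """sum_i { y_i log F(x_i'b) + (1 - y_i) log[1 - F(x_i'b)] }"""
    return sum(
        y_i * math.log(F(index(x_i, beta)))
        + (1 - y_i) * math.log(1.0 - F(index(x_i, beta)))
        for y_i, x_i in zip(y, X)
    )

def loglik_symmetric(beta, y, X, F):
    """sum_i log F(q_i x_i'b) with q_i = 2 y_i - 1; valid for symmetric F."""
    return sum(
        math.log(F((2 * y_i - 1) * index(x_i, beta))) for y_i, x_i in zip(y, X)
    )

ll_standard = loglik_standard(beta, y, X, logistic_cdf)
ll_symmetric = loglik_symmetric(beta, y, X, logistic_cdf)
print(ll_standard, ll_symmetric)
```

The shortcut is mainly a coding convenience: a single expression covers both outcomes, which simplifies gradient and Hessian code as well.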


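Asymptotic Distribution of Probit and Logit Model

Recall that

$$ \sqrt{n}(\hat{\beta} - \beta_0) \xrightarrow{d} N(0, I^{-1}) $$

where $I$ is the Fisher information matrix, defined as:

$$ I = -E\left[ \frac{\partial^2}{\partial\beta\,\partial\beta'} \log f(y_i|x_i; \beta) \right] $$

Then, an estimator of the expected value of the Hessian (or information matrix) is given by:

$$ \hat{I}(\hat{\beta}) = -E\left[ \frac{\partial^2 \log L}{\partial\beta\,\partial\beta'} \right] = \frac{1}{n} \sum_{i=1}^{n} \frac{f^2(x_i'\hat{\beta})}{F(x_i'\hat{\beta})[1 - F(x_i'\hat{\beta})]} x_i x_i' \qquad (6) $$

We can also use the following estimators: $\left( -\sum_{i=1}^{n} H(w_i, \hat{\beta}) \right)^{-1}$, or the outer product of the gradients.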


What about small samples? Griffiths et al. (1987)

We have three estimators:
- NR: the inverse of the negative of the Hessian matrix from the log-likelihood function.
- Scoring: the inverse of the information matrix.
- BHHH: the inverse of the outer product of the first derivatives of the log-likelihood function.

They are asymptotically equivalent, but their performance can vary in finite samples.

Griffiths et al. find that, on average, the Hessian matrix and the information matrix give almost identical results and lead to more accurate estimates of the asymptotic covariance matrix than the estimator based on first derivatives.


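Marginal Effects

Recall that:

$$ E(y_i|x_i) = [1 \times \Pr(y_i = 1|x_i)] + [0 \times \Pr(y_i = 0|x_i)] = \Pr(y_i = 1|x_i) = F(x_i'\beta) $$

The marginal effect is given by:

$$ \underbrace{\frac{\partial E(y_i|x_i)}{\partial x_i}}_{(K \times 1)} = \left[ \frac{dF(x_i'\beta)}{d(x_i'\beta)} \right] \beta = \underbrace{f(x_i'\beta)}_{(1 \times 1)} \underbrace{\beta}_{(K \times 1)} $$

where $f(\cdot)$ is the probability density function.

So:

$$ \text{Probit} \implies \frac{\partial E(y_i|x_i)}{\partial x_i} = \phi(x_i'\beta)\beta $$

$$ \text{Logit} \implies \frac{\partial E(y_i|x_i)}{\partial x_i} = \Lambda(x_i'\beta)[1 - \Lambda(x_i'\beta)]\beta $$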
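The two formulas can be evaluated at a given covariate vector and compared with a finite-difference approximation. This is a minimal sketch; the vector x, the coefficients and the step size h are hypothetical choices for illustration.

```python
import math

def normal_pdf(v):
    return math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)

def normal_cdf(v):
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))

def logistic_cdf(v):
    return 1.0 / (1.0 + math.exp(-v))

def index(x, beta):
    return sum(xk * bk for xk, bk in zip(x, beta))

def probit_me(x, beta):
    """phi(x'beta) * beta, one entry per element of x."""
    scale = normal_pdf(index(x, beta))
    return [scale * b for b in beta]

def logit_me(x, beta):
    """Lambda(x'beta) * [1 - Lambda(x'beta)] * beta."""
    p = logistic_cdf(index(x, beta))
    return [p * (1.0 - p) * b for b in beta]

# Hypothetical values; x[0] is a constant, so only entries 1 and 2 are of interest.
x = [1.0, 0.4, 2.0]
beta = [-0.5, 1.2, 0.3]

def numeric_me(F, x, beta, k, h=1e-5):
    """Central finite difference of F(x'beta) with respect to x[k]."""
    up = list(x); up[k] += h
    down = list(x); down[k] -= h
    return (F(index(up, beta)) - F(index(down, beta))) / (2.0 * h)

me_probit = probit_me(x, beta)
me_logit = logit_me(x, beta)
print("probit:", me_probit[1], numeric_me(normal_cdf, x, beta, 1))
print("logit: ", me_logit[1], numeric_me(logistic_cdf, x, beta, 1))
```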


Marginal Effects: Comments

Some aspects are important to note:
- MEs will vary with the values of $x$.
- Current practices:
    - Calculate MEs at the means of the variables.
    - Calculate MEs at specific values.
    - Evaluate MEs at every observation and use the sample average of the individual MEs.


Marginal Effects: Dummy variable

The appropriate ME for a binary independent variable, say $d$, would be:

$$ ME = \Pr(y_i = 1 | \bar{x}_{(d)}, d_i = 1) - \Pr(y_i = 1 | \bar{x}_{(d)}, d_i = 0) $$

where $\bar{x}_{(d)}$ denotes the means of all the other variables in the model.


Average Partial Effects

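Current practice: "average partial effects". The quantity of interest is:

$$ APE = E_x\left[ \frac{\partial E(y|x)}{\partial x} \right] $$

In practical terms, this suggests the computation:

$$ \widehat{APE} = \bar{\hat{\gamma}} = \frac{1}{n} \sum_{i=1}^{n} f(x_i'\hat{\beta})\hat{\beta}. $$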
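The averaging formula can be contrasted with the effect evaluated at the sample means. The sketch below assumes a logit model; the data matrix and the coefficient vector standing in for the ML estimate are hypothetical.

```python
import math

def logistic_pdf(v):
    """Logistic density written as Lambda(v) * [1 - Lambda(v)]."""
    p = 1.0 / (1.0 + math.exp(-v))
    return p * (1.0 - p)

def index(x, beta):
    return sum(xk * bk for xk, bk in zip(x, beta))

# Hypothetical sample (constant, regressor) and hypothetical estimate.
X = [[1.0, -1.5], [1.0, -0.2], [1.0, 0.3], [1.0, 1.1], [1.0, 2.4]]
beta_hat = [0.1, 0.9]

def average_partial_effects(X, beta):
    """(1/n) * sum_i f(x_i'beta) * beta."""
    mean_density = sum(logistic_pdf(index(x_i, beta)) for x_i in X) / len(X)
    return [mean_density * b for b in beta]

def effects_at_means(X, beta):
    """f(xbar'beta) * beta, evaluated at the column means of X."""
    n = len(X)
    x_bar = [sum(row[k] for row in X) / n for k in range(len(beta))]
    density = logistic_pdf(index(x_bar, beta))
    return [density * b for b in beta]

ape = average_partial_effects(X, beta_hat)
mem = effects_at_means(X, beta_hat)
print("APE:", ape[1], " ME at means:", mem[1])
```

The two numbers generally differ because the density is nonlinear in the index, which is one reason the averaging approach is often preferred.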


Goodness-of-Fit

A number of suggestions have been made for how to evaluate the overall quality of a binary response model:
- First approach: mimic the $R^2$ measure.
- Assess the predictive performance of the model.

The $R^2$ is not directly applicable in non-linear models such as binary response models, since we do not have a proper variance decomposition result.

So-called pseudo-$R^2$ measures have been suggested instead.


Goodness-of-Fit

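Let:
- $\log L(\hat{\beta}_r)$: the value of the maximized log-likelihood function in the constant-only model.
- $\log L(\hat{\beta}_u)$: the maximized log-likelihood value in the full model.

Note that the value of the log-likelihood function is always negative (it is a sum of logs of probabilities), so:

$$ \log L(\hat{\beta}_u) \geq \log L(\hat{\beta}_r) \implies \left| \log L(\hat{\beta}_u) \right| \leq \left| \log L(\hat{\beta}_r) \right| $$

So that:

$$ 0 \leq 1 - \frac{\log L(\hat{\beta}_u)}{\log L(\hat{\beta}_r)} = R^2_{McFadden} \leq 1 $$

The McFadden $R^2$ will be zero if the full model has no explanatory power.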
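The computation itself is a one-liner. A minimal sketch, using hypothetical log-likelihood values for the constant-only and full models (they are not taken from any example in these slides):

```python
def mcfadden_r2(loglik_full, loglik_null):
    """1 - logL(full) / logL(constant only)."""
    return 1.0 - loglik_full / loglik_null

# Hypothetical maximized log-likelihood values.
loglik_null = -95.0
loglik_full = -80.0

r2 = mcfadden_r2(loglik_full, loglik_null)
r2_no_power = mcfadden_r2(loglik_null, loglik_null)  # full model no better than null
print(r2, r2_no_power)
```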


Information Measures

Definition (Akaike's Information Criterion (AIC))

Akaike's (1973) information criterion is defined as

$$ AIC = \frac{-2 \log L + 2K}{n} \qquad (7) $$

where $\log L$ is the maximized log-likelihood of the model, $K$ is the number of parameters in the model, and $n$ is the sample size.

- Larger values of $\log L$ indicate a better fit.
- $-2 \log L$ ranges from 0 to $+\infty$, with smaller values indicating a better fit.
- As $K$ increases, $-2 \log L$ becomes smaller, since more parameters make what is observed more likely.
- $2K$ is added as a penalty.
- All else being equal, smaller values suggest a better-fitting model.
- It is used to compare models across different samples or to compare non-nested models.


Information Measures

Definition (Bayes Information Criterion (BIC))

The BIC is defined as

$$ BIC = \frac{-2 \log L + K \log n}{n} \qquad (8) $$

where $\log L$ is the maximized log-likelihood of the model, $K$ is the number of parameters in the model, and $n$ is the number of individuals.

It is possible to increase the likelihood by adding parameters, but doing so may result in overfitting. Both BIC and AIC address this problem by introducing a penalty term for the number of parameters in the model; the penalty term is larger in BIC than in AIC whenever $\log n > 2$, that is, for $n \geq 8$.
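A small sketch of both criteria, using the per-observation scaling in which they are stated here. The inputs are the log-likelihood (−1199.1), number of coefficients (11) and number of observations (1182) printed in the Fishing summary output later in these slides; the function names are illustrative only.

```python
import math

def aic_per_obs(loglik, k, n):
    """(-2 logL + 2K) / n"""
    return (-2.0 * loglik + 2.0 * k) / n

def bic_per_obs(loglik, k, n):
    """(-2 logL + K log n) / n"""
    return (-2.0 * loglik + k * math.log(n)) / n

# Values read off the Fishing multinomial logit summary output.
loglik, k, n = -1199.1, 11, 1182

aic = aic_per_obs(loglik, k, n)
bic = bic_per_obs(loglik, k, n)
print(f"AIC = {aic:.4f}, BIC = {bic:.4f}")
```

Many software packages report the unscaled versions (without dividing by n); for a fixed sample the ranking of models is the same either way.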


Stata Example

Open BinaryStata.do


Ordinal Outcomes

- When a variable is ordinal, its categories can be ranked from low to high, but the distances between adjacent categories are unknown.
- Examples:
    - Kind of job: low skilled, medium skilled, and highly skilled.
    - Subjective health status: very bad, bad, good, very good.
    - Likert scales: strongly agree, agree, have no opinion, disagree, or strongly disagree with a statement.


Ordinal Outcomes

Researchers often treat ordinal dependent variables as if they were interval. The categories of the dependent variable are numbered sequentially and the linear regression model (LRM) is used. This involves the implicit assumption that the intervals between adjacent categories are equal.

For example, the distance between strongly agreeing and agreeing is assumed to be the same as the distance between agreeing and being neutral on a Likert scale.

This is not correct. See, for example, McKelvey and Zavoina (1975) and Winship and Mare (1984).


Latent Approach

Consider the latent variable $y_i^*$, ranging from $-\infty$ to $\infty$, determined by the following structural model:

$$ y_i^* = x_i'\beta + \varepsilon_i \qquad (9) $$

However, we do not observe $y_i^*$. We only observe a discrete variable $y_i$, which is thought of as providing incomplete information about an underlying $y_i^*$ according to the following rule:

$$ y_i = \begin{cases} 1 & \text{if } y_i^* \leq \kappa_1 \\ 2 & \text{if } \kappa_1 < y_i^* \leq \kappa_2 \\ 3 & \text{if } \kappa_2 < y_i^* \end{cases} \qquad (10) $$

or, more generally, a mechanism that accounts for the ordered nature of $y_i^*$ is

$$ y_i = j \iff \kappa_{j-1} < y_i^* \leq \kappa_j, \qquad j = 1, \ldots, J \qquad (11) $$

The $\kappa$'s are called thresholds or cutpoints. The extreme categories 1 and $J$ are defined by open-ended intervals with $\kappa_0 = -\infty$ and $\kappa_J = \infty$. When $J = 2$, we have the binary response model (BRM).


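Latent Approach

Consider the following question: How do you grade your current health status? The answers are: 1 = Poor, 2 = Fair, 3 = Good, and 4 = Very Good. Assume that this ordinal variable is related to a continuous, latent variable $y^*$ that indicates an individual's underlying health status. The observed $y$ is related to $y^*$ according to:

$$ y_i = \begin{cases} 1 \implies \text{Poor} & \text{if } \kappa_0 = -\infty \leq y_i^* < \kappa_1 \\ 2 \implies \text{Fair} & \text{if } \kappa_1 \leq y_i^* < \kappa_2 \\ 3 \implies \text{Good} & \text{if } \kappa_2 \leq y_i^* < \kappa_3 \\ 4 \implies \text{Very Good} & \text{if } \kappa_3 \leq y_i^* < \kappa_4 = \infty \end{cases} \qquad (12) $$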


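The probabilities

First, consider the probability that $y = 1$. This implies that:

$$ \begin{aligned} \Pr(y_i = 1|x_i) &= \Pr(\kappa_0 \leq y_i^* < \kappa_1 | x_i) \\ &= \Pr(\kappa_0 \leq x_i'\beta + \varepsilon_i < \kappa_1 | x_i) \\ &= \Pr(\kappa_0 - x_i'\beta \leq \varepsilon_i < \kappa_1 - x_i'\beta | x_i) \\ &= \Pr(\varepsilon_i < \kappa_1 - x_i'\beta | x_i) - \Pr(\varepsilon_i < \kappa_0 - x_i'\beta | x_i) \\ &= F(\kappa_1 - x_i'\beta) - F(\kappa_0 - x_i'\beta) \end{aligned} \qquad (13) $$

So, generalizing, we obtain:

$$ \Pr(y_i = j|x_i) = F(\kappa_j - x_i'\beta) - F(\kappa_{j-1} - x_i'\beta) \qquad (14) $$

Note that:

$$ F(\kappa_0 - x_i'\beta) = F(-\infty - x_i'\beta) = 0 \qquad (15) $$

and

$$ F(\kappa_J - x_i'\beta) = F(\infty - x_i'\beta) = 1 \qquad (16) $$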
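The general formula translates directly into code. The sketch below assumes normal errors (an ordered probit) and uses a hypothetical index value and hypothetical cutpoints; the open-ended categories are handled by setting the cdf to 0 at κ_0 and to 1 at κ_J, as in the two limiting cases above.

```python
import math

def normal_cdf(v):
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))

def ordered_probs(index, cutpoints, cdf):
    """Pr(y = j | x) = F(kappa_j - x'b) - F(kappa_{j-1} - x'b), j = 1..J.

    `cutpoints` holds kappa_1 .. kappa_{J-1}; kappa_0 = -inf and kappa_J = +inf
    enter through the fixed cdf values 0 and 1.
    """
    cumulative = [0.0] + [cdf(k - index) for k in cutpoints] + [1.0]
    return [cumulative[j] - cumulative[j - 1] for j in range(1, len(cumulative))]

# Hypothetical values: J = 4 categories, so three interior cutpoints.
cutpoints = [-1.0, 0.0, 1.2]
index = 0.3  # stands in for x_i'beta

probs = ordered_probs(index, cutpoints, normal_cdf)
for j, p in enumerate(probs, start=1):
    print(f"Pr(y = {j}) = {p:.4f}")
```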


Ordered Probit Model

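In the ordered probit model, we assume that the error term follows a normal distribution, $F(\varepsilon) = \Phi(\varepsilon)$.

In this case, the probabilities are:

$$ \Pr(y_i = j|x_i) = \Phi\left( \frac{\kappa_j - x_i'\beta}{\sigma_\varepsilon} \right) - \Phi\left( \frac{\kappa_{j-1} - x_i'\beta}{\sigma_\varepsilon} \right) \qquad (17) $$

For identification we need $\sigma_\varepsilon = 1$.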


Ordered Probit Model


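Ordered Logit Model

If the $\varepsilon_i$ are i.i.d. logistically distributed, then

$$ \Pr(y_i = j|x_i) = \Lambda\left( \frac{\kappa_j - x_i'\beta}{\sigma_\varepsilon} \right) - \Lambda\left( \frac{\kappa_{j-1} - x_i'\beta}{\sigma_\varepsilon} \right) \qquad (18) $$

where the scale is normalized so that $\mathrm{Var}(\varepsilon_i) = \pi^2/3$ (the standard logistic distribution), in which case the arguments of $\Lambda$ reduce to $\kappa_j - x_i'\beta$ and $\kappa_{j-1} - x_i'\beta$.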


Identification

- Since $y^*$ is latent, its mean and variance cannot be estimated.
- For logit, $\mathrm{Var}(\varepsilon|x) = \pi^2/3$.
- For probit, $\mathrm{Var}(\varepsilon|x) = 1$.

Can we estimate a constant in this model?


Estimation

The conditional probabilities are:

$$ f(y_i|x_i; \beta, \kappa_1, \ldots, \kappa_{J-1}) = P_{i1}^{d_{i1}} \cdots P_{iJ}^{d_{iJ}} = \prod_{j=1}^{J} P_{ij}^{d_{ij}} \qquad (19) $$

where $d_{ij}$ is defined as a binary indicator equal to one if $y_i = j$ and 0 otherwise. For a sample of $n$ independent pairs of observations $(y_i, x_i)$, the likelihood function is given by

$$ L(\theta; y, X) = \prod_{i=1}^{n} \prod_{j=1}^{J} P_{ij}^{d_{ij}} \qquad (20) $$


Estimation

Taking logarithms converts products into sums, and we can write the log-likelihood function as

$$ \log L(\theta; y, X) = \sum_{i=1}^{n} \sum_{j=1}^{J} d_{ij} \log P_{ij} \qquad (21) $$

- ML estimation of the parameters yields consistent, asymptotically efficient, and asymptotically normally distributed estimators.
- We can use LR, Wald, and score tests to test general restrictions, and the invariance property together with the delta method for estimation and inference on predicted probabilities or marginal probability effects.


Compensating Variation

Definition (CV): The variation in two regressors such that the latent variable does not change.

Let $x_{il}$ denote the $l$-th element of $x_i$ and $\beta_l$ the corresponding parameter, and let $m$ index the $m$-th elements of the vectors $x_i$ and $\beta$, respectively.

Now consider a change in $x_{il}$ and $x_{im}$ at the same time, such that $\Delta y_i^* = 0$ (and therefore all probabilities are unchanged). This requires:

$$ 0 = \beta_l \Delta x_{il} + \beta_m \Delta x_{im} \quad \text{or} \quad \frac{\Delta x_{il}}{\Delta x_{im}} = -\frac{\beta_m}{\beta_l} \qquad (22) $$

If, for example, $x_{il}$ is logarithmic income and $x_{im}$ is a dummy variable indicating unemployment, then the above ratio gives a trade-off ratio: the relative increase in income required to compensate for the negative effect of unemployment.


Compensating Variation

In some applications, it could be interesting to know how much an explanatory variable should change to reach the next higher response category. For this purpose, one could consider the ratio of the interval length to the parameter:

$$ \frac{\kappa_j - \kappa_{j-1}}{\beta_l} \qquad (23) $$

The smaller this ratio in absolute terms, the smaller the maximum change in $x_{il}$ required to move the response from $y_i = j$ to $y_i = j + 1$.


Discrete Probability Effect

The ceteris paribus effect of a discrete change in one explanatory variable, say the $l$-th element of $x_i$, is simply the difference between the probabilities after and before the change, given the values of the other variables:

$$ \begin{aligned} \Delta P_{ij} &= \Pr(y_i = j | x_i + \Delta x_{il}) - \Pr(y_i = j | x_i) \\ &= [F(\kappa_j - x_i'\beta - \Delta x_{il}\beta_l) - F(\kappa_{j-1} - x_i'\beta - \Delta x_{il}\beta_l)] \\ &\quad - [F(\kappa_j - x_i'\beta) - F(\kappa_{j-1} - x_i'\beta)] \end{aligned} \qquad (24) $$


Shift in Density Due to a Change ∆xi > 0 (β > 0)


Marginal Effects

The marginal probability effect (MPE) is given by:

$$ MPE_{ijl} = \frac{\partial P_{ij}}{\partial x_{il}} = [f(\kappa_{j-1} - x_i'\beta) - f(\kappa_j - x_i'\beta)]\beta_l \qquad (25) $$

- Neither the signs nor the magnitudes of the coefficients are directly interpretable in ordered choice models.
- MPEs are functions of the covariates and therefore vary across individuals.
- To calculate average marginal probability effects (AMPEs) we have to take expectations with respect to $x$, which are estimated consistently by replacing $\beta$ with its ML estimate $\hat{\beta}$ and averaging over the sample.


Marginal Effects

Furthermore, in the computation of the MPE, the only certainties about the signs of the partial effects in this model are as follows, where we consider a variable with a positive coefficient:

- Increases in that variable will increase the probability in the highest cell and decrease the probability in the lowest cell.
- The sum of all the changes will be zero, since the new probabilities must still sum to one.
- The effects will begin at $\Pr(y_i = 1)$ with one or more negative values, then change to a set of positive values; there will be one sign change (single crossing).
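These sign properties can be verified numerically. The sketch below assumes an ordered probit with hypothetical cutpoints, index value and a positive coefficient (all invented for illustration); it computes the marginal probability effect for each category and counts the sign changes.

```python
import math

def normal_pdf(v):
    return math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)

def ordered_mpe(index, cutpoints, beta_l, pdf):
    """[f(kappa_{j-1} - x'b) - f(kappa_j - x'b)] * beta_l for j = 1..J.

    The density is zero at kappa_0 = -inf and at kappa_J = +inf.
    """
    dens = [0.0] + [pdf(k - index) for k in cutpoints] + [0.0]
    return [(dens[j - 1] - dens[j]) * beta_l for j in range(1, len(dens))]

# Hypothetical values: four categories, positive coefficient.
cutpoints = [-1.0, 0.0, 1.2]
index = 0.3
beta_l = 0.5

mpe = ordered_mpe(index, cutpoints, beta_l, normal_pdf)
sign_changes = sum(1 for a, b in zip(mpe, mpe[1:]) if a * b < 0)

print("MPEs:", [round(m, 4) for m in mpe])
print("sum of MPEs:", sum(mpe), " sign changes:", sign_changes)
```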


Marginal Effects

One might also be interested in cumulative values of the partial effects, such as

$$ \frac{\partial \Pr(y_i \leq j|x_i)}{\partial x_{il}} = \sum_{m=1}^{j} [f(\kappa_{m-1} - x_i'\beta) - f(\kappa_m - x_i'\beta)]\beta_l \qquad (26) $$


Stata Example

Open Ordered Lab.do


Introduction

Type of dependent variable
- Discrete choices: choice of one among several mutually exclusive alternatives:
    - Commuting mode: train, car, bicycle, walking.
    - Type of car: standard car, hybrid car, electric car.

Type of data
- Revealed preference data: the data are observed choices of individuals for, say, a transport mode.
- Stated preference data: individuals face a virtual choice situation, for example the choice between different types of automated cars.


Design of the choice experiment

Figure: Sample of a Choice Situation Presented to Respondents


Random Utility Model

The utility of individual $i$ from alternative $j$ is:

$$ U_{ij} = V_{ij} + \varepsilon_{ij} $$

where $V_{ij}$ is the deterministic component and $\varepsilon_{ij}$ is the random component. This random component is intended to capture the various unpredictable choices people make.

Deterministic Part

It is usually assumed that $V_{ij}$ is a linear combination of the observed explanatory variables:

$$ V_{ij} = \alpha_j + x_{ij}'\beta + z_i'\gamma_j + w_{ij}'\delta_j $$



Three kinds of variables:
- alternative-specific variables $x_{ij}$ with a generic coefficient $\beta$.
- individual-specific variables $z_i$ with alternative-specific coefficients $\gamma_j$.
- alternative-specific variables $w_{ij}$ with an alternative-specific coefficient $\delta_j$.



Only differences in utility matter. For two alternatives $j$ and $k$:

$$ V_{ij} - V_{ik} = (\alpha_j - \alpha_k) + (x_{ij} - x_{ik})'\beta + z_i'(\gamma_j - \gamma_k) + (w_{ij}'\delta_j - w_{ik}'\delta_k) $$

- Coefficients for individual-specific variables (including the intercept) should be alternative specific. A normalization is required: $\gamma_1 = 0$.
- Coefficients for alternative-specific variables may (or may not) be alternative specific.

Cost is alternative specific, but the cost of a standard vehicle may not have the same impact on utility as the cost of an electric car.


Probabilities

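If the error terms are i.i.d. extreme value (EV) type I, then:

$$ P_{ij} = \frac{e^{V_{ij}}}{\sum_{k} e^{V_{ik}}} = \frac{e^{x_{ij}'\beta}}{\sum_{k} e^{x_{ik}'\beta}} \qquad (27) $$

For a sample of $n$ independent pairs of observations $(y_i, x_{ij})$, the log-likelihood function is given by

$$ \log L(\theta; y, X) = \sum_{i} \sum_{j} y_{ij} \log P_{ij} \qquad (28) $$

where $y_{ij}$ is a dummy variable equal to 1 if individual $i$ made choice $j$ and 0 otherwise.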
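The probability and log-likelihood formulas can be sketched in a few lines. The example below uses a tiny hypothetical data set (two individuals, three alternatives, two alternative-specific attributes with generic coefficients) and a hypothetical parameter vector. Subtracting the maximum utility before exponentiating is a standard numerical safeguard and does not change the probabilities.

```python
import math

def mnl_probs(V):
    """P_j = exp(V_j) / sum_k exp(V_k), computed in a numerically stable way."""
    v_max = max(V)
    exp_v = [math.exp(v - v_max) for v in V]
    total = sum(exp_v)
    return [e / total for e in exp_v]

def utilities(x_i, beta):
    """V_ij = x_ij'beta for every alternative j of individual i."""
    return [sum(a * b for a, b in zip(x_ij, beta)) for x_ij in x_i]

# Hypothetical data: X[i][j] holds the attributes of alternative j for individual i.
X = [
    [[2.0, 0.1], [1.0, 0.5], [3.0, 0.9]],
    [[1.5, 0.3], [2.5, 0.2], [0.5, 0.4]],
]
chosen = [1, 2]       # index of the chosen alternative for each individual
beta = [-0.4, 0.9]    # hypothetical generic coefficients

def loglik(beta, X, chosen):
    """sum_i sum_j y_ij log P_ij, where y_ij = 1 only for the chosen alternative."""
    return sum(
        math.log(mnl_probs(utilities(x_i, beta))[j]) for x_i, j in zip(X, chosen)
    )

all_probs = [mnl_probs(utilities(x_i, beta)) for x_i in X]
ll = loglik(beta, X, chosen)
print(all_probs, ll)
```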


Marginal effects

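The marginal effects with respect to the individual-specific variables $z_i$ or the alternative-specific variables $x_{ij}$ are:

$$ \frac{\partial P_{ij}}{\partial z_i} = P_{ij}\left( \gamma_j - \sum_{l} P_{il}\gamma_l \right) $$

$$ \frac{\partial P_{ij}}{\partial x_{ij}} = \beta P_{ij}(1 - P_{ij}) $$

$$ \frac{\partial P_{ij}}{\partial x_{il}} = -\beta P_{ij}P_{il} $$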
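The own and cross effects of an alternative-specific attribute can be illustrated as follows. This is a minimal sketch with hypothetical utilities and a hypothetical generic coefficient on the attribute (called `coef` in the code); it shows that changing the attribute of one alternative shifts probability between alternatives without changing the total.

```python
import math

def mnl_probs(V):
    """Multinomial logit probabilities from a list of utilities."""
    v_max = max(V)
    exp_v = [math.exp(v - v_max) for v in V]
    total = sum(exp_v)
    return [e / total for e in exp_v]

def own_effect(p, j, coef):
    """dP_j / dx_j = coef * P_j * (1 - P_j)"""
    return coef * p[j] * (1.0 - p[j])

def cross_effect(p, j, l, coef):
    """dP_j / dx_l = -coef * P_j * P_l for j != l"""
    return -coef * p[j] * p[l]

# Hypothetical utilities for three alternatives and a hypothetical (negative) coefficient,
# as would be typical for a cost attribute.
V = [0.2, -0.5, 1.0]
coef = -0.4
p = mnl_probs(V)

l = 0  # alternative whose attribute is changed
effects = [
    own_effect(p, j, coef) if j == l else cross_effect(p, j, l, coef)
    for j in range(len(p))
]
print("effects of raising the attribute of alternative 0:", effects)
```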


gmnl package in R


Formulas in gmnl package

# load packages and data
library("gmnl")
library("mlogit")
data("TravelMode", package = "AER")
with(TravelMode, prop.table(table(mode[choice == "yes"])))

##
##       air     train       bus       car
## 0.2761905 0.3000000 0.1428571 0.2809524

head(TravelMode)

##   individual  mode choice wait vcost travel gcost income size
## 1          1   air     no   69    59    100    70     35    1
## 2          1 train     no   34    31    372    71     35    1
## 3          1   bus     no   35    25    417    70     35    1
## 4          1   car    yes    0    10    180    30     35    1
## 5          2   air     no   64    58     68    68     30    2
## 6          2 train     no   44    31    354    84     30    2



The data are in long format (one row per available mode). We transform the data into the mlogit format:

# transform data
TM <- mlogit.data(TravelMode, choice = "choice",
                  shape = "long",
                  alt.levels = c("air", "train", "bus", "car"))



Suppose we want to estimate a multinomial logit model where
- the variables wait and vcost are alternative-specific variables with a generic coefficient $\beta$,
- income is an individual-specific variable with an alternative-specific coefficient $\gamma_j$, and
- the variable travel is an alternative-specific variable with an alternative-specific coefficient $\delta_j$.

f1 <- choice ~ wait + vcost | income | travel



By default, the alternative-specific constants (ASC) for each alternative are included. They can be omitted by adding +0 or -1 in the second part of the formula.

f2 <- choice ~ wait + vcost | income + 0 | travel
f2 <- choice ~ wait + vcost | income - 1 | travel



Some parts may be omitted when there is no ambiguity. For instance, a model with only individual-specific variables can be specified as follows:

f2 <- choice ~ 0 | income + size | 0
f2 <- choice ~ 0 | income + size | 1


Example

The dataset Train contains data from a stated preference survey in the Netherlands. Users are asked to choose between two train trips characterized by four attributes:

- price: the price in cents of guilders,
- time: travel time in minutes,
- change: the number of changes,
- comfort: the class of comfort, 0, 1 or 2; 0 being the most comfortable class.

data("Train", package = "mlogit")
Tr <- mlogit.data(Train, shape = "wide", choice = "choice",
                  varying = 4:11, sep = "", alt.levels = c(1, 2), id = "id")


Example

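We first convert price and time into more meaningful units, euros and hours. The official conversion rate is 2.20371 guilders per euro; the code below multiplies by this factor, and the estimates reported afterwards are based on that scaling:

Tr$price <- Tr$price / 100 * 2.20371
Tr$time <- Tr$time / 60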


Example

Now we estimate the model. Since both alternatives are virtually identical (train trips), it is appropriate to use only generic coefficients and to remove the ASC:

mnl <- gmnl(choice ~ price + time + change + comfort | -1,
            data = Tr,
            model = "mnl")


Example

summary(mnl)

##
## Model estimated on: Thu Oct 05 18:16:57 2017
##
## Call:
## gmnl(formula = choice ~ price + time + change + comfort | -1,
##     data = Tr, model = "mnl", method = "nr")
##
## Frequencies of categories:
##
##       1       2
## 0.50324 0.49676
##
## The estimation took: 0h:0m:0s
##
## Coefficients:
##           Estimate Std. Error  z-value  Pr(>|z|)
## price   -0.0673581  0.0033933 -19.8506 < 2.2e-16 ***
## time    -1.7205517  0.1603517 -10.7299 < 2.2e-16 ***
## change  -0.3263410  0.0594892  -5.4857 4.118e-08 ***
## comfort -0.9457257  0.0649455 -14.5618 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Optimization of log-likelihood by Newton-Raphson maximisation
## Log Likelihood: -1724.2
## Number of observations: 2929
## Number of iterations: 5
## Exit of MLE: gradient close to zero


Example

Coefficients are not directly interpretable, but dividing them by the price coefficient gives the willingness-to-pay (WTP) measures:

coef(mnl)[-1] / coef(mnl)[1]

##      time    change   comfort
## 25.543370  4.844869 14.040276

or we can use

wtp.gmnl(mnl, wrt = "price")

##
## Willigness-to-pay respect to: price
##
##         Estimate Std. Error t-value  Pr(>|t|)
## time    25.54337    2.09054 12.2185 < 2.2e-16 ***
## change   4.84487    0.84345  5.7441 9.241e-09 ***
## comfort 14.04028    0.88110 15.9349 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We obtain values of about 26 euros for an hour of travel, 5 euros for a change, and 14 euros for access to a more comfortable class.
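The ratio calculation is simple enough to reproduce by hand. The sketch below retypes the coefficient estimates printed in the summary output above (so they are rounded to the displayed digits) and divides each non-price coefficient by the price coefficient; the dictionary and variable names are illustrative only.

```python
# Coefficient estimates copied from the printed summary of the Train model.
coef = {
    "price": -0.0673581,
    "time": -1.7205517,
    "change": -0.3263410,
    "comfort": -0.9457257,
}

# WTP for attribute k: ratio of its coefficient to the price coefficient.
wtp = {name: value / coef["price"] for name, value in coef.items() if name != "price"}

for name, value in wtp.items():
    print(f"{name:8s} {value:10.4f}")
```

Because the inputs are rounded, the ratios agree with the printed WTP values only up to the last few reported digits. The standard errors in the wtp.gmnl output cannot be recovered this way, since they also depend on the covariance matrix of the estimates.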


Example

Now we use the Fishing data set.

data("Fishing", package = "mlogit")
head(Fishing, 3)

##      mode price.beach price.pier price.boat price.charter catch.beach
## 1 charter     157.930    157.930    157.930       182.930      0.0678
## 2 charter      15.114     15.114     10.534        34.534      0.1049
## 3    boat     161.874    161.874     24.334        59.334      0.5333
##   catch.pier catch.boat catch.charter   income
## 1     0.0503     0.2601        0.5391 7083.332
## 2     0.0451     0.1574        0.4671 1250.000
## 3     0.4522     0.2413        1.0266 3750.000

Fish <- mlogit.data(Fishing, shape = "wide",
                    varying = 2:9,
                    choice = "mode")

mnl.fish <- gmnl(mode ~ price | income | catch, data = Fish,
                 model = "mnl")


Example

summary(mnl.fish)

##
## Model estimated on: Thu Oct 05 18:16:58 2017
##
## Call:
## gmnl(formula = mode ~ price | income | catch, data = Fish, model = "mnl",
##     method = "nr")
##
## Frequencies of categories:
##
##   beach    boat charter    pier
## 0.11337 0.35364 0.38240 0.15059
##
## The estimation took: 0h:0m:1s
##
## Coefficients:
##                        Estimate  Std. Error  z-value  Pr(>|z|)
## boat:(intercept)     8.4184e-01  2.9996e-01   2.8065 0.0050080 **
## charter:(intercept)  2.1549e+00  2.9746e-01   7.2443 4.348e-13 ***
## pier:(intercept)     1.0430e+00  2.9535e-01   3.5315 0.0004132 ***
## price               -2.5281e-02  1.7551e-03 -14.4046 < 2.2e-16 ***
## boat:income          5.5428e-05  5.2130e-05   1.0633 0.2876613
## charter:income      -7.2337e-05  5.2557e-05  -1.3764 0.1687089
## pier:income         -1.3550e-04  5.1172e-05  -2.6480 0.0080977 **
## beach:catch          3.1177e+00  7.1305e-01   4.3724 1.229e-05 ***
## boat:catch           2.5425e+00  5.2274e-01   4.8638 1.152e-06 ***
## charter:catch        7.5949e-01  1.5420e-01   4.9254 8.417e-07 ***
## pier:catch           2.8512e+00  7.7464e-01   3.6807 0.0002326 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Optimization of log-likelihood by Newton-Raphson maximisation
## Log Likelihood: -1199.1
## Number of observations: 1182
## Number of iterations: 6
## Exit of MLE: successive function values within tolerance limit


Example

Predicted probabilities for the chosen outcome (the default), or for all alternatives if outcome = FALSE:

head(fitted(mnl.fish))

##   1.beach   2.beach   3.beach   4.beach   5.beach   6.beach
## 0.3114002 0.4537956 0.4567631 0.3701758 0.4763721 0.4216448

head(fitted(mnl.fish, outcome = FALSE))

##           beach      boat   charter       pier
## [1,] 0.09299770 0.5011740 0.3114002 0.09442818
## [2,] 0.09151069 0.2749292 0.4537956 0.17976449
## [3,] 0.01410359 0.4567631 0.5125571 0.01657626
## [4,] 0.17065866 0.1947959 0.2643696 0.37017583
## [5,] 0.02858216 0.4763721 0.4543225 0.04072325
## [6,] 0.01029792 0.5572462 0.4216448 0.01081103