dca disaggregate 1
7/28/2019 DCA Disaggregate 1
Discrete choice models have played an important role in transportation modeling for the last 25
years. They are notably used to provide a detailed representation of the complex aspects of
transportation demand, based on strong theoretical justifications. Moreover, several packages and
tools are available to help practitioners use these models in real applications, making
discrete choice models more and more popular.
Discrete choice models are powerful but complex. The art of finding the appropriate model for a
particular application requires from the analyst both a close familiarity with the reality under
interest and a strong understanding of the methodological and theoretical background of the
model.
The main theoretical aspects of discrete choice models are reviewed in this paper. The main
assumptions used to derive discrete choice models in general, and random utility models in
particular, are covered in detail. The Multinomial Logit Model, the Nested Logit Model and the
Generalized Extreme Value model are also discussed.
In the context of transportation demand analysis, disaggregate models have played an important
role over the last 25 years. These models consider that the demand is the result of several decisions
of each individual in the population under consideration. These decisions usually consist of a
choice made among a finite set of alternatives. An example of a sequence of choices in the context
of transportation demand is described in Figure 1: choice of an activity (play-yard), choice of
destination (6th street), choice of departure time (early), choice of transportation mode (bike) and
choice of itinerary (local streets). For this reason, discrete choice models have been extensively
used in this context.
Figure 1: A sequence of choices
A model, as a simplified description of reality, provides a better understanding of complex
systems. Moreover, it allows for predicting future states of the system under consideration,
controlling or influencing its behavior, and optimizing its performance.
The complex system under consideration here is a specific aspect of human behavior dedicated
to choice decisions. The complexity of this "system" clearly requires many simplifying
assumptions in order to obtain operational models. A specific model will correspond to a specific
set of assumptions, and it is important from a practical point of view to be aware of these
assumptions when prediction, control or optimization is performed.
The assumptions associated with discrete choice models in general are detailed in Section 2.
Section 3 focuses specifically on assumptions related to random utility models. Some of the most
used models, the Multinomial Logit Model (Section 4), the Nested Logit Model (Section 5) and
the Generalized Extreme Value Model (Section 6), are then introduced, with special emphasis on
the Nested Logit model.
Among the many publications that can be found in the literature, we refer the reader to
Ben-Akiva and Lerman (1985), Anderson, De Palma and Thisse (1992), Hensher and Johnson (1981)
and Horowitz, Koppelman and Lerman (1986) for more comprehensive developments.
In order to develop models capturing how individuals make choices, we have to make
specific assumptions. We will distinguish here among assumptions about
1. the decision-maker: these assumptions define who the decision-maker is, and what
his/her characteristics are;
2. the alternatives: these assumptions determine the possible options of the
decision-maker;
3. the attributes: these assumptions identify the attributes of each potential alternative that
the decision-maker takes into account to make his/her decision;
4. the decision rules: these assumptions describe the process used by the decision-maker to
reach his/her choice.
In order to narrow down the huge number of potential models, we will consider some of these
assumptions as fixed throughout the paper. This does not mean that no other assumption is
valid, but we cannot cover everything in this context. For example, although continuous
models are briefly described, discrete models are the primary focus of this paper.
Decision-maker
As mentioned in the introduction, choice models are referred to as disaggregate models. This means
that the decision-maker is assumed to be an individual. In general, for most practical
applications, this assumption is not restrictive. The concept of "individual" may easily be
extended, depending on the particular application. We may consider that a group of persons (a
household or a government, for example) is the decision-maker. In doing so, we decide to ignore
all internal decisions within the group, and to consider only the decision of the group as a whole.
The example described in Figure 1 reflects the decisions of a household, without accounting for
all potential negotiations among the parents and the children. We will refer to "decision-maker"
and "individual" interchangeably throughout the rest of the paper.
Because of its disaggregate nature, the model has to include the characteristics, or attributes, of
the individual. Many attributes, like age, gender, income, eye color or social security number,
may be considered in the model.
The analyst has to identify those that are likely to explain the choice of the individual. There is
no automatic process to perform this identification. The knowledge of the actual application and
the data availability play an important role in this process.
Alternatives
Analyzing the choice of an individual requires the knowledge of what has been chosen, but also
of what has not been chosen. Therefore, assumptions must be made about options, or
alternatives, that were considered by the individual to perform the choice. The set containing
these alternatives, called the choice set, must be characterized.
The characterization of the choice set depends on the context of the application. If we consider
the example described in Figure 2, the time spent on each Internet site may be anything, as long as
the total time is not more than two hours. The resulting choice set is represented in Figure 3,
and is defined by
$$\mathcal{C} = \left\{ (t_1, \ldots, t_n) \;:\; t_j \geq 0 \ \text{for all } j, \ \sum_{j=1}^{n} t_j \leq 2 \right\},$$
where $t_j$ denotes the time, in hours, spent on site $j$.
It is a typical example of a continuous choice set, where the alternatives are defined by some
constraints and cannot be enumerated.
Figure 2: Choice on Internet
Figure 3: Example of a continuous choice set
In this paper, we focus on discrete choice sets. A discrete choice set contains a finite number of
alternatives that can be explicitly listed. The corresponding choice models are called discrete
choice models. The choice of a transportation mode is a typical application leading to a discrete
choice set. In this context, the characterization of the choice set consists in the identification of
the list of alternatives. To perform this task, two concepts of choice set are considered: the
universal choice set and the reduced choice set.
The universal choice set contains all potential alternatives in the context of the application.
Considering the mode choice in the example of Figure 1, the universal choice set may contain all
potential transportation modes, like walk, bike, bus, car, etc. The alternative plane, which is
also a transportation mode, is clearly not an option in this context and, therefore, is not included
in the universal choice set.
The reduced choice set is the subset of the universal choice set considered by a particular
individual. Alternatives in the universal choice set that are not available to the individual under
consideration are excluded (for example, the alternative car may not be an option for individuals
without a driver's license). The awareness of the availability of the alternative by the
decision-maker should be considered as well. The reader is referred to Swait (1984) for more details on
choice set generation. In the following, "choice set" will refer to the reduced choice set, except
when explicitly mentioned.
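As an illustration of the distinction between the universal and the reduced choice set, the following minimal Python sketch filters a universal set of modes for one individual. The individual characteristics `has_license` and `knows_bus_line` are hypothetical names introduced here for the example; they are not part of the paper.

```python
# Universal choice set for the mode-choice example (modes taken from the text).
UNIVERSAL_CHOICE_SET = ["walk", "bike", "bus", "car"]


def reduced_choice_set(universal, has_license, knows_bus_line):
    """Keep only the alternatives that are available to, and known by, the individual.

    has_license and knows_bus_line are hypothetical individual characteristics.
    """
    reduced = []
    for mode in universal:
        if mode == "car" and not has_license:
            continue  # car is not available without a driver's license
        if mode == "bus" and not knows_bus_line:
            continue  # an alternative the individual is unaware of is not considered
        reduced.append(mode)
    return reduced


choice_set = reduced_choice_set(UNIVERSAL_CHOICE_SET, has_license=False, knows_bus_line=True)
print(choice_set)
```

In an actual application the availability rules would come from the survey data rather than from hard-coded conditions.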
Attributes
Each alternative in the choice set must be characterized by a set of attributes. Similarly to the
characterization of the decision-maker described in Section 2.1, the analyst has to identify the
attributes of each alternative that are likely to affect the choice of the individual. In the context
of a transportation mode choice, the list of attributes for the mode car could include the travel
time, the out-of-pocket cost and the comfort. The list for bus could include the travel time, the
out-of-pocket cost, the comfort and the bus frequency. Note that some attributes may be generic
to all alternatives, and some may be specific to an alternative (bus frequency is specific to bus).
Also, qualitative attributes, like comfort, may be considered.
An attribute is not necessarily a directly observed quantity. It can be any function of available
data. For example, instead of considering travel time as an attribute, the logarithm of the travel
time may be considered. The out-of-pocket cost may be replaced by the ratio between the
out-of-pocket cost and the income of the individual. The definition of attributes as a function of
available data depends on the problem. Several definitions must usually be tested to identify the
most appropriate.
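The two transformations mentioned above (logarithm of travel time, cost relative to income) can be sketched as follows. The function name, the argument names and the numerical values are assumptions made for this illustration only.

```python
import math


def build_attributes(travel_time_min, cost, income):
    """Derive model attributes from raw observations (hypothetical names and units)."""
    return {
        # logarithm of the travel time instead of the travel time itself
        "log_travel_time": math.log(travel_time_min),
        # out-of-pocket cost expressed relative to the income of the individual
        "cost_over_income": cost / income,
    }


attrs = build_attributes(travel_time_min=30.0, cost=2.5, income=50.0)
print(attrs)
```

Several such definitions would typically be estimated and compared before one is retained.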
Decision rules
At this point, we have identified and characterized both the decision-maker and all available
alternatives. We will now focus on the assumptions about the rules used by the decision-maker to
come up with the actual choice. Different sets of assumptions can be considered, leading to
different families of models. We will describe here three theories on decision rules, and the
corresponding models. The neoclassical economic theory, described in Section 2.4.1, introduces
the concept of utility. The Luce model (Section 2.4.2) and the random utility models (introduced
in Section 2.4.3 and developed in Section 3) are designed to capture uncertainty.
Neoclassical Economic Theory
The neoclassical economic theory assumes that each decision-maker is able to compare two
alternatives a and b in the choice set $\mathcal{C}$ using a preference-indifference operator $\succsim$. If $a \succsim b$, the
decision-maker either prefers a to b, or is indifferent. The preference-indifference operator is
supposed to have the following properties:
1. Reflexivity: $a \succsim a$ for all $a \in \mathcal{C}$.
2. Transitivity: if $a \succsim b$ and $b \succsim c$, then $a \succsim c$, for all $a, b, c \in \mathcal{C}$.
3. Comparability: $a \succsim b$ or $b \succsim a$, for all $a, b \in \mathcal{C}$.
Because the choice set is finite, the existence of an alternative which is preferred to all others
is guaranteed, that is
$$\exists\, a^* \in \mathcal{C} \ \text{such that} \ a^* \succsim a \quad \forall a \in \mathcal{C}. \qquad (2)$$
More interestingly, and because of the three properties listed above, it can be shown that the
existence of a function
$$U : \mathcal{C} \rightarrow \mathbb{R} : a \mapsto U(a) \qquad (3)$$
such that
$$a \succsim b \iff U(a) \geq U(b) \quad \forall a, b \in \mathcal{C}$$
is guaranteed. Therefore, the alternative $a^*$ defined in (2) may be identified as
$$a^* = \arg\max_{a \in \mathcal{C}} U(a).$$
It results that using the preference-indifference operator to make a choice is equivalent to
assigning a value, called utility, to each alternative, and selecting the alternative associated
with the highest utility.
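The equivalence between comparing alternatives pairwise and maximizing a utility can be sketched in a few lines of Python. The utility values below are hypothetical and serve only to illustrate the mechanism.

```python
# Hypothetical utility values for four transportation modes.
utilities = {"walk": 1.0, "bike": 2.5, "bus": 1.8, "car": 2.1}


def weakly_preferred(a, b, utility):
    """Preference-indifference relation induced by the utility: a is at least as good as b."""
    return utility[a] >= utility[b]


def best_alternative(utility):
    """Alternative with the highest utility (ties broken by dictionary order)."""
    return max(utility, key=utility.get)


best = best_alternative(utilities)
print(best)
# The selected alternative is weakly preferred to every alternative in the choice set.
print(all(weakly_preferred(best, a, utilities) for a in utilities))
```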
The concept of utility associated with the alternatives plays an important role in the context of
discrete choice models. However, the assumptions of neoclassical economic theory present
strong limitations for practical applications. Indeed, the complexity of human behavior suggests
that a choice model should explicitly capture some level of uncertainty. The neoclassical
economic theory fails to do so.
The exact source of uncertainty is an open question. Some models assume that the decision rules
are intrinsically stochastic, and even a complete knowledge of the problem would not overcome
the uncertainty. Others consider that the decision rules are deterministic, and motivate the
uncertainty from the impossibility of the analyst to observe and capture all dimensions of the
problem, due to its high complexity. Anderson et al. (1992) compare this debate with the one
between Einstein and Bohr, about the uncertainty principle in theoretical physics. Bohr argued
for the intrinsic stochasticity of nature and Einstein claimed that "Nature does not play dice".
Two families of models can be derived, depending on the assumptions about the source of
uncertainty. Models with stochastic decision rules, like the model proposed by Luce (1959),
described in Section 2.4.2, or the "elimination by aspects" approach, proposed by Tversky
(1972), assume a deterministic utility and a probabilistic decision process. Random Utility
Models, introduced in Section 2.4.3 and developed in Section 3, are based on the deterministic
decision rules from the neoclassical economic theory, where uncertainty is captured by random
variables representing utilities.
The Luce model
An important characteristic of models dealing with uncertainty is that, instead of identifying one
alternative as the chosen option, they assign to each alternative a probability to be chosen.
Luce (1959) proposed the choice axiom to characterize a choice probability law. The choice
axiom can be stated as follows.
Denoting by $P_{\mathcal{C}}(a)$ the probability of choosing a in the choice set $\mathcal{C}$, and by $P_{\mathcal{C}}(\mathcal{S})$ the probability of
choosing one element of the subset $\mathcal{S}$ within $\mathcal{C}$, the two following properties hold for any choice
sets $\mathcal{U}$, $\mathcal{C}$ and $\mathcal{S}$ such that $\mathcal{S} \subseteq \mathcal{C} \subseteq \mathcal{U}$.
1. If an alternative $a \in \mathcal{S}$ is dominated, that is if there exists $b \in \mathcal{S}$ such that b is always
preferred to a or, equivalently, $P_{\{a,b\}}(a) = 0$, then removing a from $\mathcal{S}$ does not modify
the probability of any other alternative to be chosen, that is
$$P_{\mathcal{C}}(\mathcal{S}) = P_{\mathcal{C} \setminus \{a\}}(\mathcal{S} \setminus \{a\}). \qquad (6)$$
2. If no alternative is dominated, that is if $0 < P_{\{a,b\}}(a) < 1$ for all $a, b \in \mathcal{S}$, then the
choice probability is independent from the sequence of decisions, that is
$$P_{\mathcal{C}}(a) = P_{\mathcal{C}}(\mathcal{S}) \, P_{\mathcal{S}}(a). \qquad (7)$$
The independence described by (7) can be illustrated using an example of transportation mode
choice, where we consider $\mathcal{C} = \{\text{Car}, \text{Bike}, \text{Bus}\}$. We apply two different assumptions to
compute the probability of choosing "car" as a transportation mode.
1. The decision-maker may decide first to use a motorized mode (car or bus, in this case).
The probability of choosing "car" is then given by
$$P_{\{\text{car},\text{bike},\text{bus}\}}(\text{car}) = P_{\{\text{car},\text{bike},\text{bus}\}}(\{\text{car},\text{bus}\}) \, P_{\{\text{car},\text{bus}\}}(\text{car}).$$
2. Alternatively, the decision-maker may decide first to use a private transportation mode
(car or bike, in this case). The probability of choosing "car" is then given by
$$P_{\{\text{car},\text{bike},\text{bus}\}}(\text{car}) = P_{\{\text{car},\text{bike},\text{bus}\}}(\{\text{car},\text{bike}\}) \, P_{\{\text{car},\text{bike}\}}(\text{car}).$$
Equation (7) of the choice axiom imposes that both assumptions produce the same probability,
that is
$$P_{\{\text{car},\text{bike},\text{bus}\}}(\{\text{car},\text{bus}\}) \, P_{\{\text{car},\text{bus}\}}(\text{car}) = P_{\{\text{car},\text{bike},\text{bus}\}}(\{\text{car},\text{bike}\}) \, P_{\{\text{car},\text{bike}\}}(\text{car}).$$
The second part of the choice axiom can be interpreted in a different way. Luce (1959) has
shown that (7) is a necessary and sufficient condition for the existence of a function
$v : \mathcal{C} \rightarrow \mathbb{R}^{+}$ such that, for all $\mathcal{S} \subseteq \mathcal{C}$, we have
$$P_{\mathcal{S}}(a) = \frac{v(a)}{\sum_{b \in \mathcal{S}} v(b)}. \qquad (11)$$
Also, the function v is unique up to a proportionality factor. If there exists $v' : \mathcal{C} \rightarrow \mathbb{R}^{+}$ verifying (11), then
$$v'(a) = k \, v(a) \quad \forall a \in \mathcal{C},$$
where $k \in \mathbb{R}$, $k > 0$. Similarly to (3), $v$ may be interpreted as a utility function. We will elaborate
more on this result in Section 4.
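The path independence imposed by the choice axiom can be checked numerically on the car, bike and bus example. The sketch below assumes the ratio form of the Luce model, in which the probability of a within a subset is v(a) divided by the sum of v over that subset; the values of v are hypothetical.

```python
def luce_probability(a, subset, v):
    """Probability of choosing a within subset: v(a) divided by the sum of v over the subset."""
    return v[a] / sum(v[b] for b in subset)


def subset_probability(subset, choice_set, v):
    """Probability of choosing any element of subset within choice_set."""
    return sum(luce_probability(a, choice_set, v) for a in subset)


v = {"car": 3.0, "bike": 1.0, "bus": 2.0}  # hypothetical positive values
C = ["car", "bike", "bus"]

direct = luce_probability("car", C, v)
# First choose a motorized mode, then car within {car, bus}.
via_motorized = subset_probability(["car", "bus"], C, v) * luce_probability("car", ["car", "bus"], v)
# First choose a private mode, then car within {car, bike}.
via_private = subset_probability(["car", "bike"], C, v) * luce_probability("car", ["car", "bike"], v)
print(direct, via_motorized, via_private)

# Multiplying v by a positive constant leaves the probabilities unchanged.
v_scaled = {a: 10.0 * value for a, value in v.items()}
print(luce_probability("car", C, v_scaled))
```

Both decision sequences give the same probability for "car", and rescaling v does not change it, which is the uniqueness up to a proportionality factor.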
Random Utility Models
Random utility models assume, as does neoclassical economic theory, that the decision-maker has a
perfect discrimination capability. In this context, however, the analyst is supposed to have
incomplete information and, therefore, uncertainty must be taken into account. Manski (1997)
identifies four different sources of uncertainty: unobserved alternative attributes, unobserved
individual attributes (called "unobserved taste variations" by Manski, 1997), measurement errors,
and proxy, or instrumental, variables.
The utility is modeled as a random variable in order to reflect this uncertainty. More specifically,
the utility that individual i associates with alternative a is given by
$$U_a^i = V_a^i + \varepsilon_a^i, \qquad (13)$$
where $V_a^i$ is the deterministic part of the utility, and $\varepsilon_a^i$ is the stochastic part, capturing the
uncertainty. As in the neoclassical economic theory, the alternative with the highest utility
is supposed to be chosen. Therefore, the probability that alternative a is chosen by decision-maker
i within choice set $\mathcal{C}$ is
$$P_{\mathcal{C}}^{i}(a) = \Pr\left[ U_a^i \geq U_b^i \quad \forall b \in \mathcal{C} \right].$$
Random utility models are the most used discrete choice models for transportation applications.
Therefore, the rest of the paper is devoted to them.
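The definition of the choice probability in a random utility model can be illustrated by simulation: draw the random terms, add them to the deterministic utilities, and count how often each alternative has the highest utility. The sketch below is only an illustration; the deterministic utilities and the choice of independent standard normal errors are assumptions, not part of the paper.

```python
import random


def simulate_choice_probabilities(V, draw_error, n_draws=20000, seed=0):
    """Estimate choice probabilities as the share of draws in which each utility is the highest."""
    rng = random.Random(seed)
    counts = {a: 0 for a in V}
    for _ in range(n_draws):
        # Random utility = deterministic part + random part.
        U = {a: V[a] + draw_error(rng) for a in V}
        chosen = max(U, key=U.get)
        counts[chosen] += 1
    return {a: counts[a] / n_draws for a in V}


# Hypothetical deterministic utilities; errors assumed independent standard normal.
V = {"car": 0.5, "bus": 0.0, "bike": -0.5}
probs = simulate_choice_probabilities(V, lambda rng: rng.gauss(0.0, 1.0))
print(probs)
```

Replacing the error generator by another distribution changes the resulting model, which is the subject of the next section.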
Random utility models
The derivation of random utility models is based on a specification of the utility as defined by
(13). Different assumptions about the random term and the deterministic term will produce
specific models. We present here the most common assumptions that are used in practice. In
Section 3.1, common assumptions about the random part of the utility are discussed. The
deterministic part is treated in Section 3.2.
Assumptions on the random term
We will focus here on assumptions about the mean, the variance and the functional form of the
random term.
Figure 4: A binary model
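For all practical purposes, the mean of the random term is usually supposed to be zero. It can be
shown that this assumption is not restrictive. We do it here on a simple example. Considering the
example described in Figure 4, we denote the mean of the error term of each alternative by
$m_1 = E[\varepsilon_1]$ and $m_2 = E[\varepsilon_2]$, respectively. Then, the error terms can be specified as
$$\varepsilon_1 = m_1 + \varepsilon_1'$$
and
$$\varepsilon_2 = m_2 + \varepsilon_2',$$
where $\varepsilon_1'$ and $\varepsilon_2'$ are random variables with zero mean. Therefore,
$$P_{\{1,2\}}(1) = \Pr[V_1 + \varepsilon_1 \geq V_2 + \varepsilon_2] = \Pr[V_1 + m_1 + \varepsilon_1' \geq V_2 + m_2 + \varepsilon_2']. \qquad (17)$$
The terms $m_1$ and $m_2$, called Alternative Specific Constants (ASC), capture the mean of
the error term. Therefore, it can be assumed without loss of generality that the error terms have
zero mean if the model specification includes these ASCs.
In practice, it is impossible to estimate the value of all ASCs from observed data. Considering
again the example of Figure 4, the probability of choosing alternative 1, say, is not modified if an
arbitrary constant K is added to both utilities. Therefore, only the difference between the two
ASCs can be identified. Indeed, from (17), we have
$$P_{\{1,2\}}(1) = \Pr[V_1 + m_1 + K + \varepsilon_1' \geq V_2 + m_2 + K + \varepsilon_2']$$
for any $K \in \mathbb{R}$. If $K = -m_1$, we obtain
$$P_{\{1,2\}}(1) = \Pr[V_1 + \varepsilon_1' \geq V_2 + (m_2 - m_1) + \varepsilon_2']$$
or, equivalently, defining $M = m_2 - m_1$,
$$P_{\{1,2\}}(1) = \Pr[V_1 + \varepsilon_1' \geq V_2 + M + \varepsilon_2'].$$
Defining $K = -m_2$ produces the same result, with the constant $m_1 - m_2$ now attached to
alternative 1. This property can be generalized easily to models
with more than two alternatives, where only differences between ASCs can be identified.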
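The fact that only the difference between the two constants matters can be checked empirically. The sketch below evaluates the choice probability on a shared set of error draws, once with both constants and once with a single constant equal to their difference. The numerical values and the normal error draws are assumptions chosen for the illustration.

```python
import random


def empirical_probability(V1, V2, errors):
    """Share of error draws for which alternative 1 has the highest utility."""
    wins = sum(1 for e1, e2 in errors if V1 + e1 >= V2 + e2)
    return wins / len(errors)


rng = random.Random(1)
# Zero-mean error terms (assumed standard normal for this sketch).
errors = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(10000)]

V1, V2 = 1.0, 0.4      # hypothetical deterministic utilities
asc1, asc2 = 0.7, 0.2  # hypothetical alternative specific constants

# Both constants included.
p_both = empirical_probability(V1 + asc1, V2 + asc2, errors)
# Adding K = -asc1 to both utilities: only the difference asc2 - asc1 remains.
p_diff = empirical_probability(V1, V2 + (asc2 - asc1), errors)
print(p_both, p_diff)
```

Because the same draws are used in both evaluations, the two estimates coincide up to floating-point rounding in the comparisons.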
It is common practice to constrain one ASC in the model to zero. From a modeling viewpoint,
the choice of the particular alternative whose ASC is constrained is purely arbitrary. However,
Bierlaire, Lotan and Toint (1997) have shown that the estimation process is influenced by this
choice. They propose a different technique of ASC specification which is optimal from an
estimation perspective.
To derive assumptions about the variance of the random term, we observe that the scale of the
utility may be arbitrarily specified. Indeed, for any $\alpha > 0$, we have
$$\Pr[U_1 \geq U_2] = \Pr[\alpha U_1 \geq \alpha U_2].$$
The arbitrary decision about $\alpha$ is equivalent to assuming a particular variance v of the distribution
of the error term. Indeed, if
$$\mathrm{Var}(\varepsilon) = v,$$
we have also
$$\mathrm{Var}(\alpha \varepsilon) = \alpha^2 v. \qquad (21)$$
We will illustrate this relationship with several examples in the remainder of this section.
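Once assumptions about the mean and the variance of the error term distribution have been
defined, the focus is now on the actual functional form of this distribution. We will consider here
three different distributions yielding three different families of models: linear, probit and logit
models.
The linear model is obtained from the assumption that the density function of the error term
(in the binary example, the difference $\varepsilon = \varepsilon_2 - \varepsilon_1$) is given by
$$f(x) = \begin{cases} \dfrac{1}{2L} & \text{if } x \in [-L, L], \\ 0 & \text{otherwise,} \end{cases}$$
where $L \in \mathbb{R}$, $L > 0$, is an arbitrary constant. This density function is used to derive the
probability of choosing one particular alternative. Considering the example presented in
Figure 4, the probability is given by (23) (see Figure 5).
$$P_{\{1,2\}}(1) = \begin{cases} 0 & \text{if } V_1 - V_2 < -L, \\ \dfrac{V_1 - V_2 + L}{2L} & \text{if } -L \leq V_1 - V_2 \leq L, \\ 1 & \text{if } V_1 - V_2 > L. \end{cases} \qquad (23)$$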
Figure 5: Linear model
The linear model presents some problems for real applications. First, the probability associated
with extreme values of the error term ($|\varepsilon_2 - \varepsilon_1| > L$ in the example) is exactly zero. Therefore, if any extreme
event happens in reality, the model will never capture it. Second, the discontinuity of the
derivatives at -L and L causes problems for most estimation procedures. We conclude the
presentation of the linear model by emphasizing that the constant L determines the scale of the
distribution. For the binary example, $\mathrm{Var}(\varepsilon_2 - \varepsilon_1) = L^2/3$. Using (21), we have that
assuming $\mathrm{Var}(\alpha(\varepsilon_2 - \varepsilon_1)) = 1$ is equivalent to assuming $\alpha = \sqrt{3}/L$. A common value for
L is 1/2, that is $\alpha = 2\sqrt{3}$.
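The choice probability of the binary linear model is a piecewise linear function of the utility difference. The following sketch assumes that the difference of the two error terms is uniformly distributed on [-L, L], which is the usual reading of the linear probability model; the function name is introduced here for illustration.

```python
def linear_probability(V1, V2, L=0.5):
    """Probability of choosing alternative 1 when eps2 - eps1 is uniform on [-L, L] (assumption)."""
    d = V1 - V2
    if d <= -L:
        return 0.0  # alternative 1 is never chosen
    if d >= L:
        return 1.0  # alternative 1 is always chosen
    return (d + L) / (2.0 * L)


print(linear_probability(0.0, 0.0))   # equal utilities
print(linear_probability(0.25, 0.0))  # inside the linear range
print(linear_probability(1.0, 0.0))   # beyond the range
```

The kinks at -L and L are the points where the derivative is discontinuous.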
The Normal Probability Unit, or Probit, model is derived from the assumption that the error
terms are normally distributed, that is
$$f(x) = \frac{1}{\sigma \sqrt{2\pi}} \, e^{-\frac{1}{2} \left( \frac{x}{\sigma} \right)^2},$$
where $\sigma \in \mathbb{R}$, $\sigma > 0$, is an arbitrary constant. This density function is used to derive the
probability of choosing one particular alternative. Considering the example presented in
Figure 4, and assuming that $\varepsilon_1$ and $\varepsilon_2$ are normally distributed with zero mean, variances $\sigma_1^2$ and
$\sigma_2^2$ respectively, and covariance $\sigma_{12}$, the probability is given by (25) (see Figure 6).
$$P_{\{1,2\}}(1) = \frac{1}{\sigma \sqrt{2\pi}} \int_{-\infty}^{V_1 - V_2} e^{-\frac{1}{2} \left( \frac{x}{\sigma} \right)^2} dx, \qquad (25)$$
where $\sigma^2 = \sigma_1^2 + \sigma_2^2 - 2\sigma_{12}$ is the variance of $\varepsilon_2 - \varepsilon_1$.
Figure 6: Probit model
The probit model is motivated by the Central Limit Theorem, assuming that the error terms
are the sum of independent unobserved quantities. Unfortunately, the probability function (25)
has no closed analytical form, which limits practical use of this model. We refer the reader to
Daganzo (1979) for a comprehensive development of probit models. We conclude this short
introduction of the probit model by looking at the scale parameter. Considering again the binary
example presented in Figure 4 in the probit context, we have $\mathrm{Var}(\varepsilon_2 - \varepsilon_1) = \sigma^2$. Using (21),
we have that assuming $\mathrm{Var}(\alpha(\varepsilon_2 - \varepsilon_1)) = 1$ is equivalent to assuming $\alpha = 1/\sigma$. It is
common practice to arbitrarily define $\sigma = 1$, that is $\alpha = 1$.
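Although the binary probit probability has no closed form, it can be evaluated numerically through the error function. The sketch below uses `math.erf` from the Python standard library; the default variances and covariance are hypothetical values chosen so that the variance of the error difference equals 1.

```python
import math


def standard_normal_cdf(x):
    """Cumulative distribution function of the standard normal law, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def binary_probit_probability(V1, V2, var1=0.5, var2=0.5, cov12=0.0):
    """Probability of choosing alternative 1 in a binary probit model.

    The variance of eps2 - eps1 is var1 + var2 - 2 * cov12 (default values are hypothetical).
    """
    sigma = math.sqrt(var1 + var2 - 2.0 * cov12)
    return standard_normal_cdf((V1 - V2) / sigma)


print(binary_probit_probability(0.0, 0.0))
print(binary_probit_probability(1.0, 0.0))
```

With more than two alternatives the probability involves a multidimensional integral, for which no such shortcut is available.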
Despite its complexity, the probit model has been applied to many practical problems (see
Whynes, Reed and Newbold, 1996, Bolduc, Fortin and Fournier, 1996, Yai, Iwakura and
Morichi, 1997 among recent publications). However, the most widely used model in practical
applications is probably the Logistic Probability Unit, or Logit, model. The error terms are now
assumed to be independent and identically Gumbel distributed. The density function of the
Gumbel distribution is given by (26) (see Figure 7).
$$f(x) = \mu \, e^{-\mu (x - \eta)} \exp\left( -e^{-\mu (x - \eta)} \right), \qquad (26)$$
where $\eta$ is the location parameter, and $\mu$ is the scale parameter.
Figure 7: Gumbel distribution
The mean of the Gumbel distribution is
$$\eta + \frac{\gamma}{\mu},$$
where
$$\gamma = \lim_{k \rightarrow \infty} \left( \sum_{i=1}^{k} \frac{1}{i} - \ln k \right) \approx 0.5772$$
is the Euler constant. The variance is
$$\frac{\pi^2}{6 \mu^2}.$$
The Gumbel distribution is an approximation of the Normal law, as shown in Figure 8, where the
solid line represents the Normal distribution, and the dotted line the Gumbel distribution.
Figure 8: Comparison between Normal and Gumbel distribution
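The mean and the variance of the Gumbel distribution can be checked by simulation. The sketch below assumes the standard parameterization with location eta and scale mu, for which the cumulative distribution function is exp(-exp(-mu (x - eta))), and draws samples by inverting it. The parameter values are arbitrary.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329  # Euler constant


def gumbel_draw(rng, eta=0.0, mu=1.0):
    """Draw from a Gumbel distribution by inverting its cumulative distribution function."""
    u = rng.random()
    while u == 0.0:  # avoid log(0)
        u = rng.random()
    return eta - math.log(-math.log(u)) / mu


rng = random.Random(42)
eta, mu = 1.0, 2.0  # arbitrary location and scale
draws = [gumbel_draw(rng, eta, mu) for _ in range(100000)]

sample_mean = sum(draws) / len(draws)
sample_var = sum((x - sample_mean) ** 2 for x in draws) / len(draws)
theoretical_mean = eta + EULER_GAMMA / mu
theoretical_var = math.pi ** 2 / (6.0 * mu ** 2)
print(sample_mean, theoretical_mean)
print(sample_var, theoretical_var)
```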
We derive the probability function for the binary example of Figure 4 from the following
property of the Gumbel distribution. If $\varepsilon_1$ is Gumbel distributed with location parameter $\eta_1$ and
scale parameter $\mu$, and $\varepsilon_2$ is Gumbel distributed with location parameter $\eta_2$ and scale parameter
$\mu$, then $\varepsilon = \varepsilon_2 - \varepsilon_1$ follows a Logistic distribution with location parameter $\eta_2 - \eta_1$ and scale
parameter $\mu$ (the name of the Logit model comes from this property). The density function of the
Logistic distribution (with zero location parameter) is given by
$$f(x) = \frac{\mu \, e^{-\mu x}}{\left( 1 + e^{-\mu x} \right)^2},$$
where $\mu$ is the scale parameter. As a consequence, we have
$$P_{\{1,2\}}(1) = \frac{1}{1 + e^{-\mu (V_1 - V_2)}}$$
or, equivalently,
$$P_{\{1,2\}}(1) = \frac{e^{\mu V_1}}{e^{\mu V_1} + e^{\mu V_2}}.$$
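In order to determine the relationship between the scale parameter and the variance of the
distribution, we compute $\mathrm{Var}(\varepsilon_2 - \varepsilon_1) = \pi^2 / (3 \mu^2)$. Using (21),
we have that assuming $\mathrm{Var}(\alpha(\varepsilon_2 - \varepsilon_1)) = 1$ is equivalent to assuming $\alpha = \sqrt{3}\,\mu / \pi$. It is
common practice to arbitrarily define $\mu = 1$, that is $\alpha = \sqrt{3}/\pi$.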
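The binary logit probability can be written either as a logistic function of the utility difference or as a ratio of exponentials. The sketch below, with hypothetical function names and utility values, evaluates both forms and shows that they agree.

```python
import math


def binary_logit_probability(V1, V2, mu=1.0):
    """Logistic function of the utility difference V1 - V2, with scale parameter mu."""
    return 1.0 / (1.0 + math.exp(-mu * (V1 - V2)))


def binary_logit_ratio_form(V1, V2, mu=1.0):
    """Same probability written as a ratio of exponentials."""
    e1 = math.exp(mu * V1)
    e2 = math.exp(mu * V2)
    return e1 / (e1 + e2)


p_a = binary_logit_probability(1.2, 0.7)
p_b = binary_logit_ratio_form(1.2, 0.7)
print(p_a, p_b)
```

The first form is usually preferred numerically, since it only exponentiates the difference of the utilities.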
In most cases, the arbitrary decision about the scale parameter does not matter and can be safely
ignored. But it is important not to completely forget its existence. Indeed, it may sometimes play
an important role. For example, utilities derived from different models can be compared only if
the value of $\alpha$ is the same for all of them. It is usually not the case with the scale parameters
commonly used in practice, as shown in Table 1. For instance, a utility estimated with a logit model
has to be divided by $\pi/\sqrt{3}$ before being compared with a utility estimated with a probit model.
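The effect of the rescaling can be illustrated numerically. The sketch below assumes a logit model with scale parameter 1 and a probit model with unit variance of the error difference, and compares the two binary choice probabilities over a grid of utility differences, with and without dividing the logit utility by pi/sqrt(3). The grid and the function names are choices made for this illustration.

```python
import math


def logistic_cdf(x):
    """Binary logit probability for a utility difference x (scale parameter 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def normal_cdf(x):
    """Binary probit probability for a utility difference x (unit variance)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


LOGIT_TO_PROBIT = math.pi / math.sqrt(3.0)  # standard deviation of the logistic law with scale 1

grid = [i / 100.0 for i in range(-400, 401)]  # utility differences on the logit scale
# Gap when the logit utility is divided by pi / sqrt(3) before entering the probit formula.
rescaled_gap = max(abs(logistic_cdf(d) - normal_cdf(d / LOGIT_TO_PROBIT)) for d in grid)
# Gap when the logit utility is used in the probit formula without rescaling.
raw_gap = max(abs(logistic_cdf(d) - normal_cdf(d)) for d in grid)
print(rescaled_gap, raw_gap)
```

After rescaling, the two models produce probabilities that stay within a few percentage points of each other on this grid, whereas the unscaled comparison differs much more.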
Table 1: Model comparison
The list of models presented above is not exhaustive. Other assumptions about the
distribution of the error term will lead to other families of models. For instance, Ben-Akiva and
Lerman (1985) cite the arctan and the truncated exponential models. These models are not often
used in practice and we will not consider them here.
Assumptions on the deterministic term
The utility of each alternative must be a function of the attributes of the alternative itself and of
the decision-maker identified in Sections2.1 and 2.3. We can write the deterministic part of theutility that individual i is associating with alternative a as
where $x_{ia}$ is a vector containing all attributes, both of individual i and alternative a. The function
defined in (33) is commonly assumed to be linear in the parameters, that is, if n attributes are
considered,
$$V_{ia}(x_{ia}) = \beta_1 x_{ia}(1) + \beta_2 x_{ia}(2) + \cdots + \beta_n x_{ia}(n)$$
where $\beta_1, \ldots, \beta_n$ are parameters to be estimated. This assumption simplifies the formulation
and the estimation of the model, and is not as restrictive as it may seem. Indeed, nonlinear effects
can still be captured in the definition of the attributes, as mentioned in Section 2.3.
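As an illustration of a utility that is linear in the parameters, the following sketch computes the deterministic utility as a weighted sum of attributes. The attribute names and parameter values are hypothetical. The logarithm of cost is entered as an attribute to show how a nonlinear effect can be captured while the function stays linear in the parameters.

```python
import math

def deterministic_utility(beta, x):
    """Linear-in-parameters utility: V = sum over k of beta[k] * x[k]."""
    if len(beta) != len(x):
        raise ValueError("beta and x must have the same length")
    return sum(b * xk for b, xk in zip(beta, x))

# Hypothetical attributes for one individual and the alternative "car":
# travel time in minutes, log of cost, and a car-ownership dummy.
beta = [-0.05, -0.8, 0.6]
x_car = [20.0, math.log(4.0), 1.0]
v_car = deterministic_utility(beta, x_car)
```

The parameters enter only as multipliers of the attributes, which is what "linear in the parameters" requires; the attributes themselves may be arbitrary transformations of the raw data.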
Multinomial logit model
As introduced in the previous section, the logit model is derived from the assumption that the
error terms of the utility functions are independent and identically Gumbel distributed. These
models were first introduced in the context of binary choice models, where the logistic distribution is used to derive the probability. Their generalization to more than two alternatives is
referred to as multinomial logit models.
If the error terms are independent and identically Gumbel distributed, with location parameter 0
and scale parameter $\mu$, the probability that a given individual chooses alternative i within $C$ is
given by
$$P(i|C) = \frac{e^{\mu V_i}}{\sum_{k \in C} e^{\mu V_k}} \qquad (35)$$
The derivation of this result is attributed to Holman and Marley by Luce and Suppes (1965). We
refer the reader to Ben-Akiva and Lerman (1985) and Anderson et al. (1992) for additional
details.
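The multinomial logit probability can be computed directly from the deterministic utilities. The sketch below assumes the usual closed form, exp(mu V_i) divided by the sum of exp(mu V_k) over the choice set; the utilities and alternative names are hypothetical. Subtracting the largest utility before exponentiating does not change the result and avoids numerical overflow.

```python
import math

def mnl_probabilities(utilities, mu=1.0):
    """Multinomial logit probabilities.

    utilities: dict mapping each alternative to its deterministic utility V.
    mu: scale parameter of the Gumbel error terms.
    """
    v_max = max(utilities.values())
    weights = {a: math.exp(mu * (v - v_max)) for a, v in utilities.items()}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}

# Hypothetical utilities for three transportation modes.
probs = mnl_probabilities({"car": -1.0, "bus": -1.5, "bike": -2.0})
equal = mnl_probabilities({"a": 0.3, "b": 0.3, "c": 0.3})
```

When all utilities are equal, every alternative receives the same probability, whatever the value of mu.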
It is interesting to note that the multinomial logit model can also be derived from the choice
axiom defined by (6) and (7). Indeed, defining and , we have that (11) is equivalent to (35).
An important property of the multinomial logit model is the Independence from Irrelevant
Alternatives (IIA). This property can be stated as follows: the ratio of the probabilities of any
two alternatives is independent of the choice set. That is, for any choice sets $C_1$ and $C_2$ such that
$C_1 \subseteq C_2 \subseteq C$, and for any alternatives i and j in $C_1$, we have
$$\frac{P(i|C_1)}{P(j|C_1)} = \frac{P(i|C_2)}{P(j|C_2)}$$
This result can be proven easily using (35). Ben-Akiva and Lerman (1985) propose an equivalent definition: the ratio of the choice probabilities of any two alternatives is entirely unaffected by
the systematic utilities of any other alternatives.
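The IIA property can be checked numerically. In the sketch below, which uses hypothetical utilities, the ratio of the probabilities of two alternatives is computed once with a third alternative in the choice set and once without it. Under the multinomial logit model both ratios equal exp(mu (V_i - V_j)).

```python
import math

def mnl(utilities, mu=1.0):
    """Multinomial logit probabilities for a dict of deterministic utilities."""
    weights = {a: math.exp(mu * v) for a, v in utilities.items()}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}

full_set = {"car": -1.0, "bus": -1.5, "bike": -2.0}
# Same utilities, with the alternative "bike" removed from the choice set.
reduced_set = {a: v for a, v in full_set.items() if a != "bike"}

p_full = mnl(full_set)
p_reduced = mnl(reduced_set)
ratio_full = p_full["car"] / p_full["bus"]
ratio_reduced = p_reduced["car"] / p_reduced["bus"]
```

The individual probabilities change when the choice set changes, but their ratio does not.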
The IIA property of multinomial logit models is a limitation for some practical applications. This
limitation is often illustrated by the red bus/blue bus paradox (see, for example, Ben-Akiva and Lerman, 1985) in the modal choice context. We prefer here the path choice example presented in
Figure 9.
Figure 9: A path choice example
The probabilities provided by the multinomial logit model (35) for this example are
$$P(1|\{1,2a,2b\}) = P(2a|\{1,2a,2b\}) = P(2b|\{1,2a,2b\}) = \frac{1}{3} \qquad (37)$$
which is not consistent with the intuitive result. This situation appears in choice problems with
significantly correlated alternatives, as is clearly the case in the example. Indeed, alternatives 2a and 2b are so similar that their utilities share many unobserved attributes of the path and,
therefore, the assumption of independence of the random parts of these utilities is not valid in this context.
The Nested Logit Model, presented in the next section, partly overcomes this limitation of the
multinomial logit model.
Nested logit model
The nested logit model, first derived by Ben-Akiva (1973), is an extension of the multinomial
logit model designed to capture correlations among alternatives. It is based on the partitioning of
the choice set $C$ into several nests $C_k$ such that
$$C = \bigcup_k C_k$$
and
$$C_k \cap C_l = \emptyset \quad \forall k \neq l.$$
The utility function of each alternative is composed of a term specific to the alternative, and a
term associated with the nest. If $i \in C_k$, we have
$$U_i = V_i + \varepsilon_i + V_{C_k} + \varepsilon_{C_k}.$$
The error terms $\varepsilon_i$ and $\varepsilon_{C_k}$ are supposed to be independent. As for the multinomial logit model,
the error terms $\varepsilon_i$ are supposed to be independent and identically Gumbel distributed, with scale
parameter $\sigma_k$. The distribution of $\varepsilon_{C_k}$ is such that the random variable $\max_{j \in C_k} U_j$ is Gumbel
distributed with scale parameter $\mu$.
Each nest within the choice set is associated with a pseudo-utility, called composite utility, expected maximum utility, inclusive value or accessibility in the literature. The composite utility
for nest $C_k$ is defined as
$$V'_{C_k} = V_{C_k} + \frac{1}{\sigma_k} \ln \sum_{j \in C_k} e^{\sigma_k V_j}$$
where $V_{C_k}$ is the component of the utility which is common to all alternatives in the nest $C_k$.
The probability model is then given by
$$P(i|C) = P(C_k|C)\, P(i|C_k)$$
where
$$P(C_k|C) = \frac{e^{\mu V'_{C_k}}}{\sum_l e^{\mu V'_{C_l}}}$$
and
$$P(i|C_k) = \frac{e^{\sigma_k V_i}}{\sum_{j \in C_k} e^{\sigma_k V_j}}.$$
The parameters $\mu$ and $\sigma_k$ reflect the correlation among alternatives in the nest $C_k$. Indeed, if
$i, j \in C_k$, we have
$$\mathrm{corr}(U_i, U_j) = 1 - \frac{\mu^2}{\sigma_k^2}.$$
Clearly, we have
$$0 \leq \frac{\mu}{\sigma_k} \leq 1. \qquad (46)$$
Ben-Akiva and Lerman (1985) derive condition (46) directly from utility theory. Note also that
if $\mu/\sigma_k = 1$, we have $\mathrm{corr}(U_i, U_j) = 0$.
The parameters $\mu$ and $\sigma_k$ are closely related in the model. Actually, only their ratio is meaningful. It is not possible to identify them separately. A common practice is to arbitrarily constrain one of
them to a value (usually 1). The impacts of this arbitrary decision on the model are briefly
discussed in Section 5.1. We illustrate here the Nested Logit Model with the path choice example
described in Figure 9. First, the choice set is divided into and
. The deterministic components of the utilities are , ,
and . The composite utilities of each nest are
and
The probability of choosing each nest is then
and
where the value of $\mu$ has been assumed to be 1, without loss of generality. The probability of each alternative is then computed. We obtain
and
The probabilities of alternatives 1, 2a and 2b as a function of $\sigma_2$ are plotted in Figure 10. From
(46), we have that $\sigma_2 \geq 1$ because $\mu$ has been arbitrarily defined as 1. We observe that,
when $\sigma_2 = 1$, the nested logit model produces the same results as the multinomial logit model
(37), and all probabilities are 1/3. On the other hand, when $\sigma_2$ goes to infinity, and $\mu/\sigma_2$ goes to 0,
the probability of each nest is closer and closer to 1/2. At the limit, the model becomes a binary choice model, where the small detours a and b are ignored in the choice process.
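The two limiting cases described above can be reproduced with a small nested logit sketch. It assumes the standard two-level formulation: a composite utility per nest equal to the nest utility plus the scaled log-sum of the alternatives in the nest, a logit model with scale mu across nests, and a logit model with scale sigma within each nest. For simplicity, and as an assumption of this sketch rather than the paper's exact specification, the three paths are given the same deterministic utility (0), and the nest of path 1 uses scale 1. The names nested_logit and path_example are hypothetical.

```python
import math

def nested_logit(nests, mu=1.0):
    """Two-level nested logit probabilities.

    nests: dict mapping a nest name to (sigma, v_nest, alternatives), where
    alternatives is a dict mapping each alternative to its utility V.
    Returns a dict mapping each alternative to its choice probability.
    """
    composite = {}
    for name, (sigma, v_nest, alts) in nests.items():
        logsum = math.log(sum(math.exp(sigma * v) for v in alts.values()))
        composite[name] = v_nest + logsum / sigma
    denom = sum(math.exp(mu * c) for c in composite.values())
    probs = {}
    for name, (sigma, v_nest, alts) in nests.items():
        p_nest = math.exp(mu * composite[name]) / denom
        within = sum(math.exp(sigma * v) for v in alts.values())
        for alt, v in alts.items():
            probs[alt] = p_nest * math.exp(sigma * v) / within
    return probs

def path_example(sigma_2):
    """Path choice example, normalized from the top (mu = 1).

    Assumption: all three paths have deterministic utility 0.
    """
    return nested_logit({
        "C1": (1.0, 0.0, {"1": 0.0}),
        "C2": (sigma_2, 0.0, {"2a": 0.0, "2b": 0.0}),
    }, mu=1.0)

p_mnl_like = path_example(1.0)      # no correlation within the nest
p_correlated = path_example(1000.0)  # near-perfect correlation within the nest
```

With sigma_2 = 1 the three paths each receive probability 1/3, as in the multinomial logit model. With a very large sigma_2 the probability of path 1 approaches 1/2 and the probabilities of paths 2a and 2b approach 1/4 each.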
Figure 10: Probability of each alternative as a function of $\sigma_2$.
Normalization of nested logit models
In order to compute the probabilities in the previous example, we have arbitrarily decided to
constrain $\mu$ to 1. Alternatively, we could have decided to constrain $\sigma_2$ to 1. It is easy to show
that, in this case, we have
and
which is equivalent to (51) and (52), replacing $\sigma_2$ by $1/\mu$.
A model where the scale parameter $\mu$ is arbitrarily constrained to 1 is said to be ``normalized
from the top''. A model where one of the parameters $\sigma_k$ is constrained to 1 is said to be
``normalized from the bottom''. The latter may produce a simpler formulation of the model. We illustrate it using the example of Figure 11.
Figure 11: A mode choice example
We have
and
If we impose , we can define , , and to obtain
the following expressions.
and
This formulation, proposed by Daly (1987), simplifies the estimation process. For this reason, it
has been adopted in estimation packages like ALOGIT (Daly, 1987) or HieLoW (Bierlaire, 1995,
Bierlaire and Vandevyvere, 1995).
We emphasize here that this formulation should be used with caution when the same parameters
are present in more than one nest. In this case, specific techniques, inspired by the artificial trees proposed by Bradley and Daly (1991), must be used to obtain a correct specification of the
model. The description of these techniques is beyond the scope of this paper.
A direct extension of the nested logit model consists in partitioning some or all nests into sub-nests, which can, in turn, be divided into sub-nests. Because of the complexity of these models,
their structure is usually represented as a tree, as suggested by Daly (1987). Clearly, the number of potential structures, reflecting the correlation among alternatives, can be very large. No
technique has been proposed thus far to identify the most appropriate correlation structure
directly from the data.
We conclude our introduction of nested logit models by mentioning their limitations. These models are designed to capture choice problems where alternatives within each nest are
correlated. No correlation across nests can be captured by the Nested Logit Model. When
alternatives cannot be partitioned into well separated nests to reflect their correlation, Nested
Logit Models are not applicable. This is the case for most route choice problems. Several models within the ``logit family'' have been designed to capture specific correlation structures. For
example, Cascetta (1996) captures overlapping paths in a route choice context using
commonality factors, Koppelman and Wen (1997) capture correlation between pairs of alternatives, and Vovsha (1997) proposes a cross-nested model allowing alternatives to belong to
more than one nest. The last two models are derived from the Generalized Extreme Value model,
presented in the next section.
Generalized extreme value model
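The Generalized Extreme Value (GEV) model was introduced by McFadden (1978) in the
context of residential location. This general model actually consists of a large family of models that are consistent with random utility theory. The probability of choosing alternative i within
$C$ is given by
$$P(i|C) = \frac{e^{V_i}\, \frac{\partial G}{\partial y_i}(e^{V_1}, \ldots, e^{V_J})}{\mu\, G(e^{V_1}, \ldots, e^{V_J})}$$
where $J$ is the number of alternatives in $C$ and $G: \mathbb{R}_+^J \to \mathbb{R}$ is a differentiable function with the following properties.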
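1. $G(y) \geq 0$ for all $y \in \mathbb{R}_+^J$,
2. G is homogeneous of degree $\mu > 0$, that is $G(\alpha y) = \alpha^\mu G(y)$, for all $\alpha > 0$ and $y \in \mathbb{R}_+^J$,
3. $\lim_{y_i \to +\infty} G(y_1, \ldots, y_i, \ldots, y_J) = +\infty$ for all i such that $i = 1, \ldots, J$, and
4. the kth partial derivative with respect to k distinct $y_i$ is non-negative if k is odd, and non-positive if k is even, that is, for any distinct $i_1, \ldots, i_k \in \{1, \ldots, J\}$ and for all $y \in \mathbb{R}_+^J$, we have
$$\frac{\partial^k G}{\partial y_{i_1} \cdots \partial y_{i_k}}(y) \geq 0 \text{ if } k \text{ is odd, and } \leq 0 \text{ if } k \text{ is even.}$$
As an example, we consider
$$G(y) = \sum_{i=1}^{J} y_i^\mu$$
which has the required properties, as can be easily verified. Then,
$$P(i|C) = \frac{e^{V_i}\, \mu\, e^{(\mu - 1) V_i}}{\mu \sum_{j=1}^{J} e^{\mu V_j}} = \frac{e^{\mu V_i}}{\sum_{j=1}^{J} e^{\mu V_j}}$$
which is the multinomial logit model. Similarly, the nested logit model can be derived with
$$G(y) = \sum_{m} \Big( \sum_{i \in C_m} y_i^{\sigma_m} \Big)^{\mu/\sigma_m}.$$
It can be shown that property 4 holds if $\mu/\sigma_m \leq 1$, which is consistent with condition (46).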
The Generalized Extreme Value model provides a nice theoretical framework for the development of new discrete choice models, such as those of Koppelman and Wen (1997) and Vovsha
(1997).
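The link between the GEV framework and the multinomial logit model can be checked numerically. The sketch below assumes the usual GEV probability, P(i) = y_i (dG/dy_i)(y) / (mu G(y)) with y_i = exp(V_i), and approximates the partial derivative by a central finite difference. The generating function G(y) = sum of y_i^mu, the utilities and the value of mu are illustrative choices; the function name gev_probabilities is hypothetical.

```python
import math

def gev_probabilities(G, utilities, mu, rel_step=1e-6):
    """GEV choice probabilities with a finite-difference derivative of G.

    G: generating function taking a list of positive numbers y.
    utilities: list of deterministic utilities V_i.
    mu: degree of homogeneity of G.
    """
    y = [math.exp(v) for v in utilities]
    g_value = G(y)
    probs = []
    for i, yi in enumerate(y):
        step = rel_step * yi
        y_up = y[:i] + [yi + step] + y[i + 1:]
        y_down = y[:i] + [yi - step] + y[i + 1:]
        dG_dyi = (G(y_up) - G(y_down)) / (2.0 * step)
        probs.append(yi * dG_dyi / (mu * g_value))
    return probs

mu = 2.0

def G_logit(y):
    """Generating function leading to the multinomial logit model."""
    return sum(yi ** mu for yi in y)

V = [-1.0, -1.5, -2.0]
p_gev = gev_probabilities(G_logit, V, mu)

# Closed-form multinomial logit probabilities for comparison.
weights = [math.exp(mu * v) for v in V]
p_mnl = [w / sum(weights) for w in weights]
```

Up to the finite-difference error, p_gev and p_mnl coincide, which mirrors the analytical derivation of the multinomial logit model from this choice of G.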
Conclusion
We have covered in this paper the main theoretical aspects of discrete choice models in general, and random utility models in particular. A good awareness of underlying assumptions is
necessary for an efficient use of these models for practical applications. In particular, we have
focused on the location parameters and the scale parameters in multinomial and nested logit
models. Despite their importance, the role of these parameters tends to be underestimated by
practitioners. This may lead to incorrect specifications of the models, or incorrect interpretation
of the results.
Acknowledgments
This paper is based on a lecture given at the NATO Advanced Studies Institute Operations
Research and Decision Aid Methodologies in Traffic and Transportation Management, Balatonfured, Hungary, March 1997. Comments from the students and other lecturers of the ASI
have been very useful in writing this paper. Moreover, I am very grateful to Moshe Ben-Akiva and
John Bowman for their valuable discussions and comments.
References
1
Simon P. Anderson, André de Palma, and Jacques-François Thisse. Discrete Choice Theory of Product Differentiation. MIT Press, Cambridge, Ma, 1992.
2
M. E. Ben-Akiva. Structure of passenger travel demand models. PhD thesis, Department
of Civil Engineering, MIT, Cambridge, Ma, 1973.
3
M. E. Ben-Akiva and S. R. Lerman. Discrete Choice Analysis: Theory and Application to
Travel Demand. MIT Press, Cambridge, Ma, 1985.
4
Moshe Ben-Akiva and B. François. $\mu$ homogeneous generalized extreme value model.
Working paper, Department of Civil Engineering, MIT, Cambridge, Ma, 1983.
5
M. Bierlaire. A robust algorithm for the simultaneous estimation of hierarchical logit
models. GRT Report 95/3, Department of Mathematics, FUNDP, 1995.
6
M. Bierlaire, T. Lotan, and Ph. L. Toint. On the overspecification of multinomial and
nested logit models due to alternative specific constants. Transportation Science, 1997.
(forthcoming).
7
M. Bierlaire and Y. Vandevyvere. HieLoW: the interactive user's guide. Transportation
Research Group - FUNDP, Namur, 1995.
8
Denis Bolduc, Bernard Fortin, and Marc-Andre Fournier. The effect of incentive policies on the practice location of doctors: A multinomial probit analysis. Journal of Labor Economics, 14(4):703, 1996.
9
M. A. Bradley and A. J. Daly. Estimation of logit choice models using mixed stated
preferences and revealed preferences information. In Methods for understanding travel behaviour in the 1990's, pages 116-133, Québec, May 1991. International Association for
Travel Behaviour. 6th international conference on travel behaviour.
10
Ennio Cascetta. A modified logit route choice model overcoming path overlapping
problems. Specification and some calibration results for interurban networks. In Proceedings of the 13th International Symposium on the Theory of Road Traffic Flow
(Lyon, France), 1996.
11
C. F. Daganzo. Multinomial Probit: The theory and its application to demand
forecasting. Academic Press, New York, 1979.
12
A. Daly. Estimating ``tree'' logit models. Transportation Research B, 21(4):251-268,
1987.
13
D. A. Hensher and L. W. Johnson. Applied discrete choice modelling. Croom Helm, London, 1981.
14
J. L. Horowitz, F. S. Koppelman, and S. R. Lerman. A self-instructing course in
disaggregate mode choice modeling. Technology Sharing Program, US Department of Transportation, Washington, D.C. 20590, 1986.
15
F. S. Koppelman and Chieh-Hua Wen. The paired combinatorial logit model: properties,
estimation and application. Transportation Research Board, 76th Annual Meeting,
Washington DC, January 1997. Paper #970953.
16
R. Luce. Individual choice behavior: a theoretical analysis. J. Wiley and Sons, New
York, 1959.
17
R. D. Luce and P. Suppes. Preference, utility and subjective probability. In R. D. Luce,
R. R. Bush, and E. Galanter, editors, Handbook of Mathematical Psychology, New York,
1965. J. Wiley and Sons.
18
C. Manski. The structure of random utility models. Theory and Decision, 8:229-254,
1977.
19
Andrey Andreyevich Markov. Calculation of probabilities. Tip. Imperatorskoi Akademii
Nauk, Saint Petersburg, 1900. (in Russian).
20
D. McFadden. Modelling the choice of residential location. In A. Karlquist et al., editor,
Spatial interaction theory and residential location, pages 75-96, Amsterdam, 1978.
North-Holland.
21
J. Swait. Probabilistic choice set formation in transportation demand models. PhD thesis,
Department of Civil and Environmental Engineering, Massachusetts Institute of Technology, Cambridge, Ma, 1984.
22
A. Tversky. Elimination by aspects: a theory of choice. Psychological Review, 79:281-299, 1972.
23
Peter Vovsha. Cross-nested logit model: an application to mode choice in the Tel-Aviv
metropolitan area. Transportation Research Board, 76th Annual Meeting, Washington DC, January 1997. Paper #970387.
24
D. K. Whynes, G. Reed, and P. Newbold. General practitioners' choice of referral destination: A probit analysis. Managerial and Decision Economics, 17(6):587, 1996.
25
T. Yai, S. Iwakura, and S. Morichi. Multinomial probit with structured covariance for route choice behavior. Transportation Research B, 31(3):195-208, June 1997.
Chapter 5
Discrete Dependent Variable Models
CHAPTER 5; SECTION A: LOGIT, NESTED LOGIT, & PROBIT
Purpose of Logit, Nested Logit, and Probit Models:
Logit, Nested Logit, and Probit models are used to model a relationship between a dependent
variable Y and one or more independent variables X. The dependent variable, Y, is a discrete
variable that represents a choice, or category, from a set of mutually exclusive choices or
categories. For instance, an analyst may wish to model the choice of automobile purchase (from a set of vehicle classes), the choice of travel mode (walk, transit, rail, auto, etc.), the manner of an
automobile collision (rollover, rear-end, sideswipe, etc.), or residential location choice (high-density,
suburban, exurban, etc.). The independent variables are presumed to affect the choice or category
or the choice maker, and represent a priori beliefs about the causal or associative elements
important in the choice or classification process. In the case of ordinal scale variables, an ordered
logit or probit model can be applied to take advantage of the additional information provided by the
ordinal over the nominal scale (not discussed here).
Examples: An analyst wants to model:
1. The effect of household member characteristics, transportation network characteristics, and alternative mode characteristics on choice of transportation mode: bus, walk, auto, carpool, single occupant auto, rail, or bicycle.
2. The effect of consumer characteristics on choice of vehicle purchase: sport utility vehicle, van, auto, light pickup truck, or motorcycle.
3. The effect of traveler characteristics and employment characteristics on airline carrier choice: Delta, United Airlines, Southwest, etc.
4. The effect of involved vehicle types, pre-crash conditions, and environmental factors on vehicle crash outcome: property damage only, mild injury, severe injury, fatality.
Basic Assumptions/Requirements of Logit, Nested Logit, and Probit Models:
1) The observations on dependent variable Y are assumed to have been randomly sampled
from the population of interest (even for stratified samples or choice-based samples).
2) Y is caused by or associated with the Xs, and the Xs are determined by influences
(variables) outside of the model.
3) There is uncertainty in the relation between Y and the Xs, as reflected by a scattering of
observations around the functional relationship.
4) The distribution of error terms must be assessed to determine if a selected model is
appropriate.
Inputs for Logit, Nested Logit, and Probit Models:
Discrete variable Y is the observed choice or classification, such as brand selection, transportation
mode selection, etc. For grouped data, where choices are observed for homogeneous experimental
units or observed multiple times per experimental unit, the dependent variable is the proportion of
choices observed.
One or more continuous and/or discrete variables X, which describe the attributes of the choice
maker or event and/or various attributes of the choices thought to be causal or influential in the
decision or classification process.
Outputs of Logit, Nested Logit, and Probit Models:
Functional form of relation between Y and Xs.
Strength of association between Y and Xs (individual Xs and collective set of Xs).
Proportion of choice or classification uncertainty explained by hypothesized relation.
Confidence in predictions of future/other observations on Y given X.
Logit, Nested Logit, and Probit Methodology:
Examples of Logit, Nested Logit, and Probit:
Pavements
Koehne, Jodi, Fred Mannering, and Mark Hallenbeck (1996). Analysis of Trucker and Motorist Opinions Toward Truck-lane Restrictions. Transportation Research Record #1560 pp. 73-82.
National Academy of Sciences.
Traffic
Mannering, Fred, Jodi Koehne and Soon-Gwan Kim (1995). Statistical Assessment of Public
Opinion Toward Conversion of General-Purpose Lanes to High-Occupancy Vehicle Lanes.
Transportation Research Record #1485 pp. 168-176. National Academy of Sciences.
Planning
Koppelman, Frank S., and Chieh-Hua Wen (1998). Nested Logit Models: Which Are You Using?
Transportation Research Record #1645 pp. 1-9. National Academy of Sciences.
Yai, Tetsuo, and Tetsuo Shimizu (1998). Multinomial Probit with Structured Covariance for Choice Situations with Similar Alternatives. Transportation Research Record #1645 pp. 69-75. National
Academy of Sciences.
McFadden, Daniel (1978). Modeling the Choice of Residential Location. Transportation Research
Record #673 pp. 72-77. National Academy of Sciences.
Horowitz, Joel L. (1984). Testing Disaggregate Travel Demand Models by Comparing Predicted
and Observed Market Shares. Transportation Research Record #976 pp. 1-7. National Academy
of Sciences.
Interpretation of Logit, Nested Logit, and Probit:
How is a choice model equation interpreted?
How do continuous and indicator variables differ in the choice model?
How are beta coefficients interpreted?
How is the Likelihood Ratio Test interpreted?
How are t-statistics interpreted?
How are phi and adjusted phi interpreted?
How are confidence intervals interpreted?
How are degrees of freedom interpreted?
How are elasticities computed and interpreted?
When is the independence of irrelevant alternatives (IIA) assumption violated?
Troubleshooting: Logit, Nested Logit, and Probit:
Should interaction terms be included in the model?
How many variables should be included in the model?
What methods can be used to specify the relation between choice and the Xs?
What methods are available for fixing heteroscedastic errors?
What methods are used for fixing serially correlated errors?
What can be done to deal with multi-collinearity?
What is endogeneity and how can it be fixed?
How does one know if the errors are Gumbel distributed?
Logit, Nested Logit, and Probit References:
Ben Akiva, Moshe and Steven R. Lerman. Discrete Choice Analysis: Theory and
Application to Predict Travel Demand. The MIT Press, Cambridge MA. 1985.
Greene, William H. Econometric Analysis. MacMillan Publishing Company, New York,
New York. 1990.
Ortuzar, J. de D. and L. G. Willumsen. Modelling Transport. Second Edition. John Wiley
and Sons, New York, New York. 1994.
Train, Kenneth. Qualitative Choice Analysis: Theory, Econometrics, and an Application to
Automobile Demand. The MIT Press, Cambridge MA. 1993.
Logit, Nested Logit, and Probit Methodology:
Postulate mathematical models from theory and past research.
Discrete choice models (logit, nested logit, and probit) are used to develop models of behavioral
choice or of event classification. It is accepted a priori that the analyst doesn't know the complexity
of the underlying relationships, and that any model of reality will be wrong to some degree. Choice
models estimated will reflect the a priori assumptions of the modeler as to what factors affect the
decision process. Common applications of discrete choice models include choice of transportation
mode, choice of travel destination, and choice of vehicle purchase. There are
many other potential applications of discrete choice models, including choice of residential location,
choice of business location, and transportation project contractor selection.
In order to postulate meaningful choice models, the modeler should review past literature regarding
the choice context and identify factors with potential to affect the decision making process. These
factors should drive the data-collection process, usually a survey instrument given to experimental
units, to collect the information relevant in the decision making process. There is much written
about survey design and data collection, and these sources should be consulted for detailed discussions of this complex and critical aspect of choice modeling.
Transportation Planning Example: An analyst is interested in modeling the mode choice decision made by individuals in a region. The analyst reviews the literature and develops the following list of potential factors influencing the mode choice decision for most travelers in the region.
1. Trip maker characteristics (within the household context): vehicle availability, possession of driver's license, household structure (stage of life-cycle), role in household, household income (value of time)
2. Characteristics of the journey or activity: journey or activity purpose (work, grocery shopping, school, etc.), time of day, accessibility and proximity of activity destination
3. Characteristics of transport facility:
Qualitative factors: comfort and convenience, reliability and regularity, protection, security
Quantitative factors: in-vehicle travel times, waiting and walking times, out-of-pocket monetary costs, availability and cost of parking, proximity/accessibility of transport mode
Estimate choice models
Qualitative choice analysis methods are used to describe and/or predict discrete choices of
decision-makers or to classify a discrete outcome according to a host of regressors. The need to
model choice and/or classification arises in transportation, energy, marketing, telecommunications,
and housing, to name but a few fields. There is, as always, a set of assumptions or requirements
about the data that needs to be satisfied. The response variable (choice or classification) must meet
the following three criteria.
1. The set of choices or classifications must be finite.
2. The set of choices or classifications must be mutually exclusive; that is, a
particular outcome can only be represented by one choice or classification.
3. The set of choices or classifications must be collectively exhaustive; that is,
all choices or classifications must be represented by the choice set or
classification.
Even when the 2nd and 3rd criteria are not met, the analyst can usually re-define the set of
alternatives or classifications so that the criteria are satisfied.
Planning Example: An analyst wishing to model mode choice for commute decisions defines the choice set as AUTO, BUS, RAIL, WALK, and BIKE. The modeler observed a
person in the database who drove her personal vehicle to the transit station and then took a bus, violating the second criterion. To remedy this modeling problem and similar problems that might arise, the analyst introduces some new choices (or classifications) into the modeling process: AUTO-BUS, AUTO-RAIL, WALK-BUS, WALK-RAIL, BIKE-BUS, BIKE-RAIL. By introducing these new categories the analyst has made the discrete choice data comply with the stated modeling requirements.
Deriving Choice Models from Random Utility Theory
Choice models are developed from economic theories of random utility, whereas classification
models (classifying crash type, for example) are developed by minimizing classification errors with
respect to the Xs and classification levels Y. Because most of the literature in transportation is
focused on choice models and because mathematically choice models and classification models
are equivalent, the discussion here is based on choice models. Several assumptions are made
when deriving discrete choice models from random utility theory:
1. An individual is faced with a finite set of choices from which only one can be chosen.
2. Individuals belong to a homogeneous population, act rationally, possess perfect
information, and always select the option that maximizes their net personal utility.
3. If C is defined as the universal choice set of discrete alternatives, and J the number of
elements in C, then each member of the population has some subset of C as his or her choice set. Most decision-makers, however, have some subset $C_n$ that is considerably smaller than
C. It should be recognized that defining the subset $C_n$ that is the feasible choice set for an
individual is not a trivial task; however, it is assumed that it can be determined.
4. Decision-makers are endowed with a subset of attributes $x_n \in X$, where $X$ is the set of all measured attributes
relevant to the decision-making process.
Planning Example: In identifying the choice set of travel mode the analyst identifies the universal choice set C to consist of the following:
1. driving alone
2. sharing a ride
3. taxi
4. motorcycle
5. bicycle
6. walking
7. transit bus
8. light rail transit
The analyst identifies a family whose choice set is fairly restricted because they do not own a vehicle, and so their choice set $C_n$ is given by:
1. sharing a ride
2. taxi
3. bicycle
4. walking
5. transit bus
6. light rail transit
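As a minimal sketch of how a restricted choice set might be derived from the universal one, the Python below filters C with a single availability rule. The rule (alternatives that require owning a vehicle) and the names `REQUIRES_OWN_VEHICLE` and `feasible_choice_set` are assumptions chosen to reproduce the example; a real application would need richer availability rules.

```python
# Universal choice set C from the planning example.
UNIVERSAL_CHOICE_SET = [
    "driving alone", "sharing a ride", "taxi", "motorcycle",
    "bicycle", "walking", "transit bus", "light rail transit",
]

# Hypothetical availability rule: these alternatives need a household vehicle.
REQUIRES_OWN_VEHICLE = {"driving alone", "motorcycle"}


def feasible_choice_set(owns_vehicle):
    """Return the restricted choice set C_n for one household."""
    if owns_vehicle:
        return list(UNIVERSAL_CHOICE_SET)
    return [a for a in UNIVERSAL_CHOICE_SET if a not in REQUIRES_OWN_VEHICLE]


print(feasible_choice_set(owns_vehicle=False))
```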
The modeler, who is an OBSERVER of the system, does not possess complete information about
all elements considered important in the decision-making process by all individuals making a
choice, so utility is broken down into two components, $V$ and $\varepsilon$:
$$U_{in} = V_{in} + \varepsilon_{in}$$
where
$U_{in}$ is the overall utility of choice $i$ for individual $n$,
$V_{in}$ is the systematic or measurable utility, which is a function of $x_n$ and $i$ for individual $n$ and choice $i$,
$\varepsilon_{in}$ includes idiosyncrasies and taste variations, combined with measurement or observation errors made by the modeler, and is the random utility component.
The error term allows for a couple of important cases: 1) two persons with the same measured
attributes and facing the same choice set make different decisions; 2) some individuals do not
select the best alternative (from the modeler's point of view, they demonstrate irrational behavior).
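The role of the random component can be illustrated with a small simulation. The sketch below assumes independent standard Gumbel disturbances, which is one possible choice and is not required by the decomposition itself. The function names and utility values are hypothetical.

```python
import math
import random


def gumbel_draw(rng):
    """Draw from a standard Gumbel distribution by inverse-CDF sampling."""
    u = rng.random()
    while u == 0.0:  # avoid log(0)
        u = rng.random()
    return -math.log(-math.log(u))


def simulate_share(v1, v2, n_draws=20000, seed=0):
    """Share of simulated travelers choosing alternative 1.

    Every traveler has the same systematic utilities v1 and v2; only the
    random components differ, so identical travelers can choose differently.
    """
    rng = random.Random(seed)
    chose_1 = 0
    for _ in range(n_draws):
        u1 = v1 + gumbel_draw(rng)
        u2 = v2 + gumbel_draw(rng)
        if u1 >= u2:
            chose_1 += 1
    return chose_1 / n_draws


# Hypothetical systematic utilities: alternative 1 is better on average,
# yet a sizeable share of identical travelers still choose alternative 2.
print(simulate_share(1.0, 0.0))
```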
The decision maker n chooses the alternative from which he derives the greatest utility. In the
binomial or two-alternative case, the decision-maker chooses alternative 1 if and only if:
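$$U_{1n} \ge U_{2n}$$
or when:
$$V_{1n} + \varepsilon_{1n} \ge V_{2n} + \varepsilon_{2n}.$$
In probabilistic terms, the probability that alternative 1 is chosen is given by:
$$\begin{aligned}
\Pr(1) &= \Pr(U_1 \ge U_2) \\
&= \Pr(V_1 + \varepsilon_1 \ge V_2 + \varepsilon_2) \\
&= \Pr(\varepsilon_2 - \varepsilon_1 \le V_1 - V_2).
\end{aligned}$$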
Note that this equation looks like a cumulative distribution function for a probability density. That is,
the probability of choosing alternative 1 (in the binomial case) is equal to the probability that the
difference in random utility is less than or equal to the difference in deterministic utility.
If $\varepsilon = \varepsilon_2 - \varepsilon_1$, which is the difference in unobserved utilities between alternatives 2 and 1 for travelers
1 through N (subscript not shown), then the probability distribution or density of $\varepsilon$, $f(\varepsilon)$, can be
specified to form specific classes of models.
A couple of important observations about the probability density given by $F(V_1 - V_2)$ can be made.
1. The error is small when there are large differences in systematic utility between
alternatives one and two.
2. Large errors are likely when differences in utility are small, thus decision makers
are more likely to choose an alternative on the wrong side of the indifference line ($V_1 - V_2 = 0$).
Alternative 1 is chosen when $V_1 - V_2 > 0$ (or when $\varepsilon > 0$), and alternative 2 is chosen
when $V_1 - V_2 < 0$.
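Thus, for binomial models of discrete choice:
$$\Pr(1) = \Pr(\varepsilon \le V_1 - V_2) = F(V_1 - V_2) = \int_{-\infty}^{V_1 - V_2} f(\varepsilon)\, d\varepsilon,$$
where $\varepsilon = \varepsilon_2 - \varepsilon_1$, $f$ is its probability density, and $F$ is the corresponding cumulative distribution function.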
The cumulative distribution function, or CDF, typically looks like:
[Figure: cumulative distribution function plotted against V1 - V2]
This structure for the error term is a general result for binomial choice models. By making
assumptions about the probability density of the residuals, the modeler can choose between
several different binomial choice model formulations. Two types of binomial choice models are
most common in practice: the logit and the probit models. The logit model assumes a
logistic distribution of errors, and the probit model assumes normally distributed errors. These
models, however, are not practical when there are more than two alternatives, and the probit
model is not easy to estimate (mathematically) for more than 4 to 5 choices.
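The two binomial formulations can be compared with a short Python sketch. It assumes that the difference in disturbances follows a standard logistic distribution (logit) or a standard normal distribution (probit); these unit scales, and the function names, are assumptions made for illustration.

```python
import math


def binary_logit_prob(v1, v2):
    """Pr(1) = F(V1 - V2) with F the standard logistic CDF."""
    return 1.0 / (1.0 + math.exp(-(v1 - v2)))


def binary_probit_prob(v1, v2):
    """Pr(1) = F(V1 - V2) with F the standard normal CDF."""
    return 0.5 * (1.0 + math.erf((v1 - v2) / math.sqrt(2.0)))


# Both CDFs equal 0.5 on the indifference line V1 - V2 = 0 and rise toward 1
# as the systematic utility of alternative 1 increases.
for diff in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(diff, binary_logit_prob(diff, 0.0), binary_probit_prob(diff, 0.0))
```

Because the two distributions have different variances under these unit scales, the probabilities are not directly comparable at the same utility difference; the sketch only shows that both share the same S-shaped structure.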
Mathematical Estimation of Choice Models
Recall that choice models involve a response Y with various levels (a set of choices or
classification), and a set of Xs that reflect important attributes of the choice decision or
classification. Usually the choice or classification of Y is modeled as a linear function or
combination of the Xs. Maximum likelihood methods are employed to solve for the betas in choice
models.
Consider the likelihood of a sample of N independent observations with probabilities $p_1, p_2, \dots, p_N$.
The likelihood of the sample is simply the product of the individual likelihoods. The product is a
maximum when the most likely set of $p$s is used.
$$L^* = p_1 p_2 p_3 \cdots p_N = \prod_{n=1}^{N} p_n$$
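For the binary choice model:
$$L^*(\beta_1, \dots, \beta_K) = \prod_{n=1}^{N} \Pr_n(i)^{y_{in}} \, \Pr_n(j)^{y_{jn}}$$
where $\Pr_n(i)$ is a function of the betas, $i$ and $j$ are alternatives 1 and 2 respectively, and $y_{in}$ equals 1 if individual $n$ chose alternative $i$ and 0 otherwise. It is
generally mathematically simpler to analyze the logarithm of $L^*$, rather than the likelihood function
itself. Using the fact that $\ln(z_1 z_2) = \ln(z_1) + \ln(z_2)$, $\ln(z^x) = x \ln(z)$, $\Pr(j) = 1 - \Pr(i)$, and $y_{jn} = 1 - y_{in}$,
the equation becomes:
$$L = \ln L^* = \sum_{n=1}^{N} \left[ y_{in} \ln \Pr_n(i) + (1 - y_{in}) \ln\bigl(1 - \Pr_n(i)\bigr) \right]$$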
The maximum of $L$ is found by differentiating the function with respect to each of the betas and
setting the partial derivatives equal to zero; the solution gives the values of $\beta_1, \dots, \beta_K$ that provide the maximum
of $L$. In many cases the log likelihood function is globally concave, so that if a solution to the first
order conditions exists, it is unique. This does not always have to be the case, however.
Under general conditions the likelihood estimators can be shown to be consistent, asymptotically
efficient, and asymptotically normal.
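The first-order condition can be solved numerically. The sketch below assumes a binary logit with a single coefficient, so that $V_i - V_j = \beta x_n$, and uses Newton-Raphson iterations. The data are hypothetical, and Newton-Raphson is only one of several search methods that could be used.

```python
import math

# Hypothetical data: X[n] is the difference in one attribute between
# alternatives i and j for observation n; Y[n] = 1 if alternative i was chosen.
X = [-2.0, -1.0, -1.0, 0.0, 0.0, 1.0, 1.0, 2.0]
Y = [0, 0, 1, 0, 1, 0, 1, 1]


def prob_i(beta, x):
    """Binary logit probability of alternative i, with V_i - V_j = beta * x."""
    return 1.0 / (1.0 + math.exp(-beta * x))


def log_likelihood(beta):
    """L = sum_n [ y ln Pr_n(i) + (1 - y) ln(1 - Pr_n(i)) ]."""
    return sum(
        y * math.log(prob_i(beta, x)) + (1 - y) * math.log(1.0 - prob_i(beta, x))
        for x, y in zip(X, Y)
    )


def score(beta):
    """First derivative of L with respect to beta."""
    return sum((y - prob_i(beta, x)) * x for x, y in zip(X, Y))


def hessian(beta):
    """Second derivative of L; negative for these data, so L is concave."""
    return -sum(prob_i(beta, x) * (1.0 - prob_i(beta, x)) * x * x for x in X)


def estimate_beta(start=0.0, tol=1e-10, max_iter=50):
    """Newton-Raphson search for the beta that sets the score to zero."""
    beta = start
    for _ in range(max_iter):
        step = score(beta) / hessian(beta)
        beta -= step
        if abs(step) < tol:
            break
    return beta


beta_hat = estimate_beta()
print(beta_hat, log_likelihood(beta_hat))
```

With K coefficients the score becomes a vector and the second derivative a K-by-K matrix, but the iteration has the same structure.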
In more complex and realistic models, the likelihood function is evaluated as before, but instead of
estimating one parameter, there are many parameters associated with Xs that must be estimated,
and there are as many equations as there are Xs to solve. In practice the probabilities that
maximize the likelihood function are likely to be different across individuals (unlike the simplified
planning example in this section, where all individuals have the same probability).
Because the likelihood function is between 0 and 1, the log likelihood function is negative. The
maximum of the log-likelihood function, therefore, is the negative value closest to zero that the log
likelihood function can take given the data and specified probability functions.
Planning Example. Suppose 10 individuals making travel choices between auto (A) and transit (T) were observed. All travelers are assumed to possess identical attributes (a really
poor assumption), and so the probabilities are not functions of betas but simply a function of p, the probability of choosing Auto. The analyst also does not have any alternative-specific attributes, which makes this a very naive model that doesn't reflect reality. The likelihood function will be:
$$L^* = p^x (1-p)^{n-x} = p^7 (1-p)^3$$
where
p = probability that a traveler chooses A,
1-p = probability that a traveler chooses T,
n = number of travelers = 10,
x = number of travelers choosing A.
Recall that the analyst is trying to estimate p, the probability that a traveler chooses A. If 7 travelers were observed taking A and 3 taking T, then it can be shown that the maximum likelihood estimate of p is 0.7, or in other words, the value of L* is maximized when p=0.7 and 1-p=0.3. All other combinations of p and 1-p result in lower values of L*. To see this, the analyst plots L* for values of P(T) from 0.0 to 1.0. The following plot is obtained:
Similarly (and in practice), one could use the log-likelihood function to derive the maximum likelihood estimates, where $L = \log(L^*) = \log[p^7 (1-p)^3] = \log p^7 + \log (1-p)^3 = 7 \log p + 3 \log(1-p)$.
[Figure: log-likelihood function]
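The 7-of-10 example can be checked numerically. The sketch below evaluates both the likelihood and the log-likelihood on a grid of candidate values of p (the probability of choosing Auto). The grid spacing of 0.01 and the function names are assumptions made for illustration.

```python
import math


def likelihood(p, x=7, n=10):
    """L* = p^x (1 - p)^(n - x) for x Auto choices among n travelers."""
    return p ** x * (1.0 - p) ** (n - x)


def log_likelihood(p, x=7, n=10):
    """L = x ln p + (n - x) ln(1 - p); defined only for 0 < p < 1."""
    return x * math.log(p) + (n - x) * math.log(1.0 - p)


# Candidate probabilities strictly between 0 and 1.
grid = [k / 100 for k in range(1, 100)]

best_by_likelihood = max(grid, key=likelihood)
best_by_log_likelihood = max(grid, key=log_likelihood)
print(best_by_likelihood, best_by_log_likelihood)
```

Both searches pick the same grid point, because the logarithm is an increasing function and so does not move the location of the maximum.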
Note that in this simple model p is the only parameter being estimated, so maximizing the likelihood function L* or log (L*) only requires one first-order condition: the derivative of
log (L*) with respect to p, set equal to zero.
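For this example the single first-order condition can be worked out by hand. The steps below use the natural logarithm, which is an assumption about the base; any other base rescales L without changing where the maximum lies.

```latex
\begin{aligned}
L &= \ln L^* = 7 \ln p + 3 \ln(1 - p) \\
\frac{dL}{dp} &= \frac{7}{p} - \frac{3}{1 - p} = 0 \\
7(1 - p) &= 3p \quad\Longrightarrow\quad \hat{p} = \frac{7}{10} = 0.7 \\
\frac{d^2 L}{dp^2} &= -\frac{7}{p^2} - \frac{3}{(1 - p)^2} < 0
\end{aligned}
```

The negative second derivative confirms that the stationary point is a maximum. The same steps with x choices among n travelers give the estimate x/n.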
The Multinomial Logit Model
The multinomial logit (MNL) model is the most commonly applied model to explain and forecast
discrete choices due to its ease of estimation and foundation in utility theory. The MNL model is a
general extension of the binomial choice model to more than two alternatives. The universal choice
set is C, which contains J elements, and a subset of C for each individual, $C_n$, defines that individual's restricted
choice set. It should be noted that it is not a trivial task to define restricted choice sets for
individuals. In most cases $J_n$ for decision maker n is less than or equal to J, the total number of
alternatives in the universal choice set; however, it is often assumed that all decision makers face the same set of universal alternatives.
Without showing the derivation, which can be found in the references for this chapter, the MNL
model is expressed as:
$$P_n(i) = \frac{e^{V_{in}}}{\sum_{j \in C_n} e^{V_{jn}}}$$
where:
1. Utility for traveler n and mode i is $U_{in} = V_{in} + \varepsilon_{in}$
2. $P_n(i)$ is the probability that traveler n chooses mode i
3. The numerator is the exponential of the systematic utility of mode i for traveler n; the denominator is the sum of the
exponentials of the systematic utilities of all alternative modes in $C_n$ for traveler n
4. The disturbances $\varepsilon_{in}$ are independently distributed
5. The disturbances $\varepsilon_{in}$ are identically distributed
6. The disturbances are Gumbel distributed with a location parameter and a
scale parameter greater than zero.
The MNL model expresses the probability that a specific alternative is chosen as the exponential of
the systematic utility of that alternative divided by the sum of the exponentials of the systematic utilities of all alternatives (chosen
and not chosen) in the choice set. The predicted probabilities are bounded by zero and one. There are several
assumptions embedded in the estimation of MNL models.
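The MNL probability described above can be sketched as a short Python function. The utility values are hypothetical. Subtracting the largest utility before exponentiating is a numerical convenience added here; it does not change the resulting probabilities.

```python
import math


def mnl_probabilities(systematic_utility):
    """Map each alternative in C_n to P_n(i) = exp(V_in) / sum_j exp(V_jn)."""
    if not systematic_utility:
        raise ValueError("the choice set must contain at least one alternative")
    # Shifting all utilities by the same constant leaves the ratios unchanged
    # and avoids overflow in exp() for large utilities.
    v_max = max(systematic_utility.values())
    exp_v = {alt: math.exp(v - v_max) for alt, v in systematic_utility.items()}
    total = sum(exp_v.values())
    return {alt: value / total for alt, value in exp_v.items()}


# Hypothetical systematic utilities for one traveler (illustrative values only).
example = {"auto": -0.5, "bus": -1.2, "rail": -0.9}
probabilities = mnl_probabilities(example)
print(probabilities)
```

With only two alternatives the function reduces to the binary logit probability, which matches the description of the MNL model as an extension of the binomial case.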
Linear in parameters restriction:
The linear in parameters restriction is made for convenience of estimation, which enables simple
and efficient estimation of parameters. When the functional form of the systematic component of
the utility function is linear in parameters, the MNL model can be written as:
$$P_n(i) = \frac{e^{\beta' x_{in}}}{\sum_{j \in C_n} e^{\beta' x_{jn}}}$$
where $x_{in}$ and $x_{jn}$ are vectors describing the attributes of alternatives $i$ and $j$ as well as attributes of
traveler $n$.
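A minimal sketch of the linear-in-parameters form follows. One coefficient vector is applied to the attribute vector of every alternative, and the resulting systematic utilities enter the MNL formula. The coefficients, the attributes (travel time and cost), and the function names are hypothetical.

```python
import math


def systematic_utility(beta, x):
    """V = beta' x: the same coefficient vector is applied to every alternative."""
    if len(beta) != len(x):
        raise ValueError("beta and x must have the same length")
    return sum(b * value for b, value in zip(beta, x))


def mnl_linear_probabilities(beta, attributes):
    """attributes maps each alternative in C_n to its attribute vector x_jn."""
    utilities = {alt: systematic_utility(beta, x) for alt, x in attributes.items()}
    v_max = max(utilities.values())  # shift for numerical stability
    exp_v = {alt: math.exp(v - v_max) for alt, v in utilities.items()}
    total = sum(exp_v.values())
    return {alt: value / total for alt, value in exp_v.items()}


# Hypothetical coefficients on (travel time in minutes, cost in dollars).
beta = [-0.05, -0.4]
attributes = {
    "auto": [20.0, 4.0],
    "bus": [35.0, 1.5],
    "rail": [30.0, 2.0],
}
print(mnl_linear_probabilities(beta, attributes))
```

Because the utilities are linear in beta, changing one coefficient changes every alternative's utility in proportion to the corresponding attribute, which is what makes estimation convenient.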
Independence from Irrelevant Alternatives Property (IIA)
Succinctly stated, the IIA property states that for a specific individual the ratio of the choice probabilities of any two alternatives is entirely unaffected by the systematic utilities of any other
alternatives. This property arises from the assumption in the deriv