dca disaggregate 1

7/28/2019 DCA Disaggregate 1


Discrete choice models have played an important role in transportation modeling for the last 25 years. They are mainly used to provide a detailed representation of the complex aspects of transportation demand, based on strong theoretical justifications. Moreover, several packages and tools are available to help practitioners use these models in real applications, making discrete choice models more and more popular.

Discrete choice models are powerful but complex. The art of finding the appropriate model for a particular application requires from the analyst both a close familiarity with the reality of interest and a strong understanding of the methodological and theoretical background of the model.

The main theoretical aspects of discrete choice models are reviewed in this paper. The main assumptions used to derive discrete choice models in general, and random utility models in particular, are covered in detail. The Multinomial Logit Model, the Nested Logit Model and the Generalized Extreme Value model are also discussed.

In the context of transportation demand analysis, disaggregate models have played an important role over the last 25 years. These models consider that the demand is the result of several decisions made by each individual in the population under consideration. These decisions usually consist of a choice made among a finite set of alternatives. An example of a sequence of choices in the context of transportation demand is described in Figure 1: choice of an activity (play-yard), choice of destination (6th street), choice of departure time (early), choice of transportation mode (bike) and choice of itinerary (local streets). For this reason, discrete choice models have been extensively used in this context.

Figure 1: A sequence of choices

A model, as a simplified description of reality, provides a better understanding of complex systems. Moreover, it allows for obtaining predictions of future states of the considered system, controlling or influencing its behavior, and optimizing its performance.

The complex system under consideration here is a specific aspect of human behavior dedicated to choice decisions. The complexity of this ``system'' clearly requires many simplifying assumptions in order to obtain operational models. A specific model will correspond to a specific set of assumptions, and it is important from a practical point of view to be aware of these assumptions when prediction, control or optimization is performed.

The assumptions associated with discrete choice models in general are detailed in Section 2. Section 3 focuses specifically on assumptions related to random utility models. Some of the most used models, the Multinomial Logit Model (Section 4), the Nested Logit Model (Section 5) and the Generalized Extreme Value Model (Section 6), are then introduced, with special emphasis on the Nested Logit model.

Among the many publications that can be found in the literature, we refer the reader to Ben-Akiva and Lerman (1985), Anderson, De Palma and Thisse (1992), Hensher and Johnson (1981) and Horowitz, Koppelman and Lerman (1986) for more comprehensive developments.

In order to develop models capturing how individuals make choices, we have to make specific assumptions. We will distinguish here among assumptions about

1. the decision-maker: these assumptions define who the decision-maker is, and what his/her characteristics are;

2. the alternatives: these assumptions determine what the possible options of the decision-maker are;

3. the attributes: these assumptions identify the attributes of each potential alternative that the decision-maker is taking into account to make his/her decision;

4. the decision rules: they describe the process used by the decision-maker to reach his/her choice.

In order to narrow down the huge number of potential models, we will consider some of these

assumptions as fixed throughout the paper. It does not mean that there is no other valid

assumption, but we cannot cover everything in this context. For example, even if continuous

models will be briefly described, discrete models will be the primary focus of this paper.

Decision-maker

As mentioned in the introduction, choice models are referred to as disaggregate models. It means that the decision-maker is assumed to be an individual. In general, for most practical applications, this assumption is not restrictive. The concept of ``individual'' may easily be extended, depending on the particular application. We may consider that a group of persons (a household or a government, for example) is the decision-maker. In doing so, we decide to ignore all internal decisions within the group, and to consider only the decision of the group as a whole. The example described in Figure 1 reflects the decisions of a household, without accounting for all potential negotiations among the parents and the children. We will refer to ``decision-maker'' and ``individual'' interchangeably throughout the rest of the paper.

Because of its disaggregate nature, the model has to include the characteristics, or attributes, of the individual. Many attributes, like age, gender, income, eye color or social security number, may be considered in the model.

The analyst has to identify those that are likely to explain the choice of the individual. There is no automatic process to perform this identification. The knowledge of the actual application and the data availability play an important role in this process.

Alternatives

Analyzing the choice of an individual requires the knowledge of what has been chosen, but also of what has not been chosen. Therefore, assumptions must be made about options, or alternatives, that were considered by the individual to perform the choice. The set containing these alternatives, called the choice set, must be characterized.

The characterization of the choice set depends on the context of the application. If we consider the example described in Figure 2, the time spent on each Internet site may be anything, as long as the total time is not more than two hours. The resulting choice set is represented in Figure 3, and is defined by

$$ \mathcal{C} = \left\{ (t_1, \ldots, t_n) \;:\; t_k \geq 0 \ \ \forall k, \quad \sum_{k=1}^{n} t_k \leq 2 \right\} \tag{1} $$

where $t_k$ denotes the time, in hours, spent on site $k$. It is a typical example of a continuous choice set, where the alternatives are defined by some constraints and cannot be enumerated.

Figure 2: Choice on Internet

Figure 3: Example of a continuous choice set

In this paper, we focus on discrete choice sets. A discrete choice set contains a finite number of alternatives that can be explicitly listed. The corresponding choice models are called discrete choice models. The choice of a transportation mode is a typical application leading to a discrete choice set. In this context, the characterization of the choice set consists in the identification of the list of alternatives. To perform this task, two concepts of choice set are considered: the universal choice set and the reduced choice set.

The universal choice set contains all potential alternatives in the context of the application. Considering the mode choice in the example of Figure 1, the universal choice set may contain all potential transportation modes, like walk, bike, bus, car, etc. The alternative plane, which is also a transportation mode, is clearly not an option in this context and, therefore, is not included in the universal choice set.

The reduced choice set is the subset of the universal choice set considered by a particular individual. Alternatives in the universal choice set that are not available to the individual under consideration are excluded (for example, the alternative car may not be an option for individuals without a driver's license). The awareness of the availability of the alternative by the decision-maker should be considered as well. The reader is referred to Swait (1984) for more details on choice set generation. In the following, ``choice set'' will refer to the reduced choice set, except when explicitly mentioned.
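As a minimal sketch of the distinction between the two concepts, the reduced choice set can be obtained by filtering the universal choice set with availability and awareness rules. The mode names come from the text; the `Individual` fields and the `reduced_choice_set` helper are hypothetical and serve only as an illustration.

```python
from dataclasses import dataclass

# Universal choice set for the mode choice example (plane is excluded by context).
UNIVERSAL_CHOICE_SET = ("walk", "bike", "bus", "car")


@dataclass
class Individual:
    has_license: bool    # hypothetical attribute of the decision-maker
    aware_of: frozenset  # modes the individual knows to be available (hypothetical)


def reduced_choice_set(person: Individual) -> list:
    """Keep only the alternatives that are available to, and known by, the individual."""
    result = []
    for mode in UNIVERSAL_CHOICE_SET:
        if mode == "car" and not person.has_license:
            continue  # car is not an option without a driver's license
        if mode not in person.aware_of:
            continue  # the individual is not aware of this alternative
        result.append(mode)
    return result


teenager = Individual(has_license=False, aware_of=frozenset(UNIVERSAL_CHOICE_SET))
print(reduced_choice_set(teenager))
```

In a real application the availability rules would come from the data and from the analyst's knowledge of the context; the two rules above only mirror the examples given in the text.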

Attributes

Each alternative in the choice set must be characterized by a set of attributes. Similarly to the characterization of the decision-maker described in Section 2.1, the analyst has to identify the attributes of each alternative that are likely to affect the choice of the individual. In the context of a transportation mode choice, the list of attributes for the mode car could include the travel time, the out-of-pocket cost and the comfort. The list for bus could include the travel time, the out-of-pocket cost, the comfort and the bus frequency. Note that some attributes may be generic to all alternatives, and some may be specific to an alternative (bus frequency is specific to bus). Also, qualitative attributes, like comfort, may be considered.

An attribute is not necessarily a directly observed quantity. It can be any function of available data. For example, instead of considering travel time as an attribute, the logarithm of the travel time may be considered. The out-of-pocket cost may be replaced by the ratio between the out-of-pocket cost and the income of the individual. The definition of attributes as a function of available data depends on the problem. Several definitions must usually be tested to identify the most appropriate.

Decision rules

At this point, we have identified and characterized both the decision-maker and all available alternatives. We will now focus on the assumptions about the rules used by the decision-maker to come up with the actual choice. Different sets of assumptions can be considered, leading to different families of models. We will describe here three theories on decision rules, and the corresponding models. The neoclassical economic theory, described in Section 2.4.1, introduces the concept of utility. The Luce model (Section 2.4.2) and the random utility models (introduced in Section 2.4.3 and developed in Section 3) are designed to capture uncertainty.

Neoclassical Economic Theory

The neoclassical economic theory assumes that each decision-maker is able to compare two alternatives a and b in the choice set $\mathcal{C}$ using a preference-indifference operator $\succsim$. If $a \succsim b$, the decision-maker either prefers a to b, or is indifferent. The preference-indifference operator is supposed to have the following properties:

1. Reflexivity: $a \succsim a, \quad \forall a \in \mathcal{C}$

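2. Transitivity: $a \succsim b$ and $b \succsim c \ \Rightarrow \ a \succsim c, \quad \forall a, b, c \in \mathcal{C}$

3. Comparability: $a \succsim b$ or $b \succsim a, \quad \forall a, b \in \mathcal{C}$

Because the choice set is finite, the existence of an alternative which is preferred to all the others is guaranteed, that is

$$ \exists\, a^* \in \mathcal{C} \ \text{ such that } \ a^* \succsim a \quad \forall a \in \mathcal{C} \tag{2} $$

More interestingly, and because of the three properties listed above, it can be shown that the existence of a function

$$ U : \mathcal{C} \rightarrow \mathbb{R} : a \mapsto U(a) \tag{3} $$

such that

$$ a \succsim b \iff U(a) \geq U(b) \quad \forall a, b \in \mathcal{C} \tag{4} $$

is guaranteed. Therefore, the alternative $a^*$ defined in (2) may be identified as

$$ a^* = \arg\max_{a \in \mathcal{C}} U(a) \tag{5} $$

It results that using the preference-indifference operator to make a choice is equivalent to assigning a value, called utility, to each alternative, and selecting the alternative associated with the highest utility.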
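The equivalence between the preference-indifference operator and utility maximization can be sketched in a few lines. The utility values below are hypothetical, and the helper names `choose` and `weakly_prefers` are introduced only for this illustration.

```python
# Hypothetical utilities for three modes; the numbers are illustrative only.
utility = {"car": -1.2, "bus": -1.5, "bike": -0.9}


def choose(utilities: dict) -> str:
    """Return the alternative with the highest utility (first one wins in case of a tie)."""
    return max(utilities, key=utilities.get)


def weakly_prefers(a: str, b: str, utilities: dict) -> bool:
    """Preference-indifference operator induced by the utility function."""
    return utilities[a] >= utilities[b]


best = choose(utility)
# The chosen alternative is weakly preferred to every alternative in the choice set.
print(best, all(weakly_prefers(best, other, utility) for other in utility))
```

The decision rule is fully deterministic here: the same utilities always produce the same choice.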

The concept of utility associated with the alternatives plays an important role in the context of discrete choice models. However, the assumptions of neoclassical economic theory present strong limitations for practical applications. Indeed, the complexity of human behavior suggests that a choice model should explicitly capture some level of uncertainty. The neoclassical economic theory fails to do so.

The exact source of uncertainty is an open question. Some models assume that the decision rules are intrinsically stochastic, and even a complete knowledge of the problem would not overcome the uncertainty. Others consider that the decision rules are deterministic, and motivate the uncertainty from the impossibility of the analyst to observe and capture all dimensions of the problem, due to its high complexity. Anderson et al. (1992) compare this debate with the one between Einstein and Bohr, about the uncertainty principle in theoretical physics. Bohr argued for the intrinsic stochasticity of nature and Einstein claimed that ``Nature does not play dice''.

Two families of models can be derived, depending on the assumptions about the source of uncertainty. Models with stochastic decision rules, like the model proposed by Luce (1959), described in Section 2.4.2, or the ``elimination by aspects'' approach, proposed by Tversky (1972), assume a deterministic utility and a probabilistic decision process. Random Utility Models, introduced in Section 2.4.3 and developed in Section 3, are based on the deterministic decision rules from the neoclassical economic theory, where uncertainty is captured by random variables representing utilities.

The Luce model

An important characteristic of models dealing with uncertainty is that, instead of identifying one alternative as the chosen option, they assign to each alternative a probability to be chosen.

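Luce (1959) proposed the choice axiom to characterize a choice probability law. The choice axiom can be stated as follows.

Denoting $P_{\mathcal{C}}(a)$ the probability of choosing a in the choice set $\mathcal{C}$, and $P_{\mathcal{C}}(\mathcal{S})$ the probability of choosing one element of the subset $\mathcal{S}$ within $\mathcal{C}$, the two following properties hold for any choice sets $\mathcal{U}$, $\mathcal{C}$ and $\mathcal{S}$, such that $\mathcal{S} \subseteq \mathcal{C} \subseteq \mathcal{U}$.

1. If an alternative $a \in \mathcal{C}$ is dominated, that is if there exists $b \in \mathcal{C}$ such that b is always preferred to a or, equivalently, $P_{\{a,b\}}(a) = 0$, then removing a from $\mathcal{C}$ does not modify the probability of any other alternative to be chosen, that is

$$ P_{\mathcal{C}}(\mathcal{S}) = P_{\mathcal{C} \setminus \{a\}}(\mathcal{S} \setminus \{a\}) \tag{6} $$

2. If no alternative is dominated, that is if $0 < P_{\{a,b\}}(a) < 1$ for all $a, b \in \mathcal{C}$, then the choice probability is independent from the sequence of decisions, that is

$$ P_{\mathcal{C}}(a) = P_{\mathcal{C}}(\mathcal{S}) \, P_{\mathcal{S}}(a), \quad a \in \mathcal{S} \tag{7} $$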

The independence described by (7) can be illustrated using an example of transportation mode choice, where we consider $\mathcal{C} = \{$Car, Bike, Bus$\}$. We apply two different assumptions to compute the probability of choosing ``car'' as a transportation mode.

1. The decision-maker may decide first to use a motorized mode (car or bus, in this case). The probability of choosing ``car'' is then given by

$$ P_{\mathcal{C}}(\text{car}) = P_{\{\text{car},\text{bus}\}}(\text{car}) \; P_{\mathcal{C}}(\{\text{car},\text{bus}\}) \tag{8} $$

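2. Alternatively, the decision-maker may decide first to use a private transportation mode (car or bike, in this case). The probability of choosing ``car'' is then given by

$$ P_{\mathcal{C}}(\text{car}) = P_{\{\text{car},\text{bike}\}}(\text{car}) \; P_{\mathcal{C}}(\{\text{car},\text{bike}\}) \tag{9} $$

Equation (7) of the choice axiom imposes that both assumptions produce the same probability, that is

$$ P_{\{\text{car},\text{bus}\}}(\text{car}) \; P_{\mathcal{C}}(\{\text{car},\text{bus}\}) = P_{\{\text{car},\text{bike}\}}(\text{car}) \; P_{\mathcal{C}}(\{\text{car},\text{bike}\}) \tag{10} $$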

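The second part of the choice axiom can be interpreted in a different way. Luce (1959) has shown that (7) is a necessary and sufficient condition for the existence of a function $v : \mathcal{C} \rightarrow \mathbb{R}^{+}$, such that, for all $\mathcal{S} \subseteq \mathcal{C}$, we have

$$ P_{\mathcal{S}}(a) = \frac{v(a)}{\sum_{b \in \mathcal{S}} v(b)} \tag{11} $$

Also, function v is unique up to a proportionality factor. If there exists $v'$ verifying (11), then

$$ v'(a) = k \, v(a) \quad \forall a \in \mathcal{C} \tag{12} $$

where $k > 0$. Similarly to (3), $v$ may be interpreted as a utility function. We will elaborate more on this result in Section 4.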
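The ratio form of the Luce probabilities and the independence from the sequence of decisions can be checked numerically on the Car, Bike, Bus example. This is a minimal sketch: the positive values assigned to each mode are hypothetical, and the function names are introduced here for illustration.

```python
def luce_probability(a, subset, v):
    """Probability of a within subset: v(a) divided by the sum of v(b) over the subset."""
    return v[a] / sum(v[b] for b in subset)


def subset_probability(subset, choice_set, v):
    """Probability of choosing any element of `subset` within `choice_set`."""
    return sum(luce_probability(a, choice_set, v) for a in subset)


v = {"car": 3.0, "bike": 1.0, "bus": 2.0}  # hypothetical positive values
C = ["car", "bike", "bus"]

direct = luce_probability("car", C, v)
# First decide "motorized" (car or bus), then choose car within that subset.
via_motorized = luce_probability("car", ["car", "bus"], v) * subset_probability(["car", "bus"], C, v)
# First decide "private" (car or bike), then choose car within that subset.
via_private = luce_probability("car", ["car", "bike"], v) * subset_probability(["car", "bike"], C, v)
print(direct, via_motorized, via_private)
```

The three numbers agree up to rounding, and multiplying every value in `v` by the same positive factor leaves all probabilities unchanged, which is the uniqueness up to a proportionality factor mentioned in the text.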

Random Utility Models

Random utility models assume, like neoclassical economic theory, that the decision-maker has a perfect discrimination capability. In this context, however, the analyst is supposed to have incomplete information and, therefore, uncertainty must be taken into account. Manski (1997) identifies four different sources of uncertainty: unobserved alternative attributes, unobserved individual attributes (called ``unobserved taste variations'' by Manski, 1997), measurement errors and proxy, or instrumental, variables.

The utility is modeled as a random variable in order to reflect this uncertainty. More specifically, the utility that individual i is associating with alternative a is given by

$$ U_a^i = V_a^i + \varepsilon_a^i \tag{13} $$

where $V_a^i$ is the deterministic part of the utility, and $\varepsilon_a^i$ is the stochastic part, capturing the uncertainty. Similarly to the neoclassical economic theory, the alternative with the highest utility is supposed to be chosen. Therefore, the probability that alternative a is chosen by decision-maker i within choice set $\mathcal{C}$ is

$$ P_{\mathcal{C}}^i(a) = P\left[ U_a^i \geq U_b^i \quad \forall b \in \mathcal{C} \right] \tag{14} $$

Random utility models are the most used discrete choice models for transportation applications. Therefore, the rest of the paper is devoted to them.
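The choice probability of a random utility model can be approximated by simulation: draw the random part of each utility, pick the alternative with the highest utility, and count. The sketch below assumes independent standard normal error terms purely for illustration; the deterministic utilities and the function name are hypothetical.

```python
import random


def simulate_choice_probabilities(V, draw_error, n_draws=20000, seed=0):
    """Estimate choice probabilities as the frequency with which each alternative has the highest utility."""
    rng = random.Random(seed)
    counts = {a: 0 for a in V}
    for _ in range(n_draws):
        # Utility = deterministic part + one draw of the random part.
        utilities = {a: V[a] + draw_error(rng) for a in V}
        chosen = max(utilities, key=utilities.get)
        counts[chosen] += 1
    return {a: counts[a] / n_draws for a in V}


V = {"car": 0.5, "bus": 0.0, "bike": -0.5}  # hypothetical deterministic utilities
probs = simulate_choice_probabilities(V, lambda rng: rng.gauss(0.0, 1.0))
print(probs)
```

Changing `draw_error` changes the model: the families discussed in the rest of the paper differ precisely in the distribution assumed for this random part.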

Random utility models

The derivation of random utility models is based on a specification of the utility as defined by (13). Different assumptions about the random term and the deterministic term will produce specific models. We present here the most usual assumptions that are used in practice. In Section 3.1, common assumptions about the random part of the utility are discussed. The deterministic part is treated in Section 3.2.

Assumptions on the random term

We will focus here on assumptions about the mean, the variance and the functional form of the

random term.

Figure 4: A binary model

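For all practical purposes, the mean of the random term is usually supposed to be zero. It can be shown that this assumption is not restrictive. We do it here on a simple example. Considering the example described in Figure 4, we denote the mean of the error term of each alternative by $m_1 = E[\varepsilon_1]$ and $m_2 = E[\varepsilon_2]$, respectively. Then, the error terms can be specified as

$$ \varepsilon_1 = m_1 + \varepsilon_1' \tag{15} $$

and

$$ \varepsilon_2 = m_2 + \varepsilon_2' \tag{16} $$

where $\varepsilon_1'$ and $\varepsilon_2'$ are random variables with zero mean. Therefore,

$$ P_{\{1,2\}}(1) = P[V_1 + \varepsilon_1 \geq V_2 + \varepsilon_2] = P[V_1 + m_1 + \varepsilon_1' \geq V_2 + m_2 + \varepsilon_2'] \tag{17} $$

The terms $m_1$ and $m_2$, called Alternative Specific Constants (ASC), capture the mean of the error term. Therefore, it can be assumed without loss of generality that the error terms have zero mean if the model specification includes these ASCs.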

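In practice, it is impossible to estimate the value of all ASCs from observed data. Considering again the example of Figure 4, the probability of choosing alternative 1, say, is not modified if an arbitrary constant K is added to both utilities. Therefore, only the difference between the two ASCs can be identified. Indeed, from (17), denoting the two ASCs by $m_1$ and $m_2$ and the zero-mean error terms by $\varepsilon_1'$ and $\varepsilon_2'$, we have

$$ P_{\{1,2\}}(1) = P[V_1 + m_1 + K + \varepsilon_1' \geq V_2 + m_2 + K + \varepsilon_2'] $$

for any $K \in \mathbb{R}$. If $K = -m_1$, we obtain

$$ P_{\{1,2\}}(1) = P[V_1 + \varepsilon_1' \geq V_2 + (m_2 - m_1) + \varepsilon_2'] $$

or, equivalently, defining $M = m_2 - m_1$,

$$ P_{\{1,2\}}(1) = P[V_1 + \varepsilon_1' \geq V_2 + M + \varepsilon_2'] $$

Defining $K = -m_2$ produces the same result, with the single constant $m_1 - m_2$ now attached to alternative 1. This property can be generalized easily to models with more than two alternatives, where only differences between ASCs can be identified.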
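The identification argument can be verified on simulated draws: adding the same constant K to both utilities, or normalizing one ASC to zero, never changes which alternative wins. The sketch below uses exact rational arithmetic so that the comparison is not affected by rounding; the error draws, utilities and ASC values are all hypothetical.

```python
import random
from fractions import Fraction


def share_choosing_1(V1, V2, asc1, asc2, errors):
    """Share of draws in which alternative 1 has the highest utility."""
    wins = sum(1 for e1, e2 in errors if V1 + asc1 + e1 >= V2 + asc2 + e2)
    return Fraction(wins, len(errors))


rng = random.Random(1)
# Error draws on an exact rational grid (hypothetical distribution).
errors = [(Fraction(rng.randint(-2000, 2000), 1000),
           Fraction(rng.randint(-2000, 2000), 1000)) for _ in range(5000)]

V1, V2 = Fraction(1, 2), Fraction(0)
m1, m2 = Fraction(3, 10), Fraction(-1, 5)

base = share_choosing_1(V1, V2, m1, m2, errors)
shifted = share_choosing_1(V1, V2, m1 + 7, m2 + 7, errors)  # K = 7 added to both
normalized = share_choosing_1(V1, V2, 0, m2 - m1, errors)   # K = -m1
print(base == shifted == normalized)
```

Only the difference `m2 - m1` matters, which is why one ASC can be fixed to zero in estimation.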

It is common practice to constrain one ASC in the model to zero. From a modeling viewpoint, the choice of the particular alternative whose ASC is constrained is purely arbitrary. However, Bierlaire, Lotan and Toint (1997) have shown that the estimation process is influenced by this choice. They propose a different technique of ASC specification which is optimal from an estimation perspective.

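To derive assumptions about the variance of the random term, we observe that the scale of the utility may be arbitrarily specified. Indeed, for any $\alpha > 0$, we have

$$ P[U_1 \geq U_2] = P[\alpha U_1 \geq \alpha U_2] $$

The arbitrary decision about $\alpha$ is equivalent to assuming a particular variance $v$ of the distribution of the error term. Indeed, if

$$ \mathrm{Var}(\alpha \varepsilon) = \alpha^2 \, \mathrm{Var}(\varepsilon) = v $$

we have also

$$ \alpha = \sqrt{\frac{v}{\mathrm{Var}(\varepsilon)}} \tag{21} $$

We will illustrate this relationship with several examples in the remainder of this section.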

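Once assumptions about the mean and the variance of the error term distribution have been defined, the focus is now on the actual functional form of this distribution. We will consider here three different distributions yielding three different families of models: linear, probit and logit models.

The linear model is obtained from the assumption that the density function of the error term is given by

$$ f(x) = \begin{cases} \dfrac{1}{2L} & \text{if } x \in [-L, L] \\ 0 & \text{otherwise} \end{cases} $$

where $L \in \mathbb{R}$, $L > 0$, is an arbitrary constant. This density function is used to derive the probability of choosing one particular alternative. Considering the example presented in Figure 4, and taking $\varepsilon_2 - \varepsilon_1$ as the error term, the probability is given by (23) (see Figure 5).

$$ P_{\{1,2\}}(1) = \begin{cases} 0 & \text{if } V_1 - V_2 < -L \\ \dfrac{V_1 - V_2 + L}{2L} & \text{if } -L \leq V_1 - V_2 \leq L \\ 1 & \text{if } V_1 - V_2 > L \end{cases} \tag{23} $$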

Figure 5: Linear model

The linear model presents some problems for real applications. First, the probability associated with extreme values of the error term (beyond $\pm L$ in the example) is exactly zero. Therefore, if any extreme event happens in reality, the model will never capture it. Second, the discontinuity of the derivatives at $-L$ and $L$ causes problems for most estimation procedures. We conclude the presentation of the linear model by emphasizing that the constant $L$ determines the scale of the distribution. For the binary example, $\mathrm{Var}(\varepsilon_2 - \varepsilon_1) = L^2/3$. Using (21), we have that assuming $\mathrm{Var}(\alpha(\varepsilon_2 - \varepsilon_1)) = 1$ is equivalent to assuming $\alpha = \sqrt{3}/L$. A common value for $L$ is 1/2, that is $\alpha = 2\sqrt{3}$.
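A short sketch of the binary linear model follows. It assumes that the difference of the two error terms is uniformly distributed on [-L, L], which is the usual formulation of this model; the function names are hypothetical.

```python
import math


def linear_probability(V1: float, V2: float, L: float = 0.5) -> float:
    """Probability of alternative 1 when eps2 - eps1 is uniform on [-L, L] (assumed)."""
    d = V1 - V2
    if d <= -L:
        return 0.0  # alternative 1 is never chosen
    if d >= L:
        return 1.0  # alternative 1 is always chosen
    return (d + L) / (2.0 * L)


def linear_scale(L: float) -> float:
    """Scale giving unit variance: the variance is L^2 / 3, so alpha = sqrt(3) / L."""
    return math.sqrt(3.0) / L


for d in (-1.0, -0.25, 0.0, 0.25, 1.0):
    print(d, linear_probability(d, 0.0))
```

The kinks of this function at -L and L are the discontinuities of the derivative mentioned in the text, and the flat parts are the regions where one alternative has probability exactly zero.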

The Normal Probability Unit, or Probit, model is derived from the assumption that the error terms are normally distributed, that is

$$ f(x) = \frac{1}{\sigma \sqrt{2\pi}} \, e^{-\frac{1}{2} \left( \frac{x}{\sigma} \right)^2} $$

where $\sigma \in \mathbb{R}$, $\sigma > 0$, is an arbitrary constant. This density function is used to derive the probability of choosing one particular alternative. Considering the example presented in Figure 4, and assuming that $\varepsilon_1$ and $\varepsilon_2$ are normally distributed with zero mean, variances $\sigma_1^2$ and $\sigma_2^2$ respectively, and covariance $\sigma_{12}$, the probability is given by (25) (see Figure 6).

$$ P_{\{1,2\}}(1) = \frac{1}{\sigma \sqrt{2\pi}} \int_{-\infty}^{V_1 - V_2} e^{-\frac{1}{2} \left( \frac{x}{\sigma} \right)^2} dx \tag{25} $$

where $\sigma^2 = \sigma_1^2 + \sigma_2^2 - 2\sigma_{12}$ is the variance of $\varepsilon_2 - \varepsilon_1$.

Figure 6: Probit model

The probit model is motivated by the Central Limit Theorem, assuming that the error terms are the sum of independent unobserved quantities. Unfortunately, the probability function (25) has no closed analytical form, which limits practical use of this model. We refer the reader to Daganzo (1979) for a comprehensive development of probit models. We conclude this short introduction of the probit model by looking at the scale parameter. Considering again the binary example presented in Figure 4 in the probit context, we have $\mathrm{Var}(\varepsilon_2 - \varepsilon_1) = \sigma^2$. Using (21), we have that assuming $\mathrm{Var}(\alpha(\varepsilon_2 - \varepsilon_1)) = 1$ is equivalent to assuming $\alpha = 1/\sigma$. It is common practice to arbitrarily define $\sigma = 1$, that is $\alpha = 1$.
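Although the binary probit probability has no closed form, it can be evaluated numerically through the error function available in the Python standard library. The sketch below is only illustrative; the function names are hypothetical, and `sigma` denotes the standard deviation of the difference of the two error terms.

```python
import math


def probit_probability(V1: float, V2: float, sigma: float = 1.0) -> float:
    """Probability of alternative 1: standard normal CDF evaluated at (V1 - V2) / sigma."""
    z = (V1 - V2) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def sigma_of_difference(var1: float, var2: float, cov12: float) -> float:
    """Standard deviation of eps2 - eps1 from the variances and the covariance."""
    return math.sqrt(var1 + var2 - 2.0 * cov12)


sigma = sigma_of_difference(1.0, 1.0, 0.5)
print(sigma, probit_probability(1.0, 0.0, sigma))
```

With more than two alternatives the probability becomes a multidimensional integral, which is where the practical difficulty mentioned in the text comes from.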

Despite its complexity, the probit model has been applied to many practical problems (see Whynes, Reed and Newbold, 1996; Bolduc, Fortin and Fournier, 1996; Yai, Iwakura and Morichi, 1997, among recent publications). However, the most widely used model in practical applications is probably the Logistic Probability Unit, or Logit, model. The error terms are now assumed to be independent and identically Gumbel distributed. The density function of the Gumbel distribution is given by (26) (see Figure 7).

$$ f(x) = \mu \, e^{-\mu (x - \eta)} \, e^{-e^{-\mu (x - \eta)}} \tag{26} $$

where $\eta$ is the location parameter, and $\mu$ is the scale parameter.

Figure 7: Gumbel distribution

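The mean of the Gumbel distribution is

$$ \eta + \frac{\gamma}{\mu} $$

where

$$ \gamma = \lim_{k \rightarrow \infty} \left( \sum_{i=1}^{k} \frac{1}{i} - \ln k \right) \approx 0.5772 $$

is the Euler constant. The variance is

$$ \frac{\pi^2}{6 \mu^2} $$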

The Gumbel distribution is an approximation of the Normal law, as shown in Figure 8, where the solid line represents the Normal distribution, and the dotted line the Gumbel distribution.

Figure 8: Comparison between Normal and Gumbel distribution

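We derive the probability function for the binary example of Figure 4 from the following property of the Gumbel distribution. If $\varepsilon_1$ is Gumbel distributed with location parameter $\eta_1$ and scale parameter $\mu$, and $\varepsilon_2$ is Gumbel distributed with location parameter $\eta_2$ and scale parameter $\mu$, then $\varepsilon = \varepsilon_2 - \varepsilon_1$ follows a Logistic distribution with location parameter $\eta_2 - \eta_1$ and scale parameter $\mu$ (the name of the Logit model comes from this property). The density function of the Logistic distribution, written here for a location parameter equal to zero, is given by

$$ f(x) = \frac{\mu \, e^{-\mu x}}{\left( 1 + e^{-\mu x} \right)^2} $$

where $\mu$ is the scale parameter. As a consequence, taking $\eta_1 = \eta_2$, we have

$$ P_{\{1,2\}}(1) = \frac{1}{1 + e^{-\mu (V_1 - V_2)}} $$

or, equivalently,

$$ P_{\{1,2\}}(1) = \frac{e^{\mu V_1}}{e^{\mu V_1} + e^{\mu V_2}} $$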

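In order to determine the relationship between the scale parameter and the variance of the distribution, we compute $\mathrm{Var}(\varepsilon_2 - \varepsilon_1) = \pi^2 / (3\mu^2)$. Using (21), we have that assuming $\mathrm{Var}(\alpha(\varepsilon_2 - \varepsilon_1)) = 1$ is equivalent to assuming $\alpha = \mu\sqrt{3}/\pi$. It is common practice to arbitrarily define $\mu = 1$, that is $\alpha = \sqrt{3}/\pi$.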
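The link between Gumbel error terms and the binary logit formula can be checked by simulation. The sketch below draws Gumbel variates by inverting the cumulative distribution function exp(-exp(-mu (x - eta))) and compares the simulated share with the closed form; the utilities, seed and function names are hypothetical.

```python
import math
import random


def binary_logit(V1: float, V2: float, mu: float = 1.0) -> float:
    """Closed-form probability of alternative 1 in the binary logit model."""
    return 1.0 / (1.0 + math.exp(-mu * (V1 - V2)))


def draw_gumbel(rng: random.Random, eta: float = 0.0, mu: float = 1.0) -> float:
    """Inverse-CDF draw from a Gumbel distribution with location eta and scale mu."""
    u = rng.random()
    while u == 0.0:  # avoid log(0); random() may return exactly 0.0
        u = rng.random()
    return eta - math.log(-math.log(u)) / mu


rng = random.Random(42)
V1, V2, n = 0.8, 0.0, 50000
wins = sum(1 for _ in range(n) if V1 + draw_gumbel(rng) >= V2 + draw_gumbel(rng))
print(wins / n, binary_logit(V1, V2))
```

The two printed values should be close, the gap being simulation noise that shrinks as `n` grows.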

In most cases, the arbitrary decision about the scale parameter does not matter and can be safely ignored. But it is important not to completely forget its existence. Indeed, it may sometimes play an important role. For example, utilities derived from different models can be compared only if the value of $\alpha$ is the same for all of them. It is usually not the case with the scale parameters commonly used in practice, as shown in Table 1. For instance, a utility estimated with a logit model has to be divided by $\pi/\sqrt{3}$ before being compared with a utility estimated with a probit model.

Table 1: Model comparison

The list of models presented above is not exhaustive. Other assumptions about the distribution of the error term will lead to other families of models. For instance, Ben-Akiva and Lerman (1985) cite the arctan and the truncated exponential models. These models are not often used in practice and we will not consider them here.

Assumptions on the deterministic term

The utility of each alternative must be a function of the attributes of the alternative itself and of

the decision-maker identified in Sections2.1 and 2.3. We can write the deterministic part of theutility that individual i is associating with alternative a as

where is a vector containing all attributes, both of individual i and alternative a. The function

defined in (33) is commonly assumed to be linear in the parameters, that is, if n attributes are

considered,

where are parameters to be estimated. This assumption simplifies the formulation

and the estimation of the model, and is not as restrictive as it may seem. Indeed, nonlinear effects

can still be captured in the definition of the attributes, as mentioned in Section 2.3.
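The linear-in-parameters assumption can be illustrated with a minimal sketch. The attribute names, the choice of a logarithmic transformation, and the parameter values below are hypothetical and are not taken from the paper; they only show how a nonlinear effect can enter through the attribute definition while the utility remains linear in the parameters.

```python
import math


def deterministic_utility(beta, x):
    """Linear-in-parameters utility: sum over k of beta_k * x_k."""
    if len(beta) != len(x):
        raise ValueError("beta and x must have the same length")
    return sum(b * v for b, v in zip(beta, x))


# Hypothetical attributes of one individual and one alternative.
travel_time, travel_cost, income = 30.0, 2.5, 40000.0

# The nonlinear effect of income is captured by the attribute log(income);
# the utility itself is still a linear combination of the attributes.
x = [travel_time, travel_cost, math.log(income)]
beta = [-0.05, -0.4, 0.1]  # illustrative values, not estimated parameters

v = deterministic_utility(beta, x)
```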

Multinomial logit model

As introduced in the previous section, the logit model is derived from the assumption that the

error terms of the utility functions are independent and identically Gumbel distributed. These

models were first introduced in the context of binary choice models, where the logistic distribution is used to derive the probability. Their generalization to more than two alternatives is

referred to as multinomial logit models.

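If the error terms are independent and identically Gumbel distributed, with location parameter 0

and scale parameter $\mu$, the probability that a given individual chooses alternative $i$ within $\mathcal{C}$ is

given by

$$P_{\mathcal{C}}(i) = \frac{e^{\mu V_i}}{\sum_{k \in \mathcal{C}} e^{\mu V_k}}, \quad (35)$$

where $V_k$ denotes the deterministic part of the utility of alternative $k$.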

The derivation of this result is attributed to Holman and Marley by Luce and Suppes (1965). We

refer the reader to Ben-Akiva and Lerman (1985) and Anderson et al. (1992) for additional

details.
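As a minimal sketch of the multinomial logit probability, the function below evaluates the ratio of exponentials of scaled deterministic utilities. The function name, the dictionary representation of the choice set and the utility values are assumptions made for illustration.

```python
import math


def mnl_probabilities(utilities, mu=1.0):
    """Multinomial logit: P(i) = exp(mu * V_i) / sum_k exp(mu * V_k)."""
    scaled = {alt: mu * v for alt, v in utilities.items()}
    shift = max(scaled.values())  # subtracting the maximum avoids overflow
    weights = {alt: math.exp(s - shift) for alt, s in scaled.items()}
    total = sum(weights.values())
    return {alt: w / total for alt, w in weights.items()}


# Hypothetical deterministic utilities of three modes.
probs = mnl_probabilities({"car": -1.0, "bus": -1.5, "bike": -2.0})
```

Shifting all utilities by the same constant leaves the probabilities unchanged, which is why the maximum can be subtracted before exponentiating.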

It is interesting to note that the multinomial logit model can also be derived from the choice

axiom defined by (6) and (7). Indeed, defining and , we have that (11) is equivalent to (35).

An important property of the multinomial logit model is the Independence from Irrelevant

Alternatives (IIA). This property can be stated as follows. The ratio of the probabilities of any

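two alternatives is independent of the choice set. That is, for any choice sets $\mathcal{C}_1$ and $\mathcal{C}_2$ such that

$\mathcal{C}_1 \subseteq \mathcal{C}$ and $\mathcal{C}_2 \subseteq \mathcal{C}$, and for any alternatives $a_1$ and $a_2$ in $\mathcal{C}_1 \cap \mathcal{C}_2$, we have

$$\frac{P_{\mathcal{C}_1}(a_1)}{P_{\mathcal{C}_1}(a_2)} = \frac{P_{\mathcal{C}_2}(a_1)}{P_{\mathcal{C}_2}(a_2)}.$$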

This result can be proven easily using (35). Ben-Akiva and Lerman (1985) propose an equivalent definition: The ratio of the choice probabilities of any two alternatives is entirely unaffected by

the systematic utilities of any other alternatives.
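The IIA property can be checked numerically with a short sketch. The mode names and utility values are hypothetical; the point is only that the ratio of two logit probabilities does not change when a third alternative is removed from the choice set.

```python
import math


def mnl(utilities, mu=1.0):
    """Multinomial logit probabilities for a dict of deterministic utilities."""
    weights = {a: math.exp(mu * v) for a, v in utilities.items()}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}


full = {"car": -1.0, "bus": -1.5, "bike": -2.0}
reduced = {a: v for a, v in full.items() if a != "bike"}

# Ratio of car to bus probabilities in each choice set.
ratio_full = mnl(full)["car"] / mnl(full)["bus"]
ratio_reduced = mnl(reduced)["car"] / mnl(reduced)["bus"]
```

In both cases the ratio only depends on the difference between the utilities of the two alternatives being compared.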

The IIA property of multinomial logit models is a limitation for some practical applications. This

limitation is often illustrated by the red bus/blue bus paradox (see, for example, Ben-Akiva and Lerman, 1985) in the modal choice context. We prefer here the path choice example presented in

Figure 9.

Figure 9: A path choice example

The probabilities provided by the multinomial logit model (35) for this example are

$$P(1) = P(2a) = P(2b) = \frac{1}{3}, \quad (37)$$

which is not consistent with the intuitive result. This situation appears in choice problems with

significantly correlated alternatives, as is clearly the case in the example. Indeed, alternatives 2a and 2b are so similar that their utilities share many unobserved attributes of the path and,

therefore, the assumption of independence of the random part of these utilities is not valid in this context.

The Nested Logit Model, presented in the next section, partly overcomes this limitation of the

multinomial logit model.

Nested logit model

The nested logit model, first derived by Ben-Akiva (1973), is an extension of the multinomial

logit model designed to capture correlations among alternatives. It is based on the partitioning of

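the choice set $\mathcal{C}$ into several nests $\mathcal{C}_k$ such that

$$\mathcal{C} = \bigcup_k \mathcal{C}_k$$

and

$$\mathcal{C}_k \cap \mathcal{C}_l = \emptyset \quad \text{for all } k \neq l.$$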

The utility function of each alternative is composed of a term specific to the alternative, and a

term associated with the nest. If , we have

The error terms and are supposed to be independent. As for the multinomial logit model,

error terms are supposed to be independent and identically Gumbel distributed, with scale

parameter . The distribution of is such that the random variable is Gumbel

distributed with scale parameter .

Each nest within the choice set is associated with a pseudo-utility, called composite utility, expected maximum utility, inclusive value or accessibility in the literature. The composite utility

for nest is defined as

where is the component of the utility which is common to all alternatives in the nest .

The probability model is then given by

where



and

The parameters and reflect the correlation among alternatives in the nest . Indeed, if

, we have

Clearly, we have

Ben-Akiva and Lerman (1985) derive condition (46) directly from utility theory. Note also that

if , we have .

The parameters and are closely related in the model. Actually, only their ratio is meaningful. It is not possible to identify them separately. A common practice is to arbitrarily constrain one of

them to a value (usually 1). The impacts of this arbitrary decision on the model are briefly

discussed in Section 5.1. We illustrate here the Nested Logit Model with the path choice example

described in Figure 9. First, the choice set is divided into and

. The deterministic components of the utilities are , ,

and . The composite utilities of each nest are

and

The probability of choosing each nest is then

and

where the value of has been assumed to be 1, without loss of generality. The probability of each alternative is then computed. We obtain

and

The values of , and as a function of are plotted on Figure 10. From

(46), we have that because has been arbitrarily defined as 1. We observe that,

when , the nested logit model produces the same results as the multinomial logit model

(37), and all probabilities are $1/3$. On the other hand, when goes to infinity, and goes to 0,

the probability of each nest is closer and closer to 1/2. At the limit, the model becomes a binary choice model, where the small detours a and b are ignored in the choice process.
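The behavior described above can be reproduced with a minimal sketch of a two-level nested logit model. Several elements are assumptions, because the symbols and equations are missing from this copy of the text: the three paths are given the same deterministic utility (minus a hypothetical travel time), the utility component common to a nest is taken as zero, the top-level scale is fixed to 1, and the names `nest_scale`, `top_scale` and `path_example` are illustrative.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def nested_logit_probabilities(utilities, nests, nest_scale, top_scale=1.0):
    """Two-level nested logit probabilities.

    utilities:  dict alternative -> deterministic utility
    nests:      dict nest name -> list of alternatives (a partition)
    nest_scale: dict nest name -> scale of the nest (assumed >= top_scale)
    """
    # Composite utility of each nest: (1/s) * ln sum_j exp(s * V_j).
    composite = {}
    for name, alts in nests.items():
        s = nest_scale[name]
        composite[name] = log_sum_exp([s * utilities[a] for a in alts]) / s

    top_norm = log_sum_exp([top_scale * c for c in composite.values()])
    probs = {}
    for name, alts in nests.items():
        s = nest_scale[name]
        p_nest = math.exp(top_scale * composite[name] - top_norm)
        within_norm = log_sum_exp([s * utilities[a] for a in alts])
        for a in alts:
            # P(alternative) = P(nest) * P(alternative | nest)
            probs[a] = p_nest * math.exp(s * utilities[a] - within_norm)
    return probs


travel_time = 1.0  # hypothetical common travel time of the three paths
utilities = {"1": -travel_time, "2a": -travel_time, "2b": -travel_time}
nests = {"C1": ["1"], "C2": ["2a", "2b"]}


def path_example(scale_2):
    """Path probabilities as a function of the scale of nest C2."""
    return nested_logit_probabilities(utilities, nests, {"C1": 1.0, "C2": scale_2})
```

Under these assumptions, `path_example(1.0)` gives the multinomial logit result, and increasing the nest scale moves the probability of path 1 toward 1/2 while paths 2a and 2b share the other half.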

Figure 10: Probability of each alternative as a function of .

Normalization of nested logit models

In order to compute the probabilities in the previous example, we have arbitrarily decided to

constrain to 1. Alternatively, we could have decided to constrain to 1. It is easy to show

that, in this case, we have

and



which is equivalent to (51) and (52), replacing by .

A model where the scale parameter is arbitrarily constrained to 1 is said to be "normalized

from the top". A model where one of the parameters is constrained to 1 is said to be

"normalized from the bottom". The latter may produce a simpler formulation of the model. We illustrate it using the example of Figure 11.

Figure 11: A mode choice example

We have

and

If we impose , we can define , , and to obtain

the following expressions.

and

This formulation, proposed by Daly (1987), simplifies the estimation process. For this reason, it

has been adopted in estimation packages like ALOGIT (Daly, 1987) or HieLoW (Bierlaire, 1995,

Bierlaire and Vandevyvere, 1995).

We emphasize here that this formulation should be used with caution when the same parameters

are present in more than one nest. In this case, specific techniques, inspired by the artificial trees proposed by Bradley and Daly (1991), must be used to obtain a correct specification of the

model. The description of these techniques is out of the scope of this paper.

A direct extension of the nested logit model consists in partitioning some or all nests into sub-nests, which can, in turn, be divided into sub-nests. Because of the complexity of these models,

their structure is usually represented as a tree, as suggested by Daly (1987). Clearly, the number of potential structures, reflecting the correlation among alternatives, can be very large. No

technique has been proposed thus far to identify the most appropriate correlation structure

directly from the data.

We conclude our introduction of nested logit models by mentioning their limitations. These models are designed to capture choice problems where alternatives within each nest are

correlated. No correlation across nests can be captured by the Nested Logit Model. When

alternatives cannot be partitioned into well separated nests to reflect their correlation, Nested

Logit Models are not applicable. This is the case for most route choice problems. Several models within the "logit family" have been designed to capture specific correlation structures. For

example, Cascetta (1996) captures overlapping paths in a route choice context using

commonality factors, Koppelman and Wen (1997) capture correlation between pairs of alternatives, and Vovsha (1997) proposes a cross-nested model allowing alternatives to belong to

more than one nest. The last two models are derived from the Generalized Extreme Value model,

presented in the next section.

Generalized extreme value model

The Generalized Extreme Value (GEV) model has been introduced by McFadden (1978) in the

context of residential location. This general model actually consists in a large family of models that are consistent with random utility theory. The probability of choosing alternative i within

is given by

where is a differentiable function with the following properties.



1. for all ,

2. G is homogeneous of degree , that is , for all ,

3. for all i such that , and

4. the kth partial derivative with respect to k distinct is non-negative if k is odd, and non-positive if k is even, that is, such that if and

if and , we have

As an example, we consider

which has the required properties, as it can be easily verified. Then,

which is the multinomial logit model. Similarly, the nested logit model can be derived with

It can be shown that property 4 holds if , which is consistent with condition (46).

The Generalized Extreme Value model provides a nice theoretical framework for the development of new discrete choice models, like those of Koppelman and Wen (1997) and Vovsha

(1997).
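The way the multinomial logit model arises from the GEV framework can be sketched as a short derivation. The formulas are missing from this copy of the text, so the block below assumes the standard form of the GEV probability from McFadden (1978) and the generating function usually associated with the multinomial logit model; the notation ($y_i = e^{V_i}$, $J$ alternatives, homogeneity degree $\mu$) is an assumption and may differ from the paper's.

```latex
% Assumed GEV probability, with y_j = e^{V_j}:
P(i \mid \mathcal{C}) =
  \frac{e^{V_i}\,\dfrac{\partial G}{\partial y_i}\bigl(e^{V_1},\dots,e^{V_J}\bigr)}
       {\mu\, G\bigl(e^{V_1},\dots,e^{V_J}\bigr)}

% Assumed generating function for the multinomial logit case:
G(y_1,\dots,y_J) = \sum_{j=1}^{J} y_j^{\mu},
\qquad
\frac{\partial G}{\partial y_i} = \mu\, y_i^{\mu-1}

% Substituting y_j = e^{V_j}:
P(i \mid \mathcal{C})
  = \frac{e^{V_i}\,\mu\, e^{(\mu-1)V_i}}{\mu \sum_{j=1}^{J} e^{\mu V_j}}
  = \frac{e^{\mu V_i}}{\sum_{j=1}^{J} e^{\mu V_j}}
```

The last expression is the multinomial logit probability with scale parameter $\mu$.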

Conclusion

We have covered in this paper the main theoretical aspects of discrete choice models in general, and random utility models in particular. A good awareness of underlying assumptions is

necessary for an efficient use of these models for practical applications. In particular, we have

focused on the location parameters and the scale parameters in multinomial and nested logit

models. Despite its importance, the role of these parameters tends to be underestimated by

practitioners. This may lead to incorrect specifications of the models, or incorrect interpretation

of the results.

Acknowledgments

This paper is based on a lecture given at the NATO Advanced Studies Institute Operations

Research and Decision Aid Methodologies in Traffic and Transportation Management, Balatonfured, Hungary, March 1997. Comments from the students and other lecturers of the ASI

have been very useful in writing this paper. Moreover, I am very grateful to Moshe Ben-Akiva and

John Bowman for their valuable discussions and comments.

References

1. Simon P. Anderson, André de Palma, and Jacques-François Thisse. Discrete Choice Theory of Product Differentiation. MIT Press, Cambridge, Ma, 1992.

2. M. E. Ben-Akiva. Structure of passenger travel demand models. PhD thesis, Department of Civil Engineering, MIT, Cambridge, Ma, 1973.

3. M. E. Ben-Akiva and S. R. Lerman. Discrete Choice Analysis: Theory and Application to Travel Demand. MIT Press, Cambridge, Ma., 1985.

4. Moshe Ben-Akiva and B. François. homogeneous generalized extreme value model. Working paper, Department of Civil Engineering, MIT, Cambridge, Ma, 1983.

5. M. Bierlaire. A robust algorithm for the simultaneous estimation of hierarchical logit models. GRT Report 95/3, Department of Mathematics, FUNDP, 1995.

6. M. Bierlaire, T. Lotan, and Ph. L. Toint. On the overspecification of multinomial and nested logit models due to alternative specific constants. Transportation Science, 1997. (forthcoming).

7. M. Bierlaire and Y. Vandevyvere. HieLoW: the interactive user's guide. Transportation Research Group - FUNDP, Namur, 1995.

8. Denis Bolduc, Bernard Fortin, and Marc-Andre Fournier. The effect of incentive policies on the practice location of doctors: A multinomial probit analysis. Journal of labor economics, 14(4):703, 1996.

9. M. A. Bradley and A. J. Daly. Estimation of logit choice models using mixed stated preferences and revealed preferences information. In Methods for understanding travel behaviour in the 1990's, pages 116-133, Québec, May 1991. International Association for Travel Behaviour. 6th international conference on travel behaviour.

10. Ennio Cascetta. A modified logit route choice model overcoming path overlapping problems. Specification and some calibration results for interurban networks. In Proceedings of the 13th International Symposium on the Theory of Road Traffic Flow (Lyon, France), 1996.

11. C. F. Daganzo. Multinomial Probit: The theory and its application to demand forecasting. Academic Press, New York, 1979.

12. A. Daly. Estimating "tree" logit models. Transportation Research B, 21(4):251-268, 1987.

13. D. A. Hensher and L. W. Johnson. Applied discrete choice modelling. Croom Helm, London, 1981.

14. J. L. Horowitz, F. S. Koppelman, and S. R. Lerman. A self-instructing course in disaggregate mode choice modeling. Technology Sharing Program, US Department of Transportation, Washington, D.C. 20590, 1986.

15. F. S. Koppelman and Chieh-Hua Wen. The paired combinatorial logit model: properties, estimation and application. Transportation Research Board, 76th Annual Meeting, Washington DC, January 1997. Paper #970953.

16. R. Luce. Individual choice behavior: a theoretical analysis. J. Wiley and Sons, New York, 1959.

17. R. D. Luce and P. Suppes. Preference, utility and subjective probability. In R. D. Luce, R. R. Bush, and E. Galanter, editors, Handbook of Mathematical Psychology, New York, 1965. J. Wiley and Sons.

18. C. Manski. The structure of random utility models. Theory and Decision, 8:229-254, 1977.

19. Andrey Andreyevich Markov. Calculation of probabilities. Tip. Imperatorskoi Akademii Nauk, Saint Petersburg, 1900. (in Russian).

20. D. McFadden. Modelling the choice of residential location. In A. Karlquist et al., editor, Spatial interaction theory and residential location, pages 75-96, Amsterdam, 1978. North-Holland.

21. J. Swait. Probabilistic choice set formation in transportation demand models. PhD thesis, Department of Civil and Environmental Engineering, Massachusetts Institute of Technology, Cambridge, Ma, 1984.

22. A. Tversky. Elimination by aspects: a theory of choice. Psychological Review, 79:281-299, 1972.

23. Peter Vovsha. Cross-nested logit model: an application to mode choice in the Tel-Aviv metropolitan area. Transportation Research Board, 76th Annual Meeting, Washington DC, January 1997. Paper #970387.

24. D. K. Whynes, G. Reed, and P. Newbold. General practitioners' choice of referral destination: A probit analysis. Managerial and Decision Economics, 17(6):587, 1996.

25. T. Yai, S. Iwakura, and S. Morichi. Multinomial probit with structured covariance for route choice behavior. Transportation Research B, 31(3):195-208, June 1997.

Chapter 5: Discrete Dependent Variable Models

CHAPTER 5; SECTION A: LOGIT, NESTED LOGIT, & PROBIT

Purpose of Logit, Nested Logit, and Probit Models:

Logit, Nested Logit, and Probit models are used to model a relationship between a dependent

variable Y and one or more independent variables X. The dependent variable, Y, is a discrete

variable that represents a choice, or category, from a set of mutually exclusive choices or

categories. For instance, an analyst may wish to model the choice of automobile purchase (from a set of vehicle classes), the choice of travel mode (walk, transit, rail, auto, etc.), the manner of an

automobile collision (rollover, rear-end, sideswipe, etc.), or residential location choice (high-density,

suburban, exurban, etc.). The independent variables are presumed to affect the choice or category

or the choice maker, and represent a priori beliefs about the causal or associative elements

important in the choice or classification process. In the case of ordinal scale variables, an ordered

logit or probit model can be applied to take advantage of the additional information provided by the

ordinal over the nominal scale (not discussed here).

Examples: An analyst wants to model:

1. The effect of household member characteristics, transportation network characteristics, and alternative mode characteristics on choice of transportation mode: bus, walk, auto, carpool, single occupant auto, rail, or bicycle.

2. The effect of consumer characteristics on choice of vehicle purchase: sport utility vehicle, van, auto, light pickup truck, or motorcycle.

3. The effect of traveler characteristics and employment characteristics on airline carrier choice: Delta, United Airlines, Southwest, etc.

4. The effect of involved vehicle types, pre-crash conditions, and environmental factors on vehicle crash outcome: property damage only, mild injury, severe injury, fatality.

Basic Assumptions/Requirements of Logit, Nested Logit, and Probit Models:

1) The observations on dependent variable Y are assumed to have been randomly sampled from the population of interest (even for stratified samples or choice-based samples).

2) Y is caused by or associated with the Xs, and the Xs are determined by influences (variables) outside of the model.

3) There is uncertainty in the relation between Y and the Xs, as reflected by a scattering of observations around the functional relationship.

4) The distribution of error terms must be assessed to determine if a selected model is appropriate.

Inputs for Logit, Nested Logit, and Probit Models:

Discrete variable Y is the observed choice or classification, such as brand selection, transportation

mode selection, etc. For grouped data, where choices are observed for homogeneous experimental

units or observed multiple times per experimental unit, the dependent variable is the proportion of

choices observed.

One or more continuous and/or discrete variables X, which describe the attributes of the choice

maker or event and/or various attributes of the choices thought to be causal or influential in the

decision or classification process.

Outputs of Logit, Nested Logit, and Probit Models:

Functional form of relation between Y and Xs.

Strength of association between Y and Xs (individual Xs and collective set of Xs).

Proportion of choice or classification uncertainty explained by hypothesized relation.

Confidence in predictions of future/other observations on Y given X.

Logit, Nested Logit, and Probit Methodology:



Examples of Logit, Nested Logit, and Probit:

Pavements

Koehne, Jodi, Fred Mannering, and Mark Hallenbeck (1996). Analysis of Trucker and Motorist Opinions Toward Truck-lane Restrictions. Transportation Research Record #1560 pp. 73-82. National Academy of Sciences.

Traffic

Mannering, Fred, Jodi Koehne and Soon-Gwan Kim. (1995). Statistical Assessment of Public Opinion Toward Conversion of General-Purpose Lanes to High-Occupancy Vehicle Lanes. Transportation Research Record #1485 pp. 168-176. National Academy of Sciences.

Planning

Koppelman, Frank S., and Chieh-Hua Wen (1998). Nested Logit Models: Which Are You Using? Transportation Research Record #1645 pp. 1-9. National Academy of Sciences.

Yai, Tetsuo, and Tetsuo Shimizu (1998). Multinomial Probit with Structured Covariance for Choice Situations with Similar Alternatives. Transportation Research Record #1645 pp. 69-75. National Academy of Sciences.

McFadden, Daniel. Modeling the Choice of Residential Location. (1978). Transportation Research Record #673 pp. 72-77. National Academy of Sciences.

Horowitz, Joel L. (1984) Testing Disaggregate Travel Demand Models by Comparing Predicted

and Observed Market Shares. Transportation Research Record #976 pp. 1-7. National Academy

of Sciences.

Interpretation of Logit, Nested Logit, and Probit:

How is a choice model equation interpreted?
How do continuous and indicator variables differ in the choice model?
How are beta coefficients interpreted?
How is the Likelihood Ratio Test interpreted?
How are t-statistics interpreted?
How are phi and adjusted phi interpreted?
How are confidence intervals interpreted?
How are degrees of freedom interpreted?
How are elasticities computed and interpreted?
When is the independence of irrelevant alternatives (IIA) assumption violated?

Troubleshooting: Logit, Nested Logit, and Probit:

Should interaction terms be included in the model?
How many variables should be included in the model?
What methods can be used to specify the relation between choice and the Xs?
What methods are available for fixing heteroscedastic errors?

What methods are used for fixing serially correlated errors?
What can be done to deal with multi-collinearity?
What is endogeneity and how can it be fixed?
How does one know if the errors are Gumbel distributed?

Logit, Nested Logit, and Probit References:

Ben-Akiva, Moshe and Steven R. Lerman. Discrete Choice Analysis: Theory and

Application to Travel Demand. The MIT Press, Cambridge MA. 1985.

Greene, William H. Econometric Analysis. MacMillan Publishing Company, New York,

New York. 1990.

Ortuzar, J. de D. and L. G. Willumsen. Modelling Transport. Second Edition. John Wiley

and Sons, New York, New York. 1994.

Train, Kenneth. Qualitative Choice Analysis: Theory, Econometrics, and an Application to

Automobile Demand. The MIT Press, Cambridge MA. 1993.

Logit, Nested Logit, and Probit Methodology:

Postulate mathematical models from theory and past research.

Discrete choice models (logit, nested logit, and probit) are used to develop models of behavioral

choice or of event classification. It is accepted a priori that the analyst doesn't know the complexity of the underlying relationships, and that any model of reality will be wrong to some degree. Choice models estimated will reflect the a priori assumptions of the modeler as to what factors affect the decision process. Common applications of discrete choice models include choice of transportation mode, choice of travel destination, and choice of vehicle purchase. There are many other potential applications of discrete choice models, including choice of residential location, choice of business location, and transportation project contractor selection.

In order to postulate meaningful choice models, the modeler should review past literature regarding

the choice context and identify factors with potential to affect the decision making process. These

factors should drive the data-collection process, usually a survey instrument given to experimental

units, to collect the information relevant in the decision making process. There is much written

about survey design and data collection, and these sources should be consulted for detailed discussions of this complex and critical aspect of choice modeling.

Transportation Planning Example: An analyst is interested in modeling the mode choice decision made by individuals in a region. The analyst reviews the literature and develops the following list of potential factors influencing the mode choice decision for most travelers in the region.

1. Trip maker characteristics (within the household context): vehicle availability, possession of driver's license, household structure (stage of life-cycle), role in household, household income (value of time)

2. Characteristics of the journey or activity: journey or activity purpose (work, grocery shopping, school, etc.), time of day, accessibility and proximity of activity destination

3. Characteristics of transport facility:
   Qualitative factors: comfort and convenience, reliability and regularity, protection, security
   Quantitative factors: in-vehicle travel times, waiting and walking times, out-of-pocket monetary costs, availability and cost of parking, proximity/accessibility of transport mode

Estimate choice models

Qualitative choice analysis methods are used to describe and/or predict discrete choices of

decision-makers or to classify a discrete outcome according to a host of regressors. The need to

model choice and/or classification arises in transportation, energy, marketing, telecommunications,

and housing, to name but a few fields. There are, as always, a set of assumptions or requirements

about the data that need to be satisfied. The response variable (choice or classification) must meet

the following three criteria.

1. The set of choices or classifications must be finite.

2. The set of choices or classifications must be mutually exclusive; that is, a particular outcome can only be represented by one choice or classification.

3. The set of choices or classifications must be collectively exhaustive; that is, all choices or classifications must be represented by the choice set or classification.

Even when the 2nd and 3rd criteria are not met, the analyst can usually re-define the set of

alternatives or classifications so that the criteria are satisfied.

Planning Example: An analyst wishing to model mode choice for commute decisions defines the choice set as AUTO, BUS, RAIL, WALK, and BIKE. The modeler observed that a person in the database drove her personal vehicle to the transit station and then took a bus, violating the second criterion. To remedy this and similar problems that might arise, the analyst introduces some new choices (or classifications) into the modeling process: AUTO-BUS, AUTO-RAIL, WALK-BUS, WALK-RAIL, BIKE-BUS, BIKE-RAIL. By introducing these new categories the analyst has made the discrete choice data comply with the stated modeling requirements.
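The redefinition in this example can be sketched in a few lines of Python. This is only an illustration: the function name `classify_trip`, the representation of a trip as a list of mode legs, and the rule that an access mode is followed by a transit mode are assumptions, not part of the source.

```python
# Hypothetical helper: map an observed sequence of trip legs to exactly one
# alternative, so that the choice set stays mutually exclusive.
ACCESS_MODES = {"AUTO", "WALK", "BIKE"}
TRANSIT_MODES = {"BUS", "RAIL"}
BASE_MODES = ACCESS_MODES | TRANSIT_MODES


def classify_trip(legs):
    """Return a single alternative label for a list of mode legs."""
    unique = list(dict.fromkeys(legs))  # drop repeated legs, keep order
    if len(unique) == 1 and unique[0] in BASE_MODES:
        return unique[0]
    if len(unique) == 2 and unique[0] in ACCESS_MODES and unique[1] in TRANSIT_MODES:
        return f"{unique[0]}-{unique[1]}"  # access mode, then transit mode
    # Anything else is not covered: the choice set would not be exhaustive.
    raise ValueError(f"trip {legs!r} does not fit the defined choice set")


print(classify_trip(["AUTO", "BUS"]))
print(classify_trip(["WALK"]))
```

A trip that raises `ValueError` signals that the set is still not collectively exhaustive and needs a further category.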

Deriving Choice Models from Random Utility Theory

Choice models are developed from economic theories of random utility, whereas classification

models (classifying crash type, for example) are developed by minimizing classification errors with

respect to the Xs and classification levels Y. Because most of the literature in transportation is

focused on choice models and because mathematically choice models and classification models

are equivalent, the discussion here is based on choice models. Several assumptions are made

when deriving discrete choice models from random utility theory:



1. An individual is faced with a finite set of choices from which only one can be chosen.

2. Individuals belong to a homogeneous population, act rationally, possess perfect information, and always select the option that maximizes their net personal utility.

3. If C is defined as the universal choice set of discrete alternatives, and J the number of elements in C, then each member of the population has some subset of C as his or her choice set. Most decision-makers, however, have some subset $C_n$ that is considerably smaller than C. It should be recognized that defining a subset $C_n$, the feasible choice set for an individual, is not a trivial task; however, it is assumed that it can be determined.

4. Decision-makers are endowed with a subset of attributes $x_n \in X$, all measured attributes relevant in the decision making process.

Planning Example: In identifying the choice set of travel mode, the analyst identifies the universal choice set C to consist of the following:

1. driving alone
2. sharing a ride
3. taxi
4. motorcycle
5. bicycle
6. walking
7. transit bus
8. light rail transit

The analyst identifies a family whose choice set is fairly restricted because they do not own a vehicle, and so their choice set $C_n$ is given by:

1. sharing a ride
2. taxi
3. bicycle
4. walking
5. transit bus
6. light rail transit
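As a minimal sketch of how a restricted choice set might be derived from the universal one, the snippet below filters C by a single household attribute. The rule that only "driving alone" and "motorcycle" require vehicle ownership is an assumption chosen to reproduce the example; real feasibility rules are usually richer.

```python
UNIVERSAL_CHOICE_SET = [
    "driving alone", "sharing a ride", "taxi", "motorcycle",
    "bicycle", "walking", "transit bus", "light rail transit",
]

# Assumed rule: these alternatives are only feasible if the household owns a vehicle.
REQUIRES_OWN_VEHICLE = {"driving alone", "motorcycle"}


def feasible_choice_set(owns_vehicle):
    """Return C_n, the subset of C available to a household."""
    return [
        alt for alt in UNIVERSAL_CHOICE_SET
        if owns_vehicle or alt not in REQUIRES_OWN_VEHICLE
    ]


print(feasible_choice_set(owns_vehicle=False))
```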

The modeler, who is an OBSERVER of the system, does not possess complete information about

all elements considered important in the decision making process by all individuals making a

choice, so utility is broken down into two components, V and $\varepsilon$:

$$U_{in} = V_{in} + \varepsilon_{in}$$

where

$U_{in}$ is the overall utility of choice i for individual n,

$V_{in}$ is the systematic or measurable utility, which is a function of $x_n$ and $\beta_i$ for individual n and choice i, and

$\varepsilon_{in}$ is the random utility component, which includes idiosyncrasies and taste variations, combined with measurement or observation errors made by the modeler.

The error term allows for a couple of important cases: 1) two persons with the same measured attributes and facing the same choice set make different decisions; 2) some individuals do not select the best alternative (from the modeler's point of view, they demonstrate irrational behavior).
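Both cases can be illustrated with a small simulation. The sketch below gives every simulated individual the same systematic utilities and lets only the random components differ; the standard normal errors, the utility values, and the function name `simulate_share` are assumptions made purely for illustration.

```python
import random


def simulate_share(v1, v2, n_people=20000, seed=42):
    """Share of simulated individuals choosing alternative 1.

    All individuals share the systematic utilities v1 and v2; only the random
    components differ. Standard normal errors are an assumption of this sketch.
    """
    rng = random.Random(seed)
    chose_1 = 0
    for _ in range(n_people):
        u1 = v1 + rng.gauss(0.0, 1.0)  # U_1n = V_1n + eps_1n
        u2 = v2 + rng.gauss(0.0, 1.0)  # U_2n = V_2n + eps_2n
        if u1 >= u2:
            chose_1 += 1
    return chose_1 / n_people


share = simulate_share(v1=1.0, v2=0.5)
print(f"share choosing alternative 1: {share:.3f}")
```

Although alternative 1 has the higher systematic utility for everyone, a sizeable share still chooses alternative 2, which is exactly what the error term is meant to accommodate.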

The decision maker n chooses the alternative from which he derives the greatest utility. In the

binomial or two-alternative case, the decision-maker chooses alternative 1 if and only if:

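$$U_{1n} \ge U_{2n}$$

or when:

$$V_{1n} + \varepsilon_{1n} \ge V_{2n} + \varepsilon_{2n}.$$

In probabilistic terms, the probability that alternative 1 is chosen is given by:

$$\begin{aligned}
\Pr(1) &= \Pr(U_1 \ge U_2) \\
       &= \Pr(V_1 + \varepsilon_1 \ge V_2 + \varepsilon_2) \\
       &= \Pr(\varepsilon_2 - \varepsilon_1 \le V_1 - V_2).
\end{aligned}$$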

Note that this equation looks like a cumulative distribution function for a probability density. That is,

the probability of choosing alternative 1 (in the binomial case) is equal to the probability that the

difference in random utility is less than or equal to the difference in deterministic utility.

If $\varepsilon = \varepsilon_2 - \varepsilon_1$, which is the difference in unobserved utilities between alternatives 2 and 1 for travelers 1 through N (subscript not shown), then the probability distribution or density of $\varepsilon$, $f(\varepsilon)$, can be specified to form specific classes of models.

A couple of important observations about the probability density given by F (V1 - V2) can be made.

1. The error is small when there are large differences in systematic utility between alternatives one and two.

2. Large errors are likely when differences in utility are small, thus decision makers are more likely to choose an alternative on the wrong side of the indifference line (V1 - V2 = 0).

Alternative 1 is chosen when V1 - V2 > 0 (or when > 0), and alternative 2 is chosen when V1 - V2 < 0.

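Thus, for binomial models of discrete choice:

$$\Pr(1) = \Pr(\varepsilon \le V_1 - V_2) = \int_{-\infty}^{V_1 - V_2} f(\varepsilon)\, d\varepsilon = F(V_1 - V_2),$$

where F is the cumulative distribution function of $\varepsilon = \varepsilon_2 - \varepsilon_1$.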

The cumulative distribution function, or CDF, typically looks like:

[Figure: CDF plotted against V1 - V2]

This structure for the error term is a general result for binomial choice models. By making

assumptions about the probability density of the residuals, the modeler can choose between

several different binomial choice model formulations. Two types of binomial choice models are

most common and found in practice: the logit and the probit models. The logit model assumes a

logistic distribution of errors, and the probit model assumes normally distributed errors. These

models, however, are not practical for cases when there are more than two alternatives, and the probit

model is not easy to estimate (mathematically) for more than 4 to 5 choices.
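To make the two formulations concrete, the following sketch evaluates the binary choice probability for a given difference in systematic utility under each error assumption. It assumes the error difference follows a standard logistic distribution (logit) or a standard normal distribution (probit), with unit scale in both cases; the function names are illustrative.

```python
import math


def logit_probability(v1, v2):
    """Pr(1) when eps2 - eps1 follows a standard logistic distribution."""
    return 1.0 / (1.0 + math.exp(-(v1 - v2)))


def probit_probability(v1, v2):
    """Pr(1) when eps2 - eps1 follows a standard normal distribution."""
    return 0.5 * (1.0 + math.erf((v1 - v2) / math.sqrt(2.0)))


for diff in (-2.0, 0.0, 2.0):
    print(diff, round(logit_probability(diff, 0.0), 3), round(probit_probability(diff, 0.0), 3))
```

Both functions return 0.5 on the indifference line V1 - V2 = 0 and approach 0 or 1 as the utility difference grows, which is the S-shaped CDF described above.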

Mathematical Estimation of Choice Models

Recall that choice models involve a response Y with various levels (a set of choices or

classification), and a set of Xs that reflect important attributes of the choice decision or

classification. Usually the choice or classification of Y is modeled as a linear function or

combination of the Xs. Maximum likelihood methods are employed to solve for the betas in choice

models.

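Consider the likelihood of a sample of N independent observations with probabilities $p_1, p_2, \ldots, p_N$. The likelihood of the sample is simply the product of the individual likelihoods. The product is a maximum when the most likely set of ps is used.

i.e. Likelihood $L^* = p_1 p_2 p_3 \cdots p_N = \prod_{n=1}^{N} p_n$

For the binary choice model:

$$L^*(\beta_1, \ldots, \beta_K) = \prod_{n=1}^{N} \Pr_n(i)^{y_{in}} \, \Pr_n(j)^{y_{jn}}$$

where $\Pr_n(i)$ is a function of the betas, i and j are alternatives 1 and 2 respectively, and $y_{in}$ equals 1 if individual n chose alternative i and 0 otherwise. It is generally mathematically simpler to analyze the logarithm of $L^*$, rather than the likelihood function itself. Using the fact that $\ln(z_1 z_2) = \ln(z_1) + \ln(z_2)$, $\ln(z^x) = x \ln(z)$, $\Pr(j) = 1 - \Pr(i)$, and $y_{jn} = 1 - y_{in}$, the equation becomes:

$$L(\beta_1, \ldots, \beta_K) = \sum_{n=1}^{N} \left[ y_{in} \ln \Pr_n(i) + (1 - y_{in}) \ln\bigl(1 - \Pr_n(i)\bigr) \right]$$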

The maximum of L is found by differentiating the function with respect to each of the betas and setting the partial derivatives equal to zero; the solution gives the values of $\beta_1, \ldots, \beta_K$ that provide the maximum of L. In many cases the log likelihood function is globally concave, so that if a solution to the first order conditions exists, it is unique. This does not always have to be the case, however. Under general conditions the likelihood estimators can be shown to be consistent, asymptotically efficient, and asymptotically normal.

In more complex and realistic models, the likelihood function is evaluated as before, but instead of

estimating one parameter, there are many parameters associated with Xs that must be estimated,

and there are as many equations as there are Xs to solve. In practice the probabilities that

maximize the likelihood function are likely to be different across individuals (unlike the simplified

example below, where all individuals have the same probability).

Because the likelihood function is between 0 and 1, the log likelihood function is negative. The maximum of the log-likelihood function, therefore, is the negative value closest to zero (smallest in magnitude) that the log likelihood function attains given the data and specified probability functions.
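The estimation procedure can be sketched for the simplest non-trivial case: a binary logit model with a single beta. Everything specific here is an assumption for illustration: the eight hypothetical observations, the specification V1 - V2 = beta * x, and the use of Newton-Raphson iterations to solve the first order condition (the source does not prescribe a particular solver).

```python
import math

# Hypothetical data: x_n is one attribute difference between alternatives 1 and 2;
# y_n is 1 if individual n chose alternative 1, otherwise 0.
X = [-2.0, -1.0, -1.0, 0.0, 0.0, 1.0, 1.0, 2.0]
Y = [0, 0, 1, 0, 1, 1, 0, 1]


def prob_1(beta, x):
    """Binary logit probability of alternative 1 with V1 - V2 = beta * x."""
    return 1.0 / (1.0 + math.exp(-beta * x))


def log_likelihood(beta):
    """Sum over n of y ln Pr(1) + (1 - y) ln(1 - Pr(1))."""
    return sum(
        y * math.log(prob_1(beta, x)) + (1 - y) * math.log(1.0 - prob_1(beta, x))
        for x, y in zip(X, Y)
    )


def gradient(beta):
    """First derivative of the log-likelihood with respect to beta."""
    return sum((y - prob_1(beta, x)) * x for x, y in zip(X, Y))


def second_derivative(beta):
    """Second derivative; negative wherever some x is non-zero (concavity)."""
    return -sum(prob_1(beta, x) * (1.0 - prob_1(beta, x)) * x * x for x in X)


def estimate_beta(start=0.0, tol=1e-10, max_iter=50):
    """Newton-Raphson iterations on the first order condition gradient = 0."""
    beta = start
    for _ in range(max_iter):
        step = gradient(beta) / second_derivative(beta)
        beta -= step
        if abs(step) < tol:
            break
    return beta


beta_hat = estimate_beta()
print(beta_hat, log_likelihood(beta_hat))
```

The log-likelihood at `beta_hat` is negative but closer to zero than at any other beta, for example beta = 0, where every observation has probability 0.5.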

Planning Example. Suppose 10 individuals making travel choices between auto (A) and transit (T) were observed. All travelers are assumed to possess identical attributes (a really poor assumption), and so the probabilities are not functions of betas but simply a function of p, the probability of choosing Auto. The analyst also does not have any alternative-specific attributes, making this a very naive model that doesn't reflect reality. The likelihood function will be:

$$L^* = p^x (1-p)^{n-x} = p^7 (1-p)^3$$

where p = probability that a traveler chooses A,
1 - p = probability that a traveler chooses T,
n = number of travelers = 10,
x = number of travelers choosing A.

Recall that the analyst is trying to estimate p, the probability that a traveler chooses A. If 7 travelers were observed taking A and 3 taking T, then it can be shown that the maximum likelihood estimate of p is 0.7, or in other words, the value of $L^*$ is maximized when p = 0.7 and 1 - p = 0.3. All other combinations of p and 1 - p result in lower values of $L^*$. To see this, the analyst plots $L^*$ for values of P(T) from 0.0 to 1.0. The following plot is obtained:

Similarly (and in practice), one could use the log likelihood function to derive the maximum likelihood estimates, where $L = \log(L^*) = \log[p^7 (1-p)^3] = \log p^7 + \log (1-p)^3 = 7 \log p + 3 \log(1-p)$.

[Figure: Log-likelihood function]

Note that in this simple model p is the only parameter being estimated, so maximizing the likelihood function L* or log (L*) requires only one first order condition, the derivative of log (L*) with respect to p.
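The first order condition here is 7/p - 3/(1 - p) = 0, which gives p = 0.7. The same result can be checked numerically; the sketch below evaluates the log-likelihood on a grid of candidate values, where the grid spacing of 0.01 is an arbitrary choice made for illustration.

```python
import math


def log_likelihood(p, n_auto=7, n_transit=3):
    """L = 7 ln p + 3 ln(1 - p) for the ten observed travelers."""
    return n_auto * math.log(p) + n_transit * math.log(1.0 - p)


# Evaluate on a grid that excludes p = 0 and p = 1, where the log is undefined.
grid = [k / 100 for k in range(1, 100)]
p_hat = max(grid, key=log_likelihood)

print(p_hat)                            # grid point with the largest log-likelihood
print(math.exp(log_likelihood(p_hat)))  # L* = p^7 (1 - p)^3 at that point
```

The grid maximum falls at p = 0.7, matching the analytical solution.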

The Multinomial Logit Model

The multinomial logit (MNL) model is the most commonly applied model to explain and forecast

discrete choices due to its ease of estimation and foundation in utility theory. The MNL model is a

general extension of the binomial choice model to more than two alternatives. The universal choice set is C, which contains J elements, and a subset of C for each individual, $C_n$, defines their restricted choice set. It should be noted that it is not a trivial task to define restricted choice sets for individuals. In most cases $J_n$ for decision maker n is less than or equal to J, the total number of alternatives in the universal choice set; however, it is often assumed that all decision makers face the same set of universal alternatives.

Without showing the derivation, which can be found in the references for this chapter, the MNL model is expressed as:

$$P_n(i) = \frac{e^{V_{in}}}{\sum_{j \in C_n} e^{V_{jn}}}$$

where:

1. Utility for traveler n and mode i is $U_{in} = V_{in} + \varepsilon_{in}$.

2. $P_n(i)$ is the probability that traveler n chooses mode i.

3. The numerator is the exponentiated systematic utility of mode i for traveler n; the denominator is the sum of the exponentiated systematic utilities of all alternative modes in $C_n$ for traveler n.

4. The disturbances $\varepsilon_{in}$ are independently distributed.

5. The disturbances $\varepsilon_{in}$ are identically distributed.

6. The disturbances are Gumbel distributed with location parameter $\eta$ and a scale parameter $\mu > 0$.

The MNL model expresses the probability that a specific alternative is chosen as the exponential of the utility of the chosen alternative divided by the sum of the exponentials of the utilities of all alternatives (chosen and not chosen). The predicted probabilities are bounded by zero and one. There are several assumptions embedded in the estimation of MNL models.
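A minimal sketch of the MNL probability calculation for one traveler follows. The utility values and the function name `mnl_probabilities` are hypothetical; subtracting the largest utility before exponentiating is a common numerical safeguard that leaves the probabilities unchanged.

```python
import math


def mnl_probabilities(systematic_utilities):
    """Return P_n(i) for each alternative in one traveler's choice set C_n.

    systematic_utilities maps alternative name -> V_in.
    """
    v_max = max(systematic_utilities.values())  # shift for numerical stability
    exp_v = {alt: math.exp(v - v_max) for alt, v in systematic_utilities.items()}
    denominator = sum(exp_v.values())
    return {alt: value / denominator for alt, value in exp_v.items()}


# Hypothetical systematic utilities for a traveler with three feasible modes.
probs = mnl_probabilities({"transit bus": -1.0, "bicycle": -1.5, "walking": -2.5})
for alt, p in probs.items():
    print(f"{alt}: {p:.3f}")
```

The probabilities sum to one over the choice set, and each lies strictly between zero and one.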

Linear in parameters restriction:

The linear in parameters restriction is made for convenience of estimation, which enables simple

and efficient estimation of parameters. When the functional form of the systematic component of

the utility function is linear in parameters, the MNL model can be written as:

$$P_n(i) = \frac{e^{\beta' x_{in}}}{\sum_{j \in C_n} e^{\beta' x_{jn}}}$$

where $x_{in}$ and $x_{jn}$ are vectors describing the attributes of alternatives i and j as well as attributes of traveler n.
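With a linear-in-parameters utility, the systematic utility is simply the inner product of the betas and the attribute vector. The sketch below uses made-up coefficients for in-vehicle time and cost and made-up attribute values; none of these numbers come from the source.

```python
import math

# Hypothetical coefficients for in-vehicle time (minutes) and cost (dollars).
BETA = [-0.05, -0.40]


def systematic_utility(beta, x):
    """V_in = beta' x_in, a linear-in-parameters utility."""
    return sum(b * value for b, value in zip(beta, x))


def choice_probabilities(beta, attributes):
    """MNL probabilities for one traveler; attributes maps alternative -> x_in."""
    exp_v = {alt: math.exp(systematic_utility(beta, x)) for alt, x in attributes.items()}
    total = sum(exp_v.values())
    return {alt: value / total for alt, value in exp_v.items()}


# Hypothetical attribute vectors [time, cost] for each mode.
attributes = {"auto": [20.0, 4.0], "bus": [35.0, 1.5], "rail": [30.0, 2.0]}
probs = choice_probabilities(BETA, attributes)
for alt, p in probs.items():
    print(f"{alt}: V = {systematic_utility(BETA, attributes[alt]):.2f}, P = {p:.3f}")
```

Because the betas enter only through the inner product, changing one coefficient shifts every alternative's utility in proportion to the corresponding attribute, which is what makes estimation convenient.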

Independence from Irrelevant Alternatives Property (IIA)

Succinctly stated, the IIA property states that for a specific individual the ratio of the choice probabilities of any two alternatives is entirely unaffected by the systematic utilities of any other

alternatives. This property arises from the assumption in the deriv
