
Page 1: Consistency of nested logit models with utility maximizationweb.mit.edu/11.951/oldstuff/albacete/Other... · ordering amongst them, is considered to reproduce a random utility maximisation

CONSISTENCY OF NESTED LOGIT MODELS WITH UTILITY MAXIMIZATION J. Nicolás Ibáñez

Institute for Transport Studies, University of Leeds, UK School of Engineering, Universidad de Sevilla, Spain

Abstract This paper provides the minimum set of necessary and sufficient conditions for testing the global consistency of preference models with random utility maximization (RUM). We present the relation between extra conditions considered in the literature, not included in this minimum set, and the requirements for the integrability of demand systems. In addition, we review the conditions to prove consistency with RUM when only some values are allowed for the non-random part of the utilities, i.e., local consistency with RUM, which allows us to raise some concerns regarding the two main existing studies on this issue. We then apply this theoretical apparatus to study the consistency with RUM of two-level and three-level nested logit models, showing that, irrespective of the specific nesting structures of these models, the dissimilarity parameters measuring correlations between alternatives cannot take values greater than one. Finally, we further develop a procedure present in the literature to build the probabilities for two-level and three-level nested logit models, based on the sequential implementation of the random utility maximization governing two dependent choice processes. In so doing we enable an extension of the unit interval for the mentioned dissimilarity parameters and prove that any non-negative value of the dissimilarity parameters reproduces a model rooted in this sequential RUM.

Keywords: Nested logit, discrete choice, random utility maximization

1 INTRODUCTION

Random utility maximisation (RUM) was conceived by Marschak (1960) and Block and Marschak (1960) as a probabilistic representation of the Neo-Classical theory of individual choice. RUM preserves several fundamental tenets that characterise the Neo-Classical theory: both are couched at the individual level, both are based fundamentally on the notion that the individual acts so as to maximise his or her ‘utility’, and the proposition that utility is ordinal rather than cardinal is sufficient to support each. In contrast to Neo-Classical theory, however, RUM is specific to finite choice sets, and appeals to a notion of probabilistic choice. Hence define a finite set of discrete alternatives $\mathbf{N} = (1, \dots, N)$, and a feasible subset $\mathbf{M} \subseteq \mathbf{N}$, $\mathbf{M} = (1, \dots, I)$, $I \le N$. The probabilistic content of RUM arises from the propensity for an individual, when faced with the repetition of the same choice task, to exhibit variability in his or her preference ordering. Formally, we can represent this situation with a random vector $\mathbf{U} = (U_1, \dots, U_I)$, unique up to an increasing monotone transformation, such that for any $i \in \mathbf{M} \subseteq \mathbf{N}$ we can define the following probability with which alternative i will be chosen over the other alternatives in $\mathbf{M}$:

$$P_i = \Pr(U_i \ge U_j,\ j = 1, \dots, I), \quad i = 1, \dots, I$$

The following conditions are considered in Marschak (1960) and Block and Marschak (1960) to apply to this definition:

B&M1: $0 \le P_i \le 1,\ \forall i$, and $\sum_{j=1}^{I} P_j = 1$

B&M2: $P_i(U_1 + \alpha, \dots, U_I + \alpha) = P_i(U_1, \dots, U_I),\ \forall i,\ \forall \alpha \in \mathbb{R}$

B&M3: $P_i(\lambda U_1, \dots, \lambda U_I) = P_i(U_1, \dots, U_I),\ \forall i,\ \forall \lambda \ge 0$

Since condition B&M1 is inherent in the definition of probability, it is reasonably trivial to the definition of RUM, and further elaboration would seem unnecessary. B&M2 states that a common constant may be added to the utility of all alternatives without changing probability. B&M3, similarly, states that probability is unaffected by the multiplication of each utility by a common factor. Taking B&M2 and B&M3 together, probability is robust to increasing linear transformations of utility; $\mathbf{U}$ cannot, therefore, be uniquely defined. The latter property is entirely consistent with the notion of ordinal utility. In subsequent work, Marschak et al. (1963) introduced much of the apparatus by which RUM could be implemented. This arises from the authors’ investigation of the relationship between binary RUM, as defined above, and the so-called Fechner model. The Fechner model originates from psychophysics, a domain which naturally appeals to binary models (e.g. ‘is stimulus a stronger or weaker than stimulus b?’). Marschak et al., more specifically, presented a proof that Fechner models with particular distributional assumptions are binary RUM. Intrinsic to the proof is the proposition that each $U_i \in \mathbf{U}$ may be dissected into two components, one deterministic or systematic and one random, which will be the approach taken for the analysis of RUM in this paper, though this implies, somewhat inevitably, that utility mutates from an ordinal metric to a cardinal one (Batley, 2005).
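The robustness of the choice probabilities to increasing linear transformations of utility (B&M2 and B&M3 taken together) can be illustrated with a small simulation. The sketch below is only an illustration: the three alternatives, the Gaussian utilities and the values of `alpha` and `lam` are hypothetical choices, not part of the paper.

```python
import random

def choose(utilities):
    """Return the index of the alternative with the highest utility."""
    return max(range(len(utilities)), key=lambda i: utilities[i])

def choice_frequencies(draws):
    """Empirical choice probabilities P_i from a list of utility vectors."""
    counts = [0] * len(draws[0])
    for u in draws:
        counts[choose(u)] += 1
    return [c / len(draws) for c in counts]

rng = random.Random(0)
# Hypothetical example: three alternatives with Gaussian random utilities.
means = [0.0, 0.5, 1.0]
draws = [[rng.gauss(m, 1.0) for m in means] for _ in range(10_000)]

# A common shift (B&M2) and a common positive scale factor (B&M3).
alpha, lam = 3.0, 2.5
transformed = [[lam * u + alpha for u in vec] for vec in draws]

p_original = choice_frequencies(draws)
p_transformed = choice_frequencies(transformed)
print(p_original, p_transformed)
```

Because the transformation preserves the ordering of the utilities within every draw, the same alternative is chosen each time and the two sets of frequencies coincide.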

2 DEFINITION, THEOREM AND COROLLARY REGARDING RANDOM UTILITY MAXIMISATION

Having introduced the basics of RUM, we proceed now by presenting the following definitions applicable to this random utility maximising behavioural paradigm.

Definition 1: A set of I probabilities, one assigned to each of the I alternatives considered by an individual when revealing his or her preference ordering amongst them, is considered to reproduce a random utility maximisation paradigm if we can write:

$$P_i = \Pr(U_i \ge U_j,\ j = 1, \dots, I), \quad i = 1, \dots, I$$

where $P_i$ is the probability with which the model stipulates that alternative i will be chosen, and $U_i$ is the (random) utility of alternative i.

If we adopt the cardinal approach to utilities introduced before, considering the utility of each alternative to be composed of a random and a deterministic part, $U_i = -v_i + \varepsilon_i$, we can define the consistency with random utility maximisation of a probability system by analysing the I different joint distributions of the $I-1$ differences of random terms, each of them associated with taking one of the I alternatives as the reference with respect to which the differences are taken. Then, a set of I probabilities is considered to reproduce a random utility maximisation paradigm if we can write:

$$P_i = \Pr(\varepsilon_j - \varepsilon_i \le v_j - v_i,\ j = 1, \dots, I), \quad i = 1, \dots, I$$

with each of the following I random vectors of dimension $I-1$, $\boldsymbol{\eta}^i = (\varepsilon_1 - \varepsilon_i, \dots, \varepsilon_I - \varepsilon_i),\ i = 1, \dots, I$, jointly distributed as an absolutely continuous, proper, non-defective and, at least, translationally invariant vectorial random variable¹.

By translational invariance we mean that the cumulative distribution of $\boldsymbol{\eta}^i$ does not change when translating the vector of systematic parts of the utilities $\mathbf{v} = (v_1, \dots, v_I)$ to $\mathbf{v} + c\,\boldsymbol{\iota}_I = (v_1 + c, \dots, v_I + c)$ for any real constant $c$ (see, for instance, Daly, 2004).
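As a concrete instance of the form $P_i = \Pr(\varepsilon_j - \varepsilon_i \le v_j - v_i)$ and of translational invariance, the sketch below uses the multinomial logit model, which arises when the $\varepsilon_i$ are assumed i.i.d. Gumbel. This distributional assumption, the function name `logit_probabilities` and the numerical values are illustrative choices, not part of the paper.

```python
import math

def logit_probabilities(v):
    """P_i = exp(-v_i) / sum_j exp(-v_j), i.e. U_i = -v_i + eps_i with
    i.i.d. Gumbel eps_i (an illustrative assumption)."""
    m = min(v)  # subtract the smallest v_i for numerical stability
    weights = [math.exp(-(x - m)) for x in v]
    total = sum(weights)
    return [w / total for w in weights]

v = [1.0, 2.0, 0.5]   # hypothetical systematic parts (costs)
c = 7.3               # common constant added to every v_i

p = logit_probabilities(v)
p_shifted = logit_probabilities([x + c for x in v])

# Translational invariance: the probabilities depend only on differences.
max_gap = max(abs(a - b) for a, b in zip(p, p_shifted))
print(p, max_gap)
```

The probabilities are unchanged by the shift because only the differences $v_j - v_i$ enter the expression.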

Next we present two definitions employed throughout the paper, both of which rely on the cardinality of utilities, that is, the assumption that the utilities are divided into a systematic and a random part, $U_i = -v_i + \varepsilon_i$ for any $i = 1, \dots, I$.

Definition 2: A set of I probabilities is globally compatible with random utility maximisation if for any real value of the I systematic parts of the utilities, i.e., $\forall \mathbf{v} = (v_1, \dots, v_I) \in \mathbb{R}^I$, we can write:

$$P_i = \Pr(\varepsilon_j - \varepsilon_i \le v_j - v_i,\ j = 1, \dots, I), \quad i = 1, \dots, I$$

with $(\varepsilon_1 - \varepsilon_i, \dots, \varepsilon_I - \varepsilon_i)$ a random vector which follows an absolutely continuous, proper, non-defective and translationally invariant distribution.

Definition 3: A set of I probabilities is locally compatible with random utility maximisation in a set $\boldsymbol{\Theta} \subseteq \mathbb{R}^I$ if for any value of the I systematic parts of the utilities contained in this set, i.e., $\forall \mathbf{v} = (v_1, \dots, v_I) \in \boldsymbol{\Theta}$, we can write:

$$P_i = \Pr(\varepsilon_j - \varepsilon_i \le v_j - v_i,\ j = 1, \dots, I), \quad i = 1, \dots, I$$

with $(\varepsilon_1 - \varepsilon_i, \dots, \varepsilon_I - \varepsilon_i)$ a random vector which follows an absolutely continuous, proper, non-defective and translationally invariant distribution.

Having introduced the definition of random utility maximisation and of global and local compatibility with it, we now present the following theorem.

Theorem 1: The following two conditions, J-1 and J-2, imposed over an absolutely continuous real valued function $f_i: \mathbb{R}^{I-1} \to \mathbb{R}$, are necessary and sufficient for the global compatibility of a set of I probabilities, $P_i(\mathbf{v}),\ i = 1, \dots, I$, with random utility maximisation when the utilities $U_i = -v_i + \varepsilon_i,\ i = 1, \dots, I$, of at least $I-1$ of the alternatives are considered to be absolutely continuous:

J-1: $f_i(\boldsymbol{\eta}^i) \ge 0,\ \forall \boldsymbol{\eta}^i \in \mathbb{R}^{I-1},\ i = 1, \dots, I$

J-2: $\int_{\mathbb{R}^{I-1}} f_i(\boldsymbol{\eta}^i)\, d\boldsymbol{\eta}^i = 1,\ i = 1, \dots, I$

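Where the variables involved are defined as follows:

$$
\begin{aligned}
\mathbf{w}^i &= (w^i_1, \dots, w^i_{i-1}, w^i_{i+1}, \dots, w^i_I) = \mathbf{v}_{-i} - v_i\,\boldsymbol{\iota}_{I-1} \\
\mathbf{v}_{-i} &= (v_1, \dots, v_{i-1}, v_{i+1}, \dots, v_I) \\
\boldsymbol{\iota}_{I-1} &= (1, \dots, 1) \in \mathbb{R}^{I-1} \\
\boldsymbol{\eta}^i &= (\eta^i_1, \dots, \eta^i_{i-1}, \eta^i_{i+1}, \dots, \eta^i_I) = \boldsymbol{\varepsilon}_{-i} - \varepsilon_i\,\boldsymbol{\iota}_{I-1} \\
\boldsymbol{\varepsilon}_{-i} &= (\varepsilon_1, \dots, \varepsilon_{i-1}, \varepsilon_{i+1}, \dots, \varepsilon_I) \\
(\mathbf{w}^i, 0_{(i)}) &= (w^i_1, \dots, w^i_{i-1}, 0, w^i_{i+1}, \dots, w^i_I) \in \mathbb{R}^{I}
\end{aligned}
\qquad \text{(DJ-1)}
$$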

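And where the system of probabilities has the following relation with the mentioned real valued function $f_i$:

$$
\begin{aligned}
P_i(\mathbf{w}^i, 0_{(i)}) &= \int_{-\infty\boldsymbol{\iota}_{I-1}}^{\mathbf{w}^i} f_i(\boldsymbol{\eta}^i)\, d\boldsymbol{\eta}^i, && i = 1, \dots, I \\
P_j(\mathbf{w}^i, 0_{(i)}) &= \int_{w^i_j}^{\infty} \int_{\{\eta^i_k \le \eta^i_j + w^i_k - w^i_j,\ k \ne i, j\}} f_i(\boldsymbol{\eta}^i) \prod_{k \ne i, j} d\eta^i_k \; d\eta^i_j, && j \ne i,\ j = 1, \dots, I
\end{aligned}
\qquad \text{(DJ-2)}
$$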
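For the binary case ($I = 2$) the objects in Theorem 1 can be checked numerically. The sketch below assumes, purely for illustration, that the difference $\eta = \varepsilon_2 - \varepsilon_1$ follows a standard logistic distribution (the binary logit model); the integration bounds, grid and value of `w` are arbitrary choices. It verifies J-1 on a grid, J-2 by quadrature, and evaluates the two probabilities of DJ-2 with alternative 1 as reference.

```python
import math

def f(eta):
    """Standard logistic density (assumed distribution of eps_2 - eps_1)."""
    e = math.exp(-abs(eta))  # the density is symmetric; this avoids overflow
    return e / (1.0 + e) ** 2

def integrate(func, lo, hi, n=20_000):
    """Composite trapezoidal rule on [lo, hi]."""
    h = (hi - lo) / n
    total = 0.5 * (func(lo) + func(hi))
    for k in range(1, n):
        total += func(lo + k * h)
    return total * h

LOW, HIGH = -40.0, 40.0  # numerical stand-ins for -infinity and +infinity

nonneg = all(f(LOW + 0.1 * k) >= 0.0 for k in range(801))  # J-1 on a grid
mass = integrate(f, LOW, HIGH)                             # J-2

w = 0.8                       # w = v_2 - v_1
p1 = integrate(f, LOW, w)     # DJ-2: reference alternative 1 is chosen
p2 = integrate(f, w, HIGH)    # DJ-2: alternative 2 is chosen
closed_form = 1.0 / (1.0 + math.exp(-w))
print(nonneg, mass, p1, p2, closed_form)
```

The two integration domains $(-\infty, w]$ and $[w, \infty)$ cover the whole real line, which is why the two probabilities add up to the total mass of the density.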

Proof: Given the absolute continuity of the function $f_i$ contained in the theorem, conditions J-1 and J-2 are necessary and sufficient to guarantee that $f_i(\boldsymbol{\eta}^i)$ is a properly specified density function (see, for instance, Rohatgi, 1976). Therefore, following the definition of the probabilities contained in the theorem (DJ-2), which embodies the basics of RUM when utilities are split into a deterministic part ($v_i$) and a random part ($\varepsilon_i$), and Definition 2 on global compatibility with RUM, we can finally guarantee that J-1 and J-2 are necessary and sufficient conditions to ensure the global compatibility with RUM of the probability system defined in the theorem. The requirement of absolute continuity for only $I-1$ of the random terms is due to the consideration of only differences of random terms, which allows one of the utilities to have a discrete distribution that is converted into an absolutely continuous one once the mentioned differences are taken. Notice that, in theory, we should only need to verify conditions J-1 and J-2 for one i, that is, for only one alternative of reference, since the probabilities of the I alternatives can be expressed in terms of it. However, for reasons of consistency, we have to include the possibility of considering each of the I alternatives as the reference, which leaves us with having to prove the proper specification of I distributions. This fact, for instance, is the one that will allow us in section 3.1 to point out some imprecise characteristics of the work of Börsch-Supan (1990).

2.1 Theorem 1 and basic RUM conditions in Block & Marschak (1960)

The global compatibility with RUM, given the definition DJ-2 of a probability system in Theorem 1, implies that the three conditions of Marschak (1960) and Block & Marschak (1960) previously introduced, that is, B&M1, B&M2 and B&M3, have to be met $\forall \mathbf{v} \in \mathbb{R}^I$. Condition B&M2, about the translational invariance of the probabilities, is met if these are calculated by the definition DJ-2, whose integration limits are not affected when a real constant is added to all the systematic parts of the utilities. Condition B&M3, which states that multiplying all the utilities by a positive real constant has no effect on the probability system, is also met if probabilities are calculated by the definition DJ-2, since we have that:

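$$P_i(\lambda \mathbf{w}^i, 0_{(i)}) = \int_{-\infty\boldsymbol{\iota}_{I-1}}^{\lambda \mathbf{w}^i} f_i(\boldsymbol{\eta}^i)\, d\boldsymbol{\eta}^i \overset{\boldsymbol{\eta}^i = \lambda \boldsymbol{\varsigma}^i,\ \lambda > 0}{=} \int_{-\infty\boldsymbol{\iota}_{I-1}}^{\mathbf{w}^i} f_i(\lambda \boldsymbol{\varsigma}^i)\, \lambda^{I-1}\, d\boldsymbol{\varsigma}^i = P_i(\mathbf{w}^i, 0_{(i)}), \quad i = 1, \dots, I$$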

The first part of condition B&M1, about the non-negativity of the probabilities, is met if condition J-1 holds, simply by following definition DJ-2. The second part of the same condition B&M1, about the unitary sum of the I probabilities, is met if condition J-2 holds, because the domains of the I integrals involved in the definition DJ-2, taking the same alternative as reference, cover exactly $\mathbb{R}^{I-1}$.

2.2 Restrictions on the probability systems to guarantee global compatibility with random utility maximisation

If we interpret Theorem 1 in terms of restrictions imposed directly on the probability system $P_i(\mathbf{v}),\ i = 1, \dots, I$, we can define a more practical set of conditions, as we show in the following corollary.

Corollary 1: The following three conditions, S-1, S-2 and S-3, imposed over an absolutely continuous probability system, $P_i(\mathbf{v}),\ i = 1, \dots, I$, are necessary and sufficient to guarantee the global compatibility of such a system with random utility maximisation:

S-1: $\dfrac{\partial^{I-1} P_i(\boldsymbol{\eta}^i, 0_{(i)})}{\partial \boldsymbol{\eta}^i} \ge 0,\ \forall \boldsymbol{\eta}^i \in \mathbb{R}^{I-1},\ i = 1, \dots, I$

S-2: $\lim_{\eta^i_j \to -\infty} P_i(\boldsymbol{\eta}^i, 0_{(i)}) = 0$, $\lim_{\boldsymbol{\eta}^i \to \infty\boldsymbol{\iota}_{I-1}} P_i(\boldsymbol{\eta}^i, 0_{(i)}) = 1$, $i \ne j = 1, \dots, I$

S-3: $P_i(\boldsymbol{\eta}^i, 0_{(i)}) = P_i(\boldsymbol{\eta}^i + c\,\boldsymbol{\iota}_{I-1}, c_{(i)}),\ \forall c \in \mathbb{R},\ i = 1, \dots, I$

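Where the variables involved are defined as in Theorem 1, and where the definition of probabilities is now the following:

$$
\begin{aligned}
P_i(\mathbf{w}^i, 0_{(i)}) &= \int_{-\infty\boldsymbol{\iota}_{I-1}}^{\mathbf{w}^i} \frac{\partial^{I-1} P_i(\boldsymbol{\eta}^i, 0_{(i)})}{\partial \boldsymbol{\eta}^i}\, d\boldsymbol{\eta}^i, && i = 1, \dots, I \\
P_j(\mathbf{w}^i, 0_{(i)}) &= \int_{w^i_j}^{\infty} \int_{\{\eta^i_k \le \eta^i_j + w^i_k - w^i_j,\ k \ne i, j\}} \frac{\partial^{I-1} P_i(\boldsymbol{\eta}^i, 0_{(i)})}{\partial \boldsymbol{\eta}^i} \prod_{k \ne i, j} d\eta^i_k \; d\eta^i_j, && j \ne i,\ j = 1, \dots, I
\end{aligned}
$$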
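Conditions S-1, S-2 and S-3 are stated directly on the probabilities, so they can be checked for a candidate model without knowing the underlying density. The sketch below does this for the binary logit probability with $U_i = -v_i + \varepsilon_i$; the model, the grid, the finite-difference step and the large finite arguments used in place of the limits are all illustrative assumptions.

```python
import math

def p1(v1, v2):
    """Binary logit probability of alternative 1 when U_i = -v_i + eps_i."""
    return 1.0 / (1.0 + math.exp(-(v2 - v1)))

def derivative(func, x, h=1e-5):
    """Central finite difference approximation of func'(x)."""
    return (func(x + h) - func(x - h)) / (2.0 * h)

grid = [-10.0 + 0.25 * k for k in range(81)]

# S-1: the derivative of P_1(0, eta) with respect to eta is non-negative.
s1 = all(derivative(lambda e: p1(0.0, e), eta) >= 0.0 for eta in grid)

# S-2: limits 0 and 1, checked at large finite arguments.
s2 = p1(0.0, -50.0) < 1e-12 and 1.0 - p1(0.0, 50.0) < 1e-12

# S-3: adding a common constant c to both arguments changes nothing.
s3 = all(abs(p1(0.0, eta) - p1(c, eta + c)) < 1e-12
         for eta in grid for c in (-3.0, 0.5, 12.0))
print(s1, s2, s3)
```

A grid check of this kind can only suggest, not prove, that S-1 holds for every real argument; an analytical argument is still needed for global compatibility.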

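Proof: This corollary of Theorem 1 follows directly if we define the absolutely continuous real valued function $f_i: \mathbb{R}^{I-1} \to \mathbb{R}$ presented in this theorem to be the following:

$$f_i(\boldsymbol{\eta}^i) = \frac{\partial^{I-1} P_i(\boldsymbol{\eta}^i, 0_{(i)})}{\partial \boldsymbol{\eta}^i},\ \forall \boldsymbol{\eta}^i \in \mathbb{R}^{I-1},\ i = 1, \dots, I$$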

As before, the definition of the probabilities ensures that the whole of $\mathbb{R}^{I-1}$ is covered by the union of the disjoint integration domains of the I different probabilities, irrespective of the alternative that is taken as reference. This definition and condition S-2 are the ones responsible for the fact that the I probabilities sum up to one. For instance, for the case where there are three alternatives available ($I = 3$), we represent in Figure 1 how the three corresponding integration domains of the density function $\partial^2 P_1(\eta^1_2, \eta^1_3) / \partial \eta^1_2\, \partial \eta^1_3$, i.e., when taking alternative 1 as reference, cover exactly the entire $\mathbb{R}^2$:

[Figure 1: the $(\eta^1_2, \eta^1_3)$ plane divided into three regions labelled $P_1$, $P_2$ and $P_3$ by the lines $\eta^1_2 = w^1_2 = v_2 - v_1$, $\eta^1_3 = w^1_3 = v_3 - v_1$ and $\eta^1_3 = \eta^1_2 + w^1_3 - w^1_2$]

Figure 1. Integration domains for the case of three available alternatives
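The partition of the plane into three integration domains can also be checked by brute force. In the sketch below, with alternative 1 as reference, $\eta_k = \varepsilon_k - \varepsilon_1$ and $w_k = v_k - v_1$; the values of `w2` and `w3`, the sampling range and the function name are hypothetical. Every sampled point should fall in exactly one region (ties on the boundaries have probability zero).

```python
import random

def regions_containing(eta2, eta3, w2, w3):
    """List the alternatives whose integration domain contains (eta2, eta3),
    taking alternative 1 as reference."""
    a = eta2 - w2  # utility advantage of alternative 2 over alternative 1
    b = eta3 - w3  # utility advantage of alternative 3 over alternative 1
    regions = []
    if a <= 0.0 and b <= 0.0:   # domain of P_1
        regions.append(1)
    if a >= 0.0 and b <= a:     # domain of P_2: eta3 <= eta2 + w3 - w2
        regions.append(2)
    if b >= 0.0 and a <= b:     # domain of P_3: eta2 <= eta3 + w2 - w3
        regions.append(3)
    return regions

rng = random.Random(1)
w2, w3 = 0.4, -1.1  # hypothetical differences of systematic utilities
hits = [regions_containing(rng.uniform(-5, 5), rng.uniform(-5, 5), w2, w3)
        for _ in range(20_000)]
exactly_one = all(len(r) == 1 for r in hits)
print(exactly_one)
```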

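Regarding condition S-1 about the non-negativity of the implied density function for the differences of random terms, translational invariance (condition S-3) and non-defectiveness (condition S-2) also have to be met by the probability system in order to state that $\partial^{I-1} P_i(\boldsymbol{\eta}^i, 0_{(i)}) / \partial \boldsymbol{\eta}^i$ is effectively a density function and that:

$$P_i(\mathbf{w}^i, 0_{(i)}) = \int_{-\infty\boldsymbol{\iota}_{I-1}}^{\mathbf{w}^i} \frac{\partial^{I-1} P_i(\boldsymbol{\eta}^i, 0_{(i)})}{\partial \boldsymbol{\eta}^i}\, d\boldsymbol{\eta}^i$$

The explanation for this is contained in the ensuing equalities:

$$P_i(\mathbf{v}) \overset{\text{Translational invariance}}{=} P_i(\mathbf{w}^i, 0_{(i)}) \overset{\text{Non-defectiveness}}{=} \int_{-\infty\boldsymbol{\iota}_{I-1}}^{\mathbf{w}^i} \frac{\partial^{I-1} P_i(\boldsymbol{\eta}^i, 0_{(i)})}{\partial \boldsymbol{\eta}^i}\, d\boldsymbol{\eta}^i$$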

Notice, in addition, that translational invariance is not embodied by conditions S-1 and S-2, since these guarantee that the vector of $I-1$ differences of random terms is properly specified, but do not give any reason to believe that translational invariance is met by the probabilities calculated from such random terms. Of course, if we go to the definition of the probabilities in corollary 1, we see that this holds; but if we start, as would be the most usual case in practice, from a given expression of the probabilities, then even if S-1 and S-2 hold we have to check that condition S-3 is met in order to ensure a translationally invariant probability system. Notice that translational invariance was not an issue in Theorem 1 because the conditions were not placed on the probabilities. Moreover, these were built using a definition of the random terms that does not directly involve the probabilities, unlike in corollary 1, and therefore translational invariance followed directly from the redefinition of the integration limits. In line with the discussion introduced in Theorem 1, the corollary ensures that compatibility with RUM does not depend on which of the I alternatives available to an individual is chosen as the reference with respect to which the differences of random terms are calculated. In summary, with corollary 1 we have aimed to derive the minimum set of necessary and sufficient conditions to guarantee the global compatibility of an absolutely continuous probability system with RUM. In doing so we have built a set that is not as binding as the one contained in Daly & Zachary (1976), which is one of the first seminal works on ensuring the consistency of a probability system with RUM by directly imposing conditions on it. In later sections we analyse this greater generality of our approach in more detail, devoting special attention to the fact that corollary 1 implicitly ensures symmetrical probability systems and the non-negativity of all mixed partial derivatives (not only the highest, or $(I-1)$-order, one).

Having presented and evaluated the results of Theorem 1 and its corollary about the global compatibility of a model with RUM, as stated in definition 2, we proceed next to study not the global but the local consistency of a probability system with RUM, as stated in definition 3.

3 LOCAL COMPATIBILITY WITH RANDOM UTILITY MAXIMISATION

We introduce this concept by considering, without loss of generality, a three-alternative case, in which we are only interested in allowing some values for the differences of the systematic parts of the utilities available in each choice set. That is:

$$U_i = -v_i + \varepsilon_i,\ i = 1, 2, 3 \ \Rightarrow\ (v_2 - v_1, v_3 - v_1) = (w_1, w_2) \in \boldsymbol{\Omega} = [\mathbf{a}, \mathbf{b}] \subseteq \mathbb{R}^2$$

with the interval defined as:

$$\boldsymbol{\Omega} \equiv \{(w_1, w_2):\ a_1 \le w_1 \le a_2;\ b_1 \le w_2 \le b_2\} \subseteq \mathbb{R}^2$$

By applying Theorem 1 we can observe at this early stage that random utility maximisation is guaranteed in this local case if and only if the distributions of $\boldsymbol{\eta}^1, \boldsymbol{\eta}^2, \boldsymbol{\eta}^3$ continue to be properly specified in the entire real domain, $\mathbb{R}^2$ in this three-alternative case, with $\boldsymbol{\eta}^1 = (\varepsilon_2 - \varepsilon_1, \varepsilon_3 - \varepsilon_1)$ and so on. The reason is that the variation of the random terms is not restricted to the interval $[\mathbf{a}, \mathbf{b}]$, since by construction such variation covers exactly the entire real domain². This means that the conditions to ensure local compatibility with RUM (see definition 3) must be equivalent to the conditions to prove global compatibility included in Theorem 1, at least for the case of $I-1$ absolutely continuous utilities. When we translate this fact to corollary 1, we have that, since the systematic (non-random) parts of the utilities are forced to be contained in $[\mathbf{a}, \mathbf{b}]$, we will not have to calculate the probabilities outside this interval, and therefore the definition of the density functions in the corollary in terms of the $(I-1)$-order derivative of the probabilities for the entire real domain could be relaxed.

Combining the statements contained in the last two paragraphs, we can contextualise the studies of Börsch-Supan (1990) and Koning & Ridder (1994) on local compatibility with RUM, both focused exclusively on the case where three alternatives are available in each choice situation. They both pursue the mentioned double objective: to define a proper density function in $\mathbb{R}^{I-1}$, with I the number of alternatives available to individuals ($I = 3$), while only having to define this density equal to the $(I-1)$-order derivative of the probabilities in the domain allowed for the systematic parts of the utilities³. Thus:

$$\frac{\partial^{I-1} P_i(\boldsymbol{\eta}^i, 0_{(i)})}{\partial \boldsymbol{\eta}^i} \ge 0,\ \forall \boldsymbol{\eta}^i \in \boldsymbol{\Omega} = [\mathbf{a}, \mathbf{b}] \subseteq \mathbb{R}^{I-1},\ i = 1, \dots, I$$

The distribution of $\boldsymbol{\eta}^i$ also has to be defined, by construction, in the rest of the domain, $\bar{\boldsymbol{\Omega}}$. Precisely, the second objective is to choose the distribution of $\boldsymbol{\eta}^i$ in $\bar{\boldsymbol{\Omega}}$ so as to guarantee the same expressions for the probabilities as in the general case, i.e., when $\boldsymbol{\Omega} \equiv \mathbb{R}^{I-1}$. The motivation behind this double objective is to ease the non-negativity condition of corollary 1 in this paper (condition S-1), thus allowing for less binding restrictions on model parameters while keeping the same probability expressions in the model. Next, we show how the two mentioned studies fall somewhat short of fully achieving this double objective.

3.1 A comment on local compatibility with RUM and Börsch-Supan (1990)

This seminal paper is considered to be the first to have addressed the issue of local compatibility of a choice model with stochastic utility maximisation, though it does not include an exact definition of the concept. After stating the utility of each of the alternatives available to an individual to be the sum of a deterministic component and an additive disturbance, $U_i = -v_i + \varepsilon_i$, the paper refers to McFadden (1981) to state that an individual is said to maximise his stochastic utility if he prefers alternative i over alternative j if and only if⁴ $U_i > U_j$.

Then, referring to Williams (1977), Daly & Zachary (1979) and McFadden (1981), the following compatibility conditions are imposed over a given set of choice probabilities so as to guarantee that they define a stochastic utility maximisation model with an implied joint distribution of the stochastic utility components. These conditions are defined to be the following three⁵:

BS–1: $P_i(\mathbf{v}) \ge 0,\ i = 1, \dots, I$; $\sum_{i=1}^{I} P_i(\mathbf{v}) = 1$; $P_i(\mathbf{v}) = P_i(\mathbf{v} + c\,\boldsymbol{\iota}_I),\ i = 1, \dots, I,\ \forall c \in \mathbb{R}$ (basic requirements and translational invariance)

BS–2: $\dfrac{\partial P_i(\mathbf{v})}{\partial v_j} = \dfrac{\partial P_j(\mathbf{v})}{\partial v_i},\ i, j = 1, \dots, I$ (symmetry)

BS–3: $\dfrac{\partial^{I-1} P_i(\mathbf{v})}{\partial v_1 \cdots [\partial v_i] \cdots \partial v_I} \ge 0,\ i = 1, \dots, I$ (non-negativity), where the brackets indicate that the derivative with respect to $v_i$ is omitted
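Conditions BS–2 and BS–3 can be explored numerically for a given probability system. The sketch below uses a three-alternative multinomial logit with $U_i = -v_i + \varepsilon_i$ as a stand-in model (an illustrative assumption, as are the point `v` and the finite-difference steps). For this model the derivatives have the closed forms $\partial P_i / \partial v_j = P_i P_j$ for $j \ne i$ and $\partial^2 P_i / \partial v_j \partial v_k = 2 P_1 P_2 P_3$, which the finite differences should reproduce.

```python
import math

def probs(v):
    """Multinomial logit with U_i = -v_i + eps_i (illustrative assumption)."""
    weights = [math.exp(-x) for x in v]
    total = sum(weights)
    return [w / total for w in weights]

def d_first(i, j, v, h=1e-5):
    """Central difference approximation of dP_i / dv_j."""
    up = list(v); up[j] += h
    dn = list(v); dn[j] -= h
    return (probs(up)[i] - probs(dn)[i]) / (2.0 * h)

def d_mixed(i, j, k, v, h=1e-3):
    """Central difference approximation of d^2 P_i / (dv_j dv_k), j != k."""
    total = 0.0
    for sj, sk in ((1, 1), (1, -1), (-1, 1), (-1, -1)):
        x = list(v); x[j] += sj * h; x[k] += sk * h
        total += sj * sk * probs(x)[i]
    return total / (4.0 * h * h)

v = [0.3, 1.2, -0.4]  # hypothetical systematic parts

# BS-2: dP_i/dv_j equals dP_j/dv_i for every pair (i, j).
symmetry_gap = max(abs(d_first(i, j, v) - d_first(j, i, v))
                   for i in range(3) for j in range(3))

# BS-3 with I = 3: for each P_i, differentiate once in each of the other v's.
mixed = [d_mixed(0, 1, 2, v), d_mixed(1, 0, 2, v), d_mixed(2, 0, 1, v)]
p = probs(v)
expected = 2.0 * p[0] * p[1] * p[2]
print(symmetry_gap, mixed, expected)
```

A check at a single point says nothing about the remaining values of `v`, which is exactly the distinction between local and global compatibility discussed in this section.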

Condition BS–1 represents the basic requirements of probability systems and imposes that the comparisons between available alternatives can only depend on the differences between non-random utilities (translational invariance). Condition BS–2 is stated to guarantee the integrability of the $P_i$ and to be a straightforward analogue to the Slutsky condition in continuous demand analysis. In section 4 we discuss this issue of the integrability of demand systems, where we consider not only this symmetry condition BS–2 but also the one related to the negative semi-definiteness of the Slutsky matrix $\mathbf{S} = [S_{ij}]$, with $S_{ij} = \partial P_i(\mathbf{v}) / \partial v_j$.

Finally, condition BS–3 is stated to ensure a non-negative density function for the random terms that give rise to the probability system $P_i(\mathbf{v}),\ i = 1, \dots, I$. We first notice that the original formulation of Börsch-Supan needs to be changed so as to recover the consistency between this condition and the definition of the utilities. It can be easily shown that condition BS–3 applies when the utilities are defined as $U_i = -v_i + \varepsilon_i$, with $v_i$ indicating the cost of alternative i, and not when the utilities are defined as $U_i = v_i + \varepsilon_i$, which is the case in Börsch-Supan (1990, p. 374). The basic argument behind the necessity and sufficiency of BS–3 is that:

$$P_i(\mathbf{v}) = P_i(v_1, \dots, v_i, \dots, v_I) \overset{U_j = -v_j + \varepsilon_j}{=} F^i(v_1 - v_i, \dots, 0, \dots, v_I - v_i)$$

where $F^i$ refers to the distribution of the differences of random terms taken with respect to alternative i, i.e. the distribution of $\boldsymbol{\varepsilon}_{-i} - \varepsilon_i\,\boldsymbol{\iota}_{I-1}$, and therefore:

$$\frac{\partial^{I-1} F^i(v_1 - v_i, \dots, 0, \dots, v_I - v_i)}{\partial (v_1 - v_i) \cdots \partial (v_{i-1} - v_i)\, \partial (v_{i+1} - v_i) \cdots \partial (v_I - v_i)} \overset{\text{Translational invariance}}{=} \frac{\partial^{I-1} P_i(v_1, \dots, v_I)}{\partial v_1 \cdots \partial v_{i-1}\, \partial v_{i+1} \cdots \partial v_I} \ge 0$$

Notice how, in this process of identifying the relation between the utilities and the exact form of condition BS–3, it has become clear that translational invariance (part of condition BS–1) is necessary along with condition BS–3 to guarantee a non-negative density for the $I-1$ differences of random terms giving rise to the probability system $P_i(\mathbf{v}),\ i = 1, \dots, I$.

Still regarding condition BS–3, we consider misleading the reference made to it in Herriges & Kling (1996, p. 36), since they take BS–3 to include mixed partial derivatives of any order, while Börsch-Supan (1990) involves only one $(I-1)$-order derivative for each of the I probabilities. We show later in the paper how BS–3, plus some assumptions not considered by Börsch-Supan about a non-defective behaviour of the random terms, does guarantee that all the derivatives are non-negative. In relation to this, we would also like to highlight that Börsch-Supan (1990) does not reproduce, as it states, the conditions of Williams (1977), Daly & Zachary (1979) and McFadden (1981), since all of them include the non-defectiveness condition on the random terms underlying the probability system, that is, if $U_i = -v_i + \varepsilon_i$, all of them impose that:

$$\lim_{v_k \to -\infty} P_j(v_1, \dots, v_I) = 0,\ k \ne j = 1, \dots, I$$

which directly means that:

$$\lim_{v_k \to -\infty} P_k(v_1, \dots, v_I) \overset{\text{BS–1}}{=} 1 - \lim_{v_k \to -\infty} \sum_{j = 1,\ j \ne k}^{I} P_j(v_1, \dots, v_I) = 1$$

We consider this to be an important issue, since non-defectiveness is necessary to guarantee that the density function whose non-negativity is assured by BS–3 is really the one giving rise to $P_i(\mathbf{v}),\ i = 1, \dots, I$. As a consequence, we cannot state that BS–1, BS–2 and BS–3 alone are sufficient to guarantee that $P_i(\mathbf{v}),\ i = 1, \dots, I$, represents a probability system associated with a preference model consistent with random utility maximisation.

3.1.1 Inconsistencies in the summing-up-to-one condition on the distributions proving local consistency with RUM

The previous discussion does not prevent Börsch-Supan (1990) from stating in its Theorem 1 that if a probability system meets BS–1 and BS–2 globally and BS–3 only in an open interval containing the set $\boldsymbol{\Omega}$ of values permitted for the systematic part of the utilities, then such a probability system is compatible with stochastic utility maximisation. The argument to support this idea is the existence of a joint density function for the random parts of the utilities which contains a mass point at the lower limits of $\boldsymbol{\Omega}$ (see Börsch-Supan, 1990, p. 382).

If we consider, for instance, the case where only two alternatives are available in each choice situation, and if we allow the difference $(v_2 - v_1)$ only to be in the interval $\boldsymbol{\Omega} = (a, b)$, we have that the density function of $\eta^1_2 = \varepsilon_2 - \varepsilon_1$ (taking alternative 1 as reference) proposed in Börsch-Supan (1990) would be the following:

$$f(\eta^1_2) = \begin{cases} \partial P_1(0, \eta^1_2) / \partial \eta^1_2 & \text{if } \eta^1_2 > a \\ P_1(0, a) & \text{if } \eta^1_2 = a \\ 0 & \text{if } \eta^1_2 < a \end{cases}$$

Using this definition we observe that for any real value of $(v_2 - v_1)$ in the interval $(a, b)$, that is, for any $x = (v_2 - v_1) \in (a, b) \subseteq \mathbb{R}$, we have that:

$$\Pr(\eta^1_2 \le x) = P_1(0, x)$$

In fact the previous equality holds for any $x \in [a, \infty)$. Notice how this definition of the density $f(\eta^1_2)$ only translates the necessity of $\partial P_1(0, \eta^1_2) / \partial \eta^1_2 \ge 0$ to $\eta^1_2 > a$, and not to $\forall \eta^1_2 \in \mathbb{R}$, which would be the case if analysing global compatibility with RUM.

We observe now that the introduction of a mass point at $\eta^1_2 = a$ has direct consequences when we take as reference other alternatives available to individuals. In the binary case we can also take alternative 2 as reference and, since $(v_2 - v_1) \in [a, b]$, we have that $(v_1 - v_2) \in [-b, -a]$, and the density defined by Börsch-Supan (1990), now for $\eta^2_1 = \varepsilon_1 - \varepsilon_2$, would be the following:

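$$
f(\eta^2_1) =
\begin{cases}
\dfrac{\partial P_2(\eta^2_1, 0)}{\partial \eta^2_1} & \text{if } \eta^2_1 > -b \\[2mm]
P_2(-b, 0) & \text{if } \eta^2_1 = -b \\[1mm]
0 & \text{if } \eta^2_1 < -b
\end{cases}
$$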

The mass point located at $\eta^2_1 = -b$ implies that:

$$\Pr(\varepsilon_1 - \varepsilon_2 = -b) = \Pr(\varepsilon_2 - \varepsilon_1 = b) = P_2(-b, 0)$$

This means that the previous definition of the density function $f(\eta^1_2)$ was incomplete, and should be modified to include two mass points, that is:

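$$
f'(\eta^1_2) =
\begin{cases}
P_2(-b, 0) & \text{if } \eta^1_2 = b \\[1mm]
\dfrac{\partial P_1(0, \eta^1_2)}{\partial \eta^1_2} & \text{if } \eta^1_2 > a,\ \eta^1_2 \ne b \\[2mm]
P_1(0, a) & \text{if } \eta^1_2 = a \\[1mm]
0 & \text{if } \eta^1_2 < a
\end{cases}
$$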

Notice that this still allows that for any real value of $x = (v_2 - v_1)$ in the interval $[a, b)$:

$$\Pr(\varepsilon_2 - \varepsilon_1 \le x) = \Pr(\eta^1_2 \le x) = P_1(0, x)$$

But with the inconvenience that the amended density function $f'(\eta^1_2)$ is not a proper one, since even when assuming a non-defective distribution for $f'(\eta^1_2)$, we would have that:

$$
\begin{aligned}
\int_{-\infty}^{\infty} f'(\eta^1_2)\, d\eta^1_2
&= P_1(0, a) + \bigl(P_1(0, b) - P_1(0, a)\bigr) + P_2(-b, 0) + \bigl(P_1(0, \infty) - P_1(0, b)\bigr) \\
&= P_1(0, \infty) + P_2(-b, 0) \\
&= 1 + P_2(-b, 0) \ge 1
\end{aligned}
$$

Therefore, given that we would expect certain compatibility in the probability system that ensures that any alternative can be taken as reference, we conclude that the approach to local consistency with RUM contained in Börsch-Supan (1990) is not completely exact, since the density functions involved in assuring this locally consistent character of the probability system would not be properly defined.
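The excess mass of the amended density can be illustrated numerically. The sketch below is not part of the original argument: it assumes, purely for illustration, a binary logit probability system under $U_i = -v_i + \varepsilon_i$, and the function names (`p1`, `p2`, `amended_total_mass`) and the interval limits are hypothetical choices.

```python
import math

def p1(v1, v2):
    """P_1(v1, v2) for a binary logit with U_i = -v_i + eps_i (illustrative assumption)."""
    return math.exp(-v1) / (math.exp(-v1) + math.exp(-v2))

def p2(v1, v2):
    """P_2(v1, v2) = 1 - P_1(v1, v2)."""
    return 1.0 - p1(v1, v2)

def amended_total_mass(a, b, upper=50.0):
    """Total mass of the amended density f' (mass points at a and at b).

    `upper` is a large finite stand-in for +infinity.
    """
    mass_at_a = p1(0.0, a)                      # mass point at eta = a
    cont_a_to_b = p1(0.0, b) - p1(0.0, a)       # continuous part on (a, b)
    mass_at_b = p2(-b, 0.0)                     # additional mass point at eta = b
    cont_above_b = p1(0.0, upper) - p1(0.0, b)  # continuous part above b
    return mass_at_a + cont_a_to_b + mass_at_b + cont_above_b

a, b = -1.0, 2.0  # hypothetical interval limits
total = amended_total_mass(a, b)
print(total, 1.0 + p2(-b, 0.0))
```

Under these assumptions the total mass exceeds one by exactly $P_2(-b, 0)$, in line with the integral above.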

3.2 A comment on local compatibility with RUM and Koning & Ridder (1994)

The work of Koning & Ridder (1994) builds upon the results shown in the previous section and concludes that the conditions of Börsch-Supan (1990) to assure local consistency with RUM are sufficient, but not necessary, and that if the density function of the differences of the random terms is defined as in Börsch-Supan (1990) (the $f(\eta^1_2)$ that we have developed above for the binary case), then we can only have compatibility with RUM in the open interval $(a, b)$.

Notice however that in the previous section we have shown how Börsch-Supan (1990) fails to prove local compatibility with RUM even in the open interval $(a, b)$, because non-defectiveness is not imposed on the utilities and not every alternative can be taken as reference.


These same authors state that it is well-known from Daly & Zachary (1979) and McFadden (1981) that choice probabilities, $P_i(\mathbf{v})$, $i = 1, \dots, I$, that satisfy the following five conditions for all $\mathbf{v} \in \mathbb{R}^I$ are globally compatible with stochastic utility maximization of preferences of the ARUM (or additive random utility model) form, that is, preferences that can be described by a model that considers the utility of alternative $i$ to be $U_i = -v_i + \varepsilon_i$, with $\varepsilon_i$ being stochastic.

Specifically, these conditions, for any $i = 1, \dots, I$, are the following five (note 6):

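K&R–1: $P_i(\mathbf{v}) \ge 0$; $\sum_{i=1}^{I} P_i(\mathbf{v}) = 1$ (basic requirements)

K&R–2: $P_i(\mathbf{v}) = P_i(\mathbf{v} + c\,\boldsymbol{\iota}_I)$, $\forall c \in \mathbb{R}$ (translational invariance)

K&R–3: $P_i(\mathbf{v})$ is differentiable with respect to $\mathbf{v}^i$

K&R–4: $\dfrac{\partial^k P_i(\mathbf{v})}{\partial \mathbf{v}^i_k} \ge 0$, with $\mathbf{v}^i_k$ any $k$-subvector of $\mathbf{v}^i$ (non-negativity)

K&R–5: $\dfrac{\partial P_i(\mathbf{v})}{\partial v_j} = \dfrac{\partial P_j(\mathbf{v})}{\partial v_i}$, $j = 1, \dots, I$ (symmetry)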

The same authors state that an additional restriction is needed if we want the distribution of the random terms, $F$, to be non-defective. As we have stated before, we think that the additional condition they refer to is not optional for guaranteeing the compatibility of a set of probabilities with stochastic utility maximisation, since this condition is needed to ensure that we have a proper joint distribution for the corresponding differences of random terms (see the section devoted to non-defectiveness). Moreover, a preference model of the ARUM form is defined in Daly & Zachary (1976) and McFadden (1981) to have an associated random vector $(\varepsilon_1, \dots, \varepsilon_I)$ that follows a continuous and proper distribution, which is also non-defective and which does not depend on the systematic part of the utilities, $(v_1, \dots, v_I)$, at least in terms of adding the same constant to all these parts (translational invariance). This means that the ARUM form includes non-defectiveness as a non-optional requirement.

Also in Koning & Ridder (1994), and after considering as incomplete some of the results of Börsch-Supan (1990), the authors present the following theorem regarding local consistency with RUM:

Theorem (Koning & Ridder, 1994):

The choice probabilities $P_i(\mathbf{v})$, $i = 1, \dots, I$, are locally compatible with stochastic utility maximisation on $\Omega = [\mathbf{a}, \mathbf{b}]$ if and only if conditions K&R–1 to K&R–5 hold for all $\mathbf{v} \in \Omega \subseteq \mathbb{R}^I$.

Where $\Omega$ is an $I$-dimensional closed interval and local compatibility with stochastic utility maximisation is characterised as in definition 2 in this paper.


The necessity part of this theorem is implicitly proved in Koning & Ridder (1994, p. 394) using the requirement, imposed by local compatibility, that stochastic utility differences be distributed absolutely continuously. First, we cannot see this requirement in the definition of local compatibility employed by the authors, and second, the continuity (absolute or not) of a joint distribution does not by itself ensure its propriety, and therefore it does not ensure either that K&R–1 to K&R–5 hold. We think that an extended definition of compatibility with utility maximisation (requiring $(\varepsilon_1, \dots, \varepsilon_I)$ to be proper, continuous, non-defective and independent, at least translationally, of the non-random part of the utilities) should be employed instead to prove the necessity part of this theorem.

On the other hand, the proof of the sufficiency part of the theorem includes the construction of a density function (absolutely continuous and properly specified) particularised for a three-alternative case, and which meets conditions K&R–1 to K&R–5 for all $\mathbf{v} \in \Omega \subseteq \mathbb{R}^I$. This density function, though this is not noticed in the paper, is also translationally invariant, since it refers to differences of random terms, and non-defective (note 7), as long as the auxiliary distribution, denoted $\Phi(x)$, $x \in \mathbb{R}$, is non-defective. These two facts would completely assure the sufficiency of K&R–1 to K&R–5 to ensure local compatibility with RUM of a model of the ARUM form.

On a minor scale, we also think that the expression of the density function that proves the sufficiency of the theorem about local compatibility with RUM in Koning & Ridder (1994, pp. 394-95) cannot be completely exact, since the derivatives of the probabilities would be zero: $\mathbf{w} = (w_1, w_2)$ is defined as $(\varepsilon_2 - \varepsilon_1, \varepsilon_3 - \varepsilon_1)$, what we have defined as $\boldsymbol{\eta}^1 = (\eta^1_2, \eta^1_3)$, and $\mathbf{v} = (v_1, v_2, v_3)$ is a fixed point in $\mathbb{R}^3$.

3.2.1 Inconsistencies in the expectation values of the distributions to prove local consistency with RUM

We now analyse in more detail the density function proposed in Koning & Ridder (1994) to guarantee the local compatibility of a probability system with RUM, particularised for a binary case. In line with what was stated previously regarding the study of this case under the approach of Börsch-Supan (1990), we consider the values permitted for the systematic parts of the utilities to be restricted to a closed interval, $(v_2 - v_1) \in [a, b]$.

In this scenario we want to guarantee that there exists a probability system that represents a RUM process upon which less binding conditions apply than if all real values for the systematic parts of the utilities were allowed. To do so, and employing the approach of Koning & Ridder (1994), we define the following density function when alternative 1 is taken as reference ($\eta^1_2 = \varepsilon_2 - \varepsilon_1$):

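$$
f^1(\eta^1_2) =
\begin{cases}
P_2(0, b)\, \dfrac{\phi(\eta^1_2)}{1 - \Phi(b)} & \text{if } \eta^1_2 > b \\[3mm]
\dfrac{\partial P_1(0, \eta^1_2)}{\partial \eta^1_2} & \text{if } a \le \eta^1_2 \le b \\[3mm]
P_1(0, a)\, \dfrac{\phi(\eta^1_2)}{\Phi(a)} & \text{if } \eta^1_2 < a
\end{cases}
$$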

And equivalently, since $(v_1 - v_2) \in [-b, -a]$, we define this other density function when taking alternative 2 as reference ($\eta^2_1 = \varepsilon_1 - \varepsilon_2$):

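$$
f^2(\eta^2_1) =
\begin{cases}
P_1(-a, 0)\, \dfrac{\phi(\eta^2_1)}{1 - \Phi(-a)} & \text{if } \eta^2_1 > -a \\[3mm]
\dfrac{\partial P_2(\eta^2_1, 0)}{\partial \eta^2_1} & \text{if } -b \le \eta^2_1 \le -a \\[3mm]
P_2(-b, 0)\, \dfrac{\phi(\eta^2_1)}{\Phi(-b)} & \text{if } \eta^2_1 < -b
\end{cases}
$$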

In both cases, $\phi(x)$ and $\Phi(x)$ denote the density and cumulative distribution, respectively, of a well-behaved continuous univariate random variable (note 8). Using the definition of these densities and the translational invariance that this probability system has to meet (K&R–2) we can easily see how they effectively reproduce a probability system locally compatible with RUM, since for any $(v_2 - v_1) \in [a, b]$, it holds that:

$$
\begin{aligned}
\Pr(\eta^1_2 \le v_2 - v_1) &= \Pr(-v_2 + \varepsilon_2 \le -v_1 + \varepsilon_1) = P_1(0, v_2 - v_1) \overset{\text{K\&R-2}}{=} P_1(v_1, v_2) \\
\Pr(\eta^2_1 \le v_1 - v_2) &= \Pr(-v_1 + \varepsilon_1 \le -v_2 + \varepsilon_2) = P_2(v_1 - v_2, 0) \overset{\text{K\&R-2}}{=} P_2(v_1, v_2)
\end{aligned}
$$

In order to ensure the non-negativity of the densities $f^1(x)$ and $f^2(x)$ we need to impose conditions on the probability system beyond K&R–2 and the correctness of $\phi(x)$:

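$$
f^1(x) \ge 0 \;\Leftrightarrow\;
\begin{cases}
\dfrac{\partial P_1(0, \eta^1_2)}{\partial \eta^1_2} \ge 0,\ \forall \eta^1_2 \in [a, b] & \Rightarrow (\text{K\&R-3, K\&R-4}) \\[2mm]
P_1(0, a) \ge 0,\ P_2(0, b) \ge 0 & \Rightarrow (\text{K\&R-1})
\end{cases}
$$

$$
f^2(x) \ge 0 \;\Leftrightarrow\;
\begin{cases}
\dfrac{\partial P_2(\eta^2_1, 0)}{\partial \eta^2_1} \ge 0,\ \forall \eta^2_1 \in [-b, -a] & \Rightarrow (\text{K\&R-3, K\&R-4}) \\[2mm]
P_1(0, a) \ge 0,\ P_2(0, b) \ge 0 & \Rightarrow (\text{K\&R-1})
\end{cases}
$$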

In addition, in order to ensure the correct distribution of the random terms we need to guarantee that:

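$$
\begin{aligned}
\int_{-\infty}^{\infty} f^1(\eta^1_2)\, d\eta^1_2 &= P_1(0, b) + P_2(0, b) = 1 \;\Rightarrow\; (\text{K\&R-1}) \\
\int_{-\infty}^{\infty} f^2(\eta^2_1)\, d\eta^2_1 &\overset{\text{K\&R-2}}{=} P_1(0, a) + P_2(0, a) = 1 \;\Rightarrow\; (\text{K\&R-1})
\end{aligned}
$$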


The relation between the two densities imposes yet another restriction, related not to their propriety but to the correct specification of the probability system: for any $x \in [a, b]$ it has to hold that $P_1(0, x) + P_2(0, x) = 1$. Therefore, we have shown that a probability system that complies with K&R–1 to K&R–4 only in $\Omega = (a, b) \subset \mathbb{R}$ is locally compatible with RUM in that interval.
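As a numerical illustration of the construction for alternative 1 as reference, the following sketch builds the piecewise density and checks that it integrates to one and reproduces $P_1(0, x)$ inside the interval. It rests on assumptions that are not in the original: a binary logit form for $P_1(0, x)$, a standard normal auxiliary distribution, hypothetical limits $a = -1$, $b = 2$, and illustrative function names; the integration is a plain midpoint rule truncated at $\pm 12$.

```python
import math

def p1_0(x):
    """P_1(0, x) for a binary logit with U_i = -v_i + eps_i (illustrative assumption)."""
    return 1.0 / (1.0 + math.exp(-x))

def dp1_0(x):
    """Derivative of P_1(0, x) with respect to x."""
    s = p1_0(x)
    return s * (1.0 - s)

def phi(x):
    """Auxiliary density: standard normal (an assumed choice)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Auxiliary cumulative distribution: standard normal."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def f1(eta, a, b):
    """Piecewise density of eta^1_2 = eps_2 - eps_1 with auxiliary tails."""
    if eta < a:
        return p1_0(a) * phi(eta) / Phi(a)
    if eta > b:
        return (1.0 - p1_0(b)) * phi(eta) / (1.0 - Phi(b))  # P_2(0, b) = 1 - P_1(0, b)
    return dp1_0(eta)

def integrate(f, lo, hi, n=20000):
    """Midpoint rule on [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (k + 0.5) * h) for k in range(n))

a, b = -1.0, 2.0  # hypothetical interval limits

def cdf(x, lower=-12.0):
    """Integrate f1 up to x, splitting at a and b so each piece is smooth."""
    pts = [lower] + [c for c in (a, b) if c < x] + [x]
    return sum(integrate(lambda e: f1(e, a, b), lo, hi) for lo, hi in zip(pts, pts[1:]))

print(cdf(12.0))             # total mass, close to 1
print(cdf(0.5), p1_0(0.5))   # the two values agree inside [a, b]
```

Any other proper auxiliary distribution could be substituted for the normal one here; only the tails outside $[a, b]$ would change.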

Observe that we do not need to ensure non-defectiveness of the probability system for finite values of $a$ and $b$, since in this case it is provided by the non-defectiveness of the auxiliary distributions. However, we would have to be able to accommodate the case where local compatibility collapses into the global one, that is, when $\Omega \equiv \mathbb{R}$ because $[a, b] \to [-\infty, \infty]$, so we need to impose that:

$$P_1(0, -\infty) = P_2(-\infty, 0) = 0$$

It is in this sense, and as stated before, that non-defectiveness is not the void condition that Koning & Ridder (1994) suggest (note 9).

Expectation values of the densities for a binary case:

We observe that the expectation of the differences of the random terms, whose range is by construction not restricted to an interval but free to vary in $\mathbb{R}$, depends on the interval permitted for the systematic parts of the utilities. For instance, for this translationally invariant binary case the expectations are as follows:

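$$
\begin{aligned}
E(\eta^1_2) &= \frac{P_1(0, a)}{\Phi(a)} \int_{-\infty}^{a} \eta^1_2\, \phi(\eta^1_2)\, d\eta^1_2
+ \int_{a}^{b} \eta^1_2\, \frac{\partial P_1(0, \eta^1_2)}{\partial \eta^1_2}\, d\eta^1_2
+ \frac{P_2(0, b)}{1 - \Phi(b)} \int_{b}^{\infty} \eta^1_2\, \phi(\eta^1_2)\, d\eta^1_2 \\[2mm]
E(\eta^2_1) &= -\frac{P_1(0, a)}{1 - \Phi(-a)} \int_{-\infty}^{a} \eta^1_2\, \phi(-\eta^1_2)\, d\eta^1_2
- \int_{a}^{b} \eta^1_2\, \frac{\partial P_1(0, \eta^1_2)}{\partial \eta^1_2}\, d\eta^1_2
- \frac{P_2(0, b)}{\Phi(-b)} \int_{b}^{\infty} \eta^1_2\, \phi(-\eta^1_2)\, d\eta^1_2
\end{aligned}
$$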

Given the definition of the variables $\eta^1_2$ and $\eta^2_1$, the following relation must hold:

$$E(\eta^1_2) = -E(\eta^2_1)$$

Thus the univariate auxiliary distributions $\phi(x)$ have to meet extra conditions, beyond being properly distributed, to guarantee such an equality. In particular, they need to be symmetrical (note 10), i.e. $\phi(x) = \phi(-x)$. This result, together with the flexibility mentioned in note 8 about the possibility of defining different univariate distributions $\phi(x)$ for the same density function, is necessary to fully support the correctness of the theorem in Koning & Ridder (1994) about local compatibility of a probability system with RUM.

The calculations presented involve a binary choice case. Models with a higher number of alternatives will have less simple expressions for the expected values, and will make it necessary to impose additional restrictions on the auxiliary distributions so as to guarantee that:

$$E(\eta^j_i) + E(\eta^k_j) = E(\eta^k_i), \quad \forall\, i \ne j \ne k = 1, \dots, I$$
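The role of symmetry in the binary case can be illustrated numerically. The sketch below is an illustration under stated assumptions, not the original authors' calculation: it assumes a binary logit probability system, uses a logistic auxiliary density centred at a hypothetical `shift` (symmetric about zero only when `shift` is 0), takes $a = -1$, $b = 1$, and approximates the integrals with a midpoint rule truncated at $\pm 40$.

```python
import math

def sigma(x):
    """Logistic function; P_1(0, x) = P_2(x, 0) = sigma(x) in the assumed binary logit."""
    return 1.0 / (1.0 + math.exp(-x))

def dsigma(x):
    s = sigma(x)
    return s * (1.0 - s)

def integrate(f, lo, hi, n=20000):
    """Midpoint rule on [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (k + 0.5) * h) for k in range(n))

def expectations(a, b, shift, tail=40.0):
    """Return (E[eta^1_2], E[eta^2_1]) for a logistic auxiliary density centred at `shift`."""
    phi = lambda x: dsigma(x - shift)
    Phi = lambda x: sigma(x - shift)
    # Alternative 1 as reference: lower tail mass P_1(0, a), upper tail mass P_2(0, b).
    e1 = (sigma(a) / Phi(a) * integrate(lambda x: x * phi(x), -tail, a)
          + integrate(lambda x: x * dsigma(x), a, b)
          + (1.0 - sigma(b)) / (1.0 - Phi(b)) * integrate(lambda x: x * phi(x), b, tail))
    # Alternative 2 as reference: lower tail mass P_2(-b, 0), upper tail mass P_1(-a, 0).
    e2 = (sigma(-b) / Phi(-b) * integrate(lambda y: y * phi(y), -tail, -b)
          + integrate(lambda y: y * dsigma(y), -b, -a)
          + sigma(a) / (1.0 - Phi(-a)) * integrate(lambda y: y * phi(y), -a, tail))
    return e1, e2

sym = expectations(-1.0, 1.0, shift=0.0)    # symmetric auxiliary density
asym = expectations(-1.0, 1.0, shift=1.0)   # auxiliary density not symmetric about zero
print(sym[0] + sym[1])     # approximately zero
print(asym[0] + asym[1])   # clearly different from zero
```

With the symmetric auxiliary density the two expectations cancel, whereas with the shifted one their sum is visibly non-zero, which is the inconsistency discussed above.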


Notice before concluding this section that even if we were able to find auxiliary distributions that lead to the necessary equivalent expectation values, we would have done so at the expense of artificially changing the domain of variation of the random parts of the utilities. The analyst should then have some information about these changes so as to draw the correct conclusions about the probability system in use. For instance, if we have a binary probability system $(P_1(v_1, v_2), P_2(v_1, v_2))$ that meets conditions K&R-1 to K&R-4 for any $(v_2 - v_1) \in [a, b]$, we have that it is locally consistent with RUM in the sense that there exists a properly distributed random vector $(\varepsilon_1, \varepsilon_2)$ that permits us to state that $P_1(v_1, v_2) = \Pr(-v_2 + \varepsilon_2 \le -v_1 + \varepsilon_1)$ and $P_2(v_1, v_2) = \Pr(-v_1 + \varepsilon_1 \le -v_2 + \varepsilon_2)$. However, the auxiliary distributions that determine the exact form of the random terms are not identified, and are therefore unknown to the analyst. For the binary choice case, the probability system $(P_1(v_1, v_2), P_2(v_1, v_2))$ can be explained as rooted in RUM in the following two situations:

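A) $E(\varepsilon_1) = 0$ and

$$
E(\varepsilon_2) = \frac{P_1(0, a)}{\Phi(a)} \int_{-\infty}^{a} \eta^1_2\, \phi(\eta^1_2)\, d\eta^1_2
+ \int_{a}^{b} \eta^1_2\, \frac{\partial P_1(0, \eta^1_2)}{\partial \eta^1_2}\, d\eta^1_2
+ \frac{P_2(0, b)}{1 - \Phi(b)} \int_{b}^{\infty} \eta^1_2\, \phi(\eta^1_2)\, d\eta^1_2 \ne 0
$$

B) $E(\varepsilon_1) = 0$ and

$$
E(\varepsilon_2) = \frac{P_1(0, a)}{\Gamma(a)} \int_{-\infty}^{a} \eta^1_2\, \varphi(\eta^1_2)\, d\eta^1_2
+ \int_{a}^{b} \eta^1_2\, \frac{\partial P_1(0, \eta^1_2)}{\partial \eta^1_2}\, d\eta^1_2
+ \frac{P_2(0, b)}{1 - \Gamma(b)} \int_{b}^{\infty} \eta^1_2\, \varphi(\eta^1_2)\, d\eta^1_2 \ne 0
$$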

Consequently, if we do not fix the level of the expected value $E(\varepsilon_2)$ we cannot draw any exact conclusion about how the probability system interprets that individuals behave. Considering all these arguments, we finally think that the theorem about local compatibility in Koning & Ridder (1994) is not completely exact in its relation to random utility maximisation.

Summarizing this section 3, we have shown the two main attempts in the literature to study local compatibility with utility maximisation, Börsch-Supan (1990) and Koning & Ridder (1994). Regarding the former, we have observed that it is not feasible to identify a proper distribution for the random terms that reproduces the same probability expressions as in the scenario of global compatibility and, regarding the latter, we have identified some compatibility drawbacks concerning the consistency between the expected values of the different random terms involved.

4 INTEGRABILITY OF DEMAND SYSTEMS

Having discussed so far the conditions that ensure global and local consistency of a preference model with random utility maximisation (RUM), we now devote our analysis to the relation between RUM and demand systems, implemented through one of these conditions, specifically, the one assuring symmetrical probability systems. This symmetry condition is considered explicitly in all the sets of conditions reviewed in this paper, that is, in the studies of Daly & Zachary (1976), McFadden (1981), Börsch-Supan (1990) and Koning & Ridder (1994, 2003) (every one except our three-condition corollary 1). These authors consider the condition necessary to assure that a probability system complies with RUM, arguing that it guarantees that such probability systems are integrable. They thus interpret a probability system as a demand function and, therefore, find it suitable to impose on such a system the integrability conditions that characterise any such demand function.

What we would like to point out in this section is that these integrability conditions stem from a particular identification of probability systems with demand functions, and are not related to assuring that the joint distribution of the random terms giving rise to the probabilities is proper. In fact, we show in this section that the minimum set of conditions guaranteeing this propriety (conditions S-1, S-2 and S-3 in corollary 1) do satisfy the integrability conditions and, moreover, that the integrability conditions, which are associated with compatibility with RUM by the representative agent of a population, do not yield weaker conditions on the choice probabilities than the ones ensuring the compatibility with RUM of any individual from the population, at least not in terms of what we have called comparable conditions.

When we refer to comparable conditions we acknowledge the fact that the representative agent’s probability system differs from the ones associated with the individuals in the population: in the latter we need to include conditions that ensure that the random terms are properly specified and that the probabilities are translationally invariant (in accordance with the specification of an ARUM model), while only a subset of these conditions (the comparable ones) are applicable to the representative agent’s model. An issue for further research arising from this paper is the derivation of the complete set of conditions that have to be met by the random terms leading to the representative agent model.

Part of the published material on the integrability of demand systems and its relation to discrete choice models includes the conditions of Daly & Zachary (1976), to which we have referred previously in this paper as the ones guaranteeing global compatibility with RUM. We explicitly note that Williams (1977) is similar in approach to Daly & Zachary (1976). Thus, before continuing with the relation between demand functions and RUM, we analyse the mentioned conditions in the following section.

4.1 Daly & Zachary (1976)’s compliance with RUM

The presentation of random utility maximisation (RUM) in Daly & Zachary’s (D&Z) work is couched somewhat differently from the basic RUM of Block & Marschak (1960) (B&M) that we introduced at the beginning of the paper; indeed no reference is made to B&M. The dissection of utility, and with it the notion of cardinality, is introduced by assertion at an early stage in their analysis, thus defining the expression of the utilities with which we have been working throughout the paper, i.e. $U_j = -v_j + \varepsilon_j$, $j = 1, \dots, I$. By analogy to the basic derivation of Block & Marschak (1960), the probability statement under the D&Z scheme, equal to the one employed in definitions 1 and 2 in this paper, is written as:

$$P_j = \Pr(-v_j + \varepsilon_j \ge -v_k + \varepsilon_k,\ k = 1, \dots, I)$$

The main contribution of D&Z is that they present a set of necessary and sufficient conditions to correctly implement RUM in practice, guaranteeing that a given probability system $P_j(v_1, \dots, v_I)$, $j = 1, \dots, I$, represents an underlying preference model consistent with RUM. These conditions are as follows:

D&Z-1: $P_j(v_1 + \alpha, \dots, v_I + \alpha) = P_j(v_1, \dots, v_I)$, $\forall j$, $\forall \alpha \in \mathbb{R}$ (translational invariance)

D&Z-2: $\lim_{v_k \to -\infty} P_j(v_1, \dots, v_I) = 0$, $\forall k \ne j$, $\forall j$ (non-defectiveness)

D&Z-3: $P_j(v_1, \dots, v_I) \ge 0$, $\forall j$; $\sum_{j=1}^{I} P_j(v_1, \dots, v_I) = 1$ (basic requirements)

D&Z-4: $\dfrac{\partial^{m-1} P_j(v_1, \dots, v_j, \dots, v_I)}{\partial v_1 \cdots \partial v_{j-1}\, \partial v_{j+1} \cdots \partial v_m} \ge 0$; $\forall j, m = 1, \dots, I$; $j \ne m$ (non-negativity)

and also, the $(I - 1)$ different derivatives of each probability are finite everywhere and continuous

D&Z-5: $\dfrac{\partial P_j(v_1, \dots, v_I)}{\partial v_k} = \dfrac{\partial P_k(v_1, \dots, v_I)}{\partial v_j}$, $\forall j, k$ (symmetry)
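The conditions above can be checked numerically on a concrete probability system. The following sketch is illustrative only: it assumes a three-alternative multinomial logit under $U_j = -v_j + \varepsilon_j$, uses central finite differences for the derivatives, and checks D&Z-4 for first-order cross derivatives only; the function names and the test point `v` are hypothetical.

```python
import math

def mnl(v):
    """Multinomial logit probabilities under U_j = -v_j + eps_j (illustrative assumption)."""
    w = [math.exp(-x) for x in v]
    s = sum(w)
    return [x / s for x in w]

def dP(j, k, v, h=1e-5):
    """Central finite-difference estimate of dP_j / dv_k."""
    up = list(v); up[k] += h
    dn = list(v); dn[k] -= h
    return (mnl(up)[j] - mnl(dn)[j]) / (2.0 * h)

v = [0.3, -1.2, 0.8]  # hypothetical systematic parts
P = mnl(v)
n = len(v)

# D&Z-1: adding the same constant to every v_j leaves the probabilities unchanged.
invariant = all(abs(p - q) < 1e-12 for p, q in zip(P, mnl([x + 5.0 for x in v])))
# D&Z-2: P_1 vanishes as v_2 decreases without limit (-60 stands in for -infinity).
limit_ok = mnl([v[0], -60.0, v[2]])[0] < 1e-20
# D&Z-3: non-negative probabilities that sum to one.
basic = min(P) >= 0.0 and abs(sum(P) - 1.0) < 1e-12
# D&Z-4 (first-order cross derivatives only): dP_j/dv_k >= 0 for j != k.
cross_nonneg = all(dP(j, k, v) >= 0.0 for j in range(n) for k in range(n) if j != k)
# D&Z-5: symmetric cross derivatives.
symmetric = all(abs(dP(j, k, v) - dP(k, j, v)) < 1e-8 for j in range(n) for k in range(n))

print(invariant, limit_ok, basic, cross_nonneg, symmetric)
```

A full check of D&Z-4 would also need the higher-order mixed derivatives, which are left out of this sketch.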

Comments to conditions D&Z-4 and D&Z-5

Condition D&Z-4 imposes that any mixed partial derivative of the probabilities has to be non-negative. This implies that for each of the $I$ probabilities we would need to check the sign of $I - 1$ derivatives. We notice however that if the underlying distribution leading to the probability system under analysis is non-defective, then it is enough to guarantee that the highest order ($I - 1$) mixed partial derivative of the $I$ probabilities is non-negative. Thus condition D&Z-4 would only include the following $I$ sign restrictions instead of the original $I(I - 1)$:

$$
\frac{\partial^{I-1} P_i(\mathbf{v})}{\partial \mathbf{v}^i}
= \frac{\partial^{I-1} P_i(\mathbf{v}^i - v_i \boldsymbol{\iota}_{I-1}, 0_{(i)})}{\partial (\mathbf{v}^i - v_i \boldsymbol{\iota}_{I-1})}
= \frac{\partial^{I-1} P_i(\mathbf{w}^i, 0_{(i)})}{\partial \mathbf{w}^i} \ge 0,
\quad \forall \mathbf{w}^i \in \mathbb{R}^{I-1},\ i = 1, \dots, I
$$

where $\mathbf{w}^i = \mathbf{v}^i - v_i \boldsymbol{\iota}_{I-1}$.

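The explanation of this fact starts by considering that:

$$
\frac{\partial^{I-1} P_i(\boldsymbol{\eta}^i, 0_{(i)})}{\partial \boldsymbol{\eta}^i} \ge 0,\ \forall \boldsymbol{\eta}^i \in \mathbb{R}^{I-1}
\;\Rightarrow\;
\int_{-\infty}^{w^i_j} \frac{\partial^{I-1} P_i(\boldsymbol{\eta}^i, 0_{(i)})}{\partial \boldsymbol{\eta}^i}\, d\eta^i_j \ge 0
$$

Which in turn implies that for any real $w^i_j$, imposing non-defectiveness, $\lim_{\eta^i_j \to -\infty} P_i(\boldsymbol{\eta}^i, 0_{(i)}) = 0$, $i \ne j$, we have finally that:

$$
\int_{-\infty}^{w^i_j} \frac{\partial^{I-1} P_i(\boldsymbol{\eta}^i, 0_{(i)})}{\partial \boldsymbol{\eta}^i}\, d\eta^i_j
= \left[ \frac{\partial^{I-2} P_i(\boldsymbol{\eta}^i, 0_{(i)})}{\partial \boldsymbol{\eta}^i_{-j}} \right]_{\eta^i_j = -\infty}^{\eta^i_j = w^i_j}
= \frac{\partial^{I-2} P_i(\boldsymbol{\eta}^i_{-j}, w^i_j, 0_{(i)})}{\partial \boldsymbol{\eta}^i_{-j}} \ge 0,
\quad \forall \boldsymbol{\eta}^i_{-j} \in \mathbb{R}^{I-2},\ \forall w^i_j \in \mathbb{R}
$$

where $\boldsymbol{\eta}^i_{-j}$ denotes $\boldsymbol{\eta}^i$ without its component $\eta^i_j$.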

Proceeding in the same way we would reach similar conclusions for the lower order derivatives. Notice that translational invariance (D&Z-1) is used throughout the previous argument and that the non-defectiveness needed for the proof is only the one incorporated directly by D&Z-2. The other side of non-defectiveness,

$$\lim_{v_k \to -\infty} P_k(v_1, \dots, v_I) = 1, \quad k = 1, \dots, I$$

follows directly from D&Z-2 and the second part of D&Z-3 about probabilities summing up to one.

Regarding the evaluation of the $I(I - 1)/2$ equalities included in the symmetry condition D&Z-5, we prove in section 5 that they add no extra restrictions on the probability systems if the other conditions hold.

4.2 Relation of the Daly & Zachary conditions on the probability systems and the demand of available alternatives

Returning now to the previous discussion about the integrability conditions imposed on probability systems and their relation with the representative agent’s demand function, and as an introduction to the results presented later in this section, we proceed by taking the conditions in the work of Daly & Zachary (1976) and presenting their direct relation with demand analysis. The non-negativity condition (D&Z–4), combined with translationally invariant probabilities (D&Z–1) and the non-defectiveness precluding positive probabilities for an alternative if the utility of some other alternative increases without limit (D&Z–2), states that all first-order cross-partial derivatives of probability with respect to deterministic utility are non-negative, that is:

$$\frac{\partial P_i(\mathbf{v})}{\partial v_j} \ge 0, \quad i \ne j = 1, \dots, I$$

In other words, if the mean utility of an alternative $j$ increases (notice that $U_j = -v_j + \varepsilon_j$) then, ceteris paribus, the probability of choosing another alternative $i$ cannot increase. This serves to restrict individuals’ preferences to direct substitution between alternatives, and precludes joint consumption. Besides, the combination of this previous result with the unitary sum of the probabilities imposed by condition D&Z–3 states that:

$$\frac{\partial P_j(\mathbf{v})}{\partial v_j} = -\sum_{\substack{i = 1 \\ i \ne j}}^{I} \frac{\partial P_i(\mathbf{v})}{\partial v_j} \le 0, \quad j = 1, \dots, I$$

which means, by analogy with the ‘law of demand’, that the probability of choosing an alternative increases as its deterministic utility increases, and vice versa. Finally, the equivalence between symmetrical cross-partial derivatives (D&Z–5) is analogous to the symmetry requirements of the Slutsky matrix in Neo-Classical microeconomic theory. We prove later in this section that D&Z–5 and D&Z–4, along with the unitary sum of probabilities and the non-defectiveness of the underlying distribution (part of D&Z–3 and D&Z–2, respectively), are sufficient to ensure consistency between a probability system and the integrability of demand systems.

4.3 Integrability conditions

The exploitation of McFadden’s (1981) representative agent model to derive integrability conditions for the case of probabilistic (discrete) choice can be found in works such as Koning & Ridder (2003). In this case, the derivation starts considering that mean utility and price are one-and-the-same, that is:

$$U_{ti} = \frac{y_t}{p} - \frac{v_i}{p} + \varepsilon_{ti}, \quad \text{for } i = 1, \dots, I,\ t = 1, \dots, T$$

where $t$ refers to the $t$-th agent or individual, $y_t$ is the income of the $t$-th agent, and $p$ is the price of other consumption. With reference to D&Z–1, $y_t / p$ is common to all the $I$ alternatives for a specific agent and therefore has no impact on probability, which implies that we are implicitly considering that all the $T$ agents show a translationally invariant preference ordering. This re-statement of utility permits the derivation of the following indirect utility and cost functions, respectively:

$$
V(y, p, \mathbf{v}) = \frac{y}{p} + E\left[ \max_{i = 1, \dots, I} \left( -\frac{v_i}{p} + \varepsilon_i \right) \right],
\qquad
C(u, p, \mathbf{v}) = p\,u - p\,E\left[ \max_{i = 1, \dots, I} \left( -\frac{v_i}{p} + \varepsilon_i \right) \right]
$$

Thus, exploiting Roy’s identity and Shephard’s lemma, we can arrive at the Marshallian and Hicksian demands which, in this case, are identical, and are expressed in terms of choice probability (Koning & Ridder, 2003):

$$\mathbf{P}\left( \frac{\mathbf{v}}{p} \right) = \frac{\partial C(u, p, \mathbf{v})}{\partial \mathbf{v}}$$
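The identity between the gradient of the cost function and the choice probabilities can be verified numerically in a special case. The sketch below assumes, as an illustration not taken from the original, that the random terms are i.i.d. standard Gumbel, so that the expected maximum has the closed log-sum form and the probabilities are logit; the function names and the numerical values of `u`, `p` and `v` are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329  # mean of a standard Gumbel variable

def expected_max(v, p):
    """E[max_i(-v_i/p + eps_i)] for i.i.d. standard Gumbel eps_i (assumed distribution)."""
    return math.log(sum(math.exp(-x / p) for x in v)) + EULER_GAMMA

def cost(u, p, v):
    """Cost function C(u, p, v) = p*u - p*E[max_i(-v_i/p + eps_i)]."""
    return p * u - p * expected_max(v, p)

def logit_prob(v, p):
    """Choice probabilities P(v/p) implied by the Gumbel assumption."""
    w = [math.exp(-x / p) for x in v]
    s = sum(w)
    return [x / s for x in w]

def cost_gradient(u, p, v, h=1e-6):
    """Central finite-difference gradient of C with respect to v."""
    grad = []
    for i in range(len(v)):
        up = list(v); up[i] += h
        dn = list(v); dn[i] -= h
        grad.append((cost(u, p, up) - cost(u, p, dn)) / (2.0 * h))
    return grad

u, p, v = 1.5, 2.0, [0.4, 1.1, -0.3]  # hypothetical values
grad = cost_gradient(u, p, v)
probs = logit_prob(v, p)
print(grad)
print(probs)
```

Under the Gumbel assumption the two printed vectors agree up to finite-difference error.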

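This demand function is the one to which the integrability conditions refer, and these are embodied in the following two requirements:

IC–1: $\dfrac{\partial P_i(\mathbf{v})}{\partial v_j} = \dfrac{\partial P_j(\mathbf{v})}{\partial v_i}$, $i, j = 1, \dots, I$

IC–2: $\mathbf{x}' \mathbf{S} \mathbf{x} \le 0$, $\forall \mathbf{x} \in \mathbb{R}^I$; $\mathbf{S} = [S_{ij}]$; $S_{ij} = \dfrac{\partial P_i(\mathbf{v})}{\partial v_j}$, $i, j = 1, \dots, I$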

Notice however that the indirect utility and cost functions of the representative agent are expressed in terms of an expectation taken over the distribution of $(\varepsilon_1, \dots, \varepsilon_I)$ and that some requirements, aside from the integrability conditions, would have to be considered, as an issue of further research, to ensure that these $I$ random terms are distributed properly. Additionally, we would also have to decide whether the representative agent would require translational invariance and/or non-negativity of the $I$-order derivative of the $\mathbf{P}(\mathbf{v}/p)$ probability system.

Interpreting each of the mentioned integrability conditions we have, on one hand, that IC–2 requires that the Slutsky matrix be negative semi-definite, which in turn implies that the on-diagonal elements of $\mathbf{S}$ must be non-positive, that is:

$$\frac{\partial P_i(\mathbf{v})}{\partial v_i} \le 0, \quad i = 1, \dots, I$$

Hence we arrive at the (stochastic) ‘law of demand’ previously introduced when considering, in the individual random utility maximisation, the non-negativity, the non-defectiveness and the unitary sum of the probabilities; that is, as price increases, the probability that the alternative will be chosen decreases, and vice versa. On the other hand, IC–1 imposes symmetry of the cross-partial derivatives of probability with respect to price, and can be seen to be a further restriction on the Slutsky matrix.

Next we focus on the implications for the binary and general choice cases of imposing the integrability conditions. Before that, we notice that the $I$ probabilities sum up to one for the representative agent of the population and for any of the individuals, so we have that for both cases it holds that:

$$\sum_{j=1}^{I} P_j(\mathbf{v}) = 1 \;\Rightarrow\; \sum_{j=1}^{I} \frac{\partial P_j(\mathbf{v})}{\partial v_k} = 0, \quad k = 1, \dots, I$$

Besides, the symmetry condition is also considered in both sets of conditions; moreover, it is included at the individual level with the sole purpose of guaranteeing precisely the integrability of demand systems.

4.4 Analysis of the integrability conditions for a binary choice case

The Slutsky matrix for this case is as follows:

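$$
\mathbf{S} =
\begin{pmatrix}
\dfrac{\partial P_1}{\partial v_1} & \dfrac{\partial P_1}{\partial v_2} \\[3mm]
\dfrac{\partial P_2}{\partial v_1} & \dfrac{\partial P_2}{\partial v_2}
\end{pmatrix}
\overset{P_1 + P_2 = 1}{=}
\begin{pmatrix}
-\dfrac{\partial P_2}{\partial v_1} & \dfrac{\partial P_1}{\partial v_2} \\[3mm]
\dfrac{\partial P_2}{\partial v_1} & -\dfrac{\partial P_1}{\partial v_2}
\end{pmatrix}
\overset{\text{Symmetry}}{=}
\frac{\partial P_1}{\partial v_2}
\begin{pmatrix}
-1 & 1 \\
1 & -1
\end{pmatrix}
$$

Which is negative semi-definite if and only if $\partial P_1 / \partial v_2 = \partial P_2 / \partial v_1 \ge 0$.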

The Daly & Zachary conditions for this binary case are the following five:

D&Z–1: $P_1(\mathbf{v} + \alpha) = P_1(\mathbf{v})$, $P_2(\mathbf{v} + \alpha) = P_2(\mathbf{v})$

D&Z–2: $P_1(v_1, -\infty) = P_2(-\infty, v_2) = 0$

D&Z–3: $P_1 \ge 0$, $P_2 \ge 0$; $P_1 + P_2 = 1$

D&Z–4: $\partial P_1 / \partial v_2 \ge 0$, $\partial P_2 / \partial v_1 \ge 0$

D&Z–5: $\partial P_1 / \partial v_2 = \partial P_2 / \partial v_1$

The symmetry and the negative semi-definiteness of the Slutsky matrix imposed on the probability system of a representative agent (conditions IC–1 and IC–2) are as binding for this binary case as the second part of D&Z–3, D&Z–4 and D&Z–5. It would in fact be expected that D&Z–2 and D&Z–3 were also met by the same representative agent probability system, $\mathbf{P}(\mathbf{v}/p)$, and also D&Z–1 about translational invariance: if we have considered that all the agents behave in this way, it should be equally plausible and expected that the representative agent behaves in the same way, that is, with $\mathbf{P}((\mathbf{v} + \alpha)/p) = \mathbf{P}(\mathbf{v}/p)$, $\forall \alpha \in \mathbb{R}$.

Thus we conclude that in a binary choice case compatibility with utility maximization by a representative agent yields weaker conditions on the choice probabilities than compatibility with individual utility maximisation only if the latter is not forced to be translationally invariant (note 11).

4.5 Analysis of the integrability conditions for the general choice case

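Moving from the binary to the general case, where $I$ alternatives are considered to be available to the individuals, we have that if the probabilities sum up to one and symmetry holds, i.e., respectively:

$$\sum_{j=1}^{I} \frac{\partial P_j(\mathbf{v})}{\partial v_k} = 0 \quad \text{and} \quad \frac{\partial P_j(\mathbf{v})}{\partial v_k} = \frac{\partial P_k(\mathbf{v})}{\partial v_j}, \quad j, k = 1, \dots, I$$

then requiring a negative semi-definite character for the Slutsky matrix implies that:

$$\mathbf{x}' \mathbf{S} \mathbf{x} = -\sum_{i=1}^{I} \sum_{j > i} (x_i - x_j)^2\, \frac{\partial P_i}{\partial v_j} \le 0, \quad \forall \mathbf{x} = (x_1, \dots, x_I) \in \mathbb{R}^I$$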
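The rewriting of the quadratic form as a sum over pairs can be checked on an example. The sketch below is illustrative only: it assumes a four-alternative multinomial logit under $U_i = -v_i + \varepsilon_i$, for which $\partial P_i / \partial v_j = P_i P_j$ for $i \ne j$ and $\partial P_i / \partial v_i = -P_i (1 - P_i)$; the function names and the vectors `v` and `x` are hypothetical.

```python
import math

def mnl(v):
    """Multinomial logit probabilities under U_i = -v_i + eps_i (illustrative assumption)."""
    w = [math.exp(-x) for x in v]
    s = sum(w)
    return [x / s for x in w]

def slutsky(v):
    """S_ij = dP_i/dv_j for the logit example (analytic derivatives)."""
    P = mnl(v)
    n = len(v)
    return [[P[i] * P[j] if i != j else -P[i] * (1.0 - P[i]) for j in range(n)]
            for i in range(n)]

def quad_form(S, x):
    """x' S x computed directly."""
    n = len(x)
    return sum(x[i] * S[i][j] * x[j] for i in range(n) for j in range(n))

def pairwise_form(S, x):
    """-sum_{i} sum_{j>i} (x_i - x_j)^2 * dP_i/dv_j."""
    n = len(x)
    return -sum((x[i] - x[j]) ** 2 * S[i][j] for i in range(n) for j in range(i + 1, n))

v = [0.3, -1.2, 0.8, 0.0]   # hypothetical systematic parts
x = [1.0, -2.0, 0.5, 3.0]   # arbitrary test vector
S = slutsky(v)
direct = quad_form(S, x)
pairs = pairwise_form(S, x)
print(direct, pairs)
```

Both evaluations coincide and are non-positive in this example, since every cross derivative $P_i P_j$ is non-negative.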

We can see that the previous inequality is equivalent to the following:

$$\mathbf{x}_b' \mathbf{S}_b \mathbf{x}_b \ge 0, \quad \forall\, \mathbf{x}_b = (x_1 - x_2,\ x_1 - x_3,\ \dots,\ x_1 - x_I,\ x_2 - x_3,\ \dots,\ x_{I-1} - x_I) \in \mathbb{R}^{I(I-1)/2}$$

where:

$$\mathbf{S}_b = \operatorname{diag}\left( \frac{\partial P_1}{\partial v_2},\ \dots,\ \frac{\partial P_1}{\partial v_I},\ \frac{\partial P_2}{\partial v_3},\ \dots,\ \frac{\partial P_{I-1}}{\partial v_I} \right)$$

Thus, negative semi-definiteness requires the square, diagonal matrix $\mathbf{S}_b$, of dimension $I(I-1)/2$, to be positive semi-definite, which invariably means that $\partial P_i / \partial v_j \ge 0$ for any $i = 1, \dots, I$, $j > i$.
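The equivalence between the quadratic form and the sum over pairs can be checked numerically. The following is a minimal sketch, assuming a multinomial logit probability system with utilities $U_i = -v_i + \varepsilon_i$ as a convenient example of a symmetric system whose derivatives sum to zero; the function names and the test vectors are illustrative and do not come from the paper.

```python
import math

def mnl_probs(v):
    # Multinomial logit with utilities U_i = -v_i + eps_i (v acts as a cost).
    w = [math.exp(-x) for x in v]
    s = sum(w)
    return [x / s for x in w]

def slutsky(v):
    # S[i][j] = dP_i/dv_j for the logit: P_i*P_j off the diagonal,
    # -P_i*(1 - P_i) on the diagonal, so S is symmetric with zero row sums.
    p = mnl_probs(v)
    n = len(p)
    return [[p[i] * p[j] if i != j else -p[i] * (1.0 - p[i]) for j in range(n)]
            for i in range(n)]

def quad_form(S, x):
    # x' S x computed directly.
    n = len(x)
    return sum(S[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

def pairwise_form(S, x):
    # -sum_{i} sum_{j>i} (x_i - x_j)^2 * dP_i/dv_j
    n = len(x)
    return -sum((x[i] - x[j]) ** 2 * S[i][j]
                for i in range(n) for j in range(i + 1, n))

v = [0.3, -1.2, 0.8, 2.0]      # arbitrary systematic costs
x = [1.5, -0.7, 0.2, 3.1]      # arbitrary test vector
S = slutsky(v)
lhs = quad_form(S, x)
rhs = pairwise_form(S, x)
print(lhs, rhs)
```

Both expressions agree and are non-positive here because every off-diagonal derivative of the logit is non-negative, which is exactly the condition derived above.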

The Daly & Zachary conditions for this general case were introduced in section 4.1. From the previous development, we can see that D&Z–5, the second part of D&Z–3 and condition D&Z–4 (for $m = 2$) guarantee the negative semi-definiteness of the Slutsky matrix. In addition, the first part of D&Z–3 and D&Z–2 are properties expected to be met by $\mathbf{P}(\mathbf{v}^p)$, the representative agent probability system, along with the whole of condition D&Z–4, i.e., for $m = I$, since in that way we ensure that the representative agent probability system could itself originate from a process of maximisation between I alternatives with $-\mathbf{v}^p$ as systematic utility.

In any case, we can conclude that compatibility with utility maximisation by a representative agent yields weaker conditions on the choice probabilities than compatibility with individual utility maximisation only if the latter is not forced to be either translationally invariant or representative of a process of maximisation between I alternatives.

Summarising, in this section on the integrability of demand functions we have analysed random utility maximisation from the perspective of demand functions, found that D&Z guarantees the negative semi-definiteness of the Slutsky matrix for the general case, and discussed whether the conditions required to ensure compatibility with utility maximisation for a representative agent are weaker than, or equal to, those required for each individual, depending on the number of conditions that can be imposed on the latter.


5 EXISTENCE OF SYMMETRY IN PREFERENCE MODELS

As introduced in the previous section, some studies analysing the compatibility of models with random utility maximisation (RUM), such as Daly & Zachary (1976), McFadden (1981) or, more recently, Börsch-Supan (1990) and Koning & Ridder (1994, 2003), impose symmetry of the probability system (along with other conditions) as necessary and sufficient to guarantee that such systems are compatible with RUM. We have stated, however, through Theorem 1 and corollary 1 in section 2, that this symmetry condition is not central to proving consistency with RUM, since it follows automatically from the definition of this consistency and the existence of a proper distribution for the random terms.

The issue of symmetrical probability systems is governed by the relation between these probabilities and the random terms of the utilities. Two approaches have been taken in the literature to define this relation. The first, contained mainly in Daly & Zachary (1976), is based on considering I different joint distributions of $I - 1$ random term differences. This is the approach followed in Theorem 1 and its corollary in this paper, and the one employed to prove symmetry of the probability systems above. The other approach, mainly introduced in the RUM implementation of McFadden (1978), consists of considering only one distribution of the I random terms. In the following sub-sections we prove that symmetry holds irrespective of the approach considered.

5.1 Symmetrical probability systems based on the distribution of random terms differences

In this approach we make use of the definition of RUM, which states that a probability system represents a RUM choice process if it can be related to the utilities of the alternatives as $P_j = \Pr(-v_j + \varepsilon_j \ge -v_k + \varepsilon_k,\ k = 1, \dots, I)$, $j = 1, \dots, I$. One of the objectives of this paper has been to derive the minimum set of conditions to be imposed on a probability system $P_j(v_1, \dots, v_I)$, $j = 1, \dots, I$, so as to ensure that it is consistent with RUM (conditions S-1, S-2 and S-3 in corollary 1). We shall now prove that symmetry is implicitly included within these conditions. We start with the binary choice case, where it is easily proved:

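$$\frac{\partial P_1(v_1, v_2)}{\partial v_2} = \frac{\partial}{\partial v_2} \int_{-\infty}^{v_2 - v_1} \frac{\partial P_1(0, \eta_2)}{\partial \eta_2}\, d\eta_2 = \left. \frac{\partial P_1(0, \eta_2)}{\partial \eta_2} \right|_{\eta_2 = v_2 - v_1}$$

$$\frac{\partial P_2(v_1, v_2)}{\partial v_1} = \frac{\partial}{\partial v_1} \left( 1 - \int_{-\infty}^{v_2 - v_1} \frac{\partial P_1(0, \eta_2)}{\partial \eta_2}\, d\eta_2 \right) = \left. \frac{\partial P_1(0, \eta_2)}{\partial \eta_2} \right|_{\eta_2 = v_2 - v_1}$$

where $\eta_2 = \varepsilon_2 - \varepsilon_1$ is the difference of random terms taking alternative 1 as reference.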
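The binary argument can be illustrated numerically. This is a minimal sketch, assuming (purely for illustration) that the difference $\eta_2 = \varepsilon_2 - \varepsilon_1$ follows a standard normal distribution; any proper distribution would serve, and the helper names are hypothetical.

```python
import math

def cdf(t):
    # Standard normal CDF, used here as an arbitrary proper distribution
    # for the difference eta_2 = eps_2 - eps_1 (an assumption of this sketch).
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def p1(v1, v2):
    # P_1 = Pr(-v1 + eps_1 >= -v2 + eps_2) = Pr(eta_2 <= v2 - v1)
    return cdf(v2 - v1)

def p2(v1, v2):
    # The two probabilities sum to one.
    return 1.0 - p1(v1, v2)

def central_diff(f, x, h=1e-5):
    # Central finite-difference approximation of f'(x).
    return (f(x + h) - f(x - h)) / (2.0 * h)

v1, v2 = 0.4, -0.9
dP1_dv2 = central_diff(lambda t: p1(v1, t), v2)
dP2_dv1 = central_diff(lambda t: p2(t, v2), v1)
# Both derivatives should equal the density of eta_2 evaluated at v2 - v1.
density = math.exp(-0.5 * (v2 - v1) ** 2) / math.sqrt(2.0 * math.pi)
print(dP1_dv2, dP2_dv1, density)
```

The two cross derivatives coincide with the density of the difference evaluated at $v_2 - v_1$, so symmetry follows from the existence of a proper distribution alone.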

We continue with the three-alternative case, where the same holds for any two different indexes in {1, 2, 3}. Taking alternative 1 as reference, with $\eta = \varepsilon_2 - \varepsilon_1$ and $\varpi = \varepsilon_3 - \varepsilon_1$, we have for alternatives 2 and 3:

$$P_2(\mathbf{v}) = \int_{v_2 - v_1}^{\infty} \int_{-\infty}^{\eta - v_2 + v_3} \frac{\partial^2 P_1(0, \eta, \varpi)}{\partial \eta\, \partial \varpi}\, d\varpi\, d\eta \quad \Rightarrow \quad \frac{\partial P_2}{\partial v_3} = \int_{v_2 - v_1}^{\infty} \left. \frac{\partial^2 P_1(0, \eta, \varpi)}{\partial \eta\, \partial \varpi} \right|_{\varpi = \eta - v_2 + v_3} d\eta$$

$$P_3(\mathbf{v}) = \int_{v_3 - v_1}^{\infty} \int_{-\infty}^{\varpi - v_3 + v_2} \frac{\partial^2 P_1(0, \eta, \varpi)}{\partial \eta\, \partial \varpi}\, d\eta\, d\varpi \quad \Rightarrow \quad \frac{\partial P_3}{\partial v_2} = \int_{v_3 - v_1}^{\infty} \left. \frac{\partial^2 P_1(0, \eta, \varpi)}{\partial \eta\, \partial \varpi} \right|_{\eta = \varpi - v_3 + v_2} d\varpi$$

Applying the change of variables $\varpi = \eta - v_2 + v_3$ to the last integral, whose lower limit becomes $\eta = v_2 - v_1$, we obtain:

$$\frac{\partial P_3}{\partial v_2} = \int_{v_2 - v_1}^{\infty} \left. \frac{\partial^2 P_1(0, \eta, \varpi)}{\partial \eta\, \partial \varpi} \right|_{\varpi = \eta - v_2 + v_3} d\eta = \frac{\partial P_2}{\partial v_3}$$

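Finally, the proof for the general case follows a similar procedure, arriving at $\partial P_j / \partial v_k = \partial P_k / \partial v_j$ by expressing the probabilities $P_j, P_k$ in terms of a third probability, $P_i$:

$$P_j(\mathbf{v}) = \int_{v_j - v_i}^{\infty} \int_{-\infty}^{\eta^i_j - v_j + v_1} \cdots \int_{-\infty}^{\eta^i_j - v_j + v_I} \frac{\partial^{I-1} P_i(\boldsymbol{\eta}^i, 0)}{\partial \boldsymbol{\eta}^i}\, d\boldsymbol{\eta}^i_{-j}\, d\eta^i_j, \quad i \ne j$$

$$P_k(\mathbf{v}) = \int_{v_k - v_i}^{\infty} \int_{-\infty}^{\eta^i_k - v_k + v_1} \cdots \int_{-\infty}^{\eta^i_k - v_k + v_I} \frac{\partial^{I-1} P_i(\boldsymbol{\eta}^i, 0)}{\partial \boldsymbol{\eta}^i}\, d\boldsymbol{\eta}^i_{-k}\, d\eta^i_k, \quad i \ne k$$

where $\boldsymbol{\eta}^i$ collects the $I - 1$ differences $\eta^i_l = \varepsilon_l - \varepsilon_i$, $l \ne i$, and the inner integrals run over all of these differences except the one in the outer integral.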

Notice that the proof of symmetry given above rests on defining the probabilities through the joint distributions of random term differences, that is, through distributions that allow, for instance, one of the random terms to have a discrete distribution, as long as the other $I - 1$ are absolutely continuous. In this regard, the proof of symmetry contained in Koning & Ridder (2003), aside from not being completely exact (as we show in section 5.2), is not as general as the one presented above, since they need the distribution F to be absolutely continuous for any $(\varepsilon_1, \dots, \varepsilon_I) \in \mathbb{R}^I$. Moreover, we maintain that this symmetry condition should not be considered at the same level as conditions S-1, S-2 and S-3 of the corollary of Theorem 1 in this paper, since these conditions imply the existence of symmetry, but not vice versa.

5.2 Symmetrical probability systems based on the distribution of random terms

This second approach to RUM models, followed by references such as McFadden (1978) or Koning & Ridder (2003), states that the probabilities are related to the joint distribution of the random terms in the following way:¹²

$$P_i(\mathbf{v}) = \int_{-\infty}^{\infty} \frac{\partial F(\varepsilon_i - v_i + v_1, \dots, \varepsilon_i - v_i + v_I)}{\partial \varepsilon_i}\, d\varepsilon_i$$

Since the previous integral would trivially yield a unitary value as solution, we start considering that they might be referring instead to the following relation:

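$$P_i(\mathbf{v}) = \int_{-\infty}^{\infty} \int_{-\infty}^{\varepsilon_i - v_i + v_1} \cdots \int_{-\infty}^{\varepsilon_i - v_i + v_I} \frac{\partial^I F(\varepsilon_1, \dots, \varepsilon_I)}{\partial \varepsilon_1 \cdots \partial \varepsilon_I}\, d\varepsilon_1 \cdots d\varepsilon_I\, d\varepsilon_i = \int_{-\infty}^{\infty} \left. \frac{\partial F(\varepsilon_1, \dots, \varepsilon_I)}{\partial \varepsilon_i} \right|_{\kappa_i} d\varepsilon_i$$

where the inner integrals run over $\varepsilon_j$, $j \ne i$, and the region where the derivative in the last term is evaluated is denoted as $\kappa_i$:

$$\kappa_i \equiv \{ \varepsilon_j = \varepsilon_i - v_i + v_j;\ j = 1, \dots, I;\ j \ne i \}$$

This initial approximation to the joint distribution of the I random terms of the utilities is enough to observe that the proof of symmetry included in Koning & Ridder (2003) is not completely exact, since they state that: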

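$$\frac{\partial P_j(\mathbf{v})}{\partial v_i} = \int_{-\infty}^{\infty} \frac{\partial^2 F(\varepsilon_j - v_j + v_1, \dots, \varepsilon_j - v_j + v_i, \dots, \varepsilon_j - v_j + v_I)}{\partial \varepsilon_i\, \partial \varepsilon_j}\, d\varepsilon_j$$

$$= \int_{-\infty}^{\infty} \frac{\partial^2 F(\varepsilon_i - v_i + v_1, \dots, \varepsilon_i - v_i + v_j, \dots, \varepsilon_i - v_i + v_I)}{\partial \varepsilon_i\, \partial \varepsilon_j}\, d\varepsilon_i = \frac{\partial P_i(\mathbf{v})}{\partial v_j}$$

where the second equality uses the change of variables $\varepsilon_j - v_j = \varepsilon_i - v_i$, $d\varepsilon_j = d\varepsilon_i$.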

Notice that in the first integral the variable $\varepsilon_i$ only appears as a variable with respect to which we take the derivative, and therefore we could only prove the degenerate case $\partial P_j(\mathbf{v}) / \partial v_i = 0 = \partial P_i(\mathbf{v}) / \partial v_j$.

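However, considering the previous probability expressions, which avoid a trivial unitary value, it can be briefly shown why symmetry holds when this probability system is consistent with RUM, that is, when there exists a proper distribution F such that:

$$P_i(\mathbf{v}) = \int_{-\infty}^{\infty} \int_{-\infty}^{\varepsilon_i - v_i + v_1} \cdots \int_{-\infty}^{\varepsilon_i - v_i + v_j} \cdots \int_{-\infty}^{\varepsilon_i - v_i + v_I} \frac{\partial^I F(\varepsilon_1, \dots, \varepsilon_I)}{\partial \varepsilon_1 \cdots \partial \varepsilon_I}\, d\varepsilon_1 \cdots d\varepsilon_j \cdots d\varepsilon_I\, d\varepsilon_i$$

$$P_j(\mathbf{v}) = \int_{-\infty}^{\infty} \int_{-\infty}^{\varepsilon_j - v_j + v_1} \cdots \int_{-\infty}^{\varepsilon_j - v_j + v_i} \cdots \int_{-\infty}^{\varepsilon_j - v_j + v_I} \frac{\partial^I F(\varepsilon_1, \dots, \varepsilon_I)}{\partial \varepsilon_1 \cdots \partial \varepsilon_I}\, d\varepsilon_1 \cdots d\varepsilon_i \cdots d\varepsilon_I\, d\varepsilon_j$$

In each expression the outer integral runs over the random term of the chosen alternative, and the inner integrals over the remaining $I - 1$ random terms.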

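The proof uses the Leibniz rule and the change of variables $\varepsilon_j - v_j = \varepsilon_i - v_i$:

$$\frac{\partial P_i(\mathbf{v})}{\partial v_j} = \int_{-\infty}^{\infty} \int_{-\infty}^{\varepsilon_i - v_i + v_1} \cdots \int_{-\infty}^{\varepsilon_i - v_i + v_I} \left. \frac{\partial^I F(\varepsilon_1, \dots, \varepsilon_I)}{\partial \varepsilon_1 \cdots \partial \varepsilon_I} \right|_{\varepsilon_j = \varepsilon_i - v_i + v_j} \prod_{l \ne i, j} d\varepsilon_l\, d\varepsilon_i = \int_{-\infty}^{\infty} \left. \frac{\partial^2 F(\varepsilon_1, \dots, \varepsilon_I)}{\partial \varepsilon_i\, \partial \varepsilon_j} \right|_{\kappa_i} d\varepsilon_i$$

$$\frac{\partial P_j(\mathbf{v})}{\partial v_i} = \int_{-\infty}^{\infty} \int_{-\infty}^{\varepsilon_j - v_j + v_1} \cdots \int_{-\infty}^{\varepsilon_j - v_j + v_I} \left. \frac{\partial^I F(\varepsilon_1, \dots, \varepsilon_I)}{\partial \varepsilon_1 \cdots \partial \varepsilon_I} \right|_{\varepsilon_i = \varepsilon_j - v_j + v_i} \prod_{l \ne i, j} d\varepsilon_l\, d\varepsilon_j$$

$$= \int_{-\infty}^{\infty} \int_{-\infty}^{\varepsilon_i - v_i + v_1} \cdots \int_{-\infty}^{\varepsilon_i - v_i + v_I} \left. \frac{\partial^I F(\varepsilon_1, \dots, \varepsilon_I)}{\partial \varepsilon_1 \cdots \partial \varepsilon_I} \right|_{\varepsilon_j = \varepsilon_i - v_i + v_j} \prod_{l \ne i, j} d\varepsilon_l\, d\varepsilon_i = \int_{-\infty}^{\infty} \left. \frac{\partial^2 F(\varepsilon_1, \dots, \varepsilon_I)}{\partial \varepsilon_i\, \partial \varepsilon_j} \right|_{\kappa_i} d\varepsilon_i = \frac{\partial P_i(\mathbf{v})}{\partial v_j}$$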

The proof relies implicitly upon the non-defectiveness of the joint distribution of the random terms ($\lim_{\varepsilon_k \to -\infty} F(\varepsilon_1, \dots, \varepsilon_I) = 0$) and its absolute continuity (which allows its complete differentiability), both being guaranteed by the consistency of the probability system with RUM. We have thus proved symmetry under this second approach, which considers only one joint distribution of the I random terms (function F). Notice that symmetry would also hold even if the distribution function F were non-defective and not properly specified (for instance, with negative densities associated with some values of the random terms), which is another sign that consistency with RUM implies symmetry, but not vice versa.

Summarising this section, we have concluded that a probability system that meets the minimum set of conditions guaranteeing compliance with RUM (conditions S-1, S-2 and S-3 in corollary 1) is symmetrical, and therefore that checking whether a probability system is symmetrical is not informative about whether it is correctly rooted in RUM theory. In proving so, we have employed two different approaches and have pointed out some unclear developments in the literature.
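The statement that symmetry holds even when the underlying distribution is not proper can be illustrated with a finite-difference Jacobian. This is a minimal sketch, assuming a two-level nested logit of the form discussed in section 7 with a ratio $\mu/\lambda_j = 1.5$, that is, outside the unit interval; the function names, the nest sizes and the parameter values are illustrative choices, not taken from the paper.

```python
import math

def nl_probs(v, lam, mu):
    # Two-level nested logit with U = -v + eps; v[j][k] is the systematic
    # cost of alternative k in branch j.
    sums = [sum(math.exp(-lam[j] * x) for x in v[j]) for j in range(len(v))]
    tops = [sums[j] ** (mu / lam[j]) for j in range(len(v))]
    denom = sum(tops)
    return [[math.exp(-lam[j] * x) / sums[j] * tops[j] / denom for x in v[j]]
            for j in range(len(v))]

def flat_probs(flat_v, sizes, lam, mu):
    # Rebuild the nested structure from a flat list and flatten the result.
    v, pos = [], 0
    for n in sizes:
        v.append(flat_v[pos:pos + n])
        pos += n
    return [p for branch in nl_probs(v, lam, mu) for p in branch]

def jacobian(flat_v, sizes, lam, mu, h=1e-5):
    # J[r][c] approximates dP_r/dv_c by central differences.
    n = len(flat_v)
    J = [[0.0] * n for _ in range(n)]
    for c in range(n):
        up = list(flat_v)
        up[c] += h
        dn = list(flat_v)
        dn[c] -= h
        pu = flat_probs(up, sizes, lam, mu)
        pd = flat_probs(dn, sizes, lam, mu)
        for r in range(n):
            J[r][c] = (pu[r] - pd[r]) / (2.0 * h)
    return J

sizes = [2, 2]                    # two branches with two alternatives each
flat_v = [0.0, 0.5, -1.0, 0.3]
lam = [1.0, 1.0]
mu = 1.5                          # mu/lam = 1.5, outside the unit interval
J = jacobian(flat_v, sizes, lam, mu)
max_asym = max(abs(J[r][c] - J[c][r]) for r in range(4) for c in range(4))
min_offdiag = min(J[r][c] for r in range(4) for c in range(4) if r != c)
print(max_asym, min_offdiag)
```

In this example the Jacobian is symmetric up to numerical error although at least one cross derivative is negative, so symmetry alone says nothing about non-negativity of the implied density.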

6 GENERALIZED EXTREME VALUE MODELS COMPLIANCE WITH RUM

In previous sections we have compared the widespread set of necessary and sufficient conditions imposed by Daly & Zachary (1976) on a probability system to guarantee its compatibility with RUM with the minimum set of necessary and sufficient conditions that serves the same purpose, contained in corollary 1. This direct analysis of the probabilities is one of the implementations of the basic RUM of Block & Marschak (1960), once the utilities are dissected into random and systematic parts ($U_i = -v_i + \varepsilon_i$). We now turn our attention to the other one,¹³ the implementation focused on the joint distribution of the random terms leading to the probabilities, whose widespread formulation is owed to McFadden (1978). That work starts by assuming a multivariate¹⁴ extreme value distribution for the random terms of the utilities:

$$F(\varepsilon_1, \dots, \varepsilon_I) = \exp\left( -G(e^{-\varepsilon_1}, \dots, e^{-\varepsilon_I}) \right), \quad \forall \boldsymbol{\varepsilon} \in \mathbb{R}^I$$

and from there up to four conditions are imposed over this distribution so as to guarantee that they can yield a probability system compatible with RUM:

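GEV–1: $G(e^{-\boldsymbol{\varepsilon}}) \ge 0$, $\forall \boldsymbol{\varepsilon} \in \mathbb{R}^I$

GEV–2: $G(\alpha e^{-\boldsymbol{\varepsilon}}) = \alpha^{\mu} G(e^{-\boldsymbol{\varepsilon}})$, $\forall \alpha \ge 0$, $\forall \boldsymbol{\varepsilon} \in \mathbb{R}^I$

GEV–3: $\lim_{\varepsilon_j \to -\infty} G(e^{-\boldsymbol{\varepsilon}}) = \infty$, $j = 1, \dots, I$, $\forall \boldsymbol{\varepsilon} \in \mathbb{R}^I$

GEV–4: $(-1)^{k-1} \dfrac{\partial^k G(e^{-\boldsymbol{\varepsilon}})}{\partial e^{-\varepsilon_{i_1}} \cdots \partial e^{-\varepsilon_{i_k}}} \ge 0$, $\forall \{i_1, \dots, i_k\} \subseteq \{1, \dots, I\}$, $i_1 \ne \cdots \ne i_k$, $\forall \boldsymbol{\varepsilon} \in \mathbb{R}^I$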

The purpose of these conditions is twofold: to ensure that F is a proper distribution function (consistency with RUM), and to ensure that there is a closed-form expression for the probability with which each alternative will be chosen, that is, for the probability that its utility is the highest amongst the available ones (RUM definition). This closed-form expression depends on the function G and defines the widely used family of GEV models:

$$P_i(\mathbf{v}) = \frac{e^{-v_i}\, \partial G(e^{-v_1}, \dots, e^{-v_I}) / \partial e^{-v_i}}{\mu\, G(e^{-v_1}, \dots, e^{-v_I})}, \quad i = 1, \dots, I$$
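The closed-form GEV expression can be evaluated for any generating function G that is homogeneous of degree $\mu$. The following is a minimal sketch under stated assumptions: the derivative of G is approximated by central differences, and the generator $G(\mathbf{y}) = \sum_i y_i^{\mu}$, which reproduces a multinomial logit with scale $\mu$, is chosen only as a check. The function names are hypothetical.

```python
import math

def gev_probs(G, v, mu, h=1e-6):
    # y_i = exp(-v_i) because utilities are written as U_i = -v_i + eps_i.
    y = [math.exp(-x) for x in v]
    g = G(y)
    probs = []
    for i in range(len(y)):
        step = h * y[i]                       # relative step for stability
        up = list(y)
        up[i] += step
        dn = list(y)
        dn[i] -= step
        dG = (G(up) - G(dn)) / (2.0 * step)   # numerical dG/dy_i
        probs.append(y[i] * dG / (mu * g))    # P_i = y_i * G_i / (mu * G)
    return probs

mu = 0.8

def logit_G(y):
    # Homogeneous of degree mu; yields a multinomial logit with scale mu.
    return sum(t ** mu for t in y)

v = [0.2, -0.5, 1.1]
p = gev_probs(logit_G, v, mu)
w = [math.exp(-mu * x) for x in v]
p_closed = [t / sum(w) for t in w]            # logit closed form for comparison
print(p, p_closed)
```

Because G is homogeneous of degree $\mu$, Euler's theorem gives $\sum_i y_i\, \partial G / \partial y_i = \mu G$, which is why the probabilities sum to one without any further normalisation.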

6.1 Relaxation of the GEV conditions while keeping consistency with RUM. Theorem 2.

The previous four GEV conditions are relaxed in Ibáñez & Batley (2005), where they are shown to be sufficient, but not necessary, to ensure the compatibility with RUM of a model whose random terms follow a multivariate extreme value distribution. Next we present less binding necessary and sufficient conditions applicable to these general extreme value models. Before doing so, though, and since the GEV approach does not focus on random term differences but on the joint distribution of all the random terms, we need to slightly adjust the implications of Theorem 1. Whilst Theorem 1 contains the minimum set of necessary and sufficient conditions to be imposed on distributions of random term differences to guarantee compatibility with RUM, the following theorem does the same when considering only one joint distribution of the I random terms, as the GEV approach does:

Theorem 2: The following conditions L-1 and L-2, imposed over an absolutely continuous real valued function $F: \mathbb{R}^I \to \mathbb{R}$, are necessary and sufficient to guarantee the global compatibility of a set of I probabilities, $P_j(\mathbf{v})$, $j = 1, \dots, I$, with random utility maximisation when the utilities of the I available alternatives are defined as $U_j = -v_j + \varepsilon_j$, $j = 1, \dots, I$:

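L-1: $\partial^I F(\boldsymbol{\varepsilon}) / \partial \boldsymbol{\varepsilon} \ge 0$, $\forall \boldsymbol{\varepsilon} \in \mathbb{R}^I$

L-2: $\lim_{\varepsilon_j \to -\infty} F(\boldsymbol{\varepsilon}) = 0$, $j = 1, \dots, I$; $\lim_{\boldsymbol{\varepsilon} \to \infty \boldsymbol{\iota}_I} F(\boldsymbol{\varepsilon}) = 1$

where the variables involved are $\mathbf{v} = (v_1, \dots, v_I)$, $\boldsymbol{\varepsilon} = (\varepsilon_1, \dots, \varepsilon_I)$, $\boldsymbol{\iota}_I = (1, \dots, 1) \in \mathbb{R}^I$, and when the probabilities originate from the real valued function F as follows (the outer integral runs over $\varepsilon_j$ and the inner ones over the remaining random terms):

$$P_j(\mathbf{v}) = \int_{-\infty}^{\infty} \int_{-\infty}^{\varepsilon_j - v_j + v_1} \cdots \int_{-\infty}^{\varepsilon_j - v_j + v_I} \frac{\partial^I F(\boldsymbol{\varepsilon})}{\partial \boldsymbol{\varepsilon}}\, d\boldsymbol{\varepsilon}, \quad j = 1, \dots, I$$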

Proof: Given the absolute continuity of the function F, conditions L-1 and L-2 are necessary and sufficient to guarantee that it is a distribution function (Rohatgi, 1976). From here, and just following the definition of the probabilities contained in the theorem and the definition of global compatibility with RUM, we can conclude that conditions L-1 and L-2 are necessary and sufficient to ensure the global compatibility with RUM of the probability system defined in the theorem.

Comparison of Theorems 1 and 2: Notice that if the joint distribution of all the random terms is proper, then the distributions of the random term differences are also proper, though not vice versa; that is, if $F(\boldsymbol{\varepsilon})$ is proper, then the I different distributions $F^i(\boldsymbol{\varepsilon}_{-i} - \varepsilon_i \boldsymbol{\iota}_{I-1})$, $i = 1, \dots, I$, are also proper. Therefore, if Theorem 2 holds then Theorem 1 holds as well, though the latter, which allows the existence of at least one utility with a discrete distribution (converted into a continuous one once differences are taken), does not imply the former.

This greater generality of Theorem 1 can be easily shown by assessing the following two relations, which contain the relation between both theorems ($F(\boldsymbol{\varepsilon})$ is non-defective):

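$$P_j(\mathbf{v}) = P_j(\mathbf{w}_{-j}, 0) = \int_{-\infty}^{\infty} \int_{-\infty}^{\mathbf{w}_{-j} + \varepsilon_j \boldsymbol{\iota}_{I-1}} \frac{\partial^I F(\boldsymbol{\varepsilon})}{\partial \boldsymbol{\varepsilon}}\, d\boldsymbol{\varepsilon}_{-j}\, d\varepsilon_j, \quad j = 1, \dots, I$$

$$\frac{\partial^{I-1} P_j(\mathbf{v})}{\partial \mathbf{v}_{-j}} = \frac{\partial^{I-1} P_j(\mathbf{w}_{-j}, 0)}{\partial \mathbf{w}_{-j}} = \int_{-\infty}^{\infty} \left. \frac{\partial^I F(\boldsymbol{\varepsilon})}{\partial \boldsymbol{\varepsilon}} \right|_{\boldsymbol{\varepsilon}_{-j} = \mathbf{w}_{-j} + \varepsilon_j \boldsymbol{\iota}_{I-1}} d\varepsilon_j, \quad j = 1, \dots, I$$

where the variables are related to the utilities of the alternatives as follows:

$$\mathbf{w}_{-j} = \mathbf{v}_{-j} - v_j \boldsymbol{\iota}_{I-1} \in \mathbb{R}^{I-1}, \qquad \boldsymbol{\varepsilon}_{-j} = \mathbf{w}_{-j} + \varepsilon_j \boldsymbol{\iota}_{I-1} \in \mathbb{R}^{I-1}$$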

Observe now that if Theorem 2 holds (conditions L-1 and L-2 hold) then Theorem 1 holds through its corollary 1 (since conditions S-1, S-2 and S-3 then hold), and notice as well that the reverse cannot be assured. Therefore, since Theorem 2 leads to Theorem 1 but not vice versa, the implementation of RUM based on Theorem 1 and its corollary 1 (which contains a reduction of the number of conditions in Daly & Zachary, 1976) cannot be less general than the implementation of RUM derived from the application of Theorem 2 (which contains as a particular case a reduction and relaxation of the number of conditions in McFadden, 1978). There are thus two arguments supporting the greater generality of Daly & Zachary (1976) with respect to McFadden (1978): the comparison between theorems carried out just above, and the fact that the random terms have to follow a specific type of distribution (multivariate extreme value) in the work of McFadden (1978).

Necessary and sufficient conditions for a multivariate extreme value model to be consistent with RUM: At this point, using Theorem 2 as theoretical framework, we can state that the following are the least binding set of necessary and sufficient conditions to guarantee the compatibility with RUM of a model whose random terms follow a multivariate extreme value distribution $F(\boldsymbol{\varepsilon}) = \exp(-G(e^{-\boldsymbol{\varepsilon}}))$, $\forall \boldsymbol{\varepsilon} \in \mathbb{R}^I$:

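GEV–2n: $G(\mathbf{0}) = 0$

GEV–3: $\lim_{\varepsilon_j \to -\infty} G(e^{-\boldsymbol{\varepsilon}}) = \infty$, $j = 1, \dots, I$, $\forall \boldsymbol{\varepsilon} \in \mathbb{R}^I$

GEV–4n: $\dfrac{\partial^I \exp\left( -G(e^{-\boldsymbol{\varepsilon}}) \right)}{\partial \boldsymbol{\varepsilon}} \ge 0$, $\forall \boldsymbol{\varepsilon} \in \mathbb{R}^I$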

Conditions GEV–2n and GEV–3 guarantee the non-defectiveness of F (condition L-2 in Theorem 2) and condition GEV–4n guarantees the non-negativity of its density (condition L-1 in Theorem 2). The three conditions (GEV–4n, GEV–3 and GEV–2n) are necessary and sufficient to guarantee that the multivariate extreme value distribution F accommodates a probability system compatible with RUM. If we replace condition GEV–4n with the more binding GEV–4 and/or condition GEV–2n with the more binding GEV–2, we obtain sufficient conditions instead, since GEV–2n is guaranteed by the more restrictive requirement of homogeneity in GEV–2 ($G(\mathbf{0}) = G(0 \cdot e^{-\boldsymbol{\varepsilon}}) = 0^{\mu} G(e^{-\boldsymbol{\varepsilon}}) = 0$) and GEV–4n is guaranteed by the more restrictive GEV–4. The last statement is detailed next for a binary case with alternatives i, k:

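GEV–4n: $\dfrac{\partial G(e^{-\boldsymbol{\varepsilon}})}{\partial e^{-\varepsilon_i}} \ge 0$, $\dfrac{\partial G(e^{-\boldsymbol{\varepsilon}})}{\partial e^{-\varepsilon_k}} \ge 0$, $\forall \boldsymbol{\varepsilon} \in \mathbb{R}^I$

$$\frac{\partial G(e^{-\boldsymbol{\varepsilon}})}{\partial e^{-\varepsilon_k}} \frac{\partial G(e^{-\boldsymbol{\varepsilon}})}{\partial e^{-\varepsilon_i}} - \frac{\partial^2 G(e^{-\boldsymbol{\varepsilon}})}{\partial e^{-\varepsilon_k}\, \partial e^{-\varepsilon_i}} \ge 0, \quad \forall \boldsymbol{\varepsilon} \in \mathbb{R}^I$$

GEV–4: $\dfrac{\partial G(e^{-\boldsymbol{\varepsilon}})}{\partial e^{-\varepsilon_i}} \ge 0$, $\dfrac{\partial G(e^{-\boldsymbol{\varepsilon}})}{\partial e^{-\varepsilon_k}} \ge 0$, $\forall \boldsymbol{\varepsilon} \in \mathbb{R}^I$

$$\frac{\partial^2 G(e^{-\boldsymbol{\varepsilon}})}{\partial e^{-\varepsilon_k}\, \partial e^{-\varepsilon_i}} \le 0, \quad \forall \boldsymbol{\varepsilon} \in \mathbb{R}^I$$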
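The binary version of GEV–4n amounts to requiring a non-negative joint density $\partial^2 F / \partial \varepsilon_i \partial \varepsilon_k$. The sketch below evaluates that density by central differences for a hypothetical one-nest generator $G(y_1, y_2) = (y_1^{\lambda} + y_2^{\lambda})^{\mu/\lambda}$; the generator, the evaluation point and the parameter values are assumptions chosen for illustration only.

```python
import math

def F(e1, e2, lam, mu):
    # F = exp(-G(exp(-e1), exp(-e2))) with the one-nest generator
    # G(y1, y2) = (y1**lam + y2**lam) ** (mu / lam).
    s = math.exp(-lam * e1) + math.exp(-lam * e2)
    return math.exp(-(s ** (mu / lam)))

def density(e1, e2, lam, mu, h=1e-3):
    # Central-difference estimate of the mixed derivative d^2 F / (d e1 d e2).
    return (F(e1 + h, e2 + h, lam, mu) - F(e1 + h, e2 - h, lam, mu)
            - F(e1 - h, e2 + h, lam, mu) + F(e1 - h, e2 - h, lam, mu)) / (4.0 * h * h)

# Same evaluation point, two different ratios mu/lam.
inside = density(3.0, 3.0, lam=1.0, mu=0.5)    # mu/lam = 0.5, within [0, 1]
outside = density(3.0, 3.0, lam=1.0, mu=1.5)   # mu/lam = 1.5, above one
print(inside, outside)
```

For this generator the density stays positive when the ratio lies in the unit interval and turns negative at sufficiently large random terms when the ratio exceeds one, in line with the observation that GEV–4 and GEV–4n coincide once the whole real domain is considered.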

Notice, however, that since in this case we are considering the random terms to follow the distribution defined by G in the entire real domain, i.e. $F(\boldsymbol{\varepsilon}) = \exp(-G(e^{-\boldsymbol{\varepsilon}}))$, $\forall \boldsymbol{\varepsilon} \in \mathbb{R}^I$, and given that non-defectiveness holds by GEV–3, we finally have that GEV–4 and GEV–4n represent the same restriction.

Homogeneous multivariate extreme value models (GEV models): Imposing homogeneity on the G function is related to achieving closed-form probability expressions and not to the correct specification of the random terms. Regarding the derivation of this closed form for the probabilities, we think that two results play a central role in it: first, the consideration of what we think to be the correct relation between the derivatives of the distribution function and the probabilities (already discussed in section 5.2) and, second, a change of variables that allows the use of the homogeneity of degree $\mu - 1$ of the first derivative of the G function, describing the variables $\varepsilon_i$ as the sum of two variables ($U_i + v_i$), that is:

$$\left. \frac{\partial G(e^{-\varepsilon_1}, e^{-\varepsilon_2})}{\partial e^{-\varepsilon_2}} \right|_{\varepsilon_i = U_i + v_i} = \frac{\partial G(e^{-v_1} e^{-U_1}, e^{-v_2} e^{-U_2})}{\partial (e^{-v_2} e^{-U_2})} = e^{-(\mu - 1) U_2}\, \frac{\partial G(e^{-v_1} e^{-U_1 + U_2}, e^{-v_2})}{\partial e^{-v_2}}$$

Notice that the use of the homogeneity of degree $\mu - 1$ is feasible because in the derivative we are making a change of variables of the same nature ($\varepsilon_i = U_i + v_i$), which is useful and valid for our purpose of interpreting the previous derivative. Therefore, we have that $\partial(e^{-v_2}) / \partial(e^{-U_2}) = 0$ and the homogeneity of degree $\mu - 1$ is guaranteed. Notice that in our initial description of the random utility model we considered $U_i = -v_i + \varepsilon_i$, with $v_i$ and $\varepsilon_i$ independent (at least translationally) and $\varepsilon_i, U_i$ random, which does not contradict the specific relations employed in the previous derivative calculation.

Treating homogeneity as a sufficient but non-necessary condition to guarantee that a model is compatible with RUM, a homogeneity that is only used to generate useful closed-form probabilities, has allowed us to present the theoretical framework under which non-homogeneous multivariate extreme value models are consistent with RUM.

Finally, and as a summary of the previous sections, we include in Figure 2 a graphical representation of the different sets of conditions imposed over probability systems so as to ensure that they are correctly rooted in RUM theory.

Figure 2. Comparison of sets of conditions that ensure compatibility with RUM

Several conclusions arise from this comparison:

The presentations of RUM models in Daly & Zachary (1976) (D&Z) and McFadden (1978) (GEV) are two different implementations in practice of the basic RUM model of Block & Marschak (1960) (B&M). They both assume a dissection of utilities and a proper distribution for the random terms, though they differ in the theoretical background chosen to argue this propriety. It follows then, somewhat inevitably, that B&M is in this sense more general than either D&Z or GEV.

The GEV family of models is a more restrictive presentation of RUM than D&Z, even when both are relaxed to consider only the minimum set of necessary and sufficient conditions to guarantee consistency with RUM, that is, when considering, respectively, Theorems 1 and 2.

7 CONSISTENCY WITH RUM OF NESTED LOGIT (NL) MODELS

Having studied in previous sections the theoretical background behind proving consistency with random utility maximisation (RUM), we apply the results of Theorem 1 (which we have identified as the most general) to analyse such consistency for the particular case of a nested logit (NL) model. This model, first introduced by Ben-Akiva (1973) and Williams (1977), was developed to overcome the principal limitation intrinsic to the widespread simple or multinomial logit model (McFadden, 1973), mainly by making it possible to consider non-proportional substitution patterns between some of the alternatives available to individuals. This is done by introducing correlations between sub-sets of alternatives, defining a nested choice structure that aims to reproduce observed behaviour better. This implies the definition of a ratio of parameters, denoted as dissimilarity parameters or inclusive values, which appear explicitly in the probability expressions and characterise the mentioned correlation. We consider two-level and three-level NL models, thus allowing for correlation between alternatives sharing the same branch and/or trunk. The aim of our presentation is to show that these dissimilarity parameters can effectively lie beyond the unit interval while keeping consistency with RUM, which is a more general result than the one generally accepted in the literature, namely that these parameters are restricted to the unit interval.

7.1 Two-level NL consistency with RUM

This model considers that the I alternatives available to an individual in a choice situation are grouped in J sub-sets or branches, within each of which a correlation between alternatives is permitted, with complete independence between the utilities of alternatives belonging to different branches. This modelling of the choice process is presented in Figure 3, where $(k, j)$ refers to the k-th alternative in branch j.

[Figure: tree diagram with branches j = 1,…,J at the upper level and alternatives k = 1,…,K|j within each branch at the lower level]

Figure 3. Notation for a two-level nested logit model

The probability system describing this two-level NL, considering the utilities to be dissected into a random and a systematic part, $U_{k,j} = -v_{k,j} + \varepsilon_{k,j}$, $\forall k, j$, and a RUM-consistent behaviour from individuals, $P_{k,j} = \Pr(U_{k,j} \ge U_{l,m},\ \forall l, m)$, was first introduced by Williams (1977) and is the following:¹⁵

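$$P_{k,j} = \frac{e^{-\lambda_j v_{k,j}}}{\sum_{l=1}^{K|j} e^{-\lambda_j v_{l,j}}} \cdot \frac{\left( \sum_{l=1}^{K|j} e^{-\lambda_j v_{l,j}} \right)^{\mu / \lambda_j}}{\sum_{m=1}^{J} \left( \sum_{l=1}^{K|m} e^{-\lambda_m v_{l,m}} \right)^{\mu / \lambda_m}}, \quad k = 1, \dots, K|j, \quad j = 1, \dots, J$$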
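The probability expression above can be evaluated directly. The following is a minimal sketch of that computation, assuming the sign convention $U = -v + \varepsilon$ used throughout; the function name `nested_logit_probs` and the example values are illustrative and not part of the paper.

```python
import math

def nested_logit_probs(v, lam, mu):
    """Return P[j][k] for a two-level nested logit.

    v[j][k] : systematic cost of alternative k in branch j (U = -v + eps)
    lam[j]  : scale parameter of branch j
    mu      : scale parameter of the upper level
    """
    branch_sums = [sum(math.exp(-lam[j] * x) for x in costs)
                   for j, costs in enumerate(v)]
    branch_terms = [s ** (mu / lam[j]) for j, s in enumerate(branch_sums)]
    total = sum(branch_terms)
    probs = []
    for j, costs in enumerate(v):
        p_branch = branch_terms[j] / total                 # marginal P_j
        probs.append([math.exp(-lam[j] * x) / branch_sums[j] * p_branch
                      for x in costs])                     # P_{k|j} * P_j
    return probs

v = [[0.0, 0.5], [1.0, -0.2, 0.3]]
p_nl = nested_logit_probs(v, lam=[2.0, 1.25], mu=1.0)      # ratios 0.5 and 0.8
p_mnl = nested_logit_probs(v, lam=[1.0, 1.0], mu=1.0)      # ratios equal to one
flat_v = [x for costs in v for x in costs]
mnl_ref = [math.exp(-x) / sum(math.exp(-y) for y in flat_v) for x in flat_v]
print(p_nl)
print(p_mnl)
```

When every ratio $\mu / \lambda_j$ equals one the expression collapses to the multinomial logit, which offers a simple sanity check on any implementation.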

The ratios $\mu / \lambda_j$, $j = 1, \dots, J$, are the inclusive values or dissimilarity parameters measuring the level of correlation between alternatives belonging to the same branch. In previous sections we have argued the greater generality of Theorem 1 (corollary 1) in guaranteeing compatibility with RUM. Therefore, and given that the two-level NL probabilities are absolutely continuous, we only have to ensure that conditions S-1, S-2 and S-3 hold to conclude that the model is correctly rooted in RUM. It is easy to check that the last two conditions (non-defectiveness and translational invariance) hold, whilst S-1 (non-negativity) requires the following:

$$\frac{\partial^{I-1} P_{k,j}(\boldsymbol{\eta}^{(k,j)}, 0)}{\partial \boldsymbol{\eta}^{(k,j)}} \ge 0, \quad \forall \boldsymbol{\eta}^{(k,j)} \in \mathbb{R}^{I-1}$$

Considering two alternatives per branch, this restriction translates into ensuring the non-negativity of the derivative of the probability of choosing alternative $(k, j)$ with respect to the difference of random terms $\eta^{(k,j)}_{l,j} = \varepsilon_{l,j} - \varepsilon_{k,j}$, with $(l, j)$ being the other alternative in the same branch j, that is:

$$\frac{\partial P_{k,j}(\boldsymbol{\eta}^{(k,j)}, 0)}{\partial \eta^{(k,j)}_{l,j}} = \lambda_j\, P_{k,j}\, P_{l|j} \left( 1 - \frac{\mu}{\lambda_j} (1 - P_j) \right) \ge 0, \quad \forall \boldsymbol{\eta}^{(k,j)} \in \mathbb{R}^{I-1}, \quad k \ne l$$

where all the probabilities are evaluated at the vector of random term differences taking alternative $(k, j)$ as reference, $(\boldsymbol{\eta}^{(k,j)}, 0)$, and where $P_{k|j}$ is the probability of choosing alternative k conditional on its belonging to branch j and $P_j$ is the marginal probability of choosing branch j ($P_{k,j} = P_{k|j} P_j$). Thus, for every branch j it has to hold that:

$$\frac{\mu}{\lambda_j} \le \frac{1}{1 - P_j(\boldsymbol{\eta}^{(k,j)}, 0)}, \quad \forall \boldsymbol{\eta}^{(k,j)} \in \mathbb{R}^{I-1} \quad \Rightarrow \quad 0 \le \frac{\mu}{\lambda_j} \le 1$$
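The role of the bound can be seen numerically: the sign of the within-branch cross derivative is governed by the term $1 - (\mu/\lambda_j)(1 - P_j)$, which becomes negative for a ratio above one as soon as $P_j$ is small enough. The sketch below is illustrative only; it differentiates with respect to $v_{l,j}$, which plays the same role as the difference $\eta_{l,j}$, and the function names and numerical values are assumptions.

```python
import math

def branch_quantities(v, lam, mu):
    # Marginal branch probabilities P_j and conditional probabilities P_{k|j}.
    sums = [sum(math.exp(-lam[j] * x) for x in costs) for j, costs in enumerate(v)]
    terms = [s ** (mu / lam[j]) for j, s in enumerate(sums)]
    total = sum(terms)
    p_branch = [t / total for t in terms]
    p_cond = [[math.exp(-lam[j] * x) / sums[j] for x in costs]
              for j, costs in enumerate(v)]
    return p_branch, p_cond

def joint_prob(v, lam, mu, j, k):
    p_branch, p_cond = branch_quantities(v, lam, mu)
    return p_cond[j][k] * p_branch[j]

def cross_derivative(v, lam, mu, j=0, k=0, l=1):
    # Closed form: lam_j * P_{k,j} * P_{l|j} * (1 - (mu/lam_j) * (1 - P_j))
    p_branch, p_cond = branch_quantities(v, lam, mu)
    bracket = 1.0 - (mu / lam[j]) * (1.0 - p_branch[j])
    return lam[j] * p_cond[j][k] * p_branch[j] * p_cond[j][l] * bracket

def numeric_cross_derivative(v, lam, mu, j=0, k=0, l=1, h=1e-5):
    # Central difference of P_{k,j} with respect to v_{l,j}.
    up = [list(c) for c in v]
    up[j][l] += h
    dn = [list(c) for c in v]
    dn[j][l] -= h
    return (joint_prob(up, lam, mu, j, k) - joint_prob(dn, lam, mu, j, k)) / (2.0 * h)

# Branch 2 has much lower costs, so the marginal probability of branch 1 is tiny.
v = [[0.0, 0.0], [-5.0, -5.0]]
lam = [1.0, 1.0]
d_ok = cross_derivative(v, lam, mu=0.5)             # ratio 0.5
d_bad = cross_derivative(v, lam, mu=1.5)            # ratio 1.5
d_bad_fd = numeric_cross_derivative(v, lam, mu=1.5)
print(d_ok, d_bad, d_bad_fd)
```

Since $P_j$ can be made arbitrarily close to zero by varying the utilities over the whole real domain, any ratio above one eventually produces a negative derivative, which is the reasoning behind the unit-interval restriction.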

One might think that the restrictions imposed by S-1 could be relaxed by not considering the entire real $(I-1)$-hyperspace as the domain of variation of the differences of random terms. However, random terms can take any real value by the very definition of random utility models (see, for instance, the basic RUM of Block & Marschak, 1960). Therefore, non-negativity needs to be ensured over this complete real variation of the I random terms involved, which in turn implies for a two-level NL that the inclusive values must lie within the unit interval to guarantee compatibility with utility maximisation. When considering a higher number of alternatives per branch we need to check higher order derivatives, obtaining more binding conditions on the inclusive values, which nevertheless collapse into the restriction $0 \le \mu / \lambda_j \le 1$ once the entire real domain is taken into account.

7.1.1 Two-level NL local consistency with RUM (Herriges & Kling, 1996)

The previous result uses corollary 1 as theoretical background to restrict the inclusive values to the unit interval. In doing so we implicitly require the two-level NL probability expressions derived by Williams (1977) to represent the choice process for all the possible values that the systematic parts of the utilities can take, that is, for any real I-vector $\mathbf{v}$. If we only allow some values for the systematic part of the utilities, that is, if we consider $\mathbf{v} \in \Theta \subset \mathbb{R}^I$, we would not be interested in calculating the probabilities in $\bar{\Theta}$ (the complement of $\Theta$), and therefore the non-negativity condition imposed by means of S-1 only needs to hold in $\Theta$. This is the issue of local consistency with RUM, addressed in section 3, which deals with modifying the distribution of random terms in $\bar{\Theta}$ so as to keep the same probability expressions in $\Theta$ while easing the conditions imposed on such distribution.

The two main studies that have addressed this local consistency with RUM are Börsch-Supan (1990) and Koning & Ridder (2003). We have shown in this paper that the discrete distributions on the boundaries of $\Theta$ used by the former make it impossible to reproduce the same probabilities in $\Theta$ and, regarding the latter, that there is a lack of identification between the distribution of random terms and the probability systems, with different distributions in $\bar{\Theta}$ leading to the same probability systems in $\Theta$.

Moreover, neither of these approaches relates the relaxation it proposes to the distribution of the random terms; that is, we do not know what consequences an increase or decrease of the inclusive values within the permitted interval has on the distributions. In addition, for the cases where the inclusive values are less than one, we cannot keep stating that the correlation between any two alternatives belonging to branch j is $1 - (\mu / \lambda_j)^2$.

Finally, the forecasting power of RUM models is compromised under the previous two approaches, since forecasting generally implies predicting probabilities (market shares), elasticities or marginal rates of substitution at points that might not have been considered when deriving the intervals for the inclusive values, points where the distribution of random terms is inconsistent (B&S) or unidentified (K&R), so that the descriptive power of RUM models cannot be complemented with their predictive one. In any case, the study addressing greater-than-one inclusive values in a nested logit model is Herriges & Kling (1996), which is based on the work of Börsch-Supan (1990), which we have proved inconsistent.

7.2 Three-level NL consistency with RUM

The previous results extend to nested logit models based on three-level structures such as the one shown in Figure 4. In this case, in addition to grouping alternatives in branches, the model also groups branches in trunks, in such a way that only alternatives belonging to different trunks are considered independent. We refer with $(k, j, i)$ to the k-th alternative of the j-th branch in trunk i.


[Figure 4: tree diagram with three levels. Trunks i, i = 1,…,R; branches j|i, j = 1,…,J|i; alternatives k|j,i, k = 1,…,K|j,i.]

Figure 4. Notation for a three-level nested logit model

The probability system associated to this model, once again making use of the dissection of utilities ($U_{k,j,i} = -v_{k,j,i} + \varepsilon_{k,j,i}$) and an expected RUM-consistent behaviour in the individuals ($P_{k,j,i} = \Pr(U_{k,j,i} \geq U_{l,m,n}\ \forall\, l, m, n)$), is the following¹⁶:

$$P_{k,j,i} = \frac{e^{-\lambda_{j|i} v_{k,j,i}}}{\sum_{l=1}^{K|j,i} e^{-\lambda_{j|i} v_{l,j,i}}} \cdot \frac{\left(\sum_{l=1}^{K|j,i} e^{-\lambda_{j|i} v_{l,j,i}}\right)^{\mu_i/\lambda_{j|i}}}{\sum_{m=1}^{J|i} \left(\sum_{l=1}^{K|m,i} e^{-\lambda_{m|i} v_{l,m,i}}\right)^{\mu_i/\lambda_{m|i}}} \cdot \frac{\left[\sum_{m=1}^{J|i} \left(\sum_{l=1}^{K|m,i} e^{-\lambda_{m|i} v_{l,m,i}}\right)^{\mu_i/\lambda_{m|i}}\right]^{\psi/\mu_i}}{\sum_{n=1}^{R} \left[\sum_{m=1}^{J|n} \left(\sum_{l=1}^{K|m,n} e^{-\lambda_{m|n} v_{l,m,n}}\right)^{\mu_n/\lambda_{m|n}}\right]^{\psi/\mu_n}}$$

The ratios $\mu_i/\lambda_{j|i}$, $j = 1, \ldots, J|i$, associated to trunk i, are the inclusive values or dissimilarity parameters measuring the level of correlation between the alternatives grouped under the same branch in this trunk i. In this three-level case we also have the ratios $\psi/\mu_i$, $i = 1, \ldots, R$, which are the inclusive values or dissimilarity parameters measuring the level of proximity between branches located within the same trunk. Following the same arguments shown previously for the two-level NL, we can check that conditions S-2 and S-3 are met by the absolutely continuous probability system presented above, so we only need to check whether meeting condition S-1 imposes extra conditions on the inclusive values of this system, thus finally guaranteeing that the probabilities properly reproduce a RUM choice process. We know that all mixed partial derivatives have to be non-negative, so if we take the one involving two alternatives within the same trunk i but in different branches (j and m), we have that:

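$$\frac{\partial P_{k,j,i}}{\partial \eta^{(k,j,i)}_{l,m,i}} = \mu_i\, P_{k,j,i}\, P_{l|m,i}\, P_{m|i} \left[1 - \frac{\psi}{\mu_i}\left(1 - P_i\right)\right] \geq 0, \qquad \forall\, \boldsymbol{\eta}^{(k,j,i)} \in \mathbb{R}^{I-1}$$

Where all the probabilities are calculated on $(\boldsymbol{\eta}^{(k,j,i)}, 0)$, and $P_i$, $P_{m|i}$ and $P_{l|m,i}$ denote the trunk, branch-within-trunk and alternative-within-branch factors of the probability expression. In terms of the inclusive values this implies that for every trunk i we have that:

$$\frac{\psi}{\mu_i} \leq \frac{1}{1 - P_i\left(\boldsymbol{\eta}^{(k,j,i)}, 0\right)}, \quad \forall\, \boldsymbol{\eta}^{(k,j,i)} \in \mathbb{R}^{I-1} \quad \Rightarrow \quad 0 \leq \frac{\psi}{\mu_i} \leq 1$$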

Considering now the derivative involving two alternatives within the same trunk i and the same branch j, we have that:


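$$\frac{\partial P_{k,j,i}}{\partial \eta^{(k,j,i)}_{l,j,i}} = \lambda_{j|i}\, P_{k,j,i}\, P_{l|j,i} \left\{1 - \frac{\mu_i}{\lambda_{j|i}}\left[1 - P_{j|i}\left(1 - \frac{\psi}{\mu_i}\left(1 - P_i\right)\right)\right]\right\} \geq 0, \qquad \forall\, \boldsymbol{\eta}^{(k,j,i)} \in \mathbb{R}^{I-1}$$

Which in terms of the inclusive values, taking into account the previous result ($0 \leq \psi/\mu_i \leq 1$), implies that:

$$\frac{\mu_i}{\lambda_{j|i}} \leq \frac{1}{1 - P_{j|i}\left(\boldsymbol{\eta}^{(k,j,i)}, 0\right) \cdot \left[1 - \dfrac{\psi}{\mu_i} \cdot \left(1 - P_i\left(\boldsymbol{\eta}^{(k,j,i)}, 0\right)\right)\right]}, \quad \forall\, \boldsymbol{\eta}^{(k,j,i)} \in \mathbb{R}^{I-1} \quad \Rightarrow \quad 0 \leq \frac{\mu_i}{\lambda_{j|i}} \leq 1$$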

Thus we have shown the necessity of all the dissimilarity parameters lying within unit intervals to guarantee compatibility of the three-level NL with RUM, at least when we consider that the systematic part of the utilities can take any real value ($\mathbf{v} \in \Theta \equiv \mathbb{R}^I$).

As in the previous two-level NL, the other mixed partial derivatives that also have to be non-negative to meet condition S-1 do not further restrict the unitary interval allowed for the inclusive values, which can be easily checked from a straightforward, though tedious, derivation.
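The sign argument above can be illustrated numerically. The following is a minimal sketch, not part of the original derivation: it evaluates the three-level NL probability for utilities of the form U = -v + ε and approximates, by central finite differences, the cross derivative of one probability with respect to the systematic part of an alternative in the same trunk but a different branch. The function names, the nested-list data layout and the numerical values are all hypothetical choices made for illustration.

```python
import copy
import math


def nl3_probability(v, lam, mu, psi, i, j, k):
    """Three-level NL probability P_{k,j,i} for utilities U = -v + eps.

    v[i][j][k] holds the systematic parts; lam[i][j], mu[i] and psi are the
    parameters of the three levels (hypothetical data layout).
    """
    def branch_sum(n, m):
        return sum(math.exp(-lam[n][m] * x) for x in v[n][m])

    def trunk_sum(n):
        return sum(branch_sum(n, m) ** (mu[n] / lam[n][m])
                   for m in range(len(v[n])))

    top = sum(trunk_sum(n) ** (psi / mu[n]) for n in range(len(v)))
    p_alt = math.exp(-lam[i][j] * v[i][j][k]) / branch_sum(i, j)
    p_branch = branch_sum(i, j) ** (mu[i] / lam[i][j]) / trunk_sum(i)
    p_trunk = trunk_sum(i) ** (psi / mu[i]) / top
    return p_alt * p_branch * p_trunk


def cross_derivative(v, lam, mu, psi, target, other, h=1e-5):
    """Central finite difference of P_target with respect to v_other."""
    i, j, k = other
    up, down = copy.deepcopy(v), copy.deepcopy(v)
    up[i][j][k] += h
    down[i][j][k] -= h
    return (nl3_probability(up, lam, mu, psi, *target)
            - nl3_probability(down, lam, mu, psi, *target)) / (2 * h)


# Illustrative data: two trunks, two branches each, two alternatives each.
# Trunk 0 is made unattractive (large v) so that its probability is small.
v = [[[2.0, 2.0], [2.0, 2.0]], [[0.0, 0.0], [0.0, 0.0]]]
lam = [[1.0, 1.0], [1.0, 1.0]]
mu = [1.0, 1.0]

total = sum(nl3_probability(v, lam, mu, 0.5, i, j, k)
            for i in range(2) for j in range(2) for k in range(2))
d_inside = cross_derivative(v, lam, mu, 0.5, (0, 0, 0), (0, 1, 0))   # psi/mu_i = 0.5
d_outside = cross_derivative(v, lam, mu, 2.0, (0, 0, 0), (0, 1, 0))  # psi/mu_i = 2
print(total, d_inside, d_outside)
```

With these illustrative values the probabilities add up to one, the cross derivative is positive when the ratio psi/mu_i equals 0.5, and it turns negative when the ratio equals 2 and the trunk has a small probability, which is the situation that condition S-1 rules out.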

7.2.1 Three-level NL local consistency with RUM (Gil-Moltó & Risa, 2004)

The work of Gil-Moltó & Risa (2004) is the first to address the potential feasibility of greater-than-one values for the dissimilarity parameters, understanding feasibility in terms of consistency with RUM. They argue that the conditions imposed by requiring the derivatives of the probabilities with respect to the systematic parts to be non-negative for every real value that these parts can take are unnecessarily binding. They directly translate the arguments in Börsch-Supan (1990) and proceed to present compact results associated to the tedious derivation of the probabilities for this three-level NL. We have proved in section 3.1, however, that this work of Börsch-Supan (1990) does not guarantee the rooting of a model in RUM when the derivatives of the probabilities are only ensured to be non-negative at the values that the systematic parts of the utilities can take. At most, we could use the arguments in Koning & Ridder (2003) on local compatibility with RUM to give some theoretical background to this relaxation of the inclusive values, though even then, and as introduced in section 3.2, we would have many distributions of random terms leading to the same probability system, which causes problems when using the model to forecast future scenarios. Moreover, the introduction of a single extra observation, no matter how many observations we already have, could change the feasibility of the estimated inclusive values (for the widespread case of considering that the inclusive values are the same across individuals, as opposed to the covariance heterogeneity case first developed in Bhat, 1997).


Therefore we conclude that the relaxations carried out in Gil-Moltó & Risa (2004) are inexact and that, in the most favourable scenario, they carry some drawbacks regarding the distribution of the utilities in the entire real domain (the random terms in a NL vary over the entire real domain).

7.3 Dissimilarity parameters for NL models under GEV approach

The nested logit model can also be derived under the GEV presentation of RUM contained in McFadden (1978), which we have proved above to be a particularisation of the more general presentation of Daly & Zachary (1976) and, specifically, of the reduction of the latter contained in corollary 1 of this paper. This particularisation, to which we have devoted Theorem 2, cannot then lead to any relaxation of the unitary interval for the inclusive values previously obtained through the application of the more general Theorem 1 (corollary 1). We can easily check this by applying the conditions in Theorem 2 to the joint distribution of the I random utilities available to individuals. Thus, for a two-level NL, the G function defining this distribution and giving rise to the probability expressions in section 7.1 is:

$$G\left(e^{-\mathbf{v}}\right) = \sum_{j=1}^{J} \left(\sum_{k=1}^{K|j} \exp\left(-\lambda_j v_{k,j}\right)\right)^{\mu/\lambda_j}$$

and imposing condition GEV-4, we can easily see that it has to hold that:

$$0 \leq \frac{\mu}{\lambda_j} \leq 1, \qquad j = 1, \ldots, J$$
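The step from the G function to this interval can be sketched as follows. The sketch assumes that condition GEV-4 is the usual sign requirement on the mixed partial derivatives of G (first-order derivatives non-negative, second-order cross derivatives non-positive, as in McFadden, 1978), and introduces the shorthand $y_{k,j} = e^{-v_{k,j}}$ and $S_j = \sum_k y_{k,j}^{\lambda_j}$, which are not notation from the text.

```latex
\begin{aligned}
G(\mathbf{y}) &= \sum_{j=1}^{J} S_j^{\mu/\lambda_j},
  \qquad S_j = \sum_{k=1}^{K|j} y_{k,j}^{\lambda_j}, \\
\frac{\partial G}{\partial y_{k,j}}
  &= \mu\, y_{k,j}^{\lambda_j - 1}\, S_j^{\mu/\lambda_j - 1} \;\geq\; 0, \\
\frac{\partial^2 G}{\partial y_{k,j}\, \partial y_{l,j}}
  &= \mu\, \lambda_j \left(\frac{\mu}{\lambda_j} - 1\right)
     \left(y_{k,j}\, y_{l,j}\right)^{\lambda_j - 1} S_j^{\mu/\lambda_j - 2}
  \;\leq\; 0
  \quad\Longleftrightarrow\quad \frac{\mu}{\lambda_j} \leq 1, \qquad l \neq k .
\end{aligned}
```

Cross derivatives between alternatives in different branches vanish, so under this reading only the within-branch second-order derivative restricts the ratio.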

The same unitary interval restriction applies for a three-level NL, since the G function leading to the probability expressions in section 7.2 is:

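$$G\left(e^{-\mathbf{v}}\right) = \sum_{i=1}^{R} \left[\sum_{j=1}^{J|i} \left(\sum_{k=1}^{K|j,i} \exp\left(-\lambda_{j|i} v_{k,j,i}\right)\right)^{\mu_i/\lambda_{j|i}}\right]^{\psi/\mu_i}$$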

and just imposing condition GEV-4, we can see that it has to hold that:

$$0 \leq \frac{\psi}{\mu_i} \leq 1, \qquad 0 \leq \frac{\mu_i}{\lambda_{j|i}} \leq 1, \qquad j = 1, \ldots, J|i, \quad i = 1, \ldots, R$$

With this we come to the end of section 7, where we have applied the different sets of conditions developed previously to guarantee the compatibility of a model with RUM. We have shown how the inclusive values in two-level and three-level NL models have to lie within the unit interval so as to ensure consistency with RUM for any values of the systematic parts of the utilities (global compatibility), and how these inclusive values can take greater-than-one values depending on the values of marginal and conditional probabilities in the model, and only under an approach that does not define a univocal relation between random terms and probabilities. We present in the next section a further generalization of these results for the case of a two-level NL model, showing how any positive inclusive value leads to a probability system globally compatible with RUM. This relaxation does not invalidate the greater generality of Theorem 1 (and its corollary) presented above, because it is based on deriving the same probabilities for a two-level NL but considering two independent choice processes. The extension to three-level NL is direct and follows the same argumentation.

8 NESTED LOGIT MODELS AND SEQUENTIAL RUM

To derive the probabilities for a two-level NL we therefore consider that individuals follow a two-step choice process when selecting their most preferred alternative among a set of I available ones, with each step associated to one level of the hierarchy. First, we consider that individuals follow a RUM process to decide which alternative in each branch they prefer (for instance, bus or train if having to use public transport, or car or car-pooling if having to use private transport), and second, they follow another RUM process to decide whether they will use public or private transport, considering here for each branch the aggregated benefit of the alternatives belonging to it. We call this two-step implementation of random utility maximisation sequential RUM. It is in line with adopting a behavioural paradigm related to elimination-by-aspects to model the observed choices, that is, in line with defining individuals as sequential decision-makers (for instance, deciding first whether to use public transport or car, and then whether to use train or bus). Thus, if there exist I alternatives available to an individual grouped in J branches, with K|j alternatives in branch j, direct RUM would consider that this individual makes $I(I-1)/2$ comparisons when making a decision, whilst the implementation of sequential RUM would imply only $\sum_{j=1}^{J} (K|j)(K|j - 1)/2 + J(J-1)/2$ (if we consider six alternatives and branches of two alternatives, direct RUM implies 15 comparisons and sequential RUM only 6).

It is precisely this smaller number of comparisons that provides the theoretical background allowing us to relax in this section the dissimilarity parameters for a NL, together with the fact that we will be able to build the same probability expressions for a NL as before, leaving intact the (maximum likelihood) estimation procedures to calculate model parameters (including the dissimilarity parameters). If we consider instead that individuals do consider at once all alternatives available to them, we would return to the implementation of RUM discussed previously in this paper, having to impose on the probability expressions the conditions in Theorem 1 to assure a proper rooting of NL in (direct) RUM, conditions that would not allow for any greater-than-one value for dissimilarity parameters if this proper rooting is to be ensured for any values of the systematic parts of the utilities (global compatibility).

Hence, in this section we build the probability expressions for a two-level nested logit (NL) model not by considering a multivariate extreme value distribution for the random terms of the utilities (McFadden, 1978), but by following a successive process first introduced by Ben-Akiva (1973) and later developed in Williams (1977) and Ben-Akiva & Lerman (1985) (BWL). We prove in what follows how this process allows for greater-than-one values for the dissimilarity parameters. In fact, we show how any positive value for the dissimilarity parameters describes a probability system rooted in RUM theory. We accomplish this relaxation, firstly, by adopting sequential RUM as the way individuals behave; secondly, by splitting the random terms into one branch-specific component and one alternative-specific component; and, thirdly, by allowing a shift between them of the randomness conveyed by each one. In so doing, we develop an approach that guarantees that two-level NL models are globally compatible with sequential RUM while including BWL as a particular case. We shall start by presenting in Figure 5 the notation used in this section, which differs slightly from the one in section 7.1 to ease the comparison with Ben-Akiva & Lerman (1985, p. 285):

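[Figure 5: two tree diagrams. Left (this paper): J branches, branch j; K|j elemental alternatives, elemental alternative k|j. Right (Ben-Akiva & Lerman, 1985): N, Dnm; transport mode m; destination d.]

Figure 5. Notation in this paper for a two-level NL (left) and in Ben-Akiva & Lerman (1985) (right)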
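The comparison counts quoted at the start of this section, I(I-1)/2 for direct RUM and the sum over branches of (K|j)(K|j-1)/2 plus J(J-1)/2 for sequential RUM, can be evaluated for any nesting structure. The short sketch below uses hypothetical function names and takes the list of branch sizes as its only input.

```python
def direct_comparisons(total_alternatives):
    """Pairwise comparisons among all alternatives: I(I-1)/2."""
    return total_alternatives * (total_alternatives - 1) // 2


def sequential_comparisons(branch_sizes):
    """Pairwise comparisons within each branch plus those between branches."""
    within = sum(k * (k - 1) // 2 for k in branch_sizes)
    n_branches = len(branch_sizes)
    between = n_branches * (n_branches - 1) // 2
    return within + between


three_pairs = [2, 2, 2]  # six alternatives in three branches of two
direct = direct_comparisons(sum(three_pairs))
sequential = sequential_comparisons(three_pairs)
print(direct, sequential)
```

For six alternatives in three branches of two, the two formulas evaluate to 15 and 3 + 3 = 6 respectively.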

8.1 Utilities and necessary and sufficient assumptions to implement sequential RUM in a two-level NL

The utilities of the alternatives available to individuals are split into a systematic (or deterministic) part and a random one. They include an important difference with respect to previous formulations of NL that allows a suitable implementation of sequential RUM: the random parts of the utilities are split into components exclusively associated to each level of the hierarchy, that is:

$$U_{k,j} = U_j + U_{k|j} = -v_{k,j} + \varepsilon_j + \varepsilon_{k|j}, \qquad k = 1, \ldots, K|j, \quad j = 1, \ldots, J$$

Next we present the full set of assumptions that we impose on the random terms involved in the utilities so as to build the two-level NL probability expressions by means of a sequential or two-step implementation of RUM ($I_{K|j}$ represents the identity matrix of dimension K|j):

Assumption 1: $\boldsymbol{\zeta}_j = \left(\varepsilon_{1|j}, \ldots, \varepsilon_{(K|j)|j}\right) \rightarrow \text{Gumbel}\left(0, \lambda_j \cdot I_{K|j}\right)$, $\boldsymbol{\zeta}_j \in \mathbb{R}^{K|j}$, $j = 1, \ldots, J$

Assumption 2: $\boldsymbol{\psi} = \left(\varepsilon_1 + \varepsilon_1^*, \ldots, \varepsilon_J + \varepsilon_J^*\right) \rightarrow \text{Gumbel}\left(0, \mu \cdot I_J\right)$, $\boldsymbol{\psi} \in \mathbb{R}^{J}$

Assumption 3: $\text{cov}\left(\varepsilon_j, \varepsilon_m\right) = \text{cov}\left(\varepsilon_j, \varepsilon_{l|m}\right) = \text{cov}\left(\varepsilon_{k|j}, \varepsilon_{l|m}\right) = 0$, $j \neq m = 1, \ldots, J$, $k = 1, \ldots, K|j$, $l = 1, \ldots, K|m$

In the process to derive the probabilities here we make use only of assumptions 1 and 3, along with the following property that comes to justify why we employ Gumbel for the random terms and not other types of distributions:

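Property 1: $\max_{k = 1, \ldots, K|j} \left(-v_{k,j} + \varepsilon_{k|j}\right) = -v_j^* + \varepsilon_j^*$, $j = 1, \ldots, J$, where

$$\varepsilon_j^* \rightarrow \text{Gumbel}\left(0, \lambda_j\right), \qquad v_j^* = -\frac{1}{\lambda_j} \ln \sum_{k=1}^{K|j} \exp\left(-\lambda_j v_{k,j}\right), \qquad \text{cov}\left(\varepsilon_{k|j}, \varepsilon_j^*\right) \geq 0$$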
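Property 1 can be checked directly from the distribution function. The sketch below assumes that Gumbel(0, λ_j) denotes the distribution with cumulative distribution function $F(x) = \exp(-e^{-\lambda_j x})$, and that the terms $\varepsilon_{k|j}$ are independent within the branch (Assumption 1); both are our reading of the notation rather than statements taken from the text.

```latex
\begin{aligned}
\Pr\Big(\max_{k}\,\big(-v_{k,j} + \varepsilon_{k|j}\big) \le x\Big)
 &= \prod_{k=1}^{K|j} \Pr\big(\varepsilon_{k|j} \le x + v_{k,j}\big)
  = \prod_{k=1}^{K|j} \exp\!\big(-e^{-\lambda_j (x + v_{k,j})}\big) \\
 &= \exp\!\Big(-e^{-\lambda_j x} \sum_{k=1}^{K|j} e^{-\lambda_j v_{k,j}}\Big)
  = \exp\!\big(-e^{-\lambda_j (x + v_j^*)}\big),
 \qquad v_j^* = -\frac{1}{\lambda_j} \ln \sum_{k=1}^{K|j} e^{-\lambda_j v_{k,j}} .
\end{aligned}
```

Under these assumptions the maximum shifted by $v_j^*$ has the same distribution function as each $\varepsilon_{k|j}$, which is the closure under maximisation that motivates the use of the Gumbel distribution.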

In so doing we do not make use of the following two assumptions made in the BWL approach to build two-level NL models:

Assumption 4: $\text{cov}\left(\varepsilon_j, \varepsilon_{k|j}\right) = 0$, $k = 1, \ldots, K|j$, $j = 1, \ldots, J$

Assumption 5: $\text{cov}\left(\varepsilon_j, \varepsilon_j^*\right) = 0$, $j = 1, \ldots, J$

Moreover, this paper mainly differs from Ben-Akiva & Lerman (1985) in that these authors use assumptions 4 and 5 (the latter in an implicit way), and we shall prove next that these assumptions are sufficient but not necessary to guarantee the compliance of a two-level NL with a sequential implementation of RUM. This will be central to the relaxation of the dissimilarity parameters' feasible range that we propose.

8.2 Conditional and marginal probabilities from implementation of sequential RUM in a two-level NL

We enunciate now how RUM theory is applied to build a NL in a sequential way making use of the previous assumptions (H1, H2 and H3). The first step is to build the (conditional) probability of an individual choosing each of the available alternatives belonging to one of the available branches, that is:

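$$\begin{aligned} P_{k|j} &= \Pr\left(U_{k,j} \geq U_{l,j},\ l = 1, \ldots, K|j\right) \\ &= \Pr\left(-v_{k,j} + \varepsilon_{k|j} \geq -v_{l,j} + \varepsilon_{l|j},\ l = 1, \ldots, K|j\right) \\ &\overset{H1}{=} \frac{\exp\left(-\lambda_j v_{k,j}\right)}{\sum_{l=1}^{K|j} \exp\left(-\lambda_j v_{l,j}\right)}, \qquad k = 1, \ldots, K|j, \quad j = 1, \ldots, J \end{aligned}$$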

And the second step comprises the choice of one of the mentioned branches considering an aggregate measure of the utilities of the alternatives belonging to them:


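$$\begin{aligned} P_j &= \Pr\left(\max_{l = 1, \ldots, K|j} U_{l,j} \geq \max_{l = 1, \ldots, K|m} U_{l,m},\ m = 1, \ldots, J\right) \\ &\overset{H1}{=} \Pr\left(-v_j^* + \varepsilon_j + \varepsilon_j^* \geq -v_m^* + \varepsilon_m + \varepsilon_m^*,\ m = 1, \ldots, J\right) \\ &\overset{H2}{=} \frac{\left(\sum_{l=1}^{K|j} \exp\left(-\lambda_j v_{l,j}\right)\right)^{\mu/\lambda_j}}{\sum_{m=1}^{J} \left(\sum_{l=1}^{K|m} \exp\left(-\lambda_m v_{l,m}\right)\right)^{\mu/\lambda_m}}, \qquad j = 1, \ldots, J \end{aligned}$$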

Thus, we have that this sequential RUM implementation derives the following expression for the probability assigned by a two-level NL that an alternative is chosen by an individual:

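$$P_{k,j} = P_{k|j} \cdot P_j = \frac{e^{-\lambda_j v_{k,j}} \left(\sum_{l=1}^{K|j} e^{-\lambda_j v_{l,j}}\right)^{\mu/\lambda_j - 1}}{\sum_{m=1}^{J} \left(\sum_{l=1}^{K|m} e^{-\lambda_m v_{l,m}}\right)^{\mu/\lambda_m}}, \qquad k = 1, \ldots, K|j, \quad j = 1, \ldots, J$$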

Notice that it is the same expression as the one derived under the GEV approach of McFadden (1978) and analysed in section 7.
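The two steps above translate directly into a small computation. The following sketch, with a hypothetical function name and illustrative values, evaluates the conditional, marginal and joint probabilities of a two-level NL for utilities U = -v + ε. The parameters are deliberately chosen so that both ratios mu/lambda_j exceed one, to show that the expressions still return a valid probability system; whether such a system is rooted in RUM is the question the text addresses, not something the code decides.

```python
import math


def two_level_nl(v, lam, mu):
    """Conditional, marginal and joint probabilities of a two-level NL.

    v[j][k] are systematic parts, lam[j] the branch parameters and mu the
    upper-level parameter (hypothetical data layout).
    """
    branches = range(len(v))
    sums = [sum(math.exp(-lam[j] * x) for x in v[j]) for j in branches]
    conditional = [[math.exp(-lam[j] * x) / sums[j] for x in v[j]]
                   for j in branches]
    weights = [sums[j] ** (mu / lam[j]) for j in branches]
    marginal = [w / sum(weights) for w in weights]
    joint = [[marginal[j] * p for p in conditional[j]] for j in branches]
    return conditional, marginal, joint


# Illustrative values with mu/lam_j equal to 1.6 and 2.0 (both above one).
v = [[1.0, 2.0], [0.5, 1.5, 2.5]]
conditional, marginal, joint = two_level_nl(v, lam=[1.0, 0.8], mu=1.6)
total = sum(p for branch in joint for p in branch)
print(marginal, total)
```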

8.3 Proper specification of the random terms involved in the implementation of sequential RUM in a two-level NL

The comparison of the results achieved by this sequential approach to two-level NL models with the approach based on imposing the GEV conditions or those of Daly & Zachary (1976) (section 7) is possible once we have shown that both approaches lead to the same probability expressions. The latter is embedded within Theorem 1 of this paper, and we have seen how the following relations for the parameters of a two-level NL need to hold if we want to assure global compatibility with RUM:

$$0 \leq \frac{\mu}{\lambda_j} \leq 1, \qquad j = 1, \ldots, J$$

When we consider sequential RUM, we do not need to guarantee that the utilities of all the I alternatives available to an individual are jointly distributed in a proper way, since we are not considering that the individual makes a decision by comparing all of them in pairs ($I(I-1)/2$ comparisons), but rather through a sequential process where utilities are maximised within and between branches ($\sum_{j=1}^{J} (K|j)(K|j - 1)/2 + J(J-1)/2$ comparisons). This frees us from having to impose the conditions in Theorem 1 on the joint probabilities; they only need to be imposed separately on conditional and marginal probabilities. It is easy to check that the random terms involved in the two steps of the sequential RUM implementation are properly specified, since they are independent sets of independent and identically distributed Gumbel variables. We just need to ensure that the restrictions involved in Assumptions 1, 2 and 3 are not mutually incompatible. We can guarantee this if the following distributions exist:


$$\left(\varepsilon_{1|j}, \ldots, \varepsilon_{(K|j)|j}, \varepsilon_j^*, \psi_j\right), \qquad j = 1, \ldots, J$$

We first note that assumptions 1, 2 and 3 impose a particular structure for these distributions:

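$$\text{corr}\left(\varepsilon_{1|j}, \ldots, \varepsilon_{(K|j)|j}, \varepsilon_j^*, \psi_j\right) = \begin{pmatrix} 1 & \cdots & 0 & \rho_{1|j} & 0 \\ \vdots & \ddots & \vdots & \vdots & \vdots \\ 0 & \cdots & 1 & \rho_{(K|j)|j} & 0 \\ \rho_{1|j} & \cdots & \rho_{(K|j)|j} & 1 & 0 \\ 0 & \cdots & 0 & 0 & 1 \end{pmatrix}$$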

Which is positive semi-definite, as required, for any values of the systematic parts of the utilities (notice that $\varepsilon_j^* = v_j^* + \max_{k = 1, \ldots, K|j} \left(-v_{k,j} + \varepsilon_{k|j}\right) \rightarrow \text{Gumbel}\left(0, \lambda_j\right)$), since by the properties of the Gumbel distribution we have that:

$$\rho_{1|j}^2 + \cdots + \rho_{(K|j)|j}^2 \leq 1$$

Second, we note that the previous correlation matrix refers to random variables following a Gumbel distribution (all the variables involved except $\varepsilon_j$; in fact, we know that $\varepsilon_j$ will be properly distributed if $\varepsilon_j^*$ and $\psi_j$ jointly are) and that such dependencies can be reproduced, for instance, by choosing the appropriate parameters for the asymmetrical multivariate extreme value distribution introduced in Tawn (1990). Notice that since $\varepsilon_j$ and $\varepsilon_{k|j}$ define a proper correlation matrix, any linear combination of them also defines a proper one, so the correlation matrix of the utilities of all the alternatives available to individuals is also proper (positive semi-definite). Finally, the expression for the correlation of any two alternatives of the same branch offered to an individual is as follows ($IV_j = \mu/\lambda_j$):

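$$\text{corr}\left(U_{k,j}, U_{l,j}\right) = \frac{1 + IV_j^2 \left(1 - \rho_{k|j} - \rho_{l|j}\right)}{\left(1 + 2\, IV_j^2 \left(1 - \rho_{k|j}\right)\right)^{1/2} \left(1 + 2\, IV_j^2 \left(1 - \rho_{l|j}\right)\right)^{1/2}}$$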
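The closed-form correlation can be cross-checked against the correlation structure of the random terms. The sketch below rebuilds the covariance of the random parts of two utilities of the same branch from the correlation matrix of the two alternative-specific terms, the term ε*_j and the term ψ_j, writing the branch-specific term as ψ_j minus ε*_j (Assumption 2). It assumes that a Gumbel(0, λ) variable has a standard deviation proportional to 1/λ, so that the common factor cancels in the correlation; the function names and numerical values are hypothetical.

```python
import math


def closed_form_correlation(iv, rho_k, rho_l):
    """Closed-form corr(U_k, U_l) within a branch, in terms of IV_j and rho."""
    num = 1 + iv ** 2 * (1 - rho_k - rho_l)
    den = (math.sqrt(1 + 2 * iv ** 2 * (1 - rho_k))
           * math.sqrt(1 + 2 * iv ** 2 * (1 - rho_l)))
    return num / den


def correlation_from_structure(lam, mu, rho_k, rho_l):
    """Same correlation computed from the covariance of the random terms.

    Variable order: eps_k, eps_l, eps_star, psi. Standard deviations are taken
    proportional to 1/lam or 1/mu (assumed Gumbel parametrisation).
    """
    sd = [1 / lam, 1 / lam, 1 / lam, 1 / mu]
    corr = [[1, 0, rho_k, 0],
            [0, 1, rho_l, 0],
            [rho_k, rho_l, 1, 0],
            [0, 0, 0, 1]]
    cov = [[corr[a][b] * sd[a] * sd[b] for b in range(4)] for a in range(4)]
    # Random part of U_k: eps_j + eps_k = psi - eps_star + eps_k.
    a_k = [1, 0, -1, 1]
    a_l = [0, 1, -1, 1]

    def bilinear(x, y):
        return sum(x[a] * cov[a][b] * y[b] for a in range(4) for b in range(4))

    return bilinear(a_k, a_l) / math.sqrt(bilinear(a_k, a_k) * bilinear(a_l, a_l))


lam, mu = 1.0, 2.0  # IV_j = mu / lam = 2, outside the unit interval
formula = closed_form_correlation(mu / lam, 0.5, 0.3)
structure = correlation_from_structure(lam, mu, 0.5, 0.3)
print(formula, structure)
```

With these illustrative values both routes give a correlation of roughly 0.31, a valid value even though the dissimilarity parameter is greater than one.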

In terms of the parameters of the model we have not identified any restriction other than the non-negativity of the scale parameters of the Gumbel distributions defined in Assumptions 1 and 2, i.e. $\mu, \lambda_1, \ldots, \lambda_J \geq 0$, so we conclude that any set of positive real values for the dissimilarity parameters $\mu/\lambda_1, \ldots, \mu/\lambda_J$ is feasible to derive the probabilities for a two-level NL while ensuring its correct rooting in sequential random utility maximisation. This flexibility in the feasible range for the dissimilarity parameters stems from our introduction of a negative correlation between the two random components of the utility of an alternative, which results in a negative correlation between the two variables for each branch that participate in branch choice, that is:

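$$\text{corr}\left(\varepsilon_j, \varepsilon_{k|j}\right) = -\frac{\rho_{k|j}}{\left(1 + 1/IV_j^2\right)^{1/2}} \leq 0, \qquad \text{corr}\left(\varepsilon_j, \varepsilon_j^*\right) = -\frac{1}{\left(1 + 1/IV_j^2\right)^{1/2}} \leq 0$$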


This negativity indicates that there exists a shift in the randomness conveyed in the model between the upper and lower levels of the hierarchy: if more randomness is considered in the factors affecting the choice of a branch (process quantified by marginal probabilities), then less randomness remains for the process of choosing between alternatives belonging to such branch (process quantified by conditional probabilities). Given the independence of random terms belonging to different branches introduced by assumption 3, this compatibility with sequential RUM of any non-negative value for the dissimilarity parameters in a two-level NL generalises directly to any number of branches, so we can finally state that if utilities are defined as:

$$U_{k,j} = U_j + U_{k|j} = -v_{k,j} + \varepsilon_j + \varepsilon_{k|j}, \qquad k = 1, \ldots, K|j, \quad j = 1, \ldots, J$$

And if over their random parts we impose the previous Assumptions 1, 2 and 3, we can derive the probabilities for a two level NL model by considering a RUM choice process within each branch and a RUM choice process between branches:

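$$\begin{aligned} P_{k,j} &= \Pr\left(U_{k,j} \geq U_{l,m},\ l = 1, \ldots, K|m,\ m = 1, \ldots, J\right) \\ &= \Pr\left(U_{k,j} \geq U_{l,j},\ l = 1, \ldots, K|j\ ;\ \max_{l = 1, \ldots, K|j} U_{l,j} \geq \max_{l = 1, \ldots, K|m} U_{l,m},\ m = 1, \ldots, J\right) \\ &= \Pr\left(U_{k,j} \geq U_{l,j},\ l = 1, \ldots, K|j\right) \cdot \Pr\left(\max_{l = 1, \ldots, K|j} U_{l,j} \geq \max_{l = 1, \ldots, K|m} U_{l,m},\ m = 1, \ldots, J\right) \\ &= \frac{e^{-\lambda_j v_{k,j}} \left(\sum_{l=1}^{K|j} e^{-\lambda_j v_{l,j}}\right)^{\mu/\lambda_j - 1}}{\sum_{m=1}^{J} \left(\sum_{l=1}^{K|m} e^{-\lambda_m v_{l,m}}\right)^{\mu/\lambda_m}}, \qquad k = 1, \ldots, K|j, \quad j = 1, \ldots, J \end{aligned}$$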

With the implied property that the dissimilarity parameters are free to vary in the positive real domain. Notice once again that if the probability of choosing an alternative proceeds from pairwise comparisons between all available alternatives, we need to apply the conditions in Theorem 1 to the random terms (split or not into two random components), and the dissimilarity parameters are forced to lie in a unitary interval if global compatibility with RUM is to be guaranteed.

8.4 Sequential RUM implementation in the literature for a two-level NL

As stated before, this sequential implementation of random utility maximisation to derive the probabilities governing a two-level NL model is due to the original derivation of NL modelling contained in Ben-Akiva (1973), later developed in Williams (1977) and Ben-Akiva & Lerman (1985) (BWL). These authors, however, consider two extra assumptions on the random terms, Assumptions 4 and 5 above, which have not been necessary in our derivation of the probabilities for a two-level NL by sequential RUM.


We shall show next how these two extra assumptions confine the feasible range for the dissimilarity parameters to the unitary interval and how they incorporate further restrictions on the two choice processes leading to a NL that might not be completely plausible. In Ben-Akiva & Lerman (1985, p. 289) we can find the following explanation of why dissimilarity parameters cannot be greater than one:

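$$\begin{aligned} \text{corr}\left(U_{k,j}, U_{l,j}\right) &= \frac{\text{cov}\left(\varepsilon_j + \varepsilon_{k|j},\ \varepsilon_j + \varepsilon_{l|j}\right)}{\text{var}\left(\varepsilon_j + \varepsilon_{k|j}\right)^{1/2}\, \text{var}\left(\varepsilon_j + \varepsilon_{l|j}\right)^{1/2}} \\ &\overset{H1\text{-}H4}{=} \frac{\text{var}\left(\varepsilon_j\right)}{\text{var}\left(\varepsilon_j\right) + \text{var}\left(\varepsilon_{k|j}\right)} \\ &\overset{H5}{=} \frac{\text{var}\left(\varepsilon_j + \varepsilon_j^*\right) - \text{var}\left(\varepsilon_j^*\right)}{\text{var}\left(\varepsilon_j + \varepsilon_j^*\right) + \text{var}\left(\varepsilon_{k|j}\right) - \text{var}\left(\varepsilon_j^*\right)} \\ &\overset{H1\text{-}H2}{=} 1 - \left(\frac{\mu}{\lambda_j}\right)^2 = 1 - IV_j^2 \end{aligned}$$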

Since the correlation cannot be greater than one and since we already know that the dissimilarity parameters $IV_j$ cannot be negative, due to their relation through assumptions 1 and 2 with the scale of Gumbel-type distributions, we have that $0 \leq IV_j \leq 1$ for any branch j of the hierarchy. Assumption 3 is employed to guarantee that the correlation is also correctly calculated for alternatives belonging to different branches, since:

$$\text{corr}\left(U_{k,j}, U_{l,m}\right) \overset{H3}{=} 0, \qquad j \neq m$$
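The role of Assumption 5 in the BWL bound can be made concrete with the variances involved. The sketch below assumes that a Gumbel(0, λ) variable has variance π²/(6λ²), which corresponds to the parametrisation with distribution function exp(-exp(-λx)); this is our reading of the notation. Under Assumption 5 the variance of the branch-specific term is the difference between the variance of ψ_j and that of ε*_j, and it becomes negative, hence infeasible, as soon as the ratio mu/lambda_j exceeds one. The function names are hypothetical.

```python
import math


def branch_term_variance(lam_j, mu):
    """var(eps_j) implied by Assumption 5: var(psi_j) - var(eps_j^*)."""
    c = math.pi ** 2 / 6  # variance of a Gumbel variable with unit parameter
    return c / mu ** 2 - c / lam_j ** 2


def bwl_correlation(lam_j, mu):
    """Within-branch correlation in BWL: 1 - (mu / lam_j)^2."""
    return 1 - (mu / lam_j) ** 2


feasible = branch_term_variance(lam_j=2.0, mu=1.0)    # mu/lam_j = 0.5
infeasible = branch_term_variance(lam_j=1.0, mu=2.0)  # mu/lam_j = 2
print(feasible, infeasible, bwl_correlation(2.0, 1.0))
```

For the ratio 0.5 the implied variance is positive and the correlation equals 0.75; for the ratio 2 the implied variance is negative, which is how Assumptions 4 and 5 confine the ratio to the unit interval.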

Still considering the approach taken in Ben-Akiva & Lerman (1985), we find that the restrictions on the dissimilarity parameters coming from using Assumptions 1 to 5 should have come from analysing the correct specification of the following correlation matrix:

$$\text{corr}\left(\mathbf{U}_1, \ldots, \mathbf{U}_J\right) \quad \text{with} \quad \mathbf{U}_j = \left(U_{1,j}, \ldots, U_{K|j,j}\right)$$

Assumption 3 again reduces the analysis to the correct specification of any $\mathbf{U}_j$, and assumptions 1, 2, 4 and 5 state that:

$$\text{corr}\left(\mathbf{U}_j\right) = \begin{pmatrix} 1 & \cdots & 1 - IV_j^2 \\ \vdots & \ddots & \vdots \\ 1 - IV_j^2 & \cdots & 1 \end{pmatrix}$$

Which is positive semi-definite if $0 \leq 1 - IV_j^2 \leq 1$, which for non-negative dissimilarity parameters results in $0 \leq IV_j \leq 1$.

We think, however, that this analysis in BWL of the feasible range of the dissimilarity parameters and of the correlation between alternatives does not assure that individuals choose the alternative with the highest utility, since, independently of the assumptions introduced, random utility maximisation states that:


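$$\begin{aligned} P_{k,j} &= \Pr\left(U_{k,j} \geq U_{l,m},\ l = 1, \ldots, K|m,\ m = 1, \ldots, J\right) \\ &= \Pr\left(U_{k,j} \geq U_{l,j},\ l = 1, \ldots, K|j\ ;\ \max_{l = 1, \ldots, K|j} U_{l,j} \geq \max_{l = 1, \ldots, K|m} U_{l,m},\ m = 1, \ldots, J\right), \qquad k = 1, \ldots, K|j, \quad j = 1, \ldots, J \end{aligned}$$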

Which is related to the direct multiplication of marginal and conditional probabilities only when the random events defining each type of probability are independent, that is, the following equality

$$P_{k,j} = P_{k|j} \cdot P_j, \qquad k = 1, \ldots, K|j, \quad j = 1, \ldots, J$$

where:

$$\begin{aligned} P_{k|j} &= \Pr\left(U_{k,j} \geq U_{l,j},\ l = 1, \ldots, K|j\right), \qquad k = 1, \ldots, K|j, \quad j = 1, \ldots, J \\ P_j &= \Pr\left(\max_{l = 1, \ldots, K|j} U_{l,j} \geq \max_{l = 1, \ldots, K|m} U_{l,m},\ m = 1, \ldots, J\right), \qquad j = 1, \ldots, J \end{aligned}$$

only holds if:

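$$\left(\max_{l = 1, \ldots, K|j} U_{l,j} \geq \max_{l = 1, \ldots, K|m} U_{l,m},\ m = 1, \ldots, J\right) \equiv \left(\varepsilon_m + \varepsilon_m^* - \varepsilon_j - \varepsilon_j^* \leq v_m^* - v_j^*,\ m = 1, \ldots, J\right)$$

and

$$\left(U_{k,j} \geq U_{l,j},\ l = 1, \ldots, K|j\right) \equiv \left(\varepsilon_{l|j} - \varepsilon_{k|j} \leq v_{l,j} - v_{k,j},\ l = 1, \ldots, K|j\right)$$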

are independent events for any branch j, $j = 1, \ldots, J$. But the use of Assumptions 4 and 5 in BWL imposes that:

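$$\text{cov}\left(\varepsilon_m + \varepsilon_m^* - \varepsilon_j - \varepsilon_j^*,\ \varepsilon_{l|j} - \varepsilon_{k|j}\right) \overset{H3}{=} \text{cov}\left(-\varepsilon_j - \varepsilon_j^*,\ \varepsilon_{l|j} - \varepsilon_{k|j}\right) \overset{H4}{=} \text{cov}\left(\varepsilon_j^*, \varepsilon_{k|j}\right) - \text{cov}\left(\varepsilon_j^*, \varepsilon_{l|j}\right)$$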

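Which does not yield a null value unless $v_{k,j} = v_{l,j}$ (recall that $\varepsilon_j^* = \max_{k = 1, \ldots, K|j} \left(-v_{k,j} + \varepsilon_{k|j}\right) - \frac{1}{\lambda_j} \ln \sum_{k=1}^{K|j} \exp\left(-\lambda_j v_{k,j}\right) \rightarrow \text{Gumbel}\left(0, \lambda_j\right)$).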

Precisely, the criterion for our implementation of a two-level NL through sequential RUM has been to guarantee the independence of the two previous events, which leads us to assume a negative correlation between the random components specific to each level of the hierarchy ($\varepsilon_j$ and $\varepsilon_{k|j}$):

Assumption 3b: $\text{cov}\left(\varepsilon_j, \varepsilon_{k|j}\right) = -\text{cov}\left(\varepsilon_j^*, \varepsilon_{k|j}\right) \leq 0$, $k = 1, \ldots, K|j$, $j = 1, \ldots, J$

Notice that the correlation is negative due to the definition of the random variable $\varepsilon_j^*$. The introduction of this Assumption 3b (together with Assumptions 1, 2 and 3) finally allows us to state, using the previous definitions, that:

$$P_{k,j} = P_{k|j} \cdot P_j, \qquad k = 1, \ldots, K|j, \quad j = 1, \ldots, J$$

Notice then that Assumptions 1, 2, 3 and 3b (and not Assumptions 1 to 5 in BWL) ensure that the multiplication of the previously derived marginal and conditional probabilities associated to a specific alternative effectively corresponds with the probability that the utility of this alternative is the maximum amongst the utilities of the I alternatives available to individuals. This does not conflict with the fact that we do not have to guarantee a proper joint distribution for the random parts of the I available alternatives (or their differences with respect to one of them), since we do not consider that the process by which individuals choose the alternative with the highest utility is a simultaneous comparison of all of them (direct RUM based on $I(I-1)/2$ pairwise comparisons) but rather a two-step process where a decision within each branch and across branches is made (sequential RUM based on $\sum_{j=1}^{J} (K|j)(K|j - 1)/2 + J(J-1)/2$ pairwise comparisons).

Notice again that the sequential implementation of RUM was the first insight developed in the literature into nested logit modelling (Ben-Akiva, 1973; Williams, 1977; Ben-Akiva & Lerman, 1985), though in a more restrictive way in terms of the dissimilarity parameters and in a less consistent way with one-step implementation of RUM than the approach followed in this paper.

8.5 Sequential RUM implementation for a two-level NL (Particular case of unitary values for the dissimilarity parameters)

In the case that all the dissimilarity parameters associated to the branches take a unitary value, the sequential RUM approximation to a two-level NL produces a probability system that resembles a simple logit model, since we would have that:

$$P_{k,j} = \frac{e^{-\lambda v_{k,j}}}{\sum_{m=1}^{J} \sum_{l=1}^{K|m} e^{-\lambda v_{l,m}}}, \qquad k = 1, \ldots, K|j, \quad j = 1, \ldots, J$$
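That the two-level expression takes this logit-like form when every ratio mu/lambda_j equals one can be confirmed numerically. The sketch below uses hypothetical function names and illustrative values; it only compares probability formulas and says nothing about the distribution of the random terms, which is the point at issue in this subsection.

```python
import math


def nl_joint(v, lam, mu):
    """Joint two-level NL probabilities P_{k,j} for utilities U = -v + eps."""
    branches = range(len(v))
    sums = [sum(math.exp(-lam[j] * x) for x in v[j]) for j in branches]
    denom = sum(sums[j] ** (mu / lam[j]) for j in branches)
    return [[math.exp(-lam[j] * x) * sums[j] ** (mu / lam[j] - 1) / denom
             for x in v[j]] for j in branches]


def simple_logit(v, scale):
    """Logit probabilities over all alternatives, ignoring the nesting."""
    denom = sum(math.exp(-scale * x) for branch in v for x in branch)
    return [[math.exp(-scale * x) / denom for x in branch] for branch in v]


v = [[1.0, 2.0], [0.5, 1.5, 2.5]]
scale = 0.7  # common value of mu and every lam_j, so each ratio equals one
nested = nl_joint(v, [scale, scale], scale)
flat = simple_logit(v, scale)
max_gap = max(abs(a - b)
              for row_n, row_f in zip(nested, flat)
              for a, b in zip(row_n, row_f))
print(max_gap)
```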

This case has been considered in the literature to represent the collapse of a NL into a simple logit, though we consider that this is not exactly so if the approach used to build the NL is sequential RUM. The reason is that we only have to look at the assumptions on the random terms describing sequential RUM to notice that the random terms are not distributed IID Gumbel when $\mu/\lambda_1 = \cdots = \mu/\lambda_J = 1$.

If we consider instead that the NL is derived from assuming a multivariate extreme value distribution for the random terms, that is, following the GEV approach of McFadden (1978) analysed in section 7.3, the NL does collapse into a simple logit for unitary dissimilarity parameters, since the distribution for this case would be:

$$G(\mathbf{v}) = \exp\left(-\sum_{j=1}^{J} \sum_{k=1}^{K_j} e^{-\lambda v_{k,j}}\right)$$

This is equivalent to the distribution describing a simple logit (I IID Gumbel variables). It indicates that the GEV and D&Z approaches to NL, included, respectively, within Theorem 2 and Theorem 1 on global compatibility with RUM in this paper, differ from the one based on sequential RUM and first proposed in BWL. Another such indicator is the fact that the correlation between the utilities of two alternatives belonging to the same branch adopts a closed-form expression under sequential RUM but not under the previously mentioned multivariate extreme value distribution (McFadden, 1978, p. 539), though the analytical and numerical calculations lead to the same value, $1 - (\mu/\lambda_j)^2$.
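The statement that I IID Gumbel random terms describe a simple logit can be checked by simulation. The following is a minimal sketch, assuming utilities of the form U = v + ε (positive sign, chosen here only for illustration), a common scale λ and the inverse-CDF method to draw Gumbel variates; the function names, utilities, seed and number of draws are all hypothetical.

```python
import math
import random


def gumbel_draw(rng, scale):
    """Draw from F(e) = exp(-exp(-scale * e)) by inverting the CDF."""
    u = max(rng.random(), 1e-300)  # guard against log(0)
    return -math.log(-math.log(u)) / scale


def simulate_shares(v, scale, n_draws, seed=0):
    """Share of draws in which each alternative has the maximum utility."""
    rng = random.Random(seed)
    counts = [0] * len(v)
    for _ in range(n_draws):
        utilities = [vi + gumbel_draw(rng, scale) for vi in v]
        counts[utilities.index(max(utilities))] += 1
    return [c / n_draws for c in counts]


def logit_probs(v, scale):
    """Closed-form simple logit probabilities."""
    denom = sum(math.exp(scale * vi) for vi in v)
    return [math.exp(scale * vi) / denom for vi in v]


v = [0.4, 0.0, -0.6]   # hypothetical systematic utilities
shares = simulate_shares(v, scale=1.2, n_draws=100_000)
print(shares)
print(logit_probs(v, scale=1.2))
```

The simulated shares should agree with the closed-form logit probabilities up to sampling error. The sketch covers only the IID case and does not reproduce the within-branch correlation discussed above.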

9 CONCLUSIONS

In this paper we prove that only three conditions (the ones presented in corollary 1) have to be met by a probability system to guarantee its global compatibility with random utility maximization (RUM), that is, with preferences that can be described by an absolutely continuous, proper, non-defective and, at least, translationally invariant distribution. In so doing we discard some of the conditions that other authors, such as Daly & Zachary (1976), McFadden (1978, 1981), Börsch-Supan (1990) or Koning & Ridder (1994, 2003), add to these three in their necessary and sufficient sets of conditions to guarantee compatibility with RUM. We also explain why the approaches to RUM based on the distribution of differences of random terms (Daly & Zachary, 1976 - D&Z) have to be more general than those based on the distribution of all random terms (McFadden, 1978 - GEV), stating for these two approaches Theorem 1 and Theorem 2 of this paper, both including fewer conditions than D&Z and GEV, respectively. We show that Theorem 2 is included within Theorem 1 and that, therefore, D&Z is invariably a more general approach than GEV to guarantee compatibility with RUM. Additionally, we have addressed the issue of local compatibility of a probability system with RUM, which has received its major impulse from studies such as Börsch-Supan (1990) and Koning & Ridder (1994, 2003): on the one hand, we review the main inconsistency posed by the former, which does not guarantee that the analysis is independent of the alternative taken as reference to calculate the differences of the random terms; on the other hand, regarding the latter, we point out that it does not state a univocal relation between the distribution of random terms and probabilities, and how this might reduce the forecast capacity of locally compatible models.
In particular, we consider misleading the relaxations of the feasible ranges for the dissimilarity parameters of nested logit (NL) based on the results of Börsch-Supan (1990) and carried out in Herriges & Kling (1996) and Gil-Moltó & Risa (2004), for a two-level and a three-level NL respectively. Besides, we present a sequential implementation of the postulates of RUM that leads to the same probability expressions for a two-level NL and that accommodates any non-negative value for the dissimilarity parameters of such a model. This analysis develops further the one contained in the original formulation of NL (Ben-Akiva, 1973; Williams, 1977; Ben-Akiva & Lerman, 1985), showing how the five assumptions included in the latter might now guarantee that individuals choose the alternative with the maximum perceived utility. Thus, we think that both the review of the material devoted in the literature to ensuring global and local compatibility of discrete choice modelling with random utility maximisation and the comparison of the generality of the different sets of conditions used to implement such models, along with the particularisation to the study of nested logit models, might offer some value for practitioners when applying RUM theory, in general, or nested logit modelling, in particular, to study transport behaviour.

Bibliography

Batley, R. P. (2005): "RUM, the log sum, and the reprise of cardinality". Working paper, Institute for Transport Studies, University of Leeds.

Ben-Akiva, M. (1973): "Structure of Passenger Travel Demand Models". Ph.D. dissertation. Department of Civil Engineering, MIT.

Ben-Akiva, M., and S. Lerman (1985): "Discrete Choice Analysis: Theory and Application to Travel Demand". Cambridge, Massachusetts: MIT Press.

Bhat, C. (1997): "Covariance Heterogeneity in Nested Logit Models: Econometric Structure and Application to Intercity Travel". Transportation Research Part B, 31, 11-21.

Block, H. D., and J. Marschak (1960): "Random orderings and stochastic theories of responses". In Marschak, J. (1974) Economic Information, Decision and Prediction: Selected Essays (Volume 1). D. Reidel, Dordrecht.

Börsch-Supan, A. (1990): "On the Compatibility of Nested Logit Models with Utility Maximization". Journal of Econometrics, 43, 373-388.

Daly, A. J. (2004): "Properties of Random Utility Models of consumer choice". Proceedings of the TraLog Conference, Molde, 25-27 August 2004.

Daly, A. J., and S. Zachary (1976): "Improved multiple choice models". In Hensher, D. A. and Dalvi, Q. (eds) Determinants of Travel Choice. Saxon House, Farnborough.

Gil-Moltó, M. J., and A. R. Risa (2004): "Tests for the consistency of three-level nested logit models with utility maximization". Economics Letters, 85, 133-137.

Hensher, D., and W. H. Greene (2002): "Specification and Estimation of Nested Logit Models". Transportation Research Part B, 36, 1-18.

Herriges, J. A., and C. L. Kling (1996): "Testing the Consistency of Nested Logit Models with Utility Maximization". Economics Letters, 50, 33-39.

Ibáñez, J. N. (2006): "Transport Decisions Modelling with Discrete Choice Analysis and Non-linear Experiment Design". PhD Thesis. University of Seville.

Ibáñez, J. N., and R. P. Batley (2005): "Alternative Presentations of the Random Utility Model". Proceedings of the European Transport Conference, Strasbourg, October 2005.

Johnson, N. L., S. Kotz, and N. Balakrishnan (1994): Continuous Univariate Distributions. Vol. 2, 2nd ed., Wiley, New York.

Koning, R. H., and G. Ridder (1994): "On the compatibility of nested logit models with utility maximization". Journal of Econometrics, 63, 389-96.


Koning, R. H., and G. Ridder (2003): "Discrete choice and stochastic utility maximization". Econometrics Journal, 6, 1-27.

Louviere, J. J., D. A. Hensher, and J. D. Swait (2000): Stated Choice Methods: Analysis and Application. Cambridge University Press.

Marschak, J. (1960): "Binary choice constraints and random utility indicators". In Marschak, J. (1974) Economic Information, Decision and Prediction: Selected Essays (Volume 1). D. Reidel, Dordrecht.

McFadden, D. (1973): "Conditional Logit Analysis of Qualitative Choice Behavior". In Zarembka, P. (ed.) Frontiers in Econometrics. Academic Press, New York.

McFadden, D. (1978): "Modelling the choice of residential location". In Karlqvist, A., Lundqvist, L., Snickars, F. and Weibull, J. (eds) Spatial Interaction Theory and Residential Location. North-Holland, Amsterdam, 75-96.

McFadden, D. (1981): "Econometric models of probabilistic choice". In Manski, C. and McFadden, D. (eds) Structural Analysis of Discrete Data: With Econometric Applications. The MIT Press, Cambridge, Massachusetts, 198-272.

Rohatgi, V. K. (1976): An Introduction to Probability Theory and Mathematical Statistics. John Wiley and Sons, New York.

Williams, H. C. W. L. (1977): "On the formation of travel demand models and economic evaluation measures of user benefit". Environment and Planning A, 9 (3), 285-344.

Notes:

1 If we allow discrete distributions for the random terms we would need, for instance, to change the expressions for the probabilities, since in this case $\Pr(U_i \geq U_j,\ j = 1, \dots, I)$ is not equal, in general, to $\Pr(U_i > U_j,\ j = 1, \dots, I)$.

2 If we restricted the domain of variation of the random terms to a subset of real values we would be altering the basic assumptions governing the most widespread RUM models in the literature (logit, nested logit, probit, etc.).

3 In theory, Börsch-Supan (1990) does not consider the bounds of $\Omega = [a, b]$ in this definition.

4 Notice that this strict inequality implicitly assumes a null probability of ties, that is, $\Pr(U_i = U_j) = 0$, which is achieved by considering the distribution of $(\varepsilon_i, \varepsilon_j)$ to be absolutely continuous.

5 Conditions BS–1, BS–2 and BS–3 involve respectively $2I + 1$, $I(I-1)/2$ and $I$ restrictions on the probabilities.

6 Notice that the non-alternating sign in the inequalities embodied by K&R–4 with respect to the D&Z conditions is due to the former considering the utilities to be $U_i = -v_i + \varepsilon_i$ instead of $U_i = v_i + \varepsilon_i$, and that the original condition K&R–4 in Koning & Ridder (2004) has to be amended to make the notation of the partial derivatives consistent.

7 In works such as Koning & Ridder (2003) the joint distribution of the random parts of the utilities of the I alternatives available to individuals, that is, $F(\boldsymbol{\varepsilon})$, is considered to be non-defective if the two following equalities hold:

$$\lim_{\boldsymbol{\varepsilon} \to \infty} F(\boldsymbol{\varepsilon}) = 1 \quad \text{and} \quad \lim_{\boldsymbol{\varepsilon} \to -\infty} F(\boldsymbol{\varepsilon}) = 0$$

We see however that, since $\boldsymbol{\varepsilon}$ denotes the vector of random terms, the two conditions are somewhat inexact: non-defectiveness of the joint distribution of an I-dimensional random vector would instead require the following $I + 1$ conditions:

$$\lim_{\boldsymbol{\varepsilon} \to \infty \boldsymbol{\iota}_I} F(\boldsymbol{\varepsilon}) = 1; \quad \lim_{\varepsilon_j \to -\infty} F(\boldsymbol{\varepsilon}) = 0 \ \text{ for any } j = 1, \dots, I$$

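8 In theory, nothing prevents us from considering different auxiliary distributions, $\phi_I$ and $\phi_{II}$, on the two occasions in which it appears:

$$f(\eta_{12}) = \begin{cases} P_2(0, b)\, \dfrac{\phi_I(\eta_{12})}{1 - \Phi_I(b)} & \text{if } \eta_{12} > b \\[2ex] \dfrac{\partial P_1(0, \eta_{12})}{\partial \eta_{12}} & \text{if } a \leq \eta_{12} \leq b \\[2ex] P_1(0, a)\, \dfrac{\phi_{II}(\eta_{12})}{\Phi_{II}(a)} & \text{if } \eta_{12} < a \end{cases}$$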

9 This is guaranteed by the very definition of $f(\eta_{12})$ as long as $\phi(x)$ is a non-defective distribution. The probabilities sum up to one whenever $f(\eta_{12})$ is properly specified ($f(x) \geq 0$, $\int_{-\infty}^{\infty} f(x)\,dx = 1$), since the integration domains involved in the summation cover the entire corresponding real hyperplane (see Theorem 1).

10 We could have arrived at this result by imposing the following relations between distribution functions:

$$F_{\varepsilon_2 - \varepsilon_1}(x) = \Pr(\varepsilon_2 - \varepsilon_1 \leq x) = \Pr(\varepsilon_1 - \varepsilon_2 \geq -x) \overset{\text{continuity}}{=} 1 - \Pr(\varepsilon_1 - \varepsilon_2 \leq -x) = 1 - F_{\varepsilon_1 - \varepsilon_2}(-x)$$

For instance, if we consider that $(v_2 - v_1) \in [a, b]$ and a real value $c \leq a$, we have that a translationally invariant and summing-up-to-one probability system leads to the necessity for symmetrical auxiliary distributions since:

$$F_{\varepsilon_2 - \varepsilon_1}(c) = 1 - F_{\varepsilon_1 - \varepsilon_2}(-c) \Leftrightarrow \frac{\Phi(c)}{\Phi(a)} = \frac{1 - \Phi(-c)}{1 - \Phi(-a)}$$

11 Notice that the arguments just described for the binary case would invalidate the following statement contained in Koning & Ridder (2003), where only the binary case is considered when studying integrability conditions: “the Daly–Zachary–Williams conditions imply that the Slutsky matrix is negative semi-definite and symmetric, that the off-diagonal elements are non-negative, and that the rows and columns of this matrix sum to 0. These conditions are stronger than the ones imposed by the integrability conditions (IC–1) and (IC–2).” The reason is that the D&Z conditions and the IC conditions impose the same level of restriction on the Slutsky matrix for this binary case.


12 Notice that McFadden (1978) considers $U_k = v_k + \varepsilon_k$ and that we have changed it to $U_k = -v_k + \varepsilon_k$ to keep consistency with the criterion chosen in this paper.

13 Other approaches followed in the literature to build probability systems compatible with random utility maximisation include the one introduced by Theorem 5.1.2 in McFadden (1981, p. 213), where the probabilities of the I alternatives available in a choice situation are defined as the gradient with respect to the systematic parts of the utilities (v) of a social surplus function, $P_i(\mathbf{v}) = -\partial S(\mathbf{v}) / \partial v_i$, $i = 1, \dots, I$, with such function satisfying conditions SS1-6 as stated in McFadden (1981, p. 211). Notice, for instance, that the proof of symmetry is direct in this case, since $-\partial^2 S(\mathbf{v}) / \partial v_i \partial v_j = -\partial^2 S(\mathbf{v}) / \partial v_j \partial v_i$.

14 The original formulation of McFadden (1978) refers to this distribution as multivariate extreme value and to the probability systems that it leads to as general extreme value (GEV) models. The wide use of these models has made it commonplace (somewhat incorrectly) to refer to them as models based on a generalized extreme value distribution, a term that has traditionally been reserved for the three types of extreme value distributions: type I or Gumbel, type II or Fréchet and type III or Weibull (Johnson et al., 1994).

15 Notice that no a priori normalization is carried out over the parameters $\mu, \lambda_1, \dots, \lambda_J$ when deriving the probability expressions, keeping the maximum degree of generality at this level of analysis, in contrast to what is done in part of the literature (McFadden, 1978; Herriges & Kling, 1996), where $\mu$ is fixed to one. We understand that keeping all the parameters in the models clarifies the role of the ratios of the dissimilarity parameters in model identification, since there is more than one way to achieve univocally identified nested logit models; the RU1, RU2 and RU3 formulations included in Louviere et al. (2000) and Hensher & Greene (2002) are an example of this need for normalization (see Ibáñez, 2006 for a more detailed discussion).

16 For this three-level NL we also renounce any a priori normalization over the parameters defining the dissimilarity parameters, that is, $\psi, \mu_1, \dots, \mu_R, \lambda_{11}, \dots, \lambda_{1J(1)}, \dots, \lambda_{R1}, \dots, \lambda_{RJ(R)}$. In so doing, we keep the maximum degree of generality at this level of analysis, in contrast to what is generally done in the literature (Louviere et al., 2000; Gil-Moltó & Risa, 2004), where $\psi$ is fixed to one.