
Identification of the Distribution of Random Coefficients
in Static and Dynamic Discrete Choice Models

Kyoo il Kim*

Michigan State University

November 2014

Abstract

We show that the distributions of random coefficients in various discrete choice models are nonparametrically identified. Our identification results apply to static discrete choice models, including the binary logit, multinomial logit, nested logit, and probit models, as well as to dynamic programming discrete choice models. In these models the only key condition we need to verify for identification is that the type-specific choice probability belongs to a class of functions that includes analytic functions. Our identification results are therefore general enough to cover most of the discrete choice models commonly used in the literature. Our identification argument builds on insights from nonparametric specification testing. We find that the role of the analytic function in our identification results is to effectively remove the full support requirement often exploited in other identification approaches.

JEL Classification: C10, C14
Keywords: Random Coefficients, Nonparametric Identification, Logit and Probit, Discrete Choice, Dynamic Discrete Choice

1 Introduction

Modelling heterogeneity in the preferences of economic agents has been of significant interest in both theoretical and empirical studies. Random coefficients in various models have been widely used to capture this individual heterogeneity. For recent work on discrete choice models with random coefficients, including consumer demand, see (e.g.) Berry, Levinsohn, and Pakes (1995), Nevo (2001), Petrin (2002), Rossi, Allenby, and McCulloch (2005), Lewbel (2000), Burda, Harding, and Hausman (2008), McFadden and Train (2000), Briesch, Chintagunta, and Matzkin (2010), Hoderlein, Klemela, and Mammen (2010), and Gautier and Kitamura (2013). However, identification studies of random coefficient models that apply to various discrete choice models are still scarce, with only a few exceptions. Moreover, there has been no unifying identification framework that can be applied generally across discrete choice models. In this paper we provide one such result.

Building on insights from the nonparametric specification testing literature (e.g. Stinchcombe and White 1998; Bierens 1982, 1990), we show that the distributions of random coefficients in

*I am grateful to Patrick Bajari and Jeremy Fox for comments and conversations relating to this paper. I would also like to thank the workshop participants at the CEME NSF-NBER conference on "Inference in Nonstandard Problems" at Princeton University and the seminar participants at Sungkyunkwan University. Corresponding author: Kyoo il Kim, Department of Economics, Michigan State University, Marshall-Adams Hall, 486 W Circle Dr Rm 110, East Lansing, MI 48824, E-mail: [email protected].


discrete choice models are nonparametrically identified if the type-specific choice probability satisfies the property that the span of the type-specific choice probabilities is weakly dense in the space of bounded and continuous functions. We then show that this identification condition is satisfied under three conditions. The first is that the type-specific choice probability is a real analytic function and the support of the distribution of covariates (e.g. characteristics of products) is a nonempty open set. The second is that the index inside the type-specific choice probability is monotonic in each element of the covariate vector that has random coefficients. This condition is trivially satisfied for static discrete choice models with index restrictions. Importantly, we verify that this monotonicity condition also holds for dynamic discrete choice models. Therefore the second condition is not restrictive for most discrete choice models commonly used in the literature. The third and last condition is that we need at least one value of the covariates at which the type-specific choice probability does not depend on the random coefficients. To satisfy this condition we can typically let the covariates include the value of zero or re-center the covariates at zero. We find that these three identifying conditions are satisfied for a class of logit discrete choice models including binary choice, multinomial choice, nested logit, and dynamic programming discrete choice models. The condition that the choice probability be a real analytic function is sufficient but not necessary. As an example, we find that the distribution of random coefficients in the probit model is also nonparametrically identified even though the probit function is not analytic.

Our identification argument differs from "identification at infinity" using a special covariate (e.g. Lewbel 2000) and from the Cramer-Wold device (e.g. Ichimura and Thompson 1997). Berry and Haile (2010) also provide an important identification result for discrete choice models, but they require a special covariate and a full support condition, although they do not use the logit structure. Moreover, their identification objects of interest do not include the distribution of random coefficients. In our view, the main concern with the special regressor is the large support requirement. Large supports are sometimes acceptable but are not often supported by the datasets typically used in discrete choice estimation. Our study focuses on the nonparametric identification of the distribution of random coefficients, while our identification strategy explicitly exploits the logit or probit error structure, as is common in empirical work.¹ This parametric assumption on the distribution of the choice-specific errors does away with the need for large support assumptions: the entire distribution of random coefficients can be identified using only local variation in characteristics. The framework we use is similar to Fox, Kim, Ryan, and Bajari (2012), but their results are specific to the static multinomial logit model, whereas our framework extends to other discrete choice models including nested logit, probit, and dynamic discrete choice. To our knowledge, our work is the first to formally show nonparametric identification of random coefficients in dynamic programming discrete choice models. Our identification results are general enough to cover most commonly used discrete choice models and can also be used to develop a sieve approximation based estimator of the nonparametric distribution as in Fox, Kim, and Yang (2013). Although our identification results are not constructive, they can be used to verify the identification conditions required for the consistency of the sieve approximation based estimator in Fox, Kim, and Yang (2013).

The organization of the paper is as follows. In Section 2 we review various discrete choice models that fit into our identification framework. Section 3 develops the identification theorems. In

¹Other important identification studies in static discrete choice models include Briesch, Chintagunta, and Matzkin (2010), Chiappori and Komunjer (2009), Gautier and Kitamura (2013), and Fox and Gandhi (2010), but their modelling primitives are all different from our focus in this paper.


Section 4 we show that the identification conditions are satisfied for various static discrete choice models. In Section 5 we show that the identification conditions hold for dynamic discrete choice models. Section 6 concludes. Technical details are gathered in the Appendix.

2 Discrete Choice Models with Random Coefficients

Here we review various examples of discrete choice models with random coefficients to which our identification theorems in Section 3 apply. These models are among the most commonly used in empirical studies. Readers who are familiar with the models can skip to Section 3 for our identification results.

2.1 Logit model with individual choices

The motivating example is the multinomial logit discrete choice model - including binary choice - with random coefficients, where agents i = 1, . . . , N choose among j = 1, . . . , J mutually exclusive alternatives and one outside option (e.g. an outside good). The random coefficients logit model was first proposed by Boyd and Mellman (1980) and Cardell and Dunbar (1980).

In the random coefficients model, the preference parameter β_i is distributed according to F(β) and is independent of the exogenous covariates. The exogenous covariates for choice j are collected in the K × 1 vector x_{i,j}, and we let x_i = (x′_{i,1}, . . . , x′_{i,J}). The distribution F(β) is the object of our interest. In the random utility model, the utility of agent i of type β_i from choosing alternative j is

    u_{i,j} = α + x′_{i,j}β_i + ε_{i,j}    (1)

where α is a non-random constant term, so this model does not allow a random coefficient on the constant term. Assume that ε_{i,j} is distributed as Type I extreme value, including for an outside good with utility u_{i,0} = ε_{i,0}, and that agents are utility maximizers. Then the outcome variable y_{i,j} is defined as

    y_{i,j} = 1 if u_{i,j} > u_{i,j′} for all j′ ≠ j, and y_{i,j} = 0 otherwise.

The type-specific probability of taking choice j at x_i is

    g_j(x_i, β, α) = exp(α + x′_{i,j}β) / (1 + Σ_{j′=1}^{J} exp(α + x′_{i,j′}β)).

In the data we observe the conditional choice probability of the mixture, P(y_{i,j} = 1 | x_i), and the logit model implies that

    P(y_{i,j} = 1 | x_i) = ∫ g_j(x_i, β, α) dF(β) = ∫ [exp(α + x′_{i,j}β) / (1 + Σ_{j′=1}^{J} exp(α + x′_{i,j′}β))] dF(β).    (2)

Our key question is whether we can identify F(β) from the observed P(y_{i,j} = 1 | x_i) and the type-specific choice probability g_j(x_i, β, α) in (2). In Section 4.1 we show that F(β) is identified for this multinomial logit model.


2.2 Nested logit model with individual choices

We consider a nested logit model with the following random utility:

    u_{i,j,l} = z′_{i,j}γ_i + x′_{i,j,l}β_{j,i} + ε_{i,j,l}

for l = 1, . . . , L_j choices per group j, with j = 1, . . . , J groups of choices and j = 0 being the outside good (u_{i,0} = ε_{i,0}), where z_{i,j} denotes the group-specific covariates and x_{i,j,l} denotes the choice-specific covariates. Let z_i = (z′_{i,1}, . . . , z′_{i,J}) and x_i = (x′_{i,1,1}, . . . , x′_{i,1,L_1}, . . . , x′_{i,J,L_J}).

The nested logit model allows individual tastes to be correlated across products in each group. The error terms follow a generalized extreme value distribution (McFadden 1978) of the form

    F(ε) = exp( − Σ_{j=0}^{J} ( Σ_{l=1}^{L_j} exp(−ε_{j,l}/ρ_j) )^{ρ_j} )

where ρ_j reflects the correlation between ε_{j,l} and ε_{j,l′}, with ρ_j ≈ 1 − Corr[ε_{j,l}, ε_{j,l′}] for all l ≠ l′, and for the outside good L_0 = 1 and ρ_0 = 1.

The type-specific probability of taking choice l in group j at (z_i, x_i) is

    g_{j,l}(z_i, x_i, γ, β, ρ) = [ exp( z′_{i,j}γ + ρ_j log( Σ_{l′=1}^{L_j} exp(x′_{i,j,l′}β_j/ρ_j) ) ) / Σ_{j′=0}^{J} exp( z′_{i,j′}γ + ρ_{j′} log( Σ_{l′=1}^{L_{j′}} exp(x′_{i,j′,l′}β_{j′}/ρ_{j′}) ) ) ] × [ exp(x′_{i,j,l}β_j/ρ_j) / Σ_{l′=1}^{L_j} exp(x′_{i,j,l′}β_j/ρ_j) ]

where β = (β′_1, . . . , β′_J) and ρ = (ρ_1, . . . , ρ_J). We have

    P(y_{i,j,l} = 1 | z_i, x_i) = ∫ · · · ∫ g_{j,l}(z_i, x_i, γ, β, ρ) dF_γ(γ) dF_{β_1}(β_1) · · · dF_{β_J}(β_J)    (3)

where F_γ(γ) and the F_{β_j}(β_j)'s are the distribution functions of γ and the β_j's; that is, we assume γ and the β_j's are mutually independent, while we allow the components within each β_j to be dependent. In Section 4.2 we show that the distributions of random coefficients F_γ(γ) and F_{β_j}(β_j) are identified for this nested logit model.
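The type-specific probability above factors into a group choice times a within-group choice, each of logit form. A minimal numerical sketch (two groups; all covariate values and parameters hypothetical) is:

```python
import numpy as np

def nested_logit_prob(z, x, gamma, beta, rho):
    """Type-specific probabilities g_{j,l} for every group j and choice l.
    z: list of group covariate vectors; x: list of (L_j, K) choice covariate arrays;
    beta: list of group-specific coefficient vectors; rho: list of nesting parameters.
    The outside good (j = 0, L_0 = 1, rho_0 = 1) has systematic utility zero."""
    inclusive = [np.log(np.exp(xj @ bj / rj).sum()) for xj, bj, rj in zip(x, beta, rho)]
    group_num = [np.exp(zj @ gamma + rj * iv) for zj, rj, iv in zip(z, rho, inclusive)]
    denom = 1.0 + sum(group_num)         # the 1 is the outside good's exp(0)
    probs = []
    for j, (xj, bj, rj) in enumerate(zip(x, beta, rho)):
        within = np.exp(xj @ bj / rj)    # within-group logit over l = 1..L_j
        probs.append((group_num[j] / denom) * within / within.sum())
    return probs                         # probs[j][l] corresponds to g_{j+1, l+1}

rng = np.random.default_rng(1)
z = [rng.normal(size=2) for _ in range(2)]               # J = 2 groups
x = [rng.normal(size=(3, 2)), rng.normal(size=(2, 2))]   # L_1 = 3, L_2 = 2
p = nested_logit_prob(z, x, gamma=np.array([0.5, -0.2]),
                      beta=[np.ones(2), 0.5 * np.ones(2)], rho=[0.7, 0.9])
```

Averaging such probabilities over draws of (γ, β_1, . . . , β_J) gives the mixture in (3).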

2.3 Probit model with binary choice

When ε_{i,j} in (1) follows a standard normal distribution with J = 1 and u_{i,0} = 0, the model becomes a binary probit. We have

    P(y_{i,1} = 1 | x_{i,1}) = ∫ Φ(α + x′_{i,1}β) dF(β)

where Φ(·) denotes the CDF of the standard normal distribution. In Section 4.3 we show that this probit model is also identified.
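The mixed probit probability can be simulated in the same way as the logit case. The sketch below (scalar covariate, hypothetical normal mixing distribution) also illustrates the role of a covariate value at which the choice probability does not depend on β: at x = 0 the mixture collapses to Φ(α) regardless of F.

```python
import numpy as np
from math import erf, sqrt

def std_normal_cdf(t):
    """Standard normal CDF via the error function: Phi(t) = (1 + erf(t/sqrt(2)))/2."""
    return 0.5 * (1.0 + erf(t / sqrt(2.0)))

def mixed_probit_prob(x1, alpha, beta_draws):
    """Monte Carlo approximation of P(y = 1 | x1) = integral of Phi(alpha + x1*b) dF(b)."""
    return float(np.mean([std_normal_cdf(alpha + x1 * b) for b in beta_draws]))

rng = np.random.default_rng(2)
beta_draws = rng.normal(loc=1.0, scale=0.5, size=10000)  # draws from a hypothetical F
p = mixed_probit_prob(x1=0.3, alpha=-0.1, beta_draws=beta_draws)
```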

2.4 Logit model with aggregate data

The multinomial logit model can also be used when only data on market shares s_j are available but individual-level data are not. We assume the utility of agent i is

    u_{i,j} = α + x′_jβ_i + ε_{i,j}

where β is distributed according to F(β). In this case the logit model implies

    s_j = ∫ g_j(x, β, α) dF(β) = ∫ [exp(α + x′_jβ) / (1 + Σ_{j′=1}^{J} exp(α + x′_{j′}β))] dF(β).

2.5 Dynamic discrete choice models

We consider the identification of the distribution of random coefficients in dynamic discrete choice models (e.g. Rust 1987, 1994); note that the original models of Rust do not have random coefficients. We assume that the per-period utility of agent i in period t from choosing action d ∈ D is

    u_{i,d,t} = x′_{i,d,t}θ + ε_{i,d,t}.

Here the error term is iid extreme value across agents, choices, and time periods. All of θ, or only a subset, can be random coefficients. We let x_{i,t} = (x′_{i,1,t}, . . . , x′_{i,|D|,t}). The type-specific conditional choice probability is

    g_d(x_{i,t}, θ) = exp(v(d, x_{i,t}, θ)) / Σ_{d′=1}^{|D|} exp(v(d′, x_{i,t}, θ))    (4)

where v(d, x_{i,t}, θ) denotes Rust's choice-specific value function.

As an illustration, consider the dynamic binary choice model of Rust (1987), where the conditional choice probability of taking action "1" is given by

    g_1(x, β, α) = exp{x′β + δEV(x, 1; β, α)} / ( exp{α + δEV(x, 0; β, α)} + exp{x′β + δEV(x, 1; β, α)} )    (5)
                 = exp{x′β + δ[EV(x, 1; β, α) − EV(x, 0; β, α)]} / ( exp{α} + exp{x′β + δ[EV(x, 1; β, α) − EV(x, 0; β, α)]} )    (6)

and EV(x, d; β, α) is given by the unique solution to the Bellman equation

    EV(x, d; β, α) = ∫_y log( exp{y′β + δEV(y, 1; β, α)} + exp{α + δEV(y, 0; β, α)} ) π(dy | x, d)    (7)

with transition density π(dy | x, d). Note that although the distribution of β does not depend on x, the evolution of the state variable x over time depends on the type-specific value β. This is because individuals having the same x_{i,t} = x̃ at time t but different β's will make different choices at time t, and their states in the following time periods will therefore differ. However, in the evaluation of the value functions in (7) we need only the transition density of the states, and this transition density is independent of β given d, because β affects the transition of the states only through the choice d. Therefore the transition density π(dy | x, d) is not a function of β, and it is typically identified in a pre-stage of estimation.

In Rust (1987)'s bus engine replacement example, d = 0 denotes the replacement of an engine, α denotes the scrap value, and β is the unit operation cost with mileage equal to x. When the random coefficient β is distributed with F(β), we have

    P(1 | x) = ∫ g_1(x, β, α) dF(β)    (8)

where P(1 | x) is the true (population) conditional choice probability. We study identification of these dynamic discrete choice models with random coefficients in Section 5.
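For a fixed type (β, α), the Bellman equation (7) can be solved by successive approximation on a discretized mileage grid, after which the choice probability (6) follows directly. The sketch below is a hypothetical illustration, not the paper's procedure: it assumes that replacement (d = 0) resets mileage to the first grid point and that mileage under keeping (d = 1) evolves by a simple Markov matrix, with β < 0 playing the role of a per-unit operating cost; the grid, transition, and parameter values are all made up.

```python
import numpy as np

def solve_ev(grid, trans1, beta, alpha, delta, tol=1e-10):
    """Fixed-point iteration on EV(x, d) for the binary model in (7).
    trans1: row-stochastic mileage transition matrix under keeping (d = 1);
    replacement (d = 0) is assumed to reset mileage to grid[0]."""
    u1 = grid * beta                     # per-period payoff x'beta of keeping
    ev1 = np.zeros(len(grid))            # EV(x, 1)
    ev0 = 0.0                            # EV(x, 0): the state resets, so a scalar
    while True:
        logsum = np.log(np.exp(u1 + delta * ev1) + np.exp(alpha + delta * ev0))
        new_ev1 = trans1 @ logsum        # integrate over next mileage given keeping
        new_ev0 = logsum[0]              # replacement sends the state to grid[0]
        if max(np.max(np.abs(new_ev1 - ev1)), abs(new_ev0 - ev0)) < tol:
            return new_ev1, new_ev0
        ev1, ev0 = new_ev1, new_ev0

def choice_prob_keep(grid, trans1, beta, alpha, delta):
    """Type-specific probability of keeping (action 1), equation (6)."""
    ev1, ev0 = solve_ev(grid, trans1, beta, alpha, delta)
    v = grid * beta + delta * (ev1 - ev0)
    return np.exp(v) / (np.exp(alpha) + np.exp(v))

grid = np.linspace(0.0, 10.0, 21)        # discretized mileage states
trans1 = np.zeros((21, 21))
for i in range(21):
    trans1[i, min(i + 1, 20)] += 0.6     # mileage increases one step...
    trans1[i, i] += 0.4                  # ...or stays put
p_keep = choice_prob_keep(grid, trans1, beta=-0.3, alpha=-2.0, delta=0.9)
# averaging choice_prob_keep over draws of beta from F gives the mixture P(1|x) in (8)
```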


3 Identification

In a general framework we develop the nonparametric identification of the distribution of random coefficients in discrete choice models. We then apply the results to the models of Section 2. The econometrician observes covariates or characteristics x and the probability of some discrete outcome indicator y, denoted by G(x). Since our leading example is discrete choice models, we interpret G(x) as the conditional choice probability and let h(x, β) be the probability that an agent with random coefficient β takes the choice. We assume that β and x are independent.

Our goal is to identify the distribution function F(β) in the equation

    G(x) = ∫ h(x, β) dF(β)    (9)

where h(x, β) is a known function of (x, β). Identification means that a unique F(β) solves this equation for all x. Let G_0(x) denote the true function G(x) and let F_0(β) denote the true function F(β), such that

    G_0(x) = ∫ h(x, β) dF_0(β).

Then identification means that for any F_1 ≠ F_0 we must have G_1 ≠ G_0. Because G_0(x) = E[y | x] is nonparametrically identified, below we focus on the identification of F_0, treating G_0 as known.
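The logic of (9) can be illustrated with a finite sieve in the spirit of Fox, Kim, Ryan, and Bajari (2012): fix a grid of candidate types β^(r), observe the mixture G at several covariate values in a bounded range, and recover the mixing weights by least squares. Everything below (the grid, the true weights, the binary logit h) is a hypothetical sketch, not the paper's non-constructive argument.

```python
import numpy as np

def h_logit(x, beta):
    """Type-specific binary logit probability h(x, beta) = exp(x*beta)/(1 + exp(x*beta))."""
    return 1.0 / (1.0 + np.exp(-x * beta))

# Hypothetical finite mixing distribution: F puts mass w_true[r] on beta_grid[r].
beta_grid = np.array([0.5, 1.0, 2.0, 3.0])
w_true = np.array([0.1, 0.2, 0.3, 0.4])

# "Observed" mixture G(x) = sum_r w_r h(x, beta_r), evaluated only at covariate
# values in a bounded interval -- no large-support variation is used.
x_vals = np.linspace(-1.0, 1.0, 12)
A = h_logit(x_vals[:, None], beta_grid[None, :])   # A[m, r] = h(x_m, beta_r)
G = A @ w_true

# Recover the weights: with distinct analytic types the columns of A are
# linearly independent, so the least-squares solution is unique.
w_hat, *_ = np.linalg.lstsq(A, G, rcond=None)
```

Identification is exactly the statement that this inversion has a unique answer; for a continuous F the finite grid is replaced by the weak-denseness argument developed in Section 3.1.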

3.1 Notion of identification in the weak topology

To formalize the notion of identification we develop notation as follows. First let ρ be any metric on the space of finite measures that induces the weak convergence of measures; for example, this includes the Lévy-Prokhorov metric for distribution functions. Further define

    H = { h(x, β) : R^{2K} → R : x ∈ X ⊂ R^K, β ∈ B ⊂ R^K }.

Note that h ∈ H can be read as a function of β given x and also as a function of x given β (with possible abuse of notation). Then identification means F_1 = F_0 in the weak topology if and only if ∫ h dF_1 = ∫ h dF_0 for all h ∈ H. Let C(B) be the set of continuous and bounded functions on B. We let F(B) be the set of continuous and bounded distribution functions supported on B. We further let G(X) be the space of continuous and bounded functions on X generated by the mixture in (9) and assume every G ∈ G(X) is measurable with respect to a measure µ. We endow F(B) with the metric ρ(F_0, F_1) for F_0, F_1 ∈ F(B) and G(X) with the metric d(G_1, G_2) for G_1, G_2 ∈ G(X). We also assume that every h ∈ H is measurable with respect to F ∈ F(B) for almost every x ∈ X. Finally let sp H denote the linear span of H.

Now suppose H satisfies the following: for all h ∈ C(B), for all F ∈ F(B), and for all δ > 0, we can find an h′ ∈ sp H such that

    | ∫ h′ dF − ∫ h dF | < δ.    (10)

Then by the definition of the span and the linearity of the integral, condition (10) implies that F_1 = F_0 (in the weak topology) if and only if ∫ h dF_1 = ∫ h dF_0 for all h ∈ H. This is because, under (10), the condition ∫ h dF_1 = ∫ h dF_0 for all h ∈ H becomes equivalent to ∫ h dF_1 = ∫ h dF_0 for all h ∈ C(B). This means that our identification condition is equivalent to the condition that the linear span of H is weakly dense in C(B).

In the following sections we show that a certain class of functions h(x, β) satisfies this weak denseness. We then show that the type-specific choice probabilities of various discrete choice models, as commonly used in empirical studies, belong to this class. Therefore, characterizing the class of functions h(x, β) that satisfy the weak denseness condition becomes our working tool for identification of the distribution of random coefficients.

Note that our identification theorems below imply that lim_{n→∞} ρ(F_n, F_0) = 0 if and only if lim_{n→∞} d(G_n, G_0) = 0 for any sequence F_n such that G_n(·) = ∫ h(·, β) dF_n. That lim_{n→∞} ρ(F_n, F_0) = 0 implies lim_{n→∞} d(G_n, G_0) = 0 is obvious when convergence in the metric ρ is equivalent to the weak convergence of measures; for example, this holds for the Lévy-Prokhorov metric if the metric space (B, τ) is separable, where τ is a metric on the set B. Our identification results imply that the converse is also true as long as µ(X) ≠ 0. Therefore this identification result is also useful for showing the consistency of a sieve approximation based estimator of F_0 as in Fox, Kim, and Yang (2013).

3.2 Identification with known support of the distribution

First we consider the identification problem when the support B of the distribution of the random coefficients is known. We relax this arguably strong assumption in Section 3.4. We define our notion of identification formally.

Definition 1. For given F ≠ F_0, h ∈ H distinguishes F if d(G, G_0) ≠ 0. If for any F (≠ F_0) ∈ F there exists a distinguishing h ∈ H, then H is totally distinguishing. If for any F (≠ F_0) ∈ F all but a negligible set of h ∈ H are distinguishing, then H is generically totally distinguishing.

The implication of H being generically totally distinguishing is that F_0 is then identified on any subset X̄ ⊂ X with µ(X̄) ≠ 0. This notion of identification is closely related to the notions of revealing and totally revealing in the consistent specification testing problem of Stinchcombe and White (1998) and in the work of Bierens (1982, 1990). We first lay out our identification theorem below (Theorem 1) and note that its proof is closely related to Theorem 2.3 in Stinchcombe and White (1998), since the class of H that is generically totally revealing in Stinchcombe and White (1998) is generically totally distinguishing in our identification problem.

There are, however, several important differences to be pointed out. First, their problem is consistent specification testing, where the index set B and the draws of β (not necessarily random) are arbitrary choices of a researcher, so the distribution of β is not of interest to them, whereas our problem is the identification of the distribution of β. Second, we switch the roles of x and β relative to the specification testing problem, so that the x's in X now generate the functions in H. The last key difference is that for our identification result we do not need to restrict the function h(x, β) to take the form h(x, β) = g(x_1 + x̃′β) (i.e., affine in β), which requires a normalization of the coefficient on (e.g.) a special regressor x_1. This is replaced by the requirement that X include at least one value x* such that h(x, β) does not depend on the random coefficients β at x*. Note that, without loss of generality, following our leading example of the logit models, we can take x* = 0 or re-center x at zero such that {0} ⊂ X for linear index models of the form h(x, β) = g(x′β). We present our first identification theorem.


Theorem 1. Let H_g = {h : h(x, β) = g(x′β), x ∈ X}, where (i) X ⊂ R^K is a nonempty open set, (ii) {0} ⊂ X, and (iii) g is real analytic. Suppose B is known. Then H_g is generically totally distinguishing if and only if g is non-polynomial. Moreover, H_g is also totally distinguishing.

Proof. Theorem 1 is implied by Theorem 3 below and hence the proof is omitted. We prove Theorem 3 in Section 3.5.

In the theorem we restrict our attention to the class of models with a linear index inside the choice probability, of the form g(x′β). This class is general enough to include all the static discrete choice models we consider in Section 2 as well as a class of functions that allows for the dynamic discrete choice models in Section 5. Our results can be extended to multiple linear index models, which may include discrete game models of strategic interactions. An important implication of the linear index is that the term inside the choice probability is monotonic in each element of x that has a random coefficient. This monotonicity is exploited in the proof of the theorem. Conditions (i) and (ii) of the theorem are typically assumed in the models we consider, so we need to verify only the condition that g is real analytic. Real analytic functions include (e.g.) polynomials, exponential functions, and logit-type functions. A formal definition of a real analytic function is as follows.

Definition 2. A function g(t) is real analytic at c ∈ T ⊆ R whenever it can be represented as a convergent power series, g(t) = Σ_{d=0}^{∞} a_d (t − c)^d, for a domain of convergence around c. The function g(t) is real analytic on an open set T ⊆ R if it is real analytic at all arguments t ∈ T.

Note that in Theorem 1 we do not require X = R^K. Therefore our identification result differs from identification at infinity and from the Cramer-Wold device.

3.3 Identification with fixed coefficients

Note that when a subset (at least one) of the coefficients is not random, the identification of the distribution of the random coefficients is still obtained, because we can write

    h(x, β) = g(x′_1β_1 + x′_2β_2)

and treat this as affine in β_2, where β_1 is a vector of fixed parameters and β_2 is a vector of random coefficients. Our identification strategy for this case applies in two stages. The identification of the homogeneous coefficients is trivial when x_2 can take the value of zero: at x_2 = 0 the model becomes a discrete choice model with homogeneous parameters only, and its identification is a standard problem. To give further details, note that in the first stage of an auxiliary argument we identify the true β_1^0 using the relationship from (9) as

    G_0(x_1, 0) = ∫ h(x_1, 0, β_1, β_2) dF_0(β_2) = ∫ g(x′_1β_1) dF_0(β_2) = g(x′_1β_1).

Then, because G_0(x_1, 0) is known, we can identify β_1^0 from the inverse of the relationship above, typically using a regression. Therefore in this case we can treat β_1^0 as known and focus on the identification of F_0(β_2). The identification of F_0(β_2) then follows from the corollary below.


Corollary 1. Let H_gA = {h : h(x, β) = g(x′_1β_1 + x′_2β_2), (x_1, x_2) ∈ X}, where (i) the set of values of x_2, X_2, is a nonempty open set, (ii) X includes values of the form {(x_1, 0)}, and (iii) g is real analytic. Suppose β_1 are fixed coefficients and the support B_2 of F(β_2) is known. Then H_gA is generically totally distinguishing if and only if g is non-polynomial. Moreover, H_gA is also totally distinguishing.

Proof. Corollary 1 is a direct application of Theorem 1 or Lemma 3.7 in Stinchcombe and White (1998), because g(x′_1β_1 + x′_2β_2) is affine in β_2 given β_1.
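The first stage of the two-stage argument above can be sketched numerically: at x_2 = 0 the mixture collapses to g(x′_1β_1) whatever F(β_2) is, so the fixed coefficients are recovered by inverting the link g. A minimal binary logit sketch (all values hypothetical):

```python
import numpy as np

def g(t):
    """Binary logit link g(t) = exp(t)/(1 + exp(t)); real analytic and invertible."""
    return 1.0 / (1.0 + np.exp(-t))

rng = np.random.default_rng(3)
beta1_true = np.array([0.8, -0.4])       # fixed (homogeneous) coefficients
beta2_draws = rng.normal(size=5000)      # random coefficient draws from some F(beta2)

# At x2 = 0 the random part drops out: integral of g(x1'b1 + 0*b2) dF(b2) = g(x1'b1),
# so averaging over the draws changes nothing.
x1 = np.array([1.0, 0.5])
G_at_zero = float(np.mean(g(x1 @ beta1_true + 0.0 * beta2_draws)))

# Invert the link: x1'b1 = g^{-1}(G(x1, 0)) = log(G / (1 - G)).
index_recovered = np.log(G_at_zero / (1.0 - G_at_zero))
# repeating this for K1 linearly independent values of x1 pins down beta1
```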

Below we focus on models with random coefficients only, because all the theorems we develop apply to models with a subset of fixed parameters after a first stage identifying the fixed parameters is applied.

3.4 Identification with unknown support of the distribution

Often we do not know the support B of F. For this reason it is useful to strengthen the identification result to the case where the mixture in (9) is generated by an arbitrary compact subset B. We define this stronger notion of identification as follows.

Definition 3. H is completely distinguishing if it is totally distinguishing for any distribution F (≠ F_0) ∈ F supported on any compact B.

The implication of H being completely distinguishing is that F_0 is identified on any compact support B while the support X of x is given. We apply this notion of identification to the class of functions H_g = {h : h(x, β) = g(x′β), x ∈ X}.

As discussed in Stinchcombe and White (1998), whether H_g is totally distinguishing is equivalent to whether the linear span of H_g defined below is weakly dense in C(B). We define the linear space of functions spanned by H_g (together with the constants) as

    Λ(H_g, X, B) = { h : B → R | h(β) = γ_0 + Σ_{l=1}^{L} γ_l g(x^(l)′β), γ_0, γ_l ∈ R, x^(l) ∈ X ⊂ R^K, l = 1, . . . , L }.

When X = R^K, the totally distinguishing property is not surprising; the more interesting result is obtained when X is a proper subset of R^K. In the proof of Theorem 3 below we show that Λ(H_g, X, B) is weakly dense in C(B), so the identification result also follows with any nonempty open subset X of R^K.

The difference between sp H_g(X) and Λ(H_g, X, B) is that sp H_g(X) does not include the constant functions while Λ(H_g, X, B) does. But the difference disappears when X includes {0}, or an x* such that g(x*′β) does not depend on β and g(x*′β) ≠ 0. In this case sp H_g(X) therefore becomes dense in C(B), which is our key argument for identification.

H_g then becomes completely distinguishing when Λ(H_g, R^K, B) (so X = R^K) is uniformly dense in C(B) for any compact B. This uniform denseness is satisfied as long as, for the non-polynomial function g(t), there exists an interval [a, b] on which g is Riemann integrable and continuous, due to Hornik (1991). Also see Lemma 3.5 in Stinchcombe and White (1998).

Theorem 2. Let $\mathcal{X} = \mathbb{R}^K$ and $\mathcal{H}_g = \{h : h(x, \beta) = g(x'\beta),\ x \in \mathcal{X}\}$ where $g$ is Riemann integrable and continuous on some interval $[a, b]$ and non-polynomial. Then $\mathcal{H}_g$ is completely distinguishing.


Proof. The theorem follows from $\mathrm{sp}\,\mathcal{H}_g = \Psi(\mathcal{H}_g, \mathbb{R}^K, B)$ and from Lemma 3.5 in Stinchcombe and White (1998).

Theorem 2 shows that a wide class of functions $g$, including all of the discrete choice models we consider in Section 2, can identify the distribution of random coefficients as long as we have the full support condition $\mathcal{X} = \mathbb{R}^K$. This full support condition is, however, a very strong requirement. Also note that we have not yet seen any role for analytic functions in the identification, because any non-polynomial real analytic function satisfies the requirement on $g$ in Theorem 2. Theorem 3 below shows that we can relax the full support condition when $g$ is real analytic. This class includes exponential functions and, more importantly, logit functions.

This also reveals the role of analytic functions in the identification: analyticity effectively removes the full support requirement, which is very important for discrete choice models where the values of covariates are bounded below and above.

Now we show that the above completeness result is generically true for any nonempty open subset $\mathcal{X} \subset \mathbb{R}^K$ when the function $g$ is analytic. We further define

Definition 3. $\mathcal{H}$ is generically completely distinguishing if and only if it is totally distinguishing for any open set $\mathcal{X}$ with nonempty interior and for any distribution $F\,(\neq F_0) \in \mathcal{F}$ supported on any compact $B$.

Theorem 3. $\mathcal{H}_g$ is generically completely distinguishing when $\{0\} \subset \mathcal{X}$ if and only if $g$ is real analytic and is not a polynomial.

Proof. See Section 3.5 for the proof.

Theorem 3 is our most general result and is the main theorem. Note that this identification result also holds for models in which a subset of the coefficients (at least one) is not random, because all the theorems we develop apply to models with a subset of fixed parameters after a first stage identifying the fixed parameters is applied.

Corollary 2. Let $\mathcal{H}_{gA} = \{h : h(x, \beta) = g(x_1'\beta_1 + x_2'\beta_2),\ x \in \mathcal{X}\}$ where $g$ is a real analytic non-polynomial function, $\mathcal{X}$ includes values of the form $\{(x_1, 0)\}$, and $\beta_1$ are fixed coefficients. Then $\mathcal{H}_{gA}$ is generically completely distinguishing.

Proof. See Appendix B for the proof.

3.5 Proof of Theorem 3

Because Theorem 3 implies Theorem 1 with the known support $B$, we only prove Theorem 3. Theorem 3 is implied by the following two lemmas. First, we show that for $\mathcal{H}_g$, generic completeness is equivalent to the condition that for every $\mathcal{X}$ with nonempty interior, $\Psi(\mathcal{H}_g, \mathcal{X}, B)$ is uniformly dense in $C(B)$ for any compact $B$.

Lemma 1. The class $\mathcal{H}_g$ is generically completely distinguishing if and only if for every open set $\mathcal{X}$ with nonempty interior, $\Psi(\mathcal{H}_g, \mathcal{X}, B)$ is uniformly dense in $C(B)$ for any compact $B$.

In the proof we use the fact that $x'\beta$ is monotonic in each element of $x$. By construction of the linear index, this monotonicity is trivially satisfied for the static discrete choice models. For the dynamic discrete choice models, to apply the theorem we need to verify that the choice specific continuation payoff function $EV(x, d; \beta, \alpha)$ (defined in Section 2.5) is monotonic in each element of $x$, and we verify this in Section 5.

We also show that

Lemma 2. $\mathcal{H}_g$ is generically completely distinguishing if and only if it is completely distinguishing, when $g$ is real analytic.

The most important implication of Lemma 2 is that we only have to show identification at a particular choice of $\mathcal{X}$. Then, according to Lemma 2, the identification must also hold for any $\mathcal{X}$ with nonempty interior. This result substantially facilitates applications of the identification argument because one can take the full support $\mathcal{X} = \mathbb{R}^K$, under which identification is often easier to show (see e.g. Fox, Kim, Ryan, and Bajari 2012). Note that verifying identification with $\mathcal{X} = \mathbb{R}^K$ does not mean we require the actual data to have full support. It only means that if one shows identification as if $\mathcal{X} = \mathbb{R}^K$, then identification must also hold for any $\mathcal{X}$ with nonempty interior, which includes the real data situation.

Combining Lemmas 1 and 2, we conclude that Theorem 3 holds because $\mathcal{H}_g$ is completely distinguishing as long as $g$ is real analytic, by Theorem 2. In the appendix we prove Lemma 1 and Lemma 2.

3.6 Identification with non-analytic functions

We find that the class of functions that are generically completely distinguishing is not limited to analytic functions. Other classes of functions satisfying the following condition are also generically completely distinguishing. This includes the normal cumulative distribution function, so the distribution of random coefficients in the probit model is also nonparametrically identified.

Theorem 4. Suppose that $\mathrm{sp}\{d^p g(t),\ 0 \leq p < \infty \mid t \in \mathcal{T}\}$ is dense in $C(\mathbb{R})$ for any nonempty open subset $\mathcal{T} \subset \mathbb{R}$ containing $\{0\}$, with $g(\cdot)$ infinitely differentiable. Then for any open set $\mathcal{X} \subset \mathbb{R}^K$ with nonempty interior, the span $\Psi(\mathcal{H}_g, \mathcal{X}, B)$ is uniformly dense in $C(B)$ for any compact $B$, so $\mathcal{H}_g$ is generically completely distinguishing.

Proof. Theorem 4 trivially follows from Theorem 3.10 in Stinchcombe and White (1998).

4 Identification of Static Discrete Choice Models

We verify the identification conditions for the examples of static discrete choice models. Because the other conditions for identification are either trivially satisfied or can be directly assumed, we focus on showing that the type specific model choice probability function either is real analytic, the key condition in Theorem 3, or belongs to another class of generically completely distinguishing functions as in Theorem 4.

4.1 Logit model with individual choices

For the multinomial logit model (2), our identification argument on $F(\beta)$ proceeds after we recover the constant term $\alpha$ in a first stage using an auxiliary argument that does not depend on $\beta$. A similar strategy was used in Fox, Kim, Ryan, and Bajari (2012) to identify homogeneous parameters in a first stage. The typical strategy is to use the observed choice probability at $x_i = 0$, where we have
$$P(y_{i,j} = 1 \mid x_i = 0) = \frac{\exp(\alpha)}{1 + J\exp(\alpha)}.$$
Because $P(y_{i,j} = 1 \mid x_i = 0)$ is nonparametrically identified from the data, $\alpha$ is also identified from the inverse function. Below we focus on the identification of $F(\beta)$ assuming $\alpha$ is known. With abuse of notation we write $g_j(x_i, \beta) = g_j(x_i, \beta, \alpha_0)$, where $\alpha_0$ denotes the true value of $\alpha$.

In Section 3 we have shown that the key identification conditions for $F(\beta)$ are that (i) $g_j(x_i, \beta)$ is real analytic, (ii) the support of the distribution of $x_i$, $\mathcal{X}$, has nonempty interior, and (iii) $\mathcal{X}$ includes at least one value $x = x^*$ such that $g_j(x^*, \beta) \neq 0$ does not depend on $\beta$. We assume condition (ii). To satisfy condition (iii) we can simply take $x^* = 0$ in static discrete choice models where the covariates are re-centered at zero. For a binary logit model it is obvious that $g_j(x_i, \beta)$ is real analytic, so condition (i) is also satisfied. For the multinomial logit case too we can show that $g_j(x_i, \beta)$ is real analytic. Pick a particular $j$ and let $x_{j'} = 0$ for all $j' \neq j$. Then we have
$$g_j(x_i, \beta) = \frac{\exp(x_{i,j}'\beta)}{\exp(-\alpha_0) + J - 1 + \exp(x_{i,j}'\beta)},$$
which has the form $G_\eta(t) \equiv \frac{\exp(t)}{\eta + \exp(t)}$ for some $\eta$. Hence $g_j(x_i, \beta)$ is real analytic because the exponential function is real analytic and the function $g_j(x_i, \beta)$ is formed by the addition and division of never-zero real analytic functions (Krantz and Parks 2002).
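Since $P(y_{i,j} = 1 \mid x_i = 0) = \exp(\alpha)/(1 + J\exp(\alpha))$ is strictly increasing in $\alpha$, the first-stage inversion is explicit: $\alpha = \log\{P^0 / (1 - J \cdot P^0)\}$ where $P^0$ is the observed probability. A quick numerical check (our illustrative sketch; the values of $\alpha$ and $J$ are arbitrary):

```python
import math

# First-stage recovery of the homogeneous intercept alpha from the choice
# probability at x_i = 0:  P0 = exp(alpha) / (1 + J*exp(alpha)),
# which inverts to  alpha = log(P0 / (1 - J*P0)).  Values are illustrative.
def p0(alpha, J):
    return math.exp(alpha) / (1.0 + J * math.exp(alpha))

def recover_alpha(P0, J):
    return math.log(P0 / (1.0 - J * P0))

alpha_true, J = -0.3, 4
alpha_hat = recover_alpha(p0(alpha_true, J), J)
print(alpha_hat)  # recovers alpha_true up to floating-point error
```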

4.2 Nested logit model with individual choices

First we show that $\rho_j$, which reflects the correlation between goods within each group, is identified from an auxiliary step. Note that when $z_i = 0$ and $x_i = 0$, we have
$$P^0_{j,l} \equiv P(y_{i,j,l} = 1 \mid z_i = 0, x_i = 0) = g_{j,l}(0, 0, \gamma, \beta, \rho) = \frac{\exp\big(\rho_j \log(L_j)\big)}{\sum_{j'=0}^{J} \exp\big(\rho_{j'} \log(L_{j'})\big)} \cdot \frac{1}{L_j}.$$
It follows that $L_j \cdot P^0_{j,l} = \frac{\exp(\rho_j \log(L_j))}{\sum_{j'=0}^{J} \exp(\rho_{j'} \log(L_{j'}))}$ and therefore $\log(L_j \cdot P^0_{j,l}) - \log(L_0 \cdot P^0_{0,l}) = \rho_j \log(L_j)$, from which we identify $\rho_j$ for $j = 1, \ldots, J$ as
$$\rho_j = \left\{\log(L_j \cdot P^0_{j,l}) - \log(L_0 \cdot P^0_{0,l})\right\} / \log(L_j)$$
because $P^0_{j,l}$ and $L_j$ are directly observable from the data for all $j, l$. Below we treat the $\rho_j$'s as known.

In the nested logit model of (3), we focus on showing that $g_{j,l}(z_i, x_i, \gamma, \beta, \rho)$ is a real analytic function. The other conditions for identification in Section 3 are trivially satisfied or directly assumed, as in the multinomial logit case.

Now pick a particular $j$ and let $z_{j'} = 0$ for all $j' \neq j \in \{0, 1, \ldots, J\}$, and let $x_{j,l} = 0$ for all $j$ and $l$. Then we have
$$g_{j,l}(z_i, x_i, \gamma, \beta, \rho) = \frac{\exp\big(z_{i,j}'\gamma + \rho_j \log(L_j)\big)}{\sum_{j' \neq j} \exp\big(\rho_{j'} \log(L_{j'})\big) + \exp\big(z_{i,j}'\gamma + \rho_j \log(L_j)\big)} \cdot \frac{1}{L_j} = \frac{\exp\big(z_{i,j}'\gamma\big)}{\sum_{j' \neq j} \exp\big(\rho_{j'} \log(L_{j'}) - \rho_j \log(L_j)\big) + \exp\big(z_{i,j}'\gamma\big)} \cdot \frac{1}{L_j} = \frac{\exp\big(z_{i,j}'\gamma\big)}{\eta + \exp\big(z_{i,j}'\gamma\big)} \cdot \frac{1}{L_j}$$
where we let $\eta = \sum_{j' \neq j} \exp\big(\rho_{j'} \log(L_{j'}) - \rho_j \log(L_j)\big)$.

Therefore $g_{j,l}(z_i, x_i, \gamma, \beta, \rho)$ has the form $\frac{1}{L_j} G_\eta(t) = \frac{\exp(t)}{\eta + \exp(t)} \cdot \frac{1}{L_j}$, so it is an analytic function as in the multinomial logit case (Krantz and Parks 2002). Therefore the distribution of the random coefficients $\gamma$ is identified when $z_j$ also includes $\{0\}$.

Now we turn to the identification of the distribution of $\beta_j$. Let $z_j = 0$ for all $j$ and let $x_{j',l'} = 0$ for all $j' \neq j$ and $l' \neq l$. Then we have

$$g_{j,l}(z_i, x_i, \gamma, \beta, \rho) = \frac{\exp\Big(\rho_j \log\big(L_j - 1 + \exp(x_{i,j,l}'\beta_j/\rho_j)\big)\Big)}{\sum_{j' \neq j} \exp\big(\rho_{j'} \log(L_{j'})\big) + \exp\Big(\rho_j \log\big(L_j - 1 + \exp(x_{i,j,l}'\beta_j/\rho_j)\big)\Big)} \times \frac{\exp(x_{i,j,l}'\beta_j/\rho_j)}{L_j - 1 + \exp(x_{i,j,l}'\beta_j/\rho_j)}$$
$$= \frac{\big(L_j - 1 + \exp(x_{i,j,l}'\beta_j/\rho_j)\big)^{\rho_j}}{\sum_{j' \neq j} \exp\big(\rho_{j'} \log(L_{j'})\big) + \big(L_j - 1 + \exp(x_{i,j,l}'\beta_j/\rho_j)\big)^{\rho_j}} \times \frac{\exp(x_{i,j,l}'\beta_j/\rho_j)}{L_j - 1 + \exp(x_{i,j,l}'\beta_j/\rho_j)}.$$

Because the product of analytic functions is also analytic, we only have to show that the function
$$\tilde{G}_{\tilde{\eta}}(t) = \frac{\big(L_j - 1 + \exp(t/\rho_j)\big)^{\rho_j}}{\tilde{\eta} + \big(L_j - 1 + \exp(t/\rho_j)\big)^{\rho_j}}$$
(where we write $\tilde{\eta} = \sum_{j' \neq j} \exp\big(\rho_{j'} \log(L_{j'})\big)$) is analytic, because $\frac{\exp(x_{i,j,l}'\beta_j/\rho_j)}{L_j - 1 + \exp(x_{i,j,l}'\beta_j/\rho_j)}$ is analytic (it can be written as $\frac{\exp(t)}{\eta + \exp(t)}$ for some $\eta$). $\tilde{G}_{\tilde{\eta}}(t)$ is also analytic as long as $\big(L_j - 1 + \exp(t/\rho_j)\big)^{\rho_j}$ is analytic, because the reciprocal of an analytic function that never takes the value zero on its support is also analytic. Now note that $\big(L_j - 1 + \exp(t/\rho_j)\big)^{\rho_j}$ is analytic because compositions of analytic functions are analytic and $L_j - 1 + \exp(t/\rho_j)$ is strictly positive. Therefore we conclude that the distribution of the random coefficients $\beta_j$ is identified. Similarly we can show that the distributions of $\beta_j$ for all $j = 1, \ldots, J$ are identified.

4.3 Probit model with binary choice

We assume the support of the distribution of $x_{i,1}$ includes $\{0\}$ (or is re-centered at zero). In a first stage we identify $\alpha_0$ from $P(y_{i,1} = 1 \mid x_{i,1} = 0) = \Phi(\alpha_0)$. Also, $\Phi(\cdot)$ does not depend on $\beta$ at $x_{i,1} = 0$ and $\Phi(\alpha_0) \neq 0$. Finally, although the normal CDF $\Phi(\cdot)$ is not analytic, it is infinitely differentiable and satisfies the conditions in Theorem 4. Therefore the distribution $F(\beta)$ is identified in this case too.
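The first-stage inversion of $\Phi$ can be carried out numerically; a minimal sketch (ours, with an arbitrary $\alpha_0$, using only the standard library error function and bisection):

```python
import math

# Recover alpha_0 from P(y=1 | x=0) = Phi(alpha_0) by inverting the standard
# normal CDF with bisection. The value of alpha_0 is illustrative.
def normal_cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def invert_cdf(p, lo=-10.0, hi=10.0, tol=1e-12):
    # Phi is strictly increasing, so bisection on [lo, hi] converges
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if normal_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

alpha_true = 0.7
alpha_hat = invert_cdf(normal_cdf(alpha_true))
print(alpha_hat)  # approximately 0.7
```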

4.4 Logit model with aggregate data

As in the logit model with individual choices, by a similar argument the distribution of random coefficients in this case is also identified. The only difference is that with individual choices we identify $P(y_{i,j} = 1 \mid x_i)$ from the data in an auxiliary step, while in the aggregate data case the conditional share $s_j$ is the data.

5 Identification of Dynamic Programming Discrete Choices

We have shown that the distribution of random coefficients is identified for static discrete choice models. However, the theorems in Section 3 cannot be directly applied to dynamic programming discrete choice problems because the type specific model choice probabilities in these models contain the choice specific continuation payoff functions. In this section we first show that the choice specific continuation payoff function, and hence the choice specific value function, is monotonic in each element of the covariate vector that has random coefficients. Then we show that Theorem 3 extends to dynamic programming discrete choice problems based on this monotonicity result.²

Following Rust (1994), let $u(x, d, \beta, \alpha)$ denote the per period utility of taking an action $d$ in the set of choices $D(x)$, where $x$ denotes the covariates or state variables with random coefficients, $\beta$ denotes the random coefficients, and $\alpha$ denotes the homogeneous coefficients. Let $EV(x, d; \beta, \alpha)$ denote the choice specific continuation payoff function, or the choice specific expected value function. Then for the logit model the type specific choice probability of taking the action $d$ becomes
$$g_d(x, \beta, \alpha) = \frac{\exp\{u(x, d, \beta, \alpha) + \delta EV(x, d; \beta, \alpha)\}}{\sum_{d' \in D(x)} \exp\{u(x, d', \beta, \alpha) + \delta EV(x, d'; \beta, \alpha)\}}$$
where the expected value function $EV(x, d; \beta, \alpha)$ of the logit model is given by the unique fixed point that solves
$$EV(x, d; \beta, \alpha) = \int_y \log\Bigg\{ \sum_{d' \in D(y)} \exp\{u(y, d', \beta, \alpha) + \delta EV(y, d'; \beta, \alpha)\} \Bigg\}\, \pi(dy \mid x, d)$$
where $\pi(dy \mid x, d)$ denotes the transition density, which depends on $d$. We note that

Lemma 3. Suppose the per period utility satisfies the linear index restriction, i.e., $u(x, d, \beta, \alpha)$ depends on $x_d'\beta$ but does not depend on $x_d$ or $\beta$ separately. Then the choice specific continuation payoff function $EV(x, d; \beta, \alpha)$ is monotonic in each element of $x$.

Proof. See Appendix C for the proof. Note that the result and its proof are not specific to the logit model.

Based on this monotonicity, we next obtain the identification of the distribution of random coefficients for dynamic discrete choice models.

Theorem 5. Let $g_d(x, \beta, \alpha)$ be the type specific choice probability of a dynamic discrete choice problem. Then $\mathcal{H}^D_g = \{h : h = g_d(x, \beta, \alpha),\ x \in \mathcal{X}\}$ is generically completely distinguishing when $\{0\} \subset \mathcal{X}$ if and only if (i) $g_d$ is real analytic and is not a polynomial and (ii) the per period utility $u(x, d, \beta, \alpha)$ satisfies the linear index restriction in Lemma 3.

Proof. See Appendix D for the proof.

In the example of the dynamic binary choice model of (5)-(6), $\mathcal{H}^D_g$ becomes
$$\mathcal{H}^D_g = \left\{ h : h = g_1(x, \beta, \alpha) = \frac{\exp\{x'\beta + \delta EV(x, 1; \beta, \alpha)\}}{\exp\{\alpha + \delta EV(x, 0; \beta, \alpha)\} + \exp\{x'\beta + \delta EV(x, 1; \beta, \alpha)\}},\ x \in \mathcal{X} \right\}$$
and we prove the identification theorem for the binary case in the appendix without loss of generality because (e.g.) for the multinomial choices we can let $x_d = 0$ for $d \neq 1$. In the proof we assume the discount factor $\delta$, the scrap value $\alpha$, and the transition density $\pi(dy \mid x, d)$ are known, resorting to the following remark:

Remark 1. Rust (1987, 1994) and Magnac and Thesmar (2002) argue that it is difficult to identify the discount factor $\delta$, so we assume it is known. For the binary logit case the homogeneous parameter, the scrap value $\alpha$, is identified at $x = 0$ from the observation that
$$P(1 \mid x = 0) = \int g_1(0, \beta, \alpha)\, dF(\beta) = \int \frac{1}{\exp\{\alpha\} + 1}\, dF(\beta) = \frac{1}{\exp\{\alpha\} + 1}$$
because from (7) we find $EV(0, 1; \beta, \alpha) = EV(0, 0; \beta, \alpha)$ (see also Rust 1987). The transition density $\pi(dy \mid x, d)$ is also nonparametrically identified from the data.

²Theorem 1 also extends to the dynamic discrete choice models because Theorem 3 implies Theorem 1.
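To illustrate the objects in this remark, the following toy sketch is entirely ours: a five-state, binary-choice model in the spirit of Rust's engine replacement problem, with made-up transition probabilities and parameter values. It solves the log-sum-exp fixed point for $EV$ by value iteration and recovers $\alpha$ from $P(1 \mid x = 0) = 1/(\exp\{\alpha\} + 1)$, which relies on the replacement transition satisfying $\pi(\cdot \mid x, 0) = \pi(\cdot \mid 0, 1)$ so that $EV(0, 1; \beta, \alpha) = EV(0, 0; \beta, \alpha)$.

```python
import math

# Toy dynamic binary logit (illustrative, not the paper's empirical model):
# states x in {0,...,4}; u(x,1) = x*beta (keep), u(x,0) = alpha (replace).
# Replacement resets the state, so the transition under d=0 from any x equals
# the transition under d=1 from x=0; this delivers EV(x,0) = EV(0,1).
N, beta, alpha, delta = 5, -0.4, 0.8, 0.9

def trans_keep(x):            # under d=1: move up w.p. 0.6, stay w.p. 0.4
    p = [0.0] * N
    p[min(x + 1, N - 1)] += 0.6
    p[x] += 0.4
    return p

P = {1: [trans_keep(x) for x in range(N)],
     0: [trans_keep(0) for _ in range(N)]}   # d=0 behaves like a fresh state 0

EV = {d: [0.0] * N for d in (0, 1)}
for _ in range(2000):          # value iteration on the fixed-point equation
    logsum = [math.log(math.exp(alpha + delta * EV[0][y])
                       + math.exp(y * beta + delta * EV[1][y]))
              for y in range(N)]
    EV = {d: [sum(P[d][x][y] * logsum[y] for y in range(N)) for x in range(N)]
          for d in (0, 1)}

# Type specific choice probability at x = 0, then recover alpha from it
g1_at_0 = math.exp(0.0 * beta + delta * EV[1][0]) / (
    math.exp(alpha + delta * EV[0][0])
    + math.exp(0.0 * beta + delta * EV[1][0]))
alpha_hat = math.log(1.0 / g1_at_0 - 1.0)
print(alpha_hat)  # approximately alpha = 0.8
```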

6 Conclusion

We show that the distributions of random coefficients in various discrete choice models are nonparametrically identified. Our identification results apply to binary and multinomial logit, nested logit, and probit models as well as to dynamic programming discrete choice models. We find that the distribution of random coefficients is identified if (i) the type specific model choice probability belongs to a class of functions that includes real analytic functions and the support of the distribution of covariates is a nonempty open set, (ii) the term inside the type specific choice probability is monotonic in each element of the covariate vector that has random coefficients, and (iii) the type specific choice probability does not depend on the random coefficients at a particular value of the covariates. We show that these conditions are satisfied for various discrete choice models that are commonly used in empirical studies. In our identification results we stress the role of analytic functions, which effectively removes the full support requirement often exploited in other identification approaches. Our results are therefore important for discrete choice models where the values of covariates are bounded below and above.

Lastly, as a referee points out, our identification results can be used as a basis for specification testing. First note that our identification allows for the case of a degenerate distribution (i.e., coefficients that are fixed parameters, not random) and hence can serve as a specification test for a random coefficient model. Moreover, our results can be used as a specification test for the specific choice model. Suppose a known $\psi(x, \beta)$, with appropriate normalization, is incorrectly used instead of the true $h(x, \beta)$ in (9). Then since
$$G_0(x) = \int h(x, \beta)\, dF_0(\beta) = \int \psi(x, \beta)\, \frac{h(x, \beta)}{\psi(x, \beta)}\, dF_0(\beta) = \int \psi(x, \beta)\, dH(x, \beta)$$
for an $H$ such that $dH(x, \beta) = \frac{h(x, \beta)}{\psi(x, \beta)}\, dF_0(\beta)$, if $\psi(x, \beta)$ satisfies the identification conditions, then any distribution function of $\beta$ alone, one that is not a function of $x$, will be rejected from our identification exercise. Therefore one can conclude that $\psi(x, \beta)$ is incorrectly specified when it is used for the choice model. Some forms of these specification tests can be addressed in further research based on our identification results.


Appendix

A Proof of Lemma’s for Theorem 3

A.1 Proof of Lemma 1

We prove Lemma 1 for dynamic discrete choice models in Section D. Lemma 1 for the static discrete choice models with $\mathcal{H}_g$ can be proved by essentially the same arguments by taking the discount factor $\delta = 0$, i.e., we drop the continuation payoff function $EV(x, d; \beta, \alpha)$ from the type specific model choice probability.

A.2 Proof of Lemma 2

If $\mathcal{H}_g$ is generically completely distinguishing, it is also completely distinguishing by the definitions. Next we show that the converse is also true. If $\mathcal{H}_g$ is not generically completely distinguishing, we can find a compact set $\tilde{B}$ and a nonempty open set $\tilde{\mathcal{X}}$ such that $\Psi(\mathcal{H}_g, \tilde{\mathcal{X}}, \tilde{B})$ is not uniformly dense in $C(\tilde{B})$. Then there exists a distribution $\tilde{F} \neq F_0$ supported on $\tilde{B}$ such that for all $x \in \tilde{\mathcal{X}}$, $\tilde{G}(x) = \int g(x'\beta)\, d\big(\tilde{F}(\beta) - F_0(\beta)\big) = 0$, by the Hahn-Banach theorem. We note, however, that $\tilde{G}(x)$ is real analytic because $g(\cdot)$ is and $\tilde{B}$ is compact. We further note that a real analytic function is equal to zero on the open set $\tilde{\mathcal{X}}$ if and only if it is equal to zero everywhere. This implies that $\mathcal{H}_g$ is not completely distinguishing. Therefore, if $\mathcal{H}_g$ is completely distinguishing, it must also be generically completely distinguishing. This completes the proof.

In the proof, $\tilde{G}(x)$ is a multivariate function. According to Definition 2.2.1 in Krantz and Parks (2002), a function $\Phi(x)$, with domain an open subset $T \subseteq \mathbb{R}^K$ and range $\mathbb{R}$, is called (multivariate) real analytic on $T$ if for each $x \in T$ the function $\Phi(\cdot)$ may be represented by a convergent power series in some neighborhood of $x$.

B Proof of Corollary 2

This can be proved similarly to the proof of Theorem 3 or to the proof of Lemma 3.7 in Stinchcombe and White (1998).

C Proof of Lemma 3

Proof. We prove the lemma for the dynamic binary choice model without loss of generality. Let $x^{(k)}$ be a vector of states that is equal to $x$ except in the $k$-th element. Let $\overline{EV}(x^{(k)}, d; \beta, \alpha)$ denote the value function when an agent having the covariates or states equal to $x^{(k)}$ takes the sequence of choices that is optimal under the current state $x$. Without loss of generality we consider the case where $\beta_k$, the $k$-th element of $\beta$, is positive and $x^{(k)} \geq x$. Then we have
$$\overline{EV}(x^{(k)}, d; \beta, \alpha) \leq EV(x^{(k)}, d; \beta, \alpha)$$
because $EV(x^{(k)}, d; \beta, \alpha)$ is the value of the expected value function when an agent with the states equal to $x^{(k)}$ takes a sequence of optimal choices, by the definition of the value function, while $\overline{EV}(x^{(k)}, d; \beta, \alpha)$ comes from a non-optimal sequence of actions. Next we note that
$$EV(x, d; \beta, \alpha) \leq \overline{EV}(x^{(k)}, d; \beta, \alpha)$$
because (i) in any time period the per period utility under $x^{(k)}$ is greater than or equal to the per period utility under $x$ and (ii) the agent takes the same sequence of choices under $x$ and $x^{(k)}$ in our definition of $\overline{EV}(x^{(k)}, d; \beta, \alpha)$. Combining these two results, we conclude the monotonicity because
$$EV(x, d; \beta, \alpha) \leq EV(x^{(k)}, d; \beta, \alpha) \quad \text{whenever } x^{(k)}_k \geq x_k.$$
Our choice of the $k$-th element is arbitrary, so this monotonicity result holds for any element of $x$.

D Proof of Theorem 5

We prove this theorem by showing that the results corresponding to Lemma 1 and Lemma 2 hold for $\mathcal{H}^D_g$. Lemma 2 holds trivially since the function $g$ in $\mathcal{H}^D_g$ is analytic. We focus on Lemma 1. We prove it for the dynamic programming binary choice model of (5)-(6) without loss of generality. We assume the discount factor $\delta$ is known. We also assume $\alpha$ is known since it can be identified from an auxiliary step as discussed in Remark 1.

Define the linear space of functions spanned by $\mathcal{H}^D_g$ as
$$\Psi(\mathcal{H}^D_g, \mathcal{X}, B) = \left\{ h : B \to \mathbb{R} \;\middle|\; h(\beta) = \gamma_0 + \sum_{l=1}^{L} \gamma_l\, g_1(x(l), \beta, \alpha),\ \gamma_0, \gamma_l \in \mathbb{R},\ x(l) \in \mathcal{X} \subset \mathbb{R}^K,\ l = 1, \ldots, L \right\}.$$

If the uniform closure of $\mathrm{sp}\,\mathcal{H}^D_g(\mathcal{X})$ contains $C(B)$, then that of $\Psi(\mathcal{H}^D_g, \mathcal{X}, B)$ must also contain $C(B)$ since $\mathrm{sp}\,\mathcal{H}^D_g(\mathcal{X}) \subset \Psi(\mathcal{H}^D_g, \mathcal{X}, B)$ by construction. Now suppose the uniform closure of $\Psi(\mathcal{H}^D_g, \mathcal{X}, B)$ contains $C(B)$ for every compact $B \subset \mathbb{R}^K$, and suppose that $\mathcal{X} \subset \mathbb{R}^K$ has nonempty interior containing $\{0\}$. We will prove Theorem 5 by contradiction. We prove it for the dynamic programming binary choice problem (say $D = \{0, 1\}$) without loss of generality because for the multinomial choices we can let $x_d = 0$ for $d \neq 1$. We take $g = g_1$ and let $x = x_1$ below.

Now suppose that $\mathrm{sp}\,\mathcal{H}^D_g(\mathcal{X})$ is not dense in $C(B)$ for some $\mathcal{X}$ and $B$. This happens if and only if there exists a distribution function $F \neq F_0$ (in the sense that $\rho(F, F_0) \neq 0$) supported on $B$ such that for all $x \in \mathcal{X}$, $\int g_1(x, \beta, \alpha)\, d(F(\beta) - F_0(\beta)) = 0$.

Let $A$ be a compact subset of $\mathbb{R}^K$ containing an $\varepsilon$-neighborhood of $B$ (in terms of the Hausdorff metric) for some $\varepsilon > 0$. Pick $\delta > 0$ and $x \in \mathcal{X}$ such that $S(x, 2\delta)$, the ball of radius $2\delta$ around $x$, is contained in $\mathcal{X}$. By assumption, $\Psi(\mathcal{H}^D_g, S(x, 2\delta), A)$ is uniformly dense in $C(A)$. It follows that for every $n \in \mathbb{N}$ and for every strict subset $\tilde{A} \subset A$, some element of $\Psi(\mathcal{H}^D_g, S(x, \delta), A)$ is uniformly within $n^{-1}$ of the continuous function
$$f_n(\beta) := \max\{1 - n\varrho(\beta, \tilde{A}), 0\}$$
where $\varrho(\beta, \tilde{A})$ is the Hausdorff distance from $\beta$ to the set $\tilde{A}$. By construction the sequence $f_n(\beta)$ is uniformly bounded between zero and one and converges pointwise to the indicator function $1\{\beta \in \tilde{A}\}$. Therefore, as $n$ goes to infinity, $\int_A f_n(\beta)\, d(F(\beta) - F_0(\beta))$ goes to $\int_{\tilde{A}} 1\, d(F(\beta) - F_0(\beta))$. Because each $f_n$ is in the span of $\mathcal{H}^D_g(S(x, \delta))$ and $1$, we can write
$$f_n(\beta) = \gamma_{0,n} + \sum_{l=1}^{L_n} \gamma_{l,n}\, g_1(x^{(l,n)}, \beta, \alpha) \qquad (11)$$
$$= \gamma_{0,n} + \sum_{l=1}^{L_n} \gamma_{l,n}\, \frac{\exp\{x^{(l,n)\prime}\beta + \delta[EV(x^{(l,n)}, 1; \beta, \alpha) - EV(x^{(l,n)}, 0; \beta, \alpha)]\}}{\exp\{\alpha\} + \exp\{x^{(l,n)\prime}\beta + \delta[EV(x^{(l,n)}, 1; \beta, \alpha) - EV(x^{(l,n)}, 0; \beta, \alpha)]\}}$$
where each $x^{(l,n)} \in S(x, \delta)$. The key idea underlying this proof strategy is that we can stretch out the functions $f_n$ without changing their integral against $F(\beta) - F_0(\beta)$, and then we show this cannot happen unless $F(\beta) = F_0(\beta)$ for almost all $\beta \in B$.

Now we formalize the idea. Because $\int g_1(x, \beta, \alpha)\, d(F(\beta) - F_0(\beta))$ equals zero for any element $g_1 \in \mathcal{H}^D_g(\mathcal{X})$, we can let any $\bar{x}^{(l,n)}$ substitute for each $x^{(l,n)}$ in (11) without changing the integral of $f_n(\beta)$ against $F(\beta) - F_0(\beta)$. Without loss of generality we can take $\tilde{A}$ as a Cartesian product of intervals $\prod_{k=1}^{K} [\underline{\beta}_k, \overline{\beta}_k]$. Then, for each of the $K$ elements, we can find sequences $b^{(l,n)}_k$ and $c^{(l,n)}_k$, $k = 1, \ldots, K$, in $\mathbb{R}^K$ such that
$$x^{(l,n)\prime} b^{(l,n)}_k + \delta\left[EV(x^{(l,n)}, 1; b^{(l,n)}_k, \alpha) - EV(x^{(l,n)}, 0; b^{(l,n)}_k, \alpha)\right] = \underline{\beta}_k \quad\text{and}$$
$$x^{(l,n)\prime} c^{(l,n)}_k + \delta\left[EV(x^{(l,n)}, 1; c^{(l,n)}_k, \alpha) - EV(x^{(l,n)}, 0; c^{(l,n)}_k, \alpha)\right] = \overline{\beta}_k.$$
Because (i) $S(x, \delta) \subset S(x, 2\delta) \subset \mathcal{X}$ and (ii) the function $x'\beta + \delta[EV(x, 1; \beta, \alpha) - EV(x, 0; \beta, \alpha)]$ is monotonic in each element of $x$,³ we can now find some $\eta_k \in (0, \varepsilon)$, $k = 1, \ldots, K$, such that for all $(l, n)$-pairs there exists $\bar{x}^{(l,n)} \in \mathcal{X}$ such that
$$\bar{x}^{(l,n)\prime} b^{(l,n)}_k + \delta\left[EV(\bar{x}^{(l,n)}, 1; b^{(l,n)}_k, \alpha) - EV(\bar{x}^{(l,n)}, 0; b^{(l,n)}_k, \alpha)\right] = \underline{\beta}_k - \eta_k \quad\text{and}$$
$$\bar{x}^{(l,n)\prime} c^{(l,n)}_k + \delta\left[EV(\bar{x}^{(l,n)}, 1; c^{(l,n)}_k, \alpha) - EV(\bar{x}^{(l,n)}, 0; c^{(l,n)}_k, \alpha)\right] = \overline{\beta}_k + \eta_k.$$

In the sequence of functions $\{f_n\}$ defined in (11), replace each $x^{(l,n)}$ by the corresponding $\bar{x}^{(l,n)}$ to obtain a sequence of functions in $\Psi(\mathcal{H}^D_g, \mathcal{X}, B)$, say $\{h_n\}$. Then the sequence $\{h_n\}$ converges pointwise to the indicator function $1\{\beta \in \tilde{A}_\eta\}$ where $\tilde{A}_\eta = \prod_{k=1}^{K} [\underline{\beta}_k - \eta_k, \overline{\beta}_k + \eta_k]$. Therefore, we find
$$\int_{\tilde{A}} 1\, d(F(\beta) - F_0(\beta)) = \int_{\tilde{A}_\eta} 1\, d(F(\beta) - F_0(\beta)),$$
and this cannot be true unless $F(\beta) = F_0(\beta)$ for almost all $\beta \in B$ because $A$ contains an $\varepsilon$-neighborhood of $B$. Based on this contradiction, we complete the proof.

³Lemma 3 implies that the difference of the expected value functions, $EV(x, 1; \beta, \alpha) - EV(x, 0; \beta, \alpha)$ in (6), is monotonic in each element of $x$ because $EV(x, 0; \beta, \alpha)$ does not depend on $x$ (recall that "$d = 0$" denotes the replacement of a bus engine). It then follows that the function $x'\beta + \delta[EV(x, 1; \beta, \alpha) - EV(x, 0; \beta, \alpha)]$ in (6) is monotonic in each element of $x$.


References

[1] Berry, S. and P. Haile, (2010), "Nonparametric Identification of Multinomial Choice Models with Heterogeneous Consumers and Endogeneity," Yale University working paper.

[2] Berry, S., J. Levinsohn, and A. Pakes, (1995), "Automobile Prices in Market Equilibrium," Econometrica, 63(4), 841-890.

[3] Bierens, H., (1982), "Consistent Model Specification Tests," Journal of Econometrics, 26, 323-353.

[4] Bierens, H., (1990), "A Consistent Conditional Moment Test of Functional Form," Econometrica, 58, 1443-1458.

[5] Boyd, J. and R. Mellman, (1980), "Effect of Fuel Economy Standards on the U.S. Automotive Market: An Hedonic Demand Analysis," Transportation Research B, 14(5), 367-378.

[6] Briesch, R., P. Chintagunta, and R. Matzkin, (2010), "Nonparametric Discrete Choice Models with Unobserved Heterogeneity," Journal of Business and Economic Statistics, 28(2), 291-307.

[7] Burda, M., M. Harding, and J. Hausman, (2008), "A Bayesian Mixed Logit-Probit for Multinomial Choice," Journal of Econometrics, 147(2), 232-246.

[8] Cardell, N. and F. Dunbar, (1980), "Measuring the Societal Impacts of Automobile Downsizing," Transportation Research B, 14(5), 423-434.

[9] Chiappori, P. and I. Komunjer, (2009), "On the Nonparametric Identification of Multiple Choice Models," working paper.

[10] Fox, J. and A. Gandhi, (2010), "Nonparametric Identification and Estimation of Random Coefficients in Nonlinear Economic Models," University of Michigan working paper.

[11] Fox, J., K. Kim, S. Ryan, and P. Bajari, (2012), "The Random Coefficients Logit Model is Identified," Journal of Econometrics, 166(2), 204-212.

[12] Fox, J., K. Kim, and C. Yang, (2013), "A Simple Nonparametric Approach to Estimating the Distribution of Random Coefficients in Structural Models," working paper.

[13] Gautier, E. and Y. Kitamura, (2013), "Nonparametric Estimation in Random Coefficients Binary Choice Models," Econometrica, 81(2), 581-607.

[14] Hoderlein, S., J. Klemelä, and E. Mammen, (2010), "Reconsidering the Random Coefficient Model," Econometric Theory, 26(3), 804-837.

[15] Hornik, K., (1991), "Approximation Capabilities of Multilayer Feedforward Networks," Neural Networks, 4, 251-257.

[16] Ichimura, H. and T. Thompson, (1998), "Maximum Likelihood Estimation of a Binary Choice Model with Random Coefficients of Unknown Distribution," Journal of Econometrics, 86(2), 269-295.

[17] Krantz, S. and H. Parks, (2002), A Primer on Real Analytic Functions, Second Edition. Birkhäuser.

[18] Lewbel, A., (2000), "Semiparametric Qualitative Response Model Estimation with Unknown Heteroskedasticity or Instrumental Variables," Journal of Econometrics, 97(1), 145-177.

[19] Magnac, T. and D. Thesmar, (2002), "Identifying Dynamic Discrete Decision Processes," Econometrica, 70, 801-816.

[20] McFadden, D., (1978), "Modelling the Choice of Residential Location," in Spatial Interaction Theory and Planning Models, 75-96, A. Karlquist, L. Lundquist, F. Snickars, and J. W. Weibull (Eds), Amsterdam: North-Holland.

[21] McFadden, D. and K. Train, (2000), "Mixed MNL Models for Discrete Response," Journal of Applied Econometrics, 15(5), 447-470.

[22] Nevo, A., (2001), "Measuring Market Power in the Ready-to-Eat Cereal Industry," Econometrica, 69(2), 307-342.

[23] Petrin, A., (2002), "Quantifying the Benefits of New Products: The Case of the Minivan," Journal of Political Economy, 110, 705-729.

[24] Rossi, P., G. Allenby, and R. McCulloch, (2005), Bayesian Statistics and Marketing. West Sussex: John Wiley & Sons.

[25] Rust, J., (1987), "Optimal Replacement of GMC Bus Engines: An Empirical Model of Harold Zurcher," Econometrica, 55(5), 999-1033.

[26] Rust, J., (1994), "Structural Estimation of Markov Decision Processes," in Handbook of Econometrics, Vol. 4, edited by R. F. Engle and D. L. McFadden. Amsterdam: North-Holland.

[27] Stinchcombe, M. and H. White, (1998), "Consistent Specification Testing with Nuisance Parameters Present Only Under the Alternative," Econometric Theory, 14, 295-325.