

Testing Non-identifying Restrictions¹

    Marc Henry

    Columbia University

    First draft: September 15, 2005

    This draft: January 25, 2006

    Abstract

    We propose a test of specification for structural models without

    identifying assumptions. The model is defined as a binary relation

    between latent and observable variables, coupled with a hypothesized

    family of distributions for the latent variables. The objective of the

testing procedure is to determine whether this hypothesized family of latent variable distributions has a non-empty intersection with the

    set of distributions compatible with the observable data generating

    process and the binary relation defining the model. When the model is

given in parametric form, the test can be inverted to yield confidence

    intervals for the identified parameter set.

    JEL Classification: C12, C14

    Keywords: random sets, empirical process.

¹ Preliminary and incomplete. Helpful discussions with Alfred Galichon, Rosa Matzkin, Alexei Onatski, Jim Powell and Peter Robinson are gratefully acknowledged (with the usual disclaimer). Correspondence address: Department of Economics, Columbia University, 420 W 118th Street, New York, NY 10027, USA. [email protected].


    1 Introduction

We consider a very general econometric model specification. Variables under consideration are divided into two groups.

Latent variables, u ∈ U = R^{d_u}. The vector u is not observed by the analyst, but some of its components may be observed by the economic actors. Theorem 1 below holds more generally when U is a complete, metrizable and separable topological space (i.e. a Polish space).

Observable variables, y ∈ Y = R^{d_y}. The vector y is observed by the analyst. Theorem 1 holds more generally when Y is a convex metrizable subset of a locally convex topological vector space.

The Borel sigma-algebras of Y and U will be respectively denoted B_Y and B_U. Call P the Borel probability measure that represents the true data generating process for the observable variables, and V a family of Borel probability measures that are hypothesized to be possible data generating processes for the latent variables. Finally, the economic model is given by a relation between observable and latent variables, i.e. a subset of Y × U, which we shall write as a multi-valued mapping from Y to U denoted by Γ. Suppose a set of restrictions on the hypothesized latent variable distributions is given by V_0 ⊆ V, and a set of restrictions on the model is given by Γ_0.

Example 1: parametric models and restrictions. Suppose the economic model is known up to a finite dimensional parameter vector θ, and the chosen family of distributions for the latent variables depends on a finite dimensional parameter vector α. The hypothesized restrictions are the following:

θ ∈ Θ_0 ⊆ R^{d_θ};  α ∈ A_0 ⊆ R^{d_α}.


The restricted family of distributions for the latent variables then becomes

V_0 = {μ_α, α ∈ A_0},

and all the models {Γ_θ, θ ∈ Θ_0} are considered for the relation linking observable to latent variables. Hence our restricted model is defined by

Γ_0 = ∪_{θ ∈ Θ_0} Γ_θ.

Example 2: games with multiple equilibria. Suppose the payoff function for player j, j = 1, . . . , J, is given by

π_j(S_j, S_{−j}, X_j, U_j; θ),

where S_j is player j's strategy and S_{−j} is the opponents' strategies, X_j is a vector of observable characteristics of player j and U_j a vector of unobservable determinants of the payoff. Finally, θ is a vector of parameters. The pure strategy Nash equilibrium conditions

π_j(S_j, S_{−j}, X_j, U_j; θ) ≥ π_j(S, S_{−j}, X_j, U_j; θ),  for all S,

define a correspondence Γ from unobservable player characteristics to observable variables (S, X), and if the unobservable player characteristics, interpreted as the types of the players, are supposed uniformly distributed on the relevant domain, then V_0 is a singleton.

In this paper, we propose a general framework for conducting inference without additional assumptions, such as equilibrium selection mechanisms, necessary to identify the model (i.e. to ensure that Γ is single-valued). The usual terminology for such models is incomplete or partially identified.


In a parametric setting, the objective of inference in partially identified models is the estimation of the set of parameters which are compatible with the distribution of the observed data, and an assessment of the quality of that estimation. Chernozhukov, Hong, and Tamer (2002) propose an M-estimation procedure to construct a set that contains all compatible parameters with a predetermined probability. Shaikh (2005) extends and refines their method, and Andrews, Berry, and Jia (2004) and Pakes, Porter, Ho, and Ishii (2004) propose alternative procedures in a similar framework.

The inference procedure presented here is based on a characterization of probability measures in the Core of the random set generated by the distribution of observables P and the multivalued mapping Γ_0, and on a method to determine whether hypothesized latent variable data generating processes satisfy this characterization. No a priori parametric assumptions are needed, but if they are made, the inference methodology yields confidence sets similar to those proposed in the previously cited papers. In the notation of Example 1, for a given θ one can derive a set of α's compatible with θ and P at a given significance level, and conversely, one can derive a set of θ's compatible with P and μ_α for a given α.

The next section proposes the characterization of probability measures in the Core of a random set, the following section describes the testing principles, and the last section illustrates the approach on a simple entry model with multiple equilibria. Proofs and additional results are collected in the appendix.


    2 Testing general model specifications

    2.1 Definition of the null hypothesis

We wish to develop a procedure to detect whether the two sets of restrictions, on the family of distributions for the latent variables on the one hand, and on the relation between observable and latent variables on the other hand, are compatible. First we explain what we mean by compatible. It is very easily understood in the simple case where the link between latent and observable variables is parametric and Γ_θ is measurable and single-valued for each θ ∈ Θ_0. Defining the image measure of P by Γ_θ by

P Γ_θ^{−1}(A) = P{y ∈ Y | Γ_θ(y) ∈ A},     (1)

for all A ∈ B_U, we say that the restrictions V_0 and Γ_0 are compatible if and only if there is at least one θ ∈ Θ_0 and one μ ∈ V_0 such that μ = P Γ_θ^{−1}. In the general case considered here, Γ_0 may not be single-valued, and its images may not even be disjoint (which would be the case if it were the inverse image of a single-valued mapping from U to Y, i.e. a traditional function from latent to observable variables). However, under a measurability assumption on Γ_0, we can construct an analogue of the image measure, which will now be a set Core(Γ_0, P) of Borel probability measures on U (to be defined below), and the hypothesis of compatibility of the restrictions on latent variable distributions and on the models linking latent and observable variables will naturally take the form

H_0 : V_0 ∩ Core(Γ_0, P) ≠ ∅.     (2)

Assumption 1 Γ_0 has non-empty and closed values, and for each open set O ⊆ U, Γ_0^{−1}(O) = {y ∈ Y | Γ_0(y) ∩ O ≠ ∅} ∈ B_Y.


To relate the present case to the intuition of the single-valued case, it is useful to think in terms of single-valued selections of the multi-valued mapping Γ_0. A measurable selection of Γ_0 is a measurable function γ such that γ(y) ∈ Γ_0(y) for all y ∈ Y. The set of measurable selections of a multi-valued mapping that satisfies Assumption 1 is denoted Sel(Γ_0), and it is known to be non-empty since Rokhlin (1949), Part I, §2, No. 9, Lemma 2.¹ To each selection γ of Γ_0, we can associate the image measure of P, denoted P γ^{−1}, defined as in (1).

A natural reformulation of the compatibility condition is that at least one probability measure μ ∈ V_0 can be written as a mixture of probability measures of the form P γ^{−1}, where γ ranges over Sel(Γ_0). However, even for the simplest multi-valued mappings, the set of measurable selections is very rich, let alone the set of their mixtures. Hence, our first goal is to give a manageable representation of such a mixture. This is the object of Theorem 1 below.

Theorem 1 Under Assumption 1, μ is a mixture of images of P by measurable selections of Γ_0 (i.e. μ belongs to the weak closed convex hull of {P γ^{−1}; γ ∈ Sel(Γ_0)}) if and only if there exists, for P-almost all y ∈ Y, a probability measure π(y, ·) on U with support Γ_0(y), such that

μ(B) = ∫_Y π(y, B) P(dy),  for all B ∈ B_U.     (3)

Remark 1: The weak topology on M(U), the set of Borel probability measures on U, is the topology of convergence in distribution. M(U) is also Polish, and the weak closed convex hull of {P γ^{−1}; γ ∈ Sel(Γ_0)} is indeed the collection of arbitrary mixtures of elements of {P γ^{−1}; γ ∈ Sel(Γ_0)}.

¹ The commentary at the end of Chapter 14 of Rockafellar and Wets (1998) sheds light on the controversy surrounding this attribution.
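To fix ideas, here is a small numerical illustration of the representation in Theorem 1 (added for this edition, not part of the original paper): on a toy finite example with Y = {0, 1} and U = {0, 1, 2}, a kernel π(y, ·) supported on Γ_0(y) induces a measure μ = Pπ, and one can check that μ is a mixture of the image measures P γ^{−1} over the measurable selections γ. All names and numbers below are hypothetical.

    import numpy as np
    from itertools import product
    from scipy.optimize import nnls

    # Toy spaces: Y = {0, 1}, U = {0, 1, 2}; correspondence Gamma0 given as index sets.
    P = np.array([0.6, 0.4])                 # P(y = 0), P(y = 1)
    Gamma0 = {0: [0, 1], 1: [1, 2]}          # Gamma0(y) as subsets of U

    # A kernel pi(y, .) supported on Gamma0(y): rows indexed by y, columns by u.
    pi = np.array([[0.5, 0.5, 0.0],
                   [0.0, 0.3, 0.7]])

    # mu(B) = sum_y pi(y, B) P(y): here mu is a vector of point masses on U.
    mu = P @ pi

    # Enumerate the measurable selections gamma: Y -> U with gamma(y) in Gamma0(y)
    # and their image measures P gamma^{-1}.
    selections = list(product(Gamma0[0], Gamma0[1]))
    images = np.zeros((len(selections), 3))
    for k, g in enumerate(selections):
        for y, u in enumerate(g):
            images[k, u] += P[y]

    # Check that mu is a mixture of the image measures: solve for nonnegative
    # weights summing to one with weights @ images = mu (residual should be ~0).
    A = np.vstack([images.T, np.ones(len(selections))])
    b = np.concatenate([mu, [1.0]])
    weights, residual = nnls(A, b)
    print("mu =", mu, "weights =", weights, "residual =", residual)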


Remark 2: Notice that (3) looks like a disintegration of μ, and indeed, when Γ_0 is the inverse image of a single-valued measurable function (i.e. when the model is given by a single-valued measurable function from latent to observable variables), the probability kernel π is exactly the (P, Γ_0^{−1})-disintegration of μ; in other words, π(y, ·) is the conditional probability measure on U under the condition Γ_0^{−1}(u) = {y}. Hence (3) has the interpretation that a random element with distribution μ can be generated as a draw from π(y, ·), where y is a realization of a random element with distribution P.

Remark 3: We define Core(Γ_0, P) as the weak closed convex hull of {P γ^{−1}; γ ∈ Sel(Γ_0)}, or equivalently as the set of all mixtures of images of P by measurable selections of Γ_0. So our null hypothesis (2) is well defined.²

    2.2 Definition of the test statistic

Now that we have identified the set of latent variable data generating processes compatible with the observable distribution P and the model correspondence Γ_0 with Core(Γ_0, P), and we have characterized the elements of the latter by means of Theorem 1, we propose a test statistic based on this characterization. Call V_00 the subset of V_0 that is compatible with the model correspondence and the distribution of observables. Hence

V_00 = V_0 ∩ Core(Γ_0, P),

which is non-empty under the null H_0 by definition. By Theorem 1, an element μ of V_0 is in V_00 if and only if μ can be written as

μ(·) = ∫_Y π(y, ·) P(dy),

² The name Core is justified by Theorem A2 of Appendix A.


where the π(y, ·) are probability measures with support Γ_0(y) for P-almost all y. From now on, we shall use the de Finetti notation μf for the integral of the function f with respect to the measure μ, so that we shall write

∫_Y π(y, ·) P(dy) = Pπ.

Consider now a sample of observations (Y_1, . . . , Y_n). The empirical distribution, i.e. the probability measure that gives mass 1/n to each observation, is denoted P_n, with

P_n(A) = (1/n) Σ_{j=1}^{n} I_A(Y_j),  for all A ∈ B_Y.

The empirical counterpart of the integral Pπ is

P_n π = (1/n) Σ_{j=1}^{n} π(Y_j, ·).
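As a purely illustrative sketch of the de Finetti notation just introduced (mine, not the paper's), the empirical counterpart P_nπ on sets of the form [0, u] is simply a sample average of kernel evaluations; the kernel pi_cdf below is a made-up example.

    import numpy as np

    def empirical_counterpart(sample, pi_cdf, u_grid):
        """P_n pi evaluated on the sets [0, u]: the average of pi(Y_j, [0, u]) over the sample.

        sample : observations (Y_1, ..., Y_n)
        pi_cdf : hypothetical kernel, pi_cdf(y, u) = pi(y, [0, u])
        u_grid : grid of evaluation points u
        """
        return np.array([np.mean([pi_cdf(y, u) for y in sample]) for u in u_grid])

    # Made-up kernel: pi(1, .) uniform on [0, 0.5], pi(0, .) uniform on [0, 1].
    pi_cdf = lambda y, u: min(u / 0.5, 1.0) if y == 1 else min(u, 1.0)
    sample = np.random.binomial(1, 0.3, size=200)        # toy Bernoulli observables
    print(empirical_counterpart(sample, pi_cdf, np.linspace(0.0, 1.0, 5)))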

The asymptotic behaviour of the difference between Pπ and its empirical counterpart P_nπ is key to the construction of the test statistic. It is described in the following theorem. Setting u = (u_1, . . . , u_{d_u}), call B_u the rectangles ×_{i=1}^{d_u} (−∞, u_i] and π_u := π(·, B_u). Let G_n = √n (P_n − P) denote the empirical process associated with the sample (Y_1, . . . , Y_n), and finally, let ⇒ denote convergence in distribution (that is, weak convergence). Then we have

Theorem 2: For any μ ∈ V_00 with a density with respect to Lebesgue measure, and for any π satisfying (3), G_n converges weakly, uniformly over the family of functions {π_u, u ∈ R^{d_u}}, to a P-Brownian bridge G, i.e. a Gaussian process with zero mean and covariance function defined by

E[G π_u G π_v] = P(π_u π_v) − (P π_u)(P π_v).

This implies that

√n sup_{u ∈ R^{d_u}} |P_n π_u − P π_u| ⇒ ‖G‖ := sup_{u ∈ R^{d_u}} |G π_u|,


where ‖G‖ is a real-valued random variable. In the particular case where d_u = 1, ‖G‖ is such that for all x > 0, Pr(‖G‖ > x) = 2 Σ_{j=1}^{∞} (−1)^{j+1} e^{−2j²x²}.

Remark 1: A remarkable feature of Theorem 2 is that, in the case of a single real-valued latent variable, the test statistic has a distribution-free limit with easily computable quantiles.
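For reference, the tail probability above can be evaluated by truncating the series, and critical values obtained by bisection; this is a minimal sketch added here (assuming Python with NumPy), not part of the original paper.

    import numpy as np

    def kolmogorov_tail(x, terms=100):
        """Pr(G > x) = 2 * sum_{j >= 1} (-1)**(j+1) * exp(-2 j**2 x**2), truncated."""
        if x <= 0:
            return 1.0
        j = np.arange(1, terms + 1)
        return 2.0 * np.sum((-1.0) ** (j + 1) * np.exp(-2.0 * j ** 2 * x ** 2))

    def kolmogorov_quantile(level, lo=1e-6, hi=5.0, tol=1e-10):
        """Critical value x with Pr(G > x) = level, found by bisection."""
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            lo, hi = (mid, hi) if kolmogorov_tail(mid) > level else (lo, mid)
        return 0.5 * (lo + hi)

    print(kolmogorov_quantile(0.05))   # approximately 1.358, the familiar 5% critical value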

The test statistic implicitly proposed in Theorem 2 to test whether a given latent variable distribution μ is compatible with the model restriction Γ_0 and the observable distribution P is infeasible, in that it depends on the unknown probability kernels π such that μ = Pπ. They can be estimated as solutions of the integral equation μ = P_n π, with the restriction that the π(y, ·) are probability measures on Γ_0(y). This equation has solutions (generically many) if and only if μ ∈ Core(Γ_0, P_n), by Theorem 1, but solutions are likely to be difficult to exhibit except in very simple cases, such as the cases developed in Section 3.

An alternative is to construct a test statistic based on the distance between a hypothesized latent variable measure μ (or more generally V_0) and Core(Γ_0, P_n), which by construction will be smaller than the test statistic of Theorem 2, and hence can be used as the basis for a conservative testing procedure. This is summarized in the following corollary:

Corollary 1: Under the null H_0,

limsup_{n→∞} inf_{μ ∈ V_0} inf_{ν ∈ Core(Γ_0, P_n)} sup_{u ∈ R^{d_u}} √n |μ(B_u) − ν(B_u)| ≤ sup_{u ∈ R^{d_u}} |G π_u|,

and the infima are achieved provided V_0 is chosen to be closed in the weak topology.

Given the conservative nature of the procedure based on Corollary 1, it is crucial to assess the power of the test, as described in the next section.

    2.3 Power analysis

The two test statistics considered in Section 2.2 are the following:

TS1 = inf_{μ ∈ V_0} sup_{u ∈ R^{d_u}} √n |P_n π_u − μ(B_u)|,

TS2 = inf_{μ ∈ V_0} inf_{ν ∈ Core(Γ_0, P_n)} sup_{u ∈ R^{d_u}} √n |μ(B_u) − ν(B_u)|.

Note that since P_n π ∈ Core(Γ_0, P_n), TS2 is dominated by TS1 by construction.

To assess the power of either test, we consider the following type of local alternatives:

d_H(V_n, Core(Γ_n, P)) ≥ ε r_n^{−1},  ε > 0,     (4)

where r_n is a deterministic sequence of reals diverging with n, and d_H denotes the Hausdorff distance, defined as follows: for any two sets V_1 and V_2 in M(U), and any metric d metrizing weak convergence,

d_H(V_1, V_2) = max{ sup_{μ_1 ∈ V_1} inf_{μ_2 ∈ V_2} d(μ_1, μ_2),  sup_{μ_2 ∈ V_2} inf_{μ_1 ∈ V_1} d(μ_1, μ_2) }.

The principle of both test statistics TS1 and TS2 rests on the set convergence of Core(Γ, P_n) to Core(Γ, P) for a fixed model correspondence Γ. Hence, for n large enough, Core(Γ_n, P_n) is sufficiently close to Core(Γ_n, P) for the test statistic to detect the sequence of local alternatives, as summarized in the following theorem.

Theorem 3: Under the sequence of alternatives defined in (4) with r_n = o(√n), if the family of functions {π(·, B_u) : π satisfying (3) for some ν ∈ Core(Γ_n, P), u ∈ U, n ∈ N} is P-Donsker, then the test statistics TS1 and TS2 diverge.

Remark 1: As developed in Appendix A, Theorem 3 has an interesting interpretation in terms of the convergence of empirical random sets: for a random element Y_j in Y, Γ(Y_j) is a random set in U under Assumption 1, and its distribution can be identified with Core(Γ, P). Theorem 3 tells us when the empirical distribution Core(Γ, P_n) of the random set Γ(Y) converges weakly to the true distribution at rate √n. Such a result appears to be new in the literature on the convergence of random sets.

3 Illustration: a simple entry model

3.1 Single type

Consider a market with two firms producing complementary products with identical costs.³ The payoff functions are

π_1(x_1, x_2, u) = (θ x_2 − u) I_{{x_1 = 1}},
π_2(x_1, x_2, u) = (θ x_1 − u) I_{{x_2 = 1}},

where x_i ∈ {0, 1} is firm i's action, and u is an exogenous cost. The firms know their cost; the analyst, however, knows only that u ∈ [0, 1], and that the structural parameter θ is in (0, 1]. There are two Nash equilibria. The first is x_1 = x_2 = 0 for all u ∈ [0, 1]. The second is x_1 = x_2 = 1 for all u ∈ [0, θ] and zero otherwise. Since the two firms' actions are perfectly correlated, we shall denote them by a single binary variable y = x_1 = x_2. Hence the model is described by the multi-valued mapping Γ(1) = [0, θ] and Γ(0) = [0, 1]. If we consider the restriction θ ≤ θ_max, then the multi-valued mapping incorporating the restriction is Γ_0, defined by Γ_0(1) = [0, θ_max] and Γ_0(0) = [0, 1]. In this case, since y is Bernoulli, we can write P = (1 − p, p), with p the probability of a 1. For the distribution of u, we consider a parametric exponential family on [0, 1]. Hence V = {μ_α := α u^{α−1} du}_{α > 0}, and the restriction on the latent variable distribution can be chosen as α ∈ [α_min, α_max] with α_min > 0.

³ Jovanovic (1989) and Tamer (2003) consider this simple game in a similar context.
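The simulation sketch below (added here; the equilibrium-selection device is an arbitrary choice of mine and is not part of the model) draws costs from μ_α and generates entry outcomes; it illustrates that, whatever the selection rule, the entry probability p cannot exceed θ^α, which is the compatibility condition exploited below.

    import numpy as np

    rng = np.random.default_rng(0)

    def simulate_entry(n, theta, alpha, selection_prob=0.7):
        """Simulate the single-type entry game.

        Costs u are drawn from mu_alpha (cdf u**alpha on [0, 1]).  When u <= theta both
        equilibria exist; a purely illustrative selection device then picks entry
        (y = 1) with probability selection_prob.
        """
        u = rng.uniform(size=n) ** (1.0 / alpha)     # inverse-cdf draw from u**alpha
        both = u <= theta                            # region where x1 = x2 = 1 is an equilibrium
        y = both & (rng.uniform(size=n) < selection_prob)
        return y.astype(int)

    theta, alpha = 0.6, 2.0
    y = simulate_entry(100_000, theta, alpha)
    print(f"p_hat = {y.mean():.3f} <= theta**alpha = {theta**alpha:.3f}")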

Consider the smallest reliability P_* that can be attached to a set A in U on the basis of Γ_0 and P, defined by

P_*(A) = P{y ∈ Y | Γ_0(y) ⊆ A}.

Our null hypothesis of compatibility of the two sets of restrictions is that, for some α ∈ [α_min, α_max], μ_α set-wise dominates P_*; in other words, that μ_α associates to each set a measure at least as large as the smallest reliability that can be attached to it. This is equivalent to the existence of an α ∈ [α_min, α_max] and, for P-almost all y, of a probability measure π(y, ·) supported on Γ_0(y), such that for all u ∈ [0, 1],

u^α = ∫_Y π(y, [0, u]) P(dy).

In other words,

u^α = (1 − p) π(0, [0, u]) + p π(1, [0, u])

and

π(1, [0, θ_max]) = π(0, [0, 1]) = 1,

with, for P-almost all y, π(y, [0, u]) a nondecreasing, right-continuous function of u taking values in [0, 1]. When θ_max^α < p, there is no solution (i.e. the two sets of restrictions are incompatible).


When θ_max^α ≥ p, such kernels π exist, and the empirical process G_n can be applied to the functions π_{α,u}(y) := π(y, [0, u]) to form the statistic sup_{u ∈ [0,1]} |G_n π_{α,u}| (TS1 below), which for any such α (there exists at least one under the null) converges weakly to the supremum G of a standard Brownian bridge, with

Pr(G > x) = 2 Σ_{j=1}^{∞} (−1)^{j+1} e^{−2j²x²}.

A testing procedure that does not require computation of the kernels π, as described in Section 2.2, consists in finding the element of Core(Γ_0, P_n) that minimizes the Kolmogorov–Smirnov distance to the set of distribution functions {u^α, α ∈ [α_min, α_max]}. If θ_max^α ≥ p_n, then u^α is a minimizer and the minimum distance is zero. If θ_max^α < p_n, then the minimum Kolmogorov–Smirnov distance is p_n − θ_max^α. If θ_max^α > p, then eventually θ_max^α > p_n as well, and the test statistic is zero. If θ_max^α < p, then eventually θ_max^α < p_n, and the test statistic diverges. Finally, if θ_max^α = p, then the statistic will be √n max(0, p_n − p). In this very simple example, one might also consider a test of the null H_0: p = θ_max^α against the one-sided alternative H_a: p > θ_max^α, using the fact that under the null, √n [θ_max^α (1 − θ_max^α)]^{−1/2} (p_n − θ_max^α) converges to a standard normal random variable.

To summarize, the procedures proposed are based on the following test statistics:

TS1 = inf_α sup_{u ∈ [0,1]} |G_n π_{α,u}|,

TS2 = inf_α inf_{F ∈ Core(Γ_0, P_n)} sup_{u ∈ [0,1]} √n |u^α − F(u)|,

TS3 = √n (p_n − θ_max^α),

and the following approximating distributions:


AD1 = √(p/(1 − p)) (1 − θ_max^{α_min}) |N(0, 1)|,

AD2 = G,

AD3 = √(θ_max^α (1 − θ_max^α)) N(0, 1),

AD4 = √(p(1 − p)) N(0, 1).

The procedure based on estimation of the probability kernels π and comparison between the hypothesized distributions μ = Pπ and their empirical counterparts P_nπ would result in comparing TS1 to the quantiles of AD1 (for the exact asymptotic version) or AD2 (for the conservative asymptotic version). The procedure based on the minimum distance between the hypothesized distributions and the empirical random set distribution Core(Γ_0, P_n) would result in comparing TS2 with the quantiles of AD4 (for an exact asymptotic version) or AD2 (for the conservative asymptotic version). Finally, the simple test on the boundary would result in comparing TS3 to the quantiles of AD3.
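The sketch below is a rough implementation of the single-type procedures as I read them (the parametrization of the restriction as α ∈ [α_min, α_max], and the use of α_min in the statistics, are my assumptions): it computes the minimum Kolmogorov–Smirnov statistic TS2 and the boundary statistic TS3 from the Bernoulli frequency p_n, with p-values based on AD2–AD4.

    import numpy as np
    from scipy.stats import norm

    def kolmogorov_tail(x, terms=100):
        """Pr(G > x) for the supremum G of a standard Brownian bridge (AD2)."""
        if x <= 0:
            return 1.0
        j = np.arange(1, terms + 1)
        return 2.0 * np.sum((-1.0) ** (j + 1) * np.exp(-2.0 * j ** 2 * x ** 2))

    def single_type_tests(y, theta_max, alpha_min):
        """TS2 (minimum KS distance to Core(Gamma_0, P_n)) and TS3 (boundary test)."""
        n, p_n = len(y), float(np.mean(y))
        # Per alpha, the minimum KS distance is max(0, p_n - theta_max**alpha);
        # the infimum over alpha in [alpha_min, alpha_max] is attained at alpha_min.
        ts2 = np.sqrt(n) * max(0.0, p_n - theta_max ** alpha_min)
        p_ts2_conservative = kolmogorov_tail(ts2)                        # AD2
        p_ts2_exact = norm.sf(ts2 / np.sqrt(p_n * (1.0 - p_n)))          # AD4, one-sided (a choice made here)
        # Boundary test of H0: p = theta_max**alpha against Ha: p > theta_max**alpha.
        q = theta_max ** alpha_min
        ts3 = np.sqrt(n) * (p_n - q)
        p_ts3 = norm.sf(ts3 / np.sqrt(q * (1.0 - q)))                    # AD3, one-sided
        return ts2, p_ts2_conservative, p_ts2_exact, ts3, p_ts3

    y = np.random.binomial(1, 0.25, size=500)       # toy data
    print(single_type_tests(y, theta_max=0.6, alpha_min=1.5))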

3.2 Heterogeneous types

Consider a market with two firms producing complementary products with heterogeneous costs. The payoff functions are

π_1(x_1, x_2, u) = (θ x_2 − u_1) I_{{x_1 = 1}},
π_2(x_1, x_2, u) = (θ x_1 − u_2) I_{{x_2 = 1}},

where x_i ∈ {0, 1} is firm i's action, and the u_i are firm-specific exogenous costs. The firms know their costs; the analyst, however, knows only that u ∈ [0, 1]², and that the structural parameter θ is in (0, 1]. There are two Nash equilibria. The first is x_1 = x_2 = 0 for all u ∈ [0, 1]². The second is x_1 = x_2 = 1 for all u ∈ [0, θ]² and zero otherwise. Since the two firms' actions are perfectly correlated, we shall denote them by a single binary variable y = x_1 = x_2. Hence the model is described by the multi-valued mapping Γ(1) = [0, θ]² and Γ(0) = [0, 1]². If we consider the restriction θ ≤ θ_max, then the multi-valued mapping incorporating the restriction is Γ_0, defined by Γ_0(1) = [0, θ_max]² and Γ_0(0) = [0, 1]². In this case, since y is Bernoulli, we can write P = (1 − p, p), with p the probability of a 1. For the distribution of u, we consider the costs to be independent with marginals following the same parametric exponential family on [0, 1]. Hence V = {μ_α := α² u_1^{α−1} u_2^{α−1} du_1 du_2}_{α > 0}, and the restriction can again be chosen as α ∈ [α_min, α_max] with α_min > 0.

A density version of (3) can be derived in this case, and makes for a more convenient test statistic. Writing u = (u_1, u_2),

f_α(u) = f_α(u_1) f_α(u_2) = α² u_1^{α−1} u_2^{α−1} = ∫_Y π(y, u) P(dy).

In other words,

α² u_1^{α−1} u_2^{α−1} = (1 − p) π(0, u) + p π(1, u)

under the constraints

∫_{[0,1]²} π(0, u) du = ∫_{[0,θ_max]²} π(1, u) du = 1

and π(1, u) = 0 for all u ∉ [0, θ_max]². When θ_max^{2α} < p, there is no solution (i.e. the two sets of restrictions are incompatible), whereas when θ_max^{2α} ≥ p, a solution is given by

π(0, u) = f_α(u) (1 − p θ_max^{−2α} I_{[0,θ_max]²}(u)) / (1 − p),

π(1, u) = θ_max^{−2α} f_α(u) I_{[0,θ_max]²}(u).

Consider now the empirical process G_n = √n (P_n − P) applied to the family of functions π_{α,u}(y) := π(y, u). Elementary calculations yield

G_n π_{α,u} = √n (p_n − p) / (1 − p) · g_{α,u},

where

g_{α,u} := f_α(u) (θ_max^{−2α} I_{[0,θ_max]²}(u) − 1).

In this case, it is convenient to use the L¹ metric: we are looking at the minimum over α of

∫_{[0,1]²} |G_n π_{α,u}| du = (√n |p_n − p| / (1 − p)) ∫_{[0,1]²} f_α(u) |θ_max^{−2α} I_{[0,θ_max]²}(u) − 1| du.

Now

∫_{[0,1]²} f_α(u) |θ_max^{−2α} I_{[0,θ_max]²}(u) − 1| du = 2(1 − θ_max^{2α}),

which is minimized at α = α_min to yield 2(1 − θ_max^{2α_min}). So

inf_α ∫_{[0,1]²} |G_n π_{α,u}| du ⇒ 2 √(p/(1 − p)) (1 − θ_max^{2α_min}) |Z|,

where Z is a standard normal random variable.
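A quick numerical check of the displayed L¹ computation (added here, not in the paper): for given α and θ_max, the integral of |g_{α,u}| over [0, 1]² should equal 2(1 − θ_max^{2α}).

    import numpy as np
    from scipy import integrate

    def g(u1, u2, alpha, theta_max):
        """g_{alpha,u} = f_alpha(u) * (theta_max**(-2*alpha) * 1{u in [0,theta_max]^2} - 1)."""
        f = alpha ** 2 * u1 ** (alpha - 1.0) * u2 ** (alpha - 1.0)
        inside = (u1 <= theta_max) and (u2 <= theta_max)
        return f * (theta_max ** (-2.0 * alpha) * inside - 1.0)

    alpha, theta_max = 1.7, 0.6
    val, _ = integrate.dblquad(lambda u2, u1: abs(g(u1, u2, alpha, theta_max)),
                               0.0, 1.0, lambda u1: 0.0, lambda u1: 1.0)
    print(val, 2.0 * (1.0 - theta_max ** (2.0 * alpha)))   # the two numbers should agree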

Appendix A: Empirical Distributions of Random Sets

In Assumption 1, we assume that the correspondence is measurable in the traditional sense, defined below:

Definition A1 (Effros Measurability) A correspondence Γ: (Y, B_Y) ⇉ (U, B_U) is said to be Effros measurable, or weakly measurable, or simply measurable, if the inverse image of open sets is measurable, i.e. if for all open subsets O of U,

Γ^{−1}(O) = {y ∈ Y | Γ(y) ∩ O ≠ ∅} ∈ B_Y.

There are several ways a measurable correspondence can convey probabilistic information on its image space (U, B_U), given observed frequencies of outcomes in Y.

Dempster (1967) suggests considering the smallest reliability that can be associated with the event A ∈ B_U as the belief function

P_*(A) = P{y ∈ Y | Γ(y) ⊆ A}

and the largest plausibility that can be associated with the event A as the plausibility function

P^*(A) = P{y ∈ Y | Γ(y) ∩ A ≠ ∅},

the two being linked by the relation

P^*(A) = 1 − P_*(A^c),     (5)

which prompted some authors to call them conjugates or duals of each other.
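For a concrete finite illustration of these two set functions and of the conjugacy relation (5) (an example added here, not in the original), the sketch below tabulates the belief and plausibility of every subset of a three-point latent space, for a two-point observable space.

    from itertools import chain, combinations

    # Finite toy example: Y = {0, 1}, U = {0, 1, 2}, correspondence Gamma, law P of y.
    U = {0, 1, 2}
    Gamma = {0: {0, 1}, 1: {1, 2}}
    P = {0: 0.6, 1: 0.4}

    def belief(A):
        """P_*(A) = P{y : Gamma(y) is a subset of A}."""
        return sum(p for y, p in P.items() if Gamma[y] <= set(A))

    def plausibility(A):
        """P^*(A) = P{y : Gamma(y) meets A}."""
        return sum(p for y, p in P.items() if Gamma[y] & set(A))

    def powerset(s):
        s = list(s)
        return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

    for A in map(set, powerset(U)):
        # Conjugacy (5): P^*(A) = 1 - P_*(A^c)
        assert abs(plausibility(A) - (1.0 - belief(U - A))) < 1e-12
        print(sorted(A), belief(A), plausibility(A))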

A natural way to construct a set of probability measures is to consider all probability measures that dominate the set function P_* set-wise, forming thus the core of the belief function:

Core(Γ, P) = {μ ∈ M(U) | ∀ A ∈ B_U, μ(A) ≥ P_*(A)} = {μ ∈ M(U) | ∀ A ∈ B_U, μ(A) ≤ P^*(A)},

where the first equality can be taken as a definition, and the second follows immediately from (5). It is well known that Core(Γ, P) is non-empty, and this will be shown below as a consequence of Theorem A2.


A different way of defining the probabilistic information generated by the correspondence can be derived from Aumann's idea (in Aumann (1965)) of considering correspondences as bundles of their selections.

Define the domain of the correspondence by

Dom(Γ) = {y ∈ Y | Γ(y) ≠ ∅}.

A measurable selection of the measurable correspondence Γ is defined by the property below:

Definition A2 (Measurable Selection) A measurable selection of a correspondence Γ: (Y, B_Y) ⇉ (U, B_U) is a (B_Y, B_U)-measurable function γ such that γ(y) ∈ Γ(y) for all y ∈ Dom(Γ).

The set of measurable selections of a measurable correspondence Γ is denoted Sel(Γ), and it is non-empty by a theorem due to Rokhlin (Rokhlin (1949), Part I, §2, No. 9, Lemma 2) and generally attributed to Kuratowski and Ryll-Nardzewski:

Theorem A1 (Rokhlin) An Effros measurable correspondence with closed non-empty values admits a measurable selection. For a proof, see for instance Theorem 8.1.3, page 308, of Aubin and Frankowska (1990).

Elements of Sel(Γ) can be used to transport the probability P on Y to probabilities on U. For each γ ∈ Sel(Γ), consider the probability P γ^{−1} defined on each A ∈ B_U by

P γ^{−1}(A) = P{y ∈ Y | γ(y) ∈ A},

and define

Σ(Γ, P) = {μ ∈ M(U) : μ = P γ^{−1} for some γ ∈ Sel(Γ)}.


It is easily seen that Σ(Γ, P) ⊆ Core(Γ, P). A converse is given by the following theorem of Castaldo, Maccheroni, and Marinacci (2004):

Theorem A2 (Castaldo, Maccheroni and Marinacci) If Γ is measurable and compact-valued, then Core(Γ, P) is the weak closed convex hull of Σ(Γ, P).

We now develop the claim made in Remark 1 after Theorem 3. Γ and P define a random set with realizations Γ(Y_j) for realizations Y_j from P. P^* is the distribution of the random set, and the empirical distribution associated with a sample (Y_1, . . . , Y_n) is given by

P^*_n(A) = (1/n) Σ_{j=1}^{n} I_{{Γ(Y_j) ∩ A ≠ ∅}}.

Core(Γ, P) characterizes P^* and Core(Γ, P_n) characterizes P^*_n. Hence Theorem 1 and Donsker theorems provide a way to derive conditions for the weak convergence of empirical random sets at rate √n.
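A small sketch of the empirical quantity displayed above, specialized to the single-type entry model (my illustration: Γ(Y_j) = [0, θ] if Y_j = 1 and [0, 1] otherwise, with test sets of the hypothetical form A = (t, 1]).

    import numpy as np

    def empirical_capacity(y, theta, t):
        """(1/n) * #{j : Gamma(Y_j) meets (t, 1]}, with Gamma(1) = [0, theta], Gamma(0) = [0, 1]."""
        # [0, 1] always meets (t, 1] for t < 1; [0, theta] meets it iff theta > t.
        meets = (y == 0) | ((y == 1) & (theta > t))
        return meets.mean()

    y = np.random.binomial(1, 0.3, size=1000)       # toy sample of observables
    print([empirical_capacity(y, theta=0.6, t=t) for t in (0.2, 0.5, 0.7, 0.9)])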

    Appendix B: Proofs

    Proof of Theorem 1:

Call Π(B) the set of all Borel probability measures with support B. Under Assumption 1, the map y ↦ Π(Γ_0(y)) is a map from Y to the set of all non-empty convex sets of Borel probability measures on U which are closed with respect to the weak topology. Moreover, for any f ∈ C_b(U), the set of all continuous bounded real functions on U, the map

y ↦ sup{ ∫ f dν : ν ∈ Π(Γ_0(y)) } = max_{u ∈ Γ_0(y)} f(u)

is B_Y-measurable, so that, by Theorem 3 of Strassen (1965), for a given μ ∈ M(U), there exists π satisfying (3) with π(y, ·) ∈ Π(Γ_0(y)) for P-almost all y if and only if

∫_U f(u) μ(du) ≤ ∫_Y sup_{u ∈ Γ_0(y)} f(u) P(dy)     (6)

for all f ∈ C_b(U). Now, defining P^* as the set function

P^*: B ↦ P({y ∈ Y : Γ_0(y) ∩ B ≠ ∅}),

the right-hand side of (6) is shown in the following sequence of equalities to be equal to the integral of f with respect to P^* in the sense of Choquet (line (7) below can be taken as a definition):

∫_Y sup_{u ∈ Γ_0(y)} f(u) dP(y)
  = ∫_0^∞ P{y ∈ Y : sup_{u ∈ Γ_0(y)} f(u) ≥ x} dx + ∫_{−∞}^0 ( P{y ∈ Y : sup_{u ∈ Γ_0(y)} f(u) ≥ x} − 1 ) dx
  = ∫_0^∞ P{y ∈ Y : Γ_0(y) ∩ {f ≥ x} ≠ ∅} dx + ∫_{−∞}^0 ( P{y ∈ Y : Γ_0(y) ∩ {f ≥ x} ≠ ∅} − 1 ) dx
  = ∫_0^∞ P^*({f ≥ x}) dx + ∫_{−∞}^0 ( P^*({f ≥ x}) − 1 ) dx
  = ∫^{Ch} f dP^*.     (7)

By Theorem 1 of Castaldo, Maccheroni, and Marinacci (2004), for any f ∈ C_b(U),

∫^{Ch} f dP^* = max_{γ ∈ Sel(Γ_0)} ∫_U f(u) P γ^{−1}(du),

so that (6) is equivalent to

max_{γ ∈ Sel(Γ_0)} ∫_U f(u) P γ^{−1}(du) ≥ ∫_U f(u) μ(du).     (8)
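As a numerical sanity check of the equality just invoked (my own finite illustration of the result of Castaldo, Maccheroni, and Marinacci (2004), not part of the proof), the sketch below compares the Choquet integral of a nonnegative function f with respect to the capacity P^* with the maximum of ∫ f d(Pγ^{−1}) over measurable selections γ, on a finite example.

    import numpy as np

    # Finite example: Y = {0, 1}, U = {0, 1, 2}, correspondence Gamma, law P of y.
    Gamma = {0: [0, 1], 1: [1, 2]}
    P = {0: 0.6, 1: 0.4}
    f = np.array([0.2, 1.0, 0.5])        # a nonnegative bounded test function on U

    def choquet_upper(f, Gamma, P, grid=20_000):
        """Choquet integral of f >= 0 w.r.t. the capacity P*(A) = P{y : Gamma(y) meets A}."""
        xs = np.linspace(0.0, float(f.max()), grid)
        upper = lambda x: sum(p for y, p in P.items() if any(f[u] >= x for u in Gamma[y]))
        return float(np.mean([upper(x) for x in xs]) * f.max())

    def max_over_selections(f, Gamma, P):
        """max over selections gamma of sum_y P(y) f(gamma(y)) = sum_y P(y) max_{u in Gamma(y)} f(u)."""
        return sum(p * max(f[u] for u in Gamma[y]) for y, p in P.items())

    print(choquet_upper(f, Gamma, P), max_over_selections(f, Gamma, P))   # should agree up to grid error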


Proof of Theorem 3:

The assumption guarantees that G_n = √n (P_n − P) converges to a Brownian bridge uniformly over the family defined in the statement of the theorem. Hence,

sup_{ν ∈ Core(Γ_n, P)} sup_{u ∈ R^{d_u}} |G_n π_u|

converges to a bounded random variable. Since the Kolmogorov–Smirnov metric is stronger than d, which metrizes weak convergence, the latter display implies that

sup_{ν ∈ Core(Γ_n, P)} sup_{ν′ ∈ Core(Γ_n, P_n)} d(ν, ν′) = O(1/√n).

Now,

TS2 ≥ d_H(V_n, Core(Γ_n, P_n)) ≥ d_H(V_n, Core(Γ_n, P)) − sup_{ν ∈ Core(Γ_n, P)} sup_{ν′ ∈ Core(Γ_n, P_n)} d(ν, ν′),

and TS2 ≤ TS1, so the result follows.

    References

Andrews, D., S. Berry, and P. Jia (2004): Confidence Regions for Parameters in Discrete Games with Multiple Equilibria, with an Application to Discount Chain Store Location, unpublished manuscript.

Aubin, J.-P., and H. Frankowska (1990): Set-valued analysis. Boston: Birkhäuser.

Aumann, R. (1965): Integrals of set-valued functions, Journal of Mathematical Analysis and Applications, 12, 1–12.


Castaldo, A., F. Maccheroni, and M. Marinacci (2004): Random sets and their distributions, Sankhya (Series A), 66, 409–427.

Chernozhukov, V., H. Hong, and E. Tamer (2002): Inference on Parameter Sets in Econometric Models, unpublished manuscript.

Dempster, A. P. (1967): Upper and lower probabilities induced by a multi-valued mapping, Annals of Mathematical Statistics, 38, 325–339.

Dudley, R. (2003): Real Analysis and Probability. Cambridge University Press.

Jovanovic, B. (1989): Observable implications of models with multiple equilibria, Econometrica, 57, 1431–1437.

Pakes, A., J. Porter, K. Ho, and J. Ishii (2004): Moment Inequalities and Their Application, unpublished manuscript.

Rockafellar, R. T., and R. J.-B. Wets (1998): Variational Analysis. Springer.

Rokhlin, V. (1949): Selected topics from the metric theory of dynamical systems, Uspekhi Matematicheskikh Nauk, 4, 57–128; translated in American Mathematical Society Translations, 49 (1966), 171–240.

Shaikh, A. (2005): Inference for a Class of Partially Identified Econometric Models, unpublished manuscript.

Strassen, V. (1965): The existence of probability measures with given marginals, Annals of Mathematical Statistics, 36, 423–439.

Tamer, E. (2003): Incomplete Simultaneous Discrete Response Model with Multiple Equilibria, Review of Economic Studies, 70, 147–165.


van der Vaart, A., and J. Wellner (2000): Weak Convergence and Empirical Processes. Springer.