testing non-identifying restrictions
7/31/2019 Testing Non-Identifying Restrictions
Testing Non-identifying Restrictions¹
Marc Henry
Columbia University
First draft: September 15, 2005
This draft: January 25, 2006
Abstract
We propose a test of specification for structural models without
identifying assumptions. The model is defined as a binary relation
between latent and observable variables, coupled with a hypothesized
family of distributions for the latent variables. The objective of the
testing procedure is to determine whether this hypothesized family
of latent variable distributions has a non-empty intersection with the
set of distributions compatible with the observable data generating
process and the binary relation defining the model. When the model is
given in parametric form, the test can be inverted to yield confidence
intervals for the identified parameter set.
JEL Classification: C12, C14
Keywords: random sets, empirical process.
¹ Preliminary and incomplete. Helpful discussions with Alfred Galichon, Rosa Matzkin,
Alexei Onatski, Jim Powell and Peter Robinson are gratefully acknowledged (with the
usual disclaimer). Correspondence address: Department of Economics, Columbia University,
420 W 118th Street, New York, NY 10027, USA. [email protected].
1 Introduction
We consider a very general econometric model specification. Variables under
consideration are divided into two groups.

Latent variables, $u \in U = \mathbb{R}^{d_u}$. The vector $u$ is not observed by the
analyst, but some of its components may be observed by the economic
actors. Theorem 1 below holds more generally when $U$ is a complete,
metrizable and separable topological space (i.e. a Polish space).

Observable variables, $y \in Y = \mathbb{R}^{d_y}$. The vector $y$ is observed by the
analyst. Theorem 1 holds more generally when $Y$ is a convex metrizable
subset of a locally convex topological vector space.

The Borel sigma-algebras of $Y$ and $U$ will be respectively denoted $\mathcal{B}_Y$ and $\mathcal{B}_U$.
Call $P$ the Borel probability measure that represents the true data generating
process for the observable variables, and $V$ a family of Borel probability
measures that are hypothesized to be possible data generating processes for
the latent variables. Finally, the economic model is given by a relation between
observable and latent variables, i.e. a subset of $Y \times U$, which we shall
write as a multi-valued mapping from $Y$ to $U$ denoted by $\Gamma$. Suppose a set
of restrictions on the hypothesized latent variable distributions is given by
$V_0 \subseteq V$ and a set of restrictions on the model is given by $\Gamma_0$.
Example 1: parametric models and restrictions. Suppose the economic
model is known up to a finite dimensional parameter vector $\theta$, and the
chosen family of distributions for the latent variables depends on a finite
dimensional parameter vector $\alpha$. The hypothesized restrictions are the
following:
\[ \theta \in \Theta_0 \subseteq \mathbb{R}^{d_\theta}; \qquad \alpha \in A_0 \subseteq \mathbb{R}^{d_\alpha}. \]
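The restricted family of distributions for the latent variables then becomes
\[ V_0 = \{\nu_\alpha, \ \alpha \in A_0\} \]
and all the models $\{\Gamma_\theta, \ \theta \in \Theta_0\}$ are considered for the relation linking
observable to latent variables. Hence our restricted model is defined by
\[ \Gamma_0 = \bigcup_{\theta \in \Theta_0} \Gamma_\theta. \]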
Example 2: games with multiple equilibria. Suppose the payoff
function for player $j$, $j = 1, \ldots, J$, is given by
\[ \Pi_j(S_j, S_{-j}, X_j, U_j; \theta), \]
where $S_j$ is player $j$'s strategy and $S_{-j}$ is their opponents' strategies. $X_j$ is a
vector of observable characteristics of player $j$ and $U_j$ a vector of unobservable
determinants of the payoff. Finally $\theta$ is a vector of parameters. Pure strategy
Nash equilibrium conditions
\[ \Pi_j(S_j, S_{-j}, X_j, U_j; \theta) \geq \Pi_j(S, S_{-j}, X_j, U_j; \theta), \quad \text{for all } S, \]
define a correspondence from unobservable player characteristics to
observable variables $(S, X)$, and if the unobservable player characteristics,
interpreted as types of the players, are supposed uniformly distributed on the
relevant domain, then $V_0$ is a singleton.
In this paper, we propose a general framework for conducting inference
without additional assumptions, such as equilibrium selection mechanisms,
necessary to identify the model (i.e. to ensure that $\Gamma$ is single-valued). The
usual terminology for such models is "incomplete" or "partially identified".
In a parametric setting, the objective of inference in partially identi-
fied models is the estimation of the set of parameters which are compatible
with the distribution of the observed data and an assessment of the qual-
ity of that estimation. Chernozhukov, Hong, and Tamer (2002) propose an
M-estimation procedure to construct a set that contains all compatible para-
meters with a predetermined probability. Shaikh (2005) extends and refines
their method and Andrews, Berry, and Jia (2004) and Pakes, Porter, Ho,
and Ishii (2004) propose alternative procedures in a similar framework.
The inference procedure presented here is based on a characterization
of probability measures in the Core of the random set generated by the
distribution of observables $P$ and the multivalued mapping $\Gamma_0$, and a method
to determine whether hypothesized latent variable data generating processes
satisfy this characterization. No a priori parametric assumptions are needed,
but if they are made, the inference methodology yields similar confidence
sets to those proposed in the previously cited papers. In the notation of
Example 1, for a given $\alpha$ one can derive a set of $\theta$'s compatible with $\nu_\alpha$ and
$P$ at a given significance level, and conversely, one can derive a set of $\alpha$'s
compatible with $P$ and $\Gamma_\theta$ for a given $\theta$.
The next section proposes the characterization of probability measures in
the Core of a random set, the following section describes the testing prin-
ciples and the last section illustrates the approach on a simple entry model
with multiple equilibria. Proofs and additional results are collected in the
appendix.
2 Testing general model specifications
2.1 Definition of the null hypothesis
We wish to develop a procedure to detect whether the two sets of restrictions,
on the family of distributions for the latent variables on the one hand, and
on the relation between observable and latent variables on the other hand,
are compatible. First we explain what we mean by compatible. It is very
easily understood in the simple case where the link between latent and
observable variables is parametric and $\Gamma_\theta$ is measurable and single valued for
each $\theta \in \Theta_0$. Defining the image measure of $P$ by $\Gamma_\theta$ by
\[ P\Gamma_\theta^{-1}(A) = P\{y \in Y \mid \Gamma_\theta(y) \in A\}, \tag{1} \]
for all $A \in \mathcal{B}_U$, we say that the restrictions $V_0$ and $\Gamma_0$ are compatible if and
only if there is at least a $\theta \in \Theta_0$ and a $\nu \in V_0$ such that $\nu = P\Gamma_\theta^{-1}$. In the
general case considered here, $\Gamma_0$ may not be single valued, and its images may
not even be disjoint (which would be the case if it was the inverse image of
a single valued mapping from $U$ to $Y$, i.e. a traditional function from latent
to observable variables). However, under a measurability assumption on $\Gamma_0$,
we can construct an analogue of the image measure, which will now be a set
$\mathrm{Core}(\Gamma_0, P)$ of Borel probability measures on $U$ (to be defined below), and the
hypothesis of compatibility of the restrictions on latent variable distributions
and on the models linking latent and observable variables will naturally take
the form
\[ H_0 : V_0 \cap \mathrm{Core}(\Gamma_0, P) \neq \varnothing. \tag{2} \]
Assumption 1: $\Gamma_0$ has non-empty and closed values, and for each open set
$O \subseteq U$, $\Gamma_0^{-1}(O) = \{y \in Y \mid \Gamma_0(y) \cap O \neq \varnothing\} \in \mathcal{B}_Y$.
To relate the present case to the intuition of the single-valued case, it
is useful to think in terms of single-valued selections of the multi-valued
mapping $\Gamma_0$. A measurable selection of $\Gamma_0$ is a measurable function $\gamma$ such
that $\gamma(y) \in \Gamma_0(y)$ for all $y \in Y$. The set of measurable selections of a
multi-valued mapping $\Gamma_0$ that satisfies Assumption 1 is denoted $\mathrm{Sel}(\Gamma_0)$, and it is
known to be non-empty since Rokhlin (1949) Part I, §2, No 9, Lemma 2.¹
To each selection $\gamma$ of $\Gamma_0$, we can associate the image measure of $P$, denoted
$P\gamma^{-1}$, defined as in (1).
A natural reformulation of the compatibility condition is that at least
one probability measure $\nu \in V_0$ can be written as a mixture of probability
measures of the form $P\gamma^{-1}$, where $\gamma$ ranges over $\mathrm{Sel}(\Gamma_0)$. However, even
for the simplest multi-valued mapping, the set of measurable selections is
very rich, let alone the set of their mixtures. Hence, our first goal is to
give a manageable representation of such a mixture. This is the object of
Theorem 1 below.
Theorem 1: Under Assumption 1, $\nu$ is a mixture of images of $P$ by measurable
selections of $\Gamma_0$ (i.e. $\nu$ is in the weak closed convex hull of
$\{P\gamma^{-1}; \ \gamma \in \mathrm{Sel}(\Gamma_0)\}$) if and only if there exists for $P$-almost all $y \in Y$ a probability
measure $\pi(y, .)$ on $U$ supported on $\Gamma_0(y)$, such that
\[ \nu(B) = \int_Y \pi(y, B)\, P(dy), \quad \text{all } B \in \mathcal{B}_U. \tag{3} \]
Remark 1: The weak topology on $\Delta(U)$, the set of probability measures on
$U$, is the topology of convergence in distribution. $\Delta(U)$ is also Polish, and
the weak closed convex hull of $\{P\gamma^{-1}; \ \gamma \in \mathrm{Sel}(\Gamma_0)\}$ is indeed the collection
of arbitrary mixtures of elements of $\{P\gamma^{-1}; \ \gamma \in \mathrm{Sel}(\Gamma_0)\}$.
¹ The commentary at the end of chapter 14 of Rockafellar and Wets (1998) sheds light
on the controversy surrounding this attribution.
Remark 2: Notice that (3) looks like a disintegration of $\nu$, and indeed, when
$\Gamma_0$ is the inverse image of a single-valued measurable function (i.e. when the
model is given by a single-valued measurable function from latent to observable
variables), the probability kernel $\pi$ is exactly the $(P, \Gamma_0^{-1})$-disintegration
of $\nu$, in other words, $\pi(y, .)$ is the conditional probability measure on $U$ under
the condition $\Gamma_0^{-1}(u) = \{y\}$. Hence (3) has the interpretation that a
random element with distribution $\nu$ can be generated as a draw from $\pi(y, .)$
where $y$ is a realization of a random element with distribution $P$.
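The two-stage reading of (3) in Remark 2 can be sketched on a finite toy example. Everything below is an illustrative assumption (the spaces `Y = {0, 1}` and `U = {"a", "b", "c"}`, the weights, and the helper names `is_valid_kernel` and `mix` do not come from the paper): a kernel supported on the values of the correspondence is mixed over the distribution of the observable to produce a latent distribution.

```python
from fractions import Fraction as F

# Hypothetical finite example: Y = {0, 1}, U = {"a", "b", "c"}.
P = {0: F(1, 4), 1: F(3, 4)}                       # distribution of y
Gamma0 = {0: {"a", "b", "c"}, 1: {"a", "b"}}       # correspondence y -> subset of U
kernel = {                                         # pi(y, .) for each y
    0: {"a": F(0), "b": F(0), "c": F(1)},
    1: {"a": F(2, 3), "b": F(1, 3), "c": F(0)},
}

def is_valid_kernel(kernel, Gamma0):
    """Each pi(y, .) must sum to one and put no mass outside Gamma0(y)."""
    for y, pi_y in kernel.items():
        if sum(pi_y.values()) != 1:
            return False
        if any(w > 0 and u not in Gamma0[y] for u, w in pi_y.items()):
            return False
    return True

def mix(P, kernel):
    """Finite analogue of (3): nu(u) = sum over y of pi(y, u) * P(y)."""
    nu = {}
    for y, p_y in P.items():
        for u, w in kernel[y].items():
            nu[u] = nu.get(u, F(0)) + p_y * w
    return nu

nu = mix(P, kernel)
```

Exact fractions are used so that the mixture weights can be compared without rounding error; a different kernel with the same supports would give another latent distribution compatible with the same toy model.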
Remark 3: We define $\mathrm{Core}(\Gamma_0, P)$ as the weak closed convex hull of
$\{P\gamma^{-1}; \ \gamma \in \mathrm{Sel}(\Gamma_0)\}$, or equivalently as the set of all mixtures of images of $P$ by
measurable selections of $\Gamma_0$. So our null hypothesis (2) is well defined.²
2.2 Definition of the test statistic
Now that we have identified the set of latent variable data generating processes
compatible with the observable distribution $P$ and the model correspondence
$\Gamma_0$ with $\mathrm{Core}(\Gamma_0, P)$, and we have characterized elements of the latter by
means of Theorem 1, we propose a test statistic based on this characterization.
Call $V_{00}$ the subset of $V_0$ that is compatible with the model correspondence
and the distribution of observables. Hence
\[ V_{00} = V_0 \cap \mathrm{Core}(\Gamma_0, P), \]
which is non-empty under the null $H_0$ by definition. By Theorem 1, an
element $\nu$ of $V_0$ is in $V_{00}$ if and only if $\nu$ can be written as
\[ \nu(.) = \int_Y \pi(y, .)\, P(dy), \]
² The name Core is justified by Theorem A2 of Appendix A.
where the $\pi(y, .)$ are probability measures supported on $\Gamma_0(y)$ for $P$-almost
all $y$. From now on, we shall use the de Finetti notation $\mu f$ for the integral
of the function $f$ with respect to the measure $\mu$, so that we shall write
\[ \int_Y \pi(y, .)\, P(dy) = P\pi. \]
Consider a sample of observations $(Y_1, \ldots, Y_n)$. The empirical distribution,
i.e. the probability measure that gives mass $1/n$ to each observation, is
denoted $P_n$, with
\[ P_n(A) = \frac{1}{n} \sum_{j=1}^{n} I_A(Y_j), \quad \text{all } A \in \mathcal{B}_Y. \]
The empirical counterpart of the integral $P\pi$ is
\[ \nu_n = P_n\pi = \frac{1}{n} \sum_{j=1}^{n} \pi(Y_j, .). \]
The asymptotic behaviour of the difference between $P\pi$ and its empirical
counterpart $P_n\pi$ is key to the construction of the test statistic. It is described
in the following theorem. Setting $u = (u_1, \ldots, u_{d_u})$, call $B_u$ the rectangles
$\prod_{i=1}^{d_u} (-\infty, u_i]$ and $\pi_{\nu,u} = \pi(., B_u)$. Let $G_n = \sqrt{n}(P_n - P)$ denote the empirical
process associated with the sample $(Y_1, \ldots, Y_n)$, and finally, let $\rightsquigarrow$ denote
convergence in distribution (also known as weak convergence). Then we have

Theorem 2: For any $\nu \in V_{00}$ with a density with respect to Lebesgue
measure, and for any $\pi$ satisfying (3), $G_n$ converges weakly, uniformly over
the family of functions $\{\pi_{\nu,u}, \ u \in \mathbb{R}^{d_u}\}$, to a $P$-Brownian bridge $G$, i.e. a
Gaussian process with zero mean and covariance function defined by
\[ E\, G\pi_{\nu,u}\, G\pi_{\nu,v} = P\pi_{\nu,u}\pi_{\nu,v} - P\pi_{\nu,u}\, P\pi_{\nu,v}. \]
This implies that
\[ \sqrt{n} \sup_{u \in \mathbb{R}^{d_u}} |P_n\pi_{\nu,u} - P\pi_{\nu,u}| \rightsquigarrow \|G\|_\infty := \sup_{u \in \mathbb{R}^{d_u}} |G\pi_{\nu,u}|, \]
where $\|G\|_\infty$, the supremum of $|G|$ over the family, is a random variable. In the
particular case where $d_u = 1$, $\|G\|_\infty$ is such that for all $x > 0$,
\[ \Pr(\|G\|_\infty > x) = 2 \sum_{j=1}^{\infty} (-1)^{j+1} e^{-2j^2x^2}. \]
Remark 1: A remarkable feature of Theorem 2 is that in the case of a single
real valued latent variable, the test statistic has a distribution-free limit with
easily computable quantiles.
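To make the "easily computable quantiles" concrete, here is a minimal sketch that evaluates the alternating series for the tail probability and inverts it by bisection. The function names, the truncation at 100 terms, and the bisection bracket are assumptions chosen for illustration.

```python
import math

def kolmogorov_tail(x, terms=100):
    """Tail probability 2 * sum_{j>=1} (-1)^(j+1) exp(-2 j^2 x^2), truncated."""
    if x <= 0:
        return 1.0
    s = sum((-1) ** (j + 1) * math.exp(-2.0 * j * j * x * x)
            for j in range(1, terms + 1))
    # Clip to [0, 1] to absorb truncation and rounding error for small x.
    return min(1.0, max(0.0, 2.0 * s))

def kolmogorov_quantile(level, lo=0.2, hi=5.0, tol=1e-10):
    """Critical value c with tail probability equal to `level`, by bisection."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if kolmogorov_tail(mid) > level:
            lo = mid   # tail still too large: move right
        else:
            hi = mid
    return hi

crit_05 = kolmogorov_quantile(0.05)   # close to the familiar 1.358
```

Since the limit does not depend on the latent distribution when the latent variable is scalar, the same critical value can be reused across hypothesized distributions.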
The test statistic implicitly proposed in Theorem 2 to test whether a given
latent variable distribution $\nu$ is compatible with the model restriction $\Gamma_0$ and
the observable distribution $P$ is infeasible in that it depends on the unknown
probability kernels $\pi$ such that $\nu = P\pi$. They can be estimated as solutions
of the integral equation $\nu = P_n\pi$ with the restriction that the $\pi(y, .)$ are
probability measures on $\Gamma_0(y)$. This equation has solutions (generically many)
if and only if $\nu \in \mathrm{Core}(\Gamma_0, P_n)$ by Theorem 1, but solutions are likely to be
difficult to exhibit except in very simple cases, such as the cases developed
in section 3.
An alternative is to construct a test statistic based on the distance between
a hypothesized latent variable measure $\nu$ (or more generally $V_0$) and
$\mathrm{Core}(\Gamma_0, P_n)$, which by construction will be smaller than the test statistic
of Theorem 2, and hence can be used as a basis for a conservative testing
procedure. This is summarized in the following corollary:
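Corollary 1: Under the null $H_0$,
\[ \limsup_{n} \inf_{\nu \in V_0} \inf_{\mu \in \mathrm{Core}(\Gamma_0, P_n)} \sup_{u \in \mathbb{R}^{d_u}} \sqrt{n}\, |\mu(B_u) - \nu(B_u)| \leq \sup_{u \in \mathbb{R}^{d_u}} |G\pi_{\nu,u}|, \]
and the infima are achieved provided $V_0$ is chosen to be closed in the weak
topology.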
Given the conservative nature of the procedure based on Corollary 1, it
is crucial to assess the power of the test, as described in the next section.
2.3 Power analysis
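The two test statistics considered in section 2.2 are the following:
\[ \mathrm{TS1} = \inf_{\nu \in V_0} \sup_{u \in \mathbb{R}^{d_u}} \sqrt{n}\, |P_n\pi(B_u) - \nu(B_u)| \]
\[ \mathrm{TS2} = \inf_{\nu \in V_0} \inf_{\mu \in \mathrm{Core}(\Gamma_0, P_n)} \sup_{u \in \mathbb{R}^{d_u}} \sqrt{n}\, |\mu(B_u) - \nu(B_u)|. \]
Note that since $P_n\pi \in \mathrm{Core}(\Gamma_0, P_n)$, TS2 is dominated by TS1 by construction.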
To assess the power of either test, we consider the following types of local
alternatives:
\[ d_H(V_n, \mathrm{Core}(\Gamma_n, P)) \geq \delta\, r_n^{-1}, \quad \delta > 0, \tag{4} \]
where $r_n$ is a deterministic sequence of reals diverging with $n$, and $d_H$ denotes
Hausdorff distance defined as follows: for any two sets $V_1$ and $V_2$ in $\Delta(U)$,
and any $d$ metrizing weak convergence,
\[ d_H(V_1, V_2) = \max\left\{ \sup_{\nu_1 \in V_1} \inf_{\nu_2 \in V_2} d(\nu_1, \nu_2), \ \sup_{\nu_2 \in V_2} \inf_{\nu_1 \in V_1} d(\nu_1, \nu_2) \right\}. \]
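The Hausdorff distance can be sketched for finite collections of distributions. The code below is an illustration under stated assumptions: each distribution is a dictionary of point masses on a finite set, and total variation is used as the metric, which metrizes weak convergence only because the toy space is finite. The names `total_variation` and `hausdorff` are hypothetical.

```python
def total_variation(mu1, mu2):
    """Total variation distance between two pmfs given as dicts."""
    keys = set(mu1) | set(mu2)
    return 0.5 * sum(abs(mu1.get(k, 0.0) - mu2.get(k, 0.0)) for k in keys)

def hausdorff(V1, V2, d):
    """max( sup_{a in V1} inf_{b in V2} d(a, b), sup_{b in V2} inf_{a in V1} d(a, b) )."""
    forward = max(min(d(a, b) for b in V2) for a in V1)
    backward = max(min(d(a, b) for a in V1) for b in V2)
    return max(forward, backward)

# Toy collections: V1 is a singleton contained in V2.
V1 = [{0: 0.5, 1: 0.5}]
V2 = [{0: 0.5, 1: 0.5}, {0: 1.0}]
h = hausdorff(V1, V2, total_variation)
```

In this example the forward term is zero because every element of `V1` also belongs to `V2`, while the backward term is positive, which shows why both directions enter the definition.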
The principle of both test statistics TS1 and TS2 rests on the set convergence
of $\mathrm{Core}(\Gamma, P_n)$ to $\mathrm{Core}(\Gamma, P)$ for a fixed model correspondence $\Gamma$.
Hence, for $n$ large enough, $\mathrm{Core}(\Gamma_n, P_n)$ is sufficiently close to $\mathrm{Core}(\Gamma_n, P)$ for
the test statistic to detect the sequence of local alternatives, as summarized
in the following theorem.
Theorem 3: Under the sequence of alternatives defined in (4) with $r_n = o(\sqrt{n})$,
if the family of functions $\{\pi(., B_u) : \ \pi \text{ satisfying (3) with } \nu \in \mathrm{Core}(\Gamma_n, P), \ u \in U, \ n \in \mathbb{N}\}$
is $P$-Donsker, then the test statistics TS1 and TS2 diverge.
Remark 1: As developed in Appendix A, Theorem 3 has an interesting
interpretation in terms of convergence of empirical random sets: for a random
element $Y_j$ in $Y$, $\Gamma(Y_j)$ is a random set in $U$ under Assumption 1, and its
distribution can be identified with $\mathrm{Core}(\Gamma, P)$. Theorem 3 tells us when the
empirical distribution $\mathrm{Core}(\Gamma, P_n)$ of the random set $\Gamma(Y)$ weakly converges
to the true distribution at rate $\sqrt{n}$. Such a result appears to be new in the
literature on convergence of random sets.
3 Illustration: a simple entry model
3.1 Single type
Consider a market with two firms producing complementary products with
identical costs.³ The payoff functions are
\[ \Pi_1(x_1, x_2, u) = (\theta x_2 - u)\, I\{x_1 = 1\}, \qquad \Pi_2(x_1, x_2, u) = (\theta x_1 - u)\, I\{x_2 = 1\}, \]
where $x_i \in \{0, 1\}$ is firm $i$'s action, and $u$ is an exogenous cost. The firms
know their cost; the analyst, however, knows only that $u \in [0, 1]$, and that
the structural parameter $\theta$ is in $(0, 1]$. There are two Nash equilibria. The
first is $x_1 = x_2 = 0$ for all $u \in [0, 1]$. The second is $x_1 = x_2 = 1$ for all
$u \in [0, \theta]$ and zero otherwise. Since the two firms' actions are perfectly
correlated, we shall denote them by a single binary variable $y = x_1 = x_2$.
Hence the model is described by the multi-valued mapping: $\Gamma(1) = [0, \theta]$ and
$\Gamma(0) = [0, 1]$. If we consider the restriction $\theta \leq \theta_{\max}$, then the multi-valued
mapping incorporating the restriction is $\Gamma_0$ defined by $\Gamma_0(1) = [0, \theta_{\max}]$ and
$\Gamma_0(0) = [0, 1]$. In this case, since $y$ is Bernoulli, we can write $P = (1 - p, p)$
with $p$ the probability of a 1. For the distribution of $u$, we consider a
parametric exponential family on $[0, 1]$. Hence $V = \{\nu_\alpha := \alpha u^{\alpha - 1} du\}_{\alpha > 0}$, and
the restriction can be chosen as $\overline{\alpha} \geq \alpha \geq \underline{\alpha} > 0$.

³ Jovanovic (1989) and Tamer (2003) consider this simple game in a similar context.
Consider the smallest reliability $\underline{P}$ that can be attached to a set $A$ in $U$
based on $\Gamma_0$ and $P$, defined by
\[ \underline{P}(A) = P\{y \in Y \mid \Gamma_0(y) \subseteq A\}. \]
Our null hypothesis of compatibility of the two sets of restrictions is that for
some $\alpha \in [\underline{\alpha}, \overline{\alpha}]$, $\nu_\alpha$ set-wise dominates $\underline{P}$, in other words, that $\nu_\alpha$ associates
to each set a measure at least as large as the smallest reliability that can be
attached to it. This is equivalent to the existence of an $\alpha \in [\underline{\alpha}, \overline{\alpha}]$ and, for
$P$-almost all $y$, a probability measure $\pi(y, .)$ supported on $\Gamma_0(y)$ such that
for all $u \in [0, 1]$,
\[ u^\alpha = \int_Y \pi(y, [0, u])\, P(dy). \]
In other words,
\[ u^\alpha = (1 - p)\, \pi(0, [0, u]) + p\, \pi(1, [0, u]) \]
and
\[ \pi(1, [0, \theta_{\max}]) = \pi(0, [0, 1]) = 1 \]
with, for $P$-almost all $y$, $\pi(y, [0, u])$ a nondecreasing, right-continuous function
of $u$ taking values in $[0, 1]$. When $\theta_{\max}^\alpha < p$, there is no solution (i.e.
which for any $\alpha$ satisfying $\theta_{\max}^\alpha \geq p$ (there exists at least one under the null)
converges weakly to the supremum $\|G\|_\infty$ of a standard Brownian bridge, with
\[ \Pr(\|G\|_\infty > x) = 2 \sum_{j=1}^{\infty} (-1)^{j+1} e^{-2j^2x^2}. \]
A testing procedure that does not require computation of the kernels $\pi$,
as described in section 2.2, consists in finding the element of $\mathrm{Core}(\Gamma_0, P_n)$
that minimizes the Kolmogorov-Smirnov distance to the set of distributions
$\{u^\alpha, \ \alpha \in [\underline{\alpha}, \overline{\alpha}]\}$. If $\theta_{\max}^\alpha \geq p_n$, then $u^\alpha$ is a minimizer and the minimum
distance is zero. If $\theta_{\max}^\alpha < p_n$, then the minimum Kolmogorov-Smirnov
distance is $p_n - \theta_{\max}^\alpha$. If $\theta_{\max}^\alpha > p$, ultimately so will $p_n$ and the test statistic is
zero. If $\theta_{\max}^\alpha < p$, then ultimately so will $p_n$, and the test statistic diverges.
Finally, if $\theta_{\max}^\alpha = p$, then the statistic will be $\sqrt{n} \max(0, p_n - p)$. In addition, in
this very simple example, one might consider a test of the null $H_0 : p = \theta_{\max}^\alpha$
against the one-sided alternative $H_a : p > \theta_{\max}^\alpha$ using the fact that
under the null, $\sqrt{n}\, [\theta_{\max}^\alpha (1 - \theta_{\max}^\alpha)]^{-1/2} (p_n - \theta_{\max}^\alpha)$ converges to a standard
normal random variable.
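The minimum-distance statistic and the boundary test can be sketched in a few lines. The sketch assumes the reading adopted here, in which the hypothesized latent distribution has distribution function `u**alpha`, entry occurs only when the cost is at most `theta_max`, and the minimum Kolmogorov-Smirnov distance to the empirical core is `max(0, p_n - theta_max**alpha)`. The function names and the sample are hypothetical.

```python
import math

def min_ks_distance(p_n, theta_max, alpha):
    """Smallest sup-distance between u**alpha and a cdf putting mass >= p_n on [0, theta_max]."""
    return max(0.0, p_n - theta_max ** alpha)

def ts2(sample, theta_max, alphas):
    """sqrt(n) times the minimum distance, minimized over a grid of alpha values."""
    n = len(sample)
    p_n = sum(sample) / n
    return math.sqrt(n) * min(min_ks_distance(p_n, theta_max, a) for a in alphas)

def boundary_z(sample, theta_max, alpha):
    """Standardized statistic for H0: p = theta_max**alpha against p > theta_max**alpha."""
    n = len(sample)
    p_n = sum(sample) / n
    q = theta_max ** alpha
    return math.sqrt(n) * (p_n - q) / math.sqrt(q * (1.0 - q))

# Hypothetical data: 60 entries out of 100 markets.
sample = [1] * 60 + [0] * 40
stat = ts2(sample, theta_max=0.5, alphas=[0.5, 1.0, 2.0])
z = boundary_z(sample, theta_max=0.5, alpha=1.0)
```

With these numbers the grid contains a value of `alpha` for which `theta_max**alpha` exceeds the empirical entry frequency, so the minimum-distance statistic is zero, whereas the boundary statistic for `alpha = 1` is positive.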
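To summarize, the procedures proposed are based on the following test
statistics:
\[ \mathrm{TS1} = \inf_{\alpha \in [\underline{\alpha}, \overline{\alpha}]} \sup_{u \in [0,1]} |G_n\pi_{\alpha,u}| \]
\[ \mathrm{TS2} = \inf_{\alpha \in [\underline{\alpha}, \overline{\alpha}]} \inf_{F \in \mathrm{Core}(\Gamma_0, P_n)} \sup_{u \in [0,1]} \sqrt{n}\, |u^\alpha - F(u)| \]
\[ \mathrm{TS3} = \sqrt{n}\, (p_n - \theta_{\max}^\alpha) \]
and the following approximating distributions:
\[ \mathrm{AD1} = \sqrt{\frac{p}{1 - p}}\, (1 - \theta_{\max}^\alpha)\, |N(0, 1)| \]
\[ \mathrm{AD2} = \|G\|_\infty \]
\[ \mathrm{AD3} = \sqrt{\theta_{\max}^\alpha (1 - \theta_{\max}^\alpha)}\; N(0, 1) \]
\[ \mathrm{AD4} = \sqrt{p(1 - p)}\; N(0, 1) \]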
The procedure based on estimation of the probability kernels $\pi$ and comparison
between hypothesized distributions $\nu_\alpha = P\pi$ and their empirical
counterparts $P_n\pi$ would result in comparing TS1 to the quantiles of AD1
(for the exact asymptotic version) or AD2 (for the conservative asymptotic
version). The procedure based on the minimum distance between the hypothesized
distributions and the empirical random set distribution $\mathrm{Core}(\Gamma_0, P_n)$
would result in comparing TS2 with the quantiles of AD4 (for an exact asymptotic
version) or AD2 (for the conservative asymptotic version). Finally,
the simple test on the boundary would result in comparing TS3 to the quantiles
of AD3.
3.2 Heterogeneous types
Consider a market with two firms producing complementary products with
heterogeneous costs. The payoff functions are
\[ \Pi_1(x_1, x_2, u) = (\theta x_2 - u_1)\, I\{x_1 = 1\}, \qquad \Pi_2(x_1, x_2, u) = (\theta x_1 - u_2)\, I\{x_2 = 1\}, \]
where $x_i \in \{0, 1\}$ is firm $i$'s action, and the $u_i$'s are firm specific exogenous
costs. The firms know their cost; the analyst, however, knows only that
$u \in [0, 1]^2$, and that the structural parameter $\theta$ is in $(0, 1]$. There are two
Nash equilibria. The first is $x_1 = x_2 = 0$ for all $u \in [0, 1]^2$. The second
is $x_1 = x_2 = 1$ for all $u \in [0, \theta]^2$ and zero otherwise. Since the two firms'
actions are perfectly correlated, we shall denote them by a single binary
variable $y = x_1 = x_2$. Hence the model is described by the multi-valued
mapping: $\Gamma(1) = [0, \theta]^2$ and $\Gamma(0) = [0, 1]^2$. If we consider the restriction
$\theta \leq \theta_{\max}$, then the multi-valued mapping incorporating the restriction is $\Gamma_0$
defined by $\Gamma_0(1) = [0, \theta_{\max}]^2$ and $\Gamma_0(0) = [0, 1]^2$. In this case, since $y$ is
Bernoulli, we can write $P = (1 - p, p)$ with $p$ the probability of a 1. For
the distribution of $u$, we consider the costs to be independent with marginals
following the same parametric exponential family on $[0, 1]$. Hence
$V = \{\nu_\alpha := \alpha^2 u_1^{\alpha - 1} u_2^{\alpha - 1} du_1 du_2\}_{\alpha > 0}$, and the restriction can be chosen as $\overline{\alpha} \geq \alpha \geq \underline{\alpha} > 0$.
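A density version of (3) can be derived in this case, and makes for a more
convenient test statistic. Writing $u = (u_1, u_2)$,
\[ f_\alpha(u) = f_\alpha(u_1) f_\alpha(u_2) = \alpha^2 u_1^{\alpha - 1} u_2^{\alpha - 1} = \int_Y \pi(y, u)\, P(dy). \]
In other words,
\[ \alpha^2 u_1^{\alpha - 1} u_2^{\alpha - 1} = (1 - p)\, \pi(0, u) + p\, \pi(1, u) \]
under the constraints
\[ \int_{[0,1]^2} \pi(0, u)\, du = \int_{[0,\theta_{\max}]^2} \pi(1, u)\, du = 1 \]
and $\pi(1, u) = 0$ for all $u \notin [0, \theta_{\max}]^2$.
When $\theta_{\max}^{2\alpha} < p$, there is no solution (i.e. the two sets of restrictions are
incompatible), whereas when $\theta_{\max}^{2\alpha} \geq p$, a solution is given by
\[ \pi(0, u) = \frac{f_\alpha(u)}{1 - p} \left( 1 - p\, \theta_{\max}^{-2\alpha}\, I_{[0,\theta_{\max}]^2}(u) \right) \]
\[ \pi(1, u) = \theta_{\max}^{-2\alpha}\, f_\alpha(u)\, I_{[0,\theta_{\max}]^2}(u). \]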
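As a check on the displayed solution, the following short derivation verifies the mixture equation and the two normalization constraints. The symbols $\theta_{\max}$, $\alpha$ and $f_\alpha$ follow the notation assumed in this section for the structural bound, the family index and the product density; they are a reading of the text rather than confirmed notation.

```latex
% Mixture equation: the indicator terms cancel.
\begin{align*}
(1-p)\,\pi(0,u) + p\,\pi(1,u)
  &= f_\alpha(u)\left(1 - p\,\theta_{\max}^{-2\alpha} I_{[0,\theta_{\max}]^2}(u)\right)
     + p\,\theta_{\max}^{-2\alpha} f_\alpha(u)\, I_{[0,\theta_{\max}]^2}(u)
   = f_\alpha(u).
\end{align*}
% Normalization: use \int_{[0,\theta_{\max}]^2} f_\alpha(u)\,du = \theta_{\max}^{2\alpha}.
\begin{align*}
\int_{[0,\theta_{\max}]^2} \pi(1,u)\,du
  &= \theta_{\max}^{-2\alpha}\,\theta_{\max}^{2\alpha} = 1, \\
\int_{[0,1]^2} \pi(0,u)\,du
  &= \frac{1 - p\,\theta_{\max}^{-2\alpha}\,\theta_{\max}^{2\alpha}}{1-p} = 1.
\end{align*}
% Non-negativity of \pi(0,u) on [0,\theta_{\max}]^2 requires
% 1 - p\,\theta_{\max}^{-2\alpha} \ge 0, i.e. \theta_{\max}^{2\alpha} \ge p.
```

The last line recovers the compatibility condition stated in the text: the kernel $\pi(0, .)$ is a genuine density exactly when $\theta_{\max}^{2\alpha} \geq p$.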
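Consider now the empirical process $G_n = \sqrt{n}(P_n - P)$ applied to the
family of functions $\pi_{\alpha,u}(y) := \pi(y, u)$. Elementary calculations yield
\[ G_n\pi_{\alpha,u} = \sqrt{n}\, \frac{p_n - p}{1 - p}\, g_{\alpha,u}, \]
where
\[ g_{\alpha,u} := f_\alpha(u) \left( \theta_{\max}^{-2\alpha}\, I_{[0,\theta_{\max}]^2}(u) - 1 \right). \]
In this case, it is convenient to use the $L_1$ metric: we are looking at the
minimum of
\[ \int_{[0,1]^2} |G_n\pi_{\alpha,u}|\, du = \sqrt{n}\, \frac{|p_n - p|}{1 - p} \int_{[0,1]^2} f_\alpha(u) \left| \theta_{\max}^{-2\alpha}\, I_{[0,\theta_{\max}]^2}(u) - 1 \right| du. \]
Now
\[ \int_{[0,1]^2} f_\alpha(u) \left| \theta_{\max}^{-2\alpha}\, I_{[0,\theta_{\max}]^2}(u) - 1 \right| du = 2\,(1 - \theta_{\max}^{2\alpha}), \]
which is minimized at $\alpha = \underline{\alpha}$ to yield $2\,(1 - \theta_{\max}^{2\underline{\alpha}})$. So
\[ \inf_{\alpha} \int_{[0,1]^2} |G_n\pi_{\alpha,u}|\, du \rightsquigarrow 2 \sqrt{\frac{p}{1 - p}}\, (1 - \theta_{\max}^{2\underline{\alpha}})\, |Z|, \]
where $Z$ is a standard normal random variable.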
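The closed form of the $L_1$ integral can be checked numerically. The sketch below assumes the product density `alpha**2 * u1**(alpha-1) * u2**(alpha-1)` and an integrand equal to the density times `|theta_max**(-2*alpha) * 1{u in [0, theta_max]^2} - 1|`; the function names, the midpoint rule and the grid size are illustrative choices.

```python
def density(u1, u2, alpha):
    """Product density alpha^2 * u1^(alpha-1) * u2^(alpha-1) on the unit square."""
    return alpha ** 2 * u1 ** (alpha - 1) * u2 ** (alpha - 1)

def l1_integral(alpha, theta_max, m=200):
    """Midpoint-rule approximation of the integral of |g| over [0, 1]^2."""
    h = 1.0 / m
    scale = theta_max ** (-2 * alpha)
    total = 0.0
    for i in range(m):
        u1 = (i + 0.5) * h
        for j in range(m):
            u2 = (j + 0.5) * h
            inside = u1 <= theta_max and u2 <= theta_max
            g = density(u1, u2, alpha) * ((scale if inside else 0.0) - 1.0)
            total += abs(g) * h * h
    return total

def closed_form(alpha, theta_max):
    """The value 2 * (1 - theta_max^(2 alpha)) claimed for the integral."""
    return 2.0 * (1.0 - theta_max ** (2 * alpha))

approx = l1_integral(alpha=2.0, theta_max=0.5)
exact = closed_form(alpha=2.0, theta_max=0.5)   # 1.875
```

The choice `alpha = 2` and `theta_max = 0.5` keeps the density bounded and aligns the square with the grid, so the midpoint rule is essentially exact here; values of `alpha` below one would need a finer grid near the origin.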
Appendix A: Empirical Distributions of Random Sets
In Assumption 1, we assume that the correspondence $\Gamma$ is measurable in the
traditional sense, defined below:

Definition A1 (Effros Measurability): A correspondence $\Gamma : (Y, \mathcal{B}_Y) \rightrightarrows (U, \mathcal{B}_U)$
is said to be Effros measurable, or weakly measurable, or simply
measurable, if the inverse image of open sets is measurable, i.e. if for all
open subsets $O$ of $U$,
\[ \Gamma^{-1}(O) = \{y \in Y \mid \Gamma(y) \cap O \neq \varnothing\} \in \mathcal{B}_Y. \]
There are several ways a measurable correspondence can convey probabilistic
information on its image space $(U, \mathcal{B}_U)$ given observed frequencies of
outcomes in $Y$.
Dempster (1967) suggests considering the smallest reliability that can be
associated with the event $A \in \mathcal{B}_U$ as the belief function
\[ \underline{P}(A) = P\{y \in Y \mid \Gamma(y) \subseteq A\} \]
and the largest plausibility that can be associated with the event $A$ as the
plausibility function
\[ \overline{P}(A) = P\{y \in Y \mid \Gamma(y) \cap A \neq \varnothing\}, \]
the two being linked by the relation
\[ \overline{P}(A) = 1 - \underline{P}(A^c), \tag{5} \]
which prompted some authors to call them conjugates or duals of each other.
A natural way to construct a set of probability measures is to consider
all probability measures that dominate the set function $\underline{P}$ set-wise, thus
forming the core of the belief function:
\[ \mathrm{Core}(\Gamma, P) = \{\mu \in M(U) \mid \forall A \in \mathcal{B}_U, \ \mu(A) \geq \underline{P}(A)\} = \{\mu \in M(U) \mid \forall A \in \mathcal{B}_U, \ \mu(A) \leq \overline{P}(A)\}, \]
where the first equality can be taken as a definition, and the second follows
immediately from (5). It is well known that $\mathrm{Core}(\Gamma, P)$ is non-empty, and it
will be shown as a consequence of (3.2) below.
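On a finite latent space, the belief function, the plausibility function and core membership can be computed by enumeration. The sketch below is illustrative only: the spaces, the correspondence and the function names are assumptions, and enumerating all subsets is feasible only for very small spaces.

```python
from fractions import Fraction as F
from itertools import combinations

def subsets(U):
    """All subsets of a finite set U, as frozensets."""
    items = sorted(U)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def belief(A, P, Gamma):
    """P{y : Gamma(y) is contained in A}."""
    return sum(p for y, p in P.items() if Gamma[y] <= A)

def plausibility(A, P, Gamma):
    """P{y : Gamma(y) intersects A}."""
    return sum(p for y, p in P.items() if Gamma[y] & A)

def in_core(nu, P, Gamma, U):
    """nu is in the core when nu(A) >= belief(A) for every subset A of U."""
    return all(sum(nu[u] for u in A) >= belief(A, P, Gamma) for A in subsets(U))

# Hypothetical example with two observable outcomes and three latent values.
U = frozenset({"a", "b", "c"})
P = {0: F(1, 4), 1: F(3, 4)}
Gamma = {0: frozenset({"a", "b", "c"}), 1: frozenset({"a", "b"})}

nu_in = {"a": F(1, 2), "b": F(1, 4), "c": F(1, 4)}
nu_out = {"a": F(1, 4), "b": F(1, 4), "c": F(1, 2)}
conjugate = all(plausibility(A, P, Gamma) == 1 - belief(U - A, P, Gamma)
                for A in subsets(U))
```

The second candidate fails because it puts mass one half on the set `{"a", "b"}`, which the observable outcome `1` alone already makes reliable with probability three quarters.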
A different way of defining probabilistic information generated by the
correspondence $\Gamma$ can be derived from Aumann's idea (in Aumann (1965))
of considering correspondences as bundles of their selections.
Define the domain of the correspondence $\Gamma$ by
\[ \mathrm{Dom}(\Gamma) = \{y \in Y \mid \Gamma(y) \neq \varnothing\}. \]
A measurable selection of the measurable correspondence $\Gamma$ is defined by the
property below:

Definition A2 (Measurable Selection): A measurable selection of a correspondence
$\Gamma : (Y, \mathcal{B}_Y) \rightrightarrows (U, \mathcal{B}_U)$ is a $(\mathcal{B}_Y, \mathcal{B}_U)$-measurable function $\gamma$ such
that $\gamma(y) \in \Gamma(y)$ for all $y \in \mathrm{Dom}(\Gamma)$.

The set of measurable selections of a measurable correspondence $\Gamma$ is
denoted $\mathrm{Sel}(\Gamma)$, and it is non-empty by a theorem due to Rokhlin (Rokhlin
(1949) Part I, §2, No 9, Lemma 2) and generally attributed to Kuratowski
and Ryll-Nardzewski:

Theorem A1 (Rokhlin): An Effros measurable correspondence with
closed non-empty values admits a measurable selection. For a proof, see
for instance Theorem 8.1.3 page 308 of Aubin and Frankowska (1990).

Elements of $\mathrm{Sel}(\Gamma)$ can be used to transport the probability $P$ on $Y$ to
probabilities on $U$. For each $\gamma \in \mathrm{Sel}(\Gamma)$, consider the probability $\nu_\gamma$ defined
on each $A \in \mathcal{B}_U$ by
\[ \nu_\gamma(A) = P\{y \in Y \mid \gamma(y) \in A\} = P\gamma^{-1}(A), \]
and define
\[ \Sigma(\Gamma, P) = \{\mu \in M(U), \ \mu = P\gamma^{-1} \text{ for some } \gamma \in \mathrm{Sel}(\Gamma)\}. \]
It is easily seen that $\Sigma(\Gamma, P) \subseteq \mathrm{Core}(\Gamma, P)$. A converse is given by the
following theorem of Castaldo, Maccheroni, and Marinacci (2004):

Theorem A2 (Castaldo, Maccheroni and Marinacci): If $\Gamma$ is measurable
and compact-valued, then $\mathrm{Core}(\Gamma, P)$ is the weak closed convex hull of
$\Sigma(\Gamma, P)$.

We now develop the claim made in Remark 1 of Theorem 3. $\Gamma$ and $P$
define a random set with realizations $\Gamma(Y_j)$ for realizations $Y_j$ from $P$. $\overline{P}$ is
the distribution of the random set, and the empirical distribution associated
with a sample $(Y_1, \ldots, Y_n)$ is given by
\[ \overline{P}_n(A) = \frac{1}{n} \sum_{j=1}^{n} I\{\Gamma(Y_j) \cap A \neq \varnothing\}. \]
$\mathrm{Core}(\Gamma, P)$ characterizes $\overline{P}$ and $\mathrm{Core}(\Gamma, P_n)$ characterizes $\overline{P}_n$. Hence Theorem 1
and Donsker theorems provide a way to derive conditions for weak
convergence of empirical random sets at rate $\sqrt{n}$.
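The empirical distribution of the random set can be illustrated with the interval-valued correspondence of the entry model of section 3.1. The sketch assumes that the outcome `1` maps to the interval from zero to a threshold `theta` and the outcome `0` maps to the unit interval; the sample, the threshold and the function names are hypothetical, and events are restricted to closed intervals for simplicity.

```python
def gamma(y, theta):
    """Interval-valued correspondence: [0, theta] when y == 1, [0, 1] when y == 0."""
    return (0.0, theta) if y == 1 else (0.0, 1.0)

def intersects(I, J):
    """True when the closed intervals I and J have a common point."""
    return max(I[0], J[0]) <= min(I[1], J[1])

def contained(I, J):
    """True when the closed interval I lies inside the closed interval J."""
    return J[0] <= I[0] and I[1] <= J[1]

def empirical_plausibility(sample, A, theta):
    """Fraction of observations whose random set hits the interval A."""
    return sum(intersects(gamma(y, theta), A) for y in sample) / len(sample)

def empirical_belief(sample, A, theta):
    """Fraction of observations whose random set is contained in the interval A."""
    return sum(contained(gamma(y, theta), A) for y in sample) / len(sample)

sample = [1, 1, 0, 1]
upper = empirical_plausibility(sample, (0.6, 0.9), theta=0.5)
lower = empirical_belief(sample, (0.0, 0.5), theta=0.5)
```

Here only the single observation with outcome `0` can be rationalized by a cost in the interval from 0.6 to 0.9, while the three observations with outcome `1` force the cost into the interval from 0 to 0.5.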
Appendix B: Proofs
Proof of Theorem 1:
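Call $\Delta(B)$ the set of all Borel probability measures with support $B$. Under
Assumption 1, the map $y \mapsto \Delta(\Gamma_0(y))$ is a map from $Y$ to the set of all
non-empty convex sets of Borel probability measures on $U$ which are closed
with respect to the weak topology. Moreover, for any $f \in C_b(U)$, the set of
all continuous bounded real functions on $U$, the map
\[ y \mapsto \sup\left\{ \int f\, d\mu : \ \mu \in \Delta(\Gamma_0(y)) \right\} = \max_{u \in \Gamma_0(y)} f(u) \]
is $\mathcal{B}_Y$-measurable, so that, by Theorem 3 of Strassen (1965), for a given
$\nu \in \Delta(U)$, there exists $\pi$ satisfying (3) with $\pi(y, .) \in \Delta(\Gamma_0(y))$ for $P$-almost
all $y$ if and only if
\[ \int_U f(u)\, \nu(du) \leq \int_Y \sup_{u \in \Gamma_0(y)} f(u)\, P(dy) \tag{6} \]
for all $f \in C_b(U)$. Now, defining $\overline{P}$ as the set function
\[ \overline{P} : B \mapsto P(\{y \in Y : \ \Gamma_0(y) \cap B \neq \varnothing\}), \]
the right-hand side of (6) is shown in the following sequence of equalities to
be equal to the integral of $f$ with respect to $\overline{P}$ in the sense of Choquet (line
(7) below can be taken as a definition).
\begin{align*}
\int_Y \sup_{u \in \Gamma_0(y)} f(u)\, dP(y)
&= \int_0^\infty P\{y \in Y : \sup_{u \in \Gamma_0(y)} f(u) \geq x\}\, dx
 + \int_{-\infty}^0 \left( P\{y \in Y : \sup_{u \in \Gamma_0(y)} f(u) \geq x\} - 1 \right) dx \\
&= \int_0^\infty P\{y \in Y : \Gamma_0(y) \cap \{f \geq x\} \neq \varnothing\}\, dx
 + \int_{-\infty}^0 \left( P\{y \in Y : \Gamma_0(y) \cap \{f \geq x\} \neq \varnothing\} - 1 \right) dx \\
&= \int_0^\infty \overline{P}(\{f \geq x\})\, dx + \int_{-\infty}^0 \left( \overline{P}(\{f \geq x\}) - 1 \right) dx
 = \mathrm{Ch}\!\int f\, d\overline{P}. \tag{7}
\end{align*}
By Theorem 1 of Castaldo, Maccheroni, and Marinacci (2004), for any
$f \in C_b(U)$,
\[ \mathrm{Ch}\!\int f\, d\overline{P} = \max_{\gamma \in \mathrm{Sel}(\Gamma_0)} \int_U f(u)\, P\gamma^{-1}(du), \]
so that (6) is equivalent to
\[ \max_{\gamma \in \mathrm{Sel}(\Gamma_0)} \int_U f(u)\, P\gamma^{-1}(du) \geq \int_U f(u)\, \nu(du) \tag{8} \]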
Proof of Theorem 3
The assumption guarantees that $G_n = \sqrt{n}(P_n - P)$ converges to a Brownian
bridge uniformly over the family defined in the statement of the theorem.
Hence,
\[ \sup_{\nu \in \mathrm{Core}(\Gamma_n, P)} \sup_{u \in \mathbb{R}^{d_u}} |G_n\pi(B_u)| \]
converges to a bounded random variable. Since the Kolmogorov-Smirnov
metric is stronger than $d$ which metrizes weak convergence, the latter display
implies that
supCore(n,P)
supCore(n,P)
d(, ) = O(1/
n).
Now,
TS2 dH(Vn, Core(n, Pn)) supCore(n,P)
supCore(n,P)
d(, )
and TS2 $\leq$ TS1, so the result follows.
References
Andrews, D., S. Berry, and P. Jia (2004): "Confidence Regions for Parameters
in Discrete Games with Multiple Equilibria, with an Application
to Discount Chain Store Location," unpublished manuscript.
Aubin, J.-P., and H. Frankowska (1990): Set-valued analysis. Boston:
Birkhäuser.
Aumann, R. (1965): "Integrals of set-valued functions," Journal of Mathematical
Analysis and Applications, 12, 1-12.
Castaldo, A., F. Maccheroni, and M. Marinacci (2004): "Random
sets and their distributions," Sankhya (Series A), 66, 409-427.
Chernozhukov, V., H. Hong, and E. Tamer (2002): "Inference on
Parameter Sets in Econometric Models," unpublished manuscript.
Dempster, A. P. (1967): "Upper and lower probabilities induced by a
multi-valued mapping," Annals of Mathematical Statistics, 38, 325-339.
Dudley, R. (2003): Real Analysis and Probability. Cambridge University
Press.
Jovanovic, B. (1989): "Observable implications of models with multiple
equilibria," Econometrica, 57, 1431-1437.
Pakes, A., J. Porter, K. Ho, and J. Ishii (2004): "Moment Inequalities
and Their Application," unpublished manuscript.
Rockafellar, R. T., and R. J.-B. Wets (1998): Variational Analysis.
Springer.
Rokhlin, V. (1949): "Selected topics from the metric theory of dynamical
systems," Uspekhi Matematicheskikh Nauk, 4, 57-128, translated in
American Mathematical Society Translations 49 (1966), 171-240.
Shaikh, A. (2005): "Inference for a Class of Partially Identified Econometric
Models," unpublished manuscript.
Strassen, V. (1965): "The existence of probability measures with given
marginals," Annals of Mathematical Statistics, 36, 423-439.
Tamer, E. (2003): "Incomplete Simultaneous Discrete Response Model with
Multiple Equilibria," Review of Economic Studies, 70, 147-165.
van der Vaart, A., and J. Wellner (2000): Weak Convergence and
Empirical Processes. Springer.