

Testing Non-identifying Restrictions¹

    Marc Henry

    Columbia University

    First draft: September 15, 2005

    This draft: January 25, 2006

    Abstract

    We propose a test of specification for structural models without

    identifying assumptions. The model is defined as a binary relation

    between latent and observable variables, coupled with a hypothesized

    family of distributions for the latent variables. The objective of the

testing procedure is to determine whether this hypothesized family of latent variable distributions has a non-empty intersection with the

    set of distributions compatible with the observable data generating

    process and the binary relation defining the model. When the model is

given in parametric form, the test can be inverted to yield confidence

    intervals for the identified parameter set.

    JEL Classification: C12, C14

    Keywords: random sets, empirical process.

¹ Preliminary and incomplete. Helpful discussions with Alfred Galichon, Rosa Matzkin, Alexei Onatski, Jim Powell and Peter Robinson are gratefully acknowledged (with the usual disclaimer). Correspondence address: Department of Economics, Columbia University, 420 W 118th Street, New York, NY 10027, USA. [email protected].


    1 Introduction

We consider a very general econometric model specification. Variables under consideration are divided into two groups.

Latent variables, u ∈ U = R^{d_u}. The vector u is not observed by the analyst, but some of its components may be observed by the economic actors. Theorem 1 below holds more generally when U is a complete, metrizable and separable topological space (i.e. a Polish space).

Observable variables, y ∈ Y = R^{d_y}. The vector y is observed by the analyst. Theorem 1 holds more generally when Y is a convex metrizable subset of a locally convex topological vector space.

The Borel sigma-algebras of Y and U will be respectively denoted B_Y and B_U. Call P the Borel probability measure that represents the true data generating process for the observable variables, and V a family of Borel probability measures that are hypothesized to be possible data generating processes for the latent variables. Finally, the economic model is given by a relation between observable and latent variables, i.e. a subset of Y × U, which we shall write as a multi-valued mapping from Y to U denoted by Γ. Suppose a set of restrictions on the hypothesized latent variable distributions is given by V_0 ⊆ V, and a set of restrictions on the model is given by Γ_0.

Example 1: parametric models and restrictions. Suppose the economic model is known up to a finite dimensional parameter vector θ, and the chosen family of distributions for the latent variables depends on a finite dimensional parameter vector α. The hypothesized restrictions are the following:

θ ∈ Θ_0 ⊆ R^{d_θ};  α ∈ A_0 ⊆ R^{d_α}.


The restricted family of distributions for the latent variables then becomes

V_0 = {μ_α, α ∈ A_0},

and all the models {Γ_θ, θ ∈ Θ_0} are considered for the relation linking observable to latent variables. Hence our restricted model is defined by

Γ_0 = ∪_{θ ∈ Θ_0} Γ_θ.

Example 2: games with multiple equilibria. Suppose the payoff function for player j, j = 1, . . . , J, is given by

π_j(S_j, S_{−j}, X_j, U_j; θ),

where S_j is player j's strategy and S_{−j} is the opponents' strategies, X_j is a vector of observable characteristics of player j and U_j a vector of unobservable determinants of the payoff. Finally, θ is a vector of parameters. The pure strategy Nash equilibrium conditions

π_j(S_j, S_{−j}, X_j, U_j; θ) ≥ π_j(S, S_{−j}, X_j, U_j; θ),  for all S,

define a correspondence Γ from unobservable player characteristics to observable variables (S, X), and if the unobservable player characteristics, interpreted as the types of the players, are supposed uniformly distributed on the relevant domain, then V_0 is a singleton.

In this paper, we propose a general framework for conducting inference without additional assumptions, such as equilibrium selection mechanisms, necessary to identify the model (i.e. to ensure that Γ is single-valued). The usual terminology for such models is incomplete or partially identified.


In a parametric setting, the objective of inference in partially identified models is the estimation of the set of parameters which are compatible with the distribution of the observed data, and an assessment of the quality of that estimation. Chernozhukov, Hong, and Tamer (2002) propose an M-estimation procedure to construct a set that contains all compatible parameters with a predetermined probability. Shaikh (2005) extends and refines their method, and Andrews, Berry, and Jia (2004) and Pakes, Porter, Ho, and Ishii (2004) propose alternative procedures in a similar framework.

The inference procedure presented here is based on a characterization of probability measures in the Core of the random set generated by the distribution of observables P and the multivalued mapping Γ_0, and on a method to determine whether hypothesized latent variable data generating processes satisfy this characterization. No a priori parametric assumptions are needed, but if they are made, the inference methodology yields confidence sets similar to those proposed in the previously cited papers. In the notation of Example 1, for a given θ one can derive a set of α's compatible with θ and P at a given significance level, and conversely, one can derive a set of θ's compatible with P and μ_α for a given α.

The next section proposes the characterization of probability measures in the Core of a random set, the following section describes the testing principles, and the last section illustrates the approach on a simple entry model with multiple equilibria. Proofs and additional results are collected in the appendix.


    2 Testing general model specifications

    2.1 Definition of the null hypothesis

We wish to develop a procedure to detect whether the two sets of restrictions, on the family of distributions for the latent variables on the one hand, and on the relation between observable and latent variables on the other hand, are compatible. First we explain what we mean by compatible. It is very easily understood in the simple case where the link between latent and observable variables is parametric and Γ_θ is measurable and single-valued for each θ ∈ Θ_0. Defining the image measure of P by Γ_θ by

P Γ_θ^{−1}(A) = P{y ∈ Y | Γ_θ(y) ∈ A},     (1)

for all A ∈ B_U, we say that the restrictions V_0 and Γ_0 are compatible if and only if there is at least one θ ∈ Θ_0 and one μ ∈ V_0 such that μ = P Γ_θ^{−1}. In the general case considered here, Γ_0 may not be single-valued, and its images may not even be disjoint (which would be the case if it were the inverse image of a single-valued mapping from U to Y, i.e. a traditional function from latent to observable variables). However, under a measurability assumption on Γ_0, we can construct an analogue of the image measure, which will now be a set Core(Γ_0, P) of Borel probability measures on U (to be defined below), and the hypothesis of compatibility of the restrictions on latent variable distributions and on the models linking latent and observable variables will naturally take the form

H_0 : V_0 ∩ Core(Γ_0, P) ≠ ∅.     (2)

Assumption 1 Γ_0 has non-empty and closed values, and for each open set O ⊆ U, Γ_0^{−1}(O) = {y ∈ Y | Γ_0(y) ∩ O ≠ ∅} ∈ B_Y.


To relate the present case to the intuition of the single-valued case, it is useful to think in terms of single-valued selections of the multi-valued mapping Γ_0. A measurable selection of Γ_0 is a measurable function γ such that γ(y) ∈ Γ_0(y) for all y ∈ Y. The set of measurable selections of a multi-valued mapping that satisfies Assumption 1 is denoted Sel(Γ_0), and it is known to be non-empty since Rokhlin (1949), Part I, §2, No. 9, Lemma 2.¹ To each selection γ of Γ_0, we can associate the image measure of P, denoted P γ^{−1}, defined as in (1).

A natural reformulation of the compatibility condition is that at least one probability measure μ ∈ V_0 can be written as a mixture of probability measures of the form P γ^{−1}, where γ ranges over Sel(Γ_0). However, even for the simplest multi-valued mappings, the set of measurable selections is very rich, let alone the set of their mixtures. Hence, our first goal is to give a manageable representation of such a mixture. This is the object of Theorem 1 below.

Theorem 1 Under Assumption 1, μ is a mixture of images of P by measurable selections of Γ_0 (i.e. μ belongs to the weak closed convex hull of {P γ^{−1}; γ ∈ Sel(Γ_0)}) if and only if there exists, for P-almost all y ∈ Y, a probability measure π(y, ·) on U with support Γ_0(y), such that

μ(B) = ∫_Y π(y, B) P(dy),  for all B ∈ B_U.     (3)

Remark 1: The weak topology on M(U), the set of Borel probability measures on U, is the topology of convergence in distribution. M(U) is also Polish, and the weak closed convex hull of {P γ^{−1}; γ ∈ Sel(Γ_0)} is indeed the collection of arbitrary mixtures of elements of {P γ^{−1}; γ ∈ Sel(Γ_0)}.

¹ The commentary at the end of Chapter 14 of Rockafellar and Wets (1998) sheds light on the controversy surrounding this attribution.
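To fix ideas, here is a small numerical illustration of the representation in Theorem 1 (added for this edition, not part of the original paper): on a toy finite example with Y = {0, 1} and U = {0, 1, 2}, a kernel π(y, ·) supported on Γ_0(y) induces a measure μ = Pπ, and one can check that μ is a mixture of the image measures P γ^{−1} over the measurable selections γ. All names and numbers below are hypothetical.

    import numpy as np
    from itertools import product
    from scipy.optimize import nnls

    # Toy spaces: Y = {0, 1}, U = {0, 1, 2}; correspondence Gamma0 given as index sets.
    P = np.array([0.6, 0.4])                 # P(y = 0), P(y = 1)
    Gamma0 = {0: [0, 1], 1: [1, 2]}          # Gamma0(y) as subsets of U

    # A kernel pi(y, .) supported on Gamma0(y): rows indexed by y, columns by u.
    pi = np.array([[0.5, 0.5, 0.0],
                   [0.0, 0.3, 0.7]])

    # mu(B) = sum_y pi(y, B) P(y): here mu is a vector of point masses on U.
    mu = P @ pi

    # Enumerate the measurable selections gamma: Y -> U with gamma(y) in Gamma0(y)
    # and their image measures P gamma^{-1}.
    selections = list(product(Gamma0[0], Gamma0[1]))
    images = np.zeros((len(selections), 3))
    for k, g in enumerate(selections):
        for y, u in enumerate(g):
            images[k, u] += P[y]

    # Check that mu is a mixture of the image measures: solve for nonnegative
    # weights summing to one with weights @ images = mu (residual should be ~0).
    A = np.vstack([images.T, np.ones(len(selections))])
    b = np.concatenate([mu, [1.0]])
    weights, residual = nnls(A, b)
    print("mu =", mu, "weights =", weights, "residual =", residual)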


Remark 2: Notice that (3) looks like a disintegration of μ, and indeed, when Γ_0 is the inverse image of a single-valued measurable function (i.e. when the model is given by a single-valued measurable function from latent to observable variables), the probability kernel π is exactly the (P, Γ_0^{−1})-disintegration of μ; in other words, π(y, ·) is the conditional probability measure on U under the condition Γ_0^{−1}(u) = {y}. Hence (3) has the interpretation that a random element with distribution μ can be generated as a draw from π(y, ·), where y is a realization of a random element with distribution P.

Remark 3: We define Core(Γ_0, P) as the weak closed convex hull of {P γ^{−1}; γ ∈ Sel(Γ_0)}, or equivalently as the set of all mixtures of images of P by measurable selections of Γ_0. So our null hypothesis (2) is well defined.²

    2.2 Definition of the test statistic

Now that we have identified the set of latent variable data generating processes compatible with the observable distribution P and the model correspondence Γ_0 with Core(Γ_0, P), and we have characterized the elements of the latter by means of Theorem 1, we propose a test statistic based on this characterization. Call V_00 the subset of V_0 that is compatible with the model correspondence and the distribution of observables. Hence

V_00 = V_0 ∩ Core(Γ_0, P),

which is non-empty under the null H_0 by definition. By Theorem 1, an element μ of V_0 is in V_00 if and only if μ can be written as

μ(·) = ∫_Y π(y, ·) P(dy),

² The name Core is justified by Theorem A2 of Appendix A.


where the π(y, ·) are probability measures with support Γ_0(y) for P-almost all y. From now on, we shall use the de Finetti notation μf for the integral of the function f with respect to the measure μ, so that we shall write

∫_Y π(y, ·) P(dy) = Pπ.

Consider now a sample of observations (Y_1, . . . , Y_n). The empirical distribution, i.e. the probability measure that gives mass 1/n to each observation, is denoted P_n, with

P_n(A) = (1/n) Σ_{j=1}^{n} I_A(Y_j),  for all A ∈ B_Y.

The empirical counterpart of the integral Pπ is

P_n π = (1/n) Σ_{j=1}^{n} π(Y_j, ·).
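As a purely illustrative sketch of the de Finetti notation just introduced (mine, not the paper's), the empirical counterpart P_nπ on sets of the form [0, u] is simply a sample average of kernel evaluations; the kernel pi_cdf below is a made-up example.

    import numpy as np

    def empirical_counterpart(sample, pi_cdf, u_grid):
        """P_n pi evaluated on the sets [0, u]: the average of pi(Y_j, [0, u]) over the sample.

        sample : observations (Y_1, ..., Y_n)
        pi_cdf : hypothetical kernel, pi_cdf(y, u) = pi(y, [0, u])
        u_grid : grid of evaluation points u
        """
        return np.array([np.mean([pi_cdf(y, u) for y in sample]) for u in u_grid])

    # Made-up kernel: pi(1, .) uniform on [0, 0.5], pi(0, .) uniform on [0, 1].
    pi_cdf = lambda y, u: min(u / 0.5, 1.0) if y == 1 else min(u, 1.0)
    sample = np.random.binomial(1, 0.3, size=200)        # toy Bernoulli observables
    print(empirical_counterpart(sample, pi_cdf, np.linspace(0.0, 1.0, 5)))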

The asymptotic behaviour of the difference between Pπ and its empirical counterpart P_nπ is key to the construction of the test statistic. It is described in the following theorem. Setting u = (u_1, . . . , u_{d_u}), call B_u the rectangles ×_{i=1}^{d_u} (−∞, u_i] and π_u := π(·, B_u). Let G_n = √n (P_n − P) denote the empirical process associated with the sample (Y_1, . . . , Y_n), and finally, let ⇒ denote convergence in distribution (that is, weak convergence). Then we have

Theorem 2: For any μ ∈ V_00 with a density with respect to Lebesgue measure, and for any π satisfying (3), G_n converges weakly, uniformly over the family of functions {π_u, u ∈ R^{d_u}}, to a P-Brownian bridge G, i.e. a Gaussian process with zero mean and covariance function defined by

E[G π_u G π_v] = P(π_u π_v) − (P π_u)(P π_v).

This implies that

√n sup_{u ∈ R^{d_u}} |P_n π_u − P π_u| ⇒ ‖G‖ := sup_{u ∈ R^{d_u}} |G π_u|,


where ‖G‖ is a real-valued random variable. In the particular case where d_u = 1, ‖G‖ is such that for all x > 0, Pr(‖G‖ > x) = 2 Σ_{j=1}^{∞} (−1)^{j+1} e^{−2j²x²}.

Remark 1: A remarkable feature of Theorem 2 is that, in the case of a single real-valued latent variable, the test statistic has a distribution-free limit with easily computable quantiles.
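For reference, the tail probability above can be evaluated by truncating the series, and critical values obtained by bisection; this is a minimal sketch added here (assuming Python with NumPy), not part of the original paper.

    import numpy as np

    def kolmogorov_tail(x, terms=100):
        """Pr(G > x) = 2 * sum_{j >= 1} (-1)**(j+1) * exp(-2 j**2 x**2), truncated."""
        if x <= 0:
            return 1.0
        j = np.arange(1, terms + 1)
        return 2.0 * np.sum((-1.0) ** (j + 1) * np.exp(-2.0 * j ** 2 * x ** 2))

    def kolmogorov_quantile(level, lo=1e-6, hi=5.0, tol=1e-10):
        """Critical value x with Pr(G > x) = level, found by bisection."""
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            lo, hi = (mid, hi) if kolmogorov_tail(mid) > level else (lo, mid)
        return 0.5 * (lo + hi)

    print(kolmogorov_quantile(0.05))   # approximately 1.358, the familiar 5% critical value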

The test statistic implicitly proposed in Theorem 2 to test whether a given latent variable distribution μ is compatible with the model restriction Γ_0 and the observable distribution P is infeasible, in that it depends on the unknown probability kernels π such that μ = Pπ. They can be estimated as solutions of the integral equation μ = P_n π, with the restriction that the π(y, ·) are probability measures on Γ_0(y). This equation has solutions (generically many) if and only if μ ∈ Core(Γ_0, P_n), by Theorem 1, but solutions are likely to be difficult to exhibit except in very simple cases, such as the cases developed in Section 3.

An alternative is to construct a test statistic based on the distance between a hypothesized latent variable measure μ (or more generally V_0) and Core(Γ_0, P_n), which by construction will be smaller than the test statistic of Theorem 2, and hence can be used as the basis for a conservative testing procedure. This is summarized in the following corollary:

Corollary 1: Under the null H_0,

limsup_{n→∞} inf_{μ ∈ V_0} inf_{ν ∈ Core(Γ_0, P_n)} sup_{u ∈ R^{d_u}} √n |μ(B_u) − ν(B_u)| ≤ sup_{u ∈ R^{d_u}} |G π_u|,

and the infima are achieved provided V_0 is chosen to be closed in the weak topology.

Given the conservative nature of the procedure based on Corollary 1, it is crucial to assess the power of the test, as described in the next section.

    2.3 Power analysis

The two test statistics considered in Section 2.2 are the following:

TS1 = inf_{μ ∈ V_0} sup_{u ∈ R^{d_u}} √n |P_n π_u − μ(B_u)|,

TS2 = inf_{μ ∈ V_0} inf_{ν ∈ Core(Γ_0, P_n)} sup_{u ∈ R^{d_u}} √n |μ(B_u) − ν(B_u)|.

Note that since P_n π ∈ Core(Γ_0, P_n), TS2 is dominated by TS1 by construction.

To assess the power of either test, we consider the following type of local alternatives:

d_H(V_n, Core(Γ_n, P)) ≥ ε r_n^{−1},  ε > 0,     (4)

where r_n is a deterministic sequence of reals diverging with n, and d_H denotes the Hausdorff distance, defined as follows: for any two sets V_1 and V_2 in M(U), and any metric d metrizing weak convergence,

d_H(V_1, V_2) = max{ sup_{μ_1 ∈ V_1} inf_{μ_2 ∈ V_2} d(μ_1, μ_2),  sup_{μ_2 ∈ V_2} inf_{μ_1 ∈ V_1} d(μ_1, μ_2) }.

The principle of both test statistics TS1 and TS2 rests on the set convergence of Core(Γ, P_n) to Core(Γ, P) for a fixed model correspondence Γ. Hence, for n large enough, Core(Γ_n, P_n) is sufficiently close to Core(Γ_n, P) for the test statistic to detect the sequence of local alternatives, as summarized in the following theorem.

Theorem 3: Under the sequence of alternatives defined in (4) with r_n = o(√n), if the family of functions {π(·, B_u) : π satisfying (3) for some ν ∈ Core(Γ_n, P), u ∈ U, n ∈ N} is P-Donsker, then the test statistics TS1 and TS2 diverge.

Remark 1: As developed in Appendix A, Theorem 3 has an interesting interpretation in terms of the convergence of empirical random sets: for a random element Y_j in Y, Γ(Y_j) is a random set in U under Assumption 1, and its distribution can be identified with Core(Γ, P). Theorem 3 tells us when the empirical distribution Core(Γ, P_n) of the random set Γ(Y) converges weakly to the true distribution at rate √n. Such a result appears to be new in the literature on the convergence of random sets.

3 Illustration: a simple entry model

3.1 Single type

Consider a market with two firms producing complementary products with identical costs.³ The payoff functions are

π_1(x_1, x_2, u) = (θ x_2 − u) I_{{x_1 = 1}},
π_2(x_1, x_2, u) = (θ x_1 − u) I_{{x_2 = 1}},

where x_i ∈ {0, 1} is firm i's action, and u is an exogenous cost. The firms know their cost; the analyst, however, knows only that u ∈ [0, 1], and that the structural parameter θ is in (0, 1]. There are two Nash equilibria. The first is x_1 = x_2 = 0 for all u ∈ [0, 1]. The second is x_1 = x_2 = 1 for all u ∈ [0, θ] and zero otherwise. Since the two firms' actions are perfectly correlated, we shall denote them by a single binary variable y = x_1 = x_2. Hence the model is described by the multi-valued mapping Γ(1) = [0, θ] and Γ(0) = [0, 1]. If we consider the restriction θ ≤ θ_max, then the multi-valued mapping incorporating the restriction is Γ_0, defined by Γ_0(1) = [0, θ_max] and Γ_0(0) = [0, 1]. In this case, since y is Bernoulli, we can write P = (1 − p, p), with p the probability of a 1. For the distribution of u, we consider a parametric exponential family on [0, 1]. Hence V = {μ_α := α u^{α−1} du}_{α > 0}, and the restriction on the latent variable distribution can be chosen as α ∈ [α_min, α_max] with α_min > 0.

³ Jovanovic (1989) and Tamer (2003) consider this simple game in a similar context.
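The simulation sketch below (added here; the equilibrium-selection device is an arbitrary choice of mine and is not part of the model) draws costs from μ_α and generates entry outcomes; it illustrates that, whatever the selection rule, the entry probability p cannot exceed θ^α, which is the compatibility condition exploited below.

    import numpy as np

    rng = np.random.default_rng(0)

    def simulate_entry(n, theta, alpha, selection_prob=0.7):
        """Simulate the single-type entry game.

        Costs u are drawn from mu_alpha (cdf u**alpha on [0, 1]).  When u <= theta both
        equilibria exist; a purely illustrative selection device then picks entry
        (y = 1) with probability selection_prob.
        """
        u = rng.uniform(size=n) ** (1.0 / alpha)     # inverse-cdf draw from u**alpha
        both = u <= theta                            # region where x1 = x2 = 1 is an equilibrium
        y = both & (rng.uniform(size=n) < selection_prob)
        return y.astype(int)

    theta, alpha = 0.6, 2.0
    y = simulate_entry(100_000, theta, alpha)
    print(f"p_hat = {y.mean():.3f} <= theta**alpha = {theta**alpha:.3f}")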

Consider the smallest reliability P_* that can be attached to a set A in U on the basis of Γ_0 and P, defined by

P_*(A) = P{y ∈ Y | Γ_0(y) ⊆ A}.

Our null hypothesis of compatibility of the two sets of restrictions is that, for some α ∈ [α_min, α_max], μ_α set-wise dominates P_*; in other words, that μ_α associates to each set a measure at least as large as the smallest reliability that can be attached to it. This is equivalent to the existence of an α ∈ [α_min, α_max] and, for P-almost all y, of a probability measure π(y, ·) supported on Γ_0(y), such that for all u ∈ [0, 1],

u^α = ∫_Y π(y, [0, u]) P(dy).

In other words,

u^α = (1 − p) π(0, [0, u]) + p π(1, [0, u])

and

π(1, [0, θ_max]) = π(0, [0, 1]) = 1,

with, for P-almost all y, π(y, [0, u]) a nondecreasing, right-continuous function of u taking values in [0, 1]. When θ_max^α < p, there is no solution (i.e. the two sets of restrictions are incompatible).


When θ_max^α ≥ p, such kernels π exist, and the empirical process G_n can be applied to the functions π_{α,u}(y) := π(y, [0, u]) to form the statistic sup_{u ∈ [0,1]} |G_n π_{α,u}| (TS1 below), which for any such α (there exists at least one under the null) converges weakly to the supremum G of a standard Brownian bridge, with

Pr(G > x) = 2 Σ_{j=1}^{∞} (−1)^{j+1} e^{−2j²x²}.

A testing procedure that does not require computation of the kernels π, as described in Section 2.2, consists in finding the element of Core(Γ_0, P_n) that minimizes the Kolmogorov–Smirnov distance to the set of distribution functions {u^α, α ∈ [α_min, α_max]}. If θ_max^α ≥ p_n, then u^α is a minimizer and the minimum distance is zero. If θ_max^α < p_n, then the minimum Kolmogorov–Smirnov distance is p_n − θ_max^α. If θ_max^α > p, then eventually θ_max^α > p_n as well, and the test statistic is zero. If θ_max^α < p, then eventually θ_max^α < p_n, and the test statistic diverges. Finally, if θ_max^α = p, then the statistic will be √n max(0, p_n − p). In this very simple example, one might also consider a test of the null H_0: p = θ_max^α against the one-sided alternative H_a: p > θ_max^α, using the fact that under the null, √n [θ_max^α (1 − θ_max^α)]^{−1/2} (p_n − θ_max^α) converges to a standard normal random variable.

To summarize, the procedures proposed are based on the following test statistics:

TS1 = inf_α sup_{u ∈ [0,1]} |G_n π_{α,u}|,

TS2 = inf_α inf_{F ∈ Core(Γ_0, P_n)} sup_{u ∈ [0,1]} √n |u^α − F(u)|,

TS3 = √n (p_n − θ_max^α),

and the following approximating distributions:


AD1 = √(p/(1 − p)) (1 − θ_max^{α_min}) |N(0, 1)|,

AD2 = G,

AD3 = √(θ_max^α (1 − θ_max^α)) N(0, 1),

AD4 = √(p(1 − p)) N(0, 1).

The procedure based on estimation of the probability kernels π and comparison between the hypothesized distributions μ = Pπ and their empirical counterparts P_nπ would result in comparing TS1 to the quantiles of AD1 (for the exact asymptotic version) or AD2 (for the conservative asymptotic version). The procedure based on the minimum distance between the hypothesized distributions and the empirical random set distribution Core(Γ_0, P_n) would result in comparing TS2 with the quantiles of AD4 (for an exact asymptotic version) or AD2 (for the conservative asymptotic version). Finally, the simple test on the boundary would result in comparing TS3 to the quantiles of AD3.
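The sketch below is a rough implementation of the single-type procedures as I read them (the parametrization of the restriction as α ∈ [α_min, α_max], and the use of α_min in the statistics, are my assumptions): it computes the minimum Kolmogorov–Smirnov statistic TS2 and the boundary statistic TS3 from the Bernoulli frequency p_n, with p-values based on AD2–AD4.

    import numpy as np
    from scipy.stats import norm

    def kolmogorov_tail(x, terms=100):
        """Pr(G > x) for the supremum G of a standard Brownian bridge (AD2)."""
        if x <= 0:
            return 1.0
        j = np.arange(1, terms + 1)
        return 2.0 * np.sum((-1.0) ** (j + 1) * np.exp(-2.0 * j ** 2 * x ** 2))

    def single_type_tests(y, theta_max, alpha_min):
        """TS2 (minimum KS distance to Core(Gamma_0, P_n)) and TS3 (boundary test)."""
        n, p_n = len(y), float(np.mean(y))
        # Per alpha, the minimum KS distance is max(0, p_n - theta_max**alpha);
        # the infimum over alpha in [alpha_min, alpha_max] is attained at alpha_min.
        ts2 = np.sqrt(n) * max(0.0, p_n - theta_max ** alpha_min)
        p_ts2_conservative = kolmogorov_tail(ts2)                        # AD2
        p_ts2_exact = norm.sf(ts2 / np.sqrt(p_n * (1.0 - p_n)))          # AD4, one-sided (a choice made here)
        # Boundary test of H0: p = theta_max**alpha against Ha: p > theta_max**alpha.
        q = theta_max ** alpha_min
        ts3 = np.sqrt(n) * (p_n - q)
        p_ts3 = norm.sf(ts3 / np.sqrt(q * (1.0 - q)))                    # AD3, one-sided
        return ts2, p_ts2_conservative, p_ts2_exact, ts3, p_ts3

    y = np.random.binomial(1, 0.25, size=500)       # toy data
    print(single_type_tests(y, theta_max=0.6, alpha_min=1.5))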

3.2 Heterogeneous types

Consider a market with two firms producing complementary products with heterogeneous costs. The payoff functions are

π_1(x_1, x_2, u) = (θ x_2 − u_1) I_{{x_1 = 1}},
π_2(x_1, x_2, u) = (θ x_1 − u_2) I_{{x_2 = 1}},

where x_i ∈ {0, 1} is firm i's action, and the u_i are firm-specific exogenous costs. The firms know their costs; the analyst, however, knows only that u ∈ [0, 1]², and that the structural parameter θ is in (0, 1]. There are two Nash equilibria. The first is x_1 = x_2 = 0 for all u ∈ [0, 1]². The second is x_1 = x_2 = 1 for all u ∈ [0, θ]² and zero otherwise. Since the two firms' actions are perfectly correlated, we shall denote them by a single binary variable y = x_1 = x_2. Hence the model is described by the multi-valued mapping Γ(1) = [0, θ]² and Γ(0) = [0, 1]². If we consider the restriction θ ≤ θ_max, then the multi-valued mapping incorporating the restriction is Γ_0, defined by Γ_0(1) = [0, θ_max]² and Γ_0(0) = [0, 1]². In this case, since y is Bernoulli, we can write P = (1 − p, p), with p the probability of a 1. For the distribution of u, we consider the costs to be independent with marginals following the same parametric exponential family on [0, 1]. Hence V = {μ_α := α² u_1^{α−1} u_2^{α−1} du_1 du_2}_{α > 0}, and the restriction can again be chosen as α ∈ [α_min, α_max] with α_min > 0.

A density version of (3) can be derived in this case, and makes for a more convenient test statistic. Writing u = (u_1, u_2),

f_α(u) = f_α(u_1) f_α(u_2) = α² u_1^{α−1} u_2^{α−1} = ∫_Y π(y, u) P(dy).

In other words,

α² u_1^{α−1} u_2^{α−1} = (1 − p) π(0, u) + p π(1, u)

under the constraints

∫_{[0,1]²} π(0, u) du = ∫_{[0,θ_max]²} π(1, u) du = 1

and π(1, u) = 0 for all u ∉ [0, θ_max]². When θ_max^{2α} < p, there is no solution (i.e. the two sets of restrictions are incompatible), whereas when θ_max^{2α} ≥ p, a solution is given by

π(0, u) = f_α(u) (1 − p θ_max^{−2α} I_{[0,θ_max]²}(u)) / (1 − p),

π(1, u) = θ_max^{−2α} f_α(u) I_{[0,θ_max]²}(u).

Consider now the empirical process G_n = √n (P_n − P) applied to the family of functions π_{α,u}(y) := π(y, u). Elementary calculations yield

G_n π_{α,u} = √n (p_n − p) / (1 − p) · g_{α,u},

where

g_{α,u} := f_α(u) (θ_max^{−2α} I_{[0,θ_max]²}(u) − 1).

In this case, it is convenient to use the L¹ metric: we are looking at the minimum over α of

∫_{[0,1]²} |G_n π_{α,u}| du = (√n |p_n − p| / (1 − p)) ∫_{[0,1]²} f_α(u) |θ_max^{−2α} I_{[0,θ_max]²}(u) − 1| du.

Now

∫_{[0,1]²} f_α(u) |θ_max^{−2α} I_{[0,θ_max]²}(u) − 1| du = 2(1 − θ_max^{2α}),

which is minimized at α = α_min to yield 2(1 − θ_max^{2α_min}). So

inf_α ∫_{[0,1]²} |G_n π_{α,u}| du ⇒ 2 √(p/(1 − p)) (1 − θ_max^{2α_min}) |Z|,

where Z is a standard normal random variable.
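A quick numerical check of the displayed L¹ computation (added here, not in the paper): for given α and θ_max, the integral of |g_{α,u}| over [0, 1]² should equal 2(1 − θ_max^{2α}).

    import numpy as np
    from scipy import integrate

    def g(u1, u2, alpha, theta_max):
        """g_{alpha,u} = f_alpha(u) * (theta_max**(-2*alpha) * 1{u in [0,theta_max]^2} - 1)."""
        f = alpha ** 2 * u1 ** (alpha - 1.0) * u2 ** (alpha - 1.0)
        inside = (u1 <= theta_max) and (u2 <= theta_max)
        return f * (theta_max ** (-2.0 * alpha) * inside - 1.0)

    alpha, theta_max = 1.7, 0.6
    val, _ = integrate.dblquad(lambda u2, u1: abs(g(u1, u2, alpha, theta_max)),
                               0.0, 1.0, lambda u1: 0.0, lambda u1: 1.0)
    print(val, 2.0 * (1.0 - theta_max ** (2.0 * alpha)))   # the two numbers should agree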

Appendix A: Empirical Distributions of Random Sets

In Assumption 1, we assume that the correspondence is measurable in the traditional sense, defined below:

Definition A1 (Effros Measurability) A correspondence Γ: (Y, B_Y) ⇉ (U, B_U) is said to be Effros measurable, or weakly measurable, or simply measurable, if the inverse image of open sets is measurable, i.e. if for all open subsets O of U,

Γ^{−1}(O) = {y ∈ Y | Γ(y) ∩ O ≠ ∅} ∈ B_Y.

There are several ways a measurable correspondence can convey probabilistic information on its image space (U, B_U), given observed frequencies of outcomes in Y.

Dempster (1967) suggests considering the smallest reliability that can be associated with the event A ∈ B_U as the belief function

P_*(A) = P{y ∈ Y | Γ(y) ⊆ A}

and the largest plausibility that can be associated with the event A as the plausibility function

P^*(A) = P{y ∈ Y | Γ(y) ∩ A ≠ ∅},

the two being linked by the relation

P^*(A) = 1 − P_*(A^c),     (5)

which prompted some authors to call them conjugates or duals of each other.
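For a concrete finite illustration of these two set functions and of the conjugacy relation (5) (an example added here, not in the original), the sketch below tabulates the belief and plausibility of every subset of a three-point latent space, for a two-point observable space.

    from itertools import chain, combinations

    # Finite toy example: Y = {0, 1}, U = {0, 1, 2}, correspondence Gamma, law P of y.
    U = {0, 1, 2}
    Gamma = {0: {0, 1}, 1: {1, 2}}
    P = {0: 0.6, 1: 0.4}

    def belief(A):
        """P_*(A) = P{y : Gamma(y) is a subset of A}."""
        return sum(p for y, p in P.items() if Gamma[y] <= set(A))

    def plausibility(A):
        """P^*(A) = P{y : Gamma(y) meets A}."""
        return sum(p for y, p in P.items() if Gamma[y] & set(A))

    def powerset(s):
        s = list(s)
        return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

    for A in map(set, powerset(U)):
        # Conjugacy (5): P^*(A) = 1 - P_*(A^c)
        assert abs(plausibility(A) - (1.0 - belief(U - A))) < 1e-12
        print(sorted(A), belief(A), plausibility(A))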

A natural way to construct a set of probability measures is to consider all probability measures that dominate the set function P_* set-wise, forming thus the core of the belief function:

Core(Γ, P) = {μ ∈ M(U) | ∀ A ∈ B_U, μ(A) ≥ P_*(A)} = {μ ∈ M(U) | ∀ A ∈ B_U, μ(A) ≤ P^*(A)},

where the first equality can be taken as a definition, and the second follows immediately from (5). It is well known that Core(Γ, P) is non-empty, and this will be shown below as a consequence of Theorem A2.


A different way of defining the probabilistic information generated by the correspondence can be derived from Aumann's idea (in Aumann (1965)) of considering correspondences as bundles of their selections.

Define the domain of the correspondence by

Dom(Γ) = {y ∈ Y | Γ(y) ≠ ∅}.

A measurable selection of the measurable correspondence Γ is defined by the property below:

Definition A2 (Measurable Selection) A measurable selection of a correspondence Γ: (Y, B_Y) ⇉ (U, B_U) is a (B_Y, B_U)-measurable function γ such that γ(y) ∈ Γ(y) for all y ∈ Dom(Γ).

The set of measurable selections of a measurable correspondence Γ is denoted Sel(Γ), and it is non-empty by a theorem due to Rokhlin (Rokhlin (1949), Part I, §2, No. 9, Lemma 2) and generally attributed to Kuratowski and Ryll-Nardzewski:

Theorem A1 (Rokhlin) An Effros measurable correspondence with closed non-empty values admits a measurable selection. For a proof, see for instance Theorem 8.1.3, page 308, of Aubin and Frankowska (1990).

Elements of Sel(Γ) can be used to transport the probability P on Y to probabilities on U. For each γ ∈ Sel(Γ), consider the probability P γ^{−1} defined on each A ∈ B_U by

P γ^{−1}(A) = P{y ∈ Y | γ(y) ∈ A},

and define

Σ(Γ, P) = {μ ∈ M(U) : μ = P γ^{−1} for some γ ∈ Sel(Γ)}.


It is easily seen that Σ(Γ, P) ⊆ Core(Γ, P). A converse is given by the following theorem of Castaldo, Maccheroni, and Marinacci (2004):

Theorem A2 (Castaldo, Maccheroni and Marinacci) If Γ is measurable and compact-valued, then Core(Γ, P) is the weak closed convex hull of Σ(Γ, P).

We now develop the claim made in Remark 1 after Theorem 3. Γ and P define a random set with realizations Γ(Y_j) for realizations Y_j from P. P^* is the distribution of the random set, and the empirical distribution associated with a sample (Y_1, . . . , Y_n) is given by

P^*_n(A) = (1/n) Σ_{j=1}^{n} I_{{Γ(Y_j) ∩ A ≠ ∅}}.

Core(Γ, P) characterizes P^* and Core(Γ, P_n) characterizes P^*_n. Hence Theorem 1 and Donsker theorems provide a way to derive conditions for the weak convergence of empirical random sets at rate √n.
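A small sketch of the empirical quantity displayed above, specialized to the single-type entry model (my illustration: Γ(Y_j) = [0, θ] if Y_j = 1 and [0, 1] otherwise, with test sets of the hypothetical form A = (t, 1]).

    import numpy as np

    def empirical_capacity(y, theta, t):
        """(1/n) * #{j : Gamma(Y_j) meets (t, 1]}, with Gamma(1) = [0, theta], Gamma(0) = [0, 1]."""
        # [0, 1] always meets (t, 1] for t < 1; [0, theta] meets it iff theta > t.
        meets = (y == 0) | ((y == 1) & (theta > t))
        return meets.mean()

    y = np.random.binomial(1, 0.3, size=1000)       # toy sample of observables
    print([empirical_capacity(y, theta=0.6, t=t) for t in (0.2, 0.5, 0.7, 0.9)])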

    Appendix B: Proofs

    Proof of Theorem 1:

Call Π(B) the set of all Borel probability measures with support B. Under Assumption 1, the map y ↦ Π(Γ_0(y)) is a map from Y to the set of all non-empty convex sets of Borel probability measures on U which are closed with respect to the weak topology. Moreover, for any f ∈ C_b(U), the set of all continuous bounded real functions on U, the map

y ↦ sup{ ∫ f dν : ν ∈ Π(Γ_0(y)) } = max_{u ∈ Γ_0(y)} f(u)

is B_Y-measurable, so that, by Theorem 3 of Strassen (1965), for a given μ ∈ M(U), there exists π satisfying (3) with π(y, ·) ∈ Π(Γ_0(y)) for P-almost all y if and only if

∫_U f(u) μ(du) ≤ ∫_Y sup_{u ∈ Γ_0(y)} f(u) P(dy)     (6)

for all f ∈ C_b(U). Now, defining P^* as the set function

P^*: B ↦ P({y ∈ Y : Γ_0(y) ∩ B ≠ ∅}),

the right-hand side of (6) is shown in the following sequence of equalities to be equal to the integral of f with respect to P^* in the sense of Choquet (line (7) below can be taken as a definition):

∫_Y sup_{u ∈ Γ_0(y)} f(u) dP(y)
  = ∫_0^∞ P{y ∈ Y : sup_{u ∈ Γ_0(y)} f(u) ≥ x} dx + ∫_{−∞}^0 ( P{y ∈ Y : sup_{u ∈ Γ_0(y)} f(u) ≥ x} − 1 ) dx
  = ∫_0^∞ P{y ∈ Y : Γ_0(y) ∩ {f ≥ x} ≠ ∅} dx + ∫_{−∞}^0 ( P{y ∈ Y : Γ_0(y) ∩ {f ≥ x} ≠ ∅} − 1 ) dx
  = ∫_0^∞ P^*({f ≥ x}) dx + ∫_{−∞}^0 ( P^*({f ≥ x}) − 1 ) dx
  = ∫^{Ch} f dP^*.     (7)

By Theorem 1 of Castaldo, Maccheroni, and Marinacci (2004), for any f ∈ C_b(U),

∫^{Ch} f dP^* = max_{γ ∈ Sel(Γ_0)} ∫_U f(u) P γ^{−1}(du),

so that (6) is equivalent to

max_{γ ∈ Sel(Γ_0)} ∫_U f(u) P γ^{−1}(du) ≥ ∫_U f(u) μ(du).     (8)
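As a numerical sanity check of the equality just invoked (my own finite illustration of the result of Castaldo, Maccheroni, and Marinacci (2004), not part of the proof), the sketch below compares the Choquet integral of a nonnegative function f with respect to the capacity P^* with the maximum of ∫ f d(Pγ^{−1}) over measurable selections γ, on a finite example.

    import numpy as np

    # Finite example: Y = {0, 1}, U = {0, 1, 2}, correspondence Gamma, law P of y.
    Gamma = {0: [0, 1], 1: [1, 2]}
    P = {0: 0.6, 1: 0.4}
    f = np.array([0.2, 1.0, 0.5])        # a nonnegative bounded test function on U

    def choquet_upper(f, Gamma, P, grid=20_000):
        """Choquet integral of f >= 0 w.r.t. the capacity P*(A) = P{y : Gamma(y) meets A}."""
        xs = np.linspace(0.0, float(f.max()), grid)
        upper = lambda x: sum(p for y, p in P.items() if any(f[u] >= x for u in Gamma[y]))
        return float(np.mean([upper(x) for x in xs]) * f.max())

    def max_over_selections(f, Gamma, P):
        """max over selections gamma of sum_y P(y) f(gamma(y)) = sum_y P(y) max_{u in Gamma(y)} f(u)."""
        return sum(p * max(f[u] for u in Gamma[y]) for y, p in P.items())

    print(choquet_upper(f, Gamma, P), max_over_selections(f, Gamma, P))   # should agree up to grid error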


Proof of Theorem 3:

The assumption guarantees that G_n = √n (P_n − P) converges to a Brownian bridge uniformly over the family defined in the statement of the theorem. Hence,

sup_{ν ∈ Core(Γ_n, P)} sup_{u ∈ R^{d_u}} |G_n π_u|

converges to a bounded random variable. Since the Kolmogorov–Smirnov metric is stronger than d, which metrizes weak convergence, the latter display implies that

sup_{ν ∈ Core(Γ_n, P)} sup_{ν′ ∈ Core(Γ_n, P_n)} d(ν, ν′) = O(1/√n).

Now,

TS2 ≥ d_H(V_n, Core(Γ_n, P_n)) ≥ d_H(V_n, Core(Γ_n, P)) − sup_{ν ∈ Core(Γ_n, P)} sup_{ν′ ∈ Core(Γ_n, P_n)} d(ν, ν′),

and TS2 ≤ TS1, so the result follows.

    References

Andrews, D., S. Berry, and P. Jia (2004): Confidence Regions for Parameters in Discrete Games with Multiple Equilibria, with an Application to Discount Chain Store Location, unpublished manuscript.

Aubin, J.-P., and H. Frankowska (1990): Set-valued analysis. Boston: Birkhäuser.

Aumann, R. (1965): Integrals of set-valued functions, Journal of Mathematical Analysis and Applications, 12, 1–12.


Castaldo, A., F. Maccheroni, and M. Marinacci (2004): Random sets and their distributions, Sankhya (Series A), 66, 409–427.

Chernozhukov, V., H. Hong, and E. Tamer (2002): Inference on Parameter Sets in Econometric Models, unpublished manuscript.

Dempster, A. P. (1967): Upper and lower probabilities induced by a multi-valued mapping, Annals of Mathematical Statistics, 38, 325–339.

Dudley, R. (2003): Real Analysis and Probability. Cambridge University Press.

Jovanovic, B. (1989): Observable implications of models with multiple equilibria, Econometrica, 57, 1431–1437.

Pakes, A., J. Porter, K. Ho, and J. Ishii (2004): Moment Inequalities and Their Application, unpublished manuscript.

Rockafellar, R. T., and R. J.-B. Wets (1998): Variational Analysis. Springer.

Rokhlin, V. (1949): Selected topics from the metric theory of dynamical systems, Uspekhi Matematicheskikh Nauk, 4, 57–128; translated in American Mathematical Society Translations, 49 (1966), 171–240.

Shaikh, A. (2005): Inference for a Class of Partially Identified Econometric Models, unpublished manuscript.

Strassen, V. (1965): The existence of probability measures with given marginals, Annals of Mathematical Statistics, 36, 423–439.

Tamer, E. (2003): Incomplete Simultaneous Discrete Response Model with Multiple Equilibria, Review of Economic Studies, 70, 147–165.


van der Vaart, A., and J. Wellner (2000): Weak Convergence and Empirical Processes. Springer.