The Asymptotic Distribution of Estimators with Overlapping Simulation Draws∗

Tim Armstrong^a, A. Ronald Gallant^b, Han Hong^c, Huiyu Li^d

First draft: September 2011. Current draft: August 2015.
Abstract
We study the asymptotic distribution of simulation estimators, where the same set of draws is used for all observations, under general conditions that do not require the function used in the simulation to be smooth. We consider two cases: estimators that solve a system of equations involving simulated moments and estimators that maximize a simulated likelihood. Many simulation estimators used in empirical work involve both overlapping simulation draws and nondifferentiable moment functions. Developing sampling theorems under these two conditions provides an important complement to the existing results in the literature on the asymptotics of simulation estimators.
Keywords: U-Process, Simulation estimators.
JEL Classification: C12, C15, C22, C52.
∗Corresponding author is A. Ronald Gallant, The Pennsylvania State University, Department of Economics, 613 Kern Graduate Building, University Park PA 16802-3306, USA, [email protected]. We thank Donald Andrews, Bernard Salanie, and Joe Romano in particular, as well as seminar participants, for insightful comments. We also acknowledge support by the National Science Foundation (SES 1024504) and SIEPR.

^a Yale University  ^b The Pennsylvania State University  ^c Stanford University  ^d Federal Reserve Bank of San Francisco
1 Introduction
Simulation estimation is popular in economics; it was developed by Lerman and Manski (1981), McFadden (1989), Laroque and Salanie (1989, 1993), Duffie and Singleton (1993), Gourieroux and Monfort (1996) and Gallant and Tauchen (1996), among others. Pakes and Pollard (1989) provided a general asymptotic approach for generalized method of simulated moments estimators, and verified the conditions of the general theory when a fixed number of independent simulations is used for each of the independent observations. A recent insightful paper by Lee and Song (2015) also developed results for a class of simulated maximum-likelihood-like estimators. In practice, however, researchers sometimes use the same set of simulation draws for all the observations in the dataset.
Independent simulation draws are doubly indexed, i.e. ω_ir, so that there are n × R simulations in total, where n is the number of observations and R is the number of simulations for each observation. Overlapping simulation draws are singly indexed, i.e. ω_r, so that there are R simulations in total, and the same R simulations are used for every observation. The properties of simulation based estimators using overlapping and independent simulation draws are studied by Lee (1992), Lee (1995) and Kristensen and Salanie (2010) under the condition that the simulated moment conditions are smooth and continuously differentiable functions of the parameters. This is, however, a strong assumption that is likely to be violated by many simulation estimators used in practice. We extend the above results to nonsmooth moment functions using the empirical process and U-process theories developed in a sequence of papers by Pollard (1984), Nolan and Pollard (1987, 1988) and Neumeyer (2004). In particular, the main insight relies on verifying the high level conditions in Pakes and Pollard (1989), Chen, Linton, and Van Keilegom (2003) and Ichimura and Lee (2010) by combining the results in Neumeyer (2004) with results from the empirical process literature (e.g. Andrews (1994)).
Even for the simulated method of moments estimator, the classical results in Pakes and Pollard (1989) and McFadden (1989) are for independent simulation draws. However, their results only apply to a finite number of independent simulations for each observation, since the proof depends crucially on the fact that a finite sum of functions with limited complexity also has limited complexity. Whether their analysis can be extended to a larger number of simulation draws is a challenging open question. With overlapping simulation draws, this difficulty is resolved by appealing to two-sample U-process theory.
A main application of maximum simulated likelihood estimators is multinomial probit discrete choice models and their various panel data versions (Newey and McFadden (1994)). Whether or not using overlapping simulations improves computational efficiency depends on the specific model. Generating the random numbers is easy, but computing the moment conditions or the likelihood function is typically difficult. To equate the order of computational effort, we adopt the notation of letting R denote either the total number of overlapping simulations or the number of independent simulations for each observation. For a given R, Lee (1995) and Kristensen and Salanie (2010) pointed out that the leading terms of the asymptotic expansion are smaller with independent draws than with overlapping draws. This suggests that independent draws are more desirable, and lead to smaller confidence intervals, whenever they are feasible.
There are still two reasons, one theoretical and one computational, to consider overlapping draws, especially for simulated maximum likelihood estimators. Despite the theoretical advantage of the method of simulated moments, the method of simulated maximum likelihood remains appealing in empirical research, partly because it minimizes a well defined distance between the model and the data even when the model is misspecified. The asymptotic theory with independent draws in this case is difficult and, to our knowledge, has not been fully worked out in the literature. In particular, Pakes and Pollard (1989) only provided an analysis for simulated GMM, but did not provide an analysis for simulated MLE, which can in fact be far more involved. Only the very recent insightful paper by Lee and Song (2015) studies an unbiased approximation to the simulated maximum likelihood, which still differs from most empirical implementations of simulated maximum likelihood methods using nonsmooth crude frequency simulators. Smoothing typically requires the choice of kernel and bandwidth parameters and introduces biases. For example, the Stern (1992) decomposition simulator, while smooth and unbiased, requires repeated calculations of eigenvalues and is computationally prohibitive. Significant progress was made only recently, in an important paper by Kristensen and Salanie (2010), who develop bias reduction techniques for simulation estimators.
When computing the simulated likelihood function is very difficult, overlapping simulations can be used to trade off computational feasibility against statistical accuracy. Using independent draws requires that R increase faster than √n, where n is the sample size, in order for the estimator to have an asymptotic normal distribution. With overlapping draws, the estimator is asymptotically normal as long as R increases to infinity. A caveat, of course, is that when R is much smaller than n, the asymptotic distribution mostly represents the simulation noise rather than the sampling error, which reflects the cost in statistical accuracy that is the price of more feasible computation.
2 Simulated Moments and Simulated Likelihood
We begin by formally defining the method of simulated moments and maximum simulated likelihood using overlapping simulation draws. These methods are defined in Lee (1992) and Lee (1995) in the context of multinomial discrete choice models. We use a more general notation that allows for both continuous and discrete dependent variables. Let z_i = (y_i, x_i), i = 1, ..., n, be the i.i.d. random variables in the observed sample, where the y_i are the dependent variables and the x_i are the covariates or regressors. We are concerned with estimating an unknown parameter θ ∈ Θ ⊂ R^k. As discussed in the introduction, independent draws are typically preferable in the method of simulated moments. The method of moments results are developed both for completeness and as an expositional transition to the simulated maximum likelihood section.
The method of moments estimator is based on a set of moment conditions g(z_i, θ) ∈ R^d such that g(θ) ≡ Pg(·, θ) is zero if and only if θ = θ_0, where θ_0 is construed as the true parameter value. In the above, Pg(·, θ) = ∫ g(z, θ) dP(z) denotes expectation with respect to the true distribution of z_i. In models where the moment g(z_i, θ) cannot be analytically evaluated, it can often be approximated using simulations. Let ω_r, r = 1, ..., R, be a set of simulation draws, and let q(z_i, ω_r, θ) be a function that is an unbiased estimator of g(z_i, θ) for all z_i:
\[ Qq(z,\cdot,\theta) \equiv \int q(z,\omega,\theta)\, dQ(\omega) = g(z,\theta). \]
Then the unknown moment condition g(z, θ) can be estimated by
\[ \hat g(z,\theta) = Q_R\, q(z,\cdot,\theta) \equiv \frac{1}{R}\sum_{r=1}^{R} q(z,\omega_r,\theta), \]
which in turn is used to form an estimate of the population moment condition g(θ):
\[ \hat g(\theta) = P_n\, \hat g(\cdot,\theta) \equiv \frac{1}{n}\sum_{i=1}^{n} \hat g(z_i,\theta) = \frac{1}{nR}\sum_{i=1}^{n}\sum_{r=1}^{R} q(z_i,\omega_r,\theta). \]
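For concreteness, a minimal Python sketch of this construction (the function name and array interface are our own illustration; q is the user's unbiased moment kernel):

```python
import numpy as np

def simulated_moment(q, z, omega, theta):
    """ghat(theta) = (1/(n R)) * sum_i sum_r q(z_i, w_r, theta), overlapping draws.

    q(z_i, omega, theta) returns an (R, d) array of kernel values; every
    observation z_i is paired with the SAME block of draws omega.
    """
    return np.mean([q(zi, omega, theta).mean(axis=0) for zi in z], axis=0)
```

An independent-draw implementation would instead pass a different (R, d_ω) block of draws to each observation.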
In the above, both z_1, ..., z_n and ω_1, ..., ω_R are i.i.d. and they are independent of each other. The method of simulated moments (MSM) estimator with overlapping simulation draws is defined with the usual quadratic norm as in Pakes and Pollard (1989):
\[ \hat\theta = \operatorname*{argmin}_{\theta\in\Theta}\; \|\hat g(\theta)\|^2_{W_n}, \qquad \text{where } \|x\|^2_W = x'Wx, \]
and both W_n and W are d × d weighting matrices such that W_n → W in probability.
In the maximum simulated likelihood method, we reinterpret g(z_i; θ) as the likelihood function of θ at the observation z_i, and ĝ(z_i; θ) as the simulated likelihood function, which is an unbiased estimator of g(z_i; θ). The MSL estimator is usually defined as, for i.i.d. data,
\[ \hat\theta = \operatorname*{argmax}_{\theta\in\Theta}\; P_n \log \hat g(\cdot;\theta) = \frac{1}{n}\sum_{i=1}^{n} \log \hat g(z_i;\theta). \]
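The MSL objective is a one-line variation on the same theme; the sketch below (again with hypothetical names, and q now returning the length-R vector of simulated likelihood contributions) makes explicit that the log is taken after averaging over the common draws:

```python
import numpy as np

def msl_objective(q, z, omega, theta):
    """Lhat(theta) = (1/n) sum_i log( (1/R) sum_r q(z_i, w_r, theta) )."""
    ghat = np.array([q(zi, omega, theta).mean() for zi in z])  # simulated likelihoods
    return np.log(ghat).mean()
```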
While g(z_i; θ) is typically a smooth function of z_i and θ, ĝ(z_i; θ) often is not. In these situations it is difficult to obtain the exact optimum for both MSM and MSL, and these definitions will be relaxed below to only require that the MSM and MSL estimators attain a "near-optimum" of the respective objective functions. The likelihood function g(z; θ) can be either the density for continuous data or the probability mass function for discrete data. It can also be either the joint likelihood of the data or the conditional likelihood g(z; θ) = g(y|x; θ) when z = (y, x).
In the following we develop conditions under which both MSM and MSL are consistent as both n → ∞ and R → ∞. Under the conditions given below, they both converge at the rate √m, where m = min(n, R), to a limiting normal distribution. These results are developed separately for MSM and MSL. For MSL, the condition R ≫ √n is required for asymptotic normality with independent simulation draws, e.g. Laroque and Salanie (1989) and Train (2003). With overlapping draws, asymptotic normality holds as long as both R and n converge to infinity. If R ≪ n, then the convergence rate becomes √R instead of √n. A simulation estimator with overlapping simulations can also be viewed as a profiled two step estimator, in order to invoke the high level conditions in Chen, Linton, and Van Keilegom (2003). The derivations in the remaining sections are tantamount to verifying these high level conditions. For maximum likelihood with independent simulations, the bias reduction condition R/√n → ∞ is derived in Laroque and Salanie (1989, 1993) and Gourieroux and Monfort (1996), and is strengthened by Lee and Song (2015), who require R to grow faster than √n log R, for nonsmooth maximum-likelihood-like estimators. To summarize, the following assumption is maintained throughout the paper.
ASSUMPTION 1 Let z_i = (y_i, x_i), i = 1, ..., n and ω_r, r = 1, ..., R be two independent sequences of i.i.d. random variables with distributions P and Q respectively. The function q(z_i, ω_r, θ) satisfies Qq(z, ·, θ) ≡ ∫ q(z, ω, θ) dQ(ω) = g(z, θ) for all z and all θ ∈ Θ.
3 Asymptotics of MSM with Overlapping Simulations
The MSM objective function takes the form of a two-sample U-process studied extensively in Neumeyer (2004):
\[ \hat g(\theta) \equiv \frac{1}{nR}\, S_{nR}(\theta), \qquad \text{where } S_{nR}(\theta) \equiv \sum_{i=1}^{n}\sum_{r=1}^{R} q(z_i,\omega_r,\theta), \]
with kernel function q(z_i, ω_r, θ) and its associated projections
\[ g(z_i,\theta) = Q\, q(z_i,\cdot,\theta) \qquad \text{and} \qquad h(\omega_r,\theta) \equiv P\, q(\cdot,\omega_r,\theta). \]
The following assumption restricts the complexity of the kernel function and its projections
viewed as classes indexed by the parameter θ.
ASSUMPTION 2 For each j = 1, ..., d, the following three classes of functions
F = {q_j(z_i, ω_r, θ) : θ ∈ Θ},
QF = {g_j(z_i, θ) : θ ∈ Θ},
PF = {h_j(ω_r, θ) : θ ∈ Θ},
are Euclidean, cf. Lemma 25 (p. 27), Lemma 36 (p. 34), and Theorem 37 (p. 34) of Pollard (1984). Their envelope functions, denoted respectively by F, QF and PF, have at least two moments.
By Definition (2.7) in Pakes and Pollard (1989) (hereafter P&P), a class of functions F is called Euclidean for the envelope F if there exist positive constants A and V that do not depend on the measure µ, such that if µ is a measure for which ∫F dµ < ∞, then for each ε > 0 there are functions f_1, ..., f_k in F such that (i) k ≤ Aε^{−V}; and (ii) for each f in F there is an f_i with ∫|f − f_i| dµ ≤ ε ∫F dµ.
This assumption is satisfied by many known functions; a counterexample is given on page 2252 of Andrews (1994). In the case of binary choice models, it is satisfied given common low level conditions on the random utility functions. For example, when the random utility function is linear with an additive error term, q(z_i, ω_r, θ) typically takes a form that resembles 1(z_i'θ + ω_r ≥ 0), which is Euclidean by Lemma 18 in Pollard (1984). As another example, in random coefficient binary choice models, the conditional choice probability is typically the integral of a distribution function of a single index Λ(x_i'β) over the distribution of the random coefficient β. Suppose β follows a normal distribution with mean v_i'θ_1 and a variance matrix with Cholesky factor θ_2; then, for φ(·; µ, Σ) the normal density function with mean µ and variance matrix Σ, the choice probability is given by
\[ \int \Lambda(x_i'\beta)\, \phi(\beta;\, v_i'\theta_1,\, \theta_2'\theta_2)\, d\beta. \]
In this model, for draws ω_r from the standard normal density, and for z_i = (x_i, v_i), q(z_i, ω_r, θ) takes a form that resembles
\[ \Lambda\big(x_i'(v_i\theta_1 + \theta_2'\omega_r)\big) = \Lambda\Big(x_i' v_i \theta_1 + \sum_{k=1}^{K} x_{ik}\, \theta_{2k}'\, \omega_r\Big). \]
As long as Λ(·) is a monotone function, this function is Euclidean according to Lemma 2.6.18 in Van der Vaart and Wellner (1996).
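As a hedged numerical illustration of this simulator (all names, dimensions, and the logistic choice of Λ are our own assumptions, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)

def sim_choice_prob(x, v, theta1, theta2_chol, omega):
    """Simulated choice probability for a random coefficient binary choice model:
    beta_r = v @ theta1 + theta2_chol.T @ omega_r, averaged over the common draws."""
    Lambda = lambda u: 1.0 / (1.0 + np.exp(-u))   # logistic cdf standing in for Lambda(.)
    betas = v @ theta1 + omega @ theta2_chol      # (R, K) simulated coefficients
    return Lambda(betas @ x).mean()               # average of Lambda(x' beta_r)

# Example with K = 2 random coefficients and R = 1000 overlapping draws.
K, R = 2, 1000
omega = rng.standard_normal((R, K))
x, v = np.array([1.0, 0.5]), np.eye(K)
theta1, theta2_chol = np.array([0.2, -0.1]), 0.5 * np.eye(K)
print(sim_choice_prob(x, v, theta1, theta2_chol, omega))
```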
Under Assumption 2, which implies that the class F and its projections QF and PF are Euclidean (see Neumeyer (2004), p. 79), the following lemma is analogous to Theorems 2.5, 2.7 and 2.9 of Neumeyer (2004).
LEMMA 1 Under Assumption 2 the following statements hold:
a. Define the centered kernel
\[ \tilde q(z,\omega,\theta) = q(z,\omega,\theta) - g(z,\theta) - h(\omega,\theta) + g(\theta); \]
then
\[ \sup_{\theta\in\Theta} \|\tilde S_{nR}(\theta)\| = O_p(\sqrt{nR}), \qquad \text{where } \tilde S_{nR}(\theta) \equiv \sum_{i=1}^{n}\sum_{r=1}^{R} \tilde q(z_i,\omega_r,\theta). \]

b. Define
\[ U_{nR}(\theta) \equiv \sqrt{m}\left(\frac{1}{nR}\, S_{nR}(\theta) - g(\theta)\right); \]
then
\[ \sup_{d(\theta_1,\theta_2)=o(1)} \|U_{nR}(\theta_1) - U_{nR}(\theta_2)\| = o_p(1), \]
where d(θ_1, θ_2) denotes the Euclidean distance √((θ_1 − θ_2)'(θ_1 − θ_2)).

c. Further,
\[ \sup_{\theta\in\Theta} \left\| \frac{1}{nR}\, S_{nR}(\theta) - g(\theta) \right\| = o_p(1). \]
Proof Consider first the case when the moment condition q(z, ω, θ) is univariate, so that d = 1. The first statement (a) follows from Theorem 2.5 in Neumeyer (2004). The proof of part (b) resembles Theorem 2.7 in Neumeyer (2004) but does not require n/(n + R) → κ ∈ (0, 1). First define
\[ \tilde U_{nR}(\theta) = \frac{\sqrt{m}}{nR}\, \tilde S_{nR}(\theta). \]
It follows from part (a) that
\[ \sup_{\theta\in\Theta} \|\tilde U_{nR}(\theta)\| = O_p\left(\sqrt{\frac{m}{nR}}\right) = o_p(1). \]
Since U_{nR}(θ) = Ũ_{nR}(θ) + √m(P_n − P)g(·, θ) + √m(Q_R − Q)h(·, θ), it only remains to verify the stochastic equicontinuity conditions for the two projection terms:
\[ \sup_{d(\theta_1,\theta_2)=o(1)} \|\sqrt{m}\,(P_n - P)(g(\cdot,\theta_1) - g(\cdot,\theta_2))\| = o_p(1) \]
and
\[ \sup_{d(\theta_1,\theta_2)=o(1)} \|\sqrt{m}\,(Q_R - Q)(h(\cdot,\theta_1) - h(\cdot,\theta_2))\| = o_p(1). \]
This in turn follows from m ≤ n, R and the equicontinuity lemma of Pollard (1984), p. 150. Part (c) mimics Theorem 2.9 in Neumeyer (2004), noting that
\[ \frac{1}{nR}\, S_{nR}(\theta) - g(\theta) = \frac{1}{nR}\, \tilde S_{nR}(\theta) + (P_n - P)g(\cdot,\theta) + (Q_R - Q)h(\cdot,\theta), \]
and invoking part (a) and Theorem 24 of Pollard (1984), p. 25.

When the moment conditions q(z, ω, θ) are multivariate, so that d > 1, the above arguments apply to each univariate element of the vector moment condition q(z, ω, θ). In the vector case, the notation ‖·‖ (e.g. ‖S_{nR}(θ)‖) denotes the Euclidean norm. The stated results in the lemma then follow from the equivalence between the L2 norm and the L1 norm, since for g ∈ R^d, ‖g‖ ≤ ∑_{j=1}^{d} |g_j|. □
Lemma 1 will be applied in combination with the following restatement of a version of
Theorem 7.2 of Newey and McFadden (1994) and Theorem 3.3 of Pakes and Pollard (1989).
THEOREM 1 Let θ̂ → θ_0 in probability, where g(θ) = 0 if and only if θ = θ_0, which is an interior point of the compact Θ. If

i. ‖ĝ(θ̂)‖_{W_n} ≤ inf_θ ‖ĝ(θ)‖_{W_n} + o_p(m^{−1/2});

ii. W_n = W + o_p(1), where W is positive definite;

iii. g(θ) is continuously differentiable at θ_0 with a full rank derivative matrix G;

iv. sup_{d(θ,θ_0)=o(1)} √m ‖ĝ(θ) − g(θ) − ĝ(θ_0)‖_W = o_p(1);

v. √m ĝ(θ_0) →_d N(0, Σ);

then the following result holds:
\[ \sqrt{m}\,(\hat\theta - \theta_0) \xrightarrow{d} N\big(0,\; (G'WG)^{-1} G'W\Sigma WG (G'WG)^{-1}\big). \qquad \square \]
Remark: The original Theorem 3.3 of Pakes and Pollard (1989) uses the Euclidean norm to define the GMM objective function, which amounts to using an identity weighting matrix W_n ≡ I. However, generalizing their proof arguments to a general random W_n is straightforward. First, note that their Theorem 3.3 is isomorphic to using a fixed positive definite weighting matrix W to define the norm. This is because if one uses the square root A of W (such that A'A = W) to form a linear combination of the original moment conditions ĝ(θ), the moment condition in Theorem 3.3 can be reinterpreted as Aĝ(θ), and exactly the same arguments in the proof go through, with the matrices Γ and V in P&P being AG and AΣA'.

Second, a close inspection of the proof of Theorem 3.3 in P&P shows that their Condition (i) is only used to the extent of requiring both
\[ \|\hat g(\hat\theta)\|_W \le \|\hat g(\theta_0)\|_W + o_p(m^{-1/2}) \qquad\text{and}\qquad \|\hat g(\hat\theta)\|_W \le \|\hat g(\theta^*)\|_W + o_p(m^{-1/2}), \]
where θ* is the minimizer of the quadratic approximation to the objective function ‖ĝ(θ)‖_W defined on p. 1042 of P&P. These will follow from Condition [i] if
\[ \|\hat g(\hat\theta)\|_{W_n} = \|\hat g(\hat\theta)\|_W + o_p(m^{-1/2}), \qquad \|\hat g(\theta_0)\|_{W_n} = \|\hat g(\theta_0)\|_W + o_p(m^{-1/2}) \]
and
\[ \|\hat g(\theta^*)\|_{W_n} = \|\hat g(\theta^*)\|_W + o_p(m^{-1/2}), \]
all of which follow in turn from combining Conditions [ii], [iv], and [v].
Consistency, under the conditions stated in Corollary 1, is an immediate consequence of
part (c) of Lemma 1 and Corollary 3.2 of Pakes and Pollard (1989). Asymptotic normality
is an immediate consequence of Theorem 1.
COROLLARY 1 Given Assumption 2, θ̂ → θ_0 in probability under the following conditions: (a) g(θ) = 0 if and only if θ = θ_0; (b) W_n → W in probability for W positive definite; and (c) ‖ĝ(θ̂)‖_{W_n} = ‖ĝ(θ_0)‖_{W_n} + o_p(1).

Furthermore, if ‖ĝ(θ̂)‖_{W_n} = ‖ĝ(θ_0)‖_{W_n} + o_p(m^{−1/2}), and if R/n → κ ∈ [0, ∞] as n → ∞, R → ∞, then the conclusion of Theorem 1 holds under Assumption 2, with Σ = (1 ∧ κ)Σ_g + (1 ∧ 1/κ)Σ_h, where Σ_g = Var(g(z_i, θ_0)) and Σ_h = Var(h(ω_r, θ_0)). □
In particular, Lemma 1.b delivers condition [iv]. Condition [v] is implied by Lemma 1.a because
\[ \sqrt{m}\,\hat g(\theta_0) = \tilde U_{nR}(\theta_0) + \sqrt{m}\,(P_n - P)g(\cdot,\theta_0) + \sqrt{m}\,(Q_R - Q)h(\cdot,\theta_0) \]
\[ = \sqrt{m}\,(P_n - P)g(\cdot,\theta_0) + \sqrt{m}\,(Q_R - Q)h(\cdot,\theta_0) + o_p(1) \xrightarrow{d} N\big(0,\; (1\wedge\kappa)\Sigma_g + (1\wedge 1/\kappa)\Sigma_h\big). \]
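To see what this variance formula delivers in the extreme regimes, consider the following short worked example (our own illustration of the display above). When κ = 0, so that R grows more slowly than n, m = R and only the simulation projection survives; when κ = ∞, m = n and the classical GMM variance is recovered:
\[ \kappa = 0:\quad \Sigma = (1\wedge 0)\,\Sigma_g + (1\wedge\infty)\,\Sigma_h = \Sigma_h, \qquad \sqrt{R}\,(\hat\theta - \theta_0) \xrightarrow{d} N\big(0,\,(G'WG)^{-1}G'W\Sigma_h WG(G'WG)^{-1}\big); \]
\[ \kappa = \infty:\quad \Sigma = \Sigma_g, \qquad \sqrt{n}\,(\hat\theta - \theta_0) \xrightarrow{d} N\big(0,\,(G'WG)^{-1}G'W\Sigma_g WG(G'WG)^{-1}\big). \]
The first case formalizes the caveat from the introduction: when R ≪ n, the limiting variance reflects only simulation noise.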
3.1 MSM Variance Estimation
Each component of the asymptotic variance can be estimated using sample analogs. A consistent estimate Ĝ of G, with individual columns Ĝ_j, can be formed by numerical differentiation: for e_j a d_θ × 1 vector with 1 in the jth position and 0 otherwise, and δ a step size parameter,
\[ \hat G_j \equiv \hat G_j(\hat\theta, \delta) = \frac{1}{2\delta}\Big[\hat g(\hat\theta + e_j\delta) - \hat g(\hat\theta - e_j\delta)\Big]. \]
A sufficient, although likely not necessary, condition for Ĝ(θ̂) → G(θ_0) in probability is that both δ → 0 and √m δ → ∞. Under these conditions, Lemma 1.b implies that Ĝ_j − G_j(θ̂) → 0 in probability, and G_j(θ̂) → G_j(θ_0) in probability as δ → 0 and θ̂ → θ_0. Σ can be consistently estimated by
\[ \hat\Sigma = (1 \wedge R/n)\, \hat\Sigma_g + (1 \wedge n/R)\, \hat\Sigma_h, \]
where
\[ \hat\Sigma_g = \frac{1}{n}\sum_{i=1}^{n} \hat g(z_i,\hat\theta)\, \hat g'(z_i,\hat\theta) \qquad\text{and}\qquad \hat\Sigma_h = \frac{1}{R}\sum_{r=1}^{R} \hat h(\omega_r,\hat\theta)\, \hat h'(\omega_r,\hat\theta), \]
and
\[ \hat h(\omega,\theta) = \frac{1}{n}\sum_{i=1}^{n} q(z_i,\omega,\theta). \]
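A hedged Python sketch of these sample analogs (our own helper names; ghat is the simulated moment function and q the kernel, with the same interface as the earlier sketch):

```python
import numpy as np

def msm_variance(q, ghat, z, omega, theta_hat, delta):
    """Numerical Jacobian G_hat and variance pieces for overlapping-draw MSM."""
    k = theta_hat.size
    G = np.column_stack([(ghat(theta_hat + delta * e) - ghat(theta_hat - delta * e))
                         / (2 * delta) for e in np.eye(k)])   # central differences
    qs = np.stack([q(zi, omega, theta_hat) for zi in z])      # (n, R, d) kernel values
    gz = qs.mean(axis=1)                                      # ghat(z_i, theta_hat)
    hw = qs.mean(axis=0)                                      # hhat(w_r, theta_hat)
    Sigma_g = gz.T @ gz / len(z)
    Sigma_h = hw.T @ hw / len(omega)
    kappa = len(omega) / len(z)
    return G, min(1, kappa) * Sigma_g + min(1, 1 / kappa) * Sigma_h
```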
Resampling methods, such as the bootstrap and subsampling, or MCMC, can also be used for inference. Note that in Σ̂ above, R has to go to infinity with overlapping draws. In contrast, with independent draws, a finite R only incurs an efficiency loss of the order of 1/R.
4 Asymptotics of MSL with Overlapping Simulations
In this section we derive the asymptotic properties of maximum simulated likelihood estimators with overlapping simulations, which requires a different approach due to the nonlinearity of the log function. Recall that MSL is defined as
\[ \hat\theta = \operatorname*{argmax}_{\theta\in\Theta}\; \hat L(\theta), \]
where
\[ \hat L(\theta) = P_n \log Q_R\, q(\cdot,\cdot,\theta) = \frac{1}{n}\sum_{i=1}^{n} \log \frac{1}{R}\sum_{r=1}^{R} q(z_i,\omega_r,\theta) = \frac{1}{n}\sum_{i=1}^{n} \log \hat g(z_i,\theta); \]
L̂(θ) and θ̂ are implicitly indexed by m = min(n, R).
To begin with, the class of functions q(z, ·, θ) of ω indexed by both θ and z is required to be a VC-class, as defined in Van der Vaart and Wellner (1996) (pp. 134, 141). Frequently g(z, θ) is a conditional likelihood of the form g(y|x, θ), where z = (y, x) includes both the dependent variable and the covariates. The "densities" g(z_i; θ) are broadly interpreted to also include probability mass functions for discrete choice models, or a mixture of probability density functions and probability mass functions for mixed discrete-continuous models.
ASSUMPTION 3 The class of functions indexed by both θ and z, L = {q(z, ·, θ) : z ∈ Z, θ ∈ Θ}, is VC with a uniformly bounded envelope function L. The classes {g(·, θ) : θ ∈ Θ} and {log g(·, θ) : θ ∈ Θ} are also both VC with a uniformly bounded envelope.
The following boundedness assumption is restrictive, but is difficult to relax for nonsmooth simulators using empirical process theory. It is also assumed in Lee (1992, 1995).

ASSUMPTION 4 There is an M < ∞ such that sup_{z,θ} |1/g(z, θ)| < M.
Let L(θ) = P log g(·; θ). The VC property and the boundedness assumption ensure uniform convergence.

LEMMA 2 Under Assumptions 2, 3, and 4, L̂(θ) − L̂(θ_0) converges in probability to L(θ) − L(θ_0) as m → ∞, uniformly over Θ.
Proof Consider the decomposition
\[ \hat L(\theta) - L(\theta) - \hat L(\theta_0) + L(\theta_0) = A(\theta) + B(\theta), \tag{1} \]
where
\[ A(\theta) = (P_n - P)\big[\log g(\cdot,\theta) - \log g(\cdot,\theta_0)\big], \]
\[ B(\theta) = P_n\big[\log \hat g(\cdot,\theta) - \log \hat g(\cdot,\theta_0) - \log g(\cdot,\theta) + \log g(\cdot,\theta_0)\big]. \]
First, by Theorem 19.13 of van der Vaart (1999), A(θ) converges uniformly to 0 in probability; by the monotonicity of the log transformation and Lemma 2.6.18 (v) and (viii) in Van der Vaart and Wellner (1996), log ∘ QF − log g(·, θ_0) is VC-subgraph.
Second, we show that B(θ) converges uniformly to 0 in probability as R → ∞. By Taylor's theorem and Assumption 4,
\[ \sup_\theta |B(\theta)| \le 2 \sup_{z,\theta} |\log \hat g(z,\theta) - \log g(z,\theta)| = 2 \sup_{z,\theta} \left| \frac{\hat g(z,\theta) - g(z,\theta)}{g^*(z,\theta)} \right| \le 2M \sup_{z,\theta} |\hat g(z,\theta) - g(z,\theta)|, \]
where g*(z, θ) is a mean value between ĝ(z, θ) and g(z, θ). Moreover, by Assumption 3 and Theorem 19.13 of van der Vaart (1999), as R → ∞,
\[ \sup_{z,\theta} |\hat g(z,\theta) - g(z,\theta)| \xrightarrow{p} 0. \]
Therefore B(θ) converges uniformly to 0 as R → ∞. The lemma then follows from the triangle inequality. □
Consistency is a direct consequence of Theorem 2.1 in Newey and McFadden (1994), given uniform convergence, when the true parameter is uniquely identified.

COROLLARY 2 Under Assumptions 2, 3, and 4, if

1. L̂(θ̂) ≥ L̂(θ_0) − o_p(1);

2. for any δ > 0, sup_{‖θ−θ_0‖≥δ} L(θ) < L(θ_0);

then θ̂ − θ_0 → 0 in probability.
As pointed out in Pollard (1984) (p. 10), the requirement that sup_{θ∈Θ} |L̂(θ) − L(θ)| = o_p(1) can be weakened to lim sup_{n→∞} P{sup_{θ∈Θ} [L̂(θ) − L(θ)] ≥ ε} = 0 for all ε > 0. In the remainder of this section, we investigate the asymptotic normality of MSL, which requires that the limiting population likelihood be at least twice differentiable. First we recall a general result (see for example Sherman (1993) for optimization estimators and Chernozhukov and Hong (2003) for MCMC estimators, among others).
THEOREM 2 √m(θ̂ − θ_0) →_d N(0, H^{−1}ΣH^{−1}) under the following conditions:

1. L̂(θ̂) ≥ sup_{θ∈Θ} L̂(θ) − o_p(1/m);

2. θ̂ → θ_0 in probability;

3. θ_0 is an interior point of Θ;

4. L(θ) is twice continuously differentiable in an open neighborhood of θ_0 with positive definite Hessian H(θ);

5. there exists D̂ such that √m D̂ →_d N(0, Σ); and

6. for any δ → 0 and for R̂(θ) = L̂(θ) − L(θ) − L̂(θ_0) + L(θ_0) − D̂'(θ − θ_0),
\[ \sup_{\|\theta-\theta_0\|\le\delta} \frac{m\,\hat R(\theta)}{1 + m\|\theta-\theta_0\|^2} = o_p(1). \tag{2} \]

(If θ̂ is known to be r_m consistent, i.e., θ̂ − θ_0 = o_p(1/r_m) for r_m → ∞, then Condition 6 only has to hold for δ = o_p(1/r_m).)
The following analysis consists of verifying the conditions in the above general theorem. The finite sample likelihood, without simulation, is required to satisfy the stochastic differentiability condition stated in the following high level assumption. It is typically satisfied when the true non-simulated log likelihood function is pointwise differentiable.
ASSUMPTION 5 There exists a mean zero random variable D_0(z_i) with finite variance such that for any δ → 0 we have
\[ \sup_{\|\theta-\theta_0\|\le\delta} \frac{n\,\hat R_n(\theta)}{1 + n\|\theta-\theta_0\|^2} = o_p(1) \tag{3} \]
for
\[ \hat R_n(\theta) \equiv (P_n - P)\big(\log g(\cdot,\theta) - \log g(\cdot,\theta_0)\big) - \hat D_0'(\theta - \theta_0), \qquad\text{where}\qquad \hat D_0 = \frac{1}{n}\sum_{i=1}^{n} D_0(z_i). \]
A primitive condition for this assumption is given in Lemma 3.2.19, p. 302, of Van der Vaart and Wellner (1996). To account for the simulation error we need an intermediate step, which is a modification of Theorem 1 of Sherman (1993).
THEOREM 3 Let {a_m}, {b_m}, and {c_m} be sequences of positive numbers that tend to infinity. Suppose

1. L̂(θ̂) ≥ L̂(θ_0) − O_p(a_m^{−1});

2. θ̂ → θ_0 in probability;

3. in a neighborhood of θ_0 there is a κ > 0 such that L(θ) ≤ L(θ_0) − κ‖θ − θ_0‖²;

4. for every sequence of positive numbers {δ_m} that converges to zero, ‖θ_m − θ_0‖ < δ_m implies
\[ \big|\hat L(\theta_m) - \hat L(\theta_0) - L(\theta_m) + L(\theta_0)\big| \le O_p(\|\theta_m - \theta_0\|/b_m) + o_p(\|\theta_m - \theta_0\|^2) + O_p(1/c_m). \]

Then
\[ \|\hat\theta - \theta_0\| = O_p\Big(\frac{1}{\sqrt{d_m}}\Big), \qquad\text{where } d_m = \min(a_m,\, b_m^2,\, c_m). \]
Proof The proof is a modification of Sherman (1993). Condition 2 implies that there is a sequence of positive numbers {δ_m} that converges to zero slowly enough that P(‖θ̂ − θ_0‖ ≤ δ_m) → 1. When ‖θ̂ − θ_0‖ ≤ δ_m we have from Conditions 1, 3, and 4 that
\[ \kappa\|\hat\theta-\theta_0\|^2 - O_p(1/a_m) \le \hat L(\hat\theta) - \hat L(\theta_0) - L(\hat\theta) + L(\theta_0) \le O_p\big(\|\hat\theta-\theta_0\|/b_m\big) + o_p\big(\|\hat\theta-\theta_0\|^2\big) + O_p(1/c_m), \]
whence
\[ \big[\kappa + o_p(1)\big]\, \|\hat\theta-\theta_0\|^2 \le O_p(1/a_m) + O_p\big(\|\hat\theta-\theta_0\|/b_m\big) + O_p(1/c_m) \le O_p(1/d_m) + O_p\big(\|\hat\theta-\theta_0\|/\sqrt{d_m}\big). \]
Letting W denote an O_p(1/√d_m) random variable, the expression above implies that
\[ \tfrac{1}{2}\kappa\,\|\hat\theta-\theta_0\|^2 - W\|\hat\theta-\theta_0\| \le O_p(1/d_m) \]
on an event that has probability one in the limit. Completing the square gives
\[ \tfrac{1}{2}\kappa\big(\|\hat\theta-\theta_0\| - W/\kappa\big)^2 \le O_p\Big(\frac{1}{d_m}\Big) + \frac{W^2}{2\kappa} = O_p\Big(\frac{1}{d_m}\Big), \]
whence √d_m ‖θ̂ − θ_0‖ ≤ √d_m W + O_p(1) = O_p(1). □
The next assumption requires that the simulated likelihood is not only unbiased, but is also a proper likelihood function.

ASSUMPTION 6 For all simulation lengths R and all parameters θ, both g(z_i; θ) and Q_R q(z_i, ·; θ) are proper (possibly conditional) density functions.
We also need to regulate the amount of irregularity allowed for the simulation function q(z, ω, θ). In particular, the following assumption allows q(z, ω, θ) to be an indicator function.

ASSUMPTION 7 Define f(z, ω, θ) = q(z, ω, θ)/g(z, θ) − q(z, ω, θ_0)/g(z, θ_0). Then (1) Q × P[sup_{‖θ−θ_0‖=o(1)} f(·, ·, θ)²] = o(1), and (2) sup_{‖θ−θ_0‖≤δ, z∈Z} Var_ω f(z, ω, θ) = O(δ).
ASSUMPTION 8 Define ψ(ω, θ) = ∫ [q(z, ω, θ)/g(z, θ)] f(z) dz, where f(z) is the joint density or probability mass function of the data. There exists a random vector D_1(ω_r) with finite variance such that, for D̂_1 = (1/R)∑_{r=1}^{R} D_1(ω_r) − QD_1(ω_r),
\[ \sup_{\|\theta-\theta_0\| = o((\log R)^{-1})} \frac{R\,(Q_R - Q)\big(\psi(\cdot,\theta) - \psi(\cdot,\theta_0)\big) - R\,\hat D_1'(\theta - \theta_0)}{1 + R\|\theta-\theta_0\|^2} = o_p(1). \]
Remark When g(z; θ) represents the joint likelihood of the data, f(z) = g(z; θ_0). When g(z; θ) = g(y|x; θ) represents a conditional likelihood, f(z) = g(z; θ_0)f(x), where f(x) is the marginal density or probability mass function of the conditioning variables, in which case ψ(ω, θ) = ∫∫ [q(z, ω, θ)/g(z, θ)] g(y|x; θ_0) dy f(x) dx, with the understanding that integrals become summations in the case of discrete data. Assumption 8 can be further simplified when the true likelihood g(z, θ) is twice continuously differentiable (with bounded derivatives for simplicity). In this case
\[ D_1(\omega_r) = -\int \frac{q(\omega_r, z, \theta_0)}{g^2(z;\theta_0)}\, \frac{\partial}{\partial\theta} g(z;\theta_0)\, f(z)\, dz. \tag{4} \]
When g(z; θ) is the joint likelihood of the data, D_1(ω_r) = −∫ [q(ω_r, z, θ_0)/g(z; θ_0)] (∂/∂θ) g(z; θ_0) dz. When g(z; θ) is a conditional likelihood g(z; θ) = g(y|x; θ), D_1(ω_r) = −∫ [q(ω_r, z, θ_0)/g(z; θ_0)] (∂/∂θ) g(z; θ_0) f(x) dz.
To see (4), note that
\[ (Q_R - Q)\big(\psi(\cdot,\theta) - \psi(\cdot,\theta_0)\big) = P\left[\frac{1}{g(\cdot,\theta)} - \frac{1}{g(\cdot,\theta_0)}\right]\big(\hat g(\cdot,\theta_0) - g(\cdot,\theta_0)\big) \]
\[ + P\,\frac{1}{g(\cdot,\theta_0)}\big(\hat g(\cdot,\theta) - g(\cdot,\theta) - \hat g(\cdot,\theta_0) + g(\cdot,\theta_0)\big) \]
\[ + P\left(\frac{1}{g(\cdot,\theta)} - \frac{1}{g(\cdot,\theta_0)}\right)\big(\hat g(\cdot,\theta) - g(\cdot,\theta) - \hat g(\cdot,\theta_0) + g(\cdot,\theta_0)\big). \]
The second line is zero because of Assumption 6. The third line can be bounded by
\[ M\,\|\theta-\theta_0\| \sup_{\|\theta-\theta_0\|=o((\log R)^{-1}),\, z\in Z} \big|(Q_R - Q)\big(q(\cdot,z,\theta) - q(\cdot,z,\theta_0)\big)\big| = o_p\Big(\frac{1}{\sqrt R}\Big)\,\|\theta-\theta_0\|, \]
using the same arguments that handle the term B_22(θ, z) in the proof below. Finally, the first line becomes
\[ P\left[\frac{1}{g(\cdot,\theta)} - \frac{1}{g(\cdot,\theta_0)}\right]\big(\hat g(\cdot,\theta_0) - g(\cdot,\theta_0)\big) = (Q_R - Q)\, D_1(\cdot)'(\theta - \theta_0) + \tilde R(\theta), \]
where ‖R̃(θ)‖ ≤ o_p(‖θ − θ_0‖) sup_{z∈Z} |(Q_R − Q) q(·, z, θ_0)| = o_p(‖θ − θ_0‖/√R).
THEOREM 4 Under Assumptions 2, 3, 4, 5, 6, 7, and 8, and Conditions 1, 2, 3 and 4 of Theorem 2, the conclusion of Theorem 2 holds with D̂ = P_n D_0(·) + Q_R D_1(·) and
\[ \Sigma = (1 \wedge \kappa)\, \mathrm{Var}(D_0(z_i)) + (1 \wedge 1/\kappa)\, \mathrm{Var}(D_1(\omega_r)). \]
Proof Consistency is given in Corollary 2. Consider again the decomposition given by Equation (1). Because of the linearity structure of Conditions (5) and (6) of Theorem 2, it suffices to verify them separately for the terms A(θ) and B(θ).

It follows immediately from Assumption 5 that Conditions (5) and (6) of Theorem 2 hold for the first term A(θ), because n ≥ m and (3) is increasing in n:
\[ \sup_{\|\theta-\theta_0\|\le\delta} \frac{m\big(A(\theta) - \hat D_0'(\theta-\theta_0)\big)}{1 + m\|\theta-\theta_0\|^2} \le \sup_{\|\theta-\theta_0\|\le\delta} \frac{\hat R_0(\theta)}{1/n + \|\theta-\theta_0\|^2} = o_P(1), \tag{5} \]
for R̂_0(θ) = A(θ) − D̂_0'(θ − θ_0). Next we verify these conditions for the B(θ) term.
Decompose B further into B(θ) = B_1(θ) + B_2(θ) + B_3(θ), where
\[ B_1(\theta) = P_n\left[\frac{1}{g(\cdot,\theta)}\big(\hat g(\cdot,\theta) - g(\cdot,\theta)\big) - \frac{1}{g(\cdot,\theta_0)}\big(\hat g(\cdot,\theta_0) - g(\cdot,\theta_0)\big)\right], \]
\[ B_2(\theta) = -\frac{1}{2}\, P_n\left[\frac{1}{\bar g(\cdot,\theta)^2}\big(\hat g(\cdot,\theta) - g(\cdot,\theta)\big)^2 - \frac{1}{\bar g(\cdot,\theta_0)^2}\big(\hat g(\cdot,\theta_0) - g(\cdot,\theta_0)\big)^2\right], \]
\[ B_3(\theta) = \frac{1}{3}\, P_n\left[\frac{1}{\bar g(\cdot,\theta)^3}\big(\hat g(\cdot,\theta) - g(\cdot,\theta)\big)^3 - \frac{1}{\bar g(\cdot,\theta_0)^3}\big(\hat g(\cdot,\theta_0) - g(\cdot,\theta_0)\big)^3\right]. \]
In the above, ḡ(z, θ) and ḡ(z, θ_0) are mean values, dependent on z, lying between ĝ(z, θ) and g(z, θ), and between ĝ(z, θ_0) and g(z, θ_0), respectively. By Assumption 4,
\[ \sup_{\theta\in\Theta} |B_3(\theta)| \le \frac{2}{3} M^3 \Big(\sup_{\theta\in\Theta,\, z\in Z} |\hat g(z,\theta) - g(z,\theta)|\Big)^3 \le O_p\Big(\frac{1}{R\sqrt R}\Big), \]
where the last inequality follows from sup_{θ∈Θ, z∈Z} |ĝ(z, θ) − g(z, θ)| = O_p(1/√R) due, e.g., to Theorem 2.14.1 of Van der Vaart and Wellner (1996). By Theorem 2.14.1 it also holds that
\[ \sup_{\theta\in\Theta} |B_1(\theta)| = O_p\Big(\frac{1}{\sqrt R}\Big) \qquad\text{and}\qquad \sup_{\theta\in\Theta} |B_2(\theta)| = O_p\Big(\frac{1}{R}\Big). \]
This allows us to invoke Theorem 3, with d_m = √m, to claim that ‖θ̂ − θ_0‖ = O_p(m^{−1/4}). Next we bound the second term, up to a constant, within ‖θ − θ_0‖ = o_p(1/log R):
\[ \sup_{\|\theta-\theta_0\| \lesssim (\log R)^{-1}} |B_2(\theta)| = o_p\Big(\frac{1}{R}\Big). \tag{6} \]
To show (6), first note that
\[ \sup_{\|\theta-\theta_0\| \lesssim (\log R)^{-1}} |B_2(\theta)| \le \sup_{\|\theta-\theta_0\| \lesssim (\log R)^{-1},\, z\in Z} B_{21}(\theta, z) \times B_{22}(\theta, z), \]
where
\[ B_{21}(\theta, z) = \left| (Q_R - Q)\left(\frac{q(z,\cdot,\theta)}{g(z,\theta)} + \frac{q(z,\cdot,\theta_0)}{g(z,\theta_0)}\right) \right| \quad\text{and}\quad B_{22}(\theta, z) = \left| (Q_R - Q)\left(\frac{q(z,\cdot,\theta)}{g(z,\theta)} - \frac{q(z,\cdot,\theta_0)}{g(z,\theta_0)}\right) \right|. \]
It follows again from Theorem 2.14.1 that
\[ \sup_{\|\theta-\theta_0\| \lesssim (\log R)^{-1},\, z\in Z} |B_{21}(\theta, z)| = O_p\Big(\frac{1}{\sqrt R}\Big). \]
Next we consider B_{22}(θ, z) in light of arguments similar to Theorem 2.37 in Pollard (1984). For δ = o((log R)^{−1}), for
\[ f(z,\omega,\theta) = q(z,\omega,\theta)/g(z,\theta) - q(z,\omega,\theta_0)/g(z,\theta_0) \]
with ‖θ − θ_0‖ ≤ δ, and for ε_R = ε/√R, we have Var(Q_R f(z, ·, θ))/ε_R² → 0 for each ε > 0. Therefore the symmetrization inequalities (30) on p. 31 of Pollard (1984) apply, and subsequently, for F_R = {f(z, ω, θ) : z ∈ Z, ‖θ − θ_0‖ ≤ δ},
\[ P\left(\sup_{f\in F_R} \big|(Q_R - Q) f\big| > 8\frac{\varepsilon}{\sqrt R}\right) \le 4 P\left(\sup_{f\in F_R} |Q_R^0 f| > 2\frac{\varepsilon}{\sqrt R}\right) \le 8A\varepsilon^{-W} R^{W/2} \exp\left(-\frac{1}{128}\,\varepsilon^2\delta^{-1}\right) + P\left(\sup_{f\in F_R} Q_R f^2 > 64\delta\right). \]
The second term goes to zero for the same reason as in Pollard. The first also goes to zero since log R − 1/δ → −∞. Thus we have shown that B_{22}(θ, z) = o_p(1/√R) uniformly in ‖θ − θ_0‖ ≤ δ and z ∈ Z, and consequently (6) holds. By considering n ≪ R, n ≫ R and n ≈ R separately, (6) also implies that for some α > 0:
\[ \sup_{\|\theta-\theta_0\| \lesssim m^{-\alpha}} |B_2(\theta)| = o_p\Big(\frac{1}{m}\Big). \]
It remains to investigate B_1(θ) = (1/n)∑_{i=1}^{n} (1/R)∑_{r=1}^{R} f(z_i, ω_r, θ), which, using Assumption 6, can be written as
\[ B_1(\theta) = \frac{1}{nR}\, S_{nR}\big(\tilde f_\theta\big) + B_0(\theta), \]
where
\[ \tilde f(z,\omega,\theta) = f(z,\omega,\theta) - Qf(z,\cdot,\theta) - Pf(\cdot,\omega,\theta) + PQf(\cdot,\cdot,\theta), \]
B_0(θ) = (Q_R − Q)(ψ(·, θ) − ψ(·, θ_0)), and ψ(ω, θ) = ∫ [q(z, ω, θ)/g(z, θ)] f(z) dz. Upon noting that Q q(z, ·, θ)/g(z, θ) = 1 identically, Qf(z, ·, θ) = 0 and PQf(·, ·, θ) = 0. The proof of Theorem 2.5 (p. 83) of Neumeyer (2004) shows that, since by Assumption 7
\[ Q\times P\left[\sup_{\|\theta-\theta_0\|=o(1)} f(\cdot,\cdot,\theta)^2\right] = o(1), \]
the envelope expectation Q ⊗ P(F)² converges to zero. Hence
\[ \frac{1}{nR}\, S_{nR}\big(\tilde f_\theta\big) = o_p\Big(\frac{1}{\sqrt{nR}}\Big) = o_p\Big(\frac{1}{m}\Big). \]
Finally, B_0 is handled by Assumption 8. Since B(θ) = B_1(θ) − B_0(θ) + B_2(θ) + B_3(θ) + B_0(θ), and each of B_1(θ) − B_0(θ), B_2(θ) and B_3(θ) is o_P(1/R) uniformly in ‖θ − θ_0‖ ≤ δ for any δ → 0, we have, for R̂_1(θ) = B(θ) − D̂_1'(θ − θ_0),
\[ \sup_{\|\theta-\theta_0\|\le\delta} \frac{R\,\hat R_1(\theta)}{1 + R\|\theta-\theta_0\|^2} = \sup_{\|\theta-\theta_0\|\le\delta} \frac{R\,B_0(\theta) - R\,\hat D_1'(\theta-\theta_0)}{1 + R\|\theta-\theta_0\|^2} + o_P(1) = o_P(1). \]
This together with (5) implies that Condition 6 of Theorem 2 is satisfied with D̂ = D̂_0 + D̂_1, since we can bound (2) by
\[ \sup_{\|\theta-\theta_0\|\le\delta} \frac{\hat R_0(\theta) + \hat R_1(\theta)}{1/m + \|\theta-\theta_0\|^2} \le \sup_{\|\theta-\theta_0\|\le\delta} \frac{\hat R_0(\theta)}{1/n + \|\theta-\theta_0\|^2} + \sup_{\|\theta-\theta_0\|\le\delta} \frac{\hat R_1(\theta)}{1/R + \|\theta-\theta_0\|^2}. \]
Finally, to verify Condition 5 in Theorem 2, write
\[ \sqrt{m}\,\hat D = \sqrt{\Big(1\wedge\frac{R}{n}\Big)}\; \frac{1}{\sqrt n}\sum_{i=1}^{n} D_0(z_i) + \sqrt{\Big(\frac{n}{R}\wedge 1\Big)}\; \frac{1}{\sqrt R}\sum_{r=1}^{R} D_1(\omega_r). \]
That √m D̂ →_d N(0, Σ) follows from 1 ∧ R/n → 1 ∧ κ, n/R ∧ 1 → 1/κ ∧ 1, the continuous mapping theorem, Slutsky's lemma, and CLTs applied to √n D̂_0 = (1/√n)∑_{i=1}^{n} D_0(z_i) and √R D̂_1 = (1/√R)∑_{r=1}^{R} D_1(ω_r). □
4.1 MSL Variance Estimation
A consistent estimate of the asymptotic variance can be formed from sample analogs. In general, each of
\[ H = P_n\, \frac{\partial^2}{\partial\theta\,\partial\theta'} \log Q_R\, q(\cdot,\cdot,\hat\theta), \qquad D_0(z_i) = \frac{\partial}{\partial\theta} \log \hat g(z_i,\hat\theta) \qquad\text{and}\qquad D_1(\omega_r) = \frac{\partial}{\partial\theta}\, P_n\, \frac{q(\omega_r,\cdot,\hat\theta)}{\hat g(\cdot,\hat\theta)} \]
cannot be computed analytically, and has to be replaced by numerical estimates:
\[ \hat H_{ij} = \frac{1}{4\varepsilon^2}\Big( P_n \log Q_R q(\cdot,\cdot,\hat\theta + e_i\varepsilon + e_j\varepsilon) - P_n \log Q_R q(\cdot,\cdot,\hat\theta - e_i\varepsilon + e_j\varepsilon) \]
\[ \qquad\qquad - P_n \log Q_R q(\cdot,\cdot,\hat\theta + e_i\varepsilon - e_j\varepsilon) + P_n \log Q_R q(\cdot,\cdot,\hat\theta - e_i\varepsilon - e_j\varepsilon) \Big), \]
\[ \hat D_{0j}(z_i) = \frac{1}{2h}\Big( \log\hat g(z_i,\hat\theta + e_j h) - \log\hat g(z_i,\hat\theta - e_j h) \Big), \]
\[ \hat D_{1j}(\omega_r) = \frac{1}{2h}\left( P_n\, \frac{q(\omega_r,\cdot,\hat\theta + e_j h)}{\hat g(\cdot,\hat\theta + e_j h)} - P_n\, \frac{q(\omega_r,\cdot,\hat\theta - e_j h)}{\hat g(\cdot,\hat\theta - e_j h)} \right). \]
Let, for κ = R/n,
\[ \hat\Sigma_h = P_n\, \hat D_0(\cdot)\hat D_0(\cdot)', \qquad \hat\Sigma_g = Q_R\, \hat D_1(\cdot)\hat D_1(\cdot)', \qquad \hat\Sigma = (1\wedge\kappa)\,\hat\Sigma_h + (1\wedge 1/\kappa)\,\hat\Sigma_g. \]
Under the given assumptions, if ε → 0, h → 0, √n h → ∞ and n^{1/4} ε → ∞, then Ĥ = H + o_p(1), Σ̂_h = Σ_h + o_p(1), and Σ̂_g = Σ_g + o_p(1). Hence Σ̂ = Σ + o_p(1) by the continuous mapping theorem.
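A hedged Python sketch of these finite difference estimates (the helper names are ours: loglik_hat(theta) returns the n-vector log ĝ(z_i, θ), and ratio(w, theta) returns the scalar P_n q(w, ·, θ)/ĝ(·, θ)):

```python
import numpy as np

def msl_variance(loglik_hat, ratio, theta_hat, omega, n, eps, h):
    """Finite-difference estimates of H, the scores D0 and D1, and Sigma for MSL."""
    k = theta_hat.size
    E = np.eye(k)
    H = np.empty((k, k))
    for i in range(k):          # second-order central differences for the Hessian
        for j in range(k):
            H[i, j] = (loglik_hat(theta_hat + eps * (E[i] + E[j])).mean()
                       - loglik_hat(theta_hat + eps * (E[j] - E[i])).mean()
                       - loglik_hat(theta_hat + eps * (E[i] - E[j])).mean()
                       + loglik_hat(theta_hat - eps * (E[i] + E[j])).mean()) / (4 * eps ** 2)
    D0 = np.column_stack([(loglik_hat(theta_hat + h * e) - loglik_hat(theta_hat - h * e))
                          / (2 * h) for e in E])                     # (n, k) scores
    D1 = np.column_stack([[(ratio(w, theta_hat + h * e) - ratio(w, theta_hat - h * e))
                           / (2 * h) for w in omega] for e in E])    # (R, k) scores
    Sigma_h, Sigma_g = D0.T @ D0 / n, D1.T @ D1 / len(omega)
    kappa = len(omega) / n
    return H, min(1, kappa) * Sigma_h + min(1, 1 / kappa) * Sigma_g
```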
5 MCMC
Simulated objective functions that are nonsmooth can be difficult to optimize by numerical methods. An alternative to optimizing the objective function is to run it through an MCMC routine, as in Chernozhukov and Hong (2003). Under the assumptions given in the previous sections, the MCMC Laplace estimators can also be shown to be consistent and asymptotically normal. The Laplace estimator is defined as
\[ \bar\theta = \operatorname*{argmin}_{\theta\in\Theta} \int \rho\big(\sqrt{m}\,(u - \theta)\big)\, \exp\big(m\hat L(u)\big)\, \pi(u)\, du. \]
In the above, ρ(·) is a convex symmetric loss function such that ρ(h) ≤ 1 + |h|^p for some p ≥ 1, and π(·) is a continuous density function with compact support that is positive at θ_0. The objective function can be either GMM,
\[ \hat L(\theta) = -\frac{1}{2}\, \big(P_n Q_R\, q(\cdot,\cdot,\theta)\big)'\, W_n\, \big(P_n Q_R\, q(\cdot,\cdot,\theta)\big), \]
or the log likelihood function L̂(θ) = (1/n)∑_{i=1}^{n} log ĝ(z_i, θ).
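For concreteness, a minimal random-walk Metropolis sketch of this estimator (implementation choices are ours: Gaussian proposals, quadratic loss ρ so that θ̄ is the posterior mean, and the prior π folded into a log_prior function):

```python
import numpy as np

def laplace_estimator(Lhat, log_prior, theta0, m, n_mcmc=20000, scale=0.1, seed=0):
    """Posterior mean/variance of the quasi-posterior exp(m * Lhat(theta)) * pi(theta),
    sampled by random-walk Metropolis in the spirit of Chernozhukov and Hong (2003)."""
    rng = np.random.default_rng(seed)
    theta, lp = theta0.copy(), m * Lhat(theta0) + log_prior(theta0)
    draws = np.empty((n_mcmc, theta0.size))
    for t in range(n_mcmc):
        prop = theta + scale * rng.standard_normal(theta0.size)
        lp_prop = m * Lhat(prop) + log_prior(prop)
        if np.log(rng.uniform()) < lp_prop - lp:        # Metropolis accept/reject
            theta, lp = prop, lp_prop
        draws[t] = theta
    draws = draws[n_mcmc // 2:]                         # discard burn-in
    return draws.mean(axis=0), np.cov(draws.T)          # theta_bar, estimate of J0^{-1}/m
```

No derivative of the nonsmooth simulated objective is ever needed, which is the point of the MCMC alternative.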
The asymptotic distribution of the posterior and of θ̄ follows immediately from Assumption 2, which leads to Theorem 1, and Chernozhukov and Hong (2003). Define h = √m(θ − θ̂), and consider the posterior distribution on the localized parameter space:
\[ p_n(h) = \frac{\pi\big(\hat\theta + \tfrac{h}{\sqrt m}\big)\, \exp\big(m\hat L(\hat\theta + h/\sqrt m) - m\hat L(\hat\theta)\big)}{C_m}, \]
where
\[ C_m = \int_{\hat\theta + h/\sqrt m \in \Theta} \pi\Big(\hat\theta + \frac{h}{\sqrt m}\Big)\, \exp\big(m\hat L(\hat\theta + h/\sqrt m) - m\hat L(\hat\theta)\big)\, dh. \]
Desirable properties of the MCMC method include the following, for any α > 0:
\[ \int |h|^\alpha\, |p_n(h) - p_\infty(h)|\, dh \xrightarrow{p} 0, \qquad\text{where}\qquad p_\infty(h) = \sqrt{\frac{|\det(J_0)|}{(2\pi)^{\dim\theta}}}\, \exp\Big(-\frac{1}{2}\, h' J_0 h\Big). \tag{7} \]
In the above, J_0 = G'WG for the GMM model and J_0 = −(∂²/∂θ∂θ') L(θ_0) for the likelihood model.
THEOREM 5 Under Assumption 2 for the GMM model, or Assumptions 2 and 8 and Conditions 1, 2, 4 of Theorem 2 for the MLE model, (7) holds. Consequently, √m(θ̄ − θ̂) → 0 in probability, and the variance of p_{n,R}(h) converges to J_0^{−1} in probability.
Proof For the GMM model, the stated results follow immediately from Assumption 2, which leads to Theorem 1, and Chernozhukov and Hong (2003) (CH). The MLE case is also almost identical to CH but requires a small modification. When Condition (6) in Theorem 2 holds for δ = o(1), the original proof shows (7) over the areas of integration {|h| ≤ √m δ} and {|h| ≥ δ√m} separately. When Condition 6 in Theorem 2 only holds for δ = a_m = (log m)^{−d}, we need to consider separately, for a fixed δ, {|h| ≤ √m a_m}, {√m a_m ≤ |h| ≤ √m δ} and {|h| ≥ δ√m}. The arguments for the first and third regions, {|h| ≤ √m a_m} and {|h| ≥ δ√m}, are identical to the ones in CH. Hence we only need to show that (since the prior density is assumed bounded around θ_0)
\[ \int_{\sqrt m\, a_m \le |h| \le \sqrt m\, \delta} \pi\Big(\hat\theta + \frac{h}{\sqrt m}\Big)\, \exp\big(m\hat L(\hat\theta + h/\sqrt m) - m\hat L(\hat\theta)\big)\, dh \xrightarrow{p} 0. \]
By the arguments that handle the term B in the proof of Theorem 4, in this region,
\[ \omega(h) \equiv m\hat L(\hat\theta + h/\sqrt m) - m\hat L(\hat\theta) = -\frac{1}{2}\,(1 + o_p(1))\, h' J_0 h + m\, O_p\Big(\frac{1}{\sqrt m}\Big). \]
Hence the left hand side integral can be bounded, up to a finite constant, by
\[ \int_{\sqrt m\, a_m \le |h| \le \sqrt m\, \delta} \exp(\omega(h))\, dh = \exp\big(O_p(\sqrt m)\big) \int_{\sqrt m\, a_m \le |h|} \exp\Big(-\frac{1}{2}\,(1 + o_p(1))\, h' J_0 h\Big)\, dh. \]
The tail of the normal distribution can be estimated, with probability tending to one, by
\[ \int_{\sqrt m\, a_m \le |h|} \exp\Big(-\frac{1}{2}\,(1 + o_p(1))\, h' J_0 h\Big)\, dh \le \int_{\sqrt m\, a_m \le |h|} \exp\Big(-\frac{1}{4}\, h' J_0 h\Big)\, dh \le C\,(\sqrt m\, a_m)^{-1} \exp(-m a_m^2), \]
for a_m ≫ m^{−α} for any α > 0; hence for some α > 0,
\[ \int_{\sqrt m\, a_m \le |h| \le \sqrt m\, \delta} \exp(\omega(h))\, dh \le C \exp\big(O_p(\sqrt m)\big)\, \big(m^{\frac{1}{2}-\alpha}\big)^{-1} \exp\big(-m^{1-2\alpha}\big) = o_p(1). \]
The rest of the proof is identical to CH. □
The MCMC method can always be used to obtain consistent and asymptotically normal parameter estimates. For the GMM model with W being the inverse of the asymptotic variance of √m ĝ(θ_0), or for the likelihood model where n ≫ R, the posterior distribution from the MCMC can also be used to obtain valid asymptotic confidence intervals for θ_0.

For the GMM model where W is not the inverse of the asymptotic variance of √m ĝ(θ_0), or for the likelihood model where R ≫ n or R ∼ n, the posterior distribution does not resemble the asymptotic distribution of θ̄ or θ̂. However, in this case the variance of the posterior distribution can still be used to estimate the inverse of the Hessian term, (G'WG)^{−1} or H(θ_0)^{−1}, in Condition (4) of Theorem 2.
6 Monte Carlo Simulations
In this section we report the results from a set of Monte Carlo simulations of a univariate probit model to illustrate the finite sample properties of the asymptotic distributions derived in this paper. The true data generating process is specified as
\[ y_i = 1\{\alpha_0 + x_i\beta_0 + \epsilon_i \ge 0\}, \qquad \epsilon_i \perp\!\!\!\perp x_i, \qquad \epsilon_i \overset{iid}{\sim} N(0,1). \]
Define z_i = (y_i, x_i) and x_i = (1, x_i), θ := (α, β)'. In the earlier notation the likelihood function is g(z_i, θ) = Φ(x_i'θ)^{y_i}(1 − Φ(x_i'θ))^{1−y_i}, where Φ(u) = ∫_{−∞}^{u} (1/√(2π)) e^{−v²/2} dv, and the true parameter θ_0 maximizes
\[ L(\theta) = E_{z_i} \log g(z_i,\theta) = E_{z_i}\big[ y_i \log\Phi(x_i'\theta) + (1-y_i)\log(1-\Phi(x_i'\theta)) \big]. \]
The likelihood function is simulated by ĝ(z_i, θ) = (1/R)∑_{r=1}^{R} q(z_i, ω_r, θ), where
\[ q(z_i, \omega_r, \theta) = 1\{x_i'\theta + \omega_r \ge 0\}^{y_i}\, \big(1 - 1\{x_i'\theta + \omega_r \ge 0\}\big)^{1-y_i}, \]
and ω_r ~ iid N(0, 1). Note that Qĝ(z_i, θ) = Qq(z_i, ·, θ) = g(z_i, θ). These functions fall into the class of multinomial discrete choice models studied in Pakes and Pollard (1989) and hence satisfy Assumptions 2 and 3. Assumption 6 is immediate. Assumptions 4 and 7 are satisfied because of the indicator function in q(z_i, ω_r, θ) when θ and z have bounded support. Furthermore, since g(z_i, θ) is differentiable, Assumptions 5 and 8 can be directly verified, with
\[ D_0(z_i) = \frac{y_i - \Phi(x_i'\theta_0)}{\Phi(x_i'\theta_0)\,(1-\Phi(x_i'\theta_0))}\, \phi(x_i'\theta_0)\, x_i \]
and
\[ D_1(\omega_r) = \int \frac{1\{x_i'\theta_0 + \omega_r \ge 0\} - \Phi(x_i'\theta_0)}{\Phi(x_i'\theta_0)\,(1-\Phi(x_i'\theta_0))}\, \phi(x_i'\theta_0)\, x_i\, f(x_i)\, dx_i. \]
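A hedged Python sketch of this design (our own code mirroring the formulas above; the paper uses Matlab's simulated annealing, for which we substitute a derivative-free Nelder-Mead search for brevity, and the log(0) floor is our own numerical safeguard):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

def simulate_data(n, alpha0=0.5, beta0=1.0):
    x = rng.standard_normal(n)
    y = (alpha0 + beta0 * x + rng.standard_normal(n) >= 0).astype(float)
    return y, np.column_stack([np.ones(n), x])     # design matrix with intercept

def ghat(theta, y, X, omega):
    """Crude frequency simulator (1/R) sum_r q(z_i, w_r, theta)."""
    ind = ((X @ theta)[:, None] + omega[None, :] >= 0).astype(float)   # (n, R)
    p1 = ind.mean(axis=1)                           # simulated P(y = 1 | x)
    return np.where(y == 1.0, p1, 1.0 - p1)

def neg_msl(theta, y, X, omega, floor=1e-12):
    return -np.mean(np.log(np.maximum(ghat(theta, y, X, omega), floor)))

n, R = 400, 400
y, X = simulate_data(n)
omega = rng.standard_normal(R)                      # one set of overlapping draws
res = minimize(neg_msl, x0=np.zeros(2), args=(y, X, omega), method="Nelder-Mead")
theta_smle = res.x
```

Because the simulated objective is a step function of θ, a derivative-free optimizer (or the MCMC approach of Section 5) is essential here.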
The simulated maximum likelihood estimator maximizes
\[ \hat L_{nR}(\theta) = \frac{1}{n}\sum_{i=1}^{n} \log \hat g(z_i,\theta) \]
and is computed using the simulated annealing routine in Matlab's global optimization toolbox. The starting value for the optimization is taken to be the OLS estimates. The numerical results are not sensitive to the choice of starting values when the temperature parameter in the simulated annealing routine is reduced sufficiently slowly. Obviously, this simple example can be estimated by the probit command in Stata. The goal of this section is to illustrate the finite sample properties of the simulated maximum likelihood estimator when we are agnostic about the normal distribution function and density function.
We compute an estimate of the asymptotic variance using the empirical analog of Theorem 2:
\[ \sqrt{m}\,\big(\hat\theta_{SMLE} - \theta\big) \overset{A}{\sim} N\Big(0,\; H^{-1}\big((1\wedge\kappa)\,\Sigma_0 + (1\wedge 1/\kappa)\,\Sigma_1\big)\, H^{-1}\Big), \]
where κ = R/n and m = min(R, n).
While analytical derivatives can easily be computed in this example, in practice the analytical derivatives of the likelihood function are usually unknown. In our baseline results, we are agnostic about the analytical derivatives and estimate the asymptotic variance using numerical differentiation:
\[ \hat\Sigma_0 = \frac{1}{n}\sum_{i=1}^{n} \hat D_0(z_i)\,\hat D_0(z_i)', \qquad \hat H = -\hat\Sigma_0, \]
\[ \hat\Sigma_1 = \frac{1}{R}\sum_{r=1}^{R} \hat D_1(\omega_r)\,\hat D_1(\omega_r)', \qquad \hat D_1(\omega_r) := -\frac{1}{n}\sum_{i=1}^{n} \frac{q(z_i,\omega_r,\hat\theta_{SMLE})}{\hat g(z_i,\hat\theta_{SMLE})}\, \hat D_0(z_i). \]
In the above, for an estimate of the derivative of ĝ(z_i, θ̂_{SMLE}) with respect to θ, denoted ∇ĝ(z_i, θ̂_{SMLE}), we define
\[ \hat D_0(z_i) := \frac{1}{\hat g(z_i,\hat\theta_{SMLE})}\, \nabla\hat g(z_i,\hat\theta_{SMLE}). \]
We use the index structure of g(z_i, θ) and numerical differentiation to obtain ∇ĝ(z_i, θ̂_{SMLE}). For this purpose we note that
\[ \frac{\partial g(z_i,\theta)}{\partial\theta} = \Big(-\frac{\partial}{\partial\theta}\Phi(-x_i'\theta)\Big)^{y_i}\Big(\frac{\partial}{\partial\theta}\Phi(-x_i'\theta)\Big)^{1-y_i} = (-1)^{1-y_i}\, x_i\, \phi(x_i'\theta). \]
Therefore we use
\[ \nabla\hat g(z_i,\hat\theta_{SMLE}) = (-1)^{1-y_i}\, x_i\, \hat\phi(x_i'\hat\theta_{SMLE}), \]
where, to be agnostic about knowledge of φ(·), we use a first order two-sided formula to define
\[ \hat\phi(w) \equiv \frac{1}{R}\sum_{r=1}^{R} \frac{1\{\omega_r \le w+\varepsilon\} - 1\{\omega_r \le w-\varepsilon\}}{2\varepsilon} = \frac{1}{2\varepsilon}\, \frac{\#\{w-\varepsilon \le \omega_r \le w+\varepsilon\}}{R}, \]
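In code, this kernel-free density estimate is one line (a sketch under the formula above, with the step size following the rule ε = R^{−α} used in the tables):

```python
import numpy as np

def phi_hat(w, omega, eps):
    """Two-sided frequency estimate of the density at w from draws omega:
    #{w - eps <= w_r <= w + eps} / (2 * eps * R)."""
    return np.mean(np.abs(omega - w) <= eps) / (2 * eps)

# Example: R = 10_000 standard normal draws, eps = R**(-1/15) as reported below.
rng = np.random.default_rng(0)
omega = rng.standard_normal(10_000)
print(phi_hat(0.0, omega, eps=10_000 ** (-1 / 15)))  # compare with 1/sqrt(2*pi) ~ 0.399
```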
where ε is a step size parameter. In the simulations we experiment with a range of step sizes ε = R^{−α}, where α ranges over [2, 3/2, 1, 3/4, 1/2, 1/3, 1/4, 1/8, 1/10, 1/15]. It turns out that the larger step sizes produce more accurate coverage for a larger range of n and R. Therefore we use α = 1/15 in the results reported in the following tables.
For comparison, we also provide the empirical coverage when analytical derivatives are used to compute H and Σ_0. Here
\[ \hat H = -\frac{1}{n}\sum_{i=1}^{n} \frac{\phi(x_i'\hat\theta)^2}{\Phi(x_i'\hat\theta)\,(1-\Phi(x_i'\hat\theta))}\, x_i x_i', \qquad \hat\Sigma_0 = -\hat H, \]
and
\[ \hat\Sigma_1 = \frac{1}{R}\sum_{r=1}^{R} \hat D_1(\omega_r)\,\hat D_1(\omega_r)', \qquad \hat D_1(\omega_r) = -\frac{1}{n}\sum_{i=1}^{n} \frac{q(z_i,\omega_r,\hat\theta)}{\hat g(z_i,\hat\theta)}\, \frac{\big(y_i - \Phi(x_i'\hat\theta)\big)\,\phi(x_i'\hat\theta)\, x_i}{\Phi(x_i'\hat\theta)\,(1-\Phi(x_i'\hat\theta))}. \]
Table 1 reports the empirical coverage of the 95% confidence interval constructed from the estimate of the asymptotic distribution using numerical derivatives, over 5000 Monte Carlo repetitions. The rows correspond to the sample size n and the columns to the ratio κ between R and n. The two rows for each sample size correspond to the intercept and the slope coefficient, respectively. The results show that the asymptotic distribution accurately represents the finite sample distribution when m = min(R, n) is not too small.

Table 1: Empirical coverage frequency for 95% confidence interval, numerical derivatives

n \ κ   0.2      0.5      0.8      1        2        5        10       20       50       100
50      0.9334   0.9428   0.9436   0.9458   0.9476   0.9508   0.9488   0.9526   0.9524   0.9546
        0.9946   0.9844   0.979    0.9736   0.9712   0.9672   0.9638   0.9656   0.9676   0.9682
100     0.9254   0.9266   0.9342   0.9326   0.9356   0.9342   0.9378   0.939    0.939    0.9408
        0.983    0.9628   0.9564   0.956    0.9536   0.9538   0.9524   0.9536   0.9514   0.9558
200     0.9342   0.9388   0.9378   0.9382   0.938    0.937    0.9408   0.9366   0.9424   0.943
        0.9672   0.9528   0.9462   0.9456   0.9442   0.9472   0.9466   0.9462   0.9474   0.9488
400     0.9448   0.9412   0.9388   0.9398   0.9358   0.9388   0.936    0.9362   0.9346   0.9378
        0.9512   0.9442   0.9516   0.9472   0.9498   0.9482   0.9514   0.9528   0.9512   0.9536
800     0.9438   0.9442   0.9384   0.9386   0.9458   0.9412   0.9454   0.9408   0.9422   0.9434
        0.9468   0.9462   0.9394   0.9442   0.948    0.9518   0.9512   0.9492   0.9538   0.9508

The total number of Monte Carlo repetitions is 5000.

Table 2 reports the false empirical coverage of the 95% confidence interval when the simulation noise is ignored in the asymptotic distribution of the estimator. As expected, when R/n is large, in particular above 10, the improvement from accounting for Σ_1 in the asymptotic distribution is very small. When R/n is very small, the size distortion from ignoring Σ_1 is very sizable. The size distortion is quite visible when R/n is as big as 2, and still visible even when R/n = 5.

Table 2: False empirical coverage frequency for 95% confidence interval, numerical derivatives

n \ κ   0.2      0.5      0.8      1        2        5        10       20       50       100
50      0.662    0.7822   0.8344   0.8504   0.8954   0.9298   0.939    0.9474   0.9508   0.9538
        0.9614   0.9578   0.9566   0.9562   0.9626   0.964    0.9622   0.9652   0.9672   0.968
100     0.594    0.742    0.799    0.8236   0.878    0.9136   0.9282   0.9352   0.9362   0.9402
        0.9246   0.9348   0.9414   0.9406   0.9456   0.951    0.9518   0.9534   0.951    0.9554
200     0.5892   0.737    0.8016   0.8232   0.8784   0.9154   0.9292   0.931    0.9384   0.9426
        0.9172   0.931    0.9338   0.9334   0.9378   0.9452   0.9454   0.9454   0.9474   0.9486
400     0.5744   0.727    0.794    0.8218   0.8734   0.9156   0.9224   0.9292   0.932    0.9362
        0.9134   0.9308   0.9412   0.9394   0.9472   0.947    0.9508   0.9524   0.9512   0.9536
800     0.575    0.7302   0.8022   0.8182   0.8828   0.9192   0.9314   0.9362   0.9394   0.942
        0.9242   0.9362   0.9328   0.9376   0.9442   0.9498   0.9504   0.949    0.9538   0.9508

The total number of Monte Carlo repetitions is 5000.

Tables 3 and 4 report the counterparts of Tables 1 and 2 when analytical derivatives are used instead to compute the asymptotic variances. In Table 3, we see that using analytical derivatives does not necessarily give more accurate coverage than using numerical derivatives. The results in Table 4 are similar to those in Table 2: ignoring the variance due to simulation when R is smaller than n can lead to significant errors in the confidence interval.

Table 3: Empirical coverage frequency for 95% confidence interval, analytical derivatives

n \ κ   0.2      0.5      0.8      1        2        5        10       20       50       100
50      0.9956   0.953    0.9542   0.9574   0.9646   0.9662   0.9604   0.9542   0.9518   0.9526
        0.9944   0.9824   0.976    0.9724   0.9694   0.9582   0.9586   0.9558   0.9574   0.9582
100     0.94     0.943    0.9522   0.9572   0.9576   0.9436   0.9402   0.9422   0.9402   0.9408
        0.9856   0.9722   0.9628   0.962    0.9534   0.952    0.9506   0.952    0.9528   0.954
200     0.9326   0.9472   0.9572   0.951    0.9464   0.9398   0.9428   0.9378   0.9418   0.9422
        0.9742   0.9568   0.9446   0.9456   0.9392   0.9434   0.9412   0.9418   0.9432   0.9436
400     0.9356   0.9512   0.954    0.9494   0.9346   0.941    0.9368   0.9374   0.9352   0.9376
        0.9638   0.9488   0.9514   0.9498   0.9482   0.9504   0.9492   0.9514   0.9526   0.9526
800     0.9416   0.948    0.943    0.943    0.9458   0.9416   0.9452   0.9422   0.9418   0.9434
        0.9542   0.9462   0.9402   0.943    0.9468   0.9496   0.9502   0.95     0.95     0.9504

The total number of Monte Carlo repetitions is 5000.
Empirically, choosing an optimal step size for numerical gradient calculation can be difficult and depends on knowledge of the underlying function to be simulated. Without this knowledge, we recommend a rule of thumb of the form Cn^{−α} with α < 1/2 for the step size choice, where we have chosen α = 1/15 in this simulation example. The optimal choice of C and α is a challenging open theoretical question in this context, which is beyond what we are able to obtain in this paper. We can also adopt Silverman's rule of thumb or use cross-validation methods for data driven automated bandwidth selection. However, these methods are designed for nonparametric curve fitting; they are not known to be optimal in semiparametric problems such as the variance estimation that we consider. We conducted a small experiment in which we looked for the step size that minimizes the difference between the numerical and the analytical variance in each Monte Carlo trial across the above values of α, and found substantial variation in the best step size in this sense, which is tabulated in Table 7. However, minimizing this difference does not seem to translate into coverage accuracy. Comparing Table 1 (for α = 1/15) with Table 8 (for α = 1/2) shows that the larger step size produces more accurate confidence intervals than the smaller step size in general. We leave theoretical guidance on the properties of the step sizes for future research.

Table 4: False empirical coverage frequency for 95% confidence interval, analytical derivatives

n \ κ   0.2      0.5      0.8      1        2        5        10       20       50       100
50      0.6676   0.7916   0.842    0.8606   0.9002   0.9314   0.9424   0.9474   0.9504   0.9508
        0.9186   0.9316   0.9394   0.9364   0.9472   0.9506   0.9546   0.9534   0.9566   0.958
100     0.6056   0.754    0.8064   0.8274   0.88     0.9166   0.9298   0.937    0.938    0.9392
        0.91     0.922    0.9322   0.932    0.9418   0.9488   0.9488   0.9508   0.9528   0.954
200     0.5948   0.7442   0.8052   0.8258   0.8804   0.9156   0.9288   0.9318   0.9396   0.9412
        0.9114   0.9224   0.9244   0.9272   0.9322   0.941    0.9404   0.9414   0.9428   0.9436
400     0.5736   0.7236   0.795    0.8234   0.8758   0.916    0.9232   0.93     0.9324   0.9364
        0.9132   0.9294   0.94     0.9368   0.9446   0.9498   0.9482   0.9512   0.9526   0.9524
800     0.5724   0.7308   0.8028   0.8226   0.8826   0.919    0.9316   0.9364   0.9394   0.942
        0.9198   0.9338   0.9338   0.936    0.9428   0.9486   0.9494   0.949    0.9498   0.9504

The total number of Monte Carlo repetitions is 5000.
Overlapping draws are applicable in situations in which independent draws are not computationally practicable, or with nonsmooth moment conditions for which the theoretical validity of independent draws is more difficult to establish and beyond the scope of the current paper. In spite of the lack of a theoretical proof, in Tables 5 and 6 we report the counterparts with independent draws. When R is relatively small compared to n, the point estimate is sufficiently biased that the confidence interval does not center on the true parameter value. A larger R reduces the bias and leads to more accurate empirical coverage frequencies. In particular, the independent draws method does not perform well relative to overlapping draws when both n and R/n are small. In unreported simulations we also experimented with the EM algorithm. While the EM algorithm converges quickly, we were not able to obtain empirical coverage frequencies that are close to the nominal level.

We also report the mean bias and root mean square error for both overlapping and independent draws in Tables 9 to 12. Especially for small n and R/n, overlapping draws tend to have smaller bias. While the intercept term tends to have a larger RMSE with overlapping draws, the larger variance can be accounted for in constructing confidence intervals.

Table 5: False empirical coverage frequency for 95% confidence interval, analytical, independent draws

n \ κ   0.2      0.5      0.8      1        2        5        10       20       50       100
50      0.7816   0.9182   0.9208   0.922    0.9332   0.942    0.9506   0.9506   0.9502   0.948
        0.912    0.9266   0.9384   0.94     0.9434   0.9506   0.9514   0.9558   0.957    0.9598
100     0.8838   0.908    0.8992   0.9068   0.9208   0.9366   0.9354   0.9358   0.9392   0.9426
        0.9102   0.9184   0.9288   0.9254   0.9378   0.945    0.951    0.9518   0.952    0.9542
200     0.8622   0.8966   0.9108   0.914    0.9198   0.9352   0.937    0.9414   0.9426   0.941
        0.8912   0.9152   0.918    0.9266   0.9356   0.9378   0.9422   0.9448   0.9454   0.9458
400     0.8764   0.9056   0.9188   0.9214   0.929    0.934    0.9338   0.9372   0.9386   0.9378
        0.9068   0.9256   0.9366   0.936    0.9426   0.9486   0.9486   0.9528   0.952    0.9528
800     0.9056   0.9228   0.9288   0.932    0.9352   0.9406   0.9422   0.9454   0.9448   0.9442
        0.9116   0.9332   0.9378   0.94     0.9428   0.9488   0.9476   0.9498   0.9496   0.9488

The total number of Monte Carlo repetitions is 5000.

Table 6: False empirical coverage frequency for 95% confidence interval, numerical, independent draws

n \ κ   0.2      0.5      0.8      1        2        5        10       20       50       100
50      0.2892   0.7306   0.8178   0.8348   0.8944   0.9202   0.935    0.94     0.9462   0.9488
        0.417    0.749    0.853    0.8786   0.9252   0.947    0.9524   0.9614   0.9602   0.9652
100     0.5322   0.8216   0.855    0.8704   0.8986   0.9196   0.9276   0.9324   0.9358   0.9382
        0.5858   0.8462   0.8936   0.898    0.9272   0.9342   0.9434   0.9494   0.9516   0.9534
200     0.75     0.8646   0.8908   0.8978   0.9128   0.9316   0.934    0.9386   0.9404   0.9384
        0.7778   0.893    0.9052   0.9156   0.9302   0.9334   0.9438   0.9456   0.9482   0.9482
400     0.832    0.8932   0.9076   0.917    0.926    0.9324   0.9324   0.9376   0.9376   0.9378
        0.8722   0.914    0.9292   0.9316   0.9402   0.9474   0.9476   0.9522   0.9522   0.9532
800     0.8908   0.9196   0.924    0.9302   0.9334   0.94     0.9422   0.944    0.9458   0.9442
        0.901    0.9318   0.936    0.9368   0.9426   0.9498   0.9472   0.95     0.9488   0.9504

The total number of Monte Carlo repetitions is 5000.

Table 7: Step size parameter (α) minimizing difference between numerical and asymptotic variance in each trial of (n, κ)

n \ κ   0.2     0.5     0.8     1       2       5       10      20      50      100
50      0.75    0.75    0.75    0.5     0.5     0.5     0.5     0.5     0.5     0.5
100     0.75    0.75    0.75    0.5     0.5     0.5     0.5     0.5     0.5     0.5
200     0.75    0.75    0.75    0.5     0.5     0.5     0.5     0.067   0.067   0.067
400     0.75    0.75    0.75    0.5     0.5     0.067   0.067   0.067   0.067   0.067
800     0.75    0.75    0.75    0.125   0.067   0.067   0.067   0.067   0.067   0.067

The total number of Monte Carlo repetitions is 5000.

Table 8: Coverage with α = 1/2, overlapping, correct variance

n \ κ   0.2      0.5      0.8      1        2        5        10       20       50       100
50      0.854    0.871    0.881    0.891    0.910    0.932    0.937    0.947    0.951    0.954
        0.963    0.949    0.948    0.945    0.953    0.956    0.958    0.964    0.967    0.968
100     0.846    0.873    0.890    0.891    0.907    0.927    0.930    0.938    0.938    0.941
        0.948    0.939    0.935    0.940    0.940    0.945    0.948    0.953    0.951    0.954
200     0.863    0.890    0.906    0.906    0.915    0.928    0.936    0.932    0.939    0.941
        0.944    0.935    0.929    0.931    0.932    0.939    0.940    0.944    0.945    0.949
400     0.887    0.900    0.912    0.920    0.922    0.934    0.930    0.935    0.935    0.935
        0.942    0.936    0.941    0.939    0.944    0.943    0.948    0.951    0.949    0.953
800     0.897    0.914    0.922    0.915    0.936    0.935    0.941    0.939    0.942    0.942
        0.945    0.940    0.934    0.938    0.942    0.948    0.947    0.947    0.952    0.950

The total number of Monte Carlo repetitions is 5000.
Table 9: Mean Bias, overlapping draws

n \ κ   0.2       0.5       0.8       1         2         5         10        20        50        100
50      0.0636    0.0777    0.0782    0.077     0.0726    0.0639    0.0637    0.0592    0.0579    0.0575
        0.0027    0.0015    0.00066   -0.00051  0.0010    0.00036   -5.4E-05  0.00048   -2.2E-05  0.00022
100     0.0323    0.033     0.0331    0.0312    0.0299    0.0279    0.0269    0.027     0.0256    0.0257
        0.0009    0.0005    0.0004    0.0004    0.0006    0.0011    0.0005    0.0006    0.0007    0.0005
200     0.0129    0.0139    0.015     0.0147    0.0145    0.0142    0.014     0.0127    0.0131    0.0131
        -0.0009   -0.0011   -0.0008   -0.00065  -0.00078  -0.00067  -0.0009   -0.0009   -0.00104  -0.0011
400     0.0056    0.0047    0.0062    0.0051    0.0062    0.0062    0.0051    0.0052    0.0052    0.0052
        -0.00035  -0.0003   -0.0004   -0.0004   -0.00045  -0.00053  -0.00053  -0.00066  -0.00059  -0.00057
800     0.0023    0.0025    0.0023    0.0029    0.0029    0.0026    0.0024    0.0028    0.0025    0.0026
        -0.00023  -0.00033  -0.00023  -0.00038  -0.00034  -0.00031  -0.00039  -0.00040  -0.00049  -0.00045

The total number of Monte Carlo repetitions is 5000.

Table 10: Mean Bias, independent draws

n \ κ   0.2       0.5       0.8       1         2         5         10        20        50        100
50      -0.16800  0.02191   0.05585   0.05852   0.05531   0.06072   0.06648   0.06469   0.06100   0.05951
        0.00202   0.00049   0.00000   -0.00143  0.00001   0.00059   -0.00016  -0.00009  0.00046   -0.00002
100     -0.06624  0.00892   0.01600   0.02348   0.03354   0.03154   0.02970   0.02825   0.02566   0.02617
        -0.00037  0.00120   0.00018   0.00101   0.00005   0.00109   0.00091   0.00058   0.00061   0.00067
200     -0.02114  0.01151   0.01541   0.01802   0.01637   0.01534   0.01412   0.01387   0.01352   0.01381
        -0.00156  -0.00128  -0.00135  -0.00106  -0.00110  -0.00104  -0.00089  -0.00111  -0.00096  -0.00084
400     -0.00027  0.00542   0.00358   0.00488   0.00515   0.00538   0.00487   0.00527   0.00567   0.00506
        -0.00094  -0.00046  -0.00049  -0.00005  -0.00059  -0.00055  -0.00066  -0.00082  -0.00050  -0.00061
800     -0.00361  0.00095   0.00253   0.00220   0.00267   0.00242   0.00309   0.00313   0.00267   0.00284
        -0.00099  -0.00057  -0.00033  -0.00050  -0.00040  -0.00058  -0.00055  -0.00041  -0.00040  -0.00038

The total number of Monte Carlo repetitions is 5000.
Table 11: Root Mean Square Error, overlapping draws

n \ κ   0.2      0.5      0.8      1        2        5        10       20       50       100
50      0.5407   0.5407   0.5407   0.5407   0.5407   0.5407   0.5407   0.5407   0.5407   0.5407
        0.1577   0.1533   0.1489   0.1489   0.1398   0.1347   0.1327   0.1301   0.1287   0.1284
100     0.3832   0.2829   0.2523   0.2386   0.2092   0.1878   0.1787   0.1749   0.1718   0.1711
        0.1024   0.0954   0.0923   0.0909   0.0879   0.0857   0.0844   0.0833   0.0828   0.0825
200     0.2632   0.1929   0.1690   0.1592   0.1397   0.1260   0.1204   0.1181   0.1153   0.1148
        0.0660   0.0630   0.0618   0.0611   0.0598   0.0583   0.0578   0.0573   0.0569   0.0566
400     0.1851   0.1350   0.1175   0.1106   0.0972   0.0872   0.0844   0.0822   0.0814   0.0804
        0.0449   0.0427   0.0413   0.0415   0.0404   0.0395   0.0391   0.0388   0.0386   0.0386
800     0.1316   0.0944   0.0819   0.0780   0.0670   0.0604   0.0579   0.0567   0.0562   0.0557
        0.0305   0.0292   0.0289   0.0286   0.0280   0.0274   0.0272   0.0272   0.0270   0.0270

The total number of Monte Carlo repetitions is 5000.

Table 12: Root Mean Square Error, independent draws

n \ κ   0.2      0.5      0.8      1        2        5        10       20       50       100
50      0.3204   0.2869   0.2987   0.3025   0.2864   0.2756   0.2712   0.2694   0.2661   0.2640
        0.1384   0.1427   0.1418   0.1407   0.1390   0.1340   0.1323   0.1299   0.1293   0.1280
100     0.1862   0.1902   0.1931   0.1947   0.1851   0.1773   0.1753   0.1731   0.1708   0.1700
        0.0936   0.0939   0.0924   0.0925   0.0886   0.0855   0.0841   0.0833   0.0824   0.0823
200     0.1407   0.1336   0.1293   0.1260   0.1221   0.1171   0.1163   0.1150   0.1148   0.1152
        0.0680   0.0633   0.0624   0.0605   0.0592   0.0586   0.0573   0.0569   0.0567   0.0568
400     0.0971   0.0894   0.0853   0.0849   0.0821   0.0814   0.0814   0.0810   0.0804   0.0803
        0.0456   0.0417   0.0413   0.0406   0.0395   0.0392   0.0391   0.0389   0.0387   0.0385
800     0.0629   0.0595   0.0587   0.0577   0.0568   0.0559   0.0556   0.0549   0.0551   0.0551
        0.0309   0.0287   0.0285   0.0283   0.0280   0.0274   0.0273   0.0271   0.0271   0.0270

The total number of Monte Carlo repetitions is 5000.
7 Conclusion
We provide an asymptotic theory for simulated GMM and simulated MLE with nonsmooth simulated objective functions. The total number of simulations, R, has to increase without bound but can be much smaller than the total number of observations. In this case, the error in the parameter estimates is dominated by the simulation errors. This is a necessary cost of inference when the simulation model is computationally intensive.
References
Andrews, D. (1994): "Empirical Process Methods in Econometrics," in Handbook of Econometrics, Vol. 4, ed. by R. Engle and D. McFadden, pp. 2248–2292. North Holland.

Chen, X., O. Linton, and I. Van Keilegom (2003): "Estimation of Semiparametric Models when the Criterion Function Is Not Smooth," Econometrica, 71(5), 1591–1608.

Chernozhukov, V., and H. Hong (2003): "An MCMC Approach to Classical Estimation," Journal of Econometrics, 115(2), 293–346.

Duffie, D., and K. J. Singleton (1993): "Simulated Moments Estimation of Markov Models of Asset Prices," Econometrica, 61(4), 929–952.

Gallant, R., and G. Tauchen (1996): "Which Moments to Match," Econometric Theory, 12, 363–390.

Gourieroux, C., and A. Monfort (1996): Simulation-Based Econometric Methods. Oxford University Press, USA.

Ichimura, H., and S. Lee (2010): "Characterization of the Asymptotic Distribution of Semiparametric M-Estimators," Journal of Econometrics, 159(2), 252–266.

Kristensen, D., and B. Salanie (2010): "Higher Order Improvements for Approximate Estimators," CAM Working Papers.

Laroque, G., and B. Salanie (1989): "Estimation of Multi-Market Fix-Price Models: An Application of Pseudo Maximum Likelihood Methods," Econometrica, 57, 831–860.

Laroque, G., and B. Salanie (1993): "Simulation-Based Estimation of Models with Lagged Latent Variables," Journal of Applied Econometrics, 8(S1), S119–S133.

Lee, D., and K. Song (2015): "Simulated MLE for Discrete Choices using Transformed Simulated Frequencies," Journal of Econometrics, 187, 131–153.

Lee, L. (1992): "On Efficiency of Methods of Simulated Moments and Maximum Simulated Likelihood Estimation of Discrete Response Models," Econometric Theory, 8, 518–552.

Lee, L. (1995): "Asymptotic Bias in Simulated Maximum Likelihood Estimation of Discrete Choice Models," Econometric Theory, 11, 437–483.

Lerman, S., and C. Manski (1981): "On the Use of Simulated Frequencies to Approximate Choice Probabilities," in Structural Analysis of Discrete Data with Econometric Applications, ed. by C. Manski and D. McFadden. MIT Press.

McFadden, D. (1989): "A Method of Simulated Moments for Estimation of Discrete Response Models without Numerical Integration," Econometrica, 57, 995–1026.

Neumeyer, N. (2004): "A Central Limit Theorem for Two-Sample U-Processes," Statistics & Probability Letters, 67, 73–85.

Newey, W., and D. McFadden (1994): "Large Sample Estimation and Hypothesis Testing," in Handbook of Econometrics, Vol. 4, ed. by R. Engle and D. McFadden, pp. 2113–2241. North Holland.

Nolan, D., and D. Pollard (1987): "U-Processes: Rates of Convergence," The Annals of Statistics, pp. 780–799.

Nolan, D., and D. Pollard (1988): "Functional Limit Theorems for U-Processes," The Annals of Probability, pp. 1291–1298.

Pakes, A., and D. Pollard (1989): "Simulation and the Asymptotics of Optimization Estimators," Econometrica, 57, 1027–1057.

Pollard, D. (1984): Convergence of Stochastic Processes. Springer-Verlag.

Sherman, R. P. (1993): "The Limiting Distribution of the Maximum Rank Correlation Estimator," Econometrica, 61, 123–137.

Stern, S. (1992): "A Method for Smoothing Simulated Moments of Discrete Probabilities in Multinomial Probit Models," Econometrica, 60(4), 943–952.

Train, K. (2003): Discrete Choice Methods with Simulation. Cambridge University Press.

van der Vaart, A. (1999): Asymptotic Statistics. Cambridge University Press, Cambridge, UK.

Van der Vaart, A. W., and J. A. Wellner (1996): Weak Convergence and Empirical Processes. Springer-Verlag, New York.