
TESTING A PARAMETRIC MODEL AGAINST A NONPARAMETRIC ALTERNATIVE WITH IDENTIFICATION THROUGH INSTRUMENTAL VARIABLES

by

Joel L. Horowitz Department of Economics Northwestern University

Evanston, IL 60208 USA

January 2004

ABSTRACT

This paper is concerned with inference about a function $g$ that is identified by a conditional moment restriction involving instrumental variables. The paper presents the first test of the hypothesis that $g$ belongs to a finite-dimensional parametric family against a nonparametric alternative. The test does not require nonparametric estimation of $g$ and is not subject to the ill-posed inverse problem of nonparametric instrumental variables estimation. Under mild conditions, the test is consistent against any alternative model. Moreover, it has power exceeding the probability of rejecting a correct null hypothesis uniformly over a class of alternatives whose distance from the null hypothesis is $O(n^{-1/2})$, where $n$ is the sample size.

Keywords: Hypothesis test, instrumental variables, specification testing, consistent testing

_____________________________________________________________________________

Part of this research was carried out while I was a visitor at the Centre for Microdata Methods and Practice, University College London. I thank Richard Blundell for many helpful discussions and comments. Research supported in part by NSF Grant SES 9910925.


1. INTRODUCTION

Let $Y$ be a scalar random variable, $X$ and $W$ be continuously distributed random scalars or vectors, and $g$ be a function that is identified by the relation

$$E[Y - g(X) \mid W] = 0. \tag{1.1}$$

In (1.1), $Y$ is the dependent variable, $X$ is a possibly endogenous explanatory variable, and $W$ is an instrument for $X$. This paper presents, for the first time, a test of the null hypothesis that $g$ in (1.1) belongs to a finite-dimensional parametric family against a nonparametric alternative hypothesis. Specifically, let $\Theta$ be a compact subset of $\mathbb{R}^d$ for some finite integer $d > 0$. The null hypothesis, $H_0$, is that

$$g(x) = G(x, \theta) \tag{1.2}$$

for some $\theta \in \Theta$ and almost every $x$, where $G$ is a known function. The alternative hypothesis, $H_1$, is that there is no $\theta \in \Theta$ such that (1.2) holds for almost every $x$. Under mild conditions, the test presented here is consistent against any alternative model. Moreover, in large samples the test has power exceeding the probability of rejecting a correct $H_0$ uniformly over a class of alternative models whose "distance" from $H_0$ is $O(n^{-1/2})$, where $n$ is the sample size.

There has been much recent interest in nonparametric estimation of $g$ in (1.1). See, for example, Newey, Powell, and Vella (1999); Newey and Powell (2003); Darolles, Florens, and Renault (2002); Blundell, Chen, and Kristensen (2003); and Hall and Horowitz (2003). However, methods for testing a parametric model of $g$ against a nonparametric alternative do not yet exist. This paper presents the first such test. In contrast, there is a large literature on testing a parametric model of a conditional mean or quantile function against a nonparametric alternative.¹

Testing is particularly important in (1.1) because it provides the only currently available form of inference about $g$ that does not require $g$ to be known up to a finite-dimensional parameter. Obtaining the asymptotic distribution of a nonparametric estimator of $g$ is very difficult, and no existing estimator has a known asymptotic distribution. Nor is there a currently known method for obtaining a nonparametric confidence band for $g$. By contrast, the test statistic described in this paper has a relatively simple asymptotic distribution, and implementation of the test is not difficult.


The test developed here does not require nonparametric estimation of $g$ and, therefore, is not affected by the ill-posed inverse problem of nonparametric instrumental variables estimation. Consequently, the "precision" of the test is greater than that of any nonparametric estimator of $g$. The rate of convergence in probability of a nonparametric estimator of $g$ is always slower than $O_p(n^{-1/2})$ and, depending on the details of the probability distribution of $(Y, X, W)$, may be slower than $O_p(n^{-\varepsilon})$ for any $\varepsilon > 0$ (Hall and Horowitz 2003). In contrast, the test described in this paper can detect a large class of nonparametric alternative models whose distance from the null-hypothesis model is $O(n^{-1/2})$. Nonparametric estimation and testing of conditional mean and median functions is another setting in which the rate of testing is faster than the fastest possible rate of estimation. See, for example, Guerre and Lavergne (2002) and Horowitz and Spokoiny (2001, 2002).

Section 2 of this paper presents the test for the special case in which $X$ and $W$ are scalars. The extension to multivariate $X$ and $W$ is in Section 3. Section 4 presents the results of a Monte Carlo investigation of the finite-sample performance of the test, and Section 5 presents an illustrative application of the test to real data. The proofs of theorems are in the appendix.

2. THE TEST WHEN X AND W ARE SCALARS

The assumption that $X$ and $W$ are scalars enables the main ideas of this paper to be presented with a minimum of notational and technical complexity. Rewrite (1.1) as

$$Y = g(X) + U; \quad E(U \mid W) = 0, \tag{2.1}$$

where $U = Y - g(X)$. Assume that the support of $(X, W)$ is contained within the unit square. This assumption can always be satisfied by carrying out a monotone transformation of $(X, W)$. The data, $\{Y_i, X_i, W_i : i = 1, \ldots, n\}$, are an independent random sample of $(Y, X, W)$.

2.1 The Test Statistic

To develop the test statistic, let $f_{XW}$ denote the probability density function of $(X, W)$. Define the operator $T$ on $L_2[0,1]$ by

$$(T\nu)(z) = \int t(x, z)\, \nu(x)\, dx,$$

where $\nu$ is any square integrable function and

$$t(x, z) = \int f_{XW}(x, w)\, f_{XW}(z, w)\, dw.$$


Assume that $T$ is nonsingular. Consider two functions $G_1(x, \theta)$ and $G_2(x, \theta)$ to be equal if they differ only on a set of $x$ values with Lebesgue measure 0. Then $H_0$ is equivalent to

$$S(z) \equiv [T(g - G(\cdot, \theta))](z) = 0 \tag{2.2}$$

for some $\theta \in \Theta$ and almost every $z \in [0,1]$. $H_1$ is equivalent to the statement that there is no $\theta$ such that (2.2) holds. A test statistic can be based on a sample analog of

$$\int_0^1 S(z)^2\, dz. \tag{2.3}$$

To form the analog, let $\hat f_{XW}^{(-i)}$ denote a leave-observation-$i$-out kernel estimator of $f_{XW}$. That is, for a kernel function $K$ and bandwidth $h$,

$$\hat f_{XW}^{(-i)}(x, w) = \frac{1}{(n-1)h^2} \sum_{j=1,\, j \ne i}^{n} K\!\left(\frac{x - X_j}{h}\right) K\!\left(\frac{w - W_j}{h}\right).$$

Let $\hat\theta_n$ be an estimator of $\theta$ that is consistent under $H_0$. Then the sample analog of $S(z)$ is

$$S_n(z) = n^{-1/2} \sum_{i=1}^{n} [Y_i - G(X_i, \hat\theta_n)]\, \hat f_{XW}^{(-i)}(z, W_i). \tag{2.4}$$

The test statistic is

$$\tau_n = \int_0^1 S_n(z)^2\, dz. \tag{2.5}$$

$H_0$ is rejected if $\tau_n$ is large.
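The statistic in (2.4)-(2.5) is straightforward to compute once the integral over $z$ is replaced by a quadrature rule. The sketch below is a minimal illustration, not the paper's implementation. It assumes a triweight kernel (one symmetric, twice continuously differentiable density on $[-1,1]$, consistent with Assumption 5(i)), a midpoint rule with `m` points on $[0,1]$, $c_h = 1$ in $h = c_h n^{-1/6}$, and a linear $G$ estimated by just-identified instrumental variables. The helper names `triweight` and `tau_n` and the simulated design are hypothetical.

```python
import numpy as np


def triweight(u):
    """Triweight kernel (35/32)(1 - u^2)^3 on [-1, 1]: symmetric, twice continuously differentiable."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, (35.0 / 32.0) * (1.0 - u**2) ** 3, 0.0)


def tau_n(Y, X, W, G, theta_hat, h, m=200):
    """Midpoint-rule approximation to tau_n = int_0^1 S_n(z)^2 dz from (2.4)-(2.5)."""
    n = len(Y)
    z = (np.arange(m) + 0.5) / m                   # midpoints of m equal cells in [0, 1]
    Kx = triweight((z[:, None] - X[None, :]) / h)  # Kx[a, j] = K((z_a - X_j) / h)
    Kw = triweight((W[:, None] - W[None, :]) / h)  # Kw[i, j] = K((W_i - W_j) / h)
    np.fill_diagonal(Kw, 0.0)                      # drop j = i (leave-one-out)
    f_loo = Kx @ Kw.T / ((n - 1) * h**2)           # f_loo[a, i] = f_XW^(-i)(z_a, W_i)
    resid = Y - G(X, theta_hat)                    # Y_i - G(X_i, theta_hat)
    S = f_loo @ resid / np.sqrt(n)                 # S_n(z_a)
    return np.mean(S**2)                           # (1/m) * sum_a S_n(z_a)^2


def G(x, th):
    """Hypothetical linear null model: intercept plus slope."""
    return th[0] + th[1] * x


# Hypothetical design: X is endogenous through e, E(U | W) = 0, and H0 (linear g) is true.
rng = np.random.default_rng(0)
n = 500
W = rng.uniform(size=n)
e = rng.uniform(size=n)
X = 0.5 * (W + e)
U = 0.3 * (e - 0.5) + 0.1 * rng.standard_normal(n)
Y = 0.5 * X + U

W_tilde = np.column_stack([np.ones(n), W])         # instruments (1, W)
G_theta = np.column_stack([np.ones(n), X])         # dG/dtheta for the linear model
theta_hat = np.linalg.solve(W_tilde.T @ G_theta, W_tilde.T @ Y)  # just-identified IV
h = n ** (-1 / 6)                                  # c_h = 1 is an assumption
tau = tau_n(Y, X, W, G, theta_hat, h)
```

The dense $m \times n$ and $n \times n$ kernel matrices keep the code short; for large $n$ one would exploit the compact support of $K$ instead.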

2.2 Regularity Conditions

This section states the assumptions that are used to obtain the asymptotic properties of $\tau_n$ under the null and alternative hypotheses. The following additional notation is used. Let $\|(x_1, w_1) - (x_2, w_2)\|$ denote the Euclidean distance between the points $(x_1, w_1)$ and $(x_2, w_2)$ in $[0,1]^2$. Let $D^j f_{XW}$ denote any $j$'th partial or mixed partial derivative of $f_{XW}$. Set $D^0 f_{XW}(x, w) = f_{XW}(x, w)$. The assumptions are as follows.

1. (i) The support of $(X, W)$ is contained in $[0,1]^2$. (ii) $(X, W)$ has a probability density function $f_{XW}(x, w)$ with respect to Lebesgue measure. (iii) There is a constant $C_f < \infty$ such that $|D^j f_{XW}(x, w)| \le C_f$ for all $(x, w) \in [0,1]^2$ and $j = 0, 1, 2$. (iv) $|D^2 f_{XW}(x_1, w_1) - D^2 f_{XW}(x_2, w_2)| \le C_f \|(x_1, w_1) - (x_2, w_2)\|$ for any second derivative and any $(x_1, w_1)$ and $(x_2, w_2)$ in $[0,1]^2$. (v) The operator $T$ is nonsingular.


2. (i) $E(U \mid W = w) = 0$ and $E(U^2 \mid W = w) \le C_U$ for each $w \in [0,1]$ and some constant $C_U < \infty$. (ii) $|g(x)| \le C_g$ for some constant $C_g < \infty$ and all $x \in [0,1]$.

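3. (i) As $n \to \infty$, $\hat\theta_n \xrightarrow{p} \theta_0$ for some $\theta_0 \in \Theta$, a compact subset of $\mathbb{R}^d$. If $H_0$ is true, then $g(x) = G(x, \theta_0)$, $\theta_0 \in \operatorname{int}(\Theta)$, and

$$n^{1/2}(\hat\theta_n - \theta_0) = n^{-1/2} \sum_{i=1}^{n} \gamma(U_i, X_i, W_i, \theta_0) + o_p(1) \tag{2.6}$$

for some function $\gamma$ taking values in $\mathbb{R}^d$ such that $E\gamma(U, X, W, \theta_0) = 0$ and $\operatorname{Var}[\gamma(U, X, W, \theta_0)]$ is a finite, non-singular matrix.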

4. (i) $|G(x, \theta)| \le C_G$ for all $x \in [0,1]$, all $\theta \in \Theta$, and some constant $C_G < \infty$. (ii) The first and second derivatives of $G(x, \theta)$ with respect to $\theta$ are bounded by $C_G$ uniformly over $x \in [0,1]$ and $\theta \in \Theta$.

5. (i) The kernel function, $K$, is a symmetrical, twice continuously differentiable probability density function on $[-1, 1]$. (ii) The bandwidth, $h$, satisfies $h = c_h n^{-1/6}$, where $c_h$ is a constant and $0 < c_h < \infty$.

The representation (2.6) of $n^{1/2}(\hat\theta_n - \theta_0)$ holds, for example, if $\hat\theta_n$ is a generalized method of moments estimator.

2.3 The Asymptotic Distribution of the Test Statistic under the Null Hypothesis

To obtain the asymptotic distribution of $\tau_n$ under $H_0$, define $G_\theta(x, \theta) = \partial G(x, \theta)/\partial\theta$,

$$\Gamma(z) = E[G_\theta(X, \theta_0)\, f_{XW}(z, W)],$$

$$B_n(z) = n^{-1/2} \sum_{i=1}^{n} [U_i f_{XW}(z, W_i) - \Gamma(z)'\gamma(U_i, X_i, W_i, \theta_0)],$$

and $V(z_1, z_2) = E[B_n(z_1) B_n(z_2)]$. Define the operator $\Omega$ on $L_2[0,1]$ by

$$(\Omega\psi)(z) = \int_0^1 V(z, x)\, \psi(x)\, dx.$$

Let $\{\omega_j : j = 1, 2, \ldots\}$ denote the eigenvalues of $\Omega$ sorted so that $\omega_1 \ge \omega_2 \ge \cdots \ge 0$. Let $\{\chi^2_{1j} : j = 1, 2, \ldots\}$ denote independent random variables that are distributed as chi-square with one degree of freedom. The following theorem gives the asymptotic distribution of $\tau_n$ under $H_0$.

Theorem 1: Let $H_0$ be true. Then under assumptions 1-5,

$$\tau_n \xrightarrow{d} \sum_{j=1}^{\infty} \omega_j \chi^2_{1j}.$$

2.4 Obtaining the Critical Value

The statistic $\tau_n$ is not asymptotically pivotal, so its asymptotic distribution cannot be tabulated. This section presents a method for obtaining an approximate asymptotic critical value for the $\tau_n$ test. The method is based on replacing the asymptotic distribution of $\tau_n$ with an approximate distribution. The difference between the true and approximate distributions can be made arbitrarily small under both the null hypothesis and alternatives. Moreover, the quantiles of the approximate distribution can be estimated consistently as $n \to \infty$. Accordingly, the proposed approximate $1-\alpha$ critical value of the $\tau_n$ test is a consistent estimator of the $1-\alpha$ quantile of the approximate distribution.

The approximate critical value is obtained under sampling from a pseudo-true model that coincides with (2.1) if $H_0$ is true and satisfies a version of $E[Y - G(X, \theta_0) \mid W] = 0$ if $H_0$ is false. The critical value for the case of a false $H_0$ is used later to establish the properties of $\tau_n$ under $H_1$. The pseudo-true model is defined by

$$\tilde Y = G(X, \theta_0) + \tilde U, \tag{2.7}$$

where $\tilde Y = Y - E[Y - G(X, \theta_0) \mid W]$, $\tilde U = \tilde Y - G(X, \theta_0)$, and $\theta_0$ is the probability limit of $\hat\theta_n$. This model coincides with (2.1) when $H_0$ is true. Moreover, $H_0$ holds for the pseudo-true model in the sense that $E[\tilde Y - G(X, \theta_0) \mid W] = 0$, regardless of whether $H_0$ holds for model (2.1).

To describe the approximation to the asymptotic distribution of $\tau_n$, let $\{\tilde\omega_j : j = 1, 2, \ldots\}$ be the eigenvalues of the version of $\Omega$ (denoted $\tilde\Omega$) that is obtained by replacing model (2.1) with model (2.7). Order the $\tilde\omega_j$'s such that $\tilde\omega_1 \ge \tilde\omega_2 \ge \cdots \ge 0$. Then under sampling from (2.7), $\tau_n$ is asymptotically distributed as

$$\tilde\tau \equiv \sum_{j=1}^{\infty} \tilde\omega_j \chi^2_{1j}.$$

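Given any $\varepsilon > 0$, there is an integer $K_\varepsilon < \infty$ such that

$$0 < P\left(\sum_{j=1}^{K_\varepsilon} \tilde\omega_j \chi^2_{1j} \le t\right) - P(\tilde\tau \le t) < \varepsilon$$

uniformly over $t$. Define

$$\tilde\tau_\varepsilon = \sum_{j=1}^{K_\varepsilon} \tilde\omega_j \chi^2_{1j}.$$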

Let $z_{\varepsilon\alpha}$ denote the $1-\alpha$ quantile of the distribution of $\tilde\tau_\varepsilon$. Then $0 < P(\tilde\tau > z_{\varepsilon\alpha}) - \alpha < \varepsilon$. Thus, using $z_{\varepsilon\alpha}$ to approximate the asymptotic $1-\alpha$ critical value of $\tau_n$ creates an arbitrarily small error in the probability that a correct null hypothesis is rejected. Similarly, use of the approximation creates an arbitrarily small change in the power of the $\tau_n$ test when the null hypothesis is false. However, the distribution of $\tilde\tau_\varepsilon$ is unknown because the eigenvalues $\tilde\omega_j$ are unknown. Accordingly, the approximate $1-\alpha$ critical value for the $\tau_n$ test is a consistent estimator of the $1-\alpha$ quantile of the distribution of $\tilde\tau_\varepsilon$. Specifically, let $\hat\omega_j$ $(j = 1, 2, \ldots, K_\varepsilon)$ be a consistent estimator of $\tilde\omega_j$ under sampling from (2.7). Then the approximate critical value of $\tau_n$ is the $1-\alpha$ quantile of the distribution of

$$\hat\tau_n = \sum_{j=1}^{K_\varepsilon} \hat\omega_j \chi^2_{1j}.$$

This quantile, which will be denoted $\hat z_{\varepsilon\alpha}$, can be estimated with arbitrary accuracy by simulation.²
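Simulating $\hat z_{\varepsilon\alpha}$ only requires draws of independent $\chi^2_1$ variables combined with the estimated weights. The following is a minimal sketch, assuming the estimated eigenvalues $\hat\omega_1, \ldots, \hat\omega_{K_\varepsilon}$ are already available; the values in the usage line are placeholders, not estimates from any data, and `weighted_chi2_quantile` is a hypothetical name.

```python
import numpy as np


def weighted_chi2_quantile(omega, alpha=0.05, reps=200_000, seed=0):
    """Simulated 1 - alpha quantile of sum_j omega_j * chi2_{1j} with independent chi-square(1) terms."""
    omega = np.asarray(omega, dtype=float)
    rng = np.random.default_rng(seed)
    chi2 = rng.standard_normal((reps, omega.size)) ** 2  # each column holds chi-square(1) draws
    return np.quantile(chi2 @ omega, 1.0 - alpha)


# Placeholder weights standing in for omega_hat_1, ..., omega_hat_K (not estimates from data).
omega_hat = np.array([0.40, 0.15, 0.05, 0.02])
z_hat = weighted_chi2_quantile(omega_hat, alpha=0.05)
```

As a sanity check, a single unit weight reduces the sum to one $\chi^2_1$ variable, whose 0.95 quantile is about 3.84; the simulation error shrinks as `reps` grows.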

The remainder of this section describes how to obtain the estimated eigenvalues $\hat\omega_j$. For this purpose, it is assumed that $\hat\theta_n$ satisfies the estimating equation

$$n^{-1} \sum_{i=1}^{n} \tilde W_i [Y_i - G(X_i, \hat\theta_n)] = 0,$$

where $\tilde W_i = H(W_i)$ for some known function $H: \mathbb{R} \to \mathbb{R}^d$ whose components are linearly independent. For example, $\tilde W$ might be a vector whose components are powers of $W$. By an application of the delta method,

$$\gamma(U_i, X_i, W_i, \theta_0) = Q^{-1} \tilde W_i U_i,$$

where $\tilde W = H(W)$, $Q = E[\tilde W G_\theta(X, \theta_0)']$, and $Q$ is assumed to be non-singular. Some algebra now shows that

$$V(z_1, z_2) = E\{[f_{XW}(z_1, W) - \Gamma(z_1)'Q^{-1}\tilde W]\, U^2\, [f_{XW}(z_2, W) - \Gamma(z_2)'Q^{-1}\tilde W]\}. \tag{2.8}$$
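When $G$ is linear in $\theta$ and $\tilde W_i$ has dimension $d$, the estimating equation above is just-identified linear instrumental variables, and $\hat Q^{-1}\tilde W_i [Y_i - G(X_i, \hat\theta_n)]$ is the sample analog of $\gamma(U_i, X_i, W_i, \theta_0) = Q^{-1}\tilde W_i U_i$. The sketch below assumes a linear $G$ (intercept plus slope), $H(W) = (1, W)'$, and a simulated design in which $X$ is endogenous; these choices are illustrative only, and ordinary least squares is included just to show the endogeneity bias that the instrument removes.

```python
import numpy as np

# Hypothetical design: U shares the component e with X, but E(U | W) = 0.
rng = np.random.default_rng(2)
n = 20_000
W = rng.uniform(size=n)
e = rng.uniform(size=n)
X = 0.5 * (W + e)
U = 0.3 * (e - 0.5) + 0.1 * rng.standard_normal(n)
Y = 0.0 + 0.5 * X + U                               # true intercept 0, true slope 0.5

W_tilde = np.column_stack([np.ones(n), W])          # rows W_tilde_i' = H(W_i)' = (1, W_i)
G_theta = np.column_stack([np.ones(n), X])          # rows G_theta(X_i)' = (1, X_i) for linear G
# n^{-1} sum_i W_tilde_i [Y_i - G(X_i, theta)] = 0 is linear in theta here.
theta_hat = np.linalg.solve(W_tilde.T @ G_theta, W_tilde.T @ Y)
resid = Y - G_theta @ theta_hat
Q_hat = W_tilde.T @ G_theta / n                     # sample analog of Q = E[W_tilde G_theta']
# Row i is Q_hat^{-1} W_tilde_i resid_i, the sample analog of gamma(U_i, X_i, W_i, theta_0).
gamma_hat = np.linalg.solve(Q_hat, (W_tilde * resid[:, None]).T).T
ols = np.linalg.lstsq(G_theta, Y, rcond=None)[0]    # inconsistent when X is endogenous
```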

A consistent estimator of $V(z_1, z_2)$ can be obtained by replacing the unknown quantities on the right-hand side of (2.8) with estimators. Let $\hat f_{XW}$ be a kernel estimator of $f_{XW}$ with bandwidth $h$. Define


$$\hat Q = n^{-1} \sum_{i=1}^{n} \tilde W_i\, G_\theta(X_i, \hat\theta_n)'$$

and

$$\hat\Gamma(z) = n^{-1} \sum_{i=1}^{n} \hat f_{XW}(z, W_i)\, G_\theta(X_i, \hat\theta_n).$$

Let $\hat q^{(-i)}(w)$ be the leave-observation-$i$-out kernel estimator

$$\hat q^{(-i)}(w) = \sum_{j=1,\, j \ne i}^{n} \kappa_h(w - W_j)\, [Y_j - G(X_j, \hat\theta_n)],$$

where

$$\kappa_h(w - W_j) = K\!\left(\frac{w - W_j}{h}\right) \left[\sum_{k=1,\, k \ne i}^{n} K\!\left(\frac{w - W_k}{h}\right)\right]^{-1}$$

and $h \propto n^{-1/5}$ is the bandwidth. The $\tilde U_i$'s are estimated by residuals of model (2.7). These are

$$\hat U_i = Y_i - G(X_i, \hat\theta_n) - \hat q^{(-i)}(W_i).$$
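The leave-one-out Nadaraya-Watson step that produces $\hat U_i$ can be written with a single $n \times n$ kernel matrix whose diagonal is zeroed, so that row $i$ of the normalized matrix holds the weights $\kappa_h(W_i - W_j)$, $j \ne i$. This is a sketch assuming a triweight kernel and a proportionality constant of 1 in $h \propto n^{-1/5}$; the function name `pseudo_true_residuals` is hypothetical, and the value used for $\hat\theta_n$ in the usage lines is a placeholder rather than an estimate.

```python
import numpy as np


def triweight(u):
    """Triweight kernel (35/32)(1 - u^2)^3 on [-1, 1]."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, (35.0 / 32.0) * (1.0 - u**2) ** 3, 0.0)


def pseudo_true_residuals(Y, X, W, G, theta_hat, h):
    """U_hat_i = Y_i - G(X_i, theta_hat) - q_hat^(-i)(W_i) with leave-one-out Nadaraya-Watson weights."""
    r = Y - G(X, theta_hat)                        # Y_j - G(X_j, theta_hat)
    Kw = triweight((W[:, None] - W[None, :]) / h)  # Kw[i, k] = K((W_i - W_k) / h)
    np.fill_diagonal(Kw, 0.0)                      # exclude k = i from numerator and denominator
    denom = Kw.sum(axis=1)
    # q_loo[i] = sum_{j != i} kappa_h(W_i - W_j) r_j; NaN if W_i has no neighbors within h.
    q_loo = np.divide(Kw @ r, denom, out=np.full(len(r), np.nan), where=denom > 0)
    return r - q_loo


def G(x, th):
    """Hypothetical linear null model."""
    return th[0] + th[1] * x


rng = np.random.default_rng(3)
n = 400
W = rng.uniform(size=n)
e = rng.uniform(size=n)
X = 0.5 * (W + e)
Y = 0.5 * X + 0.3 * (e - 0.5) + 0.1 * rng.standard_normal(n)
theta_hat = np.array([0.0, 0.5])                   # placeholder estimate for illustration
h = n ** (-1 / 5)                                  # proportionality constant 1 is assumed
U_hat = pseudo_true_residuals(Y, X, W, G, theta_hat, h)
U_zero = pseudo_true_residuals(G(X, theta_hat) + 2.0, X, W, G, theta_hat, h)
```

Because the weights $\kappa_h$ sum to one, a constant shift in $Y - G$ is absorbed entirely by $\hat q^{(-i)}$, so `U_zero` is numerically zero; this is a quick check on the weights.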

Then $V(z_1, z_2)$ is estimated consistently by

$$\hat V(z_1, z_2) = n^{-1} \sum_{i=1}^{n} [\hat f_{XW}(z_1, W_i) - \hat\Gamma(z_1)'\hat Q^{-1}\tilde W_i]\, \hat U_i^2\, [\hat f_{XW}(z_2, W_i) - \hat\Gamma(z_2)'\hat Q^{-1}\tilde W_i].$$

Define $\hat\Omega$ to be the integral operator whose kernel is $\hat V(z_1, z_2)$. The $\hat\omega_j$'s are the eigenvalues of $\hat\Omega$.

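Theorem 2: Let assumptions 1-5 hold. Then as $n \to \infty$, (i) $\sup_{1 \le j \le K_\varepsilon} |\hat\omega_j - \tilde\omega_j| = O\{[(\log n)/(nh^2)]^{1/2}\}$ almost surely and (ii) $\hat z_{\varepsilon\alpha} \xrightarrow{p} z_{\varepsilon\alpha}$.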

To obtain an accurate numerical approximation to the $\hat\omega_j$'s, let $\hat F(z)$ denote the $n \times 1$ vector whose $i$'th component is $\hat f_{XW}(z, W_i)$, $G_\theta$ denote the $n \times d$ matrix whose $(i, j)$ element is $\partial G(X_i, \hat\theta_n)/\partial\theta_j$, $\Upsilon$ denote the $n \times n$ diagonal matrix whose $(i, i)$ element is $\hat U_i^2$, and $\tilde W$ denote the $n \times d$ matrix $(\tilde W_1, \ldots, \tilde W_n)'$. Finally, define the $n \times n$ matrix $M = I_n - n^{-1} G_\theta \hat Q^{-1} \tilde W'$, where $I_n$ is the $n \times n$ identity matrix. Then

$$\hat V(z_1, z_2) = n^{-1} \hat F(z_1)' M \Upsilon M' \hat F(z_2).$$


The computation of the $\hat\omega_j$'s can now be reduced to finding the eigenvalues of a finite-dimensional matrix. To this end, let $\{\phi_j : j = 1, 2, \ldots\}$ be an orthonormal basis for $L_2[0,1]$. Then

$$\hat f_{XW}(z, W) = \sum_{j=1}^{\infty} \sum_{k=1}^{\infty} \hat d_{jk}\, \phi_j(z)\, \phi_k(W),$$

where

$$\hat d_{jk} = \int_0^1 \int_0^1 \hat f_{XW}(x, w)\, \phi_j(x)\, \phi_k(w)\, dx\, dw.$$

Approximate $\hat f_{XW}$ by the finite sum

$$\sum_{j=1}^{L} \sum_{k=1}^{L} \hat d_{jk}\, \phi_j(z)\, \phi_k(W)$$

for some integer $L < \infty$. Since $\hat f_{XW}$ is a known function, $L$ can be chosen to make this finite sum approximate $\hat f_{XW}$ with any desired accuracy. Let $\phi(z)$ denote the $L \times 1$ vector whose $j$'th component is $\phi_j(z)$. Let $\Phi$ be the $L \times n$ matrix whose $(j, k)$ component is $\phi_j(W_k)$. Let $D$ be the $L \times L$ matrix $[\hat d_{jk}]$. Then $\hat V(z_1, z_2)$ is approximated by

$$n^{-1} \phi(z_1)' D \Phi M \Upsilon M' \Phi' D' \phi(z_2).$$

Because the $\phi_j$ are orthonormal, the eigenvalues of $\hat\Omega$ are approximated by those of the $L \times L$ matrix $n^{-1} D \Phi M \Upsilon M' \Phi' D'$.
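The reduction to an $L \times L$ eigenproblem can be sketched as follows. Several ingredients are assumptions rather than choices made in the text: the orthonormal cosine basis $\phi_1(x) = 1$, $\phi_j(x) = \sqrt{2}\cos((j-1)\pi x)$ for $j \ge 2$; midpoint quadrature on an $m \times m$ grid for $\hat d_{jk}$; a triweight kernel for $\hat f_{XW}$; and plain IV residuals passed in place of the $\hat U_i$ described above, purely to keep the example short. Since $n^{-1} D\Phi M\Upsilon M'\Phi' D'$ is symmetric and positive semidefinite, `numpy.linalg.eigvalsh` is appropriate. The function names are hypothetical.

```python
import numpy as np


def triweight(u):
    """Triweight kernel (35/32)(1 - u^2)^3 on [-1, 1]."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, (35.0 / 32.0) * (1.0 - u**2) ** 3, 0.0)


def cosine_basis(x, L):
    """L x len(x) array of phi_j(x): phi_1 = 1, phi_j = sqrt(2) cos((j - 1) pi x), orthonormal on [0, 1]."""
    j = np.arange(L)[:, None]
    phi = np.sqrt(2.0) * np.cos(j * np.pi * np.asarray(x, dtype=float)[None, :])
    phi[0, :] = 1.0
    return phi


def omega_hat(X, W, U_hat, G_theta, W_tilde, h, L=10, m=200):
    """Eigenvalues (descending) of the L x L matrix n^{-1} D Phi M Upsilon M' Phi' D'."""
    n = len(X)
    grid = (np.arange(m) + 0.5) / m
    Kx = triweight((grid[:, None] - X[None, :]) / h)  # m x n
    Kw = triweight((grid[:, None] - W[None, :]) / h)  # m x n
    f_grid = Kx @ Kw.T / (n * h**2)                   # f_grid[a, b] = f_hat_XW(x_a, w_b)
    phi_grid = cosine_basis(grid, L)                  # L x m
    D = phi_grid @ f_grid @ phi_grid.T / m**2         # d_jk by midpoint quadrature
    Phi = cosine_basis(W, L)                          # (j, k) entry phi_j(W_k)
    Q_hat = W_tilde.T @ G_theta / n
    M = np.eye(n) - G_theta @ np.linalg.solve(Q_hat, W_tilde.T) / n
    B = D @ Phi @ M                                   # L x n
    A = (B * U_hat**2) @ B.T / n                      # B Upsilon B' / n
    return np.sort(np.linalg.eigvalsh(A))[::-1]


rng = np.random.default_rng(5)
n = 300
W = rng.uniform(size=n)
e = rng.uniform(size=n)
X = 0.5 * (W + e)
Y = 0.5 * X + 0.3 * (e - 0.5) + 0.1 * rng.standard_normal(n)
W_tilde = np.column_stack([np.ones(n), W])
G_theta = np.column_stack([np.ones(n), X])            # linear G: dG/dtheta = (1, x)'
theta_hat = np.linalg.solve(W_tilde.T @ G_theta, W_tilde.T @ Y)
U_stand_in = Y - G_theta @ theta_hat                  # stand-in for U_hat_i (simplification)
omega = omega_hat(X, W, U_stand_in, G_theta, W_tilde, h=n ** (-1 / 6), L=10, m=100)
```

The leading entries of `omega` would then play the role of $\hat\omega_1, \ldots, \hat\omega_{K_\varepsilon}$ in the simulated critical value.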

2.5 Consistency of the Test against a Fixed Alternative Model

In this section, it is assumed that $H_0$ is false. That is, there is no $\theta \in \Theta$ such that $g(x) = G(x, \theta)$ for almost every $x$. Let $\theta_0$ denote the probability limit of $\hat\theta_n$. Define $q(x) = g(x) - G(x, \theta_0)$. Let $z_\alpha$ denote the $1-\alpha$ quantile of the distribution of $\tau_n$ under sampling from the pseudo-true model (2.7). Let $\hat z_{\varepsilon\alpha}$ denote the $1-\alpha$ quantile of $\hat\tau_n$. The following theorem establishes consistency of the $\tau_n$ test against a fixed alternative hypothesis.

Theorem 3: Suppose that $\int_0^1 [(Tq)(z)]^2\, dz > 0$. Let assumptions 1-5 hold. Then for any $\alpha$ such that $0 < \alpha < 1$,

$$\lim_{n \to \infty} P(\tau_n > z_\alpha) = 1 \tag{2.9}$$

and

$$\lim_{n \to \infty} P(\tau_n > \hat z_{\varepsilon\alpha}) = 1. \tag{2.10}$$

Because $T$ is nonsingular, the $\tau_n$ test is consistent against any alternative that differs from $G(x, \theta_0)$ on a set of $x$ values whose Lebesgue measure exceeds zero.

2.6 Asymptotic Distribution under Local Alternatives

This section obtains the asymptotic distribution of $\tau_n$ under the sequence of local alternative hypotheses

$$g(x) = G(x, \theta_0) + n^{-1/2}\Delta(x),$$

where $\Delta$ is a bounded function on $[0,1]$ and $\theta_0 \in \operatorname{int}(\Theta)$. Under this sequence of local alternatives, the data are generated by the model

$$Y = G(X, \theta_0) + n^{-1/2}\Delta(X) + U. \tag{2.11}$$

The following additional notation is used to state the result. Let $\{\psi_j : j = 1, 2, \ldots\}$ denote the orthonormal eigenvectors of $\Omega$. Define $\mu(z) = (T\Delta)(z)$ and

$$\mu_j = \int_0^1 \mu(z)\, \psi_j(z)\, dz.$$

Let $\{\chi^2_{1j}(\mu_j) : j = 1, 2, \ldots\}$ denote independent random variables that are distributed as non-central chi-square with one degree of freedom and non-central parameters $\mu_j$. The following theorem states the result.

Theorem 4: Let assumptions 1-5 hold. Under the sequence of local alternatives (2.11),

$$\tau_n \xrightarrow{d} \sum_{j=1}^{\infty} \omega_j \chi^2_{1j}(\mu_j),$$

where the $\omega_j$'s are the eigenvalues of the operator $\Omega$ defined in Section 2.3.

Let $z_\alpha$ denote the $1-\alpha$ quantile of the distribution of $\sum_{j=1}^{\infty} \omega_j \chi^2_{1j}(\mu_j)$. Let $\hat z_{\varepsilon\alpha}$ denote the estimated approximate $\alpha$-level critical value defined in Section 2.4. Then it follows from Theorems 2 and 4 that for any $\varepsilon > 0$ and all sufficiently large $n$,

$$\limsup_{n \to \infty} |P(\tau_n > \hat z_{\varepsilon\alpha}) - P(\tau_n > z_\alpha)| \le \varepsilon.$$


2.7 Uniform Consistency

This section shows that for any $\varepsilon > 0$, the $\tau_n$ test rejects $H_0$ with probability exceeding $1-\varepsilon$ uniformly over a class of alternative models whose distance from the null hypothesis is $O(n^{-1/2})$. The following additional notation is used. Let $\theta_g$ be the probability limit of $\hat\theta_n$ under the hypothesis (not necessarily true) that $g(x) = G(x, \theta)$ for some $\theta \in \Theta$ and a given function $G$. Define $q_g(x) = g(x) - G(x, \theta_g)$. Let $h$ denote the bandwidth in $\hat f_{XW}^{(-i)}$. For each $n = 1, 2, \ldots$ and $C > 0$, define $\mathcal{F}_{nc}$ as a set of functions $g$ such that: (i) $|g(x)| \le C_g$ for all $x \in [0,1]$ and some constant $C_g < \infty$; (ii) $\theta_g \in \Theta$; (iii) for each $\varepsilon > 0$ there is an $M_\varepsilon < \infty$ such that

$$P\left[n^{1/2} \sup_{x \in [0,1],\, g \in \mathcal{F}_{nc}} |G(x, \hat\theta_n) - G(x, \theta_g)| > M_\varepsilon\right] < \varepsilon;$$

(iv) $\|Tq_g\| \ge n^{-1/2} C$, where $\|\cdot\|$ denotes the $L_2$ norm; and (v) $h^2 \|q_g\| / \|Tq_g\| = o(1)$ as $n \to \infty$. $\mathcal{F}_{nc}$ is a set of functions whose distance from $H_0$ shrinks to zero at the rate $n^{-1/2}$. That is, $\mathcal{F}_{nc}$ includes functions such that $\|q_g\| = O(n^{-1/2})$.

Condition (v) rules out alternatives that depend on $x$ only through sequences of eigenvectors of $T$ whose eigenvalues converge to 0 too rapidly. For example, let $\{\lambda_j, \phi_j : j = 1, 2, \ldots\}$ denote the eigenvalues and eigenvectors of $T$ ordered so that $\lambda_1 \ge \lambda_2 \ge \cdots > 0$. Suppose that $G(x, \theta) = \theta\phi_1(x)$, $g(x) = \phi_1(x) + \phi_n(x)$, and $\tilde W = \phi_1(W)$. Then $h^2 \|q_g\| / \|Tq_g\| = h^2/\lambda_n$. Because $h \propto n^{-1/6}$, condition (v) is violated if $\lambda_n = o(n^{-1/3})$. The practical significance of condition (v) is that the $\tau_n$ test has relatively low power against alternatives that differ from the null hypothesis only through eigenvectors of $T$ with very small eigenvalues.

The following theorem states the result of this section.

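Theorem 5: Let Assumptions 1, 2, 4, and 5 hold. Then given any $\varepsilon > 0$, any $\alpha$ such that $0 < \alpha < 1$, and any sufficiently large (but finite) $C$,

$$\liminf_{n \to \infty} \inf_{\mathcal{F}_{nc}} P(\tau_n > z_\alpha) \ge 1 - \varepsilon \tag{2.12}$$

and

$$\liminf_{n \to \infty} \inf_{\mathcal{F}_{nc}} P(\tau_n > \hat z_{\varepsilon\alpha}) \ge 1 - 2\varepsilon. \tag{2.13}$$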

3. MULTIVARIATE GENERALIZATION

This section generalizes the $\tau_n$ test to a multivariate version of (1.1) and (2.1) in which some of the explanatory variables may be exogenous. The model is

$$Y = g(X, Z) + U; \quad E(U \mid Z, W) = 0, \tag{3.1}$$

where $Y$ and $U$ are scalar random variables, $X$ and $W$ are random variables whose supports are contained in $[0,1]^p$ $(p \ge 1)$, and $Z$ is a random variable whose support is contained in $[0,1]^r$ $(r \ge 0)$. If $r = 0$, then $Z$ is not a variable of the model. In (3.1), $X$ and $Z$, respectively, are endogenous and exogenous explanatory variables. $W$ is an instrument for $X$. The inferential problem is to test the null hypothesis, $H_0$, that

$$g(x, z) = G(x, z, \theta) \tag{3.2}$$

for some unknown $\theta \in \Theta$, known function $G$, and almost every $(x, z) \in [0,1]^{p+r}$. The alternative hypothesis, $H_1$, is that there is no $\theta \in \Theta$ such that (3.2) holds for almost every $(x, z) \in [0,1]^{p+r}$. The data, $\{Y_i, X_i, Z_i, W_i : i = 1, \ldots, n\}$, are a simple random sample of $(Y, X, Z, W)$.

3.1 The Test Statistic

To form the test statistic, let $f_{XZW}$ denote the probability density function of $(X, Z, W)$, and let $f_Z$ denote the probability density function of $Z$. Let $\nu$ be any function in $L_2[0,1]^{p+r}$. For each $z \in [0,1]^r$ define the operator $T_z$ on $L_2[0,1]^p$ by

$$(T_z \nu)(x, z) = \int t_z(x, \xi)\, \nu(\xi, z)\, d\xi,$$

where for each $(x_1, x_2) \in [0,1]^{2p}$,

$$t_z(x_1, x_2) = \int f_{XZW}(x_1, z, w)\, f_{XZW}(x_2, z, w)\, dw.$$

Assume that $T_z$ is nonsingular for each $z \in [0,1]^r$. Then $H_0$ is equivalent to

$$S(x, z) \equiv T_z[g(\cdot, \cdot) - G(\cdot, \cdot, \theta)](x, z) = 0 \tag{3.3}$$

for some $\theta \in \Theta$ and almost every $(x, z) \in [0,1]^{p+r}$. $H_1$ is equivalent to the statement that there is no $\theta \in \Theta$ such that (3.3) holds for almost every $(x, z) \in [0,1]^{p+r}$. A test statistic can be based on a sample analog of $\int S(x, z)^2\, dx\, dz$, but the resulting rate of testing is slower than $n^{-1/2}$ if $r > 0$. A rate of $n^{-1/2}$ can be achieved, though at the cost of uniform consistency over a smaller class of alternatives, by carrying out an additional smoothing step. To this end, let $\ell(z_1, z_2)$ denote the kernel of a nonsingular integral operator, $L$, on $L_2[0,1]^r$. That is, the operator defined by

$$(L\nu)(z) = \int \ell(\zeta, z)\, \nu(\zeta)\, d\zeta$$

is nonsingular. Define the operator $T_M$ on $L_2[0,1]^{p+r}$ by $T_M \nu(x, z) = L T_z \nu(x, z)$. Then $T_M$ is non-singular. $H_0$ is equivalent to

$$S_M(x, z) \equiv T_M[g(\cdot, \cdot) - G(\cdot, \cdot, \theta)](x, z) = 0 \tag{3.4}$$

for some $\theta \in \Theta$ and almost every $(x, z) \in [0,1]^{p+r}$. $H_1$ is equivalent to the statement that there is no $\theta \in \Theta$ such that (3.4) holds. The test statistic is based on a sample analog of

$$\int S_M(x, z)^2\, dx\, dz.$$

To form the analog, let $\hat f_{XZW}^{(-i)}$ denote a leave-observation-$i$-out kernel estimator of $f_{XZW}$. That is, for $V_i \equiv (X_i, Z_i, W_i)$ and a kernel function $\kappa$ of a $(2p + r)$-dimensional argument,

$$\hat f_{XZW}^{(-i)}(v) = \frac{1}{(n-1)h^{2p+r}} \sum_{j=1,\, j \ne i}^{n} \kappa\!\left(\frac{v - V_j}{h}\right),$$

where $h$ is the bandwidth. Let $\hat\theta_n$ be an estimator of $\theta$. The sample analog of $S_M(x, z)$ is

$$S_{Mn}(x, z) = n^{-1/2} \sum_{i=1}^{n} [Y_i - G(X_i, Z_i, \hat\theta_n)]\, \hat f_{XZW}^{(-i)}(x, Z_i, W_i)\, \ell(Z_i, z). \tag{3.5}$$

The test statistic is

$$\tau_{Mn} = \int S_{Mn}(x, z)^2\, dx\, dz. \tag{3.6}$$

$H_0$ is rejected if $\tau_{Mn}$ is large.
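For $p = r = 1$, (3.5)-(3.6) can be computed on an $m \times m$ midpoint grid in the same way as the scalar statistic, with one extra matrix product for the smoothing kernel $\ell$. The sketch below assumes a product triweight kernel (the form required by M5(i)), $c_h = 1$ in $h = c_h n^{-1/(2p+r+4)}$, and a Gaussian kernel for $\ell$; the Gaussian choice and its scale are hypothetical, since the text only requires $L$ to be nonsingular. Function names and the simulated design are illustrative.

```python
import numpy as np


def triweight(u):
    """Triweight kernel (35/32)(1 - u^2)^3 on [-1, 1]."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, (35.0 / 32.0) * (1.0 - u**2) ** 3, 0.0)


def ell(zeta, z, s=0.25):
    """Hypothetical Gaussian kernel ell(zeta, z) for the smoothing operator L; returns len(zeta) x len(z)."""
    return np.exp(-0.5 * ((zeta[:, None] - z[None, :]) / s) ** 2)


def tau_Mn(Y, X, Z, W, G, theta_hat, h, m=60):
    """Midpoint-rule approximation to tau_Mn in (3.6) for p = r = 1."""
    n = len(Y)
    grid = (np.arange(m) + 0.5) / m
    Kx = triweight((grid[:, None] - X[None, :]) / h)         # Kx[a, j] = K((x_a - X_j) / h)
    Kzw = (triweight((Z[:, None] - Z[None, :]) / h)
           * triweight((W[:, None] - W[None, :]) / h))       # product kernel in (Z, W)
    np.fill_diagonal(Kzw, 0.0)                               # leave observation i out
    f_loo = Kx @ Kzw.T / ((n - 1) * h**3)                    # f^(-i)(x_a, Z_i, W_i); 2p + r = 3
    resid = Y - G(X, Z, theta_hat)
    S = (f_loo * resid[None, :]) @ ell(Z, grid) / np.sqrt(n)  # S[a, b] = S_Mn(x_a, z_b)
    return np.mean(S**2)                                     # grid cells have area 1 / m^2


def G(x, z, th):
    """Hypothetical linear null model in (x, z)."""
    return th[0] + th[1] * x + th[2] * z


rng = np.random.default_rng(4)
n = 400
Z, W, e = rng.uniform(size=(3, n))
X = 0.5 * (W + e)                                            # endogenous regressor
U = 0.3 * (e - 0.5) + 0.1 * rng.standard_normal(n)           # E(U | Z, W) = 0
Y = 0.5 * X + 0.3 * Z + U
inst = np.column_stack([np.ones(n), W, Z])                   # instruments (1, W, Z)
reg = np.column_stack([np.ones(n), X, Z])
theta_hat = np.linalg.solve(inst.T @ reg, inst.T @ Y)        # just-identified IV
h = n ** (-1 / 7)                                            # 2p + r + 4 = 7, c_h = 1 assumed
tau = tau_Mn(Y, X, Z, W, G, theta_hat, h)
```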

3.2 Regularity Conditions

This section states the assumptions that are used to obtain the asymptotic properties of $\tau_{Mn}$ under the null and alternative hypotheses. Let $\|(x_1, z_1, w_1) - (x_2, z_2, w_2)\|$ denote the Euclidean distance between $(x_1, z_1, w_1)$ and $(x_2, z_2, w_2)$.

M1. (i) The support of $(X, Z, W)$ is contained in $[0,1]^{2p+r}$. (ii) $(X, Z, W)$ has a probability density function $f_{XZW}$ with respect to Lebesgue measure. (iii) There is a constant $C_f < \infty$ such that $|D^j f_{XZW}(x, z, w)| \le C_f$ for all $(x, z, w) \in [0,1]^{2p+r}$ and $j = 0, 1, 2$. (iv) $|D^2 f_{XZW}(x_1, z_1, w_1) - D^2 f_{XZW}(x_2, z_2, w_2)| \le C_f \|(x_1, z_1, w_1) - (x_2, z_2, w_2)\|$ for any second derivative and any $(x_1, z_1, w_1)$ and $(x_2, z_2, w_2)$ in $[0,1]^{2p+r}$. (v) The operator $T_z$ is nonsingular for almost every $z \in [0,1]^r$.

M2. (i) $E(U \mid Z = z, W = w) = 0$ and $E(U^2 \mid Z = z, W = w) \le C_U$ for each $(z, w) \in [0,1]^{p+r}$ and some constant $C_U < \infty$. (ii) $|g(x, z)| \le C_g$ for some constant $C_g < \infty$ and all $(x, z) \in [0,1]^{p+r}$.

M3. (i) As $n \to \infty$, $\hat\theta_n \xrightarrow{p} \theta_0$ for some $\theta_0 \in \Theta$, a compact subset of $\mathbb{R}^d$. If $H_0$ is true, then $g(x, z) = G(x, z, \theta_0)$, $\theta_0 \in \operatorname{int}(\Theta)$, and

$$n^{1/2}(\hat\theta_n - \theta_0) = n^{-1/2} \sum_{i=1}^{n} \gamma(U_i, X_i, Z_i, W_i, \theta_0) + o_p(1)$$

for some function $\gamma$ taking values in $\mathbb{R}^d$ such that $E\gamma(U, X, Z, W, \theta_0) = 0$ and $\operatorname{Var}[\gamma(U, X, Z, W, \theta_0)]$ is a finite, non-singular matrix.

M4. (i) $|G(x, z, \theta)| \le C_G$ for all $(x, z) \in [0,1]^{p+r}$, all $\theta \in \Theta$, and some constant $C_G < \infty$. (ii) The first and second derivatives of $G(x, z, \theta)$ with respect to $\theta$ are bounded by $C_G$ uniformly over $(x, z) \in [0,1]^{p+r}$ and $\theta \in \Theta$.

M5. (i) The kernel function used to estimate $f_{XZW}$ has the form $\kappa(v) = \prod_{j=1}^{2p+r} K(v_j)$, where $v_j$ is the $j$'th component of $v$ and $K$ is a symmetrical, twice continuously differentiable probability density function on $[-1, 1]$. (ii) The bandwidth, $h$, satisfies $h = c_h n^{-1/(2p+r+4)}$, where $c_h$ is a constant and $0 < c_h < \infty$. (iii) The operator $L$ is nonsingular.

3.2 Asymptotic Distribution under the Null Hypothesis

The asymptotic distributional properties of $\tau_{Mn}$ are similar to those of $\tau_n$. To state the asymptotic distribution of $\tau_{Mn}$ under $H_0$, define $G_\theta(x, z, \theta) = \partial G(x, z, \theta)/\partial\theta$,

$$\Gamma_M(x, z) = E[G_\theta(X, Z, \theta_0)\, f_{XZW}(x, Z, W)\, \ell(Z, z)],$$

$$B_{Mn}(x, z) = n^{-1/2} \sum_{i=1}^{n} [U_i f_{XZW}(x, Z_i, W_i)\, \ell(Z_i, z) - \Gamma_M(x, z)'\gamma(U_i, X_i, Z_i, W_i, \theta_0)],$$

and

$$V_M(x_1, z_1; x_2, z_2) = E[B_{Mn}(x_1, z_1) B_{Mn}(x_2, z_2)].$$

Define the operator $\Omega_M$ on $L_2[0,1]^{p+r}$ by

$$(\Omega_M \nu)(x, z) = \int V_M(x, z; \xi, \zeta)\, \nu(\xi, \zeta)\, d\xi\, d\zeta. \tag{3.7}$$

Let $\{\omega_{Mj}, \psi_{Mj} : j = 1, 2, \ldots\}$ denote the eigenvalues and orthonormal eigenvectors of $\Omega_M$ sorted so that $\omega_{M1} \ge \omega_{M2} \ge \cdots \ge 0$. Let $\{\chi^2_{1j} : j = 1, 2, \ldots\}$ denote independent random variables that are distributed as chi-square with one degree of freedom. The asymptotic distribution of $\tau_{Mn}$ under $H_0$ is given by the following theorem.

Theorem 6: If $H_0$ is true and assumptions M1-M5 hold, then

$$\tau_{Mn} \xrightarrow{d} \sum_{j=1}^{\infty} \omega_{Mj} \chi^2_{1j}.$$

To obtain an approximate critical value for the $\tau_{Mn}$ test, define the pseudo-true model

$$\tilde Y = G(X, Z, \theta_0) + \tilde U, \tag{3.8}$$

where $\tilde Y = Y - E[Y - G(X, Z, \theta_0) \mid Z, W]$, $\tilde U = \tilde Y - G(X, Z, \theta_0)$, and $\theta_0$ is the probability limit of $\hat\theta_n$. Let $\{\tilde\omega_{Mj} : j = 1, 2, \ldots\}$ be the eigenvalues of the version of $\Omega_M$ that is obtained by replacing model (3.1) with model (3.8). It follows from Theorem 6 that under sampling from (3.8), $\tau_{Mn}$ is asymptotically distributed as

$$\tilde\tau_M \equiv \sum_{j=1}^{\infty} \tilde\omega_{Mj} \chi^2_{1j}.$$

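Let $z_{M\alpha}$ denote the $1-\alpha$ quantile of this distribution. The method for approximating this quantile in an application is similar to the method proposed for $\tau_n$. Given any $\varepsilon > 0$, there is an integer $K_\varepsilon < \infty$ such that

$$0 < P\left(\sum_{j=1}^{K_\varepsilon} \tilde\omega_{Mj} \chi^2_{1j} \le t\right) - P(\tilde\tau_M \le t) < \varepsilon$$

uniformly over $t$. Define

$$\tilde\tau_{M\varepsilon} = \sum_{j=1}^{K_\varepsilon} \tilde\omega_{Mj} \chi^2_{1j}.$$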

Let $z_{M\varepsilon\alpha}$ denote the $1-\alpha$ quantile of the distribution of $\tilde\tau_{M\varepsilon}$. Then using $z_{M\varepsilon\alpha}$ to approximate the asymptotic $1-\alpha$ critical value of $\tau_{Mn}$ creates an arbitrarily small error in the probability that a correct null hypothesis is rejected. The proposed approximate $1-\alpha$ critical value for the $\tau_{Mn}$ test is a consistent estimator of the $1-\alpha$ quantile of the distribution of $\tilde\tau_{M\varepsilon}$. Specifically, let $\hat\omega_{Mj}$ $(j = 1, 2, \ldots, K_\varepsilon)$ be the estimator of $\tilde\omega_{Mj}$ under sampling from (3.8) that is described below. Then the approximate critical value of $\tau_{Mn}$, $\hat z_{M\varepsilon\alpha}$, is the $1-\alpha$ quantile of the distribution of

$$\hat\tau_{Mn} = \sum_{j=1}^{K_\varepsilon} \hat\omega_{Mj} \chi^2_{1j}.$$

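The estimator of $\tilde\omega_{Mj}$ is the multivariate generalization of the estimator $\hat\omega_j$ for the bivariate model (2.1). Define $\tilde W_i = H[(Z_i', W_i')']$, where $H$ is a known, vector-valued function whose components are linearly independent, and $c_\theta \equiv \dim(H) \ge d$. Assume that $\hat\theta_n$ is the GMM estimator

$$\hat\theta_n = \arg\min_{\theta \in \Theta} \left[\sum_{i=1}^{n} \tilde W_i (Y_i - G(X_i, Z_i, \theta))\right]' A_n \left[\sum_{i=1}^{n} \tilde W_i (Y_i - G(X_i, Z_i, \theta))\right],$$

where $A_n$ is a sequence of possibly stochastic $c_\theta \times c_\theta$ weight matrices converging in probability to a non-stochastic limit matrix $A$. Define the $c_\theta \times d$ matrix $D = E[\tilde W G_\theta(X, Z, \theta_0)']$ and the $d \times c_\theta$ matrix $\gamma_M = (D'AD)^{-1} D'A$. Then standard calculations for GMM estimators show that

$$\gamma(U_i, X_i, Z_i, W_i, \theta_0) = \gamma_M \tilde W_i U_i.$$

Therefore,

$$V_M(x_1, z_1; x_2, z_2) = n^{-1} \sum_{i=1}^{n} E\{[f_{XZW}(x_1, Z_i, W_i)\, \ell(Z_i, z_1) - \Gamma_M(x_1, z_1)'\gamma_M \tilde W_i]\, U_i^2 \times [f_{XZW}(x_2, Z_i, W_i)\, \ell(Z_i, z_2) - \Gamma_M(x_2, z_2)'\gamma_M \tilde W_i]\}.$$

To estimate $V_M$, define $\hat D = n^{-1} \sum_{i=1}^{n} \tilde W_i G_\theta(X_i, Z_i, \hat\theta_n)'$, $\hat\gamma_M = (\hat D' A_n \hat D)^{-1} \hat D' A_n$, and

$$\hat\Gamma_M(x, z) = n^{-1} \sum_{i=1}^{n} G_\theta(X_i, Z_i, \hat\theta_n)\, \hat f_{XZW}(x, Z_i, W_i)\, \ell(Z_i, z),$$

where $\hat f_{XZW}$ is a kernel estimator of $f_{XZW}$. Also define $\hat U_i = Y_i - G(X_i, Z_i, \hat\theta_n) - \hat q_M^{(-i)}(Z_i, W_i)$, where $\hat q_M^{(-i)}$ is the leave-observation-$i$-out kernel regression estimator of $Y - G(X, Z, \hat\theta_n)$ on $(Z, W)$. Then $V_M(x_1, z_1; x_2, z_2)$ is estimated consistently by

$$\hat V_M(x_1, z_1; x_2, z_2) = n^{-1} \sum_{i=1}^{n} [\hat f_{XZW}(x_1, Z_i, W_i)\, \ell(Z_i, z_1) - \hat\Gamma_M(x_1, z_1)'\hat\gamma_M \tilde W_i]\, \hat U_i^2 \times [\hat f_{XZW}(x_2, Z_i, W_i)\, \ell(Z_i, z_2) - \hat\Gamma_M(x_2, z_2)'\hat\gamma_M \tilde W_i].$$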


Let $\hat\Omega_M$ be the integral operator whose kernel is $\hat V_M(x_1, z_1; x_2, z_2)$. Then $\hat\omega_{Mj}$ is the $j$'th eigenvalue of $\hat\Omega_M$. The multivariate analog of Theorem 2 is:

Theorem 7: Let assumptions M1-M5 hold. Then as $n \to \infty$, (i) $\sup_{1 \le j \le K_\varepsilon} |\hat\omega_{Mj} - \tilde\omega_{Mj}| = O\{[(\log n)/(nh^{2p+r})]^{1/2}\}$ almost surely and (ii) $\hat z_{M\varepsilon\alpha} \xrightarrow{p} z_{M\varepsilon\alpha}$.

3.3 Consistency against a Fixed Alternative Model

Suppose that $H_0$ is false, meaning that there is no $\theta \in \Theta$ such that $g(x, z) = G(x, z, \theta)$ for almost every $(x, z)$. Define $q(x, z) = g(x, z) - G(x, z, \theta_0)$. The following theorem establishes consistency of the $\tau_{Mn}$ test against a fixed alternative hypothesis.

Theorem 8: Let assumptions M1-M5 hold. Suppose that $H_0$ is false and that $\int [(T_M q)(x, z)]^2\, dx\, dz > 0$. Then for any $\alpha$ such that $0 < \alpha < 1$,

$$\lim_{n \to \infty} P(\tau_{Mn} > z_{M\alpha}) = 1$$

and

$$\lim_{n \to \infty} P(\tau_{Mn} > \hat z_{M\varepsilon\alpha}) = 1.$$

Because $T_M$ is nonsingular, the $\tau_{Mn}$ test is consistent against any alternative that differs from $G(x, z, \theta_0)$ on a set of $(x, z)$ values whose Lebesgue measure exceeds zero.

3.4 Asymptotic Distribution under Local Alternatives

This section obtains the asymptotic distribution of $\tau_{Mn}$ under the sequence of local alternative hypotheses

$$g(X, Z) = G(X, Z, \theta_0) + n^{-1/2}\Delta(X, Z),$$

where $\Delta$ is a bounded function on $[0,1]^{p+r}$. Under this sequence of local alternatives, the data are generated by the model

$$Y = G(X, Z, \theta_0) + n^{-1/2}\Delta(X, Z) + U. \tag{3.9}$$

Define

$$\mu_{Mj} = \int (T_M \Delta)(x, z)\, \psi_{Mj}(x, z)\, dx\, dz.$$

Let $\{\chi^2_{1j}(\mu_{Mj}) : j = 1, 2, \ldots\}$ denote independent random variables that are distributed as non-central chi-square with one degree of freedom and non-central parameters $\mu_{Mj}$. The following theorem states the result.

Theorem 9: Let assumptions M1-M5 hold. Under the sequence of alternatives (3.9),

$$\tau_{Mn} \xrightarrow{d} \sum_{j=1}^{\infty} \omega_{Mj} \chi^2_{1j}(\mu_{Mj}).$$

Let $z_\alpha$ denote the $1-\alpha$ quantile of the distribution of $\sum_{j=1}^{\infty} \omega_{Mj} \chi^2_{1j}(\mu_{Mj})$. Let $\hat z_{M\varepsilon\alpha}$ denote the estimated approximate $\alpha$-level critical value of $\tau_{Mn}$. Then it follows from Theorems 7 and 9 that for any $\varepsilon > 0$,

$$\limsup_{n \to \infty} |P(\tau_{Mn} > \hat z_{M\varepsilon\alpha}) - P(\tau_{Mn} > z_\alpha)| \le \varepsilon.$$

3.5 Uniform Consistency

The multivariate version of $\mathcal{F}_{nc}$ is denoted $\mathcal{F}_{Mnc}$ and is defined as follows. As before, let $\theta_g$ be the probability limit of $\hat\theta_n$ under the hypothesis that $g(x,z) = G(x,z,\theta)$ for some $\theta \in \Theta$ and a given function $G$. Define $q_{Mg}(x,z) = g(x,z) - G(x,z,\theta_g)$. For each $n = 1, 2, \ldots$ and $C > 0$, define $\mathcal{F}_{Mnc}$ as a set of functions $g$ such that: (i) $|g(x,z)| \le C_g$ for all $(x,z) \in [0,1]^{p+r}$ and some constant $C_g < \infty$; (ii) $\theta_g \in \Theta$; (iii) for each $\varepsilon > 0$ there is an $M_\varepsilon < \infty$ such that

$$P\left[ n^{1/2} \sup_{(x,z)\in[0,1]^{p+r},\; g\in\mathcal{F}_{Mnc}} \left|G(x,z,\hat\theta_n) - G(x,z,\theta_g)\right| > M_\varepsilon \right] < \varepsilon;$$

(iv) $\|T_M q_{Mg}\| \ge n^{-1/2}C$; and (v) $h^2\|q_{Mg}\|/\|T_M q_{Mg}\| = o(1)$ as $n \to \infty$. The following theorem states the multivariate uniform consistency result.

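Theorem 10: Let assumptions M1, M2, M4, and M5 hold. Then given any $\varepsilon > 0$, any $\alpha$ such that $0 < \alpha < 1$, and any sufficiently large but finite constant $C$,

$$\lim_{n\to\infty}\inf_{g\in\mathcal{F}_{Mnc}} P(\tau_{Mn} > z_{M\alpha}) \ge 1 - \varepsilon$$

and

$$\lim_{n\to\infty}\inf_{g\in\mathcal{F}_{Mnc}} P(\tau_{Mn} > \hat z_{M\varepsilon\alpha}) \ge 1 - 2\varepsilon.$$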

4. MONTE CARLO EXPERIMENTS

This section reports the results of a Monte Carlo investigation of the finite-sample performance of the $\tau_n$ test. The experiments consist of testing the hypothesis that

$$g(x) = \theta_0 + \theta_1 x. \tag{4.1}$$

The alternative hypotheses are that $g$ is quadratic,

$$g(x) = \theta_0 + \theta_1 x + \theta_2 x^2, \tag{4.2}$$

and that $g(x)$ is cubic,

$$g(x) = \theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3. \tag{4.3}$$

To provide a basis for judging whether the power of the $\tau_n$ test is high or low, this section also reports the results of an asymptotic $t$ test of the hypothesis $\theta_2 = 0$. The $t$ test is an example of an ad hoc test that might be used in applied research. In all experiments, $\theta_0 = 0$ and $\theta_1 = 0.5$. In experiments where (4.2) is the correct model, $\theta_2 = -0.5$. In experiments where (4.3) is the correct model, $\theta_2 = -1$ and $\theta_3 = 1$. Realizations of $(X, W)$ were generated by $X = \Phi(\xi)$ and $W = \Phi(\zeta)$, where $\Phi$ is the cumulative normal distribution function, $\zeta \sim N(0,1)$, $\xi = \rho\zeta + (1-\rho^2)^{1/2}\varepsilon$, $\varepsilon \sim N(0,1)$, and $\rho$ is a constant parameter whose value varies among experiments. Realizations of $Y$ were generated from $Y = g(X) + \sigma_U U$, where $U = \eta\varepsilon + (1-\eta^2)^{1/2}\nu$, $\nu \sim N(0,1)$, $\sigma_U = 0.2$, and $\eta$ is a constant parameter whose value varies among experiments. The instruments used to estimate (4.1), (4.2), and (4.3) are, respectively, $(1, W)$, $(1, W, W^2)$, and $(1, W, W^2, W^3)$. The bandwidth $h$ used to estimate $f_{XW}$ was selected by cross-validation. The kernel is $K(v) = (15/16)(1-v^2)^2 I(|v| \le 1)$, where $I$ is the indicator function. The asymptotic critical value was estimated by setting $K_\varepsilon = 25$. The results of the experiments are not sensitive to the choice of $K_\varepsilon$. The experiments use a sample size of $n = 500$ and the nominal 0.05 level. There are 1000 Monte Carlo replications in each experiment.
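The data-generating process described above can be written as a short Python function. This is a sketch that follows the stated design; the function and dictionary names are hypothetical, and $\Phi$ is computed with `math.erf`. The same standard normal draw $\varepsilon$ enters both $\xi$ (hence $X$) and $U$, which is what makes $X$ endogenous when $\eta \ne 0$, while $W$ depends only on $\zeta$.

```python
import math

import numpy as np

# Parameter values from Section 4: (theta_0, theta_1, ...) for each model.
THETA = {
    "4.1": (0.0, 0.5),
    "4.2": (0.0, 0.5, -0.5),
    "4.3": (0.0, 0.5, -1.0, 1.0),
}

# Standard normal CDF, applied elementwise.
norm_cdf = np.vectorize(lambda t: 0.5 * (1.0 + math.erf(t / math.sqrt(2.0))))


def simulate_design(n, rho, eta, model, sigma_u=0.2, seed=None):
    """Return arrays (X, W, Y) drawn from the Section 4 design."""
    rng = np.random.default_rng(seed)
    zeta = rng.standard_normal(n)
    eps = rng.standard_normal(n)
    nu = rng.standard_normal(n)
    xi = rho * zeta + math.sqrt(1.0 - rho ** 2) * eps
    x = norm_cdf(xi)                                  # X = Phi(xi)
    w = norm_cdf(zeta)                                # W = Phi(zeta), the instrument
    u = eta * eps + math.sqrt(1.0 - eta ** 2) * nu    # shares eps with X
    g = np.polynomial.polynomial.polyval(x, THETA[model])  # theta_0 + theta_1 x + ...
    y = g + sigma_u * u
    return x, w, y
```

For example, `simulate_design(500, rho=0.8, eta=0.1, model="4.2", seed=0)` produces one sample of the size used in the experiments.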

The results of the experiments are shown in Table 1. The differences between the nominal and empirical rejection probabilities are small when $H_0$ is true (model (4.1)). When $H_0$ is false, the powers of the $\tau_n$ and $t$ tests are similar. Not surprisingly, the $t$ test, which targets the quadratic term in (4.2), is somewhat more powerful than the $\tau_n$ test under model (4.2). The $\tau_n$ test is slightly more powerful under model (4.3).

5. AN EMPIRICAL EXAMPLE

This section presents an empirical example in which $\tau_n$ is used to test two hypotheses about the shape of an Engel curve. One hypothesis is that the curve is linear, and the other is that the curve is quadratic. The curve is given by (2.1). $Y$ denotes the logarithm of the expenditure share of food consumed off the premises of the establishment where it was purchased, $X$ denotes the logarithm of total expenditures, and $W$ denotes annual income from wages and salaries. The data consist of 785 household-level observations from the 1996 U.S. Consumer Expenditure Survey. The bandwidth for estimating $f_{XW}$ was selected by cross-validation. The kernel is the same as the one used in the Monte Carlo experiments. As in the experiments, the critical value of $\tau_n$ was estimated by setting $K_\varepsilon = 25$.
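The statistic used in this example can be sketched as follows. The sketch assumes, consistently with the decomposition in Appendix A.1, that $S_n(z) = n^{-1/2}\sum_i [Y_i - G(X_i,\hat\theta_n)]\hat f^{(-i)}_{XW}(z,W_i)$ and $\tau_n = \int_0^1 S_n(z)^2\,dz$. It further assumes (hypothetically) a product-kernel estimator $\hat f^{(-i)}_{XW}(z,w) = [(n-1)h^2]^{-1}\sum_{j\ne i}K((z-X_j)/h)K((w-W_j)/h)$, takes the bandwidth $h$ as given instead of choosing it by cross-validation, approximates the integral with a midpoint rule, and expects $X$ and $W$ already transformed to $[0,1]$. The function names are illustrative.

```python
import numpy as np


def biweight(v):
    """K(v) = (15/16)(1 - v^2)^2 I(|v| <= 1)."""
    v = np.asarray(v, dtype=float)
    return np.where(np.abs(v) <= 1.0, 15.0 / 16.0 * (1.0 - v ** 2) ** 2, 0.0)


def tau_statistic(x, w, resid, h, n_grid=200):
    """Approximate tau_n = int_0^1 S_n(z)^2 dz from residuals Y_i - G(X_i, theta_hat)."""
    x, w, resid = (np.asarray(a, dtype=float) for a in (x, w, resid))
    n = x.size
    z = (np.arange(n_grid) + 0.5) / n_grid            # midpoints of [0, 1]
    kx = biweight((z[:, None] - x[None, :]) / h)      # (n_grid, n): K((z - X_j)/h)
    kw = biweight((w[:, None] - w[None, :]) / h)      # (n, n): K((W_i - W_j)/h)
    np.fill_diagonal(kw, 0.0)                         # leave observation i out
    f_loo = kx @ kw.T / ((n - 1) * h ** 2)            # f_hat^{(-i)}(z, W_i) on the grid
    s_n = f_loo @ resid / np.sqrt(n)                  # S_n(z) on the grid
    return float(np.mean(s_n ** 2))                   # midpoint rule on [0, 1]
```

In practice `resid` would be the residuals from the IV fit of the linear (or quadratic) specification, and the result would be compared with the estimated critical value.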

The $\tau_n$ test of the hypothesis that $g$ is linear (quadratic) gives $\tau_n = 13.4$ (0.32) with a 0.05-level critical value of 3.07 (5.22). Thus, the test rejects the hypothesis that $g$ is linear but not the hypothesis that $g$ is quadratic. The hypotheses were also tested using the $t$ test described in the section on Monte Carlo experiments. This test gives $t = 2.60$ for the hypothesis that $g$ is linear ($\theta_2 = 0$ in (4.2)) and $t = 0.34$ for the hypothesis that $g$ is quadratic ($\theta_3 = 0$ in (4.3)). The 0.05-level critical value is 1.96. Thus, the $t$ test, like the $\tau_n$ test, rejects the hypothesis that $g$ is linear but not the hypothesis that it is quadratic.
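The benchmark $t$ test can be sketched as a two-stage least squares (2SLS) fit with the instruments listed in Section 4, followed by a $t$ statistic for the highest-order coefficient. The paper does not state which variance estimator was used; this sketch assumes the conventional homoskedastic 2SLS variance $\hat\sigma^2(\hat X'\hat X)^{-1}$, and the function name is hypothetical.

```python
import numpy as np


def tsls_t_last(y, regressors, instruments):
    """2SLS coefficients and the t statistic for the last coefficient.

    y: (n,), regressors: (n, k), instruments: (n, l) with l >= k.
    The variance estimate assumes homoskedastic errors.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(regressors, dtype=float)
    Z = np.asarray(instruments, dtype=float)
    # First stage: project each regressor on the instruments.
    coef_fs, *_ = np.linalg.lstsq(Z, X, rcond=None)
    X_hat = Z @ coef_fs
    # Second stage: regress y on the fitted regressors.
    beta, *_ = np.linalg.lstsq(X_hat, y, rcond=None)
    resid = y - X @ beta                  # residuals use the original regressors
    n, k = X.shape
    sigma2 = resid @ resid / (n - k)
    cov = sigma2 * np.linalg.inv(X_hat.T @ X_hat)
    return beta, beta[-1] / np.sqrt(cov[-1, -1])
```

For the linear null, pass regressors $(1, X, X^2)$ and instruments $(1, W, W^2)$ and compare $|t|$ with 1.96; for the quadratic null, use $(1, X, X^2, X^3)$ and $(1, W, W^2, W^3)$.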

MATHEMATICAL APPENDIX: PROOFS OF THEOREMS

A.1 Model (2.1)

Define

$$S_{n1}(z) = n^{-1/2}\sum_{i=1}^n U_i f_{XW}(z,W_i),$$

$$S_{n2}(z) = n^{-1/2}\sum_{i=1}^n [g(X_i) - G(X_i,\theta_0)]\,f_{XW}(z,W_i),$$

$$S_{n3}(z) = n^{-1/2}\sum_{i=1}^n [G(X_i,\theta_0) - G(X_i,\hat\theta_n)]\,f_{XW}(z,W_i),$$

$$S_{n4}(z) = n^{-1/2}\sum_{i=1}^n U_i\,[\hat f^{(-i)}_{XW}(z,W_i) - f_{XW}(z,W_i)],$$

$$S_{n5}(z) = n^{-1/2}\sum_{i=1}^n [g(X_i) - G(X_i,\theta_0)]\,[\hat f^{(-i)}_{XW}(z,W_i) - f_{XW}(z,W_i)],$$

and

$$S_{n6}(z) = n^{-1/2}\sum_{i=1}^n [G(X_i,\theta_0) - G(X_i,\hat\theta_n)]\,[\hat f^{(-i)}_{XW}(z,W_i) - f_{XW}(z,W_i)].$$

Then $S_n(z) = \sum_{j=1}^6 S_{nj}(z)$.

Lemma 1: As $n \to \infty$,

$$\begin{aligned} S_{n3}(z) &= -\Gamma(z)'\,n^{1/2}(\hat\theta_n - \theta_0) + o_p(1) \\ &= -\Gamma(z)'\,n^{-1/2}\sum_{i=1}^n \gamma(U_i, X_i, W_i, \theta_0) + o_p(1) \end{aligned}$$

uniformly over $z \in [0,1]$.

Proof: A Taylor series expansion gives

$$S_{n3}(z) = -n^{-1}\sum_{i=1}^n G_\theta(X_i,\bar\theta_n)'\,f_{XW}(z,W_i)\;n^{1/2}(\hat\theta_n - \theta_0),$$

where $\bar\theta_n$ is between $\hat\theta_n$ and $\theta_0$. Application of Jennrich's (1969) uniform law of large numbers gives the first result of the lemma. The second result follows from the first by applying Assumption 3. Q.E.D.

Lemma 2: As $n \to \infty$,

$$\left|\partial\hat f^{(-i)}_{XW}(z,w)/\partial z - \partial f_{XW}(z,w)/\partial z\right| = o\{[(\log n)/(nh)]^{1/2} + h^2\}$$

almost surely uniformly over $(z,w) \in [0,1]^2$.

Proof: This is a modified version of Theorem 2.2(2) of Bosq (1996) and is proved the same way as that theorem. Q.E.D.

Lemma 3: As $n \to \infty$, $S_{n4}(z) = o_p(1)$ uniformly over $z \in [0,1]$.

Proof: Let $I_1, \ldots, I_m$ be a partition of $[0,1]$ into $m$ intervals of length $1/m$. For each $j = 1, \ldots, m$, choose a point $z_j \in I_j$. Define $\Delta^{(-i)}(x,w) = \hat f^{(-i)}_{XW}(x,w) - f_{XW}(x,w)$. Then

$$\begin{aligned} S_{n4}(z) &= n^{-1/2}\sum_{j=1}^m\sum_{i=1}^n U_i\,I(z\in I_j)\,\Delta^{(-i)}(z,W_i) \\ &= n^{-1/2}\sum_{j=1}^m\sum_{i=1}^n U_i\,I(z\in I_j)\,\Delta^{(-i)}(z_j,W_i) \\ &\quad + n^{-1/2}\sum_{j=1}^m\sum_{i=1}^n U_i\,I(z\in I_j)\,[\Delta^{(-i)}(z,W_i) - \Delta^{(-i)}(z_j,W_i)] \\ &\equiv S_{n41}(z) + S_{n42}(z). \end{aligned}$$

A Taylor series expansion gives

$$S_{n42}(z) = n^{-1/2}\sum_{j=1}^m\sum_{i=1}^n U_i\,I(z\in I_j)\,[\partial\Delta^{(-i)}(\bar z_j,W_i)/\partial z]\,(z - z_j),$$

where $\bar z_j$ is between $z_j$ and $z$. Therefore, it follows from Lemma 2 that

$$\begin{aligned} |S_{n42}(z)| &\le n^{1/2}m^{-1}\sum_{j=1}^m\sum_{i=1}^n n^{-1}|U_i|\,I(z\in I_j)\,\left|\partial\Delta^{(-i)}(\bar z_j,W_i)/\partial z\right| \\ &\le n^{1/2}m^{-1}\,o\{[(\log n)/(nh)]^{1/2} + h^2\}\sum_{j=1}^m\sum_{i=1}^n n^{-1}|U_i|\,I(z\in I_j) \\ &= O_p\{[(\log n)/(m^2h)]^{1/2} + n^{1/2}h^2/m\} \end{aligned}$$

uniformly over $z \in [0,1]$. In addition, for any $\varepsilon > 0$,

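$$\begin{aligned} P\left[\sup_{z\in[0,1]}|S_{n41}(z)| > \varepsilon\right] &= P\left[\max_{1\le j\le m}\left|n^{-1/2}\sum_{i=1}^n U_i\,\Delta^{(-i)}(z_j,W_i)\right| > \varepsilon\right] \\ &\le \sum_{j=1}^m P\left[\left|n^{-1/2}\sum_{i=1}^n U_i\,\Delta^{(-i)}(z_j,W_i)\right| > \varepsilon\right]. \end{aligned}$$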

But $E[U_i\,\Delta^{(-i)}(z_j,W_i)] = 0$, and standard calculations for kernel estimators show that

$$\mathrm{Var}\left[n^{-1/2}\sum_{i=1}^n U_i\,\Delta^{(-i)}(z,W_i)\right] = O[(nh^2)^{-1} + h^4]$$

for any $z \in [0,1]$. Therefore, it follows from Chebyshev's inequality that

$$\sum_{j=1}^m P\left[\left|n^{-1/2}\sum_{i=1}^n U_i\,\Delta^{(-i)}(z_j,W_i)\right| > \varepsilon\right] = O[m/(nh^2\varepsilon^2) + mh^4/\varepsilon^2],$$

which implies that

$$P\left[\sup_{z\in[0,1]}|S_{n41}(z)| > \varepsilon\right] = O[m/(nh^2\varepsilon^2) + mh^4/\varepsilon^2].$$

The lemma now follows by choosing $m$ so that $n^{-1/2}m^3 \to C_3$ as $n \to \infty$, where $C_3$ is a constant such that $0 < C_3 < \infty$. Q.E.D.

Lemma 4: As $n \to \infty$, $S_{n6}(z) = o_p(1)$ uniformly over $z \in [0,1]$.

Proof: A Taylor series expansion gives

$$S_{n6}(z) = -n^{-1}\sum_{i=1}^n G_\theta(X_i,\bar\theta_n)'\,[\hat f^{(-i)}_{XW}(z,W_i) - f_{XW}(z,W_i)]\;n^{1/2}(\hat\theta_n - \theta_0),$$

where $\bar\theta_n$ is between $\hat\theta_n$ and $\theta_0$. The result follows from boundedness of $G_\theta$, $n^{1/2}(\hat\theta_n - \theta_0) = O_p(1)$, and $\hat f^{(-i)}_{XW}(z,W_i) - f_{XW}(z,W_i) = O\{h^2 + [(\log n)/(nh^2)]^{1/2}\}$ almost surely uniformly over $z \in [0,1]$. Q.E.D.

Lemma 5: Under $H_0$, $S_n(z) = B_n(z) + o_p(1)$ uniformly over $z \in [0,1]$.

Proof: Under $H_0$, $S_{n2}(z) = S_{n5}(z) = 0$ for all $z$. Now apply Lemmas 1, 3, and 4. Q.E.D.

Proof of Theorem 1: Under $H_0$, $S_n(z) = B_n(z) + o_p(1)$ uniformly over $z \in [0,1]$ by Lemma 5. Under assumptions 1-4, $\Omega$ is a completely continuous operator, so its eigenvectors form a complete, orthonormal basis for $L_2[0,1]$. Therefore, $B_n(z)$ has the Fourier representation

$$B_n(z) = \sum_{j=1}^\infty b_j\,\psi_j(z),$$

where

$$b_j = \int_0^1 B_n(z)\,\psi_j(z)\,dz.$$

It follows that $\tau_n = \sum_{j=1}^\infty b_j^2 + o_p(1)$. Therefore, it suffices to find the asymptotic distribution of $\nu_n \equiv \sum_{j=1}^\infty b_j^2$.

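To do this, observe that $E(b_j^2) = \omega_j$ and $E\nu_n \le C_\nu$ for some $C_\nu < \infty$. Therefore, for any $\varepsilon > 0$ and $t \in (-\infty,\infty)$, there is a $K_\varepsilon < \infty$ such that $|t|\sum_{j=K_\varepsilon+1}^\infty \omega_j < \varepsilon$. Define $\nu_{nK} = \sum_{j=1}^{K_\varepsilon} b_j^2$. For each $j$ and $k$, $Eb_j = 0$, and

$$\begin{aligned} E b_j b_k &= \int_0^1\!\!\int_0^1 dz_1\,dz_2\,E[B_n(z_1)B_n(z_2)]\,\psi_j(z_1)\psi_k(z_2) \\ &= \int_0^1\!\!\int_0^1 dz_1\,dz_2\,V(z_1,z_2)\,\psi_j(z_1)\psi_k(z_2) \\ &= \int_0^1 \psi_k(z_1)\,(\Omega\psi_j)(z_1)\,dz_1 \\ &= \omega_j\,\delta_{jk}, \end{aligned}$$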

where $\delta_{jk} = 1$ if $j = k$ and 0 otherwise. It follows from the Lindeberg-Levy theorem that the $b_j$'s are asymptotically independent $N(0,\omega_j)$ variates, and the random variables $b_j^2/\omega_j$ ($\omega_j \ne 0$) are asymptotically independently chi-square distributed with one degree of freedom. Consequently,

$$\nu_{nK} \xrightarrow{d} \sum_{j=1}^{K_\varepsilon}\omega_j\,\chi^2_{1j} \equiv \eta_K.$$

Moreover,

$$\left|E[\exp(\iota t\nu_{nK}) - \exp(\iota t\eta_K)]\right| < \varepsilon \tag{A1}$$

for all sufficiently large $n$, where $\iota = \sqrt{-1}$.

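Next, use the inequality $|e^{\iota t} - 1| \le |t|$ to obtain

$$\begin{aligned} \left|E[\exp(\iota t\nu_n) - \exp(\iota t\nu_{nK})]\right| &\le |t|\,E|\nu_n - \nu_{nK}| \\ &= |t|\sum_{j=K_\varepsilon+1}^\infty \omega_j \\ &< \varepsilon. \end{aligned} \tag{A2}$$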

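Define $\eta = \sum_{j=1}^\infty \omega_j\,\chi^2_{1j}$. Then

$$\begin{aligned} \left|E[\exp(\iota t\eta) - \exp(\iota t\eta_K)]\right| &\le |t|\,E|\eta - \eta_K| \\ &= |t|\,E\sum_{j=K_\varepsilon+1}^\infty \omega_j\,\chi^2_{1j} \\ &< \varepsilon. \end{aligned} \tag{A3}$$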

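Now combine (A1)-(A3) to obtain $\left|E[\exp(\iota t\nu_n) - \exp(\iota t\eta)]\right| < 3\varepsilon$ for all sufficiently large $n$. Because $\varepsilon$ is arbitrary, the characteristic functions of $\nu_n$ and $\eta$ can be made arbitrarily close by making $n$ sufficiently large, which proves the theorem. Q.E.D.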
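The truncation step in this proof can be made concrete. The sketch below returns the smallest $K_\varepsilon$ with $|t|\sum_{j>K_\varepsilon}\omega_j < \varepsilon$ for a given summable eigenvalue sequence. As a purely illustrative example it uses $\omega_j = [(j-1/2)\pi]^{-2}$, the eigenvalues of the covariance kernel $\min(z_1,z_2)$ on $[0,1]$; approximating the infinite tail by a long but finite array is an assumption of the sketch, and the function name is hypothetical.

```python
import numpy as np


def choose_k_eps(omega, t, eps):
    """Smallest K such that |t| * sum_{j > K} omega_j < eps.

    omega: nonnegative eigenvalues in decreasing order; a finite array that
    stands in for the infinite sequence.
    """
    omega = np.asarray(omega, dtype=float)
    # tails[K] = sum_{j > K} omega_j for K = 0, 1, ..., len(omega)
    tails = np.concatenate([np.cumsum(omega[::-1])[::-1], [0.0]])
    ok = np.nonzero(abs(t) * tails < eps)[0]
    return int(ok[0])


if __name__ == "__main__":
    j = np.arange(1, 1_000_001)
    omega = 1.0 / ((j - 0.5) * np.pi) ** 2
    print(choose_k_eps(omega, t=1.0, eps=0.01))
```

Because the bound grows with $|t|$, a larger $|t|$ or a smaller $\varepsilon$ forces a larger $K_\varepsilon$, which is why the proof fixes $t$ before choosing $K_\varepsilon$.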

Proof of Theorem 2: $|\hat\omega_j - \omega_j| = O(\|\hat\Omega - \Omega\|)$ by Theorem 5.1a of Bhatia, Davis, and McIntosh (1983). Moreover, standard calculations for kernel density estimators show that $\|\hat\Omega - \Omega\| = O\{[(\log n)/(nh^2)]^{1/2}\}$. Part (i) of the theorem follows by combining these two results.

Part (ii) is an immediate consequence of part (i) and Theorem 1. Q.E.D.
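Theorem 2 concerns eigenvalues $\hat\omega_j$ of an estimated operator $\hat\Omega$. One common way to compute eigenvalues of an integral operator with a symmetric kernel $V(z_1,z_2)$ on $[0,1]$ is to evaluate the kernel on a midpoint grid (a Nyström-type approximation) and divide the matrix by the number of grid points; an approximate critical value then follows by simulating $\sum_{j\le K_\varepsilon}\hat\omega_j\chi^2_{1j}$. This is a sketch under assumptions, not the paper's estimator of $V$: `kernel` is any user-supplied covariance function, the grid size and number of draws are arbitrary choices, and $K_\varepsilon = 25$ only mirrors the value used in Sections 4 and 5.

```python
import numpy as np


def operator_eigenvalues(kernel, n_grid=500):
    """Eigenvalues (descending) of the integral operator with kernel V on [0, 1]."""
    z = (np.arange(n_grid) + 0.5) / n_grid
    v = kernel(z[:, None], z[None, :])        # V(z_i, z_k) on the grid
    eig = np.linalg.eigvalsh(v / n_grid)      # quadrature weight 1/n_grid; ascending order
    return eig[::-1]


def approx_critical_value(omega, alpha=0.05, k_eps=25, n_draws=200_000, seed=0):
    """1 - alpha quantile of sum_{j <= k_eps} omega_j * chi2_1j, by simulation."""
    rng = np.random.default_rng(seed)
    w = np.clip(np.asarray(omega[:k_eps], dtype=float), 0.0, None)  # drop rounding negatives
    draws = rng.chisquare(1, size=(n_draws, w.size)) @ w
    return float(np.quantile(draws, 1.0 - alpha))


if __name__ == "__main__":
    # Test case with known answer: V(z1, z2) = min(z1, z2) has eigenvalues
    # 1 / ((j - 1/2) * pi)^2, i.e. 4 / pi^2 for j = 1.
    omega = operator_eigenvalues(np.minimum)
    print(omega[:3], 4 / np.pi ** 2)
    print(approx_critical_value(omega))
```

Using a kernel with known eigenvalues, as in the usage example, is a convenient way to check the discretization before applying it to an estimated covariance function.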

Proof of Theorem 3: Let $z_\alpha$ denote the $1-\alpha$ quantile of the distribution of $\sum_{j=1}^\infty \omega_j\,\chi^2_{1j}$. It suffices to show that when $H_1$ holds, then under sampling from $Y = g(X) + U$,

$$\lim_{n\to\infty} P(\tau_n > z_\alpha) = 1.$$

This will be done by proving that

$$\operatorname*{plim}_{n\to\infty} n^{-1}\tau_n = \int_0^1 [(Tq)(z)]^2\,dz > 0.$$

To do this, observe that by Jennrich's (1969) uniform law of large numbers, $n^{-1/2}S_{n2}(z) = (Tq)(z) + o_p(1)$ uniformly over $z \in [0,1]$. Moreover, $S_{n5}(z) = o(h^{-1}\log n) = o(n^{1/6}\log n)$ a.s. uniformly over $z \in [0,1]$ because $\hat f^{(-i)}_{XW}(z,w) - f_{XW}(z,w) = o\{[(\log n)/(nh^2)]^{1/2}\}$ a.s. uniformly over $z \in [0,1]$. Combining these results with Lemma 5 yields

$$n^{-1/2}S_n(z) = n^{-1/2}B_n(z) + (Tq)(z) + o_p(1)$$

uniformly over $z \in [0,1]$. A further application of Jennrich's (1969) uniform law of large numbers shows that $n^{-1/2}B_n(z) = o_p(1)$ uniformly over $z \in [0,1]$, so $n^{-1/2}S_n(z) \xrightarrow{p} (Tq)(z)$ and $n^{-1}\tau_n \xrightarrow{p} \int_0^1 [(Tq)(z)]^2\,dz$. Q.E.D.

Proof of Theorem 4: Arguments like those leading to Lemma 5 show that

$$S_n(z) = B_n(z) + S_{n2}(z) + S_{n5}(z) + o_p(1)$$

uniformly over $z \in [0,1]$. Moreover,

$$S_{n5}(z) = n^{-1}\sum_{i=1}^n \Delta(X_i)\,[\hat f^{(-i)}_{XW}(z,W_i) - f_{XW}(z,W_i)] = O\{[(\log n)/(nh^2)]^{1/2}\}$$

almost surely uniformly over $z \in [0,1]$. In addition,

$$S_{n2}(z) = n^{-1}\sum_{i=1}^n \Delta(X_i)\,f_{XW}(z,W_i) = \mu(z) + o(1)$$

almost surely uniformly over $z \in [0,1]$. Therefore, $S_n(z) = B_n(z) + \mu(z) + o_p(1)$ uniformly over $z \in [0,1]$. But

$$B_n(z) + \mu(z) = \sum_{j=1}^\infty \tilde b_j\,\psi_j(z),$$

where $\tilde b_j = b_j + \mu_j$ and $b_j$ is defined as in the proof of Theorem 1. Moreover, the $\tilde b_j$'s are asymptotically distributed as independent $N(\mu_j, \omega_j)$ variates. Now proceed as in the proof of Theorem 1. Q.E.D.

Proof of Theorem 5: Define $D_n(z) = S_{n3}(z) + S_{n6}(z) + E[S_{n2}(z) + S_{n5}(z)]$ and $\tilde S_n(z) = S_n(z) - D_n(z)$. Then $\tau_n = \|\tilde S_n + D_n\|^2$. Use the inequality

$$a^2 \ge 0.5\,b^2 - (a-b)^2 \tag{A5}$$

with $a = S_n$ and $b = D_n$ to obtain

$$P(\tau_n > z_\alpha) \ge P(0.5\|D_n\|^2 - \|\tilde S_n\|^2 > z_\alpha).$$
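Inequality (A5) is elementary. For completeness, here is a short derivation for real $a$ and $b$; it holds pointwise in $z$ and therefore, after integrating over $[0,1]$, for the norms used in this proof (with $a = S_n$, $b = D_n$, and $a - b = \tilde S_n$).

```latex
\begin{aligned}
b^2 = [a - (a-b)]^2 &\le 2a^2 + 2(a-b)^2
  && \text{because } (u-v)^2 \le 2u^2 + 2v^2 \iff 0 \le (u+v)^2, \\
\Longrightarrow\quad a^2 &\ge 0.5\,b^2 - (a-b)^2, \\
\Longrightarrow\quad \tau_n = \|S_n\|^2 &\ge 0.5\,\|D_n\|^2 - \|\tilde S_n\|^2 .
\end{aligned}
```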

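For any finite $M > 0$,

$$\begin{aligned} P(0.5\|D_n\|^2 - \|\tilde S_n\|^2 \le z_\alpha) &= P(0.5\|D_n\|^2 \le z_\alpha + \|\tilde S_n\|^2,\ \|\tilde S_n\|^2 \le M) \\ &\quad + P(0.5\|D_n\|^2 \le z_\alpha + \|\tilde S_n\|^2,\ \|\tilde S_n\|^2 > M) \\ &\le P(0.5\|D_n\|^2 \le z_\alpha + M) + P(\|\tilde S_n\|^2 > M). \end{aligned}$$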

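$\|\tilde S_n\|$ is bounded in probability uniformly over $g \in \mathcal{F}_{nc}$. Therefore, for each $\varepsilon > 0$ there is $M_\varepsilon < \infty$ such that for all $M > M_\varepsilon$,

$$P(0.5\|D_n\|^2 - \|\tilde S_n\|^2 \le z_\alpha) \le P(0.5\|D_n\|^2 \le z_\alpha + M) + \varepsilon.$$

Equivalently,

$$P(0.5\|D_n\|^2 - \|\tilde S_n\|^2 > z_\alpha) \ge P(0.5\|D_n\|^2 > z_\alpha + M) - \varepsilon$$

and

$$P(\tau_n > z_\alpha) \ge P(0.5\|D_n\|^2 > z_\alpha + M) - \varepsilon. \tag{A6}$$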

Now

$$S_{n2}(z) + S_{n5}(z) = n^{-1/2}\sum_{i=1}^n [g(X_i) - G(X_i,\theta_g)]\,\hat f^{(-i)}_{XW}(z,W_i).$$

Therefore,

$$E[S_{n2}(z) + S_{n5}(z)] = n^{-1/2}\sum_{i=1}^n E\{[g(X_i) - G(X_i,\theta_g)]\,[f_{XW}(z,W_i) + h^2R_n(z)]\},$$

where $R_n(z)$ is nonstochastic, does not depend on $g$, and is bounded uniformly over $z \in [0,1]$. It follows that

$$E[S_{n2}(z) + S_{n5}(z)] = n^{1/2}(Tq)(z) + O(n^{1/2}h^2\|q\|)$$

and

$$\|E[S_{n2} + S_{n5}]\| \ge 0.5\,n^{1/2}\|Tq\|$$

uniformly over $g \in \mathcal{F}_{nc}$ for all sufficiently large $n$.

Now

$$|S_{n3}(z) + S_{n6}(z)| \le n^{1/2}\sup_{x\in[0,1],\; g\in\mathcal{F}_{nc}}|G(x,\hat\theta_n) - G(x,\theta_g)|\;n^{-1}\sum_{i=1}^n \hat f^{(-i)}_{XW}(z,W_i).$$

Therefore, it follows from the definition of $\mathcal{F}_{nc}$ and uniform convergence of $\hat f^{(-i)}_{XW}$ to $f_{XW}$ that $\|S_{n3} + S_{n6}\| = O_p(1)$ uniformly over $g \in \mathcal{F}_{nc}$. A further application of (A5) with $a = D_n(z)$ and $b = E[S_{n2}(z) + S_{n5}(z)]$ gives

$$\|D_n\|^2 \ge 0.125\,n\|Tq\|^2 + O_p(1) \tag{A7}$$

uniformly over $g \in \mathcal{F}_{nc}$ as $n \to \infty$. The theorem follows by substituting (A7) into (A6) and choosing $C$ to be sufficiently large. Q.E.D.

A.2 Model (3.1)


Proofs of Theorems 6-10

The proofs are identical to the proofs of Theorems 1-5 after replacing quantities for

model (2.1) with the analogous quantities for model (3.1). Q.E.D.


FOOTNOTES

1 Tests of a parametric model of a conditional mean or quantile function against a nonparametric alternative include Aït-Sahalia, et al. (1994), Bierens (1982, 1990), Bierens and Ginther (2000), Bierens and Ploberger (1997), de Jong (1996), Eubank and Spiegelman (1990), Fan (1996), Fan and Huang (2001), Fan and Li (1996), Gozalo (1993), Guerre and Lavergne (2002), Härdle and Mammen (1993), Hart (1997), Hong and White (1995), Horowitz and Spokoiny (2001, 2002), Stute (1997), Li and Wang (1998), Whang and Andrews (1993), Wooldridge (1992), Yatchew (1992), and Zheng (1996, 1998).

2 At the cost of additional analytic complexity, it may be possible to let $\varepsilon \to 0$ and $K_\varepsilon \to \infty$ as $n \to \infty$, thereby obtaining a consistent estimator of the asymptotic critical value of $\tau_n$. However, doing this would likely provide little insight into the accuracy of the estimator or the choice of $K_\varepsilon$ in applications. This is because the difference between the distributions of $\tau$ and $\tau_\varepsilon$ and, therefore, the approximation error are complicated functions of the multiplicities and spacings of the $\omega_j$'s. Letting $\varepsilon \to 0$ and $K_\varepsilon \to \infty$ has no practical consequences because $\varepsilon > 0$ and $K_\varepsilon < \infty$ with any finite sample.

REFERENCES

Aït-Sahalia, Y., P.J. Bickel, and T.M. Stoker (2001). Goodness-of-Fit Tests for Kernel Regression with an Application to Option Implied Volatilities, Journal of Econometrics, 105, 363-412.

Andrews, D.W.K. (1997). A Conditional Kolmogorov Test, Econometrica, 65, 1097-1128.

Bhatia, R., C. Davis, and A. McIntosh (1983). Perturbation of Spectral Subspaces and Solution of Linear Operator Equations, Linear Algebra and Its Applications, 52/53, 45-67.

Bierens, H.J. (1982). Consistent Model Specification Tests, Journal of Econometrics, 20, 105-134.

Bierens, H.J. (1990). A Consistent Conditional Moment Test of Functional Form, Econometrica, 58, 1443-1458.

Bierens, H.J. and D.K. Ginther (2000). Integrated Conditional Moment Testing of Quantile Regression Models, working paper, Department of Economics, Pennsylvania State University.

Bierens, H.J. and W. Ploberger (1997). Asymptotic Theory of Integrated Conditional Moment Tests, Econometrica, 65, 1129-1151.

Blundell, R., X. Chen, and D. Kristensen (2003). Semi-Nonparametric IV Estimation of Shape Invariant Engel Curves, working paper CWP 15/03, Centre for Microdata Methods and Practice, University College London.

Bosq, D. (1996). Nonparametric Statistics for Stochastic Processes, New York: Springer.

Darolles, S., J.-P. Florens, and E. Renault (2002). Nonparametric Instrumental Regression, working paper, GREMAQ, University of Social Science, Toulouse.

de Jong, R.M. (1996). On the Bierens Test under Data Dependence, Journal of Econometrics, 72, 1-32.

Eubank, R.L. and C.H. Spiegelman (1990). Testing the Goodness of Fit of a Linear Model via Nonparametric Regression Techniques, Journal of the American Statistical Association, 85, 387-392.

Fan, J. (1996). Test of Significance Based on Wavelet Thresholding and Neyman's Truncation, Journal of the American Statistical Association, 91, 674-688.

Fan, J. and L.-S. Huang (2001). Goodness of Fit Tests for Parametric Regression Models, Journal of the American Statistical Association, 96, 640-652.

Fan, Y. and Q. Li (1996). Consistent Model Specification Tests: Omitted Variables and Semiparametric Functional Forms, Econometrica, 64, 865-890.

Gozalo, P.L. (1993). A Consistent Model Specification Test for Nonparametric Estimation of Regression Function Models, Econometric Theory, 9, 451-477.

Guerre, E. and P. Lavergne (2002). Optimal Minimax Rates for Nonparametric Specification Testing in Regression Models, Econometric Theory, 18, 1139-1171.

Hall, P. and J.L. Horowitz (2003). Nonparametric Methods for Inference in the Presence of Instrumental Variables, working paper, Department of Economics, Northwestern University.

Härdle, W. and E. Mammen (1993). Comparing Nonparametric Versus Parametric Regression Fits, Annals of Statistics, 21, 1926-1947.

Hart, J.D. (1997). Nonparametric Smoothing and Lack-of-Fit Tests, New York: Springer-Verlag.

Hong, Y. and H. White (1995). Consistent Specification Testing via Nonparametric Series Regressions, Econometrica, 63, 1133-1160.

Horowitz, J.L. and V.G. Spokoiny (2001). An Adaptive, Rate-Optimal Test of a Parametric Mean Regression Model against a Nonparametric Alternative, Econometrica, 69, 599-631.

Horowitz, J.L. and V.G. Spokoiny (2002). An Adaptive, Rate-Optimal Test of Linearity for Median Regression Models, Journal of the American Statistical Association, 97, 822-835.

Jennrich, R.I. (1969). Asymptotic Properties of Non-Linear Least Squares Estimators, Annals of Mathematical Statistics, 40, 633-643.

Li, Q. and S. Wang (1998). A Simple Consistent Bootstrap Test for a Parametric Regression Function, Journal of Econometrics, 87, 145-165.

Newey, W.K. and J.L. Powell (2003). Instrumental Variable Estimation of Nonparametric Models, Econometrica, 71, 1565-1578.

Newey, W.K., J.L. Powell, and F. Vella (1999). Nonparametric Estimation of Triangular Simultaneous Equations Models, Econometrica, 67, 565-603.

Stute, W. (1997). Nonparametric Model Checks for Regression, Annals of Statistics, 25, 613-641.

Whang, Y.-J. and D.W.K. Andrews (1993). Tests of Specification for Parametric and Semiparametric Models, Journal of Econometrics, 57, 277-318.

Wooldridge, J.M. (1992). A Test for Functional Form against Nonparametric Alternatives, Econometric Theory, 8, 452-475.

Yatchew, A.J. (1992). Nonparametric Regression Tests Based on Least Squares, Econometric Theory, 8, 435-451.

Zheng, J.X. (1996). A Consistent Test of Functional Form via Nonparametric Estimation Techniques, Journal of Econometrics, 75, 263-289.

Zheng, J.X. (1998). A Consistent Nonparametric Test of Parametric Regression Models under Conditional Quantile Restrictions, Econometric Theory, 14, 123-138.

Table 1: Results of Monte Carlo Experiments

                         Empirical Probability that H0 Is Rejected Using
Model     ρ      η       τ_n test       t test

H0 is true
(4.1)     0.8    0.1     0.051          0.052
(4.1)     0.8    0.5     0.030          0.034
(4.1)     0.7    0.1     0.049          0.052

H0 is false
(4.2)     0.8    0.1     0.658          0.714
(4.2)     0.8    0.5     0.721          0.827
(4.2)     0.7    0.1     0.421          0.444
(4.3)     0.8    0.1     0.684          0.671
(4.3)     0.8    0.5     0.663          0.580
(4.3)     0.7    0.1     0.424          0.412
