
    THE DELTA METHOD FOR NONPARAMETRIC KERNEL FUNCTIONALS

    Yacine Aït-Sahalia¹

    Graduate School of Business
    University of Chicago
    1101 E 58th St
    Chicago, IL 60637

    January 1992
    Revised, August 1994

    ¹ This paper is part of my Ph.D. dissertation at the MIT Department of Economics. I am very indebted to

    Richard Dudley, Jerry Hausman and Whitney Newey for helpful suggestions and advice. The comments of

    Peter Robinson and three anonymous referees have considerably improved this paper. I am also grateful to

    seminar participants at Berkeley, British Columbia, Carnegie-Mellon, Chicago, Columbia, Harvard,

    Northwestern, Michigan, MIT, Princeton, Stanford, Washington University, Wharton, Yale and the Yale-

    NSF Conference on Asymptotics for Infinite Dimensional Parameters as well as Emmanuel Guerre, Pascal

    Massart and Jens Praestgaard for stimulating conversations. Part of this research was conducted while I was

    visiting CREST-INSEE in Paris whose hospitality is gratefully acknowledged. Financial support from

    Ecole Polytechnique and MIT Fellowships is also gratefully acknowledged. All errors are mine.

    The Delta Method for Nonparametric Kernel Functionals

    by Yacine Aït-Sahalia

    Abstract

    This paper provides, under weak conditions, a generalized delta method for

    functionals of nonparametric kernel estimators, based on (possibly) dependent and

    (possibly) multivariate data. Generalized derivatives are allowed to permit the inclusion of

    virtually any functional, global or pointwise, explicitly or implicitly defined. It is shown

    that forming the estimator with dependent data modifies the asymptotic distribution only if

    the functional is more irregular than some threshold level. Variance estimators and rates of

    convergence are derived. Many examples are provided.

    Keywords: Delta Method, Nonparametric Kernel Estimation, Dependent Data, Rate of

    Convergence, Functional Differentiation, Generalized Functions.


    1. Introduction

    The delta method is a simple and widely used tool to derive the asymptotic

    distribution of nonlinear functionals of an estimator. Most parametric estimators converge

    at the familiar root-n rate, and so does the functional. When the estimator is

    nonparametric, however, some functionals will converge at a rate slower than root-n while

    others will retain the root-n rate. The slower than root-n functionals require some form of

    smoothing to be estimated, the most popular being the kernel method. A delta method has

    long been available for the class of root-n functionals that can be estimated without

    smoothing (von Mises [1947], Reeds [1976], Huber [1981], Dudley [1990]).

    In all cases, the essence of the delta method is a first order Taylor expansion of the

    functional. The problem is that the slower-than-root-n functionals are not differentiable in

    the usual sense. Therefore the examples of slower-than-root-n functionals studied in the

    literature have been tackled without the systematic "plug-and-play" feature that made the

    delta method attractive in the settings where it was available. This paper proposes a simple

    delta method that also covers slower-than-root-n functionals, under conditions that
    match and often relax those used in the previous "case-by-case" work. To address the

    problem of non-differentiability, the paper allows generalized functions as functional

    derivatives. An example of a generalized function is the Dirac delta function, and its

    derivatives. With generalized functions, the familiar delta method approach based on

    differentiating the functional is shown to be easily implemented for non-trivial examples.

    The results are valid even when the data are serially correlated, with independent data as a

    special case.

    The main contribution of the paper is to show how to systematically linearize

    slower-than-root-n functionals, and then to provide a general yet simple result yielding

    their asymptotic distribution. The purpose of the examples is two-fold: first, some classical


    examples (regression function, etc.) are included to show how the method of this paper

    significantly beats the previous approaches; second, new distributions are derived for cases

    where they were not previously available (dependent censored least absolute deviation,

    quantiles, mode, stochastic differential equations, etc.). The paper is organized as follows:

    Section 2 derives the generalized delta method. Consistent estimators of the asymptotic

    variances are proposed in Section 3. Section 4 discusses the rates of convergence of the

    estimators. Section 5 illustrates the application of the result through many examples.

    Section 6 concludes. Proofs are in the Appendix.

    2. The Delta Method with Generalized Derivatives

    2.1 Assumptions

    Consider $\mathbb{R}^d$-valued random variables $X_1, X_2, \ldots, X_n$, identically distributed as $f(\cdot)$,
    an unknown density function with associated cumulative distribution function $F(x) \equiv \int_{-\infty}^{x} f(t)\,dt$,
    where $x \equiv (x_1, x_2, \ldots, x_d)$. The following regularity conditions are imposed:

    Assumption A1: The sequence $\{X_i\}$ is a strictly stationary $\beta$-mixing sequence satisfying
    $k^{\delta}\,\beta_k \rightarrow 0$ as $k \rightarrow \infty$, for some fixed $\delta > 1$.

    $\beta_k = 0$ for all $k \ge 1$ corresponds to the independence case. As long as $\beta_k \rightarrow 0$ as $k \rightarrow \infty$,
    the sequence is said to be absolutely regular.

    Assumption A2: The density function $f(\cdot)$ is continuously differentiable on $\mathbb{R}^d$ up to
    order $s$. Its successive derivatives are bounded and in $L^2(\mathbb{R}^d)$.

    Let $C^s$ be the space of density functions satisfying A2. To estimate the density
    function $f(\cdot)$, a Parzen-Rosenblatt kernel function $K(\cdot)$ will be used. The kernel will be
    required to satisfy:


    Assumption A3: (i) $K$ is an even function integrating to one;

    (ii) The kernel is of order $r = s$, an even integer:
    $$1)\ \forall \lambda \in \mathbb{N}^d \text{ with } |\lambda| \equiv \lambda_1 + \cdots + \lambda_d \in \{1, \ldots, r-1\}:\quad \int_{-\infty}^{+\infty} x_1^{\lambda_1} \cdots x_d^{\lambda_d}\, K(x)\, dx = 0;$$
    $$2)\ \exists \lambda \in \mathbb{N}^d \text{ with } |\lambda| = r \text{ such that } \int_{-\infty}^{+\infty} x_1^{\lambda_1} \cdots x_d^{\lambda_d}\, K(x)\, dx \neq 0;$$
    $$3)\ \int_{-\infty}^{+\infty} \big| x_1^{\lambda_1} \cdots x_d^{\lambda_d}\, K(x) \big|\, dx < +\infty;$$

    (iii) $K$ is continuously differentiable up to order $s+d$ on $\mathbb{R}^d$, and its
    derivatives of order up to $s$ are in $L^2(\mathbb{R}^d)$.

    The last assumption indicates how the bandwidth $h_n$ in the kernel density estimator
    should be chosen. The statement of the assumption, A4(e,m), depends upon an exponent
    parameter $e > 0$ and an integer $m \ge 0$.
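    The plug-ins studied below are all built from the kernel density estimator and the smoothed
    cdf that integrates it. As a point of reference, here is a minimal sketch of both (not from the
    paper; the Gaussian kernel, dimension $d = 1$, and the bandwidth rule in the usage comment are
    illustrative assumptions):

        import numpy as np
        from math import erf, sqrt

        def f_hat(data, x, h):
            """Parzen-Rosenblatt kernel density estimator at x (d = 1, Gaussian K)."""
            u = (x - np.asarray(data, dtype=float)) / h
            return np.mean(np.exp(-u**2 / 2)) / (h * sqrt(2 * np.pi))

        def F_hat(data, x, h):
            """Kernel (smoothed) cdf estimator: the integral of f_hat up to x."""
            u = (x - np.asarray(data, dtype=float)) / h
            return float(np.mean([0.5 * (1 + erf(v / sqrt(2))) for v in u]))

        # Illustrative usage with h_n = n^{-1/(2r+1)} for a second-order kernel (r = 2):
        # data = np.random.default_rng(0).standard_normal(500)
        # print(f_hat(data, 0.0, 500 ** (-1/5)), F_hat(data, 0.0, 500 ** (-1/5)))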

    2.2 Differentiability of the Functional

    Consider a functional $\Phi$ defined on $C^s$, and expand it at $G$ in the direction $H$:
    $$\Phi[G+H] = \Phi[G] + \Phi^{(1)}[G](H) + R_\Phi[G,H], \qquad \big| R_\Phi[G,H] \big| = O\big(\|H\|^2_{L(2,m)}\big),$$
    where $\Phi^{(1)}[G](\cdot)$ is a continuous linear (in $H$) functional and the $L(2,m)$ norm is the sum of the $L^2$
    norms of all the derivatives of $H$ up to order $m$. If this holds uniformly over $H$ in any compact
    subset $\mathcal{K}$ of $C^s$, with $|\Phi^{(1)}[G](H)| \le C(\mathcal{K})\,\|H\|_{L(2,s)}$, then $\Phi$ is said to be $L(2,m)$-
    Hadamard-differentiable at $F$. In what follows it will always implicitly be assumed that the
    linear term $\Phi^{(1)}[F](\cdot)$ is not degenerate. If it were, then the asymptotic distribution would be
    given by a term of higher order in the Taylor expansion.

    By the Riesz Representation Theorem (see e.g., Schwartz [1966]), there exists a
    distribution $\varphi[F] : \mathbb{R}^d \rightarrow \mathbb{R}$ such that $\Phi^{(1)}[F](H) = \int_{-\infty}^{+\infty} \varphi[F](x)\, dH(x)$. Call $\varphi[F](\cdot)$ the
    functional derivative² of $\Phi$ at $F$. The standard delta method is applicable only if $\varphi[F](\cdot)$ is a
    regular function, i.e., at least cadlag (right-continuous with left limits). For some functionals $\Phi$,
    the functional derivative will indeed be a regular function. For example, let
    $\Phi[F] \equiv \int_{-\infty}^{+\infty} f(x)^2\, dx$. Then:

    $$\Phi[F+H] = \int_{-\infty}^{+\infty} \{f(x)+h(x)\}^2\, dx = \int_{-\infty}^{+\infty} f(x)^2\, dx + 2\int_{-\infty}^{+\infty} f(x)\, h(x)\, dx + \int_{-\infty}^{+\infty} h(x)^2\, dx,$$
    so $R_\Phi[F,H] = \int_{-\infty}^{+\infty} h(x)^2\, dx$ and $\Phi^{(1)}[F](H) = \int_{-\infty}^{+\infty} \varphi[F](x)\, dH(x) = 2\int_{-\infty}^{+\infty} f(x)\, h(x)\, dx$. Thus its
    functional derivative is $\varphi[F](\cdot) = 2 f(\cdot)$, a function in $C^s$.

    ² Although $\varphi[F]$ is not unique, the respective asymptotic distributions given by the Theorem are independent
    of the choice of $\varphi[F]$. One way to make the representation unique would be to impose that
    $\int_{-\infty}^{+\infty} \varphi[F](x)\, dF(x) = 0$. Any $\varphi[F]$ given by the Riesz Theorem will be called "the" derivative, even though
    "a" derivative would be more appropriate.
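    The mechanics of this expansion can be checked numerically. The sketch below is a minimal
    illustration (not from the paper; the two Gaussian densities, the grid, and the step sizes are
    arbitrary choices): the gap between $\Phi[F+H] - \Phi[F]$ and the linear term shrinks at the
    quadratic rate that the remainder bound predicts.

        import numpy as np

        # Check Phi[F+H] - Phi[F] = int 2 f h + O(||h||^2) for Phi[F] = int f^2.
        x = np.linspace(-8.0, 8.0, 4001)
        dx = x[1] - x[0]
        f = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)        # N(0,1) density (illustrative)
        g = np.exp(-(x - 1)**2 / 2) / np.sqrt(2 * np.pi)  # N(1,1) density (illustrative)
        for eps in (0.1, 0.01, 0.001):
            h = eps * (g - f)                             # perturbation with zero total mass
            exact = np.sum((f + h)**2 - f**2) * dx        # Phi[F+H] - Phi[F]
            linear = np.sum(2 * f * h) * dx               # linear term against phi[F] = 2f
            print(eps, exact - linear)                    # equals int h^2, i.e., O(eps^2)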

    Unfortunately, many functionals of interest in econometrics do not have "regular"
    functional derivatives; that is, $\varphi[F](\cdot)$ will not be a cadlag function. Instead, it will be a
    generalized function. The existing delta method cannot treat such functionals. The main

    point of the paper is that the same familiar delta approach will work provided that one

    includes generalized functions as functional derivatives. This method turns out to be very

    simple as well as powerful. Many examples will be provided below. To get the flavor of

    the result immediately, consider the hazard rate function $\Phi[F] \equiv \frac{f(y)}{1 - F(y)}$ evaluated at some
    $y$. It will be shown below that it is differentiable in the extended sense of this paper, with
    functional derivative:
    $$\varphi[F](x) = \frac{\delta_y(x)}{1 - F(y)} - \frac{F^{(1)}(y)\,\mathbf{1}(x \ge y)}{\big(1 - F(y)\big)^2}.$$
    This functional derivative is a linear combination of a Dirac mass at $y$ (a generalized function) and an
    indicator function (a regular function). The asymptotic distribution of the kernel estimator
    of the hazard rate will be driven by the Dirac term in the linear expansion: only the most
    unsmooth term counts.

    2.3 Generalized Functions

    The concept of generalized function, or distribution, was formally introduced by
    Schwartz [1954, 1966]. Simply put, any function $g$, no matter how
    unsmooth, can be differentiated. Its derivative $g^{(1)}$ is defined by its cross-products against
    smooth functions $f$: $\int_{-\infty}^{+\infty} g^{(1)}(x)\, f(x)\, dx \equiv -\int_{-\infty}^{+\infty} g(x)\, f^{(1)}(x)\, dx$ in the univariate case,
    where $f^{(1)}$ is a standard derivative. Of course, by integration by parts, this reduces to the
    common definition of differentiability if $g$ turns out to be a regular function.

    For example, a Dirac function at 0 is defined by $\int_{-\infty}^{+\infty} \delta_0(x)\, f(x)\, dx = f(0)$, and its
    derivative is given by $\int_{-\infty}^{+\infty} \delta_0^{(1)}(x)\, f(x)\, dx = -\int_{-\infty}^{+\infty} \delta_0(x)\, f^{(1)}(x)\, dx = -f^{(1)}(0)$. Successive
    differentiation of the Dirac function is possible up to the number of derivatives that the
    functions $f$ admit, to yield $\int_{-\infty}^{+\infty} \delta_0^{(q)}(x)\, f(x)\, dx = (-1)^q \int_{-\infty}^{+\infty} \delta_0(x)\, f^{(q)}(x)\, dx = (-1)^q f^{(q)}(0)$.
    Besides Dirac functions, many other generalized functions can be constructed; see
    Schwartz [1954, 1966] or Zemanian [1965] for examples.
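    As a worked instance of these definitions (a standard computation, added here for concreteness),
    take $g$ to be the indicator $g(x) = \mathbf{1}(x \ge 0)$ and $f$ smooth with $f(+\infty) = 0$:
    $$\int_{-\infty}^{+\infty} g^{(1)}(x)\, f(x)\, dx \equiv -\int_{-\infty}^{+\infty} \mathbf{1}(x \ge 0)\, f^{(1)}(x)\, dx = -\int_{0}^{+\infty} f^{(1)}(x)\, dx = f(0),$$
    so $g^{(1)} = \delta_0$: differentiating a cadlag indicator produces a Dirac mass, one step of
    unsmoothness down the scale of spaces introduced next.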

    This paper allows functional derivatives $\varphi[F](\cdot)$ to be generalized functions. I will
    define an increasing sequence of spaces of generalized functions. Each space will contain
    functions of a given "level of unsmoothness." It will first be shown that the asymptotic
    distribution of the plug-in depends on the particular space containing $\varphi[F](\cdot)$, and that the
    more unsmooth $\varphi[F](\cdot)$ is, the slower the rate of convergence. Furthermore, when $\varphi[F](\cdot)$ is
    more unsmooth than cadlag (i.e., when it is a generalized function instead of a regular
    function), it will be shown that constructing the estimator on correlated data does not affect
    its asymptotic variance.

    Start by defining the space $C^{-1}$ of bounded cadlag functions from $[0,1]^d$ to $\mathbb{R}$. $C^{-1}$
    contains all the usual spaces $C^0$, $C^1$, etc., of continuous, continuously once-differentiable
    functions, etc. The regular functions are the elements of $C^{-1}$. Now define $C^{-2}$ to be the
    space of linear combinations of Dirac functions and functions of $C^{-1}$, $C^{-3}$ to be the space of
    linear combinations of derivatives of Dirac functions and functions of $C^{-2}$, etc. When
    $\varphi[F](\cdot)$ belongs to the generalized function space $C^{-q}$, $q \ge 2$, but not to the space
    immediately smaller, $C^{-q+1}$, write $\varphi[F] \in C^{-q} \setminus C^{-q+1}$. $q$ can readily be interpreted as an
    "order of unsmoothness" of $\varphi[F](\cdot)$. Moving down the following scale, the functions become
    more unsmooth, and conversely:

    [Figure: the scale of spaces $\cdots \subset C^{1} \subset C^{0} \subset C^{-1} \subset C^{-2} \subset C^{-3} \subset \cdots$. $C^{-1}$ and above contain the regular functions; $C^{-2}$ and below, the generalized functions. Differentiating moves down the scale (toward $C^{-2}$, $C^{-3}$, ...); integrating moves up.]

    2.4 The Generalized Delta Method

    The result is stated in dimension $d = 1$. An extension to the multivariate case is
    provided in the Appendix. $\varphi[F] \in C^{-q} \setminus C^{-q+1}$ has the form:
    $$\varphi[F](x) = \sum_{\ell=1}^{L} \alpha_\ell[F](x)\, \delta_{y_\ell}^{(q-2)}(x) + B[F](x),$$
    where each $y_\ell$ is a fixed point, $\alpha_\ell[F](\cdot) \in C^{-1}$
    and $B[F](\cdot) \in C^{-q+1}$ (see Section 5 for examples).

    The following delta method characterizes the asymptotic distribution of the plug-in
    functional as a function of the particular space $C^{-q}$ in which the functional derivative $\varphi[F](\cdot)$
    lies:

    Theorem: Suppose that $\Phi$ is $L(2,m)$-Hadamard-differentiable at the
    true cdf $F$ with functional derivative $\varphi[F](\cdot)$. Then under A1-A3:

    (i) If $\varphi[F] \in C^{-1}$, then under A4(r,m):
    $n^{1/2}\big\{\Phi(\hat F_n) - \Phi(F)\big\} \xrightarrow{d} N\big(0, V_\Phi[F]\big)$, with asymptotic variance:
    $$V_\Phi[F] = VAR\big(\varphi[F](X_t)\big) + 2\sum_{k=1}^{+\infty} COV\big(\varphi[F](X_t),\, \varphi[F](X_{t+k})\big).$$

    (ii) If $\varphi[F] \in C^{-q} \setminus C^{-q+1}$ for some $q \in [2,s]$, then under A4(r+1/2, m):
    $h_n^{(2q-3)/2}\, n^{1/2}\big\{\Phi(\hat F_n) - \Phi(F)\big\} \xrightarrow{d} N\big(0, V_\Phi[F]\big)$, with asymptotic variance:
    $$V_\Phi[F] = \Big(\int_{-\infty}^{+\infty} \big[K^{(q-2)}(x)\big]^2\, dx\Big) \sum_{\ell=1}^{L} \big\{\alpha_\ell[F](y_\ell)\big\}^2\, f(y_\ell).$$
    The asymptotic variance is the
    same whether or not the data are serially dependent.

    When the functional derivative is a regular function (that is, $\varphi[F] \in C^{-1}$), the result
    does not depend on how smooth it is beyond being cadlag. On the other hand, when it is a
    generalized function, the result depends on the exact degree of unsmoothness of the
    functional derivative (belonging to the space $C^{-q}$, but not the one immediately smoother,
    $C^{-q+1}$). Many (but not all, e.g., the integrated squared density) of the functionals with
    regular derivatives can be estimated without smoothing. In that case, the result is exactly
    the same whether a kernel or empirical cdf is plugged in, and there is no reason to smooth.

    For functionals with unsmooth derivatives, however, smoothing is essential, as the
    plug-in cannot even be defined at the empirical cdf. And the asymptotic distribution is
    driven exclusively by the "most unsmooth" component of the functional derivative: the
    smoother component $B[F](\cdot) \in C^{-q+1}$ of $\varphi[F] \in C^{-q} \setminus C^{-q+1}$ does not appear in the
    asymptotic variance. Such a functional (asymptotically) behaves essentially like a linear
    combination of the density or its derivatives that are not integrated upon. When
    $\varphi[F] \in C^{-q} \setminus C^{-q+1}$, it can also be noted that not only does the asymptotic variance contain no
    time-series term, but it also has no cross-covariances across the $L$ terms in the functional
    derivative. For example, it is known that the kernel density estimates evaluated at a point $y_1$ and at a
    different point $y_2$ are asymptotically uncorrelated.

    This brings up the following remark. The slower-than-root-n functionals have a
    "local" character, such as the density evaluated at a point, or the mode of the density
    function. Consider for example the density function $f(\cdot)$, and the local functional
    $\Phi_y[F] \equiv f(y)$ (real-valued) as opposed to the global functional $\Phi[F] \equiv f(\cdot)$ ($C^s$-valued).
    Drawing from the experience of root-n functionals, it may be tempting to try to obtain weak
    convergence to a Gaussian process of $\Phi[F] \equiv f(\cdot)$. Unfortunately, no such result holds for
    slower-than-root-n functionals. Indeed, if a limiting process existed for the normalized
    kernel density estimator, this process would have to take independent values $W(t)$ and
    $W(s)$ for every $t \neq s$.

    The delta method derived here has an intuitive duality interpretation. The asymptotic
    distribution of an unsmooth functional $\Phi$ is driven by the inner product $\int_{-\infty}^{+\infty} \varphi[F](x)\, dH(x)$.
    When $\varphi[F]$ is a generalized function (in $C^{-q}$, $q \ge 2$), then $H$ must belong to $C^{q-1}$. Therefore
    one needs a sufficiently regular nonparametric estimator of the unknown cdf to plug
    in as $H = \hat F_n - F$. This is the role played by the kernel smoothing. If one uses the empirical
    distribution $F_n$ instead of the kernel cdf $\hat F_n$, then $H = F_n - F$ will be in $C^{-1}$ only, and therefore
    the only functionals that can be plugged into it must have derivatives in $C^{-1}$.

    3. Consistent Estimation of the Asymptotic Variances

    The asymptotic variances given by the delta method can be consistently estimated in

    each case:

    (i) If $\varphi[F] \in C^{-1}$, then under A4(r,m) and the technical regularity condition A5
    (given in the Appendix, and designed to guarantee that the truncated sum in the variance
    estimator will effectively approximate the infinite sum), the asymptotic variance $V_\Phi[F]$ can
    be consistently estimated by:
    $$\hat V_n \equiv \frac{1}{n}\sum_{i=1}^{n} \varphi[\hat F_n](x_i)^2 - \Big(\frac{1}{n}\sum_{i=1}^{n} \varphi[\hat F_n](x_i)\Big)^2 + 2\sum_{k=1}^{G_n}\Big(1 - \frac{k}{G_n+1}\Big)\Big\{\frac{1}{n}\sum_{i=1}^{n-k} \varphi[\hat F_n](x_i)\, \varphi[\hat F_n](x_{i+k}) - \Big(\frac{1}{n}\sum_{i=1}^{n} \varphi[\hat F_n](x_i)\Big)^2\Big\},$$
    where $G_n$ is a truncation lag chosen such that $\lim_{n \to \infty} G_n = +\infty$ and $G_n = O(n^{1/3})$. This is an
    estimator of the spectral density at zero (see Newey-West [1987], Robinson [1989,1991]).
    The choice of the truncation lag $G_n$ and of the Bartlett kernel weights $1 - k/(G_n+1)$ is subject to the same provisions
    and can be improved upon as in Andrews [1991] in the parametric case.
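    A direct implementation of this estimator is short. The following sketch is illustrative
    (the function name is ours, and phi_vals stands for the evaluated influence terms
    $\varphi[\hat F_n](x_i)$); it codes the Bartlett-weighted sum above:

        import numpy as np

        def newey_west_variance(phi_vals, G_n):
            """Bartlett/Newey-West estimate of the long-run variance of the
            influence terms phi[F_n](x_i), following the Section 3(i) formula."""
            phi = np.asarray(phi_vals, dtype=float)
            n = phi.size
            mean = phi.mean()
            v = np.mean(phi**2) - mean**2                  # lag-0 variance term
            for k in range(1, G_n + 1):
                w = 1.0 - k / (G_n + 1.0)                  # Bartlett kernel weight
                cross = np.sum(phi[:n - k] * phi[k:]) / n  # (1/n) sum phi_i phi_{i+k}
                v += 2.0 * w * (cross - mean**2)           # centered autocovariance
            return v

        # Illustrative truncation lag G_n = O(n^{1/3}):
        # G_n = int(n ** (1 / 3))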

    (ii) If $\varphi[F] \in C^{-q} \setminus C^{-q+1}$, then under A4(r+1/2, m) the asymptotic variance $V_\Phi[F]$
    can be consistently estimated by:
    $$\hat V_n \equiv \Big(\int_{-\infty}^{+\infty} \big[K^{(q-2)}(x)\big]^2\, dx\Big) \sum_{\ell=1}^{L} \big\{\alpha_\ell[\hat F_n](y_\ell)\big\}^2\, \hat f_n(y_\ell).$$

    The appropriate estimate of the asymptotic variance makes it possible to construct
    confidence intervals on $\Phi[\hat F_n]$ and carry out tests of general hypotheses regarding $\hat F_n$. For
    example, to test the hypothesis $H_0: \Phi[F] = 0$ versus $H_1: \Phi[F] \neq 0$, one could simply use
    the following Wald-type test statistic:
    $$W_n \equiv \lambda(n)\, \Phi(\hat F_n)'\, \hat V_n^{-1}\, \Phi(\hat F_n) \xrightarrow{d} \chi^2_{[1]} \text{ under } H_0,$$
    where $\lambda(n) \equiv n$ in case (i) and $\lambda(n) \equiv h_n^{2q-3}\, n$ in case (ii).

    4. Rates of Convergence

    The speed of decrease of the bandwidth to zero as the sample size increases is
    constrained by A4. The bandwidth can be chosen within the bounds allowed by A4 in
    order to generate the fastest possible rate of convergence $\beta$ (the speed of convergence being
    $n^{-\beta}$).

    (i) If $\varphi[F] \in C^{-1}$, then the plug-in will converge at rate $\beta = 1/2$. The root-n rate is
    achieved by kernel plug-ins under A3 no matter how $h_n$ is chosen within A4, and will
    produce an asymptotic distribution centered at zero.

    (ii) If $\varphi[F] \in C^{-q} \setminus C^{-q+1}$ for some $q \in [2,s]$, then the rate of convergence is at best
    $\beta = \big(r - (q-2)\big)/(2r+1)$. It can be achieved by kernel plug-ins under A3 when choosing
    $h_n$ of the order $n^{-\alpha}$, with $\alpha = 1/(2r+1)$. The resulting asymptotic distribution of the plug-
    in will not be centered at zero. For any $\varepsilon > 0$, the rate of convergence $\beta - \varepsilon$ can however be
    achieved with a resulting asymptotic distribution centered at zero, by choosing $h_n$ of the
    order $n^{-\alpha}$, with $\alpha = 1/(2r+1) + \varepsilon/(q - 3/2)$. This choice is admissible under
    A4(r+1/2, m). Given the optimal rates of Stone [1980] and Goldstein and Messer [1992], it
    therefore turns out that the kernel-type estimators can achieve the optimal rate (but if one
    insists on getting the optimal rate, then the limiting distribution is not centered at zero).
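    For concreteness, here is a worked instance of these formulas with illustrative values
    $r = 2$ (a second-order kernel) and $q = 2$ (the pointwise density of Example 5 below):
    $$\beta = \frac{r - (q-2)}{2r+1} = \frac{2}{5}, \qquad \alpha = \frac{1}{2r+1} = \frac{1}{5},$$
    so with $h_n \propto n^{-1/5}$ the density plug-in at a point converges at the familiar $n^{2/5}$ rate,
    consistent with $(n h_n)^{1/2} = n^{(1-\alpha)/2} = n^{2/5}$.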

    5. Examples and Applications

    Classical examples as well as new distributions are provided in this section, both to
    show how easily the method yields classical results and to provide new ones.

    Example 1: Ordinary Least Squares

    The following trivial example illustrates the method in a very simple case,

    recovering the asymptotic distribution of classical parametric estimators. Consider a simple

    linear model: $y_t = \beta x_t + \varepsilon_t$, $E[\varepsilon_t \mid x_t] = 0$. Although at first sight a quintessentially parametric model, the linear regression model in fact makes no assumptions whatsoever

    regarding the distribution of the disturbances (other than uncorrelatedness with the

    regressors). In that sense, the OLS estimator can be treated as a nonparametric estimator.
    OLS estimates the functional:
    $$\beta = \frac{E[XY]}{E[X^2]} \equiv \Phi(F) = \frac{\int_{-\infty}^{+\infty}\!\int_{-\infty}^{+\infty} x y\, f(x,y)\, dx\, dy}{\int_{-\infty}^{+\infty}\!\int_{-\infty}^{+\infty} x^2\, f(x,y)\, dx\, dy}$$
    by plugging into this expression the empirical cdf $F_n$:
    $$\hat\beta_{OLS} \equiv \Phi(F_n) = \frac{\frac{1}{n}\sum_{t=1}^{n} y_t x_t}{\frac{1}{n}\sum_{t=1}^{n} x_t^2}.$$

    Now compute the functional derivative of Φ:

    $$\Phi[F+H] = \frac{\int\!\!\int x y\, \{f + h\}}{\int\!\!\int x^2\, \{f + h\}} = \frac{\int\!\!\int x y\, f}{\int\!\!\int x^2 f} + \frac{\int\!\!\int x y\, h}{\int\!\!\int x^2 f} - \frac{\big(\int\!\!\int x y\, f\big)\big(\int\!\!\int x^2 h\big)}{\big(\int\!\!\int x^2 f\big)^2} + O\big(\|H\|^2_{L(2,1)}\big)$$
    $$= \Phi[F] + \int_{-\infty}^{+\infty}\!\int_{-\infty}^{+\infty} \varphi[F](x,y)\, h(x,y)\, dx\, dy + O\big(\|H\|^2_{L(2,1)}\big).$$

    So the functional $F \mapsto \Phi(F) = \beta$ is $L(2,1)$-Hadamard-differentiable at $F$, and its
    derivative is $\varphi[F](u,v) = \frac{u v}{E[X^2]} - \frac{E[XY]\, u^2}{E[X^2]^2} \in C^{-1}$. Theorem (i) gives the asymptotic
    distribution of the plug-in (using either the empirical or the kernel estimator of $F$) for
    $\varphi[F] \in C^{-1}$: $n^{1/2}\big\{\Phi(\hat F_n) - \Phi(F)\big\} \xrightarrow{d} N\big(0, V_\Phi[F]\big)$, with asymptotic variance given by:
    $$V_\Phi[F] = VAR\big(\varphi[F](x_t, y_t)\big) + 2\sum_{k=1}^{+\infty} COV\big(\varphi[F](x_t, y_t),\, \varphi[F](x_{t+k}, y_{t+k})\big).$$

    Replacing the functional derivative $\varphi[F]$ by its expression and $y_t - \beta x_t$ by $\varepsilon_t$, it is
    easy to check that this expression is equal to the classical OLS asymptotic variance
    $$V_{OLS} \equiv \big(E[x_t^2]\big)^{-2}\Big\{E\big[x_t^2\, \varepsilon_t^2\big] + 2\sum_{k=1}^{+\infty} E\big[x_t\, \varepsilon_t\, x_{t+k}\, \varepsilon_{t+k}\big]\Big\}.$$
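    As a usage sketch (illustrative only: the simulated data and the newey_west_variance helper
    from the Section 3 sketch are assumptions of the example, not part of the paper), the
    delta-method standard error for OLS can be computed directly from the estimated influence terms:

        import numpy as np

        rng = np.random.default_rng(0)
        n = 1000
        x = rng.standard_normal(n)
        y = 0.5 * x + rng.standard_normal(n)              # illustrative linear model

        beta_hat = np.sum(y * x) / np.sum(x**2)           # plug-in Phi(F_n)
        Ex2, Exy = np.mean(x**2), np.mean(x * y)
        phi = x * y / Ex2 - Exy * x**2 / Ex2**2           # phi[F_n](x_t, y_t)
        G_n = int(n ** (1 / 3))                           # truncation lag O(n^{1/3})
        v_hat = newey_west_variance(phi, G_n)             # helper from Section 3 sketch
        print(beta_hat, np.sqrt(v_hat / n))               # estimate and standard error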

    Example 2: Least Absolute Deviations

    Consider again the simple linear model $y_t = x_t'\beta + \varepsilon_t$, where $x$ is K-variate and $\beta$ is an
    unknown K-dimensional parameter vector. The identification assumption on
    $\{\varepsilon_t \,/\, t = 1, \ldots, T\}$ now consists of being independent of $\{x_t \,/\, t = 1, \ldots, T\}$ and having zero
    median. Let $f_\varepsilon$ be the marginal density of the disturbances $\varepsilon$. The least absolute deviation
    (LAD) estimator is defined by (with the minimum taken over a compact set):
    $$\hat\beta_{LAD} \equiv \arg\min_\beta\; \frac{1}{T}\sum_{t=1}^{T} \big| y_t - x_t'\beta \big|.$$
    The first order condition is $\frac{1}{T}\sum_{t=1}^{T} sign\big(y_t - x_t'\hat\beta_{LAD}\big)\, x_t \equiv 0$, where $sign(a) = -1$ if
    $a < 0$, $0$ if $a = 0$, and $+1$ if $a > 0$.

    Example 3: Censored Least Absolute Deviations

    Powell [1984] extended the LAD estimator to the case where the dependent variable
    is censored, i.e., only $y_t \equiv \max\{0,\, x_t'\beta + \varepsilon_t\}$ is observed. This case is typical of situations
    arising in a labor supply context. In that case, consider the CLAD estimator:
    $$\hat\beta_{CLAD} \equiv \arg\min_\beta\; \frac{1}{T}\sum_{t=1}^{T} \big| y_t - \max\{0,\, x_t'\beta\} \big|.$$
    Regularity conditions guaranteeing the identifiability of $\beta$, and the existence and uniqueness
    (asymptotically) of a solution, are given by Powell [1984] and also assumed here. The
    population first order condition is $E\big[sign\big(y_t - \max\{0,\, x_t'\beta\}\big)\, \mathbf{1}(x_t'\beta > 0)\, x_t\big] = 0$; let $\Phi(F) \equiv \beta$,
    so $\hat\beta_{CLAD} = \Phi(F_n)$. Now:
    $$\varphi_{CLAD}[F](x,y) = \big(2 f_\varepsilon(0)\big)^{-1}\, B^{-1}\, sign\big(y - \max\{0,\, x'\beta\}\big)\, \mathbf{1}(x'\beta > 0)\, x \in C^{-1},$$
    with $B \equiv E\big[\mathbf{1}(x_t'\beta > 0)\, x_t x_t'\big]$. Thus $T^{1/2}\big\{\hat\beta_{CLAD} - \beta\big\} \xrightarrow{d} N\big(0,\, (2 f_\varepsilon(0))^{-2}\, B^{-1} W B^{-1}\big)$, where
    $$W \equiv VAR\big(sign(\varepsilon_t)\,\mathbf{1}(x_t'\beta > 0)\, x_t\big) + \sum_{k=1}^{+\infty}\Big\{COV\big(sign(\varepsilon_t)\,\mathbf{1}(x_t'\beta > 0)\, x_t,\; sign(\varepsilon_{t+k})\,\mathbf{1}(x_{t+k}'\beta > 0)\, x_{t+k}\big)$$
    $$\qquad\qquad + COV\big(sign(\varepsilon_{t+k})\,\mathbf{1}(x_{t+k}'\beta > 0)\, x_{t+k},\; sign(\varepsilon_t)\,\mathbf{1}(x_t'\beta > 0)\, x_t\big)\Big\}.$$
    This asymptotic distribution for dependent data appears to be new.

    Example 4: Integrated Functionals

    Consider next the family of real-valued functionals of the following form, where
    $\omega(\cdot)$ is a trimming function: $\Phi(F) \equiv \int_{-\infty}^{+\infty} \omega(x)\, \psi\big(x, F^{(1)}(x), F^{(2)}(x), \ldots, F^{(m)}(x)\big)\, dx$. This class
    includes the information matrix giving the asymptotic variance of maximum likelihood
    estimators, the entropy measure, the average derivative estimators of Powell, Stock and
    Stoker [1989] and Robinson [1989], the integral of the squared density, etc. The functional
    derivative is:
    $$\varphi[F](x) = \sum_{q=1}^{m} (-1)^{q-1}\, \frac{\partial^{q-1}}{\partial x^{q-1}}\Big[\omega(x)\, \frac{\partial \psi}{\partial F^{(q)}}\big(x, F^{(1)}(x), \ldots, F^{(m)}(x)\big)\Big] \in C^{-1},$$
    so the plug-in will converge at rate root-n and have an asymptotic distribution sensitive to
    dependent data.

    Example 5: Pointwise Estimation

    Consider the classical example $\Phi_q[F] \equiv F^{(q)}(y)$, a derivative of the cdf evaluated at
    $y$. Then if $q = 0$, $\varphi_0[F](x) = \mathbf{1}(x \le y) \in C^{-1}$, while $\varphi_q[F](x) = \delta_y^{(q-1)}(x) \in C^{-q-1} \setminus C^{-q}$; hence
    for $q \ge 1$:
    $$h_n^{(2q-1)/2}\, n^{1/2}\big\{\hat F_n^{(q)}(y) - F^{(q)}(y)\big\} \xrightarrow{d} N\Big(0,\; \int_{-\infty}^{+\infty} \big| K^{(q-1)}(x) \big|^2\, dx\; f(y)\Big).$$
    The extension to multivariate data is immediate given the multivariate result in the Appendix.
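    For the density case ($q = 1$) this yields the familiar pointwise confidence interval. A minimal
    sketch (the Gaussian kernel, for which $\int K^2 = 1/(2\sqrt{\pi})$, and the bandwidth are
    illustrative assumptions):

        import numpy as np

        def density_ci(data, y, h, z=1.96):
            """Kernel density estimate at y with the Example 5 (q = 1) interval:
            f_hat(y) +/- z * sqrt( (int K^2) * f_hat(y) / (n h) )."""
            data = np.asarray(data, dtype=float)
            n = data.size
            u = (y - data) / h
            f_hat = np.mean(np.exp(-u**2 / 2)) / (h * np.sqrt(2 * np.pi))
            K2 = 1.0 / (2.0 * np.sqrt(np.pi))             # int K(x)^2 dx, Gaussian K
            se = np.sqrt(K2 * f_hat / (n * h))
            return f_hat, (f_hat - z * se, f_hat + z * se)

        # rng = np.random.default_rng(1)
        # print(density_ci(rng.standard_normal(2000), y=0.0, h=2000 ** (-1/5)))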

    Example 6: Smooth Quantiles

    Take $\Phi[F] \equiv F^{-1}(y)$ for some $y$. In the independent case, smooth estimation of
    quantiles has been studied, e.g., by Parzen [1979] and Silverman and Young [1987]. Here
    the functional derivative can be computed as:
    $$\varphi[F](x) = -\frac{\mathbf{1}\big(x \le F^{-1}(y)\big)}{F^{(1)}\big(F^{-1}(y)\big)} \in C^{-1},$$
    so the plug-in will converge at rate root-n, with time-dependent terms in its asymptotic variance.
    Letting $f_k$ be the joint density of observations at lag $k$, the asymptotic variance is:
    $$V_\Phi[F] = \frac{y(1-y) + 2\sum_{k=1}^{+\infty} \int_{-\infty}^{F^{-1}(y)}\!\int_{-\infty}^{F^{-1}(y)} \big\{f_k(s,t) - f(s) f(t)\big\}\, ds\, dt}{\big[F^{(1)}\big(F^{-1}(y)\big)\big]^2}.$$
    This result also appears to be new. Weak convergence of the quantile process to a
    Gaussian process is proved using the same method (see Aït-Sahalia [1993]).
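    A minimal sketch of the smoothed-quantile plug-in (bisection on the kernel cdf; the Gaussian
    kernel cdf and the bracketing interval are illustrative choices):

        import numpy as np
        from math import erf, sqrt

        def kernel_cdf(data, x, h):
            """Smoothed cdf estimate F_hat_n(x), Gaussian kernel."""
            u = (x - np.asarray(data, dtype=float)) / h
            return float(np.mean([0.5 * (1 + erf(v / sqrt(2))) for v in u]))

        def smooth_quantile(data, y, h, tol=1e-8):
            """Plug-in Phi(F_hat_n) = F_hat_n^{-1}(y) by bisection, for 0 < y < 1."""
            data = np.asarray(data, dtype=float)
            lo, hi = data.min() - 10 * h, data.max() + 10 * h
            while hi - lo > tol:
                mid = 0.5 * (lo + hi)
                if kernel_cdf(data, mid, h) < y:
                    lo = mid
                else:
                    hi = mid
            return 0.5 * (lo + hi)

        # rng = np.random.default_rng(2)
        # print(smooth_quantile(rng.standard_normal(1000), y=0.5, h=0.3))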

    Example 7: Mode

    The mode of a unimodal univariate density, studied by Parzen [1962] for i.i.d.
    data, can be obtained by the following functional: $\Phi(F) \equiv [F^{(2)}]^{-1}(0)$, that is, the point at
    which the derivative of the density is zero. The functional derivative can be computed here:
    $$\varphi[F](x) = -\frac{1}{F^{(3)}\big([F^{(2)}]^{-1}(0)\big)}\, \delta^{(1)}_{[F^{(2)}]^{-1}(0)}(x) \in C^{-3} \setminus C^{-2},$$
    so it follows that:
    $$h_n^{3/2}\, n^{1/2}\big\{[\hat F_n^{(2)}]^{-1}(0) - [F^{(2)}]^{-1}(0)\big\} \xrightarrow{d} N\Big(0,\; \int_{-\infty}^{+\infty} \big| K^{(1)}(x) \big|^2\, dx\; \frac{F^{(1)}\big([F^{(2)}]^{-1}(0)\big)}{\big[F^{(3)}\big([F^{(2)}]^{-1}(0)\big)\big]^2}\Big).$$
    This result appears to be new for dependent data.
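    A minimal sketch of the plug-in mode, maximizing the kernel density over a grid (the grid
    resolution and bandwidth are illustrative choices):

        import numpy as np

        def mode_estimate(data, h, n_grid=2001):
            """Plug-in mode Phi(F_hat_n): argmax of the kernel density estimate."""
            data = np.asarray(data, dtype=float)
            grid = np.linspace(data.min(), data.max(), n_grid)
            u = (grid[:, None] - data[None, :]) / h
            f_hat = np.exp(-u**2 / 2).mean(axis=1) / (h * np.sqrt(2 * np.pi))
            return grid[np.argmax(f_hat)]

        # rng = np.random.default_rng(3)
        # print(mode_estimate(rng.normal(1.0, 2.0, size=5000), h=0.4))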

    Example 8: Hazard Rate

    Consider $\Phi[F] \equiv \frac{F^{(1)}(y)}{1 - F(y)}$ for some fixed $y$. Its kernel estimation has been studied
    by Roussas [1989]. Hazard rates are typically useful in unemployment studies. Here the
    derivative can easily be computed:
    $$\varphi[F](x) = \frac{\delta_y(x)}{1 - F(y)} - \frac{F^{(1)}(y)\,\mathbf{1}(x \ge y)}{\big(1 - F(y)\big)^2} \in C^{-2} \setminus C^{-1},$$
    and therefore:
    $$h_n^{1/2}\, n^{1/2}\Big\{\frac{\hat f_n(y)}{1 - \hat F_n(y)} - \frac{f(y)}{1 - F(y)}\Big\} \xrightarrow{d} N\Big(0,\; \int_{-\infty}^{+\infty} \big| K(x) \big|^2\, dx\; \frac{f(y)}{\big(1 - F(y)\big)^2}\Big).$$

    Example 9: Regression Function

    The Nadaraya-Watson method relies on a kernel plug-in to estimate
    $$\Phi[F] \equiv E[Z \mid Y = y] = \frac{\int_{-\infty}^{+\infty} z\, f(y,z)\, dz}{\int_{-\infty}^{+\infty} f(y,z)\, dz}.$$
    The asymptotic distributions of
    Robinson [1983] and Bierens [1985] can be recovered by computing the functional
    derivative:
    $$\varphi[F](w,z) = \frac{z - a[F]}{b[F]}\, \delta_y(w) \in C^{-2} \setminus C^{-1},$$
    where $a[F] \equiv E[Z \mid Y = y]$ and $b[F] \equiv \int_{-\infty}^{+\infty} f(y,z)\, dz$. Hence, for $\varepsilon \equiv Z - E[Z \mid Y = y]$ and $k$ regressors in $Y$:
    $$h_n^{k/2}\, n^{1/2}\big\{\Phi(\hat F_n) - \Phi(F)\big\} \xrightarrow{d} N\Big(0,\; \frac{E\big[\varepsilon^2 \mid Y = y\big]\, \int_{-\infty}^{+\infty} \big| K(w) \big|^2\, dw}{\int_{-\infty}^{+\infty} f(y,z)\, dz}\Big).$$
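    A minimal sketch of the Nadaraya-Watson plug-in itself (one regressor, Gaussian kernel; all
    choices illustrative):

        import numpy as np

        def nadaraya_watson(y_data, z_data, y, h):
            """Kernel regression estimate of E[Z | Y = y]: the ratio of
            kernel-smoothed sums, i.e., the plug-in of Example 9."""
            u = (y - np.asarray(y_data, dtype=float)) / h
            w = np.exp(-u**2 / 2)                         # Gaussian kernel weights
            return float(np.sum(w * np.asarray(z_data, dtype=float)) / np.sum(w))

        # rng = np.random.default_rng(4)
        # yd = rng.uniform(-2, 2, 1000)
        # zd = np.sin(yd) + 0.3 * rng.standard_normal(1000)
        # print(nadaraya_watson(yd, zd, y=1.0, h=0.2))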


    6. Conclusions

    This paper has extended the delta method to nonparametric estimators of unsmooth

    functionals. The regularity conditions are simple, easily verifiable, and generally match or
    improve on the conditions used in case-by-case studies. Generalized derivatives were allowed to

    permit the inclusion of virtually any functional, global or pointwise, explicitly or implicitly

    defined. It was found here that both the rate of convergence to the asymptotic distribution

    and the asymptotic variance were functions of the unsmoothness of the functional

    derivative. Basing the estimator on dependent data modifies the asymptotic distribution

    only if the functional is more irregular than some threshold level (cadlag). New functional

    derivatives were computed for a variety of practical estimators used in econometrics, and

    used to obtain their asymptotic distributions straightforwardly.

    Compared to the case-by-case approach, the generalized delta method has another

    advantage. It isolates the computation of the functional derivative, which is computed once

    and for all. When considering dependent sequences, or nonparametric estimation strategies

    other than kernel-based, the exact same functional derivative will be needed. The kernel

    results of this paper could therefore potentially be extended to cover other nonparametric

    methods. Many popular nonparametric procedures for density estimation are indeed of the
    form $\hat f_n(u) \equiv \frac{1}{n}\sum_{i=1}^{n} K_n(u, x_i)$. For example, the kernel method sets
    $K_n(u, x_i) \equiv \frac{1}{h_n^d}\, K\big(\frac{u - x_i}{h_n}\big)$ with a fixed function $K(\cdot)$, while the orthogonal function
    method is based on $K_n(u, x_i) \equiv \sum_{j=1}^{1/h_n} p_j(u)\, p_j(x_i)$, where $\{p_j\}$ is a system of orthogonal
    functions in $L^2(\mathbb{R}^d)$.
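    To make the contrast concrete, here is a minimal sketch of the two choices of $K_n$ (the
    cosine system on $[0,1]$ is an illustrative orthogonal basis; $J$ plays the role of $1/h_n$):

        import numpy as np

        def kernel_Kn(u, x, h):
            """Kernel choice: K_n(u, x) = K((u - x)/h) / h with Gaussian K, d = 1."""
            return np.exp(-((u - x) / h) ** 2 / 2) / (h * np.sqrt(2 * np.pi))

        def cosine_Kn(u, x, J):
            """Orthogonal-series choice on [0,1]: K_n(u, x) = sum_j p_j(u) p_j(x),
            with p_0 = 1 and p_j(t) = sqrt(2) cos(pi j t)."""
            j = np.arange(1, J + 1)
            return 1.0 + 2.0 * np.sum(np.cos(np.pi * j * u) * np.cos(np.pi * j * x))

        # Either choice gives a density estimate f_hat(u) as the sample average of
        # K_n(u, x_i):
        # data = np.random.default_rng(5).uniform(size=500)
        # print(np.mean([cosine_Kn(0.5, xi, J=8) for xi in data]))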

    References

    Aït-Sahalia, Y., [1993], "Nonparametric Functional Estimation with Applications to Financial Models," Ph.D. Thesis, MIT, May.

    Andrews, D.W.K., [1991], "Heteroskedasticity and Autocorrelation Consistent Covariance Matrix Estimation," Econometrica, Vol. 59, No. 3, 817-858.

    Arcones, M.A., and Yu, B., [1992], "Central Limit Theorems for Empirical and U-Processes of Stationary Mixing Sequences," MSRI Mimeo, U.C. Berkeley.

    Bierens, H.J., [1985], "Kernel Estimators of Regression Functions," in Bewley, T.F., ed., Advances in Econometrics, Fifth World Congress, Vol. I, Econometric Society Monographs, Cambridge University Press, Cambridge, England.

    Billingsley, P., [1968], Convergence of Probability Measures, Wiley, New York.

    Dudley, R.M., [1990], "Nonlinear Functionals of Empirical Measures and the Bootstrap," in Probability in Banach Spaces 7, ed. E. Eberlein et al., Progress in Probability 21, 63-82, Birkhäuser, Boston.

    Fernholz, L.T., [1983], Von Mises Calculus for Statistical Functionals, Lecture Notes in Statistics 19, Springer-Verlag.

    Gill, R.D., [1989], "Non- and Semi-parametric Maximum Likelihood Estimators and the von Mises Method (Part 1)," Scandinavian Journal of Statistics, Vol. 16, 97-128.

    Goldstein, L. and Messer, P., [1992], "Optimal Plug-In Estimators for Nonparametric Functional Estimation," Annals of Statistics, Vol. 20, 1306-1328.

    Györfi, L., Härdle, W., Sarda, P. and Vieu, P., [1989], Nonparametric Curve Estimation from Time Series, Lecture Notes in Statistics 60, Springer-Verlag.

    Masry, E., [1989], "Nonparametric Estimation of Conditional Probability Densities and Expectations of Stationary Processes: Strong Consistency and Rates," Stochastic Processes and their Applications, 32, 109-127.

    von Mises, R., [1947], "On the Asymptotic Distribution of Differentiable Statistical Functions," Annals of Mathematical Statistics, Vol. 18, 309-348.

    Newey, W.K. and West, K.D., [1987], "A Simple, Positive Semi-Definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix," Econometrica, Vol. 55, No. 3, 703-708.

    Pakes, A. and Pollard, D., [1989], "Simulation and the Asymptotics of Optimization Estimators," Econometrica, Vol. 57, No. 5, 1025-1057.

    Parzen, E., [1962], "On Estimation of a Probability Density Function and the Mode," Annals of Mathematical Statistics, 33, 1065-1076.

    ------------, [1979], "Nonparametric Statistical Data Modeling," Journal of the American Statistical Association, 74, 105-131.

    Phillips, P.C.B., [1991], "A Shortcut to LAD Estimator Asymptotics," Econometric Theory, 7, 450-463.

    Pollard, D., [1990], "Asymptotics for Least Absolute Deviation Estimator," Econometric Theory, 6.

    ------------, [1984], Convergence of Stochastic Processes, Springer, New York.

    Powell, J.L., [1984], "Least Absolute Deviations Estimation for the Censored Regression Model," Journal of Econometrics, Vol. 25, 303-325.

    Powell, J.L., Stock, J.H. and Stoker, T.M., [1989], "Semiparametric Estimation of Index Coefficients," Econometrica, Vol. 57, No. 6, 1403-1430.

    Reeds, J.A. III, [1976], "On the Definition of Von Mises Functionals," Ph.D. Thesis, Harvard University, Department of Statistics.

    Robinson, P.M., [1991], "Automatic Frequency Domain Inference on Semiparametric and Nonparametric Models," Econometrica, Vol. 59, No. 5, 1329-1363.

    ------------, [1989], "Hypothesis Testing in Semiparametric and Nonparametric Models for Econometric Time Series," Review of Economic Studies, Vol. 56, 511-534.

    ------------, [1988], "Root-N Consistent Semiparametric Regression," Econometrica, Vol. 56, 931-954.

    ------------, [1984], "Robust Nonparametric Autoregression," in Robust and Nonlinear Time Series Analysis, Franke, Härdle and Martin eds., Lecture Notes in Statistics 26, Springer-Verlag, Heidelberg.

    ------------, [1983], "Nonparametric Estimators for Time Series," Journal of Time Series Analysis, 4, 185-207.

    Rosenblatt, M., [1991], Stochastic Curve Estimation, NSF-CMBS Regional Conference Series in Probability and Statistics, Vol. 3, Institute of Mathematical Statistics, Hayward.

    ------------, [1971], "Curve Estimates," Annals of Mathematical Statistics, Vol. 42, 1815-1842.

    Roussas, G., [1969], "Nonparametric Estimation in Markov Processes," Annals of the Institute of Statistical Mathematics, Vol. 21, 73-87.

    ------------, [1989], "Hazard Rate Estimation Under Dependence Conditions," Journal of Statistical Planning and Inference, 22, 81-93.

    Schwartz, L., [1954, 1966], Theory of Distributions, Hermann, Paris.

    Stone, C.J., [1980], "Optimal Convergence Rates for Nonparametric Estimators," Annals of Statistics, Vol. 8, No. 6, 1348-1360.

    Zemanian, A.H., [1965], Distribution Theory and Transform Analysis, McGraw-Hill, New York.

    Appendix

    The statement of the Theorem in the multivariate case is the following:

    (i) If $\varphi[F] \in C^{-1}$, the result reads the same in dimension $d$;

    (ii) If $\varphi[F] \in C^{-q} \setminus C^{-q+1}$: let
    $$\varphi[F](x) = \sum_{\ell=1}^{L} \alpha_\ell[F](x)\, \partial^{\Delta_\ell} \delta_{y_\ell}(x_\ell) + B[F](x),$$
    where $|\Delta_\ell| = q - 2$, $\alpha_\ell[F](\cdot) \in C^{-1}$ and $B[F](\cdot) \in C^{-q+1}$. $y_\ell \in \mathbb{R}^{d(\ell)}$ contains $d(\ell)$ components, and
    $x = (x_\ell, x_{-\ell})$ is partitioned accordingly. The maximal number of variables affected by the
    Dirac mass is $d^* \equiv \max\big\{d(\ell) \,/\, \ell \in \{1, \ldots, L\}\big\}$ and is attained at $L^* \equiv \big\{\ell \in \{1, \ldots, L\} \,/\, d(\ell) = d^*\big\}$.
    Then under A4(r+d*/2, m): $h_n^{(d^* + 2q - 4)/2}\, n^{1/2}\big\{\Phi(\hat F_n) - \Phi(F)\big\} \xrightarrow{d} N\big(0, V_\Phi[F]\big)$,
    where:
    $$V_\Phi[F] = \sum_{\ell \in L^*} \Big[\int_{-\infty}^{+\infty} \big(\partial^{\Delta_\ell} K^{(\ell)}(u_\ell)\big)^2\, du_\ell\Big] \int_{-\infty}^{+\infty} \big\{\alpha_\ell[F](y_\ell, t_{-\ell})\big\}^2\, f(y_\ell, t_{-\ell})\, dt_{-\ell}$$
    and $K^{(\ell)}(\cdot) \equiv \int_{-\infty}^{+\infty} K(\cdot, v_{-\ell})\, dv_{-\ell}$.

    Proof of Theorem: The following two lemmas will be used:

    Lemma 1 (Central Limit Theorem): Under A1-A4($\cdot$,0), $\hat{\hat A}_n \equiv n^{1/2}\big(\hat F_n - E[\hat F_n]\big)$
    converges in law to a Gaussian $C^0$-stochastic process $G_F$ in the space $\big(C^0, \|\cdot\|_\infty\big)$,
    with finite-dimensional covariances given below. If A4($\cdot$,0) is replaced by the more
    stringent requirement A4(r,0), then the preceding statements hold for the centered process
    $\hat A_n \equiv n^{1/2}\big(\hat F_n - F\big)$ instead of $\hat{\hat A}_n \equiv n^{1/2}\big(\hat F_n - E[\hat F_n]\big)$. The covariance kernel of the
    generalized Brownian bridge $G_F \equiv \tilde B \circ F$ is given by (where $F_k$ is the joint cdf of
    observations at lag $k$):
    $$E\big[\tilde B(F(s))\, \tilde B(F(t))\big] = \min\big(F(s), F(t)\big) - F(s)\, F(t) + \sum_{k=1}^{+\infty} \big\{F_k(s,t) + F_k(t,s) - 2\, F(s)\, F(t)\big\}.$$

    Lemma 2 (Bounds for Remainder Term): Under A1-A3, for $q = 0, \ldots, d+s$:
    $$\big\|\hat F_n - E[\hat F_n]\big\|_{L(2,q)} = O_p\big((n^{1/2}\, h_n^{q})^{-1}\big) \quad \text{and} \quad \big\|E[\hat F_n] - F\big\|_{L(2,q)} = O\big(h_n^{\,r - (q - d)}\big).$$

    To prove Lemma 1, show that the class of functions $\Gamma \equiv \big\{W_{y,h} \,/\, y \in \mathbb{R}^d,\, h \in \mathbb{R}^{+*}\big\}$,
    where $W_{y,h}(\cdot) \equiv \frac{1}{h^d} \int_{-\infty}^{y} K\big(\frac{t - \cdot}{h}\big)\, dt$, forms a subgraph VC class. But such a class is a
    Euclidean class (from Lemma (2.12) in Pakes and Pollard [1989]); conclude with Theorem
    1 of Arcones and Yu [1992]. Alternatively one could use the U-statistics approach of
    Robinson [1989]. Lemma 2 is easy given A2. Details are in Aït-Sahalia [1993].

    (i) Consider now the first part of the Theorem. By differentiability of the functional
    $\Phi$ at $F$:
    $$n^{1/2}\big\{\Phi[\hat F_n] - \Phi[F]\big\} = \int_{-\infty}^{+\infty} \varphi[F](x)\, d\hat A_n(x) + R_\Phi[F, \hat A_n],$$
    where $R_\Phi[F, \hat A_n] = O\big(n^{1/2}\, \|\hat F_n - F\|^2_{L(2,m)}\big)$. First, $R_\Phi[F, \hat A_n] = o_p(1)$ follows from
    Lemma 2, since $n^{1/2}\, \|\hat F_n - F\|^2_{L(2,m)} = O_p\big(n^{-1/2}\, h_n^{-2m} + n^{1/2}\, h_n^{2(r - (m-d))}\big)$ is $o_p(1)$ under
    A4(r,m) as $r > 2(m - d)$. Then by Slutsky's Theorem, the distribution of $n^{1/2}\{\Phi[\hat F_n] - \Phi[F]\}$
    is given by that of $\int_{-\infty}^{+\infty} \varphi[F](x)\, d\hat A_n(x)$.

    But by the continuous mapping theorem (e.g., Proposition 9.3.7 in Dudley
    [1989]), $\int_{-\infty}^{+\infty} \varphi[F](x)\, d\hat A_n(x)$ converges in law to $\int_{-\infty}^{+\infty} \varphi[F](x)\, d\tilde B(F(x))$, since from Lemma 1
    $\hat A_n$ converges in law to the process $G_F \equiv \tilde B \circ F$. $\int_{-\infty}^{+\infty} \varphi[F](x)\, d\tilde B(F(x))$ is the Itô integral of
    the real-valued, non-random function $\varphi[F]$ with respect to the Gaussian stochastic process
    $\tilde B \circ F$ and is therefore normally distributed. The asymptotic variance of the generic Itô
    integral $\int_{-\infty}^{+\infty} \omega(x)\, d\tilde B(F(x))$ can be computed as:
    $$E\Big[\Big(\int_{-\infty}^{+\infty} \omega(x)\, d\tilde B(F(x))\Big)^2\Big] = \int_{-\infty}^{+\infty}\!\int_{-\infty}^{+\infty} \omega(x)\, \omega(y)\, E\big[d\tilde B(F(y))\, d\tilde B(F(x))\big]$$
    $$= \int_{-\infty}^{+\infty}\!\int_{-\infty}^{+\infty} \omega(x)\, \omega(y)\, \frac{\partial^{2d} E\big[\tilde B(F(y))\, \tilde B(F(x))\big]}{\partial x_1 \cdots \partial x_d\, \partial y_1 \cdots \partial y_d}\, dy\, dx.$$
    Thus:
    $$V_\Phi[F] = E\big[\varphi[F](X_t)^2\big] - E\big[\varphi[F](X_t)\big]^2 + 2 \sum_{k=1}^{+\infty} \Big\{E\big[\varphi[F](X_t)\, \varphi[F](X_{t+k})\big] - E\big[\varphi[F](X_t)\big]\, E\big[\varphi[F](X_{t+k})\big]\Big\}.$$

    (ii) Consider now the case where:
    $$\varphi[F](x) = \alpha_\ell[F](x)\, \partial^{\Delta_\ell} \delta_{y_\ell}(x_\ell) \in C^{-q} \setminus C^{-q+1}, \qquad |\Delta_\ell| = q - 2,\ 2 \le q \le s.$$
    The remainder term in the expansion of the functional is bounded as in (i). Then by
    Slutsky's Theorem the asymptotic distribution is given by the linear term (scaled for now at
    the rate $n^{1/2}$):
    $$\int_{-\infty}^{+\infty} \partial^{\Delta_\ell} \delta_{y_\ell}(x_\ell)\, \alpha_\ell[F](x)\, d\hat A_n(x) = \frac{1}{h_n^d} \int_{-\infty}^{+\infty}\!\int_{-\infty}^{+\infty} \partial^{\Delta_\ell}_{x_\ell}\Big[\alpha_\ell[F](x_\ell, x_{-\ell})\, K\Big(\frac{x_\ell - t_\ell}{h_n}, \frac{x_{-\ell} - t_{-\ell}}{h_n}\Big)\Big]\Big|_{x_\ell = y_\ell} dx_{-\ell}\; a_n(t)\, dt$$
    $$\quad + n^{1/2}\Big\{\frac{1}{h_n^d} \int_{-\infty}^{+\infty}\!\int_{-\infty}^{+\infty} \partial^{\Delta_\ell}_{x_\ell}\Big[\alpha_\ell[F](x_\ell, x_{-\ell})\, K\Big(\frac{x_\ell - t_\ell}{h_n}, \frac{x_{-\ell} - t_{-\ell}}{h_n}\Big)\Big]\Big|_{x_\ell = y_\ell} dx_{-\ell}\, f(t)\, dt - \int_{-\infty}^{+\infty} \partial^{\Delta_\ell}_{x_\ell}\big[\alpha_\ell[F](x_\ell, x_{-\ell})\, f(x)\big]\Big|_{x_\ell = y_\ell} dx_{-\ell}\Big\}$$
    $$= \frac{1}{h_n^d} \int_{-\infty}^{+\infty}\!\int_{-\infty}^{+\infty} \partial^{\Delta_\ell}_{x_\ell}\Big[\alpha_\ell[F](x_\ell, x_{-\ell})\, K\Big(\frac{x_\ell - t_\ell}{h_n}, \frac{x_{-\ell} - t_{-\ell}}{h_n}\Big)\Big]\Big|_{x_\ell = y_\ell} dx_{-\ell}\; a_n(t)\, dt + O_p\big(n^{1/2}\, h_n^{r - (q-2)}\big).$$

    Write the leading term as $\int_{-\infty}^{+\infty} \omega_n(t)\, a_n(t)\, dt$, where $a_n(t)\, dt = dA_n(t)$ and
    $A_n \equiv n^{1/2}(F_n - F)$. Then let $\nu_n(t) \equiv h_n^{(d(\ell) + 2|\Delta_\ell|)/2}\, \omega_n(t)$. Next show that:
    $$E\Big[\Big(h_n^{(d(\ell) + 2|\Delta_\ell|)/2} \int_{-\infty}^{+\infty} \omega_n(t)\, dA_n(t)\Big)^2\Big] = \Big[\int_{-\infty}^{+\infty} \big(\partial^{\Delta_\ell} K^{(\ell)}(u_\ell)\big)^2\, du_\ell\Big] \int_{-\infty}^{+\infty} \big\{\alpha_\ell[F](y_\ell, t_{-\ell})\big\}^2\, f(y_\ell, t_{-\ell})\, dt_{-\ell} + o(1).$$
    But:
    $$\int_{-\infty}^{+\infty} \omega_n(x)\, dA_n(x) = \frac{1}{n^{1/2}} \sum_{i=1}^{n} \omega_n(x_i) - n^{1/2} \int_{-\infty}^{+\infty} \omega_n(x)\, f(x)\, dx = \frac{1}{n^{1/2}} \sum_{i=1}^{n} \big\{\omega_n(x_i) - E[\omega_n(x_i)]\big\},$$
    so:
    $$E\Big[\Big(\int_{-\infty}^{+\infty} \omega_n(t)\, dA_n(t)\Big)^2\Big] = \int_{-\infty}^{+\infty} \omega_n(t)^2\, f(t)\, dt - \Big(\int_{-\infty}^{+\infty} \omega_n(t)\, f(t)\, dt\Big)^2 + 2 \sum_{k=1}^{n-1} \Big(1 - \frac{k}{n}\Big) \int_{-\infty}^{+\infty}\!\int_{-\infty}^{+\infty} \big\{f_k(t,s) - f(t)\, f(s)\big\}\, \omega_n(t)\, \omega_n(s)\, dt\, ds.$$
    The first of the three terms above is $O\big(h_n^{-(2|\Delta_\ell| + d(\ell))}\big)$, while the other two are
    $o\big(h_n^{-(2|\Delta_\ell| + d(\ell))}\big)$. In particular, the "time-series" term containing the sum over time lags is
    of lower order than the first term. The computations are very similar for all three terms; for
    example, for the first term:
    $$\int_{-\infty}^{+\infty} \omega_n(t)^2\, f(t)\, dt = \int_{-\infty}^{+\infty} \Big(\frac{1}{h_n^d} \int_{-\infty}^{+\infty} \partial^{\Delta_\ell}_{x_\ell}\Big[\alpha_\ell[F](x_\ell, x_{-\ell})\, K\Big(\frac{x_\ell - t_\ell}{h_n}, \frac{x_{-\ell} - t_{-\ell}}{h_n}\Big)\Big]\Big|_{x_\ell = y_\ell} dx_{-\ell}\Big)^2 f(t)\, dt,$$
    and, expanding the derivative of the product over multi-indices $\Theta_\ell \le \Delta_\ell$ by Leibniz's rule, because
    $$\partial^{\Delta_\ell - \Theta_\ell}_{x_\ell} K\Big(\frac{x_\ell - t_\ell}{h_n}, \frac{x_{-\ell} - t_{-\ell}}{h_n}\Big) = \frac{1}{h_n^{|\Delta_\ell - \Theta_\ell|}}\, \big(\partial^{\Delta_\ell - \Theta_\ell} K\big)\Big(\frac{x_\ell - t_\ell}{h_n}, \frac{x_{-\ell} - t_{-\ell}}{h_n}\Big),$$
    the term of highest order corresponds to $\Theta_\ell = 0$, and is therefore given by:
    $$\int_{-\infty}^{+\infty} \Big(\frac{1}{h_n^d} \int_{-\infty}^{+\infty} \alpha_\ell[F](y_\ell, x_{-\ell})\, \frac{1}{h_n^{|\Delta_\ell|}}\, \big(\partial^{\Delta_\ell} K\big)\Big(\frac{y_\ell - t_\ell}{h_n}, \frac{x_{-\ell} - t_{-\ell}}{h_n}\Big)\, dx_{-\ell}\Big)^2 f(t)\, dt$$
    $$= \frac{1}{h_n^{d(\ell) + 2|\Delta_\ell|}} \Big(\Big[\int_{-\infty}^{+\infty} \big(\partial^{\Delta_\ell} K^{(\ell)}(u_\ell)\big)^2\, du_\ell\Big] \int_{-\infty}^{+\infty} \big\{\alpha_\ell[F](y_\ell, t_{-\ell})\big\}^2\, f(y_\ell, t_{-\ell})\, dt_{-\ell} + o(1)\Big).$$
    Hence:
    $$E\Big[\Big(\int_{-\infty}^{+\infty} \nu_n(t)\, dA_n(t)\Big)^2\Big] = \Big[\int_{-\infty}^{+\infty} \big(\partial^{\Delta_\ell} K^{(\ell)}(u_\ell)\big)^2\, du_\ell\Big] \int_{-\infty}^{+\infty} \big\{\alpha_\ell[F](y_\ell, t_{-\ell})\big\}^2\, f(y_\ell, t_{-\ell})\, dt_{-\ell} + o(1) \equiv V_\Phi[F] + o(1).$$
    Therefore
    $$\int_{-\infty}^{+\infty} \nu_n(x)\, dA_n(x) = \frac{1}{n^{1/2}} \sum_{i=1}^{n} \big\{\nu_n(x_i) - E[\nu_n(x_i)]\big\} \xrightarrow{d} N\big(0, V_\Phi[F]\big)$$
    by the Central Limit Theorem. But the first order Taylor expansion of the functional yields:
    $$n^{1/2}\big\{\Phi(\hat F_n) - \Phi(F)\big\} = \int_{-\infty}^{+\infty} \omega_n(t)\, dA_n(t) + O_p\big(n^{1/2}\, h_n^{r - (q-2)}\big) + O_p\big(n^{1/2}\, \|\hat F_n - F\|^2_{L(2,m)}\big).$$

    Suppose that $d(\ell) = d^*$ (when there is more than one $\ell$, the only terms that matter
    are those corresponding to $\ell \in L^*$). Then under Assumption A4(r+d*/2, m),
    $h_n^{(d(\ell) + 2|\Delta_\ell|)/2}\, O_p\big(n^{1/2}\, h_n^{r - (q-2)}\big) = o_p(1)$ and the remainder term is also $o_p(1)$. Therefore:
    $$h_n^{(d(\ell) + 2|\Delta_\ell|)/2}\, n^{1/2}\big\{\Phi(\hat F_n) - \Phi(F)\big\} = \int_{-\infty}^{+\infty} \nu_n(t)\, a_n(t)\, dt + o_p(1)$$
    $$\xrightarrow{d} N\Big(0,\; \Big[\int_{-\infty}^{+\infty} \big(\partial^{\Delta_\ell} K^{(\ell)}(u_\ell)\big)^2\, du_\ell\Big] \int_{-\infty}^{+\infty} \big\{\alpha_\ell[F](y_\ell, t_{-\ell})\big\}^2\, f(y_\ell, t_{-\ell})\, dt_{-\ell}\Big).$$

    Under Assumption A4(r,m) only, this will be the asymptotic distribution of
    $h_n^{(d(\ell) + 2|\Delta_\ell|)/2}\, n^{1/2}\big\{\Phi(\hat F_n) - \Phi(E[\hat F_n])\big\}$ instead of $h_n^{(d(\ell) + 2|\Delta_\ell|)/2}\, n^{1/2}\big\{\Phi(\hat F_n) - \Phi(F)\big\}$. Indeed:
    $E[\Phi(\hat F_n)] - \Phi(F) = O\big(h_n^{r - (q-2)}\big)$. Under A4(r+d*/2, m), this asymptotic bias term, once
    multiplied by $h_n^{(d(\ell) + 2|\Delta_\ell|)/2}\, n^{1/2}$, is $o(1)$.

    The absence of a covariance at the same order $O\big(h_n^{-(d(\ell) + 2|\Delta_\ell|)}\big)$ between terms
    associated with different $\ell$ yields:
    $$h_n^{(d^* + 2(q-2))/2}\, n^{1/2}\big\{\Phi(\hat F_n) - \Phi(F)\big\} \xrightarrow{d} N\Big(0,\; \sum_{\ell \in L^*} \Big[\int_{-\infty}^{+\infty} \big(\partial^{\Delta_\ell} K^{(\ell)}(u_\ell)\big)^2\, du_\ell\Big] \int_{-\infty}^{+\infty} \big\{\alpha_\ell[F](y_\ell, t_{-\ell})\big\}^2\, f(y_\ell, t_{-\ell})\, dt_{-\ell}\Big).$$

    If present, the term $B[F](\cdot) \in C^{-q+1}$ only contributes terms of higher order in
    powers of $h_n^{-1}$ and therefore does not change the asymptotic distribution.

    Estimation of the Asymptotic Variance: The additional technical regularity
    condition is A5: Let $V_i \equiv V_i[F] \equiv \varphi[F](x_i)$ and $\hat V_i \equiv \varphi[\hat F_n](x_i)$. Assume:
    (1) $E\big[|V_i|^{3+\delta}\big] < \infty$, where $\delta > (3+\varepsilon)/(3+2\varepsilon)$ for some $\varepsilon > 0$;
    (2) $E\big[\sup_{G \in N} V_i[G]^2\big] < \infty$, where $N$ is a neighborhood of the true cdf $F$;
    (3) $E\big[(V_i[G] - V_i[H])^2\big] \le C\, \|G - H\|^2_{L(2,m)}$ for $G$ and $H$ in $N$.

    3+[ ] < ∞δ where δ ε ε ε> +( ) +( )3 3 2/ for some ε>0;(2) E V GG isup ∈ [ ][ ] < ∞Ν 2 where N is a neighborhood of the true cdf F;(3) E V G V H C G Hi i L m[ ] − [ ][ ] ≤ − ∞( )2 2 , for G and H in N.