density estimation for associated sampling: a point process influenced approach

This article was downloaded by: [Kungliga Tekniska Hogskola]On: 07 October 2014, At: 12:02Publisher: Taylor & FrancisInforma Ltd Registered in England and Wales Registered Number: 1072954 Registeredoffice: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK

Journal of Nonparametric StatisticsPublication details, including instructions for authors andsubscription information:http://www.tandfonline.com/loi/gnst20

Density estimation for associatedsampling: A point process influencedapproachPaulo Eduardo Oliveira aa Dep. Matemática , Universidade de Coimbra , Apartado 3008,Coimbra, 3001-454, PortugalPublished online: 27 Oct 2010.

To cite this article: Paulo Eduardo Oliveira (2002) Density estimation for associated sampling:A point process influenced approach, Journal of Nonparametric Statistics, 14:5, 495-509, DOI:10.1080/10485250213904

To link to this article: http://dx.doi.org/10.1080/10485250213904

PLEASE SCROLL DOWN FOR ARTICLE

Taylor & Francis makes every effort to ensure the accuracy of all the information (the“Content”) contained in the publications on our platform. However, Taylor & Francis,our agents, and our licensors make no representations or warranties whatsoever as tothe accuracy, completeness, or suitability for any purpose of the Content. Any opinionsand views expressed in this publication are the opinions and views of the authors,and are not the views of or endorsed by Taylor & Francis. The accuracy of the Contentshould not be relied upon and should be independently verified with primary sourcesof information. Taylor and Francis shall not be liable for any losses, actions, claims,proceedings, demands, costs, expenses, damages, and other liabilities whatsoever orhowsoever caused arising directly or indirectly in connection with, in relation to or arisingout of the use of the Content.

This article may be used for research, teaching, and private study purposes. Anysubstantial or systematic reproduction, redistribution, reselling, loan, sub-licensing,systematic supply, or distribution in any form to anyone is expressly forbidden. Terms &Conditions of access and use can be found at http://www.tandfonline.com/page/terms-and-conditions

http://www.tandfonline.com/loi/gnst20

http://www.tandfonline.com/action/showCitFormats?doi=10.1080/10485250213904

http://dx.doi.org/10.1080/10485250213904

http://www.tandfonline.com/page/terms-and-conditions

http://www.tandfonline.com/page/terms-and-conditions

Nonparametric Statistics, 2002, Vol. 14(5), pp. 495–509

DENSITY ESTIMATION FOR ASSOCIATEDSAMPLING: A POINT PROCESS

INFLUENCED APPROACH

PAULO EDUARDO OLIVEIRA

Dep. Matematica, Universidade de Coimbra, Apartado 3008, 3001-454 Coimbra, Portugal

(Received May 2000; In final form December 2001)

Let Xn, n 2 N, be a sequence of associated variables with common density function. We study the kernel estimationof this density, based on the given sequence of variables. Sufficient conditions are given for the consistency andasymptotic normality of the kernel estimator. The assumptions made require that the distribution of pairs ðXi;XjÞ

decompose as the sum of an absolutely continuous measure with another measure concentrated on the diagonalof R� R satisfying a further absolute continuity with respect to the Lebesgue measure on this diagonal. For theconvergence in probability we find the usual convergence rate on the bandwidth, whereas for the almost sureconvergence we need to require that the bandwidth does not decrease to fast and that the kernel is of boundedvariation. This assumption on the kernel is also required for the asymptotic normality, together with a slightlystrengthened version of the usual decrease rate on the bandwidth. The assumption of bounded variation on thekernel is needed as a consequence of the dependence structure we are dealing with, as association is onlypreserved by monotone transformations.

Keywords: Kernel estimation; Association; Diagonal decomposition; Convergence

AMS Classification: 62G07, 62G20

1 INTRODUCTION

The estimation of the density of random variables or random vectors is a quite classical sta-

tistical problem for which there exists a well established theory and a very wide literature.

There are several methods proposed and we will be here interested in the non-parametric app-

roach. Naturally, the problem was first studied supposing that the data collected satisfied an

independence assumption. A nice account of results and existing literature is given in Bosq

and Lecoutre [6]. Eventually the same problem was treated for depend sampling, considering

mixing samples as in Roussas [18], Pham and Tran [17] or Masry [14] (see the monograph

Bosq [5] for existing literature and a presentation of the asymptotic properties of the kernel

estimator). More recently there was some interest in considering data satisfying a positive dep-

endence assumption, namely supposing the data to be associated. For this dependence struc-

ture Bagai and Prakasa Rao ½1; 2� and Roussas [19] established the consistency of the kernel

estimator for the density while Dewan and Prakasa Rao [7] proved an uniform exponential rate

of convergence supposing a convenient decrease rate on the covariances of the variables.

ISSN 1048-5252 print; ISSN 1029-0311 online # 2002 Taylor & Francis LtdDOI: 10.1080=1048525021000002118

Dow

nloa

ded

by [

Kun

glig

a T

ekni

ska

Hog

skol

a] a

t 12:

02 0

7 O

ctob

er 2

014

This last reference deals with a general estimator that includes both the kernel and the histo-

gram estimator, among others, although the conditions imposed essentially exclude the histo-

gram from their results. The asymptotic normality, for the kernel estimator, was established in

Roussas [20]. We note that in Roussas [20], besides some more or less usual conditions on the

covariance structure of the data (which is a natural way of controlling the dependence for asso-

ciated variables, as it follows from Theorem 10 in Newman [15]), it is supposed that the pairs

of random variables are absolutely continuous, while in this article we will suppose the dis-

tribution of these pairs of variables to have some mass concentrated on the diagonal of

R� R. This property appears in simple models of associated random variables.

The assumptions we will make are suggested by a point process approach to estimation

problems. This approach consisted on defining convenient point processes or random mea-

sures x and Z with mean measures such that Ex � EZ and then estimate the Radon–Nikodym

derivative ðd ExÞ=ðd EZÞ. This Radon–Nikodym derivative may be conveniently interpreted

representing, for a suitable choice of x and Z, the density we want to estimate. We may esti-

mate this Radon–Nikodym derivative using histograms or kernels and this correspond to the

usual estimators defined in the classical framework. A point process approach to density esti-

mation was first used in Ellis [8] in a context that was slightly different from ours. For an

approach corresponding to the one that has directly influenced this article there exists

some literature available: for independent sampling, Jacob and Oliveira [11] considered

the histogram, while Jacob and Oliveira [12] considered the kernel; for strong mixing sam-

pling, Bensaıd and Fabre [3] studied the kernel and Bensaıd and Oliveira [4] the histogram;

for associated sampling, Ferrieux [10] treated the kernel estimator and Jacob and Oliveira

[13] the histogram. In all these references it is supposed that one samples the point processes

themselves, that is, a sample is a collection of possible realizations of the point processes.

This is, in fact, equivalent to the usual sampling in density estimation, as well as in other

models that may be included in this point process approach, as explained in those references.

We refer the interested reader to Jacob and Oliveira ½11; 12; 13� for more details on how the

point process approach models some more classical estimation problems. Now, while inde-

pendence or strong mixing of the sampled point processes corresponds to independence or

strong mixing of the underlying variables (the ones that define the points), the same is not

true for association. In fact, as noted by Ferrieux [10] and Jacob and Oliveira [13] association

of point processes seems to have no relation with association of the underlying variables, thus

the results in Ferrieux [10] or Jacob and Oliveira [13] do not overlap with those in Bagai and

Prakasa Rao [1,2], Roussas [19,20], Dewan and Prakasa Rao [7]. To be a little more precise,

a random variable is always associated with itself while the point process with just one point

defined by this random variable is associated (as point process) with itself only in degenerate

cases. This is due to the fact that the order defined in the space of discrete measures is only a

partial order. Thus, the idea of approaching the same problem as in Jacob and Oliveira [13],

but supposing instead that the points are associated. Several proofs in Jacob and Oliveira [13]

are no longer valid in the new context. Moreover, the assumptions used there make little

sense in the framework we are placing ourselves throughout this article. When dealing

with point processes we are lead to the control of second order moment measures, which

are defined on the product of the base space by itself, and it becomes natural to suppose

that those measures may be decomposed as the sum of an absolutely continuous measure

with a measure which is concentrated on the diagonal of the product space. When regarding

this decomposition in terms of the distribution of a pair of random variables this means that

there is some probability concentrated on the diagonal, so there is no absolute continuity with

respect to the product measure if this measure is, for example, diffuse. The need for such a

decomposition appeared even when dealing with independent sampling of point processes.

In fact, this sort of decomposition was first used in Jacob and Oliveira [11] and later adapted

496 P. E. OLIVEIRA

Dow

nloa

ded

by [

Kun

glig

a T

ekni

ska

Hog

skol

a] a

t 12:

02 0

7 O

ctob

er 2

014

to dependent sampling in Ferrieux [10], Bensaıd and Fabre [3], Jacob and Oliveira [13] and

Bensaıd and Oliveira [4]. The conditions we will use here are suggested by this point process

approach but we will try to keep the language on the usual level, thus avoiding the use of point

processes as much as possible. Further, this article will concentrate on the kernel estimator.

The same approached may be used to find conditions for similar results for the histogram.

2 DEFINITIONS, ASSUMPTIONS AND PRELIMINARIES

Let X1;X2; . . . be an associated sequence of random variables with the same distribution as X

for which there exists a density function f . Let K be a fixed probability density and hn a

sequence of real numbers converging to zero. The kernel estimator of the density function

f is defined as

ffnðxÞ ¼1

nhn

Xn

j¼1

Kx � Xj

hn

� �:

This estimator is well known to be asymptotically unbiased if there exists a suitable version

of the density, as a consequence of the Bochner’s Lemma.

LEMMA 2.1 If f is bounded and continuous then Eð ffnðxÞÞ ! f ðxÞ uniformly on any

compact set.

This means also that, in order to establish the convergence of ffnðxÞ, it is enough to prove

that ffnðxÞ � EffnðxÞ ! 0 in the appropriate mode of convergence.

Let us now present the assumptions we will suppose throughout this article. On what

follows l� represents the Lebesgue measure on the diagonal D of R� R, and l2 represents

the Lebesgue measure on R2.

ðDÞ

� For each j; k 2 N, the distribution of ðXj;XkÞ is the sum of a measure m1; j;k on

R� RnD with a measure m2; j;k on D, such that m1; j;k � l2 and m2; j;k � l�;

� For each j; k 2 N, there exists a bounded version g1; j;k of ðdm1; j;kÞ=ðd l2Þ;

� For each j; k 2 N, there exists a bounded and continuous version g2; j;k of

ðd m2; j;kÞ=ðd l�Þ.

Note that we allow the pairs to have some mass concentrated on the diagonal, thus

they need not to be absolutely continuous with respect to l2, as it was supposed in

Roussas [20]. A simple example where this property is required arises from a method

used to construct sequences of associated variables from an independent sequence of vari-

ables, as described next. Suppose that Zn, n 2 N, are independent and identically distributed

absolutely continuous random variables. Let m be some fixed integer and define, for each

n 2 N, Xn ¼ minðZn; . . . ; ZnþmÞ. The variables Xn, n 2 N, are associated, have a common

absolutely continuous distribution but the random pairs ðXn;XnþjÞ, j ¼ 1; . . . ;m, are not

absolutely continuous. Nevertheless their distribution decomposes according to (D).

In order to prove the convergence results we will need some control on the Radon–

Nikodym derivatives introduced in the previous condition.

Suppose (D) is satisfied and

ðAÞ

� ð1=nÞPn

j;k¼1 jg1; j;kðx; yÞ � f ðxÞ f ðyÞj converges uniformly to a bounded function g1;

� ð1=nÞPn

j;k¼1 g2; j;kðx; xÞ converges uniformly to a bounded and continuous

function g2.

DENSITY ESTIMATION FOR ASSOCIATED SAMPLING 497

Dow

nloa

ded

by [

Kun

glig

a T

ekni

ska

Hog

skol

a] a

t 12:

02 0

7 O

ctob

er 2

014

Remark that, if the variables X1;X2; . . . are strictly stationary, it is easy to check that

1

n

Xn

j;k¼1

jg1; j;k � f � f j �Xn�1

j¼1

jg1;1; j � f � f j

and analogously for the other summation in (A). Then, as in Bensaıd and Fabre [3] or

Bensaıd and Oliveira [4], it is enough to assume the convergence of the upper bounds.

The main tool for proving convergence is stated in the following lemma.

LEMMA 2.2 Suppose (D) and (A) are satisfied. If the kernel K is bounded then

1

nhn

Xn

j;k¼1

Cov Kx � Xj

hn

� �; K

x � Xk

hn

� �� ! g2ðx; xÞ

ðK2ðuÞ du

uniformly on any compact set.

Proof Write

1

hn

Cov Kx � Xj

hn

� �; K

x � Xk

hn

� ��

¼1

hn

ðR2

Kx � u

hn

� �K

x � v

hn

� �ðg1; j;kðu; vÞ � f ðuÞf ðvÞÞ du dv

þ1

hn

ðD

K2 x � u

hn

� �g2; j;kðu; uÞ du: ð1Þ

Now the first integral in (1) is bounded above byðR2

Kx � u

hn

� �K

x � v

hn

� �jg1; j;kðu; vÞ � f ðuÞf ðvÞj du dv;

so that, for n 2 N large enough

1

nhn

Xn

j;k¼1

ðR2

Kx � u

hn

� �K

x � v

hn

� �jg1; j;kðu; vÞ � f ðuÞf ðvÞj du dv

� 2hn

1

h2n

ðR2

Kx � u

hn

� �K

x � v

hn

� �g1ðu; vÞ du dv � 2hn sup

u;vjg1ðu; vÞj ! 0;

as hn ! 0. For the second integral in (1), it leads to

1

hn

ðD

K2 x � u

hn

� �1

n

Xn

j;k¼1

g2; j;kðu; uÞ du ! g2ðx; xÞ

ðK2ðuÞ du

using the Bochner’s Lemma after renormalizing K2 to find a density. j

Note that the limit obtained in this lemma is exactly the diagonal density. So, if we had

supposed the absolute continuity of the random pairs ðXj;XkÞ this limit would have been

zero. As this term appears when dealing with the convergence of the estimator, if it converges

to zero most of the convergences would follow with relaxed assumptions on the bandwidth

sequence hn.

498 P. E. OLIVEIRA

Dow

nloa

ded

by [

Kun

glig

a T

ekni

ska

Hog

skol

a] a

t 12:

02 0

7 O

ctob

er 2

014

3 CONVERGENCE OF THE ESTIMATOR

In this section we state conditions for convergence in probability and almost surely of ffnðxÞ.

THEOREM 3.1 Suppose (D) and (A) are satisfied. If the kernel K is bounded and

nhn ! þ1 ð2Þ

then ffnðxÞ converges in probability to f ðxÞ, for every x 2 R.

Proof As

Pðj ffnðxÞ � Eð ffnðxÞÞj > eÞ �1

e2nhn

1

nhn

Xn

j;k¼1

Cov Kx � Xj

hn

� �; K

x � Xk

hn

� ��

the theorem follows immediately taking account of Lemmas 2.1 and 2.2. j

The almost sure convergence requires some further assumptions on the kernel K and on

the bandwidth sequence. This is due to the fact that the variables Kððx � XjÞ=hnÞ are, in gen-

eral, not associated. In order to keep the association we should apply only monotone trans-

formations to the original variables, which is not the case with a general kernel. This may

resolved supposing the kernel K to be of bounded variation. There are some additional tech-

nical assumptions required by the method of proof, which follows closely the proof of

Theorem 2.7.1 in Ferrieux [9].

THEOREM 3.2 Suppose (D) and (A) are satisfied; K is of bounded variation such that

Kða xÞ; for fixed x 2 R; is a decreasing function of a > 0; and

hn & 0;X1n¼1

1

n2hn2

< 1;hðnþ1Þ2

hn2

! 1:

Then ffnðxÞ converges almost surely to f ðxÞ, for every x 2 R.

Proof We first check that the subsequence corresponding to terms of order n2 converges

almost surely. This is immediate consequence of the Borel-Cantelli Lemma, Lemma 2.2 and

the assumptions made, as

Pðj ffn2ðxÞ � Eð ffn2ðxÞÞj > eÞ �1

e2n2hn2

1

n2hn2

Xn2

j;k¼1

Cov Kx � Xj

hn2

� �; K

x � Xk

hn2

� �� :

For the remaining terms we write, for an integer k 2 ðn2; ðn þ 1Þ2�,

jð ffkðxÞ � E ffkðxÞÞ � ð ffn2ðxÞ � E ffn2 ðxÞÞj �1

n2max

n2<k�ðnþ1Þ2

�� 1

hk

Xk

j¼1

Kx � Xj

hk

� �� EK

x � Xj

hk

� ��

�1

hn2

Xn2

j¼1

"K

x � Xj

hn2

� �� EK

x � Xj

hn2

� �#��þ 1

n2hn2

Xn2

j¼1

Kx � Xj

hn2

� �� EK

x � Xj

hn2

� �� ;


Dow

nloa

ded

by [

Kun

glig

a T

ekni

ska

Hog

skol

a] a

t 12:

02 0

7 O

ctob

er 2

014

thus leaving the first term to be treated. Now

1

hk

Xk

j¼1

Kx � Xj

hk

� �� EK

x � Xj

hk

� ��

1

hn2

Xn2

j¼1

Kx � Xj

hn2

� �� EK

x � Xj

hn2

� ��

¼Xk

j¼1

1

hk

�1

hn2

� �K

x � Xj

hk

� �� EK

x � Xj

hk

� ��

þXk

j¼1

1

hn2

Kx � Xj

hk

� �� EK

x � Xj

hk

� �� K

x � Xj

hn2

� �� EK

x � Xj

hn2

� ��

þXk

j¼n2þ1

1

hn2

Kx � Xj

hn2

� �� EK

x � Xj

hn2

� �� :

Let us denote the maximum over k 2 ðn2; ðn þ 1Þ2� of each of the three preceding terms by

an, bn and cn, respectively. The almost sure convergence to zero of an=n2, bn=n2 and cn=n2,

that concludes the proof of the theorem, is proved in the following lemmas. j

LEMMA 3.3 Under the assumptions of Theorem 3:2; ðan=n2Þ ! 0 almost surely.

Proof As hn is decreasing and using the decreasing assumption on the kernel, it follows that

0 �1

hk

�1

hn2

� �K

x � u

hk

� ��

1

hðnþ1Þ2�

1

hn2

!K

x � u

hn2

� �:

Now, for every k 2 ðn2; ðn þ 1Þ2�;

1

n2

Xk

j¼1

1

hk

�1

hn2

� �K

x � Xj

hk

� �� EK

x � Xj

hk

� ��

�1

n2

Xðnþ1Þ2

j¼1

1

hðnþ1Þ2�

1

hn2

!K

x � Xj

hn2

� �þ EK

x � Xj

hn2

� �� : ð3Þ

Let us first look at the terms with the mathematical expectation. On one side we have

1

n2hðnþ1Þ2

Xðnþ1Þ2

j¼1

EKx � Xj

hn2

� �¼

ðn þ 1Þ2

n2

hn2

hðnþ1Þ2

1

hn2

EKx � X

hn2

� �! f ðxÞ

using Lemma 2.1. Analogously, it follows

1

n2hn2

Xðnþ1Þ2

j¼1

EKx � Xj

hn2

� �! f ðxÞ:

500 P. E. OLIVEIRA

Dow

nloa

ded

by [

Kun

glig

a T

ekni

ska

Hog

skol

a] a

t 12:

02 0

7 O

ctob

er 2

014

Thus, in (3) the terms corresponding to the mathematical expectations converge to zero. This

allows replacing the sign þ on the right hand side of (3) by a sign � as, given d > 0, for n

large enough, the right hand side in (3) becomes smaller than

1

n2

Xðnþ1Þ2

j¼1

1

hðnþ1Þ2�

1

hn2

!K

x � Xj

hn2

� �� EK

x � Xj

hn2

� �� þ d;

and it is enough to verify that this term converges almost surely to zero. Like this,

Chebyshev’s inequality gives an upper bound with a variance term: for any e > 0,

P1

n2

Xðnþ1Þ2

j¼1

1

hðnþ1Þ2�

1

hn2

!K

x � Xj

hn2

� �� EK

x � Xj

hn2

� �� > e

0@

1A

�1

e2n4

1

hðnþ1Þ2�

1

hn2

!2 Xðnþ1Þ2

j; j0¼1

Cov Kx � Xj

hn2

� �; K

x � Xj0

hn2

� ��

¼1

e2n2hn2

hn2

hðnþ1Þ2� 1

!21

n2hn2

Xðnþ1Þ2

j; j0¼1

Cov Kx � Xj

hn2

� �; K

x � Xj0

hn2

� �� ;

which defines a convergent series, taking account of Lemma 2.2 and the assumptions made

on the bandwidth sequence. j

LEMMA 3.4 Under the assumptions of Theorem 3:2; ðbn=n2Þ ! 0 almost surely.

Proof Using the decreasing assumption on the kernel, it follows that, for every k 2 ðn2;ðn þ 1Þ2�,

jbnj

n2�

1

n2hn2

Xðnþ1Þ2

j¼1

K

x � Xj

hn2

� �þ EK

x � Xj

hn2

� �� K

x � Xj

hðnþ1Þ2

!þ EK

x � Xj

hðnþ1Þ2

!" #!:

As in the previous lemma, it is easy to check that the terms with the mathematical expecta-

tions cancel each other in the limit. So, we are left with checking the almost sure convergence

to zero of

1

n2hn2

Xðnþ1Þ2

j¼1

Kx � Xj

hn2

� �� K

x � Xj

hðnþ1Þ2

!" #:

Given e > 0,

P1

n2hn2

Xðnþ1Þ2

j¼1

Kx � Xj

hn2

� �� K

x � Xj

hðnþ1Þ2

!" #�� > e

0@

1A

�1

e2n4h2n2

Xðnþ1Þ2

j; j0¼1

Cov Kx � Xj

hn2

� �� K

x � Xj

hðnþ1Þ2

!; K

x � Xj0

hn2

� �� K

x � Xj0

hðnþ1Þ2

! !:


Dow

nloa

ded

by [

Kun

glig

a T

ekni

ska

Hog

skol

a] a

t 12:

02 0

7 O

ctob

er 2

014

Using Lemma 2.2 and ððhðnþ1Þ2Þ=ðhn2 ÞÞ ! 1, it follows that the sum of these covariances,

divided by n2hn2 is convergent (to 4g2ðx; xÞÐ

K2ðuÞ du). As there remains the term 1=ðn2h2n2Þ

we have in fact a convergent series defined by the probabilities. j

Notice that the association and the fact that K is of bounded variation has not yet been used.

These assumptions are required to prove the almost sure convergence of the remaining term.

LEMMA 3.5 Under the assumptions of Theorem 3:2; ðcn=n2Þ ! 0 almost surely.

Proof Write K ¼ K1 � K2, with K1; K2 increasing functions. The random variables

K1ððx � XjÞ=hn2 Þ, j ¼ 1; 2; . . . , being monotone transformations of associated variables, are

associated. Then, we may apply the generalization of Kolmogorov’s inequality for associated

variables proved by Newman and Wright [16], to obtain that, given e > 0,

P1

n2hn2

maxn2<k�ðnþ1Þ2

Xk

j¼n2þ1

K1

x � Xj

hn2

� �� EK1

x � Xj

hn2

� �� > e

0@

1A

�2

e2n4h2n2

Xðnþ1Þ2

j; j0¼n2þ1

Cov K1

x � Xj

hn2

� �; K1

x � Xj0

hn2

� �� :

Using the association this sum is bounded above by the sum with j; j0 ranging from 1 to

ðn þ 1Þ2 and then the proof is completed repeating the arguments of the two previous

lemmas. The terms corresponding to K2 are treated analogously. j

4 ASYMPTOTIC NORMALITY OF THE ESTIMATOR

We now look at the asymptotic normality of the kernel estimator. We will prove the asymp-

totic normality under conditions similar to those of Theorem 4.2 in Jacob and Oliveira [13],

which are somewhat weaker than second order stationarity. As the proof is quite long and

technical we will divide it into several lemmas that will be put together at the end of the sec-

tion to state the general result. For the remaining of the section let k be an integer smaller

than n and m ¼ bn=kc the greatest integer less or equal than n=k. Let us further define the

random variables

Tn;i ¼1ffiffiffiffiffihn

p Kx � Xi

hn

� �� EK

x � Xi

hn

� �� i ¼ 1; . . . ; n; n 2 N

Tn ¼1ffiffiffin

pXn

i¼1

Tn;i; and Yn; j ¼1ffiffiffik

pXjk

l¼ð j�1Þkþ1

Tn;l; j ¼ 1; . . . ;m:

Note that Tmk ¼ ð1=ffiffiffiffim

pÞPm

j¼1 Ymk; j ¼ ð1=ffiffiffiffiffiffimk

pÞPmk

i¼1 Tmk;i. The first lemma describes the

behaviour of the variances and is just a restating of Lemma 2.2.

LEMMA 4.1 Suppose ðDÞ and ðAÞ are satisfied. Then

s2mk ¼ VarðTmkÞ ! g2ðx; xÞ

ðK2ðuÞ du:

502 P. E. OLIVEIRA

Dow

nloa

ded

by [

Kun

glig

a T

ekni

ska

Hog

skol

a] a

t 12:

02 0

7 O

ctob

er 2

014

This first step towards the asymptotic normality is proving that it is enough to look at those

values of n that are multiples of k.

LEMMA 4.2 Suppose ðDÞ and ðAÞ are satisfied. Let k be fixed and suppose further that

hn ! 0;hnþ1

hn

! 1; K is bounded and limjuj!1

KðuÞ ¼ 0:

Then

jEeiuTn � EeiuTmk j ! 0:

Proof As jEeiuTn � EeiuTmk j � jujVar1=2ðTn � TmkÞ it is enough to prove the convergence to

zero of this variance. Now we write

Var1=2ðTn � TmkÞ � Var1=2 1ffiffiffin

pXmk

j¼1

ðTn; j � Tmk; jÞ

" #

þ Var1=2 1ffiffiffiffiffiffimk

p �1ffiffiffin

p

� �Xmk

j¼1

Tmk; j

" #þ Var1=2 1ffiffiffi

np

Xn

j¼mkþ1

Tn; j

" #; ð4Þ

and prove that each of these terms converges to zero. As for the first term

1

nVar

Xmk

j¼1


" #¼

1

n

Xmk

j; j0¼1

"Cov

1ffiffiffiffiffihn

p Kx � Xj

hn

� �;

1ffiffiffiffiffihn

p Kx � Xj0

hn

� ��

� Cov1ffiffiffiffiffihn

p Kx � Xj

hn

� �;

1ffiffiffiffiffiffiffihmk

p Kx � Xj0

hmk

� ��

� Cov1ffiffiffiffiffiffiffihmk

p Kx � Xj

hmk

� �;

1ffiffiffiffiffihn

p Kx � Xj0

hn

� ��

þ Cov1ffiffiffiffiffiffiffihmk

p Kx � Xj

hmk

� �;

1ffiffiffiffiffiffiffihmk

p Kx � Xj0

hmk

� �� #:

From Lemma 2.2, as ðn=ðmkÞÞ ! 1, it follows that the summation over the first and last

terms of this expansion is convergent to 2g2ðx; xÞÐ

K2ðuÞ du. The remaining terms are of

the form

1

n

Xmk

j; j0¼1

1ffiffiffiffiffiffiffiffiffiffiffiffihnhmk

p

ðK

x � u

hn

� �K

x � v

hmk

� �ðg1; j; j0 � f � f Þ du dv

þ

ðK

x � u

hn

� �K

x � u

hmk

� �g2; j; j0 ðu; uÞ du

!:

The part corresponding to the first integral converges to zero as it is easily checked reprodu-

cing the arguments in the proof of Lemma 2.2. As for the second integral, rewrite it as

1

n

ffiffiffiffiffiffiffihn

hmk

s Xmk

j; j0¼1

ðKðzÞK z

hn

hmk

� �g2; j; j0 ðx � hn z; x � hnzÞ dz;


Dow

nloa

ded

by [

Kun

glig

a T

ekni

ska

Hog

skol

a] a

t 12:

02 0

7 O

ctob

er 2

014

which, by the Lebesgue Dominated Convergence Theorem, (D) and the assumption

ððhnþ1Þ=hnÞ ! 1, converges to g2ðx; xÞÐ

K2ðuÞ du. So, it follows that

1

nVar

Xmk

j¼1


" #! 0:

The second term in (4) may be rewritten as

1ffiffiffiffiffiffimk

p �1ffiffiffin

p

� �2

VarXmk

j¼1

Tmk; j

" #

¼ 1 �

ffiffiffiffiffiffimk

p ffiffiffin

p

� �21

mkhmk

Xmk

j; j0¼1

Cov Kx � Xj

hmk

� �; K

x � Xj0

hmk

� �� :

As 1 �ffiffiffiffiffiffimk

p=ffiffiffin

p! 0, the convergence to zero of this term follows from Lemma 2.2.

Finally, for the third term in (4):

1

nVar

Xn

j¼mkþ1

Tn; j

" #¼

1

nhn

Xn

j; j0¼mkþ1

Cov Kx � Xj

hn

� �; K

x � Xj0

hn

� ��

�1

nhn

Xn

j; j0¼1

ðR2

Kx � u

hn

� �K

x � v

hn

� �

� jg1; j; j0 ðu; vÞ � f ðuÞf ðvÞj du dv þ

ðD

K2 x � u

hn

� �g2; j; j0 ðu; uÞ du

!;

which converges to zero due to the nonnegativity of the terms and ððhnþ1Þ=hnÞ ! 1. j

The variables Tmk are the sum of m�1=2Ymk;1; . . . ;m�1=2Ymk;m, thus we need to prove a

Central Limit Theorem for this sum. Note that the variables Ymk; j, j ¼ 1; . . . ;m, are not

associated as they are not obtained by monotone transformations of the variables Xj,

j 2 N. Thus, in order to be able to use this association, we need, as before, some extra

assumption on the kernel: we will suppose K to be of bounded variation. This means that

K ¼ K1 � K2 with K1 and K2 monotone functions. Corresponding to these two functions

we define, for each n 2 N the analogous of the auxiliary variables introducing in this section:

Tn;i;q ¼1ffiffiffiffiffihn

p Kq

x � Xi

hn

� �� EKq

x � Xi

hn

� �� ; q ¼ 1; 2; i ¼ 1; . . . ; n;

Yn; j;q ¼1ffiffiffik

pXjk

l¼ðj�1Þkþ1

Tn;l;q; q ¼ 1; 2; j ¼ 1; . . . ;m:

Remark that Lemma 2.2 is still applicable when replacing K by any of the functions K1 or K2

obtaining, evidently, a different normalization on the limit.

First we control the difference between the situation we are dealing with and what we

would find if the variables were independent.

504 P. E. OLIVEIRA

Dow

nloa

ded

by [

Kun

glig

a T

ekni

ska

Hog

skol

a] a

t 12:

02 0

7 O

ctob

er 2

014

LEMMA 4.3 Suppose (D) and (A) are satisfied. Suppose the kernel K is of bounded var-

iation with K1 and K2 being bounded. If; for each k fixed;

limm!þ1

1

mk

Xm

j¼1

Xjk

l;l0¼ðj�1Þkþ1

g2;l;l0 ¼ g2;k ð5Þ

uniformly, where g2;k is bounded and continuous, then

lim supm!þ1

EeiuTmk �Ymj¼1

Eeiðu=ffiffiffim

pÞYmk; j

�� Mu2ðg2ðx; xÞ � g2;kðx; xÞÞ;

where M ¼Ð

K21 ðuÞ þ K2

2 ðuÞ þ 2K1ðuÞK2ðuÞ du.

Proof As the variables Ymk; j, j ¼ 1; . . . ;m, are not associated but, nevertheless, functions of

associated variables, we apply the extended version of Newman’s inequality (see Theorem 16

in Newman [15]) to obtain

EeiuTmk �Ymj¼1

Eeiðu=ffiffiffim

pÞYmk; j

�� u2

2m

Xm

j; j0¼1

j 6¼j0

CovðYmk; j;1 þ Ymk; j;2; Ymk; j0;1 þ Ymk; j0;2Þ:

After expanding this covariance, we find four terms which are controlled in the same way as

the following one:

1

m

Xm

j; j0¼1

j 6¼j0

CovðYmk; j;1; Ymk; j0;1Þ ¼1

m

Xm

j; j0¼1

CovðYmk; j;1; Ymk; j0;1Þ �1

m

Xm

j¼1

VarðYmk; j;1Þ:

From Lemma 2.2, it follows that

1

m

Xm

j; j0¼1

CovðYmk; j;1; Ymk; j0;1Þ ! g2ðx; xÞ

ðK2

1 ðuÞ du:

As for the other term,

1

m

Xm

j¼1

VarðYmk; j;1Þ ¼1

mkhmk

Xm

j¼1

Xjk

l;l0¼ðj�1Þkþ1

Cov Kq

x � Xl

hmk

� �; Kq

x � Xl0

hmk

� �� :

Now, using (D), we may write these covariances as a sum of an integral over R2 with an inte-

gral over the diagonal D of R2, as was done in the proof of Lemma 2.2. The first integral thus

appearing is bounded above by

1

mkhmk

Xmk

l;l0¼1

ðR2

Kq

x � u

hmk

� �Kq

x � v

hmk

� �jg1;l;l0 ðu; vÞ � f ðuÞf ðvÞj du dv ! 0;


Dow

nloa

ded

by [

Kun

glig

a T

ekni

ska

Hog

skol

a] a

t 12:

02 0

7 O

ctob

er 2

014

as in the proof of Lemma 2.2. The second integral appearing in this decomposition is

1

mkhmk

Xm

j¼1

Xjk

l;l0¼ðj�1Þkþ1

ðD

K2q

x � u

hmk

� �g2;l;l0 ðu; uÞ du ! g2;kðx; xÞ

ðK2

q ðuÞ du;

according to (5). j

Note that, if the variables Xn, n 2 N, are second order stationary, condition (5) is satisfied.

The next step towards the proof of the asymptotic normality of the estimator is to prove a

Central Limit Theorem for the variables Ymk; j, j ¼ 1; . . . ;m, treating them as if they where

independent, in view of the preceding Lemma (in fact we should introduce a new collection

of independent variables with the same distributions as the ones we have; we shall not do so

to avoid further notation).

LEMMA 4.4 Suppose (D), (A), (2) and ð5Þ are satisfied. If; for any constant c > 0;

ðfK2ððx�X Þ=hnÞ>cnhng

1

hn

K2 x � X

hn

� �dP ! 0; ð6Þ

then m�1=2Pm

j¼1 Ymk; j converges in distribution to a centered gaussian random variable

with covariance g2;kðx; xÞÐ

K2ðuÞ du.

Proof As before, it is easy to check that

1

mVar

Xm

j¼1

Ymk; j

" #! g2;kðx; xÞ

ðK2ðuÞ du;

so that the Lindeberg condition reduces to verifying

Xm

j¼1

ðfjYmk; jj>cg2;k ðx;xÞ

ffiffiffim

pg

1

mY 2

mk; j dP ! 0:

This integral is, using Lemma 4 in Utev [21], bounded above by

2

m

Xm

j¼1

Xjk

l¼ðj�1Þkþ1

ðfjTmk;l j>ððcg2;k ðx;xÞÞ=2Þ

ffiffiffiffiffiffim=k

pg

T2mk;l dP

¼2

m

Xmk

j¼1

ðfjTmk; jj>ððcg2;k ðx;xÞÞ=2Þ

ffiffiffiffiffiffim=k

pg

T2mk; j dP

and this sum converges to zero using a Cesaro argument and (6) (see Theorem 10 in Jacob

and Oliveira [12] for the computational details). j

It remains now to gather all the partial results to find a set of conditions ensuring the

asymptotic normality of the kernel estimator, as announced earlier. Given a real number v,

we define s2ðvÞ ¼ vÐ

K2ðuÞ du.

506 P. E. OLIVEIRA

Dow

nloa

ded

by [

Kun

glig

a T

ekni

ska

Hog

skol

a] a

t 12:

02 0

7 O

ctob

er 2

014

THEOREM 4.5 Suppose (D), (A) and ð5Þ are satisfied. If

hn ! 0; nhn ! þ1;hnþ1

hn

! 1;

K is of bounded variation; with K1; K2 bounded

limju!þ1

KðuÞ ¼ 0

and

limk!þ1

g2;k ¼ g2 ð7Þ

uniformly, where the functions g2;k are those introduced in (5), then

1ffiffiffiffiffiffiffinhn

pXn

i¼1

Kx � Xi

hn

� �� EK

x � Xi

hn

� ��

converges in distribution to a centered gaussian random variable with covariance

s2ðg2ðx; xÞÞ ¼ g2ðx; xÞÐ

K2ðuÞ du.

Proof Write

jEeiuTn � e�ðu2=2Þs2ðg2ðx;xÞÞj � jEeiuTn � EeiuTmk j þ EeiuTmk �Ymj¼1

Eeiðu=ffiffiffim

pÞYmk; j

��

þYmj¼1

Eeiðu=ffiffiffim

pÞYmk; j � e�ðu2=2Þs2ðg2;k ðx;xÞÞ

��þ je�ðu2=2Þs2ðg2;k ðx;xÞÞ � e�ðu2=2Þs2ðg2ðx;xÞÞj:

For each k fixed, it follows from the preceding lemmas that (note that, as K is bounded and

nhn ! þ1, condition (6) is satisfied, for n 2 N large enough),

lim supn!þ1

jEeiuTn � e�ðu2=2Þs2ðg2ðx;xÞÞj

� Mu2ðg2ðx; xÞ � g2;kðx; xÞÞ þ je�ðu2=2Þs2ðg2;k ðx;xÞÞ � e�ðu2=2Þs2ðg2ðx;xÞÞj;

where M is defined in Lemma 4.3. Letting now k ! þ1 this upper bound converges to

zero, according to (7). j

Note that, if the random variables Xn, n 2 N, are second order stationary, conditions (5)

and (7) are satisfied. Thus we may state the following corollary.

COROLLARY 4.6 Suppose the random variables Xn; n 2 N; are second order stationary

and satisfy (D), (A) and ð5Þ. If

hn ! 0; nhn ! þ1;hnþ1

hn

! 1;

K is of bounded variation; with K1; K2 bounded

limjuj!þ1

KðuÞ ¼ 0;


Dow

nloa

ded

by [

Kun

glig

a T

ekni

ska

Hog

skol

a] a

t 12:

02 0

7 O

ctob

er 2

014

then


pXn

i¼1

Kx � Xi

hn

� �� EK

x � Xi

hn

� ��

converges in distribution to a centered gaussian random variable with covariance

s2ðg2ðx; xÞÞ ¼ g2ðx; xÞÐ

K2ðuÞ du.

It is not difficult to adapt the proof of Theorem 4.5 to a multidimensional version. This will

not be done here for sake of brevity, but we will state, nevertheless, the result.

THEOREM 4.7 Suppose all the conditions of Theorem 4:5 are satisfied. Then, given,

x1; . . . ; xs 2 R;


pXn

i¼1

Kx1 � Xi

hn

� �� EK

x1 � Xi

hn

� �; . . . ;K

xs � Xi

hn

� �� EK

xs � Xi

hn

� ��

converges in distribution to a centered gaussian random vector with diagonal covariance

matrix G ¼ diagðs2ðg2ðx1; x2ÞÞ; . . . ; s2ðg2ðxs; xsÞÞÞ.

A multidimensional version of Corollary 4.6 is immediate.

Acknowledgements

The author was supported by Centro de Matematica da Universidade de Coimbra, 3001-454

Coimbra, Portugal.

References

[1] Bagai, I. and Prakasa Rao, B. L. S. (1991). Estimation of the survival function for stationary associatedprocesses. Statist. Probab. Lett., 12, 291–385.

[2] Bagai, I. and Prakasa Rao, B. L. S. (1995). Kernel-type density and failure rate estimation for associatedsequences. Ann. Inst. Statist. Math., 47, 253–266.

[3] Bensaıd, N. and Fabre, J. (1998). Estimation par noyau d’une derivee de Radon–Nikodym sous des conditions demelange. Canadian J. Statist., 28, 267–282.

[4] Bensaıd, N. and Oliveira, P. E. (2001). Histogram estimation of Radon–Nikodym derivatives for strong mixingdata. Statistics, 35, 569–592.

[5] Bosq, D. (1996). Nonparametric statistics for stochastic processes, Lecture Notes in Statistics, Vol. 110,Springer.

[6] Bosq, D. and Lecoutre, J. P. (1987). Theorie de l’estimation fonctionelle. Economica, Paris.[7] Dewan, I. and Prakasa Rao, B. L. S. (1999). A general method of density estimation for associated random

variables. J. Nonparametr. Statist., 10, 405–420.[8] Ellis, S. P. (1991). Density estimation for point processes. Stoch. Proc. Appl., 39, 345–358.[9] Ferrieux, D. (1996). Estimation de densites de mesures moyennes de processus ponctuels associes. PhD Thesis,

University of Montpellier II.[10] Ferrieux, D. (1998). Estimation a noyau de densites moyennes de mesures aleatoires associees. Compte Rendus

Acad. Sciences de Paris, 326, Serie I, 1131–1134.[11] Jacob, P. and Oliveira, P. E. (1995). A general approach to non parametric histogram estimation. Statistics, 27,

73–92.[12] Jacob, P. and Oliveira, P. E. (1997). Kernel estimators of general Radon–Nikodym derivatives. Statistics, 30,

25–46.[13] Jacob, P. and Oliveira, P. E. (1999). Histogram and associated point processes. Statist. Inf. Stoch. Proc., 2,

227–251.[14] Masry, E. (1986). Recursive probability density estimation for weakly dependent stationary processes. IEEE

Trans. Inform. Theory II, 32, 254–267.

508 P. E. OLIVEIRA

Dow

nloa

ded

by [

Kun

glig

a T

ekni

ska

Hog

skol

a] a

t 12:

02 0

7 O

ctob

er 2

014

[15] Newman, C. (1984). Asymptotic independence and limit theorems for positively and negatively dependentrandom variables. In: Tong, Y. L. (Ed.), Inequalities in Statistics and Probability, IMS Lecture Notes –Monograph Series, Vol. 5, pp. 127–140.

[16] Newman, C. and Wright, A. L. (1981). An invariance principle for certain dependent sequences. Ann. Probab., 9,671–675.

[17] Pham, T. D. and Tran, L. T. (1991). Kernel density estimation under a locally mixing condition. In: Roussas, G.(Ed.), Nonparametric Functional Estimation and Related Topics, Vol. 335, pp. 419–430.

[18] Roussas, G. G. (1988). Nonparametric estimation in mixing sequences of random variables. J. Statist. Plann.Inference, 18, 135–149.

[19] Roussas, G. G. (1991). Kernel estimates under association: strong uniform consistency. Statist. Probab. Lett., 12,393–403.

[20] Roussas, G. G. (2000). Asymptotic normality of the kernel estimate of a probability density function underassociation. Statist. Probab. Lett., 50, 1–12.

[21] Utev, S. A. (1990). On the central limit theorem for j-mixing arrays of random variables. Th. Probab. Appl., 35,131–139.


Dow

nloa

ded

by [

Kun

glig

a T

ekni

ska

Hog

skol

a] a

t 12:

02 0

7 O

ctob

er 2

014

density estimation for associated sampling: a point process influenced approach

Documents