density estimation for associated sampling: a point process influenced approach
TRANSCRIPT
This article was downloaded by: [Kungliga Tekniska Hogskola]On: 07 October 2014, At: 12:02Publisher: Taylor & FrancisInforma Ltd Registered in England and Wales Registered Number: 1072954 Registeredoffice: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK
Journal of Nonparametric StatisticsPublication details, including instructions for authors andsubscription information:http://www.tandfonline.com/loi/gnst20
Density estimation for associatedsampling: A point process influencedapproachPaulo Eduardo Oliveira aa Dep. Matemática , Universidade de Coimbra , Apartado 3008,Coimbra, 3001-454, PortugalPublished online: 27 Oct 2010.
To cite this article: Paulo Eduardo Oliveira (2002) Density estimation for associated sampling:A point process influenced approach, Journal of Nonparametric Statistics, 14:5, 495-509, DOI:10.1080/10485250213904
To link to this article: http://dx.doi.org/10.1080/10485250213904
PLEASE SCROLL DOWN FOR ARTICLE
Taylor & Francis makes every effort to ensure the accuracy of all the information (the“Content”) contained in the publications on our platform. However, Taylor & Francis,our agents, and our licensors make no representations or warranties whatsoever as tothe accuracy, completeness, or suitability for any purpose of the Content. Any opinionsand views expressed in this publication are the opinions and views of the authors,and are not the views of or endorsed by Taylor & Francis. The accuracy of the Contentshould not be relied upon and should be independently verified with primary sourcesof information. Taylor and Francis shall not be liable for any losses, actions, claims,proceedings, demands, costs, expenses, damages, and other liabilities whatsoever orhowsoever caused arising directly or indirectly in connection with, in relation to or arisingout of the use of the Content.
This article may be used for research, teaching, and private study purposes. Anysubstantial or systematic reproduction, redistribution, reselling, loan, sub-licensing,systematic supply, or distribution in any form to anyone is expressly forbidden. Terms &Conditions of access and use can be found at http://www.tandfonline.com/page/terms-and-conditions
Nonparametric Statistics, 2002, Vol. 14(5), pp. 495–509
DENSITY ESTIMATION FOR ASSOCIATEDSAMPLING: A POINT PROCESS
INFLUENCED APPROACH
PAULO EDUARDO OLIVEIRA
Dep. Matematica, Universidade de Coimbra, Apartado 3008, 3001-454 Coimbra, Portugal
(Received May 2000; In final form December 2001)
Let Xn, n 2 N, be a sequence of associated variables with common density function. We study the kernel estimationof this density, based on the given sequence of variables. Sufficient conditions are given for the consistency andasymptotic normality of the kernel estimator. The assumptions made require that the distribution of pairs ðXi;XjÞ
decompose as the sum of an absolutely continuous measure with another measure concentrated on the diagonalof R� R satisfying a further absolute continuity with respect to the Lebesgue measure on this diagonal. For theconvergence in probability we find the usual convergence rate on the bandwidth, whereas for the almost sureconvergence we need to require that the bandwidth does not decrease to fast and that the kernel is of boundedvariation. This assumption on the kernel is also required for the asymptotic normality, together with a slightlystrengthened version of the usual decrease rate on the bandwidth. The assumption of bounded variation on thekernel is needed as a consequence of the dependence structure we are dealing with, as association is onlypreserved by monotone transformations.
Keywords: Kernel estimation; Association; Diagonal decomposition; Convergence
AMS Classification: 62G07, 62G20
1 INTRODUCTION
The estimation of the density of random variables or random vectors is a quite classical sta-
tistical problem for which there exists a well established theory and a very wide literature.
There are several methods proposed and we will be here interested in the non-parametric app-
roach. Naturally, the problem was first studied supposing that the data collected satisfied an
independence assumption. A nice account of results and existing literature is given in Bosq
and Lecoutre [6]. Eventually the same problem was treated for depend sampling, considering
mixing samples as in Roussas [18], Pham and Tran [17] or Masry [14] (see the monograph
Bosq [5] for existing literature and a presentation of the asymptotic properties of the kernel
estimator). More recently there was some interest in considering data satisfying a positive dep-
endence assumption, namely supposing the data to be associated. For this dependence struc-
ture Bagai and Prakasa Rao ½1; 2� and Roussas [19] established the consistency of the kernel
estimator for the density while Dewan and Prakasa Rao [7] proved an uniform exponential rate
of convergence supposing a convenient decrease rate on the covariances of the variables.
ISSN 1048-5252 print; ISSN 1029-0311 online # 2002 Taylor & Francis LtdDOI: 10.1080=1048525021000002118
Dow
nloa
ded
by [
Kun
glig
a T
ekni
ska
Hog
skol
a] a
t 12:
02 0
7 O
ctob
er 2
014
This last reference deals with a general estimator that includes both the kernel and the histo-
gram estimator, among others, although the conditions imposed essentially exclude the histo-
gram from their results. The asymptotic normality, for the kernel estimator, was established in
Roussas [20]. We note that in Roussas [20], besides some more or less usual conditions on the
covariance structure of the data (which is a natural way of controlling the dependence for asso-
ciated variables, as it follows from Theorem 10 in Newman [15]), it is supposed that the pairs
of random variables are absolutely continuous, while in this article we will suppose the dis-
tribution of these pairs of variables to have some mass concentrated on the diagonal of
R� R. This property appears in simple models of associated random variables.
The assumptions we will make are suggested by a point process approach to estimation
problems. This approach consisted on defining convenient point processes or random mea-
sures x and Z with mean measures such that Ex � EZ and then estimate the Radon–Nikodym
derivative ðd ExÞ=ðd EZÞ. This Radon–Nikodym derivative may be conveniently interpreted
representing, for a suitable choice of x and Z, the density we want to estimate. We may esti-
mate this Radon–Nikodym derivative using histograms or kernels and this correspond to the
usual estimators defined in the classical framework. A point process approach to density esti-
mation was first used in Ellis [8] in a context that was slightly different from ours. For an
approach corresponding to the one that has directly influenced this article there exists
some literature available: for independent sampling, Jacob and Oliveira [11] considered
the histogram, while Jacob and Oliveira [12] considered the kernel; for strong mixing sam-
pling, Bensaıd and Fabre [3] studied the kernel and Bensaıd and Oliveira [4] the histogram;
for associated sampling, Ferrieux [10] treated the kernel estimator and Jacob and Oliveira
[13] the histogram. In all these references it is supposed that one samples the point processes
themselves, that is, a sample is a collection of possible realizations of the point processes.
This is, in fact, equivalent to the usual sampling in density estimation, as well as in other
models that may be included in this point process approach, as explained in those references.
We refer the interested reader to Jacob and Oliveira ½11; 12; 13� for more details on how the
point process approach models some more classical estimation problems. Now, while inde-
pendence or strong mixing of the sampled point processes corresponds to independence or
strong mixing of the underlying variables (the ones that define the points), the same is not
true for association. In fact, as noted by Ferrieux [10] and Jacob and Oliveira [13] association
of point processes seems to have no relation with association of the underlying variables, thus
the results in Ferrieux [10] or Jacob and Oliveira [13] do not overlap with those in Bagai and
Prakasa Rao [1,2], Roussas [19,20], Dewan and Prakasa Rao [7]. To be a little more precise,
a random variable is always associated with itself while the point process with just one point
defined by this random variable is associated (as point process) with itself only in degenerate
cases. This is due to the fact that the order defined in the space of discrete measures is only a
partial order. Thus, the idea of approaching the same problem as in Jacob and Oliveira [13],
but supposing instead that the points are associated. Several proofs in Jacob and Oliveira [13]
are no longer valid in the new context. Moreover, the assumptions used there make little
sense in the framework we are placing ourselves throughout this article. When dealing
with point processes we are lead to the control of second order moment measures, which
are defined on the product of the base space by itself, and it becomes natural to suppose
that those measures may be decomposed as the sum of an absolutely continuous measure
with a measure which is concentrated on the diagonal of the product space. When regarding
this decomposition in terms of the distribution of a pair of random variables this means that
there is some probability concentrated on the diagonal, so there is no absolute continuity with
respect to the product measure if this measure is, for example, diffuse. The need for such a
decomposition appeared even when dealing with independent sampling of point processes.
In fact, this sort of decomposition was first used in Jacob and Oliveira [11] and later adapted
496 P. E. OLIVEIRA
Dow
nloa
ded
by [
Kun
glig
a T
ekni
ska
Hog
skol
a] a
t 12:
02 0
7 O
ctob
er 2
014
to dependent sampling in Ferrieux [10], Bensaıd and Fabre [3], Jacob and Oliveira [13] and
Bensaıd and Oliveira [4]. The conditions we will use here are suggested by this point process
approach but we will try to keep the language on the usual level, thus avoiding the use of point
processes as much as possible. Further, this article will concentrate on the kernel estimator.
The same approached may be used to find conditions for similar results for the histogram.
2 DEFINITIONS, ASSUMPTIONS AND PRELIMINARIES
Let X1;X2; . . . be an associated sequence of random variables with the same distribution as X
for which there exists a density function f . Let K be a fixed probability density and hn a
sequence of real numbers converging to zero. The kernel estimator of the density function
f is defined as
ffnðxÞ ¼1
nhn
Xn
j¼1
Kx � Xj
hn
� �:
This estimator is well known to be asymptotically unbiased if there exists a suitable version
of the density, as a consequence of the Bochner’s Lemma.
LEMMA 2.1 If f is bounded and continuous then Eð ffnðxÞÞ ! f ðxÞ uniformly on any
compact set.
This means also that, in order to establish the convergence of ffnðxÞ, it is enough to prove
that ffnðxÞ � EffnðxÞ ! 0 in the appropriate mode of convergence.
Let us now present the assumptions we will suppose throughout this article. On what
follows l� represents the Lebesgue measure on the diagonal D of R� R, and l2 represents
the Lebesgue measure on R2.
ðDÞ
� For each j; k 2 N, the distribution of ðXj;XkÞ is the sum of a measure m1; j;k on
R� RnD with a measure m2; j;k on D, such that m1; j;k � l2 and m2; j;k � l�;
� For each j; k 2 N, there exists a bounded version g1; j;k of ðdm1; j;kÞ=ðd l2Þ;
� For each j; k 2 N, there exists a bounded and continuous version g2; j;k of
ðd m2; j;kÞ=ðd l�Þ.
Note that we allow the pairs to have some mass concentrated on the diagonal, thus
they need not to be absolutely continuous with respect to l2, as it was supposed in
Roussas [20]. A simple example where this property is required arises from a method
used to construct sequences of associated variables from an independent sequence of vari-
ables, as described next. Suppose that Zn, n 2 N, are independent and identically distributed
absolutely continuous random variables. Let m be some fixed integer and define, for each
n 2 N, Xn ¼ minðZn; . . . ; ZnþmÞ. The variables Xn, n 2 N, are associated, have a common
absolutely continuous distribution but the random pairs ðXn;XnþjÞ, j ¼ 1; . . . ;m, are not
absolutely continuous. Nevertheless their distribution decomposes according to (D).
In order to prove the convergence results we will need some control on the Radon–
Nikodym derivatives introduced in the previous condition.
Suppose (D) is satisfied and
ðAÞ
� ð1=nÞPn
j;k¼1 jg1; j;kðx; yÞ � f ðxÞ f ðyÞj converges uniformly to a bounded function g1;
� ð1=nÞPn
j;k¼1 g2; j;kðx; xÞ converges uniformly to a bounded and continuous
function g2.
DENSITY ESTIMATION FOR ASSOCIATED SAMPLING 497
Dow
nloa
ded
by [
Kun
glig
a T
ekni
ska
Hog
skol
a] a
t 12:
02 0
7 O
ctob
er 2
014
Remark that, if the variables X1;X2; . . . are strictly stationary, it is easy to check that
1
n
Xn
j;k¼1
jg1; j;k � f � f j �Xn�1
j¼1
jg1;1; j � f � f j
and analogously for the other summation in (A). Then, as in Bensaıd and Fabre [3] or
Bensaıd and Oliveira [4], it is enough to assume the convergence of the upper bounds.
The main tool for proving convergence is stated in the following lemma.
LEMMA 2.2 Suppose (D) and (A) are satisfied. If the kernel K is bounded then
1
nhn
Xn
j;k¼1
Cov Kx � Xj
hn
� �; K
x � Xk
hn
� �� �! g2ðx; xÞ
ðK2ðuÞ du
uniformly on any compact set.
Proof Write
1
hn
Cov Kx � Xj
hn
� �; K
x � Xk
hn
� �� �
¼1
hn
ðR2
Kx � u
hn
� �K
x � v
hn
� �ðg1; j;kðu; vÞ � f ðuÞf ðvÞÞ du dv
þ1
hn
ðD
K2 x � u
hn
� �g2; j;kðu; uÞ du: ð1Þ
Now the first integral in (1) is bounded above byðR2
Kx � u
hn
� �K
x � v
hn
� �jg1; j;kðu; vÞ � f ðuÞf ðvÞj du dv;
so that, for n 2 N large enough
1
nhn
Xn
j;k¼1
ðR2
Kx � u
hn
� �K
x � v
hn
� �jg1; j;kðu; vÞ � f ðuÞf ðvÞj du dv
� 2hn
1
h2n
ðR2
Kx � u
hn
� �K
x � v
hn
� �g1ðu; vÞ du dv � 2hn sup
u;vjg1ðu; vÞj ! 0;
as hn ! 0. For the second integral in (1), it leads to
1
hn
ðD
K2 x � u
hn
� �1
n
Xn
j;k¼1
g2; j;kðu; uÞ du ! g2ðx; xÞ
ðK2ðuÞ du
using the Bochner’s Lemma after renormalizing K2 to find a density. j
Note that the limit obtained in this lemma is exactly the diagonal density. So, if we had
supposed the absolute continuity of the random pairs ðXj;XkÞ this limit would have been
zero. As this term appears when dealing with the convergence of the estimator, if it converges
to zero most of the convergences would follow with relaxed assumptions on the bandwidth
sequence hn.
498 P. E. OLIVEIRA
Dow
nloa
ded
by [
Kun
glig
a T
ekni
ska
Hog
skol
a] a
t 12:
02 0
7 O
ctob
er 2
014
3 CONVERGENCE OF THE ESTIMATOR
In this section we state conditions for convergence in probability and almost surely of ffnðxÞ.
THEOREM 3.1 Suppose (D) and (A) are satisfied. If the kernel K is bounded and
nhn ! þ1 ð2Þ
then ffnðxÞ converges in probability to f ðxÞ, for every x 2 R.
Proof As
Pðj ffnðxÞ � Eð ffnðxÞÞj > eÞ �1
e2nhn
1
nhn
Xn
j;k¼1
Cov Kx � Xj
hn
� �; K
x � Xk
hn
� �� �
the theorem follows immediately taking account of Lemmas 2.1 and 2.2. j
The almost sure convergence requires some further assumptions on the kernel K and on
the bandwidth sequence. This is due to the fact that the variables Kððx � XjÞ=hnÞ are, in gen-
eral, not associated. In order to keep the association we should apply only monotone trans-
formations to the original variables, which is not the case with a general kernel. This may
resolved supposing the kernel K to be of bounded variation. There are some additional tech-
nical assumptions required by the method of proof, which follows closely the proof of
Theorem 2.7.1 in Ferrieux [9].
THEOREM 3.2 Suppose (D) and (A) are satisfied; K is of bounded variation such that
Kða xÞ; for fixed x 2 R; is a decreasing function of a > 0; and
hn & 0;X1n¼1
1
n2hn2
< 1;hðnþ1Þ2
hn2
! 1:
Then ffnðxÞ converges almost surely to f ðxÞ, for every x 2 R.
Proof We first check that the subsequence corresponding to terms of order n2 converges
almost surely. This is immediate consequence of the Borel-Cantelli Lemma, Lemma 2.2 and
the assumptions made, as
Pðj ffn2ðxÞ � Eð ffn2ðxÞÞj > eÞ �1
e2n2hn2
1
n2hn2
Xn2
j;k¼1
Cov Kx � Xj
hn2
� �; K
x � Xk
hn2
� �� �:
For the remaining terms we write, for an integer k 2 ðn2; ðn þ 1Þ2�,
jð ffkðxÞ � E ffkðxÞÞ � ð ffn2ðxÞ � E ffn2 ðxÞÞj �1
n2max
n2<k�ðnþ1Þ2
����� 1
hk
Xk
j¼1
Kx � Xj
hk
� �� EK
x � Xj
hk
� �� �
�1
hn2
Xn2
j¼1
"K
x � Xj
hn2
� �� EK
x � Xj
hn2
� �#�����þ 1
n2hn2
Xn2
j¼1
Kx � Xj
hn2
� �� EK
x � Xj
hn2
� �� �����������;
DENSITY ESTIMATION FOR ASSOCIATED SAMPLING 499
Dow
nloa
ded
by [
Kun
glig
a T
ekni
ska
Hog
skol
a] a
t 12:
02 0
7 O
ctob
er 2
014
thus leaving the first term to be treated. Now
1
hk
Xk
j¼1
Kx � Xj
hk
� �� EK
x � Xj
hk
� �� ��
1
hn2
Xn2
j¼1
Kx � Xj
hn2
� �� EK
x � Xj
hn2
� �� �
¼Xk
j¼1
1
hk
�1
hn2
� �K
x � Xj
hk
� �� EK
x � Xj
hk
� �� �
þXk
j¼1
1
hn2
Kx � Xj
hk
� �� EK
x � Xj
hk
� �� �� K
x � Xj
hn2
� �� EK
x � Xj
hn2
� �� �� �
þXk
j¼n2þ1
1
hn2
Kx � Xj
hn2
� �� EK
x � Xj
hn2
� �� �:
Let us denote the maximum over k 2 ðn2; ðn þ 1Þ2� of each of the three preceding terms by
an, bn and cn, respectively. The almost sure convergence to zero of an=n2, bn=n2 and cn=n2,
that concludes the proof of the theorem, is proved in the following lemmas. j
LEMMA 3.3 Under the assumptions of Theorem 3:2; ðan=n2Þ ! 0 almost surely.
Proof As hn is decreasing and using the decreasing assumption on the kernel, it follows that
0 �1
hk
�1
hn2
� �K
x � u
hk
� ��
1
hðnþ1Þ2�
1
hn2
!K
x � u
hn2
� �:
Now, for every k 2 ðn2; ðn þ 1Þ2�;
1
n2
Xk
j¼1
1
hk
�1
hn2
� �K
x � Xj
hk
� �� EK
x � Xj
hk
� �� �����������
�1
n2
Xðnþ1Þ2
j¼1
1
hðnþ1Þ2�
1
hn2
!K
x � Xj
hn2
� �þ EK
x � Xj
hn2
� �� �: ð3Þ
Let us first look at the terms with the mathematical expectation. On one side we have
1
n2hðnþ1Þ2
Xðnþ1Þ2
j¼1
EKx � Xj
hn2
� �¼
ðn þ 1Þ2
n2
hn2
hðnþ1Þ2
1
hn2
EKx � X
hn2
� �! f ðxÞ
using Lemma 2.1. Analogously, it follows
1
n2hn2
Xðnþ1Þ2
j¼1
EKx � Xj
hn2
� �! f ðxÞ:
500 P. E. OLIVEIRA
Dow
nloa
ded
by [
Kun
glig
a T
ekni
ska
Hog
skol
a] a
t 12:
02 0
7 O
ctob
er 2
014
Thus, in (3) the terms corresponding to the mathematical expectations converge to zero. This
allows replacing the sign þ on the right hand side of (3) by a sign � as, given d > 0, for n
large enough, the right hand side in (3) becomes smaller than
1
n2
Xðnþ1Þ2
j¼1
1
hðnþ1Þ2�
1
hn2
!K
x � Xj
hn2
� �� EK
x � Xj
hn2
� �� �þ d;
and it is enough to verify that this term converges almost surely to zero. Like this,
Chebyshev’s inequality gives an upper bound with a variance term: for any e > 0,
P1
n2
Xðnþ1Þ2
j¼1
1
hðnþ1Þ2�
1
hn2
!K
x � Xj
hn2
� �� EK
x � Xj
hn2
� �� ������������� > e
0@
1A
�1
e2n4
1
hðnþ1Þ2�
1
hn2
!2 Xðnþ1Þ2
j; j0¼1
Cov Kx � Xj
hn2
� �; K
x � Xj0
hn2
� �� �
¼1
e2n2hn2
hn2
hðnþ1Þ2� 1
!21
n2hn2
Xðnþ1Þ2
j; j0¼1
Cov Kx � Xj
hn2
� �; K
x � Xj0
hn2
� �� �;
which defines a convergent series, taking account of Lemma 2.2 and the assumptions made
on the bandwidth sequence. j
LEMMA 3.4 Under the assumptions of Theorem 3:2; ðbn=n2Þ ! 0 almost surely.
Proof Using the decreasing assumption on the kernel, it follows that, for every k 2 ðn2;ðn þ 1Þ2�,
jbnj
n2�
1
n2hn2
Xðnþ1Þ2
j¼1
K
x � Xj
hn2
� �þ EK
x � Xj
hn2
� �� �� K
x � Xj
hðnþ1Þ2
!þ EK
x � Xj
hðnþ1Þ2
!" #!:
As in the previous lemma, it is easy to check that the terms with the mathematical expecta-
tions cancel each other in the limit. So, we are left with checking the almost sure convergence
to zero of
1
n2hn2
Xðnþ1Þ2
j¼1
Kx � Xj
hn2
� �� K
x � Xj
hðnþ1Þ2
!" #:
Given e > 0,
P1
n2hn2
Xðnþ1Þ2
j¼1
Kx � Xj
hn2
� �� K
x � Xj
hðnþ1Þ2
!" #������������ > e
0@
1A
�1
e2n4h2n2
Xðnþ1Þ2
j; j0¼1
Cov Kx � Xj
hn2
� �� K
x � Xj
hðnþ1Þ2
!; K
x � Xj0
hn2
� �� K
x � Xj0
hðnþ1Þ2
! !:
DENSITY ESTIMATION FOR ASSOCIATED SAMPLING 501
Dow
nloa
ded
by [
Kun
glig
a T
ekni
ska
Hog
skol
a] a
t 12:
02 0
7 O
ctob
er 2
014
Using Lemma 2.2 and ððhðnþ1Þ2Þ=ðhn2 ÞÞ ! 1, it follows that the sum of these covariances,
divided by n2hn2 is convergent (to 4g2ðx; xÞÐ
K2ðuÞ du). As there remains the term 1=ðn2h2n2Þ
we have in fact a convergent series defined by the probabilities. j
Notice that the association and the fact that K is of bounded variation has not yet been used.
These assumptions are required to prove the almost sure convergence of the remaining term.
LEMMA 3.5 Under the assumptions of Theorem 3:2; ðcn=n2Þ ! 0 almost surely.
Proof Write K ¼ K1 � K2, with K1; K2 increasing functions. The random variables
K1ððx � XjÞ=hn2 Þ, j ¼ 1; 2; . . . , being monotone transformations of associated variables, are
associated. Then, we may apply the generalization of Kolmogorov’s inequality for associated
variables proved by Newman and Wright [16], to obtain that, given e > 0,
P1
n2hn2
maxn2<k�ðnþ1Þ2
Xk
j¼n2þ1
K1
x � Xj
hn2
� �� EK1
x � Xj
hn2
� �� ������������� > e
0@
1A
�2
e2n4h2n2
Xðnþ1Þ2
j; j0¼n2þ1
Cov K1
x � Xj
hn2
� �; K1
x � Xj0
hn2
� �� �:
Using the association this sum is bounded above by the sum with j; j0 ranging from 1 to
ðn þ 1Þ2 and then the proof is completed repeating the arguments of the two previous
lemmas. The terms corresponding to K2 are treated analogously. j
4 ASYMPTOTIC NORMALITY OF THE ESTIMATOR
We now look at the asymptotic normality of the kernel estimator. We will prove the asymp-
totic normality under conditions similar to those of Theorem 4.2 in Jacob and Oliveira [13],
which are somewhat weaker than second order stationarity. As the proof is quite long and
technical we will divide it into several lemmas that will be put together at the end of the sec-
tion to state the general result. For the remaining of the section let k be an integer smaller
than n and m ¼ bn=kc the greatest integer less or equal than n=k. Let us further define the
random variables
Tn;i ¼1ffiffiffiffiffihn
p Kx � Xi
hn
� �� EK
x � Xi
hn
� �� �i ¼ 1; . . . ; n; n 2 N
Tn ¼1ffiffiffin
pXn
i¼1
Tn;i; and Yn; j ¼1ffiffiffik
pXjk
l¼ð j�1Þkþ1
Tn;l; j ¼ 1; . . . ;m:
Note that Tmk ¼ ð1=ffiffiffiffim
pÞPm
j¼1 Ymk; j ¼ ð1=ffiffiffiffiffiffimk
pÞPmk
i¼1 Tmk;i. The first lemma describes the
behaviour of the variances and is just a restating of Lemma 2.2.
LEMMA 4.1 Suppose ðDÞ and ðAÞ are satisfied. Then
s2mk ¼ VarðTmkÞ ! g2ðx; xÞ
ðK2ðuÞ du:
502 P. E. OLIVEIRA
Dow
nloa
ded
by [
Kun
glig
a T
ekni
ska
Hog
skol
a] a
t 12:
02 0
7 O
ctob
er 2
014
This first step towards the asymptotic normality is proving that it is enough to look at those
values of n that are multiples of k.
LEMMA 4.2 Suppose ðDÞ and ðAÞ are satisfied. Let k be fixed and suppose further that
hn ! 0;hnþ1
hn
! 1; K is bounded and limjuj!1
KðuÞ ¼ 0:
Then
jEeiuTn � EeiuTmk j ! 0:
Proof As jEeiuTn � EeiuTmk j � jujVar1=2ðTn � TmkÞ it is enough to prove the convergence to
zero of this variance. Now we write
Var1=2ðTn � TmkÞ � Var1=2 1ffiffiffin
pXmk
j¼1
ðTn; j � Tmk; jÞ
" #
þ Var1=2 1ffiffiffiffiffiffimk
p �1ffiffiffin
p
� �Xmk
j¼1
Tmk; j
" #þ Var1=2 1ffiffiffi
np
Xn
j¼mkþ1
Tn; j
" #; ð4Þ
and prove that each of these terms converges to zero. As for the first term
1
nVar
Xmk
j¼1
ðTn; j � Tmk; jÞ
" #¼
1
n
Xmk
j; j0¼1
"Cov
1ffiffiffiffiffihn
p Kx � Xj
hn
� �;
1ffiffiffiffiffihn
p Kx � Xj0
hn
� �� �
� Cov1ffiffiffiffiffihn
p Kx � Xj
hn
� �;
1ffiffiffiffiffiffiffihmk
p Kx � Xj0
hmk
� �� �
� Cov1ffiffiffiffiffiffiffihmk
p Kx � Xj
hmk
� �;
1ffiffiffiffiffihn
p Kx � Xj0
hn
� �� �
þ Cov1ffiffiffiffiffiffiffihmk
p Kx � Xj
hmk
� �;
1ffiffiffiffiffiffiffihmk
p Kx � Xj0
hmk
� �� �#:
From Lemma 2.2, as ðn=ðmkÞÞ ! 1, it follows that the summation over the first and last
terms of this expansion is convergent to 2g2ðx; xÞÐ
K2ðuÞ du. The remaining terms are of
the form
1
n
Xmk
j; j0¼1
1ffiffiffiffiffiffiffiffiffiffiffiffihnhmk
p
ðK
x � u
hn
� �K
x � v
hmk
� �ðg1; j; j0 � f � f Þ du dv
þ
ðK
x � u
hn
� �K
x � u
hmk
� �g2; j; j0 ðu; uÞ du
!:
The part corresponding to the first integral converges to zero as it is easily checked reprodu-
cing the arguments in the proof of Lemma 2.2. As for the second integral, rewrite it as
1
n
ffiffiffiffiffiffiffihn
hmk
s Xmk
j; j0¼1
ðKðzÞK z
hn
hmk
� �g2; j; j0 ðx � hn z; x � hnzÞ dz;
DENSITY ESTIMATION FOR ASSOCIATED SAMPLING 503
Dow
nloa
ded
by [
Kun
glig
a T
ekni
ska
Hog
skol
a] a
t 12:
02 0
7 O
ctob
er 2
014
which, by the Lebesgue Dominated Convergence Theorem, (D) and the assumption
ððhnþ1Þ=hnÞ ! 1, converges to g2ðx; xÞÐ
K2ðuÞ du. So, it follows that
1
nVar
Xmk
j¼1
ðTn; j � Tmk; jÞ
" #! 0:
The second term in (4) may be rewritten as
1ffiffiffiffiffiffimk
p �1ffiffiffin
p
� �2
VarXmk
j¼1
Tmk; j
" #
¼ 1 �
ffiffiffiffiffiffimk
p ffiffiffin
p
� �21
mkhmk
Xmk
j; j0¼1
Cov Kx � Xj
hmk
� �; K
x � Xj0
hmk
� �� �:
As 1 �ffiffiffiffiffiffimk
p=ffiffiffin
p! 0, the convergence to zero of this term follows from Lemma 2.2.
Finally, for the third term in (4):
1
nVar
Xn
j¼mkþ1
Tn; j
" #¼
1
nhn
Xn
j; j0¼mkþ1
Cov Kx � Xj
hn
� �; K
x � Xj0
hn
� �� ���������
�1
nhn
Xn
j; j0¼1
ðR2
Kx � u
hn
� �K
x � v
hn
� �
� jg1; j; j0 ðu; vÞ � f ðuÞf ðvÞj du dv þ
ðD
K2 x � u
hn
� �g2; j; j0 ðu; uÞ du
!;
which converges to zero due to the nonnegativity of the terms and ððhnþ1Þ=hnÞ ! 1. j
The variables Tmk are the sum of m�1=2Ymk;1; . . . ;m�1=2Ymk;m, thus we need to prove a
Central Limit Theorem for this sum. Note that the variables Ymk; j, j ¼ 1; . . . ;m, are not
associated as they are not obtained by monotone transformations of the variables Xj,
j 2 N. Thus, in order to be able to use this association, we need, as before, some extra
assumption on the kernel: we will suppose K to be of bounded variation. This means that
K ¼ K1 � K2 with K1 and K2 monotone functions. Corresponding to these two functions
we define, for each n 2 N the analogous of the auxiliary variables introducing in this section:
Tn;i;q ¼1ffiffiffiffiffihn
p Kq
x � Xi
hn
� �� EKq
x � Xi
hn
� �� �; q ¼ 1; 2; i ¼ 1; . . . ; n;
Yn; j;q ¼1ffiffiffik
pXjk
l¼ðj�1Þkþ1
Tn;l;q; q ¼ 1; 2; j ¼ 1; . . . ;m:
Remark that Lemma 2.2 is still applicable when replacing K by any of the functions K1 or K2
obtaining, evidently, a different normalization on the limit.
First we control the difference between the situation we are dealing with and what we
would find if the variables were independent.
504 P. E. OLIVEIRA
Dow
nloa
ded
by [
Kun
glig
a T
ekni
ska
Hog
skol
a] a
t 12:
02 0
7 O
ctob
er 2
014
LEMMA 4.3 Suppose (D) and (A) are satisfied. Suppose the kernel K is of bounded var-
iation with K1 and K2 being bounded. If; for each k fixed;
limm!þ1
1
mk
Xm
j¼1
Xjk
l;l0¼ðj�1Þkþ1
g2;l;l0 ¼ g2;k ð5Þ
uniformly, where g2;k is bounded and continuous, then
lim supm!þ1
EeiuTmk �Ymj¼1
Eeiðu=ffiffiffim
pÞYmk; j
���������� � Mu2ðg2ðx; xÞ � g2;kðx; xÞÞ;
where M ¼Ð
K21 ðuÞ þ K2
2 ðuÞ þ 2K1ðuÞK2ðuÞ du.
Proof As the variables Ymk; j, j ¼ 1; . . . ;m, are not associated but, nevertheless, functions of
associated variables, we apply the extended version of Newman’s inequality (see Theorem 16
in Newman [15]) to obtain
EeiuTmk �Ymj¼1
Eeiðu=ffiffiffim
pÞYmk; j
���������� � u2
2m
Xm
j; j0¼1
j 6¼j0
CovðYmk; j;1 þ Ymk; j;2; Ymk; j0;1 þ Ymk; j0;2Þ:
After expanding this covariance, we find four terms which are controlled in the same way as
the following one:
1
m
Xm
j; j0¼1
j 6¼j0
CovðYmk; j;1; Ymk; j0;1Þ ¼1
m
Xm
j; j0¼1
CovðYmk; j;1; Ymk; j0;1Þ �1
m
Xm
j¼1
VarðYmk; j;1Þ:
From Lemma 2.2, it follows that
1
m
Xm
j; j0¼1
CovðYmk; j;1; Ymk; j0;1Þ ! g2ðx; xÞ
ðK2
1 ðuÞ du:
As for the other term,
1
m
Xm
j¼1
VarðYmk; j;1Þ ¼1
mkhmk
Xm
j¼1
Xjk
l;l0¼ðj�1Þkþ1
Cov Kq
x � Xl
hmk
� �; Kq
x � Xl0
hmk
� �� �:
Now, using (D), we may write these covariances as a sum of an integral over R2 with an inte-
gral over the diagonal D of R2, as was done in the proof of Lemma 2.2. The first integral thus
appearing is bounded above by
1
mkhmk
Xmk
l;l0¼1
ðR2
Kq
x � u
hmk
� �Kq
x � v
hmk
� �jg1;l;l0 ðu; vÞ � f ðuÞf ðvÞj du dv ! 0;
DENSITY ESTIMATION FOR ASSOCIATED SAMPLING 505
Dow
nloa
ded
by [
Kun
glig
a T
ekni
ska
Hog
skol
a] a
t 12:
02 0
7 O
ctob
er 2
014
as in the proof of Lemma 2.2. The second integral appearing in this decomposition is
1
mkhmk
Xm
j¼1
Xjk
l;l0¼ðj�1Þkþ1
ðD
K2q
x � u
hmk
� �g2;l;l0 ðu; uÞ du ! g2;kðx; xÞ
ðK2
q ðuÞ du;
according to (5). j
Note that, if the variables Xn, n 2 N, are second order stationary, condition (5) is satisfied.
The next step towards the proof of the asymptotic normality of the estimator is to prove a
Central Limit Theorem for the variables Ymk; j, j ¼ 1; . . . ;m, treating them as if they where
independent, in view of the preceding Lemma (in fact we should introduce a new collection
of independent variables with the same distributions as the ones we have; we shall not do so
to avoid further notation).
LEMMA 4.4 Suppose (D), (A), (2) and ð5Þ are satisfied. If; for any constant c > 0;
ðfK2ððx�X Þ=hnÞ>cnhng
1
hn
K2 x � X
hn
� �dP ! 0; ð6Þ
then m�1=2Pm
j¼1 Ymk; j converges in distribution to a centered gaussian random variable
with covariance g2;kðx; xÞÐ
K2ðuÞ du.
Proof As before, it is easy to check that
1
mVar
Xm
j¼1
Ymk; j
" #! g2;kðx; xÞ
ðK2ðuÞ du;
so that the Lindeberg condition reduces to verifying
Xm
j¼1
ðfjYmk; jj>cg2;k ðx;xÞ
ffiffiffim
pg
1
mY 2
mk; j dP ! 0:
This integral is, using Lemma 4 in Utev [21], bounded above by
2
m
Xm
j¼1
Xjk
l¼ðj�1Þkþ1
ðfjTmk;l j>ððcg2;k ðx;xÞÞ=2Þ
ffiffiffiffiffiffim=k
pg
T2mk;l dP
¼2
m
Xmk
j¼1
ðfjTmk; jj>ððcg2;k ðx;xÞÞ=2Þ
ffiffiffiffiffiffim=k
pg
T2mk; j dP
and this sum converges to zero using a Cesaro argument and (6) (see Theorem 10 in Jacob
and Oliveira [12] for the computational details). j
It remains now to gather all the partial results to find a set of conditions ensuring the
asymptotic normality of the kernel estimator, as announced earlier. Given a real number v,
we define s2ðvÞ ¼ vÐ
K2ðuÞ du.
506 P. E. OLIVEIRA
Dow
nloa
ded
by [
Kun
glig
a T
ekni
ska
Hog
skol
a] a
t 12:
02 0
7 O
ctob
er 2
014
THEOREM 4.5 Suppose (D), (A) and ð5Þ are satisfied. If
hn ! 0; nhn ! þ1;hnþ1
hn
! 1;
K is of bounded variation; with K1; K2 bounded
limju!þ1
KðuÞ ¼ 0
and
limk!þ1
g2;k ¼ g2 ð7Þ
uniformly, where the functions g2;k are those introduced in (5), then
1ffiffiffiffiffiffiffinhn
pXn
i¼1
Kx � Xi
hn
� �� EK
x � Xi
hn
� �� �
converges in distribution to a centered gaussian random variable with covariance
s2ðg2ðx; xÞÞ ¼ g2ðx; xÞÐ
K2ðuÞ du.
Proof Write
jEeiuTn � e�ðu2=2Þs2ðg2ðx;xÞÞj � jEeiuTn � EeiuTmk j þ EeiuTmk �Ymj¼1
Eeiðu=ffiffiffim
pÞYmk; j
����������
þYmj¼1
Eeiðu=ffiffiffim
pÞYmk; j � e�ðu2=2Þs2ðg2;k ðx;xÞÞ
����������þ je�ðu2=2Þs2ðg2;k ðx;xÞÞ � e�ðu2=2Þs2ðg2ðx;xÞÞj:
For each k fixed, it follows from the preceding lemmas that (note that, as K is bounded and
nhn ! þ1, condition (6) is satisfied, for n 2 N large enough),
lim supn!þ1
jEeiuTn � e�ðu2=2Þs2ðg2ðx;xÞÞj
� Mu2ðg2ðx; xÞ � g2;kðx; xÞÞ þ je�ðu2=2Þs2ðg2;k ðx;xÞÞ � e�ðu2=2Þs2ðg2ðx;xÞÞj;
where M is defined in Lemma 4.3. Letting now k ! þ1 this upper bound converges to
zero, according to (7). j
Note that, if the random variables Xn, n 2 N, are second order stationary, conditions (5)
and (7) are satisfied. Thus we may state the following corollary.
COROLLARY 4.6 Suppose the random variables Xn; n 2 N; are second order stationary
and satisfy (D), (A) and ð5Þ. If
hn ! 0; nhn ! þ1;hnþ1
hn
! 1;
K is of bounded variation; with K1; K2 bounded
limjuj!þ1
KðuÞ ¼ 0;
DENSITY ESTIMATION FOR ASSOCIATED SAMPLING 507
Dow
nloa
ded
by [
Kun
glig
a T
ekni
ska
Hog
skol
a] a
t 12:
02 0
7 O
ctob
er 2
014
then
1ffiffiffiffiffiffiffinhn
pXn
i¼1
Kx � Xi
hn
� �� EK
x � Xi
hn
� �� �
converges in distribution to a centered gaussian random variable with covariance
s2ðg2ðx; xÞÞ ¼ g2ðx; xÞÐ
K2ðuÞ du.
It is not difficult to adapt the proof of Theorem 4.5 to a multidimensional version. This will
not be done here for sake of brevity, but we will state, nevertheless, the result.
THEOREM 4.7 Suppose all the conditions of Theorem 4:5 are satisfied. Then, given,
x1; . . . ; xs 2 R;
1ffiffiffiffiffiffiffinhn
pXn
i¼1
Kx1 � Xi
hn
� �� EK
x1 � Xi
hn
� �; . . . ;K
xs � Xi
hn
� �� EK
xs � Xi
hn
� �� �
converges in distribution to a centered gaussian random vector with diagonal covariance
matrix G ¼ diagðs2ðg2ðx1; x2ÞÞ; . . . ; s2ðg2ðxs; xsÞÞÞ.
A multidimensional version of Corollary 4.6 is immediate.
Acknowledgements
The author was supported by Centro de Matematica da Universidade de Coimbra, 3001-454
Coimbra, Portugal.
References
[1] Bagai, I. and Prakasa Rao, B. L. S. (1991). Estimation of the survival function for stationary associatedprocesses. Statist. Probab. Lett., 12, 291–385.
[2] Bagai, I. and Prakasa Rao, B. L. S. (1995). Kernel-type density and failure rate estimation for associatedsequences. Ann. Inst. Statist. Math., 47, 253–266.
[3] Bensaıd, N. and Fabre, J. (1998). Estimation par noyau d’une derivee de Radon–Nikodym sous des conditions demelange. Canadian J. Statist., 28, 267–282.
[4] Bensaıd, N. and Oliveira, P. E. (2001). Histogram estimation of Radon–Nikodym derivatives for strong mixingdata. Statistics, 35, 569–592.
[5] Bosq, D. (1996). Nonparametric statistics for stochastic processes, Lecture Notes in Statistics, Vol. 110,Springer.
[6] Bosq, D. and Lecoutre, J. P. (1987). Theorie de l’estimation fonctionelle. Economica, Paris.[7] Dewan, I. and Prakasa Rao, B. L. S. (1999). A general method of density estimation for associated random
variables. J. Nonparametr. Statist., 10, 405–420.[8] Ellis, S. P. (1991). Density estimation for point processes. Stoch. Proc. Appl., 39, 345–358.[9] Ferrieux, D. (1996). Estimation de densites de mesures moyennes de processus ponctuels associes. PhD Thesis,
University of Montpellier II.[10] Ferrieux, D. (1998). Estimation a noyau de densites moyennes de mesures aleatoires associees. Compte Rendus
Acad. Sciences de Paris, 326, Serie I, 1131–1134.[11] Jacob, P. and Oliveira, P. E. (1995). A general approach to non parametric histogram estimation. Statistics, 27,
73–92.[12] Jacob, P. and Oliveira, P. E. (1997). Kernel estimators of general Radon–Nikodym derivatives. Statistics, 30,
25–46.[13] Jacob, P. and Oliveira, P. E. (1999). Histogram and associated point processes. Statist. Inf. Stoch. Proc., 2,
227–251.[14] Masry, E. (1986). Recursive probability density estimation for weakly dependent stationary processes. IEEE
Trans. Inform. Theory II, 32, 254–267.
508 P. E. OLIVEIRA
Dow
nloa
ded
by [
Kun
glig
a T
ekni
ska
Hog
skol
a] a
t 12:
02 0
7 O
ctob
er 2
014
[15] Newman, C. (1984). Asymptotic independence and limit theorems for positively and negatively dependentrandom variables. In: Tong, Y. L. (Ed.), Inequalities in Statistics and Probability, IMS Lecture Notes –Monograph Series, Vol. 5, pp. 127–140.
[16] Newman, C. and Wright, A. L. (1981). An invariance principle for certain dependent sequences. Ann. Probab., 9,671–675.
[17] Pham, T. D. and Tran, L. T. (1991). Kernel density estimation under a locally mixing condition. In: Roussas, G.(Ed.), Nonparametric Functional Estimation and Related Topics, Vol. 335, pp. 419–430.
[18] Roussas, G. G. (1988). Nonparametric estimation in mixing sequences of random variables. J. Statist. Plann.Inference, 18, 135–149.
[19] Roussas, G. G. (1991). Kernel estimates under association: strong uniform consistency. Statist. Probab. Lett., 12,393–403.
[20] Roussas, G. G. (2000). Asymptotic normality of the kernel estimate of a probability density function underassociation. Statist. Probab. Lett., 50, 1–12.
[21] Utev, S. A. (1990). On the central limit theorem for j-mixing arrays of random variables. Th. Probab. Appl., 35,131–139.
DENSITY ESTIMATION FOR ASSOCIATED SAMPLING 509
Dow
nloa
ded
by [
Kun
glig
a T
ekni
ska
Hog
skol
a] a
t 12:
02 0
7 O
ctob
er 2
014