Journal of Multivariate Analysis 101 (2010) 1156–1167. doi:10.1016/j.jmva.2009.10.006

Testing the equality of linear single-index models

Wei Lin (a), K.B. Kulasekera (b,*)

(a) Department of Mathematics, Ohio University, Athens, OH 45701, USA
(b) Department of Mathematical Sciences, Clemson University, Clemson, SC 29631, USA
* Corresponding author. E-mail addresses: [email protected] (W. Lin), [email protected] (K.B. Kulasekera).

Article history: Received 28 January 2009; available online 23 October 2009.
AMS subject classifications: primary 62G08; secondary 62G20.

Abstract

Comparison of nonparametric regression models has been extensively discussed in the literature for the one-dimensional covariate case. The comparison problem largely remains open for completely nonparametric models with multi-dimensional covariates. We address this issue under the assumption that the models are single-index models (SIMs). We propose a test for checking the equality of the mean functions of two (or more) SIMs. The asymptotic normality of the test statistic is established and an empirical study is conducted to evaluate the finite-sample performance of the proposed procedure.


1. Introduction

Suppose we have k independent samples following the models

Y_{ij} = m_i(X_{ij}) + ε_{ij},  j = 1, …, n_i,  i = 1, …, k,   (1.1)

where the X's are iid p-dimensional random vectors, m_1(·), …, m_k(·) are smooth functions, and the ε's are independent random errors with E(ε_{ij} | X_{ij}) = 0, j = 1, …, n_i, i = 1, …, k. The k samples in this model are taken independently, as opposed to data from a cross-section of individuals or items in panel data. The comparison of regression functions of this type has been extensively investigated in the literature for p = 1 ([1–6], and references therein). Multiple regression in a nonparametric frame has been discussed by many authors (see for example [7,8]). In most cases the statistician develops an approximation to the multivariate mean function or assumes that the multivariate regression function is of a special type such as a generalized additive model (GAM) [9–13] or an analysis of covariance model [14–17]. These approaches are taken to overcome the estimation deficiencies due to the curse of dimensionality. However, testing the equality of these models has not been discussed to a satisfactory degree in the literature. For general p under the completely nonparametric model structure, the testing problem is very difficult to address due to dimensionality issues. Even in cases where the model is assumed to be of a special structure such as a GAM, the focus has been on efficient estimation, except for testing in restrictive cases such as the transformation technique, where one assumes the models to be of the form m_i(X) = m_i(g_0(X)) for a known function g_0 [18].

In a wide range of applications in areas such as economics and finance, the above regression models can be expressed as single-index models (SIMs); see [19–22] and references therein. Estimation of the link function and the index vector and related issues have been discussed very thoroughly in many articles. Among the many articles in this direction, [23,24,21,25,26] and the references included in these articles give a comprehensive treatment of efficient estimation in SIMs. Goodness-of-fit testing of whether a given model is a SIM has been addressed by Xia et al. [27] and Stute and Zhu [28]. Comparison of multiple groups with respect to the mean function is an important but not yet discussed issue for SIMs.


In this article we shall address this comparison problem. We assume that m_i(x) = g_i(θ_i′x), i = 1, …, k, for some index vectors θ_i and some smooth univariate functions g_i. For identifiability purposes, the θ's are assumed to be unit vectors with first nonzero element positive [29]. Our focus is on testing m_1(x) = ··· = m_k(x) where each mean function is a SIM. That is, testing

H_0: g_1(·) ≡ ··· ≡ g_k(·) and θ_1 = ··· = θ_k  vs.  H_a: not H_0.   (1.2)

Note that if any of the θ's are different, then the mean functions are different. In some practical settings one may choose to test the equality of the unknown links g_i, i = 1, …, k, assuming the equality of the index vectors (perhaps after a test for the index vectors). Our goal is to develop a single test for equality of single-index mean functions rather than sequential testing for the θ's and then the g's, which may result in inferior sampling properties.

There are two major approaches for testing the equality of univariate nonparametric mean regressions. The first is ANOVA-type methods, which compare the variability "between samples" and the variability "within samples" and reject the null hypothesis if their difference is too large [2,30,5,31,32] or if their ratio is too large [33], or simply reject the null hypothesis for large values of the "between-sample" variability using resampling techniques [1]. The other is to construct a test statistic based on an appropriate norm, such as the sup norm or the L_2 norm, of the empirical process of residuals [3,6,34]. Apart from these, [35] use a graphical method and [36] uses a U-process method to test the equality of two regression functions.

Although the effective predictor dimension of a SIM is one, testing the equality of two SIMs is much more complicated than the univariate scenario due to the added technicalities introduced by the unknown index vectors. In this paper we adopt an ANOVA-type approach to compare two or more SIMs.

We motivate the construction of the test statistic for testing equality of the mean functions in the two-sample case.

Consider (1.1) above for i = 1 with the single-index mean function m_1(x) = g_1(θ_1′x) and a constant error variance σ_1². For a suitable set of weights w_{1j}, j = 1, …, n_1, we consider the sum of squared errors

d_1(α) = ∑_{j=1}^{n_1} w_{1j} [Y_{1j} − ĝ_α(α′X_{1j})]²

where ĝ_α(α′X_{1j}) is a kernel estimator of the mean function E[Y_{1j} | α′X_{1j}] for a given vector α (see Section 2). It is known that [37]

d_1(θ_1) = inf_α d_1(α)   (1.3)

is a CAN estimator for σ_1². Now, in the two-sample case, under H_0 above, we have the option of constructing the estimator ĝ_α(α′X) of the mean function E[Y | α′X] using the combined sample as well as using each individual sample, and the three estimators should not be very different for large samples. Therefore, under H_0, the values of the minimized sums of squared errors in (1.3) should be close to the corresponding error variance σ_1² whether an estimator of the mean function based only on sample 1 or one based on the combined sample is used. However, under H_a, due to the bias introduced by the difference in the two mean functions, the resulting minimized sum of squared errors d_1(θ_1) will be systematically higher than the corresponding error variance σ_1² when the combined-sample estimator of the mean function is used instead of the first-sample estimator.

We shall show that for smooth mean functions (to be formally defined in the sequel), the difference between the two versions of the minimized sums of squared errors, after being properly standardized, is asymptotically normal under H_0 and will diverge under H_a, leading to a suitable asymptotic decision rule. As shown in the sequel, the test statistic developed using this argument can actually be used even with non-constant error variances (see Theorem 2.1), widening the scope of the proposed comparison scheme for SIMs. Using a simulation study we demonstrate that the developed comparison method has high accuracy in size and very desirable power properties for many sample sizes.

The classical F test is somewhat the default test in practice when comparing the means of two multiple regressions when approximate linearity holds. Since our proposed test is the first available test for comparing SIMs, which are generalized multiple regression models with unknown links, there are no direct competitors for our test. Although one may make ad hoc modifications to existing equality tests for univariate predictors and use those for testing the equality of SIMs, we cannot directly compare our test to such modified tests due to their unknown sampling properties resulting from such modifications. Thus, we only formally compare our procedure to the F test and comment on the results of a comparison to an ad hoc modification of an existing univariate test by Neumeyer and Dette [6]. In addition to the simulated examples, we analyze a real data set using the proposed method, comparing the findings to those from classical data analysis techniques.

The remainder of the paper is organized as follows. In Section 2, we propose the test for hypotheses (1.2) and give the asymptotic results. A simulation study and the analysis of the real data set are provided in Section 3. Some technical results are deferred to Section 4.

2. Testing equality of regression functions

In this section we first give the details of the construction of the test statistic and its properties, and then discuss the implementation for hypotheses (1.2).


2.1. Construction of the test statistics

We present the development of the statistics for the two-sample case with constant variance for notational simplicity. The extension to the k-sample (k > 2) case is straightforward, as shown in Section 2.3. In all cases we present the main theoretical results in full generality. We begin by introducing some notation.

Let S_X be the domain of X, let k = 2, and define index sets H_1′ = {1, 2, …, n_1} and H_2′ = {1, 2, …, n_2}. For every α define g_{iα}(t) = E(Y_{ij} | α′X_{ij} = t) for the ith group, i = 1, 2; each of these can be consistently estimated by a kernel estimator

ĝ_{iα}(t) = ∑_{j∈H_i′} Y_{ij} K_h(α′X_{ij} − t) / ∑_{j∈H_i′} K_h(α′X_{ij} − t),  i = 1, 2;   (2.1)

where K_h(·) = K(·/h)/h for a suitable kernel function K. A corresponding pooled-sample version, which is consistent for g_{iα}(t), i = 1, 2, only under H_0, is given by

ĝ_α(t) = ∑_{i=1}^{2} ∑_{j=1}^{n_i} Y_{ij} K_h(α′X_{ij} − t) / ∑_{i=1}^{2} ∑_{j=1}^{n_i} K_h(α′X_{ij} − t).   (2.2)
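The estimators (2.1) and (2.2) are ordinary Nadaraya–Watson smoothers applied to the projected covariate α′X. As an illustration only, a minimal R sketch of the two estimators is given below; the Epanechnikov kernel is the one used later in Section 3, and all function and argument names (ghat.single, ghat.pooled, etc.) are ours rather than the paper's.

```r
## Sketch of the kernel estimators (2.1) and (2.2); illustrative names only.
K  <- function(u) 0.75 * (1 - u^2) * (abs(u) <= 1)   # Epanechnikov kernel used in Section 3
Kh <- function(u, h) K(u / h) / h

## (2.1): estimator built from one sample (X: n x p matrix, Y: n-vector)
ghat.single <- function(t, alpha, X, Y, h) {
  w <- Kh(drop(X %*% alpha) - t, h)
  if (sum(w) > 0) sum(w * Y) / sum(w) else NA_real_  # guard against empty neighbourhoods (cf. Remark 2.1)
}

## (2.2): pooled-sample version, consistent for both links only under H0
ghat.pooled <- function(t, alpha, X1, Y1, X2, Y2, h)
  ghat.single(t, alpha, rbind(X1, X2), c(Y1, Y2), h)
```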

Our next step is to develop a set of weighted sums of squares of residuals corresponding to each sample and each estimator ĝ_{iα}, i = 1, 2, and ĝ_α, for each α, excluding the boundary points. These sums of squares will then be suitably combined to create an ANOVA-type test statistic. We accomplish this in the following manner. For every α, let c_α and 2w_α denote the center and width of the set {α′x | x ∈ S_X}. Fix a constant q close to (but less than) 1 as a width control parameter and let q_α = q·w_α. For any function L(·) ≥ 0 supported on (−1, 1), define L_{q,α}(x) = L([α′x − c_α]/q_α), which will serve as a weight function to exclude the points on the boundary of S_X along the direction of the vector α. A similar approach has been used by many authors in developing index vector estimators (see [27]). We used this method rather than the boundary kernel method to remove the boundary effects for each vector α. The role of q here is very similar to the role of the q in a boundary kernel [38]. Furthermore, our simulations show that index vector estimation using boundary kernels was inferior to our method in an overall mean square sense.

Now, we define weighted averages of squared errors for each estimator ĝ_{iα} by

d_i(α) = ∑_{j∈H_i′} (Y_{ij} − ĝ_{iα}(α′X_{ij}))² L_{q,α}(X_{ij}) / ∑_{j∈H_i′} L_{q,α}(X_{ij}),  i = 1, 2;

and for the estimator ĝ_α by

d_{pi}(α) = ∑_{j∈H_i′} (Y_{ij} − ĝ_α(α′X_{ij}))² L_{q,α}(X_{ij}) / ∑_{j∈H_i′} L_{q,α}(X_{ij}),  i = 1, 2.
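Continuing the sketch, the boundary weight L_{q,α} and the weighted residual averages d_i(α) and d_{pi}(α) can be coded directly from the definitions above. The block below reuses ghat.single/ghat.pooled from the previous sketch, takes c_α and w_α to be the midrange and half-width of the projected sample, and uses the L function and q = 0.95 of Section 3; all names remain illustrative, not the authors'.

```r
## Boundary weight L_{q,alpha}(x) and the weighted averages d_i(alpha), d_{pi}(alpha).
Lfun <- function(u) (abs(u) <= 0.9) + 10 * (1 - abs(u)) * (abs(u) > 0.9 & abs(u) <= 1)

Lqa <- function(X, alpha, q = 0.95) {
  z  <- drop(X %*% alpha)
  ca <- (max(z) + min(z)) / 2               # centre c_alpha of {alpha'x : x in S_X}
  qa <- q * (max(z) - min(z)) / 2           # q_alpha = q * w_alpha
  Lfun((z - ca) / qa)
}

## Generic weighted average of squared residuals; predict.g(t, alpha, h) supplies either
## the own-sample estimator (2.1) or the pooled estimator (2.2) at the projected point t.
d.alpha <- function(alpha, Xe, Ye, h, predict.g, q = 0.95) {
  z    <- drop(Xe %*% alpha)
  fits <- vapply(z, function(t) predict.g(t, alpha, h), numeric(1))
  L    <- Lqa(Xe, alpha, q)
  ok   <- is.finite(fits)                   # skip near-empty neighbourhoods (Remark 2.1)
  sum(((Ye - fits)^2 * L)[ok]) / sum(L[ok])
}

## Example wiring for sample 1, with samples (X1, Y1) and (X2, Y2):
##  d1.fn  <- function(a) d.alpha(a, X1, Y1, h, function(t, a, h) ghat.single(t, a, X1, Y1, h))
##  dp1.fn <- function(a) d.alpha(a, X1, Y1, h, function(t, a, h) ghat.pooled(t, a, X1, Y1, X2, Y2, h))
```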

Let d̂_i = inf_α d_i(α) and d̂_{pi} = inf_α d_{pi}(α), i = 1, 2, where the infimum is taken over the set D defined in assumption (A1) below. This set D is defined to ensure the identifiability of the SIMs [29]. The statistic d̂_i is a CAN estimator for the error variance σ_i², i = 1, 2 [37]. When the mean functions m_1(·) and m_2(·) are identical, the estimators d̂_i and d̂_{pi} are both consistent for σ_i², i = 1, 2. When m_1(·) and m_2(·) are different, d̂_{pi} tends to be larger than d̂_i for each i, i = 1, 2. Thus we propose to use the test statistic

T = (n_1/N)[d̂_{p1} − d̂_1] + (n_2/N)[d̂_{p2} − d̂_2],

and reject H_0 for "large" values of T. Here N = ∑_{i=1}^{k} n_i. Under the null hypothesis, a properly normalized version of T can be shown to converge in distribution to a standard normal random variable. In particular, we shall show that the distribution of the test statistic

N√h (T − D_N/(Nh)) / σ_N,

where D_N and σ_N are two easily estimable centralizing and scaling parameters (see Theorem 2.1 for the definition of these two quantities) and h is the bandwidth for estimating the unknown link function(s), converges to a standard normal distribution when the sample size diverges. Hence, our proposal is to reject H_0 when N√h(T − D̂_N/(Nh))/σ̂_N is larger than the upper-α quantile of the standard normal distribution, where D̂_N (which converges at a rate faster than √h) and σ̂_N are any two consistent estimators of D_N and σ_N, respectively.


Our simulations show that the use of asymptotic critical points with estimated normalizing constants for T gives very accurate sizes even for small sample sizes from each population. Furthermore, the estimators we propose for D_N and σ_N in the sequel (see Remark 2.2) are accurate, as seen in a simulation study conducted to assess the effectiveness of the proposed estimators of these two quantities. It will also be shown that the above test statistic diverges to infinity at a rate N√h under H_a (Theorem 2.1), so that it can detect near root-n alternatives. It is noted that the asymptotic properties of the test statistic based on T are very similar to those of the test statistics developed by Dette and Neumeyer [5]. The test has very reasonable power even for small sample sizes, as seen in our simulations.

Implementation of this test requires a user-selected bandwidth for the estimation of the unknown link function(s). In theory, any bandwidth that satisfies the asymptotic properties can be used in order to use the asymptotic critical points. However, in practice one may have to select the bandwidth using a data-based technique. We used a modified cross-validation technique to select the required bandwidths in all our simulations. In this situation, although the choice of the bandwidth has an indirect impact on the power of the test, the bandwidth selection is for optimal estimation of smooth functions rather than for optimal power performance [4]. Thus, established methods such as cross-validation criteria are suitable. Our simulation studies show that changes in the bandwidth selection method for estimating the link functions do not impact the power/size of the test significantly.
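Putting the pieces together for k = 2: the infima over D can be approximated with optim() via a normalized parametrisation of the unit sphere (first coordinate positive), and the decision rule compares the standardized statistic with a normal quantile. The plug-in estimates D̂_N and σ̂_N are discussed in Remark 2.2 (and sketched after it). As before, this is only an illustrative sketch with our own names, not the authors' implementation.

```r
## Approximate inf over D = {u : ||u|| = 1, u_1 > 0} and form the test for k = 2.
unit  <- function(v) { a <- v / sqrt(sum(v^2)); if (a[1] < 0) a <- -a; a }
d.inf <- function(d.of.alpha, p, start = rep(1, p))
  optim(start, function(v) d.of.alpha(unit(v)))$value

## d1.fn, dp1.fn, d2.fn, dp2.fn are functions of alpha as wired up in the previous sketch;
## DN.hat and sigmaN.hat are plug-in estimates of D_N and sigma_N (Remark 2.2).
sim.test <- function(d1.fn, dp1.fn, d2.fn, dp2.fn, n1, n2, p, h,
                     DN.hat, sigmaN.hat, level = 0.05) {
  N     <- n1 + n2
  Tstat <- (n1 / N) * (d.inf(dp1.fn, p) - d.inf(d1.fn, p)) +
           (n2 / N) * (d.inf(dp2.fn, p) - d.inf(d2.fn, p))
  Z     <- N * sqrt(h) * (Tstat - DN.hat / (N * h)) / sigmaN.hat
  list(T = Tstat, Z = Z, reject = Z > qnorm(1 - level))   # reject H0 for large Z
}
```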

2.2. Main results

Next we present details of the theoretical properties of the proposed test statistic. We begin by listing some assumptions that are used in the sequel. Most of these assumptions have been used in the literature and we restate them for easy referral.

(A1) The mean functions m_1(·), …, m_k(·) are r-times continuously differentiable. Under H_0, the true index vectors are in D, where, for the remainder of the paper, D = {u = (u_1, …, u_p)′ ∈ R^p | ‖u‖ = 1, u_1 > 0}.

(A2) The covariates of all samples follow the same distribution. The domain S_X of this covariate vector X is a closed and bounded convex set. The support of L_{q,θ}(X) contains at least one interior ball with radius w_0 > 0. The densities of X, α′X and α′(X_1 − X_2), where X_1 and X_2 are iid copies of X, will be denoted by f(·), f_α(·) and φ_α(·), respectively. The function f ∈ C^r(S_X) for some r ≥ 2, and there exist constants 0 < c_1 < c_2 < ∞ such that c_1 ≤ f(x) ≤ c_2 for all x ∈ S_X.

(A3) The error ε in each sample has at least v ≥ max(2v_0, 8) moments, where v_0 is the smallest positive even integer greater than 3p/(2r − 3). For each i, Var(ε_{ij} | X_{ij}) = σ_i²(θ_i′X_{ij}), 1 ≤ j ≤ n_i.

(A4) The kernel function K(·) is an rth order symmetric kernel and K ∈ C^r([−1, 1]). The constant c_{θ_i} = ∫_{−1}^{1} K(s)φ_{θ_i}(s)ds ≠ 0, i = 1, …, k, where φ_{θ_i} is given in (A2).

(A5) The function L(·) is a bounded, symmetric, non-negative function supported on (−1, 1). It is Lipschitz continuous of order 1, non-increasing in |t|, and L(t) > 0 for all t ∈ (−1, 1). When the errors are heteroscedastic, L_{q,α}(·) is free of α.

(A6) The bandwidth h is of order O(N^{−ξ}) where 1/(2r + 1) ≤ ξ ≤ 1/(2r), N = n_1 + ··· + n_k, and lim_{N→∞} n_i/N = Δ_i for some Δ_i > 0 with ∑_{i=1}^{k} Δ_i = 1.

Remark 2.1. The density of X is assumed to be bounded away from zero to avoid the sparseness problem. The number of moments required of the error distribution is a result of the simultaneous minimization of d with respect to θ and h. However, this requirement is less restrictive than that of [39,40]. On the other hand, if we use an approach other than weighted least squares to estimate the index vector, we can relax the requirement on the number of moments. The assumption that c_{θ_i} ≠ 0 in (A4) guarantees that the denominators of the kernel estimators ĝ(t) and ĝ_j(t), j = 1, 2, will be zero with diminishing probability as N → ∞ when t is away from the boundary. To avoid the zero-denominator problem for these kernel estimators in finite samples, we may, say, exclude the summand (Y_i − Ŷ_i)² in constructing the d's when the denominator of a Ŷ_i is zero or close to zero.

The asymptotic distribution of T is given in the following theorem. Let k_0 = ∫[K∗K(u) − 2K(u)]²du and k_2 = ∫K²(u)du, where K∗K is the convolution of K(·) with itself. We have

Theorem 2.1. Let assumptions (A1)–(A6) hold with h = O(N^{−1/(2r)}) and let θ be the common index vector under the null hypothesis. Let a_i = E[L_{q,θ}(X)σ_i²(θ′X)/f_θ(θ′X)], i = 1, 2, b_0 = E[L_{q,θ}(X)], and, for i, j = 1, 2,

a_{ij} = k_0 ∫ L²((t − c_θ)/q_θ) σ_i²(t) σ_j²(t) dt.   (2.3)

Then, under H_0 we have T = D_N/(Nh) + (σ_N/(N√h)) Z + o_p(N^{−1}h^{−1/2}), where Z is a standard normal random variable and

D_N = (1/b_0)(2K(0) − k_2)((n_2/N)a_1 + (n_1/N)a_2),   (2.4)

σ_N² = (1/b_0²)(2(n_2²/N²)a_{11} + 4(n_1n_2/N²)a_{12} + 2(n_1²/N²)a_{22}).


Under H_a, T = Δ_1c_{01} + Δ_2c_{02} + o_p(1) for the positive constants c_{0i}, i = 1, 2, where

c_{0i} = inf_α E([m_i(X) − Δ_1g_{1α}(α′X) − Δ_2g_{2α}(α′X)]² L_{q,α}(X)) / E[L_{q,α}(X)].   (2.5)

Hence, our proposal to reject H_0 when N√h(T − D̂_N/(Nh))/σ̂_N is larger than the upper-α quantile of the standard normal distribution is well founded. Here we need the estimators D̂_N (which converges at a rate faster than √h) and σ̂_N to be any consistent estimators of D_N and σ_N, respectively, under the null hypothesis. It is noteworthy that the asymptotic variance above has a close resemblance to the asymptotic variance given in [5].

Remark 2.2. There are numerous ways of estimating θ_i, i = 1, 2, with √N-consistency (see [22,26] for a detailed discussion). Thus, all the quantities in D_N and σ_N can also be consistently estimated so as to satisfy the required asymptotic rates. In particular, when the error variances are homogeneous with σ_i²(θ_i′x) = σ_i², the quantities a_i and a_{ij}, i, j = 1, 2, reduce to a_i = σ_i² q_θ ∫_{−1}^{1} L(s)ds and a_{ij} = k_0 σ_i²σ_j² q_θ ∫_{−1}^{1} L²(t)dt. It has been shown that d̂_i, i = 1, 2, are √N-consistent estimators for σ_i², i = 1, 2 [37,27]. Thus, using a simple plug-in method, we can form √N-consistent estimators of a_i and a_{ij}, i, j = 1, 2. If the error variances are not constant, then

a_i = ∫_{c_θ−w_θ}^{c_θ+w_θ} σ_i²(u) L((u − c_θ)/q_θ) du,

where the function σ_i can be estimated from the ith sample using a kernel method as in [41] for univariate nonparametric regression, with the θ̂′X_{ij} values as design points and the Y_{ij} values as the responses. Here θ̂ is the estimated common index parameter, which we also use to estimate c_θ and w_θ. The above integral is then estimated using these estimated limits and the estimated σ_i²(·) function. The same approach can be used in estimating the a_{ij}'s. Since b_0 only involves θ, one can easily estimate it by the moment-type estimator

b̂_0 = (1/N) ∑_{i=1}^{N} L_{q,θ̂}(X_i).

This is a √n-consistent estimator of b_0 provided θ̂ is.
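For homoscedastic errors, the plug-in estimators just described are simple to compute. The R sketch below (our own names again) obtains the kernel constants K(0), k_2 and k_0 by numerical integration, takes σ̂_i² = d̂_i, and returns D̂_N and σ̂_N as in (2.4); Lfun is the weight function from the earlier sketch, and q.theta, b0.hat are assumed to be the plug-in estimates of q_θ and b_0.

```r
## Plug-in estimates of D_N and sigma_N under homoscedastic errors (Remark 2.2).
kern.consts <- function(K = function(u) 0.75 * (1 - u^2) * (abs(u) <= 1)) {
  k2 <- integrate(function(u) K(u)^2, -1, 1)$value
  KK <- function(u) sapply(u, function(x)                    # convolution K * K
          integrate(function(v) K(v) * K(x - v), -1, 1)$value)
  k0 <- integrate(function(u) (KK(u) - 2 * K(u))^2, -2, 2)$value
  list(K0 = K(0), k2 = k2, k0 = k0)
}

DN.sigmaN <- function(d1.hat, d2.hat, q.theta, b0.hat, n1, n2,
                      kc = kern.consts(),
                      intL  = integrate(Lfun, -1, 1)$value,
                      intL2 = integrate(function(u) Lfun(u)^2, -1, 1)$value) {
  N   <- n1 + n2
  a1  <- d1.hat * q.theta * intL                             # a_i  = sigma_i^2 q_theta int L
  a2  <- d2.hat * q.theta * intL
  aij <- function(si2, sj2) kc$k0 * si2 * sj2 * q.theta * intL2
  DN  <- (2 * kc$K0 - kc$k2) * ((n2 / N) * a1 + (n1 / N) * a2) / b0.hat
  s2  <- (2 * (n2 / N)^2 * aij(d1.hat, d1.hat) +
          4 * (n1 * n2 / N^2) * aij(d1.hat, d2.hat) +
          2 * (n1 / N)^2 * aij(d2.hat, d2.hat)) / b0.hat^2
  list(DN.hat = DN, sigmaN.hat = sqrt(s2))
}
```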

Remark 2.3. The selection of the bandwidth parameter for the test statistic is a critical step. It can be shown that the asymptotic results hold with 1/5 ≤ ξ ≤ 1/4 (i.e. r = 2), accommodating mean-square optimal smoothing parameters of order N^{−1/5} in the estimation of smooth functions. However, for bandwidths of order N^{−ξ} with ξ ≤ 2/9, the "drift term" D_N/(Nh) in the test statistic actually takes the form D_N/(Nh) + O(h⁴), where the O(h⁴) term involves the unknown link functions and the density function of θ′X under the null hypothesis. Since the estimation of these quantities brings in additional complexity, we give the results for ξ = 1/4.

In practice, experimenters use adaptive bandwidths that have asymptotic optimality for function estimation. Smoothing parameter selection for power optimality in testing hypotheses is very important for the best asymptotic power performance. It is noteworthy that the bandwidths that optimize power for testing smooth mean functions are not necessarily the same as those that perform well in estimating the unknown smooth mean functions [4]. Selecting bandwidths that optimize power depends on the behavior of local alternatives in a very intricate manner and requires an extensive investigation. Due to the added technical and empirical complexity we do not formally address bandwidth estimation for power optimality in this article. To get a working adaptive bandwidth for the testing here one can proceed as follows: first obtain an adaptive, estimation-optimal bandwidth h_0, say, and use a bandwidth h_0N^{−ξ′}, where ξ′ is chosen to match the desired N^{−ξ} rate (see the short sketch following this remark).

In our simulations we use an adaptive bandwidth h_0 obtained by cross validation, which is known to provide reasonable estimators of the unknown functions g_i above. We conducted simulations using bandwidths of order N^{−1/4} as well as those of order N^{−1/5}. It was noted that with both of these selections our simulations indicated that the proposed test has power and size superiority over tests that can be thought of as the closest competitors for testing the equality of SIMs.
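For example, if the cross-validation bandwidth is of the estimation-optimal order N^{−1/5}, multiplying it by N^{−1/20} produces a bandwidth of the order N^{−1/4} used in the theorems with r = 2; this is the adjustment employed in Section 3. A one-line illustration (hypothetical helper name):

```r
## Adjust an estimation-optimal bandwidth (order N^{-1/5}) to the testing rate N^{-1/4}.
adjust.bw <- function(h0, N) h0 * N^(-(1/4 - 1/5))    # multiplies by N^(-1/20)
```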

Remark 2.4. The results in Theorem 2.1 still hold if d̂_i is replaced by d_i(θ̂_i) for any √n-consistent estimator θ̂_i of θ_i, i = 1, 2. This allows for greater flexibility in the construction of T, especially when computing time becomes a concern, since the minimizations can be time consuming. Thus, we can define a statistic T′ as

T′ = (n_1/N)[d_{p1}(θ̂_1) − d̂_1] + (n_2/N)[d_{p2}(θ̂_2) − d̂_2],

and reject H_0 for large values of T′, where θ̂_i, i = 1, 2, only need to be √n-consistent for θ_i under H_0. The motivation for this is the following. Under H_0, θ̂_i, i = 1, 2, are √N-consistent for θ_i and therefore d_{pi}(θ̂_i) and d̂_{pi} are asymptotically equivalent.


Then, it can be shown that both T and T′ have the same asymptotic properties. Since only two minimizations are required for T′ rather than four for T, this approach is computationally less demanding than T for large sample sizes and for large p. However, since T′ ≥ T and we reject H_0 for large values of T (T′), under H_a, T′ may report a higher power than T for finite samples. Our simulations show that, under H_0, T′ seems to converge much more slowly than T and might inflate the levels. Hence, we recommend using T′ only when the sample sizes are large and computing time becomes a serious issue.

2.3. The k-sample case

The above results can readily be generalized to the k-sample (k ≥ 2) case. In fact, the generalization below allows the densities of the covariates X_{ij} to be different over a common support. If the supports were also different, then testing the equality would not be very meaningful due to the differences in the θ′X values themselves. Formally, assumption (A2) is relaxed to

(A2′) The covariate X_{ij} follows the density f_{xi}, j = 1, …, n_i, i = 1, …, k. The densities have common support S_X, which is a closed and bounded convex set, f_{xi} ∈ C^r(S_X) for some r ≥ 2, and there exist constants 0 < c_1 < c_2 < ∞ such that c_1 ≤ f_{xi}(u) ≤ c_2 for all u ∈ S_X. For each α, the common support of L_{q,α}(X_{ij}) contains at least one interior ball with radius w_0 > 0, and E[L_{q,θ}(X_{ij})] = b_0, j = 1, …, n_i, i = 1, …, k. The densities of θ′X and θ′(X_1 − X_2), where X_1 and X_2 are iid copies of X in the ith sample, will be denoted by f_i(·) and φ_{θ,i}(·), respectively.

We would like to test the hypothesis (1.2) for k > 2. Let N = n_1 + ··· + n_k and let

T = ∑_{i=1}^{k} (n_i/N)(d̂_{pi} − d̂_i),

where d̂_i is the version of d̂ from the ith sample, and d̂_{pi} is the pooled version of d̂_i with the kernel estimator of the mean function constructed from the complete pooled sample instead of the ith sample only. Using arguments similar to those for the two-sample case, we reject the null hypothesis for large values of this T. We have the following result providing the critical points for this test.

Theorem 2.2. Let the conditions (A1)–(A6) hold with (A2) replaced by (A2′) and h = O(N^{−1/(2r)}). Let λ_l = n_l/N, l, s = 1, …, k, and f̄(·) = ∑_{l=1}^{k} λ_l f_l(·). Then, under H_0 we have T = D*_N/(Nh) + (τ/(N√h)) Z + o_p(N^{−1}h^{−1/2}), where Z is a standard normal random variable and

D*_N = ((k_2 − 2K(0))/b_0) ∑_{l=1}^{k} (λ_l a′_l − a_l),   (2.6)

τ² = (2/b_0²)(∑_{l,s=1}^{k} λ_lλ_s a_{ls} + ∑_{l=1}^{k} a_{ll} − 2∑_{l=1}^{k} λ_l a_l),   (2.7)

with a_{ll} defined in (2.3) and

a′_l = ∫ (f_l(t)/f̄(t)) σ_l²(t) L((t − c_θ)/q_θ) dt;   a_l = k_0 ∫ (f_l(t)/f̄(t)) σ_l⁴(t) L²((t − c_θ)/q_θ) dt;

a_{ls} = k_0 ∫ (f_l(t)f_s(t)/f̄²(t)) σ_l²(t) σ_s²(t) L²((t − c_θ)/q_θ) dt.   (2.8)

The estimation of quantities like a_{ls} can be done similarly to the estimation of those in the two-sample case, using appropriate plug-in methods. Note that in the most general form of the asymptotic result above one needs to estimate the individual covariate densities f_i for each group, while no density estimation is needed with a common density.
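The k-sample statistic itself is assembled from the same ingredients as in the two-sample sketches. A compact illustration, with the samples supplied as a list and the hypothetical helpers (ghat.single, d.alpha, d.inf) defined in the earlier sketches, might look as follows.

```r
## k-sample version of T: samples is a list of lists with components X (matrix) and Y (vector).
T.ksample <- function(samples, h, p, q = 0.95) {
  N  <- sum(sapply(samples, function(s) length(s$Y)))
  Xp <- do.call(rbind, lapply(samples, `[[`, "X"))      # pooled covariates
  Yp <- unlist(lapply(samples, `[[`, "Y"))              # pooled responses
  terms <- sapply(samples, function(s) {
    di  <- d.inf(function(a) d.alpha(a, s$X, s$Y, h,    # own-sample fit (2.1)
             function(t, a, h) ghat.single(t, a, s$X, s$Y, h), q), p)
    dpi <- d.inf(function(a) d.alpha(a, s$X, s$Y, h,    # pooled fit (2.2)
             function(t, a, h) ghat.single(t, a, Xp, Yp, h), q), p)
    (length(s$Y) / N) * (dpi - di)
  })
  sum(terms)
}
```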

3. Empirical study

To investigate the finite-sample performance of the proposed procedure, we conducted an extensive simulation study of two-sample comparisons, followed by the examination of a real data set.

3.1. Simulation study

Our simulation study covers several aspects. We examined linear and highly nonlinear link functions under the null hypothesis. Since the null can be rejected due to unequal links, unequal index vectors, or both, we used mean functions and index vectors that cover all three situations in our power simulations. Since the error variance has a clear impact on the power/size of the test, we also examined several error variances.

In our simulations we considered dimensions p = 2, p = 3 (smaller dimensions) and p = 6 (higher dimension) for the two-sample case. The covariates were taken to be independent, following a uniform design where each covariate was uniform between 0 and 1.


Table 1. Simulated levels with sample size n1 = n2 = n; columns give the nominal level α.

Case  TS   n = 25              n = 50              n = 100
           .10   .05   .025    .10   .05   .025    .10   .05   .025
(i)   T    .084  .064  .046    .085  .058  .042    .076  .051  .033
      F    .108  .052  .009    .100  .048  .010    .106  .048  .009
(ii)  T    .096  .073  .064    .086  .058  .044    .062  .038  .024
      F    .106  .056  .010    .098  .052  .014    .092  .046  .009
(iii) T    .082  .068  .052    .080  .056  .044    .065  .046  .030
      F    .132  .076  .018    .124  .072  .017    .123  .065  .016
(iv)  T    .080  .058  .048    .072  .052  .038    .060  .042  .030
      F    .142  .080  .018    .140  .074  .023    .131  .073  .020

The error distributions were taken to be mean-zero normals with variances σ_1² and σ_2². Although we considered equal and unequal sample sizes with equal and unequal error variances, for space considerations we only present the results for the equal sample size cases with σ_1² = σ_2² = 0.25 for p = 2 and p = 3. The results for all other situations with these values of p were very similar. Results for p = 6, where we contrast the use of different bandwidths and discuss non-homogeneous error variances, are given in the sequel.

We used the quadratic kernel function K(u) = (3/4)(1 − u²)I(|u| ≤ 1) and the L(·) function

L(u) = I(|u| ≤ 0.9) + 10(1 − |u|)I(0.9 < |u| ≤ 1).

Our simulations show that the results are fairly insensitive to the choice of q, and we present the results for q = 0.95 throughout. As an estimator of q_θ we use q̂_θ = (1/2)q(ŵ_{θ1} + ŵ_{θ2}), where ŵ_{θi} is obtained from the ith sample, i = 1, 2. To reduce the computational burden, the θ̂_i were computed using the PPR procedure in R with default options, and we use the test statistic

T = (n_1/N)[d̂_{p1} − d_1(θ̂_1)] + (n_2/N)[d̂_{p2} − d_2(θ̂_2)].

A common bandwidth h is used which, as remarked in the previous section, was chosen by a cross-validation method. Namely, we defined d_{cv,i}(α; h) as the version of d_i(α) where, in each term Y_{ij} − ĝ_α(α′X_{ij}), the estimator ĝ_α(α′X_{ij}) is computed by leaving out the jth observation in the ith sample. Then we chose ĥ by minimizing d_{cv,1}(θ̂_1; h) + d_{cv,2}(θ̂_2; h) with respect to h. Since our asymptotic theory as stated in the theorems requires the bandwidth parameter h to be proportional to N^{−1/4} for a second-order kernel with r = 2, one could use h̃ = ĥN^{−1/20} to adjust the above bandwidth to match the asymptotic rate. The results presented are based on ĥ; the powers (and levels) would be slightly higher for most of the examples we examined if h̃ were used (results not shown here). All the results reported are based on 2000 simulations.

With e_1 = (1, 1)/√2, e_2 = (1, −1)/√2, e_3 = (1, 1, 1)/√3 and e_4 = (1, −1, −1)/√3, the following cases were examined:

(i) g_1(t) = g_2(t) = t, θ_1 = θ_2 = e_1
(ii) g_1(t) = g_2(t) = t, θ_1 = θ_2 = e_3
(iii) g_1(t) = g_2(t) = sin(2πt), θ_1 = θ_2 = e_1
(iv) g_1(t) = g_2(t) = sin(2πt), θ_1 = θ_2 = e_3
(v) g_1(t) = g_2(t) = t, θ_1 = e_1, θ_2 = e_2
(vi) g_1(t) = g_2(t) = t, θ_1 = e_3, θ_2 = e_4
(vii) g_1(t) = g_2(t) = sin(2πt), θ_1 = e_1, θ_2 = e_2
(viii) g_1(t) = g_2(t) = sin(2πt), θ_1 = e_3, θ_2 = e_4
(ix) g_1(t) = sin(2πt), g_2(t) = sin(2πt) + t, θ_1 = θ_2 = e_3
(x) g_1(t) = sin(2πt), g_2(t) = 2 sin(2πt), θ_1 = θ_2 = e_3
(xi) g_1(t) = sin(2πt), g_2(t) = sin(2πt) + t, θ_1 = e_3, θ_2 = e_4
(xii) g_1(t) = sin(2πt), g_2(t) = 2 sin(2πt), θ_1 = e_3, θ_2 = e_4.
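To illustrate how these ingredients fit together, one replicate of case (iii) (p = 2, uniform covariates, error variance 0.25) can be generated and prepared as below; ppr() from R's stats package supplies the index estimates as in the simulations, while the leave-one-out bandwidth search is a simplified stand-in for the cross-validation rule described above, and unit() and ghat.single() are the hypothetical helpers from the Section 2 sketches. The statistic itself is then formed as in those sketches.

```r
## One replicate of case (iii): g1 = g2 = sin(2*pi*t), theta1 = theta2 = e1, p = 2.
set.seed(1)
n <- 50; p <- 2; theta <- c(1, 1) / sqrt(2)
gen <- function(n) {
  X <- matrix(runif(n * p), n, p)
  list(X = X, Y = sin(2 * pi * drop(X %*% theta)) + rnorm(n, sd = 0.5))
}
s1 <- gen(n); s2 <- gen(n)

## PPR index estimates, used in place of the full minimisation as in Section 3.1
theta.hat <- function(X, Y) unit(drop(ppr(X, Y, nterms = 1)$alpha))
th1 <- theta.hat(s1$X, s1$Y); th2 <- theta.hat(s2$X, s2$Y)

## Simplified leave-one-out bandwidth search (a stand-in for minimising d_cv above)
cv.err <- function(h) {
  loo <- function(s, th) mean(vapply(seq_along(s$Y), function(j)
           (s$Y[j] - ghat.single(sum(s$X[j, ] * th), th,
                                 s$X[-j, , drop = FALSE], s$Y[-j], h))^2,
           numeric(1)), na.rm = TRUE)
  loo(s1, th1) + loo(s2, th2)
}
h.cv <- optimize(cv.err, interval = c(0.05, 0.5))$minimum
```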

Cases (i)–(iv) examine the performance of T under H_0, and cases (v)–(xii) are included for power demonstrations. For each case we took n_1 = n_2 = 25, 50, 100. The results for cases (i)–(iv) are presented in Table 1, and the results for cases (v)–(xii) are included in Table 2. We can see that the proposed procedure shows decent level performance and can detect the difference in the two mean functions well.

Since, to our knowledge, this is the first work that gives a formal test of the equality of SIMs, we compare our procedure to the traditional F-test used for testing the equality of two linear models. When the true models are linear (cases (i), (ii), (v) and (vi)), our T gives very competitive results in terms of both level and power. When the models are nonlinear, our procedure outperforms the F-test, as expected.


Table 2. Simulated powers with sample size n1 = n2 = n; columns give the nominal level α.

Case   TS   n = 25              n = 50              n = 100
            .10   .05   .025    .10   .05   .025    .10   .05   .025
(v)    T    .823  .792  .773    .944  .934  .926    .997  .996  .994
       F    .776  .666  .420    .972  .949  .850    1.0   1.0   .998
(vi)   T    .967  .962  .958    .992  .992  .991    1.0   1.0   1.0
       F    .815  .706  .463    .986  .964  .908    1.0   1.0   1.0
(vii)  T    .724  .694  .672    .890  .877  .865    .963  .958  .958
       F    .468  .345  .177    .669  .572  .349    .916  .867  .686
(viii) T    .525  .478  .447    .750  .726  .698    .936  .932  .929
       F    .222  .152  .054    .294  .201  .076    .470  .349  .168
(ix)   T    .734  .710  .684    .948  .945  .939    .988  .987  .987
       F    .272  .183  .061    .394  .290  .132    .609  .507  .301
(x)    T    .466  .432  .400    .774  .756  .741    .962  .956  .956
       F    .219  .146  .062    .256  .172  .077    .324  .242  .117
(xi)   T    .626  .587  .560    .863  .846  .840    .973  .972  .970
       F    .345  .230  .097    .511  .398  .210    .786  .688  .484
(xii)  T    .594  .566  .544    .716  .692  .677    .874  .866  .862
       F    .180  .118  .043    .234  .157  .058    .322  .222  .086

Table 3. Average and standard deviation (in parentheses) of the estimators of the a_i's and a_ij's.

Function   Parameter   n1 = n2 = 50       n1 = n2 = 100
sin(πt)    a1          0.2195 (0.0593)    0.2501 (0.0430)
           a2          0.2214 (0.0566)    0.2502 (0.0429)
           a11         0.0434 (0.0221)    0.0501 (0.0167)
           a22         0.0428 (0.0232)    0.0501 (0.0171)
           a12         0.0410 (0.0167)    0.0490 (0.0127)
t + 1      a1          0.2215 (0.0557)    0.2521 (0.0433)
           a2          0.2215 (0.0553)    0.2521 (0.0428)
           a11         0.0443 (0.0229)    0.0514 (0.0172)
           a22         0.0442 (0.0216)    0.0514 (0.0170)
           a12         0.0422 (0.0163)    0.0502 (0.0127)

A naive method for comparing SIMs is to reduce them to a univariate comparison with "pseudo" covariates Û_{ij} = θ̂_i′X_{ij}, where θ̂_i is the estimated index vector from each sample, and then use the existing methods in the literature for comparing univariate regression models. One can plug in any √N-consistent estimator of θ_i. We plugged in the PPR estimator of θ_i for the ith sample and then computed the test statistics K_N^{(1)} and K_N^{(2)} proposed by Neumeyer and Dette [6] (ND hereafter), which are based on the empirical processes of residuals. A wild bootstrap method [39] was used to compute the critical values, as suggested by ND. Testing in this manner gave very poor level and power performance (results not shown here). This was perhaps due to the use of θ̂_i in place of θ_i, which introduces dependence among the covariates within each sample. In addition, since the estimated index parameter values are most likely different (even under the null hypothesis), the resulting "pseudo" covariate values for the two samples may appear to come from two different domains. This may result in level inaccuracies and inflated power. It is also noteworthy that the proposed test, unlike univariate tests such as the ND test, effectively uses asymptotic critical values rather than bootstrap critical values. This is a substantial computational advantage, especially for large sample sizes and high dimensions.

methods as discussed in Remark 2.2. We conducted a small simulation study to assess the impact of the estimated ai’s andaij’s where sample sizes of 25, 50 and 100 were used with two mean functions. We used uniform (0, 1) designs for the X ’swith p = 2 and θ = (1, 1)′/

√2, the errors were N(0, 0.52) for both samples where we used q = 0.95. The true values were

a1 = a2 = 0.32; aij = 0.069, i, j = 1, 2. The averages and standard deviations of the estimated quantities based on 2000simulations are reported in Table 3.It is noted that there seems to be under-estimation in the values of the a’s where the amount of underestimation reducing

as the sample size increases. The impact of underestimation is an inflation of the test statistic. However, for sample sizes50 and 100, observed levels for the test statistics are generally lower than the nominal level leading us to believe thatestimated a’s using the proposed plug-in method does not inflate the size of the test. In order to examine the severity ofunderestimation of these estimators, we estimated the parameters ai, i = 1, 2 and aij using samples of size 500 in 2000simulations keeping all other conditions same. These simulations showed much improved estimators. For example, thesegave an average (standard deviation) values 0.2936 (0.0215) for estimators of a1 and 0.0628 (0.0085) for a11. Thus, it seemsthat the underestimation is reduced for large samples.


Table 4. Simulated levels and power for p = 6 with sample sizes n1 = n2 = n; columns give the nominal level α.

a     TS      n      λ = 0                    λ = 1                    λ = 2
                     .10    .05    .025       .10    .05    .025       .10    .05    .025
0     T(h1)   25     0.172  0.139  0.118      0.35   0.308  0.279      0.702  0.67   0.649
              50     0.163  0.127  0.1        0.478  0.43   0.387      0.818  0.8    0.785
              100    0.112  0.072  0.052      0.586  0.548  0.508      0.906  0.894  0.886
      T(h2)   25     0.123  0.105  0.092      0.276  0.238  0.206      0.655  0.624  0.594
              50     0.117  0.086  0.068      0.41   0.362  0.324      0.809  0.794  0.777
              100    0.064  0.039  0.03       0.55   0.5    0.456      0.931  0.92   0.91
      T(h3)   25     0.137  0.112  0.097      0.312  0.281  0.242      0.709  0.68   0.654
              50     0.085  0.059  0.048      0.395  0.344  0.304      0.864  0.842  0.824
              100    0.054  0.038  0.026      0.54   0.492  0.445      0.956  0.948  0.936
      F       25     0.091  0.045  0.025      0.115  0.059  0.03       0.163  0.09   0.052
              50     0.099  0.04   0.018      0.138  0.072  0.046      0.283  0.178  0.104
              100    0.104  0.058  0.032      0.19   0.112  0.068      0.522  0.383  0.276
0.5   T(h1)   25     0.178  0.15   0.125      0.336  0.3    0.272      0.624  0.595  0.569
              50     0.166  0.126  0.104      0.464  0.414  0.37       0.798  0.783  0.762
              100    0.106  0.07   0.048      0.556  0.51   0.464      0.868  0.856  0.848
      T(h2)   25     0.14   0.111  0.095      0.272  0.244  0.216      0.586  0.544  0.509
              50     0.125  0.096  0.076      0.4    0.351  0.308      0.783  0.759  0.738
              100    0.062  0.036  0.024      0.49   0.447  0.406      0.892  0.878  0.867
      T(h3)   25     0.136  0.114  0.094      0.296  0.258  0.23       0.635  0.602  0.568
              50     0.086  0.063  0.054      0.367  0.307  0.258      0.822  0.795  0.776
              100    0.045  0.028  0.019      0.501  0.44   0.398      0.946  0.926  0.911
      F       25     0.096  0.047  0.024      0.116  0.061  0.03       0.155  0.088  0.048
              50     0.088  0.042  0.022      0.15   0.086  0.048      0.26   0.162  0.1
              100    0.092  0.046  0.02       0.194  0.106  0.058      0.448  0.312  0.22

To examine the performance of our procedure for higher dimensions and for small departures from the error homogeneity assumption, which are very common in practice, we conducted a simulation study very similar to the one above, with index vectors θ_1 = θ_2 = (1, 1, 1, 1, 1, 1)/√6 and link functions g_1(t) = t and g_2(t) = t + λe^{−t} for λ = 0, 1, 2. We considered errors for the ith group, i = 1, 2, following a normal distribution with mean zero and variance σ_i²(θ_i′x) = 0.25|g_i(θ_i′x)|^a with a = 0 (homogeneous errors) and a = 0.5 (heteroscedastic errors). In addition, we compare three bandwidths: h_1 = ĥ from CV as described above, h_2 = h̃ = ĥN^{−1/20}, and h_3 from the plug-in formula (e.g. [39,42]), which has rate N^{−1/5}.

The results for the three test statistics T(h_i), i = 1, 2, 3, corresponding to each bandwidth, and for the traditional F test are summarized in Table 4. It can be seen that our procedure works reasonably well for p = 6, especially when the sample size is above 50. The test appears to be very robust to moderate departures from the error homogeneity assumption and, as expected, it has much better size and power properties than the F test. It is also seen that, while the performance for all three bandwidths is satisfactory, especially when n = 100, h_2 outperforms h_1 as expected, and it seems that h_3, the plug-in bandwidth without undersmoothing, has a slight edge over h_2 in most cases.

3.2. Real data example

Our method is illustrated using data from a textiles manufacturing study conducted at the USDA Cotton Quality Research Station in Clemson, SC [43]. Bales of cotton with a wide range of fiber properties were spun into yarns of the same size using two different spinning processes: ring spinning (R) and open-end spinning (OE). Properties of the cotton fibers in each bale were measured beforehand using three different instruments/methods: the High Volume Instrument (HVI), the Advanced Fiber Information System (AFIS), and the Suter–Webb array method (SW). Several characteristics of the two types of yarn (such as yarn tensile strength) were subsequently measured, with the goal of relating fiber properties to yarn characteristics. A total of 112 observations (an equal number for each spinning type) were taken in this study. Taking yarn breaking strength (yarnstr, cN/tex) as the dependent variable Y, a multiple linear regression model with three fiber properties, X_1: fiber strength (HVISTR, g/tex); X_2: micronaire, a dimensionless measure of average fiber diameter (HVIMIC); X_3: short fiber content (SWSFC, as a %), and an indicator for the type of spinning gives a good fit, with all covariates highly significant (p-values less than 0.0001) and a multiple R² = 0.91. Our goal in re-analyzing these data using the proposed method is to highlight the possible use of SIMs for data of this type.

Based on the linear model analysis, it seems a SIM with a linear link function should be a reasonable fit to these data for each spinning type. We thus split the data along R and OE type spinning and fitted linear models to each group, resulting in adjusted R² values of 0.9336 for the OE group and 0.94 for the R group, respectively, with all predictor variables being highly significant. Therefore SIMs would be reasonable semiparametric models (with a linear link function) for these two groups. A residual analysis showed some evidence of non-normality of the errors, but there was no indication of non-constant error variance. Thus we proceeded assuming constant error variance for the two SIMs. Based on the previous analysis we would expect our test to show that there is a significant difference between the two groups.



Fig. 1. Plots of smoothed yarn strengths against the single index for each group. Solid: OE; Dashed: R.

In calculating our test statistic we used q = 0.95 with the same K and L functions as above. The estimated bandwidth for the two groups was ĥ = 1.516, where we used the same cross-validation idea as above for equal sample sizes. Our calculations resulted in d̂_OE = 0.135 and d̂_R = 0.298, where the MSEs for fitting linear models were 0.146 for OE and 0.392 for R, respectively. Note that each d̂ is actually an estimator of the error variance in the corresponding SIM. The pooled estimators d̂_{pi}, i = 1, 2, were 1.713 and 0.209, respectively. Finally, the resulting test statistic value was T = 106.4, rejecting the null hypothesis of equality of the mean functions of the two groups. Fig. 1 gives a plot of the smoothed yarn strength against the θ̂′X values for the two groups. The smoothed curves show approximate linearity. In fact, when we write the two linear models in the form γ_0 + γ_1γ′X, where γ is the normalized estimated coefficient vector, the estimated single index for each group was almost identical to γ in that group.

4. Proofs

In this section, we give a sketch of the proofs of Theorems 2.1 and 2.2. The details of these proofs can be obtained from the authors.

To simplify the presentation, let N_i = n_1 + ··· + n_i, and define index sets H = {1, 2, …, N} and H_i = {N_{i−1} + 1, N_{i−1} + 2, …, N_i}, i = 1, 2, …, k, with N_0 = 0. Now combine the samples {(X_{li}, Y_{li}) | i = 1, …, n_l; l = 1, …, k} into {(X_i, Y_i), i = 1, 2, …, N} in such a way that the index set H_l corresponds to the lth sample. For example, the elements 1, …, n_2 in H_2′ correspond to the indices n_1 + 1, …, N_2 in H_2, in that order. Then H represents the indices of the combined sample, where the elements with indices in H_l represent the lth sample, l = 1, …, k. Now, let S and G be subsets of H. Define

d(α; S, G) = ∑_{j∈S} (Y_j − ĝ_α(α′X_j; G))² L_{q,α}(X_j) / ∑_{j∈S} L_{q,α}(X_j)   (4.1)

where ĝ_α(t; G) = ∑_{j∈G} Y_j K_h(α′X_j − t) / ∑_{j∈G} K_h(α′X_j − t). Then we have d_l(α) = d(α; H_l, H_l) and d_{pl}(α) = d(α; H_l, H), l = 1, …, k. Letting d̂(S, G) = inf_{α∈D} d(α; S, G), the test statistic T can be written as

T = ∑_{l=1}^{k} (n_l/N)[d̂(H_l, H) − d̂(H_l, H_l)].

We first give the following asymptotic result for a quadratic form.

Lemma 4.1. Let {(X_i, ε_i), 1 ≤ i ≤ n} be independent. Suppose E(ε_i | X_i) = 0, E(ε_i² | X_i) = τ_i(X_i) and E(ε_i⁴ | X_i) = λ_i(X_i). Let T_n = ∑_{i≠j} w_{ij}ε_iε_j, where w_{ij} = w(X_i, X_j). Suppose

(i) for a sequence of real numbers h_n > 0, h_n = o(1) and nh_n² → ∞;
(ii) the τ_i's and λ_i's are bounded by some constant C_1 > 0;
(iii) uniformly in u ≤ 4 and r_1, …, r_u ≤ 4,

E| w_{i_1j_1}^{r_1} w_{i_2j_2}^{r_2} ··· w_{i_uj_u}^{r_u} | = O(h_n^t);   (4.2)


where t is the number of distinct pairs in {(i_s, j_s)} (with (i, j) and (j, i) considered identical) such that i_s ≠ j_s and {(i_s, j_s)} contains an index (either i_s or j_s) that is not contained in the other pairs;

(iv) s_n² = Var(T_n) = C_0n²h_n + o(n²h_n) for some constant C_0 > 0.

Then we have T_n/s_n →_d N(0, 1).

Proof. Let w̃_{ij} = w_{ij} + w_{ji} and Y_i = Z_iε_i, where Z_1 = 0 and Z_i = ∑_{j=1}^{i−1} w̃_{ij}ε_j, i = 2, …, n. Then T_n = ∑_{i=1}^{n} Y_i and {(Y_i, F_i)} is a martingale difference sequence, where F_i is the σ-field generated by {X_1, …, X_i, ε_1, …, ε_i}. By Theorem 1 of [44], it suffices to show ∑_{i=1}^{n} EY_i⁴ = o(s_n⁴) and E(∑_{i=1}^{n} Y_i² − s_n²)² = o(s_n⁴), which can be verified by straightforward computation. □

Proof of Theorem 2.1. Let G be any of H, H_1 or H_2. Let t = |G| and t_i = |G ∩ H_i|, i = 1, 2. Then, either under H_0 for all G, or under H_a for G = H_i, i = 1, 2, it can be shown that we have the decomposition

d̂(H_i, G) = ∑_{l∈H_i} ε_l² L_{q,θ}(X_l) / ∑_{l∈H_i} L_{q,θ}(X_l) + (∫_{−1}^{1} K²(s)ds / (t²hb_0))(t_1a_1 + t_2a_2) − (2K(0)/(thb_0)) a_i
  − (1/(thb_0)) ∑_{l∈H_i, j∈G, l≠j} ((1/t)w̄_{lj} − (2/n_i)w_{lj}) ε_lε_j + o_p(N^{−1}h^{−1/2}),

where

w_{ij} = K_h(θ′X_j − θ′X_i) · L_{q,θ}(X_i)/f_θ(θ′X_i),   (4.3)

w̄_{ij} = ∫_{−1}^{1} K(s)K(s + (θ′X_i − θ′X_j)/h) ds · L_{q,θ}(X_i)/f_θ(θ′X_i).   (4.4)

Here θ is the common index parameter under H_0, and θ = θ_i if G = H_i, i = 1, 2, under H_a. Thus we have T = D_N/(Nh) + T_1/(Nhb_0) + o_p(N^{−1}h^{−1/2}), where D_N is defined in (2.4) and T_1 = ∑_{i,j∈H, i≠j} c_{ij}(w̄_{ij} − 2w_{ij})ε_iε_j. Here c_{ij} = 1/N − 1/n_l if i, j ∈ H_l, l = 1, 2, and c_{ij} = 1/N otherwise. For i ∈ H_l, j ∈ H_s, l, s = 1, 2, straightforward calculation shows that, uniformly for i ≠ j, E[(w̄_{ij} − 2w_{ij})²ε_i²ε_j²] = a_{ls}h + O(h²) and E[(w̄_{ij} − 2w_{ij})(w̄_{ji} − 2w_{ji})ε_i²ε_j²] = a_{ls}h + O(h²). Hence

Var(T_1) = ∑_{i,j∈H_1} (c_{ij}² + c_{ij}c_{ji}) a_{11}h + ∑_{i∈H_1} ∑_{j∈H_2} (c_{ij}² + c_{ij}c_{ji}) a_{12}h
  + ∑_{i∈H_2} ∑_{j∈H_1} (c_{ij}² + c_{ij}c_{ji}) a_{21}h + ∑_{i,j∈H_2} (c_{ij}² + c_{ij}c_{ji}) a_{22}h + O(h²)
  = h(2(n_2²/N²)a_{11} + 4(n_1n_2/N²)a_{12} + 2(n_1²/N²)a_{22}) + O(h²).

It is easy to check that the conditions of Lemma 4.1 are satisfied, and thus the result under H_0 follows. Under H_a, we have

d̂(H_i, H) = E[L_{q,θ_i}(X)σ_i²(θ_i′X)]/b_0 + c_{0i} + o_p(1),

which completes the proof. □

Proof of Theorem 2.2. The result under H_a is similar to the two-sample case. Now suppose H_0 holds. As in the two-sample case, we have the decomposition T = ∑_{l=1}^{k} (n_l/N)(d̂(H_l, H) − d̂(H_l, H_l)) = D*_N/(Nh) + T_1/(Nhb_0) + o_p(N^{−1}h^{−1/2}), where D*_N is defined in (2.6) and T_1 = ∑_{i,j∈H, i≠j} c_{ij}ε_iε_j. Here, provided i ∈ H_l and j ∈ H_s,

c_{ij} = (1/N)(w̄′_{ij} − 2w′_{ij}) − (1/n_l)(w̄_{ij} − 2w_{ij}),  l = s;
c_{ij} = (1/N)(w̄′_{ij} − 2w′_{ij}),  l ≠ s,

where w_{ij} and w̄_{ij} are defined in (4.3) and (4.4) with f_θ replaced by f_l, and

w′_{ij} = K_h(θ′X_j − θ′X_i) L_{q,θ}(X_i)/f̄(θ′X_i),

w̄′_{ij} = ∫_{−1}^{1} K(s)K(s + (θ′X_i − θ′X_j)/h) ds · L_{q,θ}(X_i)/f̄(θ′X_i).


Straightforward calculation gives that

E[(w̄′_{ij} − 2w′_{ij})² σ_l²(θ′X_i)σ_s²(θ′X_j)] = a_{ls}h + O(h²);
E[(w̄′_{ij} − 2w′_{ij})(w̄′_{ji} − 2w′_{ji}) σ_l²(θ′X_i)σ_s²(θ′X_j)] = a_{ls}h + O(h²);
E[(w̄′_{ij} − 2w′_{ij})(w̄_{ij} − 2w_{ij}) σ_l²(θ′X_i)σ_l²(θ′X_j)] = a_lh + O(h²);
E[(w̄′_{ij} − 2w′_{ij})(w̄_{ji} − 2w_{ji}) σ_l²(θ′X_i)σ_l²(θ′X_j)] = a_lh + O(h²),

where a_{ll}, a_l and a_{ls} are defined in (2.8). Hence, noticing that E(c_{ij}²) = E(c_{ij}c_{ji}) + O(h²), we have

Var(T_1) = 2∑_{l=1}^{k} ∑_{i,j∈H_l, i≠j} E[c_{ij}² σ_l²(θ′X_i)σ_l²(θ′X_j)] + 2∑_{l≠s} ∑_{i∈H_l, j∈H_s, i≠j} E[c_{ij}² σ_l²(θ′X_i)σ_s²(θ′X_j)] + O(h²)
  = 2h ∑_{l=1}^{k} n_l²(a_{ll}/N² + a_{ll}/n_l² − 2a_l/(Nn_l)) + 2h ∑_{l≠s} n_ln_s a_{ls}/N² + O(h²)
  = h b_0² τ² + O(h²),

where τ² is defined in (2.7). The result now follows from Lemma 4.1. □

Acknowledgments

The authors would like to thank the referees for their constructive comments. K.B. Kulasekera was partially supported by the National Institutes of Health grant number 5R01 CA 9250402.

References

[1] P. Hall, J.D. Hart, Bootstrap test for difference between means in nonparametric regression, Journal of the American Statistical Association 85 (1990) 1039–1049.
[2] E.C. King, J.D. Hart, T.E. Wehrly, Testing the equality of two regression curves using linear smoothers, Statistics & Probability Letters 12 (1991) 239–247.
[3] K.B. Kulasekera, Testing the equality of regression curves using quasi residuals, Journal of the American Statistical Association 90 (1995) 1085–1093.
[4] K.B. Kulasekera, J. Wang, Smoothing parameter selection for power optimality in testing of regression curves, Journal of the American Statistical Association 92 (1997) 500–511.
[5] H. Dette, N. Neumeyer, Nonparametric analysis of covariance, The Annals of Statistics 29 (2001) 1361–1400.
[6] N. Neumeyer, H. Dette, Nonparametric comparison of regression curves: An empirical process approach, The Annals of Statistics 31 (2003) 880–920.
[7] R.L. Eubank, Spline Smoothing and Nonparametric Regression, Marcel Dekker, NY, 1998.
[8] J. Fan, I. Gijbels, Local Polynomial Modelling and Its Applications, Chapman and Hall, London, 1996.
[9] T. Hastie, R. Tibshirani, Generalized Additive Models, Chapman and Hall, London, 1990.
[10] J. Horowitz, E. Mammen, Nonparametric estimation of an additive model with a link function, The Annals of Statistics 32 (2004) 2412–2443.
[11] E. Mammen, B.U. Park, Bandwidth selection in smooth backfitting in additive models, The Annals of Statistics 33 (2005) 1260–1294.
[12] E. Mammen, B.U. Park, A simple smooth backfitting method for additive models, The Annals of Statistics 34 (2006) 2252–2271.
[13] K. Yu, B.U. Park, E. Mammen, Smooth backfitting in generalized additive models, The Annals of Statistics 36 (2008) 228–260.
[14] C. Gu, Smoothing Spline ANOVA Models, Springer, New York, 2002.
[15] H.H. Zhang, Variable selection for support vector machines via smoothing spline ANOVA, Statistica Sinica 16 (2006) 659–674.
[16] Y. Lin, H.H. Zhang, Component selection and smoothing in smoothing spline analysis of variance models, The Annals of Statistics 34 (2006) 2272–2297.
[17] Y. Lin, H.H. Zhang, Component selection and smoothing for nonparametric regression in exponential families, Statistica Sinica 16 (2006) 1021–1041.
[18] J.D. Hart, Nonparametric Smoothing and Lack-of-Fit Tests, Springer, New York, 1997.
[19] T.M. Stoker, Consistent estimation of scaled coefficients, Econometrica 54 (1986) 1461–1481.
[20] H. Ichimura, Semiparametric Least Squares (SLS) and weighted SLS estimation of single-index models, Journal of Econometrics 58 (1993) 71–120.
[21] M. Hristache, A. Juditsky, V. Spokoiny, Direct estimation of the index coefficient in a single-index model, The Annals of Statistics 29 (2001) 595–623.
[22] X. Yin, R.D. Cook, Direction estimation in single-index regression, Biometrika 92 (2005) 371–384.
[23] J.L. Powell, J.H. Stock, T.M. Stoker, Semiparametric estimation of index coefficients, Econometrica 57 (1989) 1403–1430.
[24] W. Härdle, P. Hall, H. Ichimura, Optimal smoothing in single-index models, The Annals of Statistics 21 (1993) 157–178.
[25] Y. Yu, D. Ruppert, Penalized spline estimation for partially linear single-index models, Journal of the American Statistical Association 97 (2002) 1042–1054.
[26] Y.C. Xia, W. Härdle, Semi-parametric estimation of partially linear single-index models, Journal of Multivariate Analysis 97 (2006) 1162–1184.
[27] Y. Xia, W.K. Li, H. Tong, D. Zhang, A goodness-of-fit test for single-index models (with discussion), Statistica Sinica 14 (2004) 1–39.
[28] W. Stute, L.X. Zhu, Nonparametric checks for single-index models, The Annals of Statistics 33 (2005) 1048–1083.
[29] W. Lin, K.B. Kulasekera, Identifiability of single-index models and additive-index models, Biometrika 94 (2007) 496–501.
[30] A. Yatchew, An elementary nonparametric differencing test of equality of regression functions, Economics Letters 62 (1999) 271–278.
[31] N. Neumeyer, S. Sperlich, Comparison of separable components in different samples, Scandinavian Journal of Statistics 33 (2006) 477–501.
[32] G. Aneiros-Perez, Semi-parametric analysis of covariance under dependence conditions within each group, Australian & New Zealand Journal of Statistics 50 (2008) 97–123.
[33] S.G. Young, A.W. Bowman, Nonparametric analysis of covariance, Biometrics 51 (1995) 920–931.
[34] J.C. Pardo-Fernandez, I. Van Keilegom, W. Gonzalez-Manteiga, Testing for the equality of k regression curves, Statistica Sinica 17 (2007) 1115–1137.
[35] A. Bowman, S. Young, Graphical comparison of nonparametric curves, Applied Statistics 45 (1996) 83–98.
[36] P. Cubas, Testing for the comparison of non-parametric regression curves, Preprint 99-29, IRMAR, Univ. Rennes, France, 2000.
[37] W. Lin, K.B. Kulasekera, On variance estimation for the single-index models, Technical Report, Department of Mathematical Sciences, Clemson University, http://www.math.clemson.edu/reports/TR2006_06_LK.pdf; Australian and New Zealand Journal of Statistics (2008) (in press).
[38] H.G. Müller, Nonparametric Regression Analysis of Longitudinal Data, Springer, NY, 1984.
[39] W. Härdle, E. Mammen, Comparing nonparametric versus parametric regression fits, The Annals of Statistics 21 (1993) 1926–1947.
[40] Y.C. Xia, H. Tong, W.K. Li, On extended partially linear single-index models, Biometrika 86 (1999) 831–842.
[41] J. Fan, Q. Yao, Efficient estimation of conditional variance functions in stochastic regression, Biometrika 85 (1998) 645–660.
[42] W. Lin, On completely data-driven bandwidth selection for single-index models, 2009 (manuscript submitted for publication).
[43] D. Thibodeaux, H. Senter, J. Knowlton, D. McAlister, X. Cui, Measuring the short fiber content of cotton, in: Cotton: Nature's High-tech Fiber, Proc. World Cotton Res. Conf.-4, Lubbock, TX, 7–11 Sept. 2007, 2007.
[44] C.C. Heyde, B.M. Brown, On the departure from normality of a certain class of martingales, The Annals of Mathematical Statistics 41 (1970) 2161–2165.