Page Proof

May 19, 2021 16:16 WSPC/S2010-3263 RMTA 2250011

Random Matrices: Theory and Applications

2250011 (29 pages)

© World Scientific Publishing Company

DOI: 10.1142/S2010326322500113

Ridgelized Hotelling’s $T^2$ test on mean vectors of large dimension

Gao-Fan Ha

KLASMOE & School of Mathematics and Statistics

Northeast Normal University

Changchun, Jilin, P. R. China

[email protected]

Qiuyan Zhang∗

School of Statistics

Capital University of Economics and Business

Beijing, P. R. China

[email protected]

Zhidong Bai

KLASMOE & School of Mathematics and Statistics

Northeast Normal University

Changchun, Jilin, P. R. China

[email protected]

You-Gan Wang

School of Mathematical Sciences

Queensland University of Technology

Brisbane, Queensland, Australia

[email protected]

Received 4 June 2020

Revised 16 March 2021

Accepted 25 March 2021

Published

In this paper, a ridgelized Hotelling’s $T^2$ test is developed for a hypothesis on a large-dimensional mean vector under certain moment conditions. It generalizes the main result of Chen et al. [A regularized Hotelling’s $T^2$ test for pathway analysis in proteomic studies, J. Am. Stat. Assoc. 106(496) (2011) 1345–1360] by relaxing their Gaussian assumption. This is achieved by establishing an exact four-moment theorem that is a simplified version of Tao and Vu’s [Random matrices: universality of local statistics of eigenvalues, Ann. Probab. 40(3) (2012) 1285–1315] work. Simulation results demonstrate the superiority of the proposed test over the traditional Hotelling’s $T^2$ test and several of its extensions in high-dimensional situations.

∗Corresponding author.

Keywords: Random matrices; Hotelling’s $T^2$ test; four moment theorem; central limit theorem.

Mathematics Subject Classification 2020: 15B52, 60B20

1. Introduction

Hypothesis testing concerning mean vectors is a fundamental problem in multivariate statistical analysis, with a wide range of applications in fields such as biology, criminology, and marketing. Let $\mathbf{x}$ be a random vector in $\mathbb{R}^p$ or $\mathbb{C}^p$, with mean vector $\mu$ and covariance matrix $\Sigma_p$. The hypothesis on the mean vector is

\[ H_0 : \mu = \mu_0 \quad \text{vs.} \quad H_1 : \mu \neq \mu_0, \tag{1.1} \]

where $\mu_0$ is a given location vector. In general, one may assume $\mu_0 = 0$; otherwise, it can be subtracted directly from the observations. Hence, the hypothesis reduces to

\[ H_0 : \mu = 0 \quad \text{vs.} \quad H_1 : \mu \neq 0. \tag{1.2} \]

Let $\mathbf{x}_1, \dots, \mathbf{x}_n$ be a sequence of independent and identically distributed (i.i.d.) observations from the population $\mathbf{x}$. For the testing problem (1.2), the well-known Hotelling’s $T^2$ [11] (HT) statistic is defined as

\[ \mathrm{HT} = n \bar{\mathbf{X}}^* S_n^{-1} \bar{\mathbf{X}}, \tag{1.3} \]

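where

\[ \bar{\mathbf{X}} = \frac{1}{n} \sum_{j=1}^{n} \mathbf{x}_j, \qquad S_n = \frac{1}{n-1} \left( \sum_{j=1}^{n} \mathbf{x}_j \mathbf{x}_j^* - n \bar{\mathbf{X}} \bar{\mathbf{X}}^* \right), \tag{1.4} \]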

and “∗” denotes the conjugate transpose of a vector or matrix. The HT test is a powerful tool for testing the mean vector and has several advantages over its competitors. For example, it is invariant under the group of affine transformations (see [1, p. 174]); its exact distribution under Gaussian populations has been derived and is known as Hotelling’s $T^2$ distribution; and it is a powerful test when the sample size $n$ is sufficiently large compared with the population dimension $p$.

However, the HT test becomes invalid in high-dimensional situations where the dimension $p$ is comparable to the sample size $n$. In particular, the test statistic is undefined when $p > n-1$ because the sample covariance matrix $S_n$ is not invertible. Even when $p \le n-1$, the test loses power if $p$ is close to $n$, as shown by Bai and Saranadasa [2], who modified the HT test to handle this dimensionality effect by removing the inverse matrix $S_n^{-1}$ from the statistic and establishing a new central limit theorem (CLT) under both the null and alternative hypotheses. Their result shows that the new test has more attractive power properties than the original one in high-dimensional frameworks. Chen and Qin [10] extended Bai and Saranadasa’s [2] test to accommodate ultra-high-dimensional data. Srivastava [14] suggested using the Moore–Penrose inverse of $S_n$ in the HT statistic when $p > n$. Srivastava and Du [15] proposed another variant by removing all off-diagonal entries of $S_n$ and only retaining its diagonal elements in the HT statistic.
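The singularity of $S_n$ for $p > n-1$ can be seen numerically. The following is only a small sketch: the helper name `sample_mean_and_cov`, the Gaussian data, and the sizes $p = 30$, $n = 20$ are illustrative assumptions, not settings from the paper.

```python
import numpy as np

def sample_mean_and_cov(X):
    """Sample mean and sample covariance (1.4) of a p x n matrix whose columns are observations."""
    p, n = X.shape
    xbar = X.mean(axis=1)
    centered = X - xbar[:, None]
    # Equivalent to (sum_j x_j x_j^* - n xbar xbar^*) / (n - 1)
    S = centered @ centered.conj().T / (n - 1)
    return xbar, S

rng = np.random.default_rng(0)
p, n = 30, 20                      # illustrative sizes with p > n - 1
X = rng.standard_normal((p, n))
xbar, S = sample_mean_and_cov(X)

# Centering removes one degree of freedom, so rank(S) is at most n - 1 < p
# and S has no ordinary inverse.
rank = np.linalg.matrix_rank(S)
print(rank, p)
```

Because the rank falls below $p$, the inverse in (1.3) does not exist in this regime, which is the failure the variants above try to work around.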

The strategy of ridge regression provides a new way to relieve the effect of high dimension, especially when $p > n$, in which case $S_n$ is not invertible. In ridge regression, the coefficients of a linear model are estimated through a shrinkage-type least-squares estimation. It is a more practical and reliable method than ordinary least squares (OLS) when fitting ill-conditioned data. The main idea is to deliberately introduce a small perturbation in OLS estimation to improve its overall performance, at the sacrifice of unbiasedness.
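As a reminder of the ridge idea, the sketch below solves a linear model with more coefficients than observations. The design matrix `Z`, response `y`, penalty `lam`, and the function name are hypothetical and chosen only for illustration; the paper itself does not fit a regression.

```python
import numpy as np

def ridge_coefficients(Z, y, lam):
    """Ridge estimate (Z'Z + lam I)^{-1} Z'y for an m x q design matrix Z."""
    q = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + lam * np.eye(q), Z.T @ y)

rng = np.random.default_rng(0)
m, q = 20, 50                      # more coefficients than observations
Z = rng.standard_normal((m, q))
y = rng.standard_normal(m)

# Z'Z is q x q but has rank at most m < q, so the OLS normal equations are singular.
gram_rank = np.linalg.matrix_rank(Z.T @ Z)

# Adding lam * I makes the system solvable and shrinks the solution.
beta_ridge = ridge_coefficients(Z, y, lam=1.0)
beta_min_norm = np.linalg.pinv(Z) @ y   # minimum-norm least-squares solution, for comparison
```

The same device, a multiple of the identity added before inversion, is what makes the statistic in (1.5) well defined when $p > n$.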

Following the idea of ridge regression, this paper considers a ridgelized Hotelling’s $T^2$ (RIHT) test statistic,

\[ \mathrm{RIHT} = n \bar{\mathbf{X}}^* (S_n + aI)^{-1} \bar{\mathbf{X}}, \tag{1.5} \]

for the hypothesis in (1.2), where $I$ denotes the $p \times p$ identity matrix and $a > 0$ is a scalar tuning parameter. Here, $aI$ is the perturbation added to the HT statistic so that the matrix $S_n + aI$ is invertible. Note that the proposed RIHT statistic is exactly the regularized Hotelling’s $T^2$ statistic of Chen et al. [9], whose asymptotic null distribution has been derived under real Gaussian distributions. However, such a distributional requirement is too restrictive for practical applications, which motivates this paper.
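A minimal sketch of how (1.5) might be computed is given below. The function name `riht_statistic` and the example sizes are assumptions made for illustration; the formula itself follows (1.4) and (1.5).

```python
import numpy as np

def riht_statistic(X, a):
    """RIHT = n * xbar^* (S_n + a I)^{-1} xbar for a p x n data matrix X and a > 0."""
    p, n = X.shape
    xbar = X.mean(axis=1)
    centered = X - xbar[:, None]
    S = centered @ centered.conj().T / (n - 1)
    # Solve a linear system instead of forming the inverse explicitly.
    sol = np.linalg.solve(S + a * np.eye(p), xbar)
    # np.vdot conjugates its first argument, giving xbar^* sol.
    return float(n * np.real(np.vdot(xbar, sol)))

rng = np.random.default_rng(0)
X = rng.standard_normal((30, 20))   # p > n: S_n alone is singular, S_n + a I is not
value = riht_statistic(X, a=1.0)
```

For very large $a$ the matrix $S_n + aI$ is dominated by $aI$, so the statistic behaves like $n\|\bar{\mathbf{X}}\|^2/a$; this gives a quick consistency check on an implementation.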

The main contribution of this paper is a universality property of the RIHT test. It states that the CLT for the statistic in [9] does not depend on the details of the population distribution but only on its first four moments. Such universality is obtained by establishing an exact four-moment theorem (EFMT), which is a simplified version of the classical four-moment theorem proposed by Tao and Vu [16]. Our approach can also be applied to some other problems of high-dimensional statistical inference.

The rest of this paper is organized as follows. Section 2 details our model assumptions and presents the main results. Section 3 reports on simulations. All technical proofs are given in Sec. 4.

2. Main Results

2.1. Model assumptions

Let $M$ be a $p \times p$ Hermitian matrix whose empirical spectral distribution (ESD) function is defined as

\[ H_p^{M}(x) = \frac{1}{p} \sum_{i=1}^{p} I(\lambda_i^{M} \le x), \]

where $\{\lambda_i^{M}\}$ are the $p$ eigenvalues of $M$, and $I(\cdot)$ denotes the indicator function. If the ESD sequence $\{H_p^{M}\}$ has a weak limit as $p$ tends to infinity, the limit is called a limiting spectral distribution (LSD). For a sequence of sample covariance matrices, the asymptotic properties of their ESDs can be found in Bai and Silverstein [6], Yin and Krishnaiah [18], Yin [17], Silverstein [13], and Bai and Silverstein [3, 4].
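The ESD defined above is simply the proportion of eigenvalues not exceeding $x$. A short sketch, with the hypothetical function name `esd` and a toy diagonal matrix, could look as follows.

```python
import numpy as np

def esd(M, x):
    """Empirical spectral distribution of a Hermitian matrix M evaluated at the points x."""
    eigenvalues = np.linalg.eigvalsh(M)          # real eigenvalues of a Hermitian matrix
    x = np.atleast_1d(np.asarray(x, dtype=float))
    # For each point x_k, the fraction of eigenvalues lambda_i with lambda_i <= x_k.
    return (eigenvalues[None, :] <= x[:, None]).mean(axis=1)

M = np.diag([1.0, 2.0, 3.0, 4.0])
values = esd(M, [0.5, 2.5, 10.0])
```

Here two of the four eigenvalues lie below 2.5, so the middle value is one half.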

Let $\mathbf{X}_{p \times n} = (\mathbf{x}_1, \dots, \mathbf{x}_n)$ be the matrix of observations admitting the independent components model

\[ \mathbf{x}_j = T_p x_j + \mu, \tag{2.1} \]

where $\mu$ denotes the location vector, $T_p$ is a $p \times p$ transformation matrix with $\mathrm{rank}(T_p) = p$, and $X_{p \times n} = (x_1, \dots, x_n)$ consists of $p \times n$ i.i.d. real or complex standardized random variables. For simplicity of notation, we suppress the subscripts of the matrices $\mathbf{X}_{p \times n}$ and $X_{p \times n}$ in the remainder of the paper.

Our main assumptions are listed below.

C1: $p, n \to \infty$ such that $c_n := p/n \to c \in (0, \infty)$;

C2: $\Sigma_p := T_p T_p^*$ is a $p \times p$ positive definite matrix;

C3: The ESD $H_p^{\Sigma_p}$ of $\Sigma_p$ converges to a proper probability measure $H$ as $p \to \infty$;

C4: $\limsup_{p \to \infty} \|\Sigma_p\| < \infty$ and $\limsup_{p \to \infty} \|\Sigma_p^{-1}\| < \infty$, where $\|\cdot\|$ is the spectral norm;

C5: $E x_{11} = 0$, $E|x_{11}|^2 = 1$, and $E|x_{11}|^4 < \infty$. In addition, $E x_{11}^2 = 0$ when $x_{11}$ is complex-valued.

Note that condition C4 guarantees that $\Sigma_p$ is invertible for all $p$.

To obtain the universality of the CLT for the RIHT statistic, we adopt the idea from Tao and Vu [16] that was used to prove the local semicircular law for Wigner matrices under the four-moment matching condition. Another crucial assumption in their paper is the so-called C0 condition:

C0: A random Hermitian matrix $A_n = (\zeta_{ij})_{1 \le i,j \le n}$ satisfies the following:

1. The variables $\{\zeta_{ij}, 1 \le i \le j \le n\}$ are independent (but not necessarily identically distributed) and have zero mean and unit variance;

2. (Uniform exponential decay) there exist two constants $C, C' > 0$ such that $P(|\zeta_{ij}| \ge t^{C}) \le \exp(-t)$ for all $t \ge C'$ and $1 \le i, j \le n$.

This stringent condition was relaxed by Jiang and Bai [12] to prove the universality of the asymptotic law for the local spectral statistics under a spiked Fisher matrix model. Inspired by this, we modify Jiang and Bai’s [12] general four-moment theorem to deal with our testing problem. To this end, we define

\[ Y = (y_1, \dots, y_n) = (y_{ij})_{1 \le i \le p,\, 1 \le j \le n} \quad \text{and} \quad \mathbf{Y} = (\mathbf{y}_1, \dots, \mathbf{y}_n) = (\mathbf{y}_{ij})_{1 \le i \le p,\, 1 \le j \le n}, \]

which satisfy the same structure as in (2.1), i.e.,

\[ \mathbf{y}_j = T_p y_j + \mu, \]

where $\{y_{ij}\}$ are i.i.d. standardized random variables, independent of $X$. The connection between the matrices $X$ and $Y$ is the following assumption:

C6: The moments of $y_{ij}$ match those of $x_{ij}$ up to the fourth order, i.e.

\[ E\,\Re(y_{ij})^{\alpha}\, \Im(y_{ij})^{\beta} = E\,\Re(x_{ij})^{\alpha}\, \Im(x_{ij})^{\beta} \]

for $\alpha, \beta \ge 0$ such that $\alpha + \beta \le 4$.

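2.2. Main results

Denote the sample covariance matrices of $\{\mathbf{x}_i\}$ and $\{\mathbf{y}_i\}$ by $S_n^x$ and $S_n^y$, respectively. Our EFMT for the RIHT statistic is presented in the following theorem.

Theorem 2.1 (EFMT for the RIHT statistic). Suppose that conditions C1–C6 hold. For any $a > 0$,

\[ g(\mathbf{X}) = \sqrt{n} \left( \bar{\mathbf{X}}^* (S_n^x + aI)^{-1} \bar{\mathbf{X}} - \frac{1}{n} \mathrm{tr}\, (S_n^x + aI)^{-1} \Sigma_p \right) \tag{2.2} \]

and

\[ g(\mathbf{Y}) = \sqrt{n} \left( \bar{\mathbf{Y}}^* (S_n^y + aI)^{-1} \bar{\mathbf{Y}} - \frac{1}{n} \mathrm{tr}\, (S_n^y + aI)^{-1} \Sigma_p \right) \tag{2.3} \]

have the same limiting distribution if one of them does.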

According to Theorem 2.1, when the first four moments of $x_{ij}$ match those of the standard Gaussian distribution, the limiting distribution of $g(\mathbf{X})$ is the same as if $X$ came from a Gaussian distribution. Combining this with the main conclusions of [9] directly yields the following CLT for the RIHT statistic.

Theorem 2.2 (CLT of the RIHT statistic). Suppose that $\{x_{ij}\}$ and $\{\mathbf{x}_{ij}\}$ are real random variables satisfying conditions C1–C5, with $E x_{11}^3 = 0$ and $E x_{11}^4 = 3$. Under $H_0$ in (1.2), we have

\[ \frac{g(\mathbf{X})}{\sqrt{\dfrac{2c_n}{p}\, \mathrm{tr}\left( (S_n^x + aI)^{-1} \Sigma_p (S_n^x + aI)^{-1} \Sigma_p \right)}} \Rightarrow N(0, 1), \tag{2.4} \]

where “⇒” denotes convergence in distribution.

If $\{x_{ij}\}$ and $\{\mathbf{x}_{ij}\}$ are complex random variables satisfying conditions C1–C5, and their third and fourth moments match those of the standard complex normal distribution, then under $H_0$ in (1.2) the conclusion (2.4) becomes

\[ \frac{g(\mathbf{X})}{\sqrt{\dfrac{c_n}{p}\, \mathrm{tr}\left( (S_n^x + aI)^{-1} \Sigma_p (S_n^x + aI)^{-1} \Sigma_p \right)}} \Rightarrow N(0, 1). \]

When applying Theorem 2.2 to the location test, we first need to estimate the centering and scaling terms, both of which involve the unknown covariance matrix $\Sigma_p$. Here, we simply adopt the estimators from Chen et al. [9].

Theorem 2.3. Under the conditions of Theorem 2.2, we have, for any $a > 0$, in probability,

\[ \sqrt{p} \left| \frac{1}{p} \mathrm{tr}\, (S_n + aI)^{-1} \Sigma_p - \Theta_n^{(1)}(a, c_n) \right| \to 0 \tag{2.5} \]

and

\[ \frac{1}{p} \mathrm{tr} \left( (S_n + aI)^{-1} \Sigma_p (S_n + aI)^{-1} \Sigma_p \right) - \Theta_n^{(2)}(a, c_n) \to 0, \tag{2.6} \]

where

\[ \Theta_n^{(1)}(a, c_n) = \frac{1 - a\, m_{F_{n,p}}(-a)}{1 - c_n \left( 1 - a\, m_{F_{n,p}}(-a) \right)}, \]

\[ \Theta_n^{(2)}(a, c_n) = \frac{1 - a\, m_{F_{n,p}}(-a)}{\left( 1 - c_n + c_n a\, m_{F_{n,p}}(-a) \right)^3} - a\, \frac{m_{F_{n,p}}(-a) - a\, m'_{F_{n,p}}(-a)}{\left( 1 - c_n + c_n a\, m_{F_{n,p}}(-a) \right)^4}, \]

\[ m_{F_{n,p}}(z) = \frac{1}{p} \mathrm{tr}\, (S_n - zI)^{-1} \quad \text{and} \quad m'_{F_{n,p}}(z) = \frac{1}{p} \mathrm{tr}\, (S_n - zI)^{-2}. \]

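Corollary 1. Under the conditions of Theorem 2.2, we have

\[ T := \frac{\sqrt{p} \left( \frac{1}{p} \mathrm{RIHT} - \Theta_n^{(1)}(a, c_n) \right)}{\sqrt{\kappa\, \Theta_n^{(2)}(a, c_n)}} \Rightarrow N(0, 1), \]

where $\Theta_n^{(1)}(a, c_n)$ and $\Theta_n^{(2)}(a, c_n)$ are given in Theorem 2.3, and $\kappa = 2$ in the real case and $\kappa = 1$ in the complex case.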
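The quantities in Theorem 2.3 and Corollary 1 depend on the data only through $\bar{\mathbf{X}}$ and the eigenvalues of $S_n$, so $T$ is straightforward to compute. The sketch below is one possible implementation, not the authors' code; the function name `riht_test_statistic`, the Gaussian test data, and the sizes are assumptions.

```python
import numpy as np

def riht_test_statistic(X, a, kappa=2.0):
    """Return (T, Theta1, Theta2) of Corollary 1 for a p x n data matrix X.

    kappa = 2 corresponds to the real case and kappa = 1 to the complex case.
    """
    p, n = X.shape
    c_n = p / n
    xbar = X.mean(axis=1)
    centered = X - xbar[:, None]
    S = centered @ centered.conj().T / (n - 1)

    eig = np.linalg.eigvalsh(S)
    m = np.mean(1.0 / (eig + a))            # m_{F_{n,p}}(-a)  = p^{-1} tr (S_n + a I)^{-1}
    m_prime = np.mean(1.0 / (eig + a) ** 2) # m'_{F_{n,p}}(-a) = p^{-1} tr (S_n + a I)^{-2}

    denom = 1.0 - c_n + c_n * a * m         # equals 1 - c_n (1 - a m)
    theta1 = (1.0 - a * m) / denom
    theta2 = (1.0 - a * m) / denom**3 - a * (m - a * m_prime) / denom**4

    riht = n * np.real(np.vdot(xbar, np.linalg.solve(S + a * np.eye(p), xbar)))
    T = np.sqrt(p) * (riht / p - theta1) / np.sqrt(kappa * theta2)
    return float(T), float(theta1), float(theta2)

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 400))         # Sigma_p = I, c_n = 0.5 (illustrative)
T, theta1, theta2 = riht_test_statistic(X, a=1.0)
```

When $\Sigma_p = I$, the targets in (2.5) and (2.6) reduce to $m_{F_{n,p}}(-a)$ and $m'_{F_{n,p}}(-a)$; for $c = 0.5$ and $a = 1$ the Marchenko–Pastur limits of these are roughly 0.56 and 0.35, which offers a rough check on the two estimators.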

3. Simulations

Simulation experiments were carried out to evaluate the performance of the proposed RIHT test. For comparison, we also conducted the traditional HT test and the tests proposed by Bai and Saranadasa [2], Srivastava [14], Srivastava and Du [15], and Chen and Qin [10]. We briefly describe these tests.

(a) HT test (traditional HT test proposed by Hotelling [11]), defined in (1.3);

(b) $T_{BS}$ test (Bai and Saranadasa [2]):

\[ T_{BS} = \frac{n \bar{\mathbf{X}}^* \bar{\mathbf{X}} - \mathrm{tr}\, S_n}{\sqrt{\dfrac{2n}{n-1}\, p_a}}, \]

where

\[ p_a = \frac{(n-1)^2}{(n+1)(n-2)} \left[ \mathrm{tr}(S_n^2) - \frac{1}{n-1} (\mathrm{tr}\, S_n)^2 \right]; \]

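(c) $T_S$ test (Srivastava [14]), when $p > n$:

\[ T_S = c_{p,n} \left( \frac{n-1}{2} \right)^{1/2} \left( b\, \frac{p - n + 2}{(n-1)^2}\, n \bar{\mathbf{X}}' S_n^{+} \bar{\mathbf{X}} - 1 \right), \]

where $S_n^{+}$ is the Moore–Penrose inverse of $S_n$,

\[ b = \frac{(n+1)(n-2)}{(n-1)^2} \cdot \frac{(\mathrm{tr}\, S_n / p)^2}{p^{-1} \left[ \mathrm{tr}\, S_n^2 - \frac{1}{n-1} (\mathrm{tr}\, S_n)^2 \right]}, \quad \text{and} \quad c_{p,n} = \left( \frac{p - n + 2}{p + 1} \right)^{1/2}; \]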

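(d) $T_{SD}$ test (Srivastava and Du [15]):

\[ T_{SD} = \frac{n \bar{\mathbf{X}}^* D_{S_n}^{-1} \bar{\mathbf{X}} - \dfrac{(n-1)p}{n-3}}{\sqrt{2 \left( \mathrm{tr}\, R^2 - \dfrac{p^2}{n-1} \right) c_R}}, \]

where $D_{S_n} = \mathrm{diag}(s_{11}, \dots, s_{pp})$ is a diagonal matrix, with $(s_{ii})$ the diagonal entries of $S_n$, $R = D_{S_n}^{-1/2} S_n D_{S_n}^{-1/2}$, and

\[ c_R = 1 + \frac{\mathrm{tr}\, R^2}{p^{3/2}}; \]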

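(e) $T_{CQ}$ test (Chen and Qin [10]),

\[ T_{CQ} = \frac{\sum_{j_1 \neq j_2}^{n} \mathbf{x}_{j_1}' \mathbf{x}_{j_2}}{\sqrt{2\, \mathrm{tr} \left( \sum_{j_1 \neq j_2}^{n} (\mathbf{x}_{j_2} - \bar{\mathbf{X}}_{(j_1, j_2)}) \mathbf{x}_{j_2}' (\mathbf{x}_{j_1} - \bar{\mathbf{X}}_{(j_1, j_2)}) \mathbf{x}_{j_1}' \right)}}, \]

where $\mathbf{x}_{j_1}$ and $\mathbf{x}_{j_2}$ are the $j_1$th and $j_2$th columns, respectively, of $\mathbf{X}$, and $\bar{\mathbf{X}}_{(j_1, j_2)}$ is the sample mean excluding $\mathbf{x}_{j_1}$ and $\mathbf{x}_{j_2}$.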

3.1. Fluctuations of the statistic $T$

We examine the fluctuation of the statistic $T$ in finite-sample situations. Two models for the underlying random matrix $X = (x_{ij})$ are considered.

Model I. Standard Gaussian. The matrix $X = (x_{ij})$ has i.i.d. standard Gaussian entries, $x_{ij} \sim N(0, 1)$.

Model II. Non-Gaussian. The matrix $X = (x_{ij})$ consists of i.i.d. random variables with

\[ x_{ij} = m_1 u_{ij} + m_2 v_{ij}, \tag{3.1} \]

where $m_1 = 0.7827$, $m_2 = 0.6224$, $u_{ij}$ is a uniformly distributed random variable on the interval $(-\sqrt{3}, \sqrt{3})$, and $v_{ij}$, independent of $u_{ij}$, follows a distribution with density function

\[ f(x) = \begin{cases} \dfrac{\sqrt{2}}{2} e^{-\sqrt{2} x} & \text{if } x > 0, \\[1ex] \dfrac{\sqrt{2}}{2} e^{\sqrt{2} x} & \text{if } x < 0. \end{cases} \tag{3.2} \]

It can be verified that the first four moments of $x_{ij}$ in (3.1) match those of a standard Gaussian variable.

Fig. 1. Histograms of statistic $T$ from 1000 independent replications fitted by standard Gaussian density curve (red) under Models I (left) and II (right) with dimensional settings $(p, n) = (400, 200)$ and $(400, 800)$.
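The moment-matching claim for Model II can be checked directly. In the sketch below the names `M1`, `M2`, `model_ii_moments`, and `sample_model_ii` are illustrative; the sketch uses the standard moments of the two components (uniform on $(-\sqrt{3}, \sqrt{3})$: second moment 1, fourth moment 9/5; the density (3.2), a Laplace law with scale $1/\sqrt{2}$: second moment 1, fourth moment 6), with odd moments vanishing by symmetry.

```python
import numpy as np

M1, M2 = 0.7827, 0.6224   # weights of the uniform and Laplace components in (3.1)

def model_ii_moments(m1=M1, m2=M2):
    """First four raw moments of m1 * u + m2 * v for independent symmetric u and v."""
    u2, u4 = 1.0, 9.0 / 5.0      # uniform on (-sqrt(3), sqrt(3))
    v2, v4 = 1.0, 6.0            # Laplace with scale 1/sqrt(2), i.e. density (3.2)
    second = m1**2 * u2 + m2**2 * v2
    fourth = m1**4 * u4 + 6 * m1**2 * m2**2 * u2 * v2 + m2**4 * v4
    return 0.0, second, 0.0, fourth   # odd moments vanish by symmetry

def sample_model_ii(shape, rng):
    """Draw i.i.d. entries from Model II."""
    u = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=shape)
    v = rng.laplace(loc=0.0, scale=1.0 / np.sqrt(2.0), size=shape)
    return M1 * u + M2 * v

moments = model_ii_moments()
draws = sample_model_ii((200000,), np.random.default_rng(0))
```

With the four-digit weights above, the second and fourth moments agree with the Gaussian values 1 and 3 up to rounding of the weights.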

We also consider an independent null setting for the observation matrix $\mathbf{X}$, i.e.

\[ \text{Independent null:} \quad \mu = \mu_0 = 0, \quad \Sigma_p = I, \quad \mathbf{X} = X. \tag{3.3} \]

The tuning parameter of the statistic $T$ is chosen as $a = 1$. The dimensional settings are $(p, n) = (400, 200)$ and $(400, 800)$. Figure 1 shows the histograms of the statistic under the two models, which demonstrate that the empirical distribution of the statistic is well fitted by its limiting distribution.
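An experiment of this kind can be imitated on a smaller scale. The sketch below is not the authors' simulation code: it uses Model I only, and the sizes $(p, n) = (100, 200)$, 200 replications, and the function names are assumptions chosen to keep the run short.

```python
import numpy as np

def t_statistic(X, a=1.0, kappa=2.0):
    """Standardized RIHT statistic T of Corollary 1 for a real p x n data matrix."""
    p, n = X.shape
    c_n = p / n
    xbar = X.mean(axis=1)
    centered = X - xbar[:, None]
    S = centered @ centered.T / (n - 1)
    eigval, eigvec = np.linalg.eigh(S)
    w = 1.0 / (eigval + a)                       # eigenvalues of (S_n + a I)^{-1}
    m, m_prime = w.mean(), (w**2).mean()
    denom = 1.0 - c_n + c_n * a * m
    theta1 = (1.0 - a * m) / denom
    theta2 = (1.0 - a * m) / denom**3 - a * (m - a * m_prime) / denom**4
    proj = eigvec.T @ xbar                       # coordinates of xbar in the eigenbasis
    riht = n * np.sum(w * proj**2)               # n * xbar' (S_n + a I)^{-1} xbar
    return np.sqrt(p) * (riht / p - theta1) / np.sqrt(kappa * theta2)

def simulate_null(p, n, reps, seed=0):
    """Replicate T under the independent null (3.3) with standard Gaussian entries."""
    rng = np.random.default_rng(seed)
    return np.array([t_statistic(rng.standard_normal((p, n))) for _ in range(reps)])

stats = simulate_null(p=100, n=200, reps=200)
print(stats.mean(), stats.std())
```

If the limiting law in Corollary 1 is a reasonable approximation at these sizes, the sample mean and standard deviation of the replicated statistics should be near 0 and 1, respectively.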

3.2. Empirical size and power

We evaluate the performance of the proposed test in non-Gaussian situations. The matrix $X$ is modeled as in (3.1), and the tuning parameter is again $a = 1$. For the observation matrix $\mathbf{X}$, under the null hypothesis $H_0$, we consider two scenarios: the independent case as defined in (3.3), and a dependent case,

Dependent null: $\mu = \mu_0 = 0$, $\Sigma_p = (\sigma_{ij})_{p \times p}$ with $\sigma_{ii} = 1$ and $\sigma_{ij} = 1/p$ for $i \neq j$.

Under the alternative hypothesis $H_1$, the mean vector $\mu$ and covariance matrix $\Sigma_p$ are set to be

Dependent Alternative: $\mu = \mu_1 = (\mu_{11}, \dots, \mu_{1p})'$, where $\mu_{1i} = 0$ if $i \le [p/3]$; $\mu_{1i} = \kappa_1/n$ if $[p/3] < i \le [2p/3]$; otherwise, $\mu_{1i} = -\kappa_1/n$; $\Sigma_p = (\sigma_{ij})_{p \times p}$, where $\sigma_{ii} = 1$ and $\sigma_{ij} = \kappa_2/p$ for $i \neq j$.
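The dependent alternative can be set up as in the following sketch. The function names are hypothetical, and the choice of $T_p$ as the Cholesky factor of $\Sigma_p$ is an assumption: model (2.1) only requires $T_p T_p^* = \Sigma_p$, and the text does not say which square root was used.

```python
import numpy as np

def dependent_alternative(p, n, kappa1, kappa2):
    """Mean vector and covariance matrix of the dependent alternative."""
    mu = np.zeros(p)
    k1, k2 = p // 3, (2 * p) // 3          # [p/3] and [2p/3]
    mu[k1:k2] = kappa1 / n                 # entries with [p/3] < i <= [2p/3]
    mu[k2:] = -kappa1 / n                  # remaining entries
    Sigma = np.full((p, p), kappa2 / p)    # off-diagonal entries kappa2 / p
    np.fill_diagonal(Sigma, 1.0)           # unit diagonal
    return mu, Sigma

def draw_observations(mu, Sigma, X):
    """Apply model (2.1) to an underlying p x n matrix X, assuming T_p = chol(Sigma)."""
    T = np.linalg.cholesky(Sigma)          # lower-triangular factor with T T' = Sigma
    return T @ X + mu[:, None]

mu, Sigma = dependent_alternative(p=9, n=3, kappa1=3.0, kappa2=6.0)
```

The dependent null is the special case $\kappa_1 = 0$, $\kappa_2 = 1$ of the same construction.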

The dimensional settings are $(p, n) = (400, 200)$, $(400, 400)$, and $(400, 600)$ under $H_0$, and $(p, n) = (400, 400)$ under $H_1$. The nominal significance level is fixed at $\alpha = 0.05$. All results in this section are based on 1000 independent replications.

Empirical sizes of the six tests $T$, HT, $T_{BS}$, $T_S$, $T_{SD}$, and $T_{CQ}$ are collected in Table 1. The HT test is only available for cases of $p < n$, and the $T_S$ test is only considered when $p \ge n$. The results show that $T_S$ suffers from serious size distortion, while the others maintain reasonable empirical sizes, as they are all close to the nominal level, $\alpha = 0.05$.

The empirical powers of $T$, $T_{BS}$, $T_{SD}$, and $T_{CQ}$ are reported in Tables 2–5, respectively. The proposed test is comparable to the three competitors when the parameter $\kappa_2$ is small, and becomes dominant as $\kappa_2$ increases. In addition, $T_{BS}$ and $T_{CQ}$ have almost the same power for the studied alternative model.

Table 1. Empirical sizes of tests T, HT, TBS, TS, TSD, and TCQ at significance level α = 0.05. An asterisk marks settings in which the test was not run.

Settings           n     T      HT     TBS    TS     TSD    TCQ
Independent null   200   0.059  *      0.054  0.082  0.041  0.053
                   400   0.049  *      0.050  0.271  0.045  0.050
                   600   0.059  0.043  0.058  *      0.050  0.058
Dependent null     200   0.049  *      0.050  0.052  0.037  0.050
                   400   0.046  *      0.060  0.276  0.048  0.060
                   600   0.046  0.052  0.047  *      0.040  0.047

Table 2. Empirical power of proposed test T at significance level α = 0.05 with dimensions (p, n) = (400, 400). Rows: κ1; columns: κ2.

κ1 \ κ2   3      6      9      12     15     18     21     24     27     30
3         0.053  0.046  0.067  0.054  0.074  0.053  0.050  0.065  0.059  0.070
6         0.158  0.154  0.139  0.149  0.165  0.147  0.157  0.158  0.167  0.160
9         0.426  0.435  0.463  0.473  0.439  0.443  0.453  0.467  0.492  0.491
12        0.860  0.839  0.859  0.866  0.870  0.853  0.866  0.876  0.855  0.886
15        0.991  0.990  0.992  0.992  0.992  0.993  0.999  0.996  0.997  0.996
18        1      1      1      1      1      1      1      1      1      1
21        1      1      1      1      1      1      1      1      1      1

Table 3. Empirical power of TBS at significance level α = 0.05 with dimensions (p, n) = (400, 400). Rows: κ1; columns: κ2.

κ1 \ κ2   3      6      9      12     15     18     21     24     27     30
3         0.052  0.038  0.059  0.057  0.063  0.054  0.048  0.052  0.053  0.050
6         0.150  0.142  0.117  0.118  0.133  0.079  0.086  0.086  0.088  0.070
9         0.451  0.451  0.408  0.376  0.283  0.243  0.220  0.208  0.176  0.134
12        0.888  0.854  0.854  0.805  0.737  0.684  0.602  0.562  0.433  0.376
15        0.997  0.993  0.992  0.984  0.977  0.975  0.955  0.924  0.902  0.847
18        1      1      1      1      1      1      0.999  1      0.999  0.996
21        1      1      1      1      1      1      1      1      1      1

Table 4. Empirical power of TSD at significance level α = 0.05 with dimensions (p, n) = (400, 400). Rows: κ1; columns: κ2.

κ1 \ κ2   3      6      9      12     15     18     21     24     27     30
3         0.043  0.033  0.049  0.046  0.053  0.052  0.040  0.040  0.046  0.039
6         0.126  0.125  0.098  0.099  0.121  0.074  0.070  0.076  0.077  0.059
9         0.424  0.417  0.384  0.345  0.254  0.207  0.182  0.179  0.152  0.112
12        0.874  0.838  0.826  0.779  0.704  0.639  0.557  0.499  0.375  0.305
15        0.997  0.990  0.990  0.981  0.973  0.963  0.938  0.890  0.865  0.786
18        1      1      1      1      1      1      0.999  0.999  0.997  0.990
21        1      1      1      1      1      1      1      1      1      1

Table 5. Empirical power of TCQ at significance level α = 0.05 with dimensions (p, n) = (400, 400). Rows: κ1; columns: κ2.

κ1 \ κ2   3      6      9      12     15     18     21     24     27     30
3         0.052  0.038  0.057  0.057  0.063  0.054  0.048  0.052  0.053  0.050
6         0.150  0.142  0.117  0.118  0.133  0.079  0.086  0.086  0.088  0.070
9         0.451  0.451  0.408  0.376  0.282  0.243  0.220  0.208  0.176  0.134
12        0.888  0.854  0.854  0.805  0.737  0.684  0.602  0.562  0.434  0.376
15        0.997  0.993  0.992  0.984  0.977  0.975  0.955  0.924  0.902  0.847
18        1      1      1      1      1      1      0.999  1      0.999  0.996
21        1      1      1      1      1      1      1      1      1      1

3.3. Tuning parameter

We discuss the choice of the tuning parameter $a$ of the RIHT test in simulations. The underlying matrix $X$ is generated from model (3.1). The observation matrix $\mathbf{X}$ under $H_0$ follows the independent model (3.3). Under the alternative hypothesis, a degenerate model for $\mathbf{X}$ is designed as follows:

Degenerate Alternative: $\mu = \mu_1 = (\mu_{11}, \dots, \mu_{1p})'$, where $\mu_{1i} = 0$ if $i \le [p/3]$, $\mu_{1i} = 0.1$ if $[p/3] < i \le [2p/3]$, and otherwise $\mu_{1i} = -0.1$; $\Sigma_p = (\sigma_{ij})_{p \times p}$, where $\sigma_{ii} = 1$ and $\sigma_{ij} = 2/p$ for $i \neq j$.

The dimensional settings are $p = 90, 135, 180$; $n = p/c_n$, with $c_n = 0.5, 1, 1.5$. Empirical size and power curves of our test $T = T(a)$ are plotted in Figs. 2–4, where the tuning parameter $a$ ranges from 0.05 to 2.5, with grid spacing 0.05. We observe that when $p < n$ (Fig. 2, $c_n = 0.5$), RIHT has stable size and power with respect to the tuning parameter $a$. However, when $p \ge n$ (Fig. 3, $c_n = 1$ and Fig. 4, $c_n = 1.5$), as the parameter $a$ approaches 0, the empirical size of the test becomes biased upward while its power is reduced significantly. Such a deficiency disappears when $a$ is large, say $a \ge 1$. This is why we chose $a = 1$ in our simulations. The parameter tuning problem will be theoretically analyzed in our future work.

Fig. 2. Empirical size (left) and power (right) of test $T = T(a)$ at significance level $\alpha = 0.05$, with tuning parameter $a$ ranging from 0.05 to 2.5. Dimension-to-sample-size ratio is $c_n = 0.5$.

Fig. 3. Empirical size (left) and power (right) of test $T = T(a)$ at significance level $\alpha = 0.05$, with tuning parameter $a$ ranging from 0.05 to 2.5. Dimension-to-sample-size ratio is $c_n = 1$.

Fig. 4. Empirical size (left) and power (right) of test $T = T(a)$ at significance level $\alpha = 0.05$, with tuning parameter $a$ ranging from 0.05 to 2.5. Dimension-to-sample-size ratio is $c_n = 1.5$.

4. Proofs of Main Results

4.1. Preliminaries

Lemma 4.1. Let $A$ be a $p \times p$ nonrandom Hermitian matrix with bounded spectral norm, and $z = (z_1, \dots, z_p)'$ a $p$-dimensional random vector with independent coordinates satisfying

\[ E z_i = 0, \quad E|z_i|^2 = 1, \quad E|z_i|^4 < \infty, \quad \text{and} \quad |z_i| \le \eta_n \sqrt{n}, \tag{4.1} \]

where $\{\eta_n\}$ is a deterministic sequence, and $\eta_n \downarrow 0$ as $n \to \infty$. Then

\[ E(n^{-1} z^* A z - n^{-1} \mathrm{tr} A)^2 = O(n^{-1}) \quad \text{and} \quad E(n^{-1} z^* A z - n^{-1} \mathrm{tr} A)^4 = o(n^{-1}). \]

This lemma can be derived from Lemma 9.1 in Bai and Silverstein [6].
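As an informal sanity check on the first bound (not part of the original argument), consider the special case in which $A$ is real symmetric and the coordinates of $z$ are real standard Gaussian, ignoring the truncation $|z_i| \le \eta_n \sqrt{n}$ in (4.1). In that case the variance of a Gaussian quadratic form is $\mathrm{Var}(z'Az) = 2\, \mathrm{tr} A^2$, which gives the stated order whenever $p/n$ stays bounded, as in condition C1:

```latex
\[
E\left( n^{-1} z' A z - n^{-1} \mathrm{tr} A \right)^2
  = \frac{\mathrm{Var}(z' A z)}{n^2}
  = \frac{2\, \mathrm{tr} A^2}{n^2}
  \le \frac{2 p \|A\|^2}{n^2}
  = O(n^{-1}).
\]
```

The general statement needs the moment and truncation conditions in (4.1) in place of Gaussianity.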

Lemma 4.2. Let $A$ be a $p \times p$ nonrandom Hermitian matrix with bounded spectral norm, and $Z = (z_{ij})$ a $p \times n$ random matrix whose entries are i.i.d., satisfying the conditions in (4.1). Then

\[ E|\bar{Z}_{k0}^* A \bar{Z}_{k0}|^v \le K_v, \quad v = 1, 2, \dots, \]

where $\bar{Z}_{k0} = \frac{1}{n} \sum_{j \neq k}^{n} z_j$, $z_j$ is the $j$th column of $Z$, $k \in \{1, 2, \dots, n\}$, and $K_v$ is a constant depending on $v$.

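Proof. Since the spectral norm $\|A\|$ is bounded, say $\|A\| \le K_0$, we have

\[ E|\bar{Z}_{k0}^* A \bar{Z}_{k0}|^v \le K_0^v\, E|\bar{Z}_{k0}^* \bar{Z}_{k0}|^v. \tag{4.2} \]

Let $r_i$ denote the $i$th component of $\bar{Z}_{k0}$, i.e.

\[ r_i = \frac{1}{n} \sum_{j \neq k}^{n} z_{ij}. \tag{4.3} \]

Then

\[ K_0^v\, E|\bar{Z}_{k0}^* \bar{Z}_{k0}|^v = K_0^v\, E \left( \sum_{i=1}^{p} |r_i|^2 \right)^v. \tag{4.4} \]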

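Applying the multinomial formula (see [8, Chap. 1, Sec. 9]), we have

\[
\begin{aligned}
(4.4) &= K_0^v\, E \sum_{i_1 + \cdots + i_p = v} \frac{v!}{i_1! \cdots i_p!} |r_1|^{2 i_1} \cdots |r_p|^{2 i_p} \\
&\le K_0^v \sum_{l=1}^{v} \sum_{\substack{i_1 + \cdots + i_l = v \\ 1 \le i_1, \dots, i_l \le v}} \frac{v!}{i_1! \cdots i_l!} \sum_{1 \le i_1 < \cdots < i_l \le p} E|r_{i_1}|^{2 i_1} \cdots E|r_{i_l}|^{2 i_l} \\
&\le K_0^v \sum_{l=1}^{v} \sum_{\substack{i_1 + \cdots + i_l = v \\ 1 \le i_1, \dots, i_l \le v}} \frac{v!}{i_1! \cdots i_l!} \sum_{i=1}^{p} E|r_i|^{2 i_1} \cdots \sum_{i=1}^{p} E|r_i|^{2 i_l}.
\end{aligned}
\tag{4.5}
\]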

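Next, we show that $E|r_i|^{2i} \le O(n^{-1})$, i.e. $E\left| \frac{1}{n} \sum_{j \neq k}^{n} z_{ij} \right|^{2i} \le O(n^{-1})$ for $i \ge 1$.

The general term of the expansion of $\left| \frac{1}{n} \sum_{j \neq k}^{n} z_{ij} \right|^{2i}$ is

\[ \left( \frac{1}{n} z_{i j_1} \right)^{j_1} \left( \frac{1}{n} z_{i j_1}^* \right)^{j_1'} \cdots \left( \frac{1}{n} z_{i j_m} \right)^{j_m} \left( \frac{1}{n} z_{i j_m}^* \right)^{j_m'}, \tag{4.6} \]

where $m \in \{1, \dots, i\}$, and $j$ and $j'$ are integers taking values in $\{0, \dots, i\}$. If $j = j' = 0$, then $(\frac{1}{n} z_{ij})^{j} (\frac{1}{n} z_{ij}^*)^{j'} = 1$. If $j + j' = 1$, then one can prove that $E(\frac{1}{n} z_{ij})^{j} (\frac{1}{n} z_{ij}^*)^{j'} = 0$. For $j + j' \ge 2$, we have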

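\[
\begin{aligned}
E \left| \frac{1}{n} \sum_{j \neq k}^{n} z_{ij} \right|^{2i}
&\le \sum_{m=1}^{i} \sum_{\substack{\tilde{j}_1 + \cdots + \tilde{j}_m = 2i \\ 1 < \tilde{j}_1, \dots, \tilde{j}_m \le 2i}} \frac{(2i)!}{\tilde{j}_1! \cdots \tilde{j}_m!} \sum_{j=1, j \neq k}^{n} E \left( \frac{1}{n} |z_{ij}| \right)^{\tilde{j}_1} \cdots \sum_{j=1, j \neq k}^{n} E \left( \frac{1}{n} |z_{ij}| \right)^{\tilde{j}_m},
\end{aligned}
\tag{4.7}
\]

where $\tilde{j} = j + j'$ denotes the total exponent of the corresponding pair of factors in (4.6).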

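Notice that $\sum_{m=1}^{i} \sum_{\tilde{j}_1 + \cdots + \tilde{j}_m = 2i,\, 1 < \tilde{j}_1, \dots, \tilde{j}_m \le 2i} \frac{(2i)!}{\tilde{j}_1! \cdots \tilde{j}_m!}$ is bounded. Thus, the largest order of (4.7) is achieved when $m = 1$ and $\tilde{j} = 2i$. By the conditions in (4.1), we have

\[ E|z_{ij}|^{\alpha} = \begin{cases} O(1) & \text{if } \alpha \le 4, \\ O((\eta_n \sqrt{n})^{\alpha - 4}) & \text{if } \alpha > 4. \end{cases} \tag{4.8} \]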

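Thus,

\[ \sum_{j=1, j \neq k}^{n} E \left( \frac{1}{n} |z_{ij}| \right)^{2i} \le \begin{cases} n^{1 - 2i}\, O(1) & \text{if } 2 \le 2i \le 4, \\ n^{1 - 2i}\, O((\eta_n \sqrt{n})^{2i - 4}) & \text{if } 2i > 4, \end{cases} \tag{4.9} \]

and

\[ E \left| \frac{1}{n} \sum_{j \neq k}^{n} z_{ij} \right|^{2i} \le O(n^{-1}). \]

Similarly, the largest order of (4.5) is achieved when $i = v$. Thus,

\[ \sum_{i=1}^{p} E|r_i|^{2v} \le O(1). \]

Combining the above results yields

\[ E|\bar{Z}_{k0}^* A \bar{Z}_{k0}|^v \le K_v, \quad v = 1, 2, \dots, \]

which completes the proof.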

Lemma 4.3. Under the assumptions of Lemma 4.1,

\[ E \left| \frac{1}{n} z^* A z \right|^v \le K_v, \quad v = 1, 2, \dots, \]

where $K_v$ is a constant depending on $v$.

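Proof. Because $\|A\|$ is bounded by a constant, say $K_0$, we have

\[
\begin{aligned}
E \left| \frac{1}{n} z^* A z \right|^v
&\le K_0^v\, E \left( \frac{1}{n} z^* z \right)^v \\
&= K_0^v\, E \sum_{i_1 + \cdots + i_p = v} \frac{v!}{i_1! \cdots i_p!} \left( \frac{1}{n} |z_1|^2 \right)^{i_1} \cdots \left( \frac{1}{n} |z_p|^2 \right)^{i_p} \\
&\le K_0^v \sum_{l=1}^{v} \sum_{\substack{i_1 + \cdots + i_l = v \\ 1 \le i_1, \dots, i_l \le v}} \frac{v!}{i_1! \cdots i_l!} \sum_{1 \le i_1 < \cdots < i_l \le p} E \left( \frac{1}{n} |z_{i_1}|^2 \right)^{i_1} \cdots E \left( \frac{1}{n} |z_{i_l}|^2 \right)^{i_l} \\
&\le K_0^v \sum_{l=1}^{v} \sum_{\substack{i_1 + \cdots + i_l = v \\ 1 \le i_1, \dots, i_l \le v}} \frac{v!}{i_1! \cdots i_l!} \sum_{i=1}^{p} E \left( \frac{1}{n} |z_i|^2 \right)^{i_1} \cdots \sum_{i=1}^{p} E \left( \frac{1}{n} |z_i|^2 \right)^{i_l},
\end{aligned}
\]

whose largest order occurs when $l = 1$ and $i = v$. According to (4.8), we get

\[ \sum_{i=1}^{p} E \left( \frac{1}{n} |z_i|^2 \right)^v \le \begin{cases} O(n^{-v+1}) & \text{if } v \le 2, \\ O(\eta_n^{2v - 4} n^{-1}) & \text{if } v > 2. \end{cases} \]

Hence, $E|\frac{1}{n} z^* A z|^v$ is bounded by a constant $K_v$. The proof is then complete.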

Lemma 4.4. Under the assumptions of Lemma 4.2,

\[ E|\bar{Z}_{k0}^* A z_k|^v \le K_v, \quad v = 1, 2, \dots, \]

where $K_v$ is a constant depending on $v$.

Proof. Let $\tilde{z}^* = (\tilde{z}_1^*, \dots, \tilde{z}_p^*) = \bar{Z}_{k0}^* A$. By the Cauchy–Schwarz inequality, we have

\[ E|\bar{Z}_{k0}^* A z_k|^v \le \sqrt{E|\tilde{z}^* z_k|^{2v}}. \]

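The general term of the expansion of $|\tilde{z}^* z_k|^{2v}$, where $\tilde{z}^* = \bar{Z}_{k0}^* A$, is

\[ (\tilde{z}_{i_1}^* z_{i_1 k})^{i_1} (\tilde{z}_{i_1} z_{i_1 k}^*)^{i_1'} \cdots (\tilde{z}_{i_l}^* z_{i_l k})^{i_l} (\tilde{z}_{i_l} z_{i_l k}^*)^{i_l'}, \tag{4.10} \]

where $l \in \{1, \dots, v\}$. According to the analysis in the proof of Lemma 4.2, $i$ and $i'$ take values in $\{1, \dots, v\}$. Hence,

\[
E|\tilde{z}^* z_k|^{2v} \le \sum_{l=1}^{v} \sum_{\substack{\tilde{i}_1 + \cdots + \tilde{i}_l = 2v \\ 1 < \tilde{i}_1, \dots, \tilde{i}_l \le 2v}} \frac{(2v)!}{\tilde{i}_1! \cdots \tilde{i}_l!} \left( \sum_{i=1}^{p} E|\tilde{z}_i^* z_{ik}|^{\tilde{i}_1} \cdots \sum_{i=1}^{p} E|\tilde{z}_i^* z_{ik}|^{\tilde{i}_l} \right), \quad \tilde{i}_t \ge 2,
\tag{4.11}
\]

where $i_t + i_t' = \tilde{i}_t$, with $t \in \{1, 2, \dots, l\}$.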

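For $2 \le \tilde{i}_t \le 2v$, applying Hölder’s inequality, we get

\[ \sum_{i=1}^{p} E|\tilde{z}_i^* z_{ik}|^{\tilde{i}_t} \le \left( \sum_{i=1}^{p} E|\tilde{z}_i^* z_{ik}|^2 \right)^{\frac{2v - \tilde{i}_t}{2v - 2}} \left( \sum_{i=1}^{p} E|\tilde{z}_i^* z_{ik}|^{2v} \right)^{\frac{\tilde{i}_t - 2}{2v - 2}}. \]

When $\left( \sum_{i=1}^{p} E|\tilde{z}_i^* z_{ik}|^2 \right)^v \le \sum_{i=1}^{p} E|\tilde{z}_i^* z_{ik}|^{2v}$, we have

\[ \sum_{i=1}^{p} E|\tilde{z}_i^* z_{ik}|^{\tilde{i}_1} \cdots \sum_{i=1}^{p} E|\tilde{z}_i^* z_{ik}|^{\tilde{i}_l} \le \sum_{i=1}^{p} E|\tilde{z}_i|^{2v} E|z_{ik}|^{2v}. \]

Otherwise,

\[ \sum_{i=1}^{p} E|\tilde{z}_i^* z_{ik}|^{\tilde{i}_1} \cdots \sum_{i=1}^{p} E|\tilde{z}_i^* z_{ik}|^{\tilde{i}_l} \le \left( E(\bar{Z}_{k0}^* A)(\bar{Z}_{k0}^* A)^* \right)^v, \]

where $E|z_{ik}|^2 = 1$. Then we obtain

\[ \sum_{i=1}^{p} E|\tilde{z}_i^* z_{ik}|^{\tilde{i}_1} \cdots \sum_{i=1}^{p} E|\tilde{z}_i^* z_{ik}|^{\tilde{i}_l} \le \left( E(\bar{Z}_{k0}^* A)(\bar{Z}_{k0}^* A)^* \right)^v + \sum_{i=1}^{p} E|\tilde{z}_i|^{2v} E|z_{ik}|^{2v}. \]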

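Since

\[ \sum_{l=1}^{v} \sum_{\substack{\tilde{i}_1 + \cdots + \tilde{i}_l = 2v \\ 1 < \tilde{i}_1, \dots, \tilde{i}_l \le 2v}} \frac{(2v)!}{\tilde{i}_1! \cdots \tilde{i}_l!} \tag{4.12} \]

is bounded by a constant $K_v$, we obtain

\[ E|\tilde{z}^* z_k|^{2v} \le K_v \left[ \left( E(\bar{Z}_{k0}^* A)(\bar{Z}_{k0}^* A)^* \right)^v + \sum_{i=1}^{p} E|\tilde{z}_i|^{2v} E|z_{ik}|^{2v} \right]. \tag{4.13} \]

For the expectation $E|\tilde{z}_i|^{2v}$, we have

\[ E|\tilde{z}_i|^{2v} = E|\bar{Z}_{k0}^* A_{\cdot i}|^{2v}. \]

Applying Hölder’s inequality, we have

\[ E|\tilde{z}_i|^{2v} \le K_v \left[ \left( \sum_{j=1}^{p} |A_{ji}|^2 \sum_{s=1, s \neq k}^{n} E \left| \frac{1}{n} z_{js}^* \right|^2 \right)^v + \sum_{j=1}^{p} |A_{ji}|^{2v} \sum_{s=1, s \neq k}^{n} E \left| \frac{1}{n} z_{js}^* \right|^{2v} \right]. \]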

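Since $\sup_p \|A\| < \infty$, we have

\[ \sup_p \sum_{j=1}^{p} |A_{ji}|^2 = \sup_p (A^* A)_{ii} < \infty \quad \text{and} \quad \sup_p \sum_{j=1}^{p} |A_{ji}|^{2v} \le \sup_p \left( \sum_{j=1}^{p} |A_{ji}|^2 \right)^v < \infty. \]

In addition, $\sum_{s=1, s \neq k}^{n} E|\frac{1}{n} z_{js}^*|^2 = O(n^{-1})$, and

\[ \sum_{s=1, s \neq k}^{n} E \left| \frac{1}{n} z_{js} \right|^{2v} \le \begin{cases} O(n^{-2v+1}) & \text{if } 2 \le 2v \le 4, \\ o(n^{-v-1}) & \text{if } 2v > 4. \end{cases} \]

Thus we conclude that

\[ E|\tilde{z}_i|^{2v} \le \begin{cases} O(n^{-1}) & \text{if } 2 \le 2v \le 4, \\ O(n^{-v}) & \text{if } 2v > 4. \end{cases} \]

Also,

\[ E|z_{ik}|^{2v} = \begin{cases} O(1) & \text{if } 2 \le 2v \le 4, \\ O((\eta_n \sqrt{n})^{2v - 4}) & \text{if } 2v > 4 \end{cases} \]

and $\left( E(\bar{Z}_{k0}^* A)(\bar{Z}_{k0}^* A)^* \right)^v = O(1)$. From the above results and Lemma 4.2, we obtain that $E|\tilde{z}^* z_k|^{2v} \le K_v$, where $\tilde{z}^* = \bar{Z}_{k0}^* A$. The proof is complete.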

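4.2. Proof of Theorem 2.1

Following steps of truncation, centralization, and rescaling similar to Bai and Silverstein [5], we may assume that the random variables $\{x_{ij}\}$ satisfy

\[ |x_{ij}| \le \eta_n \sqrt{n}, \quad E x_{ij} = 0, \quad E|x_{ij}|^2 = 1, \quad \text{and} \quad E|x_{ij}|^4 = O(1), \]

where $\{\eta_n\}$ is a deterministic sequence with $\eta_n \downarrow 0$ whose convergence rate can be made arbitrarily slow. Under these assumptions, we have, for any $\alpha > 4$,

\[ E|x_{ij}|^{\alpha} = O((\eta_n \sqrt{n})^{\alpha - 4}). \]

If $x_{ij}$ is complex-valued, then

\[ E x_{ij}^2 = O(n^{-1}). \]

For simplicity, we suppress the subscripts of matrices $\Sigma_p$ and $T_p$ below.

Let

\[ \tilde{S}_n^x = \frac{1}{n} \sum_{i=1}^{n} \mathbf{x}_i \mathbf{x}_i^* - \bar{\mathbf{X}} \bar{\mathbf{X}}^* \tag{4.14} \]

and

\[ \tilde{g}(\mathbf{X}) = \sqrt{n} \left( \bar{\mathbf{X}}^* (\tilde{S}_n^x + aI)^{-1} \bar{\mathbf{X}} - \frac{1}{n} \mathrm{tr}\, (\tilde{S}_n^x + aI)^{-1} \Sigma \right). \tag{4.15} \]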

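We first show that the difference between $g(\mathbf{X})$ in (2.2) and its counterpart $\tilde{g}(\mathbf{X})$ in (4.15), which is built on the matrix $\tilde{S}_n^x$ of (4.14), is asymptotically negligible. Indeed, we have

\[
\begin{aligned}
g(\mathbf{X}) - \tilde{g}(\mathbf{X})
&= -\frac{1}{n} \tilde{g}(\mathbf{X}) + \frac{n-1}{n} \left[ \sqrt{n}\, \bar{\mathbf{X}}^* \left( \left( \tilde{S}_n^x + \frac{(n-1)a}{n} I \right)^{-1} - (\tilde{S}_n^x + aI)^{-1} \right) \bar{\mathbf{X}} \right. \\
&\qquad \left. - \frac{1}{\sqrt{n}} \left( \mathrm{tr} \left( \tilde{S}_n^x + \frac{(n-1)a}{n} I \right)^{-1} \Sigma - \mathrm{tr}\, (\tilde{S}_n^x + aI)^{-1} \Sigma \right) \right] \\
&= O_{a.s.}(n^{-1/2}),
\end{aligned}
\]

where “a.s.” means “almost surely.” Hence, we can prove Theorem 2.1 by using

\[ S_n^x := \frac{1}{n} \left( \sum_{j=1}^{n} \mathbf{x}_j \mathbf{x}_j^* - n \bar{\mathbf{X}} \bar{\mathbf{X}}^* \right) \quad \text{and} \quad S_n^y := \frac{1}{n} \left( \sum_{j=1}^{n} \mathbf{y}_j \mathbf{y}_j^* - n \bar{\mathbf{Y}} \bar{\mathbf{Y}}^* \right) \tag{4.16} \]

instead of their original definitions.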

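We now prove EFMT by showing that $E e^{itg(\mathbf{X})} - E e^{itg(\mathbf{Y})} \to 0$. Write $\mathbf{X}_k = (\mathbf{x}_1, \dots, \mathbf{x}_k, \mathbf{y}_{k+1}, \dots, \mathbf{y}_n)$ and $\mathbf{X}_{k0} = (\mathbf{x}_1, \dots, \mathbf{x}_{k-1}, \mathbf{y}_{k+1}, \dots, \mathbf{y}_n)$, with the conventions that $\mathbf{X}_n = \mathbf{X}$ and $\mathbf{X}_0 = \mathbf{Y}$. Let

\[ g(\mathbf{X}_k) = \sqrt{n} \left( \bar{\mathbf{X}}_k^* (S_{nk} + aI)^{-1} \bar{\mathbf{X}}_k - \frac{1}{n} \mathrm{tr}\, (S_{nk} + aI)^{-1} \Sigma \right) \]

and

\[ g(\mathbf{X}_{k0}) = \sqrt{n} \left( \bar{\mathbf{X}}_{k0}^* (S_{nk0} + aI)^{-1} \bar{\mathbf{X}}_{k0} - \frac{1}{n} \mathrm{tr}\, (S_{nk0} + aI)^{-1} \Sigma \right), \]

where $\bar{\mathbf{X}}_k = \frac{1}{n} \mathbf{X}_k \mathbf{1}$, $\bar{\mathbf{X}}_{k0} = \frac{1}{n} \mathbf{X}_{k0} \mathbf{1}$, $S_{nk} = \frac{1}{n} \mathbf{X}_k \mathbf{X}_k^* - \bar{\mathbf{X}}_k \bar{\mathbf{X}}_k^*$, $S_{nk0} = \frac{1}{n} \mathbf{X}_{k0} \mathbf{X}_{k0}^* - \bar{\mathbf{X}}_{k0} \bar{\mathbf{X}}_{k0}^*$, and $\mathbf{1}$ denotes a vector of ones of conformable dimension.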

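Since

\[
\begin{aligned}
E e^{itg(\mathbf{X})} - E e^{itg(\mathbf{Y})}
&= \sum_{k=1}^{n} E \left( e^{itg(\mathbf{X}_k)} - e^{itg(\mathbf{X}_{k-1})} \right) \\
&= \sum_{k=1}^{n} E\, e^{itg(\mathbf{X}_{k0})} \left( e^{it(g(\mathbf{X}_k) - g(\mathbf{X}_{k0}))} - e^{it(g(\mathbf{X}_{k-1}) - g(\mathbf{X}_{k0}))} \right),
\end{aligned}
\tag{4.17}
\]

we next calculate the order of $g(\mathbf{X}_k) - g(\mathbf{X}_{k0})$.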

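Since

\[ \bar{\mathbf{X}}_k = \bar{\mathbf{X}}_{k0} + \frac{1}{n} \mathbf{x}_k, \tag{4.18} \]

we have

\[
\begin{aligned}
S_{nk} &= S_{nk0} + a_n \mathbf{x}_k \mathbf{x}_k^* - n^{-1} \mathbf{x}_k \bar{\mathbf{X}}_{k0}^* - n^{-1} \bar{\mathbf{X}}_{k0} \mathbf{x}_k^* \\
&= S_{nk1} - n^{-1} (\mathbf{x}_k \bar{\mathbf{X}}_{k0}^* + \bar{\mathbf{X}}_{k0} \mathbf{x}_k^*),
\end{aligned}
\tag{4.19}
\]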

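where $a_n = \frac{n-1}{n^2}$. By the inverse matrix formula, we have

\[ B_{nk}^{-1} = B_{nk1}^{-1} + B_{nk1}^{-1} (n^{-1} \mathbf{x}_k, \bar{\mathbf{X}}_{k0})\, \Lambda^{-1} \begin{pmatrix} \bar{\mathbf{X}}_{k0}^* \\ n^{-1} \mathbf{x}_k^* \end{pmatrix} B_{nk1}^{-1} \tag{4.20} \]

and

\[ B_{nk1}^{-1} = B_{nk0}^{-1} - \frac{a_n B_{nk0}^{-1} \mathbf{x}_k \mathbf{x}_k^* B_{nk0}^{-1}}{1 + a_n \mathbf{x}_k^* B_{nk0}^{-1} \mathbf{x}_k}, \tag{4.21} \]

where $B_\star = S_\star + aI$ for $\star = nk$, $nk0$ or $nk1$, and

\[ \Lambda = I_2 - \begin{pmatrix} n^{-1} \bar{\mathbf{X}}_{k0}^* B_{nk1}^{-1} \mathbf{x}_k & \bar{\mathbf{X}}_{k0}^* B_{nk1}^{-1} \bar{\mathbf{X}}_{k0} \\ n^{-2} \mathbf{x}_k^* B_{nk1}^{-1} \mathbf{x}_k & n^{-1} \mathbf{x}_k^* B_{nk1}^{-1} \bar{\mathbf{X}}_{k0} \end{pmatrix}. \tag{4.22} \]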
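The decomposition (4.18)–(4.19) and the two inverse updates (4.20)–(4.21) can be checked numerically on a small real-valued example. The sketch below is only an illustration under assumed sizes ($p = 6$, $n = 10$, $a = 1$) and Gaussian data; the variable names mirror the notation of the text but are otherwise arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n, a = 6, 10, 1.0
a_n = (n - 1) / n**2

X_k0 = rng.standard_normal((p, n - 1))   # the n - 1 columns shared by X_k and X_{k-1}
x_k = rng.standard_normal(p)             # the column that completes X_k

def mean_and_cov(M):
    """Return n^{-1} M 1 and n^{-1} M M' - (mean)(mean)', always dividing by the fixed n."""
    mean = M.sum(axis=1) / n
    return mean, M @ M.T / n - np.outer(mean, mean)

xbar_k0, S_nk0 = mean_and_cov(X_k0)
xbar_k, S_nk = mean_and_cov(np.column_stack([X_k0, x_k]))

I_p = np.eye(p)
B_nk0 = S_nk0 + a * I_p
B_nk1 = B_nk0 + a_n * np.outer(x_k, x_k)   # S_nk1 = S_nk0 + a_n x_k x_k'
B_nk = S_nk + a * I_p

# (4.18) and (4.19)
ok_418 = np.allclose(xbar_k, xbar_k0 + x_k / n)
ok_419 = np.allclose(B_nk, B_nk1 - (np.outer(x_k, xbar_k0) + np.outer(xbar_k0, x_k)) / n)

# (4.21): rank-one update of the inverse
B0_inv = np.linalg.inv(B_nk0)
B1_inv = B0_inv - a_n * (B0_inv @ np.outer(x_k, x_k) @ B0_inv) / (1 + a_n * x_k @ B0_inv @ x_k)
ok_421 = np.allclose(B1_inv, np.linalg.inv(B_nk1))

# (4.20) with Lambda from (4.22): rank-two update of the inverse
U = np.column_stack([x_k / n, xbar_k0])    # p x 2 block (n^{-1} x_k, Xbar_k0)
V = np.vstack([xbar_k0, x_k / n])          # 2 x p block (Xbar_k0'; n^{-1} x_k')
Lam = np.eye(2) - V @ B1_inv @ U
B_inv = B1_inv + B1_inv @ U @ np.linalg.inv(Lam) @ V @ B1_inv
ok_420 = np.allclose(B_inv, np.linalg.inv(B_nk))

print(ok_418, ok_419, ok_420, ok_421)
```

The point of the two-step update is that every inverse appearing later involves only $B_{nk0}$, which does not depend on the swapped column.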

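This implies

\[ B_{nk1}^{-1} (n^{-1} \mathbf{x}_k, \bar{\mathbf{X}}_{k0})\, \Lambda^{-1} \begin{pmatrix} \bar{\mathbf{X}}_{k0}^* \\ n^{-1} \mathbf{x}_k^* \end{pmatrix} B_{nk1}^{-1} = \frac{\Upsilon}{|1 - n^{-1} \mathbf{x}_k^* B_{nk1}^{-1} \bar{\mathbf{X}}_{k0}|^2 - n^{-2} \mathbf{x}_k^* B_{nk1}^{-1} \mathbf{x}_k\, \bar{\mathbf{X}}_{k0}^* B_{nk1}^{-1} \bar{\mathbf{X}}_{k0}}, \tag{4.23} \]

where

\[
\begin{aligned}
\Upsilon &= n^{-1} B_{nk1}^{-1} \mathbf{x}_k (1 - n^{-1} \mathbf{x}_k^* B_{nk1}^{-1} \bar{\mathbf{X}}_{k0}) \bar{\mathbf{X}}_{k0}^* B_{nk1}^{-1} \\
&\quad + n^{-2} B_{nk1}^{-1} \bar{\mathbf{X}}_{k0}\, \mathbf{x}_k^* B_{nk1}^{-1} \mathbf{x}_k\, \bar{\mathbf{X}}_{k0}^* B_{nk1}^{-1} \\
&\quad + n^{-2} B_{nk1}^{-1} \mathbf{x}_k\, \bar{\mathbf{X}}_{k0}^* B_{nk1}^{-1} \bar{\mathbf{X}}_{k0}\, \mathbf{x}_k^* B_{nk1}^{-1} \\
&\quad + n^{-1} B_{nk1}^{-1} \bar{\mathbf{X}}_{k0} (1 - n^{-1} \bar{\mathbf{X}}_{k0}^* B_{nk1}^{-1} \mathbf{x}_k) \mathbf{x}_k^* B_{nk1}^{-1}.
\end{aligned}
\tag{4.24}
\]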

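Applying the identity $B_{nk1}^{-1} = B_{nk0}^{-1} - a_n \beta_k B_{nk0}^{-1} \mathbf{x}_k \mathbf{x}_k^* B_{nk0}^{-1}$, where $\beta_k = 1/(1 + a_n \mathbf{x}_k^* B_{nk0}^{-1} \mathbf{x}_k)$, we obtain

\[ \bar{\mathbf{X}}_k^* \Upsilon \bar{\mathbf{X}}_k := I_1 + I_2 + I_3 + I_4, \tag{4.25} \]

where

\[
\begin{aligned}
I_1 &= n^{-1} \beta_k \bar{\mathbf{X}}_k^* B_{nk0}^{-1} \mathbf{x}_k (1 - n^{-1} \beta_k \mathbf{x}_k^* B_{nk0}^{-1} \bar{\mathbf{X}}_{k0}) \\
&\quad \times (\bar{\mathbf{X}}_{k0}^* B_{nk0}^{-1} \bar{\mathbf{X}}_k - a_n \beta_k \bar{\mathbf{X}}_{k0}^* B_{nk0}^{-1} \mathbf{x}_k \mathbf{x}_k^* B_{nk0}^{-1} \bar{\mathbf{X}}_k),
\end{aligned}
\tag{4.26}
\]

\[
\begin{aligned}
I_2 &= n^{-2} \beta_k (\bar{\mathbf{X}}_k^* B_{nk0}^{-1} \bar{\mathbf{X}}_{k0} - a_n \beta_k \bar{\mathbf{X}}_k^* B_{nk0}^{-1} \mathbf{x}_k \mathbf{x}_k^* B_{nk0}^{-1} \bar{\mathbf{X}}_{k0}) \\
&\quad \times \mathbf{x}_k^* B_{nk0}^{-1} \mathbf{x}_k (\bar{\mathbf{X}}_{k0}^* B_{nk0}^{-1} \bar{\mathbf{X}}_k - a_n \beta_k \bar{\mathbf{X}}_{k0}^* B_{nk0}^{-1} \mathbf{x}_k \mathbf{x}_k^* B_{nk0}^{-1} \bar{\mathbf{X}}_k),
\end{aligned}
\tag{4.27}
\]

\[
\begin{aligned}
I_3 &= \beta_k^2 n^{-2} \bar{\mathbf{X}}_k^* B_{nk0}^{-1} \mathbf{x}_k (\bar{\mathbf{X}}_{k0}^* B_{nk0}^{-1} \bar{\mathbf{X}}_{k0} \\
&\quad - a_n \beta_k \bar{\mathbf{X}}_{k0}^* B_{nk0}^{-1} \mathbf{x}_k \mathbf{x}_k^* B_{nk0}^{-1} \bar{\mathbf{X}}_{k0}) \mathbf{x}_k^* B_{nk0}^{-1} \bar{\mathbf{X}}_k
\end{aligned}
\tag{4.28}
\]

and

\[
\begin{aligned}
I_4 &= (\bar{\mathbf{X}}_k^* B_{nk0}^{-1} \bar{\mathbf{X}}_{k0} - a_n \beta_k \bar{\mathbf{X}}_k^* B_{nk0}^{-1} \mathbf{x}_k \mathbf{x}_k^* B_{nk0}^{-1} \bar{\mathbf{X}}_{k0}) \\
&\quad \times (1 - n^{-1} \beta_k \bar{\mathbf{X}}_{k0}^* B_{nk0}^{-1} \mathbf{x}_k)\, n^{-1} \beta_k \mathbf{x}_k^* B_{nk0}^{-1} \bar{\mathbf{X}}_k.
\end{aligned}
\tag{4.29}
\]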

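We now control the first term, $I_1$, which can be represented as
\[
\begin{aligned}
I_1 ={}& n^{-1}\beta_k X_k^{*}B_{nk0}^{-1}x_k\,X_{k0}^{*}B_{nk0}^{-1}X_{k0} + n^{-2}\beta_k X_k^{*}B_{nk0}^{-1}x_k\,X_{k0}^{*}B_{nk0}^{-1}x_k \\
&- n^{-2}\beta_k^{2}X_k^{*}B_{nk0}^{-1}x_k\,x_k^{*}B_{nk0}^{-1}X_{k0}\,X_{k0}^{*}B_{nk0}^{-1}X_k \\
&- n^{-1}a_n\beta_k^{2}X_k^{*}B_{nk0}^{-1}x_k\,X_{k0}^{*}B_{nk0}^{-1}x_k\,x_k^{*}B_{nk0}^{-1}X_k \\
&+ n^{-2}a_n\beta_k^{3}X_k^{*}B_{nk0}^{-1}x_k\,x_k^{*}B_{nk0}^{-1}X_{k0}\,X_{k0}^{*}B_{nk0}^{-1}x_k\,x_k^{*}B_{nk0}^{-1}X_k.
\end{aligned}
\tag{4.30}
\]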

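Because $\beta_k$ and $\|B_{\star}\|$ for $\star = nk$, $nk0$, and $nk1$ are all bounded by some constant, applying Lemmas 4.2–4.4 gives
\[
\begin{aligned}
E\bigl|X_k^{*}B_{nk0}^{-1}x_k\,X_{k0}^{*}B_{nk0}^{-1}x_k\bigr|^{2}
\le{}& E\bigl|X_{k0}^{*}B_{nk0}^{-1}x_k\,X_{k0}^{*}B_{nk0}^{-1}x_k\bigr|^{2}
 + E\Bigl|\frac{1}{n}x_k^{*}B_{nk0}^{-1}x_k\,X_{k0}^{*}B_{nk0}^{-1}x_k\Bigr|^{2} \\
\le{}& \sqrt{E\bigl|X_{k0}^{*}B_{nk0}^{-1}x_k\bigr|^{4}\,E\bigl|X_{k0}^{*}B_{nk0}^{-1}x_k\bigr|^{4}} \\
&+ \sqrt{E\Bigl|\frac{1}{n}x_k^{*}B_{nk0}^{-1}x_k\Bigr|^{4}\,E\bigl|X_{k0}^{*}B_{nk0}^{-1}x_k\bigr|^{4}} \\
={}& O(1),
\end{aligned}
\]
\[
\begin{aligned}
E\bigl|X_k^{*}B_{nk0}^{-1}x_k\,x_k^{*}B_{nk0}^{-1}X_{k0}\,X_{k0}^{*}B_{nk0}^{-1}X_k\bigr|^{2}
\le K\Bigl[& E\bigl|X_{k0}^{*}B_{nk0}^{-1}x_k\,x_k^{*}B_{nk0}^{-1}X_{k0}\,X_{k0}^{*}B_{nk0}^{-1}X_{k0}\bigr|^{2} \\
&+ E\Bigl|\frac{1}{n}X_{k0}^{*}B_{nk0}^{-1}x_k\,x_k^{*}B_{nk0}^{-1}X_{k0}\,X_{k0}^{*}B_{nk0}^{-1}x_k\Bigr|^{2} \\
&+ E\Bigl|\frac{1}{n}x_k^{*}B_{nk0}^{-1}x_k\,x_k^{*}B_{nk0}^{-1}X_{k0}\,X_{k0}^{*}B_{nk0}^{-1}X_{k0}\Bigr|^{2} \\
&+ E\Bigl|\frac{1}{n^{2}}x_k^{*}B_{nk0}^{-1}x_k\,x_k^{*}B_{nk0}^{-1}X_{k0}\,X_{k0}^{*}B_{nk0}^{-1}x_k\Bigr|^{2}\Bigr] \\
= O(1),\quad &
\end{aligned}
\]
\[
\begin{aligned}
E\bigl|X_k^{*}B_{nk0}^{-1}x_k\,X_{k0}^{*}B_{nk0}^{-1}x_k\,x_k^{*}B_{nk0}^{-1}X_k\bigr|^{2}
\le K\Bigl[& E\bigl|X_{k0}^{*}B_{nk0}^{-1}x_k\,X_{k0}^{*}B_{nk0}^{-1}x_k\,x_k^{*}B_{nk0}^{-1}X_{k0}\bigr|^{2} \\
&+ E\Bigl|\frac{1}{n}X_{k0}^{*}B_{nk0}^{-1}x_k\,X_{k0}^{*}B_{nk0}^{-1}x_k\,x_k^{*}B_{nk0}^{-1}x_k\Bigr|^{2} \\
&+ E\Bigl|\frac{1}{n}x_k^{*}B_{nk0}^{-1}x_k\,X_{k0}^{*}B_{nk0}^{-1}x_k\,x_k^{*}B_{nk0}^{-1}X_{k0}\Bigr|^{2} \\
&+ E\Bigl|\frac{1}{n^{2}}x_k^{*}B_{nk0}^{-1}x_k\,X_{k0}^{*}B_{nk0}^{-1}x_k\,x_k^{*}B_{nk0}^{-1}x_k\Bigr|^{2}\Bigr] \\
= O(1),\quad &
\end{aligned}
\]
and
\[
\begin{aligned}
E\bigl|X_k^{*}B_{nk0}^{-1}x_k\,x_k^{*}B_{nk0}^{-1}X_{k0}\,X_{k0}^{*}B_{nk0}^{-1}x_k\,x_k^{*}B_{nk0}^{-1}X_k\bigr|^{2}
\le K\Bigl[& E\bigl|X_{k0}^{*}B_{nk0}^{-1}x_k\,x_k^{*}B_{nk0}^{-1}X_{k0}\,X_{k0}^{*}B_{nk0}^{-1}x_k\,x_k^{*}B_{nk0}^{-1}X_{k0}\bigr|^{2} \\
&+ E\Bigl|\frac{1}{n}X_{k0}^{*}B_{nk0}^{-1}x_k\,x_k^{*}B_{nk0}^{-1}X_{k0}\,X_{k0}^{*}B_{nk0}^{-1}x_k\,x_k^{*}B_{nk0}^{-1}x_k\Bigr|^{2} \\
&+ E\Bigl|\frac{1}{n}x_k^{*}B_{nk0}^{-1}x_k\,x_k^{*}B_{nk0}^{-1}X_{k0}\,X_{k0}^{*}B_{nk0}^{-1}x_k\,x_k^{*}B_{nk0}^{-1}X_{k0}\Bigr|^{2} \\
&+ E\Bigl|\frac{1}{n^{2}}x_k^{*}B_{nk0}^{-1}x_k\,x_k^{*}B_{nk0}^{-1}X_{k0}\,X_{k0}^{*}B_{nk0}^{-1}x_k\,x_k^{*}B_{nk0}^{-1}x_k\Bigr|^{2}\Bigr] \\
= O(1).\quad &
\end{aligned}
\]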

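Thus we obtain
\[
I_1 = n^{-1}\beta_k X_{k0}^{*}B_{nk0}^{-1}x_k\,X_{k0}^{*}B_{nk0}^{-1}X_{k0} + n^{-2}\beta_k x_k^{*}B_{nk0}^{-1}x_k\,X_{k0}^{*}B_{nk0}^{-1}X_{k0} + \zeta_n,
\tag{4.31}
\]
where $\zeta_n = O_{L_2}(n^{-2})$, i.e. $\sqrt{E|n^{2}\zeta_n|^{2}}$ is bounded in $n$. The other terms, $I_2$, $I_3$, and $I_4$, can be controlled similarly, from which one can verify that
\[
(4.25) := I_1 + I_2 + I_3 + I_4,
\]
where
\[
\begin{aligned}
I_2 &= n^{-2}\beta_k\bigl(X_{k0}^{*}B_{nk0}^{-1}X_{k0}\bigr)^{2}x_k^{*}B_{nk0}^{-1}x_k + O_{L_2}(n^{-2}), \\
I_3 &= O_{L_2}(n^{-2}), \\
I_4 &= n^{-1}\beta_k X_{k0}^{*}B_{nk0}^{-1}X_{k0}\,x_k^{*}B_{nk0}^{-1}X_{k0}
 + n^{-2}\beta_k X_{k0}^{*}B_{nk0}^{-1}X_{k0}\,x_k^{*}B_{nk0}^{-1}x_k + O_{L_2}(n^{-2}).
\end{aligned}
\tag{4.32}
\]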



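Furthermore,
\[
\begin{aligned}
&\text{the denominator of (4.23)} \\
&\quad= 1 - n^{-1}x_k^{*}B_{nk1}^{-1}X_{k0} - n^{-1}X_{k0}^{*}B_{nk1}^{-1}x_k
 + n^{-2}\bigl(|x_k^{*}B_{nk1}^{-1}X_{k0}|^{2} - x_k^{*}B_{nk1}^{-1}x_k\,X_{k0}^{*}B_{nk1}^{-1}X_{k0}\bigr) \\
&\quad= 1 - n^{-1}\beta_k x_k^{*}B_{nk0}^{-1}X_{k0} - n^{-1}\beta_k X_{k0}^{*}B_{nk0}^{-1}x_k \\
&\qquad + n^{-2}\bigl((1 + a_n x_k^{*}B_{nk0}^{-1}x_k)|\beta_k x_k^{*}B_{nk0}^{-1}X_{k0}|^{2}
 - \beta_k x_k^{*}B_{nk0}^{-1}x_k\,X_{k0}^{*}B_{nk0}^{-1}X_{k0}\bigr) \\
&\quad= 1 - n^{-1}\beta_k x_k^{*}B_{nk0}^{-1}X_{k0} - n^{-1}\beta_k X_{k0}^{*}B_{nk0}^{-1}x_k
 - n^{-2}\beta_k x_k^{*}B_{nk0}^{-1}x_k\,X_{k0}^{*}B_{nk0}^{-1}X_{k0} + O_{L_2}(n^{-2}) \\
&\quad= 1 + O_{L_2}(n^{-1}).
\end{aligned}
\tag{4.33}
\]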

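Collecting the results in (4.23)–(4.33), we obtain
\[
\begin{aligned}
X_k^{*}(4.23)X_k
={}& n^{-2}\beta_k\bigl(X_{k0}^{*}B_{nk0}^{-1}X_{k0}\bigr)^{2}x_k^{*}B_{nk0}^{-1}x_k
 + n^{-1}\beta_k X_{k0}^{*}B_{nk0}^{-1}X_{k0}\,X_{k0}^{*}B_{nk0}^{-1}x_k \\
&+ 2n^{-2}\beta_k X_{k0}^{*}B_{nk0}^{-1}X_{k0}\,x_k^{*}B_{nk0}^{-1}x_k
 + n^{-1}\beta_k x_k^{*}B_{nk0}^{-1}X_{k0}\,X_{k0}^{*}B_{nk0}^{-1}X_{k0} + O_{L_2}(n^{-2}).
\end{aligned}
\tag{4.34}
\]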

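Also, we have
\[
\begin{aligned}
X_k^{*}B_{nk1}^{-1}X_k - X_{k0}^{*}B_{nk0}^{-1}X_{k0}
={}& n^{-1}\beta_k\bigl(X_{k0}^{*}B_{nk0}^{-1}x_k + x_k^{*}B_{nk0}^{-1}X_{k0} + n^{-1}x_k^{*}B_{nk0}^{-1}x_k\bigr) \\
&- a_n\beta_k\bigl|x_k^{*}B_{nk0}^{-1}X_{k0}\bigr|^{2}
\end{aligned}
\tag{4.35}
\]
and
\[
-\frac{1}{n}\operatorname{tr}\bigl(B_{nk}^{-1} - B_{nk0}^{-1}\bigr)\Sigma
 = n^{-1}a_n\beta_k x_k^{*}B_{nk0}^{-1}\Sigma B_{nk0}^{-1}x_k + O_{L_2}(n^{-2}).
\tag{4.36}
\]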

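It thus follows that
\begin{align}
g(X_k) - g(X_{k0}) ={}& n^{-1/2}\beta_k\bigl(X_{k0}^{*}B_{nk0}^{-1}x_k + x_k^{*}B_{nk0}^{-1}X_{k0}\bigr)\bigl(1 + X_{k0}^{*}B_{nk0}^{-1}X_{k0}\bigr) \notag\\
&- n^{1/2}a_n\beta_k\bigl|x_k^{*}B_{nk0}^{-1}X_{k0}\bigr|^{2} \tag{4.37}\\
&+ n^{-3/2}\beta_k x_k^{*}B_{nk0}^{-1}x_k\bigl(1 + X_{k0}^{*}B_{nk0}^{-1}X_{k0}\bigr)^{2} \tag{4.38}\\
&+ n^{-1/2}a_n\beta_k x_k^{*}B_{nk0}^{-1}\Sigma B_{nk0}^{-1}x_k \tag{4.39}\\
&+ O_{L_2}(n^{-3/2}). \tag{4.40}
\end{align}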


It is easy to check that the order of the difference between $1/(1 + \frac{1}{n}x_k^{*}B_{nk0}^{-1}x_k)$ and $1/(1 + a_n x_k^{*}B_{nk0}^{-1}x_k)$ is $O_{L_2}(n^{-1})$. So, we simplify the calculation by using $1/(1 + \frac{1}{n}x_k^{*}B_{nk0}^{-1}x_k)$ instead of $1/(1 + a_n x_k^{*}B_{nk0}^{-1}x_k)$. Similarly, we use $1/n$ instead of $a_n$.
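The size of this substitution error can be seen in a small deterministic sketch. It assumes, purely for illustration, that the quadratic form $x_k^{*}B_{nk0}^{-1}x_k$ equals $cn$ for a fixed constant $c$ (a hypothetical stand-in for a quantity of order $n$), and uses $a_n = (n-1)/n^{2}$ from (4.19).

```python
def beta_gap(n, c):
    """1/(1 + a_n q) - 1/(1 + q/n) with q = c * n standing in for the quadratic form."""
    q = c * n
    a_n = (n - 1) / n ** 2
    with_a_n = 1.0 / (1.0 + a_n * q)
    with_one_over_n = 1.0 / (1.0 + q / n)
    return with_a_n - with_one_over_n

for n in (10, 100, 1000, 10000):
    # n times the gap stays bounded, i.e. the gap itself is of order 1/n
    print(n, n * beta_gap(n, c=2.0))
```

Since $1/n - a_n = n^{-2}$, the gap equals $(c/n)/\bigl((1+c)(1 + c(n-1)/n)\bigr)$ in this toy setting, so $n$ times the gap tends to $c/(1+c)^{2}$.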

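Let $\beta_{k0} = 1/(1 + \frac{1}{n}\operatorname{tr}\Sigma B_{nk0}^{-1})$, $\gamma_k = \frac{1}{n}\bigl(x_k^{*}B_{nk0}^{-1}x_k - \operatorname{tr}\Sigma B_{nk0}^{-1}\bigr)$, $\varepsilon_k = \frac{1}{n}\bigl(x_k^{*}B_{nk0}^{-1}\Sigma B_{nk0}^{-1}x_k - \operatorname{tr}(\Sigma B_{nk0}^{-1})^{2}\bigr)$. One can verify that
\[
\begin{aligned}
\beta_k &= \frac{1}{1 + n^{-1}x_k^{*}B_{nk0}^{-1}x_k} \\
&= \frac{1}{1 + n^{-1}\operatorname{tr}\Sigma B_{nk0}^{-1}}
 - \frac{n^{-1}\bigl(x_k^{*}B_{nk0}^{-1}x_k - \operatorname{tr}\Sigma B_{nk0}^{-1}\bigr)}{\bigl(1 + n^{-1}\operatorname{tr}\Sigma B_{nk0}^{-1}\bigr)\bigl(1 + n^{-1}x_k^{*}B_{nk0}^{-1}x_k\bigr)} \\
&= \beta_{k0} - \beta_k\beta_{k0}\gamma_k
\end{aligned}
\]
and
\[
1 - \beta_k = \frac{n^{-1}x_k^{*}B_{nk0}^{-1}x_k}{1 + n^{-1}x_k^{*}B_{nk0}^{-1}x_k} = \frac{1}{n}\beta_k x_k^{*}B_{nk0}^{-1}x_k.
\]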
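These identities are purely algebraic, so they can be checked with scalars. In the sketch below, `q_over_n` and `tr_over_n` are hypothetical stand-ins for $n^{-1}x_k^{*}B_{nk0}^{-1}x_k$ and $n^{-1}\operatorname{tr}\Sigma B_{nk0}^{-1}$. The third returned value substitutes the first identity into itself once, which is the form $\beta_{k0} - \beta_{k0}^{2}\gamma_k + \beta_k\beta_{k0}^{2}\gamma_k^{2}$ that appears in (4.42) and (4.43).

```python
def beta_expansion(q_over_n, tr_over_n):
    """Return beta_k and its one-step and two-step expansions around beta_k0."""
    beta_k = 1.0 / (1.0 + q_over_n)
    beta_k0 = 1.0 / (1.0 + tr_over_n)
    gamma_k = q_over_n - tr_over_n
    one_step = beta_k0 - beta_k * beta_k0 * gamma_k
    two_step = beta_k0 - beta_k0 ** 2 * gamma_k + beta_k * beta_k0 ** 2 * gamma_k ** 2
    return beta_k, one_step, two_step

beta_k, one_step, two_step = beta_expansion(q_over_n=0.83, tr_over_n=0.75)
print(beta_k, one_step, two_step)
# 1 - beta_k equals beta_k times the normalized quadratic form
print(1.0 - beta_k, beta_k * 0.83)
```

All three values agree up to rounding. The point of the expansion is that the random $\beta_k$ is traded for the non-random-in-$x_k$ quantity $\beta_{k0}$ plus terms carrying powers of the small fluctuation $\gamma_k$.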

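By Lemmas 4.1–4.4, and because $\beta_{k0}$ is bounded, $B_{nk0}^{-1}\Sigma B_{nk0}^{-1}$ is bounded in spectral norm, and $\frac{1}{n}\operatorname{tr}(\Sigma B_{nk0}^{-1})^{2} = O_{L_2}(1)$, we get
\[
\begin{aligned}
(4.37)
={}& n^{-1/2}\beta_k\bigl[\bigl(X_{k0}^{*}B_{nk0}^{-1}x_k + x_k^{*}B_{nk0}^{-1}X_{k0}\bigr)\bigl(1 + X_{k0}^{*}B_{nk0}^{-1}X_{k0}\bigr) - |x_k^{*}B_{nk0}^{-1}X_{k0}|^{2}\bigr] \\
={}& -n^{-1/2}\beta_{k0}X_{k0}^{*}B_{nk0}^{-1}\Sigma B_{nk0}^{-1}X_{k0} \\
&+ n^{-1/2}\beta_{k0}\bigl(X_{k0}^{*}B_{nk0}^{-1}x_k + x_k^{*}B_{nk0}^{-1}X_{k0}\bigr)\bigl(1 + X_{k0}^{*}B_{nk0}^{-1}X_{k0}\bigr) \\
&- n^{-1/2}\beta_{k0}X_{k0}^{*}B_{nk0}^{-1}\bigl(x_k x_k^{*} - \Sigma\bigr)B_{nk0}^{-1}X_{k0} \\
&- n^{-1/2}\beta_{k0}^{2}\gamma_k\bigl[\bigl(X_{k0}^{*}B_{nk0}^{-1}x_k + x_k^{*}B_{nk0}^{-1}X_{k0}\bigr)\bigl(1 + X_{k0}^{*}B_{nk0}^{-1}X_{k0}\bigr) - |x_k^{*}B_{nk0}^{-1}X_{k0}|^{2}\bigr] \\
&+ n^{-1/2}\beta_k\beta_{k0}^{2}\gamma_k^{2}\bigl[\bigl(X_{k0}^{*}B_{nk0}^{-1}x_k + x_k^{*}B_{nk0}^{-1}X_{k0}\bigr)\bigl(1 + X_{k0}^{*}B_{nk0}^{-1}X_{k0}\bigr) - |x_k^{*}B_{nk0}^{-1}X_{k0}|^{2}\bigr],
\end{aligned}
\tag{4.41}
\]
\[
\begin{aligned}
(4.38)
={}& n^{-3/2}\beta_k x_k^{*}B_{nk0}^{-1}x_k\bigl(1 + X_{k0}^{*}B_{nk0}^{-1}X_{k0}\bigr)^{2} \\
={}& n^{-1/2}(1 - \beta_k)\bigl(1 + X_{k0}^{*}B_{nk0}^{-1}X_{k0}\bigr)^{2} \\
={}& n^{-1/2}\bigl(1 - \beta_{k0} + \beta_{k0}^{2}\gamma_k - \beta_k\beta_{k0}^{2}\gamma_k^{2}\bigr)\bigl(1 + X_{k0}^{*}B_{nk0}^{-1}X_{k0}\bigr)^{2} \\
={}& n^{-1/2}(1 - \beta_{k0})\bigl(1 + X_{k0}^{*}B_{nk0}^{-1}X_{k0}\bigr)^{2} + n^{-1/2}\beta_{k0}^{2}\gamma_k\bigl(1 + X_{k0}^{*}B_{nk0}^{-1}X_{k0}\bigr)^{2} \\
&- n^{-1/2}\beta_k\beta_{k0}^{2}\gamma_k^{2}\bigl(1 + X_{k0}^{*}B_{nk0}^{-1}X_{k0}\bigr)^{2},
\end{aligned}
\tag{4.42}
\]
and
\[
\begin{aligned}
(4.39)
={}& n^{-1/2}\beta_k\varepsilon_k + n^{-1/2}\beta_k\Bigl(\frac{1}{n}\operatorname{tr}(\Sigma B_{nk0}^{-1})^{2}\Bigr) \\
={}& n^{-1/2}\bigl(\beta_{k0} - \beta_{k0}^{2}\gamma_k + \beta_k\beta_{k0}^{2}\gamma_k^{2}\bigr)\varepsilon_k
 + n^{-1/2}\bigl(\beta_{k0} - \beta_{k0}^{2}\gamma_k + \beta_k\beta_{k0}^{2}\gamma_k^{2}\bigr)\Bigl(\frac{1}{n}\operatorname{tr}(\Sigma B_{nk0}^{-1})^{2}\Bigr) \\
={}& n^{-1/2}\beta_{k0}\Bigl(\frac{1}{n}\operatorname{tr}(\Sigma B_{nk0}^{-1})^{2}\Bigr) + n^{-1/2}\beta_{k0}\varepsilon_k
 - n^{-1/2}\beta_{k0}^{2}\gamma_k\Bigl(\frac{1}{n}\operatorname{tr}(\Sigma B_{nk0}^{-1})^{2}\Bigr) \\
&- n^{-1/2}\bigl(\beta_{k0}^{2}\gamma_k - \beta_k\beta_{k0}^{2}\gamma_k^{2}\bigr)\varepsilon_k
 + n^{-1/2}\beta_k\beta_{k0}^{2}\gamma_k^{2}\Bigl(\frac{1}{n}\operatorname{tr}(\Sigma B_{nk0}^{-1})^{2}\Bigr),
\end{aligned}
\tag{4.43}
\]
where
\[
n^{-1/2}\bigl(\beta_{k0}^{2}\gamma_k - \beta_k\beta_{k0}^{2}\gamma_k^{2}\bigr)\varepsilon_k
 - n^{-1/2}\beta_k\beta_{k0}^{2}\gamma_k^{2}\Bigl(\frac{1}{n}\operatorname{tr}(\Sigma B_{nk0}^{-1})^{2}\Bigr) = O_{L_2}(n^{-3/2}).
\]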

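From (4.37)–(4.43), we get
\[
g(X_k) - g(X_{k0}) := J_{k1x} + J_{k2x} + J_{k3x} + J_{k4x} + J_{k5x}, \tag{4.44}
\]
where
\[
\begin{aligned}
J_{k1x} ={}& -n^{-1/2}\beta_{k0}X_{k0}^{*}B_{nk0}^{-1}\Sigma B_{nk0}^{-1}X_{k0}
 + n^{-1/2}(1 - \beta_{k0})\bigl(1 + X_{k0}^{*}B_{nk0}^{-1}X_{k0}\bigr)^{2} \\
&+ n^{-1/2}\beta_{k0}\Bigl(\frac{1}{n}\operatorname{tr}(\Sigma B_{nk0}^{-1})^{2}\Bigr), \\
J_{k2x} ={}& n^{-1/2}\beta_{k0}\bigl(X_{k0}^{*}B_{nk0}^{-1}X_{k0} + 1\bigr)\bigl(X_{k0}^{*}B_{nk0}^{-1}x_k + x_k^{*}B_{nk0}^{-1}X_{k0}\bigr) \\
&- n^{-1/2}\beta_{k0}X_{k0}^{*}B_{nk0}^{-1}\bigl(x_k x_k^{*} - \Sigma\bigr)B_{nk0}^{-1}X_{k0}
 + n^{-1/2}\beta_{k0}^{2}\gamma_k\bigl(1 + X_{k0}^{*}B_{nk0}^{-1}X_{k0}\bigr)^{2} \\
&+ n^{-1/2}\beta_{k0}\varepsilon_k - n^{-1/2}\beta_{k0}^{2}\gamma_k\Bigl(\frac{1}{n}\operatorname{tr}(\Sigma B_{nk0}^{-1})^{2}\Bigr), \\
J_{k3x} ={}& -n^{-1/2}\beta_{k0}^{2}\gamma_k\bigl[\bigl(X_{k0}^{*}B_{nk0}^{-1}x_k + x_k^{*}B_{nk0}^{-1}X_{k0}\bigr)\bigl(1 + X_{k0}^{*}B_{nk0}^{-1}X_{k0}\bigr) - |x_k^{*}B_{nk0}^{-1}X_{k0}|^{2}\bigr], \\
J_{k4x} ={}& n^{-1/2}\beta_k\beta_{k0}^{2}\gamma_k^{2}\bigl[\bigl(X_{k0}^{*}B_{nk0}^{-1}x_k + x_k^{*}B_{nk0}^{-1}X_{k0}\bigr)\bigl(1 + X_{k0}^{*}B_{nk0}^{-1}X_{k0}\bigr) - |x_k^{*}B_{nk0}^{-1}X_{k0}|^{2}\bigr] \\
&- n^{-1/2}\beta_k\beta_{k0}^{2}\gamma_k^{2}\bigl(1 + X_{k0}^{*}B_{nk0}^{-1}X_{k0}\bigr)^{2},
\end{aligned}
\]
and
\[
J_{k5x} = O_{L_2}(n^{-3/2}).
\]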

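By similar arguments, one can prove that
\[
g(X_{k-1}) - g(X_{k0}) = J_{k1} + J_{k2} + J_{k3} + J_{k4} + J_{k5}, \tag{4.45}
\]
where $\{J_{ki},\ i = 2, \ldots, 5\}$ are defined similarly to $J_{kix}$, with $x_k$ replaced by $y_k$, and $J_{k1x} = J_{k1}$. From (4.17) and (4.45), we have
\[
\begin{aligned}
Ee^{itg(X)} - Ee^{itg(Y)}
= \sum_{k=1}^{n} Ee^{it(g(X_{k0}) + J_{k1})}\bigl(&e^{it(J_{k2x} + J_{k3x} + J_{k4x} + J_{k5x})} \\
&- e^{it(J_{k2} + J_{k3} + J_{k4} + J_{k5})}\bigr).
\end{aligned}
\tag{4.46}
\]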

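For $J_{k4x}$, we have
\[
\begin{aligned}
E|J_{k4x}| \le{}& E\bigl|n^{-1/2}\beta_k\beta_{k0}^{2}\gamma_k^{2}X_{k0}^{*}B_{nk0}^{-1}X_{k0}\,X_{k0}^{*}B_{nk0}^{-1}x_k\bigr|
 + E\bigl|n^{-1/2}\beta_k\beta_{k0}^{2}\gamma_k^{2}X_{k0}^{*}B_{nk0}^{-1}x_k\bigr| \\
&+ E\bigl|n^{-1/2}\beta_k\beta_{k0}^{2}\gamma_k^{2}X_{k0}^{*}B_{nk0}^{-1}X_{k0}\,x_k^{*}B_{nk0}^{-1}X_{k0}\bigr|
 + E\bigl|n^{-1/2}\beta_k\beta_{k0}^{2}\gamma_k^{2}x_k^{*}B_{nk0}^{-1}X_{k0}\bigr| \\
&+ E\bigl|n^{-1/2}\beta_k\beta_{k0}^{2}\gamma_k^{2}\bigl(X_{k0}^{*}B_{nk0}^{-1}x_k\bigr)^{2}\bigr|
 + E\bigl|n^{-1/2}\beta_k\beta_{k0}^{2}\gamma_k^{2}\bigl(X_{k0}^{*}B_{nk0}^{-1}X_{k0}\bigr)^{2}\bigr| \\
&+ E\bigl|2n^{-1/2}\beta_k\beta_{k0}^{2}\gamma_k^{2}X_{k0}^{*}B_{nk0}^{-1}X_{k0}\bigr|
 + E\bigl|n^{-1/2}\beta_k\beta_{k0}^{2}\gamma_k^{2}\bigr|.
\end{aligned}
\]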

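Let $K$ be an upper bound of $\beta_k$ and $\beta_{k0}$. According to Lemmas 4.1–4.4, we get
\[
\begin{aligned}
E\bigl|n^{-1/2}\beta_k\beta_{k0}^{2}\gamma_k^{2}X_{k0}^{*}B_{nk0}^{-1}X_{k0}\,X_{k0}^{*}B_{nk0}^{-1}x_k\bigr|
&\le Kn^{-1/2}\sqrt{E|\gamma_k|^{4}\,E\bigl|X_{k0}^{*}B_{nk0}^{-1}X_{k0}\,X_{k0}^{*}B_{nk0}^{-1}x_k\bigr|^{2}} \\
&\le Kn^{-1/2}\sqrt{E|\gamma_k|^{4}\sqrt{E\bigl|X_{k0}^{*}B_{nk0}^{-1}X_{k0}\bigr|^{4}\,E\bigl|X_{k0}^{*}B_{nk0}^{-1}x_k\bigr|^{4}}} \\
&= o(n^{-1}).
\end{aligned}
\]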

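Similarly, the orders of
\[
\begin{gathered}
E\bigl|n^{-1/2}\beta_k\beta_{k0}^{2}\gamma_k^{2}X_{k0}^{*}B_{nk0}^{-1}x_k\bigr|, \qquad
E\bigl|n^{-1/2}\beta_k\beta_{k0}^{2}\gamma_k^{2}X_{k0}^{*}B_{nk0}^{-1}X_{k0}\,x_k^{*}B_{nk0}^{-1}X_{k0}\bigr|, \\
E\bigl|n^{-1/2}\beta_k\beta_{k0}^{2}\gamma_k^{2}x_k^{*}B_{nk0}^{-1}X_{k0}\bigr|, \qquad
E\bigl|n^{-1/2}\beta_k\beta_{k0}^{2}\gamma_k^{2}\bigl(X_{k0}^{*}B_{nk0}^{-1}x_k\bigr)^{2}\bigr|, \\
E\bigl|n^{-1/2}\beta_k\beta_{k0}^{2}\gamma_k^{2}\bigl(X_{k0}^{*}B_{nk0}^{-1}X_{k0}\bigr)^{2}\bigr|, \qquad
E\bigl|2n^{-1/2}\beta_k\beta_{k0}^{2}\gamma_k^{2}X_{k0}^{*}B_{nk0}^{-1}X_{k0}\bigr|,
\end{gathered}
\]
and $E|n^{-1/2}\beta_k\beta_{k0}^{2}\gamma_k^{2}|$ are all $o(n^{-1})$. In addition, $J_{k4}$ has the same order as $J_{k4x}$.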

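By the inequality $|e^{ia} - 1| \le |a|$ for any $a \in \mathbb{R}$, $E|J_{k4x}| = o(n^{-1})$, and $E|J_{k4}| = o(n^{-1})$, we have
\[
\begin{aligned}
\bigl|Ee^{it(g(X_{k0}) + J_{k1})}e^{it(J_{k2x} + J_{k3x})}\bigl(e^{it(J_{k4x} + J_{k5x})} - 1\bigr)\bigr|
&\le E\bigl|e^{it(J_{k4x} + J_{k5x})} - 1\bigr| \\
&\le E|t(J_{k4x} + J_{k5x})| = o(n^{-1}),
\end{aligned}
\tag{4.47}
\]
and similarly,
\[
\bigl|Ee^{it(g(X_{k0}) + J_{k1})}e^{it(J_{k2} + J_{k3})}\bigl(e^{it(J_{k4} + J_{k5})} - 1\bigr)\bigr| = o(n^{-1}).
\tag{4.48}
\]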
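The elementary inequality $|e^{ia} - 1| \le |a|$ used in (4.47) and (4.48) follows from $|e^{ia} - 1| = 2|\sin(a/2)|$. A quick numerical illustration on a grid of real values (the grid and the helper name are arbitrary choices, not part of the proof):

```python
import cmath
import math

def unit_circle_gap(a):
    """|e^{ia} - 1|, the chord length on the unit circle for angle a."""
    return abs(cmath.exp(1j * a) - 1.0)

grid = [k / 10.0 for k in range(-100, 101)]

# Largest value of |e^{ia} - 1| - |a| over the grid; it should never be positive
worst = max(unit_circle_gap(a) - abs(a) for a in grid)
print(worst)

# The chord length coincides with 2|sin(a/2)|
print(max(abs(unit_circle_gap(a) - 2.0 * abs(math.sin(a / 2.0))) for a in grid))
```

In the proof this bound is what allows the factor containing $J_{k4x} + J_{k5x}$ to be dropped at a cost of $o(n^{-1})$ per summand.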

Next, we show that
\[
\sum_{k=1}^{n} Ee^{it(g(X_{k0}) + J_{k1})}\bigl(e^{it(J_{k2x} + J_{k3x})} - e^{it(J_{k2} + J_{k3})}\bigr) \to 0.
\tag{4.49}
\]

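Applying the Taylor expansion, we have
\[
\begin{aligned}
&\sum_{k=1}^{n} Ee^{it(g(X_{k0}) + J_{k1})}\bigl(e^{it(J_{k2x} + J_{k3x})} - e^{it(J_{k2} + J_{k3})}\bigr) \\
&\quad\le \sum_{k=1}^{n} E\Bigl|1 + it(J_{k2x} + J_{k3x}) - \frac{1}{2}t^{2}(J_{k2x} + J_{k3x})^{2} \\
&\qquad\qquad - 1 - it(J_{k2} + J_{k3}) + \frac{1}{2}t^{2}(J_{k2} + J_{k3})^{2} + o_{L_2}(n^{-1})\Bigr|,
\end{aligned}
\tag{4.50}
\]
where $\zeta_n = o_{L_2}(a_n)$ means $\sqrt{E|a_n^{-1}\zeta_n|^{2}} \to 0$. Notice that
\[
E_kJ_{k2x} = E_kJ_{k2} = 0, \qquad E_kJ_{k2x}^{2} = E_kJ_{k2}^{2}, \qquad E_kJ_{k3x} = E_kJ_{k3},
\tag{4.51}
\]
where $E_k$ denotes the conditional expectation with respect to the $\sigma$-field generated by the random variables $\{x_1, \ldots, x_{k-1}, y_{k+1}, \ldots, y_n\}$. Then
\[
E\Bigl(it|E_k(J_{k2x} - J_{k2})| + it|E_k(J_{k3x} - J_{k3})| + \frac{t^{2}}{2}|E_k(J_{k2x}^{2} - J_{k2}^{2})|\Bigr) = 0.
\]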

In addition, we have
\[
\begin{aligned}
&\sum_{k=1}^{n} E\Bigl|1 + it(J_{k2x} + J_{k3x}) - \frac{1}{2}t^{2}(J_{k2x} + J_{k3x})^{2} - 1 - it(J_{k2} + J_{k3}) \\
&\qquad\qquad + \frac{1}{2}t^{2}(J_{k2} + J_{k3})^{2} + o_{L_2}(n^{-1})\Bigr| \\
&\quad\le \sum_{k=1}^{n} E\Bigl(it|E_k(J_{k2x} - J_{k2})| + it|E_k(J_{k3x} - J_{k3})| + \frac{t^{2}}{2}|E_k(J_{k2x}^{2} - J_{k2}^{2})|\Bigr) \\
&\qquad + \sum_{k=1}^{n} E\Bigl(\frac{t^{2}}{2}|J_{k3x}^{2} - J_{k3}^{2}| + t^{2}|J_{k2x}J_{k3x} - J_{k2}J_{k3}| + o_{L_2}(n^{-1})\Bigr).
\end{aligned}
\]
Therefore, we obtain
\[
(4.50) \le \sum_{k=1}^{n} E\Bigl(\frac{t^{2}}{2}|J_{k3x}^{2} - J_{k3}^{2}| + t^{2}|J_{k2x}J_{k3x} - J_{k2}J_{k3}| + o_{L_2}(n^{-1})\Bigr).
\tag{4.52}
\]

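Obviously, for $J_{k2x}^{2}$, we have
\[
\begin{aligned}
E|J_{k2x}|^{2}
\le{}& E\bigl|n^{-1/2}\beta_{k0}X_{k0}^{*}B_{nk0}^{-1}X_{k0}\,X_{k0}^{*}B_{nk0}^{-1}x_k\bigr|^{2}
 + E\bigl|n^{-1/2}\beta_{k0}X_{k0}^{*}B_{nk0}^{-1}x_k\bigr|^{2} \\
&+ E\bigl|n^{-1/2}\beta_{k0}X_{k0}^{*}B_{nk0}^{-1}X_{k0}\,x_k^{*}B_{nk0}^{-1}X_{k0}\bigr|^{2}
 + E\bigl|n^{-1/2}\beta_{k0}x_k^{*}B_{nk0}^{-1}X_{k0}\bigr|^{2} \\
&+ E\bigl|n^{-1/2}\beta_{k0}\bigl(X_{k0}^{*}B_{nk0}^{-1}x_k\bigr)^{2}\bigr|^{2}
 + E\bigl|n^{-1/2}\beta_{k0}X_{k0}^{*}B_{nk0}^{-1}\Sigma B_{nk0}^{-1}X_{k0}\bigr|^{2} \\
&+ E\bigl|n^{-1/2}\beta_{k0}^{2}\gamma_k\bigl(X_{k0}^{*}B_{nk0}^{-1}X_{k0}\bigr)^{2}\bigr|^{2}
 + E\bigl|2n^{-1/2}\beta_{k0}^{2}\gamma_kX_{k0}^{*}B_{nk0}^{-1}X_{k0}\bigr|^{2} \\
&+ E\bigl|n^{-1/2}\beta_{k0}^{2}\gamma_k\bigr|^{2} + E\bigl|n^{-1/2}\beta_{k0}\varepsilon_k\bigr|^{2}
 + E\Bigl|n^{-1/2}\beta_{k0}^{2}\gamma_k\Bigl(\frac{1}{n}\operatorname{tr}(\Sigma B_{nk0}^{-1})^{2}\Bigr)\Bigr|^{2}.
\end{aligned}
\]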

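According to Lemmas 4.1–4.4, we get
\[
\begin{aligned}
E\bigl|n^{-1/2}\beta_{k0}X_{k0}^{*}B_{nk0}^{-1}X_{k0}\,X_{k0}^{*}B_{nk0}^{-1}x_k\bigr|^{2}
&\le Kn^{-1}E\bigl|X_{k0}^{*}B_{nk0}^{-1}X_{k0}\bigr|^{2}\bigl|X_{k0}^{*}B_{nk0}^{-1}x_k\bigr|^{2} \\
&\le Kn^{-1}\sqrt{E\bigl|X_{k0}^{*}B_{nk0}^{-1}X_{k0}\bigr|^{4}\,E\bigl|X_{k0}^{*}B_{nk0}^{-1}x_k\bigr|^{4}} \\
&= O(n^{-1}).
\end{aligned}
\]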

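Similarly,
\[
\begin{gathered}
E\bigl|n^{-1/2}\beta_{k0}X_{k0}^{*}B_{nk0}^{-1}x_k\bigr|^{2}, \qquad
E\bigl|n^{-1/2}\beta_{k0}X_{k0}^{*}B_{nk0}^{-1}X_{k0}\,x_k^{*}B_{nk0}^{-1}X_{k0}\bigr|^{2}, \\
E\bigl|n^{-1/2}\beta_{k0}x_k^{*}B_{nk0}^{-1}X_{k0}\bigr|^{2}, \qquad
E\bigl|n^{-1/2}\beta_{k0}\bigl(X_{k0}^{*}B_{nk0}^{-1}x_k\bigr)^{2}\bigr|^{2}
\end{gathered}
\]
and $E|n^{-1/2}\beta_{k0}X_{k0}^{*}B_{nk0}^{-1}\Sigma B_{nk0}^{-1}X_{k0}|^{2}$ are all $O(n^{-1})$. Also, we have
\[
\begin{aligned}
E\bigl|n^{-1/2}\beta_{k0}^{2}\gamma_k\bigl(X_{k0}^{*}B_{nk0}^{-1}X_{k0}\bigr)^{2}\bigr|^{2}
&\le Kn^{-1}E|\gamma_k|^{2}\bigl|X_{k0}^{*}B_{nk0}^{-1}X_{k0}\bigr|^{4} \\
&\le Kn^{-1}\sqrt{E|\gamma_k|^{4}\,E\bigl|X_{k0}^{*}B_{nk0}^{-1}X_{k0}\bigr|^{8}} \\
&= o(n^{-3/2})
\end{aligned}
\]
and
\[
E\bigl|2n^{-1/2}\beta_{k0}^{2}\gamma_kX_{k0}^{*}B_{nk0}^{-1}X_{k0}\bigr|^{2} = o(n^{-3/2}).
\]
According to Lemma 4.1,
\[
E\bigl|n^{-1/2}\beta_{k0}^{2}\gamma_k\bigr|^{2}, \qquad E\bigl|n^{-1/2}\beta_{k0}\varepsilon_k\bigr|^{2}, \qquad\text{and}\qquad
E\Bigl|n^{-1/2}\beta_{k0}^{2}\gamma_k\Bigl(\frac{1}{n}\operatorname{tr}(\Sigma B_{nk0}^{-1})^{2}\Bigr)\Bigr|^{2}
\]
are all $O(n^{-2})$. Thus, $E|J_{k2x}|^{2} = O(n^{-1})$. By the same argument, $E|J_{k2}|^{2} = O(n^{-1})$.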

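For $J_{k3x}^{2}$, we have
\[
\begin{aligned}
E|J_{k3x}|^{2} \le{}& E\bigl|n^{-1/2}\beta_{k0}^{2}\gamma_kX_{k0}^{*}B_{nk0}^{-1}X_{k0}\,X_{k0}^{*}B_{nk0}^{-1}x_k\bigr|^{2} \\
&+ E\bigl|n^{-1/2}\beta_{k0}^{2}\gamma_kX_{k0}^{*}B_{nk0}^{-1}x_k\bigr|^{2} \\
&+ E\bigl|n^{-1/2}\beta_{k0}^{2}\gamma_kX_{k0}^{*}B_{nk0}^{-1}X_{k0}\,x_k^{*}B_{nk0}^{-1}X_{k0}\bigr|^{2} \\
&+ E\bigl|n^{-1/2}\beta_{k0}^{2}\gamma_kx_k^{*}B_{nk0}^{-1}X_{k0}\bigr|^{2} \\
&+ E\bigl|n^{-1/2}\beta_{k0}^{2}\gamma_k\bigl(X_{k0}^{*}B_{nk0}^{-1}x_k\bigr)^{2}\bigr|^{2}.
\end{aligned}
\]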

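Similarly,
\[
\begin{aligned}
E\bigl|n^{-1/2}\beta_{k0}^{2}\gamma_kX_{k0}^{*}B_{nk0}^{-1}X_{k0}\,X_{k0}^{*}B_{nk0}^{-1}x_k\bigr|^{2}
&\le Kn^{-1}\sqrt{E|\gamma_k|^{4}\,E\bigl|X_{k0}^{*}B_{nk0}^{-1}X_{k0}\bigr|^{4}\bigl|X_{k0}^{*}B_{nk0}^{-1}x_k\bigr|^{4}} \\
&\le Kn^{-1}\sqrt{E|\gamma_k|^{4}\sqrt{E\bigl|X_{k0}^{*}B_{nk0}^{-1}X_{k0}\bigr|^{8}\,E\bigl|X_{k0}^{*}B_{nk0}^{-1}x_k\bigr|^{8}}} \\
&= o(n^{-3/2}).
\end{aligned}
\]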

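The orders of
\[
\begin{gathered}
E\bigl|n^{-1/2}\beta_{k0}^{2}\gamma_kX_{k0}^{*}B_{nk0}^{-1}x_k\bigr|^{2}, \qquad
E\bigl|n^{-1/2}\beta_{k0}^{2}\gamma_kX_{k0}^{*}B_{nk0}^{-1}X_{k0}\,x_k^{*}B_{nk0}^{-1}X_{k0}\bigr|^{2}, \\
E\bigl|n^{-1/2}\beta_{k0}^{2}\gamma_kx_k^{*}B_{nk0}^{-1}X_{k0}\bigr|^{2}, \qquad\text{and}\qquad
E\bigl|n^{-1/2}\beta_{k0}^{2}\gamma_k\bigl(X_{k0}^{*}B_{nk0}^{-1}x_k\bigr)^{2}\bigr|^{2}
\end{gathered}
\]
are $o(n^{-3/2})$. Hence $E|J_{k3x}|^{2} = o(n^{-3/2})$, and by the same argument the order of $E|J_{k3}|^{2}$ is $o(n^{-3/2})$.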

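Finally, we get
\[
E|J_{k3x}^{2} - J_{k3}^{2}| \le E|J_{k3x}|^{2} + E|J_{k3}|^{2} = o(n^{-3/2})
\]
and
\[
\begin{aligned}
E|J_{k2x}J_{k3x} - J_{k2}J_{k3}| &\le E|J_{k2x}J_{k3x}| + E|J_{k2}J_{k3}| \\
&\le \sqrt{E|J_{k2x}|^{2}\,E|J_{k3x}|^{2}} + \sqrt{E|J_{k2}|^{2}\,E|J_{k3}|^{2}} \\
&= o(n^{-5/4}),
\end{aligned}
\]
which gives the result in (4.49), and the proof is complete.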
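For the reader's convenience, the order bookkeeping behind this last step can be written out as follows. This only restates the bounds obtained above for fixed $t$, under the assumption (used implicitly in the argument) that the $o(\cdot)$ and $O(\cdot)$ bounds hold uniformly in $k$.

```latex
\begin{align*}
E|J_{k2x}J_{k3x}| &\le \sqrt{E|J_{k2x}|^{2}\,E|J_{k3x}|^{2}}
  = \sqrt{O(n^{-1})\,o(n^{-3/2})} = o(n^{-5/4}), \\
\sum_{k=1}^{n} E\Bigl(\tfrac{t^{2}}{2}|J_{k3x}^{2} - J_{k3}^{2}|
  + t^{2}|J_{k2x}J_{k3x} - J_{k2}J_{k3}| + o_{L_2}(n^{-1})\Bigr)
  &\le n\Bigl(\tfrac{t^{2}}{2}\,o(n^{-3/2}) + t^{2}\,o(n^{-5/4}) + o(n^{-1})\Bigr) \\
  &= o(n^{-1/2}) + o(n^{-1/4}) + o(1) \to 0.
\end{align*}
```

Combined with (4.47) and (4.48), whose $o(n^{-1})$ errors also sum to $o(1)$ over $k = 1, \ldots, n$, this shows that the right-hand side of (4.46) vanishes in the limit.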


4.3. Proof of Theorem 2.3

Proof. The conclusions of Theorem 2.3 can be obtained by arguments similar to those in Sec. 3 of Bai and Zhou [7], with a simple improvement of the order from $o(1)$ to $o(1/\sqrt{n})$. For example, the first conclusion (2.5) can be derived from the result above Eq. (3.14) in Bai and Zhou [7], substituting $Em_n(z)$ for $m_n(z)$. The second conclusion (2.6) can be proved by the identity above Eq. (3.3) in Bai and Zhou [7]. Thus, we omit the details.

Acknowledgments

G.-F. Ha and Z.D. Bai were partially supported by NSFC China, Grant 11571067. Q.Y. Zhang was partially supported by Capital University of Economics and Business: The Fundamental Research Funds for Beijing Universities, Grant XRZ2021044, and NSFC China, Grants 11971097, 11771073, and 11571067. Y.-G. Wang was supported by the ARC Center of Excellence for Mathematical and Statistical Frontiers and the Australian Research Council Discovery Project DP160104292.

References

[1] T. Anderson, An Introduction to Multivariate Statistical Analysis, 3rd edn. (Wiley, New York, 2003).
[2] Z. D. Bai and H. Saranadasa, Effect of high dimension: By an example of a two sample problem, Stat. Sinica 6(2) (1996) 311–329.
[3] Z. D. Bai and J. W. Silverstein, No eigenvalues outside the support of the limiting spectral distribution of large-dimensional sample covariance matrices, Ann. Probab. 26(1) (1998) 316–345.
[4] Z. D. Bai and J. W. Silverstein, Exact separation of eigenvalues of large dimensional sample covariance matrices, Ann. Probab. 27(3) (1999) 1536–1555.
[5] Z. D. Bai and J. W. Silverstein, CLT for linear spectral statistics of large-dimensional sample covariance matrices, Ann. Probab. 32(1A) (2004) 553–605.
[6] Z. D. Bai and J. W. Silverstein, Spectral Analysis of Large Dimensional Random Matrices, 2nd edn. (Springer, New York, 2010).
[7] Z. D. Bai and W. Zhou, Large sample covariance matrices without independence structures in columns, Stat. Sinica 18(2) (2008) 425–442.
[8] C. Berge, Principles of Combinatorics (Academic Press, New York, 1971).
[9] L. S. Chen, D. Paul, R. L. Prentice and P. Wang, A regularized Hotelling's $T^2$ test for pathway analysis in proteomic studies, J. Am. Stat. Assoc. 106(496) (2011) 1345–1360.
[10] S. X. Chen and Y. L. Qin, A two-sample test for high-dimensional data with applications to gene-set testing, Ann. Stat. 38(2) (2010) 808–835.
[11] H. Hotelling, The generalization of Student's ratio, Ann. Math. Stat. 2 (1931) 360–378.
[12] D. D. Jiang and Z. D. Bai, Generalized four moment theorem and an application to CLT for spiked eigenvalues of large-dimensional covariance matrices, Bernoulli 27(1) (2021) 274–294.
[13] J. W. Silverstein, Strong convergence of the empirical distribution of eigenvalues of large dimensional random matrices, J. Multivariate Anal. 55(2) (1995) 331–339.
[14] M. S. Srivastava, Multivariate theory for analyzing high dimensional data, J. Japan Stat. Soc. 37 (2007) 53–86.
[15] M. S. Srivastava and M. Du, A test for the mean vector with fewer observations than the dimension, J. Multivariate Anal. 99(3) (2008) 386–402.
[16] T. Tao and V. Vu, Random matrices: Universality of local statistics of eigenvalues, Ann. Probab. 40(3) (2012) 1285–1315.
[17] Y. Q. Yin, Limiting spectral distribution for a class of random matrices, J. Multivariate Anal. 20(1) (1986) 50–68.
[18] Y. Q. Yin and P. R. Krishnaiah, A limit-theorem for the eigenvalues of product of 2 random matrices, J. Multivariate Anal. 13(4) (1983) 489–507.
