Page Proof
May 19, 2021 16:16 WSPC/S2010-3263 RMTA 2250011
Random Matrices: Theory and Applications
2250011 (29 pages)
© World Scientific Publishing Company
DOI: 10.1142/S2010326322500113
Ridgelized Hotelling’s $T^2$ test on mean vectors
of large dimension
Gao-Fan Ha
KLASMOE & School of Mathematics and Statistics
Northeast Normal University
Changchun, Jilin, P. R. China
Qiuyan Zhang∗
School of Statistics
Capital University of Economics and Business
Beijing, P. R. China
Zhidong Bai
KLASMOE & School of Mathematics and Statistics
Northeast Normal University
Changchun, Jilin, P. R. China
You-Gan Wang
School of Mathematical Sciences
Queensland University of Technology
Brisbane, Queensland, Australia
Received 4 June 2020
Revised 16 March 2021
Accepted 25 March 2021
Published
In this paper, a ridgelized Hotelling’s $T^2$ test is developed for a hypothesis on a
large-dimensional mean vector under certain moment conditions. It generalizes the main result
of Chen et al. [A regularized Hotelling’s $T^2$ test for pathway analysis in proteomic studies,
J. Am. Stat. Assoc. 106(496) (2011) 1345–1360] by relaxing their Gaussian assumption.
This is achieved by establishing an exact four-moment theorem that is a simplified version
of Tao and Vu’s [Random matrices: universality of local statistics of eigenvalues, Ann.
Probab. 40(3) (2012) 1285–1315] work. Simulation results demonstrate the superiority
of the proposed test over the traditional Hotelling’s $T^2$ test and several of its extensions in
high-dimensional situations.
Keywords: Random matrices; Hotelling’s $T^2$ test; four-moment theorem; central limit
theorem.
Mathematics Subject Classification 2020: 15B52, 60B20
∗Corresponding author.
1. Introduction
Hypothesis testing concerning mean vectors is a fundamental problem in multivariate
statistical analysis, with a wide range of applications in fields such as biology,
criminology, and marketing. Let $\mathbf{x}$ be a random vector in $\mathbb{R}^p$ or $\mathbb{C}^p$, with mean
vector $\mu$ and covariance matrix $\Sigma_p$. The hypothesis on the mean vector is
$$H_0: \mu = \mu_0 \quad \text{vs.} \quad H_1: \mu \neq \mu_0, \tag{1.1}$$
where $\mu_0$ is a given location vector. In general, one may assume $\mu_0 = 0$; otherwise,
it can be subtracted directly from the observations. Hence, the hypothesis reduces to
$$H_0: \mu = 0 \quad \text{vs.} \quad H_1: \mu \neq 0. \tag{1.2}$$
Let $\mathbf{x}_1, \dots, \mathbf{x}_n$ be a sequence of independent and identically distributed (i.i.d.)
observations from the population $\mathbf{x}$. For the testing problem (1.2), the well-known
Hotelling’s $T^2$ [11] (HT) statistic is defined as
$$\mathrm{HT} = n\bar{\mathbf{X}}^* S_n^{-1} \bar{\mathbf{X}}, \tag{1.3}$$
where
$$\bar{\mathbf{X}} = \frac{1}{n}\sum_{j=1}^{n} \mathbf{x}_j, \qquad S_n = \frac{1}{n-1}\Big(\sum_{j=1}^{n} \mathbf{x}_j\mathbf{x}_j^* - n\bar{\mathbf{X}}\bar{\mathbf{X}}^*\Big), \tag{1.4}$$
and “$*$” denotes the conjugate transpose of a vector or matrix. The HT test is a
powerful tool for testing the mean vector, and it has several advantages over alternatives.
For example, it is invariant with respect to the group of affine transformations (see
[1, p. 174]); its exact distribution has been derived under Gaussian distributions,
and is known as Hotelling’s $T^2$ distribution; and it is a powerful test when the
sample size $n$ is sufficiently large compared with the population dimension $p$.
However, the HT test becomes invalid in high-dimensional situations where the
dimension $p$ is comparable to the sample size $n$. In particular, the test statistic is
undefined when $p > n-1$ because the sample covariance matrix $S_n$ is not invertible.
Even when $p \le n-1$, the test loses power if $p$ is close to $n$, as shown by Bai and
Saranadasa [2], who modified the HT test to handle this dimensionality effect
by removing the inverse matrix $S_n^{-1}$ from the statistic and establishing a new
central limit theorem (CLT) under both the null and alternative hypotheses. Their result
shows that the modified test has attractive power properties compared with
the original in high-dimensional frameworks. Chen and Qin [10] extended Bai and
Saranadasa’s [2] test to accommodate ultra-high-dimensional data. Srivastava [14]
suggested using the Moore–Penrose inverse of $S_n$ in the HT statistic when $p > n$.
Srivastava and Du [15] proposed another variant by removing all off-diagonal entries
of $S_n$ and retaining only its diagonal elements in the HT statistic.
The strategy of ridge regression provides a new way to relieve the effect of high
dimension, especially when $p > n$, in which case $S_n$ is not invertible. In ridge
regression, the coefficients of a linear model are estimated through a shrinkage-type
least-squares estimation. It is a more practical and reliable method, and is
superior to ordinary least-squares (OLS) when fitting ill-conditioned data. The main
idea is to deliberately introduce a small perturbation in OLS estimation to improve
its overall performance, at the sacrifice of unbiasedness.
Following the idea of ridge regression, this paper considers a ridgelized
Hotelling’s $T^2$ (RIHT) test statistic,
$$\mathrm{RIHT} = n\bar{\mathbf{X}}^*(S_n + aI)^{-1}\bar{\mathbf{X}}, \tag{1.5}$$
for the hypothesis in (1.2), where $I$ denotes the $p \times p$ identity matrix and $a > 0$ is
a scalar tuning parameter. Here, the product $aI$ is the perturbation that we add to
the HT statistic such that the matrix $S_n + aI$ is invertible. Note that our proposed
statistic RIHT is exactly the same as the regularized Hotelling’s $T^2$ statistic from
Chen et al. [9], whose asymptotic null distribution has been derived under real
Gaussian distributions. However, such a distributional requirement is too restrictive
for practical applications, which motivates the topic of this paper.
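To make the definition in (1.5) concrete, here is a minimal NumPy sketch of the RIHT statistic. The function name `riht_statistic` and the convention that observations are stored as the columns of a $p \times n$ array are assumptions made for illustration only; they are not taken from the paper.

```python
import numpy as np

def riht_statistic(X, a):
    """RIHT = n * xbar^* (S_n + a I)^{-1} xbar, following (1.4) and (1.5).

    X is a p x n array whose columns are the observations (layout assumption).
    """
    p, n = X.shape
    xbar = X.mean(axis=1)
    centered = X - xbar[:, None]
    # Sample covariance (1.4): sum_j x_j x_j^* - n xbar xbar^*, divided by n - 1
    S = centered @ centered.conj().T / (n - 1)
    solved = np.linalg.solve(S + a * np.eye(p), xbar)
    # np.vdot conjugates its first argument, giving xbar^* (S + aI)^{-1} xbar
    return float(np.real(n * np.vdot(xbar, solved)))

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 20))   # p = 50 > n = 20, so S_n itself is singular
stat = riht_statistic(X, a=1.0)
```

Solving the linear system rather than forming the inverse explicitly is a numerical choice of this sketch; the statistic is well defined here even though $p > n$, because $a > 0$ makes $S_n + aI$ positive definite.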
The main contribution of this paper is a universality property of the RIHT
test. It states that the CLT for the statistic in [9] does not depend on the details of
the population distribution, but is determined only by its first four moments. Such
universality is obtained by establishing an exact four-moment theorem (EFMT),
which is a simplified version of the classical four-moment theorem proposed by Tao
and Vu [16]. Our approach can also be applied to some other problems of
high-dimensional statistical inference.
The rest of this paper is organized as follows. Section 2 details our model
assumptions and presents the main results. Section 3 reports on simulations. All technical
proofs are given in Sec. 4.
2. Main Results
2.1. Model assumptions
Let $M$ be a $p \times p$ Hermitian matrix whose empirical spectral distribution (ESD)
function is defined as
$$H_p^M(x) = \frac{1}{p}\sum_{i=1}^{p} I(\lambda_i^M \le x),$$
where $\{\lambda_i^M\}$ are the $p$ eigenvalues of $M$, and $I(\cdot)$ denotes the indicator function.
If the ESD sequence $\{H_p^M\}$ has a weak limit when $p$ tends to infinity, the limit is
called a limiting spectral distribution (LSD). For a sequence of sample covariance
matrices, the asymptotic properties of their ESDs can be found in Bai and Silverstein [6],
Yin and Krishnaiah [18], Yin [17], Silverstein [13], and Bai and Silverstein
[3, 4].
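As an illustration of the ESD definition above, the following sketch evaluates $H_p^M$ at a few points for a small Hermitian matrix. The helper name `esd` is hypothetical and used only for this example.

```python
import numpy as np

def esd(M, x):
    """Empirical spectral distribution H_p^M evaluated at each point of x."""
    eigvals = np.linalg.eigvalsh(M)                 # real eigenvalues of Hermitian M
    x = np.atleast_1d(np.asarray(x, dtype=float))
    # Fraction of eigenvalues that are <= each evaluation point
    return (eigvals[None, :] <= x[:, None]).mean(axis=1)

M = np.diag([1.0, 2.0, 2.0, 5.0])
values = esd(M, [0.5, 1.5, 3.0, 6.0])
```

For this diagonal matrix the eigenvalues are 1, 2, 2 and 5, so the ESD is a step function that jumps by 1/4, 1/2 and 1/4 at those points.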
Let $\mathbf{X}_{p\times n} = (\mathbf{x}_1, \dots, \mathbf{x}_n)$ be the matrix of observations admitting the
independent components model
$$\mathbf{x}_j = T_p x_j + \mu, \tag{2.1}$$
where $\mu$ denotes the location vector, $T_p$ is a $p \times p$ transformation matrix with
$\mathrm{rank}(T_p) = p$, and $X_{p\times n} = (x_1, \dots, x_n)$ consists of $p \times n$ i.i.d. real or complex
standardized random variables. For simplicity of notation, we suppress the subscripts
of the matrices $\mathbf{X}_{p\times n}$ and $X_{p\times n}$ in the remainder of the paper.
Our main assumptions are listed below.
C1: $p, n \to \infty$ such that $c_n \triangleq p/n \to c \in (0, \infty)$;
C2: $\Sigma_p \triangleq T_pT_p^*$ is a $p \times p$ positive definite matrix;
C3: The ESD $H_p^{\Sigma_p}$ of $\Sigma_p$ converges to a proper probability measure $H$ as
$p \to \infty$;
C4: $\limsup_{p\to\infty} \|\Sigma_p\| < \infty$ and $\limsup_{p\to\infty} \|\Sigma_p^{-1}\| < \infty$, where $\|\cdot\|$ is the spectral
norm;
C5: $\mathrm{E}x_{11} = 0$, $\mathrm{E}|x_{11}|^2 = 1$, and $\mathrm{E}|x_{11}|^4 < \infty$. In addition, $\mathrm{E}x_{11}^2 = 0$ when $x_{11}$ is
complex-valued.
Note that condition C4 guarantees that $\Sigma_p$ is invertible for all $p$.
To obtain the universality of the CLT for the RIHT statistic, we adopt the idea
from Tao and Vu [16] that was used to prove the local semicircular law for Wigner
matrices under the four-moment matching condition. Another crucial assumption
in their paper is the so-called C0 condition:
C0: A random Hermitian matrix $A_n = (\zeta_{ij})_{1\le i,j\le n}$ satisfies the following:
1. The variables $\{\zeta_{ij}, 1 \le i \le j \le n\}$ are independent (but not necessarily
identically distributed) and have zero mean and unit variance;
2. (Uniform exponential decay) there exist two constants $C, C' > 0$ such that
$P(|\zeta_{ij}| \ge t^C) \le \exp(-t)$ for all $t \ge C'$ and $1 \le i, j \le n$.
This stringent condition was relaxed by Jiang and Bai [12] to prove the
universality of the asymptotic law for the local spectral statistics under a spiked Fisher
matrix model. Inspired by this, we modify Jiang and Bai’s [12] general four-moment
theorem to deal with our testing problem. To this end, we define
$$Y = (y_1, \dots, y_n) = (y_{ij})_{1\le i\le p,\,1\le j\le n} \quad \text{and} \quad \mathbf{Y} = (\mathbf{y}_1, \dots, \mathbf{y}_n) = (\mathbf{y}_{ij})_{1\le i\le p,\,1\le j\le n},$$
which satisfy the same structure as in (2.1), i.e.,
$$\mathbf{y}_j = T_p y_j + \mu,$$
where $\{y_{ij}\}$ are i.i.d. standardized random variables, independent of $X$. The
connection between the matrices $X$ and $Y$ is the following assumption:
C6: The moments of $y_{ij}$ match those of $x_{ij}$ up to the fourth order, i.e.
$$\mathrm{E}\,\Re(y_{ij})^{\alpha}\Im(y_{ij})^{\beta} = \mathrm{E}\,\Re(x_{ij})^{\alpha}\Im(x_{ij})^{\beta},$$
for $\alpha, \beta \ge 0$ such that $\alpha + \beta \le 4$.
2.2. Main results
Denote the sample covariance matrices of $\{\mathbf{x}_i\}$ and $\{\mathbf{y}_i\}$ as $S_n^x$ and $S_n^y$, respectively.
Our EFMT for the RIHT statistic is presented in the following theorem.
Theorem 2.1 (EFMT for the RIHT statistic). Suppose that conditions
C1–C6 hold. For any $a > 0$,
$$g(\mathbf{X}) = \sqrt{n}\Big(\bar{\mathbf{X}}^*(S_n^x + aI)^{-1}\bar{\mathbf{X}} - \frac{1}{n}\mathrm{tr}\,(S_n^x + aI)^{-1}\Sigma_p\Big) \tag{2.2}$$
and
$$g(\mathbf{Y}) = \sqrt{n}\Big(\bar{\mathbf{Y}}^*(S_n^y + aI)^{-1}\bar{\mathbf{Y}} - \frac{1}{n}\mathrm{tr}\,(S_n^y + aI)^{-1}\Sigma_p\Big) \tag{2.3}$$
have the same limiting distribution if one of them does.
According to Theorem 2.1, when the first four moments of $x_{ij}$ match those of
the standard Gaussian distribution, the limiting distribution of $g(\mathbf{X})$ will be the
same as if $\mathbf{X}$ comes from a Gaussian distribution. By this and the main conclusions
in [9], the following CLT for the RIHT statistic is then obtained directly.
Theorem 2.2 (CLT of the RIHT statistic). Suppose that $\{x_{ij}\}$ and $\{\mathbf{x}_{ij}\}$ are
real random variables satisfying conditions C1–C5, with $\mathrm{E}x_{11}^3 = 0$ and $\mathrm{E}x_{11}^4 = 3$.
Under $H_0$ in (1.2), we have
$$\frac{g(\mathbf{X})}{\sqrt{\frac{2c_n}{p}\mathrm{tr}\big((S_n^x + aI)^{-1}\Sigma_p(S_n^x + aI)^{-1}\Sigma_p\big)}} \Rightarrow N(0, 1), \tag{2.4}$$
where “$\Rightarrow$” denotes convergence in distribution.
If $\{x_{ij}\}$ and $\{\mathbf{x}_{ij}\}$ are complex random variables satisfying conditions C1–C5,
and their third and fourth moments match those of the standard complex normal distribution,
then under $H_0$ in (1.2), the conclusion (2.4) becomes
$$\frac{g(\mathbf{X})}{\sqrt{\frac{c_n}{p}\mathrm{tr}\big((S_n^x + aI)^{-1}\Sigma_p(S_n^x + aI)^{-1}\Sigma_p\big)}} \Rightarrow N(0, 1).$$
When applying Theorem 2.2 to the location test, we need to first estimate the
centering and scaling terms, which both involve the unknown covariance matrix $\Sigma_p$.
Here, we simply adopt the estimators from Chen et al. [9].
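Theorem 2.3. Under the conditions of Theorem 2.2, we have, for any $a > 0$, in
probability,
$$\sqrt{p}\,\Big|\frac{1}{p}\mathrm{tr}\,(S_n + aI)^{-1}\Sigma_p - \Theta_n^{(1)}(a, c_n)\Big| \to 0 \tag{2.5}$$
and
$$\frac{1}{p}\mathrm{tr}\big((S_n + aI)^{-1}\Sigma_p(S_n + aI)^{-1}\Sigma_p\big) - \Theta_n^{(2)}(a, c_n) \to 0, \tag{2.6}$$
where
$$\Theta_n^{(1)}(a, c_n) = \frac{1 - a\,m_{F_{n,p}}(-a)}{1 - c_n\big(1 - a\,m_{F_{n,p}}(-a)\big)},$$
$$\Theta_n^{(2)}(a, c_n) = \frac{1 - a\,m_{F_{n,p}}(-a)}{\big(1 - c_n + c_n a\,m_{F_{n,p}}(-a)\big)^3} - a\,\frac{m_{F_{n,p}}(-a) - a\,m'_{F_{n,p}}(-a)}{\big(1 - c_n + c_n a\,m_{F_{n,p}}(-a)\big)^4},$$
$$m_{F_{n,p}}(z) = \frac{1}{p}\mathrm{tr}\,(S_n - zI)^{-1} \quad \text{and} \quad m'_{F_{n,p}}(z) = \frac{1}{p}\mathrm{tr}\,(S_n - zI)^{-2}.$$
Corollary 1. Under the conditions of Theorem 2.2, we have
$$T \triangleq \frac{\sqrt{p}\,\big(\frac{1}{p}\mathrm{RIHT} - \Theta_n^{(1)}(a, c_n)\big)}{\sqrt{\kappa\,\Theta_n^{(2)}(a, c_n)}} \Rightarrow N(0, 1),$$
where $\Theta_n^{(1)}(a, c_n)$ and $\Theta_n^{(2)}(a, c_n)$ are given in Theorem 2.3, and $\kappa = 2$ for the real case and $1$
for the complex case.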
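The plug-in quantities of Theorem 2.3 and the standardized statistic $T$ of Corollary 1 can be computed from the eigenvalues of $S_n$ alone. The sketch below is one possible way to do this; the function name `riht_test`, the column layout of the data, and the use of an eigendecomposition are assumptions of this illustration rather than the authors' implementation.

```python
import numpy as np

def riht_test(X, a, kappa=2.0):
    """Standardized RIHT statistic T of Corollary 1 (kappa = 2 real, 1 complex).

    X is p x n with observations in columns (layout assumption).
    Returns (T, theta1, theta2).
    """
    p, n = X.shape
    c = p / n
    xbar = X.mean(axis=1)
    centered = X - xbar[:, None]
    S = centered @ centered.conj().T / (n - 1)
    eig = np.linalg.eigvalsh(S)
    m = np.mean(1.0 / (eig + a))              # m_{F_{n,p}}(-a)
    m_prime = np.mean(1.0 / (eig + a) ** 2)   # m'_{F_{n,p}}(-a)
    denom = 1.0 - c + c * a * m               # equals 1 - c_n (1 - a m(-a))
    theta1 = (1.0 - a * m) / denom
    theta2 = (1.0 - a * m) / denom**3 - a * (m - a * m_prime) / denom**4
    riht = n * np.real(np.vdot(xbar, np.linalg.solve(S + a * np.eye(p), xbar)))
    T = np.sqrt(p) * (riht / p - theta1) / np.sqrt(kappa * theta2)
    return float(T), float(theta1), float(theta2)

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 400))   # null data with Sigma_p = I, c_n = 0.5
T, theta1, theta2 = riht_test(X, a=1.0)
```

When $\Sigma_p = I$, the targets in (2.5) and (2.6) reduce to $m_{F_{n,p}}(-a)$ and $m'_{F_{n,p}}(-a)$, so in this special case `theta1` and `theta2` should be close to those two averages, which gives a simple sanity check on the formulas.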
3. Simulations
Simulation experiments were carried out to evaluate the performance of the proposed
RIHT test. For comparison, we conducted the traditional HT test and tests
proposed by Bai and Saranadasa [2], Srivastava [14], Srivastava and Du [15], and
Chen and Qin [10]. We briefly describe these tests.
(a) HT test (traditional HT test proposed by Hotelling [11]), defined in (1.3);
(b) $T_{BS}$ test (Bai and Saranadasa [2]):
$$T_{BS} = \frac{n\bar{\mathbf{X}}^*\bar{\mathbf{X}} - \mathrm{tr}\,S_n}{\sqrt{\frac{2n}{n-1}\,p_a}},$$
where
$$p_a = \frac{(n-1)^2}{(n+1)(n-2)}\Big[\mathrm{tr}(S_n^2) - \frac{1}{n-1}(\mathrm{tr}\,S_n)^2\Big];$$
(c) $T_S$ test (Srivastava [14]), when $p > n$:
$$T_S = c_{p,n}\Big(\frac{n-1}{2}\Big)^{1/2}\Big(b\,\frac{p-n+2}{(n-1)^2}\,n\bar{\mathbf{X}}'S_n^{+}\bar{\mathbf{X}} - 1\Big),$$
where $S_n^{+}$ is the Moore–Penrose inverse of $S_n$,
$$b = \frac{(n+1)(n-2)}{(n-1)^2}\,\frac{(\mathrm{tr}\,S_n/p)^2}{p^{-1}\big[\mathrm{tr}\,S_n^2 - \frac{1}{n-1}(\mathrm{tr}\,S_n)^2\big]}, \quad \text{and} \quad c_{p,n} = \Big(\frac{p-n+2}{p+1}\Big)^{1/2};$$
(d) $T_{SD}$ test (Srivastava and Du [15]):
$$T_{SD} = \frac{n\bar{\mathbf{X}}^*D_{S_n}^{-1}\bar{\mathbf{X}} - \frac{(n-1)p}{n-3}}{\sqrt{2\big(\mathrm{tr}\,R^2 - \frac{p^2}{n-1}\big)c_R}},$$
where $D_{S_n} = \mathrm{diag}(s_{11}, \dots, s_{pp})$ is a diagonal matrix, with $(s_{ii})$ the diagonal
entries of $S_n$, $R = D_{S_n}^{-1/2}S_nD_{S_n}^{-1/2}$, and
$$c_R = 1 + \frac{\mathrm{tr}\,R^2}{p^{3/2}};$$
(e) $T_{CQ}$ test (Chen and Qin [10]),
$$T_{CQ} = \frac{\sum_{j_1 \neq j_2}^{n} \mathbf{x}_{j_1}'\mathbf{x}_{j_2}}{\sqrt{2\,\mathrm{tr}\Big(\sum_{j_1 \neq j_2}^{n}(\mathbf{x}_{j_2} - \bar{\mathbf{X}}_{(j_1,j_2)})\mathbf{x}_{j_2}'(\mathbf{x}_{j_1} - \bar{\mathbf{X}}_{(j_1,j_2)})\mathbf{x}_{j_1}'\Big)}},$$
where $\mathbf{x}_{j_1}$ and $\mathbf{x}_{j_2}$ are the $j_1$th and $j_2$th columns, respectively, of $\mathbf{X}$, and $\bar{\mathbf{X}}_{(j_1,j_2)}$
is the sample mean excluding $\mathbf{x}_{j_1}$ and $\mathbf{x}_{j_2}$.
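As an example of how one of the competing statistics in the list above can be evaluated, the following sketch computes $T_{BS}$ from item (b) for real-valued data. The function name `bai_saranadasa` and the $p \times n$ column layout are illustrative assumptions.

```python
import numpy as np

def bai_saranadasa(X):
    """T_BS from item (b), for a real p x n data matrix with observations in columns."""
    p, n = X.shape
    xbar = X.mean(axis=1)
    centered = X - xbar[:, None]
    S = centered @ centered.T / (n - 1)
    tr_S = np.trace(S)
    tr_S2 = np.sum(S * S)                      # tr(S^2) for a symmetric matrix S
    p_a = (n - 1) ** 2 / ((n + 1) * (n - 2)) * (tr_S2 - tr_S**2 / (n - 1))
    return float((n * xbar @ xbar - tr_S) / np.sqrt(2 * n / (n - 1) * p_a))

rng = np.random.default_rng(2)
X_null = rng.standard_normal((100, 50))        # mean zero
t_null = bai_saranadasa(X_null)
t_shift = bai_saranadasa(X_null + 1.0)         # every coordinate mean shifted to 1
```

Because $S_n$ is invariant to a constant shift of all observations, only the term $n\bar{\mathbf{X}}^*\bar{\mathbf{X}}$ changes between the two calls, so the shifted data give a much larger statistic.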
3.1. Fluctuations of the statistic $T$
We examine the fluctuation of the statistic $T$ under finite sample situations. Two
models for the underlying random matrix $X = (x_{ij})$ are considered.
Model I. Standard Gaussian. The matrix $X = (x_{ij})$ has i.i.d. standard Gaussian
entries, $x_{ij} \sim N(0, 1)$.
Model II. Non-Gaussian. The matrix $X = (x_{ij})$ consists of i.i.d. random variables
with
$$x_{ij} = \tilde{m}\,\tilde{x}_{ij} + \hat{m}\,\hat{x}_{ij}, \tag{3.1}$$
where $\tilde{m} = 0.7827$, $\hat{m} = 0.6224$, $\tilde{x}_{ij}$ is a uniformly distributed random
variable on the interval $(-\sqrt{3}, \sqrt{3})$, and $\hat{x}_{ij}$, independent of $\tilde{x}_{ij}$, follows
a distribution with density function
$$f(x) = \begin{cases} \dfrac{\sqrt{2}}{2}e^{-\sqrt{2}x} & \text{if } x > 0, \\[4pt] \dfrac{\sqrt{2}}{2}e^{\sqrt{2}x} & \text{if } x < 0. \end{cases} \tag{3.2}$$
It can be verified that the first four moments of $x_{ij}$ in (3.1) match those
of a standard Gaussian variable.
We also consider an independent null setting for the observation matrix $\mathbf{X}$, i.e.
$$\text{Independent null: } \mu = \mu_0 = 0, \quad \Sigma_p = I, \quad \mathbf{X} = X. \tag{3.3}$$
The tuning parameter of the statistic $T$ is chosen as $a = 1$. The dimensional settings
are $(p, n) = (400, 200)$ and $(400, 800)$. Figure 1 shows the histograms of the statistic
under the two models, which demonstrate that the empirical distribution of the
statistic can be well fitted by its limiting distribution.
Fig. 1. Histograms of statistic $T$ from 1000 independent replications fitted by standard Gaussian
density curve (red) under Models I (left) and II (right) with dimensional settings $(p, n) = (400, 200)$
and $(400, 800)$.
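The moment-matching claim for Model II can be checked in closed form. The sketch below uses the standard moments of the uniform distribution on $(-\sqrt{3}, \sqrt{3})$ and of the Laplace density in (3.2), which has scale $1/\sqrt{2}$; the variable names and the sampler `sample_model_ii` are hypothetical and introduced only for this illustration.

```python
import numpy as np

# Weights from (3.1): first for the uniform part, second for the Laplace part
m_u, m_v = 0.7827, 0.6224

# Closed-form even moments; all odd moments vanish by symmetry
EU2, EU4 = 1.0, 9.0 / 5.0    # Uniform(-sqrt(3), sqrt(3))
EV2, EV4 = 1.0, 6.0          # Laplace with scale b = 1/sqrt(2): 2 b^2 and 24 b^4

second = m_u**2 * EU2 + m_v**2 * EV2
fourth = m_u**4 * EU4 + 6 * m_u**2 * m_v**2 * EU2 * EV2 + m_v**4 * EV4

def sample_model_ii(p, n, rng):
    """Draw a p x n matrix of i.i.d. entries following (3.1)."""
    u = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=(p, n))
    v = rng.laplace(0.0, 1.0 / np.sqrt(2.0), size=(p, n))
    return m_u * u + m_v * v

X = sample_model_ii(4, 5, np.random.default_rng(3))
```

With the rounded weights reported in (3.1), `second` and `fourth` agree with the Gaussian values 1 and 3 up to rounding error, and the first and third moments are exactly zero by symmetry.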
3.2. Empirical size and power
We evaluate the performance of the proposed test in non-Gaussian situations. The
matrix $X$ is modeled as in (3.1), and the tuning parameter is still $a = 1$. For
the observation matrix $\mathbf{X}$, under the null hypothesis $H_0$, we consider two
scenarios: the independent case as defined in (3.3), and a dependent case,
Dependent null: $\mu = \mu_0 = 0$, $\Sigma_p = (\sigma_{ij})_{p\times p}$ with $\sigma_{ii} = 1$ and $\sigma_{ij} = 1/p$ for $i \neq j$.
Under the alternative hypothesis $H_1$, the mean vector $\mu$ and covariance matrix $\Sigma_p$
are set to be
Dependent Alternative: $\mu = \mu_1 = (\mu_{11}, \dots, \mu_{1p})'$, where $\mu_{1i} = 0$ if $i \le [p/3]$;
$\mu_{1i} = \kappa_1/n$ if $[p/3] < i \le [2p/3]$; otherwise, $\mu_{1i} = -\kappa_1/n$; $\Sigma_p = (\sigma_{ij})_{p\times p}$, where
$\sigma_{ii} = 1$ and $\sigma_{ij} = \kappa_2/p$ for $i \neq j$.
The dimensional settings are $(p, n) = (400, 200)$, $(400, 400)$, and $(400, 600)$ under
$H_0$, and $(p, n) = (400, 400)$ under $H_1$. The nominal significance level is fixed at $\alpha = 0.05$.
All statistics in this section are averaged from 1000 independent replications.
Empirical sizes of the six tests $T$, HT, $T_{BS}$, $T_S$, $T_{SD}$, and $T_{CQ}$ are collected
in Table 1. The HT test is only available for cases of $p < n$, and the $T_S$ test
is only considered when $p \ge n$. The results show that $T_S$ suffers from serious size
distortion, while the others maintain reasonable empirical sizes, as they are all close
to the nominal level, $\alpha = 0.05$.
The empirical powers of $T$, $T_{BS}$, $T_{SD}$, and $T_{CQ}$ are reported in Tables 2–5,
respectively. It is clearly demonstrated that the proposed test is comparable to the
three competitors when the parameter $\kappa_2$ is small, and becomes dominant as $\kappa_2$
increases. In addition, $T_{BS}$ and $T_{CQ}$ have almost the same power for the studied
alternative model.
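The Dependent Alternative setting of this subsection can be generated along the following lines. The function names are hypothetical, and taking $T_p$ to be the symmetric square root of $\Sigma_p$ is an assumption of this sketch: the paper only requires $\Sigma_p = T_pT_p^*$.

```python
import numpy as np

def dependent_alternative(p, n, kappa1, kappa2):
    """Mean vector and covariance matrix of the Dependent Alternative setting."""
    mu = np.zeros(p)
    lo, hi = p // 3, (2 * p) // 3          # integer parts [p/3] and [2p/3]
    mu[lo:hi] = kappa1 / n                 # coordinates [p/3] < i <= [2p/3] (1-based)
    mu[hi:] = -kappa1 / n                  # remaining coordinates
    sigma = np.full((p, p), kappa2 / p)
    np.fill_diagonal(sigma, 1.0)
    return mu, sigma

def sample_observations(mu, sigma, Z):
    """Map standardized entries Z (p x n) to observations T Z + mu as in (2.1).

    T is taken as the symmetric square root of sigma (an assumption).
    """
    w, V = np.linalg.eigh(sigma)
    T = (V * np.sqrt(w)) @ V.T
    return T @ Z + mu[:, None]

p, n = 400, 400
mu, sigma = dependent_alternative(p, n, kappa1=12, kappa2=9)
Z = np.random.default_rng(4).standard_normal((p, n))
X_obs = sample_observations(mu, sigma, Z)
```

Here `Z` is Gaussian only for brevity; the simulations in this subsection draw the underlying entries from model (3.1) instead.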
Table 1. Empirical sizes of tests $T$, HT, $T_{BS}$, $T_S$, $T_{SD}$, and $T_{CQ}$ at significance level $\alpha = 0.05$.
Settings           n     T      HT     TBS    TS     TSD    TCQ
Independent null   200   0.059  *      0.054  0.082  0.041  0.053
                   400   0.049  *      0.050  0.271  0.045  0.050
                   600   0.059  0.043  0.058  *      0.050  0.058
Dependent null     200   0.049  *      0.050  0.052  0.037  0.050
                   400   0.046  *      0.060  0.276  0.048  0.060
                   600   0.046  0.052  0.047  *      0.040  0.047
Table 2. Empirical power of proposed test $T$ at significance level $\alpha = 0.05$ with dimensions
$(p, n) = (400, 400)$. Rows are indexed by $\kappa_1$ and columns by $\kappa_2$.
κ1 \ κ2   3      6      9      12     15     18     21     24     27     30
3         0.053  0.046  0.067  0.054  0.074  0.053  0.050  0.065  0.059  0.070
6         0.158  0.154  0.139  0.149  0.165  0.147  0.157  0.158  0.167  0.160
9         0.426  0.435  0.463  0.473  0.439  0.443  0.453  0.467  0.492  0.491
12        0.860  0.839  0.859  0.866  0.870  0.853  0.866  0.876  0.855  0.886
15        0.991  0.990  0.992  0.992  0.992  0.993  0.999  0.996  0.997  0.996
18        1      1      1      1      1      1      1      1      1      1
21        1      1      1      1      1      1      1      1      1      1
Table 3. Empirical power of $T_{BS}$ at significance level $\alpha = 0.05$ with dimensions
$(p, n) = (400, 400)$. Rows are indexed by $\kappa_1$ and columns by $\kappa_2$.
κ1 \ κ2   3      6      9      12     15     18     21     24     27     30
3         0.052  0.038  0.059  0.057  0.063  0.054  0.048  0.052  0.053  0.050
6         0.150  0.142  0.117  0.118  0.133  0.079  0.086  0.086  0.088  0.070
9         0.451  0.451  0.408  0.376  0.283  0.243  0.220  0.208  0.176  0.134
12        0.888  0.854  0.854  0.805  0.737  0.684  0.602  0.562  0.433  0.376
15        0.997  0.993  0.992  0.984  0.977  0.975  0.955  0.924  0.902  0.847
18        1      1      1      1      1      1      0.999  1      0.999  0.996
21        1      1      1      1      1      1      1      1      1      1
Table 4. Empirical power of $T_{SD}$ at significance level $\alpha = 0.05$ with dimensions
$(p, n) = (400, 400)$. Rows are indexed by $\kappa_1$ and columns by $\kappa_2$.
κ1 \ κ2   3      6      9      12     15     18     21     24     27     30
3         0.043  0.033  0.049  0.046  0.053  0.052  0.040  0.040  0.046  0.039
6         0.126  0.125  0.098  0.099  0.121  0.074  0.070  0.076  0.077  0.059
9         0.424  0.417  0.384  0.345  0.254  0.207  0.182  0.179  0.152  0.112
12        0.874  0.838  0.826  0.779  0.704  0.639  0.557  0.499  0.375  0.305
15        0.997  0.990  0.990  0.981  0.973  0.963  0.938  0.890  0.865  0.786
18        1      1      1      1      1      1      0.999  0.999  0.997  0.990
21        1      1      1      1      1      1      1      1      1      1
Table 5. Empirical power of $T_{CQ}$ at significance level $\alpha = 0.05$ with dimensions
$(p, n) = (400, 400)$. Rows are indexed by $\kappa_1$ and columns by $\kappa_2$.
κ1 \ κ2   3      6      9      12     15     18     21     24     27     30
3         0.052  0.038  0.057  0.057  0.063  0.054  0.048  0.052  0.053  0.050
6         0.150  0.142  0.117  0.118  0.133  0.079  0.086  0.086  0.088  0.070
9         0.451  0.451  0.408  0.376  0.282  0.243  0.220  0.208  0.176  0.134
12        0.888  0.854  0.854  0.805  0.737  0.684  0.602  0.562  0.434  0.376
15        0.997  0.993  0.992  0.984  0.977  0.975  0.955  0.924  0.902  0.847
18        1      1      1      1      1      1      0.999  1      0.999  0.996
21        1      1      1      1      1      1      1      1      1      1
3.3. Tuning parameter
We discuss the choice of the tuning parameter $a$ of the RIHT test in simulations.
The underlying matrix $X$ is generated from model (3.1). The observation matrix
$\mathbf{X}$ under $H_0$ follows independent model (3.3). Under the alternative hypothesis, a
degenerate model for $\mathbf{X}$ is designed as follows:
Degenerate Alternative: $\mu = \mu_1 = (\mu_{11}, \dots, \mu_{1p})'$, where $\mu_{1i} = 0$ if $i \le [p/3]$,
$\mu_{1i} = 0.1$ if $[p/3] < i \le [2p/3]$, and otherwise $\mu_{1i} = -0.1$; $\Sigma_p = (\sigma_{ij})_{p\times p}$, where
$\sigma_{ii} = 1$ and $\sigma_{ij} = 2/p$ for $i \neq j$.
The dimensional settings are $p = 90, 135, 180$; $n = p/c_n$, with $c_n = 0.5, 1, 1.5$.
Empirical size and power curves of our test $T = T(a)$ are plotted in Figs. 2–4,
where the tuning parameter $a$ ranges from 0.05 to 2.5, with grid spacing 0.05. We
observe that when $p < n$ (Fig. 2, $c_n = 0.5$), RIHT has stable size and power with
respect to the tuning parameter $a$. However, when $p \ge n$ (Fig. 3, $c_n = 1$ and Fig. 4,
$c_n = 1.5$), as the parameter $a$ approaches 0, the empirical size of the test becomes
biased upward while its power is reduced significantly. Such a deficiency disappears
when $a$ is large, say $a \ge 1$. This is why we chose $a = 1$ in our simulations. The
parameter tuning problem will be theoretically analyzed in our future work.
Fig. 2. Empirical size (left) and power (right) of test $T = T(a)$ at significance level $\alpha = 0.05$, with
tuning parameter $a$ ranging from 0.05 to 2.5. Dimension-to-sample-size ratio is $c_n = 0.5$.
Fig. 3. Empirical size (left) and power (right) of test $T = T(a)$ at significance level $\alpha = 0.05$, with
tuning parameter $a$ ranging from 0.05 to 2.5. Dimension-to-sample-size ratio is $c_n = 1$.
Fig. 4. Empirical size (left) and power (right) of test $T = T(a)$ at significance level $\alpha = 0.05$, with
tuning parameter $a$ ranging from 0.05 to 2.5. Dimension-to-sample-size ratio is $c_n = 1.5$.
4. Proofs of Main Results
4.1. Preliminaries
Lemma 4.1. Let $A$ be a $p \times p$ nonrandom Hermitian matrix with bounded spectral
norm, and $z = (z_1, \dots, z_p)'$ a $p$-dimensional random vector with independent
coordinates satisfying
$$\mathrm{E}z_i = 0, \quad \mathrm{E}|z_i|^2 = 1, \quad \mathrm{E}|z_i|^4 < \infty, \quad \text{and} \quad |z_i| \le \eta_n\sqrt{n}, \tag{4.1}$$
where $\{\eta_n\}$ is a deterministic sequence, and $\eta_n \downarrow 0$ as $n \to \infty$. Then
$$\mathrm{E}(n^{-1}z^*Az - n^{-1}\mathrm{tr}A)^2 = O(n^{-1}) \quad \text{and} \quad \mathrm{E}(n^{-1}z^*Az - n^{-1}\mathrm{tr}A)^4 = o(n^{-1}).$$
This lemma can be derived from Lemma 9.1 in Bai and Silverstein [6].
Lemma 4.2. Let $A$ be a $p \times p$ nonrandom Hermitian matrix with bounded spectral
norm, and $Z = (z_{ij})$ a $p \times n$ random matrix whose entries are i.i.d., satisfying the
conditions in (4.1). Then
$$\mathrm{E}|\bar{Z}_{k0}^*A\bar{Z}_{k0}|^v \le K_v, \quad v = 1, 2, \dots,$$
where $\bar{Z}_{k0} = \frac{1}{n}\sum_{j \neq k}^{n} z_j$, $z_j$ is the $j$th column of $Z$, $k \in \{1, 2, \dots, n\}$, and $K_v$
is a constant depending on $v$.
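Proof. Since the spectral norm $\|A\|$ is bounded, say $\|A\| \le K_0$, we have
$$\mathrm{E}|\bar{Z}_{k0}^*A\bar{Z}_{k0}|^v \le K_0^v\,\mathrm{E}|\bar{Z}_{k0}^*\bar{Z}_{k0}|^v. \tag{4.2}$$
Let $r_i$ denote the $i$th component of $\bar{Z}_{k0}$, i.e.
$$r_i = \frac{1}{n}\sum_{j \neq k}^{n} z_{ij}. \tag{4.3}$$
Then
$$K_0^v\,\mathrm{E}|\bar{Z}_{k0}^*\bar{Z}_{k0}|^v = K_0^v\,\mathrm{E}\Big(\sum_{i=1}^{p}|r_i|^2\Big)^v. \tag{4.4}$$
Applying the multinomial formula (see [8, Chap. 1, Sec. 9]), we have
$$\begin{aligned}
(4.4) &= K_0^v\,\mathrm{E}\sum_{i_1+\cdots+i_p=v}\frac{v!}{i_1!\cdots i_p!}|r_1|^{2i_1}\cdots|r_p|^{2i_p} \\
&\le K_0^v\sum_{l=1}^{v}\ \sum_{\substack{i_1+\cdots+i_l=v \\ 1\le i_1,\dots,i_l\le v}}\frac{v!}{i_1!\cdots i_l!}\sum_{1\le s_1<\cdots<s_l\le p}\mathrm{E}|r_{s_1}|^{2i_1}\cdots\mathrm{E}|r_{s_l}|^{2i_l} \\
&\le K_0^v\sum_{l=1}^{v}\ \sum_{\substack{i_1+\cdots+i_l=v \\ 1\le i_1,\dots,i_l\le v}}\frac{v!}{i_1!\cdots i_l!}\sum_{i=1}^{p}\mathrm{E}|r_i|^{2i_1}\cdots\sum_{i=1}^{p}\mathrm{E}|r_i|^{2i_l}.
\end{aligned} \tag{4.5}$$
Next, we show that $\mathrm{E}|r_i|^{2\iota} \le O(n^{-1})$, i.e. $\mathrm{E}|\frac{1}{n}\sum_{j \neq k}^{n} z_{ij}|^{2\iota} \le O(n^{-1})$, for every integer exponent $\iota \ge 1$.
The general term of the expansion of $|\frac{1}{n}\sum_{j \neq k}^{n} z_{ij}|^{2\iota}$ is
$$\Big(\frac{1}{n}z_{is_1}\Big)^{j_1}\Big(\frac{1}{n}z_{is_1}^*\Big)^{j_1'}\cdots\Big(\frac{1}{n}z_{is_m}\Big)^{j_m}\Big(\frac{1}{n}z_{is_m}^*\Big)^{j_m'}, \tag{4.6}$$
where $m \in \{1, \dots, \iota\}$, $s_1, \dots, s_m$ are distinct column indices different from $k$, and the exponents $j_t$ and $j_t'$
are integers taking values in $\{0, \dots, \iota\}$. If $j_t = j_t' = 0$, then $(\frac{1}{n}z_{is_t})^{j_t}(\frac{1}{n}z_{is_t}^*)^{j_t'} = 1$.
If $j_t + j_t' = 1$, then one can prove that $\mathrm{E}(\frac{1}{n}z_{is_t})^{j_t}(\frac{1}{n}z_{is_t}^*)^{j_t'} = 0$.
For $j_t + j_t' \ge 2$, we have
$$\mathrm{E}\Big|\frac{1}{n}\sum_{j \neq k}^{n} z_{ij}\Big|^{2\iota} \le \sum_{m=1}^{\iota}\ \sum_{\substack{\tilde{j}_1+\cdots+\tilde{j}_m=2\iota \\ 1<\tilde{j}_1,\dots,\tilde{j}_m\le 2\iota}}\frac{(2\iota)!}{\tilde{j}_1!\cdots\tilde{j}_m!}\sum_{j=1,\,j \neq k}^{n}\mathrm{E}\Big(\frac{1}{n}|z_{ij}|\Big)^{\tilde{j}_1}\cdots\sum_{j=1,\,j \neq k}^{n}\mathrm{E}\Big(\frac{1}{n}|z_{ij}|\Big)^{\tilde{j}_m}, \tag{4.7}$$
where $\tilde{j}_t = j_t + j_t'$.
Notice that $\sum_{m=1}^{\iota}\sum_{\tilde{j}_1+\cdots+\tilde{j}_m=2\iota,\,1<\tilde{j}_1,\dots,\tilde{j}_m\le 2\iota}\frac{(2\iota)!}{\tilde{j}_1!\cdots\tilde{j}_m!}$ is bounded. Thus, the
largest order of (4.7) is achieved when $m = 1$ and $\tilde{j}_1 = 2\iota$. By the conditions in
(4.1), we have
$$\mathrm{E}|z_{ij}|^{\alpha} = \begin{cases} O(1) & \text{if } \alpha \le 4, \\ O((\eta_n\sqrt{n})^{\alpha-4}) & \text{if } \alpha > 4. \end{cases} \tag{4.8}$$
Thus,
$$\sum_{j=1,\,j \neq k}^{n}\mathrm{E}\Big(\frac{1}{n}|z_{ij}|\Big)^{2\iota} \le \begin{cases} n^{1-2\iota}O(1) & \text{if } 2 \le 2\iota \le 4, \\ n^{1-2\iota}O((\eta_n\sqrt{n})^{2\iota-4}) & \text{if } 2\iota > 4, \end{cases} \tag{4.9}$$
and
$$\mathrm{E}\Big|\frac{1}{n}\sum_{j \neq k}^{n} z_{ij}\Big|^{2\iota} \le O(n^{-1}).$$
Similarly, the largest order of (4.5) is achieved when $l = 1$ and $i_1 = v$. Thus,
$$\sum_{i=1}^{p}\mathrm{E}|r_i|^{2v} \le O(1).$$
Combining the above results yields
$$\mathrm{E}|\bar{Z}_{k0}^*A\bar{Z}_{k0}|^v \le K_v, \quad v = 1, 2, \dots,$$
which completes the proof.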
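Lemma 4.3. Under the assumptions of Lemma 4.1,
$$\mathrm{E}\Big|\frac{1}{n}z^*Az\Big|^v \le K_v, \quad v = 1, 2, \dots,$$
where $K_v$ is a constant depending on $v$.
Proof. Because $\|A\|$ is bounded by a constant, say $K_0$, we have
$$\begin{aligned}
\mathrm{E}\Big|\frac{1}{n}z^*Az\Big|^v &\le K_0^v\,\mathrm{E}\Big(\frac{1}{n}z^*z\Big)^v \\
&= K_0^v\,\mathrm{E}\sum_{i_1+\cdots+i_p=v}\frac{v!}{i_1!\cdots i_p!}\Big(\frac{1}{n}|z_1|^2\Big)^{i_1}\cdots\Big(\frac{1}{n}|z_p|^2\Big)^{i_p} \\
&\le K_0^v\sum_{l=1}^{v}\ \sum_{\substack{i_1+\cdots+i_l=v \\ 1\le i_1,\dots,i_l\le v}}\frac{v!}{i_1!\cdots i_l!}\sum_{1\le s_1<\cdots<s_l\le p}\mathrm{E}\Big(\frac{1}{n}|z_{s_1}|^2\Big)^{i_1}\cdots\mathrm{E}\Big(\frac{1}{n}|z_{s_l}|^2\Big)^{i_l} \\
&\le K_0^v\sum_{l=1}^{v}\ \sum_{\substack{i_1+\cdots+i_l=v \\ 1\le i_1,\dots,i_l\le v}}\frac{v!}{i_1!\cdots i_l!}\sum_{i=1}^{p}\mathrm{E}\Big(\frac{1}{n}|z_i|^2\Big)^{i_1}\cdots\sum_{i=1}^{p}\mathrm{E}\Big(\frac{1}{n}|z_i|^2\Big)^{i_l},
\end{aligned}$$
whose largest order occurs when $l = 1$ and $i_1 = v$. According to (4.8), we get
$$\sum_{i=1}^{p}\mathrm{E}\Big(\frac{1}{n}|z_i|^2\Big)^v \le \begin{cases} O(n^{-v+1}) & \text{if } v \le 2, \\ O(\eta_n^{2v-4}n^{-1}) & \text{if } v > 2. \end{cases}$$
Hence, $\mathrm{E}|\frac{1}{n}z^*Az|^v$ is bounded by a constant $K_v$. The proof is then complete.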
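Lemma 4.4. Under the assumptions of Lemma 4.2,
$$\mathrm{E}|\bar{Z}_{k0}^*Az_k|^v \le K_v, \quad v = 1, 2, \dots,$$
where $K_v$ is a constant depending on $v$.
Proof. Let $\tilde{z}^* = (\tilde{z}_1^*, \dots, \tilde{z}_p^*) = \bar{Z}_{k0}^*A$. By the Cauchy–Schwarz inequality, we have
$$\mathrm{E}|\bar{Z}_{k0}^*Az_k|^v \le \sqrt{\mathrm{E}|\tilde{z}^*z_k|^{2v}}.$$
The general term of the expansion of $|\tilde{z}^*z_k|^{2v}$ is
$$(\tilde{z}_{s_1}^*z_{s_1k})^{i_1}(\tilde{z}_{s_1}z_{s_1k}^*)^{i_1'}\cdots(\tilde{z}_{s_l}^*z_{s_lk})^{i_l}(\tilde{z}_{s_l}z_{s_lk}^*)^{i_l'}, \tag{4.10}$$
where $l \in \{1, \dots, v\}$ and $s_1, \dots, s_l$ are distinct coordinate indices. According to the analysis in the proof of Lemma 4.2,
$i_t$ and $i_t'$ take values in $\{1, \dots, v\}$. Hence,
$$\mathrm{E}|\tilde{z}^*z_k|^{2v} \le \sum_{l=1}^{v}\ \sum_{\substack{\tilde{i}_1+\cdots+\tilde{i}_l=2v \\ 1<\tilde{i}_1,\dots,\tilde{i}_l\le 2v}}\frac{(2v)!}{\tilde{i}_1!\cdots\tilde{i}_l!}\times\Big(\sum_{i=1}^{p}\mathrm{E}|\tilde{z}_i^*z_{ik}|^{\tilde{i}_1}\cdots\sum_{i=1}^{p}\mathrm{E}|\tilde{z}_i^*z_{ik}|^{\tilde{i}_l}\Big), \quad \tilde{i}_t \ge 2, \tag{4.11}$$
where $i_t + i_t' = \tilde{i}_t$, with $t \in \{1, 2, \dots, l\}$.
For $2 \le \tilde{i}_t \le 2v$, applying Hölder’s inequality, we get
$$\sum_{i=1}^{p}\mathrm{E}|\tilde{z}_i^*z_{ik}|^{\tilde{i}_t} \le \Big(\sum_{i=1}^{p}\mathrm{E}|\tilde{z}_i^*z_{ik}|^2\Big)^{\frac{2v-\tilde{i}_t}{2v-2}}\Big(\sum_{i=1}^{p}\mathrm{E}|\tilde{z}_i^*z_{ik}|^{2v}\Big)^{\frac{\tilde{i}_t-2}{2v-2}}.$$
When $(\sum_{i=1}^{p}\mathrm{E}|\tilde{z}_i^*z_{ik}|^2)^v \le \sum_{i=1}^{p}\mathrm{E}|\tilde{z}_i^*z_{ik}|^{2v}$, we have
$$\sum_{i=1}^{p}\mathrm{E}|\tilde{z}_i^*z_{ik}|^{\tilde{i}_1}\cdots\sum_{i=1}^{p}\mathrm{E}|\tilde{z}_i^*z_{ik}|^{\tilde{i}_l} \le \sum_{i=1}^{p}\mathrm{E}|\tilde{z}_i|^{2v}\,\mathrm{E}|z_{ik}|^{2v}.$$
Otherwise,
$$\sum_{i=1}^{p}\mathrm{E}|\tilde{z}_i^*z_{ik}|^{\tilde{i}_1}\cdots\sum_{i=1}^{p}\mathrm{E}|\tilde{z}_i^*z_{ik}|^{\tilde{i}_l} \le \big(\mathrm{E}(\bar{Z}_{k0}^*A)(\bar{Z}_{k0}^*A)^*\big)^v,$$
where $\mathrm{E}|z_{ik}|^2 = 1$. Then we obtain
$$\sum_{i=1}^{p}\mathrm{E}|\tilde{z}_i^*z_{ik}|^{\tilde{i}_1}\cdots\sum_{i=1}^{p}\mathrm{E}|\tilde{z}_i^*z_{ik}|^{\tilde{i}_l} \le \big(\mathrm{E}(\bar{Z}_{k0}^*A)(\bar{Z}_{k0}^*A)^*\big)^v + \sum_{i=1}^{p}\mathrm{E}|\tilde{z}_i|^{2v}\,\mathrm{E}|z_{ik}|^{2v}.$$
Since
$$\sum_{l=1}^{v}\ \sum_{\substack{\tilde{i}_1+\cdots+\tilde{i}_l=2v \\ 1<\tilde{i}_1,\dots,\tilde{i}_l\le 2v}}\frac{(2v)!}{\tilde{i}_1!\cdots\tilde{i}_l!} \tag{4.12}$$
is bounded by a constant $K_v$, we obtain
$$\mathrm{E}|\tilde{z}^*z_k|^{2v} \le K_v\Big[\big(\mathrm{E}(\bar{Z}_{k0}^*A)(\bar{Z}_{k0}^*A)^*\big)^v + \sum_{i=1}^{p}\mathrm{E}|\tilde{z}_i|^{2v}\,\mathrm{E}|z_{ik}|^{2v}\Big]. \tag{4.13}$$
For the expectation $\mathrm{E}|\tilde{z}_i|^{2v}$, we have
$$\mathrm{E}|\tilde{z}_i|^{2v} = \mathrm{E}|\bar{Z}_{k0}^*A_{\cdot i}|^{2v}.$$
Applying Hölder’s inequality, we have
$$\mathrm{E}|\tilde{z}_i|^{2v} \le K_v\Big[\Big(\sum_{j=1}^{p}|A_{ji}|^2\sum_{s=1,\,s \neq k}^{n}\mathrm{E}\Big|\frac{1}{n}z_{js}^*\Big|^2\Big)^v + \sum_{j=1}^{p}|A_{ji}|^{2v}\sum_{s=1,\,s \neq k}^{n}\mathrm{E}\Big|\frac{1}{n}z_{js}^*\Big|^{2v}\Big].$$
Since $\sup_p\|A\| < \infty$, we have
$$\sup_p\sum_{j=1}^{p}|A_{ji}|^2 = \sup_p\,(A^*A)_{ii} < \infty \quad \text{and} \quad \sup_p\sum_{j=1}^{p}|A_{ji}|^{2v} \le \sup_p\Big(\sum_{j=1}^{p}|A_{ji}|^2\Big)^v < \infty.$$
In addition, $\sum_{s=1,\,s \neq k}^{n}\mathrm{E}|\frac{1}{n}z_{js}^*|^2 = O(n^{-1})$, and
$$\sum_{s=1,\,s \neq k}^{n}\mathrm{E}\Big|\frac{1}{n}z_{js}\Big|^{2v} \le \begin{cases} O(n^{-2v+1}) & \text{if } 2 \le 2v \le 4, \\ o(n^{-v-1}) & \text{if } 2v > 4. \end{cases}$$
Thus we conclude that
$$\mathrm{E}|\tilde{z}_i|^{2v} \le \begin{cases} O(n^{-1}) & \text{if } 2 \le 2v \le 4, \\ O(n^{-v}) & \text{if } 2v > 4. \end{cases}$$
Also,
$$\mathrm{E}|z_{ik}|^{2v} = \begin{cases} O(1) & \text{if } 2 \le 2v \le 4, \\ O((\eta_n\sqrt{n})^{2v-4}) & \text{if } 2v > 4, \end{cases}$$
and $\big(\mathrm{E}(\bar{Z}_{k0}^*A)(\bar{Z}_{k0}^*A)^*\big)^v = O(1)$. From the above results and Lemma 4.2, we obtain
that $\mathrm{E}|\tilde{z}^*z_k|^{2v} \le K_v$. The proof is complete.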
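4.2. Proof of Theorem 2.1
Following steps of truncation, centralization, and rescaling similar to Bai and
Silverstein [5], we may assume that the random variables $\{x_{ij}\}$ satisfy
$$|x_{ij}| \le \eta_n\sqrt{n}, \quad \mathrm{E}x_{ij} = 0, \quad \mathrm{E}|x_{ij}|^2 = 1, \quad \text{and} \quad \mathrm{E}|x_{ij}|^4 = O(1),$$
where $\{\eta_n\}$ is a deterministic sequence with $\eta_n \downarrow 0$ whose convergence rate can be
made arbitrarily slow. Under these assumptions, we have, for any $\alpha > 4$,
$$\mathrm{E}|x_{ij}|^{\alpha} = O((\eta_n\sqrt{n})^{\alpha-4}).$$
If $x_{ij}$ is complex-valued, then
$$\mathrm{E}x_{ij}^2 = O(n^{-1}).$$
For simplicity, we suppress the subscripts of matrices $\Sigma_p$ and $T_p$ below.
Let
$$\tilde{S}_n^x = \frac{1}{n}\sum_{i=1}^{n}\mathbf{x}_i\mathbf{x}_i^* - \bar{\mathbf{X}}\bar{\mathbf{X}}^* \tag{4.14}$$
and
$$\tilde{g}(\mathbf{X}) = \sqrt{n}\Big(\bar{\mathbf{X}}^*(\tilde{S}_n^x + aI)^{-1}\bar{\mathbf{X}} - \frac{1}{n}\mathrm{tr}\,(\tilde{S}_n^x + aI)^{-1}\Sigma\Big). \tag{4.15}$$
We first show that the difference between $g(\mathbf{X})$ and $\tilde{g}(\mathbf{X})$ is negligible asymptotically.
Indeed, since $S_n^x = \frac{n}{n-1}\tilde{S}_n^x$, we have
$$\begin{aligned}
g(\mathbf{X}) - \tilde{g}(\mathbf{X}) = {} & -\frac{1}{n}\tilde{g}(\mathbf{X}) + \frac{n-1}{n}\Big[\sqrt{n}\,\bar{\mathbf{X}}^*\Big(\Big(\tilde{S}_n^x + \frac{(n-1)a}{n}I\Big)^{-1} - (\tilde{S}_n^x + aI)^{-1}\Big)\bar{\mathbf{X}} \\
& - \frac{1}{\sqrt{n}}\Big(\mathrm{tr}\,\Big(\tilde{S}_n^x + \frac{(n-1)a}{n}I\Big)^{-1}\Sigma - \mathrm{tr}\,(\tilde{S}_n^x + aI)^{-1}\Sigma\Big)\Big] \\
= {} & O_{a.s.}(n^{-1/2}),
\end{aligned}$$
where “a.s.” means “almost surely.” Hence, we can prove Theorem 2.1 by using
$$S_n^x \triangleq \frac{1}{n}\Big(\sum_{j=1}^{n}\mathbf{x}_j\mathbf{x}_j^* - n\bar{\mathbf{X}}\bar{\mathbf{X}}^*\Big) \quad \text{and} \quad S_n^y \triangleq \frac{1}{n}\Big(\sum_{j=1}^{n}\mathbf{y}_j\mathbf{y}_j^* - n\bar{\mathbf{Y}}\bar{\mathbf{Y}}^*\Big) \tag{4.16}$$
instead of their original definitions.
We now prove the EFMT by showing that $\mathrm{E}e^{itg(\mathbf{X})} - \mathrm{E}e^{itg(\mathbf{Y})} \to 0$. Write $\mathbf{X}_k =
(\mathbf{x}_1, \dots, \mathbf{x}_k, \mathbf{y}_{k+1}, \dots, \mathbf{y}_n)$ and $\mathbf{X}_{k0} = (\mathbf{x}_1, \dots, \mathbf{x}_{k-1}, \mathbf{y}_{k+1}, \dots, \mathbf{y}_n)$, with the
conventions that $\mathbf{X}_n = \mathbf{X}$ and $\mathbf{X}_0 = \mathbf{Y}$. Let
$$g(\mathbf{X}_k) = \sqrt{n}\Big(\bar{\mathbf{X}}_k^*(S_{nk} + aI)^{-1}\bar{\mathbf{X}}_k - \frac{1}{n}\mathrm{tr}\,(S_{nk} + aI)^{-1}\Sigma\Big)$$
and
$$g(\mathbf{X}_{k0}) = \sqrt{n}\Big(\bar{\mathbf{X}}_{k0}^*(S_{nk0} + aI)^{-1}\bar{\mathbf{X}}_{k0} - \frac{1}{n}\mathrm{tr}\,(S_{nk0} + aI)^{-1}\Sigma\Big),$$
where $\bar{\mathbf{X}}_k = \frac{1}{n}\mathbf{X}_k\mathbf{1}$, $\bar{\mathbf{X}}_{k0} = \frac{1}{n}\mathbf{X}_{k0}\mathbf{1}$, $S_{nk} = \frac{1}{n}\mathbf{X}_k\mathbf{X}_k^* - \bar{\mathbf{X}}_k\bar{\mathbf{X}}_k^*$,
$S_{nk0} = \frac{1}{n}\mathbf{X}_{k0}\mathbf{X}_{k0}^* - \bar{\mathbf{X}}_{k0}\bar{\mathbf{X}}_{k0}^*$, and $\mathbf{1}$ denotes a vector of 1’s of conformable dimension.
Since
$$\begin{aligned}
\mathrm{E}e^{itg(\mathbf{X})} - \mathrm{E}e^{itg(\mathbf{Y})} &= \sum_{k=1}^{n}\mathrm{E}\big(e^{itg(\mathbf{X}_k)} - e^{itg(\mathbf{X}_{k-1})}\big) \\
&= \sum_{k=1}^{n}\mathrm{E}\,e^{itg(\mathbf{X}_{k0})}\big(e^{it(g(\mathbf{X}_k) - g(\mathbf{X}_{k0}))} - e^{it(g(\mathbf{X}_{k-1}) - g(\mathbf{X}_{k0}))}\big),
\end{aligned} \tag{4.17}$$
we next calculate the order of $g(\mathbf{X}_k) - g(\mathbf{X}_{k0})$.
Since
$$\bar{\mathbf{X}}_k = \bar{\mathbf{X}}_{k0} + \frac{1}{n}\mathbf{x}_k, \tag{4.18}$$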
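we have
$$\begin{aligned}
S_{nk} &= S_{nk0} + a_n\mathbf{x}_k\mathbf{x}_k^* - n^{-1}\mathbf{x}_k\bar{\mathbf{X}}_{k0}^* - n^{-1}\bar{\mathbf{X}}_{k0}\mathbf{x}_k^* \\
&= S_{nk1} - n^{-1}(\mathbf{x}_k\bar{\mathbf{X}}_{k0}^* + \bar{\mathbf{X}}_{k0}\mathbf{x}_k^*),
\end{aligned} \tag{4.19}$$
where $a_n = \frac{n-1}{n^2}$. By the inverse matrix formula, we have
$$B_{nk}^{-1} = B_{nk1}^{-1} + B_{nk1}^{-1}(n^{-1}\mathbf{x}_k, \bar{\mathbf{X}}_{k0})\,\Lambda^{-1}\begin{pmatrix}\bar{\mathbf{X}}_{k0}^* \\ n^{-1}\mathbf{x}_k^*\end{pmatrix}B_{nk1}^{-1} \tag{4.20}$$
and
$$B_{nk1}^{-1} = B_{nk0}^{-1} - \frac{a_nB_{nk0}^{-1}\mathbf{x}_k\mathbf{x}_k^*B_{nk0}^{-1}}{1 + a_n\mathbf{x}_k^*B_{nk0}^{-1}\mathbf{x}_k}, \tag{4.21}$$
where $B_\star = S_\star + aI$ for $\star = nk$, $nk0$ or $nk1$, and
$$\Lambda = I_2 - \begin{pmatrix} n^{-1}\bar{\mathbf{X}}_{k0}^*B_{nk1}^{-1}\mathbf{x}_k & \bar{\mathbf{X}}_{k0}^*B_{nk1}^{-1}\bar{\mathbf{X}}_{k0} \\ n^{-2}\mathbf{x}_k^*B_{nk1}^{-1}\mathbf{x}_k & n^{-1}\mathbf{x}_k^*B_{nk1}^{-1}\bar{\mathbf{X}}_{k0} \end{pmatrix}. \tag{4.22}$$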
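The rank-one update (4.21) and the rank-two update (4.20) with $\Lambda$ from (4.22) are instances of the Sherman–Morrison and Woodbury identities, and they can be verified numerically. The sketch below does so for small real matrices; the dimensions, the random inputs, and all variable names are arbitrary choices made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n, a = 6, 10, 1.0
a_n = (n - 1) / n**2

X0 = rng.standard_normal((p, n - 1))       # stands in for X_{k0} (column k removed)
x = rng.standard_normal(p)                 # stands in for the k-th column x_k
xbar0 = X0.sum(axis=1) / n                 # \bar X_{k0}, scaled by 1/n as in the proof

B0 = X0 @ X0.T / n - np.outer(xbar0, xbar0) + a * np.eye(p)   # B_{nk0}
B1 = B0 + a_n * np.outer(x, x)                                # B_{nk1}
Xk = np.column_stack([X0, x])
xbar = Xk.sum(axis=1) / n
B = Xk @ Xk.T / n - np.outer(xbar, xbar) + a * np.eye(p)      # B_{nk}

B0_inv = np.linalg.inv(B0)
# (4.21): Sherman-Morrison rank-one update of B_{nk0}^{-1}
B1_inv = B0_inv - a_n * (B0_inv @ np.outer(x, x) @ B0_inv) / (1 + a_n * x @ B0_inv @ x)

# (4.20) and (4.22): rank-two Woodbury update of B_{nk1}^{-1}
U = np.column_stack([x / n, xbar0])        # p x 2 block (n^{-1} x_k, \bar X_{k0})
V = np.vstack([xbar0, x / n])              # 2 x p block with rows \bar X_{k0}^*, n^{-1} x_k^*
Lam = np.eye(2) - V @ B1_inv @ U
B_inv = B1_inv + B1_inv @ U @ np.linalg.inv(Lam) @ V @ B1_inv

# Scalar that appears as the denominator when Lambda^{-1} is written out
det_formula = (1 - x @ B1_inv @ xbar0 / n) ** 2 - (x @ B1_inv @ x) * (xbar0 @ B1_inv @ xbar0) / n**2
```

In this real-valued setting `B_inv` agrees with the directly computed inverse of `B`, and `det_formula` equals $\det\Lambda$.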
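This implies
$$B_{nk1}^{-1}(n^{-1}\mathbf{x}_k, \bar{\mathbf{X}}_{k0})\,\Lambda^{-1}\begin{pmatrix}\bar{\mathbf{X}}_{k0}^* \\ n^{-1}\mathbf{x}_k^*\end{pmatrix}B_{nk1}^{-1} = \frac{\Upsilon}{|1 - n^{-1}\mathbf{x}_k^*B_{nk1}^{-1}\bar{\mathbf{X}}_{k0}|^2 - n^{-2}\mathbf{x}_k^*B_{nk1}^{-1}\mathbf{x}_k\,\bar{\mathbf{X}}_{k0}^*B_{nk1}^{-1}\bar{\mathbf{X}}_{k0}}, \tag{4.23}$$
where
$$\begin{aligned}
\Upsilon = {} & n^{-1}B_{nk1}^{-1}\mathbf{x}_k(1 - n^{-1}\mathbf{x}_k^*B_{nk1}^{-1}\bar{\mathbf{X}}_{k0})\bar{\mathbf{X}}_{k0}^*B_{nk1}^{-1} \\
& + n^{-2}B_{nk1}^{-1}\bar{\mathbf{X}}_{k0}\,\mathbf{x}_k^*B_{nk1}^{-1}\mathbf{x}_k\,\bar{\mathbf{X}}_{k0}^*B_{nk1}^{-1} \\
& + n^{-2}B_{nk1}^{-1}\mathbf{x}_k\,\bar{\mathbf{X}}_{k0}^*B_{nk1}^{-1}\bar{\mathbf{X}}_{k0}\,\mathbf{x}_k^*B_{nk1}^{-1} \\
& + n^{-1}B_{nk1}^{-1}\bar{\mathbf{X}}_{k0}(1 - n^{-1}\bar{\mathbf{X}}_{k0}^*B_{nk1}^{-1}\mathbf{x}_k)\mathbf{x}_k^*B_{nk1}^{-1}.
\end{aligned} \tag{4.24}$$
Applying the identity $B_{nk1}^{-1} = B_{nk0}^{-1} - a_n\beta_kB_{nk0}^{-1}\mathbf{x}_k\mathbf{x}_k^*B_{nk0}^{-1}$, where $\beta_k = 1/(1 +
a_n\mathbf{x}_k^*B_{nk0}^{-1}\mathbf{x}_k)$, we obtain
$$\bar{\mathbf{X}}_k^*\Upsilon\bar{\mathbf{X}}_k := I_1 + I_2 + I_3 + I_4, \tag{4.25}$$
where
$$\begin{aligned}
I_1 = {} & n^{-1}\beta_k\bar{\mathbf{X}}_k^*B_{nk0}^{-1}\mathbf{x}_k(1 - n^{-1}\beta_k\mathbf{x}_k^*B_{nk0}^{-1}\bar{\mathbf{X}}_{k0}) \\
& \times(\bar{\mathbf{X}}_{k0}^*B_{nk0}^{-1}\bar{\mathbf{X}}_k - a_n\beta_k\bar{\mathbf{X}}_{k0}^*B_{nk0}^{-1}\mathbf{x}_k\mathbf{x}_k^*B_{nk0}^{-1}\bar{\mathbf{X}}_k),
\end{aligned} \tag{4.26}$$
$$\begin{aligned}
I_2 = {} & n^{-2}\beta_k(\bar{\mathbf{X}}_k^*B_{nk0}^{-1}\bar{\mathbf{X}}_{k0} - a_n\beta_k\bar{\mathbf{X}}_k^*B_{nk0}^{-1}\mathbf{x}_k\mathbf{x}_k^*B_{nk0}^{-1}\bar{\mathbf{X}}_{k0}) \\
& \times\mathbf{x}_k^*B_{nk0}^{-1}\mathbf{x}_k(\bar{\mathbf{X}}_{k0}^*B_{nk0}^{-1}\bar{\mathbf{X}}_k - a_n\beta_k\bar{\mathbf{X}}_{k0}^*B_{nk0}^{-1}\mathbf{x}_k\mathbf{x}_k^*B_{nk0}^{-1}\bar{\mathbf{X}}_k),
\end{aligned} \tag{4.27}$$
$$\begin{aligned}
I_3 = {} & \beta_k^2n^{-2}\bar{\mathbf{X}}_k^*B_{nk0}^{-1}\mathbf{x}_k(\bar{\mathbf{X}}_{k0}^*B_{nk0}^{-1}\bar{\mathbf{X}}_{k0} \\
& - a_n\beta_k\bar{\mathbf{X}}_{k0}^*B_{nk0}^{-1}\mathbf{x}_k\mathbf{x}_k^*B_{nk0}^{-1}\bar{\mathbf{X}}_{k0})\mathbf{x}_k^*B_{nk0}^{-1}\bar{\mathbf{X}}_k
\end{aligned} \tag{4.28}$$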
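and
$$\begin{aligned}
I_4 = {} & (\bar{\mathbf{X}}_k^*B_{nk0}^{-1}\bar{\mathbf{X}}_{k0} - a_n\beta_k\bar{\mathbf{X}}_k^*B_{nk0}^{-1}\mathbf{x}_k\mathbf{x}_k^*B_{nk0}^{-1}\bar{\mathbf{X}}_{k0}) \\
& \times(1 - n^{-1}\beta_k\bar{\mathbf{X}}_{k0}^*B_{nk0}^{-1}\mathbf{x}_k)\,n^{-1}\beta_k\mathbf{x}_k^*B_{nk0}^{-1}\bar{\mathbf{X}}_k.
\end{aligned} \tag{4.29}$$
We now control the first term, $I_1$, which can be represented as
$$\begin{aligned}
I_1 = {} & n^{-1}\beta_k\bar{\mathbf{X}}_k^*B_{nk0}^{-1}\mathbf{x}_k\,\bar{\mathbf{X}}_{k0}^*B_{nk0}^{-1}\bar{\mathbf{X}}_{k0} + n^{-2}\beta_k\bar{\mathbf{X}}_k^*B_{nk0}^{-1}\mathbf{x}_k\,\bar{\mathbf{X}}_{k0}^*B_{nk0}^{-1}\mathbf{x}_k \\
& - n^{-2}\beta_k^2\bar{\mathbf{X}}_k^*B_{nk0}^{-1}\mathbf{x}_k\,\mathbf{x}_k^*B_{nk0}^{-1}\bar{\mathbf{X}}_{k0}\,\bar{\mathbf{X}}_{k0}^*B_{nk0}^{-1}\bar{\mathbf{X}}_k \\
& - n^{-1}a_n\beta_k^2\bar{\mathbf{X}}_k^*B_{nk0}^{-1}\mathbf{x}_k\,\bar{\mathbf{X}}_{k0}^*B_{nk0}^{-1}\mathbf{x}_k\,\mathbf{x}_k^*B_{nk0}^{-1}\bar{\mathbf{X}}_k \\
& + n^{-2}a_n\beta_k^3\bar{\mathbf{X}}_k^*B_{nk0}^{-1}\mathbf{x}_k\,\mathbf{x}_k^*B_{nk0}^{-1}\bar{\mathbf{X}}_{k0}\,\bar{\mathbf{X}}_{k0}^*B_{nk0}^{-1}\mathbf{x}_k\,\mathbf{x}_k^*B_{nk0}^{-1}\bar{\mathbf{X}}_k.
\end{aligned} \tag{4.30}$$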
Because βk and ∥B⋆∥ for ⋆ = nk, nk0, and nk1 are all bounded by some constant,3
and by applying Lemmas 4.2–4.4, we get4
E|X∗kB
− 1nk0 xkX∗
k0 B− 1nk0 xk|2
≤ E|X∗k0 B
− 1nk0 xkX∗
k0 B− 1nk0 xk|2 + E
∣∣∣∣1nx∗
kB− 1nk0 xkX∗
k0 B− 1nk0 xk
∣∣∣∣2
≤√
E|X∗k0 B
− 1nk0 xk|4 E|X∗
k0 B− 1nk0 xk|4
+
√
E∣∣∣∣1nx∗
kB− 1nk0 xk
∣∣∣∣4
E|X∗k0 B
− 1nk0 xk|4
= O(1),
E|X∗kB
− 1nk0 xkx∗
kB− 1nk0 Xk0 X∗
k0 B− 1nk0 Xk|2
≤ K
[E|X∗
k0 B− 1nk0 xkx∗
kB− 1nk0 Xk0 X∗
k0 B− 1nk0 Xk0 |2
+ E∣∣∣∣1nX∗
k0 B− 1nk0 xkx∗
kB− 1nk0 Xk0 X∗
k0 B− 1nk0 xk
∣∣∣∣2
+ E∣∣∣∣1nx∗
kB− 1nk0 xkx∗
kB− 1nk0 Xk0 X∗
k0 B− 1nk0 Xk0
∣∣∣∣2
+ E∣∣∣∣
1n2
x∗kB
− 1nk0 xkx∗
kB− 1nk0 Xk0 X∗
k0 B− 1nk0 xk
∣∣∣∣2]
= O(1),
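\[
\begin{aligned}
&\mathrm{E}|X_k^*B_{nk0}^{-1}x_k\,X_{k0}^*B_{nk0}^{-1}x_k\,x_k^*B_{nk0}^{-1}X_k|^2\\
&\quad\le K\Bigg[\mathrm{E}|X_{k0}^*B_{nk0}^{-1}x_k\,X_{k0}^*B_{nk0}^{-1}x_k\,x_k^*B_{nk0}^{-1}X_{k0}|^2\\
&\qquad + \mathrm{E}\left|\frac{1}{n}X_{k0}^*B_{nk0}^{-1}x_k\,X_{k0}^*B_{nk0}^{-1}x_k\,x_k^*B_{nk0}^{-1}x_k\right|^2\\
&\qquad + \mathrm{E}\left|\frac{1}{n}x_k^*B_{nk0}^{-1}x_k\,X_{k0}^*B_{nk0}^{-1}x_k\,x_k^*B_{nk0}^{-1}X_{k0}\right|^2\\
&\qquad + \mathrm{E}\left|\frac{1}{n^2}x_k^*B_{nk0}^{-1}x_k\,X_{k0}^*B_{nk0}^{-1}x_k\,x_k^*B_{nk0}^{-1}x_k\right|^2\Bigg]\\
&\quad= O(1),
\end{aligned}
\]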
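and
\[
\begin{aligned}
&\mathrm{E}|X_k^*B_{nk0}^{-1}x_k\,x_k^*B_{nk0}^{-1}X_{k0}\,X_{k0}^*B_{nk0}^{-1}x_k\,x_k^*B_{nk0}^{-1}X_k|^2\\
&\quad\le K\Bigg[\mathrm{E}|X_{k0}^*B_{nk0}^{-1}x_k\,x_k^*B_{nk0}^{-1}X_{k0}\,X_{k0}^*B_{nk0}^{-1}x_k\,x_k^*B_{nk0}^{-1}X_{k0}|^2\\
&\qquad + \mathrm{E}\left|\frac{1}{n}X_{k0}^*B_{nk0}^{-1}x_k\,x_k^*B_{nk0}^{-1}X_{k0}\,X_{k0}^*B_{nk0}^{-1}x_k\,x_k^*B_{nk0}^{-1}x_k\right|^2\\
&\qquad + \mathrm{E}\left|\frac{1}{n}x_k^*B_{nk0}^{-1}x_k\,x_k^*B_{nk0}^{-1}X_{k0}\,X_{k0}^*B_{nk0}^{-1}x_k\,x_k^*B_{nk0}^{-1}X_{k0}\right|^2\\
&\qquad + \mathrm{E}\left|\frac{1}{n^2}x_k^*B_{nk0}^{-1}x_k\,x_k^*B_{nk0}^{-1}X_{k0}\,X_{k0}^*B_{nk0}^{-1}x_k\,x_k^*B_{nk0}^{-1}x_k\right|^2\Bigg]\\
&\quad= O(1).
\end{aligned}
\]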
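Thus we obtain
\[
I_1 = n^{-1}\beta_kX_{k0}^*B_{nk0}^{-1}x_k\,X_{k0}^*B_{nk0}^{-1}X_{k0} + n^{-2}\beta_kx_k^*B_{nk0}^{-1}x_k\,X_{k0}^*B_{nk0}^{-1}X_{k0} + \zeta_n, \tag{4.31}
\]
where $\zeta_n = O_{L_2}(n^{-2})$, i.e. $\sqrt{\mathrm{E}|n^2\zeta_n|^2}$ is bounded in $n$. The other terms, $I_2$, $I_3$, and $I_4$, can be controlled similarly, from which one can verify that
\[
(4.25) := I_1 + I_2 + I_3 + I_4,
\]
where
\[
\begin{aligned}
I_2 &= n^{-2}\beta_k(X_{k0}^*B_{nk0}^{-1}X_{k0})^2x_k^*B_{nk0}^{-1}x_k + O_{L_2}(n^{-2}),\\
I_3 &= O_{L_2}(n^{-2}),\\
I_4 &= n^{-1}\beta_kX_{k0}^*B_{nk0}^{-1}X_{k0}\,x_k^*B_{nk0}^{-1}X_{k0}\\
&\quad + n^{-2}\beta_kX_{k0}^*B_{nk0}^{-1}X_{k0}\,x_k^*B_{nk0}^{-1}x_k + O_{L_2}(n^{-2}).
\end{aligned}
\tag{4.32}
\]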
Furthermore,
\[
\begin{aligned}
&\text{the denominator of (4.23)}\\
&\quad= 1 - n^{-1}x_k^*B_{nk1}^{-1}X_{k0} - n^{-1}X_{k0}^*B_{nk1}^{-1}x_k\\
&\qquad + n^{-2}(|x_k^*B_{nk1}^{-1}X_{k0}|^2 - x_k^*B_{nk1}^{-1}x_k\,X_{k0}^*B_{nk1}^{-1}X_{k0})\\
&\quad= 1 - n^{-1}\beta_kx_k^*B_{nk0}^{-1}X_{k0} - n^{-1}\beta_kX_{k0}^*B_{nk0}^{-1}x_k\\
&\qquad + n^{-2}\big((1 + a_nx_k^*B_{nk0}^{-1}x_k)|\beta_kx_k^*B_{nk0}^{-1}X_{k0}|^2\\
&\qquad - \beta_kx_k^*B_{nk0}^{-1}x_k\,X_{k0}^*B_{nk0}^{-1}X_{k0}\big)\\
&\quad= 1 - n^{-1}\beta_kx_k^*B_{nk0}^{-1}X_{k0} - n^{-1}\beta_kX_{k0}^*B_{nk0}^{-1}x_k\\
&\qquad - n^{-2}\beta_kx_k^*B_{nk0}^{-1}x_k\,X_{k0}^*B_{nk0}^{-1}X_{k0} + O_{L_2}(n^{-2})\\
&\quad= 1 + O_{L_2}(n^{-1}).
\end{aligned}
\tag{4.33}
\]
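Collecting the results in (4.23)–(4.33), we obtain
\[
\begin{aligned}
X_k^*(4.23)X_k
&= n^{-2}\beta_k(X_{k0}^*B_{nk0}^{-1}X_{k0})^2x_k^*B_{nk0}^{-1}x_k\\
&\quad + n^{-1}\beta_kX_{k0}^*B_{nk0}^{-1}X_{k0}\,X_{k0}^*B_{nk0}^{-1}x_k\\
&\quad + 2n^{-2}\beta_kX_{k0}^*B_{nk0}^{-1}X_{k0}\,x_k^*B_{nk0}^{-1}x_k\\
&\quad + n^{-1}\beta_kx_k^*B_{nk0}^{-1}X_{k0}\,X_{k0}^*B_{nk0}^{-1}X_{k0} + O_{L_2}(n^{-2}).
\end{aligned}
\tag{4.34}
\]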
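Also, we have
\[
\begin{aligned}
&X_k^*B_{nk1}^{-1}X_k - X_{k0}^*B_{nk0}^{-1}X_{k0}\\
&\quad= n^{-1}\beta_k(X_{k0}^*B_{nk0}^{-1}x_k + x_k^*B_{nk0}^{-1}X_{k0} + n^{-1}x_k^*B_{nk0}^{-1}x_k)\\
&\qquad - a_n\beta_k|x_k^*B_{nk0}^{-1}X_{k0}|^2
\end{aligned}
\tag{4.35}
\]
and
\[
-\frac{1}{n}\operatorname{tr}(B_{nk}^{-1} - B_{nk0}^{-1})\Sigma = n^{-1}a_n\beta_kx_k^*B_{nk0}^{-1}\Sigma B_{nk0}^{-1}x_k + O_{L_2}(n^{-2}). \tag{4.36}
\]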
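Identity (4.35) is exact, and it can be checked on a small example. The sketch below does so in rational arithmetic under two assumptions that are inferred from the surrounding formulas rather than stated in this passage: $X_k = X_{k0} + n^{-1}x_k$ and $B_{nk1} = B_{nk0} + a_nx_kx_k^*$. All data are real, so $X_{k0}^*B_{nk0}^{-1}x_k = x_k^*B_{nk0}^{-1}X_{k0}$. The numerical values and the helper names `inv2` and `quad` are illustrative.

```python
from fractions import Fraction as F


def inv2(m):
    """Inverse of a 2x2 matrix given as ((a, b), (c, d))."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return ((d / det, -b / det), (-c / det, a / det))


def quad(u, m, v):
    """Bilinear form u^T m v for length-2 vectors and a 2x2 matrix."""
    return sum(u[i] * m[i][j] * v[j] for i in range(2) for j in range(2))


# Illustrative inputs (all assumed).
n = 10
a_n = F(1, 9)
B0 = ((F(3), F(1)), (F(1), F(2)))        # stand-in for B_{nk0}
x = (F(1), F(-2))                         # stand-in for x_k
X0 = (F(1, 2), F(3, 4))                   # stand-in for X_{k0}
Xk = tuple(X0[i] + x[i] / n for i in range(2))   # assumed: X_k = X_{k0} + x_k / n

B0_inv = inv2(B0)
B1 = tuple(tuple(B0[i][j] + a_n * x[i] * x[j] for j in range(2)) for i in range(2))
B1_inv = inv2(B1)                         # assumed: B_{nk1} = B_{nk0} + a_n x_k x_k^T

xBx = quad(x, B0_inv, x)                  # x_k^T B_{nk0}^{-1} x_k
xBX = quad(x, B0_inv, X0)                 # x_k^T B_{nk0}^{-1} X_{k0}
beta_k = 1 / (1 + a_n * xBx)

# Left side of (4.35).
lhs = quad(Xk, B1_inv, Xk) - quad(X0, B0_inv, X0)
# Right side of (4.35); the two cross terms coincide for real symmetric data.
rhs = beta_k / n * (2 * xBX + xBx / n) - a_n * beta_k * xBX**2
```

The check relies on $1 - a_n\beta_kx_k^*B_{nk0}^{-1}x_k = \beta_k$, which is what collapses the cross terms and the $n^{-2}$ term into the factor $\beta_k$.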
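It thus follows that
\begin{align*}
g(X_k) - g(X_{k0}) &= n^{-1/2}\beta_k(X_{k0}^*B_{nk0}^{-1}x_k + x_k^*B_{nk0}^{-1}X_{k0})(1 + X_{k0}^*B_{nk0}^{-1}X_{k0})\\
&\quad - n^{1/2}a_n\beta_k|x_k^*B_{nk0}^{-1}X_{k0}|^2 \tag{4.37}\\
&\quad + n^{-3/2}\beta_kx_k^*B_{nk0}^{-1}x_k(1 + X_{k0}^*B_{nk0}^{-1}X_{k0})^2 \tag{4.38}\\
&\quad + n^{-1/2}a_n\beta_kx_k^*B_{nk0}^{-1}\Sigma B_{nk0}^{-1}x_k \tag{4.39}\\
&\quad + O_{L_2}(n^{-3/2}). \tag{4.40}
\end{align*}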
It is easy to check that the order of the difference between $1/(1 + \frac{1}{n}x_k^*B_{nk0}^{-1}x_k)$ and $1/(1 + a_nx_k^*B_{nk0}^{-1}x_k)$ is $O_{L_2}(n^{-1})$. So, we simplify the calculation by using $1/(1 + \frac{1}{n}x_k^*B_{nk0}^{-1}x_k)$ instead of $1/(1 + a_nx_k^*B_{nk0}^{-1}x_k)$. Similarly, we use $1/n$ instead of $a_n$.
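Let $\beta_{k0} = 1/(1 + \frac{1}{n}\operatorname{tr}\Sigma B_{nk0}^{-1})$, $\gamma_k = \frac{1}{n}(x_k^*B_{nk0}^{-1}x_k - \operatorname{tr}\Sigma B_{nk0}^{-1})$, and $\varepsilon_k = \frac{1}{n}(x_k^*B_{nk0}^{-1}\Sigma B_{nk0}^{-1}x_k - \operatorname{tr}(\Sigma B_{nk0}^{-1})^2)$. One can verify that
\[
\begin{aligned}
\beta_k &= \frac{1}{1 + n^{-1}x_k^*B_{nk0}^{-1}x_k}\\
&= \frac{1}{1 + n^{-1}\operatorname{tr}\Sigma B_{nk0}^{-1}} - \frac{n^{-1}(x_k^*B_{nk0}^{-1}x_k - \operatorname{tr}\Sigma B_{nk0}^{-1})}{(1 + n^{-1}\operatorname{tr}\Sigma B_{nk0}^{-1})(1 + n^{-1}x_k^*B_{nk0}^{-1}x_k)}\\
&= \beta_{k0} - \beta_k\beta_{k0}\gamma_k
\end{aligned}
\]
and
\[
1 - \beta_k = \frac{n^{-1}x_k^*B_{nk0}^{-1}x_k}{1 + n^{-1}x_k^*B_{nk0}^{-1}x_k} = \frac{1}{n}\beta_kx_k^*B_{nk0}^{-1}x_k.
\]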
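By Lemmas 4.1–4.4, and because $\beta_{k0}$ is bounded, $B_{nk0}^{-1}\Sigma B_{nk0}^{-1}$ is bounded in a spectral norm, and $\frac{1}{n}\operatorname{tr}(\Sigma B_{nk0}^{-1})^2 = O_{L_2}(1)$, we get
\[
\begin{aligned}
(4.37)
&= n^{-1/2}\beta_k[(X_{k0}^*B_{nk0}^{-1}x_k + x_k^*B_{nk0}^{-1}X_{k0})(1 + X_{k0}^*B_{nk0}^{-1}X_{k0}) - |x_k^*B_{nk0}^{-1}X_{k0}|^2]\\
&= -n^{-1/2}\beta_{k0}X_{k0}^*B_{nk0}^{-1}\Sigma B_{nk0}^{-1}X_{k0}\\
&\quad + n^{-1/2}\beta_{k0}(X_{k0}^*B_{nk0}^{-1}x_k + x_k^*B_{nk0}^{-1}X_{k0})(1 + X_{k0}^*B_{nk0}^{-1}X_{k0})\\
&\quad - n^{-1/2}\beta_{k0}X_{k0}^*B_{nk0}^{-1}(x_kx_k^* - \Sigma)B_{nk0}^{-1}X_{k0} - n^{-1/2}\beta_{k0}^2\gamma_k\\
&\qquad\times[(X_{k0}^*B_{nk0}^{-1}x_k + x_k^*B_{nk0}^{-1}X_{k0})(1 + X_{k0}^*B_{nk0}^{-1}X_{k0}) - |x_k^*B_{nk0}^{-1}X_{k0}|^2]\\
&\quad + n^{-1/2}\beta_k\beta_{k0}^2\gamma_k^2[(X_{k0}^*B_{nk0}^{-1}x_k + x_k^*B_{nk0}^{-1}X_{k0})(1 + X_{k0}^*B_{nk0}^{-1}X_{k0})\\
&\qquad - |x_k^*B_{nk0}^{-1}X_{k0}|^2],
\end{aligned}
\tag{4.41}
\]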
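\[
\begin{aligned}
(4.38)
&= n^{-3/2}\beta_kx_k^*B_{nk0}^{-1}x_k(1 + X_{k0}^*B_{nk0}^{-1}X_{k0})^2\\
&= n^{-1/2}(1 - \beta_k)(1 + X_{k0}^*B_{nk0}^{-1}X_{k0})^2\\
&= n^{-1/2}(1 - \beta_{k0} + \beta_{k0}^2\gamma_k - \beta_k\beta_{k0}^2\gamma_k^2)(1 + X_{k0}^*B_{nk0}^{-1}X_{k0})^2\\
&= n^{-1/2}(1 - \beta_{k0})(1 + X_{k0}^*B_{nk0}^{-1}X_{k0})^2 + n^{-1/2}\beta_{k0}^2\gamma_k(1 + X_{k0}^*B_{nk0}^{-1}X_{k0})^2\\
&\quad - n^{-1/2}\beta_k\beta_{k0}^2\gamma_k^2(1 + X_{k0}^*B_{nk0}^{-1}X_{k0})^2,
\end{aligned}
\tag{4.42}
\]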
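and
\[
\begin{aligned}
(4.39)
&= n^{-1/2}\beta_k\varepsilon_k + n^{-1/2}\beta_k\left(\frac{1}{n}\operatorname{tr}(\Sigma B_{nk0}^{-1})^2\right)\\
&= n^{-1/2}(\beta_{k0} - \beta_{k0}^2\gamma_k + \beta_k\beta_{k0}^2\gamma_k^2)\varepsilon_k\\
&\quad + n^{-1/2}(\beta_{k0} - \beta_{k0}^2\gamma_k + \beta_k\beta_{k0}^2\gamma_k^2)\left(\frac{1}{n}\operatorname{tr}(\Sigma B_{nk0}^{-1})^2\right)\\
&= n^{-1/2}\beta_{k0}\left(\frac{1}{n}\operatorname{tr}(\Sigma B_{nk0}^{-1})^2\right) + n^{-1/2}\beta_{k0}\varepsilon_k\\
&\quad - n^{-1/2}\beta_{k0}^2\gamma_k\left(\frac{1}{n}\operatorname{tr}(\Sigma B_{nk0}^{-1})^2\right) - n^{-1/2}(\beta_{k0}^2\gamma_k - \beta_k\beta_{k0}^2\gamma_k^2)\varepsilon_k\\
&\quad + n^{-1/2}\beta_k\beta_{k0}^2\gamma_k^2\left(\frac{1}{n}\operatorname{tr}(\Sigma B_{nk0}^{-1})^2\right),
\end{aligned}
\tag{4.43}
\]
where
\[
n^{-1/2}(\beta_{k0}^2\gamma_k - \beta_k\beta_{k0}^2\gamma_k^2)\varepsilon_k - n^{-1/2}\beta_k\beta_{k0}^2\gamma_k^2\left(\frac{1}{n}\operatorname{tr}(\Sigma B_{nk0}^{-1})^2\right) = O_{L_2}(n^{-3/2}).
\]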
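From (4.37)–(4.43), we get
\[
g(X_k) - g(X_{k0}) := J_{k1x} + J_{k2x} + J_{k3x} + J_{k4x} + J_{k5x}, \tag{4.44}
\]
where
\[
\begin{aligned}
J_{k1x} &= -n^{-1/2}\beta_{k0}X_{k0}^*B_{nk0}^{-1}\Sigma B_{nk0}^{-1}X_{k0}\\
&\quad + n^{-1/2}(1 - \beta_{k0})(1 + X_{k0}^*B_{nk0}^{-1}X_{k0})^2\\
&\quad + n^{-1/2}\beta_{k0}\left(\frac{1}{n}\operatorname{tr}(\Sigma B_{nk0}^{-1})^2\right),\\
J_{k2x} &= n^{-1/2}\beta_{k0}(X_{k0}^*B_{nk0}^{-1}X_{k0} + 1)(X_{k0}^*B_{nk0}^{-1}x_k + x_k^*B_{nk0}^{-1}X_{k0})\\
&\quad - n^{-1/2}\beta_{k0}X_{k0}^*B_{nk0}^{-1}(x_kx_k^* - \Sigma)B_{nk0}^{-1}X_{k0}\\
&\quad + n^{-1/2}\beta_{k0}^2\gamma_k(1 + X_{k0}^*B_{nk0}^{-1}X_{k0})^2\\
&\quad + n^{-1/2}\beta_{k0}\varepsilon_k - n^{-1/2}\beta_{k0}^2\gamma_k\left(\frac{1}{n}\operatorname{tr}(\Sigma B_{nk0}^{-1})^2\right),\\
J_{k3x} &= -n^{-1/2}\beta_{k0}^2\gamma_k[(X_{k0}^*B_{nk0}^{-1}x_k + x_k^*B_{nk0}^{-1}X_{k0})\\
&\quad\times(1 + X_{k0}^*B_{nk0}^{-1}X_{k0}) - |x_k^*B_{nk0}^{-1}X_{k0}|^2],\\
J_{k4x} &= n^{-1/2}\beta_k\beta_{k0}^2\gamma_k^2[(X_{k0}^*B_{nk0}^{-1}x_k + x_k^*B_{nk0}^{-1}X_{k0})\\
&\quad\times(1 + X_{k0}^*B_{nk0}^{-1}X_{k0}) - |x_k^*B_{nk0}^{-1}X_{k0}|^2]\\
&\quad - n^{-1/2}\beta_k\beta_{k0}^2\gamma_k^2(1 + X_{k0}^*B_{nk0}^{-1}X_{k0})^2,
\end{aligned}
\]
and
\[
J_{k5x} = O_{L_2}(n^{-3/2}).
\]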
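By similar arguments, one can prove that
\[
g(X_{k-1}) - g(X_{k0}) = J_{k1} + J_{k2} + J_{k3} + J_{k4} + J_{k5}, \tag{4.45}
\]
where $\{J_{ki}, i = 2, \ldots, 5\}$ are defined similarly to $J_{kix}$, with $x_k$ replaced by $y_k$ and $J_{k1x} = J_{k1}$. From (4.17) and (4.45), we have
\[
\begin{aligned}
&\mathrm{E}e^{itg(X)} - \mathrm{E}e^{itg(Y)}\\
&\quad= \sum_{k=1}^n \mathrm{E}e^{it(g(X_{k0}) + J_{k1})}\big(e^{it(J_{k2x} + J_{k3x} + J_{k4x} + J_{k5x})} - e^{it(J_{k2} + J_{k3} + J_{k4} + J_{k5})}\big).
\end{aligned}
\tag{4.46}
\]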
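For $J_{k4x}$, we have
\[
\begin{aligned}
\mathrm{E}|J_{k4x}| &\le \mathrm{E}|n^{-1/2}\beta_k\beta_{k0}^2\gamma_k^2X_{k0}^*B_{nk0}^{-1}X_{k0}\,X_{k0}^*B_{nk0}^{-1}x_k|\\
&\quad + \mathrm{E}|n^{-1/2}\beta_k\beta_{k0}^2\gamma_k^2X_{k0}^*B_{nk0}^{-1}x_k|\\
&\quad + \mathrm{E}|n^{-1/2}\beta_k\beta_{k0}^2\gamma_k^2X_{k0}^*B_{nk0}^{-1}X_{k0}\,x_k^*B_{nk0}^{-1}X_{k0}|\\
&\quad + \mathrm{E}|n^{-1/2}\beta_k\beta_{k0}^2\gamma_k^2x_k^*B_{nk0}^{-1}X_{k0}|\\
&\quad + \mathrm{E}|n^{-1/2}\beta_k\beta_{k0}^2\gamma_k^2(X_{k0}^*B_{nk0}^{-1}x_k)^2|\\
&\quad + \mathrm{E}|n^{-1/2}\beta_k\beta_{k0}^2\gamma_k^2(X_{k0}^*B_{nk0}^{-1}X_{k0})^2|\\
&\quad + \mathrm{E}|2n^{-1/2}\beta_k\beta_{k0}^2\gamma_k^2X_{k0}^*B_{nk0}^{-1}X_{k0}| + \mathrm{E}|n^{-1/2}\beta_k\beta_{k0}^2\gamma_k^2|.
\end{aligned}
\]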
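Let $K$ be an upper bound of $\beta_k$ and $\beta_{k0}$. According to Lemmas 4.1–4.4, we get
\[
\begin{aligned}
&\mathrm{E}|n^{-1/2}\beta_k\beta_{k0}^2\gamma_k^2X_{k0}^*B_{nk0}^{-1}X_{k0}\,X_{k0}^*B_{nk0}^{-1}x_k|\\
&\quad\le Kn^{-1/2}\sqrt{\mathrm{E}|\gamma_k|^4\,\mathrm{E}|X_{k0}^*B_{nk0}^{-1}X_{k0}\,X_{k0}^*B_{nk0}^{-1}x_k|^2}\\
&\quad\le Kn^{-1/2}\sqrt{\mathrm{E}|\gamma_k|^4}\sqrt{\mathrm{E}|X_{k0}^*B_{nk0}^{-1}X_{k0}|^4\,\mathrm{E}|X_{k0}^*B_{nk0}^{-1}x_k|^4}\\
&\quad= o(n^{-1}).
\end{aligned}
\]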
Similarly, the orders of
\[
\begin{gathered}
\mathrm{E}|n^{-1/2}\beta_k\beta_{k0}^2\gamma_k^2X_{k0}^*B_{nk0}^{-1}x_k|,\\
\mathrm{E}|n^{-1/2}\beta_k\beta_{k0}^2\gamma_k^2X_{k0}^*B_{nk0}^{-1}X_{k0}\,x_k^*B_{nk0}^{-1}X_{k0}|,\\
\mathrm{E}|n^{-1/2}\beta_k\beta_{k0}^2\gamma_k^2x_k^*B_{nk0}^{-1}X_{k0}|,\\
\mathrm{E}|n^{-1/2}\beta_k\beta_{k0}^2\gamma_k^2(X_{k0}^*B_{nk0}^{-1}x_k)^2|,\\
\mathrm{E}|n^{-1/2}\beta_k\beta_{k0}^2\gamma_k^2(X_{k0}^*B_{nk0}^{-1}X_{k0})^2|,\\
\mathrm{E}|2n^{-1/2}\beta_k\beta_{k0}^2\gamma_k^2X_{k0}^*B_{nk0}^{-1}X_{k0}|,
\end{gathered}
\]
and $\mathrm{E}|n^{-1/2}\beta_k\beta_{k0}^2\gamma_k^2|$ are all $o(n^{-1})$. In addition, $J_{k4}$ has the same order as $J_{k4x}$. By the inequality $|e^{ia} - 1| \le |a|$ for any $a \in \mathbb{R}$, $\mathrm{E}|J_{k4x}| = o(n^{-1})$, and $\mathrm{E}|J_{k4}| = o(n^{-1})$, we have
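\[
\begin{aligned}
&|\mathrm{E}e^{it(g(X_{k0}) + J_{k1})}e^{it(J_{k2x} + J_{k3x})}(e^{it(J_{k4x} + J_{k5x})} - 1)|\\
&\quad\le \mathrm{E}|e^{it(J_{k4x} + J_{k5x})} - 1|\\
&\quad\le \mathrm{E}|t(J_{k4x} + J_{k5x})| = o(n^{-1}),
\end{aligned}
\tag{4.47}
\]
and similarly,
\[
|\mathrm{E}e^{it(g(X_{k0}) + J_{k1})}e^{it(J_{k2} + J_{k3})}(e^{it(J_{k4} + J_{k5})} - 1)| = o(n^{-1}). \tag{4.48}
\]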
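The bounds (4.47) and (4.48) rest on the elementary inequality $|e^{ia} - 1| \le |a|$ for real $a$, which follows from $|e^{ia} - 1| = 2|\sin(a/2)|$. The following minimal sketch checks it on a grid of real values; the grid range and the helper name `gap` are arbitrary illustrative choices.

```python
import cmath


def gap(a: float) -> float:
    """Return |a| - |exp(ia) - 1|, which the inequality says is nonnegative."""
    return abs(a) - abs(cmath.exp(1j * a) - 1)


# Evaluate the gap on a grid of real arguments in [-20, 20].
grid = [k / 100 for k in range(-2000, 2001)]
min_gap = min(gap(a) for a in grid)
```

The minimum is attained at $a = 0$, where both sides vanish, so `min_gap` is zero up to floating-point rounding.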
Next, we show that
\[
\sum_{k=1}^n \mathrm{E}e^{it(g(X_{k0}) + J_{k1})}\big(e^{it(J_{k2x} + J_{k3x})} - e^{it(J_{k2} + J_{k3})}\big) \to 0. \tag{4.49}
\]
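Applying the Taylor expansion, we have
\[
\begin{aligned}
&\sum_{k=1}^n \mathrm{E}e^{it(g(X_{k0}) + J_{k1})}\big(e^{it(J_{k2x} + J_{k3x})} - e^{it(J_{k2} + J_{k3})}\big)\\
&\quad\le \sum_{k=1}^n \mathrm{E}\bigg|1 + it(J_{k2x} + J_{k3x}) - \frac{1}{2}t^2(J_{k2x} + J_{k3x})^2\\
&\qquad - 1 - it(J_{k2} + J_{k3}) + \frac{1}{2}t^2(J_{k2} + J_{k3})^2 + o_{L_2}(n^{-1})\bigg|,
\end{aligned}
\tag{4.50}
\]
where $\zeta_n = o_{L_2}(a_n)$ means $\sqrt{\mathrm{E}|a_n^{-1}\zeta_n|^2} \to 0$. Notice that
\[
\mathrm{E}_kJ_{k2x} = \mathrm{E}_kJ_{k2} = 0, \quad \mathrm{E}_kJ_{k2x}^2 = \mathrm{E}_kJ_{k2}^2, \quad \mathrm{E}_kJ_{k3x} = \mathrm{E}_kJ_{k3}, \tag{4.51}
\]
where $\mathrm{E}_k$ denotes the conditional expectation with respect to the $\sigma$-field generated by the random variables $\{x_1, \ldots, x_{k-1}, y_{k+1}, \ldots, y_n\}$. Then
\[
\mathrm{E}\left(it|\mathrm{E}_k(J_{k2x} - J_{k2})| + it|\mathrm{E}_k(J_{k3x} - J_{k3})| + \frac{t^2}{2}|\mathrm{E}_k(J_{k2x}^2 - J_{k2}^2)|\right) = 0.
\]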
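The role of the moment matching in (4.51) can be illustrated with a toy scalar analogue, which is not the setting of the proof. Take $\xi$ Rademacher and $\eta$ standard normal, which share mean 0 and variance 1, and compare the characteristic functions of $\xi/\sqrt{n}$ and $\eta/\sqrt{n}$. Their second-order Taylor expansions coincide, so each difference is of order $n^{-2}$ and the sum over $n$ replacement steps vanishes. The function name `summed_gap` and the choice $t = 1$ are illustrative.

```python
import math


def summed_gap(t: float, n: int) -> float:
    """n * |E exp(it*xi/sqrt(n)) - E exp(it*eta/sqrt(n))|.

    xi is Rademacher (+1 or -1 with probability 1/2 each) and eta is N(0, 1);
    both characteristic functions are available in closed form.
    """
    u = t / math.sqrt(n)
    cf_rademacher = math.cos(u)             # E exp(iu * xi)
    cf_gaussian = math.exp(-u * u / 2.0)    # E exp(iu * eta)
    return n * abs(cf_rademacher - cf_gaussian)


# The summed discrepancy shrinks as n grows, roughly like t^4 / (12 n).
gaps = [summed_gap(1.0, n) for n in (10, 100, 1000, 10000)]
```

In the proof the same cancellation happens conditionally on the remaining variables, which is why only the terms involving $J_{k3x}$ and the cross products survive in the bound that follows.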
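In addition, we have
\[
\begin{aligned}
&\sum_{k=1}^n \mathrm{E}\bigg|1 + it(J_{k2x} + J_{k3x}) - \frac{1}{2}t^2(J_{k2x} + J_{k3x})^2 - 1 - it(J_{k2} + J_{k3})\\
&\qquad + \frac{1}{2}t^2(J_{k2} + J_{k3})^2 + o_{L_2}(n^{-1})\bigg|\\
&\quad\le \sum_{k=1}^n \mathrm{E}\left(it|\mathrm{E}_k(J_{k2x} - J_{k2})| + it|\mathrm{E}_k(J_{k3x} - J_{k3})| + \frac{t^2}{2}|\mathrm{E}_k(J_{k2x}^2 - J_{k2}^2)|\right)\\
&\qquad + \sum_{k=1}^n \mathrm{E}\left(\frac{t^2}{2}|J_{k3x}^2 - J_{k3}^2| + t^2|J_{k2x}J_{k3x} - J_{k2}J_{k3}| + o_{L_2}(n^{-1})\right).
\end{aligned}
\]
Therefore, we obtain
\[
(4.50) \le \sum_{k=1}^n \mathrm{E}\left(\frac{t^2}{2}|J_{k3x}^2 - J_{k3}^2| + t^2|J_{k2x}J_{k3x} - J_{k2}J_{k3}| + o_{L_2}(n^{-1})\right). \tag{4.52}
\]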
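Obviously, for $J_{k2x}^2$, we have
\[
\begin{aligned}
\mathrm{E}|J_{k2x}|^2
&\le \mathrm{E}|n^{-1/2}\beta_{k0}X_{k0}^*B_{nk0}^{-1}X_{k0}\,X_{k0}^*B_{nk0}^{-1}x_k|^2 + \mathrm{E}|n^{-1/2}\beta_{k0}X_{k0}^*B_{nk0}^{-1}x_k|^2\\
&\quad + \mathrm{E}|n^{-1/2}\beta_{k0}X_{k0}^*B_{nk0}^{-1}X_{k0}\,x_k^*B_{nk0}^{-1}X_{k0}|^2 + \mathrm{E}|n^{-1/2}\beta_{k0}x_k^*B_{nk0}^{-1}X_{k0}|^2\\
&\quad + \mathrm{E}|n^{-1/2}\beta_{k0}(X_{k0}^*B_{nk0}^{-1}x_k)^2|^2 + \mathrm{E}|n^{-1/2}\beta_{k0}X_{k0}^*B_{nk0}^{-1}\Sigma B_{nk0}^{-1}X_{k0}|^2\\
&\quad + \mathrm{E}|n^{-1/2}\beta_{k0}^2\gamma_k(X_{k0}^*B_{nk0}^{-1}X_{k0})^2|^2 + \mathrm{E}|2n^{-1/2}\beta_{k0}^2\gamma_kX_{k0}^*B_{nk0}^{-1}X_{k0}|^2\\
&\quad + \mathrm{E}|n^{-1/2}\beta_{k0}^2\gamma_k|^2 + \mathrm{E}|n^{-1/2}\beta_{k0}\varepsilon_k|^2 + \mathrm{E}\left|n^{-1/2}\beta_{k0}^2\gamma_k\left(\frac{1}{n}\operatorname{tr}(\Sigma B_{nk0}^{-1})^2\right)\right|^2.
\end{aligned}
\]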
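According to Lemmas 4.1–4.4, we get
\[
\begin{aligned}
&\mathrm{E}|n^{-1/2}\beta_{k0}X_{k0}^*B_{nk0}^{-1}X_{k0}\,X_{k0}^*B_{nk0}^{-1}x_k|^2\\
&\quad\le Kn^{-1}\mathrm{E}|X_{k0}^*B_{nk0}^{-1}X_{k0}|^2|X_{k0}^*B_{nk0}^{-1}x_k|^2\\
&\quad\le Kn^{-1}\sqrt{\mathrm{E}|X_{k0}^*B_{nk0}^{-1}X_{k0}|^4\,\mathrm{E}|X_{k0}^*B_{nk0}^{-1}x_k|^4}\\
&\quad= O(n^{-1}).
\end{aligned}
\]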
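Similarly,
\[
\begin{gathered}
\mathrm{E}|n^{-1/2}\beta_{k0}X_{k0}^*B_{nk0}^{-1}x_k|^2,\\
\mathrm{E}|n^{-1/2}\beta_{k0}X_{k0}^*B_{nk0}^{-1}X_{k0}\,x_k^*B_{nk0}^{-1}X_{k0}|^2,\\
\mathrm{E}|n^{-1/2}\beta_{k0}x_k^*B_{nk0}^{-1}X_{k0}|^2,\\
\mathrm{E}|n^{-1/2}\beta_{k0}(X_{k0}^*B_{nk0}^{-1}x_k)^2|^2
\end{gathered}
\]
and $\mathrm{E}|n^{-1/2}\beta_{k0}X_{k0}^*B_{nk0}^{-1}\Sigma B_{nk0}^{-1}X_{k0}|^2$ are all $O(n^{-1})$. Also, we have
\[
\begin{aligned}
\mathrm{E}|n^{-1/2}\beta_{k0}^2\gamma_k(X_{k0}^*B_{nk0}^{-1}X_{k0})^2|^2
&\le Kn^{-1}\mathrm{E}|\gamma_k|^2|X_{k0}^*B_{nk0}^{-1}X_{k0}|^4\\
&\le Kn^{-1}\sqrt{\mathrm{E}|\gamma_k|^4\,\mathrm{E}|X_{k0}^*B_{nk0}^{-1}X_{k0}|^8}\\
&= o(n^{-3/2})
\end{aligned}
\]
and
\[
\mathrm{E}|2n^{-1/2}\beta_{k0}^2\gamma_kX_{k0}^*B_{nk0}^{-1}X_{k0}|^2 = o(n^{-3/2}).
\]
According to Lemma 4.1,
\[
\mathrm{E}|n^{-1/2}\beta_{k0}^2\gamma_k|^2, \quad \mathrm{E}|n^{-1/2}\beta_{k0}\varepsilon_k|^2, \quad\text{and}\quad \mathrm{E}\left|n^{-1/2}\beta_{k0}^2\gamma_k\left(\frac{1}{n}\operatorname{tr}(\Sigma B_{nk0}^{-1})^2\right)\right|^2
\]
are all $O(n^{-2})$. Thus, $\mathrm{E}|J_{k2x}|^2 = O(n^{-1})$. By the same argument, $\mathrm{E}|J_{k2}|^2 = O(n^{-1})$.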
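For $J_{k3x}^2$, we have
\[
\begin{aligned}
\mathrm{E}|J_{k3x}|^2 &\le \mathrm{E}|n^{-1/2}\beta_{k0}^2\gamma_kX_{k0}^*B_{nk0}^{-1}X_{k0}\,X_{k0}^*B_{nk0}^{-1}x_k|^2\\
&\quad + \mathrm{E}|n^{-1/2}\beta_{k0}^2\gamma_kX_{k0}^*B_{nk0}^{-1}x_k|^2\\
&\quad + \mathrm{E}|n^{-1/2}\beta_{k0}^2\gamma_kX_{k0}^*B_{nk0}^{-1}X_{k0}\,x_k^*B_{nk0}^{-1}X_{k0}|^2\\
&\quad + \mathrm{E}|n^{-1/2}\beta_{k0}^2\gamma_kx_k^*B_{nk0}^{-1}X_{k0}|^2\\
&\quad + \mathrm{E}|n^{-1/2}\beta_{k0}^2\gamma_k(X_{k0}^*B_{nk0}^{-1}x_k)^2|^2.
\end{aligned}
\]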
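Similarly,
\[
\begin{aligned}
&\mathrm{E}|n^{-1/2}\beta_{k0}^2\gamma_kX_{k0}^*B_{nk0}^{-1}X_{k0}\,X_{k0}^*B_{nk0}^{-1}x_k|^2\\
&\quad\le Kn^{-1}\sqrt{\mathrm{E}|\gamma_k|^4\,\mathrm{E}|X_{k0}^*B_{nk0}^{-1}X_{k0}|^4|X_{k0}^*B_{nk0}^{-1}x_k|^4}\\
&\quad\le Kn^{-1}\sqrt{\mathrm{E}|\gamma_k|^4}\sqrt{\mathrm{E}|X_{k0}^*B_{nk0}^{-1}X_{k0}|^8\,\mathrm{E}|X_{k0}^*B_{nk0}^{-1}x_k|^8}\\
&\quad= o(n^{-3/2}).
\end{aligned}
\]
The orders of
\[
\begin{gathered}
\mathrm{E}|n^{-1/2}\beta_{k0}^2\gamma_kX_{k0}^*B_{nk0}^{-1}x_k|^2,\\
\mathrm{E}|n^{-1/2}\beta_{k0}^2\gamma_kX_{k0}^*B_{nk0}^{-1}X_{k0}\,x_k^*B_{nk0}^{-1}X_{k0}|^2,\\
\mathrm{E}|n^{-1/2}\beta_{k0}^2\gamma_kx_k^*B_{nk0}^{-1}X_{k0}|^2, \text{ and}\\
\mathrm{E}|n^{-1/2}\beta_{k0}^2\gamma_k(X_{k0}^*B_{nk0}^{-1}x_k)^2|^2
\end{gathered}
\]
are $o(n^{-3/2})$. Hence, $\mathrm{E}|J_{k3x}|^2 = o(n^{-3/2})$, and by the same argument the order of $\mathrm{E}|J_{k3}|^2$ is $o(n^{-3/2})$.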
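Finally, we get
\[
\mathrm{E}|J_{k3x}^2 - J_{k3}^2| \le \mathrm{E}|J_{k3x}|^2 + \mathrm{E}|J_{k3}|^2 = o(n^{-3/2})
\]
and
\[
\begin{aligned}
\mathrm{E}|J_{k2x}J_{k3x} - J_{k2}J_{k3}| &\le \mathrm{E}|J_{k2x}J_{k3x}| + \mathrm{E}|J_{k2}J_{k3}|\\
&\le \sqrt{\mathrm{E}|J_{k2x}|^2\,\mathrm{E}|J_{k3x}|^2} + \sqrt{\mathrm{E}|J_{k2}|^2\,\mathrm{E}|J_{k3}|^2}\\
&= o(n^{-5/4}),
\end{aligned}
\]
which gives the result in (4.49), and the proof is complete.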
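The rate bookkeeping behind this last step can be written out explicitly. The sketch below tracks only the exponents of $n$: the Cauchy–Schwarz bound halves the sum of the two second-moment exponents, and summing $n$ terms adds 1 to an exponent. A negative exponent after summation means the corresponding sum in (4.52) tends to zero. The variable names are illustrative.

```python
from fractions import Fraction as F

# Exponents of n in the second-moment bounds established in the text.
rate_Jk2_sq = F(-1)        # E|J_{k2x}|^2 = O(n^{-1})
rate_Jk3_sq = F(-3, 2)     # E|J_{k3x}|^2 = o(n^{-3/2})

# Cauchy-Schwarz: E|J_{k2x} J_{k3x}| <= sqrt(E|J_{k2x}|^2 * E|J_{k3x}|^2),
# so the exponent of the cross term is the average of the two exponents.
rate_cross = (rate_Jk2_sq + rate_Jk3_sq) / 2

# Summing over k = 1, ..., n multiplies a uniform bound by n.
sum_cross = rate_cross + 1
sum_Jk3_sq = rate_Jk3_sq + 1
```

Both summed exponents are negative ($-1/4$ and $-1/2$), and the $o_{L_2}(n^{-1})$ remainder sums to $o(1)$, which is consistent with the convergence claimed in (4.49).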
4.3. Proof of Theorem 2.3
Proof. The conclusions of Theorem 2.3 can be obtained by arguments similar to those in Sec. 3 of Bai and Zhou [7], with the order improved from $o(1)$ to $o(1/\sqrt{n})$. For example, the first conclusion (2.5) can be derived from the result above Eq. (3.14) in Bai and Zhou [7], substituting $\mathrm{E}m_n(z)$ for $m_n(z)$. The second conclusion (2.6) can be proved by the identity above Eq. (3.3) in Bai and Zhou [7]. Thus, we omit the details.
Acknowledgments
G.F. Ha and Z.D. Bai were partially supported by NSFC China, Grant 11571067. Q.Y. Zhang was partially supported by Capital University of Economics and Business: The Fundamental Research Funds for Beijing Universities, Grant XRZ2021044, and NSFC China, Grants 11971097, 11771073, and 11571067. Y.G. Wang was supported by the ARC Center of Excellence for Mathematical and Statistical Frontiers and the Australian Research Council Discovery Project DP160104292.
References
[1] T. Anderson, An Introduction to Multivariate Statistical Analysis, 3rd edn. (Wiley, New York, 2003).
[2] Z. D. Bai and H. Saranadasa, Effect of high dimension: By an example of a two sample problem, Stat. Sinica 6(2) (1996) 311–329.
[3] Z. D. Bai and J. W. Silverstein, No eigenvalues outside the support of the limiting spectral distribution of large-dimensional sample covariance matrices, Ann. Probab. 26(1) (1998) 316–345.
[4] Z. D. Bai and J. W. Silverstein, Exact separation of eigenvalues of large dimensional sample covariance matrices, Ann. Probab. 27(3) (1999) 1536–1555.
[5] Z. D. Bai and J. W. Silverstein, CLT for linear spectral statistics of large-dimensional sample covariance matrices, Ann. Probab. 32(1A) (2004) 553–605.
[6] Z. D. Bai and J. W. Silverstein, Spectral Analysis of Large Dimensional Random Matrices, 2nd edn. (Springer, New York, 2010).
[7] Z. D. Bai and W. Zhou, Large sample covariance matrices without independence structures in columns, Stat. Sinica 18(2) (2008) 425–442.
[8] C. Berge, Principles of Combinatorics (Academic Press, New York, 1971).
[9] L. S. Chen, D. Paul, R. L. Prentice and P. Wang, A regularized Hotelling's $t^2$ test for pathway analysis in proteomic studies, J. Am. Stat. Assoc. 106(496) (2011) 1345–1360.
[10] S. X. Chen and Y. L. Qin, A two-sample test for high-dimensional data with applications to gene-set testing, Ann. Stat. 38(2) (2010) 808–835.
[11] H. Hotelling, The generalization of Student's ratio, Ann. Math. Stat. 2 (1931) 360–378.
[12] D. D. Jiang and Z. D. Bai, Generalized four moment theorem and an application to CLT for spiked eigenvalues of large-dimensional covariance matrices, Bernoulli 27(1) (2021) 274–294.
[13] J. W. Silverstein, Strong convergence of the empirical distribution of eigenvalues of large dimensional random matrices, J. Multivariate Anal. 55(2) (1995) 331–339.
[14] M. S. Srivastava, Multivariate theory for analyzing high dimensional data, J. Japan Stat. Soc. 37 (2007) 53–86.
[15] M. S. Srivastava and M. Du, A test for the mean vector with fewer observations than the dimension, J. Multivariate Anal. 99(3) (2008) 386–402.
[16] T. Tao and V. Vu, Random matrices: Universality of local statistics of eigenvalues, Ann. Probab. 40(3) (2012) 1285–1315.
[17] Y. Q. Yin, Limiting spectral distribution for a class of random matrices, J. Multivariate Anal. 20(1) (1986) 50–68.
[18] Y. Q. Yin and P. R. Krishnaiah, A limit-theorem for the eigenvalues of product of 2 random matrices, J. Multivariate Anal. 13(4) (1983) 489–507.