
Department of Mathematics

HIGH-DIMENSIONAL PROFILE ANALYSIS

Cigdem Cengiz and Dietrich von Rosen

LiTH-MAT-R--2020/07--SE


Linköping University Department of Mathematics SE-581 83 Linköping


HIGH-DIMENSIONAL PROFILE ANALYSIS

Cigdem Cengiz1 and Dietrich von Rosen1,2

1Department of Energy and Technology, Swedish University of Agricultural Sciences,

SE-750 07 Uppsala, Sweden.

2Department of Mathematics, Linköping University,

SE-581 83 Linköping, Sweden.

Abstract

The three tests of profile analysis, the test of parallelism, the test of level and the test of flatness, have been studied. Likelihood ratio tests have been derived. Firstly, a traditional setting, where the sample size is greater than the dimension of the parameter space, is considered. Then, all tests have been derived in a high-dimensional setting. In high-dimensional data analysis, it is necessary to use techniques that tackle the problems which arise with the dimensionality. We propose a dimension reduction method using scores, which was first proposed by Läuter et al. (1996).

Keywords: High-dimensional data; hypothesis testing; linear scores; multivariate analysis; profile analysis; spherical distributions.


Notation

Abbreviations

p.d. : positive definite
p.s.d. : positive semi-definite
i.i.d. : independently and identically distributed
i.e. : that is
e.g. : for example
MANOVA : multivariate analysis of variance
GMANOVA : generalized multivariate analysis of variance
BRM : bilinear regression model
PLS : partial least squares
PCA : principal component analysis
PCR : principal component regression
CLT : central limit theorem
LLN : law of large numbers

Symbols

x : column vector
X : matrix
A′ : transpose of A
A−1 : inverse of A
A+ : Moore-Penrose inverse of A
A− : generalized inverse of A
|A| : determinant of A
C(A) : column space of A
r(A) : rank of A
A⊥ : orthocomplement of subspace A
A◦ : C(A◦) = C(A)⊥

⊗ : Kronecker product
⊕ : orthogonal sum of linear spaces
vec : vec-operator
In : n × n identity matrix
1n : n × 1 vector of ones
E[x] : expectation of x
D[x] : dispersion matrix of x
Np(µ,Σ) : multivariate normal distribution
Np,n(µ,Σ,Ψ) : matrix normal distribution
Wp(Σ, n,∆) : non-central Wishart distribution with n degrees of freedom
Wp(Σ, n) : central Wishart distribution with n degrees of freedom
d= : equal in distribution
(A)( )′ : (A)(A)′


1 Introduction

In this report, we construct test statistics for each of the three hypotheses in profile analysis, first in a classical setting where the number of parameters is less than the number of subjects, and then in a high-dimensional setting where the opposite holds, i.e., the number of parameters exceeds the number of individuals.

In profile analysis, we have multiple variables for each individual, the individuals form different groups (at least two), and the groups are compared based on the mean vectors of these variables. The idea is to see if there is an interaction between groups and responses. Assume we have p variables and q independent groups (treatments); the p-dimensional vectors are denoted x1, x2, ..., xq with mean vectors µ1, µ2, ..., µq. The mean profile for the i-th group is obtained by connecting the points (1, µi1), (2, µi2), ..., (p, µip) with line segments. We can then consider profile analysis as the comparison of these q lines of mean vectors. See Figure 1 for an illustration.

Figure 1: Profiles of q groups.

There are two possible scenarios that can be considered for the responses:

I. The mean of the same variable can be compared between the q groups over several time points (repeated measurements).

II. One can measure different variables for each subject and compare their mean levels between the q groups.

In the literature, the topic has been investigated by many researchers. One of the first and leading papers on this topic was published by Greenhouse and Geisser (1959), and the topic has been revisited by Geisser (2003). Srivastava (1987) derived the likelihood ratio tests together with their distributions for the three hypotheses. A chapter on profile analysis can be found in the books by Srivastava (2002) and Srivastava and Carter (1983). Potthoff and Roy (1964) presented the growth curve model for the first time, and other


extensions within the framework of the growth curve model can be found in Fujikoshi (2009), where Fujikoshi extended profile analysis, especially statistical inference on the parallelism hypothesis. Ohlson and Srivastava (2010) considered profile analysis of several groups, where the groups have partly equal means. Seo, Sakurai and Fujikoshi (2011) derived the likelihood ratio tests for two of the hypotheses, level and flatness, in profile analysis of growth curve data. Another focus has been on profile analysis with a random-effects covariance structure. Srivastava and Singull (2012) constructed tests based on the likelihood ratio, without any restrictions on the parameter space, for testing the covariance matrix for random-effects structure or sphericity. Yokoyama (1995) derived the likelihood ratio criterion with a random-effects covariance structure under the parallel profile model. Yokoyama and Fujikoshi (1993) conducted an analysis of parallel growth curves of groups where they assumed a random-effects covariance structure. They also gave the asymptotic null distributions of the tests.

1.1 Profile analysis of several groups

As mentioned before, there are three types of tests which are commonly considered in profile analysis: the test of parallelism, the test of levels and the test of flatness (Srivastava and Carter, 1983; Srivastava, 1987, 2002). Assume that the ni p-dimensional random vectors xij are independently normally distributed as Np(µi,Σ), j = 1, ..., ni, i = 1, ..., q, where µi = (µ1,i, ..., µp,i)′.

(1) Parallelism hypothesis

H1 : µi − µq = γi1p, i = 1, ..., q − 1, and A1 ≠ H1,

where A1 stands for alternative hypothesis and 1p is a p-dimensional vector of ones.

(2) Level hypothesis

H2|H1 : γi = 0, i = 1, ..., q − 1, and A2 ≠ H2|H1,

where H2|H1 means H2 under the assumption that H1 is true.

(3) Flatness hypothesis

H3|H1 : µ• = γq1p and A3 ≠ H3|H1,

where H3|H1 means H3 under the assumption that H1 is true.

The parameters γi represent unknown scalars and µ• = (1/N) ∑_{i=1}^{q} ni µi, where N is the total sample size, that is, N = n1 + · · · + nq.

If the profiles are parallel, we can say that there is no interaction between the responses and the treatments (groups). Given that the parallelism hypothesis holds, one may want to proceed with testing the second hypothesis, H2, which states that there is no column or treatment (group) effect. Alternatively, if the first hypothesis holds, one may want to proceed with testing the third hypothesis, H3, which states that there is no row effect. Briefly speaking, the level hypothesis under the parallelism hypothesis (H2|H1) states that the q profiles are coincident with each other. The flatness hypothesis under the parallelism hypothesis (H3|H1) states that the q profiles are constant. It is useful to note that failing to reject H1 does not mean that it is true, that is, that the profiles are parallel; it only means that we do not have enough evidence against H1. The second and the third hypotheses are constructed given that the first hypothesis holds.


1.2 Test statistics for the two-sample case

In this section, only the special case of two groups is considered. Let the p-dimensional random vectors x_1^{(i)}, ..., x_{n_i}^{(i)}, i = 1, 2, be independently normally distributed with mean vector µi and covariance matrix Σ. The sample mean vectors, the sample covariance matrices and the pooled sample covariance matrix are given by

x̄^{(i)} = (1/n_i) ∑_{j=1}^{n_i} x_j^{(i)},

S^{(i)} = (1/(n_i − 1)) ∑_{j=1}^{n_i} (x_j^{(i)} − x̄^{(i)})(x_j^{(i)} − x̄^{(i)})′,

S_p = (1/(n_1 + n_2 − 2)) [(n_1 − 1)S^{(1)} + (n_2 − 1)S^{(2)}].

Define a (p − 1) × p matrix C which satisfies C1p = 0 and is of rank r(C) = p − 1. Let

b = n_1 n_2 / (n_1 + n_2),   f = n_1 + n_2 − 2,   u = x̄^{(1)} − x̄^{(2)}.

Then the three hypotheses and related test statistics can be written as below (Srivastava and Carter, 1983; Srivastava, 1987, 2002):

(1) Parallelism hypothesis: H1 : Cµ1 = Cµ2.

The null hypothesis is rejected if

((f − (p − 1) + 1) / (f(p − 1))) b u′C′(CS_pC′)^{−1}Cu ≥ F_{p−1, f−p+2, α},

where F_{p−1, f−p+2, α} denotes the α-percentile of the F-distribution with p − 1 and f − p + 2 degrees of freedom.

(2) Level hypothesis: H2|H1 : 1′pµ1 = 1′pµ2.

The null hypothesis is rejected if

((f − p + 1)/f) b (1′S_p^{−1}u)² (1′S_p^{−1}1)^{−1} (1 + f^{−1}T²_{p−1})^{−1} ≥ t²_{f−p+1, α/2} = F_{1, f−p+1, α},

where T²_{p−1} = b u′C′(CS_pC′)^{−1}Cu and t_{f−p+1, α/2} is the α/2-percentile of the t-distribution with f − p + 1 degrees of freedom.

(3) Flatness hypothesis: H3|H1 : C(µ1 + µ2) = 0.

The null hypothesis is rejected if

(n(f − p + 3)/(p − 1)) x̄′C′(CVC′ + bCuu′C′)^{−1}Cx̄ ≥ F_{p−1, n−p+1, α},

where x̄ = (n_1 x̄^{(1)} + n_2 x̄^{(2)})/(n_1 + n_2) and V = fS_p.

As mentioned before, the second hypothesis is tested given that H1 is true. If one fails to reject the first hypothesis, it cannot be concluded that the profiles are parallel.
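As a numerical sketch of the two-sample parallelism test above (not part of the report; numpy/scipy are used, and the dimensions, group sizes, means and seed are illustrative assumptions):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Illustrative sizes and means (assumptions, not from the report):
# p = 4 responses, two groups with n1 = 12 and n2 = 15 subjects.
p, n1, n2 = 4, 12, 15
mu = np.array([1.0, 2.0, 1.5, 2.5])
X1 = rng.multivariate_normal(mu, np.eye(p), size=n1)        # rows = subjects
X2 = rng.multivariate_normal(mu + 0.5, np.eye(p), size=n2)  # parallel shift

xbar1, xbar2 = X1.mean(axis=0), X2.mean(axis=0)
S1 = np.cov(X1, rowvar=False)                     # divides by n1 - 1
S2 = np.cov(X2, rowvar=False)
Sp = ((n1 - 1) * S1 + (n2 - 1) * S2) / (n1 + n2 - 2)

b = n1 * n2 / (n1 + n2)
f = n1 + n2 - 2
u = xbar1 - xbar2

# C: (p-1) x p successive-difference matrix with C @ 1_p = 0 and rank p - 1
C = np.eye(p - 1, p) - np.eye(p - 1, p, k=1)

# Parallelism statistic ((f - (p-1) + 1)/(f(p-1))) * b u'C'(C Sp C')^{-1} Cu
T2 = b * (C @ u) @ np.linalg.solve(C @ Sp @ C.T, C @ u)
F_stat = (f - p + 2) / (f * (p - 1)) * T2
p_value = stats.f.sf(F_stat, p - 1, f - p + 2)
print(F_stat, p_value)
```

Since the simulated profiles differ only by a constant shift, large p-values are expected here.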


2 Useful definitions and theorems

Definition 2.1. The vector space generated by the columns of an arbitrary matrix A : p × q is denoted C(A):

C(A) = {a : a = Ax, x ∈ Rq}.

Definition 2.2. A matrix whose columns generate the orthogonal complement to C(A) is denoted A◦, i.e., C(A◦) = C(A)⊥. Similar to the generalized inverse, A◦ is not unique. One can choose A◦ = I − (A′)−A′ or A◦ = I − A(A′A)−A′, in addition to some other choices.

Definition 2.3. The space CV(A) denotes a column vector space with an inner product defined through the positive definite matrix V; i.e., for any pair of vectors x and y, the inner product is given by x′V−1y. If V = I, one writes C(A) instead of CI(A).

Definition 2.4. The orthogonal complement to CV(A) is denoted by CV(A)⊥ and is generated by all the vectors orthogonal to all the vectors in CV(A); i.e., for an arbitrary a ∈ CV(A), all the y satisfying y′V−1a = 0 generate the linear space (column vector space) CV(A)⊥.

Definition 2.5. Let V1 and V2 be disjoint subspaces and y = x1 + x2, where x1 ∈ V1 and x2 ∈ V2. The mapping Py = x1 is called a projection of y on V1 along V2, and P is a projector. If V1 and V2 are orthogonal, we say that we have an orthogonal projector. A projector P has the following properties:

(i) PP = P , which means P is an idempotent matrix.

(ii) If P is a projector, then I − P is also a projector.

(iii) P is unique.

(iv) PA = A(A′A)−A′ is a projector on C(A), for which the standard inner product is assumed to hold.

(v) PA,V = A(A′V−1A)−A′V−1 is a projector on CV(A), for which an inner product defined by (x, y) = x′V−1y is assumed to hold and V is p.d.
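The projector properties above can be checked numerically. A minimal numpy sketch (the matrix sizes and random seed are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# A 5 x 2 matrix of full column rank and a positive definite V.
A = rng.standard_normal((5, 2))
G = rng.standard_normal((5, 5))
V = G @ G.T + 5 * np.eye(5)          # p.d. by construction

# Orthogonal projector P_A = A (A'A)^- A' (standard inner product)
PA = A @ np.linalg.solve(A.T @ A, A.T)

# Projector P_{A,V} = A (A' V^{-1} A)^- A' V^{-1}
Vinv = np.linalg.inv(V)
PAV = A @ np.linalg.solve(A.T @ Vinv @ A, A.T @ Vinv)

I5 = np.eye(5)
print(np.allclose(PA @ PA, PA))                     # (i) idempotent: True
print(np.allclose((I5 - PA) @ (I5 - PA), I5 - PA))  # (ii) I - P is a projector: True
print(np.allclose(PAV @ PAV, PAV))                  # (i) for P_{A,V}: True
print(np.allclose(PA @ A, A), np.allclose(PAV @ A, A))  # both project onto C(A)
```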

Definition 2.6. The matrix W : p × p is said to be Wishart distributed if and only if W = XX′ for some matrix X, where X ∼ Np,n(µ,Σ, I), Σ ≥ 0. If µ = 0, we have a central Wishart distribution, which is denoted W ∼ Wp(Σ, n), and if µ ≠ 0, we have a non-central Wishart distribution, which is denoted W ∼ Wp(Σ, n,∆), where ∆ = Σ−1µµ′.

Definition 2.7. Let X and Y be two arbitrary matrices. The covariance of these two matrices is defined by

Cov[X,Y ] = E[(vecX − E[vecX])(vecY − E[vecY ])′]

if the expectations exist. From this, we have the following:

D[X] = E[vec(X − E[X])vec′(X − E[X])].


Definition 2.8. The general multivariate linear model equals

X = MB + E,

where X : p × n is a random matrix which corresponds to the observations, M : p × q is an unknown parameter matrix and B : q × n is a known design matrix. Moreover, E ∼ Np,n(0,Σ, I), where Σ is an unknown p.d. matrix. This is also called the MANOVA model.

Definition 2.9. A bilinear model can be defined as

X = AMB + E,

where X : p × n, the unknown mean parameter matrix M : q × k, the two design matrices A : p × q and B : k × n, and the error matrix E. This is also called the GMANOVA model or the growth curve model.

Theorem 2.1. The general solution of the consistent equation in X:

AXB = C

can be given by any of these three formulas:

(i) X = X0 + (A′)◦Z1B′ + A′Z2B◦′ + (A′)◦Z3B◦′,

(ii) X = X0 + (A′)◦Z1 + A′Z2B◦′,

(iii) X = X0 + Z1B◦′ + (A′)◦Z2B′,

where X0 represents a particular solution and Zi, i = 1, 2, 3, represent arbitrary matrices of proper sizes.

Theorem 2.2. The equation AXB = C is consistent if and only if C(C) ⊆ C(A) and C(C′) ⊆ C(B′). A particular solution of the equation is given by

X0 = A−CB−,

where "−" denotes an arbitrary g-inverse.

Theorem 2.3. If S is positive definite and C(B) ⊆ C(A), then

PA,S = PB,S + SP′A,SB◦(B◦′SP′A,SB◦)−B◦′PA,S.

A special case is

S−1 − B◦(B◦′SB◦)−B◦′ = S−1B(B′S−1B)−B′S−1.

Theorem 2.4. For A : m × n and B : n × m,

|Im + AB| = |In + BA|.
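A quick numerical check of this determinant identity (the sizes are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 4, 6
A = rng.standard_normal((m, n))
B = rng.standard_normal((n, m))

lhs = np.linalg.det(np.eye(m) + A @ B)
rhs = np.linalg.det(np.eye(n) + B @ A)
print(np.isclose(lhs, rhs))   # True
```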

Theorem 2.5. Let X ∼ Np,n(µ,Σ,Ψ). For any A : q × p and B : m × n,

AXB′ ∼ Nq,m(AµB′, AΣA′, BΨB′).


Theorem 2.6. Let W1 ∼ Wp(Σ, n,∆1) be independent of W2 ∼ Wp(Σ,m,∆2). Then

W1 +W2 ∼ Wp(Σ, n+m,∆1 + ∆2).

Theorem 2.7. Let X ∼ Np,n(0,Σ, I) and let Q be a symmetric idempotent matrix of proper size. Then

XQX′ ∼ Wp(Σ, r(Q)).

Theorem 2.8. Let A ∈ Rq×p and W ∼ Wp(Σ, n). Then

AWA′ ∼ Wq(AΣA′, n).

Theorem 2.9. Let A ∈ Rp×q and W ∼ Wp(Σ, n). Then

A(A′W−1A)−A′ ∼ Wp(A(A′Σ−1A)−A′, n− p+ r(A)).

Theorem 2.10. Let a partitioned non-singular matrix A be given by

A = ( A11 A12 ; A21 A22 ).

If A22 is non-singular, then

|A| = |A22||A11 − A12A22−1A21|;

if A11 is non-singular, then

|A| = |A11||A22 − A21A11−1A12|.

Theorem 2.11. Let S be positive definite and suppose that V, W and H are of proper sizes, assuming H−1 exists. Then

(S + VHW′)−1 = S−1 − S−1V(W′S−1V + H−1)−1W′S−1.
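This matrix inversion identity, used repeatedly in Section 3, can be checked numerically (sizes and seed are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)
p, k = 6, 2
G = rng.standard_normal((p, p))
S = G @ G.T + p * np.eye(p)                       # positive definite
V = rng.standard_normal((p, k))
W = rng.standard_normal((p, k))
H = rng.standard_normal((k, k)) + 3 * np.eye(k)   # invertible (almost surely)

Sinv = np.linalg.inv(S)
Hinv = np.linalg.inv(H)
lhs = np.linalg.inv(S + V @ H @ W.T)
rhs = Sinv - Sinv @ V @ np.linalg.inv(W.T @ Sinv @ V + Hinv) @ W.T @ Sinv
print(np.allclose(lhs, rhs))   # True
```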

Theorem 2.12. Let E[X] = µ and D[X] = Ψ⊗Σ. Then

(i) E[AXB] = AµB,

(ii) D[AXB] = B′ΨB ⊗AΣA′.

Theorem 2.13. Let A : n×m and B : m× n. Then

r(A−ABA) = r(A) + r(Im −BA)−m = r(A) + r(In −AB)− n.

Theorem 2.14.

(i) For A, B, C and D of proper sizes

tr(ABCD) = tr(DABC) = tr(CDAB) = tr(BCDA).

(ii) If A is idempotent, then tr(A) = r(A).

(iii) For any A, tr(A) = tr(A′).


Theorem 2.15. Let A, B and C be matrices of proper sizes. Then

vec(ABC) = (C ′ ⊗A)vecB.
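The vec identity of Theorem 2.15 can be checked with numpy; note that vec stacks columns, which corresponds to column-major (Fortran-order) flattening (sizes are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 2))
C = rng.standard_normal((2, 5))

vec = lambda M: M.flatten(order="F")   # column-stacking vec-operator
lhs = vec(A @ B @ C)
rhs = np.kron(C.T, A) @ vec(B)
print(np.allclose(lhs, rhs))   # True
```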

Theorem 2.16.

(i) (A⊗B)′ = A′ ⊗B′

(ii) Let A,B,C and D be matrices of proper sizes. Then

(A⊗B)(C ⊗D) = AC ⊗BD.

(iii) A⊗B = 0 if and only if A = 0 or B = 0.

Theorem 2.17. Let A and B be matrices of proper sizes. If C(A) ⊆ C(B), then

C(B) = C(A) ⊕ C(B) ∩ C(A)⊥.

3 Likelihood ratio tests for the three hypotheses

In Section 1.1, we gave the three hypotheses of profile analysis of q groups, and the test statistics of the likelihood ratio tests for two groups were presented in Section 1.2 (Srivastava and Carter, 1983; Srivastava, 1987, 2002). Srivastava (1987) derived the test statistics for q groups, but in this section we will reformulate the problems, since we indeed are in a multivariate analysis of variance (MANOVA) or a generalized multivariate analysis of variance (GMANOVA) testing situation. This will require a matrix reformulation of the hypotheses and derivation of the likelihood ratio tests based on this matrix notation.

3.1 The model

The model for one group, say the k-th group, can be written as

Xk = MkDk + Ek,

where Xk represents the matrix of observations, Mk the p-vector of mean parameters, Dk a row vector of nk ones, i.e., Dk = 1′nk, and Ek the error matrix. The columns of Xk are independently distributed, which means that the columns of Ek are independently distributed. The assumption for the distribution of Ek is that its column vectors follow a multivariate normal distribution: ejk ∼ Np(0,Σ).

When we have q groups, we have q models:

(X1 : X2 : ... : Xq) = (M1D1 : M2D2 : ... : MqDq) + (E1 : E2 : ... : Eq)

= (M1 : M2 : ... : Mq)D + (E1 : E2 : ... : Eq), (1)

where D is a q × N matrix, N = ∑_{k=1}^{q} nk, which equals

D =
( 1 · · · 1   0 · · · 0   · · ·   0 · · · 0 )
( 0 · · · 0   1 · · · 1   · · ·   0 · · · 0 )
( ⋮           ⋮            ⋱      ⋮         )
( 0 · · · 0   0 · · · 0   · · ·   1 · · · 1 ),

i.e., the k-th row contains ones in the nk columns of the k-th group and zeros elsewhere,


and where Ek ∼ Np,nk(0,Σ, Ink). The relation in (1) can be written

X(p×N) = M(p×q) D(q×N) + E(p×N),   X ∼ Np,N(MD, Σ, IN),

where X = (X1 : X2 : · · · : Xq), M = (M1 : M2 : · · · : Mq) and E = (E1 : E2 : · · · : Eq).

Moreover, let F be a q × (q − 1) matrix and C a (p − 1) × p matrix which satisfy 1′F = 0 and C1 = 0, respectively, e.g.,

F =
(  1    0   · · ·   0 )
( −1    1   · · ·   0 )
(  0   −1   · · ·   0 )
(  ⋮    ⋮    ⋱      ⋮ )
(  0    0   · · ·   1 )
(  0    0   · · ·  −1 ),

C =
( 1  −1   0   0  · · ·  0   0 )
( 0   1  −1   0  · · ·  0   0 )
( 0   0   1  −1  · · ·  0   0 )
( ⋮   ⋮   ⋮   ⋮   ⋱     ⋮   ⋮ )
( 0   0   0   0  · · ·  1  −1 ).

These two matrices, F and C, will be used below during the derivations of the tests. Since the same F and C appear in each hypothesis, they are introduced here.
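The successive-difference choices of C and F shown above are easy to generate and check. A minimal numpy sketch (the dimensions p and q are illustrative assumptions):

```python
import numpy as np

def contrast_C(p):
    """(p-1) x p successive-difference matrix with C @ 1_p = 0."""
    return np.eye(p - 1, p) - np.eye(p - 1, p, k=1)

def contrast_F(q):
    """q x (q-1) matrix with 1_q' @ F = 0 (columns are successive differences)."""
    return contrast_C(q).T

p, q = 5, 3   # illustrative dimensions
C = contrast_C(p)
F = contrast_F(q)
print(np.allclose(C @ np.ones(p), 0))     # True: C 1_p = 0
print(np.allclose(np.ones(q) @ F, 0))     # True: 1_q' F = 0
print(np.linalg.matrix_rank(C) == p - 1)  # True: C has full row rank
```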

3.2 Derivation of the tests

The model for q groups has been given above by

X = MD +E (2)

where X = (X1 : X2 : · · · : Xq). This model is often called the MANOVA model. If we want to draw any inference from the model, the unknown parameters M and Σ need to be estimated.

For the model in (2), the likelihood function equals

L(M,Σ) = (2π)^{−pN/2} |Σ|^{−N/2} exp{−(1/2) tr Σ−1(X − MD)(X − MD)′}
        ≤ (2π)^{−pN/2} |(1/N)(X − MD)(X − MD)′|^{−N/2} e^{−Np/2},

where N = ∑_{k=1}^{q} nk and equality holds if and only if NΣ = (X − MD)(X − MD)′ (see Srivastava and Khatri, 1979, Theorem 1.10.4). Now we need to find a lower bound for |(X − MD)(X − MD)′|. Use that I = (I − PD′) + PD′; then

|(X − MD)(X − MD)′| = |(X − MD)PD′(X − MD)′ + (X − MD)(I − PD′)(X − MD)′|
                    ≥ |(X − MD)(I − PD′)(X − MD)′| = |X(I − PD′)X′|,

and equality holds if and only if (X − MD)PD′ = 0. Thus XPD′ = M̂D and NΣ̂ = RR′, where R = X(I − PD′) and PD′ = D′(DD′)−D. This can be considered as a decomposition of R^N relative to the model (see Figure 2). The whole space is divided into
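The maximum likelihood estimators just derived can be computed directly. A numpy sketch (all sizes, parameters and the seed are illustrative assumptions):

```python
import numpy as np
from scipy.linalg import block_diag

rng = np.random.default_rng(6)

# Illustrative sizes: p = 3 responses, q = 2 groups with n = (4, 6) subjects.
p, n = 3, [4, 6]
q, N = len(n), sum(n)
D = block_diag(*[np.ones((1, nk)) for nk in n])   # q x N design matrix
M = rng.standard_normal((p, q))                   # true mean parameters
X = M @ D + rng.standard_normal((p, N))           # Sigma = I errors

# MLEs: M-hat D = X P_{D'} and N Sigma-hat = X (I - P_{D'}) X'
PD = D.T @ np.linalg.solve(D @ D.T, D)            # P_{D'} = D'(DD')^{-1} D
M_hat = X @ D.T @ np.linalg.inv(D @ D.T)
Sigma_hat = X @ (np.eye(N) - PD) @ X.T / N

print(np.allclose(M_hat[:, 0], X[:, :n[0]].mean(axis=1)))  # column = group mean: True
print(np.allclose(M_hat @ D, X @ PD))                      # M-hat D = X P_{D'}: True
```

As the check shows, the columns of M̂ are simply the group mean vectors, as expected in the unrestricted MANOVA model.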


two spaces which are orthogonal to each other, C(D′) and C(D′)⊥, which correspond to the mean space and the residual space, respectively.

Figure 2: The decomposition of the space with no restriction on the mean parameter space: C(D′) carries M̂D = XPD′ and C(D′)⊥ carries R = X(I − PD′).

Note that since D is a full rank matrix, one can write PD′ = D′(DD′)−D = D′(DD′)−1D. Now we will move on to the derivation of the test statistics.

3.2.1 Parallelism Hypothesis

The null hypothesis and the alternative hypothesis for parallelism can be written

H1 : E(X) = MD, CMF = 0,
A1 : E(X) = MD, CMF ≠ 0,     (3)

where C and F are defined in Section 3.1.

Theorem 3.1. The likelihood ratio statistic for the parallelism hypothesis presented in (3) can be given as

λ^{2/N} = |CSC′| / |CSC′ + CXPD′(DD′)−1KX′C′|,     (4)

where S = X(I − PD′)X′ and K is any matrix satisfying C(K) = C(D) ∩ C(F),

CSC′ ∼ Wp−1(CΣC′, N − r(D)),
CXPD′(DD′)−1KX′C′ ∼ Wp−1(CΣC′, r(K)).

Then

λ^{2/N} = |CSC′| / |CSC′ + CXPD′(DD′)−1KX′C′| ∼ Λ(p − 1, N − r(D), r(K)),

where Λ(·, ·, ·) denotes the Wilks' lambda distribution.

Proof. We have restrictions on the mean parameter space:

CMF = 0 ⇔ (F′ ⊗ C)vecM = 0,

which means that vecM belongs to C(F ⊗ C′)⊥. By Theorem 2.1, the general solution of the equation CMF = 0 equals

M = (C′)◦θ1 + C′θ2F◦′,


where θ1 and θ2 are new parameters. Inserting this solution into (2) yields

X = (C′)◦θ1D + C′θ2F◦′D + E.

This is the reparameterization of the first model given by (2) after applying the restrictions CMF = 0. Here we notice that we are outside the MANOVA and GMANOVA models. Recall the inequality

|Σ|^{−N/2} e^{−(1/2)tr{Σ−1(X−E(X))(X−E(X))′}} ≤ |(1/N)(X − E(X))(X − E(X))′|^{−N/2} e^{−Np/2},

with equality if and only if NΣ = (X − E(X))(X − E(X))′.

Now, we start performing some calculations under the null hypothesis (using I = PD′ + (I − PD′)):

|(X − (C′)◦θ1D − C′θ2F◦′D)( )′|
= |(XPD′ − (C′)◦θ1D − C′θ2F◦′D)( )′ + X(I − PD′)X′|     (with S = X(I − PD′)X′)
= |S||S−1(XPD′ − (C′)◦θ1D − C′θ2F◦′D)( )′ + I|
= |S||(XPD′ − (C′)◦θ1D − C′θ2F◦′D)′S−1(XPD′ − (C′)◦θ1D − C′θ2F◦′D) + I|.     (5)

Recall from Theorem 2.3,

S−1 = C′(CSC′)−C + S−1(C′)◦[(C′)◦′S−1(C′)◦]−(C′)◦′S−1.

If we replace S−1 in (5) with this expression:

(5) = |S||(XPD′ − (C′)◦θ1D − C′θ2F◦′D)′S−1(C′)◦[(C′)◦′S−1(C′)◦]−(C′)◦′S−1
      × (XPD′ − (C′)◦θ1D − C′θ2F◦′D) + (XPD′ − (C′)◦θ1D − C′θ2F◦′D)′
      × C′(CSC′)−C(XPD′ − (C′)◦θ1D − C′θ2F◦′D) + I|
    ≥ |S||(XPD′ − (C′)◦θ1D − C′θ2F◦′D)′C′(CSC′)−C(XPD′ − (C′)◦θ1D − C′θ2F◦′D) + I|     (6)

with equality if and only if

(XPD′ − (C′)◦θ1D − C′θ2F◦′D)′S−1(C′)◦[(C′)◦′S−1(C′)◦]−(C′)◦′S−1(XPD′ − (C′)◦θ1D − C′θ2F◦′D) = 0.

Notice that

D′θ1′(C′)◦′C′(CSC′)−C = 0,

since (C′)◦′C′ = 0 by orthogonality.

Hence, (6) is equal to

|S||(XPD′ − C′θ2F◦′D)′C′(CSC′)−C(XPD′ − C′θ2F◦′D) + I|
= |S||(XPD′ − C′θ2F◦′D)′C′(CSC′)−CSC′(CSC′)−C(XPD′ − C′θ2F◦′D) + I|,

where we used C′(CSC′)−C = C′(CSC′)−CSC′(CSC′)−C.


By Theorem 2.4,

= |S||C′(CSC′)−C(XPD′ − C′θ2F◦′D)(XPD′ − C′θ2F◦′D)′C′(CSC′)−CS + I|
= |SC′(CSC′)−C(XPD′ − C′θ2F◦′D)(XPD′ − C′θ2F◦′D)′C′(CSC′)−CS + S|
= |(P′C′,S−1XPD′ − P′C′,S−1C′θ2F◦′D)( )′ + S|,

where SC′(CSC′)−C = P′C′,S−1 and C′(CSC′)−CS = PC′,S−1. Using I = (I − PD′F◦) + PD′F◦,

= |P′C′,S−1XPD′(I − PD′F◦)(P′C′,S−1XPD′)′ + (P′C′,S−1XPD′PD′F◦ − P′C′,S−1C′θ2F◦′D)( )′ + S|
≥ |P′C′,S−1XPD′(I − PD′F◦)(P′C′,S−1XPD′)′ + S|.     (7)

Since PA◦ = I − PA, one can write I − PD′F◦ = P(D′F◦)◦. From the definition of the column spaces given in the Notation part, C[(D′F◦)◦] = C(D′F◦)⊥. Using Theorem 2.17, C(D′F◦)⊥ can be decomposed into two orthogonal subspaces:

C(D′F◦)⊥ = C(D′)⊥ ⊕ C(D′) ∩ C(D′F◦)⊥,     (8)

where C(D′) ∩ C(D′F◦)⊥ = C(D′(DD′)−1K) and C(K) = C(D) ∩ C(F). The space C(D′)⊥ corresponds to I − PD′ and the space C(D′(DD′)−1K) corresponds to PD′(DD′)−1K. Then, since PD′(I − PD′) = 0 and PD′PD′(DD′)−1K = PD′(DD′)−1K,

(7) = |P′C′,S−1XPD′[(I − PD′) + PD′(DD′)−1K](P′C′,S−1XPD′)′ + S|
    = |P′C′,S−1XPD′(DD′)−1K(P′C′,S−1X)′ + S|.

For the alternative hypothesis, we do not have any restrictions on the mean parameter space. Thus, we will use the results from the introduction of Section 3.2, where NΣ̂ = RR′ = S was found. Hence,

NΣ̂A1 = S,
NΣ̂H1 = S + P′C′,S−1XPD′(DD′)−1KX′PC′,S−1.     (9)

Thus,

λ^{2/N} = |NΣ̂A1| / |NΣ̂H1| = |S| / |S + P′C′,S−1XPD′(DD′)−1KX′PC′,S−1|.

The numerator and the denominator are not independently distributed. To be able to achieve this independence, we introduce the full rank matrix

H = (C′, S−1(C′)◦).

Multiplying both Σ̂A1 and Σ̂H1 by H′ and H from the left and the right, respectively, yields

λ^{2/N} = |H′Σ̂A1H| / |H′Σ̂H1H| = |CSC′| / |CSC′ + CXPD′(DD′)−1KX′C′|.


By Theorem 2.7 and Theorem 2.8, the following two relations hold:

S = X(I − PD′)X′ ∼ Wp(Σ, N − r(D))  ⇒  CSC′ ∼ Wp−1(CΣC′, N − r(D)),
XPD′(DD′)−1KX′ ∼ Wp(Σ, r(K))  ⇒  CXPD′(DD′)−1KX′C′ ∼ Wp−1(CΣC′, r(K)).

The ratio given by λ^{2/N} does not depend on Σ; consequently, we can replace CΣC′ with Ip−1. For a detailed explanation, see Appendix A, Result A1. Therefore,

λ^{2/N} = |CSC′| / |CSC′ + CXPD′(DD′)−1KX′C′| ∼ Λ(p − 1, N − r(D), r(K)),

which completes the proof.

Note that the distribution of λ^{2/N}, that is, the Wilks' lambda distribution, can be approximated very accurately (Läuter, 2016; Mardia, Kent and Bibby, 1979).
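The statistic of Theorem 3.1 can be computed directly for simulated data. A numpy sketch (sizes, means and seed are illustrative assumptions; since D has full row rank, C(D) = R^q and K = F is one valid choice with C(K) = C(D) ∩ C(F)):

```python
import numpy as np
from scipy.linalg import block_diag

rng = np.random.default_rng(7)

# Illustrative setup: p = 4 responses, q = 3 groups, parallel profiles (CMF = 0).
p, n = 4, [8, 10, 12]
q, N = len(n), sum(n)
D = block_diag(*[np.ones((1, nk)) for nk in n])
mu = np.array([1.0, 2.0, 1.5, 2.5])
M = np.column_stack([mu, mu + 1.0, mu - 0.5])    # columns differ by constants
X = M @ D + rng.standard_normal((p, N))

C = np.eye(p - 1, p) - np.eye(p - 1, p, k=1)         # C 1_p = 0
F = (np.eye(q - 1, q) - np.eye(q - 1, q, k=1)).T     # 1_q' F = 0
K = F   # valid here because D has full row rank, so C(D) ∩ C(F) = C(F)

PD = D.T @ np.linalg.solve(D @ D.T, D)
S = X @ (np.eye(N) - PD) @ X.T                       # S = X (I - P_{D'}) X'

A = D.T @ np.linalg.inv(D @ D.T) @ K                 # D'(DD')^{-1} K
PK = A @ np.linalg.solve(A.T @ A, A.T)               # projector onto C(D'(DD')^{-1}K)
num = np.linalg.det(C @ S @ C.T)
den = np.linalg.det(C @ S @ C.T + C @ X @ PK @ X.T @ C.T)
lam = num / den                                      # lambda^{2/N} of (4)
print(lam)   # lies in (0, 1); values near 1 support parallelism
```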

3.2.2 Level Hypothesis

Assuming that the profiles are parallel, we will construct a test to check if they coincide. The null hypothesis and the alternative hypothesis for the level test can be written

H2|H1 : E(X) = MD, MF = 0,
A2|H1 : E(X) = MD, CMF = 0,     (10)

where C and F are defined in Section 3.1.

Theorem 3.2. The likelihood ratio statistic for the level hypothesis can be expressed as

λ^{2/N} = |(C′)◦′S−1(C′)◦|−1 / |((C′)◦′S−1(C′)◦)−1 + ((C′)◦′S−1(C′)◦)−1(C′)◦′S−1XD′(DD′)−1KQ−1K′(DD′)−1DX′S−1(C′)◦((C′)◦′S−1(C′)◦)−1|,     (11)

where Q = K′(DD′)−1K + K′(DD′)−1DX′C′(CSC′)−1CXD′(DD′)−1K and C(K) = C(D) ∩ C(F),

((C′)◦′S−1(C′)◦)−1 ∼ W1(((C′)◦′Σ−1(C′)◦)−1, N − r(D) − p + 1),
((C′)◦′S−1(C′)◦)−1(C′)◦′S−1XD′(DD′)−1KQ−1K′(DD′)−1DX′S−1(C′)◦((C′)◦′S−1(C′)◦)−1 ∼ W1(((C′)◦′Σ−1(C′)◦)−1, r(K)).

Then

λ^{2/N} ∼ Λ(1, N − r(D) − p + 1, r(K)).

Proof. Equivalent expressions for the restrictions in both hypotheses can be written as

H2 : MF = 0 ⇔ M = θF◦′,
A2 : CMF = 0 ⇔ M = (C′)◦θ1 + C′θ2F◦′.


Plugging these solutions into the models gives

H2 : X = θF◦′D + E,
A2 : X = (C′)◦θ1D + C′θ2F◦′D + E.

First the null hypothesis will be studied. Under H2 (using I = PD′F◦ + (I − PD′F◦)):

|(X − θF◦′D)( )′| = |(XPD′F◦ − θF◦′D)( )′ + X(I − PD′F◦)X′| ≥ |X(I − PD′F◦)X′|,

with equality if and only if XPD′F◦ = θF◦′D. As with the parallelism hypothesis, we will partition the space C(D′F◦)⊥, which corresponds to I − PD′F◦, into two orthogonal parts, that is, C(D′F◦)⊥ = C(D′)⊥ ⊕ C(D′) ∩ C(D′F◦)⊥, where C(D′) ∩ C(D′F◦)⊥ = C(D′(DD′)−1K) with C(K) = C(D) ∩ C(F).

We have already derived the maximum of the likelihood under the restriction CMF = 0 while considering the parallelism hypothesis, where this restriction appeared in the null hypothesis. For the second hypothesis, we assume that the profiles are parallel (or we do not reject H1), and the test is conducted to see if they have equal levels. The restrictions for this test can be summarised as MF = 0. The alternative hypothesis will not simply be MF ≠ 0, because we already assume that the profiles are parallel. Consequently, the level hypothesis is tested against the parallelism hypothesis due to this prior knowledge. This is the reason why CMF = 0 appears here in the alternative hypothesis.

Thus,

|NΣ̂H2| = |X(I − PD′F◦)X′| = |X(I − PD′)X′ + XPD′(DD′)−1KX′|,
|NΣ̂A2| = |X(I − PD′)X′ + P′C′,S−1XPD′(DD′)−1KX′PC′,S−1|,

where X(I − PD′)X′ = S. We know that S and XPD′(DD′)−1KX′ are Wishart distributed, but P′C′,S−1XPD′(DD′)−1KX′PC′,S−1 is not. Therefore, the likelihood function will be manipulated similarly to the treatment of the parallelism hypothesis. Put H = (C′, S−1(C′)◦), which is a full rank matrix, and multiply both the numerator and the denominator in the likelihood ratio by H′ and H from the left and the right, respectively:

λ^{2/N} = |H′Σ̂A2H| / |H′Σ̂H2H|.

We start with calculating H′Σ̂A2H:

|H′Σ̂A2H| = |( C ; (C′)◦′S−1 )(S + P′C′,S−1XPD′(DD′)−1KX′PC′,S−1)(C′, S−1(C′)◦)|
          = |( V11 V12 ; V21 V22 )|,


where

V11 = C(S + P′C′,S−1XPD′(DD′)−1KX′PC′,S−1)C′,
V12 = C(S + P′C′,S−1XPD′(DD′)−1KX′PC′,S−1)S−1(C′)◦,
V21 = (C′)◦′S−1(S + P′C′,S−1XPD′(DD′)−1KX′PC′,S−1)C′,
V22 = (C′)◦′S−1(S + P′C′,S−1XPD′(DD′)−1KX′PC′,S−1)S−1(C′)◦.

It follows that

V21 = (C′)◦′C′ + (C′)◦′S−1P′C′,S−1XPD′(DD′)−1KX′PC′,S−1C′
    = (C′)◦′S−1[C′(CSC′)−CS]′XPD′(DD′)−1KX′C′(CSC′)−CSC′
    = (C′)◦′S−1SC′(CSC′)−CXPD′(DD′)−1KX′C′(CSC′)−CSC′
    = 0,

since S−1S = I and (C′)◦′C′ = 0. Notice that V12 = V′21; therefore V12 = 0 as well.

Further,

V11 = CSC′ + CSC′(CSC′)−CXPD′(DD′)−1KX′C′(CSC′)−CSC′ = CSC′ + CXPD′(DD′)−1KX′C′,

using CSC′(CSC′)−C = C and C′(CSC′)−CSC′ = C′,

and

V22 = (C′)◦′S−1SS−1(C′)◦ + (C′)◦′S−1SC′(CSC′)−CXPD′(DD′)−1KX′C′(CSC′)−CSS−1(C′)◦ = (C′)◦′S−1(C′)◦,

again since (C′)◦′S−1S = (C′)◦′ and (C′)◦′C′ = 0.

Then the determinant can be written

|H′Σ̂A2H| = |( CSC′ + CXPD′(DD′)−1KX′C′   0 ; 0   (C′)◦′S−1(C′)◦ )|
          = |( C(X(I − PD′F◦)X′)C′   0 ; 0   (C′)◦′S−1(C′)◦ )|.

Let us move on to the null hypothesis:

|H′Σ̂H2H| = |( C ; (C′)◦′S−1 )(S + XPD′(DD′)−1KX′)(C′, S−1(C′)◦)|
= |( C(S + XPD′(DD′)−1KX′)C′   C(S + XPD′(DD′)−1KX′)S−1(C′)◦ ;
     (C′)◦′S−1(S + XPD′(DD′)−1KX′)C′   (C′)◦′S−1(S + XPD′(DD′)−1KX′)S−1(C′)◦ )|
= |( CX(I − PD′F◦)X′C′   CX(I − PD′F◦)X′S−1(C′)◦ ;
     (C′)◦′S−1X(I − PD′F◦)X′C′   (C′)◦′S−1X(I − PD′F◦)X′S−1(C′)◦ )|.


Note that we used the relation X(I − PD′F◦)X′ = X(I − PD′)X′ + XPD′(DD′)−1KX′ = S + XPD′(DD′)−1KX′. The determinant for the alternative hypothesis is straightforward. For the determinant under the null hypothesis we use Theorem 2.10, which is for partitioned matrices. Hence,

|H′Σ̂A2H| = |CX(I − PD′F◦)X′C′||(C′)◦′S−1(C′)◦|,
|H′Σ̂H2H| = |CX(I − PD′F◦)X′C′||(C′)◦′S−1X(I − PD′F◦)X′S−1(C′)◦
          − (C′)◦′S−1X(I − PD′F◦)X′C′[CX(I − PD′F◦)X′C′]−1CX(I − PD′F◦)X′S−1(C′)◦|.

When we take the ratio of these two quantities, i.e., the ratio of |H′Σ̂A2H| and |H′Σ̂H2H|, the first term, |CX(I − PD′F◦)X′C′|, cancels out. Put S1 = X(I − PD′F◦)X′. Then the ratio becomes

|H′Σ̂A2H| / |H′Σ̂H2H|
= |(C′)◦′S−1(C′)◦| / |(C′)◦′S−1S1S−1(C′)◦ − (C′)◦′S−1S1C′[CS1C′]−1CS1S−1(C′)◦|
= |(C′)◦′S−1(C′)◦| / |(C′)◦′S−1[S1 − S1C′(CS1C′)−1CS1]S−1(C′)◦|.

By the special case of Theorem 2.3,

S1C′(CS1C′)−1CS1 = S1 − (C′)◦[(C′)◦′S1−1(C′)◦]−1(C′)◦′.

Thus,

|H′Σ̂A2H| / |H′Σ̂H2H|
= |(C′)◦′S−1(C′)◦| / |(C′)◦′S−1[S1 − S1 + (C′)◦((C′)◦′S1−1(C′)◦)−1(C′)◦′]S−1(C′)◦|
= |(C′)◦′S−1(C′)◦| / |(C′)◦′S−1(C′)◦((C′)◦′S1−1(C′)◦)−1(C′)◦′S−1(C′)◦|
= |(C′)◦′S−1(C′)◦|−1 / |(C′)◦′S1−1(C′)◦|−1.

Notice that

S_1^{−1} = [X(I − P_{D′F°})X′]^{−1} = (S + XP_{D′(DD′)^{−1}K}X′)^{−1}
         = [S + (XP_1)(XP_1)′]^{−1}                                          (put P_1 = P_{D′(DD′)^{−1}K})
         = S^{−1} − S^{−1}(XP_1)[(XP_1)′S^{−1}(XP_1) + I]^{−1}(XP_1)′S^{−1}  (since P_1 is idempotent and symmetric)
         = S^{−1} − S^{−1}XP_1(P_1X′S^{−1}XP_1 + I)^{−1}P_1X′S^{−1}.         (by Theorem 2.11)     (12)
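The inversion step in (12) is a Woodbury-type matrix identity and can be checked numerically. Below is a minimal sketch, where a generic symmetric positive definite S and a generic matrix U stand in for the model quantities S and XP_1:

```python
import numpy as np

# Numerical check of the identity used in (12):
# (S + UU')^{-1} = S^{-1} - S^{-1}U(U'S^{-1}U + I)^{-1}U'S^{-1}.
# S and U are generic stand-ins, not the model quantities themselves.
rng = np.random.default_rng(0)
n, k = 6, 3
A = rng.standard_normal((n, n))
S = A @ A.T + n * np.eye(n)          # symmetric positive definite
U = rng.standard_normal((n, k))      # plays the role of XP_1

lhs = np.linalg.inv(S + U @ U.T)
Sinv = np.linalg.inv(S)
rhs = Sinv - Sinv @ U @ np.linalg.inv(U.T @ Sinv @ U + np.eye(k)) @ U.T @ Sinv

print(np.allclose(lhs, rhs))         # True
```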

If we replace S_1^{−1} in |(C′)°′S_1^{−1}(C′)°|^{−1} with (12):

|(C′)°′S_1^{−1}(C′)°|^{−1}
= |(C′)°′S^{−1}(C′)° − (C′)°′S^{−1}XP_1(P_1X′S^{−1}XP_1 + I)^{−1}P_1X′S^{−1}(C′)°|^{−1}
= |(C′)°′S^{−1}(C′)°|^{−1}|I − P_1X′S^{−1}(C′)°((C′)°′S^{−1}(C′)°)^{−1}(C′)°′S^{−1}XP_1(P_1X′S^{−1}XP_1 + I)^{−1}|^{−1}
= |(C′)°′S^{−1}(C′)°|^{−1}|P_1X′S^{−1}XP_1 + I||I + P_1X′S^{−1}XP_1
  − P_1X′S^{−1}(C′)°((C′)°′S^{−1}(C′)°)^{−1}(C′)°′S^{−1}XP_1|^{−1}.     (13)


For the last determinant in (13),

I + P_1X′S^{−1}XP_1 − P_1X′S^{−1}(C′)°((C′)°′S^{−1}(C′)°)^{−1}(C′)°′S^{−1}XP_1
= I + P_1X′[S^{−1} − S^{−1}(C′)°((C′)°′S^{−1}(C′)°)^{−1}(C′)°′S^{−1}]XP_1.     (14)

Using the relation given in Theorem 2.3,

(14) = I + P_1X′C′(CSC′)^{−1}CXP_1.

Thus,

|(C′)°′S_1^{−1}(C′)°|^{−1} = |(C′)°′S^{−1}(C′)°|^{−1}|P_1X′S^{−1}XP_1 + I||I + P_1X′C′(CSC′)^{−1}CXP_1|^{−1}.

If we put this result back into the ratio:

|H′Σ_{A2}H| / |H′Σ_{H2}H| = |(C′)°′S^{−1}(C′)°|^{−1} / |(C′)°′S_1^{−1}(C′)°|^{−1}
= |I + P_1X′C′(CSC′)^{−1}CXP_1| / |I + P_1X′S^{−1}XP_1|
= |I + X′C′(CSC′)^{−1}CXP_1| / |I + X′S^{−1}XP_1|.     (15)

Note that

P_1 = P_{D′(DD′)^{−1}K} = D′(DD′)^{−1}K(K′(DD′)^{−1}DD′(DD′)^{−1}K)^{−}K′(DD′)^{−1}D.
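As a quick sanity check, the expression above has the form A(A′A)^{−}A′ with A = D′(DD′)^{−1}K, so it is symmetric and idempotent, i.e. an orthogonal projector. A minimal numerical sketch, with arbitrary full-rank stand-ins for D and K:

```python
import numpy as np

# Verify that P_1 = A(A'A)^- A', with A = D'(DD')^{-1}K, is an orthogonal
# projector. D and K below are arbitrary stand-ins, not a real design.
rng = np.random.default_rng(1)
q, s, N = 3, 2, 10
D = rng.standard_normal((q, N))                 # assumed full row rank
K = rng.standard_normal((q, s))

A = D.T @ np.linalg.inv(D @ D.T) @ K            # A = D'(DD')^{-1}K
P1 = A @ np.linalg.pinv(A.T @ A) @ A.T

print(np.allclose(P1 @ P1, P1), np.allclose(P1, P1.T))  # True True
```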

Plug this into the ratio above and take K ′(DD′)−1D to the left:

(15) = |I + K′(DD′)^{−1}DX′C′(CSC′)^{−1}CXD′(DD′)^{−1}K(K′(DD′)^{−1}DD′(DD′)^{−1}K)^{−}|
     / |I + K′(DD′)^{−1}DX′S^{−1}XD′(DD′)^{−1}K(K′(DD′)^{−1}DD′(DD′)^{−1}K)^{−}|.

Now we take (K′(DD′)^{−1}DD′(DD′)^{−1}K)^{−} out for both the numerator and the denominator. Then

(15) = |K′(DD′)^{−1}DD′(DD′)^{−1}K + K′(DD′)^{−1}DX′C′(CSC′)^{−1}CXD′(DD′)^{−1}K|
     / |K′(DD′)^{−1}DD′(DD′)^{−1}K + K′(DD′)^{−1}DX′S^{−1}XD′(DD′)^{−1}K|.

Note that DD′(DD′)^{−1} = I. Moreover, we know that S^{−1} = S^{−1}(C′)°((C′)°′S^{−1}(C′)°)^{−1}(C′)°′S^{−1} + C′(CSC′)^{−1}C, which implies

(15) = |K′(DD′)^{−1}K + K′(DD′)^{−1}DX′C′(CSC′)^{−1}CXD′(DD′)^{−1}K|
     / |K′(DD′)^{−1}K + K′(DD′)^{−1}DX′S^{−1}(C′)°((C′)°′S^{−1}(C′)°)^{−1}(C′)°′S^{−1}XD′(DD′)^{−1}K
        + K′(DD′)^{−1}DX′C′(CSC′)^{−1}CXD′(DD′)^{−1}K|
= |I + [K′(DD′)^{−1}K + K′(DD′)^{−1}DX′C′(CSC′)^{−1}CXD′(DD′)^{−1}K]^{−1}
       K′(DD′)^{−1}DX′S^{−1}(C′)°((C′)°′S^{−1}(C′)°)^{−1}(C′)°′S^{−1}XD′(DD′)^{−1}K|^{−1}.


Put Q = K′(DD′)^{−1}K + K′(DD′)^{−1}DX′C′(CSC′)^{−1}CXD′(DD′)^{−1}K and use the rotation in Theorem 2.4:

(15) = |I + (C′)°′S^{−1}XD′(DD′)^{−1}KQ^{−1}K′(DD′)^{−1}DX′S^{−1}(C′)°((C′)°′S^{−1}(C′)°)^{−1}|^{−1}
     = |(C′)°′S^{−1}(C′)°|^{−1}
       / |((C′)°′S^{−1}(C′)°)^{−1} + ((C′)°′S^{−1}(C′)°)^{−1}(C′)°′S^{−1}XD′(DD′)^{−1}KQ^{−1}K′(DD′)^{−1}DX′S^{−1}(C′)°((C′)°′S^{−1}(C′)°)^{−1}|.

Now we will find the distributions of the expressions in this ratio. Let's start with ((C′)°′S^{−1}(C′)°)^{−1}. We multiply this expression with identity matrices from the left and from the right:

((C′)°′S^{−1}(C′)°)^{−1} = ((C′)°′(C′)°)^{−1}(C′)°′ (C′)°((C′)°′S^{−1}(C′)°)^{−1}(C′)°′ (C′)°((C′)°′(C′)°)^{−1},

where the outer factors come from the identity I = ((C′)°′(C′)°)^{−1}(C′)°′(C′)° and, by Theorem 2.3, the middle factor satisfies (C′)°((C′)°′S^{−1}(C′)°)^{−1}(C′)°′ = S − SC′(CSC′)^{−1}CS. This yields

((C′)°′(C′)°)^{−1}(C′)°′[S − SC′(CSC′)^{−1}CS](C′)°((C′)°′(C′)°)^{−1}.     (16)

Recall that S = X(I − P_{D′})X′. Then (16) becomes

((C′)°′(C′)°)^{−1}(C′)°′X(I − P_{D′})(I − X′C′[CX(I − P_{D′})X′C′]^{−1}CX(I − P_{D′}))X′(C′)°((C′)°′(C′)°)^{−1}.     (17)

From Theorem 2.3, the two relations

I = (C′)°((C′)°′Σ^{−1}(C′)°)^{−1}(C′)°′Σ^{−1} + ΣC′(CΣC′)^{−1}C,
I = Σ^{−1}(C′)°((C′)°′Σ^{−1}(C′)°)^{−1}(C′)°′ + C′(CΣC′)^{−1}CΣ

are obtained, which will then be used in (17):

(17) = ((C′)°′Σ^{−1}(C′)°)^{−1}(C′)°′Σ^{−1}X(I − P_{D′})(I − X′C′[CX(I − P_{D′})X′C′]^{−1}CX(I − P_{D′}))X′Σ^{−1}(C′)°((C′)°′Σ^{−1}(C′)°)^{−1}.

We have achieved a structure of the form ···X(·)X′···, where the factor in the middle is idempotent. Now the rank of this idempotent matrix is checked (see Appendix A, Result A2):

r((I − P_{D′})(I − X′C′[CX(I − P_{D′})X′C′]^{−1}CX(I − P_{D′}))) = N − r(D) − p + 1.     (18)

We also need to show that (C′)°′Σ^{−1}X and X′C′ are independent, which is verified in Appendix A, Result A3.

Thus, we can conclude that the conditional distribution of ((C′)°′Σ^{−1}(C′)°)^{−1}(C′)°′Σ^{−1}X(I − P_{D′})(I − X′C′[CX(I − P_{D′})X′C′]^{−1}CX(I − P_{D′})), conditioned on CX, is normal with mean 0 and dispersion matrix ((C′)°′Σ^{−1}(C′)°)^{−1}. Then, by Theorem 2.7,

((C′)°′S^{−1}(C′)°)^{−1} ∼ W_1(((C′)°′Σ^{−1}(C′)°)^{−1}, N − r(D) − p + 1),


which is independent of CX. Now we will move on to the second expression in the ratio given by (15), which equals

((C′)°′S^{−1}(C′)°)^{−1}(C′)°′S^{−1}XD′(DD′)^{−1}KQ^{−1}K′(DD′)^{−1}DX′S^{−1}(C′)°((C′)°′S^{−1}(C′)°)^{−1}.     (19)

First focus on ((C′)°′S^{−1}(C′)°)^{−1}(C′)°′S^{−1}X. We use the identity matrix ((C′)°′(C′)°)^{−1}(C′)°′(C′)° and the relations (C′)°((C′)°′S^{−1}(C′)°)^{−1}(C′)°′S^{−1} = I − SC′(CSC′)^{−1}C and S = X(I − P_{D′})X′. Then

((C′)°′(C′)°)^{−1}(C′)°′(C′)°((C′)°′S^{−1}(C′)°)^{−1}(C′)°′S^{−1}X
= ((C′)°′(C′)°)^{−1}(C′)°′X[I − (I − P_{D′})X′C′(CX(I − P_{D′})X′C′)^{−1}CX].     (20)

Similar to the calculations done for (17), the identity matrix

I = (C′)°((C′)°′Σ^{−1}(C′)°)^{−1}(C′)°′Σ^{−1} + ΣC′(CΣC′)^{−1}C

will be used in (20):

((C′)°′(C′)°)^{−1}(C′)°′(C′)°((C′)°′Σ^{−1}(C′)°)^{−1}(C′)°′Σ^{−1}X[I − (I − P_{D′})X′C′(CX(I − P_{D′})X′C′)^{−1}CX]
= ((C′)°′Σ^{−1}(C′)°)^{−1}(C′)°′Σ^{−1}X[I − (I − P_{D′})X′C′(CX(I − P_{D′})X′C′)^{−1}CX].     (21)

Moreover, (C′)°′Σ^{−1}X and X′C′ are independently distributed. Thus, (21) is normally distributed given CX, and so is

((C′)°′S^{−1}(C′)°)^{−1}(C′)°′S^{−1}XD′(DD′)^{−1}KQ^{−1/2}.     (22)

From the definition of the Wishart distribution given by Definition 2.6, if X is normally distributed with mean 0 and dispersion I ⊗ Σ, then XX′ ∼ W(Σ, n). So we need to check the mean and dispersion of (22). The mean is zero, and for the dispersion recall that, with D(X) = I ⊗ Σ,

D[AXB] = (B′ ⊗ A)D(X)(B ⊗ A′) = (B′B) ⊗ (AΣA′).

Using this formula, the dispersion of (22) for given CX equals

Q−1/2K ′(DD′)−1D[I −X ′C ′(CX(I − PD′)X ′C ′)−1CX(I − PD′)][I − (I − PD′)×X ′C ′(CX(I − PD′)X ′C ′)−1CX]D′(DD′)−1KQ−1/2 ⊗ ((C ′)◦

′Σ−1(C ′)◦)−1(C ′)◦

×Σ−1ΣΣ−1(C ′)◦((C ′)◦′Σ−1(C ′)◦)−1.

(23)

The details for the calculation of (23) is given in Appendix A, Result A4. Based on thisresult

(23) = I ⊗ ((C ′)◦′Σ−1(C ′)◦)−1.


Hence, we can conclude that (22) is normally distributed with mean 0 and dispersion I ⊗ ((C′)°′Σ^{−1}(C′)°)^{−1}, conditional on CX. Therefore, the square of the matrix in (22) is Wishart distributed. Notice that

[I − (I − P_{D′})X′C′(CX(I − P_{D′})X′C′)^{−1}CX]D′(DD′)^{−1}KQ^{−1/2}
× Q^{−1/2}K′(DD′)^{−1}D[I − X′C′(CX(I − P_{D′})X′C′)^{−1}CX(I − P_{D′})]     (24)

is idempotent. To show this, put G = Q^{−1/2}K′(DD′)^{−1}D[I − X′C′(CX(I − P_{D′})X′C′)^{−1}CX(I − P_{D′})], so that (24) = G′G, and note that

G′GG′G = G′G,

since GG′ = I.

For the details of GG′ = I, see Appendix A, Result A4. Thus, (24) is idempotent. We need to check the rank of this idempotent matrix to determine the degrees of freedom in the Wishart distribution (see Theorem 2.7):

r((24)) = r(G′G) = tr(G′G) = tr(GG′) = tr(I),

where the first two equalities follow from Proposition 2.14 (ii) and (i), respectively. Furthermore, tr(I) equals the size of GG′. To find the size of GG′, one needs to check Q = K′(DD′)^{−1}K + K′(DD′)^{−1}DX′C′(CSC′)^{−1}CXD′(DD′)^{−1}K. Say K is a q × s matrix and r(K) = s. Then Q is s × s, so the size of GG′ is s = r(K).

As a conclusion,

((C′)°′S^{−1}(C′)°)^{−1}(C′)°′S^{−1}XD′(DD′)^{−1}KQ^{−1}K′(DD′)^{−1}DX′S^{−1}(C′)°((C′)°′S^{−1}(C′)°)^{−1} ∼ W_1(((C′)°′Σ^{−1}(C′)°)^{−1}, r(K)).

Thus, the distribution for (15) is given by |U| / |U + V|, where

U ∼ W_1(((C′)°′Σ^{−1}(C′)°)^{−1}, N − r(D) − p + 1),
V ∼ W_1(((C′)°′Σ^{−1}(C′)°)^{−1}, r(K)).

If we pre- and post-multiply U and V by ((C′)°′Σ^{−1}(C′)°)^{1/2}, and denote the new expressions by U and V respectively, then the ratio becomes |U| / |U + V|, where

U ∼ W_1(I, N − r(D) − p + 1) and V ∼ W_1(I, r(K)).

Then

λ^{2/N} = |U| / |U + V| ∼ Λ(1, N − r(D) − p + 1, r(K)).


3.2.3 Flatness Hypothesis

Assuming that the profiles are parallel, we will test if they are flat or not.

H3|H1 : E(X) = MD, CM = 0,

A3|H1 : E(X) = MD, CMF = 0, (25)

where C and F are defined in Section 3.1.

Theorem 3.3. The likelihood ratio statistic for the flatness hypothesis is given by

λ^{2/N} = |CSC′ + CXP_{D′(DD′)^{−1}K}X′C′|
        / |CSC′ + CXP_{D′(DD′)^{−1}K}X′C′ + CXP_{D′F°}X′C′|,     (26)

where

CXP_{D′F°}X′C′ ∼ W_{p−1}(CΣC′, r(D′F°)),
CSC′ + CXP_{D′(DD′)^{−1}K}X′C′ ∼ W_{p−1}(CΣC′, N − r(D) + r(K)).

Then λ^{2/N} ∼ Λ(p − 1, N − r(D) + r(K), r(D′F°)).

Proof. Equivalent expressions for the restrictions in both hypotheses can be written as

H_3: CM = 0 ⇔ M = (C′)°θ,
A_3: CMF = 0 ⇔ M = (C′)°θ_1 + C′θ_2F°′.

If we plug these solutions into the model given in (2), then

H_3: X = (C′)°θD + E,
A_3: X = (C′)°θ_1D + C′θ_2F°′D + E.

We will first look at the null hypothesis. Under H3:

|(X − (C′)°θD)(X − (C′)°θD)′| = |(XP_{D′} − (C′)°θD)( )′ + X(I − P_{D′})X′|,     (27)

where I = P_{D′} + (I − P_{D′}) has been inserted. We cannot simply say that (27) ≥ |X(I − P_{D′})X′| with equality if and only if XP_{D′} = (C′)°θD, because XP_{D′} = (C′)°θD is not necessarily a consistent equation. Recall the two conditions for consistency from Theorem 2.2:

(i) C(PD′X ′) ⊆ C(D′) is satisfied.

(ii) C(XPD′) ⊆ C((C ′)◦), which is not necessarily true.

Thus, we need some further steps.

(27) = |(XP_{D′} − (C′)°θD)( )′ + S| = |S||S^{−1}(XP_{D′} − (C′)°θD)( )′ + I|
     = |S||I + (XP_{D′} − (C′)°θD)′S^{−1}(XP_{D′} − (C′)°θD)|.


Let G = XP_{D′} − (C′)°θD. Using Theorem 2.3,

(27) = |S||I + G′S^{−1}(C′)°((C′)°′S^{−1}(C′)°)^{−}(C′)°′S^{−1}G + G′C′(CSC′)^{−}CG|
     ≥ |S||I + G′C′(CSC′)^{−}CG|,     (28)

which is independent of θ since

CG = C(XP_{D′} − (C′)°θD) = CXP_{D′} − C(C′)°θD = CXP_{D′},

because C(C′)° = 0. Equality holds if and only if

G′S^{−1}(C′)°((C′)°′S^{−1}(C′)°)^{−}(C′)°′S^{−1}G = 0,

which is equivalent to

G′S^{−1}(C′)°((C′)°′S^{−1}(C′)°)^{−} = 0.

Thus, the lower bound we were seeking in (28), which equals |S||I + G′C′(CSC′)^{−}CG| = |S||I + P_{D′}X′C′(CSC′)^{−}CXP_{D′}|, has been obtained. The situation for the third hypothesis is similar to the level hypothesis, where we have CMF = 0 as the alternative hypothesis. We assume that the profiles are parallel, and the test is conducted to see whether they are flat. The restrictions for this test can be summarised as CM = 0. Due to the assumption that parallelism holds, the alternative hypothesis becomes CMF = 0. Then (9) from Section 3.2.1, where the likelihood for CMF = 0 has been derived, will be used for |NΣ_{A3}|. As a result,

|NΣ_{H3}| = |S||I + P_{D′}X′C′(CSC′)^{−}CXP_{D′}|,
|NΣ_{A3}| = |S + P′_{C′,S^{−1}}XP_{D′(DD′)^{−1}K}X′P_{C′,S^{−1}}|.

In order to get a familiar structure for the ratio of these two quantities, we need to rewrite |NΣ_{H3}|. Use the rotation given by Theorem 2.4, the idempotency P_{D′}P_{D′} = P_{D′}, and the identity I = SS^{−1}:

|NΣ_{H3}| = |S||I + P_{D′}X′C′(CSC′)^{−}CSS^{−1}XP_{D′}|     (rotate)
          = |S||I + XP_{D′}X′C′(CSC′)^{−}CSS^{−1}|,

where C′(CSC′)^{−}CS = P_{C′,S^{−1}}. Since P_{C′,S^{−1}} = P²_{C′,S^{−1}}, rotating once more gives

          = |S||I + P_{C′,S^{−1}}S^{−1}XP_{D′}X′P_{C′,S^{−1}}|
          = |S||I + C′(CSC′)^{−}C SS^{−1} XP_{D′}X′P_{C′,S^{−1}}|     (SS^{−1} = I)
          = |S + SC′(CSC′)^{−}CXP_{D′}X′P_{C′,S^{−1}}|
          = |S + P′_{C′,S^{−1}}XP_{D′}X′P_{C′,S^{−1}}|.

Hence,

|NΣ_{H3}| = |S + P′_{C′,S^{−1}}XP_{D′}X′P_{C′,S^{−1}}|,
|NΣ_{A3}| = |S + P′_{C′,S^{−1}}XP_{D′(DD′)^{−1}K}X′P_{C′,S^{−1}}|.


We already know that XP_{D′}X′ is Wishart distributed (see Theorem 2.7), but P′_{C′,S^{−1}}XP_{D′}X′P_{C′,S^{−1}} is not. Similarly, XP_{D′(DD′)^{−1}K}X′ is Wishart distributed, but P′_{C′,S^{−1}}XP_{D′(DD′)^{−1}K}X′P_{C′,S^{−1}} is not. Let H = (C′, S^{−1}(C′)°), which is of full rank:

λ^{2/N} = |H′Σ_{A3}H| / |H′Σ_{H3}H|.

We start with calculating |H′Σ_{H3}H|:

|H′Σ_{H3}H| = |H′(S + P′_{C′,S^{−1}}XP_{D′}X′P_{C′,S^{−1}})H| = | V11  V12 |
                                                                | V21  V22 |,

where

V11 = C(S + P′_{C′,S^{−1}}XP_{D′}X′P_{C′,S^{−1}})C′,
V12 = C(S + P′_{C′,S^{−1}}XP_{D′}X′P_{C′,S^{−1}})S^{−1}(C′)°,
V21 = (C′)°′S^{−1}(S + P′_{C′,S^{−1}}XP_{D′}X′P_{C′,S^{−1}})C′,
V22 = (C′)°′S^{−1}(S + P′_{C′,S^{−1}}XP_{D′}X′P_{C′,S^{−1}})S^{−1}(C′)°.

Let's check V12:

V12 = CSS^{−1}(C′)° + CP′_{C′,S^{−1}}XP_{D′}X′P_{C′,S^{−1}}S^{−1}(C′)°
    = C(C′)° + CSC′(CSC′)^{−}CXP_{D′}X′C′(CSC′)^{−}CSS^{−1}(C′)° = 0,

since C(C′)° = 0 and CSS^{−1}(C′)° = C(C′)° = 0.

Notice that V21 = V′12 = 0. Let's calculate the other elements:

V11 = CSC′ + CSC′(CSC′)^{−}CXP_{D′}X′C′(CSC′)^{−}CSC′ = CSC′ + CXP_{D′}X′C′,
V22 = (C′)°′S^{−1}SS^{−1}(C′)° + (C′)°′S^{−1}SC′(CSC′)^{−}CXP_{D′}X′C′(CSC′)^{−}CSS^{−1}(C′)° = (C′)°′S^{−1}(C′)°.

If the established relations are put together:

|H′Σ_{H3}H| = | CSC′ + CXP_{D′}X′C′    0                  |
              | 0                       (C′)°′S^{−1}(C′)°  |.

The alternative hypothesis for the flatness test is the same as for the level test. Consequently, the corresponding likelihood has the same form, so H′Σ_{A2}H = H′Σ_{A3}H. Notice that the matrix H used during the level test is the same as the matrix H introduced here. Then

|H′Σ_{A3}H| = |CSC′ + CXP_{D′(DD′)^{−1}K}X′C′||(C′)°′S^{−1}(C′)°|.


Thus, the ratio becomes

λ^{2/N} = |H′Σ_{A3}H| / |H′Σ_{H3}H|
        = |CSC′ + CXP_{D′(DD′)^{−1}K}X′C′||(C′)°′S^{−1}(C′)°| / (|CSC′ + CXP_{D′}X′C′||(C′)°′S^{−1}(C′)°|)
        = |CSC′ + CXP_{D′(DD′)^{−1}K}X′C′| / |CSC′ + CXP_{D′}X′C′|,

where CSC′, CXP_{D′(DD′)^{−1}K}X′C′ and CXP_{D′}X′C′ are all Wishart distributed. However, we are trying to find a structure |U| / |U + V|, where U and V are independently Wishart distributed. Recall the space decomposition given in Equation (8) for the parallelism hypothesis. It implies

I − P_{D′F°} = I − P_{D′} + P_{D′(DD′)^{−1}K},
P_{D′} = P_{D′(DD′)^{−1}K} + P_{D′F°}.

Then we can write the ratio as

λ^{2/N} = |H′Σ_{A3}H| / |H′Σ_{H3}H|
        = |CSC′ + CXP_{D′(DD′)^{−1}K}X′C′| / |CSC′ + CX(P_{D′(DD′)^{−1}K} + P_{D′F°})X′C′|
        = |CSC′ + CXP_{D′(DD′)^{−1}K}X′C′| / |CSC′ + CXP_{D′(DD′)^{−1}K}X′C′ + CXP_{D′F°}X′C′|.

Since X is normally distributed and P_{D′F°} is idempotent, Theorem 2.7 gives XP_{D′F°}X′ ∼ W_p(Σ, r(D′F°)), and then, by Theorem 2.8, CXP_{D′F°}X′C′ ∼ W_{p−1}(CΣC′, r(D′F°)).

We already know that CSC′ ∼ W_{p−1}(CΣC′, N − r(D)) and, by Theorem 2.6, the sum of two independently distributed Wishart matrices with the same scale matrix is again Wishart. Then

CSC′ + CXP_{D′(DD′)^{−1}K}X′C′ ∼ W_{p−1}(CΣC′, N − r(D) + r(K)).

Using the result given in Appendix A, Result A1,

λ^{2/N} ∼ Λ(p − 1, N − r(D) + r(K), r(D′F°)).

4 High-dimensional setting

4.1 Background

The classical setting for data analysis usually consists of a large number of experimental units and a small number of variables. For estimability reasons, the number of data points, n, needs to be larger than the number of parameters, p. Asymptotic properties have been derived in the classical setting; theorems such as the Law of Large Numbers and the Central Limit Theorem focus on the case where p is fixed and n → ∞.


In recent years, due to the development of information technology and data storage, the direction of the relationship between p and n has started to change. We face more research questions where we have more parameters than data points:

p > n or p ≫ n.     (29)

Here we can mention different types of asymptotics as n → ∞:

(i) p/n → c, where c ∈ (a, b),
(ii) p/n → ∞.

Classical multivariate approaches fail when the dimension of the repeated measurements, p, starts to exceed the number of observations, n. The sample covariance matrix becomes singular; consequently, the likelihood ratio tests are not well defined. As a result, classical tests are not feasible in the high-dimensional setting, and we need to investigate and extend the current approaches within the high-dimensional framework. Our focus in this report is not on any specific type of asymptotics mentioned above: p and n will be fixed, and they will satisfy the condition given by (29).

Ledoit and Wolf (2002) derived hypothesis tests for the covariance matrix in a high-dimensional setting. Srivastava (2005) also developed tests for certain hypotheses on the covariance matrix in high dimensions. Srivastava and Fujikoshi (2006), Srivastava (2007) and Srivastava and Du (2008) are other examples in the multivariate area. Kollo, von Rosen and von Rosen (2011) focused on estimating the parameters describing the mean structure in the Growth Curve model. Testing for the mean matrix in a Growth Curve model in high dimensions was studied by Srivastava and Singull (2017) as well. Fujikoshi, Ulyanov and Shimizu (2010) focus on high-dimensional and large-sample approximations for multivariate statistics.

The focus in this report is on high-dimensional profile analysis. Onozawa, Nishiyama and Seo (2016) derived test statistics for profile analysis with unequal covariance matrices in high dimensions. Similarly, Harrar and Kong (2016) worked on this topic. Shutoh and Takahashi (2016) proposed new test statistics in profile analysis with high-dimensional data by using the Cauchy-Schwarz inequality. All these references study the asymptotic distributions of the test statistics: they introduce different high-dimensional asymptotic frameworks and derive the test statistics in profile analysis under these frameworks. Our approach is different from the approaches mentioned above. As noted before, we do not focus on the asymptotic distributions of the test statistics. In this report, fixed p and n are of interest, and the method that is going to be used is introduced in the following chapter.

4.2 Dimension reduction using scores and spherical distributions

Lauter (1996, 2016) and Lauter, Glimm and Kropf (1996, 1998) proposed a new method for dealing with the problem that arises in high-dimensional settings. The tests they proposed are based on linear scores, which are obtained by using score coefficients determined from the data via sums of products matrices. These scores are basically linear combinations of the repeated measures, and the coefficients by which the combinations are chosen


are called score coefficients or weights. With this approach, high-dimensional observations are compressed into low-dimensional scores, which are then used for the analysis instead of the original data. This approach can be useful in many situations, because we often do not have knowledge of the effect of each single variable, or one may want to investigate the joint effect of several variables.

Let's give the mathematical representation of the theory. Suppose

x = (x_i) ∼ N_p(µ, Σ)

and that n individual p-dimensional vectors form the p × n matrix X which satisfies

X = (x_{ij}) ∼ N_{p×n}(µ1′_n, Σ, I_n).

Consider a single score

z′ = (z_1, z_2, ..., z_n) = (d_1, d_2, ..., d_p)X = d′X,

where d is the vector of weights and the z_j, j = 1, ..., n, are the individual scores. The rule for choosing the vector d of coefficients is that it has to be a unique function of XX′, the p × p matrix of sums of products. Moreover, the condition d′X ≠ 0 with probability 1 needs to be satisfied. The total sums of products matrix XX′ corresponds to the hypothesis µ = 0; consequently, the structure of the function can change based on the hypothesis. We will try to illustrate the idea with two primary theorems presented in Lauter, Glimm and Kropf (1996).

Theorem 4.1. (Lauter et al., 1996) Assume that X is a p × n matrix consisting of n p-dimensional observations (p ≥ 1, n ≥ 2) that follows the normal distribution X ∼ N_{p×n}(0, Σ, I_n). Define a p-dimensional vector of weights d which is a function of XX′ and assume d′X ≠ 0 with probability 1. Then

t = √n z̄ / s_z     (30)

has the exact t distribution with n − 1 degrees of freedom, where

z′ = (z_j)′ = d′X,   z̄ = (1/n)z′1_n,   s²_z = (1/(n − 1))(z′z − nz̄²).
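As an illustration, Theorem 4.1 can be turned into a small numerical procedure. The sketch below uses the standardized-sum choice of weights, d_i = ((XX′)_{ii})^{−1/2}, which is one admissible function of XX′ from Lauter et al. (1996); the function name and the simulated data are our own:

```python
import numpy as np

def lauter_t(X):
    """Exact score-based t statistic of Theorem 4.1 (sketch).

    X is a (p, n) data matrix. The weights d depend on the data only
    through XX' (here d_i = 1/sqrt((XX')_{ii}), the standardized-sum
    choice); under mu = 0 the statistic is exactly t with n - 1 df.
    """
    p, n = X.shape
    W = X @ X.T                          # p x p sums-of-products matrix
    d = 1.0 / np.sqrt(np.diag(W))        # a function of XX' only
    z = d @ X                            # individual scores z_1, ..., z_n
    zbar = z.mean()
    s2 = (z @ z - n * zbar**2) / (n - 1)
    return np.sqrt(n) * zbar / np.sqrt(s2)

rng = np.random.default_rng(0)
t = lauter_t(rng.standard_normal((50, 10)))   # well defined although p = 50 > n = 10
```

Note that only the n individual scores enter the t ratio, so the statistic is well defined even when p exceeds n.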

Theorem 4.2. (Lauter et al., 1996) Assume that H ∼ W_p(Σ, m) and G ∼ W_p(Σ, f) and that they are independently distributed. Define a p-dimensional vector of weights d which is a function of H + G and assume d′(H + G)d ≠ 0 with probability 1. Then

F = (f/m)(d′Hd)/(d′Gd)

follows an F-distribution with m and f degrees of freedom.
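A minimal sketch of Theorem 4.2: H and G are generated from normal matrices (so that they are Wishart with a common Σ), and diagonal-based weights serve as one admissible function of H + G. All sizes are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)
p, m, f = 40, 3, 12                      # note p > m + f: a high-dimensional case
L = rng.standard_normal((p, p)) / np.sqrt(p)   # Sigma = LL', an arbitrary scale

A = L @ rng.standard_normal((p, m))      # H = AA' ~ W_p(Sigma, m)
B = L @ rng.standard_normal((p, f))      # G = BB' ~ W_p(Sigma, f), independent of H
H, G = A @ A.T, B @ B.T

d = 1.0 / np.sqrt(np.diag(H + G))        # weights: a function of H + G only
F = (f / m) * (d @ H @ d) / (d @ G @ d)  # ~ F(m, f) by Theorem 4.2
```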

The idea behind these theorems is based on the theory of spherical distributions, which has been treated extensively in the book by Fang and Zhang (1990). Elliptically contoured distributions can be considered as a generalization of the class of Gaussian distributions, which has been the centre of multivariate theory. Normality is assumed for many testing problems, but in practice this is often not true. Thus, there has been an effort to extend the class of normal distributions to a wider class which still keeps the basic properties of the normal distribution.

We know that if z ∼ N_n(0, I_n), the statistic t given in (30) has a t-distribution with n − 1 degrees of freedom, so we need to show a connection between z and the standard normal distribution. Since the normal distribution is in the class of spherical distributions, if one can show that z is spherically distributed, then this connection is provided. We also need to show that the test statistics' distributions remain the same when we use spherically distributed random vectors. These ideas are given by a corollary and a theorem, among others, by Fang and Zhang (1990).

Corollary 4.1. (Fang and Zhang, 1990) An n × 1 random vector x is spherically distributed if and only if, for every n × n orthogonal matrix Γ, x =_d Γx.

Theorem 4.3. (Fang and Zhang, 1990) A statistic t(x)'s distribution remains the same whenever x ∼ S_n^+(φ) if t(αx) =_d t(x) for each α > 0 and each x ∼ S_n^+(·), where S_n(φ) denotes the spherical distribution with parameter φ; φ(·) is a function of a scalar variable and is called the characteristic generator of the spherical distribution. If x ∼ S_n(φ) and P(x = 0) = 0, this is denoted by x ∼ S_n^+(φ).
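The condition of Theorem 4.3 is easy to verify for the score statistic (30), since its numerator and denominator are both homogeneous of degree one in the scores. A minimal numerical sketch of the invariance t(αz) = t(z), with α > 0 chosen arbitrarily:

```python
import numpy as np

def t_stat(z):
    """Score statistic (30) computed from a vector of scores z."""
    n = z.size
    zbar = z.mean()
    s2 = (z @ z - n * zbar**2) / (n - 1)
    return np.sqrt(n) * zbar / np.sqrt(s2)

rng = np.random.default_rng(2)
z = rng.standard_normal(15)
same = np.isclose(t_stat(z), t_stat(7.3 * z))   # t(alpha * z) = t(z) for alpha > 0
print(same)                                      # True
```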

4.3 The derivation of the tests in the high-dimensional setting

We have derived the three hypothesis tests for the classical setting where N > p. Now we will focus on the high-dimensional setting where p > N or p ≫ N. First we construct the scores and then derive the likelihood ratio tests based on these scores. One should notice that we use capital N for the total sample size when there are groups, whose sizes may differ from each other, as in the parts related to profile analysis: there are q groups with group sizes n_k, k = 1, ..., q, and therefore N = Σ_{k=1}^{q} n_k. In Chapters 4.1 and 4.2, where the general introduction is given, the total sample size is denoted by n, since there is one group in the analysis or theory.

4.3.1 Parallelism Hypothesis

Recall (3) and (4) from Section 3.2.1,

H1 : E(X) = MD, CMF = 0,

A1 : E(X) = MD, CMF ≠ 0

and

LR = |Σ_{A1}| / |Σ_{H1}| = |CSC′| / |CSC′ + CXP_{D′(DD′)^{−1}K}X′C′|,

where

CSC′ ∼ W_{p−1}(CΣC′, N − r(D)),
CXP_{D′(DD′)^{−1}K}X′C′ ∼ W_{p−1}(CΣC′, r(K)).     (31)


In the beginning it was assumed that X ∼ N_{p,N}(MD, Σ, I_N). If we multiply X by C, then CX ∼ N_{(p−1),N}(CMD, CΣC′, I_N). As can be seen throughout the derivation of the statistics in (31), X appears together with C. Let Y = CX. Then

CSC′ = CX(I − P_{D′})X′C′ = Y(I − P_{D′})Y′ ∼ W_{p−1}(CΣC′, N − r(D)),
CXP_{D′(DD′)^{−1}K}X′C′ = YP_{D′(DD′)^{−1}K}Y′ ∼ W_{p−1}(CΣC′, r(K)).

Instead of applying the vector d to X, we apply it to Y. Notice that d is a (p − 1) × 1 vector. When we multiply Y by d′ from the left (and Y′ by d from the right), Y is reduced to a vector, which we call the score vector and denote by z, that is, z′ = d′Y:

λ^{2/N} = d′Y(I − P_{D′})Y′d / (d′Y(I − P_{D′})Y′d + d′YP_{D′(DD′)^{−1}K}Y′d)
        = z′(I − P_{D′})z / (z′(I − P_{D′})z + z′P_{D′(DD′)^{−1}K}z).     (32)
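In code, (32) only needs the reduced data Y = CX, a weight vector d that is a function of YY′, and the two projectors. A minimal sketch; the function name is ours, and the standardized-sum weights are just one admissible choice:

```python
import numpy as np

def score_ratio(Y, P_D, P_1):
    """lambda^{2/N} of (32) computed from scores z' = d'Y.

    Y   : (p-1) x N reduced data matrix Y = CX,
    P_D : N x N projector playing the role of P_{D'},
    P_1 : N x N projector playing the role of P_{D'(DD')^{-1}K}.
    The weights d depend on Y only through YY' (standardized sums here).
    """
    N = Y.shape[1]
    d = 1.0 / np.sqrt(np.diag(Y @ Y.T))   # a function of YY' only
    z = d @ Y                             # score vector
    num = z @ (np.eye(N) - P_D) @ z
    return num / (num + z @ P_1 @ z)
```

Since both quadratic forms are nonnegative, the ratio always lies in [0, 1], with small values speaking against the null hypothesis.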

Now we are going to find the distribution of this ratio.

Theorem 4.4. The ratio given in (32) follows the Wilks' lambda distribution with parameters 1, N − r(D) and r(K), denoted by Λ(1, N − r(D), r(K)), which is equivalent to B((N − r(D))/2, r(K)/2), where B(·, ·) denotes the Beta distribution.

Proof. First, we should note that since d is a function of YY′,

d′Y(I − P_{D′})Y′d is not distributed as W_1(d′CΣC′d, N − r(D)),
d′YP_{D′(DD′)^{−1}K}Y′d is not distributed as W_1(d′CΣC′d, r(K)),

which means that we cannot find the distribution of the ratio directly. This is where we need the theory of spherical distributions. We begin by showing that the scores are spherically distributed; to show this, we first have to consider whether Y is spherically distributed. Let Y_Γ = YΓ for an orthogonal Γ. Then

D(Y_Γ) = (Γ′Γ) ⊗ Σ = I ⊗ Σ, but E(Y_Γ) ≠ E(Y).

Thus, Y is not spherically distributed. Therefore, we need to adapt the test statistic, without changing its overall value, in order to achieve sphericity.

Recall that the model under the null hypothesis is

X = (C′)°θ_1D + C′θ_2F°′D + E,

and accordingly the mean under the null hypothesis equals

(C′)°θ_1D + C′θ_2F°′D.

We subtract the mean under the null hypothesis from X. Then the expressions in the ratio given by (32) become as follows:

(i) d′C[X − E(X)](I − P_{D′})[X − E(X)]′C′d
= d′C[X − (C′)°θ_1D − C′θ_2F°′D](I − P_{D′})[X − (C′)°θ_1D − C′θ_2F°′D]′C′d
= d′CX(I − P_{D′})X′C′d = d′Y(I − P_{D′})Y′d;

(ii) d′C[X − E(X)]P_{D′(DD′)^{−1}K}[X − E(X)]′C′d
= d′C[X − (C′)°θ_1D − C′θ_2F°′D]P_{D′(DD′)^{−1}K}[X − (C′)°θ_1D − C′θ_2F°′D]′C′d
= d′[CX − C(C′)°θ_1D − CC′θ_2F°′D]P_{D′(DD′)^{−1}K}[CX − C(C′)°θ_1D − CC′θ_2F°′D]′d,

where C(C′)° = 0.

Let's look at the middle part:

CC′θ_2F°′DP_{D′(DD′)^{−1}K}
= CC′θ_2F°′DD′(DD′)^{−1}K[(D′(DD′)^{−1}K)′(D′(DD′)^{−1}K)]^{−}(D′(DD′)^{−1}K)′,     (33)

where DD′(DD′)^{−1} = I. Recall that C(K) = C(D) ∩ C(F) and that K can be written as K = D(D′F°)°. Then

(33) = CC′θ_2F°′D(D′F°)°[(D′(DD′)^{−1}K)′(D′(DD′)^{−1}K)]^{−}(D′(DD′)^{−1}K)′ = 0,

since F°′D(D′F°)° = (D′F°)′(D′F°)° = 0.

Then

(ii) = d′CXP_{D′(DD′)^{−1}K}X′C′d = d′YP_{D′(DD′)^{−1}K}Y′d.

This means that we can subtract the mean from X and the expression of the likelihood ratio remains the same, so the distribution also remains the same:

λ^{2/N} = d′Y(I − P_{D′})Y′d / (d′Y(I − P_{D′})Y′d + d′YP_{D′(DD′)^{−1}K}Y′d)
        = d′[Y − CE(X)](I − P_{D′})[Y − CE(X)]′d
          / (d′[Y − CE(X)](I − P_{D′})[Y − CE(X)]′d + d′[Y − CE(X)]P_{D′(DD′)^{−1}K}[Y − CE(X)]′d).

If we denote this new variable, Y − CE(X), again by Y, then

λ^{2/N} = d′Y(I − P_{D′})Y′d / (d′Y(I − P_{D′})Y′d + d′YP_{D′(DD′)^{−1}K}Y′d).

The reason why we needed the adaptation of Y was that it was not spherically distributed. Now, without changing the statistic overall, we have achieved a new variable Y which is spherically distributed. To show this:

E(Y) = E(Y) − CE(X) = 0,
D(Y) = D(Y).


We showed that the new Y is distributed with mean 0 and with the same covariance as the original Y. Define Y_Γ = YΓ, where Γ is an N × N orthogonal matrix; the subscript Γ means multiplication by Γ from the right-hand side. Then

E(Y_Γ) = E(YΓ) = 0,
D(Y_Γ) = (Γ′Γ) ⊗ Σ = I ⊗ Σ.

This proves that Y_Γ and Y follow the same (p − 1) × N normal distribution. Now we can show that the scores, z′ = d′Y, are spherically distributed. First, define z′_Γ = d′_ΓY_Γ. Notice that d_Γ = d, since they are derived from the same matrix:

Y_ΓY′_Γ = YΓ(YΓ)′ = YΓΓ′Y′ = YY′.

In Chapter 4.2, it was stated that d needs to be a unique function of the total sums of products matrix; the derivation above is based on this fact. Then

z′_Γ = d′YΓ = z′Γ.     (34)

From this result, we can say that the vectors z and z_Γ follow the same distribution, which is spherical. It is well known that every spherically distributed random vector x has a stochastic representation x =_d Ru^{(n)}, where u^{(n)} is a uniformly distributed random vector and R ≥ 0 is independent of u^{(n)}. Furthermore, if x ∼ S_n(φ), then x =_d Rw, where w ∼ N_n(0, I_n) is independent of R ≥ 0. Lastly, recall Theorem 2.5.8 from Fang and Zhang (1990), which is also given in Chapter 4.2 as Theorem 4.3. Now we can work on the ratio λ^{2/N} and implement these results:

f(z) = λ^{2/N} = z′(I − P_{D′})z / (z′(I − P_{D′})z + z′P_{D′(DD′)^{−1}K}z)
=_d (Rw)′(I − P_{D′})(Rw) / ((Rw)′(I − P_{D′})(Rw) + (Rw)′P_{D′(DD′)^{−1}K}(Rw))
= R²w′(I − P_{D′})w / (R²(w′(I − P_{D′})w + w′P_{D′(DD′)^{−1}K}w)),

in which R² cancels, so the ratio does not depend on R. The condition of the theorem from Fang and Zhang is satisfied. Thus, we can conclude that

λ^{2/N} = z′(I − P_{D′})z / (z′(I − P_{D′})z + z′P_{D′(DD′)^{−1}K}z)
∼ Λ(1, N − r(D), r(K)) ≡ B((N − r(D))/2, r(K)/2).     (35)

Wilks' lambda distribution can be written as the product of independently distributed Beta variables (Mardia and Kent, 1979; Lauter, 2016), that is, Λ(p, m, n) ∼ ∏_{i=1}^{p} u_i, where u_1, u_2, ..., u_p are p independent random variables following the Beta distribution, u_i ∼ B((m − i + 1)/2, n/2), i = 1, ..., p. Notice that we have one-dimensional Wishart distributed expressions in the ratio given by (32), which means p = 1. Then the overall ratio reduces to the single Beta distribution given at the end of the proof in (35).
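The Beta form in (35) can be checked by Monte Carlo: with a spherical (here standard normal) vector and two orthogonal projectors of ranks N − r(D) and r(K) onto mutually orthogonal subspaces, the ratio should behave like B((N − r(D))/2, r(K)/2). A minimal sketch with hypothetical sizes:

```python
import numpy as np

rng = np.random.default_rng(3)
N, rD, rK = 20, 2, 1                       # hypothetical N, r(D), r(K)

# Orthogonal projectors built from one orthonormal basis of R^N:
Q, _ = np.linalg.qr(rng.standard_normal((N, N)))
P_res = Q[:, :N - rD] @ Q[:, :N - rD].T                        # role of I - P_{D'}
P_hyp = Q[:, N - rD:N - rD + rK] @ Q[:, N - rD:N - rD + rK].T  # role of P_{D'(DD')^{-1}K}

w = rng.standard_normal((N, 100_000))      # spherical vectors as columns
num = np.einsum('ij,ij->j', w, P_res @ w)
ratio = num / (num + np.einsum('ij,ij->j', w, P_hyp @ w))

a, b = (N - rD) / 2, rK / 2
print(abs(ratio.mean() - a / (a + b)) < 1e-2)   # sample mean matches the Beta mean a/(a+b)
```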


4.3.2 Level hypothesis

First, let's recall how the null and the alternative hypotheses looked in the normal setting:

H_2|H_1: E(X) = MD, MF = 0,
A_2|H_1: E(X) = MD, CMF = 0,

and the likelihood ratio was

LR = |(C′)°′S^{−1}(C′)°|^{−1}
   / |((C′)°′S^{−1}(C′)°)^{−1} + ((C′)°′S^{−1}(C′)°)^{−1}(C′)°′S^{−1}XD′(DD′)^{−1}KQ^{−1}K′(DD′)^{−1}DX′S^{−1}(C′)°((C′)°′S^{−1}(C′)°)^{−1}|.     (36)

For this hypothesis, it is clear that the expressions in the likelihood ratio are already one-dimensional. The contrast matrix was given explicitly in Chapter 3.1. It is a (p − 1) × p matrix, and

C: (p − 1) × p  ⇒  C′: p × (p − 1)  ⇒  (C′)°: p × 1  ⇒  (C′)°′: 1 × p.

However, dimension reduction is needed due to the degrees of freedom in the Wishart distribution. Recall that

((C′)°′S^{−1}(C′)°)^{−1} ∼ W_1(((C′)°′Σ^{−1}(C′)°)^{−1}, N − r(D) − p + 1).

When p > N, the degrees of freedom would become negative, which cannot occur. Moreover, S^{−1} does not exist in this case. Thus, we need to take care of these issues. One can notice that we have a more complicated situation here, because the expressions in the ratio are more complex than for the parallelism and flatness hypotheses. To solve the problem, we expand ((C′)°′S^{−1}(C′)°)^{−1} and ((C′)°′S^{−1}(C′)°)^{−1}(C′)°′S^{−1}X. The expansions will be investigated in two parts, (i) and (ii), respectively:

(i) We start with ((C′)°′S^{−1}(C′)°)^{−1}:

((C′)°′S^{−1}(C′)°)^{−1} = ((C′)°′Σ^{−1}(C′)°)^{−1}(C′)°′Σ^{−1}(C′)°((C′)°′S^{−1}(C′)°)^{−1}(C′)°′Σ^{−1}(C′)°((C′)°′Σ^{−1}(C′)°)^{−1}
= ((C′)°′Σ^{−1}(C′)°)^{−1}(C′)°′Σ^{−1}[S − SC′(CSC′)^{−1}CS]Σ^{−1}(C′)°((C′)°′Σ^{−1}(C′)°)^{−1}.     (37)

Recall that S = X(I − P_{D′})X′. Then (37) becomes

((C′)°′Σ^{−1}(C′)°)^{−1}(C′)°′Σ^{−1}X[(I − P_{D′})(I − X′C′(CX(I − P_{D′})X′C′)^{−1}CX(I − P_{D′}))]X′Σ^{−1}(C′)°((C′)°′Σ^{−1}(C′)°)^{−1}.

Apply the d vector to CX, i.e., d′CX:

((C′)◦′Σ−1(C′)◦)−1(C′)◦′Σ−1X[(I − PD′)(I − X′C′d(d′CX(I − PD′)X′C′d)−1d′CX(I − PD′))]X′Σ−1(C′)◦((C′)◦′Σ−1(C′)◦)−1. (38)


When we check (38), we can notice the structure, say LXPX′L′, where P is an idempotent matrix. This was the aim from the beginning, and it will allow us to give the exact distribution of (38). Notice that (C′)◦′Σ−1X and CX are independent, which is proved in Appendix A, Result A3.

It is assumed that d is a function of CX. Therefore,

((C′)◦′Σ−1(C′)◦)−1(C′)◦′Σ−1X[(I − PD′)(I − X′C′d(d′CX(I − PD′)X′C′d)−1d′CX(I − PD′))] (39)

is normally distributed given CX, and its square is Wishart distributed. Let us check the rank of the idempotent matrix given below to determine the degrees of freedom of the Wishart distribution. For the detailed calculations, see Appendix B, Result B1:

r[(IN − PD′)(IN − X′C′d(d′CX(I − PD′)X′C′d)−1d′CX(I − PD′))] = N − r(D) − 1.

The degrees of freedom of the Wishart distribution for (38) have now been found. Then, the covariance matrix needs to be obtained. From Theorem 2.12,

D[vec(((C′)◦′Σ−1(C′)◦)−1(C′)◦′Σ−1X)] = I ⊗ ((C′)◦′Σ−1(C′)◦)−1(C′)◦′Σ−1ΣΣ−1(C′)◦((C′)◦′Σ−1(C′)◦)−1 = I ⊗ ((C′)◦′Σ−1(C′)◦)−1. (40)

We have examined the first expression in the ratio given by (36). Dimension reduction has been applied with the help of the vector d after using the necessary expansions. Then the distribution of this expression, that is (38), has been found.

We continue with the second expression which appears in the denominator of (36).

(ii) Let us start with the first part, which is ((C′)◦′S−1(C′)◦)−1(C′)◦′S−1X. Recall also the relations (C′)◦((C′)◦′S−1(C′)◦)−1(C′)◦′S−1 = I − SC′(CSC′)−1C and S = X(I − PD′)X′:

((C′)◦′S−1(C′)◦)−1(C′)◦′S−1X
= ((C′)◦′Σ−1(C′)◦)−1(C′)◦′Σ−1(C′)◦((C′)◦′S−1(C′)◦)−1(C′)◦′S−1X
= ((C′)◦′Σ−1(C′)◦)−1(C′)◦′Σ−1X[I − (I − PD′)X′C′(CX(I − PD′)X′C′)−1CX].

Apply the d vector to CX, i.e., d′CX and X ′C ′d:

((C′)◦′Σ−1(C′)◦)−1(C′)◦′Σ−1X[I − (I − PD′)X′C′d(d′CX(I − PD′)X′C′d)−1d′CX]. (41)

Notice that (C′)◦′Σ−1X and CX are independent, which has been proven before, and so are (C′)◦′Σ−1X and d′CX. The first part of the second expression in the denominator of the likelihood ratio given by (36) has now been investigated. The second part is D′(DD′)−1KQ−1K′(DD′)−1D. It is apparent that Q in this expression, which was first introduced in Theorem 3.2, also contains CX, for which we need to apply the dimension reduction. Recall the matrix Q:

Q = K′(DD′)−1K + K′(DD′)−1DX′C′(CX(I − PD′)X′C′)−CXD′(DD′)−1K.


Apply d′ to CX, which yields

Q = K′(DD′)−1K + K′(DD′)−1DX′C′d(d′CX(I − PD′)X′C′d)−d′CXD′(DD′)−1K.

Then, after the dimension reduction, the second expression in the denominator of the likelihood ratio given by (36) becomes

((C′)◦′Σ−1(C′)◦)−1(C′)◦′Σ−1X[I − (I − PD′)X′C′d(d′CX(I − PD′)X′C′d)−1d′CX]D′(DD′)−1KQ−1K′(DD′)−1D[I − X′C′d(d′CX(I − PD′)X′C′d)−1d′CX(I − PD′)]X′Σ−1(C′)◦((C′)◦′Σ−1(C′)◦)−1. (42)

In (42),

[I − (I − PD′)X′C′d(d′CX(I − PD′)X′C′d)−1d′CX]D′(DD′)−1KQ−1K′(DD′)−1D[I − X′C′d(d′CX(I − PD′)X′C′d)−1d′CX(I − PD′)], (43)

which corresponds to the part that lies between X and X′, is an idempotent matrix. For the details, see Appendix B, Result B2. This means that for (42) the structure LXPX′L′, where P is an idempotent matrix, has been achieved. We just need to show that (C′)◦′Σ−1X and CX are independent, which has already been shown in this chapter for (i). Since d is a function of CX and Q also depends on d′CX, one can conclude that (C′)◦′Σ−1X is independent of what lies in the idempotent matrix given by (43). Thus, (42) is Wishart distributed. For the degrees of freedom of the Wishart distribution, one needs to check the rank of (43), which is equal to r(K), where K is any matrix satisfying C(K) = C(D) ∩ C(F), which was first defined in Theorem 3.1. For the proof, see Appendix B, Result B3.

For the scale matrix of the Wishart distribution for (42), see the following, which has also been calculated in (40):

D[vec(((C′)◦′Σ−1(C′)◦)−1(C′)◦′Σ−1X)] = I ⊗ ((C′)◦′Σ−1(C′)◦)−1.

Our aim was to find the distribution of the ratio given by (36) after the dimension reduction. The expressions in this ratio have been investigated separately in (i) and (ii) above. If the results from (i) and (ii) are put together, we can conclude the following:

((C′)◦′Σ−1(C′)◦)−1(C′)◦′Σ−1X[(I − PD′)(I − X′C′d(d′CX(I − PD′)X′C′d)−1d′CX(I − PD′))]X′Σ−1(C′)◦((C′)◦′Σ−1(C′)◦)−1 ∼ W1(((C′)◦′Σ−1(C′)◦)−1, N − r(D) − 1); (44)

((C′)◦′Σ−1(C′)◦)−1(C′)◦′Σ−1X[I − (I − PD′)X′C′d(d′CX(I − PD′)X′C′d)−1d′CX]D′(DD′)−1KQ−1K′(DD′)−1D[I − X′C′d(d′CX(I − PD′)X′C′d)−1d′CX(I − PD′)]X′Σ−1(C′)◦((C′)◦′Σ−1(C′)◦)−1 ∼ W1(((C′)◦′Σ−1(C′)◦)−1, r(K)). (45)


These expressions can be simplified in the following way:

(44) = ((C′)◦′Σ−1(C′)◦)−1(C′)◦′Σ−1[X(I − PD′)X′ − X(I − PD′)X′C′d(d′CX(I − PD′)X′C′d)−1d′CX(I − PD′)X′]Σ−1(C′)◦((C′)◦′Σ−1(C′)◦)−1
= ((C′)◦′Σ−1(C′)◦)−1(C′)◦′Σ−1[S − SC′d(d′CSC′d)−1d′CS]Σ−1(C′)◦((C′)◦′Σ−1(C′)◦)−1
= ((C′)◦′Σ−1(C′)◦)−1(C′)◦′Σ−1[S(I − PC′d,S−1)]Σ−1(C′)◦((C′)◦′Σ−1(C′)◦)−1.

Then

((C′)◦′Σ−1(C′)◦)−1(C′)◦′Σ−1[S(I − PC′d,S−1)]Σ−1(C′)◦((C′)◦′Σ−1(C′)◦)−1 ∼ W1(((C′)◦′Σ−1(C′)◦)−1, N − r(D) − 1).

Moreover

(45) = ((C′)◦′Σ−1(C′)◦)−1(C′)◦′Σ−1[X − SC′d(d′CSC′d)−1d′CX]D′(DD′)−1KQ−1K′(DD′)−1D[X′ − X′C′d(d′CSC′d)−1d′CS]Σ−1(C′)◦((C′)◦′Σ−1(C′)◦)−1
= ((C′)◦′Σ−1(C′)◦)−1(C′)◦′Σ−1(I − P′C′d,S−1)XD′(DD′)−1KQ−1K′(DD′)−1DX′(I − PC′d,S−1)Σ−1(C′)◦((C′)◦′Σ−1(C′)◦)−1
∼ W1(((C′)◦′Σ−1(C′)◦)−1, r(K)).

Denote (44) by UL and (45) by VL. Then the following theorem can be presented:

Theorem 4.5.

λ2/N = UL/(UL + VL) ∼ Λ(1, N − r(D) − 1, r(K)) ≡ B((N − r(D) − 1)/2, r(K)/2).
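Because UL and VL are independent one-dimensional Wisharts with a common scale, that scale cancels in the ratio. A quick Monte Carlo sketch illustrates Theorem 4.5; the values N = 30, r(D) = 3, r(K) = 1 (so f1 = N − r(D) − 1 = 26 and f2 = r(K) = 1) are hypothetical:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical degrees of freedom: N = 30, r(D) = 3, r(K) = 1, so
# f1 = N - r(D) - 1 = 26 and f2 = r(K) = 1, as in Theorem 4.5.
f1, f2, sigma2 = 26, 1, 2.5
reps = 100_000

# A one-dimensional Wishart W1(sigma2, f) is a scaled chi-square variable.
U = sigma2 * rng.chisquare(f1, reps)
V = sigma2 * rng.chisquare(f2, reps)
ratio = U / (U + V)

# The common scale sigma2 cancels, so ratio ~ B(f1/2, f2/2).
assert abs(ratio.mean() - f1 / (f1 + f2)) < 0.005
ks = stats.kstest(ratio, stats.beta(f1 / 2, f2 / 2).cdf)
print(ks.pvalue)
assert ks.pvalue > 1e-4
```

The same check applies verbatim to Theorem 4.6 below, with f1 = N − r(D) + r(K) and f2 = r(D′F◦).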

4.3.3 Flatness Hypothesis

We will follow an approach very similar to the parallelism hypothesis due to the similarities between the test statistics in the likelihood ratio. Recall the null and the alternative hypotheses:

H3|H1 : E(X) = MD, CM = 0,

A3|H1 : E(X) = MD, CMF = 0

and

LR = |ΣA3| / |ΣH3| = |CSC′ + CXPD′(DD′)−1KX′C′| / |CSC′ + CXPD′(DD′)−1KX′C′ + CXPD′F◦X′C′|,


where

CSC′ + CXPD′(DD′)−1KX′C′ ∼ Wp−1(CΣC′, N − r(D) + r(K)), (46)

CXPD′F◦X′C′ ∼ Wp−1(CΣC′, r(D′F◦)). (47)

Recall that CSC′ = CX(I − PD′)X′C′. Put Y = CX; since X ∼ Np,N(MD, Σ, IN), Y ∼ Np−1,N(CMD, CΣC′, IN). To reduce the dimension, we will apply the weight vector d to Y instead of the data matrix X, like we did for the parallelism hypothesis, and denote the new vector by z, where z′ = d′Y. Then

λ2/N = [d′Y(I − PD′)Y′d + d′YPD′(DD′)−1KY′d] / [d′Y(I − PD′)Y′d + d′YPD′(DD′)−1KY′d + d′YPD′F◦Y′d]
= [z′(I − PD′)z + z′PD′(DD′)−1Kz] / [z′(I − PD′)z + z′PD′(DD′)−1Kz + z′PD′F◦z]. (48)

The distribution of this ratio is presented in the next theorem.

Theorem 4.6. The ratio given in (48) follows Wilks' lambda distribution with parameters 1, N − r(D) + r(K) and r(D′F◦), which is denoted by Λ(1, N − r(D) + r(K), r(D′F◦)) and equals B((N − r(D) + r(K))/2, r(D′F◦)/2).

Proof. The situation with d being a function of YY′ is the same as with the parallelism hypothesis. As a result of this, d′Y is not normally distributed. Thus,

d′Y(I − PD′)Y′d + d′YPD′(DD′)−1KY′d ≁ W1(d′CΣC′d, N − r(D) + r(K)),

d′YPD′F◦Y′d ≁ W1(d′CΣC′d, r(D′F◦)).

Then we will use the related theorems on spherical distributions presented in Section 4.2. One can see that Y is not spherically distributed, because the orthogonally transformed matrix YΓ does not have the same distribution as Y, since E(YΓ) ≠ E(Y). Here again, as with the first hypothesis (the parallelism hypothesis), we will form a new variable by subtracting the mean under the null hypothesis from X. The statistics in the likelihood ratio will not change, and we will ensure sphericity for the scores by ensuring sphericity for the new variable.

Recall that the model under the null hypothesis equals the classical growth curve model (GMANOVA),

X = (C′)◦θD + E,

and the mean under the null hypothesis is given by (C′)◦θD.

After subtracting the mean from X, the ratio can be written

λ2/N = [d′[Y − CE(X)](I − PD′)[Y − CE(X)]′d + d′[Y − CE(X)]PD′(DD′)−1K[Y − CE(X)]′d] / [d′[Y − CE(X)](I − PD′)[Y − CE(X)]′d + d′[Y − CE(X)]PD′(DD′)−1K[Y − CE(X)]′d + d′[Y − CE(X)]PD′F◦[Y − CE(X)]′d],

since

d′C[X − E(X)](I − PD′)[X − E(X)]′C′d = d′C[X − (C′)◦θD](I − PD′)[X − (C′)◦θD]′C′d = d′CX(I − PD′)X′C′d = d′Y(I − PD′)Y′d; (49)


d′C[X − E(X)]PD′(DD′)−1K[X − E(X)]′C′d = d′C[X − (C′)◦θD]PD′(DD′)−1K[X − (C′)◦θD]′C′d = d′CXPD′(DD′)−1KX′C′d = d′YPD′(DD′)−1KY′d; (50)

d′C[X − E(X)]PD′F◦[X − E(X)]′C′d = d′C[X − (C′)◦θD]PD′F◦[X − (C′)◦θD]′C′d = d′CXPD′F◦X′C′d = d′YPD′F◦Y′d. (51)

From (49), (50) and (51), we can conclude that the distribution of the ratio will remain the same after we subtract the mean under the null hypothesis from X. With a slight abuse of notation, let Y = Y − CE(X). Then

λ2/N = [d′Y(I − PD′)Y′d + d′YPD′(DD′)−1KY′d] / [d′Y(I − PD′)Y′d + d′YPD′(DD′)−1KY′d + d′YPD′F◦Y′d].

The next step is to show that Y is spherically distributed. This will be exactly the same as for the parallelism hypothesis; consequently, we omit the details. The scores, z′ = d′Y, will also be spherically distributed according to the result in (34).

Finally, we can establish the theorem, knowing that z is spherically distributed and using Theorem 4.3 by Fang and Zhang:

λ2/N = f(z) = [z′(I − PD′)z + z′PD′(DD′)−1Kz] / [z′(I − PD′)z + z′PD′(DD′)−1Kz + z′PD′F◦z]

=d [(Rw)′(I − PD′)(Rw) + (Rw)′PD′(DD′)−1K(Rw)] / [(Rw)′(I − PD′)(Rw) + (Rw)′PD′(DD′)−1K(Rw) + (Rw)′PD′F◦(Rw)]

= R2{w′(I − PD′)w + w′PD′(DD′)−1Kw} / R2{w′(I − PD′)w + w′PD′(DD′)−1Kw + w′PD′F◦w},

where the factor R2 cancels. One can see that f(z) does not depend on R, which means that the ratio keeps its distribution. As a result,

λ2/N = [z′(I − PD′)z + z′PD′(DD′)−1Kz] / [z′(I − PD′)z + z′PD′(DD′)−1Kz + z′PD′F◦z] ∼ Λ(1, N − r(D) + r(K), r(D′F◦)) ≡ B((N − r(D) + r(K))/2, r(D′F◦)/2),

and the theorem is proven.
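To make the mechanics concrete, the score statistic in (48) can be computed directly from data. The sketch below uses hypothetical sizes (p = 40 > N = 15, q = 3 groups) and a Lauter-type weight vector d built from the diagonal of YY′; the matrices D, C, F and K are illustrative choices, not the report's fixed quantities:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical sizes: p = 40 responses, N = 15 individuals, q = 3 groups
# (the high-dimensional case p > N).
p, N, q = 40, 15, 3

D = np.kron(np.eye(q), np.ones((1, N // q)))     # q x N group design
C = np.eye(p - 1, p) - np.eye(p - 1, p, 1)       # (p-1) x p contrast matrix
X = rng.standard_normal((p, N))                  # artificial data

Y = C @ X
# Assumed Lauter-type weight vector: a function of YY' only.
d = 1.0 / np.sqrt(np.diag(Y @ Y.T))
z = Y.T @ d                                      # scores, z' = d'Y

def proj(A):
    """Orthogonal projector onto the column space of A."""
    return A @ np.linalg.pinv(A)

P_D = proj(D.T)
F = np.ones((q, 1))                              # illustrative F
K = F                                            # C(K) = C(D) ∩ C(F) = C(F) here
F0 = np.eye(q) - F @ np.linalg.pinv(F)           # columns span C(F)°
P_K = proj(D.T @ np.linalg.inv(D @ D.T) @ K)     # projector on C(D'(DD')^{-1}K)
P_F0 = proj(D.T @ F0)                            # projector on C(D'F°)

num = z @ (np.eye(N) - P_D) @ z + z @ P_K @ z
lam = num / (num + z @ P_F0 @ z)                 # the ratio in (48)
print(lam)
assert 0.0 < lam < 1.0
```

Under the theorem, with d a measurable function of YY′, repeated draws of lam under the null would follow the stated Beta distribution.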

5 Conclusion

In this report, profile analysis of several groups is of interest. Differently from the current methods, which have been proposed by Srivastava (1987, 2002), we treat the problems as problems in MANOVA and GMANOVA. Reformulating the problems in this way is useful in the later stages, when the test statistics are formulated under the high-dimensional setting. For all three hypotheses, we derived the test statistic |U|/|U + V|, where


U and V are independently Wishart distributed. The distribution of |U|/|U + V| is well-known: it is Wilks' lambda distribution, which can be written as a product of Beta distributed variables.

When p > n, one is exposed to issues such as the singularity of S. Different approaches have been proposed so far, but the one we focused on was Lauter's (1996, 2016) and Lauter, Glimm and Kropf's (1996, 1998) method. The original idea is to take a linear combination of the p variables per individual, which means multiplying the data matrix X with a vector d′, where d is a function of XX′. We needed to adapt this idea to make it work in profile analysis. Thus, instead of implementing the reduction just for the data matrix, we implemented it for a linear function of X, which appears in the likelihood ratio statistic. There is a special case with the level hypothesis. Due to the restrictions applied to the mean parameter space in this hypothesis, the likelihood ratio statistic is already one-dimensional, but in high dimensions the degrees of freedom of the Wishart distribution for U become negative. By dimension reduction, we again attain Wilks' lambda distributions for the ratios in the end, which this time do not depend on p. Spherical distribution theory is used to show that the matrices follow a Wishart distribution. Notice that we did not need to use spherical distributions for the level hypothesis.

The important question which still needs to be answered is how d should be chosen. It is noted that d needs to be a function of CX, but how to determine this function remains an open question.
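For orientation only: one concrete candidate from Lauter, Glimm and Kropf (1996) is a standardized-sum type weight, dj = (wjj)−1/2 with wjj the diagonal elements of the total sums-of-squares matrix. Whether this choice is suitable inside the profile-analysis statistics here is exactly the open question above. A minimal sketch with hypothetical sizes:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical sizes; Y plays the role of CX in the tests above.
p, N = 50, 12
Y = rng.standard_normal((p - 1, N))

# Standardized-sum type weights: d depends on the data through YY' only,
# which is what the theory requires of d.
W = Y @ Y.T
d = 1.0 / np.sqrt(np.diag(W))
z = Y.T @ d                     # one linear score per individual

print(z.shape)
assert z.shape == (N,)
```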

Appendices

Appendix A includes the technical results in the usual setting (Chapter 3, where N > p), whereas Appendix B includes the technical results in the high-dimensional setting (Chapter 4, where p > N).

Appendix A

Result A1. Let W1 ∼ Wp(Σ, f1) and W2 ∼ Wp(Σ, f2) with f1 ≥ p. Then the distribution of Λ = |W1|/|W1 + W2|, that is, Wilks' lambda distribution, denoted by Λ(p, f1, f2), does not depend on Σ.

To show this, choose a non-singular matrix A which satisfies AΣA′ = Ip. Then

W̃1 = AW1A′ ∼ Wp(Ip, f1) and W̃2 = AW2A′ ∼ Wp(Ip, f2),

and

Λ̃ = |W̃1| / |W̃1 + W̃2| = |A||W1||A′| / (|A||W1 + W2||A′|) = Λ.

Λ̃'s distribution does not depend on Σ, and since Λ̃ = Λ, Λ's distribution is also independent of Σ.
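Numerically, the cancellation of the factors |A| can be checked directly; the sizes below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustration of Result A1: Lambda = |W1| / |W1 + W2| is unchanged when
# both Wisharts are transformed as A W A', because |A| cancels in the
# numerator and denominator. Sizes p, f1, f2 are hypothetical.
p, f1, f2 = 4, 10, 6
G1 = rng.standard_normal((p, f1))
G2 = rng.standard_normal((p, f2))
W1, W2 = G1 @ G1.T, G2 @ G2.T        # Wishart matrices with Sigma = I

A = rng.standard_normal((p, p))      # any non-singular transformation
lam = np.linalg.det(W1) / np.linalg.det(W1 + W2)
lam_t = np.linalg.det(A @ W1 @ A.T) / np.linalg.det(A @ (W1 + W2) @ A.T)
print(lam, lam_t)
assert np.isclose(lam, lam_t)
```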

Based on this result, one can conclude that the distribution of

λ2/N = |CSC′| / |CSC′ + CXPD′(DD′)−1KX′C′|,


where

CSC′ ∼ Wp−1(CΣC′, N − r(D)),

CXPD′(DD′)−1KX′C′ ∼ Wp−1(CΣC′, r(K)),

does not depend on Σ and one can replace CΣC ′ by Ip−1. Thus,

λ2/N ∼ Λ(p− 1, N − r(D), r(K)).

Result A2.

(18) = (I − PD′)(I − X′C′[CX(I − PD′)X′C′]−1CX(I − PD′)) = (I − PD′) − (I − PD′)X′C′[CX(I − PD′)X′C′]−1CX(I − PD′).

We will find the rank of this expression. If we use Proposition 2.13,

r((I − PD′) − (I − PD′)X′C′[CX(I − PD′)X′C′]−1CX(I − PD′))
= r(I − PD′) + r(IN − X′C′[CX(I − PD′)X′C′]−1CX(I − PD′)) − N
= N − r(D) + N − r(X′C′[CX(I − PD′)X′C′]−1CX(I − PD′)) − N
= N − r(D) − tr(X′C′[CX(I − PD′)X′C′]−1CX(I − PD′)) (by Prop. 2.14 ii))
= N − r(D) − tr(CX(I − PD′)X′C′[CX(I − PD′)X′C′]−1) (by Prop. 2.14 i))
= N − r(D) − tr(Ip−1)
= N − r(D) − p + 1.

Result A3. (C′)◦′Σ−1X and X′C′ are independent:

By Definition 2.12,

Cov[(C′)◦′Σ−1X, CX] = E[vec((C′)◦′Σ−1X)vec′(CX)]
= E[(I ⊗ (C′)◦′Σ−1)vecX vec′X(I ⊗ C′)]
= (I ⊗ (C′)◦′Σ−1)E[vecX vec′X](I ⊗ C′)
= (I ⊗ (C′)◦′Σ−1)(I ⊗ Σ)(I ⊗ C′) (by Prop. 2.16 ii))
= I ⊗ (C′)◦′Σ−1ΣC′ = 0. (by Prop. 2.16 iii))

As a conclusion, (C′)◦′Σ−1X and X′C′ are independent.
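The vanishing covariance factor can be verified numerically for a concrete contrast matrix; the size p is hypothetical, and (C′)◦ is computed below as a null-space basis of C:

```python
import numpy as np

# Core of Result A3: the covariance of (C')°'Σ^{-1}X and CX contains the
# factor (C')°' Σ^{-1} Σ C' = (C')°' C', which is zero because (C')° spans
# the orthogonal complement of C(C'). Hypothetical p.
p = 6
C = np.eye(p - 1, p) - np.eye(p - 1, p, 1)       # (p-1) x p contrast matrix

# (C')° : p x 1 basis of C(C')^perp, i.e. of the null space of C.
_, _, Vt = np.linalg.svd(C)
Co = Vt[-1].reshape(p, 1)

cov_factor = Co.T @ C.T                          # (C')°' C'
print(np.abs(cov_factor).max())
assert np.abs(cov_factor).max() < 1e-12
```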

Result A4.

(22) = ((C′)◦′S−1(C′)◦)−1(C′)◦′S−1XD′(DD′)−1KQ−1/2.

The dispersion of this expression is then written as

(23) = Q−1/2K′(DD′)−1D[I − X′C′(CX(I − PD′)X′C′)−1CX(I − PD′)][I − (I − PD′)X′C′(CX(I − PD′)X′C′)−1CX]D′(DD′)−1KQ−1/2 ⊗ ((C′)◦′Σ−1(C′)◦)−1(C′)◦′Σ−1ΣΣ−1(C′)◦((C′)◦′Σ−1(C′)◦)−1.


The first part of the Kronecker product given by (23) can be calculated as

1) Q−1/2K′(DD′)−1D[I − X′C′(CX(I − PD′)X′C′)−1CX(I − PD′) − (I − PD′)X′C′(CX(I − PD′)X′C′)−1CX + X′C′(CX(I − PD′)X′C′)−1CX(I − PD′)X′C′(CX(I − PD′)X′C′)−1CX]D′(DD′)−1KQ−1/2
= Q−1/2K′(DD′)−1DD′(DD′)−1KQ−1/2 + Q−1/2K′(DD′)−1DX′C′(CX(I − PD′)X′C′)−1CXD′(DD′)−1KQ−1/2
= Q−1/2K′(DD′)−1KQ−1/2 + Q−1/2K′(DD′)−1DX′C′(CX(I − PD′)X′C′)−1CXD′(DD′)−1KQ−1/2
= Q−1/2[K′(DD′)−1K + K′(DD′)−1DX′C′(CX(I − PD′)X′C′)−1CXD′(DD′)−1K]Q−1/2
= Q−1/2QQ−1/2
= I.

The second part of the Kronecker product given by (23) can be calculated as

2) ((C′)◦′Σ−1(C′)◦)−1(C′)◦′Σ−1ΣΣ−1(C′)◦((C′)◦′Σ−1(C′)◦)−1
= ((C′)◦′Σ−1(C′)◦)−1((C′)◦′Σ−1(C′)◦)((C′)◦′Σ−1(C′)◦)−1
= ((C′)◦′Σ−1(C′)◦)−1.

After combining 1) and 2), one can conclude that

(23) = I ⊗ ((C′)◦′Σ−1(C′)◦)−1.

Appendix B

Result B1. The rank of the idempotent matrix that appears in the middle part of (39) is calculated as

r[(IN − PD′)(IN − X′C′d(d′CX(I − PD′)X′C′d)−1d′CX(I − PD′))]
= r(IN − PD′) + r[IN − X′C′d(d′CX(I − PD′)X′C′d)−1d′CX(I − PD′)] − N (Prop. 2.13)
= N − r(D) + N − r(X′C′d(d′CX(I − PD′)X′C′d)−1d′CX(I − PD′)) − N (Prop. 2.14 ii))
= N − r(D) − tr(X′C′d(d′CX(I − PD′)X′C′d)−1d′CX(I − PD′)) (Prop. 2.14 i))
= N − r(D) − tr(d′CX(I − PD′)X′C′d(d′CX(I − PD′)X′C′d)−1)
= N − r(D) − tr(1)
= N − r(D) − 1.
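A numerical check of Result B1, with hypothetical sizes (p = 30 > N = 12, q = 3 groups, so the rank should be N − r(D) − 1 = 8):

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical high-dimensional sizes (p > N) and a simple group design.
p, N, q = 30, 12, 3
D = np.kron(np.eye(q), np.ones((1, N // q)))     # q x N group design
C = np.eye(p - 1, p) - np.eye(p - 1, p, 1)       # (p-1) x p contrast matrix
X = rng.standard_normal((p, N))

P_D = D.T @ np.linalg.inv(D @ D.T) @ D           # projector on C(D')
Y = C @ X
d = 1.0 / np.sqrt(np.diag(Y @ Y.T))              # d a function of CX (assumed)

u = X.T @ C.T @ d                                # the N-vector X'C'd
Q1 = np.eye(N) - P_D
s = u @ Q1 @ u                                   # d'CX(I - P_D')X'C'd, a scalar
M = Q1 @ (np.eye(N) - np.outer(u, Q1 @ u) / s)   # the matrix inside Result B1

rank = np.linalg.matrix_rank(M)
print(rank, N - q - 1)
assert rank == N - q - 1                         # N - r(D) - 1
```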

Result B2. The matrix

[I − (I − PD′)X′C′d(d′CX(I − PD′)X′C′d)−1d′CX]D′(DD′)−1KQ−1K′(DD′)−1D[I − X′C′d(d′CX(I − PD′)X′C′d)−1d′CX(I − PD′)],

which is given by (43) in Chapter 4.3.2, is an idempotent matrix:


Put

B = [I − (I − PD′)X′C′d(d′CX(I − PD′)X′C′d)−1d′CX]D′(DD′)−1KQ−1/2.

Calculate

B′B = Q−1/2K′(DD′)−1D[I − X′C′d(d′CX(I − PD′)X′C′d)−1d′CX(I − PD′) − (I − PD′)X′C′d(d′CX(I − PD′)X′C′d)−1d′CX + X′C′d(d′CX(I − PD′)X′C′d)−1d′CX(I − PD′)X′C′d(d′CX(I − PD′)X′C′d)−1d′CX]D′(DD′)−1KQ−1/2
= Q−1/2K′(DD′)−1DD′(DD′)−1KQ−1/2 + Q−1/2K′(DD′)−1DX′C′d(d′CX(I − PD′)X′C′d)−1d′CXD′(DD′)−1KQ−1/2
= Q−1/2K′(DD′)−1KQ−1/2 + Q−1/2K′(DD′)−1DX′C′d(d′CX(I − PD′)X′C′d)−1d′CXD′(DD′)−1KQ−1/2
= Q−1/2[K′(DD′)−1K + K′(DD′)−1DX′C′d(d′CX(I − PD′)X′C′d)−1d′CXD′(DD′)−1K]Q−1/2
= Q−1/2QQ−1/2 = I.

Observe that (43) = BB′. To prove that (43) is idempotent, one needs to show that BB′BB′ = BB′:

BB′BB′ = B(B′B)B′ = BB′,

since B′B = I.
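Result B2, and the rank claim of Result B3, can also be verified numerically. The sizes and the choice K = F = 1q (so s = 1) below are hypothetical illustrations:

```python
import numpy as np

rng = np.random.default_rng(6)

# Hypothetical sizes with p > N; K is chosen as a q x 1 vector, so s = 1.
p, N, q = 30, 12, 3
D = np.kron(np.eye(q), np.ones((1, N // q)))      # q x N group design
C = np.eye(p - 1, p) - np.eye(p - 1, p, 1)        # (p-1) x p contrast matrix
X = rng.standard_normal((p, N))
K = np.ones((q, 1))                               # assumed K

G = np.linalg.inv(D @ D.T)
P_D = D.T @ G @ D                                 # projector on C(D')
Q1 = np.eye(N) - P_D

d = 1.0 / np.sqrt(np.diag(C @ X @ X.T @ C.T))     # d a function of CX (assumed)
u = X.T @ C.T @ d                                 # the N-vector X'C'd
s = u @ Q1 @ u                                    # d'CX(I - P_D')X'C'd

A = D.T @ G @ K                                   # D'(DD')^{-1}K, N x 1
t = (A.T @ u).item()
Q = (K.T @ G @ K).item() + t * t / s              # reduced Q (a scalar here)

B = (np.eye(N) - np.outer(Q1 @ u, u) / s) @ A / np.sqrt(Q)
M = B @ B.T                                       # the matrix (43)

print((B.T @ B).item())
assert np.isclose((B.T @ B).item(), 1.0)          # B'B = I
assert np.allclose(M @ M, M)                      # (43) is idempotent
assert np.linalg.matrix_rank(M) == 1              # rank = r(K) = s = 1
```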

Result B3.

(43) = [I − (I − PD′)X′C′d(d′CX(I − PD′)X′C′d)−1d′CX]D′(DD′)−1KQ−1K′(DD′)−1D[I − X′C′d(d′CX(I − PD′)X′C′d)−1d′CX(I − PD′)] = BB′,

r(43) = r(BB′) = tr(BB′) = tr(B′B) = tr(I) = s = r(K).

The size of the reduced Q is the same as the size of the original Q; Q's size, together with its relation to the matrix K, is given in the proof of Theorem 3.2 in Chapter 3.2.2.

Acknowledgements

The authors would like to thank Professor Julia Volaufova and Dr. Martin Singull for their invaluable comments and suggestions, which helped to improve the report significantly.

References

[1] Anderson, T. W. (2003). An Introduction to Multivariate Statistical Analysis (3rd ed.). Wiley, New York.

[2] Anderson, T. W. and Fang, K. T. (1982). On the theory of multivariate elliptically contoured distributions and their applications. Technical Report No. 54, Stanford University, California.


[3] Anderson, T. W. and Fang, K. T. (1990). Theory and applications of elliptically contoured and related distributions. Technical Report No. 24, Stanford University, California.

[4] Bilodeau, M. and Brenner, D. (1999). Theory of Multivariate Statistics. Springer-Verlag, New York.

[5] Cambanis, S., Huang, S., and Simons, G. (1981). On the theory of elliptically contoured distributions. Journal of Multivariate Analysis, 11, 368-385.

[6] Dawid, A. P. (1977). Spherical matrix distributions and a multivariate model. Journal of the Royal Statistical Society, Series B, 39, 254-261.

[7] Dawid, A. P. (1978). Extendibility of spherical matrix distributions. Journal of Multivariate Analysis, 8, 559-566.

[8] Fang, K. T., Kotz, S. and Ng, K. W. (1990). Symmetric Multivariate and Related Distributions. Springer Science+Business Media, B.V.

[9] Fang, K. T. and Zhang, Y. T. (1990). Generalized Multivariate Analysis. Springer-Verlag, Berlin.

[10] Fujikoshi, Y. (2009). Statistical inference for parallelism hypothesis in growth curve model. SUT Journal of Mathematics, 45, 137-148.

[11] Fujikoshi, Y., Ulyanov, V. V., and Shimizu, R. (2010). Multivariate Statistics: High-Dimensional and Large-Sample Approximations. Wiley, Hoboken, New Jersey.

[12] Geisser, S. (2003). The analysis of profile data–revisited. Statistics in Medicine, 22, 3337-3346.

[13] Greenhouse, S. W. and Geisser, S. (1959). On the methods in the analysis of profile data. Psychometrika, 24, 95-112.

[14] Harrar, S. W. and Kong, X. (2016). High-dimensional multivariate repeated measures analysis with unequal covariance matrices. Journal of Multivariate Analysis, 145, 1-21.

[15] Kariya, T. and Sinha, B. K. (1989). Robustness of Statistical Tests. Academic Press, Boston.

[16] Kelker, D. (1970). Distribution theory of spherical distributions and a location-scale parameter generalization. Sankhya Ser. A, 32, 419-430.

[17] Kollo, T. and von Rosen, D. (2005). Advanced Multivariate Statistics with Matrices. Springer, Dordrecht.

[18] Kollo, T., von Rosen, T. and von Rosen, D. (2011). Estimation in high-dimensional analysis and multivariate linear models. Communications in Statistics - Theory and Methods, 40, 1241-1253.

[19] Kropf, S. (2000). Hochdimensionale multivariate Verfahren in der medizinischen Statistik. Shaker Verlag, Aachen.


[20] Lauter, J. (1996). Exact t and F tests for analyzing studies with multiple endpoints. Biometrics, 52, 964-970.

[21] Lauter, J. (2016). Multivariate Statistik - drei Manuskripte. Shaker Verlag, Aachen.

[22] Lauter, J., Glimm, E. and Kropf, S. (1996). New multivariate tests for data with an inherent structure. Biometrical Journal, 38, 5-23.

[23] Lauter, J., Glimm, E., and Kropf, S. (1998). Multivariate tests based on left-spherically distributed linear scores. The Annals of Statistics, 26, 1972-1988.

[24] Ledoit, O. and Wolf, M. (2002). Some hypothesis tests for the covariance matrix when the dimension is large compared to the sample size. The Annals of Statistics, 30, 1081-1102.

[25] Mardia, K. V., Kent, J. T. and Bibby, J. M. (1979). Multivariate Analysis. Academic Press, New York.

[26] Morrison, D. F. (2004). Multivariate Statistical Methods, 4th edition. Duxbury Press, CA.

[27] Muirhead, R. J. (1982). Aspects of Multivariate Statistical Theory. Wiley, New York.

[28] O'Brien, P. C. (1984). Procedures for comparing samples with multiple endpoints. Biometrics, 40, 1079-1087.

[29] Ohlson, M. and Srivastava, M. S. (2010). Profile analysis for a growth curve model. Journal of the Japan Statistical Society, 40, 1-21.

[30] Onozawa, M., Nishiyama, T. and Seo, T. (2016). On test statistics in profile analysis with high-dimensional data. Communications in Statistics - Simulation and Computation, 45, 3716-3743.

[31] Potthoff, R. F. and Roy, S. N. (1964). A generalized multivariate analysis of variance model useful especially for growth curve problems. Biometrika, 51, 313-326.

[32] Rao, C. R. (1973). Linear Statistical Inference and Its Applications, 2nd edition. Wiley, New York.

[33] Rencher, A. C. (2002). Methods of Multivariate Analysis, 2nd edition. Wiley, New York.

[34] Seo, T., Sakurai, T. and Fujikoshi, Y. (2011). LR tests for two hypotheses in profile analysis of growth curve data. SUT Journal of Mathematics, 47, 105-118.

[35] Shutoh, N. and Takahashi, S. (2016). Tests for parallelism and flatness hypotheses of two mean vectors in high-dimensional settings. Journal of Statistical Computation and Simulation, 86, 1150-1165.

[36] Srivastava, M. S. (1987). Profile analysis of several groups. Communications in Statistics - Theory and Methods, 16, 909-926.

[37] Srivastava, M. S. (2002). Methods of Multivariate Statistics. Wiley, New York.


[38] Srivastava, M. S. (2005). Some tests concerning the covariance matrix in high dimensional data. Journal of the Japan Statistical Society, 35, 251-272.

[39] Srivastava, M. S. (2007). Multivariate theory for analyzing high dimensional data. Journal of the Japan Statistical Society, 37, 53-86.

[40] Srivastava, M. S. and Carter, E. M. (1983). An Introduction to Applied Multivariate Statistics. North Holland, New York.

[41] Srivastava, M. S. and Du, M. (2008). A test for the mean vector with fewer observations than the dimension. Journal of Multivariate Analysis, 99, 386-402.

[42] Srivastava, M. S. and Fujikoshi, Y. (2006). Multivariate analysis of variance with fewer observations than the dimension. Journal of Multivariate Analysis, 97, 1927-1940.

[43] Srivastava, M. S. and Khatri, C. G. (1979). An Introduction to Multivariate Statistics. North Holland, New York.

[44] Srivastava, M. S. and Singull, M. (2012). Profile analysis with random-effects covariance structure. Journal of the Japan Statistical Society, 42, 145-164.

[45] Srivastava, M. S. and Singull, M. (2017). Test for the mean matrix in a growth curve model for high dimensions. Communications in Statistics - Theory and Methods, 46, 6668-6683.

[46] von Rosen, D. (2018). Bilinear Regression Analysis: An Introduction. Springer International Publishing, New York.

[47] Yokoyama, T. (1995). LR test for random-effects covariance structure in a parallel profile model. Annals of the Institute of Statistical Mathematics, 47, 309-320.

[48] Yokoyama, T. and Fujikoshi, Y. (1993). A parallel profile model with random-effects covariance structure. Journal of the Japan Statistical Society, 23, 83-89.
