
CHAPTER 7

JACOBIANS OF MATRIX TRANSFORMATIONS

[This Chapter is based on the lectures of Professor A.M. Mathai of McGill University, Canada (Director of the SERC Schools).]

7.0. Introduction

Real scalar functions of matrix argument, when the matrices are real, will be dealt with. It is difficult to develop a theory of functions of matrix argument for general matrices. Let X = (xij), i = 1, ..., m and j = 1, ..., n, be an m × n matrix where the xij's are real elements. It is assumed that the reader has a basic knowledge of matrices and determinants. The following standard notations will be used here. A prime denotes the transpose, X′ = transpose of X; |(·)| denotes the determinant of the square m × m matrix (·). The same notation will be used for the absolute value also. tr(X) denotes the trace of a square matrix X: tr(X) = sum of the eigenvalues of X = sum of the leading diagonal elements in X. A real symmetric positive definite X (definiteness is defined only for symmetric matrices when real) will be denoted by X = X′ > 0. Then 0 < X = X′ < I ⇒ X = X′ > 0 and I − X > 0. Further, dX will denote the wedge product or skew symmetric product of the differentials dxij's.

That is, when X = (xi j), an m × n matrix

dX = dx11 ∧ · · · ∧ dx1n ∧ dx21 ∧ · · · dx2n ∧ · · · ∧ dxmn. (7.0.1)

If X = X′, that is symmetric, and p × p, then

dX = dx11 ∧ dx21 ∧ dx22 ∧ dx31 ∧ · · · ∧ dxpp (7.0.2)

a wedge product of 1 + 2 + ··· + p = p(p + 1)/2 differentials. A wedge product or skew symmetric product is defined in Chapter 1.


7.1. Jacobians of Linear Matrix Transformations

Some standard Jacobians that we will need later will be illustrated here. For more on Jacobians see Mathai (1997). First we consider a very basic linear transformation taking a vector of real variables to a vector of real variables.

Theorem 7.1.1. Let X and Y be p × 1 vectors of functionally independent real scalar variables (no element in X is a function of the other elements in X, and similarly no element in Y is a function of the other elements in Y), and let Y = AX, |A| ≠ 0, where A = (aij) is a nonsingular p × p matrix of constants (A is free of the elements in X and Y; each element in Y is a linear function of the elements in X, and vice versa). Then

Y = AX, |A| ≠ 0 ⇒ dY = |A|dX. (7.1.1)

Proof 7.1.1. Writing Y = AX explicitly,

Y =
[y1]
[⋮ ]
[yp]
= AX =
[a11 a12 ··· a1p] [x1]
[a21 a22 ··· a2p] [x2]
[ ⋮   ⋮  ···  ⋮ ] [⋮ ]
[ap1 ap2 ··· app] [xp]

⇒ yi = ai1 x1 + ... + aip xp, i = 1, ..., p. Then

∂yi/∂xj = aij ⇒ (∂yi/∂xj) = (aij) = A ⇒ J = |A|.

Hence dY = |A|dX. That is, for Y and X p × 1 and A a p × p constant matrix with |A| ≠ 0,

Y = AX, |A| ≠ 0 ⇒ dY = |A|dX.

Example 7.1.1. Consider the transformation Y = AX where Y′ = (y1, y2, y3) and X′ = (x1, x2, x3), and let the transformation be

y1 = x1 + x2 + x3
y2 = 3x2 + x3
y3 = 5x3.


Then write dY in terms of dX.

Solution 7.1.1. From the above equations, by taking the differentials we have

dy1 = dx1 + dx2 + dx3
dy2 = 3dx2 + dx3
dy3 = 5dx3.

Then taking the wedge product of the differentials we have

dy1 ∧ dy2 ∧ dy3 = [dx1 + dx2 + dx3] ∧ [3dx2 + dx3] ∧ [5dx3].

Expanding the product directly and then using the facts that dx2 ∧ dx2 = 0 and dx3 ∧ dx3 = 0, we have

dY = dy1 ∧ dy2 ∧ dy3 = 15 dx1 ∧ dx2 ∧ dx3 = 15 dX = |A|dX

where

A =
[1 1 1]
[0 3 1]
[0 0 5].

This verifies Theorem 7.1.1 also. This theorem is the standard result seen in elementary textbooks. Now we will investigate more elaborate linear transformations.
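As a quick numerical cross-check (a NumPy sketch, not part of the original text): the determinant of the matrix A in Example 7.1.1 is indeed 15, which is the factor by which the wedge product of the differentials scales.

```python
import numpy as np

# Matrix of the linear transformation in Example 7.1.1
A = np.array([[1.0, 1.0, 1.0],
              [0.0, 3.0, 1.0],
              [0.0, 0.0, 5.0]])

# Theorem 7.1.1: Y = AX implies dY = |A| dX; here |A| = 1 * 3 * 5 = 15
J = np.linalg.det(A)
print(J)
```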

Theorem 7.1.2. Let X and Y be m × n matrices of functionally independent real variables and let A, m × m, be a nonsingular constant matrix. Then

Y = AX ⇒ dY = |A|^n dX. (7.1.2)

Proof 7.1.2. Let Y = AX = (AX(1), AX(2), ..., AX(n)) where X(1), ..., X(n) are the columns of X. Then the Jacobian matrix for X going to Y is of the form

[A O ... O]
[O A ... O]
[⋮ ⋮     ⋮]
[O O ... A]

with n diagonal blocks, and its determinant is |A|^n = J (7.1.3)

where O denotes a null matrix and J is the Jacobian for the transformation of X going to Y, that is, dY = |A|^n dX.
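A numerical sketch of Theorem 7.1.2 (my own check, not from the text): stacking the columns of X (column-major vec) turns Y = AX into vec(Y) = (I_n ⊗ A) vec(X), so the Jacobian determinant can be computed directly as det(I_n ⊗ A) and compared with |A|^n.

```python
import numpy as np

rng = np.random.default_rng(7)
m, n = 3, 4
A = rng.standard_normal((m, m))  # a random nonsingular constant matrix

# vec(AX) = (I_n kron A) vec(X), so the Jacobian matrix of X -> AX is I_n kron A
J = np.kron(np.eye(n), A)
print(np.linalg.det(J), np.linalg.det(A) ** n)  # the two values agree
```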

In the above linear transformation the matrix X was pre-multiplied by a nonsingular constant matrix A. Now let us consider a transformation of the form Y = XB, where X is post-multiplied by a nonsingular constant matrix B.

Theorem 7.1.3. Let X be an m × n matrix of functionally independent real variables and let B be an n × n nonsingular matrix of constants. Then

Y = XB, |B| ≠ 0 ⇒ dY = |B|^m dX. (7.1.4)

Proof 7.1.3.

Y = XB =
[X(1)B]
[  ⋮  ]
[X(m)B]

where X(1), ..., X(m) are the rows of X. The Jacobian matrix is of the block-diagonal form

[B 0 ··· 0]
[⋮ ⋮     ⋮]
[0 0 ··· B]

with m diagonal blocks, whose determinant is |B|^m ⇒ dY = |B|^m dX.

Then, combining the above two theorems, we have the Jacobian for the most general linear transformation.

Theorem 7.1.4. Let X and Y be m × n matrices of functionally independent real variables. Let A be m × m and B be n × n nonsingular matrices of constants. Then

Y = AXB, |A| ≠ 0, |B| ≠ 0, Y m × n, X m × n ⇒ dY = |A|^n |B|^m dX. (7.1.5)


Proof 7.1.4. For proving this result, first consider the transformation Z = AX and then the transformation Y = ZB, and make use of Theorems 7.1.2 and 7.1.3.
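The two-step proof above can be checked numerically (a sketch using NumPy, not from the original text): in column-major vec form, vec(AXB) = (B′ ⊗ A) vec(X), and det(B′ ⊗ A) = |A|^n |B|^m.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 2, 3
A = rng.standard_normal((m, m))
B = rng.standard_normal((n, n))

# vec(AXB) = (B' kron A) vec(X) for column-major vec, so the Jacobian
# determinant of X -> AXB is det(B' kron A) = |A|^n |B|^m (up to sign)
J = np.kron(B.T, A)
lhs = abs(np.linalg.det(J))
rhs = abs(np.linalg.det(A)) ** n * abs(np.linalg.det(B)) ** m
print(lhs, rhs)
```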

In Theorems 7.1.2 to 7.1.4 the matrix X was rectangular. Now we will examine a situation where the matrix X is square and symmetric. If X is p × p and symmetric then there are only 1 + 2 + ··· + p = p(p + 1)/2 functionally independent elements in X because here xij = xji for all i and j. Let Y = Y′ = AXA′, X = X′, |A| ≠ 0. Then we can obtain the following result:

Theorem 7.1.5. Let X = X′ be a p × p real symmetric matrix of p(p + 1)/2 functionally independent real elements and let A be a p × p nonsingular constant matrix. Then

Y = AXA′, X = X′, |A| ≠ 0 ⇒ dY = |A|^{p+1} dX. (7.1.6)

Proof 7.1.5. This result can be proved by using the fact that a nonsingular matrix such as A can be written as a product of elementary matrices in the form

A = E1E2···Ek

where E1, ..., Ek are elementary matrices. Then

Y = AXA′ = E1E2···Ek X E′k···E′1

where E′j is the transpose of Ej. Let Yk = EkXE′k, Yk−1 = Ek−1YkE′k−1, and so on, and finally Y = Y1 = E1Y2E′1. Evaluate the Jacobians in these transformations to obtain the result, observing the following facts. If, for example, the elementary matrix Ek is formed by multiplying the i-th row of an identity matrix by the nonzero scalar c, then taking the wedge product of differentials we have dYk = c^{p+1} dX. Similarly, if, for example, the elementary matrix Ek−1 is formed by adding the i-th row of an identity matrix to its j-th row, then the determinant remains 1 and hence dYk−1 = dYk. Since these are the only two basic types of elementary matrices, systematic evaluation of the successive Jacobians gives the final result |A|^{p+1}.
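The exponent p + 1 in Theorem 7.1.5 can be verified numerically (a sketch assuming NumPy, not from the original text): since X ↦ AXA′ is linear in the p(p + 1)/2 independent entries of a symmetric X, its Jacobian matrix is obtained by pushing each symmetric basis matrix through the map.

```python
import numpy as np

rng = np.random.default_rng(1)
p = 3
A = rng.standard_normal((p, p))

# The p(p+1)/2 independent entries (i >= j) of a symmetric matrix
idx = [(i, j) for i in range(p) for j in range(i + 1)]

# Column `col` of the Jacobian is the image of the symmetric basis matrix E_ij
J = np.zeros((len(idx), len(idx)))
for col, (i, j) in enumerate(idx):
    E = np.zeros((p, p))
    E[i, j] = E[j, i] = 1.0
    Y = A @ E @ A.T
    J[:, col] = [Y[a, b] for (a, b) in idx]

print(abs(np.linalg.det(J)), abs(np.linalg.det(A)) ** (p + 1))
```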

Note 7.1.1. From the above theorems the following properties are evident: if X is a p × q matrix of functionally independent real variables, c is a scalar quantity and B is a p × q constant matrix, then


Y = cX ⇒ dY = c^{pq} dX (7.1.7)

Y = cX + B ⇒ dY = c^{pq} dX. (7.1.8)

Note 7.1.2. If X is a p × p symmetric matrix of functionally independent real variables, a is a scalar quantity and B is a p × p symmetric constant matrix, then

Y = aX + B, X = X′, B = B′ ⇒ dY = a^{p(p+1)/2} dX. (7.1.9)

Note 7.1.3. For any p × p lower triangular (or upper triangular) matrix X = (xij) of p(p + 1)/2 functionally independent real variables, Y = X + X′ is a symmetric matrix, where X′ denotes the transpose of X. Observing that the diagonal elements in Y = (yij) are multiplied by 2, that is, yii = 2xii, i = 1, ..., p, we have

Y = X + X′ ⇒ dY = 2^p dX. (7.1.10)

Example 7.1.2. Let X be a p × q matrix of pq functionally independent random variables having a matrix-variate Gaussian distribution with density

f(X) = c exp{−tr[A(X − M)B(X − M)′]}

where A is a p × p positive definite constant matrix, B is a q × q positive definite constant matrix, M is a p × q constant matrix, tr(·) denotes the trace of the matrix (·) and c is the normalizing constant. Evaluate c.

Solution 7.1.2. Since A and B are positive definite matrices we can write A = A1A′1 and B = B1B′1 where A1 and B1 are nonsingular matrices, that is, |A1| ≠ 0, |B1| ≠ 0. Also we know that for any two matrices P and Q, tr(PQ) = tr(QP) as long as PQ and QP are defined, although PQ need not equal QP. Then

tr[A(X − M)B(X − M)′] = tr[A1A′1(X − M)B1B′1(X − M)′]
                      = tr[A′1(X − M)B1B′1(X − M)′A1]
                      = tr(YY′)

where

Y = A′1(X − M)B1.

But from Theorem 7.1.4

dY = |A′1|^q |B1|^p d(X − M) = |A1|^q |B1|^p d(X − M), since |A′1| = |A1|,
   = |A|^{q/2} |B|^{p/2} dX

since |A| = |A1|^2, |B| = |B1|^2 and d(X − M) = dX, M being a constant matrix. If f(X) is a density then the total integral is unity, that is,

1 = ∫_X f(X) dX
  = c ∫_X exp{−tr[A(X − M)B(X − M)′]} dX
  = c |A|^{−q/2} |B|^{−p/2} ∫_Y exp{−tr(YY′)} dY

where, for example, ∫_X denotes the integral over all elements in X, and dX = |A|^{−q/2}|B|^{−p/2} dY was used. Note that for any real matrix P the trace of PP′ is the sum of squares of all the elements in P. Hence

∫_Y exp{−tr(YY′)} dY = ∫_Y exp{−∑_{i,j} y_ij^2} dY = ∏_{i,j} ∫_{−∞}^{∞} e^{−y_ij^2} dy_ij.

But

∫_{−∞}^{∞} e^{−u^2} du = √π

and therefore

1 = c |A|^{−q/2} |B|^{−p/2} (√π)^{pq} ⇒ c = |A|^{q/2} |B|^{p/2} π^{−pq/2}.
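In the scalar case p = q = 1 the constant reduces to c = (ab/π)^{1/2}, which can be checked by direct numerical integration (a SciPy sketch with arbitrarily chosen a, b, m; not from the original text).

```python
import numpy as np
from scipy.integrate import quad

# Scalar case p = q = 1: f(x) = c exp(-a b (x - m)^2) with arbitrary test values
a, b, m = 2.0, 3.0, 0.7
c = a ** 0.5 * b ** 0.5 / np.pi ** 0.5  # c = |A|^{q/2} |B|^{p/2} pi^{-pq/2}

# The density should integrate to 1 over the whole real line
total, _ = quad(lambda x: c * np.exp(-a * b * (x - m) ** 2), -np.inf, np.inf)
print(total)
```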

Note 7.1.4. What happens in the transformation Y = X + X′ where both X and Y are p × p matrices of functionally independent real elements? When X = X′, then Y = 2X and this case is already covered above. If X ≠ X′ then Y becomes symmetric with p(p + 1)/2 variables, whereas in X there are p^2 variables, and hence this is not a one-to-one transformation.

Example 7.1.3. Consider the transformation

y11 = x11 + x21, y12 = x11 + x21 + 2x12 + 2x22,

y13 = x11 + x21 + 2x13 + x23, y21 = x11 + 3x21,

y22 = x11 + 3x21 + 2x12 + 6x22, y23 = x11 + 3x21 + 2x13 + 6x23.

Write this transformation in the form Y = AXB and then evaluate the Jacobian of this transformation.

Solution 7.1.3. Writing the transformation in the form Y = AXB we have

Y =
[y11 y12 y13]
[y21 y22 y23]
, X =
[x11 x12 x13]
[x21 x22 x23]
,

A =
[1 1]
[1 3]
, B =
[1 1 1]
[0 2 0]
[0 0 2].

Hence the Jacobian is

J = |A|^3 |B|^2 = (2^3)(4^2) = 128.

This can also be verified by taking the differentials in the starting explicit forms and then taking the wedge products. This verification is left to the reader.
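Another way to verify Example 7.1.3 (a NumPy sketch, not from the original text) is through the vectorized form of Theorem 7.1.4: the Jacobian matrix of X ↦ AXB is B′ ⊗ A, whose determinant should be 128.

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 3.0]])
B = np.array([[1.0, 1.0, 1.0],
              [0.0, 2.0, 0.0],
              [0.0, 0.0, 2.0]])

# vec(AXB) = (B' kron A) vec(X); its determinant is |A|^3 |B|^2 = (2^3)(4^2) = 128
J = np.linalg.det(np.kron(B.T, A))
print(J)
```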

Exercises 7.1.

7.1.1. If X and A are p × p lower triangular matrices where A = (aij) is a constant matrix with ajj > 0, j = 1, ..., p, X = (xij), and the xij's, i ≥ j, are functionally independent real variables, then show that

Y = XA ⇒ dY = {∏_{j=1}^{p} a_jj^{p−j+1}} dX,

Y = AX ⇒ dY = {∏_{j=1}^{p} a_jj^{j}} dX,

and

Y = aX ⇒ dY = a^{p(p+1)/2} dX (7.1.11)


where a is a scalar quantity.
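As a numerical cross-check of the first relation in Exercise 7.1.1 (my own NumPy sketch, not from the text): Y = XA is linear in the p(p + 1)/2 free entries of the lower triangular X, so the Jacobian determinant can be computed from basis directions and compared with ∏ a_jj^{p−j+1}.

```python
import numpy as np

rng = np.random.default_rng(2)
p = 3
A = np.tril(rng.standard_normal((p, p)))
A[np.diag_indices(p)] = np.abs(np.diag(A)) + 1.0  # enforce a_jj > 0

idx = [(i, j) for i in range(p) for j in range(i + 1)]  # free entries, i >= j

# Column `col` of the Jacobian is the image of the elementary direction E_ij
J = np.zeros((len(idx), len(idx)))
for col, (i, j) in enumerate(idx):
    E = np.zeros((p, p))
    E[i, j] = 1.0
    Y = E @ A
    J[:, col] = [Y[a, b] for (a, b) in idx]

# Claimed Jacobian: prod_j a_jj^{p-j+1} (1-based j), i.e. exponent p-j for 0-based j
claimed = np.prod([A[j, j] ** (p - j) for j in range(p)])
print(abs(np.linalg.det(J)), claimed)
```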

7.1.2. Let X and B be upper triangular p × p matrices where B = (bij) is a constant matrix with bjj > 0, j = 1, ..., p, X = (xij) where the xij's, i ≤ j, are functionally independent real variables, and let b be a scalar quantity. Then show that

Y = XB ⇒ dY = {∏_{j=1}^{p} b_jj^{j}} dX,

Y = BX ⇒ dY = {∏_{j=1}^{p} b_jj^{p+1−j}} dX,

and

Y = bX ⇒ dY = b^{p(p+1)/2} dX. (7.1.12)

7.1.3. Let X, A, B be p × p lower triangular matrices where A = (aij) and B = (bij) are constant matrices with ajj > 0, bjj > 0, j = 1, ..., p, and X = (xij) with the xij's, i ≥ j, functionally independent real variables. Then show that

Y = AXB ⇒ dY = {∏_{j=1}^{p} a_jj^{j} b_jj^{p+1−j}} dX,

and

Z = A′X′B′ ⇒ dZ = {∏_{j=1}^{p} b_jj^{j} a_jj^{p+1−j}} dX. (7.1.13)

7.1.4. Let X = −X′ be a p × p skew symmetric matrix of p(p − 1)/2 functionally independent real variables and let A, |A| ≠ 0, be a p × p constant matrix. Then prove that

Y = AXA′, X′ = −X, |A| ≠ 0 ⇒ dY = |A|^{p−1} dX. (7.1.14)

7.1.5. Let X be a lower triangular p × p matrix of functionally independent real variables and A = (aij) be a lower triangular matrix of constants with ajj > 0, j = 1, ..., p. Then show that

Y = XA + A′X′ ⇒ dY = 2^p {∏_{j=1}^{p} a_jj^{p+1−j}} dX, (7.1.15)

and

Y = AX + X′A′ ⇒ dY = 2^p {∏_{j=1}^{p} a_jj^{j}} dX. (7.1.16)

7.1.6. Let X and A be as defined in Exercise 7.1.5. Then show that

Y = A′X + X′A ⇒ dY = 2^p {∏_{j=1}^{p} a_jj^{j}} dX, (7.1.17)

and

Y = AX′ + XA′ ⇒ dY = 2^p {∏_{j=1}^{p} a_jj^{p+1−j}} dX. (7.1.18)

7.1.7. Consider the transformation Y = AX where

Y =
[y11 y12]
[0   y22]
, X =
[x11 x12]
[0   x22]
, A =
[2 1]
[0 3].

Writing AX explicitly and then taking the differentials and wedge products, show that dY = 12 dX, and verify that J = ∏_{j=1}^{p} a_jj^{p+1−j} = (2^2)(3^1) = 12.

7.1.8. Let Y and X be as in Exercise 7.1.7 and consider the transformation Y = XB where B = A of Exercise 7.1.7. Then, writing XB explicitly, taking differentials and then the wedge products, show that dY = 18 dX, and verify the result that J = ∏_{j=1}^{p} b_jj^{j} = (2^1)(3^2) = 18.

7.1.9. Let Y, X, A be as in Exercise 7.1.7. Consider the transformation Y = AX + X′A′. Evaluate the Jacobian from first principles by taking differentials and wedge products, and then verify the result that the Jacobian is 2^p = 2^2 = 4 times the Jacobian in Exercise 7.1.7.


7.1.10. Let Y, X, B be as in Exercise 7.1.8. Consider the transformation Y = XB + B′X′. Evaluate the Jacobian from first principles and then verify that the Jacobian is 2^p = 2^2 = 4 times the Jacobian in Exercise 7.1.8.

7.2. Jacobians in Some Nonlinear Transformations

Some basic nonlinear transformations will be considered in this section, and some more results are given in the exercises at the end of the section. The most popular nonlinear transformation is the one in which a positive definite (hence real symmetric) matrix is decomposed into a triangular matrix and its transpose. This will be discussed first.

Example 7.2.1. Let X be p × p, symmetric positive definite, and let T = (tij) be a lower triangular matrix. Consider the transformation X = TT′. Obtain the conditions for this transformation to be one-to-one and then evaluate the Jacobian.

Solution 7.2.1.

X = (xij) =
[x11 x12 ... x1p]
[ ⋮   ⋮  ...  ⋮ ]
[xp1 xp2 ... xpp]

with xij = xji for all i and j, X = X′ > 0. When X is positive definite, that is, X > 0, then xjj > 0, j = 1, ..., p, also.

TT′ =
[t11 0   ... 0  ] [t11 t21 ... tp1]
[t21 t22 ... 0  ] [0   t22 ... tp2]
[ ⋮   ⋮  ...  ⋮ ] [ ⋮   ⋮  ...  ⋮ ]
[tp1 tp2 ... tpp] [0   0   ... tpp]
= X ⇒

x11 = t11^2 ⇒ t11 = ±√x11. This can be made unique if we impose the condition t11 > 0. Note that x12 = t11 t21, and this means that t21 is unique if t11 > 0. Continuing like this, we see that for the transformation to be unique it is sufficient that tjj > 0, j = 1, ..., p. Now observe that

x11 = t11^2, x22 = t21^2 + t22^2, ..., xpp = tp1^2 + ... + tpp^2

and x12 = t11 t21, ..., x1p = t11 tp1, and so on.

∂x11/∂t11 = 2t11, ∂x11/∂t21 = 0, ..., ∂x11/∂tp1 = 0,

∂x12/∂t21 = t11, ..., ∂x1p/∂tp1 = t11,

∂x22/∂t22 = 2t22, ∂x22/∂t31 = 0, ..., ∂x22/∂tp1 = 0,

and so on. Taking the xij's in the order x11, x12, ..., x1p, x22, ..., x2p, ..., xpp, and the tij's in the order t11, t21, t22, ..., tpp, the Jacobian matrix is triangular with the following diagonal elements: t11 repeated p times, t22 repeated p − 1 times, and so on, with tpp appearing once; the number 2 appears a total of p times. Hence the determinant is the product of the diagonal elements, giving

2^p t11^p t22^{p−1} ··· tpp.

Therefore, for X = X′ > 0, T = (tij), tij = 0 for i < j, tjj > 0, j = 1, ..., p, we have

Theorem 7.2.1. Let X = X′ > 0 be a p × p real symmetric positive definite matrix and let X = TT′ where T is lower triangular with positive diagonal elements, tjj > 0, j = 1, ..., p. Then

X = TT′ ⇒ dX = 2^p {∏_{j=1}^{p} t_jj^{p+1−j}} dT. (7.2.1)

Example 7.2.2. If X is p × p real symmetric positive definite, then evaluate the following integral, which we will call the real matrix-variate gamma, denoted by Γ_p(α):

Γ_p(α) = ∫_X |X|^{α−(p+1)/2} e^{−tr(X)} dX (7.2.2)

and show that

Γ_p(α) = π^{p(p−1)/4} Γ(α)Γ(α − 1/2)···Γ(α − (p−1)/2) (7.2.3)

for ℜ(α) > (p−1)/2.

Solution 7.2.2. Make the transformation X = TT′ where T is lower triangular with positive diagonal elements. Then

|TT′| = ∏_{j=1}^{p} t_jj^2,  dX = 2^p {∏_{j=1}^{p} t_jj^{p+1−j}} dT

and

tr(X) = t11^2 + (t21^2 + t22^2) + ... + (tp1^2 + ... + tpp^2).

Substituting these, the integral over X reduces to the following:

∫_X |X|^{α−(p+1)/2} e^{−tr(X)} dX = {∏_{j=1}^{p} ∫_0^∞ 2 t_jj^{2α−j} e^{−t_jj^2} dt_jj} × {∏_{i>j} ∫_{−∞}^{∞} e^{−t_ij^2} dt_ij}.

Observe that

2 ∫_0^∞ t_jj^{2α−j} e^{−t_jj^2} dt_jj = Γ(α − (j−1)/2),  ℜ(α) > (j−1)/2,

∫_{−∞}^{∞} e^{−t_ij^2} dt_ij = √π,

and there are p(p−1)/2 factors in ∏_{i>j}; hence

Γ_p(α) = π^{p(p−1)/4} Γ(α)Γ(α − 1/2)···Γ(α − (p−1)/2),

and the conditions ℜ(α) > (j−1)/2, j = 1, ..., p ⇒ ℜ(α) > (p−1)/2. This establishes the result.

Notation 7.2.1. Γ_p(α): real matrix-variate gamma.

Definition 7.2.1. Real matrix-variate gamma Γ_p(α): it is defined by equations (7.2.2) and (7.2.3), where (7.2.2) gives the integral representation and (7.2.3) gives the explicit form.
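The explicit form (7.2.3) can be cross-checked against SciPy's implementation of the log multivariate gamma function (a sketch with arbitrarily chosen p and α, not from the original text).

```python
import numpy as np
from scipy.special import gammaln, multigammaln

# Compare the explicit product form (7.2.3), in log scale, with SciPy's
# multigammaln(alpha, p) = log Gamma_p(alpha)
p, alpha = 4, 5.3
explicit = (p * (p - 1) / 4) * np.log(np.pi) + sum(
    gammaln(alpha - j / 2) for j in range(p)  # Gamma(alpha - (j-1)/2), 1-based j
)
print(explicit, multigammaln(alpha, p))
```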

Remark 7.2.1. If we try to evaluate the integral ∫_X |X|^{α−(p+1)/2} e^{−tr(X)} dX from first principles, as a multiple integral, notice that even for p = 3 the integral is practically impossible to evaluate. For p = 2 one can evaluate it after going through several stages.

The transformation in (7.2.1) is a nonlinear transformation, whereas (7.1.4) is a general linear transformation involving mn functionally independent real xij's. When X is a square and nonsingular matrix its regular inverse X^{−1} exists and the transformation Y = X^{−1} is a one-to-one nonlinear transformation. What will be the Jacobian in this case?

Theorem 7.2.2. For Y = X^{−1}, where X is a p × p nonsingular matrix, we have

Y = X^{−1}, |X| ≠ 0 ⇒ dY = |X|^{−2p} dX for a general X
                        = |X|^{−(p+1)} dX for X = X′. (7.2.4)

Proof 7.2.1. This can be proved by observing the following: when X is nonsingular, XX^{−1} = I, where I denotes the identity matrix. Taking differentials on both sides we have

(dX)X^{−1} + X(dX^{−1}) = O ⇒ (dX^{−1}) = −X^{−1}(dX)X^{−1} (7.2.5)

where (dX) means the matrix of differentials. Now we can apply Theorem 7.1.4, treating X^{−1} as a constant matrix because it is free of the differentials, since we are taking only the wedge product of the differentials on the left side.

Note 7.2.1. If the square matrix X is nonsingular and skew symmetric then, proceeding as above, it follows that

Y = X^{−1}, |X| ≠ 0, X′ = −X ⇒ dY = |X|^{−(p−1)} dX. (7.2.6)

Note 7.2.2. If X is nonsingular and lower or upper triangular then, proceeding as before, we have

Y = X^{−1} ⇒ dY = |X|^{−(p+1)} dX (7.2.7)

where |X| ≠ 0, X lower or upper triangular.
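The general case of Theorem 7.2.2 can be checked numerically (a NumPy sketch, not from the original text): by (7.2.5), in column-major vec form the Jacobian matrix of X ↦ X^{−1} is −(X^{−1})′ ⊗ X^{−1}, whose determinant has absolute value |X|^{−2p}.

```python
import numpy as np

rng = np.random.default_rng(4)
p = 3
X = rng.standard_normal((p, p)) + 5 * np.eye(p)  # comfortably nonsingular
Xinv = np.linalg.inv(X)

# d(X^{-1}) = -X^{-1}(dX)X^{-1}  =>  Jacobian matrix = -(X^{-1})' kron X^{-1}
J = -np.kron(Xinv.T, Xinv)
lhs = abs(np.linalg.det(J))
rhs = abs(np.linalg.det(X)) ** (-2 * p)
print(lhs, rhs)
```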

Theorem 7.2.3. Let X = (xij) be a p × p symmetric positive definite matrix of functionally independent real variables with xjj = 1, j = 1, ..., p. Let T = (tij) be a lower triangular matrix of functionally independent real variables with tjj > 0, j = 1, ..., p. Then

X = TT′, with ∑_{j=1}^{i} t_ij^2 = 1, i = 1, ..., p ⇒ dX = {∏_{j=2}^{p} t_jj^{p−j}} dT, (7.2.8)

and

X = T′T, with ∑_{i=j}^{p} t_ij^2 = 1, j = 1, ..., p ⇒ dX = {∏_{j=1}^{p−1} t_jj^{j−1}} dT. (7.2.9)

Proof 7.2.2. Since X is symmetric with xjj = 1, j = 1, ..., p, there are only p(p − 1)/2 variables in X. When X = TT′ take the xij's in the order x21, ..., xp1, x32, ..., xp2, ..., x_{p,p−1}, and the tij's in the same order, and form the matrix of partial derivatives. We obtain a triangular format, and the product of the diagonal elements gives the required Jacobian.

Example 7.2.3. Let R = (rij) be a p × p real symmetric positive definite matrix such that rjj = 1, j = 1, ..., p, −1 < rij = rji < 1, i ≠ j. (This is known as the correlation matrix in statistical theory.) Then show that

f(R) = {[Γ(α)]^p / Γ_p(α)} |R|^{α−(p+1)/2}

is a density function for ℜ(α) > (p−1)/2.

Solution 7.2.3. Since R is positive definite, f(R) ≥ 0 for all R. Let us check the total integral. Let T be a lower triangular matrix as defined in the above theorem and let R = TT′. Then

∫_R |R|^{α−(p+1)/2} dR = ∫_T {∏_{j=2}^{p} (t_jj^2)^{α−(j+1)/2}} dT.

Observe that

t_jj^2 = 1 − t_j1^2 − ... − t_{j,j−1}^2

where −1 < t_ij < 1, i > j. Then let

B = ∫_R |R|^{α−(p+1)/2} dR = ∏_{j=2}^{p} Δ_j

where

Δ_j = ∫_{w_j} (1 − t_j1^2 − ... − t_{j,j−1}^2)^{α−(j+1)/2} dt_j1 ··· dt_{j,j−1}

with w_j = {(t_jk) : −1 < t_jk < 1, k = 1, ..., j − 1, ∑_{k=1}^{j−1} t_jk^2 < 1}.

Evaluating the integral with the help of the Dirichlet integral of Chapter 1 and then taking the product, we have the final result, showing that f(R) is a density.

Exercises 7.2.

7.2.1. Let X = X′ > 0 be p × p. Let T = (tij) be an upper triangular matrix with positive diagonal elements. Then show that

X = TT′ ⇒ dX = 2^p {∏_{j=1}^{p} t_jj^{j}} dT. (7.2.10)

7.2.2. Let x1, ..., xp be real scalar variables. Let y1 = x1 + ··· + xp, y2 = x1x2 + x1x3 + ··· + x_{p−1}x_p (the sum of products taken two at a time), ..., yp = x1···xp. Then for xj > 0, j = 1, ..., p, show that

dy1 ∧ ··· ∧ dyp = {∏_{i=1}^{p−1} ∏_{j=i+1}^{p} |xi − xj|} dx1 ∧ ··· ∧ dxp. (7.2.11)

7.2.3. Let x1, ..., xp be real scalar variables. Let

x1 = r sin θ1,
xj = r cos θ1 cos θ2 ··· cos θ_{j−1} sin θj, j = 2, 3, ..., p − 1,
xp = r cos θ1 cos θ2 ··· cos θ_{p−1},

for r > 0, −π/2 < θj ≤ π/2, j = 1, ..., p − 2, −π < θ_{p−1} ≤ π. Then show that

dx1 ∧ ··· ∧ dxp = r^{p−1} {∏_{j=1}^{p−1} |cos θj|^{p−j−1}} dr ∧ dθ1 ∧ ··· ∧ dθ_{p−1}. (7.2.12)

7.2.4. Let X = T|T | where X and T are p × p lower triangular or upper triangular

matrices of functionally independent real variables with positive diagonal elements.Then show that

dX = (p − 1)|T |−p(p+1)/2dT. (7.2.13)

7.2.5. For real symmetric positive definite matrices X and Y show that

lim_{t→∞} |I + XY/t|^{−t} = e^{−tr(XY)} = lim_{t→∞} |I − XY/t|^{t}. (7.2.14)

7.2.6. Let X = (xij), W = (wij) be lower triangular p × p matrices of distinct real variables with xjj > 0, wjj > 0, j = 1, ..., p, and ∑_{k=1}^{j} w_jk^2 = 1, j = 1, ..., p. Let D = diag(λ1, ..., λp), λj > 0, j = 1, ..., p, real and distinct, where diag(λ1, ..., λp) denotes a diagonal matrix with diagonal elements λ1, ..., λp. Show that

X = DW ⇒ dX = {∏_{j=1}^{p} λ_j^{j−1} w_jj^{−1}} dD ∧ dW. (7.2.15)

7.2.7. Let X, A, B be p × p nonsingular matrices where A and B are constant matrices and X is a matrix of functionally independent real variables. Then, ignoring the sign, show that

Y = AX^{−1}B ⇒ dY = |AB|^{p} |X|^{−2p} dX for a general X, (7.2.16)
             = |AX^{−1}|^{p+1} dX for X = X′, B = A′, (7.2.17)
             = |AX^{−1}|^{p−1} dX for X′ = −X, B = A′. (7.2.18)

7.2.8. Let X and A be p × p matrices where A is a nonsingular constant matrix and X is a matrix of functionally independent real variables such that A + X is nonsingular. Then, ignoring the sign, show that

Y = (A + X)^{−1}(A − X) or (A − X)(A + X)^{−1} ⇒
dY = 2^{p^2} |A|^{p} |A + X|^{−2p} dX for a general X, (7.2.19)
   = 2^{p(p+1)/2} |I + X|^{−(p+1)} dX for A = I, X = X′. (7.2.20)

7.2.9. Let X and A be real p × p lower triangular matrices where A is a constant matrix and X is a matrix of functionally independent real variables such that A and A + X are nonsingular. Then, ignoring the sign, show that

Y = (A + X)^{−1}(A − X) ⇒ dY = 2^{p(p+1)/2} |A + X|_{+}^{−(p+1)} {∏_{j=1}^{p} |a_jj|^{p+1−j}} dX, (7.2.21)

and

Y = (A − X)(A + X)^{−1} ⇒ dY = 2^{p(p+1)/2} |A + X|_{+}^{−(p+1)} {∏_{j=1}^{p} |a_jj|^{j}} dX (7.2.22)

where |·|_{+} denotes that the absolute value is taken.

7.2.10. State and prove the corresponding results of Exercise 7.2.9 for upper triangular matrices.

7.3. Transformations Involving Orthonormal Matrices

Here we will consider a few matrix transformations involving orthonormal and semiorthonormal matrices. Some basic results that we need later on are discussed here. For more on these and various other types of transformations see Mathai (1997). Since the proofs of many of the results are too involved and beyond the scope of this School, we will not go into the details of the proofs.

Theorem 7.3.1. Let X be a p × n, n ≥ p, matrix of rank p of functionally independent real variables, and let X = TU′1 where T is p × p lower triangular with distinct nonzero diagonal elements and U′1 is a unique n × p semiorthonormal matrix, U′1U1 = I_p, all of functionally independent real variables. Then

X = TU′1 ⇒ dX = {∏_{j=1}^{p} |t_jj|^{n−j}} dT ∧ U′1(dU1) (7.3.1)

where

∫ ∧U′1(dU1) = 2^p π^{pn/2} / Γ_p(n/2) (7.3.2)

(see equation (7.2.3) for Γ_p(·)).

Proof 7.3.1. For proving the main part of the theorem, take the differentials on both sides of X = TU′1 and then take the wedge product of the differentials systematically. Since this involves many steps, the proof of the main part is not given here. The second part can be proved without much difficulty. Consider X, the p × n, n ≥ p, real matrix. Observe that

tr(XX′) = ∑_{i,j} x_ij^2,

that is, the sum of squares of all elements in X = (xij), and there are np terms in ∑_{i,j} x_ij^2. Now consider the integral

∫_X e^{−tr(XX′)} dX = ∏_{i,j} ∫_{−∞}^{∞} e^{−x_ij^2} dx_ij = π^{np/2}

since each integral over x_ij gives √π. Now let us evaluate the same integral by using Theorem 7.3.1. Consider the same transformation as in Theorem 7.3.1, X = TU′1. Then

π^{np/2} = ∫_X e^{−tr(XX′)} dX = ∫_T {∏_{j=1}^{p} |t_jj|^{n−j}} e^{−∑_{i≥j} t_ij^2} dT × ∫ ∧U′1(dU1).

But for 0 < t_jj < ∞, −∞ < t_ij < ∞, i > j, and U1 unrestricted semiorthonormal, we have

∫_T {∏_{j=1}^{p} |t_jj|^{n−j}} e^{−∑_{i≥j} t_ij^2} dT = 2^{−p} Γ_p(n/2), (7.3.3)

observing that for j = 1, ..., p the p integrals are

∫_0^∞ |t_jj|^{n−j} e^{−t_jj^2} dt_jj = 2^{−1} Γ(n/2 − (j−1)/2), n > j − 1, (7.3.4)

and each of the p(p−1)/2 integrals is

∫_{−∞}^{∞} e^{−t_ij^2} dt_ij = √π, i > j. (7.3.5)

Now, substituting these, the result in (7.3.2) is established.

Remark 7.3.1. For the transformation X = TU′1 to be unique, one can either take T with the diagonal elements tjj > 0, j = 1, ..., p, and U1 an unrestricted semiorthonormal matrix, or take −∞ < tjj < ∞ and U1 a unique semiorthonormal matrix.

From the outline of the proof of Theorem 7.3.1 we have the following result:

∫_{V_{p,n}} ∧U′1(dU1) = 2^p π^{pn/2} / Γ_p(n/2) (7.3.6)

where V_{p,n} is the Stiefel manifold, the set of semiorthonormal matrices of the type U1, n × p, such that U′1U1 = I_p, where I_p is an identity matrix of order p. For n = p the Stiefel manifold becomes the full orthogonal group, denoted by O_p. Then we have, for n = p,

∫_{O_p} ∧U′1(dU1) = 2^p π^{p^2/2} / Γ_p(p/2). (7.3.7)

Following through the same steps as in Theorem 7.3.1 we can obtain the following theorem involving an upper triangular matrix T1.

Theorem 7.3.2. Let X1 be an n × p, n ≥ p, matrix of rank p of functionally independent real variables and let X1 = U1T1 where T1 is a real p × p upper triangular matrix with distinct nonzero diagonal elements and U1 is a unique real n × p semiorthonormal matrix, that is, U′1U1 = I_p. Then, ignoring the sign,

X1 = U1T1 ⇒ dX1 = {∏_{j=1}^{p} |t_jj|^{n−j}} dT1 ∧ U′1(dU1). (7.3.8)

As a corollary to Theorem 7.3.1, or independently, we can prove the following result:

Corollary 7.3.1. Let X, T, U1 be as defined in Theorem 7.3.1 with the diagonal elements of T positive, that is, tjj > 0, j = 1, ..., p, and U1 an arbitrary semiorthonormal matrix, and let A = XX′, which implies A = TT′ also. Then

A = XX′ = TT′ ⇒ (7.3.9)

dA = 2^p {∏_{j=1}^{p} t_jj^{p+1−j}} dT, (7.3.10)

dT = 2^{−p} {∏_{j=1}^{p} t_jj^{−(p+1−j)}} dA. (7.3.11)

In practical applications we would like to have dX in terms of dS, or vice versa, after integrating out ∧U′1(dU1) over the Stiefel manifold V_{p,n}. Hence we have the following corollary.

Corollary 7.3.2. Let X, T, U1 be as defined in Theorem 7.3.1 with tjj > 0, j = 1, ..., p, and let S = XX′ = TT′. Then, after integrating out ∧U′1(dU1), we have

X = TU′1 and S = XX′ = TT′ ⇒ (7.3.12)

dX = {∏_{j=1}^{p} (t_jj^2)^{n/2 − j/2}} dT ∧ U′1(dU1), (7.3.13)

∫_{V_{p,n}} ∧U′1(dU1) = 2^p π^{np/2} / Γ_p(n/2), (7.3.14)

dS = 2^p {∏_{j=1}^{p} (t_jj^2)^{(p+1)/2 − j/2}} dT, (7.3.15)

|S| = ∏_{j=1}^{p} t_jj^2, (7.3.16)

and, finally,

dX = |S|^{n/2 − (p+1)/2} dS. (7.3.17)

Example 7.3.1. Let X be a p × n, n ≥ p, random matrix having the np-variate real Gaussian density

f(X) = e^{−(1/2) tr(V^{−1}XX′)} / [(2π)^{np/2} |V|^{n/2}], V = V′ > 0.

Evaluate the density of S = XX′.

Solution 7.3.1. Consider the transformation as in Theorem 7.3.1, X = TU′1, where T is a p × p lower triangular matrix with positive diagonal elements and U1 is an arbitrary n × p semiorthonormal matrix, U′1U1 = I_p. Then

dX = {∏_{j=1}^{p} t_jj^{n−j}} dT ∧ U′1(dU1).

Integrating out ∧U′1(dU1) we have the marginal density of T, denoted by g(T). That is,

g(T) dT = [2^p / (2^{np/2} Γ_p(n/2) |V|^{n/2})] e^{−(1/2) tr(V^{−1}TT′)} {∏_{j=1}^{p} t_jj^{n−j}} dT.

Now, substituting from Corollary 7.3.2 for S and dS in terms of T and dT, we have the density of S, denoted by h(S), given by

h(S) = C1 |S|^{n/2 − (p+1)/2} e^{−(1/2) tr(V^{−1}S)}, S = S′ > 0.

Since the total integral ∫_S h(S) dS = 1, we have

C1 = [2^{np/2} Γ_p(n/2) |V|^{n/2}]^{−1}.

Exercises 7.3.

7.3.1. Let X = (xij), W = (wij) be p × p lower triangular matrices of distinct real variables such that xjj > 0, wjj > 0 and ∑_{k=1}^{j} w_jk^2 = 1, j = 1, ..., p. Let D = diag(λ1, ..., λp), λj > 0, j = 1, ..., p, real, positive and distinct, and let D^{1/2} = diag(λ1^{1/2}, ..., λp^{1/2}). Then show that

X = DW ⇒ dX = {∏_{j=1}^{p} λ_j^{j−1} w_jj^{−1}} dD ∧ dW. (7.3.18)

7.3.2. Let X, D, W be as defined in Exercise 7.3.1. Then show that

X = D^{1/2} W ⇒ dX = 2^{−p} {∏_{j=1}^{p} (λ_j^{1/2})^{j−2} w_{jj}^{−1}} dD ∧ dW. (7.3.19)

7.3.3. Let X, D, W be as defined in Exercise 7.3.1. Then show that

X = D^{1/2} W W′ D^{1/2} ⇒ dX = {∏_{j=1}^{p} λ_j^{(p−1)/2} w_{jj}^{p−j}} dD ∧ dW, (7.3.20)

and

X = W′DW ⇒ dX = {∏_{j=1}^{p} (λ_j w_{jj})^{j−1}} dD ∧ dW. (7.3.21)


7.3.4. Let X, D, W be as defined in Exercise 7.3.1. Then show that

X = WD ⇒ dX = {∏_{j=1}^{p} λ_j^{p−j} w_{jj}^{−1}} dD ∧ dW, (7.3.22)

X = WD^{1/2} ⇒ dX = 2^{−p} {∏_{j=1}^{p} (λ_j^{1/2})^{p−j−1} w_{jj}^{−1}} dD ∧ dW, (7.3.23)

X = WDW′ ⇒ dX = {∏_{j=1}^{p} (λ_j w_{jj})^{p−j}} dD ∧ dW, (7.3.24)

and

X = D^{1/2} W′ W D^{1/2} ⇒ dX = {∏_{j=1}^{p} λ_j^{(p−1)/2} w_{jj}^{j−1}} dD ∧ dW. (7.3.25)

7.3.5. Let X, T, U be p × p matrices of functionally independent real variables, where all the principal minors of X are nonzero, T is lower triangular and U is lower triangular with unit diagonal elements. Then, ignoring the sign, show that

X = TU′ ⇒ dX = {∏_{j=1}^{p} |t_{jj}|^{p−j}} dT ∧ dU, (7.3.26)

and

X = T′U ⇒ dX = {∏_{j=1}^{p} |t_{jj}|^{j−1}} dT ∧ dU. (7.3.27)

7.3.6. Let X be a p × p symmetric matrix of functionally independent real variables with distinct and nonzero eigenvalues λ_1 > λ_2 > ... > λ_p, and let D = diag(λ_1, ..., λ_p), λ_j ≠ 0, j = 1, ..., p. Let U be the unique p × p orthonormal matrix, U′U = I = UU′, such that X = UDU′. Then, ignoring the sign, show that

dX = {∏_{i=1}^{p−1} ∏_{j=i+1}^{p} |λ_i − λ_j|} dD ∧ U′(dU). (7.3.28)


7.3.7. For a 3 × 3 matrix X such that X = X′ > 0 and I − X > 0, show that

∫_X dX = π^2/90.

7.3.8. For a p × p matrix Y of p^2 functionally independent real variables with positive eigenvalues, show that

∫_Y |Y′Y|^{α−(p+1)/2} e^{−tr(Y′Y)} dY = 2^{−p} Γ_p(α − 1/2) π^{p^2/2} / Γ_p(p/2).

7.3.9. Let X be a p × p matrix of p^2 functionally independent real variables. Let D = diag(μ_1, ..., μ_p), μ_1 > μ_2 > ... > μ_p, with the μ_j real for j = 1, ..., p, and let U and V be orthonormal matrices such that X = UDV′. Show that

X = UDV′ ⇒ dX = {∏_{i<j} |μ_i^2 − μ_j^2|} dD ∧ dG ∧ dH, (7.3.29)

where (dG) = U′(dU) and (dH) = (dV′)V, and the μ_j's are known as the singular values of X.

7.3.10. Let λ_1 > ... > λ_p > 0 be real variables and D = diag(λ_1, ..., λ_p). Show that

∫_D e^{−tr(D^2)} {∏_{i<j} |λ_i^2 − λ_j^2|} dD = [Γ_p(p/2)]^2 / (2^p π^{p^2/2}).
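For p = 2 the integral in Exercise 7.3.10 can be evaluated numerically as a sanity check. This sketch is not from the text: it compares the ordered two-variable integral with [Γ_2(p/2)]^2/(2^p π^{p^2/2}), using the product form Γ_2(a) = π^{1/2} Γ(a) Γ(a − 1/2) (the K = 0 case of formula (7.6.8) later in this chapter); the truncation point of the integral is an illustrative choice.

```python
import numpy as np
from scipy import integrate
from scipy.special import gamma

# Exercise 7.3.10 with p = 2: integrate e^{-(l1^2+l2^2)} (l1^2 - l2^2)
# over l1 > l2 > 0 and compare with [Gamma_2(1)]^2 / (2^2 pi^2).
val, _ = integrate.dblquad(
    lambda l2, l1: np.exp(-l1**2 - l2**2) * (l1**2 - l2**2),
    0, 12,                           # l1 (truncated; the Gaussian tail is negligible)
    lambda l1: 0, lambda l1: l1)     # l2 runs from 0 to l1 (ordering)
gamma2_1 = np.pi**0.5 * gamma(1.0) * gamma(0.5)   # Gamma_2(1) = pi
closed = gamma2_1**2 / (2**2 * np.pi**2)          # = 1/4 for p = 2
```

Switching to polar coordinates shows the left side is exactly 1/4 for p = 2, matching the closed form.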

7.4. Special Functions of Matrix Argument

Real scalar functions of matrix argument, when the matrices are real, will be dealt with in this section. It is difficult to develop a theory of functions of matrix argument for general matrices. The notations used in Section 7.1 will be used in the present section also. A discussion of scalar functions of matrix argument when the elements of the matrices are in the complex domain may be found in Mathai (1997).


7.4.1. Real matrix-variate scalar functions

When dealing with matrices it is often difficult to define fractional powers, such as square roots, uniquely, even when the matrices are real square or even symmetric. For example,

A_1 = [1 0; 0 1],  A_2 = [−1 0; 0 1],  A_3 = [1 0; 0 −1],  A_4 = [−1 0; 0 −1]

all give A_1^2 = A_2^2 = A_3^2 = A_4^2 = I_2, where I_2 is the 2 × 2 identity matrix. Thus, even for I_2, which is a nice, square, symmetric, positive definite matrix, there are many matrices that qualify as square roots of I_2. But if we confine ourselves to the class of positive definite matrices, in the real case, then there is only one candidate for the square root of I_2, namely A_1 = I_2 itself. Hence the development of the theory of scalar functions of matrix argument is confined to positive definite matrices in the real case, and to hermitian positive definite matrices in the complex domain.

7.4.2. Real matrix-variate gamma

In the real scalar case the integral representation for the gamma function is the following:

Γ(α) = ∫_0^∞ x^{α−1} e^{−x} dx, ℜ(α) > 0. (7.4.1)

Let X be a p × p real symmetric positive definite matrix and consider the integral

Γ_p(α) = ∫_{X=X′>0} |X|^{α−(p+1)/2} e^{−tr(X)} dX, (7.4.2)

which for p = 1 reduces to (7.4.1). We have already evaluated this integral earlier, as an exercise on the basic nonlinear matrix transformation X = TT′, where T is a lower triangular matrix with positive diagonal elements; hence the derivation will not be repeated here. We will call Γ_p(α) the real matrix-variate gamma. Observe that for p = 1, Γ_p(α) reduces to Γ(α).
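The integral (7.4.2) can be checked directly for p = 2. The sketch below is not from the text: it parametrizes a 2 × 2 positive definite X by (x_{11}, x_{22}, x_{21}) with x_{21}^2 < x_{11}x_{22}, integrates numerically, and compares with the product form Γ_2(α) = π^{1/2} Γ(α) Γ(α − 1/2), the K = 0 case of formula (7.6.8) later in this chapter; the value of α and the truncation point are illustrative.

```python
import numpy as np
from scipy import integrate
from scipy.special import gamma

alpha = 2.0

def integrand(x21, x22, x11):
    # |X|^{alpha - 3/2} e^{-tr X} for the 2x2 symmetric X; the max() guards
    # against tiny negative determinants at the integration boundary
    det = x11 * x22 - x21**2
    return max(det, 0.0)**(alpha - 1.5) * np.exp(-x11 - x22)

val, _ = integrate.tplquad(
    integrand,
    0, 20,                                   # x11 (truncated tail)
    lambda x11: 0, lambda x11: 20,           # x22
    lambda x11, x22: -np.sqrt(x11 * x22),    # x21 limits keep X > 0
    lambda x11, x22: np.sqrt(x11 * x22))
closed = np.pi**0.5 * gamma(alpha) * gamma(alpha - 0.5)   # Gamma_2(alpha)
```

For α = 2 both sides equal π/2, which can also be verified by carrying out the inner x_{21}-integral in closed form.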

7.4.3. Real matrix-variate gamma density

With the help of (7.4.2) we can create the real matrix-variate gamma density as follows, where X is a p × p real symmetric positive definite matrix:

f(X) = { |X|^{α−(p+1)/2} e^{−tr(X)} / Γ_p(α),  X = X′ > 0, ℜ(α) > (p−1)/2;
         0, elsewhere. } (7.4.3)

If another parameter matrix is to be introduced then we obtain a gamma density with parameters (α, B), B = B′ > 0, as follows:

f_1(X) = { (|B|^α / Γ_p(α)) |X|^{α−(p+1)/2} e^{−tr(BX)},  X = X′ > 0, B = B′ > 0, ℜ(α) > (p−1)/2;
           0, elsewhere. } (7.4.4)

Remark 7.4.1. In f_1(X), if B is replaced by (1/2)V^{−1}, V = V′ > 0, and α is replaced by n/2, n = p, p+1, ..., then we have the most important density in multivariate statistical analysis, known as the nonsingular Wishart density.
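The Wishart connection of Remark 7.4.1 can be illustrated numerically in its simplest moment form. This Monte Carlo sketch is not from the text; the dimensions p, n, the matrix V and the sample size are illustrative choices. If the columns of the p × n matrix X are i.i.d. N(0, V), then S = XX′ has the nonsingular Wishart distribution, and in particular E[S] = nV.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n, m = 3, 10, 8000
V = np.array([[2.0, 0.5, 0.2],
              [0.5, 1.0, 0.3],
              [0.2, 0.3, 1.5]])      # an illustrative positive definite V
L = np.linalg.cholesky(V)            # L L' = V, so columns of L Z are N(0, V)

S_bar = np.zeros((p, p))
for _ in range(m):
    X = L @ rng.standard_normal((p, n))
    S_bar += X @ X.T
S_bar /= m                           # sample mean of S = XX'; should be near n V
```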

As in the scalar case, two matrix random variables X and Y are said to be independently distributed if the joint density of X and Y is the product of their marginal densities. We will examine the densities of some functions of independently distributed matrix random variables. To this end we will introduce a few more functions.

Definition 7.4.1. A real matrix-variate beta function, denoted by B_p(α, β), is defined as

B_p(α, β) = Γ_p(α)Γ_p(β) / Γ_p(α+β),  ℜ(α) > (p−1)/2, ℜ(β) > (p−1)/2. (7.4.5)

The quantity in (7.4.5), analogous to the scalar case (p = 1), is the real matrix-variate beta function. Let us try to obtain an integral representation for the real matrix-variate beta function of (7.4.5). Consider

Γ_p(α)Γ_p(β) = [∫_{X=X′>0} |X|^{α−(p+1)/2} e^{−tr(X)} dX] × [∫_{Y=Y′>0} |Y|^{β−(p+1)/2} e^{−tr(Y)} dY],

where both X and Y are p × p matrices,

= ∫∫ |X|^{α−(p+1)/2} |Y|^{β−(p+1)/2} e^{−tr(X+Y)} dX ∧ dY.


Put U = X + Y for fixed X. Then

Y = U − X ⇒ |Y| = |U − X| = |U| |I − U^{−1/2} X U^{−1/2}|,

where, for convenience, U^{1/2} denotes the symmetric positive definite square root of U. Observe that when two matrices A and B are nonsingular, with AB and BA defined, then even if they do not commute,

|I − AB| = |I − BA|,

and if A = A′ > 0 and B = B′ > 0 then

|I − AB| = |I − A^{1/2} B A^{1/2}| = |I − B^{1/2} A B^{1/2}|.

Now,

Γ_p(α)Γ_p(β) = ∫_U ∫_X |U|^{β−(p+1)/2} |X|^{α−(p+1)/2} |I − U^{−1/2} X U^{−1/2}|^{β−(p+1)/2} e^{−tr(U)} dU ∧ dX.

Let Z = U^{−1/2} X U^{−1/2} for fixed U. Then dX = |U|^{(p+1)/2} dZ by Theorem 7.1.5. Now,

Γ_p(α)Γ_p(β) = [∫_Z |Z|^{α−(p+1)/2} |I − Z|^{β−(p+1)/2} dZ] [∫_{U=U′>0} |U|^{α+β−(p+1)/2} e^{−tr(U)} dU].

Evaluation of the U-integral by using (7.4.2) yields Γ_p(α+β). Then we have

B_p(α, β) = Γ_p(α)Γ_p(β)/Γ_p(α+β) = ∫_Z |Z|^{α−(p+1)/2} |I − Z|^{β−(p+1)/2} dZ.

Since the integrand has to remain non-negative, we have Z = Z′ > 0 and I − Z > 0. Therefore one representation of the real matrix-variate beta function is the following, which is also called the type-1 beta integral:

B_p(α, β) = ∫_{0<Z=Z′<I} |Z|^{α−(p+1)/2} |I − Z|^{β−(p+1)/2} dZ,  ℜ(α) > (p−1)/2, ℜ(β) > (p−1)/2. (7.4.6)

By making the transformation V = I − Z, note that α and β can be interchanged in the integral, which shows that B_p(α, β) = B_p(β, α) holds in the integral representation also.

Let us make the following transformation in (7.4.6):

W = (I − Z)^{−1/2} Z (I − Z)^{−1/2} ⇒ W = (Z^{−1} − I)^{−1} ⇒ W^{−1} = Z^{−1} − I
⇒ |W|^{−(p+1)} dW = |Z|^{−(p+1)} dZ ⇒ dZ = |I + W|^{−(p+1)} dW.

Observe that

|Z| = |W| |I + W|^{−1},  |I − Z| = |I + W|^{−1}.

Under this transformation the integral in (7.4.6) becomes

B_p(α, β) = ∫_{W=W′>0} |W|^{α−(p+1)/2} |I + W|^{−(α+β)} dW, (7.4.7)

for ℜ(α) > (p−1)/2, ℜ(β) > (p−1)/2.

The representation in (7.4.7) is known as the type-2 integral for the real matrix-variate beta function. With the transformation V = W^{−1} the parameters α and β in (7.4.7) are interchanged. With the help of the type-1 and type-2 integral representations one can define the type-1 and type-2 beta densities in the real matrix-variate case.
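For p = 1 the type-2 integral (7.4.7) reduces to a familiar scalar identity, which gives a quick numerical sanity check. This sketch is not from the text, and the parameter values are illustrative:

```python
from math import gamma
from scipy import integrate

# p = 1 case of (7.4.7): int_0^inf w^{a-1} (1+w)^{-(a+b)} dw = B(a, b)
a, b = 2.5, 1.5
val, _ = integrate.quad(lambda w: w**(a - 1) * (1 + w)**(-(a + b)),
                        0, float("inf"))
closed = gamma(a) * gamma(b) / gamma(a + b)   # Gamma(a)Gamma(b)/Gamma(a+b)
```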

Definition 7.4.2. The real matrix-variate type-1 beta density, for a p × p real symmetric positive definite matrix X such that X = X′ > 0 and I − X > 0, is

f_2(X) = { (1/B_p(α, β)) |X|^{α−(p+1)/2} |I − X|^{β−(p+1)/2},  0 < X = X′ < I, ℜ(α) > (p−1)/2, ℜ(β) > (p−1)/2;
           0, elsewhere. } (7.4.8)

Definition 7.4.3. The real matrix-variate type-2 beta density, for a p × p real symmetric positive definite matrix X, is

f_3(X) = { [Γ_p(α+β)/(Γ_p(α)Γ_p(β))] |X|^{α−(p+1)/2} |I + X|^{−(α+β)},  X = X′ > 0, ℜ(α) > (p−1)/2, ℜ(β) > (p−1)/2;
           0, elsewhere. } (7.4.9)

Example 7.4.1. Let X_1, X_2 be p × p matrix random variables, independently distributed as in (7.4.3) with parameters α_1 and α_2 respectively. Let U = X_1 + X_2, V = (X_1 + X_2)^{−1/2} X_1 (X_1 + X_2)^{−1/2} and W = X_2^{−1/2} X_1 X_2^{−1/2}. Evaluate the densities of U, V and W.


Solution 7.4.1. The joint density of X_1 and X_2, denoted by f(X_1, X_2), is available as the product of the marginal densities, due to independence. That is,

f(X_1, X_2) = |X_1|^{α_1−(p+1)/2} |X_2|^{α_2−(p+1)/2} e^{−tr(X_1+X_2)} / [Γ_p(α_1)Γ_p(α_2)],  X_1 = X_1′ > 0, X_2 = X_2′ > 0, ℜ(α_1) > (p−1)/2, ℜ(α_2) > (p−1)/2. (7.4.10)

U = X_1 + X_2 ⇒ |X_2| = |U − X_1| = |U| |I − U^{−1/2} X_1 U^{−1/2}|.

Then the joint density of (U, U_1) = (X_1 + X_2, X_1), for which the Jacobian is unity, is available as

f_1(U, U_1) = (1/[Γ_p(α_1)Γ_p(α_2)]) |U_1|^{α_1−(p+1)/2} |U|^{α_2−(p+1)/2} |I − U^{−1/2} U_1 U^{−1/2}|^{α_2−(p+1)/2} e^{−tr(U)}.

Put U_2 = U^{−1/2} U_1 U^{−1/2} ⇒ dU_1 = |U|^{(p+1)/2} dU_2 for fixed U. Then the joint density of U and U_2 = V is available as

f_2(U, V) = (1/[Γ_p(α_1)Γ_p(α_2)]) |U|^{α_1+α_2−(p+1)/2} e^{−tr(U)} |V|^{α_1−(p+1)/2} |I − V|^{α_2−(p+1)/2}.

Since f_2(U, V) is a product of a function of U and a function of V, with U = U′ > 0, V = V′ > 0 and I − V > 0, we see that U and V are independently distributed. The densities of U and V, denoted by g_1(U) and g_2(V), are the following:

g_1(U) = c_1 |U|^{α_1+α_2−(p+1)/2} e^{−tr(U)},  U = U′ > 0,

and

g_2(V) = c_2 |V|^{α_1−(p+1)/2} |I − V|^{α_2−(p+1)/2},  V = V′ > 0, I − V > 0,

where c_1 and c_2 are the normalizing constants. But from the gamma density and the type-1 beta density note that

c_1 = 1/Γ_p(α_1 + α_2),  c_2 = Γ_p(α_1 + α_2)/[Γ_p(α_1)Γ_p(α_2)],  ℜ(α_1) > (p−1)/2, ℜ(α_2) > (p−1)/2.

Hence U is gamma distributed with parameter α_1 + α_2, V is type-1 beta distributed with parameters α_1 and α_2, and further U and V are independently distributed. For obtaining the density of W = X_2^{−1/2} X_1 X_2^{−1/2}, start with (7.4.10) and change (X_1, X_2) to (W, X_2) for fixed X_2. Then dX_1 = |X_2|^{(p+1)/2} dW. The joint density of X_2 and W, denoted by f_{w,x_2}(W, X_2), is the following, observing that

tr(X_1 + X_2) = tr[X_2^{1/2}(I + X_2^{−1/2} X_1 X_2^{−1/2})X_2^{1/2}] = tr[X_2^{1/2}(I + W)X_2^{1/2}] = tr[(I + W)X_2] = tr[(I + W)^{1/2} X_2 (I + W)^{1/2}],

by using the fact that tr(AB) = tr(BA) for any two matrices for which AB and BA are defined.

f_{w,x_2}(W, X_2) = (1/[Γ_p(α_1)Γ_p(α_2)]) |W|^{α_1−(p+1)/2} |X_2|^{α_1+α_2−(p+1)/2} e^{−tr[(I+W)^{1/2} X_2 (I+W)^{1/2}]}.

Hence the marginal density of W, denoted by g_w(W), is available by integrating out X_2 from f_{w,x_2}(W, X_2). That is,

g_w(W) = (1/[Γ_p(α_1)Γ_p(α_2)]) |W|^{α_1−(p+1)/2} ∫_{X_2=X_2′>0} |X_2|^{α_1+α_2−(p+1)/2} e^{−tr[(I+W)^{1/2} X_2 (I+W)^{1/2}]} dX_2.

Put X_3 = (I + W)^{1/2} X_2 (I + W)^{1/2} for fixed W; then dX_3 = |I + W|^{(p+1)/2} dX_2, and the integral becomes

∫_{X_2=X_2′>0} |X_2|^{α_1+α_2−(p+1)/2} e^{−tr[(I+W)^{1/2} X_2 (I+W)^{1/2}]} dX_2 = Γ_p(α_1 + α_2) |I + W|^{−(α_1+α_2)}.

Hence,

g_w(W) = { [Γ_p(α_1+α_2)/(Γ_p(α_1)Γ_p(α_2))] |W|^{α_1−(p+1)/2} |I + W|^{−(α_1+α_2)},  W = W′ > 0, ℜ(α_1) > (p−1)/2, ℜ(α_2) > (p−1)/2;
           0, elsewhere, }

which is a type-2 beta density with parameters α_1 and α_2. Thus W is real matrix-variate type-2 beta distributed.
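For p = 1 the structure of Example 7.4.1 can be checked by simulation. This Monte Carlo sketch is not from the text, and the parameter values are illustrative: with X_1 ∼ Gamma(α_1), X_2 ∼ Gamma(α_2) independent, U = X_1 + X_2 is Gamma(α_1 + α_2), V = X_1/(X_1 + X_2) is type-1 beta(α_1, α_2), and U and V are independent.

```python
import numpy as np

rng = np.random.default_rng(3)
a1, a2, m = 2.0, 3.0, 200000
x1 = rng.gamma(a1, size=m)
x2 = rng.gamma(a2, size=m)
u = x1 + x2                          # should behave as Gamma(a1 + a2)
v = x1 / (x1 + x2)                   # should behave as Beta(a1, a2)
mean_u = u.mean()                    # near a1 + a2 = 5
mean_v = v.mean()                    # near a1/(a1 + a2) = 0.4
corr_uv = np.corrcoef(u, v)[0, 1]    # near 0, reflecting independence
```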


Exercises 7.4.

7.4.1. For a real p × p matrix X such that X = X′ > 0 and 0 < X < I, show that

∫_X dX = [Γ_p((p+1)/2)]^2 / Γ_p(p+1).

7.4.2. Let X be a 2 × 2 real symmetric positive definite matrix with eigenvalues in (0, 1). Then show that

∫_{0<X<I} |X|^α dX = π / [(α+1)(α+2)(2α+3)].

7.4.3. For a 2 × 2 real positive definite matrix X show that

∫_X |I + X|^{−3} dX = π/6.

7.4.4. For a 4 × 4 real positive definite matrix X such that 0 < X < I, show that

∫_X dX = 2π^4 / (7! · 5).

7.4.5. If the p × p real positive definite matrix random variable X is distributed as a real matrix-variate type-1 beta (that is, X has a type-1 beta density), evaluate the density of Y = A^{1/2} X A^{1/2}, where the constant matrix A = A′ > 0.

7.5. The Laplace Transform in the Matrix Case

If f(x_1, ..., x_k) is a scalar function of the real scalar variables x_1, ..., x_k, then the Laplace transform of f, with the parameters t_1, ..., t_k, is given by

L_f(t_1, ..., t_k) = ∫_0^∞ ··· ∫_0^∞ e^{−t_1x_1−···−t_kx_k} f(x_1, ..., x_k) dx_1 ∧ ··· ∧ dx_k. (7.5.1)

If f(X) is a real scalar function of the p × p real symmetric positive definite matrix X, then the Laplace transform of f(X) should be consistent with (7.5.1). When X = X′ there are only p(p+1)/2 distinct elements, either the x_{ij}, i ≤ j, or the x_{ij}, i ≥ j. Hence what is needed is a linear function of all these variables; that is, in the exponent we should have the linear function t_{11}x_{11} + (t_{21}x_{21} + t_{22}x_{22}) + ··· + (t_{p1}x_{p1} + ··· + t_{pp}x_{pp}). Even if we take a symmetric matrix T = (t_{ij}) = T′, the trace of TX is

tr(TX) = t_{11}x_{11} + ··· + t_{pp}x_{pp} + 2 ∑_{i<j} t_{ij}x_{ij}.

Hence if we take a symmetric matrix of parameters t_{ij} such that

T* = [ t_{11}        (1/2)t_{12}  ···  (1/2)t_{1p}
       (1/2)t_{21}   t_{22}       ···  (1/2)t_{2p}
          ⋮
       (1/2)t_{p1}   (1/2)t_{p2}  ···  t_{pp} ],

that is, T* = (t*_{ij}) with t*_{jj} = t_{jj} and t*_{ij} = (1/2)t_{ij}, i ≠ j, then

tr(T*X) = t_{11}x_{11} + ··· + t_{pp}x_{pp} + ∑_{i>j} t_{ij}x_{ij}.

Hence the Laplace transform in the matrix case, for a real symmetric positive definite matrix X, is defined with the parameter matrix T*.

Definition 7.5.1. The Laplace transform in the matrix case is

L_f(T*) = ∫_{X=X′>0} e^{−tr(T*X)} f(X) dX, (7.5.2)

whenever the integral is convergent.

Example 7.5.1. Evaluate the Laplace transform of the two-parameter gamma density in (7.4.4).

Solution 7.5.1. Here,

f(X) = (|B|^α / Γ_p(α)) |X|^{α−(p+1)/2} e^{−tr(BX)},  X = X′ > 0, B = B′ > 0, ℜ(α) > (p−1)/2. (7.5.3)

Hence the Laplace transform of f is the following:

L_f(T*) = (|B|^α / Γ_p(α)) ∫_{X=X′>0} |X|^{α−(p+1)/2} e^{−tr(T*X)} e^{−tr(BX)} dX.

Note that since T*, B and X are p × p,

tr(T*X) + tr(BX) = tr[(B + T*)X].


Thus for the integral to converge the exponent has to remain positive definite; the condition B + T* > 0 is sufficient. Let (B + T*)^{1/2} be the symmetric positive definite square root of B + T*. Then

tr[(B + T*)X] = tr[(B + T*)^{1/2} X (B + T*)^{1/2}],

(B + T*)^{1/2} X (B + T*)^{1/2} = Y ⇒ dX = |B + T*|^{−(p+1)/2} dY,

and

|X|^{α−(p+1)/2} dX = |B + T*|^{−α} |Y|^{α−(p+1)/2} dY.

Hence,

L_f(T*) = (|B|^α / Γ_p(α)) ∫_{Y=Y′>0} |B + T*|^{−α} |Y|^{α−(p+1)/2} e^{−tr(Y)} dY = |B|^α |B + T*|^{−α} = |I + B^{−1}T*|^{−α}. (7.5.4)

Thus for known B and arbitrary T*, (7.5.4) will uniquely determine (7.5.3) through the uniqueness of the inverse Laplace transform. The conditions for the uniqueness will not be discussed here. For some results in this direction see Mathai (1993, 1997) and the references therein.
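The closed form (7.5.4) has a direct scalar analogue that can be checked numerically. This sketch is not from the text, and the parameter values are illustrative: for p = 1 the Laplace transform of the gamma density (b^α/Γ(α)) x^{α−1} e^{−bx} at t is (1 + t/b)^{−α}, the scalar case of |I + B^{−1}T*|^{−α}.

```python
import numpy as np
from math import gamma
from scipy import integrate

a, b, t = 2.2, 1.7, 0.9
# Laplace transform of the scalar gamma density at parameter t
val, _ = integrate.quad(
    lambda x: (b**a / gamma(a)) * x**(a - 1) * np.exp(-(b + t) * x),
    0, np.inf)
closed = (1 + t / b)**(-a)   # scalar case of |I + B^{-1} T*|^{-alpha}
```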

7.5.1. A convolution property for Laplace transforms

Let f_1(X) and f_2(X) be two real scalar functions of the real symmetric positive definite matrix X, and let g_1(T*) and g_2(T*) be their Laplace transforms. Let

f_3(X) = ∫_{0<S=S′<X} f_1(X − S) f_2(S) dS. (7.5.5)

Then g_1 g_2 is the Laplace transform of f_3(X). This result can be established from the definition itself:

L_{f_3}(T*) = ∫_{X=X′>0} e^{−tr(T*X)} f_3(X) dX = ∫_{X>0} ∫_{S<X} e^{−tr(T*X)} f_1(X − S) f_2(S) dS ∧ dX.


Note that {S < X, X > 0} is equivalent to {X > S, S > 0}. Hence we may interchange the integrals. Then

L_{f_3}(T*) = ∫_{S>0} f_2(S) [∫_{X>S} e^{−tr(T*X)} f_1(X − S) dX] ∧ dS.

Put X − S = Y ⇒ X = Y + S, and then

L_{f_3}(T*) = ∫_{S>0} e^{−tr(T*S)} f_2(S) [∫_{Y>0} e^{−tr(T*Y)} f_1(Y) dY] ∧ dS = g_2(T*) g_1(T*).

Example 7.5.2. Using the convolution property of the Laplace transform and an integral representation for the real matrix-variate beta function, show that

B_p(α, β) = Γ_p(α)Γ_p(β)/Γ_p(α + β).

Solution 7.5.2. Let us start with the integral representation

B_p(α, β) = ∫_{0<X<I} |X|^{α−(p+1)/2} |I − X|^{β−(p+1)/2} dX,  ℜ(α) > (p−1)/2, ℜ(β) > (p−1)/2.

Consider the integral

∫_{0<U<X} |U|^{α−(p+1)/2} |X − U|^{β−(p+1)/2} dU
= |X|^{β−(p+1)/2} ∫_{0<U<X} |U|^{α−(p+1)/2} |I − X^{−1/2} U X^{−1/2}|^{β−(p+1)/2} dU
= |X|^{α+β−(p+1)/2} ∫_{0<Y<I} |Y|^{α−(p+1)/2} |I − Y|^{β−(p+1)/2} dY,  Y = X^{−1/2} U X^{−1/2}.

Then

B_p(α, β) |X|^{α+β−(p+1)/2} = ∫_{0<U<X} |U|^{α−(p+1)/2} |X − U|^{β−(p+1)/2} dU. (7.5.6)


Take the Laplace transform on both sides. On the left side,

B_p(α, β) ∫_{X>0} |X|^{α+β−(p+1)/2} e^{−tr(T*X)} dX = B_p(α, β) |T*|^{−(α+β)} Γ_p(α + β).

On the right side we get

∫_{X>0} e^{−tr(T*X)} [∫_{0<U<X} |U|^{α−(p+1)/2} |X − U|^{β−(p+1)/2} dU] dX = Γ_p(α)Γ_p(β) |T*|^{−(α+β)},

by the convolution property in (7.5.5). Hence

B_p(α, β) = Γ_p(α)Γ_p(β)/Γ_p(α + β).

Example 7.5.3. Let h(T*) be the Laplace transform of f(X), that is, h(T*) = L_f(T*). Then show that the Laplace transform of |X|^{−(p+1)/2} Γ_p((p+1)/2) f(X) is equivalent to

∫_{U>T*} h(U) dU.

Solution 7.5.3. From (7.5.3), observe that for a symmetric positive definite constant matrix B the following is an identity:

|B|^{−α} = (1/Γ_p(α)) ∫_{X>0} |X|^{α−(p+1)/2} e^{−tr(BX)} dX,  ℜ(α) > (p−1)/2. (7.5.7)

Then we can replace |X|^{−(p+1)/2} Γ_p((p+1)/2) by an equivalent integral:

|X|^{−(p+1)/2} Γ_p((p+1)/2) ≡ ∫_{Y>0} |Y|^{(p+1)/2 − (p+1)/2} e^{−tr(XY)} dY = ∫_{Y>0} e^{−tr(XY)} dY.

Then the Laplace transform of |X|^{−(p+1)/2} Γ_p((p+1)/2) f(X) is given by

∫_{X>0} e^{−tr(T*X)} f(X) [∫_{Y>0} e^{−tr(YX)} dY] ∧ dX = ∫_{X>0} ∫_{Y>0} e^{−tr[(T*+Y)X]} f(X) dY ∧ dX.

Putting T* + Y = U, so that U > T*, this becomes

∫_{Y>0} h(T* + Y) dY = ∫_{U>T*} h(U) dU.

Example 7.5.4. For X > B, B = B′ > 0 and ν > −1, show that the Laplace transform of |X − B|^ν is Γ_p(ν + (p+1)/2) |T*|^{−(ν+(p+1)/2)} e^{−tr(T*B)}.


Solution 7.5.4. The Laplace transform of |X − B|^ν with parameter matrix T* is given by

∫_{X>B} |X − B|^ν e^{−tr(T*X)} dX = e^{−tr(BT*)} ∫_{Y>0} |Y|^ν e^{−tr(T*Y)} dY,  Y = X − B,
= e^{−tr(BT*)} Γ_p(ν + (p+1)/2) |T*|^{−(ν+(p+1)/2)},

by writing ν = ν + (p+1)/2 − (p+1)/2, for ν + (p+1)/2 > (p−1)/2, that is, ν > −1.

Exercises 7.5.

7.5.1. By using the process in Example 7.5.3, or otherwise, show that the Laplace transform of [Γ_p((p+1)/2) |X|^{−(p+1)/2}]^n f(X) can be written as

∫_{W_1>T*} ∫_{W_2>W_1} ··· ∫_{W_n>W_{n−1}} h(W_n) dW_1 ∧ ··· ∧ dW_n,

where h(T*) is the Laplace transform of f(X).

7.5.2. Show that the Laplace transform of |X|^n is |T*|^{−n−(p+1)/2} Γ_p(n + (p+1)/2), for n > −1.

7.5.3. If the p × p real matrix random variable X has a type-1 beta density with parameters (α_1, α_2), then show that

(i) U = (I − X)^{−1/2} X (I − X)^{−1/2} ∼ type-2 beta (α_1, α_2);
(ii) V = X^{−1} − I ∼ type-2 beta (α_2, α_1);

where "∼" indicates "distributed as", and the parameters are given in the brackets.

7.5.4. If the p × p real symmetric positive definite matrix random variable X has a type-2 beta density with parameters α_1 and α_2, then show that

(i) U = X^{−1} ∼ type-2 beta (α_2, α_1);
(ii) V = (I + X)^{−1} ∼ type-1 beta (α_2, α_1);
(iii) W = (I + X)^{−1/2} X (I + X)^{−1/2} ∼ type-1 beta (α_1, α_2).

7.5.5. If the Laplace transform of f(X) is g(T*) = L_{T*}(f(X)), where X is a p × p real symmetric positive definite matrix, then show that

Δ^n g(T*) = L_{T*}(|X|^n f(X)),  Δ = (−1)^p |∂/∂T*|,

where |∂/∂T*| means that the partial derivatives with respect to the t*_{ij}, for all i and j, are first taken, then written in matrix form, and then the determinant is taken, with T* = (t*_{ij}).

7.6. Hypergeometric Functions with Matrix Argument

There are essentially three approaches available in the literature for defining a hypergeometric function of matrix argument. One approach, due to Bochner (1952) and Herz (1955), is through Laplace and inverse Laplace transforms. Under this approach, a hypergeometric function is defined as the function satisfying a pair of integral equations, and explicit forms are available for 0F0 and 1F0. Another approach is available from James (1961) and Constantine (1963), through a series form involving zonal polynomials. Theoretically, explicit forms are available for general parameters or for a general pFq, but owing to the difficulty of computing higher order zonal polynomials, computations are feasible only for small values of p and q. For a detailed discussion of zonal polynomials see Mathai, Provost and Hayakawa (1995). The third approach is due to Mathai (1978, 1993), with the help of a generalized matrix transform or M-transform. Through this definition a hypergeometric function is defined as a class of functions satisfying a certain integral equation. This definition is the one most suited for studying various properties of hypergeometric functions; the series form is least suited for this purpose. All these definitions are introduced for symmetric functions in the sense that

f(X) = f(X′) = f(QQ′X) = f(Q′XQ) = f(D),  D = diag(λ_1, ..., λ_p).

If X is p × p and real symmetric, then there exists an orthonormal matrix Q, that is, QQ′ = I = Q′Q, such that Q′XQ = diag(λ_1, ..., λ_p), where λ_1, ..., λ_p are the eigenvalues of X. Thus f(X), a scalar function of the p(p+1)/2 functionally independent elements in X, is essentially a function of the p variables λ_1, ..., λ_p when the function f(X) is symmetric in the above sense.

when the function f (X) is symmetric in the above sense.


7.6.1. Hypergeometric function through Laplace transform

Let rFs(a_1, ..., a_r; b_1, ..., b_s; Z), Z = Z′, be the hypergeometric function of matrix argument Z to be defined. Consider the following pair of Laplace and inverse Laplace transforms:

r+1Fs(a_1, ..., a_r, c; b_1, ..., b_s; −Λ^{−1}) |Λ|^{−c} = (1/Γ_p(c)) ∫_{U=U′>0} e^{−tr(ΛU)} rFs(a_1, ..., a_r; b_1, ..., b_s; −U) |U|^{c−(p+1)/2} dU (7.6.1)

and

rFs+1(a_1, ..., a_r; b_1, ..., b_s, c; −Λ) |Λ|^{c−(p+1)/2} = [Γ_p(c)/(2πi)^{p(p+1)/2}] ∫_{ℜ(Z)=X>X_0} e^{tr(ΛZ)} rFs(a_1, ..., a_r; b_1, ..., b_s; −Z^{−1}) |Z|^{−c} dZ (7.6.2)

where Z = X + iY, i = √(−1), X = X′ > 0, and X and Y belong to the class of symmetric matrices with the non-diagonal elements weighted by 1/2. The function rFs satisfying (7.6.1) and (7.6.2) can be shown to be unique under certain conditions, and that function is defined as the hypergeometric function of matrix argument Λ, according to this definition.

Then, by taking 0F0(; ; −Λ) = e^{−tr(Λ)} and by using the convolution property of the Laplace transform together with equations (7.6.1) and (7.6.2), one can systematically build up the family. The Bessel function 0F1 of matrix argument was defined in this way by Herz (1955). Thus we can go from 0F0 to 1F0 to 0F1 to 1F1 to 2F1, and so on, to a general pFq.

Example 7.6.1. Obtain an explicit form for 1F0 from the above definition by using 0F0(; ; −U) = e^{−tr(U)}.

Solution 7.6.1. From (7.6.1),

(1/Γ_p(c)) ∫_{U=U′>0} |U|^{c−(p+1)/2} e^{−tr(ΛU)} 0F0(; ; −U) dU = (1/Γ_p(c)) ∫_{U>0} |U|^{c−(p+1)/2} e^{−tr[(I+Λ)U]} dU = |I + Λ|^{−c},

since 0F0(; ; −U) = e^{−tr(U)}.


But

|I + Λ|^{−c} = |Λ|^{−c} |I + Λ^{−1}|^{−c}.

Then from (7.6.1),

1F0(c; ; −Λ^{−1}) = |I + Λ^{−1}|^{−c},

which is an explicit representation.
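The gamma-integral step in Solution 7.6.1 reduces, for p = 1, to a one-line identity that can be checked numerically. This sketch is not from the text, and the parameter values are illustrative:

```python
import numpy as np
from math import gamma
from scipy import integrate

# p = 1 case of the step in Solution 7.6.1:
# (1/Gamma(c)) int_0^inf u^{c-1} e^{-(1+lam)u} du = (1 + lam)^{-c},
# the scalar case of |I + Lambda|^{-c}.
c, lam = 1.8, 0.6
val, _ = integrate.quad(lambda u: u**(c - 1) * np.exp(-(1 + lam) * u),
                        0, np.inf)
val /= gamma(c)
closed = (1 + lam)**(-c)
```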

7.6.2. Hypergeometric function through zonal polynomials

Zonal polynomials are certain symmetric functions in the eigenvalues of the p × p matrix Z. They are denoted by C_K(Z), where K = (k_1, ..., k_p), with k_1 + ··· + k_p = k, represents a partition of the positive integer k. When Z is 1 × 1, C_K(z) = z^k; thus C_K(Z) can be looked upon as a generalization of z^k in the scalar case. For details see Mathai, Provost and Hayakawa (1995). In terms of C_K(Z) we have the representation

0F0(; ; Z) = e^{tr(Z)} = ∑_{k=0}^∞ (tr(Z))^k/k! = ∑_{k=0}^∞ ∑_K C_K(Z)/k!. (7.6.3)

The binomial expansion is the following:

1F0(α; ; Z) = ∑_{k=0}^∞ ∑_K (α)_K C_K(Z)/k! = |I − Z|^{−α}, (7.6.4)

for 0 < Z < I, where

(α)_K = ∏_{j=1}^{p} (α − (j−1)/2)_{k_j},  K = (k_1, ..., k_p), k_1 + ··· + k_p = k. (7.6.5)

In terms of zonal polynomials a hypergeometric series is defined as follows:

pFq(a_1, ..., a_p; b_1, ..., b_q; Z) = ∑_{k=0}^∞ ∑_K [(a_1)_K ··· (a_p)_K / ((b_1)_K ··· (b_q)_K)] C_K(Z)/k!. (7.6.6)

For (7.6.6) to be defined, none of the denominator factors may be zero, and either q ≥ p, or p = q + 1 and 0 < Z < I. For other details see Constantine (1963). In order to study properties of a hypergeometric function with the help of (7.6.6) one needs the Laplace and inverse Laplace transforms of zonal polynomials. These are the following:


∫_{X=X′>0} |X|^{α−(p+1)/2} e^{−tr(XZ)} C_K(XT) dX = |Z|^{−α} C_K(TZ^{−1}) Γ_p(α, K), (7.6.7)

where

Γ_p(α, K) = π^{p(p−1)/4} ∏_{j=1}^{p} Γ(α + k_j − (j−1)/2) = Γ_p(α)(α)_K. (7.6.8)

The inverse transform is

[1/(2πi)^{p(p+1)/2}] ∫_{ℜ(Z)=X>X_0} e^{tr(SZ)} |Z|^{−α} C_K(Z^{−1}) dZ = (1/Γ_p(α, K)) |S|^{α−(p+1)/2} C_K(S),  i = √(−1), (7.6.9)

for Z = X + iY, X = X′ > 0, where X and Y are symmetric and the nondiagonal elements are weighted by 1/2. If the non-diagonal elements are not weighted, then the left side in (7.6.9) is to be multiplied by 2^{p(p−1)/2}. Further,

∫_{0<X<I} |X|^{α−(p+1)/2} |I − X|^{β−(p+1)/2} C_K(TX) dX = [Γ_p(α, K)Γ_p(β)/Γ_p(α + β, K)] C_K(T), (7.6.10)

for ℜ(α) > (p−1)/2, ℜ(β) > (p−1)/2.

Example 7.6.2. By using zonal polynomials, establish the following result:

2F1(a, b; c; X) = [Γ_p(c)/(Γ_p(a)Γ_p(c−a))] ∫_{0<Λ<I} |Λ|^{a−(p+1)/2} |I − Λ|^{c−a−(p+1)/2} |I − ΛX|^{−b} dΛ (7.6.11)

for ℜ(a) > (p−1)/2, ℜ(c − a) > (p−1)/2.

Solution 7.6.2. Expanding |I − ΛX|^{−b} in terms of zonal polynomials and then integrating term by term, the right side reduces to the following:

|I − ΛX|^{−b} = ∑_{k=0}^∞ ∑_K (b)_K C_K(ΛX)/k!  for 0 < ΛX < I,

and

∫_{0<Λ<I} |Λ|^{a−(p+1)/2} |I − Λ|^{c−a−(p+1)/2} C_K(ΛX) dΛ = [Γ_p(a, K)Γ_p(c−a)/Γ_p(c, K)] C_K(X),

by using (7.6.10). But

Γ_p(a, K)Γ_p(c−a)/Γ_p(c, K) = [Γ_p(a)Γ_p(c−a)/Γ_p(c)] (a)_K/(c)_K.

Substituting these back, the right side becomes

∑_{k=0}^∞ ∑_K [(a)_K (b)_K/(c)_K] C_K(X)/k! = 2F1(a, b; c; X).

This establishes the result.

Example 7.6.3. Establish the result

2F1(a, b; c; I) = Γ_p(c)Γ_p(c−a−b) / [Γ_p(c−a)Γ_p(c−b)] (7.6.12)

for ℜ(c−a−b) > (p−1)/2, ℜ(c−a) > (p−1)/2, ℜ(c−b) > (p−1)/2.

Solution 7.6.3. In (7.6.11) put X = I, combine the last factor on the right with the previous factor, and integrate with the help of a matrix-variate type-1 beta integral.
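For p = 1, (7.6.12) is Gauss's summation theorem, which can be checked against a library implementation. This sketch is not from the text, and the parameter values are illustrative:

```python
from math import gamma
from scipy.special import hyp2f1

# p = 1 case of (7.6.12): for c - a - b > 0,
# 2F1(a, b; c; 1) = Gamma(c)Gamma(c-a-b) / [Gamma(c-a)Gamma(c-b)]
a, b, c = 0.5, 0.7, 2.1
lhs = hyp2f1(a, b, c, 1.0)
rhs = gamma(c) * gamma(c - a - b) / (gamma(c - a) * gamma(c - b))
```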

Uniqueness of the pFq through zonal polynomials, as given in (7.6.6), is established by appealing to the uniqueness of the function defined through the Laplace and inverse Laplace transform pair in (7.6.1) and (7.6.2), and by showing that (7.6.6) satisfies (7.6.1) and (7.6.2).

The next definition, introduced by Mathai in a series of papers, is through a special case of Weyl's fractional integral.

7.6.3. Hypergeometric functions through M-transforms

Consider the class of p × p real symmetric definite matrices together with the null matrix O. Any member of this class is either positive definite, negative definite or null. Let α be a complex parameter with ℜ(α) > (p−1)/2. Let f(S) be a scalar symmetric function in the sense that f(AB) = f(BA) for all A and B for which AB and BA are defined. Then the M-transform of f(S), denoted by M_α(f), is defined as

M_α(f) = ∫_{U=U′>0} |U|^{α−(p+1)/2} f(U) dU. (7.6.13)

Some examples of symmetric functions are e^{±tr(S)} and |I ± S|^β: for nonsingular p × p matrices A and B,

e^{±tr(AB)} = e^{±tr(BA)};  |I ± AB|^β = |I ± BA|^β.

Is it possible to recover f(U), a function of the p(p+1)/2 elements of U = (u_{ij}), or a function of the p eigenvalues of U, that is, a function of p variables, from M_α(f), which is a function of the single parameter α? In the normal course the answer is in the negative. But, owing to the properties that have been observed, it is clear that there exists a set of sufficient conditions under which M_α(f) will uniquely determine f(U). It is easy to note that the class of functions defined through (7.6.13) satisfies the pair of integral equations (7.6.1) and (7.6.2) defining the unique hypergeometric function.

A hypergeometric function through the M-transform is defined as a class of functions rFs* satisfying the following equation:

∫_{X=X′>0} |X|^{ρ−(p+1)/2} rFs*(a_1, ..., a_r; b_1, ..., b_s; −X) dX = [{∏_{j=1}^{s} Γ_p(b_j)}/{∏_{j=1}^{r} Γ_p(a_j)}] [{∏_{j=1}^{r} Γ_p(a_j − ρ)}/{∏_{j=1}^{s} Γ_p(b_j − ρ)}] Γ_p(ρ), (7.6.14)

where ρ is an arbitrary parameter such that the gammas exist.

Example 7.6.4. Re-establish the result

L_T(|X − B|^ν) = Γ_p(ν + (p+1)/2) |T|^{−(ν+(p+1)/2)} e^{−tr(TB)} (7.6.15)

by using M-transforms.

Solution 7.6.4. We will show that the M-transforms of the two sides of (7.6.15) are one and the same. Taking the M-transform of the left side, with respect to the parameter ρ, we have

∫_{T>0} |T|^{ρ−(p+1)/2} {L_T(|X − B|^ν)} dT = ∫_{T>0} |T|^{ρ−(p+1)/2} [∫_{X>B} |X − B|^ν e^{−tr(TX)} dX] dT
= ∫_{T>0} |T|^{ρ−(p+1)/2} e^{−tr(TB)} [∫_{Y>0} |Y|^ν e^{−tr(TY)} dY] dT.

Noting that ν = ν + (p+1)/2 − (p+1)/2, the Y-integral gives |T|^{−ν−(p+1)/2} Γ_p(ν + (p+1)/2). Then the T-integral gives

M_ρ(left side) = Γ_p(ν + (p+1)/2) Γ_p(ρ − ν − (p+1)/2) |B|^{−ρ+ν+(p+1)/2}.

The M-transform of the right side gives

M_ρ(right side) = ∫_{T>0} |T|^{ρ−(p+1)/2} {Γ_p(ν + (p+1)/2) |T|^{−(ν+(p+1)/2)} e^{−tr(TB)}} dT = Γ_p(ν + (p+1)/2) Γ_p(ρ − ν − (p+1)/2) |B|^{−ρ+ν+(p+1)/2}.

The two sides have the same M-transform.

Starting with 0F0(; ; X) = e^{tr(X)}, we can build up a general pFq by using the M-transform and the convolution form for M-transforms, which is stated next.

7.6.4. A convolution theorem for M-transforms

Let f_1(U) and f_2(U) be two symmetric scalar functions of the p × p real symmetric positive definite matrix U, with M-transforms M_ρ(f_1) = g_1(ρ) and M_ρ(f_2) = g_2(ρ) respectively. Let

f_3(S) = ∫_{U>0} |U|^β f_1(U^{1/2} S U^{1/2}) f_2(U) dU; (7.6.16)

then the M-transform of f_3 is given by

M_ρ(f_3) = g_1(ρ) g_2(β − ρ + (p+1)/2). (7.6.17)

The result can easily be established from the definition itself, by interchanging the integrals.


Example 7.6.5. Show that

1F1(a; c; −Λ) = [Γ_p(c)/(Γ_p(a)Γ_p(c−a))] ∫_{0<U<I} |U|^{a−(p+1)/2} |I − U|^{c−a−(p+1)/2} e^{−tr(ΛU)} dU. (7.6.18)

Solution 7.6.5. We will establish this by showing that both sides have the sameM-transforms. From the definition in (7.6.14) the M-transform of the left side withrespect to the parameter ρ is given by the following:

Mρ(left-side) =∫

∧=∧′>0| ∧ |ρ−

p+12 1F1(a; c;−∧)d∧

=

[

Γp(a − ρ)Γp(c − ρ)Γp(ρ)

]

Γp(c)

Γp(a).

Mρ(right-side) =∫

∧>0| ∧ |ρ−

p+12

{

Γp(c)

Γp(a)Γp(c − a)

×∫

0<U<I|U |a−

p+12 |I − U |c−a− p+1

2 e−tr(∧U)dU}

d ∧ .

Take,

f1(U) = e−tr(U) and f2(U) = |U |a−p+1

2 |I − U |c−a− p+12 .

Then

Mρ( f1) = g1(ρ) =∫

U>0|U |ρ−

p+12 e−tr(U)dU = Γp(ρ),<(ρ) >

p − 12.

Mρ( f2) = g2(ρ) =∫

U>0|U |ρ−

p+12 |U |a−

p+12 |I − U |c−a− p+1

2 dU

=Γp(a + ρ − p+1

2 )Γp(c − a)

Γp(c + ρ − p+12 )

,<(c − a) >p − 1

2,

<(a + ρ) > p,<(c + ρ) > p.

Taking $f_3$ in (7.6.16) as the second integral on the right above, we have

$$
M_\rho(\text{right side}) = \left\{\frac{\Gamma_p(c)}{\Gamma_p(a)}\right\}\frac{\Gamma_p(\rho)\,\Gamma_p(a-\rho)}{\Gamma_p(c-\rho)} = M_\rho(\text{left side}).
$$

Hence the result.
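In the scalar case $p=1$, the identity (7.6.18) reduces to the classical Euler integral representation of Kummer's function, which can be verified directly; the sketch below uses arbitrary illustrative parameter values:

```python
# For p = 1 the matrix-argument identity (7.6.18) reduces to the classical
# Euler integral representation of Kummer's 1F1; we verify it numerically.
# Parameter values are arbitrary illustrative choices, not from the text.
import numpy as np
from scipy.integrate import quad
from scipy.special import hyp1f1, gamma

a, c, lam = 1.5, 3.2, 2.0   # need Re(a) > 0 and Re(c - a) > 0

# Right side of (7.6.18) with p = 1: the type-1 beta weighted integral
integral, _ = quad(lambda u: u**(a - 1) * (1 - u)**(c - a - 1)
                   * np.exp(-lam * u), 0, 1)
rhs = gamma(c) / (gamma(a) * gamma(c - a)) * integral

# Left side: 1F1(a; c; -lam) from scipy
lhs = hyp1f1(a, c, -lam)

print(lhs, rhs)  # both sides agree
```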


Almost all properties analogous to the ones in the scalar case for hypergeometric functions can be established very easily by using the M-transform technique. These can then be shown to be unique, if necessary, through the uniqueness of the Laplace and inverse Laplace transform pair. Theories for functions of several matrix arguments, Dirichlet integrals, Dirichlet densities, their extensions, Appell's functions, Lauricella functions, and the like, are available. All these real cases are also extended to complex cases. For details see Mathai (1997). Problems involving scalar functions of matrix argument, real and complex cases, are still being worked out and applied in many areas such as statistical distribution theory, econometrics, quantum mechanics and engineering. Since the aim in this brief note is only to introduce the subject matter, more details will not be given here.

Exercises 7.6.

7.6.1. Show that for $\Lambda=\Lambda'>0$ and $p\times p$,

$$
{}_1F_1(a;c;-\Lambda) = \mathrm{e}^{-\operatorname{tr}(\Lambda)}\,{}_1F_1(c-a;c;\Lambda).
$$

7.6.2. For $p\times p$ real symmetric positive definite matrices $\Lambda$ and $V$ show that

$$
{}_1F_1(a;c;-\Lambda) = \frac{\Gamma_p(c)}{\Gamma_p(a)\Gamma_p(c-a)}\,|\Lambda|^{-\left(c-\frac{p+1}{2}\right)}\int_{0<V<\Lambda}\mathrm{e}^{-\operatorname{tr}(V)}\,|V|^{a-\frac{p+1}{2}}\,|\Lambda-V|^{c-a-\frac{p+1}{2}}\,\mathrm{d}V.
$$

7.6.3. Show that for $\epsilon$ a scalar and $A$ a $p\times p$ matrix with $p$ finite,

$$
\lim_{\epsilon\to 0}|I+\epsilon A|^{-\frac{1}{\epsilon}} = \lim_{\epsilon\to\infty}\left|I+\frac{A}{\epsilon}\right|^{-\epsilon} = \mathrm{e}^{-\operatorname{tr}(A)}.
$$

7.6.4. Show that

$$
\lim_{a\to\infty}{}_1F_1\!\left(a;c;-\frac{Z}{a}\right) = \lim_{\epsilon\to 0}{}_1F_1\!\left(\frac{1}{\epsilon};c;-\epsilon Z\right) = {}_0F_1(;c;-Z).
$$

7.6.5. Show that

$$
{}_1F_1(a;c;-\Lambda) = \frac{\Gamma_p(c)}{(2\pi i)^{\frac{p(p+1)}{2}}}\int_{\Re(Z)=X>X_0}\mathrm{e}^{\operatorname{tr}(Z)}\,|Z|^{-c}\,|I+\Lambda Z^{-1}|^{-a}\,\mathrm{d}Z.
$$


7.6.6. Show that

$$
{}_2F_1(a,b;c;X) = |I-X|^{-b}\,{}_2F_1\!\left(c-a,\,b;\,c;\,-X(I-X)^{-1}\right).
$$

7.6.7. For $\Re(s)>\frac{p-1}{2}$, $\Re(b-s)>\frac{p-1}{2}$, $\Re(c-a-s)>\frac{p-1}{2}$, show that

$$
\int_{0<X<I}|X|^{s-\frac{p+1}{2}}\,|I-X|^{b-s-\frac{p+1}{2}}\,{}_2F_1(a,b;c;X)\,\mathrm{d}X
= \frac{\Gamma_p(c)\Gamma_p(s)\Gamma_p(b-s)\Gamma_p(c-a-s)}{\Gamma_p(b)\Gamma_p(c-a)\Gamma_p(c-s)}.
$$

7.6.8. Defining the Bessel function $A_r(S)$ with $p\times p$ real symmetric positive definite matrix argument $S$ as

$$
A_r(S) = \frac{1}{\Gamma_p\!\left(r+\frac{p+1}{2}\right)}\,{}_0F_1\!\left(;\,r+\frac{p+1}{2};\,-S\right), \tag{7.6.19}
$$

show that

$$
\int_{S>0}|S|^{\delta-\frac{p+1}{2}}\,A_r(S)\,\mathrm{e}^{-\operatorname{tr}(\Lambda S)}\,\mathrm{d}S
= \frac{\Gamma_p(\delta)}{\Gamma_p\!\left(r+\frac{p+1}{2}\right)}\,|\Lambda|^{-\delta}\,{}_1F_1\!\left(\delta;\,r+\frac{p+1}{2};\,-\Lambda^{-1}\right).
$$

7.6.9. If

$$
M(\alpha,\beta;A) = \int_{X=X'>0}|X|^{\alpha-\frac{p+1}{2}}\,|I+X|^{\beta-\frac{p+1}{2}}\,\mathrm{e}^{-\operatorname{tr}(AX)}\,\mathrm{d}X,\quad \Re(\alpha)>\frac{p-1}{2},\ A=A'>0,
$$

then show that

$$
\int_{X>0}|X+A|^{\nu}\,\mathrm{e}^{-\operatorname{tr}(TX)}\,\mathrm{d}X = |A|^{\nu+\frac{p+1}{2}}\,M\!\left(\frac{p+1}{2},\,\nu+\frac{p+1}{2};\,A^{\frac{1}{2}}TA^{\frac{1}{2}}\right).
$$

7.6.10. If the Whittaker function $W$ is defined by

$$
\int_{Z>0}|Z|^{\mu-\frac{p+1}{2}}\,|I+Z|^{\nu-\frac{p+1}{2}}\,\mathrm{e}^{-\operatorname{tr}(AZ)}\,\mathrm{d}Z
= |A|^{-\frac{\mu+\nu}{2}}\,\Gamma_p(\mu)\,\mathrm{e}^{\frac{1}{2}\operatorname{tr}(A)}\,W_{\frac{1}{2}(\nu-\mu),\,\frac{1}{2}\left(\nu+\mu-\frac{p+1}{2}\right)}(A),
$$

then show that

$$
\int_{X>U}|X+B|^{2\alpha-\frac{p+1}{2}}\,|X-U|^{2q-\frac{p+1}{2}}\,\mathrm{e}^{-\operatorname{tr}(MX)}\,\mathrm{d}X
= |U+B|^{\alpha+q-\frac{p+1}{2}}\,|M|^{-(\alpha+q)}\,\mathrm{e}^{\frac{1}{2}\operatorname{tr}[(B-U)M]}\,\Gamma_p(2q)\,W_{(\alpha-q),\,\left(\alpha+q-\frac{p+1}{4}\right)}(S),\quad S = (U+B)^{\frac{1}{2}}M(U+B)^{\frac{1}{2}}.
$$
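The determinant limit in Exercise 7.6.3 is easy to illustrate numerically; the sketch below uses an arbitrarily chosen $2\times 2$ matrix $A$:

```python
# Numerical illustration of Exercise 7.6.3:
# |I + eps*A|^(-1/eps) -> exp(-tr(A)) as eps -> 0.
# The matrix A is an arbitrary illustrative choice.
import numpy as np

A = np.array([[1.0, 0.3],
              [0.3, 2.0]])
target = np.exp(-np.trace(A))   # exp(-3)

vals = [np.linalg.det(np.eye(2) + eps * A) ** (-1.0 / eps)
        for eps in (1e-2, 1e-4, 1e-6)]
print(vals, target)  # the sequence approaches exp(-tr(A))
```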

7.7. A Pathway Model

As an application of a real scalar function of matrix argument, we will introduce a general real matrix-variate probability model which covers almost all real matrix-variate densities used in multivariate statistical analysis. Through the density introduced here, a pathway is created to go from one functional form to another: from matrix-variate type-1 beta to matrix-variate type-2 beta to matrix-variate gamma to matrix-variate Gaussian densities.

7.7.1. The pathway density

Let $X=(x_{ij})$, $i=1,\ldots,p$, $j=1,\ldots,r$, $r\ge p$, be of rank $p$ and of real scalar variables $x_{ij}$ for all $i$ and $j$, having the density $f(X)$, where

$$
f(X) = c\,|A^{\frac{1}{2}}XBX'A^{\frac{1}{2}}|^{\alpha}\,|I-a(1-q)A^{\frac{1}{2}}XBX'A^{\frac{1}{2}}|^{\frac{\beta}{1-q}} \tag{7.7.1}
$$

for $A=A'>0$ and $p\times p$, $B=B'>0$ and $r\times r$, with $I-a(1-q)A^{\frac{1}{2}}XBX'A^{\frac{1}{2}}>0$; $A$ and $B$ are free of the elements in $X$; $a,\beta,q$ are scalar constants with $a>0$, $\beta>0$; and $c$ is the normalizing constant. $A^{\frac{1}{2}}$ and $B^{\frac{1}{2}}$ denote the real positive definite square roots of $A$ and $B$ respectively.

For evaluating the normalizing constant $c$ one can go through the following procedure: Let

$$
Y = A^{\frac{1}{2}}XB^{\frac{1}{2}} \Rightarrow \mathrm{d}Y = |A|^{\frac{r}{2}}\,|B|^{\frac{p}{2}}\,\mathrm{d}X
$$

by using Theorem 7.1.5. Let

$$
U = YY' \Rightarrow \mathrm{d}Y = \frac{\pi^{\frac{rp}{2}}}{\Gamma_p\!\left(\frac{r}{2}\right)}\,|U|^{\frac{r}{2}-\frac{p+1}{2}}\,\mathrm{d}U
$$


by using equation (7.3.17), where $\Gamma_p(\cdot)$ is the real matrix-variate gamma function. Let, for $q<1$,

$$
V = a(1-q)U \Rightarrow \mathrm{d}V = [a(1-q)]^{\frac{p(p+1)}{2}}\,\mathrm{d}U
$$

from the same Theorem 7.1.5. If $f(X)$ is a density then the total integral is 1 and therefore

$$
1 = \int_X f(X)\,\mathrm{d}X = \frac{c}{|A|^{\frac{r}{2}}|B|^{\frac{p}{2}}}\int_Y |YY'|^{\alpha}\,|I-a(1-q)YY'|^{\frac{\beta}{1-q}}\,\mathrm{d}Y \tag{7.7.2}
$$

$$
= \frac{c\,\pi^{\frac{rp}{2}}}{\Gamma_p\!\left(\frac{r}{2}\right)|A|^{\frac{r}{2}}|B|^{\frac{p}{2}}}\int_U |U|^{\alpha+\frac{r}{2}-\frac{p+1}{2}}\,|I-a(1-q)U|^{\frac{\beta}{1-q}}\,\mathrm{d}U. \tag{7.7.3}
$$

Note 7.7.1. Note that from (7.7.2) and (7.7.3) we can also infer the densities of $Y$ and $U$ respectively.

At this stage we need to consider three cases.

Case (i): $q<1$. Then $a(1-q)>0$. Make the transformation $V=a(1-q)U$; then

$$
c^{-1} = \frac{\pi^{\frac{rp}{2}}}{\Gamma_p\!\left(\frac{r}{2}\right)|A|^{\frac{r}{2}}|B|^{\frac{p}{2}}\,[a(1-q)]^{p\left(\alpha+\frac{r}{2}\right)}}\int_V |V|^{\alpha+\frac{r}{2}-\frac{p+1}{2}}\,|I-V|^{\frac{\beta}{1-q}}\,\mathrm{d}V. \tag{7.7.4}
$$

The integral in (7.7.4) can be evaluated by using a real matrix-variate type-1 beta integral. Then we have

$$
c^{-1} = \frac{\pi^{\frac{rp}{2}}}{\Gamma_p\!\left(\frac{r}{2}\right)|A|^{\frac{r}{2}}|B|^{\frac{p}{2}}\,[a(1-q)]^{p\left(\alpha+\frac{r}{2}\right)}}\;\frac{\Gamma_p\!\left(\alpha+\frac{r}{2}\right)\Gamma_p\!\left(\frac{\beta}{1-q}+\frac{p+1}{2}\right)}{\Gamma_p\!\left(\alpha+\frac{r}{2}+\frac{\beta}{1-q}+\frac{p+1}{2}\right)} \tag{7.7.5}
$$

for $\alpha+\frac{r}{2}>\frac{p-1}{2}$.

Note 7.7.2. In statistical problems usually the parameters are real and hence we will assume the parameters to be real here as well as in the discussions to follow. If $\alpha$ is in the complex domain then the condition will reduce to $\Re(\alpha)+\frac{r}{2}>\frac{p-1}{2}$.
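For the scalar case $p=1$ with $A=1$ and $B=I_r$, the closed form (7.7.5) can be checked against the integral form (7.7.3) directly; the parameter values below are illustrative choices, not from the text:

```python
# Check of the normalizing constant (7.7.5) in the scalar case p = 1 with
# A = 1, B = I_r.  Parameter values are arbitrary illustrative choices;
# q < 1 so that a(1-q) > 0 and the support is 0 < u < 1/(a(1-q)).
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma

alpha, r, a, beta_, q = 0.7, 3, 1.0, 2.5, 0.5
d = a * (1 - q)              # = 0.5 > 0
eta = beta_ / (1 - q)        # the exponent beta/(1-q)

front = np.pi**(r / 2) / gamma(r / 2)

# c^{-1} from the integral form (7.7.3), p = 1
integral, _ = quad(lambda u: u**(alpha + r/2 - 1) * (1 - d*u)**eta, 0, 1/d)
c_inv_integral = front * integral

# c^{-1} from the closed form (7.7.5), p = 1 (so Gamma_p = Gamma)
c_inv_formula = (front / d**(alpha + r/2)
                 * gamma(alpha + r/2) * gamma(eta + 1)
                 / gamma(alpha + r/2 + eta + 1))

print(c_inv_integral, c_inv_formula)  # the two agree
```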

Case (ii): $q>1$. In this case $1-q=-(q-1)$ where $q-1>0$. Then in (7.7.3) one factor in the integrand becomes

$$
|I-a(1-q)U|^{\frac{\beta}{1-q}} = |I+a(q-1)U|^{-\frac{\beta}{q-1}} \tag{7.7.6}
$$


and then, making the substitution $V=a(q-1)U$, we have

$$
c^{-1} = \frac{\pi^{\frac{rp}{2}}}{\Gamma_p\!\left(\frac{r}{2}\right)|A|^{\frac{r}{2}}|B|^{\frac{p}{2}}\,[a(q-1)]^{p\left(\alpha+\frac{r}{2}\right)}}\int_V |V|^{\alpha+\frac{r}{2}-\frac{p+1}{2}}\,|I+V|^{-\frac{\beta}{q-1}}\,\mathrm{d}V.
$$

Evaluating the integral by using a real matrix-variate type-2 beta integral, we have the following:

$$
c^{-1} = \frac{\pi^{\frac{rp}{2}}}{\Gamma_p\!\left(\frac{r}{2}\right)|A|^{\frac{r}{2}}|B|^{\frac{p}{2}}\,[a(q-1)]^{p\left(\alpha+\frac{r}{2}\right)}}\;\frac{\Gamma_p\!\left(\alpha+\frac{r}{2}\right)\Gamma_p\!\left(\frac{\beta}{q-1}-\alpha-\frac{r}{2}\right)}{\Gamma_p\!\left(\frac{\beta}{q-1}\right)} \tag{7.7.7}
$$

for $\alpha+\frac{r}{2}>\frac{p-1}{2}$, $\frac{\beta}{q-1}-\alpha-\frac{r}{2}>\frac{p-1}{2}$.

Case (iii): $q=1$. When $q$ approaches 1 from the left or from the right, it can be shown that the determinant containing $q$ in (7.7.3) and (7.7.6) approaches an exponential form, which will be stated as a lemma:

Lemma 7.7.1.

$$
\lim_{q\to 1}|I-a(1-q)U|^{\frac{\beta}{1-q}} = \mathrm{e}^{-a\beta\operatorname{tr}(U)}. \tag{7.7.8}
$$

This lemma can be proved easily by observing that for any real symmetric matrix $U$ there exists an orthonormal matrix $Q$ such that $QQ'=I=Q'Q$, $Q'UQ=\operatorname{diag}(\lambda_1,\ldots,\lambda_p)$, where the $\lambda_j$'s are the eigenvalues of $U$. Then

$$
|I-a(1-q)U| = |I-a(1-q)QQ'UQQ'| = |I-a(1-q)Q'UQ| = |I-a(1-q)\operatorname{diag}(\lambda_1,\ldots,\lambda_p)| = \prod_{j=1}^{p}\left(1-a(1-q)\lambda_j\right).
$$

But

$$
\lim_{q\to 1}\left(1-a(1-q)\lambda_j\right)^{\frac{\beta}{1-q}} = \mathrm{e}^{-a\beta\lambda_j}.
$$

Then

$$
\lim_{q\to 1}|I-a(1-q)U|^{\frac{\beta}{1-q}} = \mathrm{e}^{-a\beta\operatorname{tr}(U)}.
$$
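The limit in Lemma 7.7.1 is easy to watch numerically as $q\to 1$; the matrix $U$ and the constants below are arbitrary illustrative choices:

```python
# Numerical check of Lemma 7.7.1:
# |I - a(1-q)U|^(beta/(1-q)) -> exp(-a*beta*tr(U)) as q -> 1.
# U, a and beta are arbitrary illustrative choices (U symmetric p.d.).
import numpy as np

a, beta_ = 0.8, 1.5
U = np.array([[2.0, 0.5],
              [0.5, 1.0]])
target = np.exp(-a * beta_ * np.trace(U))

for q in (0.9, 0.99, 0.999):
    t = 1 - q
    val = np.linalg.det(np.eye(2) - a * t * U) ** (beta_ / t)
    print(q, val, target)  # val approaches target as q -> 1
```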


Hence in case (iii), for $q\to 1$, we have

$$
c^{-1} = \frac{\pi^{\frac{rp}{2}}}{\Gamma_p\!\left(\frac{r}{2}\right)|A|^{\frac{r}{2}}|B|^{\frac{p}{2}}}\int_U |U|^{\alpha+\frac{r}{2}-\frac{p+1}{2}}\,\mathrm{e}^{-a\beta\operatorname{tr}(U)}\,\mathrm{d}U
= \frac{\pi^{\frac{rp}{2}}}{\Gamma_p\!\left(\frac{r}{2}\right)|A|^{\frac{r}{2}}|B|^{\frac{p}{2}}}\;\frac{\Gamma_p\!\left(\alpha+\frac{r}{2}\right)}{(a\beta)^{p\left(\alpha+\frac{r}{2}\right)}},\quad \alpha+\frac{r}{2}>\frac{p-1}{2}, \tag{7.7.9}
$$

by evaluating the integral with the help of a real matrix-variate gamma integral.

7.7.2. A general density

For $X,A,B,a,\beta,q$ as defined in (7.7.1), the density $f(X)$ there has three different forms for three different situations of $q$. That is,

$$
f(X) = c_1\,|A^{\frac{1}{2}}XBX'A^{\frac{1}{2}}|^{\alpha}\,|I-a(1-q)A^{\frac{1}{2}}XBX'A^{\frac{1}{2}}|^{\frac{\beta}{1-q}},\quad\text{for } q<1 \tag{7.7.10}
$$

$$
= c_2\,|A^{\frac{1}{2}}XBX'A^{\frac{1}{2}}|^{\alpha}\,|I+a(q-1)A^{\frac{1}{2}}XBX'A^{\frac{1}{2}}|^{-\frac{\beta}{q-1}},\quad\text{for } q>1 \tag{7.7.11}
$$

$$
= c_3\,|A^{\frac{1}{2}}XBX'A^{\frac{1}{2}}|^{\alpha}\,\exp\left\{-a\beta\operatorname{tr}\!\left[A^{\frac{1}{2}}XBX'A^{\frac{1}{2}}\right]\right\},\quad\text{for } q=1 \tag{7.7.12}
$$

where $c_1=c$ for $q<1$, $c_2=c$ for $q>1$ and $c_3=c$ for $q=1$, given in (7.7.5), (7.7.7) and (7.7.9) respectively.

Note 7.7.3. Observe that $f(X)$ maintains a generalized real matrix-variate type-1 beta form for $-\infty<q<1$, a generalized real matrix-variate type-2 beta form for $1<q<\infty$, and a generalized real matrix-variate gamma form when $q\to 1$.

Note 7.7.4. If a location parameter matrix is to be introduced then, in $f(X)$, replace $X$ by $X-M$ where $M$ is a $p\times r$ constant matrix. All properties and derivations remain the same except that now $X$ is located at $M$ instead of at the origin $O$.

Remark 7.7.1. The parameter $q$ in the density $f(X)$ can be taken as a pathway parameter. It defines a pathway from a generalized type-1 beta form to a type-2 beta form to a gamma form. Thus a wide variety of probability models is available from $f(X)$. If the experimenter needs a model with a thicker tail or a thinner tail, or with the right and left tails cut off, all such models are available from $f(X)$ for various values of $q$. For $\alpha=0$ one has the matrix-variate Gaussian form coming from $f(X)$.


7.7.3. Arbitrary moments

Arbitrary moments of the determinant $|A^{\frac{1}{2}}XBX'A^{\frac{1}{2}}|$ are available from the normalizing constant itself for the various values of $q$. That is, denoting the expected value by $E$,

$$
E|A^{\frac{1}{2}}XBX'A^{\frac{1}{2}}|^{h} = \frac{1}{[a(1-q)]^{ph}}\;\frac{\Gamma_p\!\left(\alpha+h+\frac{r}{2}\right)}{\Gamma_p\!\left(\alpha+\frac{r}{2}\right)}\;\frac{\Gamma_p\!\left(\alpha+\frac{r}{2}+\frac{\beta}{1-q}+\frac{p+1}{2}\right)}{\Gamma_p\!\left(\alpha+h+\frac{r}{2}+\frac{\beta}{1-q}+\frac{p+1}{2}\right)} \tag{7.7.13}
$$

for $q<1$, $\alpha+h+\frac{r}{2}>\frac{p-1}{2}$;

$$
= \frac{1}{[a(q-1)]^{ph}}\;\frac{\Gamma_p\!\left(\alpha+h+\frac{r}{2}\right)}{\Gamma_p\!\left(\alpha+\frac{r}{2}\right)}\;\frac{\Gamma_p\!\left(\frac{\beta}{q-1}-\alpha-h-\frac{r}{2}\right)}{\Gamma_p\!\left(\frac{\beta}{q-1}-\alpha-\frac{r}{2}\right)} \tag{7.7.14}
$$

for $q>1$, $\frac{\beta}{q-1}-\alpha-h-\frac{r}{2}>\frac{p-1}{2}$, $\alpha+h+\frac{r}{2}>\frac{p-1}{2}$;

$$
= \frac{1}{(a\beta)^{ph}}\;\frac{\Gamma_p\!\left(\alpha+h+\frac{r}{2}\right)}{\Gamma_p\!\left(\alpha+\frac{r}{2}\right)}\quad\text{for } q=1,\ \alpha+h+\frac{r}{2}>\frac{p-1}{2}. \tag{7.7.15}
$$
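In the scalar case $p=1$ with $A=1$, $B=I_r$ and $q=1$, the determinant reduces to a scalar $u$ with a gamma-type density, so (7.7.15) can be checked by direct integration; the parameter values below are illustrative choices:

```python
# Scalar (p = 1) check of the q = 1 moment formula (7.7.15).
# With A = 1 and B = I_r the determinant reduces to u with density
# proportional to u^(alpha + r/2 - 1) exp(-a*beta*u), cf. (7.7.12).
# Parameter values are arbitrary illustrative choices.
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma

alpha, r, a, beta_, h = 0.7, 3, 1.2, 0.9, 1.6
s = alpha + r / 2                      # gamma shape parameter

norm = (a * beta_)**s / gamma(s)       # normalizing constant, cf. (7.7.9)
density = lambda u: norm * u**(s - 1) * np.exp(-a * beta_ * u)

moment_numeric, _ = quad(lambda u: u**h * density(u), 0, np.inf)
moment_formula = gamma(s + h) / (gamma(s) * (a * beta_)**h)   # (7.7.15)

print(moment_numeric, moment_formula)  # the two agree
```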

7.7.4. Quadratic forms

The current theory in the statistical literature is based on a Gaussian or normal population and on quadratic and bilinear forms in a simple random sample coming from such a normal population, or on quadratic and bilinear forms in normal variables, independently distributed or jointly normally distributed. But from the structure of the density $f(X)$ in (7.7.1) it is evident that we can extend the theory to a much wider class. For $p=1$, $r>p$, the constant matrix $A$ is a scalar quantity. For convenience let us take it as 1. Then we have

$$
A^{\frac{1}{2}}XBX'A^{\frac{1}{2}} = (x_1,\ldots,x_r)\,B\,(x_1,\ldots,x_r)' = u,\ \text{say}. \tag{7.7.16}
$$

Here $u$ is a real positive definite quadratic form in the first row of $X$, and this row is denoted by $(x_1,\ldots,x_r)$. Now observe that the density of $u$ is available as a special case of $f(X)$: from (7.7.10) for $q<1$, from (7.7.11) for $q>1$ and from (7.7.12) for $q=1$. [Write down the exact density in the three cases as an exercise.]


7.7.5. Generalized quadratic form

For a general $p$, $U = A^{\frac{1}{2}}XBX'A^{\frac{1}{2}}$ is the generalized quadratic form in $X$, where $X$ has the density $f(X)$ in (7.7.1). The density of $U$ is available from (7.7.3) in the following form, denoting it by $f_1(U)$. Then

$$
f_1(U) = c^{*}\,|U|^{\alpha+\frac{r}{2}-\frac{p+1}{2}}\,|I-a(1-q)U|^{\frac{\beta}{1-q}} \tag{7.7.17}
$$

where

$$
c^{*} = \frac{[a(1-q)]^{p\left(\alpha+\frac{r}{2}\right)}\,\Gamma_p\!\left(\alpha+\frac{r}{2}+\frac{\beta}{1-q}+\frac{p+1}{2}\right)}{\Gamma_p\!\left(\alpha+\frac{r}{2}\right)\Gamma_p\!\left(\frac{\beta}{1-q}+\frac{p+1}{2}\right)} \tag{7.7.18}
$$

for $q<1$, $\alpha+\frac{r}{2}>\frac{p-1}{2}$;

$$
= \frac{[a(q-1)]^{p\left(\alpha+\frac{r}{2}\right)}\,\Gamma_p\!\left(\frac{\beta}{q-1}\right)}{\Gamma_p\!\left(\alpha+\frac{r}{2}\right)\Gamma_p\!\left(\frac{\beta}{q-1}-\alpha-\frac{r}{2}\right)} \tag{7.7.19}
$$

for $q>1$, $\alpha+\frac{r}{2}>\frac{p-1}{2}$, $\frac{\beta}{q-1}-\alpha-\frac{r}{2}>\frac{p-1}{2}$;

$$
= \frac{(a\beta)^{p\left(\alpha+\frac{r}{2}\right)}}{\Gamma_p\!\left(\alpha+\frac{r}{2}\right)} \tag{7.7.20}
$$

for $q=1$, $\alpha+\frac{r}{2}>\frac{p-1}{2}$.

7.7.6. Applications to random volumes

Another connection, to geometrical probability problems, is established in Mathai (2005). This comes from the fact that the columns of the $p\times r$, $r\ge p$, matrix of rank $p$ can be considered to be $p$ linearly independent points in an $r$-dimensional Euclidean space. Then the determinant of $XX'$ represents the square of the volume content of the $p$-parallelotope generated by the convex hull of the $p$ linearly independent points represented by $X$. If the points are random points in some sense (see, for example, a discussion of random points and random geometrical configurations in Mathai (1999)), then we are dealing with a random volume $|XX'|^{\frac{1}{2}}$. The distribution of this random volume is of interest in geometrical probability problems when the points have specified distributions. For problems of this type see Mathai (1999). The distributions of such random volumes will then be based on the distribution of $X$, where $X$ has the very general density given in (7.7.1). Thus the existing theory in this area is extended to a very general class of basic distributions covered by the $f(X)$ of (7.7.1).

Exercises 7.7.

7.7.1. By using Stirling's approximation for gamma functions, namely

$$
\Gamma(z+a) \approx \sqrt{2\pi}\,z^{z+a-\frac{1}{2}}\,\mathrm{e}^{-z} \tag{7.7.21}
$$

for $|z|\to\infty$ and $a$ a bounded quantity, show that the moment expressions in (7.7.13) and (7.7.14) reduce to the moment expression in (7.7.15).

7.7.2. By opening up $\Gamma_p(\cdot)$ in terms of gamma functions and by examining the structure of the gamma products in (7.7.13), show that for $q<1$ we can write

$$
E\left[|a(1-q)A^{\frac{1}{2}}XBX'A^{\frac{1}{2}}|^{h}\right] = \prod_{j=1}^{p}E\!\left(x_j^{h}\right) \tag{7.7.22}
$$

where $x_j$ is a real scalar type-1 beta random variable with the parameters

$$
\left(\alpha+\frac{r}{2}-\frac{j-1}{2},\ \frac{\beta}{1-q}+\frac{p+1}{2}\right),\quad j=1,\ldots,p.
$$

7.7.3. By going through the procedure in Exercise 7.7.2 show that, for $q>1$,

$$
E\left[|a(q-1)A^{\frac{1}{2}}XBX'A^{\frac{1}{2}}|^{h}\right] = \prod_{j=1}^{p}E\!\left(y_j^{h}\right) \tag{7.7.23}
$$

where $y_j$ is a real scalar type-2 beta random variable.

7.7.4. Let $q<1$, $a(1-q)=1$, $Y=XBX'$, $Z=A^{\frac{1}{2}}XBX'A^{\frac{1}{2}}$, where $X$ has the density $f(X)$ of (7.7.1). Then show that $Y$ has the non-standard real matrix-variate type-1 beta density and $Z$ has the standard type-1 beta density.

Page 55: Jacobians of Matrix Transformations

7.7. A PATHWAY MODEL 279

7.7.5. Let $q<1$, $a(1-q)=1$, $\alpha+\frac{r}{2}=\frac{p+1}{2}$, $\beta=0$, $Z=A^{\frac{1}{2}}XBX'A^{\frac{1}{2}}$, where $X$ has the density $f(X)$ of (7.7.1). Then show that $Z$ has a standard uniform density.

7.7.6. Let $q<1$, $\alpha=0$, $a(1-q)=1$, $\frac{\beta}{1-q}=\frac{1}{2}(m-p-r-1)$. Then show that the $f(X)$ of (7.7.1) reduces to the inverted T density of Dickey.

7.7.7. Let $q>1$, $a(q-1)=1$, $Y=XBX'$, $Z=A^{\frac{1}{2}}XBX'A^{\frac{1}{2}}$. Then, when $X$ has the density in (7.7.1), show that $Y$ has the non-standard matrix-variate type-2 beta density and $Z$ has the standard type-2 beta density.

7.7.8. Let $q=1$, $a=1$, $\beta=1$, $\alpha+\frac{r}{2}=\frac{n}{2}$, $Y=XBX'$, $A=\frac{1}{2}V^{-1}$. Then show that $Y$ has a Wishart density when $X$ has the density in (7.7.1).

7.7.9. Let $q=1$, $a=1$, $\beta=1$, $\alpha=0$ in $f(X)$ of (7.7.1). Then show that $f(X)$ reduces to the real matrix-variate Gaussian density.

7.7.10. Let $q>1$, $a(q-1)=1$, $\alpha+\frac{r}{2}=\frac{p+1}{2}$, $\frac{\beta}{q-1}=1$, $Y=A^{\frac{1}{2}}XBX'A^{\frac{1}{2}}$. Then, if $X$ has the density in (7.7.1), show that $Y$ has a standard real matrix-variate Cauchy density.
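The gamma-product factorization asked for in Exercise 7.7.2 can be confirmed numerically by writing $\Gamma_p(\cdot)$ through the multivariate log-gamma function; the sketch below works in logs with arbitrary illustrative parameter values:

```python
# Numerical check of the factorization in Exercise 7.7.2: the Gamma_p
# ratio in (7.7.13) equals a product of scalar type-1 beta moments.
# Gamma_p is computed via scipy's multivariate gammaln; parameter values
# below are arbitrary illustrative choices satisfying the conditions.
import numpy as np
from scipy.special import multigammaln, betaln

p, alpha, r, h = 3, 1.1, 5, 0.7
q, beta_ = 0.4, 2.0
s = alpha + r / 2
b = beta_ / (1 - q) + (p + 1) / 2

# log of Gamma_p(s+h) Gamma_p(s+b) / (Gamma_p(s) Gamma_p(s+h+b)), (7.7.13)
log_lhs = (multigammaln(s + h, p) + multigammaln(s + b, p)
           - multigammaln(s, p) - multigammaln(s + h + b, p))

# log of prod_j E(x_j^h) for x_j ~ type-1 beta(s - (j-1)/2, b), (7.7.22)
log_rhs = sum(betaln(s - (j - 1) / 2 + h, b) - betaln(s - (j - 1) / 2, b)
              for j in range(1, p + 1))

print(log_lhs, log_rhs)  # equal up to floating-point error
```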

References

Bochner, S. (1952). Bessel functions and modular relations of higher type and hyperbolic differential equations, Com. Sem. Math. de l'Univ. de Lund, Tome supplementaire dedie a Marcel Riesz, 12-20.

Constantine, A.G. (1963). Some noncentral distribution problems in multivariate analysis, Annals of Mathematical Statistics, 34, 1270-1285.

Herz, C.S. (1955). Bessel functions of matrix argument, Annals of Mathematics, 61(3), 474-523.

James, A.T. (1961). Zonal polynomials of the real positive definite matrices, Annals of Mathematics, 74, 456-469.

Mathai, A.M. (1978). Some results on functions of matrix argument, Math. Nachr., 84, 171-177.

Mathai, A.M. (1993). A Handbook of Generalized Special Functions for Statistical and Physical Sciences, Oxford University Press, Oxford.

Mathai, A.M. (1997). Jacobians of Matrix Transformations and Functions of Matrix Argument, World Scientific Publishing, New York.

Mathai, A.M. (1999). An Introduction to Geometrical Probability: Distributional Aspects with Applications, Gordon and Breach, New York.

Mathai, A.M. (2004). Modules 1, 2, 3, Centre for Mathematical Sciences, India.

Mathai, A.M. (2005). A pathway to matrix-variate gamma and normal densities, Linear Algebra and Its Applications, 396, 317-328.

Mathai, A.M., Provost, S.B. and Hayakawa, T. (1995). Bilinear Forms and Zonal Polynomials, Springer-Verlag Lecture Notes in Statistics, 102, New York.