ON THE CLASSIFICATION STATISTIC OF WALD
by
Mohammad Iqbal
University of North Carolina
This research was supported in part by the Office of Naval Research under Contract No. NR-042031 for research in probability and statistics at Chapel Hill. Reproduction for any purpose of the United States Government is permitted.
Institute of Statistics
Mimeograph Series No. 159
November 1956
ERRATA SHEET
6(11)
141(6)
NotationiV(18)4(12)
13(2)
19(9)
21(5)
21( 6)
22(1)
92(3) will mean page 92, line 3.
Replace m = m3 by w = |m3|.
Replace Z by z.
RePlace.//Nl+N2 byV NI N2
2 2Replace ~~ - m3 ~ 0 by ~m2 - m3 ~ 0
Re ad N( e ) as N •2 e
Replace (1- ~ by (1 _ ~)2 •n n
Replace 28 by a~
02gRead as , Q'
2 '0 all t'
23(1) From - ~n to the end of line 2 is to be enclosed in square brackets.
24(6) Put ) after ~.n
27(!~) Put ~ between I1 and I11.
29(18) Replace f(x) by |f(x)| and 0 ≤ c ≤ ∞ by 0 ≤ c < ∞.
48(7) Insert a multiplier ~ on the right.
51(5) Replace 16 by 64 (and this corresponds to p = 2).
62(9), 113(7) Replace nm3 by |nm3|.
62(18), 92(3), 115(7) Replace m3 by |m3|.
69(1,5) Replace 1¢(t)-¢(t)1 by I¢(t) - ¢(t)l.
70(14), 88(16) Replace e- j bye-V•
81(9) Put dv: after Ie (v)m
85(7) Read 'yJ (v) = • 290(5) Replace ~ by ~ •103(5) Read l: as 1: •
T) n
119(15) Replace V > nm3 by IVI > Inm31 •_J.'X-2 -1 A.2
2 2135(11) Replace e by e139(18) Replace suffices by suffixes •140(6),(10) Replace r by y.
a ij ) 2a ijReplace -=r by
aij d atj
Read )(i as Jtl and 145(7) Read a;~ as aijz i •
ACKNOWLEDGEMENTS
I wish to put on record my deep sense of gratitude to
Professor Harold Hotelling for his inspiring guidance throughout
the preparation of this work. I feel myself greatly honored on
having had the privilege of working under his direction.
I am also greatly indebted to Professor R. C. Bose for the
confidence derived through his encouragement at various stages.
My thanks are also due to the Fulbright Foundation in Pakistan,
the Institute of International Education and the Office of Naval
Research for their financial assistance which made this study possible.
The help of Mrs. Kattsoff, Mrs. Spencer and Mrs. Kiley for the
careful typing of the manuscript is gratefully acknowledged.
Mohammad Iqbal
"No scientific investigation can be final; it merely represents the most probable conclusions which can be drawn from the data at the disposal of the writer. A wider range of facts or more refined analysis, experiment and observation will lead to new formulae and new theories. This is the essence of scientific progress."
Karl Pearson 1898
TABLE OF CONTENTS

ACKNOWLEDGEMENTS
INTRODUCTION

CHAPTER
I. A PROBLEM OF CLASSIFICATION CONSIDERED BY WALD
   1. Introduction
   2. Statement of the problem
   3. An example of its importance
   4. The statistic proposed by Wald
   5. Further work on the problem
II. ON AN ASYMPTOTIC EVALUATION OF A TRIPLE INTEGRAL
   1. Introduction
   2. The integral and its domain
   3. Order of the variables m1, m2 and m3
   4. An important limit
   5. A triple integral
   6. The integral over D1, an asymptotic approximation
   7. An upper bound to error
   8. The integral over D2
   9. An upper bound to the value of I2
   10. Comparison of I1 and I2
   11. The integral over the domain D*
   12. Summary of Chapter II
III. ON THE ASYMPTOTIC DISTRIBUTION OF WALD'S CLASSIFICATION STATISTIC
   1. Introduction
   2. Wald's approximate classification statistic and its moments
   3. The asymptotic distribution of v for p = 2m
   4. An integral equation due to Wilks
   5. A note on Bessel functions
   6. Distribution of v for odd values of p
   7. The use of a differential equation in the evaluation of an integral
   8. The asymptotic distribution of v for even and odd values of p
   9. Note on the construction of tables
   10. Summary of Chapter III
IV. AN ASYMPTOTIC SERIES EXPANSION FOR THE DISTRIBUTION OF w = |m3|
   1. Introduction
   2. An asymptotic series for the distribution
   3. The constant of integration for the first approximation
   4. The tail areas for the first approximation
   5. Comparison with the results of Chapter III
   6. Summary of Chapter IV
V. THE APPLICATION OF TCHEBYCHEFF-MARKOFF INEQUALITIES TO A SPECIAL CASE
   1. Introduction
   2. The integral over D1
   3. The integral over D2
   4. The integral over D
   5. Moments of V
   6. Some results due to Tschebycheff and Markoff
   7. Application of Tschebycheff-Markoff theorems to this case
VI. NON-NULL CASE
   1. Introduction
   2. The joint distribution
   3. Note on confluent hypergeometric functions
   4. An asymptotic form of f(m1, m2, m3)
   5. Distribution of U for large n and p = 1, an independent approach
   6. The asymptotic mean and variance of the statistic U
   7. Correction term for the variance of the linear discriminant function
VII. SOME RELATED UNSOLVED PROBLEMS
   1. On classification statistics of Wald and Anderson
   2. The quadratic discriminators
   3. Possibility of a different approach
   4. Efficiency
   5. The greater mean vector
BIBLIOGRAPHY
INTRODUCTION¹

In his paper "On a statistical problem arising in the classification of an individual into one of two groups" [50]², the late Professor Abraham Wald made an attempt to put the theory of discriminant functions on rigorous mathematical foundations. He demonstrated, by using very ingenious geometrical arguments spread out over several lemmas, that a function $V = n m_3 / [(1-m_1)(1-m_2) - m_3^2]$ can be taken as the classification statistic instead of $\sqrt{N_1 N_2/(N_1+N_2)}\;U$, where

$$U = \sum_{i=1}^{p}\sum_{j=1}^{p} s^{ij} z_i (\bar{y}_j - \bar{x}_j)$$

is the usual discriminant function with the population parameters replaced by their sample estimates, $N_1$ and $N_2$ are the sizes of the two samples from the two p-variate normal populations, and $n = N_1 + N_2 - 2$. Wald also obtained the joint distribution of $m_1$, $m_2$ and $m_3$, namely $f(m_1, m_2, m_3)$.
It would be desirable to obtain, in a usable form, the distribution of V from $f(m_1, m_2, m_3)$. Such a simplification appears extremely difficult. It is related to the problem of the non-central Wishart distribution, for which T. W. Anderson and M. A. Girshick were able to obtain manageable expressions only for two or fewer variables. It seems that this general distribution of the discriminant function, or the classification statistic as Wald calls it, would involve the figurative distance, δ, between the centers of the two populations. One approach to this highly involved distribution would be to obtain a series in powers of δ with each coefficient involving n and p. The present work is chiefly concerned with the examination of the first term of this series, with special attention to its value when n is large.

¹ Sponsored by the Office of Naval Research under the contract for research in probability and statistics at Chapel Hill. Reproduction in whole or in part is permitted for any purpose of the United States Government.
² The numbers in square brackets refer to the bibliography listed at the end.
In the first chapter of the present work, a brief historical introduction to the theory of discriminant functions is followed by a mathematical formulation of the problem following Wald. The results obtained by him and also by some subsequent workers on the problem are briefly described.
The next two chapters deal with the problem of finding the distribution of V in the null case, by supposing that $n = N_1 + N_2 - 2$ is a large number. Explicitly this problem can be stated as follows: Given

$$f(m_1, m_2, m_3)\,dm_1\,dm_2\,dm_3 = \text{Const.}\;|M|^{\frac{p-3}{2}}\,|I-M|^{\frac{n-p-1}{2}}\,dm_1\,dm_2\,dm_3,$$

to find the distribution of V suitable for large n.

It will be noticed that the sample sizes, $N_1$ and $N_2$, do not separately occur in the joint distribution, and the assumption that n is large, which is obviously milder than the assumption that $N_1$ and $N_2$ are both large, introduces certain simplifications. One simplification that is obtained is that the statistic itself approximates ... because of the order in probability of the variables entering into the distribution. The same assumption entails simplifications both in the integrand and in the domain of integration.
In the second chapter, which can be regarded as dealing with the mathematical aspects of the problem, methods have been developed which will enable us to evaluate triple integrals giving the moments. An upper bound to the error in using these simplifications is also worked out, which enables us to put reliance in the approximations in suitable cases.

The third chapter deals with the asymptotic distribution problem. After finding an expression for the kth moment of v, we obtain the asymptotic distribution of v both for even and for odd values of p. For even values of p, the uniqueness of the distribution, which is obtained by the help of its moment generating function, is also established. For odd values of p, use had to be made of an integral equation due to S. S. Wilks [55]; and, because of the fact that we are considering only the principal term in the kth moment of v, the uniqueness of the result cannot be guaranteed. This section is therefore presented on a heuristic basis and has to be left for further discussion and rigorization.
In Chapter IV we have obtained an asymptotic series for the distribution of $w = |m_3|$, which is proportional to v. This is done by observing that for a fixed w the range of integration for $m_1$ and $m_2$ is a lenticular region enclosed by two hyperbolic arcs in the plane w = a constant. Integration is carried out over this region by using suitable transformations, and the first three terms of the asymptotic series are obtained. For the first approximation, we have also discussed the method of finding the tail areas.

Chapter V deals with the special case $N_1 + N_2 = 20$, $p = 3$. In this case, the first seven moments of V are found, and use is made of the inequalities due to Tchebycheff and Markoff in setting up bounds on probabilities of the type $P(V \le \xi)$. These limits are rather crude due to the fact that a small number of moments is being used. The example, however, illustrates one way of proceeding to discover something about an unknown distribution when its first few moments are known.
Chapter VI contains a few remarks on the non-null case. It starts with expressing the joint distribution of $m_1$, $m_2$ and $m_3$ discussed by Sitgreaves [45] in another form suitable for large n. This chapter also contains a brief discussion of the asymptotic distribution of U for p = 1. In the next section we exemplify the differential method by finding the mean and variance of U. The concluding section of this chapter deals with finding the variance of the linear discriminant function when the sampling fluctuations of the means are taken into account. In the last chapter are listed a few unsolved problems related to the problem of classification.
CHAPTER I
A PROBLEM OF CLASSIFICATION CONSIDERED BY WALD
1. Introduction.
The problem of classification is the problem of assigning an individual (or an element), on which a set of measurements is available, to one of several groups or populations. The problem admits a simple solution when the distributions of measurements in the alternative populations are completely known or, what amounts to the same thing, when the sizes of the samples available from the various populations, on the basis of which we have to make a decision, tend to infinity, so that the sample estimates of the parameters tend stochastically to their population values. If, however, the samples are not large, the problem becomes rather complicated.
Research in this area of Multivariate Analysis was started with the introduction of the linear discriminant function by Sir Ronald A. Fisher [10] in 1936. The linear discriminant function is

$$D = \sum_{i=1}^{p} l_i z_i,$$

in which $z = (z_1, \dots, z_p)$ is the new observation, and the coefficients $l_i$, following Fisher, can be obtained by maximizing the square of the difference of the expectations of D in the two populations divided by the variance of D. The linear discriminant function provides the best solution of the problem of classification provided that
(1) The number of alternative populations is two,
(2) The form of the distributions in both populations is multivariate normal,
(3) The parameters are all known,
(4) The covariance matrices of the two populations are equal.
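As an illustration of how such a function is used in practice, the following minimal sketch computes discriminant coefficients for p = 2 from two small samples. It assumes, rather than takes from Fisher's paper, that the coefficients are obtained as the inverse of the pooled sample covariance matrix applied to the difference of the sample means, with divisor $N_1 + N_2 - 2$; the data and all function names are hypothetical.

```python
# Sketch only: linear discriminant coefficients for two bivariate samples.
# Data and helper names are hypothetical illustrations.

def mean_vector(rows):
    n = len(rows)
    return [sum(r[k] for r in rows) / n for k in range(len(rows[0]))]

def pooled_covariance(x_rows, y_rows):
    """Pooled within-sample covariance, divisor N1 + N2 - 2 (assumed)."""
    p = len(x_rows[0])
    s = [[0.0] * p for _ in range(p)]
    for rows in (x_rows, y_rows):
        centre = mean_vector(rows)
        for r in rows:
            for i in range(p):
                for j in range(p):
                    s[i][j] += (r[i] - centre[i]) * (r[j] - centre[j])
    dof = len(x_rows) + len(y_rows) - 2
    return [[s[i][j] / dof for j in range(p)] for i in range(p)]

def inverse_2x2(m):
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

def discriminant_coefficients(x_rows, y_rows):
    """Coefficients l = S^{-1} (ybar - xbar), written out for p = 2."""
    xbar, ybar = mean_vector(x_rows), mean_vector(y_rows)
    s_inv = inverse_2x2(pooled_covariance(x_rows, y_rows))
    diff = [ybar[k] - xbar[k] for k in range(2)]
    return [sum(s_inv[i][j] * diff[j] for j in range(2)) for i in range(2)]

def discriminant_value(coeffs, z):
    """D = sum of l_i * z_i for an observation z."""
    return sum(l * zi for l, zi in zip(coeffs, z))

x_sample = [[1.0, 2.0], [2.0, 1.0], [1.5, 1.5], [2.5, 2.0]]
y_sample = [[4.0, 5.0], [5.0, 4.5], [4.5, 5.5], [5.5, 5.0]]
coeffs = discriminant_coefficients(x_sample, y_sample)
d_x = discriminant_value(coeffs, mean_vector(x_sample))
d_y = discriminant_value(coeffs, mean_vector(y_sample))
print(coeffs, d_x, d_y)
```

A new observation z would then be referred to the first or the second population according as its value of D falls nearer to `d_x` or to `d_y`; the midpoint rule is only one possible choice of criterion level.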
It may be remarked that Welch [53] observed that even without making any assumptions of normality or equality of covariance matrices, the problem of obtaining the best function to discriminate between two completely specified populations may be solved. He demonstrated that the desired function is simply the ratio of the two probability distributions, and the criterion level to which this function is referred is deducible either from Bayes' Theorem with given a priori probabilities or by the use of a lemma by Neyman and Pearson [32] when the errors for the two hypotheses are minimized in any given ratio. He proved that under the four assumptions stated above the function obtained in this manner is identical with the linear discriminant function.
Von Mises [31] considered the problem of classification when the number of populations is m, and showed how to subdivide the sample space into m parts so as to minimize the maximum error of misclassification.

Rao [39] gave explicit Bayes solutions with given a priori probabilities or ratios of errors for the alternative populations, and discussed the construction and use of doubtful regions and related problems.

In all these cases it is assumed that the distributions are completely specified. If, however, as will frequently be the case, one cannot justify the supposition that the distributions are completely known, and the only information at hand is what is contained in the samples available from various populations, we run into rather complicated distribution problems.
Wald [50] in 1944 set out to solve the problem of classification for the case of two alternative populations. Instead of using a distribution-free approach he simplified it further by introducing the following two restrictions:
(1) The form of the distributions is multivariate normal.
(2) The two populations have the same covariance matrix.
Though it would be desirable to solve the problem without making either of these assumptions, still one can argue that in many practical problems arising in numerous fields of scientific inquiry it is not unreasonable to make the two assumptions stated above.
In this chapter we propose to give a mathematical formula-
tion of the problem, and to state the conclusions of Wald, and of
subsequent workers on the problem.
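2. Statement of the problem.

Let

(2.1)
$$X = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1p} \\ x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & & \vdots \\ x_{N_1 1} & x_{N_1 2} & \cdots & x_{N_1 p} \end{pmatrix}$$

and

(2.2)
$$Y = \begin{pmatrix} y_{11} & y_{12} & \cdots & y_{1p} \\ y_{21} & y_{22} & \cdots & y_{2p} \\ \vdots & \vdots & & \vdots \\ y_{N_2 1} & y_{N_2 2} & \cdots & y_{N_2 p} \end{pmatrix}$$

be two random samples from two p-variate normal populations $\pi_x$ and $\pi_y$, both having the same, though unknown, covariance matrix $\Sigma$, and different unknown mean vectors $\mu = (\mu_1, \dots, \mu_p)$ and $\nu = (\nu_1, \dots, \nu_p)$ respectively. Let $z = (z_1, \dots, z_p)$ be an observation on a new individual which is known to have come either from $\pi_x$ or from $\pi_y$, but is distributed independently of both $x = (x_1, \dots, x_p)$ and $y = (y_1, \dots, y_p)$, the two sets of variates corresponding to $\pi_x$ and $\pi_y$ respectively.

On the basis of the information supplied by X, Y and z, the problem is to assign the new individual to $\pi_x$ or $\pi_y$ in such a way that, if the probability of one type of misclassification is held fixed, the chance of the second type of misclassification is a minimum.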
3. As an example of the importance of this problem we can consider a candidate applying for admission to an institution with certain test scores. He may have to be accepted or rejected depending on his chances of success or otherwise on the basis of the scores of candidates admitted in previous years.
4. The statistic proposed by Wald.

Wald considered as classification statistic

(4.1)
$$U = \sum_{i=1}^{p}\sum_{j=1}^{p} s^{ij} z_i (\bar{y}_j - \bar{x}_j),$$

obtained by considering this problem as one of testing the hypothesis H: $z \in \pi_x$ against the alternative that $z \in \pi_y$, and by replacing the population values of the parameters by their optimum estimates obtained from the samples in the statistic obtained by using the fundamental lemma of Neyman and Pearson. Thus
where

(4.2)

and

$$\bar{x}_i = \frac{1}{N_1}\sum_{\alpha=1}^{N_1} x_{i\alpha}$$
The statistic U can be rewritten as
where
(4.5)
and where
and
are distributed independently of each other according to p-variate
normal distributions with
E (z) =
and
{: if z e Jtxif z e rry
,
and with the same covariance matrix $[\sigma_{ij}]$.

Since the $s_{ij}$ are distributed independently of the set $(z_1, \dots, z_p, \dots)$, the distribution of U remains unchanged if we define

$$s_{ij} = \sum_{\alpha=1}^{n} t_{i\alpha} t_{j\alpha} \,/\, n,$$

where $n = N_1 + N_2 - 2$. Writing $(s^{ij})$ for $(s_{ij})^{-1}$, Wald observed that the distribution of U is the same as that of
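(4.7)
$$V = \sum_{i=1}^{p}\sum_{j=1}^{p} s^{ij}\, t_{i,n+1}\, t_{j,n+2},$$

where the probability element of the $t_{i\alpha}$ is given by

(4.8)
$$\frac{1}{(2\pi)^{p(n+2)/2}} \exp\left[-\frac{1}{2}\left\{\sum_{i=1}^{p}\sum_{\alpha=1}^{n} t_{i\alpha}^2 + \sum_{i=1}^{p}(t_{i,n+1}-\rho_i)^2 + \sum_{i=1}^{p}(t_{i,n+2}-\zeta_i)^2\right\}\right] \prod_{i=1}^{p}\prod_{\alpha=1}^{n+2} dt_{i\alpha},$$

where

(4.9)
$$\rho = (\rho_1, \rho_2, \dots, \rho_p), \qquad \zeta = (\zeta_1, \zeta_2, \dots, \zeta_p)$$

are certain functions of $\mu_i$, $\nu_i$ and $\sigma_{ij}$, $i, j = 1, 2, \dots, p$.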
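Here Wald introduced two sets of numbers $(u_1, \dots, u_{n+2})$ and $(v_1, \dots, v_{n+2})$ satisfying the relations

(4.10)
$$\sum_{\alpha=1}^{n+2} u_\alpha^2 = \sum_{\alpha=1}^{n+2} v_\alpha^2 = 1 \quad\text{and}\quad \sum_{\alpha=1}^{n+2} u_\alpha v_\alpha = 0,$$

and, using a very ingenious geometrical argument, concluded that the distribution of V is the same as that of

(4.11)
$$\frac{n\, m_3}{(1-m_1)(1-m_2) - m_3^2},$$

where

(4.12)
$$m_1 = \sum_{\alpha=1}^{p} u_\alpha^2, \qquad m_2 = \sum_{\alpha=1}^{p} v_\alpha^2, \qquad m_3 = \sum_{\alpha=1}^{p} u_\alpha v_\alpha,$$

and the joint probability distribution of $m_1$, $m_2$ and $m_3$ is given by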
,/ r ~ n+?-lJ.r11
, , , \ --.,.-Ip ).E
, . .dm1om?dm
3•
r p1 r. . • . pp
"'~ /
where $r_{ij} = \sum_{\alpha=1}^{p} t_{i\alpha} t_{j\alpha}$ and g is the constant of integration, in the domain $0 \le m_1 \le 1$, $0 \le m_2 \le 1$, $-\sqrt{m_1 m_2} \le m_3 \le \sqrt{m_1 m_2}$, and zero otherwise;
where
and
F (t) ~.k
1 ,
5. Further work on the problem.

Anderson [1] considered the statistic

(5.1)
$$W = \sum_{i}\sum_{j} s^{ij} z_i (\bar{y}_j - \bar{x}_j) - \frac{1}{2}\sum_{i}\sum_{j} s^{ij} (\bar{y}_i + \bar{x}_i)(\bar{y}_j - \bar{x}_j),$$

which is much like U, since it differs from U by terms independent of $z = (z_1, \dots, z_p)$, the measurements observed on the new individual. He evaluated the expected value of the matrix of non-central Wishart variates occurring in the joint distribution of $m_1$, $m_2$ and $m_3$ in the special case when

(5.2)
Sitgreaves [45] gave an analytic derivation of the distribution of W in the case considered by Anderson and also obtained exactly the constant of integration in the joint distribution of $m_1$, $m_2$, $m_3$. We shall refer to the following result from her paper in the next chapter.
where
,
where
> 0- !I-MI> 0
and ~ = 6' Z-15, and where k1k2 are defined in (5.2) •
Earlier, Harter [18] in 1951 considered the joint distribution of $m_1$, $m_2$ and $m_3$ of (4.13) in the degenerate case $\rho_i = 0 = \zeta_i$, $i = 1, \dots, p$, and obtained the approximate distribution of $m_3$ in the special cases

(I-a) n even, p odd
(I-b) n even, p even.

The technique he used in deriving this distribution was essentially expanding the two binomials constituting the integrand in the joint distribution of $m_1$, $m_2$ and $m_3$ of (4.13) in the degenerate case, and integrating with respect to $m_1$ and then with respect to $m_2$. The number of terms in the distribution of $m_3$ thus obtained depends on n, which is not a small number in any practical situation. Moreover, the solution thus obtained is not an asymptotic series in which the leading terms could be considered as approximating the true distribution for large n.
The latest paper in historical order of development of the theory of discrimination is that of Rao [40], in which he developed some general methods, using the ideas of sufficient statistics and fiducial probability distributions, by which the discrimination problem can be solved utilizing only the sample information. The distribution problems connected with the test criteria suggested in the paper have, however, yet to be tackled.
CHAPTER II
ON AN ASYMPTOTIC EVALUATION OF A TRIPLE INTEGRAL
1. Introduction.
The integral with which we shall be concerned in this chapter is the one obtained from the joint distribution of $m_1$, $m_2$ and $m_3$, given by Wald [50], by putting $\rho_i = 0 = \zeta_i$, $i = 1, 2, \dots, p$. For the sake of convenience, we shall refer to this case as the Central Case or the Null Case. In this chapter, we shall find the value of the integral for large values of n, which is equal to $N_1 + N_2 - 2$, by introducing certain simplifications both in the integrand and in the domain of integration. Justifications shall be given for the simplifications introduced, and the final result shown to be a valid asymptotic approximation in the sense of Poincaré. Moreover, an upper bound for the error involved in the asymptotic approximation will be found.
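2. The integral and its domain.

The triple integral to which we refer corresponds to

(2.1)
$$f(m_1, m_2, m_3) = \begin{cases} \text{Const.}\,(m_1 m_2 - m_3^2)^{\frac{p-3}{2}} \left[(1-m_1)(1-m_2) - m_3^2\right]^{\frac{n-p-1}{2}} & \text{over } D, \\ 0 & \text{elsewhere,} \end{cases}$$

where the domain D is defined by the following inequalities, which insure a real, positive integrand in its interior:

(2.2)
$$D:\quad 0 \le m_1 \le 1,\quad 0 \le m_2 \le 1,\quad m_1 m_2 - m_3^2 \ge 0,\quad (1-m_1)(1-m_2) - m_3^2 \ge 0.$$

The inequalities in (2.2) show that the domain is bounded by two right elliptical cones in three-dimensional space having vertices at (0,0,0) and (1,1,0) respectively and having a common base in the plane $m_1 + m_2 = 1$.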
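We define two other domains $D_1$ and $D_2$ as follows:

(2.3)
$$D_1:\quad 0 \le m_1 \le 1,\quad 0 \le m_2 \le 1,\quad m_1 m_2 - m_3^2 \ge 0,\quad m_1 + m_2 \le 1;$$
$$D_2:\quad 0 \le m_1 \le 1,\quad 0 \le m_2 \le 1,\quad (1-m_1)(1-m_2) - m_3^2 \ge 0,\quad m_1 + m_2 \ge 1.$$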
(2.4) Then it is easy to see that $D = D_1 + D_2$, except for the set of points lying on the plane $m_1 + m_2 = 1$, which are counted twice.

The truth of this statement can be seen easily by noticing that the regions defined by the two domains are the interiors of two cones, one obtainable from the other by a simple transformation and lying on opposite sides of the plane $m_1 + m_2 = 1$ in the space of three dimensions. Moreover, except for the points lying on the plane $m_1 + m_2 = 1$, the two regions are mutually exclusive because the point set corresponding to $D_1$ lies on the origin side, whereas the other corresponding to $D_2$ lies on the non-origin side. The fact that $D_1$ and $D_2$ between themselves include all the points of D can be seen by observing that

(2.5)
$$m_1 m_2 - m_3^2 \ge 0,\ \ m_1 + m_2 \le 1 \implies (1-m_1)(1-m_2) - m_3^2 \ge 0,$$

and

(2.6)
$$(1-m_1)(1-m_2) - m_3^2 \ge 0,\ \ m_1 + m_2 \ge 1 \implies m_1 m_2 - m_3^2 \ge 0,$$

where $\implies$ is read as "imply".

As a consequence of this result, we can find the value of an integral over D by adding up its values over the two domains $D_1$ and $D_2$. The fact that points lying on the plane $m_1 + m_2 = 1$ have been taken twice would not make any difference because they form a set of Lebesgue measure zero.
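The decomposition can be checked numerically. The sketch below is an illustration only: it takes D to be the part of the unit square in $(m_1, m_2)$ on which both $m_1 m_2 - m_3^2$ and $(1-m_1)(1-m_2) - m_3^2$ are non-negative, takes $D_1$ and $D_2$ to be the parts of the two cones on either side of the plane $m_1 + m_2 = 1$, and tests random points for membership. The function names are hypothetical.

```python
import random

def in_D(m1, m2, m3):
    """Both cone conditions hold inside the unit square."""
    return (0 <= m1 <= 1 and 0 <= m2 <= 1
            and m1 * m2 - m3 ** 2 >= 0
            and (1 - m1) * (1 - m2) - m3 ** 2 >= 0)

def in_D1(m1, m2, m3):
    """Cone with vertex at the origin, on the origin side of m1 + m2 = 1."""
    return (0 <= m1 <= 1 and 0 <= m2 <= 1
            and m1 * m2 - m3 ** 2 >= 0 and m1 + m2 <= 1)

def in_D2(m1, m2, m3):
    """Cone with vertex at (1, 1, 0), on the far side of m1 + m2 = 1."""
    return (0 <= m1 <= 1 and 0 <= m2 <= 1
            and (1 - m1) * (1 - m2) - m3 ** 2 >= 0 and m1 + m2 >= 1)

def count_mismatches(trials=100_000, seed=0):
    """Count random points where 'in D' disagrees with 'in D1 or in D2'."""
    rng = random.Random(seed)
    bad = 0
    for _ in range(trials):
        m1, m2 = rng.random(), rng.random()
        m3 = rng.uniform(-0.5, 0.5)   # |m3| cannot exceed 1/2 inside D
        if in_D(m1, m2, m3) != (in_D1(m1, m2, m3) or in_D2(m1, m2, m3)):
            bad += 1
    return bad

print(count_mismatches())
```

A count of zero is what the implications (2.5) and (2.6) lead one to expect; a random search of this kind is of course a check and not a proof.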
3. Order of the variables $m_1$, $m_2$ and $m_3$.

To examine the order of the variables $m_1$, $m_2$ and $m_3$ we have first to define them following the original paper of Wald [50]. For the sake of clarity, therefore, we add the following paragraph.

Denote by S the 2n + 1 dimensional surface in the 2n + 4 dimensional space of the variables $u_1, \dots, u_{n+2}, v_1, \dots, v_{n+2}$ defined by the following equations:

(3.1)
$$\sum_{\beta=1}^{n+2} u_\beta^2 = \sum_{\beta=1}^{n+2} v_\beta^2 = 1, \qquad \sum_{\beta=1}^{n+2} u_\beta v_\beta = 0.$$

Let $u_1, \dots, u_{n+2}, v_1, \dots, v_{n+2}$ be random variables whose joint probability distribution function is defined as follows: the point $(u_1, \dots, u_{n+2}, v_1, \dots, v_{n+2})$ is restricted to S, and the probability density function is defined by

(3.2)
$$\frac{dS}{\int_S dS}.$$

Then for any subset A of S, the probability of A is equal to the 2n + 1 dimensional volume of A divided by $\int_S dS$. It should be noted that the probability density function (3.2) is identical with the probability density function we would obtain if we were to assume that $u_1, \dots, u_{n+2}, v_1, \dots, v_{n+2}$ are independently, normally distributed with zero means and unit variances and calculate the conditional density function under the restriction that the point $(u_1, \dots, u_{n+2}, v_1, \dots, v_{n+2})$ belongs to S.
Variables $m_1$, $m_2$ and $m_3$, which are equal respectively to

$$\sum_{\beta=1}^{p} u_\beta^2, \qquad \sum_{\beta=1}^{p} v_\beta^2, \qquad \sum_{\beta=1}^{p} u_\beta v_\beta,$$

can be redefined by using (3.1) as follows:

(3.3)
$$m_1 = \frac{\sum_{\beta=1}^{p} u_\beta^2}{\sum_{\beta=1}^{n+2} u_\beta^2}, \qquad m_2 = \frac{\sum_{\beta=1}^{p} v_\beta^2}{\sum_{\beta=1}^{n+2} v_\beta^2}, \qquad m_3 = \frac{\sum_{\beta=1}^{p} u_\beta v_\beta}{\sqrt{\sum_{\beta=1}^{n+2} u_\beta^2 \,\sum_{\beta=1}^{n+2} v_\beta^2}},$$

where p is the number of variables and $n = N_1 + N_2 - 2$ is the number of degrees of freedom.

With this explanation about the variables entering into the discussion we shall prove the following theorem:

Theorem 1. The variables $m_1$, $m_2$ and $m_3$ defined in (3.3) in terms of $u_i$ and $v_i$, $i = 1, 2, \dots, n+2$, which are N(0,1) variates, are of order $n^{-1}$ in the probability sense.
Definition. We write $X_N = O_p[f(N)]$, and say that $X_N$ is of probability order $O[f(N)]$, if for each $\varepsilon > 0$ there exists an $A_\varepsilon > 0$ such that $P[\,|X_N| \le A_\varepsilon f(N)\,] \ge 1 - \varepsilon$ for all values of $N > N_0(\varepsilon)$.

Proof of the theorem:

Since $u_i$ and $v_i$, $i = 1, \dots, n+2$, are all independently, normally distributed with zero means and unit variance,

(3.4)
$$m_i^{\,p-1} (1 - m_i)^{\,n-p+1}\, dm_i, \qquad i = 1, 2,$$

since each of $m_1$ and $m_2$ is of the form

Thus

(3.5)
$$E(m_i) = \frac{p}{n+2} \quad\text{and}\quad \operatorname{Var}(m_i) = \frac{p(n-p+2)}{(n+2)^2(n+3)} = O\!\left(\frac{1}{n^2}\right),$$

and by Tchebycheff's inequality, namely

(3.6)
$$P\{\,|x - E(x)| \ge k\sigma_x\,\} \le \frac{1}{k^2},$$

it is immediately seen that for given $\varepsilon$ there exist $k_1$ and $k_2$ such that

Hence $m_1$ and $m_2$ are of order $1/n$ in probability.
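To see that $m_3$ is also of order $1/n$ in the probability sense, we note that

$$m_3^2 = \frac{\left(\sum_{\beta=1}^{p} u_\beta v_\beta\right)^2}{\sum_{\beta=1}^{n+2} u_\beta^2 \,\sum_{\beta=1}^{n+2} v_\beta^2}.$$

But

(3.10)
$$\left(\sum_{\beta=1}^{p} u_\beta v_\beta\right)^2 \le \sum_{\beta=1}^{p} u_\beta^2 \,\sum_{\beta=1}^{p} v_\beta^2,$$

therefore

$$m_3^2 \le \frac{\sum_{\beta=1}^{p} u_\beta^2}{\sum_{\beta=1}^{n+2} u_\beta^2} \cdot \frac{\sum_{\beta=1}^{p} v_\beta^2}{\sum_{\beta=1}^{n+2} v_\beta^2},$$

that is,

(3.11)
$$m_3^2 \le m_1 m_2.$$

From this, by noting that $m_1 = O_p(1/n)$ and $m_2 = O_p(1/n)$, we conclude that $m_3 = O_p(1/n)$.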
In this section we shall prove a result which will be he1p-
ful in finding asymptotic values of triple integrals of the type
19
iff m VIm V2m v3 Iml m31 a Il-ml m3 Ib. 1 2, m, m2 m, I-m2
D
where b is a large number. The result can be stated as
Theorem 2. If ml , m2, m, are random variables as defined in
-1" each depending on n, and each being of order n in probability,
then
(4.2)pUm
n -> 00
1
Proof: We shall replace ml , m2 and m, by ~, ~ and *respectively.
The variables 0, ~ and r are therefore of order one in probab11-
ity. This means for a given €; there exist numbers N(e), Ale' A2€
and A3e
, such that
for n > N- e
If in (4.3) each of Ale' A2 € and A3€ is replaced by Ae = max (Ale'
A2e, A,e)' the inequalities will still hold. In terms of these
variables we have to show
plim
(4.4)n --i> 00
2 n_ ~+13 + OB-r 7
n 2-n
e -a-f3 = 1
(4.6)
". ./
To show this we consider the;funccion
g(a,~,y) = log f(a,~,y)
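and expand it by Taylor's Theorem, with a remainder after two terms, namely

(4.7)
$$g(\alpha,\beta,\gamma) = g(0,0,0) + \left(\alpha\frac{\partial}{\partial\alpha} + \beta\frac{\partial}{\partial\beta} + \gamma\frac{\partial}{\partial\gamma}\right) g(0,0,0) + \frac{1}{2}\left(\alpha\frac{\partial}{\partial\alpha} + \beta\frac{\partial}{\partial\beta} + \gamma\frac{\partial}{\partial\gamma}\right)^{2} g(\theta,\phi,\psi),$$

where $0 < \theta < \alpha$, $0 < \phi < \beta$, $0 < \psi < \gamma$.

We have

$$g(\alpha,\beta,\gamma) = \alpha + \beta + n \log\left[1 - \frac{\alpha+\beta}{n} + \frac{\alpha\beta - \gamma^2}{n^2}\right],$$

so that

$$\frac{\partial g}{\partial\alpha} = 1 - \frac{1 - \frac{\beta}{n}}{\left(1-\frac{\alpha}{n}\right)\left(1-\frac{\beta}{n}\right) - \frac{\gamma^2}{n^2}}, \qquad \frac{\partial g}{\partial\beta} = 1 - \frac{1 - \frac{\alpha}{n}}{\left(1-\frac{\alpha}{n}\right)\left(1-\frac{\beta}{n}\right) - \frac{\gamma^2}{n^2}}, \qquad \frac{\partial g}{\partial\gamma} = -\frac{\frac{2\gamma}{n}}{\left(1-\frac{\alpha}{n}\right)\left(1-\frac{\beta}{n}\right) - \frac{\gamma^2}{n^2}}.$$

Also

$$\frac{\partial^2 g}{\partial\alpha^2} = -\frac{\frac{1}{n}\left(1-\frac{\beta}{n}\right)^2}{\left[\left(1-\frac{\alpha}{n}\right)\left(1-\frac{\beta}{n}\right) - \frac{\gamma^2}{n^2}\right]^2}, \qquad \frac{\partial^2 g}{\partial\beta^2} = -\frac{\frac{1}{n}\left(1-\frac{\alpha}{n}\right)^2}{\left[\left(1-\frac{\alpha}{n}\right)\left(1-\frac{\beta}{n}\right) - \frac{\gamma^2}{n^2}\right]^2},$$

$$\frac{\partial^2 g}{\partial\gamma^2} = -\frac{\frac{2}{n}\left[1 - \frac{\alpha}{n} - \frac{\beta}{n} + \frac{\alpha\beta + \gamma^2}{n^2}\right]}{\left[\left(1-\frac{\alpha}{n}\right)\left(1-\frac{\beta}{n}\right) - \frac{\gamma^2}{n^2}\right]^2},$$

$$\frac{\partial^2 g}{\partial\alpha\,\partial\beta} = -\frac{\gamma^2/n^3}{\left[\left(1-\frac{\alpha}{n}\right)\left(1-\frac{\beta}{n}\right) - \frac{\gamma^2}{n^2}\right]^2}, \qquad \frac{\partial^2 g}{\partial\alpha\,\partial\gamma} = -\frac{\frac{2\gamma}{n^2}\left(1-\frac{\beta}{n}\right)}{\left[\left(1-\frac{\alpha}{n}\right)\left(1-\frac{\beta}{n}\right) - \frac{\gamma^2}{n^2}\right]^2}, \qquad \frac{\partial^2 g}{\partial\beta\,\partial\gamma} = -\frac{\frac{2\gamma}{n^2}\left(1-\frac{\alpha}{n}\right)}{\left[\left(1-\frac{\alpha}{n}\right)\left(1-\frac{\beta}{n}\right) - \frac{\gamma^2}{n^2}\right]^2}.$$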
Now $g$, $\partial g/\partial\alpha$, $\partial g/\partial\beta$, $\partial g/\partial\gamma$ are all zero for $(\alpha, \beta, \gamma) = (0, 0, 0)$. Thus (4.7) gives
(4.8)
where the value of each of the derivatives involved is calculated
(4.9) 1 ci $2- ~{l
41 2R2 = --(l - ....) - -) +
r' 2 . 2n n 2n ng 41 '!rt::-
Ll - - - - + Q<I>- 7n n 2_
n
21(1n
_ t )n
(4.10)
Using the inellualities of (4.3) in (4.9) we get
2 22A A
(1 - Ii" - -2 )n
and
in the probability sense; where the two expressions inside the
brackets in (4.10) are calculated from (4.9) by c~nsidering the
fact that R2 may be positive or negative. Since A is finite and
independent of n , therefore both the expressions tend to zero as
n tends to infinity or
PUm R2 = 0
n > 00
24
Hence
2 n R2PUm eo.:+t.3 Tl _ a+f3 + et(3-1 7 = Plim e = 1L n 2-
n
That is, _ Ci+f3 + Cf,f3-y21 nn 2
nis asymptotically equivalent to
e ~-f3 ~n the b bOlot~ pro a 1 ~ Y sense. Note: An alternative proof of the
statement (4.2) shall be provided if we are able to prove that
(4.12) plim fn log (1
n:->co
c h- - + - +n 2
n
2where c stands for Cf, + (3 J and h = Cf,f3 - y and Ct,(3 and yare res-
tricted by the conditions (4.3).
It is easy to verify that
(4.13) -x xn:x ~ log (1- ii) $xn
x 3- -2
2n
in which the lower bound is written by observing that
2 3 2 3x x x x x x x- log(1 - Ii') :; n+ ~ + ~ + ••• ~ n + (n) + (n) + ••• ,
2n 3n;)
~'5
and both the -limits coverge to zero because of the restrictions
(4.3). This proves the statement (4.12) and hence the Theorem.
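The limit asserted by the theorem can be observed numerically for fixed values of $\alpha$, $\beta$ and $\gamma$. The sketch below is only an illustration: it evaluates $e^{\alpha+\beta}[1 - (\alpha+\beta)/n + (\alpha\beta - \gamma^2)/n^2]^n$ for increasing n. The particular values 1.5, 0.7 and 0.4 and the function name are arbitrary choices made for the example.

```python
import math

def ratio(alpha, beta, gamma, n):
    """e^(alpha+beta) * [1 - (alpha+beta)/n + (alpha*beta - gamma^2)/n^2]^n,
    computed through logarithms for numerical stability."""
    base = 1 - (alpha + beta) / n + (alpha * beta - gamma ** 2) / n ** 2
    return math.exp(alpha + beta + n * math.log(base))

for n in (10, 100, 1000, 10000):
    print(n, ratio(1.5, 0.7, 0.4, n))
```

The printed values should approach 1 as n increases, the departure from 1 being of order $1/n$, in line with the behaviour of the remainder $R_2$.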
5. A triple integral.

In this and the remaining sections of this chapter, we shall confine ourselves to the study of the integral

(5.1)
$$I = \iiint_D (m_1 m_2 - m_3^2)^{\frac{p-3}{2}} \left[(1-m_1)(1-m_2) - m_3^2\right]^{\frac{n-p-1}{2}} dm_1\, dm_2\, dm_3,$$

where D is defined in (2.2). To find an asymptotic approximation to the value of I we first write

(5.2)
$$I = I_1 + I_2,$$

where $I_1$ and $I_2$ denote the values of the integral over $D_1$ and $D_2$. To find $I_1$, we shall first evaluate the integral

(5.3)
$$I_{11} = \iiint_{D_1} (m_1 m_2 - m_3^2)^{\frac{p-3}{2}}\, e^{-\frac{n}{2}(m_1+m_2)}\, dm_1\, dm_2\, dm_3$$

and then find an upper bound, E, to

(5.4)
$$|I_1 - I_{11}|,$$

that is, an upper bound for the error committed in replacing the factor $\left[(1-m_1)(1-m_2) - m_3^2\right]^{\frac{n-p-1}{2}}$ in the integrand by $e^{-\frac{n}{2}(m_1+m_2)}$. Using (5.3) and (5.4), we can state that

(5.5)
$$I_{11} - E \le I_1 \le I_{11} + E.$$

It will then be demonstrated that both the error committed in approximating $I_1$ by $I_{11}$ and the value of $I_2$ are negligible as compared to the least possible value of $I_1$. Mathematically

(5.6)
$$\lim_{n \to \infty} \frac{E}{I_{11} - E} = 0$$
and

(5.7)
$$\lim_{n \to \infty} \frac{I_2}{I_{11} - E} = 0.$$

As a consequence of (5.5) and (5.6) we can write

(5.8)
$$I_1 \sim I_{11},$$

and as a result of (5.1), (5.7) and (5.8) we can write

(5.9)
$$I \sim I_{11}.$$
This will be the general line of argument to be followed in obtaining an asymptotic approximation for the value of I.

It would appear from Theorem 2, proved in section 4, that the approximation $e^{-\frac{n}{2}(m_1+m_2)}$ for $\left[(1-m_1)(1-m_2) - m_3^2\right]^{\frac{n-p-1}{2}}$ is valid only in the domain $D^* \subset D_1$ which is such that throughout $D^*$ the variables $m_i$, $i = 1, 2$ and 3, are $O_p(1/n)$. We shall, however, work in terms of the division $D_1$ and $D_2$ of the total domain and use the exponential approximation over the whole of $D_1$ because of certain simplifications which result. Justification of the results thus obtained is provided by two factors:

(1) The integrand shows that almost the whole of the density is concentrated in that part of the domain $D_1$ which is close to the origin. In fact, if we define a domain $D^* \subset D$ by the inequalities

(5.10)
$$m_1 m_2 - m_3^2 \ge 0, \qquad m_1 + m_2 \le \frac{A}{n},$$

then it is shown in section 11 that $D^*$ contains almost the whole of the density. This is probably the main reason why the exponential approximation, which is true over the domain $D^*$, gives close results.

(2) The discussion on the upper bound to error given in section 7 actually proves that the loss of accuracy in using $e^{-\frac{n}{2}(m_1+m_2)}$ instead of $\left[(1-m_1)(1-m_2) - m_3^2\right]^{\frac{n-p-1}{2}}$ is negligible when n is large.

It may also be remarked at this point that the exact value of I is known from Sitgreaves [45]. It would appear obvious, therefore, that the asymptotic value of I could be obtained from the one given by Sitgreaves by using Stirling's approximation to $\Gamma(x)$. This would no doubt hold true provided we were interested merely in the asymptotic value of I. The reason, however, for our following an independent approach is that we are interested in finding the solution to a distribution problem. The techniques and simplifications used in the asymptotic evaluation of I, which emerge mainly as a result of the supposition that n tends to infinity, will be used in evaluating the limiting moments of a certain statistic, to be called Wald's approximate classification statistic. This distribution problem will be our subject of discussion in Chapter III.
6. The integral over $D_1$: an asymptotic approximation.
(6.1)    Let $I_1 = \iiint_{D_1} (m_1m_2-m_3^2)^{\frac{p-3}{2}}\,[(1-m_1)(1-m_2)-m_3^2]^{\frac{n-p-1}{2}}\,dm_1\,dm_2\,dm_3,$
where $D_1$ is defined by the following inequalities:
(6.2)    $m_1m_2 - m_3^2 > 0, \quad m_1 + m_2 < 1, \quad 0 \le m_1 < 1, \quad 0 \le m_2 < 1;$
and where $m_i = O_p(1/n)$ for $i = 1, 2$ and $3$. We shall replace the second factor in (6.1) by $e^{-\frac{n}{2}(m_1+m_2)}$, but the operation of integration
after this replacement needs some justification. There is no loss
of generality if we consider a similar univariate case and prove
that it is possible to replace a binomial raised to a large power by
an exponential factor to which it increases. We shall state this
result formally as
Lemma. Let $f(x)$ be a function of the real variable $x$, such that $|f(x)| < A$ if $0 \le x \le c$, where $0 < c < \infty$. Then
(6.3)    $\lim_{n\to\infty}\int_0^c \left(1-\frac{x}{n}\right)^n f(x)\,dx = \int_0^c e^{-x}f(x)\,dx.$
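Proof.
(6.4)    Let $I_d = \int_0^c f(x)\left[\left(1-\frac{x}{n}\right)^n - e^{-x}\right]dx.$
Then (6.3) states that $\lim_{n\to\infty}|I_d| = 0$.
It is known that
$0 \le e^{-x} - \left(1-\frac{x}{n}\right)^n \le \frac{x^2e^{-x}}{n}$
(see, for instance, Whittaker and Watson [54], page 242). Therefore, using this and condition (1), we get
(6.6)    $|f(x)|\left[e^{-x} - \left(1-\frac{x}{n}\right)^n\right] \le |f(x)|\,\frac{x^2e^{-x}}{n} < \frac{A\,x^2e^{-x}}{n}.$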
This shows that for all c
(6.7)
The quantity on the right hand side of (6.7) is positive and tends
to zero as n increases. Hence (6.3) is established.
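The lemma can be checked numerically. The following is only an illustrative sketch, not part of the original argument: the test function $f(x) = 1/(1+x)$ (so that $A = 1$), the value $c = 3$, the midpoint rule and all function names are assumptions made for the example.

```python
import math

def midpoint_integral(g, a, b, steps=4000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))

def lemma_gap(f, c, n):
    """|I_d| = | integral_0^c f(x) [ (1 - x/n)^n - e^{-x} ] dx |, as in (6.4)."""
    return abs(midpoint_integral(
        lambda x: f(x) * ((1.0 - x / n) ** n - math.exp(-x)), 0.0, c))

def lemma_bound(A, c, n):
    """(A/n) * integral_0^c x^2 e^{-x} dx, the bound implied by the
    inequality 0 <= e^{-x} - (1 - x/n)^n <= x^2 e^{-x} / n (valid for x <= n)."""
    return A / n * midpoint_integral(lambda x: x * x * math.exp(-x), 0.0, c)

def f_example(x):
    # Hypothetical bounded test function with |f(x)| <= 1 on [0, c].
    return 1.0 / (1.0 + x)

if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, lemma_gap(f_example, 3.0, n), lemma_bound(1.0, 3.0, n))
```

For each $n$ the printed gap should lie below the printed bound, and both should shrink roughly like $1/n$.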
We shall rewrite (6.1) as
(6.8)
and evaluate $I_{11}$, an asymptotic approximation to $I_1$. This will be followed by a section on an upper bound to the error in using $I_{11}$ in place of $I_1$.
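To integrate with respect to $m_3$ we first put
Thus
(6.10)
Integration with respect to $t$ gives
(6.11)
(6.12)
Making the transformation
$m_1 = z\cos^2\theta, \qquad m_2 = z\sin^2\theta,$
we have
$\frac{\partial(m_1, m_2)}{\partial(z, \theta)} = 2z\sin\theta\cos\theta;$
and so (6.11) becomes
(6.13)    $I_{11} = \frac{\Gamma(\frac12)\Gamma(\frac{p-1}{2})}{\Gamma(\frac p2)}\cdot 2\int_{z=0}^{1}\int_{\theta=0}^{\pi/2} e^{-\frac n2 z}\,z^{p-1}\cos^{p-1}\theta\,\sin^{p-1}\theta\;dz\,d\theta.$
Integration with respect to $\theta$ yields
(6.14)    $I_{11} = \frac{\Gamma(\frac12)\Gamma(\frac{p-1}{2})}{\Gamma(\frac p2)}\cdot\frac{\Gamma(\frac p2)\Gamma(\frac p2)}{\Gamma(p)}\int_{z=0}^{1} e^{-\frac n2 z}\,z^{p-1}\,dz;$
and putting $\frac n2 z = t$ we obtain the form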
Now for large n it is well known that
(6.16)
(6.17)
This further simplifies to
(6.18)
To be more exact we can write
(6.19)    $\int_0^{n/2} e^{-t}t^{p-1}\,dt = \int_0^\infty e^{-t}t^{p-1}\,dt - \int_{n/2}^\infty e^{-t}t^{p-1}\,dt,$
and successive integration by parts shows that the right hand side of (6.16) reduces to
(6.20)    $\Gamma(p) - \frac{(n/2)^{p-1}}{e^{n/2}} - (p-1)\frac{(n/2)^{p-2}}{e^{n/2}} - \cdots,$
which can be written as $\Gamma(p) - O\!\left(\frac{n^{p-1}}{e^{n/2}}\right)$. Since $\frac{n^{p-1}}{e^{n/2}}$ tends to zero as $n$ tends to infinity,
(6.21)    $I_{11} = \frac{4\pi(p-2)!}{n^p}\,(1+\varepsilon_n).$
We will show in Section 7 that the error in taking $I_{11}$ as an approximation to the value of $I_1$ is negligible in comparison to the value of $I_1$.
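As a sanity check on the closed form, the gamma-function expression obtained from (6.14), namely $\Gamma(\frac12)\Gamma(\frac{p-1}{2})\Gamma(\frac p2)(2/n)^p$, can be compared with $4\pi(p-2)!/n^p$, and the effect of truncating the $t$-integral at $n/2$ as in (6.19) can be measured. This is a minimal sketch for integer $p \ge 2$; the function names are hypothetical and the closed form for the tail integral (valid for integer $p$) is a standard identity rather than something taken from the text.

```python
import math

def i11_gamma_form(p, n):
    """Gamma(1/2) Gamma((p-1)/2) Gamma(p/2) (2/n)^p."""
    return (math.gamma(0.5) * math.gamma((p - 1) / 2)
            * math.gamma(p / 2) * (2.0 / n) ** p)

def i11_factorial_form(p, n):
    """4 pi (p-2)! / n^p."""
    return 4.0 * math.pi * math.factorial(p - 2) / n ** p

def truncation_factor(p, n):
    """[integral_0^{n/2} e^{-t} t^{p-1} dt] / Gamma(p) for integer p.

    Uses integral_a^inf t^{p-1} e^{-t} dt = (p-1)! e^{-a} sum_{j<p} a^j / j!.
    """
    a = n / 2.0
    tail_over_gamma = math.exp(-a) * sum(a ** j / math.factorial(j) for j in range(p))
    return 1.0 - tail_over_gamma

if __name__ == "__main__":
    for p in (2, 3, 5):
        for n in (20, 100):
            print(p, n, i11_gamma_form(p, n), i11_factorial_form(p, n),
                  truncation_factor(p, n))
```

The two forms agree to rounding error, and the truncation factor approaches 1 very quickly as $n$ grows, in line with the order $n^{p-1}/e^{n/2}$ of the neglected term.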
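It may be remarked in passing that the integral occurring in (6.11) could also be evaluated by using Dirichlet's formula [54, p. 258], namely
(6.22)    $\int\!\!\int\cdots\int t_1^{\alpha_1-1}\cdots t_n^{\alpha_n-1}\,f(t_1+t_2+\cdots+t_n)\,dt_1\cdots dt_n = \frac{\Gamma(\alpha_1)\Gamma(\alpha_2)\cdots\Gamma(\alpha_n)}{\Gamma(\alpha_1+\alpha_2+\cdots+\alpha_n)}\int_0^1 z^{\sum_{i=1}^n\alpha_i-1}\,f(z)\,dz.$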
7. An upper bound to error.
In this section we shall consider the following problem: How much error is committed by replacing
$[(1-m_1)(1-m_2)-m_3^2]^{\frac{n-p-1}{2}}$
by
$e^{-\frac n2(m_1+m_2)}$
in the triple integral (6.1) over the domain $D_1$? We shall consider two separate cases,
(A)
and
(B) >
and find an upper bound to error in both cases. The larger of these shall ultimately be taken as the upper bound.
Case A. Let $I_{d_1}$ denote the difference
n-p-l2
$I_{d_1}$ will not decrease if in the factor $[1 - m_1 - m_2 + m_1m_2 - m_3^2]$ we omit $m_3^2$ and replace $m_1m_2$ by $\left(\frac{m_1+m_2}{2}\right)^2$. Making these changes, we get
The variable $m_3$ can be integrated out by using the transformation
Using (6.22) in the double integral involved in (7.3), we get
(7.4)    $I_{d_1} < \frac{\Gamma(\frac12)\Gamma(\frac{p-1}{2})\Gamma(\frac p2)}{\Gamma(p)}\int_{z=0}^{1} z^{p-1}\left[\left(1-\frac z2\right)^{n-p-1} - e^{-\frac n2 z}\right]dz.$
Replacing $z/2$ by $t$, (7.4) becomes
The expression
$(1-t)^{n-p-1} - e^{-nt}$
can be written as
(7.7)    $(1-t)^{n-p-1} - e^{-(n-p-1)t}\cdot e^{-(p+1)t},$
which can further be written as
(7.8)    $(1-t)^{n'} - e^{-n't}\left[1 - (p+1)t + \frac{(p+1)^2t^2}{2!} - \frac{(p+1)^3t^3}{3!} + \cdots\right],$
where $n' = n-p-1$; and using the fact that $1 - y \le e^{-y}$ we find as a first approximation that
(7.9)    $(1-t)^{n'} - e^{-nt} < (1-t)^{n'} - e^{-n't} + e^{-n't}(p+1)t.$
This reduces to
(7.10)    $(1-t)^{n'} - e^{-nt} < e^{-n't}(p+1)t$
by using the well known inequality
Using (7.10) in (7.5), we get
(7.12)    $I_{d_1} < \frac{4\pi}{p-1}\int_{t=0}^{1/2} t^{p}\,(p+1)\,e^{-(n-p-1)t}\,dt.$
The integral in (7.12) can be simplified by replacing $(n-p-1)t$ by $w$ and extending the upper limit of integration for $w$ to infinity instead of $\frac{n-p-1}{2}$. This will only increase the upper bound for $I_{d_1}$, and we get a simpler result, namely
(7.13)    $I_{d_1} < \frac{4\pi(p+1)}{(p-1)(n-p-1)^{p+1}}\int_{w=0}^{\infty} w^{p}e^{-w}\,dw.$
On simplifying (7.13) we obtain
(7.14)    $I_{d_1} < \frac{4\pi(p+1)!}{(p-1)(n-p-1)^{p+1}},$
which gives an upper bound to error in case A.
Case B. Now suppose
and let
Omitting the factor $m_1m_2 - m_3^2$, which is non-negative in the domain of integration, one can write that
(7.16)
Integrating out $m_3$ by the same transformation as was used in Case A, we get
*(7.17) Id <1 jf
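Using (6.22) on the double integral involved in (7.17), we obtain
(7.18)    $I^*_{d_1} < \frac{\Gamma(\frac12)\Gamma(\frac{p-1}{2})\Gamma(\frac p2)}{\Gamma(p)}\int_{z=0}^{1} z^{p-1}\left[e^{-\frac n2 z} - (1-z)^{\frac{n-p-1}{2}}\right]dz.$
Write
(7.19)    $e^{-\frac n2 z} - (1-z)^{\frac{n-p-1}{2}} = e^{-\frac{n'}{2}z}\,e^{-\frac{p+1}{2}z} - (1-z)^{\frac{n'}{2}},$
where $n' = n-p-1$. Then
(7.20)    $e^{-\frac n2 z} - (1-z)^{\frac{n-p-1}{2}} = e^{-\frac{n'}{2}z}\left[1 - \frac{p+1}{2}z + \cdots\right] - (1-z)^{\frac{n'}{2}}.$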
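Since $e^{-y} \le 1$, as a first approximation we can write
(7.21)    $e^{-\frac n2 z} - (1-z)^{\frac{n-p-1}{2}} < e^{-\frac{n'}{2}z} - (1-z)^{\frac{n'}{2}},$
and the use of (7.11) gives
(7.22)    $e^{-\frac n2 z} - (1-z)^{\frac{n-p-1}{2}} < \frac{n'}{2}\,z^2\,e^{-\frac{n'}{2}z}.$
Replacing $n'$ by $n-p-1$, and using this result in (7.18), we get
$I^*_{d_1} < \frac{\Gamma(\frac12)\Gamma(\frac{p-1}{2})\Gamma(\frac p2)}{\Gamma(p)}\int_{z=0}^{1} z^{p+1}\cdot\frac{n-p-1}{2}\,e^{-\frac{n-p-1}{2}z}\,dz.$
We put $\frac{n-p-1}{2}z = w$ to get
$I^*_{d_1} < \frac{\Gamma(\frac12)\Gamma(\frac{p-1}{2})\Gamma(\frac p2)}{\Gamma(p)}\left(\frac{2}{n-p-1}\right)^{p+1}\int_{w=0}^{\frac{n-p-1}{2}} w^{p+1}e^{-w}\,dw.$
Extending the range of the integral to infinity and integrating, we get
$I^*_{d_1} < \frac{\Gamma(\frac12)\Gamma(\frac{p-1}{2})\Gamma(\frac p2)}{\Gamma(p)}\left(\frac{2}{n-p-1}\right)^{p+1}\Gamma(p+2) = \frac{8\pi(p+1)!}{(p-1)(n-p-1)^{p+1}}.$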
The larger of the bounds $I_{d_1}$ and $I^*_{d_1}$, namely
(7.26)    $\frac{8\pi(p+1)!}{(p-1)(n-p-1)^{p+1}},$
can therefore be taken as an upper bound to the error involved in replacing the factor raised to a high power in the integrand by an exponential factor. It may be remarked here that (7.26) gives only a first approximation for the upper bound to error, and that a closer bound would be obtained if we considered four terms in the expansion of $e^{-(p+1)t}$ in (7.8), and three terms in the expansion of $e^{-\frac{p+1}{2}z}$ in (7.20). Needless to say, we can get closer and closer bounds by considering a larger number of terms in (7.8) and (7.20). It should be noted that the result (7.26) enables us to put greater confidence in our approximation of the value of $I$, which is of order $\frac{1}{n^p}$; because (7.26) asserts that the maximum error committed by supposing that $I_1$ is approximated by $I_{11}$ is of order $\frac{1}{n^{p+1}}$, and therefore negligible for large values of $n$.
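The bound $\frac{8\pi(p+1)!}{p-1}\cdot\frac{1}{(n-p-1)^{p+1}}$ can be rewritten as $\frac{R_{D_1}}{n^{p+1}}$ by using the inequality $\frac{1}{(n-p-1)^{p+1}} < \frac{2}{n^{p+1}}$ for large $n$. Thus
(7.27)    $|\text{Error}| < \frac{16\pi(p+1)!}{p-1}\cdot\frac{1}{n^{p+1}} = \frac{R_{D_1}}{n^{p+1}},$ say.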
As a result of the discussion in Sections 6 and 7 we can write
a formal proof of
Theorem 3.
Proof:
From the results of Sections 6 and 7 we can write
(7.28)
where $|J_{D_1}| < R_{D_1} = \frac{16\pi(p+1)!}{p-1}$ and $\lim_{n\to\infty}\varepsilon_n = 0$.
Multiplying both sides of (7.28) by $n^p$, and taking the limit, as $n$ tends to infinity, of the right hand side in
$4\pi(p-2)! + 4\pi(p-2)!\,\varepsilon_n$
the truth of Theorem 3 is established.
8. The integral over $D_2$.
We now consider the integral
(8.1)
where $D_2$ is defined by the inequalities
(8.2)    $m_1 + m_2 > 1, \qquad 0 \le m_1 \le 1, \qquad 0 \le m_2 \le 1.$
To integrate (8.1) with respect to $m_3$, we make the transformation
$m_3 = [(1-m_1)(1-m_2)\,t]^{\frac12}, \qquad dm_3 = \tfrac12[(1-m_1)(1-m_2)]^{\frac12}\,t^{-\frac12}\,dt.$
Then
If we notice that
$F(a,b,c,x) = \frac{\Gamma(c)}{\Gamma(b)\Gamma(c-b)}\int_0^1 \theta^{b-1}(1-\theta)^{c-b-1}(1-\theta x)^{-a}\,d\theta,$
provided that $|x| < 1$, we can rewrite (8.4) as
This step is justified by the fact that
(8.7)    $\frac{(1-m_1)(1-m_2)}{m_1m_2} < 1$
in the domain under consideration; except that on the surface of the plane $m_1 + m_2 = 1$ we have an equality sign in (8.7). But the omission of the point set determined by the plane $m_1 + m_2 = 1$ does not alter the value of $I_2$, since it forms a set of measure zero.
(8.8)    Since $F(a,b,c,x) = 1 + \frac{ab}{c}x + \frac{a(a+1)\,b(b+1)}{c(c+1)\,2!}x^2 + \cdots,$
where, for the hypergeometric function involved in (8.6), we have a series in which the $r$th term is of order $\frac{1}{n^{r-1}}$. Hence for asymptotic purposes the first term in the expansion of $F(a,b,c,x)$, namely unity, will provide a reasonably good approximation.
With these considerations in view, we can rewrite (8.6) as
p-3- 2(n-p+2)
jf
Each of the double integrals involved in (8.9) can be changed
to a repeated integral and evaluated. As an example, we consider
the first one, namely
(8.10)
This can be written as
(8.U)
m =01
m = I-m2 1
Integration by parts shows that
±'
(8.12)
(8.13)
p-7 n-p+6p-3 p-5 2 2 2
n-p+2 • n-pt4 • n-p+6 • (1-m1) ~. + •••
Using (8.12) in (8.11) we get the series:
+~ p-5n-p+2 • n-pt4 •
Writing the values of the integrals involved above, (8.13)
gives rise to the series:
(8.14) 2n-p+2
r(~)r(~)r(n}
r(n+25)r(n
2-5} ;
p-3 p-5 2+ n-p+2 • n-pt4 • n-p+6 • ----- + •••
ren)
Similarly we can find the series expansions for the values
of the remaining double integrals involved in (8.9). Using (8.14)
as the value of (8.10), and similar values for the other integrals
(8.9) gives
(8.15) _ _ .....;.;.:J'(.l.(n;...-...=.p...r..)..;..t__12 =(n-2) t(n-p+2)2n-p
O( 1 )n32n
Use can be made of Stirling's approximation to the value of $\Gamma(x)$, namely
(8.16)    $\Gamma(x) = e^{-x}\,x^{x-\frac12}\,(2\pi)^{\frac12}\left[1 + \frac{1}{12x} + \cdots\right].$
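The quality of Stirling's approximation (8.16) is easy to inspect numerically. The sketch below is illustrative only; the function name and the `terms` switch (one or two terms of the bracket) are assumptions made for the example.

```python
import math

def stirling_gamma(x, terms=2):
    """Stirling's approximation e^{-x} x^{x-1/2} sqrt(2 pi) [1 + 1/(12x) + ...].

    terms=1 keeps only the leading 1 in the bracket; terms=2 adds 1/(12x).
    """
    bracket = 1.0
    if terms >= 2:
        bracket += 1.0 / (12.0 * x)
    return math.exp(-x) * x ** (x - 0.5) * math.sqrt(2.0 * math.pi) * bracket

def relative_error(x, terms=2):
    """Relative error of the approximation against math.gamma."""
    exact = math.gamma(x)
    return abs(stirling_gamma(x, terms) - exact) / exact

if __name__ == "__main__":
    for x in (5.0, 10.0, 50.0):
        print(x, relative_error(x, 1), relative_error(x, 2))
```

The relative error with one term behaves like $1/(12x)$, and adding the second term reduces it by roughly two further orders of magnitude at $x = 10$.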
Equation (8.15) shows that the principal term in the value of $I_2$ is of order $\frac{1}{n^2 2^n}$, where the actual value of $I_2$ differs from the principal term by terms which are of order $\frac{1}{n^3 2^n}$ and higher.
9. An upper bound to the value of $I_2$.
We can start from (8.6) and write
p-3 n-p r{!)r(n-p+I)2 2 2 2
(m1m) L(l-ml ) (l-m,,;)_7 --~~-~ ~ r(n-r 2 )
where it is known that
(9.2)    $\frac{(1-m_1)(1-m_2)}{m_1m_2} \le 1.$
The maximum value of the hypergeometric series involved will correspond to the case in which $\frac{(1-m_1)(1-m_2)}{m_1m_2} = 1$, and in that case, using the formula
$F(a,b,c,1) = \frac{\Gamma(c)\,\Gamma(c-a-b)}{\Gamma(c-a)\,\Gamma(c-b)},$ we get
Since
(9.4) can be rewritten as
j}Transformation
(9.7) 2ml = z cos 9
. 2 9m2 = Z SUI
reduces (9.6) to
r(~)r(n;2)I < 2 Co ~
2 - (n-l)rT
By using the formula
,J2 j; p-2 n-p p-2 p-2
z (l-~) sin 9 cos 9 d9 dz •
z=l 9=0
1f
)~
o
a-lsin 9
b-lcos G
1 r(~)r(~)d9 :0 -2 b
r(a~ )for
integration with respect to 9 , (9.8) reduces to
2
jz=l
p-2(1 z)n-p dz 2' z.
This inequality is the same as
(9.11)
1
j p-2n-pw (l-w) dw.
1w=-2
Observing that
/1'2
we have
p-2w
1
(l_w)n-p dw = 1 _ p-2 j wp - 3(1_w)n-ptl dw,2n-l(n_ptl) n-ptl
1'2
(9.12)f(P;l)r(P;l)
r(p-l) 2n-p1
(n-p+l)
Inequality (9.12) shows that we can find a number $R_{D_2}$ such that
Slight simplification would indicate that if $n$ is so large that Stirling's approximation for $\Gamma(n)$ is valid then $R_{D_2} = 16$ would give a liberal upper bound.
10. Comparison of $I_1$ and $I_2$.
In Section 8 we proved that
(10.1)
where $c$ is some constant, and in Section 9 we established that
(10.2)
A comparison of these results with the value of $I_1$, namely
(10.3)
where $J_{D_1}$ is a certain constant less in absolute value than another known constant which is independent of $n$, shows that
(10.4)    $\lim_{n\to\infty}\frac{I_2}{I_1} = 0.$
This statement follows from the obvious fact that $2^n$ tends to infinity more rapidly than $n^p$ where $p$ is finite. This means that the relative contribution of the domain $D_2$ to the value of the integral $I$ carried over $D$ is negligible in the limit.
Theorem 4.
Proof.
From (5.2), $I = I_1 + I_2$. Using (10.4) we have $I \sim I_1$, and from Theorem 3, $I_1 \sim \frac{4\pi(p-2)!}{n^p}$. Hence $\frac{4\pi(p-2)!}{n^p}$, which can also be written as $\Gamma(\frac12)\Gamma(\frac{p-1}{2})\Gamma(\frac p2)\left(\frac 2n\right)^p$, is an asymptotic approximation to the value of $I$.
As a further check of the correctness of our approximation $I_{11}$, we can compare it with the exact value of the integral referred to in Section 5. That value can be written as
(10.6)    $I_s = \frac{4\pi(p-2)!}{n(n-1)\cdots(n-p+1)},$
where the subscript 's' is for the author of the formula. Comparing it with $I_{11}$ written in (10.5), we have
$\lim_{n\to\infty}\frac{I_{11}}{I_s} = \lim_{n\to\infty}\frac{n(n-1)\cdots(n-p+1)}{n^p} = 1.$
Hence our approximation is asymptotically equivalent to $I_s$, the exact value, in the sense of Poincaré [13].
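The rate at which the ratio of the asymptotic value to the exact value approaches unity can be tabulated directly. This is a small sketch under the formulas quoted above; the function names are hypothetical.

```python
import math

def exact_value(p, n):
    """I_s = 4 pi (p-2)! / [n (n-1) ... (n-p+1)], as in (10.6)."""
    falling = math.prod(n - j for j in range(p))
    return 4.0 * math.pi * math.factorial(p - 2) / falling

def asymptotic_value(p, n):
    """4 pi (p-2)! / n^p, the asymptotic approximation."""
    return 4.0 * math.pi * math.factorial(p - 2) / n ** p

def ratio(p, n):
    """Asymptotic value divided by exact value: n(n-1)...(n-p+1) / n^p."""
    return asymptotic_value(p, n) / exact_value(p, n)

if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, ratio(3, n))
```

For $p = 3$ the ratio is $0.72$ at $n = 10$ and exceeds $0.99$ at $n = 1000$, so the approximation is asymptotically exact but slow for small $n$.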
11. The integral over the domain $D^*$.
Domain $D^*$ was defined in Section 5 as that subset of $D_1$ in which $m_i = O_p(1/n)$ for $i = 1, 2$ and $3$. Since $-\sqrt{m_1m_2} \le m_3 \le \sqrt{m_1m_2}$, one way of characterizing this domain would be to say that $D^*$ corresponds to the inequalities
(11.1)    $m_1m_2 - m_3^2 > 0, \qquad 0 \le m_1 + m_2 < \frac An,$
where $A$ is a finite number, independent of $n$.
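We can evaluate the integral over $D^*$ as follows:
Let
(11.2)
Integration with respect to $m_3$ by the usual transformation gives
(11.3)    $I^* = \frac{\Gamma(\frac12)\Gamma(\frac{p-1}{2})}{\Gamma(\frac p2)}\iint (m_1m_2)^{\frac{p-2}{2}}\,e^{-\frac n2(m_1+m_2)}\,dm_1\,dm_2.$
Putting $m_1 = z\cos^2\theta$ and $m_2 = z\sin^2\theta$, and integrating out $\theta$, we get
(11.4)    $I^* = \frac{\Gamma(\frac12)\Gamma(\frac{p-1}{2})\Gamma(\frac p2)}{\Gamma(p)}\int_{z=0}^{A/n} z^{p-1}e^{-\frac n2 z}\,dz.$
Substituting $w$ for $\frac n2 z$, (11.4) can be written as
(11.5)    $I^* = \frac{\Gamma(\frac12)\Gamma(\frac{p-1}{2})\Gamma(\frac p2)}{\Gamma(p)}\left(\frac 2n\right)^p\int_{w=0}^{A/2} w^{p-1}e^{-w}\,dw.$
Thus
(11.6)    $I^* = \frac{\Gamma(\frac12)\Gamma(\frac{p-1}{2})\Gamma(\frac p2)}{\Gamma(p)}\left(\frac 2n\right)^p\left[\Gamma(p) - \int_{A/2}^{\infty} w^{p-1}e^{-w}\,dw\right],$
which on further simplification gives
(11.7)    $I^* = \frac{4\pi(p-2)!}{n^p}\left[1 - \frac{1}{(p-1)!}\int_{A/2}^{\infty} w^{p-1}e^{-w}\,dw\right].$
Comparing this value with the exact value $I_s$ we have
(11.8)    $\lim_{n\to\infty}\frac{I^*}{I_s} = 1 - \frac{1}{(p-1)!}\int_{A/2}^{\infty} w^{p-1}e^{-w}\,dw,$
which is also $= \lim_{n\to\infty}\frac{I^*}{I_{11}}$.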
Since $A$ might be a large number though not of the order of $n$, the term
(11.9)    $\frac{1}{(p-1)!}\int_{A/2}^{\infty} w^{p-1}e^{-w}\,dw$
shall be small compared to 1; e.g., for $p = 3$ and $A = 200$, we get
$\frac{1}{(p-1)!}\int_{A/2}^{\infty} w^{p-1}e^{-w}\,dw = \tfrac12\,e^{-100}\,[100^2 + 200 + 2],$
which will give a small fraction, and the fact is established that almost the whole of the density is concentrated in the domain $D^*$ near the origin. Equation (11.8) would indicate that even for $A$ as small as 10, and $p = 3$ say, $D^*$ accounts for more than 99 per cent of the density. In practice, however, $A$ can be taken larger, consistent with (11.1).
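For integer $p$ the tail term (11.9) has a finite closed form, so the fraction of the density lying outside $D^*$ can be evaluated for any chosen $A$. The sketch below assumes integer $p \ge 1$ and uses the standard identity $\frac{1}{(p-1)!}\int_a^\infty w^{p-1}e^{-w}\,dw = e^{-a}\sum_{j=0}^{p-1} a^j/j!$; the function names are hypothetical.

```python
import math

def outside_fraction(p, A):
    """The term (11.9): (1/(p-1)!) * integral_{A/2}^inf w^{p-1} e^{-w} dw, integer p."""
    a = A / 2.0
    return math.exp(-a) * sum(a ** j / math.factorial(j) for j in range(p))

def inside_fraction(p, A):
    """Limiting value of I*/I_s in (11.8): the share of the density inside D*."""
    return 1.0 - outside_fraction(p, A)

if __name__ == "__main__":
    for A in (20, 50, 200):
        print(A, outside_fraction(3, A), inside_fraction(3, A))
```

Tabulating `inside_fraction` over a range of $A$ for the value of $p$ at hand shows how large $A$ must be taken before a given share of the density is captured.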
Another point needing clarification is the use of the exponential approximation over the domain $D_1 - D^*$. At this stage the justification is provided by the upper bound to error involved in using $e^{-\frac n2(m_1+m_2)}$ instead of $[(1-m_1)(1-m_2)-m_3^2]^{\frac{n-p-1}{2}}$ inside $D_1$, which was worked out in Section 7. The upper bound to error for $D_1$ was found to be $\frac{R_{D_1}}{n^{p+1}}$. A closer bound can be worked out for the domain $D_1 - D^*$, and it can be shown that it is a constant times the same upper bound multiplied by an integral of
This upper bound can be obtained by following the same lines as those followed in Section 7.
12. Summary of Chapter II.
In this chapter we have considered the asymptotic evaluation of the integral
(12.1)    $I = \iiint_D (m_1m_2-m_3^2)^{\frac{p-3}{2}}\,[(1-m_1)(1-m_2)-m_3^2]^{\frac{n-p-1}{2}}\,dm_1\,dm_2\,dm_3,$
where $D$ is determined by the fact that both factors involved in (12.1) are non-negative, and $0 \le m_i \le 1$, $i = 1, 2$. Two simplifications used in the evaluation of $I$ are:
(1) $D$ can be split up into two domains, $D_1$ and $D_2$, by the plane $m_1 + m_2 = 1$. The contribution due to the domain $D_2$, for which $m_1 + m_2 \ge 1$, is negligible in the limit, in comparison with that of $D_1$.
(2) The integral over $D_1$ is evaluated by replacing the factor $[(1-m_1)(1-m_2)-m_3^2]^{\frac{n-p-1}{2}}$ by $e^{-\frac n2(m_1+m_2)}$. The justification for the approximation thus obtained is provided partly by the probability order of the variables, and partly by the bounds to error found subsequently.
With these simplifications it is proved that
(12.2)
and that the exact value of $I$ can be written as
(12.3)
where the second term is the remaining contribution due to $D_1$, and the remaining terms give the integral over $D_2$. Bounds have been found, (7.27), for $J_{D_1}$ and, (9.13), for the integral over $D_2$. These have been shown to be negligible as compared to the principal term in the value of $I$, giving $\frac{4\pi(p-2)!}{n^p}$ as an asymptotic approximation to the value of $I$.
CHAPTER III
ON THE ASYMPTOTIC DISTRIBUTION OF WALD'S CLASSIFICATION STATISTIC IN THE NULL CASE
1. Introduction.
We are dealing with the problem of classifying an individual into one of two groups or populations such that the information regarding the two populations is based on two samples of sizes $N_1$ and $N_2$ respectively. One may be called upon to consider the following three situations:
(A) $N_1$ and $N_2$ large,
(B) $N_1 + N_2$ or $n\,(= N_1 + N_2 - 2)$ large,
(C) $N_1$ and $N_2$ small.
The study of case A is equivalent to the study of a linear function of normal variates, that is, treating the statistic $U$, defined in Chapter I, or the linear discriminant function, as normally distributed with means and covariance matrix replaced by their sample estimates to get the mean and variance of the approximating normal distribution. This case has been completely exploited by several workers in this field.
The results available in case C have been summarized in Sections 4 and 5 of Chapter I. The difficulties involved in obtaining the exact sampling distribution of $V$ from the joint distribution of $m_1$, $m_2$ and $m_3$ being substantial, it makes sense to ask whether it would be possible to get the distribution of $V$ in case B. Obviously the results obtained would not be as exact as one would like to have, but they should be better than the large sample normal approximation of case A. It is thus in the sense of large $n$ that we shall use the words "asymptotic" and "limiting", and it should be noted that the assumption $n$ large is less restrictive than the assumption $N_1$ and $N_2$ both large.
In this chapter we shall find the asymptotic moments of a statistic $v$ which will be called Wald's approximate classification statistic, and then use those moments to find the limiting distribution of $v$, in the null case, separately for even and for odd values of $p$.
2. Wald's approximate classification statistic and its moments.
From Chapter I we recall that Wald expressed the statistic ultimately as a function of three variables, and stated that
V =
can be considered as the classification statistic.
by Section 2 of Chapter II. Thus, by a convergence theorem due to Kolmogoroff [25], the distribution of $V$ can be well approximated by the distribution of $nm_3$ as stated by Wald. There is no loss of generality in considering
as the statistic instead of $nm_3$. We shall refer to this as the approximate classification statistic of Wald, as against the exact statistic $V$ suitable for small samples.
2A. Limiting moments of the statistic.
As a first step in finding $\nu_k$, the $k$th moment about the origin, we shall discuss briefly the value of the integral
(2.3)    $I^{(k)} = \iiint_D (nm_3)^k\,(m_1m_2-m_3^2)^{\frac{p-3}{2}}\,[(1-m_1)(1-m_2)-m_3^2]^{\frac{n-p-1}{2}}\,dm_1\,dm_2\,dm_3.$
If we recall the discussion about the domain $D$ from Section 2, Chapter II, it can be easily seen that the integral can be written as the sum of two integrals over the interiors of the two cones defined by $D_1$ and $D_2$. Thus (2.3) can be written as
(2.4)    $I^{(k)} = I_1^{(k)} + I_2^{(k)},$
where $I_1^{(k)}$ and $I_2^{(k)}$ denote the values of the integrals over the two cones $D_1$ and $D_2$.
Define
By the procedure followed in Section 6, Chapter II, we get
(2.6)
which, for $k = 0$, gives $I_{11}$ of Chapter II.
By following the methods of Sections 7 and 9 of Chapter II, we can show that the upper bound to the error in estimating $I_1^{(k)}$ by $I_{11}^{(k)}$ is of order $\frac{1}{n^{p+1}}$, and an upper bound to the value of $I_2^{(k)}$ is
Thus we can write
where
(2.8)
and
It should be noted that it is the upper bounds to, and not the exact values of, $J_{D_1}$ and $I_{D_2}$ that are known; and to avoid duplication in their derivation, since they are obtained in exactly the same way as similar bounds were found in Chapter II, we write the results. They are
(2.10)
and
(~ .11)r(~)r(¥)r(I9)r(£~)n
k
I D2 ~ r(n+~-l)r(P_l) 2n+k- p(n_p+k+l)
It is easy to see from (2.7), (2.10) and (2.11) that
Inlim 1 = 0 , and
min I(k)n -> 00
I Dlim 2 = 0 ,
n -> 00 min r(k)
showing thereby that $J_{D_1}$ and $I_{D_2}$ are negligible in comparison with the principal term in the value of $I^{(k)}$.
Dividing (2.7) by $I_{11}$ we get the expression for the asymptotic moments, namely
(2.12)
where
. nP+k2k+1
• (n"!J"iol)k+p+i
and
We shall rewrite (2.13) and (2.14) as
(2.15)
where
and
,
(2.17)
and
p+k+12k+l) k+l k+p ) ) n
Rl (k,n = r(T)r(T)(k+p (k+p+l k+p+l 1 P ,(n-p.l) r(2)r(~)
(2.18)
(2.19)
We will also write (2.12) as
p+k+ln
• (n-p+k+l)
•
•
We will refer to $\tilde\nu_k$ as the principal term in the value of $\nu_k$, because, as can be easily verified,
(2.20)    $\lim_{n\to\infty}\frac{R_{D_1}(k,n)}{\tilde\nu_k} = 0,$
(2.21)    $\lim_{n\to\infty}\frac{R_{D_2}(k,n)}{\tilde\nu_k} = 0.$
To conclude this section, therefore, we can state that (2.19) gives the $k$th moment $\nu_k$; and, because of (2.20) and (2.21), we can write
(2.22)
3. The asymptotic distribution of $v$ for $p = 2m$.
In this section we shall find the asymptotic distribution of $v$ for even values of $p$. By applying the general result we shall also explicitly obtain the distribution for $p = 2$, $4$ and $6$.
Lemma 5.1.
and
r(~)r(£f:)
r(p-l)r(~)r(%)f or large n ,
Proo£':
The maximum value of n •• :l" 2 fQr n.> 2p + 2, and(n-p'-l) . '. -
r(%)r(~)~ ~ -1:, thr:rl)!or; th') ~th -of (3~1.) is a.,;,tablished.
To prove (3.2) we consider
r(n-2)2"
r n+k-l~
n+k2
p+k+ln
(n-p+k+l)
~+k
-. t th t 1 < 3 for 11 1 d 1 n 1We fh-au no e a n-p+k+l n a arga n; an s noe -2n+k- l <
for all large n
Lemma 5.2
The series
and < 1 for all n ~ 5, (3.2) follows.
and
are both convergent.
Proof:
Let uk denote the kth.term of the series. For the series
(3.5), we have
The ratio
(3.8)
Using Stirling's approximation to factorials, we have
lim uuk := lim k~l (~).k -> 00 k+l k ->00 4
This simplifies to
$\lim_{k\to\infty}\frac{u_k}{u_{k+1}} = \frac{1}{2t}.$
The ratio test states that the behavior of a series $\sum_{k=0}^{\infty}u_k$ is determined by the following formula: if $\lim_{k\to\infty}\frac{u_k}{u_{k+1}} = c$, the series converges if $c > 1$ and diverges if $c < 1$. Application of this formula shows that series (3.5) converges if
(3.11)    $t < 1/2.$
Consider now the series (3.6). Since
k+lT o
r(~)r(¥)
r(p-l)r(~)r(~)
r(~)2
(3.12 )
limk -> 00
so that the series (3.6) converges for all values of $t$. In particular, therefore, we can say that for $t < \frac12$ both the series (3.5) and (3.6) are convergent.
In
there are three error terms if we approximate $\nu_k$ by $\tilde\nu_k$. Since the other two are negligible in comparison to the upper bound to $R_{D_1}(k,n)$, it will be enough to consider the contribution of this to $\phi(t)$, the moment generating function of $\nu_k$.
We define
(3.16)
then by (2.19)
where $\varepsilon_n$ is the contribution due to other error terms and is easily seen to be an infinitesimal of an order higher than that of $\frac1n$.
By virtue of Lemmas 1 and 2, we write (2.17) as
1 00 t k!0(t) - ¢(t)i < - Z --k
'Rl(k,n)~~
I I nk=O. n
uniformlV for all \ t I < ITO i 1< ~
Therefore by Paul Levy's theorem~9, p. 96_7
(3.20)
where $F_n(v)$ denotes the sequence of cumulative distribution functions corresponding to $\phi_n(t)$, and $F(v)$ corresponds to $\tilde\phi(t)$. Thus we have proved the following theorem:
Theorem 5.
If $F_n(v)$ is the sequence of cumulative distribution functions corresponding to $\nu_k$ for large values of $n$, and $F(v)$ is the distribution function corresponding to $\tilde\nu_k$, then given $\varepsilon$, there exists an $N_\varepsilon$ such that $|F_n(v) - F(v)| < \varepsilon$ for $n > N_\varepsilon$.
Theorem 6.
When $p$, the number of variables, is even, the asymptotic distribution of $v$ is given by
$f(v)\,dv = \sum_{j=1}^{m} b_j f_j(v)\,dv,$
where $2m = p$,
(3.24)    $f_j(v) = \begin{cases}\frac{1}{\Gamma(j)}\,v^{j-1}e^{-v}, & v > 0,\\ 0, & \text{otherwise,}\end{cases}$
and where the $b_j$'s are suitable constants depending on $m$.
Proof: Let $p = 2m$.
(3.26)
On expanding the right hand side of (3.26), we get
$\tilde\nu_k = \frac{(k+2m-2)(k+2m-4)\cdots(k+4)(k+2)}{2^{m-1}\,\Gamma(m)}\,k!$
The moment generating function for the corresponding distribution is given by
$\tilde\phi(t) = \sum_{k=0}^{\infty}\frac{t^k\,\tilde\nu_k}{k!},$
or
(3.28)    $\tilde\phi(t) = \sum_{k=0}^{\infty}\frac{(k+2m-2)(k+2m-4)\cdots(k+4)(k+2)}{\Gamma(m)\,2^{m-1}}\,t^k.$
This can be rewritten as
(3.29)    $\tilde\phi(t) = c_{m-1}\frac{d^{m-1}}{dt^{m-1}}\sum t^{k+m-1} + c_{m-2}\frac{d^{m-2}}{dt^{m-2}}\sum t^{k+m-2} + \cdots + c_1\frac{d}{dt}\sum t^{k+1} + c_0\sum t^k,$
where $c_0, c_1, \ldots, c_{m-1}$ are constants depending on $m$ and these are obtained by comparing the coefficients of like powers of $k$ in (3.28) and (3.29). The uniqueness of the solution for $c_0, c_1, \ldots, c_{m-1}$ follows from the fact that each of the expressions (3.28) and (3.29) consists of a factor $t^k$ multiplied by a polynomial of degree $m-1$ in $k$.
We can write (3.29) as
(3.30)    $\tilde\phi(t) = \sum_{i=1}^{m} c_{m-i}\sum_{k=0}^{\infty}\frac{d^{m-i}}{dt^{m-i}}\left(t^{k+m-i}\right).$
For further simplification, we write $\sum_{k=0}^{\infty}t^{k+m-i}$ as $t^{m-i}(1 + t + t^2 + \cdots)$, which can be expressed as $\frac{t^{m-i}}{1-t}$, and the operations of summation and differentiation can be interchanged in the region of convergence of the series, namely $|t| < 1$.
and furtherm-i m-i-l 1 _ dm- i
d r- Z t A + /_ ( 1 )i 1 t - • I-tdtm- - A=l - - dtm-~
(,3.30 ) becomesm dm- i 1Z c. . (l-t)
i=l m-~ dtm- 1
This can be rewritten as
r....J
¢(t) = ~ c (m-i)ti=l m-i (l_t)m-i+l
::
~~
m c.Z m-J.
i=l (l_t)m-i+l
of
It is well known that 1 is the moment generating function(l-t )a
f (v) ::a
1 a-l-vrraJv e
o
v~O
v < O·
Hence we CAn write the distribution whose moment generating function
k
e73
m *f(v)dv = Xc. f .(v) dv. m-~ m-~~=l
This can" be expressed in a slightly better notation by writing j for
m-i. This completes the proof of theorem 6.
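The constants $b_j$ of Theorem 6 can be computed exactly for any $m$. The sketch below is one possible way of doing so and is not the author's procedure: it uses the fact that the $k$th moment of $f_j$ is $(j+k-1)!/(j-1)!$, so that $\tilde\nu_k/k! = \sum_j b_j\binom{k+j-1}{j-1}$, and recovers the $b_j$ as backward differences of the polynomial $P(k) = \tilde\nu_k/k!$ at $k = -1$. All function names are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial

def moment_ratio(m, k):
    """P(k) = nu~_k / k! = (k+2)(k+4)...(k+2m-2) / (2^{m-1} (m-1)!) for p = 2m.

    Treated as a polynomial in k, so negative integer k is allowed.
    """
    numerator = 1
    for i in range(1, m):
        numerator *= k + 2 * i
    return Fraction(numerator, 2 ** (m - 1) * factorial(m - 1))

def mixture_weights(m):
    """Weights b_1..b_m with P(k) = sum_j b_j C(k+j-1, j-1).

    C(k+j-1, j-1), as a polynomial in k, vanishes at k = -1 for j >= 2 and a
    backward difference lowers j by one, so b_j is the (j-1)th backward
    difference of P evaluated at k = -1.
    """
    return [sum((-1) ** i * comb(j - 1, i) * moment_ratio(m, -1 - i)
                for i in range(j))
            for j in range(1, m + 1)]

def mixture_moment(m, k):
    """kth moment of sum_j b_j f_j, using integral v^k f_j(v) dv = (j+k-1)!/(j-1)!."""
    return sum(b * Fraction(factorial(j + k - 1), factorial(j - 1))
               for j, b in enumerate(mixture_weights(m), start=1))

if __name__ == "__main__":
    for m in (1, 2, 3, 4):
        print(2 * m, mixture_weights(m))
```

For $p = 4$ and $p = 6$ this returns the weights $(\frac12, \frac12)$ and $(\frac38, \frac38, \frac14)$, which correspond to the special cases worked out next.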
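Special Cases.
(i) $p = 2$.
The $k$th moment is given by
$\tilde\nu_k = \frac{2^k\,\Gamma(\frac{k+1}{2})\,\Gamma(\frac{k+2}{2})}{\Gamma(\frac12)\,\Gamma(1)},$
which on simplification gives
(3.36)    $\tilde\nu_k = k!$
The corresponding moment generating function is
$\tilde\phi(t) = \sum_{k=0}^{\infty}t^k,$
which can be written as
$\tilde\phi(t) = \frac{1}{1-t}.$
From (3.38) we conclude that
$f(v) = \begin{cases}e^{-v}, & \text{if } v \ge 0,\\ 0, & \text{otherwise.}\end{cases}$
(ii) $p = 4$.
For $p = 4$ we have
$\tilde\nu_k = \frac{(k+2)\,k!}{2}.$
The moment generating function for this, namely
$\tilde\phi(t) = \sum_{k=0}^{\infty}\frac{k+2}{2}\,t^k,$
can be rewritten as
$\tilde\phi(t) = \frac{d}{dt}\sum_{k=0}^{\infty}\frac{t^{k+1}}{2} + \sum_{k=0}^{\infty}\frac{t^k}{2}.$
This on simplification becomes
$\tilde\phi(t) = \frac{1}{2(1-t)^2} + \frac{1}{2(1-t)}.$
The distribution of $v$ is therefore given by
$f(v)\,dv = \tfrac12\left(ve^{-v} + e^{-v}\right)dv.$
(iii) $p = 6$.
The moments in this case are given by
$\tilde\nu_k = \frac{2^k\,\Gamma(\frac{k+1}{2})\,\Gamma(\frac{k+6}{2})}{\Gamma(\frac12)\,\Gamma(3)},$
which simplifies to
$\tilde\nu_k = \frac{(k+4)(k+2)\,k!}{8}.$
The corresponding moment generating function is
$\tilde\phi(t) = \sum_{k=0}^{\infty}\frac{(k+4)(k+2)}{8}\,t^k,$
which can also be written as
Following the argument used in Theorem 6, this simplifies to
(3.49)    $\tilde\phi(t) = \frac18\frac{d^2}{dt^2}\left(\frac{1}{1-t}\right) + \frac38\frac{d}{dt}\left(\frac{1}{1-t}\right) + \frac38\left(\frac{1}{1-t}\right),$
which gives
(3.50)    $\tilde\phi(t) = \frac{1}{4(1-t)^3} + \frac{3}{8(1-t)^2} + \frac{3}{8(1-t)}.$
The distribution to which this refers is obviously
(3.51)    $f(v)\,dv = \left(\tfrac18v^2e^{-v} + \tfrac38ve^{-v} + \tfrac38e^{-v}\right)dv.$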
4. An integral equation due to Wilks.
S. S. Wilks L-" _7 considers the moments and distributions of
some statistical coefficients related to samples from a multivariate
normal population, and exhibits a new method of attack. He considers
two integral equations which he calls Type A and Type B, and uses
their solutions in deriving some now well known distributions. The
first result adapted for the present use can be written as follows:
If
(4.1)    $\int_0^{\infty}v^k f(v)\,dv = B^k\,\frac{\Gamma(a_1+k)\,\Gamma(a_2+k)}{\Gamma(a_1)\,\Gamma(a_2)},$
where $k$'s and $a$'s are real and positive and $B$ and $f(v)$ are independent of $k$, then
(4.2)    $f(v) = \frac{B^{-a_2}\,v^{a_2-1}}{\Gamma(a_1)\,\Gamma(a_2)}\int_0^{\infty}x^{a_1-a_2-1}\,e^{-x-\frac{v}{Bx}}\,dx.$
The integral in (4.2) can be expressed in elementary functions when $a_1 - a_2$ is half of an odd integer; and this case, as we shall see later, corresponds to the distribution of $v$ defined in (2.3) for even values of $p$. If, however, $a_1 - a_2$ is an integer, the integral is a Bessel function and this situation arises if $p$ is odd. Before using (4.2) in finding the distribution of $v$, we shall, for the sake of completeness, add a note on Bessel functions.
5. A note on Bessel functions.
The equation
(5.1)    $z^2\frac{d^2w}{dz^2} + z\frac{dw}{dz} + (z^2 - n^2)\,w = 0$
is called Bessel's differential equation of order $n$, and Bessel functions are defined with reference to this equation. Its only singularities are at $z = 0$ and $z = \infty$.
A solution in series of (5.1) near the origin can be obtained by supposing that $w = \sum_i a_i z^{\alpha+i}$ is a solution. It is found that the discussion can be divided into four cases.
(a) $n \ne i$, $n \ne \frac{2i+1}{2}$, where $i$ stands for an integer.
In this case there are two independent solutions:
(5.2)    $J_n(z)$ and $J_{-n}(z)$,
where
$J_n(z) = \sum_{r=0}^{\infty}\frac{(-1)^r}{\Gamma(r+1)\,\Gamma(n+r+1)}\left(\frac z2\right)^{n+2r},$
and is analytic for all values of $z$ except possibly $z = 0$. It is called Bessel's function of the first kind.
(b) If $n = i$, an integer,
$J_n(z)$ and $J_{-n}(z)$ are two linearly dependent integrals satisfying the relation
$J_{-n}(z) = (-1)^n J_n(z).$
In this case the solutions are
$J_n(z)$ and $Y_n(z)$,
where
$Y_n(z) = J_n(z)\log z - \frac12\sum_{r=0}^{n-1}\frac{(n-r-1)!}{r!}\left(\frac z2\right)^{-n+2r} - \frac12\sum_{r=0}^{\infty}\frac{(-1)^r}{\Gamma(r+1)\,\Gamma(n+r+1)}\left(\frac z2\right)^{n+2r}\left[\psi(r) + \psi(n+r)\right],$
where
(5.6)    $\psi(r) = \frac11 + \frac12 + \cdots + \frac1r$, $r = 1, 2, 3, \ldots$, and $\psi(0) = 0$.
$Y_n(z)$ is called Bessel's function of the second kind.
J (z) and J (z) are two linearly independent integrals.n -n
and
00l:
i=O
2r(~)2
are the two solutions.
$Y_0(z)$ is Bessel's function of the second kind of order zero.
Sometimes a function $G_n(z)$ is used instead of $J_{-n}(z)$ or $Y_n(z)$ as the second solution of Bessel's differential equation. It is defined by
$G_n(z) = \frac{\pi}{2\sin n\pi}\left[J_{-n}(z) - e^{-in\pi}J_n(z)\right],$
where $n$ is not an integer; and
(5.11) G (z)n
J (z) - einn J (z) .L-' - -n n ...:.- 7 ,
2 cos n n -
when n is an integer.
If we put $z = iv$ in (5.1), the result is
(5.12)    $v^2\frac{d^2w}{dv^2} + v\frac{dw}{dv} - (v^2 + n^2)\,w = 0,$
which is known as Bessel's transformed equation. Two solutions of (5.12), namely
$I_n(v) = \sum_{r=0}^{\infty}\frac{1}{\Gamma(r+1)\,\Gamma(n+r+1)}\left(\frac v2\right)^{n+2r}$
and
$K_n(v) = i^n\,G_n(iv) = \frac{\pi}{2\sin n\pi}\left[I_{-n}(v) - I_n(v)\right],$
are called respectively the modified Bessel functions of the first and second kinds of order $n$.
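The series for $I_n(v)$ and the expression for $K_n(v)$ in terms of $I_{\pm n}(v)$ can be evaluated directly when $n$ is not an integer. The following sketch truncates the series after a fixed number of terms (an assumption suitable for moderate $v$) and compares the half-order case with the elementary forms $I_{1/2}(v) = \sqrt{2/(\pi v)}\,\sinh v$ and $K_{1/2}(v) = \sqrt{\pi/(2v)}\,e^{-v}$; the function names are hypothetical.

```python
import math

def bessel_i(n, v, terms=60):
    """Truncated series sum_r (v/2)^{n+2r} / [Gamma(r+1) Gamma(n+r+1)].

    Intended for non-integer n (math.gamma is undefined at 0, -1, -2, ...)
    and moderate positive v.
    """
    return sum((v / 2.0) ** (n + 2 * r) / (math.gamma(r + 1) * math.gamma(n + r + 1))
               for r in range(terms))

def bessel_k(n, v, terms=60):
    """K_n(v) = pi / (2 sin(n pi)) * [I_{-n}(v) - I_n(v)], for non-integer n."""
    return (math.pi / (2.0 * math.sin(n * math.pi))
            * (bessel_i(-n, v, terms) - bessel_i(n, v, terms)))

if __name__ == "__main__":
    v = 1.3
    print(bessel_i(0.5, v), math.sqrt(2.0 / (math.pi * v)) * math.sinh(v))
    print(bessel_k(0.5, v), math.sqrt(math.pi / (2.0 * v)) * math.exp(-v))
```

Half-integer orders are exactly the ones that arise for even $p$, which is why the distribution reduces to elementary functions in that case; for integer orders the limiting form of $K_n$ must be used instead of this formula.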
If n is a positive integer,
and
I (v) = I (v)-n n
,
K (v)n = lim
e -> 0•
6. Distribution of $v$ for odd values of $p$.
In (2.22) we proved that
(6.1)    $\tilde\nu_k = \frac{2^k\,\Gamma(\frac{k+1}{2})\,\Gamma(\frac{k+p}{2})}{\Gamma(\frac12)\,\Gamma(\frac p2)},$
which gives only the principal term in the value of $\nu_k$. Since we are not using the exact value but only an asymptotic approximation for the value of $\int_0^{\infty}v^k f(v)\,dv$, the results obtained by the use of (4.1) and (4.2) cannot be presented as being final. Moreover, since the paper of Wilks referred to in Section 4 depends heavily on Stekloff's paper on the theory of closure as applied to the problem of moments [47], which is not easily available, the distribution for odd values of $p$ is here presented on a heuristic basis. It may turn out to be the correct distribution, but it has to be left for further discussion and rigorization.
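Consider again the equation (6.1). If
$u = v^2,$
then
(6.3)    $\int_0^{\infty}u^k f(u)\,du = 4^k\,\frac{\Gamma(k+\frac12)\,\Gamma(k+\frac p2)}{\Gamma(\frac12)\,\Gamma(\frac p2)}.$
(6.4)
Comparing (6.3) with (4.1), we have
$B = 4$, $a_1 = \frac12$ and $a_2 = \frac p2$.
In this case (4.2) gives
(6.5)    $f(u)\,du = \frac{4^{-\frac p2}\,u^{\frac{p-2}{2}}\,du}{\Gamma(\frac12)\,\Gamma(\frac p2)}\int_0^{\infty}x^{-\frac{p+1}{2}}\,e^{-x-\frac{u}{4x}}\,dx.$
Putting
(6.6)    $u = v^2$ and $p = 2m+1$,
we get
$f(v)\,dv = \frac{v^{2m}\,dv}{2^{2m}\,\Gamma(\frac12)\,\Gamma(m+\frac12)}\int_0^{\infty}x^{-m-1}\,e^{-\left(x+\frac{v^2}{4x}\right)}\,dx.$
According to Watson [52, p. 183] the integral $\int_0^{\infty}x^{-m-1}\,e^{-x-\frac{v^2}{4x}}\,dx$ has been studied by Poisson, Glaisher, Kapteyn and others. The result stated in Watson is
(6.8)    $K_m(v) = \frac12\left(\frac v2\right)^m\int_0^{\infty}x^{-m-1}\,e^{-\left(x+\frac{v^2}{4x}\right)}\,dx.$
This reduces the distribution of $v$ to the form
Putting $m = 0, 1, 2, \ldots$ in this, we get the distribution of $v$ for $p = 1, 3, 5, \ldots$.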
7. The use of a differential equation in the evaluation of an
integral.
In Section 6 we found that
p+l- --n"x c
v2-x- 4X
e dx ,
where p is the number of variates in the underlying normal distri-
butiolTS •
1\ known teclmique for l:::\rE1luating
ro
¢( v) :: io
p+l v2- -r -x-17X
x e dx
is as follows:
¢(v) ::
p+l 00
2T j
odz •
Now we define
00
rev) = io
1 2 ·2..l.. - "?(z t ;)p-2 - zz e dz >., .
wherev2;O.
dz
12 2- -(z + ;)
2· z1- ezP
co
Y'<v) = -v1
Since the conditions for differentiation under the integral
sign are satisfied, we differentiate \Vev) with respect to v and
get
Similarly
dz
1 2 i- "'2(z + 2)
L- 1 z 7d 1~ EI z:= - ep-~ _ p-lz z
Now 2l( 2 "l/.)
-"'2'z.~z
e dz
12 2... -(z + ..!..)
2 2z
00
o
which is equal to zero identically therefore, using this identity we
obtain, from (7.4), (7.5) and (7.6), the following differential equa-
tion
The value of riC v) can be found by using the solution of th1.s and the
fact thatI p+l
¢( v) = _ if (v) • 2~ •v
8. The asymptotic distribution of v for even and odd values of p.
We shall, in this section, derive the distributions of v again
by starting with the result,
(8.1)OJ
12p+l v
.. 2' -x- wex e dx , and
by evaluating the integral involved by the help of (7.8) and (7.9).
We divide the discussion into two cases.
Case A. p = 2, 4, ...
Let p = 2.
The differential equation (7.8), in this case, reduces to
(8.2)    $(D^2 - 1)\,\psi(v) = 0 ,$
where the symbol D stands for the operation of differentiation.
(8.3)    $\psi(v) = A e^{v} + B e^{-v}$
is the solution of (8.2). Also for p = 2
(8.4)    $\psi(v) = \int_0^\infty e^{-\frac{1}{2}\left(z^2 + \frac{v^2}{z^2}\right)}\,dz$
by definition. This gives
(8.5)    $\psi(0) = \sqrt{\pi/2}$ and $\psi(\infty) = 0 .$
Thus
(8.6)    $\psi(v) = \sqrt{\pi/2}\; e^{-v} ,$
and hence
(8.7)    $\phi(v) = \frac{2\sqrt{\pi}}{v}\, e^{-v} ,$
where $\phi(v)$ stands for the integral occurring in (8.1).
Hence we have the result
(8.8)    $f(v)\,dv = e^{-v}\,dv .$
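The closed form used for p = 2 can be checked numerically. The following is a minimal sketch, not part of the original derivation: it compares a midpoint-rule estimate of the integral defining psi(v) for p = 2 with sqrt(pi/2) e^{-v}. The function names, the truncation point 12 and the step count are illustrative assumptions.

```python
import math

def psi_numeric(v, upper=12.0, steps=20000):
    """Midpoint-rule estimate of the integral of exp(-(z^2 + v^2/z^2)/2) over (0, upper)."""
    h = upper / steps
    total = 0.0
    for k in range(steps):
        z = (k + 0.5) * h  # midpoints keep z away from the endpoint z = 0
        total += math.exp(-0.5 * (z * z + (v * v) / (z * z)))
    return total * h

def psi_closed(v):
    """Closed form sqrt(pi/2) * exp(-v) read off from the solution of the differential equation."""
    return math.sqrt(math.pi / 2.0) * math.exp(-v)

for v in (0.5, 1.0, 2.0):
    print(f"v={v}: numeric={psi_numeric(v):.8f}  closed={psi_closed(v):.8f}")
```

The upper limit is truncated because the integrand is negligible beyond z = 12; the two columns should agree to many decimal places.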
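Let p = 4.
Here
(8.9)    $\psi(v) = \int_0^\infty \frac{1}{z^2}\, e^{-\frac{1}{2}\left(z^2 + \frac{v^2}{z^2}\right)}\,dz ,$
but from (8.4) and (8.6)
(8.10)    $\int_0^\infty e^{-\frac{1}{2}\left(z^2 + \frac{v^2}{z^2}\right)}\,dz = \sqrt{\pi/2}\; e^{-v} .$
Differentiating both sides of (8.10) with respect to v and dividing
by -v, we get
(8.11)    $\int_0^\infty \frac{1}{z^2}\, e^{-\frac{1}{2}\left(z^2 + \frac{v^2}{z^2}\right)}\,dz = \sqrt{\pi/2}\; \frac{e^{-v}}{v} .$
Thus
(8.12)    $\phi(v) = 4\sqrt{\pi}\; \frac{e^{-v} + v e^{-v}}{v^3} .$
Hence from (8.1)
(8.13)    $f(v)\,dv = \frac{1}{2}\left(e^{-v} + v e^{-v}\right) dv .$
Let p = 6.
Here
(8.14)    $\psi(v) = \int_0^\infty \frac{1}{z^4}\, e^{-\frac{1}{2}\left(z^2 + \frac{v^2}{z^2}\right)}\,dz .$
Using the reasoning of example 2, we get
(8.15)    $\psi(v) = \sqrt{\pi/2}\; \frac{v e^{-v} + e^{-v}}{v^3} .$
Therefore
(8.16)    $\phi(v) = 8\sqrt{\pi}\; \frac{3e^{-v} + 3v e^{-v} + v^2 e^{-v}}{v^5} .$
This value substituted in (8.1) gives the following distribution for
p = 6.
    $f(v)\,dv = \frac{3e^{-v} + 3v e^{-v} + v^2 e^{-v}}{8}\,dv .$
The process can obviously be carried on to get the distribution
of v for all even values of p.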
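As a quick consistency check on the even-p results, each density is a polynomial in v multiplied by e^{-v}, so its total mass and tail areas follow from the elementary integral of v^k e^{-v}. The sketch below is illustrative only; the coefficient table for p = 6 assumes the density (3 + 3v + v^2) e^{-v} / 8, and all names are hypothetical.

```python
import math

# Coefficients c_k in f(v) = sum_k c_k v^k e^{-v} for p = 2, 4, 6 (p = 6 as assumed above).
DENSITIES = {
    2: [1.0],
    4: [0.5, 0.5],
    6: [3.0 / 8.0, 3.0 / 8.0, 1.0 / 8.0],
}

def total_mass(coeffs):
    """Integral of f over (0, inf), using: integral of v^k e^{-v} dv = k!."""
    return sum(c * math.factorial(k) for k, c in enumerate(coeffs))

def tail_area(coeffs, x):
    """P(v > x), using: integral_x^inf v^k e^{-v} dv = k! e^{-x} sum_{i<=k} x^i / i!."""
    area = 0.0
    for k, c in enumerate(coeffs):
        partial = sum(x ** i / math.factorial(i) for i in range(k + 1))
        area += c * math.factorial(k) * math.exp(-x) * partial
    return area

for p, coeffs in DENSITIES.items():
    print(p, total_mass(coeffs), tail_area(coeffs, 2.0))
```

Each total mass should come out as 1, which is a useful guard against a slipped constant when the process is carried on to larger even p.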
Case B. p = 3, 5, ...
(B1) Let p = 3.
The differential equation satisfied by $\psi(v)$ in this case
reduces to
(8.17)    $\psi''(v) + \frac{1}{v}\,\psi'(v) - \psi(v) = 0$ ,
which is the modified Bessel equation of order zero, and is satisfied
by
Therefore
(8.18)
But
¢Cv) •
(8.19 )
(see for instance, Watson [52, p. 79]). Therefore
Substitution of this in (8.1) gives
(8.21)
(B2) Let p = 5.
Here
(8.22)co
'rev) = jo
dz
(8.23)00
la1 2 2
- -( z + v2
)1 2 Z- ez
Hence, on differentiating with respect to v and transposing suitable
factors, we get
(8.24)    $\psi(v) = -\frac{K_0'(v)}{v} = \frac{K_1(v)}{v} .$
Thus
(8.25)    $\phi(v) = -8\, \frac{v K_1'(v) - K_1(v)}{v^3}$ ,
which, by using the formula
(8.26)    $v K_n'(v) - n K_n(v) = -v K_{n+1}(v)$ ,
gives
(8.27)    $\phi(v) = \frac{8 K_2(v)}{v^2}$ ,
and consequently
as the distribution for p = 5 .
This process can obviously be continued to obtain the asymptotic
distribution of v for all odd values of p.
This section also shows that we get the same distribution of
v for p = 2m by the two methods, namely
(1) The use of the moment generating function,
(2) The application of the integral equation given in Section 4.
9. Note on the construction of tables.
Case A (When p = 2m)
The distribution of v in this case is
(9.1)    $f(v)\,dv = \sum_{j=1}^{m} b_j\, f_j(v)\,dv$ ,
where
    $f_j(v) = \frac{1}{\Gamma(j)}\, v^{j-1} e^{-v}$ for $v > 0$, and $f_j(v) = 0$ otherwise,
and where the $b_j$'s are constants which can be found for any given
integral value of m.
The evaluation of the integrals of the type $\int_x^\infty f(v)\,dv$ can
obviously be made to depend on tables of the $\chi^2$-distribution with even
degrees of freedom. For illustration it will be enough to consider
the cases when p = 2 and p = 4.
For p = 2,
    $f(v)\,dv = e^{-v}\,dv$
and the substitution $v = \chi^2/2$ shows that
    $P\left(v > \frac{a}{2}\right) = P(\chi^2 > a)$ ,
which gives the method of tabulating areas for the distribution of v.
In this particular case it may be more convenient to use the
tables of exponential function.
Here, for p = 4,
    $f(v)\,dv = \frac{1}{2}\left(e^{-v} + v e^{-v}\right) dv .$
Putting $v = \chi^2/2$, this becomes
(9.5)    $\frac{1}{2}\left[\frac{1}{2}\, e^{-\chi^2/2} + \frac{1}{4}\, \chi^2 e^{-\chi^2/2}\right] d\chi^2 .$
The two frequency functions inside the square brackets are $\chi^2$
frequency functions for two and four degrees of freedom. Consider the
following table giving tail areas for these distributions.

  chi-square    4.0      4.2      4.4      4.6      4.8      5.0      5.2
  d.f.
     2        .13534   .12246   .11080   .10026   .09072   .08209   .07427
     4        .40601   .37962   .35457   .33085   .30844   .28730   .26739

from table 7, Pearson and Hartley [35]. Averaging these as suggested
by (9.5), we have the following table for $P(\chi^2 \ge x) = P(v > \frac{x}{2})$.

     x         4.0      4.2      4.4      4.6      4.8      5.0      5.2
     P        .27068   .25104   .23269   .21556   .19958   .18470   .17083
From this table it is possible by linear interpolation or by
using the formulae for interpolation when the arguments are not
equally spaced, to find the values of x corresponding to P = .25,
P = .20 etc.
Similar remarks apply to the construction of tables for p = 6,
8, ....
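The averaging and interpolation steps for p = 4 can be sketched in a few lines. This is only an illustration under the closed forms for chi-square tail areas with two and four degrees of freedom (e^{-x/2} and e^{-x/2}(1 + x/2)); the function names are hypothetical and the printed values are computed, not copied from Pearson and Hartley.

```python
import math

def chi2_tail_2df(x):
    """P(chi-square with 2 d.f. > x)."""
    return math.exp(-x / 2.0)

def chi2_tail_4df(x):
    """P(chi-square with 4 d.f. > x)."""
    return math.exp(-x / 2.0) * (1.0 + x / 2.0)

def tail_p4(x):
    """P(v > x/2) for p = 4: the average of the two tail areas."""
    return 0.5 * (chi2_tail_2df(x) + chi2_tail_4df(x))

def interpolate_x(target, x_lo, x_hi):
    """Linear interpolation for the argument x at which tail_p4(x) = target."""
    p_lo, p_hi = tail_p4(x_lo), tail_p4(x_hi)
    return x_lo + (x_hi - x_lo) * (p_lo - target) / (p_lo - p_hi)

for x in (4.0, 4.2, 4.4, 4.6, 4.8, 5.0, 5.2):
    print(f"{x:.1f}  {tail_p4(x):.5f}")
print("x for P = .25 (interpolated between 4.2 and 4.4):", interpolate_x(0.25, 4.2, 4.4))
```

Small differences in the last decimal place relative to a table built from rounded entries are to be expected, since the sketch averages unrounded values.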
Case B. p = 2m + 1.
For this case we proved in Section 6 that
Tables for these distributions can be constructed by using
the series for $K_m(v)$ and integrating term by term.
10. Summary of Chapter III.
In this chapter we have discussed the distribution of $v = |n m_3|$
for large values of n. The kth moment $E(v^k)$ is found in Section 3
by following the methods of integration developed in Chapter II. These
moments have been used in finding the asymptotic distribution of v
for even values of p by the help of the corresponding moment generating
function. For obtaining the large sample distribution of v for
odd values of p, use has been made of an integral equation due to
S. S. Wilks.
CHAPTER IV
AN ASYMPTOTIC SERIES EXPANSION FOR THE DISTRIBUTION OF
$w = |m_3|$ IN THE NULL CASE
1. Introduction.
Harter [18] has obtained the distribution of $m_3$ as a double
series by starting with the joint distribution of $m_1$, $m_2$ and $m_3$ of
Wald in the special case when $\rho_i = 0 = \zeta_i$, which we call the null
case, and which has been the subject of our discussion in the preceding
chapters. The series obtained by Harter would present difficulties
in practical applications, since in any practical situation
the number n, which is determined by the sizes of the two samples,
will not be very small. For large n the investigator wishes to use
that distribution of $m_3$ in which the ratio of each term after the
first to the preceding term is of order $n^{-1}$. It is also obvious
that the main point in getting such a series is to obtain terms beyond
the first. Of these, however, the second and third approximations are
of chief interest and are doubtless easier to calculate than any of
those of higher order. Because of these considerations in this chapter
we shall obtain the first three terms in the distribution of $w = |m_3|$
as an asymptotic series. For the first approximation the constant of
integration will be found, and the method of finding the tail areas
for the construction of tables will also be discussed.
It might be noted that the statistic w is $\frac{1}{n}$ times the statistic
v defined in Chapter II. Towards the end we shall also compare
the results of this chapter with that of Chapter III.
2. An asymptotic series for the distribution.
We consider the joint probability distribution of $m_1$, $m_2$ and
w, which is the same as the probability distribution of $m_1$, $m_2$ and $m_3$
except for the constant of integration because
Let C denote the constant of integration. Then
(2.1)
The region of integration is determined by
(2.2)    D:  $m_1 m_2 - w^2 > 0$,  $(1-m_1)(1-m_2) - w^2 > 0$,  $1 \ge m_1 > 0$,  $1 \ge m_2 > 0$ ,
which also determines the range $0 \le w \le \frac{1}{2}$ for the variate $w = |m_3|$.
To integrate with respect to $m_1$ and $m_2$, we shall keep w
fixed, and put $m_1 = x + y$ and $m_2 = x - y$. This gives
(2.3)    $f(x,y,w)\,dx\,dy\,dw = 2C\,(x^2 - y^2 - w^2)^{\frac{p-3}{2}} \left[(x-1)^2 - y^2 - w^2\right]^{\frac{n-p-1}{2}} dx\,dy\,dw .$
For fixed w, the two expressions in the brackets in f(x,y,w)
are zero on hyperbolas in the (x,y)-plane. Moreover x + y = 0 and
x - y = 0 are the asymptotes of $x^2 - y^2 = w^2$, and x + y - 1 = 0 and
x - y - 1 = 0 are the asymptotes of the hyperbola $(x-1)^2 - y^2 = w^2$.
The region of integration for x and y is thus the area enclosed
by the two hyperbolas and is shown in the figure on the adjoining page.
The coordinates of the points of intersection A and B of
the two hyperbolas are
    $A = \left[\frac{1}{2},\ \left(\frac{1}{4} - w^2\right)^{\frac{1}{2}}\right]$ ,
    $B = \left[\frac{1}{2},\ -\left(\frac{1}{4} - w^2\right)^{\frac{1}{2}}\right]$ .
The probability distribution of w will be given by the following
double integral:
(2.4)    $f(w)\,dw = 2C \int_{y=-\sqrt{\frac{1}{4}-w^2}}^{\sqrt{\frac{1}{4}-w^2}} \int_{x=\sqrt{y^2+w^2}}^{1-\sqrt{y^2+w^2}} (x^2-y^2-w^2)^{\frac{p-3}{2}} \left[(x-1)^2 - y^2 - w^2\right]^{\frac{n-p-1}{2}} dx\,dy\,dw .$
Put
(2.5)    $\frac{p-3}{2} = r$ and $\frac{n-p-1}{2} = q$ .
Also replace the positive root $\sqrt{y^2+w^2}$ by a.
To perform the integration with respect to x, we shall suppose
y to be constant. Using (2.5) and by noticing the symmetry of the
integrand in y, we can write (2.4) as f(w)dw, where
(2.6)    $f(w) = 4C \int_{y=0}^{\sqrt{\frac{1}{4}-w^2}} \int_{x=a}^{1-a} (x^2 - a^2)^r \left[(x-1)^2 - a^2\right]^q dx\,dy .$
Let
(2.7)    $(x-1)^2 - a^2 = (1-2a)(1-v)^2 .$
This transformation sets up a one-to-one correspondence between the
values of x and the values of v. Furthermore, as x increases from
a to 1-a, v increases monotonically from zero to one.
From (2.7) we have the following:
(2.8)
x = 1 -r.2 2/ a +(l-?a)(l-v)
2v + •••
·e
(2.11)
97
11-2a 2 7 - "2v + --.....~ v _ dv •
(l-a)
To examine the convergence of the series in v which will be
obtained as a result of this transformation, we regard v as a complex
variable and equate to zero the quantity under the radical sign
in (2.9). Thus, if $v_0$ denotes a singularity, then
    $(1 - v_0)^2 = -\frac{a^2}{1-2a}$ , which gives
    $v_0 = 1 \pm \frac{i a}{\sqrt{1-2a}} .$
This shows that the two singularities are situated on the line parallel
to the imaginary axis at unit distance and are equidistant from the
point 1. Also
    $|v_0|^2 = 1 + \frac{a^2}{1-2a} = \frac{(1-a)^2}{1-2a}$
or
    $|v_0| = \frac{1-a}{\sqrt{1-2a}} .$
Both the singularities lie outside the unit circle around the origin,
since $\frac{(1-a)^2}{1-2a} > 1$ because $a^2 > 0$.
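The location of these two singularities is easy to confirm numerically. The sketch below assumes, as read from the surrounding text, that the quantity under the radical in (2.9) is a^2 + (1-2a)(1-v)^2 with 0 < a < 1/2; the function names are illustrative.

```python
import math

def radicand(a, v):
    """Quantity assumed to sit under the radical in (2.9): a^2 + (1 - 2a)(1 - v)^2."""
    return a * a + (1.0 - 2.0 * a) * (1.0 - v) ** 2

def v_singularities(a):
    """The two complex zeros 1 +/- i a / sqrt(1 - 2a) of the radicand."""
    offset = a / math.sqrt(1.0 - 2.0 * a)
    return complex(1.0, offset), complex(1.0, -offset)

def modulus_formula(a):
    """|v0| = (1 - a) / sqrt(1 - 2a)."""
    return (1.0 - a) / math.sqrt(1.0 - 2.0 * a)

for a in (0.05, 0.20, 0.45):
    for v0 in v_singularities(a):
        print(a, v0, abs(radicand(a, v0)), abs(v0), modulus_formula(a))
```

For every a in the open interval the modulus exceeds one, so the expansion in powers of v converges on the whole range of integration 0 <= v <= 1 except possibly at its end point.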
Using the transformation from x to v, we get
.e 98
1
(1-2a) r (
(l_a?+l a via
v +
We write (2.14) after expanding the last two binomials, but
omitting terms involving cubes and higher powers of v since they
will not affect the first three terms of the desired asymptotic series.
This gives
/1 2v'4- w
/I
.Jy=O
(1_2a)q+r+l ra
( l_a)r+l
1J' vr (1_v)2q+l
v-=O
v +
) 22~. r 2 L- (r-l (l-~a+a )
2(1-a) 4a
. 2 la( 1-2a L7v + ••• j
/-1 1-2a (a
2+4a-2)(1-2a) 2 7_ + )2 v - ->----~.-;..r.4----- v • .• _ dv dy
(l-a ?(l-a)•
,
e 99
This can further be reduced to
(a2+4a-2)(1-28) r(1-3a+a
2)(1-2a) 7 "}
2(I-a)4 + )3 - + co. dv dy
2a(1-a
Integration with respect to v after replacing q and r by
their values in terms of n and p, gives
(2.17)
1 n-2 p-3E+ ( 2 T
f(w)=2 {' C ) 11-2a) ~
-" ?Y (I-a)
r(n-p+l)r(9)
( 2n-p+l)r 2
2 ( 2)21+ /- 1-2a + (p-3)(1-3a+a ) 7 p-l +/- p-3)(p-5)(1-3a+a- (l_a)2 4a(l-a) - 2n-p+l - 32a~(I-a)2
l
(p-3)a(I-2a)).j.(I_a)2
<l+4a-2)(1-2a) (P-3)(1-3a+a2)(1-2a) 7.
2(I-a)4 + 2a(l-a)3 -
~-- ----~-~~-
is an abbreviation
2(n -1)
(2n-p+l)(2n-p+3)
in which a
+ .... J }'rlY ,
,);,.....2--~2
for V Y +w • We shall write
(2.18)
To integrate with respect to y we make the transformation
(2.19)    $z = \frac{2w - 2\sqrt{y^2 + w^2}}{2w - 1} .$
The limits of integration for z will be zero and one, since those of
y are zero and $\sqrt{\frac{1}{4} - w^2}$. Also z is a monotonically increasing
function of y. This transformation will change the integrand essentially
into the product of two factors, one of which is a high power of 1-z
and the other a series of ascending powers of z. Thus (2.19) will
change the integral into a sum of beta functions suitable for giving
an asymptotic series for the distribution of w.
To effect this transformation we have to find the values of
various factors involved, and we have
(??O) ,
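(2.21)    $(y^2 + w^2)^{\frac{1}{2}} = w\left[1 + \frac{1-2w}{2w}\,z\right]$ ,
(2.22)    $1 - (y^2 + w^2)^{\frac{1}{2}} = (1-w)\left[1 - \frac{1-2w}{2(1-w)}\,z\right]$ ,
(2.23)    $1 - 2(y^2 + w^2)^{\frac{1}{2}} = (1-2w)(1-z)$ ,
and
(2.24)    $\frac{dy}{dz} = \frac{(1-2w)^{\frac{1}{2}}\left[2w + (1-2w)z\right]}{2 z^{\frac{1}{2}}\left[4w + (1-2w)z\right]^{\frac{1}{2}}} .$
The singularities $z_1$, $z_2$ and $z_3$ of the resulting series in z
are determined by the equations
(2.25)    $1 + \frac{1-2w}{2w}\,z = 0$ ,
(2.26)    $1 - \frac{1-2w}{2(1-w)}\,z = 0$ ,
and
(2.27)    $1 + \frac{1-2w}{4w}\,z = 0$
respectively.
From these we have
(2.28)    $z_1 = \frac{2w}{2w-1}$ ,
(2.29)    $z_2 = \frac{2(1-w)}{1-2w}$ ,
and
(2.30)    $z_3 = \frac{4w}{2w-1} .$
Since the range of w is from 0 to $\frac{1}{2}$, we find from the above three
equations
(2.31)    $-\infty < z_1 < 0$ ,
(2.32)    $2 < z_2 < \infty$ ,
and
(2.33)    $-\infty < z_3 < 0 .$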
In other words, two of the singularities lie on the negative half of
the real axis and one on the positive side in the z plane. To be
able to get a convergent series in z we have to make sure that these
singularities do not lie in the unit circle around the origin. To
examine this, we proceed to find the range of values of w for which
$|z| > 1$.
(i)   $|z_1| > 1$ if $\frac{2w}{1-2w} > 1$, or if $w > \frac{1}{4}$;
(ii)  $|z_2| > 1$ if $\frac{2(1-w)}{1-2w} > 1$, or if 2 > 1, which is true;
(iii) $|z_3| > 1$ if $\frac{4w}{1-2w} > 1$, or if $w > \frac{1}{6}$.
These investigations indicate that the resulting series in z will
converge for $w > \frac{1}{4}$, which does not cover the whole range of values
of w, which is zero to $\frac{1}{2}$. We shall, however, proceed to make this
transformation and subsequently find the probability distribution of
w as a series of powers of $\frac{1}{n}$. Even the first approximation of the
resulting series will be shown to give close results, especially for
finding the right hand tail areas.
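The three conditions can be summarised by the modulus of the nearest singularity. The following sketch, with hypothetical function names, evaluates z1, z2 and z3 for a given w in (0, 1/2) and reports the smallest modulus, which plays the role of the radius of convergence of the series in z.

```python
def z_singularities(w):
    """The three real singularities z1, z2, z3 for 0 < w < 1/2."""
    z1 = 2.0 * w / (2.0 * w - 1.0)
    z2 = 2.0 * (1.0 - w) / (1.0 - 2.0 * w)
    z3 = 4.0 * w / (2.0 * w - 1.0)
    return z1, z2, z3

def radius_of_convergence(w):
    """Smallest modulus among the three singularities."""
    return min(abs(z) for z in z_singularities(w))

for w in (0.10, 0.20, 0.25, 0.30, 0.45):
    print(w, z_singularities(w), radius_of_convergence(w))
```

The radius passes through one at w = 1/4, in line with condition (i) being the binding one.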
Making the transformation (2.19) in (2.17), we get:
n-l p-2--r --r
(2.34)f(w)=C1 (1-2w) -P~l
2(1-w) 2""
1 1 n-2 p~l p-l
)" - ~(l )7'1 1-2w 72 '1 1-2w z 7- 2
z -z ~ + -zw- Z_ L - 2(1-w)z=o
1- '2
L-l+ 14w2w z_7 ~1 + p-l rI. ( )l 2n-po4ol \iiI z,w
+ ,
where $\phi_1(z,w)$ and $\phi_2(z,w)$ can be written down after making the
transformation in the relevant factors in (2.17). To get three terms
in the probability distribution of w we need only retain the term
independent of z and the term containing z from $\phi_1(z,w)$ and the
term independent of z from $\phi_2(z,w)$.
If we retain only those terms in the various expansions involved
in (2.34) which contribute to the first three terms of the desired
series, we get
n-l p-222
(2.35) few) = CL (1-2w) wl p-l
22(1-w)
1. 1 n-2j z -2(1_z)2£1+ (P-l)~;-2W) z+
z=O
( ) ( 2 2Ll+ p-l)(1-2w z+ P -1)(1-2w) 2
4 (l-w) 3:?(1-w)2 z +
/- l-2w 3(1-2w)2 21-~z - z +
- oW l28w2
p-l (L- l-2w + (P-3)(1-3w+w2
) 72n-p+l (1_w)2 4w(1-w) -
2/-(1-2w) l-2w
+ ---..".-+- (1_w)3 (1_w)2
)
(p-3)w(1-2w) (w2+4w-?)(1-2w) + (P-3)(1-3w+w2)(1-2w)4tl~w)~ _. 2(1-w)4 2w(1-w)3
Further simplification gives
+ ...J} dz •
..
105
n-l p-222= c (1-2w) w
1 p-l. 22(1-w)
1 1 n-2
J -2(1 )2[1 /- (p-l)(1-2w) . (p-l)(1-2w)z -z +_. 4w ~ 4(1-w)
z=o
1-2w 7 /-(P-l)(P-3)(1-2W)2 (p2_1)(1_2w)2- ~- Z+ 2- + 2
oW ' - 32w 32(1-w)
2 2 2 )23(1-2w) (p-l) (1-2w) (p-l)(1-2w- 128w2"~ 16w(1-w) - 32w(1-w)
) p-l /- 7 p2_1 L- 7~l + 2n-p+l _ A + Bz+ ...... + (2n-p+l)(2n-p+3) E+ ..._ + ] dz .'
where A, B and E are functions of w, and are known explicitly from
(2.35). For the sake of brevity we write this as
n-l p-~27
few) :: c(l-2w) w1 p-l
2"2(1-w)
1 1 n-2j z - 2(1_Z) 2L-l+Alz+l\Z?+ .. ,_7,
z=o
I ( p-l - .' .:r>~~l:; -E+ }ll+2n-p+l LA+Bz+ •• ,_7+ (2n-p+-l)(2n-p+3) L .' "_7+ ,., dz.
..e 106
The integral involved in this can be written as
1 1 n-2
(2.38) J z- 2(1_z)-2- (l+L- ~n~;~l A+Al
z_7 +
z=o
,
where the terms in curly brackets are arranged in three blocks accord-
ing as they contribute to first, second and third approximations.
Integrating with respect to z, we have from these
p-l A + -!- A 72n-p+l n+l l-
2
/- (p -l)E p-l (
+ _ (2n-p+l)(2n-p+3) + (2n-p+l)(n+l) .B+AA1)
where
A =1-2w
"2(l-w)
(p-3) (1_3w+w2)
+ 4w(1-w) ,
..eB = (1-2w)~ _ 1-2w ~ (P-3)(4w4-11w2+7W- 1)
(1_w)3 (1_w)2 8w2(1_w)2 j
A = (p-1)(1-2w) + (p-1)(1-2w) 1~2w1 4w 4(I-w) - -aw'
107
B = (p-1)(P-3)(1-2w)2 + Ip2~1)(1-1W)2-1 32w2 32(1-w)2
and
(p_1)(1_2w)232w(1-w)
)o-1)(1-2w)232w2 .
2 2 2E = (P-3)~p-5)(~-3W+W) _ .(P-3)w(1-2wl _ (w +4w-2)(1-2w)32w (l-w) 4(1-w)2 2(1-W)4
+ (P-3)(1-3w+w2)(1-2w)2w(1-w)3
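Furthermore,
    $\frac{1}{2n-p+1} = \frac{1}{2n}\left(1 - \frac{p-1}{2n}\right)^{-1} = \frac{1}{2n} + \frac{p-1}{4n^2} + O\left(\frac{1}{n^3}\right)$ ,
    $\frac{1}{n+1} = \frac{1}{n} - \frac{1}{n^2} + O\left(\frac{1}{n^3}\right)$ ,
    $\frac{1}{(2n-p+1)(2n-p+3)} = \frac{1}{4n^2} + O\left(\frac{1}{n^3}\right)$ ,
    $\frac{1}{(2n-p+1)(n+1)} = \frac{1}{2n^2} + O\left(\frac{1}{n^3}\right)$ ,
    $\frac{1}{(n+3)(n+1)} = \frac{1}{n^2} + O\left(\frac{1}{n^3}\right)$ .
Using these in (2.39), the first three terms of the series
can be written as follows: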
n-l p-2 p-l(2.40) f(w)dw "" It (1_2w)-r w "'"T(l_w) 2[1+ fj- E? A + A
1_7 +
Explicit expressions for the first, second and third approximations
can be written by substituting the values of A, B, Al
, ~2 and E from
(2.39). This series can be written as
(2.41) few) = Kfl(w)L-l+ ~A(W)+ ~~(w)+ O(~) _7 ,n n
and is such that the ratio of each term to the preceding term is of order $\frac{1}{n}$,
and is the desired asymptotic series.
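3. The constant of integration for the first approximation.
The first approximation to the distribution of w is
(3.1)    $f(w) = K_1\, w^{\frac{p-2}{2}} (1-w)^{-\frac{p-1}{2}} (1-2w)^{\frac{n-1}{2}} + O(n^{-1})$ ,
where $0 \le w \le \frac{1}{2}$. The constant of integration can be found by
starting with the constant of integration of the triple integral found in
Chapter II or directly by integration from (3.1) as follows:
(3.2)    $\frac{1}{K_1} = \int_0^{1/2} w^{\frac{p-2}{2}} (1-w)^{-\frac{p-1}{2}} (1-2w)^{\frac{n-1}{2}}\,dw .$
Let 1-2w = y, then
(3.3)    $\frac{1}{K_1} = \frac{1}{\sqrt{2}} \int_0^1 y^{\frac{n-1}{2}} (1-y)^{\frac{p-2}{2}} (1+y)^{-\frac{p-1}{2}}\,dy .$
By comparing with the hypergeometric integral, we can write
(3.4)    $\frac{1}{K_1} = \frac{1}{\sqrt{2}}\, \frac{\Gamma\left(\frac{n+1}{2}\right)\Gamma\left(\frac{p}{2}\right)}{\Gamma\left(\frac{n+p+1}{2}\right)}\, F\left(\frac{p-1}{2}, \frac{n+1}{2}, \frac{n+p+1}{2}, -1\right) .$
Using the formula
(3.5)    $F(a,b,c,x) = (1-x)^{-a}\, F\left(a, c-b, c, \frac{x}{x-1}\right)$ ,
we get
(3.6)    $\frac{1}{K_1} = 2^{-\frac{p}{2}}\, \frac{\Gamma\left(\frac{n+1}{2}\right)\Gamma\left(\frac{p}{2}\right)}{\Gamma\left(\frac{n+p+1}{2}\right)}\, F\left(\frac{p-1}{2}, \frac{p}{2}, \frac{n+p+1}{2}, \frac{1}{2}\right) .$
Since the third coefficient in the hypergeometric series involved is
large, the value of the series can be approximated by 1 for large n,
that is
(3.7)    $\lim_{n \to \infty} F\left(\frac{p-1}{2}, \frac{p}{2}, \frac{n+p+1}{2}, \frac{1}{2}\right) = 1$ ,
and we can write
(3.8)    $\frac{1}{K_1} = 2^{-\frac{p}{2}}\, \frac{\Gamma\left(\frac{p}{2}\right)\Gamma\left(\frac{n+1}{2}\right)}{\Gamma\left(\frac{n+p+1}{2}\right)} .$
Using Stirling's approximation, it can be further simplified to
(3.9)    $K_1 = \frac{n^{\frac{p}{2}}}{\Gamma\left(\frac{p}{2}\right)}$ ,
giving approximately the constant of integration in the first
approximation.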
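4. The tail areas for the first approximation.
Let
(4.1)    $\phi(x) = K_1 \int_{x/2}^{1/2} w^{\frac{p-2}{2}} (1-w)^{-\frac{p-1}{2}} (1-2w)^{\frac{n-1}{2}}\,dw .$
The transformation y = 1-2w reduces it to
(4.2)    $\phi(x) = \frac{K_1}{\sqrt{2}} \int_0^{1-x} y^{\frac{n-1}{2}} (1-y)^{\frac{p-2}{2}} (1+y)^{-\frac{p-1}{2}}\,dy .$
This is the kind of integral we will have to evaluate for finding
probabilities of the type $P(w > \frac{x}{2})$.
We write the integral involved in (4.2) as:
(4.3)    $\mu(t) = \int_0^t y^{\frac{n-1}{2}}\, \lambda(y)\,dy$ ,
where $\lambda(y) = (1-y)^{\frac{p-2}{2}} (1+y)^{-\frac{p-1}{2}}$. Integration by parts gives
(4.4)    $\mu(t) = \left[\frac{2}{n+1}\, y^{\frac{n+1}{2}} \lambda(y)\right]_0^t - \frac{2}{n+1} \int_0^t y^{\frac{n+1}{2}} \lambda'(y)\,dy .$
This gives
(4.5)    $\mu(t) = \frac{2}{n+1}\, t^{\frac{n+1}{2}} \lambda(t) - \frac{2}{n+1} \int_0^t y^{\frac{n+1}{2}} \lambda'(y)\,dy .$
Performing another integration, we get
(4.6)    $\mu(t) = \frac{2}{n+1}\, t^{\frac{n+1}{2}} \lambda(t) - \frac{4}{(n+1)(n+3)}\, t^{\frac{n+3}{2}} \lambda'(t) + \frac{4}{(n+1)(n+3)} \int_0^t y^{\frac{n+3}{2}} \lambda''(y)\,dy .$
From (4.5) we can write
(4.7)    $\left|\mu(t) - \frac{2}{n+1}\, t^{\frac{n+1}{2}} \lambda(t)\right| = \left|\frac{2}{n+1} \int_0^t y^{\frac{n+1}{2}} \lambda'(y)\,dy\right|$ ,
where $\lambda(y) = (1-y)^{\frac{p-2}{2}} (1+y)^{-\frac{p-1}{2}}$, $0 \le y \le 1$. Now
(4.8)    $\lambda'(y) = -\frac{p-2}{2}\, (1-y)^{\frac{p-4}{2}} (1+y)^{-\frac{p-1}{2}} - \frac{p-1}{2}\, (1-y)^{\frac{p-2}{2}} (1+y)^{-\frac{p+1}{2}}$ ,
and this shows that the maximum value of $|\lambda'(y)|$ corresponds to the
minimum value of y, which is zero, and
(4.9)    $|\lambda'(y)| \le B = \frac{2p-3}{2} .$
By taking account of this bound we can write (4.7) as
(4.10)    $\left|\mu(t) - \frac{2}{n+1}\, t^{\frac{n+1}{2}} \lambda(t)\right| < \frac{2B}{n+1} \int_0^t y^{\frac{n+1}{2}}\,dy$ ,
which is the same as
(4.11)    $\left|\mu(t) - \frac{2}{n+1}\, t^{\frac{n+1}{2}} \lambda(t)\right| < \frac{2(2p-3)\, t^{\frac{n+3}{2}}}{(n+1)(n+3)} .$
The range of t is zero to one, and thus equation (4.11) asserts
that given $\epsilon$, we can find N such that
(4.12)    $\left|\mu(t) - \frac{2}{n+1}\, t^{\frac{n+1}{2}} \lambda(t)\right| < \epsilon$ for n > N.
$\frac{2}{n+1}\, t^{\frac{n+1}{2}} \lambda(t)$ can therefore be taken as an asymptotic
approximation of $\mu(t)$. Using this in (4.2) we get
(4.13)    $\phi(x) = \frac{2^{\frac{1}{2}}\, n^{\frac{p}{2}}}{\Gamma\left(\frac{p}{2}\right)(n+1)}\, (1-x)^{\frac{n+1}{2}} x^{\frac{p-2}{2}} (2-x)^{-\frac{p-1}{2}} \left[1 + O(n^{-1})\right] .$
We can write this as
(4.14)    $P\left(w > \frac{x}{2}\right) = \frac{2^{\frac{1}{2}}\, n^{\frac{p-2}{2}}}{\Gamma\left(\frac{p}{2}\right)}\, (1-x)^{\frac{n+1}{2}} x^{\frac{p-2}{2}} (2-x)^{-\frac{p-1}{2}} \left[1 + O(n^{-1})\right]$ ,
which provides a formula suitable for finding the tail areas of the
distribution of w.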
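The quality of the leading term from the integration by parts can be examined numerically. The sketch below is an illustration only: it takes lambda(y) = (1-y)^{(p-2)/2} (1+y)^{-(p-1)/2} as read from the text, uses the assumed value p = 4 (for which the slope of lambda is largest in absolute value at y = 0), and compares a Simpson estimate of mu(t) with the leading term and with the bound 2(2p-3) t^{(n+3)/2} / ((n+1)(n+3)). All names are hypothetical.

```python
def lam(y, p):
    """lambda(y) = (1 - y)^((p-2)/2) * (1 + y)^(-(p-1)/2)."""
    return (1.0 - y) ** ((p - 2) / 2.0) * (1.0 + y) ** (-(p - 1) / 2.0)

def mu_numeric(t, n, p, steps=2000):
    """Composite Simpson estimate of the integral of y^((n-1)/2) * lambda(y) over (0, t)."""
    h = t / steps
    f = lambda y: y ** ((n - 1) / 2.0) * lam(y, p)
    total = f(0.0) + f(t)
    for k in range(1, steps):
        total += (4.0 if k % 2 else 2.0) * f(k * h)
    return total * h / 3.0

def mu_leading(t, n, p):
    """Leading term 2/(n+1) * t^((n+1)/2) * lambda(t)."""
    return 2.0 / (n + 1) * t ** ((n + 1) / 2.0) * lam(t, p)

def error_bound(t, n, p):
    """Bound 2(2p-3) t^((n+3)/2) / ((n+1)(n+3)) on the neglected remainder."""
    return 2.0 * (2 * p - 3) * t ** ((n + 3) / 2.0) / ((n + 1) * (n + 3))

for n in (19, 99):
    exact, lead = mu_numeric(0.8, n, 4), mu_leading(0.8, n, 4)
    print(n, exact, lead, abs(exact - lead), error_bound(0.8, n, 4))
```

The bound controls the absolute error; the relative error of the leading term is what matters for tail areas and should be judged separately for the n and t of interest.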
5. Comparison with the results of Chapter III.
We have shown in this chapter that if we omit terms of order $\frac{1}{n}$
(5.1)    $f(w)\,dw = C\, w^{\frac{p-2}{2}} (1-w)^{-\frac{p-1}{2}} (1-2w)^{\frac{n-1}{2}}\,dw .$
To get the corresponding first approximation for the distribution of
the statistic v = nw, we put $w = \frac{v}{n}$ in this, and get
(5.2)    $f(v)\,dv = \frac{C}{n^{p/2}}\, v^{\frac{p-2}{2}} \left(1 - \frac{v}{n}\right)^{-\frac{p-1}{2}} \left(1 - \frac{2v}{n}\right)^{\frac{n-1}{2}} dv .$
Since
(5.3)    $\lim_{n \to \infty} \left(1 - \frac{2v}{n}\right)^{\frac{n-1}{2}} = e^{-v}$
and
(5.4)    $\lim_{n \to \infty} \left(1 - \frac{v}{n}\right)^{-\frac{p-1}{2}} = 1$ ,
we can write (5.2) as
(5.5)    $f(v)\,dv = \text{Const.}\; v^{\frac{p-2}{2}} e^{-v}\,dv \left[1 + O\left(\frac{1}{n}\right)\right]$ ,
which states that for large n, v is approximately distributed as
$\chi^2/2$ where the $\chi^2$ has p degrees of freedom.
This is not in agreement with the probability distribution of
v obtained in Chapter III, and this discrepancy can be easily explained
by the fact that the convergence of the series in z in (2.38)
was not guaranteed for the whole region of w. It was seen on pages
101-103, that convergence of the series in z from which we obtained
f(w) by integration would be obtained only if $w > \frac{1}{4}$. However, it
appears that the two distributions would give close results if we are
interested in the tail areas. As an illustration we shall find approximately
the 5 % point for p = 4 by the results of the two chapters.
Example. To find x such that P(v > x) = .05 in the two
cases
(1)
From table 7, [35] we have the following values for the probability
integral of the $\chi^2$-distribution, giving $P(\chi^2 > \chi_0^2)$.

  chi-square_0   7.8      8.0      8.2      8.4      8.6      8.8
  d.f.
     2         .02024   .01832   .01657   .01500   .01357   .01228
     4         .09919   .09158   .08452   .07798   .07191   .06630
    Sum        .11943   .10990   .10109   .09298   .08548   .07858
  1/2 Sum      .05972   .05495   .05055   .04649   .04274   .03929

The last row gives probabilities of the type $P(2v > \chi_0^2)$ or
$P(v > v_0)$ say, and shows that approximately P(v > 4.1) = .05 if we
use (1). Also from the same table we find that P(v > 3.9) = .05 if
we use (2).
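The half-sum row can be reproduced, and the 5 % point refined, from the closed form of the tail area of the Chapter III density for p = 4. The sketch below assumes that density to be (e^{-v} + v e^{-v}) / 2, so that P(v > x) = e^{-x}(1 + x/2); the bisection helper and its name are illustrative and not part of the original text.

```python
import math

def tail_ch3_p4(x):
    """P(v > x) when f(v) = (e^{-v} + v e^{-v}) / 2."""
    return math.exp(-x) * (1.0 + x / 2.0)

def upper_point(alpha, lo=0.0, hi=50.0, iters=100):
    """Bisection for the x with tail_ch3_p4(x) = alpha; the tail is decreasing in x."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if tail_ch3_p4(mid) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for chi0 in (7.8, 8.0, 8.2, 8.4, 8.6, 8.8):
    print(f"{chi0:.1f}  {tail_ch3_p4(chi0 / 2.0):.5f}")
print("5 % point of v:", upper_point(0.05))
```

The bisection result falls between 4.1 and 4.2, consistent with the value read off the half-sum row.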
6. Summary of Chapter IV.
In this chapter we have considered the problem of obtaining an
asymptotic series for the distribution of $w = |m_3|$ by starting with
the joint distribution of $m_1$, $m_2$ and $m_3$ in the null case. The desired
distribution is obtained by integrating out $m_1$ and $m_2$ over
a lens shaped region enclosed by the two hyperbolas $m_1 m_2 = w^2$ (a
constant) $= (1-m_1)(1-m_2)$. To perform the two integrations, one with
respect to $x = \frac{1}{2}(m_1 + m_2)$ and the other with respect to $y = \frac{1}{2}(m_1 - m_2)$,
we have, at each stage, regarded the other variables as constants and
found a transformation which changes the integrand into a function of
the form $(1-z)^b \sum_i c_i z^i$, where b is a large number, and where z
varies from zero to one. This leads us to a result of the type
(6.1)    $f(w) = K\left[f_1(w) + \frac{1}{n} f_2(w) + \frac{1}{n^2} f_3(w) + \cdots\right] .$
First three terms of the asymptotic series have been obtained in
this manner. For the first approximation we have also found the constant
of integration, and discussed the method of finding tail areas.
In Section 5 we have compared the results of this chapter with
those of Chapter III.
CHAPTER V
THE APPLICATION OF TCHEBYCHEFF-MARKOFF INEQUALITIES
TO A SPECIAL CASE
1. Introduction.
This chapter will be confined to the discussion of the
special case p = 3 and $N_1 + N_2 = 20$. In this case, starting
from the joint distribution of $m_1$, $m_2$ and $m_3$, we shall find the
moments of the exact statistic V. These moments will then be
used in setting up bounds to probabilities of the type $P(V \le \xi)$
by the use of some investigations due to Tchebycheff and Markoff
[48], [49]. This will provide, on the one hand, some exact
results of some importance for this case, and on the other hand
illustrate what can be done when the first few moments of an
unknown distribution are known.
2. The integral over $D_1$.
As before, we shall denote by $I_1$ and $I_2$ the integrals
over $D_1$ and $D_2$. Thus
Expanding the integral by the use of the binomial theorem, we
get
•(2.2) Iff
117
Each of the integrals in this sum can be calculated by
1/2first putting m,= (m1m2t) to integrate with respect to t,
then following the procedure of section 6, Ctnapter II. The result
is
(2.') II = rr. I 901
16 +1
11 + 12 +• 32 . ,0 • 48 • 88 •
1 1• 8 +
116 • 98 • 12 • 11 . l} + ;2 • 26 • 88 32 • 1,0 • 64 • 6
1 + 1 7+ ~1~28n--·~3~2-·~1~2~0---.~8 256. 128 • 16 • 17 - ,
which is therefore the integral over Dl •
3. The integral over $D_2$.
This has been found exactly in Chapter II, but an independent
derivation based on geometrical considerations could be
given here.
The value of $I_2$ we want will be obtained by putting n = 18.
•(3.2)
Then
In (3.1) put $u = (m_1 + m_2)/\sqrt{2}$
and $v = (m_1 - m_2)/\sqrt{2}$.
Then let $v = r\cos\theta$
and $\sqrt{2}\, m_3 = z = r\sin\theta$.
11:
1(3.3) I = -
2 !2/
./2 Ju2+ 2(1- 12 u) .2n 2 2 n-4
/ j ur 2·(1- V2u+ ; ) 2d9drdu .
1· 9=0u=- r=O
Integration with respect to $\theta$ and r is immediate, and yields
{2
(3.4) 1 2 = 2n1' J1
12
2 n-2.Tr5 u 2
(1- V2 u + 2) du.
Puttingu = 02x and integrating we obtain,I = 'J(
2 (n_1)(n_2)2n-3
and for n = 18, it reduces to
•(3.6) 1t1
2;=: --..-;.;.---=-......
17 • 16 • 215
4. The integral over D.
The value of I, the integral over D, is
(4.1) = 18 • 17 . 4
5. Moments of V.
The kth moment $\mu_k$ is given by:
Due to the symmetry of $f(m_1, m_2, m_3)$, the joint density function of
$m_1$, $m_2$ and $m_3$, in $m_3$, $E(V^k) = 0$ for odd values of k.
Putting k = 2, 4 and 6 in (5.1) and integrating as in
Section 2, we get the following moments:
(5.2)
    $\mu_2 = 6.5571637176$,
    $\mu_4 = 459.6304942728$,
    $\mu_6 = 256661.8465464$.
We know that $|V| > |n m_3|$ since V equals $n m_3$ divided by a
quantity which cannot exceed one, and in fact remains less than
one inside D. Thus, since the range of $18 m_3$ is from -9 to 9,
the range of V in the case under consideration is larger, say from
-a to a where a > 9.
We shall now use the moments obtained above in setting up
bounds for $P(V < \xi)$ to get an idea of the exact sampling distribution
of V.
6. Some results due to Tchebycheff and Markoff.
We shall, in this section, state without proof the results
which lead to the historic inequalities which were announced by
Tchebycheff and proved by Markoff, and which we shall use with the
moments obtained in the preceding section.
Theorem I. Any three consecutive polynomials in an arbitrary
sequence $\{p_i(x)\}$ of orthogonal polynomials satisfy the relation
(6.1)    $p_m(x) = (a_m x + b_m)\, p_{m-1}(x) - c_m\, p_{m-2}(x)$ ,
where $p_m(x)$ stands for the mth orthogonal polynomial. Here
$a_m$, $b_m$ and $c_m$ are constants, $a_m > 0$ and $c_m > 0$. If the highest
coefficient of $p_m(x)$ is denoted by $k_m$, we have
    $a_m = \frac{k_m}{k_{m-1}}$ and $c_m = \frac{a_m}{a_{m-1}}$ .
The recurrence formula (6.1) is also true for m = 1 if we
define $p_{-1}(x) = 0$.
Theorem II. The roots of the equation $p_m(x) = 0$, where $p_m(x)$ is
the orthogonal polynomial of degree m associated with the weight
function $\alpha(x)$ on the interval (a,b), are all real and distinct;
and all of them lie in the range of definition of the polynomials.
Theorem III. If
(6.2)    $\omega_m(z) = \int_a^b \frac{p_m(z) - p_m(x)}{z - x}\, d\alpha(x)$ ,
where $\alpha(x)$ is the weight function of the system of polynomials, then
$\omega$'s satisfy the same relation (6.1) as the p's, though with different
initial conditions.
Definition. Let
(6.4)    $\frac{\omega_m(z)}{p_m(z)} = \sum_{i=1}^{m} \frac{\rho_i}{z - c_i}$ ,
where $\omega_m(z)$ is of degree m-1 in z, whereas $p_m(z)$ is of degree m.
The $c_i$ are the roots of $p_m(z) = 0$. Suppose $a < c_1 < c_2 < \cdots < c_m < b$,
where (a,b) is the range of the basic function $\alpha(x)$. Then $\rho_i$ are
called the Christoffel numbers.
Theorem IV. The Christoffel numbers are positive, and
    $\sum_{i=1}^{m} \rho_i = \int_a^b d\alpha(x) = \alpha(b) - \alpha(a) .$
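Note: Because of Theorem IV there exist numbers $d_1 < d_2 < \cdots < d_{m-1}$
lying between a and b such that
    $\rho_i = \alpha(d_i) - \alpha(d_{i-1})$, i = 1, 2, ..., m, with $d_0 = a$ and $d_m = b$.
Theorem V. The roots $c_1 < c_2 < \cdots < c_m$ alternate with the numbers
$d_1 < d_2 < \cdots < d_{m-1}$, that is
    $c_i < d_i < c_{i+1}$ ;
more precisely
    $\alpha(c_i + 0) - \alpha(a) < \sum_{j=1}^{i} \rho_j < \alpha(c_{i+1} - 0) - \alpha(a)$, i = 1, 2, ..., m - 1.
That is if F(x) is the class of cumulative distributions having
the given moments, then
    $\int_a^{c_i} dF(x) < \int_a^{d_i} dF(x) = \sum_{j=1}^{i} \rho_j < \int_a^{c_{i+1}} dF(x) .$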
7. Application of Tchebycheff-Markoff Theorems to this Example.
We have the following matrix of the moments of the distribution
studied in this chapter:
$$\begin{pmatrix} \mu_0 & \mu_1 & \mu_2 & \mu_3 & \mu_4 \\ \mu_1 & \mu_2 & \mu_3 & \mu_4 & \mu_5 \\ \mu_2 & \mu_3 & \mu_4 & \mu_5 & \mu_6 \\ \mu_3 & \mu_4 & \mu_5 & \mu_6 & \mu_7 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 6.557163716 & 0 & 459.630494 \\ 0 & 6.557163716 & 0 & 459.630494 & 0 \\ 6.557163716 & 0 & 459.630494 & 0 & 256661.8465 \\ 0 & 459.630494 & 0 & 256661.8465 & 0 \end{pmatrix}$$
in which all four principal diagonal matrices are positive definite.
Let
(7.1)
be one of the orthogonal polynomials in the sequence corresponding
to the frequency function of V which gives rise to the moments found
in Section 5. Then, by the definition of orthogonal polynomials,
(7.2)    $\int P_4(x)\, G_k(x)\, d\alpha(x) = 0$ for every polynomial $G_k(x)$ of degree k < 4.
Taking $G_k(x) = x^k$ for k = 0 and 2 and noting that odd moments are
zero, we get the following equations for finding the coefficients in
(7.1):
(7.3)    $a_0 + 6.5571637176\, a_2 + 459.6304942728\, a_4 = 0$
and
    $6.5571637176\, a_0 + 459.6304942728\, a_2 + 256661.8465464\, a_4 = 0 .$
Taking $a_4 = 1$ in (7.3), we obtain
(7.4)    $a_0 = 3532.38864$
and
    $a_2 = -608.802724$
as the solution of (7.3).
l~
Solving P4(x) = 0 we get the following four values of x ,
arranged in increasing order of magnitude
Now the function $\omega_4(z)$ can be found using its definition
given in (6.2), which gives
(7.7)    $\omega_4(z) = \int \frac{(z^4 - x^4) + a_2 (z^2 - x^2)}{z - x}\, d\alpha(x)$ ;
and, using (5.2) and (7.4), this becomes
(7.8)
The Christoffel Numbers.
The Christoffel numbers as defined in Section 6 are the
numbers $\rho_i$ given by
(7.9)    $\frac{\omega_m(z)}{p_m(z)} = \sum_{i=1}^{m} \frac{\rho_i}{z - c_i} .$
So we have to split
(7.10)
into partial fractions. Write (7.10) as
(7.11)    $\frac{\rho_1}{z + 34.726} + \frac{\rho_2}{z + 3.423} + \frac{\rho_3}{z - 3.423} + \frac{\rho_4}{z - 34.726} .$
Comparing the coefficients of like powers of z in (7.10) and
(7.11) we get
    $\rho_1 = \rho_4 = .251$
    $\rho_2 = \rho_3 = .248$
approximately, where $\rho_i$ corresponds to $c_i$, the ith root of the
equation (7.5).
Thus we get the following table giving bounds to probabilities
by using the formula (6.5).
Table 7.12

  Limits for xi                Bounds for P(V < xi) = P
  xi < -34.726                 P < .251
  -34.726 < xi <= -3.423       0 < P < .499
  -3.423 < xi <= 3.423         .251 < P <= .747
  3.423 < xi <= 34.726         .499 < P <= 1
  34.726 < xi                  .747 < P <= 1
It can be seen that the bounds given above are far from
being close. For obtaining bounds which are sufficiently close
and therefore useful we would have to calculate a large number of
moments. The labor involved in finding enough moments, and proceeding
with subsequent investigation based on those, however, is
prohibitive of any such investigation in these pages.
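The chain from moments to roots and Christoffel numbers can be retraced mechanically. The following sketch is an independent recomputation under stated assumptions, not the author's working: it takes mu_0 = 1, zero odd moments, the even moments as printed in Section 5 (with mu_6 = 256661.8465464), the form P4(x) = x^4 + a2 x^2 + a0, and omega4(z) = z^3 + (a2 + mu_2) z, which follows from the definition of omega for a symmetric distribution. All function names are hypothetical.

```python
import math

MU2 = 6.5571637176
MU4 = 459.6304942728
MU6 = 256661.8465464

def quartic_coefficients(mu2, mu4, mu6):
    """a0, a2 of P4(x) = x^4 + a2 x^2 + a0, from orthogonality to 1 and x^2."""
    a2 = (mu2 * mu4 - mu6) / (mu4 - mu2 * mu2)
    a0 = -mu2 * a2 - mu4
    return a0, a2

def nodes_and_weights(mu2, mu4, mu6):
    """Roots of P4 and the Christoffel numbers (residues of omega4 / P4)."""
    a0, a2 = quartic_coefficients(mu2, mu4, mu6)
    disc = math.sqrt(a2 * a2 - 4.0 * a0)
    y_small, y_large = (-a2 - disc) / 2.0, (-a2 + disc) / 2.0  # roots in x^2
    c_small, c_large = math.sqrt(y_small), math.sqrt(y_large)
    # residue at a root c with c^2 = y: (y + a2 + mu2) / (4 y + 2 a2)
    weight = lambda y: (y + a2 + mu2) / (4.0 * y + 2.0 * a2)
    nodes = [-c_large, -c_small, c_small, c_large]
    weights = [weight(y_large), weight(y_small), weight(y_small), weight(y_large)]
    return nodes, weights

a0, a2 = quartic_coefficients(MU2, MU4, MU6)
nodes, weights = nodes_and_weights(MU2, MU4, MU6)
print("a0, a2:", a0, a2)
print("roots:", nodes)
print("Christoffel numbers:", weights)
print("cumulative sums:", [sum(weights[: i + 1]) for i in range(4)])
```

With the moments as printed, this sketch reproduces a0 and a2 of (7.4) but gives roots near 2.42 and 24.56 in absolute value, smaller than the printed 3.423 and 34.726 by a factor close to the square root of 2, and Christoffel numbers near 0.4994 and 0.0006. The printed roots, Christoffel numbers and the bounds of Table 7.12 may therefore be worth rechecking against the original computation.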
CHAPTER VI
NON-NULL CASE
1. Introduction.
This chapter will be devoted to the study of the non-null case.
In the first few sections we will consider the joint probability
distribution of $m_1$, $m_2$ and $m_3$ given by Sitgreaves [45]. This
distribution corresponds to the statistic
and has been obtained under the restriction that the mean vectors of
the two populations are proportional to each other. For large n we
shall convert this into a different form. Some of the difficulties in
proceeding beyond that point will also be discussed.
The next section will deal with the distribution of
$U = \sum_i \sum_j s^{ij} z_i (\bar{y}_j - \bar{x}_j)$ for the case p = 1, on the assumption that
n is large so that $s^2$ can be replaced by $\sigma^2$. This assumption
reduces U to the product of two normal variates whose distribution is
known; see for instance [2], [8] and [27]. It has not been
possible to extend this to the case p > 1.
In Section 7 of this chapter we have exemplified the differential
method which was quite popular with statisticians a few years ago. The
illustration deals with the finding of approximations to the mean and
variance of the statistic U for large samples by taking into
account the sampling fluctuations of the sample means and covariances.
Higher moments can also be found but the algebra involved is very
heavy.
The concluding section of the chapter deals with a practical
suggestion for modifying the variance of the discriminant function of
R. A. Fisher by taking into account the sampling fluctuations of the
means. The sample covariances can be taken as the population
covariances when n is large. Thus the statistic U in this case
becomes
2. The joint distribution.
The joint distribution of $m_1$, $m_2$ and $m_3$ given by Sitgreaves is
where
00
~
j=O
fC n+2 .) 2'-2+ J A. J 2C-) Ck In..
fCE + j)jl 2 1 ~2
.....
~ m3
M= ' -1, A. => 5 Z 0.9
l.....m3m2
130
and ~5 and k20 are the mean vectors.
Using the not8tion of the confluent hypergeometric series,
we can write (2.1) - as
p-3 n-p-l
1M! 3" II-~/f I 2
where
F( n+2 p x)2' -" ,
The function F(a,c,x) is also written as $\Phi(a,c,x)$ or as ${}_1F_1(a,c,x)$,
and is known as the confluent hypergeometric function.
3. Notes for reference on confluent hypergeometric functions.
Consider the hypergeometric series
(3.1)    $F(a,b,c,z) = 1 + \frac{a \cdot b}{c}\, \frac{z}{1!} + \frac{a(a+1)\, b(b+1)}{c(c+1)}\, \frac{z^2}{2!} + \cdots$ ,
in which we suppose that both a and c are positive. $F(a,b,c,\frac{x}{b})$
gives a power series with b as the radius of convergence. It
defines an analytic function with singularities at 0, b, and $\infty$. The
limiting case of this series as $b \to \infty$ defines an entire function
whose singularity at $\infty$ is the confluence of two singularities of
$F(a,b,c,\frac{x}{b})$ and which can be written as
(3.2)    $F(a,c,x) = 1 + \frac{a}{c}\, \frac{x}{1!} + \frac{a(a+1)}{c(c+1)}\, \frac{x^2}{2!} + \cdots$
It satisfies the confluent hypergeometric equation
(3.3)    $x\, \frac{d^2 y}{dx^2} + (c - x)\, \frac{dy}{dx} - a y = 0 .$
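The two series and the limiting relation between them can be illustrated by direct summation. The sketch below is a plain partial-sum implementation for moderate arguments, not a general-purpose routine; the function names, term counts and sample parameter values are assumptions made for illustration.

```python
import math

def hyp1f1(a, c, x, terms=200):
    """Partial sum of 1 + (a/c) x/1! + a(a+1)/(c(c+1)) x^2/2! + ..."""
    total, term = 1.0, 1.0
    for k in range(terms):
        term *= (a + k) / (c + k) * x / (k + 1)
        total += term
    return total

def hyp2f1(a, b, c, z, terms=400):
    """Partial sum of the hypergeometric series F(a,b,c,z), valid for |z| < 1."""
    total, term = 1.0, 1.0
    for k in range(terms):
        term *= (a + k) * (b + k) / ((c + k) * (k + 1)) * z
        total += term
    return total

a, c, x = 2.0, 1.5, 0.7
print("F(a,c,x)            :", hyp1f1(a, c, x))
for b in (1e2, 1e4, 1e6):
    print(f"F(a,b,c,x/b), b={b:g}:", hyp2f1(a, b, c, x / b))
print("check a = c gives e^x:", hyp1f1(1.5, 1.5, x), math.exp(x))
```

As b grows the second column approaches the first, which is the confluence described above; the special case a = c reduces the series to the exponential and offers a simple correctness check.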
According to Bateman [4], the asymptotic behavior of $\Phi(a,c,x)$
as $a \to \infty$ has been discussed by Perron, Tricomi and Taylor.
An asymptotic form of F uniformly valid in the neighborhood of x = 0
given by Taylor is
(3.4)    $F(a,c,x) = \Gamma(c)\, (\kappa x)^{\frac{1}{2} - \frac{c}{2}}\, e^{\frac{x}{2}}\, J_{c-1}\left[2(\kappa x)^{\frac{1}{2}}\right] + O(\kappa^{-1})$ ,
where c and $\kappa x$ are bounded, and $\kappa = c/2 - a$, and J is the notation
for Bessel functions.
If : x is lJounddd and bounded away from zero and
arg x - arg K < n , then
1 x 3 c 3 Q
F( ) ,_,,2r(c)e2 K4 - 2 xIi - 2a,C,x =III2 "2-2
2i(Kx) -2i(Kx) (K)cle + c2e + x
I I22
ei.-exD Im(2T( x ) ,
where with s an integer, we have
.. ~ in(e- t)(2C-1) xc1=(2rt) e
e. <
and
1
arg(Kx)~ ~ (?s+l) 'it' - e.
1
(2s-1) ri + e ~ arg(Kx)2 ~ (2§+?) n - e ,
and where Im(y) denotes the imaginary part of y. The first of these
results will be used in simplifying the distribution given in (2.3).
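For large n, we have, by using (3.6),
(4.1)    $F\left(\frac{n+2}{2}, \frac{p}{2}, x\right) = \Gamma\left(\frac{p}{2}\right) \left(-\frac{(2n+4-p)\,x}{4}\right)^{\frac{2-p}{4}} e^{\frac{x}{2}}\, J_{\frac{p}{2}-1}\left[i\sqrt{(2n+4-p)\,x}\right] + O\left(\frac{1}{2n+4-p}\right) .$
Let
(4.2)    p = 2 + 4q
where q is an integer. Then, for large n,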
( ) n+2 p )).+.3 F(2'~'x
x
r(2Q+l)(2Q-n-l)-qe2J 2c1"i v'('2n+2-4q)x _7 •
Using the relation
(4.4)    $I_n(z) = i^{-n} J_n(iz)$ ,
where $I_n(z)$ stands for Bessel functions with purely imaginary argument,
(4.3) becomes
x
(4.5)F(n;2,~,x) (-1)Qr(2Q+l)(2Q-n-l)-Qe212qL-V(2n-4Q+2)X _7 .
Using this, we can write the joint density function of $m_1$, $m_2$ and $m_3$
as
2_ ~ (ki+k~) p~? n-~-l
(4.6)f(mlm2m3)~d~dm3=ce IMI I I-MI
eXI p _2 l'"J( 2n~4-p )x '-r.&'].dm~dm.3 •-r
for all p satisfying (4.2); where
4A. The difficulties in proceeding further.
Various methods have been tried to proceed beyond this point,
but none seems to work well. The main difficulty, even at this stage,
is that the coefficients in the expansion of the Bessel function
involved are increasing. As a consequence, one would not be
justified in omitting terms in the expansion of
$I_{(p-2)/2}\!\left[\sqrt{(2n+4-p)x}\,\right]$ beyond the first few in order to
discuss the distribution of $nm_3$. The difficulty would probably be
removed if we considered small values of n and tried to integrate over
the lens-shaped region of Chapter IV to find the distribution of $m_3$,
but the objection to that would be that $m_3$ or $nm_3$ is not a suitable
statistic for small values of n. This discussion, therefore, had to be
left at this point.
5. The distribution of U for large n and p = 1, an independent
approach.
The statistic U reduces to $z(\bar y - \bar x)/\sigma^{2}$ for large n, since
$s^{2}$ is then found from a large sample, and can therefore be replaced
by $\sigma^{2}$, to which it approximates. This does not imply that the
sample means can also be replaced by their population values, since
for n to be large it is enough that one of the sample sizes is large.
Moreover, none of the means has as many degrees of freedom as the
variance.
The distribution of $z(\bar y - \bar x)/\sigma^{2}$ can be found under both the
hypotheses
(1)
as follows.
z e)[l which is
z e)(2 which is
Let Z 6 lTl • The statistic U can be taken as the product of
two normal variates z which is * y-xz =2 which isa
(~N 2a
We can, instead of z and z*, consider the variables u and
v, where u is N(m,a2) and12
v is N(O,a ), where
12a = \I-lJ.and m => lJ. .. T or
a
\I-~ \T\I ..~ , according as z e )'1 or
a
z eJT2 •
The distribution of the product of two independent normal variates
is known from the work of C. C. Craig [8], Aroian [2] and others,
but for the sake of completeness we shall include a derivation.
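Before the distribution itself is derived, the first two moments of such a product give a quick consistency check. The sketch below is an illustration only: it uses the standard moment formulas for a product of two independent normal variates and a seeded simulation, and the function names, parameter values and seed are all hypothetical.

```python
import random


def product_moments(m_u, s_u, m_v, s_v):
    """Exact mean and variance of u*v for independent u ~ N(m_u, s_u^2), v ~ N(m_v, s_v^2)."""
    mean = m_u * m_v
    var = m_u ** 2 * s_v ** 2 + m_v ** 2 * s_u ** 2 + s_u ** 2 * s_v ** 2
    return mean, var


def simulate_product(m_u, s_u, m_v, s_v, draws=200_000, seed=1956):
    """Sample mean and variance of u*v from a seeded pseudo-random simulation."""
    rng = random.Random(seed)
    xs = [rng.gauss(m_u, s_u) * rng.gauss(m_v, s_v) for _ in range(draws)]
    mean = sum(xs) / draws
    var = sum((x - mean) ** 2 for x in xs) / (draws - 1)
    return mean, var


if __name__ == "__main__":
    print(product_moments(1.0, 1.0, 2.0, 0.5))
    print(simulate_product(1.0, 1.0, 2.0, 0.5))
```

Any candidate density for the product can be tested against these two moments before it is used further.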
Definition. x is said to be a Bessel variate if it is distributed
as
\[ f(x)\,dx = C\,x^{\frac{p-1}{2}}\, e^{-ax}\, I_{p-1}\!\left(b x^{1/2}\right) dx, \tag{5.1} \]
where $I_{p-1}(b x^{1/2})$ is the modified Bessel function of the first kind.
We shall now state without proof two lemmas.
Lemma 1. If x is $N(m,\sigma^{2})$, then $\chi'^{2} = x^{2}/\sigma^{2}$ is a Bessel variate.
In fact, if $\lambda^{2} = m^{2}/\sigma^{2}$, then
,
which shows that $\chi'^{2}$ is a Bessel variate with $a = \tfrac{1}{2}$, $b = \lambda$.
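Lemma 1 can be checked numerically in the standardized case. The sketch below is not the author's proof: it compares the density of $x^{2}$ for $x \sim N(\lambda, 1)$, obtained by folding the normal density, with the form (5.1) using $a = \tfrac12$, $b = \lambda$ and, as an inference of this sketch rather than a value stated in the lemma, $p = \tfrac12$ with $C = \tfrac12 e^{-\lambda^{2}/2}\lambda^{1/2}$. Function names are hypothetical.

```python
import math


def bessel_i(nu, z, terms=60):
    """Power series for the modified Bessel function I_nu(z), real z > 0."""
    return sum(
        (z / 2) ** (2 * k + nu) / (math.factorial(k) * math.gamma(k + nu + 1))
        for k in range(terms)
    )


def density_direct(w, lam):
    """Density of w = x^2 for x ~ N(lam, 1), by folding the normal density."""
    r = math.sqrt(w)

    def phi(t):
        return math.exp(-0.5 * t * t) / math.sqrt(2 * math.pi)

    return (phi(r - lam) + phi(r + lam)) / (2 * r)


def density_bessel(w, lam):
    """Form (5.1): C w^((p-1)/2) e^(-a w) I_{p-1}(b sqrt(w)) with p = 1/2, a = 1/2, b = lam."""
    p, a, b = 0.5, 0.5, lam
    const = 0.5 * math.exp(-0.5 * lam ** 2) * lam ** 0.5  # assumed normalizing constant
    return const * w ** ((p - 1) / 2) * math.exp(-a * w) * bessel_i(p - 1, b * math.sqrt(w))


if __name__ == "__main__":
    for w in (0.3, 2.0, 5.0):
        print(w, density_direct(w, 1.2), density_bessel(w, 1.2))
```

Agreement of the two columns at several points supports the identification of the parameters, since $I_{-1/2}(z) = \sqrt{2/(\pi z)}\,\cosh z$ reduces the Bessel form to the folded normal density.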
Lemma 2. If $x_1$ and $x_2$ are two independent Bessel variates with
respective distributions
\[ f(x_j)\,dx_j = c\,x_j^{\frac{p-1}{2}}\, e^{-x_j}\, I_{p-1}\!\left(b x_j^{1/2}\right) dx_j, \qquad (j = 1, 2), \]
then the distribution of ~ = xl - x2
is given by
_-b,2 2p-l2 ~ ~
f(~)d~== ~---(-);n 2
~o (E)2r ~2 ( )£J ----"'- K 1 ~ d~,
r==O 2 rtr(p+r) p+r--2
where $K_m$ is the modified Bessel function of the second kind as
defined by Watson, or in Whittaker and Watson [54, p. 373].
The distribution of U == uv where u is
Let u V'!'"l::t-';"-'I 0- 0-'
and y u v== -
<'.J 0-
where 1") is N(~, 12) and t is N(~, /2).0- 0-
Thus by lemma 1, both
1")2 and ~ 2 are Bessel varia:. ss with2 1
a = 1, b = m2 and p = 2 as0-
parameters; and by Lemma 2, therefore, the distribution of the product
is given by
00
Zr=O (1) 2rrtr r+ 2 2
and by noticing that
we can rewrite this as
1 m2:'2 .~
0-f(n )dU == _8__
n
00 2r Ifz (E1) ...!.- K (U)dU
r=O () ( 2r ) 1 r
v-IJ.Replacing m by IJ. - ~(j'
I
v-~"""and by v - ---, we get the distri2
0-
butions under the two hypotheses.
6. The asymptotic mean and variance of the statistic
$U = \sum_i \sum_j s^{ij} z_i(\bar y_j - \bar x_j)$ by the differential method.
We shall, in this section, find the mean and variance of U,
approximately for large samples, by a method which was formerly quite
popular and still is sometimes used. The object of the section is
mainly to exemplify this method, which can sometimes be applied in
getting moments of an unknown distribution. Some of the sets of
conditions under which the method is applicable are discussed by Cramér
[9] in Chapter 27, but we shall, like statisticians in the past,
apply it without stopping to verify the validity of the application.
Because of the heavy algebra involved it will be enough if we
confine ourselves to the discussion of the first two moments.
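The idea of the differential method can be seen in the simplest univariate case. The sketch below is an illustration and not the computation carried out in this section: it assumes that $n s^{2}/\sigma^{2}$ is a chi-square variate with n degrees of freedom, expands $1/s^{2}$ about $\sigma^{2}$ to the second order, and compares the result with the exact expectation. Function names are hypothetical.

```python
def approx_mean_inverse_variance(n, sigma2=1.0):
    """Second-order expansion: E g(s^2) ~ g(sigma^2) + (1/2) g''(sigma^2) Var(s^2), g(t) = 1/t."""
    var_s2 = 2.0 * sigma2 ** 2 / n       # Var(s^2) under the chi-square assumption
    g = 1.0 / sigma2
    g_second = 2.0 / sigma2 ** 3         # second derivative of 1/t at t = sigma^2
    return g + 0.5 * g_second * var_s2


def exact_mean_inverse_variance(n, sigma2=1.0):
    """Exact E(1/s^2) = n / ((n - 2) sigma^2), valid for n > 2."""
    return n / ((n - 2.0) * sigma2)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, approx_mean_inverse_variance(n), exact_mean_inverse_variance(n))
```

The first-order term vanishes because $E(ds^{2}) = 0$, so the leading correction comes from the second derivative; the approximation $1 + 2/n$ differs from the exact $n/(n-2)$ only by terms of order $1/n^{2}$.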
Let
\[ U = \sum_i \sum_j s^{ij} z_i(\bar y_j - \bar x_j) \tag{6.1} \]
be written as
\[ U = \sum_i b_i z_i , \tag{6.2} \]
where
(6.3)
We define
•
Then
ds ..lJ
Following this definition, we let
(6.6)
where
,
We note that
(6.8)
To find $E(ds^{ij})$:
Let $s^{ij}$ be expanded in Taylor's series. We have
Therefore
... .
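Since $E(ds_{kl}) = 0$, this reduces to
\[ E(ds^{ij}) = \frac{1}{2}\, E\!\left[\sum_k \sum_l \sum_r \sum_t \frac{\partial^{2}\sigma^{ij}}{\partial\sigma_{kl}\,\partial\sigma_{rt}}\, ds_{kl}\, ds_{rt} + \cdots \right]. \tag{6.10} \]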
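To evaluate this, we have to find
\[ \frac{\partial^{2}\sigma^{ij}}{\partial\sigma_{kl}\,\partial\sigma_{rt}} \quad\text{and}\quad E(ds_{kl}\, ds_{rt}). \]
The latter of these is known from Hotelling [23] as
\[ E(ds_{kl}\, ds_{rt}) = \frac{\sigma_{kr}\sigma_{lt} + \sigma_{kt}\sigma_{lr}}{n}. \tag{6.12} \]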
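Formula (6.12) is easy to evaluate for any covariance matrix. A minimal sketch follows; the function name is hypothetical and the indices are 0-based, which is an assumption of this example only.

```python
def cov_of_sample_covariances(sigma, k, l, r, t, n):
    """(sigma_kr sigma_lt + sigma_kt sigma_lr) / n, the value given by (6.12)."""
    return (sigma[k][r] * sigma[l][t] + sigma[k][t] * sigma[l][r]) / n


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    n = 10
    print(cov_of_sample_covariances(identity, 0, 0, 0, 0, n))  # variance of s_11
    print(cov_of_sample_covariances(identity, 0, 1, 0, 1, n))  # variance of s_12
    print(cov_of_sample_covariances(identity, 0, 0, 1, 1, n))  # covariance of s_11 and s_22
```

With the identity covariance matrix the formula gives $2/n$ for the variance of a diagonal element and $1/n$ for an off-diagonal element, while distinct diagonal elements are uncorrelated.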
To find the second order derivatives involved we proceed as
follows:
Consider the identities
\[ \sum_i \sigma^{ji}\sigma_{ik} = \delta^{j}_{k} = \begin{cases} 1 & \text{if } k = j \\ 0 & \text{if } k \neq j. \end{cases} \tag{6.13} \]
Differentiating (6.13) partially with respect to $\sigma_{\alpha\beta}$, we have
(6.14)
where
(6.15)
Using
if i = k = a = ~
otherwise
(6.16) kmZ. O"kk J.
= orr:J.
in (6.14) and simplifying, we get
(6.17) ,
which provides a formula for the first derivatives.
If the covariance matrix $\Sigma$ is the identity matrix, as can be
supposed for the statistic U, which is known to be invariant under
non-singular linear transformations, then the $\sigma$'s can be replaced by
Kronecker deltas with the same suffixes. Thus (6.17) simplifies to
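which states that
\[ \frac{\partial\sigma^{ii}}{\partial\sigma_{ii}} = -1, \tag{6.19} \]
\[ \frac{\partial\sigma^{ij}}{\partial\sigma_{ij}} = -1, \tag{6.20} \]
and that the derivative with respect to any other element is zero.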
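The behavior of the first derivatives at the identity matrix can be checked by finite differences. The sketch below is an illustration under stated assumptions: it works with a 2 x 2 matrix, perturbs a single entry $(\alpha, \beta)$ while holding every other entry fixed, and uses 0-based indices; the function names are hypothetical.

```python
def inverse_2x2(m):
    """Inverse of a 2 x 2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def d_inverse_at_identity(i, j, alpha, beta, h=1e-6):
    """Central-difference estimate of the derivative of element (i, j) of the inverse
    with respect to element (alpha, beta) of the matrix, evaluated at the identity."""

    def inverse_entry(eps):
        m = [[1.0, 0.0], [0.0, 1.0]]
        m[alpha][beta] += eps
        return inverse_2x2(m)[i][j]

    return (inverse_entry(h) - inverse_entry(-h)) / (2 * h)


if __name__ == "__main__":
    print(d_inverse_at_identity(0, 0, 0, 0))  # diagonal element with respect to itself
    print(d_inverse_at_identity(0, 1, 0, 1))  # off-diagonal element with respect to itself
    print(d_inverse_at_identity(0, 0, 0, 1))  # any other pairing
```

The first two values are close to $-1$ and the third is close to zero, which matches the pattern described in the text for the identity covariance matrix.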
To obtain the second order derivatives we differentiate partially with
respect to $\sigma_{\gamma\epsilon}$ the equation
(6.21)
This gives
Using (6.21) in (6.22) and replacing the $\sigma$'s by $\delta$'s as before, we
obtain
This gives
,2 i1... (]
". 2o ('Ji1
-= 6
(6.25)2 ij(]
--"2:- == 0,(1ij
(i =f j)
and all other derivatives of the second order are also zero.
Using these results in (6.10), we have
(6.26) 6p (-2)- ... 0 nn
(6.?7) E(dsijdsgh ); ~ k Z !k 1 r t
gh(J
• I'Trt
Using (6.12), we can reduce this to
(6.2~) E(dsijdsgh )= ~ ~ L L 1n k 1 r t
ij(J-(Jkl
Substituting the values of the first order derivatives in terms of $\delta$'s
from (6.18), on the supposition of $\Sigma$ being the identity matrix, we get
In terms of a's we can use the notation
say
(6.30)
These results will now be used in finding E(U) and Var (U) •
(6,31)· To find E(U)
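Since $s^{ij}$, $z_i$ and $\bar y_j - \bar x_j$ are all independently distributed, the
expectation of the product is equal to the product of the expectations.
\[ E(z_i) = \mu_i \quad \text{if } z \in \pi_1 , \]
\[ E(\bar y_j - \bar x_j) = \nu_j - \mu_j = d_j , \text{ say.} \]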
Thus
=.{~+~.2.E
n
if i = j
(6.35) E(U) -' 6- 2; L. IJ.. ( v. - IJ..) / 6~ -} -E 7 .i j 1. J J - ..l. n-
This reduces to
(6.36)6 p 6
E( u) = (1 + -E) ~ I-L. (d.) + .J? ~ ~ I-L' (d.)n i=l ~ ~ n i j ~ J
ir'j
To find Var(U): We can write
(6.37)
. where
\[ U = \sum_{i=1}^{p} b_i z_i = \sum_{j=1}^{p} b_j z_j , \]
Define
(6.39)
where
~. =>J.
(6.hO)
Then in )(1'
(6.1.j.1)
To find
and
O"u :: ~ ~ /-~.~.O" •. + I-L..:I-L 'O"b b + ••• _7. . - J. J J.J J. J . .]. J ]. J
From these two equations:
Since
from (6.30) and
(6.46)
can be written as
ButJ'mZo 0
m kIn
therefore we obtain
{
I if j=k
~ ° otherwtse.,
(6.48) . ~ ~ ~jkIn d d + rTij (_1 1)
O'b •b ,'-'/ i.J i.J 1; v N +-1 J k m n ~ m 1 N2
Using (6.48) in (6.41), we get
(6 49) 2 '" '" fA A ( '" 'I' 0 ijkIn d d + O'ij (2: + 1: )J].. °u ,-./ i.J {..J I'" 'I-' jOi' + !-L.!-L. i.J i.J k Ni j 1 J ~ J k m n m 1 N2
Replacing ~i and ~j by their values from (6.39), we reduce this to
2 . 'k . °i 'kIn i' 1 l'(6.50) O'u-v Z Z l, Z /-0
1 oJmO",. +~ !-L,!-L, 7dkd + Z LoO J(._ +~.dh.t:'!-L' 0
::!. j k m - 1J n 1: J - m i j N1 N2 .' 1 J
If we suppose that $\Sigma = I$, then (6.50) reduces to
•(6.51)
where
2 1\1 A 2 1 (1 1) 2aU u + - Z Z Z Z IJ.-IIJ.. dkd 5" kIn + P -N + -N f.Li·
n i j k m ~ J m lJ '1 2
!:J. = Z Z akm dkdk m m
•
7. Correction term for the variance of the linear discriminant
function.
In this section we shall find the variance of the statistic
\[ U^{*} = \sum_i \sum_j \sigma^{ij} z_i(\bar y_j - \bar x_j), \tag{7.1} \]
which is the same as U with $s^{ij}$ replaced by $\sigma^{ij}$ because of the
supposition that n is large. If $N_1$ and $N_2$ are both large then
$\bar y_j - \bar x_j$ can be replaced by the corresponding difference in the
populations, namely $\nu_j - \mu_j$, giving for $U^{*}$ a linear function of normal
variates. As an improvement we shall find the variance of $U^{*}$ by
taking into account the sampling fluctuations due to the difference of
sample means.
E(y.) := V.J. 1
Let
E(x·-!J.i)(X'-IJ..) = a i . = E(y.-vi)(y.-v.)1 J J J 1 J J
== E(z.-IJ..)(Z.-IJ..)1 1 J J
d. == \J. - "i1 J. J-
•
146
and/
The correction term for the variance of U* *is the variance of U
on the assumption that z = (zl' z2' •.. , zp) is fixed.
\[ U^{*} = \sum_i \sum_j \sigma^{ij}(\bar y_j - \bar x_j)\, z_i \]
can be written as
where
so that
\[ w_r = \bar y_r - \bar x_r , \]
Hence
(7.10)
and thus
* /~ 1 ~ ~ ~ij5U = - + - (.J '-- v ow. Z i 'V N1 N2 i j J
f2(J
5U*
which will give the correction term. Since
but
\[ \sum_r \sigma_{ir}\,\sigma^{jr} = \delta^{j}_{i} , \]
and
z Z crij 1'1. d. := tJ 2 say.i j J. J
(7.12) gives
\[ E \sum_i \sum_j \sigma^{ij} z_i z_j = p + \Delta^{2} . \]
Adding this to the variance of the linear discriminant function
we have, for the corrected variance,
\[ \sigma^{2}_{U^{*}} = \left(1 + \frac{1}{N_1} + \frac{1}{N_2}\right)\Delta^{2} + p\left(\frac{1}{N_1} + \frac{1}{N_2}\right). \tag{7.16} \]
This formula shows that the variance $\Delta^{2}$ based on the assumption
that $N_1$ and $N_2$ are large is an underestimate of the correct variance of
the discriminant function, but that the difference approaches zero
as rapidly as $N_1$ and $N_2$ approach infinity.
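The size of the correction in (7.16) is easily tabulated. The following sketch evaluates the corrected variance for given $\Delta^{2}$, p, $N_1$ and $N_2$; the function name and the numerical values are hypothetical and serve only as an illustration.

```python
def corrected_variance(delta2, p, n1, n2):
    """(1 + 1/N1 + 1/N2) * Delta^2 + p * (1/N1 + 1/N2), as in (7.16)."""
    k = 1.0 / n1 + 1.0 / n2
    return (1.0 + k) * delta2 + p * k


if __name__ == "__main__":
    delta2, p = 4.0, 3
    for n1, n2 in ((10, 10), (50, 50), (500, 500)):
        value = corrected_variance(delta2, p, n1, n2)
        print(n1, n2, value, value - delta2)
```

The excess over $\Delta^{2}$ equals $(\Delta^{2} + p)(1/N_1 + 1/N_2)$, so it grows with the number of variates p and shrinks in proportion to the reciprocals of the sample sizes.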
CHAPTER VII
SOME RELATED UNSOLVED PROBLEMS
In this chapter we shall describe very briefly some unsolved
problems related to the problem of classification.
1. On classification statistics of Wald and Anderson.
(a) The preceding discussion deals mainly with the distribution
of the approximate statistic $v = nm_3$, that is, the statistic
whose distribution approximates the distribution of V, where
\[ V = \frac{nm_3}{(1-m_1)(1-m_2) - m_3^{2}} , \]
for large n. We have discussed mainly the
null case, and much work needs to be done in getting its distribution
in the non-null case for the two statistics,
(1) Discussed by Wald [50]
(2) Discussed by Sitgreaves [45].
(b) The exact treatment of the sampling distribution of V,
both in the central and the non-central cases, is still wanting.
2. The quadratic discriminators.
Let $\mu$ and $\nu$ denote the mean vectors of two p-variate normal
populations, and $\Sigma_1$ and $\Sigma_2$ the two covariance matrices. There are
three situations that may arise in discussing the problem of
classification, namely
•and
( a)
(c)
,
If we suppose all the population parameters to be known, then
in these three situations we get the following three statistics,
\[ U_a = \sum_i \sum_j \sigma^{ij} z_i(\nu_j - \mu_j), \]
where in $U_a$, $[\sigma^{ij}] = [\sigma_{ij}]^{-1}$, $[\sigma_{ij}]$ being the common
covariance matrix of the two populations, and $\sigma^{ij}$, $\sigma^{ij*}$ in $U_b$ and
$U_c$ refer to the two covariance matrices in the two populations.
Thus the distribution problems underlying (2.1) (b) and (c) are those
of a general indefinite quadratic form with zero expectations of the
normal variates in (b) but not in (c). The importance of this problem
has been stressed by Hotelling [22]. This, of course, is under
the assumption that the population parameters are known, which amounts
to saying that $N_1 \to \infty$ and $N_2 \to \infty$, and would be a first step
in discussing the distributions of the statistics
\[ W_b = \sum_i \sum_j (z_i - \bar x_i)(z_j - \bar x_j)\left[s^{ij} - s^{ij*}\right] \]
and
\[ W_c = \sum_i \sum_j \left[(z_i - \bar x_i)(z_j - \bar x_j)\, s^{ij} - (z_i - \bar y_i)(z_j - \bar y_j)\, s^{ij*}\right], \]
which are obtained from $U_b$ and $U_c$ by replacing the population
values in $U_b$ and $U_c$ by their estimates from the samples.
3. Possibility of a different approach.
(a) It may be desirable to discuss the distribution of
$U = \sum_i \sum_j s^{ij} z_i(\bar y_j - \bar x_j)$ by some independent methods. Very often it is a
good start to examine in what form the non-centrality parameters would
enter into the distribution. The answer to this sometimes provides a
key to the solution of the distribution problem. Furthermore some
questions related to the behavior of the test can be answered even
without finding the actual distribution in the non-null case.
(b) It might be worth while to try some altogether different
approach. It is possible that we run into some simpler distribution
problems. Papers of Rao L-32 _7 and Roy L-;5_7 should be useful in
this connection.
4. Efficiency.
(a) The idea of efficiency in problems on classification needs
to be developed systematically. Kossack [26] took $1 - P$ as the index
of efficiency, where P is the common probability of the two types of
misclassification over variations of the parameters involved. He,
however, considered only the univariate case. Pitman [37] defined
it as the ratio of two sample sizes.
These and other ideas can be examined in this connection.
(b) If there are more statistics than one for the same situation,
then some measure of relative efficiency is needed.
(c) The discriminant function of R. A. Fisher or the statistic
$\sum_i \sum_j \sigma^{ij} z_i(\nu_j - \mu_j)$ are based on the assumption that $\Sigma_1 = \Sigma_2$.
One important problem that calls for investigation is to examine how
good is the linear discriminant function when actually $\Sigma_1 \neq \Sigma_2$.
5. The greater mean vector.
that even in the univariate case of sorting numerous objects known
to belong to one or the other of two normal populations with the same
known variance, the obvious rule of classifying an object to the
population whose mean is closer to the measure of the object, may not be
the best rule. Their objections apply to the corresponding
multivariate situations and should be considered in problems of
classification in multivariate analysis.
BIBLIOGRAPHY
[1] Anderson, T. W., "Classification by Multivariate Analysis", Psychometrika, Vol. XVI (1951), pp. 31-50.
[2] Aroian, L. A., "The Probability Function of the Product of Two Normally Distributed Variables", Annals of Mathematical Statistics, Vol. XVIII (1947), pp. 265-271.
[3] Bahadur, R. R. and Robbins, H. E., "The Problem of the Greater Mean", Annals of Mathematical Statistics, Vol. XXI (1950), pp. 469-487.
[4] Bateman, Harry, Higher Transcendental Functions. Vol. I and II, McGraw-Hill Book Company, Inc., 1953.
[5] Bose, R. C., "On the Exact Distribution and Moment Coefficients of the D2-statistics", Sankhya, Vol. II (1935-1936), pp. 143-154.
[6] Chernoff, Herman, "Large Sample Theory", Annals of Mathematical Statistics, Vol. XXVII (1956), pp. 1-23.
[7] Cochran, W. G. and Bliss, C. I., "Discriminant Function with Covariance", Annals of Mathematical Statistics, Vol. XIX (1948), pp. 151-176.
[8] Craig, C. C., "On the Frequency Function of xy", Annals of Mathematical Statistics, Vol. VII (1936), pp. 1-15.
[9] Cramér, Harald, Mathematical Methods of Statistics. Princeton University Press, 1951.
[10] Fisher, R. A., "The Use of Multiple Measurements in Taxonomic Problems", Annals of Eugenics, Vol. VII (1936), pp. 179-188.
[11] Fisher, R. A., "The Statistical Utilization of Multiple Measurements", Annals of Eugenics, Vol. VIII (1938), pp. 376-386.
[12] Fix, Evelyn, and Hodges, J. L., "Discriminatory Analysis: Non-Parametric Discrimination: Consistency Problems", School of Aviation Medicine, Project number 21-49-004 (1951).
[13] Ford, W. B., Studies in Divergent Series and Summability. The Macmillan Co., New York, 1916.
[14] Goursat, Edouard (Hedrick, E. R., translator), A Course in Mathematical Analysis. Vol. I, Ginn and Company, New York, 1904.
[15] Grad, Arthur, and Solomon, Herbert, "Distribution of Quadratic Forms and Some Applications", Annals of Mathematical Statistics, Vol. XXVI (1955), pp. 464-477.
[16] Gurland, John, "Distribution of Quadratic Forms and Ratios of Quadratic Forms", Annals of Mathematical Statistics, Vol. XXII (1953), pp. 416-427.
[17] Gurland, John, "Distribution of Definite and Indefinite Quadratic Forms", Annals of Mathematical Statistics, Vol. XXVI (1955), pp. 122-128.
[18] Harter, H. L., "On the Distribution of Wald's Classification Statistic", Annals of Mathematical Statistics, Vol. XXII (1951), pp. 58-67.
[19] Hotelling, Harold, "New Light on the Correlation Coefficient and its Transforms", Journal of the Royal Statistical Society, Series B, Vol. XV, No. 2 (1953), pp. 193-232.
[20] Hotelling, Harold, Notes on Approximation Techniques. (unpublished), 1955.
Hotelling, Harold, "Some New Methods for the Distribution of Quadratic Forms", Abstract, Annals of Mathematical Statistics, Vol. XIX (1948), p. 119.
Hotelling, Harold, "Multivariate Analysis", Statistics and Mathematics in Biology, Iowa State College Press (1954), pp. 67-80.
Hotelling, Harold, "Relation Between Two Sets of Variates", Biometrika, Vol. XXVIII (1936), pp. 321-377.
Hotelling, Harold, "The Generalization of Student's Ratio", Annals of Mathematical Statistics, Vol. II (1931), pp. 360-378.
Hotelling, Harold, "A Generalized T-test and Measure of Multivariate Dispersion", Second Berkeley Symposium on Mathematical Statistics and Probability, University of California Press, 1951, pp. 23-41.
Isaacson, S. L., "Problems in Classifying Populations", Statistics and Mathematics in Biology. Iowa State College Press (1954), pp. 107-119.
F27 7- -
;-32 7- -
Kendall, M. G., Notes on Multivariate Analyses, Institute of Statistics, Mimeograph Series No. 95 (1954).
Kolmogoroff, A. N., Foundations of the Theory of Probability. Chelsea Publishing Company, New York (1950).
Kossack, C. F., "Some Techniques for Simple Classification", Proceedings of the Berkeley Symposium on Mathematical Statistics and Probability (1945-46), pp. 345-352.
Laha, R. G., "On Some Properties of the Bessel Function Distribution", Bulletin of the Calcutta Mathematical Society, Vol. XLVI, No. 1 (1954), pp. 59-72.
MacRobert, T. M., Functions of a Complex Variable. Second Edition, Macmillan and Company, Limited, London, 1933.
Mann, H. B. and Wald, Abraham, "On Stochastic Limit and Order Relationship", Annals of Mathematical Statistics, Vol. XIV (1943), pp. 265-275.
McCarthy, M. D., "On the Application of z-test to Randomized Blocks", Annals of Mathematical Statistics, Vol. X (1939), pp. 337-359.
Mises, R. V., "On the Classification of Observed Data into Distinct Groups", Annals of Mathematical Statistics, Vol. 16 (1945), pp. 68-73.
Neyman, Jerzy, and Pearson, E. S., "Contributions to the Theory of Testing Statistical Hypotheses", Statistical Research Memoirs, Vol. I (1936), pp. 1-161.
Ogawa, Junjiro, "Remark on Wald's Paper On a Statistical Problem Arising in the Classification of an Individual into One of Two Groups", Institute of Statistics, Mimeograph Series No.
Pachares, James, "Note on the Distribution of a Definite Quadratic Form", Annals of Mathematical Statistics, Vol. XXVI (1955), pp. 128-131.
Pearson, E. S. and Hartley, H. O., Biometrika Tables for Statisticians, Vol. I, Cambridge, at University Press, 1954.
Pearson, Karl, Tables of Incomplete Beta Functions, Cambridge University Press, 1934.
Pitman, E. J. G., Lecture Notes on Non-Parametric Statistical Inference. (unpublished).
[38] Rao, C. R., "The Utilization of Multiple Measurements in Problems of Biological Classification", Journal of the Royal Statistical Society, Series B, Vol. X (1948), pp. 159-193.
[39] Rao, C. R., Advanced Statistical Methods in Biometric Research. John Wiley and Sons (1952), New York.
[40] Rao, C. R., "A General Theory of Discrimination When the Information about Alternative Populations is based on Samples", Annals of Mathematical Statistics, Vol. XXV (1954), pp. 651-670.
[41] Robbins, H. E. and Pitman, E. J. G., "Application of the Method of Mixtures to Quadratic Forms in Normal Variates", Annals of Mathematical Statistics, Vol. XX (1949), pp. 552-560.
[42] Robbins, H. E., "Asymptotically Subminimax Solutions of Compound Statistical Decision Problems", Proceedings of the Second Berkeley Symposium in Mathematical Statistics and Probability, University of California Press, Berkeley, pp. 131-148.
[43] Roy, S. N., "On a Heuristic Method of Test Construction, and its Use in Multivariate Analysis", Annals of Mathematical Statistics, Vol. XXIV, 1953, pp. 220-238.
[44] Roy, S. N., A Report on Some Aspects of Multivariate Analysis, North Carolina Institute of Statistics, Mimeograph Series, No. 121, 1954.
[45] Sitgreaves, Rosedith, "On the Distribution of Two Random Matrices used in Classification Procedures", Annals of Mathematical Statistics, Vol. XXIII (1952), pp. 263-270.
[46] Smith, C. A. B., "Some Examples of Discrimination", Annals of Eugenics, Vol. 13 (1947), pp. 272-282.
[47] Stekloff, W., "Quelques Applications Nouvelles de la Theorie de Fermeture au Probleme de Representation Approchee de Moments", Memoire de l'Academie Imperiale des Sciences de St. Petersbourg, Vol. XXXII, No. 4 (1914), pp.
[48] Szegő, Gabor, Orthogonal Polynomials. American Mathematical Society Colloquium Publication, Vol. XXIII, 1939.
[49] Uspensky, J. V., Introduction to Mathematical Probability. McGraw-Hill Book Company, Inc., 1937.
[50] Wald, Abraham, "On a Statistical Problem Arising in the Classification of an Individual into One of Two Groups", Annals of Mathematical Statistics, Vol. XV, 1944, pp. 145-162.
[51] Wald, Abraham, Selected Papers in Statistics and Probability. McGraw-Hill Book Company, Inc., New York, 1955.
[52] Watson, G. N., Theory of Bessel Functions. Second Edition, Cambridge University Press, 1945.
[53] Welch, B. L., "Note on Discriminant Functions", Biometrika, Vol. XXXI (1939), pp. 218-220.
[54] Whittaker, E. T. and Watson, G. N., A Course of Modern Analysis. Fourth Edition, Cambridge University Press, 1952.
[55] Wilks, S. S., "On Some Generalizations of the Analysis of Variance", Biometrika, Vol. XXIV (1932), pp. 471-494.