Testing a Subset of the Parameters of a Nonlinear Regression Model

Author(s): A. Ronald Gallant
Source: Journal of the American Statistical Association, Vol. 70, No. 352 (Dec., 1975), pp. 927-932
Published by: American Statistical Association
Stable URL: http://www.jstor.org/stable/2285459


Testing a Subset of the Parameters of a Nonlinear Regression Model

A. RONALD GALLANT*

The problem of testing for the location of a subset of the parameters entering the response function of a nonlinear regression model is considered. Two test statistics for this problem, the likelihood ratio test statistic and a test statistic derived from the asymptotic normality of the least squares estimator, are discussed and compared. The large-sample distributions of these statistics are derived, and Monte Carlo power estimates are compared with power computed using the large-sample distributions.

1. INTRODUCTION

This study is a continuation of the work reported in [4]. Here, testing for the location of a subset of the parameters entering the response function of a nonlinear regression model is considered rather than a joint test of all the parameters. This problem is more useful in applications but more tedious to deal with analytically.

Two test statistics are considered for this problem: the likelihood ratio test statistic and a test statistic derived from the asymptotic normality of the least squares estimator. The tests based on these statistics are equivalent in linear regression but differ in nonlinear regression. Two approximating random variables with known small-sample distributions are derived; they are asymptotically equivalent to these test statistics. (The mathematical detail is deferred to the Appendix.) A Monte Carlo study is employed to verify that estimates of the power of these tests, computed using the distribution functions of the approximating random variables, are sufficiently accurate for use in applications. Last, considerations governing a choice between the two tests are discussed.

A more formal description of the problem studied in this article is the following. We consider testing the hypothesis of location

H: τ = τ₀ against A: τ ≠ τ₀

at the a level of significance, when the data are responses yt to inputs xt generated according to the nonlinear regression model

yₜ = f(xₜ, θ) + eₜ   (t = 1, 2, ..., n),

where the parameter θ has been partitioned according to θ' = (ρ', τ'). The form of the response function f(xₜ, θ) is known. The unknown parameter θ is known to be

* A. Ronald Gallant is associate professor of statistics and economics, Institute of Statistics, North Carolina State University, Raleigh, N.C. 27607, currently on leave as visiting assistant professor of statistics, Graduate School of Business, University of Chicago, Chicago, Ill. 60637.

contained in the parameter space Ω, which is a subset of the p-dimensional reals, and the first r coordinates ρ of the parameter θ' = (ρ', τ') are regarded as nuisance parameters. The inputs xₜ are contained in 𝒳, which is a subset of the k-dimensional reals. The errors eₜ are assumed independent and normally distributed with mean zero and unknown variance σ².

An example of a problem fitting this description is that of testing whether selected treatment contrasts are zero in a completely randomized experimental design, where some covariate w such as age is known to affect the experimental material by shifting its expectation exponentially. Such a model would be written

zᵢⱼ = μ + tᵢ + α e^{β wᵢⱼ} + eᵢⱼ   (i = 1, 2, ..., I;  j = 1, 2, ..., J),

following the usual conventions. Assume for simplicity that I = 2 and the hypothesis of interest is H: t₁ = t₂ against A: t₁ ≠ t₂. This hypothesis suggests a reparameterization to the model

yₜ = θ₁x₁ₜ + θ₂x₂ₜ + θ₄e^{θ₃x₃ₜ} + eₜ   (t = 1, 2, ..., n),

where

yₜ = z₁ₜ,  xₜ' = (1, 1, w₁ₜ),  eₜ = e₁ₜ   (t = 1, 2, ..., J),
yₜ = z₂ₜ,  xₜ' = (0, 1, w₂ₜ),  eₜ = e₂ₜ   (t = J + 1, ..., 2J),

and θ' = (t₁ − t₂, μ + t₂, β, α). With this parameterization the hypothesis of interest becomes H: θ₁ = 0 against A: θ₁ ≠ 0. The parameter τ is θ₁ and the remaining parameters ρ' = (θ₂, θ₃, θ₄) are the nuisance parameters. This example is used for the Monte Carlo simulations.
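The two-group layout above is easy to simulate. A minimal sketch, assuming NumPy; the covariate values w, the function name, and the seed are hypothetical, with parameter values taken from the Monte Carlo study of Section 3:

```python
import numpy as np

def simulate(theta, w1, w2, sigma, rng):
    """y_t = th1*x1t + th2*x2t + th4*exp(th3*x3t) + e_t for the two-group
    layout: group 1 has x_t' = (1, 1, w_1t), group 2 has x_t' = (0, 1, w_2t)."""
    th1, th2, th3, th4 = theta
    J = len(w1)
    x1 = np.concatenate([np.ones(J), np.zeros(J)])  # carries t1 - t2
    x2 = np.ones(2 * J)                             # carries mu + t2
    w = np.concatenate([w1, w2])                    # covariate (e.g., age)
    mean = th1 * x1 + th2 * x2 + th4 * np.exp(th3 * w)
    return mean + rng.normal(0.0, sigma, size=2 * J)

rng = np.random.default_rng(0)
w1 = np.linspace(0.1, 1.0, 15)   # hypothetical covariate values, J = 15
w2 = np.linspace(0.1, 1.0, 15)
# theta' = (t1 - t2, mu + t2, beta, alpha); under H the first coordinate is 0.
y = simulate((0.0, 1.0, -1.0, -0.5), w1, w2, sigma=np.sqrt(0.001), rng=rng)
```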

2. NOTATION, ASSUMPTIONS AND A PRELIMINARY THEOREM

The notation and assumptions set forth here are in addition to those of [4, Sec. 2], which the reader is presumed to have read.

Notation: Given the regression model

yₜ = f(xₜ, θ*) + eₜ   (t = 1, 2, ..., n),

where θ* denotes the true but unknown value of θ, the observations

(yt, xt) (t = 1, 2, ... , n)

© Journal of the American Statistical Association
December 1975, Volume 70, Number 352
Theory and Methods Section


and the hypothesis of location

H: τ = τ₀ against A: τ ≠ τ₀,

where θ has been partitioned according to θ' = (ρ', τ'), we define:

F₁(θ) = the n × r matrix formed by deleting the last p − r columns of F(θ),

P₁ = F₁(θ*)[F₁'(θ*)F₁(θ*)]⁻¹F₁'(θ*)   (n × n),

P₁⊥ = I − P₁   (n × n),

θ̂ = the p × 1 vector minimizing Σₜ₌₁ⁿ |yₜ − f(xₜ, θ)|² over Ω,

σ̂² = (y − f(θ̂))'(y − f(θ̂))/n,

s² = (y − f(θ̂))'(y − f(θ̂))/(n − p),

ρ̃ = the r × 1 vector minimizing Σₜ₌₁ⁿ {yₜ − f(xₜ, (ρ, τ₀))}² for (ρ, τ₀) in Ω,

σ̃² = (y − f(ρ̃, τ₀))'(y − f(ρ̃, τ₀))/n,

δ₀(ρ) = f(θ*) − f(ρ, τ₀)   (n × 1),

ρ₀ = the r × 1 vector minimizing δ₀'(ρ)δ₀(ρ) for (ρ, τ₀) in Ω,

C₂₂ = the q × q matrix formed by deleting the first r rows and columns of (F'(θ*)F(θ*))⁻¹,

Ĉ₂₂ = the q × q matrix formed by deleting the first r rows and columns of (F'(θ̂)F(θ̂))⁻¹,

U₂ = the q × 1 vector obtained by deleting the first r rows of [F'(θ*)F(θ*)]⁻¹F'(θ*)e.

For simplicity we will write F for F(θ*), F₁ for F₁(θ*), and δ₀ for δ₀(ρ₀); as in [4], P = F(F'F)⁻¹F' and P⊥ = I − P. Also, the following vectors and subvectors are related according to the equalities θ̂' = (ρ̂', τ̂'), θ*' = (ρ*', τ*'), and θ₀' = (ρ₀', τ₀').

Distributions: Let F'(t; ν₁, ν₂, λ) denote the noncentral F-distribution function with ν₁ numerator degrees of freedom, ν₂ denominator degrees of freedom, and noncentrality λ (see [5, pp. 77-8]). The central F-distribution with corresponding degrees of freedom is denoted by F(t; ν₁, ν₂). Let G(t; ν, λ) denote the noncentral chi-squared distribution function with ν degrees of freedom and noncentrality λ, g(t; ν, λ) the corresponding density, and N(t; μ, σ²) the normal distribution function. Define H(x; ν₁, ν₂, λ₁, λ₂) to be the distribution function given by

H(x; ν₁, ν₂, λ₁, λ₂) =

  0,   x ≤ 1, λ₂ = 0,

  ∫₀^∞ G(t/[x − 1] + 2xλ₂/[x − 1]²; ν₂, λ₂/[x − 1]²) g(t; ν₁, λ₁) dt,   x < 1, λ₂ > 0,

  ∫₀^∞ N(−t; 2λ₂, 8λ₂) g(t; ν₁, λ₁) dt,   x = 1, λ₂ > 0,

  1 − ∫₀^∞ G(t/[x − 1] + 2xλ₂/[x − 1]²; ν₂, λ₂/[x − 1]²) g(t; ν₁, λ₁) dt,   x > 1.

It is shown in [4] that

H(x; ν₁, ν₂, λ₁, 0) = F'(ν₂(x − 1)/ν₁; ν₁, ν₂, λ₁).
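This relation can be checked numerically: as the Appendix derivation shows, when λ₂ = 0 the approximating variable X is distributed as 1 + χ′²_{ν₁}(λ₁)/χ²_{ν₂} with independent numerator and denominator. A sketch assuming SciPy; note that SciPy's noncentrality parameter is the sum of squared means, i.e., twice the λ used here (the paper follows Graybill's convention λ = μ'μ/2):

```python
import numpy as np
from scipy import stats

q, ndf, lam1 = 2, 26, 0.8        # nu1 = q, nu2 = n - p, paper-scale lambda1
rng = np.random.default_rng(1)
N = 200_000

# With lambda2 = 0, X = 1 + chi'2_q(lam1) / chi2_{n-p}; SciPy's nc = 2*lam1.
num = stats.ncx2.rvs(q, 2 * lam1, size=N, random_state=rng)
den = stats.chi2.rvs(ndf, size=N, random_state=rng)
X = 1.0 + num / den

x = 1.5                          # evaluate H at an arbitrary point x > 1
empirical = np.mean(X <= x)      # Monte Carlo estimate of H(x; q, ndf, lam1, 0)
theoretical = stats.ncf.cdf(ndf * (x - 1) / q, q, ndf, 2 * lam1)
```

The empirical and theoretical values agree to Monte Carlo accuracy.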

An additional assumption is used to obtain the characterizations of the test statistics studied in this article. (See [7, p. 305] for its motivation and justification.)

Assumptions (continued): As n increases, τ₀ tends to τ* at a rate such that √n(τ₀ − τ*) and √n(ρ₀ − ρ*) converge to finite limits.

As the associate editor and a referee have pointed out, while the motivation and justification for this assumption are the same as in [7], we have departed from custom. Usually, the true state of nature τ* is assumed to move toward the hypothesized value τ₀ as n increases, whereas here, we assume τ* is fixed and that τ₀ varies with n. Also, ρ₀ depends on τ₀. To illustrate, consider testing H: θ₃ = τ₀ against A: θ₃ ≠ τ₀ for the example. In this case ρ₀ = [F₁'(τ₀)F₁(τ₀)]⁻¹F₁'(τ₀)F₁(τ*)ρ*. Sequences of appropriate (ρ₀, τ₀) can be found in consequence of continuity in τ and the fact that F₁'(τ*)F₁(τ*) must be full rank for sufficiently large n because Σ is assumed to have full rank. In general, the assumption imposes additional structure on the problem. Of course, τ₀ = τ* and ρ₀ = ρ* will satisfy the assumption when the null hypothesis is true.

Theorem 1 gives the asymptotic properties of θ̂, σ̂², and σ̃², which are used to characterize the two test statistics studied in this article.

Theorem 1: Let the assumptions just listed and in [4, Sec. 2] be satisfied.

The random variable θ̂ converges almost surely to θ* and may be characterized as

θ̂ = θ* + (F'F)⁻¹F'e + aₙ

where √n aₙ converges in probability to zero. The random variable σ̂² converges almost surely to σ² and its reciprocal may be characterized as

1/σ̂² = n/e'P⊥e + aₙ

where n aₙ converges in probability to zero. The random variable σ̃² may be characterized as

σ̃² = (e + δ₀)'P₁⊥(e + δ₀)/n + bₙ

where n bₙ converges in probability to zero.

3. THE LIKELIHOOD RATIO TEST AND ITS LARGE-SAMPLE DISTRIBUTION

The likelihood of the sample y is

L(y; θ, σ²) = (2πσ²)^{−n/2} exp[−(1/2)(y − f(θ))'(y − f(θ))/σ²] .

The maximum of the likelihood subject to H is

L(H) = (2πσ̃²)^{−n/2} exp[−n/2]

and its maximum over the entire parameter space is

L(Ω) = (2πσ̂²)^{−n/2} exp[−n/2] .

Thus, the likelihood ratio is L(H)/L(Ω) = (σ̂²/σ̃²)^{n/2}, and the likelihood ratio test has the form: Reject the null hypothesis H when the statistic

T = σ̃²/σ̂²

is larger than c, where P[T > c | H] = α.

The computation of the statistic requires that two nonlinear models be fitted to the data using, say, either Hartley's [6] or Marquardt's [9] algorithm. The constrained model

yₜ = f(xₜ, (ρ, τ₀)) + eₜ

must be fit to obtain σ̃², and the unconstrained model

yₜ = f(xₜ, θ) + eₜ

to obtain σ̂².

Table 1. Monte Carlo Power Estimates for the Likelihood Ratio Test

 Parameters      H: θ₁ = 0 against A: θ₁ ≠ 0               H: θ₃ = −1 against A: θ₃ ≠ −1
 θ₁      θ₃      λ₁       λ₂       P[X>c*]  P[T>c*]  SE    λ₁       λ₂       P[X>c*]  P[T>c*]  SE
 0.0    −1.0    0.0      0.0      .050     .050    .003   0.0      0.0      .050     .052    .003
 0.008  −1.1    0.2353   0.0000   .101     .094    .004   0.2421   0.0006   .103     .110    .004
 0.015  −1.2    0.8307   0.0000   .237     .231    .006   0.8503   0.0078   .243     .248    .006
 0.030  −1.4    3.3343   0.0000   .700     .687    .006   2.6669   0.0728   .618     .627    .007

The statistical behavior of T is given by Theorem 2.

Theorem 2: Under the assumptions listed in Section 2 and in [4, Sec. 2], the likelihood ratio test statistic may be characterized as

T = X + cₙ

where n cₙ converges in probability to zero and

X = (e + δ₀)'P₁⊥(e + δ₀)/e'P⊥e .

The random variable X has the distribution function H(x; q, n − p, λ₁, λ₂), where λ₁ = δ₀'(P − P₁)δ₀/(2σ²) and λ₂ = δ₀'P⊥δ₀/(2σ²).

A relationship between the distribution function of the approximating random variable X and the noncentral F-distribution was mentioned in Section 2. Using this relationship, the critical point c* of X (P[X > c* | λ₁ = λ₂ = 0] = α) can be obtained from the formula

c* = 1 + qFα/(n − p)

where Fα denotes the upper α·100-percentage point of an F-distribution with q numerator degrees of freedom and n − p denominator degrees of freedom. This critical point c* may be used to approximate the critical point c of T; in applications one rejects H: τ = τ₀ when the calculated value of T exceeds c*.
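The two fits and the critical point can be carried out with standard optimization routines. A minimal sketch assuming NumPy/SciPy; the covariate values, starting values, and variable names are illustrative, patterned on the example of Section 1 with q = 1 and p = 4:

```python
import numpy as np
from scipy import stats
from scipy.optimize import least_squares

# Example model: y_t = th1*x1t + th2*x2t + th4*exp(th3*x3t) + e_t, testing H: th1 = 0.
rng = np.random.default_rng(2)
n, J = 30, 15
x1 = np.concatenate([np.ones(J), np.zeros(J)])
x2 = np.ones(n)
w = np.tile(np.linspace(0.1, 1.0, J), 2)          # hypothetical covariate values

def f(th):
    return th[0] * x1 + th[1] * x2 + th[3] * np.exp(th[2] * w)

y = f(np.array([0.03, 1.0, -1.4, -0.5])) + rng.normal(0.0, np.sqrt(0.001), n)

# Constrained fit (th1 fixed at 0) over rho = (th2, th3, th4): SSE_H = n * sigma-tilde^2.
fit_H = least_squares(lambda r: y - (r[0] * x2 + r[2] * np.exp(r[1] * w)),
                      x0=np.array([1.0, -1.0, -0.5]))
# Unconstrained fit, started from the constrained solution so that SSE_A <= SSE_H.
fit_A = least_squares(lambda th: y - f(th), x0=np.concatenate([[0.0], fit_H.x]))

T = np.sum(fit_H.fun ** 2) / np.sum(fit_A.fun ** 2)   # = sigma-tilde^2 / sigma-hat^2
q, p = 1, 4
c_star = 1.0 + q * stats.f.ppf(0.95, q, n - p) / (n - p)
reject = T > c_star
```

Starting the unconstrained fit from the constrained solution guarantees T ≥ 1, as it must be for nested fits.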

The calculation of the noncentralities λ₁ and λ₂ for trial values of θ* and σ² requires the minimization of the sum of squares Σₜ₌₁ⁿ [f(xₜ, (ρ, τ₀)) − f(xₜ, θ*)]² to obtain ρ₀ for computing δ₀. This may be done using either Hartley's [6] or Marquardt's [9] algorithm with f(θ*) replacing y. The remaining computations are straightforward matrix algebra.

A series expansion of the distribution function H(x; ν₁, ν₂, λ₁, λ₂) is given in [4], which may be used to compute P[X > c*] once λ₁ and λ₂ have been obtained. However, in most applications λ₂ will be small (<.1) and P[X > c*] can be adequately approximated by charts of the noncentral F-distribution (see [1, 10]). The correspondence for the use of these charts is H(c*; ν₁, ν₂, λ₁, λ₂) ≈ F'(Fα; ν₁, ν₂, λ₁). That is, the charts are entered at the α level of significance with q numerator degrees of freedom, n − p denominator degrees of freedom, and noncentrality λ₁ = δ₀'(P − P₁)δ₀/(2σ²).

A Monte Carlo study reported in [4] for an exponential model with q = p = 2 and n = 30 suggests that the approximation P[T > c*] ≈ P[X > c*] will be fairly accurate in applications. Additional evidence is given in Table 1. Using the example of Section 1, 5,000 trials were generated with the parameters set at n = 30, θ₂ = 1, θ₄ = −.5 and σ² = .001, with θ₁ and θ₃ varied as shown in the table. The standard errors in the table refer to the fact that if the approximation is valid, then the Monte Carlo estimator of P[T > c*] is binomially distributed with variance estimated by

P[X > c*]·P[X ≤ c*]/5000 .
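The standard-error column can be reproduced from this binomial variance formula; a small sketch (the function name is ours):

```python
import math

def mc_standard_error(p_reject, trials=5000):
    """Binomial SE of a Monte Carlo rejection-rate estimate: sqrt(p(1-p)/trials)."""
    return math.sqrt(p_reject * (1.0 - p_reject) / trials)

print(round(mc_standard_error(0.05), 3))   # 0.003, as in the null rows of Table 1
print(round(mc_standard_error(0.70), 3))   # 0.006, as in the last row
```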

4. A TEST STATISTIC DERIVED FROM THE ASYMPTOTIC NORMALITY OF THE LEAST SQUARES ESTIMATOR AND ITS LARGE-SAMPLE DISTRIBUTION

The least squares estimator θ̂ is asymptotically normally distributed with mean θ* and a variance-covariance matrix estimated consistently by [F'(θ̂)F(θ̂)]⁻¹s² under the assumptions of Section 2 (see also [2, 3]).¹ This suggests the use of the test statistic

S = (τ̂ − τ₀)'Ĉ₂₂⁻¹(τ̂ − τ₀)/(q s²) .

The test has the form: Reject the null hypothesis H when S exceeds d, where P[S > d | H] = α.

The program used to compute the least squares estimator will usually print θ̂, [F'(θ̂)F(θ̂)]⁻¹, and s². If q = 1, the test statistic can easily be computed from the printed results. However, when q > 1 some programming effort will be required to: recover τ̂, Ĉ₂₂, and s²; invert Ĉ₂₂; and perform the necessary matrix algebra if accuracy is to be maintained.
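The recovery-and-inversion step described above can be sketched as follows; a minimal sketch assuming NumPy, with illustrative numbers that are not from the paper:

```python
import numpy as np

def S_statistic(tau_hat, tau0, FtF_inv, s2, q):
    """S = (tau_hat - tau0)' C22_hat^{-1} (tau_hat - tau0) / (q s^2), where
    C22_hat is the lower-right q x q block of [F'(th_hat)F(th_hat)]^{-1},
    i.e., the block left after deleting the first r rows and columns."""
    C22_hat = FtF_inv[-q:, -q:]
    d = np.atleast_1d(tau_hat) - np.atleast_1d(tau0)
    return float(d @ np.linalg.solve(C22_hat, d)) / (q * s2)

# Hypothetical printed output for p = 2, q = 1 (numbers are made up):
FtF_inv = np.array([[0.50, 0.10],
                    [0.10, 0.20]])
S = S_statistic(1.3, 1.0, FtF_inv, s2=0.04, q=1)
```

For q = 1 this reduces to the squared t-ratio (τ̂ − τ₀)²/(Ĉ₂₂ s²).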

The statistical behavior of the statistic S is given by Theorem 3.

Theorem 3: Under the assumptions listed in Section 2 and in [4, Sec. 2], the test statistic based on the asymptotic normality of the least squares estimator may be characterized as

S = Y + γₙ

where γₙ converges in probability to zero and

Y = [(U₂ + τ* − τ₀)'C₂₂⁻¹(U₂ + τ* − τ₀)/q] / [e'P⊥e/(n − p)] ,

which is distributed as F'(y; q, n − p, λ), where

λ = (τ* − τ₀)'C₂₂⁻¹(τ* − τ₀)/(2σ²) .

¹ That is, √n(θ̂ − θ*) converges in distribution to N(0, Σ⁻¹σ²) and [(1/n)F'(θ̂)F(θ̂)]⁻¹ converges almost surely to Σ⁻¹.

Table 2. Monte Carlo Power Estimates for the Test Based on the Asymptotic Normality of the Least Squares Estimator

 Parameters      H: θ₁ = 0 against A: θ₁ ≠ 0        H: θ₃ = −1 against A: θ₃ ≠ −1
 θ₁      θ₃      λ        P[Y>Fα]  P[S>Fα]  SE      λ        P[Y>Fα]  P[S>Fα]  SE
 0.0    −1.0    0.0      .050     .050    .003     0.0      .050     .056    .003
 0.008  −1.1    0.2353   .101     .094    .004     0.2220   .098     .082    .004
 0.015  −1.2    0.8309   .237     .231    .006     0.7332   .215     .183    .006
 0.030  −1.4    3.3343   .700     .687    .006     2.1302   .511     .513    .007

The critical point d* of Y (P[Y > d* | λ = 0] = α) is obtained directly from a table of the F-distribution by setting d* = Fα, where Fα denotes the upper α·100-percentage point with q numerator degrees of freedom and n − p denominator degrees of freedom. This critical point may be used to approximate the critical point d of S; in applications one rejects H: τ = τ₀ when the calculated value of S exceeds Fα.

Monte Carlo evidence in Table 2 suggests that, under the alternative, the approximation

P[S > d*] ≈ P[Y > d*]

is not as accurate as the approximation of T by X. However, the approximation appears to be sufficiently accurate for use in applications. Table 2 was constructed in the same manner as Table 1.

5. COMPARISON

Tests based on S and T are equivalent in linear regression; they are not in nonlinear regression. This raises the question of which should be used.

Approaching this question on the basis of power, the analytical and Monte Carlo evidence suggests that when a component of the parameter enters the model nonlinearly, the likelihood ratio test with respect to a hypothesis concerning it has better power than the test based on the asymptotic normality of the least squares estimator. When a component enters the model linearly, the two tests have the same power. However, the evidence presented here is too limited to place much reliance on this generalization. If the question is of critical importance in an application, the noncentralities λ₁ and λ can be computed and compared. Provided λ₂ is small (<.1), the likelihood ratio test will have better power than the test based on the asymptotic normality of the least squares estimator when λ₁ is larger than λ. It is not necessary to evaluate P[X > c*] or P[Y > d*] if relative power is the only matter of concern; the comparison of noncentralities will suffice.
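The suggested comparison of noncentralities can be carried out with straightforward matrix algebra once ρ₀ is obtained with f(θ*) replacing y. A sketch for the example, testing H: θ₃ = −1 (assuming NumPy; the covariate values w are hypothetical, so the resulting numbers will not reproduce the tables):

```python
import numpy as np

# Example design, testing H: th3 = tau0; rho = (th1, th2, th4) are nuisance.
n, J = 30, 15
x1 = np.concatenate([np.ones(J), np.zeros(J)])
x2 = np.ones(n)
w = np.tile(np.linspace(0.1, 1.0, J), 2)         # hypothetical covariate values

th1, th2, th3, th4 = 0.015, 1.0, -1.2, -0.5      # trial value of theta*
tau0, sigma2 = -1.0, 0.001
f_star = th1 * x1 + th2 * x2 + th4 * np.exp(th3 * w)

# F(theta*) with columns ordered (rho, tau): d/dth1, d/dth2, d/dth4, d/dth3.
F = np.column_stack([x1, x2, np.exp(th3 * w), th4 * w * np.exp(th3 * w)])
F1 = F[:, :3]

# With tau fixed at tau0 the example model is linear in rho, so rho0 is an
# ordinary least squares fit of f(theta*) on the constrained model's columns.
A = np.column_stack([x1, x2, np.exp(tau0 * w)])
rho0, *_ = np.linalg.lstsq(A, f_star, rcond=None)
delta0 = f_star - A @ rho0

proj = lambda M: M @ np.linalg.solve(M.T @ M, M.T)
P, P1 = proj(F), proj(F1)

lam1 = delta0 @ (P - P1) @ delta0 / (2 * sigma2)        # LR-test noncentrality
lam2 = delta0 @ (np.eye(n) - P) @ delta0 / (2 * sigma2)
C22 = np.linalg.inv(F.T @ F)[-1, -1]
lam = (th3 - tau0) ** 2 / (C22 * 2 * sigma2)            # S-test noncentrality
```

If λ₂ is small, the likelihood ratio test is preferred whenever λ₁ exceeds λ.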

As regards computational convenience, we have seen previously that for q = 1, the test based on the asymptotic normality is more convenient; for q > 1, the likelihood ratio test is more convenient. One qualification is necessary. Hartley's [6] and Marquardt's [9] algorithms do not always converge in practice. Thus, it is possible that θ̂ can be successfully computed but not ρ̃.² In such situations, ρ̃ must be computed by other means, e.g., grid search, or the test based on the asymptotic normality of the least squares estimator employed by default.

In most applications, a hypothesis involves only a single component of 0 and, therefore, q = 1. The author's practice is to use the test based on the asymptotic nor- mality of the least squares estimator for computational convenience when the purpose is to assist a judgment as to whether the model employed adequately represents the data. He uses the likelihood ratio test when the hypothesis is an important aspect of the study.

APPENDIX: PROOFS OF THEOREMS

Three technical lemmas are needed in the proofs of the theorems.

Lemma 1: Let g(x, λ) be a real-valued continuous function defined on 𝒳 × Λ, where Λ is a compact set. Under the assumption of [4, Sec. 2] that the inputs {xₜ} are chosen so that the sequence of measures {μₙ} converges weakly and that the errors {eₜ} are independent and identically distributed with mean zero and finite, nonzero variance:

1. The series (1/n) Σₜ₌₁ⁿ g(xₜ, λ) converges to the integral ∫ g(x, λ)dμ(x) uniformly in λ.

2. The random variable (1/n) Σₜ₌₁ⁿ g(xₜ, λ)eₜ converges almost surely to zero uniformly in λ.

3. If λₙ converges almost surely to λ* in Λ, then (1/n) Σₜ₌₁ⁿ g(xₜ, λₙ) converges almost surely to ∫ g(x, λ*)dμ(x).

4. If λₙ converges in probability to λ* in Λ, then (1/n) Σₜ₌₁ⁿ g(xₜ, λₙ) converges in probability to ∫ g(x, λ*)dμ(x).

Proof: The family 𝔉 = {g(x, λ): λ ∈ Λ} is equicontinuous and bounded (see [8, pp. 967-8] for a detailed argument). Applying the proposition of [8, p. 967], we have Part 1. Part 2 is proved as Lemma 3.4 in [3]. Parts 3 and 4 follow from the uniform convergence given in Part 1.

Lemma 2: Under the assumptions listed in Section 2 and in [4, Sec. 2], the random variable ρ̃ converges almost surely to ρ*.

Proof: Let Qₙ(θ) = (y − f(θ))'(y − f(θ))/n = e'e/n + 2e'(f(θ*) − f(θ))/n + (f(θ*) − f(θ))'(f(θ*) − f(θ))/n, which converges almost surely to Q̄(θ) = σ² + ∫ |f(x, θ*) − f(x, θ)|² dμ(x) uniformly in θ by the Strong Law of Large Numbers and Lemma 1, Parts 1 and 2. Consider a realization of the errors {eₜ} which does not belong to the exceptional set. Let {ρ̃ₙ} be the sequence of points minimizing Qₙ(ρ, τ₀) corresponding to this realization. Since Ω is compact, there is at least one limit point ρ# and at least one subsequence {nₘ} such that limₘ→∞ ρ̃ₙₘ = ρ#. As a direct consequence of the uniform convergence of the continuous functions Qₙ(θ) to Q̄(θ), we have σ² ≤ Q̄(ρ#, τ*) = limₘ→∞ Qₙₘ(ρ̃ₙₘ, τ₀) ≤ limₘ→∞ Qₙₘ(ρ*, τ₀) = Q̄(θ*) = σ², so that ∫ |f(x, θ*) − f(x, (ρ#, τ*))|² dμ(x) = 0, which implies ρ# = ρ* by assumption. Thus, excepting realizations in the exceptional set, ρ̃ₙ has only one limit point, which is ρ*.

² ρ̂ is usually a good starting value for computing ρ̃.

Lemma 3: Under the assumptions listed in Section 2 and in [4, Sec. 2], the random vector (1/√n)F'e converges in distribution to a p-variate normal and (1/√n)F'δ₀ converges to a finite limit.

Proof: By Lemma 3.5 of [3] or Lemma 4.2 of [2], (1/√n)F'e converges in distribution to a p-variate normal. By Taylor's theorem, (1/√n)F'δ₀ = −(1/√n)F'F(θ̄)(θ₀ − θ*), where θ̄ is on the line segment joining θ₀ to θ*. The assumption that √n(θ₀ − θ*) tends to a finite limit and Lemma 1, Part 1 applied to (1/n)F'F(θ̄) imply the result.

Proof of Theorem 1: The first conclusion is proved as Theorem 3 of [3] or as Proposition 6.1 of [2]. The second conclusion is proved as Lemma 1 of [4].

By Lemma 2, (ρ̃, τ₀) will almost surely be contained in the open subset of Ω containing θ*, allowing the use of Taylor's expansions in the proof and causing ρ̃ to eventually become a stationary point of Qₙ(ρ, τ₀) = (y − f(ρ, τ₀))'(y − f(ρ, τ₀))/n.

We will now obtain intermediate results based on Taylor's expansions which will be used later in the proof. Using Taylor's theorem,

f(ρ̃, τ₀) = f(θ₀) + F₁(ρ̃ − ρ₀) + H(ρ̃ − ρ₀)

where H is the n × r matrix with typical row

∇ρ'f(xₜ, θ₀) − ∇ρ'f(xₜ, θ*) + ½(ρ̃ − ρ₀)'∇ρ²f(xₜ, (ρ̄, τ₀))

and ρ̄ is on the line segment joining ρ̃ to ρ₀. Using Lemmas 1 and 2 and the assumption that √n(θ₀ − θ*) converges to a finite limit, one can show that (1/n)F₁'(ρ̃, τ₀)H, (1/n)F₁'H, and (1/n)H'H converge almost surely to the zero matrix. Applying Taylor's theorem twice, one can write

[F₁'(ρ̃, τ₀) − F₁'](e + δ₀) = E(θ₀ − θ*) + D(ρ̃ − ρ₀)

where E is the r × p matrix with typical element

eᵢⱼ = Σₜ₌₁ⁿ (∂²/∂θⱼ∂ρᵢ)f(xₜ, θ̄)[eₜ + f(xₜ, θ*) − f(xₜ, θ₀)]

and D is the r × r matrix with typical element

dᵢⱼ = Σₜ₌₁ⁿ (∂²/∂ρⱼ∂ρᵢ)f(xₜ, (ρ̄, τ₀))[eₜ + f(xₜ, θ*) − f(xₜ, θ₀)] .

We will write E₀ when this matrix depends on ρ₀ and will write E and D when the dependence is on ρ̃. Using Lemma 1 and the assumed convergence of θ₀ to θ*, one can show that (1/n)E₀, (1/n)E, and (1/n)D converge almost surely to the zero matrix.

We will now obtain the probability order of ρ̃. As mentioned earlier, for almost every realization of the errors {eₜ}, ρ̃ is eventually a stationary point of Qₙ(ρ, τ₀), so that the random vector (−√n/2)∇ρQₙ(ρ̃, τ₀) = (1/√n)F₁'(ρ̃, τ₀)(y − f(ρ̃, τ₀)) converges almost surely to the zero vector. Substituting the expansions in the previous paragraph, we have that

[(1/n)F₁'(ρ̃, τ₀)F₁ + (1/n)F₁'(ρ̃, τ₀)H − (1/n)D]√n(ρ̃ − ρ₀) − (1/√n)E(θ₀ − θ*) − (1/√n)F₁'(e + δ₀)

converges almost surely to zero. Lemma 1 and our previous results imply that the matrix in brackets converges almost surely to a positive definite submatrix of Σ. Lemma 3 implies that (1/√n)F₁'(e + δ₀) converges in distribution to an r-variate normal; (1/n)E √n(θ₀ − θ*) converges almost surely to the zero vector. These facts allow us to conclude that

uₙ = √n[ρ̃ − ρ₀ − (F₁'F₁)⁻¹F₁'(e + δ₀)]

converges in probability to zero and that vₙ = √n(ρ̃ − ρ₀) is bounded in probability.

The sum of squares

‖y − f(ρ̃, τ₀)‖² = ‖e + δ₀ + f(θ₀) − f(ρ̃, τ₀)‖²

 = ‖P₁⊥(e + δ₀) + P₁(e + δ₀) − F₁(ρ̃ − ρ₀) − H(ρ̃ − ρ₀)‖²

 = ‖P₁⊥(e + δ₀)‖² − 2(e + δ₀)'P₁⊥H(ρ̃ − ρ₀) + ‖P₁(e + δ₀) − F₁(ρ̃ − ρ₀) − H(ρ̃ − ρ₀)‖² .

The cross product term may be written as

(e + δ₀)'P₁⊥H(ρ̃ − ρ₀)

 = (e + δ₀)'H(ρ̃ − ρ₀) − (e + δ₀)'F₁(F₁'F₁)⁻¹F₁'H(ρ̃ − ρ₀)

 = √n(θ₀ − θ*)'((1/n)E₀)'vₙ + ½vₙ'{(1/n) Σₜ₌₁ⁿ [eₜ + f(xₜ, θ*) − f(xₜ, θ₀)]∇ρ²f(xₜ, (ρ̄, τ₀))}vₙ − (1/√n)(e + δ₀)'F₁((1/n)F₁'F₁)⁻¹((1/n)F₁'H)vₙ .

Each of these terms converges almost surely to zero by Lemmas 1, 2, and 3 and our previous results. By the triangle inequality,

‖P₁(e + δ₀) − F₁(ρ̃ − ρ₀) − H(ρ̃ − ρ₀)‖ ≤ ‖F₁[(F₁'F₁)⁻¹F₁'(e + δ₀) − (ρ̃ − ρ₀)]‖ + ‖H(ρ̃ − ρ₀)‖

 = (uₙ'((1/n)F₁'F₁)uₙ)^{1/2} + (vₙ'((1/n)H'H)vₙ)^{1/2} .

The two terms on the right converge in probability to zero by our previous results.

Proof of Theorem 2: Applying Theorem 1 we have

T = X + aₙbₙ + (n/e'P⊥e)bₙ + aₙ(e + δ₀)'P₁⊥(e + δ₀)/n .

By Theorem 1, n aₙbₙ and n(n/e'P⊥e)bₙ converge in probability to zero. Now aₙ(e + δ₀)'(e + δ₀)/n bounds the last term from above, as P₁⊥ is idempotent. As in the proof of Lemma 2, (e + δ₀)'(e + δ₀)/n converges almost surely to σ², and n aₙ converges in probability to zero by Theorem 1.

Set z = (1/σ)e, γ = (1/σ)δ₀, and R = P − P₁. The random variables (z₁, z₂, ..., zₙ) are independent with density n(t; 0, 1). For an arbitrary constant b, the random variable (z + bγ)'R(z + bγ) is a noncentral chi-squared with q degrees of freedom and noncentrality b²γ'Rγ/2, since R is idempotent with rank q. Similarly, (z + bγ)'P⊥(z + bγ) is a noncentral chi-squared with n − p degrees of freedom and noncentrality b²γ'P⊥γ/2. These two random variables are independent because RP⊥ = 0 (see [5, p. 79 ff.]).

Let a > 0.

P[X > a + 1] = P[(z + γ)'P₁⊥(z + γ) > (a + 1)z'P⊥z]

 = P[(z + γ)'R(z + γ) > az'P⊥z − 2γ'P⊥z − γ'P⊥γ]

 = P[(z + γ)'R(z + γ) > a(z − a⁻¹γ)'P⊥(z − a⁻¹γ) − (1 + a⁻¹)γ'P⊥γ]

 = ∫₀^∞ P[t > a(z − a⁻¹γ)'P⊥(z − a⁻¹γ) − (1 + a⁻¹)γ'P⊥γ] g(t; q, γ'Rγ/2) dt

 = ∫₀^∞ P[(z − a⁻¹γ)'P⊥(z − a⁻¹γ) < (t + (1 + a⁻¹)γ'P⊥γ)/a] g(t; q, γ'Rγ/2) dt

 = ∫₀^∞ G(t/a + (a + 1)γ'P⊥γ/a²; n − p, γ'P⊥γ/[2a²]) g(t; q, γ'Rγ/2) dt .

By substituting a = x − 1, λ₁ = γ'Rγ/2, and λ₂ = γ'P⊥γ/2, one obtains the form of the distribution function for x > 1.

The derivations for the remaining cases are analogous and are omitted.
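The projection algebra this proof relies on, namely that R = P − P₁ and P⊥ = I − P are orthogonal idempotents of ranks q and n − p, is easy to confirm numerically for any full-rank F. A sketch with a randomly generated F (assuming NumPy; dimensions chosen to match the example):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, r = 30, 4, 3                      # q = p - r = 1
F = rng.standard_normal((n, p))         # any full-rank gradient matrix
F1 = F[:, :r]                           # first r columns

proj = lambda M: M @ np.linalg.solve(M.T @ M, M.T)
P, P1 = proj(F), proj(F1)
R = P - P1
P_perp = np.eye(n) - P

checks = {
    "R idempotent": np.allclose(R @ R, R),
    "R P_perp = 0": np.allclose(R @ P_perp, 0.0),
    "rank R = q": round(np.trace(R)) == p - r,
    "rank P_perp = n - p": round(np.trace(P_perp)) == n - p,
}
```

The rank checks use the fact that the trace of an idempotent matrix equals its rank.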

Proof of Theorem 3: The random variable S may be written

S = ((n − p)/nq)(τ̂ − τ₀)'C₂₂⁻¹(τ̂ − τ₀)/σ̂² + ((n − p)/nq)(τ̂ − τ₀)'(Ĉ₂₂⁻¹ − C₂₂⁻¹)(τ̂ − τ₀)/σ̂² .


The second term on the right will be denoted by uₙ; uₙ converges in probability to zero because √n(τ̂ − τ*) is bounded in probability by Theorem 1 and Lemma 3, √n(τ₀ − τ*) converges to a finite limit by assumption, σ̂² converges almost surely to σ², and n⁻¹(Ĉ₂₂⁻¹ − C₂₂⁻¹) converges almost surely to the zero matrix by Lemma 1, Part 3.

Using the expression for θ̂ given in Theorem 1,

S = ((n − p)/nq)Z₂'C₂₂⁻¹Z₂/σ̂² + ((n − p)/nq)[2a₂'C₂₂⁻¹Z₂ + a₂'C₂₂⁻¹a₂]/σ̂² + uₙ ,

where Z₂ and a₂ are formed from (F'F)⁻¹F'e + θ* − θ₀ and aₙ by deleting the first r rows. Denote the second term on the right by vₙ. As noted previously, √n Z₂ is bounded in probability, √n a₂ converges in probability to the zero vector as stated in Theorem 1, and by Lemma 1, Part 1, the matrix n⁻¹C₂₂⁻¹ converges. Consequently, vₙ converges in probability to zero.

We now use the expression for 1/σ̂² given in Theorem 1 to obtain

S = (Z₂'C₂₂⁻¹Z₂/q)/(e'P⊥e/(n − p)) + ((n − p)/nq)Z₂'C₂₂⁻¹Z₂ aₙ + vₙ + uₙ ,

where aₙ is as in Theorem 1. For the reasons noted previously, Z₂'C₂₂⁻¹Z₂ is bounded in probability, while aₙ converges almost surely to zero. Thus, S = Y + eₙ, where eₙ converges in probability to zero and Y has the familiar form of the test statistic used in linear regression (see [5, pp. 77-8]).

[Received August 1974. Revised April 1975.]

REFERENCES

[1] Fox, M., "Charts on the Power of the F-test," The Annals of Mathematical Statistics, 27 (June 1956), 484-97.

[2] Gallant, A.R., "Statistical Inference for Nonlinear Regression Models," Ph.D. dissertation, Iowa State University, 1971.

[3] ---, "Inference for Nonlinear Models," Institute of Statistics Mimeograph Series, No. 875, Raleigh: North Carolina State University, 1973.

[4] ---, "The Power of the Likelihood Ratio Test of Location in Nonlinear Regression Models," Journal of the American Statistical Association, 70 (March 1975), 198-203.

[5] Graybill, F.A., An Introduction to Linear Statistical Models, Vol. I, New York: McGraw-Hill Book Company, 1961.

[6] Hartley, H.O., "The Modified Gauss-Newton Method for the Fitting of Nonlinear Regression Functions by Least Squares," Technometrics, 3 (May 1961), 269-80.

[7] Lehmann, E.L., Testing Statistical Hypotheses, New York: John Wiley & Sons, Inc., 1959.

[8] Malinvaud, E., "The Consistency of Nonlinear Regressions," The Annals of Mathematical Statistics, 41 (June 1970), 956-69.

[9] Marquardt, D.W., "An Algorithm for Least-Squares Estimation of Nonlinear Parameters," Journal of the Society for Industrial and Applied Mathematics, 11 (June 1963), 431-41.

[10] Pearson, E.S., and Hartley, H.O., "Charts of the Power Function of the Analysis of Variance Tests, Derived from the Noncentral F-distribution," Biometrika, 38 (June 1951), 112-30.
