
Explicit Estimators of Parametric Functions in Nonlinear Regression

A. RONALD GALLANT*

Journal of the American Statistical Association, March 1980, Volume 75, Number 369, Theory and Methods Section, pp. 182-193

*A. Ronald Gallant is Associate Professor of Statistics and Economics, Institute of Statistics, North Carolina State University, Raleigh, NC 27650, currently on leave as Visiting Associate Professor of Economics, Department of Economics, Duke University, Durham, NC 27706.

When repetitive estimations are to be made under field conditions using data that follow a nonlinear regression law, a simple polynomial function of the observations has considerable appeal as an estimator. The polynomial estimator of finite degree with smallest average mean squared error is found. Conditions are given such that as degree increases it converges in probability to the Bayes estimator and its average mean squared error converges to the lower bound of all square integrable estimators. In an example, a linear estimator performs better than the maximum likelihood estimator and nearly as well as the Bayes estimator.

KEY WORDS: Nonlinear regression; Explicit estimators; Bayes estimator; Average mean squared error; Chemical kinetics; Compartment analysis.

1. INTRODUCTION

Consider a model from chemical kinetics,

E(y) = θ₁(e^{-xθ₂} − e^{-xθ₁})/(θ₁ − θ₂) .

E(y) is the expected relative yield of a substance y at time x, θ₁ is the rate at which y is formed, and θ₂ is the rate at which it decomposes. This model also finds use in compartment analysis. If yields are observed at times x₁, x₂, ..., x_n and observation is subject to error, then the observed yields may be expected to follow the regression model

y_t = θ₁(e^{-x_tθ₂} − e^{-x_tθ₁})/(θ₁ − θ₂) + e_t ,

where the errors have mean zero and variance σ². Interest is focused on a particular parametric function.

Typically, the reaction rates θ₁ and θ₂ would be of interest. However, the parametric function

g*(θ, σ) = (θ₁ − θ₂)^{-1} ln(θ₁/θ₂) ,

with θ = (θ₁, θ₂)′, serves better for illustration; it gives the time at which the maximum yield of y occurs. The asterisk, g*, is used to indicate the parametric function of interest, which, in general, depends on all the parameters of the model, although in this instance the dependence on σ is trivial. Estimators are known functions of the observed vector y = (y₁, y₂, ..., y_n)′ and are denoted by ĝ(y), θ̂(y), and so on. Specifically, an estimator cannot depend on (θ, σ).

This article is addressed to situations where g*(θ, σ) is to be estimated repetitively and a premium is placed on the simplicity of the computations. That is, a ĝ(y) with a simple functional form, say linear,

ĝ(y) = a₀ + ∑_{t=1}^{n} a_t y_t ,

is strongly preferred. One has in mind, say, daily calibration of an industrial process.

Since the estimation is repetitive, previous experience may be presumed. It is assumed that one knows roughly what to expect by way of plausible values for (θ, σ) and that this information may be summarized as a prior distribution on the parameter space. Let γ = (θ′, σ)′ and denote the prior by p(γ). For example, a smooth approximation of the empirical distribution function of many previous maximum likelihood estimates of (θ, σ) would suffice.

A classical method of deriving an estimator is to set forth a collection of functions that are acceptable as estimators and to choose the best member of this class according to some criterion. For example, in linear regression attention is restricted to the class of linear functions of the observations and the minimum variance unbiased estimator of a parametric function is sought. In nonlinear regression this approach fails because in most instances the class of parametric functions that admit of unbiased estimators is too small and, worse, minimum variance unbiased estimators will depend on, say, only one or two observations out of 100.

An approach that will succeed is to choose an estimator from a class of polynomial functions of the data according to the criterion of best average mean squared error (AMSE). As notation, if the class of linear estimators is desired, let

Z(y) = (1, y₁, y₂, ..., y_n)′ ,

so that all linear functions of y have the form a′Z(y); if the class of quadratic functions is desired, write

Z(y) = (1, y₁, ..., y_n, y₁², y₁y₂, y₂², y₁y₃, ..., y_n²)′ ,

so that all quadratic functions of y have the form a′Z(y); similarly for higher-degree polynomials in y. Thus, the estimators of interest have the form ĝ(y) = a′Z(y), or ĝ = a′z when the argument is suppressed. The criterion of choice is to select that member of the class which


minimizes AMSE,

AMSE(ĝ) = ∫_Γ E_γ[ĝ − g*(γ)]² dp(γ) ,

where E_γ(·) denotes expectation with respect to the distribution of the random variable y. The choice of a that yields the best polynomial estimator is

â = [∫_Γ E_γ(zz′) dp(γ)]⁻ ∫_Γ g*(γ) E_γ(z) dp(γ) .
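To make the recipe concrete, here is a minimal sketch of how â might be computed for the linear basis by Monte Carlo integration over the prior. The response function is the chemical-kinetics example, but the prior, the design, and all function names are illustrative assumptions of mine, not taken from the article; with independent normal errors, E_γ(z) and E_γ(zz′) are available in closed form.

```python
import numpy as np

def f(x, theta):
    """Chemical-kinetics response: expected relative yield at time x (theta1 != theta2)."""
    t1, t2 = theta
    return t1 * (np.exp(-x * t2) - np.exp(-x * t1)) / (t1 - t2)

def g_star(theta):
    """Parametric function of interest: time at which the maximum yield occurs."""
    t1, t2 = theta
    return np.log(t1 / t2) / (t1 - t2)

def best_linear_coefficients(x, draw_prior, n_draws=20_000, seed=0):
    """Monte Carlo approximation of a_hat = [int E(zz') dp]^- int g* E(z) dp for the
    linear basis Z(y) = (1, y_1, ..., y_n)' and independent N(0, sigma^2) errors."""
    rng = np.random.default_rng(seed)
    m = len(x) + 1
    A = np.zeros((m, m))   # accumulates E_gamma(z z')
    b = np.zeros(m)        # accumulates g*(gamma) E_gamma(z)
    for _ in range(n_draws):
        theta, sigma = draw_prior(rng)
        mu = np.concatenate(([1.0], f(x, theta)))     # E_gamma(z)
        Ezz = np.outer(mu, mu)
        Ezz[1:, 1:] += sigma**2 * np.eye(len(x))      # add the error covariance block
        A += Ezz
        b += g_star(theta) * mu
    return np.linalg.pinv(A / n_draws) @ (b / n_draws)

# Hypothetical prior: theta roughly uniform near (1.4, .4), sigma held at .025.
def draw_prior(rng):
    return rng.uniform([1.3, 0.35], [1.5, 0.45]), 0.025

x = np.array([.25, .5, 1, 1.5, 2, 4, .25, .5, 1, 1.5, 2, 4])
a_hat = best_linear_coefficients(x, draw_prior)   # a_hat[0] is the intercept
```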

The details are given in Sections 2 and 3.

Immediately one asks, What are the statistical properties of the proposed estimator, and how sensitive are these properties to the choice of a prior? These questions are addressed next.

The estimator ĝ(y) is shown to be a Fourier series expansion of the Bayes estimator in polynomials with respect to a weight function; Hermite polynomial expansions serve as an analogy. However, unlike Hermite polynomials, the weight function is the marginal distribution of the observations, which is denoted as p(y) (p(y) = ∫_Γ n[y | f(θ), σ] dp(γ), where n[y | f(θ), σ] denotes the density of y). Much experience with Fourier polynomial expansions has accumulated and, from this result, one should know roughly what to expect of the polynomial estimator in applications. Two specifics are of special statistical interest. The Bayes estimator has the best AMSE achievable of any estimator, polynomial or not. If p(y) possesses a moment-generating function, then the AMSE of ĝ(y) approaches that of the Bayes estimator as the degree of the polynomial increases. If, in addition, the errors are normally distributed, the response function continuous, and the parameter space closed, then the distribution of ĝ(y) approaches that of the Bayes estimator as degree increases. The details are given in Section 4.

The question of sensitivity to choice of prior may be addressed with these results in hand. The polynomial estimator ĝ(y) inherits its statistical properties from the Bayes rule. The degree of inheritance is directly proportional to the degree of ĝ(y); for the chemical kinetics example a polynomial of first degree suffices, as is seen in Section 5. The sensitivity of ĝ(y) to the choice of a prior should be nearly the same as the sensitivity of the Bayes rule to the choice of prior. Experience acquired in Bayesian analysis should carry over directly.

One might ask why some other polynomial expansion of the Bayes rule is not considered (e.g., a Taylor's series expansion). The answer is that the Fourier series expansion is optimal according to the criterion of best AMSE. Any other polynomial expansion must necessarily be inferior.

2. THE STRUCTURE OF THE PROBLEM

The observed data y_t, x_t (t = 1, 2, ..., n) are assumed to have been generated according to the statistical model

y_t = f(x_t, θ) + e_t   (t = 1, 2, ..., n) .

The form of the response function f(x, θ) is known, the unknown parameter θ is a p vector known to be contained in the parameter space Θ, the inputs x_t are known k vectors, the univariate responses y_t are observed, and the errors e_t are assumed to have mean zero. The distribution function of the errors is denoted as N_e(e | σ), where e = (e₁, e₂, ..., e_n)′, an n vector. A typical assumption is that the errors are independent normal, in which case σ is univariate, but in general σ is an r vector contained in a parameter space Σ. The problem of interest is the estimation of a (possibly) nonlinear parametric function g*(γ), where γ = (θ′, σ′)′.

For the example that describes the relative yield at time x of an intermediate substance in a chemical reaction, the response function is

f(x, θ) = θ₁(e^{-xθ₂} − e^{-xθ₁})/(θ₁ − θ₂) .

The time at which the maximum yield of the substance occurs is given by the parametric function

g*(θ₁, θ₂, σ) = (θ₁ − θ₂)^{-1} ln(θ₁/θ₂) .

The approach taken here is to estimate this parametric function directly by an estimator of the form, say,

ĝ(y) = a₀ + ∑_{t=1}^{n} a_t y_t ,

where y_t is the yield observed at time x_t. Note that this approach differs from the procedure of estimation of the parameters θ₁, θ₂, σ and subsequent evaluation of the parametric function g*(θ₁, θ₂, σ) at the estimated parameter values.

The estimators are obtained as follows: The observed y_t, denoted as the n vector

y = (y₁, y₂, ..., y_n)′ ,

is transformed according to

z = Z(y)

In the examples of the previous section, the elements of z were a basis for the polynomials of first degree, of second degree, etc. Here, for the purpose of obtaining general results, the elements of z need not be polynomials. For example, the elements of z could consist of a few leading terms of the classical sine/cosine Fourier series expansion. The vector z has length m and typically m > n. The class of estimators considered comprises those of the form

ĝ = a′z ,

where a is an m vector. Estimators of this form are explicit functions of the observations, namely, ĝ(y) = a′Z(y).

The estimator ĝ to be chosen from this class is that which minimizes AMSE with respect to a weighting measure p defined on Γ = Θ × Σ. That is, one seeks to find ĝ of the form ĝ = a′z minimizing

AMSE(ĝ) = ∫_Γ E_γ[ĝ − g*(γ)]² dp(γ) .


The notation E_γ(·) refers to expectation with respect to the distribution of y; that is, with respect to the distribution function

N_y[y | f(θ), σ] = N_e[y − f(θ) | σ] , where

f(θ) = [f(x₁, θ), f(x₂, θ), ..., f(x_n, θ)]′ .

It is assumed throughout that the components of Z(y) are measurable and square integrable with respect to N_y[y | f(θ), σ] and that the functions g*(γ), E_γ[Z_i(y)], and so on are measurable and square integrable with respect to p.

3. A SAMPLING THEORETIC DERIVATION OF THE ESTIMATOR

The following result is derived in the Appendix. In the class of estimators of the form a′z, that choice of a which minimizes AMSE with respect to p is

â = [∫_Γ E_γ(zz′) dp(γ)]⁻ ∫_Γ g*(γ) E_γ(z) dp(γ) .

The notation ∫_Γ E_γ(zz′) dp(γ) denotes the m × m matrix with typical element

∫_Γ E_γ(z_i z_j) dp(γ) .

Similarly, ∫_Γ g*(γ) E_γ(z) dp(γ) denotes the m vector with typical element

∫_Γ g*(γ) E_γ(z_i) dp(γ) .

The notation A⁻ denotes any generalized inverse of A.

To illustrate the computations, consider the nonlinear model

y = θ₁(e^{θ₂}, e^{2θ₂}, e^{3θ₂})′ + e ,

where E(e) = 0 and E(ee′) = σ²I. Suppose that θ₂ is the parameter of interest, whence g*(γ) = θ₂, and that

Z(y) = (1, y₁, y₂, y₃)′ .

Let p(γ) be the uniform distribution on the unit cube. Then

∫_Γ E_γ(zz′) dp(γ)

  = ∫₀¹ ∫₀¹ ∫₀¹ [ 1            θ₁e^{θ₂}          θ₁e^{2θ₂}         θ₁e^{3θ₂}
                               θ₁²e^{2θ₂}+σ²     θ₁²e^{3θ₂}        θ₁²e^{4θ₂}
                                                 θ₁²e^{4θ₂}+σ²     θ₁²e^{5θ₂}
                  (symmetric)                                      θ₁²e^{6θ₂}+σ² ] dθ₁ dθ₂ dσ

  = [ 1            (e−1)/2            (e²−1)/4            (e³−1)/6
                   (e²−1)/6 + 1/3     (e³−1)/9            (e⁴−1)/12
                                      (e⁴−1)/12 + 1/3     (e⁵−1)/15
      (symmetric)                                         (e⁶−1)/18 + 1/3 ] ,

∫_Γ g*(γ) E_γ(z) dp(γ) = ∫₀¹ ∫₀¹ ∫₀¹ θ₂ (1, θ₁e^{θ₂}, θ₁e^{2θ₂}, θ₁e^{3θ₂})′ dθ₁ dθ₂ dσ

  = ( 1/2, 1/2, (e²+1)/8, (2e³+1)/18 )′ ,

â = (.354, −.0572, −.0169, .0697)′ ,

and

ĝ(y) = .354 − .0572 y₁ − .0169 y₂ + .0697 y₃ .
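As a quick check on the arithmetic, the following sketch (assuming NumPy; not part of the article) builds the two integrated moment arrays from the closed forms above and solves for â:

```python
import numpy as np

e = np.e

# int E_gamma(z z') dp(gamma) for the uniform prior on the unit cube
A = np.array([
    [1.0,          (e - 1)/2,           (e**2 - 1)/4,         (e**3 - 1)/6],
    [(e - 1)/2,    (e**2 - 1)/6 + 1/3,  (e**3 - 1)/9,         (e**4 - 1)/12],
    [(e**2 - 1)/4, (e**3 - 1)/9,        (e**4 - 1)/12 + 1/3,  (e**5 - 1)/15],
    [(e**3 - 1)/6, (e**4 - 1)/12,       (e**5 - 1)/15,        (e**6 - 1)/18 + 1/3],
])

# int g*(gamma) E_gamma(z) dp(gamma) with g*(gamma) = theta_2
b = np.array([1/2, 1/2, (e**2 + 1)/8, (2*e**3 + 1)/18])

a_hat = np.linalg.pinv(A) @ b
print(np.round(a_hat, 4))   # approximately [ 0.3544 -0.0572 -0.0169  0.0697]
```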

Often, in applications, the error distribution N_e(e | σ) is taken to be the n-variate normal with mean zero and variance-covariance matrix σ²I. If the functions Z_i(y) are taken as polynomials in y, then the elements of E_γ(z) and E_γ(zz′) may be obtained as the moments about zero of a spherical normal distribution centered at f(θ); these will be polynomials in f(x_t, θ) and σ.
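For instance, a small sketch (my own illustration) of the closed-form first moments for the quadratic basis: with y ~ N(f(θ), σ²I), E(y_i) = f_i and E(y_i y_j) = f_i f_j + σ²[i = j], so E_γ(z) is immediate. The fourth-order moments needed for E_γ(zz′) follow in the same way from the normal moment formulas and are omitted here.

```python
import numpy as np

def quadratic_basis_mean(f_theta, sigma):
    """E_gamma(z) for the quadratic basis z = (1, y_1, ..., y_n, y_i*y_j for i <= j)'
    when y ~ N(f(theta), sigma^2 I)."""
    f = np.asarray(f_theta, dtype=float)
    n = len(f)
    mean = [1.0]
    mean.extend(f)                       # E(y_i) = f_i
    for i in range(n):
        for j in range(i, n):
            mean.append(f[i] * f[j] + (sigma**2 if i == j else 0.0))
    return np.array(mean)
```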

4. BAYESIAN INTERPRETATION OF THE ESTIMATOR

The problem of estimating g*(γ) may be recast in a Bayesian setting by regarding p as the prior distribution on Γ and N_y(y | f(θ), σ) as the conditional distribution of y given γ, with density n[y | f(θ), σ]. Take for the Bayes rule the mean of g*(γ) with respect to the posterior density, and denote it by g(y). This choice minimizes expected posterior loss for a squared error loss function. Under the previous integrability assumptions, the expected posterior squared error loss of an estimator is numerically the same as AMSE; the different terminology corresponds to the nomenclature of Bayesian or sampling-theoretic inference, respectively. Thus, g(y) is the optimal (unrestricted) choice under either criterion.

For a variety of purposes it may be convenient to approximate the Bayes rule by truncating a series expansion of g(y). Consider a Fourier series expansion in terms of a set of basis functions {Z_i(y)}_{i=1}^∞. The most natural choice of a weight function, in the present context, is the marginal distribution of the observations

p(y) = ∫_Γ n[y | f(θ), σ] dp(γ) .

Let {φ_i(y)}_{i=1}^∞ be a sequence of orthonormal basis functions with respect to p(y), generated from the sequence {Z_i(y)}_{i=1}^∞ by the Gram-Schmidt process (with linear dependencies among the Z_i accommodated by deletion). The Fourier series expansion of the Bayes rule is

g(y) = ∑_{i=1}^{∞} c_i φ_i(y) ,

with Fourier coefficients

c_i = ∫ g(y) φ_i(y) p(y) dy .
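The orthonormalization step can be mimicked numerically. In the sketch below (mine, not the article's procedure), a sample y₁, ..., y_K is drawn from the marginal p(y) by drawing γ from the prior and then y from the model, and a QR factorization plays the role of Gram-Schmidt with respect to the empirical inner product (1/K) ∑_k g(y_k) h(y_k):

```python
import numpy as np

def empirical_orthonormalization(Zvals):
    """Zvals[k, j] = Z_j(y_k) evaluated at a sample y_1, ..., y_K drawn from the
    marginal p(y).  Returns Phi with Phi[k, i] = phi_i(y_k), orthonormal in the
    empirical inner product (1/K) sum_k g(y_k) h(y_k); QR does the Gram-Schmidt work."""
    K = Zvals.shape[0]
    q_mat, _ = np.linalg.qr(Zvals / np.sqrt(K))
    return np.sqrt(K) * q_mat

# Sanity check: Phi.T @ Phi / K should be (approximately) the identity matrix.
# Fourier coefficients would then be estimated by c_i ~ (1/K) sum_k g(y_k) * Phi[k, i].
```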


Corollary 1 of the Appendix states that if this expansion is truncated at m, then

â′Z(y) = ∑_{i=1}^{m} c_i φ_i(y) .

(If there are deletions, then the expansion is truncated at m less the number of deletions.)

This result suggests that the choice of a class of functions to which attention is to be restricted in the construction of an explicit estimator may be regarded as a choice of basis functions for a Fourier series approximation of the Bayes rule. A common choice of basis functions for the approximation of a nonperiodic function is the collection of polynomials, for example, the Hermite polynomials. Corollary 2 of the Appendix states that if the predictive density function possesses a moment-generating function and if {Z_i(y)} is a basis for the polynomials, then

lim_{m→∞} ∫ [ĝ(y) − g(y)]² p(y) dy = 0 .

Thus if the explicit estimator is a polynomial and the predictive density possesses a moment-generating function, then the expected posterior squared error loss (the AMSE) of the explicit estimator may be made as near the achievable minimum as desired by taking the degree of the polynomial suitably large. This would seem to serve as further justification for the choice of low-order polynomials for the construction of the explicit estimator.

A sampling theorist is interested in the extent to which the explicit estimator constructed from a polynomial inherits the sampling properties of the Bayes rule. Theorems 4 and 5 of the Appendix show that if the errors e_t are independent normal with common variance σ² and if f(θ) is continuous, then

lim_{m→∞} ∫ [ĝ(y) − g(y)]² p(y) dy = 0

implies that ĝ(y) converges in probability to g(y) for any γ₀ = (θ₀′, σ₀)′ for which p(Γ₀) > 0 for all open sets Γ₀ containing γ₀; convergence in probability for such γ₀ implies convergence in distribution. In this sense, the effect on the sampling distribution of the explicit estimator corresponding to some choice of a weighting measure p may be anticipated from the sampling behavior of the Bayes estimator corresponding to that choice of p.

5. EXAMPLE

Consider the statistical model

y_t = f(x_t, θ) + e_t ,

with response function

f(x, θ) = θ₁(e^{-xθ₂} − e^{-xθ₁})/(θ₁ − θ₂) ,   θ₁ ≠ θ₂ ,
        = θ₁ x e^{-xθ₁} ,                       θ₁ = θ₂ ,

parameter space

Θ = {(θ₁, θ₂): θ₁ > θ₂} ,

and design

{x_t} = {.25, .5, 1, 1.5, 2, 4, .25, .5, 1, 1.5, 2, 4} .

This model has appeared several times in the nonlinear regression literature (Box and Lucas 1959; Guttman and Meeter 1964; Gallant 1976), which facilitates comparison with standard methods of nonlinear regression analysis.

As noted earlier, the response function f(x, θ) describes the relative yield at time x of an intermediate substance in a chemical reaction; consider estimation of the time at which the maximum yield occurs:

g*(θ) = (θ₁ − θ₂)^{-1} ln(θ₁/θ₂) .

Some additional notation required is: Let F(θ) be the n × p matrix with typical element (∂/∂θ_j) f(x_t, θ), where t is the row index and j is the column index, and

C = [F′(θ)F(θ)]⁻¹ .

A weighting measure p, which is expected to be favorable to the maximum likelihood estimator (MLE), is a product measure comprising a uniform distribution on the ellipsoid (θ − τ)′C⁻¹(θ − τ) ≤ r², with C evaluated at τ, and the inverted gamma distribution (Zellner 1971, p. 371) with parameters ν and s². The parametric choices employed here are τ = (1.4, .4)′, s = .025, ν = 10, and r² = 8s². These choices were motivated by the study of the sampling distribution of the MLE reported in Gallant (1976). If p is viewed as a prior measure, then θ has the uniform distribution on the ellipsoid

{θ: (θ − τ)′C⁻¹(θ − τ) ≤ r²}

and σ is independently distributed as the inverted gamma distribution. Appropriate transformations yield an expression that is convenient for use with quadrature formulas:

∫_Γ q(θ, σ) dp(γ) = [πr²Γ(α)]⁻¹ ∫₀^∞ u^{α−1} { ∫_{y′y ≤ r²} q[R′y + τ, 1/√(γu)] dy } e^{-u} du ,

where α = ν/2, γ = 2/(νs²), and R is obtained by factoring C as C = R′R. The integral within braces may be evaluated by means of a 12-point quadrature formula (Stroud 1971, p. 281) that integrates polynomials in y up to degree seven exactly; the outer integral may be evaluated using a 4-point quadrature formula (IBM 1968, DQL4) that integrates polynomials in u up to degree seven exactly. In terms of the arguments of q(θ, σ), if q is a polynomial of degree less than or equal to seven in θ and of degree less than or equal to ν − 2 in σ, then it will be integrated exactly.
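Although the article evaluates these integrals by quadrature, drawing from this prior is straightforward, and a Monte Carlo draw can stand in for the quadrature in rough work. The sketch below is mine; it follows the parameterization just described (θ uniform on the ellipsoid with C = R′R, and νs²/σ² distributed as χ²_ν for the inverted gamma).

```python
import numpy as np

def draw_prior(rng, tau, R, r, nu, s2):
    """One draw (theta, sigma) from the product prior: theta uniform on the ellipsoid
    {theta: (theta - tau)' C^{-1} (theta - tau) <= r^2} with C = R'R, and sigma from
    the inverted gamma with parameters nu and s2 (so that nu*s2/sigma^2 ~ chi^2_nu)."""
    while True:                                  # rejection sample a point on the unit disk
        u = rng.uniform(-1.0, 1.0, size=2)
        if u @ u <= 1.0:
            break
    theta = R.T @ (r * u) + tau                  # linear map sends the disk to the ellipsoid
    sigma = np.sqrt(nu * s2 / rng.chisquare(nu))
    return theta, sigma

# A small wrapper fixing tau, R, r, nu, s2 would make this usable as the draw_prior
# argument of the Section 1 sketch, e.g. tau = np.array([1.4, .4]), s2 = .025**2,
# nu = 10, r = np.sqrt(8 * s2), and R a factor of C = (F'F)^{-1} evaluated at tau.
```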


Table 1. Quadrature Formula

           Parameter Value              Weight
  i     θ1i       θ2i       σi          pi
  1   1.52972   .40788   .01824       .01297
  2   1.27028   .39212   .01824       .01297
  3   1.40000   .43339   .01824       .01297
  4   1.40000   .36661   .01824       .01297
  5   1.44837   .41539   .01824       .02157
  6   1.35163   .40951   .01824       .02157
  7   1.44837   .39049   .01824       .02157
  8   1.35163   .38461   .01824       .02157
  9   1.49649   .43070   .01824       .00923
 10   1.30351   .41897   .01824       .00923
 11   1.49649   .38103   .01824       .00923
 12   1.30351   .36930   .01824       .00923
 13   1.52972   .40788   .02625       .05084
 14   1.27028   .39212   .02625       .05084
 15   1.40000   .43339   .02625       .05084
 16   1.40000   .36661   .02625       .05084
 17   1.44837   .41539   .02625       .08456
 18   1.35163   .40951   .02625       .08456
 19   1.44837   .39049   .02625       .08456
 20   1.35163   .38461   .02625       .08456
 21   1.49649   .43070   .02625       .03618
 22   1.30351   .41897   .02625       .03618
 23   1.49649   .38103   .02625       .03618
 24   1.30351   .36930   .02625       .03618
 25   1.52972   .40788   .04231       .01025
 26   1.27028   .39212   .04231       .01025
 27   1.40000   .43339   .04231       .01025
 28   1.40000   .36661   .04231       .01025
 29   1.44837   .41539   .04231       .01704
 30   1.35163   .40951   .04231       .01704
 31   1.44837   .39049   .04231       .01704
 32   1.35163   .38461   .04231       .01704
 33   1.49649   .43070   .04231       .00729
 34   1.30351   .41897   .04231       .00729
 35   1.49649   .38103   .04231       .00729
 36   1.30351   .36930   .04231       .00729
 37   1.52972   .40788   .09843       .00002
 38   1.27028   .39212   .09843       .00002
 39   1.40000   .43339   .09843       .00002
 40   1.40000   .36661   .09843       .00002
 41   1.44837   .41539   .09843       .00003
 42   1.35163   .40951   .09843       .00003
 43   1.44837   .39049   .09843       .00003
 44   1.35163   .38461   .09843       .00003
 45   1.49649   .43070   .09843       .00001
 46   1.30351   .41897   .09843       .00001
 47   1.49649   .38103   .09843       .00001
 48   1.30351   .36930   .09843       .00001
 49   1.40000   .40000   .02500

The numerical integration procedure is equivalent to the substitution of a discrete prior measure p̃,

∫_Γ q(θ, σ) dp̃(γ) = ∑_i p_i q(θ_i, σ_i) ,

for the measure p. This discrete measure is displayed in Table 1. In fact, it may be best to regard p̃ as the prior in the sequel.
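With a discrete prior the integrals defining the explicit estimator reduce to finite weighted sums. A minimal sketch (mine; it reuses the hypothetical f and g_star helpers from the Section 1 sketch and assumes the Table 1 values are held in arrays thetas (49 × 2), sigmas, and weights):

```python
import numpy as np

def linear_coefficients_discrete(x, thetas, sigmas, weights, f, g_star):
    """a_hat for the linear basis Z(y) = (1, y_1, ..., y_n)' when the prior is a
    discrete measure: the integrals become weighted sums over the table rows.
    Independent normal errors are assumed, as in the article's example."""
    n = len(x)
    A = np.zeros((n + 1, n + 1))   # sum_i p_i E_{gamma_i}(z z')
    b = np.zeros(n + 1)            # sum_i p_i g*(theta_i) E_{gamma_i}(z)
    for theta, sigma, w in zip(thetas, sigmas, weights):
        mu = np.concatenate(([1.0], f(x, theta)))
        Ezz = np.outer(mu, mu)
        Ezz[1:, 1:] += sigma**2 * np.eye(n)
        A += w * Ezz
        b += w * g_star(theta) * mu
    return np.linalg.pinv(A) @ b
```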

The central idea behind the choice of parameter points for Table 1 (and later Table 2) becomes lost in the details. The idea is that a numerical quadrature formula approximates an integral by a weighted average, namely,

∫_Γ g(γ) dp(γ) ≈ ∑_{i=1}^{N} p_i g(γ_i) .

The choice of points γ_i = (θ_i, σ_i) and weights p_i is determined according to numerical analytic considerations so as to achieve an accurate approximation. Table 1 displays the choice of γ_i and p_i implied by the use of the two preceding quadrature formulas. It seems useful later in the discussion (Table 2) to display the bias and MSE surfaces of the Bayes estimator, the explicit estimator, and the MLE. Since the moments of the Bayes estimator and the MLE are somewhat costly to compute, and since they must be computed at the 49 points γ_i displayed in Table 1 to obtain AMSE, it becomes convenient to use these 49 points of the quadrature formula for the purpose of display. This is the reason for the choice of the 49 points γ_i = (θ_i, σ_i) used in Tables 1 and 2.

Two explicit estimators are considered, a linear function in y,

ĝ₁(y) = a₀ + ∑_{i=1}^{12} a_i y_i ,

and a quadratic function in y,

ĝ₂(y) = a₀ + ∑_{i=1}^{12} a_i y_i + ∑_{i≤j} a_{ij} y_i y_j .

The performance of these estimators relative to the Bayes estimator and the MLE is shown in Table 2. This table gives the value of the parametric function, the bias, and the root MSE for each estimator at each parameter value of the quadrature formula, together with the root AMSE for each estimator.

These conclusions obtain from Table 2 according to the criterion of average mean squared error. The slight improvement in the performance of the explicit quadratic estimator ĝ₂ relative to the explicit linear estimator ĝ₁ is not worth the added complexity entailed. The explicit linear estimator ĝ₁ is a serious competitor of the Bayes estimator in view of its simplicity. The explicit linear estimator is an improvement over the MLE. From the theory, one expects that a polynomial of high degree will dominate the MLE. However, it is surprising that the linear estimator is an improvement. In the figure, which plots (MSE)^{1/2} for σ = .02625, one can see that this behavior obtains due to a fairly close approximation of the Bayes estimator.

The details of the computations are as follows. The moments of the explicit estimators were obtained exactly using the usual formulas for the moments of linear and quadratic functions of normally distributed random variables. The moments of the Bayes estimator were obtained from 2,000 Monte Carlo trials using the control variate method of variance reduction (Hammersley and Handscomb 1964, pp. 59-60). The random-number generator employed was GGNOF (IMSL 1975); the explicit linear estimator was used as the control variate. The estimated standard error of (AMSE)^{1/2} for the Bayes estimator was .000034695. The moments of the MLE were approximated using the moments of the asymptotic distribution.
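The control-variate device used above can be sketched generically. In the code below (mine, not the article's program), the target is the Bayes estimator evaluated at simulated samples and the control is the explicit linear estimator, whose exact mean a₀ + a′E_γ(y) is available in closed form; this sketch uses the regression-adjusted variant of the method, which is one common form.

```python
import numpy as np

def control_variate_mean(simulate_y, target, control, control_mean, n_trials=2000, seed=0):
    """Monte Carlo estimate of E[target(y)] using control(y) as a control variate
    whose exact mean control_mean is known."""
    rng = np.random.default_rng(seed)
    t = np.empty(n_trials)
    c = np.empty(n_trials)
    for k in range(n_trials):
        y = simulate_y(rng)
        t[k] = target(y)      # e.g., the Bayes estimator, evaluated numerically
        c[k] = control(y)     # e.g., the explicit linear estimator a0 + a'y
    beta = np.cov(t, c)[0, 1] / np.var(c, ddof=1)    # variance-minimizing coefficient
    estimate = t.mean() - beta * (c.mean() - control_mean)
    std_err = np.sqrt(np.var(t - beta * c, ddof=1) / n_trials)
    return estimate, std_err
```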


Table 2. Bias and Root Mean Squared Error (MSE) of the Three Estimators of the Time of Maximum Yield

                  Explicit Linear        Explicit Quadratic       Bayes Estimator        MLE*
  i    g*(θi)     Bias     (MSE)^1/2     Bias     (MSE)^1/2       Bias     (MSE)^1/2     (MSE)^1/2
  1   1.17830    .02820    .03270       .02903    .03300         .02040    .02486       .02480
  2   1.33851   -.03340    .03727      -.03247    .03686        -.02239    .02801       .02916
  3   1.21310    .01704    .02376       .01656    .02329         .01151    .02161       .02613
  4   1.29663   -.01514    .02243      -.01559    .02285        -.01149    .02163       .02753
  5   1.20910    .01785    .02434       .01749    .02380         .00704    .02027       .02575
  6   1.26747   -.00395    .01702      -.00448    .01738        -.00194    .02289       .02735
  7   1.23907    .00678    .01789       .00632    .01744         .00263    .02127       .02626
  8   1.29970   -.01686    .02363      -.01729    .02419        -.00726    .02199       .02789
  9   1.16859    .03204    .03607       .03250    .03614         .02449    .02741       .02481
 10   1.28317   -.01000    .01934      -.00994    .01977        -.00710    .02002       .02798
 11   1.22640    .01153    .02017       .01166    .01981         .00898    .01908       .02579
 12   1.35002   -.03789    .04135      -.03735    .04118        -.02753    .03112       .02909
 13   1.17830    .02820    .03692       .02911    .03684         .02694    .03309       .03569
 14   1.33851   -.03340    .04102      -.03239    .04099        -.03007    .03685       .04196
 15   1.21310    .01704    .02929       .01664    .02885         .01520    .02845       .03761
 16   1.29663   -.01514    .02822      -.01552    .02861        -.01471    .02802       .03962
 17   1.20910    .01785    .02977       .01756    .02913         .01259    .02806       .03706
 18   1.26747   -.00395    .02415      -.00440    .02457        -.00321    .02926       .03936
 19   1.23907    .00678    .02477       .00640    .02426         .00469    .02790       .03779
 20   1.29970   -.01686    .02918      -.01721    .02982        -.01211    .02969       .04013
 21   1.16859    .03204    .03993       .03258    .03973         .03121    .03571       .03571
 22   1.28317   -.01000    .02584      -.00986    .02649        -.00914    .02649       .04027
 23   1.22640    .01153    .02647       .01174    .02586         .01165    .02563       .03712
 24   1.35002   -.03789    .04476      -.03727    .04486        -.03476    .04018       .04187
 25   1.17830    .02820    .04765       .02934    .04677         .04050    .04787       .05754
 26   1.33851   -.03340    .05089      -.03216    .05172        -.04443    .05237       .06764
 27   1.21310    .01704    .04201       .01687    .04159         .02285    .03675       .06062
 28   1.29663   -.01514    .04128      -.01528    .04166        -.02189    .03600       .06388
 29   1.20910    .01785    .04235       .01780    .04148         .02273    .03812       .05975
 30   1.26747   -.00395    .03860      -.00417    .03920        -.00561    .03260       .06345
 31   1.23907    .00678    .03899       .00663    .03831         .00851    .03273       .06092
 32   1.29970   -.01686    .04194      -.01697    .04278        -.02180    .03849       .06469
 33   1.16859    .03204    .05001       .03281    .04921         .04696    .05255       .05756
 34   1.28317   -.01000    .03968      -.00963    .04080        -.01437    .03233       .06491
 35   1.22640    .01153    .04010       .01197    .03904         .01640    .03301       .05984
 36   1.35002   -.03789    .05395      -.03704    .05470        -.05235    .05754       .06749
 37   1.17830    .02820    .09368       .03103    .09044         .06311    .06893       .13386
 38   1.33851   -.03340    .09538      -.03047    .09924        -.06961    .07975       .15736
 39   1.21310    .01704    .09095       .01856    .09057         .03653    .04546       .14104
 40   1.29663   -.01514    .09061      -.01360    .09138        -.03638    .04126       .14860
 41   1.20910    .01785    .09110       .01948    .08953         .03933    .04596       .13900
 42   1.26747   -.00395    .08943      -.00248    .09092        -.00660    .02791       .14761
 43   1.23907    .00678    .08959       .00831    .08838         .01261    .02662       .14172
 44   1.29970   -.01686    .09091      -.01529    .09282        -.03781    .04393       .15051
 45   1.16859    .03204    .09491       .03449    .09222         .07476    .07675       .13392
 46   1.28317   -.01000    .08990      -.00794    .09278        -.02217    .02464       .15101
 47   1.22640    .01153    .09008       .01365    .08773         .02529    .03212       .13921
 48   1.35002   -.03789    .09704      -.03535    .10029        -.08423    .08419       .15702
 49   1.25276    .00163    .02275       .00100    .02267         .00072    .02915       .03670

(AMSE)^1/2                .03237                  .03234                   .03095       .04122

* Bias for MLE was .0 for all i.

As a partial check on accuracy, the moments of g*(θ̂) were estimated for γ = (1.4, .4, .025)′ from an existing set of 4,000 simulated values of the least squares estimator θ̂ (Gallant 1976); the results were est{E[g*(θ̂)]} = 1.25283 and est{var[g*(θ̂)]} = .00135907, yielding est{MSE[g*(θ̂)]}^{1/2} = .03687, which compares favorably with .03670, the value reported in Table 2. All code was written in double precision, save the use of GGNOF, using an IBM 370/165.

6. CONFIDENCE INTERVALS

Any procedure for finding an exact confidence region for γ may be used to find an exact confidence region for g*(γ). Let Γ̂ be a random set function depending on y such that

P_γ({y: γ ∈ Γ̂}) ≥ 1 − α .

Let G = {g*(γ): γ ∈ Γ̂}; since

{y: g*(γ) ∈ G} ⊃ {y: γ ∈ Γ̂} ,

it follows that

P_γ({y: g*(γ) ∈ G}) ≥ 1 − α .


Thus G is an exact confidence interval for g*(γ), exact in the sense that the probability statement is correct, not approximate.

When g* does not depend on σ, as in the example, consider

U = a₀ + A′y ,

where a₀ is a q vector of known constants and A is an n × q matrix of known constants. Then, following Hartley (1964),

Θ̂ = {θ: [U − a₀ − A′f(θ)]′(A′A)⁻¹[U − a₀ − A′f(θ)]/[q s²(θ)] ≤ F_α} , where

s²(θ) = [y − f(θ)]′[I − A(A′A)⁻¹A′][y − f(θ)]/(n − q) ,

is an exact confidence region for θ; F_α denotes the upper α percentage point of the F distribution with q degrees of freedom for the numerator and n − q degrees of freedom for the denominator. An exact confidence interval for g*(θ) is

G = {g*(θ): θ ∈ Θ̂} .
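One way to carry out this construction numerically is a grid search: evaluate Hartley's statistic over a grid of θ values, keep the points that fall inside the region, and report the range of g*(θ) over the retained points. The sketch below is mine (it assumes SciPy for the F quantile and the hypothetical f and g_star helpers from the earlier sketches); the article does not prescribe a particular search method.

```python
import numpy as np
from scipy.stats import f as f_dist

def hartley_interval(y, x, A, f, g_star, theta_grid, alpha=0.05):
    """Conservative exact interval for g*(theta): collect grid points whose Hartley
    statistic is below F_alpha, then take the range of g* over the accepted points.
    Note U - a0 - A'f(theta) = A'(y - f(theta)), so a0 drops out of the statistic."""
    n, q = A.shape
    F_alpha = f_dist.ppf(1 - alpha, q, n - q)
    M = np.eye(n) - A @ np.linalg.inv(A.T @ A) @ A.T   # I - A(A'A)^{-1}A'
    accepted = []
    for theta in theta_grid:
        r = y - f(x, theta)
        s2 = r @ M @ r / (n - q)
        d = A.T @ r
        stat = d @ np.linalg.solve(A.T @ A, d) / (q * s2)
        if stat <= F_alpha:
            accepted.append(g_star(theta))
    return (min(accepted), max(accepted)) if accepted else None

# Illustrative grid respecting theta1 > theta2:
# theta_grid = [(t1, t2) for t1 in np.linspace(1.0, 1.8, 81)
#                        for t2 in np.linspace(0.2, 0.6, 81) if t1 > t2]
```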

To illustrate with the example, consider the sample

y = (.31753, .42208, .62973, .56630, .54830, .26603, .29474, .49830, .58632, .63670, .56983, .26299)′ ,

generated according to the model with θ = (1.4, .4)′ and σ = .025 (corresponding to i = 49 in Tables 1 and 2). For a₀ and A take

a₀ = (−1.44126, −.19376)′

and

A =
  -.27126    .45343
  -.33201    .58293
  -.18570    .42717
   .035484   .14643
   .21061   -.077778
   .38468   -.30496
  -.27126    .45343
  -.33201    .58293
  -.18570    .42717
   .035484   .14653
   .21061   -.077778
   .38468   -.30496 .

[Figure. Root Mean Squared Error of the Explicit Estimator, Bayes Estimator, and the Maximum Likelihood Estimator of the Time of Maximum Yield, σ = .02625. The vertical axis is (MSE)^1/2, from .025 to .045; the horizontal axis is g*, from 1.15 to 1.35; the plotted curves are labeled MLE, LINEAR, QUADRATIC, and BAYES.]

With these choices, the first element of U = a₀ + A′y is the best linear AMSE estimator of g*(θ) and the second element of U is the best linear AMSE estimator of θ₁.

The estimate of g*(θ) is the value 1.2255 observed for U₁, and the exact confidence interval is

G = (1.0730, 1.3138) .

By way of comparison, the MLE of g*(θ) is 1.2077 and a confidence interval obtained from the asymptotic theory of the least squares estimator is

(1.1206, 1.2948) ,

which is approximate, not exact. The true value of g*(θ) is 1.2528.

This procedure for finding a confidence interval is entirely analogous to Scheffé's method for finding multiple confidence intervals that are simultaneously valid. The parametric confidence statement "θ in Θ̂" may be used for several different functions of θ, yielding a system of confidence intervals that are simultaneously valid. As with Scheffé's method, the procedure is conservative. In this instance, the interval was 38 percent longer than that found using the asymptotic theory of the MLE of g*(θ).


The first attempt to find an exact confidence interval for this sample used the statistic

[U₁ − a₀₁ − A₁′f(θ)]′(A₁′A₁)⁻¹[U₁ − a₀₁ − A₁′f(θ)]/s₁²(θ)

to find the preliminary region Θ̂. However, the region was constrained on the left only by the parametric constraint θ₂ < θ₁ and appeared to continue indefinitely to the right. The addition of the best linear AMSE estimator of θ₁ as a second component of U remedied the problem and produced a small, elliptically shaped region for Θ̂.

7. DISCUSSION

It is seen that linear estimators for use in nonlinear regression analysis may be found by truncating a Fourier series expansion of a Bayes rule, and that the linearity of these estimators may be exploited to find exact confidence intervals for a parametric function of interest. Subject to the regularity conditions, it is seen that the AMSE of a polynomial explicit estimator may be made as near the minimum achievable as is desired by taking the degree of the polynomial suitably large, and that the polynomial explicit estimator converges in probability to the Bayes rule as degree increases.

It is interesting to obtain the explicit estimator for a model that is linear in the parameters. The classical problem is the estimation of λ′θ for a model of the form y = Xθ + e, where attention is restricted to estimators of the form est(λ′θ) = a₀ + a₁′y. Rather than find the best linear unbiased estimator, consider the linear explicit estimator that is obtained by solving the system of equations

∫_Γ E_γ[(1, y′)′(1, y′)] dp(γ) (a₀, a₁′)′ = ∫_Γ λ′θ E_γ[(1, y′)′] dp(γ) .

Upon evaluation of the integrals these equations become

[ 1     θ̄′X′                       ] (a₀)     (      λ′θ̄        )
[ Xθ̄   XV_θX′ + Xθ̄θ̄′X′ + σ̄²I     ] (a₁)  =  ( X(V_θ + θ̄θ̄′)λ ) ,

where

θ̄ = ∫_Γ θ dp ,   σ̄² = ∫_Γ σ² dp ,

and

V_θ = ∫_Γ (θ − θ̄)(θ − θ̄)′ dp .

This system is equivalent to the system

[ 1    θ̄′X′           ] (a₀)     (  λ′θ̄  )
[ 0    XV_θX′ + σ̄²I   ] (a₁)  =  ( XV_θλ ) ,

which has solution

a₁ = X(X′X + σ̄²V_θ⁻¹)⁻¹λ .

The estimator becomes

est(λ′θ) = λ′(X′X + σ̄²V_θ⁻¹)⁻¹(X′y + σ̄²V_θ⁻¹θ̄) .

An estimator of θ itself may be inferred from this expression.
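A direct transcription of this formula into code (a sketch; the argument names are mine):

```python
import numpy as np

def linear_model_explicit_estimate(lam, X, y, theta_bar, sigma2_bar, V_theta):
    """est(lambda' theta) = lambda' (X'X + sigma2_bar V_theta^{-1})^{-1}
                                    (X'y + sigma2_bar V_theta^{-1} theta_bar)."""
    Vinv = np.linalg.inv(V_theta)
    lhs = X.T @ X + sigma2_bar * Vinv
    rhs = X.T @ y + sigma2_bar * Vinv @ theta_bar
    return lam @ np.linalg.solve(lhs, rhs)
```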

By virtue of the discussion in Section 4, this estimator is coincident with the Bayes rule whenever the Bayes rule is linear in the observations. As an example, let the likelihood be the n-variate normal with mean Xθ and variance-covariance matrix σ²I; let the prior be the natural conjugate prior. The natural conjugate prior comprises a conditional distribution of θ given σ², which is a p-variate normal with mean θ̄ and variance-covariance matrix σ²Ω, and a marginal distribution for σ, which is the inverted gamma with parameters s and ν > 2. Then

∫_Γ θ dp = θ̄ ,   ∫_Γ σ² dp = νs²/(ν − 2) ,

and

∫_Γ (θ − θ̄)(θ − θ̄)′ dp = [νs²/(ν − 2)] Ω .

Substituting into the formula for est(λ′θ), one obtains

est(λ′θ) = λ′(X′X + Ω⁻¹)⁻¹(X′y + Ω⁻¹θ̄) ,

which is immediately recognized as the Bayes rule. The class of estimators of the form

est(λ′θ) = λ′(X′X + σ̄²V_θ⁻¹)⁻¹(X′y + σ̄²V_θ⁻¹θ̄)

is quite rich. It includes the Bayes rule with normal likelihood and natural conjugate prior and any Bayes rule that is linear in the observations. It includes both the least squares estimator and the ridge regression estimator.
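To make the last remark concrete (this spelling-out is mine, not in the article): taking θ̄ = 0 and σ̄²V_θ⁻¹ = kI for some k > 0 reduces the formula to

est(λ′θ) = λ′(X′X + kI)⁻¹X′y ,

the ridge regression estimator, while letting σ̄²V_θ⁻¹ → 0 recovers the least squares estimator λ′(X′X)⁻¹X′y.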

APPENDIX: THEOREMS AND PROOFS

A.1. A Sampling Theoretic Derivation of the Estimator

The estimator is considered in somewhat more generality in the Appendix than in the text, as the coefficient vector a of a′Z(y) may be restricted to a linear subspace 𝒜 of R^m if desired.

Let b₁, b₂, ..., b_r be m vectors constituting a basis for 𝒜 and arrange them as column vectors in the m × r matrix

B = [b₁ : b₂ : ... : b_r] .

Define g_{b_i}(γ) = E_γ(b_i′z) (i = 1, 2, ..., r).

Let H be the r × r matrix with typical element

h_{ij} = ∫_Γ g_{b_i}(γ) g_{b_j}(γ) dp(γ)   (i, j = 1, 2, ..., r) ,

and let h be the r vector with typical element

h_i = ∫_Γ g_{b_i}(γ) g*(γ) dp(γ)   (i = 1, 2, ..., r) .


The matrix H is positive semidefinite since, for an arbitrary r vector α,

α′Hα = ∫_Γ [∑_{i=1}^{r} α_i g_{b_i}(γ)]² dp(γ) ≥ 0 .

Moreover, the equations Hα = h are consistent, with a solution

α̂ = H⁻h ,

where H⁻ denotes any generalized inverse of H. Set

ĝ(γ) = ∑_{i=1}^{r} α̂_i g_{b_i}(γ) .

Lemma 1: Let α be an arbitrary r vector. Then

∫_Γ [g*(γ) − ∑_{i=1}^{r} α_i g_{b_i}(γ)]² dp(γ) = (α − α̂)′H(α − α̂) + ∫_Γ [g*(γ) − ĝ(γ)]² dp(γ) .

Proof: Add and subtract ĝ and expand the integrand to obtain

∫ (g* − ∑ α_i g_{b_i})² dp = ∫ (g* − ĝ)² dp + ∫ (ĝ − ∑ α_i g_{b_i})² dp + 2 ∫ (g* − ĝ)(ĝ − ∑ α_i g_{b_i}) dp .

By substituting ĝ = ∑ α̂_i g_{b_i} in the last two terms one obtains

= ∫ (g* − ĝ)² dp + ∫ [∑ (α̂_i − α_i) g_{b_i}]² dp + 2 ∫ ∑ (α̂_i − α_i) g_{b_i} (g* − ∑ α̂_j g_{b_j}) dp .

Substitution, using the equations defining H and h, yields

= ∫ (g* − ĝ)² dp + (α̂ − α)′H(α̂ − α) + 2(α̂ − α)′(h − Hα̂) .

The cross-product term is zero because α̂ is a solution of Hα = h.

Let V(γ) be the variance-covariance matrix of z, with typical element v_{ij}(γ). Let V be the m × m matrix with typical element

∫_Γ v_{ij}(γ) dp(γ) .

Note that V is positive semidefinite since, for an arbitrary m vector a,

a′Va = ∫_Γ a′V(γ)a dp(γ) ≥ 0 .

Lemma 2: Let a be an arbitrary choice of an m vector from 𝒜 and let the r vector α satisfy a = Bα. Then

AMSE(a′z) = a′Va + (α − α̂)′H(α − α̂) + ∫_Γ [g*(γ) − ĝ(γ)]² dp(γ) .

Proof:

AMSE(a′z) = ∫_Γ [var_γ(a′z) + bias_γ²(a′z)] dp(γ)
          = ∫_Γ {a′V(γ)a + [g*(γ) − α′E_γ(B′z)]²} dp(γ) .

By integrating the first term and applying Lemma 1 to the second, one obtains

= a′Va + (α − α̂)′H(α − α̂) + ∫ (g* − ĝ)² dp .

Theorem 1: In the class of estimators of the form a′z with a in 𝒜, that choice of a which minimizes

AMSE(a′z) = ∫_Γ E_γ[a′z − g*(γ)]² dp(γ)

is

â = B(B′VB + H)⁻h .

An alternative form for computations is

â = B[B′ ∫_Γ E_γ(zz′) dp(γ) B]⁻ B′ ∫_Γ g*(γ) E_γ(z) dp(γ) .

Proof: Rearrange the terms of the expression in Lemma 2,

AMSE(α′B′z) = α′(B′VB + H)α − 2α′Hα̂ + constant .

This is a quadratic function of α with stationary point

α = (B′VB + H)⁻Hα̂

and positive semidefinite Hessian matrix B′VB + H. Note that Hα̂ = h to obtain the first form given in the theorem. The second form is obtained by verifying these equalities algebraically:

H = B′ [∫_Γ E_γ(z) E_γ(z)′ dp] B ,

B′VB = B′ {∫_Γ [E_γ(zz′) − E_γ(z) E_γ(z)′] dp} B ,

and

h = B′ ∫_Γ g*(γ) E_γ(z) dp .

A.2. Bayesian Interpretation of the Estimator

The reference for notation, terminology, and general results is Hewitt and Stromberg (1965, Ch. IV, Sec. 16).

Define the measure μ on the Lebesgue subsets of 𝒴 × Γ by

μ(A) = ∫_Γ ∫_𝒴 I_A(y, γ) n[y | f(θ), σ] dy dp(γ) ,


where 𝒴 is n-dimensional Euclidean space and I_A(y, γ) denotes the indicator function of the set A. Consider the collection of measurable, square integrable functions with respect to this (probability) measure and denote this collection by L₂(𝒴 × Γ, μ). The inner product between g₁, g₂ in L₂(𝒴 × Γ, μ) is

⟨g₁, g₂⟩ = ∫ g₁(y, γ) g₂(y, γ) dμ(y, γ) ,

and the corresponding norm is

‖g‖ = [⟨g, g⟩]^{1/2} = [∫ g²(y, γ) dμ(y, γ)]^{1/2} .

These definitions compose a Hilbert space on the collection of measurable, square integrable functions with argument (y, γ) (Hewitt and Stromberg 1965, Eq. 16.8).

An estimator, ĝ(y), is a measurable, square integrable function of y only, that is, a member of L₂(𝒴 × Γ, μ) which is constant with respect to variation in γ. The average mean squared error of an estimator is

AMSE[ĝ(y)] = ‖ĝ − g*‖² ,

the square of the distance between the estimator ĝ(y) and the parametric function g*(γ). The mean of g*(γ) with respect to the posterior density,

g(y) = ∫_Γ g*(γ) n[y | f(θ), σ] dp(γ) / p(y) ,

the Bayes rule, is the orthogonal projection of the parametric function g*(γ) into the collection of estimators with respect to the inner product ⟨·, ·⟩. That is, the Bayes rule satisfies the Pythagorean identity

‖ĝ − g*‖² = ‖ĝ − g‖² + ‖g − g*‖² .

Applying this identity to functions of the form a′Z(y) with a in 𝒜, the equation

AMSE[a′Z(y)] = ‖a′z − g‖² + ‖g − g*‖²

is obtained. Since the second term does not vary with a, it follows that an attempt to minimize AMSE[a′Z(y)] by varying a represents an attempt to minimize ‖a′z − g‖.

Note that neither a′Z(y) nor g(y) depends on γ. The dimension of the latter minimization problem may, therefore, be reduced and attention may be restricted to measurable functions on 𝒴 that are square integrable with respect to the measure ν defined on the Lebesgue subsets of 𝒴 by

ν(A) = ∫∫ I_A(y) dμ(y, γ) .

That is, to minimize ‖a′z − g‖ with respect to the norm on L₂(𝒴 × Γ, μ), it suffices to find a in 𝒜 that minimizes ‖a′z − g‖ with respect to the norm on L₂(𝒴, ν). We have, by Theorem 1, that the choice

â = B[B′ ∫_Γ E_γ(zz′) dp(γ) B]⁻ B′ ∫_Γ g*(γ) E_γ(z) dp(γ)

minimizes AMSE[a′Z(y)], whence this choice is the solution of both of these minimization problems. For computations, we have by Fubini's theorem that, for integrable g(y),

∫_𝒴 g(y) dν(y) = ∫_Γ ∫_𝒴 g(y) n[y | f(θ), σ] dy dp(γ)
              = ∫_𝒴 g(y) ∫_Γ n[y | f(θ), σ] dp(γ) dy
              = ∫_𝒴 g(y) p(y) dy .

The foregoing discussion is summarized as Theorem 2.

Theorem 2: Let g(y) be the Bayes rule that minimizes expected posterior squared error loss when estimating g*(γ) with prior measure p on Γ. Let p(y) be the marginal density of the observations,

p(y) = ∫_Γ n[y | f(θ), σ] dp(γ) .

The estimator ĝ(y) of Theorem 1 is the best approximation of the Bayes rule in the class of estimators of the form a′Z(y) with a in 𝒜, in the sense of minimizing the distance

‖a′z − g‖ = [∫_𝒴 [a′Z(y) − g(y)]² p(y) dy]^{1/2} .

Consider the sequence of functions {φ_i(y): i = 1, ..., m − d} generated by the Gram-Schmidt orthonormalization process from {Z_i(y): i = 1, ..., m}; the orthonormalization is with respect to L₂(𝒴, ν); m − d refers to the fact that there may have been linear dependencies a.e. among the {Z_i(y): i = 1, ..., m} that were eliminated by deletion before orthonormalization. Note that it is a property of the Gram-Schmidt process that, to within a null set, the collection of functions of the form ∑_{i=1}^{m} a_i Z_i(y) is the same as the collection of functions of the form ∑_{j=1}^{m−d} c_j φ_j(y), provided that 𝒜 = R^m. By Theorem 16.16 of Hewitt and Stromberg (1965) the choice

c_i = ∫_𝒴 g(y) φ_i(y) p(y) dy

minimizes ‖g − ∑_{j=1}^{m−d} c_j φ_j‖ regarded as a function of (c₁, c₂, ..., c_{m−d}), and this choice is unique. But ∑_{i=1}^{m} â_i Z_i(y) is of the form ∑_{j=1}^{m−d} c_j φ_j(y) to within a null set and minimizes ‖g − ∑_{j=1}^{m−d} c_j φ_j‖ by Theorem 2, whence it follows that ∑_{j=1}^{m−d} c_j φ_j(y) = ∑_{i=1}^{m} â_i Z_i(y) a.e. Note that the choice of (c₁, c₂, ..., c_{m−d}) is unique but the choice of â may not be. We may summarize this discussion as Corollary 1.

Corollary 1: Let g(y) be the Bayes rule that minimizes expected posterior squared error loss when estimating g*(γ) with prior measure p on Γ. Let p(y) be the marginal density of the observations and let {Z_i(y): i = 1, ..., m} be a subset of L₂(𝒴, ν), where ν is


defined by

ν(A) = ∫_𝒴 I_A(y) p(y) dy .

Let {φ_i(y): i = 1, ..., m − d} be an orthonormal sequence generated from {Z_i(y): i = 1, ..., m} by the Gram-Schmidt process. Then the estimator ĝ(y) of Theorem 1 with 𝒜 = R^m satisfies

ĝ(y) = ∑_{i=1}^{m−d} c_i φ_i(y)   a.e. ν ,

where c_i = ∫_𝒴 g(y) φ_i(y) p(y) dy.

It is of interest to have a sufficient condition such that the distance ‖ĝ − g‖ between the Bayes rule g and the explicit estimator ĝ may be made arbitrarily small by taking ĝ to be a polynomial of sufficiently high degree. Theorem 3 states such a condition.

Theorem 3: Let ν be a probability measure defined on 𝒴, the n-dimensional real numbers, and let {Z_i(y)}_{i=1}^∞ be a basis for the polynomials on 𝒴. If the moment-generating function, ∫_𝒴 exp(u′y) dν(y), exists on an open neighborhood of the zero vector, then {Z_i(y)}_{i=1}^∞ ⊂ L₂(𝒴, ν) and {Z_i(y)}_{i=1}^∞ is complete; that is, f ∈ L₂(𝒴, ν) and ⟨f, Z_i⟩ = 0 for i = 1, 2, ... implies f = 0 a.e. ν. In consequence, if {φ_i(y)}_{i=1}^∞ is a sequence of orthonormal functions generated from {Z_i(y)}_{i=1}^∞ by the Gram-Schmidt process, if f ∈ L₂(𝒴, ν), and if f_m(y) = ∑_{i=1}^{m} c_i φ_i(y), where c_i = ⟨f, φ_i⟩, then

lim_{m→∞} ‖f_m − f‖ = 0 .

Proof: Existence of the moment-generating function implies all moments of the distribution exist, whence {Z_i(y)}_{i=1}^∞ ⊂ L₂(𝒴, ν). Let δ_j > 0 be such that ∫ e^{2u′y} dν(y) < ∞ for −δ_j < u_j < δ_j, j = 1, 2, ..., n. Let f ∈ L₂(𝒴, ν) be given such that ⟨f, Z_i⟩ = 0 for i = 1, 2, .... Consider the function

f̃(z) = ∫_𝒴 f(y) exp(z′y) dν(y) ,

where z is of the form z = u + iv and u, v ∈ 𝒴. Now

|exp(z′y)| = |exp(u′y)[cos(v′y) + i sin(v′y)]| ≤ |exp(u′y)| ,

which shows that f̃(z) exists if −δ_j < u_j < δ_j (j = 1, 2, ..., n). Decompose f as f = f⁺ − f⁻, where f⁺ and f⁻ are nonnegative. Suppose that it can be shown that f̃(iv) ≡ 0; then it follows that

∫ f⁺(y) exp(iv′y) dν(y) = ∫ f⁻(y) exp(iv′y) dν(y) .

The same constant multiple will normalize both f⁺ and f⁻ so that the measures defined by P⁺(A) = c ∫_A f⁺(y) dν(y) and P⁻(A) = c ∫_A f⁻(y) dν(y) are probability measures. But the preceding equation implies that these two probability measures have the same characteristic function. Consequently, f⁺ = f⁻ a.e. ν, whence f = f⁺ − f⁻ = 0 a.e. ν. Thus, if it can be shown that f̃(iv) ≡ 0, then {Z_i(y)}_{i=1}^∞ is complete.

Now

c f̃(z) = ∫ exp(z′y) dP⁺(y) − ∫ exp(z′y) dP⁻(y) ,

and by applying Theorem 9, Chapter 2, of Lehmann (1959, p. 52) to each function on the right, one can conclude that f̃(z) is an analytic function of the single variable z_j = u_j + iv_j over the region −δ_j < u_j < δ_j, −∞ < v_j < ∞, provided the remaining variables are held fixed at points with −δ_{j′} < u_{j′} < δ_{j′} for j′ ≠ j. By Hartogs's theorem (Bochner and Martin 1948, p. 140) it follows that f̃(z) is analytic over the region −δ_j < u_j < δ_j, −∞ < v_j < ∞ (j = 1, 2, ..., n). The same inductive argument used by Lehmann in the proof of Theorem 9 to show that all partial derivatives of the form (∂^m/∂z_j^m) f̃(z) may be computed under the integral sign may obviously be used to conclude that any partial derivative (∂^m/∏_j ∂z_j^{m_j}) f̃(z) may be computed under the integral sign. Now

(∂^m/∏_{j=1}^{n} ∂z_j^{m_j}) f̃(0) = ∫ (∂^m/∏_{j=1}^{n} ∂z_j^{m_j}) f(y) exp(z′y) dν(y) |_{z=0}
                                  = ∫ ∏_{j=1}^{n} y_j^{m_j} f(y) dν(y)
                                  = 0 ,

since ∏_{j=1}^{n} y_j^{m_j} is some finite linear combination of the Z_i(y) (i = 1, 2, ...). Since f̃(z) is analytic, f̃(z) ≡ 0 over the region −δ_j < u_j < δ_j, −∞ < v_j < ∞ (j = 1, 2, ..., n), and in particular f̃(iv) ≡ 0 for −∞ < v_j < ∞ (j = 1, 2, ..., n), as was to be shown.

The claim that lim_{m→∞} ‖f_m − f‖ = 0 follows directly from Theorem 16.26 of Hewitt and Stromberg (1965, p. 245).

The following result follows immediately from Corollary 1 and Theorem 3.

Corollary 2: Let g(y) be the Bayes rule that minimizes expected posterior squared error loss when estimating g*(γ) with prior measure p on Γ. Let the moment-generating function of p(y), the marginal density of the observations, exist in an open neighborhood of the zero vector. Let {Z_i(y)}_{i=1}^∞ be a basis for the polynomials. For each m let

ĝ_m(y) = ∑_{i=1}^{m} â_i Z_i(y)

be the estimator given by Theorem 1 with 𝒜 = R^m. Then

lim_{m→∞} ∫_𝒴 [ĝ_m(y) − g(y)]² p(y) dy = 0 .


If ĝ_m converges in mean square to g with respect to the predictive density p(y), then ĝ_m converges in both probability and distribution to g with respect to the predictive density p(y). From the sampling theory point of view it is of more interest to know whether or not ĝ_m converges in probability and distribution to g with respect to the conditional distribution N_y[y | f(θ), σ]. A sufficient condition is given as Theorem 4.

Theorem 4: Let ν denote the probability measure on the measurable space (𝒴, ℬ) determined according to ν(B) = ∫_Γ ∫_𝒴 I_B(y) n[y | f(θ), σ] dy dp(γ), and let ν(· | γ) denote the probability measure determined by ν(B | γ) = ∫_𝒴 I_B(y) n[y | f(θ), σ] dy. Let ĝ_m(y) converge in probability to g(y) on (𝒴, ℬ, ν). If {ν(B | γ)}_{B∈ℬ} is an equicontinuous family at γ₀ and if p(Γ₀) > 0 for every open set Γ₀ containing γ₀, then ĝ_m(y) converges in probability to g(y) on [𝒴, ℬ, ν(· | γ₀)].

Proof: Given δ > 0, choose an open set Γ₀ containing γ₀ such that |ν(B | γ₀) − ν(B | γ)| < δ/2 for all B ∈ ℬ and all γ ∈ Γ₀. Given ε > 0, choose M so that m > M implies ν(B_m) < p(Γ₀)δ/2 for B_m = {y: |ĝ_m(y) − g(y)| ≥ ε}. Then

ν(B_m | γ₀) ≤ [1/p(Γ₀)] ∫_{Γ₀} ν(B_m | γ) dp(γ) + δ/2
           ≤ [1/p(Γ₀)] ∫_Γ ν(B_m | γ) dp(γ) + δ/2
           = ν(B_m)/p(Γ₀) + δ/2
           < δ .

A proof that the normal distribution generates an equicontinuous family {ν(B | γ)}_{B∈ℬ} when f(θ) is continuous is given next. It is straightforward and could be applied to many common distributions.

Theorem 5: Let f(θ) be a continuous function on a closed set Θ, let n[y | f(θ), σ] be the n-variate normal density function with mean vector f(θ) and variance-covariance matrix σ²I, and let ℬ denote the Lebesgue subsets of R^n. Then the family {ν(B | γ)}_{B∈ℬ}, where ν(B | γ) = ∫_B n[y | f(θ), σ] dy, is equicontinuous for every γ = (θ′, σ)′ ∈ Θ × (0, ∞).

Proof: Fix γ₀ ∈ Θ × (0, ∞). Let S be a bounded open sphere containing θ₀, let S̄ denote its closure, and set Θ̄ = Θ ∩ S̄, whence Θ̄ is compact; set

Σ̄ = {σ: σ₀²/2 ≤ σ² ≤ 3σ₀²/2} .

Given ε > 0, there is a radius r such that each closed sphere Y_θ with radius r and center f(θ) satisfies ν(Y_θ | γ) > 1 − ε/4 for all σ ∈ Σ̄. Then Ȳ = ∪_{θ∈Θ̄} Y_θ is compact, since the continuous image f[Θ̄] of the compact set Θ̄ is compact. Now the normal density function n[y | f(θ), σ] is continuous in (y, θ, σ) on the compact set Ȳ × Θ̄ × Σ̄, hence uniformly continuous. Then choose δ such that

|(y, θ′, σ) − (y, θ₀′, σ₀)| = |(θ′, σ) − (θ₀′, σ₀)| < δ

implies

|n[y | f(θ), σ] − n[y | f(θ₀), σ₀]| < ε/(2 ∫_Ȳ dy) .

Now

|ν(B | γ) − ν(B | γ₀)| = |ν(B ∩ Ȳ | γ) + ν(B ∩ Ȳᶜ | γ) − ν(B ∩ Ȳ | γ₀) − ν(B ∩ Ȳᶜ | γ₀)|
                       ≤ |ν(B ∩ Ȳ | γ) − ν(B ∩ Ȳ | γ₀)| + ε/2
                       = |∫_{B∩Ȳ} [n(y | γ) − n(y | γ₀)] dy| + ε/2
                       ≤ ∫_Ȳ ε/(2 ∫_Ȳ dy) dy + ε/2
                       ≤ ε .

Thus, {ν(B | γ)}_{B∈ℬ} is equicontinuous at γ₀.

[Received March 1977. Revised June 1979.]

REFERENCES

Bochner, Solomon, and Martin, William Ted (1948), Several Complex Variables, Princeton, N.J.: Princeton University Press.

Box, G.E.P., and Lucas, H.L. (1959), "Design of Experiments in Non-Linear Situations," Biometrika, 46, 77-90.

Gallant, A. Ronald (1976), "Confidence Regions for the Parameters of a Nonlinear Regression Model," Institute of Statistics Mimeograph Series, No. 1077, Raleigh: North Carolina State University.

Guttman, Irwin, and Meeter, Duane A. (1964), "On Beale's Measures of Non-Linearity," Technometrics, 7, 623-637.

Hammersley, J.M., and Handscomb, D.C. (1964), Monte Carlo Methods, New York: John Wiley & Sons.

Hartley, H.O. (1964), "Exact Confidence Regions for the Param- eters in Non-Linear Regression Laws," Biometrika, 51, 347-353.

Hewitt, Edwin, and Stromberg, Karl (1965), Real and Abstract Analysis, New York: Springer-Verlag.

IBM Corporation (1968), System 360 Scientific Subroutine Package, Version III, White Plains, N.Y.: Technical Publications Dept.

IMSL (1975), IMSL Library 1, Fifth Ed., Houston, Tex.: International Mathematical and Statistical Libraries.

Lehmann, E.L. (1959), Testing Statistical Hypotheses, New York: John Wiley & Sons.

Stroud, A.H. (1971), Approximate Calculation of Multiple Integrals, Englewood Cliffs, N.J.: Prentice-Hall.

Zellner, Arnold (1971), An Introduction to Bayesian Inference in Econometrics, New York: John Wiley & Sons.
