Adventures with ARIMA software



ELSEVIER International Journal of Forecasting 10 (1994) 573-581

Paul Newbold*, Christos Agiakloglou, John Miller

Department of Economics, University of Illinois, Urbana-Champaign, Urbana, IL 61820, USA

Abstract

Many software packages are available for fitting autoregressive integrated moving average models to time-series data. The practitioner is faced with a wide choice of programs for fitting these models. As we illustrate in this paper, the results obtained can depend substantially, and in important respects, on the particular choice that is made. Our examples serve as a basis for exploring the causes of this phenomenon.

Keywords: Autoregressive integrated moving average models; Parameter estimation; Software differences

1. Introduction

Over the years since the methodology was introduced by Box and Jenkins (1970), the fitting of autoregressive integrated moving average (ARIMA) models to time-series data, for forecasting and other purposes, has become common practice. Software for carrying out the necessary computations is now widely available as part of statistical packages. Of course, the same is true of many well-established statistical techniques, such as multiple regression. However, there is an important distinction, which we will illustrate in this paper. Except in a few pathological, typically artificial, cases, fitting the same model to the same data will yield more or less identical results whatever software is used for multiple regression. That is not the case for the estimation of the parameters of ARIMA models. This can be a source of considerable confusion and frustration for novice users whose primary interest is in application rather than theoretical details.

* Corresponding author

In trying to put ourselves in the position of the ‘novice user’, we have in mind those with a general understanding of the methodology who wish to try it out on real data. This group might include practicing applied statisticians and students, as well as their instructors, in undergraduate and master's-level courses with an applied orientation. Such users are very likely to find themselves in the position of being unable to replicate results obtained by other users or published in textbooks, purely as a consequence of employing different software. This phenomenon is certainly known to specialist time-series analysts. However, its sources involve details that may be obscure to many practitioners, and that many instructors will feel are beyond the scope of applied courses. We know of no other statistical methodology of such wide application where this issue arises. An illustration of the possible magnitude of the problem and of some of its causes therefore seems worthwhile.

0169-2070/94/$07.00 © 1994 Elsevier Science B.V. All rights reserved

SSDI 0169-2070(94)00537-M


In this paper we report analyses of five time series through every package we could find on the Urbana-Champaign campus of the University of Illinois. Some packages allow more than one estimation procedure, in which case all were used. We tried to put ourselves in the position of the novice, or unsophisticated, user. For example, although many programs allow the user a range of optional modifications, we generally ran them in default mode.

Our ‘sample’ of time series is very small, and was certainly not randomly chosen. Indeed, we looked for cases where we would expect to find, on a priori grounds, differences in results among programs. However, we did not have to look very hard: three of our five series are taken from the same secondary source. We would not view any of our examples as remotely pathological. None of our series is short by the standards deemed adequate by the textbooks. Perhaps one of our models might be regarded as being generously parameterized, but by no means ridiculously so. In short, we believe that novice users will inevitably encounter many similar cases.

The set of programs we have employed does not include all of those currently available, but is more than adequate to illustrate our point. We must emphasize that it is not our intention to critically review these programs or to compare their merits. This is not a competition at the end of which a winner will be announced. For that reason, the names of the packages used have been suppressed. Our aim is to illustrate what can and, we believe, often will happen when the same model is fitted to the same data by different programs.

2. Some sources of differences in estimates

Assuming that the error terms generating an ARIMA model are normally distributed, the exact likelihood function can be computed, allowing full maximum likelihood estimation (ML). Simulation evidence reported by Ansley and Newbold (1980) suggests a preference for ML over alternatives that are approximations to full maximum likelihood in moderate-sized samples. However, many commercially available software packages implement one or both of the computationally convenient approximations to ML proposed by Box and Jenkins (1970). These are conditional least squares (CLS) and the backcasting method (BC). (The latter is sometimes called ‘backforecasting’.) Two of our packages implemented ML, six implemented CLS, and seven implemented BC. In addition, two packages implemented other, different approximations to full maximum likelihood. (These are denoted ‘Other 1’ and ‘Other 2’ in our tabulated results.)

Since the objective functions maximized differ among the packages, we would naturally expect some differences among the point estimates of the ARIMA model parameters. The results presented in Section 3 illustrate just how large these differences can be in practice, and consequently how much divergence among forecasts might result when the same data and model are analyzed through different programs.
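To see why point estimates can differ even when every program reaches its own optimum, it may help to compare two objective functions on a toy case. The sketch below is an illustration under assumptions of our own and is not the code of any package in the study. It uses a simulated MA(1) series $x_t = \varepsilon_t - \theta\varepsilon_{t-1}$ (not one of the five series analyzed here), a conditional sum of squares that sets the pre-sample innovation to zero, and an exact Gaussian log likelihood computed through the innovations recursion. The function names, the seed, and the grid search are hypothetical choices.

```python
import math
import random


def simulate_ma1(theta, n, seed=0):
    """Simulate x_t = e_t - theta * e_{t-1} with standard normal e_t."""
    rng = random.Random(seed)
    e = [rng.gauss(0.0, 1.0) for _ in range(n + 1)]
    return [e[t] - theta * e[t - 1] for t in range(1, n + 1)]


def conditional_ss(x, theta):
    """CLS objective: residual sum of squares with the pre-sample shock set to 0."""
    e_prev, total = 0.0, 0.0
    for xt in x:
        e = xt + theta * e_prev  # invert x_t = e_t - theta * e_{t-1}
        total += e * e
        e_prev = e
    return total


def exact_loglik(x, theta):
    """Exact Gaussian log likelihood with the innovation variance concentrated out."""
    n = len(x)
    gamma0, gamma1 = 1.0 + theta * theta, -theta  # autocovariances / sigma^2
    r, pred = gamma0, 0.0  # scaled one-step prediction variance, prediction
    ss, logdet = 0.0, 0.0
    for xt in x:
        innov = xt - pred
        ss += innov * innov / r
        logdet += math.log(r)
        coef = gamma1 / r
        pred = coef * innov           # prediction of the next observation
        r = gamma0 - coef * coef * r  # its scaled prediction variance
    sigma2 = ss / n
    return -0.5 * (n * math.log(2.0 * math.pi * sigma2) + logdet + n)


def grid(lo=-0.99, hi=1.0, step=0.01):
    """Candidate values of theta, including the invertibility boundary at 1."""
    k = int(round((hi - lo) / step))
    return [round(lo + i * step, 2) for i in range(k + 1)]


if __name__ == "__main__":
    x = simulate_ma1(theta=0.9, n=60, seed=1)  # assumed toy settings
    cls_hat = min(grid(), key=lambda th: conditional_ss(x, th))
    ml_hat = max(grid(), key=lambda th: exact_loglik(x, th))
    print("CLS estimate:", cls_hat, " exact ML estimate:", ml_hat)
```

Run as a script, this prints the two grid-search estimates for one simulated series. They need not agree, because the two functions weight the early observations differently; backcasting would define a third objective that is not sketched here.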

Whatever procedure is employed, point estimation can be viewed as the minimization of the sum of squares of quantities that are non-linear functions of the parameters. Therefore, software must incorporate a non-linear regression algorithm. This is another potential source of differences among estimates. The algorithms require initial estimates of the parameters, and optimization proceeds iteratively until convergence is achieved. Termination may occur after some specified number of iterations or when relative change in either the objective function or the estimates is less than some specified amount. Many programs allow the user to specify initial values or termination criteria. However, putting ourselves in the position of the novice user, we ran all programs in default mode, with one exception. When it was indicated that the default maximum number of iterations had been exhausted, we increased that number. We suspect this to be common practice among users. The example given in Section 4 illustrates how differently structured optimization algorithms can yield very different parameter estimates.
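The effect of termination rules can be sketched with a deliberately simple optimizer. Everything below is an assumption made for illustration: the objective is a toy function that is very flat near its minimum at 0.8 (it is not an ARIMA sum of squares), the optimizer is fixed-step gradient descent rather than the non-linear regression algorithm of any package, and the names `minimize`, `rel_tol`, and `max_iter` are hypothetical.

```python
def minimize(grad, f, start, step=0.1, rel_tol=1e-2, max_iter=50):
    """Fixed-step gradient descent; returns (estimate, iterations, converged)."""
    theta, f_old = start, f(start)
    for it in range(1, max_iter + 1):
        theta = theta - step * grad(theta)
        f_new = f(theta)
        # Stop when the relative change in the objective is small enough.
        rel_change = abs(f_old - f_new) / max(abs(f_old), 1e-300)
        if rel_change < rel_tol:
            return theta, it, True
        f_old = f_new
    return theta, max_iter, False  # iteration limit exhausted


def f(theta):
    """Toy objective, very flat around its minimum at theta = 0.8."""
    return (theta - 0.8) ** 4


def grad(theta):
    return 4.0 * (theta - 0.8) ** 3


# Default settings: the iteration limit is exhausted before convergence.
default_run = minimize(grad, f, start=0.0)
# Raising the limit lets the loose tolerance declare convergence early.
loose_run = minimize(grad, f, start=0.0, max_iter=100000)
# A tighter tolerance forces further iterations and moves the estimate.
tight_run = minimize(grad, f, start=0.0, rel_tol=1e-4, max_iter=100000)

for label, run in [("default", default_run), ("loose", loose_run), ("tight", tight_run)]:
    print(label, run)
```

On a flat objective, the loose rule stops visibly short of the minimum while the tight rule continues, which is the same mechanism as the early termination discussed for some programs in Section 4.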


3. Four seasonal series

In this section we report the analyses of four monthly seasonal time series. These are:

Series A (168 observations): Natural logarithms of index of kilowatt hours of electricity used.

Series B (120 observations): Housing starts.

Series C (120 observations): Housing sales.

Series D (77 observations): Common logarithms of sales of a company.

The first three series were taken from Pankratz (1991, pp. 133-134, 219-220, 366, 370-371). To all of these we fitted the ‘airline model’:

$$(1 - B)(1 - B^{12})X_t = (1 - \theta B)(1 - \Theta B^{12})\varepsilon_t,$$

where $B$ is the back-shift operator and $\varepsilon_t$ is an innovation error presumed to be white noise. Series D was taken from Chatfield and Prothero (1973), and to this we fitted the model:

$$(1 - \phi B)(1 - B)(1 - B^{12})X_t = (1 - \Theta B^{12})\varepsilon_t. \qquad (1)$$

For all series, the models fitted and the transformations employed were those of the original source.
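The operator notation can be unpacked by multiplying out the lag polynomials. The short sketch below does this for the airline model; the helper names and the parameter values 0.7 and 0.8 are assumptions chosen only for illustration and are not estimates from this paper.

```python
def poly_mul(a, b):
    """Multiply two lag polynomials given as coefficient lists (index = power of B)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def seasonal_factor(coef, period=12):
    """Coefficients of (1 - coef * B^period)."""
    return [1.0] + [0.0] * (period - 1) + [-coef]


def nonzero_terms(poly, tol=1e-12):
    """Map lag -> coefficient, dropping zero coefficients."""
    return {lag: c for lag, c in enumerate(poly) if abs(c) > tol}


# Differencing side of the airline model: (1 - B)(1 - B^12).
diff_side = poly_mul([1.0, -1.0], seasonal_factor(1.0))

# Moving average side for assumed values theta = 0.7 and Theta = 0.8.
ma_side = poly_mul([1.0, -0.7], seasonal_factor(0.8))

print(nonzero_terms(diff_side))
print(nonzero_terms(ma_side))
```

Both products have non-zero coefficients only at lags 0, 1, 12, and 13. The model therefore relates $X_t - X_{t-1} - X_{t-12} + X_{t-13}$ to $\varepsilon_t - \theta\varepsilon_{t-1} - \Theta\varepsilon_{t-12} + \theta\Theta\varepsilon_{t-13}$, which also shows why 13 observations are lost in differencing.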

Simulations reported by Ansley and Newbold (1980) suggest that, in the aggregate, the largest differences in the performance of different estimators occur when the true values of seasonal moving average parameters are close to the boundary of the invertibility region ($|\Theta| < 1$, in the case of the models here). Accordingly, in choosing our series we looked for reported estimates in this area. In addition, since the Chatfield/Prothero paper occasioned such lively debate on its publication, we were intrigued to speculate on what these authors would have found given access to software available today.

Table 1 shows coefficient estimates for the four series, together with their reported estimated standard errors, from 17 programs. To what extent our ‘novice user’ would be perplexed by differences of these magnitudes in the estimates is a matter of speculation. These differences are least for series A and greatest for series D, but since the former has as many as 168 observations (or 155 after differencing), it would not be unreasonable to expect more consistency. As predicted, the largest differences in the point estimates are for the seasonal moving average parameter, where the ranges are 0.803-1.000 for series A, 0.754-1.000 for series B, 0.731-1.000 for series C, and 0.447-1.000 for series D.

Rather than discuss further our results series by series, it is more appropriate to do so in terms of estimation method. This is so since it appears that each program has achieved its objective: the major differences arise because the sums of squares functions being minimized are not the same.

Maximum likelihood. The two maximum likelihood programs produced very similar point estimates. For series B and C, ML 2 gave estimates of precisely one for the seasonal moving average parameter, while the corresponding ML 1 estimates were close to one. Such estimates might suggest over-differencing. On the other hand, they could reflect the ‘pile-up’ effect discussed by Cryer and Ledolter (1981) and Shephard and Harvey (1990). These authors show that even when the true parameter is well inside the invertibility region, the likelihood function can quite often have a global maximum on the boundary. The user might respond to these estimates by fitting an alternative model involving seasonal dummy variables. In this context, it is interesting to note that the CLS and BC estimates of these parameters are significantly less than one.

Conditional least squares. The six CLS programs produced very similar results for all four series. Simulation results of Ansley and Newbold (1980) suggest that, when the true value of the seasonal moving average parameter is close to the boundary of the invertibility region, conditional least squares estimators can be seriously biased towards zero. Certainly the conditional least squares estimates are substantially lower than the maximum likelihood estimates for series B, C, and D. However, this outcome is not inevitable, since for series A the maximum likelihood estimates of $\Theta$ are somewhat lower than the conditional least squares estimates.

Table 1
Parameter estimates for some seasonal models

Package   Series A                        Series B                        Series C                        Series D
          θ               Θ               θ               Θ               θ               Θ               φ               Θ
ML 1      0.693 (0.057)   0.803 (0.076)   0.270 (0.087)   0.967 (0.601)   0.200 (0.086)   0.967 (0.724)   -0.454 (0.114)  0.725 (0.195)
ML 2      0.694 (0.056)   0.804 (0.075)   0.269 (0.085)   1.000 (153.6)   0.216 (0.083)   1.000 (57.3)    -0.453 (0.113)  0.727 (0.193)
CLS 1     0.697 (0.058)   0.859 (0.052)   0.290 (0.093)   0.755 (0.071)   0.245 (0.094)   0.733 (0.072)   -0.526 (0.116)  0.462 (0.125)
CLS 2     0.696 (0.059)   0.859 (0.053)   0.289 (0.094)   0.754 (0.074)   0.244 (0.095)   0.731 (0.074)   -0.528 (0.117)  0.458 (0.128)
CLS 3     0.698 (0.058)   0.859 (0.052)   0.290 (0.094)   0.755 (0.072)   0.245 (0.095)   0.733 (0.072)   -0.526 (0.119)  0.452 (0.128)
CLS 4     0.697 (0.058)   0.859 (0.052)   0.290 (0.094)   0.755 (0.071)   0.247 (0.096)   0.731 (0.074)   -0.528 (0.118)  0.447 (0.130)
CLS 5     0.697 (0.067)   0.859 (0.053)   0.290 (0.088)   0.755 (0.072)   0.245 (0.088)   0.733 (0.073)   -0.526 (0.122)  0.462 (0.147)
CLS 6     0.697 (0.058)   0.859 (0.053)   0.290 (0.094)   0.755 (0.072)   0.245 (0.095)   0.733 (0.073)   -0.526 (0.119)  0.452 (0.129)
BC 1      0.704 (0.057)   0.893 (0.024)   0.286 (0.093)   0.843 (0.035)   0.191 (0.095)   0.891 (0.034)   -0.458 (0.113)  0.796 (0.050)
BC 2      0.705 (0.057)   0.893 (0.025)   0.285 (0.094)   0.843 (0.035)   0.189 (0.096)   0.890 (0.035)   -0.457 (0.115)  0.794 (0.053)
BC 3      0.700 (0.057)   0.876 (0.026)   0.284 (0.090)   0.876 (0.032)   0.205 (0.091)   0.881 (0.030)   -0.448 (0.102)  0.814 (0.048)
BC 4      0.700 (0.059)   0.943 (0.015)   0.280 (0.094)   0.936 (0.019)   0.203 (0.095)   0.937 (0.018)   -0.430 (0.114)  0.908 (0.029)
BC 5      0.700 (0.058)   0.887 (0.059)   0.286 (0.094)   0.844 (0.081)   0.189 (0.096)   0.891 (0.071)   -0.458 (0.114)  0.795 (0.117)
BC 6      0.708 (0.066)   0.946 (0.025)   0.309 (0.084)   0.924 (0.057)   0.198 (0.108)   0.950 (0.049)   -0.425 (0.119)  0.911 (0.050)
BC 7      0.704 (0.057)   0.893 (0.026)   0.286 (0.094)   0.844 (0.037)   0.189 (0.096)   0.891 (0.036)   -0.458 (0.115)  0.795 (0.052)
Other 1   0.691 (0.059)   1.000 (0.161)   0.259 (0.094)   1.000 (0.193)   0.231 (0.094)   1.000 (0.193)   -0.430 (0.116)  1.000 (0.250)
Other 2   0.693 (0.059)   0.803 (0.048)   0.271 (0.093)   1.000 (0.001)   0.201 (0.095)   1.000 (0.007)   -0.468 (0.111)  0.696 (0.091)

Note: Figures in parentheses are the reported standard errors.

Backcasting. There is considerably more variability among the point estimates from this approach than among those based on either maximum likelihood or conditional least squares. Presumably, this is because the backcasting approach is imprecisely defined, since a truncation rule must be specified. The BC 1, BC 2, BC 5, and BC 7 point estimates are very close indeed, while those from BC 3 do not differ much from these four. (Oddly, the standard errors reported by BC 5 can differ substantially from those of other programs in this group.) On the other hand, the BC 4 and BC 6 estimates can be quite different from those of the other five programs, but are very close to each other (though the associated standard errors need not be).


Other. The point estimates from ‘Other 2’ were very close to the full maximum likelihood estimates, though the reported standard errors associated with the seasonal moving average parameter estimates were quite different. Uniquely among our packages, ‘Other 1’ produced seasonal moving average parameter estimates of one (to three decimal places) for series A and D, as well as for series B and C.

We believe the ‘novice user’ is likely to be perplexed by findings similar to those of Table 1, whatever the purpose of the analysis. It could possibly be argued that some users are indifferent to anything but the forecasts produced by the fitted models. These models might differ in other respects, however. In Table 2, we have attempted to produce a measure of divergence among the forecasts. For forecasts up to 12 months ahead, we have computed the range (highest minus lowest) for our 17 sets of forecasts. To standardize units, these ranges have been divided by the forecast standard errors provided by the ML 2 program. (In fact, these standard errors differ little from one program to another.) Again, the differences are generally smallest for series A and largest for series D. We are not sure how large a difference should be to cause concern, but find it difficult to be sanguine about the fact that two researchers on our campus could quite easily produce forecasts differing by as much as four-tenths of a standard error, after fitting the same model to the same data.

Table 2
Range of forecasts, divided by forecast standard error (as reported by ML 2)

Horizon   Series A   Series B   Series C   Series D
1         0.16       0.58       0.26       0.93
2         0.26       0.70       0.28       1.11
3         0.11       0.37       0.38       1.54
4         0.14       0.31       0.34       0.98
5         0.15       0.43       0.31       0.94
6         0.44       0.34       0.38       0.89
7         0.13       0.21       0.38       0.84
8         0.20       0.23       0.46       0.67
9         0.10       0.25       0.39       0.81
10        0.11       0.17       0.42       0.56
11        0.14       0.20       0.47       0.33
12        0.14       0.27       0.41       0.19

By any standards, the differences for series D reported in Table 2 look rather alarming. Having fitted the model (1) to these data and computed forecasts, Chatfield and Prothero (1973) were also alarmed. They regarded their point forecasts of sales at the peak of the following year (6 months ahead) as “much higher than can reasonably be expected”. Indeed, it is this observation that prompted their paper on problems in applying the ARIMA methodology. The paper provoked a lively discussion, including Box and Jenkins (1973). Much of this discussion centered on the appropriateness or otherwise of the logarithmic transformation. It appears that, had the class of power transformations of Box and Cox (1964) been considered, the logarithmic transformation would have been rejected in favor of something close to the cube root. Nevertheless, the incorporation of power transformations into ARIMA analysis has not become standard practice in the intervening years. Indeed, we believe that the typical ‘novice user’ (and many sophisticated users) would follow the original lead of Chatfield and Prothero, taking logarithms after a casual inspection of the graph of the series.

In Table 3, we examine the sales forecasts for the peak month of the next year which could be achieved with the programs at our disposal today. We have listed the forecasts of log sales taken directly from the program output and have simply taken antilogarithms to generate forecasts of sales. To keep our results comparable with those of Chatfield and Prothero, we have not corrected for bias, as for example in Granger and Newbold (1976). To do so would reduce the magnitudes somewhat, but would not change the relativities greatly: our impression is that such bias corrections are rarely applied in practice. The forecasts range from sales of 1061 to 1392 units; that is, the highest forecast exceeds the lowest by over 31%! The highest of our forecasts, 1392 from BC 3, appears to be something of an outlier. In fact, it is almost identical to the initial forecast (1387) reported by Chatfield and Prothero. These authors noted that, while their program employed backcasting in estimation, initial values of the innovations were set to zero in calculating forecasts. Subsequently, they recomputed forecasts through backcasting initial values, obtaining 1221 for predicted sales in this particular month. This they deemed ‘more reasonable’, but ‘rather high’: it is remarkably close to what we obtained from BC 7, and just a little higher than results from BC 1, BC 2, and BC 5. Thus, for this data set, a user could have achieved virtually the same parameter estimates through the same estimation method, and yet radically different forecasts.

Table 3
Six-months ahead forecasts for Series D

Package   Log sales   Sales
ML 1      3.077       1194
ML 2      3.077       1195
CLS 1     3.028       1066
CLS 2     3.027       1063
CLS 3     3.027       1063
CLS 4     3.026       1061
CLS 5     3.028       1066
CLS 6     3.027       1063
BC 1      3.079       1199
BC 2      3.079       1199
BC 3      3.144       1392
BC 4      3.100       1258
BC 5      3.079       1199
BC 6      3.108       1282
BC 7      3.086       1219
Other 1   3.112       1293
Other 2   3.070       1176
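The arithmetic behind the Series D comparison is easy to reproduce. The sketch below copies the log sales and sales figures from Table 3, takes base-10 antilogarithms, and computes the spread between the highest and lowest forecasts. The variable and function names are our own. Because the logs are reported to three decimals, the antilogs match the tabulated sales only to within rounding.

```python
# Six-months-ahead forecasts for Series D, copied from Table 3:
# package -> (forecast of log10 sales, reported forecast of sales).
table3 = {
    "ML 1": (3.077, 1194), "ML 2": (3.077, 1195),
    "CLS 1": (3.028, 1066), "CLS 2": (3.027, 1063), "CLS 3": (3.027, 1063),
    "CLS 4": (3.026, 1061), "CLS 5": (3.028, 1066), "CLS 6": (3.027, 1063),
    "BC 1": (3.079, 1199), "BC 2": (3.079, 1199), "BC 3": (3.144, 1392),
    "BC 4": (3.100, 1258), "BC 5": (3.079, 1199), "BC 6": (3.108, 1282),
    "BC 7": (3.086, 1219), "Other 1": (3.112, 1293), "Other 2": (3.070, 1176),
}


def antilog10(value):
    """Back-transform a common (base-10) logarithm, with no bias correction."""
    return 10.0 ** value


sales = {name: s for name, (_, s) in table3.items()}
lowest = min(sales, key=sales.get)
highest = max(sales, key=sales.get)
spread_pct = 100.0 * (sales[highest] - sales[lowest]) / sales[lowest]

# Largest relative gap between the antilog of the rounded log and the table entry.
max_rounding_gap = max(abs(antilog10(lg) - s) / s for lg, s in table3.values())

print(lowest, highest, round(spread_pct, 1), max_rounding_gap)
```

The spread is measured relative to the lowest forecast, which is the comparison made in the text.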

Full maximum likelihood estimation was not available to Chatfield and Prothero. Had it been, they would have obtained a peak forecast of 1194 from ML 1 or 1195 from ML 2, just a little lower than their second reported figure. Ironically, conditional least squares certainly was available to these authors. We would not have recommended that they use it, but, had they done so, they would have obtained a peak forecast in the neighborhood of 1065, and presumably there would have been no publication!

4. A non-seasonal series

In this section we discuss the analysis of a series of 160 quarterly observations on the logarithms of U.S. real gross national product (1950-89). We fitted to these data the ARIMA (2, 1, 2) model

$$(1 - \phi_1 B - \phi_2 B^2)[(1 - B)X_t - \mu] = (1 - \theta_1 B - \theta_2 B^2)\varepsilon_t. \qquad (2)$$

Although our series is quite long, the model (2) could be regarded as being generously parameterized. There are, however, sound reasons for interest in this model. The usual formal and informal criteria for choosing a parsimonious ARIMA representation suggest that either an ARIMA (1, 1, 0) or an ARIMA (0, 1, 2) model might be adequate. The ARIMA (2, 1, 2) model nests both of the simpler models and, through a second autoregressive term, allows the possibility of cyclical behavior that is sometimes thought to be appropriate for economic time series. A user might well want to fit (2), either as a check on the adequacy of the simpler models or as part of a search for a parsimonious model through an order selection criterion. Watson (1986), Clark (1987), and Campbell and Mankiw (1987) have all considered the modelling of this time series over an earlier, but strongly overlapping period. Campbell and Mankiw fitted several ARIMA models, as tools for the estimation of the persistence of economic shocks. They singled out the model (2), in addition to the two simpler models, for discussion. Both Watson and Clark fitted unobserved components structural models, of the type discussed in detail by Harvey (1989). Their models are constrained versions of a full ARIMA (2, 1, 2) model, which is the simplest model to contain ARIMA (1, 1, 0), ARIMA (0, 1, 2), and the Watson-Clark structural models as special cases. It therefore seems reasonable that an economist would be interested in the estimation of the full model (2), at least as a starting point for further analysis.

Table 4 shows the estimates we obtained for this model. (Although the mean change, $\mu$ in (2), was estimated with the other parameters, these estimates reveal nothing of great interest, and are excluded from the table.) The programs used here are the same as in the previous section, with two exceptions. First, we have no results for BC 4, which failed to execute. Second, one of our packages permits two automated selections of initial parameter values. In this case, these led to different fitted models. The second of these is denoted BC 8.

It is quite clear from Table 4 that substantially different parameter estimates were possible, depending on which program was used, for this data set. In the final column of the table, we have calculated the log likelihoods for each set of parameter estimates. The highest of these is for ML 2.

Table 4
Parameter estimates for an ARIMA(2, 1, 2) model

Package   φ1               φ2               θ1               θ2               Log likelihood
ML 1      0.577 (0.409)    -0.299 (0.274)   0.217 (0.398)    -0.397 (0.169)   514.381
ML 2      0.664 (0.338)    -0.388 (0.231)   0.312 (0.327)    -0.459 (0.152)   514.481
CLS 1     0.685 (0.386)    -0.382 (0.250)   0.331 (0.376)    -0.424 (0.161)   514.447
CLS 2     0.126 (0.670)    0.059 (0.439)    -0.240 (0.665)   -0.161 (0.246)   513.639
CLS 3     1.344 (0.163)    -0.532 (0.141)   1.030 (0.189)    -0.239 (0.164)   512.280
CLS 4     0.131 (0.687)    0.329 (0.522)    -0.197 (0.704)   0.138 (0.373)    512.065
CLS 5     0.685 (0.282)    -0.383 (0.209)   0.332 (0.261)    -0.425 (0.220)   514.448
CLS 6     0.569 (0.374)    -0.202 (0.265)   0.224 (0.376)    -0.274 (0.177)   514.044
BC 1      1.774 (0.003)    -0.797 (0.010)   1.461 (0.000)    -0.466 (0.015)   512.009
BC 2      0.140 (0.630)    0.073 (0.423)    -0.230 (0.625)   -0.157 (0.239)   513.664
BC 3      0.704 (0.393)    -0.392 (0.251)   0.351 (0.383)    -0.426 (0.160)   514.440
BC 5      0.663 (0.324)    -0.398 (0.224)   0.311 (0.311)    -0.475 (0.149)   514.478
BC 6      0.663 (0.227)    -0.400 (0.224)   0.311 (0.202)    -0.478 (0.257)   514.477
BC 7      0.663 (0.315)    -0.400 (0.217)   0.311 (0.302)    -0.477 (0.146)   514.478
BC 8      1.680 (0.034)    -0.749 (0.036)   1.328 (0.074)    -0.392 (0.098)   513.049
Other 1   0.663 (0.319)    -0.399 (0.222)   0.311 (0.306)    -0.476 (0.148)   514.478
Other 2   0.548 (0.655)    -0.191 (0.407)   0.203 (0.647)    -0.271 (0.219)   514.084

Note: Figures in parentheses are the reported standard errors.

The substantial differences in Table 4 cannot be attributed to differences in the objective function being optimized. For example, the estimates produced by CLS 1, CLS 5, BC 3, BC 5, BC 6, BC 7 and ‘Other 1’ are very close to the ML 2 estimates, though the BC 6 standard errors differ somewhat from the others. In fact, whatever function is optimized for this particular model and data, that function is multi-modal, and rather flat around the optimum.

The ML 1 estimates are quite close to the ML 2 estimates. The difference is due to early termination of the ML 1 iterations. When the default value for relative change in the objective function was relaxed, forcing further iterations, ML 1 produced the same point estimates as ML 2. It might be argued that the ML 1 and ML 2 estimates reported in Table 4 differ only insubstantially in relation to their standard errors. There is, however, a sense in which different conclusions might be drawn from these two sets of estimates. One way in which a user might assess whether a model is over-parameterized is through the t-ratios associated with the estimates of the highest order autoregressive and moving average parameters in the model. The t-ratios associated with the ML 1 estimates of ($\phi_2$, $\theta_2$) given in Table 4 are (-1.09, -2.35) while those for ML 2 are (-1.68, -3.01). Presumably, on this basis, the ML 1 user would be more inclined to drop the second autoregressive parameter from the model than would the ML 2 user.
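The t-ratios quoted above are simply each estimate divided by its reported standard error. As a check, the sketch below recomputes them from the rounded figures printed in Table 4; the dictionary layout and names are our own. Three of the four values reproduce the text to two decimals. The fourth comes out at -3.02 rather than -3.01, presumably because the authors worked from unrounded program output.

```python
# (estimate, reported standard error) for the highest-order AR and MA
# parameters, taken from the ML 1 and ML 2 rows of Table 4.
ml_estimates = {
    "ML 1": {"phi2": (-0.299, 0.274), "theta2": (-0.397, 0.169)},
    "ML 2": {"phi2": (-0.388, 0.231), "theta2": (-0.459, 0.152)},
}


def t_ratio(estimate, std_error):
    """Ratio of a point estimate to its reported standard error."""
    return estimate / std_error


t_ratios = {
    package: {name: round(t_ratio(est, se), 2) for name, (est, se) in params.items()}
    for package, params in ml_estimates.items()
}

for package, ratios in t_ratios.items():
    print(package, ratios)
```

Judged against a conventional cut-off of about two in absolute value, neither program's second autoregressive estimate is significant, but the ML 2 ratio is noticeably closer to that threshold.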

The CLS 6 and ‘Other 2’ point estimates are also fairly close to the ML 2 estimates, at least in relation to the standard errors. However, the cause of the difference here is not early termination of the optimization iterations. Indeed, we supplied both programs with the ML 2 point estimates as initial values. Both iterated away from these initial values, again yielding the estimates reported in Table 4. It appears then that, in this case, the cause of different estimates is differences in the functions being minimized.

Point estimates that are all very small compared with their standard errors were obtained from CLS 2, CLS 4, and BC 2. Perhaps surprisingly, in the case of CLS 2 and BC 2 the issue is again early termination of the iterations. When further iterations were forced by relaxing the termination criteria, estimates very close to those of ML 2 were obtained. In the case of CLS 4, we forced as many iterations as the program permitted. The estimates moved a little, but remained far from the ML 2 estimates.

Finally, Table 4 reports three sets of point estimates, from CLS 3, BC 1, and BC 8, that are spectacularly different from the ML 2 estimates, and with standard errors that are small enough to suggest the ML 2 estimates to be implausible values for the parameters. The BC 1 and BC 8 estimates are close to one another and to the boundaries of the stationarity and invertibility regions, which accounts for the small standard errors. The CLS 3 estimates differ somewhat from the other two, reflecting differences between conditional least squares and backcasting near the boundary of the invertibility region. Given the substantial differences between these three estimates on the one hand and the ML 2 estimates on the other, one would expect to find that the corresponding values of the log likelihood differed by more than the figures shown in Table 4. This suggests that the functions that are being minimized have multiple local minima. To check this possibility, we began by supplying ML 2 with the CLS 3, BC 1, and BC 8 point estimates as initial values. In none of these cases was ML 2 able to achieve its maximum likelihood estimates as reported in Table 4: indeed, ML 2 reported final estimates that did not differ greatly from these initial values. Next, we supplied these three packages with the ML 2 point estimates as initial values. BC 1 and BC 8 then reported final estimates that were very close to the ML 2 estimates. CLS 3 iterated away somewhat from the initial estimates, achieving final estimates almost identical to those reported for CLS 6 in Table 4. We conclude that, for this particular data set, the initial parameter estimates can be crucially important in determining the final estimates and that it is this factor that accounts for the largest of the differences among the point estimates reported in Table 4.
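The dependence on initial values described above is what one would expect of any local search applied to a function with more than one minimum. The sketch below illustrates the mechanism on a toy one-dimensional objective with two local minima. The objective, the fixed-step descent routine, and the starting values are all assumptions made for illustration; none of them represents the likelihood surface or the algorithm of any program in the study.

```python
def local_minimize(gradient, start, step=0.01, tol=1e-10, max_iter=10000):
    """Fixed-step gradient descent from a given starting value."""
    x = start
    for _ in range(max_iter):
        x_new = x - step * gradient(x)
        if abs(x_new - x) < tol:  # stop once the update is negligible
            return x_new
        x = x_new
    return x


def objective(x):
    """Toy function with local minima near -1 and +1; the left one is lower."""
    return (x * x - 1.0) ** 2 + 0.1 * x


def gradient(x):
    return 4.0 * x * (x * x - 1.0) + 0.1


# The same routine and the same objective, run from two starting values.
from_left = local_minimize(gradient, start=-1.5)
from_right = local_minimize(gradient, start=1.5)

print("start -1.5 ->", from_left, "objective", objective(from_left))
print("start  1.5 ->", from_right, "objective", objective(from_right))
```

Each run converges, and each reports a stationary point, yet only one of them is the global minimum. Restarting a routine from another program's final estimates, as was done with ML 2 above, is one practical way to detect this situation.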

Our example in this section has illustrated that substantial differences among ARIMA model parameter estimates may not only be due to differences in the functions that are minimized. The structure of the minimization algorithm, including specification of initial values and termination criteria, can also substantially influence the estimates achieved.
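To make the first of these sources concrete, the sketch below evaluates two different objective functions on the same data. It assumes a zero-mean MA(1) model, y[t] = a[t] - theta*a[t-1], chosen here only for brevity. The conditional sum of squares fixes the pre-sample shock at zero, while the exact Gaussian likelihood (with the innovation variance concentrated out) is computed by the standard innovations recursions. The function names and the simulated series are hypothetical, and no package discussed in this paper is claimed to work this way.

```python
import math
import random


def simulate_ma1(n, theta, seed=1):
    """Simulate y[t] = e[t] - theta*e[t-1] (illustrative data only)."""
    rng = random.Random(seed)
    e_prev, y = rng.gauss(0.0, 1.0), []
    for _ in range(n):
        e = rng.gauss(0.0, 1.0)
        y.append(e - theta * e_prev)
        e_prev = e
    return y


def cls_objective(y, theta):
    """Conditional sum of squares with the pre-sample shock set to zero."""
    a_prev, total = 0.0, 0.0
    for y_t in y:
        a = y_t + theta * a_prev
        total += a * a
        a_prev = a
    return total


def exact_neg2loglik(y, theta):
    """Concentrated -2 log-likelihood of a Gaussian MA(1), up to a constant.

    Uses the innovations recursions with autocovariances scaled by the
    innovation variance: g0 = 1 + theta^2 at lag 0 and g1 = -theta at lag 1.
    """
    g0, g1 = 1.0 + theta * theta, -theta
    r = g0        # scaled one-step prediction error variance
    pred = 0.0    # one-step prediction of the next observation
    ss, logdet = 0.0, 0.0
    for y_t in y:
        err = y_t - pred
        ss += err * err / r
        logdet += math.log(r)
        coef = g1 / r
        pred = coef * err
        r = g0 - coef * coef * r
    n = len(y)
    return n * math.log(ss / n) + logdet


y = simulate_ma1(40, theta=0.9)

# Minimize each function over the same grid inside the invertibility region.
grid = [i / 100.0 for i in range(-99, 100)]
theta_cls = min(grid, key=lambda th: cls_objective(y, th))
theta_ml = min(grid, key=lambda th: exact_neg2loglik(y, th))
print("conditional least squares:", theta_cls)
print("exact maximum likelihood: ", theta_ml)
```

The two functions coincide in shape only when the treatment of the pre-sample shock is unimportant. The exact likelihood takes the same value at theta and 1/theta and remains finite on the boundary of the invertibility region, whereas the conditional sum of squares typically grows rapidly outside that region, so the two minimizers can differ most when the true parameter is near the boundary.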

5. Conclusions

We have been at some pains to emphasize that the results in this paper are no more than illustrations of what a user could find. Nevertheless, it is tempting to say something about how a user should proceed. Our preference is to estimate ARIMA models through full maximum likelihood. This preference is based on the simulation evidence of Ansley and Newbold (1980), and the examples presented here have not, and cannot have, added to that evidence, beyond illustrating differences that can arise when alternatives are employed. We know of no evidence suggesting that any of the approximations to full maximum likelihood is in any sense preferable to it, beyond the question of computational cost. The practical importance of this factor is rapidly becoming negligible. Many packages currently available allow estimation by two or more approaches. (Indeed, it is a little disappointing that several incorporate just those two approaches discussed by Box and Jenkins in 1970.) We do not know what the novice user would make of options of this sort, and are uncomfortable trying to explain in applied courses why this situation prevails.

Our results have illustrated how the user may be confronted with substantially different estimates and forecasts, depending on which package is used, and consequently with difficulty in replicating published results. This state of affairs is unfortunate, and could certainly discourage the novice from attempting to implement the ARIMA methodology.

References

Ansley, C.F. and P. Newbold, 1980, Finite sample properties of estimates for autoregressive moving average models, Journal of Econometrics, 13, 159-183.

Box, G.E.P. and D.R. Cox, 1964, An analysis of transformations, Journal of the Royal Statistical Society, B, 26, 211-243.

Box, G.E.P. and G.M. Jenkins, 1970, Time Series Analysis, Forecasting and Control (Holden Day, San Francisco, CA).

Box, G.E.P. and G.M. Jenkins, 1973, Some comments on a paper by Chatfield and Prothero and on a review by Kendall, Journal of the Royal Statistical Society, A, 136, 337-345.

Campbell, J.Y. and N.G. Mankiw, 1987, Are output fluctuations transitory?, Quarterly Journal of Economics, 102, 857-880.

Chatfield, C. and D.L. Prothero, 1973, Box-Jenkins seasonal forecasting: Problems in a case study, Journal of the Royal Statistical Society, A, 136, 295-336.

Clark, P.K., 1987, The cyclical component of U.S. economic activity, Quarterly Journal of Economics, 102, 797-814.

Cryer, J.D. and J. Ledolter, 1981, Small sample properties of the maximum likelihood estimator in the first order moving average model, Biometrika, 68, 691-694.

Granger, C.W.J. and P. Newbold, 1976, Forecasting transformed series, Journal of the Royal Statistical Society, B, 38, 189-203.

Harvey, A.C., 1989, Forecasting, Structural Time Series Models and the Kalman Filter (Cambridge University Press, Cambridge).

Pankratz, A., 1991, Forecasting with Dynamic Regression Models (Wiley, New York).

Shephard, N.G. and A.C. Harvey, 1990, On the probability of estimating a deterministic component in the local level model, Journal of Time Series Analysis, 11, 339-347.

Watson, M.W., 1986, Univariate detrending methods with stochastic trends, Journal of Monetary Economics, 18, 49-75.

Biographies: Paul NEWBOLD is Professor of Economics at the University of Illinois, Urbana-Champaign. He has published extensively in the areas of time-series analysis and forecasting.

Christos AGIAKLOGLOU received his Ph.D. in economics from the University of Illinois, Urbana-Champaign. He is currently serving in the Greek navy.

John MILLER received his Ph.D. in economics from the University of Illinois, Urbana-Champaign. He is now employed as a senior analyst with Lehman Brothers, New York.