
International Journal of Forecasting 7 (1991) 349-356

North-Holland


Forecast selection when all forecasts are not equally recent *

Lawrence D. Brown

State University of New York at Buffalo, Buffalo, New York 14260, USA

Abstract: Little is known about which forecasts to select when all forecasts are not equally recent. This paper uses security analysts’ annual earnings forecasts to examine this issue. The comparative predictive accuracy of the mean and three timely composites is examined, where the three timely composites are the most recent forecast, the average of the three most recent forecasts, and the 30-day average. The mean is shown to be less accurate than all three timely composites, and the 30-day average is shown to be the most accurate timely composite. The findings suggest that tradeoffs exist between recency and aggregation, and that these tradeoffs are related to firm size.

Keywords: Recency, Aggregation, Timely composites, Earnings forecasts.

1. Introduction

When presented with forecasts that are not all equally recent, forecasters need to determine which forecasts to select when forming composite estimates. Surprisingly little is known about which forecasts to select in these circumstances. This paper uses security analysts’ annual earnings forecasts to examine this issue. In his survey paper, Clemen (1989) cites several studies showing that equal-weighting schemes generate composite forecasts that are as accurate as those generated via complex weighting schemes [e.g., Brandt and Bessler (1981); Makridakis et al. (1982); Rowse et al. (1974)]. Thus, I adopt an equal-weighting methodology and examine the relative predictive accuracy of four composites: (1) the average of all forecasts regardless of age (mean); (2) the most recent forecast, which, in the absence of ties, is a ‘composite’ of one individual’s forecast (most recent forecast); (3) the average of the three most recent forecasts (three most recent average); and (4) the average of those forecasts made within the past 30 days (30-day average). I refer henceforth to the latter three composites as timely composites.

The mean is found to be less accurate than each of the three timely composites, and this result holds in all ten firm-size deciles. The 30-day average (three most recent average) is the most (least) accurate of the three timely composites. However, this finding is sensitive to firm size, which I attribute to tradeoffs between recency and aggregation that are correlated with firm size.

* This paper has benefitted from the comments of Dosoung Choi, Bob Clemen, Robert Fildes, Alex Gould, Jerry C.Y. Han, Ron Huefner, Kwon-Jung Kim, Tom Lechner, Scott Stickel, Bob Winkler, and the participants at the Ninth International Symposium on Forecasting. The capable research assistance of Bapi Nag and Wonsun Paek is also appreciated.

The paper proceeds as follows. Section 2 discusses the four composite forecasts considered in this study and presents hypotheses regarding their comparative accuracy. Data and results appear in Section 3. Conclusions and implications for the literature on forecast combination constitute Section 4.

0169-2070/91/$03.50 © 1991 - Elsevier Science Publishers B.V. All rights reserved


2. Formulating composite security analyst annual earnings forecasts

Annual earnings forecasts for a given firm are sequential in nature. Using Zacks Investment Research, Inc. data for the 1980-1986 sample period, Lys and Sohn (1990) show that the average number of calendar days between consecutive annual earnings forecasts for an individual analyst who follows a given firm is 57.69 (median = 42.0). O’Brien (1988) reports the following characteristics of the Lynch, Jones & Ryan Institutional Brokers Estimate System (I/B/E/S) data for two composites that I use in this study: the mean and the most recent forecast. The median ages of the mean forecasts for horizons of 240, 180, 120, 60, and 5 trading days before the annual earnings announcements for the years 1975 to 1981 are 60, 58, 62, 67, and 67 trading days, respectively. In contrast, the median ages of the most recent forecasts for these five horizons are 4, 3, 4, 4, and 11 trading days, respectively. Thus, the most recent forecast is approximately 60 trading days, or one calendar quarter, more recent than the mean forecast.
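The ‘approximately 60 trading days’ figure follows directly from the median ages O’Brien (1988) reports. A minimal Python check, using only the numbers quoted above, differences the two sets of median ages horizon by horizon:

```python
# Median ages (trading days) reported by O'Brien (1988) for horizons of
# 240, 180, 120, 60, and 5 trading days before the earnings announcement.
horizons = [240, 180, 120, 60, 5]
median_age_mean = [60, 58, 62, 67, 67]
median_age_most_recent = [4, 3, 4, 4, 11]

# How many trading days newer the most recent forecast is, per horizon.
gap = [m - r for m, r in zip(median_age_mean, median_age_most_recent)]
average_gap = sum(gap) / len(gap)

for h, g in zip(horizons, gap):
    print(f"horizon {h:>3}: most recent forecast is {g} trading days newer")
print(f"average gap: {average_gap:.1f} trading days")
```

The gaps range from 55 to 63 trading days and average 57.6, which is the basis for the rounded figure in the text.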

To understand how the four composite forecasts used in this study are defined, consider formulating a composite security analyst earnings forecast on April 30, 1991 for IBM’s 1991 earnings. Assume the following: (1) 28 individual analyst forecasts are available; (2) the most recent forecast is made on April 29, 1991; (3) eight forecasts are made during April 1991; and (4) 20 forecasts are made between January 1 and March 30, 1991. In the context of this study, the mean is the arithmetic average of the 28 forecasts; the most recent forecast is the April 29 forecast; the three most recent average is the arithmetic average of the three forecasts made latest in April 1991; and the 30-day average is the arithmetic average of the eight forecasts made in April 1991. Alternatively, consider formulating a composite security analyst earnings forecast on April 30, 1991 for Castle and Cook’s 1991 earnings. Assume that five individual analyst forecasts are available, in the following order of recency: April 17, 1991; April 3, 1991; February 27, 1991; January 31, 1991; and December 16, 1990. In this example, the mean is the arithmetic average of the five forecasts; the most recent forecast is the April 17 estimate; the three most recent average is the arithmetic average of the April 17, April 3, and February 27 estimates; and the 30-day average is the arithmetic average of the two April forecasts. In contrast to IBM, the 30-day average for Castle and Cook uses fewer forecasts than the three most recent average. The IBM example is more typical of large firms; the Castle and Cook example is more typical of small firms. More generally, larger firms have more analysts following them [Bhushan (1989); Shores (1990)], and more analysts’ estimates available with a given degree of recency (e.g., 30 days or less).
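The four definitions can be made concrete with a short Python sketch applied to the Castle and Cook dates above. The EPS values, the function name `composites`, the averaging of same-day ties, and the inclusive 30-day boundary are illustrative assumptions, not details taken from the paper.

```python
from datetime import date
from statistics import mean


def composites(forecasts, as_of, window_days=30):
    """Equal-weight composites for one firm, formed on the date `as_of`.

    forecasts: list of (forecast_date, eps_forecast) pairs; only forecasts
    dated on or before `as_of` are treated as outstanding.
    """
    live = sorted((p for p in forecasts if p[0] <= as_of),
                  key=lambda p: p[0], reverse=True)  # newest first
    if len(live) < 3:
        raise ValueError("need at least three outstanding forecasts")
    newest = live[0][0]
    timely = [f for d, f in live if (as_of - d).days <= window_days]
    return {
        # (1) mean: every outstanding forecast, regardless of age
        "mean": mean(f for _, f in live),
        # (2) most recent forecast; same-day ties are averaged (assumption)
        "most_recent": mean(f for d, f in live if d == newest),
        # (3) average of the three most recent forecasts
        "three_most_recent": mean(f for _, f in live[:3]),
        # (4) average of forecasts made within `window_days` of as_of
        #     (30 by default); None if no forecast is that recent
        "window_average": mean(timely) if timely else None,
    }


# Castle and Cook forecast dates from the text; the EPS values are made up.
castle_and_cook = [
    (date(1991, 4, 17), 1.20),
    (date(1991, 4, 3), 1.10),
    (date(1991, 2, 27), 1.00),
    (date(1991, 1, 31), 0.90),
    (date(1990, 12, 16), 0.80),
]
print(composites(castle_and_cook, as_of=date(1991, 4, 30)))
```

As in the text, the mean uses all five forecasts, the three most recent average uses the April 17, April 3, and February 27 forecasts, and the 30-day average uses only the two April forecasts.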

When formulating composites of security analysts’ earnings forecasts, there are two competing influences at work: recency and aggregation. The recency principle suggests that discarding old forecasts improves forecast accuracy [O’Brien (1988); Stickel (1990)]. The aggregation principle suggests that idiosyncratic error is mitigated by combining as many forecasts as possible [Hogarth (1978); Winkler and Makridakis (1983); Ashton and Ashton (1985)]. The aggregation principle presumes that all forecasts are equally recent. However, some security analysts’ earnings forecasts are ‘quite old’ relative to others, in the sense that they are conditioned on fewer quarterly reports. As predictive accuracy is improved by incorporating information from quarterly reports [Abdel-Khalik and Espejo (1978); Brown and Rozeff (1979)], the predictive accuracy of a composite security analyst earnings forecast should be improved by discarding these older individual forecasts.
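The tension between the two principles can be illustrated with a stylized error model. This is a hypothetical sketch, not the paper’s model: it assumes each forecast misses all earnings news arriving after it is issued (news accumulating as a random walk) and carries independent idiosyncratic noise. Under those assumptions the expected squared error of an equal-weight average has a closed form, because two forecasts share the news that arrived after the later of their issue dates.

```python
def expected_mse(ages, news_var_per_day=1.0, idio_var=30.0):
    """Expected squared error of an equal-weight average of forecasts.

    Hypothetical model (not from the paper): a forecast issued `a` days ago
    misses all news arriving in those `a` days (a random walk with variance
    news_var_per_day per day) and carries independent idiosyncratic noise
    with variance idio_var.
    """
    n = len(ages)
    # Forecasts of ages a and b both miss the news of the last min(a, b) days,
    # so their news errors have covariance news_var_per_day * min(a, b).
    shared = sum(min(a, b) for a in ages for b in ages) / n ** 2
    # Independent idiosyncratic noise shrinks with the number averaged.
    return news_var_per_day * shared + idio_var / n


for label, ages in [("newest forecast only", [1]),
                    ("three recent forecasts", [1, 5, 10]),
                    ("three incl. stale ones", [1, 40, 90])]:
    print(f"{label:>24}: {expected_mse(ages):.2f}")
```

In these arbitrary units, averaging three forecasts issued 1, 5, and 10 days earlier cuts expected error from 31.00 to 13.33 (aggregation wins), while averaging in forecasts 40 and 90 days old raises it to 33.89, worse than the newest forecast alone (recency wins).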

Ideally, I would investigate how the number of analysts following the firm and the recency of analysts’ forecasts affect differences in forecast accuracy amongst the four composites. Unfortunately, these data are not available to me, so I use firm size as a proxy variable. I thus assume that larger firms have more analysts following them and, for a given recency cutoff, have more analysts’ forecasts available. These assumptions suggest that: (1) for a given amount of aggregation (e.g., three most recent average), the recency principle is more likely to pertain to larger firms; (2) for a given amount of recency (e.g., 30-day average), the aggregation principle is more likely to pertain to larger firms.

O’Brien (1988) and Stickel (1990) use I/B/E/S and Zacks Investment Research, Inc. data, respectively, on security analysts’ annual earnings forecasts to show that the most recent forecast is more accurate than the mean. To verify that the data used in this study possess external validity, and to extend the analysis to other timely composites, my first hypothesis is:

H1: The three timely composites are more accurate than the mean.

Based on the findings of O’Brien (1988) and Stickel (1990), I expect the most recent forecast to be more accurate than the mean. This result should generalize to the other timely composites, the three most recent average and the 30-day average.

To better understand the tradeoffs between recency and aggregation, I segment the data into ten firm-size deciles. As larger firms are likely to have more analysts following them and as their mean forecast is likely to be relatively more recent, my second hypothesis is:

H2: The relative advantage of timely composites versus the mean is greater for smaller firms.

Tradeoffs between recency and aggregation are best understood by comparing the predictive accuracy of the three timely composites. If recency outweighs aggregation, the most recent forecast will be more accurate than the three most recent average and the 30-day average. If aggregation outweighs recency, the three most recent average and the 30-day average will be more accurate than the most recent forecast. As it is not evident a priori which of these effects predominates, my third hypothesis is:

H3: The predictive accuracy of the most recent forecast differs from that of the three most recent average and the 30-day average.

If tradeoffs between recency and aggregation are correlated with firm size, comparisons amongst the three timely composites using firm size as a partitioning variable should be informative regarding these tradeoffs. I first compare the three most recent average with the most recent forecast, segmenting the data by firm-size decile. If the three most recent forecasts are similar in their degree of recency, the aggregation principle should apply, and the three most recent average should be more accurate than the most recent forecast. However, if at least one of the three most recent forecasts is ‘quite old’, their average is likely to be less accurate than the most recent forecast. Since the three most recent forecasts of larger firms are relatively more likely to possess similar degrees of recency, the advantage of the three most recent average versus the most recent forecast should be greater for larger firms. Thus, my fourth hypothesis is:

H4: The advantage of the three most recent average versus the most recent forecast is greater for larger firms.

Second, I compare the most recent forecast with the 30-day average. If the 30-day average consists of three or more estimates, it should be more accurate than the most recent forecast, as it exploits aggregation without sacrificing much recency. However, if the 30-day average consists of fewer than three estimates, as in the Castle and Cook example, its accuracy is likely to be indistinguishable from that of the most recent forecast. Since larger firms are relatively more likely to have three or more estimates with recency of 30 days or less, the advantage of the 30-day average versus the most recent forecast should be greater for larger firms. Thus, my fifth hypothesis is:

H5: The advantage of the 30-day average versus the most recent forecast is greater for larger firms.

Third, I compare the three most recent average with the 30-day average. If the three most recent forecasts have recency of 30 days or less, their average should be approximately as accurate as the 30-day average. However, if they include at least one estimate which is ‘quite old’, their average is likely to be less accurate than the 30-day average. Since the three most recent forecasts of smaller firms are relatively more likely to include at least one ‘quite old’ forecast, I expect the predictive advantage of the 30-day average versus the three most recent average to be greater for smaller firms. Thus, my sixth hypothesis is:

H6: The advantage of the 30-day average versus the three most recent average is greater for smaller firms.


3. Data and results

Data are obtained from Zacks Investment Research, Inc. Each firm has forecast data available for up to 60 month-ends prior to January 31, 1989. For the purposes of this study, the mean, the most recent forecast, the three most recent average, and the 30-day average must all be available. Thus, the mean (but not necessarily the 30-day average) is based on an average of three or more forecasts, and the most recent forecast has recency of 30 days or less. Composite forecasts are generated at each firm’s fiscal year end for each year, 1984 to 1988. The sample sizes range from a low of 1,003 firms in 1984 to a high of 1,760 firms in 1987. The total sample for the 1984-1988 period is 7,077 for tests of hypotheses one and three, and 6,899 for tests of hypotheses two, four, five, and six, which require firm-size data. Forecast accuracy is measured by the mean absolute prediction error (MAPE), the arithmetic average of the distribution of absolute prediction errors (APE), where $\mathrm{APE} = \left|(\mathrm{actual} - \mathrm{forecast})/\mathrm{actual}\right|$ and all ratios greater than 1.0 are set equal to 1.0.
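The accuracy metric can be sketched in Python as follows. The function names are illustrative, and rejecting zero actual earnings is an assumption, since the text does not say how such cases are treated.

```python
def ape(actual, forecast, cap=1.0):
    """Absolute prediction error |(actual - forecast) / actual|, truncated at `cap`."""
    if actual == 0:
        # Assumption: the paper does not state how zero actual EPS is handled.
        raise ValueError("APE is undefined when actual earnings are zero")
    return min(abs((actual - forecast) / actual), cap)


def mape(pairs, cap=1.0):
    """Mean of truncated APEs over (actual, forecast) pairs, one per firm-year."""
    errors = [ape(actual, forecast, cap) for actual, forecast in pairs]
    return sum(errors) / len(errors)


# Hypothetical firm-years: a 10% miss, a miss larger than 100% (capped at 1.0),
# and a 10% miss on negative earnings (the absolute value handles the sign).
sample = [(2.00, 1.80), (0.50, 1.40), (-1.00, -0.90)]
print(round(mape(sample), 3))
```

The truncation keeps a single firm with near-zero actual earnings from dominating the average: here the 180% miss contributes only 1.0, giving a MAPE of 0.4.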

Table 1 presents MAPEs of the four composite forecasts for the 1984-1988 period and for each of the five years. For the 1984-1988 period and for each of four years, the 30-day average is the most accurate, followed by the most recent forecast, the three most recent average, and the mean. Thus, the mean is less accurate than all three timely composites, and the 30-day average is the most accurate timely composite.
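Each difference reported in Table 2 is the mean’s MAPE minus a timely composite’s MAPE from Table 1. The Python sketch below uses Table 1’s published values to recompute those differences and to rank the composites for the pooled period. The 1988 30-day entry, garbled in extraction, is taken as 0.198, the value implied by the 1988 differences in Tables 2 and 4.

```python
# MAPEs from Table 1: (mean, most recent, three most recent average, 30-day average)
TABLE_1 = {
    "1984": (0.229, 0.171, 0.178, 0.167),
    "1985": (0.272, 0.201, 0.209, 0.202),
    "1986": (0.287, 0.238, 0.242, 0.232),
    "1987": (0.261, 0.219, 0.220, 0.216),
    "1988": (0.247, 0.198, 0.216, 0.198),
    "1984-1988": (0.261, 0.209, 0.216, 0.206),
}
NAMES = ("mean", "most recent", "three most recent", "30-day")


def mean_minus_timely(row):
    """MAPE(mean) - MAPE(timely composite), rounded to Table 2's three decimals."""
    mape_mean, *timely = row
    return {name: round(mape_mean - m, 3) for name, m in zip(NAMES[1:], timely)}


def ranking(row):
    """Composite names ordered from most to least accurate (lowest MAPE first)."""
    return [name for _, name in sorted(zip(row, NAMES))]


for year, row in TABLE_1.items():
    print(year, mean_minus_timely(row))
print("pooled ranking:", ranking(TABLE_1["1984-1988"]))
```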

Table 2 presents results of significance tests of hypothesis one. Consistent with O’Brien (1988) and Stickel (1990), the most recent forecast is significantly more accurate than the mean. The t-value for the 1984-1988 period is 23.33, with annual values ranging from a low of 9.15 in 1987 to a high of 13.36 in 1985. Thus, the data in this study possess external validity. Also consistent with hypothesis one, the three most recent average and the 30-day average are significantly more accurate than the mean. The t-value for the first comparison is 25.19 for the 1984-1988 period, ranging from a low of 8.26 in 1988 to a high of 14.24 in 1985. The t-value for the second comparison is 25.50 for the 1984-1988 period, ranging from a low of 9.99 in 1988 to a high of 13.79 in 1985. The finding that the mean is inferior to the three timely composites on the predictive ability dimension is consistent with the evidence that the mean is inferior to these three timely composites on the market association dimension [Brown and Kim (1991)].
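The text does not spell out the form of the t-test. One common choice, assumed here for illustration only, is a paired test on per-firm-year differences in APE, since every composite is evaluated on the same firm-years. A minimal Python sketch with made-up APEs:

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(ape_a, ape_b):
    """t-statistic for H0: E[APE_a - APE_b] = 0, from matched firm-year APEs."""
    if len(ape_a) != len(ape_b) or len(ape_a) < 2:
        raise ValueError("need at least two matched firm-years")
    diffs = [a - b for a, b in zip(ape_a, ape_b)]
    # Mean difference divided by its standard error (sample s.d. / sqrt(n)).
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


# Hypothetical APEs for four firm-years: the mean vs. the most recent forecast.
ape_mean = [0.30, 0.20, 0.25, 0.40]
ape_most_recent = [0.20, 0.18, 0.20, 0.30]
print(round(paired_t(ape_mean, ape_most_recent), 2))
```

Because the standard error shrinks with the square root of the number of firm-years, even small average differences in APE can yield large t-values in samples of several thousand firm-years.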

Table 3 presents results of significance tests of hypothesis two, that is, whether the relative accuracy of timely composites versus the mean is sensitive to firm size, where firm size is defined as the arithmetic mean of the firm’s common equity at its fiscal year end. To test hypothesis two, I correlate firm size with the relative reduction in MAPE that occurs by using a timely composite in lieu of the mean. The relative reduction in MAPE is defined as $(\mathrm{MAPE}_{\mathrm{mean}} - \mathrm{MAPE}_{\mathrm{timely}})/\mathrm{MAPE}_{\mathrm{mean}}$, where $\mathrm{MAPE}_{\mathrm{timely}}$ is the MAPE of the timely composite, and the correlation test applied is the Spearman rank correlation. The correlation results for the most recent forecast, the

Table 1
MAPEs of mean, most recent forecast, three most recent average, and 30-day average (a)

Year        Mean    Most recent   Three most       30-day    N
                    forecast      recent average   average
1984        0.229   0.171         0.178            0.167     1,003
1985        0.272   0.201         0.209            0.202     1,250
1986        0.287   0.238         0.242            0.232     1,559
1987        0.261   0.219         0.220            0.216     1,760
1988        0.247   0.198         0.216            0.198     1,505
1984-1988   0.261   0.209         0.216            0.206     7,077

(a) MAPE is the arithmetic average of the distribution of absolute prediction errors (APE), where $\mathrm{APE} = \left|(\mathrm{actual} - \mathrm{forecast})/\mathrm{actual}\right|$. Mean is the arithmetic average of all forecasts outstanding as of the end of the fiscal year. Most recent forecast is that individual forecast made closest to fiscal year end. Three most recent average is the arithmetic average of those three forecasts outstanding as of the end of the fiscal year that are made closest to fiscal year end. 30-day average is the arithmetic average of those forecasts outstanding as of the end of the fiscal year that are made within 30 days of fiscal year end. All APEs greater than 1.0 are set equal to 1.0.


Table 2
Differences in MAPEs between mean and each of the three timely composites (a)

Year        Mean -        T-stat.   Mean -        T-stat.   Mean -     T-stat.   N
            most recent   (b)       three most    (b)       30-day     (b)
            forecast                recent av.              average
1984        0.058         10.53 *   0.051         10.64 *   0.062      11.68 *   1,003
1985        0.071         13.36 *   0.063         14.24 *   0.070      13.79 *   1,250
1986        0.049         11.09 *   0.045         12.58 *   0.055      12.68 *   1,559
1987        0.042          9.15 *   0.041         11.02 *   0.045      10.00 *   1,760
1988        0.049          9.19 *   0.031          8.26 *   0.049       9.99 *   1,505
1984-1988   0.052         23.33 *   0.045         25.19 *   0.055      25.50 *   7,077

(a) MAPE is the arithmetic average of the distribution of absolute prediction errors (APE), where $\mathrm{APE} = \left|(\mathrm{actual} - \mathrm{forecast})/\mathrm{actual}\right|$. Mean is the arithmetic average of all forecasts outstanding as of the end of the fiscal year. Most recent forecast is that individual forecast made closest to fiscal year end. Three most recent average is the arithmetic average of those three forecasts outstanding as of the end of the fiscal year that are made closest to fiscal year end. 30-day average is the arithmetic average of those forecasts outstanding as of the end of the fiscal year that are made within 30 days of fiscal year end. All APEs greater than 1.0 are set equal to 1.0.
(b) T-statistic is a test of the difference between the MAPEs of the mean and a timely composite. The timely composites in columns 3, 5, and 7 are the most recent forecast, the three most recent average, and the 30-day average, respectively.
* Significant at the 0.01 level, two-tailed test.

three most recent average, and the 30-day average (with p-values in parentheses) are -0.721 (0.019), 0.042 (0.907), and -0.442 (0.200), respectively. Thus, the relative advantage of two of the three timely composites versus the mean is greater for smaller firms, and the results are significant at conventional levels for one of the timely composites. These results are weakly consistent with hypothesis two.

Table 4 presents results of significance tests of hypothesis three, along with a comparison of the three most recent average and the 30-day average.
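The hypothesis-two test can be sketched in Python. The sketch assumes the Spearman correlation is computed across the ten firm-size deciles, one relative MAPE reduction per decile, and uses the no-ties formula. The decile firm sizes come from Table 3; the relative reductions are hypothetical, because the decile-level MAPE of the mean is not reported.

```python
def relative_reduction(mape_mean, mape_timely):
    """(MAPE_mean - MAPE_timely) / MAPE_mean: share of the mean's error removed."""
    return (mape_mean - mape_timely) / mape_mean


def ranks(values):
    """1-based ranks in ascending order; assumes there are no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def spearman_rho(x, y):
    """Spearman rank correlation, rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), no ties."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Decile firm sizes ($ millions) from Table 3.
firm_size = [26.4, 62.5, 107.7, 173.1, 272.9, 431.6, 693.4, 1168.6, 2187.3, 9799.2]
# Hypothetical relative reductions that mostly shrink as firms get larger.
reduction = [0.30, 0.25, 0.32, 0.22, 0.20, 0.19, 0.15, 0.16, 0.10, 0.09]
print(round(spearman_rho(firm_size, reduction), 3))
print(round(relative_reduction(0.261, 0.209), 3))  # pooled MAPEs from Table 1
```

A negative rho, as reported for the most recent forecast (-0.721) and the 30-day average (-0.442), means the relative advantage of the timely composite over the mean shrinks as firm size grows.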

Table 3
Differences in MAPEs between mean and each of the three timely composites: Results segmented by firm-size decile (a)

Decile   Firm size   Mean -        T-stat.   Mean -        T-stat.   Mean -        T-stat.   N
                     most recent   (b)       three most    (b)       30-day av.    (b)
                     forecast                recent av.
1           26.4     0.087         10.07 *   0.037          7.48 *   0.086         10.24 *   689
2           62.5     0.073          7.45 *   0.059          7.64 *   0.075          7.91 *   690
3          107.7     0.101         10.97 *   0.079         10.51 *   0.099         10.93 *   690
4          173.1     0.059          7.55 *   0.051          8.19 *   0.055          7.59 *   690
5          272.9     0.049          6.42 *   0.060          9.37 *   0.057          7.65 *   690
6          431.6     0.048          6.93 *   0.049          8.22 *   0.055          8.01 *   690
7          693.4     0.033          5.15 *   0.041          6.84 *   0.039          6.45 *   690
8        1,168.6     0.032          5.68 *   0.038          7.57 *   0.038          6.88 *   690
9        2,187.3     0.020          3.27 *   0.028          5.09 *   0.028          5.23 *   690
10       9,799.2     0.018          4.34 *   0.021          6.47 *   0.020          5.84 *   690

(a) MAPE is the arithmetic average of the distribution of absolute prediction errors (APE), where $\mathrm{APE} = \left|(\mathrm{actual} - \mathrm{forecast})/\mathrm{actual}\right|$. Mean is the arithmetic average of all forecasts outstanding as of the end of the fiscal year. Most recent forecast is that individual forecast made closest to fiscal year end. Three most recent average is the arithmetic average of those three forecasts outstanding as of the end of the fiscal year that are made closest to fiscal year end. 30-day average is the arithmetic average of those forecasts outstanding as of the end of the fiscal year that are made within 30 days of fiscal year end. All APEs greater than 1.0 are set equal to 1.0. Firm size is the arithmetic mean (in $ millions) of the common equity in the year that the forecast was made.
(b) T-statistic is a test of the difference between the MAPEs of the mean and a timely composite. The timely composites in columns 4, 6, and 8 are the most recent forecast, the three most recent average, and the 30-day average, respectively.
* Significant at the 0.01 level, two-tailed test.


Table 4
Differences in MAPEs amongst the three timely composites (a)

Year        Three most     T-stat.   Most recent   T-stat.   Three most     T-stat.   N
            recent av. -   (b)       forecast -    (b)       recent av. -   (b)
            most recent              30-day av.              30-day av.
            forecast
1984        0.007          2.04 **    0.004         1.75     0.011          3.14 *    1,003
1985        0.008          2.16 **   -0.001        -0.30     0.007          2.40 **   1,250
1986        0.004          1.47       0.006         2.90 *   0.010          3.51 *    1,559
1987        0.001          0.54       0.003         1.09     0.004          1.40      1,760
1988        0.018          4.29 *     0.000        -0.03     0.018          4.76 *    1,505
1984-1988   0.007          4.79 *     0.003         2.30 **  0.010          7.00 *    7,077

(a) MAPE is the arithmetic average of the distribution of absolute prediction errors (APE), where $\mathrm{APE} = \left|(\mathrm{actual} - \mathrm{forecast})/\mathrm{actual}\right|$. Three most recent average is the arithmetic average of those three forecasts outstanding as of the end of the fiscal year that are made closest to fiscal year end. Most recent forecast is that individual forecast made closest to fiscal year end. 30-day average is the arithmetic average of those forecasts outstanding as of the end of the fiscal year that were made within 30 days of fiscal year end. All APEs greater than 1.0 are set equal to 1.0.
(b) T-statistic is a test of the difference between the MAPEs of two timely composites. The timely composites being compared in columns 3, 5, and 7 are described in columns 2, 4, and 6, respectively.
* (**) Significant at the 0.01 (0.05) level, two-tailed test.

The most recent forecast is more accurate than the three most recent average in all five years, with significance in three years and for the 1984-1988 period (t-value = 4.79). Moreover, the 30-day average is more accurate than the most recent forecast in three years, with significance in 1986 and for the 1984-1988 period (t-value = 2.30). While these results are consistent with hypothesis three, it is surprising that the most recent forecast is more accurate than one aggregate (suggesting that recency outweighs aggregation) but less accurate than another (suggesting that aggregation outweighs recency). I show below that this phenomenon is due to grouping all firms together, rather

Table 5
Differences in MAPEs amongst the three timely composites: Results segmented by firm-size decile (a)

Decile   Firm size   Three most     T-stat.   Most recent   T-stat.   Three most     T-stat.   N
                     recent av. -   (b)       forecast -    (b)       recent av. -   (b)
                     most recent              30-day av.              30-day av.
                     forecast
1           26.4      0.051          6.96 *   -0.002        -0.64      0.049          7.11 *   689
2           62.5      0.014          2.12 **   0.001         0.46      0.015          2.56 **  690
3          107.7      0.021          3.47 *   -0.002        -0.56      0.020          3.51 *   690
4          173.1      0.008          1.57     -0.005        -1.52      0.004          0.86     690
5          272.9     -0.011         -1.97 **   0.008         2.53 **  -0.003         -0.66     690
6          431.6     -0.001         -0.19      0.007         1.55      0.006          1.28     690
7          693.4     -0.008         -2.37 **   0.006         1.93     -0.002         -0.60     690
8        1,168.6     -0.006         -1.78      0.006         2.17 **   0.000          0.18     690
9        2,187.3     -0.009         -2.50 **   0.008         2.31 **  -0.001         -0.27     690
10       9,799.2     -0.003         -1.24      0.002         0.90     -0.001         -0.58     690

(a) MAPE is the arithmetic average of the distribution of absolute prediction errors (APE), where $\mathrm{APE} = \left|(\mathrm{actual} - \mathrm{forecast})/\mathrm{actual}\right|$. Three most recent average is the arithmetic average of those three forecasts outstanding as of the end of the fiscal year that are made closest to fiscal year end. Most recent forecast is that individual forecast made closest to fiscal year end. 30-day average is the arithmetic average of those forecasts outstanding as of the end of the fiscal year that are made within 30 days of fiscal year end. All APEs greater than 1.0 are set equal to 1.0. Firm size is the arithmetic mean (in $ millions) of the common equity in the year that the forecast was made.
(b) T-statistic is a test of the difference between the MAPEs of two timely composites. The timely composites being compared in columns 4, 6, and 8 are described in columns 3, 5, and 7, respectively.
* (**) Significant at the 0.01 (0.05) level, two-tailed test.


than segmenting the sample into firm-size deciles. Regarding comparisons of the three most recent average with the 30-day average, the latter is more accurate in all five years, with significance in four of them and for the 1984-1988 period (t-value = 7.00).

Table 5 presents comparisons amongst the three timely composites segmented by firm-size decile. The most recent forecast is more accurate than the three most recent average for the four smallest firm-size deciles, with statistical significance in three of them. However, the exact opposite pertains to large firms. The three most recent average is more accurate than the most recent forecast for the six largest firm-size deciles, with statistical significance in three of them. These results suggest that some of the three most recent forecasts for small firms are ‘quite old’, causing the aggregation principle to fail, while the three most recent forecasts for large firms are sufficiently recent, allowing the aggregation principle to hold. The results are consistent with hypothesis four.

The finding that the 30-day average is more accurate than the most recent forecast pertains mostly to large firms. The 30-day average is more accurate than the most recent forecast for the six largest firm-size deciles, with statistical significance in three of them. In contrast, the most recent forecast is (insignificantly) more accurate than the 30-day average for three of the four smallest firm-size deciles. These findings suggest that the aggregation principle is ineffective for small firms and effective for large ones. The results are consistent with hypothesis five.

The finding that the 30-day average is more accurate than the three most recent average pertains only to small firms. The three cases of significance are found for the three smallest firm-size deciles, suggesting that the three most recent forecasts for small firms are not all that recent. However, for large firms, the three most recent forecasts are sufficiently recent that the aggregation principle pertains to them, making the 30-day average about as accurate as the three most recent average. These results are consistent with hypothesis six, and with Libby and Blashfield (1978), who used equally recent forecasts to show that the principal benefits of aggregation are obtained by averaging the judgments of only three (of 43) decision makers.
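The decile counts in the three paragraphs above can be reproduced from Table 5. The Python sketch below transcribes the table; several minus signs lost in extraction are restored using the identity (three most recent - most recent) + (most recent - 30-day) = (three most recent - 30-day), which every row satisfies to within rounding. It treats |t| >= 1.96 as significant at the 0.05 level, a normal-approximation assumption that happens to match the table’s significance markers.

```python
# Table 5 rows, smallest to largest decile:
# (3MR - MR, t, MR - 30D, t, 3MR - 30D, t), where 3MR = three most recent
# average, MR = most recent forecast, 30D = 30-day average. A positive
# difference means the second composite has the lower MAPE.
TABLE_5 = [
    (0.051, 6.96, -0.002, -0.64, 0.049, 7.11),
    (0.014, 2.12, 0.001, 0.46, 0.015, 2.56),
    (0.021, 3.47, -0.002, -0.56, 0.020, 3.51),
    (0.008, 1.57, -0.005, -1.52, 0.004, 0.86),
    (-0.011, -1.97, 0.008, 2.53, -0.003, -0.66),
    (-0.001, -0.19, 0.007, 1.55, 0.006, 1.28),
    (-0.008, -2.37, 0.006, 1.93, -0.002, -0.60),
    (-0.006, -1.78, 0.006, 2.17, 0.000, 0.18),
    (-0.009, -2.50, 0.008, 2.31, -0.001, -0.27),
    (-0.003, -1.24, 0.002, 0.90, -0.001, -0.58),
]
T_CRIT = 1.96  # two-tailed 0.05 level, normal approximation (assumption)
SMALL, LARGE = range(0, 4), range(4, 10)  # four smallest / six largest deciles


def tally(comparison, deciles):
    """(# positive, # negative, # significant) for one comparison.

    comparison: 0 = 3MR - MR, 1 = MR - 30D, 2 = 3MR - 30D.
    """
    rows = [TABLE_5[d] for d in deciles]
    diffs = [row[2 * comparison] for row in rows]
    t_stats = [row[2 * comparison + 1] for row in rows]
    return (sum(x > 0 for x in diffs),
            sum(x < 0 for x in diffs),
            sum(abs(t) >= T_CRIT for t in t_stats))


# Each row's differences should add up, allowing for rounding of each entry.
assert all(abs(a + c - e) <= 0.0015 + 1e-9 for a, _, c, _, e, _ in TABLE_5)

print("3MR - MR,  small:", tally(0, SMALL), " large:", tally(0, LARGE))
print("MR - 30D,  small:", tally(1, SMALL), " large:", tally(1, LARGE))
print("3MR - 30D, small:", tally(2, SMALL), " large:", tally(2, LARGE))
```

For example, the first line shows four positive 3MR - MR differences, three of them significant, among the four smallest deciles, and six negative differences, three significant, among the six largest, as stated in the text.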

The Table 5 results suggest that: (1) for a given degree of recency, predictive ability is enhanced by increasing the number of estimates; (2) for a given number of estimates, predictive ability is enhanced by increasing forecast recency. Aggregation outweighs recency for large firms, so the three most recent average and the 30-day average are more accurate than the most recent forecast. Recency outweighs aggregation for small firms, so the most recent forecast is more accurate than the three most recent average and the 30-day average. Evidently, combining small and large firms in Table 4 led to the apparent paradox that recency outweighs aggregation and vice versa.

4. Conclusions and implications for the literature on forecast combination

I examine forecast selection when all forecasts are not equally recent. I show that tradeoffs exist between recency and aggregation, and that these tradeoffs are related to firm size. A simple average of all forecasts outstanding at a moment in time, without regard to forecast recency (mean), is significantly less accurate than each of three timely composites: (1) most recent forecast, (2) three most recent average, and (3) 30-day average. The results should be generalizable to other timely composites, such as Lynch, Jones & Ryan’s Flash, a simple average of all security analyst annual earnings forecasts with recency of six weeks or less.

Amongst the three timely composites, the 30-day average is the most accurate and the three most recent average is the least accurate, but these results are sensitive to firm size. More specifically, the most recent forecast is more accurate than both the 30-day average and the three most recent average for small firms, but the most recent forecast is less accurate than the 30-day average and the three most recent average for large firms. Thus, recency outweighs aggregation for small firms, but aggregation outweighs recency for large firms.

This study offers some suggestions for improving the predictive ability of composite forecasts when all forecasts are not equally recent. Further research is needed to gain additional insight into the situation-specific nature of the tradeoff between recency and aggregation. Factors to consider when determining which forecasts to select when all forecasts are not equally recent include the distribution of forecast ages, the coefficient of variation of the forecast distribution, and the number of estimates available with a given degree of recency.

References

Abdel-khalik, A.R. and J. Espejo, 1978, ‘Expectations data and the predictive value of interim reporting’, Journal of Accounting Research, Spring, 1-13.

Ashton, A.H. and R.H. Ashton, 1985, ‘Aggregating subjective forecasts: Some empirical results’, Management Science, Dec., 1499-1508.

Bhushan, R., 1989, ‘Firm characteristics and analyst following’, Journal of Accounting and Economics, July, 255-274.

Brandt, J.A. and D.A. Bessler, 1981, ‘Composite forecasting: An application with U.S. hog prices’, American Journal of Agricultural Economics, Feb., 135-140.

Brown, L.D. and K.J. Kim, 1991, ‘Timely aggregate analyst forecasts as better proxies for market earnings expectations’, Journal of Accounting Research, forthcoming.

Brown, L.D. and M.S. Rozeff, 1979, ‘The predictive value of interim reports for improving forecasts of future quarterly earnings’, The Accounting Review, July, 585-591.

Clemen, R.T., 1989, ‘Combining forecasts: A review and annotated bibliography’, International Journal of Forecasting, 5, no. 4, 559-583.

Hogarth, R.M., 1978, ‘A note on aggregating opinions’, Organizational Behavior and Human Performance, 21, 40-46.

Libby, R. and R.K. Blashfield, 1978, ‘Performance of a composite as a function of the number of judges’, Organizational Behavior and Human Performance, 21, 121-129.

Lys, T. and S. Sohn, 1990, ‘The association between revisions of financial analysts’ earnings forecasts and security-price changes’, Journal of Accounting and Economics, Dec., 341-363.

Makridakis, S. et al., 1982, ‘The accuracy of extrapolation (time series) methods’, Journal of Forecasting, April-June, 111-153.

O’Brien, P., 1988, ‘Analysts’ forecasts as earnings expectations’, Journal of Accounting and Economics, Jan., 53-83.

Rowse, G.L., D.H. Gustafson and R.L. Ludke, 1974, ‘Comparison of rules for aggregating subjective likelihood ratios’, Organizational Behavior and Human Performance, Oct., 274-285.

Shores, D., 1990, ‘The association between interim information and security returns surrounding earnings announcements’, Journal of Accounting Research, Spring, 164-181.

Stickel, S.E., 1990, ‘The bias and accuracy of consensus earnings forecasts comprised of updated individual analyst forecasts’, Working paper, University of Pennsylvania.

Winkler, R.L. and S. Makridakis, 1983, ‘The combination of forecasts’, Journal of the Royal Statistical Society, Part 2, 150-157.

Biography: Lawrence D. BROWN is the Samuel P. Capen Professor of Accounting and Chairman, Department of Accounting and Law, at the State University of New York at Buffalo. His 50 publications span the areas of accounting, finance, and forecasting. He has presented his research at over 30 universities, most recently at Hebrew University of Jerusalem. His research in the topical area of this paper, earnings forecasting by security analysts, has been published in Accounting Review, Contemporary Accounting Research, International Journal of Forecasting, Journal of Accounting Research, Journal of Business Forecasting, Journal of Finance, Journal of Forecasting, and Journal of Portfolio Management.