TRANSCRIPT
ESSE 4020 / ESS 5020 Time Series and Spectral Analysis, 2 Oct 2019. Notes 3: Forecasting. From CM2009 5.9.

5.9 Forecasting from regression
5.9.1 Introduction
A forecast is a prediction into the future. In the context of time series regression, a forecast involves extrapolating a fitted model into the future by evaluating the model function for a new series of times. The main problem with this approach is that the trends present in the fitted series may change in the future. Therefore, it is better to think of a forecast from a regression model as an expected value conditional on past trends continuing into the future. So this is based purely on fitting a deterministic model to the time series, with no treatment of random fluctuations and their potential memory.
A few remarks on models and forecasting; STATIONARY time series. Brockwell and Davis, Chapter 5
Chatfield (2000,2001), Time Series Forecasting
Chatfield stresses the value of plotting the data carefully. More from Chatfield (2000): http://www.ccoms-imsuerj.org.br/capfts/2011/uploads/Chatfield_Ch_Tim_Series_Forecasting(2000)(1st%20edition)(280).pdf https://www.crcpress.com/Time-Series-Forecasting/Chatfield/p/book/9781584880639
So far we have only looked at univariate methods
Cryer and Chan, Time Series Analysis with applications in R: Chapter 9 – electronic access through library https://link.springer.com/content/pdf/10.1007%2F978-0-387-75959-3.pdf
“Forecasting is like driving a car blindfolded with help from someone looking out of the rear window” –Anonymous
Chatfield (2001), Chapter 4: This chapter turns attention directly to the topic which is the prime focus of this book, namely forecasting the future values of a time series. In particular, this chapter describes a variety of univariate forecasting methods. Suppose we have observations on a single time series denoted by x1, x2, . . . , xN and wish to forecast xN+h for h = 1, 2, . . .. A univariate forecasting method is a procedure for computing a point forecast, x̂N(h), based only on past and present values of the given series (possibly augmented with a function of time such as a linear trend).
Different authors use different notations for forecasts, but they usually involve a hat (^).
Note that here the wi are WEIGHTS, not white noise. A necessary preliminary is to determine the trend and seasonal components so that the remaining component, xt, is stationary. Note that the R procedure "decompose" uses running averages for these components – not very useful for forecasting in my view.
decompose {stats} R Documentation
Classical Seasonal Decomposition by Moving Averages
Description Decompose a time series into seasonal, trend and irregular components using moving averages. Deals with additive or multiplicative seasonal component.
Usage decompose(x, type = c("additive", "multiplicative"), filter = NULL)
Arguments:
x: A time series.
type: The type of seasonal component. Can be abbreviated.
filter: A vector of filter coefficients in reverse time order (as for AR or MA coefficients), used for filtering out the seasonal component. If NULL, a moving average with symmetric window is performed.
Details: The additive model used is Y[t] = T[t] + S[t] + e[t]. The multiplicative model used is Y[t] = T[t] * S[t] * e[t]. The function first determines the trend component using a moving average (if filter is NULL, a symmetric window with equal weights is used), and removes it from the time series. Then, the seasonal figure is computed by averaging, for each time unit, over all periods. The seasonal figure is then centered. Finally, the error component is determined by removing trend and seasonal figure (recycled as needed) from the original time series. This only works well if x covers an integer number of complete periods.
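The moving-average scheme described above can be sketched in Python (illustrative only; R's decompose() is the reference implementation, and the function and variable names here are my own):

```python
import numpy as np

def decompose_additive(x, period):
    """Classical additive decomposition in the style of R's decompose():
    trend from a centred moving average, seasonal figure from averaging
    each position in the cycle, remainder as what is left over."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Centred moving average: for an even period, use a window of
    # period+1 points with half weights at the two ends.
    if period % 2 == 0:
        w = np.r_[0.5, np.ones(period - 1), 0.5] / period
    else:
        w = np.ones(period) / period
    trend = np.convolve(x, w, mode="same")
    half = len(w) // 2
    trend[:half] = np.nan          # ends where the window does not fit
    trend[n - half:] = np.nan
    detrended = x - trend
    # Seasonal figure: average over all periods for each time unit, then centre
    fig = np.array([np.nanmean(detrended[i::period]) for i in range(period)])
    fig -= fig.mean()
    seasonal = np.resize(fig, n)
    random = x - trend - seasonal
    return trend, seasonal, random
```

As the Details note says, this only works well when x covers an integer number of complete periods; the centred moving average also annihilates a sinusoid of exactly that period, so the seasonal figure of such a series is recovered almost exactly.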
e.g. from earlier notes Lake Huron water levels. (Notes 1a, p10-12)
Another Example: Power Demand in Ontario, ONTpower.csv: http://www.ieso.ca/en/Power-Data. Part of the first day:
Date       Hour  Total Demand  Ontario Demand  DOY
01-Jan-17    1      17172         13522       0.042
01-Jan-17    2      16757         13117       0.083
01-Jan-17    3      16370         12816       0.125
01-Jan-17    4      16075         12605       0.167
01-Jan-17    5      16050         12563       0.208
01-Jan-17    6      15797         12544       0.250
01-Jan-17    7      15861         12758       0.292
01-Jan-17    8      16137         13065       0.333
01-Jan-17    9      16143         13367       0.375
01-Jan-17   10      16530         13644       0.417
01-Jan-17   11      16628         13728       0.458
01-Jan-17   12      16637         13555       0.500
01-Jan-17   13      16596         13567       0.542
01-Jan-17   14      16515         13441       0.583
01-Jan-17   15      16623         13442       0.625
01-Jan-17   16      16986         13926       0.667
01-Jan-17   17      17892         14942       0.708
01-Jan-17   18      19171         16279       0.750
Want to import these data into R or other software.
read.csv(file, header = TRUE, sep = ",", quote = "\"", dec = ".", fill = TRUE, comment.char = "", ...)
> ONTP <- read.csv("D:/4020-2017/ONTpower.csv", header=TRUE, sep=";")
> class(ONTP)
[1] "data.frame"
> dim(ONTP)
[1] 2160 5
> ONTP[1:4,1:5]
       Date Hour Total.Demand Ontario.Demand   DOY
1 01-Jan-17    1        17172          13522 0.042
2 01-Jan-17    2        16757          13117 0.083
3 01-Jan-17    3        16370          12816 0.125
>
> DOY <- ONTP[,5]
> DOY[1:10]
 [1] 0.042 0.083 0.125 0.167 0.208 0.250 0.292 0.333 0.375 0.417
> ONTD <- ONTP[,4]
> ONTD[1:10]
 [1] 13522 13117 12816 12605 12563 12544 12758 13065 13367 13644
> ONTD.ts = ts(ONTD, delta=1/24)
> class(ONTD.ts)
[1] "ts"
> ONTDD <- decompose(ONTD.ts, "additive", filter=NULL)
> plot(ONTDD)
We set the period as 24 hrs. Problem is that the "trend" includes a weekly variation. Maybe set the frequency as 7x24 = 168:
> ONTDW.ts = ts(ONTD, delta=1/168)
> ONTDWD <- decompose(ONTDW.ts, "additive", filter=NULL)
> plot(ONTDWD)
[Figure: plot(ONTDD) – Decomposition of additive time series, 24-hour period: panels observed, trend, seasonal, random vs. Time.]
[Figure: plot(ONTDWD) – Decomposition of additive time series, 168-hour (weekly) period: panels observed, trend, seasonal, random vs. Time.]
Exponential Smoothing as a forecast method. CM2009, Chapter 3 (3.4) and B&D, Section 9.2 (2002) or 10.2 (2016).

Chatfield (2001): Perhaps the best known (ad hoc) forecasting method is that called exponential smoothing (ES). Simplest is simple exponential smoothing (SES). A one-step-ahead forecast xfN(1) for xN+1 is based on a geometric(?) sum of past observations [take care with notation! xfN(h) is the forecast h steps ahead of xN, α < 1]:

xfN(1) = αxN + α(1−α)xN−1 + α(1−α)²xN−2 + α(1−α)³xN−3 + ...   (1)

where α is the smoothing parameter. Equivalent to:

xfN(1) = αxN + (1−α)xfN−1(1)

CM 2009: 3.4.1 Exponential smoothing
Our objective is to predict some future value xn+k given a past history {x1, x2, . . . , xn} of observations up to time n. In this subsection we assume there is no systematic trend or seasonal effects in the process, or that these have been identified and removed. The mean of the process can change from one time step to the next, but we have no information about the likely direction of these changes. A typical application is forecasting sales of a well-established product in a stable market. The model is

xt = μt + wt   (3.14)

where μt is the non-stationary mean of the process at time t and wt are independent random deviations with a mean of 0 and a standard deviation σ. We will follow the notation in R and let at be our estimate of μt. Given that there is no systematic trend, an intuitively reasonable estimate of
the mean at time t is given by a weighted average of our observation at time t and our estimate of the mean at time t − 1:

at = αxt + (1 − α)at−1,   0 < α < 1   (3.15)

The at in Equation (3.15) is the exponentially weighted moving average (EWMA) at time t. The value of α determines the amount of smoothing, and it is referred to as the smoothing parameter. If α is near 1, there is little smoothing and at is approximately xt. This would only be appropriate if the changes in the mean level were expected to be large by comparison with σ. At the other extreme, a value of α near 0 gives highly smoothed estimates of the mean level and takes little account of the most recent observation. This would only be appropriate if the changes in the mean level were expected to be small compared with σ. A typical compromise figure for α is 0.2, since in practice we usually expect that the change in the mean between time t − 1 and time t is likely to be smaller than σ. Alternatively, R can provide an estimate for α, and we discuss this option below.

Since we have assumed that there is no systematic trend and that there are no seasonal effects, forecasts made at time n for any lead time are just the estimated mean at time n. The forecasting equation is

xfn(k) = an,   k = 1, 2, . . .   (3.16)

Calculating the exponentially weighted mean: the Matlab Econometrics Toolbox™ User's Guide (3944 pages !!!) has no matches for Exponential Smoothing. But see https://en.wikipedia.org/wiki/Exponential_smoothing
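Equations (3.15) and (3.16) amount to a two-line recursion. A minimal sketch in Python (the function name and the initialisation a1 = x1 are my own choices; R's HoltWinters() and the Matlab code later in these notes initialise differently):

```python
def ses(x, alpha):
    """Simple exponential smoothing: a_t = alpha*x_t + (1-alpha)*a_{t-1}
    (Equation 3.15), initialised with a_1 = x_1."""
    a = [x[0]]
    for xt in x[1:]:
        a.append(alpha * xt + (1 - alpha) * a[-1])
    return a

level = ses([10.0, 12.0, 11.0, 13.0], alpha=0.2)
# Equation (3.16): the forecast for every lead time k is the last level.
forecast = level[-1]
```

Note the flat forecast function: with no trend or seasonal term, xfn(k) = an for all k.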
It is available in smoothts.m but "smoothts will be removed in a future release. Use smoothdata instead." and it doesn't seem to be available there. A simple code below:

clear %AR2 with exponential smoother
ARUNS=9; STEPS=50;
alpha1 = 5/6; alpha2 = -1/6;
exalpha = 0.2;
exalpha, alpha1, alpha2, ARUNS, STEPS
rng(0); %for repeatability
RN = randn(STEPS, 1); %different from assignment 2
Y(1)=0; Y(2)=0; A(1)=0; A(2)=0;
for s = (3:STEPS)
    Y(s) = alpha1*Y(s-1) + alpha2*Y(s-2) + RN(s);
    A(s) = exalpha*Y(s) + (1-exalpha)*A(s-1);
end
plot(Y) %comes in red
hold on
plot(A)
%YR=10;
xlim([0,STEPS]);
%ylim([-YR,YR])
title('AR2 Example with exponential smoothing')
hold off
pause
[Figures: exponential smoothing results for exalpha = 0.2 and exalpha = 0.8.]
For forecasts one would use the value of A at each step as the forecast for all future times.
Let's try with smoothts: smoothts(Y,"e",0.2). 'smoothts' requires the Financial Toolbox, which needs to be downloaded and installed! AND
>> YS=smoothts(Y,"e",0.2)
Warning: SMOOTHTS will be removed in a future release. Use SMOOTHDATA instead.
[Figures: smoothts results for 0.2, 0.8 and 0.5 and 0.2.]
YS=smoothdata(Y,"e",0.5);
Error using smoothdata>parseInputs (line 356)
Smoothing method must be 'movmean', 'movmedian', 'gaussian', 'lowess', 'loess', 'rlowess', 'rloess', or 'sgolay'.
i.e. no exponential smoothing available!
Also note Shumway-Stoffer on EWMA (Exponentially Weighted Moving Average): "In addition, the model leads to a frequently used, and abused, forecasting method called exponentially weighted moving averages (EWMA)... In EWMA, the parameter α (= 1−λ in S-S) is often called the smoothing parameter and is restricted to be between zero and one. Larger values of 1−α (i.e. smaller α) lead to smoother forecasts. This method of forecasting is popular because it is easy to use; we need only retain the previous forecast value and the current observation to forecast the next time period. Unfortunately, as previously suggested, the method is often abused because some forecasters do not verify that the observations follow an IMA(1,1) process, and often arbitrarily pick values of α."

3.4.2 Holt-Winters method – from CM 2009
We usually have more information about the market than exponential smoothing can take into account. Sales are often seasonal, and we may expect trends to be sustained for short periods at least. But trends will change. If we have a successful invention, sales will increase initially but then stabilize before declining as competitors enter the market. We will refer to the change in level from one time period to the next as the slope. Seasonal patterns can also change due to vagaries of fashion and variation in climate, for example. The Holt-Winters method was suggested by Holt (1957) and Winters (1960), who were working in the School of Industrial Administration at Carnegie Institute of Technology, and uses exponentially weighted moving averages to update estimates of the seasonally adjusted mean (called the level), slope, and seasonals.

The seasonal component adds complexity – Brockwell and Davis just update the slope. Easier if there is no seasonal component, i.e. st = 0 in the equations below. Note that the "slope" is the increment per time step.
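For the no-seasonal case (st = 0), the Holt updates, written here in the notation of CM 2009 with at the level, bt the slope, and α, β the two smoothing parameters, are:

```latex
a_t = \alpha x_t + (1-\alpha)\,(a_{t-1} + b_{t-1}), \qquad
b_t = \beta\,(a_t - a_{t-1}) + (1-\beta)\, b_{t-1}, \qquad
x^f_t(k) = a_t + k\, b_t .
```

These correspond to the exalpha and exbeta updates in the Matlab code below.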
So, except for the seasonal component, we extrapolate forward in time from an exponentially smoothed value with an exponentially smoothed slope.

clear %HoltWinters – simple, no seasonal
STEPS=50;
alpha1 = 5/6; alpha2 = -1/6;
exalpha = 0.5; exbeta = 0.5;
rng(0); %for repeatability
RN = randn(STEPS, 1); %different from assignment 2
Y(1)=0; Y(2)=0; A(1)=0; A(2)=0; B(1)=0; B(2)=0;
for s = (3:STEPS)
    Y(s) = alpha1*Y(s-1) + alpha2*Y(s-2) + RN(s);
    A(s) = exalpha*Y(s) + (1-exalpha)*(A(s-1)+B(s-1));
    B(s) = exbeta*(A(s)-A(s-1)) + (1-exbeta)*B(s-1);
    YH1(s+1) = A(s) + B(s);
    YH5(s+5) = A(s) + 5*B(s);
end
plot(Y,'black')
hold on
%plot(A)
plot(YH1,'red--+')
plot(YH5,'green--o')
%YR=10;
xlim([0,STEPS+5]);
%ylim([-YR,YR])
title('AR2 Example with Holt-Winters smoothing')
hold off
[Figure: Holt-Winters example with exalpha = 0.5, exbeta = 0.5 (HW 0.5, 0.5).]
Some Holt-Winters Matlab code exists on the web, but it is not in the Matlab packages as far as I could find.
Partial Autocorrelation: Some notes from Matlab Help, 22 Oct 2019

The theoretical ACF and PACF for the AR, MA, and ARMA conditional mean models are known, and quite different for each model. The differences in ACF and PACF among models are useful when selecting models. The following summarizes the ACF and PACF behavior for these models.

Model       ACF                      PACF
AR(p)       Tails off gradually      Cuts off after p lags
MA(q)       Cuts off after q lags    Tails off gradually
ARMA(p,q)   Tails off gradually      Tails off gradually

From Wikipedia: In time series analysis, the partial autocorrelation function (PACF) gives the partial correlation of a stationary time series with its own lagged values, regressed on the values of the time series at all shorter lags. It contrasts with the autocorrelation function, which does not control for other lags. This function plays an important role in data analysis aimed at identifying the extent of the lag in an autoregressive model. The use of this function was introduced as part of the Box–Jenkins approach to time series modelling, whereby plotting the partial autocorrelation function one could determine the appropriate lags p in an AR(p) model or in an extended ARIMA(p,d,q) model.
Cowpertwait and Metcalfe, p81
4.5.6 Partial autocorrelation From Equation (4.21), the autocorrelations are non-zero for all lags even though in the underlying model xt only depends on the previous value xt−1 (Equation (4.18)). The partial autocorrelation at lag k is the correlation that results after removing the effect of any correlations due to the terms at shorter lags. For example, the partial autocorrelation of an
AR(1) process will be zero for all lags greater than 1. In general, the partial autocorrelation at lag k is the kth coefficient of a fitted AR(k) model, BUT ONLY for lag k, not k−1 etc. If the underlying process is AR(p), then the coefficients αk will be zero for all k > p. Thus, an AR(p) process has a correlogram of partial autocorrelations that is zero after lag p. Hence, a plot of the estimated partial autocorrelations can be useful when determining the order of a suitable AR process for a time series. In R, the function pacf can be used to calculate the partial autocorrelations of a time series and produce a plot of the partial autocorrelations against lag (the 'partial correlogram'). Also look at Shumway and Stoffer, Section 3.3.

So let's build simple AR(1) and AR(2) time series and look at their ACFs and PACFs.

% AR2D.m
clear
STEPS=10000;
sigma = 1.0;
alpha1 = 0.8; alpha2 = 0;
Y=1:STEPS;
rng(0); %for repeatability
RN = sigma * randn(STEPS, 1);
Y(1)= RN(1); Y(2)=RN(2);
for s = (3:STEPS)
    Y(s) = alpha1*Y(s-1) + alpha2*Y(s-2) + RN(s);
end
plot(Y)
title('AR Examples')
xlabel('X step'); ylabel('Y displacement');
pause
autocorr(Y);
pause
parcorr(Y)
Note that for AR(1), ρk = α1^k (here α2 = 0). So if α1 = 0.8: ρ1 = 0.8, ρ2 = 0.64, ρ3 = 0.512, etc.
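This can be checked numerically. A sketch in Python (illustrative; the variable names are my own), simulating a long AR(1) series and comparing the sample ACF to α1^k:

```python
import numpy as np

# Simulate a long AR(1) series with alpha1 = 0.8 and compare the
# sample ACF at lags 1-3 with the theoretical rho_k = alpha1**k.
rng = np.random.default_rng(0)
alpha1, n = 0.8, 100_000
e = rng.standard_normal(n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = alpha1 * y[t - 1] + e[t]

def sample_acf(x, k):
    """Sample autocorrelation at lag k (biased normalisation)."""
    x = x - x.mean()
    return float(x[:-k] @ x[k:]) / float(x @ x)

for k in (1, 2, 3):
    print(k, round(sample_acf(y, k), 3), round(alpha1 ** k, 3))
```

With a series this long, the sample values sit within a couple of hundredths of 0.8, 0.64, and 0.512.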
With alpha1 = 0.6, alpha2 = 0.3, sigma = 1 (can change that):
We can show for an AR(2) series that ρ1 = α1/(1−α2) = 0.857, ρ2 = α1ρ1 + α2 = 0.8143, ρ3 = α1ρ2 + α2ρ1 = 0.746.
Note that pacf(2) ≈ α2 and that α1 = ρ1(1−α2) = 0.6.
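These Yule-Walker relations are easy to verify with a few lines of arithmetic (a sketch in Python; the values match those quoted above):

```python
# Yule-Walker recursion for an AR(2) with alpha1 = 0.6, alpha2 = 0.3:
# rho_1 = alpha1/(1 - alpha2), rho_k = alpha1*rho_{k-1} + alpha2*rho_{k-2}.
alpha1, alpha2 = 0.6, 0.3
rho1 = alpha1 / (1 - alpha2)
rho2 = alpha1 * rho1 + alpha2
rho3 = alpha1 * rho2 + alpha2 * rho1
print(round(rho1, 4), round(rho2, 4), round(rho3, 4))  # 0.8571 0.8143 0.7457
# Inverting the first relation recovers alpha1 from rho1 and alpha2:
assert abs(rho1 * (1 - alpha2) - alpha1) < 1e-12
```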
Lake Erie Water Levels

So, if you have a sample time series and the pacf drops to near zero after 1 or 2 steps, you might be able to approximate/model that series with an AR(1) or AR(2) model, and use that for short-term forecasting. Recall the Lake Erie water levels. Try with all the data, no trend removal etc.

%LakeErie.m
clear
YY=csvread("E:4020/GTLAKES/YY.csv");
YA=csvread("E:4020/GTLAKES/YA.csv");
plot(YY,YA); %works without transpose
title('Lake Erie Annual level')
xlabel('YEAR'); ylabel('Height (m)');
axis([1920 2020 174 176]);
pause
autocorr(YA); %ACF of the levels YA, not the year vector
pause
parcorr(YA)
But probably best to remove the trend and then fit a model. Linear regression fit:

%LakeErie.m
clear
YY=csvread("E:4020/GTLAKES/YY.csv");
YA=csvread("E:4020/GTLAKES/YA.csv");
YYD=YY-YY(1);
YAM = mean(YA)
YAD = YA-YAM;
mdl=fitlm(YYD,YAD)
plot(YYD,YAD); %works without transpose
title('Lake Erie Annual level perturbation')
xlabel('Years from 1918'); ylabel('Height departure from Mean (m)');
hold on
plot(YYD,mdl.Fitted)
hold off
YYS = YAD-mdl.Fitted;
YYSM=mean(YYS)
YYSVAR=var(YYS,1)
pause

Now let's look at YYS, the stochastic part. YYSM = 8.6839e-17; YYSVAR = 0.0939.

hold off
YYS = YAD-mdl.Fitted;
YYSM=mean(YYS);
YYSVAR=var(YYS,1);
pause
autocorr(YYS);
pause
parcorr(YYS)
The inference is that an AR(2) model might fit.

pause; autocorr(YYS); ACYYS=autocorr(YYS)
pause; parcorr(YYS); PACYYS=parcorr(YYS)

ACYYS = 1.0000, 0.8200, 0.5787, 0.4113, 0.2947, 0.2431, ...
PACYYS = 1.0000, 0.8258, -0.2882, 0.1082, -0.0411, 0.1140, ...

So the implication is that an AR(2) model with α2 = -0.2882 and α1 = ρ1(1−α2) = 1.056 might be appropriate.
Is this stationary??

clear
STEPS=10000;
alpha1 = 1.056; alpha2 = -0.2882;
alpha1, alpha2, STEPS
p=[-alpha2 -alpha1 1];
rts=roots(p);
rts
if min(abs(rts))<=1
    'rts <=1'
    pause; % could we stop?
end
Y=1:STEPS;
rng(0); %for repeatability
RN = 5*randn(STEPS, 1);
Y(1)= RN(1); Y(2)=RN(2);
for s = (3:STEPS)
    Y(s) = alpha1*Y(s-1) + alpha2*Y(s-2) + RN(s);
end
plot(Y)
title('AR Examples')
xlabel('X step'); ylabel('Y displacement');
pause
autocorr(Y);
pause
parcorr(Y)
rts = 1.8321 + 0.3367i, 1.8321 - 0.3367i. Both roots have modulus ≈ 1.86 > 1, so yes, the fitted AR(2) is stationary.
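The same root check can be reproduced outside Matlab. A Python sketch using the fitted coefficients (stationarity requires both roots of 1 − α1z − α2z² to lie outside the unit circle):

```python
import numpy as np

# Roots of the AR(2) characteristic polynomial 1 - alpha1*z - alpha2*z^2,
# with coefficients in descending-power order as in the Matlab call roots(p).
alpha1, alpha2 = 1.056, -0.2882
rts = np.roots([-alpha2, -alpha1, 1.0])
print(rts, np.abs(rts))   # ~1.83 +/- 0.34i, modulus ~1.86 > 1: stationary
```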
One and 2 year forecasts.

plot(YYD, YYS, 'black');
title('Lake Erie Annual level perturbation from Linear best fit')
xlabel('Years from 1918'); ylabel('Height departure from Mean (m)');
hold on
alpha1=1.056; alpha2=-0.2882;
YYSF1(1)=0; YYSF1(2)=0; YYSF2(1)=0; YYSF2(2)=0;
for s = 3:101
    YYSF1(s) = alpha1*YYS(s-1) + alpha2*YYS(s-2); % AR(2), 1 step ahead
    YYSF2(s) = alpha1*YYSF1(s-1) + alpha2*YYS(s-2); % AR(2), 2 steps ahead
    PYSF1(s)=YYS(s-1); % persistence, 1 step
    PYSF2(s)=YYS(s-2); % persistence, 2 steps
end % seems to create row vectors
plot(YYD, YYSF1, 'blue');
plot(YYD, YYSF2, 'red')
Black is the original data, blue is the forecast from 1 step back, red is the forecast from 2 steps back. Are these any better than persistence (PYSF1, PYSF2)?

% Not sure why I need to transpose, but .... – because my original data were 1 column and many rows
MY1 = mean(YYSF1'-YYS)
VAR1 = var(YYSF1'-YYS,1)
MY2 = mean(YYSF2'-YYS)
VAR2 = var(YYSF2'-YYS,1)
MPY1 = mean(PYSF1'-YYS)
VARP1 = var(PYSF1'-YYS,1)
MPY2 = mean(PYSF2'-YYS)
VARP2 = var(PYSF2'-YYS,1)

MY1 = -0.0036, VAR1 = 0.0286
MY2 = -0.0044, VAR2 = 0.0612
MPY1 = -0.0046, VARP1 = 0.0340
MPY2 = -0.0045, VARP2 = 0.0781
var(YYS) ans = 0.0949
mean(YYS) = 3.7923e-17

So the AR(2) forecasts have smaller error variance than persistence at both lead times (0.0286 < 0.0340 and 0.0612 < 0.0781).
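As a sanity check (my own addition, using the sample moments quoted above), the one-step error variance VAR1 ≈ 0.0286 should be close to the innovation variance implied by the fitted AR(2), σw² = σx²(1 − α1ρ1 − α2ρ2):

```python
# Innovation (one-step forecast error) variance implied by the fitted
# AR(2), using the sample ACF and variance reported above.
alpha1, alpha2 = 1.056, -0.2882
rho1, rho2 = 0.8200, 0.5787          # from ACYYS
var_x = 0.0939                        # sample variance of YYS
var_w = var_x * (1 - alpha1 * rho1 - alpha2 * rho2)
print(round(var_w, 4))                # close to the empirical VAR1 = 0.0286
```

The agreement (about 0.028 vs. 0.0286) supports the AR(2) fit.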
------------------------------------------------------