7/27/2019 Fitting and Predicting a Time Series Model
http://slidepdf.com/reader/full/fitting-and-predicting-a-time-series-model 1/14
Fitting and Predicting a Time Series Model (Midterm 2)
April 11, 2013
Sueja Goldhahn (23185225)
Prof. Guntuboyina, Intro to Time Series Analysis
Summary
As a take-home midterm assignment, the task is to fit the best possible model
to a given time series dataset and predict the outcome for the next year. The
dataset was chosen from a group of 5 datasets given for the assignment. Nothing
is known about the content of the data, other than that it is weekly data from
Google Trends, obtained on March 20, 2013. The dataset consists of 429 data
points, spanning 8 years and 13 weeks from the week of January 4, 2004 to
March 24, 2012, as seen in Figure 1. The data for the next year, from
March 25, 2012 to March 23, 2013, are to be predicted.
After analyzing the data closely, the model chosen as the best fit is a
multiplicative seasonal autoregressive integrated moving average model,
ARIMA(1,1,1)X(0,1,3) with period 52. Using this model, the resulting prediction
for the next year is given in Figures 11 and 12.
This report will describe the methods used to choose the best model for the
time series dataset, and explain why I believe this model is the best fit to the dataset.
Then I will explain the techniques used to forecast the next year.
Method Used to Fit the Model
The first things to observe in the data are trend and variability.
Visually, it is very apparent that there is a quadratic trend, along with
seasonality. The variability in the data stays constant throughout, indicating that a
transformation of the data is not necessary. This suggests that the data
needs to be differenced once to eliminate the quadratic trend, and then differenced
again to eliminate the seasonality. The second difference is taken at lag 52,
which is the number of weeks in a year.
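The two differencing passes described above can be sketched as follows. This is a minimal illustration in Python (the appendix uses R's diff) on a toy series with a shorter period, since the real dataset is not reproduced here; `lag_diff` is a hypothetical helper.

```python
def lag_diff(x, lag=1):
    """Difference a series at the given lag: y[t] = x[t] - x[t - lag]."""
    return [x[t] - x[t - lag] for t in range(lag, len(x))]

# Toy weekly-style series with a quadratic trend and a period-4 seasonal cycle;
# the real data would use lag=52 for yearly seasonality.
series = [0.1 * t * t + (t % 4) for t in range(12)]

deseasoned = lag_diff(series, lag=4)  # seasonal difference removes the cycle
detrended = lag_diff(deseasoned)      # one more lag-1 difference flattens the trend
```

Each pass shortens the series by its lag, so the combined differencing costs 52 + 1 observations of the sample.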
After differencing, the trend should be eliminated and the data
should look like white noise. The differenced data is displayed in Figure 2; the plot
visibly has no remaining structure. To check for the optimal orders of differencing,
the standard deviation of the data should be considered: correctly differenced
data should have a small standard deviation. A table of standard deviations for each
order of differencing is displayed in Figure 3, showing that the standard deviation is
indeed smallest for this choice of differencing.
Figure 3: Standard Deviation of the Differenced Data

                   Lag 1: Order 0   Order 1   Order 2
Lag 52:  Order 0      6.782          3.556     6.387
         Order 1      3.895          2.145     9.936
         Order 2      4.051          3.289    17.475
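A grid like Figure 3 can be produced by computing the sample standard deviation for every combination of lag-1 and seasonal differencing orders. A sketch in Python (the report's own computations are in R), on a toy series with period 4; `lag_diff` and `diff_sd` are hypothetical helpers:

```python
from statistics import stdev

def lag_diff(x, lag=1):
    """Difference a series at the given lag."""
    return [x[t] - x[t - lag] for t in range(lag, len(x))]

def diff_sd(x, order1, order52, period=52):
    """Standard deviation after differencing `order52` times at the
    seasonal lag and `order1` times at lag 1."""
    for _ in range(order52):
        x = lag_diff(x, lag=period)
    for _ in range(order1):
        x = lag_diff(x)
    return stdev(x)

# Trend-plus-seasonal toy series: one difference of each kind flattens it,
# so the (order1=1, order52=1) cell has the smallest standard deviation,
# mirroring the 2.145 cell chosen in Figure 3.
s = [0.5 * t + [0, 3, 1, 2][t % 4] for t in range(40)]
```

Over-differencing (order 2 in either direction) inflates the standard deviation again, which is why the table's corner entries are the largest.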
Once the correct order of differencing is attained, the next step is to look at
the autocorrelation and partial autocorrelation functions to determine the
autoregressive and moving average terms. The autocorrelation function is displayed
in Figure 4, and the partial autocorrelation function is displayed in Figure 5.
Figure 4
Figure 5
Examining the autocorrelation function gives several clues about what model to
fit. The first thing to note is that the model needs a seasonal MA term, as a result
of the negative autocorrelation at lag 52. The autocorrelation at lag 104 is also
significant, its value of 0.109 lying outside the confidence bounds. Also, there is
some asymmetry around these lags. Hence, the two seasonal orders to consider
are SMA(2) and SMA(3).
The next thing to note is that lag 1 is significant and the autocorrelations cut off
after lag 1, which is very characteristic of an MA model. An AR term may also be
mixed into the equation, given the gradually shrinking variability of the
autocorrelations as the lag increases.
The partial autocorrelation function also shows characteristics of an MA model,
as a result of its slow decay. From these observations, a few good models to test
are (each with seasonal period 52):

ARIMA(0,1,1)X(0,1,3)
ARIMA(0,1,1)X(0,1,2)
ARIMA(1,1,1)X(0,1,3)
ARIMA(1,1,2)X(0,1,3)
After fitting the data to each of these models, I chose the two best fits
based on their AIC scores (refer to the appendix under "Results of Each Model"
for the results of each model):

ARIMA(1,1,1)X(0,1,3)
ARIMA(1,1,2)X(0,1,3)
To see how closely the theoretical autocorrelation function of each model
matches the sample autocorrelations, I plotted the two together in
Figures 6-9. This helps show how well the models fit the data, and whether
the estimated phi and theta values are any good. For the phi and theta
values used, as well as the code, please refer to the appendix under "Results of
Each Model".
Figure 6
Figure 7
Figure 8
Figure 9
The blue points are the autocorrelations from the data, and the red points are the
theoretical autocorrelations. Both models capture the structure of the
autocorrelations, which suggests that the chosen models are a good fit. For the
second model, which has the smaller AIC, the theoretical autocorrelations appear
to fit slightly better than for the first.
Choosing the Best Model for Predicting
To determine which of the two competing models to use, the models must be
tested for how well it predicts the data. That is done through cross-validation. I will
first explain the methodology used in cross-validating the data, and then show the
results of the Cross Validation score.
Given that there are 8 years and 13 weeks in the dataset, cross-validation is
done by predicting held-out years and taking the sum of squared errors of the
predictions. It is best to predict as many years within the data as possible, but
the number of data points required to fit the model limits how many years can be
predicted. Obviously, the first year of data cannot be predicted, as there are no
earlier data points to predict from.

A minimum of about 4 years of data is needed to fit the model, so I
used data points 1 through 221 to predict points 222 to 273. Then I
used data points 1 through 273 to predict points 274 to 325, and so
on. The accuracy of the predictions is measured by comparing the predicted data
to the actual data, taking the differences and summing their squares.
This is done for each of the 4 predicted years, and the results are averaged to
obtain the cross-validation (CV) score. The model with the smallest CV score is
the best model for the dataset, and will be used to predict the next year's data.
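The expanding-window scheme described above can be sketched as follows. This is a Python illustration (the report's actual computation is in R, in the appendix), with a deliberately naive seasonal forecaster standing in for the fitted ARIMA model, since refitting ARIMA would require a full estimation routine; `naive_seasonal_forecast` and `cv_score` are hypothetical helpers.

```python
def naive_seasonal_forecast(history, horizon, period=52):
    """Hypothetical stand-in forecaster: repeat the last observed season."""
    last_season = history[-period:]
    return [last_season[h % period] for h in range(horizon)]

def cv_score(data, first_train_len, horizon, n_folds, forecaster):
    """Expanding-window CV: average the sum of squared forecast errors
    over n_folds consecutive horizon-length predictions."""
    fold_sse = []
    for fold in range(n_folds):
        k = first_train_len + fold * horizon  # train on data[:k]
        preds = forecaster(data[:k], horizon)
        actual = data[k:k + horizon]
        fold_sse.append(sum((a - p) ** 2 for a, p in zip(actual, preds)))
    return sum(fold_sse) / len(fold_sse)
```

With `first_train_len = 221`, `horizon = 52`, and `n_folds = 4`, this reproduces the fold boundaries used in the report (points 1-221 predicting 222-273, and so on).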
The CV score results are shown in Figures 10 and 11, along with plots of
the actual data (in black) and predicted data (in red). It is evident that both
models predict the data accurately. Although the second model has a slightly
better AIC score, its CV score is significantly larger than that of the first
model. Therefore, the best model for this dataset is the first model,
ARIMA(1,1,1)X(0,1,3). This model will be used to forecast the next year's data in the
following section.
Figure 10
ARIMA(1,1,1)X(0,1,3)
CV = 479.9672
Figure 11
ARIMA(1,1,2)X(0,1,3)
CV = 495.4493
The Forecast
Now that the best model for the dataset is chosen, the next year will be
forecasted using the model. The data, including the forecast, is displayed in Figure
11. The forecasted data visually look accurate. The 95% confidence interval of the
forecasted data is exhibited in Figure 12.
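The bands in Figure 12 follow the usual Gaussian approximation: each forecast plus or minus roughly two forecast standard errors (the appendix code uses 2 as a round stand-in for 1.96). A minimal sketch in Python; `preds` and `ses` are hypothetical arrays standing in for `d1fc$pred` and `d1fc$se`:

```python
def forecast_interval(preds, ses, z=1.96):
    """Approximate 95% interval: pred +/- z * se, assuming Gaussian
    forecast errors at each horizon."""
    lower = [p - z * s for p, s in zip(preds, ses)]
    upper = [p + z * s for p, s in zip(preds, ses)]
    return lower, upper
```

Because `se` grows with the forecast horizon, the bands fan out as the forecast moves further from the last observation.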
Figure 11
Figure 12
# APPENDIX
# data: d1

plot(d1, type = "l", main = "Time Series Data", ylab = "", xlab = "Weeks",
     sub = "January 04, 2004 to March 24, 2012")

# Difference for seasonality and trend
d = diff(d1, lag = 52)
d = diff(d)

# Check to see that there is no trend left in the model
plot(d, type = "l", main = "Differenced Data", ylab = "", xlab = "")

# Check acf and pacf
acf(d, lag.max = 200, main = "Autocorrelation Function")
pacf(d, lag.max = 200, main = "Partial Autocorrelation Function")
# Results of Each Model:

# ARIMA(0,1,1)X(0,1,3)
arima(d1, order = c(0, 1, 1), seasonal = list(order = c(0, 1, 3), period = 52))
# aic = 1508.32
#          ma1     sma1    sma2    sma3
#      -0.6419  -0.3395  0.2599  0.1885
# s.e.  0.0515   0.0610  0.0750  0.0770

# ARIMA(0,1,1)X(0,1,2)
arima(d1, order = c(0, 1, 1), seasonal = list(order = c(0, 1, 2), period = 52))
# aic = 1512.92
#          ma1     sma1    sma2
#      -0.6290  -0.3430  0.2827
# s.e.  0.0513   0.0653  0.0748

# ARIMA(1,1,1)X(0,1,3)
arima(d1, order = c(1, 1, 1), seasonal = list(order = c(0, 1, 3), period = 52))
# aic = 1504.52
#          ar1      ma1     sma1    sma2    sma3
#       0.2149  -0.7890  -0.3429  0.2711  0.1873
# s.e.  0.0815   0.0547   0.0604  0.0766  0.0772

# ARIMA(1,1,2)X(0,1,3)
arima(d1, order = c(1, 1, 2), seasonal = list(order = c(0, 1, 3), period = 52))
# aic = 1504.34
#          ar1      ma1     ma2     sma1    sma2    sma3
#       0.6451  -1.2347  0.3173  -0.3372  0.2759  0.1881
# s.e.  0.1904   0.2061  0.1557   0.0603  0.0758  0.0771
# Theoretical Autocorrelation of ARIMA(1,1,1)X(0,1,3)
# phi and theta values: the expanded MA polynomial has nonzero
# coefficients at lags 1, 52, 53, 104, 105, 156 and 157
ph = 0.2149
th = c(-0.789, rep(0, 50), -0.3429, -0.3429 * -0.789,
       rep(0, 50), 0.2711, 0.2711 * -0.789,
       rep(0, 50), 0.1873, 0.1873 * -0.789)

acf(d, lag.max = 175, main = "Theoretical Autocorrelation of ARIMA(1,1,1)X(0,1,3)", col = "blue")
ACF = ARMAacf(ar = ph, ma = th, lag.max = 175)
points(x = 0:175, y = ACF, col = "red", type = "h")

pacf(d, lag.max = 175, main = "Theoretical Partial Autocorrelation of ARIMA(1,1,1)X(0,1,3)", col = "blue")
PACF = ARMAacf(ar = ph, ma = th, lag.max = 175, pacf = T)
points(x = 1:175, y = PACF, col = "red", type = "h")

# Theoretical Autocorrelation of ARIMA(1,1,2)X(0,1,3)
# phi and theta values:
ph = 0.6451
th = c(-1.2347, 0.3173, rep(0, 49), -0.3372, -0.3372 * -1.2347, -0.3372 * 0.3173,
       rep(0, 49), 0.2759, 0.2759 * -1.2347, 0.2759 * 0.3173,
       rep(0, 49), 0.1881, 0.1881 * -1.2347, 0.1881 * 0.3173)

acf(d, lag.max = 175, main = "Theoretical Autocorrelation of ARIMA(1,1,2)X(0,1,3)", col = "blue")
ACF = ARMAacf(ar = ph, ma = th, lag.max = 175)
points(x = 0:175, y = ACF, col = "red", type = "h")

pacf(d, lag.max = 175, main = "Theoretical Partial Autocorrelation of ARIMA(1,1,2)X(0,1,3)", col = "blue")
PACF = ARMAacf(ar = ph, ma = th, lag.max = 175, pacf = T)
points(x = 1:175, y = PACF, col = "red", type = "h")
# Cross-validation
# model 1: ARIMA(1,1,1)X(0,1,3)
pred = rep(0, 208)
CV = rep(0, 4)

for (i in 0:3) {
  k = 221 + i * 52
  nd1 = d1[1:k]
  d1fit = arima(nd1, order = c(1, 1, 1), seasonal = list(order = c(0, 1, 3), period = 52))
  d1fc = predict(d1fit, n.ahead = 52)
  pred[(1 + i * 52):((i + 1) * 52)] = as.numeric(d1fc$pred)
  CV[i + 1] = sum((d1[(k + 1):(221 + (i + 1) * 52)] - as.numeric(d1fc$pred))^2)
}
mean(CV)
plot(c(222:429), d1[222:429], type = "l",
     main = "Comparison of Prediction to Actual Data", xlab = "Time", ylab = "")
points(c(222:429), pred, col = "red", type = "l")

# model 2: ARIMA(1,1,2)X(0,1,3)
pred = rep(0, 208)
CV = rep(0, 4)

for (i in 0:3) {
  k = 221 + i * 52
  nd1 = d1[1:k]
  d1fit = arima(nd1, order = c(1, 1, 2), seasonal = list(order = c(0, 1, 3), period = 52))
  d1fc = predict(d1fit, n.ahead = 52)
  pred[(1 + i * 52):((i + 1) * 52)] = as.numeric(d1fc$pred)
  CV[i + 1] = sum((d1[(k + 1):(221 + (i + 1) * 52)] - as.numeric(d1fc$pred))^2)
}
mean(CV)

plot(c(222:429), d1[222:429], type = "l",
     main = "Comparison of Prediction to Actual Data", xlab = "Time", ylab = "")
points(c(222:429), pred, col = "red", type = "l")
# The Forecast Using Model ARIMA(1,1,1)X(0,1,3)
d1fit = arima(d1, order = c(1, 1, 1), seasonal = list(order = c(0, 1, 3), period = 52))
d1fc = predict(d1fit, n.ahead = 52)

# Approximate 95% interval bounds
U = d1fc$pred + 2 * d1fc$se
L = d1fc$pred - 2 * d1fc$se

newx = 1:481
newy = c(d1, d1fc$pred)

plot(newx, newy, type = "l", main = "Data Including the Forecast",
     xlab = "Weeks", ylab = "", sub = "January 04, 2004 to March 23, 2013")

plot(430:481, d1fc$pred, type = "l", ylim = c(60, 110),
     main = "Forecast and 95% Confidence Interval for the Next Year",
     xlab = "Weeks", ylab = "", sub = "March 25, 2012 to March 23, 2013")
points(newx[430:481], U, col = "blue", type = "l")
points(newx[430:481], L, col = "blue", type = "l")