Forecasting U.S. Gasoline Prices: ARIMA Modeling Using R

Srinivas KNS


Post on 29-Jul-2015


Gasoline (Quick Facts)

Gasoline, or petrol, is a transparent, petroleum-derived liquid that is used primarily as a fuel in internal combustion engines.

A 42-gallon barrel of crude oil produces 19 gallons of gasoline through fractional distillation.

Some of the main components of gasoline are isooctane, butane, 3-ethyltoluene, and the octane enhancer MTBE.

Gasoline Consumption in the U.S.

Gasoline accounts for about 66% of all the energy used for transportation, 47% of all petroleum consumption, and 17% of total U.S. energy consumption.

Consumption in 2010 was about 138 billion gallons, an average of about 378 million gallons per day.

63% of the crude oil for gasoline manufacturing is imported.

Retail Price of Gasoline and Its Components

Average retail price up to 2013: 3.43 dollars per gallon

Average retail price in 2013: 3.58 dollars per gallon

Objective of the Analysis

To understand price volatility due to supply and demand constraints.

A study of gasoline prices will help in designing better pricing mechanisms for markets.

Trend estimates allow the investigation of empirical regularities that characterize the gasoline markets.

Data Description

The data are obtained from the U.S. Energy Information Administration (EIA) and are publicly available.

The data set is U.S. All Grades All Formulations Retail Gasoline Prices (Dollars per Gallon), covering April 1993 to November 2014. In total there are 260 monthly observations.

Data Collection Mechanism and Policy

Every Monday, retail prices for all three grades of gasoline (regular, midgrade, and premium) are collected by telephone from a sample of approximately 900 retail gasoline outlets.

The prices are published by 5:00 P.M. Monday, except on government holidays, when the data are released on Tuesday (but still represent Monday's price).

The reported price includes all taxes and is the pump price paid by a consumer as of 8:00 A.M. Monday.

Variable Description

Variable name: description

1. time: time index in ts format, running from April 1993 to November 2014

2. Gasoline prices (gp): monthly prices of gasoline corresponding to each time point

3. Lag value of gasoline prices (lgp): monthly prices of gasoline lagged by one period

4. First difference of gasoline prices: the difference between gp and lgp
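The lagged and first-differenced variables can be sketched as follows. This is a minimal illustration in Python with NumPy, not the author's R code; the price values are made up for the example, and the name `dgp` for the differenced series is an assumption.

```python
import numpy as np

# Hypothetical monthly prices in dollars per gallon (illustrative values, not EIA data).
gp = np.array([1.08, 1.10, 1.09, 1.12, 1.15])

# Lag by one period: the first month has no predecessor, so it is NaN.
lgp = np.concatenate(([np.nan], gp[:-1]))

# First difference: current price minus the previous month's price.
dgp = gp - lgp

print(lgp)
print(dgp)
```

The first element of the differenced series is undefined, which is why a differenced series has one fewer usable observation than the original.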

Box-Jenkins Methodology

The Box-Jenkins methodology is built on the ARIMA (AutoRegressive Integrated Moving Average) model.

It is a stochastic modelling approach.

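AR Process

An autoregressive model forecasts the variable of interest using a linear combination of past values of the variable. An AR model of order p can be written as

$$y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \dots + \phi_p y_{t-p} + e_t$$

where $e_t$ is white noise, $e_t \sim \text{iid } N(0, \sigma_e^2)$.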
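To make the "linear combination of past values" concrete, here is a small sketch that simulates an AR(1) series, recovers its coefficient by least squares, and produces a one-step-ahead forecast. The parameter values, the seed, and all variable names are assumptions chosen for illustration; this is not the estimation routine used in the presentation.

```python
import numpy as np

rng = np.random.default_rng(42)
n, c, phi = 5000, 0.5, 0.7    # hypothetical AR(1) parameters

# Simulate y_t = c + phi * y_{t-1} + e_t with Gaussian white noise.
e = rng.normal(size=n)
y = np.empty(n)
y[0] = c / (1 - phi)          # start at the process mean
for t in range(1, n):
    y[t] = c + phi * y[t - 1] + e[t]

# Estimate c and phi by regressing y_t on a constant and y_{t-1}.
X = np.column_stack([np.ones(n - 1), y[:-1]])
(c_hat, phi_hat), *_ = np.linalg.lstsq(X, y[1:], rcond=None)

# One-step-ahead forecast: a linear combination of the latest observation.
forecast = c_hat + phi_hat * y[-1]
print(f"phi_hat = {phi_hat:.3f}, forecast = {forecast:.3f}")
```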

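MA Process

A moving average model uses past forecast errors in a regression-like model. An MA model of order q can be written as

$$y_t = c + e_t + \theta_1 e_{t-1} + \theta_2 e_{t-2} + \dots + \theta_q e_{t-q}$$

where $e_t$ is white noise, $e_t \sim \text{iid } N(0, \sigma_e^2)$.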

A Combined ARIMA Model
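Combining the AR and MA processes with differencing gives the general ARIMA(p,d,q) model. One common way to write it uses the backshift operator $B$, with $B y_t = y_{t-1}$. The symbols $\phi_i$, $\theta_j$, and $c$ follow the usual textbook notation and are assumed here rather than taken from the slides:

```latex
(1 - \phi_1 B - \dots - \phi_p B^p)\,(1 - B)^d\, y_t
  = c + (1 + \theta_1 B + \dots + \theta_q B^q)\, e_t
```

Here p is the order of the autoregressive part, d is the number of differences taken, and q is the order of the moving average part.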

Model Selection

Plot the data. Identify any unusual observations.

If necessary, transform the data (using a Box-Cox transformation) to stabilize the variance.

If the data are non-stationary: take first differences of the data until the data are stationary.

Examine the ACF/PACF: Is an AR(p) or MA(q) model appropriate?

Try your chosen model(s), and use the AICc to search for a better model.

Check the residuals from your chosen model by plotting the ACF of the residuals, and doing a portmanteau test of the residuals. If they do not look like white noise, try a modified model.

Once the residuals look like white noise, calculate forecasts.

Model Identification Algorithm

1. Identification of the model: choose tentative p, d, q.

2. Parameter estimation: estimate the AR and MA coefficients.

3. Diagnostic checking: are the estimated residuals white noise? If no, return to step 1. If yes, continue to step 4.

4. Forecast.

Stationarity

A stationary time series is one whose properties do not depend on the time at which the series is observed.

There are two methods to determine stationarity in a time series:

ACF and PACF plots

Unit root tests

Augmented Dickey-Fuller (ADF) test

H0: the data need to be differenced to make them stationary

Ha: the data are stationary and do not need to be differenced

Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test

H0: the observed time series is stationary

Ha: the observed time series is not stationary
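The logic of a unit root test can be sketched with a simplified Dickey-Fuller regression. This is not the augmented test named above (it omits the lagged difference terms) and it is not the routine used in the presentation; it only illustrates the decision rule. The function name, the simulated series, and the approximate 5% critical value of -2.86 for the constant-only case are assumptions of this sketch.

```python
import numpy as np

def dickey_fuller_stat(y):
    """t-statistic on rho in the regression  dy_t = alpha + rho * y_{t-1} + e_t."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    X = np.column_stack([np.ones(len(dy)), y[:-1]])
    beta, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ beta
    sigma2 = resid @ resid / (len(dy) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[1] / np.sqrt(cov[1, 1])

rng = np.random.default_rng(0)
noise = rng.normal(size=260)
random_walk = np.cumsum(noise)   # has a unit root by construction
stationary = noise               # white noise, stationary by construction

# The statistic does not follow a t distribution under H0, so it is compared
# with Dickey-Fuller critical values (approximate 5% value, constant-only case).
CRITICAL_5PCT = -2.86

for name, series in [("random walk", random_walk), ("white noise", stationary)]:
    stat = dickey_fuller_stat(series)
    verdict = "reject H0" if stat < CRITICAL_5PCT else "cannot reject H0"
    print(f"{name}: statistic = {stat:.2f} -> {verdict}")
```

A statistic well below the critical value rejects the unit-root null, which corresponds to the small p-value case of the ADF test.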

Diagnostic Checking

There are two ways to check the residuals:

ACF of the residuals, to see that there are no correlations in the residuals

Portmanteau test

H0: The data are independently distributed.

Ha: The data are not independently distributed.
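A common portmanteau test is the Ljung-Box test, which pools the squared sample autocorrelations of the residuals into a single statistic. The sketch below computes that statistic with NumPy; the slides do not say which portmanteau test was used, so treating it as Ljung-Box is an assumption, as are the function names and the simulated series.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelations at lags 1..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = x @ x
    return np.array([(x[k:] @ x[:-k]) / denom for k in range(1, max_lag + 1)])

def ljung_box_q(x, max_lag):
    """Ljung-Box statistic Q = n (n + 2) * sum_k rho_k^2 / (n - k)."""
    n = len(x)
    rho = sample_acf(x, max_lag)
    lags = np.arange(1, max_lag + 1)
    return n * (n + 2) * np.sum(rho**2 / (n - lags))

rng = np.random.default_rng(1)
white = rng.normal(size=260)          # independent draws

ar1 = np.zeros(260)                   # strongly autocorrelated series
for t in range(1, 260):
    ar1[t] = 0.8 * ar1[t - 1] + white[t]

# 95% point of the chi-square distribution with 10 degrees of freedom.
CHI2_95_DF10 = 18.31

for name, series in [("white noise", white), ("AR(1)", ar1)]:
    q = ljung_box_q(series, max_lag=10)
    print(f"{name}: Q = {q:.1f} (compare with {CHI2_95_DF10})")
```

When the test is applied to the residuals of a fitted ARIMA(p,d,q) model, the degrees of freedom are usually reduced by p + q.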

ACF and PACF Plots

Both plots indicate that the given time series is non-stationary.

Augmented Dickey-Fuller Test

A large p-value (> 0.05) indicates that the given time series is not stationary.

Kwiatkowski-Phillips-Schmidt-Shin (KPSS) Test

A small p-value suggests that the series is not stationary.

Observations from the Initial Analysis

The given time series is not stationary.

A log transformation of the series is required in order to stabilize the variance.

Transformation and Tentative Model Selection

Apply a log transformation.

Take the first difference of the transformed data.

Candidate models are compared, and the one that minimizes the AICc value is chosen.
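The two transformation steps, and the back-transformation needed later to express forecasts in dollars, can be sketched as follows. The prices are made-up illustrative values and the variable names are assumptions.

```python
import numpy as np

# Hypothetical monthly prices in dollars per gallon (illustrative values only).
prices = np.array([3.20, 3.36, 3.53, 3.35, 3.42])

log_prices = np.log(prices)
dlog = np.diff(log_prices)            # first difference of the log series

# Log differences approximate month-over-month percentage changes.
pct_change = prices[1:] / prices[:-1] - 1

# Back-transform: cumulate the differences, add the first log price, exponentiate.
recovered = np.exp(log_prices[0] + np.concatenate(([0.0], np.cumsum(dlog))))

print(np.round(dlog, 4))
print(np.round(pct_change, 4))
```

Because the model is fitted to the differenced log series, forecasts come out on the log scale and have to be cumulated and exponentiated in the same way.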

Plot and Unit Root Tests of the First-Differenced, Log-Transformed Series

[Figure: time plot (1995 to 2015) with ACF and PACF plots (lags 0 to 35) for the differenced, log-transformed gasoline price data]

Tentative Model Selection

ARIMA(p,d,q)    AIC        AICc       BIC
ARIMA(1,1,1)    -822.45    -822.35    -811.78
ARIMA(1,1,2)    -824.51    -824.36    -810.29
ARIMA(1,1,3)    -829.80    -829.57    -812.02
ARIMA(2,1,1)    -824.69    -824.54    -810.47
ARIMA(2,1,2)    -825.23    -824.99    -807.44
ARIMA(2,1,3)    -828.78    -828.44    -807.43
ARIMA(3,1,1)    -822.77    -822.54    -804.99
ARIMA(3,1,2)    -824.55    -824.22    -803.21
ARIMA(3,1,3)    -826.83    -826.38    -801.93

From the above table we can conclude that the ARIMA(1,1,3) model has the lowest AICc value and is therefore the best of the candidate models for this data.
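The relationship between the AIC and AICc columns can be checked with the usual small-sample correction, AICc = AIC + 2k(k+1)/(n-k-1). The sketch below applies it to the AIC values from the table and picks the minimum. Two things are assumptions here, not statements from the slides: that k counts the AR and MA coefficients plus the error variance, and that the effective sample size is 259 (260 observations minus one lost to differencing).

```python
# (p, q) -> AIC as reported in the table (d = 1 for every candidate).
aic_by_order = {
    (1, 1): -822.45, (1, 2): -824.51, (1, 3): -829.80,
    (2, 1): -824.69, (2, 2): -825.23, (2, 3): -828.78,
    (3, 1): -822.77, (3, 2): -824.55, (3, 3): -826.83,
}

N_EFFECTIVE = 259  # assumption: 260 observations minus one lost to differencing

def aicc(aic, p, q, n=N_EFFECTIVE):
    """Small-sample corrected AIC."""
    k = p + q + 1  # assumption: AR and MA coefficients plus the error variance
    return aic + 2 * k * (k + 1) / (n - k - 1)

aicc_by_order = {order: aicc(aic, *order) for order, aic in aic_by_order.items()}
best = min(aicc_by_order, key=aicc_by_order.get)

for (p, q), value in sorted(aicc_by_order.items()):
    print(f"ARIMA({p},1,{q}): AICc = {value:.2f}")
print("lowest AICc:", best)
```

Under these assumptions the recomputed values agree with the reported AICc column to within rounding.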

Final Model

Equation of the final model:

$$(1 - 0.68B)(1 - B)\,y_t = (1 - 0.08B - 0.41B^2 - 0.22B^3)\,e_t$$

where $B$ is the backshift operator, $B y_t = y_{t-1}$.
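Treating the operator in the equation above as the backshift (lag) operator B, the model can be expanded into a recursion and used for a one-step-ahead forecast. The sketch below does this with NumPy. The function name and argument layout are assumptions, and it takes y_t to be the log-transformed price, following the transformation step described earlier.

```python
import numpy as np

# Lag polynomials in ascending powers of B, with the coefficients as printed.
ar_poly = np.array([1.0, -0.68])                 # 1 - 0.68 B
diff_poly = np.array([1.0, -1.0])                # 1 - B
ma_poly = np.array([1.0, -0.08, -0.41, -0.22])   # 1 - 0.08 B - 0.41 B^2 - 0.22 B^3

# Multiply the AR and differencing polynomials: 1 - 1.68 B + 0.68 B^2.
lhs_poly = np.convolve(ar_poly, diff_poly)

def one_step_forecast(y_hist, e_hist):
    """Forecast y_{t+1} from past values and past residuals.

    y_hist: past values of the series, most recent last (at least 2 values).
    e_hist: past residuals, most recent last (at least 3 values).
    The future shock e_{t+1} is replaced by its expected value, zero.
    """
    y_recent = np.asarray(y_hist, dtype=float)[::-1][: len(lhs_poly) - 1]
    e_recent = np.asarray(e_hist, dtype=float)[::-1][: len(ma_poly) - 1]
    ar_part = -lhs_poly[1:] @ y_recent    # 1.68 y_t - 0.68 y_{t-1}
    ma_part = ma_poly[1:] @ e_recent      # -0.08 e_t - 0.41 e_{t-1} - 0.22 e_{t-2}
    return ar_part + ma_part

# Hypothetical recent log prices and residuals (illustrative values only).
log_forecast = one_step_forecast(np.log([3.40, 3.45]), [0.0, 0.0, 0.0])
print(np.exp(log_forecast))   # back on the dollars-per-gallon scale
```

With zero residuals the forecast continues 68% of the most recent change in the log price, which is the AR(1) behaviour of the differenced series.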

Model Validation

ACF plot of the residuals

Portmanteau test

The fitted model satisfies all the necessary diagnostic conditions.

Forecasts

Conclusion

The ARIMA(1,1,3) model has significant coefficients for all lags in the AR process, which implies that gasoline prices in period t are significantly related to past-period gasoline price levels.

The error term (MA process) also has significant coefficients for all three lagged error values. This shows that gasoline prices are significantly related both to past price levels and to unobserved factors.

This dependence is what leads to the higher gasoline prices forecast in this analysis.

Discussion