handout trx

Ganesh Manjhi

Workshop on Econometric packages

at Indian Council for Research on

International Economic Relations

EVIEWS

29 May 2008

Programme: Eviews Training by Ganesh Manjhi

Ph.D., JNU, New Delhi-67


OUTLINE OF CONTENTS:

I. INTRODUCTORY SESSION

II. TIME SERIES ANALYSIS & FORECASTING

III. ECONOMETRIC METHODS WITH EViews

I. INTRODUCTORY SESSION

Target: This session will equip participants to take decisions based on time series data. The regression techniques covered will be particularly useful for those interested in forecasting, and in relating a variable to (or predicting it from) one or more explanatory variables. It covers the basic elements of ordinary least squares (OLS) models as well as time series econometrics and forecasting.

Prerequisite: Attendees should be familiar with MS Office, especially MS Excel, and have a basic knowledge of statistics: descriptive statistics, both numerical (mean, standard deviation, standard error, etc.) and graphical (histogram, scatter plot, etc.), hypothesis testing and confidence intervals.

Sub Contents:

Workfile basics: creating a new workfile or loading an existing one into memory, along with an explanation of data management.

Basic Data Analysis: a brief description of statistical graphs from series and groups and descriptive statistics.

The Classical Linear Regression Model: methods of estimation (least squares, maximum likelihood), dummy variables, autocorrelation and heteroskedasticity.

Duration: 5-6 pm, 1 hour, 29th May

Refer to Handout 1.


II TIME SERIES ANALYSIS & FORECASTING

Target: To provide basic knowledge of structural time series models using EViews. Attendees will be able to estimate and analyze econometric models using EViews.

Prerequisite: Knowledge of the topics listed below and of the material covered in Session I.

Sub Contents:

Univariate Methods: Exponential Smoothing, Decomposition, ARIMA Modeling

Multivariate Methods: Single / Multiple Regression Model, 2SLS

Duration: 1 hour, date yet to be decided

III ECONOMETRIC METHODS WITH EViews

Sub Contents:

Dummy variable

Granger Causality Test

Maximum Likelihood Estimation

Cointegration analysis and vector autoregression (VAR) modeling


Handout

(A) Window Basics

Creating a workfile and entering data.

Plotting the series, running OLS regressions and computing descriptive statistics.

***********************************************************************

To create a workfile, click File/New/Workfile, as shown in the following dialogue box.


The workfile dialogue will appear as shown below:

As the data are time series, select Dated - regular frequency, enter the frequency and the start and end dates, and click OK.

Next, to enter the data by importing them, save the Excel sheet as a text file and click File/Import/Read Text-Lotus-Excel.


In the import dialogue, enter names for the series or, if the names are already in the file, the number of series; in our case we write “9”.

The imported data will appear in the workfile as follows:


(B) Multivariate Methods

o We have the following variables in the workfile:

cons – cement consumption demand

gdp – gross domestic product

cngdp – construction share in GDP

wpi – wholesale price index

plr – prime lending rate, as a proxy for the interest rate

prdn – total production

capin – installed capacity

gdcf – gross domestic capital formation

To save the workfile, click File/Save As on the main toolbar.

To plot the graph:

Highlight cons and plr in the workfile and double-click.

Select Open Group.

Click View/Graph/Line in the group spreadsheet.

Similarly, you can plot a scatter diagram. To plot the regression line along with the scatter plot, click View/Graph/Scatter/Scatter with Regression.


For descriptive statistics:

Highlight the two series in the workfile and double-click.

Click Open Group.

Click View/Descriptive Statistics in the group spreadsheet.

To run the OLS regression:

Click Objects/New Object/Equation or Quick/Estimate Equation on the main toolbar. The following dialogue box will appear. Type the dependent variable (cons) first, followed by the constant (c) and the independent variables (gdp, wpi, plr), separated by spaces (cons c gdp wpi plr), and click OK.

cons = f(c, gdp, wpi, plr)


The estimation output view of the equation will appear.

The fitted values can also be viewed along with the actual values of cons and the residuals by clicking View/Actual, Fitted, Residual Table.


(i) Estimation of Regression Models


o Two-variable regression models.

o Multiple regression models.

Obtain the descriptive statistics for the series.

To regress cons on a constant and plr [cons = f(c, plr)]:

The F-statistic at the bottom right of the table tests the joint significance of the coefficients (excluding the constant) in the regression. Since there is only one slope coefficient, the F-statistic equals the square of the t-statistic of plr.
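The equality F = t² in a two-variable regression can be checked numerically. The following is a minimal Python sketch (not EViews code) that estimates y = a + b*x by OLS and computes both statistics; the plr and cons values are hypothetical and only illustrate the arithmetic.

```python
# Two-variable OLS: the regression F-statistic equals the squared slope t-statistic.
# The data are hypothetical; they are not the workshop's cement data.
import math


def simple_ols(x, y):
    """Return (intercept, slope, slope t-statistic, F-statistic) for y = a + b*x + e."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    ssr = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))  # residual sum of squares
    sst = sum((yi - y_bar) ** 2 for yi in y)                   # total sum of squares
    s2 = ssr / (n - 2)                                         # error variance estimate
    t_b = b / math.sqrt(s2 / sxx)                              # t-statistic of the slope
    f_stat = ((sst - ssr) / 1) / (ssr / (n - 2))               # F with 1 and n-2 df
    return a, b, t_b, f_stat


plr = [8.0, 9.5, 10.0, 11.5, 12.0, 13.0]      # hypothetical lending rates
cons = [50.0, 47.0, 46.5, 44.0, 42.5, 41.0]   # hypothetical consumption
a, b, t_b, f_stat = simple_ols(plr, cons)
print(f"slope={b:.4f}  t={t_b:.4f}  t^2={t_b ** 2:.4f}  F={f_stat:.4f}")
```

With more than one slope coefficient the F-statistic tests all slopes jointly and no longer equals any single squared t-statistic.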

The Durbin-Watson statistic (0.859), well below 2, points to positive first-order serial correlation in the residuals.

R-squared, the coefficient of determination, measures the goodness of fit of the model.
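The R-squared, adjusted R-squared and Durbin-Watson figures in the output can be reproduced from the actual and fitted values. A short Python sketch, assuming a regression with a constant and k estimated coefficients; the four observations and fitted values below are hypothetical:

```python
def fit_statistics(y, fitted, k):
    """R-squared, adjusted R-squared and Durbin-Watson d for a regression with
    k coefficients (including the constant). Assumes the model has a constant."""
    n = len(y)
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    y_bar = sum(y) / n
    ssr = sum(e * e for e in resid)                    # residual sum of squares
    sst = sum((yi - y_bar) ** 2 for yi in y)           # total sum of squares
    r2 = 1 - ssr / sst
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k)
    # d compares successive residuals; roughly 2 * (1 - first-order autocorrelation)
    dw = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, n)) / ssr
    return r2, adj_r2, dw


# Hypothetical data whose residuals alternate in sign (negative autocorrelation)
r2, adj_r2, dw = fit_statistics([1.0, 2.0, 3.0, 4.0], [1.5, 1.5, 3.5, 3.5], k=2)
```

Alternating residuals give d above 2; residuals that stay on one side of zero for long runs, as in the plr regression, push d towards 0.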


The Multiple Regression Model:

To generate lagged and growth series, click the Genr button on the workfile toolbar and type, one at a time:

wpi1=wpi(-1)

gdpgr=(gdp-gdp(-1))/gdp(-1)

cngdp1=cngdp(-1)
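What these Genr commands compute can be sketched in plain Python, with None standing in for the NA that EViews places in the first period. This is an illustration only; the helper names and the numbers are hypothetical.

```python
def lag(series, k=1):
    """Like EViews x(-k): the value k periods earlier, None (NA) for the first k periods."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return [None] * k + series[:-k]


def growth(series):
    """Like gdpgr = (gdp - gdp(-1)) / gdp(-1); None where the lag is unavailable."""
    return [None if prev is None else (cur - prev) / prev
            for cur, prev in zip(series, lag(series))]


gdp = [100.0, 110.0, 121.0]       # hypothetical values
wpi1 = lag([95.0, 97.0, 99.0])    # lagged series
gdpgr = growth(gdp)               # growth series
```

The lost first observation is why EViews adjusts the estimation sample automatically when lagged regressors are used.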

To regress cons on a constant, gdp, plr and wpi [cons = f(c, gdp, plr, wpi)]:

Click Quick /Estimate Equation and specify the equation in the dialog box.

Click OK to get the estimation output

Alternatively, special functions can be used directly in the equation. For instance, @PCH(gdp) gives the one-period percentage change (growth) in GDP, expressed as a decimal fraction, i.e. the same as gdpgr above.


Click View/Actual, Fitted, Residual/Graph to view the regression graphically

To test whether two or more variables are jointly significant, carry out a Wald F-test as follows:

Click View/Coefficient Tests/Wald - Coefficient Restrictions.


The Wald F-statistic appears as shown below; from the p-value we see that wpi and the growth rate of gdp are jointly significant.
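In a plain OLS regression with the usual (non-robust) covariance matrix, the Wald F-statistic for linear zero restrictions equals the familiar comparison of restricted and unrestricted residual sums of squares. A small Python sketch of that formula; the RSS values and sample size are hypothetical:

```python
def wald_f(rss_restricted, rss_unrestricted, q, n, k):
    """F-statistic for q linear restrictions.

    rss_restricted:   residual sum of squares with the restrictions imposed
    rss_unrestricted: residual sum of squares of the full model with k coefficients
    n:                number of observations
    """
    return ((rss_restricted - rss_unrestricted) / q) / (rss_unrestricted / (n - k))


# Hypothetical: dropping two regressors raises the RSS from 100 to 120 (n = 44, k = 4)
f_stat = wald_f(120.0, 100.0, q=2, n=44, k=4)
```

The result is compared with the F(q, n - k) distribution; a small p-value means the restricted variables are jointly significant.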


(ii) Multicollinearity

o How to deal with the problem of multicollinearity.

*****************************************************************

o Click Quick/Estimate Equation to run the following regression:

cons = f(c, gdp, cngdp, gdcf, wpi, plr)

In the regression result only cngdp is significant. The R² is very high, yet the individual variables are not significant: a highly significant overall F-statistic combined with low individual t-statistics is a typical symptom of multicollinearity. To check for it, look at the correlation matrix of all the variables: highlight all the variables in the above equation in the workfile, Open Group and click View/Correlations.


The high correlation coefficient of 0.98 between cngdp and gdp prevents the regression from identifying the separate effects of these two variables.
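How severe a correlation of 0.98 is can be quantified with the variance inflation factor (VIF). When a model has only two regressors (plus a constant), VIF = 1/(1 - r²); with more regressors the r² is replaced by the R² of an auxiliary regression of one regressor on the others. A Python sketch, with hypothetical series for the correlation check:

```python
import math


def correlation(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_regressors(r):
    """Variance inflation factor when the model has only two regressors (plus a constant)."""
    return 1.0 / (1.0 - r * r)


vif = vif_two_regressors(0.98)   # the gdp-cngdp correlation reported in the text
print(round(vif, 2))
```

A VIF of about 25 means the sampling variance of each of the two coefficients is roughly 25 times what it would be with uncorrelated regressors, which is why their t-statistics collapse.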

Drop gdp and re-run the regression. The results obtained are:

The results improve overall: the adjusted R², the Akaike (AIC) and the Schwarz criteria all improve.

The model can be refined further after checking for serial correlation and heteroskedasticity.


(iii) Serial Correlation

o How to deal with issues related to serial correlation

***********************************************************************

Continuing with the regression above:

Since the Durbin-Watson statistic is close to 2, there is no evidence of first-order serial correlation, but higher-order serial correlation may still be present. To check for it, plot the residuals and carry out the Breusch-Godfrey LM test (View/Residual Tests/Serial Correlation LM Test) at various lags.


From the F-statistic, we reject the null hypothesis of no serial correlation and conclude that there is fourth-order serial correlation. To correct for it, add ar(4) to the equation specification.

After re-estimation, the F-statistic of the LM test no longer rejects the null of no serial correlation at the 1% level of significance.


We can also check the residuals to see whether the results have improved.

Serial correlation with a lagged dependent variable as a regressor

The estimation output of the equation cons = f(c, cngdp, cngdp1, gdp1, plr, wpi, gdcf) appears as:

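In the presence of a lagged dependent variable among the regressors, the Durbin-Watson statistic is biased towards 2, so Durbin's h statistic is used to test for serial correlation:

$$h = \left(1 - \frac{d}{2}\right)\sqrt{\frac{n}{1 - n\,\widehat{\mathrm{Var}}(\hat{\beta})}}$$

where d is the Durbin-Watson statistic, n the number of observations and Var(β̂) the estimated variance of the coefficient on the lagged dependent variable.

Create a coefficient vector for the result: type coef(10) result in the command window.

To store the h statistic in the first row of the result vector, use the following command:

coef result(1)=(1-@dw/2)*(@regobs/(1-@regobs*@covariance(7,7)))^.5

Here @covariance(7,7) is the estimated variance of the seventh coefficient, so the lagged dependent variable must be the seventh term in the equation specification.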

The h statistic, -0.91, lies between -1.96 and 1.96, the two-sided 5% critical values of the standard normal distribution. Hence the null hypothesis of no serial correlation is not rejected.
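The same computation as the EViews command above, written as a small Python function so the arithmetic and the "negative square root" case are explicit. The inputs in the usage lines are hypothetical; var_lag_coef plays the role of @covariance(7,7).

```python
import math


def durbin_h(dw, n, var_lag_coef):
    """Durbin's h = (1 - dw/2) * sqrt(n / (1 - n * Var(b))).

    var_lag_coef is the estimated variance of the coefficient on the lagged
    dependent variable.
    """
    denom = 1 - n * var_lag_coef
    if denom <= 0:
        raise ValueError("n * Var(b) >= 1: h is undefined, use the residual regression test")
    return (1 - dw / 2) * math.sqrt(n / denom)


def rejects_no_serial_correlation(h, critical=1.96):
    """Two-sided 5% test against the standard normal distribution."""
    return abs(h) > critical


h = durbin_h(1.8, 40, 0.005)   # hypothetical DW, sample size and variance
```

For the value reported in the text, rejects_no_serial_correlation(-0.91) is False, matching the conclusion above.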


If the term inside the square root is negative, the h statistic cannot be computed. In that case use the following alternative.

In the command window type genr res=resid

Run the following regression:

res c res(-1) cngdp cngdp1 gdp1 plr wpi gdcf

Under the null of no serial correlation, the coefficient of res(-1) should not be significantly different from zero. From the p-value, the null hypothesis is not rejected.


(iv) Heteroskedasticity

o How to deal with heteroskedasticity

************************************************************************

Select the variables gdp, cngdp, cons, gdcf, plr and wpi and Open as Group to view the graph of the selected series.

To include a time trend in the cement consumption function, use the special function @trend(). For example, @trend(92.01) generates a series with value 0 in 1992:01, value 1 in 1992:02, and so on.

Click Quick/Estimate Equation and specify the following in the dialogue box:

cons c @trend(92.01) cngdp gdcf wpi plr

This specification estimates the cons function with a trend.


The estimation output appears as:

To view the regression graphically, click View/Actual, Fitted, Residual Graph.

The following graph appears:


Observation: heteroskedasticity is suggested by the residuals fluctuating more widely in the middle periods.

To carry out White's test, which does not require specifying the form of heteroskedasticity, click View/Residual Tests/White Heteroskedasticity (cross terms). The following result appears:

The upper panel gives the test statistics under the null hypothesis of homoskedasticity and the associated p-values. The F-statistic is the Wald version of the test and Obs*R-squared is the Lagrange multiplier (LM) version.

Observing the p-values, the null hypothesis of homoskedasticity is not rejected. Nevertheless, since the residual plot suggests heteroskedasticity, the remedies are illustrated below: in the presence of heteroskedasticity the usual OLS standard errors are incorrect.

To obtain heteroskedasticity-consistent standard errors, re-estimate the cons function and click Options; there, select Heteroskedasticity Consistent Covariance (White).


Click OK; the following estimation output appears:

The coefficient estimates are unchanged, but the standard errors have changed.

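To generate the weighting series for efficient (weighted least squares) estimation of the cons function, type the following in the command window:

genr w = 1/(@trend(92.1))^0.5

This weighting corresponds to assuming that the error variance grows in proportion to the time trend. Note that @trend(92.1) is zero in 1992:1, so w is not defined (NA) for that observation.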

Re-estimate the cons function, click Options, tick Weighted LS/TSLS and specify the weight as w.
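Weighted least squares with a weight series can be sketched for a single regressor. This Python sketch assumes, as we understand EViews' weighting, that each weight acts as an inverse standard deviation, so the estimator minimises the sum of (w_i * e_i)²; the trend is started at 1 to avoid the division by zero noted above, and all data are hypothetical.

```python
import math


def wls_simple(x, y, w):
    """Weighted least squares for y = a + b*x + e, minimising sum((w_i * e_i) ** 2).

    w_i is treated as an inverse standard deviation, so observations with a
    larger assumed error variance get less influence.
    """
    v = [wi * wi for wi in w]                              # squared weights
    sv = sum(v)
    x_bar = sum(vi * xi for vi, xi in zip(v, x)) / sv      # weighted means
    y_bar = sum(vi * yi for vi, yi in zip(v, y)) / sv
    sxy = sum(vi * (xi - x_bar) * (yi - y_bar) for vi, xi, yi in zip(v, x, y))
    sxx = sum(vi * (xi - x_bar) ** 2 for vi, xi in zip(v, x))
    b = sxy / sxx
    return y_bar - b * x_bar, b


t = list(range(1, 9))                             # trend starting at 1
w = [1 / math.sqrt(ti) for ti in t]               # analogue of w = 1/@trend^0.5
x = [2.0, 3.0, 5.0, 4.0, 6.0, 8.0, 7.0, 9.0]      # hypothetical regressor
y = [2.0 + 3.0 * xi for xi in x]                  # exact line, so any weights give a=2, b=3
a, b = wls_simple(x, y, w)
```

With all weights equal to one the function reduces to ordinary least squares, which is a quick sanity check on the formulas.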


The estimation output appears as:

Note that EViews reports both weighted and unweighted statistics. Here WLS gives the better results.


(v) Granger Causality Test

Select the variables, Open as Group and click View/Granger Causality; select the number of lags in the Lag Specification as shown below.

You get the following output:


(vi) Forecasting with a Single-Equation Regression Model

Change the sample to 1992:Q2 to 2001:Q4.

Click Quick/Estimate Equation to estimate the following equation:

Demand: $DD_t = c_0 + c_1 WPI_t + c_2 GDP_{t-1} + c_3 CNGDP_t + c_4 PLR_t + e_t$

Check for serial correlation and heteroskedasticity to obtain efficient estimates. Multicollinearity, however, is not a problem for forecasting.

Forecast Option

To forecast from the model:

$DD_t = c_0 + c_1 WPI_t + c_2 GDP_{t-1} + c_3 CNGDP_t + c_4 PLR_t + e_t$

Click the Forecast button.

Give the sample range for the forecast as 2002:Q1 to 2002:Q4.

Give a name for the forecast series, say consf1, and, in the S.E. field, a name for the forecast standard error series, say se, which is needed later for the forecast interval.

Click OK. Note that the forecasting method used here is the static method, as there are no lagged dependent variables on the right-hand side of the equation (EViews User's Guide, Chapter 15).

The following table of the Forecast Evaluations would appear:


To plot the forecast series together with the actual series:

Set the sample range to cover the forecast period, e.g. 2000:01 to 2002:04.

Highlight the two series in the workfile.

Open the two series as a group (Open/as Group) and choose View/Graph. The graph shows that the forecasts initially over-predict but under-predict after 2002:02. This can also be seen from the bias proportion (0.15) in the table above.
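The measures in the forecast evaluation table can be recomputed by hand. The Python sketch below follows the definitions as we understand them from the EViews documentation: RMSE, MAE, MAPE, the Theil inequality coefficient, and the decomposition of the mean squared error into bias, variance and covariance proportions, which sum to one. The four actual and forecast values are hypothetical.

```python
import math


def forecast_evaluation(actual, forecast):
    """Forecast accuracy measures and Theil's decomposition of the mean squared error."""
    h = len(actual)
    errors = [f - a for a, f in zip(actual, forecast)]
    mse = sum(e * e for e in errors) / h
    mean_a, mean_f = sum(actual) / h, sum(forecast) / h
    sd_a = math.sqrt(sum((a - mean_a) ** 2 for a in actual) / h)   # divisor h, not h-1
    sd_f = math.sqrt(sum((f - mean_f) ** 2 for f in forecast) / h)
    cov = sum((a - mean_a) * (f - mean_f) for a, f in zip(actual, forecast)) / h
    return {
        "rmse": math.sqrt(mse),
        "mae": sum(abs(e) for e in errors) / h,
        "mape": 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / h,
        "theil_u": math.sqrt(mse) / (math.sqrt(sum(f * f for f in forecast) / h)
                                     + math.sqrt(sum(a * a for a in actual) / h)),
        "bias_proportion": (mean_f - mean_a) ** 2 / mse,
        "variance_proportion": (sd_f - sd_a) ** 2 / mse,
        "covariance_proportion": 2 * (sd_f * sd_a - cov) / mse,
    }


stats = forecast_evaluation([10.0, 12.0, 11.0, 13.0], [11.0, 12.0, 10.0, 14.0])
```

A large bias proportion means the forecasts are systematically above or below the actual values; ideally most of the error falls in the covariance proportion.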

To plot the actual and forecast series along with the forecast interval:

Change the sample range to 2001:01 to 2002:04.

Generate the upper and lower bounds of the (approximately 95%) forecast interval with the following commands:

genr up=consf1+2*se

genr low=consf1-2*se

Change the sample range by typing smpl 2000:1 2002:4

Highlight cons, consf1, up and low, Open Group and choose View/Graph.


(vii) Simultaneous Equation Estimation

o How to estimate a system of equations

o How to carry out the Hausman test

o How to do 2SLS

***********************************************************************

Demand: $DD_t = c_0 + c_1 WPI_t + c_2 GDP_{t-1} + c_3 CONTN_t + c_4 PLR_t + e_t$

Supply: $SS_t = a_0 + a_1 WPI_t + a_2 WPI_{t-1} + a_3 CONTN_t + a_4 CAP_t + u_t$

Equilibrium: $DD_t = SS_t$

This is an example of a simultaneous equation model. Here WPI, DD and SS are treated as endogenous variables and the remaining variables as exogenous or predetermined, where DD = cement consumption demand and SS = cement supply.

Further, to find the equilibrium price level, the WPI equation can be written as follows:

$WPI_t = f(WPI_{t-1}, GDP_t, GDP_{t-1}, CONTN_t, PLR_t, CAP_t)$

To test for endogeneity (simultaneity) of the WPI variable, first regress WPI on WPI(-1), GDP, GDP(-1), CONTN, PLR and c.



Click Genr in the workfile window and type res=resid (alternatively, type genr res=resid in the command window). This saves the residuals for the Hausman test, which is carried out by an auxiliary regression.

Next, click Quick/Estimate Equation to estimate the following equation:

DD = f(WPI, GDP(-1), CONTN, PLR, res)



Under the null hypothesis that WPI is exogenous, the coefficient of res should not be significantly different from zero. Here the p-value shows that the coefficient of res is significantly different from zero at the 5% level of significance, so WPI is endogenous at the 5% level. When the null hypothesis is rejected, OLS estimates of the cement demand equation are biased and inconsistent.

o How to do 2SLS?

Regress WPI on all the exogenous variables in the system. This was done above in the first regression.

Next, obtain the fitted values from this regression by typing the following in the command window:

GENR WPIHAT=WPI-RES

Estimate the following equation:

dd c wpihat gdp(-1) contn plr

The results of the estimation are given as:


Compare these estimates with those from the Hausman auxiliary regression. Note that the standard errors from this regression are not correct, because its residuals are computed with WPIHAT instead of the actual WPI.
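Why the manual second stage gives the right coefficients but the wrong standard errors can be seen in a tiny example. This Python sketch has one endogenous regressor x and one instrument z (a just-identified case, where 2SLS coincides with the simple IV estimator); the data are hypothetical. It compares the residual variance from the second-stage OLS, which uses the fitted xhat, with the correct one, which uses the actual x.

```python
def ols_line(x, y):
    """Intercept and slope of an OLS regression of y on a constant and x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b


def manual_2sls(y, x, z):
    """Two-stage least squares by hand, one endogenous regressor x and one instrument z.

    Returns the 2SLS coefficients and two residual variances: the naive one from the
    second-stage OLS (residuals use xhat) and the correct one (residuals use x).
    """
    g0, g1 = ols_line(z, x)                  # first stage: x on the instrument
    xhat = [g0 + g1 * zi for zi in z]        # analogue of WPIHAT = WPI - RES
    a, b = ols_line(xhat, y)                 # second stage: y on xhat
    dof = len(y) - 2
    naive_s2 = sum((yi - a - b * xh) ** 2 for yi, xh in zip(y, xhat)) / dof
    correct_s2 = sum((yi - a - b * xi) ** 2 for yi, xi in zip(y, x)) / dof
    return a, b, naive_s2, correct_s2


a, b, naive_s2, correct_s2 = manual_2sls(y=[3.0, 2.0, 6.0, 5.0, 8.0],
                                         x=[2.0, 1.0, 4.0, 3.0, 6.0],
                                         z=[1.0, 2.0, 3.0, 4.0, 5.0])
```

The coefficients agree with the IV formula cov(z, y)/cov(z, x), but the two residual variances differ, and the standard errors inherit that difference. The TSLS estimator described next uses residuals based on the actual regressor.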

o To obtain the correct standard errors of the 2SLS estimates:

Click Estimate in the equation window.

In the estimation settings, choose TSLS as the method.

List all the exogenous variables in the system, including the constant, in the instrument list.

In the present case these are:

c gdp(-1) contn plr cap

The following window will appear:


Click OK to get the following results:

These are the correctly calculated 2SLS standard errors. From here we can carry out static or recursive forecasting as for the single-equation model and calculate the 95% confidence band.


(C) Univariate Methods

(i) Decomposition

o Modelling and Forecasting with the Classical Decomposition (Multiplicative) Method

********************************************************************

To open the series, double-click cons.

Next, click Proc, then Seasonal Adjustment. Select Ratio to moving average - Multiplicative, enter a name for the adjusted (deseasonalised) series and click OK. Use a new name rather than cons, so that the original series is not overwritten.


The output result will look like

Since the sum of the scaling factors is 4.0079, adjust them so that they sum to 4, the number of seasons in a year. For this, type

scalar c1a = (1.047819/4.0079)*4

scalar c2a = (0.921718/4.0079)*4

scalar c3a = (0.961611/4.0079)*4

scalar c4a = (1.076753/4.0079)*4

in the command window. This gives the adjusted scaling factors c1a = 1.045754, c2a = 0.919901, c3a = 0.959716 and c4a = 1.074631.
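The rescaling is simple proportional arithmetic. A Python check using the four factors from the output (here divided by their exact sum rather than the rounded 4.0079, which changes the results only in the seventh decimal):

```python
raw = [1.047819, 0.921718, 0.961611, 1.076753]   # scaling factors from the output
total = sum(raw)                                  # about 4.0079
adjusted = [f / total * len(raw) for f in raw]    # rescaled so that the factors sum to 4
print([round(f, 6) for f in adjusted])
```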

To get the seasonal indexes, first generate a seasonal dummy for each season in the following way:

To calculate the seasonal index, click Genr.

The seasonalindex series will look like:


To deseasonalize the series, divide it by the adjusted seasonal indexes: type consd = cons/seasonalindex

Estimate the trend-cycle regression equation using the deseasonalised data (consd). Before running the regression, generate a trend variable: choose Genr and type trend=@trend(1992.02) in the dialogue box. To make the trend equal to 1 in the second quarter of 1992, generate another series by typing trend1 = trend + 1 in the dialogue box.


The fitted trend (= a + bt) can be calculated by typing fitted_trend = consd-resid in the dialogue box (resid holds the residuals of the trend regression just estimated).

Multiply the fitted trend values by the appropriate seasonal factors to compute the fitted values: type fittedvalue = fitted_trend*seasonalindex in the Genr window.


Type residual = cons-fittedvalue in the Genr window to calculate the errors and measure the accuracy of fit.
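The whole multiplicative decomposition (deseasonalise, fit a linear trend, reseasonalise, take residuals) can be sketched in Python. This assumes c1a to c4a belong to quarters 1 to 4 and that the sample starts in 1992:Q2; the series is synthetic, built as trend times seasonal factor, so the fit should be exact.

```python
SEASONAL = {1: 1.045754, 2: 0.919901, 3: 0.959716, 4: 1.074631}  # c1a..c4a, assumed Q1..Q4


def decomposition_fit(cons, quarters):
    """Multiplicative decomposition fit: deseasonalise, fit a linear trend, reseasonalise."""
    idx = [SEASONAL[q] for q in quarters]                  # seasonalindex
    consd = [c / s for c, s in zip(cons, idx)]             # consd = cons/seasonalindex
    trend1 = list(range(1, len(cons) + 1))                 # 1, 2, 3, ...
    n = len(cons)
    t_bar, d_bar = sum(trend1) / n, sum(consd) / n
    b = (sum((t - t_bar) * (d - d_bar) for t, d in zip(trend1, consd))
         / sum((t - t_bar) ** 2 for t in trend1))
    a = d_bar - b * t_bar
    fitted_trend = [a + b * t for t in trend1]             # a + b*t
    fitted = [ft * s for ft, s in zip(fitted_trend, idx)]  # fittedvalue
    residual = [c - f for c, f in zip(cons, fitted)]       # residual = cons - fittedvalue
    return a, b, fitted, residual


quarters = [2, 3, 4, 1, 2, 3, 4, 1]                        # 1992:Q2 onwards
cons = [(5 + 0.5 * t) * SEASONAL[q] for t, q in zip(range(1, 9), quarters)]  # synthetic
a, b, fitted, residual = decomposition_fit(cons, quarters)
```

With real data the residuals are not zero, and their size is what the accuracy measures below summarise.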


To measure the accuracy of fit, i.e. the root mean squared error (RMSE), double-click residual, click View and select Descriptive Statistics/Histogram and Stats. The standard deviation of the residual series approximates the RMSE (the two coincide when the residuals have zero mean, apart from the n versus n-1 divisor).

To get the adjusted R-squared, first obtain the standard deviation of the cons series by following the same steps as for the residuals. The standard deviation is 5.403 (5.402913). Then type

scalar rbar2 = 1-(0.879528/5.402913)^2

in the command window. The adjusted R-squared equals 0.97350015, which is quite high: of the original variance of the cons series, (5.402913)² ≈ 29.19, more than 97% has been removed by decomposing it into seasonal and trend components. The RMSE of 0.879528 indicates a very good fit.
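The difference between the RMSE and the standard deviation shown under Histogram and Stats, and the rbar2 arithmetic, in Python. The residuals in the usage lines are hypothetical; the two figures in rbar2 are those quoted in the text.

```python
import math


def rmse(residuals):
    """Root mean squared error: square root of the mean squared residual."""
    return math.sqrt(sum(e * e for e in residuals) / len(residuals))


def sample_sd(values):
    """Standard deviation with the n - 1 divisor, as in the descriptive statistics view."""
    n = len(values)
    m = sum(values) / n
    return math.sqrt(sum((v - m) ** 2 for v in values) / (n - 1))


rbar2 = 1 - (0.879528 / 5.402913) ** 2
toy = [1.0, -1.0, 1.0, -1.0]             # hypothetical zero-mean residuals
print(rmse(toy), sample_sd(toy))         # equal means, different divisors
```

For a long residual series with mean close to zero the two numbers are nearly identical, which is why the standard deviation can stand in for the RMSE here.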


Tests of Stationarity and Cointegration

o How to carry out unit root tests

o How to check for cointegration using the Johansen procedure

************************************************************************

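To carry out the ADF (Augmented Dickey-Fuller) and PP (Phillips-Perron) tests, consider three different regression equations:

$$\Delta y_t = \alpha_0 + \gamma y_{t-1} + \alpha_2 t + \sum_{i=2}^{p} \beta_i \Delta y_{t-i+1} + \varepsilon_t \qquad (1)$$

$$\Delta y_t = \alpha_0 + \gamma y_{t-1} + \sum_{i=2}^{p} \beta_i \Delta y_{t-i+1} + \varepsilon_t \qquad (2)$$

$$\Delta y_t = \gamma y_{t-1} + \sum_{i=2}^{p} \beta_i \Delta y_{t-i+1} + \varepsilon_t \qquad (3)$$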

For a sample size of 100, the complete set of test statistics is as follows:

Summary of the Dickey-Fuller Tests

| Model | Hypothesis | Test statistic | Critical values at 5% and 1% significance |
|---|---|---|---|
| Model (1) | γ = 0 | τ_τ | -3.45 and -4.04 |
| Model (1) | γ = α_2 = 0 | φ_3 | 6.49 and 8.73 |
| Model (1) | α_0 = γ = α_2 = 0 | φ_2 | 4.88 and 6.50 |
| Model (2) | γ = 0 | τ_μ | -2.89 and -3.51 |
| Model (2) | α_0 = γ = 0 | φ_1 | 4.71 and 6.70 |
| Model (3) | γ = 0 | τ | -1.95 and -2.60 |
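The mechanics of the test can be seen in the simplest case, Model (3) with no augmentation lags: regress Δy_t on y_{t-1} without a constant and take the t-statistic of γ. The key point is that this statistic is compared with the Dickey-Fuller critical values in the table (for example -1.95 at 5%), not with the usual t tables. A Python sketch on a hypothetical five-observation series:

```python
import math


def df_no_constant(y):
    """Dickey-Fuller regression dy_t = gamma * y_{t-1} + e_t (Model (3), no lags).

    Returns (gamma, t-statistic of gamma).
    """
    y_lag = y[:-1]
    dy = [y[t] - y[t - 1] for t in range(1, len(y))]
    sxx = sum(v * v for v in y_lag)
    gamma = sum(d * v for d, v in zip(dy, y_lag)) / sxx
    resid = [d - gamma * v for d, v in zip(dy, y_lag)]
    s2 = sum(e * e for e in resid) / (len(dy) - 1)   # one estimated coefficient
    return gamma, gamma / math.sqrt(s2 / sxx)


gamma, tau = df_no_constant([1.0, 2.0, 1.0, 3.0, 2.0])   # hypothetical series
reject_unit_root = tau < -1.95                            # one-tailed, 5% level
```

Models (1) and (2) add the constant and trend to the same regression, which shifts the critical values to the more negative numbers shown in the table.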

To carry out the unit root test (augmented Dickey-Fuller) for cement consumption demand, double-click cons (or right-click/Open) and choose View/Unit Root Test. We start from the most general model, Model (1), and so include a constant and trend in the ADF test equation, with the lag length chosen optimally:


The ADF test statistic reported at the top of the window is the t-statistic of cons(-1) in the test regression. Under the null of a unit root this statistic does not follow the usual t (or normal) distribution; its critical values are obtained by simulation. The unit root test is a one-tailed test of the null hypothesis of a unit root against the alternative of a stationary process (i.e. a root less than unity). For cons, we cannot reject the null of a unit root even at the 10% level of significance.

The same exercise should be carried out for Model (2) (with intercept but without trend) and Model (3) (without intercept and without trend) until the null of a unit root is rejected. If you cannot reject the null even in Model (3), the series is non-stationary; but if you reject the null of a unit root in Model (1), you stop there and declare the series stationary, and so on.


To test the joint hypothesis of a unit root and no deterministic trend (γ = α_2 = 0), we carry out a random walk test (φ_3). This test is more stringent than the unit root test alone.

Run the test equation via Quick/Estimate Equation and specify the test equation.

The result, which is similar to the ADF test equation, is:

To test the joint significance of @trend(1992:01) and cons(-1), click View/Coefficient Tests/Redundant Variables and type the variables under test in the dialog box.


The result is

The F-statistic (φ_3) under the null of a random walk does not follow the standard F distribution, so the reported p-value is not applicable. The 5% critical value from the table above is 6.49 and the estimated F-statistic is 4.91, so we cannot reject the null hypothesis of a stochastic trend (unit root) and conclude that the series is non-stationary.

Similarly, we proceed sequentially to the τ_μ and φ_1 tests for Model (2):

Double-click cons and choose View/Unit Root Test. Choose the Intercept option (i.e. Model (2)). The default lag length chosen by EViews for this series is 4. The following result is obtained:


The null of a unit root in the absence of a deterministic trend cannot be rejected even at the 10% level (for comparison, the tabulated 5% and 1% critical values for a sample size of 100 are -2.89 and -3.51).

And so on.

Similarly, you can carry out the Phillips-Perron test, using the same critical values as in the ADF case. Other unit root tests available in EViews include KPSS (Kwiatkowski-Phillips-Schmidt-Shin), DF-GLS and ERS point-optimal (Elliott-Rothenberg-Stock), among others.

Repeat the unit root tests for the other variables, such as gdp, plr and wpi. If all the variables are non-stationary in levels, we expect them to be stationary in first differences, i.e. integrated of order one, and we can then carry out a cointegration test to find the long-run relationship among the variables. Any series that is stationary in levels is introduced as an exogenous variable.

o VAR Model:

Highlight the variables cons, gdp, plr and wpi and choose Open/as VAR; the following dialog box appears:


To choose the lag interval for the endogenous variables, start from a maximum lag and compare the Akaike (AIC) and Schwarz (SC) information criteria; the lag length with the minimum AIC and SC is optimal. However, the lag interval can also be set, for example, as 1-4, 2-4 or 3-6, depending on your requirements.

After selecting the lag values and specifying the variables, click OK; the results appear as follows:


Diagnostic Views:

Once the VAR has been estimated, a set of diagnostic views is available under View/Lag Structure and View/Residual Tests in the VAR window.

In the VAR results window, click View/Lag Structure/AR Roots Table; the following result appears:


This table reports the inverse roots of the characteristic AR polynomial. The estimated VAR is stable (stationary) if all roots have modulus less than one, i.e. lie inside the unit circle. In our case one root lies outside the unit circle, so the VAR is not stable. If the VAR is not stable, certain results (such as impulse response standard errors) are not valid. There are kp roots, where k is the number of endogenous variables and p the largest lag. If you estimate a VEC with r cointegrating relations, k-r roots should equal unity.
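For a bivariate VAR(1) the inverse roots are simply the eigenvalues of the 2x2 coefficient matrix, so the stability check can be done by hand. A Python sketch for that special case only (for p > 1 the same idea applies to the larger companion matrix, which this sketch does not build); the coefficient values are hypothetical.

```python
import cmath


def var1_inverse_roots(a11, a12, a21, a22):
    """Eigenvalues of the 2x2 coefficient matrix of a bivariate VAR(1),
    i.e. the inverse roots of its characteristic AR polynomial."""
    trace = a11 + a22
    det = a11 * a22 - a12 * a21
    disc = cmath.sqrt(trace * trace - 4 * det)   # complex sqrt handles cycles
    return (trace + disc) / 2, (trace - disc) / 2


def is_stable(roots):
    """The VAR is stable when every root has modulus below one."""
    return all(abs(r) < 1 for r in roots)


stable = is_stable(var1_inverse_roots(0.5, 0.1, 0.2, 0.3))     # hypothetical VAR(1)
unstable = is_stable(var1_inverse_roots(1.1, 0.0, 0.0, 0.5))   # one root at 1.1
```

Complex roots come in conjugate pairs and produce oscillating impulse responses; stability still only requires their modulus to be below one.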

To carry out pairwise Granger causality tests, i.e. to test whether endogenous variables can be treated as exogenous, look at the Wald F-statistics in the results window for each equation.

Alternatively, highlight cons, gdp, plr and wpi, right-click, Open as Group and choose View/Granger Causality, specifying the same lag length as in the VAR, and check whether each endogenous variable can be treated as exogenous.

To check for serial correlation, non-normality and heteroskedasticity, click View/Residual Tests and examine whether these problems are present. If there is serial correlation, re-examine the lag length; if there is heteroskedasticity, estimate by GLS/WLS.

The VAR can be displayed in equation form with View/Representations; the following result appears:


Impulse Response Graph:

Click Impulse; in the dialog box that appears, select the required options.

Click Impulse Definition and choose the appropriate options in the dialog box shown below:


Generalized impulses, as described by Pesaran and Shin (1998), construct an orthogonal set of innovations that does not depend on the VAR ordering. Cholesky impulses (with or without the degrees-of-freedom adjustment), by contrast, use a set of innovations that does depend on the VAR ordering. See pages 730-731 of the EViews 5 User's Guide for an explanation of the d.f. adjustment and no-adjustment options.


While impulse response functions trace the effects of a shock to one endogenous

variable on to the other variables in the VAR, variance decomposition separates the

variation in an endogenous variable into the component shocks to the VAR. Thus, the

variance decomposition provides information about the relative importance of each

random innovation in affecting the variables in the VAR.

To obtain the variance decomposition, select View/Variance Decomposition... from the VAR object toolbar. You should provide the same information as for the impulse responses above. Here too we get generalized and orthogonalized variance decompositions: Cholesky gives the orthogonalized variance decomposition, in which the ordering of the variables matters.


o Cointegration Test:

To carry out the cointegration test, highlight cons, gdp, plr and wpi, Open as Group and choose View/Cointegration Test. In the dialog box that appears, click the Summary option to get all the results together.


We get one cointegrating vector for the option “No Intercept No Trend” and two cointegrating vectors for “Intercept No Trend”. To confirm that the summary table is correct, run the test with a single option, e.g. “No Intercept No Trend”; the results appear as follows:


From the different options of the Johansen cointegration test we can select the appropriate cointegrating vector on the basis of the expected signs. Suppose we select the first option; we can then carry out a similar exercise for the cointegrating system as for the VAR.



(ii) ARIMA

o How to plot autocorrelation functions and to determine the presence of a unit root.

o How to determine the order of the ARIMA models using sample autocorrelation and

partial autocorrelation functions

o How to estimate ARIMA models and to use them for forecasting.

***********************************************************************

Set the sample to 1992:02 to 2002:04.

To plot a graph, double-click cons and choose View/Line Graph.


Click View/Correlogram to see the correlogram in levels. (The correlogram is shown

only up to 20 lags). We see that the correlogram of cons

To plot and compute the autocorrelations of the first difference of cons, generate the first difference by clicking Genr and typing dcons = d(cons). Open dcons and choose View/Line Graph. The mean of the series appears to be constant, although the variance is unusually high during 1997-99.


To view the correlogram of the first difference, open cons, choose View/Correlogram and select the 1st difference option.

Check the size of the ACF and PACF at various lag lengths. The sample autocorrelations are much smaller in magnitude, with lags 1, 3, 5 and 7 showing a pattern of sign changes. They decline slowly and are loosely consistent with a stationary series.

Similarly, plot the autocorrelations of the second difference. The results do not appear qualitatively different from those for d(cons), which indicates over-differencing and suggests that the order of integration is d = 1.

Observing the autocorrelation function for d(cons), we see that it begins decaying after k = 1 (value -0.752), thus exhibiting moving-average behaviour of second, third or higher order, as the ACF gives significant autocorrelations at several lags. On the other hand, the partial autocorrelations remain significantly different from zero for only a few lags. Hence a few autoregressive terms would be sufficient, but more moving-average terms are required. We can start with ARIMA(2,1,2), ARIMA(3,1,2) and so on.

To fit an ARIMA model:

Change the sample to 1992:02 to 2001:04.

Click Quick/Estimate Equation and specify the ARIMA(2,1,2) model, entering each term separately in the dialog box:

d(cons) c ar(1) ar(2) ma(1) ma(2)

The result is:

For the diagnostic check, view the model fit by View/Actual, Fitted,

Residual/Graph


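Check the residual autocorrelation function with View/Residual Tests/Correlogram-Q-statistics. For a correctly specified model the residuals should be white noise, which implies that all the autocorrelations and partial autocorrelations should be zero. To check this, EViews reports the Ljung-Box Q-statistic,

$$Q = T(T+2)\sum_{k=1}^{p} \frac{r_k^2}{T-k},$$

where T is the number of observations, r_k is the k-th order residual autocorrelation and p is the number of lags. Under the null hypothesis of no serial correlation up to lag p, Q is asymptotically chi-square distributed; for ARMA residuals the degrees of freedom should be reduced by the number of estimated AR and MA terms.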


The Q-statistic for the null hypothesis of no serial correlation up to order 20 is 48.82 with a p-value of zero, indicating serial correlation in the errors and hence misspecification.
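The sample autocorrelations and the Ljung-Box statistic described above can be computed in a few lines of Python. This is a sketch of the formula, not of EViews' internal code; the alternating series in the usage line is hypothetical and deliberately strongly autocorrelated.

```python
def acf(x, max_lag):
    """Sample autocorrelations r_1..r_max_lag (deviations from the mean, c0 divisor)."""
    n = len(x)
    m = sum(x) / n
    c0 = sum((v - m) ** 2 for v in x)
    return [sum((x[t] - m) * (x[t - k] - m) for t in range(k, n)) / c0
            for k in range(1, max_lag + 1)]


def ljung_box(x, p):
    """Q = T (T + 2) * sum_{k=1}^{p} r_k^2 / (T - k)."""
    t_obs = len(x)
    r = acf(x, p)
    return t_obs * (t_obs + 2) * sum(rk * rk / (t_obs - k) for k, rk in enumerate(r, start=1))


series = [1.0, -1.0] * 10       # hypothetical, alternating series
q1 = ljung_box(series, 1)
q2 = ljung_box(series, 2)
```

Q is then compared with a chi-square critical value; for residuals of the ARIMA(2,1,2) model the degrees of freedom would be 20 minus the four ARMA terms.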

Try ARIMA models with different lag specifications such as (2,1,4), (2,1,6), (3,1,2), (3,1,4), (3,1,6), (4,1,4) and so on. After some experimentation we choose the ARIMA(4,1,4) model to generate ex post forecasts over various horizons. To estimate this model, click Quick/Estimate Equation and type:

d(cons) c ar(1) ar(2) ar(3) ar(4) ma(1) ma(2) ma(3) ma(4)

Note that the fourth-order AR and MA terms are significant.

For the diagnostic check, view the model fit with View/Actual, Fitted, Residual/Graph.

Check the residual autocorrelation function with View/Residual Tests/Correlogram-Q-statistics. The Q-statistic up to 20 lags is approximately 13.55, much smaller than the 48.82 of the ARIMA(2,1,2) model. As the figure shows, none of the autocorrelations and partial autocorrelations is individually significant (except at lag 7), nor are the first 20 autocorrelations jointly significant, as shown by the Q-statistic. In other words, both correlograms give the impression that the residuals are purely random. Hence there is no need to look at other ARIMA models, and we use ARIMA(4,1,4) for forecasting.

To generate ex post forecasts over various horizons using ARIMA(4,1,4):

To obtain forecasts up to four quarters ahead:

Click Forecast in the ARIMA(4,1,4) equation window.

Give the forecast sample range as 2002:01 to 2002:04.

The forecast (call it consf) is for cons, not for d(cons), and the forecasting method used is dynamic: it is a multi-step forecast in which, after the first quarter, previously forecast values of the dependent variable (rather than actual values) are used to compute the forecasts for later quarters.
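The dynamic mechanism, and the conversion from d(cons) back to cons, can be sketched with a much smaller model than ARIMA(4,1,4). This Python sketch uses a hypothetical ARIMA(1,1,0) with a constant and assumes the EViews-style convention in which c is the mean of d(cons) and the AR term applies to deviations from that mean; the history and parameter values are illustrative only.

```python
def dynamic_forecast_ar1_diff(history, c, phi, horizon):
    """Dynamic level forecasts from d(y) = c + u, u = phi * u(-1) + e (ARIMA(1,1,0)).

    c is treated as the mean of d(y) and phi applies to deviations from it (assumption).
    After the first step, forecast differences replace actual ones.
    """
    level = history[-1]
    last_diff = history[-1] - history[-2]
    forecasts = []
    for _ in range(horizon):
        next_diff = c + phi * (last_diff - c)   # expected next difference
        level += next_diff                      # cumulate differences back to levels
        forecasts.append(level)
        last_diff = next_diff                   # feed the forecast back in (dynamic)
    return forecasts


consf = dynamic_forecast_ar1_diff([100.0, 102.0], c=1.0, phi=0.5, horizon=3)
```

As the horizon grows the forecast difference converges to c, so long-horizon level forecasts follow a straight line with slope equal to the mean growth.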

The bias proportion of 0.04 indicates that the forecasts track the actual series without much systematic bias. This can be seen graphically by plotting cons and consf: change the sample to 2001:01 to 2002:04, highlight cons and consf, Open as Group and choose View/Graph/Line.

The plot shows that the model over-predicts in the first two quarters but under-predicts in the last two quarters of the forecasting range.


Similarly, we can produce 2nd-, 3rd- and 4th-quarter-ahead forecasts and obtain the accuracy measures for each horizon to compare across models.

(iii) Binary/Dummy Variables

o How to analyze a variable that is qualitative in nature, such as gender or educational status (ANOVA)

o How to analyze qualitative and quantitative variables together (ANCOVA)

***********************************************************************

We have considered the following dummy variables:

| Variable | Dummy = 1 | Dummy = 0 |
|---|---|---|
| gender | if male | if female |
| Race_White | if White | otherwise |
| Race_Black | if Black | otherwise |
| Race_Asian | if Asian | otherwise |
| Experience | if Experience >= 6 | otherwise |
| Experience_1 | if 5 <= Experience <= 7 | otherwise |
| Experience_2 | if Experience >= 8 | otherwise |
| Age | if Age >= 25 | otherwise |
| Age_1 | if 25 <= Age <= 30 | otherwise |

The objective of the study is to analyze the relationship between earnings and the characteristics of employees. Although many more characteristics could be considered, here we use only years of experience, gender, race, age and age squared. The basic model is

$Earnings_i = \beta_1 + \beta_2\,gdum_i + \beta_3\,race_i + \beta_4\,age_i + \beta_5\,experience_i + \varepsilon_i$

To see whether earnings depend on experience, highlight earning and experience, Open as Group and choose Quick/Estimate Equation; the following result appears:

The coefficient of experience is significantly different from zero, so earnings do depend on experience.

Similarly, to see whether earnings depend on gender, race or both: here females are the control group, and after the race dummies are included the control group becomes black females, and so on. Highlight earning and gdum, Open as Equation, and the following result appears:


In the first result box we see that male earnings are not significantly different from those of their female counterparts. Similarly, in the second result box, even after including the race dummies, none of the individual coefficients is significantly different from zero, i.e. earnings in these groups do not differ significantly from those of black females.
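With a single 0/1 dummy, the regression just reproduces group means: the intercept is the mean earning of the control group (females) and the dummy coefficient is the male-minus-female difference, so its t-test is a test of equal group means (one-way ANOVA). A Python sketch with hypothetical earnings:

```python
def dummy_regression(earnings, dummy):
    """OLS of earnings on a constant and a 0/1 dummy; returns (intercept, slope)."""
    n = len(dummy)
    d_bar = sum(dummy) / n
    e_bar = sum(earnings) / n
    b = (sum((d - d_bar) * (e - e_bar) for d, e in zip(dummy, earnings))
         / sum((d - d_bar) ** 2 for d in dummy))
    return e_bar - b * d_bar, b


earnings = [10.0, 12.0, 14.0, 11.0, 13.0]   # hypothetical earnings
gdum = [0, 0, 1, 1, 1]                       # 1 if male, 0 if female
intercept, slope = dummy_regression(earnings, gdum)
```

Adding a quantitative regressor such as experience alongside the dummies turns this into the ANCOVA setting listed above.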
