

Time Series Representation and Prediction in Economics MBAE FILS 2014

POLYTECHNIC UNIVERSITY OF BUCHAREST

FACULTY OF ENGINEERING IN FOREIGN LANGUAGES

MASTER IN BUSINESS ADMINISTRATION

ENGLISH STREAM

Time Series Representation and Prediction in Economics

Professor: dr. ing. Rodica Tuduce
Student: Adrian Giurca

Table of contents:

1. Time Series Analysis
1.1. Time Series Prediction
2. Characteristics and Representation of Time Series
3. The Autoregressive Moving Average Model
4. Simulating an ARMA Model in Matlab
5. Stock Market Prediction
5.1. Stock Market Price Patterns
5.2. Stock Market Forecaster in Matlab
6. Bibliography

1. Time Series Analysis

A time series is a collection of observations made sequentially in time. Examples are daily mortality counts, particulate air pollution measurements, and temperature data. Figure 1 shows these for the city of Chicago from 1987 to 1994. The public health question is whether daily mortality is associated with particle levels, controlling for temperature.

[Figure 1. Daily mortality counts (top), particulate air pollution (middle), and temperature (bottom) for Chicago, 1987 to 1994.]

We represent time series measurements with Y_1, ..., Y_T, where T is the total number of measurements. In order to analyze a time series, it is useful to set down a statistical model in the form of a stochastic process. A stochastic process can be described as a statistical phenomenon that evolves in time. While most statistical problems are concerned with estimating properties of a population from a sample, time series analysis presents a different situation. Although it might be possible to vary the length of the observed sample, it is usually impossible to make multiple observations at any single time (for example, one cannot observe today's mortality count more than once). This makes the conventional statistical procedures, based on large-sample estimates, inappropriate. Stationarity is a convenient assumption that permits us to describe the statistical properties of a time series.

1.1. Time Series Prediction

In time series prediction, the data form a sequence of real-valued signals measured at successive time intervals. Autoregressive (AR), moving average (MA), and autoregressive moving average (ARMA) models are often used for time series modeling, analysis and prediction. These models have been successfully used in a wide range of applications such as speech analysis, noise cancellation, and stock market analysis (Hamilton, 1994; Box et al., 1994; Shumway and Stoffer, 2005; Brockwell and Davis, 2009). Roughly speaking, they are based on the assumption that each new signal is a noisy linear combination of the last few signals and independent noise terms.

A great deal of work has been done on parameter identification and signal prediction using these models, mainly in the proper learning setting, in which the fitted model tries to mimic the assumed underlying model. Most of this work relied on strong assumptions regarding the noise terms, such as independence and identical Gaussian distribution. These assumptions are quite strict in general and the following statement from Thomson (1994) is sometimes quoted:

"Experience with real-world data, however, soon convinces one that both stationarity and Gaussianity are fairy tales invented for the amusement of undergraduates."

2. Characteristics and representation of time series

A time series is a time-ordered sequence of observations (realizations) of a variable. Time series analysis uses only the time series history of the variable being forecasted in order to develop a model for predicting future values.

Usually the observations are taken at regular intervals (days, months, years), but the sampling could be irregular. Examples of time series include the air temperature in meteorological science, blood pressure in biomedical science, vibration in mechanical engineering or civil engineering and others.

Time series analysis uses a collection of systematic approaches to extract information about the characteristics of a physical system that generates time series. Approaches to time series analysis include estimating statistical parameters, building dynamic models, performing correlations and others.

A time series analysis consists of two steps:

1. Building a model that represents a time series.

2. Using the model to predict (forecast) future values.

Time series are analyzed in order to understand the underlying structure and function that produce the observations. Understanding the mechanisms of a time series allows a mathematical model to be developed that explains the data in such a way that prediction, monitoring or control can occur. Examples include prediction/forecasting, which is widely used in economics and business. Monitoring of ambient conditions, or of an input or an output, is common in science and industry. Quality control is used in computer science, communications and industry.

There are numerous software programs that will analyze time series, such as SAS, SPSS, STATISTICA, STATGRAPHICS and others. For those who are comfortable with coding, MATLAB, S-PLUS, and R are other software packages that can perform time series analyses. EXCEL can be used if linear regression analysis is all that is required (that is, if all you want to find out is the magnitude of the most obvious trend). A word of caution about using multiple regression techniques with time series data: because of their autocorrelated nature, time series violate the assumption of independence of errors.

Neural networks have been advocated as an alternative to traditional statistical forecasting methods. Across monthly and quarterly time series, the neural networks did significantly better than traditional methods. As suggested by theory, the neural networks were particularly effective for discontinuous time series.

Neural networks have been widely used as time series forecasters: most often these are feed-forward networks which employ a sliding window over the input sequence. Typical examples of this approach are market predictions, meteorological and network traffic forecasting. Two important issues must be addressed in such systems: the frequency with which data should be sampled, and the number of data points which should be used in the input representation. In most applications these issues are settled empirically, but results from work in complex dynamic systems suggest helpful heuristics.

A time series is a sequence of signals, measured at successive times, which are assumed to be spaced at uniform intervals. We denote by X_t the signal measured at time t, and by ε_t the noise term at time t. The AR(k) (short for autoregressive) model, parameterized by a horizon k and a coefficient vector α ∈ R^k, assumes that the time series is generated according to the following model, where ε_t is a zero-mean random noise term:

X_t = Σ_{i=1}^{k} α_i X_{t-i} + ε_t.

In words, the model assumes that each X_t is a noisy linear combination of the previous k signals. A more sophisticated model is the ARMA(k, q) (short for autoregressive moving average) model, which is parameterized by two horizon terms k, q and coefficient vectors α ∈ R^k and β ∈ R^q. This model assumes that X_t is generated via the formula:

X_t = Σ_{i=1}^{k} α_i X_{t-i} + Σ_{i=1}^{q} β_i ε_{t-i} + ε_t,

where again the ε_t are zero-mean noise terms. Sometimes an additional constant bias term is added to the equation (to indicate constant drift), but we will ignore this for simplicity. Notice that this does not increase the complexity of the problem, since we can simply raise the dimension of the coefficient vector by one and assign the value 1 to the corresponding signal. Note that the AR(k) model is a special case of the ARMA(k, q) model in which the β_i coefficients are all zero.

Many of the statistical techniques used in time series analysis are those of regression analysis (classical least squares theory) or are adaptations or analogues of them. Many of the forecasting techniques utilize regression methods for parameter estimation.
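As an illustration of the recursion above, the following pure-Python sketch (an illustrative toy, separate from the Matlab workflow used later in this report) generates an ARMA(k, q) series from given coefficient vectors α and β:

```python
import random

def simulate_arma(alpha, beta, T, seed=0):
    """Generate X_t = sum_i alpha[i]*X_{t-1-i} + sum_j beta[j]*eps_{t-1-j} + eps_t."""
    rng = random.Random(seed)
    X, eps = [], []
    for t in range(T):
        e = rng.gauss(0.0, 1.0)  # zero-mean Gaussian noise term
        ar = sum(a * X[t - 1 - i] for i, a in enumerate(alpha) if t - 1 - i >= 0)
        ma = sum(b * eps[t - 1 - j] for j, b in enumerate(beta) if t - 1 - j >= 0)
        X.append(ar + ma + e)
        eps.append(e)
    return X

series = simulate_arma(alpha=[0.5], beta=[0.4, 0.3], T=1000)
```

Setting beta to the empty list recovers the pure AR(k) special case mentioned above.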

Econometrics and Time Series Models

Econometric models are sets of simultaneous regression models with applications to areas such as industrial economics, agricultural economics, and corporate strategy and regulation. Time series models require a large number of observations (say, over 50). Both kinds of models are used successfully for business applications ranging from micro to macro studies, including finance and endogenous growth. Other modeling approaches include structural and classical modeling such as Box-Jenkins approaches, co-integration analysis and general microeconometrics in probabilistic models (e.g., logit and probit), panel data and cross sections. Econometrics mostly studies the issue of causality, i.e. the issue of identifying a causal relation between an outcome and a set of factors that may have determined this outcome. In particular, it makes this concept operational in time series and in exogeneity modeling.

3. The Autoregressive Moving Average Model

One of the most frequently used and precise analysis and short-term forecast techniques is known as the Box-Jenkins method, based on the concept of the ARIMA (Autoregressive Integrated Moving Average) process. The ARMA(p,q) model is suitable for modeling stationary processes. A stationary process features a generation mechanism that is invariant in time: the mean and the variance of a stationary process do not change in time, and the covariance of two variables depends only on the length of the time interval that separates them. Trend and seasonal components do not occur in stationary series. The non-stationary ARIMA(p,d,q) models are specific to non-seasonal phenomena whose trend can be eliminated, and the process thus made stationary, by finite differences of a certain order d. The stages of building an ARIMA model are: identification of the model, estimation of the parameters, checking of the selected model, and use of the model for forecasting.
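The differencing step that turns an ARIMA(p,d,q) series into a stationary ARMA one can be made concrete. The following pure-Python sketch (illustrative, not tied to any toolbox) shows that one round of first differencing removes a linear trend:

```python
def difference(x, d=1):
    """Apply d rounds of first differencing: x_t - x_{t-1}."""
    for _ in range(d):
        x = [x[i] - x[i - 1] for i in range(1, len(x))]
    return x

trend = [3.0 + 2.0 * t for t in range(10)]  # deterministic linear trend
print(difference(trend, d=1))  # constant series of 2.0: the trend is gone
```

A second round of differencing (d=2) would likewise reduce the constant series to zeros, which is why order-d differencing removes a degree-d polynomial trend.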

Linear models are not always sufficient to satisfactorily model real-world phenomena, and there is increasing interest in developing nonlinear models, including for modeling time series. Some of the interesting features that nonlinear models may offer are: (1) the prediction interval does not necessarily increase in time, and (2) the distribution of forecast errors need not be a normal one. Several classes of nonlinear models are frequently used: statistical models of nonlinear structure (nonlinear autoregressive processes, threshold models, bilinear models), models for changing variance, and neural networks.

Autoregressive Model

Many observed time series exhibit serial autocorrelation, that is, linear association between lagged observations. This suggests past observations might predict current observations. The autoregressive (AR) process models the conditional mean of y_t as a function of past observations, y_{t-1}, y_{t-2}, ..., y_{t-p}. An AR process that depends on p past observations is called an AR model of degree p, denoted by AR(p).

The form of the AR(p) model in Econometrics Toolbox is

y_t = c + φ_1 y_{t-1} + ... + φ_p y_{t-p} + ε_t, where ε_t is an uncorrelated innovation process with mean zero.

In lag operator polynomial notation, L^i y_t = y_{t-i}. Define the degree-p AR lag operator polynomial φ(L) = (1 - φ_1 L - ... - φ_p L^p). You can write the AR(p) model as:

φ(L) y_t = c + ε_t.

Moving Average Model

MA(q) model

The moving average (MA) model captures serial autocorrelation in a time series y_t by expressing the conditional mean of y_t as a function of past innovations, ε_{t-1}, ε_{t-2}, ..., ε_{t-q}. An MA model that depends on q past innovations is called an MA model of degree q, denoted by MA(q).

The form of the MA(q) model in Econometrics Toolbox is y_t = c + ε_t + θ_1 ε_{t-1} + ... + θ_q ε_{t-q}, where ε_t is an uncorrelated innovation process with mean zero. For an MA process, the unconditional mean of y_t is μ = c.

In lag operator polynomial notation, L^i y_t = y_{t-i}. Define the degree-q MA lag operator polynomial θ(L) = (1 + θ_1 L + ... + θ_q L^q). You can write the MA(q) model as y_t = μ + θ(L) ε_t.

Invertibility of the MA Model

By Wold's decomposition, an MA(q) process is always stationary because θ(L) is a finite-degree polynomial.

For a given process, however, there is no unique MA polynomial: there is always a noninvertible and an invertible solution. For uniqueness, it is conventional to impose invertibility constraints on the MA polynomial. Practically speaking, choosing the invertible solution implies the process is causal. An invertible MA process can be expressed as an infinite-degree AR process, meaning only past events (not future events) predict current events. The MA operator polynomial θ(L) is invertible if all its roots lie outside the unit circle.
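For low-order MA polynomials the root condition can be checked directly. The following pure-Python sketch (illustrative; it handles MA(1) in closed form and MA(2) via the quadratic formula, assuming a nonzero leading coefficient) tests whether θ(L) has all roots outside the unit circle:

```python
import cmath

def ma_invertible(theta):
    """Invertibility of theta(L) = 1 + theta[0]*L (+ theta[1]*L^2):
    all roots must lie outside the unit circle."""
    if len(theta) == 1:
        return abs(theta[0]) < 1          # root sits at L = -1/theta_1
    t1, t2 = theta                        # MA(2) case; assumes t2 != 0
    disc = cmath.sqrt(t1 * t1 - 4 * t2)   # quadratic formula for 1 + t1*L + t2*L^2 = 0
    roots = [(-t1 + disc) / (2 * t2), (-t1 - disc) / (2 * t2)]
    return all(abs(r) > 1 for r in roots)

print(ma_invertible([0.5]))   # True: the root -2 is outside the unit circle
print(ma_invertible([2.0]))   # False: the root -0.5 is inside
```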

Autoregressive Moving Average Model

For some observed time series, a very high-order AR or MA model is needed to model the underlying process well. In this case, a combined autoregressive moving average (ARMA) model can sometimes be a more parsimonious choice.

An ARMA model expresses the conditional mean of y_t as a function of both past observations, y_{t-1}, ..., y_{t-p}, and past innovations, ε_{t-1}, ..., ε_{t-q}. The number of past observations that y_t depends on, p, is the AR degree. The number of past innovations that y_t depends on, q, is the MA degree. In general, these models are denoted by ARMA(p,q).

The form of the ARMA(p,q) model in Econometrics Toolbox is: y_t = c + φ_1 y_{t-1} + ... + φ_p y_{t-p} + ε_t + θ_1 ε_{t-1} + ... + θ_q ε_{t-q}, where ε_t is an uncorrelated innovation process with mean zero.

In lag operator polynomial notation, L^i y_t = y_{t-i}. Define the degree-p AR lag operator polynomial φ(L) = (1 - φ_1 L - ... - φ_p L^p). Define the degree-q MA lag operator polynomial θ(L) = (1 + θ_1 L + ... + θ_q L^q). You can write the ARMA(p,q) model as φ(L) y_t = c + θ(L) ε_t.

Stationarity and Invertibility of the ARMA Model

Consider the ARMA(p,q) model in lag operator notation, φ(L) y_t = c + θ(L) ε_t.

From this expression, you can see that

y_t = μ + (θ(L)/φ(L)) ε_t = μ + ψ(L) ε_t,

where μ = c/φ(1) is the unconditional mean of the process, and ψ(L) is a rational, infinite-degree lag operator polynomial, (1 + ψ_1 L + ψ_2 L^2 + ...).

Fitting Models

ARMA models in general can, after choosing p and q, be fitted by least squares regression to find the values of the parameters which minimize the error term. It is generally considered good practice to find the smallest values of p and q which provide an acceptable fit to the data. For a pure AR model, the Yule-Walker equations may be used to provide a fit.

Applications

ARMA is appropriate when a system is a function of a series of unobserved shocks (the MA part) as well as its own behavior. For example, stock prices may be shocked by fundamental information as well as exhibiting technical trending and mean-reversion effects due to market participants.

4. Simulating an ARMA Model in Matlab

Consider the model a(L) y_t = b(L) e_t:

clear;
a = [1 0.5];       % AR coeffs
b = [1 0.4 0.3];   % MA coeffs
T = 1000;
e = randn(T,1);    % generate Gaussian white noise
y = filter(b,a,e); % generate y

The filter function can be used to generate data from an ARMA model, or apply a filter to a series.

Impulse-Response

To graph the impulse response of an ARMA, use fvtool:

% create an impulse response
fvtool(b,a,'impulse');

Sample Covariances

[c,lags] = xcov(y,'biased');
figure;
plot(lags,c);
title('Sample Covariances');

The option 'biased' divides the sum of lagged products by T; 'unbiased' would divide by T - |lag|.
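For concreteness, here is a small pure-Python version of the two normalizations (an illustrative sketch, not the xcov implementation):

```python
def sample_autocov(x, k, biased=True):
    """Autocovariance at lag k: divide by T ('biased') or by T - k ('unbiased')."""
    n = len(x)
    m = sum(x) / n
    s = sum((x[t] - m) * (x[t - k] - m) for t in range(k, n))
    return s / n if biased else s / (n - k)

x = [1.0, 2.0, 3.0, 4.0]
print(sample_autocov(x, 1, biased=True))   # 1.25 / 4 = 0.3125
print(sample_autocov(x, 1, biased=False))  # 1.25 / 3, about 0.4167
```

The biased estimator trades a small downward bias for lower variance at long lags, which is why it is the default choice for spectral work.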

Spectral Analysis

Population spectral density for an ARMA:

% population density
w = 0:0.1:pi;
h = freqz(b,a,w);           % returns frequency response = b(e^{iw})/a(e^{iw})
sd = abs(h).^2./sqrt(2*pi); % make into density

Estimating the Spectral Density

Parametric methods

These estimate an AR(p) model and use it to compute the spectrum.

[sdc,wc] = pcov(y,8);    % estimate spectral density by fitting AR(8)
[sdy,wy] = pyulear(y,8); % estimate spectral density by fitting AR(8)
                         % using the Yule-Walker equations
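To make the Yule-Walker idea concrete, the following pure-Python sketch (illustrative, not the pyulear implementation) recovers an AR(1) coefficient from the sample autocovariances via the relation φ̂ = ĉ(1)/ĉ(0):

```python
import random

def autocov(x, k):
    """Biased sample autocovariance at lag k (divides by len(x))."""
    n = len(x)
    m = sum(x) / n
    return sum((x[t] - m) * (x[t - k] - m) for t in range(k, n)) / n

def yule_walker_ar1(x):
    """Yule-Walker estimate for AR(1): phi_hat = c(1)/c(0)."""
    return autocov(x, 1) / autocov(x, 0)

rng = random.Random(1)
x = [0.0]
for _ in range(5000):
    x.append(0.6 * x[-1] + rng.gauss(0.0, 1.0))  # simulate AR(1) with phi = 0.6
print(yule_walker_ar1(x))  # roughly 0.6
```

For AR(p) with p > 1 the same idea yields a p-by-p linear system in the autocovariances, which pyulear solves internally.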

Non-parametric methods

Definition 1. The sample periodogram of y is

I_T(ω) = (1/(2πT)) |Σ_{t=1}^{T} y_t e^{-iωt}|^2.

Remark 2. The sample periodogram is equal to the Fourier transform of the sample autocovariances.

[sdp,wp] = periodogram(y,[],'onesided'); % estimate using sample periodogram
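The equivalence stated in Remark 2 can be checked numerically. The following pure-Python sketch (using non-demeaned biased autocovariances, for which the identity is exact) computes the periodogram both directly and from the autocovariances:

```python
import cmath
import math

def periodogram_direct(x, w):
    """I(w) = |sum_t x_t e^{-iwt}|^2 / (2*pi*T)."""
    s = sum(xt * cmath.exp(-1j * w * t) for t, xt in enumerate(x))
    return abs(s) ** 2 / (2 * math.pi * len(x))

def periodogram_from_autocov(x, w):
    """Fourier transform of the (non-demeaned, biased) sample autocovariances."""
    n = len(x)
    def c(k):
        k = abs(k)
        return sum(x[t] * x[t - k] for t in range(k, n)) / n
    total = sum(c(k) * cmath.exp(-1j * w * k) for k in range(-(n - 1), n))
    return total.real / (2 * math.pi)

x = [1.0, 2.0, 0.5, -1.0, 0.3]
print(abs(periodogram_direct(x, 0.7) - periodogram_from_autocov(x, 0.7)) < 1e-12)
```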

Definition 3. A smoothed periodogram estimate of the spectral density is

f̂(ω_j) = Σ_k h_T(k) I_T(ω_{j+k}),

where h_T(·) is some kernel weighting function. A smoothed periodogram is a weighted moving average of the sample periodogram. The following code estimates a smoothed periodogram using a Parzen kernel with bandwidth rT = round(√T).

rT = round(sqrt(T));
[sdw,ww] = pwelch(y,parzenwin(rT),rT-1,[],'onesided'); % smoothed periodogram

Definition 4. A weighted covariance estimate of the spectrum is

f̂(ω) = (1/√(2π)) Σ_k g_T(k) ĉ(k) e^{-iωk},

where g_T(k) is some kernel.

% Bartlett weighted covariances
wb = 0:0.1:pi;
rT = sqrt(T);
[c,t] = xcov(y,'biased');
c = c.';                 % make row, conformable with the lag vector t
weight = 1 - abs(t)/rT;
weight(abs(t) > rT) = 0;
for j = 2:length(wb)
    sdb(j) = sum(c.*weight.*exp(-1i*wb(j)*t));
end
sdb = sdb/sqrt(2*pi);

Simulating an ARMA and estimating the spectrum

clear;
close all;              % closes all open figure windows

% model: y_t = 0.7 y_{t-1} + b(L) e_t
a = [1 -0.7];           % AR coeffs
b = [1 0.3 2];          % MA coeffs
T = 200;
e = randn(T,1);         % generate Gaussian white noise
y = filter(b,a,e);      % generate y

% plot y
figure;
plot(y);
xlabel('t');
ylabel('y');
title('ARMA(1,2)');

% create an impulse response
fvtool(b,a,'impulse');

% calculate and plot sample auto-covariances
[c,lags] = xcov(y,'biased');
figure;
plot(lags,c);
title('Sample Covariances');

% estimate spectral density

% parametric
[sdc,wc] = pcov(y,8);    % estimate spectral density by fitting AR(8)
[sdy,wy] = pyulear(y,8); % estimate spectral density by fitting AR(8)
                         % using the Yule-Walker equations

% nonparametric
[sdp,wp] = periodogram(y,[],'onesided'); % estimate using unsmoothed
                                         % periodogram
rT = round(sqrt(T))*3;
[sdw,ww] = pwelch(y,parzenwin(rT),rT-1,[],'onesided'); % smoothed
                                                       % periodogram

% Bartlett weighted covariances
[c,lags] = xcov(y,'biased');
c = c.';                 % make row, conformable with the lag vector
t = -(T-1):(T-1);
weight = 1 - abs(t)/rT;
weight(abs(t) > rT) = 0;
wb = ww;
for j = 1:length(wb)
    sdb(j) = sum(c.*weight.*exp(-1i*wb(j)*t));
end
sdb = real(sdb)/sqrt(2*pi);

% population density
w = wb;
h = freqz(b,a,w);
sd = abs(h).^2./sqrt(2*pi);

figure;
plot(wp,sdp,wc,sdc,ww,sdw,wb,sdb,wy,sdy,w,sd);
legend('raw periodogram','parametric AR','smoothed periodogram', ...
       'bartlett weighted cov','Yule-Walker','population density');

5. Stock market prediction

The development of the stock market is a necessity both for the assurance of continuous economic growth and for the efficient allocation of resources in the economy. The most important sector of the capital market in Romania, the stock exchange market, is defined and regulated by Law No. 52 (7 July 1994). A major role in the stock market development is played by the legislation, mainly by those particular laws that concern the transparency, the protection and the equal treatment of investors. It maintains the confidence of the investors in the stock market and persuades them to invest in it.

The stock quotations represent the prices for deals at the stock exchange market. They reflect the relationship between the demand and the supply on that market. The main factors that influence the stock quotations are:

(1) the economic status of the issuer,

(2) the investors' expectations concerning profitability and dividends,

(3) the evolution of the national and international stock market, and the specific particularities of the stock market activity that affect the demand and the supply,

(4) political, military, cultural factors, and others.

The random walk hypothesis

The random walk process is frequently used as a model for stock market quotations. For this particular model, all the predictions are equal to the last observed value, and the confidence intervals widen as the forecast horizon expands. A particular version of this process, the random walk with drift, takes into consideration the existence of a trend and enables one to include that trend within the prediction.
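Both forecasts are easy to state in code. A minimal pure-Python sketch (illustrative, assuming the drift is estimated as the average historical step):

```python
def random_walk_forecast(history, horizon, with_drift=False):
    """Forecast: repeat the last value, optionally adding the estimated drift per step."""
    last = history[-1]
    drift = 0.0
    if with_drift:
        steps = [history[i] - history[i - 1] for i in range(1, len(history))]
        drift = sum(steps) / len(steps)  # average historical step
    return [last + drift * h for h in range(1, horizon + 1)]

prices = [100.0, 101.0, 103.0, 102.0, 104.0]
print(random_walk_forecast(prices, 3))                  # [104.0, 104.0, 104.0]
print(random_walk_forecast(prices, 3, with_drift=True)) # [105.0, 106.0, 107.0]
```

The flat forecast illustrates why, under the random walk hypothesis, the best point prediction at any horizon is simply the last observed price.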

Forecasting and stock market analysis methods

Depending on how the shares are evaluated, stock market analysts can be classified as: (1) fundamentalists, who consider only the fundamental factors of the market, and (2) chartists, who use graphic analysis techniques.

Fundamental analysis is based on the study of the state of the economy, the industry and society, with a view to determining the value of the shares of a specific company. Fundamental analysis monitors the profits of the company and the dividends that the company offers, takes into account the expectations about interest rates, and evaluates the risk associated with the company. It uses statistical, mathematical and financial algorithms, applied to the official periodic financial statements of the company, in order to evaluate the share price as accurately as possible.

Technical analysis is based exclusively on the study of the internal data of the stock market, considering that all the economic, financial, political and psychological factors are incorporated into a single element: the share quotation. Technical analysts study the short-term changes of the share price, starting with a study of the history of the quotations within an interval of at least 6 months, and assume that the past behavior will extend into the future. Technical analysis offers information about the possible future evolution of the stock market.

Stock market prediction is the act of trying to determine the future value of a company stock or other financial instrument traded on a financial exchange. The successful prediction of a stock's future price could yield significant profit. Some believe that stock price movements are governed by the random walk hypothesis and thus are unpredictable. Others disagree, and those with this viewpoint possess a myriad of methods and technologies which purportedly allow them to gain future price information.

Technological methods

Technological forecasting (TF), in general, applies to all purposeful and systematic attempts to anticipate and understand the potential direction, rate, characteristics, and effects of technological change, especially invention, innovation, adoption, and use. One possible analogy for TF is weather forecasting: though imperfect, TF enables better plans and decisions. A good forecast can help maximize gain and minimize loss from future conditions. Additionally, TF is no more avoidable than weather forecasting. All people implicitly forecast the weather by their choice of whether to wear a raincoat, carry an umbrella, and so on. Any individual, organization, or nation that can be affected by technological change inevitably engages in forecasting technology with every decision that allocates resources to particular purposes.

5.1. Stock market price patterns

Forecasting the direction of future stock prices is a widely studied topic in many fields including trading, finance, statistics and computer science. The motivation is naturally to predict the direction of future prices so that stocks can be bought and sold at profitable positions. Professional traders typically use fundamental and/or technical analysis to analyze stocks and make investment decisions. Fundamental analysis is the traditional approach involving a study of company fundamentals such as revenues and expenses, market position, annual growth rates, and so on (Murphy, 1999). Technical analysis, on the other hand, is based solely on the study of historical price fluctuations. Practitioners of technical analysis study price charts for price patterns and use price data in different calculations to forecast future price movements (Turner, 2007). The technical analysis paradigm is thus that there is an inherent correlation between past and future prices that can be used to determine when to enter and exit the market. In finance, statistics and computer science, most traditional models of stock price prediction use statistical models and/or neural network models derived from price data (Park and Irwin, 2007). Moreover, the dominant strategy in computer science seems to be using evolutionary algorithms, neural networks, or a combination of the two (evolving neural networks).

The approach taken in this thesis differs from the traditional approach in that we use a knowledge-intensive first layer of reasoning based on technical analysis before applying a second layer of reasoning based on machine learning. The first layer thus performs a coarse-grained analysis of the price data that is subsequently forwarded to the second layer for further analysis. We hypothesize that this knowledge-intensive coarse-grained analysis will aid the reasoning process in the second layer, as the second layer can then focus on the quintessentially important aspects of the price data rather than the raw price data itself.

There are about as many chart (or price) patterns as there are stock market analysts, and there are a lot of stock market analysts. Different patterns are: trend lines, support/resistance, fan lines, channel lines, retracement, speed resistance, gaps, reversal patterns, head and shoulder patterns, double tops/bottoms, triple tops/bottoms, saucers or rounding patterns, cup and handle, V-formations, triangles, diamonds, flags and pennants, wedge formations and trading ranges. Candlestick charts have their own series of price patterns such as hammers, doji, stars, dragonfly doji, spinning tops, and others. The most widely used charting methods are bar charts and candlestick charts; some traders also use point and figure charts.

Example of a price pattern:

The Oslo Benchmark Index from 01-01-2005 to 01-04-2010. Notice the sharp drop from mid-2008 to early 2009 resulting from the financial crisis.

5.2. Stock Market Forecaster in Matlab

Stock Market Forecaster is a very useful financial advisor, an application written in the Matlab language (M-files and/or M-functions); the Matlab Financial Toolbox is required for it to function. The demo version of this application tested in this project has only a time-delay limitation: artificial delays have been inserted into the protected code, so each pattern requires about 25 minutes. In the full version the source code is extremely fast and highly optimized: data are processed in fractions of a second and sell/buy signals are ready to be used. Some key elements of this Matlab application are:

High profit price patterns

Fully-customizable code

Easy and intuitive implementation

About SMF Tool

We have developed an efficient tool for intraday stock market forecasting based on Neural Networks and Wavelet Decomposition. This software has been tested on real data obtaining excellent results. SMF Tool gives Buy/Sell signals with a high degree of accuracy.

How SMF works

SMF accepts, as input, a sequence of given length N. The system determines whether at least one of the future prices, within an observation window of fixed length M, will be higher or lower than the current price.

[Diagram: the input sequence s1, s2, ..., sN, whose last element sN is the current price, followed by the predicted sequence p1, p2, ..., pM.]

A buy signal is given if at least one of the predicted prices is greater than the current price. A sell signal is given if at least one of the predicted prices is lower than the current price. Formally, a buy signal is verified if (p1 > sN) OR (p2 > sN) OR ... OR (pM > sN), and a sell signal is verified if (p1 < sN) OR (p2 < sN) OR ... OR (pM < sN).
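The buy condition above is a simple disjunction over the M predicted prices. A hypothetical pure-Python rendering (the names are illustrative, not taken from the SMF source):

```python
def signal(current_price, predicted):
    """Buy if any predicted price exceeds the current price;
    sell if any predicted price is below it. Both can fire at once."""
    buy = any(p > current_price for p in predicted)
    sell = any(p < current_price for p in predicted)
    return buy, sell

print(signal(100.0, [101.0, 99.5, 100.2]))  # (True, True)
print(signal(100.0, [100.5, 101.0]))        # (True, False)
```

Note that with a mixed predicted window both signals can be verified simultaneously, so a real trading rule would need a tie-breaking policy on top of this predicate.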