Econometrics: Advance

Giulio Laudani #19 Cod. 20192

Econometrics II

FAVERO PARTS
  A model to describe the return
  ARMA model
  Volatility model
    Classic computation
    ARCH and G-ARCH estimators
  How to improve estimates
  Modeling long-run relationships in finance
    Differencing procedure
    Cointegration

GUIDOLIN PARTS
  Econometric analysis
    Types of data
    Steps involved in formulating an econometric model
    OLS model
  Multivariate model
    Simultaneous Equation model
    VAR model
  Studying the non-normality of distribution
    Method to detect non-normality
    How to fix non-normality issues
  Multivariate models
    Exposure Mapping models
    Conditional Covariance model
    DCC model
    VEC and BEKK model
    Principal component approach
  Switching/Regime models
    Threshold model
    Smooth transition
    Markov switching model
  Simulation methods
    Historical simulation
    Monte Carlo Simulation
    Filtered Simulation (Bootstrapping)


Favero Parts:

A model to describe the return:

As an explanatory model of return evolution we use one that assumes the presence of two components: a permanent information component and a temporary noisy one. The second prevails in short-term, high-frequency observations, while the first emerges over longer horizons. This property implies:

- In general, predictability of returns increases with the horizon. The best estimate for the mean of returns at high frequency is zero, but a slowly evolving, time-varying mean of returns at long horizons could and should be modeled. [There is strong evidence of correlation between industrialized countries' stock markets.]
- If we run a regression, we expect high statistical significance of the parameters at long horizons.
- The presence of the noise component in returns causes volatility to be time-varying and persistent, and the annualized volatility of returns decreases with the horizon.
- Data show non-normal behavior: the unconditional distribution has fatter tails.
- Non-linearity is also a feature of returns at high frequency: a natural approach to capture it is to differentiate alternative regimes of the world that govern alternative descriptions of the dynamics (such as the level of volatility in the market), for example with a Markov chain.

The model is described by the following formula, obtained starting from

$$R_{t+1} = \frac{P_{t+1} + D_{t+1}}{P_t} - 1$$

Dividing both sides by $(1+R_{t+1})$ and multiplying both sides by $P_t/D_t$ we have

$$\frac{P_t}{D_t} = \frac{1}{1+R_{t+1}} \cdot \frac{D_{t+1}}{D_t} \cdot \left(1 + \frac{P_{t+1}}{D_{t+1}}\right)$$

Taking logs we have

$$p_t - d_t = -r_{t+1} + \Delta d_{t+1} + \ln\!\left(1 + e^{\,p_{t+1}-d_{t+1}}\right)$$

The consequences of this model are:

- The model implies the possibility that long-run returns are predictable, so forecasting models for the stock market return should perform better the longer the forecasting horizon. Moreover, the importance of noise in determining returns disappears with the horizon.
- Returns in the short term are not predictable, while volatility is predictable.
- The forecasting performance for stock market returns depends crucially on the forecasting performance for dividend growth. Note that in the case in which the dividend yield predicts expected dividend growth perfectly, the proposition that returns are not predictable holds in the data¹.
- Since the model assumes a linearization, we need to test that the dividend yield fluctuates around a constant mean, so that the model is effectively well specified.
- We can use this model to compute the Sharpe ratio $(r_p - r_f)/\sigma_p$.
- Given two equations, one modeling the independent variable used in the second, we need the first to compute the unconditional mean of the second (dependent) variable.

ARMA model:

Modeling time series is a way to predict financial variables by using past and current values of the variables and of the error terms; it is therefore an a-theoretical approach, meaning we do not aim to understand why something is happening, but simply attempt to describe the behavior of those variables empirically. The same goal can instead be pursued with structural models, which try to model the dependent variable by looking for explanatory variables, i.e. they aim to understand why and how something is happening.

1 However, the empirical evidence available tells us that the dividend yield does not predict dividend growth. If variables other than the dividend yield are predictors of dividend growth, then the combination of these variables with the dividend yield delivers the best predicting model for the stock market.


The most general specification is the ARIMA, which is built from linear combinations of lagged variables and errors; it is used to run one- or multi-step forecasts with either a recursive or a rolling-window procedure. The first consists of adding each new observation to the whole sample (an expanding window), while the second keeps the sample length fixed and moves the window forward with every new observation.

To check whether the forecast is accurate there are several fit measures. The two simplest ones are the mean squared error (MSE) and the mean absolute error (MAE): the first penalizes large deviations more than small ones, while the second penalizes large and small errors at the same rate. A more sophisticated one is the mean absolute percentage error (MAPE), which expresses the error as a percentage and is bounded to be positive; however, it cannot be used for variables that change sign. The last one is Theil's U statistic, which compares the model against a benchmark model.

Belonging to the same family of error checking, but with a different scope, are the economic loss functions, which focus not on the point estimate but on the ability of the model to predict the sign and the turning points. This kind of information is really useful and, in the end, leads to more profitable strategies.
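As an illustration, here is a minimal Python sketch (assuming NumPy) of these loss functions applied to a set of one-step forecasts. The Theil-U convention used here (model RMSE over the RMSE of a naive random-walk benchmark) is one common variant and an assumption of this sketch, not necessarily the exact definition in the notes.

```python
import numpy as np

def forecast_error_measures(y_true, y_pred, y_bench=None):
    """Forecast accuracy measures discussed above: MSE, MAE, MAPE and a Theil-U ratio.

    y_bench are forecasts from a benchmark model; if omitted, a naive
    random-walk forecast (tomorrow = today) is used as the benchmark.
    """
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    err = y_true - y_pred
    mse = np.mean(err ** 2)                      # penalizes large errors more
    mae = np.mean(np.abs(err))                   # penalizes all errors linearly
    mape = np.mean(np.abs(err / y_true)) * 100   # only meaningful if y_true never changes sign
    if y_bench is None:                          # naive benchmark: previous observed value
        y_bench = np.r_[y_true[0], y_true[:-1]]
    u = np.sqrt(np.mean(err ** 2) / np.mean((y_true - y_bench) ** 2))
    return {"MSE": mse, "MAE": mae, "MAPE": mape, "Theil U": u}

# Example: compare one-step forecasts against realized values
rng = np.random.default_rng(0)
y = np.cumsum(rng.normal(size=200)) + 100
yhat = y + rng.normal(scale=0.5, size=200)
print(forecast_error_measures(y, yhat))
```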

Within the ARMA class the simplest model is the MA(k) moving average, which describes the dependent variable as a combination of current and lagged white-noise terms. The features of this model are: after the k lags considered the model collapses to the intercept, which is also the mean of the dependent variable; it has constant variance, with autocorrelation different from zero only for the first k lags.

The AR(q) autoregressive model describes the dependent variable using only lagged values of the variable itself plus an error term. In this model the stationarity condition requires that the roots of the characteristic equation of the lag polynomial all lie outside the unit circle; in this case the mean of the process exists and is given by:

$$E(y) = \frac{\omega}{1-\beta_1-\dots-\beta_q}$$

Wold's theorem says that any stationary process can be decomposed into a deterministic part and a purely stochastic part, which is an MA of infinite order. The autocorrelation function can be found by solving a set of simultaneous equations, and it decays geometrically to zero. If the data exhibit an MA dynamic (autocorrelation) in the residuals, the R² will be low, and the model should be corrected by introducing an AR dynamic.

The ARMA process is the combination of the MA and AR models. This class of models is distinguished from the other two by using the pacf: its acf is the same as the AR one, and its pacf also decays geometrically. A special case is the ARIMA, where the process is not stationary.

To choose the appropriate number of lags (to ensure a parsimonious model) we cannot directly/simply use the plots of the pacf² and acf functions, since when we are using a mixture of components it is hard to understand what is going on. That is why academics have developed several information criteria, which are based on the general idea of using the variance of the error term corrected by some parameters representing the penalty for adding new parameters and the subsequent absorption of degrees of freedom. Our goal is to minimize the criterion. There is no uniformly superior index. [page 233]
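A possible sketch of this lag-selection exercise, assuming the statsmodels ARIMA implementation; comparing AIC and BIC over a small grid is an assumption of the example, since the notes only say that no criterion is superior.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Compare information criteria across a small grid of ARMA(p, q) orders and keep the
# most parsimonious specification (lowest criterion value).
rng = np.random.default_rng(14)
y = np.zeros(400)
for t in range(1, 400):                      # simulate an AR(1) with coefficient 0.6
    y[t] = 0.6 * y[t - 1] + rng.normal()

results = {}
for p in range(3):
    for q in range(3):
        fit = ARIMA(y, order=(p, 0, q)).fit()
        results[(p, q)] = (fit.aic, fit.bic)

best_aic = min(results, key=lambda k: results[k][0])
best_bic = min(results, key=lambda k: results[k][1])
print("AIC picks", best_aic, "BIC picks", best_bic)
```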

Volatility model:

The estimate of variance is central in the financial field, and there are several possible approaches: those which are linear, like the ARMA family, and those which are non-linear. The latter are useful to properly address some common features such as leptokurtosis, volatility clustering/pooling³ and the leverage effect.

2 This function is defined as the measure of correlation between the current and the lagged value, after controlling for any intermediate lags. It always has the same value as the acf at lag one, since there is no correction.
3 Volatility levels show correlation in size in the short term.


To test the presence of non-linear relations we can still use the usual t or F distributions, but they are not flexible; in the maximum-likelihood world there are three possible tests: Wald, Likelihood Ratio (LR) and Lagrange Multiplier (LM). All of them work by comparing the unrestricted ML estimate with a restricted one [a sort of F-test]; more specifically, we compute the distance between the maximized values of the two functions:

$$LR = 2\,[\,l(\hat{\theta}) - l(\tilde{\theta})\,] \quad\text{where } l(\cdot) \text{ is the log-likelihood function}$$

The Wald test is based on the horizontal distance between the two estimates, the LM test compares the slopes of the two functions, and the LR statistic is simply the formula above, i.e. $2\ln[L(\hat{\theta})/L(\tilde{\theta})]$ in terms of the likelihood function. The LR statistic is distributed as a chi-square with m degrees of freedom (the number of restrictions).

Another, totally different class of variance models is the one assuming a random variance (note that the G-ARCH class assumes a deterministic variance given all the information at time t; basically the error term appears only in the mean). These models are called stochastic volatility models and add an error term to the variance equation, so the variance is no longer observed but latent (modeled indirectly). Unfortunately this class of models is hard to implement and to estimate properly.

Classic computation:

To this class belong the RiskMetrics model, implied volatility and the historical estimator [page 384]. They will capture heavy tails:

$$\sigma_t^2 = (1-\gamma)\,R_{t-1}^2 + \gamma\,\sigma_{t-1}^2 = (1-\gamma)\sum_{k\ge 0} \gamma^{k}\,R_{t-1-k}^2$$
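A minimal sketch of this exponential smoother; the decay value $\gamma = 0.94$ is the classic RiskMetrics choice for daily data and is an assumption of the example, as is the initialization at the sample variance.

```python
import numpy as np

def ewma_variance(returns, lam=0.94, init=None):
    """RiskMetrics-style exponential smoother for the conditional variance:
    sigma2_t = (1 - lam) * r_{t-1}^2 + lam * sigma2_{t-1}."""
    r = np.asarray(returns, float)
    sigma2 = np.empty_like(r)
    sigma2[0] = np.var(r) if init is None else init   # start from the sample variance
    for t in range(1, len(r)):
        sigma2[t] = (1 - lam) * r[t - 1] ** 2 + lam * sigma2[t - 1]
    return sigma2

rng = np.random.default_rng(1)
rets = rng.normal(scale=0.01, size=500)
print(ewma_variance(rets, lam=0.94)[-1])   # latest conditional variance estimate
```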

ARCH and G-ARCH estimators:

These are the most famous non-linear financial models, where the non-linearity is present in the volatility equation, while the mean is still assumed to be constant. The usage of these models can be suggested either by theory or by some sort of test, classified as general⁴ or specific. This class of estimators is superior to the previous one because it allows for mean reversion towards a long-run level (usually the unconditional variance, which is constant), while the conditional variance is time-varying.

The ARCH model is an autoregressive model whose definition is based on the concepts of conditional and unconditional expectation. This class of models aims to describe the conditional variance of the random time series, while it says nothing about the conditional mean, which can take any form⁵:

$$\begin{cases} y_t = \beta_1 + \beta_2 x_{2,t} + \beta_3 x_{3,t} + \varepsilon_t, & \varepsilon_t \sim N(0,\sigma_t^2) \\ \sigma_t^2 = \alpha_0 + \alpha_1\,\varepsilon_{t-1}^2 \end{cases}$$

The condition for this class of models is that all the parameters must be positive⁶, and to test the significance of the model we can run a null hypothesis of joint significance of the parameters, which is distributed as a chi-square. This specific class has several drawbacks: we do not know how many lags are needed, the model can easily become over-parameterized and hence not parsimonious, and the non-negativity constraints are easily breached when many parameters are employed.

The G-ARCH model solves most of the limits seen above: it is less likely to breach the non-negativity constraint and, since it allows an infinite number of past squared errors to influence the current conditional variance, it requires few parameters to capture the whole lagged relationship. There are some classes of G-ARCH models built to specifically capture asymmetric behavior in the data distribution: in the GJR-GARCH a Boolean variable [I = 0, 1] is added, which takes value 1 when the return is negative and 0 otherwise:

$$\sigma_t^2 = \omega + \beta\,\sigma_{t-1}^2 + \alpha\,\varepsilon_{t-1}^2 + \theta\, I_{t-1}\,\varepsilon_{t-1}^2$$

This specification is made to capture leverage: for negative returns the equity of the company is reduced, so leverage/risk increases since the debt does not change.

4 The most famous one is the BDS test, which is able to detect several possible departures from linearity; the specific tests are good at detecting one class of possible nonlinear relations, but will not be able to detect the other forms.
5 There could also be the presence of the conditional variance in the mean equation (GARCH-M).
6 This requirement is stronger than the minimum needed, but it is the only one feasible to apply.


The NG-ARCH model is defined as $\sigma_t^2 = \omega + \alpha\,(r_{t-1} - \theta\,\sigma_{t-1})^2 + \beta\,\sigma_{t-1}^2$, where the unconditional variance is

$$\frac{\omega}{1-\alpha(1+\theta^2)-\beta}$$

Another leverage model is the EG-ARCH, where the conditional variance is defined through the exponential function, hence it is always positive; however, the expected variance beyond one period cannot be calculated analytically. In its standard form:

$$\ln \sigma_t^2 = \omega + \beta\,\ln \sigma_{t-1}^2 + \gamma\,\frac{\varepsilon_{t-1}}{\sigma_{t-1}} + \alpha\left[\frac{|\varepsilon_{t-1}|}{\sigma_{t-1}} - \sqrt{2/\pi}\right]$$

To estimate the parameters of both ARCH and G-ARCH models we cannot use OLS⁷, but should rather employ maximum likelihood⁸, more specifically the log-likelihood function, in order to have additivity. For the estimated parameters to imply stationarity they need to sum to less than one, otherwise we have an integrated G-ARCH.
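A minimal sketch of this ML estimation for a GARCH(1,1), assuming demeaned returns and a Gaussian likelihood; the starting values, bounds and optimizer (SciPy's L-BFGS-B) are assumptions of the example rather than prescriptions from the notes.

```python
import numpy as np
from scipy.optimize import minimize

def garch11_neg_loglik(params, r):
    """Negative Gaussian log-likelihood of sigma2_t = w + a*r_{t-1}^2 + b*sigma2_{t-1}."""
    w, a, b = params
    T = len(r)
    sigma2 = np.empty(T)
    sigma2[0] = np.var(r)                      # initialize at the sample variance
    for t in range(1, T):
        sigma2[t] = w + a * r[t - 1] ** 2 + b * sigma2[t - 1]
    return 0.5 * np.sum(np.log(2 * np.pi) + np.log(sigma2) + r ** 2 / sigma2)

def fit_garch11(r):
    """Maximize the log-likelihood numerically; a sketch, not a production estimator."""
    x0 = np.array([0.1 * np.var(r), 0.05, 0.90])           # rough starting values
    bounds = [(1e-12, None), (0.0, 1.0), (0.0, 1.0)]       # positivity constraints
    res = minimize(garch11_neg_loglik, x0, args=(r,), bounds=bounds, method="L-BFGS-B")
    return res.x  # (omega, alpha, beta); alpha + beta < 1 for stationarity

rng = np.random.default_rng(2)
r = rng.normal(scale=0.01, size=1000)          # placeholder demeaned returns
print(fit_garch11(r))
```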

How to improve estimates:

There exist two methods to improve the quality of our estimates by using intraday observations: the range-based and the realized one.

The first one uses the variable $D = \ln(S^{HIGH}) - \ln(S^{LOW})$, whose expected squared value is $E(D^2) = 4\ln(2)\,\sigma^2$, so that $\sigma^2$ is estimated as

$$\hat{\sigma}^2 = \frac{1}{4\ln(2)}\cdot\frac{1}{T}\sum_i D_i^2 \approx 0.361\cdot\frac{1}{T}\sum_i D_i^2 \quad\text{(the Parkinson approximation)}$$

We can also use the G-ARCH volatility forecast itself as a proxy. This form of computation has been shown to be more persistent than the usual one, and the autocorrelations are positive and significant from lag 1 to 28 (on average). This variable can be used inside a G-ARCH equation as a new regressor to improve the quality of the estimates; note that even if the variable D and past volatility are highly correlated we do not care, since we are not interested in the single parameter but in the final output. This model is preferred for illiquid assets; in general it performs worse than a good realized-data estimator (e.g. 4-hourly).
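A minimal sketch of the Parkinson range estimator above, assuming clean high/low price series.

```python
import numpy as np

def parkinson_variance(high, low):
    """Range-based (Parkinson) estimate of the variance: D = ln(high) - ln(low),
    sigma2_hat = (1 / (4 ln 2)) * mean(D^2) ~= 0.361 * mean(D^2)."""
    d = np.log(np.asarray(high, float)) - np.log(np.asarray(low, float))
    return np.mean(d ** 2) / (4.0 * np.log(2.0))

# Toy example with made-up daily highs/lows
high = np.array([101.2, 102.0, 100.8, 101.5])
low  = np.array([ 99.8, 100.4,  99.5, 100.1])
print(parkinson_variance(high, low))
```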

The second approach is basically the use of intraday observations. This method increases the quality of the estimate, since the precision improves with the sampling frequency; furthermore it has been shown that, at high frequency and for highly liquid assets, the log estimate is close to normally distributed. In this case it is possible to estimate the logarithm of volatility with an LRW model or an ARMA model, but since we are interested in the level we must be aware of the transformation (it is non-linear). This class of estimators performs quite well in the short term; for long horizons we need an integrated ARIMA model to ensure mean reversion/de-trending. The drawback of this approach is that sometimes the quality of the data is poor, so the estimation can be noisy due to market microstructure.

These methodologies can be applied to improve covariance estimates as well. The realized approach can be applied directly with the usual caveats, while the range-based one needs a correction: since the variance of the combined position is $\sigma_1^2 + \sigma_2^2 + 2\sigma_{1,2}$, we can estimate the covariance by inverting that formula and end up with the proxy

$$\widehat{cov} = 0.185\,\left(D_3^2 - D_1^2 - D_2^2\right)$$

where $D_3$ is the range of the combined position; all the other considerations remain valid.

7 OLS focuses only on the parameters of the conditional mean, not on those of the variance, hence its optimization target is not the right one.
8 Basically we differentiate the (log-)likelihood function with respect to its parameters and look for the maximum point; in the non-linear case the function can show local maxima, hence it is crucial to properly specify the algorithm so that it finds the true global maximum. This method is usually run assuming normally distributed errors; however, even if that hypothesis does not hold, the estimates are still consistent.

Modeling long-run relationships in finance:

As a prelude to this discussion we need to define the concept of stationarity and its importance in finance. A variable is stationary if its realizations share the same distribution over time, so that confidence intervals (IC) do not change; however, this is a strong assumption, hence we usually refer to stationarity with a weaker requirement, namely having a long-run mean. This is achieved if the random variable has constant mean and variance and an autocovariance/autocorrelation depending only on the lag (white noise is always a stationary process). The autocorrelation is represented with a specific function, the autocorrelation function (acf). Box and Pierce developed a test for the joint null hypothesis that the first n autocorrelation coefficients are zero; this test is distributed as a chi-square.

In finance it is important to deal with stationarity, since we want shocks that die away after some time. Non-stationary series give rise to spurious regressions: the parameter estimates and their apparent significance are unreliable, and the R² is misleading, since the variables may simply be trending together. Furthermore our estimates lose their asymptotic properties, compounding the problem of inconsistency.

There are basically two kinds of non-stationarity: the random walk with drift and the trend-stationary process. The first is an AR(1) model with a unit coefficient on the lagged level (a unit root), so that all past shocks have infinite memory; in the second the trend is a deterministic function of time. The plots of these series show a trending behavior, hence the mean is not constant, while a stationary process shows a plot similar to white noise around a constant mean.

These two processes must be treated differently to eliminate the non-stationarity: the first needs a differencing/unit-root procedure, while the second needs a de-trending procedure. It is important to use the appropriate method, otherwise we introduce errors: a deterministic trend treated with the differencing procedure introduces an MA(1) structure which is not invertible and has undesirable properties; conversely, if one tries to de-trend a stochastic trend, the non-stationarity is not removed.

Differencing procedure:

The differencing procedure can be generalized to series with more than one unit root: we say that a series is integrated of order d, written $y \sim I(d)$, to express the number of unit roots; in general a series of order d needs to be differenced d times to remove the non-stationarity.

$$y_t = b_1 + b_2 x_t + b_3 y_{t-1} + \varepsilon_t \qquad\qquad \Delta y_t = \delta_1 + \delta_2 x_t + \delta_3 y_{t-1} + \theta_t$$

Parameters 1 and 2 are the same in the two formulations; parameter 3 is different but has the same standard deviation, and the residuals are the same.

To test for a unit root we might look at the autocorrelation function; however, this method is not appropriate, since the acf may show decay even if the series has infinite memory of shocks. To test for a unit root we need other approaches:

- The first one proposed is to test the coefficient of the AR(1) under the null hypothesis that it equals 1. The model is estimated in its differenced form, testing whether the coefficient on the lagged level is still different from 0; we can also add an intercept and a deterministic trend to the differenced equation. The test statistic follows a non-standard distribution, and the null is rejected for values more negative than the critical one. The test requires that the error terms are not correlated with one another; to ensure this we can add n lags⁹ of the differenced variable to our equation (augmented Dickey-Fuller test). A sketch using a standard ADF implementation appears below.
- We could also test for higher-order unit roots, i.e. check whether the series is of order s greater than the chosen d. This matter is not important in finance, since no financial time series has ever contained more than one unit root.

9 To choose the appropriate number of lags it is suggested to use a rule of thumb based on the frequency; it is crucial to choose the correct number, since too few lags may leave autocorrelation in the model, while too many may reduce the power of the test since we are using up degrees of freedom.
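A minimal sketch of the augmented Dickey-Fuller test using the statsmodels implementation; picking the lag length by AIC is an assumption of the example, not the rule of thumb mentioned in the footnote.

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

# ADF test on a simulated random walk, which should NOT reject the unit-root null.
rng = np.random.default_rng(3)
y = np.cumsum(rng.normal(size=500))            # random walk: contains one unit root

stat, pvalue, usedlag, nobs, crit, icbest = adfuller(y, regression="c", autolag="AIC")
print(f"ADF statistic = {stat:.3f}, p-value = {pvalue:.3f}, lags used = {usedlag}")
print("Reject unit root at 5%?", stat < crit["5%"])   # reject if more negative than the critical value
```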


Cointegration:

Now we consider the case of two integrated variables and their joint behavior, so-called cointegration. If two integrated variables are combined, the combination will in general have an order of integration equal to the largest of the two.

$$y_t = \beta_1 + \beta_2\, y_{t-1} + \beta_3\, x_{t-1} + \varepsilon_t \qquad x_t = b_0 + b_1\, x_{t-1} + \varepsilon_{2,t} \qquad\text{where both variables are } I(1)$$

The short-run elasticity is given by $\beta_3$, while the long-run one is $\frac{\beta_3}{1-\beta_2}$. The previous set of equations can be rewritten in differences:

$$\Delta y_t = \beta_1 - (1-\beta_2)\left(y_{t-1} - \frac{\beta_3}{1-\beta_2}\,x_{t-1}\right) + \varepsilon_t \qquad \Delta x_t = b_0 + \varepsilon_{2,t}$$

where $\frac{\beta_3}{1-\beta_2}\,x_{t-1}$ can be interpreted as the long-run equilibrium value of y.

It is desirable to obtain residuals that are not integrated; we can do that if there exists a combination of the two variables which is stationary, meaning the two variables co-move or, more generally, are bound by some sort of long-run relationship. This kind of behavior is very frequent in finance.

Differently from the simple differencing procedure, cointegration aims to preserve the long-run relationship by adding lagged levels of the cointegrated variables, known as error correction terms:

$$\Delta y_t = \beta\,\Delta x_t + \alpha\,(y_{t-1} - x_{t-1}) + \varepsilon_t$$

This new formulation allows us to define an equilibrium between the variables, where the coefficient $\alpha$ represents the speed of reversion towards it; the model can be expressed using more than two variables.

There exist several tests to check the existence of cointegration among variables, based on a residual analysis similar to the Durbin-Watson: we use the residuals to see whether they are integrated or not, the null hypothesis being that the residuals are integrated (no cointegration). Of course this method depends on the model chosen to explain the relationship and does not provide us with the correct specification. If the test fails we should just use the differencing method to eliminate the integration problem; however, in that case there is no long-run solution.

To estimate the parameters we cannot use OLS straightforwardly¹⁰; instead there are three approaches: Engle-Granger, Engle-Yoo and Johansen.
The first method consists of using OLS to run our model with all the variables I(1) and saving the residuals, which must be I(0), otherwise we cannot continue with the estimation. We then run the equation again but, instead of the explicit error-correction term, we use the lagged residuals; the two variables are now usable for inference. This approach suffers from lack of power and possible simultaneity bias. A two-step sketch follows below.

Note also that the IC (confidence intervals) for integrated variables increase to infinity.

10 The parameters are not meaningful if there is more than one cointegrating relationship.


Guidolin Parts:

Econometric analysis:

Types of data:

Time series are data collected over a period of time on one or more variables. Possible issues are: the frequency needs to be chosen so as to avoid bias or insignificant observations and to ensure continuously or regularly spaced data; the variables can be qualitative or quantitative, and both are naturally ordered chronologically. A time series of random variables is a stochastic process; the probability structure of a random variable is determined by the joint distribution of the stochastic process, where we can assume a constant plus an error term whose dynamics describe the randomness. We use time series to study the change of some variable over time, to perform event studies, and in general for all the cases in which the time dimension is the most important one.

Cross-sectional data are data on one or more variables collected at a single point in time, so there is no natural ordering of the observations. These data can be used to study the relationship between variables.

Panel data have both the time-series and the cross-sectional dimension; they are the data providing the most information.

The data collected can be continuously or discretely distributed (depending on the variable observed). Another important characteristic is whether the data are cardinal, ordinal or nominal numbers:

- Cardinal numbers are those for which the values have a direct meaning: twice as big a value means twice as much.
- Ordinal numbers can be interpreted as providing a positioning/ordering: being ranked twice as high does not mean twice the value.
- Nominal numbers do not carry any information either in ordering or in value.

Steps involved in formulating an econometric model:

1. First we state the general statement of the problem, keeping in mind that we do not need to capture every relevant real-world phenomenon, but the model should provide a sufficiently good approximation (be useful).
2. The second stage is to collect the data; remember to understand them before starting to use them.
3. Third, we choose our estimation model and apply it.
4. Fourth, we check the hypotheses of the model, whether they are met, and understand possible deviations.
5. Fifth, we interpret the results of the model and formulate our theory.

OLS model:

OLS, ordinary least squares, is a method used to estimate the parameters of a linear regression: a variable regressed on common factors, hence evaluating the relationship between a given variable and one or more other variables¹¹. It assumes a linear relationship between the dependent variable and the coefficients (weights), hence the independent variables are free to take any form.

Besides OLS there exist other methods to estimate the regression parameters: the method of moments and maximum likelihood. The OLS estimate consists of minimizing $\varepsilon'\varepsilon$, the sum of squared errors $[Y-(\beta_0+\beta_1 X)]^2$: we want our model to equal Y on average. This method is preferred because it has an analytic solution and, under certain hypotheses, is superior to any other method, as proved by the Gauss-Markov theorem. An estimator needs an important feature to be useful, namely unbiasedness; if that cannot be achieved we require consistency, an asymptotic property which requires weaker hypotheses on the error and on its correlation with the independent variables.

11 It is different from the correlation measure, which does not imply any direction of causation; it simply states the presence of a linear relationship.

The procedure consists of setting the first derivatives to 0 (a sufficient condition, since the function is convex); we end up with

$$\hat{\beta} = (X'X)^{-1}X'Y, \qquad E(\hat{\beta}) = \beta + E\!\left[\frac{\sum_i (x_i-\bar{x})\,\varepsilon_i}{\sum_i (x_i-\bar{x})^2}\right] = \beta, \qquad V(\hat{\beta}) = \sigma_\varepsilon^2\,(X'X)^{-1}$$

$$\hat{\beta}_0 = \bar{y} - \bar{x}\,\hat{\beta}, \qquad E(\hat{\beta}_0) = \beta_0 + E(\beta-\hat{\beta})\,\bar{x}, \qquad V(\hat{\beta}_0) = V(\bar{y} - \bar{x}\,\hat{\beta}) = V\!\left(\frac{\sum_i y_i}{n} - \bar{x}\,\frac{\sum_i (x_i-\bar{x})\,y_i}{\sum_i (x_i-\bar{x})^2}\right)$$

From these formulas we see that to increase the estimation quality we need to increase the range (dispersion) of the independent variables X or to improve the estimate of $\sigma_\varepsilon^2$.

$$Y = X\beta + \varepsilon; \qquad \hat{Y} = X(X'X)^{-1}X'Y; \qquad \hat{\varepsilon} = \left(I - X(X'X)^{-1}X'\right)Y$$

As these formulas show, the randomness of the dependent variable comes from the presence of the error; under the weak hypotheses we can say

$$E(\hat{Y}) = E(Y) = X\beta, \qquad V(\hat{Y}) = \sigma_\varepsilon^2\, X(X'X)^{-1}X', \qquad V(Y) = \sigma_\varepsilon^2\, I_n$$

$$E(\hat{\varepsilon}\,\hat{Y}') = 0, \qquad E(\hat{\varepsilon}\,Y') = \sigma_\varepsilon^2\left(I - X(X'X)^{-1}X'\right)$$

The two estimated parameters are invariant to any change in the unit of measure. Regarding the intercept, we must keep in mind that it is in general not a good proxy of the true intercept, because it also collects all the "garbage" of the model; furthermore it can be meaningless if there are no observations of the independent variables close to the origin.

Since we are using sample data, not the population, we need to perform some estimates; one of the most important is the error variance, estimated with the classic variance formula corrected for the absorption of two degrees of freedom.
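A minimal sketch of the normal-equation formulas above, assuming NumPy; the toy data-generating process is an assumption of the example.

```python
import numpy as np

def ols(X, y, add_intercept=True):
    """OLS via the normal equations: beta_hat = (X'X)^{-1} X'y,
    V(beta_hat) = s^2 (X'X)^{-1}, with s^2 the degrees-of-freedom-corrected residual variance."""
    X = np.asarray(X, float)
    if X.ndim == 1:
        X = X[:, None]
    if add_intercept:
        X = np.column_stack([np.ones(len(X)), X])
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    s2 = resid @ resid / (len(y) - X.shape[1])      # sigma^2_eps estimate (n - k degrees of freedom)
    return beta, s2 * XtX_inv                        # coefficients and their covariance matrix

# Toy example: y = 1 + 2x + noise
rng = np.random.default_rng(5)
x = rng.normal(size=300)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=300)
beta, cov = ols(x, y)
print(beta, np.sqrt(np.diag(cov)))     # estimates and standard errors
```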

The OLS HP:

OLS requires certain hypotheses to work properly and to allow the user to make inference (build confidence intervals). These hypotheses are divided into weak and strong: the first are used to ensure desirable properties of the estimates, namely unbiasedness, while the second are needed to perform significance tests and to build confidence intervals and forecasts.

The weak hypotheses are three and together they ensure that the OLS estimators are BLUE:

1. The expected value of the error is 0 (always the case if the intercept is included in the model) and the errors are not correlated with X; if X is random we require $E(\varepsilon \mid X) = 0$.
2. The variance of the error is constant and the correlation among errors is 0. If this hypothesis fails, so that $V(\varepsilon) = \Sigma$, we can still estimate $\beta$ with generalized least squares, where $\hat{\beta}_{gls} = (X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}Y$, and it is still BLUE.

   In GLS, write $V(\varepsilon) = \Sigma = BB'$; then $B^{-1}Y = B^{-1}X\beta + B^{-1}\varepsilon \Rightarrow Y^{\circ} = X^{\circ}\beta + \varepsilon^{\circ}$, where $Y^{\circ} = B^{-1}Y$ and $V(\varepsilon^{\circ}) = B^{-1}\Sigma B^{-1\prime} = I$, so we can still minimize $\varepsilon^{\circ\prime}\varepsilon^{\circ} = (Y^{\circ}-X^{\circ}\beta)'\,I^{-1}\,(Y^{\circ}-X^{\circ}\beta)$.

   Note that $V(\hat{\varepsilon}) \neq V(\varepsilon)$, since $\hat{\varepsilon} = \varepsilon - (\hat{\beta}_0-\beta_0) - (\hat{\beta}-\beta)X$, hence $V(\hat{\varepsilon}) = \sigma_\varepsilon^2\left(I - X(X'X)^{-1}X'\right)$.
3. The matrix X is full rank, to avoid multicollinearity and to ensure that the matrix (X'X) is invertible. The effect of multicollinearity is an increase in the variance of the betas:

$$V(\hat{\beta}_j) = \frac{\sigma_\varepsilon^2\,(X'X)^{-1}_{jj}}{1-R_j^2} \quad\text{where } R_j^2 \text{ comes from regressing } X_j \text{ on the other independent variables}$$


The Gauss-Markov theorem states that $\hat{\beta}_{ols}$ is BLUE, using the definition of variance efficiency: if $\hat{\beta}_{I}$ and $\hat{\beta}_{II}$ are both unbiased estimators, we say that $\hat{\beta}_{I}$ is not worse than $\hat{\beta}_{II}$ iff $V(\hat{\beta}_{II}) - V(\hat{\beta}_{I})$ is at least positive semi-definite. Consider any other linear estimator $\tilde{\beta} = \left((X'X)^{-1}X' + C\right)Y$:

$$E(\tilde{\beta}) = \left((X'X)^{-1}X' + C\right)X\beta = \beta + CX\beta, \quad\text{hence } CX = 0 \text{ for it to be unbiased}$$

$$V(\tilde{\beta}) = \sigma_\varepsilon^2\left((X'X)^{-1}X' + C\right)\left((X'X)^{-1}X' + C\right)' = \sigma_\varepsilon^2\left[(X'X)^{-1}X'X(X'X)^{-1} + (X'X)^{-1}X'C' + CC' + CX(X'X)^{-1}\right] = \sigma_\varepsilon^2\left[(X'X)^{-1} + CC'\right]$$

so $V(\tilde{\beta}) - V(\hat{\beta}_{ols}) = \sigma_\varepsilon^2\,CC'$, which is positive semi-definite.

The strong hypotheses are two: the errors are independent of each other and of X, and they are normally distributed. It follows that the betas have the same (normal) distribution, since they are linear combinations of the errors. Under these hypotheses we can build confidence intervals and test the statistical significance of the model parameters.

Tests used to assess the hypotheses and other model features:

There are several tests used to assess the fit and the significance of the coefficients, jointly and one by one. Since the error variance is unobservable we use the sample variance, so instead of the Gaussian distribution we use the t-Student distribution. The t-ratio is a special type of test in which we check whether a variable differs from 0:

$$t = \frac{\bar{x} - x_0}{s_{\bar{x}}}$$

The percentile $z_{a}$ defines the confidence interval and we check whether 0 is included, or we can use the "a" p-value; n is the number of degrees of freedom, and one such test is run for each estimator.

The general idea is to divide the numerator (the hypothesis) by its standard deviation. The paired-sample procedure tests the difference between estimators by introducing the difference "d" as a new parameter in the model; the standard deviation is then computed automatically by the model and accounts for the potential correlation¹² among the estimators.

The F-test is used to test several hypotheses jointly. The F ratio is defined as

$$F = \frac{\left[V(\varepsilon_r) - V(\varepsilon_{ur})\right]/q}{V(\varepsilon_{ur})/(n-k-1)} = \frac{\left(R^2_{ur} - R^2_{r}\right)/q}{\left(1-R^2_{ur}\right)/(n-k-1)}$$

where k is the number of parameters and q is the number of restrictions tested. Clearly, the larger the improvement of the unconstrained model over the constrained one in explaining the relationship, the higher the ratio and the more significant it will be. The F-test on a single variable gives in general the same result as a two-sided t-test.

The $R^2 = \frac{V(\hat{Y})}{V(Y)} = 1 - \frac{V(\hat{\varepsilon})}{V(Y)} = \rho^2(y;\hat{y})$ if the constant is included in the regression or, more precisely, if $\bar{\hat{Y}} = \bar{Y}$. This measure can never decrease when new independent variables are added.

$$V(y) = \frac{1}{n}\sum_i (y_i - \bar{y})^2$$

How to deal with failures of the OLS hypotheses: Some considerations based on exam questions:

1. $Cov(r_1, r_2)$, where both returns have been computed on the same factor, is equal to $\beta_1\beta_2 V(f)$.
2. If the constant is included in the model we have $Cov(Y, \hat{Y}) = V(\hat{Y})$.
3. $V(y) = \beta' V(X)\beta + V(\varepsilon)$.
4. If we build a confidence interval for a forecast, the interval is $\tilde{x}\hat{\beta} \pm z_{\alpha}\sqrt{V(\tilde{\varepsilon})}$, where $V(\tilde{\varepsilon}) = V(\tilde{x}\hat{\beta} - y_f)$.
5. The mean squared error of an estimator is $E_{\theta}\big((T - E_{\theta}(T))^2\big) + (E_{\theta}(T) - \theta)^2$, where T is the estimator and $\theta$ the true value.
6. If we do not consider the complete model, but omit one independent variable which is correlated with the included ones, there will be a bias in our coefficients, since they will be correlated with the error: $\tilde{\beta}_1 = \beta_1 + \beta_2\delta$, where $\delta$ is the coefficient of $x_2$ regressed on $x_1$. With $y = B_0 + B_1 x + B_2 z + \varepsilon_1$ as the real model and $y = a_0 + a_1 x + \varepsilon_2$ as the estimated model, the bias is $E(a_1) - B_1$, obtained by regressing $z = x\gamma_1 + \varepsilon_3$.
7. If the intercept is excluded from the model (and it is effectively different from zero) then the estimates of the betas are biased; however, if the intercept really is 0, the variances of the coefficients will be lower. In that case $E(\hat{\varepsilon}) = \beta_0\,\mathbf{1}$.
8. If $Cov(x_i, x_j) = 0$ then each beta can be estimated by the univariate formula: from the normal equations $X'Y = X'X\hat{\beta} + X'\hat{\varepsilon} = X'X\hat{\beta}$, which in deviation-from-the-mean form reads $V\hat{\beta}$, where V is the Var-Cov matrix of X; if the statement is true that matrix is diagonal, so each $\hat{\beta}_j$ reduces to $Cov(x_j, y)/V(x_j)$.

12 Positive correlation will reduce the variance.

Endogenous variables model:

This class of models has been introduced to overcome the problem of endogeneity in the X matrix: the matrix itself is stochastic, so there is a bias in the estimates and even the consistency property is lost. Being endogenous means that the variables used are determined by the others¹³.

To identify whether variables are endogenous there exist some tests, but first we provide a definition of exogeneity; actually we need two definitions, since there are two forms of exogeneity: a predetermined variable is one that is independent of contemporaneous and future errors in the equation; a strictly exogenous variable is one that is independent also of all past errors.

Simultaneous Equation model:

We need to find a reduced-form equation for each endogenous variable and use these new equations in our simultaneous estimates, so that all the variables used are exogenous. These reduced equations do not always allow us to directly retrieve the original coefficients of the single endogenous variables.

The problem above is called the identification issue, meaning that the extraction of the original values depends on the information available; in other words, the number of structural parameters to be retrieved must be at most equal to the number of available equations. There are two conditions to be met for an equation to be identified: the order and the rank conditions.

VAR model:

This class of models is a mix of univariate time-series and simultaneous-equation models. It has often been advocated as a solution to endogenous structures, since VARs are flexible and compact in notation.

The idea of this class of models is that all variables are endogenous, hence we do not need to impose any identification restrictions, perform any test or study the theory behind the variables' behavior. This is possible because at any time t all lagged variables are known and therefore predetermined. On the other hand there are some drawbacks: it is an a-theoretical model; there is no guideline on how many lags or parameters are needed; there are lots of parameters to be estimated; and, besides all this, the concept of stationarity becomes fuzzy, since any differencing procedure reduces the quality of the information available to model the relationship between the variables.

13 A classic example is price and quantity.

Studying the non-normality of distribution:

Understanding the data distribution is important in finance to properly manage risk. All the econometric models developed in this section aim to explain the behavior of financial data.

Method to detect non-normality:

These methodologies consist of tests, graphical procedures and possible data smoothers. This task is really important since we are interested in defining risk measures which depend on the tail of the distribution.

Smoother/converter into a continuous distribution:

Since financial data are discretely distributed, we need some algorithm to convert them into a continuous distribution. We can use a kernel estimator, which depends on two parameters: the bandwidth h and the kernel function K(x):

$$\hat{f}(x) = \frac{1}{nh}\sum_{i=1}^{n} K\!\left(\frac{x-X_i}{h}\right); \qquad \int K(x)\,dx = 1; \qquad h = 0.9\,\hat{\sigma}\,T^{-1/5}$$

The kernel function can take three possible forms: Gaussian, Epanechnikov and triangular; in practice this choice does not really matter. The bandwidth h is chosen to minimize the integrated MSE¹⁴.
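A minimal sketch of this kernel density estimate; the Gaussian kernel is chosen for convenience (the notes say the choice does not really matter) and the rule-of-thumb bandwidth above is used as the default.

```python
import numpy as np

def gaussian_kde(data, grid, h=None):
    """Kernel density estimate f_hat(x) = (1/nh) * sum_i K((x - X_i)/h) with a Gaussian kernel.
    The bandwidth defaults to the rule of thumb h = 0.9 * sigma * n^(-1/5)."""
    data = np.asarray(data, float)
    n = len(data)
    if h is None:
        h = 0.9 * data.std(ddof=1) * n ** (-1 / 5)
    z = (grid[:, None] - data[None, :]) / h            # (grid points) x (observations)
    kernel = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
    return kernel.mean(axis=1) / h

rng = np.random.default_rng(6)
returns = rng.standard_t(df=5, size=1000) * 0.01       # fat-tailed toy returns
grid = np.linspace(returns.min(), returns.max(), 200)
density = gaussian_kde(returns, grid)
print(density.max())
```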

The Jarque-Bera test:

This famous test is based on the analysis of the third and fourth moments; it measures departure from normality according to the formula

$$JB = \frac{T}{6}\,skew^2 + \frac{T}{24}\,(kurt-3)^2$$

The test statistic is distributed as a chi-square with two degrees of freedom; high values provide strong evidence against normality. If you standardize the data with their unconditional variance the test result does not change.
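A minimal sketch of the JB statistic and its chi-square p-value, assuming NumPy and SciPy.

```python
import numpy as np
from scipy.stats import chi2

def jarque_bera(x):
    """JB = T/6 * skew^2 + T/24 * (kurt - 3)^2, compared against a chi-square with 2 df."""
    x = np.asarray(x, float)
    T = len(x)
    z = (x - x.mean()) / x.std()
    skew = np.mean(z ** 3)
    kurt = np.mean(z ** 4)
    jb = T / 6 * skew ** 2 + T / 24 * (kurt - 3) ** 2
    return jb, chi2.sf(jb, df=2)          # statistic and p-value

rng = np.random.default_rng(7)
print(jarque_bera(rng.normal(size=2000)))              # should not reject normality
print(jarque_bera(rng.standard_t(df=4, size=2000)))    # fat tails: large JB, tiny p-value
```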

Q-Q plots:

A less formal yet powerful method to visualize non-normality is to plot the quantiles of the standardized data¹⁵ against those of a normal distribution in a scatter plot; graphically, the data should lie on the 45° line. This method allows us to see where the non-normality occurs.

How to fix non-normality issues:

Academics have decided to describe data assuming that they are unconditionally non-normally distributed, while the conditional distribution is IID with dynamic/time-varying densities.

We developed G-ARCH models to try to fix/eliminate the non-normality of the data; however, empirical data show that we still fail to fully achieve that result. An explanation of this behavior is that the G-ARCH errors are still assumed to be normal, hence a possible easy solution is to substitute the normal with a t-Student distribution.

The new parameter needed to use the t-Student is d, the degrees of freedom, which is computed numerically to maximize the fit of the distribution to the empirical data. It must be greater than 2 (to guarantee the existence of the variance at least) and can be a real number; the higher its value, the closer the distribution gets to a normal. This distribution has a polynomial decay factor with fatter tails, but still no skewness. If we use this distribution to compute the VaR, we need to correct the sample variance estimate by the factor $\left(\frac{d-2}{d}\right)^{1/2}$.

14 The distance between the histogram and the continuous cumulative distribution.
15 The moments are the sample ones, which are distributed according to a Normal (under the assumption of normally distributed errors) with variances 6 and 24 and expected value 0 [that is why they are standardized].

Another possible solution is the Cornish-Fisher approximation, which can be defined as a Taylor expansion of the normal quantile that takes into account the skewness and kurtosis indices:

$$CF^{-1}_p = \Phi^{-1}_p + \frac{\delta_1}{6}\left[(\Phi^{-1}_p)^2 - 1\right] + \frac{\delta_2}{24}\left[(\Phi^{-1}_p)^3 - 3\,\Phi^{-1}_p\right] - \frac{\delta_1^2}{36}\left[2\,(\Phi^{-1}_p)^3 - 5\,\Phi^{-1}_p\right]$$

where $\delta_1$ is the skewness and $\delta_2$ the excess kurtosis.
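A minimal sketch of a VaR computed with the Cornish-Fisher quantile above; reading $\delta_1$ and $\delta_2$ as the sample skewness and excess kurtosis, and reporting the loss as a positive number, are assumptions of this example.

```python
import numpy as np
from scipy.stats import norm, skew, kurtosis

def cornish_fisher_var(returns, alpha=0.01):
    """One-period VaR from the Cornish-Fisher expansion of the normal quantile."""
    r = np.asarray(returns, float)
    d1, d2 = skew(r), kurtosis(r)              # scipy's kurtosis is already the excess kurtosis
    z = norm.ppf(alpha)
    cf = (z
          + d1 / 6 * (z ** 2 - 1)
          + d2 / 24 * (z ** 3 - 3 * z)
          - d1 ** 2 / 36 * (2 * z ** 3 - 5 * z))
    return -(r.mean() + cf * r.std())          # loss reported as a positive number

rng = np.random.default_rng(8)
rets = rng.standard_t(df=4, size=2000) * 0.01
print(cornish_fisher_var(rets, alpha=0.01))
```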

A totally different approach is to study just the tail of the distribution, without spending energy on the whole data set; this method, named Extreme Value Theory, provides a good solution to our problem. We analyze the distribution of rescaled data conditional on a given threshold u, $\Pr\{z < -(x+u) \mid z < -u\}$, and we assume that those exceedances follow a generalized Pareto (GP) distribution, which depends on the parameter $\varepsilon$:

$$1-\left(1+\frac{\varepsilon x}{\beta}\right)^{-1/\varepsilon} \text{ if } \varepsilon \neq 0; \qquad 1-e^{-x/\beta} \text{ if } \varepsilon = 0 \text{, in which case the GP has a Gaussian-like exponential decay factor}$$

Values of $\varepsilon$ greater than zero imply a thick tail, while negative values imply thin tails.

To estimate the parameter we can use Hill's estimator (if $\varepsilon > 0$), approximating the tail as $B(x)\,x^{-1/\varepsilon} = c\,x^{-1/\varepsilon}$, so as to absorb the parameter $\beta$ into the constant c. The problem with this methodology is that it does not provide a guideline for choosing u and, since the estimate depends heavily on it, the method is criticized as noisy.

To complete the information provided by the methodologies used to compute the VaR we can introduce the Expected Shortfall, which gives us an idea of the possible distribution/outcomes of the extreme values beyond our VaR; it allows us to distinguish between methods when computing extreme quantiles at higher confidence levels (it may be the case that values are similar at 1% but totally different at 0.1%). It is basically the expected value conditional on exceeding the VaR. The ratio between ES and VaR is greater than one in the case of fat tails, while it is (asymptotically) equal to 1 for Gaussian returns.
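A minimal sketch of the two tail tools just described; using the empirical 1% VaR as the threshold u is an arbitrary assumption of the example, which is precisely the weak point noted above.

```python
import numpy as np

def hill_tail_index(losses, u):
    """Hill's estimator of the tail parameter above a threshold u:
    xi_hat = mean of log(loss / u) over the exceedances (valid for a positive tail index)."""
    losses = np.asarray(losses, float)
    exceed = losses[losses > u]
    return np.mean(np.log(exceed / u)), len(exceed)

def hist_var_es(returns, alpha=0.01):
    """Empirical VaR and Expected Shortfall (losses as positive numbers)."""
    losses = -np.asarray(returns, float)
    var = np.quantile(losses, 1 - alpha)
    es = losses[losses >= var].mean()          # expected loss conditional on exceeding the VaR
    return var, es

rng = np.random.default_rng(9)
rets = rng.standard_t(df=3, size=5000) * 0.01
var, es = hist_var_es(rets, alpha=0.01)
xi, n_u = hill_tail_index(-rets, u=var)        # threshold = 1% VaR (an arbitrary choice)
print(f"VaR={var:.4f}, ES={es:.4f}, ES/VaR={es/var:.2f}, Hill xi={xi:.2f} from {n_u} exceedances")
```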

Multivariate models:

We need to develop multidimensional models to obtain a more realistic and actively managed risk tool, hence we need models to compute correlations among assets. In theory we could stay in the univariate world by running our estimation on the portfolio as a single asset; however, by doing so we severely limit our work, since for any change in the portfolio composition we would need to re-run the whole model. This may be a suitable solution for a passive strategy, where the relative weights of the portfolio do not change, but it is absolutely unfeasible for active purposes.

On the other hand the computation of the correlation matrix is not a trivial task; it suffers from the so-called saturation problem: the number of parameters needed for n securities increases at a higher rate than n, so we need more and more observations for any added security¹⁶. A ratio that measures this problem is the saturation ratio, i.e. the ratio between the total number of observations and the number of parameters to be estimated. To solve this problem we need to develop the approaches below.

16 $(i-0.5)/T$ is the formula used to compute the empirical frequency probability.


Exposure Mapping models:

We can overcome the computational problem by using a factor model. We run the model to compute the betas for each variable, from which we get the correlation measure. The limit is that this method ignores the idiosyncratic components, hence it works only for well diversified portfolios, where the idiosyncratic component is negligible:

$$Cov(x_i, x_j) = \beta_i\,\beta_j\,\sigma^2_{mkt}$$

Furthermore it is not obvious which factors should be chosen.

Constant Conditional Covariance model:

In this case we use the same algorithms used to compute variances to compute the covariance measure, hence depending on the algorithm chosen we get our result: the moving-average estimate, the exponential smoother, or a G-ARCH-based one.

In the last two cases we need to add a restriction on the parameters: they must be the same for all the securities and constant over time, i.e. they must not depend on them. This restriction is quite fuzzy and does not match real-world behavior, but it ensures that the variance-covariance matrix respects the PSD property. The conditional covariance is nevertheless time-varying:

$$cov_t = \rho_{1,2}\,\sqrt{\sigma^2_{1,t}\,\sigma^2_{2,t}}$$

DCC model:

The previous problem of constant parameters across securities is dealt with by this class of models through a two-step approach¹⁷ focused only on capturing the time dynamics: first we estimate the variances with suitable methods, then we estimate the correlations using the data standardized by the previously estimated volatilities (the correlation matrix is the same for returns and standardized returns).

The correlation matrix estimation needs auxiliary variables to ensure that the estimated values fall in the interval [-1, 1]:

$$\rho_{1,2,t} = \frac{q_{1,2,t}}{\sqrt{q_{1,1,t}\,q_{2,2,t}}}$$

The suggested approach to build these auxiliary variables is to use a G-ARCH-type dynamic, with parameters that are the same across all the securities. This method is easy to implement and manageable, since we need to estimate few parameters simultaneously (QMLE used in waves); an MLE would be more efficient but is unsuitable for large portfolios.
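A minimal sketch of a G-ARCH-type recursion for the auxiliary variables q and the resulting correlation, for two series of already-standardized returns; the values of a and b are placeholders, since in practice they are estimated by (quasi-)ML as described above.

```python
import numpy as np

def dcc_correlation(z1, z2, a=0.05, b=0.93):
    """DCC-style recursion:
        q_ij,t = (1 - a - b) * qbar_ij + a * z_{i,t-1} z_{j,t-1} + b * q_ij,t-1
        rho_12,t = q_12,t / sqrt(q_11,t * q_22,t)"""
    z = np.column_stack([z1, z2])
    qbar = np.corrcoef(z.T)                      # unconditional correlation target
    q = qbar.copy()
    rho = np.empty(len(z))
    rho[0] = qbar[0, 1]
    for t in range(1, len(z)):
        q = (1 - a - b) * qbar + a * np.outer(z[t - 1], z[t - 1]) + b * q
        rho[t] = q[0, 1] / np.sqrt(q[0, 0] * q[1, 1])
    return rho

rng = np.random.default_rng(10)
z1, z2 = rng.normal(size=(2, 500))               # pretend these are GARCH-standardized returns
print(dcc_correlation(z1, z2)[-5:])
```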

VEC and BEKK model:

First of all we define VEC as the function that converts the upper-triangular part of a matrix into a vector. This function is used to deal with multivariate models where we need to impose some structure on the parameters and estimators, both to ensure the PSD property and to manage the number of parameters needed¹⁸.

The suggested method is a five-step process: 1) variance targeting for the unconditional value, to avoid big fluctuations for any small change in the key parameters; 2) we then compute the diagonal values with a G-ARCH model, restricting the parameter matrices to be diagonal, but different for each asset.

The BEKK is computationally demanding (it is still an over-parameterized model), but it is becoming more popular. This multivariate class of models does not have specific diagnostic checks; people still use the univariate ones, even though this choice is questionable because the size of the tests cannot be controlled. The elements that are checked are the adequacy of the specification and the ex-ante evidence of multivariate effects.

17 To give an example, with 15 assets we have 15·14/2 = 105 parameters, hence we need at least 160 period observations, meaning 14 years of monthly data or 32 weeks.
18 We decompose the Var-Cov matrix into the correlation matrix and the standard-deviation matrix; by doing so we can estimate them separately.


Principal component approach:

This approach is based on the following steps (a sketch of the eigen-decomposition at its core follows the list):

1. Estimate a univariate G-ARCH for each asset.
2. Standardize the returns and order them.
3. Compute the principal components of this matrix.
4. Estimate a G-ARCH for each column of the PC matrix.
5. Compute the correlation matrix with the loading decomposition C = LDL' and standardize it to ensure a unit diagonal.
6. Scale the correlation matrix with the initial estimates of the variances.
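A minimal sketch of the eigen-decomposition at the core of this recipe (the G-ARCH standardization and rescaling steps are omitted, so this only illustrates the spectral decomposition and the resulting principal-component portfolios).

```python
import numpy as np

def pca_factors(returns, q=3):
    """Eigen-decomposition of the sample covariance matrix, V(r) = X Lambda X' (spectral theorem).
    Returns the first q eigenvector 'portfolios', their loadings and the variance shares."""
    r = np.asarray(returns, float)
    cov = np.cov(r, rowvar=False)
    eigval, eigvec = np.linalg.eigh(cov)          # eigh: symmetric matrix, ascending eigenvalues
    order = np.argsort(eigval)[::-1]              # reorder from largest to smallest variance
    eigval, eigvec = eigval[order], eigvec[:, order]
    factors = r @ eigvec[:, :q]                   # principal-component portfolios f_j = r x_j
    explained = eigval[:q] / eigval.sum()         # share of total variance explained
    return factors, eigvec[:, :q], explained

rng = np.random.default_rng(11)
common = rng.normal(size=(750, 1))
rets = common @ rng.normal(size=(1, 10)) + 0.5 * rng.normal(size=(750, 10))
_, loadings, expl = pca_factors(rets, q=3)
print(expl)        # the first component should dominate in this toy example
```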

Principal components analysis is an older, alternative method to estimate factors and betas using the spectral theorem, where the number of principal components is less than or equal to the number of original variables. The rationale of the method is to proxy the unobservable factors with portfolio returns, which are built subject to constraints. We need to jointly estimate the factors and the betas.

$$r = F_q X_q' + \varepsilon, \quad\text{where } \varepsilon = F_{m-q} X_{m-q}'$$

This transformation is defined in such a way that the first principal component has as high a variance as possible (that is, it accounts for as much of the variability in the data as possible¹⁹), and each succeeding component in turn has the highest variance possible under the constraint that it be orthogonal to (uncorrelated with) the preceding components; hence the largest element of the error matrix is smaller than the smallest of the factors'. Principal components are guaranteed to be independent only if the data set is jointly normally distributed, and PCA is sensitive to the relative scaling of the original variables.

Assume we know the variance and that the Var-Cov matrix is time-independent; this last assumption is added just to simplify the calculus, and more complex methodologies exist to apply principal components. The returns' variance can be represented through the spectral theorem. Another assumption is that V(r) is a full-rank matrix, so that its rank k equals m, the number of returns used:

$$V(r) = X\Lambda X' = \sum_i \lambda_i\, x_i\, x_i'; \qquad X'X = I$$

1. Here X contains the eigenvectors and $\Lambda$ is the diagonal matrix of eigenvalues, ordered from the highest to the smallest value starting from the upper-left position²⁰.
2. The factors proposed are portfolio returns, computed using the eigenvectors and the market returns. Since each portfolio is given by $f_j = r\,x_j$, where $x_j$ is the eigenvector, each of these portfolios is independent of the others, so we can use the univariate formula to compute our betas:

$$\beta_j = \frac{Cov(f_j; r)}{V(f_j)} = \frac{E(x_j' r' r) - E(x_j' r)\,E(r)}{V(f_j)} = \frac{x_j'\,V(r)}{\lambda_j} = \frac{\lambda_j\, x_j'}{\lambda_j} = x_j'$$

so the betas are the eigenvectors of the corresponding factors.
3. The variance of these factors is given by the diagonal matrix of the spectral decomposition, and it is diagonal:

$$V(f_j) = V(r\,x_j) = x_j'\,X\Lambda X'\,x_j = \lambda_j; \qquad V(F) = \Lambda$$

4. Since this model completely explains the return behavior, to turn it into a model closer to a common regression we rearrange the formula: we divide the factors into two groups, the first forming the variables matrix and the residual ones forming the error matrix.

The residual matrix has mean 0 and is uncorrelated with the factors:

$$V(\varepsilon) = V(r - f_j\,\beta_j) = X_{-j}\,\Lambda_{-j}\,X_{-j}'$$

19 A G-ARCH model on 100 securities would have 51,010,050 parameters.
20 By "possible" we mean given the constraint that the squared sum of the weights equals one; otherwise there would be no bound, since the variance could be changed arbitrarily by multiplying by a constant. There exist other alternatives, such as using the modulus; however, those methodologies do not allow an analytic solution.


Thus the highest eigenvalue of the residual Var-Cov matrix is smaller than those of the factors. The rank of the factor matrix is equal to q, where q is the number of factors considered (q = j).

There is a drawback in this methodology: it does not in general respect the pricing theory, which states that there should be no extra remuneration for risks that need not be borne. In fact the residuals can be correlated with some returns, so they are not purely idiosyncratic; furthermore this risk is not negligible, and an asset, even if it is not correlated with the included factors, can still have an excess return.

There is another way to build principal components: maximize the portfolio risk under the orthogonality constraint and the constraint that the sum of the squared weights representing each principal component equals one.

1. We build a Lagrangian function to maximize the variance under the constraint; we end up with weights equal to the eigenvectors and variances equal to the diagonal elements of the spectral decomposition of the variance of the returns:

$$\max_{\theta}\; \theta'\,V(r)\,\theta \;\Rightarrow\; L = \theta'\,V(r)\,\theta - \lambda\,(\theta'\theta - 1)$$

The constraint $\theta'\theta = 1$ is imposed to obtain an analytic solution, even if it does not have an economic meaning; in fact, in general the linear combination of $\theta$ and the returns is not a portfolio²¹.

$$\left(X\Lambda X' - \lambda I\right)\theta = 0 \quad\text{only if } \theta = x_j$$

2. The book suggests looking at the marginal contribution of each component to the total variance, to notice how basically all the variance is explained by the first three components.

Assuming an unknown Var-Cov matrix: we can start from an a priori estimate of V(r) using historical data; however, its quality could be too low, which is why another methodology is suggested. We can estimate the components one by one, starting from the largest:

1. This method consists of maximizing the variance under the usual constraint x'x = 1, leaving all the estimation error in the last component, since we can improve the estimate of the first one.

Switching/Regime models:

The idea of this new set of models is to describe random variables by splitting their probability distribution into several ones, each with specific parameters depending on the state of nature. The simplest possible model is based on regression estimation using dummy variables; however, this approach suffers from severe limits: it is valid only ex post, hence it is useless for forecasting the occurrence of a random state, while it is useful to explain cyclical behavior.

More advanced models treat the state occurrences as random and try to model the change of the state of nature by defining a proper probabilistic model.

Threshold model:

This method defines a discrete number of scenarios, each with specific parameter values, and their occurrence is conditional on the value of some other variable, called the threshold. The changes of state in this model are abrupt.

$$\begin{cases} S_t = 1\;(\mu_{S_t} = \mu_1) & \text{if } x_t < x_1 \\ \quad\vdots \\ S_t = n\;(\mu_{S_t} = \mu_n) & \text{if } x_t \le x_n \end{cases}$$

Smooth transition:

This method is close to the previous one, but instead of using trigger events it uses a probability distribution depending on the value assumed by the threshold variables, so the threshold variable does not determine the state deterministically.

21 Remember that the Var-Cov matrix is PD, otherwise (PSD) we cannot directly apply the theorem. $|\Lambda - \lambda I| = 0$ is the characteristic equation, which is of order equal to the rank of the Var-Cov matrix, so in general it can only be solved numerically.


Hence, given the cumulative distribution expressed as F(S, x), we can compute the marginal probability of each state of nature as the difference between the values assumed by the CDF at the ending and at the beginning threshold values:

$$\begin{cases} P(S_t = 1) = F(1;\, x_t) \\ \quad\vdots \\ P(S_t = k) = F(k;\, x_t) - F(k-1;\, x_t) \end{cases}$$

Markov switching model:

This is the frontier in the academic field. This class of models assumes that the random variable defining the state of nature is a discrete, first-order, k-state, irreducible, ergodic Markov chain. Those attributes define a random variable which depends only on its immediate past, has no absorbing state and has a long-run mean, hence

$$\text{plim}\; E[\delta_t \mid \delta_{t-1}] = \pi$$

This approach can easily be combined with all the previous models to better capture non-normality.

There exist different possible Markov models: one modeling just the first moment (MSI(k)) and another where we also model the volatility (MSIH(k)); besides these two there is a class of autoregressive models, where we also model the autoregressive parameters; the last one is the model based on a VAR [MSIVARH(k,p)], which has been proved useful to explain contagion dynamics: simultaneous (correlation between the variables, i.e. co-movement), linear (through the VAR elements, i.e. the autoregressive components) and nonlinear (through the fact that the regime variable driving the process is common to all the variables, hence the switching process).

The parameters of this class of models are the usual state-dependent first and second moments, but also the probabilities collected in the transition matrix, representing the persistence and the switching probabilities of the states of nature. They are estimated by ML.

Applying the model, some weird behavior has been noticed: the variance of the bull state, where returns are higher, is lower than that of the bear state. This behavior, which seems to go against financial theory, is well explained by the uncertainty about the state occurrence; if we correct for this issue the variance becomes "correct". Basically we compute expected values conditional on the probability of occurrence of the different states.

Since these state variables are unobservable, statisticians need to infer them. There are two possible methods: the filtered state probability $P(S_t = 1 \mid F_t)$ is based on conditioning on past information, meaning all the information available at the time the analysis is performed; the smoothed probability is an ex-post measure, since we are evaluating a point in the past. By applying Bayes' theorem we compute the whole set of probabilities.

The ergodic probability (for state 1) is $\xi_1^{\infty} = \frac{1-p_{22}}{2-p_{11}-p_{22}}$ and the expected duration of state 1 is $\frac{1}{1-p_{11}}$, where $p_{ii}$ denotes the probability of remaining in state i. The filtering recursion is

$$\xi_{j,t} = \frac{\sum_i \xi_{i,t-1}\; p_{i,j}\; f(R_t \mid S_t = j;\, F_{t-1})}{f(R_t \mid F_{t-1})}$$
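A minimal sketch of this filtering recursion for a 2-state model with Gaussian densities in each regime; the means, volatilities and transition matrix are illustrative placeholders, since in practice they are estimated by ML as described above.

```python
import numpy as np
from scipy.stats import norm

def hamilton_filter(r, mu, sigma, P, xi0=None):
    """Filtered state probabilities xi_{j,t} = P(S_t = j | F_t) for a k-state
    Markov-switching model with Gaussian regime densities."""
    k = len(mu)
    T = len(r)
    xi = np.empty((T, k))
    xi_prev = np.full(k, 1.0 / k) if xi0 is None else xi0      # e.g. ergodic or 0.5/0.5 start
    for t in range(T):
        pred = P.T @ xi_prev                                   # P(S_t = j | F_{t-1})
        lik = norm.pdf(r[t], loc=mu, scale=sigma)              # f(R_t | S_t = j)
        joint = pred * lik
        xi[t] = joint / joint.sum()                            # Bayes' theorem: filter update
        xi_prev = xi[t]
    return xi

# Toy parameters: calm bull state vs. volatile bear state (all values are illustrative)
mu, sigma = np.array([0.001, -0.002]), np.array([0.008, 0.02])
P = np.array([[0.97, 0.03],
              [0.05, 0.95]])                                   # rows: from-state, columns: to-state
rng = np.random.default_rng(12)
r = rng.normal(0.001, 0.01, size=300)
print(hamilton_filter(r, mu, sigma, P)[-3:])
```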

The book proposes an algorithm developed by Kim: basically it proceeds backwards instead of forwards, based on the fact that at the final date the filtered and smoothed probabilities coincide. We only need to provide the initialization of the algorithm; the one usually used is the ergodic probability or a dummy 0.5 value.

To test the null hypothesis of k scenarios against k+1 scenarios we cannot use the likelihood-ratio test, since the limiting chi-square distribution does not apply, hence we need to use information criteria. The proposed one is the Hannan-Quinn criterion:


$$HQ = -2\,\frac{l(\hat{\theta})}{T} + 2\,\frac{num(\theta)\,\ln(\ln(T))}{T}$$

Hence to minimize HQ we need to increase the likelihood or decrease the number of parameters.

Simulation methods:

Simulation methodologies are the econometrician's answer to the problem of dealing with completely new situations, where past observations cannot provide useful guidance. Thanks to simulation we have the chance to run experiments under "controlled" conditions: we seek to model the functioning of the system as it evolves. In finance their usage is focused on VaR, ES and other risk measures.

Historical simulation:

This kind of simulation is based on the idea that the past distribution is a good proxy for the future one. We take the empirical percentile distribution of the data and use it to extract future quantiles. It is easy to implement and model-free; however, there is no guideline on the time length to choose (there is a trade-off between significance and reactiveness), and it assigns equal weights to all the observations.

A closely related model is the weighted historical simulation, where instead of assigning equal weights we use a declining exponential scheme governed by a decay factor.
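A minimal sketch of both variants; the decay value 0.995 used in the example is a placeholder, not a value prescribed by the notes.

```python
import numpy as np

def historical_var(returns, alpha=0.01, decay=None):
    """Historical-simulation VaR; if a decay factor is given, observations receive
    exponentially declining weights (weighted historical simulation), newest first."""
    losses = -np.asarray(returns, float)
    if decay is None:
        return np.quantile(losses, 1 - alpha)
    T = len(losses)
    w = decay ** np.arange(T)[::-1]              # newest observation gets the largest weight
    w = w / w.sum()
    order = np.argsort(losses)
    cum = np.cumsum(w[order])
    return losses[order][np.searchsorted(cum, 1 - alpha)]   # weighted empirical quantile

rng = np.random.default_rng(13)
rets = rng.standard_t(df=5, size=1000) * 0.01
print(historical_var(rets, 0.01), historical_var(rets, 0.01, decay=0.995))
```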

Some common criticisms of both methods are that we cannot use the usual scaling rules to convert daily data into weekly or annual data, and that if we use these simulations for short positions²² we can get fuzzy results.

Monte Carlo Simulation:

This approach is based on a parametric distribution defined ex ante and used to model the quantiles. The method is flexible, parameter-efficient and allows us to model time series; on the other hand it is tightly parameterized, so we need to be confident about the distribution chosen for the random variable. Differently from the historical approach, MC has the useful property of defining a path-generation process²³. A classic use of Monte Carlo simulation is to study the behavior of a variable whose asymptotic properties we know, but for which we have too few observations to apply them directly.

There exist several techniques to reduce the variance of these estimates: antithetic and control variates. This problem is quite sizable: without such techniques, to reduce the standard error of the estimate by a factor of ten we would have to increase the sample size 100 times.

The first approach proposed is basically a smarter usage of the path generation, to ensure that after a given number of replications we have covered the maximum range of possible outcomes. The idea is to take, for each draw, its complement (antithetic) one. Other methods with different rules belong to the same family.

The second one is based on the idea of using a variable highly correlated with the one we are interested in. The distribution of the chosen control variable is known, hence it can be used as if it were non-random.
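A minimal sketch of the antithetic-variate idea: for every draw we also use its complement, which cuts the sampling variance. The payoff function used in the example is hypothetical and only serves to illustrate the reduction in the standard error.

```python
import numpy as np

def mc_expectation_antithetic(payoff, n_paths=50_000, antithetic=True, seed=0):
    """Monte Carlo estimate of E[payoff(z)] with z ~ N(0,1), optionally with antithetic draws.
    Returns the estimate and its standard error."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n_paths)
    if antithetic:
        vals = 0.5 * (payoff(z) + payoff(-z))    # average each draw with its mirror image
    else:
        vals = payoff(z)
    return vals.mean(), vals.std(ddof=1) / np.sqrt(len(vals))

# Example payoff (hypothetical): E[max(z, 0)] -- compare standard errors with and without antithetics
payoff = lambda z: np.maximum(z, 0.0)
print(mc_expectation_antithetic(payoff, antithetic=False))
print(mc_expectation_antithetic(payoff, antithetic=True))
```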

Filtered Simulation (Bootstrapping):

This is a mixture of the two previous approaches: basically we draw with replacement from the past percentile distribution to generate our future quantiles, following the usual MC path-generation process. The advantage of this method is that it allows inference without strong assumptions on the underlying variable's distribution.

22 We can use the absolute sum of the θ, but only numerical solutions are available.
23 In this case it is suggested to compute the positions on the right tail as gains.


This approach performs badly when the data contain outliers, and there is no hypothesis on the distribution: it is extracted exactly from the data.
