
Random Matrix Application to Correlations Among Volatility of Assets

Ajay Singh
Perimeter Institute for Theoretical Physics, Waterloo, Ontario N2L 2Y5, Canada

Dinghai Xu
Department of Economics, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada

In this paper, we apply tools from random matrix theory (RMT) to estimates of correlations across the volatility of various assets in the S&P 500. The volatility inputs are estimated by modeling price fluctuations as a GARCH(1,1) process, and the corresponding correlation matrix is constructed. We find that the distribution of a significant number of eigenvalues of the volatility correlation matrix matches the analytical result from the RMT. Furthermore, the empirical estimates of short- and long-range correlations among the eigenvalues that lie within the RMT bounds match the analytical results for the Gaussian Orthogonal Ensemble (GOE) of the RMT. To understand the information content of the largest eigenvectors, we estimate the contribution of GICS industry groups to each eigenvector. In comparison with the eigenvectors of the correlation matrix for price fluctuations, only a few of the largest eigenvectors of the volatility correlation matrix are dominated by a single industry group. We also study correlations among volatility returns and obtain similar results.

I. INTRODUCTION

Volatility of asset returns is one of the most important elements in financial research. Since the birth of seminal models like Black-Scholes and the Autoregressive Conditional Heteroscedasticity family (ARCH/GARCH), increasing attention has been paid over the past few decades to the analysis of the time-varying behavior of volatilities. Unlike asset prices, volatility is latent in the market. In other words, it is not directly observed, and therefore some estimation is necessary to make volatility time series visible for analysis. There are several well-known measures to construct volatility. Broadly, we can classify these methods into the following three categories. The first consists of the classical parametric model-based methods, whose output is referred to as the estimated volatility. In particular, the volatility is generated using models such as ARCH/GARCH, stochastic volatility models, etc. The second type is called the implied volatility, which is backed out from an option pricing formula, e.g., Black-Scholes. In the last category, the volatility is constructed non-parametrically from high-frequency trading data, namely the realized volatility. All three volatility measures are extensively used in the financial industry.

In this paper, we want to extend the traditional volatility analysis into a multivariate environment. The volatility correlations are naturally introduced in this picture. Note that the correlations among stock price fluctuations for different assets are very important because of their direct use for risk management in the Markowitz portfolio theory [1, 2]. However, in practice, there are different sources of noise embedded in the final product of estimated correlations, such as finite-sample bias due to the limited time domain, estimation errors from the inefficiency of the estimating procedure, measurement errors in the model construction process, etc. In their seminal work [3], Laloux et al. show that this accumulated noise in the correlation matrix for price fluctuations can be accounted for using tools from random matrix theory (RMT) [4-6]. In particular, they find that the distribution of eigenvalues of the empirical correlation matrix, excluding some of the largest eigenvalues, fits the Marcenko-Pastur distribution of the RMT very well [4-7]. In [8, 9], it is further shown that the properties of this correlation matrix resemble those of the Gaussian Orthogonal Ensemble (GOE). These results strongly suggest that eigenvalues of the correlation matrix falling under the Marcenko-Pastur distribution contain no genuine information about the financial markets. Hence, one should systematically filter out such noise from the correlations for a more accurate estimation of future portfolio risk (see [10] and references therein).

The correlations among the volatility of different assets are useful in portfolio selection, in pricing an option book, and in certain multivariate econometric models to forecast prices and volatility [11, 12]. For example, in the context of the Black-Scholes model, the variance of a portfolio of options exposed only to vega risk is given by [11]

$$\mathrm{Var}(\Pi) = \sum_{i,j,k,l} w_i\, w_l\, \Lambda_{ij}\, \Lambda_{lk}\, C_{jk} . \tag{1}$$

Here $w_i$ are the weights in the portfolio, $C_{ij}$ is the correlation matrix for the implied volatility of the underlying assets, and the vega matrix $\Lambda_{ij}$ is defined as

$$\Lambda_{ij} = \frac{\partial p_i}{\partial \sigma_j} , \tag{2}$$

where $p_i$ is the price of option $i$ and $\sigma_j$ is the implied volatility of the asset underlying option $j$. In volatility arbitrage strategies, the correlations among the volatility returns, that is the changes in volatility, are generally used. Since the volatility correlation matrix $C_{ij}$ in (1) and the correlations among volatility returns are always estimated, they contain both systematic and random errors. Hence, for a better forecast of risk, one certainly needs to estimate and remove noise from these correlation matrices.

arXiv:1310.1601v1 [q-fin.ST] 6 Oct 2013
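Equation (1) is a quadratic form in the net vega exposures of the portfolio, which makes it easy to evaluate numerically. The following is a minimal sketch; the function name `vega_portfolio_variance`, the array names, and the toy numbers are illustrative assumptions and do not come from the paper.

```python
import numpy as np

def vega_portfolio_variance(w, vega, corr):
    """Evaluate sum_{i,j,k,l} w_i w_l vega_ij vega_lk corr_jk as in (1).

    w    : (n_options,) portfolio weights
    vega : (n_options, n_assets) matrix of dp_i / dsigma_j
    corr : (n_assets, n_assets) implied-volatility correlation matrix
    """
    exposure = vega.T @ w              # net vega exposure to each underlying
    return float(exposure @ corr @ exposure)

# Hypothetical toy numbers: 3 options written on 2 underlying assets.
w = np.array([0.5, 0.3, 0.2])
vega = np.array([[1.0, 0.0],
                 [0.0, 2.0],
                 [0.5, 0.5]])
corr = np.array([[1.0, 0.4],
                 [0.4, 1.0]])
variance = vega_portfolio_variance(w, vega, corr)
```

Collapsing the four-index sum into a single vector of exposures makes explicit that any noise in the estimated correlation matrix feeds directly into the risk estimate.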

The use of the volatility correlation matrix in risk management and volatility arbitrage strategies has another important aspect. Once the volatility correlation matrix is obtained, one can ask several interesting questions about its eigenvalues and eigenvectors. In both risk management and arbitrage strategies for assets or derivatives, one tries to utilize as much information about the market as is available. For example, eigenvectors of the correlation matrix for price fluctuations reveal that the most correlated structures in the stock market, which are also stable over a longer time period, are the industry sectors [9, 13, 14]. In this case, by selecting a portfolio vector orthogonal to all the relevant eigenvectors, one can significantly reduce the portfolio risk. In the case of the volatility correlation matrix, one can naturally ask whether its eigenvectors carry any new information about the market and whether they are stable in time. If so, how can this be utilized for a better estimation of risk and for improving volatility arbitrage strategies?

In this paper, we apply tools from the RMT to the volatility correlation matrix. We use one of the well-studied econometric models, GARCH(1,1), to estimate the time evolution of the volatility of assets [15-17]. In the GARCH(1,1) process, the volatility is measured as the standard deviation of price fluctuations. We recognize that the econometrics literature offers more sophisticated models to measure daily volatility. However, it has been observed that if one is not concerned about the asymmetric response of volatility to price fluctuations, i.e., the leverage effect [18], GARCH(1,1) is not outperformed by any other model to a significant level [19, 20]. Hence, GARCH(1,1) is treated at least as a starting point of the analysis. We leave the use of multivariate models and of other proxies of volatility for future work.

The rest of the paper is organized as follows. In section II, we discuss the data used in this study and how we model the price fluctuations to generate volatility time series. Using the volatility time series, we construct the correlation matrix and calculate its eigenvalues in section III. Then, by fitting the cumulative probability distribution of eigenvalues to the analytical expression, we find the optimum number of eigenvalues which fall under the Marcenko-Pastur distribution of the RMT. Once we know which eigenvalues could possibly be pure noise, we perform further tests to ensure that they are indeed random. We study statistical properties of these eigenvalues in section IV. The nearest-neighbor spacing distribution, the next to nearest-neighbor spacing distribution and the number variance for unfolded eigenvalues are calculated. While the first two quantities test the short-range correlations among the eigenvalues, the last evaluates the long-range correlations. We find very good agreement of these quantities with the analytical results for the GOE of the RMT. In section V, we study the eigenvector statistics and find that the eigenvector corresponding to the largest eigenvalue is the market mode. Since the eigenvalue for the market mode is of the order of the total number of assets, the market has very strong correlations across the volatility of assets. As we discuss in section V A, the market mode also influences the eigenvectors under the Marcenko-Pastur distribution. We further calculate the contribution of GICS industry groups to the eigenvectors which are supposed to carry genuine information. The eigenvectors of the correlation matrix for price fluctuations which are outside the Marcenko-Pastur distribution are relatively stable in time and are dominated by particular industry groups [13, 14, 21]. However, in the case of the volatility correlation matrix, only a few of the largest eigenvectors are dominated by particular industries. In section VI, we discuss the robustness of our results by modeling the time series with smaller and larger numbers of parameters compared to GARCH(1,1). Finally, in section VII, we summarize our results and discuss future directions. In appendix C, we present results for the application of the RMT to the correlation matrix for volatility returns (footnote 1).

II. ESTIMATES OF VOLATILITY AND CORRELATION MATRIX

In this paper, we use daily closing prices for 427 stocks in the S&P 500 over the time period from July 1, 2009 to June 28, 2013 [22]. We represent the total number of stocks by $N$ ($N = 427$) and the length of the time series for price fluctuations by $T$ ($T = 1005$). The companies omitted are those that left or joined the S&P 500 list during this period (footnote 2). Note that our data belong to a period which contains the aftershocks of the economic crisis of 2008-09. It has been observed that the market is generally strongly correlated in volatile periods [23]. Hence, more eigenvalues of the return and volatility correlation matrices tend to deviate from the standard RMT results, as compared to the analysis in [3, 9].

1 It is worth mentioning that three types of correlation matrices are discussed in this paper: the return correlation matrix defined in (5), the volatility correlation matrix given by (11), and the correlation matrix for volatility returns in (C2).

2 For a few time series, we observe that either they are non-stationary or GARCH(1,1) is not a good model to estimate the volatility. Hence, these time series are also not considered in the paper.

To model each time series, we begin by defining the return on stock $i$ at time $t$ by $r_{i,t} = \log(P_{i,t+1}/P_{i,t})$, where $P_{i,t}$ is the price of stock $i \in \{1, 2, \ldots, N\}$ at time $t \in \{1, 2, \ldots, T\}$. We can normalize the return time series such that they have unit variance and zero mean, and write them as an $N \times T$ matrix $G_{it}$ such that

$$G_{it} = \frac{r_{i,t} - \bar{r}_i}{\sigma_i} , \tag{3}$$

where

$$\bar{r}_i = \langle r_{i,t} \rangle \quad \text{and} \quad \sigma_i = \sqrt{\langle (r_{i,t} - \bar{r}_i)^2 \rangle} . \tag{4}$$

In our notation, angle brackets represent the average over the time series unless stated otherwise. The correlation matrix for price fluctuations can now be written as

$$C = \frac{1}{T}\, G G^T . \tag{5}$$

For convenience, we refer to $C$ as the return correlation matrix.
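The construction in (3)-(5) can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `return_correlation_matrix` and the synthetic random-walk prices are hypothetical stand-ins for the S&P 500 data.

```python
import numpy as np

def return_correlation_matrix(prices):
    """Correlation matrix (5) from a price array of shape (N, T + 1)."""
    returns = np.log(prices[:, 1:] / prices[:, :-1])   # r_{i,t} = log(P_{i,t+1} / P_{i,t})
    mean = returns.mean(axis=1, keepdims=True)         # time average per stock
    std = returns.std(axis=1, keepdims=True)           # population std, so diag(C) = 1
    G = (returns - mean) / std                         # normalized returns, eq. (3)
    T = G.shape[1]
    return G @ G.T / T                                 # eq. (5)

# Synthetic geometric random walks standing in for real closing prices.
rng = np.random.default_rng(0)
prices = np.exp(np.cumsum(rng.normal(0.0, 0.01, size=(5, 251)), axis=1))
C = return_correlation_matrix(prices)
```

With this normalization the diagonal of `C` is exactly one and its trace equals the number of series, a property the later eigenvalue discussion relies on.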

Further, we estimate the volatility time series by modeling the return $r_{i,t}$ using a univariate GARCH(1,1) process (footnote 3). Generally, in a GARCH framework, both the conditional mean $\bar{r}_{i,t}$ and the conditional variance $\sigma^2_{i,t}$ at $t$, given the information $I_t$, are functions of time:

$$\bar{r}_{i,t} = \langle r_{i,t} \,|\, I_t \rangle_e , \tag{6}$$

$$\sigma^2_{i,t} = \langle (r_{i,t} - \bar{r}_{i,t})^2 \,|\, I_t \rangle_e . \tag{7}$$

Here $\langle \cdot \rangle_e$ represents the average over the ensemble and $I_t$ is the information about the prices up to time $t$.

In this paper, a standard GARCH(1,1) structure is used. Two equations model the conditional mean and the conditional volatility,

$$r_{i,t} = \sigma_{i,t}\, \epsilon_{i,t} , \qquad \sigma^2_{i,t} = \alpha_{i0} + \alpha_{i1}\, r^2_{i,t-1} + \beta_{i1}\, \sigma^2_{i,t-1} , \tag{8}$$

where $\epsilon_{i,t}$ is a random element drawn from a Gaussian or t-distribution. The coefficients $\alpha_{i0}$, $\alpha_{i1}$ and $\beta_{i1}$ for stock $i$ are estimated using standard econometric packages [24]. Using these parameters, we can sequentially estimate the volatility time series $\sigma_{i,t}$ based on (8).
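Once the coefficients are known, the sequential estimate of the volatility is just the variance recursion in (8) run forward in time. The sketch below assumes the coefficients are already given (the paper obtains them from an econometric package, which is not reproduced here) and assumes the recursion is started at the unconditional variance; the parameter values and the simulated returns are hypothetical.

```python
import numpy as np

def garch11_volatility(returns, alpha0, alpha1, beta1):
    """Conditional volatility sigma_t from the GARCH(1,1) recursion in (8)."""
    returns = np.asarray(returns, dtype=float)
    sigma2 = np.empty_like(returns)
    # Assumed initialization: unconditional variance alpha0 / (1 - alpha1 - beta1).
    sigma2[0] = alpha0 / (1.0 - alpha1 - beta1)
    for t in range(1, len(returns)):
        sigma2[t] = alpha0 + alpha1 * returns[t - 1] ** 2 + beta1 * sigma2[t - 1]
    return np.sqrt(sigma2)

# Simulate returns from a GARCH(1,1) process with hypothetical parameters,
# then recover the volatility path by filtering with the same parameters.
rng = np.random.default_rng(5)
alpha0, alpha1, beta1 = 1e-6, 0.08, 0.90
n_obs = 1000
r = np.empty(n_obs)
s2 = alpha0 / (1.0 - alpha1 - beta1)
for t in range(n_obs):
    r[t] = np.sqrt(s2) * rng.standard_normal()        # r_t = sigma_t * epsilon_t
    s2 = alpha0 + alpha1 * r[t] ** 2 + beta1 * s2     # next-step conditional variance
vol = garch11_volatility(r, alpha0, alpha1, beta1)
```

The choice of starting value matters only for the first few observations when `beta1` is below one, since its influence decays geometrically.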

As is generally observed, we find that the volatility has a distribution very close to a lognormal. In figure 1, we draw the distribution of $\log(\sigma_{i,t})$, which is well fitted by a Gaussian distribution with mean $-4.099 \pm 0.001$ and standard deviation $0.379 \pm 0.001$. We can again normalize the volatility time series such that it has zero mean and unit variance:

$$\tilde{\sigma}_{i,t} = \frac{\sigma_{i,t} - \bar{\sigma}_i}{s_i} . \tag{9}$$

Here $\bar{\sigma}_i$ and $s_i$ are the mean and standard deviation of the volatility time series $\sigma_{i,t}$:

$$\bar{\sigma}_i = \langle \sigma_{i,t} \rangle \quad \text{and} \quad s_i = \sqrt{\langle (\sigma_{i,t} - \bar{\sigma}_i)^2 \rangle} . \tag{10}$$

3 It is worth mentioning that unit root tests are performed to verify the stationarity condition before fitting the data to the GARCH structure [36, 37].

FIG. 1: (Colour Online) The probability density distribution for the logarithm of daily volatility, i.e., $\log(\sigma_{i,t})$, is shown. The volatility time series are generated by modeling the price fluctuations for 427 stocks from the S&P 500 as univariate GARCH(1,1) processes. The data set has skewness 0.19 and kurtosis 3.01. The red line is the Gaussian fit with mean $-4.097 \pm 0.001$ and standard deviation $0.378 \pm 0.001$.

The positive tail in the distribution of $\sigma_{i,t}$ fits well with a power-law coefficient of 4.5, which is close to the value 5.4 for the daily mean absolute deviation of high-frequency returns at an interval of 5 minutes [2]. Now, we can arrange the time series for the normalized volatility $\tilde{\sigma}_{i,t}$ as an $N \times T$ matrix $\tilde{G}$, and then the volatility correlation matrix is given by

$$\tilde{C} = \frac{1}{T}\, \tilde{G} \tilde{G}^T . \tag{11}$$

Using the volatility correlation matrix, one can compute its eigenvalues $\lambda_i$ and conjugate eigenvectors $v_i$. In the next section, we find the optimum number of eigenvalues that fit the density distribution from the RMT.

III. RMT AND EIGENVALUE DISTRIBUTION

In this section, we compare the eigenvalue distribution for the volatility correlation matrix (11) with the analytical results from the RMT. Note that the correlation matrix has $N(N-1)/2$ independent components and one requires time series of length $T > N$ for the estimation of the correlation matrix. However, if the time series are uncorrelated, an empirical estimation of the correlations requires time series of infinite length. In the context of the RMT, assume that we have $N$ uncorrelated time series of length $T$, with random elements drawn from a Gaussian distribution with zero mean and standard deviation $s_0$. We can arrange these time series in an $N \times T$ matrix $R$. For these time series, we can calculate the correlation matrix, that is the Wishart matrix $R R^T / T$, and the distribution of its eigenvalues. In the limit $N \to \infty$ and $T \to \infty$, such that $Q = T/N$ is fixed, the distribution of eigenvalues becomes the well-known Marcenko-Pastur distribution [4, 25]:

$$P_{RM}(\lambda) = \frac{Q}{2\pi s_0^2}\, \frac{\sqrt{(\lambda_+ - \lambda)(\lambda - \lambda_-)}}{\lambda} , \tag{12}$$

where

$$\lambda_\pm = s_0^2 \left( 1 + \frac{1}{Q} \pm 2\sqrt{\frac{1}{Q}} \right) , \tag{13}$$

$\lambda$ represents the eigenvalues, and $\lambda_- \le \lambda \le \lambda_+$. Both values $\lambda_\pm$ are greater than zero, and only when $Q \to 1$ does the gap between zero and $\lambda_-$ disappear and we recover the Wigner semi-circle law. For finite $T$ and $N$, the abrupt cut-offs at both ends are replaced by rapidly decaying tails. Note that (12) is exact when the elements of the random, uncorrelated time series have a Gaussian distribution. If the random elements are drawn from a power-law distribution outside the Levy stable range, the eigenvalue distribution is still found to be in good agreement with (12) [5, 9]. Since the volatility time series roughly follow a lognormal distribution, we have explicitly constructed $N$ time series of length $T$ with random elements drawn from a lognormal distribution. We find that (12) is consistent with the eigenvalue distribution of the correlation matrix for these artificial time series.
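The lognormal check described above is straightforward to reproduce in outline. The sketch below evaluates (12) and (13) and compares the edges with the spectrum of uncorrelated lognormal series; the sizes `N`, `T`, the lognormal shape parameter, and the 10% tolerance around the edges are assumptions chosen for illustration, not values from the paper.

```python
import numpy as np

def mp_edges(Q, s0_sq=1.0):
    """Lower and upper edges lambda_-, lambda_+ from (13)."""
    root = 2.0 * np.sqrt(1.0 / Q)
    return s0_sq * (1.0 + 1.0 / Q - root), s0_sq * (1.0 + 1.0 / Q + root)

def mp_density(lam, Q, s0_sq=1.0):
    """Marcenko-Pastur density (12); zero outside [lambda_-, lambda_+]."""
    lam = np.asarray(lam, dtype=float)
    lo, hi = mp_edges(Q, s0_sq)
    inside = (lam > lo) & (lam < hi)
    dens = np.zeros_like(lam)
    dens[inside] = (Q / (2.0 * np.pi * s0_sq)
                    * np.sqrt((hi - lam[inside]) * (lam[inside] - lo)) / lam[inside])
    return dens

# Uncorrelated lognormal series, standardized row by row (hypothetical sizes).
rng = np.random.default_rng(1)
N, T = 200, 600
R = rng.lognormal(mean=0.0, sigma=0.4, size=(N, T))
R = (R - R.mean(axis=1, keepdims=True)) / R.std(axis=1, keepdims=True)
eigenvalues = np.linalg.eigvalsh(R @ R.T / T)

lam_minus, lam_plus = mp_edges(T / N)
# Allow a 10% margin for the finite-size smearing of the edges.
fraction_inside = np.mean((eigenvalues > 0.9 * lam_minus)
                          & (eigenvalues < 1.1 * lam_plus))
```

A histogram of `eigenvalues` plotted against `mp_density` gives the visual version of the same comparison.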

Further, as correlations are introduced among the time series, some eigenvalues begin to move out of the bulk distribution (12) [7]. These eigenvalues carry genuine information about the market. In this case, the effective standard deviation $s_0$ in (12) differs from its original value for the time series. We refer to the bulk of the distribution as noise, since it represents no-information states. In contrast, the eigenvalues outside this regime correspond to genuine correlations, and the associated eigenvectors represent the correlated segments of the market.

In the next section, we show how certain eigenvalues of the correlation matrix form the bulk of the distribution and fit the analytical expression (12) from the RMT well. It has been observed that distributions similar to (12) can arise even when the time series have well-defined correlations. Consequently, in section IV we perform further checks on the eigenvalues falling under the Marcenko-Pastur distribution and compare the results with the analytical expressions for the GOE.

A. Eigenvalue density

Using the volatility correlation matrix (11), we can calculate its eigenvalues and sort them in a sequence as $\{\lambda_1, \lambda_2, \ldots, \lambda_N\}$, where $\lambda_1$ is the smallest and $\lambda_N$ is the largest eigenvalue. The histogram of eigenvalues is shown in figure 2. First, we notice that the smallest eigenvalue is $\lambda_1 = 0.0015$, which is positive. Second, the tail of the distribution decays smoothly instead of ending abruptly. We also notice that the largest eigenvalue $\lambda_N$ is approximately 237, much larger than the values in the bulk of the distribution. These observations are consistent with the characteristics of our sample data set, in which the market was found to be quite volatile and strongly correlated. Note that since the trace of the correlation matrix is fixed to be $N$, when $\lambda_N$ becomes larger, the peak of the distribution moves closer to zero.

FIG. 2: (Colour Online) We draw the histogram for eigenvalues of the volatility correlation matrix. The y-axis is the frequency of eigenvalues ($\lambda$) in bins of width 0.001. The smooth plot is the analytical fit (12) from the RMT. This is estimated by fitting the data points (14) to the cumulative probability distribution (15). The parameters of the fit are $Q = 2.351$, $s_0^2 = 0.009952 \pm 0.00005$ and normalization $\kappa = 0.3890 \pm 0.0014$. We have further multiplied the estimated plot with a factor (.001 ...

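To fit the bulk of the eigenvalue distribution to (12), we use the empirical cumulative distribution function,

$$F(\lambda_i) = \frac{1}{N} \sum_{j=1}^{N} \Theta(\lambda_i - \lambda_j) . \tag{14}$$

Here $\Theta(\cdot)$ is the Heaviside step function, and we have plotted $F(\lambda_i)$ as a function of the eigenvalues in figure 3.

Since only a subset of the eigenvalues is noise, we fit part of $F(\lambda_i)$ to

$$F_{RM}(\lambda) = \kappa \int_{\lambda_-}^{\lambda} d\lambda'\, P_{RM}(\lambda', s_0) , \tag{15}$$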

with parameters $s_0$ and $\kappa$, keeping $Q = T/N$ fixed. As shown in figure 2, a significant number of eigenvalues are outside the bulk of the distribution. Hence, the normalization parameter $\kappa$ is introduced to take care of the normalization in (15) when it is compared with only part of the empirical cumulative distribution $F(\lambda_i)$ in (14). In addition, when several eigenvalues are outside the RMT fit, the effective standard deviation $s_0$ changes. In particular, if the time series are weakly correlated, there are only a few eigenvalues greater than $\lambda_+$, the theoretical bound (13) from the RMT. Then, the effective standard deviation is given approximately by

$$s_0^2 \approx 1 - \frac{1}{N} \sum_{i=N_0+1}^{N} \lambda_i , \tag{16}$$

where $N_0$ is the number of eigenvalues which fall under the Marcenko-Pastur distribution (12).

To fit the data to (15), we first choose a subset of $N_1$ eigenvalues $\{\lambda_1, \lambda_2, \ldots, \lambda_{N_1}\}$ and the corresponding data points in the empirical cumulative distribution (14). Then we fit these data points to (15) and estimate $\kappa$ and $s_0$ by minimizing the root-mean-square error (RMSE). This minimized RMSE for the selected $N_1$ data points can be represented by

$$E(N_1) = \min_{\kappa, s_0} \sqrt{ \frac{1}{N_1} \sum_{i=1}^{N_1} \left[ F(\lambda_i) - F_{RM}(\lambda_i, \kappa, s_0) \right]^2 } . \tag{17}$$

We draw $E(N_1)$ as a function of $N_1$ in figure 4. It appears that there is a local minimum in $E(N_1)$ around $N_1 = 161$ and, beyond this threshold, $E(N_1)$ begins to increase almost linearly. In this way, we find an optimum number of eigenvalues which are used to fit the RMT result (15). Note that the value of $N_1$ is sensitive to the fitting procedure and the indicator being used. However, the precise distribution of noise has a decaying tail instead of a sharp edge as in (12). Hence, a minor change in $N_1$ does not overestimate the noise.

FIG. 4: (Colour Online) We plot the root-mean-square error (RMSE) of the best RMT fit to $N_1$ data points of the empirical cumulative probability distribution (14). As we change $N_1$, we approach a local minimum at $N_1 = 161$. This minimum is marked by a vertical dashed line. (Axes: $N_1$ horizontal, $E(N_1)$ vertical.)
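The fitting procedure behind $E(N_1)$ can be sketched as follows. This is an illustration under several assumptions rather than the authors' implementation: the Marcenko-Pastur CDF is integrated numerically on a grid, the minimization uses SciPy's Nelder-Mead method, the step function in (14) is taken to be one at zero argument, and the demonstration data are pure Gaussian noise, for which the fitted normalization and variance should both come out close to one. The names `mp_cdf`, `fit_rmse` and `kappa` are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

def mp_cdf(lam, Q, s0_sq, n_grid=4000):
    """Numerically integrated Marcenko-Pastur CDF evaluated at the points lam."""
    root = 2.0 * np.sqrt(1.0 / Q)
    lo = s0_sq * (1.0 + 1.0 / Q - root)
    hi = s0_sq * (1.0 + 1.0 / Q + root)
    x = np.linspace(lo, hi, n_grid)
    pdf = (Q / (2.0 * np.pi * s0_sq)
           * np.sqrt(np.clip((hi - x) * (x - lo), 0.0, None)) / x)
    cdf = np.concatenate(([0.0], np.cumsum(0.5 * (pdf[1:] + pdf[:-1]) * np.diff(x))))
    return np.interp(lam, x, cdf, left=0.0, right=cdf[-1])

def fit_rmse(eigs_sorted, n1, Q, start=(0.5, 0.5)):
    """Minimized RMSE over (kappa, s0^2) using the n1 smallest eigenvalues."""
    N = len(eigs_sorted)
    lam = eigs_sorted[:n1]
    F_emp = np.arange(1, n1 + 1) / N        # empirical CDF at the sorted eigenvalues

    def rmse(params):
        kappa, s0_sq = params
        if kappa <= 0.0 or s0_sq <= 0.0:    # keep the search in the valid region
            return 1e6
        return np.sqrt(np.mean((F_emp - kappa * mp_cdf(lam, Q, s0_sq)) ** 2))

    res = minimize(rmse, start, method="Nelder-Mead")
    return res.fun, res.x

# Pure-noise demonstration: N uncorrelated Gaussian series of length T.
rng = np.random.default_rng(2)
N, T = 100, 300
R = rng.normal(size=(N, T))
R = (R - R.mean(axis=1, keepdims=True)) / R.std(axis=1, keepdims=True)
eigs = np.sort(np.linalg.eigvalsh(R @ R.T / T))

errors = {n1: fit_rmse(eigs, n1, Q=T / N, start=(0.8, 0.8))[0] for n1 in (60, 80, 100)}
rmse_all, (kappa_hat, s0_sq_hat) = fit_rmse(eigs, N, Q=T / N, start=(0.8, 0.8))
```

Scanning `n1` over a range and plotting the returned errors gives a curve of the kind described above; for correlated data the error is expected to rise once eigenvalues outside the bulk enter the fit.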

For $N_1 = 161$, the estimated parameters are $s_0^2 = 0.009952 \pm 0.00005$ and $\kappa = 0.3890 \pm 0.0014$. Using (13), we also estimate $\lambda_- = 0.0012$ and $\lambda_+ = 0.0272$. For these values of the parameters, the probability density (12) is shown in figure 2. Finally, we find that there are $N_0 = 173$ eigenvalues such that $\lambda_i \le \lambda_+$, and they fall under the Marcenko-Pastur distribution. We notice that the estimated values of $s_0^2$ and $\kappa$ are close to what we get from (16), that is $s_0^2 \approx 0.00429$ and $N_0/N = 0.4051$.
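As a quick arithmetic check of the quoted edges, one can substitute the fitted variance 0.009952 and the value $Q = 2.351$ given in the caption of figure 2 into (13). The intermediate numbers below are rounded to four decimal places:

```latex
\frac{1}{Q} \approx 0.4254 , \qquad 2\sqrt{\frac{1}{Q}} \approx 1.3044 ,
\\
\lambda_+ \approx 0.009952 \times (1 + 0.4254 + 1.3044) \approx 0.0272 ,
\\
\lambda_- \approx 0.009952 \times (1 + 0.4254 - 1.3044) \approx 0.0012 .
```

Both results agree with the edges stated in the text.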

IV. STATISTICAL PROPERTIES OF EIGENVALUES

In the previous section, we investigated which eigenvalues fall under the Marcenko-Pastur distribution and are possibly pure noise. However, it is quite possible for time series to contain genuine correlations and still have an eigenvalue distribution very similar to the Marcenko-Pastur distribution. For this reason, we perform further diagnostic tests to ensure that the eigenvalues bounded by the RMT limits are indeed noise. Since the correlation matrix is a real symmetric matrix, we use the data to evaluate the null hypothesis that it belongs to the GOE of the RMT. To do so, we compare the short- and long-range correlations among eigenvalues with the analytical results for the GOE.

By definition, the GOE of random matrices has two important properties. First, if $M$ is a real symmetric matrix and an element of the GOE, all of its elements (up to the symmetry constraint) are statistically independent. Second, the ensemble is invariant under orthogonal transformations. In other words, any transformation of an element $M \to O^T M O$, where $O$ is a real orthogonal matrix, leaves the joint probability of the elements of $M$ invariant. Because of this symmetry, the elements of the GOE display some universal properties. Since some of these properties are self-averaging, one can observe them by studying a single, large element of the GOE. In particular, we study the short-range correlations by calculating the nearest-neighbor and next to nearest-neighbor spacing distributions for unfolded eigenvalues. We also compare long-range correlations among eigenvalues by calculating the number variance. Even for the small number of eigenvalues falling within the RMT bounds, that is $N_0 = 173$, we find very good agreement with the universal properties of the GOE.

A. Nearest-neighbor spacing distribution

The first test for the GOE is the distribution of nearest-neighbor spacings for the unfolded eigenvalues of the volatility correlation matrix. To unfold the eigenvalues, we use the technique of Gaussian broadening as used in the context of the Hubbard model in [26]. The Gaussian unfolding procedure is briefly summarized in appendix A. We consider all the $N_0$ eigenvalues within the theoretical bounds (13) and calculate the unfolded eigenvalues $\xi_i$. By definition (A3), the unfolding is a map from $\lambda_i$ to $\xi_i$ such that $\xi_i$ has a uniform distribution. We then estimate the distribution of the nearest-neighbor spacing $d = \xi_{i+1} - \xi_i$, which is shown in figure 5. One of the standard results from the RMT is that the distribution of nearest-neighbor spacings of unfolded eigenvalues for the GOE is the famous Wigner surmise [4-6]:

$$P_{GOE}(d) = \frac{\pi d}{2} \exp\left( -\frac{\pi}{4} d^2 \right) . \tag{18}$$

We fit the estimated density of nearest-neighbor spacings to $\kappa_1 P_{GOE}$ with normalization parameter $\kappa_1$. As shown in figure 5, the empirical data fit the analytical expression for the GOE well. We find $\kappa_1 = 0.98 \pm 0.05$, which is very close to the exact value $\kappa_1 = 1$. We further find that for the nearest-neighbor spacing distribution, the Kolmogorov-Smirnov statistic is 0.066 and the p-value is 0.48. At the significance level of 0.05, the p-values of the Kolmogorov-Smirnov test against the corresponding GUE and GSE distributions reject both ensembles.
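The Kolmogorov-Smirnov comparison against the Wigner surmise can be sketched without any unfolding by using 2x2 GOE matrices, for which the surmise (18) is the exact spacing distribution once the spacings are scaled to unit mean. This stand-in is an assumption made for illustration and does not reproduce the Gaussian-broadening procedure of appendix A; the helper names and sample sizes are hypothetical.

```python
import numpy as np

def wigner_cdf(d):
    """CDF of the Wigner surmise (18): 1 - exp(-pi d^2 / 4)."""
    return 1.0 - np.exp(-np.pi * np.asarray(d, dtype=float) ** 2 / 4.0)

def ks_statistic(sample, cdf):
    """Two-sided Kolmogorov-Smirnov distance between a sample and a model CDF."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = len(x)
    F = cdf(x)
    d_plus = np.max(np.arange(1, n + 1) / n - F)
    d_minus = np.max(F - np.arange(0, n) / n)
    return float(max(d_plus, d_minus))

rng = np.random.default_rng(3)

# 2000 independent 2x2 GOE matrices: symmetrize Gaussian matrices.
A = rng.normal(size=(2000, 2, 2))
M = (A + A.transpose(0, 2, 1)) / 2.0
eig = np.linalg.eigvalsh(M)                 # shape (2000, 2), ascending per matrix
goe_spacings = eig[:, 1] - eig[:, 0]
goe_spacings /= goe_spacings.mean()         # scale to unit mean spacing

# Uncorrelated (Poisson) levels have exponential spacings instead.
poisson_spacings = rng.exponential(1.0, size=2000)

ks_goe = ks_statistic(goe_spacings, wigner_cdf)
ks_poisson = ks_statistic(poisson_spacings, wigner_cdf)
```

The GOE sample yields a small distance while the Poisson sample yields a large one, which is the contrast the test in the text exploits.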

FIG. 5: (Colour Online) We draw the histogram for the nearest-neighbor spacing distribution of Gaussian unfolded eigenvalues. The smooth plot is the fit $\kappa_1 P_{GOE}$ with $\kappa_1 = 0.98 \pm 0.05$, and $P_{GOE}$ is given by (18). The Kolmogorov-Smirnov statistic and the corresponding p-value with reference to the analytical expression (18) are 0.066 and 0.48, respectively. (Axes: $d_{nn}$ horizontal, $P(d_{nn})$ vertical.)

B. Next to nearest-neighbor spacing distribution

The second test of the GOE is to compare the next to nearest-neighbor spacing distribution of the unfolded eigenvalues with RMT results. For the GOE, the distribution of next to nearest-neighbor spacings of unfolded eigenvalues is shown to be equivalent to the nearest-neighbor spacing distribution for the Gaussian Symplectic Ensemble (GSE) [4-6]. This is called the GSE test of the GOE. The analytical expression for the nearest-neighbor spacing for the GSE is

$$P_{GSE}(d) = \frac{2^{18}}{3^6 \pi^3}\, d^4 \exp\left( -\frac{64}{9\pi} d^2 \right) . \tag{19}$$

To calculate empirical values of the next to nearest-neighbor spacing, we select all the eigenvalues within the RMT bounds (13). We further divide these eigenvalues $\lambda_i$ into two sets with even and odd index $i$, so that next to nearest-neighbor eigenvalues of the original sequence are in the same set. For each set, following the procedure in appendix A, we perform Gaussian broadening and calculate the unfolded eigenvalues $\xi^{even/odd}$. Using these unfolded eigenvalues, we calculate the nearest-neighbor spacings $d^{even/odd} = \xi^{even/odd}_{i+1} - \xi^{even/odd}_i$ in each set. We then combine the data from both sets to get the probability density for the next to nearest-neighbor spacing in the original sequence of eigenvalues. The density distribution is shown in figure 6, and we fit it to $\kappa_2 P_{GSE}$ with the normalization constant $\kappa_2$. We find that $\kappa_2 = 0.96 \pm 0.05$, very close to the exact value one. The Kolmogorov-Smirnov statistic for the next to nearest-neighbor distribution is 0.063, and the p-value cannot reject the null hypothesis at the significance level of 0.5.

FIG. 6: (Colour Online) We draw the histogram for the next to nearest-neighbor spacing distribution of Gaussian unfolded eigenvalues. The smooth plot is the fit $\kappa_2 P_{GSE}$, where $\kappa_2 = 0.96 \pm 0.05$ and $P_{GSE}$ is given by (19). The Kolmogorov-Smirnov statistic and the corresponding p-value with reference to the analytical distribution (19) are 0.063 and 0.55. (Axes: $d_{nnn}$ horizontal, $P(d_{nnn})$ vertical.)
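The even/odd bookkeeping and the reference density (19) can be sketched as below. The sketch makes a simplifying assumption that differs from the procedure in the text: it takes levels that are already on a uniform-density scale and rescales the combined spacings to unit mean, instead of unfolding each subset separately as in appendix A. The function names and the toy levels are hypothetical.

```python
import numpy as np

def gse_surmise(d):
    """Nearest-neighbor spacing density for the GSE, eq. (19)."""
    d = np.asarray(d, dtype=float)
    return 2.0 ** 18 / (3.0 ** 6 * np.pi ** 3) * d ** 4 * np.exp(-64.0 * d ** 2 / (9.0 * np.pi))

def next_nearest_spacings(levels):
    """Next to nearest-neighbor spacings via the even/odd index split."""
    levels = np.sort(np.asarray(levels, dtype=float))
    even, odd = levels[0::2], levels[1::2]
    # Nearest neighbors within each subset are next to nearest neighbors
    # of the original sequence.
    spacings = np.concatenate([np.diff(even), np.diff(odd)])
    return spacings / spacings.mean()       # scale to unit mean spacing

# Toy input: ten equally spaced levels, so every rescaled spacing equals one.
toy_spacings = next_nearest_spacings(np.arange(10.0))
```

A histogram of the spacings returned for real unfolded levels can then be compared with `gse_surmise`, which integrates to one and has unit mean.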

C. Number variance

In this section, we compare long-range correlations among eigenvalues with the analytical results for the GOE. There are examples of systems for which the Hamiltonian does not belong to the GOE but short-range correlations in the spectrum do resemble those in the GOE [5]. Hence, to establish that the eigenvalues under the Marcenko-Pastur distribution are pure noise, it is required to compare the long-range correlations among eigenvalues with the analytical results for the GOE. One of the quantities that probes long-range, two-level correlations among the eigenvalues is the number variance. It is defined as the variance of the number of unfolded eigenvalues in an interval of length $\ell$ around each unfolded eigenvalue $\xi_i$ [4-6]:

$$\Sigma^2(\ell) = \frac{1}{N_0} \sum_{i=1}^{N_0} \left( n(\xi_i, \ell) - \langle n(\xi_i, \ell) \rangle \right)^2 . \tag{20}$$

Here $N_0$ is the number of unfolded eigenvalues and $n(\xi_i, \ell)$ is the number of eigenvalues in the interval $[\xi_i - \ell/2, \xi_i + \ell/2]$. Further, $\langle n(\xi_i, \ell) \rangle$ is the average number of eigenvalues in the interval $[\xi_i - \ell/2, \xi_i + \ell/2]$, where the averaging is done over the unfolded eigenvalues $\xi_i$. Since the unfolded eigenvalues have a uniform distribution, $\langle n(\xi_i, \ell) \rangle = \ell$. Now if the spectrum is translation invariant, the number variance can be written as

$$\Sigma^2(\ell) = \ell - 2 \int_0^{\ell} dr\, (\ell - r)\, Y_2(r) , \tag{21}$$

where $Y_2(r)$ is related to the two-point correlations. If there are no long-range correlations among eigenvalues, one gets the Poisson spectrum with $Y_2 = 0$ and $\Sigma^2(\ell) = \ell$.

However, the expression of $Y_2$ for the GOE takes the following form:

$$Y_2(r) = y(r)^2 + \frac{dy(r)}{dr} \int_r^{\infty} dr'\, y(r') , \tag{22}$$

where

$$y(r) = \frac{\sin(\pi r)}{\pi r} . \tag{23}$$

To estimate the number variance empirically, we unfold the eigenvalues using the Gaussian broadening procedure in appendix A. Since the number variance takes into account the long-range correlations, it is affected by both of the edges at $\lambda_-$ and $\lambda_+$, particularly when $N_0$ is not very large compared to $\ell$. Hence, we estimate the number variance using only the eigenvalues deep in the bulk of the distribution. The empirical estimates and the theoretical value of the number variance (20) for the GOE are shown in figure 7. We find that for $\ell$ up to about 10, the empirical estimate is in very good agreement with the exact result. However, for larger values of $\ell$, as shown in the inset, the number variance departs from the GOE result toward the Poisson spectrum. This behavior is common in cases where the number of eigenvalues is not very large.
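An empirical estimator of the number variance can be sketched as follows. Two choices here are assumptions made for illustration: windows that extend past the ends of the spectrum are discarded (a simple way to stay in the bulk), and the window centers may optionally be supplied on a regular grid instead of at the levels themselves, which matches the translation-invariant expression when checking the Poisson limit $\Sigma^2(\ell) = \ell$. The function name and test spectra are hypothetical.

```python
import numpy as np

def number_variance(levels, ell, centers=None):
    """Mean squared deviation of the level count in windows of length ell from ell.

    levels  : unfolded levels (mean spacing one)
    centers : window centers; defaults to the levels themselves as in (20)
    """
    x = np.sort(np.asarray(levels, dtype=float))
    c = x if centers is None else np.asarray(centers, dtype=float)
    # Keep only windows lying fully inside the spectrum to limit edge effects.
    c = c[(c - ell / 2.0 >= x[0]) & (c + ell / 2.0 <= x[-1])]
    lo = np.searchsorted(x, c - ell / 2.0, side="left")
    hi = np.searchsorted(x, c + ell / 2.0, side="right")
    counts = hi - lo                         # levels in [c - ell/2, c + ell/2]
    return float(np.mean((counts - ell) ** 2))

rng = np.random.default_rng(4)

# Uncorrelated (Poisson) levels: cumulative sums of exponential spacings.
poisson_levels = np.cumsum(rng.exponential(1.0, size=20000))
grid_centers = np.arange(50.0, 19000.0, 5.0)
nv_poisson = number_variance(poisson_levels, 5.0, centers=grid_centers)

# Perfectly rigid spectrum: equally spaced levels, windows centered on levels.
rigid_levels = np.arange(100.0)
nv_rigid = number_variance(rigid_levels, 4.5)
```

For the Poisson levels the estimate is close to the window length, while the rigid spectrum gives a small constant; a GOE-like spectrum is expected to lie between these two extremes, growing only logarithmically with the window length.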

The results from sections IV A, IV B and IV C show that the short- and long-range correlations for eigenvalues of the volatility correlation matrix agree very well with the analytical results for the GOE of the RMT. These results strongly support the hypothesis that eigenvalues of the correlation matrix which fall under the Marcenko-Pastur distribution are pure noise. In the next section, we investigate the properties of eigenvectors of the volatility correlation matrix and compare them with the eigenvectors of the return correlation matrix.

V. EIGENVECTOR STATISTICS

In this section, we study the properties of the eigenvectors of the volatility correlation matrix. We first recall some results for eigenvectors of the return correlation matrix (5). In the case of price fluctuations, it has been observed that most of the eigenvalues fall under the Marcenko-Pastur distribution and only a few of them are outside the bulk. The components of the eigenvectors conjugate to noisy eigenvalues have a Gaussian distribution [3, 8, 9]. The eigenvector conjugate to the largest eigenvalue is the market mode. This mode is equivalent to a portfolio in which every asset is equally weighted. Furthermore, the other large eigenvalues, which are outside the theoretical edges from the RMT, are expected to carry genuine correlations. A few of the largest eigenvectors are found to be stable in time over a duration as long as ten years [13, 14]. However, as one moves from the largest to smaller eigenvalues, the duration of stability decreases and eventually the eigenvectors become random. To understand the information contained in these largest eigenvectors, one approach is to decompose them into industry groups [13, 14]. It has been observed that while most of the small eigenvectors distribute the weight randomly across all the industries, the largest eigenvectors are dominated by one or two sectors. In section V B, we follow the same approach to study the properties of the eigenvectors of the volatility correlation matrix.

FIG. 7: (Colour Online) We plot the number variance as a function of the spacing parameter $\ell$. The black points are the estimated number variance for the eigenvalues bounded by the theoretical edge $\lambda_+$. The smooth line is the theoretical value of the number variance for the GOE. The dashed line is the number variance for uncorrelated Poisson eigenvalues, that is $\Sigma^2 = \ell$. (Inset) Black points are the estimated number variance and the smooth line is the theoretical value of the number variance for the GOE. We can see that for large values of $\ell$, the estimated number variance diverges from the GOE result. This is generally observed when dealing with a finite number of eigenvalues. (Axes: $\ell$ horizontal, $\Sigma^2(\ell)$ vertical.)

In section V A, we first discuss the distribution of eigenvectors of the volatility correlation matrix. We find that, similar to the return correlation matrix, the eigenvector conjugate to the largest eigenvalue is the market mode. Then we focus on the distribution of eigenvectors conjugate to the eigenvalues within the RMT bounds. Since the largest eigenvalue $\lambda_N$ of the correlation matrix is of the order of the total number of eigenvalues $N$, we find that it significantly affects the other eigenvectors. However, once we remove the effect of the market mode, we find that the eigenvector distribution is Gaussian, consistent with the RMT. Next, in section V B, we investigate the information content of the eigenvectors conjugate to the largest eigenvalues. After removing the effect of the common market mode, we estimate the weights of different industry groups in these eigenvectors. Interestingly, we find that, compared to the return correlation matrix, very few of the largest eigenvectors of the volatility correlation matrix are dominated by a few industry groups.

FIG. 8: (Colour Online) The solid black line is the Gaussian distribution for eigenvectors of random matrices. The dotted and dashed lines are distributions for a few eigenvectors $v_i$ of the volatility correlation matrix which fall deep inside the Marcenko-Pastur distribution. These eigenvectors are obtained after removing the effect of the market mode from the volatility time series and fit a Gaussian distribution very well. (Inset) We show the distribution of components of the market mode. The dashed line is the Gaussian distribution for a completely random eigenvector. (Axes: $\tilde{V}_i$ horizontal, $P(\tilde{V}_i)$ vertical.)

A. Distribution of eigenvectors

To study the eigenvector statistics, we first normalize each eigenvector $v_i$, associated with eigenvalue $\lambda_i$, such that $v_i^T v_i = N$. We find that for the largest eigenvalue $\lambda_N$, most of the components of the associated eigenvector are clustered around one. This eigenvector is the market mode and its distribution is shown in the inset of figure 8. Now we examine the eigenvectors conjugate to the eigenvalues falling under the Marcenko-Pastur distribution. According to the RMT, these eigenvectors should have a Gaussian distribution. However, we find that the distribution is peaked more sharply than a Gaussian. We observe similar behavior in the return eigenvectors, but it is not as strong. Since the largest eigenvalue in both cases is much larger than the eigenvalues in the bulk, the market mode has a significant influence on the other eigenvectors [13, 14, 27]. Hence, it is reasonable to remove the effect of the market mode and re-examine the eigenvector distribution.

To remove the effect of the market mode, we regress the volatility time series on the market mode variable and use the residuals to recalculate the correlation matrix [13, 14]. If the market mode is the eigenvector $v_N$, we can write the volatility time series for the market as

$$M_t = (v_N^T G)_t = \sum_{i=1}^{N} v_{N,i}\, G_{it}. \tag{24}$$

$G$ is an $N \times T$ matrix containing the time series of normalized volatility $G_{it}$, as defined in (9). To remove the influence of the market mode, which is a common factor to all the assets, we construct the following regression,

$$G_{it} = \alpha_i + \beta_i M_t + \epsilon_{i,t}, \tag{25}$$

where $\alpha_i$ and $\beta_i$ are stock-specific constants and the residual $\epsilon_i$ is such that $\langle \epsilon_i \rangle = 0$ and $\langle \epsilon_i M \rangle = 0$. The estimated residuals from the above regression are used to construct the corresponding correlation matrix $\tilde{C}$, its eigenvalues $\tilde{\lambda}_i$ and eigenvectors $\tilde{v}_i$.
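The regression step above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' code: the function name `remove_market_mode`, the use of ordinary least squares per stock, and the convention $C = GG^T/T$ for a row-normalized $G$ are choices made here for the example.

```python
import numpy as np

def remove_market_mode(G):
    """Regress each normalized series on the market mode (sketch of eqs. 24-25).

    G : (N, T) array, each row normalized to zero mean and unit variance.
    Returns the market series M, the fitted alpha and beta, the residuals,
    and the correlation matrix of the residuals.
    """
    N, T = G.shape
    C = G @ G.T / T                            # correlation matrix (assumed convention)
    eigvals, eigvecs = np.linalg.eigh(C)       # eigenvalues in ascending order
    v_N = eigvecs[:, -1]                       # eigenvector of the largest eigenvalue
    v_N = v_N * np.sqrt(N) / np.linalg.norm(v_N)   # normalize so that v^T v = N
    if v_N.sum() < 0:                          # fix the arbitrary overall sign
        v_N = -v_N
    M = v_N @ G                                # market series, eq. (24)

    # Ordinary least squares of each row of G on M, eq. (25).
    M_c = M - M.mean()
    G_c = G - G.mean(axis=1, keepdims=True)
    beta = G_c @ M_c / (M_c @ M_c)
    alpha = G.mean(axis=1) - beta * M.mean()
    resid = G - alpha[:, None] - beta[:, None] * M[None, :]

    C_tilde = np.corrcoef(resid)               # correlation matrix of the residuals
    return M, alpha, beta, resid, C_tilde
```

By construction the residuals are orthogonal to $M$, so the residual correlation matrix has one eigenvalue that is numerically zero; this is the eigenvalue previously associated with the market mode.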

There are several observations. First, we find that one of the eigenvalues, which was earlier related to the market mode, is zero. We also observe that, after a proper rescaling, the eigenvalues $\tilde{\lambda}_i$ under the Marcenko-Pastur distribution are quite close to their original values $\lambda_i$. To see this, we recall that the sum of the eigenvalues $\tilde{\lambda}_i$ is $N$, i.e., $\sum_i \tilde{\lambda}_i = N$. Previously, for the original correlation matrix $C$, the sum of the eigenvalues excluding the largest eigenvalue $\lambda_N$ was $(N - \lambda_N)$. Now if we homogeneously rescale the new eigenvalues such that their sum is $(N - \lambda_N)$, we find that $\tilde{\lambda}_i (N - \lambda_N)/N$ are reasonably close to $\lambda_i$. One can also see this rescaling from the change in the effective variance of the original time series. We give more details on this in appendix B.

After removing the common market factor, we find that the eigenvector distribution for $\tilde{v}_i$ fits reasonably well with a Gaussian distribution. We show the distribution of several eigenvectors in figure 8. Note that the eigenvectors are normalized such that $\tilde{v}_i^T \tilde{v}_i = N$. In this figure, the thick black line is the Gaussian distribution with variance one.

B. Industry groups and comparison with return eigenvectors

In this section, we compare the information contained in the relevant eigenvectors of the volatility correlation matrix with that of the return correlation matrix. The main finding is that very few of the volatility eigenvectors, which are supposed to carry genuine information about the market, are dominated by industry groups as compared to the corresponding return eigenvectors. This result is consistent with the observation that in financial markets, it is much harder to diversify the volatility risk for portfolios.$^4$ We first estimate the contributions of GICS industry groups in the eigenvectors. Then, by comparing the inverse participation ratio of industry contributions, we observe that relatively fewer volatility eigenvectors receive dominating contributions from particular industry groups.

$^4$ We thank Samuel Vazquez for pointing this out to us.

FIG. 9: (Colour Online) Industry weight vectors for the eigenvectors labeled 426, 425, 424 and 423 (columns: Return, Volatility). In the left column, the components of the weight vectors (27) for the four largest eigenvalues of the return correlation matrix, excluding the market mode, are shown. The right column contains the weight vectors for the four largest eigenvalues of the volatility correlation matrix. Comparing the eigenvectors labeled 423, one can quickly see that the volatility eigenvector is not dominated by a single industry group, whereas the return eigenvector is. We also point out the industry groups that have the largest contributions in these eigenvectors. For return, the eigenvectors and GICS industry groups are the following: 426, utilities; 425, banks; 424, energy; 423, real estate. For volatility eigenvectors, the largest industry groups are: 426, real estate; 425, utilities; 424, semiconductors and semiconductor equipment; 423, consumer services.

FIG. 10: (Colour Online) Panel (a), Return: inverse participation ratio $I_i$ of the industry weight vectors (27) for return eigenvectors as a function of $\tilde{\lambda}_i$ on a log-log plot. The vertical dashed line shows the position of eigenvalue $\tilde{\lambda}_{407}$, and there are twenty eigenvalues to the right of it. The linear fit of the inverse participation ratio for eigenvalues outside the RMT bounds has a slope of 1.52, as shown by the solid line in the plot. Panel (b), Volatility: inverse participation ratio of the weight vectors for volatility eigenvectors as a function of $\tilde{\lambda}_i$ on a log-log plot. Again, the vertical line shows the location beyond which the twenty largest eigenvalues fall. In this case, we can see that, compared to the return correlation matrix, only a few of the largest eigenvalues have a large inverse participation ratio and are dominated by a few industry groups.

Using the time series (3) for normalized returns, we calculate the correlation matrix (5). We follow the procedure described in section III. It is found that 75% of the eigenvalues of the return correlation matrix fall under the analytical bounds from the RMT. The movement of the market hides several correlations among its components [13, 14, 27]. Hence, to see the presence of industry groups in eigenvectors, we remove the influence of the market mode using the technique discussed in (25). In the cleaned-up volatility and return eigenvectors, we then calculate the contributions of different industry groups as follows.

We classify the 427 companies into GICS industry groups using the four-digit code system. The classification yields 24 groups. Each group has $n_a$ companies, where $a = 1, 2, \ldots, g$ with $g = 24$. The number of companies in each group, $n_a$, ranges from 4 to 42. Now, we can define a $g \times N$ projection matrix $P$, which estimates the fraction of each industry contributing to the eigenvectors [13, 14]. The entries of the projection matrix are

$$P_{ai} = \begin{cases} 1/n_a & \text{if stock } i \text{ is in group } a, \\ 0 & \text{otherwise.} \end{cases} \tag{26}$$

In $P$, row $a$ is a vector with weight $1/n_a$ for all the $n_a$ companies in group $a$. We also define a vector $u_i$ conjugate to each eigenvector $\tilde{v}_i$ such that its components are the squares of those of the latter, i.e., $u_{i,j} = \tilde{v}_{i,j}^2$. Note that $\tilde{v}_i$ represents the eigenvectors of the correlation matrices after removing the effect of the market mode. The projection matrix acts on the vectors $u_i$ and gives the $g$-dimensional weight vector $\omega_i$:

$$\omega_i = \gamma_i P u_i. \tag{27}$$

Here $\gamma_i$ is the normalization constant such that $\sum_{a=1}^{g} \omega_{i,a} = 1$. Ideally, for the market mode $v_N$, all the components of $\omega_N$ should be $1/g$. In figure 9, we compare four weight vectors of the return and volatility correlation matrices. Now we can simply use the inverse participation ratio $I_i$,

$$I_i = \sum_{a=1}^{g} \omega_{i,a}^4, \tag{28}$$

of the weight vectors $\omega_i$ as an indicator of the dominance of a single industry in the corresponding eigenvector $\tilde{v}_i$. If an eigenvector is dominated by a single industry, then all the elements of the weight vector will be zero except one. In this case the inverse participation ratio will be one. However, if all the industry groups contribute equally to an eigenvector, the inverse participation ratio will become $1/g^3$.
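The construction in (26)-(28) can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, group labels are assumed to be integers $0, \ldots, g-1$, and every group is assumed to contain at least one stock.

```python
import numpy as np

def projection_matrix(groups, g):
    """Build the g x N projection matrix of eq. (26).

    groups : length-N sequence, groups[i] is the group label (0..g-1) of stock i.
    Assumes every group has at least one member.
    """
    groups = np.asarray(groups)
    N = groups.size
    P = np.zeros((g, N))
    P[groups, np.arange(N)] = 1.0            # membership indicator
    n_a = P.sum(axis=1, keepdims=True)       # number of companies per group
    return P / n_a                           # each row carries weight 1/n_a

def weight_vector(P, v):
    """Normalized industry weight vector of eq. (27) for an eigenvector v."""
    u = np.asarray(v, dtype=float) ** 2      # squared eigenvector components
    w = P @ u
    return w / w.sum()                       # components sum to one

def inverse_participation_ratio(w):
    """Inverse participation ratio of eq. (28): sum of fourth powers."""
    return float(np.sum(np.asarray(w, dtype=float) ** 4))
```

For example, with three groups of two stocks each, a flat eigenvector gives equal weights and an inverse participation ratio of $1/g^3 = 1/27$, while an eigenvector supported on a single group gives a ratio of one.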

In figures 10(a) and 10(b), we plot the inverse participation ratio of the weight vectors on a log-log plot for the return and volatility correlation matrices. The x-axis shows the eigenvalues $\tilde{\lambda}_i$ that we get after removing the influence of the market mode from the normalized return and volatility time series. The gray area indicates the region bounded by the RMT limits (13) in both cases. In [13, 14], the authors studied the return correlation matrix for 1000 stocks and observed that the eigenvectors related to the twenty largest eigenvalues were almost stable on the time scale of a year. However, as one moves to smaller eigenvalues, the eigenvectors become more and more unstable over time. In figures 10(a) and 10(b), we have drawn vertical dashed lines to indicate the position beyond which the twenty largest eigenvalues fall.

Now, we concentrate on these twenty largest eigenvectors. The structure of the industry weights is clearly different between the return and volatility eigenvectors. As shown in figure 10(a), in the case of the return correlation matrix, the largest eigenvalues have a large inverse participation ratio and they are dominated by only a few industry groups. In fact, outside the RMT bounds, the inverse participation ratio appears to follow a power law as a function of the eigenvalues with an exponent of 1.52. Deep within the RMT bounds, the inverse participation ratio is of the order of $1/g^3$. This indicates that all the industry groups contribute randomly to these eigenvectors.

Correspondingly, in the case of the volatility correlation matrix in figure 10(b), the number of eigenvectors with a small inverse participation ratio is quite large compared to the return correlation matrix. For the sake of comparison, let us set a benchmark value of the inverse participation ratio, $I_0 = 1/12^3$. This is the inverse participation ratio for a hypothetical weight vector that contains equal contributions from half of the industry groups and no contribution from the rest. Sixteen of the twenty largest volatility eigenvectors have an inverse participation ratio less than this benchmark value. For return eigenvectors, this number is only two. We also notice that for some of the smallest eigenvectors of the return and volatility correlation matrices, the inverse participation ratio is large. Consistent with [9], we observe that these eigenvectors are localized and receive large contributions from a few stocks in a particular industry group.
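The benchmark value follows directly from the definition (28). Writing $\omega_a$ for the components of the hypothetical weight vector (the symbol is our notational choice here), twelve of the $g = 24$ groups carry weight $1/12$ each and the rest carry zero:

```latex
\begin{align}
I_0 &= \sum_{a=1}^{12} \left(\frac{1}{12}\right)^{4}
     = 12 \times 12^{-4}
     = \frac{1}{12^{3}} \approx 5.8 \times 10^{-4}, \\
\frac{1}{g^{3}} &= \frac{1}{24^{3}} \approx 7.2 \times 10^{-5}
\quad \text{(all 24 groups contributing equally)}.
\end{align}
```

The benchmark therefore sits a factor of eight above the fully delocalized value $1/g^3$ and well below the fully localized value of one.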

The above observations suggest that the eigenvectors of the volatility correlation matrix do not carry as much information about the industry groups as those of the return correlation matrix do. However, it is quite possible that our approach to removing the effect of the market mode is inadequate and that strong non-linear effects due to the market still hide information about the industry groups in the volatility eigenvectors. To confirm that the residual time series $\epsilon_i$ in (25) are indeed independent of the market mode $M$, we can use a quantity called the generalized kurtosis [2]. The idea is to first transform the residuals $\epsilon_i$ and the market mode $M$ to $\hat{\epsilon}_i = F_i^{-1}(\epsilon_i)$ and $\hat{M} = F_M^{-1}(M)$, such that the distribution of each $\hat{\epsilon}_i$ and of $\hat{M}$ is a Gaussian with unit variance. Now if $\epsilon_i$ and $M$ are independent, then so are $\hat{\epsilon}_i$ and $\hat{M}$. To test this assumption behind our linear model, we can study the generalized kurtosis,

$$\kappa_i = \langle \hat{\epsilon}_i^2 \hat{M}^2 \rangle - \langle \hat{\epsilon}_i^2 \rangle \langle \hat{M}^2 \rangle - 2 \langle \hat{\epsilon}_i \hat{M} \rangle^2, \tag{29}$$

which should be very small if (25) succeeds in removing the effect of the market mode. We can further define the following measure,

$$K = \frac{1}{N} \sum_{i=1}^{N} \kappa_i, \tag{30}$$

as a figure of merit for any model in this context. The value of this indicator for volatility eigenvectors is $K_v = 0.082$, and for return eigenvectors it is $K_r = 0.115$. Both of these values are reasonably small and indicate that model (25) does succeed in removing the effect of the market mode. In fact, $K_v < K_r$ indicates that this model works better for volatility than for return.
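A minimal sketch of this diagnostic is given below. The Gaussianizing maps are not specified in detail in the text; here we assume a rank-based transform (empirical quantiles mapped through the inverse normal CDF), which is one common choice and may differ from what the authors used. The function names are hypothetical.

```python
import numpy as np
from statistics import NormalDist

def gaussianize(x):
    """Map a sample to an approximately N(0, 1) sample via its ranks (assumed method)."""
    x = np.asarray(x, dtype=float)
    ranks = np.argsort(np.argsort(x)) + 1          # ranks 1..T
    p = ranks / (x.size + 1.0)                     # empirical quantiles in (0, 1)
    inv_cdf = NormalDist().inv_cdf
    z = np.array([inv_cdf(pi) for pi in p])
    return z / z.std()                             # enforce unit variance

def generalized_kurtosis(x, y):
    """Generalized kurtosis of two Gaussianized series, as in eq. (29)."""
    x2, y2 = x ** 2, y ** 2
    return float(np.mean(x2 * y2) - np.mean(x2) * np.mean(y2)
                 - 2.0 * np.mean(x * y) ** 2)

def model_merit(resid, M):
    """Average generalized kurtosis over all residual series, as in eq. (30)."""
    M_hat = gaussianize(M)
    kappas = [generalized_kurtosis(gaussianize(e), M_hat) for e in resid]
    return float(np.mean(kappas))
```

For independent series the statistic is close to zero, whereas a series that is uncorrelated with $M$ but shares its magnitude (for instance $M$ multiplied by random signs) yields a value near two, which is the kind of non-linear dependence the measure is meant to detect.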

The results in this section strongly suggest that the interpretation of the largest eigenvectors of the volatility correlation matrix in terms of industry groups is inadequate. In our opinion, this could be attributed to the following two reasons. First, it might be that, excluding the largest two or three eigenvectors of the volatility correlation matrix, the other eigenvectors are quite unstable in time. Another possibility is that volatility divides the market into sectors that differ from the industry groups. A better understanding of the time evolution and information content of the largest eigenvectors of the volatility correlation matrix will certainly have important applications in volatility risk management. In this context, it would be interesting to apply tools from cluster analysis [28-31] and further study the time evolution of eigenvectors along the lines of [13, 14, 32-35].

VI. ROBUSTNESS OF RESULTS

In this section, we carry out some robustness checks on our results. In our procedure, we estimate the volatility time series by modeling the 427 individual return time series as GARCH(1,1) processes. In total, there are $3 \times 427$ estimated parameters in the pool. We explore two different regimes of the number of parameters. First, we consider a single GARCH(1,1) model for all the 427 time series,

$$r_t = \sigma_t \epsilon_t, \qquad \sigma_t^2 = \alpha_0 + \alpha_1 r_{t-1}^2 + \beta_1 \sigma_{t-1}^2. \tag{31}$$

Here $r_t$ is the return at time $t$ and $\epsilon_t$ is a random variable drawn from Student's t-distribution. The coefficients $\alpha_0$, $\alpha_1$ and $\beta_1$ are parameters to be estimated. The estimation is done using the standard maximum likelihood procedure with the joint log-likelihood function $L$ defined as follows,

$$L = \sum_{i=1}^{N} L_i(\alpha_0, \alpha_1, \beta_1 \,|\, r_i). \tag{32}$$

Here $L_i(\alpha_0, \alpha_1, \beta_1 \,|\, r_i)$ represents the log-likelihood function for the individual return time series $r_{i,t}$ of stock $i$.
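The pooled likelihood (32) can be sketched as below. Several details are assumptions made for this illustration and are not stated in the text: the innovations are taken to follow a standardized Student's t-distribution with a fixed number of degrees of freedom `nu` (hypothetical parameter, `nu > 2`), and the variance recursion is initialized with the sample variance. A numerical optimizer would then maximize `joint_loglik` over the three shared parameters.

```python
import math
import numpy as np

def garch_variance(r, alpha0, alpha1, beta1):
    """Conditional variance recursion of eq. (31)."""
    r = np.asarray(r, dtype=float)
    sigma2 = np.empty_like(r)
    sigma2[0] = r.var()                      # assumption: start from the sample variance
    for t in range(1, r.size):
        sigma2[t] = alpha0 + alpha1 * r[t - 1] ** 2 + beta1 * sigma2[t - 1]
    return sigma2

def student_t_loglik(r, alpha0, alpha1, beta1, nu):
    """Log-likelihood L_i of one return series under standardized Student-t innovations."""
    r = np.asarray(r, dtype=float)
    sigma2 = garch_variance(r, alpha0, alpha1, beta1)
    z2 = r ** 2 / sigma2
    # Log normalization constant of the unit-variance Student-t density.
    const = (math.lgamma((nu + 1.0) / 2.0) - math.lgamma(nu / 2.0)
             - 0.5 * math.log(math.pi * (nu - 2.0)))
    terms = const - 0.5 * np.log(sigma2) - 0.5 * (nu + 1.0) * np.log1p(z2 / (nu - 2.0))
    return float(np.sum(terms))

def joint_loglik(returns, alpha0, alpha1, beta1, nu=8.0):
    """Joint log-likelihood of eq. (32): one parameter set shared by all series."""
    return sum(student_t_loglik(r, alpha0, alpha1, beta1, nu) for r in returns)
```

The design point is that the three parameters are shared across all $N$ series, so the pooled model has three free parameters instead of $3 \times 427$.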

The motivation for this exercise comes from the observation that for $N = 427$, the largest eigenvalue of the return correlation matrix is around 193. This implies that the conjugate eigenvector, i.e., the market itself, is strongly correlated. Hence, one can assume that a universal GARCH model governs the market. We do find that all the estimated parameters are statistically significant. Repeating the calculations in sections III, IV and V, we find that although there are small changes in the numerical values, the main results are qualitatively the same.

In our second experiment, instead of fitting each return time series to a GARCH(1,1) process as in section II, we model each time series as a more general ARMA($p_i$, $q_i$)-GARCH(1,1) process [17]. This model is given by

$$r_{i,t} = \phi_{i0} + \sum_{j=1}^{p_i} \phi_{ij}\, r_{i,t-j} + a_{i,t} + \sum_{k=1}^{q_i} \theta_{ik}\, a_{i,t-k},$$
$$a_{i,t} = \sigma_{i,t}\, \epsilon_{i,t}, \tag{33}$$
$$\sigma_{i,t}^2 = \alpha_{i0} + \alpha_{i1}\, a_{i,t-1}^2 + \beta_{i1}\, \sigma_{i,t-1}^2.$$

The first equation is the ARMA($p_i$, $q_i$) mean equation and the other two equations are the GARCH(1,1) volatility equations. $\phi_{ij}$ is the auto-regressive coefficient and $\theta_{ij}$ is the moving average parameter. $\alpha_{i0}$, $\alpha_{i1}$ and $\beta_{i1}$ are the standard GARCH(1,1) parameters. We first find the optimum values of $p_i$ and $q_i$ by modeling the time series as an ARMA process, based on the BIC measure [36, 37]. With these optimal orders, we simultaneously fit the model (33) and estimate all the free parameters [24]. As expected, we find that although there are small differences in the numerical values, the main results in sections III, IV and V remain intact.

VII. DISCUSSION

In this paper, we study the properties of the eigenvalues and eigenvectors of the volatility correlation matrix. We estimate volatility time series for 427 stocks in the S&P 500 by modeling the price fluctuations as a GARCH(1,1) process. The empirical distribution of the estimated volatility fits well with the lognormal distribution. Using these volatility time series, we construct the volatility correlation matrix and study the distribution of its eigenvalues. By fitting the analytical expression to the empirical cumulative probability distribution, we find that approximately 40% of the eigenvalues fall under the Marcenko-Pastur distribution (12), whereas, for the same time period, approximately 75% of the eigenvalues of the return correlation matrix are within the analytical bounds from the RMT. We also find that the largest eigenvalue of the volatility correlation matrix is 237, whereas for the return correlation matrix it is 193. This suggests that correlations in the volatility of assets across the market are stronger than correlations in the price fluctuations. To further establish that the volatility eigenvalues falling under the Marcenko-Pastur distribution are pure noise, we study the short- and long-range correlations among eigenvalues in section IV. Since the volatility correlation matrix is a real symmetric matrix, we compare the statistical properties of its eigenvalues with the analytical results for GOE. We find that for optimum values of the unfolding parameters, the nearest-neighbor spacing distribution, the next-to-nearest-neighbor spacing distribution and the number variance fit the theoretical results for GOE well. These results strongly suggest that approximately 40% of the eigenvalues of the volatility correlation matrix carry no relevant information about the market.

In section V A, we study the distribution of the eigenvectors associated with eigenvalues within the RMT bounds. We find that the eigenvector distribution is sharply peaked around zero as compared to a Gaussian distribution. We attribute this behavior to the strong correlations in volatility across the market. Once the market mode is removed from the volatility time series, we find that the eigenvectors indeed have a distribution close to a Gaussian. In section V B, we study the properties of the eigenvectors that are outside the RMT distribution. These eigenvectors are supposed to carry genuine information about the market. Similar to the return correlation matrix [3, 8], we find that the largest eigenvector is the market mode. The largest eigenvectors of the return correlation matrix are relatively stable in time and are dominated by only a few industry groups [13, 14]. We use the inverse participation ratio of the industry weight vectors (27) as an indicator of large contributions from a few industries and compare it for return and volatility eigenvectors. We find that, in comparison with return eigenvectors, very few of the largest volatility eigenvectors are dominated by a few industry groups. This result is consistent with what is observed in practice. For instance, one can reduce the risk of a portfolio by diversifying it across different economic sectors, but it is much harder to remove volatility risk. Finally, in section VI, we discuss the robustness of our results. Since our results are based on the estimation of $427 \times 3$ parameters for all the GARCH(1,1) processes (8), it is necessary to understand how they are affected when we have a very small or a very large number of parameters. Though there are small quantitative differences, our results remain qualitatively unchanged in both scenarios.

In appendix C, we also summarize results on the application of RMT to correlations among volatility returns, which are defined in (C1). In this case, we find that the largest eigenvalue is 91 and 73% of the eigenvalues are pure noise. Compared to return eigenvectors, there are fewer eigenvectors of the volatility return correlation matrix that are dominated by a few industry groups. However, in comparison with the volatility correlation matrix, these eigenvectors carry more information about the industry groups. In the five largest eigenvectors, we also find that the dominating industry groups are the same as those that appear in the five largest return eigenvectors.

In this paper, we use one of the simplest methods, GARCH(1,1), to estimate daily volatility. However, there are various other proxies and methods of estimation. For example, there are other multivariate models of volatility [38]. The implied volatility of an asset is estimated by observing the option prices in the market. One can also use high-frequency data to estimate the variance or mean absolute deviation of prices. It will be interesting to apply tools from the RMT to all these indicators for a deeper understanding of correlations among volatility. Since the implied volatility of an underlying asset varies with both the strike and the maturity, we expect an even richer structure in the correlations. This paper provides a first step in these directions.

Note that the original applications of the RMT in financial markets were on correlations among price fluctuations [3]. In this paper, we instead focus on the second moment of these fluctuations. A deeper understanding of financial markets will require information about all the correlations among their components. Hence, it will be interesting to extend our work to realized higher moments, e.g., skewness and kurtosis, which can be estimated empirically.

One of the main results of this paper is to show that a significant number of eigenvalues of the volatility correlation matrix are pure noise. Since the volatility correlation matrix has applications in risk management and forecasting [11, 12], it will be natural to understand how the volatility correlation matrix can be cleaned to carry only genuine information. The next step will be to see how this cleaned volatility correlation matrix improves price forecasts and risk estimates. We list only a few references here, but there is a vast literature in which matrix cleaning methods are applied to financial correlations [10, 39-44]. It is expected that these tools will also find applications to the volatility correlation matrix.

In section V B, we compare the information contained in the largest eigenvectors about the GICS industry groups. We observe that very few of the largest eigenvectors of the volatility correlation matrix are dominated by industry groups. Since we expect these eigenvectors to carry genuine information about the market, this result hints at two possibilities. First, it has been observed for the eigenvectors of the return correlation matrix that a few of the largest eigenvectors are quite stable in time. Therefore, these eigenvectors are dominated by a particular industry group for a longer time period. As one moves to smaller eigenvalues, the time scale of stability begins to decrease. It is quite possible that for the volatility correlation matrix, only the two or three largest eigenvectors are stable and the rest of them are just random. However, this possibility does not reconcile with the fact that, compared to return correlations, volatility correlations are much stronger in the same time period. This leads to a second scenario, in which volatility might organize itself in non-linear structures that overlap different industry groups. Since these ideas are useful in volatility risk management and volatility arbitrage strategies, it will be very interesting to study the time evolution and structure of volatility eigenvectors [13, 14, 28-35].

FIG. 11: (Colour Online) Number variance $\Sigma^2(\ell)$ as a function of $\ell$ for different values of the unfolding parameter $c$ ($c$ = 1, 2, 2.65, 3, 3.5), keeping $w = 0.0047$ fixed. The solid line is the theoretical value of the number variance for GOE. We find that as we change $c$, the empirical estimates approach the exact result for GOE. In a small range around the optimum value $c = 2.65$, the empirical estimates are relatively stable.

Acknowledgments: We thank Samuel Vazquez and John Nieminen for their insightful comments and suggestions. AS would like to thank Robert Myers for his support and encouragement in this work. AS also thanks Apurva Narayan and Heidar Moradi for several interesting discussions. This research was supported in part by Perimeter Institute for Theoretical Physics. Research at Perimeter Institute is supported by the Government of Canada through Industry Canada and by the Province of Ontario through the Ministry of Research and Innovation.

Appendix A: Gaussian broadening and unfolded eigenvalues

In this appendix, we discuss the Gaussian broadening procedure for the eigenvalues of the volatility correlation matrix. The empirical cumulative probability distribution for the eigenvalues $\lambda_i$ of a random matrix is given by

$$F(\lambda_i) = \frac{1}{N} \sum_{j=1}^{N} \Theta(\lambda_i - \lambda_j), \tag{A1}$$

where $N$ is the total number of eigenvalues and $\Theta$ is the Heaviside step function. This probability distribution can be divided into two parts,

$$F(\lambda) = F_{av}(\lambda) + F_f(\lambda), \tag{A2}$$

where $F_{av}$ is the average part and $F_f$ is the fluctuating part, which is zero when averaged over the ensemble. Now one can get the unfolded eigenvalues $\xi_i$ using the average part of the cumulative probability distribution,

$$\xi_i = N F_{av}(\lambda_i). \tag{A3}$$

The unfolded eigenvalues are a map from the eigenvalues $\lambda_i$ to $\xi_i$ such that they have a uniform distribution. Also, note from (A1) and (A3) that the $\xi_i$ are independent of $N$.

To separate $F_{av}$ from the estimated cumulative probability (A1), we use the procedure of Gaussian broadening [26]. We replace the delta function peak at each eigenvalue $\lambda_i$ in (A1) by a Gaussian distribution with mean $\lambda_i$ and standard deviation $\sigma_i$. To estimate an optimum value of $\sigma_i$ for each eigenvalue, we divide the eigenvalue scale into artificial sub-bands of width $w$ and number them as $m = \{1, 2, \ldots\}$. We calculate the average distance between eigenvalues in each sub-band and denote it by $d_m$. Then for each eigenvalue $\lambda_i$, we find the standard deviation

$$\sigma_i = 2\, c\, d_m, \tag{A4}$$

where $c$ is a broadening parameter and $d_m$ is such that the eigenvalue $\lambda_i$ belongs to sub-band $m$.
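The broadening and unfolding steps (A1)-(A4) can be sketched as follows. This is an illustrative sketch under assumptions: the function names are hypothetical, the eigenvalues are assumed distinct, sub-bands are counted from the smallest eigenvalue, and a sub-band containing a single eigenvalue falls back to the global mean spacing (a choice made here, not taken from the text).

```python
import math
import numpy as np

# Standard normal CDF, applied elementwise.
_norm_cdf = np.vectorize(lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0))))

def broadening_widths(eigvals, w, c):
    """Per-eigenvalue Gaussian widths sigma_i = 2 c d_m, as in eq. (A4)."""
    lam = np.sort(np.asarray(eigvals, dtype=float))
    band = np.floor((lam - lam[0]) / w).astype(int)      # sub-band index of each eigenvalue
    global_spacing = (lam[-1] - lam[0]) / (lam.size - 1)
    sigma = np.empty_like(lam)
    for m in np.unique(band):
        idx = np.where(band == m)[0]
        if idx.size > 1:
            d_m = np.mean(np.diff(lam[idx]))             # mean spacing inside the sub-band
        else:
            d_m = global_spacing                         # assumption: fallback for lone eigenvalues
        sigma[idx] = 2.0 * c * d_m
    return lam, sigma

def unfold(eigvals, w, c):
    """Unfolded eigenvalues xi_i = N F_av(lambda_i), as in eq. (A3)."""
    lam, sigma = broadening_widths(eigvals, w, c)
    # Each step function in (A1) is replaced by a Gaussian CDF of width sigma_j.
    z = (lam[:, None] - lam[None, :]) / sigma[None, :]
    F_av = _norm_cdf(z).mean(axis=1)
    return lam.size * F_av
```

As a sanity check, equally spaced eigenvalues are mapped to unfolded values with unit spacing away from the edges of the spectrum.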

The values of the sub-band width $w$ and the broadening parameter $c$ are generally chosen to get the best fit for short-range and long-range correlations when compared with the theoretical results. For the discussion in section IV, where we compare the correlations among eigenvalues of the volatility correlation matrix with GOE, we have used $w = 0.0047$ and $c = 2.65$. Note that in the case of the next-to-nearest-neighbor spacing distribution in section IV B, we classify the eigenvalues $\lambda_i$ into two groups with even and odd $i$, and then unfold each group separately. In these cases, to keep the correlation structure consistent with the unfolding in sections IV A and IV C, we use $\sigma_i^{even/odd} = c\, d_m^{even/odd}$.

Finally, while comparing the properties of empirical data with RMT results, one needs to ensure that the agreement is not due to a particular choice of unfolding parameters. One should observe that as the unfolding parameters are varied in a particular range, the empirical results converge to the theoretical values. Generally, short-range correlations are less sensitive to the unfolding procedure, and this convergence is more visible in long-range correlations. In figure 11, we show the number variance for various values of $c$, keeping $w = 0.0047$ fixed. One can clearly see that as $c$ reaches the optimum value, the estimates of the number variance approach the theoretical values for GOE. We observe similar behavior when $w$ varies, keeping $c$ fixed.
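For readers who wish to reproduce this kind of check, the number variance of a set of unfolded eigenvalues can be estimated as sketched below. We assume the standard definition, the mean squared deviation of the count of unfolded eigenvalues in a window of length $\ell$ from $\ell$, averaged over window positions; the function name and the choice of evenly spaced window positions are ours.

```python
import numpy as np

def number_variance(xi, ell, n_windows=2000):
    """Estimate the number variance of unfolded eigenvalues xi at window length ell.

    Counts the eigenvalues in half-open windows [x, x + ell) placed at evenly
    spaced positions x, and averages the squared deviation of the count from ell.
    """
    xi = np.sort(np.asarray(xi, dtype=float))
    starts = np.linspace(xi[0], xi[-1] - ell, n_windows)
    upper = np.searchsorted(xi, starts + ell, side="left")
    lower = np.searchsorted(xi, starts, side="left")
    counts = upper - lower
    return float(np.mean((counts - ell) ** 2))
```

Two limiting cases are useful for orientation: a perfectly rigid spectrum (unit-spaced points) gives zero number variance for integer $\ell$, while uncorrelated Poisson levels give a number variance close to $\ell$. Repeating the estimate for several values of $c$ and $w$ in the unfolding step is one way to carry out the sensitivity check described above.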

Appendix B: Eigenvalues after removing the market mode

To remove the effect of the market mode, we regress the normalized volatility time series $G_{it}$ on the market mode (24), as discussed in section V A. As shown in (25), the residual time series $\epsilon_i$ can be obtained from

$$G_{it} = \alpha_i + \beta_i M_t + \epsilon_{i,t}, \tag{B1}$$

which is further used to calculate the correlation matrix $\tilde{C}$. The elements of the correlation matrix $\tilde{C}$ are related to the original correlation matrix $C$ as follows:

$$\tilde{C}_{ij} = \frac{\langle \epsilon_i \epsilon_j \rangle}{s_i s_j} \tag{B2}$$
$$\phantom{\tilde{C}_{ij}} = \frac{C_{ij}}{s_i s_j} - \frac{\beta_i \beta_j N \lambda_N}{s_i s_j}. \tag{B3}$$

Here $s_i$ is the standard deviation of the residual time series $\epsilon_i$. In obtaining (B3), we have also used $\langle \epsilon_i \rangle = 0$, $\langle \epsilon_i M \rangle = 0$ and the fact that the variance of the time series $G_{it}$ is one. If the market is weakly correlated, $\lambda_N$ will not be very large. Also, the dynamics of the overall market will have less influence on the individual time series and the coefficients $\beta_i$ will be small. In this case, if $s_i = s$ for all $i$, to leading order from (B3),

$$\tilde{\lambda}_i = \frac{\lambda_i}{s^2}. \tag{B4}$$

Also, let us assume that only the market mode is outside the Marcenko-Pastur distribution. In this case, all the $\beta_i$ are approximately equal and $\beta_i \approx 1/N$. Using this in the variance of the residual time series $\epsilon_i$,

$$s^2 \approx 1 - \frac{\lambda_N}{N}. \tag{B5}$$

This can further be used in (B4) to get $\tilde{\lambda}_i \approx \lambda_i N/(N - \lambda_N)$. Note that this is consistent with the arguments from the normalization of eigenvalues in the discussion after equation (25). However, if $\lambda_N$ is very large, the corrections to (B4) will not be negligible. In the case of the volatility correlation matrix, the largest eigenvalue is of the order of $N$. Hence, although the $\tilde{\lambda}_i$ are close to $\lambda_i N/(N - \lambda_N)$, we observe that there are noticeable corrections.
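The intermediate steps behind (B3) and (B5) can be written out explicitly. The sketch below uses our reading of the notation ($G_i$ for the normalized series, $\beta_i$ for the slope in (25), $\langle \cdot \rangle$ for time averages) and relies only on the normalization $v_N^T v_N = N$ and the orthogonality condition $\langle \epsilon_i M \rangle = 0$:

```latex
\begin{align}
\langle M^2 \rangle &= v_N^T C\, v_N = \lambda_N\, v_N^T v_N = N \lambda_N, \\
\beta_i &= \frac{\langle G_i M \rangle}{\langle M^2 \rangle}
         = \frac{(C v_N)_i}{N \lambda_N}
         = \frac{v_{N,i}}{N} \approx \frac{1}{N}, \\
\langle \epsilon_i \epsilon_j \rangle &= C_{ij} - \beta_i \beta_j \langle M^2 \rangle
         = C_{ij} - \beta_i \beta_j N \lambda_N, \\
s_i^2 &= C_{ii} - \beta_i^2 N \lambda_N \approx 1 - \frac{\lambda_N}{N}.
\end{align}
```

The approximation $\beta_i \approx 1/N$ uses the observation from section V A that the components of the market mode are clustered around one.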


FIG. 12: (Colour Online) Components of the industry weight vectors corresponding to the six largest eigenvectors (labeled 426 to 421) of the volatility return correlation matrix. For volatility return eigenvectors, the largest contributions come from the following industry groups: 426, utilities; 425, banks and energy; 424, banks and energy; 423, real estate; 422, semiconductors and semiconductor equipment; 421, Household & Personal Products. Note that the industry groups in the five largest eigenvectors match the dominating groups in the return eigenvectors, which are shown in figure 9.

Appendix C: Correlations among volatility return

In this appendix, we briefly mention our results on the application of RMT to volatility returns. Using the volatility time series $\sigma_{i,t}$ in (8), we can define the volatility return

$$\rho_{i,t} = \log\left(\frac{\sigma_{i,t+1}}{\sigma_{i,t}}\right). \tag{C1}$$

These volatility return time series are further normalized such that they have zero mean and unit variance. We can arrange these time series in an $N \times (T - 1)$ matrix $G$ and define the correlation matrix

$$C = \frac{1}{T - 1}\, G G^T. \tag{C2}$$

FIG. 13: (Colour Online) Inverse participation ratio $I_i$ of the weight vectors for volatility return eigenvectors, plotted against the eigenvalues $\tilde{\lambda}_i$ on a log-log plot. The x-axis shows the eigenvalues that we get after removing the influence of the market. The vertical dashed line shows the location beyond which the twenty largest eigenvalues fall. We can see that, compared to the return correlation matrix, fewer eigenvalues have a large inverse participation ratio. However, compared to the volatility correlation matrix, more eigenvectors receive dominating contributions from industry groups.

For this matrix, the smallest eigenvalue is 0.05 and the largest eigenvalue is 91. Following the procedure in section III, we find that 73% of the eigenvalues now fall under the Marcenko-Pastur distribution. The statistical properties of the eigenvalues are also consistent with those of GOE.

We further find that eigenvectors corresponding toeigenvalues within the RMT bounds have a Gaussiandistribution. The largest eigenvector is again the mar-ket mode. Following procedure in section V A, we canremove the influence of the market mode and using pro-

jection matrix (26), we can calculate the industry weightvectors. The measure (30) for the merit of linear model(25) for the volatility return correlations is Kvr = 0.32.In figure12,we have shown weight vectors for six largesteigenvectors, excluding the market mode. Once again, incomparison to return eigenvectors, there are less volatil-ity return eigenvectors which are dominated by a fewindustries. For the weight vectors, we can also calculateinverse participation ratio and it is shown in figure 13.We can further compare the inverse participation ratioswith the benchmark value I0, that we defined in sectionV B.We find that twelve among the twenty largest eigen-vectors of volatility return correlation matrix have an in-verse participation ratio less than this benchmark value.This number is less than sixteen, that we get for volatil-ity correlation matrix. This indicates that for the data



we have discussed, volatility return eigenvectors contain more information about the industry groups than volatility eigenvectors do. We also notice that the dominating industry groups in the five largest eigenvectors for the return and volatility return correlation matrices are the same.
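The inverse participation ratio used in this comparison can be sketched as follows. This is a minimal illustration that assumes the common definition, the sum of fourth powers of the components of a vector normalized to unit Euclidean norm; the normalization of the weight vectors and the benchmark value I0 of section V B are not reproduced here, and the ten-component example vectors are hypothetical.

```python
import numpy as np

def inverse_participation_ratio(vector):
    """Sum of fourth powers of the components after unit-norm scaling."""
    v = np.asarray(vector, dtype=float)
    v = v / np.linalg.norm(v)             # assumed normalization: sum(v**2) == 1
    return float(np.sum(v ** 4))

n_groups = 10                             # hypothetical number of industry groups

# Equal contribution from every group: IPR equals 1 / n_groups.
spread = np.ones(n_groups)
ipr_spread = inverse_participation_ratio(spread)

# A single dominating group: IPR equals 1.
localized = np.zeros(n_groups)
localized[3] = 1.0
ipr_localized = inverse_participation_ratio(localized)
```

Under this definition the reciprocal of the inverse participation ratio estimates how many components contribute significantly, so a larger value indicates a vector dominated by fewer industry groups.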

[1] Harry Markowitz. Portfolio selection. The Journal of

Finance, 7(1):7791, 1952.[2] J.P. Bouchaud and M. Potters. Theory of Financial Riskand Derivative Pricing: From Statistical Physics to RiskManagement. Cambridge University Press, 2003.

[3] L. Laloux, P. Cizeau, J.-P. Bouchaud, and M. Potters.Noise Dressing of Financial Correlation Matrices. Phys-ical Review Letters, 83:14671470, 1999.

[4] M. Mehta. Random Matrices. Academic Press, NewYork, USA, 1995.

[5] T. Guhr, A. M. Groeling, and H. A. Weidenmller.Random-matrix theories in quantum physics: commonconcepts. Physics Reports, 299(46):189425, 1998.

[6] T. A. Brody, J. Flores, J. B. French, P. A. Mello,A. Pandey, and S. S. M. Wong. Random-matrix physics:spectrum and strength fluctuations. Rev. Mod. Phys.,

53:385479, 1981.[7] A. M. Sengupta and Partha P. Mitra. Distributions of

singular values for some random matrices. Physical Re-view E, 60(3):3389, 1999.

[8] V. Plerou, P. Gopikrishnan, B. Rosenow, L. A. NunesAmaral, and H. E. Stanley. Universal and NonuniversalProperties of Cross Correlations in Financial Time Series.Physical Review Letters, 83:14711474, 1999.

[9] V. Plerou, P. Gopikrishnan, B. Rosenow, L. A. Amaral,T. Guhr, and H. E. Stanley. Random matrix approachto cross correlations in financial data. Physical ReviewE, 65(6):066126, 2002.

[10] J. P. Bouchaud and M. Potters. Financial Applicationsof Random Matrix Theory: a short review. arXiv:q-fin/0910.1205, 2009.

[11] R. F. Engle and S. Figlewski. Modeling the dynamics ofcorrelations among implied volatilities.

[12] Luc Bauwens, Sbastien Laurent, and Jeroen V. K. Rom-bouts. Multivariate garch models: a survey. Journal ofApplied Econometrics, 21(1):79109, 2006.

[13] P. Gopikrishnan, B. Rosenow, V. Plerou, and H. E. Stan-ley. Identifying Business Sectors from Stock Price Fluc-tuations. arXiv:cond-mat/0011145, 2000.

[14] P. Gopikrishnan, B. Rosenow, V. Plerou, and H. E. Stan-ley. Quantifying and interpreting collective behavior infinancial markets. Physical Review E, 64:035106, 2001.

[15] Robert F. Engle. Autoregressive conditional het-eroscedasticity with estimates of the variance of unitedkingdom inflation. Econometrica, 50(4):9871007, 1982.

[16] Tim Bollerslev. Generalized autoregressive conditionalheteroskedasticity. Journal of Econometrics, 31(3):307327, 1986.

[17] R.S. Tsay. Analysis of Financial Time Series. CourseS-mart. Wiley, 2010.

[18] Fischer Black. Studies of stock price volatility changes.InProceedings of the 1976 Meetings of the American Sta-tistical Association, Business and Economics StatisticsSection, pages 177181, 1976.

[19] A. Lunde and P. R. Hansen. A forecast comparison ofvolatility models: does anything beat a garch(1,1)? Jour-

nal of Applied Econometrics, 20(7):873889, 2005.

[20] T. G. Andersen and T. Bollerslev. Answering the skep-tics: Yes, standard volatility models do provide accurateforecasts. International Economic Review, 39(4):885905, 1998.

[21] Y. Liu, P. Gopikrishnan, P. Cizeau, M. Meyer, C.-K.Peng, and H. E. Stanley. Statistical properties of thevolatility of price fluctuations. Phys. Rev. E , 60:13901400, 1999.

[22] Yahoo!finance, 2013.[23] J.-P. Bouchaud and M. Potters. More stylized facts of

financial markets: leverage effect and downside correla-tions. Physica A: Statistical Mechanics and its Applica-tions, 299(12):60 70, 2001.

[24] Alexios Ghalanos. rugarch: Univariate GARCH models.,2013. R package version 1.2-7.

[25] V. A. Marenko and L. A. Pastur. Distribution of eigen-values for some sets of random matrices. Mathematics ofthe USSR-Sbornik, 1(4):457, 1967.

[26] H. Bruus and J.-C. Angls dAuriac. The spectrum ofthe two-dimensional hubbard model at low filling. EPL(Europhysics Letters), 35(5):321, 1996.

[27] C. Borghesi, M. Marsili, and S. Micciche. Emergence oftime-horizon invariant correlation structure in financialreturns by subtraction of the market mode. Phys. Rev.E , 76(2):026104, 2007.

[28] R.N. Mantegna. Hierarchical structure in financial mar-kets.The European Physical Journal B - Condensed Mat-ter and Complex Systems, 11(1):193197, 1999.

[29] N. Vandewalle G. Bonanno and R. N. Mantegna. Taxon-omy of stock market indices. Phys. Rev. E, 62:76157618,2000.

[30] L. Giada and M. Marsili. Data clustering and noiseundressing of correlation matrices. Phys. Rev. E ,63(6):061101, 2001.

[31] C. Coronnello, M. Tumminello, F. Lillo, S. Micciche,and R. N. Mantegna. Sector Identification in a Set ofStock Return Time Series Traded at the London StockExchange.Acta Physica Polonica B, 36:2653, 2005.

[32] R. Allez and J.-P. Bouchaud. Eigenvectors dy-namic and local density of states under free addition.arXiv:math/1301.4939, 2013.

[33] R. Allez and J.-P. Bouchaud. Eigenvector dynamics:General theory and some applications. Phys. Rev. E ,86(4):046202, 2012.

[34] D. J. Fenn, M. A. Porter, S. Williams, M. McDon-ald, N. F. Johnson, and N. S. Jones. Temporal evo-lution of financial-market correlations. Phys. Rev. E ,84(2):026109, 2011.

[35] T. Conlon, H. J. Ruskin, and M. Crane. Cross-correlationdynamics in financial time series. Physica A StatisticalMechanics and its Applications, 388:705714, 2009.

[36] Rob J Hyndman with contributions from George Athana-sopoulos, Slava Razbash, Drew Schmidt, Zhenyu Zhou,Yousaf Khan, and Christoph Bergmeir. forecast: Fore-casting functions for time series and linear models, 2013.


17/17

17

R package version 4.06.[37] Rob J. Hyndman and Yeasmin Khandakar. Automatic

time series forecasting: The forecast package for r. Jour-nal of Statistical Software, 27(3):122, 7 2008.

[38] Robert Engle. Dynamic conditional correlation: A sim-ple class of multivariate generalized autoregressive con-ditional heteroskedasticity models. Journal of Business& Economic Statistics, 20(3):33950, 2002.

[39] S. Pafka, M. Potters, and I. Kondor. Exponential Weight-

ing and Random-Matrix-Theory-Based Filtering of Fi-nancial Covariance Matrices for Portfolio Optimization.arXiv:cond-mat/0402573, 2004.

[40] S. Sharifi, M. Crane, A. Shamaie, and H. Ruskin. Ran-dom matrix theory for portfolio optimization: a stabilityapproach.Physica A: Statistical and Theoretical Physics,335(3-4):629643, 2004.

[41] J. Daly, M. Crane, and H.J. Ruskin. Random matrixtheory filters in portfolio optimisation: A stability andrisk assessment. Physica A: Statistical Mechanics and itsApplications, 387(1617):4248 4260, 2008.

[42] J Daly, M Crane, and H J Ruskin. Random matrix theoryfilters and currency portfolio optimisation. Journal ofPhysics: Conference Series, 221(1):012003, 2010.

[43] G. Papp, S. Pafka, M. A. Nowak, and I. Kondor. RandomMatrix Filtering in Portfolio Optimization. Acta Physica

Polonica B, 36:2757, 2005.[44] K. Urbanowicz, P. Richmond, and J. A. Hoyst. Risk

evaluation with enhanced covariance matrix. Physica A:Statistical Mechanics and its Applications, 384(2):468 474, 2007.
