svd and ls
SVD and LS. M. A. Miceli, University of Rome I. Stats in the Château, Jouy-en-Josas, August 31 - September 4 2009.
M. A. Miceli “SVD and LS” - Stats in the Château - August 31 - September 4 2009
SVD and LS
M. A. Miceli, University of Rome I
Stats in the Château, Jouy-en-Josas
August 31 - September 4 2009
Motivations
• Problems of high dimensionality in estimation:
  – Rank < actual dimension of the data sets: inverse problems
  – Thresholds for accepting variables ease on every dimension as the number of variables/dimensions increases (e.g. Wald test).
• How the SVD helps in extracting robust correlations between dependent and independent variables: automatic choice of “model”.
• Why
• Some evidence in predicting US CPI indexes
• Some issues about normalizations
Motivations
Given a simultaneous linear system of equations
$$Y_{T,N} = X_{T,M}\, B_{M,N} + \text{Errors}$$
1. Collapsing the dimensionality of the system to its min rank = min[rank(Y), rank(X)].
2. Advantages of SVD w.r.t. Principal Components:
  • PC requires a square matrix, e.g. the autocorrelation matrix, and ranks the dimensions within that single matrix;
  • SVD ranks the correlations between X and Y dimensions.
3. Discretionary possibility of getting rid of some dimensions believed negligible: we are interested in getting rid of those dimensions that can be generated by a totally random system of the same dimensions (Marchenko-Pastur conditions adapted to a rectangular matrix).
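Definition of SVD of a matrix product
• SVD definition
$$A_{M,N} = U_{M,N}\, S_{N,N}\, V'_{N,N} \quad\text{or}\quad A_{M,N} = U_{M,M}\, S_{M,N}\, V'_{N,N}$$
Having two matrices
$$Y_{T,N}, \quad X_{T,M}$$
one can write
$$(X'Y)_{M,N} = U_{M,M}\, S_{M,N}\, V'_{N,N}$$
and therefore
$$Y_{T,N}\, V_{N,N} = X_{T,M}\, U_{M,M}\, S_{M,N}$$
If T << max(M,N)? No problems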
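As a minimal sketch of the two forms of the definition, the NumPy snippet below builds X'Y from synthetic data and checks both factorizations. The sizes, the random inputs and the variable names are assumptions made for illustration; they are not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
T, M, N = 50, 6, 3                 # illustrative sizes with M >= N
X = rng.standard_normal((T, M))
Y = rng.standard_normal((T, N))

A = X.T @ Y                        # M x N cross-product matrix X'Y

# Full form: U is M x M, S is M x N (diagonal block padded with zeros), V' is N x N.
U, s, Vt = np.linalg.svd(A, full_matrices=True)
S = np.zeros((M, N))
S[:N, :N] = np.diag(s)

# Economy form: U is M x N, S is N x N, V' is N x N.
U_thin, s_thin, Vt_thin = np.linalg.svd(A, full_matrices=False)

full_ok = np.allclose(U @ S @ Vt, A)
thin_ok = np.allclose(U_thin @ np.diag(s_thin) @ Vt_thin, A)
```

Both forms rebuild the same matrix; the economy form simply drops the columns of U that multiply the zero block of S.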
Diagonalizing the LS estimator
• Consider regressing every column y_n of Y over the set of explanatory variables X:
$$b_n = (X'X)^{-1} X' y_n$$
• we write
$$B_{M,N} = (X'X)^{-1}_{M,M}\, (X'Y)_{M,N}$$
• We diagonalize both matrices, (X’X) and (X’Y):
  – X’X, square and symmetric:
$$X'X = P_X\, D_{xx}\, P_X^{-1}$$
  – X’Y, rectangular:
$$X'Y = U_{xy}\, S_{xy}\, V'_{xy}$$
  – NB. For a symmetric positive semi-definite matrix such as X’X, the SVD IS the same as the diagonalisation. We will write
$$X'X = U_{xx}\, S_{xx}\, V'_{xx}$$
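The relations on this slide can be checked numerically. The sketch below, on synthetic data (sizes, seed and names are assumptions), computes the LS estimator for all columns of Y at once, diagonalizes X'X, and confirms that the singular values of X'X coincide with its eigenvalues, which holds because X'X is symmetric and positive semi-definite.

```python
import numpy as np

rng = np.random.default_rng(1)
T, M, N = 200, 5, 2                           # illustrative sizes
X = rng.standard_normal((T, M))
Y = X @ rng.standard_normal((M, N)) + 0.1 * rng.standard_normal((T, N))

XtX = X.T @ X
XtY = X.T @ Y

# LS estimator for every column of Y: B = (X'X)^{-1} (X'Y)
B = np.linalg.solve(XtX, XtY)

# Diagonalisation of the symmetric matrix X'X = P D P'
d, P = np.linalg.eigh(XtX)                    # eigenvalues in ascending order

# SVD of the same matrix: singular values in descending order
s_xx = np.linalg.svd(XtX, compute_uv=False)

# The estimator rebuilt from the diagonalisation: B = P D^{-1} P' (X'Y)
B_diag = P @ np.diag(1.0 / d) @ P.T @ XtY
```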
SVD of the covariance matrix
[Diagram: the matrix (X’Y) drawn as the product of the blocks Uxy, Sxy stacked over a zero block, and Vxy]
SVD mapping from column basis to row basis
[Diagram: X’Y times Vxy drawn equal to Uxy times Sxy stacked over a zero block]
SVD: splitting the product X’Y
[Diagram: Y Vxy (linear combinations of the Y columns) set against X Uxy Sxy (linear combinations of the X columns)]
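One way to read the split of X'Y is that the SVD pairs each linear combination of the X columns with one linear combination of the Y columns. The sketch below (synthetic data; the names FX and FY are hypothetical) verifies that the cross-products of these paired combinations are exactly the singular values, while all unpaired cross-products vanish.

```python
import numpy as np

rng = np.random.default_rng(2)
T, M, N = 100, 7, 3                # illustrative sizes
X = rng.standard_normal((T, M))
Y = rng.standard_normal((T, N))

U, s, Vt = np.linalg.svd(X.T @ Y, full_matrices=False)

FX = X @ U                         # T x N: linear combinations of the X columns
FY = Y @ Vt.T                      # T x N: linear combinations of the Y columns

# (X U)'(Y V) = U' (X'Y) V = S, a diagonal matrix of singular values
C = FX.T @ FY
```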
Adding diagonalisation of both X and Y matrices
Returning to the original variables
[Diagram: Y written as X multiplied by the blocks Uxx, Uxy, Inv(Dxx), Sxy, Vxy’, Vyy’]
• Replacing the old “B”: any advantage?
• We may cancel factors: by which criterion?
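A minimal sketch of the two questions, under the assumption that the factors are X Uxx and Y Vyy and that the LS step on the factors introduces Inv(Dxx): with every factor kept, the factor form reproduces the ordinary LS fit exactly, so a difference appears only once factors are cancelled. Data, sizes and the helper name `fitted_Y` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
T, M, N = 150, 6, 3                                   # illustrative sizes
X = rng.standard_normal((T, M))
Y = X @ rng.standard_normal((M, N)) + 0.5 * rng.standard_normal((T, N))

# Diagonalise X'X and Y'Y separately.
d_xx, U_xx = np.linalg.eigh(X.T @ X)
_, V_yy = np.linalg.eigh(Y.T @ Y)

FX = X @ U_xx                                         # X factors, FX'FX = diag(d_xx)
FY = Y @ V_yy                                         # Y factors

# SVD of the cross-product of the factors.
U_xy, s_xy, Vt_xy = np.linalg.svd(FX.T @ FY, full_matrices=False)

def fitted_Y(k):
    """Fit of Y in the original variables, keeping the k largest singular values."""
    core = (U_xy[:, :k] * s_xy[:k]) @ Vt_xy[:k, :]    # rank-k Uxy Sxy Vxy'
    return FX @ np.diag(1.0 / d_xx) @ core @ V_yy.T

Y_all = fitted_Y(N)                                   # all factors kept
Y_ols = X @ np.linalg.solve(X.T @ X, X.T @ Y)         # fit from the old "B"
Y_two = fitted_Y(2)                                   # two factors only
```

The choice of how many factors to keep (here 2, chosen arbitrarily) is exactly the open criterion raised on the slide.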
RMT
1. The Marcenko-Pastur conditions give the density of the singular values and its interval limits for square matrices. Bouchaud, Miceli et al. (2005) derive them for rectangular matrices.
2. We run exactly the same experiment many times with purely randomly generated matrices: the resulting limits and densities replicate the theory.
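The random experiment in point 2 could be sketched as follows. This is an assumption-laden illustration, not the author's code: X and Y are independent standard Gaussian matrices, the cross-product is scaled by 1/T, and the number of draws is arbitrary. The sizes mirror those of the CPI example later in the deck (window 100, M=77, N=9).

```python
import numpy as np

def random_singular_values(T, M, N, n_draws, seed=0):
    """Singular values of X'Y / T for independent Gaussian X (T x M) and Y (T x N)."""
    rng = np.random.default_rng(seed)
    out = np.empty((n_draws, min(M, N)))
    for i in range(n_draws):
        X = rng.standard_normal((T, M))
        Y = rng.standard_normal((T, N))
        out[i] = np.linalg.svd(X.T @ Y / T, compute_uv=False)
    return out

sv = random_singular_values(T=100, M=77, N=9, n_draws=200)
lo, hi = sv.min(), sv.max()                    # empirical interval limits
density, edges = np.histogram(sv.ravel(), bins=30, density=True)
```

Singular values of real data that fall inside the empirical interval [lo, hi] are candidates for the "believed negligible" dimensions, since a purely random system of the same size can generate them.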
Marcenko-Pastur limits and density
RMT
1. Density and limits do change depending on whether we use raw or already diagonalized data.
2. Is this “double diagonalization” worthwhile?
• Singular values are homogeneous of degree 0 (HD0) in standardization; eigenvectors are NOT.
Diagonalized “LS estimator”
We may approach the same problem in different ways:
1. raw data
$$Y_{T,N}\, V^{xy}_{N,N} = X_{T,M}\, U^{xy}_{M,M}\, S^{xy}_{M,N}$$
2. normalized factors
$$\big(Y_{T,N}\, V^{yy}_{N,N}\, (D^{yy}_{N,N})^{-1/2}\big)\, V^{xy}_{N,N} = \big(X_{T,M}\, U^{xx}_{M,M}\, (D^{xx}_{M,M})^{-1/2}\big)\, U^{xy}_{M,M}\, S^{xy}_{M,N}$$
3. non normalized factors
$$Y_{T,N}\, V^{yy}_{N,N}\, V^{xy}_{N,N} = X_{T,M}\, U^{xx}_{M,M}\, [(D^{xx}_{M,M})]^{-1}\, U^{xy}_{M,M}\, S^{xy}_{M,N}$$
“Unfortunately” 3. works best. Why? … Is it because factor normalization changes the ranking of the SVD singular values and this eventually affects the factor selection? NO!
Answer at the end ….
Very disturbing
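To make the contrast between approaches 1 and 2 concrete, here is a hedged sketch on synthetic data. It assumes that "normalized" means each factor is scaled to unit length (division by the square root of its eigenvalue); the function names, sizes and the rescaling vector are hypothetical. Under that reading, the singular values of the normalized factors are bounded by 1 and do not move when the columns of X are rescaled, whereas the raw-data singular values do.

```python
import numpy as np

def sv_raw(X, Y):
    """Approach 1: singular values of X'Y on raw data."""
    return np.linalg.svd(X.T @ Y, compute_uv=False)

def sv_normalized_factors(X, Y):
    """Approach 2: singular values after scaling every factor to unit length."""
    d_xx, U_xx = np.linalg.eigh(X.T @ X)
    d_yy, V_yy = np.linalg.eigh(Y.T @ Y)
    FX = X @ U_xx / np.sqrt(d_xx)          # FX'FX = I
    FY = Y @ V_yy / np.sqrt(d_yy)          # FY'FY = I
    return np.linalg.svd(FX.T @ FY, compute_uv=False)

rng = np.random.default_rng(4)
T, M, N = 100, 10, 3                       # illustrative sizes
X = rng.standard_normal((T, M))
Y = X[:, :N] + rng.standard_normal((T, N))

scale = rng.uniform(0.5, 5.0, size=M)      # arbitrary rescaling of the X columns
raw_a, raw_b = sv_raw(X, Y), sv_raw(X * scale, Y)
nf_a, nf_b = sv_normalized_factors(X, Y), sv_normalized_factors(X * scale, Y)
```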
Example: Forecasting US CPI Indexes
Time series are month-on-month (mom) % changes:
• Y := 9 CPI indexes, Aug 83 – Apr 07
• X := 77 macroeconomic series, Nov 83 – Apr 07, including 3 lags of the Ys.
T=282, N=9, M=77, rolling window W=100 (or other lengths).
n = N/W, m = M/W.
The 9 Y series:
• CPI_CMDTY: Commodities SA
• CPI_APPAREL: Apparel SA
• CPI_FD: Food & Beverages SA
• CPI_HOUS: Housing SA
• CPI_SERV: Services SA
• CPI_TRASP: Transportation SA
• CPI_MEDIC: Medical Care SA
• PPI_TOT_MOM: US PPI SA
• PPI_CORE_MOM: US PPI Core SA
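The rolling-window setup can be sketched as below. Only T, N, M and W come from the slide; the data are a synthetic stand-in for the real series, the 1/W scaling of the cross-product is an assumption, and `rolling_singular_values` is a hypothetical helper name.

```python
import numpy as np

T, N, M, W = 282, 9, 77, 100           # sizes quoted on the slide
n, m = N / W, M / W                    # aspect ratios: n = 0.09, m = 0.77

rng = np.random.default_rng(5)         # synthetic stand-in for the real series
X = rng.standard_normal((T, M))
Y = rng.standard_normal((T, N))

def rolling_singular_values(X, Y, W):
    """Singular values of X'Y / W on every window of W consecutive rows."""
    rows = []
    for start in range(X.shape[0] - W + 1):
        Xw, Yw = X[start:start + W], Y[start:start + W]
        rows.append(np.linalg.svd(Xw.T @ Yw / W, compute_uv=False))
    return np.array(rows)

sv = rolling_singular_values(X, Y, W)  # one row of 9 singular values per window
```

With T=282 and W=100 there are 183 windows, which gives one set of 9 singular values per window position.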
[Figure: CPIs. x-axis 0 to 300; y-axis -0.05 to 0.05]
[Figure: Xs. x-axis 0 to 300; y-axis -0.5 to 0.5]
Estimation by Model III
Singular values - Model I: raw and random data
[Figure: y-axis 0.000 to 1.800; x-axis half-yearly dates from Sep 93 to Mar 07]
Singular values: Model I – Random generated DATA
[Figure: x-axis 0 to 180; y-axis 0.2 to 0.5]
Singular values - Model I: raw and random data
[Figure: y-axis 0.000 to 1.800; x-axis 1 to 9]
Singular values for SVD on raw and random DATA
[Figure, two histograms: one with x-axis 0.2 to 0.5 and y-axis 0 to 90; one with x-axis 0 to 1.8 and y-axis 0 to 350]
Interest Rates Coefficients
[Figure: y-axis 0 to 0.35; x-axis half-yearly dates from Sep 96 to Sep 06; series R3M_USMACRO, R10Y_USMACRO, R2Y_USD_M]
Estimation by Model II
Factors are divided by their own eigenvalue
Singular values: Model II – Data NORMALIZED FACTORS
[Figure: x-axis 0 to 180; y-axis 0.65 to 1]
lambda max = 0.934
lambda min = 0.608
Singular values: Model II – Random generated NORMALIZED FACTORS
[Figure: x-axis 0 to 180; y-axis 0.5 to 0.95]
lambda max = 0.934
lambda min = 0.608
Randomly generated singular values do not look very different ….
Singular values: Model II: normalized data and random factors
[Figure: y-axis 0.600 to 1.000; x-axis 1 to 9]
Singular values: Model II: normalized data and random factors
[Figure: y-axis 0.600 to 1.000; x-axis 1 to 8]
Singular values for SVD on raw and random FACTORS
[Figure, two histograms: one with x-axis 0.65 to 1 and y-axis 0 to 120; one with x-axis 0.5 to 1 and y-axis 0 to 150]
Let’s see estimations
by Model III
P&L Model III - Factors on raw data
[Figure: x-axis 0 to 120; y-axis -10 to 90]
P&L Model III - CPI Indexes (Model of Non Normalized Factors) – In sample
[Figure, two panels: “With ALL svd factors” and “2 svd factors”; x-axis 0 to 120 in both; y-axes -5 to 35 and -5 to 30]
Let’s see estimations
by Model II (normalized factors)
P&L Model II (Normalized factors) - Factors
[Figure: x-axis 0 to 120; y-axis -50 to 250]
P&L Model II (Normalized factors) – CPI’s
[Figure: x-axis 0 to 120; y-axis -150 to 250]
Example of CPI_comdty estimation
[Figure, two panels: “Normalized factors” and “Non normalized factors”; x-axis 0 to 120 in both; y-axes -5 to 4 and -50 to 40]
OUT OF SAMPLE
• Estimation on t = 1, …, 120
• Forecast at fixed coefficients for t = 121, …, 282
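The out-of-sample protocol can be sketched as follows, assuming plain LS coefficients as a stand-in for whichever factor model is being evaluated. The split points 120 and 282 and the sizes M=77, N=9 come from the slides; the data are synthetic and all variable names are hypothetical.

```python
import numpy as np

T, T_fit, M, N = 282, 120, 77, 9          # split and sizes from the slides
rng = np.random.default_rng(6)            # synthetic stand-in data
X = rng.standard_normal((T, M))
Y = 0.1 * X @ rng.standard_normal((M, N)) + rng.standard_normal((T, N))

X_fit, Y_fit = X[:T_fit], Y[:T_fit]       # estimation sample: t = 1, ..., 120
X_new, Y_new = X[T_fit:], Y[T_fit:]       # forecast sample:  t = 121, ..., 282

# Coefficients are estimated once, on the first 120 observations only.
B = np.linalg.solve(X_fit.T @ X_fit, X_fit.T @ Y_fit)

# Forecasts use the same fixed B for every later date (no re-estimation).
forecast = X_new @ B
errors = Y_new - forecast
```

A coefficient matrix built from a reduced set of SVD factors could be substituted for B without changing the rest of the procedure.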
P&L: Factors (Model II)
[Figure: x-axis 0 to 300; y-axis -20 to 160]
Forecast on CPI’s
[Figure, two panels: “All factors” and “2 factors only”; x-axis 0 to 300 in both; y-axes -10 to 90 and -10 to 80]
Easier to predict: 1. medical care (since stable), 2. commodities (oil), 3. transportation
Forecasts on CPI’s Comdty
[Figure: x-axis 0 to 300; y-axis -6 to 6]
Conclusions 1
Conclusions on the example