ROBUST STATISTICS - Regression
STAKAN III
Jan Ámos Víšek
Institute of Economic Studies, Faculty of Social Sciences (FSV UK), Charles University (founded 1348), Prague
Johann Kepler University of Linz, Austria, Linz, 16. – 18. 6. 2003
Schedule of today's talk
A motivation for robust regression
M-estimators in regression
Invariance and equivariance of scale-estimator
Breakdown point and subsample sensitivity of M-estimators
Evaluation of M-estimators
Regression quantiles
Schedule of today's talk (continued)
A challenge of finding high breakdown point estimators in regression
The least median of squares - definition, properties and evaluation
The least trimmed squares - definition and properties
Can a small change of data cause a large change of the estimator?
Repeated median - definition, never implemented
Schedule of today's talk (continued)
The least trimmed squares - evaluation: algorithm and its properties
The least trimmed squares - how to apply
The least weighted squares - definition, properties and evaluation
Debts of robust regression
I have read that every man wears, on average, 8.3 pairs of socks. But I can't understand how?
Why robust methods in regression?
What about considering a minimal ellipsoid containing an a priori given number of observations?
So the solution seems to be simple!
Why robust methods in regression? (continued)
I am sorry, but we have to invent a more intricate solution.
So, for OLS we have in fact the following situation!
Recalling that the breakdown point is the minimal number (fraction) of observations which can cause the estimator to break down.
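A minimal numerical sketch (my illustration, not from the slides) of this situation: a single corrupted response is enough to move the OLS fit arbitrarily far, i.e. the breakdown point of OLS is only 1/n.

```python
# Illustration: one gross outlier destroys the OLS estimate.
import numpy as np

rng = np.random.default_rng(1)
n = 30
x = rng.uniform(0, 10, n)
y = 2.0 + 0.5 * x + rng.normal(0, 0.3, n)        # "true" model: beta = (2, 0.5)
X = np.column_stack([np.ones(n), x])             # design matrix with intercept

def ols(X, y):
    """Ordinary least squares via numpy's least-squares solver."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

print("clean data       :", ols(X, y))           # close to (2, 0.5)
y_bad = y.copy()
y_bad[0] = 1e6                                   # corrupt a single observation
print("one gross outlier:", ols(X, y_bad))       # estimates are carried away
```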
Robust estimators of regression coefficients
M-estimators

$\hat{\beta}^{(M,n)} = \arg\min_{\beta \in R^p} \sum_{i=1}^{n} \rho(Y_i - X_i^T \beta)$

with normal equations

$\sum_{i=1}^{n} X_i\, \psi(Y_i - X_i^T \beta) = 0.$

Unfortunately they are not scale- and regression-equivariant, hence the necessity of studentization by an estimate of scale of the disturbances; however ….

$\hat{\beta}^{(M,n)} = \arg\min_{\beta \in R^p} \sum_{i=1}^{n} \rho\left(\frac{Y_i - X_i^T \beta}{\hat{\sigma}}\right)$
Equivariance in scale: if for data $(Y, X)$ the estimate is $\hat{\sigma}$, then for data $(aY, X)$ the estimate is $a\hat{\sigma}$.
Invariance in regression: if for data $(Y, X)$ the estimate is $\hat{\sigma}$, then for data $(Y + X^T b, X)$ the estimate is again $\hat{\sigma}$.
Bickel (1975), Jurečková, Sen (1984)
However, to reach scale- and regression-equivariance of $\hat{\beta}^{(M,n)}$, the estimator of scale has to be scale-equivariant and regression-invariant.
For $\hat{\sigma}^2$ see Víšek (1999) - heuristics and numerical study.
Affine invariant: Maronna and Yohai (1981).
Disappointing result - breakdown point equal to $1/p$ ($p$ is the dimension of the model).
Sensitivity to leverage points.
Another spot on beauty
Uncontrollable subsample sensitivity for a discontinuous $\psi$-function in the normal equations
$\sum_{i=1}^{n} X_i\, \psi(Y_i - X_i^T \beta) = 0,$
i.e.
$\hat{\beta}^{(n)} - \hat{\beta}^{(n-1,k)} = R^{(n)}\, X_k\, \psi(Y_k - X_k^T \hat{\beta}^{(n)}).$
An advantage - M-estimators can be easily evaluated. The normal equations
$\sum_{i=1}^{n} X_i\, \psi(Y_i - X_i^T \beta) = 0$
can be rewritten as
$\sum_{i=1}^{n} X_i\, \frac{\psi(Y_i - X_i^T \beta)}{Y_i - X_i^T \beta}\, (Y_i - X_i^T \beta) = 0,$
i.e.
$\sum_{i=1}^{n} w_i\, X_i\, (Y_i - X_i^T \beta) = 0,$
which is solved by iterating
$\hat{\beta}^{(M,j)} = (X^T W^{(M,j-1)} X)^{-1} X^T W^{(M,j-1)} Y, \qquad W = \mathrm{diag}(w_1, w_2, \ldots, w_n).$
This is in fact the classical weighted least squares. E.g. 300 iterations moves $L_1$ to $L_2$.
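A hedged Python sketch of this iteratively reweighted least squares evaluation; the Huber $\psi$-function and the MAD studentization are my choices and are not prescribed by the slides.

```python
# Iteratively reweighted least squares for an M-estimator: w_i = psi(r_i)/r_i,
# then repeated weighted OLS until the coefficients stop changing.
import numpy as np

def huber_psi(u, c=1.345):
    return np.clip(u, -c, c)

def m_estimator_irls(X, y, n_iter=100, tol=1e-8):
    beta = np.linalg.lstsq(X, y, rcond=None)[0]            # start from OLS
    for _ in range(n_iter):
        r = y - X @ beta
        sigma = 1.4826 * np.median(np.abs(r - np.median(r))) + 1e-12   # MAD scale
        u = r / sigma
        w = np.where(np.abs(u) < 1e-10, 1.0, huber_psi(u) / u)         # psi(u)/u
        beta_new = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta
```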
Regression quantiles - Koenker, Bassett (1978)

For $\alpha \in (0,1)$ put
$\rho_\alpha(r) = \alpha\, r$ for $r \ge 0$ and $\rho_\alpha(r) = (\alpha - 1)\, r$ for $r < 0$.

Regression $\alpha$-quantile:
$\hat{\beta}^{(\alpha)} = \arg\min_{\beta \in R^p} \sum_{i=1}^{n} \rho_\alpha(Y_i - X_i^T \beta).$
L-estimator:
$\hat{\beta}^{(L)} = \sum_{i=1}^{n} w_i\, \hat{\beta}^{(i/n)}, \qquad w_i \in (0,1).$

L-estimators & M-estimators: by the way, quantiles are the only statistics which are simultaneously L- and M-estimators - Šindelář (1991).
An advantage: evaluation by means of software for linear programming.
A disadvantage: regression quantiles are M-estimators, hence they are sensitive to leverage points (and they are not equivariant, of course, with possibly low breakdown point).
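A hedged sketch of such an evaluation: the standard linear-programming formulation of the regression $\alpha$-quantile, solved with scipy.optimize.linprog (the slide only mentions linear programming software; the concrete formulation and library choice are mine).

```python
# Regression alpha-quantile as an LP: minimize alpha*sum(u) + (1-alpha)*sum(v)
# subject to X beta + u - v = y, u >= 0, v >= 0, beta free.
import numpy as np
from scipy.optimize import linprog

def regression_quantile(X, y, alpha=0.5):
    n, p = X.shape
    c = np.concatenate([np.zeros(p), alpha * np.ones(n), (1 - alpha) * np.ones(n)])
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * p + [(0, None)] * (2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    return res.x[:p]            # the estimated regression alpha-quantile
```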
The trimmed least squares - Ruppert and Carroll (1980)
OLS is applied on the observations whose response variable lies between $X_i^T \hat{\beta}^{(\alpha_1)}$ and $X_i^T \hat{\beta}^{(\alpha_2)}$.
Motto: the median is a 50% breakdown point estimator of location.
Challenge: can we establish an estimator of regression coefficients having also a 50% breakdown point?
A pursuit lasted from Bickel (1972) to Siegel (1983):
Repeated median
$\hat{\beta}_j^{(n)} = \operatorname{med}_{i_1}\Bigl(\operatorname{med}_{i_2}\bigl(\cdots \operatorname{med}_{i_{p+1}}\bigl(\hat{\beta}_j^{(OLS)}(i_1, i_2, \ldots, i_{p+1})\bigr)\cdots\bigr)\Bigr), \qquad j = 1, \ldots, p.$
To my knowledge - never implemented.
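For the simplest case of a straight line the nested medians reduce to pairwise slopes; a minimal sketch of that special case (my illustration - the slide's general definition applies OLS to all $(p+1)$-tuples):

```python
# Repeated median for y = b0 + b1*x: inner median over pairwise slopes,
# outer median over observations (assumes all x values are distinct).
import numpy as np

def repeated_median_line(x, y):
    n = len(x)
    inner = np.empty(n)
    for i in range(n):
        j = np.arange(n) != i
        inner[i] = np.median((y[j] - y[i]) / (x[j] - x[i]))   # med over j
    b1 = np.median(inner)                                      # med over i
    b0 = np.median(y - b1 * x)                                 # intercept
    return b0, b1
```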
The Least Median of Squares - Rousseeuw (1983)
The first really applicable 50% breakdown point estimator.

Consider the model $Y_i = X_i^T \beta^0 + e_i$, $i = 1, 2, \ldots, n$. Let us recall that for any $\beta \in R^p$
$r_i^2(\beta) = \bigl(Y_i - \sum_{j=1}^{p} X_{ij} \beta_j\bigr)^2$
and let us define the order statistics
$r_{(1)}^2(\beta) \le r_{(2)}^2(\beta) \le \ldots \le r_{(n)}^2(\beta).$
Then for any $h$, $n/2 \le h \le n$,
$\hat{\beta}^{(LMS,n,h)} = \arg\min_{\beta \in R^p} r_{(h)}^2(\beta).$
The optimal $h = [n/2] + [(p+1)/2]$.
The Least Median of Squares (continued)

Advantages: evidently 50% breakdown point; scale- and regression-equivariant.
Disadvantages: only $n^{1/3}$-consistent and not asymptotically normal; not easy to evaluate.

Evaluation: Rousseeuw, Leroy (1987) - PROGRESS; first proposal - repeated selection of subsamples of $p+1$ points. Later improved due to a geometric characterization - Joss, Marazzi (1990). Still unreliable, usually bad - I am sorry.
The Least Trimmed Squares - Rousseeuw (1983)
The second applicable 50% breakdown point estimator.

Let us recall once again that for any $\beta \in R^p$
$r_i^2(\beta) = \bigl(Y_i - \sum_{j=1}^{p} X_{ij} \beta_j\bigr)^2$
and that the order statistics are given by
$r_{(1)}^2(\beta) \le r_{(2)}^2(\beta) \le \ldots \le r_{(n)}^2(\beta).$
Then for any $h$, $n/2 \le h \le n$,
$\hat{\beta}^{(LTS,n,h)} = \arg\min_{\beta \in R^p} \sum_{i=1}^{h} r_{(i)}^2(\beta).$
Again the optimal $h = [n/2] + [(p+1)/2]$.
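A minimal sketch (my illustration) of the two criteria just defined - the LMS minimizes the $h$-th ordered squared residual, the LTS the sum of the $h$ smallest squared residuals:

```python
# LMS and LTS objective functions for a given beta, plus the "optimal" h.
import numpy as np

def lms_criterion(beta, X, y, h):
    r2 = np.sort((y - X @ beta) ** 2)
    return r2[h - 1]                     # h-th order statistic r^2_(h)(beta)

def lts_criterion(beta, X, y, h):
    r2 = np.sort((y - X @ beta) ** 2)
    return r2[:h].sum()                  # sum of the h smallest squared residuals

def optimal_h(n, p):
    return n // 2 + (p + 1) // 2         # h = [n/2] + [(p+1)/2]
```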
The Least Trimmed Squares (continued)

Advantages: evidently 50% breakdown point; scale- and regression-equivariant; $\sqrt{n}$-consistent and asymptotically normal; nowadays easy to evaluate.
Disadvantages: high subsample sensitivity, i.e. $\|\hat{\beta}^{(LTS,n,h)} - \hat{\beta}^{(LTS,n-1,h,k)}\|$ can be (arbitrarily) large.

Evaluation: Rousseeuw, Leroy (1987) - PROGRESS. First proposal - based on LMS, in fact the trimmed least squares. Probably still in S-PLUS. It did not work satisfactorily, sometimes very badly.
Engine knock data - 16 cases, 4 explanatory variables: a small change of data caused a large change of $\hat{\beta}^{(LMS,n,h)}$.

Hettmansperger, T.P., S.J. Sheather (1992): A Cautionary Note on the Method of Least Median Squares. The American Statistician 46, 79-83.

A first reaction: the robust methods probably work in another way than we have assumed - disappointment!!

A new algorithm was nearly immediately available; it removed the "paradox".

Boček, P., P. Lachout (1995): Linear programming approach to LMS-estimation. Mem. vol. Comput. Statist. & Data Analysis 19 (1995), 129-134.
Engine knock data, evaluated by S-PLUS
Number of observations: 16
Response variable: number of knocks of an engine
Explanatory variables: the timing of sparks, ratio air/fuel, intake temperature, exhaust temperature

Method      Intrc.    spark    air     intake   exhaust
Progress    -86.5     4.59     1.21    1.47     .069     .328
Boček        48.4     -.732    3.39    .195    -.011     .203
What is the problem?
The second reaction: a small change of data can really cause a large change of any high breakdown point estimator - the method relies too much on the selected "true" points!
(For the engine knock data the optimal $h = [16/2] + [6/2] = 11$.)
Let us agree, for a while, that the majority of data determines the "true" model. Then the LMS minimizes $r_{(11)}^2(\beta)$ and the LTS minimizes $\sum_{i=1}^{11} r_{(i)}^2(\beta)$. The number of all subsamples of size 11 is $\binom{16}{11} = 4368$, so for this case we may find the precise solution of the LTS-extremal problem just by applying OLS on all subsamples of size 11.
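A minimal sketch of that exact computation (my illustration): enumerate all $h$-subsamples, fit OLS on each, and keep the fit with the smallest residual sum of squares over its own $h$ observations.

```python
# Exact LTS by enumeration of all h-subsamples (feasible for n = 16, h = 11,
# since C(16, 11) = 4368 subsamples).
import numpy as np
from itertools import combinations

def exact_lts(X, y, h):
    n = len(y)
    best_rss, best_beta = np.inf, None
    for subset in combinations(range(n), h):
        idx = list(subset)
        beta = np.linalg.lstsq(X[idx], y[idx], rcond=None)[0]
        rss = np.sum((y[idx] - X[idx] @ beta) ** 2)
        if rss < best_rss:
            best_rss, best_beta = rss, beta
    return best_beta, best_rss
```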
Engine knock data, number of observations: 16
LMS evaluated by the Boček-Lachout algorithm, LTS precisely (OLS on all subsamples of size 11):

Method   Intrc.    spark    air     intake   exhaust
LMS       48.4     -.732    3.39    .195    -.011     1.432   .203
LTS      -88.7      4.72    1.06    1.57     .068      .728   .291

Since the Boček-Lachout LMS is "better" than the precise LTS, it is probably really good.
Algorithm for $\hat{\beta}^{(LTS,n,h)}$ for the case when $n$ is large:

A: Select randomly $p+1$ observations and find the regression plane through them.
B: Evaluate squared residuals for all observations; choose the $h$ observations with the smallest squared residuals and evaluate the sum of these squared residuals.
Is this sum of squared residuals smaller than the sum from the previous step?
- Yes: apply OLS on the just selected observations, i.e. find a new regression plane, and return to B.
- No: have we already found 20 identical models, or have we exhausted the a priori given number of repetitions? If yes, end of evaluation; if no, return to A.
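A hedged Python sketch of this algorithm; the constants (20 identical models, a fixed budget of random restarts) follow the description above, everything else (names, tolerances) is my choice.

```python
# Random p+1 starts, then repeated refitting on the h observations with the
# smallest squared residuals until the criterion stops decreasing.
import numpy as np

def lts_large_n(X, y, h, max_restarts=500, max_identical=20, seed=None):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    best_beta, best_crit, n_identical = None, np.inf, 0
    for _ in range(max_restarts):
        idx = rng.choice(n, size=p + 1, replace=False)               # step A
        beta = np.linalg.lstsq(X[idx], y[idx], rcond=None)[0]
        crit_old = np.inf
        for _ in range(100):                                         # step B
            r2 = (y - X @ beta) ** 2
            keep = np.argsort(r2)[:h]                                # h smallest
            crit = r2[keep].sum()
            if crit >= crit_old:                                     # no improvement
                break
            crit_old = crit
            beta = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]  # new plane
        crit = np.sort((y - X @ beta) ** 2)[:h].sum()                # final criterion
        if np.isclose(crit, best_crit):                              # "identical" model
            n_identical += 1
        elif crit < best_crit:
            best_beta, best_crit, n_identical = beta, crit, 1
        if n_identical >= max_identical:
            break
    return best_beta
```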
For the Educational data the optimal $h = 27$: the LMS minimizes $r_{(27)}^2(\beta)$ and the LTS minimizes $\sum_{i=1}^{27} r_{(i)}^2(\beta)$. The number of all subsamples of size 27 is too large, hence we have to use the algorithm just described.
A test of the algorithm - Educational data
Number of observations: 50
Response: expenditure on education per capita in 50 U.S. states in 1970
Explanatory: percentage of residents in urban areas, personal income per capita, percentage of inhabitants under 18
$h$ selected according to the "optimal choice", giving 50% breakdown point; LMS evaluated by the Boček-Lachout algorithm.

Method   Intrc.    urban   income   young
LMS     -272.4     .090    .034     .962    3734.8   281.6
LTS     -143.5     .043    .035     .639    3414.5   362.5
How to select $h$ reasonably?
(Figure: a data "cloud"; the number of points of this "cloud" is $k_0$, and $h$ should be only a "bit" smaller than $k_0$.)
The algorithm for the case when $n$ is large is described in:
Víšek, J.Á. (1996): On high breakdown point estimation. Computational Statistics 11, 137-146.
Víšek, J.Á. (2000): On the diversity of estimates. Computational Statistics and Data Analysis 34, 67-89.
Čížek, P., J.Á. Víšek (2000): Least trimmed squares. XPLORE, Application Guide, 49-64.
One implementation is available in the package XPLORE (supplied by Humboldt University), a TURBO-PASCAL version from me, and a MATLAB version from my PhD student Libor Mašíček.
Disadvantage of LTS
High subsample sensitivity, i.e. $\|\hat{\beta}^{(LTS,n,h)} - \hat{\beta}^{(LTS,n-1,h,k)}\|$ can be rather large (without control by design of experiment).
Víšek, J.Á. (1999): The least trimmed squares - random carriers. Bulletin of the Czech Econometric Society 10/1999, 1-30.
Víšek, J.Á. (1996): Sensitivity analysis of M-estimates. Annals of the Institute of Statistical Mathematics 48 (1996), 469-495.
Víšek, J.Á. (2002): Sensitivity analysis of M-estimates of nonlinear regression model: Influence of data subsets. Annals of the Institute of Statistical Mathematics, 261-290, 2002.
See also
Víšek, J.Á. (2002): The least weighted squares I. The asymptotic linearity of normal equations. Bulletin of the Czech Econometric Society, no. 15, 31-58, 2002.
Víšek, J.Á. (2002): The least weighted squares II. Consistency and asymptotic normality. Bulletin of the Czech Econometric Society, no. 16, 1-28, 2002.
Disadvantage of LTS ……
The Least Weighted Squares
Let $w(z): [0,1] \to [0,1]$ be non-increasing with $w(0) = 1$ and $w(1) = 0$, and put $w_i = w(i/n)$. Hence
$\hat{\beta}^{(LWS,n,w)} = \arg\min_{\beta \in R^p} \sum_{i=1}^{n} w_i\, r_{(i)}^2(\beta).$
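A minimal sketch of the LWS criterion (the concrete weight function below is my illustration only; the slides require just a non-increasing $w$ with $w(0) = 1$):

```python
# LWS criterion: non-increasing weights w(i/n) applied to the ordered squared
# residuals; LTS is the special case w(z) = 1 for z <= h/n and 0 otherwise.
import numpy as np

def w_trapezoid(z, z0=0.5, z1=0.75):
    """Weight 1 on [0, z0], decreasing linearly to 0 at z1, then 0."""
    return np.clip((z1 - z) / (z1 - z0), 0.0, 1.0)

def lws_criterion(beta, X, y, w=w_trapezoid):
    n = len(y)
    r2 = np.sort((y - X @ beta) ** 2)          # ordered squared residuals
    weights = w(np.arange(1, n + 1) / n)       # w_i = w(i/n)
    return np.sum(weights * r2)
```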
Classical OLS developed:
- diagnostic tools for verifying the assumptions (of course, a posteriori), e.g. tests of normality (firstly Theil's residuals, later the usual goodness-of-fit tests), the Durbin-Watson statistic, White's test of homoscedasticity, the Hausman specification test, etc.,
- carried out sensitivity studies, i.e.
$\hat{\beta}^{(n)} - \hat{\beta}^{(n-1,k)} = (X^{\{k\}T} X^{\{k\}})^{-1} X_k (Y_k - X_k^T \hat{\beta}^{(n)})$
(where $X^{\{k\}}$ denotes the design matrix without the $k$-th row),
- offers a lot of modifications of OLS and/or accompanying tools, e.g. ridge regression, instrumental variables, White's estimate of the covariance matrix of the estimates of regression coefficients, probit and logit models, etc.

Maybe one reason why the robust methods are not widely used is the debt of …… (see previous slide).
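A minimal numerical check (my illustration) of the subsample-sensitivity identity above: deleting observation $k$ and refitting gives exactly the shift predicted by the closed form.

```python
# Verify: beta^(n) - beta^(n-1,k) = (X_{(k)}' X_{(k)})^{-1} X_k (Y_k - X_k' beta^(n)),
# where X_{(k)} is the design matrix with the k-th row removed.
import numpy as np

rng = np.random.default_rng(0)
n, p = 40, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

beta_full = np.linalg.lstsq(X, y, rcond=None)[0]
k = 7
mask = np.arange(n) != k
beta_minus_k = np.linalg.lstsq(X[mask], y[mask], rcond=None)[0]

closed_form = np.linalg.solve(X[mask].T @ X[mask],
                              X[k] * (y[k] - X[k] @ beta_full))
print(np.allclose(beta_full - beta_minus_k, closed_form))   # True
```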
Something is already done also for robust methods:
Robust instruments. Robust'98 (eds J. Antoch & G. Dohnal, Union of Czechoslovak Mathematicians and Physicists), 1998, pp. 195-224.
Robust specification test. Proceedings of Prague Stochastics'98 (eds M. Hušková, P. Lachout, Union of Czechoslovak Mathematicians and Physicists), 1998, pp. 581-586.
Over- and underfitting the M-estimates. Bulletin of the Czech Econometric Society, vol. 7/2000, 53-83.
Durbin-Watson statistic for the least trimmed squares. Bulletin of the Czech Econometric Society, vol. 8, 14/2001, 1-40.
THANKS for ATTENTION