ROBUST STATISTICS - Regression
STAKAN III
Jan Ámos Víšek
Institute of Economic Studies, Faculty of Social Sciences (FSV UK), Charles University (founded 1348), Prague
Johann Kepler University of Linz, Austria, Linz, 16. – 18. 6. 2003
Schedule of today's talk
A motivation for robust regression
M-estimators in regression
Invariance and equivariance of scale-estimator
Breakdown point and subsample sensitivity of M-estimators
Evaluation of M-estimators
Regression quantiles
Schedule of today's talk (continued)
A challenge of finding high breakdown point estimators in regression
The least median of squares - definition, properties and evaluation
The least trimmed squares - definition and properties
Can a small change of data cause a large change of the estimator?
Repeated median - definition, never implemented
Schedule of today's talk (continued)
The least trimmed squares - evaluation: algorithm and its properties
The least trimmed squares - how to apply
The least weighted squares - definition, properties and evaluation
Debts of robust regression
I have read that every man wears, on average, 8.3 pairs of socks. But I can't understand how?
Why robust methods in regression?
What about considering a minimal ellipsoid containing an a priori given number of observations?
So the solution seems to be simple!
Why robust methods in regression? (continued)
I am sorry, but we have to invent a more intricate solution.
So, for OLS we have in fact the following situation!
Recalling that the breakdown point is the minimal number (fraction) of observations which can cause the estimator to break down.
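A minimal numerical sketch (my illustration, not from the slides) of this situation: a single corrupted response is enough to move the OLS fit arbitrarily far, i.e. the breakdown point of OLS is only 1/n.

```python
# Illustration: one gross outlier destroys the OLS estimate.
import numpy as np

rng = np.random.default_rng(1)
n = 30
x = rng.uniform(0, 10, n)
y = 2.0 + 0.5 * x + rng.normal(0, 0.3, n)        # "true" model: beta = (2, 0.5)
X = np.column_stack([np.ones(n), x])             # design matrix with intercept

def ols(X, y):
    """Ordinary least squares via numpy's least-squares solver."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

print("clean data       :", ols(X, y))           # close to (2, 0.5)
y_bad = y.copy()
y_bad[0] = 1e6                                   # corrupt a single observation
print("one gross outlier:", ols(X, y_bad))       # estimates are carried away
```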
Robust estimators of regression coefficients
M-estimators

$\hat{\beta}^{(M,n)} = \arg\min_{\beta \in R^p} \sum_{i=1}^{n} \rho(Y_i - X_i^T \beta)$

with normal equations

$\sum_{i=1}^{n} X_i\, \psi(Y_i - X_i^T \beta) = 0.$

Unfortunately they are not scale- and regression-equivariant, hence the necessity of studentization by an estimate of scale of the disturbances; however ….

$\hat{\beta}^{(M,n)} = \arg\min_{\beta \in R^p} \sum_{i=1}^{n} \rho\left(\frac{Y_i - X_i^T \beta}{\hat{\sigma}}\right)$
Equivariance in scale: if for data $(Y, X)$ the estimate is $\hat{\sigma}$, then for data $(aY, X)$ the estimate is $a\hat{\sigma}$.
Invariance in regression: if for data $(Y, X)$ the estimate is $\hat{\sigma}$, then for data $(Y + X^T b, X)$ the estimate is again $\hat{\sigma}$.
Bickel (1975), Jurečková, Sen (1984)
However, to reach scale- and regression-equivariance of $\hat{\beta}^{(M,n)}$, the estimator of scale has to be scale-equivariant and regression-invariant.
For $\hat{\sigma}^2$ see Víšek (1999) - heuristics and numerical study.
Affine invariant: Maronna and Yohai (1981).
Disappointing result - breakdown point equal to $1/p$ ($p$ is the dimension of the model).
Sensitivity to leverage points.
Another spot on beauty
Uncontrollable subsample sensitivity for a discontinuous $\psi$-function in the normal equations
$\sum_{i=1}^{n} X_i\, \psi(Y_i - X_i^T \beta) = 0,$
i.e.
$\hat{\beta}^{(n)} - \hat{\beta}^{(n-1,k)} = R^{(n)}\, X_k\, \psi(Y_k - X_k^T \hat{\beta}^{(n)}).$
An advantage - M-estimators can be easily evaluated. The normal equations
$\sum_{i=1}^{n} X_i\, \psi(Y_i - X_i^T \beta) = 0$
can be rewritten as
$\sum_{i=1}^{n} X_i\, \frac{\psi(Y_i - X_i^T \beta)}{Y_i - X_i^T \beta}\, (Y_i - X_i^T \beta) = 0,$
i.e.
$\sum_{i=1}^{n} w_i\, X_i\, (Y_i - X_i^T \beta) = 0,$
which is solved by iterating
$\hat{\beta}^{(M,j)} = (X^T W^{(M,j-1)} X)^{-1} X^T W^{(M,j-1)} Y, \qquad W = \mathrm{diag}(w_1, w_2, \ldots, w_n).$
This is in fact the classical weighted least squares. E.g. 300 iterations moves $L_1$ to $L_2$.
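A hedged Python sketch of this iteratively reweighted least squares evaluation; the Huber $\psi$-function and the MAD studentization are my choices and are not prescribed by the slides.

```python
# Iteratively reweighted least squares for an M-estimator: w_i = psi(r_i)/r_i,
# then repeated weighted OLS until the coefficients stop changing.
import numpy as np

def huber_psi(u, c=1.345):
    return np.clip(u, -c, c)

def m_estimator_irls(X, y, n_iter=100, tol=1e-8):
    beta = np.linalg.lstsq(X, y, rcond=None)[0]            # start from OLS
    for _ in range(n_iter):
        r = y - X @ beta
        sigma = 1.4826 * np.median(np.abs(r - np.median(r))) + 1e-12   # MAD scale
        u = r / sigma
        w = np.where(np.abs(u) < 1e-10, 1.0, huber_psi(u) / u)         # psi(u)/u
        beta_new = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta
```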
Regression quantiles - Koenker, Bassett (1978)

For $\alpha \in (0,1)$ put
$\rho_\alpha(r) = \alpha\, r$ for $r \ge 0$ and $\rho_\alpha(r) = (\alpha - 1)\, r$ for $r < 0$.

Regression $\alpha$-quantile:
$\hat{\beta}^{(\alpha)} = \arg\min_{\beta \in R^p} \sum_{i=1}^{n} \rho_\alpha(Y_i - X_i^T \beta).$
L-estimator:
$\hat{\beta}^{(L)} = \sum_{i=1}^{n} w_i\, \hat{\beta}^{(i/n)}, \qquad w_i \in (0,1).$

L-estimators & M-estimators: by the way, quantiles are the only statistics which are simultaneously L- and M-estimators - Šindelář (1991).
An advantage: evaluation by means of software for linear programming.
A disadvantage: regression quantiles are M-estimators, hence they are sensitive to leverage points (and they are not equivariant, of course, with possibly low breakdown point).
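A hedged sketch of such an evaluation: the standard linear-programming formulation of the regression $\alpha$-quantile, solved with scipy.optimize.linprog (the slide only mentions linear programming software; the concrete formulation and library choice are mine).

```python
# Regression alpha-quantile as an LP: minimize alpha*sum(u) + (1-alpha)*sum(v)
# subject to X beta + u - v = y, u >= 0, v >= 0, beta free.
import numpy as np
from scipy.optimize import linprog

def regression_quantile(X, y, alpha=0.5):
    n, p = X.shape
    c = np.concatenate([np.zeros(p), alpha * np.ones(n), (1 - alpha) * np.ones(n)])
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * p + [(0, None)] * (2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    return res.x[:p]            # the estimated regression alpha-quantile
```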
The trimmed least squares - Ruppert and Carroll (1980)
OLS is applied on the observations whose response variable lies between $X_i^T \hat{\beta}^{(\alpha_1)}$ and $X_i^T \hat{\beta}^{(\alpha_2)}$.
Motto: the median is a 50% breakdown point estimator of location.
Challenge: can we establish an estimator of regression coefficients having also a 50% breakdown point?
A pursuit lasted from Bickel (1972) to Siegel (1983):
Repeated median
$\hat{\beta}_j^{(n)} = \operatorname{med}_{i_1}\Bigl(\operatorname{med}_{i_2}\bigl(\cdots \operatorname{med}_{i_{p+1}}\bigl(\hat{\beta}_j^{(OLS)}(i_1, i_2, \ldots, i_{p+1})\bigr)\cdots\bigr)\Bigr), \qquad j = 1, \ldots, p.$
To my knowledge - never implemented.
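For the simplest case of a straight line the nested medians reduce to pairwise slopes; a minimal sketch of that special case (my illustration - the slide's general definition applies OLS to all $(p+1)$-tuples):

```python
# Repeated median for y = b0 + b1*x: inner median over pairwise slopes,
# outer median over observations (assumes all x values are distinct).
import numpy as np

def repeated_median_line(x, y):
    n = len(x)
    inner = np.empty(n)
    for i in range(n):
        j = np.arange(n) != i
        inner[i] = np.median((y[j] - y[i]) / (x[j] - x[i]))   # med over j
    b1 = np.median(inner)                                      # med over i
    b0 = np.median(y - b1 * x)                                 # intercept
    return b0, b1
```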
The Least Median of Squares - Rousseeuw (1983)
The first really applicable 50% breakdown point estimator.

Consider the model $Y_i = X_i^T \beta^0 + e_i$, $i = 1, 2, \ldots, n$. Let us recall that for any $\beta \in R^p$
$r_i^2(\beta) = \bigl(Y_i - \sum_{j=1}^{p} X_{ij} \beta_j\bigr)^2$
and let us define the order statistics
$r_{(1)}^2(\beta) \le r_{(2)}^2(\beta) \le \ldots \le r_{(n)}^2(\beta).$
Then for any $h$, $n/2 \le h \le n$,
$\hat{\beta}^{(LMS,n,h)} = \arg\min_{\beta \in R^p} r_{(h)}^2(\beta).$
The optimal $h = [n/2] + [(p+1)/2]$.
The Least Median of Squares (continued)

Advantages: evidently 50% breakdown point; scale- and regression-equivariant.
Disadvantages: only $n^{1/3}$-consistent and not asymptotically normal; not easy to evaluate.

Evaluation: Rousseeuw, Leroy (1987) - PROGRESS; first proposal - repeated selection of subsamples of $p+1$ points. Later improved due to a geometric characterization - Joss, Marazzi (1990). Still unreliable, usually bad - I am sorry.
The Least Trimmed Squares - Rousseeuw (1983)
The second applicable 50% breakdown point estimator.

Let us recall once again that for any $\beta \in R^p$
$r_i^2(\beta) = \bigl(Y_i - \sum_{j=1}^{p} X_{ij} \beta_j\bigr)^2$
and that the order statistics are given by
$r_{(1)}^2(\beta) \le r_{(2)}^2(\beta) \le \ldots \le r_{(n)}^2(\beta).$
Then for any $h$, $n/2 \le h \le n$,
$\hat{\beta}^{(LTS,n,h)} = \arg\min_{\beta \in R^p} \sum_{i=1}^{h} r_{(i)}^2(\beta).$
Again the optimal $h = [n/2] + [(p+1)/2]$.
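A minimal sketch (my illustration) of the two criteria just defined - the LMS minimizes the $h$-th ordered squared residual, the LTS the sum of the $h$ smallest squared residuals:

```python
# LMS and LTS objective functions for a given beta, plus the "optimal" h.
import numpy as np

def lms_criterion(beta, X, y, h):
    r2 = np.sort((y - X @ beta) ** 2)
    return r2[h - 1]                     # h-th order statistic r^2_(h)(beta)

def lts_criterion(beta, X, y, h):
    r2 = np.sort((y - X @ beta) ** 2)
    return r2[:h].sum()                  # sum of the h smallest squared residuals

def optimal_h(n, p):
    return n // 2 + (p + 1) // 2         # h = [n/2] + [(p+1)/2]
```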
The Least Trimmed Squares (continued)

Advantages: evidently 50% breakdown point; scale- and regression-equivariant; $\sqrt{n}$-consistent and asymptotically normal; nowadays easy to evaluate.
Disadvantages: high subsample sensitivity, i.e. $\|\hat{\beta}^{(LTS,n,h)} - \hat{\beta}^{(LTS,n-1,h,k)}\|$ can be (arbitrarily) large.

Evaluation: Rousseeuw, Leroy (1987) - PROGRESS. First proposal - based on LMS, in fact the trimmed least squares. Probably still in S-PLUS. It did not work satisfactorily, sometimes very badly.
Engine knock data - 16 cases, 4 explanatory variables: a small change of data caused a large change of $\hat{\beta}^{(LMS,n,h)}$.

Hettmansperger, T.P., S.J. Sheather (1992): A Cautionary Note on the Method of Least Median Squares. The American Statistician 46, 79-83.

A first reaction: the robust methods probably work in another way than we have assumed - disappointment!!

A new algorithm was nearly immediately available; it removed the "paradox".

Boček, P., P. Lachout (1995): Linear programming approach to LMS-estimation. Mem. vol. Comput. Statist. & Data Analysis 19 (1995), 129-134.
Engine knock data, evaluated by S-PLUS
Number of observations: 16
Response variable: number of knocks of an engine
Explanatory variables: the timing of sparks, ratio air/fuel, intake temperature, exhaust temperature

Method      Intrc.    spark    air     intake   exhaust
Progress    -86.5     4.59     1.21    1.47     .069     .328
Boček        48.4     -.732    3.39    .195    -.011     .203
What is the problem?
The second reaction: a small change of data can really cause a large change of any high breakdown point estimator - the method relies too much on the selected "true" points!
(For the engine knock data the optimal $h = [16/2] + [6/2] = 11$.)
Let us agree, for a while, that the majority of data determines the "true" model. Then the LMS minimizes $r_{(11)}^2(\beta)$ and the LTS minimizes $\sum_{i=1}^{11} r_{(i)}^2(\beta)$. The number of all subsamples of size 11 is $\binom{16}{11} = 4368$, so for this case we may find the precise solution of the LTS-extremal problem just by applying OLS on all subsamples of size 11.
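A minimal sketch of that exact computation (my illustration): enumerate all $h$-subsamples, fit OLS on each, and keep the fit with the smallest residual sum of squares over its own $h$ observations.

```python
# Exact LTS by enumeration of all h-subsamples (feasible for n = 16, h = 11,
# since C(16, 11) = 4368 subsamples).
import numpy as np
from itertools import combinations

def exact_lts(X, y, h):
    n = len(y)
    best_rss, best_beta = np.inf, None
    for subset in combinations(range(n), h):
        idx = list(subset)
        beta = np.linalg.lstsq(X[idx], y[idx], rcond=None)[0]
        rss = np.sum((y[idx] - X[idx] @ beta) ** 2)
        if rss < best_rss:
            best_rss, best_beta = rss, beta
    return best_beta, best_rss
```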
Engine knock data, number of observations: 16
LMS evaluated by the Boček-Lachout algorithm, LTS precisely (OLS on all subsamples of size 11):

Method   Intrc.    spark    air     intake   exhaust
LMS       48.4     -.732    3.39    .195    -.011     1.432   .203
LTS      -88.7      4.72    1.06    1.57     .068      .728   .291

Since the Boček-Lachout LMS is "better" than the precise LTS, it is probably really good.
Algorithm for $\hat{\beta}^{(LTS,n,h)}$ for the case when $n$ is large:

A: Select randomly $p+1$ observations and find the regression plane through them.
B: Evaluate squared residuals for all observations; choose the $h$ observations with the smallest squared residuals and evaluate the sum of these squared residuals.
Is this sum of squared residuals smaller than the sum from the previous step?
- Yes: apply OLS on the just selected observations, i.e. find a new regression plane, and return to B.
- No: have we already found 20 identical models, or have we exhausted the a priori given number of repetitions? If yes, end of evaluation; if no, return to A.
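A hedged Python sketch of this algorithm; the constants (20 identical models, a fixed budget of random restarts) follow the description above, everything else (names, tolerances) is my choice.

```python
# Random p+1 starts, then repeated refitting on the h observations with the
# smallest squared residuals until the criterion stops decreasing.
import numpy as np

def lts_large_n(X, y, h, max_restarts=500, max_identical=20, seed=None):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    best_beta, best_crit, n_identical = None, np.inf, 0
    for _ in range(max_restarts):
        idx = rng.choice(n, size=p + 1, replace=False)               # step A
        beta = np.linalg.lstsq(X[idx], y[idx], rcond=None)[0]
        crit_old = np.inf
        for _ in range(100):                                         # step B
            r2 = (y - X @ beta) ** 2
            keep = np.argsort(r2)[:h]                                # h smallest
            crit = r2[keep].sum()
            if crit >= crit_old:                                     # no improvement
                break
            crit_old = crit
            beta = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]  # new plane
        crit = np.sort((y - X @ beta) ** 2)[:h].sum()                # final criterion
        if np.isclose(crit, best_crit):                              # "identical" model
            n_identical += 1
        elif crit < best_crit:
            best_beta, best_crit, n_identical = beta, crit, 1
        if n_identical >= max_identical:
            break
    return best_beta
```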
For the Educational data the optimal $h = 27$: the LMS minimizes $r_{(27)}^2(\beta)$ and the LTS minimizes $\sum_{i=1}^{27} r_{(i)}^2(\beta)$. The number of all subsamples of size 27 is too large, hence we have to use the algorithm just described.
A test of the algorithm - Educational data
Number of observations: 50
Response: expenditure on education per capita in 50 U.S. states in 1970
Explanatory: percentage of residents in urban areas, personal income per capita, percentage of inhabitants under 18
$h$ selected according to the "optimal choice", giving 50% breakdown point; LMS evaluated by the Boček-Lachout algorithm.

Method   Intrc.    urban   income   young
LMS     -272.4     .090    .034     .962    3734.8   281.6
LTS     -143.5     .043    .035     .639    3414.5   362.5
How to select $h$ reasonably?
(Figure: a data "cloud"; the number of points of this "cloud" is $k_0$, and $h$ should be only a "bit" smaller than $k_0$.)
The algorithm for the case when $n$ is large is described in:
Víšek, J.Á. (1996): On high breakdown point estimation. Computational Statistics 11, 137-146.
Víšek, J.Á. (2000): On the diversity of estimates. Computational Statistics and Data Analysis 34, 67-89.
Čížek, P., J.Á. Víšek (2000): Least trimmed squares. XPLORE, Application Guide, 49-64.
One implementation is available in the package XPLORE (supplied by Humboldt University), a TURBO-PASCAL version from me, and a MATLAB version from my PhD student Libor Mašíček.
Disadvantage of LTS
High subsample sensitivity, i.e. $\|\hat{\beta}^{(LTS,n,h)} - \hat{\beta}^{(LTS,n-1,h,k)}\|$ can be rather large (without control by design of experiment).
Víšek, J.Á. (1999): The least trimmed squares - random carriers. Bulletin of the Czech Econometric Society 10/1999, 1-30.
Víšek, J.Á. (1996): Sensitivity analysis of M-estimates. Annals of the Institute of Statistical Mathematics 48 (1996), 469-495.
Víšek, J.Á. (2002): Sensitivity analysis of M-estimates of nonlinear regression model: Influence of data subsets. Annals of the Institute of Statistical Mathematics, 261-290, 2002.
See also
Víšek, J.Á. (2002): The least weighted squares I. The asymptotic linearity of normal equations. Bulletin of the Czech Econometric Society, no. 15, 31-58, 2002.
Víšek, J.Á. (2002): The least weighted squares II. Consistency and asymptotic normality. Bulletin of the Czech Econometric Society, no. 16, 1-28, 2002.
Disadvantage of LTS ……
The Least Weighted Squares
Let $w(z): [0,1] \to [0,1]$ be non-increasing with $w(0) = 1$ and $w(1) = 0$, and put $w_i = w(i/n)$. Hence
$\hat{\beta}^{(LWS,n,w)} = \arg\min_{\beta \in R^p} \sum_{i=1}^{n} w_i\, r_{(i)}^2(\beta).$
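A minimal sketch of the LWS criterion (the concrete weight function below is my illustration only; the slides require just a non-increasing $w$ with $w(0) = 1$):

```python
# LWS criterion: non-increasing weights w(i/n) applied to the ordered squared
# residuals; LTS is the special case w(z) = 1 for z <= h/n and 0 otherwise.
import numpy as np

def w_trapezoid(z, z0=0.5, z1=0.75):
    """Weight 1 on [0, z0], decreasing linearly to 0 at z1, then 0."""
    return np.clip((z1 - z) / (z1 - z0), 0.0, 1.0)

def lws_criterion(beta, X, y, w=w_trapezoid):
    n = len(y)
    r2 = np.sort((y - X @ beta) ** 2)          # ordered squared residuals
    weights = w(np.arange(1, n + 1) / n)       # w_i = w(i/n)
    return np.sum(weights * r2)
```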
Classical OLS developed:
- diagnostic tools for verifying the assumptions (of course, a posteriori), e.g. tests of normality (firstly Theil's residuals, later the usual goodness-of-fit tests), the Durbin-Watson statistic, White's test of homoscedasticity, the Hausman specification test, etc.,
- carried out sensitivity studies, i.e.
$\hat{\beta}^{(n)} - \hat{\beta}^{(n-1,k)} = (X^{\{k\}T} X^{\{k\}})^{-1} X_k (Y_k - X_k^T \hat{\beta}^{(n)})$
(where $X^{\{k\}}$ denotes the design matrix without the $k$-th row),
- offers a lot of modifications of OLS and/or accompanying tools, e.g. ridge regression, instrumental variables, White's estimate of the covariance matrix of the estimates of regression coefficients, probit and logit models, etc.

Maybe one reason why the robust methods are not widely used is the debt of …… (see previous slide).
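A minimal numerical check (my illustration) of the subsample-sensitivity identity above: deleting observation $k$ and refitting gives exactly the shift predicted by the closed form.

```python
# Verify: beta^(n) - beta^(n-1,k) = (X_{(k)}' X_{(k)})^{-1} X_k (Y_k - X_k' beta^(n)),
# where X_{(k)} is the design matrix with the k-th row removed.
import numpy as np

rng = np.random.default_rng(0)
n, p = 40, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

beta_full = np.linalg.lstsq(X, y, rcond=None)[0]
k = 7
mask = np.arange(n) != k
beta_minus_k = np.linalg.lstsq(X[mask], y[mask], rcond=None)[0]

closed_form = np.linalg.solve(X[mask].T @ X[mask],
                              X[k] * (y[k] - X[k] @ beta_full))
print(np.allclose(beta_full - beta_minus_k, closed_form))   # True
```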
Something is already done also for robust methods:
Robust instruments. Robust'98 (eds J. Antoch & G. Dohnal, Union of Czechoslovak Mathematicians and Physicists), 1998, pp. 195-224.
Robust specification test. Proceedings of Prague Stochastics'98 (eds M. Hušková, P. Lachout, Union of Czechoslovak Mathematicians and Physicists), 1998, pp. 581-586.
Over- and underfitting the M-estimates. Bulletin of the Czech Econometric Society, vol. 7/2000, 53-83.
Durbin-Watson statistic for the least trimmed squares. Bulletin of the Czech Econometric Society, vol. 8, 14/2001, 1-40.
THANKS for ATTENTION