
DESCRIPTION

Charles University (founded 1348). Johann Kepler University of Linz, Austria, Linz, 16. – 18. 6. 2003. ROBUST STATISTICS - Regression. Jan Ámos Víšek, FSV UK, Institute of Economic Studies - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Founded 1348

Founded 1348
Charles University

Page 2

Johann Kepler University of Linz

STAKAN III

Institute of Economic Studies
Faculty of Social Sciences
Charles University, Prague

Jan Ámos Víšek

FSV UK

Austria, Linz, 16. – 18. 6. 2003

ROBUST STATISTICS - Regression

Page 3

Schedule of today's talk

A motivation for robust regression

M-estimators in regression

Invariance and equivariance of scale-estimator

Breakdown point and subsample sensitivity of M-estimators

Evaluation of M-estimators

Regression quantiles

Page 4

Schedule of today's talk (continued)

A challenge of finding high breakdown point estimators in regression

The least median of squares - definition, properties and evaluation

The least trimmed squares - definition and properties

Can a small change of the data cause a large change of the estimator?

Repeated median - definition, never implemented

Page 5

Schedule of today's talk (continued)

The least trimmed squares - evaluation: the algorithm and its properties

The least trimmed squares - how to apply them

The least weighted squares - definition, properties and evaluation

Debts of robust regression

Page 6

I have read that every man wears, on average, 8.3 pairs of socks. But I can't understand how?

Page 7

Why robust methods in regression?

What about considering a minimal ellipsoid containing an a priori given number of observations?

Page 8

Why robust methods in regression? (continued)

So the solution seems to be simple!

Page 9

Why robust methods in regression? (continued)

I am sorry, but we have to invent a more intricate solution.

Page 10

So, for the OLS we have in fact the following situation!

Recall that the breakdown point is the minimal number of observations which can cause the estimator to break down.
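For reference, the finite-sample (replacement) breakdown point alluded to above can be written out; this is the standard Donoho-Huber formulation, supplied here as background rather than taken verbatim from the slide:

```latex
\varepsilon^{*}_{n}\bigl(\hat\beta, Z\bigr)
  \;=\; \frac{1}{n}\,
  \min\Bigl\{\, m :
    \sup_{Z'} \bigl\|\hat\beta(Z') - \hat\beta(Z)\bigr\| = \infty \Bigr\},
```

where $Z'$ ranges over all samples obtained from $Z$ by replacing $m$ observations arbitrarily. For OLS a single replaced observation already suffices, so $\varepsilon^{*}_{n} = 1/n \to 0$.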

Page 11

Robust estimators of regression coefficients

M-estimators:

$\hat\beta^{(M,n)} = \arg\min_{\beta \in R^{p}} \sum_{i=1}^{n} \rho\bigl(Y_i - X_i^T\beta\bigr)$

with the corresponding normal equations

$\sum_{i=1}^{n} X_i\,\psi\bigl(Y_i - X_i^T\beta\bigr) = 0.$

Necessity of studentization by an estimate of the scale of the disturbances:

$\hat\beta^{(M,n)} = \arg\min_{\beta \in R^{p}} \sum_{i=1}^{n} \rho\Bigl(\frac{Y_i - X_i^T\beta}{\hat\sigma}\Bigr)$

Unfortunately they are not scale- and regression-equivariant, however ....

Page 12: Founded 1348

If for data )X,Y( the estimate is , )X,aY(than for data the estimate is .

Equivariance in scale

If for data )X,Y( the estimate is , )X,XY( T for data the estimate is again .

Invariance in regression Scale equivariant

a

Bickel (1975), Jurečková, Sen (1984)

However - to reach scale- and regression-equivariance of -

-the estimator of scale has to be scale-equivariant -and regression-invariaant.

For see Víšek (1999) - heuristics and numerical study2Affine invariant

Page 13

Maronna and Yohai (1981)

A disappointing result - breakdown point equal to $1/p$ ($p$ is the dimension of the model).

Sensitivity to leverage points.

Another spot on the beauty: the normal equations

$\sum_{i=1}^{n} X_i\,\psi\bigl(Y_i - X_i^T\beta\bigr) = 0$

yield an uncontrollable subsample sensitivity for a discontinuous $\psi$-function, i.e.

$\hat\beta^{(n)} - \hat\beta^{(n-1,k)} = R^{(n)}\, X_k\,\psi\bigl(Y_k - X_k^T\hat\beta^{(n)}\bigr)$

cannot be kept under control.

Page 14

An advantage - M-estimators can be easily evaluated. Writing

$\psi\bigl(Y_i - X_i^T\beta\bigr) = \frac{\psi\bigl(Y_i - X_i^T\beta\bigr)}{Y_i - X_i^T\beta}\,\bigl(Y_i - X_i^T\beta\bigr) = w_i\,\bigl(Y_i - X_i^T\beta\bigr),$

the normal equations $\sum_{i=1}^{n} X_i\,\psi\bigl(Y_i - X_i^T\beta\bigr) = 0$ become

$\sum_{i=1}^{n} w_i\, X_i\,\bigl(Y_i - X_i^T\beta\bigr) = 0,$

which, with $W = \operatorname{diag}(w_1, w_2, \dots, w_n)$, can be solved iteratively:

$\hat\beta_{M,j+1} = \bigl(X^T W_j X\bigr)^{-1} X^T W_j Y.$

E.g. 300 iterations move $L_2$ to $L_1$.

This is in fact the classical weighted least squares.
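The iteration above can be sketched in Python. This is a minimal illustration, assuming a Huber $\psi$-function and a MAD-based scale estimate (both standard choices of mine, not specified on the slide):

```python
import numpy as np

def huber_weights(r, c=1.345):
    # w(r) = psi(r) / r for the Huber psi: weight 1 inside [-c, c],
    # bounded-influence weight c/|r| outside.
    a = np.abs(r)
    return np.where(a <= c, 1.0, c / np.maximum(a, 1e-12))

def m_estimate_irls(X, y, c=1.345, n_iter=50):
    """Huber M-estimate of regression coefficients via iteratively
    reweighted least squares: beta_{j+1} = (X' W_j X)^{-1} X' W_j y."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]        # start from OLS (L2)
    for _ in range(n_iter):
        res = y - X @ beta
        scale = np.median(np.abs(res - np.median(res))) / 0.6745 + 1e-12
        w = huber_weights(res / scale, c)              # studentized residuals
        XtW = X.T * w                                  # X' W without forming W
        beta = np.linalg.solve(XtW @ X, XtW @ y)
    return beta
```

On a toy line with one gross outlier the reweighted iteration stays near the clean slope, while plain OLS is pulled away.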

Page 15: Founded 1348

Koenker, Bassett (1978)Regression quantiles

0rr)1()r(

0rr)r( )1,0(

)XY(minargˆ T

i

n

1i iR

)(

p

Regression -quantile

)n/i(n

1i iL ˆwˆ

)1,0(wi

L-estimator

Šindelář (1991)

By the way quantiles are the only statistics which are simultaneously L- and M-estimators.

L-estimator & M-estimators
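As a small sanity check of the check-function definition above, here is a Python sketch (the function names are mine, not from the slide) showing that minimising $\sum_i \rho_\alpha(y_i - t)$ over a scalar $t$ recovers the ordinary sample quantile:

```python
import numpy as np

def rho_alpha(r, alpha):
    # Koenker-Bassett check function: alpha*r for r >= 0, (alpha - 1)*r for r < 0
    return r * (alpha - (r < 0))

def quantile_by_loss(y, alpha):
    """A minimiser of sum_i rho_alpha(y_i - t) can always be found among
    the data points themselves, so scanning over y suffices here."""
    losses = [np.sum(rho_alpha(y - t, alpha)) for t in y]
    return float(y[int(np.argmin(losses))])
```

For $\alpha = 0.5$ the check function is half the absolute value, so the minimiser is the sample median, insensitive to the outlier in the data.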

Page 16: Founded 1348

Evaluation by means of software for linear programming.

An advantage

(and they are not equivariant, of course, with possibly low breakdown point).

Regression quantiles are M-estimators, hence they are sensitive to leverage points

A disadvantage

Ruppert and Carroll (1980)

The OLS are applied on the observations, response variable of which are between

of and . )(T

i1ˆX

The trimmed least squares

)2(Ti

ˆX

Page 17: Founded 1348

Can we establish an estimator of regression coefficients

having also 50% breakdown point?

Median is 50% breakdown point estimator of location.

Motto:

Challenge:

A pursuit lasted since Bickel (1972) to Siegel (1983):

To my knowledge - never implemented

)))((( )i,i,,i(OLSˆmedmedmedˆp1p1j

iii

)n(j

p1p1

Repeated median
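For the simple straight line ($p = 2$) the nested medians collapse to Siegel's two-level formula, which is easy to sketch. A small Python illustration (the function name is mine):

```python
import numpy as np

def repeated_median_line(x, y):
    """Siegel's repeated median for a straight line y = a + b*x:
    slope  b = med_i med_{j != i} (y_j - y_i) / (x_j - x_i),
    intercept a = med_i (y_i - b * x_i)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    inner = []
    for i in range(n):
        slopes = [(y[j] - y[i]) / (x[j] - x[i]) for j in range(n) if j != i]
        inner.append(np.median(slopes))
    b = float(np.median(inner))
    a = float(np.median(y - b * x))
    return a, b
```

With one gross outlier among ten points, every inner median is still the clean slope, so the estimate is untouched.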

Page 18

Rousseeuw (1983)

The first really applicable 50% breakdown point estimator: The Least Median of Squares.

Consider the model $Y_i = X_i^T\beta^0 + \varepsilon_i$, $i = 1, 2, \dots, n$. Let us recall that for any $\beta \in R^{p}$

$r_i^2(\beta) = \bigl(Y_i - X_i^T\beta\bigr)^2,$

and let us define the order statistics

$r_{(1)}^2(\beta) \le r_{(2)}^2(\beta) \le \dots \le r_{(n)}^2(\beta).$

Then for any $n/2 \le h \le n$

$\hat\beta^{(LMS,n,h)} = \arg\min_{\beta \in R^{p}} r_{(h)}^2(\beta).$

The optimal $h = [n/2] + [(p+1)/2]$.
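The objective $r_{(h)}^2(\beta)$ is trivial to evaluate for a given $\beta$ (the minimisation is the hard part), which is worth seeing in code. A hedged Python sketch; the names are mine:

```python
import numpy as np

def lms_criterion(beta, X, y, h):
    """The LMS objective: the h-th smallest squared residual r^2_(h)(beta)."""
    r2 = np.sort((y - X @ beta) ** 2)
    return float(r2[h - 1])        # h is 1-based, as in the order statistics
```

For n = 10 and p = 2 the optimal h is [10/2] + [3/2] = 6; with four gross outliers the true coefficients still attain objective 0, while the OLS fit does not.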

Page 19

The Least Median of Squares (continued)

Advantages:
- evidently 50% breakdown point
- scale- and regression-equivariant

Disadvantages:
- only $n^{1/3}$-consistent and not asymptotically normal
- not easy to evaluate

Rousseeuw, Leroy (1987) - PROGRESS: first proposal - repeated selection of subsamples of p+1 points.

Joss, Marazzi (1990): later improved, due to a geometric characterization. Still unreliable, usually bad - I'm sorry.

Page 20: Founded 1348

Rousseeuw (1983)

The second applicable 50% breakdown point estimatorThe Least Trimmed Squares

Then for any

. )(r...)(r)(r 2)n(

2)2(

2)1(

and that the order statistics are given by

2p

1j

Tii

2i XY)(r

Let us recall once again that for any pR

nh2/n

Again the optimal . ]2/)1p[(]2/n[h

nhn )(rminargˆ h

1i

2)i(

R

)h,n,LTS(

1p

Page 21

The Least Trimmed Squares (continued)

Advantages:
- evidently 50% breakdown point
- scale- and regression-equivariant
- $\sqrt{n}$-consistent and asymptotically normal
- nowadays easy to evaluate

Disadvantages:
- high subsample sensitivity, i.e. $\|\hat\beta^{(LTS,n,h)} - \hat\beta^{(LTS,n-1,h,k)}\|$ can be (arbitrarily) large

Rousseeuw, Leroy (1987) - PROGRESS. First proposal - based on LMS; in fact, the trimmed least squares. Probably still in S-PLUS. It did not work satisfactorily, sometimes very badly.

Page 22: Founded 1348

Engine knock data - 16 cases, 4 explanatory variables - a small change of data

caused a large change of .

Hettmansperger, T.P., S. J. Sheather (1992): A Cautionary Note on the Method of Least Median Squares.

The American Statistician 46, 79--83.

h,n,LMS

The robust methods probably work in another way than we have assumed – disappointment !!

A first reaction:

It removed the “paradox”.

Boček, P., P. Lachout (1995): Linear programming approach to LMS-estimation.

Mem. vol. Comput. Statist. & Data Analysis 19(1995), 129 - 134.

A new algorithm was nearly immediately available.

Evaluated by S-PLUS

Page 23: Founded 1348

Number of observations: 16

Response variable: Number of knocks of an engine

Method Intrc. spark air intake exhaust •Progress -86.5 4.59 1.21 1.47 .069 .328

Boček 48.4 -.732 3.39 .195 -.011 .203

Engine knock data

- the timing of sparks - ratio air / fuel - intake temperature - exhaust temperature

Explanatory variables:

)(r 2)11(

112

6

2

16h

Page 24: Founded 1348

A small change of data can really cause a large change of any high breakdown point estimator.

The second reaction:

The method too much relies on selected “true” points !

What is the problem ?

Then

Let us agree, for a while, that the majority of data determines the “true” model.

Page 25: Founded 1348

11

1i

2)i( )(r )(r 2

)11(

so for this case we may find the precise solution of the LTS-extremal problem,

just applying OLS on all subsamples of size 11.

368411

16

hence number of all subsamples of size 11 is ,

by Boček and Lachout

Method Intrc. spark air intake exhaust

LMS 48.4 -.732 3.39 .195 -.011 1.432 .203

LTS -88.7 4.72 1.06 1.57 .068 .728 .291

• •

Since Boček-Lachout LMS is “better” than precise LTS,

it is probably really good.

Number of observations: 16,

Engine knock data

Page 26: Founded 1348

Algorithm for for the case when n is large.

A

Is this sum of squared residuals smaller than the sum from the previous step?

Apply OLS on just selected observations, i.e. find new regression plane.

B

NoYes

Select randomly p+1 observationsand find regression plane through them.

Evaluate squared residuals for all observations.

Choose h observations with the smallest squared resi- duals and evaluate the sum of these squared residuals.

h,n,LTS

Page 27: Founded 1348

B

Yes No

End of evaluation Return to A

Continued

Algorithm for the case when n is large.

Have we found already 20 identical models or have we exhausted a priori given number of repetitions ?
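The flowchart on the last two slides can be sketched directly in Python. This is a minimal illustration of the described random-start and refitting scheme (the names, the number of starts, and the iteration cap are my own choices), not the author's production code:

```python
import numpy as np

def lts_fit(X, y, h, n_starts=500, seed=0):
    """LTS by the slides' scheme: (A) fit a plane through a random
    subsample of p+1 points, then repeatedly keep the h observations
    with the smallest squared residuals and refit OLS on them while
    the trimmed sum of squares keeps decreasing; (B) keep the best
    plane found over all random starts."""
    n, p = X.shape
    rng = np.random.default_rng(seed)
    best_beta, best_obj = None, np.inf
    for _ in range(n_starts):
        idx = rng.choice(n, size=p + 1, replace=False)
        beta = np.linalg.lstsq(X[idx], y[idx], rcond=None)[0]
        obj = np.inf
        for _ in range(100):                    # refitting ("concentration") steps
            r2 = (y - X @ beta) ** 2
            keep = np.argsort(r2)[:h]
            new_obj = float(r2[keep].sum())
            if new_obj >= obj - 1e-12:          # no further decrease -> stop
                obj = new_obj
                break
            obj = new_obj
            beta = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
        if obj < best_obj:
            best_obj, best_beta = obj, beta
    return best_beta, best_obj
```

Each refitting step cannot increase the trimmed sum of squares, so every start terminates; with 40% of the responses shifted off the line, the majority fit is still recovered.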

Page 28: Founded 1348

27

1i

2)i( )(r )(r 2

)27(

so we have to use just described algorithm. hence number of all subsamples of size 27 is too large

by Boček and Lachout

Method Intrc. urban income young

LMS -272.4 .090 .034 .962 3734.8 281.6

LTS -143.5 .043 .035 .639 3414.5 362.5

• •

Number of observations: 50,

A test of algorithm - Educational data

Explanatory: percentage of residents in urban areas, personal income per capita, percentage of inhabitants under 18

Response: Expenditure on education per capita in 50 U.S. states in 1970

h selected according to “optimal choice”, giving 50% breakdown point

Page 29: Founded 1348

How to select How to select h h reasonablyreasonably??

Number of points of this „cloud“ is .

is only a “bit” smaller thanh 0k

0kh

0k

0kh 0kh

Page 30: Founded 1348

Algorithm for the case when n is large is described in:

Víšek, J.Á. (1996): On high breakdown point estimation. Computational Statistics (1996) 11, 137 – 146.Víšek, J.Á. (2000): On the diversity of estimates Computational Statistics and Data Analysis, 34, (2000), 67 – 89.Čížek, P., J. Á. Víšek (2000): Least trimmed squares. XPLORE, Application guide, 49 – 64.

One implementation is available in package XPLORE (supplied by Humboldt University), TURBO-PASCAL-version from me, MATLAB version from my PhD-student Libora Mašíček.

Page 31: Founded 1348

High subsample sensitivity, i.e.

Disadvantage of LTS

can be rather large (without control by design of experiment)

k,h,1n,LTSh,n,LTS ˆˆ

Víšek, J.Á. (1999): The least trimmed squares - random carriers.

Bulletin of the Czech Econometric Society, 10/1999, 1 - 30.

Víšek, J.Á. (1996): Sensitivity analysis of M-estimates.

Annals of the Instit. of Statist. Math. 48 (1996), 469 – 495.

Sensitivity analysis of M-estimates of nonlinear regression model: Influence of data subsets.

Annals of the Institute of Statistical Mathematics, 261 - 290, 2002.

See also

Page 32: Founded 1348

Víšek, J.Á. (2002): The least weighted squares I. The asymptotic linearity of normal equations. Bulletin of the Czech Econometric Society, no.15, 31 - 58, 2002. The least weighted squares II. Consistency and asymptotic normality. Bulletin of the Czech Econometric Society, no. 16, 1 - 28, 2002.

Disadvantege of LTS ……

nhn )(rwminargˆ n

1i

2)i(i

R

),n,LWS(

1p

non-increasing

,1)1(,0)0(),1,0()1,0(:)z( The Least Weighted Squares

Hence

)n/i(wi
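The LWS objective is easy to state in code. A brief Python sketch (the linear weight function below is just one admissible choice, not prescribed by the slide):

```python
import numpy as np

def lws_objective(beta, X, y, w):
    """LWS objective: sum_i w(i/n) * r^2_(i)(beta). Since w is
    non-increasing with w(0) = 1 and w(1) = 0, the largest squared
    residuals receive the smallest weights - a smooth relative of
    the 0/1 trimming used by the LTS."""
    n = len(y)
    r2 = np.sort((y - X @ beta) ** 2)          # order statistics r^2_(i)
    return float(np.sum(w(np.arange(1, n + 1) / n) * r2))

def w_linear(t):
    # one admissible weight function: w(t) = 1 - t
    return 1.0 - t
```

With one gross outlier, the largest squared residual gets weight $w(1) = 0$, so the true coefficients attain objective 0 while the OLS fit does not.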

Page 33: Founded 1348

- diagnostic tools for verifying the assumptions (of course, a posteriori),

e.g. test of normality (firstly Theils residuals, later usual tests of good fit, Durbin-Watson

statistics, White tests of homoscedasticity, Hausman test of specification etc.,

- carried out sensitivity studies, i.e.

- offers a lot of modifications of OLS

-and / or accompanying tools, e.g. ridge regression, instrumental variables,

White estimate of covariance matrix of estimates of regression coefficients,probit and logit models, etc. .

Classical OLS developed :

)ˆXY(XXXˆˆ )n(Tkkk

1}k{T}k{)k,1n()n(

Page 34: Founded 1348

May be that one reason why the robust methods are not widely used is the debt of ……

(see previous slide).

Robust instruments. Robust'98 (ed. J. Antoch & G. Dohnal, Union of Czechoslovak

Mathematicians andPhysicists), 1998, pp. 195 - 224. Robust specification test. Proceedings of Prague Stochastics'98 (eds. M. Hušková, P. Lachout, Union

of Czechoslovak Mathematicians andPhysicists), 1998, pp. 581 - 586. Over- and underfitting the M-estimates.

Bulletin of the Czech Econometric Society, vol. 7/2000, 53 - 83. Durbin-Watson statistic for the least trimmed squares.

Bulletin of the Czech Econometric Society, vol. 8, 14/2001, 1 – 40.

Something is already done also for robust methods :

Page 35: Founded 1348

THANKS for A

TTENTION