Asymptotic distribution of regression M-estimators

Asymptotic distribution of regression M-estimators$^1$

Miguel A. Arcones
Dept. of Mathematical Sciences
State University of New York
Binghamton, NY 13902$^2$

Suggested running head: regression

Summary

We consider the following linear regression model:
$$Y_i = Z_i'\theta_0 + U_i, \qquad i=1,\dots,n,$$
where $\{U_i\}_{i=1}^{\infty}$ is a sequence of $\mathbb{R}^m$-valued i.i.d.r.v.'s, $\{Z_i\}_{i=1}^{\infty}$ is a sequence of i.i.d. $d\times m$ random matrices, and $\theta_0$ is a $d$-dimensional parameter to be estimated. Given a function $\rho:\mathbb{R}^m\to\mathbb{R}$, we define a robust estimator $\hat\theta_n$ as a value such that
$$n^{-1}\sum_{i=1}^{n}\rho(Y_i-Z_i'\hat\theta_n)=\inf_{\theta\in\mathbb{R}^d}n^{-1}\sum_{i=1}^{n}\rho(Y_i-Z_i'\theta).$$
We study the convergence in distribution of $a_n(\hat\theta_n-\theta_0)$ in different situations, where $\{a_n\}$ is a sequence of real numbers depending on $\rho$ and on the distributions of $Z_i$ and $U_i$. As a particular case, we consider $\rho(x)=|x|^p$. In this case, we show that if $E[\|Z\|^p+\|Z\|^2]<\infty$, either $p>1/2$ or $m\ge2$, and some other regularity conditions hold, then $n^{1/2}(\hat\theta_n-\theta_0)$ converges in distribution to a normal limit. For $m=1$ and $p=1/2$, $n^{1/2}(\log n)^{-1/2}(\hat\theta_n-\theta_0)$ converges in distribution to a normal limit. For $m=1$ and $1/2>p>0$, $n^{1/(3-2p)}(\hat\theta_n-\theta_0)$ converges in distribution.

June 8, 1999

$^1$ AMS 1991 subject classifications. Primary 62E20; secondary 62F12. Key words and phrases: regression, robustness, M-estimators, $L_p$ estimators.
$^2$ E-mail: [email protected]. Web: http://math.binghamton.edu/arcones/.

1. Introduction. We consider the linear model: $Y$ is an $m$-dimensional response variable, $Z$ is a $d\times m$ matrix of regressors or predictor variables, $U$ is an $m$-dimensional error independent of $Z$, and they are related by the equation
$$(1.1)\qquad Y=Z'\theta_0+U,$$
where $\theta_0\in\mathbb{R}^d$ is a parameter to be estimated. This model represents two variables $Y$ and $Z$ which are linearly related: $\theta_0$ represents the linear relation between the two variables, and $U$ is a random error. The problem is to estimate $\theta_0$ from a sample $(Y_1,Z_1),\dots,(Y_n,Z_n)$, i.e. $(Y_1,Z_1),\dots,(Y_n,Z_n)$ are i.i.d.r.v.'s with the distribution of $(Y,Z)$.

The usual method to estimate $\theta_0$ is the least squares method (see for example Draper and Smith, 1981). One advantage of this method is the easy computability of the estimator; its disadvantage is that it is not robust. Since, using a computer program, it is neither difficult nor time-consuming to compute the estimators considered here, robust estimators are preferred to the least squares estimator. For more on robust methods we refer to Huber (1981) and Hampel, Ronchetti, Rousseeuw and Stahel (1986).

Given a continuous function $\rho:\mathbb{R}^m\to\mathbb{R}$, we define $\hat\theta_n$ as a value such that
$$(1.2)\qquad n^{-1}\sum_{i=1}^{n}\rho(Y_i-Z_i'\hat\theta_n)=\inf_{\theta\in\mathbb{R}^d}n^{-1}\sum_{i=1}^{n}\rho(Y_i-Z_i'\theta).$$
A popular choice is $\rho(x)=|x|$, where $|x|$ is the Euclidean norm. Another possibility is to take $\hat\theta_n$ as a value such that
$$(1.3)\qquad n^{-1}\sum_{j=1}^{n}|Y_j-Z_j'\hat\theta_n|^p=\inf_{\theta\in\mathbb{R}^d}n^{-1}\sum_{j=1}^{n}|Y_j-Z_j'\theta|^p,$$
where $p>0$. Regression M-estimators (in different variations) have been considered by several authors; see for example Huber (1973); Jurečková (1977); Koul (1977); Koenker and Bassett (1978); Yohai and Maronna (1979); Ruppert and Carroll (1980); Koenker and Portnoy (1987); Bloomfield and Steiger (1983); Bai, Chen, Miao and Rao (1990); Bai, Rao and Wu (1992); Davis, Knight and Liu (1992); and Davis and Wu (1997). Of course, for $\hat\theta_n$ to converge to $\theta_0$, we must have that
$$(1.4)\qquad E[\rho(Y-Z'\theta_0)]=\inf_{\theta\in\mathbb{R}^d}E[\rho(Y-Z'\theta)]$$
and that $\theta_0$ is the unique value with this property.

We can view these regression M-estimators as a particular type of general M-estimators. Next, we recall the definition of an M-estimator. Let $(S,\mathcal{S},P)$ be a probability space and let $\{X_i\}_{i=1}^{\infty}$ be a sequence of i.i.d.r.v.'s with values in $S$. Let $X$ be a copy of $X_1$. Let $\Theta$ be a subset of $\mathbb{R}^d$. Let $g:S\times\Theta\to\mathbb{R}$ be a function such that $g(\cdot,\theta):S\to\mathbb{R}$ is measurable for each $\theta\in\Theta$.
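The estimators in (1.2)-(1.3) have no closed form in general, but they are easy to compute numerically. The following is a minimal illustrative sketch (not from the paper): it generates synthetic data from model (1.1) with $m=1$ and minimizes the empirical $L_p$ criterion of (1.3) directly. The data-generating choices, the value $p=1.5$, and the use of scipy's Nelder-Mead optimizer are assumptions made only for illustration.

```python
# Illustrative sketch (not from the paper): computing the L_p regression
# M-estimator of (1.3) by direct minimization of the empirical criterion.
# The data-generating choices, p = 1.5 and the optimizer are assumptions.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, d = 200, 3
Z = rng.normal(size=(n, d))            # m = 1 here, so each Z_i is a d-vector
theta0 = np.array([1.0, -2.0, 0.5])
U = rng.standard_t(df=3, size=n)       # heavy-tailed errors
Y = Z @ theta0 + U
p = 1.5

def empirical_risk(theta):
    # n^{-1} sum_i |Y_i - Z_i' theta|^p, the criterion in (1.3)
    return np.mean(np.abs(Y - Z @ theta) ** p)

# Nelder-Mead avoids derivatives, which fail to exist everywhere when p <= 1.
theta_hat = minimize(empirical_risk, x0=np.zeros(d), method="Nelder-Mead").x
print(theta_hat)   # should be close to theta0
```

A derivative-free optimizer is used here only because $x\mapsto|x|^p$ is not everywhere differentiable for $p\le1$; any minimizer of the empirical criterion serves as $\hat\theta_n$ in (1.3).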

Huber (1964) introduced the M-estimator $\hat\theta_n$ of $\theta_0$ as a random variable $\hat\theta_n=\hat\theta_n(X_1,\dots,X_n)$ satisfying
$$(1.5)\qquad n^{-1}\sum_{i=1}^{n}g(X_i,\hat\theta_n)=\inf_{\theta\in\Theta}n^{-1}\sum_{i=1}^{n}g(X_i,\theta).$$
$\hat\theta_n$ estimates the parameter $\theta_0\in\Theta$ characterized by
$$(1.6)\qquad E[g(X,\theta)-g(X,\theta_0)]>0,$$
for each $\theta\ne\theta_0$.

We use the notation of empirical processes. For instance, we write $Pf=E[f(X)]$ and $P_nf=n^{-1}\sum_{i=1}^{n}f(X_i)$, where $f$ is a function on $S$. The heuristic idea for obtaining the limit distribution of $a_n(\hat\theta_n-\theta_0)$ is the following: $\hat\theta_n$ is the argument which minimizes the process $\{P_ng(\cdot,\theta):\theta\in\Theta\}$, so $a_n(\hat\theta_n-\theta_0)$ is the argument which minimizes the process
$$(1.7)\qquad \{a_n^2P_n(g(\cdot,\theta_0+a_n^{-1}\lambda)-g(\cdot,\theta_0)):\theta_0+a_n^{-1}\lambda\in\Theta\}.$$
We expect $a_n(\hat\theta_n-\theta_0)$ to converge to the argument which minimizes the limit process. This method has been used by several authors in different situations (see for example Prakasa Rao, 1968; and Kim and Pollard, 1990). To apply this method, we will have to prove (among other things) that
$$a_n^2E[g(X,\theta_0+a_n^{-1}\lambda)-g(X,\theta_0)]\to w(\lambda),$$
for some function $w(\lambda)$, and that
$$\{a_n^2(P_n-P)(g(\cdot,\theta_0+a_n^{-1}\lambda)-g(\cdot,\theta_0)):|\lambda|\le M\}$$
converges weakly.

We will see that, under certain conditions, the function $E[\rho(Y-Z'\theta)-\rho(Y-Z'\theta_0)]$ is twice differentiable at $\theta_0$. So,
$$a_n^2E[\rho(Y-Z'\theta_0-a_n^{-1}Z'\lambda)-\rho(Y-Z'\theta_0)]\to\lambda'V\lambda,$$
as $n\to\infty$. Usually, the convergence of
$$a_n^2n^{-1}\sum_{i=1}^{n}\bigl(\rho(Y_i-Z_i'\theta_0-a_n^{-1}Z_i'\lambda)-\rho(Y_i-Z_i'\theta_0)-E[\rho(Y_i-Z_i'\theta_0-a_n^{-1}Z_i'\lambda)-\rho(Y_i-Z_i'\theta_0)]\bigr)$$
is convergence to a normal limit with convergence of variances. So, usually, the rate of convergence of an M-estimator, i.e. $a_n$, is determined so that
$$a_n^4n^{-1}\mathrm{Var}\bigl(\rho(Y-Z'\theta_0-a_n^{-1}Z'\lambda)-\rho(Y-Z'\theta_0)\bigr)$$
converges. Under smoothness conditions the order of $\mathrm{Var}(\rho(Y-Z'\theta_0-\lambda Z't)-\rho(Y-Z'\theta_0))$, as $\lambda\to0+$, is $\lambda^2$. In this case $a_n=n^{1/2}$ and we have the usual limit theorem with a normal limit. If
$$|\lambda|^{-q}\mathrm{Var}\bigl(\rho(Y-Z'\theta_0-\lambda Z't)-\rho(Y-Z'\theta_0)\bigr)$$
converges as $\lambda\to0+$, for some $0<q<2$, then we choose $a_n$ so that $n^{-1}a_n^4a_n^{-q}$ converges, i.e. $a_n=n^{1/(4-q)}$. For example, if $m=1$ and $\rho(x)=|x|^p$ for some $0<p<1/2$,
$$|\lambda|^{-(2p+1)}\mathrm{Var}\bigl(|Y-Z'\theta_0-\lambda Z't|^p-|Y-Z'\theta_0|^p\bigr)$$
converges, which gives that $n^{1/(3-2p)}(\hat\theta_n-\theta_0)$ converges in distribution (see Theorem 2.18 below). For $m=1$ and $\rho(x)=|x|^{1/2}$,
$$|\lambda|^{-2}(\log\lambda^{-1})^{-1}\mathrm{Var}\bigl(|Y-Z'\theta_0-\lambda Z't|^{1/2}-|Y-Z'\theta_0|^{1/2}\bigr)$$
converges, which gives that $n^{1/2}(\log n)^{-1/2}(\hat\theta_n-\theta_0)$ converges in distribution (see Theorem 2.19 below). Even though the rate of convergence of the M-estimator based on $\rho(x)=|x|^p$, $1/2>p>0$ and $m=1$, is slower than the usual one, this estimator is more robust ($|x|^p<|x|^2$ for $x$ large and $1/2>p>0$) than other estimators (the least squares regression estimator, in particular).

For the M-estimator in (1.3), we will show that if $E[\|Z\|^p+\|Z\|^2]<\infty$, either $p>1/2$ or $m\ge2$, and some other regularity conditions hold, then $n^{1/2}(\hat\theta_n-\theta_0)$ converges in distribution to a normal limit. Observe that $E[\,||Y-Z'\theta_0-Z'\lambda|^p-|Y-Z'\theta_0|^p|^2\,]<\infty$ for some $\lambda\ne0$ implies that $E[\|Z\|^{2p}]<\infty$. But this condition is too strong if $p>1$. This makes it impossible (if we want to obtain the best possible conditions) to apply the results on the asymptotics of M-estimators in Pollard (1984), who assumed that $E[|g(X,\theta)-g(X,\theta_0)|^2]<\infty$ for each $\theta$ in a neighborhood of $\theta_0$.

Given a $d\times m$ matrix $z$, we define the norm
$$\|z\|:=\max\Bigl(\sup_{b\in\mathbb{R}^d,\,|b|\le1}|b'z|,\ \sup_{b\in\mathbb{R}^m,\,|b|\le1}|zb|\Bigr).$$
Observe that if either $m=1$ or $d=1$, this norm is just the Euclidean norm. $c$ denotes a constant that may vary from occurrence to occurrence.

2. Asymptotics for regression M-estimators. First, we present several results that give asymptotic normality of regression M-estimators for $\rho$ smooth enough. The next proposition deals with the consistency of the M-estimator.

Proposition 2.1. With the above notation, let $\rho:\mathbb{R}^m\to\mathbb{R}$ be a continuous function. Suppose that:
(i) For each $a\ne0$, $E[\rho(U+a)-\rho(U)]>0$.
(ii) For each $\lambda\ne0$, $\Pr\{Z'\lambda=0\}<1$.
(iii) $\lim_{|x|\to\infty}\rho(x)=\infty$.
(iv) For each $x\in\mathbb{R}^m$, $\rho(x)\ge\rho(0)$.
(v) $E[\rho(U-a)]$ is a continuous function of $a$.
(vi) For each $\lambda\in\mathbb{R}^d$,
$$\lim_{\eta\to0}E\Bigl[\sup_{t:|t|\le\eta}|\rho(U-Z'\lambda-Z't)-\rho(U-Z'\lambda)|\Bigr]=0.$$
Then, $\hat\theta_n\to\theta_0$ a.s., where $\hat\theta_n$ is any sequence of random variables satisfying (1.2).

Proof. Without loss of generality, we may assume that $\rho(0)=0$. Since $\rho$ is continuous, it is possible to choose $\hat\theta_n$ satisfying (1.2). Hypothesis (ii) implies that $\Pr\{Z'\lambda\ne0\}>0$ for each $\lambda\ne0$. By compactness, there exists a $\tau_0>0$ such that $\Pr\{|Z'\lambda|\ge\tau_0\}\ge\tau_0$ for each $\lambda\in\mathbb{R}^d$ with $|\lambda|=1$. Take $M_0<\infty$ such that
$$\Pr\{\|Z\|\ge M_0\}\le\tau_0/2,\qquad 2^{-1}\tau_0\inf_{|x|\ge M_0}\rho(x)\,\Pr\{|U|\le M_0\}\ge E[\rho(U)]+2.$$
Take $t_1,\dots,t_m\in\mathbb{R}^d$ with $|t_k|=1$ for each $1\le k\le m$ and such that for each $t\in\mathbb{R}^d$ with $|t|=1$ there exists $1\le k\le m$ with $|t-t_k|\le2^{-1}M_0^{-1}\tau_0$. Given $|\theta|\ge4M_0\tau_0^{-1}$, there exists $1\le k\le m$ such that $||\theta|^{-1}\theta-t_k|\le2^{-1}M_0^{-1}\tau_0$. Moreover, if $\|z\|\le M_0$ and $|z't_k|\ge\tau_0$, then
$$|z'\theta|\ge|\theta||z't_k|-|z'(\theta-|\theta|t_k)|\ge\tau_0|\theta|-\|z\||\theta|2^{-1}M_0^{-1}\tau_0\ge2^{-1}\tau_0|\theta|\ge2M_0.$$
So,
$$n^{-1}\sum_{j=1}^{n}\rho(U_j-Z_j'\theta)\ge n^{-1}\sum_{j=1}^{n}\rho(U_j-Z_j'\theta)I_{\|Z_j\|\le M_0,|U_j|\le M_0,|Z_j't_k|\ge\tau_0}\ge n^{-1}\sum_{j=1}^{n}\inf_{|x|\ge M_0}\rho(x)I_{\|Z_j\|\le M_0,|U_j|\le M_0,|Z_j't_k|\ge\tau_0}$$
$$\to\inf_{|x|\ge M_0}\rho(x)\,\Pr\{\|Z\|\le M_0,|U|\le M_0,|Z't_k|\ge\tau_0\}\ge2+E[\rho(U)]\quad\text{a.s.}$$
So, eventually,
$$\inf_{|\theta|\ge4M_0\tau_0^{-1}}n^{-1}\sum_{j=1}^{n}\rho(U_j-Z_j'\theta)\ge1+n^{-1}\sum_{j=1}^{n}\rho(U_j).$$
Hence, eventually $|\hat\theta_n-\theta_0|\le4M_0\tau_0^{-1}$.

Take $N$ such that
$$\inf_{|x|\ge N}\rho(x)\,\Pr\{|U|\le N\}\ge E[\rho(U)]+1.$$
If $|a|\ge2N$ and $|U|\le N$, then $|U-a|>N$. So,
$$E[\rho(U-a)]\ge\inf_{|x|\ge N}\rho(x)\,\Pr\{|U|\le N\}\ge E[\rho(U)]+1.$$
Therefore, $\lim_{|a|\to\infty}E[\rho(U-a)]=\infty$. From this and condition (v),
$$\inf_{|a|\ge\delta}E[\rho(U-a)-\rho(U)]>0,$$
for each $\delta>0$. We have that
$$\inf_{|\lambda|\ge\delta}E[\rho(U-Z'\lambda)-\rho(U)]\ge\inf_{|\lambda|\ge\delta}E[(\rho(U-Z'\lambda)-\rho(U))I_{|Z'\lambda|\ge\delta\tau_0}]$$
$$\ge\inf_{|\lambda|\ge\delta}\Pr\{|Z'\lambda|\ge\delta\tau_0\}\inf_{|a|\ge\delta\tau_0}E[\rho(U-a)-\rho(U)]\ge\tau_0\inf_{|a|\ge\delta\tau_0}E[\rho(U-a)-\rho(U)]>0.$$
We have just obtained that, for each $\delta>0$,
$$(2.1)\qquad \inf_{|\lambda|\ge\delta}E[\rho(U-Z'\lambda)-\rho(U)]>0.$$
By the law of large numbers for classes of functions satisfying a bracketing condition (Dudley, 1984, Theorem 6.1.5),
$$(2.2)\qquad \sup_{|\lambda|\le M}\Bigl|n^{-1}\sum_{j=1}^{n}\bigl(\rho(U_j-Z_j'\lambda)-\rho(U_j)-E[\rho(U_j-Z_j'\lambda)-\rho(U_j)]\bigr)\Bigr|\to0\quad\text{a.s.}$$
for each $M<\infty$. It follows from (2.1) and (2.2) that, given $0<\delta<M<\infty$, there exists a $\tau>0$ such that
$$\inf_{\delta\le|\lambda|\le M}n^{-1}\sum_{j=1}^{n}(\rho(U_j-Z_j'\lambda)-\rho(U_j))\ge\tau,$$
for each $n$ large enough. So, $|\hat\theta_n-\theta_0|\le\delta$ for each $n$ large enough. $\Box$

The next theorem follows directly from Proposition 2.1 above and Theorem 15 in Arcones (1996).

Theorem 2.2. With the above notation, let $\rho:\mathbb{R}^m\to\mathbb{R}$ be a continuous function. Suppose that:
(i)-(vi) Conditions (i)-(vi) in Proposition 2.1 hold.
(vii) There exists a positive definite symmetric matrix $V$ such that
$$E[\rho(U-Z'\lambda)-\rho(U)]=\lambda'V\lambda+o(|\lambda|^2),$$
as $\lambda\to0$.
(viii) $\rho$ is continuously differentiable.
(ix) $E[\frac{\partial\rho}{\partial x}(U)]=0$, $E[\|Z\|^2]<\infty$ and
$$E\Bigl[\sup_{|t|\le\eta}\Bigl|\frac{\partial\rho}{\partial x}(U-Z't)\Bigr|^2\Bigr]<\infty,$$
for some $\eta>0$.
Then,
$$n^{1/2}(\hat\theta_n-\theta_0)-2^{-1}n^{-1/2}V^{-1}\sum_{i=1}^{n}Z_i\frac{\partial\rho}{\partial x}(U_i)\stackrel{\Pr}{\to}0,$$
where $\hat\theta_n$ is a sequence of random variables satisfying (1.2).

We use the notation $\frac{\partial\rho}{\partial x}$ for the vector of first derivatives of $\rho(\cdot)$ and $\frac{\partial^2\rho}{\partial x^2}$ for the matrix of second derivatives of $\rho(\cdot)$.

The next theorem follows from Proposition 2.1 above and Theorem 18 in Arcones (1996).

Theorem 2.3. Suppose that:
(i)-(vi) Conditions (i)-(vi) in Proposition 2.1 hold.
(vii) $\rho$ is twice continuously differentiable.
(viii) $E[\frac{\partial\rho}{\partial x}(U)]=0$, $E[\|Z\|^2]<\infty$ and $E[|\frac{\partial\rho}{\partial x}(U)|^2]<\infty$.
(ix) $V:=E[Z\frac{\partial^2\rho}{\partial x^2}(U)Z']$ is a positive definite symmetric matrix.
(x) For some $\eta_0>0$,
$$E\Bigl[\sup_{|t|\le\eta_0}\Bigl\|\frac{\partial^2\rho}{\partial x^2}(U-Z't)\Bigr\|\Bigr]<\infty.$$
Then,
$$n^{1/2}(\hat\theta_n-\theta_0)-2^{-1}n^{-1/2}V^{-1}\sum_{i=1}^{n}Z_i\frac{\partial\rho}{\partial x}(U_i)\stackrel{\Pr}{\to}0,$$
where $\hat\theta_n$ is a sequence of random variables satisfying (1.2).

Many possible $\rho$'s are not smooth, for example $\rho(x)=|x|$. To cover these cases, we study the situation in which the class of functions $\{\rho(u-z'\lambda)-\rho(u):\lambda\in\Theta\}$ is a VC subgraph class. In this situation, the conditions needed to obtain asymptotic limit theorems simplify. Next, we recall the definition of a VC subgraph class. Let $S$ be a set and let $\mathcal{C}$ be a collection of subsets of $S$. For $A\subset S$, we define $\Delta^{\mathcal{C}}(A)=\mathrm{card}\{A\cap C:C\in\mathcal{C}\}$, $m^{\mathcal{C}}(n)=\max\{\Delta^{\mathcal{C}}(A):\mathrm{card}(A)=n\}$ and $s(\mathcal{C})=\inf\{n:m^{\mathcal{C}}(n)<2^n\}$. $\mathcal{C}$ is said to be a VC class of sets if $s(\mathcal{C})<\infty$. General properties of VC classes of sets can be found in Chapters 9 and 11 of Dudley (1984). Given a function $f:S\to\mathbb{R}$, the subgraph of $f$ is the set $\{(x,t)\in S\times\mathbb{R}:0\le t\le f(x)\ \text{or}\ f(x)\le t\le0\}$. A class of functions $\mathcal{F}$ is a VC subgraph class if the collection of subgraphs of $\mathcal{F}$ is a VC class. Next, we show that some common classes of functions are VC subgraph classes.

Lemma 2.4. Consider the linear model with $m=1$. Let $\rho:\mathbb{R}\to\mathbb{R}$ be a continuous function, nondecreasing on $[0,\infty)$ and nonincreasing on $(-\infty,0]$. Let $\theta_0\in\mathbb{R}^d$. Then, the class of functions $\{\rho(u-z'\lambda)-\rho(u):\lambda\in\mathbb{R}^d\}$, where $u\in\mathbb{R}$, $z\in\mathbb{R}^d$, is a VC subgraph class of functions.

Proof. We have to show that $\{A_\lambda\cup B_\lambda:\lambda\in\mathbb{R}^d\}$ is a VC class of sets, where $A_\lambda:=\{(u,z,t):0\le t\le\rho(u-z'\lambda)-\rho(u)\}$ and $B_\lambda:=\{(u,z,t):\rho(u-z'\lambda)-\rho(u)\le t\le0\}$. Let $\rho_1(u)=\rho(u)$ for $u\le0$ and let $\rho_2(u)=\rho(u)$ for $u\ge0$. Let $\rho_1^{-1}(t)=\sup\{u\le0:\rho(u)\ge t\}$ and let $\rho_2^{-1}(t)=\inf\{u\ge0:\rho(u)\ge t\}$. We have that $A_\lambda=A_\lambda'\cup A_\lambda''$, where
$$A_\lambda':=\{(u,z,t):0\le t,\ u\ge z'\lambda,\ u-z'\lambda\ge\rho_2^{-1}(t+\rho(u))\}$$
and
$$A_\lambda'':=\{(u,z,t):0\le t,\ z'\lambda\ge u,\ z'\lambda\ge u+\rho_1^{-1}(t+\rho(u))\}.$$
We have that $\{C(t_1,\dots,t_m):t_1,\dots,t_m\in\mathbb{R}\}$ is a VC class, where $C(t_1,\dots,t_m)=\{x\in S:\sum_{j=1}^{m}t_jf_j(x)\ge0\}$ and $f_1,\dots,f_m$ are functions on $S$ (Dudley, 1984, Theorem 9.2.1). We also have that if $\{C_t:t\in T\}$ and $\{D_t:t\in T\}$ are VC classes, then so are $\{C_t\cap D_t:t\in T\}$ and $\{C_t\cup D_t:t\in T\}$ (Dudley, 1984, Proposition 9.2.5). Hence, $\{A_\lambda:\lambda\in\mathbb{R}^d\}$ is a VC class. A similar argument gives that $\{B_\lambda:\lambda\in\mathbb{R}^d\}$ is a VC class. Therefore, the result follows. $\Box$

A similar argument gives the following:

Lemma 2.5. Let $\rho:[0,\infty)\to[0,\infty)$ be an increasing, continuous function with $\rho(0)=0$ and $\lim_{x\to\infty}\rho(x)=\infty$. Then, the class of functions $\{\rho(|u-z'\lambda|)-\rho(|u|):\lambda\in\mathbb{R}^d\}$, where $u\in\mathbb{R}^m$ and $z$ is a $d\times m$ matrix, is a VC subgraph class of functions.

Next, we consider the consistency of the M-estimator in this VC situation.

Proposition 2.6. Suppose that:
(i)-(v) Conditions (i)-(v) in Proposition 2.1 hold.
(vi) The class of functions $\{\rho(u-z'\lambda)-\rho(u):|\lambda-\theta_0|\le\eta_0\}$ is a VC subgraph class of functions for some $\eta_0>0$.
(vii) For each $M<\infty$, $E[\sup_{|\lambda|\le M}|\rho(U-Z'\lambda)-\rho(U)|]<\infty$.
Then, $\hat\theta_n\to\theta_0$ a.s., where $\hat\theta_n$ is any sequence of random variables satisfying (1.2).

Proof. The proof is similar to that of Proposition 2.1. The difference is that, to obtain (2.2), we use the law of large numbers for VC classes of functions (see Giné and Zinn, 1984, Theorem 8.3). $\Box$

The following follows from Theorem 7 in Arcones (1996).

Theorem 2.7. With the above notation, let $\psi:\mathbb{R}^m\to\mathbb{R}^m$. Assume that:
(i)-(vii) Conditions (i)-(vii) in Proposition 2.6 hold.
(viii) There exists a positive definite symmetric matrix $V$ such that
$$E[\rho(U-Z'\lambda)-\rho(U)]=\lambda'V\lambda+o(|\lambda|^2),$$
as $\lambda\to0$.
(ix) $E[Z\psi(U)]=0$ and $E[|Z\psi(U)|^2]<\infty$.
(x) For each $M,\eta>0$,
$$n\Pr\Bigl\{\sup_{|\lambda|\le M}|\rho(U-n^{-1/2}Z'\lambda)-\rho(U)+n^{-1/2}\lambda'Z\psi(U)|\ge\eta\Bigr\}\to0.$$
(xi) There are constants $0<q<1$ and $c>0$ such that
$$E\Bigl[\bigl(M^{-1}\sup_{|\lambda|\le\delta}|\rho(U-Z'\lambda)-\rho(U)|\bigr)\wedge\bigl(M^{-2}\sup_{|\lambda|\le\delta}|\rho(U-Z'\lambda)-\rho(U)|^2\bigr)\Bigr]\le c\delta^2M^{-1-q},$$
for each $\delta>0$ small enough and each $M>0$ large enough.
(xii) For each $\lambda\in\mathbb{R}^d$,
$$nE\bigl[|\rho(U-n^{-1/2}Z'\lambda)-\rho(U)+n^{-1/2}\lambda'Z\psi(U)|\wedge|\rho(U-n^{-1/2}Z'\lambda)-\rho(U)+n^{-1/2}\lambda'Z\psi(U)|^2\bigr]\to0.$$
Then,
$$n^{1/2}(\hat\theta_n-\theta_0)-2^{-1}n^{-1/2}V^{-1}\sum_{i=1}^{n}Z_i\psi(U_i)\stackrel{\Pr}{\to}0,$$
where $\hat\theta_n$ is a sequence of random variables satisfying (1.2).

Next, we consider the convergence in distribution of the $L_p$ estimators, as defined in (1.3). We will need the following lemma. The proof is omitted, since it is a simple calculus exercise.

Lemma 2.8. There exists a universal constant $c$, depending only on $p$, such that, for each $x,\lambda\in\mathbb{R}^d$:
(i) If $1\ge p>0$, then $||x-\lambda|^p-|x|^p|\le c(|x|^{p-1}|\lambda|\wedge|\lambda|^p)$.
(ii) If $p\ge1$, then $||x-\lambda|^p-|x|^p|\le c(|x|^{p-1}|\lambda|\vee|\lambda|^p)$.
(iii) If $1>p>0$, then $||x-\lambda|^p-|x|^p+p|x|^{p-2}\lambda'x|\le c(|x|^{p-1}|\lambda|\wedge|x|^{p-2}|\lambda|^2)$.
(iv) If $p=1$ and $d=1$, then $||x-\lambda|-|x|+|x|^{-1}\lambda'x|\le2|\lambda|I_{|x|\le|\lambda|}$.
(v) If $2>p\ge1$, then $||x-\lambda|^p-|x|^p+p|x|^{p-2}\lambda'x|\le c(|x|^{p-2}|\lambda|^2\wedge|\lambda|^p)$.
(vi) If $p\ge2$, then $||x-\lambda|^p-|x|^p+p|x|^{p-2}\lambda'x|\le c(|x|^{p-2}|\lambda|^2\vee|\lambda|^p)$.
(vii) If $2\ge p>0$, then
$$||x-\lambda|^p-|x|^p+p|x|^{p-2}\lambda'x-2^{-1}p(p-2)|x|^{p-4}(\lambda'x)^2-2^{-1}p|x|^{p-2}|\lambda|^2|\le c(|x|^{p-3}|\lambda|^3\wedge|x|^{p-2}|\lambda|^2).$$
(viii) If $3\ge p\ge2$, then
$$||x-\lambda|^p-|x|^p+p|x|^{p-2}\lambda'x-2^{-1}p(p-2)|x|^{p-4}(\lambda'x)^2-2^{-1}p|x|^{p-2}|\lambda|^2|\le c(|x|^{p-3}|\lambda|^3\wedge|\lambda|^p).$$
(ix) If $p\ge3$, then
$$||x-\lambda|^p-|x|^p+p|x|^{p-2}\lambda'x-2^{-1}p(p-2)|x|^{p-4}(\lambda'x)^2-2^{-1}p|x|^{p-2}|\lambda|^2|\le c(|x|^{p-3}|\lambda|^3\vee|\lambda|^p).$$

The next theorem gives the asymptotic normality of $L_p$ regression estimators.

Theorem 2.9. Let $p>0$. Suppose that:
(i) For each $a\ne0$, $E[|U-a|^p-|U|^p]>0$.
(ii) For each $\lambda\ne0$, $\Pr\{Z'\lambda=0\}<1$.
(iii) $E[\|Z\|^2]<\infty$, $E[\|Z\|^p]<\infty$, $E[|U|^{2p-2}]<\infty$ and $E[|U|^{p-2}]<\infty$.
(iv) $V:=E[2^{-1}p(p-2)|U|^{p-4}ZUU'Z'+2^{-1}p|U|^{p-2}ZZ']$ is a positive definite matrix.
Then,
$$n^{1/2}(\hat\theta_n-\theta_0)-2^{-1}n^{-1/2}V^{-1}p\sum_{i=1}^{n}\bigl(|U_i|^{p-2}Z_iU_i-E[|U_i|^{p-2}Z_iU_i]\bigr)\stackrel{\Pr}{\to}0,$$
where $\hat\theta_n$ is any sequence of r.v.'s satisfying (1.3).

Proof. The case $p>2$ follows from Theorem 2.3. We have that
$$E\Bigl[\sup_{|t|\le\eta_0}\Bigl\|\frac{\partial^2\rho}{\partial x^2}(U-Z't)\Bigr\|\Bigr]\le cE\Bigl[\sup_{|t|\le\eta_0}|U-Z't|^{p-2}\|Z\|^2\Bigr]<\infty.$$
The case $p=2$ is trivial.

To get the case $2>p>0$, we apply Theorem 2.7. We will only consider the case $2>p>1$; the other cases are similar. Conditions (i) and (ii) in Theorem 2.7 are assumed. Conditions (iii), (iv) and (v) hold trivially. Condition (vi) follows from Lemma 2.5. Condition (vii) follows from Lemma 2.8. By Lemma 2.8,
$$\bigl|E[|U-Z'\lambda|^p-|U|^p+p|U|^{p-2}\lambda'ZU-2^{-1}p(p-2)|U|^{p-4}(\lambda'ZU)^2-2^{-1}p|U|^{p-2}|Z'\lambda|^2]\bigr|\le cE\bigl[|U|^{p-3}|Z'\lambda|^3\wedge|U|^{p-2}|Z'\lambda|^2\bigr]=o(|\lambda|^2).$$
This implies that $E[p|U|^{p-2}\lambda'ZU]=0$. So, conditions (viii) and (ix) hold. As to condition (x),
$$n\Pr\Bigl\{\sup_{|\lambda|\le M}\bigl||U-n^{-1/2}Z'\lambda|^p-|U|^p+p|U|^{p-2}n^{-1/2}\lambda'ZU\bigr|\ge\eta\Bigr\}\le n\Pr\{c|U|^{p-2}n^{-1}\|Z\|^2\ge\eta\}\to0.$$
As to condition (xi),
$$nE\Bigl[\bigl(M^{-1}\sup_{|\lambda|\le\delta}||U-n^{-1/2}Z'\lambda|^p-|U|^p|\bigr)\wedge\bigl(M^{-2}\sup_{|\lambda|\le\delta}||U-n^{-1/2}Z'\lambda|^p-|U|^p|^2\bigr)\Bigr]$$
$$\le cnE\bigl[(M^{-1}|U|^{p-1}n^{-1/2}\|Z\|\delta)\wedge(M^{-2}|U|^{2p-2}n^{-1}\|Z\|^2\delta^2)\bigr]+cnE\bigl[(M^{-1}n^{-p/2}\|Z\|^p\delta^p)\wedge(M^{-2}n^{-p}\|Z\|^{2p}\delta^{2p})\bigr]$$
$$\le cE[M^{-2}|U|^{2p-2}\|Z\|^2\delta^2]+cnE\bigl[(M^{-1}n^{-p/2}\|Z\|^p\delta^p)^{2/p}\bigr]\le cM^{-2}\delta^2+cM^{-2/p}\delta^2.$$
We have that
$$nE\bigl[\,|\,|U-n^{-1/2}Z'\lambda|^p-|U|^p+p|U|^{p-2}n^{-1/2}\lambda'ZU|\wedge||U-n^{-1/2}Z'\lambda|^p-|U|^p+p|U|^{p-2}n^{-1/2}\lambda'ZU|^2\bigr]\le cE\bigl[(|U|^{p-2}|Z'\lambda|^2)\wedge(|U|^{2p-4}n^{-1}|Z'\lambda|^4)\bigr]\to0,$$
and condition (xii) follows. $\Box$

Observe that the conditions in the last theorem are best possible: we need (i) and (ii) in order to have $E[|U-Z'\lambda|^p-|U|^p]>0$ for each $\lambda\ne0$; in order for the covariance of the limit to be defined, we need $E[\|Z\|^2]<\infty$, $E[|U|^{2p-2}]<\infty$ and $E[|U|^{p-2}]<\infty$; and to have $|E[|U-Z'\lambda|^p-|U|^p]|<\infty$, we need $E[\|Z\|^p]<\infty$. $V$ is a positive definite matrix if either $p>1$, or $p=1$ and $m\ge2$. It can also be positive definite for other values of $p$ and some distributions. For example, if $U$ has a symmetric distribution and $m\ge2$, then
$$2^{-1}p(p-2)E[|U|^{p-4}|\lambda'ZU|^2]+2^{-1}pE[|U|^{p-2}|\lambda'Z|^2]=2^{-1}p(m^{-1}(p-2)+1)E[|U|^{p-2}|\lambda'Z|^2]>0,$$
for each $\lambda\ne0$. Observe that
$$\frac{\int_{|u|=1}(\lambda'u)^2\,du}{\int_{|u|=1}du}=\frac{\int_{|u|=1}|\lambda|^2(u^{(1)})^2\,du}{\int_{|u|=1}du}=m^{-1}|\lambda|^2,$$
where $u'=(u^{(1)},\dots,u^{(m)})$. It is interesting to notice that, under regularity conditions, the condition $E[|U|^{p-2}]<\infty$ is only satisfied if either $p>1$ or $m\ge2$. If $p\ge2$, the condition $E[|U|^{p-2}]<\infty$ is a moment condition. If $2>p>0$, then
$$E[|U|^{p-2}]=\int_0^{\infty}\Pr\{|U|^{p-2}\ge t\}\,dt=\int_0^{\infty}\Pr\{t^{-1/(2-p)}\ge|U|\}\,dt.$$
If $U$ has a positive density in a neighborhood of $0$, then $\Pr\{|U|\le t\}=O(t^m)$ as $t\to0+$. So, $E[|U|^{p-2}]<\infty$ if and only if $p+m>2$. The previous theorem covers the case either $m\ge2$ or $p>1$. Next, we consider the case $m=1$ and $1\ge p>0$.
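Before turning to that case, here is a brief Monte Carlo sketch (an illustration, not part of the paper's argument) of the $\sqrt{n}$ rate asserted in Theorem 2.9: the spread of $\sqrt{n}(\hat\theta_n-\theta_0)$ should stabilize as $n$ grows. The error law, the value $p=1.5$ and the optimizer are assumptions made only for illustration.

```python
# Illustrative Monte Carlo sketch (not from the paper): under Theorem 2.9 the
# spread of sqrt(n) * (theta_hat_n - theta_0) should stabilize as n grows.
# The error law, p = 1.5 and the optimizer are assumptions for illustration.
import numpy as np
from scipy.optimize import minimize

def lp_fit(Y, Z, p):
    # minimize the empirical L_p criterion of (1.3)
    risk = lambda th: np.mean(np.abs(Y - Z @ th) ** p)
    return minimize(risk, x0=np.zeros(Z.shape[1]), method="Nelder-Mead").x

rng = np.random.default_rng(1)
theta0, p = np.array([1.0, -1.0]), 1.5
for n in (200, 800, 3200):
    errs = []
    for _ in range(200):
        Z = rng.normal(size=(n, 2))
        Y = Z @ theta0 + rng.laplace(size=n)
        errs.append(np.sqrt(n) * (lp_fit(Y, Z, p) - theta0))
    print(n, np.std(np.array(errs), axis=0))  # roughly constant across n
```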

Next, we consider the case $m=p=1$. This case has been considered by Bassett and Koenker (1978) and Bloomfield and Steiger (1983, p. 50). They assumed that $U$ has a density $f_U(u)$ in a neighborhood of $0$. To obtain asymptotic normality for the median it is not necessary that a density exist in a neighborhood of the median; it suffices that the derivative of the distribution function at the median be positive (see for example Smirnov, 1949). The same happens in this regression case.

Theorem 2.10. Let $\hat\theta_n$ be a sequence of r.v.'s satisfying (1.3) for $p=1$ and $m=1$. Suppose that:
(i) $F_U(0)=2^{-1}$, $F_U(u)$ is differentiable at $u=0$ and $F_U'(0)>0$, where $F_U(u)=\Pr\{U\le u\}$.
(ii) For each $\lambda\ne0$, $\Pr\{Z'\lambda=0\}<1$.
(iii) $E[\|Z\|^2]<\infty$.
Then,
$$n^{1/2}(\hat\theta_n-\theta_0)-2^{-1}n^{-1/2}V^{-1}\sum_{i=1}^{n}Z_i\,\mathrm{sign}(U_i)\stackrel{\Pr}{\to}0,$$
where $V:=F_U'(0)E[ZZ']$.

Proof. The result follows from Theorem 2.7 with $\psi(u)=\mathrm{sign}(u)$ and $V=F_U'(0)E[ZZ']$. All the conditions in Theorem 2.7 are very easy to check; the only one difficult to check is condition (viii). Let $M(t)=E[|U-t|-|U|+t\,\mathrm{sign}(U)]$. We have that $M(t)=\int_0^t2(F_U(v)-F_U(0))\,dv$ for $t>0$, and $M(t)=\int_t^02(F_U(0)-F_U(v))\,dv$ for $t<0$. This gives that
$$t^{-2}(M(t)-t^2F_U'(0))\to0,$$
as $t\to0$, and $\sup_{t\in\mathbb{R}}|t^{-2}(M(t)-t^2F_U'(0))|<\infty$. From this and the dominated convergence theorem, we get that
$$\lim_{\lambda\to0}|\lambda|^{-2}\bigl|E[|U-Z'\lambda|-|U|+Z'\lambda\,\mathrm{sign}(U)-(Z'\lambda)^2F_U'(0)]\bigr|=0,$$
i.e. condition (viii) in Theorem 2.7 follows. $\Box$

The next lemma deals with the second differentiability of the function $E[|Y-Z'\theta|^p-|Y-Z'\theta_0|^p]$ for $1>p>0$.

Lemma 2.11. Let $1>p>0$ and $m=1$. Suppose that:
(i) For each $a\ne0$, $E[|U-a|^p-|U|^p]>0$.
(ii) For each $\lambda\ne0$, $\Pr\{Z'\lambda=0\}<1$.
(iii) $E[\|Z\|^2]<\infty$.
(iv) There exists a $\delta>0$ such that $U$ has a density $f_U(u)$ in $[-\delta,\delta]$.
(v) $\int_{-\delta}^{\delta}|u|^{p-2}|f_U(u)-f_U(0)|\,du<\infty$.
Then,
$$E[|U-Z'\lambda|^p-|U|^p]-\lambda'V\lambda=o(|\lambda|^2),$$
where
$$(2.3)\qquad V:=E[2^{-1}p(p-1)|U|^{p-2}I_{|U|\ge\delta}]E[ZZ']+p\delta^{p-1}f_U(0)E[ZZ']+\int_{-\delta}^{\delta}2^{-1}p(p-1)|u|^{p-2}(f_U(u)-f_U(0))\,du\,E[ZZ'].$$

Proof. Conditions (i) and (iv) imply that $E[\mathrm{sign}(U)|U|^{p-1}]=0$. We have that
$$E[|U-Z'\lambda|^p-|U|^p]-\lambda'V\lambda$$
$$=E\bigl[(|U-Z'\lambda|^p-|U|^p+p\,\mathrm{sign}(U)|U|^{p-1}Z'\lambda-2^{-1}p(p-1)|U|^{p-2}(Z'\lambda)^2)I_{|U|\ge\delta}\bigr]$$
$$+\int_{-\delta}^{\delta}E\bigl[(|u-Z'\lambda|^p-|u|^p-p\delta^{p-1}(Z'\lambda)^2)I_{|\lambda||Z|\le\delta}\bigr]f_U(0)\,du+\int_{-\delta}^{\delta}E\bigl[(|u-Z'\lambda|^p-|u|^p-p\delta^{p-1}(Z'\lambda)^2)I_{|\lambda||Z|>\delta}\bigr]f_U(0)\,du$$
$$+\int_{-\delta}^{\delta}E\bigl[|u-Z'\lambda|^p-|u|^p+p\,\mathrm{sign}(u)|u|^{p-1}Z'\lambda-2^{-1}p(p-1)|u|^{p-2}(Z'\lambda)^2\bigr](f_U(u)-f_U(0))\,du=:I+II+III+IV.$$
By Lemma 2.8,
$$|I|\le cE\bigl[(|U|^{p-3}|\lambda|^3|Z|^3)\wedge(|U|^{p-2}|\lambda|^2|Z|^2)\bigr]=o(|\lambda|^2),\qquad |III|\le c\delta^{p-1}E[|\lambda|^2|Z|^2I_{|\lambda||Z|>\delta}]=o(|\lambda|^2)$$
and
$$|IV|\le c\int_{-\delta}^{\delta}E\bigl[(|u|^{p-3}|\lambda|^3|Z|^3)\wedge(|u|^{p-2}|\lambda|^2|Z|^2)\bigr]|f_U(u)-f_U(0)|\,du=o(|\lambda|^2).$$
It is easy to see that, for $|a|\le\delta$,
$$\int_{-\delta}^{\delta}(|u-a|^p-|u|^p)\,du=(p+1)^{-1}\bigl((\delta+a)^{p+1}+(\delta-a)^{p+1}-2\delta^{p+1}\bigr).$$
So,
$$|II|\le cE\bigl[\,|(\delta+Z'\lambda)^{p+1}+(\delta-Z'\lambda)^{p+1}-2\delta^{p+1}-p(p+1)\delta^{p-1}(Z'\lambda)^2|\,I_{|\lambda||Z|\le\delta}\bigr].$$
By elementary calculus,
$$\lim_{a\to0}a^{-2}\bigl((\delta+a)^{p+1}+(\delta-a)^{p+1}-2\delta^{p+1}-p(p+1)\delta^{p-1}a^2\bigr)=0.$$
From this and the dominated convergence theorem, $|II|=o(|\lambda|^2)$. $\Box$

If $U$ has a density on $\mathbb{R}$,
$$V=E[ZZ']\int_{-\infty}^{\infty}2^{-1}p(1-p)|u|^{p-1}(f_U(0)-f_U(u))\,du.$$
Hence, if $E[(Z'\lambda)^2]>0$ for each $\lambda\ne0$ and $f_U(0)\ge f_U(u)$ for each $u\in\mathbb{R}$, then $V$ is positive definite.

The following follows from Theorem 2.7 and Lemmas 2.8 and 2.11:

Theorem 2.12. Let $1>p>1/2$ and $m=1$. Let $\hat\theta_n$ be any sequence of r.v.'s satisfying (1.3). Suppose that:
(i) For each $a\ne0$, $E[|U+a|^p-|U|^p]>0$.
(ii) For each $\lambda\ne0$, $\Pr\{Z'\lambda=0\}<1$.
(iii) $E[\|Z\|^2]<\infty$.
(iv) There exists a $\delta>0$ such that $U$ has a density $f_U(u)$ in $[-\delta,\delta]$.
(v) $\int_{-\delta}^{\delta}|u|^{p-2}|f_U(u)-f_U(0)|\,du<\infty$.
(vi) $E[2^{-1}p(p-1)|U|^{p-2}I_{|U|\ge\delta}]+p\delta^{p-1}f_U(0)+\int_{-\delta}^{\delta}2^{-1}p(p-1)|u|^{p-2}(f_U(u)-f_U(0))\,du>0$.
Then,
$$n^{1/2}(\hat\theta_n-\theta_0)-2^{-1}n^{-1/2}V^{-1}\sum_{i=1}^{n}p\,\mathrm{sign}(U_i)Z_i|U_i|^{p-1}\stackrel{\Pr}{\to}0,$$
where $V$ is as in (2.3).

In the case $0<p<1/2$, these estimators behave like the $L_p$ medians, which were considered in Arcones (1994). We will need the following theorem, which is an obvious variation of Theorem 1 in Arcones (1994).

Theorem 2.13. Let $\{X_i\}_{i=1}^{\infty}$ be a sequence of i.i.d.r.v.'s with values in $S$. Let $\Theta$ be a Borel subset of $\mathbb{R}^d$. Let $g:S\times\Theta\to\mathbb{R}$ be a function such that $g(\cdot,\theta):S\to\mathbb{R}$ is measurable for each $\theta\in\Theta$. Let $\{a_n\}$ and $\{b_n\}$ be two sequences of positive numbers converging to infinity. Let $\{\hat\theta_n=\hat\theta_n(X_1,\dots,X_n)\}$ be a sequence of random variables. Let $\alpha>0$. Suppose that:
(i) $\hat\theta_n\stackrel{\Pr}{\to}\theta_0$ and
$$b_nn^{-1}\sum_{j=1}^{n}g(X_j,\hat\theta_n)\le\inf_{\theta\in\Theta}b_nn^{-1}\sum_{j=1}^{n}g(X_j,\theta)+o_{\Pr}(1).$$
(ii) There are $\tau_0,\delta_0>0$ such that
$$\tau_0(a_n|\theta-\theta_0|)^{\alpha}\le b_nE[g(X,\theta)-g(X,\theta_0)],$$
for each $|\theta-\theta_0|\le\delta_0$.
(iii) There exists a stochastic process $\{Y(\lambda):\lambda\in\mathbb{R}^d\}$ such that
$$\Bigl\{b_nn^{-1}\sum_{j=1}^{n}\bigl(g(X_j,\theta_0+a_n^{-1}\lambda)-g(X_j,\theta_0)\bigr):|\lambda|\le M\Bigr\}$$
converges weakly to $\{Y(\lambda):|\lambda|\le M\}$, for each $M<\infty$.
(iv) There exists a $\delta_1>0$ such that
$$\lim_{M\to\infty}\limsup_{n\to\infty}\Pr\Bigl\{\sup_{|\theta-\theta_0|\le\delta_1}\frac{b_nn^{-1}\bigl|\sum_{j=1}^{n}\bigl(g(X_j,\theta)-g(X_j,\theta_0)-E[g(X_j,\theta)-g(X_j,\theta_0)]\bigr)\bigr|}{2^{-1}\tau_0|a_n(\theta-\theta_0)|^{\alpha}+M}\ge1\Bigr\}=0.$$
(v) With probability one, the stochastic process $\{Y(\lambda):\lambda\in\mathbb{R}^d\}$ has a unique minimum at $\tilde\lambda$, and for each $M<\infty$, when $|\tilde\lambda|\le M$,
$$\inf_{|\lambda|\le M,\,|\lambda-\tilde\lambda|\ge\eta}Y(\lambda)>Y(\tilde\lambda),$$
for each $\eta>0$.
Then, $a_n(\hat\theta_n-\theta_0)\stackrel{d}{\to}\tilde\lambda$.

To check condition (iv) in Theorem 2.13, we will use different results depending on the form of the limit process. When the limit is normal, we will use the following theorem:

Theorem 2.14. Under the notation in Theorem 2.13, suppose that:
(i) $\{g(x,\theta)-g(x,\theta_0):|\theta-\theta_0|\le\delta\}$ is a VC subgraph class of functions for some $\delta>0$.
(ii) For each $\eta>0$,
$$n\Pr\Bigl\{b_nn^{-1/2}\sup_{|t|\le M}|g(X,\theta_0+a_n^{-1}t)-g(X,\theta_0)|\ge\eta\Bigr\}\to0.$$
(iii) For each $s,t\in\mathbb{R}^d$, the following limit exists:
$$\lim_{n\to\infty}b_n^2\,\mathrm{Cov}\Bigl((g(X,\theta_0+a_n^{-1}s)-g(X,\theta_0))I_{b_nn^{-1/2}|g(X,\theta_0+a_n^{-1}s)-g(X,\theta_0)|\le1},\ (g(X,\theta_0+a_n^{-1}t)-g(X,\theta_0))I_{b_nn^{-1/2}|g(X,\theta_0+a_n^{-1}t)-g(X,\theta_0)|\le1}\Bigr).$$
(iv) $\displaystyle\limsup_{n\to\infty}b_n^2E\Bigl[\sup_{|t|\le M}|g(X,\theta_0+a_n^{-1}t)-g(X,\theta_0)|^2I_{b_nn^{-1/2}\sup_{|t|\le M}|g(X,\theta_0+a_n^{-1}t)-g(X,\theta_0)|\le1}\Bigr]<\infty$.
(v) $\displaystyle\lim_{\eta\to0}\limsup_{n\to\infty}\sup_{|s|,|t|\le M,\,|s-t|\le\eta}b_n^2E\Bigl[|g(X,\theta_0+a_n^{-1}s)-g(X,\theta_0+a_n^{-1}t)|^2I_{b_nn^{-1/2}|g(X,\theta_0+a_n^{-1}s)-g(X,\theta_0+a_n^{-1}t)|\le1}\Bigr]=0$.
(vi) $\displaystyle\sup_{|t|\le M}b_nn^{1/2}\bigl|E\bigl[(g(X,\theta_0+ta_n^{-1})-g(X,\theta_0))I_{b_nn^{-1/2}|g(X,\theta_0+a_n^{-1}t)-g(X,\theta_0)|\ge1}\bigr]\bigr|\to0$.
Then,
$$\Bigl\{b_nn^{-1/2}\sum_{i=1}^{n}\bigl(g(X_i,\theta_0+ta_n^{-1})-g(X_i,\theta_0)-E[g(X,\theta_0+ta_n^{-1})-g(X,\theta_0)]\bigr):|t|\le M\Bigr\}$$
converges weakly to the process $\{G(t):|t|\le M\}$ with mean zero and covariance given by
$$E[G(s)G(t)]=\lim_{n\to\infty}b_n^2\,\mathrm{Cov}\Bigl((g(X,\theta_0+a_n^{-1}s)-g(X,\theta_0))I_{b_nn^{-1/2}|g(X,\theta_0+a_n^{-1}s)-g(X,\theta_0)|\le1},\ (g(X,\theta_0+a_n^{-1}t)-g(X,\theta_0))I_{b_nn^{-1/2}|g(X,\theta_0+a_n^{-1}t)-g(X,\theta_0)|\le1}\Bigr).$$

The last theorem follows from Theorem 2.3 in Arcones (1999). Similar, but slightly stronger, results are in Alexander (1987) and Pollard (1990, Theorem 10.6). When the limit is infinitely divisible, without a Gaussian part, we will use the following:

Theorem 2.15. (Arcones, 1999, Theorem 2.5). Under the notation in Theorem 2.13, let $c_n(t)$ be a real number for each $|t|\le M$ and each $n\ge1$. Suppose that:
(i) The finite dimensional distributions of $\{Z_n(t):|t|\le M\}$ converge to those of an infinitely divisible process $\{Z(t):|t|\le M\}$ without Gaussian part, where
$$Z_n(t):=b_nn^{-1/2}\sum_{i=1}^{n}\bigl(g(X_i,\theta_0+ta_n^{-1})-g(X_i,\theta_0)\bigr)-c_n(t).$$
(ii) $\{g(x,\theta)-g(x,\theta_0):|\theta-\theta_0|\le\delta\}$ is a VC subgraph class of functions for some $\delta>0$.
(iii) For each $\varepsilon>0$,
$$\lim_{\eta\to0}\limsup_{n\to\infty}n\Pr\Bigl\{\sup_{|s|,|t|\le M,\,|s-t|\le\eta}|g(X,\theta_0+sa_n^{-1})-g(X,\theta_0+ta_n^{-1})|\ge\varepsilon\Bigr\}=0.$$
(iv) $\displaystyle\lim_{\eta\to0}\limsup_{n\to\infty}b_n^2E\Bigl[\sup_{|t|\le M}|g(X,\theta_0+ta_n^{-1})-g(X,\theta_0)|^2I_{b_nn^{-1/2}\sup_{|t|\le M}|g(X,\theta_0+ta_n^{-1})-g(X,\theta_0)|\le\eta}\Bigr]=0$.
(v) $\displaystyle\lim_{\eta\to0}\limsup_{n\to\infty}\sup_{|s|,|t|\le M,\,|s-t|\le\eta}\Bigl|b_nn^{1/2}E\bigl[(g(X,\theta_0+sa_n^{-1})-g(X,\theta_0+ta_n^{-1}))I_{b_nn^{-1/2}\sup_{|t|\le M}|g(X,\theta_0+ta_n^{-1})-g(X,\theta_0)|\le1}\bigr]-(c_n(s)-c_n(t))\Bigr|=0$.
Then, $\{Z_n(t):|t|\le M\}$ converges weakly to $\{Z(t):|t|\le M\}$.

The next lemma deals with condition (iv) in Theorem 2.13. It is an obvious variation of Lemma 4 in Arcones (1994).

Lemma 2.16. With the notation of Theorem 2.13, let $\beta>0$ and write $G_\delta(x):=\sup_{|\theta-\theta_0|\le\delta}|g(x,\theta)-g(x,\theta_0)|$ for the local envelope. Suppose that:
(i) The class of functions $\{g(x,\theta)-g(x,\theta_0):|\theta-\theta_0|\le\delta_0\}$ is a VC subgraph class of functions for some $\delta_0>0$.
(ii) For each $M>0$,
$$\sup_{|\lambda|\le M}a_n^{\beta}\bigl|(P_n-P)(g(\cdot,\theta_0+a_n^{-1}\lambda)-g(\cdot,\theta_0))\bigr|=O_{\Pr}(1).$$
(iii) There are constants $c,\kappa,\gamma>0$ such that
$$E[G_\delta^2(X)]\le c\delta^{\kappa}(\log\delta^{-1})^{\gamma},$$
for each $\delta>0$ small enough.
(iv) $(\log a_n)^{\gamma}a_n^{2\beta-\kappa}=O(n)$.
Then, for each $\tau>0$, there exists a $\delta>0$ such that
$$\lim_{M\to\infty}\limsup_{n\to\infty}\Pr\Bigl\{\sup_{|\theta-\theta_0|\le\delta}\frac{a_n^{\beta}|(P_n-P)(g(\cdot,\theta)-g(\cdot,\theta_0))|}{\tau a_n^{\beta}|\theta-\theta_0|^{\alpha}+M}\ge1\Bigr\}=0.$$

Sometimes we will use the following variation, which does not require a finite second moment of $G_\delta(X)$.

Lemma 2.17. With the notation of Theorem 2.13, let $\beta>0$. Suppose that:
(i) The class of functions $\{g(x,\theta)-g(x,\theta_0):|\theta-\theta_0|\le\delta_0\}$ is a VC subgraph class of functions for some $\delta_0>0$.
(ii) For each $M>0$,
$$\sup_{|\lambda|\le M}a_n^{\beta}\bigl|(P_n-P)(g(\cdot,\theta_0+a_n^{-1}\lambda)-g(\cdot,\theta_0))\bigr|=O_{\Pr}(1).$$
(iii) There are constants $0<q<1$ and $c,\kappa>0$ such that
$$E\bigl[(M^{-1}G_\delta(X))\wedge(M^{-2}G_\delta^2(X))\bigr]\le c\delta^{\kappa}M^{-1-q},$$
for each $\delta>0$ small enough and each $M>0$ large enough.
(iv) $a_n=O(n^{1/\kappa})$.
Then, for each $\tau>0$, there exists a $\delta>0$ such that
$$\lim_{M\to\infty}\limsup_{n\to\infty}\Pr\Bigl\{\sup_{|\theta-\theta_0|\le\delta}\frac{a_n^{\beta}|(P_n-P)(g(\cdot,\theta)-g(\cdot,\theta_0))|}{\tau a_n^{\beta}|\theta-\theta_0|^{\alpha}+M}\ge1\Bigr\}=0.$$

Next, we consider the asymptotics of the M-estimators for $0<p<1/2$.

Theorem 2.18. Let $1/2>p>0$ and $m=1$. Let $\hat\theta_n$ be any sequence of r.v.'s satisfying (1.3). Suppose that:
(i)-(v) Conditions (i)-(v) in Lemma 2.11 hold.
(vi) $E[2^{-1}p(p-1)|U|^{p-2}I_{|U|\ge\delta}]+p\delta^{p-1}f_U(0)+\int_{-\delta}^{\delta}2^{-1}p(p-1)|u|^{p-2}(f_U(u)-f_U(0))\,du>0$.
Then, $n^{1/(3-2p)}(\hat\theta_n-\theta_0)$ converges in distribution to the argument that minimizes the process
$$\{Z(t)+t'Vt:t\in\mathbb{R}^d\},$$
where $V$ is as in (2.3) and $\{Z(t):t\in\mathbb{R}^d\}$ is the Gaussian process with mean zero and covariance given by
$$E[Z(t)Z(s)]:=2^{-1}E\bigl[|Z't|^{2p+1}+|Z's|^{2p+1}-|Z't-Z's|^{2p+1}\bigr]f_U(0)\int_{-\infty}^{\infty}(|u-1|^p-|u|^p)^2\,du.$$

Proof. We apply Theorem 2.13 with $a_n=n^{1/(3-2p)}$, $b_n=a_n^2$ and $\alpha=2$. Condition (i) in Theorem 2.13 follows from Proposition 2.6. Condition (ii) follows from Lemma 2.11.
To check condition (iii), we have to prove that
$$\Bigl\{a_n^2n^{-1}\sum_{j=1}^{n}\bigl(|U_j-a_n^{-1}Z_j't|^p-|U_j|^p-E[|U_j-a_n^{-1}Z_j't|^p-|U_j|^p]\bigr):|t|\le M\Bigr\}$$
converges weakly to $\{Z(t):|t|\le M\}$, for each $M<\infty$. To prove this, we apply Theorem 2.14. Condition (i) in Theorem 2.14 follows from Lemma 2.5. As to condition (ii) in Theorem 2.14,
$$n\Pr\Bigl\{\sup_{|t|\le M}n^{(2p-1)/(3-2p)}\bigl||U-n^{-1/(3-2p)}Z't|^p-|U|^p\bigr|\ge\varepsilon\Bigr\}$$
$$\le n\Pr\bigl\{|U|^{p-1}|Z|\ge c\varepsilon n^{(2-2p)/(3-2p)},\ |U|\ge n^{-1/(3-2p)}|Z|\bigr\}+n\Pr\bigl\{|Z|^p\ge c\varepsilon n^{(1-p)/(3-2p)},\ |U|<n^{-1/(3-2p)}|Z|\bigr\}.$$
We have that
$$n\Pr\bigl\{|U|^{p-1}|Z|\ge c\varepsilon n^{(2-2p)/(3-2p)},\ |U|\ge n^{-1/(3-2p)}|Z|\bigr\}\le n\Pr\bigl\{|Z|^p\ge c\varepsilon n^{(1-p)/(3-2p)}\bigr\}\le cE\bigl[|Z|^{p(3-2p)/(1-p)}I_{|Z|^p\ge c\varepsilon n^{(1-p)/(3-2p)}}\bigr]\to0,$$
because $0<p(3-2p)/(1-p)<2$. Using that
$$(2.4)\qquad \Pr\{|U|\le t\}\le ct,$$
$$n\Pr\bigl\{|Z|^p\ge c\varepsilon n^{(1-p)/(3-2p)},\ |U|<n^{-1/(3-2p)}|Z|\bigr\}\le n^{(2-2p)/(3-2p)}E\bigl[|Z|I_{|Z|^p\ge c\varepsilon n^{(1-p)/(3-2p)}}\bigr]\le cE\bigl[|Z|^{2p+1}I_{|Z|^p\ge c\varepsilon n^{(1-p)/(3-2p)}}\bigr]\to0.$$
Therefore, condition (ii) in Theorem 2.14 follows. To check condition (iii), we need that
$$(2.5)\qquad \lim_{\lambda\to0+}\lambda^{-(2p+1)}E\bigl[(|U-\lambda Z's|^p-|U|^p)(|U-\lambda Z't|^p-|U|^p)\bigr]=E[Z(s)Z(t)],$$
for each $s,t\in\mathbb{R}^d$, and
$$(2.6)\qquad n^{(2p+1)/(3-2p)}E\bigl[(|U-n^{-1/(3-2p)}Z't|^p-|U|^p)^2I_{||U-n^{-1/(3-2p)}Z't|^p-|U|^p|\ge n^{(1-2p)/(3-2p)}}\bigr]\to0,$$
for each $t\in\mathbb{R}^d$.
We have that
$$(2.7)\qquad \lambda^{-(2p+1)}E\bigl[(|U-\lambda Z's|^p-|U|^p)(|U-\lambda Z't|^p-|U|^p)\bigr]$$
$$=\lambda^{-(2p+1)}\int_{-\delta}^{\delta}E\bigl[(|u-\lambda Z's|^p-|u|^p)(|u-\lambda Z't|^p-|u|^p)\bigr]f_U(0)\,du$$
$$+\lambda^{-(2p+1)}\int_{-\delta}^{\delta}E\bigl[(|u-\lambda Z's|^p-|u|^p)(|u-\lambda Z't|^p-|u|^p)\bigr](f_U(u)-f_U(0))\,du$$
$$+\lambda^{-(2p+1)}E\bigl[(|U-\lambda Z's|^p-|U|^p)(|U-\lambda Z't|^p-|U|^p)I_{|U|>\delta}\bigr].$$
By the change of variables $u=\lambda v$,
$$(2.8)\qquad \lambda^{-(2p+1)}\int_{-\delta}^{\delta}E\bigl[(|u-\lambda Z's|^p-|u|^p)(|u-\lambda Z't|^p-|u|^p)\bigr]f_U(0)\,du=\int_{-\delta\lambda^{-1}}^{\delta\lambda^{-1}}E\bigl[(|u-Z's|^p-|u|^p)(|u-Z't|^p-|u|^p)\bigr]f_U(0)\,du$$
$$\to\int_{-\infty}^{\infty}E\bigl[(|u-Z's|^p-|u|^p)(|u-Z't|^p-|u|^p)\bigr]f_U(0)\,du.$$
We have that
$$\int_{-\infty}^{\infty}(|u-a|^p-|u|^p)(|u-b|^p-|u|^p)\,du=2^{-1}\int_{-\infty}^{\infty}\bigl((|u-a|^p-|u|^p)^2+(|u-b|^p-|u|^p)^2-(|u-b|^p-|u-a|^p)^2\bigr)\,du.$$
By the change of variables $v=a+(b-a)u$,
$$\int_{-\infty}^{\infty}(|u-a|^p-|u-b|^p)^2\,du=|b-a|^{2p+1}\int_{-\infty}^{\infty}(|u-1|^p-|u|^p)^2\,du.$$
So,
$$\int_{-\infty}^{\infty}(|u-a|^p-|u|^p)(|u-b|^p-|u|^p)\,du=2^{-1}\bigl(|a|^{2p+1}+|b|^{2p+1}-|b-a|^{2p+1}\bigr)\int_{-\infty}^{\infty}(|u-1|^p-|u|^p)^2\,du$$
and
$$(2.9)\qquad \int_{-\infty}^{\infty}E\bigl[(|u-Z's|^p-|u|^p)(|u-Z't|^p-|u|^p)\bigr]f_U(0)\,du=E[Z(s)Z(t)].$$
By the same change of variables,
$$(2.10)\qquad \lambda^{-(2p+1)}\Bigl|\int_{-\delta}^{\delta}E\bigl[(|u-\lambda s'Z|^p-|u|^p)(|u-\lambda t'Z|^p-|u|^p)\bigr](f_U(u)-f_U(0))\,du\Bigr|$$
$$\le\int_{-\delta\lambda^{-1}}^{\delta\lambda^{-1}}E\bigl[\,||u-Z's|^p-|u|^p|\,||u-Z't|^p-|u|^p|\,\bigr]|f_U(\lambda u)-f_U(0)|\,du\to0.$$

By Lemma 2.8,
$$(2.11)\qquad \lambda^{-(2p+1)}\bigl|E\bigl[(|U-\lambda s'Z|^p-|U|^p)(|U-\lambda t'Z|^p-|U|^p)I_{|U|>\delta}\bigr]\bigr|$$
$$\le2\lambda^{-(2p+1)}E\bigl[(|U-\lambda Z's|^p-|U|^p)^2I_{|U|>\delta}\bigr]+2\lambda^{-(2p+1)}E\bigl[(|U-\lambda Z't|^p-|U|^p)^2I_{|U|>\delta}\bigr]\le c\lambda^{1-2p}E\bigl[|Z|^2|U|^{2p-2}I_{|U|>\delta}\bigr]\to0,$$
as $\lambda\to0+$. (2.5) follows from (2.7)-(2.11).
As to (2.6), by Lemma 2.8,
$$n^{(2p+1)/(3-2p)}E\bigl[(|U-n^{-1/(3-2p)}Z't|^p-|U|^p)^2I_{||U-n^{-1/(3-2p)}Z't|^p-|U|^p|\ge n^{(1-2p)/(3-2p)}}\bigr]$$
$$\le n^{(2p-1)/(3-2p)}E\bigl[|U|^{2p-2}|Z|^2I_{|U|\ge cn^{-1/(3-2p)}|Z|,\ c|U|^{p-1}|Z|\ge n^{(1-2p)/(3-2p)}}\bigr]+n^{1/(3-2p)}E\bigl[|Z|^{2p}I_{|U|\le cn^{-1/(3-2p)}|Z|,\ |Z|^p\ge cn^{(1-p)/(3-2p)}}\bigr].$$
Using that
$$(2.12)\qquad E[|U|^{2p-2}I_{|U|\ge a}]\le ca^{2p-1},$$
$$n^{(2p-1)/(3-2p)}E\bigl[|U|^{2p-2}|Z|^2I_{|U|\ge cn^{-1/(3-2p)}|Z|,\ c|U|^{p-1}|Z|\ge n^{(1-2p)/(3-2p)}}\bigr]\le E\bigl[|Z|^{2p+1}I_{|Z|^p\ge cn^{p/(3-2p)}}\bigr]\to0.$$
By (2.4),
$$n^{1/(3-2p)}E\bigl[|Z|^{2p}I_{|U|\le cn^{-1/(3-2p)}|Z|,\ |Z|^p\ge cn^{(1-p)/(3-2p)}}\bigr]\le cE\bigl[|Z|^{2p+1}I_{|Z|^p\ge cn^{(1-p)/(3-2p)}}\bigr]\to0.$$
So, (2.6) follows.
As to condition (iv) in Theorem 2.14,
$$n^{(2p+1)/(3-2p)}E\Bigl[\sup_{|t|\le M}(|U-n^{-1/(3-2p)}Z't|^p-|U|^p)^2I_{\sup_{|t|\le M}||U-n^{-1/(3-2p)}Z't|^p-|U|^p|\le1}\Bigr]\le n^{(2p+1)/(3-2p)}E\bigl[(c|U|^{2p-2}n^{-2/(3-2p)}\|Z\|^2)\wedge1\bigr]\to0.$$
As to condition (v) in Theorem 2.14, for $|s|,|t|\le M$, $|s-t|\le\eta$,
$$n^{(2p+1)/(3-2p)}E\bigl[(|U-n^{-1/(3-2p)}s'Z|^p-|U-n^{-1/(3-2p)}t'Z|^p)^2\bigr]$$
$$\le n^{(2p+1)/(3-2p)}E\bigl[(|U-n^{-1/(3-2p)}s'Z|^p-|U-n^{-1/(3-2p)}t'Z|^p)^2I_{2Mn^{-1/(3-2p)}|Z|\le|U|}\bigr]+n^{(2p+1)/(3-2p)}E\bigl[(|U-n^{-1/(3-2p)}s'Z|^p-|U-n^{-1/(3-2p)}t'Z|^p)^2I_{2Mn^{-1/(3-2p)}|Z|>|U|}\bigr].$$
By Lemma 2.8 and (2.11),
$$n^{(2p+1)/(3-2p)}E\bigl[(|U-n^{-1/(3-2p)}s'Z|^p-|U-n^{-1/(3-2p)}t'Z|^p)^2I_{2Mn^{-1/(3-2p)}|Z|\le|U|}\bigr]$$
$$\le cn^{(2p-1)/(3-2p)}E\bigl[|s-t|^2|Z|^2|U-n^{-1/(3-2p)}s'Z|^{2p-2}I_{2Mn^{-1/(3-2p)}|Z|\le|U|}\bigr]\le cn^{(2p-1)/(3-2p)}E\bigl[|s-t|^2|Z|^2|U|^{2p-2}I_{2Mn^{-1/(3-2p)}|Z|\le|U|}\bigr]\le cE\bigl[|s-t|^2|Z|^{2p+1}M^{2p-1}\bigr].$$
By Lemma 2.8 and (2.4),
$$n^{(2p+1)/(3-2p)}E\bigl[(|U-n^{-1/(3-2p)}s'Z|^p-|U-n^{-1/(3-2p)}t'Z|^p)^2I_{2Mn^{-1/(3-2p)}|Z|>|U|}\bigr]\le cn^{1/(3-2p)}E\bigl[|s-t|^{2p}|Z|^{2p}I_{2Mn^{-1/(3-2p)}|Z|>|U|}\bigr]\le cME\bigl[|s-t|^{2p}|Z|^{2p+1}\bigr].$$
So, condition (v) holds.
As to condition (vi) in Theorem 2.14, by Lemma 2.8 and (2.4),
$$n^{2/(3-2p)}E\Bigl[\sup_{|\lambda|\le M}\bigl||U-n^{-1/(3-2p)}Z'\lambda|^p-|U|^p\bigr|I_{n^{(2p-1)/(3-2p)}\sup_{|\lambda|\le M}||U-n^{-1/(3-2p)}Z'\lambda|^p-|U|^p|\ge1}\Bigr]$$
$$\le cn^{(2-p)/(3-2p)}E\bigl[\|Z\|^pI_{cn^{(p-1)/(3-2p)}\|Z\|^p\ge1,\ n^{-1/(3-2p)}\|Z\|\ge|U|}\bigr]+cn^{1/(3-2p)}E\bigl[|U|^{p-1}\|Z\|I_{cn^{(2p-2)/(3-2p)}|U|^{p-1}\|Z\|\ge1,\ n^{-1/(3-2p)}\|Z\|<|U|}\bigr]$$
$$\le cn^{(1-p)/(3-2p)}E\bigl[\|Z\|^{p+1}I_{cn^{(p-1)/(3-2p)}\|Z\|^p\ge1}\bigr]\to0$$
(using $E[|U|^{p-1}I_{|U|\le a}]\le ca^p$). Therefore, the conditions in Theorem 2.14 hold.
To check condition (iv) in Theorem 2.13, by Lemma 2.16, we need that
$$E\Bigl[\sup_{|\lambda|\le\delta}\bigl||U-Z'\lambda|^p-|U|^p\bigr|^2\Bigr]=O(\delta^{2p+1}).$$
By Lemma 2.8, (2.4) and (2.11),
$$E\Bigl[\sup_{|\lambda|\le\delta}\bigl||U-Z'\lambda|^p-|U|^p\bigr|^2\Bigr]\le cE\bigl[|U|^{2p-2}\|Z\|^2\delta^2\wedge\|Z\|^{2p}\delta^{2p}\bigr]\le cE\bigl[|U|^{2p-2}\|Z\|^2\delta^2I_{\|Z\|\delta\le|U|}\bigr]+cE\bigl[\|Z\|^{2p}\delta^{2p}I_{\|Z\|\delta\ge|U|}\bigr]\le cE\bigl[\|Z\|^{2p+1}\delta^{2p+1}\bigr]\le c\delta^{2p+1}.$$
With probability one, $\{Z(t)+t'Vt:t\in\mathbb{R}^d\}$ attains its minimum at a unique point, by Lemma 5 in Arcones (1994). $\Box$

In the case $p=1/2$ and $m=1$, we have the following:

Theorem 2.19. Let $\hat\theta_n$ be any sequence of r.v.'s satisfying (1.3) with $p=1/2$ and $m=1$. Suppose that:
(i) For each $a\ne0$, $E[|U-a|^{1/2}-|U|^{1/2}]>0$.
(ii) For each $\lambda\ne0$, $\Pr\{Z'\lambda=0\}<1$.
(iii) $E[\|Z\|^2\log(\|Z\|\vee1)]<\infty$.
(iv) There exists a $\delta>0$ such that $U$ has a density $f_U(u)$ in $[-\delta,\delta]$.
(v) $\int_{-\delta}^{\delta}|u|^{-3/2}|f_U(u)-f_U(0)|\,du<\infty$.
(vi) $f_U(0)>0$.
(vii) $-E[2^{-3}|U|^{-3/2}I_{|U|\ge\delta}]+2^{-1}\delta^{-1/2}f_U(0)-\int_{-\delta}^{\delta}2^{-3}|u|^{-3/2}(f_U(u)-f_U(0))\,du>0$.
Then, $n^{1/2}(2\log n)^{-1/2}(\hat\theta_n-\theta_0)$ converges in distribution to $2^{-1}V^{-1}\eta$, where $V$ is as in (2.3) (with $p=1/2$) and $\eta$ is a $d$-dimensional normal random vector with mean zero and covariance $E[\eta\eta']=2^{-2}f_U(0)E[ZZ']$.

Proof. We apply Theorem 2.13. Condition (i) in Theorem 2.13 follows from Proposition 2.6. Condition (ii) follows from Lemma 2.11. To get condition (iii), we need that
$$\Bigl\{(2\log n)^{-1}\sum_{j=1}^{n}\bigl(|U_j-(2\log n)^{1/2}n^{-1/2}Z_j't|^{1/2}-|U_j|^{1/2}-E[|U_j-(2\log n)^{1/2}n^{-1/2}Z_j't|^{1/2}-|U_j|^{1/2}]\bigr):|t|\le M\Bigr\}$$
converges weakly to $\{Z(t):=t'\eta:|t|\le M\}$, for each $M<\infty$. We apply Theorem 2.14. Condition (i) in Theorem 2.14 follows from Lemma 2.5.
As to condition (ii),
$$n\Pr\Bigl\{\sup_{|t|\le M}\bigl||U-(2\log n)^{1/2}n^{-1/2}t'Z|^{1/2}-|U|^{1/2}\bigr|\ge\varepsilon\log n\Bigr\}\le n\Pr\bigl\{C_M|Z|\ge\varepsilon^2n^{1/2}(\log n)^{3/2}\bigr\}\to0.$$
To check condition (iii), we need that
$$n\Pr\Bigl\{\sup_{|t|\le M}\bigl||U-(2\log n)^{1/2}n^{-1/2}Z't|^{1/2}-|U|^{1/2}\bigr|\ge\varepsilon\log n\Bigr\}\to0.$$
We claim that
$$(2.13)\qquad \lim_{\lambda\to0+}\lambda^{-2}(2\log\lambda^{-1})^{-1}E\bigl[(|U-\lambda s'Z|^{1/2}-|U|^{1/2})(|U-\lambda t'Z|^{1/2}-|U|^{1/2})\bigr]=E[s'\eta\eta't],$$
for each $s,t\in\mathbb{R}^d$. We have that
$$\lambda^{-2}(2\log\lambda^{-1})^{-1}E\bigl[(|U-\lambda s'Z|^{1/2}-|U|^{1/2})(|U-\lambda t'Z|^{1/2}-|U|^{1/2})\bigr]$$
$$=\lambda^{-2}(2\log\lambda^{-1})^{-1}\int_{-\delta}^{\delta}E\bigl[(|u-\lambda Z's|^{1/2}-|u|^{1/2})(|u-\lambda Z't|^{1/2}-|u|^{1/2})\bigr]f_U(0)\,du$$
$$+\lambda^{-2}(2\log\lambda^{-1})^{-1}\int_{-\delta}^{\delta}E\bigl[(|u-\lambda Z's|^{1/2}-|u|^{1/2})(|u-\lambda Z't|^{1/2}-|u|^{1/2})\bigr](f_U(u)-f_U(0))\,du$$
$$+\lambda^{-2}(2\log\lambda^{-1})^{-1}E\bigl[(|U-\lambda Z's|^{1/2}-|U|^{1/2})(|U-\lambda Z't|^{1/2}-|U|^{1/2})I_{|U|>\delta}\bigr].$$
Let $N$ be a constant. By a change of variables,
$$\lambda^{-2}(2\log\lambda^{-1})^{-1}\int_{-\delta}^{\delta}E\bigl[(|u-\lambda Z's|^{1/2}-|u|^{1/2})(|u-\lambda t'Z|^{1/2}-|u|^{1/2})\bigr]f_U(0)\,du$$
$$=(2\log\lambda^{-1})^{-1}\int_{-\delta\lambda^{-1}}^{\delta\lambda^{-1}}E\bigl[(|u-Z's|^{1/2}-|u|^{1/2})(|u-Z't|^{1/2}-|u|^{1/2})\bigr]f_U(0)\,du$$
$$=(2\log\lambda^{-1})^{-1}\int_{-\delta\lambda^{-1}}^{-N}E\bigl[(|u-Z's|^{1/2}-|u|^{1/2})(|u-Z't|^{1/2}-|u|^{1/2})\bigr]f_U(0)\,du$$
$$+(2\log\lambda^{-1})^{-1}\int_{-N}^{N}E\bigl[(|u-Z's|^{1/2}-|u|^{1/2})(|u-Z't|^{1/2}-|u|^{1/2})\bigr]f_U(0)\,du$$
$$+(2\log\lambda^{-1})^{-1}\int_{N}^{\delta\lambda^{-1}}E\bigl[(|u-Z's|^{1/2}-|u|^{1/2})(|u-Z't|^{1/2}-|u|^{1/2})\bigr]f_U(0)\,du.$$
By a Taylor expansion,
$$(2\log\lambda^{-1})^{-1}\int_{N}^{\delta\lambda^{-1}}E\bigl[(|u-Z's|^{1/2}-|u|^{1/2})(|u-Z't|^{1/2}-|u|^{1/2})\bigr]f_U(0)\,du$$
$$=(2\log\lambda^{-1})^{-1}\int_{N}^{\delta\lambda^{-1}}E\bigl[2^{-1}u^{-1/2}t'Z\cdot2^{-1}u^{-1/2}Z's\bigr]f_U(0)\,du+o(1)=2^{-3}(\log\lambda^{-1})^{-1}\bigl(\log(\delta\lambda^{-1})-\log N\bigr)E[t'ZZ's]f_U(0)+o(1)\to2^{-3}E[t'ZZ's]f_U(0).$$
Similarly,
$$(2\log\lambda^{-1})^{-1}\int_{-\delta\lambda^{-1}}^{-N}E\bigl[(|u-Z's|^{1/2}-|u|^{1/2})(|u-Z't|^{1/2}-|u|^{1/2})\bigr]f_U(0)\,du\to2^{-3}E[t'ZZ's]f_U(0).$$
We have that
$$(2\log\lambda^{-1})^{-1}\Bigl|\int_{-N}^{N}E\bigl[(|u-Z's|^{1/2}-|u|^{1/2})(|u-Z't|^{1/2}-|u|^{1/2})\bigr]f_U(0)\,du\Bigr|\le(2\log\lambda^{-1})^{-1}2N|t|^{1/2}|s|^{1/2}E[|Z|]f_U(0)\to0.$$
By Lemma 2.8,
$$\lambda^{-2}(2\log\lambda^{-1})^{-1}\Bigl|\int_{-\delta}^{\delta}E\bigl[(|u-\lambda s'Z|^{1/2}-|u|^{1/2})(|u-\lambda t'Z|^{1/2}-|u|^{1/2})\bigr](f_U(u)-f_U(0))\,du\Bigr|\le cE[|t||s||Z|^2](2\log\lambda^{-1})^{-1}\int_{-\delta}^{\delta}|u|^{-1}|f_U(u)-f_U(0)|\,du\to0,$$
as $\lambda\to0+$. By Lemma 2.8,
$$\lambda^{-2}(2\log\lambda^{-1})^{-1}\bigl|E\bigl[(|U-\lambda s'Z|^{1/2}-|U|^{1/2})(|U-\lambda t'Z|^{1/2}-|U|^{1/2})I_{|U|>\delta}\bigr]\bigr|\le c(2\log\lambda^{-1})^{-1}E\bigl[|t||s||Z|^2|U|^{-1}I_{|U|>\delta}\bigr]\to0,$$
as $\lambda\to0+$. (2.13) follows from all these estimates. The rest of the conditions in Theorem 2.14 follow similarly to those checked in Theorem 2.18.

To check condition (iv) in Theorem 2.13, by Lemma 2.16, we need that
$$E\Bigl[\sup_{|\lambda|\le\delta}\bigl||U-Z'\lambda|^{1/2}-|U|^{1/2}\bigr|^2\Bigr]=O(\delta^2(\log\delta^{-1})).$$
By Lemma 2.8, (2.4) and (2.11),
$$E\Bigl[\sup_{|\lambda|\le\delta}\bigl||U-Z'\lambda|^{1/2}-|U|^{1/2}\bigr|^2\Bigr]\le cE\bigl[|U|^{-1}\|Z\|^2\delta^2\wedge\|Z\|\delta\bigr]\le cE\bigl[|U|^{-1}\|Z\|^2\delta^2I_{\|Z\|\delta\le|U|}\bigr]+cE\bigl[\|Z\|\delta I_{\|Z\|\delta\ge|U|}\bigr]$$
$$\le cE\bigl[|U|^{-1}\|Z\|^2\delta^2I_{\|Z\|\delta\le|U|\le1}\bigr]+c\delta^2\le cE\bigl[\|Z\|^2\delta^2\log(\|Z\|^{-1}\delta^{-1})I_{\|Z\|\delta\le1}\bigr]+c\delta^2\le c\delta^2(\log\delta^{-1}).$$
As to condition (v) in Theorem 2.13, we have that $\{t'Vt+t'\eta:t\in\mathbb{R}^d\}$ attains its minimum only at $t=-2^{-1}V^{-1}\eta$. $\Box$

References

[1] Alexander, K. S. (1987). Central limit theorems for stochastic processes under random entropy conditions. Probab. Theory Related Fields 75, 351-378.
[2] Arcones, M. A. (1994). Distributional convergence of M-estimators under unusual rates. Statist. Probab. Lett. 21, 271-280.
[3] Arcones, M. A. (1996). M-estimators converging to a stable limit. Preprint.
[4] Arcones, M. A. (1999). Weak convergence for the row sums of a triangular array of empirical processes indexed by a manageable triangular array of functions. Electron. J. Probab. 4, Paper no. 7, 1-17.
[5] Bai, Z. D.; Chen, X. R.; Miao, B. Q. and Rao, C. R. (1990). Asymptotic theory of least distance estimate in multivariate linear models. Statistics 4, 503-519.
[6] Bai, Z. D.; Rao, C. R. and Wu, Y. (1992). M-estimation of multivariate linear regression parameters under a convex discrepancy function. Statist. Sinica 2, 237-254.
[7] Bassett, G. and Koenker, R. (1978). Asymptotic theory of least absolute error regression. J. Amer. Statist. Assoc. 73, 618-622.
[8] Bloomfield, P. and Steiger, W. L. (1983). Least Absolute Deviations: Theory, Applications, and Algorithms. Birkhäuser, Boston.
[9] Davis, R. A.; Knight, K. and Liu, J. (1992). M-estimation for autoregression with infinite variance. Stochastic Process. Appl. 40, 145-180.
[10] Davis, R. A. and Wu, W. (1997). M-estimation for linear regression with infinite variance. Probab. Math. Statist. 17, 1-20.
[11] Draper, N. R. and Smith, H. (1981). Applied Regression Analysis. John Wiley, New York.
[12] Dudley, R. M. (1984). A course on empirical processes. Lecture Notes in Math. 1097, 1-142. Springer-Verlag, New York.
[13] Giné, E. and Zinn, J. (1984). Some limit theorems for empirical processes. Ann. Probab. 12, 929-989.
[14] Hampel, F. R.; Ronchetti, E. M.; Rousseeuw, P. J. and Stahel, W. A. (1986). Robust Statistics: The Approach Based on Influence Functions. John Wiley, New York.
[15] Huber, P. J. (1964). Robust estimation of a location parameter. Ann. Math. Statist. 35, 73-101.
[16] Huber, P. J. (1973). Robust regression: asymptotics, conjectures and Monte Carlo. Ann. Statist. 1, 799-821.
[17] Huber, P. J. (1981). Robust Statistics. John Wiley, New York.
[18] Jurečková, J. (1977). Asymptotic relations of M-estimates and R-estimates in linear regression model. Ann. Statist. 5, 464-472.

[19] Kim, J. and Pollard, D. (1990). Cube root asymptotics. Ann. Statist. 18, 191-219.
[20] Koenker, R. and Bassett, G. Jr. (1978). Regression quantiles. Econometrica 46, 33-50.
[21] Koenker, R. and Portnoy, S. (1987). L-estimation for linear models. J. Amer. Statist. Assoc. 82, 851-857.
[22] Koul, H. L. (1977). Behavior of robust estimators in the regression model with dependent errors. Ann. Statist. 5, 681-699.
[23] Pollard, D. (1984). Convergence of Stochastic Processes. Springer-Verlag, New York.
[24] Pollard, D. (1990). Empirical Processes: Theory and Applications. NSF-CBMS Regional Conference Series in Probability and Statistics, Vol. 2. Institute of Mathematical Statistics, Hayward, California.
[25] Prakasa Rao, B. L. S. (1968). Estimation of the location of the cusp of a continuous density. Ann. Math. Statist. 39, 76-87.
[26] Ruppert, D. and Carroll, R. J. (1980). Trimmed least squares estimation in the linear model. J. Amer. Statist. Assoc. 75, 828-838.
[27] Smirnov, N. V. (1949). Limit distributions for the terms of a variational series. Amer. Math. Soc. Transl. Ser. (1) 11, 82-143.
[28] Yohai, V. J. and Maronna, R. A. (1979). Asymptotic behavior of M-estimators for the linear model. Ann. Statist. 7, 258-268.
