Published: 2001
URL: http://www.business.uiuc.edu/Working_Papers/papers/01-0116.pdf
Adjusting the Tests for Skewness and Kurtosis
for Distributional Misspecifications

Anil K. Bera^1 and Gamini Premaratne
Department of Economics, University of Illinois,
1206 S. 6th Street, Champaign, IL 61820, USA.
Abstract
The standard \sqrt{b_1} test is widely used for testing skewness. However, several studies have demonstrated that this test is not reliable for discriminating between symmetric and asymmetric distributions in the presence of excess kurtosis. The main reason for the failure of the standard \sqrt{b_1} test is that its variance formula is derived under the assumption of no excess kurtosis. In this paper we theoretically derive an adjustment to the \sqrt{b_1} test under the framework of Rao's score (or the Lagrange multiplier) test principle. Our adjusted test automatically corrects the variance formula and does not lead to over- or under-rejection of the correct null hypothesis. In a similar way, we also suggest an adjusted test for kurtosis in the presence of asymmetry. These tests are then applied to both simulated and real data. The finite sample performances of the adjusted tests are far superior compared to those of their unadjusted counterparts.
1Correspondence to: Anil K. Bera, Department of Economics, University of Illinois, 1206 S. 6th Street,
Champaign, IL 61820, U.S.A.; email: [email protected]
1 Introduction
Suppose we have n independent observations y_1, y_2, \ldots, y_n on a random variable Y. For simplicity, assume that they are measured from their mean. Then the sample skewness and kurtosis are defined, respectively, as

    \sqrt{b_1} = \frac{m_3}{m_2^{3/2}}    (1)

and

    b_2 = \frac{m_4}{m_2^2},    (2)

where m_j = n^{-1} \sum_{i=1}^{n} y_i^j, j = 2, 3, 4. The use of these coefficients in the statistics literature
goes back a long way. Pearson (1895) suggested that one use these coefficients to identify a density within a family of distributions. Fisher (1930) and Pearson (1930) formulated formal tests of normality based on \sqrt{b_1} and b_2 separately. Using the Pearson (1895) family of distributions and the Rao (1948) score (RS) test principle, Bera and Jarque (1981) derived the following omnibus statistic for testing normality

    RS = n \left[ \frac{(\sqrt{b_1})^2}{6} + \frac{(b_2 - 3)^2}{24} \right],    (3)

jointly based on \sqrt{b_1} and b_2 [see also D'Agostino and Pearson (1973), Cox and Hinkley (1974, p. 42), Bowman and Shenton (1975), and Jarque and Bera (1987)]. Under the null hypothesis of normality, RS is asymptotically distributed as \chi^2_2.
It is true that the skewness and kurtosis coefficients \sqrt{b_1} and b_2 jointly provide a powerful scheme for assessing normality against a wide variety of alternatives. In many practical applications, researchers use n(\sqrt{b_1})^2/6 and n(b_2 - 3)^2/24 separately as \chi^2_1 statistics for assessing asymmetry and excess kurtosis [for instance, see Affleck-Graves and McDonald (1989), Richardson and Smith (1993), and Christie-David and Chaudhury (2001)]. The purpose of this paper is to demonstrate that n(\sqrt{b_1})^2/6 and n(b_2 - 3)^2/24 are not reliable measures of asymmetry and excess kurtosis, respectively, that can be used as proper test statistics. For instance, the asymptotic variance 6/n of \sqrt{b_1} is valid only under the assumption of normality, and the resulting test is not valid when the population kurtosis measure \beta_2 \neq 3. Similarly, the test statistic n(b_2 - 3)^2/24 will not provide reliable inference under asymmetry. Bera and John (1983, p. 104) clearly stated that n(\sqrt{b_1})^2/6 and n(b_2 - 3)^2/24 are not pure tests of skewness and kurtosis, since the asymptotic distribution of these statistics is derived under the full normality assumption. What is needed is to obtain the correct variances of \sqrt{b_1} and b_2 that are valid under distributional misspecification. In this paper we derive correct variance formulae using White's (1982) approach to making inference under misspecified models. To do this, we go back to the original derivation of (3) as given in Bera and Jarque (1981) and Jarque and Bera (1987), using the Pearson family of distributions, and apply White's (1982) modification.
The plan of this paper is as follows: In the next section we present the Rao (1948) score test and White's (1982) modification to it. Section 3 discusses the adjusted RS test for asymmetry in the presence of excess kurtosis, followed by a finite sample performance study in Section 4. We do the same in Sections 5 and 6 for the test of excess kurtosis allowing for asymmetry. Section 7 presents an empirical illustration. In the last section we offer a conclusion.
2 Rao's (1948) Score Test and White's (1982) Modification
Suppose there are n independent observations y_1, y_2, \ldots, y_n with identical density function f(y; \theta), where \theta is a p \times 1 parameter vector with \theta \in \Theta \subseteq \mathbb{R}^p. It is assumed that f(y; \theta) satisfies the regularity conditions stated in Rao (1973, p. 364) and Serfling (1980, p. 144). The log-likelihood function, the score function, and the information matrix are then defined, respectively, as

    l(\theta) = \sum_{i=1}^{n} \ln f(y_i; \theta)    (4)

    s(\theta) = \frac{\partial l(\theta)}{\partial \theta}    (5)

    I(\theta) = -E\left[ \frac{\partial^2 l(\theta)}{\partial \theta \, \partial \theta'} \right].    (6)
Let the hypothesis to be tested be H_0: h(\theta) = 0, where h(\theta) is an r \times 1 vector function of \theta with r \leq p. It is assumed that H(\theta) = \partial h(\theta)/\partial \theta has full rank, i.e., rank[H(\theta)] = r. Rao's (1948) score statistic for testing H_0 can be written as

    RS = s(\tilde{\theta})' I(\tilde{\theta})^{-1} s(\tilde{\theta}),    (7)

where \tilde{\theta} is the restricted maximum likelihood estimator (MLE) of \theta. Under H_0, RS is asymptotically distributed as \chi^2_r. RS is not a valid test statistic when the true data generating process (DGP), g(y), differs from f(y; \theta). This is because some of the standard results break down under distributional misspecification. For instance, consider the information matrix equality

    E_f\left[ \frac{\partial \ln f(y; \theta)}{\partial \theta} \cdot \frac{\partial \ln f(y; \theta)}{\partial \theta'} \right] = E_f\left[ -\frac{\partial^2 \ln f(y; \theta)}{\partial \theta \, \partial \theta'} \right],    (8)
in which E_f[\cdot] denotes expectation under f(y; \theta). Let us define

    J(\theta_g) = E_g\left[ \frac{\partial \ln f(y; \theta)}{\partial \theta} \cdot \frac{\partial \ln f(y; \theta)}{\partial \theta'} \right]    (9)

    K(\theta_g) = E_g\left[ -\frac{\partial^2 \ln f(y; \theta)}{\partial \theta \, \partial \theta'} \right],    (10)

where \theta_g minimizes the Kullback-Leibler information criterion [see White (1982)]

    IKL = E_g\left[ \ln \frac{g(y)}{f(y; \theta)} \right].    (11)
One can easily see that J(\theta_g) \neq K(\theta_g), in general. Due to this divergence between J and K, and because the expectation in (6) is taken under f(y; \theta) instead of under the true DGP g(y), in some cases the standard RS test in (7) is not valid. White (1982) suggested the following robust form of the RS statistic [see also Kent (1982)]:

    RS^* = \frac{1}{n} s(\tilde{\theta})' K(\tilde{\theta})^{-1} H(\tilde{\theta}) [H(\tilde{\theta})' B(\tilde{\theta}) H(\tilde{\theta})]^{-1} H(\tilde{\theta})' K(\tilde{\theta})^{-1} s(\tilde{\theta}),    (12)

where B(\theta) = K(\theta)^{-1} J(\theta) K(\theta)^{-1} and \tilde{\theta} now denotes the quasi-MLE (QMLE). Under H_0, RS^* is asymptotically distributed as \chi^2_r even under distributional misspecification, i.e., when the assumed density f(y; \theta) does not coincide with the true DGP g(y).
For our purposes we need only a special case of RS^*. Let us partition \theta as \theta = (\theta_1', \theta_2')', where \theta_1 and \theta_2 are, respectively, (p - r) \times 1 and r \times 1 vectors. Let H_0: \theta_2 = \theta_{20}, where \theta_{20} is a known quantity. Let \tilde{\theta}_1 be the QMLE under H_0, and let \tilde{\theta} = (\tilde{\theta}_1', \theta_{20}')'. We partition the score vector and other matrices as follows:

    s(\theta) \equiv s(\theta_1, \theta_2) = [s_1(\theta)', s_2(\theta)']',

    I(\theta) = \begin{bmatrix} I_{11}(\theta) & I_{12}(\theta) \\ I_{21}(\theta) & I_{22}(\theta) \end{bmatrix},
    \qquad
    I^{-1}(\theta) = \begin{bmatrix} I^{11}(\theta) & I^{12}(\theta) \\ I^{21}(\theta) & I^{22}(\theta) \end{bmatrix},    (13)

and so on. Under this notation the standard RS statistic in (7) for our special case is given by

    RS = s_2(\tilde{\theta}_1, \theta_{20})' I^{22}(\tilde{\theta}) s_2(\tilde{\theta}_1, \theta_{20}).

This can also be written as

    RS = s_2(\tilde{\theta}_1, \theta_{20})' K^{22}(\tilde{\theta}) s_2(\tilde{\theta}_1, \theta_{20}) = \tilde{s}_2' \tilde{K}^{22} \tilde{s}_2    (14)
after dropping the arguments. Now let us consider RS^* in (12). Here h(\theta) = \theta_2 - \theta_{20}, s(\tilde{\theta}) = (0', s_2(\tilde{\theta})')', and H(\theta) = \partial h(\theta)/\partial \theta' = [0_{r \times (p-r)} \; I_r]. Hence,

    s(\tilde{\theta})' K^{-1}(\tilde{\theta}) H(\tilde{\theta})'
    = \begin{pmatrix} 0' & s_2(\tilde{\theta})' \end{pmatrix}
      \begin{pmatrix} \tilde{K}^{11} & \tilde{K}^{12} \\ \tilde{K}^{21} & \tilde{K}^{22} \end{pmatrix}
      \begin{pmatrix} 0 \\ I_r \end{pmatrix}
    = s_2(\tilde{\theta})' \tilde{K}^{22}.    (15)

We again drop the arguments of K^{-1} and other matrices whenever convenient, and "~" denotes that the quantity is evaluated at \theta = \tilde{\theta}. Similarly,

    H \tilde{B} H'
    = \begin{pmatrix} 0 & I_r \end{pmatrix}
      \begin{pmatrix} \tilde{B}_{11} & \tilde{B}_{12} \\ \tilde{B}_{21} & \tilde{B}_{22} \end{pmatrix}
      \begin{pmatrix} 0 \\ I_r \end{pmatrix}
    = \tilde{B}_{22}.    (16)

Hence,

    RS^* = \frac{1}{n} \tilde{s}_2' \tilde{K}^{22} \tilde{B}_{22}^{-1} \tilde{K}^{22} \tilde{s}_2.    (17)
Note that K^{22} = (K_{22} - K_{21} K_{11}^{-1} K_{12})^{-1}. In one of our special cases, as will be seen later, K_{21} = 0, i.e., K^{22} = K_{22}^{-1}. We can then simplify B as

    B = \begin{pmatrix} K_{11}^{-1} & 0 \\ 0 & K_{22}^{-1} \end{pmatrix}
        \begin{pmatrix} J_{11} & J_{12} \\ J_{21} & J_{22} \end{pmatrix}
        \begin{pmatrix} K_{11}^{-1} & 0 \\ 0 & K_{22}^{-1} \end{pmatrix}    (18)

      = \begin{pmatrix} K_{11}^{-1} J_{11} K_{11}^{-1} & K_{11}^{-1} J_{12} K_{22}^{-1} \\ K_{22}^{-1} J_{21} K_{11}^{-1} & K_{22}^{-1} J_{22} K_{22}^{-1} \end{pmatrix},    (19)

and hence B_{22} = K_{22}^{-1} J_{22} K_{22}^{-1}. Therefore, from (17),

    RS^* = \frac{1}{n} \tilde{s}_2' \tilde{K}_{22}^{-1} (\tilde{K}_{22}^{-1} \tilde{J}_{22} \tilde{K}_{22}^{-1})^{-1} \tilde{K}_{22}^{-1} \tilde{s}_2
         = \frac{1}{n} \tilde{s}_2' \tilde{J}_{22}^{-1} \tilde{s}_2.    (20)

Therefore, when the matrix K(\theta) is block diagonal, i.e., K_{12} = 0, the robust version of the RS statistic requires calculation of J_{22}(\theta) only. Also, comparing (14) and (20), we see that RS and RS^* have similar forms, except that the former uses the Hessian whereas the latter uses the outer-product-of-gradients form for the underlying information matrix.
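In the block-diagonal case, (20) is just a quadratic form in the restricted score with the outer-product term as the metric; a minimal numerical sketch (ours, not the authors' code) with NumPy:

```python
import numpy as np

def robust_score_stat(s2, J22, n):
    """RS* = (1/n) * s2' J22^{-1} s2, the robust score statistic of
    equation (20), valid when K(theta) is block diagonal (K12 = 0)."""
    s2 = np.atleast_1d(np.asarray(s2, dtype=float))
    J22 = np.atleast_2d(np.asarray(J22, dtype=float))
    return float(s2 @ np.linalg.solve(J22, s2)) / n
```

With a scalar restricted score s2 and outer-product term J22, this reduces to s2^2/(n J22), which is exactly the structure of the adjusted skewness and kurtosis statistics derived in the following sections.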
3 An Adjusted Test for Asymmetry
Let us start with the Pearson (1895) system of distributions,

    \frac{d \ln f(y)}{dy} = \frac{c_1 - y}{c_0 - c_1 y + c_2 y^2},    (21)

where we write f(y; \theta) simply as f(y), with \theta = (c_0, c_1, c_2)'. The normal distribution is a special case of this when c_1 = c_2 = 0. As discussed in Premaratne and Bera (2000), c_1 and c_2 can be treated, respectively, as the "asymmetry" and "kurtosis" parameters. Suppose we are interested only in testing asymmetry, ignoring excess kurtosis. Then we can start with the above system with c_2 = 0, i.e.,

    \frac{d \ln f(y)}{dy} = \frac{c_1 - y}{c_0 - c_1 y}.    (22)

By integration, it can be shown that (22) leads to a gamma density, which is a skewed distribution. Under c_1 = 0, (22) becomes the normal density with mean zero and variance c_0, which is symmetric. Let us derive the Rao score test for c_1 = 0 in (22). That will give us a test for asymmetry without allowing for excess kurtosis. Let us denote
us a test for asymmetry without allowing for excess kurtosis. Let us denote
(�; y) =Z c1 � y
c0 � c1ydy =
Z c1 � y
�dy;
with � = (c0; c1)0 and � = c0� c1y. Then by integrating (22) for the ith observation, we have
ln f(yi) = const: (�; yi)
i.e.,
f(yi) = const: exp( (�; yi))
7
or
f(yi) =exp( (�; yi))R1
�1 exp( (�; y)) dy: (23)
Note that in the denominator of (23), we don't use a subscript for y, since this term is purely
a constant. Therefore, the log-likelihood function l(�) = lnQn
i=1 f(yi) can be written as
l(�) = �n ln�Z 1
�1exp (�; y) dy
�+
nXi=1
(�; yi)
= l1(�) + l2(�) (say): (24)
To get the Rao score test, let us first derive the score functions \partial l(\theta)/\partial c_0 and \partial l(\theta)/\partial c_1 under the null hypothesis H_0: c_1 = 0. One can easily see that

    \left. \frac{\partial l_1(\theta)}{\partial c_1} \right|_{c_1 = 0} = 0,    (25)

    \frac{\partial l_2(\theta)}{\partial c_1} = \sum_{i=1}^{n} \int \frac{\tau_i - (c_1 - y_i)(-y_i)}{\tau_i^2} \, dy_i.    (26)

That is,

    \left. \frac{\partial l_2(\theta)}{\partial c_1} \right|_{c_1 = 0}
    = \sum_{i=1}^{n} \int \frac{c_0 - y_i^2}{c_0^2} \, dy_i
    = \sum_{i=1}^{n} \left( \frac{y_i}{c_0} - \frac{y_i^3}{3 c_0^2} \right).    (27)

When evaluated at the restricted MLE \tilde{\theta} = (\tilde{c}_0, 0)', this becomes

    \frac{\partial l_2(\tilde{\theta})}{\partial c_1} = \frac{n m_1}{m_2} - \frac{n m_3}{3 m_2^2},

where m_j = \sum_{i=1}^{n} y_i^j / n, j = 1, 2, 3, and \tilde{c}_0 = m_2 under the null. Therefore, we have

    \frac{\partial l(\tilde{\theta})}{\partial c_1} = \frac{\partial l_1(\tilde{\theta})}{\partial c_1} + \frac{\partial l_2(\tilde{\theta})}{\partial c_1}
    = \frac{n m_1}{m_2} - \frac{n m_3}{3 m_2^2}.    (28)
To get the information matrix I(\theta), we use the J matrix, where we take the expectation with respect to the assumed density f(y), and denote it by J_f, i.e.,

    J_f = E_f\left[ \frac{\partial \ln f(y; \theta)}{\partial \theta} \cdot \frac{\partial \ln f(y; \theta)}{\partial \theta'} \right].    (29)

From (23), we can easily see that, under H_0: c_1 = 0,

    \frac{\partial \ln f(y)}{\partial c_0} = -\frac{\mu_2}{2 c_0^2} + \frac{y^2}{2 c_0^2} = -\frac{1}{2 \mu_2} + \frac{y^2}{2 \mu_2^2},    (30)

    \frac{\partial \ln f(y)}{\partial c_1} = \frac{y}{c_0} - \frac{y^3}{3 c_0^2} = \frac{y}{\mu_2} - \frac{y^3}{3 \mu_2^2},    (31)

where we replace c_0 by \mu_2 = E_f(y^2), with the density f(y) being N(0, c_0). Using (30) and
(31), the J_f matrix of (29) can be calculated as follows:

    J_f = E_f \begin{bmatrix}
      \left( \frac{\partial \ln f}{\partial c_0} \right)^2 & \frac{\partial \ln f}{\partial c_0} \frac{\partial \ln f}{\partial c_1} \\
      \cdot & \left( \frac{\partial \ln f}{\partial c_1} \right)^2
    \end{bmatrix}
    = E_f \begin{bmatrix}
      \frac{1}{4 \mu_2^2} + \frac{y^4}{4 \mu_2^4} - \frac{y^2}{2 \mu_2^3} &
      -\frac{y}{2 \mu_2^2} + \frac{y^3}{2 \mu_2^3} + \frac{y^3}{6 \mu_2^3} - \frac{y^5}{6 \mu_2^4} \\
      \cdot &
      \frac{y^2}{\mu_2^2} + \frac{y^6}{9 \mu_2^4} - \frac{2 y^4}{3 \mu_2^3}
    \end{bmatrix}.    (32)

Under symmetry, E_f(y) = E_f(y^3) = E_f(y^5) = 0, and under H_0: c_1 = 0, E_f(y^2) = \mu_2, E_f(y^4) = 3 \mu_2^2, E_f(y^6) = 15 \mu_2^3. Therefore,

    J_f = \begin{bmatrix}
      \frac{1}{4 \mu_2^2} + \frac{3 \mu_2^2}{4 \mu_2^4} - \frac{\mu_2}{2 \mu_2^3} & 0 \\
      0 & \frac{\mu_2}{\mu_2^2} + \frac{15 \mu_2^3}{9 \mu_2^4} - \frac{6 \mu_2^2}{3 \mu_2^3}
    \end{bmatrix}
    = \begin{bmatrix} \frac{1}{2 \mu_2^2} & 0 \\ 0 & \frac{2}{3 \mu_2} \end{bmatrix}.    (33)
From (28) we have the estimated score function s_2(\tilde{\theta}) under the null hypothesis as

    s_2(\tilde{\theta}) = \frac{\partial l(\tilde{\theta})}{\partial c_1} = -\frac{n m_3}{3 m_2^2}    (34)

by putting the sample mean m_1 = 0. Since the information matrix in (33) is block diagonal, the standard score test is

    RS_{c_1} = \frac{n}{9} \frac{m_3^2}{m_2^4} \cdot \frac{3 \hat{\mu}_2}{2}
             = \frac{n}{6} \left( \frac{m_3}{m_2^{3/2}} \right)^2
             = \frac{n}{6} (\sqrt{b_1})^2.    (35)

Note that while deriving this test, we have taken the density (23) as the DGP and have fully utilized it while taking the expectation in (32). If f(y) is not the DGP, the test RS_{c_1} will not be valid; in particular, the asymptotic variance formula in (35), V(\sqrt{b_1}) = 6/n, is not correct. For instance, in the presence of excess kurtosis, there will be proportionately more outliers. As Pearson (1963) showed, as we move from short-tailed to very long-tailed distributions, the contribution to the moments comes increasingly from the extreme tails of the frequency function. As a result, the variance of a statistic such as \sqrt{b_1} will increase, and for fat-tailed distributions, 6/n will underestimate the variance.
Let us try to use the modified score statistic in (17). For that we need the J_g and K_g matrices. As we noted earlier, if K_{12} = 0, then the modified score test takes the very simple form (20). After some tedious algebra, we can see that

    K_{12} = E_g\left[ -\frac{\partial^2 \ln f}{\partial c_0 \, \partial c_1} \right] = 0.    (36)
To calculate RS^* in (20), we need to find

    J_{22} = E_g\left[ \frac{\partial \ln f}{\partial c_1} \right]^2.    (37)

From our earlier derivation,

    \frac{\partial \ln f}{\partial c_1} = \frac{y}{c_0} - \frac{y^3}{3 c_0^2}.

Hence,

    \left( \frac{\partial \ln f}{\partial c_1} \right)^2 = \frac{y^2}{c_0^2} + \frac{y^6}{9 c_0^4} - \frac{2 y^4}{3 c_0^3},

and

    E_g\left( \frac{\partial \ln f}{\partial c_1} \right)^2
    = \frac{\mu_2}{c_0^2} + \frac{\mu_6}{9 c_0^4} - \frac{2 \mu_4}{3 c_0^3}
    = \frac{1}{\mu_2} + \frac{\mu_6}{9 \mu_2^4} - \frac{2 \mu_4}{3 \mu_2^3}.

Therefore,

    \tilde{J}_{22} = \frac{1}{m_2} + \frac{m_6}{9 m_2^4} - \frac{2 m_4}{3 m_2^3}.    (38)
Using (34) and (38), RS^* in (20) for testing H_0: c_1 = 0 can be expressed as

    RS^*_{c_1} = \frac{n}{9} \frac{m_3^2 m_2^{-4}}{m_2^{-1} + \frac{1}{9} m_6 m_2^{-4} - \frac{2}{3} m_4 m_2^{-3}}
               = \frac{n m_3^2 m_2^{-3}}{9 + m_6 m_2^{-3} - 6 m_4 m_2^{-2}}
               = \frac{n (\sqrt{b_1})^2}{9 + m_6 m_2^{-3} - 6 m_4 m_2^{-2}}.    (39)

Note that if we impose the normality assumption, then the population counterpart of the denominator in (39) is

    9 + \mu_6 \mu_2^{-3} - 6 \mu_4 \mu_2^{-2} = 9 + 15 - 6 \cdot 3 = 9 + 15 - 18 = 6,
as in (35). Basically, the construction of the adjusted statistic RS^*_{c_1} indicates that, asymptotically, an estimate of the variance of \sqrt{b_1} that is valid under excess kurtosis is

    \frac{1}{n} \left[ 9 + m_6 m_2^{-3} - 6 m_4 m_2^{-2} \right].    (40)

Let us again consider the term

    9 + \mu_6 \mu_2^{-3} - 6 \mu_4 \mu_2^{-2}.    (41)

It differs from the variance formula under normality by

    9 + \mu_6 \mu_2^{-3} - 6 \mu_4 \mu_2^{-2} - 6 = 3 + \mu_6 \mu_2^{-3} - 6 \mu_4 \mu_2^{-2}.    (42)
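The unadjusted statistic (35) and the adjusted statistic (39), which embeds the variance estimate (40), differ only in their denominators; a small sketch (our illustration, not the authors' code) in Python:

```python
import numpy as np

def skewness_tests(y):
    """Return (RS_c1, RS*_c1): the standard statistic n(sqrt(b1))^2/6 of
    equation (35) and the kurtosis-adjusted statistic of equation (39)."""
    y = y - np.mean(y)
    m2, m3, m4, m6 = (np.mean(y ** j) for j in (2, 3, 4, 6))
    n = len(y)
    b1 = m3 ** 2 / m2 ** 3                        # (sqrt(b1))^2
    standard = n * b1 / 6
    denom = 9 + m6 / m2 ** 3 - 6 * m4 / m2 ** 2   # sample version of (41)
    adjusted = n * b1 / denom
    return standard, adjusted
```

Both statistics are compared with \chi^2_1 critical values; under excess kurtosis the denominator exceeds 6, so the adjusted statistic is the smaller of the two.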
Let us define the generalized notion of kurtosis as

    \gamma_{2r} = \frac{\mu_{2r+2}}{\mu_2^{r+1}}, \quad r = 1, 2, \ldots,

so that

    \gamma_2 = \frac{\mu_4}{\mu_2^2}, \qquad \gamma_4 = \frac{\mu_6}{\mu_2^3}.

Then (42) is

    3 + \gamma_4 - 6 \gamma_2.

It can be shown that for leptokurtic distributions, 3 + \gamma_4 - 6 \gamma_2 \geq 0. For example, with the t_7 density we have \mu_2 = 7/5, \mu_4 = 49/5, \mu_6 = 343; then 3 + \gamma_4 - 6 \gamma_2 = 98. Therefore, the usual variance formula will underestimate the variance of \sqrt{b_1}, and we will reject the null hypothesis of symmetry too frequently in the presence of excess kurtosis.
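The t_7 figure above is easy to verify numerically, and the Laplace distribution (used later in the simulations) works the same way; a small check (ours) of the full population factor (41), which equals 6 plus the excess in (42). The central moment values in the comments are standard results we supply, not taken from the paper:

```python
def adjusted_variance_factor(mu2, mu4, mu6):
    """Population quantity (41): n * V(sqrt(b1)) under possible excess kurtosis."""
    return 9 + mu6 / mu2 ** 3 - 6 * mu4 / mu2 ** 2

# t7: mu2 = 7/5, mu4 = 49/5, mu6 = 343 (central moments of Student's t
# with 7 degrees of freedom); gives 104 = 6 + 98, versus 6 under normality.
t7 = adjusted_variance_factor(7 / 5, 49 / 5, 343.0)

# Laplace(0, b): mu2 = 2b^2, mu4 = 24b^4, mu6 = 720b^6; b cancels, giving 63.
laplace = adjusted_variance_factor(2.0, 24.0, 720.0)
```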
We can obtain the formula (40), or its population counterpart

    \frac{1}{n} \left( 9 + \mu_6 \mu_2^{-3} - 6 \mu_4 \mu_2^{-2} \right),    (43)

in an alternative way.
Let g(w) = g(w_1, w_2, \ldots, w_k) be a continuous function of random variables w_1, w_2, \ldots, w_k. A general formula for the asymptotic variance of g(w) is [see, for example, Stuart and Ord (1994, p. 350)]

    V[g(w)] = \sum_{i=1}^{k} [g_i'(w)]^2 V(w_i) + \sum_{i \neq j} g_i'(w) g_j'(w) \, \mathrm{Cov}(w_i, w_j) + o(n^{-1}),    (44)

where g_i'(w) = \partial g(w)/\partial w_i, i = 1, 2, \ldots, k, are evaluated at the population values. We take g(w_1, w_2) = m_3 / m_2^{3/2}. Hence, using (44),

    V(\sqrt{b_1}) = \left( \frac{\partial g}{\partial m_3} \right)^2 V(m_3)
                  + \left( \frac{\partial g}{\partial m_2} \right)^2 V(m_2)
                  + 2 \left( \frac{\partial g}{\partial m_3} \right) \left( \frac{\partial g}{\partial m_2} \right) \mathrm{Cov}(m_2, m_3).

Under symmetry, we utilize \mu_3 = \mu_5 = 0 and have

    \frac{\partial g}{\partial m_3} = \frac{1}{\mu_2^{3/2}},

    \frac{\partial g}{\partial m_2} = -\frac{3 \mu_3}{2 \mu_2^{5/2}} = 0,

so that only the first term of the expansion survives.
Hence,

    V(m_3) = \frac{1}{n} (\mu_6 - \mu_3^2 + 9 \mu_2^3 - 6 \mu_2 \mu_4)
           = \frac{1}{n} (\mu_6 + 9 \mu_2^3 - 6 \mu_2 \mu_4),

    \mathrm{Cov}(m_2, m_3) = \frac{1}{n} (\mu_5 - 4 \mu_2 \mu_3) = 0.

Therefore,

    V(\sqrt{b_1}) = \left( \frac{1}{\mu_2^{3/2}} \right)^2 \frac{1}{n} (\mu_6 + 9 \mu_2^3 - 6 \mu_2 \mu_4)
                  = \frac{1}{n} (9 + \mu_6 \mu_2^{-3} - 6 \mu_4 \mu_2^{-2}),    (45)

which is the same as (43). Godfrey and Orme (1991) used this approach to develop their test for skewness, which is asymptotically valid under non-normality.
4 Comparison of the Standard and Adjusted Tests
for Asymmetry: Some Simulation Results
For our simulation study, we generated data under a variety of scenarios. To study the size properties, we first generated data under the normal distribution, for which the standard \sqrt{b_1} statistic provides the ideal test. We used the tabulated critical values for the standard \sqrt{b_1} test [see Pearson and Hartley (1976, p. 183)]. Some of these critical values are interpolated. For our adjusted test RS^*_{c_1} in (39), we used the asymptotic \chi^2_1 critical values. The standard test is somewhat oversized, whereas the adjusted test has a size very close to the nominal levels of 1% and 5%. In fact, the sizes are even lower. This may be due to the fact that here we are using an "unnecessary" adjustment [see Equations (42) and (45)] to the variance of \sqrt{b_1} and thereby overestimate the variance, which leads to under-rejection. The estimated sizes of the modified test look better as we increase the sample size.
The next results are for Student's t_7-distribution. As explained earlier, for this case the standard test grossly underestimates the variance, leading to false rejection of symmetry too often. The quantity in (43) is (9 + \mu_6 \mu_2^{-3} - 6 \mu_4 \mu_2^{-2})/n = 104/n, which is much larger than the standard value 6/n. The rejection probabilities for our adjusted test are again close to the nominal levels of 1 and 5% and are, in fact, lower than those values. As we noticed for the data generated under normality, the rejection probabilities, in general, get closer to the nominal levels as the sample size increases. The next DGP, Beta(2,2), is a platykurtic (i.e., \beta_2 < 3) symmetric distribution. Here, possibly, 6/n overestimates the variance, and the standard test has much lower estimated sizes; the adjusted test also has lower sizes, but these are much closer to the 1 and 5% values. We then generated data using the procedure of Ramberg, Dudewicz, Tadikamalla, and Mykytka (1979). Their technique can generate data for any given level of skewness and kurtosis. We generated data using \sqrt{\beta_1} = 0, \beta_2 = 7.0. The results for this distribution are qualitatively similar to those obtained for the t_7 distribution. Since for this distribution the excess kurtosis is larger than for the t_7-density, the performance of the standard test is even worse, whereas the behavior of the modified test does not change. The last two DGPs are symmetric leptokurtic distributions, namely, the Laplace and logistic distributions. Again, the standard test rejects the correct hypothesis of symmetry too often, whereas our modification corrects this over-rejection.
Table 2 provides some simulation results for the power of the tests. The first two DGPs are positively skewed distributions (\chi^2_4 and Beta(1,2)), and the estimated probabilities are the estimated powers. Both the standard and the modified tests have very good power. For
Table 1: Estimated Sizes of the Skewness Tests with 5000 Replications

                                Standard Test        Modified Test
DGP and Sample Sizes            1%       5%          1%       5%

DGP: N(0,1) (\sqrt{\beta_1} = 0, \beta_2 = 3.0)
  50                            2.12     9.92        0.18     3.66
  100                           2.00     9.90        0.58     4.58
  200                           1.70     10.14       0.50     4.66
  250                           2.16     10.38       0.70     4.74

DGP: t_7 (\sqrt{\beta_1} = 0, \beta_2 = 5.0)
  50                            16.18    31.18       0.26     3.42
  100                           20.28    35.70       0.42     3.42
  200                           25.86    41.70       0.44     3.64
  250                           27.06    42.80       0.38     4.10

DGP: Beta(2,2) (\sqrt{\beta_1} = 0, \beta_2 = 2.14)
  50                            0.22     3.30        0.42     3.94
  100                           0.16     2.72        0.50     4.38
  200                           0.04     2.26        0.74     4.28
  250                           0.14     2.26        0.86     4.86

DGP: Generated Data (\sqrt{\beta_1} = 0, \beta_2 = 7.0)
  50                            26.2     42.78       0.32     3.22
  100                           34.2     50.82       0.24     3.28
  200                           43.2     57.80       0.36     3.18
  250                           43.5     58.74       0.30     3.66

DGP: Laplace(\theta = 2) (\sqrt{\beta_1} = 0, \beta_2 = 6.0)
  50                            25.72    43.54       0.38     3.82
  100                           31.40    48.78       0.50     3.68
  200                           36.26    52.00       0.38     3.94
  250                           38.60    53.92       0.28     3.62

DGP: Logistic(1,2) (\sqrt{\beta_1} = 0, \beta_2 = 4.2)
  50                            12.02    27.14       0.32     3.40
  100                           14.88    30.14       0.36     4.00
  200                           17.92    33.54       0.42     4.30
  250                           19.70    36.72       0.50     3.94
Table 2: Estimated Powers of Skewness Tests with 5000 Replications

                                Standard Test        Modified Test
DGP and Sample Sizes            1%       5%          1%       5%

DGP: \chi^2_4 (\sqrt{\beta_1} = 1.41, \beta_2 = 6)
  50                            71.02    90.38       13.66    50.02
  100                           98.80    99.90       46.80    78.90
  200                           100.00   100.00      74.92    91.08
  250                           100.00   100.00      81.44    92.70

DGP: Beta(1,2) (\sqrt{\beta_1} = 0.56, \beta_2 = 3.27)
  50                            13.62    51.72       21.82    58.82
  100                           46.88    85.66       74.60    93.44
  200                           92.00    99.36       99.04    99.96
  250                           98.08    99.94       99.94    100.00

DGP: Beta(2,1) (\sqrt{\beta_1} = -0.56, \beta_2 = 0.67)
  50                            14.52    51.78       21.74    59.34
  100                           45.62    85.10       73.74    93.40
  200                           92.10    99.48       99.12    99.94
  250                           98.32    99.92       99.90    100.00
the \chi^2-data, the standard test has higher estimated powers, whereas for the Beta(1,2) data, the modified test performs better. For the negatively skewed distribution Beta(2,1), the results are very similar to those for Beta(1,2), although their kurtosis structures are quite different. These results reinforce our assertion that the standard \sqrt{b_1} test is not a reliable test for asymmetry in the presence of excess kurtosis. We will wrongly reject the true null of symmetry too often. On the other hand, our simple variance-adjusted test works remarkably well. It has very good finite sample size and power properties.
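The size comparison in Table 1 can be reproduced in outline. The following is our own Monte Carlo sketch, not the authors' code, and it refers both tests to the asymptotic \chi^2_1 critical value; under fat-tailed t_7 data the standard test should reject far more often than the adjusted one:

```python
import numpy as np

def skewness_stats(y):
    """Standard statistic (35) and adjusted statistic (39)."""
    y = y - y.mean()
    m2, m3, m4, m6 = (np.mean(y ** j) for j in (2, 3, 4, 6))
    b1 = m3 ** 2 / m2 ** 3
    n = len(y)
    return n * b1 / 6, n * b1 / (9 + m6 / m2 ** 3 - 6 * m4 / m2 ** 2)

rng = np.random.default_rng(0)
crit = 3.841                        # asymptotic 5% chi-square(1) critical value
reps, n = 1000, 200
std_rej = adj_rej = 0
for _ in range(reps):
    y = rng.standard_t(df=7, size=n)   # symmetric but fat-tailed DGP
    s, a = skewness_stats(y)
    std_rej += s > crit
    adj_rej += a > crit
# Expect std_rej/reps well above the nominal 5%, adj_rej/reps near (or below) it.
```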
5 An Adjusted Test for Excess Kurtosis
Here we start with

    \frac{d \ln f(y)}{dy} = \frac{-y}{c_0 + c_2 y^2},    (46)

which is the Pearson density [equation (21)] with c_1 = 0. By integrating (46), we get

    \ln f(y) = \text{const.} + \ln(c_0 + c_2 y^2) \cdot \left( -\frac{1}{2 c_2} \right),

i.e.,

    f(y) = \text{const.} \, (c_0 + c_2 y^2)^{-\frac{1}{2 c_2}},

which is a generalization of Student's t-density. The normal distribution (no excess kurtosis, with \beta_2 = 3) is a special case of this, with c_2 = 0. Let us first derive the standard RS test
for H_0: c_2 = 0. We have

    f(y_i) = \frac{\exp(\psi(\theta, y_i))}{\int_{-\infty}^{\infty} \exp(\psi(\theta, y)) \, dy},    (47)

where we now define \psi(\theta, y) = \int \frac{-y}{c_0 + c_2 y^2} \, dy, with \theta = (c_0, c_2)'. The log-likelihood function can be written as

    l(c_0, c_2) = -n \ln\left[ \int_{-\infty}^{\infty} \exp(\psi(\theta, y)) \, dy \right] + \sum_{i=1}^{n} \psi(\theta, y_i)
                = l_1(\theta) + l_2(\theta) \text{ (say).}    (48)
Under c_2 = 0,

    \frac{\partial l_1(\theta)}{\partial c_2}
    = -n \frac{\int_{-\infty}^{\infty} e^{-\frac{y^2}{2 c_0}} \frac{y^4}{4 c_0^2} \, dy}{\int_{-\infty}^{\infty} e^{-\frac{y^2}{2 c_0}} \, dy}
    = -\frac{n}{4 c_0^2} E(y^4) = -\frac{n \mu_4}{4 c_0^2},

    \frac{\partial l_2(\theta)}{\partial c_2}
    = \sum_{i=1}^{n} \int \frac{y^3}{c_0^2} \, dy
    = \frac{\sum_{i=1}^{n} y_i^4}{4 c_0^2} = \frac{n m_4}{4 c_0^2}.

Hence,

    \frac{\partial l(\tilde{\theta})}{\partial c_2}
    = \frac{n}{4 m_2^2} (m_4 - 3 m_2^2)
    = \frac{n}{4} \left( \frac{m_4}{m_2^2} - 3 \right)
    = \frac{n}{4} (b_2 - 3).    (49)
To obtain the information matrix, let us now derive

    J_g = E_g \begin{bmatrix}
      \left( \frac{\partial \ln f}{\partial c_0} \right)^2 & \frac{\partial \ln f}{\partial c_0} \frac{\partial \ln f}{\partial c_2} \\
      \cdot & \left( \frac{\partial \ln f}{\partial c_2} \right)^2
    \end{bmatrix}.    (50)
From (47),

    \ln f(y) = \psi(\theta, y) - \ln \int_{-\infty}^{\infty} \exp(\psi(\theta, y)) \, dy,    (51)

and hence

    \left. \frac{\partial \ln f(y)}{\partial c_0} \right|_{c_2 = 0}
    = \int \frac{y}{c_0^2} \, dy
      - \frac{\int_{-\infty}^{\infty} \exp(-\frac{y^2}{2 c_0}) \frac{y^2}{2 c_0^2} \, dy}{\int_{-\infty}^{\infty} \exp(-\frac{y^2}{2 c_0}) \, dy}
    = \frac{y^2}{2 c_0^2} - \frac{\mu_2}{2 c_0^2},    (52)

and

    \left. \frac{\partial \ln f(y)}{\partial c_2} \right|_{c_2 = 0}
    = \int \frac{y^3}{c_0^2} \, dy
      - \frac{\int_{-\infty}^{\infty} \exp(-\frac{y^2}{2 c_0}) \frac{y^4}{4 c_0^2} \, dy}{\int_{-\infty}^{\infty} \exp(-\frac{y^2}{2 c_0}) \, dy}
    = \frac{y^4}{4 c_0^2} - \frac{\mu_4}{4 c_0^2}.    (53)
Now

    E_g \left( \frac{\partial \ln f(y)}{\partial c_0} \right)^2
    = E_g \left( \frac{y^2}{2 c_0^2} - \frac{\mu_2}{2 c_0^2} \right)^2
    = \frac{1}{4 c_0^4} E_g (y^4 + \mu_2^2 - 2 \mu_2 y^2)
    = \frac{1}{4 \mu_2^4} (\mu_4 + \mu_2^2 - 2 \mu_2^2)
    = \frac{1}{4 \mu_2^4} (\mu_4 - \mu_2^2),    (54)

    E_g \left( \frac{\partial \ln f(y)}{\partial c_2} \right)^2
    = \frac{1}{16 c_0^4} E_g (y^8 + \mu_4^2 - 2 \mu_4 y^4)
    = \frac{1}{16 \mu_2^4} (\mu_8 - \mu_4^2),    (55)

    E_g \left[ \frac{\partial \ln f(y)}{\partial c_0} \cdot \frac{\partial \ln f(y)}{\partial c_2} \right]
    = \frac{1}{8 c_0^4} E_g [(y^2 - \mu_2)(y^4 - \mu_4)]
    = \frac{1}{8 c_0^4} E_g [y^6 - \mu_4 y^2 - \mu_2 y^4 + \mu_2 \mu_4]
    = \frac{1}{8 \mu_2^4} [\mu_6 - \mu_4 \mu_2].    (56)
Therefore,

    J_g = \begin{bmatrix}
      \frac{1}{4 \mu_2^4} (\mu_4 - \mu_2^2) & \frac{1}{8 \mu_2^4} (\mu_6 - \mu_4 \mu_2) \\
      \cdot & \frac{1}{16 \mu_2^4} (\mu_8 - \mu_4^2)
    \end{bmatrix}.    (57)

Evaluating J under the density f(y) \sim N(0, c_0), we have

    J_f = \begin{bmatrix}
      \frac{1}{2 \mu_2^2} & \frac{1}{8 \mu_2^4} (15 - 3) \mu_2^3 \\
      \cdot & \frac{1}{16 \mu_2^4} (105 - 9) \mu_2^4
    \end{bmatrix}
    = \begin{bmatrix} \frac{1}{2 \mu_2^2} & \frac{3}{2 \mu_2} \\ \frac{3}{2 \mu_2} & 6 \end{bmatrix},    (58)

and

    J_f^{22} = \left( 6 - \frac{9}{4 \mu_2^2} \cdot 2 \mu_2^2 \right)^{-1}
             = \left( 6 - \frac{9}{2} \right)^{-1} = \frac{2}{3}.    (59)
Combining (49) and (59), the standard RS statistic is

    RS_{c_2} = \frac{n}{16} (b_2 - 3)^2 \cdot \frac{2}{3} = n \frac{(b_2 - 3)^2}{24}.    (60)
To derive the adjusted RS statistic, we need the K matrix,
    K_g = -E_g \begin{bmatrix}
      \frac{\partial^2 \ln f}{\partial c_0^2} & \frac{\partial^2 \ln f}{\partial c_0 \, \partial c_2} \\
      \cdot & \frac{\partial^2 \ln f}{\partial c_2^2}
    \end{bmatrix}.    (61)
After some algebra, it can be shown that

    E_g\left[ -\frac{\partial^2 \ln f}{\partial c_0^2} \right]
    = \frac{2 \mu_2}{2 \mu_2^3} + \frac{\mu_4}{4 \mu_2^4} - \frac{\mu_2^2}{4 \mu_2^4} - \frac{2 \mu_2}{2 \mu_2^3}
    = \frac{\mu_4 - \mu_2^2}{4 \mu_2^4}.    (62)

Note that this K_{11} term is the same as J_{11} in (57). Under H_0, we have \mu_4 = 3 \mu_2^2, and this quantity reduces to

    E_g\left[ -\frac{\partial^2 \ln f}{\partial c_0^2} \right]
    = \frac{3}{4 \mu_2^2} - \frac{1}{4 \mu_2^2} = \frac{1}{2 \mu_2^2}.    (63)
Since

    \left. \frac{\partial^2 \ln f}{\partial c_2^2} \right|_{c_2 = 0}
    = -\frac{y^6}{3 c_0^3} - \frac{\mu_8}{16 c_0^4} + \frac{2 \mu_6}{6 c_0^3} + \frac{\mu_4}{4 c_0^2} \cdot \frac{\mu_4}{4 c_0^2},    (64)

we have

    K_{22} = E_g\left[ -\frac{\partial^2 \ln f}{\partial c_2^2} \right]
    = \frac{\mu_6}{3 \mu_2^3} + \frac{\mu_8}{16 \mu_2^4} - \frac{\mu_6}{3 \mu_2^3} - \frac{\mu_4^2}{16 \mu_2^4}
    = \frac{1}{16 \mu_2^4} (\mu_8 - \mu_4^2) = J_{22},    (65)
as seen from (57). Finally, let us find K_{12}. We have

    \left. \frac{\partial^2 \ln f}{\partial c_0 \, \partial c_2} \right|_{c_2 = 0}
    = -\frac{y^4}{2 c_0^3} - \frac{\mu_6}{8 c_0^4} + \frac{2 \mu_4}{4 c_0^3} + \frac{\mu_4}{4 c_0^2} \cdot \frac{\mu_2}{2 c_0^2},    (66)

and hence,

    K_{12} = E_g\left[ -\frac{\partial^2 \ln f}{\partial c_0 \, \partial c_2} \right]
    = \frac{\mu_4}{2 \mu_2^3} + \frac{\mu_6}{8 \mu_2^4} - \frac{\mu_4}{2 \mu_2^3} - \frac{\mu_4 \mu_2}{8 \mu_2^4}
    = \frac{1}{8 \mu_2^4} (\mu_6 - \mu_4 \mu_2) = J_{12},
as in (57). Thus, we have

    J_g = K_g = \begin{bmatrix}
      \frac{\mu_4 - \mu_2^2}{4 \mu_2^4} & \frac{1}{8 \mu_2^4} (\mu_6 - \mu_4 \mu_2) \\
      \cdot & \frac{1}{16 \mu_2^4} (\mu_8 - \mu_4^2)
    \end{bmatrix}.    (67)

Under the null hypothesis of no excess kurtosis, i.e., under \mu_4 = 3 \mu_2^2, the above matrix reduces to

    \begin{bmatrix}
      \frac{1}{2 \mu_2^2} & \frac{1}{8 \mu_2^4} (\mu_6 - 3 \mu_2^3) \\
      \cdot & \frac{1}{16 \mu_2^4} (\mu_8 - 9 \mu_2^4)
    \end{bmatrix}.    (68)
Now we use expression (17) to obtain RS^*_{c_2}. First, note that since J_g = K_g, we have K^{22} (B_{22})^{-1} K^{22} = K^{22}. Therefore, (17) can be written as

    RS^*_{c_2} = \frac{1}{n} \tilde{s}_2' \tilde{K}^{22} \tilde{s}_2,    (69)

where K^{22} = (K_{22} - K_{21} K_{11}^{-1} K_{12})^{-1}. We already had \tilde{s}_2 = \frac{n}{4}(b_2 - 3) in (49). Now

    K_{22} - K_{21} K_{11}^{-1} K_{12}
    = \frac{1}{16 \mu_2^4} (\mu_8 - 9 \mu_2^4) - \frac{2 \mu_2^2}{64 \mu_2^8} (\mu_6 - 3 \mu_2^3)^2
    = \frac{1}{32 \mu_2^6} [2 \mu_8 \mu_2^2 - 27 \mu_2^6 - \mu_6^2 + 6 \mu_6 \mu_2^3].    (70)

Let us simplify (70) further by recalling \gamma_{2r} = \frac{\mu_{2r+2}}{\mu_2^{r+1}}, r = 1, 2, \ldots, i.e., \gamma_4 = \frac{\mu_6}{\mu_2^3} and \gamma_6 = \frac{\mu_8}{\mu_2^4}. Thus, we have

    K_{22} - K_{21} K_{11}^{-1} K_{12}
    = \frac{1}{32} \left[ 2 \frac{\mu_8}{\mu_2^4} - 27 - \left( \frac{\mu_6}{\mu_2^3} \right)^2 + 6 \frac{\mu_6}{\mu_2^3} \right]
    = \frac{1}{32} [2 \gamma_6 - 27 - \gamma_4^2 + 6 \gamma_4].    (71)

Therefore, from (69), the adjusted RS statistic is given by

    RS^*_{c_2} = \frac{n}{16} (b_2 - 3)^2 \cdot \frac{32}{2 b_6 - 27 - b_4^2 + 6 b_4}
               = \frac{2 n (b_2 - 3)^2}{2 b_6 - 27 - b_4^2 + 6 b_4},    (72)

where b_4 and b_6 are the sample counterparts of \gamma_4 and \gamma_6.
If we impose normality, then \gamma_6 = 105 and \gamma_4 = 15, and we can replace 2 b_6 - 27 - b_4^2 + 6 b_4 by (2 \cdot 105 - 27 - 15 \cdot 15 + 6 \cdot 15) = 48. Then RS^*_{c_2} reduces to n(b_2 - 3)^2/24, the same as our unadjusted RS test for kurtosis in (60). There is one problem with this adjusted form. The variance formula (71) does not explicitly take into account the possible presence of asymmetry through \beta_1, although it does allow for some other higher-moment (sixth and eighth) violations. Therefore, we try our previous general formula (44) to obtain an alternative expression for V(b_2). After some derivations, we have [see, for example, Stuart and Ord (1994, p. 349)]

    V(b_2) = \frac{1}{n} \left[ \frac{\mu_8}{\mu_2^4} + 64 \beta_1 - 8 \frac{\mu_5 \mu_3}{\mu_2^4} - 12 \frac{\mu_6}{\mu_2^3} + 99 \right].    (73)

Utilizing \mu_8 = 105 \mu_2^4 and \mu_6 = 15 \mu_2^3, but still allowing for asymmetry through \mu_3 and \mu_5, we can write

    V(b_2) = \frac{1}{n} \left[ 24 + 64 \beta_1 - 8 \frac{\mu_5 \mu_3}{\mu_2^4} \right].    (74)

Let us define \gamma_{2r+1} = \mu_3 \mu_{2r+3} / \mu_2^{r+3}, i.e., \gamma_1 = \mu_3^2 / \mu_2^3 and \gamma_3 = \mu_3 \mu_5 / \mu_2^4. Then we have a simple form for V(b_2),

    V(b_2) = \frac{1}{n} [24 + 64 \gamma_1 - 8 \gamma_3].
Therefore, the extra quantity due to asymmetry is n^{-1}(64 \gamma_1 - 8 \gamma_3). Hence, the RS statistic for testing excess kurtosis that takes into account the possible presence of asymmetry can be written as

    RS^{**}_{c_2} = \frac{n (b_2 - 3)^2}{24 + 64 b_1 - 8 b_3},    (75)

where b_1 = m_3^2 / m_2^3 and b_3 = m_3 m_5 / m_2^4 are the sample counterparts of \gamma_1 and \gamma_3. We will use this form of the adjusted test, rather than that given in (72), in our simulation study. We should, however, note that although we do not study RS^*_{c_2} any further, since it does not serve our special purpose of taking account of asymmetry, this statistic might be better than the unadjusted test RS_{c_2} in certain circumstances.
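A minimal sketch (ours, not the authors' code) of the unadjusted statistic (60) and the asymmetry-adjusted statistic (75):

```python
import numpy as np

def kurtosis_tests(y):
    """Return (RS_c2, RS**_c2): n(b2-3)^2/24 of equation (60) and the
    asymmetry-adjusted statistic of equation (75)."""
    y = y - np.mean(y)
    m2, m3, m4, m5 = (np.mean(y ** j) for j in (2, 3, 4, 5))
    n = len(y)
    b2 = m4 / m2 ** 2
    b1 = m3 ** 2 / m2 ** 3          # sample counterpart of gamma_1
    b3 = m3 * m5 / m2 ** 4          # sample counterpart of gamma_3
    standard = n * (b2 - 3) ** 2 / 24
    adjusted = n * (b2 - 3) ** 2 / (24 + 64 * b1 - 8 * b3)
    return standard, adjusted
```

Both statistics are referred to \chi^2_1 critical values; under symmetry b_1 and b_3 are near zero and the two statistics coincide.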
6 Comparison of the Standard and Adjusted Tests
for Excess Kurtosis: Some Simulation Results
In Table 3 we present the simulation results with 5000 replications. The design of our simulation study is very similar to that in Table 1, except that here we study tests for excess kurtosis instead of asymmetry. Again, for the standard b_2 test [see equation (60)], we use the tabulated values from Pearson and Hartley (1976, p. 184), and for the adjusted test [see equation (75)], we use asymptotic \chi^2_1 values. When the data are generated under the normal distribution, the standard test has rejection probabilities nearly twice the nominal sizes of 1 and 5%. The modified test is also oversized at the 1% level but is undersized when the nominal level is 5%. Under the t-distribution, where the standard test should have been ideal (since there is no asymmetry), as expected, the test has reasonable power. The adjusted test uses some redundant adjustments, but its estimated powers are still very close to those of the standard test and, in some cases, are even higher. Therefore, we do not lose much power by using the adjusted test when the standard test would have been good enough. The next data set is generated according to Ramberg et al. (1979). This data set is from a non-normal distribution but with the same moment structure as the normal, i.e., symmetric and with no excess kurtosis. It is not surprising that the results are very similar to those for the normal distribution. We then used generated data with no excess kurtosis but with a modest amount of asymmetry (\sqrt{\beta_1} = 0.85). The results reveal the drawback of the standard test: it rejects the true null too often, whereas the adjusted test also rejects more than 1 or 5% of the time, but not excessively. In the last two data sets, we have the same amount of excess kurtosis (\beta_2 - 3 = 4) but different skewness structures. The estimated powers of the adjusted test seem to be slightly higher than those of the standard test. Also, the adjusted version appears to be less influenced by the presence of asymmetry (\sqrt{\beta_1} = 0.5), in the sense of showing smaller fluctuations in the estimated powers as we move from the symmetric to the asymmetric data. Although the finite sample performance of our adjusted test is very good both in terms of size and power, we should note that it requires the existence of higher moments.
7 An Empirical Illustration
Our empirical illustration uses financial data. Considerable attention has been paid to the asymmetry and excess kurtosis of return distributions. Beedles and Simkowitz (1980), Singleton and Wingender (1986), and Alles and King (1994) have all studied the symmetry of returns using the \sqrt{b_1} test. DeFusco, Kareles, and Muralidar (1996) correctly argued that the standard \sqrt{b_1} test is not accurate enough to provide correct information on the skewness of the distribution. However, they did not suggest any modification to the \sqrt{b_1} test that could be used in the presence of excess kurtosis. Mandelbrot (1963) was probably the first to systematically study excess kurtosis in financial data, as he noted that price changes were usually too peaked and thick-tailed compared to samples from the normal distribution. To take this excess kurtosis into account, Mandelbrot (1963, 1967), Fama (1965), and others suggested the use of stable distributions. Blattberg and Gonedes (1974) and Tucker (1992) examined the use of Student's t-distribution. More recently, Premaratne and Bera (2000) advocated the use of the Pearson type IV distribution, which is a special case of (21) and takes care of both asymmetry and excess kurtosis in a simple way. However, there still remains the question of testing for the presence of asymmetry and excess kurtosis separately while allowing for the presence of the other effect. In our empirical illustration we use the tests discussed in the earlier sections.
We considered daily S & P 500 returns from the Center for Research in Security Prices
(CRSP) database from August 27, 1990. We tried two sample sizes, 300 and 500. The results
Table 3: Estimated Sizes and Powers of the Two Tests for Excess Kurtosis with 5000 Replications

                                Standard Test        Modified Test
DGP and Sample Sizes            1%       5%          1%       5%

DGP: N(0,1) (\sqrt{\beta_1} = 0, \beta_2 = 3.0)
  50                            1.46     9.46        1.92     3.16
  100                           1.80     9.66        2.18     3.60
  200                           1.68     9.58        1.72     3.90
  250                           2.12     10.48       1.70     4.52

DGP: t_{10} (\sqrt{\beta_1} = 0, \beta_2 = 4.0)
  50                            10.72    24.76       7.20     10.92
  100                           16.76    34.56       14.90    21.74
  200                           30.56    51.36       29.76    39.42
  250                           36.52    57.92       35.62    47.12

DGP: Generated Data (\sqrt{\beta_1} = 0, \beta_2 = 3)
  50                            1.44     8.16        1.40     2.40
  100                           1.54     9.08        1.62     3.32
  200                           1.26     8.26        1.22     3.32
  250                           1.12     7.84        0.86     3.08

DGP: Generated Data (\sqrt{\beta_1} = 0.85, \beta_2 = 3.0)
  50                            3.98     15.46       2.98     4.42
  100                           3.36     11.86       3.34     5.66
  200                           3.38     14.80       2.46     5.58
  250                           3.34     14.40       2.30     5.40

DGP: Generated Data (\sqrt{\beta_1} = 0, \beta_2 = 7.0)
  50                            11.06    36.70       20.38    29.06
  100                           38.60    65.30       48.70    58.96
  200                           72.47    77.06       91.44    81.70
  250                           88.20    96.18       90.84    94.88

DGP: Generated Data (\sqrt{\beta_1} = 0.5, \beta_2 = 7.0)
  50                            9.52     22.56       18.18    26.34
  100                           36.22    52.14       46.14    56.56
  200                           74.66    80.98       90.64    87.98
  250                           84.62    95.00       87.98    93.26
are given in Table 4. For the standard √b1 test, the critical values at 1% were 0.329 and
0.255, respectively, for n = 300 and 500 [see Pearson and Hartley (1976, p. 183)]. Therefore,
we rejected symmetry. However, our adjusted √b1 test supported the null hypothesis of
symmetry using χ²₁ critical values. As we noted in our simulation study, the standard √b1 test
over-rejects the null of symmetry in the presence of excess kurtosis. The two-sided critical
values for b2 at 1% are (2.46, 3.79) and (2.57, 3.60), respectively, for n = 300 and 500 [see
Pearson and Hartley (1976, p. 184)]. The two kurtosis tests gave similar results, implying
excess kurtosis in the data for both sample sizes. The agreement is possibly due to either
the absence of asymmetry (as revealed by our adjusted √b1 test) or the strong presence of
excess kurtosis.
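The idea behind the adjustment can be illustrated with a kurtosis-robust skewness statistic. Under symmetry, the asymptotic variance of √n·m3 is μ6 - 6μ2μ4 + 9μ2³ (which reduces to the familiar 6μ2³ under normality), so estimating this variance from the data, rather than imposing the normal value, yields a statistic that is compared with χ²₁ critical values. The sketch below is in the same spirit as, but not necessarily identical to, the score-based adjustment derived in the paper:

```python
import numpy as np

def robust_skewness_test(x):
    """Kurtosis-robust test for symmetry.

    Statistic: n * m3**2 / (m6 - 6*m2*m4 + 9*m2**3), which is
    asymptotically chi-square with 1 d.f. under symmetry; the
    denominator estimates Var(sqrt(n)*m3) without assuming
    normal kurtosis.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    d = x - x.mean()
    m2 = np.mean(d ** 2)
    m3 = np.mean(d ** 3)
    m4 = np.mean(d ** 4)
    m6 = np.mean(d ** 6)
    var_m3 = m6 - 6.0 * m2 * m4 + 9.0 * m2 ** 3
    return n * m3 ** 2 / var_m3
```

The returned statistic is compared with the χ²₁ critical value, e.g. 6.635 at the 1% level; thick tails inflate both m3² and the estimated variance, so the statistic does not over-reject the way the standard √b1 test does.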
Table 4: Use of Tests for Daily Returns of the S & P 500 Index

Observations    Test for Asymmetry          Test for Kurtosis
                √b1       Adjusted √b1      b2        Adjusted b2
300             0.376*    3.10              3.86*     12.08*
500             0.264*    1.36              4.62*     57.87*

*Significant at the 1% level.
8 Conclusion
In this paper we suggested adjustments to the standard √b1 and b2 tests, allowing
for excess kurtosis and asymmetry, respectively. In our simulations, the adjusted tests
performed very well compared to their unadjusted counterparts and were quite immune to
the misspecifications that can arise through thick tails and the lack of symmetry. Of course,
more simulation work and empirical applications are needed to further confirm the good
properties of our suggested tests.
References
Affleck-Graves, J. and B. McDonald (1989), Nonnormalities and tests of asset pricing theories,
Journal of Finance, 44, 889-907.
Alles, L.A. and K.L. King (1994), Regularities in the variation of skewness in asset returns,
Journal of Financial Research, 17, 427-438.
Bera, A.K. and C.M. Jarque (1981), An efficient large-sample test for normality of observations
and regression residuals, Working Paper in Economics and Econometrics, Number 40,
The Australian National University, Canberra.
Bera, A.K. and S. John (1983), Tests for multivariate normality with Pearson alternatives,
Communications in Statistics-Theory and Methods, 12, 103-117.
Blattberg, R. and N. Gonedes (1974), A comparison of stable and Student's t distributions
as statistical models for stock prices, Journal of Business, 47, 244-280.
Bowman, K.O. and L.R. Shenton (1975), Omnibus contours for departures from normality
based on √b1 and b2, Biometrika, 62, 243-250.
Christie-David, R. and M. Chaudhry (2001), Coskewness and cokurtosis in futures markets,
Journal of Empirical Finance, 8, 55-81.
Cox, D.R. and D.V. Hinkley (1974), Theoretical Statistics, Chapman and Hall, London.
D'Agostino, R.B. and E.S. Pearson (1973), Tests for departure from normality: Empirical
results for the distributions of b2 and √b1, Biometrika, 60, 613-622.
DeFusco, R.A., G.V. Karels and K. Muralidhar (1996), Skewness persistence in U.S. com-
mon stock returns: Results from bootstrapping tests, Journal of Business Finance and
Accounting, 23, 1183-1195.
Fama, E.F. (1965), The behavior of stock market prices, Journal of Business, 38, 34-105.
Fisher, R.A. (1930), The moments of the distribution for normal samples of measures of
departure from normality, Proceedings of the Royal Society, A, 130, 16-28.
Godfrey, L.G. and C.D. Orme (1991), Testing for skewness of regression disturbances,
Economics Letters, 37, 31-34.
Jarque, C.M. and A.K. Bera (1987), A test for normality of observations and regression
residuals, International Statistical Review, 55, 163-172.
Kent, J.T. (1982), Robust properties of likelihood ratio tests, Biometrika, 69, 19-27.
Mandelbrot, B. (1963), The variation of certain speculative prices, Journal of Business, 36,
394-419.
Mandelbrot, B. (1967), The variation of some other speculative prices, Journal of Business,
40, 393-413.
Pearson, E.S. (1930), A further development of tests for normality, Biometrika, 22, 239-249.
Pearson, E.S. (1963), Some problems arising in approximating to probability distributions,
using moments, Biometrika, 50, 95-112.
Pearson, E.S. and H.O. Hartley (1976), Biometrika Tables for Statisticians, Biometrika
Trust, London.
Pearson, K. (1895), Contributions to the mathematical theory of evolution-II: Skew variation
in homogeneous material, Philosophical Transactions of the Royal Society of London,
A 186, 343-414.
Premaratne, G. and A.K. Bera (2000), Modeling asymmetry and excess kurtosis in stock
return data, Office of Research Working Paper Number 00-0123, University of Illinois.
Ramberg, J.S., E.J. Dudewicz, P.R. Tadikamalla and E.F. Mykytka (1979), A probability
distribution and its uses in fitting data, Technometrics, 21, 201-213.
Rao, C.R. (1948), Large sample tests of statistical hypotheses concerning several parameters
with applications to problems of estimation, Proceedings of the Cambridge Philosophical
Society, 44, 50-57.
Rao, C.R. (1973), Linear Statistical Inference and Its Applications, John Wiley & Sons, New
York.
Richardson, M. and T. Smith (1993), A test for multivariate normality in stock returns,
Journal of Business, 66, 295-321.
Singleton, J.C. and J. Wingender (1986), Skewness persistence in common stock returns,
Journal of Financial and Quantitative Analysis, 21, 336-341.
Beedles, W.L. and M.A. Simkowitz (1980), Morphology of asset asymmetry, Journal of
Business Research, 8, 457-468.
Serfling, R.J. (1980), Approximation Theorems of Mathematical Statistics, John Wiley &
Sons, New York.
Stuart, A. and J.K. Ord (1994), Kendall's Advanced Theory of Statistics, Vol. 1: Distribution
Theory, Edward Arnold, London.
Tucker, A.L. (1992), A reexamination of finite and infinite-variance distributions as models
of daily stock returns, Journal of Business and Economic Statistics, 10, 73-81.
White, H. (1982), Maximum likelihood estimation of misspecified models, Econometrica, 50,
1-25.