
A Revision of School Effectiveness Analysis

Nicholas T. Longford
SNTL and Departament d'Economia i Empresa, Universitat Pompeu Fabra, Barcelona, Spain

Journal of Educational and Behavioral Statistics, February 2012, Vol. 37, No. 1, pp. 157–179. DOI: 10.3102/1076998610396898

Statistical modeling of school effectiveness data was originally motivated by dissatisfaction with analyses of (school-leaving) examination results that took no account of the background of the students or regarded each school as an isolated unit of analysis. The application of multilevel analysis was generally regarded as a breakthrough, although more recent assessments of how well it satisfies the goals of school effectiveness studies, to compare the performances of schools, are much more guarded. This article shows that the association of the school effects with randomness is not necessary, because strength can be borrowed across the analyzed schools even when they are associated with fixed effects. The methods are illustrated on a reanalysis of the data from an early study of school effectiveness. The article also addresses the problem of excess zero outcomes by treating them as censored (truncated).

Keywords: borrowing strength; censoring; composite estimator; multilevel analysis; multiple imputation; school effectiveness

Introduction

A school effectiveness study is concerned with comparing the outcomes, usually the results of a final (school-leaving) examination, across schools, with an appropriate adjustment for the background of their students. Multilevel analysis (Aitkin & Longford, 1986; Goldstein, 2003; Raudenbush & Bryk, 2002) is the established method for such studies. It combines ordinary or generalized linear regression with a model for the variation of some of its coefficients across the schools. Modeling the school-level differences by random coefficients is generally regarded as essential, because the alternative, regarding them as fixed and applying the analysis of covariance (ANCOVA), has long been rightly perceived as grossly inefficient, especially for schools that have small sample sizes in the study.

This article presents an alternative view in which associating schools with random terms is inappropriate, and the inefficiency of ANCOVA, with fixed effects, is addressed by a method that is efficient in small samples. Our perspective is centered on the replication scheme implied by the adopted model and how it conforms with the context of the study.


We argue in the next section that random school effects imply a scheme that is unnatural and show that it results in a biased assessment of the precision of the estimators. This source of bias is not recognized in the established approaches. The technical details are given in the section Evaluation With the Fixed-Effects Assumptions. An estimator that corresponds to our perspective more closely, and incorporates borrowing of strength (Carlin & Louis, 2000; Efron & Morris, 1972; Robbins, 1955), is presented in the section A Composite Estimator. We derive its mean squared error (MSE); unlike its established counterpart, it incorporates the uncertainty about the regression parameters. An application is presented in the section Application, reanalyzing the study of Aitkin and Longford (1986). The section Zero Outcomes deals with the outcomes that are equal to zero, an obvious source of their nonnormality, by regarding them as censored, or truncated, and applies multiple imputation to fit the regression for the underlying outcomes.

Our main conclusion is that the designation of school effects as "fixed" or "random" is not important for their estimation but can make a lot of difference in the estimation of the corresponding MSEs. We show that the established estimator, based on a random-effects model, is not uniformly more efficient than the long-discarded estimator based on the (fixed-effects) ANCOVA and that the MSE of the estimator of a school effect depends on the school effect itself. Instead of estimating the MSE for each school by a single quantity, we draw the plausible MSE as a function defined in a range of plausible values of the school's effect. In the reanalysis of Aitkin and Longford (1986), we show that these functions are flat for most schools but attain a wide range of values for two schools with the smallest sample sizes.

Fixed or Random?

We consider a school effectiveness study in which the outcomes $y_{ij}$ of a (standardized) test or assessment are available for all students $i = 1, \ldots, n_j$ of the final year of study (a cohort) in schools $j = 1, \ldots, J$ in a given academic year, together with the values $x_{ij}$ of a vector of relevant covariates. The standard approach is based on the two-level linear model

$$y_{ij} = \beta_{0j} + x_{ij}\beta + \varepsilon_{ij}, \qquad (1)$$

in which $\beta$ is a vector of unknown parameters, the school-specific intercepts $\beta_{0j}$ are a random sample from a univariate normal distribution, $N(\beta_0, \sigma_B^2)$, and $\varepsilon_{ij}$ are a random sample from $N(0, \sigma^2)$. The two random samples are independent, and their variances $\sigma^2$ and $\sigma_B^2$ are unknown. We denote by $\omega$ the variance ratio, $\omega = \sigma_B^2/\sigma^2$.

The model in (1), with parallel within-school regressions, has a variety of extensions which include more flexible patterns of school-level variation (random slopes, that is, associating variation with some covariates in addition to the intercept), school-specific or modeled residual variances $\sigma_j^2$, adaptations for distributions other than the normal within the generalized linear framework, and incorporating further sources of variation in three-level or, in general, multilevel models, with a structure of clustering that is not necessarily hierarchical (Rasbash & Goldstein, 1994; Raudenbush, 1993). The assumption of normality of $\beta_{0j}$ can also be dispensed with. For example, their distribution may be a mixture of unrelated normal distributions (McLachlan & Peel, 2000).

The random terms $\beta_{0j}$ in (1) can be separated from their expectation $\beta_0$, and this parameter can be absorbed in the linear predictor $x_{ij}\beta$. We obtain the model

$$y_{ij} = x_{ij}\beta + \delta_j + \varepsilon_{ij}, \qquad (2)$$

in which every $x_{ij}$ is supplemented by a term equal to unity ($x_{ij0} = 1$) and $\beta$ by $\beta_0$, representing the average intercept.
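As a point of reference for the discussion of replications that follows, model (2) is easy to simulate. The sketch below (a Python illustration with hypothetical parameter values, not part of the original analysis) generates one replicate data set with school-specific deviations $\delta_j$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions and parameters (not taken from the paper's data)
J = 18                                   # number of schools
n = rng.integers(20, 80, size=J)         # within-school sample sizes n_j
beta = np.array([2.0, 0.9, 1.0])         # intercept, slope on a covariate, a group effect
sigma2, sigma2_B = 100.0, 10.0           # residual and school-level variances
delta = rng.normal(0.0, np.sqrt(sigma2_B), size=J)   # school deviations delta_j

schools, X, y = [], [], []
for j in range(J):
    xj = np.column_stack([np.ones(n[j]),                # x_{ij0} = 1
                          rng.normal(100, 15, n[j]),    # a continuous covariate
                          rng.integers(0, 2, n[j])])    # a binary covariate
    eps = rng.normal(0.0, np.sqrt(sigma2), n[j])
    yj = xj @ beta + delta[j] + eps                     # model (2)
    schools.append(np.full(n[j], j)); X.append(xj); y.append(yj)

schools, X, y = np.concatenate(schools), np.vstack(X), np.concatenate(y)
```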

Within the frequentist paradigm, inferences are made with reference to hypothetical replications. For example, the bias of an estimator is defined as its average deviation from the target (estimand) across replications, and its MSE as the average of the squares of these deviations. A quantity, such as a parameter, is declared as fixed when it has the same value in every replication of the study. A quantity is declared as random when its value is generated in the replications by a random mechanism. For example, for given indices $i$ and $j$, $\varepsilon_{ij}$ in (1) and (2) is drawn at random from a normal distribution, independently across the replications. Students are associated in these models with random terms $\varepsilon_{ij}$, because different students may enroll in a school in a hypothetical replication of the study, and even if the enrollment were the same, the students may perform (slightly) differently in the (replicate) final exams. We want to assess how the school, with all its staff, management, practices, ethos, and other attributes, that is, with its educational process, would have performed with different sets of students. To promote realism, we assume that such hypothetical sets of students would have a similar profile (distribution of backgrounds) as the realized set.

The effect of a school's educational process on its students is subject to uncertainty, and this is accounted for by the residual variance $\sigma^2$. However, in a replication, we would study the same schools, with the same educational processes, because the assessment of the effectiveness of a school, by its position in a league table or by some other means, refers to a particular school, in the context of a particular set of schools, well identified to an assessor or a funding or auditing agency, even if it is made anonymous for an analyst and the research community.

When schools are associated with random effects, a fresh set of them appears in every replication. Estimation of any quantity associated with a school is then problematic in the frequentist perspective, because the school appears in replications only sporadically or not at all. If we make the concession that the same set of schools appears in every replication, then $\beta_{0j}$ for any given school $j$ is like a moving goalpost, changing from one replication to the next, defying any attempt at its estimation that would not be tied to the realised replication, other than by the unconditional expectation $\beta_0$. In general, the designation of an effect as fixed or random is not innocuous, because the data generated in replications entail more variation with a random effect than with a fixed (constant) effect. The impact of such additional variation on the inferences made depends on the details of the inferential task and the methods applied.

In brief, we contend that for the inferences about a specific set of schools, the schools have to be associated with fixed effects. This does not rule out the application of random-effects models, because estimating some quantities based on a model that is not valid is not in conflict with any statistical principle, so long as the lack of validity is reflected in the inferential statements we make. That is, the estimators used should be evaluated under the assumption of a model that is, ideally, valid, or at least more credible than the model used. Assuming in the evaluation that the school effects are fixed addresses this point.

In the next section, we show that inferential statements based on the so-called best linear unbiased predictors (BLUP), used as estimators, are not valid in two important aspects: the MSEs are not correct, not even approximately or asymptotically, as $J \to \infty$, and BLUP is not more efficient than the established ANCOVA-based estimator for every school, although it is for a majority. Notwithstanding these reservations, the established analysis, based on the model in (1) or its extension, is useful, but the conclusions drawn (or implied) by the established approach are somewhat optimistic and have to be carefully qualified. In the analysis in the section Application, they relate to two schools (out of 18 in the study) with the smallest enrolment.

Evaluation With the Fixed-Effects Assumptions

In this section, we study the properties of BLUP under the assumption that the school-specific effects $\delta_j$ are fixed. To avoid distractions that are peripheral to our argument, we assume in this section that the regression parameters in $\beta$ as well as the variances $\sigma_B^2$ and $\sigma^2$ are known. The conditional expectation of the deviation $\delta_j$ in (2) given the data defines the BLUP estimator of $\delta_j$:

$$\hat{\delta}_j = \frac{\omega}{1 + n_j\omega}\, e_j^{\top}\mathbf{1},$$

where $e_j = (y_{1j} - x_{1j}\beta, \ldots, y_{n_j j} - x_{n_j j}\beta)^{\top}$ is the vector of residuals for school $j$ and $\mathbf{1}$ is the vector of unities of length ($n_j$) implied by the context, so that $e_j^{\top}\mathbf{1}$ is the within-school total of the residuals. In the standard approach, we claim that the MSE of $\hat{\delta}_j$ is equal to $\sigma_B^2/(1 + n_j\omega)$, the conditional variance of $\delta_j$.

Assuming that $\delta_j$ is a fixed quantity is, in effect, the same as conditioning on its value. We have $E(e_j \mid \delta_j) = \delta_j\mathbf{1}$ and

$$E\!\left(\hat{\delta}_j \mid \delta_j\right) = \frac{n_j\omega}{1 + n_j\omega}\,\delta_j,$$

so the bias of $\hat{\delta}_j$ is $-\delta_j/(1 + n_j\omega)$. As a predictor, in the original setting of BLUP (Henderson, 1975; Robinson, 1991), $\hat{\delta}_j$ is unbiased. In contrast, in our setting, involving estimation, $\hat{\delta}_j$ is biased. Further, the sampling variance of $\hat{\delta}_j$ is

$$\mathrm{var}\!\left(\frac{\omega}{1 + n_j\omega}\, e_j^{\top}\mathbf{1} \,\Big|\, \delta_j\right) = \frac{n_j\sigma^2\omega^2}{(1 + n_j\omega)^2}, \qquad (3)$$

because, with $\delta_j$ fixed and $\beta$ known, $\mathrm{var}(e_j^{\top}\mathbf{1}) = \mathrm{var}(n_j\delta_j + \varepsilon_{1j} + \cdots + \varepsilon_{n_j j}) = n_j\sigma^2$. Hence the MSE of $\hat{\delta}_j$ is

$$\mathrm{MSE}\!\left(\hat{\delta}_j; \delta_j\right) = \frac{n_j\sigma^2\omega^2}{(1 + n_j\omega)^2} + \frac{\delta_j^2}{(1 + n_j\omega)^2} = \frac{n_j\sigma_B^2\omega + \delta_j^2}{(1 + n_j\omega)^2}. \qquad (4)$$
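A minimal numerical sketch of these expressions, assuming, as in this section, that $\beta$, $\sigma^2$ and $\omega$ are known; all values are hypothetical. It also confirms that the MSE in (4) coincides with the commonly reported value $\sigma_B^2/(1 + n_j\omega)$ when $\delta_j^2 = \sigma_B^2$.

```python
import numpy as np

def blup_deviation(residuals, omega):
    """BLUP of delta_j from the within-school residuals e_j, with omega known."""
    n_j = len(residuals)
    return omega / (1.0 + n_j * omega) * np.sum(residuals)

def blup_mse_fixed(delta_j, n_j, sigma2, omega):
    """MSE of the BLUP under a fixed delta_j, equation (4)."""
    sigma2_B = omega * sigma2
    return (n_j * sigma2_B * omega + delta_j**2) / (1.0 + n_j * omega)**2

resid = np.array([3.0, -1.0, 2.0])          # residuals e_j for a (tiny) hypothetical school
print(blup_deviation(resid, omega=0.1))     # shrunken estimate of delta_j

sigma2, omega, n_j = 100.0, 0.1, 50
for delta_j in (0.0, np.sqrt(omega * sigma2), 3 * np.sqrt(omega * sigma2)):
    print(delta_j,
          blup_mse_fixed(delta_j, n_j, sigma2, omega),
          omega * sigma2 / (1 + n_j * omega))   # reported value sigma_B^2 / (1 + n_j omega)
```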

The prediction-sampling variance of BLUP, $\sigma_B^2/(1 + n_j\omega)$, is commonly claimed to be the MSE of $\hat{\delta}_j$. However, it coincides with the MSE in (4) only when $\delta_j^2 = \sigma_B^2$, that is, for schools whose regressions have the typical deviation $\delta_j = \pm\sigma_B$ from the average regression $x\beta$. With the assumption of random $\delta_j$, we would claim that the estimator $\hat{\delta}_j$ is more efficient than the ANCOVA estimator, which, when $\beta$ in (1) is assumed to be known, has the variance $\sigma^2/n_j$. This claim is based on the inequality

$$\frac{\sigma_B^2}{1 + n_j\omega} < \frac{\sigma^2}{n_j}$$

for all $n_j$. The difference of the two sides diminishes as $\omega \to +\infty$, or as $n_j \to \infty$ while $\omega > 0$. The corresponding inequality for $\delta_j$ fixed,

$$\frac{n_j\sigma_B^2\omega + \delta_j^2}{(1 + n_j\omega)^2} < \frac{\sigma^2}{n_j},$$

is equivalent to

$$\delta_j^2 < 2\sigma_B^2 + \frac{\sigma^2}{n_j}.$$

Thus, the ANCOVA estimator is more efficient than the BLUP for the schools for which $|\delta_j| > \sqrt{2\sigma_B^2 + \sigma^2/n_j} = \sigma\sqrt{2\omega + 1/n_j}$. In a typical congenial setting, this is not a trivial proportion of schools. For illustration, suppose the values of $\delta_j$, as a collection, are compatible with the normal distribution $N(0, \sigma_B^2)$ and $\sigma^2 = 10\sigma_B^2$. Then for schools with $n_j = 50$, $\sqrt{2\sigma_B^2 + \sigma^2/n_j} = 1.48\,\sigma_B$. A value $\delta$ drawn at random from $N(0, \sigma_B^2)$ is outside the range $(-1.48\,\sigma_B, 1.48\,\sigma_B)$ with probability .14. So, when $\sigma_B^2 = \sigma^2/10$, BLUP is inferior to ANCOVA for about one in seven of the schools with $n_j = 50$. The limits within which BLUP is more efficient than ANCOVA get narrower with increasing $n_j$, but only slightly. For example, for $n_j = 100$ (and $\sigma_B^2 = \sigma^2/10$), the limits are $\pm 1.45\,\sigma_B$ and they converge to $\pm 1.41\,\sigma_B$ as $n_j \to +\infty$. For a given value of $\sigma^2$, the upper limit $\sigma\sqrt{2\omega + 1/n_j}$ increases with $\omega$.

We cannot identify the "exceptional" schools for which ANCOVA is more efficient than BLUP. For schools with small $n_j$, assuming that $\sigma_B^2 \ll \sigma^2$, the threshold of $2\sigma_B^2 + \sigma^2/n_j$ is large, so we need not be concerned with the possible inefficiency of BLUP. For schools with large $n_j$, the deviation $\delta_j$ is estimated with high precision by both BLUP and ANCOVA, so the possible inefficiency is inconsequential. However, the issue is relevant for intermediate sample sizes $n_j$.
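The quantities in this illustration can be reproduced with a short sketch (assuming NumPy and SciPy are available); it recovers the threshold $1.48\,\sigma_B$ and the probability .14 for $n_j = 50$, as well as the limits $1.45\,\sigma_B$ and $1.41\,\sigma_B$.

```python
import numpy as np
from scipy.stats import norm

def ancova_beats_blup_threshold(sigma2, sigma2_B, n_j):
    """|delta_j| beyond which ANCOVA (variance sigma^2/n_j) beats the BLUP, per (4)."""
    return np.sqrt(2 * sigma2_B + sigma2 / n_j)

sigma2_B = 1.0
sigma2 = 10 * sigma2_B            # the setting used in the illustration above
for n_j in (50, 100, 10**6):
    thr = ancova_beats_blup_threshold(sigma2, sigma2_B, n_j)
    prob = 2 * norm.sf(thr / np.sqrt(sigma2_B))   # P(|delta| > thr) if delta ~ N(0, sigma2_B)
    print(n_j, round(thr, 2), round(prob, 2))     # 1.48 and 0.14 for n_j = 50; then 1.45, 1.41
```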

A Composite Estimator

In this section, we derive an estimator based on the model with fixed school effects. By construction, it is superior to BLUP, although it differs from it only slightly. Its practical advantage is that an expression obtained for its MSE incorporates the uncertainty about the regression parameters.

The BLUP $\hat{\delta}_j$ can be interpreted as a shrinkage estimator, pulling the unbiased estimator $e_j^{\top}\mathbf{1}$ toward zero, its unconditional expectation. The BLUP can also be described as a composition (combination) of two estimators, $e_j^{\top}\mathbf{1}$ and the identical zero. The former is unbiased but has a relatively large sampling variance and the latter has no sampling variance, but differs from $\delta_j$, and is therefore biased for $\delta_j$. Composite estimators can be applied more generally, whenever there are several contending estimators (Longford, 2008, Chapter 1). In many contexts, we select one of the contenders. Composition has a greater potential than selection, because by selection we can, at best, only match the performance of the most efficient contender. In contrast, composition may yield an estimator more efficient than any of the contending estimators. Indeed, with some qualifications, BLUP is an important example.

Within the framework of ANCOVA, we consider two estimators: the standard (ordinary least squares, OLS) estimator $\hat{\beta}_{0j}$ and the "averaged" estimator $\hat{\beta}_0 = (\hat{\beta}_{01} + \cdots + \hat{\beta}_{0J})/J$. The former is unbiased but has a relatively large sampling variance of at least $\sigma^2/n_j$. The latter has a much smaller sampling variance, especially when the number of schools, $J$, is large, but has the bias $\beta_0 - \beta_{0j}$. The composition of $\hat{\beta}_{0j}$ and $\hat{\beta}_0$ is an alternative to the standard ANCOVA in which one of these estimators is selected on the basis of a hypothesis test.

The following proposition states how the optimal coefficients in the composition are derived in a general setting.

Proposition

Let $\hat{\theta}_0$ and $\hat{\theta}_1$ be two distinct estimators of the same quantity $\theta$. Suppose $\hat{\theta}_0$ is unbiased and the bias of $\hat{\theta}_1$ is equal to $B$. Let $V_0 = \mathrm{var}(\hat{\theta}_0)$, $V_1 = \mathrm{var}(\hat{\theta}_1)$ and $C = \mathrm{cov}(\hat{\theta}_0, \hat{\theta}_1)$. Then the composite estimator $\tilde{\theta} = (1 - b)\hat{\theta}_0 + b\hat{\theta}_1$ has the smallest MSE for

$$b^{*} = \frac{V_0 - C}{V_0 + V_1 - 2C + B^2}. \qquad (5)$$

The minimum attained is

$$\mathrm{MSE}\!\left\{\tilde{\theta}(b^{*}), \theta\right\} = V_0 - \frac{(V_0 - C)^2}{V_0 + V_1 - 2C + B^2}. \qquad (6)$$

The proof is given in the Appendix. It is easy to check that the MSE in (6) is smaller than both $V_0$ and $V_1 + B^2$, the respective MSEs of $\hat{\theta}_0$ and $\hat{\theta}_1$.
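In code, the proposition amounts to a few lines; the following generic sketch (not taken from the article) returns the composition, the coefficient (5) and the minimum MSE (6).

```python
def composite(theta0, theta1, V0, V1, C, B):
    """Composite estimator of the proposition: optimal weight (5) and minimum MSE (6)."""
    denom = V0 + V1 - 2 * C + B**2
    b = (V0 - C) / denom                    # equation (5)
    mse = V0 - (V0 - C)**2 / denom          # equation (6)
    return (1 - b) * theta0 + b * theta1, b, mse

# Toy check: an unbiased estimator with variance 4 combined with the constant 0
# (V1 = C = 0, bias B = 1); the composition shrinks toward 0 with b = 0.8.
print(composite(1.3, 0.0, 4.0, 0.0, 0.0, 1.0))
```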

To apply this proposition to estimating $\beta_{0j}$, we require expressions for the variances and the covariance of $\hat{\beta}_{0j}$ and $\hat{\beta}_0$. The sampling variance matrix of the regression parameter estimators in ANCOVA is derived from the matrix of the totals of squares and crossproducts of the indicators of the schools and the covariates. We partition this matrix as

$$\sigma^2 \begin{pmatrix} A & B \\ B^{\top} & D \end{pmatrix}^{-1}, \qquad (7)$$

where the $J \times J$ block $A = \mathrm{diag}(n_j)$ corresponds to the school-level intercepts $\beta_{0j}$ and $D = \sum_j \sum_i x_{ij}^{\top} x_{ij}$ to the covariates. The $J$ rows of the off-diagonal block $B$ are the within-school totals $x_{+j} = x_{1j} + \cdots + x_{n_j j}$. For a given school $j$, we combine the estimators $\hat{\beta}_{0j}$ with their average $\hat{\beta}_0$. In the Appendix, we show that the matrix in (7) can be expressed as

$$\sigma^2 \begin{pmatrix} A^{-1} + A^{-1}B\,G\,B^{\top}A^{-1} & -A^{-1}B\,G \\ -G\,B^{\top}A^{-1} & G \end{pmatrix}, \qquad (8)$$

where $G = (D - B^{\top}A^{-1}B)^{-1}$. In our case,

$$G = \left\{\sum_{j=1}^{J}\left(\sum_{i=1}^{n_j} x_{ij}^{\top}x_{ij} - n_j\,\bar{x}_j^{\top}\bar{x}_j\right)\right\}^{-1},$$

and $A^{-1}B$ comprises the vectors of within-school means $\bar{x}_j = n_j^{-1}x_{+j}$ as its $J$ rows. Further, using the notation of the proposition for estimating $\beta_{0j}$,


$$\begin{aligned}
V_0 &= \mathrm{var}\!\left(\hat{\beta}_{0j}\right) = \sigma^2\left(\frac{1}{n_j} + \bar{x}_j G \bar{x}_j^{\top}\right) \\
V_1 &= \mathrm{var}\!\left(\hat{\beta}_0\right) = \sigma^2\left(\frac{1}{J^2}\sum_{h=1}^{J}\frac{1}{n_h} + \bar{x}_0 G \bar{x}_0^{\top}\right) \\
C &= \mathrm{cov}\!\left(\hat{\beta}_{0j}, \hat{\beta}_0\right) = \sigma^2\left(\frac{1}{Jn_j} + \bar{x}_j G \bar{x}_0^{\top}\right) \\
B &= E\!\left(\hat{\beta}_0\right) - \beta_{0j} = \beta_0 - \beta_{0j},
\end{aligned}$$

where $\bar{x}_0 = J^{-1}(\bar{x}_1 + \cdots + \bar{x}_J)$ is the vector of the means of the within-school means and $\beta_0 = (\beta_{01} + \cdots + \beta_{0J})/J$. Note that $V_0$, $C$, and $B$ depend on $j$, but $V_1$ does not. The numerator of the coefficient $b^{*}$ in (5) is

$$V_0 - C = \sigma^2\left\{\frac{1}{n_j}\left(1 - \frac{1}{J}\right) + \bar{x}_j G (\bar{x}_j - \bar{x}_0)^{\top}\right\}$$

and the denominator is

$$\begin{aligned}
V_0 + V_1 - 2C + B^2 &= \sigma^2\left\{\frac{1}{n_j} - \frac{2}{Jn_j} + \frac{1}{J^2}\sum_{h=1}^{J}\frac{1}{n_h} + \bar{x}_j G \bar{x}_j^{\top} - 2\bar{x}_j G \bar{x}_0^{\top} + \bar{x}_0 G \bar{x}_0^{\top}\right\} + \left(\beta_{0j} - \beta_0\right)^2 \\
&= \sigma^2\left\{\frac{1}{n_j}\left(1 - \frac{1}{J}\right)^{2} + \frac{1}{J^2}\sum_{h \ne j}\frac{1}{n_h} + (\bar{x}_j - \bar{x}_0)\, G\, (\bar{x}_j - \bar{x}_0)^{\top}\right\} + \left(\beta_{0j} - \beta_0\right)^2.
\end{aligned}$$

The (common) within-school variance $\sigma^2$ is usually estimated with sufficient precision, so substituting its estimate $\hat{\sigma}^2$ in both expressions has negligible consequences. However, the squared deviation $(\beta_{0j} - \beta_0)^2$, also unknown, cannot be estimated directly with the precision required. We substitute for it its school-level expectation, $E_{[j]}\{(\beta_{0j} - \beta_0)^2\} = \sigma_B^2$. The subscript $[j]$ indicates that the expectation (averaging) is over the schools. For $\sigma_B^2$ to be well defined, we do not have to subscribe to the random-effects perspective, because we can estimate

$$\sigma_B^2 = \frac{1}{J}\sum_{j=1}^{J}\left(\beta_{0j} - \beta_0\right)^2,$$

the version of the school-level variance which refers to the particular set of schools in the study, not to any superpopulation.

Note that the uncertainty about the regression parameters is taken into account throughout, unlike in the standard empirical Bayes (BLUP) analysis. The weakness of our approach is the replacement of the squared deviation $(\beta_{0j} - \beta_0)^2$ by the school-level variance $\sigma_B^2$. Applying moment matching to the statistic $S = J^{-1}\sum_{j=1}^{J}(\hat{\beta}_{0j} - \hat{\beta}_0)^2$, the variance $\sigma_B^2$ is estimated by

$$\hat{\sigma}_B^2 = S - \frac{\hat{\sigma}^2}{J}\left\{\left(1 - \frac{1}{J}\right)\sum_{j=1}^{J}\frac{1}{n_j} + \sum_{j=1}^{J}(\bar{x}_j - \bar{x}_0)\, G\, (\bar{x}_j - \bar{x}_0)^{\top}\right\}.$$

It is unbiased if an unbiased estimator $\hat{\sigma}^2$ is used. Note that the estimator $\hat{\omega} = \hat{\sigma}_B^2/\hat{\sigma}^2$ is biased for $\omega$, but the bias is small when $\sigma^2$ is estimated with many degrees of freedom.

In summary, we propose the composite ANCOVA estimator

$$\tilde{\beta}_{0j} = \left(1 - b_j\right)\hat{\beta}_{0j} + b_j\hat{\beta}_0, \qquad (9)$$

with

$$b_j = \frac{\dfrac{1}{n_j}\left(1 - \dfrac{1}{J}\right) + \bar{x}_j G (\bar{x}_j - \bar{x}_0)^{\top}}{\dfrac{1}{n_j}\left(1 - \dfrac{1}{J}\right)^{2} + \dfrac{1}{J^2}\displaystyle\sum_{h \ne j}\frac{1}{n_h} + (\bar{x}_j - \bar{x}_0)\, G\, (\bar{x}_j - \bar{x}_0)^{\top} + \hat{\omega}}, \qquad (10)$$

and estimate its MSE by

$$\hat{\sigma}^2\left\{\frac{1}{n_j} + \bar{x}_j G \bar{x}_j^{\top} - b_j\left(\frac{1}{n_j}\left(1 - \frac{1}{J}\right) + \bar{x}_j G (\bar{x}_j - \bar{x}_0)^{\top}\right)\right\},$$

the naive estimator of (6). The estimators $\hat{\sigma}^2$ and $\hat{\omega}$ are the only sources of uncertainty in these expressions. A more precise but more complex alternative uses a range of plausible values of the ratio $(\beta_{0j} - \beta_0)^2/\sigma^2$ in place of $\hat{\omega}$. We refer to the established ANCOVA estimator as ANCOVA-OLS, to distinguish it from the composite estimator given by (9).
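As an illustration of how (9) and (10) might be assembled from within-school summaries under the parallel-regressions model, here is a sketch that follows the formulas as reconstructed above; the inputs are hypothetical and it is not the author's R implementation.

```python
import numpy as np

def composite_ancova(beta0_hat, n, xbar, G, omega_hat):
    """Composite ANCOVA estimates (9) with shrinkage coefficients b_j from (10).

    beta0_hat : (J,) OLS school-specific intercepts
    n         : (J,) within-school sample sizes n_j
    xbar      : (J, p) within-school covariate means (excluding the intercept)
    G         : (p, p) matrix G from (8)
    omega_hat : moment-matching estimate of the variance ratio omega
    """
    J = len(beta0_hat)
    beta0_bar = beta0_hat.mean()
    x0 = xbar.mean(axis=0)                    # mean of the within-school means
    d = xbar - x0
    b = np.empty(J)
    for j in range(J):
        num = (1 / n[j]) * (1 - 1 / J) + xbar[j] @ G @ d[j]
        den = ((1 / n[j]) * (1 - 1 / J) ** 2
               + np.sum(np.delete(1.0 / n, j)) / J ** 2
               + d[j] @ G @ d[j]
               + omega_hat)
        b[j] = num / den
    return (1 - b) * beta0_hat + b * beta0_bar, b

# Hypothetical usage, once the summaries have been computed from the data:
# tilde_beta0, b = composite_ancova(beta0_hat, n, xbar, G, omega_hat)
```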

Application

The data analyzed originally by Aitkin and Longford (1986) comprise the scores compiled on the O-level examinations of school-leavers in 18 secondary schools in a Local Educational Authority in England (LEA scores) and the scores on a general scholastic aptitude test, the Verbal Reasoning Quotient (VRQ), established by a test soon after enrollment a few years earlier. The sex of each student is recorded. There are two single-sex schools, with 21 and 22 students, respectively. Their students have much higher LEA scores on average than the other schools, but their average VRQ scores are also much higher. The other schools have between 39% and 56% girls among 29 to 79 students. The dataset comprises 907 students in total, 477 boys and 430 girls. Students who do not take any school-leaving examinations have zero LEA score. The scores are integers; the highest score in the dataset is 68, attained by two students, and zero score is attained by 138 students. The VRQ scores are also integers, in the range 70 to 140.

Table 1 lists the estimates and estimated standard errors of the school-level deviations $\delta_j$ based on the empirical Bayes model (BLUP) and ANCOVA, with composition and without, that is, using OLS (marked by a dagger). This analysis ignores the obvious nonnormality due to the zero lower limit of the LEA scores. We address this problem in the next section. The BLUP and ANCOVA composite estimates are not pairwise comparable, because the former add up to zero, whereas the latter do not. In fact, the OLS estimates, obtained prior to composition, add up to zero, but after their (uneven) shrinkage the composite estimates do not. No translation brings the two sets of estimates into a close agreement, because the ANCOVA composite estimates are dispersed more than BLUP. With BLUP, more shrinkage takes place; compare the corresponding values of $b_j$.

The estimated standard errors for BLUP are much smaller than for ANCOVA with composition. This difference is largely due to the standards used in their calculation. For BLUP, we ignore the uncertainty about $\beta$ and $\omega$. For ANCOVA, the uncertainty about $\beta$ is accounted for, although the consequences of substituting $\sigma_B^2$ (and then $\hat{\sigma}_B^2$) for the squared deviation $(\beta_{0j} - \beta_0)^2$ are ignored. This is equivalent to substituting $\hat{\omega}$ for $(\beta_{0j} - \beta_0)^2/\sigma^2$. The standard errors quoted in the column headed St.e. refer to the parametrization with $\sum_j \beta_{0j} = 0$.

TABLE 1
Estimates of the School-Level Deviations Based on the Empirical Bayes Model (BLUP) and ANCOVA, With Composition (Middle Section) and Without (by OLS); Models With Parallel Regressions

Sch.   BLUP: Est.  St.e.  b_j    ANCOVA: Est.  St.e.  St.e.'  b_j    OLS: Est.†  St.e.†   n_j
 1        1.49     1.10   0.17       1.58      1.50    1.19   0.10       1.75     1.54     65
 2        1.19     1.01   0.14       1.10      1.45    1.10   0.09       1.20     1.48     79
 3        0.93     1.25   0.21       0.99      1.67    1.38   0.14       1.15     1.75     48
 5       -2.76     1.26   0.22      -3.09      1.65    1.38   0.14      -3.58     1.73     47
 6        0.43     1.09   0.17       0.28      1.54    1.20   0.11       0.31     1.59     66
 7       -1.82     1.32   0.24      -2.21      1.71    1.50   0.15      -2.61     1.81     41
 8       -1.84     1.21   0.20      -2.18      1.61    1.33   0.13      -2.49     1.68     52
 9       -1.28     1.09   0.16      -1.45      1.49    1.17   0.10      -1.61     1.53     67
10       -0.13     1.24   0.21      -0.30      1.63    1.37   0.13      -0.35     1.70     49
11        3.32     1.26   0.22       3.56      1.67    1.38   0.14       4.13     1.75     47
12       -1.68     1.23   0.21      -1.92      1.61    1.34   0.13      -2.20     1.68     50
13        0.63     1.32   0.24       0.63      1.71    1.47   0.15       0.75     1.80     41
14       -1.15     1.24   0.21      -1.42      1.65    1.36   0.13      -1.63     1.72     49
15       -0.04     1.50   0.31      -0.11      1.89    1.74   0.20      -0.14     2.05     29
16       -1.17     1.06   0.15      -1.39      1.50    1.14   0.10      -1.54     1.54     72
17       -0.99     1.12   0.17      -1.18      1.54    1.22   0.11      -1.32     1.59     62
20        7.49     1.64   0.37       9.22      2.00    2.10   0.25      12.34     2.25     22
21       -2.61     1.67   0.38      -2.90      2.24    2.14   0.30      -4.15     2.56     21

Note: St.e. = standard error; the dagger marks the ANCOVA-OLS estimates, without composition.


When shrinkage is not applied, the estimates $\hat{\beta}_{0j}^{\dagger}$ satisfy this constraint. However, after the differential shrinkage the estimated deviations have a nonzero total.

We are interested only in the contrasts of the deviations, so it may be more appropriate to estimate the standard errors of $\tilde{\beta}_{0j} - \tilde{\beta}_0$. They are given in the column headed St.e.'. They are much smaller than the standard errors quoted for $\tilde{\beta}_{0j}$ (e.g., 1.10 vs. 1.45 for School 2), but they are still greater than the BLUP-related standard errors, because they do not ignore the uncertainty about the regression coefficients on VRQ and Sex. For completeness, the estimates and estimated standard errors obtained by ANCOVA without shrinkage are given in the right-most part of Table 1. Note that the standard error of any contrast $\hat{\beta}_{0j_1} - \hat{\beta}_{0j_2}$ is estimated straightforwardly with BLUP because the estimators $\hat{\beta}_{0j}$ are assumed to be independent. In ANCOVA, they are dependent, providing another reason for dismissing any straightforward comparison of the pairs of standard errors. Suffice it to say that the BLUP and ANCOVA estimators use similar shrinkage coefficients, and are therefore functionally similar.

The standard errors can be studied in greater detail in Figure 1. In the panel at the top, the estimated root-MSEs based on (4) are drawn as functions of the deviation $\delta = \delta_j$ for $\delta$ in the range $(\tilde{\delta}_j - 2\hat{s}_j, \tilde{\delta}_j + 2\hat{s}_j)$, where $\hat{s}_j$ are the commonly reported (estimated) standard errors $\hat{\sigma}_B/\sqrt{1 + n_j\hat{\omega}}$, marked by circles. The black discs mark the root-MSEs evaluated at the estimates $\hat{\delta}_j$ and the horizontal ticks are drawn at the ANCOVA-OLS standard errors $\hat{\sigma}/\sqrt{n_j}$. The three sets of estimated standard errors are summarised in the left-hand part of the panel by vertical segments drawn in the descending order of the reported standard errors. For most schools, the ANCOVA root-MSE is uniformly greater than the standard error for any plausible value of the deviation $\delta_j$. The two schools with the fewest students are the exceptions, most notably school 20 ($n_{20} = 22$), for which the ANCOVA-OLS standard error is uniformly smaller. For School 21, the reported standard error and the standard error evaluated at the estimate nearly coincide because $\hat{\delta}_{21} \doteq -\hat{\sigma}_B$.

The bottom panel of Figure 1 contains the estimated root-MSEs of the ANCOVA composite estimators as functions of the deviation $\delta_j = \beta_{0j} - \beta_0$. The "reported" root-MSEs are the (naively) estimated minima given by the square roots of (6), with $\hat{\omega}$ substituted for $(\beta_{0j} - \beta_0)^2/\sigma^2$. In general, the root-MSEs are greater than their random-effects counterparts, because they account for the uncertainty about the regression on VRQ and Sex, but their dependence on the value of $\beta_{0j}$ (or $\delta_j = \beta_{0j} - \beta_0$) is much weaker. For most schools, the reported root-MSE is the smallest and the ANCOVA-OLS root-MSE the largest, but the two schools with the smallest sample sizes stand out as exceptions. Note that the number of such exceptions is probably underrepresented in the diagram, because the values of $\delta_j$ are dispersed more than their estimates $\hat{\delta}_j$ in both panels, and more than just one of them is likely to be outside the range $(-\hat{\sigma}_B, \hat{\sigma}_B)$.

[FIGURE 1. Standard errors (root-MSEs) of the random-effects (BLUP) and fixed-effects (ANCOVA) estimators of the deviations $\beta_{0j}$, as functions of the deviation $\delta_j$. The vertical dashes are drawn at $\hat{\sigma}_B$, the estimated school-level standard deviation.]


Zero Outcomes

About 15% of the students (138 out of 907) have zero LEA scores; they did not take part in the final examinations in any subject or obtained no qualifying points in them. The presence of such observations undermines the validity of the analysis, because the outcomes are distinctly nonnormally distributed, irrespective of the conditioning (regression model) applied. We deal with this problem by regarding such outcomes as truncated at zero, as if originally these outcomes were negative, and fit the regression to the outcomes prior to truncation. There is extensive literature on censoring (e.g., Klein & Moeschberger, 2004), but most of it is related to survival analysis, with applications to medicine and engineering, involving distributions other than normal. For an application to educational research, see Braun and Zwick (1993).

We treat truncation as a cause of data incompleteness and apply multiple imputation (Longford, 2005, Part 1; Rubin, 2002; Schafer, 1997) to fit models that refer to the complete dataset, in which some outcomes have negative values but these values are not known. Two schools, numbers 20 and 21, have no students with zero outcomes. Since the students in these schools have substantially greater mean scores, we base the imputation only on the remaining 16 schools (864 students).

We apply the following iterative procedure. For a set of provisional underlying (pre-truncation) values of $y$, we generate provisional plausible ordinary regressions of LEA scores on VRQ, Sex and school. We do not use the OLS fit, but draw a plausible residual variance $\tilde{\sigma}^2$ from its estimated sampling distribution (scaled $\chi^2$), and then a set of plausible regression parameters $\tilde{\beta}$ from the sampling distribution of $\hat{\beta}$, in which $\hat{\sigma}^2$ is replaced by $\tilde{\sigma}^2$. For each student with a zero recorded outcome, we generate replicate underlying outcome scores according to this regression until a negative value is obtained. This value is a random draw from the conditional distribution of the score given that it is negative. It replaces the previous provisional imputed value. The iterations are started with all the values for the truncated outcomes set to zero, and they should be concluded when convergence in distribution is reached for the imputed values. Such convergence is difficult to assess, because the 138 values are associated with several distinct distributions. However, doing more iterations than is necessary causes no harm.

After preliminary exploration, we adjusted this algorithm as follows. First, we carry out 20 iterations of (provisional) imputation by conditioning only on VRQ. Then, we continue with a further 20 iterations, separately for each of the 16 schools, in which we condition also on Sex. For each provisional value (in every iteration), we have to draw several values from a normal distribution, until we obtain a negative value. We set the maximum number of such draws to 200; if none of the 200 values is negative, then the provisional value is set to zero. We checked in several settings that such instances are very rare. In a set of 10 replications, generating 10 × 138 plausible values of the underlying LEA scores, one student had two zeros imputed and five others had one zero each.
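The draw for a single zero outcome can be sketched as follows; this is a schematic transcription of the step just described (the author's implementation was in R), with the plausible linear predictor and residual standard deviation passed in as hypothetical arguments.

```python
import numpy as np

def impute_truncated(linear_predictor, sigma_tilde, rng, max_draws=200):
    """Draw a plausible underlying (negative) score for a student with a zero outcome.

    Repeatedly samples from N(linear_predictor, sigma_tilde^2) until a negative
    value appears; after max_draws unsuccessful attempts the value is set to zero,
    as in the adjustment described above.
    """
    for _ in range(max_draws):
        draw = rng.normal(linear_predictor, sigma_tilde)
        if draw < 0:
            return draw          # a draw from the conditional distribution, given < 0
    return 0.0                   # very rare; keep the observed zero

rng = np.random.default_rng(1)
print(impute_truncated(linear_predictor=-2.5, sigma_tilde=10.0, rng=rng))
```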

The procedure is computationally not demanding; with the code written in R for the purpose, its execution requires about 0.7 sec. of CPU time. We replicate the procedure M = 10 times, to generate 10 replicate completed datasets on which we apply the estimation procedures described in the section Application. The 10 sets of replicate (completed-data) estimates are averaged to obtain the multiple-imputation (MI) estimates. Their sampling variances are also estimated as the averages across the replications, with an inflation by the between-imputation variance. For details and further background to MI, see Schafer (1997) and Rubin (2002).
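The combination of the M completed-data analyses follows the standard multiple-imputation rules; a minimal sketch, assuming the replicate estimates and their estimated variances for one quantity have been collected in arrays.

```python
import numpy as np

def mi_combine(estimates, variances):
    """Combine completed-data estimates: average them and inflate the variance
    by the between-imputation component (Rubin, 2002)."""
    estimates, variances = np.asarray(estimates), np.asarray(variances)
    M = len(estimates)
    est = estimates.mean()
    within = variances.mean()
    between = estimates.var(ddof=1)
    return est, within + (1 + 1 / M) * between

print(mi_combine([1.5, 1.7, 1.4, 1.6], [0.20, 0.21, 0.19, 0.20]))
```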

The estimates and estimated standard errors for the original (truncated) and the underlying outcomes for the random- and fixed-effects models are listed in Table 2. For fixed effects, we do not give an estimate of the standard error of the intercept, because the intercept is confounded with the parameters $\beta_{0j}$. For the underlying outcomes, the fitted regressions have higher slopes on VRQ and greater estimated residual variance $\hat{\sigma}^2$ as well as overall variance $\hat{\sigma}^2(1 + \hat{\omega})$. The estimated standard errors for the regression parameters are inflated only slightly, because the fraction of the missing information is small. It would be equal to 15% (on the scale of variance and MSE) if we had no information about the values of the outcomes when they are truncated. However, we know that they are negative, and many of them are close to zero, so the fraction is much smaller than 15%.

TABLE 2
Estimates and Estimated Standard Errors for the Original and Underlying Outcomes; Models With Fixed and Random Effects

                          Random Effects                             Fixed Effects
                1        VRQ      Sex      σ²       ω        VRQ      Sex      σ²       ω
Truncated outcomes
  Estimate   -69.299    0.906    0.959   111.54    0.060     0.813    0.887   94.37    0.128
  St. error             0.028    0.718             0.027     0.027    0.669
Underlying outcomes
  Estimate   -69.803    0.912    0.848   113.12    0.056     0.898    1.244   113.89   0.107
  St. error             0.029    0.731             0.026     0.030    0.708

The estimates and estimated standard errors of the school-level deviations are listed in Table 3. The standard errors for the estimates based on the underlying regression are slightly inflated but, with a few exceptions, the estimates are changed only slightly. After multiple imputation, the collections of estimates and estimated standard errors largely retain the features observed on the estimates based on the original (truncated) outcomes. Figure 2 presents them in a format that is easier to digest. Each vertical segment is the interval $(\hat{\delta}_j - 2\hat{s}_j, \hat{\delta}_j + 2\hat{s}_j)$, where $\hat{\delta}_j$ is an estimate and $\hat{s}_j$ the associated standard error.


It shows that shrinkage (with BLUP or ANCOVA) alters the estimates and reduces the estimated standard errors, but substantially so only for the two extreme schools, 20 and 21, at the price of the qualifications and deficiencies discussed earlier. And these qualifications are essential for these two schools.

TABLE 3
Estimates and Estimated Standard Errors of the School-Level Deviations (Effects) Based on Models With Random and Fixed Effects and Parallel Regressions

            BLUP                              ANCOVA                          ANCOVA-OLS
Sch.   Truncated      Underlying       Truncated      Underlying       Truncated       Underlying
 1      1.37 (1.17)    1.62 (1.18)      1.58 (1.50)    1.81 (1.64)      1.75 (1.54)     2.05 (1.70)
 2      1.43 (1.08)    1.71 (1.11)      1.10 (1.45)    1.75 (1.60)      1.20 (1.48)     1.95 (1.64)
 3      1.03 (1.31)    1.05 (1.32)      0.99 (1.67)    1.19 (1.82)      1.15 (1.75)     1.42 (1.92)
 5     -2.71 (1.32)   -2.69 (1.35)     -3.09 (1.65)   -3.12 (1.82)     -3.58 (1.73)    -3.72 (1.92)
 6      0.24 (1.16)    0.38 (1.21)      0.28 (1.54)    0.27 (1.72)      0.31 (1.59)     0.31 (1.79)
 7     -1.43 (1.39)   -1.52 (1.47)     -2.21 (1.71)   -1.92 (1.93)     -2.61 (1.81)    -2.34 (2.08)
 8     -1.48 (1.27)   -1.60 (1.30)     -2.18 (1.61)   -1.95 (1.77)     -2.49 (1.68)    -2.29 (1.86)
 9     -0.91 (1.15)   -0.97 (1.17)     -1.45 (1.49)   -1.13 (1.63)     -1.61 (1.53)    -1.27 (1.69)
10      0.20 (1.30)   -0.26 (1.33)     -0.30 (1.63)   -0.43 (1.79)     -0.35 (1.70)    -0.50 (1.89)
11      3.49 (1.32)    3.45 (1.35)      3.56 (1.67)    3.89 (1.83)      4.13 (1.75)     4.65 (1.93)
12     -1.82 (1.29)   -1.60 (1.34)     -1.92 (1.61)   -1.88 (1.79)     -2.20 (1.68)    -2.21 (1.88)
13      0.52 (1.39)    0.16 (1.45)      0.63 (1.71)    0.14 (1.90)      0.75 (1.80)     0.17 (2.04)
14     -1.90 (1.30)   -1.13 (1.33)     -1.42 (1.65)   -1.41 (1.81)     -1.63 (1.72)    -1.67 (1.91)
15     -0.01 (1.56)   -0.18 (1.59)     -0.11 (1.89)   -0.26 (2.07)     -0.14 (2.05)    -0.33 (2.29)
16     -0.92 (1.12)   -1.12 (1.15)     -1.39 (1.50)   -1.35 (1.65)     -1.54 (1.54)    -1.52 (1.71)
17     -0.42 (1.19)   -0.54 (1.21)     -1.18 (1.54)   -0.69 (1.69)     -1.32 (1.59)    -0.80 (1.75)
20      6.37 (1.70)    6.20 (1.72)      9.22 (2.00)    8.21 (2.15)     12.34 (2.25)    11.54 (2.47)
21     -3.04 (1.72)   -3.00 (1.73)     -2.90 (2.24)   -3.57 (2.41)     -4.15 (2.56)    -5.43 (2.81)

Note: Standard errors are given in parentheses.

[FIGURE 2. Estimates and estimated standard errors (root-MSEs) of the school-level deviations based on the random- and fixed-effects models with the truncated and underlying outcomes.]

Varying Slopes

Bearing in mind that there are bound to be important covariates that were not recorded, the results can hardly be used for any policy-related decisions or for choice of alternative schools by a student or parent, especially when we realize that the relevant coefficients may change from one year to the next (Leckie & Goldstein, 2009). We have addressed one important model validity issue, related to the truncation of the outcomes; the variation of the regression slopes (on VRQ) is another. It corresponds to the model

$$y_{ij} = x_{ij}\beta + z_{ij}\delta_j + \varepsilon_{ij},$$

in which $x_{ij}$ comprises the values of the covariates (including the intercept 1) for student $i$ in school $j$, and $z_{ij}$ is its subvector for the intercept and VRQ or, in general, for the covariates associated with school-level variation; $\delta_j$ and $\varepsilon_{ij}$ are independent random samples from $N_2(\mathbf{0}, \Sigma_B)$ and $N(0, \sigma^2)$, respectively. The $2 \times 2$ variance matrix $\Sigma_B$ describes the pattern of school-level variation.

In the ANCOVA version of this model, $\delta_j$, $j = 1, \ldots, J$, are unknown vectors of constants representing the interaction of VRQ with the School as a categorical variable. The ANCOVA composite estimator can be extended to this model by minimising the MSE of the (multivariate) composition

$$(\mathbf{z} - b_j)^{\top}\hat{\delta}_j + b_j^{\top}\hat{\delta}_0,$$

where $\mathbf{z} = (1, z)^{\top}$ is the vector at which we want to estimate (predict) the average outcome and $b_j$ is a vector of coefficients, which can be interpreted as inducing (bivariate) shrinkage. The vector $b_j$ depends on $\Omega = \sigma^{-2}\Sigma_B$ and has to be estimated by moment matching. The details for $b_j$ and $\Omega$ are given in the Appendix. We note that the optimal vector $b_j$ has the form $Q_j^{-1}P_j\mathbf{z}$, where the matrices $Q_j$ and $P_j$, the multivariate versions of the numerator $V_0 - C$ and the denominator $V_0 + V_1 - 2C + B^2$ in (5), do not depend on $\mathbf{z}$. Thus, the solutions for a collection of vectors $\mathbf{z}$ are characterized by the matrix $B_j = Q_j^{-1}P_j$, common to all of them.

Estimators based on more complex models usually have smaller biases than those based on their simpler submodels, but they have greater sampling variation. In our case, the variation is the dominant contributor to the MSE even for the schools with the largest $n_j$. This suggests that the model with varying slopes on VRQ is unlikely to be useful for any purpose associated with comparing the schools. This is indeed the case with both BLUP and ANCOVA. The problems with estimating the standard errors of the school-specific coefficients are exacerbated somewhat. The deviations of a within-school regression from the average regression depend on VRQ, and therefore the associated sampling variances are (quadratic) functions of VRQ, which depend on $\Omega$ and the deviations $\delta_j$ themselves. We omit the details, but present the random- and fixed-effects estimates in Table 4, and the sets of fitted within-school deviations in Figure 3.

The estimated deviation lines $\hat{\delta}_{0j} + \hat{\delta}_{1j}\,\mathrm{VRQ}$ in Figure 3 are drawn between the 10th and 90th percentiles of the VRQ scores for each school's sample. By subtracting the average regression, the resolution of each panel is greatly improved. The lines obtained by multiple imputation for the zero outcomes (the right-hand panels) differ only slightly from their counterparts with the zero outcomes taken at face value, but the BLUP (random effects) and ANCOVA (fixed effects) results differ substantially, both in the extent and pattern of variation. However, much of this difference is illusory because of the substantial sampling variation associated with the intercepts and slopes of the lines, as well as their scaled variance matrix $\Omega$.

TABLE 4
Estimates and Estimated Standard Errors for the Original and Underlying Outcomes; Models With (Random and Fixed) School-Specific Slopes on VRQ

                              BLUP                                 ANCOVA
                 1        VRQ      Sex      σ²         1(a)     VRQ(a)    Sex      σ²
Truncated outcomes
  Estimate    -71.054    0.921    1.040    109.21     -56.673   0.799     0.653    91.97
  St. error              0.044    0.716                                   0.684
  Ω           [1.608  0.0179; 0.0179  0.00020]        [5.674  0.0508; 0.0508  0.00048]
Underlying outcomes
  Estimate    -71.919    0.931    0.899    111.52     -65.465   0.878     0.949    111.09
  St. error              0.048    0.740                                   0.746
  Ω           [2.094  0.0226; 0.0226  0.00025]        [6.191  0.0553; 0.0553  0.00055]

a. Average across the schools (for the intercept and VRQ in ANCOVA).

[FIGURE 3. Fitted school-level deviations from the average regression; models with random (BLUP) and fixed (ANCOVA) coefficients with school-specific slopes on VRQ.]


Discussion

We have shown that the assumption of randomness is not essential for borrowing strength, that is, for exploiting the similarity of related schools (clusters). With the assumption, maximum likelihood (ML) estimation is satisfactory. When the effects (deviations) of the units are assumed to be fixed, ML coincides with OLS and is unsatisfactory, but that is a problem of ML, not of the assumption. The problem of estimating these deviations is quintessentially small-sample and distinctly not asymptotic, and ML is not efficient. It can be improved upon, on average, by composition, without altering the assumption.

We have argued that the effects associated with schools should be treated as fixed, because we wish to make inferences about the educational processes of specific schools. The composite estimators based on this assumption are only slightly more efficient than with random effects. Our analysis of the MSEs shows that the established BLUP estimator is more efficient than the ANCOVA estimator for a majority of schools but not for all of them. The MSEs of the estimators depend on the school effects $\delta_j$, and so can be studied (and presented) more completely through the ranges of their plausible values.

Composition is a general principle that can be applied whenever there are alternative estimators of the target quantity. Thus, we may even consider $\hat{\sigma}_B^2$ and $(\hat{\beta}_{0j} - \hat{\beta}_0)^2$ as alternative estimators of $(\beta_{0j} - \beta_0)^2$, and seek their combination that attains the smallest MSE. In this case, both contending estimators are biased but their biases are identified. The problem is treated by Longford (2007).

More flexible patterns of school-level differences are introduced by covariate-by-school interactions. In the random-effects perspective, they correspond to varying slopes (on continuous variables) and varying differences (among levels of categorical variables). The approach presented in the section A Composite Estimator can be extended straightforwardly. For example, the matrix $A$ is replaced by the block-diagonal matrix of within-school counts, means, and crossproducts of the variables involved in the interactions.

Issues similar to those discussed here arise in small-area estimation, in which the inferential targets are the population means of a variable in the districts of a country. Multilevel models and BLUP are the adopted standard in such analyses (see Rao, 2003), but Longford (2005, Part 2; 2007) pointed out the conflict between the sampling-design and model-based perspectives and the corresponding treatment of the terms associated with the districts as fixed or random. Random effects are essential for borrowing strength (exploiting similarity) neither across the districts in small-area estimation, nor across schools in school-effectiveness studies.

Another approach to the assessment of schools, or competing institutions in general, is outlined by Longford and Rubin (2006). The approach is based on the potential outcomes framework (Holland, 1986), in which every student is associated with a set of fixed outcomes, one for each school, and a comparison of two schools is defined by the average difference of the outcomes for a school-specific reference set of students. This approach requires no distributional assumptions, and the differences of the potential outcomes of a student for two schools are not assumed to follow any particular pattern. Estimation can be framed as a missing-data problem (for each student only one outcome is observed and the others are missing), and multiple imputation applied to address the uncertainty involved. Explicit modeling of the (self-)selection process, that is, of the assignment of students to schools, is a strength of this approach, but it can be fully realized only when a rich set of covariates is collected.


Appendix

Proof of the Proposition

The MSE of the estimator $\tilde{\theta} = (1 - b)\hat{\theta}_0 + b\hat{\theta}_1$ is

$$\begin{aligned}
m(b) &= (1 - b)^2 V_0 + 2b(1 - b)C + b^2\left(V_1 + B^2\right) \\
&= b^2\left(V_0 - 2C + V_1 + B^2\right) - 2b\left(V_0 - C\right) + V_0,
\end{aligned}$$

and this quadratic function, with a positive quadratic term, attains its minimum for $b^{*} = (V_0 - C)/(V_0 + V_1 - 2C + B^2)$, as claimed in (5). The minimum attained, given by (6), is obtained directly by substituting this solution in $m(b)$.

The expression (8) for the inverse of a partitioned matrix can be derived by sweeping the matrix in (7). For instance, by adding to the second block-row the $-B^{\top}A^{-1}$ premultiple of the first block-row we obtain the zero matrix in the bottom off-diagonal block. The expression is confirmed more simply by multiplication:

$$\begin{pmatrix} A & B \\ B^{\top} & D \end{pmatrix}\begin{pmatrix} A^{-1} + A^{-1}B\,G\,B^{\top}A^{-1} & -A^{-1}B\,G \\ -G\,B^{\top}A^{-1} & G \end{pmatrix} = \begin{pmatrix} I_n & 0_{n,p} \\ 0_{p,n} & I_p \end{pmatrix},$$

where $I$ is the identity matrix and $0$ the matrix of zeros, with their dimensions indicated in the subscripts, and $G = (D - B^{\top}A^{-1}B)^{-1}$.

Multivariate Shrinkage

Suppose $\theta$ is a $p \times 1$ vector of parameters and we wish to estimate the combination $\mathbf{z}^{\top}\theta$. Let $\hat{\theta}_0$ be an unbiased and $\hat{\theta}_1$ another estimator of $\theta$. Denote their respective variance matrices by $V_0$ and $V_1$ and their covariance matrix by $C$. The bias of $\hat{\theta}_1$ is denoted by $B$.

We seek the estimator

$$\tilde{\theta} = (\mathbf{z} - b)^{\top}\hat{\theta}_0 + b^{\top}\hat{\theta}_1,$$

which has the minimum MSE. The optimal vector of coefficients $b$ is found by minimising the MSE of $\tilde{\theta}$:

$$\begin{aligned}
m(b; \mathbf{z}) &= (\mathbf{z} - b)^{\top}V_0(\mathbf{z} - b) + (\mathbf{z} - b)^{\top}C\,b + b^{\top}C^{\top}(\mathbf{z} - b) + b^{\top}\left(V_1 + BB^{\top}\right)b \\
&= b^{\top}\left(V_0 - C - C^{\top} + V_1 + BB^{\top}\right)b - b^{\top}\left(2V_0 - C - C^{\top}\right)\mathbf{z} + \mathbf{z}^{\top}V_0\mathbf{z} \\
&= b^{\top}Q\,b - b^{\top}P\mathbf{z} - \mathbf{z}^{\top}P^{\top}b + \mathbf{z}^{\top}V_0\mathbf{z},
\end{aligned}$$

where $Q = V_0 - C - C^{\top} + V_1 + BB^{\top}$ and $P = V_0 - C$. The minimum of this quadratic function of $b$ is found by matrix differentiation or by completing the square. In either way, we find that the minimum is attained at $b^{*} = Q^{-1}P\mathbf{z}$ and the minimum MSE is $m(b^{*}; \mathbf{z}) = \mathbf{z}^{\top}V_0\mathbf{z} - \mathbf{z}^{\top}P^{\top}Q^{-1}P\mathbf{z}$.
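A compact numerical transcription of these formulas (a sketch with hypothetical values, not code from the article):

```python
import numpy as np

def multivariate_composition(z, V0, V1, C, B):
    """Optimal coefficients b* = Q^{-1} P z and minimum MSE for estimating z' theta."""
    Q = V0 - C - C.T + V1 + np.outer(B, B)
    P = V0 - C
    b = np.linalg.solve(Q, P @ z)
    mse = z @ V0 @ z - z @ P.T @ np.linalg.solve(Q, P @ z)
    return b, mse

V0 = np.array([[2.0, 0.1], [0.1, 0.05]])      # hypothetical variance of the unbiased estimator
V1 = np.zeros((2, 2)); C = np.zeros((2, 2))   # the second estimator treated as a constant
B = np.array([1.0, 0.02])                     # its bias
print(multivariate_composition(np.array([1.0, 100.0]), V0, V1, C, B))
```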

The ($r \times r$) scaled variance matrix $\Sigma_B$ is estimated by the multivariate version of the method described in the section A Composite Estimator. Let $\hat{\delta}_j$ be the estimate of the vector of deviations for school $j$, and $\hat{\delta}_0$ their average. We apply moment matching to the statistic

$$S = \frac{1}{J}\sum_{j=1}^{J}\left(\hat{\delta}_j - \hat{\delta}_0\right)\left(\hat{\delta}_j - \hat{\delta}_0\right)^{\top}.$$

The expectation of this statistic is

$$E(S) = \frac{1}{J}\sum_{j=1}^{J}\mathrm{var}\!\left(\hat{\delta}_j - \hat{\delta}_0\right) + \frac{1}{J}\sum_{j=1}^{J}\left(\delta_j - \delta_0\right)\left(\delta_j - \delta_0\right)^{\top} = \Sigma_B + \frac{\sigma^2}{J}\sum_{j=1}^{J}H_j,$$

where $H = A^{-1} + A^{-1}BGB^{\top}A^{-1}$ is the top diagonal block in (8) and $H_j$ is the $r \times r$ diagonal submatrix of $H$ that corresponds to school $j$. The matrix $H$ does not depend on any parameters other than $\sigma^2$. The school-level variance matrix of the deviations $\delta_j$ is estimated by

$$\hat{\Sigma}_B = S - \frac{\hat{\sigma}^2}{J}\sum_{j=1}^{J}H_j.$$

Declaration of Conflicting Interests

The author declared no conflicts of interest with respect to the authorship and/or publication of this article.

Funding

The author disclosed receipt of the following financial support for the research and/or authorship of this article: Research and preparation of this article were supported by the Grant SEJ2006-13537 from the Spanish Ministry of Science and Technology.

References

Aitkin, M., & Longford, N. T. (1986). Statistical modelling issues in school effectiveness studies. Journal of the Royal Statistical Society, Series A, 149, 1–43.

Braun, H. I., & Zwick, R. (1993). Empirical Bayes analysis of families of survival curves: Application to the analysis of degree attainment. Journal of Educational Statistics, 18, 285–303.

Carlin, B. P., & Louis, T. A. (2000). Empirical Bayes: Past, present and future. Journal of the American Statistical Association, 95, 1286–1289.

Efron, B., & Morris, C. N. (1972). Limiting the risk of Bayes and empirical Bayes estimators, Part II: The empirical Bayes case. Journal of the American Statistical Association, 67, 130–139.

Goldstein, H. (2003). Statistical analysis of multilevel data (3rd ed.). London: Edward Arnold.

Henderson, C. R. (1975). Best linear unbiased estimation and prediction under a selection model. Biometrics, 31, 423–447.

Holland, P. W. (1986). Statistics and causal inference. Journal of the American Statistical Association, 81, 945–970.

Klein, J. P., & Moeschberger, M. L. (2004). Survival analysis: Techniques for censored and truncated data. New York, NY: Springer-Verlag.

Leckie, G., & Goldstein, H. (2009). The limitations of using school league tables to inform school choice. Journal of the Royal Statistical Society, Series A, 172, 835–851.

Longford, N. T. (2005). Missing data and small-area estimation: Modern analytical equipment for the survey statistician. New York, NY: Springer-Verlag.

Longford, N. T. (2007). On standard errors of model-based small-area estimators. Survey Methodology, 33, 69–79.

Longford, N. T. (2008). Studying human populations: An advanced course in statistics. New York, NY: Springer-Verlag.

Longford, N. T., & Rubin, D. B. (2006). Performance assessment and league tables: Comparing like with like (Working Paper 994). Barcelona: Department of Economics and Business, University Pompeu Fabra.

McLachlan, G. J., & Peel, D. (2000). Finite mixture models. New York, NY: Wiley.

Rao, J. N. K. (2003). Small area estimation. New York, NY: Wiley.

Rasbash, J., & Goldstein, H. (1994). Efficient analysis of mixed hierarchical and cross-classified random structures using a multilevel model. Journal of Educational and Behavioral Statistics, 19, 337–350.

Raudenbush, S. W. (1993). A crossed random effects model for unbalanced data with applications in cross-sectional and longitudinal research. Journal of Educational Statistics, 18, 321–349.

Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models: Applications and data analysis methods. Thousand Oaks, CA: Sage.

Robbins, H. (1955). An empirical Bayes approach to statistics. Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, 1, 157–164. Berkeley: University of California Press.

Robinson, G. K. (1991). That BLUP is a good thing: The estimation of random effects. Statistical Science, 6, 15–32.

Rubin, D. B. (2002). Multiple imputation for nonresponse in surveys (2nd ed.). New York, NY: Wiley.

Schafer, J. L. (1997). Analysis of incomplete multivariate data. London: Chapman and Hall.

Author

NICHOLAS T. LONGFORD is Director, SNTL Statistics Research and Consulting, and Academic Visitor, Universitat Pompeu Fabra, Ramon Trias Fargas 25-27, 08005 Barcelona, Spain; email: [email protected]. His research interests are multilevel analysis, small-area estimation, dealing with missing values, composite estimation, and statistical modelling and computing in general.

Manuscript received February 15, 2010

Revision received September 22, 2010

Accepted October 17, 2010
