
Robust Empirical Bayes Small Area Estimation with Density Power Divergence

Shonosuke Sugasawa

Center for Spatial Information Science, The University of Tokyo

Abstract

A two-stage normal hierarchical model called the Fay–Herriot model and the empirical Bayes estimator are widely used to provide indirect and model-based estimates of means in small areas. However, the performance of the empirical Bayes estimator might be poor when the assumed normal distribution is misspecified. In this article, we propose a simple modification using density power divergence and suggest a new robust empirical Bayes small area estimator. The mean squared error and estimated mean squared error of the proposed estimator are derived based on the asymptotic properties of the robust estimator of the model parameters. We investigate the numerical performance of the proposed method through simulations and an application to survey data.

Key words: Density power divergence; empirical Bayes estimation; Fay–Herriot model

arXiv:1702.06635v3 [stat.ME] 23 Aug 2019

1 Introduction

Direct survey estimators based only on area-specific sample data are known to yield unacceptably large standard errors if the area-specific sample sizes are small. Empirical Bayes methods are widely used to improve direct survey estimators by shrinking toward some synthetic estimator and borrowing strength. For comprehensive overviews of small area estimation, see Pfeffermann (2013) and Rao & Molina (2015).

A basic area-level model is a two-stage normal hierarchical model known as the Fay–Herriot model (Fay & Herriot, 1979), described as

$$y_i \mid \theta_i \sim N(\theta_i, D_i), \qquad \theta_i \sim N(x_i^T\beta, A) \quad (i = 1, \ldots, m), \tag{1}$$

where $y_i$ is the direct estimator of the small area mean $\theta_i$, $D_i$ is the sampling variance, assumed to be known, $x_i$ and $\beta$ are vectors of the covariates and regression coefficients, respectively, and $A$ is an unknown variance. Let $\phi = (\beta^T, A)^T$ be the unknown parameter vector in (1). Since $y_i \sim N(x_i^T\beta, A + D_i)$ under (1), $\phi$ can be estimated by maximizing the log-marginal likelihood

$$\log f(y;\phi) = -\frac{m}{2}\log(2\pi) - \frac{1}{2}\sum_{i=1}^m \log(A + D_i) - \frac{1}{2}\sum_{i=1}^m \frac{(y_i - x_i^T\beta)^2}{A + D_i}, \tag{2}$$

with $y = (y_1, \ldots, y_m)^T$. The Bayes predictor of $\theta_i$ under squared error loss is

$$\tilde\theta_i(y_i;\phi) = y_i - \frac{D_i}{A + D_i}(y_i - x_i^T\beta), \tag{3}$$

and the empirical Bayes estimator of $\theta_i$ is $\hat\theta_i = \tilde\theta_i(y_i;\hat\phi)$, where $\hat\phi$ is the maximum likelihood estimator.

The empirical Bayes estimator is useful when $y_i$ can be well explained by the auxiliary information $x_i$. However, $x_i$ is not necessarily good auxiliary information for $y_i$ in all areas; that is, $y_i$ could be very far from $x_i^T\beta$ in some areas, and we call such $y_i$ outlying observations. For such observations, the corresponding $\theta_i$ could be generated from a distribution different from the assumed one in (1); that is, the assumed distribution of $\theta_i$ could be misspecified. In this paper, we consider a situation where such outlying observations exist and focus on the following two undesirable properties in this situation:

1. The Bayes predictor (3) might over-shrink outlying $y_i$ toward $x_i^T\beta$.

2. The estimator $\hat\phi$ based on (2) would be highly influenced by outlying observations.

These problems have been addressed in studies such as Fay & Herriot (1979) and Ghosh et al. (2008), but we extend the body of knowledge on this topic by using density power divergence (Basu et al., 1998).

Our insight is based on an alternative expression for the Bayes predictor (3) in terms of the marginal likelihood (2). From Tweedie's formula (Efron, 2011), the Bayes predictor (3) can be written as

$$\tilde\theta_i(y_i;\phi) = y_i + D_i\frac{\partial}{\partial y_i}\log f(y_i;\phi). \tag{4}$$

The above expression holds as long as $y_i \mid \theta_i \sim N(\theta_i, D_i)$; that is, only the form of the marginal likelihood $f(y_i;\phi)$ changes when the distribution of $\theta_i$ is not normal as in (1). From (4), one can see that the classical empirical Bayes estimator $\hat\theta_i$ is determined by the maximization and the derivative of (2). We therefore suggest replacing the log-marginal likelihood with density power divergence, which includes Kullback–Leibler divergence as a special case. Density power divergence under (1) has a closed form, and the resulting robust Bayes predictor has a simple form. We also consider robust estimators of the model parameters and provide their asymptotic properties. Moreover, we construct an estimator of the mean squared error of the robust empirical Bayes estimator based on the parametric bootstrap and establish its asymptotic validity.

Generalized likelihoods including density power divergence have been used in Bayesian inference (Agostinelli & Greco, 2013; Ghosh & Basu, 2016; Hooker & Vidyashankar, 2014; Jewson et al., 2018; Nakagawa & Hashimoto, 2019); these works address misspecification of the assumed distribution of the observations, whereas we deal with misspecification of the assumed distribution of the unobserved areal mean $\theta_i$ in (1), which can be regarded as the prior distribution of $\theta_i$. Moreover, we consider frequentist inference for the model parameters in (1).

Ghosh et al. (2008) proposed a robust Bayes predictor in the Fay–Herriot model (1), using the influence function for $\beta$ to tackle Property 1, but did not address Property 2. Sinha & Rao (2009) proposed using Huber's (1973) $\psi$-function to derive a Bayes predictor and parameter estimators in general linear mixed models, which tackles Properties 1 and 2, but, as demonstrated in the next section, the resulting Bayes predictor has limitations in compensating for Property 1.

2 Density Power Divergence and Bayes Predictor

2.1 Density power divergence

Although the maximum likelihood estimator minimizes an empirical estimate of the Kullback–Leibler distance, it is sensitive to distributional assumptions. To overcome this, Basu et al. (1998) introduced an estimation method based on density power divergence for independently and identically distributed data. As the observations $y_i$ following the Fay–Herriot model (1) are independent but not identically distributed, we consider the following function instead of the log-likelihood function (2) (Ghosh & Basu, 2013):

$$L_\alpha(y;\phi) = \frac{1}{\alpha}\sum_{i=1}^m f_i(y_i;\phi)^\alpha - \frac{1}{1+\alpha}\sum_{i=1}^m \int f_i(t;\phi)^{1+\alpha}\,dt, \qquad \alpha \in (0,1), \tag{5}$$

where $f_i(y_i;\phi)$ is the density of $y_i \sim N(x_i^T\beta, A+D_i)$. Here, $\alpha$ is a tuning constant related to robustness. Note that

$$\lim_{\alpha\to 0}\left\{L_\alpha(y;\phi) - m\left(\frac{1}{\alpha} - 1\right)\right\} = \log f(y;\phi),$$

so, apart from an irrelevant constant, (5) is similar to the log-likelihood function when $\alpha \approx 0$.

Under model (1), the $y_i$s are independent and $y_i \sim N(x_i^T\beta, A+D_i)$, so (5) can be expressed as

$$L_\alpha(y;\phi) = \sum_{i=1}^m \left\{\frac{s_i(y_i;\phi)}{\alpha} - \frac{V_i^\alpha}{(1+\alpha)^{3/2}}\right\}, \tag{6}$$

where $V_i = \{2\pi(A+D_i)\}^{-1/2}$ and

$$s_i(y_i;\phi) = V_i^\alpha \exp\left\{-\frac{\alpha(y_i - x_i^T\beta)^2}{2(A+D_i)}\right\}.$$

We propose using the function (6) instead of the log-marginal likelihood $\log f(y;\phi)$.
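As a concrete illustration, a minimal base-R sketch of the objective (6) is given below; the function and argument names are ours for illustration, not from any released implementation.

```r
# Minimal sketch of the objective (6), assuming y (direct estimates),
# X (m x p covariate matrix), and D (known sampling variances);
# alpha in (0, 1) is the tuning constant.
dpd_objective <- function(beta, A, y, X, D, alpha) {
  mu <- as.vector(X %*% beta)
  V  <- (2 * pi * (A + D))^(-1 / 2)                         # V_i
  s  <- V^alpha * exp(-alpha * (y - mu)^2 / (2 * (A + D)))  # s_i(y_i; phi)
  sum(s / alpha - V^alpha / (1 + alpha)^(3 / 2))
}
```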

2.2 Robust Bayes predictor

We define the robust Bayes predictor $\tilde\theta^R_i$ of $\theta_i$ by replacing $\log f(y;\phi)$ in (4) with $L_\alpha(y;\phi)$. Since

$$\frac{\partial}{\partial y_i}L_\alpha(y;\phi) = \frac{\partial}{\partial y_i}\frac{s_i(y_i;\phi)}{\alpha} = -\frac{1}{A+D_i}(y_i - x_i^T\beta)\,s_i(y_i;\phi),$$

the robust Bayes predictor is

$$\tilde\theta^R_i = y_i - \frac{D_i}{A+D_i}(y_i - x_i^T\beta)\,s_i(y_i;\phi). \tag{7}$$

The shrinkage factor in (7) is $s_i(y_i;\phi)D_i/(A+D_i)$, which depends on $y_i$, whereas the shrinkage factor in the classical Bayes predictor (3) is $D_i/(A+D_i)$, which does not depend on $y_i$. Further, $\tilde\theta^R_i$ reduces to $\tilde\theta_i$ when $\alpha = 0$, since $s_i(y_i;\phi) = 1$ under $\alpha = 0$. Moreover, as $D_i \to 0$, the robust Bayes predictor $\tilde\theta^R_i$ reduces to the direct estimator $y_i$, as the classical $\tilde\theta_i$ does.
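The predictor (7) is equally direct to compute. The following sketch, again with illustrative names only, reuses the quantities above; note how the extra factor $s_i$ damps the shrinkage for large residuals.

```r
# Sketch of the robust Bayes predictor (7): the classical shrinkage
# D/(A+D) is multiplied by s_i, which tends to 0 for outlying y_i.
robust_predictor <- function(y, X, D, beta, A, alpha) {
  mu <- as.vector(X %*% beta)
  V  <- (2 * pi * (A + D))^(-1 / 2)
  s  <- V^alpha * exp(-alpha * (y - mu)^2 / (2 * (A + D)))
  y - D / (A + D) * (y - mu) * s
}
```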

2.3 Comparison with related robust Bayes predictors

For related robust Bayes predictors under model (1), Ghosh et al. (2008) proposed the predictor

$$\tilde\theta^G_i = y_i - \frac{D_i v_i(A)^{1/2}}{A+D_i}\,\psi_K\!\left\{\frac{y_i - x_i^T\tilde\beta(A)}{v_i(A)^{1/2}}\right\}, \tag{8}$$

where

$$\tilde\beta(A) = \left(\sum_{i=1}^m \frac{x_ix_i^T}{A+D_i}\right)^{-1}\left(\sum_{i=1}^m \frac{x_iy_i}{A+D_i}\right), \qquad v_i(A) = A + D_i - x_i^T\left(\sum_{i=1}^m \frac{x_ix_i^T}{A+D_i}\right)^{-1}x_i,$$

and $\psi_K(u) = u\min(1, K/|u|)$ is Huber's $\psi$-function with a tuning constant $K > 0$ that plays a role similar to that of $\alpha$. Similarly, Sinha & Rao (2009) used Huber's $\psi$-function to modify an estimating equation for $\theta_i$ and suggested a robust predictor $\tilde\theta^{SR}_i$ as the solution of the equation

$$D_i^{-1/2}\,\psi_K\!\left\{D_i^{-1/2}(y_i - \tilde\theta^{SR}_i)\right\} - A^{-1/2}\,\psi_K\!\left\{A^{-1/2}(\tilde\theta^{SR}_i - x_i^T\beta)\right\} = 0. \tag{9}$$

Consider an observation $y_i$ that is very different from the grand (prior) mean $x_i^T\beta$. For such an observation, the auxiliary information $x_i$ would not be useful for improving the direct estimator $y_i$ through the model (1), so it would be better to keep $y_i$ unshrunk. To examine this shrinkage property, it is useful to check the behavior of a Bayes predictor $\eta_i$ under large $|y_i - x_i^T\beta|$. Specifically, we consider $|\eta_i - y_i|$ as $|y_i - x_i^T\beta| \to \infty$ with fixed values of the parameters and $D_i$. Ideally, $|\eta_i - y_i| \to 0$, which means that the Bayes predictor $\eta_i$ does not shrink $y_i$ under large $|y_i - x_i^T\beta|$. This property has been addressed in the context of small area estimation (Datta & Lahiri, 1995) as well as signal estimation (Carvalho et al., 2010). For the classical Bayes predictor $\tilde\theta_i$ in (3), $|\tilde\theta_i - y_i| \to \infty$ as $|y_i - x_i^T\beta| \to \infty$, meaning that over-shrinkage occurs. For $\tilde\theta^G_i$, $|\tilde\theta^G_i - y_i| \to KD_iv_i(A)^{1/2}/(A+D_i)$. Moreover, if $|y_i - \tilde\theta^{SR}_i| \to 0$ as $|y_i - x_i^T\beta| \to \infty$, the left-hand side of (9) would reduce to $-A^{-1/2}K$, a contradiction, so $|y_i - \tilde\theta^{SR}_i| \not\to 0$. On the contrary, for the proposed robust Bayes predictor $\tilde\theta^R_i$ in (7), $|\tilde\theta^R_i - y_i| \to 0$ as $|y_i - x_i^T\beta| \to \infty$ holds whenever $\alpha > 0$, since $(y_i - x_i^T\beta)s_i(y_i;\phi) \to 0$ as $|y_i - x_i^T\beta| \to \infty$.

When there are no random effects, that is, $A = 0$, the conventional Bayes predictor (3) reduces to $x_i^T\beta$. However, the robust Bayes predictors $\tilde\theta^R_i$, $\tilde\theta^G_i$, and $\tilde\theta^{SR}_i$ do not have this property, which might be a drawback, as a price paid for robustness.
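A quick numerical check of these limits, with hypothetical parameter values, illustrates the contrast between (3) and (7):

```r
# Hypothetical values: prior mean x^T beta = 0, A = D = 0.5, alpha = 0.3.
beta <- 0; A <- 0.5; D <- 0.5; alpha <- 0.3
y_out   <- 10                                    # outlying direct estimate
theta_R <- robust_predictor(y_out, matrix(1), D, beta, A, alpha)
theta_B <- y_out - D / (A + D) * (y_out - beta)  # classical predictor (3)
# theta_R is approximately 10 (essentially no shrinkage), whereas
# theta_B = 5 shrinks the outlier halfway toward the prior mean.
```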


3 Robust Empirical Bayes Estimator and Mean Squared Error

3.1 Robust parameter estimation

We define the robust estimator $\hat\phi_\alpha$ of $\phi$ as $\hat\phi_\alpha = \operatorname{argmax}_\phi L_\alpha(y;\phi)$, where $L_\alpha(y;\phi)$ is given in (6). The robust estimator $\hat\phi_\alpha$ then satisfies

$$\frac{\partial L_\alpha}{\partial\beta} = \sum_{i=1}^m \frac{x_is_i(y_i;\phi)(y_i - x_i^T\beta)}{A+D_i} = 0,$$

$$2\frac{\partial L_\alpha}{\partial A} = \sum_{i=1}^m \left\{\frac{(y_i - x_i^T\beta)^2s_i(y_i;\phi)}{(A+D_i)^2} - \frac{s_i(y_i;\phi)}{A+D_i} + \frac{\alpha V_i^\alpha}{(\alpha+1)^{3/2}(A+D_i)}\right\} = 0. \tag{10}$$

We adopt a Newton–Raphson algorithm for solving these estimating equations, whose derivatives are given in the proof of Theorem 2 in the Supplementary Material. A reasonable starting point is the maximum likelihood estimate. By substituting the robust estimator $\hat\phi_\alpha$ into the robust Bayes predictor (7), we obtain the robust empirical Bayes estimator $\hat\theta^R_i = \tilde\theta^R_i(y_i, \hat\phi_\alpha)$.
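The estimating equations (10) are the score of (6), so an alternative to Newton–Raphson is to maximize (6) directly. The sketch below does this with base R's optim(), parametrizing $A$ on the log scale to keep the variance positive; it is a simplified stand-in for the Newton–Raphson scheme described above, not the author's implementation.

```r
# Sketch: maximize (6) with optim(); start = c(beta_ML, log(A_ML)),
# following the suggestion to start from the maximum likelihood fit.
fit_dpd <- function(y, X, D, alpha, start) {
  obj <- function(par) {
    p <- length(par)
    dpd_objective(par[-p], exp(par[p]), y, X, D, alpha)
  }
  out <- optim(start, obj, control = list(fnscale = -1))  # fnscale=-1: maximize
  p <- length(out$par)
  list(beta = out$par[-p], A = exp(out$par[p]))
}
```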

3.2 Selection of tuning parameter

The parameter $\alpha$ is related to robustness but is not easy to interpret. Following Ghosh et al. (2008), we consider selecting $\alpha$ based on the mean squared error of the robust Bayes predictor (7), which enables us to specify $\alpha$ in an interpretable way. The mean squared error formula is given in the following theorem.

Theorem 1. Under model (1), $E\{(\tilde\theta^R_i - \theta_i)^2\} = g_{1i}(A) + g_{2i}(A)$, where

$$g_{1i}(A) = \frac{AD_i}{A+D_i}, \qquad g_{2i}(A) = \frac{D_i^2}{A+D_i}\left\{\frac{V_i^{2\alpha}}{(2\alpha+1)^{3/2}} - \frac{2V_i^\alpha}{(\alpha+1)^{3/2}} + 1\right\},$$

and $g_{2i}(A)$ is increasing in $\alpha$.

The mean squared error of the classical Bayes predictor (3) corresponds to $g_{1i}(A)$, so the excess mean squared error of $\tilde\theta^R_i$ over $\tilde\theta_i$ is $g_{2i}(A)$, which vanishes when $\alpha = 0$. Therefore, there is a trade-off between the robustness of $\tilde\theta^R_i$ and the mean squared error evaluated under model (1). We define

$$\mathrm{Ex}(\alpha) = 100\times\sum_{i=1}^m g_{2i}(\hat A_\alpha)\Big/\sum_{i=1}^m g_{1i}(\hat A_\alpha)$$

as the percentage relative excess mean squared error, where $\hat A_\alpha$ is the robust estimate of $A$ from (10) under the given $\alpha$. We propose selecting $\alpha$ such that $\mathrm{Ex}(\alpha)$ does not exceed a user-specified percentage $c\%$; in other words, we compute $\alpha^\ast$ to satisfy $\mathrm{Ex}(\alpha^\ast) = c$. This has a unique solution since $\mathrm{Ex}(\alpha)$ increases in $\alpha$ from Theorem 1. We adopt the bisection method (Burden & Faires, 2010, §2) to compute $\alpha^\ast$. In practice, we first compute $\alpha^\ast$ for a specified value of $c$, and all the estimation procedures are conducted with $\alpha = \alpha^\ast$. For theoretical simplicity, we assume that $\alpha$ is known in §3.3, §3.4 and §3.5, but the selection procedure is used in all the numerical examples given in §4 to investigate its possible effect.
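A sketch of this selection rule in base R is shown below, solving $\mathrm{Ex}(\alpha) = c$ with uniroot(); it assumes $\mathrm{Ex}$ crosses $c$ on the chosen interval and reuses the hypothetical fit_dpd() above.

```r
# Sketch of the tuning rule of Section 3.2: choose alpha so that
# Ex(alpha) = c (percent), using g1i and g2i from Theorem 1.
select_alpha <- function(y, X, D, start, c = 5, upper = 0.95) {
  ex_minus_c <- function(alpha) {
    A  <- fit_dpd(y, X, D, alpha, start)$A        # robust estimate of A
    V  <- (2 * pi * (A + D))^(-1 / 2)
    g1 <- A * D / (A + D)
    g2 <- D^2 / (A + D) * (V^(2 * alpha) / (2 * alpha + 1)^(3 / 2) -
                           2 * V^alpha / (alpha + 1)^(3 / 2) + 1)
    100 * sum(g2) / sum(g1) - c
  }
  uniroot(ex_minus_c, interval = c(1e-3, upper))$root
}
```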

3.3 Asymptotic properties of the robust estimators

We consider the asymptotic properties of the robust estimator under model (1). To this end, we assume the following regularity conditions:

1. $0 < D_\ast \le \min_{1\le i\le m}D_i \le \max_{1\le i\le m}D_i \le D^\ast < \infty$, where $D_\ast$ and $D^\ast$ do not depend on $m$;

2. $\max_{1\le i\le m}x_i^T(X^TX)^{-1}x_i = O(m^{-1})$, where $X = (x_1, \ldots, x_m)^T$;

3. $X^TX/m$ converges to a positive definite matrix as $m\to\infty$.

Similar conditions are used by Prasad & Rao (1990). Since the derivatives in (10) have zero expectation under model (1), we obtain the following result.

Theorem 2. Under Conditions 1–3, $\hat\beta_\alpha$ and $\hat A_\alpha$ are asymptotically independent and distributed as $N(\beta, m^{-1}J_\beta^{-1}K_\beta J_\beta^{-1})$ and $N(A, K_A/mJ_A^2)$, respectively, where

$$J_\beta = \frac{1}{m(\alpha+1)^{3/2}}\sum_{i=1}^m \frac{V_i^\alpha x_ix_i^T}{A+D_i}, \qquad J_A = \frac{1}{2m}\sum_{i=1}^m \frac{V_i^\alpha(\alpha^2+2)}{(A+D_i)^2(\alpha+1)^{5/2}},$$

$$K_\beta = \frac{1}{m(2\alpha+1)^{3/2}}\sum_{i=1}^m \frac{V_i^{2\alpha}x_ix_i^T}{A+D_i}, \qquad K_A = \frac{1}{m}\sum_{i=1}^m \frac{V_i^{2\alpha}}{(A+D_i)^2}\left\{\frac{2(2\alpha^2+1)}{(2\alpha+1)^{5/2}} - \frac{\alpha^2}{(\alpha+1)^3}\right\}.$$

When $\alpha = 0$, the asymptotic covariance matrix of $\hat\beta_\alpha$ and the asymptotic variance of $\hat A_\alpha$ reduce to the asymptotic covariance matrix and variance of the maximum likelihood estimators of $\beta$ and $A$ given by Datta & Lahiri (2000), because (6) reduces to the log-likelihood (2).

3.4 Mean squared error of the robust empirical Bayes estimator

To evaluate the risk of the estimator $\hat\theta^R_i$, we consider the mean squared error $M_i = E\{(\hat\theta^R_i - \theta_i)^2\}$, where the expectation is taken with respect to the joint distribution of the $\theta_i$s and $y_i$s under the assumed model (1). The mean squared error can be regarded as the integrated Bayes risk and is a standard measure of risk in small area estimation (Rao & Molina, 2015).

Since $\hat\theta^R_i$ depends on the estimator $\hat\phi_\alpha$, the mean squared error $M_i$ accounts for the additional variability due to $\hat\phi_\alpha$. It is therefore difficult to evaluate $M_i$ analytically, and a second-order approximation of $M_i$ is customarily used. Following this convention, we provide such an approximation in the following theorem.

Theorem 3. Under Conditions 1–3,

$$M_i = g_{1i}(A) + g_{2i}(A) + \frac{g_{3i}(A)}{m} + \frac{g_{4i}(A)}{m} + \frac{2g_{5i}(A)}{m} + o(m^{-1}), \tag{11}$$

where $g_{1i}(A)$ and $g_{2i}(A)$ are given in Theorem 1, and

$$g_{3i}(A) = \frac{D_i^2V_i^{2\alpha}}{B_i^2(2\alpha+1)^{3/2}}\,x_i^TJ_\beta^{-1}K_\beta J_\beta^{-1}x_i, \qquad g_{4i}(A) = \frac{D_i^2V_i^{2\alpha}K_A}{B_i^3(2\alpha+1)^{7/2}J_A^2}\left(\alpha^4 - \frac{1}{2}\alpha^2 + 1\right),$$

$$\begin{aligned}
g_{5i}(A) = {}& \frac{\alpha D_i^2\,x_i^TJ_\beta^{-1}K_\beta J_\beta^{-1}x_i}{2B_i^4}\,(3B_iC_{11} - \alpha C_{21}) + \frac{D_i^2K_A}{24B_i^6J_A^2}\left\{3\alpha B_i^2C_{21} + (\alpha-2)(3\alpha+8)C_{11}\right\} \\
& + \frac{D_i^2\,x_i^TJ_\beta^{-1}x_i}{B_i^4}\,(B_iC_{12} - \alpha C_{22}) + \frac{D_i^2}{2B_i^6J_A}\left\{\alpha C_{32} - 2B_iC_{22} + (2-\alpha)B_i^2C_{12}\right\} \\
& + \frac{D_i^2}{2B_i^4}\left\{b_A - \frac{\alpha V_i^\alpha}{(\alpha+1)^{3/2}B_iJ_A}\right\}\left\{(2-\alpha)B_iC_{11} - \alpha C_{21}\right\}.
\end{aligned}$$

Here, $B_i = A + D_i$, $b_A = \lim_{m\to\infty}mE(\hat A_\alpha - A)$ is the first-order bias of $\hat A_\alpha$, and

$$C_{jk} = (2j-1)!!\,B_i^j\left\{V_i^{k\alpha}(k\alpha+1)^{-j-1/2} - V_i^{k\alpha+\alpha}(k\alpha+\alpha+1)^{-j-1/2}\right\},$$

where $(2j-1)!! = (2j-1)(2j-3)\cdots 1$.


The derivation is given in the Supplementary Material. It should be noted that the approximation formula (11) assumes known $\alpha$, so it might differ under estimated (selected) $\alpha$. The approximation formula (11) reduces to the mean squared error of the classical empirical Bayes estimator given by Datta & Lahiri (2000) and Datta et al. (2005), since $C_{jk} = 0$ and $g_{5i}(A) = 0$ under $\alpha = 0$.

3.5 Estimation of the mean squared error

Because the approximation of the mean squared error given in Theorem 3 depends on the unknown parameter $A$, it cannot be used directly in practice. We use a second-order unbiased estimator of the mean squared error: an estimator $\hat T$ of $T$ is called second-order unbiased if $E(\hat T) = T + o(m^{-1})$. As shown in Theorem 3, $g_{3i}(A)$, $g_{4i}(A)$, and $g_{5i}(A)$ are smooth functions, so $g_{3i}(\hat A_\alpha)$, $g_{4i}(\hat A_\alpha)$, and $g_{5i}(\hat A_\alpha)$ are second-order unbiased. On the contrary, $g_{1i}(\hat A_\alpha)$ and $g_{2i}(\hat A_\alpha)$ may have considerable bias, since $g_{1i}(A)$ and $g_{2i}(A)$ are $O(1)$. Since the derivation of these biases and of bias-corrected estimators of these terms requires tedious algebra, we use the parametric bootstrap method in a similar way to Butar & Lahiri (2003). We define the bootstrap estimator

$$\hat M_i = 2g_{12i}(\hat A_\alpha) - \frac{1}{B}\sum_{b=1}^B g_{12i}(\hat A_\alpha^{(b)}) + \frac{g_{3i}(\hat A_\alpha)}{m} + \frac{g_{4i}(\hat A_\alpha)}{m} + \frac{2g_{5i}(\hat A_\alpha)}{m}, \tag{12}$$

where $g_{12i}(A) = g_{1i}(A) + g_{2i}(A)$ and $\hat A_\alpha^{(b)}$ is the bootstrap estimator based on the parametric bootstrap samples $y_1^{(b)}, \ldots, y_m^{(b)}$ generated from

$$y_i^{(b)} = x_i^T\hat\beta_\alpha + v_i^{(b)} + \varepsilon_i^{(b)}, \qquad v_i^{(b)} \sim N(0, \hat A_\alpha), \qquad \varepsilon_i^{(b)} \sim N(0, D_i).$$

Following Chang & Hall (2015), we obtain the following theorem.

Theorem 4. Assume Conditions 1–3 and let M †i be the ideal version of Mi obtained

by taking B =∞. Define Mi = M †i +Ui, where Ui denotes an error term arising from

doing only a finite number of bootstrap replications. Then, E(M †i ) −Mi = o(m−1)

and Ui = Op{(mB)−1/2}.

From Theorem 4, the ideal version $\hat M_i^\dagger$ of the estimator (12) is second-order unbiased. Moreover, Theorem 4 implies that if $B = O(m^{1+\delta})$ for some small $\delta > 0$, the error from the finite number of bootstrap iterations is $o_p(m^{-1})$, so the estimator $\hat M_i$ performs similarly to the ideal estimator $\hat M_i^\dagger$. However, this theoretical result assumes known $\alpha$, so the bootstrap estimator (12) is not necessarily justified under estimated $\alpha$.

Although we adopt an additive form of bias correction in (12) following Butar & Lahiri (2003), other forms of bias correction have been proposed, such as those of Hall & Maiti (2006). Concerning $g_{5i}(\hat A_\alpha)$, we must compute an estimate of $b_A$, the first-order bias of $\hat A_\alpha$, which can be calculated from the parametric bootstrap samples. Alternatively, we may use a parametric bootstrap to compute $g_{5i}(\hat A_\alpha)$ itself. As shown in the proof of Theorem 3, $E\{(\hat\theta^R_i - \tilde\theta^R_i)(\tilde\theta^R_i - \theta_i)\} = m^{-1}g_{5i}(A) + o(m^{-1})$, so we can use

$$B^{-1}\sum_{b=1}^B\left\{\tilde\theta^R_i(y_i^{(b)}, \hat\phi_\alpha^{(b)}) - \tilde\theta^R_i(y_i^{(b)}, \hat\phi_\alpha)\right\}\left\{\tilde\theta^R_i(y_i^{(b)}, \hat\phi_\alpha) - \tilde\theta_i(y_i^{(b)}, \hat\phi_\alpha)\right\}$$

instead of $m^{-1}g_{5i}(A)$, where $\hat\phi_\alpha^{(b)}$ is the parametric bootstrap estimator. Similarly to Theorem 4, we can evaluate the error from a finite number of bootstrap iterations; the evaluation is analogous and the detailed proof is omitted.
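To fix ideas, the sketch below implements the bias-corrected leading term of (12), $2g_{12i}(\hat A_\alpha) - B^{-1}\sum_b g_{12i}(\hat A_\alpha^{(b)})$; the $g_{3i}$–$g_{5i}$ corrections of Theorem 3 are omitted, so this is an abridged illustration rather than the full estimator.

```r
# g12 = g1 + g2 from Theorem 1, evaluated at a given A.
g12 <- function(A, D, alpha) {
  V <- (2 * pi * (A + D))^(-1 / 2)
  A * D / (A + D) + D^2 / (A + D) *
    (V^(2 * alpha) / (2 * alpha + 1)^(3 / 2) -
     2 * V^alpha / (alpha + 1)^(3 / 2) + 1)
}
# Parametric-bootstrap bias correction of g12 (leading term of (12)).
boot_mse12 <- function(y, X, D, alpha, start, B = 1000) {
  fit <- fit_dpd(y, X, D, alpha, start)
  mu  <- as.vector(X %*% fit$beta)
  gb  <- replicate(B, {
    yb <- mu + rnorm(length(y), 0, sqrt(fit$A)) +
               rnorm(length(y), 0, sqrt(D))       # bootstrap sample y^(b)
    g12(fit_dpd(yb, X, D, alpha, start)$A, D, alpha)
  })
  2 * g12(fit$A, D, alpha) - rowMeans(gb)
}
```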

4 Examples

4.1 Simulation studies

We first investigate the estimation accuracy of the proposed robust estimator together with existing estimators. We consider the Fay–Herriot model

$$y_i = \theta_i + \varepsilon_i, \qquad \theta_i = \beta_0 + \beta_1x_i + A^{1/2}u_i, \qquad i = 1, \ldots, m,$$

where $m = 30$, $\beta_0 = 0$, $\beta_1 = 2$, $A = 0{\cdot}5$, and $\varepsilon_i \sim N(0, D_i)$. The auxiliary variables $x_i$ are generated from the uniform distribution on $(0, 1)$. Further, we divide the $m$ areas into five groups of equal size and set the same value of $D_i$ within each group. The group $D_i$ pattern is $(0{\cdot}2, 0{\cdot}4, 0{\cdot}6, 0{\cdot}8, 1{\cdot}0)$. For the distribution of $u_i$, we adopt the mixture structure $u_i \sim (1-\xi)N(0, 1) + \xi N(0, 10^2)$, where $\xi$ determines the degree of misspecification of the assumed distribution (contamination by outliers). We consider three scenarios: (I) $\xi = 0$, (II) $\xi = 0{\cdot}15$, and (III) $\xi = 0{\cdot}30$. Note that, in scenarios (II) and (III), some observations have very large residuals, and the auxiliary information $x_i$ would not be useful for such outlying observations.
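For concreteness, one replicate of this design can be generated as follows (scenario (II) shown; this is a sketch of the stated design, not the simulation program used for Table 1):

```r
# One replicate under scenario (II): xi = 0.15 contamination.
set.seed(1)
m  <- 30
x  <- runif(m)
D  <- rep(c(0.2, 0.4, 0.6, 0.8, 1.0), each = m / 5)
xi <- 0.15
u  <- ifelse(runif(m) < xi, rnorm(m, 0, 10), rnorm(m, 0, 1))
theta <- 0 + 2 * x + sqrt(0.5) * u   # beta0 = 0, beta1 = 2, A = 0.5
y  <- theta + rnorm(m, 0, sqrt(D))
```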

We estimate $\theta_i$ by using the proposed robust empirical Bayes estimator with density power divergence. We use two inflation rates, $c = 1$ and $c = 5$, and $\alpha$ is selected following the procedure given in §3.2. We adopt four alternative methods: the classical empirical Bayes estimator; the robust Bayes estimator defined in (9) with the model parameters estimated by the robust estimating equations proposed by Sinha & Rao (2009); the same robust Bayes estimator with the model parameters estimated by maximum likelihood; and the robust empirical Bayes estimator (8) proposed by Ghosh et al. (2008) with the maximum likelihood estimator of the model parameters. Following Sinha & Rao (2009), we set $K = 1{\cdot}345$ in Huber's $\psi$-function in equation (9). A suitable value of $K = K_i$ in (8) is selected in the same way as in Ghosh et al. (2008) with a 5% inflation rate.

We compute the mean squared errors of these estimators based on 20000 replicates. Table 1 reports the mean squared errors averaged within the same groups, together with estimated Monte Carlo errors in parentheses. From the reported values, the Monte Carlo errors seem negligibly small compared with the mean squared errors. Since the normality assumption of the standard empirical Bayes method is correct in scenario (I), it is natural that the empirical Bayes method provides smaller mean squared errors than the other methods, but the performance of some robust methods, including the proposed method, seems comparable with that of the standard method. On the other hand, there are outlying observations in scenarios (II) and (III), under which the proposed methods tend to produce smaller mean squared errors than the other methods, especially for groups with large sampling variances. In particular, the performance of the proposed method with $c = 5$ is better than that with $c = 1$ in these scenarios, since the proposed method becomes more robust with larger $c$. However, the performance of the proposed method with $c = 5$ is worse than that with $c = 1$ in scenario (I), which would be a reasonable price for the stronger robustness confirmed in scenarios (II) and (III). In the Supplementary Material, we provide additional results for other scenarios for $u_i$, such as heavy-tailed or skewed distributions.

Table 1: Mean squared errors averaged within the same group. Estimated Monte Carlo errors are reported in parentheses. All values are multiplied by 1000. DEB1: density power divergence with 1% inflation rate; DEB2: density power divergence with 5% inflation rate; EB: empirical Bayes; REB1: robust empirical Bayes of Sinha & Rao (2009); REB2: robust Bayes of Sinha & Rao (2009) with maximum likelihood; GEB: robust empirical Bayes of Ghosh et al. (2008).

Scenario  Group  DEB1      DEB2      EB        REB1        REB2      GEB
(I)       1      159(0·3)  159(0·3)  156(0·3)  173(0·3)    158(0·3)  158(0·3)
          2      256(0·4)  258(0·4)  252(0·4)  280(0·5)    258(0·4)  309(0·5)
          3      320(0·5)  326(0·6)  316(0·5)  343(0·6)    323(0·6)  369(0·7)
          4      356(0·6)  366(0·6)  352(0·6)  378(0·7)    359(0·6)  393(0·7)
          5      383(0·7)  397(0·7)  378(0·7)  400(0·7)    382(0·6)  412(0·7)
(II)      1      186(0·3)  179(0·3)  192(0·3)  328(5·5)    189(0·3)  189(0·3)
          2      353(0·6)  327(0·6)  372(0·6)  1017(12·8)  362(0·6)  364(0·6)
          3      506(0·9)  458(0·8)  545(1·0)  1833(16·9)  523(0·9)  529(0·9)
          4      640(1·2)  571(1·1)  701(1·3)  2702(21·3)  667(1·2)  679(1·2)
          5      771(1·5)  678(1·3)  858(1·6)  3594(24·7)  810(1·5)  825(1·5)
(III)     1      194(0·3)  190(0·3)  196(0·3)  223(2·9)    195(0·3)  195(0·3)
          2      382(0·6)  367(0·6)  389(0·7)  517(6·2)    385(0·6)  385(0·6)
          3      562(1·0)  534(0·9)  578(1·0)  949(9·9)    568(1·0)  568(1·0)
          4      739(1·2)  696(1·2)  764(1·3)  1518(14·4)  748(1·3)  747(1·2)
          5      900(1·5)  840(1·5)  937(1·6)  2253(18·5)  911(1·5)  913(1·5)

We next investigate the finite-sample performance of the bootstrap estimator $\hat M_i$ of the mean squared error. We adopt the same data-generating model with the three scenarios for the distribution of $u_i$ as in the previous study, with $m = 20$. We also consider the naive estimators of the mean squared error, $\hat M_i^{(n1)}$ and $\hat M_i^{(n2)}$, obtained by replacing $A$ with $\hat A_\alpha$ in the mean squared error formulas given in Theorems 1 and 3, respectively. Note that $\hat M_i^{(n1)}$ ignores the variability of the estimation of the model parameters, and $\hat M_i^{(n2)}$ ignores the bias of $g_{1i}(\hat A_\alpha) + g_{2i}(\hat A_\alpha)$. The motivation for using these estimators together with $\hat M_i$ is to clarify the importance of second-order unbiasedness in finite samples.

We estimate the true values of the mean squared error $M_i$ of the robust empirical Bayes estimator in advance, based on 5000 simulated data sets. The relative bias and the square root of the relative mean squared error of the estimator $\hat M_i$ are

$$\mathrm{RBias}(\hat M_i) = 100\times E(\hat M_i - M_i)/M_i, \qquad \mathrm{RRMSE}(\hat M_i) = 100\times\left[E\{(\hat M_i - M_i)^2\}/M_i^2\right]^{1/2}.$$

These values are computed as averages over 2000 simulation runs with bootstrap sample size 1000; they are also averaged within the same groups.

Table 2 reports the relative biases and square roots of the relative mean squared errors of $\hat M_i$, $\hat M_i^{(n1)}$, and $\hat M_i^{(n2)}$. The bootstrap estimator $\hat M_i$ outperforms the other estimators owing to the second-order unbiasedness established in Theorem 4. The crude estimators $\hat M_i^{(n1)}$ and $\hat M_i^{(n2)}$ seem undesirable in practice since they have serious negative biases.

Table 2: Relative bias and square root of the relative mean squared error for the estimators of the mean squared error.

                  RBias                     RRMSE
Scenario  Group   M^(n1)  M^(n2)  M̂        M^(n1)  M^(n2)  M̂
(I)       1       −30·9   −14·2   1·3       42·2    19·1    18·3
          2       −34·6   −20·0   −4·5      46·9    31·3    31·0
          3       −37·3   −23·5   −8·0      49·8    36·9    36·5
          4       −34·8   −25·9   −9·2      50·1    42·9    43·4
          5       −36·8   −26·3   −9·5      51·7    43·5    44·4
(II)      1       −8·3    −4·4    4·3       20·6    13·0    11·4
          2       −10·9   −6·7    4·1       25·7    20·2    18·3
          3       −15·1   −10·6   1·3       29·8    24·8    22·0
          4       −12·5   −9·5    4·4       31·7    29·0    28·1
          5       −16·2   −12·4   2·1       34·1    30·8    29·4
(III)     1       −2·8    −1·7    3·8       12·3    9·1     9·0
          2       −5·2    −3·8    3·9       16·0    13·6    12·5
          3       −7·1    −5·4    4·1       18·5    16·2    14·9
          4       −6·4    −5·3    6·0       20·4    19·1    18·7
          5       −9·7    −8·3    4·3       22·6    20·9    19·7


4.2 Fresh milk expenditure data

We consider an application to fresh milk expenditure data from the U.S. Bureau of Labor Statistics, which were used in Arora & Lahiri (1997) and You & Chapman (2006). In the data set, the estimated values $y_i$ of the average expenditure on fresh milk for 1989 are available for 43 areas, together with the sampling variances $D_i$. Following Arora & Lahiri (1997), we consider the Fay–Herriot model (1) with $x_i^T\beta = \beta_j$ $(j = 1, \ldots, 4)$ if the $i$th area belongs to the $j$th region. The four regions are $R_1 = \{1, \ldots, 7\}$, $R_2 = \{8, \ldots, 14\}$, $R_3 = \{15, \ldots, 25\}$, and $R_4 = \{26, \ldots, 43\}$.

Figure 1 shows the scatterplot of $y_i$ together with the maximum likelihood estimates of $\beta_1, \ldots, \beta_4$, suggesting that there are some outliers in regions $R_1$ and $R_2$. To see this, we compute the standardized residuals

$$r_i = (\hat A + D_i)^{-1/2}\left\{y_i - \sum_{j=1}^4 \hat\beta_j I(i \in R_j)\right\}, \qquad i = 1, \ldots, m.$$

When the Fay–Herriot model (1) is correctly specified, the distribution of $r_i$ is close to standard normal. However, as shown in Table 4, the absolute values of $r_i$ are large in some areas.
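In code, with hypothetical objects beta_hat (the four fitted region means), A_hat, and an integer vector region mapping each area to its region, the residuals are one line:

```r
# Sketch of the standardized residuals r_i; all object names are
# illustrative (region takes values 1-4 for the 43 areas).
r <- (y - beta_hat[region]) / sqrt(A_hat + D)
```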

We estimate the model parameters by maximum likelihood, by the robust estimating equations of Sinha & Rao (2009), and by the proposed density power divergence method, that is, the solution of (10), with 1% and 5% inflation rates. Table 3 shows that the estimates of $\beta_3$ and $\beta_4$ are similar across the four methods, whereas those of $\beta_1$, $\beta_2$, and $A$ are not, owing to the outlying areas in regions $R_1$ and $R_2$.

To estimate $\theta_i$, we adopt the classical empirical Bayes estimator $\hat\theta^{EB}_i$, the proposed robust empirical Bayes estimator $\hat\theta^R_i$ with a 5% inflation rate, and the robust empirical Bayes estimator $\hat\theta^{SR}_i$ proposed by Sinha & Rao (2009). We use $\hat M_i$ given in (12), with $B = 1000$, to estimate the mean squared error of $\hat\theta^R_i$. We denote the mean squared errors of $\hat\theta^{EB}_i$ and $\hat\theta^{SR}_i$ by $M^{EB}_i$ and $M^{SR}_i$, respectively; these are estimated from the result of Datta & Lahiri (2000) for $M^{EB}_i$ and with the 'saeRobust' package in R for $M^{SR}_i$. Table 4 shows that the differences between $\hat\theta^{EB}_i$ and $\hat\theta^R_i$ are large in areas with large absolute values of $r_i$. A similar phenomenon can be observed in the relationship between $\hat M^{EB}_i$ and $\hat M_i$. On the contrary, the values of $\hat\theta^{SR}_i$ and $\hat M^{SR}_i$ differ from the others, which might come about from the lower estimate of $A$ reported in Table 3.

Table 3: Estimates of the model parameters from the four methods. Standard errors are shown in parentheses. The estimates and standard errors of A are multiplied by 100.

Method                                     β1          β2          β3          β4          A
Maximum likelihood                         0·97(0·07)  1·10(0·07)  1·19(0·06)  0·73(0·04)  1·55(0·68)
Robust maximum likelihood                  1·01(0·06)  1·18(0·07)  1·19(0·05)  0·73(0·03)  0·80(0·53)
Density power divergence (1% inflation)    0·97(0·07)  1·12(0·07)  1·19(0·06)  0·73(0·04)  1·50(0·65)
Density power divergence (5% inflation)    0·98(0·06)  1·15(0·07)  1·19(0·06)  0·73(0·04)  1·35(0·60)

Table 4: Values of the empirical Bayes estimator, the proposed robust empirical Bayes estimator, and the robust empirical Bayes estimator of Sinha & Rao (2009), with their estimates of the mean squared errors. The values of M̂^EB, M̂, and M̂^SR are multiplied by 100.

area  region  y_i    r_i     θ̂^EB   θ̂^R   θ̂^SR   M̂^EB   M̂     M̂^SR
1     1       1·10   0·64    1·02   1·02   1·03   1·35   1·35   0·81
4     1       0·63   −2·05   0·78   0·76   0·91   0·85   0·85   4·70
5     1       0·75   −1·25   0·86   0·87   0·92   0·96   0·96   4·44
9     2       1·41   1·48    1·21   1·24   1·23   1·42   1·40   4·19
11    2       0·62   −3·01   0·80   0·73   1·07   0·77   0·78   5·06
12    2       1·46   1·54    1·20   1·24   1·23   1·63   1·62   5·66
20    3       1·29   0·48    1·23   1·22   1·22   1·31   1·32   0·77
25    3       1·19   −0·01   1·19   1·19   1·19   0·81   0·84   0·56
31    4       0·89   0·63    0·76   0·76   0·75   1·54   1·63   0·78
37    4       0·44   −1·84   0·54   0·54   0·61   0·64   0·65   3·59

[Figure 1: Scatterplot of $y_i$ (expenditure) against area index, with the maximum likelihood estimates of $\beta_1, \ldots, \beta_4$.]

5 Final Remarks

The proposed method is recommended over existing methods especially when there exist outlying observations, as shown in our numerical studies. Since we established the asymptotic properties of the proposed method only under the correctly specified model, investigating its asymptotic properties under general model misspecification would be interesting future work. Although this study focused on the Fay–Herriot model, which is standard in small area estimation, the nested error regression model (Battese et al., 1988) would be more useful when unit-level data are available. While several robust methods have already been proposed (Chambers et al., 2014; Chambers & Tzavidis, 2006; Sinha & Rao, 2009), the extension of the proposed method to unit-level data would be an interesting research direction. Extending the proposed idea to non-normal models based on the natural exponential family (Ghosh & Maiti, 2004) would also be worthwhile. Finally, several forms of generalized likelihood other than density power divergence have been proposed, such as γ-divergence (Fujisawa & Eguchi, 2008). The main advantage of density power divergence in this context is its mathematical simplicity: as presented in §2, the robust Bayes predictor has a simple form. A detailed comparison among generalized likelihood methods is left to a future study.

Acknowledgments

The author was supported by the Japan Society for the Promotion of Science (KAKENHI) grant number 18K12757.

Appendix

Proof of Theorem 1. Since $\tilde\theta_i = E(\theta_i \mid y_i)$,

$$E\{(\tilde\theta^R_i - \theta_i)^2\} = E\{(\tilde\theta_i - \theta_i)^2\} + E\{(\tilde\theta^R_i - \tilde\theta_i)^2\} = \frac{AD_i}{A+D_i} + E\{(\tilde\theta^R_i - \tilde\theta_i)^2\} \equiv g_{1i}(A) + g_{2i}(A).$$

Since $\tilde\theta^R_i - \tilde\theta_i = (A+D_i)^{-1}D_i(y_i - x_i^T\beta)(1-s_i)$,

$$g_{2i}(A) = \frac{D_i^2}{(A+D_i)^2}E\{(y_i - x_i^T\beta)^2(1-s_i)^2\}.$$

Using Lemma S1 in the Supplementary Material, we obtain the expression for $g_{2i}(A)$.

We next show that $g_{2i}(A)$ is increasing in $\alpha \in (0,1)$. For notational simplicity, we put $\mu_i = x_i^T\beta$. Since $(y_i - \mu_i)^2(1-s_i)^2$ is a continuous and differentiable function of $y_i$ and $\alpha$, we have

$$\frac{\partial g_{2i}(A)}{\partial\alpha} = -\frac{2D_i^2}{(A+D_i)^2}E\left\{(y_i - \mu_i)^2(1-s_i)\frac{\partial s_i}{\partial\alpha}\right\}.$$

Note that $s_i = f_i(y_i;\phi)^\alpha$. If $f_i(y_i;\phi) \le 1$, then $1 - s_i \ge 0$ and $s_i$ is decreasing in $\alpha$, so $(1-s_i)\partial s_i/\partial\alpha \le 0$. By a similar argument, $(1-s_i)\partial s_i/\partial\alpha \le 0$ also holds if $f_i(y_i;\phi) \ge 1$. Hence $(1-s_i)\partial s_i/\partial\alpha \le 0$ always, and thus $\partial g_{2i}(A)/\partial\alpha \ge 0$ for $\alpha \in (0,1)$, which completes the proof.


Proof of Theorem 4. It follows that

$$E\{g_{12i}(\hat A_\alpha) - g_{12i}(A)\} = \frac{b_A}{m}\frac{\partial g_{12i}(A)}{\partial A} + \frac{1}{2m}\frac{\partial^2 g_{12i}(A)}{\partial A^2}\frac{K_A}{J_A^2} + \frac{1}{6}\frac{\partial^3 g_{12i}(A_\ast)}{\partial A^3}E\{(\hat A_\alpha - A)^3\},$$

where $A_\ast$ is between $A$ and $\hat A_\alpha$. From Lemma S2 in the Supplementary Material, it holds that $E\{g_{12i}(\hat A_\alpha) - g_{12i}(A)\} = m^{-1}d(A) + o(m^{-1})$, where $d(\cdot)$ is a smooth function. Then, from Butar & Lahiri (2003), we have $E(\hat M_i^\dagger - M_i) = o(m^{-1})$.

From the definition of $U_i$, we have

$$U_i = \frac{1}{B}\sum_{b=1}^B g_{12i}(\hat A_\alpha^{(b)}) - E_\ast\{g_{12i}(\hat A_\alpha^{(1)})\},$$

where $E_\ast$ denotes expectation with respect to the bootstrap sampling. Noting that $E(U_i \mid y) = 0$, we observe that

$$\mathrm{var}(U_i) = E\{\mathrm{var}(U_i \mid y)\} = \frac{1}{B}E[\mathrm{var}\{g_{12i}(\hat A_\alpha^{(1)}) \mid y\}] \le \frac{1}{B}E\{[g_{12i}(\hat A_\alpha^{(1)}) - g_{12i}(A)]^2\} = \frac{1}{B}E\{g'_{12i}(A^\dagger)^2(\hat A_\alpha^{(1)} - A)^2\},$$

where $g'_{12i}(A) = \partial g_{12i}(A)/\partial A$ and $A^\dagger = \epsilon A + (1-\epsilon)\hat A_\alpha^{(1)}$ for some $\epsilon \in [0,1]$. Straightforward calculation shows that

$$g'_{12i}(A) = \frac{D_i^2}{(A+D_i)^2} - \frac{g_{2i}(A)}{A+D_i} + \frac{2\pi\alpha D_i^2}{A+D_i}\left\{\frac{V_i^{\alpha+2}}{(\alpha+1)^{3/2}} - \frac{V_i^{2\alpha+2}}{(2\alpha+1)^{3/2}}\right\}.$$

Note that $0 \le V_i \le (2\pi D_i)^{-1/2}$, so $\sup_A|g'_{12i}(A)| \le C(D_i, \alpha) < \infty$ under Condition 1. Hence, it follows that

$$\mathrm{var}(U_i) \le \frac{1}{B}C(D_i, \alpha)^2E\{(\hat A_\alpha^{(1)} - A)^2\} = O\{(mB)^{-1}\},$$

which completes the proof.


References

Agostinelli, G. & Greco, L. (2013). A weighted strategy to handle likelihood

uncertainty in Bayesian inference. Comput. Stat. 28, 319–339.

Arora, V. & Lahiri, P. (1997). On the superiority of the Bayesian method over

the BLUP in small area estimation problems. Statist. Sinica 7, 1053–1063.

Basu, A., Harris, I. R., Hjort, N. L. & Jones, M. C. (1998). Robust and efficient

estimation by minimizing a density power divergence. Biometrika 85, 549–559.

Battese, G.E., Harter, R.M. & Fuller, W.A. (1988). An error-components

model for prediction of county crop areas using survey and satellite data. J. Am.

Statist. Assoc. 83, 28–36.

Burden, R. L. & Faires, J. D. (2010). Numerical Analysis, 9th Edition, Stanford:

Brooks-Cole Publishing.

Butar, F. B. & Lahiri, P. (2003). On measures of uncertainty of empirical Bayes

small-area estimators. J. Stat. Plan. Infer. 12, 63–76.

Carvalho, C. M., Polson, N. G. & Scott, J. G. (2010). The horseshoe estimator

for sparse signals. Biometrika 97, 465–480.

Chambers, R. L., Chandra, H., Salvati, N. & Tzavidis, N. (2014). Outlier

robust small area estimation. J. R. Stat. Soc. B. 76, 47–69.

Chambers, R. L. & Tzavidis, N. (2006). M-quantile models for small area estima-

tion. Biometrika 93, 255–268.

Chang, J. & Hall, P. (2015). Double-bootstrap methods that use a single double-

bootstrap simulation. Biometrika 102, 203–214.

Datta, G. S. & Lahiri, P. (1995). Robust hierarchical Bayes estimation of small

area characteristics in the presence of covariates and outliers. J. Multivariate Anal.

54, 310–328.

20

Page 21: Robust Empirical Bayes Small Area Estimation with Density ... › pdf › 1702.06635.pdf · it can be easily obtained equation can be solved by simple numerical methods, for example,

Datta, G. S. & Lahiri, P. (2000). A unified measure of uncertainty of estimated

best linear unbiased predictors in small area estimation problems. Stat. Sinica.

10, 613–627.

Datta, G.S., Rao, J.N.K. & Smith, D.D. (2005). On measuring the variability of

small area estimators under a basic area level model. Biometrika 92, 183–196.

Efron, B. (2011). Tweedie’s formula and selection bias. J. Am. Stat. Assoc. 106,

1602–1614.

Fay, R. E. & Herriot, R. A. (1979). Estimates of income for small places: an

application of James–Stein procedures to census data. J. Am. Stat. Assoc. 74,

269–277.

Fujisawa, H. & Eguchi, S. (2008). Robust parameter estimation with a small bias

against heavy contamination J. Multivariate Anal. 99, 2053–2081.

Ghosh, A. & Basu, A. (2013). Robust estimation for independent non-homogeneous

observations using density power divergence with applications to linear regression.

Electr. J. Stat. 7, 2420–2456.

Ghosh, A. & Basu, A. (2016). Robust Bayes estimation using the density power

divergence. Ann. Inst. Stat. Math. 68, 413–437.

Ghosh, M. & Maiti, T. (2004). Small-area estimation based on natural exponential

family quadratic variance function models and survey weights. Biometrika 91, 95–

112.

Ghosh, M., Maiti, T. & Roy, A. (2008). Influence functions and robust Bayes and

empirical Bayes small area estimation. Biometrika 95, 573–585.

Hall, P. & Maiti, T. (2006). On parametric bootstrap methods for small area

prediction. J. R. Stat. Soc. B. 68, 221–238.

Hooker, G. & Vidyashankar, A. B. (2014). Bayesian model robustness via dis-

parities. TEST, 23, 556–584.

21

Page 22: Robust Empirical Bayes Small Area Estimation with Density ... › pdf › 1702.06635.pdf · it can be easily obtained equation can be solved by simple numerical methods, for example,

Huber, P. J. (1973). Robust regression: asymptotics, conjectures and Monte Carlo.

Ann. Stat. 1, 799–821.

Jewson, J., Smith, J. Q. & Holmes, C. (2018). Principles of Bayesian inference

using general divergence criteria. Entropy 20, 442.

Nakagawa, T. & Hashimoto, S. (2019). Robust Bayesian inference via γ-

divergence. Commun. Stat. Theory., to appear.

Pfeffermann, D. (2013). New important developments in small area estimation.

Stat. Sci. 28, 40–68.

Prasad, N. & Rao, J. N. K. (1990). The estimation of mean-squared errors of

small-area estimators. J. Am. Stat. Assoc. 90, 758–766.

Rao, J.N.K. & Molina, I. (2015). Small Area Estimation, 2nd Edition, New York:

Wiley.

Sinha, S. K. & Rao, J. N. K. (2009). Robust small area estimation. Can. J. Stat.

37, 381–399.

You, Y. & Chapman, B. (2006). Small area estimation using area level models and

estimated sampling variances. Surv. Method. 32, 97–103.


Supplementary material for "Robust Empirical Bayes Small Area Estimation with Density Power Divergence"

S1 Useful Lemma

In what follows, we write $s_i$ for $s_i(y_i;\phi)$ when there is no confusion.

Lemma S1. When $y_i \sim N(x_i^T\beta, A+D_i)$, it holds that

$$E\{(y_i - x_i^T\beta)^{2j-1}s_i^k\} = 0, \qquad j, k = 1, 2, \ldots,$$

$$E\{(y_i - x_i^T\beta)^{2j}s_i^k\} = V_i^{k\alpha}(k\alpha+1)^{-j-1/2}(2j-1)!!\,(A+D_i)^j, \qquad j, k = 0, 1, 2, \ldots,$$

where $(2j-1)!! = (2j-1)(2j-3)\cdots 1$.

Proof. Note that

$$E\{(y_i - x_i^T\beta)^cs_i^k\} = \frac{V_i^{k\alpha}}{\{2\pi(A+D_i)\}^{1/2}}\int_{-\infty}^{\infty}(t - x_i^T\beta)^c\exp\left\{-\frac{(k\alpha+1)(t - x_i^T\beta)^2}{2(A+D_i)}\right\}dt = \frac{V_i^{k\alpha}}{(k\alpha+1)^{1/2}}E(Z^c),$$

where $Z \sim N\{0, (A+D_i)/(k\alpha+1)\}$. Hence the expectation is 0 when $c$ is odd. On the other hand, when $c = 2j$, $j = 0, 1, 2, \ldots$, it follows that $E(Z^{2j}) = (2j-1)!!\,(A+D_i)^j(k\alpha+1)^{-j}$, which completes the proof.

S2 Proof of Theorem 2

Let $F_\beta$ and $F_A$ be the first and second estimating functions in (10), and define $F_\phi = (F_\beta^T, F_A)^T$. Under Conditions 1–3 in the main article, the theory of unbiased estimating equations of Godambe (1960) shows that $\hat\phi_\alpha = (\hat\beta_\alpha^T, \hat A_\alpha)^T$ is consistent and asymptotically normal, with asymptotic covariance matrix given by

$$\lim_{m\to\infty}E\left(\frac{1}{m}\frac{\partial F_\phi}{\partial\phi^T}\right)^{-1}E\left(\frac{1}{m}F_\phi F_\phi^T\right)E\left(\frac{1}{m}\frac{\partial F_\phi}{\partial\phi^T}\right)^{-1}.$$

Straightforward calculation shows that

$$\frac{\partial F_\beta}{\partial\beta^T} = \sum_{i=1}^m \frac{x_ix_i^Ts_i}{(A+D_i)^2}\left\{\alpha(y_i - x_i^T\beta)^2 - (A+D_i)\right\},$$

$$\frac{\partial F_A}{\partial\beta} = \sum_{i=1}^m \frac{x_is_i(y_i - x_i^T\beta)}{(A+D_i)^3}\left\{\alpha(y_i - x_i^T\beta)^2 - (\alpha+2)(A+D_i)\right\},$$

$$\frac{\partial F_A}{\partial A} = \frac{1}{2}\sum_{i=1}^m \left\{\frac{\alpha s_i(y_i - x_i^T\beta)^4}{(A+D_i)^4} - \frac{(\alpha^2+2\alpha)V_i^\alpha}{(\alpha+1)^{3/2}(A+D_i)^2} - \frac{2s_i(\alpha+2)}{(A+D_i)^3}(y_i - x_i^T\beta)^2 + \frac{s_i(\alpha+2)}{(A+D_i)^2}\right\}.$$

Then, using Lemma S1, $E(\partial F_\beta/\partial\beta^T) = -mJ_\beta$, $E(\partial F_A/\partial\beta) = 0$, and $E(\partial F_A/\partial A) = -mJ_A$. Moreover, from (10) and Lemma S1, $E(F_\beta F_\beta^T) = mK_\beta$, $E(F_\beta F_A) = 0$, and $E(F_A^2) = mK_A$. Hence $\hat\beta_\alpha$ and $\hat A_\alpha$ are asymptotically independent, with asymptotic covariance matrix $m^{-1}J_\beta^{-1}K_\beta J_\beta^{-1}$ and asymptotic variance $K_A/(mJ_A^2)$, respectively.

S3 Proof of Theorem 3

The mean squared error $M_i = E\{(\hat\theta^R_i - \theta_i)^2\}$ can be decomposed as

$$M_i = E\{(\tilde\theta^R_i - \theta_i)^2\} + 2E\{(\tilde\theta^R_i - \theta_i)(\hat\theta^R_i - \tilde\theta^R_i)\} + E\{(\hat\theta^R_i - \tilde\theta^R_i)^2\},$$

and the first term equals $g_{1i}(A) + g_{2i}(A)$, whose expressions are given in Theorem 1.

We first evaluate the third term. A Taylor series expansion shows that

$$\hat\theta^R_i - \tilde\theta^R_i = \frac{\partial\tilde\theta^R_i}{\partial\phi^T}(\hat\phi_\alpha - \phi) + \frac{1}{2}(\hat\phi_\alpha - \phi)^T\frac{\partial^2\tilde\theta^R_i}{\partial\phi_\ast\partial\phi_\ast^T}(\hat\phi_\alpha - \phi),$$

where $\phi_\ast$ is on the line connecting $\phi$ and $\hat\phi_\alpha$. Then we get

$$E\{(\hat\theta^R_i - \tilde\theta^R_i)^2\} = E\left[\left\{\frac{\partial\tilde\theta^R_i}{\partial\phi^T}(\hat\phi_\alpha - \phi)\right\}^2\right] + R_1 + R_2,$$

where $R_1 = E\{(\hat\phi_\alpha - \phi)^T(\partial\tilde\theta^R_i/\partial\phi)\,(\hat\phi_\alpha - \phi)^T(\partial^2\tilde\theta^R_i/\partial\phi_\ast\partial\phi_\ast^T)(\hat\phi_\alpha - \phi)\}$ and $R_2 = E[\{(\hat\phi_\alpha - \phi)^T(\partial^2\tilde\theta^R_i/\partial\phi_\ast\partial\phi_\ast^T)(\hat\phi_\alpha - \phi)\}^2]/4$. Here we use the following lemma.


Lemma S2. Under Conditions 1–3 in the main article, $E(|\hat\phi_{\alpha(k)} - \phi_k|^r) = O(m^{-r/2})$ for any $r > 0$ and $k = 1, \ldots, p+1$, where $\hat\phi_{\alpha(k)}$ is the $k$th element of $\hat\phi_\alpha$.

A rigorous proof of the lemma requires uniform integrability, but intuitively, from Theorem 2, $E(m^{r/2}|\hat\phi_{\alpha(k)} - \phi_k|^r) = O(1)$ under Conditions 1–3, which leads to Lemma S2.

In what follows, we use $u_i = y_i - x_i^T\beta$ and $B_i = A + D_i$ for notational simplicity. Straightforward calculation shows that

$$\frac{\partial\tilde\theta^R_i}{\partial\beta} = -\frac{D_is_ix_i}{B_i^2}(\alpha u_i^2 - B_i), \qquad \frac{\partial\tilde\theta^R_i}{\partial A} = -\frac{D_is_iu_i}{2B_i^3}\left\{\alpha u_i^2 - (2-\alpha)B_i\right\}.$$

Moreover, we have

$$\frac{\partial^2\tilde\theta^R_i}{\partial\beta\partial\beta^T} = -\frac{\alpha D_is_ix_ix_i^T}{B_i^3}(\alpha u_i^3 - 3B_iu_i),$$

$$\frac{\partial^2\tilde\theta^R_i}{\partial A^2} = \frac{D_is_iu_i}{12B_i^5}\left\{-3\alpha^2u_i^3 + 3\alpha^2B_iu_i^2 + (4\alpha - 3\alpha^2)B_iu_i + (\alpha-2)(3\alpha+8)B_i^2\right\}.$$

Note that

$$R_1 = \sum_{j=1}^{p+1}\sum_{k=1}^{p+1}\sum_{\ell=1}^{p+1}E\left\{\left(\frac{\partial\tilde\theta^R_i}{\partial\phi_j}\right)\left(\frac{\partial^2\tilde\theta^R_i}{\partial\phi_k\partial\phi_\ell}\right)(\hat\phi_{\alpha(j)} - \phi_j)(\hat\phi_{\alpha(k)} - \phi_k)(\hat\phi_{\alpha(\ell)} - \phi_\ell)\right\} \equiv \sum_{j=1}^{p+1}\sum_{k=1}^{p+1}\sum_{\ell=1}^{p+1}U_{1jk\ell}.$$

From Hölder's inequality,

$$|U_{1jk\ell}| \le E\left\{\left|\left(\frac{\partial\tilde\theta^R_i}{\partial\phi_j}\right)\left(\frac{\partial^2\tilde\theta^R_i}{\partial\phi_{\ast k}\partial\phi_{\ast\ell}}\right)\right|^4\right\}^{1/4}E\left\{\left|(\hat\phi_{\alpha(j)} - \phi_j)(\hat\phi_{\alpha(k)} - \phi_k)(\hat\phi_{\alpha(\ell)} - \phi_\ell)\right|^{4/3}\right\}^{3/4}$$

$$\le E\left(\left|\frac{\partial\tilde\theta^R_i}{\partial\phi_j}\right|^8\right)^{1/8}E\left(\left|\frac{\partial^2\tilde\theta^R_i}{\partial\phi_{\ast k}\partial\phi_{\ast\ell}}\right|^8\right)^{1/8}\prod_{a\in\{j,k,\ell\}}E\left(\left|\hat\phi_{\alpha(a)} - \phi_a\right|^4\right)^{1/4}.$$

Since $E(|\partial\tilde\theta^R_i/\partial\phi_j|^8) < \infty$ and $E(|\partial^2\tilde\theta^R_i/\partial\phi_{\ast k}\partial\phi_{\ast\ell}|^8) < \infty$, we have $R_1 = o(m^{-1})$ from Lemma S2. A similar evaluation shows that $R_2 = o(m^{-1})$. Using an argument similar to that in the proof of Theorem 3 of Kubokawa et al. (2016),

$$E\left[\left\{\frac{\partial\tilde\theta^R_i}{\partial\phi^T}(\hat\phi_\alpha - \phi)\right\}^2\right] = \operatorname{tr}\left[E\left(\frac{\partial\tilde\theta^R_i}{\partial\phi}\frac{\partial\tilde\theta^R_i}{\partial\phi^T}\right)E\{(\hat\phi_\alpha - \phi)(\hat\phi_\alpha - \phi)^T\}\right] + o(m^{-1})$$

$$= \frac{1}{m}\operatorname{tr}\left\{E\left(\frac{\partial\tilde\theta^R_i}{\partial\beta}\frac{\partial\tilde\theta^R_i}{\partial\beta^T}\right)J_\beta^{-1}K_\beta J_\beta^{-1}\right\} + \frac{1}{m}E\left\{\left(\frac{\partial\tilde\theta^R_i}{\partial A}\right)^2\right\}\frac{K_A}{J_A^2} + o(m^{-1}).$$

From Theorem 2 and

$$E\left(\frac{\partial\tilde\theta^R_i}{\partial\beta}\frac{\partial\tilde\theta^R_i}{\partial\beta^T}\right) = \frac{D_i^2V_i^{2\alpha}x_ix_i^T}{(A+D_i)^2(2\alpha+1)^{3/2}}, \qquad E\left\{\left(\frac{\partial\tilde\theta^R_i}{\partial A}\right)^2\right\} = \frac{D_i^2V_i^{2\alpha}}{(A+D_i)^3(2\alpha+1)^{7/2}}\left(\alpha^4 - \frac{1}{2}\alpha^2 + 1\right),$$

we obtain $E\{(\hat\theta^R_i - \tilde\theta^R_i)^2\} = m^{-1}g_{3i}(A) + m^{-1}g_{4i}(A) + o(m^{-1})$.

Concerning $E\{(\tilde\theta^R_i - \theta_i)(\hat\theta^R_i - \tilde\theta^R_i)\}$, we have

$$E\{(\tilde\theta^R_i - \theta_i)(\hat\theta^R_i - \tilde\theta^R_i)\} = E\{(\tilde\theta^R_i - \tilde\theta_i)(\hat\theta^R_i - \tilde\theta^R_i)\} = \frac{D_i}{B_i}E\{(1-s_i)u_i(\hat\theta^R_i - \tilde\theta^R_i)\}.$$

By a Taylor series expansion,

$$\hat\theta^R_i - \tilde\theta^R_i = \frac{\partial\tilde\theta^R_i}{\partial\phi^T}(\hat\phi_\alpha - \phi) + \frac{1}{2}(\hat\phi_\alpha - \phi)^T\frac{\partial^2\tilde\theta^R_i}{\partial\phi\partial\phi^T}(\hat\phi_\alpha - \phi) + R_3,$$

where

$$R_3 = \frac{1}{6}\sum_{k=1}^{p+1}\sum_{j=1}^{p+1}\sum_{\ell=1}^{p+1}\frac{\partial^3\tilde\theta^R_i}{\partial\phi_{\ast k}\partial\phi_{\ast j}\partial\phi_{\ast\ell}}(\hat\phi_{\alpha(k)} - \phi_k)(\hat\phi_{\alpha(j)} - \phi_j)(\hat\phi_{\alpha(\ell)} - \phi_\ell).$$

Similarly to the evaluation of $R_1$, $E\{(1-s_i)u_iR_3\} = o(m^{-1})$. Then,

$$E\{(\tilde\theta^R_i - \theta_i)(\hat\theta^R_i - \tilde\theta^R_i)\} = \frac{D_i}{B_i}E\left\{(1-s_i)u_i\frac{\partial\tilde\theta^R_i}{\partial\phi^T}(\hat\phi_\alpha - \phi)\right\} + \frac{D_i}{2B_i}E\left\{(1-s_i)u_i(\hat\phi_\alpha - \phi)^T\frac{\partial^2\tilde\theta^R_i}{\partial\phi\partial\phi^T}(\hat\phi_\alpha - \phi)\right\} + o(m^{-1}) \equiv T_1 + T_2 + o(m^{-1}),$$

where

$$T_2 = \frac{D_i}{2mB_i}\operatorname{tr}\left[E\left\{(1-s_i)u_i\frac{\partial^2\tilde\theta^R_i}{\partial\beta\partial\beta^T}\right\}J_\beta^{-1}K_\beta J_\beta^{-1}\right] + \frac{D_iK_A}{2mB_iJ_A^2}E\left\{(1-s_i)u_i\frac{\partial^2\tilde\theta^R_i}{\partial A^2}\right\} + o(m^{-1}).$$

From Lohr & Rao (2009),

$$E(\hat\beta_\alpha - \beta \mid y_i) = \frac{b_\beta}{m} - \frac{1}{mB_i}J_\beta^{-1}x_is_iu_i + o_p(m^{-1}),$$

$$E(\hat A_\alpha - A \mid y_i) = \frac{b_A}{m} - \frac{1}{mJ_A}\left\{\frac{u_i^2s_i}{B_i^2} - \frac{s_i}{B_i} + \frac{\alpha V_i^\alpha}{(\alpha+1)^{3/2}B_i}\right\} + o_p(m^{-1}),$$

where $b_\beta = \lim_{m\to\infty}mE(\hat\beta_\alpha - \beta)$ and $b_A = \lim_{m\to\infty}mE(\hat A_\alpha - A)$. The term involving $b_\beta$ vanishes because $E\{(1-s_i)u_i\,\partial\tilde\theta^R_i/\partial\beta\}$ involves only odd moments of $u_i$ and is therefore zero by Lemma S1, so

$$T_1 = -\frac{D_i}{mB_i^2}E\left\{s_i(1-s_i)u_i^2\frac{\partial\tilde\theta^R_i}{\partial\beta^T}\right\}J_\beta^{-1}x_i - \frac{D_i}{mB_i^3J_A}E\left\{s_i(1-s_i)u_i(u_i^2 - B_i)\frac{\partial\tilde\theta^R_i}{\partial A}\right\} + \frac{D_i}{mB_i}E\left\{(1-s_i)u_i\frac{\partial\tilde\theta^R_i}{\partial A}\right\}\left\{b_A - \frac{\alpha V_i^\alpha}{(\alpha+1)^{3/2}B_iJ_A}\right\} + o(m^{-1}).$$

Combining these results and using Lemma S1, $E\{(\tilde\theta^R_i - \theta_i)(\hat\theta^R_i - \tilde\theta^R_i)\} = m^{-1}g_{5i}(A) + o(m^{-1})$, which completes the proof.

S4 Additional simulation study

We show the results of additional simulation studies on the estimation accuracy of several estimators of $\theta_i$. We use the same data-generating model for $y_i$ as in §4·1 of the main article. We consider the following additional scenarios for the true generating distribution of $u_i$:

(IV) $u_i \sim t_2$; (V) $u_i \sim \mathrm{Ga}(0{\cdot}5, 0{\cdot}5)$; (VI) $u_i \sim \mathrm{Ga}(2, 2)$; (VII) $u_i \sim \mathrm{ST}_3(2)/3^{1/2}$,

where $t_n$ is a $t$-distribution with $n$ degrees of freedom, $\mathrm{Ga}(a, b)$ is a gamma distribution with shape parameter $a$ and rate parameter $b$, scaled to have mean zero and variance 1, and $\mathrm{ST}_n(a)$ denotes a skew $t$-distribution with $n$ degrees of freedom and skewing parameter $a$ in the parametrization of the 'skewt' package in R.

To estimate $\theta_i$, we employ the same six methods used in the main article and compute mean squared errors based on 20000 replicates. Table S1 reports the mean squared errors averaged within the same groups. Since the estimated Monte Carlo errors are negligibly small, as shown in Table 1 of the main article, they are not reported here. In scenarios (IV) and (VII), the generated values of $u_i$ sometimes contain outliers owing to the heavy tails of the $t$- and skew $t$-distributions, under which the proposed methods tend to provide better performance than the other methods. Note that the true distributions of $u_i$ in scenarios (V) and (VI) are skewed but do not produce extreme values. In particular, the distribution of $u_i$ in scenario (V) is more skewed than that in scenario (VI), and the proposed methods tend to perform better than the other methods in scenario (V). On the other hand, the standard empirical Bayes method performs quite well in scenario (VI) in spite of the misspecification of the normality assumption, and the proposed method is comparable with or slightly better than the other robust methods.

References

Ghosh, M., Maiti, T. & Roy, A. (2008). Influence functions and robust Bayes and empirical Bayes small area estimation. Biometrika 95, 573–585.

Godambe, V. P. (1960). An optimum property of regular maximum likelihood estimation. Ann. Math. Stat. 31, 1208–1211.

Kubokawa, T., Sugasawa, S., Ghosh, M. & Chaudhuri, S. (2016). Prediction in heteroscedastic nested error regression models with random dispersions. Stat. Sinica 26, 465–492.

Lohr, S. L. & Rao, J. N. K. (2009). Jackknife estimation of mean squared error of small area predictors in nonlinear mixed models. Biometrika 96, 457–468.

Sinha, S. K. & Rao, J. N. K. (2009). Robust small area estimation. Can. J. Stat. 37, 381–399.

Table S1: Simulated mean squared errors averaged within the same group. The values are multiplied by 1000. DEB1: density power divergence with 1% inflation rate; DEB2: density power divergence with 5% inflation rate; EB: empirical Bayes; REB1: robust empirical Bayes of Sinha & Rao (2009); REB2: robust Bayes of Sinha & Rao (2009) with maximum likelihood; GEB: robust empirical Bayes of Ghosh et al. (2008).

Scenario  Group  DEB1  DEB2  EB    REB1  REB2  GEB
(IV)      1      180   178   183   249   180   180
          2      329   323   340   591   332   337
          3      456   446   478   1250  464   492
          4      563   550   597   1402  578   621
          5      661   644   712   2004  691   732
(V)       1      145   143   145   216   143   150
          2      232   229   234   334   235   266
          3      293   289   297   397   304   321
          4      335   332   340   427   354   352
          5      365   363   372   449   392   371
(VI)      1      156   156   154   183   154   157
          2      249   249   247   296   250   293
          3      310   312   308   358   317   351
          4      352   358   351   395   362   380
          5      378   387   377   415   389   398
(VII)     1      163   161   164   231   162   163
          2      279   274   284   509   280   311
          3      363   355   374   703   371   411
          4      435   426   452   771   455   476
          5      489   479   512   895   520   523