Robust Empirical Bayes Small Area Estimation
with Density Power Divergence
Shonosuke Sugasawa
Center for Spatial Information Science, The University of Tokyo
Abstract
A two-stage normal hierarchical model called the Fay–Herriot model and the em-
pirical Bayes estimator are widely used to provide indirect and model-based estimates
of means in small areas. However, the performance of the empirical Bayes estimator
might be poor when the assumed normal distribution is misspecified. In this arti-
cle, we propose a simple modification by using density power divergence and suggest
a new robust empirical Bayes small area estimator. The mean squared error and
estimated mean squared error of the proposed estimator are derived based on the
asymptotic properties of the robust estimator of the model parameters. We investi-
gate the numerical performance of the proposed method through simulations and an
application to survey data.
Key words: Density power divergence; empirical Bayes estimation; Fay–Herriot
model
arXiv:1702.06635v3 [stat.ME] 23 Aug 2019
1 Introduction
Direct survey estimators based only on area-specific sample data are known to yield
unacceptably large standard errors if the area-specific sample sizes are small. Empir-
ical Bayes methods are widely used to improve direct survey estimators by shrink-
ing toward some synthetic estimator and borrowing strength. For comprehensive
overviews of small area estimation, see Pfeffermann (2013) and Rao & Molina (2015).
A basic area-level model is a two-stage normal hierarchical model known as the
Fay–Herriot model (Fay & Herriot, 1979), described as
$$y_i \mid \theta_i \sim N(\theta_i, D_i), \qquad \theta_i \sim N(x_i^{\mathrm{T}}\beta, A) \qquad (i = 1,\dots,m), \tag{1}$$
where $y_i$ is the direct estimator of the small area mean $\theta_i$, $D_i$ is the sampling variance, assumed to be known, $x_i$ and $\beta$ are vectors of covariates and regression coefficients, respectively, and $A$ is an unknown variance. Let $\phi = (\beta^{\mathrm{T}}, A)^{\mathrm{T}}$ be the unknown parameter vector in (1). Since $y_i \sim N(x_i^{\mathrm{T}}\beta, A+D_i)$ under (1), $\phi$ can be estimated
by maximizing the log-marginal likelihood
$$\log f(y;\phi) = -\frac{m}{2}\log(2\pi) - \frac{1}{2}\sum_{i=1}^{m}\log(A+D_i) - \frac{1}{2}\sum_{i=1}^{m}\frac{(y_i - x_i^{\mathrm{T}}\beta)^2}{A+D_i}, \tag{2}$$
with $y = (y_1,\dots,y_m)^{\mathrm{T}}$. The Bayes predictor of $\theta_i$ under squared error loss is
$$\theta_i(y_i;\phi) = y_i - \frac{D_i}{A+D_i}\,(y_i - x_i^{\mathrm{T}}\beta), \tag{3}$$
and the empirical Bayes estimator of $\theta_i$ is $\hat{\theta}_i = \theta_i(y_i;\hat{\phi})$.
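To make (2) and (3) concrete, here is a minimal numpy sketch; the function names are ours, not the paper's:

```python
import numpy as np

def log_marginal_likelihood(y, X, beta, A, D):
    """Log-marginal likelihood (2): y_i ~ N(x_i' beta, A + D_i) independently."""
    B = A + D                     # marginal variances A + D_i
    resid = y - X @ beta
    return -0.5 * (len(y) * np.log(2 * np.pi)
                   + np.sum(np.log(B))
                   + np.sum(resid**2 / B))

def bayes_predictor(y, X, beta, A, D):
    """Bayes predictor (3): shrink the direct estimator y_i toward x_i' beta."""
    return y - D / (A + D) * (y - X @ beta)
```

As $D_i \to 0$ the predictor returns the direct estimator $y_i$, and as $D_i$ grows it shrinks more strongly toward the regression mean.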
The empirical Bayes estimator is useful when $y_i$ can be well explained by the auxiliary information $x_i$. However, $x_i$ is not necessarily good auxiliary information for $y_i$ in all areas; that is, $y_i$ could be very far from $x_i^{\mathrm{T}}\beta$ in some areas, which we call outlying observations. For such observations, the corresponding $\theta_i$ could be generated from a distribution different from the one assumed in (1); that is, the assumed distribution of $\theta_i$ could be misspecified. In this paper, we consider a situation where
there exist such outlying observations and focus on the following two undesirable properties in this situation:
1. The Bayes predictor (3) might over-shrink an outlying $y_i$ toward $x_i^{\mathrm{T}}\beta$.
2. The estimator $\hat{\phi}$ based on (2) would be highly influenced by outlying observations.
These problems have been addressed in studies such as Fay & Herriot (1979) and
Ghosh et al. (2008), but we extend the body of knowledge on this topic by using
density power divergence (Basu et al., 1998).
Our insight is based on an alternative expression for the Bayes predictor (3) using
(2). From Tweedie’s formula (Efron, 2011), the Bayes predictor (3) can be written as
$$\theta_i(y_i;\phi) = y_i + D_i\,\frac{\partial}{\partial y_i}\log f(y_i;\phi). \tag{4}$$
The above expression holds as long as $y_i \mid \theta_i \sim N(\theta_i, D_i)$; that is, only the form of the marginal likelihood $f(y_i;\phi)$ needs to change when the distribution of $\theta_i$ is not normal as in (1). From (4), one can see that the classical empirical Bayes estimator $\hat{\theta}_i$ is determined by the maximizer and the derivative of (2). We therefore suggest
replacing the log-marginal likelihood with density power divergence, which includes
Kullback–Leibler divergence as a special case. Density power divergence under (1)
has a closed form, and a new robust Bayes predictor has a simple form. We also
consider robust estimators of the model parameters and provide their asymptotic
properties. Moreover, we construct an estimator of the mean squared error of the
robust empirical Bayes estimator based on the parametric bootstrap and provide its
asymptotic validity.
Generalized likelihoods, including density power divergence, have been used in Bayesian inference (Agostinelli & Greco, 2013; Ghosh & Basu, 2016; Hooker & Vidyashankar, 2014; Jewson et al., 2018; Nakagawa & Hashimoto, 2019); these works address misspecification of the assumed distribution of the observations, whereas we deal with misspecification of the assumed distribution of the unobserved areal mean $\theta_i$ in (1), which can be
regarded as the prior distribution of θi. Moreover, we consider frequentist inference
for the model parameters in (1).
Ghosh et al. (2008) proposed a robust Bayes predictor in the Fay–Herriot model
(1), using the influence function for β to tackle Property 1, but did not address
Property 2. Sinha & Rao (2009) proposed using Huber's (1973) $\psi$-function to derive a Bayes predictor and parameter estimators in general linear mixed models, which tackles Properties 1 and 2; however, as demonstrated in the next section, the resulting Bayes predictor has limitations in addressing Property 1.
2 Density Power Divergence and Bayes Predictor
2.1 Density power divergence
Although the maximum likelihood estimator minimizes empirical estimates of Kullback–
Leibler distance, it is sensitive to distributional assumptions. To overcome this, Basu
et al. (1998) introduced an estimation method based on density power divergence for
independently and identically distributed data. As the observations yi following the
Fay–Herriot model (1) are independent but not identically distributed, we consider
the following function instead of the log-likelihood function (2) (Ghosh & Basu, 2013),
$$L_\alpha(y;\phi) = \frac{1}{\alpha}\sum_{i=1}^{m} f_i(y_i;\phi)^{\alpha} - \frac{1}{1+\alpha}\sum_{i=1}^{m}\int f_i(t;\phi)^{1+\alpha}\,dt, \qquad \alpha \in (0,1), \tag{5}$$
where $f_i(y_i;\phi)$ is the density of $y_i \sim N(x_i^{\mathrm{T}}\beta, A+D_i)$. Here, $\alpha$ is a tuning constant related to robustness. Note that
$$\lim_{\alpha\to 0}\left\{L_\alpha(y;\phi) - m\left(\frac{1}{\alpha} - 1\right)\right\} = \log f(y;\phi),$$
so, apart from an irrelevant constant, (5) behaves like the log-likelihood function when $\alpha \approx 0$.
Under model (1), the $y_i$ are independent with $y_i \sim N(x_i^{\mathrm{T}}\beta, A+D_i)$, so (5) can be expressed as
$$L_\alpha(y;\phi) = \sum_{i=1}^{m}\left\{\frac{s_i(y_i;\phi)}{\alpha} - \frac{V_i^{\alpha}}{(1+\alpha)^{3/2}}\right\}, \tag{6}$$
where $V_i = \{2\pi(A+D_i)\}^{-1/2}$ and
$$s_i(y_i;\phi) = V_i^{\alpha}\exp\left\{-\frac{\alpha(y_i - x_i^{\mathrm{T}}\beta)^2}{2(A+D_i)}\right\}.$$
We propose using function (6) instead of the log-marginal likelihood log f(y;φ).
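The closed form (6) transcribes directly into code. A sketch (our own function name, assuming numpy), whose $\alpha \to 0$ behavior matches the limit stated above:

```python
import numpy as np

def dpd_objective(y, X, beta, A, D, alpha):
    """Density power divergence objective L_alpha(y; phi) in (6) under model (1)."""
    B = A + D                                 # marginal variances A + D_i
    V = (2 * np.pi * B) ** -0.5               # V_i = {2 pi (A + D_i)}^{-1/2}
    s = V**alpha * np.exp(-alpha * (y - X @ beta) ** 2 / (2 * B))
    return np.sum(s / alpha - V**alpha / (1 + alpha) ** 1.5)
```

For very small $\alpha$, the quantity $L_\alpha(y;\phi) - m(1/\alpha - 1)$ is numerically close to the log-likelihood (2), which is a useful sanity check on any implementation.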
2.2 Robust Bayes predictor
We define the robust Bayes predictor $\theta_i^{\mathrm{R}}$ of $\theta_i$ by replacing $\log f(y;\phi)$ in (4) with $L_\alpha(y;\phi)$. Since
$$\frac{\partial}{\partial y_i}L_\alpha(y;\phi) = \frac{\partial}{\partial y_i}\,\frac{s_i(y_i;\phi)}{\alpha} = -\frac{1}{A+D_i}(y_i - x_i^{\mathrm{T}}\beta)\,s_i(y_i;\phi),$$
the robust Bayes predictor is
$$\theta_i^{\mathrm{R}} = y_i - \frac{D_i}{A+D_i}\,(y_i - x_i^{\mathrm{T}}\beta)\,s_i(y_i;\phi). \tag{7}$$
The shrinkage factor in (7) is $s_i(y_i;\phi)D_i/(A+D_i)$, which depends on $y_i$, whereas the shrinkage factor in the classical Bayes predictor (3) is $D_i/(A+D_i)$, which does not. Further, $\theta_i^{\mathrm{R}}$ reduces to $\theta_i$ when $\alpha = 0$, since $s_i(y_i;\phi) = 1$ in that case. Moreover, as $D_i \to 0$, the robust Bayes predictor $\theta_i^{\mathrm{R}}$ reduces to the direct estimator $y_i$, as the classical $\theta_i$ does.
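The predictor (7) is a one-line modification of the classical formula; a sketch (function names ours):

```python
import numpy as np

def s_weight(y, X, beta, A, D, alpha):
    """Observation-specific weight s_i(y_i; phi) from Section 2.1."""
    B = A + D
    V = (2 * np.pi * B) ** -0.5
    return V**alpha * np.exp(-alpha * (y - X @ beta) ** 2 / (2 * B))

def robust_bayes_predictor(y, X, beta, A, D, alpha):
    """Robust Bayes predictor (7): the usual shrinkage term is damped by s_i."""
    return y - D / (A + D) * (y - X @ beta) * s_weight(y, X, beta, A, D, alpha)
```

For an extreme outlier the weight $s_i$ collapses toward zero, so the predictor essentially returns the direct estimator, which is the tail-robustness property discussed in the next subsection.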
2.3 Comparison with related robust Bayes predictors
For related robust Bayes predictors under model (1), Ghosh et al. (2008) proposed
the predictor
$$\theta_i^{\mathrm{G}} = y_i - \frac{D_i\,v_i(A)^{1/2}}{A+D_i}\,\psi_K\!\left\{\frac{y_i - x_i^{\mathrm{T}}\beta(A)}{v_i(A)^{1/2}}\right\}, \tag{8}$$
where
$$\beta(A) = \left(\sum_{i=1}^{m}\frac{x_i x_i^{\mathrm{T}}}{A+D_i}\right)^{-1}\left(\sum_{i=1}^{m}\frac{x_i y_i}{A+D_i}\right), \qquad v_i(A) = A + D_i - x_i^{\mathrm{T}}\left(\sum_{i=1}^{m}\frac{x_i x_i^{\mathrm{T}}}{A+D_i}\right)^{-1} x_i,$$
and $\psi_K(u) = u\min(1, K/|u|)$ is Huber's $\psi$-function with a tuning constant $K > 0$
that has a similar role to α. Similarly, Sinha & Rao (2009) used Huber’s ψ-function
to modify an estimating equation for $\theta_i$ and suggested a robust predictor $\theta_i^{\mathrm{SR}}$ as the solution to
$$D_i^{-1/2}\,\psi_K\!\left\{D_i^{-1/2}(y_i - \theta_i^{\mathrm{SR}})\right\} - A^{-1/2}\,\psi_K\!\left\{A^{-1/2}(\theta_i^{\mathrm{SR}} - x_i^{\mathrm{T}}\beta)\right\} = 0. \tag{9}$$
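Equation (9) has no closed-form solution, but its left-hand side is monotone in $\theta_i^{\mathrm{SR}}$, so a bracketing root-finder suffices. A sketch using scipy.optimize.brentq (function names ours; the default $K = 1{\cdot}345$ follows the value used later in §4.1):

```python
import numpy as np
from scipy.optimize import brentq

def huber_psi(u, K):
    """Huber's psi-function: psi_K(u) = u * min(1, K/|u|)."""
    return np.clip(u, -K, K)

def sinha_rao_predictor(y_i, xb_i, A, D_i, K=1.345):
    """Solve the estimating equation (9) for theta_i^SR by bracketing root search."""
    def lhs(t):
        return (huber_psi((y_i - t) / np.sqrt(D_i), K) / np.sqrt(D_i)
                - huber_psi((t - xb_i) / np.sqrt(A), K) / np.sqrt(A))
    # lhs is nonincreasing in t, positive far below y_i and xb_i, negative far above
    lo = min(y_i, xb_i) - 10 * np.sqrt(A + D_i)
    hi = max(y_i, xb_i) + 10 * np.sqrt(A + D_i)
    return brentq(lhs, lo, hi)
```

With $K$ large enough that no clipping occurs, the solution reduces to the classical Bayes predictor $(Ay_i + D_i x_i^{\mathrm{T}}\beta)/(A+D_i)$, which is a convenient correctness check.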
Consider an observation $y_i$ that is very different from the grand (prior) mean $x_i^{\mathrm{T}}\beta$. For such an observation, the auxiliary information $x_i$ would not be useful for improving the direct estimator $y_i$ through the model (1), so it would be better to leave $y_i$ unshrunk. To examine this shrinkage property, it is useful to check the behavior of a Bayes predictor $\eta_i$ under large $|y_i - x_i^{\mathrm{T}}\beta|$. Specifically, we consider $|\eta_i - y_i|$ as $|y_i - x_i^{\mathrm{T}}\beta| \to \infty$ with fixed values of the parameters and $D_i$. Ideally, $|\eta_i - y_i| \to 0$, which means that the Bayes predictor $\eta_i$ does not shrink $y_i$ under large $|y_i - x_i^{\mathrm{T}}\beta|$. This property has been addressed in the context of small area estimation (Datta & Lahiri, 1995) as well as signal estimation (Carvalho et al., 2010). For the classical Bayes predictor $\theta_i$ in (3), $|\theta_i - y_i| \to \infty$ as $|y_i - x_i^{\mathrm{T}}\beta| \to \infty$, meaning that over-shrinkage occurs. For $\theta_i^{\mathrm{G}}$, $|\theta_i^{\mathrm{G}} - y_i| \to KD_i v_i(A)^{1/2}/(A+D_i)$. Moreover, if $|y_i - \theta_i^{\mathrm{SR}}| \to 0$ as $|y_i - x_i^{\mathrm{T}}\beta| \to \infty$, the left-hand side of (9) would reduce to $-A^{-1/2}K \neq 0$, a contradiction, so $|y_i - \theta_i^{\mathrm{SR}}| \not\to 0$. By contrast, for the proposed robust Bayes predictor $\theta_i^{\mathrm{R}}$ in (7), $|\theta_i^{\mathrm{R}} - y_i| \to 0$ as $|y_i - x_i^{\mathrm{T}}\beta| \to \infty$ whenever $\alpha > 0$, since $(y_i - x_i^{\mathrm{T}}\beta)\,s_i(y_i;\phi) \to 0$ in that limit.
When there are no random effects, that is, $A = 0$, the conventional Bayes predictor (3) reduces to $x_i^{\mathrm{T}}\beta$. However, the robust Bayes predictors $\theta_i^{\mathrm{R}}$, $\theta_i^{\mathrm{G}}$, and $\theta_i^{\mathrm{SR}}$ do not have this property, which might be a drawback incurred in exchange for robustness.
3 Robust Empirical Bayes Estimator and Mean Squared Error
3.1 Robust parameter estimation
We define the robust estimator $\hat{\phi}_\alpha$ of $\phi$ as $\hat{\phi}_\alpha = \operatorname{argmax}_{\phi} L_\alpha(y;\phi)$, where $L_\alpha(y;\phi)$ is given in (6). Then, the robust estimator $\hat{\phi}_\alpha$ satisfies
$$\frac{\partial L_\alpha}{\partial \beta} = \sum_{i=1}^{m}\frac{x_i\, s_i(y_i;\phi)(y_i - x_i^{\mathrm{T}}\beta)}{A+D_i} = 0,$$
$$2\,\frac{\partial L_\alpha}{\partial A} = \sum_{i=1}^{m}\left\{\frac{(y_i - x_i^{\mathrm{T}}\beta)^2 s_i(y_i;\phi)}{(A+D_i)^2} - \frac{s_i(y_i;\phi)}{A+D_i} + \frac{\alpha V_i^{\alpha}}{(\alpha+1)^{3/2}(A+D_i)}\right\} = 0. \tag{10}$$
We adopt a Newton–Raphson algorithm to solve these estimating equations; the required derivatives are given in the proof of Theorem 2 in the Supplementary Material. A reasonable starting point is the maximum likelihood estimate. By substituting the robust estimator $\hat{\phi}_\alpha$ into the robust Bayes predictor (7), we obtain the robust empirical Bayes estimator $\hat{\theta}_i^{\mathrm{R}} = \theta_i^{\mathrm{R}}(y_i, \hat{\phi}_\alpha)$.
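As an alternative to the Newton–Raphson iteration described above, one can simply maximize (6) with a generic optimizer. This sketch (our own code, assuming numpy/scipy) parameterizes $A$ on the log scale to keep it positive and, for simplicity, starts from ordinary least squares rather than the maximum likelihood estimate:

```python
import numpy as np
from scipy.optimize import minimize

def dpd_objective(y, X, beta, A, D, alpha):
    """Objective L_alpha(y; phi) from (6)."""
    B = A + D
    V = (2 * np.pi * B) ** -0.5
    s = V**alpha * np.exp(-alpha * (y - X @ beta) ** 2 / (2 * B))
    return np.sum(s / alpha - V**alpha / (1 + alpha) ** 1.5)

def fit_dpd(y, X, D, alpha):
    """Robust estimator phi_alpha = argmax L_alpha via Nelder-Mead on (beta, log A)."""
    p = X.shape[1]

    def neg(par):
        return -dpd_objective(y, X, par[:p], np.exp(par[p]), D, alpha)

    beta0 = np.linalg.lstsq(X, y, rcond=None)[0]   # crude starting values
    res = minimize(neg, np.append(beta0, 0.0), method="Nelder-Mead",
                   options={"maxiter": 5000, "xatol": 1e-10, "fatol": 1e-10})
    return res.x[:p], float(np.exp(res.x[p]))
```

A derivative-free optimizer is slower than Newton–Raphson but avoids coding the second derivatives; for the small $m$ typical of area-level models the cost is negligible.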
3.2 Selection of tuning parameter
The parameter α is related to robustness but is not easy to interpret. Following
Ghosh et al. (2008), we consider selection of α based on the mean squared error of
the robust Bayes predictor (7), which enables us to specify α in an interpretable way.
The mean squared error formula is given in the following theorem.
Theorem 1. Under model (1), $E\{(\theta_i^{\mathrm{R}} - \theta_i)^2\} = g_{1i}(A) + g_{2i}(A)$, where
$$g_{1i}(A) = \frac{AD_i}{A+D_i}, \qquad g_{2i}(A) = \frac{D_i^2}{A+D_i}\left\{\frac{V_i^{2\alpha}}{(2\alpha+1)^{3/2}} - \frac{2V_i^{\alpha}}{(\alpha+1)^{3/2}} + 1\right\},$$
and $g_{2i}(A)$ is increasing in $\alpha$.
The mean squared error of the classical Bayes predictor (3) corresponds to $g_{1i}(A)$, so the excess mean squared error of $\theta_i^{\mathrm{R}}$ over $\theta_i$ is $g_{2i}(A)$, which vanishes when $\alpha = 0$. Therefore, there is a trade-off between the robustness of $\theta_i^{\mathrm{R}}$ and the mean squared error evaluated under model (1). We define
$$\mathrm{Ex}(\alpha) = 100\times\sum_{i=1}^{m} g_{2i}(\hat{A}_\alpha)\Big/\sum_{i=1}^{m} g_{1i}(\hat{A}_\alpha)$$
as the percentage relative excess mean squared error, where $\hat{A}_\alpha$ is the robust estimate of $A$ from (10) for a given $\alpha$. We propose selecting $\alpha$ such that $\mathrm{Ex}(\alpha)$ does
not exceed a user-specified percentage c%; in other words, we compute α∗ to satisfy
$\mathrm{Ex}(\alpha^*) = c$. This equation has a unique solution, since $\mathrm{Ex}(\alpha)$ is increasing in $\alpha$ by Theorem 1. We adopt the bisection method (Burden & Faires, 2010, §2) to compute $\alpha^*$. In
practice, we first compute α∗ for a specified value of c, and all the estimation pro-
cedures are conducted with α = α∗. For theoretical simplicity, we assume that α is
known in §3.3, 3.4 and 3.5, but the selection procedure is used in all the numerical
examples given in §4 to investigate its possible effect.
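The selection rule can be sketched as follows (our own code; for simplicity we hold the estimate of $A$ fixed across $\alpha$, whereas the procedure above re-solves (10) at each candidate $\alpha$, and we use a bracketing root-finder in place of plain bisection). It assumes the target $c$ is attainable within $\alpha \in (0,1)$ for the given $A$ and $D_i$:

```python
import numpy as np
from scipy.optimize import brentq

def g1(A, D):
    """Leading MSE term from Theorem 1."""
    return A * D / (A + D)

def g2(A, D, alpha):
    """Excess MSE term from Theorem 1; equals 0 at alpha = 0, increasing in alpha."""
    V = (2 * np.pi * (A + D)) ** -0.5
    return D**2 / (A + D) * (V**(2 * alpha) / (2 * alpha + 1) ** 1.5
                             - 2 * V**alpha / (alpha + 1) ** 1.5 + 1)

def select_alpha(A, D, c):
    """Solve Ex(alpha) = c (percent) for alpha in (0, 1) by root bracketing."""
    def excess_pct(alpha):
        return 100 * np.sum(g2(A, D, alpha)) / np.sum(g1(A, D))
    return brentq(lambda a: excess_pct(a) - c, 1e-8, 1 - 1e-8)
```

Because $\mathrm{Ex}(\alpha)$ is monotone, any bracketing method converges to the same $\alpha^*$ as bisection.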
3.3 Asymptotic properties of the robust estimators
We consider the asymptotic properties of the robust estimator under model (1). To
this end, we assume the following regularity conditions:
1. $0 < D_* \le \min_{1\le i\le m} D_i \le \max_{1\le i\le m} D_i \le D^* < \infty$, where $D_*$ and $D^*$ do not depend on $m$;
2. $\max_{1\le i\le m} x_i^{\mathrm{T}}(X^{\mathrm{T}}X)^{-1}x_i = O(m^{-1})$, where $X = (x_1,\dots,x_m)^{\mathrm{T}}$;
3. $X^{\mathrm{T}}X/m$ converges to a positive definite matrix as $m\to\infty$.
Similar conditions are used by Prasad & Rao (1990). Since the derivatives in (10)
have zero expectations under model (1), we obtain the following result.
Theorem 2. Under Conditions 1–3, $\hat{\beta}_\alpha$ and $\hat{A}_\alpha$ are asymptotically independent and distributed as $N(\beta,\, m^{-1}J_\beta^{-1}K_\beta J_\beta^{-1})$ and $N(A,\, K_A/(mJ_A^2))$, respectively, where
$$J_\beta = \frac{1}{m(\alpha+1)^{3/2}}\sum_{i=1}^{m}\frac{V_i^{\alpha}\,x_i x_i^{\mathrm{T}}}{A+D_i}, \qquad J_A = \frac{1}{2m}\sum_{i=1}^{m}\frac{V_i^{\alpha}(\alpha^2+2)}{(A+D_i)^2(\alpha+1)^{5/2}},$$
$$K_\beta = \frac{1}{m(2\alpha+1)^{3/2}}\sum_{i=1}^{m}\frac{V_i^{2\alpha}\,x_i x_i^{\mathrm{T}}}{A+D_i}, \qquad K_A = \frac{1}{m}\sum_{i=1}^{m}\frac{V_i^{2\alpha}}{(A+D_i)^2}\left\{\frac{2(2\alpha^2+1)}{(2\alpha+1)^{5/2}} - \frac{\alpha^2}{(\alpha+1)^3}\right\}.$$
When $\alpha = 0$, the asymptotic covariance matrix of $\hat{\beta}_\alpha$ and the asymptotic variance of $\hat{A}_\alpha$ reduce to those of the maximum likelihood estimators of $\beta$ and $A$ given by Datta & Lahiri (2000), because (6) reduces to the log-likelihood (2).
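The quantities in Theorem 2 are simple plug-in sums. A sketch (our function names) that also makes the $\alpha = 0$ reduction to the maximum likelihood asymptotics easy to verify:

```python
import numpy as np

def asymptotic_variances(X, A, D, alpha):
    """J_beta, K_beta, J_A, K_A from Theorem 2 and the implied asymptotic variances."""
    m = len(D)
    B = A + D
    V = (2 * np.pi * B) ** -0.5
    # (X.T * w) @ X computes sum_i w_i x_i x_i^T for a weight vector w
    Jb = (X.T * (V**alpha / B)) @ X / (m * (alpha + 1) ** 1.5)
    Kb = (X.T * (V**(2 * alpha) / B)) @ X / (m * (2 * alpha + 1) ** 1.5)
    JA = np.sum(V**alpha * (alpha**2 + 2) / (B**2 * (alpha + 1) ** 2.5)) / (2 * m)
    KA = np.sum(V**(2 * alpha) / B**2
                * (2 * (2 * alpha**2 + 1) / (2 * alpha + 1) ** 2.5
                   - alpha**2 / (alpha + 1) ** 3)) / m
    Jb_inv = np.linalg.inv(Jb)
    cov_beta = Jb_inv @ Kb @ Jb_inv / m     # m^{-1} J^{-1} K J^{-1}
    var_A = KA / (m * JA**2)                # K_A / (m J_A^2)
    return cov_beta, var_A
```

At $\alpha = 0$ the sandwich collapses ($K_\beta = J_\beta$), giving the generalized least squares covariance $(\sum_i x_i x_i^{\mathrm{T}}/B_i)^{-1}$ for $\hat{\beta}$ and $2/\sum_i B_i^{-2}$ for $\hat{A}$.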
3.4 Mean squared error of the robust empirical Bayes estimator
To evaluate the risk of the estimator $\hat{\theta}_i^{\mathrm{R}}$, we consider the mean squared error $M_i = E\{(\hat{\theta}_i^{\mathrm{R}} - \theta_i)^2\}$, where the expectation is taken with respect to the joint distribution of the $\theta_i$ and $y_i$ under the assumed model (1). The mean squared error can be regarded as the integrated Bayes risk, and is a standard measure of risk in small area estimation (Rao & Molina, 2015).
Since $\hat{\theta}_i^{\mathrm{R}}$ depends on the estimator $\hat{\phi}_\alpha$, the mean squared error $M_i$ takes account of the additional variability due to $\hat{\phi}_\alpha$. Therefore, it is difficult to evaluate $M_i$ analytically, and a second-order approximation of $M_i$ is commonly used. Following this convention, we provide an approximation of $M_i$ in the following theorem.
Theorem 3. Under Conditions 1–3,
$$M_i = g_{1i}(A) + g_{2i}(A) + \frac{g_{3i}(A)}{m} + \frac{g_{4i}(A)}{m} + \frac{2g_{5i}(A)}{m} + o(m^{-1}), \tag{11}$$
where $g_{1i}(A)$ and $g_{2i}(A)$ are given in Theorem 1, and
$$g_{3i}(A) = \frac{D_i^2 V_i^{2\alpha}}{B_i^2(2\alpha+1)^{3/2}}\,x_i^{\mathrm{T}}J_\beta^{-1}K_\beta J_\beta^{-1}x_i, \qquad g_{4i}(A) = \frac{D_i^2 V_i^{2\alpha}K_A}{B_i^3(2\alpha+1)^{7/2}J_A^2}\left(\alpha^4 - \frac{1}{2}\alpha^2 + 1\right),$$
$$\begin{aligned}
g_{5i}(A) ={}& \frac{\alpha D_i^2\, x_i^{\mathrm{T}}J_\beta^{-1}K_\beta J_\beta^{-1}x_i}{2B_i^4}\,(3B_iC_{11} - \alpha C_{21}) + \frac{D_i^2 K_A}{24B_i^6 J_A^2}\left\{3\alpha B_i^2 C_{21} + (\alpha-2)(3\alpha+8)C_{11}\right\} \\
&+ \frac{D_i^2\, x_i^{\mathrm{T}}J_\beta^{-1}x_i}{B_i^4}\,(B_iC_{12} - \alpha C_{22}) + \frac{D_i^2}{2B_i^6 J_A}\left\{\alpha C_{32} - 2B_iC_{22} + (2-\alpha)B_i^2 C_{12}\right\} \\
&+ \frac{D_i^2}{2B_i^4}\left\{b_A - \frac{\alpha V_i^{\alpha}}{(\alpha+1)^{3/2}B_iJ_A}\right\}\left\{(2-\alpha)B_iC_{11} - \alpha C_{21}\right\}.
\end{aligned}$$
Here, $B_i = A+D_i$, $b_A = \lim_{m\to\infty} mE(\hat{A}_\alpha - A)$ is the first-order bias of $\hat{A}_\alpha$, and
$$C_{jk} = (2j-1)!!\,B_i^{j}\left\{V_i^{k\alpha}(k\alpha+1)^{-j-1/2} - V_i^{k\alpha+\alpha}(k\alpha+\alpha+1)^{-j-1/2}\right\},$$
where $(2j-1)!! = (2j-1)(2j-3)\cdots 1$.
The derivation is given in the Supplementary Material. It should be noted that the approximation formula (11) assumes known $\alpha$, so it might differ under an estimated (selected) $\alpha$. The approximation formula (11) reduces to the mean squared error of the classical empirical Bayes estimator given by Datta & Lahiri (2000) and Datta et al. (2005) when $\alpha = 0$, since then $C_{jk} = 0$ and $g_{5i}(A) = 0$.
3.5 Estimation of the mean squared error
Because the approximation of the mean squared error given in Theorem 3 depends
on the unknown parameter A, it cannot be used in practice. We use a second-order
unbiased estimator of the mean squared error. An estimator T is called second-order
unbiased if E(T ) = T + o(m−1). As shown in Theorem 3, g3i(A), g4i(A), and g5i(A)
are smooth functions, so g3i(Aα), g4i(Aα), and g5i(Aα) are second-order unbiased. On
the contrary, g1i(Aα) and g2i(Aα) may have considerable bias, since g1i(A) and g2i(A)
are O(1). Since the derivation of these biases and bias-corrected estimators of these
terms require tedious algebra, we use the parametric bootstrap method in a similar
way to Butar & Lahiri (2003). We define the bootstrap estimator
$$\hat{M}_i = 2g_{12i}(\hat{A}_\alpha) - \frac{1}{B}\sum_{b=1}^{B}g_{12i}(\hat{A}_\alpha^{(b)}) + \frac{g_{3i}(\hat{A}_\alpha)}{m} + \frac{g_{4i}(\hat{A}_\alpha)}{m} + \frac{2g_{5i}(\hat{A}_\alpha)}{m}, \tag{12}$$
where $g_{12i}(A) = g_{1i}(A) + g_{2i}(A)$ and $\hat{A}_\alpha^{(b)}$ is the bootstrap estimator based on the parametric bootstrap samples $y_1^{(b)},\dots,y_m^{(b)}$ generated from
$$y_i^{(b)} = x_i^{\mathrm{T}}\hat{\beta}_\alpha + v_i^{(b)} + \varepsilon_i^{(b)}, \qquad v_i^{(b)} \sim N(0, \hat{A}_\alpha), \quad \varepsilon_i^{(b)} \sim N(0, D_i).$$
Following Chang & Hall (2015), we obtain the following theorem.
Theorem 4. Assume Conditions 1–3 and let $\hat{M}_i^{\dagger}$ be the ideal version of $\hat{M}_i$ obtained by taking $B = \infty$. Define $\hat{M}_i = \hat{M}_i^{\dagger} + U_i$, where $U_i$ denotes an error term arising from using only a finite number of bootstrap replications. Then, $E(\hat{M}_i^{\dagger}) - M_i = o(m^{-1})$ and $U_i = O_p\{(mB)^{-1/2}\}$.
From Theorem 4, the ideal version of the estimator (12), $\hat{M}_i^{\dagger}$, is second-order unbiased. Moreover, Theorem 4 implies that if $B = O(m^{1+\delta})$ for some small $\delta > 0$, the error from the finite number of bootstrap iterations is $o_p(m^{-1})$, so the estimator $\hat{M}_i$ would perform similarly to the ideal estimator $\hat{M}_i^{\dagger}$. However, this theoretical result assumes known $\alpha$, so the bootstrap estimator (12) is not necessarily justified under estimated $\alpha$.
Although we adopt an additive form of bias correction in (12) following Butar & Lahiri (2003), other forms of bias correction have been proposed, such as those of Hall & Maiti (2006). Concerning $g_{5i}(\hat{A}_\alpha)$, we must compute the estimate of $b_A$, the first-order bias of $\hat{A}_\alpha$, which can be calculated from the parametric bootstrap samples. Alternatively, we may use a parametric bootstrap to compute $g_{5i}(\hat{A}_\alpha)$. As shown in the proof of Theorem 3, $E\{(\hat{\theta}_i^{\mathrm{R}} - \theta_i^{\mathrm{R}})(\theta_i^{\mathrm{R}} - \theta_i)\} = m^{-1}g_{5i}(A) + o(m^{-1})$, so we can use
$$B^{-1}\sum_{b=1}^{B}\left\{\theta_i^{\mathrm{R}}(y_i^{(b)}, \hat{\phi}_\alpha^{(b)}) - \theta_i^{\mathrm{R}}(y_i^{(b)}, \hat{\phi}_\alpha)\right\}\left\{\theta_i^{\mathrm{R}}(y_i^{(b)}, \hat{\phi}_\alpha) - \theta_i(y_i^{(b)}, \hat{\phi}_\alpha)\right\}$$
instead of $m^{-1}g_{5i}(A)$, where $\hat{\phi}_\alpha^{(b)}$ is the parametric bootstrap estimator. The error arising from a finite number of bootstrap iterations can be evaluated in the same way as in Theorem 4; the detailed proof is omitted.
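A minimal sketch of the bootstrap machinery behind (12), with our own function names; re-estimating $\hat{A}_\alpha^{(b)}$ on each bootstrap sample is abstracted away, and $g_{12i}$ is passed in as a callback:

```python
import numpy as np

def parametric_bootstrap_samples(X, beta_hat, A_hat, D, n_boot, rng):
    """Draw y^(b) = x' beta_hat + v^(b) + eps^(b) for b = 1, ..., n_boot."""
    m = len(D)
    v = rng.normal(0.0, np.sqrt(A_hat), size=(n_boot, m))    # random effects
    eps = rng.normal(0.0, np.sqrt(D), size=(n_boot, m))      # sampling errors
    return X @ beta_hat + v + eps

def bias_corrected_g12(g12, A_hat, A_boot):
    """First two terms of (12): 2*g12(A_hat) - B^{-1} sum_b g12(A_boot[b])."""
    return 2.0 * g12(A_hat) - np.mean([g12(a) for a in A_boot])
```

In a full implementation, each row of the returned matrix would be refit by the robust estimating equations (10) to produce the $\hat{A}_\alpha^{(b)}$ fed to `bias_corrected_g12`.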
4 Examples
4.1 Simulation studies
We first investigate the estimation accuracy of the proposed robust estimator together
with the existing estimators. We consider the Fay–Herriot model
$$y_i = \theta_i + \varepsilon_i, \qquad \theta_i = \beta_0 + \beta_1 x_i + A^{1/2}u_i, \qquad i = 1,\dots,m,$$
where m = 30, β0 = 0, β1 = 2, A = 0·5, and ε_i ∼ N(0, D_i). The auxiliary variables x_i are generated from the uniform distribution on (0, 1). Further, we divide the m areas into five groups of equal size and use the same value of D_i within each group. The group D_i pattern is (0·2, 0·4, 0·6, 0·8, 1·0). For the distribution of u_i, we adopt the mixture structure u_i ∼ (1 − ξ)N(0, 1) + ξN(0, 10²), where ξ determines the degree of misspecification of the assumed distribution (contamination by outliers). We consider three scenarios: (I) ξ = 0, (II) ξ = 0·15, and (III) ξ = 0·30. Note that, in scenarios (II) and (III), some observations have very large residuals, and the auxiliary information x_i would not be useful for such outlying observations.
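The data-generating process described above can be sketched as follows (our own function, with defaults matching the text):

```python
import numpy as np

def simulate_contaminated_fh(m=30, beta0=0.0, beta1=2.0, A=0.5, xi=0.15, seed=0):
    """One replicate of the contaminated Fay-Herriot simulation of Section 4.1."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, m)
    # five equal-size groups sharing a common sampling variance
    D = np.repeat([0.2, 0.4, 0.6, 0.8, 1.0], m // 5)
    outlier = rng.random(m) < xi                         # contamination indicators
    u = np.where(outlier, rng.normal(0.0, 10.0, m), rng.normal(0.0, 1.0, m))
    theta = beta0 + beta1 * x + np.sqrt(A) * u
    y = theta + rng.normal(0.0, np.sqrt(D))
    return y, x, D, theta
```

Setting `xi=0.0` reproduces scenario (I); `xi=0.15` and `xi=0.30` reproduce scenarios (II) and (III).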
We estimate θ_i using the proposed robust empirical Bayes estimator with density power divergence. We use two inflation rates, c = 1 and c = 5, and α is selected following the procedure given in §3.2. We adopt four alternative methods: the classical empirical Bayes estimator; the robust Bayes estimator defined in (9) with the model parameters estimated either by the robust estimating equations proposed by Sinha & Rao (2009) or by maximum likelihood; and the robust empirical Bayes estimator (8) proposed by Ghosh et al. (2008) with the maximum likelihood estimator of the model parameters. Following Sinha & Rao (2009), we set K = 1·345 in Huber's ψ-function in equation (9). A suitable value of K = K_i in (8) is selected in the same way as in Ghosh et al. (2008), with a 5% inflation rate.
We compute the mean squared errors of these estimators based on 20000 replications. Table 1 reports the mean squared errors averaged within the same groups, together with estimated Monte Carlo errors in parentheses. From the reported values, the Monte Carlo errors seem negligibly small compared with the mean squared errors. Since the normality assumption in the standard empirical Bayes method is correct in scenario (I), it is natural that the empirical Bayes method provides smaller mean squared errors than the other methods, but the performance of some robust methods, including the proposed method, is comparable with that of the standard method. On the other hand, there are outlying observations in scenarios (II) and (III), under which the proposed methods tend to produce smaller mean squared errors than the other methods, especially for groups with large sampling variances. In particular, the performance of the proposed method with c = 5 is better than that with c = 1 in these scenarios, since the proposed method becomes more robust with larger c. However, the performance of the proposed method with c = 5 is worse than that with c = 1 in scenario (I), which would be a reasonable price for the stronger robustness confirmed in scenarios (II) and (III). In the Supplementary Material, we provide additional results for other scenarios for u_i, such as heavy-tailed or skewed distributions.
Table 1: Mean squared errors averaged within the same group. Estimated Monte Carlo errors are reported in parentheses. All values are multiplied by 1000. DEB1: density power divergence with 1% inflation rate; DEB2: density power divergence with 5% inflation rate; EB: empirical Bayes; REB1: robust empirical Bayes of Sinha & Rao (2009); REB2: robust Bayes of Sinha & Rao (2009) with maximum likelihood; GEB: robust empirical Bayes of Ghosh et al. (2008).

| Scenario | Group | DEB1 | DEB2 | EB | REB1 | REB2 | GEB |
|---|---|---|---|---|---|---|---|
| (I) | 1 | 159 (0·3) | 159 (0·3) | 156 (0·3) | 173 (0·3) | 158 (0·3) | 158 (0·3) |
| (I) | 2 | 256 (0·4) | 258 (0·4) | 252 (0·4) | 280 (0·5) | 258 (0·4) | 309 (0·5) |
| (I) | 3 | 320 (0·5) | 326 (0·6) | 316 (0·5) | 343 (0·6) | 323 (0·6) | 369 (0·7) |
| (I) | 4 | 356 (0·6) | 366 (0·6) | 352 (0·6) | 378 (0·7) | 359 (0·6) | 393 (0·7) |
| (I) | 5 | 383 (0·7) | 397 (0·7) | 378 (0·7) | 400 (0·7) | 382 (0·6) | 412 (0·7) |
| (II) | 1 | 186 (0·3) | 179 (0·3) | 192 (0·3) | 328 (5·5) | 189 (0·3) | 189 (0·3) |
| (II) | 2 | 353 (0·6) | 327 (0·6) | 372 (0·6) | 1017 (12·8) | 362 (0·6) | 364 (0·6) |
| (II) | 3 | 506 (0·9) | 458 (0·8) | 545 (1·0) | 1833 (16·9) | 523 (0·9) | 529 (0·9) |
| (II) | 4 | 640 (1·2) | 571 (1·1) | 701 (1·3) | 2702 (21·3) | 667 (1·2) | 679 (1·2) |
| (II) | 5 | 771 (1·5) | 678 (1·3) | 858 (1·6) | 3594 (24·7) | 810 (1·5) | 825 (1·5) |
| (III) | 1 | 194 (0·3) | 190 (0·3) | 196 (0·3) | 223 (2·9) | 195 (0·3) | 195 (0·3) |
| (III) | 2 | 382 (0·6) | 367 (0·6) | 389 (0·7) | 517 (6·2) | 385 (0·6) | 385 (0·6) |
| (III) | 3 | 562 (1·0) | 534 (0·9) | 578 (1·0) | 949 (9·9) | 568 (1·0) | 568 (1·0) |
| (III) | 4 | 739 (1·2) | 696 (1·2) | 764 (1·3) | 1518 (14·4) | 748 (1·3) | 747 (1·2) |
| (III) | 5 | 900 (1·5) | 840 (1·5) | 937 (1·6) | 2253 (18·5) | 911 (1·5) | 913 (1·5) |
We next investigate the finite-sample performance of the bootstrap estimator $\hat{M}_i$ of the mean squared error. We adopt the same data-generating model with the three scenarios for the distribution of $u_i$ as in the previous study, with m = 20. We also consider the naive estimators of the mean squared error, $\hat{M}_i^{(\mathrm{n}1)}$ and $\hat{M}_i^{(\mathrm{n}2)}$, obtained by replacing $A$ with $\hat{A}_\alpha$ in the mean squared error formulas given in Theorems 1 and 3, respectively. Note that $\hat{M}_i^{(\mathrm{n}1)}$ ignores the variability of the estimated model parameters, and $\hat{M}_i^{(\mathrm{n}2)}$ ignores the bias of $g_{1i}(\hat{A}_\alpha) + g_{2i}(\hat{A}_\alpha)$. The motivation for using these estimators together with $\hat{M}_i$ is to clarify the importance of second-order unbiasedness in finite-sample settings.
We estimate the true values of the mean squared error $M_i$ of the robust empirical Bayes estimator in advance, based on 5000 simulated data sets. The relative bias and square root of the relative mean squared error of the estimator $\hat{M}_i$ are
$$\mathrm{RBias}(\hat{M}_i) = 100\times E(\hat{M}_i - M_i)/M_i, \qquad \mathrm{RRMSE}(\hat{M}_i) = 100\times\left[E\{(\hat{M}_i - M_i)^2\}/M_i^2\right]^{1/2}.$$
These values are computed as averages over 2000 simulation runs with bootstrap sample size 1000; they are also averaged within the same groups.
Table 2 reports the relative biases and square roots of the relative mean squared errors of $\hat{M}_i$, $\hat{M}_i^{(\mathrm{n}1)}$, and $\hat{M}_i^{(\mathrm{n}2)}$. The bootstrap estimator $\hat{M}_i$ outperforms the other estimators, owing to the second-order unbiasedness established in Theorem 4. The crude estimators $\hat{M}_i^{(\mathrm{n}1)}$ and $\hat{M}_i^{(\mathrm{n}2)}$ seem undesirable in practice since they have serious negative biases.
Table 2: Relative bias and root relative mean squared errors for the estimators of the mean squared error.

| Scenario | Group | RBias: M(n1) | RBias: M(n2) | RBias: M̂ | RRMSE: M(n1) | RRMSE: M(n2) | RRMSE: M̂ |
|---|---|---|---|---|---|---|---|
| I | 1 | −30·9 | −14·2 | 1·3 | 42·2 | 19·1 | 18·3 |
| I | 2 | −34·6 | −20·0 | −4·5 | 46·9 | 31·3 | 31·0 |
| I | 3 | −37·3 | −23·5 | −8·0 | 49·8 | 36·9 | 36·5 |
| I | 4 | −34·8 | −25·9 | −9·2 | 50·1 | 42·9 | 43·4 |
| I | 5 | −36·8 | −26·3 | −9·5 | 51·7 | 43·5 | 44·4 |
| II | 1 | −8·3 | −4·4 | 4·3 | 20·6 | 13·0 | 11·4 |
| II | 2 | −10·9 | −6·7 | 4·1 | 25·7 | 20·2 | 18·3 |
| II | 3 | −15·1 | −10·6 | 1·3 | 29·8 | 24·8 | 22·0 |
| II | 4 | −12·5 | −9·5 | 4·4 | 31·7 | 29·0 | 28·1 |
| II | 5 | −16·2 | −12·4 | 2·1 | 34·1 | 30·8 | 29·4 |
| III | 1 | −2·8 | −1·7 | 3·8 | 12·3 | 9·1 | 9·0 |
| III | 2 | −5·2 | −3·8 | 3·9 | 16·0 | 13·6 | 12·5 |
| III | 3 | −7·1 | −5·4 | 4·1 | 18·5 | 16·2 | 14·9 |
| III | 4 | −6·4 | −5·3 | 6·0 | 20·4 | 19·1 | 18·7 |
| III | 5 | −9·7 | −8·3 | 4·3 | 22·6 | 20·9 | 19·7 |
4.2 Fresh milk expenditure data
We consider an application to fresh milk expenditure data from the U.S. Bureau of
Labor Statistics, which was used in Arora & Lahiri (1997) and You & Chapman
(2006). In the data set, the estimated values of the average expenditure on fresh milk
for 1989, yi, are available for 43 areas, with the sampling variances Di. Following
Arora & Lahiri (1997), we consider the Fay–Herriot model (1) with xTi β = βj (j =
1, . . . , 4) if the ith area belongs to the jth region. The four regions are R1 = {1, . . . , 7},
R2 = {8, . . . , 14}, R3 = {15, . . . , 25}, and R4 = {26, . . . , 43}.
Figure 1 shows the scatterplot of $y_i$ together with the maximum likelihood estimates of $\beta_1,\dots,\beta_4$, suggesting that there are some outliers in regions $R_1$ and $R_2$. To see this, we compute the standardized residuals
$$r_i = (\hat{A}+D_i)^{-1/2}\left\{y_i - \sum_{j=1}^{4}\hat{\beta}_j\, I(i \in R_j)\right\}, \qquad i = 1,\dots,m.$$
When the Fay–Herriot model (1) is correctly specified, the distribution of $r_i$ is close to standard normal. However, as shown in Table 4, the absolute values of $r_i$ are large in some areas.
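The residual diagnostic is a one-liner; sketched here with a region-index encoding of the covariates (our own code):

```python
import numpy as np

def standardized_residuals(y, region, beta_hat, A_hat, D):
    """r_i = (A + D_i)^{-1/2} * (y_i - beta_hat[region_i]); region_i in {0,...,3}."""
    return (y - beta_hat[region]) / np.sqrt(A_hat + D)
```

Values with $|r_i|$ well above 2 flag candidate outlying areas, as in Table 4 (e.g. area 11, with $r_i = -3{\cdot}01$).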
We estimate the parameters using the robust estimating equations of Sinha & Rao (2009) and the proposed density power divergence method, that is, the solution to (10), with 1% and 5% inflation rates. Table 3 shows that the estimates of $\beta_3$ and $\beta_4$ are similar across the four methods, whereas those of $\beta_1$, $\beta_2$, and $A$ are not, presumably because of the outlying areas in regions $R_1$ and $R_2$.
To estimate $\theta_i$, we adopt the classical empirical Bayes estimator $\hat{\theta}_i^{\mathrm{EB}}$, the proposed robust empirical Bayes estimator $\hat{\theta}_i^{\mathrm{R}}$ with a 5% inflation rate, and the robust empirical Bayes estimator $\hat{\theta}_i^{\mathrm{SR}}$ proposed by Sinha & Rao (2009). We use $\hat{M}_i$ given in (12), with B = 1000, to estimate the mean squared error of $\hat{\theta}_i^{\mathrm{R}}$. We denote the mean squared errors of $\hat{\theta}_i^{\mathrm{EB}}$ and $\hat{\theta}_i^{\mathrm{SR}}$ by $M_i^{\mathrm{EB}}$ and $M_i^{\mathrm{SR}}$, respectively; these are estimated from the result in Datta & Lahiri (2000) for $M_i^{\mathrm{EB}}$ and with the saeRobust package in R for $M_i^{\mathrm{SR}}$. Table 4 shows that the differences between $\hat{\theta}_i^{\mathrm{EB}}$ and $\hat{\theta}_i^{\mathrm{R}}$ are large in areas with
large absolute values of $r_i$. A similar phenomenon can be observed in the relationship between $M_i^{\mathrm{EB}}$ and $\hat{M}_i$. In contrast, the values of $\hat{\theta}_i^{\mathrm{SR}}$ and $M_i^{\mathrm{SR}}$ differ from the others, which might result from the lower estimate of $A$ reported in Table 3.
Table 3: Estimates of the model parameters from the four methods. Standard errors are shown in parentheses. The estimates and standard errors of A are multiplied by 100.

| Method | β1 | β2 | β3 | β4 | A |
|---|---|---|---|---|---|
| Maximum likelihood | 0·97 (0·07) | 1·10 (0·07) | 1·19 (0·06) | 0·73 (0·04) | 1·55 (0·68) |
| Robust maximum likelihood | 1·01 (0·06) | 1·18 (0·07) | 1·19 (0·05) | 0·73 (0·03) | 0·80 (0·53) |
| Density power divergence (1% inflation) | 0·97 (0·07) | 1·12 (0·07) | 1·19 (0·06) | 0·73 (0·04) | 1·50 (0·65) |
| Density power divergence (5% inflation) | 0·98 (0·06) | 1·15 (0·07) | 1·19 (0·06) | 0·73 (0·04) | 1·35 (0·60) |
Table 4: Values of the empirical Bayes estimator, the proposed robust empirical Bayes estimator, and the robust empirical Bayes estimator of Sinha & Rao (2009), with their estimates of the mean squared errors. The values of MEB, M̂, and MSR are multiplied by 100.

| Area | Region | yi | ri | θEB | θR | θSR | MEB | M̂ | MSR |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 1·10 | 0·64 | 1·02 | 1·02 | 1·03 | 1·35 | 1·35 | 0·81 |
| 4 | 1 | 0·63 | −2·05 | 0·78 | 0·76 | 0·91 | 0·85 | 0·85 | 4·70 |
| 5 | 1 | 0·75 | −1·25 | 0·86 | 0·87 | 0·92 | 0·96 | 0·96 | 4·44 |
| 9 | 2 | 1·41 | 1·48 | 1·21 | 1·24 | 1·23 | 1·42 | 1·40 | 4·19 |
| 11 | 2 | 0·62 | −3·01 | 0·80 | 0·73 | 1·07 | 0·77 | 0·78 | 5·06 |
| 12 | 2 | 1·46 | 1·54 | 1·20 | 1·24 | 1·23 | 1·63 | 1·62 | 5·66 |
| 20 | 3 | 1·29 | 0·48 | 1·23 | 1·22 | 1·22 | 1·31 | 1·32 | 0·77 |
| 25 | 3 | 1·19 | −0·01 | 1·19 | 1·19 | 1·19 | 0·81 | 0·84 | 0·56 |
| 31 | 4 | 0·89 | 0·63 | 0·76 | 0·76 | 0·75 | 1·54 | 1·63 | 0·78 |
| 37 | 4 | 0·44 | −1·84 | 0·54 | 0·54 | 0·61 | 0·64 | 0·65 | 3·59 |
[Figure 1, a scatterplot of the observed values y_i over the 40 areas (vertical axis: expenditure, roughly 0·4 to 1·4), is omitted from this transcript.]

Figure 1: Scatterplot of y_i with the maximum likelihood estimates of β1, . . . , β4.
5 Final Remarks
The proposed method would be recommended compared with existing methods espe-
cially when there exist outlying observations, as shown in our numerical studies. Since
we revealed some asymptotic properties of the proposed method only under the cor-
rect model, investigating asymptotic properties under general model misspecification
would be an interesting future work. Although this study focused on the Fay–Herriot
model, which is standard in small area estimation, the nested error regression model
(Batesse et al., 1988) would be more useful when unit-level data is available. While
several robust methods have been already proposed (Chambers et al., 2014; Cham-
bers & Tzavidis, 2006; Sinha & Rao, 2009), the extension of the proposed method to
unit-level data would be an interesting research direction. Extending the proposed
idea to the non-normal model based on natural exponential family (Ghosh & Maiti,
2004) would also be worthwhile. Finally, several forms of generalized likelihood other
than density power divergence have been proposed, such as γ-divergence (Fujisawa &
Eguchi, 2008). The main advantage of density power divergence in this context is its
mathematical simplicity. As presented in §2, the robust Bayes predictor has a simple
form. A detailed comparison among generalized likelihood methods is left to a future
study.
Acknowledgments
The author was supported by the Japan Society for the Promotion of Science (KAKENHI) grant number 18K12757.
Appendix
Proof of Theorem 1. Since $\tilde{\theta}_i = E(\theta_i \mid y_i)$,
$$
E\{(\theta^R_i - \theta_i)^2\} = E\{(\tilde{\theta}_i - \theta_i)^2\} + E\{(\theta^R_i - \tilde{\theta}_i)^2\}
= \frac{A D_i}{A + D_i} + E\{(\theta^R_i - \tilde{\theta}_i)^2\} \equiv g_{1i}(A) + g_{2i}(A).
$$
Since $\theta^R_i - \tilde{\theta}_i = (A + D_i)^{-1} D_i (y_i - x_i^{\mathrm{T}}\beta)(1 - s_i)$,
$$
g_{2i}(A) = \frac{D_i^2}{(A + D_i)^2}\, E\{(y_i - x_i^{\mathrm{T}}\beta)^2 (1 - s_i)^2\}.
$$
Using Lemma S1 in the Supplementary Material, we obtain the expression for $g_{2i}(A)$.

We next show that $g_{2i}(A)$ is increasing in $\alpha \in (0, 1)$. For notational simplicity, we put $\mu_i = x_i^{\mathrm{T}}\beta$. Since $(y_i - \mu_i)^2 (1 - s_i)^2$ is a continuous and differentiable function of $y_i$ and $\alpha$, we have
$$
\frac{\partial g_{2i}(A)}{\partial \alpha} = -\frac{2 D_i^2}{(A + D_i)^2}\, E\left\{(y_i - \mu_i)^2 (1 - s_i)\frac{\partial s_i}{\partial \alpha}\right\}.
$$
Note that $s_i = f_i(y_i; \phi)^\alpha$. If $f_i(y_i; \phi) \le 1$, then $1 - s_i \ge 0$ and $s_i$ is decreasing with respect to $\alpha$, so $(1 - s_i)\,\partial s_i/\partial \alpha \le 0$. By a similar argument, $(1 - s_i)\,\partial s_i/\partial \alpha \le 0$ if $f_i(y_i; \phi) \ge 1$. Hence $(1 - s_i)\,\partial s_i/\partial \alpha \le 0$ always holds, so that $\partial g_{2i}(A)/\partial \alpha \ge 0$ for $\alpha \in (0, 1)$, which completes the proof.
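For completeness, the computation referred to above can be written out; the following is a sketch using Lemma S1 with $j = 1$ and $k = 0, 1, 2$, writing $u_i = y_i - x_i^{\mathrm{T}}\beta$ and $V_i = \{2\pi(A + D_i)\}^{-1/2}$ as in the proof of Lemma S1:

```latex
g_{2i}(A)
  = \frac{D_i^2}{(A+D_i)^2}\, E\{u_i^2 (1 - 2 s_i + s_i^2)\}
  = \frac{D_i^2}{A+D_i}
    \left\{ 1 - \frac{2 V_i^{\alpha}}{(\alpha+1)^{3/2}}
              + \frac{V_i^{2\alpha}}{(2\alpha+1)^{3/2}} \right\}.
```

At $\alpha = 0$ the braces reduce to $1 - 2 + 1 = 0$, consistent with $\theta^R_i$ reducing to the Bayes estimator $\tilde{\theta}_i$, so that $g_{2i}(A) = 0$ in the non-robust limit.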
Proof of Theorem 4. It follows that
$$
E\{g_{12i}(\hat{A}_\alpha) - g_{12i}(A)\}
= b_A \frac{\partial g_{12i}(A)}{\partial A}
+ \frac{1}{2m}\frac{K_A}{J_A^2}\frac{\partial^2 g_{12i}(A)}{\partial A^2}
+ \frac{1}{6}\, E\left\{\frac{\partial^3 g_{12i}(A_\ast)}{\partial A^3}(\hat{A}_\alpha - A)^3\right\},
$$
where $A_\ast$ is between $A$ and $\hat{A}_\alpha$. From Lemma S2 in the Supplementary Material, it holds that $E\{g_{12i}(\hat{A}_\alpha) - g_{12i}(A)\} = m^{-1} d(A) + o(m^{-1})$, where $d(\cdot)$ is a smooth function. Then, from Butar & Lahiri (2003), we have $E(M_i^\dagger - M_i) = o(m^{-1})$.

From the definition of $U_i$, we have
$$
U_i = \frac{1}{B}\sum_{b=1}^{B} g_{12i}(\hat{A}_\alpha^{(b)}) - E_\ast\{g_{12i}(\hat{A}_\alpha^\ast)\},
$$
where $E_\ast$ denotes the expectation with respect to the bootstrap sample. Noting $E(U_i \mid y) = 0$, we observe that
$$
\mathrm{var}(U_i) = E\{\mathrm{var}(U_i \mid y)\}
= \frac{1}{B} E[\mathrm{var}\{g_{12i}(\hat{A}_\alpha^{(1)}) \mid y\}]
= \frac{1}{B} E\big( E\big[\{g_{12i}(\hat{A}_\alpha^{(1)}) - E_\ast\{g_{12i}(\hat{A}_\alpha^\ast)\}\}^2 \mid y\big]\big)
= \frac{1}{B} E\{g'_{12i}(A^\dagger)^2 (\hat{A}_\alpha^{(1)} - A)^2\},
$$
where $g'_{12i}(A) = \partial g_{12i}(A)/\partial A$ and $A^\dagger = \varepsilon A + (1 - \varepsilon)\hat{A}_\alpha^{(1)}$ for some $\varepsilon \in [0, 1]$. Straightforward calculation shows that
$$
g'_{12i}(A) = \frac{D_i^2}{(A + D_i)^2} - \frac{g_{2i}(A)}{A + D_i}
+ \frac{2\pi\alpha D_i^2}{A + D_i}\left\{\frac{V_i^{\alpha+2}}{(\alpha+1)^{3/2}} - \frac{V_i^{2\alpha+2}}{(2\alpha+1)^{3/2}}\right\}.
$$
Note that $0 \le V_i \le (2\pi D_i)^{-1/2}$, so that $\sup_A |g'_{12i}(A)| \le C(D_i, \alpha) < \infty$ under Condition 1. Hence, it follows that
$$
\mathrm{var}(U_i) \le \frac{1}{B}\, C(D_i, \alpha)^2\, E\{(\hat{A}_\alpha^{(1)} - A)^2\} = O\{(m B)^{-1}\},
$$
which completes the proof.
References
Agostinelli, C. & Greco, L. (2013). A weighted strategy to handle likelihood
uncertainty in Bayesian inference. Comput. Stat. 28, 319–339.
Arora, V. & Lahiri, P. (1997). On the superiority of the Bayesian method over
the BLUP in small area estimation problems. Statist. Sinica 7, 1053–1063.
Basu, A., Harris, I. R., Hjort, N. L. & Jones, M. C. (1998). Robust and efficient
estimation by minimizing a density power divergence. Biometrika 85, 549–559.
Battese, G.E., Harter, R.M. & Fuller, W.A. (1988). An error-components
model for prediction of county crop areas using survey and satellite data. J. Am.
Statist. Assoc. 83, 28–36.
Burden, R. L. & Faires, J. D. (2010). Numerical Analysis, 9th Edition, Stanford:
Brooks-Cole Publishing.
Butar, F. B. & Lahiri, P. (2003). On measures of uncertainty of empirical Bayes
small-area estimators. J. Stat. Plan. Infer. 112, 63–76.
Carvalho, C. M., Polson, N. G. & Scott, J. G. (2010). The horseshoe estimator
for sparse signals. Biometrika 97, 465–480.
Chambers, R. L., Chandra, H., Salvati, N. & Tzavidis, N. (2014). Outlier
robust small area estimation. J. R. Stat. Soc. B. 76, 47–69.
Chambers, R. L. & Tzavidis, N. (2006). M-quantile models for small area estima-
tion. Biometrika 93, 255–268.
Chang, J. & Hall, P. (2015). Double-bootstrap methods that use a single double-
bootstrap simulation. Biometrika 102, 203–214.
Datta, G. S. & Lahiri, P. (1995). Robust hierarchical Bayes estimation of small
area characteristics in the presence of covariates and outliers. J. Multivariate Anal.
54, 310–328.
Datta, G. S. & Lahiri, P. (2000). A unified measure of uncertainty of estimated
best linear unbiased predictors in small area estimation problems. Stat. Sinica.
10, 613–627.
Datta, G.S., Rao, J.N.K. & Smith, D.D. (2005). On measuring the variability of
small area estimators under a basic area level model. Biometrika 92, 183–196.
Efron, B. (2011). Tweedie’s formula and selection bias. J. Am. Stat. Assoc. 106,
1602–1614.
Fay, R. E. & Herriot, R. A. (1979). Estimates of income for small places: an
application of James–Stein procedures to census data. J. Am. Stat. Assoc. 74,
269–277.
Fujisawa, H. & Eguchi, S. (2008). Robust parameter estimation with a small bias
against heavy contamination. J. Multivariate Anal. 99, 2053–2081.
Ghosh, A. & Basu, A. (2013). Robust estimation for independent non-homogeneous
observations using density power divergence with applications to linear regression.
Electr. J. Stat. 7, 2420–2456.
Ghosh, A. & Basu, A. (2016). Robust Bayes estimation using the density power
divergence. Ann. Inst. Stat. Math. 68, 413–437.
Ghosh, M. & Maiti, T. (2004). Small-area estimation based on natural exponential
family quadratic variance function models and survey weights. Biometrika 91, 95–
112.
Ghosh, M., Maiti, T. & Roy, A. (2008). Influence functions and robust Bayes and
empirical Bayes small area estimation. Biometrika 95, 573–585.
Hall, P. & Maiti, T. (2006). On parametric bootstrap methods for small area
prediction. J. R. Stat. Soc. B. 68, 221–238.
Hooker, G. & Vidyashankar, A. B. (2014). Bayesian model robustness via dis-
parities. TEST, 23, 556–584.
Huber, P. J. (1973). Robust regression: asymptotics, conjectures and Monte Carlo.
Ann. Stat. 1, 799–821.
Jewson, J., Smith, J. Q. & Holmes, C. (2018). Principles of Bayesian inference
using general divergence criteria. Entropy 20, 442.
Nakagawa, T. & Hashimoto, S. (2019). Robust Bayesian inference via γ-
divergence. Commun. Stat. Theory., to appear.
Pfeffermann, D. (2013). New important developments in small area estimation.
Stat. Sci. 28, 40–68.
Prasad, N. G. N. & Rao, J. N. K. (1990). The estimation of the mean squared
error of small-area estimators. J. Am. Stat. Assoc. 85, 163–171.
Rao, J.N.K. & Molina, I. (2015). Small Area Estimation, 2nd Edition, New York:
Wiley.
Sinha, S. K. & Rao, J. N. K. (2009). Robust small area estimation. Can. J. Stat.
37, 381–399.
You, Y. & Chapman, B. (2006). Small area estimation using area level models and
estimated sampling variances. Surv. Method. 32, 97–103.
Supplementary material for “Robust Empirical Bayes
Small Area Estimation with Density Power Divergence”
S1 Useful Lemma
In what follows, we write $s_i$ for $s_i(y_i; \phi)$ when there is no confusion.

Lemma S1. When $y_i \sim N(x_i^{\mathrm{T}}\beta, A + D_i)$, it holds that
$$
E\{(y_i - x_i^{\mathrm{T}}\beta)^{2j-1} s_i^k\} = 0, \qquad j, k = 1, 2, \ldots,
$$
$$
E\{(y_i - x_i^{\mathrm{T}}\beta)^{2j} s_i^k\} = V_i^{k\alpha} (k\alpha + 1)^{-j-1/2} (2j - 1)!!\, (A + D_i)^j, \qquad j, k = 0, 1, 2, \ldots,
$$
where $(2j - 1)!! = (2j - 1)(2j - 3)\cdots 1$.

Proof. Note that
$$
E\{(y_i - x_i^{\mathrm{T}}\beta)^c s_i^k\}
= \frac{V_i^{k\alpha}}{\{2\pi(A + D_i)\}^{1/2}} \int_{-\infty}^{\infty} (t - x_i^{\mathrm{T}}\beta)^c \exp\left\{-\frac{(k\alpha + 1)(t - x_i^{\mathrm{T}}\beta)^2}{2(A + D_i)}\right\} dt
= \frac{V_i^{k\alpha}}{(k\alpha + 1)^{1/2}}\, E(Z^c),
$$
where $Z \sim N\{0, (A + D_i)/(k\alpha + 1)\}$. Hence the expectation is 0 when $c$ is odd. On the other hand, when $c = 2j$, $j = 0, 1, 2, \ldots$, it follows that $E(Z^{2j}) = (2j - 1)!!\, (A + D_i)^j (k\alpha + 1)^{-j}$, which completes the proof.
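The identity in Lemma S1 is easy to check numerically. The following sketch verifies the even-moment formula by Monte Carlo for one illustrative choice of $(j, k)$ and assumed parameter values $\mu_i = 0$, $A + D_i = 1$, $\alpha = 0.2$; these values are arbitrary and chosen only for the check:

```python
import numpy as np

# Monte Carlo check of Lemma S1 under illustrative (assumed) values.
rng = np.random.default_rng(0)
mu, B, alpha, j, k = 0.0, 1.0, 0.2, 1, 1

y = rng.normal(mu, np.sqrt(B), size=2_000_000)
V = 1.0 / np.sqrt(2.0 * np.pi * B)            # V_i = {2*pi*(A+D_i)}^{-1/2}
f = V * np.exp(-(y - mu) ** 2 / (2.0 * B))    # N(mu, B) density
s = f ** alpha                                 # density-power weight s_i

empirical = np.mean((y - mu) ** (2 * j) * s ** k)

# Right-hand side: V^{k*alpha} (k*alpha+1)^{-j-1/2} (2j-1)!! B^j
double_fact = float(np.prod(np.arange(2 * j - 1, 0, -2))) if j > 0 else 1.0
theory = V ** (k * alpha) * (k * alpha + 1) ** (-j - 0.5) * double_fact * B ** j
```

With two million draws the Monte Carlo error is of order $10^{-3}$, so `empirical` and `theory` agree to about two decimal places.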
S2 Proof of Theorem 2
Let $F_\beta$ and $F_A$ be the first and second estimating functions in (10), and define $F_\phi = (F_\beta^{\mathrm{T}}, F_A)^{\mathrm{T}}$. Under Conditions 1–3 in the main article, the theory of unbiased estimating equations by Godambe (1960) shows that $\hat{\phi}_\alpha = (\hat{\beta}_\alpha^{\mathrm{T}}, \hat{A}_\alpha)^{\mathrm{T}}$ is consistent and asymptotically normal, with the asymptotic covariance matrix given by
$$
\lim_{m \to \infty} E\left(\frac{1}{m}\frac{\partial F_\phi}{\partial \phi^{\mathrm{T}}}\right)^{-1} E\left(\frac{1}{m} F_\phi F_\phi^{\mathrm{T}}\right) E\left(\frac{1}{m}\frac{\partial F_\phi}{\partial \phi^{\mathrm{T}}}\right)^{-1}.
$$
Straightforward calculation shows that
$$
\frac{\partial F_\beta}{\partial \beta^{\mathrm{T}}} = \sum_{i=1}^m \frac{x_i x_i^{\mathrm{T}} s_i}{(A + D_i)^2}\{\alpha(y_i - x_i^{\mathrm{T}}\beta)^2 - (A + D_i)\},
$$
$$
\frac{\partial F_A}{\partial \beta} = \sum_{i=1}^m \frac{x_i s_i (y_i - x_i^{\mathrm{T}}\beta)}{(A + D_i)^3}\{\alpha(y_i - x_i^{\mathrm{T}}\beta)^2 - (\alpha + 2)(A + D_i)\},
$$
$$
\frac{\partial F_A}{\partial A} = \frac{1}{2}\sum_{i=1}^m \left\{\frac{\alpha s_i (y_i - x_i^{\mathrm{T}}\beta)^4}{(A + D_i)^4} - \frac{(\alpha^2 + 2\alpha) V_i^\alpha}{(\alpha + 1)^{3/2}(A + D_i)^2} - \frac{2 s_i (\alpha + 2)}{(A + D_i)^3}(y_i - x_i^{\mathrm{T}}\beta)^2 + \frac{s_i (\alpha + 2)}{(A + D_i)^2}\right\}.
$$
Then, using Lemma S1, $E(\partial F_\beta/\partial \beta^{\mathrm{T}}) = m J_\beta$, $E(\partial F_A/\partial \beta) = 0$ and $E(\partial F_A/\partial A) = -m J_A$. Moreover, from (12) in the main article and Lemma S1, $E(F_\beta F_\beta^{\mathrm{T}}) = K_\beta$, $E(F_\beta F_A) = 0$ and $E(F_A^2) = K_A$. Hence, $\hat{\beta}_\alpha$ and $\hat{A}_\alpha$ are asymptotically independent, and their asymptotic covariance matrices are $J_\beta^{-1} K_\beta J_\beta^{-1}$ and $J_A^{-1} K_A J_A^{-1}$, respectively.
S3 Proof of Theorem 3
The mean squared error $M_i = E\{(\hat{\theta}^R_i - \theta_i)^2\}$ can be decomposed as
$$
M_i = E\{(\theta^R_i - \theta_i)^2\} + 2 E\{(\theta^R_i - \theta_i)(\hat{\theta}^R_i - \theta^R_i)\} + E\{(\hat{\theta}^R_i - \theta^R_i)^2\},
$$
and the first term reduces to $g_{1i}(A) + g_{2i}(A)$, whose expressions are given in Theorem 1.

We first evaluate the third term. Taylor series expansion shows that
$$
\hat{\theta}^R_i - \theta^R_i = \frac{\partial \theta^R_i}{\partial \phi^{\mathrm{T}}}(\hat{\phi}_\alpha - \phi) + \frac{1}{2}(\hat{\phi}_\alpha - \phi)^{\mathrm{T}} \frac{\partial^2 \theta^R_i}{\partial \phi_\ast \partial \phi_\ast^{\mathrm{T}}}(\hat{\phi}_\alpha - \phi),
$$
where $\phi_\ast$ is on the line connecting $\phi$ and $\hat{\phi}_\alpha$. Then, we get
$$
E\{(\hat{\theta}^R_i - \theta^R_i)^2\} = E\left[\left\{\frac{\partial \theta^R_i}{\partial \phi^{\mathrm{T}}}(\hat{\phi}_\alpha - \phi)\right\}^2\right] + R_1 + R_2,
$$
where $R_1 = E\{(\hat{\phi}_\alpha - \phi)^{\mathrm{T}}(\partial \theta^R_i/\partial \phi)\,(\hat{\phi}_\alpha - \phi)^{\mathrm{T}}(\partial^2 \theta^R_i/\partial \phi_\ast \partial \phi_\ast^{\mathrm{T}})(\hat{\phi}_\alpha - \phi)\}$ and $R_2 = E[\{(\hat{\phi}_\alpha - \phi)^{\mathrm{T}}(\partial^2 \theta^R_i/\partial \phi_\ast \partial \phi_\ast^{\mathrm{T}})(\hat{\phi}_\alpha - \phi)\}^2]/4$. Here we use the following lemma.

Lemma S2. Under Conditions 1–3 in the main article, $E(|\hat{\phi}_{\alpha(k)} - \phi_k|^r) = O(m^{-r/2})$ for any $r > 0$ and $k = 1, \ldots, p + 1$, where $\hat{\phi}_{\alpha(k)}$ is the $k$th element of $\hat{\phi}_\alpha$.

A rigorous proof of the lemma requires uniform integrability, but intuitively, from Theorem 2, $E(m^{r/2}|\hat{\phi}_{\alpha(k)} - \phi_k|^r) = O(1)$ under Conditions 1–3, which leads to Lemma S2.

In what follows, we use $u_i = y_i - x_i^{\mathrm{T}}\beta$ and $B_i = A + D_i$ for notational simplicity. Straightforward calculation shows that
$$
\frac{\partial \theta^R_i}{\partial \beta} = -\frac{D_i s_i x_i}{B_i^2}(\alpha u_i^2 - B_i), \qquad
\frac{\partial \theta^R_i}{\partial A} = -\frac{D_i s_i u_i}{2 B_i^3}\{\alpha u_i^2 - (\alpha + 2) B_i\}.
$$
Moreover, we have
$$
\frac{\partial^2 \theta^R_i}{\partial \beta \partial \beta^{\mathrm{T}}} = -\frac{\alpha D_i s_i x_i x_i^{\mathrm{T}}}{B_i^3}(\alpha u_i^3 - 3 B_i u_i), \qquad
\frac{\partial^2 \theta^R_i}{\partial A^2} = -\frac{D_i s_i u_i}{4 B_i^5}\{\alpha^2 u_i^4 - 2\alpha(\alpha + 4) B_i u_i^2 + (\alpha + 2)(\alpha + 4) B_i^2\}.
$$
Note that
$$
R_1 = \sum_{j=1}^{p+1}\sum_{k=1}^{p+1}\sum_{\ell=1}^{p+1} E\left\{\left(\frac{\partial \theta^R_i}{\partial \phi_j}\right)\left(\frac{\partial^2 \theta^R_i}{\partial \phi^\ast_k \partial \phi^\ast_\ell}\right)(\hat{\phi}_{\alpha(j)} - \phi_j)(\hat{\phi}_{\alpha(k)} - \phi_k)(\hat{\phi}_{\alpha(\ell)} - \phi_\ell)\right\} \equiv \sum_{j=1}^{p+1}\sum_{k=1}^{p+1}\sum_{\ell=1}^{p+1} U_{1jk\ell}.
$$
From Hölder's inequality,
$$
|U_{1jk\ell}| \le E\left\{\left|\left(\frac{\partial \theta^R_i}{\partial \phi_j}\right)\left(\frac{\partial^2 \theta^R_i}{\partial \phi^\ast_k \partial \phi^\ast_\ell}\right)\right|^4\right\}^{1/4} E\left\{\left|(\hat{\phi}_{\alpha(j)} - \phi_j)(\hat{\phi}_{\alpha(k)} - \phi_k)(\hat{\phi}_{\alpha(\ell)} - \phi_\ell)\right|^{4/3}\right\}^{3/4}
\le E\left(\left|\frac{\partial \theta^R_i}{\partial \phi_j}\right|^8\right)^{1/8} E\left(\left|\frac{\partial^2 \theta^R_i}{\partial \phi^\ast_k \partial \phi^\ast_\ell}\right|^8\right)^{1/8} \prod_{a \in \{j, k, \ell\}} E\left(\left|\hat{\phi}_{\alpha(a)} - \phi_a\right|^4\right)^{1/4}.
$$
Since $E(|\partial \theta^R_i/\partial \phi_j|^8) < \infty$ and $E(|\partial^2 \theta^R_i/\partial \phi^\ast_k \partial \phi^\ast_\ell|^8) < \infty$, we have $R_1 = o(m^{-1})$ from Lemma S2. A similar evaluation shows that $R_2 = o(m^{-1})$. Using a similar argument to that given in the proof of Theorem 3 in Kubokawa et al. (2016),
$$
E\left[\left\{\frac{\partial \theta^R_i}{\partial \phi^{\mathrm{T}}}(\hat{\phi}_\alpha - \phi)\right\}^2\right]
= \mathrm{tr}\left[E\left(\frac{\partial \theta^R_i}{\partial \phi}\frac{\partial \theta^R_i}{\partial \phi^{\mathrm{T}}}\right) E\{(\hat{\phi}_\alpha - \phi)(\hat{\phi}_\alpha - \phi)^{\mathrm{T}}\}\right] + o(m^{-1})
= \frac{1}{m}\mathrm{tr}\left\{E\left(\frac{\partial \theta^R_i}{\partial \beta}\frac{\partial \theta^R_i}{\partial \beta^{\mathrm{T}}}\right) J_\beta^{-1} K_\beta J_\beta^{-1}\right\} + \frac{1}{m} E\left\{\left(\frac{\partial \theta^R_i}{\partial A}\right)^2\right\} J_A^{-1} K_A J_A^{-1} + o(m^{-1}).
$$
From Theorem 2 and
$$
E\left(\frac{\partial \theta^R_i}{\partial \beta}\frac{\partial \theta^R_i}{\partial \beta^{\mathrm{T}}}\right) = \frac{D_i^2 V_i^{2\alpha}(3\alpha^2 + 2\alpha + 1)\, x_i x_i^{\mathrm{T}}}{(A + D_i)^2 (2\alpha + 1)^{5/2}}, \qquad
E\left\{\left(\frac{\partial \theta^R_i}{\partial A}\right)^2\right\} = \frac{D_i^2 V_i^{2\alpha}}{(A + D_i)^3 (2\alpha + 1)^{7/2}}\left(\alpha^4 + 2\alpha^3 + \frac{9}{2}\alpha^2 + 2\alpha + 1\right),
$$
we obtain $E\{(\hat{\theta}^R_i - \theta^R_i)^2\} = m^{-1} g_{3i}(A) + m^{-1} g_{4i}(A) + o(m^{-1})$.

Concerning $E\{(\theta^R_i - \theta_i)(\hat{\theta}^R_i - \theta^R_i)\}$,
$$
E\{(\theta^R_i - \theta_i)(\hat{\theta}^R_i - \theta^R_i)\} = E\{(\theta^R_i - \tilde{\theta}_i)(\hat{\theta}^R_i - \theta^R_i)\} = \frac{D_i}{B_i} E\{(1 - s_i) u_i (\hat{\theta}^R_i - \theta^R_i)\}.
$$
By Taylor series expansion,
$$
\hat{\theta}^R_i - \theta^R_i = \frac{\partial \theta^R_i}{\partial \phi^{\mathrm{T}}}(\hat{\phi}_\alpha - \phi) + \frac{1}{2}(\hat{\phi}_\alpha - \phi)^{\mathrm{T}}\frac{\partial^2 \theta^R_i}{\partial \phi \partial \phi^{\mathrm{T}}}(\hat{\phi}_\alpha - \phi) + R_3,
$$
where
$$
R_3 = \frac{1}{6}\sum_{k=1}^{p+1}\sum_{j=1}^{p+1}\sum_{\ell=1}^{p+1} \frac{\partial^3 \theta^R_i}{\partial \phi^\ast_k \partial \phi^\ast_j \partial \phi^\ast_\ell}(\hat{\phi}_{\alpha(k)} - \phi_k)(\hat{\phi}_{\alpha(j)} - \phi_j)(\hat{\phi}_{\alpha(\ell)} - \phi_\ell).
$$
Similarly to the evaluation of $R_1$, $E\{(1 - s_i) u_i R_3\} = o(m^{-1})$. Then,
$$
E\{(\theta^R_i - \theta_i)(\hat{\theta}^R_i - \theta^R_i)\}
= \frac{D_i}{B_i} E\left\{(1 - s_i) u_i \frac{\partial \theta^R_i}{\partial \phi^{\mathrm{T}}}(\hat{\phi}_\alpha - \phi)\right\}
+ \frac{D_i}{2 B_i} E\left\{(1 - s_i) u_i (\hat{\phi}_\alpha - \phi)^{\mathrm{T}}\frac{\partial^2 \theta^R_i}{\partial \phi \partial \phi^{\mathrm{T}}}(\hat{\phi}_\alpha - \phi)\right\} + o(m^{-1})
\equiv T_1 + T_2 + o(m^{-1}),
$$
where
$$
T_2 = \frac{D_i}{2 m B_i}\mathrm{tr}\left[E\left\{(1 - s_i) u_i \frac{\partial^2 \theta^R_i}{\partial \beta \partial \beta^{\mathrm{T}}}\right\} J_\beta^{-1} K_\beta J_\beta^{-1}\right] + \frac{D_i K_A}{2 m B_i J_A^2} E\left\{(1 - s_i) u_i \frac{\partial^2 \theta^R_i}{\partial A^2}\right\} + o(m^{-1}).
$$
From Lohr and Rao (2009),
$$
E(\hat{\beta}_\alpha - \beta \mid y_i) = b_\beta - m^{-1} B_i^{-1} J_\beta^{-1} x_i s_i u_i + o_p(m^{-1}),
$$
$$
E(\hat{A}_\alpha - A \mid y_i) = b_A - m^{-1} J_A^{-1}\left\{\frac{u_i^2 s_i}{B_i^2} - \frac{s_i}{B_i} + \frac{\alpha V_i^\alpha}{(\alpha + 1)^{3/2} B_i}\right\} + o_p(m^{-1}),
$$
where $b_\beta = \lim_{m \to \infty} m E(\hat{\beta}_\alpha - \beta)$ and $b_A = \lim_{m \to \infty} m E(\hat{A}_\alpha - A)$, so
$$
T_1 = -\frac{D_i}{m B_i^2} E\left\{s_i(1 - s_i) u_i^2 \frac{\partial \theta^R_i}{\partial \beta^{\mathrm{T}}} J_\beta^{-1} x_i\right\}
- \frac{D_i}{m B_i^3 J_A} E\left\{s_i(1 - s_i) u_i \frac{\partial \theta^R_i}{\partial A}(u_i^2 - B_i)\right\}
+ \frac{D_i}{B_i} E\left\{(1 - s_i) u_i \frac{\partial \theta^R_i}{\partial A}\right\}\left\{b_A - \frac{\alpha V_i^\alpha}{m(\alpha + 1)^{3/2} B_i J_A}\right\} + o(m^{-1}).
$$
Combining these results and using Lemma S1, $E\{(\theta^R_i - \theta_i)(\hat{\theta}^R_i - \theta^R_i)\} = m^{-1} g_{5i}(A) + o(m^{-1})$, which completes the proof.
S4 Additional simulation study
We show results of additional simulation studies on the estimation accuracy of several estimators of $\theta_i$. We use the same data generating model for $y_i$ as in Section 4·1 of the main article. We consider the following additional scenarios for the true generating distribution of $u_i$:
$$
\text{(IV)}\ u_i \sim t_2, \qquad \text{(V)}\ u_i \sim \mathrm{Ga}(0.5, 0.5), \qquad \text{(VI)}\ u_i \sim \mathrm{Ga}(2, 2), \qquad \text{(VII)}\ u_i \sim \mathrm{ST}_3(2)/3^{1/2},
$$
where $t_n$ is a $t$-distribution with $n$ degrees of freedom, $\mathrm{Ga}(a, b)$ is a gamma distribution with shape parameter $a$ and rate parameter $b$, scaled to have mean zero and variance 1, and $\mathrm{ST}_n(a)$ denotes a skew $t$-distribution with $n$ degrees of freedom and skewing parameter $a$ in the parametrization given in the ‘skewt’ package in “R”.
For estimating $\theta_i$, we employ the same six methods used in the main article and compute mean squared errors based on 20000 replicates. Table S1 reports the mean squared errors averaged within the same groups. Since the estimated Monte Carlo errors are negligibly small, as in Table 1 of the main article, they are not shown here. In scenarios (IV) and (VII), the generated values of $u_i$ sometimes contain outliers owing to the heavy tails of the $t$- and skew $t$-distributions, under which the proposed methods tend to perform better than the other methods. The true distributions of $u_i$ in scenarios (V) and (VI) are skewed but do not produce extreme values. The distribution of $u_i$ in scenario (V) is more skewed than that in scenario (VI), and the proposed methods tend to perform better than the other methods in scenario (V). On the other hand, the standard empirical Bayes method performs quite well in scenario (VI) in spite of the misspecified normality assumption, and the proposed method is comparable with or slightly better than the other robust methods.
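The standardizations described above can be sketched as follows; this is a minimal illustration in which the number of draws and seed are arbitrary, and scenario (VII) is omitted because it depends on the specific skew-$t$ parametrization of the R ‘skewt’ package:

```python
import numpy as np

rng = np.random.default_rng(1)
n_draws = 1_000_000

# Scenario (IV): t-distribution with 2 degrees of freedom (heavy tails,
# mean zero but infinite variance).
u_t2 = rng.standard_t(df=2, size=n_draws)

def scaled_gamma(a, b, size, rng):
    """Ga(a, b) draws (rate parametrization: mean a/b, variance a/b^2),
    shifted and scaled to mean 0 and variance 1."""
    g = rng.gamma(shape=a, scale=1.0 / b, size=size)
    return (g - a / b) / np.sqrt(a / b ** 2)

u_g1 = scaled_gamma(0.5, 0.5, n_draws, rng)   # scenario (V), more skewed
u_g2 = scaled_gamma(2.0, 2.0, n_draws, rng)   # scenario (VI), less skewed
```

The skewness of Ga(a, b) is 2/a^{1/2} and is unchanged by the location-scale standardization, so scenario (V) (a = 0·5, skewness ≈ 2·8) is indeed more skewed than scenario (VI) (a = 2, skewness ≈ 1·4), matching the discussion above.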
References
Ghosh, M., Maiti, T. & Roy, A. (2008). Influence functions and robust Bayes and
empirical Bayes small area estimation. Biometrika 95, 573–585.
Godambe, V. P. (1960). An optimum property of regular maximum likelihood
estimation. Ann. Math. Stat. 31, 1208–1211.
Kubokawa, T., Sugasawa, S., Ghosh, M. & Chaudhuri, S. (2016). Prediction
in heteroscedastic nested error regression models with random dispersions. Stat.
Sinica 26, 465–492.
Lohr, S. L. & Rao, J. N. K. (2009). Jackknife estimation of mean squared error
of small area predictors in nonlinear mixed models. Biometrika 96, 457–468.
Sinha, S. K. & Rao, J. N. K. (2009). Robust small area estimation. Can. J. Stat.
37, 381–399.
Table S1: Simulated mean squared errors averaged within the same group. The values are multiplied by 1000. DEB1: density power divergence with 1% inflation rate; DEB2: density power divergence with 5% inflation rate; EB: empirical Bayes; REB1: robust empirical Bayes of Sinha & Rao (2009); REB2: robust Bayes of Sinha & Rao (2009) with maximum likelihood; GEB: robust empirical Bayes of Ghosh et al. (2008).

Scenario  Group  DEB1  DEB2    EB  REB1  REB2   GEB
   (IV)       1   180   178   183   249   180   180
              2   329   323   340   591   332   337
              3   456   446   478  1250   464   492
              4   563   550   597  1402   578   621
              5   661   644   712  2004   691   732
    (V)       1   145   143   145   216   143   150
              2   232   229   234   334   235   266
              3   293   289   297   397   304   321
              4   335   332   340   427   354   352
              5   365   363   372   449   392   371
   (VI)       1   156   156   154   183   154   157
              2   249   249   247   296   250   293
              3   310   312   308   358   317   351
              4   352   358   351   395   362   380
              5   378   387   377   415   389   398
  (VII)       1   163   161   164   231   162   163
              2   279   274   284   509   280   311
              3   363   355   374   703   371   411
              4   435   426   452   771   455   476
              5   489   479   512   895   520   523