
EMPIRICAL AND HIERARCHICAL BAYESIAN METHODS WITH APPLICATIONS TO SMALL AREA ESTIMATION

By

ANANYA ROY

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2007

© 2007 Ananya Roy


To my teachers and my family


ACKNOWLEDGMENTS

I would like to extend my sincerest gratitude to my advisor, Professor Malay Ghosh, for his invaluable guidance and constant support throughout my graduate studies. It was a privilege to have an advisor who was always available to help. I would also like

to thank Professors Brett Presnell, Ronald Randles, Michael Daniels and Murali Rao for

serving on my committee. My special thanks to Professors Dalho Kim and Ming-Hui Chen

for helping me with computation.

I am indebted to my professors from my undergraduate days, whose teaching inspired

me to pursue higher studies in Statistics. I would especially like to thank Professor

Bhaskar Bose from my higher secondary days. He introduced me to Statistics and helped

me develop a basic understanding of the subject. My sincerest thanks to Professor Indranil

Mukherjee for his constant encouragement during my undergraduate days. The extent of

his knowledge and problem-solving capabilities have always inspired me. During my stay

at the University of Florida, I have been lucky to have marvelous teachers like Professors

Malay Ghosh, Brett Presnell, Andre Khuri and Ronald Randles. They have taught me a

lot and will always inspire me to be a good teacher and researcher.

I would like to thank my friends and classmates in the department, especially George and Mihai, for helping me whenever I needed it. Life would have been really different

in Gainesville if not for friends like Vivekananda, Siulidi and Bharati. I am grateful for

their support, guidance and, well, just for being there. I have been fortunate to have come

to know and grow close to people like Dolakakima, Bhramardi and little Munai. Their genuine affection and warmth, along with the love and support of my friends and, of course, 'Asha Parivar', have made Gainesville a second home to me.

I don’t think any of this would have been possible if not for the encouragement and

support of my family. My heartfelt thanks to my parents, who have always supported

me in all the choices I have made and encouraged me to move forward. I owe everything that I have achieved to my mother, who has always been my inspiration. Last, but not the least, I would like to thank my husband Yash for being such a positive force in my life.

Without him it would have been very difficult to survive the pressure and demand of the

last year.


TABLE OF CONTENTS

ACKNOWLEDGMENTS

LIST OF TABLES

ABSTRACT

CHAPTER

1 REVIEW OF SMALL AREA TECHNIQUES
  1.1 Introduction
  1.2 Overview of SAE Techniques
  1.3 Analysis of Discrete Data in SAE
  1.4 Robust Estimation in SAE
  1.5 Layout of this Dissertation

2 INFLUENCE FUNCTIONS AND ROBUST BAYES AND EMPIRICAL BAYES SMALL AREA ESTIMATION
  2.1 Introduction
  2.2 Influence Functions and Robust Bayes Estimation
  2.3 Robust Empirical Bayes Estimation and MSE Evaluation
  2.4 MSE Estimation
  2.5 Simulation Study

3 EMPIRICAL BAYES ESTIMATION FOR BIVARIATE BINARY DATA WITH APPLICATIONS TO SMALL AREA ESTIMATION
  3.1 Introduction
  3.2 Bayes and Empirical Bayes Estimators
  3.3 MSE Approximation
  3.4 Estimation of the MSE
  3.5 Simulation Study

4 HIERARCHICAL BAYES ESTIMATION FOR BIVARIATE BINARY DATA WITH APPLICATIONS TO SMALL AREA ESTIMATION
  4.1 Introduction
  4.2 Hierarchical Bayes Method
  4.3 Data Analysis

5 SUMMARY AND FUTURE RESEARCH

APPENDIX

A PROOF OF THEOREM 2-3

B PROOFS OF THEOREMS 4-1 AND 4-2

REFERENCES

BIOGRAPHICAL SKETCH


LIST OF TABLES

2-1 Relative biases of the point estimates and relative bias of MSE estimates

3-1 Relative biases of the point estimates and relative bias of MSE estimates for the small area mean: the first variable

3-2 Relative biases of the point estimates and relative bias of MSE estimates for the small area mean: the second variable

4-1 Bivariate HB estimates along with the direct estimates and the true values

4-2 Measures of precision

4-3 Posterior variance, correlation and 95% HPDs along with predictive p-values


Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

EMPIRICAL AND HIERARCHICAL BAYESIAN METHODS WITH APPLICATIONS TO SMALL AREA ESTIMATION

By

Ananya Roy

August 2007

Chair: Malay Ghosh
Major: Statistics

This dissertation focuses on the formulation of empirical and hierarchical Bayesian techniques in the context of small area estimation.

In the first part of the dissertation, we consider robust Bayes and empirical Bayes (EB) procedures for the estimation of small area means. We have introduced the notion of

influence functions in the small area context, and have developed robust Bayes and EB

estimators based on such influence functions. We have derived an expression for the

predictive influence function, and based on a standardized version of the same, we have

proposed some new small area estimators. The mean squared errors and estimated mean

squared errors of these estimators have also been found. The findings are validated by a

simulation study.

In the second part, we have considered small area estimation for bivariate binary data

in the presence of covariates. We have developed EB estimators along with their mean

squared errors. In this case the covariates were assumed to be completely observed. In

the presence of missing covariates, we have developed a hierarchical Bayes approach for

bivariate binary responses. Under the assumption that the missingness mechanism is missing

at random (MAR), we have suggested methods to estimate the small area means along

with the associated standard errors. Our findings have been supported by appropriate

data analyses.


CHAPTER 1
REVIEW OF SMALL AREA TECHNIQUES

1.1 Introduction

The terms “small area” and “local area” are commonly used to denote a small

geographical area, but they may also be used to describe a “small domain”, i.e., a small

subpopulation such as a specific age-sex-race group of people within a large geographical

area. Small area estimation (SAE) is gaining increasing prominence in survey methodology

due to a need for such statistics in both the public and private sectors. The reason behind

its success is that the same survey data, originally targeted towards a higher level of geography (e.g., states), also needs to be used for producing estimates at a lower level of aggregation (e.g., counties, subcounties or census tracts). In such cases, the direct

estimates, which are usually based on area-specific sample data, are often unavailable

(e.g., due to zero sample size) and almost always unreliable due to large standard errors

and coefficients of variation arising from the scarcity of samples in individual areas. SAE

techniques provide alternative tools for inference to improve upon the direct estimators.

The use of small area statistics goes back to 11th century England and 17th century

Canada (Brackstone, 1987). However, these statistics were all based either on a census or

on administrative records aiming at complete enumeration.

The problem of SAE is twofold. First is the fundamental issue of producing reliable

estimates of parameters of interest (e.g., means, counts, quantiles etc.) for small domains.

The second problem is to assess the estimation error. In the absence of sufficient samples

from the small areas, the only option is to borrow strength from the neighboring areas, i.e., those with similar characteristics. In Section 1.2, we outline the advances in SAE over the past few decades and some of the works significant to the enhancement of the techniques in this field.


1.2 Overview of SAE Techniques

As mentioned earlier, one of the fundamental problems in SAE is to find reliable

estimators of the characteristics of interest for a small area. Some of the newer methods

of estimation are the empirical Bayes (EB), hierarchical Bayes (HB) and empirical best linear unbiased prediction (EBLUP) procedures, which have made a significant impact on small area estimation. In this context, the work of Fay and Herriot (1979) is notable.

Fay and Herriot (1979) suggested the use of the James-Stein Estimator in SAE. Their

objective was to estimate the per capita income (PCI) based on the US Population Census

1970 for several small places, many of which had fewer than 500 people. Earlier, in the

Bureau of the Census, the general formula for estimation of PCI in the current year was

given by product of the PCI values of the previous year and the ratio of administrative

estimate of PCI for the place in 1970 to the derived estimate for PCI in 1969. But this

method had many drawbacks resulting from ignoring the presence of some auxiliary

variables and also due to small coverage for small places, thereby increasing the standard

errors of the direct estimates. The Fay and Herriot model is as follows:

Y_i | θ_i ~ N(θ_i, V_i) independently, and θ_i ~ N(x_i^T b, A) independently, i = 1, ..., m,   (1–1)

where the V_i's and the auxiliary variables x_i's are all known but A is unknown. In this case, let

Y_i^* = x_i^T (X^T Σ^{-1} X)^{-1} X^T Σ^{-1} Y,   (1–2)

be the regression estimator of θ_i, where Σ = Diag(V_1 + A, ..., V_m + A). Then the suggested estimator for the ith small area was given by

θ̂_i = [A^*/(A^* + V_i)] Y_i + [V_i/(A^* + V_i)] Y_i^*.   (1–3)

Here A^* is the estimate of the unknown variance A determined from the equation Σ_i (Y_i − Y_i^*)²/(A + V_i) = m − p, where p is the number of auxiliary variables in the regression model and m > p. A^* was found by an iterative procedure with the constraint A^* ≥ 0.


The idea behind proposing this estimator followed from the James-Stein Estimator

which has the property of dominating the sample mean with respect to squared error

loss for m ≥ 3. Under the model (1–1) with Vi = V ∀i and A known, the James-Stein

estimator is given by

θ̂_i = (1 − (m − p − 2)V/S) Y_i + ((m − p − 2)V/S) Y_i^*,   (1–4)

where

S = Σ_i (Y_i − Y_i^*)².   (1–5)

We observe here that (1–3) is just the EB estimator under model (1–1). Following the argument of Efron and Morris (1971, 1972), who remarked that EB estimators such as (1–4) may perform well overall but poorly on individual components in case of departure from the assumed prior, Fay and Herriot proposed the final EB estimators as:

θ̂'_i = θ̂_i            if Y_i − √V_i ≤ θ̂_i ≤ Y_i + √V_i,
      = Y_i − √V_i      if θ̂_i < Y_i − √V_i,
      = Y_i + √V_i      if θ̂_i > Y_i + √V_i.
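As an illustration, the following is a minimal Python sketch (our own code; the function name and the fixed-point update for A^* are ours, not from the dissertation) of the Fay-Herriot procedure (1–2)-(1–3) together with the truncation rule above. Any scalar root-finder could replace the rescaling step.

    import numpy as np

    def fay_herriot_eb(y, X, V, n_iter=100, tol=1e-8):
        """EB estimates under the Fay-Herriot model (1-1)-(1-3).

        y : (m,) direct estimates; X : (m, p) covariates; V : (m,) known
        sampling variances.  Returns (theta_hat, A_star)."""
        m, p = X.shape
        A = max(np.var(y) - V.mean(), 0.0)       # crude starting value
        for _ in range(n_iter):
            w = 1.0 / (V + A)
            # GLS regression coefficients and fitted values Y_i^* of (1-2)
            b = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
            resid2 = (y - X @ b) ** 2
            # moment equation sum resid^2/(A+V_i) = m - p, via fixed-point rescaling
            A_new = max(A * np.sum(resid2 * w) / (m - p), 0.0)
            if abs(A_new - A) < tol:
                A = A_new
                break
            A = A_new
        y_star = X @ b
        B = V / (A + V)
        theta = (1 - B) * y + B * y_star         # composite estimator (1-3)
        # limited-translation truncation: keep theta within y_i +/- sqrt(V_i)
        theta = np.clip(theta, y - np.sqrt(V), y + np.sqrt(V))
        return theta, A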

There are other approaches regarding estimation of the unknown A. Prasad and Rao (1990) proposed the use of unweighted least squares and the method of moments to estimate the regression coefficients and the variance A, respectively. Datta and Lahiri (2000) suggested the use of maximum likelihood and residual maximum likelihood to estimate the same.

The Fay-Herriot estimator is an example of a composite estimator. A composite

estimator is a weighted average of the direct survey estimator and some regression

estimator, in the presence of auxiliary information. These are called model-based

estimators (e.g., Fay-Herriot estimator). Otherwise, it is a weighted average of the

overall mean and the sample mean for that small area. These are called the design-based

estimators. There have been many proposed approaches regarding the choice of optimal


weights in the latter case. Some of the works include Schaible (1978) and Drew, Singh,

and Choudhry (1982).

Another kind of estimation in vogue in SAE is synthetic estimation. Brackstone

(1987) described synthetic estimates as follows: “An unbiased estimate is obtained from a

sample survey for a large area; when this estimate is used to derive estimates for subareas

under the assumption that the small areas have the same characteristics as the large

area, we identify these estimates as synthetic estimates”. Ghosh and Rao (1994) provided

model-based justifications for some of the synthetic and composite estimators in common

use.

One of the simplest models in common use is the nested error unit level regression

model, employed originally by Battese et al. (1988) for predicting areas under corn

and soybeans in 12 counties in Iowa, USA. They used a variance components model as

suggested by Harter and Fuller (1987) in the context of SAE. They used the model

y_ij = x_ij^T b + v_i + e_ij,  j = 1, ..., n_i,  i = 1, ..., T,   (1–6)

where the v_i and e_ij are mutually independent error terms with zero means and variances σ_v² and σ_e², respectively. Here n_i samples are drawn from an area with N_i observations. The random term v_i represents the joint effect of area characteristics that are not accounted for by the concomitant variables x_ij. Under the model, the small area mean, denoted by θ_i, is

θ_i ≡ x̄_i(p)^T b + v_i,   (1–7)

where x̄_i(p) = N_i^{-1} Σ_{j=1}^{N_i} x_ij. Let u_ij = v_i + e_ij, u_i = (u_i1, ..., u_in_i)^T, u = (u_1^T, ..., u_T^T)^T and E(u u^T) ≡ V = block diag(V_1, V_2, ..., V_T), where

V_i = σ_v² J_i + σ_e² I_i,   (1–8)


with J_i the square matrix of order n_i with every element equal to 1, and I_i the identity matrix of order n_i. Then the generalized least-squares estimator of b is given by

b̂ = (X^T V^{-1} X)^{-1} X^T V^{-1} Y.   (1–9)

Then the best linear unbiased predictor (BLUP) for the ith small area mean, when (σ_v², σ_e²) is known, is given by

θ̂_i = x̄_i(p)^T b̂ + (ȳ_i· − x̄_i·^T b̂) g_i,   (1–10)

where x̄_i· = n_i^{-1} Σ_{j=1}^{n_i} x_ij and g_i = σ_v²/(σ_v² + n_i^{-1} σ_e²).
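As an illustration, here is a minimal sketch (names ours, not from the dissertation) of the BLUP (1–10) for one area, given the variance components and a GLS estimate b̂:

    import numpy as np

    def blup_area_mean(b_hat, xbar_pop, xbar_samp, ybar_samp, n_i, sv2, se2):
        """BLUP (1-10) of one small area mean under the nested error model (1-6).

        b_hat     : (p,) GLS estimate of b from (1-9)
        xbar_pop  : (p,) population covariate mean, x-bar_{i(p)}
        xbar_samp : (p,) sample covariate mean, x-bar_i
        ybar_samp : sample mean of the responses in area i
        n_i       : sample size; sv2, se2 : variance components"""
        g_i = sv2 / (sv2 + se2 / n_i)             # shrinkage factor g_i
        return xbar_pop @ b_hat + g_i * (ybar_samp - xbar_samp @ b_hat)

The factor g_i shrinks the area-level regression residual toward zero as the between-area variance σ_v² shrinks or the area sample size n_i decreases.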

In practice, however, σ_v² and σ_e² are seldom known. A common procedure is to replace them in the BLUP formula by standard variance components estimates like maximum likelihood estimators (MLE), restricted MLE (REML) or analysis of variance (ANOVA) estimators. The resulting predictors are known as the empirical BLUP (EBLUP) or EB predictors. Prasad and Rao (1990), among others, provide details. An alternative method is to find hierarchical Bayes (HB) predictors by specifying prior distributions for b, σ_v² and σ_e², and computing the posterior distribution f(θ_i | y, X) given all the observations in all the areas (Datta and Ghosh, 1991). Markov chain Monte Carlo (MCMC) techniques are used for simulations from the posterior.

As pointed out earlier, another important aspect of SAE, apart from the estimation

of parameters, is the calculation of the error associated with estimation, known as the mean

squared error (MSE). To this end, we discuss here the work of Prasad and Rao (1990).

Prasad and Rao pointed out that the models of Fay and Herriot (1979) and Battese et al.

(1988) are all special cases of the general mixed linear model

y = Xb + Zv + e (1–11)

where y is the vector of sample observations, X and Z are known matrices, and v

and e are distributed independently with means 0 and covariance matrices G and R,

respectively, depending on some parameters λ, the variance components. Following


Henderson (1975), they proposed the best linear unbiased estimator of µ = l^T b + m^T v as

t(λ, y) = l^T β + m^T G Z^T V^{-1} (y − Xβ),   (1–12)

where V = Cov(y) = R + Z G Z^T and β is the GLS estimator of b. They proposed estimators of the variance components using Henderson's methods. Prasad and Rao also proposed an estimator for the MSE of the estimators. They extended the work of Kackar and Harville (1984), and approximated the true prediction MSE of the EBLUP under normality of the error terms. They showed

E[t(λ̂, y) − t(λ, y)]² ≈ tr[(∇b)(∇b^T) E{(λ̂ − λ)(λ̂ − λ)^T}],   (1–13)

where ∇b^T = col_{1≤j≤p}(∂b^T/∂λ_j) and b^T = m^T G Z^T V^{-1}. Here λ̂ is Henderson's estimator of λ, with each component truncated at zero, i.e., λ̂_i is replaced by max(λ̂_i, 0). Hence,

MSE[t(λ̂, y)] ≈ MSE[t(λ, y)] + tr[(∇b)(∇b^T) E{(λ̂ − λ)(λ̂ − λ)^T}].   (1–14)

They provided general conditions under which the precise order of the neglected terms

in the approximation is o(t^{-1}) for large t, where t denotes the number of small areas. These conditions are satisfied by both

Fay-Herriot and the nested error models. For the estimation of the MSE, the models

were considered individually as it is difficult to find general conditions. For example, the

MSE approximation for the Fay and Herriot (1979) model is given as follows:

E[θ̂_i^{EB} − θ_i]² = V_i(1 − B_i) + B_i² x_i^T (X^T D^{-1} X)^{-1} x_i + E[θ̂_i^{EB}(Â) − θ̂_i^{EB}(A)]².   (1–15)

Prasad and Rao estimated V_i(1 − B_i) by V_i(1 − B̂_i), and B_i² x_i^T (X^T D^{-1} X)^{-1} x_i by B̂_i² x_i^T (X^T D̂^{-1} X)^{-1} x_i, where D = diag(V_1 + A, ..., V_m + A). They approximated E[θ̂_i^{EB}(Â) − θ̂_i^{EB}(A)]² by 2 B_i² (V_i + A)^{-1} Var(Â), while, under normality of the error terms, Var(Â) was approximated as Var(Â) = 2m^{-1}[A² + 2A Σ V_i/m + Σ V_i²/m], correct up to the second order. Hence, the estimated MSE up to the second order is

V_i(1 − B̂_i) + B̂_i² x_i^T (X^T D̂^{-1} X)^{-1} x_i + 2 B̂_i² (V_i + Â)^{-1} Var(Â),   (1–16)

where now Var(Â) = 2m^{-1}[Â² + 2Â Σ V_i/m + Σ V_i²/m]. Lahiri and Rao (1995) showed that the estimator of the MSE given by (1–16) is robust to departures from normality of the random area effects v_i, though not of the sampling errors e_i.
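The pieces of (1–16) assemble directly; the following is a minimal sketch (our own naming, not the authors' code) that computes the Prasad-Rao second-order MSE estimate for each area, given Â:

    import numpy as np

    def prasad_rao_mse(X, V, A_hat):
        """Second-order MSE estimate (1-16) for the Fay-Herriot model.

        X : (m, p) covariates; V : (m,) sampling variances; A_hat : estimate of A.
        Returns an (m,) vector of estimated MSEs."""
        m = len(V)
        d = V + A_hat                                   # diagonal of D-hat
        B = V / d                                       # B-hat_i
        XtDinvX_inv = np.linalg.inv(X.T @ (X / d[:, None]))
        g1 = V * (1 - B)
        g2 = B**2 * np.einsum('ij,jk,ik->i', X, XtDinvX_inv, X)  # x_i' M x_i
        var_A = 2/m * (A_hat**2 + 2*A_hat*V.mean() + (V**2).mean())
        g3 = 2 * B**2 * var_A / d
        return g1 + g2 + g3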

Datta and Lahiri (2000) extended the results of Prasad and Rao to general linear

mixed models of the form,

Y_i = X_i b + Z_i v_i + e_i,  i = 1, ..., m,   (1–17)

where Y_i is the vector of sample observations of order n_i, X_i and Z_i are known matrices, and v_i and e_i are distributed independently with means 0 and covariance matrices G_i and R_i, respectively, depending on some parameters λ, the variance components. In

their paper, Datta and Lahiri first estimated the variances of the BLUP for the case of

λ known. They also developed MSE estimators correct up to the second order, when λ

is estimated by the ML or REML estimator. They also showed that the MSE approximations for

the ML or REML methods are exactly the same in the second order asymptotic sense.

However, the second order accurate estimator of the MSE based on the latter method

requires less bias correction than the one based on the former method. In the same

thread, Datta et al. (2005) derived a second-order approximation to the MSE of small area estimators based on the Fay-Herriot model as given by (1–1), when the model variance is estimated by the Fay-Herriot iterative method based on the weighted residual sum of

squares. They also derived a noninformative prior on the model for which the posterior

variance of a small area mean is second-order unbiased for the MSE.

Many other models have been proposed in the context of SAE. Cressie (1990) and

Ghosh, Natarajan, Stroud, and Carlin (1998) emphasize the use of linear models and

generalized linear models, respectively. Erickson and Kadane (1987) carried out a sensitivity analysis for SAE. Pfeffermann and Burck (1990), Rao and Yu (1992) and Ghosh et al. (1996) studied the extent of possible use of time series and cross-sectional models in SAE.


1.3 Analysis of Discrete Data in SAE

The models discussed in the earlier section have mainly been for continuous data. We devote this section to discussing some models for discrete measurements. Some of the recent work in SAE has focused on the analysis of categorical data. McGibbon and Tomberlin (1989) considered the model where y_ij ~ Bin(1, p_ij) and

logit(p_ij) = x_ij^T b + u_i;  u_i ~ N(0, σ_u²).   (1–18)

Here the y_ij are conditionally independent, and the u_i denote independent random effects. The authors assume a diffuse prior for b, with σ_u² assumed to be known. The joint posterior distribution of b and {u_i, i = 1, ..., m} is approximated by the multivariate normal distribution with mean equal to the true posterior mode and covariance matrix equal to the inverse information matrix evaluated at the mode. The estimator of p_ij is given by

p̂_ij = [1 + exp{−(x_ij^T b̂ + û_i)}]^{-1},   (1–19)

where b̂ and û_i are the respective modes of the posterior. Then the area proportions are estimated by p̂_i = Σ_{j=1}^{N_i} p̂_ij / N_i for small sampling fractions n_i/N_i. For σ_u² unknown, the MLE of σ_u² is plugged in, yielding an EB estimator. A Taylor expansion is used to obtain a naive estimator of the variance of the EB estimator. But the drawback of this estimator is that it ignores the variance component resulting from the estimation of σ_u². Farrell, McGibbon, and Tomberlin (1997) showed that the variance estimator could be improved upon by the use of a parametric bootstrap procedure.

Some other papers relating to the analysis of binary outcomes are by Jiang and Lahiri

(2001) where they propose a method to find the Best Predictor (BP) of logit(pij) and

use the method of moments to obtain sample estimates of b and ui as given in Eq. 1–18.

Malec et al. (1997) also consider a logistic linear mixed model for Bernoulli outcomes and

use an HB approach to derive model-dependent predictors.


There has been considerable work done in the context of disease mapping. A simple model assumes that the observed small area disease counts satisfy y_i | θ_i ~ Poisson(n_i θ_i) independently, with θ_i ~ Gamma(a, b) i.i.d., where θ_i is the true incidence rate and n_i is the number exposed in the ith small area. Maiti (1998) assumed a log-normal distribution for the θ_i. He also assumed a spatial dependence model for the log-transformed incidence rates. Some other works in this area include papers by Ghosh et al. (1999), Lahiri and Maiti (2002) and Nandram et al. (1999).

Ghosh et al. (1998) considered hierarchical Bayes (HB) generalized linear models (GLMs) for a unified analysis of both discrete and continuous data. The responses y_ij are assumed to be conditionally independent with pdf

f(y_ij | θ_ij) = exp[φ_ij^{-1}{y_ij θ_ij − ψ(θ_ij)} + c(y_ij, φ_ij)],   (1–20)

(j = 1, ..., n_i, i = 1, ..., m), where the φ_ij (> 0) are known. The canonical parameters θ_ij are modelled as h(θ_ij) = x_ij^T b + u_i + e_ij, where the link function h(·) is a strictly increasing function. Ghosh et al. used the hierarchical model in the following way:

y_ij | θ_ij, b, u_i, σ_u², σ_e² ~ f  independently;
h(θ_ij) | b, u_i, σ_u², σ_e² ~ N(x_ij^T b + u_i, σ_e²)  independently;
u_i | b, σ_u², σ_e² ~ N(0, σ_u²)  independently;   (1–21)
b ~ Uniform(R^k);  (σ_u²)^{-1} ~ Gamma(a, b);  (σ_e²)^{-1} ~ Gamma(c, d),

where b, σ_u², σ_e² are mutually independent and (a, b, c, d) are fixed parameter values. MCMC methods were implemented to evaluate the posterior distribution of the θ_ij or any functions of them.

Ghosh and Maiti (2004) provided small-area estimates of proportions based on

natural exponential family models with quadratic variance function (QVF) involving

survey weights. In the paper, they considered area level covariates, i.e., the same

covariates for all the units in the same area. They introduced the natural exponential

family quadratic variance function model, which has the same set up as Eq. 1–20 with

θ_ij = θ_i for all j in the ith area; E(y_ij | θ_i) = ψ'(θ_i) = µ_i (say) and Var(y_ij | θ_i) = ψ''(θ_i)/φ_i = V(µ_i)/φ_i (say), i = 1, ..., m. Here, for the quadratic variance function structure, V(µ_i) = v_0 + v_1 µ_i + v_2 µ_i², with v_0, v_1, v_2 never simultaneously zero. For example, for the binomial distribution, v_0 = 0, v_1 = 1 and v_2 = −1, and for the Poisson distribution, v_0 = v_2 = 0 and v_1 = 1. A conjugate prior was proposed for the canonical parameters in the model,

π(θ_i) = exp[λ{m_i θ_i − ψ(θ_i)}] C(λ, m_i),   (1–22)

where λ is the overdispersion parameter and m_i = g(x_i^T b). Using results from Morris (1983), E(µ_i) = m_i and Var(µ_i) = V(m_i)/(λ − v_2), where λ > max(0, v_2), Ghosh and Maiti provide pseudo-BLUPs of the small area means, conditional on the weighted means ȳ_iw = Σ_{j=1}^{n_i} w_ij y_ij, where the w_ij (> 0) are known for all j and Σ_{j=1}^{n_i} w_ij = 1. They eventually arrived at the pseudo-EBLUPs by simultaneously estimating b and λ, using the method of unbiased estimating functions following Godambe and Thompson (1989). They also provided approximate formulae for the MSEs and their approximate estimators.

1.4 Robust Estimation in SAE

There has been some research in the past on controlling influential observations in

surveys, though not much in the context of SAE. Among others, we refer to Chambers

(1986) and Gwet and Rivest (1992), who used robust estimation techniques specifically in

the context of ratio estimation. Chambers (1986) mainly considered the robust estimation

of a finite population total when the sampled data contain 'representative' outliers such

that there is no good reason to assume that the nonsampled part of the target population

is free of outliers. Chambers assumed a superpopulation model ξ0, under which the

random variables r_I = (y_I − β x_I) σ_I^{-1} are independent and identically distributed with zero mean and unit variance. Here the x_I are known real values associated with the population elements y_I, σ_I² = σ² v(x_I), β is unknown and σ² is also assumed to be unknown. Under ξ_0,


the best linear unbiased estimator of the population total T is given by

T_LS = T_1 + β̂_LS Σ_2 x_I,

where T_1 is the total of the n sampled units, Σ_2 x_I denotes the sum of the x_I corresponding to the unsampled population values, and β̂_LS = {Σ_1 y_I x_I / v(x_I)} × {Σ_1 x_I² / v(x_I)}^{-1} is the standard least squares estimator. To make T_LS robust to outliers,

Chambers (1986) proposed the estimator

T̂_n = T_1 + b_n Σ_2 x_I + Σ_1 u_I ψ[(y_I − b_n x_I) σ_I^{-1}],

where u_I = x_I σ_I^{-1} (Σ_2 x_J)(Σ_2 x_J² σ_J^{-2})^{-1}. Here b_n is some outlier-robust estimator of β

and ψ is a real-valued function. In his paper, Chambers discussed the different choices

of ψ, and primarily discussed the bias and variance robustness of T̂_n under a gross error model. Based on the asymptotic theory given in the paper, it appears that the variance robustness of T̂_n can only be achieved by choosing ψ such that the estimator is no longer bias robust.
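A minimal sketch of this robust total, with a Huber-type ψ and the weights u_I taken exactly as in the display above (our reconstruction of the notation; the robust slope b_n is supplied by the user and is a placeholder here):

    import numpy as np

    def chambers_robust_total(y_s, x_s, x_ns, v, sigma2, b_n, K=1.5):
        """Outlier-robust estimator of a population total, after Chambers (1986).

        y_s, x_s : sampled responses and covariates; x_ns : nonsampled covariates;
        v : variance function, with sigma_I^2 = sigma2 * v(x_I); b_n : a robust
        slope estimate supplied by the user; K : Huber tuning constant."""
        psi = lambda t: np.sign(t) * np.minimum(K, np.abs(t))   # Huber psi
        sig_s = np.sqrt(sigma2 * v(x_s))
        sig_ns = np.sqrt(sigma2 * v(x_ns))
        T1 = y_s.sum()
        u = (x_s / sig_s) * x_ns.sum() / np.sum(x_ns**2 / sig_ns**2)
        return T1 + b_n * x_ns.sum() + np.sum(u * psi((y_s - b_n * x_s) / sig_s))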

Zaslavsky et al. (2001) used robust estimation techniques to downweight the effect

of clusters, with an application to the 1990 Post Enumeration Survey. In the context of SAE,

Datta and Lahiri (1995) developed a robust HB method which was particularly well-suited

when there were one or more outliers in the dataset. They used a class of scale mixtures of

normal distributions on the random effects in the basic area model. A Cauchy distribution

was used for the outliers.

We have tried to outline some of the major works significant in the advances of SAE.

Review papers on SAE include Rao (1986), Chaudhuri (1994), Ghosh and Rao (1994), Rao

(1999) and Pfeffermann (2002). There is a book on SAE by Rao (2003).


1.5 Layout of this Dissertation

The first part of this dissertation deals with robust Bayes and empirical Bayes

methods for SAE. In the second part, we propose EB and HB methods in SAE when the

responses are bivariate binary.

In Chapter 2, we have considered the basic area level model and proposed some

new robust small area estimation procedures based on standardized residuals. The

approach consists of first finding influence functions corresponding to each individual area

level observation by measuring the divergence between the two posteriors of regression

coefficients with and without that observation. Next, based on these influence functions,

properly standardized, we have proposed some new robust Bayes and empirical Bayes

(EB) small area estimators. The approximated MSE’s and estimated MSE’s of these

estimators have also been found.

In Chapter 3, we have addressed small area problems when the response is bivariate

binary in the presence of covariates. We have proposed EB estimators of the small area

means by assigning a conjugate prior to the parameters after suitable transformation and

subsequent modelling on the covariates. We have also provided second-order approximate

expressions for the MSE’s and estimated MSE’s of these estimators.

In Chapter 4, we have proposed HB estimators in the bivariate binary setup, where

some of the covariates may be partially missing. We implemented hierarchical Bayesian

models which accommodate this missingness. Estimators of small area means along with the

associated posterior standard errors, and posterior correlations have also been provided.


CHAPTER 2
INFLUENCE FUNCTIONS AND ROBUST BAYES AND EMPIRICAL BAYES SMALL AREA ESTIMATION

2.1 Introduction

As noted in Chapter 1, empirical and hierarchical Bayesian methods are widely used

for small area estimation. One of the key features of such estimation procedures is that

the direct survey estimator for a small area is typically shrunk towards some synthetic or

regression estimator, thereby borrowing strength from similar neighboring areas.

Shrinkage estimators of the above type are quite reasonable if the assumed prior is approximately true. There may be over- or under-shrinking towards the regression estimator if the assumed prior is not true. Indeed, a standard criticism against any subjective Bayesian method is that it can perform very poorly in the event of a significant departure from the assumed prior (see, e.g., Efron and Morris (1971), Fay and Herriot (1979)).

Efron and Morris (1971, 1972) remarked that Bayes and empirical Bayes estimators might perform well overall but poorly on individual components. For example, suppose we assume the model where

y_i | θ_i ~ N(θ_i, 1) independently, and θ_i ~ N(µ, A) i.i.d., i = 1, ..., m.   (2–1)

But suppose the true prior on the θ_i is actually a mixture of several distributions, one of which is N(µ_1, A_1), with A_1 < A. In that case, Efron and Morris (1971) have shown that the Bayes risk for a component which belongs to this subpopulation can be very high. Hence

the shrinkage estimators which may be good for most of the components can be singularly

inappropriate for a particular component θi. In general, if there is wide departure of the

true prior from the assumed prior, the Bayes risk of the Bayes estimator can be quite high.

Moreover, there are practical problems associated with over- and under-shrinking in small

area estimation problems, especially with widely varying sample sizes. If this happens, the

resulting large area aggregate estimates may differ widely from the direct estimates, which


for all practical purposes, are believed to be quite reliable. This was, for example, pointed

out in Fay and Herriot (1979).

Efron and Morris (1971,1972) proposed limited translation rule (LTR) estimators to

guard against problems of the above type in the context of compound decision problems.

The LTR estimators were obtained with a special choice of what these authors referred to

as “relevance function”. The main purpose of the LTR estimators was to “hedge Bayesian

bets with frequentist caution” (Efron and Morris, 1972).

In this chapter, we have introduced the notion of influence functions in the small area

context, and developed robust EB and HB estimators based on such influence functions.

These influence functions are akin to the relevance functions of Efron and Morris, but are

derived on the basis of a general divergence measure between two distributions. In this

way, we are assigning some theoretical underpinnings to the work of Efron and Morris as

well. Also, unlike the intercept model of Efron and Morris, our results are derived in the

more general regression context, a scenario more suitable for small area estimation.

The importance of influence functions in robust estimation is well-emphasized

in Hampel (1974), Huber (1981), Hampel et al. (1986) and many related papers.

The predominant idea is to detect influential observations in terms of their effects on

parameters, most typically the regression parameters. Johnson and Geisser (1982, 1983,

1985) took instead a predictive point of view for detection of influential observations. In

the process, they developed Bayesian predictive influence functions (PIFs) and applied

these in several univariate and multivariate prediction problems based on regression

models. These predictive influence functions are based on Kullback-Leibler divergence

measures.

Our proposed influence functions are motivated in a Bayesian way based on some

general divergence measures as introduced in Amari (1982) and Cressie and Read (1984).

Included as special cases are the well-known Kullback-Leibler and Hellinger divergence

measures.


The general divergence measure between two normal distributions is derived in Section 2.2. Also, in that section, we have applied these results to find influential observations for certain Bayesian small area estimation regression models. Based on these influence functions, we have proposed certain robust HB small area estimators. The Bayes risk of these estimators (also referred to as the MSE) is also derived. Robust EB estimators are proposed in Section 2.3, and second order correct expansions of their Bayes risks or MSEs are provided there. Estimators of the MSEs which are correct up to the second order are derived in Section 2.4. Section 2.5 contains a simulation study which investigates situations where the proposed EB estimators will perform better than the regular EB estimators.

2.2 Influence Functions and Robust Bayes Estimation

We consider a small area setting where

y_i | θ_i ~ N(θ_i, V_i) independently, V_i (> 0) known,
θ_i ~ N(x_i^T b, A) independently, i = 1, ..., m.   (2–2)

Here the x_i are p (< m)-component design vectors. We write X^T = (x_1, ..., x_m) and assume that rank(X) = p. Also, let y = (y_1, ..., y_m)^T and θ = (θ_1, ..., θ_m)^T. The above model is often referred to in the literature as the Fay-Herriot (1979) model, where the y_i are the direct survey estimators of the small area parameters θ_i, and the x_i are the associated covariates.

The Bayes estimator of θ_i when both b and A (> 0) are known is given by

θ̂_i^B = (1 − B_i) y_i + B_i x_i^T b,   (2–3)

where B_i = V_i/(A + V_i), i = 1, ..., m. We first begin with a simple EB or HB scenario where b is unknown but A is known. In this case, the HB estimator of θ_i with a uniform prior on b is given by

θ̂_i^{HB} = (1 − B_i) y_i + B_i x_i^T b̂,   (2–4)


where b̂ = (X^T Σ^{-1} X)^{-1} X^T Σ^{-1} y and Σ = Diag(V_1 + A, ..., V_m + A). The same estimator is obtained as an EB estimator of θ_i when one estimates b from the marginal independent N(x_i^T b, A + V_i) distributions of the y_i.

As discussed in the introduction, the above EB (or HB) estimator can often over- or under-shrink the direct estimator y_i towards the regression estimator x_i^T b̂. What we propose now is a robust Bayesian procedure which is intended to guard against such model misspecification.

To this end, we begin by finding the influence of (y_i, x_i), i = 1, ..., m, in the context of estimating the parameter b. The derivation of the influence function is based on a general divergence measure as introduced in Amari (1982) and Cressie and Read (1984). For two densities f_1 and f_2, this general divergence measure is given by

D_λ(f_1, f_2) = [1/(λ(λ + 1))] E_{f_1}[(f_1/f_2)^λ − 1].   (2–5)

The above divergence measure should be interpreted as its limiting value when λ → 0 or λ → −1. We may note that D_λ(f_1, f_2) is not necessarily symmetric in f_1 and f_2, but symmetry can always be achieved by considering (1/2)[D_λ(f_1, f_2) + D_λ(f_2, f_1)]. Also, it may be noted that as λ → 0, D_λ(f_1, f_2) → E_{f_1}[log(f_1/f_2)], while as λ → −1, D_λ(f_1, f_2) → E_{f_2}[log(f_2/f_1)]. These are the two Kullback-Leibler divergence measures. Also, D_{−1/2}(f_1, f_2) = 4(1 − ∫ √(f_1 f_2)) = 2H²(f_1, f_2), where H(f_1, f_2) = {2(1 − ∫ √(f_1 f_2))}^{1/2} is the Hellinger divergence measure.

In the present context, we consider the divergence between the posterior densities of b under the assumed model given in Eq. 2–2, with and without (y_i, x_i). To this end, we first prove a general divergence result involving


two normal distributions based on the general divergence measure. The result is probably

known, but we include a proof for the sake of completeness.

Theorem 2-1. Let f_1 and f_2 denote the N_p(µ_1, Σ_1) and N_p(µ_2, Σ_2) pdfs, respectively. Then

D_λ(f_1, f_2) = [1/(λ(λ + 1))] [ exp{ (λ(λ + 1)/2) (µ_1 − µ_2)^T ((1 + λ)Σ_2 − λΣ_1)^{-1} (µ_1 − µ_2) } × |Σ_1|^{−λ/2} |Σ_2|^{(λ+1)/2} |(1 + λ)Σ_2 − λΣ_1|^{−1/2} − 1 ].

Proof:

D_λ(f_1, f_2) = [1/(λ(λ + 1))] × E[ (|Σ_2|/|Σ_1|)^{λ/2} exp{ (λ/2)(X − µ_2)^T Σ_2^{-1} (X − µ_2) − (λ/2)(X − µ_1)^T Σ_1^{-1} (X − µ_1) } − 1 ],

where X ~ N(µ_1, Σ_1). Next we write

(X − µ_2)^T Σ_2^{-1} (X − µ_2) − (X − µ_1)^T Σ_1^{-1} (X − µ_1)
= (X − µ_1 + µ_1 − µ_2)^T Σ_2^{-1} (X − µ_1 + µ_1 − µ_2) − (X − µ_1)^T Σ_1^{-1} (X − µ_1)
= [Σ_1^{-1/2}(X − µ_1 + µ_1 − µ_2)]^T Σ_1^{1/2} Σ_2^{-1} Σ_1^{1/2} [Σ_1^{-1/2}(X − µ_1 + µ_1 − µ_2)] − [Σ_1^{-1/2}(X − µ_1)]^T [Σ_1^{-1/2}(X − µ_1)]
= (Z + φ)^T C (Z + φ) − Z^T Z,

where Z = Σ_1^{-1/2}(X − µ_1), φ = Σ_1^{-1/2}(µ_1 − µ_2) and C = Σ_1^{1/2} Σ_2^{-1} Σ_1^{1/2}. Noting that Z ~ N_p(0, I_p), we get

D_λ(f_1, f_2)
= [1/(λ(1 + λ))] × [ (|Σ_2|/|Σ_1|)^{λ/2} (2π)^{−p/2} ∫ exp{ −(1/2) Z^T ((1 + λ)I_p − λC) Z + λ φ^T C Z + (λ/2) φ^T C φ } dZ − 1 ]
= [1/(λ(1 + λ))] × [ (|Σ_2|/|Σ_1|)^{λ/2} |(1 + λ)I_p − λC|^{−1/2} exp{ (λ/2) φ^T (C + λ C ((1 + λ)I_p − λC)^{-1} C) φ } − 1 ]
= [1/(λ(1 + λ))] (|Σ_2|/|Σ_1|)^{λ/2} [ exp{ (λ/2) φ^T (C^{-1} − (λ/(1 + λ)) I_p)^{-1} φ } |(1 + λ)I_p − λC|^{−1/2} − 1 ],   (2–6)

where the final step of Eq. 2–6 uses the standard matrix inversion formula (Rao, 1973, p. 33), (A + BDB^T)^{-1} = A^{-1} − A^{-1} B (D^{-1} + B^T A^{-1} B)^{-1} B^T A^{-1}, with A = C^{-1}, B = I_p and D = −(λ/(1 + λ)) I_p. Again,

φ^T (C^{-1} − (λ/(1 + λ)) I_p)^{-1} φ
= (µ_1 − µ_2)^T Σ_1^{-1/2} [ Σ_1^{-1/2} Σ_2 Σ_1^{-1/2} − (λ/(1 + λ)) I_p ]^{-1} Σ_1^{-1/2} (µ_1 − µ_2)
= (µ_1 − µ_2)^T ( Σ_2 − (λ/(1 + λ)) Σ_1 )^{-1} (µ_1 − µ_2)
= (1 + λ)(µ_1 − µ_2)^T [ (1 + λ)Σ_2 − λΣ_1 ]^{-1} (µ_1 − µ_2).   (2–7)

Moreover,

|(1 + λ)I_p − λC|^{−1/2} = |(1 + λ) Σ_1^{1/2} Σ_1^{-1} Σ_1^{1/2} − λ Σ_1^{1/2} Σ_2^{-1} Σ_1^{1/2}|^{−1/2}
= |Σ_1|^{−1/2} |(1 + λ)Σ_1^{-1} − λΣ_2^{-1}|^{−1/2}
= |Σ_1|^{−1/2} |Σ_2^{-1} ((1 + λ)Σ_2 − λΣ_1) Σ_1^{-1}|^{−1/2}
= |Σ_2|^{1/2} |(1 + λ)Σ_2 − λΣ_1|^{−1/2}.

Combining this with Eq. 2–6 and Eq. 2–7 yields the theorem.


Remark 1. It is clear from the above theorem that if Σ_1 and Σ_2 are known, the divergence measure D_λ(f_1, f_2) is a one-to-one function of (µ_1 − µ_2)^T [(1 + λ)Σ_2 − λΣ_1]^{-1} (µ_1 − µ_2). In the special case when Σ_1 = Σ_2 = Σ, the above divergence measure is a one-to-one function of the Mahalanobis distance (µ_1 − µ_2)^T Σ^{-1} (µ_1 − µ_2).
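As a quick numerical sanity check on Theorem 2-1 (our own code, not part of the dissertation), the closed form can be compared against a Monte Carlo estimate of (2–5):

    import numpy as np
    from scipy.stats import multivariate_normal as mvn

    def d_lambda_normal(lam, mu1, S1, mu2, S2):
        """Closed-form divergence of Theorem 2-1 between N(mu1,S1) and N(mu2,S2)."""
        M = (1 + lam) * S2 - lam * S1        # must be positive definite
        d = mu1 - mu2
        quad = d @ np.linalg.solve(M, d)
        (_, ld1), (_, ld2) = np.linalg.slogdet(S1), np.linalg.slogdet(S2)
        (_, ldM) = np.linalg.slogdet(M)
        log_term = -0.5*lam*ld1 + 0.5*(lam + 1)*ld2 - 0.5*ldM
        return (np.exp(0.5*lam*(lam + 1)*quad + log_term) - 1) / (lam*(lam + 1))

    rng = np.random.default_rng(0)
    mu1, mu2 = np.zeros(2), np.array([1.0, 0.5])
    S1, S2 = np.eye(2), np.diag([2.0, 1.0])
    # Monte Carlo version of (2-5): E_{f1}[(f1/f2)^lam - 1] / (lam(lam+1))
    X = rng.multivariate_normal(mu1, S1, size=200_000)
    ratio = np.exp(mvn.logpdf(X, mu1, S1) - mvn.logpdf(X, mu2, S2))
    lam = 0.3
    mc = np.mean(ratio**lam - 1) / (lam*(lam + 1))
    print(d_lambda_normal(lam, mu1, S1, mu2, S2), mc)   # the two should nearly agree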

We now utilize Theorem 2-1 in deriving the influence of (y_i, x_i) in the given small area context. Note that

b | y, X ~ N[(X^T Σ^{-1} X)^{-1} X^T Σ^{-1} y, (X^T Σ^{-1} X)^{-1}]

and

b | y_{(−i)}, X_{(−i)} ~ N[(X_{(−i)}^T Σ_{(−i)}^{-1} X_{(−i)})^{-1} X_{(−i)}^T Σ_{(−i)}^{-1} y_{(−i)}, (X_{(−i)}^T Σ_{(−i)}^{-1} X_{(−i)})^{-1}],

where Σ_{(−i)} is the diagonal matrix obtained from Σ by removing the ith diagonal element (V_i + A). Now writing µ_1 = (X^T Σ^{-1} X)^{-1} X^T Σ^{-1} y, µ_2 = (X_{(−i)}^T Σ_{(−i)}^{-1} X_{(−i)})^{-1} X_{(−i)}^T Σ_{(−i)}^{-1} y_{(−i)}, Σ_1 = (X^T Σ^{-1} X)^{-1} and Σ_2 = (X_{(−i)}^T Σ_{(−i)}^{-1} X_{(−i)})^{-1}, one gets

µ_1 − µ_2 = (X^T Σ^{-1} X)^{-1} X^T Σ^{-1} y − (X^T Σ^{-1} X − (V_i + A)^{-1} x_i x_i^T)^{-1} (X^T Σ^{-1} y − (V_i + A)^{-1} x_i y_i).   (2–8)

By a standard matrix inversion formula (Rao, 1973, p. 33),

(X^T Σ^{-1} X − (V_i + A)^{-1} x_i x_i^T)^{-1} = (X^T Σ^{-1} X)^{-1} + [(X^T Σ^{-1} X)^{-1} (V_i + A)^{-1} x_i x_i^T (X^T Σ^{-1} X)^{-1}] / [1 − (V_i + A)^{-1} x_i^T (X^T Σ^{-1} X)^{-1} x_i].   (2–9)

From Eq. 2–8 and Eq. 2–9, on simplification,

µ_1 − µ_2 = [(X^T Σ^{-1} X)^{-1} (V_i + A)^{-1} x_i (y_i − x_i^T b̂)] / [1 − (V_i + A)^{-1} x_i^T (X^T Σ^{-1} X)^{-1} x_i].   (2–10)


Also,

(1 + λ)Σ_2 − λΣ_1 = (X^T Σ^{-1} X)^{-1} + (1 + λ) [(X^T Σ^{-1} X)^{-1} (V_i + A)^{-1} x_i x_i^T (X^T Σ^{-1} X)^{-1}] / [1 − (V_i + A)^{-1} x_i^T (X^T Σ^{-1} X)^{-1} x_i].   (2–11)

It is easy to see from Eq. 2–10 and Eq. 2–11 that (µ_1 − µ_2)^T [(1 + λ)Σ_2 − λΣ_1]^{-1} (µ_1 − µ_2) is a quadratic form in y_i − x_i^T b̂. We propose restricting the amount of shrinking by controlling the residuals y_i − x_i^T b̂. However, in order to make these residuals scale-free, we standardize them as (y_i − x_i^T b̂)/D_i, where

D_i² = V(y_i − x_i^T b̂) = V_i + A − x_i^T (X^T Σ^{-1} X)^{-1} x_i,  i = 1, ..., m.

Based on these standardized residuals, we propose a robust HB estimator of θ_i as

θ̂_i^{RHB} = y_i − B_i D_i ψ((y_i − x_i^T b̂)/D_i),  i = 1, ..., m,   (2–12)

where ψ(t) = sgn(t) min(K, |t|).

Remark 2. For the special case of the intercept model as considered in Lindley and Smith (1972), with V_i = 1 and B_i = B = (1 + A)^{-1} for all i, the above robust estimator of θ_i reduces to

θ̂_i^{RHB} = y_i − B D ψ((y_i − ȳ)/D),  i = 1, ..., m,

where D² = (m − 1)/(mB).
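A minimal sketch (our own code, not the dissertation's) of the robust estimator (2–12), for given A and truncation constant K:

    import numpy as np

    def robust_hb(y, X, V, A, K):
        """Robust HB estimates (2-12) under model (2-2) with A known.

        y : (m,) direct estimates; X : (m, p); V : (m,) sampling variances;
        A : prior variance; K : truncation constant of psi."""
        d = V + A                                     # diagonal of Sigma
        XtSinvX_inv = np.linalg.inv(X.T @ (X / d[:, None]))
        b_hat = XtSinvX_inv @ (X.T @ (y / d))         # GLS estimate b-hat
        B = V / d
        D = np.sqrt(d - np.einsum('ij,jk,ik->i', X, XtSinvX_inv, X))
        t = (y - X @ b_hat) / D                       # standardized residuals
        psi = np.sign(t) * np.minimum(K, np.abs(t))   # psi(t) = sgn(t) min(K,|t|)
        return y - B * D * psi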

The selection of K is always an open question. Often this is dictated by the user's point of view. A somewhat more concrete recommendation can be given from the following theorem, which provides the Bayes risk of the robust Bayes estimator given in Eq. 2–12 under the model Eq. 2–2. Throughout this paper, let Φ denote the standard normal df and φ the standard normal pdf.

Theorem 2-2. Consider the model (2–2). Then

E(θ̂_i^{RHB} − θ_i)² = g_{1i}(A) + g_{2i}(A) + g_{3im}(A),

where g_{1i}(A) = V_i(1 − B_i), g_{2i}(A) = 2 B_i² D_i² {(1 + K²)Φ(−K) − Kφ(K)} and g_{3im}(A) = B_i² x_i^T (X^T Σ^{-1} X)^{-1} x_i, the expectation being taken over the joint distribution of y and θ.

Proof:

E(θ̂_i^{RHB} − θ_i)² = E(θ_i − θ̂_i^B + θ̂_i^B − θ̂_i^{RHB})²
= E(θ_i − θ̂_i^B)² + E(θ̂_i^B − θ̂_i^{RHB})²
= V_i(1 − B_i) + E(θ̂_i^B − θ̂_i^{RHB})².   (2–13)

But

θ̂_i^B − θ̂_i^{RHB} = y_i − B_i(y_i − x_i^T b) − y_i + B_i D_i ψ((y_i − x_i^T b̂)/D_i)
= −B_i [ x_i^T (b̂ − b) + D_i { (y_i − x_i^T b̂)/D_i − ψ((y_i − x_i^T b̂)/D_i) } ].   (2–14)

It is easy to check that E{x_i^T (b̂ − b)}² = x_i^T (X^T Σ^{-1} X)^{-1} x_i. Also, noting that (y_i − x_i^T b̂)/D_i ~ N(0, 1), and is thus distributed symmetrically about zero, one gets

E{ (y_i − x_i^T b̂)/D_i − ψ((y_i − x_i^T b̂)/D_i) }²
= E{ (Z − K)² I_{[Z>K]} + (Z + K)² I_{[Z<−K]} }
= 2 E{ (Z − K)² I_{[Z>K]} }
= 2{ (1 + K²)Φ(−K) − Kφ(K) },   (2–15)

where Z denotes a N(0, 1) variable. Now, by the independence of b̂ and y_i − x_i^T b̂ and the fact that E(b̂) = b, the result follows from Eq. 2–13-Eq. 2–15.

It may be noted that if the assumed model Eq. 2–2 is correct, then the Bayes risk of the regular EB estimator θ̂_i^{EB} is g_{1i}(A) + g_{3im}(A). Hence, the excess Bayes risk of θ̂_i^{REB} over θ̂_i^{EB} under the assumed model is g_{2i}(A), which is strictly decreasing in K and, as intuitively expected, tends to zero as K → ∞. Hence, setting a value, say M, for g_{2i}(A) will determine the value of K. Thus, the choice of K will be based on a tradeoff between protection against model failure and the excess Bayes risk that one is willing to tolerate when the assumed model is true.
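Concretely, given a tolerated excess risk M, one can solve g_{2i}(A) = M for K numerically, since g_{2i} is continuous and strictly decreasing in K. A minimal sketch under that setup (our own code; the function name is ours):

    import numpy as np
    from scipy.stats import norm
    from scipy.optimize import brentq

    def choose_K(B2D2, M, K_max=10.0):
        """Solve g_2i(A) = 2*B2D2*{(1+K^2)Phi(-K) - K*phi(K)} = M for K.

        B2D2 : the product B_i^2 * D_i^2; M : tolerated excess Bayes risk."""
        g2 = lambda K: 2*B2D2*((1 + K**2)*norm.cdf(-K) - K*norm.pdf(K)) - M
        if g2(0.0) <= 0:       # even K = 0 is within the tolerance
            return 0.0
        return brentq(g2, 0.0, K_max)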

We now proceed to the study of robust EB estimators in the next section.

2.3 Robust Empirical Bayes Estimation and MSE Evaluation

In the previous section, the variance component A (> 0) was assumed to be known. If, however, A is unknown, we estimate A from the marginal distributions of the y_i. We may recall that marginally y_i ~ N(x_i^T b, V_i + A) independently. There have been various proposals for estimating A, including the method of moments (Fay and Herriot, 1979; Prasad and Rao, 1990), maximum likelihood (ML) and restricted maximum likelihood (REML) (Datta and Lahiri, 2000; Datta et al., 2005). The analytical results that we are going to obtain are valid for all these choices, although our numerical results will be based on the ML estimate of A, which we denote by Â. Let Σ̂ = Diag(V_1 + Â, ..., V_m + Â), B̂_i = V_i/(Â + V_i), i = 1, ..., m, and b̂ = (X^T Σ̂^{-1} X)^{-1} X^T Σ̂^{-1} y. Then the robust EB estimators of the θ_i are given by

θ̂_i^{REB} = y_i − B̂_i D̂_i ψ((y_i − x_i^T b̂)/D̂_i),   (2–16)

where ψ is defined in the previous section after Eq. 2–12.

Our objective in this section is to find E(θ̂_i^{REB} − θ_i)² correct up to O(m^{-1}); i.e., we provide an asymptotic expansion of this expectation with an explicit coefficient of m^{-1}, while the remainder term, when multiplied by m, converges to zero, subject to certain assumptions. Here E denotes expectation over the joint distribution of y and θ. We need certain preliminary results before calculating the Bayes risks.

First we need the asymptotic distribution of Â. Based on the marginal distribution of the y_i, the likelihood for b and A is given by

L(b, A) = (2π)^{−m/2} ∏_{i=1}^m (V_i + A)^{−1/2} exp[ −(1/2) Σ_{i=1}^m (y_i − x_i^T b)²/(V_i + A) ].   (2–17)
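Since the numerical results use the ML estimate of A, a minimal sketch (our own code, under the assumption that a bounded scalar search over A suffices) of maximizing (2–17), with b profiled out by GLS at each trial value of A:

    import numpy as np
    from scipy.optimize import minimize_scalar

    def ml_A(y, X, V):
        """ML estimate of A from the marginal likelihood (2-17), b profiled out."""
        def neg_profile_loglik(A):
            d = V + A
            b = np.linalg.solve(X.T @ (X / d[:, None]), X.T @ (y / d))  # GLS b(A)
            r2 = (y - X @ b)**2
            return 0.5*np.sum(np.log(d)) + 0.5*np.sum(r2 / d)
        # the upper bound below is a heuristic, not from the dissertation
        res = minimize_scalar(neg_profile_loglik, bounds=(0.0, 100*V.max()),
                              method='bounded')
        return res.x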

31

Page 32: EMPIRICAL AND HIERARCHICAL BAYESIAN METHODS WITH

From Eq. 2–17, the Fisher information matrix for (b, A) is given by

I(b, A) = [ X^T Σ^{-1} X    0
            0^T             (1/2) Σ_{i=1}^m (V_i + A)^{-2} ].   (2–18)

Due to the asymptotic independence of b̂ and Â, Â is asymptotically N(A, 2(Σ_{i=1}^m (V_i + A)^{-2})^{-1}). We now have the following simple but important lemma.

Lemma 2-1. Assume 0 < V_* ≤ min_{1≤i≤m} V_i ≤ max_{1≤i≤m} V_i ≤ V^* < ∞. Then E|Â − A|^r = O_e(m^{−r/2}) and E[max_{1≤i≤m} |B̂_i − B_i|^r] = O(m^{−r/2}) for any arbitrary r > 0.

Here O_e refers to the exact order, which means that the expectation is bounded both below and above by terms such as C_1 m^{−r/2} and C_2 m^{−r/2} for large m. This is in contrast to the order O, which only requires the upper bound.

A rigorous proof of the lemma requires a uniform integrability argument which we omit. However, intuitively, E[|Â − A|^r {Σ_1^m (V_i + A)^{-2}}^{r/2}] = O_e(1). But m(V^* + A)^{-2} ≤ Σ_1^m (V_i + A)^{-2} ≤ m(V_* + A)^{-2}. Accordingly, Σ_1^m (V_i + A)^{-2} = O_e(m). This leads to E|Â − A|^r = O_e(m^{−r/2}).

Next we write |B̂_i − B_i| = V_i |Â − A| / [(V_i + Â)(V_i + A)] ≤ |Â − A| / (V_* + A) for all i = 1, ..., m, which leads to the desired conclusion. Also, we conclude from the lemma that

P(|Â − A| > ε) = O(m^{−r/2}),  P(max_{1≤i≤m} |B̂_i − B_i| > ε) = O(m^{−r/2}),   (2–19)

for any arbitrary ε > 0 and arbitrarily large r (> 0), which we require in the sequel. We may point out that the assumption regarding the boundedness of the V_i is required in any asymptotic Bayes risk (or MSE) calculation in this context, as evident, for example, from Prasad and Rao (1990).

We are now in a position to find an expression for E(θ̂_i^{REB} − θ_i)² which is second order correct, i.e., the bias is of the order o(m^{-1}). The following theorem is proved.

Theorem 2-3. Assume (i) 0 < V_* ≤ min_{1≤i≤m} V_i ≤ max_{1≤i≤m} V_i ≤ V^* < ∞, and (ii) max_{1≤i≤m} x_i^T (X^T X)^{-1} x_i = O(m^{-1}). Then

E(θ̂_i^{REB} − θ_i)² = g_{1i}(A) + g_{2i}(A) + g_{3im}(A) + g_{4im}(A) + o(m^{-1}),

where g_{1i}(A), g_{2i}(A) and g_{3im}(A) are defined in Theorem 2-2, and

g_{4im}(A) = B_i² D_i² (1 − B_i)² [Σ_{j=1}^m (1 − B_j)²]^{-1} [2(2K² − 1) − 4(K² − 1)Φ(K) − 5Kφ(K)].

The proof of this theorem is deferred to the appendix.

2.4 MSE Estimation

The present section is devoted to obtaining an estimator of the approximate MSE derived in the previous section which is correct up to O(m^{-1}). As noted already, under the assumptions of Theorem 2-3, g_{3im} and g_{4im} are both O(m^{-1}). Hence, substitution of A by Â in these expressions suffices to find terms which are O_p(m^{-1}). However, a similar substitution for g_{1i} or g_{2i} will ignore terms which are O_p(m^{-1}). Accordingly, to estimate g_{1i} or g_{2i} we need a further Taylor series expansion.

To this end, we first state the following lemmas.

Lemma 2-2. Assume the conditions of Theorem 2-3. Then

E(Â − A) = (tr Σ^{-2})^{-1} tr[(X^T Σ^{-1} X)^{-1}(X^T Σ^{-2} X)] + O(m^{-2}).

Proof. From Cox and Snell (1968), by assumption (i) of Theorem 2-3,

E(Â − A) = (1/2) I^{-1}(A) [K(A) + 2J(A)] + O(m^{-2}),   (2–20)


where I(A) = (1/2) Σ_{j=1}^m (V_j + A)^{-2} = (1/2) tr(Σ^{-2}), K(A) = tr(I^{-1} M_1) and J(A) = tr(I^{-1} M_2). Here M_1 is the (p + 1) × (p + 1) matrix whose (r, t)th entry is E[∂³ log L/∂b_r ∂b_t ∂A] for r, t ≤ p, whose last column (and row) has entries E[∂³ log L/∂b_r ∂A²], and whose (p + 1, p + 1)th entry is E[∂³ log L/∂A³]; under the model this evaluates to

M_1 = [ X^T Σ^{-2} X    0
        0^T             2 tr(Σ^{-3}) ].   (2–21)

Similarly, M_2 has (r, t)th entry E[(∂² log L/∂b_r ∂b_t)(∂ log L/∂A)] for r, t ≤ p, last column entries E[(∂ log L/∂b_r)(∂² log L/∂A²)], and (p + 1, p + 1)th entry E[(∂ log L/∂A)(∂² log L/∂A²)]; this gives

M_2 = [ 0_{p×p}    0_{p×1}
        0_{1×p}    −tr(Σ^{-3}) ].   (2–22)

Since

I^{-1} = [ (X^T Σ^{-1} X)^{-1}    0
           0^T                   2{tr(Σ^{-2})}^{-1} ],

we get

K(A) + 2J(A) = tr[(X^T Σ^{-1} X)^{-1}(X^T Σ^{-2} X)] + {2 tr(Σ^{-3}) − 2 tr(Σ^{-3})} · 2{tr(Σ^{-2})}^{-1}
             = tr[(X^T Σ^{-1} X)^{-1}(X^T Σ^{-2} X)].   (2–23)

The lemma now follows from Eq. 2–20 and Eq. 2–23.


Lemma 2-3. Assume condition (i) of Theorem 2-3. Then

E(B̂_i − B_i) = B_i(1 − B_i) {Σ_{j=1}^m (1 − B_j)²}^{-1} [ 2(1 − B_i) − tr{ (Σ_1^m (1 − B_j) x_j x_j^T)^{-1} (Σ_1^m (1 − B_j)² x_j x_j^T) } ] + o(m^{-1})
= g_{5im}(A) + o(m^{-1}), say.

Proof: By Taylor expansion and Lemma 2-1,

E(B̂_i − B_i) = −V_i (V_i + A)^{-2} E[ (Â − A) − (Â − A)²/(V_i + A) ] + o(m^{-1}).   (2–24)

Now by Lemma 2-2, and the fact that E(Â − A)² = 2(tr Σ^{-2})^{-1} + o(m^{-1}), we get

E(B̂_i − B_i)
= B_i (V_i + A)^{-1} (tr Σ^{-2})^{-1} [ 2(V_i + A)^{-1} − tr{(X^T Σ^{-1} X)^{-1}(X^T Σ^{-2} X)} ] + o(m^{-1})
= B_i (tr Σ^{-2})^{-1} [ 2(V_i + A)^{-2} − (V_i + A)^{-1} tr{(X^T Σ^{-1} X)^{-1}(X^T Σ^{-2} X)} ] + o(m^{-1})
= B_i [ 2(1 − B_i)² {Σ_{j=1}^m (1 − B_j)²}^{-1} − (1 − B_i) {Σ_{j=1}^m (1 − B_j)²}^{-1} tr{ (Σ_1^m (1 − B_j) x_j x_j^T)^{-1} (Σ_1^m (1 − B_j)² x_j x_j^T) } ] + o(m^{-1})
= B_i (1 − B_i) {Σ_{j=1}^m (1 − B_j)²}^{-1} [ 2(1 − B_i) − tr{ (Σ_1^m (1 − B_j) x_j x_j^T)^{-1} (Σ_1^m (1 − B_j)² x_j x_j^T) } ] + o(m^{-1}).

This proves the lemma.

Remark 3. It may be noted that the leading term in E(Â − A) is O(m^{-1}) and g_{5im} = O(m^{-1}).

The lemmas provide approximations to E(Â − A) and E(B̂_i − B_i), both correct up to O(m^{-1}). Thus, by Lemmas 2-2 and 2-3 we get the following theorem, which gives us the estimate of the MSE correct up to the second order.


Theorem 2-4. Assume conditions (i) and (ii) of Theorem 2-3. Then the estimate of E(θ̂_i^{REB} − θ_i)² correct up to O(m^{-1}) is given by

g_{1i}(Â) + V_i g_{5im}(Â) + g_{2i}(Â) + g_{6im}(Â) + g_{3im}(Â) + g_{4im}(Â),

where g_{1i}, g_{2i}, g_{3im} and g_{4im} are defined in Theorems 2-2 and 2-3, and

g_{5im}(A) = B_i(1 − B_i) {Σ_{j=1}^m (1 − B_j)²}^{-1} [ 2(1 − B_i) − tr{ (Σ_1^m (1 − B_j) x_j x_j^T)^{-1} (Σ_1^m (1 − B_j)² x_j x_j^T) } ],

g_{6im}(A) = 2 B_i² [ Σ_{j=1}^m (V_j + A)^{-2} ]^{-1} tr[ { Σ_{j=1}^m (V_j + A)^{-1} x_j x_j^T }^{-1} { Σ_{j=1}^m (V_j + A)^{-2} x_j x_j^T } ] [ (1 + K²)Φ(−K) − Kφ(K) ].   (2–25)

Proof: In view of Lemma 2-3,

E[V_i (1 − B̂_i)] = g_{1i}(A) − V_i g_{5im}(A) + o(m^{-1}).   (2–26)

Hence, we estimate g_{1i}(A) by g_{1i}(Â) + V_i g_{5im}(Â).

Next, by a one-step Taylor approximation,

g_{2i}(Â) = g_{2i}(A) + (Â − A) ∂g_{2i}/∂A.

Hence,

E[g_{2i}(Â)] = g_{2i}(A) + E(Â − A) ∂g_{2i}/∂A.   (2–27)


Next we calculate

d/dA (B_i² D_i²) = d/dA [ V_i² (V_i + A)^{-2} { V_i + A − x_i^T (Σ_{j=1}^m (V_j + A)^{-1} x_j x_j^T)^{-1} x_i } ]
= V_i² [ −(V_i + A)^{-2} + 2(V_i + A)^{-3} x_i^T (Σ_{j=1}^m (V_j + A)^{-1} x_j x_j^T)^{-1} x_i
+ (V_i + A)^{-2} x_i^T (Σ_{j=1}^m (V_j + A)^{-1} x_j x_j^T)^{-1} (Σ_{j=1}^m (V_j + A)^{-2} x_j x_j^T) (Σ_{j=1}^m (V_j + A)^{-1} x_j x_j^T)^{-1} x_i ]
= −B_i² + O(m^{-1}).   (2–28)

Hence, by Lemma 2-2 and the assumptions of Theorem 2-3,

E[g_{2i}(Â)] = g_{2i}(A)
− 2 B_i² [ Σ_{j=1}^m (V_j + A)^{-2} ]^{-1} tr[ (Σ_{j=1}^m (V_j + A)^{-1} x_j x_j^T)^{-1} (Σ_{j=1}^m (V_j + A)^{-2} x_j x_j^T) ] × { (1 + K²)Φ(−K) − Kφ(K) } + O(m^{-2})
= g_{2i}(A) − g_{6im}(A) + O(m^{-2}).   (2–29)

Thus, we estimate g_{2i}(A) by g_{2i}(Â) + g_{6im}(Â). The theorem follows.
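Putting Theorem 2-4 together, the following is a minimal sketch (our own code; K is fixed by the user) of the second-order MSE estimate for the robust EB estimator, with every g term evaluated at Â:

    import numpy as np
    from scipy.stats import norm

    def mse_estimate_reb(X, V, A_hat, K):
        """Second-order MSE estimate of Theorem 2-4, evaluated at A-hat."""
        d = V + A_hat
        B = V / d
        one_minus_B = 1 - B                            # = A_hat/(V_j + A_hat)
        S1_inv = np.linalg.inv(X.T @ (X / d[:, None])) # (sum x x^T / d_j)^{-1}
        S2 = X.T @ (X / d[:, None]**2)                 # sum x x^T / d_j^2
        quad = np.einsum('ij,jk,ik->i', X, S1_inv, X)  # x_i^T S1_inv x_i
        D2 = d - quad
        c1 = (1 + K**2)*norm.cdf(-K) - K*norm.pdf(K)
        c2 = 2*(2*K**2 - 1) - 4*(K**2 - 1)*norm.cdf(K) - 5*K*norm.pdf(K)
        tr_term = np.trace(S1_inv @ S2)
        sum_sq = np.sum(one_minus_B**2)
        g1 = V * (1 - B)
        g2 = 2 * B**2 * D2 * c1
        g3 = B**2 * quad
        g4 = B**2 * D2 * one_minus_B**2 / sum_sq * c2
        # using (1 - B_j) = A/(V_j + A), the trace in g5 equals A_hat * tr_term
        g5 = B * one_minus_B / sum_sq * (2*one_minus_B - A_hat*tr_term)
        g6 = 2 * B**2 / np.sum(1/d**2) * tr_term * c1
        return g1 + V*g5 + g2 + g6 + g3 + g4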

2.5 Simulation Study

We consider a simulation study to assess the effectiveness of the proposed estimators and compare them with standard empirical Bayes estimators. The layout of the simulation is as follows:

• Generate θ_i = µ + u_i, i = 1, ..., m.
• Generate y_i = θ_i + e_i, i = 1, ..., m.

We take e_i ~ N(0, V_i), where, for m = 10,

V_i = .01 for i = 1, ..., 4;  V_i = .1 for i = 5, ..., 7;  V_i = 1 for i = 8, ..., 10,

and for m = 20, the numbers of small areas are doubled within each group. We consider two distributions for generating the u_i: (i) a contaminated normal distribution, specifically .9 × N(0, 1) + .1 × N(0, 36), so that 1 out of 10 areas is expected to be an outlier; and (ii) the normal distribution N(0, 4.5). Note that 4.5 is the variance of the contaminated normal distribution. A minimal sketch of this data-generating scheme is given below.
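The following sketch (our own code, with names ours) generates one replicate under either setting:

    import numpy as np

    def one_replicate(m=10, mu=0.0, contaminated=True, rng=None):
        """Generate (theta, y, V) for one simulation replicate (Section 2.5)."""
        rng = rng or np.random.default_rng()
        # sampling variances .01, .1, 1 in groups of 4, 3, 3 (doubled for m = 20)
        V = np.repeat([0.01, 0.1, 1.0], np.array([4, 3, 3]) * (m // 10))
        if contaminated:                    # .9 N(0,1) + .1 N(0,36)
            sd = np.where(rng.random(m) < 0.9, 1.0, 6.0)
            u = rng.normal(0.0, sd)
        else:                               # matching-variance normal
            u = rng.normal(0.0, np.sqrt(4.5), size=m)
        theta = mu + u
        y = theta + rng.normal(0.0, np.sqrt(V))
        return theta, y, V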

We then calculate the following quantities to measure the performance of the SAE. Let θ̂_i be the estimate of θ_i. We define the absolute relative bias as

RB(θ̂_i) = (1/R) Σ_{r=1}^R | (θ̂_i^{(r)} − θ_i^{(r)}) / θ_i^{(r)} |,

and the simulated mean squared error as

SMSE(θ̂_i) = (1/R) Σ_{r=1}^R (θ̂_i^{(r)} − θ_i^{(r)})²,

where R is the number of replicates, taken as 1000. The notation used in Table 2-1 is as follows:

RB: relative bias of the standard EB estimates; RBR: relative bias of the robust EB estimates; RBI: relative improvement in relative bias of the robust estimates over the standard EB estimates; SMSE: empirical MSE of the standard EB estimates; SMSER: empirical MSE of the robust EB estimates; RMI: relative improvement in MSE of the robust estimators over the standard EB estimators; REBN: relative bias of the naive MSE estimates under the robust EB method; REB: relative bias of the bias-corrected MSE estimates under the robust EB method; RI: relative improvement in bias of the estimated MSE over the naive estimated MSE under the robust method. Here the naive MSE estimates are obtained by taking only the first two terms of the MSE estimates. The values corresponding to normal random effects are reported within parentheses.


It is evident from Table 2-1 that the robust EB method outperforms the standard EB method. The relative bias in SAE is about 24% less on average under the robust method when m = 10 and the distribution of the random effects is a mixture of two normal distributions. This improvement reduces to 10% when m = 20. In the case of a normal distribution for the random effects, the respective improvements are only 10% and 5% for m = 10 and 20. The MSE is about 60% less on average under the robust method when m = 10, while for m = 20 it is only about 2%. For normal random effects this improvement reduces to 24% and 2% for m = 10 and m = 20, respectively. The corrected MSE estimates are at least 5% less biased than the naive MSE estimates. For m = 20 this difference is about 3%. Thus the bias-corrected MSE estimates are useful at least for small samples.

Table 2-1. Relative biases of the point estimates and relative bias of MSE estimates. (Values for normal random effects are in parentheses.)

m = 10
K     RB       RBR      RBI     SMSE    SMSER   RMI     REBN     REB      RI
.5    2.743    2.011    .267    .909    .344    .622    -.106    -.087    .055
      (1.150)  (1.051)  (.086)  (.460)  (.345)  (.250)  (-.065)  (-.050)  (.0434)
1     -        2.202    .197    -       .355    .609    -.154    -.133    .059
      (-)      (1.067)  (.072)  (-)     (.350)  (.239)  (-.078)  (-.062)  (.046)
1.5   -        2.067    .246    -       .372    .590    -.176    -.153    .062
      (-)      (0.969)  (.157)  (-)     (.350)  (.239)  (-.084)  (-.066)  (.051)

m = 20
K     RB       RBR      RBI     SMSE    SMSER   RMI     REBN     REB      RI
.5    2.581    2.392    .073    .329    .320    .027    -.059    -.050    .028
      (.683)   (.653)   (.059)  (.322)  (.321)  (.003)  (-.026)  (-.018)  (.025)
1     -        2.390    .074    -       .328    .003    -.090    -.080    .0305
      (-)      (.677)   (.023)  (-)     (.318)  (.012)  (-.041)  (-.033)  (.025)
1.5   -        2.120    .179    -       .318    .033    -.089    -.078    .035
      (-)      (.649)   (.064)  (-)     (.310)  (.037)  (-.081)  (-.074)  (.023)

39

Page 40: EMPIRICAL AND HIERARCHICAL BAYESIAN METHODS WITH

CHAPTER 3EMPIRICAL BAYES ESTIMATION FOR BIVARIATE BINARY DATA WITH

APPLICATIONS TO SMALL AREA ESTIMATION

3.1 Introduction

Analysis of multivariate binary data is required in many areas of statistics. For

example, in longitudinal data analysis, quite often one encounters binary responses for

two or more variables (Bishop et al. (1975); Zhao and Prentice (1990)). Bivariate binary

data arise very naturally in opthalmological studies. Item response theory, which occupies

a central role in psychometry and educational statistics, deals almost exclusively with

multivariate binary data. The classic example is the Rasch model with its many and

varied extensions.

Our interest in the analysis of multivariate binary data stems from their potential

application in small area estimation. We consider the case when the response is bivariate

binary. In Section 2, we have first derived the Bayes estimators, and subsequently the EB

estimators for the small area parameters. Section 3 is devoted to the derivation of the

MSE’s of Bayes estimators, and second order correct approximations for the MSE’s of the

EB estimators. Section 4 is devoted to the second order correct estimators of the MSE’s

of the EB estimators. We have conducted a simulation study in Section 5 to illustrate our

method.

3.2 Bayes And Empirical Bayes Estimators

Let y = (yij1, yij2)T (j = 1, · · · , ni; i = 1, · · · , k) denote the bivariate binary response

for the jth unit in the ith small area. Thus each ylij (l = 1 or 2) assumes the values 0 or 1.

The joint probability function for y is given by

p(yij1, yij2) =exp(φij1yij1 + φij2yij2 + φij3yij1yij2)

1 + exp(φij1) + exp(φij2) + exp(φij1 + φij2 + φij3). (3–1)

With the reparametrization

p1ij =exp(φij1)

1 + exp(φij1) + exp(φij2) + exp(φij1 + φij2 + φij3)= P (yij1 = 1, yij2 = 0),

40

Page 41: EMPIRICAL AND HIERARCHICAL BAYESIAN METHODS WITH

pij2 =exp(φij2)

1 + exp(φij1) + exp(φij2) + exp(φij1 + φij2 + φij3)= P (yij1 = 0, yij2 = 1),

and

pij3 =exp(φij1 + φij2 + φij3)

1 + exp(φij1) + exp(φij2) + exp(φij1 + φij2 + φij3)= P (yij1 = 1, yij2 = 1),

we can rewrite the model given in Eq. 3–1 as

p(yij1, yij2) = pyij1(1−yij2)ij1 p

yij2(1−yij1)ij2 p

yij1yij2

ij3 p(1−yij1)(1−yij2)4ij , (3–2)

where p4ij = 1 −∑3l=1 plij. Let θ1ij = pij1 + pij3, θ2ij = pij2 + pij3. Our primary interest

is simultaneous estimation of the small area means (θ1i, θ2i)T , where θ1i =

∑ni

j=1 wijθ1ij,

θ2i =∑ni

j=1 wijθ2ij,∑ni

j=1 wij = 1 for each i. Here the weights wij’s are assumed to be

known, based on some given sampling scheme. We begin with the likelihood

L =k∏

i=1

ni∏j=1

p(y1ij, y2ij). (3–3)

Writing pT = (pT11, · · · ,pT

1n1, · · · , pT

k1, · · · ,pTknk

), the conjugate Dirichlet prior is given by

π(p) ∝k∏

i=1

ni∏j=1

[pλm1ij−1ij1 p

λm2ij−1ij2 p

λm3ij−1ij3 p

λm4ij−14ij ], (3–4)

where λ is the dispersion parameter, m4ij = 1 −∑3l=1 mlij, and the mlij are functions of

the covariates xij associated with the binary vector y. In particular, we take

mlij = exp(xTijbl)/[1 + exp(xT

ijb1) + exp(xTijb2) + exp(xT

ijb3)], (l = 1, 2, 3) (3–5)

where b1, b2, b3 are the regression coefficients. Writing z1ij = yij1(1 − yij2),z2ij =

yij2(1− yij1), z3ij = yij1yij2 and z4ij = 1−∑3l=1 zlij, the posterior for p is now given by

π(p|Y ) ∝k∏

i=1

ni∏j=1

[pz1ij+λm1ij−1ij1 p

z2ij+λm2ij−1ij2 p

z3ij+λm3ij−1ij3 p

z4ij+λm4ij−14ij ]. (3–6)

41

Page 42: EMPIRICAL AND HIERARCHICAL BAYESIAN METHODS WITH

Before proceeding further, we recall a few facts for the Dirichlet distribution. Suppose

u1, · · · , ur have a Dirichlet pdf given by

f(u1, · · · , ur) ∝ (r∏

l=1

uαl−1l )(1−

r∑

l=1

ul)αr+1−1,

where αl > 0 (l = 1, · · · , r+1). Then E(uj) = αj/∑r+1

l=1 αl, V(uj) = αj(∑

l( 6=j) αl)/[(∑r+1

l=1 αl)2

(∑r+1

l=1 αl + 1)], Cov(uj, uj′) = −αjαj′/[(∑r+1

l=1 αl)2(

∑r+1l=1 αl + 1)], (1 ≤ j 6= j′ ≤ r). Hence

from Eq. 3–6, the Bayes estimator(posterior mean) of plij is given by

pBlij = (zlij + λmlij(b))/(λ + 1); j = 1, · · · , ni, i = 1, · · · ,m. (3–7)

The resulting estimators for θ1ij and θ2ij are

θB1ij = pB

1ij + pB3ij, θB

2ij = pB2ij + pB

3ij. (3–8)

In an EB scenario, the parameter b1, b2 and b3 are all unknown and need to be

estimated from the marginal distributions of all the y’s. We assume that λ is known,

and outline later its adaptive estimation. Direct estimation of λ is impossible from the

marginal distributions of the y’s since marginally P (yij1 = 1, yij2 = 0) = E(z1ij) =

m1ij, P (yij1 = 0, yij2 = 1) = E(z2ij) = m2ij and P (yij1 = 1, yij2 = 1) = E(z3ij) = m3ij,

and the joint marginal of the y is uniquely determined from these probabilities resulting in

nonidentifiability in λ.

One way to estimate b1, b2, b3 from the marginal Dirichlet-Multinomial distribution

of the y’s is the method of maximum likelihood. But the MLE’s are mathematically

intractable, and are not readily implementable in practice. Instead, we appeal to the

theory of optimal unbiased estimating functions as proposed in Godambe and Thompson

(1989).

42

Page 43: EMPIRICAL AND HIERARCHICAL BAYESIAN METHODS WITH

To this end let g1ij = z1ij −m1ij, g2ij = z2ij −m2ij and g3ij = z3ij −m3ij. Then writing

zij = (z1ij, z2ij, z3ij)T ,Σij = V (zij),

DTij =

−E(∂g1ij

∂b1) −E(

∂g2ij

∂b1) −E(

∂g3ij

∂b1)

−E(∂g1ij

∂b2) −E(

∂g2ij

∂b2) −E(

∂g3ij

∂b2)

−E(∂g1ij

∂b3) −E(

∂g2ij

∂b3) −E(

∂g3ij

∂b3)

, (3–9)

the optimal estimating equations are given by∑k

i=1

∑ni

j=1 DTijΣ

−1ij gij = 0. Solving these

equations, we find the estimators of b1, b2 and b3. We first observe that

Σij = Diag(m1ij,m2ij,m3ij)−

m1ij

m2ij

m3ij

(m1ij m2ij m3ij

). (3–10)

Again from Eq. 3–9, DTij = Σij ⊗ xij, where ⊗ denotes the Kronecker product. Hence, the

equations∑k

i=1

∑ni

j=1 DTijΣ

−1ij gij = 0 can be rewritten as

∑ki=1

∑ni

j=1[I3⊗xij]gij = 0 which

further simplifies into

k∑i=1

ni∑j=1

xijzlij =k∑

i=1

ni∑j=1

xijmlij (l = 1, 2, 3). (3–11)

Solving Eq. 3–11, one estimates b1, b2 and b3. The EB estimators of plij (l = 1, 2, 3) are

now given by

pEBlij = (zlij + λmlij(b))/(λ + 1), (3–12)

j = 1, · · · , ni, i = 1, · · · , k; l = 1, 2, 3, where bT

= (bT

1 , bT

2 , bT

3 ). Then the EB estimators of

θ1ij and θ2ij are given by

θEB1ij = pEB

1ij + pEB3ij , θEB

2ij = pEB2ij + pEB

3ij . (3–13)

We write ˆθBi = (ˆθB

1i,ˆθB2i)

T , ˆθEBi = (ˆθEB

1i , ˆθEB2i )T , where ˆθB

ri =∑ni

j=1 wij θBrij,

ˆθEBri =

∑ni

j=1 wij θEBrij , (r = 1, 2; i = 1, · · · , k).

43

Page 44: EMPIRICAL AND HIERARCHICAL BAYESIAN METHODS WITH

3.3 MSE Approximation

In this section we find the MSE matrix of ˆθBi = (ˆθB

1i,ˆθB2i)

T , and also of ˆθEBi =

(ˆθEB1i , ˆθEB

2i )T which is second order correct, i.e. correct up to O(k−1).

Theorem 3-1. Let Uk =∑k

i=1

∑ni

j=1[Σij ⊗ xijxTij] = Oe(k). Then

MSE(ˆθEBi ) = λ(λ + 1)−2Ai(b) + λ2(1 + λ)−2Bi(b) + O(k−1),

where

Ai(b) =

ni∑j=1

w2ij

(m1ij + m3ij)(1−m1ij −m3ij) m3ij − (m1ij + m3ij)(m2ij + m3ij)

m3ij − (m1ij + m3ij)(m2ij + m3ij) (m2ij + m3ij)(1−m2ij −m3ij)

.

and,

Bi(b) =

1 0 1

0 1 1

[

ni∑j=1

wij∂mij(b)

∂b]T U−1

k [

ni∑j=1

wij∂mij(b)

∂b]

1 0

0 1

1 1

.

Proof:

Note that for i = 1, · · · , k,

MSE(ˆθEBi ) = E[(ˆθEB

i − θi)(ˆθEB

i − θi)T ]

= E[(ˆθBi − θi + ˆθEB

i − ˆθBi )(ˆθB

i − θi + ˆθEBi − ˆθB

i )T ]

= E[(ˆθBi − θi)(

ˆθBi − θi)

T ] + E[(ˆθEBi − ˆθB

i )(ˆθEBi − ˆθB

i )T ]

= MSE(ˆθBi ) + E[(ˆθEB

i − ˆθBi )(ˆθEB

i − ˆθBi )T ]. (3–14)

We first observe that for j 6= j′, 1 ≤ l, l′ ≤ 3,

E(pBlij − plij)(p

Bl′ij′ − pl′ij′) = E[{(λ + 1)−1(zlij − plij)− λ(λ + 1)−1(plij −mlij)}

×{(λ + 1)−1(zl′ij′ − pl′ij′)− λ(λ + 1)−1(pl′ij′ −ml′ij′)}]

= 0,

44

Page 45: EMPIRICAL AND HIERARCHICAL BAYESIAN METHODS WITH

since (zlij, plij) is distributed independently of (zl′ij′ , pl′) for j 6= j′and all 1 ≤ l, l′ ≤ 3.

Hence,

E[(ˆθBri − θri)(

ˆθBsi − θsi)] =

ni∑j=1

w2ijE(θB

rij − θrij)(θBsij − θsij), (3–15)

for all r, s = 1, 2. Thus,

MSE(ˆθBi ) = E[(ˆθB

i − θi)(ˆθB

i − θi)T ]

=

ni∑j=1

w2ijE

(θB1ij − θ1ij)

2 (θB1ij − θ1ij)(θ

B2ij − θ2ij)

(θB1ij − θ1ij)(θ

B2ij − θ2ij) (θB

2ij − θ2ij)2

=

ni∑j=1

w2ij

1 0 1

0 1 1

E[(pB

ij − pij)(pBij − pij)

T ]

1 0

0 1

1 1

. (3–16)

Now for each l = 1, 2, 3,

E(pBlij − plij)

2 = E[(λ + 1)−1(zlij − plij)− λ(λ + 1)−1(plij −mlij)]2

= (λ + 1)−2E[zlij − plij − λ(plij −mlij)]2. (3–17)

First conditioning on plij, it follows from Eq. 3–17 that

E(pBlij − plij)

2 = (λ + 1)−2E[plij(1− plij) + λ2(plij −mlij)2]

= (λ + 1)−2[mlij −m2lij −

mlij(1−mlij)

λ + 1+ λ2mlij(1−mlij)

λ + 1]

= λ(λ + 1)−2mlij(1−mlij). (3–18)

45

Page 46: EMPIRICAL AND HIERARCHICAL BAYESIAN METHODS WITH

Again for 1 ≤ l 6= l′ ≤ 3,

E(pBlij − plij)(p

Bl′ij − pl′ij) = E[{(λ + 1)−1(zlij − plij)− λ(λ + 1)−1(plij −mlij)}

×{(λ + 1)−1(zl′ij − pl′ij)− λ(λ + 1)−1(pl′ij −ml′ij)}]

= (λ + 1)−2E[−plijpl + λ2Cov(plij, pl′ij)]

= (λ + 1)−2E[−λmlijml′ij

λ + 1− λ2mlijml′ij

λ + 1]

= −λ(λ + 1)−2mlijml′ij. (3–19)

Hence,

MSE(pBij) = λ(λ + 1)−2[Diag(m1ij,m2ij,m3ij)−

m1ij

m2ij

m3ij

(m1ij m2ij m3ij

)]. (3–20)

It follows from Eq. 3–16 and Eq. 3–20 that

MSE(ˆθBi ) = λ(λ + 1)−2

×ni∑

j=1

w2ij

(m1ij + m3ij)(1−m1ij −m3ij) m3ij − (m1ij + m3ij)(m2ij + m3ij)

m3ij − (m1ij + m3ij)(m2ij + m3ij) (m2ij + m3ij)(1−m2ij −m3ij)

.

(3–21)

Next we observe that

E[(ˆθEBi − ˆθB

i )(ˆθEBi − ˆθB

i )T ]

=∑∑

1≤j,j′≤ni

wijwij′E

(θEB

1ij − θB

1ij)(θEB

1ij′ − θB

1ij′) (θEB

1ij − θB

1ij)(θEB

2ij − θB

2ij′)

(θEB

2ij − θB

2ij)(θEB

1ij′ − θB

1ij′) (θEB

2ij − θB

2ij)(θEB

2ij′ − θB

2ij′)

=∑∑

1≤j,j′≤ni

wijwij′

1 0 1

0 1 1

E[(pEB

ij − pBij)(p

EBij′ − pB

ij′)T ]

1 0

0 1

1 1

. (3–22)

46

Page 47: EMPIRICAL AND HIERARCHICAL BAYESIAN METHODS WITH

Writing mij(b) ≡ mij = (m1ij,m2ij, m3ij)T ,

E[(pEBij −pB

ij)(pEBij′ −pB

ij′)T ] = λ2(1+λ)−2E[{mij(b)−mij(b)}{mij′(b)−mij′(b)}T ]. (3–23)

By one-step Taylor expansion, for l = 1, 2, 3,

mlij(b).= mlij(b) + (

∂mlij(b)

∂b)T (b− b). (3–24)

Hence for 1 ≤ l 6= l′ ≤ 3,

E[{mlij(b)−mlij(b)}][{ml′ij(b)−ml′ij(b)}] .=

[∂mlij(b)

∂b

]T

E[(b− b)(b− b)T ][∂mlij(b)

∂b

],

(3–25)

Next we find an approximation to E[(b− b)(b− b)T ] which is correct up to O(k−1).

To this end, let Sk(b) =∑k

i=1

∑ni

j=1 DTijΣ

−1ij gij. Noting that E(gij) = 0,

E[−∂Sk(b)

∂b] =

k∑i=1

ni∑j=1

E[DTijΣ

−1ij Dij +

∂b(DT

ijΣ−1ij )gij]

=k∑

i=1

ni∑j=1

DTijΣ

−1ij Dij =

k∑i=1

ni∑j=1

[Σij ⊗ xijxTij]. (3–26)

We denote by U k the expression in the rightmost side of Eq. 3–26. We assume that

U k = O(k). This leads to the approximation

E[(b− b)(b− b)T ] = U−1k U kU

−1k = U−1

k = O(k−1). (3–27)

Now

E[(pEBij − pB

ij)(pEBij′ − pB

ij′)T ]

.= V ijj′ ,

47

Page 48: EMPIRICAL AND HIERARCHICAL BAYESIAN METHODS WITH

where

V ijj′ = λ2(1 + λ)−2

(∂m1ij(b)

∂b)T U−1

k (∂m1ij′ (b)

∂b) (

∂m1ij(b)

∂b)T U−1

k (∂m2ij′ (b)

∂b) (

∂m1ij(b)

∂b)T U−1

k (∂m3ij′ (b)

∂b)

(∂m2ij(b)

∂b)T U−1

k (∂m1ij′ (b)

∂b) (

∂m2ij(b)

∂b)T U−1

k (∂m2ij′ (b)

∂b) (

∂m2ij(b)

∂b)T U−1

k (∂m3ij′ (b)

∂b)

(∂m3ij(b)

∂b)T U−1

k (∂m1ij′ (b)

∂b) (

∂m3ij(b)

∂b)T U−1

k (∂m2ij′ (b)

∂b) (

∂m3ij(b)

∂b)T U−1

k (∂m3ij′ (b)

∂b)

,

(3–28)

where

∂m1ij(b)/∂b =

[m1ij(1−m1ij) −m1ijm2ij −m1ijm3ij

]T

⊗ xij; (3–29)

∂m2ij(b)/∂b =

[−m1ijm2ij m2ij(1−m2ij) −m2ijm3ij

]T

⊗ xij; (3–30)

∂m3ij(b)/∂b =

[−m1ijm3ij −m2ijm3ij m3ij(1−m3ij)

]T

⊗ xij. (3–31)

By Eq. 3–23-Eq. 3–30,

E[(ˆθEBi − ˆθB

i )(ˆθEBi − ˆθB

i )T ]

=λ2

(1 + λ)2

1 0 1

0 1 1

[

ni∑j=1

wij∂mij(b)

∂b]T U−1

k [

ni∑j=1

wij∂mij(b)

∂b]

1 0

0 1

1 1

.(3–32)

Hence,

MSE(ˆθEBi ) = MSE(ˆθB

i )

+λ2

(1 + λ)2

1 0 1

0 1 1

[

ni∑j=1

wij∂mij(b)

∂b]T U−1

k [

ni∑j=1

wij∂mij(b)

∂b]

1 0

0 1

1 1

.(3–33)

The theorem follows.

48

Page 49: EMPIRICAL AND HIERARCHICAL BAYESIAN METHODS WITH

3.4 Estimation of The MSE

In this section, we find an approximation of the expression given in the right hand

side of Eq. 3–33 which is second order correct, i.e. the first neglected term is o(k−1). We

may note that the second order term in the right hand side of Eq. 3–33 is O(k−1). Hence,

substitution of b for b estimates the same correctly up to O(k−1). However, estimation of

MSE(ˆθBi ) with the simple substitution of b for b is not adequate since the first neglected

term is O(k−1). Hence, we need to estimate MSE(ˆθBi ).

To this end, we begin with the evaluation of E[m1ij(b)(1 − m1ij(b))]. By two-step

Taylor expansion,

m1ij(b).= m1ij(b) + (

∂m1ij(b)

∂b)T (b− b) +

1

2(b− b)T ∂2m1ij(b)

∂b∂bT(b− b). (3–34)

Hence,

E[m1ij(b)(1−m1ij(b))]

.= E[{m1ij(b) + (

∂m1ij(b)

∂b)T (b− b) +

1

2(b− b)T ∂2m1ij(b)

∂b∂bT(b− b)}

× {1−m1ij(b)− (∂m1ij(b)

∂b)T (b− b)− 1

2(b− b)T ∂2m1ij(b)

∂b∂bT(b− b)}]

.= m1ij(b)[1−m1ij(b)] + (1− 2m1ij(b))(

∂m1ij(b)

∂b)T E(b− b)

+ tr[{1

2(1− 2m1ij(b))

∂2m1ij(b)

∂b∂bT− (

∂m1ij(b)

∂b)(

∂m1ij(b)

∂b)T}E(b− b)(b− b)T ].

(3–35)

49

Page 50: EMPIRICAL AND HIERARCHICAL BAYESIAN METHODS WITH

and this approximation is correct up to O(k−1). The expression for ∂m1ij(b)/∂b is given in

Eq. 3–29. Based on that expression, one gets

∂2m1ij

∂b∂bT=

m1ij(1−m1ij)(1− 2m1ij) −m1ijm2ij(1− 2m1ij) −m1ijm3ij(1− 2m1ij)

−m1ijm2ij(1− 2m1ij) −m1ijm2ij(1− 2m2ij) 2m1ijm2ijm3ij

−m1ijm3ij(1− 2m1ij) 2m1ijm2ijm3ij −m1ijm3ij(1− 2m3ij)

⊗ xijx

Tij.

(3–36)

We need to find E(b− b). We follow Cox and Snell (1968) to find this expectation.

To this end, we first note that ∂Sk/∂b = ∂

∂b[∑k

i=1

∑ni

j=1{I3 ⊗ xij}

z1ij −m1ij

z2ij −m2ij

z3ij −m3ij

] is

a constant. Hence, since E(Sk) = 0 the expectation of the product of any element of Sk

with any element of ∂Sk/∂b is zero. Thus denoting J = (Jlu,rs,mn) = (E(Sk,lu,∂Sk,rs

∂bmn)), we

have Jlu,rs,mn = 0 for all l, r,m = 1, 2, 3 and u, s, n = 1, · · · , p. Accordingly, from Cox and

50

Page 51: EMPIRICAL AND HIERARCHICAL BAYESIAN METHODS WITH

Snell(1968),

E(b− b) =1

2U−1

k

tr(U−1k K11)

...

tr(U−1k K1p)

...

tr(U−1k K31)

...

tr(U−1k K3p)

+ 2

tr(U−1k J11)

...

tr(U−1k J1p)

...

tr(U−1k J31)

...

tr(U−1k J3p)

=1

2U−1

k

tr(U−1k K11)

...

tr(U−1k K1p)

...

tr(U−1k K31)

...

tr(U−1k K3p)

, (3–37)

where

K lu =

∂2

∂b1∂bT

1

∂2

∂b1∂bT

2

∂2

∂b1∂bT

3

∂2

∂b2∂bT

1

∂2

∂b2∂bT

2

∂2

∂b2∂bT

3

∂2

∂b3∂bT

1

∂2

∂b3∂bT

2

∂2

∂b3∂bT

3

(k∑

i=1

ni∑i=1

(zlij −mlij)xiju), (3–38)

51

Page 52: EMPIRICAL AND HIERARCHICAL BAYESIAN METHODS WITH

l = 1, 2, 3, u = 1, · · · , p. In particular, we have

∂2

∂bl∂bTl

[k∑

i=1

ni∑j=1

(zlij −mlij)xiju] = −k∑

i=1

ni∑i=1

mlij(1−mlij)(1− 2mlij)xijxTijxiju,

(3–39)

∂2

∂bl∂bTl′

[k∑

i=1

ni∑j=1

(zlij −mlij)xiju] =∂2

∂bl′∂bTl

k∑i=1

ni∑j=1

(zlij −mlij)xiju

=k∑

i=1

ni∑j=1

mlijml′ij(1− 2mlij)xijxTijxiju, (3–40)

∂2

∂bl′∂bTl′

[k∑

i=1

ni∑j=1

(zlij −mlij)xiju] =k∑

i=1

ni∑j=1

mlijml(1− 2ml′ij)xijxTijxiju, (3–41)

∂2

∂bl′∂bTl′′

[k∑

i=1

ni∑j=1

(zlij −mlij)xiju] = −2k∑

i=1

ni∑j=1

mlijml′ijml′′ij)xijxTijxiju, (3–42)

where 1 ≤ l 6= l′ 6= l′′ ≤ 3, 1 ≤ u ≤ p. Thus E(b− b) = O(k−1).

From Eq. 3–39-Eq. 3–42, one obtains the elements of the matrices K lu for all

l = 1, 2, 3 and u = 1, · · · , p. Thus from Eq. 3–36-Eq. 3–42, the bias-corrected estimate of

m1ij(b)(1−m1ij(b)) is given by

m1ij(b)[1−m1ij(b)]− 1

2(1− 2m1ij(b))(

∂m1ij(b)

∂b|b=

ˆb)T U−1

k (b)

tr(U−1k (b)K11(b))

...

tr(U−1k (b)K3p(b))

−tr[{1

2(1− 2m1ij(b))

∂2m1ij(b)

∂b∂bT|b=

ˆb− (

∂m1ij(b)

∂b|b=

ˆb)(

∂m1ij(b)

∂b|b=

ˆb)T}U−1

k ].

52

Page 53: EMPIRICAL AND HIERARCHICAL BAYESIAN METHODS WITH

Similarly, one finds bias-corrected estimates for m2ij(b)(1 − m2ij(b)) and m3ij(b)(1 −m3ij(b)). Next we calculate

E[m1ij(b)m2ij(b)]

= E[{m1ij(b) + (∂m1ij(b)

∂b)T (b− b) +

1

2(b− b)T ∂2m1ij(b)

∂b∂bT(b− b)}

× {m2ij(b) + (∂m2ij(b)

∂b)T (b− b) +

1

2(b− b)T ∂2m2ij(b)

∂b∂bT(b− b)}]

= m1ij(b)m2ij(b) + [m1ij(b)∂m2ij(b)

∂b+ m2ij(b)

∂m1ijb

∂b]T E(b− b)

+ tr[{1

2{m1ij(b)

∂2m2ij(b)

∂b∂bT+ m2ij(b)

∂2m1ij(b)

∂b∂bT}+ (

∂m1ij(b)

∂b)(

∂m1ij(b)

∂b)T}

× E{(b− b)(b− b)T}].

Hence we estimate m1ij(b)m2ij(b) by

m1ij(b)m2ij(b)

− [m1ij(b)∂m2ij(b)

∂b|b=

ˆb+ m2ij(b)

∂m1ijb

∂b|b=

ˆb]T

1

2U−1

k (b)

tr(U−1k (b)K11(b))

...

tr(U−1k (b)K3p(b))

− tr[{1

2{m1ij(b)

∂2m2ij(b)

∂b∂bT|b=

ˆb+ m2ij(b)

∂2m1ij(b)

∂b∂bT|b=

ˆb}

+ (∂m1ij(b)

∂b|b=

ˆb)(

∂m1ij(b)

∂b|b=

ˆb)T}U−1

k ].

Similarly, one finds bias-corrected estimates of m1ij(b)m3ij(b) and m2ij(b)m3ij(b). Hence

we get the following theorem.

Theorem 3-2. Assume Uk = Oe(k),then MSE(ˆθEBi ) can be estimated, correct up to

O(k−1), by

λ(λ + 1)−2Ci(b) + λ2(1 + λ)−2Bi(b), (3–43)

where

Ci(b) =

ni∑j=1

w2ij

1 0 1

0 1 1

Dij(b)

1 0

0 1

1 1

.

53

Page 54: EMPIRICAL AND HIERARCHICAL BAYESIAN METHODS WITH

with

(Dij(b)p,q = mpij(b)[1−mpij(b)]

−1

2(1− 2mpij(b))(

∂mpij(b)

∂b|b=

ˆb)T U−1

k (b)

tr(U−1k (b)K11(b))

...

tr(U−1k (b)K3p(b))

−tr[{1

2(1− 2mpij(b))

∂2mpij(b)

∂b∂bT|b=

ˆb− (

∂mpij(b)

∂b|b=

ˆb)(

∂mpij(b)

∂b|b=

ˆb)T}U−1

k ],

for p = q = 1, 2, 3,

= mpij(b)mqij(b)

−[mpij(b)∂mqij(b)

∂b|b=

ˆb+ mqij(b)

∂mpijb

∂b|b=

ˆb]T

×1

2U−1

k (b)

tr(U−1k (b)K11(b))

...

tr(U−1k (b)K3p(b))

−tr[{1

2{mpij(b)

∂2mqij(b)

∂b∂bT|b=

ˆb+ mqij(b)

∂2mpij(b)

∂b∂bT|b=

ˆb}

+(∂mpij(b)

∂b|b=

ˆb)(

∂mqij(b)

∂b|b=

ˆb)T}U−1

k ], for p 6= q = 1, 2, 3.

Remark. As mentioned in Section 2, direct estimation of the dispersion parameter λ is

impossible due to its nonidentifiability. It may appear that an adaptive estimator of λ is

obtained by minimizing some suitable real-valued function of the approximate estimated

MSE matrix with respect to λ. For example, we can try to minimize the trace of the

estimated MSE matrix. Writing the trace of the matrix given in Eq. 3–43 as g(λ), we have

g(λ) =k∑

i=1

[λ(λ + 1)−2tr(Ci(b)) + λ2(1 + λ)−2tr(Bi(b))].

Obviously, g(λ) attains its minimum 0 atλ = 0. But λ = 0 gives only the direct estimator,

which we want to avoid for small area estimation. Also, solving g′(λ) = 0 subject to the

54

Page 55: EMPIRICAL AND HIERARCHICAL BAYESIAN METHODS WITH

constraint λ > 0, one gets

λ = max[

∑ki=1 tr(Ci(b))∑k

i=1 tr(Ci(b))− 2∑k

i=1 tr(Bi(b)), 0].

A plot of g(λ) reveals that the function increases from 0 to λ and then decreases from

λ onwards. Further, since∑k

i=1 tr(Ci(b)) = O(k) and∑k

i=1 tr(Bi(b)) = O(1), for large k,

λ is close to 1. This is also revealed in our data analysis given in the next section. Due to

the same condition, the dominating term in the estimated MSE is the first term, namely

λ(λ+1)−2∑k

i=1 tr(Ci(b)). Since λ(λ+1)−2 = λ−1(λ−1 +1)−2, the dominating first term has

the same expression for λ and λ−1. Hence, the minimum is attained only for very small

or very large values of λ for most practical applications. Thus, the choice of λ reflects a

trade-off between the direct and the regression estimate, and practitioner should be guided

by the amount of bias that he/she is willing to tolerate.

3.5 Simulation Study

In this section, we will discuss a simulation study to compare the performance

of the EB estimators with the usual sample mean. Here, we consider k = 10 and

k = 20 with ni = 5 ∀ i. To start with, we generated the covariates from a N(0, 1)

distribution and kept them fixed for the rest of the analysis. We generated zlij’s from

a multinomial (p1ij, p2ij, p3ij, p4ij) distribution where the plij’s are simulated from a

dirichlet(λm1ij, λm2ij, λm3ij, λm4ij) distribution to replicate the bivariate binary model

described in Section 2. Here the mlij’s were modelled on the covariates through the

multinomial link. For our simulation, we assumed b1 = (0, 1)′, b2 = (0, 2)′ and

b3 = (0, 1)′. Our parameters of interest were the small area means θr = 1ni

∑ni

j=1(prij +

p3ij), r = 1, 2. We used the optim function in R to estimate the parameters. We calculate

the following summary measures to examine the performance of the estimators. Let θ(r)i

be the estimate of θ(r)i for the ith small area at the Rth simulation. Then we define the

55

Page 56: EMPIRICAL AND HIERARCHICAL BAYESIAN METHODS WITH

Average Relative Deviation (ARD) as

ARD =1

kR

k∑i=1

R∑r=1

| θ(r)i − θ

(r)i

θ(r)i

|. (3–44)

The Average Relative Mean Squared Error (ARMSE) is given by

ARMSE =1

kR

k∑i=1

R∑r=1

(θ(r)i − θ

(r)i )2

θ(r)i

2. (3–45)

The Average Deviation (AD) is denoted by

AD =1

kR

k∑i=1

R∑r=1

|θ(r)i − θ

(r)i |. (3–46)

The Average MSE is given by

AMSE =1

kR

k∑i=1

R∑r=1

(θ(r)i − θ

(r)i )2, (3–47)

and, the estimated MSE is given by

EMSE =1

kR

k∑i=1

R∑r=1

MSE(θi)(r)

, (3–48)

where MSE(θi)(r)

is the second order correct estimate of the MSE as given by 3–43.

In addition to the relative bias of the point estimates and simulated MSE’s, we report

relative bias of the estimates of the mean squared errors as defined below.

Relative bias with respect to the empirical MSE:

REBi =E{ MSE(θi)} − SMSE(θi)

SMSE(θi), i = 1 · · · ,m

where, SMSE(θi) = 1R

∑ki=1

∑Rr=1(θ

(r)i − θ

(r)i )2E{ MSE(θi) = 1

R

∑Rr=1

MSE(θi)(r)

.

As matter of interest we also computed the direct estimators to compare their

performances with our empirical Bayes estimators. We observed there was considerable

improvement in terms of bias and MSE when we used our EB estimators as compared

to the raw estimators, e.g., for λ = 5, the improvement was 48.5% for ARD, 98.1 %

56

Page 57: EMPIRICAL AND HIERARCHICAL BAYESIAN METHODS WITH

for ARMSE, 42.6 % for AD and 65.9 % for SMSE for the first variable. For the second

variable, the improvements were 50.4 % for ARD, 87.6 % for ARMSE, 24.5 % for AD

and 71.5 % for SMSE. Table 3-1 and Table 3-2 summarizes the performance of the EB

estimates for the first and second small area mean respectively in terms of their bias and

MSE estimates.

Table 3-1. Relative biases of the point estimates and relative bias of MSE estimates forthe small area mean: the first variable.

λ ARD ARMSE AD AMSE EMSE REB

k=10.5 0.614 0.036 0.081 0.011 0.037 2.3851 0.289 0.330 0.079 0.012 0.043 2.3555 0.192 0.071 .069 0.008 0.023 3.276

k=20.5 0.358 2.32 0.080 0.0108 0.036 2.4061 0.290 .0402 0.089 0.013 0.039 2.3475 0.180 0.005 0.068 0.0073 0.022 2.406

Table 3-2. Relative biases of the point estimates and relative bias of MSE estimates forthe small area mean: the second variable.

λ ARD ARMSE AD AMSE EMSE REB

k=10.5 0.422 0.007 0.074 0.009 0.007 0.2141 0.208 0.182 0.084 0.010 0.009 0.2835 0.151 0.050 0.063 0.006 0.005 0.196

k=20.5 0.209 0.355 0.075 0.010 0.007 0.2211 0.290 .008 0.079 0.011 0.008 0.2255 0.1412 0.023 0.062 0.006 0.004 0.206

57

Page 58: EMPIRICAL AND HIERARCHICAL BAYESIAN METHODS WITH

CHAPTER 4HIERARCHICAL BAYES ESTIMATION FOR BIVARIATE BINARY DATA

WITH APPLICATIONS TO SMALL AREA ESTIMATION

4.1 Introduction

In Chapter 3, we considered empirical Bayes estimation of the small area parameters

for bivariate binary data. In this chapter we have implemented the hierarchical Bayes

procedure for bivariate binary data when there is missingness present in the covariates.

For our purpose, we have assumed the missingness to be missing at random (MAR)(Little

and Rubin, 2002). We have incorporated ideas from Lipsitz and Ibrahim (1996) and

Ibrahim, Lipsitz, and Chen (1999) to treat the missing covariates.

The general HB models which accomodate this missingness is discussed in Section

2. We have, after suitable reparametrization, converted the bivariate binary likelihood

into a multinomial likelihood, and subsequently have utilized the multinomial-Poisson

relationship leading eventually to a Poisson reparameterization. Bayesian analogues of

log-linear models are used to estimate the Poisson parameters. We have applied the

HB methodology to obtain the small area estimates, the associated standard errors

and posterior correlations. Due to the analytical intractability of the posterior, the HB

procedure is implemented via Markov chain Monte Carlo (MCMC) integration techniques,

in particular, the Gibbs sampler. In Section 3, The methodology is illustrated with the

real data related to low birthweight and infant mortality.

4.2 Hierarchical Bayes Method

Let y = (yij1, yij2)T (j = 1, · · · , ni; i = 1, · · · , k) denote the bivariate binary response

for the jth unit in the ith small area. Thus each yijl (l = 1 or 2) assumes the values 0 or 1.

The joint probability function for y is given by

p(yij1, yij2) =exp(φij1yij1 + φij2yij2 + φij3yij1yij2)

1 + exp(φij1) + exp(φij2) + exp(φij1 + φij2 + φij3). (4–1)

58

Page 59: EMPIRICAL AND HIERARCHICAL BAYESIAN METHODS WITH

Following the same transformations as in Chapter 3, we arrive at the model

p(yij1, yij2) = pyij1(1−yij2)ij1 p

yij2(1−yij1)ij2 p

yij1yij2

ij3 p(1−yij1)(1−yij2)ij4 , (4–2)

where pij4 = 1 − ∑3l=1 pijl.Then writing zij1 = yij1(1 − yij2), zij2 = yij2(1 − yij1),

zij3 = yij1yij2 and zij4 = (1− yij1)(1− yij2), we get

p(yij1, yij2) ∝ pzij1

ij1 pzij2

ij2 pzij3

ij3 pzij4

ij4 . (4–3)

Next we write pijl = ζijl/∑4

r=1 ζijr and use the well-known relationship between

the multinomial and Poisson distribution, namely, (zij1, zij2, zij3, zij4) have the same

distribution as the joint conditional distribution of (uij1, uij2, uij3, uij4) given∑4

q=1 uijq = 1, where uijq are independent Poisson(ζijq) (q=1,2,3,4).

We begin modelling the ζijl as

log(ζijl) = xTijbl + vi, (4–4)

where xij = (xij1, xij2, . . . , xijq)T , bl = (bl1, bl2, . . . , blq)

T , and viiid∼ N(0, σ2). In the

presence of missing covariates, we write xTij = (xT

ij,mis, xTij,obs) and assume that the

missingness is MAR. In the presence of missingness in some of the components of

xij, we model only those missing components for necessary imputation. Further, in

order to reduce the number of nuisance parameters, we model the joint distribution

of these missing components as the product of several smaller dimensional conditional

distributions as proposed in Lipsitz and Ibrahim (1996) and Ibrahim, Lipsitz, and

Chen (1999). Specifically, we partition xij,mis into several component vectors, say,

x(1)ij,mis, · · · ,x

(r)ij,mis, parameterized by α1, . . . , αr, and conditional on the observed xij,obs

59

Page 60: EMPIRICAL AND HIERARCHICAL BAYESIAN METHODS WITH

and αT = (αT1 , · · · ,αT

r ), write the joint pdf of xij,mis as

f(xij,mis|xij,obs,α) = f(x(1)ij,mis, . . . , x

(r)ij,mis|xij,obs,α)

= f(x(r)ij,mis|xij,obs, x

(1)ij,mis, . . . , x

(r−1)ij,mis, αr)

× f(x(r−1)ij,mis|xij,obs, x

(1)ij,mis, . . . , x

(r−2)ij,mis, αr−1)

× · · · × f(x(1)ij,mis|xij,obs,α1). (4–5)

It is important to note that since we are assuming MAR missing covariates, we do not

need to model the missing data mechanism.

Write zij = (zij1, zij2, zij3, zij4)T and bT = (bT

1 , bT2 , bT

3 , bT4 ). Let D = (zij, xij, j =

1, 2, . . . , ni, i = 1, 2, . . . , k) denote the complete data. Also, let Dobs = (zij, xij,obs, j =

1, 2, . . . , ni, i = 1, 2, . . . , k) denote the observed data. Then, the complete data likelihood

under the hierarchical model is given by

L(b, σ2,α|D)

=

{k∏

i=1

[ni∏

j=1

4∏

l=1

exp{zijl(xTijbl + vi)}

zijl!exp

{− exp(xT

ijbl + vi)}]

1√2πσ

exp(− v2

i

2σ2

)}

×k∏

i=1

ni∏j=1

f(xij,obs,xij,mis,α) (4–6)

and the observed data likelihood is of the form

L(b, σ2,α|Dobs)

=

∫ {k∏

i=1

∫ [ni∏

j=1

4∏

l=1

exp{zijl(xTijbl + vi)}

zijl!exp

{− exp(xT

ijbl + vi)}]

1√2πσ

× exp(− v2

i

2σ2

)dvi

}k∏

i=1

ni∏j=1

f(xij,obs,xij,mis|α)µ(dxij,mis), (4–7)

where µ is some σ-finite measure, typically the Lebesgue measure or the counting measure.

We assume bl’s, l = 1, . . . , 4, σ2, α are a priori mutually independent with bl ∼uniform(Rq) for l = 1, . . . , 4, σ2 ∼ Inverse Gamma(c, d), where Inverse Gamma(c, d) pdf is

60

Page 61: EMPIRICAL AND HIERARCHICAL BAYESIAN METHODS WITH

given by π(y) ∝ exp(− c2y

)y−d/2−1. We let π(α) denote the prior distribution of α. Then,

the resulting posterior distribution is given by

π(b, σ2,α|Dobs) ∝ L(b, σ2, α|Dobs) exp(− c

2σ2)(σ2)−d/2−1π(α). (4–8)

Direct evaluation of the posterior moments of pijl’s are not possible as it entails

evaluation of numerical integrals which are very difficult to compute. We use the Gibbs

sampling method instead. Let xmis denote the full vector of all the xij,mis, and v =

(v1, . . . , vk)T . To facilitate Gibbs sampling, we consider the following joint posterior based

on the complete data likelihood as:

π(b, σ2,α,v,xmis|Dobs) ∝ L(b, σ2, α|D) exp(− c

2σ2)(σ2)−d/2−1π(α), (4–9)

where L(b, σ2,α|D) is given by Eq. 4–6. It is easy to show that after integrating out v

and xmis from Eq. 4–9, the joint marginal posterior of (b, σ2,α) is Eq. 4–8.

To prove the propriety of the posterior, we will mainly follow the techniques given

in Chen and Shao (2000), Chen et al. (2004) and Chen et al. (2002). We note that

our covariates are bounded. We assume that aij ≤ xij,miss ≤ bij, which means all

the components of xij,miss are bounded by aij and bij. We start with introducing some

notations:

let M = {(i, j) : at least one component of xij is missing},Mij = {1 ≤ r ≤ q : xijr is missing}, and M∗

r = {(i, j) : r ∈ Mij}, where j = 1, · · · , ni,

i = 1, · · · , k.

61

Page 62: EMPIRICAL AND HIERARCHICAL BAYESIAN METHODS WITH

Here,following Chen, Ibrahim, and Shao (2004),

f(zijl|xij, bl, v)

=1

zijl!exp{zijl(x

Tijbl + vi)} exp

{− exp(xT

ijbl + vi)}

=1

zijl!exp{zijlηijl} exp

{− exp(ηijl)

}

exp(− exp(ηijl)

)if zijl = 0

(zijl+1)!

zijl!exp(−η+

ijl) exp(−η−ijl) if zijl ≥ 1 .

(4–10)

Denote by I1l = {(i, j) : zijl = 0} and I2l = {(i, j) : zijl ≥ 1}. Let f1(η) =

exp{−exp(η)}, f2(η) = exp(−η+) and f2(η) = exp(−η−). Note that f1 and f2 are

non-increasing in η. Also f1(∞) = f2(∞) = 0, and, f1(∞) < ∞, f2(∞) < ∞. On the

other hand, f3 is non-decreasing in η, f3(−∞) = 0 and f3(∞) < ∞. Let n =∑k

i=1 ni and N l2 be the cardinality of I2l, l = 1, · · · , 4. Denote uT

ij = (xTij, ei) with

e = (0, .., 1, 0, ..0) ∈ Rk as the basis vector with 1 at the ith component. Let

U (l) = (uij, 1 ≤ j ≤ ni, 1 ≤ i ≤ k, uln+ij, (i, j) ∈ I2l) (4–11)

be the (n + N l2) × (q + k) design matrix such that u

(l)n+ij = uij for (i, j) ∈ I3l. Let

w(l)ij = 1 if (i, j) ∈ I1l

⋃I2l and w

(l)n+ij = −1 if (i, j) ∈ I2l. Define U

(l)∗ to be the matrix with

rows

w(l)ij uT

ij, 1 ≤ j ≤ ni, 1 ≤ i ≤ kw(l)n+iju

(l)Tn+ij, (i, j) ∈ I2l (4–12)

For 1 ≤ r ≤ q, 1 ≤ l ≤ 4, let

alij =

aij if (i, j) ∈ I1l

⋃I2l

bij o.w.

aln+ij = bij if (i, j) ∈ I2l

(4–13)

62

Page 63: EMPIRICAL AND HIERARCHICAL BAYESIAN METHODS WITH

blij =

bij if (i, j) ∈ I1l

⋃I2l

aij o.w.

bln+ij = aij if (i, j) ∈ I2l

(4–14)

We are now led to the following theorem.

Theorem 4-1. For 1 ≤ r ≤ q and 1 ≤ l ≤ 4, let x(l)ijr = a

(l)ij for every (i, j) ∈ M∗

r, x(l)n+ij,r =

a(l)n+ij for every j ∈M∗

r

⋂I2l, or, let x

(l)ijr = b

(l)ij for every (i, j) ∈M∗

r, x(l)n+ij,r = b

(l)n+ij

for every j ∈M∗r

⋂I2l,

(C1) the matrix U (l) as defined in Eq. 4–11 is of full rank;(C2) there exists a positive vector a = (a1, · · · , an+N l

2) ∈ Rn+N l

2 , such that

U (l)T∗ a = 0 (4–15)

where U(l)∗ is defined in Eq. 4–12

(C3)∫∞−∞ |u|qdfi(u) < ∞ for i = 1, 2, 3.

If conditions (C1)-(C3) are satisfied, then the posterior is proper provided

∫ k∏i=1

ni∏j=1

f(xij,obs|α)dα < ∞. (4–16)

We will defer the proof of this theorem to the appendix.

To implement the Gibbs sampling algorithm, we sample from the following conditional

distributions in turn: (i) π(b|v,xmis, Dobs), (ii) π(v|b, σ2,xmis, Dobs), (iii) π(σ2|v, Dobs),

(iv) π(xmis|b, v,α, Dobs), and (v) π(α|xmis, Dobs).

For (i), given v, xmis, and Dobs, the bl’s are conditionally independent with

π(bl|v,xmis, Dobs) ∝ exp

{k∑

i=1

ni∑j=1

[zijl(x

Tijbl + vi)− exp(xT

ijbl + vi)]}

.

It can be shown that π(bl|v,xmis, Dobs) is log-concave in bl and, hence, we can sample bl

via the adaptive rejection algorithm of Gilks and Wild (1992) for l = 1, . . . , 4. For (ii),

63

Page 64: EMPIRICAL AND HIERARCHICAL BAYESIAN METHODS WITH

given b, σ2, xmis, and Dobs, the vi’s are conditionally independent with

π(vi|b, σ2,xmis, Dobs) ∝ exp

{− v2

i

2σ2+

ni∑j=1

4∑

l=4

[zijl(x

Tijbl + vi)− exp(xT

ijbl + vi)]}

,

which is log-concave in vi. Again, we sample vi using the adaptive rejection algorithm of

Gilks and Wild (1992) for i = 1, 2, . . . , k. For (iii), we have

σ2 | v, Dobs ∼ Inverse Gamma(c + k, d +

k∑i=1

v2i

).

Thus, sampling σ2 is straightforward. For (iv), given b, v, α, Dobs, the xij,mis’s are

conditionally independent with

π(xij,mis|b,v,α, Dobs) ∝ exp

{4∑

l=4

[zijl(x

Tijbl + vi)− exp(xT

ijbl + vi)]}

f(xij|α).

Sampling xij,mis depends on the form of f(xij|α). For the data given in the next section,

xij,mis is a vector of discrete covariates and hence, it is easy to sample xij,mis from its

conditional posterior distribution. For (v), π(α|xmis, Dobs) ∝[∏k

i=1

∏ni

j=1 f(xij|α)]π(α).

For various covariate distributions specified through a series of smaller dimensional

conditional distributions, sampling α is straightforward.

4.3 Data Analysis

The motivating example for our study is to estimate jointly the proportion of

newborns with low birthweight and infant mortality rate at low levels of geography such

as districts within a state. The data from the infant mortality study was conducted by

National Center for Health Statistics (NCHS) involving birth cohorts from 1995 to 1999.

The original survey was designed to obtain reliable estimates at the state level. The same

data needs to be used to produce estimates at the district level. Other than the intercept

term, the covariates considered were (i) mother’s schooling categorized into less than

12 years, equal to 12 years, or greater than 12 years; (ii) mother’s age categorized into

less than 20 years, between 20 and 34 years, and greater than 34 years; (iii) the smoking

habit of the mother categorized into mother does not smoke and the average number of

64

Page 65: EMPIRICAL AND HIERARCHICAL BAYESIAN METHODS WITH

cigarettes smoked during pregnancy is at least 1. It turned out that there was missingness

in the response to both (i) and (iii). If one leaves out completely all the entries where such

missingness occurs, then one is left with only a few observations where one encounters

both low birthweight and infant mortality, and indeed the total number of such entries will

be disproportionately small in comparison with the remaining outcomes. Also, it is very

clear that both these covariates have direct impact on both low birthweight and infant

mortality so that the missingness cannot be treated as “missing completely at random”.

Instead, this missingness falls in the category of “missing at random” (MAR) (Little and

Rubin, 2002).

We use our method to estimate infant mortality rate and low birthweight rate

simultaneously over a period of 5 years for several small areas in one of the states (name

withheld for confidentiality). We constructed the small areas by cross-classifying the

different cities in the state with mother’s race, namely, Non-Hispanic White, Non-Hispanic

Black, Non-Hispanic American Indian, Non-Hispanic Asian/Pacific Islander and Hispanic.

We get 48 such small domains. For each small domain, we consider the NCHS full data

as constituting our population values, and we draw a 2% simple random sample from

this population of individuals. Thus the population low birthweight and infant mortality

rates are known for these small domains, and will serve as the “gold standard” for

comparison with the different estimates. In particular, we will find the HB estimates,

and compare the same with the direct estimates, namely the sample means. In our

analysis we consider the response variables and covariates as follows: yij1 = infant death

(1=yes, 0=no); yij2 = low birthweight (1=yes, 0=no); xij1 = 1 denoting the intercept;

(xij2, xij3) = (0, 0), (1, 0), or (0, 1) according as mother’s schooling is less than 12 years,

equal to 12 years, or is greater than 12 years; (xij4, xij5) = (0, 0), (1,0) or (0,1) if mother’s

age is less than 20 years, between 20 and 34 years, or is greater than 34 years; and

xij6 = 0 or 1 if mother does not smoke, or the average number of cigarettes smoked during

pregnancy is at least 1. Here y = (yij1, yij2)T is the response vector with the covariates

65

Page 66: EMPIRICAL AND HIERARCHICAL BAYESIAN METHODS WITH

xij = (xij1, xij2, xij3, xij4, xij5, xij6)T , j = 1, ..., ni, i = 1, 2, ..., k = 48. Covariates xij1,

xij4, xij5 are completely observed, while xij2, xij3 and xij6 have missing values. The models

considered for these three missing covariates are given as follows:

f(xij6|α6) =exp{xij6(α61 + α62xij4 + α63xij5)}1 + exp(α61 + α62xij4 + α63xij5)

,

where α6 = (α61, α62, α63)T ,

P ((xij2, xij3) = (1, 0)|α2,α3)

=exp(α21 + α22xij4 + α23xij5 + α24xij6)

1 + exp(α21 + α22xij4 + α23xij5 + α24xij6) + exp(α31 + α32xij4 + α33xij5 + α34xij6),

and

P ((xij2, xij3) = (0, 1)|α2,α3)

=exp(α31 + α32xij4 + α33xij5 + α34xij6)

1 + exp(α21 + α22xij4 + α23xij5 + α24xij6) + exp(α31 + α32xij4 + α33xij5 + α34xij6),

where αl = (αl1, αl2,αl3,αl4)T for l = 2, 3. We do not need to model xij1, xij4, xij5.

We assume improper uniform priors for α2, α3 and α6, i.e., π(α) ∝ 1, where α =

(αT2 , αT

3 ,αT6 )T .

To prove the propriety of the posterior under an uniform prior for α, we start with

the complete likelihood for the covariates: Let

L(α6,α2, α3|Dcomp) =k∏

i=1

ni∏j=1

[f(xij6|α6)

× exp{xij2(α21 + α22xij4 + α23xij5 + α24xij6)}+ xij3(α31 + α32xij4 + α33xij5 + α34xij6)}1 + exp(α21 + α22xij4 + α23xij5 + α24xij6) + exp(α31 + α32xij4 + α33xij5 + α34xij6)

],

where Dcomp = {(xij2, xij3, xij6, xij4, xij5) j = 1, . . . , ni, i = 1, . . . , k}.We consider the following four cases.

66

Page 67: EMPIRICAL AND HIERARCHICAL BAYESIAN METHODS WITH

Case 1: All (xij2, xij3, xij6) are observed.

L1(α6,α2,α3|Dobs,1) =∏

i,j: (xij2,xij3,xij6) are observed

[f(xij6|α6)

× exp{xij2(α21 + α22xij4 + α23xij5 + α24xij6)}+ xij3(α31 + α32xij4 + α33xij5 + α34xij6)}1 + exp(α21 + α22xij4 + α23xij5 + α24xij6) + exp(α31 + α32xij4 + α33xij5 + α34xij6)

],

where Dobs,1 = ((xij2, xij3, xij6, xij4, xij5) are observed 1 ≤ j ≤ ni, 1 ≤ i ≤ k}.Case 2: xij6 observed but (xij2, xij3) are missing. Let Dobs,2 = ((xij6, xij4, xij5) :

(xij6, xij4, xij5) are observed but (xij2, xij3) are missing 1 ≤ j ≤ ni, 1 ≤ i ≤ k}.Let

L2(α6,α2, α3|Dobs,2) =∏

i,j: xij6 is observed

(f(xij6|α6)×

[ ∑xij2,mis,xij3,mis

× exp{xij2(α21 + α22xij4 + α23xij5 + α24xij6)}+ xij3(α31 + α32xij4 + α33xij5 + α34xij6)}1 + exp(α21 + α22xij4 + α23xij5 + α24xij6) + exp(α31 + α32xij4 + α33xij5 + α34xij6)

])

=∏

i,j: xij6 is observed

f(xij6|α6).

Case 3: xij6 is missing but (xij2, xij3) are observed. In this case, we sum over all possible

values of xij6. Let Dobs,3 = ((xij2, xij3, xij4, xij5) : (xij2, xij3, xij4, xij5) are observed but xij6

is missing 1 ≤ j ≤ ni, 1 ≤ i ≤ k}. Similarly to Case 2, we have

L3(α6,α2,α3|Dobs,3) =∏

i,j: (xij2,xij3) are observed but xij6 missing

∑xij6,mis

[f(xij6|α6)

× exp{xij2(α21 + α22xij4 + α23xij5 + α24xij6)}+ xij3(α31 + α32xij4 + α33xij5 + α34xij6)}1 + exp(α21 + α22xij4 + α23xij5 + α24xij6) + exp(α31 + α32xij4 + α33xij5 + α34xij6)

].

Case 4: (xij2, xij3, xij6) all are missing. In this case, the observed likelihood after

summing over missing values is equal to 1.

67

Page 68: EMPIRICAL AND HIERARCHICAL BAYESIAN METHODS WITH

Let Dobs = (Dobs,1, Dobs,2, Dobs,3) and let mis denote the collection of all missing values.

Combining Cases 1 - 3 leads to

L(α6, α2,α3|Dobs) =∑

mis

L(α6,α2,α3|Dcomp)

=L1(α6,α2,α3|Dobs,1)L2(α6,α2, α3|Dobs,2)L3(α6, α2,α3|Dobs,3).

To start with, let uTij = (x1ij, x4ij, x5ij, x6ij)

T , u1ij = (uTij,0

T1×4)

T and u1ij =

(0T1×4, u

Tij)

T . Also let γ = (αT2 ,αT

3 )T such that γr = α2r,and, γ4+r = α3r, r = 1, · · · , 4.

Let us consider partition of the index set {(i, j) : (xij2, xij3) is observed} into three

parts: J1 = {(i, j) : x2ij = 1, x3ij = 0}, J2 = {(i, j) : x2ij = 0, x3ij = 1} and

J3 = {(i, j) : x2ij = 0, x3ij = 0}. Then the likelihood L(α2,α3, α6|Dobs) can be written as

L(α2, α3,α6|Dobs) =∏

i,j: xij6 is observed

f(xij6|α6)

(i,j)∈J1

∑xij6,mis

[f(xij6|α6)

exp (uT1ijγ)

1 + exp (uT1ijγ) + exp (uT

2ijγ)

]

(i,j)∈J2

∑xij6,mis

[f(xij6|α6)

exp (uT2ijγ)

1 + exp (uT1ijγ) + exp (uT

2ijγ)

]

×∏

(i,j)∈J3

∑xij6,mis

[f(xij6|α6)

1

1 + exp (uT1ijγ) + exp (uT

2ijγ)

]. (4–17)

We follow mainly the techniques of Chen and Shao (2000) and Chen, Ibrahim, and

Shao (2006) to prove the propriety of the posterior. First, for (i, j) ∈ Mc6 = {(i, j) :

x6ij is observed}, let x∗ij = (x1ij, x4ij, x5ij) and X∗ be the the n∗ × 3 design matrix with

rows xij, where n∗ is the cardinality of Mc6. Let z∗ij = 1 − 2x6ij, and X∗ be the matrix

with rows z∗ijxij, (i, j) ∈Mc6. Let us state the following conditions:

(D1) X is of full rank;(D2) there exists a positive vector a = (a1, · · · , an∗) ∈ Rn∗ , such that

X∗a = 0 (4–18)

68

Page 69: EMPIRICAL AND HIERARCHICAL BAYESIAN METHODS WITH

For (i, j) ∈ Mc23 = (i, j) : (x2ij, x3ij) is observed, let u∗1ij

T = (uTij,obs, x

∗T6ij,0

T ) where

x∗6ij is either equal to (1 − x2ij) or x2ij if (i, j) ∈ M∗6 = {(i, j) : x6ij is missing}; also

u∗2ijT = (0T , uT

ij,obs, x∗T6ij) where x∗6ij is either equal to (1 − x3ij) or x3ij if (i, j) ∈ M∗

6. Let

U ∗∗ = (−u∗1ij, (i, j) ∈ J1⋃

J3, (u∗2ij − u∗1ij), (i, j) ∈ J1, (u∗1ij − u∗2ij), (i, j) ∈ J2,−u∗2ij, (i, j) ∈J2

⋃J3). Let n∗∗ be the cardinality of Mc

23. Then let us state the following conditions:

(H1) U ∗∗ is of full rank;(H2) there exists a positive vector a = (a1, · · · , a2n∗∗) ∈ R2n∗∗ , such that

U ∗∗T a = 0. (4–19)

Now we state the following theorem:

Theorem 4-2. Suppose conditions (D1) and (D2), and, (H1) and (H2) hold, then we have

R3

R8

L(α2,α3,α6|Dobs)dα2dα3dα6 < ∞. (4–20)

For our analysis, we assume an Inverse Gamma(1,1) prior for σ2. Table 4-1

provides the sample sizes for all the 48 small domains, the population proportions of

low birthweight and infant mortality (P1,P2), the corresponding sample means (D1, D2),

and the corresponding HB estimates (HB1, HB2).

69

Page 70: EMPIRICAL AND HIERARCHICAL BAYESIAN METHODS WITH

Table 4-1. Bivariate HB estimates along with the Direct estimates and the True values

area ni P1 P2 D1 D2 HB1 HB2

1 695 0.007 0.068 0.004 0.062 0.006 0.0802 3 0.015 0.099 0.000 0.000 0.007 0.0823 41 0.006 0.061 0.000 0.122 0.011 0.1084 1477 0.006 0.068 0.006 0.071 0.006 0.0755 425 0.014 0.131 0.024 0.150 0.006 0.0776 345 0.006 0.067 0.008 0.064 0.006 0.0827 510 0.005 0.067 0.004 0.061 0.009 0.1028 888 0.009 0.116 0.008 0.114 0.009 0.1169 44 0.004 0.081 0.000 0.091 0.005 0.07310 1718 0.004 0.063 0.002 0.058 0.007 0.08311 379 0.006 0.068 0.005 0.082 0.005 0.07512 126 0.013 0.130 0.03 0.135 0.007 0.08213 5 0.005 0.068 0.000 0.200 0.009 0.09114 15 0.003 0.077 0.000 0.067 0.006 0.07615 13 0.003 0.070 0.000 0.000 0.007 0.08316 975 0.005 0.072 0.007 0.060 0.010 0.10717 349 0.011 0.135 0.017 0.149 0.007 0.08318 4 0.005 0.112 0.000 0.500 0.007 0.08019 58 0.003 0.074 0.000 0.069 0.005 0.07820 288 0.005 0.082 0.003 0.090 0.007 0.08221 27 0.002 0.078 0.000 0.111 0.007 0.07922 237 0.004 0.067 0.000 0.076 0.006 0.07623 987 0.005 0.068 0.004 0.078 0.006 0.07924 6 0.019 0.052 0.000 0.000 0.005 0.07825 50 0.008 0.084 0.000 0.040 0.008 0.08426 365 0.004 0.059 0.003 0.088 0.004 0.07227 453 0.008 0.064 0.011 0.077 0.005 0.07328 78 0.019 0.120 0.012 0.128 0.005 0.07829 4 0.016 0.093 0.000 0.000 0.007 0.08130 15 0.004 0.076 0.000 0.133 0.005 0.07231 19 0.007 0.064 0.000 0.158 0.007 0.08632 446 0.008 0.071 0.002 0.074 0.005 0.07333 76 0.015 0.133 0.000 0.145 0.005 0.06934 8 0.008 0.085 0.000 0.125 0.007 0.08635 22 0.006 0.056 0.045 0.136 0.009 0.09336 290 0.014 0.133 0.014 0.131 0.005 0.07337 3 0.009 0.074 0.000 0.000 0.008 0.08638 18 0.006 0.061 0.000 0.111 0.006 0.08239 933 0.005 0.066 0.004 0.062 0.007 0.08440 207 0.012 0.119 0.019 0.126 0.006 0.08241 4 0.000 0.084 0.000 0.000 0.008 0.08542 21 0.008 0.084 0.000 0.048 0.007 0.08943 1472 0.007 0.070 0.007 0.071 0.008 0.08944 27 0.004 0.073 0.000 0.037 0.005 0.07645 275 0.006 0.067 0.003 0.069 0.005 0.07246 1131 0.006 0.065 0.008 0.060 0.008 0.08547 191 0.014 0.125 0.016 0.126 0.008 0.08748 303 0.006 0.060 0.003 0.066 0.008 0.085

70

Page 71: EMPIRICAL AND HIERARCHICAL BAYESIAN METHODS WITH

Based on the figures in Table 4-1, we calculate the following summary measures

to examine the performance of the estimators. Let ti be the generic symbol for a true

proportion in the ith small area and ei be the corresponding generic symbol for the

estimate. Then we define the Average Relative Deviation (ARD) as

ARD = m−1

m∑i=1

|ei − ti|/ti. (4–21)

The Average Relative Mean Squared Error (ARMSE) is given by

ARMSE = m−1

m∑i=1

(ei − ti)2/t2i . (4–22)

The Average Deviation (AD) is denoted by

AD = m−1

m∑i=1

|ei − ti|, (4–23)

and, the Average MSE is given by

AMSE = m−1

m∑i=1

(ei − ti)2. (4–24)

Table 4-2 summarizes the performance of the HB estimates, and also of the direct

estimates. It follows from the summary table Table 4-2 that the HB estimates outperform

the direct estimates according to all the four criteria. For the HB estimates, ARD,

ARMSE, AD and AMSE improvement over the direct estimates are 39.39%, 74.88%,

37.93% and 71.43% respectively, for estimating infant mortality rate. For low birthweight

proportions, corresponding improvements are 46.59%, 85.09%, 43.05% and 86% respectively.

Table 4-2. Measures of precision

D1 D2 HB1 HB2

ARD 0.7973 0.464 0.4832 0.2478ARMSE 1.6525 0.6244 0.4151 0.0931AD 0.0058 0.0367 0.0036 0.0209AMSE 0.00007 0.005 0.00002 0.0007

71

Page 72: EMPIRICAL AND HIERARCHICAL BAYESIAN METHODS WITH

Table 4-3 provides the posterior standard deviations of the HB estimates denoted by

SE1 and SE2 respectively, and the posterior correlation denoted by CORR. In addition, we

provide 95% highest posterior density (HPD) intervals associated with the HB estimates.

These HPD intervals are denoted by I1 and I2.

We used the posterior predictive p-values of Gelman et al. (2003, Ch. 6) for assessing

the model fit. The posterior predictive p-value for the ith area is defined as

ppi = Pr(T (yrep,θi) ≥ T(y, θi)|Dobs),

where yrep denotes the replicated data that could have been observed, and

T (y,θi) = (θi − Eyθi)TΣ−1

y (θi − Eyθi),

θi = (θi1, θi2)T , and Ey[θi] and Σy are the posterior mean and variance-covariance

matrix of θi based on the observed data y for i = 1, 2, . . . , k = 48. Here the p-value

is estimated by calculating the proportion of cases in which the simulated discrepancy

variable T (yrep,θi) exceeds the realized value T (y,θi):

p-valuei =1

N

N∑i=1

I(T (yrep,θi) ≥ T(y,θi)), i = 1, · · · ,48,

N being the number of replications of the data. The posterior predictive p-values are also

shown in Table 4-3.

If the model is reasonably accurate, the replications should look similar to the

observed data y. In the case where the data y in in conflict with the posited model,

T (y,θi))’s are expected to have extreme values, which in turn should give p-values close to

0 or 1. Hence, an ideal fit requires the posterior predictive p-values to be in the ballpark of

0.5. The figures provided in Table 3 show that this is indeed the case for our fitted model.

72

Page 73: EMPIRICAL AND HIERARCHICAL BAYESIAN METHODS WITH

Table 4-3. Posterior variance, correlation and 95% HPDs along with predictive p-values

I1 I2area SE1 SE2 CORR p-value

1 0.001 0.002 0.115 0.005 0.007 0.076 0.085 0.5362 0.001 0.002 0.151 0.005 0.008 0.077 0.087 0.5443 0.002 0.006 0.181 0.007 0.015 0.097 0.119 0.5564 0.001 0.002 0.124 0.004 0.007 0.071 0.080 0.5205 0.001 0.002 0.121 0.005 0.007 0.072 0.081 0.5726 0.001 0.002 0.099 0.005 0.007 0.077 0.086 0.5527 0.002 0.005 0.153 0.006 0.013 0.092 0.112 0.4968 0.009 0.130 0.149 0.008 0.102 0.005 0.073 0.5489 0.001 0.002 0.168 0.004 0.007 0.069 0.078 0.53610 0.001 0.003 0.106 0.006 0.009 0.078 0.088 0.58811 0.001 0.002 0.170 0.004 0.006 0.071 0.08 0.54412 0.001 0.003 0.170 0.006 0.009 0.077 0.087 0.52413 0.001 0.004 0.152 0.006 0.011 0.084 0.099 0.55214 0.001 0.002 0.151 0.005 0.007 0.072 0.081 0.53615 0.001 0.003 0.144 0.005 0.009 0.078 0.089 0.54016 0.002 0.005 0.178 0.007 0.013 0.098 0.117 0.49617 0.001 0.002 0.162 0.006 0.009 0.078 0.088 0.47218 0.001 0.003 0.105 0.005 0.008 0.075 0.085 0.59619 0.001 0.002 0.154 0.004 0.007 0.074 0.083 0.50820 0.001 0.002 0.174 0.006 0.009 0.078 0.087 0.47621 0.001 0.004 0.126 0.004 0.009 0.071 0.087 0.55622 0.001 0.002 0.142 0.004 0.007 0.072 0.081 0.52423 0.001 0.002 0.097 0.005 0.008 0.075 0.084 0.61224 0.001 0.002 0.181 0.004 0.007 0.073 0.083 0.54025 0.001 0.003 0.072 0.006 0.009 0.079 0.089 0.57626 0.001 0.004 0.104 0.002 0.005 0.064 0.079 0.59627 0.001 0.002 0.102 0.004 0.006 0.069 0.078 0.60428 0.001 0.002 0.157 0.004 0.007 0.073 0.083 0.56029 0.001 0.002 0.132 0.005 0.008 0.076 0.085 0.60430 0.001 0.003 0.106 0.004 0.006 0.067 0.077 0.49231 0.001 0.003 0.201 0.005 0.008 0.081 0.091 0.60432 0.001 0.003 0.172 0.004 0.007 0.067 0.078 0.54033 0.001 0.002 0.148 0.004 0.007 0.064 0.074 0.49634 0.001 0.003 0.106 0.005 0.008 0.081 0.091 0.49235 0.002 0.004 0.163 0.007 0.012 0.085 0.101 0.58836 0.001 0.003 0.08 0.004 0.006 0.068 0.079 0.54437 0.001 0.003 0.191 0.006 0.011 0.079 0.092 0.54438 0.001 0.002 0.088 0.005 0.007 0.077 0.086 0.53239 0.001 0.003 0.146 0.006 0.009 0.079 0.089 0.55640 0.001 0.004 0.14 0.004 0.008 0.075 0.089 0.53641 0.001 0.004 0.152 0.005 0.010 0.078 0.093 0.53642 0.001 0.003 0.102 0.006 0.009 0.084 0.094 0.48843 0.001 0.003 0.156 0.006 0.010 0.083 0.094 0.53644 0.001 0.004 0.052 0.003 0.007 0.069 0.084 0.54445 0.001 0.002 0.105 0.004 0.006 0.067 0.076 0.56046 0.001 0.003 0.179 0.006 0.010 0.079 0.091 0.55247 0.001 0.003 0.164 0.006 0.009 0.081 0.092 0.50848 0.001 0.003 0.241 0.006 0.010 0.079 0.091 0.564

73

Page 74: EMPIRICAL AND HIERARCHICAL BAYESIAN METHODS WITH

CHAPTER 5SUMMARY AND FUTURE RESEARCH

In this dissertation, we have developed some new robust Bayes and empirical Bayes

estimation procedures in the context of small area estimation under a normal model. It

turns out that the proposed EB estimators perform quite well in comparison with the

regular EB estimators under a contaminated normal model, both in terms of the bias

and the mean squared error, especially for the extreme small area parameters. A closed

form expression of the MSE can not be obtained under this complex situation. We have

developed second-order correct MSE estimators for the robust estimators.

The other part of this dissertation is small area estimation when the response is

bivariate binary in the presence of covariates. We have first developed an EB estimation

procedure followed by a hierarchical Bayes procedure. In the latter case, we have

covariates which are missing at random. We have shown that both the methods perform

much better than the direct estimators through a simulation study in the former case and

real data analysis in the latter situation.

In this dissertation, we have considered only area level small area estimation models

for the case of robust estimation of parameters. It is worthwhile to extend the proposed

approach to unit level small area estimation models as considered for example in Battese

et al. (1988) A more challenging task is to extend these ideas to non-normal models,

especially for some of the discrete distributions involving binary, count or categorical data.

The posterior of the regression coefficient in such instances does not come out in a closed

form. A more managable alternative seems to involve a prediction problem where one

decides to look at the predictive distribution of a future observation, and use the same to

derive the influence of a given observation.

A natural extension of the first part of this dissertation would be to develop

similar robust procedures in the case of general linear mixed models. As noted in

the introduction, the normal model considered in the dissertation and also the other

74

Page 75: EMPIRICAL AND HIERARCHICAL BAYESIAN METHODS WITH

popular small area models are special cases of the general linear mixed models. It will be

worthwhile to develop robust methods in this more general case.

Another future undertaking will be extending the results for the bivariate binary

case in the multivariate context. Also we have developed EB techniques when there are

no missing covariates. It will be interesting to extend these results when some of the

covariates are missing.

75

Page 76: EMPIRICAL AND HIERARCHICAL BAYESIAN METHODS WITH

APPENDIX APROOF OF THEOREM 2-3

We start with the identity

E(θi

REB − θi)2 = E(θB

i − θi)2 + E(θREB

i − θi

B)2

= Vi(1−Bi) + E(θREBi − θi

B)2. (A–1)

Next we write

E(θREBi − θi

B)2 = E(θREB

i − θi

B+ θREB

i − θiREB

)2

= E(θREBi − θi

B)2 + E(θi

REB − θREBi )2

+2E(θREBi − θi

B)(θi

REB − θREBi ). (A–2)

Now observe that

θi

REB − θREBi

= BiDiψ

(yi − xT

i b

Di

)− BiDiψ

(yi − xT

i b

Di

)

= BiDi[yi − xT

i b

Di

+ (K − yi − xTi b

Di

)I[yi−xT

i

˜bDi

>K]

− (K +yi − xT

i b

Di

)I[yi−xT

i

˜bDi

<−K]

]

−BiDi[yi − xT

i b

Di

+ (K − yi − xTi b

Di

)I[yi−xT

i

ˆbDi

>K]

− (K +yi − xT

i b

Di

)I[yi−xT

i

ˆbDi

<−K]

]

= Bi(yi − xTi b)− Bi(yi − xT

i b)−Bi{(yi − xTi b−KDi)I

[yi−xTi

˜b>KDi]

+(yi − xTi b + KDi)I

[yi−xTi

˜b<−KDi]}+ Bi{(yi − xT

i b−KDi)I[yi−xT

iˆb>KDi]

+(yi − xTi b + KDi)I

[yi−xTi

ˆb<−KDi]}. (A–3)

We first show that xTi (b − b) = Op(m

−1). To this end, we begin with the one-step Taylor

expansion,

b.= b + (A− A)

∂b

∂A= b + (XTΣ−1X)−1XTΣ−1(y −Xb) + (A− A)

∂b

∂A. (A–4)

76

Page 77: EMPIRICAL AND HIERARCHICAL BAYESIAN METHODS WITH

But

∂b

∂A= −[(XTΣ−1X)−1(XTΣ−2X)(XTΣ−1X)−1XTΣ−1(y −Xb)

+ (XTΣ−1X)−1XTΣ−2(y −Xb)]

(A–5)

so that E[∂˜b

∂A] = 0, and after some simplifications,

V (∂b

∂A) = (XTΣ−1X)−1[3(XTΣ−2X)(XTΣ−1X)−1(XTΣ−2X)

+ XTΣ−3X](XTΣ−1X)−1.

(A–6)

By assumption (i), (XTΣ−1X)−1 ≥ (V ∗ + A)−1(XT X)−1. Also, XTΣ−2X ≤ A−2XT X

and XTΣ−3X ≤ A−3XT X. Hence from Eq. A–6,

V (∂b

∂A) ≤ (V ∗ + A)−2{A−3 + 3A−4(V ∗ + A)−1}(XT X)−1.

Now, noting that E(A− A)2 = O(m−1),

xiT E[(b− b)(b− b)T ]xi

.= xi

T E[(A− A)2(∂b

∂A)(

∂b

∂A)T ]xi

≤ C{E[(A− A)4]}1/2{E[(xiT (

∂b

∂A)(

∂b

∂A)T xi)

4]}1/2

= C{E[(A− A)4]}1/2{[(xiT V(

∂b

∂A)xi)

2]}1/2

≤ C{E[(A− A)4]}1/2(xiT (XT X)−1xi)

= O(m−2), (A–7)

where C(> 0) is a constant. This implies that xTi (b− b) = Op(m

−1). Accordingly, since

|I[|yi−xT

iˆb|>KDi]

− I[|yi−xT

i˜b|>KDi]

| ≤ I[|xT

i (ˆb−˜b)|>K|Di−Di|]

, (A–8)

where |Di −Di| = Op(m− 1

2 ) and xTi (b− b) = Op(m

−1), it follows from Eq. A–8 that

P (mr|I[|yi−xT

iˆb|>KDi]

− I[|yi−xT

i˜b|>KDi]

| 6= 0) → 0 (A–9)

77

Page 78: EMPIRICAL AND HIERARCHICAL BAYESIAN METHODS WITH

for arbitrarily large r(> 0) as m →∞. Moreover,

Bi(yi − xTi b)− Bi(yi − xT

i b) = BixTi (b− b)− (Bi −Bi)(yi − xT

i b)

= −(Bi −Bi)(yi − xTi b) + Op(m

−1). (A–10)

θi

REB − θREBi

= −(Bi −Bi)(yi − xTi b) + Bi[{(yi − xT

i b−KDi)− (yi − xTi b−KDi)}I

[yi−xTi

˜b>KDi]

+{(yi − xTi b + KDi)− (yi − xT

i b + KDi)}I[yi−xT

i˜b<−KDi]

]

+(Bi −Bi)[(yi − xTi b−KDi)I

[yi−xTi

˜b>KDi]+ (yi − xT

i b + KDi)I[yi−xT

i˜b<−KDi]

]

+Op(m−1)

= −(Bi −Bi)(yi − xTi b)I

[|yi−xTi

˜b|≤KDi]

−K{Bi(Di −Di) + Di(Bi −Bi)}(I[yi−xT

i˜b>KDi]

− I[yi−xT

i˜b<−KDi]

) + Op(m−1).

(A–11)

Hence,

E(θi

REB − θREBi )2 = E[(Bi −Bi)

2(yi − xTi b)2I

[|yi−xTi

˜b|≤KDi]]

+K2E[{Bi(Di −Di) + Di(Bi −Bi)}2(I[|yi−xT

i˜b|>KDi]

](A–12)

Now,

Bi −Bi = Vi(Vi + A)−1(Vi + A)−1(A− A)

= Vi(Vi + A)−2(A− A)(1− A− A

Vi + A)−1

= Vi(Vi + A)−2(A− A)(1 + Op(m− 1

2 )).

Hence,

E[(Bi −Bi)2(yi − xT

i b)2I[|yi−xT

i˜b|≤KDi]

]

= V 2i (Vi + A)−4E[(A− A)2(yi − xT

i b)2I[|yi−xT

i˜b|≤KDi]

(1 + Op(m− 1

2 ))]. (A–13)

78

Page 79: EMPIRICAL AND HIERARCHICAL BAYESIAN METHODS WITH

We use the approximation

A− A.= [

m∑j=1

(Vj + A)−2]−1

m∑j=1

(Vj + A)−2{(yj − xTj b)2 − (Vj + A)}, (A–14)

and write

(yi − xTi b)2 = [(yi − xT

i b)− xTi (b− b)]2

= (yi − xTi b)2 + Op(m

− 12 ). (A–15)

Also, I[|yi−xT

i˜b|≤KDi]

= I[|yi−xT

i b|≤KDi]+ op(m

−r) for arbitrarily large r > 0. Since

(A− A)2 = Op(m−1), we now get

E[(A− A)2(yi − xTi b)2I

[|yi−xTi

˜b|≤KDi]] = E[(A− A)2(yi − xT

i b)2I[|yi−xT

i˜b|≤KDi]

] + o(m−1).

(A–16)

Next, by the mutual independence of the y_i - x_i^T b\ (i = 1, \ldots, m) and assumption (i) of Theorem 2-3,

\[
\begin{aligned}
&\mathrm{Cov}\big[(\hat A - A)^2,\ (y_i - x_i^T b)^2 I_{[|y_i - x_i^T b| \le K D_i]}\big] \\
&\quad \doteq \Big[\sum_{j=1}^m (V_j + A)^{-2}\Big]^{-2}\mathrm{Cov}\Big[\Big\{\sum_{j=1}^m (V_j + A)^{-2}\big\{(y_j - x_j^T b)^2 - (V_j + A)\big\}\Big\}^2,\ (y_i - x_i^T b)^2 I_{[|y_i - x_i^T b| \le K D_i]}\Big] \\
&\quad = \Big[\sum_{j=1}^m (V_j + A)^{-2}\Big]^{-2}\mathrm{Cov}\Big[(V_i + A)^{-4}\big\{(y_i - x_i^T b)^2 - (V_i + A)\big\}^2,\ (y_i - x_i^T b)^2 I_{[|y_i - x_i^T b| \le K D_i]}\Big] \\
&\quad = O(m^{-2}).
\end{aligned}
\]


Hence from Eq. A–16, and the fact that D_i^2 = V_i + A + O(m^{-1}),

\[
\begin{aligned}
E\big[(\hat A - A)^2(y_i - x_i^T\tilde b)^2 I_{[|y_i - x_i^T\tilde b| \le K D_i]}\big]
&= E(\hat A - A)^2\, E\big[(y_i - x_i^T b)^2 I_{[|y_i - x_i^T b| \le K(V_i + A)^{1/2}]}\big] + o(m^{-1}) \\
&= 2\Big[\sum_{j=1}^m (V_j + A)^{-2}\Big]^{-1}(V_i + A)E\big[Z^2 I_{[|Z| \le K]}\big] + o(m^{-1}) \\
&= 2\Big[\sum_{j=1}^m (V_j + A)^{-2}\Big]^{-1}(V_i + A)\big[2\Phi(K) - 1 - 2K\phi(K)\big] + o(m^{-1}). \qquad (A–17)
\end{aligned}
\]
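The normal-moment identity E[Z²I(|Z| ≤ K)] = 2Φ(K) − 1 − 2Kφ(K) used in Eq. A–17 follows from integration by parts; a short quadrature check (our addition, with a few illustrative values of K) is:

```python
# Quadrature check (ours) of the identity used in Eq. A-17:
#   E[Z^2 I(|Z| <= K)] = 2*Phi(K) - 1 - 2*K*phi(K),  Z ~ N(0, 1).
from scipy.integrate import quad
from scipy.stats import norm

for K in (0.5, 1.0, 1.645, 2.0):
    lhs, _ = quad(lambda z: z**2 * norm.pdf(z), -K, K)
    rhs = 2 * norm.cdf(K) - 1 - 2 * K * norm.pdf(K)
    print(K, lhs, rhs)   # the last two columns agree
```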

From Eq. A–13 and Eq. A–17,

\[
E\big[(\hat B_i - B_i)^2(y_i - x_i^T\tilde b)^2 I_{[|y_i - x_i^T\tilde b| \le K D_i]}\big]
= 2B_i^2 D_i^2(1 - B_i)^2\Big[\sum_{j=1}^m (1 - B_j)^2\Big]^{-1}\big[2\Phi(K) - 1 - 2K\phi(K)\big] + o(m^{-1}). \qquad (A–18)
\]


Next we calculate

\[
\begin{aligned}
\hat B_i(\hat D_i - D_i)
&= (V_i + A)^{1/2}\{B_i + (\hat B_i - B_i)\}\Bigg[\Big\{1 + \frac{\hat A - A}{V_i + A} - \frac{x_i^T}{V_i + A}\Big(\sum_{j=1}^m (V_j + \hat A)^{-1}x_j x_j^T\Big)^{-1}x_i\Big\}^{1/2} \\
&\qquad - \Big\{1 - \frac{x_i^T}{V_i + A}\Big(\sum_{j=1}^m (V_j + A)^{-1}x_j x_j^T\Big)^{-1}x_i\Big\}^{1/2}\Bigg] \\
&= (V_i + A)^{1/2}\{B_i + (\hat B_i - B_i)\}\Bigg[1 + \frac{1}{2}\frac{\hat A - A}{V_i + A} - \frac{1}{8}\frac{(\hat A - A)^2}{(V_i + A)^2} - \frac{1}{2}\frac{x_i^T}{V_i + A}\Big(\sum_{j=1}^m (V_j + \hat A)^{-1}x_j x_j^T\Big)^{-1}x_i \\
&\qquad - 1 + \frac{1}{2}\frac{x_i^T}{V_i + A}\Big(\sum_{j=1}^m (V_j + A)^{-1}x_j x_j^T\Big)^{-1}x_i + o_p(m^{-1})\Bigg] \\
&= (V_i + A)^{1/2}\{B_i + (\hat B_i - B_i)\}\Big[\frac{1}{2}\frac{\hat A - A}{V_i + A} - \frac{1}{8}\frac{(\hat A - A)^2}{(V_i + A)^2} + o_p(m^{-1})\Big] \\
&= (V_i + A)^{1/2}B_i\,\frac{1}{2}\frac{\hat A - A}{V_i + A} - (V_i + A)^{1/2}B_i\,\frac{1}{8}\frac{(\hat A - A)^2}{(V_i + A)^2} - (V_i + A)^{1/2}\frac{V_i(\hat A - A)}{(V_i + \hat A)(V_i + A)}\cdot\frac{1}{2}\frac{\hat A - A}{V_i + A} + o_p(m^{-1}) \\
&= \frac{B_i}{2}\frac{\hat A - A}{(V_i + A)^{1/2}} - \frac{B_i}{8}\frac{(\hat A - A)^2}{(V_i + A)^{3/2}} - \frac{B_i}{2}\frac{(\hat A - A)^2}{(V_i + A)^{3/2}} + o_p(m^{-1}) \\
&= \frac{B_i}{2}\frac{\hat A - A}{(V_i + A)^{1/2}} - \frac{5B_i}{8}\frac{(\hat A - A)^2}{(V_i + A)^{3/2}} + o_p(m^{-1}),
\end{aligned}
\]

and

\[
\begin{aligned}
D_i(\hat B_i - B_i)
&= \big[V_i + A + O(m^{-1})\big]^{1/2}\,\frac{V_i(A - \hat A)}{(V_i + \hat A)(V_i + A)} \\
&= \frac{V_i(A - \hat A)}{(V_i + A)^{3/2}}\Big(1 + \frac{\hat A - A}{V_i + A}\Big)^{-1} \\
&= \frac{V_i(A - \hat A)}{(V_i + A)^{3/2}}\Big[1 - \frac{\hat A - A}{V_i + A} + O_p(m^{-1})\Big] \\
&= V_i\Big[-\frac{\hat A - A}{(V_i + A)^{3/2}} + \frac{(\hat A - A)^2}{(V_i + A)^{5/2}} + o_p(m^{-1})\Big] \\
&= -B_i\frac{\hat A - A}{(V_i + A)^{1/2}} + B_i\frac{(\hat A - A)^2}{(V_i + A)^{3/2}} + o_p(m^{-1}).
\end{aligned}
\]


Thus,

\[
B_i(\hat D_i - D_i) + D_i(\hat B_i - B_i) = -\frac{B_i}{2}\frac{\hat A - A}{(V_i + A)^{1/2}} + \frac{3B_i}{8}\frac{(\hat A - A)^2}{(V_i + A)^{3/2}} + o_p(m^{-1}), \qquad (A–19)
\]

so that

\[
\begin{aligned}
&E\big[\{B_i(\hat D_i - D_i) + D_i(\hat B_i - B_i)\}^2 I_{[|y_i - x_i^T\tilde b| > K D_i]}\big] \\
&\quad = \frac{1}{4}B_i^2(V_i + A)^{-1}E\{(\hat A - A)^2\}E\{I_{[|y_i - x_i^T\tilde b| > K D_i]}\} + o(m^{-1}) \\
&\quad = B_i^2(V_i + A)^{-1}\Big[\sum_{j=1}^m (V_j + A)^{-2}\Big]^{-1}\Phi(-K) + o(m^{-1}) \\
&\quad = B_i^2(V_i + A)(1 - B_i)^2\Big[\sum_{j=1}^m (1 - B_j)^2\Big]^{-1}\Phi(-K) + o(m^{-1}) \\
&\quad = B_i^2 D_i^2(1 - B_i)^2\Big[\sum_{j=1}^m (1 - B_j)^2\Big]^{-1}\Phi(-K) + o(m^{-1}). \qquad (A–20)
\end{aligned}
\]

Combining Eq. A–18 and Eq. A–20, it follows that

\[
\begin{aligned}
E(\hat\theta_i^{REB} - \tilde\theta_i^{REB})^2
&= B_i^2 D_i^2(1 - B_i)^2\Big[\sum_{j=1}^m (1 - B_j)^2\Big]^{-1}\big[4\Phi(K) - 2 - 4K\phi(K) + K^2(1 - \Phi(K))\big] + o(m^{-1}) \\
&= B_i^2 D_i^2(1 - B_i)^2\Big[\sum_{j=1}^m (1 - B_j)^2\Big]^{-1}\big[(K^2 - 2) - 4K\phi(K) - (K^2 - 4)\Phi(K)\big] + o(m^{-1}). \qquad (A–21)
\end{aligned}
\]
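The algebra between the two bracketed forms in Eq. A–21 is easy to confirm numerically (our check, over an arbitrary grid of K values):

```python
# Numerical confirmation (ours) that the two bracketed forms in Eq. A-21 coincide:
#   4 Phi(K) - 2 - 4 K phi(K) + K^2 (1 - Phi(K))  ==  (K^2 - 2) - 4 K phi(K) - (K^2 - 4) Phi(K).
import numpy as np
from scipy.stats import norm

K = np.linspace(0.1, 4.0, 40)
form1 = 4 * norm.cdf(K) - 2 - 4 * K * norm.pdf(K) + K**2 * (1 - norm.cdf(K))
form2 = (K**2 - 2) - 4 * K * norm.pdf(K) - (K**2 - 4) * norm.cdf(K)
print(np.max(np.abs(form1 - form2)))   # ~1e-15, i.e. equal up to rounding
```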

Finally, by Eq. 2-13 and Eq. A–19 we get

\[
\begin{aligned}
&(\tilde\theta_i^{REB} - \theta_i^{B})(\hat\theta_i^{REB} - \tilde\theta_i^{REB}) \\
&\quad = -B_i\big\{x_i^T(\tilde b - b) + (y_i - x_i^T\tilde b - K D_i) I_{[y_i - x_i^T\tilde b > K D_i]} + (y_i - x_i^T\tilde b + K D_i) I_{[y_i - x_i^T\tilde b < -K D_i]}\big\} \\
&\qquad \times\Big[(\hat B_i - B_i)(y_i - x_i^T\tilde b) I_{[|y_i - x_i^T\tilde b| \le K D_i]} - \frac{K B_i}{2}(\hat A - A)(V_i + A)^{-1/2}\Big(1 - \frac{3(\hat A - A)}{4(V_i + A)} + o_p(m^{-1/2})\Big) \\
&\qquad\quad \times\big(I_{[y_i - x_i^T\tilde b > K D_i]} - I_{[y_i - x_i^T\tilde b < -K D_i]}\big)\Big] + O_p(m^{-1}). \qquad (A–22)
\end{aligned}
\]


Now by the independence of \tilde b with (\hat A, y_i - x_i^T\tilde b), we get

\[
\begin{aligned}
&(\tilde\theta_i^{REB} - \theta_i^{B})(\hat\theta_i^{REB} - \tilde\theta_i^{REB}) \\
&\quad = \frac{K B_i^2}{2}\Big[\Big\{(\hat A - A)(V_i + A)^{-1/2} - \frac{3(\hat A - A)^2}{4(V_i + A)^{3/2}}\Big\} \\
&\qquad \times\big\{(y_i - x_i^T\tilde b - K D_i) I_{[y_i - x_i^T\tilde b > K D_i]} - (y_i - x_i^T\tilde b + K D_i) I_{[y_i - x_i^T\tilde b < -K D_i]}\big\}\Big] + o_p(m^{-1}). \qquad (A–23)
\end{aligned}
\]

With the approximation given in Eq. A–14, and the independence of \tilde b with y_i - x_i^T\tilde b, we get

\[
\begin{aligned}
&E\Big[(\hat A - A)(V_i + A)^{-1/2}\big\{(y_i - x_i^T\tilde b - K D_i) I_{[y_i - x_i^T\tilde b > K D_i]} - (y_i - x_i^T\tilde b + K D_i) I_{[y_i - x_i^T\tilde b < -K D_i]}\big\}\Big] \\
&= E\Big[(\hat A - A)(V_i + A)^{-1/2}\big\{\big((y_i - x_i^T b - K D_i) - x_i^T(\tilde b - b)\big) I_{[y_i - x_i^T\tilde b > K D_i]} \\
&\qquad - \big((y_i - x_i^T b + K D_i) - x_i^T(\tilde b - b)\big) I_{[y_i - x_i^T\tilde b < -K D_i]}\big\}\Big] \\
&= E\Big[(\hat A - A)(V_i + A)^{-1/2}\big\{(y_i - x_i^T b - K D_i)\big(I_{[y_i - x_i^T b > K D_i]} + o_p(m^{-1/2})\big) \\
&\qquad - (y_i - x_i^T b + K D_i)\big(I_{[y_i - x_i^T b < -K D_i]} + o_p(m^{-1/2})\big)\big\}\Big] \\
&= E\Big[\Big\{\sum_{j=1}^m (V_j + A)^{-2}\Big\}^{-1}\Big\{\sum_{j=1}^m (V_j + A)^{-2}\big((y_j - x_j^T b)^2 - (V_j + A)\big)\Big\}(V_i + A)^{-1/2} \\
&\qquad \times\big\{(y_i - x_i^T b - K D_i) I_{[y_i - x_i^T b > K D_i]} - (y_i - x_i^T b + K D_i) I_{[y_i - x_i^T b < -K D_i]}\big\}\Big] + o(m^{-1}) \\
&= \Big\{\sum_{j=1}^m (V_j + A)^{-2}\Big\}^{-1}(V_i + A)^{-1}E\Big[(V_i + A)^{-3/2}\big((y_i - x_i^T b)^2 - (V_i + A)\big) \\
&\qquad \times\big\{(y_i - x_i^T b - K(V_i + A)^{1/2}) I_{[y_i - x_i^T b > K(V_i + A)^{1/2}]} - (y_i - x_i^T b + K(V_i + A)^{1/2}) I_{[y_i - x_i^T b < -K(V_i + A)^{1/2}]}\big\}\Big] + o(m^{-1}) \\
&= \Big\{\sum_{j=1}^m (V_j + A)^{-2}\Big\}^{-1}(V_i + A)^{-1}E\big[(Z^2 - 1)(Z - K) I_{[Z > K]} - (Z^2 - 1)(Z + K) I_{[Z < -K]}\big] + o(m^{-1}), \qquad (A–24)
\end{aligned}
\]


where Z ~ N(0,1). Since Z \stackrel{d}{=} -Z,

\[
\begin{aligned}
&E\big[(Z^2 - 1)(Z - K) I_{[Z > K]} - (Z^2 - 1)(Z + K) I_{[Z < -K]}\big] \\
&\quad = 2E\big[(Z^2 - 1)(Z - K) I_{[Z > K]}\big] \\
&\quad = 2\int_K^\infty (z^2 - 1)(z - K)\phi(z)\,dz \\
&\quad = 2\Big[\int_K^\infty z^2(-\phi'(z))\,dz - K\int_K^\infty z(-\phi'(z))\,dz - \int_K^\infty (-\phi'(z))\,dz + K\int_K^\infty \phi(z)\,dz\Big] \\
&\quad = 2\big[(K^2 + 2)\phi(K) - (K^2\phi(K) + K\Phi(-K)) - \phi(K) + K\Phi(-K)\big] \\
&\quad = 2\phi(K). \qquad (A–25)
\end{aligned}
\]

Similarly,

\[
\begin{aligned}
&E\Big[(\hat A - A)^2(V_i + A)^{-3/2}\big\{(y_i - x_i^T\tilde b - K D_i) I_{[y_i - x_i^T\tilde b > K D_i]} - (y_i - x_i^T\tilde b + K D_i) I_{[y_i - x_i^T\tilde b < -K D_i]}\big\}\Big] \\
&\quad = 2\Big\{\sum_{j=1}^m (V_j + A)^{-2}\Big\}^{-1}(V_i + A)^{-1}E\big[(Z - K) I_{[Z > K]} - (Z + K) I_{[Z < -K]}\big] + o(m^{-1}) \\
&\quad = 4\Big\{\sum_{j=1}^m (V_j + A)^{-2}\Big\}^{-1}(V_i + A)^{-1}E\big[(Z - K) I_{[Z > K]}\big] + o(m^{-1}) \\
&\quad = 4\Big\{\sum_{j=1}^m (V_j + A)^{-2}\Big\}^{-1}(V_i + A)^{-1}\big[\phi(K) - K\Phi(-K)\big] + o(m^{-1}). \qquad (A–26)
\end{aligned}
\]
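Both tail results just used, Eq. A–25 and the evaluation E[(Z − K)I(Z > K)] = φ(K) − KΦ(−K) in Eq. A–26, can be confirmed by direct quadrature (our check, with arbitrary values of K):

```python
# Quadrature checks (ours) of the two standard-normal tail results just used:
#   2 E[(Z^2 - 1)(Z - K) I(Z > K)] = 2 phi(K)            (Eq. A-25)
#   E[(Z - K) I(Z > K)] = phi(K) - K Phi(-K)             (used in Eq. A-26)
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

for K in (0.5, 1.0, 2.0):
    i25, _ = quad(lambda z: (z**2 - 1) * (z - K) * norm.pdf(z), K, np.inf)
    i26, _ = quad(lambda z: (z - K) * norm.pdf(z), K, np.inf)
    print(2 * i25, 2 * norm.pdf(K))              # Eq. A-25
    print(i26, norm.pdf(K) - K * norm.cdf(-K))   # Eq. A-26
```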

Hence,

\[
\begin{aligned}
E(\tilde\theta_i^{REB} - \theta_i^{B})(\hat\theta_i^{REB} - \tilde\theta_i^{REB})
&= \frac{K B_i^2}{2}\Big[\sum_{j=1}^m (V_j + A)^{-2}\Big]^{-1}(V_i + A)^{-1}\big[2\phi(K) - 3\{\phi(K) - K\Phi(-K)\}\big] + o(m^{-1}) \\
&= \frac{K B_i^2}{2}\Big[\sum_{j=1}^m (V_j + A)^{-2}\Big]^{-1}(V_i + A)^{-1}\big[-\phi(K) + 3K\Phi(-K)\big] + o(m^{-1}) \\
&= \frac{K B_i^2 D_i^2}{2}(1 - B_i)^2\Big[\sum_{j=1}^m (1 - B_j)^2\Big]^{-1}\big[-\phi(K) + 3K\Phi(-K)\big] + o(m^{-1}). \qquad (A–27)
\end{aligned}
\]

Combining Eq. A–21 and Eq. A–27, the theorem follows.
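For orientation (our addition), write b₁(K) for the bracket of Eq. A–21 and b₂(K) for the bracket of Eq. A–27; through Eq. A–2 they enter the leading correction term with combined coefficient b₁(K) + K b₂(K). The short tabulation below (illustrative K values only) shows this combined coefficient approaching 2 as K grows, consistent with the truncation effect vanishing for large K.

```python
# Our addition: the brackets of Eq. A-21 and Eq. A-27, combined through Eq. A-2 into
# the coefficient b1(K) + K*b2(K) of the leading correction term (K values illustrative).
import numpy as np
from scipy.stats import norm

def b1(K):   # bracket of Eq. A-21
    return (K**2 - 2) - 4 * K * norm.pdf(K) - (K**2 - 4) * norm.cdf(K)

def b2(K):   # bracket of Eq. A-27
    return -norm.pdf(K) + 3 * K * norm.cdf(-K)

for K in (0.5, 1.0, 1.5, 2.0, 3.0, 5.0):
    print(K, b1(K), b2(K), b1(K) + K * b2(K))   # combined coefficient -> 2 as K grows
```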

APPENDIX B
PROOFS OF THEOREMS 4-1 AND 4-2

Proof of Theorem 4-1: We start with the joint posterior given by

\[
\begin{aligned}
&\pi(b, \sigma^2, \alpha, v, x_{mis}\,|\,D_{obs}) \\
&\quad \propto \Bigg\{\prod_{i=1}^k\Bigg[\prod_{j=1}^{n_i}\prod_{l=1}^4 \frac{\exp\{z_{ijl}(x_{ij}^T b_l + v_i)\}}{z_{ijl}!}\exp\big\{-\exp(x_{ij}^T b_l + v_i)\big\}\Bigg]\frac{1}{\sqrt{2\pi}\,\sigma}\exp\Big(-\frac{v_i^2}{2\sigma^2}\Big)\Bigg\} \\
&\qquad \times \prod_{i=1}^k\prod_{j=1}^{n_i} f(x_{ij,obs}, x_{ij,mis}\,|\,\alpha)\,\exp\Big(-\frac{c}{2\sigma^2}\Big)(\sigma^2)^{-d/2-1}\pi(\alpha)\pi(b). \qquad (B–1)
\end{aligned}
\]

Here b = (b_1, b_2, b_3, b_4), \pi(b) \propto 1 and \pi(\alpha) \propto 1. Let \eta_{ijl} = x_{ij}^T b_l + v_i, l = 1, \ldots, 4, j = 1, \ldots, n_i, i = 1, \ldots, k. Let us first integrate the posterior w.r.t. \sigma^2; the \sigma^2-dependent factors form an inverse-gamma kernel with \int_0^\infty (\sigma^2)^{-(d+k)/2-1}\exp\{-(c + v^T v)/(2\sigma^2)\}\,d\sigma^2 \propto (c + v^T v)^{-(d+k)/2}, so that

\[
\begin{aligned}
\pi(b, \alpha, v, x_{mis}\,|\,D_{obs})
&\propto \Bigg[\prod_{i=1}^k\prod_{j=1}^{n_i}\prod_{l=1}^4 \frac{\exp\{z_{ijl}\eta_{ijl}\}}{z_{ijl}!}\exp\big\{-\exp(\eta_{ijl})\big\}\Bigg]\frac{1}{(c + v^T v)^{(d+k)/2}} \\
&\qquad \times \prod_{i=1}^k\prod_{j=1}^{n_i} f(x_{ij,obs}, x_{ij,mis}\,|\,\alpha). \qquad (B–2)
\end{aligned}
\]
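The inverse-gamma integral behind this step is ∫₀^∞ s^{−p/2−1} exp{−a/(2s)} ds = Γ(p/2)(2/a)^{p/2}, applied with s = σ², p = d + k and a = c + vᵀv; a quick numerical confirmation (ours, with arbitrary positive a and p) is:

```python
# Numerical confirmation (ours; a and p are arbitrary positive values) of the
# inverse-gamma integral behind Eq. B-2:
#   int_0^inf s^(-p/2 - 1) exp(-a / (2 s)) ds = Gamma(p/2) * (2 / a)^(p/2),
# applied with s = sigma^2, p = d + k and a = c + v'v.
from math import exp, gamma
from scipy.integrate import quad

a, p = 3.7, 5.0
val, _ = quad(lambda s: s**(-p/2 - 1) * exp(-a / (2 * s)), 0, float('inf'))
print(val, gamma(p/2) * (2 / a)**(p/2))   # the two values agree
```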

Then from Eq. B–2 and Eq. 4-10 we get

\[
\begin{aligned}
\pi(b, \alpha, v, x_{mis}\,|\,D_{obs})
&\le \prod_{l=1}^4\Bigg[\prod_{(i,j)\in I_{1l}} f_1(u_{ij}^T\tau_l)\prod_{(i,j)\in I_{2l}} c_{ij3}\, f_2(u_{ij}^T\tau_l) f_3(u_{ij}^T\tau_l)\Bigg] \\
&\qquad \times \frac{1}{(c + v^T v)^{(d+k)/2}}\prod_{i=1}^k\prod_{j=1}^{n_i} f(x_{ij,obs}, x_{ij,mis}\,|\,\alpha), \qquad (B–3)
\end{aligned}
\]

where f_1, f_2 and f_3 are defined in Section 4.2. For l = 1, \ldots, 4 and 1 \le r \le q, let x^{(l)}_{ijr} = a^l_{ij} for (i,j) \in M^*_r and \bar x^{(l)}_{ijr} = a^l_{n+ij} for (i,j) \in M^*_r \cap I_{2l} if b_{lr} \ge 0; x^{(l)}_{ijr} = b^l_{ij} for (i,j) \in M^*_r and \bar x^{(l)}_{ijr} = b^l_{n+ij} for (i,j) \in M^*_r \cap I_{2l} if b_{lr} < 0. Write x^{(l)}_{ij,mis} = \{x^{(l)}_{ijr}, r \in M_{ij}\} for (i,j) \in M^*_r and \bar x^{(l)}_{ij,mis} = \{\bar x^{(l)}_{ijr}, r \in M_{ij}\} for (i,j) \in M^*_r \cap I_{2l}; set x^{(l)}_{ij} = (x_{ij,obs}, x^{(l)}_{ij,mis}) and \bar x^{(l)}_{ij} = (x_{ij,obs}, \bar x^{(l)}_{ij,mis}). Also, let u^{(l)}_{ij} = (x^{(l)T}_{ij}, e_i^T)^T and \bar u^{(l)}_{ij} = (\bar x^{(l)T}_{ij}, e_i^T)^T. Then


from Eq. B–3,

\[
\begin{aligned}
\pi(b, \alpha, v, x_{mis}\,|\,D_{obs})
&\le C\prod_{l=1}^4\Bigg[\prod_{(i,j)\in I_{1l}} f_1(u^{(l)T}_{ij}\tau_l)\prod_{(i,j)\in I_{2l}} f_2(u^{(l)T}_{ij}\tau_l) f_3(\bar u^{(l)T}_{ij}\tau_l)\Bigg] \\
&\qquad \times \frac{1}{(c + v^T v)^{(d+k)/2}}\prod_{i=1}^k\prod_{j=1}^{n_i} f(x_{ij,obs}, x_{ij,mis}\,|\,\alpha). \qquad (B–4)
\end{aligned}
\]

Then,

\[
\begin{aligned}
\pi(b, \alpha, v\,|\,D_{obs})
&\le C\prod_{l=1}^4\Bigg[\prod_{(i,j)\in I_{1l}} f_1(u^{(l)T}_{ij}\tau_l)\prod_{(i,j)\in I_{2l}} f_2(u^{(l)T}_{ij}\tau_l) f_3(\bar u^{(l)T}_{ij}\tau_l)\Bigg] \\
&\qquad \times \frac{1}{(c + v^T v)^{(d+k)/2}}\prod_{i=1}^k\prod_{j=1}^{n_i}\int_{a_{ij}\le x_{ij}\le b_{ij}} f(x_{ij,obs}, x_{ij,mis}\,|\,\alpha)\,dx_{ij,mis} \\
&= C\prod_{l=1}^4\Bigg[\prod_{(i,j)\in I_{1l}} f_1(u^{(l)T}_{ij}\tau_l)\prod_{(i,j)\in I_{2l}} f_2(u^{(l)T}_{ij}\tau_l) f_3(\bar u^{(l)T}_{ij}\tau_l)\Bigg] \\
&\qquad \times \frac{1}{(c + v^T v)^{(d+k)/2}}\prod_{i=1}^k\prod_{j=1}^{n_i} f(x_{ij,obs}\,|\,\alpha). \qquad (B–5)
\end{aligned}
\]

Let

\[
I_l = \int_{R^q}\Bigg\{\prod_{(i,j)\in I_{1l}} f_1(u^{(l)T}_{ij}\tau_l)\prod_{(i,j)\in I_{2l}} f_2(u^{(l)T}_{ij}\tau_l) f_3(\bar u^{(l)T}_{ij}\tau_l)\Bigg\}\,db_l.
\]

Using the fact that

\[
f_i(\eta) = \int_\eta^\infty d(-f_i(u)) = \int_{-\infty}^\infty 1\{\eta \le u\}\,d(-f_i(u))
\]

for i = 1, 2, and

\[
f_3(\eta) = \int_{-\infty}^\eta d(f_3(u)) = \int_{-\infty}^\infty 1\{-\eta \le -u\}\,d(f_3(u)),
\]


we get

\[
\begin{aligned}
I_l &= \int_{R^q}\Bigg\{\prod_{(i,j)\in I_{1l}}\int_{-\infty}^\infty 1\{u^{(l)T}_{ij}\tau_l \le \eta_{ijl}\}\,d(-f_1(\eta_{ijl}))
 \prod_{(i,j)\in I_{2l}}\int_{-\infty}^\infty 1\{u^{(l)T}_{ij}\tau_l \le \eta_{ijl}\}\,d(-f_2(\eta_{ijl})) \\
&\qquad \times \prod_{(i,j)\in I_{2l}}\int_{-\infty}^\infty 1\{-\bar u^{(l)T}_{ij}\tau_l \le -\eta_{n+ij,l}\}\,d(f_3(\eta_{n+ij,l}))\Bigg\}\,db_l \\
&= \int_{R^{n+N_{l2}}}\Bigg\{\int_{R^q} 1\{U^{(l)}_*\tau_l \le \eta_l\}\,db_l\Bigg\}\,dF_l, \qquad (B–6)
\end{aligned}
\]

where dF_l = d\big(\prod_{(i,j)\in I_{1l}}(-f_1(\eta_{ijl}))\prod_{(i,j)\in I_{2l}}(-f_2(\eta_{ijl})) \times \prod_{(i,j)\in I_{2l}}(f_3(\eta_{n+ij,l}))\big), for l = 1, \ldots, 4.

Now from Lemma 4.1 of Chen and Shao (2000), we know that under conditions (C1) and (C2),

\[
U^{(l)}_*\tau_l \le \eta_l \;\Rightarrow\; \|\tau_l\| \le K\|\eta_l\|,
\]

where K is a constant. But since \tau_l = (b_l^T, v^T)^T,

\[
\|\tau_l\| \le K\|\eta_l\| \;\Rightarrow\; \|b_l\| \le K\|\eta_l\| \ \text{and}\ \|v\| \le K\|\eta_l\|.
\]

Then, since the integral over b_l is at most the volume of a ball of radius K\|\eta_l\| in R^q,

\[
\begin{aligned}
I_l &\le \int_{R^{n+N_{l2}}}\Bigg\{1\{\|v\| \le K\|\eta_l\|\}\int_{R^q} 1\{\|b_l\| \le K\|\eta_l\|\}\,db_l\Bigg\}\,dF_l \\
&\le K\int_{R^{n+N_{l2}}}\|\eta_l\|^q\,1\{\|v\| \le K\|\eta_l\|\}\,dF_l. \qquad (B–7)
\end{aligned}
\]

Then from Eq. B–5 we get

\[
\int \pi(b, \alpha, v\,|\,D_{obs})\,db \le K\prod_{l=1}^4\Bigg\{\int_{R^{n+N_{l2}}}\frac{\|\eta_l\|^q}{(c + v^T v)^{(d+k)/2}}\,1\{\|v\| \le K\|\eta_l\|\}\,dF_l\Bigg\}\prod_{i=1}^k\prod_{j=1}^{n_i} f(x_{ij,obs}\,|\,\alpha). \qquad (B–8)
\]


Integrating w.r.t. v,

\[
\begin{aligned}
\pi(\alpha\,|\,D_{obs})
&\le K\int_{R^{n+N_{12}}}\int_{R^{n+N_{22}}}\int_{R^{n+N_{32}}}\int_{R^{n+N_{42}}}\Bigg\{\prod_{l=1}^4\|\eta_l\|^q\Bigg[\int_{R^k}\frac{1}{(c + v^T v)^{(d+k)/2}}\,1\{\|v\| \le K\|\eta_l\|\}\,dv\Bigg]\prod_{l=1}^4 dF_l\Bigg\} \\
&\qquad \times \prod_{i=1}^k\prod_{j=1}^{n_i} f(x_{ij,obs}\,|\,\alpha). \qquad (B–9)
\end{aligned}
\]

Then

\[
\begin{aligned}
J &= \int_{R^{n+N_{12}}}\int_{R^{n+N_{22}}}\int_{R^{n+N_{32}}}\int_{R^{n+N_{42}}}\Bigg\{\prod_{l=1}^4\|\eta_l\|^q\Bigg[\int_{R^k}\frac{1}{(c + v^T v)^{(d+k)/2}}\,1\{\|v\| \le K\|\eta_l\|\}\,dv\Bigg]\Bigg\}\prod_{l=1}^4 dF_l \\
&\le K\int_{R^{n+N_{12}}}\int_{R^{n+N_{22}}}\int_{R^{n+N_{32}}}\int_{R^{n+N_{42}}}\Bigg\{\prod_{l=1}^4\|\eta_l\|^q\Bigg\}\prod_{l=1}^4 dF_l \\
&\le K\prod_{l=1}^4\int_{R^{n+N_{l2}}}\Bigg\{\sum_{(i,j)\in I_{1l}}|\eta_{ijl}|^q + \sum_{(i,j)\in I_{2l}}|\eta_{ijl}|^q + \sum_{(i,j)\in I_{2l}}|\eta_{n+ij,l}|^q\Bigg\}\,dF_l \\
&< \infty \quad \text{(from condition (C3))},
\end{aligned}
\]

where the first inequality uses the fact that \int_{R^k}(c + v^T v)^{-(d+k)/2}\,dv < \infty for c, d > 0.

Hence,

\[
\int \pi(\alpha\,|\,D_{obs})\,d\alpha \le K\int \prod_{i=1}^k\prod_{j=1}^{n_i} f(x_{ij,obs}\,|\,\alpha)\,d\alpha < \infty. \qquad (B–10)
\]

Hence the theorem follows.


Proof of Theorem 4-2: We have

\[
\begin{aligned}
L(\alpha_2, \alpha_3, \alpha_6\,|\,D_{obs})
&= \prod_{(i,j)\in M_6^c} f(x_{ij6}\,|\,\alpha_6)
 \prod_{(i,j)\in J_1}\sum_{x_{ij6,mis}}\Bigg[f(x_{ij6}\,|\,\alpha_6)\frac{\exp(u_{1ij}^T\gamma)}{1 + \exp(u_{1ij}^T\gamma) + \exp(u_{2ij}^T\gamma)}\Bigg] \\
&\quad \times \prod_{(i,j)\in J_2}\sum_{x_{ij6,mis}}\Bigg[f(x_{ij6}\,|\,\alpha_6)\frac{\exp(u_{2ij}^T\gamma)}{1 + \exp(u_{1ij}^T\gamma) + \exp(u_{2ij}^T\gamma)}\Bigg] \\
&\quad \times \prod_{(i,j)\in J_3}\sum_{x_{ij6,mis}}\Bigg[f(x_{ij6}\,|\,\alpha_6)\frac{1}{1 + \exp(u_{1ij}^T\gamma) + \exp(u_{2ij}^T\gamma)}\Bigg]. \qquad (B–11)
\end{aligned}
\]

For 1 \le r \le 8, let \bar u_{1ijr} = 1 if \gamma_r > 0 and \bar u_{1ijr} = 0 if \gamma_r \le 0 for (i,j) \in J_1 \cap M^*_6; \bar u_{1ijr} = 0 if \gamma_r > 0 and \bar u_{1ijr} = 1 if \gamma_r \le 0 for (i,j) \in (J_2 \cup J_3) \cap M^*_6; \bar u_{2ijr} = 1 if \gamma_r > 0 and \bar u_{2ijr} = 0 if \gamma_r \le 0 for (i,j) \in J_2 \cap M^*_6; \bar u_{2ijr} = 0 if \gamma_r > 0 and \bar u_{2ijr} = 1 if \gamma_r \le 0 for (i,j) \in (J_1 \cup J_3) \cap M^*_6. (Each ratio in Eq. B–11 is increasing in the linear predictor appearing in its numerator and decreasing in any linear predictor appearing only in its denominator, so over 0–1 covariate values it is maximized at these corners.)
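A brute-force confirmation of this vertex argument (ours; a small dimension and a random γ, so all corner pairs can be enumerated) is:

```python
# Brute-force check (ours; q = 4 and a random gamma so all 2^q x 2^q corner pairs are
# enumerable) that exp(u1'g) / (1 + exp(u1'g) + exp(u2'g)) over 0-1 vectors is
# maximized at u1_r = 1(g_r > 0) and u2_r = 1(g_r <= 0), the choice made above.
import itertools
import numpy as np

rng = np.random.default_rng(2)
q = 4
g = rng.normal(size=q)

def ratio(u1, u2):
    return np.exp(u1 @ g) / (1 + np.exp(u1 @ g) + np.exp(u2 @ g))

best = max(ratio(np.array(u1), np.array(u2))
           for u1 in itertools.product((0, 1), repeat=q)
           for u2 in itertools.product((0, 1), repeat=q))
u1_star = (g > 0).astype(float)
u2_star = (g <= 0).astype(float)
print(best, ratio(u1_star, u2_star))   # the two values agree
```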

Then

\[
\begin{aligned}
L(\alpha_2, \alpha_3, \alpha_6\,|\,D_{obs})
&\le \prod_{(i,j)\in M_6^c} f(x_{ij6}\,|\,\alpha_6)
 \prod_{(i,j)\in J_1}\frac{\exp(\bar u_{1ij}^T\gamma)}{1 + \exp(\bar u_{1ij}^T\gamma) + \exp(\bar u_{2ij}^T\gamma)}
 \prod_{(i,j)\in J_2}\frac{\exp(\bar u_{2ij}^T\gamma)}{1 + \exp(\bar u_{1ij}^T\gamma) + \exp(\bar u_{2ij}^T\gamma)} \\
&\quad \times \prod_{(i,j)\in J_3}\frac{1}{1 + \exp(\bar u_{1ij}^T\gamma) + \exp(\bar u_{2ij}^T\gamma)}, \qquad (B–12)
\end{aligned}
\]

where the last step follows since \sum_{x_{ij6,mis}} f(x_{ij6}\,|\,\alpha_6) = 1. Now, let

\[
\begin{aligned}
\bar L(\alpha_2, \alpha_3\,|\,D_{obs})
&= \prod_{(i,j)\in J_1}\frac{\exp(\bar u_{1ij}^T\gamma)}{1 + \exp(\bar u_{1ij}^T\gamma) + \exp(\bar u_{2ij}^T\gamma)}
 \prod_{(i,j)\in J_2}\frac{\exp(\bar u_{2ij}^T\gamma)}{1 + \exp(\bar u_{1ij}^T\gamma) + \exp(\bar u_{2ij}^T\gamma)} \\
&\quad \times \prod_{(i,j)\in J_3}\frac{1}{1 + \exp(\bar u_{1ij}^T\gamma) + \exp(\bar u_{2ij}^T\gamma)}. \qquad (B–13)
\end{aligned}
\]


Let

\[
\begin{aligned}
\bar J_{1ij} &= \frac{\exp(\bar u_{1ij}^T\gamma)}{1 + \exp(\bar u_{1ij}^T\gamma) + \exp(\bar u_{2ij}^T\gamma)} \\
&= \int_0^\infty \exp\Big[-t_{ij}\Big(1 + \exp(-\bar u_{1ij}^T\gamma) + \exp\big(-(\bar u_{1ij} - \bar u_{2ij})^T\gamma\big)\Big)\Big]\,dt_{ij} \\
&= \int_0^\infty \exp(-t_{ij})\,\exp\big(-t_{ij}\exp(-\bar u_{1ij}^T\gamma)\big)\,\exp\big(-t_{ij}\exp(-(\bar u_{1ij} - \bar u_{2ij})^T\gamma)\big)\,dt_{ij}.
\end{aligned}
\]
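The first equality is the elementary representation 1/c = ∫₀^∞ e^{−tc} dt applied with c = 1 + exp(−ū₁ᵢⱼᵀγ) + exp(−(ū₁ᵢⱼ − ū₂ᵢⱼ)ᵀγ); a numerical confirmation (ours, with arbitrary scalar stand-ins a and b for the two linear predictors) is:

```python
# Numerical confirmation (ours; a and b are arbitrary stand-ins for u1'gamma and
# u2'gamma) of the representation 1/c = int_0^inf exp(-t c) dt used above:
#   exp(a) / (1 + exp(a) + exp(b)) = int_0^inf exp[-t (1 + e^(-a) + e^(b-a))] dt.
from math import exp
from scipy.integrate import quad

for a, b in ((0.3, -1.2), (2.0, 1.0), (-0.5, 0.7)):
    lhs = exp(a) / (1 + exp(a) + exp(b))
    rhs, _ = quad(lambda t: exp(-t * (1 + exp(-a) + exp(b - a))), 0, float('inf'))
    print(lhs, rhs)   # the two columns agree
```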

Now, let F(v) = \exp(-\exp(-v)), and noting that F(v) = \int_{-\infty}^\infty 1(s \le v)\,dF(s) = \int_{-\infty}^\infty 1(s \ge -v)(-dF(-s)), we get

\[
\begin{aligned}
\bar J_{1ij} &= \int_0^\infty \exp(-t_{ij})\Bigg[\int_{-\infty}^\infty 1\big(-\bar u_{1ij}^T\gamma \le v_{ij}\big)\big(-dF^{t_{ij}}(-v_{ij})\big) \\
&\qquad \times \int_{-\infty}^\infty 1\big(-(\bar u_{1ij} - \bar u_{2ij})^T\gamma \le w_{ij}\big)\big(-dF^{t_{ij}}(-w_{ij})\big)\Bigg]\,dt_{ij}. \qquad (B–14)
\end{aligned}
\]

Similarly,

\[
\begin{aligned}
\bar J_{2ij} &= \frac{\exp(\bar u_{2ij}^T\gamma)}{1 + \exp(\bar u_{2ij}^T\gamma) + \exp(\bar u_{1ij}^T\gamma)} \\
&= \int_0^\infty \exp(-t_{ij})\Bigg[\int_{-\infty}^\infty 1\big(-\bar u_{2ij}^T\gamma \le v_{ij}\big)\big(-dF^{t_{ij}}(-v_{ij})\big) \\
&\qquad \times \int_{-\infty}^\infty 1\big(-(\bar u_{2ij} - \bar u_{1ij})^T\gamma \le w_{ij}\big)\big(-dF^{t_{ij}}(-w_{ij})\big)\Bigg]\,dt_{ij}, \qquad (B–15)
\end{aligned}
\]

and

\[
\begin{aligned}
\bar J_{3ij} &= \frac{1}{1 + \exp(\bar u_{2ij}^T\gamma) + \exp(\bar u_{1ij}^T\gamma)} \\
&= \int_0^\infty \exp(-t_{ij})\Bigg[\int_{-\infty}^\infty 1\big(\bar u_{1ij}^T\gamma \le v_{ij}\big)\big(-dF^{t_{ij}}(-v_{ij})\big) \\
&\qquad \times \int_{-\infty}^\infty 1\big(\bar u_{2ij}^T\gamma \le w_{ij}\big)\big(-dF^{t_{ij}}(-w_{ij})\big)\Bigg]\,dt_{ij}. \qquad (B–16)
\end{aligned}
\]

Substituting Eq. B–14 to Eq. B–16 into Eq. B–13 and integrating w.r.t. \gamma = (\alpha_2^T, \alpha_3^T)^T,

\[
\begin{aligned}
\int_{R^8} \bar L(\gamma\,|\,D)\,d\gamma
&\le \int_{R^8}\Bigg\{\prod_{(i,j)\in J_1}\Bigg[\int_0^\infty \exp(-t_{ij})\Big[\int_{-\infty}^\infty 1\big(-\bar u_{1ij}^T\gamma \le v_{ij}\big)\big(-dF^{t_{ij}}(-v_{ij})\big) \\
&\qquad \times \int_{-\infty}^\infty 1\big(-(\bar u_{1ij} - \bar u_{2ij})^T\gamma \le w_{ij}\big)\big(-dF^{t_{ij}}(-w_{ij})\big)\Big]\,dt_{ij}\Bigg] \\
&\quad \times \prod_{(i,j)\in J_2}\Bigg[\int_0^\infty \exp(-t_{ij})\Big[\int_{-\infty}^\infty 1\big(-\bar u_{2ij}^T\gamma \le v_{ij}\big)\big(-dF^{t_{ij}}(-v_{ij})\big) \\
&\qquad \times \int_{-\infty}^\infty 1\big(-(\bar u_{2ij} - \bar u_{1ij})^T\gamma \le w_{ij}\big)\big(-dF^{t_{ij}}(-w_{ij})\big)\Big]\,dt_{ij}\Bigg] \\
&\quad \times \prod_{(i,j)\in J_3}\Bigg[\int_0^\infty \exp(-t_{ij})\Big[\int_{-\infty}^\infty 1\big(\bar u_{1ij}^T\gamma \le v_{ij}\big)\big(-dF^{t_{ij}}(-v_{ij})\big) \\
&\qquad \times \int_{-\infty}^\infty 1\big(\bar u_{2ij}^T\gamma \le w_{ij}\big)\big(-dF^{t_{ij}}(-w_{ij})\big)\Big]\,dt_{ij}\Bigg]\Bigg\}\,d\gamma \\
&= \int_{R^8}\Bigg\{\int_{R_+^n}\exp\Big(-\sum_{i=1}^k\sum_{j=1}^{n_i} t_{ij}\Big)\Bigg[\int_{R^{2n}} 1\big(U^{**}\gamma \le v^*\big)\,dF^t(v^*)\Bigg]\,dt\Bigg\}\,d\gamma, \qquad (B–17)
\end{aligned}
\]

where dF^t(v^*) = \prod_{i,j}\big(-dF^{t_{ij}}(-v_{ij})\big)\prod_{i,j}\big(-dF^{t_{ij}}(-w_{ij})\big). Using Chen and Shao (2000), under conditions (H1) and (H2), we get

\[
\int_{R^8} \bar L(\gamma\,|\,D)\,d\gamma \le K_1\int_{R_+^n}\exp\Big(-\sum_{i=1}^k\sum_{j=1}^{n_i} t_{ij}\Big)\Bigg[\int_{R^{2n}}\|v^*\|^8\,dF^t(v^*)\Bigg]\,dt. \qquad (B–18)
\]

Now,

\[
\big(-dF^{t_{ij}}(-v_{ij})\big)\big(-dF^{t_{ij}}(-w_{ij})\big) = t_{ij}^2\exp(v_{ij})\exp\big(-t_{ij}\exp(v_{ij})\big)\exp(w_{ij})\exp\big(-t_{ij}\exp(w_{ij})\big)\,dv_{ij}\,dw_{ij}.
\]

Then

\[
\begin{aligned}
\int_{R^8} \bar L(\gamma\,|\,D)\,d\gamma
&\le K_1\int_{R^{2n}}\|v^*\|^8\Bigg[\prod_{i,j}\int_0^\infty t_{ij}^2\exp(v_{ij} + w_{ij})\exp\big\{-t_{ij}(1 + \exp(v_{ij}) + \exp(w_{ij}))\big\}\,dt_{ij}\Bigg]\,dv\,dw \\
&= K_1\int_{R^{2n}}\|v^*\|^8\prod_{i,j}\frac{2\exp(v_{ij} + w_{ij})}{(1 + \exp(v_{ij}) + \exp(w_{ij}))^3}\,dv\,dw \\
&\le K_1\int_{R^{2n}}\Big(\sum_{i,j} v_{ij}^8 + \sum_{i,j} w_{ij}^8\Big)\prod_{i,j}\frac{2\exp(v_{ij} + w_{ij})}{(1 + \exp(v_{ij}) + \exp(w_{ij}))^3}\,dv\,dw \\
&< \infty.
\end{aligned}
\]
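The t-integration above is ∫₀^∞ t² e^{−tc} dt = 2/c³ with c = 1 + exp(v_ij) + exp(w_ij); a quick quadrature check (ours, with arbitrary v and w) is:

```python
# Check (ours) of the t-integration above: for c = 1 + e^v + e^w,
#   int_0^inf t^2 exp(-t c) dt = 2 / c^3,
# which produces the factor 2 exp(v + w) / (1 + exp(v) + exp(w))^3.
from math import exp
from scipy.integrate import quad

for v, w in ((0.0, 0.0), (1.0, -2.0)):
    c = 1 + exp(v) + exp(w)
    val, _ = quad(lambda t: t**2 * exp(-t * c), 0, float('inf'))
    print(val, 2 / c**3)   # the two values agree
```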

Then

\[
\int_{R^3}\int_{R^8} L(\alpha_2, \alpha_3, \alpha_6\,|\,D_{obs})\,d\alpha_2\,d\alpha_3\,d\alpha_6 \le K\int_{R^3}\prod_{(i,j)\in M_6^c} f(x_{ij6}\,|\,\alpha_6)\,d\alpha_6. \qquad (B–19)
\]

But \int_{R^3}\prod_{(i,j)\in M_6^c} f(x_{ij6}\,|\,\alpha_6)\,d\alpha_6 < \infty under conditions (D1) and (D2) by Theorem 2.1 of Chen and Shao (2000).

Hence the theorem follows.


REFERENCES

Battese, G. E., Harter, R. M., and Fuller, W. A. (1988), "An error-components model for prediction of county crop areas using survey and satellite data," Journal of the American Statistical Association, 83, 28–36.

Bishop, Y. M. M., Fienberg, S. E., and Holland, P. W. (1975), Discrete Multivariate Analysis: Theory and Practice, The MIT Press, Cambridge, Mass.–London.

Brackstone, G. J. (1987), "Small area data: policy issues and technical challenges," in Platek et al. (1987), pp. 3–20.

Chambers, R. L. (1986), "Outlier robust finite population estimation," Journal of the American Statistical Association, 81, 1063–1069.

Chaudhuri, A. (1994), "Small domain statistics: a review," Statistica Neerlandica, 48, 215–236.

Chen, M.-H., Ibrahim, J. G., and Shao, Q.-M. (2004), "Propriety of the posterior distribution and existence of the MLE for regression models with covariates missing at random," Journal of the American Statistical Association, 99, 421–438.

— (2006), "Posterior propriety and computation for the Cox regression model with applications to missing covariates," Biometrika, 93, 791–807.

Chen, M.-H. and Shao, Q.-M. (2000), "Propriety of posterior distribution for dichotomous quantal response models," Proceedings of the American Mathematical Society, 129, 293–302.

Chen, M.-H., Shao, Q.-M., and Xu, D. (2002), "Necessary and sufficient conditions on the propriety of posterior distributions for generalized linear mixed models," Sankhya, 64, 57–85.

Cox, D. R. and Snell, E. J. (1968), "A general definition of residuals," Journal of the Royal Statistical Society, Series B, 30, 248–275.

Cressie, N. (1990), "Small-area prediction of undercount using the general linear model," Statistics Canada Symposium, 93–105.

Datta, G. S. and Ghosh, M. (1991), "Bayesian prediction in linear models: applications to small area estimation," Annals of Statistics, 19, 1748–1770.

Datta, G. S. and Lahiri, P. (1995), "Robust hierarchical Bayes estimation of small area characteristics in the presence of covariates and outliers," Journal of Multivariate Analysis, 54, 310–328.

— (2000), "A unified measure of uncertainty of estimated best linear unbiased predictors in small area estimation problems," Statistica Sinica, 10, 613–627.

Datta, G. S., Rao, J. N. K., and Smith, D. D. (2005), "On measuring the variability of small area estimators under a basic area level model," Biometrika, 92, 183–196.

Drew, D., Singh, M. P., and Choudhry, G. H. (1982), "Evaluation of small area estimation techniques for the Canadian Labour Force Survey," Survey Methodology, 8, 17–47.

Efron, B. and Morris, C. (1971), "Limiting the risk of Bayes and empirical Bayes estimators. I. The Bayes case," Journal of the American Statistical Association, 66, 807–815.

— (1972), "Limiting the risk of Bayes and empirical Bayes estimators. II. The empirical Bayes case," Journal of the American Statistical Association, 67, 130–139.

Ericksen, E. P. and Kadane, J. B. (1987), "Sensitivity analysis of local estimates of undercount in the 1980 U.S. Census," in Platek et al. (1987), pp. 23–45.

Farrell, P. J., MacGibbon, B., and Tomberlin, T. J. (1997), "Empirical Bayes estimators of small area proportions in multistage designs," Statistica Sinica, 7, 1065–1083.

Fay, R. E. and Herriot, R. A. (1979), "Estimates of income for small places: an application of James-Stein procedures to census data," Journal of the American Statistical Association, 74, 269–277.

Ghosh, M. and Maiti, T. (2004), "Small-area estimation based on natural exponential family quadratic variance function models and survey weights," Biometrika, 91, 95–112.

Ghosh, M., Nangia, N., and Kim, D. H. (1996), "Estimation of median income of four-person families: a Bayesian approach," Journal of the American Statistical Association, 91, 1423–1431.

Ghosh, M., Natarajan, K., Stroud, T. W. F., and Carlin, B. P. (1998), "Generalized linear models for small-area estimation," Journal of the American Statistical Association, 93, 273–282.

Ghosh, M., Natarajan, K., Waller, L. A., and Kim, D. H. (1999), "Hierarchical Bayes GLMs for the analysis of spatial data: an application to disease mapping," Journal of Statistical Planning and Inference, 75, 305–318.

Ghosh, M. and Rao, J. N. K. (1994), "Small area estimation: an appraisal," Statistical Science, 9, 55–93.

Gilks, W. R. and Wild, P. (1992), "Adaptive rejection sampling for Gibbs sampling," Applied Statistics, 41, 337–348.

Godambe, V. P. and Thompson, M. E. (1989), "An extension of quasi-likelihood estimation," Journal of Statistical Planning and Inference, 22, 137–172.

Gwet, J.-P. and Rivest, L.-P. (1992), "Outlier resistant alternatives to the ratio estimator," Journal of the American Statistical Association, 87, 1174–1182.

Harter, R. M. and Fuller, W. A. (1987), "The multivariate components of variance model in small area estimation," in Platek et al. (1987), pp. 103–123.

Ibrahim, J. G., Lipsitz, S. R., and Chen, M.-H. (1999), "Missing covariates in generalized linear models when the missing data mechanism is nonignorable," Journal of the Royal Statistical Society, Series B, 61, 173–190.

Jiang, J. and Lahiri, P. (2001), "Empirical best prediction for small area inference with binary data," Annals of the Institute of Statistical Mathematics, 53, 217–243.

Kackar, R. N. and Harville, D. A. (1984), "Approximations for standard errors of estimators of fixed and random effects in mixed linear models," Journal of the American Statistical Association, 79, 853–862.

Lahiri, P. and Maiti, T. (2002), "Empirical Bayes estimation of relative risks in disease mapping," Calcutta Statistical Association Bulletin, 53, 213–223.

Lahiri, P. and Rao, J. N. K. (1995), "Robust estimation of mean squared error of small area estimators," Journal of the American Statistical Association, 90, 758–766.

Lipsitz, S. R. and Ibrahim, J. G. (1996), "A conditional model for incomplete covariates in parametric regression models," Biometrika, 83, 916–922.

Little, R. J. A. and Rubin, D. B. (2002), Statistical Analysis with Missing Data, 2nd ed., Wiley, New York.

MacGibbon, B. and Tomberlin, T. J. (1989), "Small area estimates of proportions via empirical Bayes techniques," Survey Methodology, 15, 237–252.

Maiti, T. (1998), "Hierarchical Bayes estimation of mortality rates for disease mapping," Journal of Statistical Planning and Inference, 69, 339–348.

Malec, D., Sedransk, J., Moriarity, C. L., and LeClere, F. B. (1997), "Small area inference for binary variables in the National Health Interview Survey," Journal of the American Statistical Association, 92, 815–826.

Nandram, B., Sedransk, J., and Pickle, L. (1999), "Bayesian analysis of mortality rates for U.S. health service areas," Sankhya, 61, 145–165.

Pfeffermann, D. (2002), "Small area estimation - new developments and directions," International Statistical Review, 70, 125–143.

Pfeffermann, D. and Burck, L. (1990), "Robust small area estimation combining time series and cross-sectional data," Survey Methodology, 16, 217–237.

Platek, R., Rao, J. N. K., Särndal, C.-E., and Singh, M. P. (eds.) (1987), Small Area Statistics: An International Symposium, Wiley, New York.

Prasad, N. G. N. and Rao, J. N. K. (1990), "The estimation of mean squared errors of small area estimators," Journal of the American Statistical Association, 85, 163–171.

Rao, J. N. K. (1986), "Synthetic estimators, SPREE and model based predictors," in Proceedings of the Conference on Survey Research Methods in Agriculture, U.S. Department of Agriculture, Washington, D.C., pp. 1–16.

— (1999), "Some recent advances in model-based small area estimation," Survey Methodology, 25, 175–186.

— (2003), Small Area Estimation, Wiley, New York.

Rao, J. N. K. and Yu, M. (1992), "Small area estimation by combining time series and cross-sectional data," ASA Proceedings of the Survey Research Methods Section, 1–19.

Schaible, W. L. (1978), "Choosing weights for composite estimators for small area statistics," ASA Proceedings of the Section on Survey Research Methods, 741–746.

Zaslavsky, A. M., Schenker, N., and Belin, T. R. (2001), "Downweighting influential clusters in surveys: application to the 1990 Post Enumeration Survey," Journal of the American Statistical Association, 96, 858–869.

Zhao, L. P. and Prentice, R. L. (1990), "Correlated binary regression using a quadratic exponential model," Biometrika, 77, 642–648.


BIOGRAPHICAL SKETCH

Ananya Roy was born in Kolkata, India, on the 24th of December, 1978. She

graduated with a bachelor’s degree from Calcutta University in 2000 with first class

honours in statistics. She then joined the Indian Statistical Institute, from where she

received her Master of Statistics degree in 2002. She moved to the University of Florida

at Gainesville in the fall of 2002 to pursue her doctoral studies in statistics. Upon

graduation, she joined the University of Nebraska at Lincoln as an Assistant Professor

in the Department of Statistics.
