2021 siscer module 2 small area estimation: lecture 2: non

2021 SISCER Module 2 Small AreaEstimation:

Lecture 2: Non-Spatial Area-Level Modeling

Jon Wakefield

Departments of Statistics and BiostatisticsUniversity of Washington

1 / 41

Outline

Motivation for Mixed Models and Smoothing

Linear Mixed Effects Models

Generalized Linear Mixed Models

Bayesian Inference

Simulated Data

Fay-Herriot Modeling

2 / 41

Motivation for Mixed Models and Smoothing

3 / 41

Motivation

In a SAE context, we are often faced with situations in which the dataare sparse in space, which leads to great uncertainty in calculatedestimates.

Mixed models1are designed to alleviate this problem, by modeling thetotality of data from all areas, in order to leverage similarities in thedata – these are examples of indirect estimates.

The key element is coupling the different areas, by assuming thisparameters in these areas are linked through a common probabilitydistribution.

In this lecture we describe mixed models for data that are bothnormal (via linear mixed models) and binomial (via generalized linearmixed models) – spatial smoothing will be covered in the next lecture.

1also known as random effects or hierarchical models4 / 41

Model-based approaches

Indirect methods provide a link, often with an implicit model, betweendifferent areas; we describe explicit models that provide such a link.

The models we describe are mixed-effects models which aim toaccurately describe between domain (area) differences.

Such models offer several advantages:• Models can be tuned to the application, building on the existing

theory and practical experience of mixed models, includingnon-linear models, such as logistic mixed models.

• Domain-(Area)-specific measures of uncertainty are produced.• Estimates for areas with no data can be formed.• One can attempt to check assumptions using diagnostics.• A variety of area-specific random effects, including spatial, are

available.

5 / 41

Model-based approaches

The use of explicit models has not been carried out greatly in surveysampling, where design-based inference is historically the norm.

The design consistency of model-based estimators is therefore ofinterest.

Disadvantages of mixed models:• How to incorporate the design weights/acknowledge the design?• It is often difficult to check modeling assumptions.• Computation can be demanding, though this is improving.

Models can be specified at the level of the area or the unit

6 / 41

Motivating Examples: Normal and Binomial Data

We consider simulated data: we have two outcomes, one continuousand one binary, again using the King County geography.

These data were simulated with non-constant prevalence acrossHRAs (areas).

Nominally, the variables will be labeled weight and diabetes.

7 / 41

Motivating Examples: Normal and Binomial Data

170

180

190

ave weight

0.0

0.1

0.2

0.3

0.4

prob

3

4

5

sd(ave weight)

0.02

0.04

0.06

0.08

0.10

sd(prob)

Figure 1: Sample mean weights (top left) and fractions with diabetes (topright) Standard errors of: mean weights (bottom left) and fractions withdiabetes (bottom right). Gray areas in the right map correspond to areas withzero counts, and hence an estimated standard error of zero.

8 / 41

Linear Mixed Effects Models

9 / 41

Smoothing Models

Instability of estimates has lead to methods being developed toimpose smoothness on the underlying parameters using mixedeffects models that use the data from the totality of areas to providemore reliable estimates in each of the constituent areas.

Overview of Models:• Basic Linear Model: No smoothing.• Linear Mixed Models:

• Normal likelihood. Random effects with no spatial structure (knownas IID2).

• Normal Likelihood, Two sets of random effects, one set with nospatial structure, one set with spatial structure.

• Covariates may be added to each of these in order to smoothover covariate space.

• Estimation in these models is a separate issue.

2Independent and Identically Distributed10 / 41

Linear Mixed Effects ModelLet Yik be the weight of the k -th sampled individual in area i .

Three possible models:

1. No between-area variability:

Yik = β0︸︷︷︸Common Mean

+εik ,

with εik ∼ N(0, σ2ε). (Basic synthetic estimates).

2. Distinct between-area variability:

Yik = βi︸︷︷︸Mean of Area i

+εik ,

with εik ∼ N(0, σ2ε). Known as a fixed effects model. Note: no way to link

different areas. (Direct estimates).3. Linked between-area variability:

Yik = β0 + δi︸︷︷︸Mean of Area i

+εik ,

with δi ∼ N(0, σ2δ), εik ∼ N(0, σ2

ε). Known as a mixed effects model.(Indirect estimates).

11 / 41

Basic Linear Mixed ModelWe will concentrate on the linked between-area mixed effects model:

Yik = β0 + δi︸︷︷︸Mean of Area i

+εik ,

with δi ∼ N(0, σ2δ) – these are the area-specific deviations (the

random effects) from the overall level β0 – and εik ∼ N(0, σ2ε ), is the

measurement error.

This model is also known as a Linear Mixed Effects Model (LMEM).

In this model, the totality of data are used to inform on the overalllevel β0 and between-area variability σ2

δ .

The unknown parameters are:

Overall mean β0

Between-Area Variance σ2δ

Measurement Error Variance σ2ε

Random Effects δ1, . . . , δn

12 / 41

Maximum Likelihood Estimation

The MLEs are the values of the parameters that maximize theprobability of the observed data, under an assumed probability model.

For a LMEM the likelihood to be maximized is,

L(β, σ2δ , σ

2ε ) =

n∏i=1

∫δi

p(y i |β0, δi , σ2ε )× p(δi |σ2

δ) dδi .

If a likelihood approach is taken, the random effect estimates δi , areobtained as the conditional means E[δi |y ].

These are known as the best linear unbiased predictors (BLUPs).

Known as estimated BLUPs (EBLUPs) when β and variancecomponents σ2

ε , σ2δ are estimated.

Technical Note: Restricted ML (REML) is preferred for estimation ofvariances – accounts for estimation of β.

13 / 41

Estimation in the Linear Mixed Model

In general, there are no closed-form (i.e., explicit) forms for theestimates of the parameters.

Suppose, for simplicity, that β0, σ2δ and σ2

ε are known; this allowssome insight into inference.

14 / 41

Estimation in the Linear Mixed Model

The posterior mean of the random effect (area-specific adjustment) is:

δi = E[δi |yi ]

=niσ

2δ

σ2ε + niσ

2δ

(y i − β0)

= qi(y i − β0)

where

qi =niσ

2δ

σ2ε + niσ

2δ

≤ 1

and is small (so more shrinkage) if:• ni is small (not much data in the area), or• σ2

δ is small (between-area variability is small), or• σ2

ε is large (within-area variability is large).

δi is the BLUP.15 / 41

Inference for Random Effcets

From a frequentist mixed-model perspective:• The BLUPs are unbiased in the sense that,

E[δi − δi ] = 0,

where the expectation is over δi and δi since both are viewed asrandom.

• Uncertainty is measured by var(δi − δi) which is equal to themean squared error (MSE), MSE(δi).

• The MSE can also be used for construction of asymptoticconfidence intervals:

δi − δi ∼ N(0,MSE(δi)︸︷︷︸var(δi−δi )

).

• The MSE can be estimated analytically, or via the bootstrap (Raoand Molina, 2015).

16 / 41

Predicting the Population Total and MeanLet Si and Ri denote, respectively, the set of indices of the sampledand unsampled individuals in area i .

Let Ti =∑Ni

k=1 yik be the total for the population in area i , where Ni isthe population size.

The average for the population in area i is

Y i =Ti

Ni

=

∑Nik=1 yik

Ni

=

∑k∈Si

yik +∑

k∈Riyik

Ni

=

∑k∈Si

yik

ni︸︷︷︸Mean of Sampled

× ni

Ni+

∑k∈Ri

yik

Ni − ni︸︷︷︸Mean of Unsampled

×Ni − ni

Ni

17 / 41

Predicting the Population Total and Mean

Suppose now we have fitted the linear mixed effects model andobtained posterior medians β0 and δi .

The obvious estimate is:

Y i =

∑k∈Si

yik

ni︸︷︷︸Mean of Sampled

× ni

Ni+ (β0 + δi)︸︷︷︸

Estimated Mean

×Ni − ni

Ni

If Ni � ni , then the sampled data provide a small fraction of the totalpopulation in the area and we can estimate the area mean by

Y i = β0 + δi . (1)

If an area contains no data, then its mean is assumed to be β0 + δ?,where δ? ∼ N(0, σ2

δ).

18 / 41

Motivating Example: Continuous Outcome

Totals can be similarly estimated:

Ti =∑k∈Si

yik +∑k∈Ri

yik

=∑k∈Si

yik︸︷︷︸Total of Sampled

+ (Ni − ni)× (β0 + δi)︸︷︷︸Estimated Total for Unsampled

.

If Ni � ni ,

Ti ≈ Ni × (β0 + δi).

This assumes that population size Ni is known.

19 / 41

Generalized Linear Mixed Models

20 / 41

Smoothing Models

The above considerations of instability led to methods beingdeveloped to smooth the prevalences using mixed effects models thatuse the data from the totality of areas to provide more reliableestimates in each of the constituent areas.

Overview of Models:• Basic Binomial Model: No smoothing.• Mixed Effects Models:

• Binomial IID GLMM3: Non-spatial smoothing.• Binomial Spatial GLMM: Spatial and non-spatial smoothing.

• Covariates may be added to each of these in order to smoothover covariate space.

• Estimation in these models is a separate issue.

3Generalized Linear Mixed Model21 / 41

Overview of Models

The individual responses are the binary random variables Yik , for thesampled individuals k ∈ Si .

If we take Yi =∑

k∈SiYik , the obvious sampling model (likelihood) is:

Yi |pi ∼ Binomial (ni ,pi) .

Having unconstrained pi and taking the MLE’s leads to pi = Yi/ni .

22 / 41

Overview of ModelsThe simplest GLMM assumes the odds in area i are of the form

pi

1− pi= exp(β0)× exp(δi),

with exp(δi) are area-specific adjustments that multiply the overallodds exp(β0).

This model is equivalent to a linear model on the logistic scale:

log

(pi

1− pi

)= β0 + δi .

The random effects δi are assumed to follow a normal distribution,that is,

δi ∼iid N(0, σ2δ).

It is straightforward to add area-level covariates to this model via

log

(pi

1− pi

)= β0 + x T

i β1 + δi .

23 / 41

SAE Inference

We may wish to estimate the total number of cases or the average(prevalence) in area i :

Ti =∑k∈Si

yik +∑k∈Ri

yik

Ti

Ni=

∑k∈si

yik +∑

k∈Riyik

Ni.

Estimates:

Ti =∑k∈Si

yik + (Ni − ni)× pi

Ti

Ni=

∑k∈Si

yik

ni× ni

Ni+ pi ×

Ni − ni

Ni.

24 / 41

SAE Inference

If Ni � ni :

Ti = Ni pi

Ti

Ni= pi .

In the Binomial GLMM:

pi =exp(β0 + x T

i β1 + δi)

1 + exp(β0 + x Ti β1 + δi)

.

25 / 41

Bayesian Inference

26 / 41

Bayesian InferenceBayesian modeling is convenient for implementing notions ofsmoothing.

There are two key elements that must be specified:• The sampling model (likelihood) describes the distribution of the

data – this model depends on unknown parameters, that we willdenote p.

• The prior distribution expresses beliefs about the parameters pand provides a mechanism by which penalization/smoothing canbe imposed.

These elements are probabilistically combined via Bayes Theorem:

p(p|y)︸︷︷︸Posterior

∝ L(p)︸︷︷︸Likelihood

×π(p)︸︷︷︸Prior

.

On the log scale:

log p(p|y)︸︷︷︸Updated Beliefs

= log L(p)︸︷︷︸Sampling Model

+ log π(p)︸︷︷︸Penalization

.

27 / 41

Bayesian Inference

• In a Bayesian analysis the complete set of unknowns(parameters) is summarized via the multivariate posteriordistribution – one or two dimensional marginal posteriordistributions can be vizualised.

• The marginal distribution for each parameter may besummarized via its mean, standard deviation, or quantiles.

• It is common to report the posterior median and a 90% or 95%posterior range for parameters of interest.

• The range that is reported is known as a credible interval.

28 / 41

Bayesian Computation

• The computations required for Bayesian inference (integrals) areoften not trivial and may be carried out using a variety ofanalytical, numerical and simulation based techniques.

• We use the integrated nested Laplace approximation (INLA),introduced by Rue et al. (2009).

• R-INLA is a package that implements the INLA approach.• The SUMMER package uses INLA for all Bayesian computation.• Book-length treatments on INLA:

• Blangiardo and Cameletti (2015) – space-time modeling.• Wang et al. (2018) – general modeling.• Krainski et al. (2018) – advanced space-time modeling.

29 / 41

Bayes Example

• Imagine the data model is normal with an unknown mean µ:

yi | µ ∼ N(µ, σ2),

i = 1, . . . ,n, with σ2 assumed known.• This is equivalent to:

y | µ ∼ N(µ, σ2/n),

where σ/√

n is the standard error.• Suppose a normal prior is appropriate:

µ ∼ N(m, v),

so that values of the mean µ that are (relatively) far from m arepenalized – v is the smoothing parameter, more/less smoothingif small/big.

• The log posterior is:

log p(µ | y︸︷︷︸Updated Beliefs

) = − n2σ2 (y − µ)

2︸︷︷︸Data Model

− 12v

(µ−m)2︸︷︷︸Penalization

.

30 / 41

10 15 20

0.00

0.05

0.10

0.15

0.20

0.25

0.30

Parameter

Den

sity

priorposteriorlikelihood

Figure 2: Normal data model with n = 10, y = 19.3 and standard error 1.41.The prior for µ has mean m =15 and v = 32. The posterior for the parameterµ is a compromise between the two sources of information: the posteriormean is 18.5 and the posterior standard deviation is 1.28.

31 / 41

The Bayesian Linear Mixed Model

The LMEM with priors is:

Yik |β, δi , σ2ε ∼iid N(β0 + x T

i β1 + δi , σ2ε )

δi |σ2δ ∼iid N(0, σ2

δ)

β, σ2ε , σ

2δ ∼ Priors

The posterior, given data y , is obtained as

p(β, δ1, . . . , δn, σ2ε , σ

2δ |y) = p(y |β, δ1, . . . , δn, σ

2ε , σ

2δ)

× p(β, δ1, . . . , δn, σ2ε , σ

2δ)/p(y)

=n∏

i=1

p(y i |β, δi , σ2ε )× p(δi |σ2

δ)

× p(β, σ2ε , σ

2δ)/p(y)

Marginal distributions, such as p(β0|y), are obtained by integration/

In the SUMMER package, the INLA approach is used (Rue et al., 2009).

32 / 41

Simulated Data

33 / 41

Motivating Example: Linear Model

170

180

190

MLE

170

180

190

Bayes

3

4

5

MLE.sd

3

4

5

Bayes.sd

Figure 3: Top row: Estimates of area averages of weight via MLE’s (left) andposterior medians (right). Bottom row: Uncertainty of estimates with standarderrors (left) and posterior standard deviations (right).

34 / 41

Motivating Example: Linear Model

●

●

●

●

●●

●

●●

●

●

●

●●●

●

●

●

●

●

●

●

●●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●●

●●

●

●

●

●

●

170 175 180 185 190 195

170

175

180

185

190

195

Estimates

MLE

Pos

terio

r m

edia

n

●

●

●

●●

●

●

●

●●

●

● ● ●

●

●

●●

●

●

●●●

●

●●

●

●

●

●

●●●

●

●

●

●

●

●

●

●

●●●

●

● ●

●

2 3 4 5 6

23

45

6

Standard errors

se(MLE)

Pos

terio

r S

D

Figure 4: Comparison of area averages: Posterior medians versus MLEs(left). Posterior standard deviations versus standard errors associated withthe MLEs (right).

The posterior medians are shrunk from the MLEs towards the overallmean, with the extreme values undergoing the most shrinkage.

In general, the Bayes measures of uncertainty (the posterior standarddeviations) are smaller than the standard errors of the MLEs, with thegreatest difference occurring for those areas with the large standarderrors (which have the smallest sample sizes).

35 / 41

Motivating Example: Binary Outcome

0.0

0.1

0.2

0.3

0.4

mle

0.0

0.1

0.2

0.3

0.4

median

0.025

0.050

0.075

0.100

mle.sd

0.025

0.050

0.075

0.100

sd

Figure 5: Top row: Estimates of area prevalences with diabetes via MLE’s(left) and posterior medians (right). Bottom row: Uncertainty of estimates withstandard errors (left) and posterior standard deviations (right).

36 / 41

Motivating Example: Binary Outcome

●

●

●

●

●

●●

●

●

●

●

●

●●

●

●

●

●●

●

●

●

●●

●

●

●

●

●

●

●

−4 −3 −2 −1

−4

−3

−2

−1

Logit Prevalence

MLE

Pos

terio

r m

edia

n

●

●

●

●

●

●●

●

●

●●

●

●

●

●

●

●●●

●

●

●

●●

●

●

●

●

●

●

●

0.4 0.6 0.8 1.0

0.4

0.6

0.8

1.0

Uncertainty

se(MLE)

Pos

terio

r S

DFigure 6: Comparison of area averages: Posterior medians versus MLEs onthe logistic scale (left). Posterior standard deviations versus standard errorsassociated with the MLEs on the logistic scale (right).

As in the normal model case, we see that the Bayesian estimates areshrunk relative to the MLEs, and the uncertainty of the Bayesianestimates is in general smaller.

37 / 41


38 / 41

Fay-Herriot ModelWhat about accounting for complex sampling?

So far we have been (implicitly) assuming the data are gatheredthrough simple random sampling (SRS).

In a ground-breaking paper, Fay and Herriot (1979) suggestedmodeling a transform of the weighted estimator via a linear mixedmodel.

Specifically, let yi denote a transformation of the weighted estimator:

y i = g

(∑k∈Si

wk yk∑k∈Si

wk

).

and assumeyi |β0, δi ∼ N(β0 + x T

i β1 + δi ,Vi),

with Vi assumed known and being derived from the variance of theweighted estimator, and

δi |σ2δ ∼iid N(0, σ2

δ).

39 / 41


• The F-H approach is an extremely popular way of producingarea-level estimates.

• This formulation avoids the need to model the design.• For a continuous outcome (e.g., child growth measures) it may

suffice to simply take the weighted estimator, with its associatedvariance.

• For prevalences, we can take g(z) = log(z/(1− z)), and thenuse the delta method to obtain the appropriate variance.

• For rates, we can take g(z) = log(z + c), where the constant c ischosen to avoid taking the log of zero, and again use the deltamethod to obtain the appropriate variance.

• Area-level covariates are frequently used.• Problems can occur when the estimate lies on its boundary, both

for point estimation and variance estimation (e.g., for aprevalence when all the outcomes are 0).

40 / 41

Discussion: Area-Level Modeling

• Mixed effects models are a common approach for SAE, as theyacknowledge between-area differences in outcomes.

• But one must consider the design (e.g., stratification andclustering) – Fay-Herriot modeling is a simple way to do this.

• Model-checking is important.

41 / 41

References

Blangiardo, M. and Cameletti, M. (2015). Spatial and Spatio-TemporalBayesian Models with R-INLA. John Wiley and Sons.

Fay, R. and Herriot, R. (1979). Estimates of income for small places:an application of James–Stein procedure to census data. Journalof the American Statistical Association, 74, 269–277.

Krainski, E. T., Gomez-Rubio, V., Bakka, H., Lenzi, A., Castro-Camilo,D., Simpson, D., Lindgren, F., and Rue, H. (2018). AdvancedSpatial Modeling with Stochastic Partial Differential EquationsUsing R and INLA. Chapman and Hall/CRC.

Rao, J. and Molina, I. (2015). Small Area Estimation, Second Edition.John Wiley, New York.

Rue, H., Martino, S., and Chopin, N. (2009). Approximate Bayesianinference for latent Gaussian models using integrated nestedLaplace approximations (with discussion). Journal of the RoyalStatistical Society, Series B, 71, 319–392.

Wang, X., Yue, Y., and Faraway, J. J. (2018). Bayesian RegressionModeling with INLA. Chapman and Hall/CRC.

41 / 41

2021 siscer module 2 small area estimation: lecture 2: non

Documents