2021 siscer module 2 small area estimation: lecture 2: non

42
2021 SISCER Module 2 Small Area Estimation: Lecture 2: Non-Spatial Area-Level Modeling Jon Wakefield Departments of Statistics and Biostatistics University of Washington 1 / 41

Upload: others

Post on 26-Apr-2022

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: 2021 SISCER Module 2 Small Area Estimation: Lecture 2: Non

2021 SISCER Module 2 Small AreaEstimation:

Lecture 2: Non-Spatial Area-Level Modeling

Jon Wakefield

Departments of Statistics and BiostatisticsUniversity of Washington

1 / 41

Page 2: 2021 SISCER Module 2 Small Area Estimation: Lecture 2: Non

Outline

Motivation for Mixed Models and Smoothing

Linear Mixed Effects Models

Generalized Linear Mixed Models

Bayesian Inference

Simulated Data

Fay-Herriot Modeling

2 / 41

Page 3: 2021 SISCER Module 2 Small Area Estimation: Lecture 2: Non

Motivation for Mixed Models and Smoothing

3 / 41

Page 4: 2021 SISCER Module 2 Small Area Estimation: Lecture 2: Non

Motivation

In a SAE context, we are often faced with situations in which the dataare sparse in space, which leads to great uncertainty in calculatedestimates.

Mixed models1are designed to alleviate this problem, by modeling thetotality of data from all areas, in order to leverage similarities in thedata – these are examples of indirect estimates.

The key element is coupling the different areas, by assuming thisparameters in these areas are linked through a common probabilitydistribution.

In this lecture we describe mixed models for data that are bothnormal (via linear mixed models) and binomial (via generalized linearmixed models) – spatial smoothing will be covered in the next lecture.

1also known as random effects or hierarchical models4 / 41

Page 5: 2021 SISCER Module 2 Small Area Estimation: Lecture 2: Non

Model-based approaches

Indirect methods provide a link, often with an implicit model, betweendifferent areas; we describe explicit models that provide such a link.

The models we describe are mixed-effects models which aim toaccurately describe between domain (area) differences.

Such models offer several advantages:• Models can be tuned to the application, building on the existing

theory and practical experience of mixed models, includingnon-linear models, such as logistic mixed models.

• Domain-(Area)-specific measures of uncertainty are produced.• Estimates for areas with no data can be formed.• One can attempt to check assumptions using diagnostics.• A variety of area-specific random effects, including spatial, are

available.

5 / 41

Page 6: 2021 SISCER Module 2 Small Area Estimation: Lecture 2: Non

Model-based approaches

The use of explicit models has not been carried out greatly in surveysampling, where design-based inference is historically the norm.

The design consistency of model-based estimators is therefore ofinterest.

Disadvantages of mixed models:• How to incorporate the design weights/acknowledge the design?• It is often difficult to check modeling assumptions.• Computation can be demanding, though this is improving.

Models can be specified at the level of the area or the unit

6 / 41

Page 7: 2021 SISCER Module 2 Small Area Estimation: Lecture 2: Non

Motivating Examples: Normal and Binomial Data

We consider simulated data: we have two outcomes, one continuousand one binary, again using the King County geography.

These data were simulated with non-constant prevalence acrossHRAs (areas).

Nominally, the variables will be labeled weight and diabetes.

7 / 41

Page 8: 2021 SISCER Module 2 Small Area Estimation: Lecture 2: Non

Motivating Examples: Normal and Binomial Data

170

180

190

ave weight

0.0

0.1

0.2

0.3

0.4

prob

3

4

5

sd(ave weight)

0.02

0.04

0.06

0.08

0.10

sd(prob)

Figure 1: Sample mean weights (top left) and fractions with diabetes (topright) Standard errors of: mean weights (bottom left) and fractions withdiabetes (bottom right). Gray areas in the right map correspond to areas withzero counts, and hence an estimated standard error of zero.

8 / 41

Page 9: 2021 SISCER Module 2 Small Area Estimation: Lecture 2: Non

Linear Mixed Effects Models

9 / 41

Page 10: 2021 SISCER Module 2 Small Area Estimation: Lecture 2: Non

Smoothing Models

Instability of estimates has lead to methods being developed toimpose smoothness on the underlying parameters using mixedeffects models that use the data from the totality of areas to providemore reliable estimates in each of the constituent areas.

Overview of Models:• Basic Linear Model: No smoothing.• Linear Mixed Models:

• Normal likelihood. Random effects with no spatial structure (knownas IID2).

• Normal Likelihood, Two sets of random effects, one set with nospatial structure, one set with spatial structure.

• Covariates may be added to each of these in order to smoothover covariate space.

• Estimation in these models is a separate issue.

2Independent and Identically Distributed10 / 41

Page 11: 2021 SISCER Module 2 Small Area Estimation: Lecture 2: Non

Linear Mixed Effects ModelLet Yik be the weight of the k -th sampled individual in area i .

Three possible models:

1. No between-area variability:

Yik = β0︸︷︷︸Common Mean

+εik ,

with εik ∼ N(0, σ2ε). (Basic synthetic estimates).

2. Distinct between-area variability:

Yik = βi︸︷︷︸Mean of Area i

+εik ,

with εik ∼ N(0, σ2ε). Known as a fixed effects model. Note: no way to link

different areas. (Direct estimates).3. Linked between-area variability:

Yik = β0 + δi︸ ︷︷ ︸Mean of Area i

+εik ,

with δi ∼ N(0, σ2δ), εik ∼ N(0, σ2

ε). Known as a mixed effects model.(Indirect estimates).

11 / 41

Page 12: 2021 SISCER Module 2 Small Area Estimation: Lecture 2: Non

Basic Linear Mixed ModelWe will concentrate on the linked between-area mixed effects model:

Yik = β0 + δi︸ ︷︷ ︸Mean of Area i

+εik ,

with δi ∼ N(0, σ2δ) – these are the area-specific deviations (the

random effects) from the overall level β0 – and εik ∼ N(0, σ2ε ), is the

measurement error.

This model is also known as a Linear Mixed Effects Model (LMEM).

In this model, the totality of data are used to inform on the overalllevel β0 and between-area variability σ2

δ .

The unknown parameters are:

Overall mean β0

Between-Area Variance σ2δ

Measurement Error Variance σ2ε

Random Effects δ1, . . . , δn

12 / 41

Page 13: 2021 SISCER Module 2 Small Area Estimation: Lecture 2: Non

Maximum Likelihood Estimation

The MLEs are the values of the parameters that maximize theprobability of the observed data, under an assumed probability model.

For a LMEM the likelihood to be maximized is,

L(β, σ2δ , σ

2ε ) =

n∏i=1

∫δi

p(y i |β0, δi , σ2ε )× p(δi |σ2

δ) dδi .

If a likelihood approach is taken, the random effect estimates δi , areobtained as the conditional means E[δi |y ].

These are known as the best linear unbiased predictors (BLUPs).

Known as estimated BLUPs (EBLUPs) when β and variancecomponents σ2

ε , σ2δ are estimated.

Technical Note: Restricted ML (REML) is preferred for estimation ofvariances – accounts for estimation of β.

13 / 41

Page 14: 2021 SISCER Module 2 Small Area Estimation: Lecture 2: Non

Estimation in the Linear Mixed Model

In general, there are no closed-form (i.e., explicit) forms for theestimates of the parameters.

Suppose, for simplicity, that β0, σ2δ and σ2

ε are known; this allowssome insight into inference.

14 / 41

Page 15: 2021 SISCER Module 2 Small Area Estimation: Lecture 2: Non

Estimation in the Linear Mixed Model

The posterior mean of the random effect (area-specific adjustment) is:

δi = E[δi |yi ]

=niσ

σ2ε + niσ

(y i − β0)

= qi(y i − β0)

where

qi =niσ

σ2ε + niσ

≤ 1

and is small (so more shrinkage) if:• ni is small (not much data in the area), or• σ2

δ is small (between-area variability is small), or• σ2

ε is large (within-area variability is large).

δi is the BLUP.15 / 41

Page 16: 2021 SISCER Module 2 Small Area Estimation: Lecture 2: Non

Inference for Random Effcets

From a frequentist mixed-model perspective:• The BLUPs are unbiased in the sense that,

E[δi − δi ] = 0,

where the expectation is over δi and δi since both are viewed asrandom.

• Uncertainty is measured by var(δi − δi) which is equal to themean squared error (MSE), MSE(δi).

• The MSE can also be used for construction of asymptoticconfidence intervals:

δi − δi ∼ N(0,MSE(δi)︸ ︷︷ ︸var(δi−δi )

).

• The MSE can be estimated analytically, or via the bootstrap (Raoand Molina, 2015).

16 / 41

Page 17: 2021 SISCER Module 2 Small Area Estimation: Lecture 2: Non

Predicting the Population Total and MeanLet Si and Ri denote, respectively, the set of indices of the sampledand unsampled individuals in area i .

Let Ti =∑Ni

k=1 yik be the total for the population in area i , where Ni isthe population size.

The average for the population in area i is

Y i =Ti

Ni

=

∑Nik=1 yik

Ni

=

∑k∈Si

yik +∑

k∈Riyik

Ni

=

∑k∈Si

yik

ni︸ ︷︷ ︸Mean of Sampled

× ni

Ni+

∑k∈Ri

yik

Ni − ni︸ ︷︷ ︸Mean of Unsampled

×Ni − ni

Ni

17 / 41

Page 18: 2021 SISCER Module 2 Small Area Estimation: Lecture 2: Non

Predicting the Population Total and Mean

Suppose now we have fitted the linear mixed effects model andobtained posterior medians β0 and δi .

The obvious estimate is:

Y i =

∑k∈Si

yik

ni︸ ︷︷ ︸Mean of Sampled

× ni

Ni+ (β0 + δi)︸ ︷︷ ︸

Estimated Mean

×Ni − ni

Ni

If Ni � ni , then the sampled data provide a small fraction of the totalpopulation in the area and we can estimate the area mean by

Y i = β0 + δi . (1)

If an area contains no data, then its mean is assumed to be β0 + δ?,where δ? ∼ N(0, σ2

δ).

18 / 41

Page 19: 2021 SISCER Module 2 Small Area Estimation: Lecture 2: Non

Motivating Example: Continuous Outcome

Totals can be similarly estimated:

Ti =∑k∈Si

yik +∑k∈Ri

yik

=∑k∈Si

yik︸ ︷︷ ︸Total of Sampled

+ (Ni − ni)× (β0 + δi)︸ ︷︷ ︸Estimated Total for Unsampled

.

If Ni � ni ,

Ti ≈ Ni × (β0 + δi).

This assumes that population size Ni is known.

19 / 41

Page 20: 2021 SISCER Module 2 Small Area Estimation: Lecture 2: Non

Generalized Linear Mixed Models

20 / 41

Page 21: 2021 SISCER Module 2 Small Area Estimation: Lecture 2: Non

Smoothing Models

The above considerations of instability led to methods beingdeveloped to smooth the prevalences using mixed effects models thatuse the data from the totality of areas to provide more reliableestimates in each of the constituent areas.

Overview of Models:• Basic Binomial Model: No smoothing.• Mixed Effects Models:

• Binomial IID GLMM3: Non-spatial smoothing.• Binomial Spatial GLMM: Spatial and non-spatial smoothing.

• Covariates may be added to each of these in order to smoothover covariate space.

• Estimation in these models is a separate issue.

3Generalized Linear Mixed Model21 / 41

Page 22: 2021 SISCER Module 2 Small Area Estimation: Lecture 2: Non

Overview of Models

The individual responses are the binary random variables Yik , for thesampled individuals k ∈ Si .

If we take Yi =∑

k∈SiYik , the obvious sampling model (likelihood) is:

Yi |pi ∼ Binomial (ni ,pi) .

Having unconstrained pi and taking the MLE’s leads to pi = Yi/ni .

22 / 41

Page 23: 2021 SISCER Module 2 Small Area Estimation: Lecture 2: Non

Overview of ModelsThe simplest GLMM assumes the odds in area i are of the form

pi

1− pi= exp(β0)× exp(δi),

with exp(δi) are area-specific adjustments that multiply the overallodds exp(β0).

This model is equivalent to a linear model on the logistic scale:

log

(pi

1− pi

)= β0 + δi .

The random effects δi are assumed to follow a normal distribution,that is,

δi ∼iid N(0, σ2δ).

It is straightforward to add area-level covariates to this model via

log

(pi

1− pi

)= β0 + x T

i β1 + δi .

23 / 41

Page 24: 2021 SISCER Module 2 Small Area Estimation: Lecture 2: Non

SAE Inference

We may wish to estimate the total number of cases or the average(prevalence) in area i :

Ti =∑k∈Si

yik +∑k∈Ri

yik

Ti

Ni=

∑k∈si

yik +∑

k∈Riyik

Ni.

Estimates:

Ti =∑k∈Si

yik + (Ni − ni)× pi

Ti

Ni=

∑k∈Si

yik

ni× ni

Ni+ pi ×

Ni − ni

Ni.

24 / 41

Page 25: 2021 SISCER Module 2 Small Area Estimation: Lecture 2: Non

SAE Inference

If Ni � ni :

Ti = Ni pi

Ti

Ni= pi .

In the Binomial GLMM:

pi =exp(β0 + x T

i β1 + δi)

1 + exp(β0 + x Ti β1 + δi)

.

25 / 41

Page 26: 2021 SISCER Module 2 Small Area Estimation: Lecture 2: Non

Bayesian Inference

26 / 41

Page 27: 2021 SISCER Module 2 Small Area Estimation: Lecture 2: Non

Bayesian InferenceBayesian modeling is convenient for implementing notions ofsmoothing.

There are two key elements that must be specified:• The sampling model (likelihood) describes the distribution of the

data – this model depends on unknown parameters, that we willdenote p.

• The prior distribution expresses beliefs about the parameters pand provides a mechanism by which penalization/smoothing canbe imposed.

These elements are probabilistically combined via Bayes Theorem:

p(p|y)︸ ︷︷ ︸Posterior

∝ L(p)︸︷︷︸Likelihood

×π(p)︸︷︷︸Prior

.

On the log scale:

log p(p|y)︸ ︷︷ ︸Updated Beliefs

= log L(p)︸ ︷︷ ︸Sampling Model

+ log π(p)︸ ︷︷ ︸Penalization

.

27 / 41

Page 28: 2021 SISCER Module 2 Small Area Estimation: Lecture 2: Non

Bayesian Inference

• In a Bayesian analysis the complete set of unknowns(parameters) is summarized via the multivariate posteriordistribution – one or two dimensional marginal posteriordistributions can be vizualised.

• The marginal distribution for each parameter may besummarized via its mean, standard deviation, or quantiles.

• It is common to report the posterior median and a 90% or 95%posterior range for parameters of interest.

• The range that is reported is known as a credible interval.

28 / 41

Page 29: 2021 SISCER Module 2 Small Area Estimation: Lecture 2: Non

Bayesian Computation

• The computations required for Bayesian inference (integrals) areoften not trivial and may be carried out using a variety ofanalytical, numerical and simulation based techniques.

• We use the integrated nested Laplace approximation (INLA),introduced by Rue et al. (2009).

• R-INLA is a package that implements the INLA approach.• The SUMMER package uses INLA for all Bayesian computation.• Book-length treatments on INLA:

• Blangiardo and Cameletti (2015) – space-time modeling.• Wang et al. (2018) – general modeling.• Krainski et al. (2018) – advanced space-time modeling.

29 / 41

Page 30: 2021 SISCER Module 2 Small Area Estimation: Lecture 2: Non

Bayes Example

• Imagine the data model is normal with an unknown mean µ:

yi | µ ∼ N(µ, σ2),

i = 1, . . . ,n, with σ2 assumed known.• This is equivalent to:

y | µ ∼ N(µ, σ2/n),

where σ/√

n is the standard error.• Suppose a normal prior is appropriate:

µ ∼ N(m, v),

so that values of the mean µ that are (relatively) far from m arepenalized – v is the smoothing parameter, more/less smoothingif small/big.

• The log posterior is:

log p(µ | y︸ ︷︷ ︸Updated Beliefs

) = − n2σ2 (y − µ)

2︸ ︷︷ ︸Data Model

− 12v

(µ−m)2︸ ︷︷ ︸Penalization

.

30 / 41

Page 31: 2021 SISCER Module 2 Small Area Estimation: Lecture 2: Non

10 15 20

0.00

0.05

0.10

0.15

0.20

0.25

0.30

Parameter

Den

sity

priorposteriorlikelihood

Figure 2: Normal data model with n = 10, y = 19.3 and standard error 1.41.The prior for µ has mean m =15 and v = 32. The posterior for the parameterµ is a compromise between the two sources of information: the posteriormean is 18.5 and the posterior standard deviation is 1.28.

31 / 41

Page 32: 2021 SISCER Module 2 Small Area Estimation: Lecture 2: Non

The Bayesian Linear Mixed Model

The LMEM with priors is:

Yik |β, δi , σ2ε ∼iid N(β0 + x T

i β1 + δi , σ2ε )

δi |σ2δ ∼iid N(0, σ2

δ)

β, σ2ε , σ

2δ ∼ Priors

The posterior, given data y , is obtained as

p(β, δ1, . . . , δn, σ2ε , σ

2δ |y) = p(y |β, δ1, . . . , δn, σ

2ε , σ

2δ)

× p(β, δ1, . . . , δn, σ2ε , σ

2δ)/p(y)

=n∏

i=1

p(y i |β, δi , σ2ε )× p(δi |σ2

δ)

× p(β, σ2ε , σ

2δ)/p(y)

Marginal distributions, such as p(β0|y), are obtained by integration/

In the SUMMER package, the INLA approach is used (Rue et al., 2009).

32 / 41

Page 33: 2021 SISCER Module 2 Small Area Estimation: Lecture 2: Non

Simulated Data

33 / 41

Page 34: 2021 SISCER Module 2 Small Area Estimation: Lecture 2: Non

Motivating Example: Linear Model

170

180

190

MLE

170

180

190

Bayes

3

4

5

MLE.sd

3

4

5

Bayes.sd

Figure 3: Top row: Estimates of area averages of weight via MLE’s (left) andposterior medians (right). Bottom row: Uncertainty of estimates with standarderrors (left) and posterior standard deviations (right).

34 / 41

Page 35: 2021 SISCER Module 2 Small Area Estimation: Lecture 2: Non

Motivating Example: Linear Model

●●

●●

●●●

●●

●●

●●

170 175 180 185 190 195

170

175

180

185

190

195

Estimates

MLE

Pos

terio

r m

edia

n

●●

●●

● ● ●

●●

●●●

●●

●●●

●●●

● ●

2 3 4 5 6

23

45

6

Standard errors

se(MLE)

Pos

terio

r S

D

Figure 4: Comparison of area averages: Posterior medians versus MLEs(left). Posterior standard deviations versus standard errors associated withthe MLEs (right).

The posterior medians are shrunk from the MLEs towards the overallmean, with the extreme values undergoing the most shrinkage.

In general, the Bayes measures of uncertainty (the posterior standarddeviations) are smaller than the standard errors of the MLEs, with thegreatest difference occurring for those areas with the large standarderrors (which have the smallest sample sizes).

35 / 41

Page 36: 2021 SISCER Module 2 Small Area Estimation: Lecture 2: Non

Motivating Example: Binary Outcome

0.0

0.1

0.2

0.3

0.4

mle

0.0

0.1

0.2

0.3

0.4

median

0.025

0.050

0.075

0.100

mle.sd

0.025

0.050

0.075

0.100

sd

Figure 5: Top row: Estimates of area prevalences with diabetes via MLE’s(left) and posterior medians (right). Bottom row: Uncertainty of estimates withstandard errors (left) and posterior standard deviations (right).

36 / 41

Page 37: 2021 SISCER Module 2 Small Area Estimation: Lecture 2: Non

Motivating Example: Binary Outcome

●●

●●

●●

●●

−4 −3 −2 −1

−4

−3

−2

−1

Logit Prevalence

MLE

Pos

terio

r m

edia

n

●●

●●

●●●

●●

0.4 0.6 0.8 1.0

0.4

0.6

0.8

1.0

Uncertainty

se(MLE)

Pos

terio

r S

DFigure 6: Comparison of area averages: Posterior medians versus MLEs onthe logistic scale (left). Posterior standard deviations versus standard errorsassociated with the MLEs on the logistic scale (right).

As in the normal model case, we see that the Bayesian estimates areshrunk relative to the MLEs, and the uncertainty of the Bayesianestimates is in general smaller.

37 / 41

Page 38: 2021 SISCER Module 2 Small Area Estimation: Lecture 2: Non

Fay-Herriot Modeling

38 / 41

Page 39: 2021 SISCER Module 2 Small Area Estimation: Lecture 2: Non

Fay-Herriot ModelWhat about accounting for complex sampling?

So far we have been (implicitly) assuming the data are gatheredthrough simple random sampling (SRS).

In a ground-breaking paper, Fay and Herriot (1979) suggestedmodeling a transform of the weighted estimator via a linear mixedmodel.

Specifically, let yi denote a transformation of the weighted estimator:

y i = g

(∑k∈Si

wk yk∑k∈Si

wk

).

and assumeyi |β0, δi ∼ N(β0 + x T

i β1 + δi ,Vi),

with Vi assumed known and being derived from the variance of theweighted estimator, and

δi |σ2δ ∼iid N(0, σ2

δ).

39 / 41

Page 40: 2021 SISCER Module 2 Small Area Estimation: Lecture 2: Non

Fay-Herriot Modeling

• The F-H approach is an extremely popular way of producingarea-level estimates.

• This formulation avoids the need to model the design.• For a continuous outcome (e.g., child growth measures) it may

suffice to simply take the weighted estimator, with its associatedvariance.

• For prevalences, we can take g(z) = log(z/(1− z)), and thenuse the delta method to obtain the appropriate variance.

• For rates, we can take g(z) = log(z + c), where the constant c ischosen to avoid taking the log of zero, and again use the deltamethod to obtain the appropriate variance.

• Area-level covariates are frequently used.• Problems can occur when the estimate lies on its boundary, both

for point estimation and variance estimation (e.g., for aprevalence when all the outcomes are 0).

40 / 41

Page 41: 2021 SISCER Module 2 Small Area Estimation: Lecture 2: Non

Discussion: Area-Level Modeling

• Mixed effects models are a common approach for SAE, as theyacknowledge between-area differences in outcomes.

• But one must consider the design (e.g., stratification andclustering) – Fay-Herriot modeling is a simple way to do this.

• Model-checking is important.

41 / 41

Page 42: 2021 SISCER Module 2 Small Area Estimation: Lecture 2: Non

References

Blangiardo, M. and Cameletti, M. (2015). Spatial and Spatio-TemporalBayesian Models with R-INLA. John Wiley and Sons.

Fay, R. and Herriot, R. (1979). Estimates of income for small places:an application of James–Stein procedure to census data. Journalof the American Statistical Association, 74, 269–277.

Krainski, E. T., Gomez-Rubio, V., Bakka, H., Lenzi, A., Castro-Camilo,D., Simpson, D., Lindgren, F., and Rue, H. (2018). AdvancedSpatial Modeling with Stochastic Partial Differential EquationsUsing R and INLA. Chapman and Hall/CRC.

Rao, J. and Molina, I. (2015). Small Area Estimation, Second Edition.John Wiley, New York.

Rue, H., Martino, S., and Chopin, N. (2009). Approximate Bayesianinference for latent Gaussian models using integrated nestedLaplace approximations (with discussion). Journal of the RoyalStatistical Society, Series B, 71, 319–392.

Wang, X., Yue, Y., and Faraway, J. J. (2018). Bayesian RegressionModeling with INLA. Chapman and Hall/CRC.

41 / 41