spatial variable selection methods for investigating acute

32
Biometrics , 1–23 2013 Spatial variable selection methods for investigating acute health effects of fine particulate matter components Laura F. Boehm Vock St. Olaf College Brian J. Reich, Montserrat Fuentes North Carolina State University and Francesca Dominici Harvard University Summary: Multi site time series studies have reported evidence of an association between short term exposure to particulate matter (PM) and adverse health effects, but the effect size varies across the United States. Variability in the effect may partially be due to differing community level exposure and health characteristics, but also due to the chemical composition of PM which is known to vary greatly by location and time. The objective of this paper is to identify particularly harmful components of this chemical mixture. Because of the large number of highly-correlated components, we must incorporate some regularization into a statistical model. We assume that, at each spatial location, the regression coefficients come from a mixture model with the flavor of stochastic search variable selection, but utilize a copula to share information about variable inclusion and effect magnitude across locations. The model differs from current spatial variable selection techniques by accommodating both local and global variable selection. The model is used to study the association between fine PM (PM < 2.5 μm) components, measured at 115 counties nationally over the period 2000-2008, and cardiovascular emergency room admissions among Medicare patients. Key words: Bayesian modeling; Fine particulate matter; Stochastic search variable selection; Spatial data. This paper has been submitted for consideration for publication in Biometrics

Upload: others

Post on 22-Jan-2022

5 views

Category:

Documents


0 download

TRANSCRIPT

Biometrics , 1–23

2013

Spatial variable selection methods for investigating acute health effects of fine

particulate matter components

Laura F. Boehm Vock

St. Olaf College

Brian J. Reich, Montserrat Fuentes

North Carolina State University

and

Francesca Dominici

Harvard University

Summary: Multi site time series studies have reported evidence of an association between short term exposure to

particulate matter (PM) and adverse health effects, but the effect size varies across the United States. Variability in

the effect may partially be due to differing community level exposure and health characteristics, but also due to the

chemical composition of PM which is known to vary greatly by location and time. The objective of this paper is to

identify particularly harmful components of this chemical mixture. Because of the large number of highly-correlated

components, we must incorporate some regularization into a statistical model. We assume that, at each spatial

location, the regression coefficients come from a mixture model with the flavor of stochastic search variable selection,

but utilize a copula to share information about variable inclusion and effect magnitude across locations. The model

differs from current spatial variable selection techniques by accommodating both local and global variable selection.

The model is used to study the association between fine PM (PM < 2.5 µm) components, measured at 115 counties

nationally over the period 2000-2008, and cardiovascular emergency room admissions among Medicare patients.

Key words: Bayesian modeling; Fine particulate matter; Stochastic search variable selection; Spatial data.

This paper has been submitted for consideration for publication in Biometrics

Spatial variable selection methods for investigating acute health effects of fine particulate matter components 1

1. Introduction

Particulate matter (PM) is a complex mixture of airborne particles from both primary and

secondary anthropogenic and natural sources. Major sources include combustion of fossil

fuels and biomass, dust from industrial, construction, and mining operations, wildfires, and

lightning strikes (Schlesinger et al., 2006). Particles are either emitted directly into the air

(primary pollution) or formed by chemical interactions of gases and primary pollutant par-

ticles in the air (secondary pollution). Ambient levels of this complex mixture are currently

regulated by the Environmental Protection Agency (EPA) based solely on particle size, not

chemical composition or source. Yearly average and hourly maximum levels of the total

amount of both coarse (PM10; particles < 10µm in aerodynamic diameter) and fine (PM2.5;

particles < 2.5µm) are restricted.

Numerous studies have quantified the relationship of fine particulate matter exposure and

human health using multi-site time series studies (Schwartz et al., 1996; Dominici et al., 2000,

2006; Choi et al., 2009) comparing day to day changes in ambient measures of particulate

matter to day to day changes in hospital admissions and mortality data at the county and city

level. Both spatial and temporal heterogeneity in health effect estimates have been observed

(Bell et al., 2007, 2008; Zhou et al., 2011; Ito et al., 2011). We hypothesize that spatial

variation in the estimated health effects could be due to differing chemical composition of

PM2.5 across space (Fuentes et al., 2006; Bell et al., 2007; Peng et al., 2009; Bell et al., 2009;

Levy et al., 2012). While PM2.5 is currently regulated by total mass, deeper understanding

of the toxicity of the PM2.5 chemical composition could lead to air pollution regulations

that are more targeted to the most harmful emission sources.

A growing body of literature has investigated the potential health impact of specific PM2.5

components. While carbon fractions, including elemental carbon, black carbon, and organic

carbon matter, are shown to have positive associations in a variety of health studies, no

2 Biometrics, 2013

components have been ruled out in all studies (Rohr and Wyzga, 2012). Even looking at

individual chemical components does not eliminate between site variability in the health

effect estimates. There is some variability in particle size within component depending on

the source (Schlesinger, 2007), and also in particle chemistry and acidity, as these com-

ponents may be part of larger molecules (Schlesinger et al., 2006). Site-specific effects of

individual components may also vary due to population health characteristics which modify

susceptibility or by lifestyle or housing characteristics which modify exposure to ambient air

(Dominici et al., 2002, 2003; Zeka et al., 2005). For example, Dai et al. (2014) find PM2.5

health effects are greater in counties with higher rates of smoking and heavy alcohol use.

However, Krall et al. (2013) did not find significant regional differences in mortality effects

of PM2.5 constituents using single pollutant models.

Previous epidemiological and toxicological studies investigating components of fine par-

ticulate matter tend to focus on a few pollutants or pollutant groups. Peng et al. (2009)

investigate the relationship of the seven most massive PM2.5 components and CVD hos-

pitalizations. They use a hierarchical model to estimate national effects, but assume the

effects at every site are independent. Some city-specific analyses investigate a larger number

of pollutants (Ito et al., 2011; Zhou et al., 2011), but these results are hard to generalize to

other locations.

To our knowledge, a model which accomplishes variable selection both locally (within a

spatial location) and globally (on average for all the locations) across all sites within one

model has not previously been attempted. Here we extend previous methods for multi-

site variable selection to identify components of PM2.5 which may be important for health

outcomes at all or specific sites. Our flexible spatial approach facilitates global and local

selection simultaneously, unlike previous work. Reich et al. (2010) consider variable selection

approaches for multiple predictors globally, that is, included or excluded from the model at all

Spatial variable selection methods for investigating acute health effects of fine particulate matter components 3

sites. The coefficients for the included variables are allowed to vary by site with spatial priors.

Smith and Fahrmeir (2007) propose local selection for a single predictor. The predictor may

be included in the regression at some sites and not others. This is accomplished by treating

the coefficient of interest as the product of a spatially correlated binary random field and

an independent Gaussian field. Similar work with multiple predictors in generalized linear

models have been undertaken by Lum (2012) and Scheel et al. (2013). In all these works, the

Gaussian field assumes the magnitudes of the coefficients are independent across sites, and

there is no cross dependence between the magnitude and inclusion probability implied in the

prior. We extend this approach to include multiple predictors and spatial correlation in the

effect size, while simultaneously allowing global selection as a special case. Rather than treat

each coefficient as the product of a binary field and a continuous field, we utilize a Gaussian

copula to create a smooth predictive surface.

The key difference between the proposed method and the global methods of Reich et al.

(2010) is that it addresses not only the question of which covariates are important, but

also where the covariates are important. There are many cases when a practicing statistician

would be interested in separating these two questions. For example, in an imaging study such

as Smith and Fahrmeir (2007) one may attempt to find a particular spatial region of the brain

which is affected by a covariate. Another example is meteorology, where certain factors may

be important only in some regions because of different regional climate systems or regimes.

In the air pollution application of this paper, the objective is to identify subregions where

different pollutants are associated with adverse health outcomes. These difference may arise

from different sources of pollution or different characteristics of the at-risk population. In

summary, the proposed method is best suited to cases where covariates are thought to have

an effect only in subregions of the overall spatial domain.

This spatial variable selection method facilitates a more comprehensive analysis of PM2.5

4 Biometrics, 2013

constituent data than previously attempted. We include 22 components of PM2.5 mea-

sured in 115 counties across the United States. For these data, we find that the spatial

model without variable selection produces highly variable effect estimates across space and

component, and fails to identify any particularly harmful components. By adding variable

selection to the model, we are able to isolate elemental carbon as the component with largest

effect. Compared to a non-spatial analysis, our spatial analysis provides more precise effect

estimates, and reveals strong regional variation in the strength of this association, with the

largest estimates in the Northeast, Midwest, and Pacific Coast.

2. Data

Of the approximately 50 speciated PM2.5 components measured by the EPA, we selected p =

22 components of interest. Each contributes at least 1% of total mass to PM2.5, or the litera-

ture has suggested a potential link with health outcomes, or both. The components and sum-

mary statistics are shown in Table 1. These speciated PM measurements are taken from the

EPA’s Air Quality System (AQS) and AirExplorer databases (www.epa.gov/ttn/airs/airsaqs/,

www.epa.gov/airexplorer/). The AQS data include raw monitor values and daily averages,

while AirExplorer is a processed data product designed for use by health and epidemiology

research. For twenty of the components we use the AQS data. Because of a high proportion

of missingness for elemental carbon (EC) and organic carbon matter (OCM) in the AQS

database, we use the AirExplorer data for these components. Any observations below the

lower limit of detection are recorded as one half the detection limit. Following Peng et al.

(2009), for counties that had more than one active monitor on a given day, an average was

taken using 10% trimmed mean if more than 10 stations; for 3–10 stations, minimum and

maximum values were excluded from the mean; and for 2 stations, we use the mean. All

components are measured in µg/m3 except EC, which is measured in inverse megameters, a

measure of light extinction in haze.

Spatial variable selection methods for investigating acute health effects of fine particulate matter components 5

We only used information from non-source-oriented monitors, and exclude values flagged

by the EPA for data quality issues. Source-oriented monitors are placed with the intention

of monitoring a known large pollutant source, and may not be in a populated area. To avoid

biased pollution measurements, we exclude these and focus on non-source-oriented monitors,

which are placed with the purpose of estimating the exposure in populated areas. Any days

missing pollutant information were excluded. We include 117 counties in the US with at least

100,000 residents and PM2.5 components monitors active on at least 150 days in the time

period 2000-2008. Of these we exclude two California counties with data quality issues. In

these counties half the pollution measurements were 1000 times larger than expected based

on nearby counties and measurements on preceding days. In total we include 115 counties.

As recommended by Gelfand et al. (2003) covariates are scaled, but not centered. We use

the 90th percentile value to scale rather than standard deviation because of the skewness

of the pollution data. We note that other scaling factors could be used, such as standard

deviation or interquartile range (IQR), and that this is equivalent to putting different prior

variances on the coefficients for each pollutant, and that scaling is important when pollutants

are present in much different concentrations. As in Peng et al. (2009), we removed extreme

pollution values from both the simulation study and analysis, where “extreme” is any value

more than double the second highest value for that pollutant in that county. This results in

a removal of approximately 0.4% of the observed data. We also conducted the data analysis

with the extreme values as part of our sensitivity analysis (see Supplementary Material).

The health data includes Medicare beneficiary enrollment and Medicare Part-A inpatient

records, aggregated to daily county totals. We count the number of patients hospitalized with

a principal ICD-9 diagnosis code related to cardiovascular disease (CVD). These include heart

failure, ischemic heart disease, and cerebrovascular events among others. The average CVD

6 Biometrics, 2013

hospitalization rates vary greatly by county, ranging from 8.5 to 32 per 100,000 Medicare

enrollees per day.

[Table 1 about here.]

3. Statistical model

We observe Yt(s) CVD cases at county s on day t, in addition to a vector of pollutants, xt(s),

and confounding variables zt(s). The confounding variables include indicator functions for

the day of the week and smooth functions of time, temperature, and dewpoint. To adjust for

long term and seasonal trends, we include natural cubic spline smoothing function of time

with 7 df per year as a predictor in the model. We also include indicators for the day of the

week, and splines for daily average temperature (6 df) and the 3-day lag mean dewpoint (3

df) to control for confounding with weather. We chose these numbers of degrees of freedom

for consistency with Dominici et al. (2002). Other work, including Peng et al. (2009) use

different numbers of degrees of freedom. Details of sensitivity analysis are included in the

supplementary material.

We utilize the spatially varying coefficients model described by Gelfand et al. (2003),

Yt(s)|β(s),η(s) ∼ Poisson [Nt(s)exp{β0(s) + xt(s)′β(s) + zt(s)

′η(s)}] (1)

where Nt(s) is a known offset for the number of Medicare patients at county s on day t

and β0(s) is a spatially varying intercept. We include spatially varying coefficients β(s) =

[β1(s), . . . , βp(s)]T and η(s) = [η1(s), . . . , ηqs(s)]

T . Note that the dimension of η(s), qs,

depends on the county s due to the use of 7 knots per year for the spline function of

time, and that each county may be observed over a different time interval. We control for

unmeasured confounding independently within each location and then borrow information

across locations for the health estimates. Therefore we select a low precision normal density

as the prior for the confounding effects η(s), which is independent across counties. We let

Spatial variable selection methods for investigating acute health effects of fine particulate matter components 7

βk(s) denote the coefficient on pollutant k, for k = 1, . . . , p, and county s, for s = 1, . . . , n.

The prior on βk(s) will imply both spatial correlation and variable selection.

3.1 Spatial variable selection model

To facilitate variable selection, the marginal prior distribution for each coefficient βk(s) is a

mixture density,

fk[βk(s)] = πkN(αk, ω2k) + (1− πk)N(0, ω2

k/C), (2)

where N(m, v) is the Gaussian density function with mean m and variance v. Global behavior

is exhibited when πk = 1 or 0. Coefficients will be smoothed either toward αk, the overall

mean when βk is important, or toward 0. Thus πkαk is the overall mean effect across

all counties. Variability is controlled by ωk, with C representing the ratio of variance for

coefficients “in” and “out” of the model. One interpretation of C is that if βk(s) is within

±3ωk/√C of 0 it may be safely replaced by zero (George and McCulloch, 1997). If this

tuning parameter C = ∞, the second part of the mixture distribution is a point mass at

zero. This is the formulation of SSVS used by Smith and Fahrmeir (2007) and Reich et al.

(2010), in which the coefficients are modeled as the product of binary and continuous fields.

In the simulation study, we let C = 100, while goodness of fit criteria are used to select

C = 400 for the data analysis in Section 5.

In our model we apply a copula (Nelsen, 1999; Sklar, 1973, 1953) to the marginal distri-

bution in (2). A copula is a general technique for modeling dependence between random

variables while preserving desired marginal distributions. Many copulas have been proposed

to capture various forms of dependence, including the class of Archimedean copulas and

copulas constructed from multivariate densities such as Gaussian and t (Nelsen, 1999). We

select a Gaussian copula because the vast majority of spatial modeling deals with dependence

through the covariance function of a Gaussian process, and the Gaussian copula allows us

to leverage these models for our spatial variable selection model. To implement the copula

8 Biometrics, 2013

model, we introduce latent variables, θk(s), which follow a mean zero Gaussian process with

correlation function ρ(s, s′). Here we use an exponential spatial correlation function,

ρ(s, s′) = cor [θk(s), θk(s′)] = exp (−||s− s′||/r).

The range parameter r has the interpretation that at distance 3r correlation is about 0.05.

Although we choose an exponential correlation structure, any correlation function may be

used, including a generalization to non-stationary areal correlation structure such as that

implied by a conditional autoregressive model (CAR). We fix the variance of θk to one for

identifiability purposes in the Gaussian copula. We write a Gaussian process with mean m,

standard deviation v, and exponential spatial correlation with range r as GPex(m, v, r).

Hence, θk(s) ∼ GPex(0, 1, r). We force βk(s) to have the desired marginal distribution by

the transformation

βk(s) = F−1k {Φ[θk(s)]} , (3)

where Φ is the standard normal cdf and Fk is the marginal cdf of βk(s) defined in (2) and

dependent on πk, αk, and ωk.

3.2 A comparison of copula and other model solutions

As noted previously, the usual SSVS formulation of the model in (2) is to set C = ∞ and

write βk(s) as the product of a binary and a continuous field. That is, let βk(s) = γk(s)Bk(s)

where γk(s) is a binary spatial process and Bk(s) is a continuous process. in the introduction,

an alternative approach is to define each coefficient as the product of a binary field and a

continuous field. In the local approach by Smith and Fahrmeir (2007) the binary indicator

varies spatially according to an Ising (auto-logistic) prior, which is a binary analog to the

Conditionally Autoregressive (CAR) model used for continuous data measured on a lattice.

The continuous part, measuring the effect size, is assumed to vary across sites, but no spatial

correlation is directly imposed in the prior. Using our notation, we can write γk(s) ∼ Ising,

Spatial variable selection methods for investigating acute health effects of fine particulate matter components 9

and Bk(s)iid∼ N(0, ω2). The model described by Smith and Fahrmeir (2007) has p = 1,

while Lum (2012) and Scheel et al. (2013) extend to multiple predictors. In Smith and

Fahrmeir (2007) the approach is applied to functional MRI data in which spatial indication

of importance through γk(s) was of primary interest. The spatial variable selection approach

we propose encourages sharing of information across sites by centering the distribution of

βk(s) around αk and inducing spatial correlation through the copula.

While possible to induce spatial correlation through Bk(s), treating βk(s) as the product

of two fields or variables requires choosing a correlation and cross-correlation structure for

the two processes. The copula naturally incorporates the intuition that if a component has

no effect within a given county, nearby counties are unlikely to have large coefficients and

creates a continuous prior surface. The realization of the spatial process on a spatial grid

in Supplemental Materials A shows large areas with β(s) = 0 (light blue) corresponding

to regions of null effects. The effect is near zero for sites near null regions, and β(s) varies

smoothly across sites in non-null regions.

The global approach to spatial variable selection as implemented by Reich et al. (2010)

fosters sharing of information across sites, but assumes the same set of predictors are included

in the model at every site. Using our notation, their model can be summarized as γk(s) =

γk ∼ Bernoulli(π) and Bk(s) is a Gaussian Process with spatial correlation. The Reich et al.

(2010) model also includes a second binary process which determines whether the variance

in the Gaussian process is non-zero. That is, coefficients are equal to zero at all sites, equal

to αk at all sites, or are spatially varying with mean α. Because this approach is global, it

does not allow for the case that pollutants would have zero coefficients at some sites and

non-zero effects at others.

The SpVS approach we implement here allows for both local and global inferences while

incorporating spatial smoothing through a single continuous process. While the non-copula

10 Biometrics, 2013

approaches can be written with conjugate full-conditional Gibbs sampling updates for all

parameters conditional on β, the copula approach is still computationally feasible through

MCMC and code is provided in the supplementary material.

4. Simulation study

In the simulation study, we will investigate the performance of our model in terms of

mean absolute deviation (MAD), power, Type I error, and ability to correctly identify null

coefficients compared to a spatial model without variable selection and a variable selection

model without spatial correlation. Because the primary focus of this study is to evaluate

local variable selection, we compare performance for varying degrees of spatial correlation in

the site-specific coefficients βk(s).

We generate coefficients βk(s) from multivariate normal latent variables θk(s) with expo-

nential correlation using the transformation in (3) where F is the cdf of (2) with C = ∞.

Though we fit the continuous version of the model where C <∞, we chose to generate the

data for the C =∞ case so that the true values of the coefficients will be exactly zero when

they are out of the model, whether globally or locally.

Each model has p = 9 covariates. By varying π1, . . . , π9 we determine the behavior for

each of the nine pollutant coefficient vectors. We will define “global” behavior, as being

generated with πk = 1 and thus βk(s) are drawn from a multivariate normal distribution,

βk(s) ∼ N(αk, ω2kΣ), where Σ is an exponential correlation matrix. Globally “null” covariates

have πk = 0 and hence βk(s) = 0 for all s. Covariates exhibiting “local” behavior will have a

non-zero mean αk, but 0 < πk < 1 so that the marginal distribution at each site is a mixture

distribution.

For computational purposes, we restrict the simulation study to 48 counties in the Eastern

United States. The maximum distance between these counties is 1360 km. We include the

components which contribute the greatest mass to total PM2.5: sulfate, nitrate, elemental

Spatial variable selection methods for investigating acute health effects of fine particulate matter components 11

carbon, organic carbon, silicon, and sodium, as well as arsenic, bromine, and calcium. By

using the actual pollutant data to generate responses, we evaluate the models under the more

realistic scenario in which covariates of interest are moderately to strongly correlated within

site across time. Table 2(a) specifies the components and true parameters αk and πk used

to generate the data. The overall mean α is 0, 0.05, or 0.10, and the inclusion probability

πk is 0, 0.3, 0.7, or 1, indicating null, local, or global inclusion. For all coefficients we set

ωk = 0.025. We use the Poisson regression model (1) and the observed pollution data xt(s)

to generate values of Yt(s) based on our draws of β(s). That is, for each of M datasets we

generate βk(s) and then Yt(s)|β(s),xt(s). For simplicity, we do not include any confounding

variables in the simulation study.

Following the procedures outlined above, we generate M = 50 datasets using each of the

following designs for the correlation in βk(s):

(1) No Spatial Correlation: The true coefficients for each county are independent.

(2) Moderate Spatial Correlation: The true coefficients for each county are exponentially

spatially correlated with range r = 60 km (effective range 180 km).

(3) Strong Spatial Correlation: The true coefficients for each county are exponentially

spatially correlated with range r = 240 km (effective range 720 km).

In all cases we also include a random intercept β0(s), with E[β0(s)] = −8 yielding an

approximate average risk similar to the median risk observed. For the model with no spatial

correlation, the intercepts are also uncorrelated. The intercept in designs 2 and 3 are expo-

nentially spatially correlated with r = 120 km, which implies an effective range of 360 km.

The standard deviation of β0(s), ω0 = 1, is similar to the between county standard deviation

of average CVD rate.

We investigate the following three models:

12 Biometrics, 2013

(1) Spatial Model (Sp): βk(s) ∼ GPex(αk, ωk, r), without variable selection (i.e., πk =

1, k = 1, . . . , p).

(2) Exchangeable VS Model (EVS): βk(s) are independent and follow the distribution

in (2).

(3) Spatial VS Model (SpVS): The full model in Section 3. βk(s) marginally follow a

mixture distribution described in (2), and dependence is induced with a Gaussian copula.

In the simulation study, we use relatively uninformative priors, with αk ∼ N(0, 1), πk ∼

Beta(1, 1), 1/ω2k ∼ Gamma(0.001, 0.001), and r ∼ Uniform(0, 2). For each dataset we run

the model for 5500 iterations discarding the first 1000. We compare these three models using

MAD, power and type-I error of βk(s), and the ability to identify locally null coefficients.

4.1 Simulation results

Table 2(a) gives the MAD separately for global (π = 1), local (0 < π < 1), and null (π =

0) predictors. For global predictors, the spatial models (Sp and SpVS) have similar MAD

and outperform the non-spatial model (EVS), as expected. The MAD for null covariates is

considerably smaller for the variable selection models (EVS and SpVS). The spatial variable

selection model outperforms both competitors for the local predictors, where both shrinking

effects to zero and borrowing strength across space are beneficial.

Table 2(b) shows that by exploiting dependence between nearby locations, the SpVS model

has substantially higher power for detecting important covariates than the EVS model. Type-

I errors rates for local and global effects are near the nominal 0.05 level for all models, and

much less than 0.05 for the globally zero coefficients for all three models (not shown). For

the local covariates, there is not a significant difference in power when πk = 0.3 and when

πk = 0.7, therefore power is only reported for each level of αk. When the coefficients are

uncorrelated, the spatial model has the highest power, with the EVS and SpVS models

Spatial variable selection methods for investigating acute health effects of fine particulate matter components 13

having slightly lower power very similar to each other. When the coefficients have weak or

strong correlation, both spatial models significantly outperform the EVS model.

Finally, we check the ability of three goodness of fit measures to select the correct model:

Deviance Information Criterion (DIC) (Spiegelhalter et al., 2002); the predictive measure of

Gelfand and Ghosh (1998) (GG); and log pseudo marginal likelihood (LPML). Smaller values

indicate better fit for DIC and GG, while the preferred model is indicated by larger values of

LPML. The generating model for Scenario 1 is EVS, while for Scenarios 2 and 3 the correct

model is SpVS. In this study, DIC and LPML most often pick the correct “best” model,

while GG is less accurate. Under Scenario 1, LPML and DIC pick SpVS and EVS equally

often, while GG is split across all three models. LPML selects SpVS 38/50 and 48/50 times

for Scenarios 2 and 3. DIC performs similarly, 37/50 and 48/50 times respectively. Neither

measure selected the Sp model under any scenario. GG incorrectly selects the EVS model

as best 32/50 and 43/50 times in Scenarios 2 and 3, and chooses the Sp model 14/50 and

6/50 times, selecting the correct SpVS model only 4/50 and 1/50 times.

[Table 2 about here.]

5. Analysis of CVD hospitalization data

5.1 Model comparisons

We fit the Spatial, EVS, and Spatial VS models to the CVD hospitalization and PM2.5

component data described in Section 2. Due to the large number of confounding variables

per site, we chose to fit a two-stage approximation to the fully Bayesian model. The Poisson

likelihood function in (1) is approximated by a normal likelihood, ˆβ(s) ∼ Np[β(s), V (s)],

where ˆβ(s) is the maximum likelihood estimator and V (s) is the estimated covariance matrix.

This two-stage model has the advantage of much faster computation time and model fit does

14 Biometrics, 2013

not depend on MCMC convergence of the confounding coefficients, as the approximation is

fit marginally over the confounding coefficients η(s).

In all of the models, we choose a flat prior between 0 and 1 for πk, but note that one

could use a beta prior for πk to encourage global selection by choosing both beta parameters

to be less than one. Local selection (0< πk <1) could also be weighted more heavily in

the prior, and greater prior weight could be placed on inclusion or exclusion by choosing an

asymmetric beta distribution for the prior. The prior for αk is normal with standard deviation

0.025, which is approximately two to four times larger than the effect sizes we expect to see.

Recalling that we model the log relative risk, 0.1 would correspond to a more than 10%

increase in risk of CVD hospitalization for a 90th percentile increase in pollution, which

previous work suggests is quite large. We have found that model fit is somewhat sensitive to

the choice of the prior on ωk. When ωk is large it becomes difficult to differentiate between

the 0 and αk modes of the distribution, and thus πk and αk become difficult to estimate.

Though αk and πk are therefore somewhat sensitive to the choice of prior on ωk, we found

that estimates of βk(s) are more stable over different choices of prior. We therefore choose a

gamma(0.1, 0.001) prior on the inverse variance, 1/ω2k, to be relatively uninformative while

encouraging ωk to be small. We note in the simulation study that larger effect sizes are more

robust to an uninformative prior on ω2k, and that this prior is equivalent to fitting gamma(0.1,

0.1), a vague prior, on data with 10 times larger effect sizes, roughly equivalent to the effect

sizes of the simulation study. The spatial range parameter is a uniform distribution from 0 to

1200 km. We fit the spatial VS model with different choices for the value of C; C=100, 225,

400, and 900. DIC was nearly identical for the first three choices, and highest for C=900.

Here we present the model with C = 400. Though C = 100 minimizes DIC for the EVS

model, the EVS model presented here also has C = 400 for consistency, and the model fits

are very similar. We generate 50,000 MCMC samples, discarding the first 10,000 for burn in.

Spatial variable selection methods for investigating acute health effects of fine particulate matter components 15

We compare models using DIC, LPML, and GG, the goodness of fit measures described in

Section 4.1. We calculate LPML and GG statistics using the Poisson likelihood in (1) with

ηk(s) replaced by its maximum likelihood estimate, ηk(s). We calculate the DIC using the

normal likelihood approximation. The goodness of fit measures for each model are shown in

Table 3. The spatial VS model is selected as the best model by DIC and LPML, a substantial

improvement over EVS and spatial. The spatial model without variable selection is selected

as the best model by GG, but as shown in the simulation study, this result is questionable.

We also note that the estimated spatial correlation is very strong, further evidence that the

spatial models are a better choice than the exchangeable model.

In addition to these model comparisons, we conducted an extensive sensitivity analysis,

which is summarized in the supplementary material. We investigated sensitivity to several

modeling choices, including the effects of hyperpriors, confounders, and the variance ratio,

C, and found some sensitivity for hyperparameters, but that the effect estimates of interest

were robust to these factors.

5.2 Results

[Figure 1 about here.]

[Figure 2 about here.]

[Figure 3 about here.]

The overall (spatial average) effects for all pollutants are reported in Table 3, where local

covariates are defined as 0.20 < πk < 0.80 and global covariates as πk > 0.80. In Table

4 we present the posterior estimates for the overall mean when included, αk, the inclusion

probability, πk, and the random effects standard deviation, ω2k, for our spatial VS model.

Across all models, we consistently find positive overall effects for elemental carbon (EC). In

the EVS model, the overall effect estimate of elemental carbon, α4, is statistically significant

16 Biometrics, 2013

(95% posterior credible interval excludes 0). For the spatial and spatial VS models, posterior

probability that α4 > 0 is 0.88. No other overall effects are statistically significant, though

many county specific relative risk estimates are significant. These results are in general

agreement with Peng et al. (2009), who found 0.8% increase in CVD hospitalization risk

for an IQR increase in elemental carbon, over the time period 2002-2006 in a six pollutant

model.

The three models produce markedly different spatial patterns for the effect of EC, as shown

in Figure 1. The spatial model without variable selection shows considerable variation in the

effects across component and location. In the presence of the large number of potential

predictors, EC does not stand out as particularly important in the spatial model without

variable selection (Figure 1a). EC is clearly identified as the most relevant pollutant in both

variable selection models. In the EVS model, the effect size varies considerably by city with

no discernible spatial pattern. In contrast, the SpVS model reveals a strong spatial pattern,

with the largest effects in the Northeast, Mid Atlantic, East North Central, West North

Central, and Pacific regions. Therefore, the spatial variable selection model isolates both a

specific PM component with the most impact, and further refines the spatial distribution of

the effect size.

Figure 2 maps the county-specific EC effect estimates. The estimates vary smoothly across

space, with the largest values in the Northeast, Midwest, and Pacific Coast. The estimates

are small in other areas, with the notable exception of El Paso, Texas which has one of the

largest effect estimates. The 95% posterior intervals for many counties exclude zero for the

SpVS model, as shown in Figure 3c. Comparing the intervals for the EVS and SpVS models

in Figure 3 illustrates the benefit of spatial smoothing; by borrowing strength across nearby

counties, the intervals are narrower for the SpVS model and thus, unlike the EVS model,

several individual counties have statistically significant EC effects.

Spatial variable selection methods for investigating acute health effects of fine particulate matter components 17

[Table 3 about here.]

[Table 4 about here.]

6. Discussion

In this paper, we develop a statistical model which allows for local variable selection of spa-

tially correlated coefficients. The copula framework creates a smooth prior surface reflecting

the intuition that effect estimates near counties where the effect is zero will also be small.

A simulation study confirms that including spatial correlation in the model, while adding

computational complexity, results in better model fit and improved effect estimation than

the spatial model or the exchangeable variable selection model. While the MAD for globally

included covariates are similar between spatial and spatial VS models as expected, we found

more precise estimates and more powerful tests of association when the data were generated

with locally-relevant predictors. Under strong spatial correlation, our model had nearly half

the MAD of the spatial model and 20% greater power than the EVS model.

In the data application, the SpVS model is clearly the most appealing. Figure 1 shows

that the spatial models with and without variable selection give dramatically different

results. With variable selection, a single pollutant (elemental carbon) is identified as the

most relevant, whereas many pollutants have some effect in the model without variable

selection. Compared to the EVS model, we find a substantial reduction in variance due to

spatial smoothing.

This investigation can help us understand how observed spatial and temporal variations

in the health effects of total PM2.5 concentration may be related to variation in chemical

components. No previous study to our knowledge has utilized variable selection techniques

on such a large list of components, and few have incorporated spatial correlation.

When interpreting these overall effect estimates, it is also important to note that these

18 Biometrics, 2013

numbers reflect only the average risk increase or decrease for the 115 counties included in the

study, and are not a reflection of an average national risk. The counties selected for the study

are generally densely populated urban counties which may differ significantly in pollution

levels, demographic, socio-economic and health characteristics from other US counties. Also,

while epidemiological studies such as this one can point to interesting relationships between

pollutant levels and health effects, we must be careful in interpreting effects as causal links.

As discussed by Thomas et al. (2007), the indicated pollutants may be correlated with other

pollutants not included with the study, or in fact may be proxying for another pollutant

which is included. Because we do not observe the relationship between ambient and personal

exposure, the indicated pollutants may be signaling due to higher association with personal

exposure, though another pollutant is more correlated with health. Measurement error,

particularly that of using ambient pollution as a proxy for personal exposure, may also play

a role, especially for pollutants which are extremely spatially heterogeneous. A spatially

homogeneous pollutant may have stronger association between ambient and personal expo-

sure, and thus may mask the more heterogeneous pollutant which has a relationship between

personal exposure and health. Previous work studying measurement error and coarse PM10

(that greater than 2.5) (Chang et al., 2011) and total PM10 (Zeger et al., 2000) suggest

relatively consistent estimates of that effect under different measurement error scenarios,

but spatial heterogeneity of individual components of PM2.5 may be appreciably greater,

and individual components vary in their level of heterogeneity (Bell et al., 2011). Further

investigation of how spatial heterogeneity and measurement error may change health effect

estimates is warranted, particularly in variable selection models.

Further research into the reason for heterogeneity in pollutant effects using both epidemi-

ological and toxicological approaches is also needed. Heterogeneity may be due to differences

in particle chemistry, interaction effects, differing population exposure characteristics, or the

Spatial variable selection methods for investigating acute health effects of fine particulate matter components 19

measurement error issues described above. Though we have divided PM into its chemical

components, the physical and toxicological properties are by no means homogeneous. For

example, sulfate and nitrate particles are mostly secondary pollutants created by reactions of

sulfur and nitrogen oxides with other gases and particles to form a diverse set of molecules,

including sulfuric acid and ammonium bisulfate, which vary in acidity (Schlesinger et al.,

2006). Toxicological studies suggest this acidity, not just the components, may be an impor-

tant factor in health outcomes (Schlesinger et al., 2006). Individual chemical components may

come from diverse primary sources as well, resulting in differing size and associated particles.

For example, potassium becomes airborne from burning biomass, from blown crustal material

(dust), and salts from sea spray which may have differing biological effects (Schlesinger,

2007). An extension of this SpVS model to a space-time setting would be useful for fully

understanding possible seasonal effects. Though we adjust for seasonal confounding, we do

not consider interactions of pollutant effect and season in the current model.

Supplementary Materials

Web Appendices and Figures referenced in Sections 2, 3, 3.2, and 5.1 and R code for

implementing the SpVS model are available with this paper at the Biometrics website on

Wiley Online Library.

Acknowledgements

We are grateful for the support provided by NIH/NIEHS grant R21ES022585-01, NIH grants

R01ES019560, R21ES022795-01A1, and 5R01ES014843-02, EPA grant RD83479801, and

funds from HEI. We are also grateful to the editor, associate editor, and referees for their

helpful comments.

20 Biometrics, 2013

References

Bell, M. L., Dominici, F., Ebisu, K., Zeger, S. L., and Samet, J. M. (2007), “Spatial and

Temporal Variation in PM2.5 Chemical Composition in the United States for Health

Effects Studies,” Environmental Health Perspectives, 115, 989–995.

Bell, M. L., Ebisu, K., and Peng, R. D. (2011), “Community-level spatial heterogeneity

of chemical constituent levels of fine particulates and implications for epidemiological

research.” Journal of Exposure Science & Environmental Epidemiology, 21, 372 – 384.

Bell, M. L., Ebisu, K., Peng, R. D., Samet, J. M., and Dominici, F. (2009), “Hospital

admissions and chemical composition of fine particle air pollution,” American Journal

of Respiratory and Critical Care Medicine, 179, 1115–1120.

Bell, M. L., Ebisu, K., Peng, R. D., Walker, J., Samet, J. M., Zeger, S. L., and Dominici,

F. (2008), “Seasonal and Regional Short-term Effects of Fine Particles on Hospital

Admissions in 202 US Counties, 1999-2005,” American Journal of Epidemiology, 168,

1301–1310.

Chang, H. H., Peng, R. D., and Dominici, F. (2011), “Estimating the acute health effects of

coarse particulate matter accounting for exposure measurement error,” Biostatistics, 12,

637–652.

Choi, J., Fuentes, M., and Reich, B. J. (2009), “Spatial-temporal association between fine

particulate matter and daily mortality,” Computational Statistics & Data Analysis, 53,

2989–3000.

Dai, L., Zanobetti, A., Koutrakis, P., and Schwartz, J. D. (2014), “Associations of Fine

Particulate Matter Species with Mortality in the United States: A Multicity Time-Series

Analysis,” Environmental Health Perspectives.

Dominici, F., Daniels, M., Zeger, S. L., and Samet, J. M. (2002), “Air Pollution and

Mortality: Estimating Regional and National Dose-Response Relationships,” Journal of

Spatial variable selection methods for investigating acute health effects of fine particulate matter components 21

the American Statistical Association, 97, 100–111.

Dominici, F., McDermott, A., Zeger, S. L., and Samet, J. M. (2003), “National Maps

of the Effects of Particulate Matter on Mortality: Exploring Geographical Variation,”

Environmental Health Perspectives, 111, 39–43.

Dominici, F., Peng, R. D., Bell, M. L., and et al. (2006), “Fine particulate air pollution and

hospital admission for cardiovascular and respiratory diseases,” JAMA: The Journal of

the American Medical Association, 295, 1127–1134.

Dominici, F., Samet, J. M., and Zeger, S. L. (2000), “Combining Evidence on Air Pollution

and Daily Mortality from the 20 Largest US Cities: A Hierarchical Modelling Strategy,”

Journal of the Royal Statistical Society. Series A (Statistics in Society), 163, 263–302.

Fuentes, M., Song, H.-R., Ghosh, S. K., Holland, D. M., and Davis, J. M. (2006), “Spatial

association between speciated fine particles and mortality,” Biometrics, 62, 855–863.

Gelfand, A. E. and Ghosh, S. K. (1998), “Model choice: A minimum posterior predictive loss

approach,” Biometrika, 85, 1–11.

Gelfand, A. E., Kim, H.-J., Sirmans, C. F., and Banerjee, S. (2003), “Spatial Modeling with

Spatially Varying Coefficient Processes,” Journal of the American Statistical Association,

98, 387–396.

George, E. I. and McCulloch, R. E. (1997), “Approaches for Bayesian variable selection,”

Statistica Sinica, 7, 339–373.

Ito, K., Mathes, R., Ross, Z., Nadas, A., Thurston, G., and Matte, T. (2011), “Fine

Particulate Matter Constituents Associated with Cardiovascular Hospitalizations and

Mortality in New York City,” Environmental Health Perspectives, 119, 467–473.

Krall, J. R., Anderson, G. B., Dominici, F., Bell, M. L., and Peng, R. D. (2013), “Short-

term exposure to particulate matter constituents and mortality in a national study of

US urban communities,” Environmental Health Perspectives, 121, 1148.

22 Biometrics, 2013

Levy, J. I., Diez, D., Dou, Y., Barr, C. D., and Dominici, F. (2012), “A Meta-Analysis and

Multisite Time-Series Analysis of the Differential Toxicity of Major Fine Particulate

Matter Constituents,” American Journal of Epidemiology, 175, 1091–1099.

Lum, K. (2012), “Bayesian variable selection for spatially dependent generalized linear

models,” arXiv:1209.0661 [stat.ME].

Nelsen, R. B. (1999), An Introduction to Copulas, New York: Springer-Verlag.

Peng, R. D., Bell, M. L., Geyh, A. S., McDermott, A., Zeger, S. L., Samet, J. M.,

and Dominici, F. (2009), “Emergency Admissions for Cardiovascular and Respiratory

Diseases and the Chemical Composition of Fine Particle Air Pollution,” Environmental

Health Perspectives, 117, 957–963.

Reich, B. J., Fuentes, M., Herring, A. H., and Evenson, K. R. (2010), “Bayesian Variable

Selection for Multivariate Spatially Varying Coefficient Regression,” Biometrics, 66, 772–

782.

Rohr, A. C. and Wyzga, R. E. (2012), “Attributing health effects to individual particulate

matter constituents,” Atmospheric Environment, 62, 130 – 152.

Scheel, I., Ferkingstad, E., Frigessi, A., Haug, O., Hinnerichsen, M., and Meze-Hausken,

E. (2013), “A Bayesian hierarchical model with spatial variable selection: the effect of

weather on insurance claims,” Journal of the Royal Statistical Society: Series C (Applied

Statistics), 62, 85–100.

Schlesinger, R. B. (2007), “The Health Impact of Common Inorganic Components of Fine

Particulate Matter (PM2.5) in Ambient Air: A Critical Review,” Inhalation Toxicology,

19, 811–832, pMID: 17687714.

Schlesinger, R. B., Kunzli, N., Hidy, G. M., Gotschi, T., and Jerrett, M. (2006), “The Health

Relevance of Ambient Particulate Matter Characteristics: Coherence of Toxicological and

Epidemiological Inferences,” Inhalation Toxicology, 18, 95–125.

Spatial variable selection methods for investigating acute health effects of fine particulate matter components 23

Schwartz, J., Dockery, D. W., and Neas, L. M. (1996), “Is Daily Mortality Associated

Specifically with Fine Particles?” Journal of the Air & Waste Management Association,

46, 927–939.

Sklar, A. (1953), “Fonctions de repartition a n dimensions et leurs marges,” Publications de

l’Institut de Statistique de l’Universite de Paris, 8, 229–231.

— (1973), “Random variables, joint distributions, and copulas,” Kybernetica, 9, 449–460.

Smith, M. and Fahrmeir, L. (2007), “Spatial Bayesian Variable Selection With Application

to Functional Magnetic Resonance Imaging,” Journal of the American Statistical Asso-

ciation, 102, 417–431.

Spiegelhalter, D. J., Best, N. G., Carlin, B. P., and Linde, A. v. d. (2002), “Bayesian

Measures of Model Complexity and Fit,” Journal of the Royal Statistical Society. Series

B (Statistical Methodology), 64, 583–639.

Thomas, D. C., Jerrett, M., Kuenzli, N., Louis, T. A., Dominici, F., Zeger, S., Schwarz, J.,

Burnett, R. T., Krewski, D., and Bates, D. (2007), “Bayesian Model Averaging in Time–

Series Studies of Air Pollution and Mortality,” Journal of Toxicology and Environmental

Health, Part A, 70, 311–315, pMID: 17365593.

Zeger, S. L., Thomas, D., Dominici, F., Samet, J. M., Schwartz, J., Dockery, D., and

Cohen, A. (2000), “Exposure Measurement Error in Time-Series Studies of Air Pollution:

Concepts and Consequences,” Environmental Health Perspectives, 108, 419–426.

Zeka, A., Zanobetti, A., and Schwartz, J. (2005), “Short Term Effects of Particulate Matter

on Cause Specific Mortality: Effects of Lags and Modification by City Characteristics,”

Occupational and Environmental Medicine, 62, 718–725.

Zhou, J., Ito, K., Lall, R., Lippmann, M., and Thurston, G. (2011), “Time-Series Analysis

of Mortality Effects of Fine Particulate Matter Components in Detroit and Seattle,”

Environmental Health Perspectives, 119, 461466.

24 Biometrics, 2013

Received: November 24, 2013

Spatial variable selection methods for investigating acute health effects of fine particulate matter components 25

−1.0

−0.5

0.0

0.5

1.0

NE Mid Atl S Atl ESC WSC ENC WNC Mtn Pac

SulfNitrSili

El COr CSodi

AmmoAlumArse

BromCalcChloChroCopp

IronLeadMagn

NickPotaTita

VanaZinc

−1.0

−0.5

0.0

0.5

1.0

NE Mid Atl S Atl ESC WSC ENC WNC Mtn Pac

SulfNitrSili

El COr CSodi

AmmoAlumArse

BromCalcChloChroCopp

IronLeadMagn

NickPotaTita

VanaZinc

(a) Spatial (no VS) (b) Exchangeable VS

−1.0

−0.5

0.0

0.5

1.0

NE Mid Atl S Atl ESC WSC ENC WNC Mtn Pac

SulfNitrSili

El COr CSodi

AmmoAlumArse

BromCalcChloChroCopp

IronLeadMagn

NickPotaTita

VanaZinc

(c) Spatial VS

Figure 1. Posterior median coefficients (in % RR increase per IQR increase) for eachmodel. Each row includes coefficients for the labeled pollutant; each column indicates onecounty. The counties are grouped according to the 8 EPA subregions, defined as Northeast(NE), Mid Atlantic (Mid Atl), South Atlantic (S Atl), East South Central (ESC), WestSouth Central (WSC), East North Central (ENC), West North Central (WNC), Mountain(Mtn), and Pacific (Pac). This figure appears in color in the electronic version of this article.

26 Biometrics, 2013

−1.0

−0.5

0.0

0.5

1.0

Figure 2. Posterior median county-specific % risk increase for an IQR increase in elementalcarbon from the SpVS model. This figure appears in color in the electronic version of thisarticle.

Spatial variable selection methods for investigating acute health effects of fine particulate matter components 27

●●●●●●●●●●●●●●●●●●●●●●●● ●●

●●●●

●●●●●●●

●●●●●●●●●●

●●●●●●●

●●●●●●●

●●●●●

●●●

●●●●

●●●●●●●

●●●●●●●● ●●●●●

●●●● ●●

●●●●●●● ●●●

●●●●●●

−1

01

23

β

●●●●●● ●●●●●●●●●●●●●●●●●● ●●●●●●

●●●●●●●

●●●●●●●●●●

●●●●●●●

●●●●●●●

●●●●●

●●●

●●●●

●●●●●●●

●●●●●●●● ●●●●●

●●●

● ●●●●

●●●●● ●●●

●●●●●●

NE Mid Atl S Atl ESC WSC ENC WNC Mtn Pac Avg

(a) Spatial model

●●●

●● ●●●

●●●

●●●●

●●●●●●●● ●●●●●●

●●

●●●

●●●●●●

●●

●●●

●●●●●●

●●●●●●●

●●●●

●●●●●

●●●●●●

●●●●●

●●●●●

●●●●●●●

●● ●

●●

●●●

● ●●●

●●

●●●

−1

01

23

β ●

●●

●● ●●●

●●●●

●●●●●●

●●●● ●●

●●●●

●●

●●●

●●●

●●●

●●

●●●

●●●●

●●●●●●●

●●

●●●●

●●●●

●●

●●●●●●●

●●●●●

●●

●●●●●

●● ●

●●

●●

● ●●●

●●

●●

NE Mid Atl S Atl ESC WSC ENC WNC Mtn Pac Avg

(b) Exchangeable variable selection model

●●●●●●

●●●●●●●●●●●●●●●●●● ●

●●●●●●●●

●●●●

●●●●●●●●●

●●●●

●●●

●●●●●

●●

●●●

●●●●●●

●●●●●●

●●●●

●●●●●

●●●●●●●●●●●

●●

●●●●●●

●●●●●●

−1

01

23

β ●●●●●● ●●●●

●●●●

●●●●●●●●●● ●

●●●●●●●●●

●●●●●●●●

●●●

●●

●●●●

●●●

●●●●●●

●●●

●●●

●●●●●●

●●●●●●

●●●●

●●●

●●●●●

●●●●●●

●●●●

●●●●●●

●●

●●●

NE Mid Atl S Atl ESC WSC ENC WNC Mtn Pac Avg

(c) Spatial variable selection model

Figure 3. City-specific means (open dot), median (filled dot), and 95% posterior intervals(vertical lines) for the elemental carbon effect, arranged by EPA subregion, defined as North-east (NE), Mid Atlantic (Mid Atl), South Atlantic (S Atl), East South Central (ESC), WestSouth Central (WSC), East North Central (ENC), West North Central (WNC), Mountain(Mtn), and Pacific (Pac). The national average estimates are given for αk (triangle), αkπk(square), and average of the 115 βk(s) (diamond). This figure appears in color in the electronicversion of this article.

28 Biometrics, 2013

Table 1Interquartile range (IQR), median, and maximum observed value in µg/m3 across all 115 sites in the period

2000-2008. Minimum values all approximately zero, and are not shown. The seven most massive components arelisted first, with the rest in alphabetical order.

Median IQR Max Median IQR Max

Sulfate 2.43 2.89 39.90 Chlorine 0.00 0.02 5.78Nitrate 0.83 1.54 45.30 Chromium 0.00 0.00 1.17Silicon 0.07 0.10 10.10 Copper 0.00 0.00 0.55

Elemental Carbon 0.54 0.52 11.00 Iron 0.07 0.08 24.10Organic Carbon 3.30 3.42 101.18 Lead 0.00 0.00 0.98

Sodium Ion 0.09 0.13 33.40 Magnesium 0.00 0.01 2.72Ammonium Ion 1.12 1.43 21.70 Nickel 0.00 0.00 0.91

Aluminum 0.01 0.04 4.01 Potassium 0.05 0.05 12.80Arsenic 0.00 0.00 0.16 Titanium 0.00 0.01 0.56

Bromine 0.00 0.00 0.19 Vanadium 0.00 0.00 0.17Calcium 0.04 0.05 4.90 Zinc 0.01 0.01 3.11

Spatial variable selection methods for investigating acute health effects of fine particulate matter components 29

Table 2(a) Median absolute deviation (MAD) × 100 for the simulation study. MAD is averaged across all 50 locations for

each pollutant. The overall average over datasets is presented here. Simulation settings for αk, πk are shown incolumns 2 and 3, and abbreviated names of the components used to generate the response are in column 1. A “*”indicates a statistically significant difference from the SpVS model. (b) Power and Type-I error for spatial modelwith no variable selection (Sp), exchangeable variable selection model (EVS), and spatial variable selection model

(SpVS) for local and global covariates. We calculate power for each setting of the true value of αk.

(a) Median absolute deviatio (MAD)Independent Weak Correlation Strong Correlation

α× 100 π Sp EVS SpVS Sp EVS SpVS Sp EVS SpVS

Sulf. 5 1 1.39* 1.66* 1.48 1.36 1.65* 1.38 1.10 1.55* 1.12Nitr. 10 1 1.40* 1.46 1.45 1.37* 1.59* 1.41 1.16 1.43* 1.14

Sili. 5 0.3 1.34* 0.91 0.95 1.41* 0.99* 1.11 1.17* 0.95 0.91El C. 5 0.7 1.91* 2.10 2.08 1.79 2.04* 1.78 1.49* 1.83* 1.44Sodi. 10 0.3 1.53* 0.80 0.86 1.52* 0.79 0.84 1.21* 0.70 0.66

Or C. 10 0.7 2.30* 2.02 2.04 2.30* 2.01 1.95 1.91* 2.05* 1.60

Arse. 0 0 0.53* 0.12 0.12 0.50* 0.16 0.19 0.46* 0.13* 0.18Brom. 0 0 0.60* 0.15 0.13 0.56* 0.14* 0.21 0.50* 0.12* 0.24Calc. 0 0 0.63* 0.15 0.15 0.69* 0.15* 0.25 0.54* 0.12* 0.26

Avg. Local, βk(s) = 0 1.61* 1.19* 1.23 1.64* 1.24 1.24 1.26* 0.72 0.63Avg. Local, βk(s) 6= 0 1.93* 1.72 1.73 1.87* 1.67* 1.59 1.62 2.04 1.67

(b) Power and Type I errorIndependent Weak Correlation Strong Correlation

αk Sp EVS SpVS Sp EVS SpVS Sp EVS SpVS

PowerLocal 0.05 0.41 0.30 0.34 0.48 0.31 0.49 0.65 0.37 0.67Local 0.10 0.80 0.76 0.75 0.84 0.74 0.81 0.90 0.77 0.91

Global 0.05 0.74 0.55 0.61 0.75 0.55 0.73 0.85 0.57 0.85Global 0.10 1.00 0.92 0.92 0.99 0.88 0.95 1.00 0.90 0.99

Type-I ErrorLocal 0.03 0.01 0.02 0.05 0.01 0.07 0.06 0.01 0.08

Global 0.01 0.00 0.00 0.00 0.00 0.01 0.01 0.00 0.01

30 Biometrics, 2013

Table 3Estimated overall relative risk expressed as percent increase (eαkπk × 100 − 100) and goodness of fit statistics. ForSpatial model (no VS), eαk × 100 − 100 is reported. For models with spatial correlation, range estimates in km are

given. Statistically significant associations (0.05 level) marked by ∗. Local selection marked by L (posterior0.20 < πk < 0.80). No estimates of π were greater than 0.8. We also include goodness of fit criterion, DIC, GG, and

LPML. The best model by each measure is highlighted in bold.

SpVS Spatial EVS

Sulfate 0.09L 0.26 0.02Nitrate 0.01L 0.21 0.07Silicon -0.00 -0.1 -0.00Elemental Carbon 0.42L 0.46 0.64L∗

Organic Carbon 0.03L -0.14 0.04L

Sodium Ion -0.01 -0.09 0.00Ammonium Ion 0.05 0.17 0.16L

Aluminum -0.00 -0.06 -0.02Arsenic -0.05 -0.42 -0.06Bromine -0.02 0.09 0.00Calcium -0.00 0.07 0.04Chlorine 0.01 -0.01 0.00Chromium 0.02 -0.16 -0.01Copper 0.01 -0.02 0.00Iron 0.00 0.41 0.04Lead 0.04 -0.09 -0.02Magnesium 0.02 -0.06 -0.02Nickel 0.00 0.02 -0.02Potassium 0.02 -0.15 -0.03Titanium 0.00 0.06 -0.01Vanadium 0.04 -0.03 0.02Zinc -0.00 0.20 0.02

Range (km) 1143 1175DIC 2647 2746 2684GG 27.335 27.292 27.350LPML -155495 -155551 -155585

Spatial variable selection methods for investigating acute health effects of fine particulate matter components 31

Table 4Estimates of average county specific β, mean when included (eαk × 100 − 100), probability of inclusion πk from the

Spatial VS model with 22 pollutants, and estimate of standard deviation, rescaled (eωk × 100 − 100).

Avg. βsk αk πk ωk Avg. βs

k αk πk ωk

Sulfate 0.11 0.16 0.30 3.02 Chlorine -0.01 0.09 0.06 1.04Nitrate 0.15 -0.00 0.24 2.29 Chromium -0.04 0.22 0.06 1.89Silicon 0.03 -0.04 0.08 2.18 Copper 0.01 0.03 0.07 2.26Elemental Carbon 0.59 0.56 0.70 0.94 Iron 0.06 0.03 0.08 2.76Organic Carbon 0.05 0.07 0.22 3.54 Lead -0.09 0.36 0.08 4.34Sodium -0.01 -0.08 0.08 2.10 Magnesium -0.06 0.14 0.06 2.00Ammonium 0.14 0.15 0.17 4.36 Nickel -0.02 -0.05 0.07 1.90Aluminum -0.05 -0.03 0.08 2.66 Potassium -0.15 0.15 0.13 2.51Arsenic -0.07 -0.34 0.12 3.65 Titanium -0.02 0.05 0.12 5.65Bromine 0.06 -0.12 0.08 2.98 Vanadium 0.04 0.24 0.12 3.42Calcium 0.10 -0.03 0.08 2.72 Zinc 0.06 -0.04 0.09 2.27