Biometrics , 1–23
2013
Spatial variable selection methods for investigating acute health effects of fine
particulate matter components
Laura F. Boehm Vock
St. Olaf College
Brian J. Reich, Montserrat Fuentes
North Carolina State University
and
Francesca Dominici
Harvard University
Summary: Multi site time series studies have reported evidence of an association between short term exposure to
particulate matter (PM) and adverse health effects, but the effect size varies across the United States. Variability in
the effect may partially be due to differing community level exposure and health characteristics, but also due to the
chemical composition of PM which is known to vary greatly by location and time. The objective of this paper is to
identify particularly harmful components of this chemical mixture. Because of the large number of highly-correlated
components, we must incorporate some regularization into a statistical model. We assume that, at each spatial
location, the regression coefficients come from a mixture model with the flavor of stochastic search variable selection,
but utilize a copula to share information about variable inclusion and effect magnitude across locations. The model
differs from current spatial variable selection techniques by accommodating both local and global variable selection.
The model is used to study the association between fine PM (PM < 2.5 µm) components, measured at 115 counties
nationally over the period 2000-2008, and cardiovascular emergency room admissions among Medicare patients.
Key words: Bayesian modeling; Fine particulate matter; Stochastic search variable selection; Spatial data.
This paper has been submitted for consideration for publication in Biometrics
1. Introduction
Particulate matter (PM) is a complex mixture of airborne particles from both primary and
secondary anthropogenic and natural sources. Major sources include combustion of fossil
fuels and biomass, dust from industrial, construction, and mining operations, wildfires, and
lightning strikes (Schlesinger et al., 2006). Particles are either emitted directly into the air
(primary pollution) or formed by chemical interactions of gases and primary pollutant par-
ticles in the air (secondary pollution). Ambient levels of this complex mixture are currently
regulated by the Environmental Protection Agency (EPA) based solely on particle size, not
chemical composition or source. Yearly average and hourly maximum levels of the total
amount of both coarse (PM10; particles < 10µm in aerodynamic diameter) and fine (PM2.5;
particles < 2.5µm) are restricted.
Numerous studies have quantified the relationship of fine particulate matter exposure and
human health using multi-site time series studies (Schwartz et al., 1996; Dominici et al., 2000,
2006; Choi et al., 2009) comparing day to day changes in ambient measures of particulate
matter to day to day changes in hospital admissions and mortality data at the county and city
level. Both spatial and temporal heterogeneity in health effect estimates have been observed
(Bell et al., 2007, 2008; Zhou et al., 2011; Ito et al., 2011). We hypothesize that spatial
variation in the estimated health effects could be due to differing chemical composition of
PM2.5 across space (Fuentes et al., 2006; Bell et al., 2007; Peng et al., 2009; Bell et al., 2009;
Levy et al., 2012). While PM2.5 is currently regulated by total mass, deeper understanding
of the toxicity of the PM2.5 chemical composition could lead to air pollution regulations
that are more targeted to the most harmful emission sources.
A growing body of literature has investigated the potential health impact of specific PM2.5
components. While carbon fractions, including elemental carbon, black carbon, and organic
carbon matter, are shown to have positive associations in a variety of health studies, no
components have been ruled out in all studies (Rohr and Wyzga, 2012). Even looking at
individual chemical components does not eliminate between site variability in the health
effect estimates. There is some variability in particle size within component depending on
the source (Schlesinger, 2007), and also in particle chemistry and acidity, as these com-
ponents may be part of larger molecules (Schlesinger et al., 2006). Site-specific effects of
individual components may also vary due to population health characteristics which modify
susceptibility or by lifestyle or housing characteristics which modify exposure to ambient air
(Dominici et al., 2002, 2003; Zeka et al., 2005). For example, Dai et al. (2014) find PM2.5
health effects are greater in counties with higher rates of smoking and heavy alcohol use.
However, Krall et al. (2013) did not find significant regional differences in mortality effects
of PM2.5 constituents using single pollutant models.
Previous epidemiological and toxicological studies investigating components of fine par-
ticulate matter tend to focus on a few pollutants or pollutant groups. Peng et al. (2009)
investigate the relationship of the seven most massive PM2.5 components and CVD hos-
pitalizations. They use a hierarchical model to estimate national effects, but assume the
effects at every site are independent. Some city-specific analyses investigate a larger number
of pollutants (Ito et al., 2011; Zhou et al., 2011), but these results are hard to generalize to
other locations.
To our knowledge, a model which accomplishes variable selection both locally (within a
spatial location) and globally (on average for all the locations) across all sites within one
model has not previously been attempted. Here we extend previous methods for multi-
site variable selection to identify components of PM2.5 which may be important for health
outcomes at all or specific sites. Our flexible spatial approach facilitates global and local
selection simultaneously, unlike previous work. Reich et al. (2010) consider variable selection
approaches for multiple predictors globally, that is, included or excluded from the model at all
sites. The coefficients for the included variables are allowed to vary by site with spatial priors.
Smith and Fahrmeir (2007) propose local selection for a single predictor. The predictor may
be included in the regression at some sites and not others. This is accomplished by treating
the coefficient of interest as the product of a spatially correlated binary random field and
an independent Gaussian field. Similar work with multiple predictors in generalized linear
models has been undertaken by Lum (2012) and Scheel et al. (2013). In all these works, the
Gaussian field assumes the magnitudes of the coefficients are independent across sites, and
there is no cross dependence between the magnitude and inclusion probability implied in the
prior. We extend this approach to include multiple predictors and spatial correlation in the
effect size, while simultaneously allowing global selection as a special case. Rather than treat
each coefficient as the product of a binary field and a continuous field, we utilize a Gaussian
copula to create a smooth predictive surface.
The key difference between the proposed method and the global methods of Reich et al.
(2010) is that it addresses not only the question of which covariates are important, but
also where the covariates are important. There are many cases when a practicing statistician
would be interested in separating these two questions. For example, in an imaging study such
as Smith and Fahrmeir (2007) one may attempt to find a particular spatial region of the brain
which is affected by a covariate. Another example is meteorology, where certain factors may
be important only in some regions because of different regional climate systems or regimes.
In the air pollution application of this paper, the objective is to identify subregions where
different pollutants are associated with adverse health outcomes. These differences may arise
from different sources of pollution or different characteristics of the at-risk population. In
summary, the proposed method is best suited to cases where covariates are thought to have
an effect only in subregions of the overall spatial domain.
This spatial variable selection method facilitates a more comprehensive analysis of PM2.5
constituent data than previously attempted. We include 22 components of PM2.5 mea-
sured in 115 counties across the United States. For these data, we find that the spatial
model without variable selection produces highly variable effect estimates across space and
component, and fails to identify any particularly harmful components. By adding variable
selection to the model, we are able to isolate elemental carbon as the component with largest
effect. Compared to a non-spatial analysis, our spatial analysis provides more precise effect
estimates, and reveals strong regional variation in the strength of this association, with the
largest estimates in the Northeast, Midwest, and Pacific Coast.
2. Data
Of the approximately 50 speciated PM2.5 components measured by the EPA, we selected p =
22 components of interest. Each contributes at least 1% of total mass to PM2.5, or the litera-
ture has suggested a potential link with health outcomes, or both. The components and sum-
mary statistics are shown in Table 1. These speciated PM measurements are taken from the
EPA’s Air Quality System (AQS) and AirExplorer databases (www.epa.gov/ttn/airs/airsaqs/,
www.epa.gov/airexplorer/). The AQS data include raw monitor values and daily averages,
while AirExplorer is a processed data product designed for use by health and epidemiology
research. For twenty of the components we use the AQS data. Because of a high proportion
of missingness for elemental carbon (EC) and organic carbon matter (OCM) in the AQS
database, we use the AirExplorer data for these components. Any observations below the
lower limit of detection are recorded as one half the detection limit. Following Peng et al.
(2009), for counties with more than one active monitor on a given day, we average across
monitors: with more than 10 monitors we use a 10% trimmed mean; with 3–10 monitors we exclude
the minimum and maximum values before averaging; and with 2 monitors we use the simple mean. All
components are measured in µg/m³ except EC, which is measured in inverse megameters, a
measure of light extinction in haze.
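The monitor-averaging rule described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `county_daily_average` is hypothetical, and it assumes that the 10% trimmed mean removes 10% of the readings from each tail, which the text does not specify.

```python
from statistics import mean


def county_daily_average(values, trim_fraction=0.10):
    """Combine same-day readings from several monitors in one county.

    Hypothetical helper. Assumption: the trimmed mean drops
    `trim_fraction` of the readings from EACH tail.
    """
    vals = sorted(values)
    n = len(vals)
    if n == 0:
        raise ValueError("no monitor readings for this county-day")
    if n > 10:
        k = int(n * trim_fraction)   # readings dropped per tail
        return mean(vals[k:n - k])
    if n >= 3:
        return mean(vals[1:-1])      # drop the minimum and the maximum
    return mean(vals)                # one or two monitors: plain mean
```

For example, three readings of 1.0, 2.0, and 9.0 reduce to the single middle value, so one unusually high monitor does not dominate the county average.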
We use only information from non-source-oriented monitors, and exclude values flagged
by the EPA for data quality issues. Source-oriented monitors are placed with the intention
of monitoring a known large pollutant source, and may not be in a populated area. To avoid
biased pollution measurements, we exclude these and focus on non-source-oriented monitors,
which are placed with the purpose of estimating the exposure in populated areas. Any days
missing pollutant information were excluded. We consider 117 counties in the US with at least
100,000 residents and PM2.5 components monitors active on at least 150 days in the time
period 2000-2008. Of these we exclude two California counties with data quality issues. In
these counties half the pollution measurements were 1000 times larger than expected based
on nearby counties and measurements on preceding days. In total we include 115 counties.
As recommended by Gelfand et al. (2003) covariates are scaled, but not centered. We use
the 90th percentile value to scale rather than standard deviation because of the skewness
of the pollution data. Other scaling factors, such as the standard deviation or interquartile
range (IQR), could be used; the choice of scale is equivalent to placing different prior
variances on the coefficients for each pollutant, and scaling matters when pollutants
are present at very different concentrations. As in Peng et al. (2009), we removed extreme
pollution values from both the simulation study and analysis, where “extreme” is any value
more than double the second highest value for that pollutant in that county. This results in
a removal of approximately 0.4% of the observed data. We also conducted the data analysis
with the extreme values as part of our sensitivity analysis (see Supplementary Material).
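The two preprocessing steps in this paragraph, removal of extreme values and scaling by the 90th percentile, can be sketched as below. The function names are hypothetical. The percentile convention (linear interpolation), the order of the two steps, and the choice to apply them to one pollutant series at a time are assumptions that the text does not settle.

```python
def remove_extremes(values):
    """Drop any value more than double the second-highest value.

    Intended for one pollutant in one county; assumes non-negative data.
    """
    if len(values) < 2:
        return list(values)
    second_highest = sorted(values)[-2]
    return [v for v in values if v <= 2 * second_highest]


def percentile(values, q):
    """Percentile by linear interpolation (one of several conventions)."""
    vals = sorted(values)
    pos = (len(vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(vals) - 1)
    return vals[lo] + (vals[hi] - vals[lo]) * (pos - lo)


def scale_by_p90(values):
    """Scale, but do not center, by the 90th percentile."""
    p90 = percentile(values, 90)
    return [v / p90 for v in values]


# Assumed order: remove extremes first, then scale what remains.
series = [1.0, 2.0, 3.0, 10.0]
scaled = scale_by_p90(remove_extremes(series))
```

After scaling, a coefficient is interpreted per 90th-percentile increase in the pollutant, which is the unit used when effect sizes are discussed later in the paper.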
The health data include Medicare beneficiary enrollment and Medicare Part-A inpatient
records, aggregated to daily county totals. We count the number of patients hospitalized with
a principal ICD-9 diagnosis code related to cardiovascular disease (CVD). These include heart
failure, ischemic heart disease, and cerebrovascular events among others. The average CVD
hospitalization rates vary greatly by county, ranging from 8.5 to 32 per 100,000 Medicare
enrollees per day.
[Table 1 about here.]
3. Statistical model
We observe Yt(s) CVD cases at county s on day t, in addition to a vector of pollutants, xt(s),
and confounding variables zt(s). The confounding variables include indicator functions for
the day of the week and smooth functions of time, temperature, and dewpoint. To adjust for
long term and seasonal trends, we include a natural cubic spline smoothing function of time
with 7 df per year as a predictor in the model. We also include indicators for the day of the
week, and splines for daily average temperature (6 df) and the 3-day lag mean dewpoint (3
df) to control for confounding with weather. We chose these numbers of degrees of freedom
for consistency with Dominici et al. (2002). Other work, including Peng et al. (2009), uses
different numbers of degrees of freedom. Details of the sensitivity analysis are included in the
supplementary material.
We utilize the spatially varying coefficients model described by Gelfand et al. (2003),
$$Y_t(s) \mid \beta(s), \eta(s) \sim \mathrm{Poisson}\left[ N_t(s) \exp\{ \beta_0(s) + x_t(s)'\beta(s) + z_t(s)'\eta(s) \} \right] \qquad (1)$$
where Nt(s) is a known offset for the number of Medicare patients at county s on day t
and β0(s) is a spatially varying intercept. We include spatially varying coefficients
$\beta(s) = [\beta_1(s), \ldots, \beta_p(s)]^T$ and $\eta(s) = [\eta_1(s), \ldots, \eta_{q_s}(s)]^T$.
Note that the dimension of η(s), $q_s$,
depends on the county s due to the use of 7 knots per year for the spline function of
time, and that each county may be observed over a different time interval. We control for
unmeasured confounding independently within each location and then borrow information
across locations for the health estimates. Therefore we select a low precision normal density
as the prior for the confounding effects η(s), which is independent across counties. We let
βk(s) denote the coefficient on pollutant k, for k = 1, . . . , p, and county s, for s = 1, . . . , n.
The prior on βk(s) will imply both spatial correlation and variable selection.
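To make the likelihood in (1) concrete, the sketch below evaluates the Poisson mean and the log-likelihood contribution of one county-day. The function names and the numbers in the usage lines are hypothetical and chosen only for illustration; they are not taken from the paper's data.

```python
import math


def poisson_mean(n_enrolled, beta0, beta, x, eta, z):
    """Mean of Y_t(s) in (1): N_t(s) * exp(beta0 + x'beta + z'eta)."""
    linear = (beta0
              + sum(b * xi for b, xi in zip(beta, x))
              + sum(e * zi for e, zi in zip(eta, z)))
    return n_enrolled * math.exp(linear)


def poisson_loglik(y, mu):
    """Log-likelihood contribution of one observed count y with mean mu."""
    return y * math.log(mu) - mu - math.lgamma(y + 1)


# Illustrative values only: 100,000 enrollees, baseline rate 1e-4 per day,
# two scaled pollutants, one confounder term.
mu = poisson_mean(100000, math.log(1e-4), [0.01, 0.0], [1.0, 0.5], [0.2], [0.0])
loglik = poisson_loglik(12, mu)
```

Because the pollutants enter on the log scale, each coefficient is a log relative risk per unit of the scaled pollutant, and the offset converts the rate into an expected daily count.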
3.1 Spatial variable selection model
To facilitate variable selection, the marginal prior distribution for each coefficient βk(s) is a
mixture density,
$$f_k[\beta_k(s)] = \pi_k N(\alpha_k, \omega_k^2) + (1 - \pi_k) N(0, \omega_k^2 / C), \qquad (2)$$
where N(m, v) is the Gaussian density function with mean m and variance v. Global behavior
is exhibited when πk = 1 or 0. Coefficients will be smoothed either toward αk, the overall
mean when βk is important, or toward 0. Thus πkαk is the overall mean effect across
all counties. Variability is controlled by ωk, with C representing the ratio of variance for
coefficients “in” and “out” of the model. One interpretation of C is that if βk(s) is within
±3ωk/√C of 0 it may be safely replaced by zero (George and McCulloch, 1997). If this
tuning parameter C = ∞, the second part of the mixture distribution is a point mass at
zero. This is the formulation of SSVS used by Smith and Fahrmeir (2007) and Reich et al.
(2010), in which the coefficients are modeled as the product of binary and continuous fields.
In the simulation study, we let C = 100, while goodness of fit criteria are used to select
C = 400 for the data analysis in Section 5.
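The mixture prior in (2) and the role of C can be illustrated with a short sketch. The function names are hypothetical; the parameter values in the usage lines borrow ωk = 0.025 and C = 100 from the simulation settings purely as an example.

```python
import math
from statistics import NormalDist


def mixture_pdf(b, pi_k, alpha_k, omega_k, C):
    """Density (2): pi_k*N(alpha_k, omega_k^2) + (1-pi_k)*N(0, omega_k^2/C)."""
    slab = NormalDist(alpha_k, omega_k)
    spike = NormalDist(0.0, omega_k / math.sqrt(C))
    return pi_k * slab.pdf(b) + (1.0 - pi_k) * spike.pdf(b)


def mixture_cdf(b, pi_k, alpha_k, omega_k, C):
    """Distribution function F_k corresponding to (2)."""
    slab = NormalDist(alpha_k, omega_k)
    spike = NormalDist(0.0, omega_k / math.sqrt(C))
    return pi_k * slab.cdf(b) + (1.0 - pi_k) * spike.cdf(b)


def negligible_threshold(omega_k, C):
    """Half-width 3*omega_k/sqrt(C) inside which a coefficient is treated as zero."""
    return 3.0 * omega_k / math.sqrt(C)


# With omega_k = 0.025 and C = 100 the spike has sd 0.0025, so coefficients
# within about 0.0075 of zero are effectively excluded.
width = negligible_threshold(0.025, 100)
```

Larger C narrows the spike component, so "out of the model" means closer to exactly zero; C = ∞ recovers the point mass.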
In our model we apply a copula (Nelsen, 1999; Sklar, 1973, 1953) to the marginal distri-
bution in (2). A copula is a general technique for modeling dependence between random
variables while preserving desired marginal distributions. Many copulas have been proposed
to capture various forms of dependence, including the class of Archimedean copulas and
copulas constructed from multivariate densities such as Gaussian and t (Nelsen, 1999). We
select a Gaussian copula because the vast majority of spatial modeling deals with dependence
through the covariance function of a Gaussian process, and the Gaussian copula allows us
to leverage these models for our spatial variable selection model. To implement the copula
model, we introduce latent variables, θk(s), which follow a mean zero Gaussian process with
correlation function ρ(s, s′). Here we use an exponential spatial correlation function,
$$\rho(s, s') = \mathrm{cor}[\theta_k(s), \theta_k(s')] = \exp(-\|s - s'\| / r).$$
The range parameter r has the interpretation that at distance 3r correlation is about 0.05.
Although we choose an exponential correlation structure, any correlation function may be
used, including a generalization to non-stationary areal correlation structure such as that
implied by a conditional autoregressive model (CAR). We fix the variance of θk to one for
identifiability purposes in the Gaussian copula. We write a Gaussian process with mean m,
standard deviation v, and exponential spatial correlation with range r as GPex(m, v, r).
Hence, θk(s) ∼ GPex(0, 1, r). We force βk(s) to have the desired marginal distribution by
the transformation
$$\beta_k(s) = F_k^{-1}\{\Phi[\theta_k(s)]\}, \qquad (3)$$
where Φ is the standard normal cdf and Fk is the marginal cdf of βk(s) defined in (2) and
dependent on πk, αk, and ωk.
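The transformation in (3) has no closed form when C is finite, because Fk is a two-component normal mixture. One simple way to evaluate it numerically is bisection on the mixture cdf, sketched below. This is an illustrative choice, not necessarily the approach in the authors' supplementary code, and the function names and the search bracket are assumptions.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def mixture_cdf(b, pi_k, alpha_k, omega_k, C):
    """Marginal cdf F_k of beta_k(s) implied by the mixture in (2)."""
    slab = NormalDist(alpha_k, omega_k)
    spike = NormalDist(0.0, omega_k / math.sqrt(C))
    return pi_k * slab.cdf(b) + (1.0 - pi_k) * spike.cdf(b)


def beta_from_theta(theta, pi_k, alpha_k, omega_k, C, tol=1e-12):
    """Evaluate (3): beta = F_k^{-1}(Phi(theta)), by bisection.

    Assumption: the root lies within 10*omega_k of the interval spanned
    by 0 and alpha_k, which holds unless theta is extremely large.
    """
    u = STD_NORMAL.cdf(theta)
    lo = min(0.0, alpha_k) - 10.0 * omega_k
    hi = max(0.0, alpha_k) + 10.0 * omega_k
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mixture_cdf(mid, pi_k, alpha_k, omega_k, C) < u:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)
```

Because Fk is strictly increasing, the map from θk(s) to βk(s) is monotone, so spatial smoothness in the latent process carries over to the coefficients. When πk = 1 the transformation reduces to βk(s) = αk + ωk θk(s).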
3.2 A comparison of copula and other model solutions
As noted previously, the usual SSVS formulation of the model in (2) is to set C = ∞ and
write βk(s) as the product of a binary and a continuous field. That is, let βk(s) = γk(s)Bk(s)
where γk(s) is a binary spatial process and Bk(s) is a continuous process.
In the local approach by Smith and Fahrmeir (2007) the binary indicator
varies spatially according to an Ising (auto-logistic) prior, which is a binary analog to the
Conditionally Autoregressive (CAR) model used for continuous data measured on a lattice.
The continuous part, measuring the effect size, is assumed to vary across sites, but no spatial
correlation is directly imposed in the prior. Using our notation, we can write γk(s) ∼ Ising,
and $B_k(s) \overset{iid}{\sim} N(0, \omega^2)$. The model described by Smith and Fahrmeir (2007) has p = 1,
while Lum (2012) and Scheel et al. (2013) extend to multiple predictors. In Smith and
Fahrmeir (2007) the approach is applied to functional MRI data in which spatial indication
of importance through γk(s) was of primary interest. The spatial variable selection approach
we propose encourages sharing of information across sites by centering the distribution of
βk(s) around αk and inducing spatial correlation through the copula.
While it is possible to induce spatial correlation through Bk(s), treating βk(s) as the product
of two fields or variables requires choosing a correlation and cross-correlation structure for
the two processes. The copula naturally incorporates the intuition that if a component has
no effect within a given county, nearby counties are unlikely to have large coefficients and
creates a continuous prior surface. The realization of the spatial process on a spatial grid
in Supplemental Materials A shows large areas with β(s) = 0 (light blue) corresponding
to regions of null effects. The effect is near zero for sites near null regions, and β(s) varies
smoothly across sites in non-null regions.
The global approach to spatial variable selection as implemented by Reich et al. (2010)
fosters sharing of information across sites, but assumes the same set of predictors are included
in the model at every site. Using our notation, their model can be summarized as γk(s) =
γk ∼ Bernoulli(π) and Bk(s) is a Gaussian Process with spatial correlation. The Reich et al.
(2010) model also includes a second binary process which determines whether the variance
in the Gaussian process is non-zero. That is, coefficients are equal to zero at all sites, equal
to αk at all sites, or are spatially varying with mean αk. Because this approach is global, it
does not allow for the case that pollutants would have zero coefficients at some sites and
non-zero effects at others.
The SpVS approach we implement here allows for both local and global inferences while
incorporating spatial smoothing through a single continuous process. While the non-copula
approaches can be written with conjugate full-conditional Gibbs sampling updates for all
parameters conditional on β, the copula approach is still computationally feasible through
MCMC and code is provided in the supplementary material.
4. Simulation study
In the simulation study, we will investigate the performance of our model in terms of
mean absolute deviation (MAD), power, Type I error, and ability to correctly identify null
coefficients compared to a spatial model without variable selection and a variable selection
model without spatial correlation. Because the primary focus of this study is to evaluate
local variable selection, we compare performance for varying degrees of spatial correlation in
the site-specific coefficients βk(s).
We generate coefficients βk(s) from multivariate normal latent variables θk(s) with expo-
nential correlation using the transformation in (3) where F is the cdf of (2) with C = ∞.
Though we fit the continuous version of the model where C <∞, we chose to generate the
data for the C =∞ case so that the true values of the coefficients will be exactly zero when
they are out of the model, whether globally or locally.
Each model has p = 9 covariates. By varying π1, . . . , π9 we determine the behavior for
each of the nine pollutant coefficient vectors. We define “global” behavior as being
generated with πk = 1, so that the βk(s) are drawn from a multivariate normal distribution,
$\beta_k(s) \sim N(\alpha_k, \omega_k^2 \Sigma)$, where Σ is an exponential correlation matrix. Globally “null” covariates
have πk = 0 and hence βk(s) = 0 for all s. Covariates exhibiting “local” behavior will have a
non-zero mean αk, but 0 < πk < 1 so that the marginal distribution at each site is a mixture
distribution.
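The data-generating step for the coefficients can be sketched as follows: draw a latent Gaussian vector with exponential correlation, then push each element through the inverse cdf of the C = ∞ mixture, which has a point mass at zero. The function names are hypothetical, and the sketch assumes site locations are given as planar coordinates in km, which is a simplification.

```python
import math
import random
from statistics import NormalDist

STD = NormalDist()


def exp_corr_matrix(coords, r):
    """Exponential correlation exp(-distance / r) between all site pairs."""
    return [[math.exp(-math.dist(a, b) / r) for b in coords] for a in coords]


def cholesky(A):
    """Lower-triangular L with L L' = A (A must be positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def quantile_point_mass(u, pi_k, alpha_k, omega_k):
    """Inverse cdf of pi_k*N(alpha_k, omega_k^2) + (1-pi_k)*(point mass at 0)."""
    if pi_k == 0.0:
        return 0.0
    below = pi_k * STD.cdf(-alpha_k / omega_k)   # slab mass below zero
    if u < below:
        return alpha_k + omega_k * STD.inv_cdf(u / pi_k)
    if u <= below + (1.0 - pi_k):
        return 0.0                               # u falls in the jump at zero
    return alpha_k + omega_k * STD.inv_cdf((u - (1.0 - pi_k)) / pi_k)


def simulate_beta(coords, r, pi_k, alpha_k, omega_k, rng):
    """Draw beta_k(s) at every site via the Gaussian copula in (3)."""
    L = cholesky(exp_corr_matrix(coords, r))
    z = [rng.gauss(0.0, 1.0) for _ in coords]
    theta = [sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(len(coords))]
    betas = []
    for t in theta:
        u = min(max(STD.cdf(t), 1e-12), 1.0 - 1e-12)  # keep inv_cdf well defined
        betas.append(quantile_point_mass(u, pi_k, alpha_k, omega_k))
    return betas
```

With 0 < πk < 1, neighboring sites tend to fall on the same side of the jump at zero, which produces contiguous regions of exact zeros alongside regions of smoothly varying non-zero effects.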
For computational purposes, we restrict the simulation study to 48 counties in the Eastern
United States. The maximum distance between these counties is 1360 km. We include the
components which contribute the greatest mass to total PM2.5: sulfate, nitrate, elemental
carbon, organic carbon, silicon, and sodium, as well as arsenic, bromine, and calcium. By
using the actual pollutant data to generate responses, we evaluate the models under the more
realistic scenario in which covariates of interest are moderately to strongly correlated within
site across time. Table 2(a) specifies the components and true parameters αk and πk used
to generate the data. The overall mean αk is 0, 0.05, or 0.10, and the inclusion probability
πk is 0, 0.3, 0.7, or 1, indicating null, local, or global inclusion. For all coefficients we set
ωk = 0.025. We use the Poisson regression model (1) and the observed pollution data xt(s)
to generate values of Yt(s) based on our draws of β(s). That is, for each of M datasets we
generate βk(s) and then Yt(s)|β(s),xt(s). For simplicity, we do not include any confounding
variables in the simulation study.
Following the procedures outlined above, we generate M = 50 datasets using each of the
following designs for the correlation in βk(s):
(1) No Spatial Correlation: The true coefficients for each county are independent.
(2) Moderate Spatial Correlation: The true coefficients for each county are exponentially
spatially correlated with range r = 60 km (effective range 180 km).
(3) Strong Spatial Correlation: The true coefficients for each county are exponentially
spatially correlated with range r = 240 km (effective range 720 km).
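The effective ranges quoted in these designs follow from the exponential correlation function: at distance 3r the correlation is exp(−3), which is roughly 0.05. A quick numerical check, with hypothetical function names:

```python
import math


def exp_corr(distance_km, r_km):
    """Exponential correlation exp(-d / r)."""
    return math.exp(-distance_km / r_km)


def effective_range(r_km, target=0.05):
    """Distance at which the correlation falls to `target`."""
    return -r_km * math.log(target)


corr_at_3r = exp_corr(180.0, 60.0)       # exp(-3), just under 0.05
range_moderate = effective_range(60.0)   # close to 180 km
range_strong = effective_range(240.0)    # close to 720 km
```

The rule of thumb "effective range = 3r" is therefore a rounding of −r log(0.05), which is about 2.996 r.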
In all cases we also include a random intercept β0(s), with E[β0(s)] = −8 yielding an
approximate average risk similar to the median risk observed. For the model with no spatial
correlation, the intercepts are also uncorrelated. The intercepts in designs 2 and 3 are
exponentially spatially correlated with r = 120 km, which implies an effective range of 360 km.
The standard deviation of β0(s), ω0 = 1, is similar to the between county standard deviation
of average CVD rate.
We investigate the following three models:
(1) Spatial Model (Sp): βk(s) ∼ GPex(αk, ωk, r), without variable selection (i.e., πk =
1, k = 1, . . . , p).
(2) Exchangeable VS Model (EVS): βk(s) are independent and follow the distribution
in (2).
(3) Spatial VS Model (SpVS): The full model in Section 3. βk(s) marginally follow a
mixture distribution described in (2), and dependence is induced with a Gaussian copula.
In the simulation study, we use relatively uninformative priors, with αk ∼ N(0, 1), πk ∼
Beta(1, 1), $1/\omega_k^2 \sim \mathrm{Gamma}(0.001, 0.001)$, and r ∼ Uniform(0, 2). For each dataset we run
the model for 5500 iterations discarding the first 1000. We compare these three models using
MAD, power and type-I error of βk(s), and the ability to identify locally null coefficients.
4.1 Simulation results
Table 2(a) gives the MAD separately for global (π = 1), local (0 < π < 1), and null (π =
0) predictors. For global predictors, the spatial models (Sp and SpVS) have similar MAD
and outperform the non-spatial model (EVS), as expected. The MAD for null covariates is
considerably smaller for the variable selection models (EVS and SpVS). The spatial variable
selection model outperforms both competitors for the local predictors, where both shrinking
effects to zero and borrowing strength across space are beneficial.
Table 2(b) shows that by exploiting dependence between nearby locations, the SpVS model
has substantially higher power for detecting important covariates than the EVS model. Type I
error rates for local and global effects are near the nominal 0.05 level for all models, and
much less than 0.05 for the globally zero coefficients for all three models (not shown). For
the local covariates, power does not differ significantly between πk = 0.3 and
πk = 0.7, so power is reported only for each level of αk. When the coefficients are
uncorrelated, the spatial model has the highest power, with the EVS and SpVS models
having slightly lower power, very similar to each other. When the coefficients have moderate or
strong correlation, both spatial models significantly outperform the EVS model.
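The summary measures used in this subsection can be sketched as below. The text does not spell out the detection rule, so the sketch assumes a coefficient is flagged when its equal-tailed 95% posterior interval excludes zero, and that MAD compares posterior point estimates with the true coefficients. All function names are hypothetical.

```python
import math


def mean_absolute_deviation(estimates, truth):
    """Average of |estimate - truth| over site-specific coefficients."""
    return sum(abs(e - t) for e, t in zip(estimates, truth)) / len(truth)


def credible_interval(draws, level=0.95):
    """Equal-tailed interval from sorted posterior draws (simple order statistics)."""
    s = sorted(draws)
    tail = (1.0 - level) / 2.0
    lo = s[int(math.floor(tail * (len(s) - 1)))]
    hi = s[int(math.ceil((1.0 - tail) * (len(s) - 1)))]
    return lo, hi


def flagged(draws, level=0.95):
    """Assumed detection rule: the interval excludes zero."""
    lo, hi = credible_interval(draws, level)
    return lo > 0 or hi < 0


def power_and_type1(posterior_draws, truth):
    """Power over truly non-zero coefficients, Type I error over true zeros."""
    hits = [flagged(d) for d in posterior_draws]
    nonnull = [h for h, t in zip(hits, truth) if t != 0]
    null = [h for h, t in zip(hits, truth) if t == 0]
    power = sum(nonnull) / len(nonnull) if nonnull else float("nan")
    type1 = sum(null) / len(null) if null else float("nan")
    return power, type1
```

Under this rule, a locally null coefficient that is flagged counts toward the Type I error rate even when the same pollutant has non-zero effects at other sites.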
Finally, we check the ability of three goodness of fit measures to select the correct model:
Deviance Information Criterion (DIC) (Spiegelhalter et al., 2002); the predictive measure of
Gelfand and Ghosh (1998) (GG); and log pseudo marginal likelihood (LPML). Smaller values
indicate better fit for DIC and GG, while the preferred model is indicated by larger values of
LPML. The generating model for Scenario 1 is EVS, while for Scenarios 2 and 3 the correct
model is SpVS. In this study, DIC and LPML most often pick the correct “best” model,
while GG is less accurate. Under Scenario 1, LPML and DIC pick SpVS and EVS equally
often, while GG is split across all three models. LPML selects SpVS 38/50 and 48/50 times
for Scenarios 2 and 3. DIC performs similarly, 37/50 and 48/50 times respectively. Neither
measure selected the Sp model under any scenario. GG incorrectly selects the EVS model
as best 32/50 and 43/50 times in Scenarios 2 and 3, and chooses the Sp model 14/50 and
6/50 times, selecting the correct SpVS model only 4/50 and 1/50 times.
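For readers unfamiliar with these criteria, the sketch below computes DIC and LPML from MCMC output for a Poisson likelihood. LPML is the sum of log conditional predictive ordinates, each estimated by the harmonic mean of the likelihood across draws. The DIC shown plugs in the posterior mean of the Poisson means rather than of the underlying parameters, which is a simplifying assumption of this sketch. Function names and the input layout are hypothetical.

```python
import math


def poisson_logpmf(y, mu):
    """Log Poisson probability of count y with mean mu."""
    return y * math.log(mu) - mu - math.lgamma(y + 1)


def log_mean_exp(values):
    """Numerically stable log of the average of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def lpml(y, mu_draws):
    """LPML = sum_i log CPO_i, where CPO_i = 1 / mean_m{1 / p(y_i | draw m)}.

    mu_draws[m][i] is the Poisson mean for observation i under draw m.
    """
    total = 0.0
    for i, yi in enumerate(y):
        neg_loglik = [-poisson_logpmf(yi, draw[i]) for draw in mu_draws]
        total -= log_mean_exp(neg_loglik)
    return total


def dic(y, mu_draws):
    """DIC = mean deviance + pD, with pD = mean deviance - plug-in deviance."""
    deviances = [-2.0 * sum(poisson_logpmf(yi, draw[i]) for i, yi in enumerate(y))
                 for draw in mu_draws]
    mean_dev = sum(deviances) / len(deviances)
    mu_bar = [sum(draw[i] for draw in mu_draws) / len(mu_draws)
              for i in range(len(y))]
    plug_in_dev = -2.0 * sum(poisson_logpmf(yi, m) for yi, m in zip(y, mu_bar))
    return mean_dev + (mean_dev - plug_in_dev)
```

Smaller DIC and larger LPML both indicate better fit, which is why the two criteria are read in opposite directions in the comparison above.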
[Table 2 about here.]
5. Analysis of CVD hospitalization data
5.1 Model comparisons
We fit the Spatial, EVS, and Spatial VS models to the CVD hospitalization and PM2.5
component data described in Section 2. Due to the large number of confounding variables
per site, we chose to fit a two-stage approximation to the fully Bayesian model. The Poisson
likelihood function in (1) is approximated by a normal likelihood, $\hat{\beta}(s) \sim N_p[\beta(s), V(s)]$,
where $\hat{\beta}(s)$ is the maximum likelihood estimator and V(s) is the estimated covariance matrix.
This two-stage model has the advantage of much faster computation time and model fit does
not depend on MCMC convergence of the confounding coefficients, as the approximation is
fit marginally over the confounding coefficients η(s).
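The two-stage idea can be illustrated with a deliberately reduced example: one pollutant and an intercept standing in for all confounders, fitted by Newton-Raphson within a single county. The real analysis has 22 pollutants and many spline terms per county, so this is only a sketch of the mechanics, and all function names are hypothetical.

```python
import math


def _score_and_information(b0, b1, y, x, n):
    """Score vector and Fisher information for log E[y] = log n + b0 + b1*x."""
    mu = [nt * math.exp(b0 + b1 * xt) for nt, xt in zip(n, x)]
    g0 = sum(yt - m for yt, m in zip(y, mu))
    g1 = sum((yt - m) * xt for yt, m, xt in zip(y, mu, x))
    i00 = sum(mu)
    i01 = sum(m * xt for m, xt in zip(mu, x))
    i11 = sum(m * xt * xt for m, xt in zip(mu, x))
    return g0, g1, i00, i01, i11


def stage_one_fit(y, x, n, max_iter=50, tol=1e-10):
    """Stage 1: county-level MLE of the pollutant slope and its variance."""
    b0, b1 = math.log(sum(y) / sum(n)), 0.0
    for _ in range(max_iter):
        g0, g1, i00, i01, i11 = _score_and_information(b0, b1, y, x, n)
        det = i00 * i11 - i01 * i01
        step0 = (i11 * g0 - i01 * g1) / det
        step1 = (i00 * g1 - i01 * g0) / det
        b0, b1 = b0 + step0, b1 + step1
        if abs(step0) + abs(step1) < tol:
            break
    _, _, i00, i01, i11 = _score_and_information(b0, b1, y, x, n)
    # Slope entry of the inverse information: marginal over the intercept.
    var_b1 = i00 / (i00 * i11 - i01 * i01)
    return b1, var_b1


def stage_two_loglik(beta_hat, var_hat, beta):
    """Stage 2: normal approximation beta_hat ~ N(beta, var_hat)."""
    return (-0.5 * math.log(2.0 * math.pi * var_hat)
            - 0.5 * (beta_hat - beta) ** 2 / var_hat)
```

Taking the slope entry of the inverse information matrix is what makes the approximation marginal over the nuisance terms: uncertainty about the intercept (or, in the full model, the confounder coefficients) is already folded into the variance passed to the second stage.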
In all of the models, we choose a flat prior between 0 and 1 for πk, but note that one
could use a beta prior for πk to encourage global selection by choosing both beta parameters
to be less than one. Local selection (0 < πk < 1) could also be weighted more heavily in
the prior, and greater prior weight could be placed on inclusion or exclusion by choosing an
asymmetric beta distribution for the prior. The prior for αk is normal with standard deviation
0.025, which is approximately two to four times larger than the effect sizes we expect to see.
Recalling that we model the log relative risk, 0.1 would correspond to a more than 10%
increase in risk of CVD hospitalization for a 90th percentile increase in pollution, which
previous work suggests is quite large. We have found that model fit is somewhat sensitive to
the choice of the prior on ωk. When ωk is large it becomes difficult to differentiate between
the 0 and αk modes of the distribution, and thus πk and αk become difficult to estimate.
Though αk and πk are therefore somewhat sensitive to the choice of prior on ωk, we found
that estimates of βk(s) are more stable over different choices of prior. We therefore choose a
gamma(0.1, 0.001) prior on the inverse variance, $1/\omega_k^2$, to be relatively uninformative while
encouraging ωk to be small. We note in the simulation study that larger effect sizes are more
robust to an uninformative prior on $\omega_k^2$, and that this prior is equivalent to fitting gamma(0.1,
0.1), a vague prior, on data with 10 times larger effect sizes, roughly equivalent to the effect
sizes of the simulation study. The prior for the spatial range parameter is uniform from 0 to
1200 km. We fit the spatial VS model with several values of C: C = 100, 225,
400, and 900. DIC was nearly identical for the first three choices, and highest for C=900.
Here we present the model with C = 400. Though C = 100 minimizes DIC for the EVS
model, the EVS model presented here also has C = 400 for consistency, and the model fits
are very similar. We generate 50,000 MCMC samples, discarding the first 10,000 for burn in.
We compare models using DIC, LPML, and GG, the goodness of fit measures described in
Section 4.1. We calculate LPML and GG statistics using the Poisson likelihood in (1) with
ηk(s) replaced by its maximum likelihood estimate, $\hat{\eta}_k(s)$. We calculate the DIC using the
normal likelihood approximation. The goodness of fit measures for each model are shown in
Table 3. The spatial VS model is selected as the best model by DIC and LPML, a substantial
improvement over EVS and spatial. The spatial model without variable selection is selected
as the best model by GG, but as shown in the simulation study, this result is questionable.
We also note that the estimated spatial correlation is very strong, further evidence that the
spatial models are a better choice than the exchangeable model.
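For readers unfamiliar with these criteria, the following is a minimal sketch of how DIC and LPML can be computed from MCMC output using their standard definitions. It is not the authors' code: it uses the Poisson likelihood for both criteria (the analysis above uses a normal approximation for DIC), and the function names, the toy counts, and the toy draws are all hypothetical.

```python
import math

def poisson_logpmf(y, mu):
    """Log of the Poisson probability mass function at count y with mean mu."""
    return y * math.log(mu) - mu - math.lgamma(y + 1)

def deviance(y, mu):
    """Deviance D = -2 * log-likelihood for independent Poisson counts."""
    return -2.0 * sum(poisson_logpmf(yi, mi) for yi, mi in zip(y, mu))

def dic(y, mu_draws):
    """Return (DIC, pD); mu_draws[s][i] is the mean of observation i in draw s."""
    n_draws = len(mu_draws)
    d_bar = sum(deviance(y, mu) for mu in mu_draws) / n_draws
    mu_bar = [sum(mu[i] for mu in mu_draws) / n_draws for i in range(len(y))]
    p_d = d_bar - deviance(y, mu_bar)  # effective number of parameters
    return d_bar + p_d, p_d

def lpml(y, mu_draws):
    """Log pseudo marginal likelihood: sum over observations of log CPO_i."""
    n_draws = len(mu_draws)
    total = 0.0
    for i, yi in enumerate(y):
        # CPO_i is the harmonic mean of the likelihood over draws;
        # a log-sum-exp on the negative log-likelihoods keeps this stable.
        neg_ll = [-poisson_logpmf(yi, mu[i]) for mu in mu_draws]
        m = max(neg_ll)
        log_mean_inverse = m + math.log(sum(math.exp(v - m) for v in neg_ll) / n_draws)
        total -= log_mean_inverse
    return total

# Toy example with made-up counts and draws (not data from the paper).
y = [2, 0, 3]
mu_draws = [[1.8, 0.4, 2.9], [2.2, 0.6, 3.3], [2.0, 0.5, 3.0]]
dic_value, p_d = dic(y, mu_draws)
lpml_value = lpml(y, mu_draws)
```

Smaller DIC and larger LPML indicate better fit, which matches the way Table 3 is read in the text.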
In addition to these model comparisons, we conducted an extensive sensitivity analysis,
which is summarized in the supplementary material. We investigated sensitivity to several
modeling choices, including the hyperpriors, the confounders, and the variance ratio,
C. We found some sensitivity to the hyperparameters, but the effect estimates of interest
were robust to these choices.
5.2 Results
[Figure 1 about here.]
[Figure 2 about here.]
[Figure 3 about here.]
The overall (spatial average) effects for all pollutants are reported in Table 3, where local
covariates are defined as 0.20 < πk < 0.80 and global covariates as πk > 0.80. In Table
4 we present the posterior estimates for the overall mean when included, αk, the inclusion
probability, πk, and the random effects standard deviation, ωk, for our spatial VS model.
Across all models, we consistently find positive overall effects for elemental carbon (EC). In
the EVS model, the overall effect estimate of elemental carbon, α4, is statistically significant
(95% posterior credible interval excludes 0). For the spatial and spatial VS models, the posterior
probability that α4 > 0 is 0.88. No other overall effects are statistically significant, though
many county-specific relative risk estimates are significant. These results are in general
agreement with Peng et al. (2009), who found a 0.8% increase in CVD hospitalization risk
for an IQR increase in elemental carbon over the period 2002-2006 in a six-pollutant
model.
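The quantities used in this paragraph can be computed as in the following sketch: the percent-increase transform e^x × 100 − 100, the local/global labels defined by the πk thresholds, and the posterior probability and credible interval obtained from MCMC draws. The function names are hypothetical, the handling of values exactly on a threshold is an assumption, and the inclusion probabilities are copied from Table 4 for five pollutants only.

```python
import math

def percent_increase(log_rr):
    """Convert a log relative risk to a percent increase in risk."""
    return math.exp(log_rr) * 100.0 - 100.0

def selection_label(pi_k, lower=0.20, upper=0.80):
    """Label a covariate from its posterior inclusion probability.

    Thresholds follow the text; assigning pi_k == upper to "local" and
    pi_k <= lower to "not selected" is an assumption for boundary cases.
    """
    if pi_k > upper:
        return "global"
    if pi_k > lower:
        return "local"
    return "not selected"

def posterior_summary(draws, level=0.95):
    """Return P(draw > 0), an equal-tailed interval, and whether it excludes 0."""
    ordered = sorted(draws)
    n = len(ordered)
    tail = (1.0 - level) / 2.0
    lower = ordered[int(math.floor(tail * (n - 1)))]
    upper = ordered[int(math.ceil((1.0 - tail) * (n - 1)))]
    prob_positive = sum(d > 0 for d in draws) / n
    excludes_zero = not (lower <= 0.0 <= upper)
    return prob_positive, (lower, upper), excludes_zero

# Posterior inclusion probabilities from Table 4 (spatial VS model).
pi_spvs = {
    "Sulfate": 0.30,
    "Nitrate": 0.24,
    "Elemental Carbon": 0.70,
    "Organic Carbon": 0.22,
    "Ammonium": 0.17,
}
labels = {name: selection_label(p) for name, p in pi_spvs.items()}
```

With these values, the first four pollutants are labeled local and ammonium is not selected, which agrees with the L markers in the SpVS column of Table 3.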
The three models produce markedly different spatial patterns for the effect of EC, as shown
in Figure 1. The spatial model without variable selection shows considerable variation in the
effects across components and locations. In the presence of the large number of potential
predictors, EC does not stand out as particularly important in the spatial model without
variable selection (Figure 1a). EC is clearly identified as the most relevant pollutant in both
variable selection models. In the EVS model, the effect size varies considerably by city with
no discernible spatial pattern. In contrast, the SpVS model reveals a strong spatial pattern,
with the largest effects in the Northeast, Mid Atlantic, East North Central, West North
Central, and Pacific regions. Thus the spatial variable selection model both isolates the
PM component with the greatest impact and refines the spatial distribution of
its effect size.
Figure 2 maps the county-specific EC effect estimates. The estimates vary smoothly across
space, with the largest values in the Northeast, Midwest, and Pacific Coast. The estimates
are small in other areas, with the notable exception of El Paso, Texas, which has one of the
largest effect estimates. The 95% posterior intervals for many counties exclude zero for the
SpVS model, as shown in Figure 3c. Comparing the intervals for the EVS and SpVS models
in Figure 3 illustrates the benefit of spatial smoothing; by borrowing strength across nearby
counties, the intervals are narrower for the SpVS model and thus, unlike the EVS model,
several individual counties have statistically significant EC effects.
[Table 3 about here.]
[Table 4 about here.]
6. Discussion
In this paper, we develop a statistical model which allows for local variable selection of spa-
tially correlated coefficients. The copula framework creates a smooth prior surface reflecting
the intuition that effect estimates near counties where the effect is zero will also be small.
A simulation study confirms that combining spatial correlation with variable selection, while adding
computational complexity, yields better model fit and more accurate effect estimates than either
the spatial model without variable selection or the exchangeable variable selection model. While
the MAD for globally included covariates is similar between the spatial and spatial VS models, as
expected, we found more precise estimates and more powerful tests of association when the data
were generated with locally-relevant predictors. Under strong spatial correlation, our model had nearly half
the MAD of the spatial model and 20% greater power than the EVS model.
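The copula idea summarized above can be illustrated with a short prior simulation. This is a sketch under stated assumptions, not the specification used in the paper: it assumes a Gaussian copula with exponential correlation exp(−d/ρ), and a marginal that mixes a narrow spike N(0, ωk²/C) with a slab N(αk, ωk²). The exact marginal and correlation function are defined in the earlier sections and may differ. All function names and parameter values are hypothetical.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def mixture_cdf(b, alpha_k, omega_k, pi_k, C):
    """CDF of the assumed marginal: (1 - pi) N(0, omega^2 / C) + pi N(alpha, omega^2)."""
    spike = NormalDist(0.0, omega_k / math.sqrt(C)).cdf(b)
    slab = NormalDist(alpha_k, omega_k).cdf(b)
    return (1.0 - pi_k) * spike + pi_k * slab

def mixture_quantile(u, alpha_k, omega_k, pi_k, C, tol=1e-10):
    """Invert the mixture CDF by bisection (it has no closed-form inverse)."""
    lo = min(0.0, alpha_k) - 10.0 * omega_k
    hi = max(0.0, alpha_k) + 10.0 * omega_k
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mixture_cdf(mid, alpha_k, omega_k, pi_k, C) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def exp_corr(coords, rho):
    """Exponential correlation matrix for planar coordinates (same units as rho)."""
    return [[math.exp(-math.dist(a, b) / rho) for b in coords] for a in coords]

def cholesky(A):
    """Lower-triangular Cholesky factor of a positive definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def draw_coefficients(coords, alpha_k, omega_k, pi_k, C, rho, rng):
    """One prior draw of the coefficients at all locations via a Gaussian copula."""
    L = cholesky(exp_corr(coords, rho))
    noise = [rng.gauss(0.0, 1.0) for _ in coords]
    # Spatially correlated standard normal field z(s).
    z = [sum(L[i][k] * noise[k] for k in range(i + 1)) for i in range(len(coords))]
    # Map each z(s) to a uniform, then to the mixture marginal.
    return [mixture_quantile(STD_NORMAL.cdf(zi), alpha_k, omega_k, pi_k, C) for zi in z]

# Hypothetical locations (km) and parameter values, for illustration only.
coords = [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0), (900.0, 900.0)]
beta = draw_coefficients(coords, alpha_k=0.5, omega_k=1.0, pi_k=0.7, C=400,
                         rho=500.0, rng=random.Random(1))
```

Because nearby locations share similar values of z(s), they tend to fall in the same mixture component, so a location close to one with a near-zero coefficient is itself likely to have a small coefficient.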
In the data application, the SpVS model is clearly the most appealing. Figure 1 shows
that the spatial models with and without variable selection give dramatically different
results. With variable selection, a single pollutant (elemental carbon) is identified as the
most relevant, whereas many pollutants have some effect in the model without variable
selection. Compared to the EVS model, we find a substantial reduction in variance due to
spatial smoothing.
This investigation can help us understand how observed spatial and temporal variations
in the health effects of total PM2.5 concentration may be related to variation in chemical
components. No previous study to our knowledge has utilized variable selection techniques
on such a large list of components, and few have incorporated spatial correlation.
When interpreting these overall effect estimates, it is also important to note that these
numbers reflect only the average risk increase or decrease for the 115 counties included in the
study, and are not a reflection of an average national risk. The counties selected for the study
are generally densely populated urban counties which may differ significantly in pollution
levels, demographic, socio-economic and health characteristics from other US counties. Also,
while epidemiological studies such as this one can point to interesting relationships between
pollutant levels and health effects, we must be careful in interpreting effects as causal links.
As discussed by Thomas et al. (2007), the indicated pollutants may be correlated with other
pollutants not included in the study, or may in fact be acting as a proxy for another pollutant
which is included. Because we do not observe the relationship between ambient and personal
exposure, a pollutant may be selected because its ambient level tracks personal exposure more
closely, even though another pollutant is more strongly related to health. Measurement error,
particularly that of using ambient pollution as a proxy for personal exposure, may also play
a role, especially for pollutants which are extremely spatially heterogeneous. A spatially
homogeneous pollutant may have stronger association between ambient and personal expo-
sure, and thus may mask the more heterogeneous pollutant which has a relationship between
personal exposure and health. Previous work on measurement error for coarse PM
(PM10 particles larger than 2.5 µm; Chang et al., 2011) and total PM10 (Zeger et al., 2000) suggests
relatively consistent effect estimates under different measurement error scenarios,
but spatial heterogeneity of individual components of PM2.5 may be appreciably greater,
and individual components vary in their level of heterogeneity (Bell et al., 2011). Further
investigation of how spatial heterogeneity and measurement error may change health effect
estimates is warranted, particularly in variable selection models.
Further research into the reason for heterogeneity in pollutant effects using both epidemi-
ological and toxicological approaches is also needed. Heterogeneity may be due to differences
in particle chemistry, interaction effects, differing population exposure characteristics, or the
measurement error issues described above. Though we have divided PM into its chemical
components, the physical and toxicological properties are by no means homogeneous. For
example, sulfate and nitrate particles are mostly secondary pollutants created by reactions of
sulfur and nitrogen oxides with other gases and particles to form a diverse set of molecules,
including sulfuric acid and ammonium bisulfate, which vary in acidity (Schlesinger et al.,
2006). Toxicological studies suggest this acidity, not just the components, may be an impor-
tant factor in health outcomes (Schlesinger et al., 2006). Individual chemical components may
come from diverse primary sources as well, resulting in differing particle sizes and associated particles.
For example, potassium becomes airborne from biomass burning, from wind-blown crustal material
(dust), and from sea-spray salts, sources which may have differing biological effects (Schlesinger,
2007). An extension of this SpVS model to a space-time setting would be useful for fully
understanding possible seasonal effects. Though we adjust for seasonal confounding, we do
not consider interactions of pollutant effect and season in the current model.
Supplementary Materials
Web Appendices and Figures referenced in Sections 2, 3, 3.2, and 5.1 and R code for
implementing the SpVS model are available with this paper at the Biometrics website on
Wiley Online Library.
Acknowledgements
We are grateful for the support provided by NIH/NIEHS grant R21ES022585-01, NIH grants
R01ES019560, R21ES022795-01A1, and 5R01ES014843-02, EPA grant RD83479801, and
funds from HEI. We are also grateful to the editor, associate editor, and referees for their
helpful comments.
References
Bell, M. L., Dominici, F., Ebisu, K., Zeger, S. L., and Samet, J. M. (2007), “Spatial and
Temporal Variation in PM2.5 Chemical Composition in the United States for Health
Effects Studies,” Environmental Health Perspectives, 115, 989–995.
Bell, M. L., Ebisu, K., and Peng, R. D. (2011), “Community-level spatial heterogeneity
of chemical constituent levels of fine particulates and implications for epidemiological
research,” Journal of Exposure Science & Environmental Epidemiology, 21, 372–384.
Bell, M. L., Ebisu, K., Peng, R. D., Samet, J. M., and Dominici, F. (2009), “Hospital
admissions and chemical composition of fine particle air pollution,” American Journal
of Respiratory and Critical Care Medicine, 179, 1115–1120.
Bell, M. L., Ebisu, K., Peng, R. D., Walker, J., Samet, J. M., Zeger, S. L., and Dominici,
F. (2008), “Seasonal and Regional Short-term Effects of Fine Particles on Hospital
Admissions in 202 US Counties, 1999-2005,” American Journal of Epidemiology, 168,
1301–1310.
Chang, H. H., Peng, R. D., and Dominici, F. (2011), “Estimating the acute health effects of
coarse particulate matter accounting for exposure measurement error,” Biostatistics, 12,
637–652.
Choi, J., Fuentes, M., and Reich, B. J. (2009), “Spatial-temporal association between fine
particulate matter and daily mortality,” Computational Statistics & Data Analysis, 53,
2989–3000.
Dai, L., Zanobetti, A., Koutrakis, P., and Schwartz, J. D. (2014), “Associations of Fine
Particulate Matter Species with Mortality in the United States: A Multicity Time-Series
Analysis,” Environmental Health Perspectives.
Dominici, F., Daniels, M., Zeger, S. L., and Samet, J. M. (2002), “Air Pollution and
Mortality: Estimating Regional and National Dose-Response Relationships,” Journal of
the American Statistical Association, 97, 100–111.
Dominici, F., McDermott, A., Zeger, S. L., and Samet, J. M. (2003), “National Maps
of the Effects of Particulate Matter on Mortality: Exploring Geographical Variation,”
Environmental Health Perspectives, 111, 39–43.
Dominici, F., Peng, R. D., Bell, M. L., et al. (2006), “Fine particulate air pollution and
hospital admission for cardiovascular and respiratory diseases,” JAMA: The Journal of
the American Medical Association, 295, 1127–1134.
Dominici, F., Samet, J. M., and Zeger, S. L. (2000), “Combining Evidence on Air Pollution
and Daily Mortality from the 20 Largest US Cities: A Hierarchical Modelling Strategy,”
Journal of the Royal Statistical Society. Series A (Statistics in Society), 163, 263–302.
Fuentes, M., Song, H.-R., Ghosh, S. K., Holland, D. M., and Davis, J. M. (2006), “Spatial
association between speciated fine particles and mortality,” Biometrics, 62, 855–863.
Gelfand, A. E. and Ghosh, S. K. (1998), “Model choice: A minimum posterior predictive loss
approach,” Biometrika, 85, 1–11.
Gelfand, A. E., Kim, H.-J., Sirmans, C. F., and Banerjee, S. (2003), “Spatial Modeling with
Spatially Varying Coefficient Processes,” Journal of the American Statistical Association,
98, 387–396.
George, E. I. and McCulloch, R. E. (1997), “Approaches for Bayesian variable selection,”
Statistica Sinica, 7, 339–373.
Ito, K., Mathes, R., Ross, Z., Nadas, A., Thurston, G., and Matte, T. (2011), “Fine
Particulate Matter Constituents Associated with Cardiovascular Hospitalizations and
Mortality in New York City,” Environmental Health Perspectives, 119, 467–473.
Krall, J. R., Anderson, G. B., Dominici, F., Bell, M. L., and Peng, R. D. (2013), “Short-
term exposure to particulate matter constituents and mortality in a national study of
US urban communities,” Environmental Health Perspectives, 121, 1148.
Levy, J. I., Diez, D., Dou, Y., Barr, C. D., and Dominici, F. (2012), “A Meta-Analysis and
Multisite Time-Series Analysis of the Differential Toxicity of Major Fine Particulate
Matter Constituents,” American Journal of Epidemiology, 175, 1091–1099.
Lum, K. (2012), “Bayesian variable selection for spatially dependent generalized linear
models,” arXiv:1209.0661 [stat.ME].
Nelsen, R. B. (1999), An Introduction to Copulas, New York: Springer-Verlag.
Peng, R. D., Bell, M. L., Geyh, A. S., McDermott, A., Zeger, S. L., Samet, J. M.,
and Dominici, F. (2009), “Emergency Admissions for Cardiovascular and Respiratory
Diseases and the Chemical Composition of Fine Particle Air Pollution,” Environmental
Health Perspectives, 117, 957–963.
Reich, B. J., Fuentes, M., Herring, A. H., and Evenson, K. R. (2010), “Bayesian Variable
Selection for Multivariate Spatially Varying Coefficient Regression,” Biometrics, 66, 772–
782.
Rohr, A. C. and Wyzga, R. E. (2012), “Attributing health effects to individual particulate
matter constituents,” Atmospheric Environment, 62, 130–152.
Scheel, I., Ferkingstad, E., Frigessi, A., Haug, O., Hinnerichsen, M., and Meze-Hausken,
E. (2013), “A Bayesian hierarchical model with spatial variable selection: the effect of
weather on insurance claims,” Journal of the Royal Statistical Society: Series C (Applied
Statistics), 62, 85–100.
Schlesinger, R. B. (2007), “The Health Impact of Common Inorganic Components of Fine
Particulate Matter (PM2.5) in Ambient Air: A Critical Review,” Inhalation Toxicology,
19, 811–832.
Schlesinger, R. B., Kunzli, N., Hidy, G. M., Gotschi, T., and Jerrett, M. (2006), “The Health
Relevance of Ambient Particulate Matter Characteristics: Coherence of Toxicological and
Epidemiological Inferences,” Inhalation Toxicology, 18, 95–125.
Schwartz, J., Dockery, D. W., and Neas, L. M. (1996), “Is Daily Mortality Associated
Specifically with Fine Particles?” Journal of the Air & Waste Management Association,
46, 927–939.
Sklar, A. (1953), “Fonctions de répartition à n dimensions et leurs marges,” Publications de
l’Institut de Statistique de l’Université de Paris, 8, 229–231.
Sklar, A. (1973), “Random variables, joint distributions, and copulas,” Kybernetika, 9, 449–460.
Smith, M. and Fahrmeir, L. (2007), “Spatial Bayesian Variable Selection With Application
to Functional Magnetic Resonance Imaging,” Journal of the American Statistical Asso-
ciation, 102, 417–431.
Spiegelhalter, D. J., Best, N. G., Carlin, B. P., and Linde, A. v. d. (2002), “Bayesian
Measures of Model Complexity and Fit,” Journal of the Royal Statistical Society. Series
B (Statistical Methodology), 64, 583–639.
Thomas, D. C., Jerrett, M., Kuenzli, N., Louis, T. A., Dominici, F., Zeger, S., Schwarz, J.,
Burnett, R. T., Krewski, D., and Bates, D. (2007), “Bayesian Model Averaging in Time–
Series Studies of Air Pollution and Mortality,” Journal of Toxicology and Environmental
Health, Part A, 70, 311–315.
Zeger, S. L., Thomas, D., Dominici, F., Samet, J. M., Schwartz, J., Dockery, D., and
Cohen, A. (2000), “Exposure Measurement Error in Time-Series Studies of Air Pollution:
Concepts and Consequences,” Environmental Health Perspectives, 108, 419–426.
Zeka, A., Zanobetti, A., and Schwartz, J. (2005), “Short Term Effects of Particulate Matter
on Cause Specific Mortality: Effects of Lags and Modification by City Characteristics,”
Occupational and Environmental Medicine, 62, 718–725.
Zhou, J., Ito, K., Lall, R., Lippmann, M., and Thurston, G. (2011), “Time-Series Analysis
of Mortality Effects of Fine Particulate Matter Components in Detroit and Seattle,”
Environmental Health Perspectives, 119, 461–466.
[Figure 1 image. Panels: (a) Spatial (no VS), (b) Exchangeable VS, (c) Spatial VS. Rows: Sulf, Nitr, Sili, El C, Or C, Sodi, Ammo, Alum, Arse, Brom, Calc, Chlo, Chro, Copp, Iron, Lead, Magn, Nick, Pota, Tita, Vana, Zinc. Columns: NE, Mid Atl, S Atl, ESC, WSC, ENC, WNC, Mtn, Pac. Scale ticks: −1.0, −0.5, 0.0, 0.5, 1.0.]
Figure 1. Posterior median coefficients (in % RR increase per IQR increase) for each model. Each row includes coefficients for the labeled pollutant; each column indicates one county. The counties are grouped according to the 8 EPA subregions, defined as Northeast (NE), Mid Atlantic (Mid Atl), South Atlantic (S Atl), East South Central (ESC), West South Central (WSC), East North Central (ENC), West North Central (WNC), Mountain (Mtn), and Pacific (Pac). This figure appears in color in the electronic version of this article.
[Figure 2 image. Map of county-specific estimates. Scale ticks: −1.0, −0.5, 0.0, 0.5, 1.0.]
Figure 2. Posterior median county-specific % risk increase for an IQR increase in elemental carbon from the SpVS model. This figure appears in color in the electronic version of this article.
[Figure 3 image. Panels: (a) Spatial model, (b) Exchangeable variable selection model, (c) Spatial variable selection model. Vertical axis: β, with ticks −1, 0, 1, 2, 3. Horizontal axis: NE, Mid Atl, S Atl, ESC, WSC, ENC, WNC, Mtn, Pac, Avg.]
Figure 3. City-specific means (open dot), median (filled dot), and 95% posterior intervals (vertical lines) for the elemental carbon effect, arranged by EPA subregion, defined as Northeast (NE), Mid Atlantic (Mid Atl), South Atlantic (S Atl), East South Central (ESC), West South Central (WSC), East North Central (ENC), West North Central (WNC), Mountain (Mtn), and Pacific (Pac). The national average estimates are given for αk (triangle), αkπk (square), and average of the 115 βk(s) (diamond). This figure appears in color in the electronic version of this article.
Table 1. Interquartile range (IQR), median, and maximum observed value in µg/m³ across all 115 sites in the period 2000-2008. Minimum values are all approximately zero and are not shown. The seven most massive components are listed first, with the rest in alphabetical order.

Component          Median   IQR     Max
Sulfate            2.43     2.89    39.90
Nitrate            0.83     1.54    45.30
Silicon            0.07     0.10    10.10
Elemental Carbon   0.54     0.52    11.00
Organic Carbon     3.30     3.42    101.18
Sodium Ion         0.09     0.13    33.40
Ammonium Ion       1.12     1.43    21.70
Aluminum           0.01     0.04    4.01
Arsenic            0.00     0.00    0.16
Bromine            0.00     0.00    0.19
Calcium            0.04     0.05    4.90
Chlorine           0.00     0.02    5.78
Chromium           0.00     0.00    1.17
Copper             0.00     0.00    0.55
Iron               0.07     0.08    24.10
Lead               0.00     0.00    0.98
Magnesium          0.00     0.01    2.72
Nickel             0.00     0.00    0.91
Potassium          0.05     0.05    12.80
Titanium           0.00     0.01    0.56
Vanadium           0.00     0.00    0.17
Zinc               0.01     0.01    3.11
Table 2. (a) Median absolute deviation (MAD) × 100 for the simulation study. MAD is averaged across all 50 locations for each pollutant. The overall average over datasets is presented here. Simulation settings for αk, πk are shown in columns 2 and 3, and abbreviated names of the components used to generate the response are in column 1. A “*” indicates a statistically significant difference from the SpVS model. (b) Power and Type-I error for spatial model with no variable selection (Sp), exchangeable variable selection model (EVS), and spatial variable selection model (SpVS) for local and global covariates. We calculate power for each setting of the true value of αk.

(a) Median absolute deviation (MAD)
                                       Independent            Weak Correlation       Strong Correlation
                        α × 100  π     Sp     EVS    SpVS     Sp     EVS    SpVS     Sp     EVS    SpVS
Sulf.                   5        1     1.39*  1.66*  1.48     1.36   1.65*  1.38     1.10   1.55*  1.12
Nitr.                   10       1     1.40*  1.46   1.45     1.37*  1.59*  1.41     1.16   1.43*  1.14
Sili.                   5        0.3   1.34*  0.91   0.95     1.41*  0.99*  1.11     1.17*  0.95   0.91
El C.                   5        0.7   1.91*  2.10   2.08     1.79   2.04*  1.78     1.49*  1.83*  1.44
Sodi.                   10       0.3   1.53*  0.80   0.86     1.52*  0.79   0.84     1.21*  0.70   0.66
Or C.                   10       0.7   2.30*  2.02   2.04     2.30*  2.01   1.95     1.91*  2.05*  1.60
Arse.                   0        0     0.53*  0.12   0.12     0.50*  0.16   0.19     0.46*  0.13*  0.18
Brom.                   0        0     0.60*  0.15   0.13     0.56*  0.14*  0.21     0.50*  0.12*  0.24
Calc.                   0        0     0.63*  0.15   0.15     0.69*  0.15*  0.25     0.54*  0.12*  0.26
Avg. Local, βk(s) = 0                  1.61*  1.19*  1.23     1.64*  1.24   1.24     1.26*  0.72   0.63
Avg. Local, βk(s) ≠ 0                  1.93*  1.72   1.73     1.87*  1.67*  1.59     1.62   2.04   1.67

(b) Power and Type I error
                        Independent            Weak Correlation       Strong Correlation
              αk        Sp     EVS    SpVS     Sp     EVS    SpVS     Sp     EVS    SpVS
Power
  Local       0.05      0.41   0.30   0.34     0.48   0.31   0.49     0.65   0.37   0.67
  Local       0.10      0.80   0.76   0.75     0.84   0.74   0.81     0.90   0.77   0.91
  Global      0.05      0.74   0.55   0.61     0.75   0.55   0.73     0.85   0.57   0.85
  Global      0.10      1.00   0.92   0.92     0.99   0.88   0.95     1.00   0.90   0.99
Type-I Error
  Local                 0.03   0.01   0.02     0.05   0.01   0.07     0.06   0.01   0.08
  Global                0.01   0.00   0.00     0.00   0.00   0.01     0.01   0.00   0.01
Table 3. Estimated overall relative risk expressed as percent increase (e^(αkπk) × 100 − 100) and goodness of fit statistics. For Spatial model (no VS), e^(αk) × 100 − 100 is reported. For models with spatial correlation, range estimates in km are given. Statistically significant associations (0.05 level) marked by ∗. Local selection marked by L (posterior 0.20 < πk < 0.80). No estimates of π were greater than 0.8. We also include goodness of fit criterion, DIC, GG, and LPML. The best model by each measure is highlighted in bold.

                   SpVS       Spatial    EVS
Sulfate            0.09L      0.26       0.02
Nitrate            0.01L      0.21       0.07
Silicon            -0.00      -0.1       -0.00
Elemental Carbon   0.42L      0.46       0.64L∗
Organic Carbon     0.03L      -0.14      0.04L
Sodium Ion         -0.01      -0.09      0.00
Ammonium Ion       0.05       0.17       0.16L
Aluminum           -0.00      -0.06      -0.02
Arsenic            -0.05      -0.42      -0.06
Bromine            -0.02      0.09       0.00
Calcium            -0.00      0.07       0.04
Chlorine           0.01       -0.01      0.00
Chromium           0.02       -0.16      -0.01
Copper             0.01       -0.02      0.00
Iron               0.00       0.41       0.04
Lead               0.04       -0.09      -0.02
Magnesium          0.02       -0.06      -0.02
Nickel             0.00       0.02       -0.02
Potassium          0.02       -0.15      -0.03
Titanium           0.00       0.06       -0.01
Vanadium           0.04       -0.03      0.02
Zinc               -0.00      0.20       0.02

Range (km)         1143       1175
DIC                2647       2746       2684
GG                 27.335     27.292     27.350
LPML               -155495    -155551    -155585
Table 4. Estimates of average county specific β, mean when included (e^(αk) × 100 − 100), probability of inclusion πk from the Spatial VS model with 22 pollutants, and estimate of standard deviation, rescaled (e^(ωk) × 100 − 100).

                   Avg. βk(s)   αk      πk     ωk
Sulfate            0.11         0.16    0.30   3.02
Nitrate            0.15         -0.00   0.24   2.29
Silicon            0.03         -0.04   0.08   2.18
Elemental Carbon   0.59         0.56    0.70   0.94
Organic Carbon     0.05         0.07    0.22   3.54
Sodium             -0.01        -0.08   0.08   2.10
Ammonium           0.14         0.15    0.17   4.36
Aluminum           -0.05        -0.03   0.08   2.66
Arsenic            -0.07        -0.34   0.12   3.65
Bromine            0.06         -0.12   0.08   2.98
Calcium            0.10         -0.03   0.08   2.72
Chlorine           -0.01        0.09    0.06   1.04
Chromium           -0.04        0.22    0.06   1.89
Copper             0.01         0.03    0.07   2.26
Iron               0.06         0.03    0.08   2.76
Lead               -0.09        0.36    0.08   4.34
Magnesium          -0.06        0.14    0.06   2.00
Nickel             -0.02        -0.05   0.07   1.90
Potassium          -0.15        0.15    0.13   2.51
Titanium           -0.02        0.05    0.12   5.65
Vanadium           0.04         0.24    0.12   3.42
Zinc               0.06         -0.04   0.09   2.27