Optimal Spatial Regularisation of
Autocorrelation Estimates in fMRI Analysis
Temujin Gautama ∗ and Marc M. Van Hulle
Laboratorium voor Neuro- en Psychofysiologie
K.U.Leuven, Campus Gasthuisberg
Herestraat 49, bus 801
B-3000 Leuven, BELGIUM
Tel.: + 32 16 34 59 61 Fax: + 32 16 34 59 60
Abstract
In the General Linear Model (GLM) framework for the statistical analysis of
fMRI data, the problem of temporal autocorrelations in the residual signal (after
regression) has been frequently addressed in the open literature. There exist various
methods for correcting the ensuing bias in the statistical testing, among which the
prewhitening strategy, which uses a prewhitening matrix for rendering the residual
signal white (i.e., without temporal autocorrelations). This correction is
only exact when the autocorrelation structure of the noise-generating process is
accurately known; in practice, however, the estimates derived directly from the
fMRI data are often too noisy to be used for correction. Recently, Worsley and
co-workers proposed to spatially smooth
the noisy autocorrelation estimates, effectively reducing their variance and allow-
ing for a better correction. In this article, a systematic study into the effect of the
smoothing kernel width is performed and a method is introduced for choosing this
bandwidth in an ‘optimal’ manner. Several aspects of the prewhitening strategy are
investigated, namely the choice of the autocorrelation estimate (biased or unbiased),
the accuracy of the estimates, the degree of spatial regularisation and the order of
the autoregressive model used for characterising the noise. The proposed method is
extensively evaluated on both synthetic and real fMRI data.

Preprint submitted to Elsevier Science, 4 November 2004
Key words:
spatial regularisation, autocorrelation, prewhitening
∗ Corresponding author. Email addresses: [email protected] (Temujin Gautama),
[email protected] (Marc M. Van Hulle).
1 Introduction
In the statistical analysis of fMRI data, the presence of intrinsic temporal
autocorrelations in the noise-generating process is a well-studied topic. The
time series of a given voxel is usually modelled as consisting of a possible
haemodynamic response to the stimulus and coloured noise (containing tem-
poral autocorrelations). Although the origin of this noise is still an open issue
(see, e.g., Biswal et al., 1995; Zarahn et al., 1997; Woolrich et al., 2001), there
is general agreement that it needs to be taken into account in the analysis
methods used for activation detection.
The conventional statistical analysis techniques, such as SPM (Statistical
Parametric Mapping, Wellcome Department of Cognitive Neurology, London),
are based upon the General Linear Model (GLM), which models an fMRI time
series as a linear combination of paradigm-related responses, drift terms and
an error term. The GLM analysis is only exact when the autocorrelation func-
tion of the (real) noise process, which generates the error term, is taken into
account. In practice, however, the underlying process is unknown and alter-
native approaches have been devised, such as “precolouring”, which imposes
a certain autocorrelation function on the noise term by temporal smoothing
(Friston et al., 1995; Worsley et al., 1995), and “prewhitening”, which trans-
forms the data such that the error term becomes white noise (Bullmore et al.,
1996). Several studies have evaluated these (and other) approaches in combi-
nation with different statistical tests, both on synthetic and real-world fMRI
data (Purdon and Weisskoff, 1998; Friston et al., 2000a; Woolrich et al., 2001;
Wicker and Fonlupt, 2003). The prewhitening strategy yields the best (mini-
mum variance) linear unbiased estimator, but only if the true autocorrelation
structure is known (Friston et al., 2000a; Bullmore et al., 2001; Woolrich et al.,
2001), or at least if it can be accurately estimated. However, any mismatch
between the true and the estimated autocorrelations will lead to a bias in the
estimation of the parameter variance (Friston et al., 2000a), which is used for
the statistical inference of effects. Therefore, an accurate model of the noise,
from which the prewhitening matrix is computed, is essential to the efficacy of
the prewhitening strategy, and various noise models have been proposed (Bull-
more et al., 1996; Locascio et al., 1997; Purdon and Weisskoff, 1998; Zarahn
et al., 1997). Additionally, the underlying noise process needs to be estimated
from the residual signal after regression (the error term), which introduces a
bias in the autocorrelation estimates (for an overview, see Marchini and Smith,
2003).
Another important aspect of the autocorrelation is the spatial variability,
which cannot be solely attributed to a difference between tissue types (Bull-
more et al., 1996; Purdon et al., 2001; Solo et al., 2001; Worsley et al., 2002).
Thus, ideally, the autocorrelations should be accurately estimated on a voxel-
wise basis, as has been proposed by Bullmore et al. (1996) and Locascio et al.
(1997). However, the variance of the autocorrelation estimate is fairly high
for traditional fMRI time series, which introduces a bias in traditional GLM-
based statistical testing. One possible approach (that adopted in SPM’99) is
to compute the average autocorrelation across the entire brain, yielding a very
robust estimate. However, in that case, the autocorrelation is systematically
over- and underestimated in different regions, since the spatial variability of
the autocorrelation is not taken into account. Another approach is that of
spatial regularisation of the autocorrelation estimates, which can reduce the
variance of the autocorrelation estimate. This has been suggested in (Purdon
et al., 2001; Solo et al., 2001), where a local likelihood function is spatially
smoothed before parameter estimation, and in (Worsley et al., 2002), where
the estimated parameters themselves are spatially smoothed. In the first ap-
proach, there is an optimality criterion for choosing the degree of spatial reg-
ularisation. The second approach is much less computationally intensive, but
uses a user-defined degree of smoothing.
In this article, the methodology proposed by Worsley et al. (2002) is followed,
thus, prewhitening using a filter derived from an autoregressive (AR) model
of the residual signal, and spatial smoothing of the autocorrelation estimates.
First, the importance of the accuracy of the autocorrelation estimate is exam-
ined with respect to the validity of the ensuing statistical test, i.e., the efficacy
of the prewhitening correction. A novel method is introduced for determining
the optimal degree of spatial smoothing of the autocorrelation estimate, and
the method is evaluated both on synthetic and real fMRI data. The results are
compared to those obtained with and without bias-reduction of the autocor-
relation estimates, obtained without, with fixed and with optimal smoothing,
and obtained using a global average of the autocorrelation estimates.
2 Methods
First, the general prewhitening framework and the evaluation measures are
shortly described. Second, the effect of the correction is empirically verified,
and the importance of the accuracy of the autocorrelation (AC) estimate used
for prewhitening is illustrated. Third, the spatial regularisation and the pro-
posed method for determining the optimal degree of regularisation are ex-
plained.
2.1 Ordinary Least-Squares in the Presence of Autocorrelation
The General Linear Model framework with autocorrelated errors is well-known
and will only be described shortly in this section. The fMRI response at voxel
i (consisting of n time samples) is modelled as a linear sum of m covariates:
Yi = Xβ + ei, (1)
where X[n × m] is the design matrix containing q expected responses to a
certain stimulus (0/1 block pulse convolved with a haemodynamic response
function) and a number of (polynomial) drift terms, β is a vector of regression
coefficients, and ei is a Gaussian noise source following N(0, σ²V), where V is
the autocorrelation matrix of the noise process (in the absence of autocorre-
lations, V = I). There exist several methods for drawing inferences regarding
the regression coefficients in the presence of autocorrelation (for an overview,
see Woolrich et al., 2001), one of which is that of prewhitening (see Bullmore
et al., 1996; Worsley et al., 2002; Marchini and Smith, 2003). The method first
solves the model using Ordinary Least-Squares (OLS), yielding unbiased, but
not fully efficient estimates of the regression coefficients. Second, the autocor-
relation structure is estimated from the residual signal after OLS-regression,
ei, from which an n×n whitening matrix, A, is generated. Both Yi and X are
multiplied by this matrix A, after which the OLS is used for recomputing the
regression coefficients and the subsequent statistical testing.
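The four steps above can be sketched as follows for an AR(1) noise model. This is a minimal NumPy sketch; the function `prewhiten_glm` and its variable names are illustrative and are not part of any fMRI software package.

```python
import numpy as np

def prewhiten_glm(y, X):
    """Sketch of the prewhitening procedure for one voxel time series y
    (n samples) and design matrix X (n x m), assuming AR(1) noise."""
    n = len(y)
    # Step 1: ordinary least-squares fit.
    beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta_ols                       # residual signal

    # Step 2: lag-one autocorrelation of the residuals;
    # the 1/n normalisations of numerator and denominator cancel.
    a1 = np.sum(e[1:] * e[:-1]) / np.sum(e * e)

    # Step 3: AR(1) whitening matrix A: each row subtracts a1 times the
    # previous sample; the first sample is rescaled so that all components
    # of A e have equal (innovation) variance.
    A = np.eye(n)
    A[0, 0] = np.sqrt(1.0 - a1 ** 2)
    A[np.arange(1, n), np.arange(n - 1)] = -a1

    # Step 4: OLS on the whitened data.
    beta, *_ = np.linalg.lstsq(A @ X, A @ y, rcond=None)
    return beta, a1
```

For higher-order AR(p) models the whitening matrix is no longer bidiagonal, but the overall structure of the procedure is unchanged.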
Matrix A is often estimated using a parametric approach by fitting a noise
model to the residual signal, e.g., autoregressive (AR) models or autoregressive
models with moving average (ARMA). The matrix A can then be computed
from the autocorrelation matrix V of the noise model as A = V −1/2. When an
AR-model is assumed, the whitening matrix A can be found directly from the
autocorrelations in the residual signal (as shown in Appendix A.3 of Worsley
et al., 2002). However, the autocorrelation from the residual signal after re-
gression is a biased estimate of the autocorrelation of the actual disturbance,
as has been recently addressed in the fMRI-context by Worsley et al. (2002)
and Marchini and Smith (2003). The complete prewhitening procedure, using
an autoregressive model of order p for the noise, is the following:
OLS The regression coefficients β are estimated using OLS (Eq. 1), yielding
βOLS = (X ′X)−1X ′Yi, (2)
where ()′ denotes the matrix transpose, and ()−1 the matrix inverse.
Estimating the Whitening Matrix Ai From the residual signal, ei =
Yi − XβOLS, the autocorrelations are estimated for lags l = 1 . . . p, where p
is the order of the autoregressive model that is used for modelling the noise.
Two estimates are considered in this study, the standard (biased) one:
abias,i,l = (1/n) ∑_{j=l+1}^{n} ei(j) ei(j − l),   (3)
and an unbiased one (Worsley et al., 2002):

aunb,i,l = vl / v0,   (4)

where v = M−1 abias,i, with

v = (v0, . . . , vp)′,   abias,i = (abias,i,0, . . . , abias,i,p)′,

and M the (p + 1) × (p + 1) matrix with elements

mlj = trace(R Dl)   for j = 0,
mlj = trace(R Dl R (Dj + D′j))   for 1 ≤ j ≤ p,

where R = I − X(X′X)−1X′ is the residual-forming matrix of the regression and
Dl is a matrix of zeros with ones on the l-th upper off-diagonal. Note
that there is hardly any additional computational cost associated with this
bias correction. The whitening matrix Ai can then be computed from these
autocorrelation estimates, yielding Abias,i and Aunb,i (see Appendix A.3 in
Worsley et al., 2002).
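Following the formulas above (with R = I − X(X′X)−1X′ the residual-forming matrix of the regression), the bias correction can be sketched as follows. This is an illustrative NumPy sketch, not the authors' implementation, and the function name is hypothetical.

```python
import numpy as np

def unbiased_autocorr(e, X, p=1):
    """Sketch of the bias correction of the residual autocorrelations,
    using the residual-forming matrix R of the regression on X."""
    n = len(e)
    # Biased autocovariances for lags 0..p (cf. Eq. 3).
    a_bias = np.array([np.sum(e[l:] * e[:n - l]) / n for l in range(p + 1)])

    R = np.eye(n) - X @ np.linalg.pinv(X)      # residual-forming matrix
    D = [np.diag(np.ones(n - l), l) for l in range(p + 1)]

    # m_lj = trace(R D_l) for j = 0 and
    # m_lj = trace(R D_l R (D_j + D_j')) for 1 <= j <= p.
    M = np.empty((p + 1, p + 1))
    for l in range(p + 1):
        M[l, 0] = np.trace(R @ D[l])
        for j in range(1, p + 1):
            M[l, j] = np.trace(R @ D[l] @ R @ (D[j] + D[j].T))

    v = np.linalg.solve(M, a_bias)             # v = M^{-1} a_bias
    return v[1:] / v[0]                        # a_unb,l = v_l / v_0
```

As noted in the text, the additional computational cost over the biased estimate is small: the matrices R, Dl and M depend only on the design and can be precomputed once for all voxels.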
OLS of the Whitened Data The OLS-procedure is used for solving:
AiYi = AiXβ + ri,   (5)

where ri ∼ N(0, σ²I), yielding the regression coefficients:

β = (X′A′iAiX)−1(AiX)′(AiYi).   (6)
Inference In this study, the F -score is computed to test for any of the
q paradigm-related effects, which is specified by a contrast matrix c of size
[q ×m].
Fi = (ESSi/q) / (RSSi/(n − m)) ∼ F(q, n − m),   (7)

with

RSSi = r′i ri,
ESSi = (cβ)′ ( c ((AiX)′(AiX))−1 c′ )−1 (cβ),

where RSSi is the residual sum-of-squares and ESSi is the explained sum-of-squares.
Using either the biased (Ai = Abias,i) or the unbiased (Ai = Aunb,i)
whitening matrices, the different F -scores are obtained, respectively Fbias,i
and Funb,i. When no correction for autocorrelation is performed, the F -score
is denoted by FOLS,i.
2.2 Evaluation Measures
There exists no ‘standard’ way for quantifying the performance of different
correction methods, as it depends on several factors. Xiong et al. (1996) compared
several statistical tests on the basis of their sensitivity (rate of true
positives), specificity (rate of true negatives) and normality (of the test
statistics). Zarahn et al. (1997) and Purdon and Weisskoff (1998) evaluated the
rate of false positives as a performance measure. In (Friston et al., 2000b),
performance is described in terms of the validity of the test (a test is valid if
the false positive rate is less than the nominal size α), its efficiency (a param-
eter estimation method is more efficient when the variability of the estimated
parameters is smaller), and its robustness (a test that remains valid when the
assumptions are violated to a certain degree is called robust). Woolrich et al.
(2001) and Marchini and Smith (2003) proposed a qualitative comparison be-
tween an empirical distribution of test statistics and the expected theoretical
one, by visualising them in a scatter diagram (PP -plot). We have opted for a
performance evaluation in terms of exactness (the degree to which the empir-
ical distribution of test statistics corresponds to the theoretical one), the rate
of false positives and the rate of true positives.
In all simulations on synthetic data, a dummy paradigm is used, consisting
of two alternating conditions with a length of 10 time samples. The design
matrix, X, consists of one constant term and two ‘fMRI responses’, which
are generated by convolving a square wave (sampling period of 3 seconds,
alternating 10 samples on, 10 samples off) with a standard haemodynamic
response function with a repetition time of TR = 3 seconds. The exactness of
the test is measured by the mean-square discrepancy between the empirical
and the theoretical (cumulative) F -distribution:
MSEF = (1/N) ∑_{j=1}^{N} ( Fj − fcdf^{−1}( j/(N + 1), q, n − m ) )²,   (8)

where N is the number of pixels/voxels, Fj is the j-th element of the sorted series
of (empirical) F-scores, and fcdf^{−1}(·, df1, df2) is the inverse of the cumulative
F-distribution function (the quantile function) with df1 and df2 degrees of
freedom. The rate of false positives rFP
(erroneous rejections of the null hypothesis at a significance level α), is com-
puted as the number of false rejections divided by the total number of inac-
tive pixels/voxels. For small deviations from the theoretical distribution (small
MSEF ), rFP is expected to approximate the nominal size of the test (rFP ≈ α).
In a number of simulations, time series are synthetically ‘activated’ by linearly
adding the first response in the design matrix X with a scaling factor γ1. When
this is the case, the corresponding F -score is not included in the evaluation
of the empirical null distribution (Eq. 8), and in these cases, the rate of true
positives, rTP, is computed as the ratio of the number of F -scores correspond-
ing to true positives exceeding the statistical threshold and the total number
of true positives.
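The three evaluation measures can be sketched as follows. This is a minimal Python/SciPy sketch; `scipy.stats.f.ppf` provides the theoretical F-quantiles, and the function name is illustrative.

```python
import numpy as np
from scipy import stats

def evaluation_measures(F_null, F_act, q, n, m, alpha=0.01):
    """Sketch of the exactness measure MSE_F, the false positive rate r_FP
    and the true positive rate r_TP. F_null holds the F-scores of inactive
    pixels/voxels, F_act those of synthetically activated ones."""
    N = len(F_null)
    # Compare the sorted empirical F-scores with the theoretical quantiles
    # of the F(q, n - m) distribution at j / (N + 1).
    theo = stats.f.ppf(np.arange(1, N + 1) / (N + 1), q, n - m)
    mse_f = np.mean((np.sort(F_null) - theo) ** 2)

    thresh = stats.f.ppf(1.0 - alpha, q, n - m)  # threshold at level alpha
    r_fp = np.mean(F_null > thresh)              # false positive rate
    r_tp = np.mean(F_act > thresh)               # true positive rate
    return mse_f, r_fp, r_tp
```

For an exact test, mse_f should be small and r_fp should approximate the nominal size alpha.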
2.3 Effect of Accuracy of AC Correction
To better interpret the results reported below, it is important to investigate
to what extent they are robust to a deviation from the ‘optimal’ correction.
To this end, a set of N = 10, 000 time series is generated from an AR(1)-model
(thus, without activation and complying with the null hypothesis),
using:
yk = a∗1 yk−1 + νk, (9)
where ν is a white Gaussian noise source. The AR(1)-model used for the
generation has a∗1 = 0.4 (solid curves in Fig. 1) and a∗1 = 0.2 (dashed curves),
and the AR(1)-model used for the GLM-correction is varied from a1 = 0 to
a1 = 0.8 in steps of 0.05. Testing is performed at α = 0.01 and the results are
evaluated using the MSEF and the rFP measures. The results are shown in
Fig. 1A and 1B, respectively. In both data sets, 1,000 randomly selected time
series are ‘activated’ during the first condition with γ1 = 1.0, thus allowing
for the evaluation of rTP. The latter rate can be used as an empirical measure
of the power of the test.
It can be observed clearly from Fig. 1A that the MSEF is minimal for the
theoretical value a∗1, and that the rate of false positives complies best with the
nominal size of the test at a∗1 (rFP = 0.0092 and rFP = 0.0093, respectively for
a∗1 = 0.4 and a∗1 = 0.2). The variance of the regression coefficients is
underestimated when the autocorrelation is underestimated (yielding an overly
liberal test), and, conversely, the variance is overestimated when the
autocorrelation is overestimated. As a result, the rate of false positives
decreases consistently
over the interval under investigation (Fig. 1B), and the rate of true positives
Fig. 1. Effect of the accuracy of the AC correction, quantified using the MSEF be-
tween the empirical and theoretical F -distribution (A) and the rate of false positives
(B), and the rate of true positives (C), for a data set generated from an AR(1)-model
with a∗1 = 0.4 (solid curves) and a∗1 = 0.2 (dashed curves).
(Fig. 1C) decreases consistently for increasing a1. Thus, an underestimation
of the autocorrelation yields an increased rate of false positives (exceeding the
nominal size), but also increased power, whereas an overestimation yields a
lower rate of false positives and lower power.
2.4 Proposed Spatial Regularisation of the AC estimate
The variance of the autocorrelation estimates scales inversely with
the number of time samples, n. The previous simulation study has demon-
strated that an accurate (low-variance) estimate is required in order to im-
prove the exactness of the statistical test. One way to reduce the variability
of the estimate around the true value is spatial regularisation, as described
in different forms by Purdon et al. (2001) and by Worsley et al. (2002). We
adhere to the latter approach, which spatially smooths the autocorrelation
estimates using a Gaussian kernel, albeit with a user-defined FWHM (default
value of 15 mm, which corresponds to a standard deviation of 6.36 mm). In
this section, a method for determining the ‘optimal’ smoothing bandwidth is
introduced, which corresponds to the ‘optimal’ degree of spatial regularisation
of the autocorrelation estimates.
It will be demonstrated in the Results section that the degree of smoothing of
the autocorrelation estimate has an effect on the distribution of the F -score
(Eq. 7). The deviation of the empirical F -distribution from the theoretical one
would be a good evaluation measure for choosing a regularisation bandwidth,
were it not for the fact that it can only be used when the theoretical distribu-
tion is known. In fMRI, this is only the case when there is no paradigm effect
present in the data, and the theoretical null distribution of the test statistic
can be used as a reference. We propose to choose the regularisation bandwidth
by minimising an observable criterion, which does not depend on the theoret-
ical distribution of the test statistic under study, but on the predictability of
the (estimated) spatial autocorrelation pattern.
The proposed analysis technique is based upon the so-called Nadaraya-Watson
kernel-weighted average for local regression. Consider ci, the vector consisting
of the p (i.e., the order of the AR-model used for correction) autocorrelation
estimates (for lags 1 . . . p) in pixel/voxel i with spatial coordinates xi. This
vector is ‘predicted’ from the vectors of the pixels/voxels within a given spatial
neighbourhood, the spatial extent of which is controlled by the ‘bandwidth’
h:
ĉh,i = ( ∑_{j=1}^{N} Kh(‖xj − xi‖) cj ) / ( ∑_{j=1}^{N} Kh(‖xj − xi‖) ),   (10)
where N is the number of pixels/voxels and Kh is a Gaussian kernel with
standard deviation h. The prediction error can be computed as:
ε²(h) = (1/N) ∑_{i=1}^{N} (ci − ĉh,i)²,   (11)
However, this prediction error approaches zero as h approaches zero, since in
that case the spatial neighbourhood consists of only the centre pixel/voxel
i, due to which ĉh,i = ci, which is the trivial solution. Therefore, we adopt a
leave-one-out cross-validation strategy, and modify Eq. (10) by excluding the
vector of the centre pixel/voxel from the weighted average:
ĉh,i = ( ∑_{j≠i} Kh(‖xj − xi‖) cj ) / ( ∑_{j≠i} Kh(‖xj − xi‖) ).   (12)
Similar schemes have been considered for the optimisation of smoothing ker-
nels in local modelling and density estimation (for an overview, see Hastie
et al., 2001). The ensuing prediction error, ε2(h), can be minimised with re-
spect to h by computing it for a number of different bandwidth values and
determining the minimum. The corresponding bandwidth, hopt, yields the ‘op-
timal’ spatial extent over which the autocorrelation vector can be predicted
using a locally weighted average. This optimal bandwidth is influenced by the
variability of the autocorrelation estimate around the true value, as well as
the spatial variability of the true autocorrelation.
As an illustration, consider a two-dimensional sinusoidal grating (50× 50 pix-
els), to which white Gaussian noise is added with a standard deviation of
γn = 0.5. The proposed cross-validation measure is evaluated on this spatial
pattern for bandwidths ranging from 0.1 to 3 in steps of 0.01. The
(h, ε2) curve is very smooth (see Fig. 2, solid line) and has a clear minimum
at hopt = 0.78. Since the objective is the optimal recovery of the noiseless
Fig. 2. Illustration of the method for determining the optimal bandwidth in the
case of a noisy 2D sinusoidal grating: the cross-validation prediction error ε2 (solid
line), the MSE between the smoothed noisy grating and the noiseless one (dashed
line), and the iterative minimisation steps performed by the Golden Section Search
method (open circles on top curve).
spatial pattern, the mean-square-error between the noiseless pattern and the
noisy pattern smoothed with a kernel of varying width h is also evaluated
(Fig. 2, dashed curve). This curve is also very smooth and shows a minimum
at h = 0.85. Therefore, the bandwidth which optimises the cross-validation
criterion is very close to that which minimises the MSE (‘theoretically’ op-
timal). The corresponding MSE-values are 0.0439 (h = 0.85) and 0.0450
(h = hopt = 0.78).
We will loosely use the term “optimal regularisation” to refer to the spa-
tial smoothing using a Gaussian kernel with a bandwidth obtained using the
cross-validation procedure on the spatial pattern of (p-dimensional) unbiased
autocorrelation estimates (in all situations tested, there was no noticeable
difference in optimal bandwidth for the spatial pattern of biased estimates).
In order to make the method computationally efficient, we further exploit
the smoothness of the (h, ε2)-curve and the fact that it seems to have only
a single minimum within a reasonable range (we do not claim this to be the
general case, but in all cases tested, there was only a single minimum). If
there is only a single minimum and it can be ‘bracketed’ by three h-values 1,
the Golden Section Search algorithm can be used for obtaining the optimal
bandwidth in a relatively small number of evaluations of ε2 (Press et al.,
1992). The search algorithm iteratively brackets the minimum of a function
in smaller intervals (assuming the minimum exists and is unique within the
initial bracket). Convergence is monitored by the fractional precision of the
estimate (for details, see Press et al., 1992), which should be smaller than a
given tolerance parameter, which in all simulations has been set to tol = 0.01.
In the example shown in Fig. 2 (open circles in top curve), the algorithm
converges after 13 iterations.
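A minimal sketch of such a bracketing search, assuming a single minimum within the initial interval, is given below. This is an illustrative implementation after Press et al. (1992), not the authors' code.

```python
import math

def golden_section_search(f, a, c, tol=1e-3):
    """Golden Section Search for a function f with a single minimum
    bracketed in (a, c); the bracket is narrowed by the golden ratio
    until it is smaller than tol."""
    invphi = (math.sqrt(5.0) - 1.0) / 2.0     # 1/phi ~ 0.618
    x1 = c - invphi * (c - a)                 # two interior evaluation points
    x2 = a + invphi * (c - a)
    f1, f2 = f(x1), f(x2)
    while c - a > tol:
        if f1 < f2:                           # minimum lies in (a, x2)
            c, x2, f2 = x2, x1, f1
            x1 = c - invphi * (c - a)
            f1 = f(x1)
        else:                                 # minimum lies in (x1, c)
            a, x1, f1 = x1, x2, f2
            x2 = a + invphi * (c - a)
            f2 = f(x2)
    return 0.5 * (a + c)
```

Each iteration requires only one new evaluation of the (here, expensive) cross-validation error, since one interior point is reused; in the article's setting, convergence is instead monitored via the fractional precision of the bandwidth estimate.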
3 Results
Several autocorrelation estimates are compared in the following studies. Where
possible, the theoretical correction (“Theo”) is used (in the case of a known au-
tocorrelation structure), and as a reference, the uncorrected results (“OLS”)
are also included. Both biased and unbiased autocorrelation estimates are
considered in different variants: unregularised (Bias and Unb), optimally reg-
ularised using the proposed approach (BiasR and UnbR), and ‘globally’ regu-
larised (BiasG and UnbG), i.e., using the average autocorrelations computed
over all pixels. First, the performance of Bias and Unb are evaluated, and
second, a detailed comparative study is performed on synthetic data sets,
1 The minimum of a function is bracketed by a triplet of points a < b < c, such
that f(b) is less than both f(a) and f(c), in which case the function has a minimum
in the interval (a, c).
examining the effect of biased/unbiased estimates and spatial regularisation.
Finally, real fMRI data sets are considered, both null and activation fMRI
data sets.
3.1 Fixed Autocorrelation
In order to compare the performance of the correction methods using the bi-
ased and unbiased estimates, without performing any spatial regularisation,
the following data sets are considered. Time series are generated from a first-
order autoregressive process driven by white Gaussian noise (Eq. 9). The cor-
rections are performed using an AR(1)-model for the noise, and the F -tests
are performed at a significance level of α = 0.01.
Sets of N = 10, 000 AR(1)-signals (n = 100 time samples) are generated for
values of a∗1 ranging from a∗1 = 0 to a∗1 = 0.5 in steps of 0.01. In each set, 1,000
time series are synthetically activated with γ1 = 1. The results are shown in
Fig. 3 (thick black curves for the theoretical, red curves for the biased and blue
for the unbiased corrections). For reference, the uncorrected OLS results have
been included (thin black curves), showing a strong increase of the MSEF and
the rate of false positives for increasing degrees of autocorrelation. The latter
even reaches rFP = 0.12 for a∗1 = 0.5, more than ten times the nominal
size α (dotted line in Fig. 3B). The empirical power of the uncorrected OLS,
however, is higher than that of the corrected versions. Figure 3A shows that
the empirical null distribution of the unbiased correction method (blue curve)
better matches the theoretical one than the biased (red curve), which can also
be concluded from the rate of false positives (Fig. 3B), which is closer to 0.01
for the unbiased correction. The power of the unbiased correction method is
Fig. 3. Results of the fixed autocorrelation simulations: MSEF (A), rFP (B) and rTP
(C). Conventions are the following: Bias (red), Unb (blue), Theo (black thick) and
OLS (black thin). The dotted line in panel B denotes the nominal size α = 0.01.
slightly lower than that of the biased correction method (Fig. 3C). The power
for the theoretical correction is similar to the other methods, but the rate of
false positives is closer to the nominal size than that of the other correction
methods.
3.2 Spatially Variable Autocorrelation
Next, the improvement due to spatial regularisation of the (biased and unbi-
ased) autocorrelation estimates is illustrated. Spatially smooth autocorrelation
patterns are generated in the following way. A spatial noise pattern of 40× 50
pixels is generated containing normally distributed noise convolved with a
rectangular mask of 5 pixels with value 0.04 (a rectangular mask is chosen so
that its functional form differs from the Gaussian kernel used for regularisa-
tion). The values are shifted and scaled such that the extremal values are zero
and amax = 0.3, yielding a spatial pattern A(x), an example of which is shown
in Fig. 4A. For each pixel i at position xi, an AR(1)-signal of n = 100 time
samples is generated with an autocorrelation at lag one of A(xi):
yi,k = A(xi) yi,k−1 + νk,
where ν is a white Gaussian noise source. A central circular ‘activation’ region,
consisting of Nact = 500 pixels is defined, where the pixels are synthetically ac-
tivated with a factor γ1 (see Methods Section), and where the autocorrelation
values A(x) are optionally increased by aadd (see, e.g., Fig. 4C). The MSEF
is computed on the basis of the pixels outside the activation region (1,500
pixels). In addition, the mean-square discrepancy between the theoretical au-
tocorrelation pattern A(x) and the estimated one is evaluated (MSEAC).
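The data generation described above can be sketched as follows. This is an illustrative NumPy/SciPy sketch in which `scipy.ndimage.uniform_filter` plays the role of the 5-pixel rectangular smoothing mask; the function name and defaults are hypothetical.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def make_spatial_ar1_data(shape=(40, 50), a_max=0.3, n=100, seed=0):
    """Sketch of the synthetic data generation: a smooth spatial
    autocorrelation pattern A(x) (normal noise smoothed with a rectangular
    mask, rescaled to [0, a_max]) and, per pixel, an AR(1) time series
    whose lag-one autocorrelation is A(x)."""
    rng = np.random.default_rng(seed)
    A = uniform_filter(rng.standard_normal(shape), size=5)  # rectangular mask
    A = a_max * (A - A.min()) / (A.max() - A.min())         # rescale to [0, a_max]

    Y = np.zeros(shape + (n,))
    nu = rng.standard_normal(shape + (n,))                  # white noise source
    for k in range(1, n):
        Y[..., k] = A * Y[..., k - 1] + nu[..., k]
    return A, Y
```

The synthetic activation and the optional increase of A(x) by aadd inside the circular region can then be applied on top of these time series.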
Three types of situations are considered, examples of which are shown in
Fig. 4, visualising A(x) in a pseudocolour plot where the regions contain-
ing active pixels are delineated by a white dashed circle. The first situation
(“Batch1”) considers spatial autocorrelation patterns A(x) in the presence of
activation with no additional autocorrelation in the activation region (γ1 = 1
and aadd = 0). The second (“Batch 2”) introduces additional autocorrelations
in the activation region (γ1 = 1 and aadd = 0.3), similar to the increased
autocorrelations observed in grey matter compared to white matter in fMRI
studies (see, e.g. Woolrich et al., 2001). The third situation (“Batch 3”) is
designed for examining the performance of the correction methods when there
is no activation present (hence, no dashed line in Fig. 4), but there is still a
circular region with increased autocorrelation (γ1 = 0 and aadd = 0.3).
For each situation, 1,000 autocorrelation patterns are generated, and the dif-
ferent autocorrelation estimates are used for the correction. The results of
the F -tests (performed at the level of α = 0.05 with the dummy paradigm)
are summarised in Tables 1 and 2, showing the deviations from the theoretical
autocorrelation pattern at unit lag (MSEAC) and from the theoretical null
distribution (MSEF), as well as the rates of false (rFP) and true positives (rTP).

Fig. 4. Pseudocolour plots of the synthetically generated spatial autocorrelation
patterns, A(x), for Batch1 (γ1 = 1, aadd = 0), Batch2 (γ1 = 1, aadd = 0.3) and
Batch3 (γ1 = 0, aadd = 0.3). The region demarcated by a white, dashed circle
denotes the ‘active’ region.

Note that since the autocorrelation patterns for Batch2
and Batch3 are identical by design (γ1 is the only difference between these two
batches), the MSEAC results in Table 1 are also identical, as are the rates of
false positives outside the activation region (columns 3 and 5 in Table 2). For
Batch3, there are two rates of false positives (columns 5 and 6 in Table 2),
namely that outside (rFP) and within (raFP) the activation region (the differ-
ence being the increased autocorrelation in the activation region). All pairs of
samples (each containing 1,000 data points) are tested for a difference in mean
using a two-sample t-test at the significance level of 0.05. For each batch, the
pairs of results using correction methods for which no statistical differences
are found, are indicated by pairs of superscripts (∗ and +) in Tables 1 and 2.
The standard deviations have not been included in Tables 1 and 2, but are on
the order of 0.0006 for the MSEAC, 0.01 for the MSEF , 0.006 for the rFP and
0.01 for the rTP.
It can be observed from Table 1 that the deviation from the theoretical au-
tocorrelation at unit lag (MSEAC) is smaller for the unbiased correction ap-
proaches than for the biased ones (comparing the unregularised, regularised
and global estimates in a pairwise fashion) as expected, and that the MSEACs
for the unregularised approaches are considerably larger than for the regu-
larised and global estimates. The same holds for the MSEF measures, except
for the global estimates in Batch2, where the MSEF for BiasG is smaller than
that for UnbG. Finally, the MSEF for the uncorrected F -test (OLS) is consid-
erably higher, indicating a clear effect of autocorrelation on the F -distribution
under the null hypothesis. The UnbR performs best with respect to both the
MSEAC and the MSEF in the three situations considered (shared with the
UnbG in Batch1 for the MSEF ).
The rate of true positives (columns 2 and 4 in Table 2) is an indication of the
power of the test. The power of the uncorrected OLS is highest and, perhaps
surprisingly, the theoretical correction (Theo) yields very low power
(rTP of 0.980 and 0.814 for Batch1 and Batch2, respectively). To explain these
two results, the rate of false positives, which should approximate the nominal
size of the test (α = 0.05), needs to be examined. The OLS displays a high
power (rTP = 0.991 and rTP = 0.944) at the expense of an increased rate of
false positives (rFP = 0.088). This indicates that the OLS test underestimates
the true variance of the regression coefficients: the empirical distribution is
wider than the distribution against which the test is performed. Conversely,
for an exact test (Theo), in which the empirical distribution matches
that against which testing is performed, the rate of false positives is lower
(0.050, very close to the nominal size), at the expense of lower power.
Table 1
Mean values of the mean-square discrepancy measures between the theoretical and
estimated autocorrelation patterns at unit lag (MSEAC) and between the empirical
and theoretical null distributions (MSEF). Within columns, pairs of samples whose
means are not statistically different (two-sample t-test at a significance level
of 0.05) are indicated by superscripts (∗ and +).

         Batch1            Batch2            Batch3
         MSEAC   MSEF      MSEAC   MSEF      MSEAC   MSEF
OLS      —       0.144     —       0.144     —       0.432
Theo     —       0.007     —       0.007+    —       0.005
Bias     0.011   0.030     0.011   0.030     0.011   0.032+
BiasR    0.002   0.015∗    0.003   0.013∗    0.003   0.016∗
BiasG    0.003   0.016∗    0.020   0.013∗    0.020   0.050
Unb      0.010   0.017∗    0.010   0.017     0.010   0.016∗
UnbR     0.001   0.008+    0.002   0.007+    0.002   0.007
UnbG     0.002   0.008+    0.019   0.025     0.019   0.030+

Due to the variance of the unregularised autocorrelation estimates, the rates
of true positives for these corrections are fairly low. Reducing this variance
(regularisation) increases the rate of true positives (Bias vs. BiasR and BiasG,
and Unb vs. UnbR and UnbG). When comparing the rates of true positives
to those of Theo, the possible presence of bias should be taken into account.
Indeed, biased estimates generally yield an underestimate of the actual auto-
correlation, due to which the rate of true positives increases (Bias vs. Theo).
Whether the rate of true positives for Unb is higher than for Theo is case
dependent. This can be explained by reconsidering Fig. 1C. Suppose the true
autocorrelation is identical for all pixels within a certain region, and that the
distribution of the estimated values is Gaussian and centred around the true
value. The rate of true positives would be the same as that for the theoretical
correction if the curves in Fig. 1C were linear (or at least approximately linear
in the region spanned by the autocorrelation estimates), in which case an over-
or underestimation of the true value would yield a balanced decrease or increase
of the rate of true positives. In most cases, however, there is an asym-
metrical (nonlinear) effect of the over- or underestimation on the rate of true
positives, due to which this rate can be higher than, lower than or equal to
that of the theoretical correction, depending on the data.
To test whether the rFP-measures conform to the expected nominal size (α =
0.05), the rFP-samples are tested for a mean of 0.05 (one-sample t-test at α =
0.05); the null is rejected for all approaches except Theo (i.e., Theo
is the only correction method that yields a correct rFP). The unregularised
correction methods (Bias and Unb) show increased rates of false positives with
respect to their regularised and global counterparts, due to the variability of
the autocorrelation estimates. The BiasR estimates yield rates of false positives
higher than the nominal size due to the bias, but this is not the case for the
BiasG in Batch2 and Batch3 (0.042). This can be attributed to the additional
autocorrelations in the activation region, due to which the global average of
the autocorrelation (BiasG) is increased with respect to that in Batch1. An
overestimation of the autocorrelation induces a decrease in the rate of false
positives (see Fig. 1B), due to which rFP decreases to 0.042. This can be
validated by the raFP: following the same reasoning, the autocorrelation in the
activation region is underestimated, due to which the rate of false positives
should be higher than the nominal size, which is, indeed, the case in Batch3
(raFP = 0.133). Similar effects can be seen for UnbG. Apart from the theoretical
correction, the unbiased regularised method (UnbR) is the only correction
method that maintains a rate of false positives close to the nominal size in the
three situations considered here.

Table 2
Mean values of the rates of false (rFP) and true (rTP) positives, using the same
conventions as in Table 1. The last column denotes the rate of false positives in
the ‘activation’ region for Batch3 (raFP, see text).

         Batch1            Batch2            Batch3
         rFP     rTP       rFP     rTP       rFP     raFP
OLS      0.088   0.991     0.088   0.944     0.088   0.190
Theo     0.050   0.980     0.050∗  0.814     0.050∗  0.047
Bias     0.062   0.981∗    0.062   0.836     0.062   0.066∗
BiasR    0.058∗  0.984+    0.057   0.854     0.057   0.067∗
BiasG    0.058∗  0.984+    0.042   0.921     0.042   0.133
Unb      0.056   0.977     0.056   0.819     0.056   0.058+
UnbR     0.052+  0.981∗    0.050∗  0.839     0.050∗  0.059+
UnbG     0.052+  0.982     0.036   0.916     0.036   0.124
The evaluation measures are further computed for Batch2 (1,000 realisations)
under different levels of smoothing of the autocorrelation estimates (Bias and
Unb). This is a necessary validation, since the ‘optimal’ degree of smoothing,
hopt, used in the BiasR and UnbR corrections, is not necessarily optimal with
respect to these evaluation measures. Figure 5 shows the average results for
varying the smoothing bandwidth, h, from 0.5 to 5 pixels in steps of 0.25.
The minima for the biased and unbiased approaches coincide, and the auto-
correlation pattern is optimally recovered (minimum MSEAC) for h = 1.00
using both the biased (red curve) and unbiased (blue curve) estimates (dotted
line in Fig. 5A), the latter with a smaller MSEAC. The empirical
null distribution using the biased correction improves (MSEF decreases) over
the complete range tested, but reaches an optimum using the unbiased cor-
rection for h = 2.00 (Fig. 5B). Similarly, the rate of false positives improves
(gets closer to the nominal size denoted by the dashed line in Fig. 5C) over
the complete range tested for the biased approach, and is closest to 0.05 for
h = 1.25 using the unbiased correction (dotted line in Fig. 5C). This illustrates
the effect of the bias on the efficacy of the correction, as it would be expected
that a low MSEAC would correspond to good results for the other evaluation
measures, which is the case for the unbiased, but not for the biased approach.
The effect of the bias can only be suppressed by large degrees of smoothing.
The rate of true positives increases almost linearly over the range tested for
both corrections (between 0.84 and 0.88 for the biased and between 0.82 and
0.87 for the unbiased correction; results not shown). The histogram of the
optimal bandwidths obtained using the proposed method is shown in Fig. 5D
(average 1.17 and standard deviation 0.08). This average is represented as the
black vertical lines in Fig. 5A–C. The hopt-value is very close to that which
minimises the MSEAC for both the biased and unbiased correction, and to
that which brings rFP closest to 0.05 for the unbiased correction (blue curves
in Fig. 5A and 5C). It is, indeed, logical to assume that if the autocorrelation
is estimated in an optimal (or nearly optimal) manner, the correction (and
the rate of false positives) can also be expected to be near-optimal, rendering
the test exact, as is the case for the unbiased correction method (blue curves).
However, the degree of smoothing that minimises the MSEF for the unbiased
correction, is higher than the latter degree of smoothing, suggesting that the
MSEF measure is influenced by the variance of the parameter estimates even
after smoothing. Indeed, the F -test assumes the whitening matrix (computed
from the estimated autocorrelation structure) to be noiseless and exact. The
results for the biased correction (Fig. 5B and 5C) seem to suggest that large
degrees of smoothing are necessary for best performance (there is an opti-
mal value for the MSEF for h > 5, since the biased global correction yields
MSEF =0.013, as shown in Table 1), which is in contradiction to the degree of
smoothing which is optimal with respect to the MSEAC (Fig. 5A).
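For reference, the prewhitening step itself can be sketched for an AR(1) noise model. The following is the generic textbook construction (the Prais-Winsten form of the whitening matrix), not necessarily the authors' exact implementation; the toy design and parameter values are assumptions:

```python
import numpy as np

def ar1_whitening_matrix(rho, n):
    """Whitening matrix W for AR(1) noise with lag-1 autocorrelation rho,
    such that W @ e is (approximately) white when e is AR(1)."""
    W = np.eye(n)
    W[0, 0] = np.sqrt(1.0 - rho ** 2)   # Prais-Winsten treatment of the first sample
    for t in range(1, n):
        W[t, t - 1] = -rho
    return W

# Toy GLM: y = X b + AR(1) noise
rng = np.random.default_rng(7)
n = 120
X = np.column_stack([np.ones(n), np.sin(2 * np.pi * np.arange(n) / 20)])
e = np.zeros(n)
for t in range(1, n):
    e[t] = 0.3 * e[t - 1] + rng.standard_normal()
y = X @ np.array([1.0, 2.0]) + e

rho_hat = 0.3                 # in practice: a (regularised) estimate from OLS residuals
W = ar1_whitening_matrix(rho_hat, n)
yw, Xw = W @ y, W @ X         # prewhitened data and design matrix
b_hat, *_ = np.linalg.lstsq(Xw, yw, rcond=None)
print(b_hat)                  # roughly recovers [1.0, 2.0]
```

When rho_hat is noisy, W is noisy as well, which is precisely the disturbance the spatial regularisation aims to reduce.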
Finally, for one realisation of Batch3, the F -scores for the different correc-
tion methods are sorted in increasing order, and are plotted against those of
the theoretical correction (Fig. 6A for the biased and 6B for the unbiased
approaches). This generates plots that are similar to the PP -plots described
in (Marchini and Ripley, 2000; Woolrich et al., 2001; Marchini and Smith,
2003), where the log10(p)-values were plotted as a function of the theoretical
p-distribution. We have opted for a scatter plot of the F -scores, an “FF”-plot,
since this avoids the logarithmic scale and basically conveys the same informa-
tion as the PP -plot. An exact test would yield a scatter plot coinciding with
the bisector line (dashed lines in Fig. 6). In line with previous results where the
“empirically obtained probabilities are predominantly less than the expected
theoretical probabilities” (Woolrich et al., 2001), the F -scores of the various
correction methods are higher than those theoretically expected. It is clear from
Fig. 6 that the OLS method (black curve) overestimates the F -scores, and,
Fig. 5. A–C) Performance measures as a function of the degree of smoothing (red
and blue curves correspond to the biased and unbiased autocorrelation estimates,
and optimal values indicated by dotted lines): MSE of the autocorrelation pattern
estimate (A), the MSE of the empirical null distribution (B) and the rate of false
positives (C, dashed line denotes the nominal size α = 0.05); D) Histogram of the
bandwidths suggested by the proposed method (the average value is indicated as
a black vertical line in panels A–C). The dotted lines in panels A–C represent the
degree of smoothing that optimises the performance of the unbiased correction.
albeit less severely, so do the unregularised (yellow curves in Fig. 6A and 6B)
and globally regularised approaches (green curves in Fig. 6A and 6B). The op-
timally regularised approaches and the theoretically correct one yield empirical
F -distributions very close to the theoretically expected one (red, blue and purple
curves almost coinciding with the bisector line).
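An FF-plot of the kind shown in Fig. 6 can be constructed by sorting the observed F-scores and pairing them with quantiles of the theoretical F-distribution; a sketch with illustrative degrees of freedom:

```python
import numpy as np
from scipy.stats import f as f_dist

# Observed F-scores under the null; here drawn from the theoretical law itself,
# so the resulting FF-plot should hug the bisector. The dof values are illustrative.
dfn, dfd = 2, 100
f_obs = np.sort(f_dist.rvs(dfn, dfd, size=2000, random_state=3))

# Theoretical quantiles F_des at the plotting positions (i - 0.5) / n
probs = (np.arange(1, f_obs.size + 1) - 0.5) / f_obs.size
f_des = f_dist.ppf(probs, dfn, dfd)

# The points (f_des[i], f_obs[i]) form the FF-plot; an exact test would place
# them on the bisector line, while an inflating correction lies above it.
print(float(np.corrcoef(f_des, f_obs)[0, 1]))
```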
Fig. 6. Empirical F -distribution using the different biased (A; Bias–yellow, Bi-
asR–red, BiasG–green, OLS–black, Theo–purple) and unbiased (B; Unb–yellow,
UnbR–blue, UnbG–green, OLS–black, Theo–purple) correction methods against the
theoretically expected F -distribution, Fdes.
3.3 Null fMRI Data
Next, “Null” fMRI data sets are considered, which are recorded from a hu-
man subject who is asked simply to remain passive during scanning. The
data set is publicly available from http://www-bmu.psychiatry.cam.ac.uk/
DATA/NULLdata/index.html. Results in this section are shown for data set
“000413-m02_6_EPI”. The voxel size is [3.9 × 3.9 × 5.0] mm, and each voxel
has a time course of 80 time samples with TR=3 seconds. The images are
acquired on a 3 Tesla machine.
The results are restricted to the correction methods using the unbiased auto-
correlation estimates (F -testing at the level of α = 0.05). The MSEF and rFP
for the methods using the biased estimates are similar but, in general, larger
than those for the methods using the unbiased estimates. For reference, the
uncorrected OLS results, and the results using a fixed bandwidth of 6.37 mm
(FWHM=15 mm, referred to as “UnbF”, which is the default value² used
by Worsley et al. 2002) are shown. The design matrix consists of the two re-
sponses (following the dummy paradigm) and three drift terms (detrending of
order two). The first five scans have been discarded from the analysis.

² To be precise, the actual bandwidth is given by FWHM = 15 · (100/df)^(1/3) [mm].
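The 6.37 mm bandwidth quoted above is the standard deviation corresponding to FWHM = 15 mm under the standard Gaussian conversion σ = FWHM/(2√(2 ln 2)), sketched below together with the df-adjusted default:

```python
import numpy as np

def fwhm_to_sigma(fwhm):
    """Convert a Gaussian kernel's full width at half maximum to its standard deviation."""
    return fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))

print(fwhm_to_sigma(15.0))   # ~6.37 (mm), the fixed "UnbF" bandwidth

# The df-adjusted default of Worsley et al. (2002): FWHM = 15 * (100/df)^(1/3) mm
df = 100
fwhm = 15.0 * (100.0 / df) ** (1.0 / 3.0)
print(fwhm_to_sigma(fwhm))
```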
The results are shown as a function of the order p of the AR-model used for
correction in Fig. 7A and 7B (the MSEF and rFP, respectively). The uncorrected
OLS results (black lines) have a large MSEF (0.174) and a strongly increased
rate of false positives (0.091); only the results for Unb (yellow curves) are
worse for higher orders (p > 6). The performance of the unregularised correc-
tion degrades with increasing AR-order due to the increasing variance of the
autocorrelation estimates (an increasing number of AR-coefficients needs to
be estimated from the same amount of data, resulting in an increased vari-
ance of the estimates). The three remaining, regularised correction methods
(UnbR, UnbF and UnbG) do not suffer from the increasing variance of the
autocorrelation estimates with increasing AR-model orders. The MSEF mea-
sures remain low (< .01) over the range tested and the rates of false positives
also remain close to the nominal size (α = 0.05). The optimal bandwidths in-
crease as a function of the AR-model order (Fig. 7C). This is expected, since
a larger degree of spatial regularisation is required to reduce the variability
of the estimates (for the same amount of data, adding parameters will render
the estimates more variable). For the three Null data sets tested, the AR(1)-
correction is insufficient, and yields a slightly elevated rate of false positives
(left-hand side of Fig. 7B). Both the MSEF and rFP decrease for higher-order
AR-models up to four. The rate of false positives for UnbG (green curve in
Fig. 7B) is very close to the nominal size. However, as illustrated in a previous
simulation study (Batch3), there is a systematic over- and underestimation
of the autocorrelation, which can lead to an increased rate of false positives
Fig. 7. Results for the Null fMRI data set. Discrepancy between the empirical and
theoretical null distribution (A), Rate of false positives (B) and Optimal bandwidth
(C) as a function of order p; D) MSEF as a function of the bandwidth h for p = 1
(solid) and for p = 4 (dashes), minima indicated by open circles; Colour conventions
in panels A and B are the following: Unb (yellow solid), UnbR (blue solid), UnbG
(green solid), UnbF (blue dashed) and OLS (black solid).
in (inactive) regions with a higher autocorrelation. Using UnbF (blue dashed
curves) and UnbG (green curves), the MSEF remains relatively constant for
higher-order AR-models, but the UnbR-correction method (blue solid curves)
shows a minimum, indicating a value which balances the effects of model
complexity, spatial resolution and noise. This minimum cannot be used for
determining the optimal AR-order in practice (fMRI activation studies), since
it would require a null fMRI scanning session for each study.
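Fitting the AR(p) coefficients for a voxel's residual time course amounts to solving the Yule-Walker equations; a generic sketch (not the fmristat implementation, and the toy AR(2) parameters are assumptions):

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def yule_walker(x, p):
    """AR(p) coefficients from the sample autocovariances of x (Yule-Walker)."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    r = np.array([np.dot(x[: n - k], x[k:]) / n for k in range(p + 1)])
    # Solve the symmetric Toeplitz system R a = r[1:], with R_ij = r[|i-j|]
    return solve_toeplitz(r[:p], r[1 : p + 1])

# AR(2) toy series: x_t = 0.5 x_{t-1} - 0.2 x_{t-2} + white noise
rng = np.random.default_rng(5)
x = np.zeros(5000)
for t in range(2, 5000):
    x[t] = 0.5 * x[t - 1] - 0.2 * x[t - 2] + rng.standard_normal()

print(yule_walker(x, p=2))   # roughly [0.5, -0.2]
```

With only 80 time samples per voxel, as in the Null data, the same estimator becomes markedly more variable as p grows, which is the variance the spatial regularisation counteracts.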
To validate these results, the MSEF is evaluated as a function of the bandwidth
h of the Gaussian kernel used for spatially smoothing the autocorrelation esti-
mate. The results using an AR(1) and an AR(4) noise model are shown in Fig.
7D (solid and dashed curves). The bandwidths obtained using the proposed
method are shown as open circles. It can be observed that the obtained hopt is
very close to the bandwidth which minimises the MSEF . This will not neces-
sarily be the case in general, since other disturbance factors can influence the
position of this minimum, as was shown for the biased estimates in Fig. 5B.
Finally, the FF -plots (empirical distribution vs. theoretical one) are shown
in Fig. 8, respectively for the biased (Fig. 8A) and unbiased (Fig. 8B) correc-
tion methods, both using an AR(4)-model. As was the case for the synthetic
data set (Fig. 6), the effect of spatial regularisation can be clearly observed in
Figs. 8A and 8B: the curves for the regularised corrections (BiasR-red, UnbR-
blue, BiasG/UnbG-green, BiasF/UnbF-purple) lie closer to the bisector line
than those for the unregularised (Bias/Unb-yellow) and OLS (black) correc-
tions. Furthermore, the unbiased corrections show a smaller deviation from
the bisector line than the biased corrections.
3.4 Effect of Presmoothing
Spatial smoothing of the spatiotemporal fMRI data prior to the statistical
analysis is a common preprocessing step. Therefore, it is important first to
establish the effect of spatial smoothing of the spatiotemporal data on the op-
timal degree of smoothing of the spatial pattern of autocorrelation estimates.
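The spatial regularisation step itself, smoothing a noisy autocorrelation map with an isotropic Gaussian kernel of bandwidth h, can be sketched as follows (the synthetic pattern, noise level and boundary mode are assumptions):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(9)

# Noisy lag-1 autocorrelation map: a smooth underlying pattern plus estimation noise
yy, xx = np.mgrid[0:64, 0:64]
true_pattern = 0.3 * np.exp(-((xx - 32) ** 2 + (yy - 32) ** 2) / (2 * 10.0 ** 2))
noisy_estimate = true_pattern + 0.1 * rng.standard_normal((64, 64))

# Regularised estimate: isotropic Gaussian smoothing with bandwidth h (pixels)
h = 1.5
regularised = gaussian_filter(noisy_estimate, sigma=h, mode="nearest")

# Smoothing reduces the mean-square deviation from the underlying pattern
mse_raw = np.mean((noisy_estimate - true_pattern) ** 2)
mse_reg = np.mean((regularised - true_pattern) ** 2)
print(mse_raw, mse_reg)
```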
Consider the example Batch3 autocorrelation pattern shown in Fig. 9A, which
is used for generating a spatiotemporal data set. The estimated unbiased au-
tocorrelation pattern (Unb) is shown before (Fig. 9B) and after smoothing
Fig. 8. FF -plots for the Null fMRI data set using the biased (A) and unbiased (B)
AR(4)-correction methods. Colour conventions are the following: Bias/Unb (yel-
low), BiasR (red), UnbR (blue), BiasG/UnbG (green), BiasF/UnbF (purple), OLS
(black). The dashed lines represent the bisector lines, corresponding to a perfect
match between the empirical and theoretical distributions.
of the estimates, using hMSE = 1.1 (Fig. 9C), which is the bandwidth that
minimises the MSEAC obtained using a grid search with bandwidths between
0.5 and 10 pixels in steps of 0.1, and using hopt = 1.13 (Fig. 9D), which is
obtained using the proposed approach. Next, the spatiotemporal data are spa-
tially smoothed using a Gaussian kernel with a bandwidth of one pixel (note
that the underlying autocorrelation pattern remains identical, thus Fig. 9A
and 9E are the same). The unsmoothed autocorrelation estimate (Unb, Fig.
9F) is clearly smoother than in the absence of presmoothing (Fig. 9B). Figure
9G shows the smoothed autocorrelation estimates using the bandwidth that
minimises the MSEAC (hMSE = 2.0).
The spatial smoothness of the estimated autocorrelation pattern is prob-
lematic for the proposed method. Indeed, the autocorrelation value in any
pixel/voxel will be best explained from only its nearest neighbouring pix-
els/voxels, due to the strong spatial correlations between neighbouring pix-
els/voxels, induced by the spatial smoothing process. A subsampling strategy
Fig. 9. Effect of spatial (data) presmoothing on the spatial autocorrelation pattern.
A,E) Original autocorrelation patterns A(x) used for generating the spatiotemporal
data; B,F) Unbiased (unregularised) autocorrelation estimates; C,G) Regularised
autocorrelation estimates using hMSE; D,H) Regularised autocorrelation estimates
using hopt. The first (second) row shows the results in the absence (presence) of
spatial (data) smoothing.
is employed to adapt the proposed method, which assumes that the correla-
tions induced by the presmoothing are negligible when the data are (spatially)
subsampled with a factor equal to the FWHM (in pixels/voxels) of the spatial
presmoothing kernel. In the current implementation, this factor is rounded to
the nearest (non-negative) integer. If the obtained bandwidth is a degenerate
one (either of the extreme values of the initial bracket used in the Golden
Section Search method), the subsampling factor is increased by unity, and the
procedure is repeated. The resulting optimal regularisation bandwidth is con-
siderably higher than in the unsmoothed case (hopt = 2.27 pixels), and close
to hMSE.
Finally, the previously described data set Batch3 is reconsidered, and each
data set is spatially smoothed with a Gaussian kernel with a bandwidth, σs,
Fig. 10. Desired (hMSE, red curve) and obtained bandwidths (hopt, blue curve) as a
function of the kernel width σs of the kernel used for spatially smoothing the data
(average and standard deviation shown).
varying from 0 to 3 pixels in increments of 0.25, after which the desired degree
of smoothing for MSEAC minimisation (hMSE) is determined for the Unb-
estimate. Overall results (mean and standard deviation computed over 100 re-
alisations for every degree of presmoothing) are shown in Fig. 10 (red curve).
The desired degree of smoothing clearly increases with increasing degrees of
presmoothing. This supports the previous observation that, even though the
estimated autocorrelations are spatially smooth (due to the spatial smoothing
of the data), a higher degree of smoothing is still required for recovering the
underlying autocorrelation pattern. The bandwidths obtained using the pro-
posed method are visualised as the blue curves, showing a similar increasing
trend but with a higher variability.
3.5 Activation fMRI Data
To further illustrate the proposed method, the fMRI data set used in (Worsley
et al., 2002) is considered. The experiment addresses pain perception using the
following paradigm: 9 seconds of rest, a painful heat stimulus of 9 seconds, 9
seconds of rest, and a warm (neutral) stimulus of 9 seconds. The data were
scanned at 1.5 Tesla, with TR=3 seconds (120 scans in total), and consisted of
13 slices of 100 × 128 pixels. The voxel size was 2.3 × 2.3 × 7 mm and the data
were smoothed in-slice by 6 mm as part of the motion correction. For more
details, we refer to the fmristat homepage (http://www.math.mcgill.ca/
~keith/fmristat/).
The data were analysed using fmristat (Worsley et al., 2002), with AR-models
of increasing order (p = 1, . . . , 6), and for each order, the unbiased estimates
of the p-dimensional autocorrelation vectors were smoothed using bandwidths
between 2.12 and 10.62 mm in steps of 2.12 mm (this corresponds to FWHMs
between 5 and 25 mm). The unbiased estimates were used for determining
the optimal bandwidth using the proposed method, taking into account the
presmoothing kernel (thus, subsampling by [3 3 1] voxels). This subsampling
factor sufficed for all orders p for obtaining a non-degenerate optimal band-
width. The required processing times in the AR(1)-case were approximately 8
seconds for determining the optimal bandwidth and 5 minutes for the statis-
tical analysis (which includes the prewhitening)³.
Figure 11 shows the (unbiased) AR(4)-parameters for a given axial slice (from
top to bottom row, respectively the unregularised, using a fixed kernel with
FWHM=15 mm, and using the optimal bandwidth, FWHM=6.9 mm ap-
proaches). In general, the AR-coefficients decrease in size from left to right
(a1 − a4) and the a4-coefficients (last column) are very small. The effect of
spatial smoothing is, evidently, that the details are blurred and that the
high-frequency (possibly noise) components are reduced, but also that the
strong AR-coefficients become smaller. This could influence the subsequent t-
scores, since underestimating the autocorrelation structure enhances the power
at the expense of a higher risk of false positives.

³ Analyses were run on a Pentium 4 processor (3 GHz), running Matlab 6.5
(Release 13) under Linux.

Fig. 11. AR(4)-coefficients for an axial slice of the Activation fMRI data set.
From top to bottom row, the results for the unregularised (Unb), fixed kernel of
FWHM=15 mm (UnbF) and optimally regularised (UnbR) approaches are shown,
respectively. The four columns correspond to the four AR-coefficients (a1–a4,
from left to right). The colour scale of each plot corresponds to the colour bar
shown in the top left corner.
In (Worsley et al., 2002), it is stated that the t-statistics for higher-order
AR-models differ by less than 1% from those derived using an AR(1)-model
(for a fixed smoothing FWHM of 15 mm). Although the difference on aver-
age may be very small, it is still interesting to compare t-maps for different
AR-model orders and smoothing bandwidths in closer detail. Since the de-
sired activation pattern for this data set is not known, the correctness of the
resulting t-maps cannot be quantitatively evaluated. However, the effect can
be illustrated by observing the number of voxels with an absolute t-value ex-
ceeding a threshold of tthres = 3 as a function of the FWHM, and for a number
of AR-model orders. Note that this number contains both ‘correct’ and ‘in-
correct’ rejections of the null hypothesis. The results are shown in Fig. 12
(colour conventions as shown in figure legend). In the uncorrected OLS case,
the number of supra-threshold voxels is 2985, which is much higher than the
numbers found in the corrected cases. This is expected, since the power of the
OLS has been demonstrated to be the highest of the methods under study,
at the expense of a strongly increased rate of false positives. The number of
supra-threshold voxels tends to decrease for increasing FWHM, and for the
FWHM=15 mm-case, this number also tends to decrease for increasing AR-
model orders (from 1,780 to 1,548). The optimal bandwidths obtained using
the proposed method (indicated as open circles in Fig. 12) are around 6.5
and 7.5 mm (FWHM), with correspondingly between 1,700 and 2,000 supra-
threshold voxels. Thus, although the t-maps appear very similar for different
model orders and smoothing bandwidths (results not shown, but see Worsley
et al., 2002), these modelling choices do influence the number of voxels for
which the null hypothesis is rejected (using a very loose rejection criterion).
Fig. 12. Number of supra-threshold voxels in the fMRI activation data as a
function of the FWHM used for spatially regularising the autocorrelation esti-
mates, for different orders p of the AR-model (colour conventions as shown in
the legend). The results using the optimal bandwidths are shown as open circles.
The OLS-approach yields 2985 supra-threshold voxels (not shown on figure).

4 Discussion

The presence of temporal autocorrelations in the error signal after regression
has been a major concern in the statistical analysis of fMRI data. There exist
various correction methods, which aim at determining or shaping the auto-
correlation structure of the error signal. The prewhitening approach, initially
proposed for fMRI by Bullmore et al. (1996), yields the minimum variance
unbiased estimates of the regression coefficients, if the true autocorrelation
is known (Woolrich et al., 2001; Friston et al., 2000a; Bullmore et al., 2001).
Since the true autocorrelation is unknown and needs to be estimated, various
sources of bias in the statistical inference are introduced, among which the
use of the OLS residuals for estimating the autocorrelation (due to which the
autocorrelation estimates are biased) and the variance of the autocorrelation
estimates (for an overview, see Marchini and Smith, 2003). The first source of
bias has been dealt with efficiently in (Worsley et al., 2002), yielding unbiased
estimates of the autocorrelation, and it has been suggested to reduce the effect
of the second source by spatially smoothing the (unbiased) autocorrelation es-
timates. There is also a recent alternative method, which takes into account
the variance of the autocorrelation estimates in the statistical inference, rather
than trying to reduce it by smoothing (Kiebel et al., 2003).
In this article, we have introduced a novel autonomous method for determin-
ing the bandwidth of the kernel used for spatially smoothing the autocor-
relations, on the basis of a cross-validation criterion. It has been shown by
extensive simulations on synthetic data to be near-optimal for recovering the
true autocorrelation structure. Furthermore, using the (optimally) spatially
smoothed autocorrelations for prewhitening the data renders the empirically
obtained distribution of test statistics very close to that theoretically expected,
which is verified on both synthetic and real fMRI data recorded under “null”
conditions.
In (Marchini and Smith, 2003), it is claimed that the correction for the bias
in the autocorrelation estimates, as proposed by Worsley et al. (2002), did
not yield a noticeable improvement to the validity of the statistical test. They
qualitatively compared the effect of bias correction in synthetic data, using
PP -plots, in which the empirically obtained and the theoretically expected
p-values were visualised in a scatter diagram (on logarithmic axes). Similarly,
we have plotted “FF -plots”: the empirically obtained F -scores against those
obtained using the correct autocorrelation function, also on synthetic data,
which yielded very similar results (in both the PP - and FF -plots, a perfect
match between empirical and theoretical distributions would yield a curve co-
inciding with the bisector line). As described in (Woolrich et al., 2001), the
probabilities were predominantly less (smaller p-values, which corresponds to
higher F -scores) than those theoretically expected, at least in the absence of
spatial regularisation or using global regularisation. Although the results (both
our own and those of Marchini and Smith, 2003) of the biased and unbi-
ased approaches are visually very similar (see red and blue curves in Fig. 6A
and 6B), a rigorous quantitative analysis, using the mean-square discrepancy
between the empirical and theoretical F -distributions and the rate of false
positives, indicates a significant improvement of the unbiased corrections over
the biased ones (see Tables 1 and 2). When the FF -plots are generated for a
Null fMRI data set, the difference between the biased and unbiased correction
methods is visually discernible, showing a larger deviation from the bisector
line for the biased corrections. Furthermore, optimal spatial regularisation,
as proposed in this article, greatly improves the exactness of the statistical
test after correction, which is shown quantitatively on both synthetic and null
fMRI data.
There is an additional source of bias in the statistical inference, namely mis-
specification of the noise model. In most approaches, a first-order AR-
model is used, and is claimed to be sufficient. In (Bullmore et al., 1996),
the Box-Pierce test statistic (using biased autocorrelation estimates) was em-
ployed for testing the null hypothesis that the residual signal contains only
white noise. They concluded that an AR(1) model was sufficient for modelling
the fMRI noise process, based on the analysis of an averaged time series (over
156 voxels), thus, assuming a spatially constant autocorrelation structure. Two
approaches are described that determine the order of the noise model for each
voxel separately. In (Locascio et al., 1997), the order of the autoregressive mov-
ing average (ARMA) model was determined for each voxel separately (up to
order three) using the Ljung-Box test. Woolrich et al. (2001) tested for general-
order AR-models, and reported that orders of up to six were required, although
the majority of voxels required orders of up to three (Woolrich et al., 2001,
Fig. 12 therein). Furthermore, also in (Woolrich et al., 2001), it was shown
that (nonlinear) spatial smoothing of the autocorrelation estimates allowed
for more flexible models, clearly illustrating that spatial regularisation of the
autocorrelation estimates can in principle justify the application of models
of orders higher than those indicated by an unregularised test procedure. We
have illustrated that using higher-order AR-models for correction improves the
exactness of the test, when some form of spatial regularisation of the auto-
correlation estimates is present. Simulations on Null fMRI data demonstrate
that in the absence of regularisation, the exactness of the test deteriorates
with increasing model order, due to the increased variance of the parame-
ter estimates. Spatial regularisation reduces this variance, in which case the
statistical test benefits from higher-order AR-models.
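As a concrete sketch, per-voxel order selection of the kind discussed above can be implemented by fitting AR-models of increasing order and testing the residuals for whiteness. The following is an illustrative implementation, not the exact procedure of Locascio et al. (1997) or Woolrich et al. (2001): the AR fit uses the Yule-Walker equations with biased autocovariance estimates, whiteness is assessed with the Ljung-Box statistic, and the maximum order (six), number of lags (ten) and significance level (0.05) are arbitrary choices.

```python
import numpy as np
from scipy.stats import chi2

def acf(x, nlags):
    """Sample autocorrelation at lags 1..nlags (biased normalisation)."""
    x = x - x.mean()
    n = len(x)
    denom = np.dot(x, x)
    return np.array([np.dot(x[:n - k], x[k:]) / denom for k in range(1, nlags + 1)])

def yule_walker(x, p):
    """Fit an AR(p) model via the Yule-Walker equations; return (coefficients, residuals)."""
    x = x - x.mean()
    n = len(x)
    # Biased autocovariance estimates at lags 0..p.
    r = np.array([np.dot(x[:n - k], x[k:]) for k in range(p + 1)]) / n
    R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])
    a = np.linalg.solve(R, r[1:])
    # Residual at time t: x[t] - sum_k a_k x[t-k], for t = p..n-1.
    resid = x[p:] - sum(a[k - 1] * x[p - k:n - k] for k in range(1, p + 1))
    return a, resid

def ljung_box_pvalue(resid, nlags, fitted_order):
    """Ljung-Box portmanteau test; H0: the residuals are white."""
    n = len(resid)
    rho = acf(resid, nlags)
    q = n * (n + 2) * np.sum(rho ** 2 / (n - np.arange(1, nlags + 1)))
    return chi2.sf(q, df=nlags - fitted_order)

def select_ar_order(x, pmax=6, nlags=10, alpha=0.05):
    """Smallest AR order whose residuals pass the whiteness test."""
    for p in range(1, pmax + 1):
        _, resid = yule_walker(x, p)
        if ljung_box_pvalue(resid, nlags, p) > alpha:
            return p
    return pmax
```

Applied voxel-wise, this yields an order map; as noted above, without spatial regularisation the higher orders it selects come at the cost of increased parameter variance.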
The proposed method has been described in terms of spatial smoothing using
an isotropic Gaussian kernel. It can easily be extended to other kernel types,
or even to nonlinear spatial smoothing (to avoid the smoothing over different
tissue types, see, e.g., Woolrich et al., 2001), as long as the spatial extent
of the smoothing kernel is parametrised by a single bandwidth parameter. If,
for example, a different bandwidth is used for each spatial dimension, the
Golden Section Search can no longer be applied, and more intricate minimisation schemes need
to be employed. The proposed method can also easily be adapted for use with
other noise models, both parametric (Locascio et al., 1997; Purdon et al., 2001)
and non-parametric (Woolrich et al., 2001; Wicker and Fonlupt, 2003), since
the vector of coefficients, c, is not restricted to AR-coefficients in particular,
and can be replaced by any vector characterising the noise process.
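The bandwidth-selection idea can be sketched as follows: smooth a noisy coefficient map with an isotropic Gaussian kernel and optimise the single bandwidth parameter by a golden-section line search. The cost function below is hypothetical and only serves the synthetic demonstration: it measures the mean-squared error against a known ground-truth map, which is unavailable for real data; the paper's actual objective, based on the exactness of the statistical test, is not reproduced here.

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)

# Synthetic, spatially smooth ground-truth AR(1) coefficient map in [0.2, 0.6],
# plus additive estimation noise (stand-in for noisy per-voxel estimates).
true_map = gaussian_filter(rng.standard_normal((64, 64)), sigma=8)
true_map = 0.2 + 0.4 * (true_map - true_map.min()) / np.ptp(true_map)
noisy_map = true_map + 0.15 * rng.standard_normal(true_map.shape)

def cost(width):
    """Hypothetical cost: MSE of the smoothed map against the (synthetic) truth."""
    smoothed = gaussian_filter(noisy_map, sigma=abs(width))
    return float(np.mean((smoothed - true_map) ** 2))

# Golden-section search over the single bandwidth parameter.
res = minimize_scalar(cost, bracket=(0.5, 3.0), method='golden')
best_width = abs(res.x)
```

Because the search varies only one scalar, any unimodal cost defined on the bandwidth can be plugged in, which is what makes the extension to other kernel types or noise models straightforward.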
5 Acknowledgements
The authors wish to thank the Brain Mapping Unit, University of Cambridge,
UK, for making their Null fMRI data sets publicly available. The activation
fMRI data (described in Worsley et al., 2002) were used with permission from
the authors.
The authors are supported by research grants received from the Belgian Fund
for Scientific Research – Flanders (G.0248.03 and G.0234.03), the Flemish Re-
gional Ministry of Education (Belgium) (GOA 2000/11), and the European
Commission, 5th framework programme (QLG3-CT-2000-30161 and IST-2001-
32114).
References
Biswal, B., Yetkin, F., Haughton, V., Hyde, J., 1995. Functional connectivity
in the motor cortex of resting human brain using echo-planar MRI. MRM
34, 537–541.
Bullmore, E., Brammer, M., Williams, S., Rabe-Hesketh, S., Janot, N., David,
A., Mellers, J., Howard, R., Sham, P., 1996. Statistical methods of estima-
tion and inference for functional MR image analysis. MRM 35, 261–277.
Bullmore, E., Long, C., Suckling, J., Fadili, J., Calvert, G., Zelaya, F., Car-
penter, T., Brammer, M., 2001. Colored noise and computational inference
in neurophysiological (fMRI) time series analysis: Resampling methods in
time and wavelet domains. Hum. Brain Mapp. 12, 61–78.
Friston, K., Holmes, A., Poline, J.-B., Grasby, P., Williams, S., Frackowiak,
R., Turner, R., 1995. Analysis of fMRI time series revisited. NeuroImage
2, 45–53.
Friston, K., Josephs, O., Zarahn, E., Holmes, A., Rouquette, S., Poline, J.-B.,
2000a. To smooth or not to smooth? Bias and efficiency in fMRI time-series
analysis. NeuroImage 12, 196–208.
Friston, K., Mechelli, A., Turner, R., Price, C., 2000b. Nonlinear responses
in fMRI: The balloon model, Volterra kernels, and other hemodynamics.
NeuroImage 12, 466–477.
Hastie, T., Tibshirani, R., Friedman, J., 2001. Elements of Statistical Learning:
Data Mining, Inference and Prediction. Springer-Verlag, New York.
Kiebel, S., Glaser, D., Friston, K., 2003. A heuristic for the degrees of freedom
of statistics based on multiple variance parameters. NeuroImage 20, 591–
600.
Locascio, J., Jennings, P., Moore, C., Corkin, S., 1997. Time series analysis in
the time domain and resampling methods for studies of functional magnetic
resonance brain imaging. Hum. Brain Mapp. 5, 168–193.
Marchini, J., Ripley, B., 2000. A new statistical approach to detecting signif-
icant activation in functional MRI. NeuroImage 12, 366–380.
Marchini, J., Smith, S., 2003. On bias in the estimation of autocorrelations
for fMRI voxel time-series analysis. NeuroImage 18, 83–90.
Press, W., Flannery, B., Teukolsky, S., Vetterling, W., 1992. Numerical Recipes
in C: The Art of Scientific Computing, 2nd Edition. Cambridge University
Press, New York, NY, USA.
Purdon, P., Solo, V., Weisskoff, R., Brown, E., 2001. Locally regularized spa-
tiotemporal modeling and model comparison for functional MRI. NeuroIm-
age 14, 912–923.
Purdon, P., Weisskoff, R., 1998. Effect of temporal autocorrelation due to
physiological noise and stimulus paradigm on voxel-level false-positive rates.
Hum. Brain Mapp. 6, 239–249.
Solo, V., Purdon, P., Weisskoff, R., Brown, E., 2001. A signal estimation ap-
proach to functional MRI. IEEE Trans. Med. Imaging 20 (1), 26–35.
Wicker, B., Fonlupt, P., 2003. Generalized least-squares method applied to
fMRI time series with empirically determined correlation matrix. NeuroIm-
age 18, 588–594.
Woolrich, M., Ripley, B., Brady, M., Smith, S., 2001. Temporal autocorrelation
in univariate linear modeling of FMRI data. NeuroImage 14, 1370–1386.
Worsley, K., Friston, K., 1995. Analysis of fMRI time series revisited – again.
NeuroImage 2, 173–181.
Worsley, K., Liao, C., Aston, J., Petre, V., Duncan, G., Morales, F., Evans,
A., 2002. A general statistical analysis for fMRI data. NeuroImage 15, 1–15.
Xiong, J., Gao, J.-H., Lancaster, J.L., Fox, P., 1996. Assessment and optimization of
functional MRI analyses. Hum. Brain Mapp. 4, 153–167.
Zarahn, E., Aguirre, G., D’Esposito, M., 1997. Empirical analyses of
BOLD fMRI statistics: I. Spatially unsmoothed data collected under null-
hypothesis conditions. NeuroImage 5, 179–197.