Optimal Spatial Regularisation of
Autocorrelation Estimates in fMRI Analysis
Temujin Gautama ∗ and Marc M. Van Hulle
Laboratorium voor Neuro- en Psychofysiologie
K.U.Leuven, Campus Gasthuisberg
Herestraat 49, bus 801
B-3000 Leuven, BELGIUM
Tel.: + 32 16 34 59 61 Fax: + 32 16 34 59 60
Abstract
In the General Linear Model (GLM) framework for the statistical analysis of
fMRI data, the problem of temporal autocorrelations in the residual signal (after
regression) has been frequently addressed in the open literature. There exist various
methods for correcting the ensuing bias in the statistical testing, among which the
prewhitening strategy, which uses a prewhitening matrix for rendering the residual
signal white (i.e., without temporal autocorrelations). This correction is
only exact when the autocorrelation structure of the noise-generating process is
accurately known; in practice, however, the estimates derived directly from the
fMRI data are often too noisy to be used for correction. Recently, Worsley and
co-workers proposed to spatially smooth
the noisy autocorrelation estimates, effectively reducing their variance and allow-
ing for a better correction. In this article, a systematic study into the effect of the
smoothing kernel width is performed and a method is introduced for choosing this
bandwidth in an ‘optimal’ manner. Several aspects of the prewhitening strategy are
investigated, namely the choice of the autocorrelation estimate (biased or unbiased),
the accuracy of the estimates, the degree of spatial regularisation and the order of
the autoregressive model used for characterising the noise. The proposed method is
extensively evaluated on both synthetic and real fMRI data.

Preprint submitted to Elsevier Science, 4 November 2004
Key words:
spatial regularisation, autocorrelation, prewhitening
∗ Corresponding author. Email addresses: [email protected] (Temujin Gautama),
[email protected] (Marc M. Van Hulle).
1 Introduction
In the statistical analysis of fMRI data, the presence of intrinsic temporal
autocorrelations in the noise-generating process is a well-studied topic. The
time series of a given voxel is usually modelled as consisting of a possible
haemodynamic response to the stimulus and coloured noise (containing tem-
poral autocorrelations). Although the origin of this noise is still an open issue
(see, e.g., Biswal et al., 1995; Zarahn et al., 1997; Woolrich et al., 2001), there
is general agreement that it needs to be taken into account in the analysis
methods used for activation detection.
The conventional statistical analysis techniques, such as SPM (Statistical
Parametric Mapping, Wellcome Department of Cognitive Neurology, London),
are based upon the General Linear Model (GLM), which models an fMRI time
series as a linear combination of paradigm-related responses, drift terms and
an error term. The GLM analysis is only exact when the autocorrelation func-
tion of the (real) noise process, which generates the error term, is taken into
account. In practice, however, the underlying process is unknown and alter-
native approaches have been devised, such as “precolouring”, which imposes
a certain autocorrelation function on the noise term by temporal smoothing
(Friston et al., 1995; Worsley et al., 1995), and “prewhitening”, which trans-
forms the data such that the error term becomes white noise (Bullmore et al.,
1996). Several studies have evaluated these (and other) approaches in combi-
nation with different statistical tests, both on synthetic and real-world fMRI
data (Purdon and Weisskoff, 1998; Friston et al., 2000a; Woolrich et al., 2001;
Wicker and Fonlupt, 2003). The prewhitening strategy yields the best (mini-
mum variance) linear unbiased estimator, but only if the true autocorrelation
structure is known (Friston et al., 2000a; Bullmore et al., 2001; Woolrich et al.,
2001), or at least if it can be accurately estimated. However, any mismatch
between the true and the estimated autocorrelations will lead to a bias in the
estimation of the parameter variance (Friston et al., 2000a), which is used for
the statistical inference of effects. Therefore, an accurate model of the noise,
from which the prewhitening matrix is computed, is essential to the efficacy of
the prewhitening strategy, and various noise models have been proposed (Bull-
more et al., 1996; Locascio et al., 1997; Purdon and Weisskoff, 1998; Zarahn
et al., 1997). Additionally, the underlying noise process needs to be estimated
from the residual signal after regression (the error term), which introduces a
bias in the autocorrelation estimates (for an overview, see Marchini and Smith,
2003).
Another important aspect of the autocorrelation is the spatial variability,
which cannot be solely attributed to a difference between tissue types (Bull-
more et al., 1996; Purdon et al., 2001; Solo et al., 2001; Worsley et al., 2002).
Thus, ideally, the autocorrelations should be accurately estimated on a voxel-
wise basis, as has been proposed by Bullmore et al. (1996) and Locascio et al.
(1997). However, the variance of the autocorrelation estimate is fairly high
for traditional fMRI time series, which introduces a bias in traditional GLM-
based statistical testing. One possible approach (that adopted in SPM’99) is
to compute the average autocorrelation across the entire brain, yielding a very
robust estimate. However, in that case, the autocorrelation is systematically
over- and underestimated in different regions, since the spatial variability of
the autocorrelation is not taken into account. Another approach is that of
spatial regularisation of the autocorrelation estimates, which can reduce the
variance of the autocorrelation estimate. This has been suggested in (Purdon
et al., 2001; Solo et al., 2001), where a local likelihood function is spatially
smoothed before parameter estimation, and in (Worsley et al., 2002), where
the estimated parameters themselves are spatially smoothed. In the first ap-
proach, there is an optimality criterion for choosing the degree of spatial reg-
ularisation. The second approach is much less computationally intensive, but
uses a user-defined degree of smoothing.
In this article, the methodology proposed by Worsley et al. (2002) is followed,
thus, prewhitening using a filter derived from an autoregressive (AR) model
of the residual signal, and spatial smoothing of the autocorrelation estimates.
First, the importance of the accuracy of the autocorrelation estimate is exam-
ined with respect to the validity of the ensuing statistical test, i.e., the efficacy
of the prewhitening correction. A novel method is introduced for determining
the optimal degree of spatial smoothing of the autocorrelation estimate, and
the method is evaluated both on synthetic and real fMRI data. The results are
compared to those obtained with and without bias-reduction of the autocor-
relation estimates, obtained without, with fixed and with optimal smoothing,
and obtained using a global average of the autocorrelation estimates.
2 Methods
First, the general prewhitening framework and the evaluation measures are
shortly described. Second, the effect of the correction is empirically verified,
and the importance of the accuracy of the autocorrelation (AC) estimate used
for prewhitening is illustrated. Third, the spatial regularisation and the pro-
posed method for determining the optimal degree of regularisation are ex-
plained.
2.1 Ordinary Least-Squares in the Presence of Autocorrelation
The General Linear Model framework with autocorrelated errors is well-known
and will only be described shortly in this section. The fMRI response at voxel
i (consisting of n time samples) is modelled as a linear sum of m covariates:
Yi = Xβ + ei, (1)
where X[n × m] is the design matrix containing q expected responses to a
certain stimulus (0/1 block pulse convolved with a haemodynamic response
function) and a number of (polynomial) drift terms, β is a vector of regression
coefficients, and ei is a Gaussian noise source following N(0, σ²V), where V is
the autocorrelation matrix of the noise process (in the absence of autocorre-
lations, V = I). There exist several methods for drawing inferences regarding
the regression coefficients in the presence of autocorrelation (for an overview,
see Woolrich et al., 2001), one of which is that of prewhitening (see Bullmore
et al., 1996; Worsley et al., 2002; Marchini and Smith, 2003). The method first
solves the model using Ordinary Least-Squares (OLS), yielding unbiased, but
not fully efficient estimates of the regression coefficients. Second, the autocor-
relation structure is estimated from the residual signal after OLS-regression,
ei, from which an n×n whitening matrix, A, is generated. Both Yi and X are
multiplied by this matrix A, after which the OLS is used for recomputing the
regression coefficients and the subsequent statistical testing.
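The four steps above can be sketched as follows for an AR(1) noise model. This is a minimal NumPy sketch; the function `prewhiten_glm` and its variable names are illustrative and are not part of any fMRI software package.

```python
import numpy as np

def prewhiten_glm(y, X):
    """Sketch of the prewhitening procedure for one voxel time series y
    (n samples) and design matrix X (n x m), assuming AR(1) noise."""
    n = len(y)
    # Step 1: ordinary least-squares fit.
    beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta_ols                       # residual signal

    # Step 2: lag-one autocorrelation of the residuals;
    # the 1/n normalisations of numerator and denominator cancel.
    a1 = np.sum(e[1:] * e[:-1]) / np.sum(e * e)

    # Step 3: AR(1) whitening matrix A: each row subtracts a1 times the
    # previous sample; the first sample is rescaled so that all components
    # of A e have equal (innovation) variance.
    A = np.eye(n)
    A[0, 0] = np.sqrt(1.0 - a1 ** 2)
    A[np.arange(1, n), np.arange(n - 1)] = -a1

    # Step 4: OLS on the whitened data.
    beta, *_ = np.linalg.lstsq(A @ X, A @ y, rcond=None)
    return beta, a1
```

For higher-order AR(p) models the whitening matrix is no longer bidiagonal, but the overall structure of the procedure is unchanged.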
Matrix A is often estimated using a parametric approach by fitting a noise
model to the residual signal, e.g., autoregressive (AR) models or autoregressive
models with moving average (ARMA). The matrix A can then be computed
from the autocorrelation matrix V of the noise model as A = V −1/2. When an
AR-model is assumed, the whitening matrix A can be found directly from the
autocorrelations in the residual signal (as shown in Appendix A.3 of Worsley
et al., 2002). However, the autocorrelation from the residual signal after re-
gression is a biased estimate of the autocorrelation of the actual disturbance,
as has been recently addressed in the fMRI-context by Worsley et al. (2002)
and Marchini and Smith (2003). The complete prewhitening procedure, using
an autoregressive model of order p for the noise, is the following:
OLS The regression coefficients β are estimated using OLS (Eq. 1), yielding
βOLS = (X ′X)−1X ′Yi, (2)
where ()′ denotes the matrix transpose, and ()−1 the matrix inverse.
Estimating the Whitening Matrix Ai From the residual signal, ei =
Yi − XβOLS, the autocorrelations are estimated for lags l = 1 . . . p, where p
is the order of the autoregressive model that is used for modelling the noise.
Two estimates are considered in this study, the standard (biased) one:
abias,i,l = (1/n) ∑_{j=l+1}^{n} ei(j) ei(j − l),   (3)
and an unbiased one (Worsley et al., 2002):

aunb,i,l = vl / v0,   (4)

where v = M−1 abias,i, with

v = (v0, . . . , vp)′,   abias,i = (abias,i,0, . . . , abias,i,p)′,

and M the (p + 1) × (p + 1) matrix with elements

mlj = trace(R Dl)   for j = 0,
mlj = trace(R Dl R (Dj + D′j))   for 1 ≤ j ≤ p,

where R = I − X(X′X)−1X′ is the residual-forming matrix of the regression and
Dl is a matrix of zeros with ones on the l-th upper off-diagonal. Note
that there is hardly any additional computational cost associated with this
bias correction. The whitening matrix Ai can then be computed from these
autocorrelation estimates, yielding Abias,i and Aunb,i (see Appendix A.3 in
Worsley et al., 2002).
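Following the formulas above (with R = I − X(X′X)−1X′ the residual-forming matrix of the regression), the bias correction can be sketched as follows. This is an illustrative NumPy sketch, not the authors' implementation, and the function name is hypothetical.

```python
import numpy as np

def unbiased_autocorr(e, X, p=1):
    """Sketch of the bias correction of the residual autocorrelations,
    using the residual-forming matrix R of the regression on X."""
    n = len(e)
    # Biased autocovariances for lags 0..p (cf. Eq. 3).
    a_bias = np.array([np.sum(e[l:] * e[:n - l]) / n for l in range(p + 1)])

    R = np.eye(n) - X @ np.linalg.pinv(X)      # residual-forming matrix
    D = [np.diag(np.ones(n - l), l) for l in range(p + 1)]

    # m_lj = trace(R D_l) for j = 0 and
    # m_lj = trace(R D_l R (D_j + D_j')) for 1 <= j <= p.
    M = np.empty((p + 1, p + 1))
    for l in range(p + 1):
        M[l, 0] = np.trace(R @ D[l])
        for j in range(1, p + 1):
            M[l, j] = np.trace(R @ D[l] @ R @ (D[j] + D[j].T))

    v = np.linalg.solve(M, a_bias)             # v = M^{-1} a_bias
    return v[1:] / v[0]                        # a_unb,l = v_l / v_0
```

As noted in the text, the additional computational cost over the biased estimate is small: the matrices R, Dl and M depend only on the design and can be precomputed once for all voxels.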
OLS of the Whitened Data The OLS-procedure is used for solving:
AiYi = AiXβ + ri,   (5)

where ri ∼ N(0, σ²I), yielding the regression coefficients:

β = (X′A′iAiX)−1(AiX)′(AiYi).   (6)
Inference In this study, the F -score is computed to test for any of the
q paradigm-related effects, which is specified by a contrast matrix c of size
[q ×m].
Fi = (ESSi/q) / (RSSi/(n − m)) ∼ F(q, n − m),   (7)

with

RSSi = r′i ri,
ESSi = (cβ)′ ( c ((AiX)′(AiX))−1 c′ )−1 (cβ),

where RSSi is the residual sum-of-squares and ESSi is the explained sum-of-squares.
Using either the biased (Ai = Abias,i) or the unbiased (Ai = Aunb,i)
whitening matrices, the different F -scores are obtained, respectively Fbias,i
and Funb,i. When no correction for autocorrelation is performed, the F -score
is denoted by FOLS,i.
2.2 Evaluation Measures
There exists no ‘standard’ way for quantifying the performance of different
correction methods, as it depends on several factors. Xiong et al. (1996) compared
several statistical tests on the basis of their sensitivity (rate of true
positives), specificity (rate of true negatives) and normality (of the test
statistics). Zarahn et al. (1997) and Purdon and Weisskoff (1998) evaluated the
rate of false positives as a performance measure. In (Friston et al., 2000b),
performance is described in terms of the validity of the test (a test is valid if
the false positive rate is less than the nominal size α), its efficiency (a param-
eter estimation method is more efficient when the variability of the estimated
parameters is smaller), and its robustness (a test that remains valid when the
assumptions are violated to a certain degree is called robust). Woolrich et al.
(2001) and Marchini and Smith (2003) proposed a qualitative comparison be-
tween an empirical distribution of test statistics and the expected theoretical
one, by visualising them in a scatter diagram (PP -plot). We have opted for a
performance evaluation in terms of exactness (the degree to which the empir-
ical distribution of test statistics corresponds to the theoretical one), the rate
of false positives and the rate of true positives.
In all simulations on synthetic data, a dummy paradigm is used, consisting
of two alternating conditions with a length of 10 time samples. The design
matrix, X, consists of one constant term and two ‘fMRI responses’, which
are generated by convolving a square wave (sampling period of 3 seconds,
alternating 10 samples on, 10 samples off) with a standard haemodynamic
response function with a repetition time of TR = 3 seconds. The exactness of
the test is measured by the mean-square discrepancy between the empirical
and the theoretical (cumulative) F -distribution:
MSEF = (1/N) ∑_{j=1}^{N} ( Fj − fcdf^{−1}( j/(N + 1), q, n − m ) )²,   (8)

where N is the number of pixels/voxels, Fj is the j-th element of the sorted series
of (empirical) F-scores, and fcdf^{−1}(·, df1, df2) is the inverse of the cumulative
F-distribution function (the quantile function) with df1 and df2 degrees of
freedom. The rate of false positives rFP
(erroneous rejections of the null hypothesis at a significance level α), is com-
puted as the number of false rejections divided by the total number of inac-
tive pixels/voxels. For small deviations from the theoretical distribution (small
MSEF ), rFP is expected to approximate the nominal size of the test (rFP ≈ α).
In a number of simulations, time series are synthetically ‘activated’ by linearly
adding the first response in the design matrix X with a scaling factor γ1. When
this is the case, the corresponding F -score is not included in the evaluation
of the empirical null distribution (Eq. 8), and in these cases, the rate of true
positives, rTP, is computed as the ratio of the number of F -scores correspond-
ing to true positives exceeding the statistical threshold and the total number
of true positives.
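The three evaluation measures can be sketched as follows. This is a minimal Python/SciPy sketch; `scipy.stats.f.ppf` provides the theoretical F-quantiles, and the function name is illustrative.

```python
import numpy as np
from scipy import stats

def evaluation_measures(F_null, F_act, q, n, m, alpha=0.01):
    """Sketch of the exactness measure MSE_F, the false positive rate r_FP
    and the true positive rate r_TP. F_null holds the F-scores of inactive
    pixels/voxels, F_act those of synthetically activated ones."""
    N = len(F_null)
    # Compare the sorted empirical F-scores with the theoretical quantiles
    # of the F(q, n - m) distribution at j / (N + 1).
    theo = stats.f.ppf(np.arange(1, N + 1) / (N + 1), q, n - m)
    mse_f = np.mean((np.sort(F_null) - theo) ** 2)

    thresh = stats.f.ppf(1.0 - alpha, q, n - m)  # threshold at level alpha
    r_fp = np.mean(F_null > thresh)              # false positive rate
    r_tp = np.mean(F_act > thresh)               # true positive rate
    return mse_f, r_fp, r_tp
```

For an exact test, mse_f should be small and r_fp should approximate the nominal size alpha.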
2.3 Effect of Accuracy of AC Correction
To better interpret the results reported below, it is important to investigate
to what extent they are robust to a deviation from the ‘optimal’ correction.
To this end, a set of N = 10, 000 time series is generated from an AR(1)-model
(thus, without activation and complying with the null hypothesis),
using:
yk = a∗1 yk−1 + νk, (9)
where ν is a white Gaussian noise source. The AR(1)-model used for the
generation has a∗1 = 0.4 (solid curves in Fig. 1) and a∗1 = 0.2 (dashed curves),
and the AR(1)-model used for the GLM-correction is varied from a1 = 0 to
a1 = 0.8 in steps of 0.05. Testing is performed at α = 0.01 and the results are
evaluated using the MSEF and the rFP measures. The results are shown in
Fig. 1A and 1B, respectively. In both data sets, 1,000 randomly selected time
series are ‘activated’ during the first condition with γ1 = 1.0, thus allowing
for the evaluation of rTP. The latter rate can be used as an empirical measure
of the power of the test.
It can be observed clearly from Fig. 1A that the MSEF is minimal for the
theoretical value a∗1, and that the rate of false positives complies best with the
nominal size of the test at a∗1 (rFP = 0.0092 and rFP = 0.0093, respectively for
a∗1 = 0.4 and a∗1 = 0.2). The variance of the regression coefficients is
underestimated when the autocorrelation is underestimated (yielding an overly
liberal test), and, conversely, the variance is overestimated when the
autocorrelation is overestimated. As a result, the rate of false positives
decreases consistently
over the interval under investigation (Fig. 1B), and the rate of true positives
Fig. 1. Effect of the accuracy of the AC correction, quantified using the MSEF be-
tween the empirical and theoretical F -distribution (A) and the rate of false positives
(B), and the rate of true positives (C), for a data set generated from an AR(1)-model
with a∗1 = 0.4 (solid curves) and a∗1 = 0.2 (dashed curves).
(Fig. 1C) decreases consistently for increasing a1. Thus, an underestimation
of the autocorrelation yields an increased rate of false positives (exceeding the
nominal size), but also increased power, whereas an overestimation yields a
lower rate of false positives and lower power.
2.4 Proposed Spatial Regularisation of the AC estimate
The variance of the autocorrelation estimates scales inversely with
the number of time samples, n. The previous simulation study has demon-
strated that an accurate (low-variance) estimate is required in order to im-
prove the exactness of the statistical test. One way to reduce the variability
of the estimate around the true value is spatial regularisation, as described
in different forms by Purdon et al. (2001) and by Worsley et al. (2002). We
adhere to the latter approach, which spatially smooths the autocorrelation
estimates using a Gaussian kernel, albeit with a user-defined FWHM (default
value of 15 mm, which corresponds to a standard deviation of 6.36 mm). In
this section, a method for determining the ‘optimal’ smoothing bandwidth is
introduced, which corresponds to the ‘optimal’ degree of spatial regularisation
of the autocorrelation estimates.
It will be demonstrated in the Results section that the degree of smoothing of
the autocorrelation estimate has an effect on the distribution of the F -score
(Eq. 7). The deviation of the empirical F -distribution from the theoretical one
would be a good evaluation measure for choosing a regularisation bandwidth,
were it not for the fact that it can only be used when the theoretical distribu-
tion is known. In fMRI, this is only the case when there is no paradigm effect
present in the data, and the theoretical null distribution of the test statistic
can be used as a reference. We propose to choose the regularisation bandwidth
by minimising an observable criterion, which does not depend on the theoret-
ical distribution of the test statistic under study, but on the predictability of
the (estimated) spatial autocorrelation pattern.
The proposed analysis technique is based upon the so-called Nadaraya-Watson
kernel-weighted average for local regression. Consider ci, the vector consisting
of the p (i.e., the order of the AR-model used for correction) autocorrelation
estimates (for lags 1 . . . p) in pixel/voxel i with spatial coordinates xi. This
vector is ‘predicted’ from the vectors of the pixels/voxels within a given spatial
neighbourhood, the spatial extent of which is controlled by the ‘bandwidth’
h:
ĉh,i = ( ∑_{j=1}^{N} Kh(‖xj − xi‖) cj ) / ( ∑_{j=1}^{N} Kh(‖xj − xi‖) ),   (10)
where N is the number of pixels/voxels and Kh is a Gaussian kernel with
standard deviation h. The prediction error can be computed as:
ε²(h) = (1/N) ∑_{i=1}^{N} (ci − ĉh,i)²,   (11)
However, this prediction error approaches zero as h approaches zero, since in
that case the spatial neighbourhood consists of only the centre pixel/voxel
i, due to which ĉh,i = ci, which is the trivial solution. Therefore, we adopt a
leave-one-out cross-validation strategy, and modify Eq. (10) by excluding the
vector of the centre pixel/voxel from the weighted average:
ĉh,i = ( ∑_{j≠i} Kh(‖xj − xi‖) cj ) / ( ∑_{j≠i} Kh(‖xj − xi‖) ).   (12)
Similar schemes have been considered for the optimisation of smoothing ker-
nels in local modelling and density estimation (for an overview, see Hastie
et al., 2001). The ensuing prediction error, ε2(h), can be minimised with re-
spect to h by computing it for a number of different bandwidth values and
determining the minimum. The corresponding bandwidth, hopt, yields the ‘op-
timal’ spatial extent over which the autocorrelation vector can be predicted
using a locally weighted average. This optimal bandwidth is influenced by the
variability of the autocorrelation estimate around the true value, as well as
the spatial variability of the true autocorrelation.
As an illustration, consider a two-dimensional sinusoidal grating (50× 50 pix-
els), to which white Gaussian noise is added with a standard deviation of
γn = 0.5. The proposed cross-validation measure is evaluated on this spatial
pattern for bandwidths ranging from 0.1 to 3 in steps of 0.01. The
(h, ε2) curve is very smooth (see Fig. 2, solid line) and has a clear minimum
at hopt = 0.78. Since the objective is the optimal recovery of the noiseless
Fig. 2. Illustration of the method for determining the optimal bandwidth in the
case of a noisy 2D sinusoidal grating: the cross-validation prediction error ε2 (solid
line), the MSE between the smoothed noisy grating and the noiseless one (dashed
line), and the iterative minimisation steps performed by the Golden Section Search
method (open circles on top curve).
spatial pattern, the mean-square-error between the noiseless pattern and the
noisy pattern smoothed with a kernel of varying width h is also evaluated
(Fig. 2, dashed curve). This curve is also very smooth and shows a minimum
at h = 0.85. Therefore, the bandwidth which optimises the cross-validation
criterion is very close to that which minimises the MSE (‘theoretically’ op-
timal). The corresponding MSE-values are 0.0439 (h = 0.85) and 0.0450
(h = hopt = 0.78).
We will loosely use the term “optimal regularisation” to refer to the spa-
tial smoothing using a Gaussian kernel with a bandwidth obtained using the
cross-validation procedure on the spatial pattern of (p-dimensional) unbiased
autocorrelation estimates (in all situations tested, there was no noticeable
difference in optimal bandwidth for the spatial pattern of biased estimates).
In order to make the method computationally efficient, we further exploit
the smoothness of the (h, ε2)-curve and the fact that it seems to have only
a single minimum within a reasonable range (we do not claim this to be the
general case, but in all cases tested, there was only a single minimum). If
there is only a single minimum and it can be ‘bracketed’ by three h-values 1,
the Golden Section Search algorithm can be used for obtaining the optimal
bandwidth in a relatively small number of evaluations of ε2 (Press et al.,
1992). The search algorithm iteratively brackets the minimum of a function
in smaller intervals (assuming the minimum exists and is unique within the
initial bracket). Convergence is monitored by the fractional precision of the
estimate (for details, see Press et al., 1992), which should be smaller than a
given tolerance parameter, which in all simulations has been set to tol = 0.01.
In the example shown in Fig. 2 (open circles in top curve), the algorithm
converges after 13 iterations.
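A minimal sketch of such a bracketing search, assuming a single minimum within the initial interval, is given below. This is an illustrative implementation after Press et al. (1992), not the authors' code.

```python
import math

def golden_section_search(f, a, c, tol=1e-3):
    """Golden Section Search for a function f with a single minimum
    bracketed in (a, c); the bracket is narrowed by the golden ratio
    until it is smaller than tol."""
    invphi = (math.sqrt(5.0) - 1.0) / 2.0     # 1/phi ~ 0.618
    x1 = c - invphi * (c - a)                 # two interior evaluation points
    x2 = a + invphi * (c - a)
    f1, f2 = f(x1), f(x2)
    while c - a > tol:
        if f1 < f2:                           # minimum lies in (a, x2)
            c, x2, f2 = x2, x1, f1
            x1 = c - invphi * (c - a)
            f1 = f(x1)
        else:                                 # minimum lies in (x1, c)
            a, x1, f1 = x1, x2, f2
            x2 = a + invphi * (c - a)
            f2 = f(x2)
    return 0.5 * (a + c)
```

Each iteration requires only one new evaluation of the (here, expensive) cross-validation error, since one interior point is reused; in the article's setting, convergence is instead monitored via the fractional precision of the bandwidth estimate.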
3 Results
Several autocorrelation estimates are compared in the following studies. Where
possible, the theoretical correction (“Theo”) is used (in the case of a known au-
tocorrelation structure), and as a reference, the uncorrected results (“OLS”)
are also included. Both biased and unbiased autocorrelation estimates are
considered in different variants: unregularised (Bias and Unb), optimally reg-
ularised using the proposed approach (BiasR and UnbR), and ‘globally’ regu-
larised (BiasG and UnbG), i.e., using the average autocorrelations computed
over all pixels. First, the performance of Bias and Unb are evaluated, and
second, a detailed comparative study is performed on synthetic data sets,
1 The minimum of a function is bracketed by a triplet of points a < b < c, such
that f(b) is less than both f(a) and f(c), in which case the function has a minimum
in the interval (a, c).
examining the effect of biased/unbiased estimates and spatial regularisation.
Finally, real fMRI data sets are considered, both null and activation fMRI
data sets.
3.1 Fixed Autocorrelation
In order to compare the performance of the correction methods using the bi-
ased and unbiased estimates, without performing any spatial regularisation,
the following data sets are considered. Time series are generated from a first-
order autoregressive process driven by white Gaussian noise (Eq. 9). The cor-
rections are performed using an AR(1)-model for the noise, and the F -tests
are performed at a significance level of α = 0.01.
Sets of N = 10, 000 AR(1)-signals (n = 100 time samples) are generated for
values of a∗1 ranging from a∗1 = 0 to a∗1 = 0.5 in steps of 0.01. In each set, 1,000
time series are synthetically activated with γ1 = 1. The results are shown in
Fig. 3 (thick black curves for the theoretical, red curves for the biased and blue
for the unbiased corrections). For reference, the uncorrected OLS results have
been included (thin black curves), showing a strong increase of the MSEF and
the rate of false positives for increasing degrees of autocorrelation. The latter
even reaches rFP = 0.12 for a∗1 = 0.5, more than ten times the nominal
size α (dotted line in Fig. 3B). The empirical power of the uncorrected OLS,
however, is higher than that of the corrected versions. Figure 3A shows that
the empirical null distribution of the unbiased correction method (blue curve)
better matches the theoretical one than the biased (red curve), which can also
be concluded from the rate of false positives (Fig. 3B), which is closer to 0.01
for the unbiased correction. The power of the unbiased correction method is
Fig. 3. Results of the fixed autocorrelation simulations: MSEF (A), rFP (B) and rTP
(C). Conventions are the following: Bias (red), Unb (blue), Theo (black thick) and
OLS (black thin). The dotted line in panel B denotes the nominal size α = 0.01.
slightly lower than that of the biased correction method (Fig. 3C). The power
for the theoretical correction is similar to the other methods, but the rate of
false positives is closer to the nominal size than that of the other correction
methods.
3.2 Spatially Variable Autocorrelation
Next, the improvement due to spatial regularisation of the (biased and unbi-
ased) autocorrelation estimates is illustrated. Spatially smooth autocorrelation
patterns are generated in the following way. A spatial noise pattern of 40× 50
pixels is generated containing normally distributed noise convolved with a
rectangular mask of 5 pixels with value 0.04 (a rectangular mask is chosen so
that its functional form differs from the Gaussian kernel used for regularisa-
tion). The values are shifted and scaled such that the extremal values are zero
and amax = 0.3, yielding a spatial pattern A(x), an example of which is shown
in Fig. 4A. For each pixel i at position xi, an AR(1)-signal of n = 100 time
samples is generated with an autocorrelation at lag one of A(xi):
yi,k = A(xi) yi,k−1 + νk,
where ν is a white Gaussian noise source. A central circular ‘activation’ region,
consisting of Nact = 500 pixels is defined, where the pixels are synthetically ac-
tivated with a factor γ1 (see Methods Section), and where the autocorrelation
values A(x) are optionally increased by aadd (see, e.g., Fig. 4C). The MSEF
is computed on the basis of the pixels outside the activation region (1,500
pixels). In addition, the mean-square discrepancy between the theoretical au-
tocorrelation pattern A(x) and the estimated one is evaluated (MSEAC).
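The data generation described above can be sketched as follows. This is an illustrative NumPy/SciPy sketch in which `scipy.ndimage.uniform_filter` plays the role of the 5-pixel rectangular smoothing mask; the function name and defaults are hypothetical.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def make_spatial_ar1_data(shape=(40, 50), a_max=0.3, n=100, seed=0):
    """Sketch of the synthetic data generation: a smooth spatial
    autocorrelation pattern A(x) (normal noise smoothed with a rectangular
    mask, rescaled to [0, a_max]) and, per pixel, an AR(1) time series
    whose lag-one autocorrelation is A(x)."""
    rng = np.random.default_rng(seed)
    A = uniform_filter(rng.standard_normal(shape), size=5)  # rectangular mask
    A = a_max * (A - A.min()) / (A.max() - A.min())         # rescale to [0, a_max]

    Y = np.zeros(shape + (n,))
    nu = rng.standard_normal(shape + (n,))                  # white noise source
    for k in range(1, n):
        Y[..., k] = A * Y[..., k - 1] + nu[..., k]
    return A, Y
```

The synthetic activation and the optional increase of A(x) by aadd inside the circular region can then be applied on top of these time series.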
Three types of situations are considered, examples of which are shown in
Fig. 4, visualising A(x) in a pseudocolour plot where the regions contain-
ing active pixels are delineated by a white dashed circle. The first situation
(“Batch1”) considers spatial autocorrelation patterns A(x) in the presence of
activation with no additional autocorrelation in the activation region (γ1 = 1
and aadd = 0). The second (“Batch 2”) introduces additional autocorrelations
in the activation region (γ1 = 1 and aadd = 0.3), similar to the increased
autocorrelations observed in grey matter compared to white matter in fMRI
studies (see, e.g. Woolrich et al., 2001). The third situation (“Batch 3”) is
designed for examining the performance of the correction methods when there
is no activation present (hence, no dashed line in Fig. 4), but there is still a
circular region with increased autocorrelation (γ1 = 0 and aadd = 0.3).
For each situation, 1,000 autocorrelation patterns are generated, and the dif-
ferent autocorrelation estimates are used for the correction. The results of
the F -tests (performed at the level of α = 0.05 with the dummy paradigm)
are summarised in Tables 1 and 2, showing the deviations from the theoretical
autocorrelation pattern at unit lag (MSEAC) and from the theoretical null
distribution (MSEF), as well as the rates of false (rFP) and true positives (rTP).

Fig. 4. Pseudocolour plots of the synthetically generated spatial autocorrelation
patterns, A(x), for Batch1 (γ1 = 1, aadd = 0), Batch2 (γ1 = 1, aadd = 0.3) and
Batch3 (γ1 = 0, aadd = 0.3). The region demarcated by a white, dashed circle
denotes the ‘active’ region.

Note that since the autocorrelation patterns for Batch2
and Batch3 are identical by design (γ1 is the only difference between these two
batches), the MSEAC results in Table 1 are also identical, as are the rates of
false positives outside the activation region (columns 3 and 5 in Table 2). For
Batch3, there are two rates of false positives (columns 5 and 6 in Table 2),
namely that outside (rFP) and within (raFP) the activation region (the differ-
ence being the increased autocorrelation in the activation region). All pairs of
samples (each containing 1,000 data points) are tested for a difference in mean
using a two-sample t-test at the significance level of 0.05. For each batch, the
pairs of results using correction methods for which no statistical differences
are found, are indicated by pairs of superscripts (∗ and +) in Tables 1 and 2.
The standard deviations have not been included in Tables 1 and 2, but are on
the order of 0.0006 for the MSEAC, 0.01 for the MSEF , 0.006 for the rFP and
0.01 for the rTP.
It can be observed from Table 1 that the deviation from the theoretical au-
tocorrelation at unit lag (MSEAC) is smaller for the unbiased correction ap-
proaches than for the biased ones (comparing the unregularised, regularised
and global estimates in a pairwise fashion) as expected, and that the MSEACs
for the unregularised approaches are considerably larger than for the regu-
larised and global estimates. The same holds for the MSEF measures, except
for the global estimates in Batch2, where the MSEF for BiasG is smaller than
that for UnbG. Finally, the MSEF for the uncorrected F -test (OLS) is consid-
erably higher, indicating a clear effect of autocorrelation on the F -distribution
under the null hypothesis. The UnbR performs best with respect to both the
MSEAC and the MSEF in the three situations considered (shared with the
UnbG in Batch1 for the MSEF ).
The rate of true positives (columns 2 and 4 in Table 2) is an indication of the
power of the test. The power of the uncorrected OLS is highest and, perhaps
surprisingly, the theoretical correction (Theo) yields very low power
(rTP of 0.980 and 0.814 for Batch1 and Batch2, respectively). To explain these
two results, the rate of false positives, which should approximate the nominal
size of the test (α = 0.05), needs to be examined. The OLS displays a high
power (rTP = 0.991 and rTP = 0.944) at the expense of an increased rate of
false positives (rFP = 0.088). This indicates that the OLS test underestimates
the true variance of the regression coefficients: the empirical distribution is
wider than the distribution against which the test is performed. Conversely,
for an exact test (Theo), in which the empirical distribution matches
that against which testing is performed, the rate of false positives is lower
(0.050, very close to the nominal size), at the expense of lower power.
Table 1
Mean values of the mean-square discrepancy measures between the theoretical and
estimated autocorrelation patterns at unit lag (MSEAC) and between the empirical
and theoretical null distributions (MSEF). Within columns, pairs of samples whose
means are not statistically different (two-sample t-test at a significance level
of 0.05) are indicated by superscripts (∗ and +).

         Batch1            Batch2            Batch3
         MSEAC   MSEF      MSEAC   MSEF      MSEAC   MSEF
OLS      —       0.144     —       0.144     —       0.432
Theo     —       0.007     —       0.007+    —       0.005
Bias     0.011   0.030     0.011   0.030     0.011   0.032+
BiasR    0.002   0.015∗    0.003   0.013∗    0.003   0.016∗
BiasG    0.003   0.016∗    0.020   0.013∗    0.020   0.050
Unb      0.010   0.017∗    0.010   0.017     0.010   0.016∗
UnbR     0.001   0.008+    0.002   0.007+    0.002   0.007
UnbG     0.002   0.008+    0.019   0.025     0.019   0.030+

Due to the variance of the unregularised autocorrelation estimates, the rates
of true positives for these corrections are fairly low. Reducing this variance
(regularisation) increases the rate of true positives (Bias vs. BiasR and BiasG,
and Unb vs. UnbR and UnbG). When comparing the rates of true positives
to those of Theo, the possible presence of bias should be taken into account.
Indeed, biased estimates generally yield an underestimate of the actual auto-
correlation, due to which the rate of true positives increases (Bias vs. Theo).
Whether the rate of true positives for Unb is higher than for Theo is case
dependent. This can be explained by reconsidering Fig. 1C. Suppose the true
autocorrelation is identical for all pixels within a certain region, and that the
distribution of the estimated values is Gaussian and centred around the true
value. The rate of true positives would be the same as that for the theoretical
correction if the curves in Fig. 1C were linear (or at least approximately linear
in the region spanned by the autocorrelation estimates), in which case an over-
or underestimation of the true value would yield a balanced decrease or increase
of the rate of true positives. In most cases, however, there is an asym-
metrical (nonlinear) effect of the over- or underestimation on the rate of true
positives, due to which this rate can be higher than, lower than or equal to
that of the theoretical correction, depending on the data.
To test whether the rFP-measures conform to the expected nominal size (α =
0.05), the rFP-samples are tested for a mean of 0.05 (one-sample t-test at α =
0.05); the null is rejected for all approaches except Theo (i.e., Theo
is the only correction method that yields a correct rFP). The unregularised
correction methods (Bias and Unb) show increased rates of false positives with
respect to their regularised and global counterparts, due to the variability of
the autocorrelation estimates. The BiasR estimates yield rates of false positives
higher than the nominal size due to the bias, but this is not the case for the
BiasG in Batch2 and Batch3 (0.042). This can be attributed to the additional
autocorrelations in the activation region, due to which the global average of
the autocorrelation (BiasG) is increased with respect to that in Batch1. An
overestimation of the autocorrelation induces a decrease in the rate of false
positives (see Fig. 1B), due to which rFP decreases to 0.042. This can be
validated by the raFP: following the same reasoning, the autocorrelation in the
activation region is underestimated, due to which the rate of false positives
should be higher than the nominal size, which is, indeed, the case in Batch3
(raFP = 0.133). Similar effects can be seen for UnbG. Apart from the theoretical
correction, the unbiased regularised method (UnbR) is the only correction
method that maintains a rate of false positives close to the nominal size in the
three situations considered here.

Table 2
Mean values of the rates of false (rFP) and true (rTP) positives, using the same
conventions as in Table 1. The last column denotes the rate of false positives in
the ‘activation’ region for Batch3 (raFP, see text).

         Batch1            Batch2            Batch3
         rFP     rTP       rFP     rTP       rFP     raFP
OLS      0.088   0.991     0.088   0.944     0.088   0.190
Theo     0.050   0.980     0.050∗  0.814     0.050∗  0.047
Bias     0.062   0.981∗    0.062   0.836     0.062   0.066∗
BiasR    0.058∗  0.984+    0.057   0.854     0.057   0.067∗
BiasG    0.058∗  0.984+    0.042   0.921     0.042   0.133
Unb      0.056   0.977     0.056   0.819     0.056   0.058+
UnbR     0.052+  0.981∗    0.050∗  0.839     0.050∗  0.059+
UnbG     0.052+  0.982     0.036   0.916     0.036   0.124
The evaluation measures are further computed for Batch2 (1,000 realisations)
under different levels of smoothing of the autocorrelation estimates (Bias and
Unb). This is a necessary validation, since the ‘optimal’ degree of smoothing,
hopt, used in the BiasR and UnbR corrections, is not necessarily optimal with
respect to these evaluation measures. Figure 5 shows the average results for
varying the smoothing bandwidth, h, from 0.5 to 5 pixels in steps of 0.25.
The minima for the biased and unbiased approaches coincide, and the auto-
correlation pattern is optimally recovered (minimum MSEAC) for h = 1.00
using both the biased (red curve) and unbiased (blue curve) estimates (dotted
line in Fig. 5A), the latter with a smaller MSEAC. The empirical
null distribution using the biased correction improves (MSEF decreases) over
the complete range tested, but reaches an optimum using the unbiased cor-
rection for h = 2.00 (Fig. 5B). Similarly, the rate of false positives improves
(gets closer to the nominal size denoted by the dashed line in Fig. 5C) over
the complete range tested for the biased approach, and is closest to 0.05 for
h = 1.25 using the unbiased correction (dotted line in Fig. 5C). This illustrates
the effect of the bias on the efficacy of the correction, as it would be expected
that a low MSEAC would correspond to good results for the other evaluation
measures, which is the case for the unbiased, but not for the biased approach.
The effect of the bias can only be suppressed by large degrees of smoothing.
The rate of true positives increases almost linearly over the range tested for
both corrections (between 0.84 and 0.88 for the biased and between 0.82 and
0.87 for the unbiased correction; results not shown). The histogram of the
optimal bandwidths obtained using the proposed method is shown in Fig. 5D
(average 1.17 and standard deviation 0.08). This average is represented as the
black vertical lines in Fig. 5A–C. The hopt-value is very close to that which
minimises the MSEAC for both the biased and unbiased correction, and to
that which brings rFP closest to 0.05 for the unbiased correction (blue curves
in Fig. 5A and 5C). It is, indeed, logical to assume that if the autocorrelation
is estimated in an optimal (or nearly optimal) manner, the correction (and
the rate of false positives) can also be expected to be near-optimal, rendering
the test exact, as is the case for the unbiased correction method (blue curves).
However, the degree of smoothing that minimises the MSEF for the unbiased
correction, is higher than the latter degree of smoothing, suggesting that the
MSEF measure is influenced by the variance of the parameter estimates even
after smoothing. Indeed, the F -test assumes the whitening matrix (computed
from the estimated autocorrelation structure) to be noiseless and exact. The
results for the biased correction (Fig. 5B and 5C) seem to suggest that large
degrees of smoothing are necessary for best performance (there is an opti-
mal value for the MSEF for h > 5, since the biased global correction yields
MSEF =0.013, as shown in Table 1), which is in contradiction to the degree of
smoothing which is optimal with respect to the MSEAC (Fig. 5A).
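For reference, the prewhitening step itself can be sketched for an AR(1) noise model. The following is the generic textbook construction (the Prais-Winsten form of the whitening matrix), not necessarily the authors' exact implementation; the toy design and parameter values are assumptions:

```python
import numpy as np

def ar1_whitening_matrix(rho, n):
    """Whitening matrix W for AR(1) noise with lag-1 autocorrelation rho,
    such that W @ e is (approximately) white when e is AR(1)."""
    W = np.eye(n)
    W[0, 0] = np.sqrt(1.0 - rho ** 2)   # Prais-Winsten treatment of the first sample
    for t in range(1, n):
        W[t, t - 1] = -rho
    return W

# Toy GLM: y = X b + AR(1) noise
rng = np.random.default_rng(7)
n = 120
X = np.column_stack([np.ones(n), np.sin(2 * np.pi * np.arange(n) / 20)])
e = np.zeros(n)
for t in range(1, n):
    e[t] = 0.3 * e[t - 1] + rng.standard_normal()
y = X @ np.array([1.0, 2.0]) + e

rho_hat = 0.3                 # in practice: a (regularised) estimate from OLS residuals
W = ar1_whitening_matrix(rho_hat, n)
yw, Xw = W @ y, W @ X         # prewhitened data and design matrix
b_hat, *_ = np.linalg.lstsq(Xw, yw, rcond=None)
print(b_hat)                  # roughly recovers [1.0, 2.0]
```

When rho_hat is noisy, W is noisy as well, which is precisely the disturbance the spatial regularisation aims to reduce.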
Finally, for one realisation of Batch3, the F -scores for the different correc-
tion methods are sorted in increasing order, and are plotted against those of
the theoretical correction (Fig. 6A for the biased and 6B for the unbiased
approaches). This generates plots that are similar to the PP -plots described
in (Marchini and Ripley, 2000; Woolrich et al., 2001; Marchini and Smith,
2003), where the log10(p)-values were plotted as a function of the theoretical
p-distribution. We have opted for a scatter plot of the F -scores, an “FF”-plot,
since this avoids the logarithmic scale and basically conveys the same informa-
tion as the PP -plot. An exact test would yield a scatter plot coinciding with
the bisector line (dashed lines in Fig. 6). In line with previous results where the
“empirically obtained probabilities are predominantly less than the expected
theoretical probabilities” (Woolrich et al., 2001), the F -scores of the various
correction methods are higher than those theoretically expected. It is clear from
Fig. 6 that the OLS method (black curve) overestimates the F -scores, and,
Fig. 5. A–C) Performance measures as a function of the degree of smoothing (red
and blue curves correspond to the biased and unbiased autocorrelation estimates,
and optimal values indicated by dotted lines): MSE of the autocorrelation pattern
estimate (A), the MSE of the empirical null distribution (B) and the rate of false
positives (C, dashed line denotes the nominal size α = 0.05); D) Histogram of the
bandwidths suggested by the proposed method (the average value is indicated as
a black vertical line in panels A–C). The dotted lines in panels A–C represent the
degree of smoothing that optimises the performance of the unbiased correction.
albeit less severely, so do the unregularised (yellow curves in Fig. 6A and 6B)
and globally regularised approaches (green curves in Fig. 6A and 6B). The op-
timally regularised approaches and the theoretically correct one yield empirical
F -distributions very close to the theoretically expected one (red, blue and purple
curves almost coinciding with the bisector line).
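An FF-plot of the kind shown in Fig. 6 can be constructed by sorting the observed F-scores and pairing them with quantiles of the theoretical F-distribution; a sketch with illustrative degrees of freedom:

```python
import numpy as np
from scipy.stats import f as f_dist

# Observed F-scores under the null; here drawn from the theoretical law itself,
# so the resulting FF-plot should hug the bisector. The dof values are illustrative.
dfn, dfd = 2, 100
f_obs = np.sort(f_dist.rvs(dfn, dfd, size=2000, random_state=3))

# Theoretical quantiles F_des at the plotting positions (i - 0.5) / n
probs = (np.arange(1, f_obs.size + 1) - 0.5) / f_obs.size
f_des = f_dist.ppf(probs, dfn, dfd)

# The points (f_des[i], f_obs[i]) form the FF-plot; an exact test would place
# them on the bisector line, while an inflating correction lies above it.
print(float(np.corrcoef(f_des, f_obs)[0, 1]))
```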
Fig. 6. Empirical F -distribution using the different biased (A; Bias–yellow, Bi-
asR–red, BiasG–green, OLS–black, Theo–purple) and unbiased (B; Unb–yellow,
UnbR–blue, UnbG–green, OLS–black, Theo–purple) correction methods against the
theoretically expected F -distribution, Fdes.
3.3 Null fMRI Data
Next, “Null” fMRI data sets are considered, which are recorded from a hu-
man subject who is asked simply to remain passive during scanning. The
data set is publicly available from http://www-bmu.psychiatry.cam.ac.uk/
DATA/NULLdata/index.html. Results in this section are shown for data set
“000413-m02_6_EPI”. The voxel size is [3.9 × 3.9 × 5.0] mm, and each voxel
has a time course of 80 time samples with TR=3 seconds. The images are
acquired on a 3 Tesla machine.
The results are restricted to the correction methods using the unbiased auto-
correlation estimates (F -testing at the level of α = 0.05). The MSEF and rFP
for the methods using the biased estimates are similar but, in general, larger
than those for the methods using the unbiased estimates. For reference, the
uncorrected OLS results, and the results using a fixed bandwidth of 6.37 mm
(FWHM=15 mm, referred to as “UnbF”, which is the default value² used
by Worsley et al. 2002) are shown. The design matrix consists of the two re-
sponses (following the dummy paradigm) and three drift terms (detrending of
order two). The first five scans have been discarded from the analysis.

² To be precise, the actual bandwidth is given by FWHM = 15 · (100/df)^(1/3) [mm].
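The 6.37 mm bandwidth quoted above is the standard deviation corresponding to FWHM = 15 mm under the standard Gaussian conversion σ = FWHM/(2√(2 ln 2)), sketched below together with the df-adjusted default:

```python
import numpy as np

def fwhm_to_sigma(fwhm):
    """Convert a Gaussian kernel's full width at half maximum to its standard deviation."""
    return fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))

print(fwhm_to_sigma(15.0))   # ~6.37 (mm), the fixed "UnbF" bandwidth

# The df-adjusted default of Worsley et al. (2002): FWHM = 15 * (100/df)^(1/3) mm
df = 100
fwhm = 15.0 * (100.0 / df) ** (1.0 / 3.0)
print(fwhm_to_sigma(fwhm))
```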
The results are shown as a function of the order p of the AR-model used for
correction in Fig. 7A and 7B (the MSEF and rFP, respectively). The uncorrected
OLS results (black lines) have a large MSEF (0.174) and a strongly increased
rate of false positives (0.091); only the results for Unb (yellow curves) are
worse for higher orders (p > 6). The performance of the unregularised correc-
tion degrades with increasing AR-order due to the increasing variance of the
autocorrelation estimates (an increasing number of AR-coefficients needs to
be estimated from the same amount of data, resulting in an increased vari-
ance of the estimates). The three remaining, regularised correction methods
(UnbR, UnbF and UnbG) do not suffer from the increasing variance of the
autocorrelation estimates with increasing AR-model orders. The MSEF mea-
sures remain low (< .01) over the range tested and the rates of false positives
also remain close to the nominal size (α = 0.05). The optimal bandwidths in-
crease as a function of the AR-model order (Fig. 7C). This is expected, since
a larger degree of spatial regularisation is required to reduce the variability
of the estimates (for the same amount of data, adding parameters will render
the estimates more variable). For the three Null data sets tested, the AR(1)-
correction is insufficient, and yields a slightly elevated rate of false positives
(left-hand side of Fig. 7B). Both the MSEF and rFP decrease for higher-order
AR-models up to four. The rate of false positives for UnbG (green curve in
Fig. 7B) is very close to the nominal size. However, as illustrated in a previous
simulation study (Batch3), there is a systematic over- and underestimation
of the autocorrelation, which can lead to an increased rate of false positives
Fig. 7. Results for the Null fMRI data set. Discrepancy between the empirical and
theoretical null distribution (A), Rate of false positives (B) and Optimal bandwidth
(C) as a function of order p; D) MSEF as a function of the bandwidth h for p = 1
(solid) and for p = 4 (dashes), minima indicated by open circles; Colour conventions
in panels A and B are the following: Unb (yellow solid), UnbR (blue solid), UnbG
(green solid), UnbF (blue dashed) and OLS (black solid).
in (inactive) regions with a higher autocorrelation. Using UnbF (blue dashed
curves) and UnbG (green curves), the MSEF remains relatively constant for
higher-order AR-models, but the UnbR-correction method (blue solid curves)
shows a minimum, indicating a value which balances the effects of model
complexity, spatial resolution and noise. This minimum cannot be used for
determining the optimal AR-order in practice (fMRI activation studies), since
it would require a null fMRI scanning session for each study.
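Fitting the AR(p) coefficients for a voxel's residual time course amounts to solving the Yule-Walker equations; a generic sketch (not the fmristat implementation, and the toy AR(2) parameters are assumptions):

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def yule_walker(x, p):
    """AR(p) coefficients from the sample autocovariances of x (Yule-Walker)."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    r = np.array([np.dot(x[: n - k], x[k:]) / n for k in range(p + 1)])
    # Solve the symmetric Toeplitz system R a = r[1:], with R_ij = r[|i-j|]
    return solve_toeplitz(r[:p], r[1 : p + 1])

# AR(2) toy series: x_t = 0.5 x_{t-1} - 0.2 x_{t-2} + white noise
rng = np.random.default_rng(5)
x = np.zeros(5000)
for t in range(2, 5000):
    x[t] = 0.5 * x[t - 1] - 0.2 * x[t - 2] + rng.standard_normal()

print(yule_walker(x, p=2))   # roughly [0.5, -0.2]
```

With only 80 time samples per voxel, as in the Null data, the same estimator becomes markedly more variable as p grows, which is the variance the spatial regularisation counteracts.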
To validate these results, the MSEF is evaluated as a function of the bandwidth
h of the Gaussian kernel used for spatially smoothing the autocorrelation esti-
mate. The results using an AR(1) and an AR(4) noise model are shown in Fig.
7D (solid and dashed curves). The bandwidths obtained using the proposed
method are shown as open circles. It can be observed that the obtained hopt is
very close to the bandwidth which minimises the MSEF . This will not neces-
sarily be the case in general, since other disturbance factors can influence the
position of this minimum, as was shown for the biased estimates in Fig. 5B.
Finally, the FF -plots (empirical distribution vs. theoretical one) are shown
in Fig. 8, respectively for the biased (Fig. 8A) and unbiased (Fig. 8B) correc-
tion methods, both using an AR(4)-model. As was the case for the synthetic
data set (Fig. 6), the effect of spatial regularisation can be clearly observed in
Figs. 8A and 8B: the curves for the regularised corrections (BiasR-red, UnbR-
blue, BiasG/UnbG-green, BiasF/UnbF-purple) lie closer to the bisector line
than those for the unregularised (Bias/Unb-yellow) and OLS (black) correc-
tions. Furthermore, the unbiased corrections show a smaller deviation from
the bisector line than the biased corrections.
3.4 Effect of Presmoothing
Spatial smoothing of the spatiotemporal fMRI data prior to the statistical
analysis is a common preprocessing step. Therefore, it is important first to
establish the effect of spatial smoothing of the spatiotemporal data on the op-
timal degree of smoothing of the spatial pattern of autocorrelation estimates.
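The spatial regularisation step itself, smoothing a noisy autocorrelation map with an isotropic Gaussian kernel of bandwidth h, can be sketched as follows (the synthetic pattern, noise level and boundary mode are assumptions):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(9)

# Noisy lag-1 autocorrelation map: a smooth underlying pattern plus estimation noise
yy, xx = np.mgrid[0:64, 0:64]
true_pattern = 0.3 * np.exp(-((xx - 32) ** 2 + (yy - 32) ** 2) / (2 * 10.0 ** 2))
noisy_estimate = true_pattern + 0.1 * rng.standard_normal((64, 64))

# Regularised estimate: isotropic Gaussian smoothing with bandwidth h (pixels)
h = 1.5
regularised = gaussian_filter(noisy_estimate, sigma=h, mode="nearest")

# Smoothing reduces the mean-square deviation from the underlying pattern
mse_raw = np.mean((noisy_estimate - true_pattern) ** 2)
mse_reg = np.mean((regularised - true_pattern) ** 2)
print(mse_raw, mse_reg)
```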
Consider the example Batch3 autocorrelation pattern shown in Fig. 9A, which
is used for generating a spatiotemporal data set. The estimated unbiased au-
tocorrelation pattern (Unb) is shown before (Fig. 9B) and after smoothing
Fig. 8. FF -plots for the Null fMRI data set using the biased (A) and unbiased (B)
AR(4)-correction methods. Colour conventions are the following: Bias/Unb (yel-
low), BiasR (red), UnbR (blue), BiasG/UnbG (green), BiasF/UnbF (purple), OLS
(black). The dashed lines represent the bisector lines, corresponding to a perfect
match between the empirical and theoretical distributions.
of the estimates, using hMSE = 1.1 (Fig. 9C), which is the bandwidth that
minimises the MSEAC obtained using a grid search with bandwidths between
0.5 and 10 pixels in steps of 0.1, and using hopt = 1.13 (Fig. 9D), which is
obtained using the proposed approach. Next, the spatiotemporal data are spa-
tially smoothed using a Gaussian kernel with a bandwidth of one pixel (note
that the underlying autocorrelation pattern remains identical, thus Fig. 9A
and 9E are the same). The unsmoothed autocorrelation estimate (Unb, Fig.
9F) is clearly smoother than in the absence of presmoothing (Fig. 9B). Figure
9G shows the smoothed autocorrelation estimates using the bandwidth that
minimises the MSEAC (hMSE = 2.0).
The spatial smoothness of the estimated autocorrelation pattern is prob-
lematic for the proposed method. Indeed, the autocorrelation value in any
pixel/voxel will be best explained from only its nearest neighbouring pix-
els/voxels, due to the strong spatial correlations between neighbouring pix-
els/voxels, induced by the spatial smoothing process. A subsampling strategy
Fig. 9. Effect of spatial (data) presmoothing on the spatial autocorrelation pattern.
A,E) Original autocorrelation patterns A(x) used for generating the spatiotemporal
data; B,F) Unbiased (unregularised) autocorrelation estimates; C,G) Regularised
autocorrelation estimates using hMSE; D,H) Regularised autocorrelation estimates
using hopt. The first (second) row shows the results in the absence (presence) of
spatial (data) smoothing.
is employed to adapt the proposed method, which assumes that the correla-
tions induced by the presmoothing are negligible when the data are (spatially)
subsampled with a factor equal to the FWHM (in pixels/voxels) of the spatial
presmoothing kernel. In the current implementation, this factor is rounded to
the nearest (non-negative) integer. If the obtained bandwidth is a degenerate
one (either of the extreme values of the initial bracket used in the Golden
Section Search method), the subsampling factor is increased by unity, and the
procedure is repeated. The resulting optimal regularisation bandwidth is con-
siderably higher than in the unsmoothed case (hopt = 2.27 pixels), and close
to hMSE.
Finally, the previously described data set Batch3 is reconsidered, and each
data set is spatially smoothed with a Gaussian kernel with a bandwidth, σs,
Fig. 10. Desired (hMSE, red curve) and obtained bandwidths (hopt, blue curve) as a
function of the kernel width σs of the kernel used for spatially smoothing the data
(average and standard deviation shown).
varying from 0 to 3 pixels in increments of 0.25, after which the desired degree
of smoothing for MSEAC minimisation (hMSE) is determined for the Unb-
estimate. Overall results (mean and standard deviation computed over 100 re-
alisations for every degree of presmoothing) are shown in Fig. 10 (red curve).
The desired degree of smoothing clearly increases with increasing degrees of
presmoothing. This supports the previous observation that, even though the
estimated autocorrelations are spatially smooth (due to the spatial smoothing
of the data), a higher degree of smoothing is still required for recovering the
underlying autocorrelation pattern. The bandwidths obtained using the pro-
posed method are visualised as the blue curves, showing a similar increasing
trend but with a higher variability.
3.5 Activation fMRI Data
To further illustrate the proposed method, the fMRI data set used in (Worsley
et al., 2002) is considered. The experiment addresses pain perception using the
following paradigm: 9 seconds of rest, a painful heat stimulus of 9 seconds, 9
seconds of rest, and a warm (neutral) stimulus of 9 seconds. The data were
scanned at 1.5 Tesla, with TR=3 seconds (120 scans in total), and consisted of
13 slices of 100 × 128 pixels. The voxel size was 2.3 × 2.3 × 7 mm and the data
were smoothed in-slice by 6 mm as part of the motion correction. For more
details, we refer to the fmristat homepage (http://www.math.mcgill.ca/
~keith/fmristat/).
The data were analysed using fmristat (Worsley et al., 2002), with AR-models
of increasing order (p = 1, . . . , 6), and for each order, the unbiased estimates
of the p-dimensional autocorrelation vectors were smoothed using bandwidths
between 2.12 and 10.62 mm in steps of 2.12 mm (this corresponds to FWHMs
between 5 and 25 mm). The unbiased estimates were used for determining
the optimal bandwidth using the proposed method, taking into account the
presmoothing kernel (thus, subsampling by [3 3 1] voxels). This subsampling
factor sufficed for all orders p for obtaining a non-degenerate optimal band-
width. The required processing times in the AR(1)-case were approximately 8
seconds for determining the optimal bandwidth and 5 minutes for the statis-
tical analysis (which includes the prewhitening)³.
Figure 11 shows the (unbiased) AR(4)-parameters for a given axial slice (from
top to bottom row, respectively the unregularised, using a fixed kernel with
FWHM=15 mm, and using the optimal bandwidth, FWHM=6.9 mm ap-
proaches). In general, the AR-coefficients decrease in size from left to right
(a1 − a4) and the a4-coefficients (last column) are very small. The effect of
spatial smoothing is, evidently, that the details are blurred and that the
high-frequency (possibly noise) components are reduced, but also that the
strong AR-coefficients become smaller. This could influence the subsequent t-
scores, since underestimating the autocorrelation structure enhances the power
at the expense of a higher risk of false positives.

³ Analyses were run on a Pentium 4 processor (3 GHz), running Matlab 6.5
(Release 13) under Linux.

Fig. 11. AR(4)-coefficients for an axial slice of the Activation fMRI data set.
From top to bottom row, the results for the unregularised (Unb), fixed kernel of
FWHM=15 mm (UnbF) and optimally regularised (UnbR) approaches are shown,
respectively. The four columns correspond to the four AR-coefficients (a1–a4,
from left to right). The colour scale of each plot corresponds to the colour bar
shown in the top left corner.
In (Worsley et al., 2002), it is stated that the t-statistics for higher-order
AR-models differ by less than 1% from those derived using an AR(1)-model
(for a fixed smoothing FWHM of 15 mm). Although the difference on aver-
age may be very small, it is still interesting to compare t-maps for different
AR-model orders and smoothing bandwidths in closer detail. Since the de-
sired activation pattern for this data set is not known, the correctness of the
resulting t-maps cannot be quantitatively evaluated. However, the effect can
be illustrated by observing the number of voxels with an absolute t-value ex-
ceeding a threshold of tthres = 3 as a function of the FWHM, and for a number
of AR-model orders. Note that this number contains both ‘correct’ and ‘in-
correct’ rejections of the null hypothesis. The results are shown in Fig. 12
(colour conventions as shown in figure legend). In the uncorrected OLS case,
the number of supra-threshold voxels is 2985, which is much higher than the
numbers found in the corrected cases. This is expected, since the power of the
OLS has been demonstrated to be the highest of the methods under study,
at the expense of a strongly increased rate of false positives. The number of
supra-threshold voxels tends to decrease for increasing FWHM, and for the
FWHM=15 mm-case, this number also tends to decrease for increasing AR-
model orders (from 1,780 to 1,548). The optimal bandwidths obtained using
the proposed method (indicated as open circles in Fig. 12) are around 6.5
and 7.5 mm (FWHM), with correspondingly between 1,700 and 2,000 supra-
threshold voxels. Thus, although the t-maps appear very similar for different
model orders and smoothing bandwidths (results not shown, but see Worsley
et al., 2002), these modelling choices do influence the number of voxels for
which the null hypothesis is rejected (using a very loose rejection criterion).
Fig. 12. Number of supra-threshold voxels in the fMRI activation data as a
function of the FWHM used for spatially regularising the autocorrelation esti-
mates, for different orders p of the AR-model (colour conventions as shown in
the legend). The results using the optimal bandwidths are shown as open circles.
The OLS-approach yields 2985 supra-threshold voxels (not shown on figure).

4 Discussion

The presence of temporal autocorrelations in the error signal after regression
has been a major concern in the statistical analysis of fMRI data. There exist
various correction methods, which aim at determining or shaping the auto-
correlation structure of the error signal. The prewhitening approach, initially
proposed for fMRI by Bullmore et al. (1996), yields the minimum variance
unbiased estimates of the regression coefficients, if the true autocorrelation
is known (Woolrich et al., 2001; Friston et al., 2000a; Bullmore et al., 2001).
Since the true autocorrelation is unknown and needs to be estimated, various
sources of bias in the statistical inference are introduced, among which the
use of the OLS residuals for estimating the autocorrelation (due to which the
autocorrelation estimates are biased) and the variance of the autocorrelation
estimates (for an overview, see Marchini and Smith, 2003). The first source of
bias has been dealt with efficiently in (Worsley et al., 2002), yielding unbiased
estimates of the autocorrelation, and it has been suggested to reduce the effect
of the second source by spatially smoothing the (unbiased) autocorrelation es-
timates. There is also a recent alternative method, which takes into account
the variance of the autocorrelation estimates in the statistical inference, rather
than trying to reduce it by smoothing (Kiebel et al., 2003).
In this article, we have introduced a novel autonomous method for determin-
ing the bandwidth of the kernel used for spatially smoothing the autocor-
relations, on the basis of a cross-validation criterion. It has been shown by
extensive simulations on synthetic data to be near-optimal for recovering the
true autocorrelation structure. Furthermore, using the (optimally) spatially
smoothed autocorrelations for prewhitening the data renders the empirically
obtained distribution of test statistics very close to that theoretically expected,
which is verified on both synthetic and real fMRI data recorded under “null”
conditions.
In (Marchini and Smith, 2003), it is claimed that the correction for the bias
in the autocorrelation estimates, as proposed by Worsley et al. (2002), did
not yield a noticeable improvement to the validity of the statistical test. They
qualitatively compared the effect of bias correction in synthetic data, using
PP -plots, in which the empirically obtained and the theoretically expected
p-values were visualised in a scatter diagram (on logarithmic axes). Similarly,
we have plotted “FF -plots”: the empirically obtained F -scores against those
obtained using the correct autocorrelation function, also on synthetic data,
which yielded very similar results (in both the PP - and FF -plots, a perfect
match between empirical and theoretical distributions would yield a curve co-
inciding with the bisector line). As described in (Woolrich et al., 2001), the
probabilities were predominantly less (smaller p-values, which corresponds to
higher F -scores) than those theoretically expected, at least in the absence of
spatial regularisation or using global regularisation. Although the results (both
our own and those of Marchini and Smith, 2003) of the biased and unbi-
ased approaches are visually very similar (see red and blue curves in Fig. 6A
and 6B), a rigorous quantitative analysis, using the mean-square discrepancy
between the empirical and theoretical F -distributions and the rate of false
positives, indicates a significant improvement of the unbiased corrections over
the biased ones (see Tables 1 and 2). When the FF -plots are generated for a
Null fMRI data set, the difference between the biased and unbiased correction
methods is visually discernible, showing a larger deviation from the bisector
line for the biased corrections. Furthermore, optimal spatial regularisation,
as proposed in this article, greatly improves the exactness of the statistical
test after correction, which is shown quantitatively on both synthetic and null
fMRI data.
There is an additional source of bias in the statistical inference, namely mis-
specification of the noise model. In most approaches, a first-order AR-
model is used, and is claimed to be sufficient. In (Bullmore et al., 1996),
the Box-Pierce test statistic (using biased autocorrelation estimates) was em-
ployed for testing the null hypothesis that the residual signal contains only
white noise. They concluded that an AR(1) model was sufficient for modelling
the fMRI noise process, based on the analysis of an averaged time series (over
156 voxels), thus, assuming a spatially constant autocorrelation structure. Two
approaches are described that determine the order of the noise model for each
voxel separately. In (Locascio et al., 1997), the order of the autoregressive mov-
ing average (ARMA) model was determined for each voxel separately (up to
order three) using the Ljung-Box test. Woolrich et al. (2001) tested for general-
order AR-models, and reported that orders of up to six were required, although
the majority of voxels required orders of up to three (Woolrich et al., 2001,
Fig. 12 therein). Furthermore, also in (Woolrich et al., 2001), it was shown
that (nonlinear) spatial smoothing of the autocorrelation estimates allowed
for more flexible models, clearly illustrating that spatial regularisation of the
autocorrelation estimates can in principle justify the application of models
of orders higher than those indicated by an unregularised test procedure. We
have illustrated that using higher-order AR-models for correction improves the
exactness of the test, when some form of spatial regularisation of the auto-
correlation estimates is present. Simulations on Null fMRI data demonstrate
that in the absence of regularisation, the exactness of the test deteriorates
with increasing model order, due to the increased variance of the parame-
ter estimates. Spatial regularisation reduces this variance, in which case the
statistical test benefits from higher-order AR-models.
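As a concrete sketch, per-voxel order selection of the kind discussed above can be implemented by fitting AR-models of increasing order and testing the residuals for whiteness. The following is an illustrative implementation, not the exact procedure of Locascio et al. (1997) or Woolrich et al. (2001): the AR fit uses the Yule-Walker equations with biased autocovariance estimates, whiteness is assessed with the Ljung-Box statistic, and the maximum order (six), number of lags (ten) and significance level (0.05) are arbitrary choices.

```python
import numpy as np
from scipy.stats import chi2

def acf(x, nlags):
    """Sample autocorrelation at lags 1..nlags (biased normalisation)."""
    x = x - x.mean()
    n = len(x)
    denom = np.dot(x, x)
    return np.array([np.dot(x[:n - k], x[k:]) / denom for k in range(1, nlags + 1)])

def yule_walker(x, p):
    """Fit an AR(p) model via the Yule-Walker equations; return (coefficients, residuals)."""
    x = x - x.mean()
    n = len(x)
    # Biased autocovariance estimates at lags 0..p.
    r = np.array([np.dot(x[:n - k], x[k:]) for k in range(p + 1)]) / n
    R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])
    a = np.linalg.solve(R, r[1:])
    # Residual at time t: x[t] - sum_k a_k x[t-k], for t = p..n-1.
    resid = x[p:] - sum(a[k - 1] * x[p - k:n - k] for k in range(1, p + 1))
    return a, resid

def ljung_box_pvalue(resid, nlags, fitted_order):
    """Ljung-Box portmanteau test; H0: the residuals are white."""
    n = len(resid)
    rho = acf(resid, nlags)
    q = n * (n + 2) * np.sum(rho ** 2 / (n - np.arange(1, nlags + 1)))
    return chi2.sf(q, df=nlags - fitted_order)

def select_ar_order(x, pmax=6, nlags=10, alpha=0.05):
    """Smallest AR order whose residuals pass the whiteness test."""
    for p in range(1, pmax + 1):
        _, resid = yule_walker(x, p)
        if ljung_box_pvalue(resid, nlags, p) > alpha:
            return p
    return pmax
```

Applied voxel-wise, this yields an order map; as noted above, without spatial regularisation the higher orders it selects come at the cost of increased parameter variance.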
The proposed method has been described in terms of spatial smoothing using
an isotropic Gaussian kernel. It can easily be extended to other kernel types,
or even to nonlinear spatial smoothing (to avoid the smoothing over different
tissue types, see, e.g., Woolrich et al., 2001), as long as the spatial extent
of the smoothing kernel is parametrised by a single bandwidth parameter. If,
for example, a different bandwidth is used for each spatial dimension, the
Golden Section Search can no longer be applied, and more intricate minimisation schemes need
to be employed. The proposed method can also easily be adapted for use with
other noise models, both parametric (Locascio et al., 1997; Purdon et al., 2001)
and non-parametric (Woolrich et al., 2001; Wicker and Fonlupt, 2003), since
the vector of coefficients, c, is not restricted to AR-coefficients in particular,
and can be replaced by any vector characterising the noise process.
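The bandwidth-selection idea can be sketched as follows: smooth a noisy coefficient map with an isotropic Gaussian kernel and optimise the single bandwidth parameter by a golden-section line search. The cost function below is hypothetical and only serves the synthetic demonstration: it measures the mean-squared error against a known ground-truth map, which is unavailable for real data; the paper's actual objective, based on the exactness of the statistical test, is not reproduced here.

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)

# Synthetic, spatially smooth ground-truth AR(1) coefficient map in [0.2, 0.6],
# plus additive estimation noise (stand-in for noisy per-voxel estimates).
true_map = gaussian_filter(rng.standard_normal((64, 64)), sigma=8)
true_map = 0.2 + 0.4 * (true_map - true_map.min()) / np.ptp(true_map)
noisy_map = true_map + 0.15 * rng.standard_normal(true_map.shape)

def cost(width):
    """Hypothetical cost: MSE of the smoothed map against the (synthetic) truth."""
    smoothed = gaussian_filter(noisy_map, sigma=abs(width))
    return float(np.mean((smoothed - true_map) ** 2))

# Golden-section search over the single bandwidth parameter.
res = minimize_scalar(cost, bracket=(0.5, 3.0), method='golden')
best_width = abs(res.x)
```

Because the search varies only one scalar, any unimodal cost defined on the bandwidth can be plugged in, which is what makes the extension to other kernel types or noise models straightforward.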
5 Acknowledgements
The authors wish to thank the Brain Mapping Unit, University of Cambridge,
UK, for making their Null fMRI data sets publicly available. The activation
fMRI data (described in Worsley et al., 2002) were used with permission from
the authors.
The authors are supported by research grants received from the Belgian Fund
for Scientific Research – Flanders (G.0248.03 and G.0234.03), the Flemish Re-
gional Ministry of Education (Belgium) (GOA 2000/11), and the European
Commission, 5th framework programme (QLG3-CT-2000-30161 and IST-2001-
32114).
References
Biswal, B., Yetkin, F., Haughton, V., Hyde, J., 1995. Functional connectivity
in the motor cortex of resting human brain using echo-planar MRI. MRM
34, 537–541.
Bullmore, E., Brammer, M., Williams, S., Rabe-Hesketh, S., Janot, N., David,
A., Mellers, J., Howard, R., Sham, P., 1996. Statistical methods of estima-
tion and inference for functional MR image analysis. MRM 35, 261–277.
Bullmore, E., Long, C., Suckling, J., Fadili, J., Calvert, G., Zelaya, F., Car-
penter, T., Brammer, M., 2001. Colored noise and computational inference
in neurophysiological (fMRI) time series analysis: Resampling methods in
time and wavelet domains. Hum. Brain Mapp. 12, 61–78.
Friston, K., Holmes, A., Poline, J.-B., Grasby, P., Williams, S., Frackowiak,
R., Turner, R., 1995. Analysis of fMRI time series revisited. NeuroImage
2, 45–53.
Friston, K., Josephs, O., Zarahn, E., Holmes, A., Rouquette, S., Poline, J.-B.,
2000a. To smooth or not to smooth? Bias and efficiency in fMRI time-series
analysis. NeuroImage 12, 196–208.
Friston, K., Mechelli, A., Turner, R., Price, C., 2000b. Nonlinear responses
in fMRI: The balloon model, Volterra kernels, and other hemodynamics.
NeuroImage 12, 466–477.
Hastie, T., Tibshirani, R., Friedman, J., 2001. Elements of Statistical Learning:
Data Mining, Inference and Prediction. Springer-Verlag, New York.
Kiebel, S., Glaser, D., Friston, K., 2003. A heuristic for the degrees of freedom
of statistics based on multiple variance parameters. NeuroImage 20, 591–
600.
Locascio, J., Jennings, P., Moore, C., Corkin, S., 1997. Time series analysis in
the time domain and resampling methods for studies of functional magnetic
resonance brain imaging. Hum. Brain Mapp. 5, 168–193.
Marchini, J., Ripley, B., 2000. A new statistical approach to detecting signif-
icant activation in functional MRI. NeuroImage 12, 366–380.
Marchini, J., Smith, S., 2003. On bias in the estimation of autocorrelations
for fMRI voxel time-series analysis. NeuroImage 18, 83–90.
Press, W., Flannery, B., Teukolsky, S., Vetterling, W., 1992. Numerical Recipes
in C: The Art of Scientific Computing, 2nd Edition. Cambridge University
Press, New York, NY, USA.
Purdon, P., Solo, V., Weisskoff, R., Brown, E., 2001. Locally regularized spa-
tiotemporal modeling and model comparison for functional MRI. NeuroIm-
age 14, 912–923.
Purdon, P., Weisskoff, R., 1998. Effect of temporal autocorrelation due to
physiological noise and stimulus paradigm on voxel-level false-positive rates.
Hum. Brain Mapp. 6, 239–249.
Solo, V., Purdon, P., Weisskoff, R., Brown, E., 2001. A signal estimation ap-
proach to functional MRI. IEEE Trans. Med. Imaging 20 (1), 26–35.
Wicker, B., Fonlupt, P., 2003. Generalized least-squares method applied to
fMRI time series with empirically determined correlation matrix. NeuroIm-
age 18, 588–594.
Woolrich, M., Ripley, B., Brady, M., Smith, S., 2001. Temporal autocorrelation
in univariate linear modeling of FMRI data. NeuroImage 14, 1370–1386.
Worsley, K., Friston, K., 1995. Analysis of fMRI time series revisited – again.
NeuroImage 2, 173–181.
Worsley, K., Liao, C., Aston, J., Petre, V., Duncan, G., Morales, F., Evans,
A., 2002. A general statistical analysis for fMRI data. NeuroImage 15, 1–15.
Xiong, J., Gao, J.-H., Lancaster, J.L., Fox, P., 1996. Assessment and optimization of
functional MRI analyses. Hum. Brain Mapp. 4, 153–167.
Zarahn, E., Aguirre, G., D’Esposito, M., 1997. Empirical analyses of
BOLD fMRI statistics: I. Spatially unsmoothed data collected under null-
hypothesis conditions. NeuroImage 5, 179–197.