fMRI GLM Analysis
TRANSCRIPT
7/15/2014 Tuesday, Yingying Wang
http://neurobrain.net/2014STBCH/index.html
2
Overview
[Figure: the SPM analysis pipeline. Image time-series are realigned, smoothed (kernel) and normalised (template), then modelled with the general linear model (design matrix) to obtain parameter estimates; statistical inference using random field theory (p < 0.05) yields a statistical parametric map (SPM).]
3
Parametric statistics
• one-sample t-test
• two-sample t-test
• paired t-test
• ANOVA
• ANCOVA
• correlation
• linear regression
• multiple regression
• F-tests
• etc.
All of these are cases of the General Linear Model.
4
The General Linear Model (GLM)
T-tests, correlations and Fourier analysis work for simple designs and were common in the early days of imaging. The General Linear Model (GLM) is now available in many software packages and tends to be the analysis of choice.

Why is the GLM so great?
• the GLM is an overarching tool that can do anything that the simpler tests do
• you can examine any combination of contrasts (e.g., intact - scrambled, scrambled - baseline) with one GLM rather than multiple correlations
• the GLM allows much greater flexibility for combining data within subjects and between subjects
5
The General Linear Model (GLM)
• it also makes it much easier to counterbalance orders and discard bad sections of data
• the GLM allows you to model things that may account for variability in the data even though they aren't interesting in and of themselves (e.g., head motion)
• as we will see later in the course, the GLM also allows you to use more complex designs (e.g., factorial designs)
6
A mass-univariate approach
[Figure: a time series of brain volumes; the same model is fitted independently at every voxel (time axis shown).]
7
8
General Linear Model
Equation for a single voxel, scan j:

$y_j = x_{j1}\beta_1 + \dots + x_{jL}\beta_L + e_j, \qquad e_j \sim N(0, \sigma^2)$

• $y_j$: data for scan $j = 1 \dots J$
• $x_{jl}$: explanatory variables / covariates / regressors, $l = 1 \dots L$
• $\beta_l$: parameters / regression slopes / fixed effects
• $e_j$: residual errors, independent and identically distributed ("i.i.d."), Gaussian with mean zero and standard deviation $\sigma$

Equivalent matrix form:

$y = X\beta + e$, where $X$ is the "design matrix" / model.
9
GLM
10
The Design Matrix
11
Estimation of the parameters
$y = X\beta + \varepsilon$

i.i.d. assumptions: $\varepsilon \sim N(0, \sigma^2 I)$

OLS estimates:

$\hat\beta = (X^TX)^{-1}X^Ty$

$\hat\beta_1 = 3.9831$
$\hat\beta_{2-7} = \{0.6871, 1.9598, 1.3902, 166.1007, 76.4770, -64.8189\}$
$\hat\beta_8 = 131.0040$

$\hat\sigma^2 = \dfrac{\hat\varepsilon^T\hat\varepsilon}{N - p}, \qquad \hat\beta \sim N\!\left(\beta,\; \sigma^2 (X^TX)^{-1}\right)$
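A minimal numpy sketch of these OLS estimates (assuming a design matrix X of N scans by p regressors and a single-voxel time series y have already been built; the function name is illustrative only):

```python
import numpy as np

# OLS estimation for a single voxel (sketch; assumes X has full column rank).
def ols_fit(X, y):
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y                        # beta_hat = (X'X)^-1 X'y
    resid = y - X @ beta_hat                            # residuals epsilon_hat
    sigma2_hat = resid @ resid / (len(y) - X.shape[1])  # sigma2_hat = e'e / (N - p)
    cov_beta = sigma2_hat * XtX_inv                     # Cov(beta_hat) = sigma^2 (X'X)^-1
    return beta_hat, sigma2_hat, cov_beta
```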
12
OLS (Ordinary Least Squares)
SOLUTION: OLS of the Parameters
The goal is to minimize the quadratic error between the data and the model.
You find the extremum (maximum or minimum) of a function by taking its derivative and setting it to zero.
(The quadratic error $\hat\varepsilon^T\hat\varepsilon$ is a scalar, and the transpose of a scalar is a scalar.)
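A compact version of this derivation, in the matrix notation used above:

$\hat\varepsilon^T\hat\varepsilon = (y - X\beta)^T (y - X\beta)$

$\dfrac{\partial}{\partial\beta}\,(y - X\beta)^T(y - X\beta) = -2X^T(y - X\beta) = 0 \;\Rightarrow\; X^TX\hat\beta = X^Ty \;\Rightarrow\; \hat\beta = (X^TX)^{-1}X^Ty$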
13
14
What are the problems?
1. BOLD responses have a delayed and dispersed form.
2. The BOLD signal includes substantial amounts of low-frequency noise.
3. The data are serially correlated (temporally autocorrelated), which violates the assumptions of the noise model in the GLM.
15
Problem 1: Shape of BOLD response

$(f \otimes g)(t) = \int_0^t f(\tau)\, g(t-\tau)\, d\tau$

The response of a linear time-invariant (LTI) system is the convolution of the input with the system's response to an impulse (delta function).
16
Solution: Convolution model of the BOLD response

expected BOLD response = input function $\otimes$ impulse response function (HRF)

$(f \otimes g)(t) = \int_0^t f(\tau)\, g(t-\tau)\, d\tau$

[Figure legend: blue = data; green = predicted response, taking the convolution with the HRF into account; red = predicted response, NOT taking the HRF into account.]
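A minimal Python sketch of this convolution model (the two-gamma HRF below only approximates SPM's canonical HRF, and the block timings are made up for illustration):

```python
import numpy as np
from scipy.stats import gamma

# Approximate two-gamma HRF sampled at TR = 1 s: early positive peak, later undershoot
# (not SPM's exact canonical HRF).
TR = 1.0
t = np.arange(0, 32, TR)
hrf = gamma.pdf(t, 6) - 0.35 * gamma.pdf(t, 12)
hrf /= hrf.sum()

# Boxcar stimulus function: alternating 10-scan rest and 10-scan task blocks.
n_scans = 80
boxcar = np.tile(np.r_[np.zeros(10), np.ones(10)], n_scans // 20)

# Expected BOLD response = stimulus function convolved with the HRF,
# truncated to the length of the run; this becomes a column of the design matrix.
expected_bold = np.convolve(boxcar, hrf)[:n_scans]
```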
17
Problem 2: Low frequency noise
[Figure (linear model): blue = data; black = mean + low-frequency drift; green = predicted response, taking the low-frequency drift into account; red = predicted response, NOT taking the low-frequency drift into account.]
18
Solution 2: High-pass filtering

[Figure: discrete cosine transform (DCT) basis set used as drift regressors.]
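A rough numpy sketch of such a DCT drift basis (the 128 s cutoff and the formula for the number of components only approximate SPM's defaults; the scan count and TR are made up):

```python
import numpy as np

# Build a low-frequency DCT basis set for high-pass filtering.
n_scans, TR, cutoff = 200, 2.0, 128.0          # cutoff period in seconds (assumed)
n_basis = int(2 * n_scans * TR / cutoff) + 1   # number of slow components to model

t = np.arange(n_scans)
dct_set = np.column_stack([
    np.cos(np.pi * k * (2 * t + 1) / (2 * n_scans))
    for k in range(1, n_basis)
])  # one column per low-frequency drift regressor (added to the design matrix)
```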
19
Problem 3: Serial correlations
[Figure: $n \times n$ error covariance matrices, where n = number of scans. Left: i.i.d. errors (identity covariance; time points $t_1$, $t_2$ uncorrelated). Right: non-identity / non-independence, i.e. serially correlated errors.]
20
Problem 3: Serial correlations
[Figure: $n \times n$ error covariance matrix and its autocovariance function, where n = number of scans.]

1st-order autoregressive process, AR(1):

$e_t = a\, e_{t-1} + \varepsilon_t, \qquad \varepsilon_t \sim N(0, \sigma^2)$
21
Solution 3: Pre-whitening
1. Use an enhanced noise model with multiple error covariance components, i.e. $e \sim N(0, \sigma^2 V)$ instead of $e \sim N(0, \sigma^2 I)$.

2. Use the estimated serial correlation to specify a filter matrix W for whitening the data:

$Wy = WX\beta + We$

so that the whitened errors $We$ are i.i.d.
22
How do we define W?

• Enhanced noise model: $e \sim N(0, \sigma^2 V)$, with $Wy = WX\beta + We$.
• Remember the linear transform for Gaussians: $x \sim N(\mu, \sigma^2),\; y = ax \;\Rightarrow\; y \sim N(a\mu, a^2\sigma^2)$.
• So the whitened errors are distributed as $We \sim N(0, \sigma^2\, W V W^T)$.
• Choose W such that the error covariance becomes spherical: $W V W^T = I \;\Rightarrow\; W = V^{-1/2}$.
• Conclusion: W is a simple function of V, so how do we estimate V?
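A toy numpy/scipy sketch of pre-whitening under an AR(1) error model (the AR coefficient is fixed here purely for illustration; in SPM, V is estimated from the data, e.g. with ReML, rather than assumed):

```python
import numpy as np
from scipy.linalg import toeplitz, fractional_matrix_power

n, a = 100, 0.2                            # number of scans, assumed AR(1) coefficient
V = toeplitz(a ** np.arange(n))            # AR(1) correlation matrix: V[i, j] = a^|i-j|
W = fractional_matrix_power(V, -0.5)       # whitening matrix W = V^(-1/2)

# Whiten data and design, then fit by ordinary least squares:
#   W y = W X beta + W e, with W e approximately spherical (i.i.d.)
# y_w, X_w = W @ y, W @ X
```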
23
Linear Model

[Figure: from the fitted model we obtain the effects estimate ($\hat\beta$), the error estimate ($\hat\sigma^2$) and the resulting statistic.]

• The GLM includes all known experimental effects and confounds
• Convolution with a canonical HRF
• High-pass filtering to account for low-frequency drifts
• Estimation of multiple variance components (e.g. to account for serial correlations)
24
Contrasts in the GLM
•We can examine whether a single predictor is significant (compared to the baseline)
•We can also examine whether a single predictor is significantly greater than another predictor
25
Implementation of GLM in SPM
26
Contrasts

A contrast selects a specific effect of interest.
A contrast $c$ is a vector with one entry per regression coefficient.
$c^T\beta$ is a linear combination of the regression coefficients $\beta$.

$c^T = [1\; 0\; 0\; 0\; 0\; 0\; 0\; 0\; 0\; 0\; 0\; 0\; 0\; 0\; 0\; 0\; 0\; 0\; 0]$

$c^T = [1\; 0\; 0\; {-1}\; 0\; 0\; 0\; 0\; 0\; 0\; 0\; 0\; 0\; 0\; 0\; 0\; 0\; 0\; 0]$

$c^T\hat\beta \sim N\!\left(c^T\beta,\; \sigma^2\, c^T (X^TX)^{-1} c\right)$
27
Hypothesis Testing: Introduction

Is the mean of a measurement different from zero? Take the mean of several measurements, over many experiments.

$T = \dfrac{\hat\mu}{\hat\sigma / \sqrt{n}}$: the t-statistic, a ratio of effect vs. noise.

Null distribution of T: what distribution of T would we get for $\mu = 0$?

[Figure: distributions of measurements from two experiments, Exp A ($\mu_1, \sigma_1$) and Exp B ($\mu_2, \sigma_2$), and the null distribution of T centred on 0 with spread $\sigma_{1,2}/\sqrt{n}$.]
28
Hypothesis Testing

Null Hypothesis H0: typically what we want to disprove (no effect). The Alternative Hypothesis HA expresses the outcome of interest.

To test a hypothesis, we construct a "test statistic".

Test Statistic T: the test statistic summarises evidence about H0. Typically, the test statistic is small in magnitude when H0 is true and large when it is false. We need to know the distribution of T under the null hypothesis: the null distribution of T.
29
Hypothesis Testing

p-value: a p-value summarises evidence against H0. It is the chance of observing a value more extreme than t under the null hypothesis:

$p = p(T > t \mid H_0)$

Significance level α: the acceptable false positive rate. The threshold $u_\alpha$ controls the false positive rate:

$\alpha = p(T > u_\alpha \mid H_0)$

[Figure: null distribution of T with threshold $u_\alpha$ and the observed statistic t; the p-value is the tail area beyond t.]

Conclusion about the hypothesis: we reject the null hypothesis in favour of the alternative hypothesis if $t > u_\alpha$.
30
T-test: one-dimensional contrasts, SPM{t}

$c^T = [1\; 0\; 0\; 0\; 0\; 0\; 0\; 0]$

Question: is the effect of interest greater than zero, i.e. is the amplitude $\beta_1 = c^T\beta > 0$? (regressors $\beta_1, \beta_2, \beta_3, \beta_4, \beta_5, \dots$)

Null hypothesis: $H_0: c^T\beta = 0$

Test statistic (contrast of estimated parameters divided by its variance estimate):

$T = \dfrac{c^T\hat\beta}{\sqrt{\widehat{\operatorname{var}}(c^T\hat\beta)}} = \dfrac{c^T\hat\beta}{\sqrt{\hat\sigma^2\, c^T (X^TX)^{-1} c}} \sim t_{N-p}$
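A minimal numpy/scipy sketch of this t-test for a single voxel (X, y and c are assumed to be given; the helper name is illustrative):

```python
import numpy as np
from scipy import stats

def t_contrast(X, y, c):
    """One-dimensional t-contrast c'beta for a single-voxel time series y."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y                      # OLS parameter estimates
    resid = y - X @ beta
    dof = len(y) - np.linalg.matrix_rank(X)       # degrees of freedom N - p
    sigma2 = resid @ resid / dof                  # residual variance estimate
    t = (c @ beta) / np.sqrt(sigma2 * (c @ XtX_inv @ c))
    p_value = stats.t.sf(t, df=dof)               # one-sided p-value
    return t, p_value
```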
31
T-contrast in SPM

For a given contrast c:

• beta_???? images: $\hat\beta = (X^TX)^{-1}X^Ty$
• con_???? image: $c^T\hat\beta$
• ResMS image: $\hat\sigma^2 = \dfrac{\hat\varepsilon^T\hat\varepsilon}{N-p}$
• spmT_???? image: SPM{t}
32
T-test: a simple example

Q: activation during listening? Passive word listening versus rest.
Null hypothesis: $\beta_1 = 0$

$t = \dfrac{c^T\hat\beta}{\sqrt{\operatorname{var}(c^T\hat\beta)}}$

SPM results, threshold T = 3.2057 {p < 0.001}, voxel-level:

    T      (Z≡)    p uncorrected    mm  mm  mm
  13.94    Inf        0.000        -63 -27  15
  12.04    Inf        0.000        -48 -33  12
  11.82    Inf        0.000        -66 -21   6
  13.72    Inf        0.000         57 -21  12
  12.29    Inf        0.000         63 -12  -3
   9.89    7.83       0.000         57 -39   6
   7.39    6.36       0.000         36 -30 -15
   6.84    5.99       0.000         51   0  48
   6.36    5.65       0.000        -63 -54  -3
33
T-test: summary
T-test is a signal-to-noise measure (ratio of estimate to standard deviation of estimate).
T-contrasts are simple combinations of the betas; the T-statistic does not depend on the scaling of the regressors or the scaling of the contrast.
Null hypothesis $H_0: c^T\beta = 0$ vs. alternative hypothesis $H_A: c^T\beta > 0$.
34
Scaling issue

$\hat\beta = (X^TX)^{-1}X^Ty, \qquad T = \dfrac{c^T\hat\beta}{\sqrt{\widehat{\operatorname{var}}(c^T\hat\beta)}} = \dfrac{c^T\hat\beta}{\sqrt{\hat\sigma^2\, c^T (X^TX)^{-1} c}}$

The T-statistic does not depend on the scaling of the regressors, and it does not depend on the scaling of the contrast. The contrast value $c^T\hat\beta$ itself, however, does depend on the scaling of the contrast, so be careful about the interpretation of the contrasts themselves (e.g., for a second-level analysis): sum ≠ average.

Example: subject 1 with contrast $[1\; 1\; 1\; 1]$ needs $c^T\hat\beta / 4$ to give an average, whereas subject 5 with contrast $[1\; 1\; 1]$ needs $c^T\hat\beta / 3$.
35
F-test: the extra-sum-of-squares principle

Model comparison: full model $X = [X_1\; X_0]$ or reduced model $X_0$?

Null Hypothesis $H_0$: the true model is $X_0$ (the reduced model).

Test statistic: ratio of explained variability to unexplained variability (error), with

$\nu_1 = \operatorname{rank}(X) - \operatorname{rank}(X_0), \qquad \nu_2 = N - \operatorname{rank}(X)$

$F = \dfrac{(\mathrm{RSS}_0 - \mathrm{RSS}) / \nu_1}{\mathrm{RSS} / \nu_2} \sim F_{\nu_1, \nu_2}$

where $\mathrm{RSS} = \hat\varepsilon_{\mathrm{full}}^T \hat\varepsilon_{\mathrm{full}}$ (full model), $\mathrm{RSS}_0 = \hat\varepsilon_{\mathrm{reduced}}^T \hat\varepsilon_{\mathrm{reduced}}$ (reduced model), and $\mathrm{ESS} = \mathrm{RSS}_0 - \mathrm{RSS}$ is the extra sum of squares.
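A small numpy/scipy sketch of this model comparison (X is assumed to contain the columns of X0 plus the columns of interest X1; the function name is illustrative):

```python
import numpy as np
from scipy import stats

def extra_sum_of_squares_F(X, X0, y):
    """F-test comparing a full design X against a nested reduced design X0."""
    def rss(design):
        beta = np.linalg.pinv(design) @ y
        resid = y - design @ beta
        return resid @ resid

    rss_full, rss_reduced = rss(X), rss(X0)
    nu1 = np.linalg.matrix_rank(X) - np.linalg.matrix_rank(X0)
    nu2 = len(y) - np.linalg.matrix_rank(X)
    F = ((rss_reduced - rss_full) / nu1) / (rss_full / nu2)
    return F, stats.f.sf(F, nu1, nu2)     # F statistic and its p-value
```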
36
F-test: multidimensional contrasts, SPM{F}

Test multiple linear hypotheses: full model ($X_0$ plus $X_1$, the regressors for $\beta_{3-8}$)?

Null Hypothesis $H_0$: $\beta_3 = \beta_4 = \beta_5 = \beta_6 = \beta_7 = \beta_8 = 0$, i.e. $c^T\beta = 0$. Is any of $\beta_{3-8}$ not equal to 0?

$c^T = \begin{pmatrix} 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \end{pmatrix}$
37
F-contrast in SPM

• beta_???? images: $\hat\beta = (X^TX)^{-1}X^Ty$
• ResMS image: $\hat\sigma^2 = \dfrac{\hat\varepsilon^T\hat\varepsilon}{N-p}$
• ess_???? images: $(\mathrm{RSS}_0 - \mathrm{RSS})$
• spmF_???? images: SPM{F}
38
F-test example: movement-related effects

[Figure: two design matrices with the corresponding F-contrast(s) used to test for movement-related effects.]
39
F-test: summary
F-tests can be viewed as testing for the additional variance explained by a larger model with respect to a simpler (nested) model: model comparison.

F tests a weighted sum of squares of one or several combinations of the regression coefficients $\beta$.

In practice, we don't have to explicitly separate X into $[X_1\; X_0]$ thanks to multidimensional contrasts.

In testing a uni-dimensional contrast with an F-test, for example $\beta_1 - \beta_2$, the result will be the same as testing $\beta_2 - \beta_1$. It will be exactly the square of the t-test, testing for both positive and negative effects.

Hypotheses:
Null hypothesis $H_0$: $\beta_1 = \beta_2 = \beta_3 = 0$
Alternative hypothesis $H_A$: at least one $\beta_k \neq 0$

Example F-contrast matrix:
$c^T = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}$
40
Orthogonal regressors

[Figure: Venn diagram of the variability in Y described by two orthogonal regressors; testing for either regressor captures only that regressor's own share of the variability.]
41
Correlated regressors

[Figures, shown over several slides: Venn diagrams of the variability in Y described by two correlated regressors, including their shared variance; successive slides highlight which portion of the variability is attributed when testing for one regressor, for the other, or for both (and/or).]
48
Design orthogonality
For each pair of columns of the design matrix, the orthogonality matrix depicts the magnitude of the cosine of the angle between them, with the range 0 to 1 mapped from white to black.
If both vectors have zero mean then the cosine of the angle between the vectors is the same as the correlation between the two variates.
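A small numpy sketch of how such an orthogonality display can be computed (the function name is illustrative; SPM produces this display with its own design-review tools):

```python
import numpy as np

def orthogonality_matrix(X):
    """Absolute cosine of the angle between every pair of design-matrix columns.

    0 = orthogonal (displayed white), 1 = collinear (displayed black).
    If the columns are mean-centred, this equals the absolute correlation.
    """
    norms = np.linalg.norm(X, axis=0)
    cosines = (X.T @ X) / np.outer(norms, norms)
    return np.abs(cosines)
```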
49
Correlated regressors: summary

• We implicitly test for an additional effect only. When testing for the first regressor, we are effectively removing the part of the signal that can be accounted for by the second regressor: implicit orthogonalisation.
• Orthogonalisation = decorrelation. The parameters and the test on the non-modified regressor change. This rarely solves the problem, as it requires assumptions about which regressor the common variance should be uniquely attributed to. Instead, change the regressors (i.e. the design), e.g. use factorial designs, and use F-tests to assess overall significance.
• The original regressors may not matter: it is the contrast you are testing that should be as decorrelated as possible from the rest of the design matrix.
[Figure: orthogonalisation of $x_2$ with respect to $x_1$: $\hat{x}_2 = x_2 - (x_1 \cdot x_2)\, x_1$.]
50
Design efficiency

The aim is to minimize the standard error of a t-contrast (i.e. the denominator of the t-statistic):

$T = \dfrac{c^T\hat\beta}{\sqrt{\widehat{\operatorname{var}}(c^T\hat\beta)}}, \qquad \widehat{\operatorname{var}}(c^T\hat\beta) = \hat\sigma^2\, c^T (X^TX)^{-1} c$

This is equivalent to maximizing the efficiency e:

$e(\hat\sigma^2, c, X) = \left(\hat\sigma^2\, c^T (X^TX)^{-1} c\right)^{-1}$

($\hat\sigma^2$ is the noise variance and $c^T (X^TX)^{-1} c$ the design variance). If we assume that the noise variance is independent of the specific design:

$e(c, X) = \left(c^T (X^TX)^{-1} c\right)^{-1}$

This is a relative measure: all we can really say is that one design is more efficient than another (for a given contrast).
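A minimal numpy sketch of this relative efficiency measure (the function name is illustrative):

```python
import numpy as np

def design_efficiency(X, c):
    """Relative efficiency of contrast c for design matrix X (noise variance ignored)."""
    return 1.0 / (c @ np.linalg.inv(X.T @ X) @ c)
```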
51
Design efficiency

$X^TX = \begin{pmatrix} 1 & -0.9 \\ -0.9 & 1 \end{pmatrix}$

Contrasts of interest: A, B, A+B, A-B.
High correlation between regressors leads to low sensitivity to each regressor alone.
We can still estimate efficiently the difference between them.
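As an illustration (the numbers are not from the slide), plugging this $X^TX$ into the efficiency formula above gives roughly e ≈ 0.19 for A or B alone, e ≈ 0.05 for A+B, and e ≈ 0.95 for A-B:

```python
import numpy as np

XtX = np.array([[1.0, -0.9], [-0.9, 1.0]])
XtX_inv = np.linalg.inv(XtX)
for label, c in [("A", [1, 0]), ("A+B", [1, 1]), ("A-B", [1, -1])]:
    c = np.asarray(c, dtype=float)
    print(label, 1.0 / (c @ XtX_inv @ c))   # ~0.19, ~0.05, ~0.95
```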
52
Types of error

                        Reality
Decision         H0 true                 H1 true
accept H0        True negative (TN)      False negative (FN)
reject H0        False positive (FP)     True positive (TP)

specificity: 1 - α = TN / (TN + FP) = proportion of actual negatives which are correctly identified
sensitivity (power): 1 - β = TP / (TP + FN) = proportion of actual positives which are correctly identified
53
Correction for Multiple Comparisons
•With conventional probability levels (e.g., p < .05) and a huge number of comparisons (e.g., 64 x 64 x 12 = 49,152), a lot of voxels will be significant purely by chance
•e.g., .05 * 49,152 = 2458 voxels significant due to chance
•How can we avoid this? We need to correct for multiple comparisons.
54
(1) Bonferroni correction
• Divide the desired p value by the number of comparisons
• Example: desired p value p < .05; number of voxels 50,000; required p value p < .05 / 50,000, i.e. p < .000001
• quite conservative
• can use less stringent values, e.g., Brain Voyager can use the number of voxels in the cortical surface
• the luxury of a localizer task is that it allows you to correct for only those voxels you are interested in (i.e., the voxels that fall within your ROI)
Source: Jody Culham’s web slides
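A one-line sketch of the arithmetic above (values taken from the example on this slide):

```python
alpha, n_voxels = 0.05, 50_000
bonferroni_threshold = alpha / n_voxels   # p < 1e-06 per voxel
```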
55
(2) Cluster correction
•falsely activated voxels should be randomly dispersed
•set minimum cluster size to be large enough to make it unlikely that a cluster of that size would occur by chance
•assumes that data from adjacent voxels are uncorrelated (not true)
56
(3) Test-retest reliability
•Perform statistical tests on each half of the data
•The probability of a given voxel appearing in both purely by chance is the square of the p value used in each half, e.g., .001 x .001 = .000001
•Alternatively, use the first half to select an ROI and evaluate your hypothesis in the second half.
57
(4) Consistency between subjects
•Though seldom discussed, consistency between subjects provides further encouragement that activation is not spurious. Present group and individual data.
(5) Screen saver scan
•You can test how many voxels light up when nothing is actually happening.
58
There is a multiple testing problem ('voxel' or 'blob' perspective): we want to detect an effect of unknown extent and location.

More corrections are needed as the search volume increases and as independence between tests decreases ('volume' resolution, i.e. voxels per volume).

[Figure: correction options organised by search region (ROI, voxel, field) and by test criterion (height, extent), including FWE and FDR corrections.]
59
Book (must read): Statistical Parametric Mapping: The Analysis of Functional Brain Images. Elsevier, 2007.
60
Home Work
•http://www.fil.ion.ucl.ac.uk/spm/data/auditory/
•http://www.fil.ion.ucl.ac.uk/spm/doc/spm8_manual.pdf#Chap:data:auditory
•Walk through SPM manual chapter 28
•Sample data can be found in the general folder
•If you have any question, ask me or Danielle.
•Bonus for those who finish the homework by next training session.