detection of conversion from mild cognitive impairment to ... · trajectory of the brain anatomy...

ORIGINAL RESEARCHpublished: 24 February 2017

doi: 10.3389/fninf.2017.00016

Frontiers in Neuroinformatics | www.frontiersin.org 1 February 2017 | Volume 11 | Article 16

Edited by:

Pedro Antonio Valdes-Sosa,

Joint China Cuba Lab for Frontiers

Reaearch in Translational

Neurotechnology, Cuba

Reviewed by:

Hewei Cheng,

Chongqing University of Posts and

Telecommunications, China

Felix Carbonell,

Biospective Inc., Canada

*Correspondence:

Marius Staring

[email protected]

†Data used in preparation of this

article were obtained from the

Alzheimer’s Disease Neuroimaging

Initiative (ADNI) database

(adni.loni.usc.edu). As such, the

investigators within the ADNI

contributed to the design and

implementation of ADNI and/or

provided data but did not participate

in analysis or writing of this report. A

complete listing of ADNI investigators

can be found at: http://adni.loni.usc.

edu/wp-content/uploads/

how_to_apply/

ADNI_Acknowledgement_List.pdf

Received: 22 September 2016

Accepted: 08 February 2017

Published: 24 February 2017

Citation:

Sun Z, van de Giessen M, Lelieveldt

BPF, Staring M (2017) Detection of

Conversion from Mild Cognitive

Impairment to Alzheimer’s Disease

Using Longitudinal Brain MRI.

Front. Neuroinform. 11:16.

doi: 10.3389/fninf.2017.00016

Detection of Conversion from MildCognitive Impairment to Alzheimer’sDisease Using Longitudinal BrainMRIZhuo Sun 1, Martijn van de Giessen 1, Boudewijn P. F. Lelieveldt 1, 2 and Marius Staring1*

for the Alzheimer’s Disease NeuroImaging Initiative †

1Division of Image Processing, Department of Radiology, Leiden University Medical Center, Leiden, Netherlands,2Department of Intelligent Systems, Delft University of Technology, Delft, Netherlands

Mild Cognitive Impairment (MCI) is an intermediate stage between healthy and

Alzheimer’s disease (AD). To enable early intervention it is important to identify the MCI

subjects that will convert to AD in an early stage. In this paper, we provide a new

method to distinguish between MCI patients that either convert to Alzheimer’s Disease

(MCIc) or remain stable (MCIs), using only longitudinal T1-weighted MRI. Currently, most

longitudinal studies focus on volumetric comparison of a few anatomical structures,

thereby ignoring more detailed development inside and outside those structures. In this

study we propose to exploit the anatomical development within the entire brain, as

found by a non-rigid registration approach. Specifically, this anatomical development is

represented by the Stationary Velocity Field (SVF) from registration between the baseline

and follow-up images. To make the SVFs comparable among subjects, we use the

parallel transport method to align them in a common space. The normalized SVF together

with derived features are then used to distinguish between MCIc and MCIs subjects. This

novel feature space is reduced using a Kernel Principal Component Analysis method,

and a linear support vector machine is used as a classifier. Extensive comparative

experiments are performed to inspect the influence of several aspects of our method on

classification performance, specifically the feature choice, the smoothing parameter in

the registration and the use of dimensionality reduction. The optimal result from a 10-fold

cross-validation using 36 month follow-up data shows competitive results: accuracy

92%, sensitivity 95%, specificity 90%, and AUC 94%. Based on the same dataset, the

proposed approach outperforms two alternative ones that either depends on the baseline

image only, or uses longitudinal information from larger brain areas. Good results were

also obtained when scans at 6, 12, or 24 months were used for training the classifier.

Besides the classification power, the proposed method can quantitatively compare brain

regions that have a significant difference in development between the MCIc and MCIs

groups.

Keywords: Alzheimer’s disease, Mild Cognitive Impairment (MCI), conversion, MRI, Stationary Velocity Field (SVF),

non-rigid Registration, parallel Transport, SVM classification

http://www.frontiersin.org/Neuroinformatics

http://www.frontiersin.org/Neuroinformatics/editorialboard




https://doi.org/10.3389/fninf.2017.00016

http://crossmark.crossref.org/dialog/?doi=10.3389/fninf.2017.00016&domain=pdf&date_stamp=2017-02-24


http://www.frontiersin.org

http://www.frontiersin.org/Neuroinformatics/archive

https://creativecommons.org/licenses/by/4.0/

mailto:[email protected]

http://adni.loni.usc.edu

http://adni.loni.usc.edu/wp-content/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf




https://doi.org/10.3389/fninf.2017.00016

http://journal.frontiersin.org/article/10.3389/fninf.2017.00016/abstract

http://loop.frontiersin.org/people/379607/overview



Sun et al. Alzheimer’s Disease Conversion Detection

1. INTRODUCTION

Alzheimer’s disease (AD), one of the most common cases ofdementia, is an age related degenerative brain disease. It isusually diagnosed in people over 65 years old (Alzheimer’sAssociation, 2014). Brookmeyer et al. (2007) pointed out thatthe current number of AD patients worldwide will increasefourfold by the year 2050, from 26.6 million to above 100million. Early detection and treatment is necessary to slow downthe disease progress and decrease the societal cost of AD. Ifearly treatment could delay the disease onset or slow downthe disease progression by 1 year, this will yield almost ninemillion fewer AD patients in the world by 2050 (Brookmeyeret al., 2007). Different kinds of measurements and biomarkershave been used in early detection and prediction of AD, e.g.,structural brain MRI (Frisoni et al., 2010), metabolic brainalterations measured by fluorodeoxyglucose positron emissiontomography (FDG-PET) (De Santi et al., 2001), and pathologicalamyloid depositions measured from cerebrospinal fluid (CSF)(Leon et al., 2007; Mattsson et al., 2009). Among all thesemeasurements, Magnetic Resonance Imaging (MRI) plays anincreasingly important role in early detection of Alzheimer’sdisease because of its non-invasiveness, availability, and highsensitivity to change (Frisoni et al., 2010). Therefore, it iscommonly used as part of the clinical assessment for thediagnosis of AD.

One of the key questions in Alzheimer’s disease research is tounderstand the disease progression over a long period of time,where it is desirable to find the trend before manifestation ofclinical symptoms. It is known that Mild Cognitive Impairment(MCI) is an intermediate stage between healthy and diseased,and possibly predicts the onset of Alzheimer’s disease. However,not all MCI subjects develop AD. Based on whether the MCIpatient will convert to AD during a study, the MCI group istypically divided into a stable group (MCIs) and a convertorgroup (MCIc). A meta-analysis (Mitchell and Shiri-Feshki, 2008)on 15 studies showed that the total number of patients who hadprogressed to dementia in studies lasting less than 5 years was27.4%, while the total number of patients who had progressedto dementia by the end of the studies lasting up to 10 yearswas 31.4%. The conversion usually happens within the first 3years after being diagnosed as MCI and the conversion rate dropsdramatically in later years. Such a finding indicates that the MCIsgroup not simply takes longer to convert. Therefore, to early andsensitively detect AD, it is important to distinguish between casesthat convert fromMCI to AD and the cases that remain stable.

In the last 10 years, many cross-sectional structural MRI-based methods have been proposed to automatically distinguishbetween healthy controls (HC) and AD patients (Fan et al.,2005, 2007, 2008b; Davatzikos et al., 2008; Klöppel et al., 2008;Cuingnet et al., 2013), and some of them report a classificationaccuracy as high as 90%. Although such methods have diagnosedAlzheimer’s disease, it is still too late for treatment, since mostdrugs approved by the U.S. Food and Drug Administration(FDA) are more likely to have a significant impact in the earlystages of the disease (Crismon, 1994; Schneider et al., 2006;Kozauer and Katz, 2013).

Davatzikos et al. (2009, 2011) extended cross-sectional image-based classification to distinguish between MCIc and MCIssubjects and directly use healthy and AD brain images to traina classifier or regressor, and then use an AD-likeness score todistinguish between MCIc and MCIs (Davatzikos et al., 2009,2011). These methods can minimize the empirical error thatseparates AD and HC subjects, but do not model the dynamicinformation of brain changes from healthy to AD. Therefore,such methods may fail to distinguish between MCIc and MCIssubjects, since disease progression over time may be moreindicative than a static assessment at a fixed point in time.

Recently, methods were developed that learn a task-specificclassifier to detect MCI convertors using cross-sectional datasets,which contain only one scan for each subject belonging to eitherthe MCIc or the MCIs group (Chupin et al., 2009; Misra et al.,2009; Querbes et al., 2009; Cuingnet et al., 2011; Koikkalainenet al., 2011; Westman et al., 2011; Wolz et al., 2011; Cho et al.,2012; Eskildsen et al., 2013; Guerrero et al., 2014). Among thesemethods, different kinds of features are extracted from the imageto train a discriminative model. Popular features include thehippocampus volume (Chupin et al., 2009), cortical thickness(Querbes et al., 2009; Westman et al., 2011; Wolz et al., 2011;Cho et al., 2012; Eskildsen et al., 2013), voxel-basedmorphometry(VBM) (Misra et al., 2009; Davatzikos et al., 2011) and tensor-based morphometry (TBM) (Koikkalainen et al., 2011; Wolzet al., 2011; Eskildsen et al., 2013). These methods also donot to take the dynamic information of brain changes intoconsideration. A second issue is the time at which the scan istaken, which is somewhat random and frequently in a later stageof the disease.

Since anatomical development in the brain is due to bothnormal aging as well as Alzheimer’s disease progression (Lorenziet al., 2012), and since there is individual variation in brainanatomy, it is hard to distinguish between MCIc and MCIsfrom a single scan. Also, AD can be treated as an acceleratedaging process (Davatzikos et al., 2011), i.e., subjects with littleAD-related and more age-related development may have similaranatomy as AD patients. Older subject are more similar to AD,and the subject’s age at the baseline scan may introduce bias inthe decision.

To avoid such scan time bias, some methods have started toinclude longitudinal image information. Zhang and Shen (2012)and Liu et al. (2013) use longitudinal structure-wise volumetricmeasurements as features for a sparse multi-task classifier.However, these methods summarize the anatomical developmentof the whole brain into a few scalar measurements and losedetailed anatomical information. Also, they do not model howthe brain anatomy develops over time for the MCIc andMCIs groups. Similarly, Lorenzi et al. (2015a) summarizes thelongitudinal change of each structure as a regional flux computedfrom the longitudinal deformation between the baseline andfollow-up images. Fiot et al. (2012, 2014) directly use thelongitudinal deformation of the hippocampus to distinguishbetween MCIs and MCIc, but without considering other brainstructures.

In this study, we propose a new method that encodesthe subject’s longitudinal information by pairwise non-rigid






registration between its baseline and follow-up images, andaligns this longitudinal information into a common space forclassification. Our method can both distinguish between MCIcand MCIs subjects, as well as compute the mean developmentaltrajectory of the brain anatomy over time, for each group. Inour approach, the development of the brain is represented bythe anatomical correspondence between the baseline and follow-up scans from the same subject. In order to make the anatomyof different subjects comparable, we use the Schild’s Laddermethod (Lorenzi and Pennec, 2013) to transport the anatomicalcorrespondence to a common template. A Support VectorMachine (SVM) classifier is learned to separateMCIc fromMCIs.Our method can additionally be used for groupwise analysis, andcompute the significant regions that develop differently betweenthe MCIc and the MCIs groups. To investigate the influence ofdifferent parameters on the final result, we compare the effectof parallel transport, template selection, registration parametersand follow-up time on the final classification result. Our methodthus features dense as well as longitudinal information, andcan be used for classification as well as in studying groupwisedifferences.

The remainder of this paper is structured as follows: wefirst describe the dataset selection, image acquisition andpreprocessing in Section 2. In Section 3, we present the proposedmethod in detail. The experiments and results are shown inSection 4. We discuss the results and compare the proposedmethod with state-of-the-art methods in Section 5. Finally, theconclusions are given in Section 6.

2. MATERIALS

2.1. SubjectsAll data used in this paper was obtained from the Alzheimer’sDisease Neuroimaging Initiative (ADNI) (http://adni.loni.usc.edu). The ADNI was launched in 2003 by the NationalInstitute on Aging (NIA), the National Institute of BiomedicalImaging and Bioengineering (NIBIB), the Food and DrugAdministration (FDA), private pharmaceutical companies andnon-profit organizations, as a $60 million, 5-year public-privatepartnership. The primary goal of ADNI has been to test whetherserial MRI, PET, other biological markers, and clinical andneuropsychological assessment can be combined to measure theprogression of MCI and early AD. Determination of sensitiveand specific markers of very early AD progression is intendedto aid researchers and clinicians to develop new treatments andmonitor their effectiveness, as well as lessen the time and cost ofclinical trials.

The principal investigator of this initiative is Michael W.Weiner, MD, VA Medical Center and University of California,San Francisco. ADNI is the result of efforts of many co-investigators from a broad range of academic institutions andprivate corporations, and subjects have been recruited from over50 sites across the U.S. and Canada. The initial goal of ADNI wasto recruit 800 subjects but ADNI has been followed by ADNI-GOand ADNI-2.

We select and define MCI subjects according to the labelthat is available for each subject for most of the visits: MCI,

MCI to dementia, or dementia. According to the ADNIMergefile (https://adni.loni.usc.edu/wp-content/uploads/2012/08/instruction-ADNIMERGE-packages.pdf)1, there are 837subjects available that have either the label “MCI” or the label“MCI to dementia” at baseline. From those subjects we selectonly those that have a baseline scan, together with follow-upscans at 6, 12, 24, and 36 months. We exclude subjects that havea non-monotone diagnosis, for example going from MCI todementia to MCI again, resulting in 143 remaining subjects. Wefurther exclude subjects with only “MCI” or “MCI to dementia”labels where the last visit has no diagnosis available, as thesesubjects are considered unclear to belong to either the MCIsor the MCIc group. This finally results in 110 remaining MCIsubjects. A subject is defined as stable if no visit is labeled asdementia, and as a converter if the label dementia is present. Wethen have 43 MCIs and 67 MCIc subjects.

The demographic characteristics of the selected studypopulation are summarized inTable 1. The diagnosis of each visitis summarized in Table 2. We can see that some MCIc subjectsconvert to AD later than 36 months. Although the maximumconsidered time interval in our study is 36 months, we still treatsuch subjects as MCIc. We can also see that no subject convertedto AD in the first 6 months after the baseline visit.

2.2. Acquisition and PreprocessingAll selected scans are T1-weighted 1.5T MR images acquiredwith machines from Philips, Siemens or GE. Acquisitionswere made in different clinical centers across North Americaaccording to the ADNI acquisition protocol (http://adni.loni.usc.edu/methods/mri-analysis/mri-acquisition/). To enhance thestandardization among scans acquired from different clinicalsites and platforms, pre-processing and post-precessing ismade to correct certain image artifacts (Jack et al., 2008). Inour study, all the images downloaded from ADNI are fromthe MP-RAGE sequence and pre-processed by ADNI. In theADNI pre-processing and image correction pipeline, imagesfrom the Philips machine are intensity corrected by the N3method (Sled et al., 1998), while images from Siemens or GEmachines are grad-warped, followed by B1 bias field correctionand N3 intensity non-uniformity correction. All images werepreprocessed through the same fully automatic pipeline, asdescribed by Coupé et al. (2015). Inside this pipeline, theinhomogeneities are corrected by N3, the brain is extractedby BEaST (Eskildsen et al., 2012), and the intensity is linearlynormalized to the MNI template intensity. After preprocessingeach image independently using these identical steps, all follow-up images are registered to its baseline image using a similaritytransform (rigid plus isotropic scaling) with elastix (Kleinet al., 2010).

3. METHODS

In this section, we present the proposed method and brieflydescribe how to perform a group analysis on the MCIc and MCIs

1The latest version of this file was updated in 2012, so it only contains part of the

data of the current ADNI dateset.




https://adni.loni.usc.edu/wp- content/uploads/2012/08/instruction-ADNIMERGE- packages.pdf

https://adni.loni.usc.edu/wp- content/uploads/2012/08/instruction-ADNIMERGE- packages.pdf

http://adni.loni.usc.edu/methods/mri-analysis/mri-acquisition/

http://adni.loni.usc.edu/methods/mri-analysis/mri-acquisition/





TABLE 1 | Demographic characteristics of the studied longitudinal dataset (from ADNI).

Group Number Baseline age Gender Baseline MMSEa Follow-up MMSE

MCIc 67 74.4± 6.9 [55.1− 87.7] 43M / 24F 26.6± 1.7 [24− 30] 22.4± 4.5 [10− 30]

MCIs 43 75.5± 6.8 [57.8− 86.4] 35M / 8F 27.7± 1.9 [24− 30] 27.9± 2.1 [22− 30]

aMMSE means Mini-Mental State Examination.

TABLE 2 | Diagnosis distribution at each visit and MMSE of each group at each visit (in term of months).

Baseline 6 12 24 36 Last diagnosed visit

MCI 110 104 91 63 56 43

MCI to dementia 0 6 13 13 8 0

Dementia 0 0 6 34 46 67

MCIs MMSE 27.7± 1.9 27.9± 2.1 28.2± 1.8 27.7± 2.3 28.0± 2.1

MCIc MMSE 26.6± 1.7 25.4± 2.4 25.6± 2.4 23.6± 4.0 22.4± 4.5

groups. The proposed pipeline is summarized in Figure 1. First,see Section 3.1, we describe how to represent the anatomicaldevelopment between the baseline and follow-up images usingLogDemons non-rigid registration (Vercauteren et al., 2009), andhow to normalize the anatomical development in a commontemplate space by building a Schild’s Ladder on the imagemanifold. Second, see Section 3.2, we describe the features thatdescribe the anatomical development, a dimensionality reductionstep that uses kernel principle component analysis (KPCA),and the Support Vector Machine (SVM) classification method.Finally, group-wise analysis between the MCIc and the MCIsgroup is detailed in Section 3.3.

3.1. Normalization of AnatomicalDevelopment3.1.1. Symmetric LogDemons RegistrationIn this work, the symmetric LogDemons method (Vercauterenet al., 2009) is used to compute non-rigid diffeomorphictransformations between baseline and follow-up scans toestimate brain development, and also between different subjectsfor normalization by parallel transport. Originating from theDemons method (Thirion, 1998), symmetric LogDemons usesthe stationary velocity field (SVF) v to parameterize thediffeomorphic deformation ϕ by the exponential map ϕ =

Exp(v), which is used to align the moving image F with the fixedimageM. The following cost function is optimized:

v∗ = argminv

‖(F −M ◦ Exp(v))‖2

+ ‖(M − F ◦ Exp(−v))‖2 + Reg(v; σ ),(1)

where the first two terms measure the dissimilarity of F andM after forward and backward deformation, and the third(Reg(v; σ )) regularizes the SVF by smoothing it with a Gaussiankernel with standard deviation σ . The resulting stationaryvelocity v∗ is the optimal solution that minimizes the above costfunction. According to Cachier et al. (2003), this problem can bedecomposed into an alternativeminimization problem, for whicha more efficient computation is proposed in (Vercauteren et al.,

2009). As proven in Lorenzi and Pennec (2013), the deformationtrajectory can be represented as a one-parameter geodesic path,controlled by a virtual time parameter t:

ϕt(v) = Exp(v, t) = Exp(v× t). (2)

For t = 1 we obtain the deformation between fixed and movingimage.

3.1.2. Parallel Transport by Schild’s LadderIt is not possible to directly compare SVFs from different subjects,since they have different coordinate systems. It is thereforenecessary to normalize the SVFs to a common template space,for which we employ the parallel transport method (do Carmo,1992). In our approach, we use the Schild’s Ladder paralleltransport method (Lorenzi and Pennec, 2013) to deform thesubject space follow-up image I1 to its corresponding imageT1 in the template space, and compute the normalized SVFby symmetric LogDemons registration from T0 to T1. TheSchild’s Ladder method is summarized by the following steps andillustrated in Figure 2.

1. Compute the diffeomorphic deformation ϕT0→I1 to deformthe template image T0 to the follow-up image I1 in the subjectspace, and the corresponding SVF vT0→I1 .

2. Define the half-space image P = T0 ◦ ϕT0→I10.5 .

3. Compute the diffeomorphic deformation ϕI0→P that deformsthe subject space baseline image I0 to image P.

4. Define the template space follow-up image T1 as the double

deformed subject baseline image as T1 = I0 ◦ ϕI0→P2 .

5. Compute the SVF v that deforms T0 to T1, which transportsthe subject’s anatomical correspondence into the templatespace.

Lorenzi and Pennec (2013) proposed an alternativeimplementation of Schild’s Ladder using the Baker-Campbell-Hausdorf (BCH) approximation to work directly on the velocityfield. In this implementation, different subjects tend to havedifferent SVF amplitude, need a different number of BCHapproximation steps and possibly use multiple ladders. In this






FIGURE 1 | Illustration of the proposed pipeline for longitudinal brain development analysis.

work, We use a computationally attractive approach that omitsthe BCH approximation and requires only a single ladder pernormalization. However, the method proposed by Lorenzi andPennec (2013) could also be used in the proposed framework.In our approach, the Schild’s Ladder is applied to each subject’sbaseline and follow-up image pair independently.

3.1.3. MCI Template SelectionIn our study, a template image is needed for the transportationof the SVFs. The template needs to be selected in an adaptiveand unbiased way to fit the study population and minimize theregistration error. In our approach, we use the multidimensionalscaling method to select the least biased template from thebaseline images of our study population, according to Parket al. (2005), as shown in Figure 3. The selected image is anapproximation of the group mean image in the given imagemanifold.

To learn the mean of a population, we first compute thepairwise geodesic distances between two images i and j as di,j =|vi,j|

22 =

∑q∈I

〈vi,j(q), vi,j(q)〉, where vi,j is the SVF that aligns

these two images and 〈·, ·〉 denotes the inner product. Themultidimensional scaling (MDS) method is used to map eachhigh-dimensional geodesic distance on the image manifold into

a 2D Euclidean space. The optimal MCI template image is thenselected as the nearest neighbor of the arithmetic average in the2D space, shown as the green dot in Figure 3. Compared toother commonly used unbiased template construction methods(Joshi et al., 2004; Lorenzen et al., 2005), this method is easyto implement and very suitable for parallel computing. Toinvestigate the influence of a pre-defined general populationtemplate and a study-specific template, in the experiments wecompare the MDS-based template with the commonly used MNItemplate.

3.2. ClassificationIn our study, we use the linear support vector machine (SVM) asthe classificationmethod. Given the training set {fi, yi}, where fi isthe feature vector and yi is the label of subject i (MCIc or MCIs),we compute the optimal linear SVMmodel {w, b} by minimizingthe cost function:

{w, b} = argminw,b

‖w‖22 + c

N∑

i= 1

max (0, 1− yi(〈w, fi〉 + b)), (3)

where {w, b} are the learned parameters of the linear model. Thefeature(s) used in the SVM could be the normalized SVF itself






FIGURE 2 | Using Schild’s Ladder to transport the SVF from subject space to a common reference space.

FIGURE 3 | Illustration of MDS-based template selection. Step 1: Compute a matrix containing the pairwise distances between globally aligned baseline images;

2: Map images into a low dimensional space (shown as blue circles) using MDS; 3: Find the image (shown as the green dot) closest to the mean point (shown as red

star) in the low dimensional map.

(shown as the green arrow in Figure 2), or features derived fromit, or a combination of them. In this paper, we only use the SVF ofa single time interval, i.e., from baseline to one of the follow-upimages. We consider the following additional features:

• Jacobian Determinant (JD): defines the local volume ratiobefore and after registration. If it is larger than 1, the localvolume increases and vice versa.

• Divergence (Div): locally defines how much volume flows in(Div > 0) or out (Div < 0).

• Geodesic Length (GL): defines the path length that a particletravels in the deformation trajectory.

• Deformation (Def): the deformation field generated from thenormalized SVF using the exponential map φ = Exp(v).

• Combination of features: concatenate all these features into alonger feature vector.

Note that instead of the normalized SVF (green arrow inFigure 2) we could have also used the standard SVF that does notemploy Schild’s ladder (red arrow in Figure 2) as a distinguishingfeature. We will compare the two in the experiments.

For our particular application the feature dimensionality ofthe original feature gi ∈ R

D described above for each subjecti is much larger than the number of training samples N, i.e.,






the number of subjects. To reduce the effect of overfitting onthe classifier, we therefore employ dimensionality reduction onthe original feature vector gi ∈ R

D by mapping it to a spaceof reduced dimension thereby obtaining a new feature vectorfi ∈ R

d, d ≪ D. For each subject i, the original feature gi ∈ RD

is first z-score normalized, after which it is downsampled with afactor of 4 in each direction using nearest neighbor interpolation,and only voxels within the brain mask are considered. This yieldsfeatures hi ∈ R

D2 . Since all features are based on the normalizedSVF, they are aligned to the template space, and there is thereforeno need to normalize them again as for example used in VBM.Secondly, we use a dimensionality reduction method based onPrincipal Component Analysis (PCA) for its simplicity, therebymapping the downsampled feature hi to a low dimensionalspace fi ∈ R

d, d ≪ D2. As the standard PCA approach iscomputationally unattractive for large feature sizes, we use theKernel PCA (KPCA) approach (Schölkopf et al., 1997). ForKPCA the kernel K(i, j) represents the pairwise relation betweensubjects i and j in the high dimensional space R

D2 . For the linearcase, the kernel is chosen as the inner product between twofeature vectors hi and hj: K(i, j) = 〈hi, hj〉. For the Gaussian case,

K(i, j) = exp(−(hj − hi)

2

2θ2), where θ is the standard deviation of the

Gaussian kernel. A standard PCA is then applied to the kernel Kto obtain a low-dimensional representation fi, by projecting thekernel representation K(i, :) = {K(i, 1), · · · ,K(i,N)} ∈ R

N to thecomputed eigen vectors. This low-dimensional representation fiis then used in the SVM, see Equation (3). We test the with linearas well as the Gaussian kernel in the experiments below.

3.3. Group AnalysisGiven a set of normalized SVFs from two clinical groups, wecan find the regions that differ in anatomical development,by computing the voxelwise two-sample Hotelling’s T-squaretest. To compute the Hotelling’s T-square test, we treat SVFsfrom the two different groups (MCIc and MCIs) as two vectordistributions. For each voxel position, we have two matrices ofsize n1 × 3 and n2 × 3, where n1 and n2 are the number ofsubjects in the MCIc and MCIs group, respectively. Each rowin the matrix stores the SVF at this position, which is also oneelement in the vector distribution.

4. EXPERIMENTS AND RESULTS

In this section, we perform several experiments to investigatethe influence of different aspects of our method on theclassification results and compare the MCIc and MCIs groups.First, we experimentally validate different SVF smoothingstrengths σ for the LogDemons registration. Second, we comparethe performance of the different features, with and withoutdimensionality reduction. Third, we assess the effect of paralleltransport on the classification. Fourth, the proposed MDS-basedtemplate is compared with the well-known MNI template, andthe result shows that a study-specific template performs muchbetter than a pre-defined general population template. Except forthis experiment, all experiments use theMDS-based template. Toprove the importance of both longitudinal and dense information

as feature characteristics, we compare our results with twoalternative type of features: the gray matter density map atbaseline from (Klöppel et al., 2008) and the regional flux featureproposed by Lorenzi et al. (2015a). In our re-implementationthese two methods are not exactly the same as the originalones, we only use the proposed features. The same linear SVMclassification method and the same dataset are used. We alsoinvestigate the influence of follow-up time by using differentfollow-up times (6, 12, 24, and 36 months). We compute astatistical distance map between the MCIc and MCIs group. Thismap highlights the regions that develop differently between theMCIc and the MCIs groups.

In the experiment’s default setting, for each subject, the follow-up image is first rigidly registered to the baseline image, and thenthe subject space is globally aligned to the template space usingaffine registration, both using elastix (Klein et al., 2010). Forthe LogDemons registration method, we use a multi-resolutionapproach with the default number of iterations (15 iterations inthe first resolution, 10 iterations in the second resolution and5 iterations in the highest resolution) and default second orderBCH approximation (Hernandez et al., 2009; Vercauteren et al.,2009). For KPCA using the Gaussian kernel we use a standarddeviation equal to 1.

The liblinear toolbox (Fan et al., 2008a) is used for linear SVMclassification. The parameter c used in the SVM classification ischosen by a grid search over the discrete set {10−3, 10−2, 10−1,0.5, 1, 2, 5, 10, 50, 102, 103}.

We use 10-fold cross-validation to test the performance ofthe proposed classification method, and use all the images forgroup analysis. In our classification experiment, we use accuracy(ACC), sensitivity (SEN), specificity (SPE), F1 score and areaunder the ROC curve (AUC) to measure the performance of theclassification.

4.1. Smoothing Parameter in LogDemonsSince the LogDemons registration quality is influenced by thesmoothing parameter σ , we inspect the influence of σ on thefinal classification accuracy using σ ∈ {0, 1.5 (default), 3, 6}. Foreach σ , we recompute Schild’s Ladder, and train the linear SVMclassifier on the SVF feature with linear KPCA dimensionalityreduction. The result of the 10-fold cross-validation withdifferent σ is shown in Figure 4.

We can see that when σ = 0, both the ACC and AUC is lowerthan the other settings of σ . As σ increases from 1.5 to 6, bothACC and AUC decrease, but are still better than σ = 0. Thesetting σ = 1.5 gives the best classification performance, and isused in further experiments.

In this experiment, we perform classification based on theSVF feature itself, the derived features (JD, GL, Div, Def) andthe combination of all the features. For each feature, we testKPCA with the linear as well as the Gaussian kernel. Theresulting ROC curves are shown in Figure 5. We can see that foreach single feature, in general the linear kernel performs betterthan the Gaussian kernel, and also is better than the originalfeature without KPCA. For the combination of all features, theGaussian kernel performs substantially better than the linearkernel and the original feature without KPCA. While for single






FIGURE 4 | Classification result of the proposed method using the SVF

feature with varying SVF smoothing parameter σ . KPCA with a linear

kernel is used, employing the MDS-based template.

features the independence assumption of a linear kernel maystill approximately hold, for the combination of related features(all derived from the SVF) this does not hold anymore. In thatcase the Gaussian kernel may be more appropriate, and indeedperforms better. It can also be observed that the SVF and thedeformation are the best performing single features, and that thecombination of all features, using the Gaussian kernel, performsbest overall.

4.2. With and without Parallel TransportAs discussed before, parallel transport is a necessary step to alignthe subject’s features into a common space. Here, we comparethe SVF feature with and without normalization by paralleltransport, shown as the green and red arrows in Figure 2. Weshow the effect of normalization both on the original features, aswell as on the dimensionality reduced version by KPCA using alinear kernel (this was the best kernel for SVF as a single feature,see Figure 5). The resulting ROC curves are shown in Figure 6.From this figure it is clear that the normalization by paralleltransport is a necessary step for both theoretical and performancebenefits.

4.3. Influence of Template SpaceAlthough the template space selection is not the main goal of thiswork, we briefly compare the template image selected by theMDSmethod, see Section 3.1.3, with the MNI152 template (Fonovet al., 2009). For both templates, we use the same LogDemonsregistration parameters (σ = 1.5), and perform classificationusing the combination of all features and the Gaussian kernel fordimensionality reduction. The resulting ROC curve is shown inFigure 7.

4.4. Comparison with Alternative MethodsTo compare with other methods fairly, we use the same linearSVM classification on the same dataset. For this experiment, were-implement the dense gray matter density feature computed

from the baseline image (Klöppel et al., 2008) and the sparseregional flux feature (Lorenzi et al., 2015b) as alternativemethods. Our re-implementation is slightly different fromthe original paper. For the gray matter density feature, wecomputed the voxel-based morphometry (VBM) feature usingthe SPM8 toolbox (http://www.fil.ion.ucl.ac.uk/spm/software/spm8/), while in Klöppel et al. (2008) the authors use SPM5. Forthe sparse regional flux feature, we used structural flux, whichis also used in Lorenzi et al. (2015b) and computed as the sumof divergences inside brain regions. Like Lorenzi et al. (2015b),we use the Automated Anatomical Labeling (AAL) segmentation(Tzourio-Mazoyer et al., 2002) by mapping it into the MDS-based template space. We use structural flux computed on allbrain regions in the ALL segmentation, as well as computed onlyon selected brain regions (hippocampi, medial temporal lobes,posterior cingulate, and ventricles) as in Lorenzi et al. (2015b).For each of these approaches, we report the classification resultin Figure 8. We can see that the proposed methods substantiallyoutperform the others.

4.5. Comparing Follow-Up TimeAll the above results are based on the 36month follow-up images.The same approach can be used on shorter time intervals. Weperformed an experiment on the 6, 12, 24, and 36 month follow-up images, using the combination of all features with Gaussiankernels for KPCA. Classification was repeated 500 times, eachtime with a different randomization in a 10-fold cross-validation.From Figure 9, it can be appreciated that when using a Gaussiankernel, the accuracy does not change a lot with different timeintervals, and even the 6 month follow-up image can give a highaccuracy and AUC. We performed Wilcoxon signed-rank tests,comparing the results of the different time intervals with thoseof the 36 month interval. The results are reported in Table 3. Wecan see that similar classification performance can be obtainedwith each time interval, but 12 and 24 month follow-up giveslightly better classification performance than 6 month follow-up. The 36 month follow-up seems slightly worse, which maybe attributed to an increase in registration difficulty for largeranatomical differences.

4.6. Group AnalysisBased on the aligned SVFs of all subjects, see Section 3.1.2, wecan do a group analysis to show the differences in developmentbetween the MCIc and MCIs groups. Before the group testing,we first performed a voxelwise Shapiro-Wilk test inside eachgroup for each voxel in the brain mask, 82.2% voxels in MCIcgroup and 74.1% for the MCIs group fit the normal distribution.Then,we compute a voxelwise two-sample Hotelling’s T-squaretest on the SVFs of the MCIc and the MCIs groups. Finally, weuse the FDR method (Benjamini and Yekutieli, 2001) to correctthe voxelwise independent p-value map for multiple comparisontesting. The resulting FDR corrected p-value map (p ≤ 0.001)is shown in Figure 10. The corpus callosum and hippocampusshow significant differences (p ≤ 0.001), which are confirmed byother MCI conversion studies (Wang et al., 2006; Zhang et al.,2013; Elahi et al., 2015).


http://www.fil.ion.ucl.ac.uk/spm/software/spm8/

http://www.fil.ion.ucl.ac.uk/spm/software/spm8/





FIGURE 5 | ROC analysis of classification with different features with and without dimensionality reduction. All experiments in this figure use the

MDS-based template.

FIGURE 6 | ROC of classification using the downsampled SVF feature with and without normalization, using the MDS-based template. (A) Without

KPCA, (B) with a linear KPCA.






FIGURE 7 | ROC analysis of the MDS-based template and the MNI152

template using the combination of all features and KPCA with the

Gaussian kernel.

FIGURE 8 | ROC of classification using the proposed SVF feature (using

linear KPCA), the proposed combination of features (using Gaussian

KPCA), the structural longitudinal feature using all brain regions, the

structural longitudinal feature using a subset of brain regions, and the

dense cross-sectional feature, on the same study population.

4.7. Summary of All the ResultsThe experimental results on the 36 month follow-up data aresummarized in Table 4, for different features, KPCA kernels andLogDemons smoothing parameters. Note that here we performedthe 10-fold cross-validation once instead of repeatedly as inTable 3. From Table 4 we can see that the optimal choice isthe combination of all features, using a LogDemons smoothingparameter of σ = 1.5 and including dimensionality reductionusing KPCA with a Gaussian kernel. Based on this table, we

FIGURE 9 | Classification results for the different follow-up images,

using the combination of all features and KPCA with a Gaussian

kernel. All experiments in this figure are employing the MDS-based template.

employed the optimal pipeline for different follow-up data,shown in Table 3, and compute a two-sample Hotelling’s T-square test on the normalized SVFs.

Considering a single feature, using the linear kernel, wefurthermore see from Table 4 that the SVF and Def featuresperform the best, while the GL feature performs worst. The JDand Div features perform better than GL, but worse than SVFand Def. This may be related to the information content thesefeatures carry: SVF and Def encode the point-wise developmentas a vector field, JD and Div encode the local volume change butonly as a scalar field, while GL only measures SVF amplitude, i.e.,neighboring and directional information is missing.

5. DISCUSSION

5.1. Analysis of the Experimental ResultsIn the smoothing parameter experiment, the worst performancefor the SVF feature was obtained for σ = 0. Since in thiscase no regularization is performed, the resulting deformationmay not be diffeomorphic and fails to represent real anatomicalcorrespondence. The best performance is obtained for σ =

1.5. Too rigorous smoothing filters out the high frequencycomponents in the SVF, and such details are important todistinguish between MCIc and MCIs subjects. σ = 1.5 seems agood compromise between regularization and the preservationof detail.

In the experiment that compares the classificationperformance of different features, the combination of allfeatures outperforms any single feature in both ACC and AUC.By comparing different kernels, we find that the Gaussian kernelwith the default parameter outperforms the linear kernel for thecombination of all features.

From the ROC curves in Figure 5, we can see that formost of the features, KPCA dimensionality reduction largelyimproves classification performance. By introducing the max-margin mechanism, in theory the SVM classifier (Cortes and






TABLE 3 | Comparing different follow-up times, using the combination of all features and Gaussian kernel KPCA.

6 months 12 months 24 months 36 months

ACCMean (Q1,Q3) 0.89 (0.87,0.91) 0.90 (0.87,0.93) 0.92 (0.90,0.95) 0.89 (0.87,0.91)

p-value 0.3955 0.0225 0.0000 −

AUCMean (Q1,Q3) 0.91 (0.90,0.92) 0.90 (0.87,0.93) 0.95 (0.94,0.97) 0.91 (0.90,0.93)

p-value 0.0335 0.0027 0.0000 −

SENMean (Q1,Q3) 0.92 (0.90,0.95) 0.98 (0.98,0.98) 0.97 (0.99,1.00) 0.90 (0.86,0.95)

p-value 0.0144 0.0000 0.0000 −

SPEMean (Q1,Q3) 0.87 (0.84,0.90) 0.85 (0.82,0.90) 0.89 (0.85,0.93) 0.88 (0.85,0.91)

p-value 0.1685 0.0000 0.5500 −

F1Mean (Q1,Q3) 0.87 (0.85,0.89) 0.88 (0.86,0.91) 0.90 (0.88,0.95) 0.86 (0.85,0.89)

p-value 0.1886 0.0001 0.0000 −

Here, we report the mean, first quartile (Q1) and third quartile (Q3) of the results of a 10-fold cross-validation, together with p-values of a Wilcoxon signed-rank test comparing shorter

time intervals with the 36 months time interval.

FIGURE 10 | Voxelwise two-sample Hotelling’s T-square test on the SVFs between the MCIc and the MCIs groups. The corpus callosum and the

hippocampus show significant differences between these two groups. The color spots just outside the brain region are caused by smoothing of the SVFs during the

LogDemons registration.






TABLE 4 | Summary of the experimental results on baseline and 36 months follow-up images.

Feature σ KPCA ACC AUC SEN SPE F1

La G L G L G L G L G

Baseline gray matter density − No 0.62 0.61 0.14 0.93 0.22

Regional flux, all regions − No 0.66 0.57 0.16 0.98 0.27

Regional flux, selected regions − No 0.72 0.64 0.44 0.90 0.55

JD 1.5 Yes 0.74 0.73 0.73 0.32 0.63 0.14 0.81 0.96 0.65 0.23

Div 1.5 Yes 0.73 0.69 0.72 0.65 0.77 0.19 0.7 1.00 0.69 0.32

GL 1.5 Yes 0.67 0.63 0.66 0.50 0.63 0.30 0.70 0.94 0.60 0.43

SVF 1.5 Yes 0.76 0.66 0.80 0.64 0.70 0.86 0.81 0.58 0.70 0.69

Def 1.5 Yes 0.75 0.68 0.80 0.48 0.72 0.16 0.79 1.00 0.70 0.28

Combination 1.5 Yes 0.78 0.92b 0.79 0.94 0.74 0.95 0.81 0.90 0.73 0.90

Combination 0 Yes 0.62 0.87 0.45 0.91 0.02 0.91 1.00 0.85 0.04 0.85

Combination 3 Yes 0.76 0.85 0.69 0.85 0.67 0.93 0.81 0.79 0.68 0.82

Combination 6 Yes 0.66 0.86 0.62 0.89 0.37 0.77 0.84 0.93 0.46 0.82

Combination 1.5 No 0.68 0.69 0.65 0.70 0.61

aL means the linear kernel function, G means the Gaussian kernel function.bBold indicates the best performance.

TABLE 5 | Previous methods and results on classification of MCIs vs. MCIc.

Article Data and feature Feature typea C / Lb Period (month) N (MCIs,MCIc) ACC (SEN/SPE)

Chupin et al., 2009 Hippocampus and amygdalae segment R C 0–18 134, 76 64 (60 / 65)

Misra et al., 2009 Whole brain, ROIs VBM R C 0–36 76, 27 82 (−/−)

Querbes et al., 2009 Cortex thickness R, DRs C 0–24 50, 72 73 (73 / 69)

Koikkalainen et al., 2011 Whole brain TBM V C 0–36 215, 164 72 (77 / 71)

Cuingnet et al., 2011 Hippocampus segment R C 0–18 134, 76 67 (62 / 69)

Whole brain VBM(GM) V C 71 (57 / 78)

Cortical thickness V C 70 (32 / 91)

Davatzikos et al., 2011 Whole brain VBM V, DRs C 0–36 170, 69 56 (95 / 38)

Westman et al., 2011 Cortical thickness and subcortical volumes R C 0–12 256, 62 59 (74 / 56)

Wolz et al., 2011 Hippocampus segment R C 0–48 238, 167 65 (63 / 67)

Whole brain TBM V C 64 (65 / 62)

Hip and amygdalae ROI, manifold learning R, DRm C 65 (64 / 66)

Cortical thickness V C 56 (63 / 45)

Combination of above 68 (67 / 69)

Cho et al., 2012 Cortical thickness V, DRm C 0–18 131, 72 71 (63 / 76)

Coupé et al., 2012 Hip and entorhinal cortex segment V C 0–48 238, 167 74 (74 / 74)

Zhang and Shen, 2012 Whole brain ROIs volume R, DRs L 0–24 50, 38 78 (79 / 78)

Eskildsen et al., 2013 Cortial ROIs TBM R C 0–48 227, 161 68 (68 / 69)

Liu et al., 2013 Whole brain ROIs volume R, DRs L 0–24 185, 164 71 (71 / 71)

Guerrero et al., 2014 Whole brain, manifold learning V, DRm C 0–24 114, 116 71 (75 / 67)

Lorenzi et al., 2015b Whole brain regional flux R L 0–36 110, 86 65 (67 / 63)

Proposed method SVF, Parallel Transport V, DRm L 0–36 43, 67 76 (70 / 81)

Proposed method Combined, Parallel Transport V, DRm L 0–36 43, 67 92 (95 / 90)

aFeature Type: R means ROI-based feature; V means voxel or vertex based feature; DRm means dimensionality reduction by feature mapping; DRs means dimensionality reduction by

feature selection. S means using single modality data, while M means using multi-modality data.b L means longitudinal feature; C means cross-sectional feature.

Results are self-reported on different data.






Vapnik, 1995) is able to deal with small sample sizes and high-dimensional features. However, in the real world, a small trainingset often fails to provide enough information to separate a high-dimensional feature space. As a result, dimensionality reductionis necessary for the SVM.

From the ROC curves in Figure 6, we can see that thenormalization step can improve the classification performance.This result strongly supports the use of normalized features inclassification.

When comparing the proposed method with two alternativemethods on the study dataset, we can see that the proposedmethod outperforms them, as shown in Figure 8. Since themain difference between the MCIc and MCIs subjects is theirdifferent pathological and anatomical development, using onlythe baseline image does not provide such information. As a result,using the gray matter density of the baseline image performsworst among the three methods. The sparse regional flux featurealso fails to obtain full developmental information, because itonly measures the overall flux of the pre-defined structuresand ignores many details. It is for example quite possiblethat two different developments have the same structural flux.Secondly, from the results in Table 4, dimensionality reductionusing KPCA on the divergence feature slightly outperformedthe method that aggregated divergence information over brainregions (the regional flux method). This may indicate thatregional aggregation is not an optimal dimensionality reductionmethod, at least for this particular feature.

When comparing the results from follow-up data withdifferent time intervals, we can see that even the short timeinterval (6 months for example) obtains a promising result.Previous results (Zhang and Shen, 2012; Liu et al., 2013) alsoshowed that including short time interval follow-up data intothe feature space can improve the results. As reported fromJack et al. (2013), brain structural changes may occur severalyears before Alzheimer’s disease manifests itself. Similar to thesigmoid curve, the largest changes may happen at the mid-stageof the disease, rather then in the end stage. From Table 2, we cansee that most MCIc patients have converted to “Dementia” or“MCI to Dementia” in the first 2 years. It is therefore possiblethat short follow-up times can still offer enough informationfor classification. This may indicate that the differences indevelopment between the MCIc and MCIs groups appear evenin the early stage of MCI, which is useful for early prediction ofAlzheimer’s disease.

5.2. Relation with Methods from LiteratureIn the last 5 years, predicting conversion from MCI to ADhas received increasing attention, and many methods have beenproposed based on structural MR images. The classificationperformance reported from some state-of-the-art methods arelisted in Table 5. A direct comparison between these methods isdifficult because of the different choice of cohort, preprocessingsteps, validation strategy and reported measurement. However,from the reported performance and the method summary, someobservations can be made.

Several methods focus only on one or two brain structuresthat are believed to be directly related to the disease progression.

Chupin et al. (2009) used the hippocampus and amygdala volumeto obtain an accuracy of 64%. Coupé et al. (2012) focussed onthe hippocampus and assigned an AD-likeness score to it. Withthe help of a better and more complex preprocessing pipeline,this method reported an accuracy of 74%. Wolz et al. (2011) andCuingnet et al. (2011) reported accuracies of 67% and 65%, basedon the hippocampus. These method use only cross-sectionalinformation, and ignore the development over time.

Manymethods use whole brain features in theMCI predictiontask, either voxelwise or region-wise. Such whole brain featuresinclude Voxel Based Morphometry (VBM), Tensor BasedMorphometry (TBM) and cortical thickness. Davatzikos et al.(2011) used cross-sectional VBM features to obtain an accuracyof 56%. Misra et al. (2009) used data-driven ROIs to extractROIwise cross-sectional VBM features and reported a highaccuracy of 82%. Asmentioned in Guerrero et al. (2014), the ratioof positive and negative samples in this study is quite unbalanced,so that the reported result is hard to compare with othermethods.Koikkalainen et al. (2011) used cross-sectional region-wise TBMon the whole brain to obtain an accuracy of 72%, while Eskildsenet al. (2013) focussed on cortical ROIs obtaining an accuracy of68%. Several methods (Querbes et al., 2009; Westman et al., 2011;Wolz et al., 2011; Cho et al., 2012) used cross-sectional corticalthickness to predict MCI conversion. Querbes et al. (2009) usedthe cortical thickness within ROIs to obtain an accuracy of 73%.However, there is a bias in the ROI generation step as pointedout by Guerrero et al. (2014). Westman et al. (2011) used bothsubcortical structure volume and cortical thickness in predefinedROIs as features to achieve an accuracy of 59%. Cho et al. (2012)used a vertex wise cortical thickness to achieve an accuracy of71%.

Beside these methods based on cross-sectional whole brainfeatures, somemethods extract longitudinal whole brain features.Liu et al. (2013) used the ratio of gray matter volume in baselineand follow-up images within 93 ROIs as features to obtainsan accuracy of 71%. Zhang and Shen (2012) use longitudinalfeatures from multi-modal data to train a multi-kernel SVMand achieved a good performance (accuracy of 78%). Lorenziet al. (2015b) extracted the flux from the non-rigid registrationbetween baseline and follow-up image of each structure asfeatures.

From this summary we can see that both longitudinal aswell as whole brain features are very powerful. Therefore, theproposed method is a meaningful choice since for the firsttime it exploits a dense whole brain feature (the SVF) thatwas derived from longitudinal information. Importantly, in ourmethod the two visits are not treated independently as in Zhangand Shen (2012); Liu et al. (2013) but are related throughregistration. Moreover, our feature directly models anatomicalchange. The proposed method furthermore features a relativelystraightforward skull stripping, and uses simple linear SVMclassification and pre-processing steps.

5.3. Group-Specific Template SelectionIn the proposed method, a template image T0 is needed. ManyVBM-based methods (Fan et al., 2007; Davatzikos et al., 2008,2009, 2011; Fan et al., 2008b) use the MNI template during






analysis. However, the MNI template maybe not a suitable choicefor our application: (1) TheMNI template is learned from a groupof young and healthy subjects, while our study is on aged andMCI subjects; (2) The MNI template is averaged and blurred,potentially resulting in a sub-optimal registration, losing details.The results from Section 4.3, furthermore support the use of atemplate selected from the MCI population.

5.4. Future WorkAlthough our method obtains a good classification result, thereare still some points that can be improved. For the anatomicalcorrespondence estimation, the proposed method uses onlyone follow-up image. State-of-the-art image regression methods(Singh et al., 2015; Sun et al., 2015) compute the developmentaltrajectory from multiple follow-up images, which may help toestimate an improved correspondence. For the dimensionalityreduction, we only use KPCA. Better dimensionality reductionmethods, such as t-distributed Stochastic Neighbor Embeddings(t-SNE) (Van der Maaten and Hinton, 2008) may help to increasethe classification performance. This study uses a relative smalldataset, and a larger dataset may help the classifier to learn abetter model. Including more types of features, like gray matterdensity, cortical thickness or structure volume, may also improvethe result, as different types of features can represent differentproperties of the brain.

6. CONCLUSION

In this paper, we propose a newmethod to detectMCI conversionusing the combination of normalized features from longitudinalstructural MR images, with KPCA dimensionality reduction.Using the proposed method, we obtain a high classificationperformance. The alignment of SVFs in the template spaceenables a statistical comparison between the MCIc and MCIsgroup on a voxel level. This facilitates the inspection ofdifferences, which were found in the hippocampus and theventricles.

We conclude that the estimation of structural change inthe brain over time can aid in predicting if MCI will evolveto Alzheimer’s disease. Such finding may be useful for earlyidentification of AD patients, to delay the disease onset or toslow down the disease progression. Therefore, it may be useful toimprove the quality of life of the patient and decrease the societalcost related to AD.

AUTHOR CONTRIBUTIONS

ZS, MV, BL, and MS contributed to the designing the work,drafting, final approval and all aspects of this work.

ACKNOWLEDGMENTS

The research leading to these results has received fundingfrom the Joint Scientific Thematic Research Programme (JSTP)between The Netherlands and China (Project 116350001),the Netherlands Organization for Scientific Research (NWOVENI 639.021.124), the European Union Seventh FrameworkProgramme (FP7/2007-2013) under grant agreement No. 604102(HBP) and the European Union’s Horizon 2020 research andinnovation programme under grant agreement No. 720270(HBP SGA1). Data collection and sharing for this project wasfunded by the Alzheimer’s Disease Neuroimaging Initiative(ADNI) (National Institutes of Health Grant U01 AG024904)and DOD ADNI (Department of Defense award numberW81XWH-12-2-0012). ADNI is funded by the National Instituteon Aging, the National Institute of Biomedical Imaging andBioengineering, and through generous contributions from thefollowing: AbbVie, Alzheimer’s Association; Alzheimer’s DrugDiscovery Foundation; Araclon Biotech; BioClinica, Inc.; Biogen;Bristol-Myers Squibb Company; CereSpir,Inc.; Eisai Inc.; ElanPharmaceuticals, Inc.; Eli Lilly and Company; EuroImmun;F. Hoffmann-La Roche Ltd and its affiliated companyGenentech, Inc.; Fujirebio; GE Healthcare; IXICO Ltd.; JanssenAlzheimer Immunotherapy Research & Development, LLC.;Johnson & Johnson Pharmaceutical Research & DevelopmentLLC.; Lumosity; Lundbeck; Merck & Co., Inc.; Meso ScaleDiagnostics, LLC.; NeuroRx Research; Neurotrack Technologies;Novartis Pharmaceuticals Corporation; Pfizer Inc.; PiramalImaging; Servier; Takeda Pharmaceutical Company; andTransition Therapeutics. The Canadian Institutes of HealthResearch is providing funds to support ADNI clinical sitesin Canada. Private sector contributions are facilitated by theFoundation for the National Institutes of Health (www.fnih.org).The grantee organization is the Northern California Institutefor Research and Education, and the study is coordinated bythe Alzheimer’s Disease Cooperative Study at the Universityof California, San Diego. ADNI data are disseminated by theLaboratory for Neuro Imaging at the University of SouthernCalifornia.

REFERENCES

Alzheimer’s Association (2014). 2014 Alzheimer’s disease facts and figures.

Alzheimer’s Dementia 10, 47–92. doi: 10.1016/j.jalz.2014.02.001

Benjamini, Y., and Yekutieli, D. (2001). The control of the false discovery rate in

multiple testing under dependency. Ann. Stat. 29, 1165–1188.

Brookmeyer, R., Johnson, E., Ziegler-Graham, K., and Arrighi, H. M. (2007).

Forecasting the global burden of Alzheimer’s disease. Alzheimer’s Dementia 3,

186–191. doi: 10.1016/j.jalz.2007.04.381

Cachier, P., Bardinet, E., Dormont, D., Pennec, X., and Ayache, N. (2003). Iconic

feature based nonrigid registration: the PASHA algorithm. Comput. Vis. Image

Understand. 89, 272–298. doi: 10.1016/S1077-3142(03)00002-X

Cho, Y., Seong, J. K., Jeong, Y., and Shin, S. Y. (2012). Individual subject

classification for Alzheimer’s disease based on incremental learning using a

spatial frequency representation of cortical thickness data. NeuroImage 59,

2217–2230. doi: 10.1016/j.neuroimage.2011.09.085

Chupin, M., Gérardin, E., Cuingnet, R., Boutet, C., Lemieux, L., Lehéricy, S.,

et al. (2009). Fully automatic hippocampus segmentation and classification

in Alzheimer’s disease and mild cognitive impairment applied on data from

ADNI. Hippocampus 19, 579–587. doi: 10.1002/hipo.20626

Cortes, C., and Vapnik, V. (1995). Support-vector networks. Mach. Learn. 20,

273–297. doi: 10.1007/BF00994018

Coupé, P., Eskildsen, S. F., Manjón, J. V., Fonov, V. S., and Collins, D. L. (2012).

Simultaneous segmentation and grading of anatomical structures for patient’s


www.fnih.org

https://doi.org/10.1016/j.jalz.2014.02.001

https://doi.org/10.1016/j.jalz.2007.04.381

https://doi.org/10.1016/S1077-3142(03)00002-X

https://doi.org/10.1016/j.neuroimage.2011.09.085

https://doi.org/10.1002/hipo.20626

https://doi.org/10.1007/BF00994018





classification: application to Alzheimer’s disease. NeuroImage 59, 3736–3747.

doi: 10.1016/j.neuroimage.2011.10.080

Coupé, P., Fonov, V. S., Bernard, C., Zandifar, A., Eskildsen, S. F., Helmer, C., et

al. (2015). Detection of Alzheimer’s disease signature in MR images seven years

before conversion to dementia: Toward an early individual prognosis. Hum.

Brain Mapp. 36, 4758–4770. doi: 10.1002/hbm.22926

Crismon, M. L. (1994). Tacrine: first drug approved for Alzheimer’s disease. Ann.

Pharmacother. 28, 744–751. doi: 10.1177/106002809402800612

Cuingnet, R., Gerardin, E., Tessieras, J., Auzias, G., Lehéricy, S., Habert, M.-

O., et al. (2011). Automatic classification of patients with Alzheimer’s disease

from structural MRI: a comparison of ten methods using the ADNI database.

NeuroImage 56, 766–781. doi: 10.1016/j.neuroimage.2010.06.013

Cuingnet, R., Glaunès, J. A., Chupin, M., Benali, H., and Colliot, O. (2013).

Spatial and anatomical regularization of SVM: a general framework for

neuroimaging data. Pattern Anal. Mach. Intell. IEEE Trans. 35, 682–696.

doi: 10.1109/TPAMI.2012.142

Davatzikos, C., Bhatt, P., Shaw, L. M., Batmanghelich, K. N., and Trojanowski,

J. Q. (2011). Prediction of MCI to AD conversion, via MRI, CSF

biomarkers, and pattern classification.Neurobiol. Aging 32, 2322.e19–2322.e27.

doi: 10.1016/j.neurobiolaging.2010.05.023

Davatzikos, C., Resnick, S. M., Wu, X., Parmpi, P., and Clark, C. M.

(2008). Individual patient diagnosis of AD and FTD via high-

dimensional pattern classification of MRI. NeuroImage 41, 1220–1227.

doi: 10.1016/j.neuroimage.2008.03.050

Davatzikos, C., Xu, F., An, Y., Fan, Y., and Resnick, S. M. (2009). Longitudinal

progression of Alzheimer’s-like patterns of atrophy in normal older adults: the

SPARE-AD index. Brain 132, 2026–2035. doi: 10.1093/brain/awp091

De Santi, S., de Leon, M. J., Rusinek, H., Convit, A., Tarshish, C. Y., Roche, A., et al.

(2001). Hippocampal formation glucose metabolism and volume losses in MCI

and AD. Neurobiol. Aging 22, 529–539. doi: 10.1016/S0197-4580(01)00230-5

do Carmo, M. P. (1992). Riemannian geometry. translated from the second

portuguese edition by francis flaherty. mathematics: Theory & applications.

Birkhauser 5052, 5041–5052.

Elahi, S., Bachman, A. H., Lee, S. H., Sidtis, J. J., and Ardekani, B. A. (2015).

Corpus callosum atrophy rate in mild cognitive impairment and prodromal

Alzheimer’s disease. J. Alzheimer’s Dis. 45, 921–931. doi: 10.3233/JAD-142631

Eskildsen, S. F., Coupé, P., Fonov, V., Manjón, J. V., Leung, K. K., Guizard, N., et

al. (2012). Beast: brain extraction based on nonlocal segmentation technique.


Eskildsen, S. F., Coupé, P., García-Lorenzo, D., Fonov, V., Pruessner, J. C.,

and Collins, D. L. (2013). Prediction of Alzheimer’s disease in subjects with

mild cognitive impairment from the ADNI cohort using patterns of cortical

thinning. NeuroImage 65, 511–521. doi: 10.1016/j.neuroimage.2012.09.058

Fan, R.-E., Chang, K.-W., Hsieh, C.-J., Wang, X.-R., and Lin, C.-J. (2008a).

Liblinear: a library for large linear classification. J. Mach. Learn. Res. 9,

1871–1874.

Fan, Y., Batmanghelich, N., Clark, C. M., and Davatzikos, C. (2008b). Spatial

patterns of brain atrophy in MCI patients, identified via high-dimensional

pattern classification, predict subsequent cognitive decline. NeuroImage 39,

1731–1743. doi: 10.1016/j.neuroimage.2007.10.031

Fan, Y., Shen, D., and Davatzikos, C. (2005). “Classification of structural images

via high-dimensional image warping, robust feature extraction, and SVM” in

Medical Image Computing and Computer-Assisted Intervention MICCAI 2005,

Vol. 3749 of Lecture Notes in Computer Science, eds J. Duncan, and G. Gerig

(Berlin; Heidelberg: Springer), 1–8.

Fan, Y., Shen, D., Gur, R. C., Gur, R. E., and Davatzikos, C. (2007). COMPARE:

Classification of morphological patterns using adaptive regional elements. IEEE

Trans. Med. Imaging 26, 93–105. doi: 10.1109/TMI.2006.886812

Fiot, J.-B., Raguet, H., Risser, L., Cohen, L. D., Fripp, J., Vialard, F.-X., et al.

(2014). Longitudinal deformation models, spatial regularizations and learning

strategies to quantify alzheimer’s disease progression. NeuroImage 4, 718–729.

doi: 10.1016/j.nicl.2014.02.002

Fiot, J.-B., Risser, L., Cohen, L. D., Fripp, J., and Vialard, F.-X. (2012). “Local vs

global descriptors of hippocampus shape evolution for Alzheimer’s longitudinal

population analysis,” in International Workshop on Spatio-temporal Image

Analysis for Longitudinal and Time-Series Image Data (Berlin; Heidelberg:

Springer), 13–24. doi: 10.1007/978-3-642-33555-6_2

Fonov, V. S., Evans, A. C., McKinstry, R. C., Almli, C., and Collins, D. (2009).

Unbiased nonlinear average age-appropriate brain templates from birth to

adulthood. NeuroImage 47:S102. doi: 10.1016/S1053-8119(09)70884-5

Frisoni, G. B., Fox, N. C., Jack, C. R., Scheltens, P., and Thompson, P. M. (2010).

The clinical use of structural MRI in Alzheimer disease. Nat. Rev. Neurol. 6,

67–77. doi: 10.1038/nrneurol.2009.215

Guerrero, R., Wolz, R., Rao, A., and Rueckert, D. (2014). Manifold population

modeling as a neuro-imaging biomarker: application to ADNI and ADNI-GO.


Hernandez, M., Bossa, M. N., and Olmos, S. (2009). Registration of anatomical

images using paths of diffeomorphisms parameterized with stationary

vector field flows. Int. J. Comput. Vis. 85, 291–306. doi: 10.1007/s11263-00

9-0219-z

Jack, C. R., Bernstein, M. A., Fox, N. C., Thompson, P., Alexander, G.,

Harvey, D., et al. (2008). The Alzheimer’s disease neuroimaging initiative

(ADNI): MRI methods. J. Magn. Res. Imaging 27, 685–691. doi: 10.1002/jmri.

21049

Jack, C. R., Knopman, D. S., Jagust, W. J., Petersen, R. C., Weiner, M. W., Aisen,

P. S., et al. (2013). Tracking pathophysiological processes in alzheimer’s disease:

an updated hypothetical model of dynamic biomarkers. Lancet Neurol. 12,

207–216. doi: 10.1016/S1474-4422(12)70291-0

Joshi, S., Davis, B., Jomier, M., and Gerig, G. (2004). Unbiased diffeomorphic

atlas construction for computational anatomy. NeuroImage 23, S151–S160.

doi: 10.1016/j.neuroimage.2004.07.068

Klein, S., Staring, M., Murphy, K., Viergever, M. A., and Pluim, J. P. (2010). Elastix:

a toolbox for intensity-based medical image registration. Med. Imaging IEEE

Trans. 29, 196–205. doi: 10.1109/TMI.2009.2035616

Klöppel, S., Stonnington, C. M., Chu, C., Draganski, B., Scahill, R. I., Rohrer, J. D.,

et al. (2008). Automatic classification of MR scans in Alzheimer’s disease. Brain

131, 681–689. doi: 10.1093/brain/awm319

Koikkalainen, J., Lötjönen, J., Thurfjell, L., Rueckert, D., Waldemar, G.,

and Soininen, H. (2011). Multi-template tensor-based morphometry:

application to analysis of Alzheimer’s disease. NeuroImage 56, 1134–1144.

doi: 10.1016/j.neuroimage.2011.03.029

Kozauer, N., and Katz, R. (2013). Regulatory innovation and drug development

for early-stage Alzheimer’s disease. New Engl. J. Med. 368, 1169–1171.

doi: 10.1056/NEJMp1302513

Leon, M. J., Mosconi, L., Li, J., Santi, S., Yao, Y., Tsui, W. H., et al. (2007).

Longitudinal CSF isoprostane and MRI atrophy in the progression to AD. J.

Neurol. 254, 1666–1675. doi: 10.1007/s00415-007-0610-z

Liu, M., Suk, H.-I., and Shen, D. (2013). “Multi-task Sparse Classifier for Diagnosis

of MCI Conversion to AD with Longitudinal MR Images,” inMachine Learning

in Medical Imaging, eds G. Wu, D. Zhang, D. Shen, P. Yan, K. Suzuki, and F.

Wang (Cham: Springer), 243–250.

Lorenzen, P., Davis, B. C., and Joshi, S. (2005). “Unbiased atlas formation via large

deformations metric mapping,” in Medical Image Computing and Computer-

Assisted Intervention–MICCAI 2005, (Berlin; Heidelberg: Springer), 411–418.

doi: 10.1007/11566489_51

Lorenzi, M., Ayache, N., and Pennec, X. (2015a). Regional flux analysis

for discovering and quantifying anatomical changes: an application to

the brain morphometry in Alzheimer’s disease. Neuroimage 115, 224–234.

doi: 10.1016/j.neuroimage.2015.04.051

Lorenzi, M., and Pennec, X. (2013). Geodesics, parallel transport & one-parameter

subgroups for diffeomorphic image registration. Int. J. Comput. Vis. 105,

111–127. doi: 10.1007/s11263-012-0598-4

Lorenzi, M., Pennec, X., Ayache, N., and Frisoni, G. (2012). “Disentangling the

normal aging from the pathological Alzheimer’s disease progression on cross-

sectional structural MR images,” in MICCAI Workshop on Novel Imaging

Biomarkers for Alzheimer’s Disease and Related Disorders (NIBAD’12), (Nice:

Springer), 145–154.

Lorenzi, M., Pennec, X., Frisoni, G. B., and Ayache, N. (2015b).

Disentangling normal aging from Alzheimer’s disease in structural

magnetic resonance images. Neurobiol. Aging 36, S42–S52.


Mattsson, N., Zetterberg, H., Hansson, O., Andreasen, N., Parnetti, L., Jonsson, M.,

et al. (2009). CSF biomarkers and incipient Alzheimer disease in patients with

mild cognitive impairment. JAMA 302, 385–393. doi: 10.1001/jama.2009.1064



https://doi.org/10.1002/hbm.22926

https://doi.org/10.1177/106002809402800612


https://doi.org/10.1109/TPAMI.2012.142

https://doi.org/10.1016/j.neurobiolaging.2010.05.023


https://doi.org/10.1093/brain/awp091

https://doi.org/10.1016/S0197-4580(01)00230-5

https://doi.org/10.3233/JAD-142631




https://doi.org/10.1109/TMI.2006.886812

https://doi.org/10.1016/j.nicl.2014.02.002

https://doi.org/10.1007/978-3-642-33555-6

https://doi.org/10.1016/S1053-8119(09)70884-5

https://doi.org/10.1038/nrneurol.2009.215


https://doi.org/10.1007/s11263-00

https://doi.org/10.1002/jmri.

https://doi.org/10.1016/S1474-4422(12)70291-0


https://doi.org/10.1109/TMI.2009.2035616

https://doi.org/10.1093/brain/awm319


https://doi.org/10.1056/NEJMp1302513

https://doi.org/10.1007/s00415-007-0610-z

https://doi.org/10.1007/11566489_51


https://doi.org/10.1007/s11263-012-0598-4


https://doi.org/10.1001/jama.2009.1064





Misra, C., Fan, Y., and Davatzikos, C. (2009). Baseline and longitudinal patterns

of brain atrophy in MCI patients, and their use in prediction of short-

term conversion to AD: Results from ADNI. NeuroImage 44, 1415–1422.

doi: 10.1016/j.neuroimage.2008.10.031

Mitchell, A., and Shiri-Feshki, M. (2008). Temporal trends in the long term risk

of progression of mild cognitive impairment: a pooled analysis. J. Neurol.

Neurosurg. Psychiatry 79, 1386–1391. doi: 10.1136/jnnp.2007.142679

Park, H., Bland, P. H., Hero III, A. O., and Meyer, C. R. (2005). “Least biased target

selection in probabilistic atlas construction,” in Medical Image Computing and

Computer-Assisted Intervention–MICCAI 2005, (Springer) 419–426.

Querbes, O., Aubry, F., Pariente, J., Lotterie, J.-A., Démonet, J.-F., Duret, V., et al.

(2009). Early diagnosis of Alzheimer’s disease using cortical thickness: impact

of cognitive reserve. Brain 132, 2036–2047. doi: 10.1093/brain/awp105

Schneider, L. S., Tariot, P. N., Dagerman, K. S., Davis, S. M., Hsiao, J. K.,

Ismail, M. S., et al. (2006). Effectiveness of atypical antipsychotic drugs

in patients with Alzheimer’s disease. New Engl. J. Med. 355, 1525–1538.

doi: 10.1056/NEJMoa061240

Schölkopf, B., Smola, A., and Müller, K.-R. (1997). “Kernel principal component

analysis,” in Artificial Neural Networks ICANN’97, eds W. Gerstner, A.

Germond, M. Hasler, and J.-D. Nicoud (Berlin, Heidelberg: Springer), 583–588.

Singh, N., Vialard, F.-X., andNiethammer,M. (2015). Splines for diffeomorphisms.

Med. Image Anal. 25, 56–71. doi: 10.1016/j.media.2015.04.012

Sled, J. G., Zijdenbos, A. P., and Evans, A. C. (1998). A nonparametric method for

automatic correction of intensity nonuniformity in MRI data. Med. Imaging

IEEE Trans. 17, 87–97. doi: 10.1109/42.668698

Sun, Z., Lelieveldt, B. P. F., and Staring, M. (2015). “Fast linear geodesic

shape regression using coupled logdemons registration,” in 2015 IEEE

12th International Symposium on Biomedical Imaging (ISBI), 1276–1279.

doi: 10.1109/ISBI.2015.7164107

Thirion, J.-P. (1998). Image matching as a diffusion process: an

analogy with Maxwell’s demons. Med. Image Anal. 2, 243–260.

doi: 10.1016/S1361-8415(98)80022-4

Tzourio-Mazoyer, N., Landeau, B., Papathanassiou, D., Crivello, F., Etard, O.,

Delcroix, N., et al. (2002). Automated anatomical labeling of activations in

spm using a macroscopic anatomical parcellation of the mni mri single-subject

brain. Neuroimage 15, 273–289. doi: 10.1006/nimg.2001.0978

Van der Maaten, L., and Hinton, G. (2008). Visualizing data using t-SNE. J. Mach.

Learn. Res. 9, 2579–2605.

Vercauteren, T., Pennec, X., Perchant, A., and Ayache, N. (2009). Diffeomorphic

demons: efficient non-parametric image registration. NeuroImage 45, S61–S72.

doi: 10.1016/j.neuroimage.2008.10.040

Wang, P. J., Saykin, A. J., Flashman, L. A., Wishart, H. A., Rabin, L. A.,

Santulli, R. B., et al. (2006). Regionally specific atrophy of the corpus callosum

in AD, MCI and cognitive complaints. Neurobiol. Aging 27, 1613–1617.


Westman, E., Simmons, A., Muehlboeck, J.-S., Mecocci, P., Vellas, B.,

Tsolaki, M., et al. (2011). AddNeuroMed and ADNI: similar patterns of

Alzheimer’s atrophy and automated MRI classification accuracy in Europe

and North America. NeuroImage 58, 818–828. doi: 10.1016/j.neuroimage.2011.

06.065

Wolz, R., Julkunen, V., Koikkalainen, J., Niskanen, E., Zhang, D. P., Rueckert,

D., et al. (2011). Multi-method analysis of MRI images in early diagnostics

of Alzheimer’s disease. PLoS ONE 6:e25446. doi: 10.1371/journal.pone.

0025446

Zhang, D., and Shen, D. (2012). Predicting future clinical changes of MCI

patients using longitudinal and multimodal biomarkers. PLoS ONE 7:e33182.

doi: 10.1371/journal.pone.0033182

Zhang, Y., Schuff, N., Camacho, M., Chao, L. L., Fletcher, T. P., Yaffe, K., et al.

(2013). MRI markers for mild cognitive impairment: comparisons between

white matter integrity and gray matter volume measurements. PLoS ONE

8:e66367. doi: 10.1371/journal.pone.0066367

Conflict of Interest Statement: The authors declare that the research was

conducted in the absence of any commercial or financial relationships that could

be construed as a potential conflict of interest.

Copyright © 2017 Sun, van de Giessen, Lelieveldt, Staring. This is an open-access

article distributed under the terms of the Creative Commons Attribution License (CC

BY). The use, distribution or reproduction in other forums is permitted, provided the

original author(s) or licensor are credited and that the original publication in this

journal is cited, in accordance with accepted academic practice. No use, distribution

or reproduction is permitted which does not comply with these terms.



https://doi.org/10.1136/jnnp.2007.142679

https://doi.org/10.1093/brain/awp105

https://doi.org/10.1056/NEJMoa061240

https://doi.org/10.1016/j.media.2015.04.012

https://doi.org/10.1109/42.668698

https://doi.org/10.1109/ISBI.2015.7164107

https://doi.org/10.1016/S1361-8415(98)80022-4

https://doi.org/10.1006/nimg.2001.0978




https://doi.org/10.1371/journal.pone.0025446



http://creativecommons.org/licenses/by/4.0/








detection of conversion from mild cognitive impairment to ... · trajectory of the brain anatomy...

Documents