
fMRI Data Analysis Techniques and the Self-Organizing Maps Approach

Tiago José de Oliveira Jordão
Instituto Superior Técnico, Pólo do Taguspark, Porto Salvo, Portugal

Functional Magnetic Resonance Imaging (fMRI) is a widely used technique for learning more about how brain function supports mental activities. Although fMRI is a powerful tool for detecting functional activation within the brain, the data obtained from fMRI experiments cannot easily be analyzed directly because of a number of factors: the weakness of the signal, the abundant noise in the data, and the difficulty of separating activations of interest from other kinds of variation. To overcome some of these difficulties, powerful analysis techniques are used to interpret the fMRI data.

In this paper, besides describing some of the most common approaches, we provide a more detailed analysis of one technique in particular: self-organizing maps (SOM). To assess the performance of this approach, we developed a data mining tool implementing the SOM algorithm and tested it with real fMRI data. The results are presented and discussed.

Keywords: Neuroimaging, fMRI, data analysis techniques, self-organizing maps.

Functional magnetic resonance imaging (fMRI) is one of the most successful tools in the investigation of cognitive function, enabling brain imaging with a high spatial resolution in a non-invasive way. The acquisition of data is commonly achieved with techniques that measure the blood oxygen level-dependent (BOLD) signal changes.

The development of the BOLD contrast fMRI technique represents a considerable advance in cognitive neuroscience, but it is far from giving precise answers about the mechanisms involved in brain function. Nevertheless, the large number of experiments and studies based on this technique has been one of the main sources of the knowledge about brain function we have today, and the technique is considered an important means of learning even more.

The analysis of fMRI data aims to extract functional correlates from the acquired data sets [1] and to identify brain regions involved in functions of interest. One of the main difficulties when analyzing fMRI data is separating the noise from the signals of interest. Another problem is interpreting the relation between these signals and some experimental behavior. In order to analyze fMRI data, assumptions about brain function must be made and sophisticated analysis techniques must be employed.

Inferential methods like statistical parametric mapping (SPM), clustering techniques, and transformation-based methods such as independent component analysis (ICA) and principal component analysis (PCA) are among the most widely applied approaches in fMRI analysis today.

Statistical parametric mapping (SPM) [2] is based on the general linear model (GLM) and is one of the most commonly used approaches for fMRI data analysis. SPM includes methods like ANOVA, correlation coefficients and t-tests.

The fundamental principle of SPM is that a signal depending simultaneously on several variables can be decomposed in terms of those variables' contributions. This is only valid if sufficient sampling of the signal is obtained under different contributions of the independent variables.

The center of the GLM is a simple equation that relates observations to expectations by expressing the observed response Y as a linear combination of expected components (or explanatory variables) X and an associated residual error ε:

Y = Xβ + ε (1)

In terms of an fMRI experiment, Y represents the time-course (TC) of the voxel we want to analyze. The matrix X is called the design matrix and contains the explanatory variables that represent the experimental conditions under which the observations were made. Each row of the design matrix represents a different scan, and each column represents an effect of the experiment or an effect that may confound the results. The explanatory variables, or predictors, are obtained by using a box-car function with a standard time-course of the hemodynamic response. A simple condition box-car time-course could be defined with values of 1 when an experimental condition is verified (on) and values of 0 otherwise (off). β is the set of coefficients to be determined, relating the voxel TC values to the experimental independent variables. In other words, β characterizes the voxel's preference profile for the experimental conditions modeled in the design matrix. Finally, ε is a set of random error terms assumed to follow a Gaussian (normal) distribution.

The estimation of β and its variance can be used in a vast range of statistical analyses. Despite this possibility, the main focus here should be a good formulation of the design matrix X, in order to model the experimental design with good precision and obtain the best results from the inferences made. If the design matrix does not contain all relevant predictors, changes in the voxels' signal will be attributed to error instead of to the model. Inferences about the contributions of the predictors to the observed signal are made using F or t statistics.
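As a concrete illustration of Equation 1, the sketch below (written in Python with NumPy purely for exposition; it is not the implementation used in our tool, and the block timing, noise level and contrast are illustrative assumptions) builds a simple box-car predictor, assembles a design matrix with a constant baseline column, estimates β by ordinary least squares and computes a t statistic for the task regressor:

```python
import numpy as np

# Hypothetical block design: 96 scans, rest/task blocks of 6 scans each
n_scans, block_len = 96, 6
boxcar = np.tile(np.r_[np.zeros(block_len), np.ones(block_len)],
                 n_scans // (2 * block_len))

# Design matrix X: one task predictor plus a constant column for the baseline
X = np.column_stack([boxcar, np.ones(n_scans)])

# Simulated voxel time-course Y = X @ beta_true + Gaussian noise
rng = np.random.default_rng(0)
Y = X @ np.array([2.0, 100.0]) + rng.normal(0.0, 1.0, n_scans)

# Ordinary least-squares estimate of beta (Equation 1) and the residual error
beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
residuals = Y - X @ beta_hat

# t statistic for the task regressor, using the contrast c = [1, 0]
c = np.array([1.0, 0.0])
dof = n_scans - np.linalg.matrix_rank(X)
sigma2 = residuals @ residuals / dof
t_stat = (c @ beta_hat) / np.sqrt(sigma2 * c @ np.linalg.pinv(X.T @ X) @ c)
print(beta_hat, t_stat)
```

In standard SPM practice the box-car would additionally be convolved with a canonical hemodynamic response function before being entered into X; that step is omitted here for brevity.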

Methods based on the general linear model, like SPM, are currently among the most used analysis strategies. One of the reasons is that the method offers an intuitive approach to the analysis of fMRI. On the other hand, these approaches rest on a set of tenuous assumptions. The first is that the observations have a known distribution (e.g. Gaussian). Second, SPM assumes that the time-courses of the different sources can be reliably estimated in advance. This involves fitting the acquired data to a canonical hemodynamic response function, making an assumption about the temporal evolution of the fMRI data. The third assumption is that the variances and covariances of the BOLD signal between repeated measurements are equal. The last assumption is that the signals at different voxels are independent. All these assumptions can lead to invalid or inefficient statistical tests. Finally, one other thing to take into account when using SPM is that it relies on smoothing of the data, which may degrade the inherently good spatial resolution offered by fMRI.

Transformation-based methods transform the original data into a high-dimensional vector space in order to separate different functional responses and types of noise from each other. The new vector space is composed of several components, each one representing typical spatial or temporal responses of functional activity and various noise sources. Two transformation-based methods are applied to fMRI: principal component analysis (PCA) and independent component analysis (ICA). Both methods use a transformation matrix to remove diffuse and complex patterns of correlation between the element vectors of the original data.

PCA uses only second-order statistics and decorrelates the outputs using an orthogonal matrix. Let X be the M × N fMRI data matrix, with zero empirical mean (the empirical mean of the distribution has been subtracted from the data set), where N is the number of column vectors in the data set and M is the number of elements in each column vector (its dimension). The PCA method is used to generate a new feature space Y using the following equation:

Y^T = X^T W (2)

where W is the M × P matrix of basis column vectors, composed of a set of P eigenvectors of the M × M covariance matrix C of the data.
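A minimal sketch of this decomposition, following the notation above (Python/NumPy for illustration only; the data are random stand-ins and the choice of P = 10 components is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
M, N = 96, 500                       # M time-points per TC, N voxel TCs (random stand-ins)
X = rng.normal(size=(M, N))
X -= X.mean(axis=1, keepdims=True)   # zero empirical mean, as assumed above

# M x M covariance matrix of the data and its eigendecomposition
C = (X @ X.T) / (N - 1)
eigvals, eigvecs = np.linalg.eigh(C)           # eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]              # re-order by decreasing variance
W = eigvecs[:, order[:10]]                     # keep P = 10 basis column vectors

# Projection onto the new feature space: Y^T = X^T W (Equation 2)
Y_T = X.T @ W
print(Y_T.shape)                               # (N, P)
```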

One of the main problems when using PCA for fMRI analysis is the difficulty of capturing small changes in signal variance related to the experimental task. This happens because the principal components are projected onto orthogonal eigenvectors that express only the greatest variance in the data. The orthogonality between the principal components is also the cause of another limitation: if the signals of interest and the signals from other artifacts, such as scanner or physiological noise, are non-orthogonal, important signal will be lost. Finally, this approach, being based on voxel-pair covariance, will certainly miss some overall patterns of association (e.g. some voxels becoming simultaneously activated during an experiment).

ICA attempts to make the outputs as statistically independent as possible while placing no constraints on the transformation matrix:

C = WX (3)

Here X is the t (time-points) × n (voxels) matrix of observed data, W is the unmixing matrix derived from ICA, and C is the c (components) × n component matrix. We can only observe the variables in X and must estimate W and C from X.
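The sketch below estimates such a decomposition with the FastICA algorithm from scikit-learn, which is only one of several ICA estimators used in the fMRI literature and not necessarily the one used in the studies cited here; the data are random stand-ins and the number of components is an arbitrary illustrative choice. Treating the voxels as samples yields spatial components, matching the C = WX formulation above:

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(2)
t, n = 96, 500                        # time-points x voxels, as in the text
X = rng.normal(size=(t, n))           # random stand-in for the observed data matrix

# Spatial ICA: each component is a spatial map over the n voxels
ica = FastICA(n_components=5, random_state=0)
S = ica.fit_transform(X.T)            # (n, c): one independent spatial map per column
C = S.T                               # (c, n): the component matrix of Equation 3
W = ica.components_                   # (c, t): estimated unmixing matrix
A = ica.mixing_                       # (t, c): time-course associated with each component
print(C.shape, W.shape, A.shape)
```

Each row of C can then be mapped back to the voxel grid as a component image, with the corresponding column of A giving its time-course.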

The analysis of fMRI data with the ICA approach has pros and cons. The spatial division of the data into non-overlapping, specific sets provides a convenient way to identify spatial modes that are independent and sparse. Allowing correlation between the time-courses of different components removes the constraint that artifacts unrelated to the experiment must be orthogonal to the signals derived from the experiment (in fMRI analyses, any confounding between signals of interest and artifacts means loss of signal) [3]. One weakness of this approach is that the attempt to find maximally independent maps tends to fragment broad areas of activation into multiple maps whose TCs are all strongly correlated. The ICA approach also makes it difficult to identify non-linear activation relationships between active areas, which is an important issue for the theory of functional integration in the brain.

Clustering algorithms attempt to classify the time-course (TC) signals of the voxels into several patterns according to the similarity among them. This information is organized in clusters and is independent of their spatial neighborhood. These clusters can be described by an average TC or a cluster center obtained by averaging all the TCs of the cluster in question. The resultant output maps can be calculated by labeling the pixels of the same cluster (membership map) or by plotting the distance of the TCs to a given cluster center (distance map).

All clustering algorithms share the same principle: the minimization of an objective function. We describe the K-means clustering (KMC) variant. Let the set {x_j} be composed of N vectors from ℝ^i, where each vector corresponds to a voxel time-course (TC) and i is the number of images taken by the MRI scanner. Next, we consider K clusters with respective centers c_k ∈ ℝ^i, 1 ≤ k ≤ K. The data is partitioned into clusters such that each x_j (a voxel's TC) is assigned to exactly one cluster C_k. The objective of the clustering algorithm is to find this assignment while minimizing an objective function, giving a low-dimensional approximation to the data. This leads to the K-means objective function:

I_w = (1/N) ∑_{k=1}^{K} ∑_{x_j ∈ C_k} d²(x_j, c_k),   K ≤ N   (4)

where d² is the squared distance between two vectors and C_k is the set of TCs assigned to the k-th cluster. The distance d is typically the Euclidean distance, but other metrics are also used.
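A minimal sketch of KMC on voxel TCs using scikit-learn (illustrative only; random data stand in for real TCs and K = 6 is an arbitrary choice). The `inertia_` attribute is the within-cluster sum of squared Euclidean distances, so dividing it by N gives the objective I_w of Equation 4:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(3)
N, i = 500, 96                        # N voxel TCs of length i (random stand-ins)
tcs = rng.normal(size=(N, i))

K = 6
km = KMeans(n_clusters=K, n_init=10, random_state=0).fit(tcs)

labels = km.labels_                   # cluster assignment of each TC (membership map input)
centers = km.cluster_centers_         # one average TC per cluster
I_w = km.inertia_ / N                 # objective of Equation 4 with Euclidean d
print(labels.shape, centers.shape, I_w)
```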

The results of clustering approaches like KMC depend largely on a number of factors. The number of clusters must be specified before running the algorithm, and a choice that does not reflect the structure of the data will produce weak or meaningless results. Regardless of whether such structure exists, the algorithm will partition the data, which makes validation of the results necessary. The choice of the distance metric d also has a deep influence on the results, and the use of a distance function demands pre-processing of the data [4]. As it stands, the partitioning is based on the average of the TC signals. Normally this is not the objective, since what is wanted is to gather into one cluster TCs with similar waveforms (temporal profiles); using raw inputs with the distance metric will merely segment the brain [5]. The final factor to consider is that the K-means algorithm is non-deterministic and its results depend on the cluster initialization.

Self-Organizing Maps

The self-organizing maps (SOM) algorithm introduced by Kohonen [6] is an analog, aptly enough in this context, of the way the human brain organizes information in a logical manner. It is theorized that some cognitive cells in the human brain, like the ones in the visual cortex, are trained in a supervised manner, while others function in a self-organized, unsupervised manner. These cells are organized topologically such that adjacent areas perform related cognitive functions. The SOM method emulates this unsupervised learning. In terms of the algorithm, it tries to reveal structure in the data by bringing together characteristics of two other families of algorithms: Kohonen's maps do not partition the data into independent subsets but model its interrelations, like clustering algorithms, while also performing a lower-dimensional projection, like topology-preserving mapping algorithms. The SOM approach differs from clustering approaches in that it accounts for the neighborhood of cluster centers. One interesting aspect of its use is that it addresses some difficulties of conventional clustering:

• The choice of the number of expected clusters.

• The validity of the clusters.

• The detection of small and large clusters within the same data set.

Kohonen's maps consist of one layer of neurons (the neuron map), usually a two-dimensional grid, where each neuron has a feature array. In the case of fMRI analysis, this array represents a voxel time-course (TC). Each neuron in the map is a cluster center and has as many input connections as there are data samples used in the map training. The training procedure is performed in several steps and has the objective of organizing the voxels' TCs such that similar ones end up close to each other on the map.

The first steps consist of fixing the SOM dimensions and the training parameters. Each node of the map is initialized with random noise. The map is then trained iteratively by selecting a random TC from the measured data, either from the entire imaged volume or from a region of interest. All the selected voxel TCs used as input for the training are also normalized. At each iteration the algorithm looks for the neuron that is most similar to the input and declares it the winner. The degree of similarity can be determined using metrics such as the Euclidean distance or the scalar product between the input and the tested neuron.

The winning cluster center is moved towards the selected input, as are the centers of the neighboring neurons, by an amount that decreases with their distance to the winning neuron, using

n_k(t+1) = n_k(t) + h_ck(t) * (x_i(t) − n_k(t))   (5)

where t is the current iteration, h_ck is a neighborhood function that controls which neurons (the winner and its neighbors) are updated and to what degree, and x_i is the selected TC. As the algorithm proceeds, the centers of the winning cluster and of its current neighbors change less and less, so that the map converges and the quantization of the data is preserved. As the iterations progress, the neighborhood function shrinks the neighborhood, and at the end only individual nodes are updated. This function can be defined in numerous ways, a shrinking Gaussian neighborhood function being one of the most used [7]:

h_ck(t) = α(t) * exp(−‖r_k − r_c‖² / (2σ²(t)))   (6)

where 0 < α(t) < 1 is the learning rate, which decreases over time and controls how fast the neurons learn, and r_k ∈ ℝ² and r_c ∈ ℝ² are the map coordinates of the updated neuron and of the winning neuron, respectively. Finally, σ(t) corresponds to the width of the neighborhood function, which also decreases with time. The end of the training process can be based on a chosen number of iterations or on an evaluation of the quality of the obtained map.
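Putting Equations 5 and 6 together, the sketch below shows one compact way to implement the training loop (Python/NumPy, for illustration only; the decay schedules, map size and initialization scale are assumptions, not the exact choices of our tool, which are given in the Implementation section):

```python
import numpy as np

def train_som(tcs, grid=(10, 10), n_epochs=10, alpha0=0.05, sigma0=6.0, seed=0):
    """Train a SOM on voxel time-courses (one TC per row of `tcs`)."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    n_nodes, dim = rows * cols, tcs.shape[1]
    nodes = rng.normal(scale=0.1, size=(n_nodes, dim))      # random initialization
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], float)

    total_steps = n_epochs * len(tcs)
    step = 0
    for _ in range(n_epochs):
        for x in tcs[rng.permutation(len(tcs))]:            # random presentation order
            frac = step / total_steps
            alpha = alpha0 * 0.01 ** frac                   # decaying learning rate
            sigma = sigma0 * 0.1 ** frac                    # shrinking neighborhood width
            # winner: node most similar to the input (Euclidean distance)
            winner = np.argmin(((nodes - x) ** 2).sum(axis=1))
            # Gaussian neighborhood (Equation 6) and node update (Equation 5)
            d2 = ((coords - coords[winner]) ** 2).sum(axis=1)
            h = alpha * np.exp(-d2 / (2.0 * sigma ** 2))
            nodes += h[:, None] * (x - nodes)
            step += 1
    return nodes, coords

# toy usage with random stand-ins for normalized voxel TCs
tcs = np.random.default_rng(1).normal(size=(400, 96))
nodes, coords = train_som(tcs)
print(nodes.shape)                 # (100, 96): one exemplar TC per map node
```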

Map training yields a set of small clusters, as many as the map has nodes. These clusters can then be combined to form larger superclusters, which can be done in a number of ways. One approach is to add small clusters interactively to the superclusters with the support of some visualization technique [8]. Other ways include automatic calculation of the superclusters by means of constrained (need-driven) clustering using metrics like least mutual distance [9] or least-squares distance [10], or by data-driven clustering such as fuzzy c-means clustering [11]; a small sketch of the automatic idea is given below.
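As a sketch of automatic supercluster formation, the snippet below groups node exemplars with Ward agglomerative clustering from SciPy; this is only an illustration of the idea of merging similar node exemplars, not the least-mutual-distance or contiguity-constrained methods of [9] and [13], and the cut into 3 superclusters is an arbitrary choice.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# `nodes`: (n_nodes, dim) exemplar TCs of a trained map (random stand-in here)
nodes = np.random.default_rng(2).normal(size=(100, 96))

Z = linkage(nodes, method="ward")                        # agglomerative merging of node exemplars
super_labels = fcluster(Z, t=3, criterion="maxclust")    # cut the tree into 3 superclusters
print(super_labels)                                      # supercluster index of every map node
```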

Implementation

Our analysis of the performance of the SOM algorithm is based on two experiments, one involving stimulation of the auditory cortex and the other stimulation of the visual cortex.

The first data set, obtained from the first experiment, comprises whole-brain EPI-BOLD images acquired on a 2T Siemens MAGNETOM Vision system. 96 acquisitions were made with TR = 7 s, and each acquisition consisted of 64 contiguous slices (64 x 64 x 64 voxels of 3 mm x 3 mm x 3 mm). This was a block-design experiment in which auditory stimulation was alternated with rest periods. The blocks of auditory stimulation consisted of bi-syllabic words presented to the subject at a rate of 60 per minute. The experiment was conducted by Geraint Rees under the direction of Karl Friston and the FIL methods group. It has not been formally written up and is freely available for education and evaluation purposes. The data set and a more detailed description can be found at http://www.fil.ion.ucl.ac.uk/spm/data/auditory/. Chapter 28 of the SPM5 manual (http://www.fil.ion.ucl.ac.uk/spm/doc/spm5_manual.pdf) illustrates a step-by-step analysis of this data set using the SPM5 package and presents the respective final results. We will compare those results to the ones obtained with our SOM approach on the same data set.

The second data set is composed of whole-brain EPI-BOLD images acquired on a 3T Philips MRI system. 108 acquisitions were made with TR = 3 s, and each acquisition consisted of 40 contiguous slices (80 x 80 x 40 voxels of approximately 2.875 mm x 2.875 mm x 3 mm). This experiment used a rapid event-related paradigm in which pairs of faces were presented in one of 6 possible orientations: 0, 60, 120, 180, 240 or 300 degrees. The experiment had previously been analyzed with a hypothesis-driven approach by another research team [12] using FSL (the FMRIB Software Library, www.fmrib.ox.ac.uk/fsl) for the preprocessing and the application of the GLM. In this particular experiment, that team tried to find not only activation in the visual cortex (Figure 4C) but also more specialized zones within this cortex (Figure 5C) related to the face inversion effect (FIE). Using the SOM approach, we first explored the data to find one zone corresponding to the visual cortex activation, similar to the one found with the GLM approach. Later we also tried to find smaller specialized zones of activation within this same zone.

In the analyses using the model-driven approach, a single-subject analysis was performed in the first case and a multi-subject analysis in the second. We will compare our results with the results of those analyses, but using single-subject analysis in both cases. In the second experiment we opted to choose one subject at random instead of using the data from all of them.

To test the interest and validity of the SOM algorithm for fMRI analysis, both on the data sets described above and in general, we developed a software package using Matlab and the C programming language. For the two experiments the raw data was preprocessed using the SPM5 package. In both cases the steps were the same:

1. Spatial preprocessing with realignment of the fMRI images.

2. Coregistration between structural and functional data.

3. Normalization of the data onto a standard anatomical template.

4. Smoothing of the data using a Gaussian smoothing kernel of 8.

In both cases, each time-course had its mean subtracted. In the fMRI images the signal corresponding to the BOLD response is very low compared to the structural signal. This mean-subtraction step was performed automatically by our application; its objective is to normalize the data so that only its variance is taken into account, and to avoid convergence of the SOM algorithm being driven by the mean values of the time-courses. Another step taken before running the algorithm was the application of an intensity threshold to both data sets, as a means of excluding from the analysis voxels outside the brain.
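A sketch of these two preparation steps, assuming the preprocessed volumes are available as a 4-D array of shape (x, y, z, t); the thresholding rule shown (a fraction of the maximum mean intensity) is only one plausible choice and not necessarily the one used in our tool:

```python
import numpy as np

def prepare_tcs(data, threshold_frac=0.2):
    """Demean each voxel TC and keep only voxels above an intensity threshold."""
    x, y, z, t = data.shape
    tcs = data.reshape(-1, t).astype(float)               # one row per voxel
    mean_img = tcs.mean(axis=1)
    mask = mean_img > threshold_frac * mean_img.max()     # crude exclusion of out-of-brain voxels
    tcs = tcs[mask]
    tcs -= tcs.mean(axis=1, keepdims=True)                # subtract each TC's own mean
    return tcs, mask

# toy usage with a random 4-D volume standing in for the preprocessed images
data = np.random.default_rng(4).random((64, 64, 16, 96))
tcs, mask = prepare_tcs(data)
print(tcs.shape)
```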

After the steps described above, we applied our algorithm. The metric used to determine the degree of similarity between time-courses was the Euclidean distance. The chosen neighborhood function was the Gaussian neighborhood function defined in Equation 6. A square grid of 10 x 10 was used throughout both experiments, giving a total of 100 nodes per map, each node representing a mean of about 40-50 TCs. A map of 100 exemplar time-courses, one per node, seems in general an ample enough size to classify the 5 to 6 possible fMRI cluster types (activation, head motion, functional connectivity, and noise, among other possibilities). The SOM map was initialized with random noise. At each iteration of the algorithm, all time-courses of interest were presented to the map. The training of the neuron map was done in a two-stage process. In the first stage the number of iterations was set to 10, meaning that the whole data set was processed 10 times by the algorithm; the initial Gaussian neighborhood kernel was set to 6 and the initial learning rate to 0.05. In the second stage, called calibration, the total number of iterations was set to 100, the initial Gaussian neighborhood kernel to 3 and the initial learning rate to 0.001. A power-series function was used for updating the training parameters (neighborhood kernel and learning rate).
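The exact power series used in our implementation is not reproduced here, but the sketch below shows one common form of such a schedule, decaying a parameter from an initial towards a final value over a fixed number of updates; the numeric values are illustrative, taken from the stage-1 learning rate quoted above and an assumed end value.

```python
def power_series(initial, final, step, total_steps):
    """Power-series decay from `initial` towards `final` over `total_steps` updates."""
    return initial * (final / initial) ** (step / total_steps)

# e.g. a learning rate decaying from 0.05 towards an illustrative end value of 0.001
alphas = [power_series(0.05, 0.001, s, 1000) for s in range(1000)]
```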

In the case of the first experiment, the results were not entirely satisfactory, so a second analysis was made. In this second approach we selected a smaller ROI to be analyzed by the algorithm: only voxels from the slices known to contain the expected activation area (the auditory cortex) were presented to the SOM, rather than the whole brain. As we will see, the definition of a smaller ROI produces much more conclusive results.

The largest data set analyzed was composed of 108 frames and an ROI of 52779 voxels. In this case it took about 7 minutes to train the map on a virtual machine with a processor equivalent to 2.2 GHz and 1024 MB of RAM.

The final results were analyzed with the support of a home-made visual tool. By trial and error, we merged map nodes into superclusters until we obtained areas of activation similar to those expected from the experiments. Nodes that represented scattered voxels in the brain were considered some kind of noise and were discarded as contributions to an activation of interest. Besides trying to identify previously known regions of activation, found in the hypothesis-driven analyses, we also tried to explore the capability of the SOM to find other zones of activation related to the experiment. Although we do not have the expertise or the knowledge to interpret these results, this exercise is intended to emphasize the SOM's capability of discovering unexpected brain behaviors that may be worth exploring.

Results

For each experiment a 10 x 10 self-organizing map with 100 nodes was obtained. As already mentioned, this map was explored using a visualization tool with the objective of finding zones of activation similar to those found by hypothesis-driven analysis. In this section we show our results and compare the SOM approach with the results of the other, inferential data analysis paradigms.

Experiment 1: auditory fMRI data. First we present the results achieved with SPM by applying a t-contrast with p = 0.05. Using an adequate design matrix, the results obtained are shown in Figures 1C and 2C. The data was also analyzed using our SOM algorithm. First we trained the map using the time-courses from the whole brain and found similar areas of activation in the auditory cortex (Figure 1B) by defining a supercluster composed of 4 nodes (Figure 1A). Although auditory cortex regions were found by our algorithm, we can see that other areas outside this cortex are also activated. It is possible that these other regions are related to the experiment and the auditory stimulus, since they are strongly correlated with it. Nevertheless, one of our objectives was to demonstrate that our algorithm could deliver results similar to the hypothesis-driven ones. To see if we could obtain better results, closer to the ones delivered by SPM, we trained another map. This time we used only a data set composed of the slices of the brain known to contain the wanted areas of activation, i.e. the slices where the auditory cortex is located. Defining a supercluster of 2 nodes (Figure 2A), we achieved more satisfactory results, as can be seen in Figure 2B and by comparing it with Figure 2C.

Fig. 1. (A) Supercluster formed by merging four nodes of a SOM obtained with whole-brain data training (auditory experiment). The supercluster was formed interactively by trying to find activation zones similar to the ones found with the GLM approach in Figure 1C. (B) Zones of activation (SOM approach) of the auditory experiment represented by the nodes of the supercluster in Figure 1A. (C) Zones of activation obtained from the auditory experiment with the GLM approach.

In a second phase of the analysis of the trained map, we tried to find other, unknown homogeneous zones of activation as a means of exploring unknown brain behavior. This is a good example of how the SOM can be used when analyzing brain function. We found one other interesting zone (Figure 3B), represented by 5 nodes of the map (Figure 3A). Although we do not have sufficient knowledge to interpret this result, we know that the homogeneous area found represents a certain behavior of the brain during the experiment. Data-driven approaches find structure in the data, but they do not give meaning to the divisions made. As we will discuss later, the SOM is a good way to explore brain function when information about the experiment is not available or when we have complex experiments that are difficult to model. Because of this, the SOM, like other data-driven approaches, must most of the time be complemented with model-driven methods and the expertise of the researchers, so that meaning can be given to the division of the data.

Fig. 2. (A) Supercluster formed by merging two nodes of a SOM obtained by training on data from slices known to contain activations of interest (auditory experiment). The supercluster was formed interactively by trying to find activation zones similar to the ones found with the GLM approach in Figure 2C. (B) Zones of activation (SOM approach) of the auditory experiment represented by the nodes of the supercluster in Figure 2A. (C) Zones of activation obtained from the auditory experiment with the GLM approach.


Fig. 3. (A) Supercluster formed by merging five nodes of a SOM obtained with whole-brain data training (auditory experiment). The nodes were merged interactively by searching for unexpected homogeneous zones of activation. (B) Zones of activation (SOM approach) of the auditory experiment represented by the nodes of the supercluster in Figure 3A.

Experiment 2: visual fMRI data. The results obtained with a GLM approach using FSL are shown in Figure 4C. These areas of activation are the result of a multi-subject analysis, whereas our analysis used data from only one randomly chosen subject. Knowing this, it is reasonable to expect that our results cannot be a perfect match to those shown before. In Figure 4B we see the areas of activation found using our SOM analysis algorithm. By grouping 15 nodes into a supercluster (Figure 4A), we can see activations in the zone of the visual cortex similar to the ones obtained using FSL and multi-subject analysis.

Also with the GLM approach, five functional ROIs were identified in the subjects in a second phase (Figure 5C). These areas are identified in the figure by three clusters of different colors. We tried to explore the capability of the SOM to find similar areas and smaller clusters within the data set. For this, we divided our 15-node supercluster into a set of three smaller superclusters (Figure 5A). As we can see in Figure 5B, with this division we found three homogeneous and symmetric areas of activation. Because we do not have the expertise, and because we are using a model-free approach without any information about the experiment, we cannot give a meaningful interpretation of this division. Nevertheless, we know that the ROIs found represent different behaviors, and from this we can propose the hypothesis that these areas perform different functions within the visual cortex. Also, if we compare the three smaller clusters obtained, we can see some similarities to the zones found by the GLM approach in Figure 5C.

Fig. 4. (A) Supercluster formed by merging fifteen nodes of a SOM obtained with whole-brain data training (visual experiment). The supercluster was formed interactively by trying to find activation zones similar to the ones found with the GLM approach in Figure 4C. (B) Zones of activation (SOM approach) of the visual experiment represented by the nodes of the supercluster in Figure 4A. (C) Zones of activation obtained from the visual experiment with the GLM approach.

Fig. 5. (A) Division of the supercluster in Figure 4A into three smaller superclusters (visual experiment). This division was made interactively by trying to find zones symmetric and similar to the ones found with the GLM approach (Figure 5C). (B) Zones of activation (SOM approach) of the visual experiment represented by the nodes of the three superclusters in Figure 5A. (C) Three distinct zones of activation obtained from the visual experiment with the GLM approach.

Discussion

The self-organizing map (SOM) approach was applied to two experiments. These experiments had previously been analyzed by other research teams using model-driven approaches based on the general linear model (GLM). The GLM method requires knowledge about the experiment and involves the construction of a model function that describes the experimental protocol. Our SOM approach did not require any external reference. Model-free methods offer a great alternative for fMRI analysis, as they analyze the data based on the signal alone, without user bias. This is a very important characteristic in cases where we have complex experimental protocols or when we are dealing with unknown response functions that can hardly be modeled correctly by the user (e.g. memory studies). Nevertheless, model-free approaches only find structure in the data without giving that structure any meaning. The researchers' expertise is needed to interpret the results, and these methods may have to be complemented with inferential analysis as a means of associating the partitioning of the data with the experimental protocol. In our analysis we did not have the expertise to interpret the structuring of the data made by the SOM algorithm. In our case, we used the partitioning of the data delivered by the SOM and tried to isolate homogeneous zones of activation in the brain that were similar to those found by other approaches and that were located in the known cortices related to the experiment. The results were within our expectations. With our SOM algorithm we achieved results similar to those found with the model-driven methods, finding zones of activation in the brain within the expected cortices (the auditory cortex in the first experiment and the visual cortex in the second).

Beyond finding similar zones of activation, we also tried to find homogeneous zones not represented in the results delivered by the GLM approach and outside the cortices supposedly involved in the experiments. We did this for the auditory experiment and found a zone of interest outside the auditory cortex that could be somewhat related to the experimental protocol (Figure 3B). Although this is not conclusive, the example is intended to highlight the capability of the SOM to find unexpected responses of the brain and its value as a research tool.

The SOM method also offers a good alternative to other data-driven approaches. It does not have to deal with the orthogonality and independence constraints imposed on the data by the PCA and ICA approaches, respectively, and it addresses some of the difficulties of other clustering algorithms, as already mentioned. K-means clustering (KMC), for example, would ideally be able to find differently sized and differently populated clusters. Unfortunately this property cannot be assumed in the case of fMRI analysis, since the clusters in the data are severely blurred and have high mutual proximity. This problem of KMC is even worse when we normalize the data to make the clustering more sensitive to the dynamics of the brain rather than to the mean values of the time-courses. KMC can still do well when separating noise from the signal of interest. However, if the algorithm is set to find a small number of clusters, it will miss small zones of activation: these zones will simply be grouped into a single larger cluster.

KMC, by minimizing the sum of squared distances, has a tendency to equalize the sizes of the identified clusters, which makes the detection of small and large clusters within the same data set difficult. This can be mitigated by setting a large number of initial clusters. Although this makes it possible to find smaller zones of activation, these zones will be represented by different, independent clusters. A way to solve this is to merge the small clusters that are supposed to belong together. On the other hand, if we have a larger cluster representing a larger zone of activation and we want to divide it into smaller specialized zones, the solution is to partition it into smaller clusters. Both options are supported by the SOM approach. This was done in experiment 2 by dividing the larger cluster (Figure 4A) into a set of 3 smaller clusters (Figure 5A). With this operation it was possible to find more specialized zones with different behaviors within the visual cortex. This characteristic of the SOM also deals with the problem of the validity of the partitioning in KMC: by merging nodes interactively, it is possible to define which nodes should belong together within the same cluster and which nodes do not contribute to activations of interest.

Regarding the SOM algorithm itself, it is clear that the results depend on a number of factors and variables:

• The method of initialization of the map.

• The number of iterations and the size of the map.

• The learning rate and the neighborhood width variables.

• The distance metric.

• Functions that define the updating of the variables and the neighborhood function.

• The method to form superclusters.

All these factors reinforce the idea that running the algorithm with alternative functions and values may be necessary in order to find the best results. These variables can all be chosen using common sense, by experimenting, or with the help of common-practice references.

Automatic mechanisms can also be used to optimize the algorithm or to reduce user bias. For example, a mean squared error (MSQE) [10] between time-courses can be calculated at each iteration of the algorithm to assess its convergence. With this mechanism it is possible to stop the algorithm at an iteration where no further appreciable changes in the map occur. The calculation of superclusters can also be done automatically with the support of methods like contiguity-constrained clustering [13], which merges neighboring nodes with least mutual distances. Although the automatic formation of superclusters seems to be a good way to reduce user bias, we should not underestimate the power of an interactive method (our approach) based on the researchers' expertise and on the visual capabilities offered by the SOM's topographical mapping of high-dimensional data.
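For instance, the mean squared quantization error mentioned above can be computed as in the sketch below (illustrative Python with random stand-in arrays; in use it would be evaluated after each pass over the data and training stopped once its change falls below a tolerance):

```python
import numpy as np

def mean_squared_quantization_error(tcs, nodes):
    """Average squared distance from each TC to its best-matching map node."""
    d2 = ((tcs[:, None, :] - nodes[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).mean()

# in use: evaluate after every pass over the data and stop once the change is negligible
tcs = np.random.default_rng(6).normal(size=(400, 96))
nodes = np.random.default_rng(7).normal(size=(100, 96))
print(mean_squared_quantization_error(tcs, nodes))
```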

The SOM learning rate or neighborhood contraction rate can also be optimized, for example by finding which values attain the lowest total squared error [10]. This is a useful alternative when choosing the best values for these variables.

As we have also observed, choosing smaller ROIs can help improve the results returned by the SOM, as we eliminate from the training process the contributions of less interesting time-courses. Although a 10x10 grid of 100 nodes normally seems sufficient to characterize the different types of signals, a larger grid can sometimes be a better option for finding even smaller, more specialized zones of activation.

Extracting extra properties from the maps delivered by the SOM can also help to characterize the data better. Gradient images, which show the average distance between neighboring nodes, and frequency plots, which count the number of time-courses assigned to each node, can help in determining how many distinct clusters exist in the data. Also, calculating the average spatial distance between the nodes in the map can make it easier to detect which nodes in the feature space form clusters in the image plane.
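A sketch of these two diagnostics for a trained map (Python, illustrative only; `tcs`, `nodes` and `coords` are shaped like the arrays in the earlier SOM training sketch, with random stand-ins used here):

```python
import numpy as np

def map_diagnostics(tcs, nodes, coords, grid=(10, 10)):
    """Gradient image (mean distance to 4-connected neighbor nodes) and hit counts per node."""
    rows, cols = grid
    # best-matching node of every TC -> frequency plot
    winners = np.argmin(((tcs[:, None, :] - nodes[None, :, :]) ** 2).sum(axis=2), axis=1)
    hits = np.bincount(winners, minlength=rows * cols).reshape(rows, cols)

    grid_coords = coords.astype(int)
    gradient = np.zeros((rows, cols))
    for k, (r, c) in enumerate(grid_coords):
        neigh = [j for j, (rr, cc) in enumerate(grid_coords) if abs(rr - r) + abs(cc - c) == 1]
        gradient[r, c] = np.mean([np.linalg.norm(nodes[k] - nodes[j]) for j in neigh])
    return gradient, hits

# toy usage with random stand-ins shaped like the earlier SOM sketch
rng = np.random.default_rng(5)
gradient, hits = map_diagnostics(rng.normal(size=(400, 96)), rng.normal(size=(100, 96)),
                                 np.array([(r, c) for r in range(10) for c in range(10)], float))
print(gradient.shape, hits.sum())
```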

Outlook: The SOM proved to be a very flexible approach that, as we have discussed, addresses the typical clustering problems while maintaining the advantages of this kind of approach. It can also be used as an initialization method for other algorithms, such as fuzzy c-means clustering. The algorithm's topological ordering, with the help of visualization techniques, proved to be a great means of visualizing complex data and investigating the overall dynamics of an experiment. On the downside, the SOM approach depends on many factors, such as the initialization and the definition of the parameters. Even so, we discussed a number of methods that can be used to introduce more automation into the algorithm and reduce user bias.

We have tried to illustrate how the SOM can be used to analyze fMRI data, showing some of the results obtained with this approach. Our study cannot answer the question of statistical significance, but we have tried to discuss and show why the SOM approach can be interesting for fMRI analysis. With this in mind, we hope this paper encourages further research on this subject and the achievement of promising results, bringing us one step closer to understanding more clearly how our brain works.

References

[1] Friedrich T. Sommer and Andrzej Wichert, editors. Exploratory Analysis and Data Modeling in Functional Neuroimaging. MIT Press, 2003.

[2] K.J. Friston. Analysing brain images: principles and overview. In Human Brain Function, pages 25–42. Academic Press, USA, 1997.

[3] Karl J. Friston. Modes or models: a critique on independent component analysis for fMRI. Trends in Cognitive Sciences, 2:373–375, 1998.

[4] Ray L. Somorjai and Mark Jarmasz. Exploratory analysis of fMRI data by fuzzy clustering: philosophy, strategy, tactics, implementation. In Exploratory Analysis and Data Modeling in Functional Neuroimaging. MIT Press, 2003.

[5] Gordon B. Scarth, M. McIntyre, B. Wowk, and Ray L. Somorjai. Detection of novelty in functional images using fuzzy clustering. In Proceedings of the Society of Magnetic Resonance and the European Society for Magnetic Resonance in Medicine and Biology, Nice, France, August 19–25, 1995.

[6] Teuvo Kohonen. Self-Organizing Maps. Springer-Verlag, 1995.

[7] Teuvo Kohonen. The self-organizing map. Neurocomputing, 21:1–6, 1998.

[8] Erkki Häkkinen and Pasi Koikkalainen. SOM based visualization in data analysis. In Lecture Notes in Computer Science, pages 601–606. Springer Berlin/Heidelberg, 1997.

[9] F. Murtagh. Interpreting the Kohonen self-organizing feature map using contiguity-constrained clustering. Pattern Recognition Letters, 16:399–408, 1995.

[10] S.J. Peltier, T.A. Polk, and D.C. Noll. Detecting low-frequency functional connectivity in fMRI using a self-organizing map (SOM) algorithm. Human Brain Mapping, 20:220–226, 2003.

[11] Kai-Hsiang Chuang, Ming-Jang Chiu, Chung-Chih Lin, and Jyh-Horng Chen. Model-free functional MRI analysis using Kohonen clustering neural network and fuzzy c-means. IEEE Transactions on Medical Imaging, 18:1117–1128, 1999.

[12] Catarina Saiote, Joana Silva, Carlos Gomes, Martin Lauterbach, Sofia Reimão, and Patricia Figueiredo. Parametric fMRI correlates of faces at multiple orientations.

[13] H. Fischer and J. Hennig. Neural network-based analysis of MR time series. Magnetic Resonance in Medicine, 41(1):124–131, January 1999.
