i. introduction - university of haifacs.haifa.ac.il/~manevitz/computationalcognitioncourse/… ·...


Identifying Human Memory Encoding Mechanisms from Physiological fMRI data via Machine Learning Techniques

Asaf Gilboa1, Hananel Hazan2, Ester Koilis2, Larry M. Manevitz2, Tali Sharon3*

1 The Rotman Research Institute, Canada

2 Computer Science Department, University of Haifa, Israel

3 Psychology Department, University of Haifa, Israel

Abstract - Neuropsychological theories postulate that there are multiple memory systems in the brain, but whether declarative memory is a unitary memory system remains controversial. In this study, we classified two distinct declarative memory acquisition mechanisms directly from physiological data by applying machine learning techniques to functional MRI (fMRI) scans of subjects, thereby adding explicit physiological support for the existence of multiple declarative memory systems.

The data were gathered in previous experiments designed so that subjects acquired identical declarative information but used different processes to do so. The analysis was based on multi-voxel pattern analysis of neural information obtained from fMRI signals. Support Vector Machine (SVM) classifiers identified memory patterns in the complex, high-dimensional and noisy fMRI activations evoked while participants acquired novel information by one of two methods, fast mapping encoding or explicit encoding, and thereby predicted whether the subject succeeded in the later recollection attempt for data acquired with each encoding method. A further classifier succeeded in distinguishing the type of encoding used to acquire the novel knowledge: fast mapping or explicit encoding. Finally, a multivariate “searchlight” method was applied to construct qualitative brain maps for both paradigms, enabling identification of the activation patterns associated with each method and highlighting the physiological differences between them.

Keywords - Machine Learning, fMRI, SVM, Memory Encodings, Human, Word Learning

I. Introduction

The present work uses machine learning techniques to demonstrate the distinctiveness of Fast Mapping (FM). FM is a neurocognitive mechanism that enables rapid acquisition of novel declarative information (arbitrary associations) independently of the hippocampus (Sharon, Moscovitch, & Gilboa, 2011). This mechanism is known to support vocabulary acquisition in children after as little as a single exposure to a word-object association (Carey & Bartlett, 1978). FM allows the child to rapidly map a word to its referent through logical hypothesis formation that probably relies on disjunctive syllogism: the new word names either the familiar object or the novel one; the familiar object already has a name; therefore the word names the novel object. Despite the literature on the various aspects of FM as a word-learning mechanism in children, little is known about its characteristics in adults or about its neural substrate. FM may serve as a general learning mechanism, not dedicated solely to word learning, and as such should be accessible to adults. Sharon et al. (Sharon, Moscovitch, & Gilboa, 2011) recently demonstrated that adults with extensive damage to the medial temporal lobe and the hippocampus were able to acquire novel declarative associations through FM despite a profound impairment in declarative learning through explicit encoding.

The goal of the present study is to investigate the neural basis of FM learning in adults, and the possible role of FM as a neurocognitive mediator for the acquisition of novel declarative semantic memories. The hypothesis is that FM declarative learning depends on cortical structures that are distinct from those essential for learning declarative associations through a matched explicit episodic encoding control paradigm (EE).

To address this question, the neuroanatomical correlates of FM and EE learning were explored using event-related functional magnetic resonance imaging (ER-fMRI) (Sharon, 2010). fMRI is a noninvasive technique for investigating the neural correlates of cognitive processes; it measures the hemodynamic response (i.e., the change in blood oxygenation level) related to neural activity in the brain. fMRI combines the high-spatial-resolution anatomical imaging of conventional MRI with the hemodynamic specificity of nuclear tracer techniques (positron emission tomography), allowing spatially accurate mapping of human brain function onto the underlying anatomy. ER-fMRI is a more recently developed fMRI paradigm designed to measure regional responses to single sensory or cognitive events, in contrast to "blocked" designs, in which activity is measured over blocks consisting of several trials.

The fMRI data were gathered while participants performed the information recollection task. The task was designed so that successful acquisition of novel associations was based either on incidental fast mapping (FM) or on explicit encoding (EE). In Sharon (2010), the data were analyzed with SPM5 (SPM, 2011) to identify the regions of interest relevant to the task.

This work uses the data collected in those previous experiments (Sharon, 2010). We asked the following questions about the ability of machine learning techniques to analyze them: 1) Is it possible to distinguish between the “recollection success” and “recollection failure” conditions in EE-based tasks? 2) Is it possible to distinguish between the “recollection success” and “recollection failure” conditions in FM-based tasks? 3) Can we predict which of the original mapping paradigms, FM or EE, was used by a participant in the “recollection success” condition? 4) Can we identify the brain areas whose activity is associated with the FM and EE mechanisms?

In this paper, we show that the answer to these questions is affirmative.

Interpreting brain imaging experiments requires analysis of complex, multivariate data, and the appropriate method depends on the specific research question. One goal is brain decoding: retrieving stimuli, mental states, behaviors and other variables of interest from the raw data, thereby showing that the data contain information about them (answering the question “is there information about a variable of interest?”). Questions 1-3 belong to this category. Decoding alone is usually not enough, however, and the research question often requires finding out how the information is mapped onto activity patterns in particular brain regions. This is brain mapping (answering the question “where does the information reside in the brain?”). Question 4 falls into this category.

In addition, fMRI analysis methods can be categorized by the number of variables included in the analysis. Univariate methods perform voxel-wise analysis, whereas multivariate methods provide inference about larger parts of the brain, or the entire brain, simultaneously. Univariate methods are widely applied in neuroscience. The standard method is Statistical Parametric Mapping (SPM), which assumes a linear relationship between neural activity and the task and uses the general linear model (GLM) to perform regression analysis (Friston, Holmes, Worsley, Poline, Frith, & Frackowiak, 1994). The motivation for using multivariate learning techniques in this work stems from the known limitations of the GLM.
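To make the GLM approach concrete, the following is a minimal NumPy sketch of a voxel-wise GLM, not the SPM5 implementation: event onsets are turned into stick functions, convolved with a double-gamma HRF, and every voxel's time series is regressed on that regressor independently. The HRF parameters (response shape 6, undershoot shape 16, ratio 1/6) follow the commonly used SPM-style approximation and, like the function names, are assumptions of this sketch.

```python
import math

import numpy as np


def canonical_hrf(tr, duration=32.0, peak=6.0, undershoot=16.0, ratio=1.0 / 6.0):
    """Double-gamma HRF sampled every `tr` seconds (SPM-style approximation), summing to 1."""
    t = np.arange(0.0, duration, tr)
    response = t ** (peak - 1) * np.exp(-t) / math.gamma(peak)
    dip = t ** (undershoot - 1) * np.exp(-t) / math.gamma(undershoot)
    hrf = response - ratio * dip
    return hrf / hrf.sum()


def design_matrix(onsets_s, n_scans, tr):
    """Task regressor (event sticks convolved with the HRF) plus an intercept column."""
    sticks = np.zeros(n_scans)
    for onset in onsets_s:
        sticks[int(round(onset / tr))] = 1.0
    regressor = np.convolve(sticks, canonical_hrf(tr))[:n_scans]
    return np.column_stack([regressor, np.ones(n_scans)])


def voxelwise_glm(X, Y):
    """OLS fit of Y (scans x voxels) on X (scans x regressors); each voxel column is fitted independently.

    Returns the beta matrix and the t-statistic of the task regressor for every voxel.
    """
    betas, _, _, _ = np.linalg.lstsq(X, Y, rcond=None)
    residuals = Y - X @ betas
    dof = X.shape[0] - np.linalg.matrix_rank(X)
    sigma2 = (residuals ** 2).sum(axis=0) / dof
    contrast = np.zeros(X.shape[1])
    contrast[0] = 1.0  # selects the task regressor
    variance_factor = contrast @ np.linalg.pinv(X.T @ X) @ contrast
    t_stats = (contrast @ betas) / np.sqrt(sigma2 * variance_factor)
    return betas, t_stats
```

For example, `voxelwise_glm(design_matrix(onsets, 241, 2.0), Y)` with a (241, n_voxels) array `Y` returns one beta per regressor and one t-value per voxel. Each voxel is tested on its own, which is exactly the univariate property discussed next.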

One limitation is that the method is univariate: possible interactions between voxels are not taken into account, which weakens its overall inferential power. Another significant disadvantage is its assumption of a particular fMRI response model driving the regression: voxels in the GLM are scored by a univariate analysis of the correlation between the measured signal and the estimated Hemodynamic Response Function (HRF). Recent, more sophisticated HRF models try to better capture the complex structure of the fMRI response (Zheng, Martindale, Johnston, Jones, Berwick, & J., 2002). Nonetheless, these parametric models still encode an idealized expected fMRI signal, ignoring confounds in the design protocol as well as dependencies due both to brain structure (e.g., proximity to a large vessel, location) and to the cognitive/perceptual tasks under investigation. In fact, most cognitive fMRI research to date appears to focus exclusively on estimating the magnitude of evoked activations and pays little attention to the co-activation of different areas or to HRF variability. A recent survey of 170 fMRI studies found that 96% of experiments used a canonical HRF model, thus ignoring differences in its shape between individuals or brain areas (Grinbald, Wager, Lindquist, & Hirsch, 2008).
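The cost of ignoring between-voxel interactions can be illustrated with a toy simulation (synthetic data, an illustrative assumption rather than anything measured in this study): two voxels share a large common fluctuation, and only one carries a small condition effect. Each voxel alone separates the conditions poorly, whereas their difference, a linear combination that a multivariate method can learn, separates them well.

```python
import numpy as np


def dprime(a, b):
    """Separation of two samples in pooled-standard-deviation units."""
    pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2.0)
    return abs(a.mean() - b.mean()) / pooled_sd


def simulate(n_per_class=2000, effect=0.5, shared_sd=3.0, private_sd=0.3, seed=0):
    """Return (voxels, labels) for two correlated voxels; only voxel 0 carries the effect."""
    rng = np.random.default_rng(seed)
    y = np.repeat([0, 1], n_per_class)
    shared = rng.normal(0.0, shared_sd, y.size)  # fluctuation common to both voxels
    v0 = shared + rng.normal(0.0, private_sd, y.size) + effect * y
    v1 = shared + rng.normal(0.0, private_sd, y.size)
    return np.column_stack([v0, v1]), y


X, y = simulate()
univariate = [dprime(X[y == 0, j], X[y == 1, j]) for j in range(2)]
combined = X[:, 0] - X[:, 1]  # linear combination that cancels the shared fluctuation
multivariate = dprime(combined[y == 0], combined[y == 1])
print("per-voxel d':", np.round(univariate, 2), "combined d':", round(multivariate, 2))
```

With these settings the per-voxel separations stay small while the combined one is several times larger, because the subtraction removes the shared noise that a voxel-by-voxel analysis cannot see.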

In general, is it feasible to use machine learning techniques to predict complex cognitive states? There is no conclusive answer to this question. However, a growing number of studies (Cox & Savoy, 2003; Haxby, Gobbini, Furey, Ishai, Schouten, & Pietrini, 2001; Haynes & Rees, 2005; Kamitani & Tong, 2005; Mitchell, et al., 2004; Mitchell, et al., 2008; Kriegeskorte, Goebel, & Bandettini, 2006) show that machine learning techniques can extract new information from neuroimaging data. Both brain decoding and brain mapping techniques are explored in these works. The approach is usually multivariate, with different strategies for selecting subsets of variables. For example, it was demonstrated (Shinkareva, Mason, Malave, Wang, Mitchell, & Just, 2008) that fMRI reveals differences in neural activity as people think about different items, and that a machine learning classifier can be trained to discover the patterns of activity associated with these items. Moreover, it was shown (Mitchell T., et al., 2008) that a machine learning classifier trained on fMRI data collected from a group of people could successfully identify which item a new person was thinking about, even though the classifier had never seen data from this person (although accuracies vary by person).

The recollection task discussed in this paper is considerably more complex than thinking about a single item, as it involves additional cognitive dimensions, for example, decision making or response production.

Various machine learning classifiers can be used to decode the different variables of interest. Classification is the analogue of regression when the variable being predicted is discrete rather than continuous. Whereas the GLM predicts each voxel's signal from the design matrix, classifiers work in the reverse direction, predicting parts of the design matrix from many input variables. A classifier is a function that takes the values of various features (independent variables, or predictors in regression) in an example (a set of independent variable values) and predicts the class that the example belongs to (the dependent variable). In neuroimaging, the features are voxels and the class is usually the type of stimulus the individual was looking at when the voxel fMRI signals were recorded. The trained classifier is essentially a model of the relationship between the features and the class label. Once trained, the classifier can be used to determine whether the features contain information about the class of the example. Different types of classifiers exist, but in this work we concentrate on the most basic form, a linear classifier, in which the classification function is a linear combination of the features.
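A linear classifier predicts the sign of w·x + b. As a minimal, self-contained sketch of a linear SVM, the code below minimizes the primal SVM objective (L2-regularized hinge loss) by full-batch subgradient descent. This solver is a stand-in chosen for brevity; the study used the linear SVM available through PyMVPA, whose solver is not reproduced here. Labels are assumed to be coded as -1/+1, and the hyperparameter values are illustrative.

```python
import numpy as np


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=300):
    """Minimize lam/2*||w||^2 + mean(max(0, 1 - y*(X@w + b))) by full-batch subgradient descent.

    y must contain -1/+1 labels. Returns the weight vector w and the bias b.
    """
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        violating = margins < 1.0  # samples inside the margin or misclassified
        y_v = y[violating][:, None]
        grad_w = lam * w - (y_v * X[violating]).sum(axis=0) / n
        grad_b = -y[violating].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def predict(w, b, X):
    """Linear decision rule: class +1 if w.x + b >= 0, otherwise -1."""
    return np.where(X @ w + b >= 0.0, 1, -1)
```

The learned weight vector `w` has one entry per voxel. This is what later makes weight-based voxel ranking (as in SVM-RFE) possible.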

Because fMRI datasets usually contain many more voxels than data points, it is often advisable to perform feature selection, a process that reduces the number of features by keeping only the significant ones. Reducing the ratio of features to data points decreases the chance of overfitting and discards non-informative features, so that the classifier can focus on the informative ones. This process can also reduce feature redundancy, which decreases noise in the input.

Both univariate and multivariate approaches to feature selection exist. In univariate selection, each feature is scored individually by a given criterion, and the best-ranked features are selected. In multivariate (forward) selection, new features are added according to how much they improve the classifier, given the features already selected in previous steps. Alternatively, selection can proceed in reverse (backward elimination): the initial set includes all the features, and features are removed as long as performance does not decrease.
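Univariate selection can be sketched as scoring each voxel independently and keeping the top k. The score used here, an absolute two-sample (Welch) t-statistic, is one common choice and an assumption of this sketch; the study's own criteria (Activity, Accuracy, SVM-RFE) are described in the Methods section. Scores should be computed on training data only; otherwise the selection leaks test information into the classifier.

```python
import numpy as np


def t_scores(X, y):
    """Absolute Welch t-statistic per feature (column) for binary labels y in {0, 1}."""
    a, b = X[y == 0], X[y == 1]
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return np.abs(a.mean(axis=0) - b.mean(axis=0)) / se


def select_top_k(scores, k):
    """Indices of the k highest-scoring features, best first."""
    return np.argsort(scores)[::-1][:k]
```

Inside cross-validation, `select_top_k(t_scores(X[train], y[train]), 1000)` would choose the voxels on each fold's training set before the classifier is fitted.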

II. Materials and Data Gathering

Full details of the human experiments briefly described below can be found in Sharon (2010).

Participants

Twenty-five healthy volunteers participated in this study: thirteen performed the FM paradigm and twelve performed the EE paradigm. Of these participants, 15 were male and 10 female; their mean age was 26.64 (SD = 3.41). No significant difference was found either in the age of the participants in the FM (M = 25.38, SD = 2.36) and EE paradigms (M = 28.09, SD = 3.93) [t(23) = 2.03, ns] or in the gender distribution [χ²(1) = 0.03, ns].

Experimental Paradigm

During the experiment, participants performed either the FM or the EE series of tasks. In each paradigm, novel and familiar target trials (either FM or EE trials) were intermixed with baseline trials. Each trial, whether FM, EE or baseline, was composed of the following steps. First, a question or statement was presented both visually and auditorily for 3 seconds. Next, the relevant pictures and a reminder of how to respond appeared for an additional 2 seconds. Next, the participants were given 1.5 seconds to respond while the pictures and the reminder were still on screen. Next, the subjects received feedback on their response, presented on the screen for 0.5 seconds. Finally, a red fixation cross was presented for either 4 seconds in half of the events or 6 seconds in the other half. Thus, each event lasted either 9 or 13 s, a mean of 11 s per event.

In the FM trials, the stimuli were two pictures, one of a novel and one of a familiar animal, fruit, vegetable or flower, and the question was a perceptual question about one of them, the target picture (for example, “Does the chayote have leaves?”). The participants were instructed to press the right button on the response box to answer 'yes' and the left button to answer 'no'. No mention was made of a later memory test.

Figure 1 FM stimuli example.

In the EE trials, one picture, either novel or familiar, was presented alongside a scrambled picture, and participants were explicitly instructed to remember the item for a later test (for example, “Try to remember lornec”). The participants were also asked to look for an x under either the picture or the scrambled picture and, as in the FM paradigm, to press the right or left button on the response box to answer the question.

Finally, in the baseline trials, the participants were presented with two scrambled pictures (the original pictures from the FM paradigm, scrambled) and were asked "Is the picture on the right brighter?" Again, participants answered using the response box, as in the FM and EE trials.

Figure 2 EE stimuli example.

Each experiment, either FM or EE, was administered in 3 runs. The first two runs included 40 events and lasted 8 minutes and 2 seconds each. The last run included 44 events and lasted 8 minutes and 52 seconds. In each run, the events were organized in 5 sequences (E-Prime "lists") of 8 events, with an additional sequence of 4 events in the third run. The events were assigned pseudo-randomly such that each sequence of 8 events contained 4 novel FM/EE target trials, 2 familiar FM/EE target trials and 2 baseline trials. Every run began with 12 seconds of either a reminder of the instructions (in the first run) or a blank screen (in the second and third runs). The images acquired during these 12 seconds were intended to allow global image intensity to reach equilibrium, and they were later excluded from data analysis. Between consecutive sequences of 8 events, a blank screen appeared for 6 seconds. Memory was tested outside the magnet using a 4-alternative forced-choice recognition test in which the label appeared in the center with four pictures around it. Participants had to select the picture that went with the label.
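The pseudo-random structure of the 40-event runs can be sketched as follows: each 8-event sequence holds 4 novel target trials, 2 familiar target trials and 2 baseline trials, and five such sequences make a run. Shuffling each sequence uniformly and independently is an assumption, since the E-Prime list settings are not described. The extra 4-event sequence of run 3 is omitted because its composition is not specified.

```python
import random

SEQUENCE_TEMPLATE = ["novel"] * 4 + ["familiar"] * 2 + ["baseline"] * 2


def make_run(n_sequences=5, seed=None):
    """Return a list of sequences, each an independently shuffled copy of the 8-event template."""
    rng = random.Random(seed)
    run = []
    for _ in range(n_sequences):
        sequence = SEQUENCE_TEMPLATE[:]
        rng.shuffle(sequence)
        run.append(sequence)
    return run


run = make_run(seed=1)
events = [event for sequence in run for event in sequence]
print(len(events), events.count("novel"), events.count("familiar"), events.count("baseline"))
# 40 20 10 10
```

Shuffling within fixed-composition sequences keeps the trial types balanced over time, which a single shuffle of the whole run would not guarantee.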

fMRI Procedure

Imaging was performed on a GE 3T Signa HDx MR system with an 8-channel head coil at the Wohl Institute for Advanced Imaging, Tel Aviv Sourasky Medical Center. The scanning session included T1-weighted anatomical 3D spoiled gradient echo (SPGR) sequences (TR = 9.14 ms, TE = 3.6 ms, flip angle = 13º) obtained with high-resolution 1-mm slice thickness, no interslice skip and a 256x256 matrix. In addition, T2*-weighted functional axial images (TR = 2000 ms, TE = 40 ms, flip angle = 90º) were acquired from the bottom of the cerebrum to the top in 32 contiguous slices aligned parallel to the AC-PC plane, with 5 mm thickness, no interslice skip, a 20 cm field of view and a 64x64 acquisition matrix. The functional images covered the whole cerebrum and yielded 3x3x5 mm voxels. The images were acquired in 3 runs. In each of the first 2 runs, 241 images were acquired (7712 slices per run); in the third run, 266 images were acquired (8512 slices). At the beginning of each run, six images were acquired to allow global image intensity to reach equilibrium; these were later excluded from data analysis.
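The acquisition numbers are mutually consistent, as a short calculation shows: with TR = 2000 ms and 32 slices per volume, 241 volumes give 7712 slices and 482 s (8 min 2 s), and 266 volumes give 8512 slices and 532 s (8 min 52 s), matching the run durations stated above. The six discarded volumes correspond to the initial 12 s. The helper below is only an arithmetic check written for illustration.

```python
TR_S = 2.0  # repetition time in seconds
SLICES_PER_VOLUME = 32
DISCARDED_VOLUMES = 6


def run_summary(n_volumes):
    """Slice count, run duration and discarded/analysed volumes for one run."""
    seconds = n_volumes * TR_S
    minutes, rest = divmod(int(seconds), 60)
    return {
        "slices": n_volumes * SLICES_PER_VOLUME,
        "duration": f"{minutes} min {rest} s",
        "discarded_s": DISCARDED_VOLUMES * TR_S,
        "analysed_volumes": n_volumes - DISCARDED_VOLUMES,
    }


for n in (241, 241, 266):
    print(run_summary(n))
```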

fMRI Data Processing

Data were preprocessed using SPM5 (SPM, 2011). The functional images were corrected for differences in slice acquisition timing by resampling all slices in time to match the middle slice. This was followed by realignment of the image time series to the first image of the run acquired after the anatomical image (for most subjects, the third run). The data were then spatially normalized to MNI space and smoothed with a Gaussian kernel of 5 mm FWHM.
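Smoothing with a Gaussian kernel of 5 mm FWHM means σ = FWHM / (2√(2 ln 2)) ≈ 2.12 mm. The sketch below converts FWHM to σ in voxel units and applies a separable 3-D Gaussian filter with NumPy. The voxel size passed in is an assumption (the voxel size after normalization to MNI space is not stated here), and this is not SPM5's own smoothing routine.

```python
import numpy as np

FWHM_TO_SIGMA = 1.0 / (2.0 * np.sqrt(2.0 * np.log(2.0)))  # about 0.4247


def gaussian_kernel_1d(sigma_vox, truncate=4.0):
    """Normalized 1-D Gaussian sampled at integer voxel offsets."""
    radius = max(1, int(np.ceil(truncate * sigma_vox)))
    offsets = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (offsets / sigma_vox) ** 2)
    return kernel / kernel.sum()


def convolve_same(signal, kernel):
    """Centred convolution with zero padding; output has the length of `signal`."""
    half = len(kernel) // 2
    return np.convolve(signal, kernel, mode="full")[half:half + len(signal)]


def smooth_volume(volume, fwhm_mm, voxel_size_mm):
    """Separable 3-D Gaussian smoothing; voxel_size_mm gives the voxel edge length per axis."""
    out = np.asarray(volume, dtype=float)
    for axis, size_mm in enumerate(voxel_size_mm):
        kernel = gaussian_kernel_1d(fwhm_mm * FWHM_TO_SIGMA / size_mm)
        out = np.apply_along_axis(convolve_same, axis, out, kernel)
    return out
```

Because the Gaussian is separable, three 1-D passes give the same result as one 3-D convolution, and anisotropic voxels simply get a different σ (in voxels) per axis.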

Each data point used for analysis was constructed from the scan acquired at TR = 1 (related to the presentation of the stimuli and the reminder), forming a vector of 517845 features. TR = 1 was selected because it gave the best classification accuracy in preliminary classification tests over TR = 0 to 4. Each feature vector was detrended (number of sessions SN = 3) and normalized independently of the others before analysis. Two labels were assigned to each data point: the recollection status, “recollection success” if the post-scan answer was correct or “recollection failure” if it was wrong, and the paradigm the data point belongs to, “FM” or “EE”.
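A minimal sketch of how such data points can be assembled, under one possible reading of the procedure above: linearly detrend and z-score every voxel's time series within each session, then take the volume acquired a fixed lag (here 1 TR) after each event onset. The array layout, the onset-to-volume rounding and the helper names are assumptions for illustration; PyMVPA provides its own detrending and z-scoring utilities, which are not reproduced here.

```python
import numpy as np


def detrend_zscore(ts):
    """Remove a linear trend from each column (voxel) of one session and z-score it.

    ts: array of shape (n_volumes, n_voxels).
    """
    n = ts.shape[0]
    design = np.column_stack([np.ones(n), np.arange(n, dtype=float)])
    coef, *_ = np.linalg.lstsq(design, ts, rcond=None)
    resid = ts - design @ coef
    sd = resid.std(axis=0)
    sd[sd == 0] = 1.0  # constant voxels stay at zero instead of dividing by zero
    return resid / sd


def extract_samples(sessions, onsets_s, labels, tr=2.0, lag_tr=1):
    """One feature vector per event, taken `lag_tr` volumes after the event onset.

    sessions: list of (n_volumes, n_voxels) arrays, one per session.
    onsets_s: per-session lists of event onsets in seconds from session start.
    labels:   per-session lists of class labels aligned with onsets_s.
    """
    X, y, chunks = [], [], []
    for s, (ts, onsets, labs) in enumerate(zip(sessions, onsets_s, labels)):
        clean = detrend_zscore(ts)
        for onset, label in zip(onsets, labs):
            idx = int(np.floor(onset / tr)) + lag_tr
            if idx < clean.shape[0]:
                X.append(clean[idx])
                y.append(label)
                chunks.append(s)  # session index, reused later for cross-validation
    return np.array(X), np.array(y), np.array(chunks)
```

Keeping the session index (`chunks`) alongside each sample is what allows the session-wise cross-validation described in the Methods.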


III. Methods

The basic contrasts defined in Questions 1-3 belong to the brain decoding domain: they ask whether class information exists in a given dataset. Traditional univariate analysis of fMRI data does not provide a direct answer to these questions; classification information is usually derived indirectly from statistical analysis of the time courses of individual voxels (Friston, Holmes, Worsley, Poline, Frith, & Frackowiak, 1994). The multivariate analysis used in this study takes advantage of the information contained in activity patterns across the entire brain volume, that is, across multiple voxels (Formisano, De Martino, & Valente, 2008). A trained classifier takes the values of various voxels (features) in a data sample and predicts the class that this sample belongs to, chosen from the set of classes defined for a given contrast. We are interested in three contrasts, with the following classes defined for each: 1) “recollection success” and “recollection failure” for Explicit Encoding (EE) tasks; 2) “recollection success” and “recollection failure” for Fast Mapping (FM) tasks; and 3) “FM” and “EE” for inter-paradigm classification. Each contrast has only 2 class labels, so this is two-class classification. A linear Support Vector Machine (Vapnik, 1999) was used as the underlying classifier in all experiments. The data points were selected so that the classes had equal frequencies.
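One common way to obtain equal class frequencies is random undersampling of the larger class. The sketch assumes that approach, since the text does not say how the balance was achieved. With balanced classes, 50% is the chance accuracy against which results are compared.

```python
import numpy as np


def balance_by_undersampling(y, seed=0):
    """Sorted sample indices giving every class as many samples as the rarest class."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    keep = [rng.choice(np.flatnonzero(y == c), size=n_min, replace=False) for c in classes]
    return np.sort(np.concatenate(keep))
```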

Classification results were evaluated using 3-fold cross-validation for within-subject experiments (one fold per recorded session) and leave-one-out cross-validation for cross-subject experiments. In all analyses, prediction accuracy was computed only on test data completely disjoint from the training data.
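Both schemes can be read as leave-one-group-out cross-validation: within a subject the groups are the three sessions (giving 3 folds), and across subjects the groups are the participants, assuming that "leave-one-out" holds out one participant at a time. Holding out whole groups ensures that training and test samples never share a session or a subject. The `fit`/`predict` callables are placeholders for any classifier.

```python
import numpy as np


def leave_one_group_out(groups):
    """Yield (train_idx, test_idx) pairs, holding out one group (session or subject) per fold."""
    groups = np.asarray(groups)
    for g in np.unique(groups):
        yield np.flatnonzero(groups != g), np.flatnonzero(groups == g)


def cross_validated_accuracy(X, y, groups, fit, predict):
    """Mean and SD of test accuracy over folds; fit(X, y) -> model, predict(model, X) -> labels."""
    scores = []
    for train, test in leave_one_group_out(groups):
        model = fit(X[train], y[train])
        scores.append(np.mean(predict(model, X[test]) == y[test]))
    return float(np.mean(scores)), float(np.std(scores))
```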

Given the high dimensionality of the data, a feature selection procedure was performed to decide which voxels should be included in the multivariate classification analysis. Feature selection was performed three separate times, with different scoring methods ranking the features by individual voxel performance. In each case, all voxels were sorted by their assigned score in descending order, and the 1000 highest-ranking voxels were included in the analysis.

The following feature selection methods were explored: (i) Activity, which selects the voxels that are active in at least one condition relative to a control baseline; (ii) Accuracy, which scores a voxel by how accurately an SVM classifier can predict the condition of each example in the training set based on this voxel alone; and (iii) SVM-RFE, a multivariate elimination approach to feature selection (Guyon, Weston, Barnhill, & Vapnik, 2002) that starts from the complete feature set and, in each round, eliminates the 15% of features with the lowest rank (based on the feature weights obtained in the multivariate SVM classification), until the number of features is reduced to 1000. In all cases, the mean cross-validation success rate was used as a voxel score.
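The SVM-RFE loop can be sketched as follows: fit a linear model on the surviving features, rank them by squared weight (the criterion of Guyon et al.), drop the lowest-ranked 15%, and repeat until the target count is reached. To keep the sketch short and dependency-free, a closed-form ridge-regression linear classifier stands in for the SVM as the source of weights; that substitution and the function names are assumptions, so this is recursive feature elimination in the style of SVM-RFE rather than the study's code.

```python
import numpy as np


def ridge_weights(X, y, alpha=1.0):
    """Linear weights from ridge regression on +/-1 labels (stand-in for a linear SVM)."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    # Dual form w = X^T (X X^T + alpha I)^-1 y: cheap when features far outnumber samples.
    gram = Xc @ Xc.T + alpha * np.eye(X.shape[0])
    return Xc.T @ np.linalg.solve(gram, yc)


def rfe(X, y, n_keep, drop_fraction=0.15, weight_fn=ridge_weights):
    """Recursive feature elimination; returns the indices of the surviving features."""
    surviving = np.arange(X.shape[1])
    while surviving.size > n_keep:
        w = weight_fn(X[:, surviving], y)
        n_drop = max(1, int(drop_fraction * surviving.size))
        n_drop = min(n_drop, surviving.size - n_keep)  # never go below the target
        order = np.argsort(w ** 2)  # smallest squared weight first
        surviving = np.delete(surviving, order[:n_drop])
    return surviving
```

With 517845 voxels and a 15% drop per round, reaching 1000 voxels takes roughly forty refits, which is why a cheap weight source matters in practice.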

Figure 3. The classification scheme used in the experiments: feature selection is followed by classifier training.

To evaluate the statistical significance of the observed classification accuracy, the results were compared with those obtained using randomly assigned class labels.
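Comparing against randomly assigned labels amounts to a permutation test. A minimal sketch, assuming a `score_fn(X, y)` that returns a classification accuracy: shuffle the labels many times to build a null distribution and report the fraction of shuffled scores at least as high as the observed one. The number of permutations, the +1 correction and the example nearest-centroid score function are conventional or illustrative choices, not details taken from the study.

```python
import numpy as np


def split_half_accuracy(X, y):
    """Hypothetical example score_fn: nearest-centroid accuracy, trained on the first half, tested on the second."""
    half = len(y) // 2
    train, test = np.arange(half), np.arange(half, len(y))
    centroid0 = X[train][y[train] == 0].mean(axis=0)
    centroid1 = X[train][y[train] == 1].mean(axis=0)
    closer_to_1 = np.linalg.norm(X[test] - centroid1, axis=1) < np.linalg.norm(X[test] - centroid0, axis=1)
    return float(np.mean(closer_to_1.astype(int) == y[test]))


def permutation_p_value(score_fn, X, y, n_permutations=200, seed=0):
    """Label-permutation test; returns (observed score, null scores, one-sided p-value)."""
    rng = np.random.default_rng(seed)
    observed = score_fn(X, y)
    null = np.array([score_fn(X, rng.permutation(y)) for _ in range(n_permutations)])
    # +1 in numerator and denominator counts the observed labelling as part of the null set.
    p_value = (1 + np.sum(null >= observed)) / (1 + n_permutations)
    return observed, null, p_value
```

Permuting labels preserves the class frequencies and the data themselves, so the null distribution reflects exactly what the classifier achieves when no real label information is present.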

The prediction accuracy was evaluated for both within-subject (Questions 1 and 2 only) and cross-subject cases, using different spatial and temporal aspects of the input data.


For the within-subject case, accuracy was computed for each participant individually and then averaged. For the cross-subject case, accuracy was computed on a dataset combining the individual participants’ datasets, using leave-one-out cross-validation, and averaged over all cross-validation folds.

Unlike the contrast classification (Questions 1-3), discovering the brain areas associated with each paradigm, FM or EE (Question 4), requires constructing brain maps. In machine learning terms, brain mapping is the process of highlighting the voxels that contribute most strongly and reliably to the classifier’s success. This may be achieved by determining which voxels are selected for the classifier and how their classification weights affect its prediction. The major issue with this straightforward approach is that the set of voxels selected in all cross-validation folds (their conjunction) is relatively small, as a result of the redundancy of the information, and cannot be used as a fully reliable source for brain mapping. The information-based functional brain mapping method (Kriegeskorte, Goebel, & Bandettini, 2006) overcomes this limitation. The main idea of this method is to train classifiers on many small voxel sets which, put together, cover the entire brain. For example, we may train a distinct classifier for each voxel, using only the voxels spatially adjacent to it. Centering the search region on every voxel in succession then sweeps it across the whole brain (this technique is often referred to in the fMRI machine learning community as training ‘searchlight classifiers’).
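A minimal searchlight sketch, assuming samples stored as an (n_samples, x, y, z) array, a 3-D boolean brain mask, and a `score_fn(X, y)` that returns a (cross-validated) accuracy. Sphere membership is computed in millimetres so that anisotropic voxels are handled: with 2 mm isotropic voxels (an assumed size) a 4 mm sphere holds 33 voxels, while with the acquired 3x3x5 mm voxels it would hold only 5. The study used PyMVPA's searchlight implementation, not this code.

```python
import itertools

import numpy as np


def sphere_offsets(radius_mm, voxel_size_mm):
    """Integer voxel offsets (dx, dy, dz) whose distance from the centre is <= radius_mm."""
    reach = [int(np.floor(radius_mm / size)) for size in voxel_size_mm]
    candidates = itertools.product(*(range(-r, r + 1) for r in reach))
    offsets = [
        offset for offset in candidates
        if sum((o * size) ** 2 for o, size in zip(offset, voxel_size_mm)) <= radius_mm ** 2
    ]
    return np.array(offsets, dtype=int)


def searchlight(data, labels, mask, score_fn, radius_mm=4.0, voxel_size_mm=(2.0, 2.0, 2.0)):
    """Map of score_fn(X_sphere, labels) for the sphere centred on every voxel in `mask`.

    data: (n_samples, nx, ny, nz) array; mask: (nx, ny, nz) boolean brain mask.
    Voxels outside the mask are left as NaN.
    """
    offsets = sphere_offsets(radius_mm, voxel_size_mm)
    bounds = np.array(mask.shape)
    result = np.full(mask.shape, np.nan)
    for centre in np.argwhere(mask):
        coords = centre + offsets
        coords = coords[np.all((coords >= 0) & (coords < bounds), axis=1)]  # stay inside the volume
        coords = coords[mask[tuple(coords.T)]]  # keep only in-brain neighbours
        X = data[:, coords[:, 0], coords[:, 1], coords[:, 2]]  # (n_samples, n_sphere_voxels)
        result[tuple(centre)] = score_fn(X, labels)
    return result
```

Each map value summarizes how much information the local neighbourhood carries, so the map does not depend on which voxels a whole-brain classifier happened to select.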

SVM (Support Vector Machines) was used as the underlying classifier. The resulting brain map shows, for each voxel, the accuracy of a classifier trained on the spherical neighborhood of radius r = 4 mm centered on it. Voxels with statistically significant accuracy were included in the brain map (the strength of the highlighting color reflects the accuracy rate). Both within-subject and cross-subject maps were produced. The disjunction of the within-subject maps was also constructed, highlighting even more strongly the relevant patterns obtained in the cross-subject analysis.
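A disjunction of within-subject maps is a voxel-wise logical OR: a voxel is highlighted if it passes the significance threshold in at least one participant's map. The sketch below also keeps the highest supra-threshold accuracy at each voxel as the highlighting strength; the threshold value and this way of combining strengths are assumptions for illustration.

```python
import numpy as np


def disjunction_map(subject_maps, threshold):
    """OR-combine per-subject accuracy maps thresholded at `threshold`.

    subject_maps: array (n_subjects, x, y, z) of searchlight accuracies (NaN outside the brain).
    Returns (boolean disjunction, highest supra-threshold accuracy per voxel, 0 elsewhere).
    """
    maps = np.nan_to_num(np.asarray(subject_maps, dtype=float), nan=0.0)
    passing = maps > threshold
    strength = np.where(passing, maps, 0.0).max(axis=0)
    return passing.any(axis=0), strength
```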

The software used for these experiments was developed in Python using the PyMVPA library (Hanke, Sederberg, Hanson, Haxby, & Pollmann, 2009).

IV. Results

The empirical analysis assesses the ability of the chosen classification model to predict the classification targets in a non-random manner. The encouraging results show that the model was able to predict the required targets in all 3 contrasts (Questions 1-3), although with different levels of accuracy.

Experiment 1. Recollection Status and Memory Paradigm Prediction.

Contrast 1. EE task – recollection status.

The classification results for the EE paradigm are presented in Table 1. The results are statistically significant: random choice gives an accuracy of 0.5, and all results lie more than 2 SD above this value. The best classification results are obtained with the multivariate SVM-RFE feature selection method: a mean of 78% correct class predictions in the within-subject analysis and a mean of 73% in the cross-validated cross-subject analysis.

Analysis Type    Ranking Metric   Prediction Accuracy   SD
Within-Subject   Accuracy         0.66354255            0.04433817
Within-Subject   Activity         0.67992270            0.04095269
Within-Subject   SVM-RFE          0.7778667             0.0237164
Cross-Subject    Accuracy         0.60715518            0.04960175
Cross-Subject    Activity         0.60125534            0.04527102
Cross-Subject    SVM-RFE          0.7322059             0.0619211

Table 1. Experiment 1, Contrast 1. Classification accuracy for EE paradigm.
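The criterion stated for Table 1, that every mean accuracy lies more than 2 SD above the chance level of 0.5, can be checked directly by computing (accuracy - 0.5) / SD for each row. The snippet only reproduces that stated criterion from the tabulated values; it treats the reported SD as the relevant spread and is not a separate significance test.

```python
CHANCE = 0.5

# (analysis type, ranking metric): (mean accuracy, SD), copied from Table 1
TABLE_1 = {
    ("Within-Subject", "Accuracy"): (0.66354255, 0.04433817),
    ("Within-Subject", "Activity"): (0.67992270, 0.04095269),
    ("Within-Subject", "SVM-RFE"): (0.7778667, 0.0237164),
    ("Cross-Subject", "Accuracy"): (0.60715518, 0.04960175),
    ("Cross-Subject", "Activity"): (0.60125534, 0.04527102),
    ("Cross-Subject", "SVM-RFE"): (0.7322059, 0.0619211),
}


def sd_above_chance(mean, sd, chance=CHANCE):
    """Distance of a mean accuracy from chance, in units of the reported SD."""
    return (mean - chance) / sd


for key, (mean, sd) in TABLE_1.items():
    print(key, round(sd_above_chance(mean, sd), 2))
```

The smallest margin is the cross-subject Accuracy ranking, at roughly 2.2 SD, so all rows meet the stated criterion.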


Contrast 2. FM task – recollection status.

The experimental setup was identical to that of Contrast 1 except for the source of the data set, which was collected from the fMRI scans of participants performing the FM task. As in Contrast 1, the trained model was able to predict the recollection status, here under the Fast Mapping (FM) paradigm.

Analysis Type    Ranking Metric   Prediction Accuracy   SD
Within-Subject   Accuracy         0.73157783            0.05037422
Within-Subject   Activity         0.71005632            0.03937601
Within-Subject   SVM-RFE          0.80722163            0.03902201
Cross-Subject    Accuracy         0.66204481            0.06090074
Cross-Subject    Activity         0.65448720            0.03683927
Cross-Subject    SVM-RFE          0.76072566            0.03072572

Table 2. Experiment 1, Contrast 2. Classification accuracy for FM paradigm.

Classification results for FM are presented in Table 2. They are slightly higher than those obtained for EE experimental data.

Contrast 3. FM vs. EE – paradigm prediction.

The FM vs. EE classification experiment is the most intriguing because of its ability to point to the essential difference between these two memory mapping paradigms.

The analysis was based on two-class classification with target class labels “FM” and “EE”. Only successful trials of both types (the “recollection success” data points) were taken into account. Because each participant performed only one of the paradigms, this contrast can only be evaluated across subjects. The classification results showed that FM and EE indeed differ and that the difference can be detected with 88.8% accuracy (SVM-RFE feature selection, Table 3).

Ranking Metric   Prediction Accuracy   SD
Accuracy         0.80210345            0.03642382
Activity         0.60213476            0.03240215
SVM-RFE          0.88794379            0.05643578

Table 3. Experiment 1, Contrast 3. Classification accuracy for FM vs. EE paradigms.

Using the above results we were able to move to the construction of brain maps.

Experiment 2. Brain Activation Mapping

The results presented below, obtained with a “searchlight” algorithm, show that for both the individual maps and the cross-subject maps, FM is characterized by activity in the temporal pole area and the posterior parts of the cerebellum, while EE is more associated with the medial temporal regions. Activation areas cannot be expected to match perfectly across participants, since the exact location of information storage and retrieval varies considerably between individuals. This is why we also present the disjunction of the within-subject maps. In this map, the areas associated with the two declarative memory paradigms are clearly visible (see Figure 4 and Figure 5). Again, the large spots of activity in the temporal pole area and the posterior parts of the cerebellum are associated with Fast Mapping. These maps show extensive medial temporal regions associated with EE, while more moderate maps are produced for FM. On the other hand, FM is characterized by activation of anterior temporal lobe and polar areas that does not appear for EE. Note that the hippocampus shows up less in the FM paradigm than in the Explicit Encoding paradigm.

Figure 4. Experiment 2. A disjunction of the participants’ brain maps for Contrast 1 (Explicit Encoding). Axial view with 4 mm interslice spacing. Active areas are shown in yellow.

Figure 5. Experiment 2. A disjunction of the participants’ brain maps for Contrast 2 (Fast Mapping). Axial view with 4 mm interslice spacing. Active areas are shown in yellow.

Experiment 3. Spatial Analysis – Hippocampus versus Temporal Pole

Given the brain maps, we were able to evaluate the contribution of the individual areas highlighted in the maps to classification accuracy. We were especially interested in the hippocampus and the temporal pole, which appear in the maps and are known from previous studies (Sharon, 2010) to differentiate between the paradigms. For this purpose, the within-subject and cross-subject classification procedures were repeated for several brain cuts: (i) the entire brain (All), (ii) the hippocampus only (Hippocampus Only), (iii) the temporal pole only (Temporal Pole Only), (iv) the entire brain with the hippocampus excluded (All w/o Hippocampus), (v) the entire brain with the temporal pole excluded (All w/o Temporal Pole), and (vi) the putamen only (Putamen Only), a control area comparable in size to the hippocampus and largely unassociated with either paradigm. This area was used to estimate chance-level prediction accuracy.
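The six brain cuts listed above amount to boolean combinations of region-of-interest masks. The NumPy sketch below shows one way to build them and turn a 4-D activation array into the feature matrix for each cut; the mask names, the toy 4×4×4 volume and the ROI shapes are hypothetical placeholders, not the atlas definitions used in the study.

```python
import numpy as np

def brain_cuts(brain, hippocampus, temporal_pole, putamen):
    """Return the six boolean voxel masks (one per brain cut) compared in Experiment 3."""
    return {
        "All": brain,
        "Hippocampus Only": brain & hippocampus,
        "Temporal Pole Only": brain & temporal_pole,
        "All w/o Hippocampus": brain & ~hippocampus,
        "All w/o Temporal Pole": brain & ~temporal_pole,
        "Putamen Only": brain & putamen,
    }

def apply_cut(data, cut):
    """(n_trials, X, Y, Z) activations -> (n_trials, n_voxels_in_cut) feature matrix."""
    return data[:, cut]

# Toy 4x4x4 volume with hypothetical, non-overlapping ROI masks.
shape = (4, 4, 4)
brain = np.ones(shape, dtype=bool)
hippocampus = np.zeros(shape, dtype=bool)
hippocampus[0, :2, :2] = True          # 4 voxels
temporal_pole = np.zeros(shape, dtype=bool)
temporal_pole[3, :3, :3] = True        # 9 voxels
putamen = np.zeros(shape, dtype=bool)
putamen[1:3, 3, :2] = True             # 4 voxels, same size as the hippocampus ROI

rng = np.random.default_rng(0)
data = rng.normal(size=(10, *shape))
for name, cut in brain_cuts(brain, hippocampus, temporal_pole, putamen).items():
    print(f"{name}: {apply_cut(data, cut).shape[1]} voxels")
```

Each resulting feature matrix would then go through the same classification procedure, so differences in accuracy can be attributed to the voxels included in the cut.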

The classification results are shown in the tables below, which give the classification accuracy of each brain cut for the within-subject and cross-subject analysis methods. For the within-subject method, the mean of the subjects’ classification accuracies is reported, with the standard deviation in parentheses. For the cross-subject method, the mean accuracy of leave-one-out cross-validation is reported, with the standard deviation across folds in parentheses.
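The two summary statistics described above can be sketched in NumPy. We assume here that each cross-subject fold holds out one whole subject (leave-one-subject-out) and that the SD is the sample standard deviation; both are our reading, not details stated in the text. A nearest-centroid classifier again stands in for the SVM, and the toy data are purely illustrative.

```python
import numpy as np

def nearest_centroid_predict(X_train, y_train, X_test):
    """Assign each test row to the class with the closest training centroid
    (a simple stand-in for the SVM classifier used in the study)."""
    classes = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    dists = np.linalg.norm(X_test[:, None, :] - centroids[None, :, :], axis=2)
    return classes[dists.argmin(axis=1)]

def cross_subject_accuracy(X, y, subject):
    """Leave-one-subject-out: train on all other subjects, test on the held-out one.
    Returns (mean, SD) of the per-fold accuracies."""
    fold_acc = []
    for s in np.unique(subject):
        test = subject == s
        pred = nearest_centroid_predict(X[~test], y[~test], X[test])
        fold_acc.append(np.mean(pred == y[test]))
    return float(np.mean(fold_acc)), float(np.std(fold_acc, ddof=1))

def within_subject_summary(per_subject_acc):
    """Mean and SD of accuracies obtained separately for each subject."""
    acc = np.asarray(per_subject_acc, dtype=float)
    return float(acc.mean()), float(acc.std(ddof=1))

rng = np.random.default_rng(1)
subject = np.repeat(np.arange(4), 6)               # 4 subjects x 6 trials
y = np.tile([0, 1], 12)
X = rng.normal(size=(24, 5)) + 10.0 * y[:, None]   # well-separated toy classes
print(cross_subject_accuracy(X, y, subject))
print(within_subject_summary([0.75, 0.80, 0.70]))
```

Holding out whole subjects tests whether a pattern learned on some brains transfers to an unseen brain, which is why cross-subject accuracies in the tables are lower than within-subject ones.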

Contrast 1. Explicit Encoding task (EE) – recollection status for different brain cuts.

For Contrast 1, the classification accuracy is significantly higher than the chance level (0.5) for all tested brain cuts except the control area (the putamen).

Contrast 2. Fast Mapping task (FM) – recollection status for different brain cuts.

For Contrast 2, the classification accuracy is significantly higher than the chance level (0.5) for all tested brain cuts except the putamen control area.
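The text does not state which test was used to compare accuracy against the 0.5 chance level. One common choice, shown here only as an assumed illustration, is a one-sided binomial test on the number of correctly classified test trials. Note that it treats trials as independent, an assumption that cross-validated fMRI data may not fully satisfy; the trial counts below are hypothetical.

```python
from math import comb

def binomial_p_value(n_correct, n_trials, chance=0.5):
    """One-sided P(K >= n_correct) for K ~ Binomial(n_trials, chance)."""
    return sum(
        comb(n_trials, k) * chance ** k * (1 - chance) ** (n_trials - k)
        for k in range(n_correct, n_trials + 1)
    )

# Hypothetical example: 70 of 100 test trials classified correctly.
p = binomial_p_value(70, 100)
print(f"accuracy 0.70, p = {p:.2e}")
```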

Brain Cut | Within-Subj. μ (SD) | Cross-Subj. μ (SD)
All | 0.77786665 (0.02371642) | 0.73220590 (0.06192115)
Hippocampus Only (BA36) | 0.73320062 (0.04249911) | 0.696540652 (0.04492871)
Temporal Pole Only (BA38,21) | 0.70080948 (0.02278488) | 0.66303595 (0.06915410)
All w/o Hippocampus | 0.77695866 (0.02371641) | 0.73531368 (0.04931522)
All w/o Temporal Pole | 0.77746900 (0.02393828) | 0.73424218 (0.05413447)
Putamen Only | 0.57933221 (0.04887691) | 0.59243121 (0.06249911)

Table 4. Experiment 3. Classification accuracy for Contrast 1 (Explicit Encoding paradigm).

Brain Cut | Within-Subj. μ (SD) | Cross-Subj. μ (SD)
All | 0.80722163 (0.03902227) | 0.76072566 (0.03072455)
Hippocampus Only (BA36) | 0.72348420 (0.04597861) | 0.68615242 (0.06431254)
Temporal Pole Only (BA38,21) | 0.75566112 (0.04737763) | 0.71372193 (0.05971212)
All w/o Hippocampus | 0.80723226 (0.03902227) | 0.76525352 (0.04215641)
All w/o Temporal Pole | 0.80705919 (0.03902227) | 0.76017063 (0.05447689)
Putamen Only | 0.56666919 (0.05971527) | 0.55744710 (0.05214834)

Table 5. Experiment 3. Classification accuracy for Contrast 2 (Fast Mapping paradigm).

Again, the classification accuracy obtained for the FM paradigm is slightly higher than for the EE paradigm.

At first sight, the classification results for Contrast 1 and Contrast 2 may seem counterintuitive: removing either region contributes very little to classification success. This behavior may be explained by the composition of the participant group, which consisted of healthy people only. Unlike patients, healthy participants have all temporal structures in place during encoding, so the information carried by the rest of the brain reflects this and still enables robust classification. On the other hand, comparing the classification success of each structure alone with that of the whole brain reveals the actual relations between the brain cuts.

Because whole-brain classification accuracy differs between FM and EE (classification is more accurate for FM than for EE), we compare percentage reductions rather than raw accuracies. The two conditions (FM and EE) and the two structures (the hippocampus, H, and the temporal pole, TP) show a reverse pattern (Figure 6), the same as that observed in patients (Sharon, 2010).

Figure 6. Experiment 3. Reduction in SVM-RFE prediction accuracy, relative to the whole brain, when classifying from the Hippocampus area only, compared to the reduction when classifying from the Temporal Pole area only. A reverse pattern of prediction accuracy reduction is observed for the FM and EE tasks.

For EE, using the H cut only reduces classification success by 5.7%, and using the TP cut only reduces it by 9.9%. The reverse pattern is seen for FM (10.3% and 6.3%, respectively). If one considers only the classification accuracy above the chance level (50%), the reductions are even more pronounced (for EE, 16% for H and 27.7% for TP; for FM, 27.2% for H and 16.7% for TP). All results are statistically significant. They indicate that the hippocampus cut yields better classification for EE than for FM, whereas the temporal pole cut yields better classification for FM than for EE.
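The percentages above can be recomputed from the within-subject means in Table 4 (EE) and Table 5 (FM). This worked check assumes the raw reduction is taken relative to whole-brain accuracy and the above-chance reduction relative to whole-brain accuracy minus the 0.5 chance level; under that assumption it reproduces the reported figures.

```python
# Within-subject mean accuracies (All, Hippocampus Only, Temporal Pole Only)
# copied from Table 4 (EE) and Table 5 (FM).
ACCURACY = {
    "EE": {"All": 0.77786665, "H": 0.73320062, "TP": 0.70080948},
    "FM": {"All": 0.80722163, "H": 0.72348420, "TP": 0.75566112},
}
CHANCE = 0.5

def percent_reduction(whole, cut, baseline=0.0):
    """Relative drop (in %) from whole-brain to single-ROI accuracy,
    measured above `baseline` (0 for raw accuracy, 0.5 for above-chance)."""
    return 100.0 * (whole - cut) / (whole - baseline)

results = {}
for task, acc in ACCURACY.items():
    for roi in ("H", "TP"):
        raw = percent_reduction(acc["All"], acc[roi])
        above = percent_reduction(acc["All"], acc[roi], CHANCE)
        results[(task, roi)] = (raw, above)
        print(f"{task} {roi}: {raw:.2f}% raw, {above:.2f}% above chance")
```

Rounded to one decimal place, the FM values come out as 10.4% and 6.4% (raw) and 27.3% and 16.8% (above chance); the figures quoted in the text match these when truncated rather than rounded.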

V. Discussion

A basic question addressed in the current study is whether the recorded fMRI signal carries information about the particular patterns of knowledge acquisition and retrieval; in other words, the study concerns pattern discrimination. It appears that although both FM and EE lead to the acquisition of declarative memory, as reflected in post-scan recognition performance, they do so by recruiting distinct neuronal networks that can be efficiently distinguished using SVM.

In the first phase of the study, three different feature selection methods were used at the preprocessing stage of the classification process. The univariate methods, which select individual voxels according to a predefined ranking, made it possible to classify data points according to the given contrast, but with relatively low prediction accuracy (up to 70%). SVM-RFE, a multivariate feature selection method that prunes the features with low absolute weights produced by the Support Vector Machine during classification, increased prediction accuracy by about 10% on average. Thus, the study shows that using multivariate methods for feature selection and classification substantially improves classification performance. Unfortunately, no production-level software comparable to SPM implements these methods, which has led to their near-complete neglect by the wider neuroscience community.
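SVM-RFE (Guyon et al., 2002), as summarized above, can be sketched in NumPy: train a linear SVM, discard the features with the smallest absolute weights, and repeat on the survivors. The Pegasos-style stochastic subgradient trainer without a bias term, the regularization constant, the fraction of features dropped per round and the toy data are our assumptions for illustration; the study's actual SVM implementation and pruning schedule are not described here.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, epochs=100, seed=0):
    """Linear SVM without bias, trained by Pegasos stochastic subgradient descent.
    y must contain labels in {-1, +1}; returns the weight vector w."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            t += 1
            eta = 1.0 / (lam * t)
            violated = y[i] * (X[i] @ w) < 1   # hinge loss is active for this sample
            w *= 1.0 - eta * lam               # shrink (L2 regularization step)
            if violated:
                w += eta * y[i] * X[i]
    return w

def svm_rfe(X, y, n_keep, drop_frac=0.5):
    """Recursive feature elimination: repeatedly retrain the SVM and drop the
    features with the smallest |w| until only n_keep features remain."""
    remaining = np.arange(X.shape[1])
    while len(remaining) > n_keep:
        w = train_linear_svm(X[:, remaining], y)
        n_drop = min(max(1, int(len(remaining) * drop_frac)), len(remaining) - n_keep)
        keep = np.argsort(np.abs(w))[n_drop:]   # positions of the largest |w|
        remaining = np.sort(remaining[keep])
    return remaining

rng = np.random.default_rng(0)
y = np.where(rng.random(80) < 0.5, -1, 1)
X = rng.normal(size=(80, 50))
X[:, :2] += 2.0 * y[:, None]      # only features 0 and 1 carry class information
selected = svm_rfe(X, y, n_keep=4)
print(selected)
```

Unlike a univariate ranking, each elimination round scores voxels by their weight in a jointly trained classifier, so a voxel's rank depends on the other voxels still in the model.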

For the next stage, we leveraged these results to address the question of where in the brain the discriminative patterns reside (pattern mapping). It was important to clarify which memory structures are involved in information retrieval in both the fast mapping and the explicit encoding designs. In the second part of the study, we sought the brain regions correlated with memory formation through EE and especially through FM. We hypothesized that FM learning would be supported by a network of brain regions distinct from the network known to mediate EE (episodic memory). Indeed, according to the brain maps constructed with the multivariate “searchlight” method, this network included, among others, more laterally positioned regions of the temporal neocortex, specifically the anterior temporal lobe and polar area, as opposed to the medial temporal regions critical for episodic memory.

In conclusion, the results above were obtained mainly with multivariate machine learning techniques, which proved more accurate and informative than basic univariate methods. Using these techniques, we showed that Fast Mapping engages regions distinctly different from those activated by an explicit encoding task that presumably relies on episodic memory encoding. In both cases, medial and lateral prefrontal activations, along with superior and medial posterior parietal regions, were found; however, distinct areas within these regions were active for the FM and EE tasks.

Another point concerns the role of the hippocampus and the surrounding medial temporal cortices in relational memory, which is a subject of ongoing discussion in the neuropsychological community. We hope that the current study, based on empirical fMRI data and advanced machine learning techniques, contributes to this discussion.

References

1. Carey, S., & Bartlett, E. (1978). Acquiring a single new word. Papers and Reports on Child Language Development, 15, 17–29.

2. Cox, D. D., & Savoy, R. L. (2003). Functional magnetic resonance imaging (fMRI) “brain reading”: detecting and classifying distributed patterns of fMRI activity in human visual cortex. NeuroImage, 19, 261–270.

3. Formisano, E., De Martino, F., & Valente, G. (2008). Multivariate analysis of fMRI time series: classification and regression of brain responses using machine learning. Magnetic Resonance Imaging, 28, 921–934.

4. Friston, K. J., Holmes, A. P., Worsley, K. J., Poline, J.-P., Frith, C. D., & Frackowiak, R. S. J. (1994). Statistical parametric maps in functional imaging: A general linear approach. Human Brain Mapping, 2(4), 189–210.

5. Grinband, J., Wager, T., Lindquist, M. F., & Hirsch, J. (2008). Detection of time-varying signals in event-related fMRI designs. NeuroImage, 43, 509–520.

6. Guyon, I., Weston, J., Barnhill, S., & Vapnik, V. (2002). Gene selection for cancer classification using support vector machines. Machine Learning, 46(1–3), 389–422.

7. Hanke, M., Halchenko, Y. O., Sederberg, P. B., Hanson, S. J., Haxby, J. V., & Pollmann, S. (2009). PyMVPA: A Python toolbox for multivariate pattern analysis of fMRI data. Neuroinformatics, 7, 37–53.

8. Haxby, J., Gobbini, M., Furey, M., Ishai, A., Schouten, J., & Pietrini, P. (2001). Distributed and overlapping representations of faces and objects in ventral temporal cortex. Science, 293, 2425–2430.

9. Haynes, J. D., & Rees, G. (2005). Predicting the orientation of invisible stimuli from activity in primary visual cortex. Nature Neuroscience, 8, 686–691.

10. Haynes, J., & Rees, G. (2005). Predicting the stream of consciousness from activity in human visual cortex. Current Biology, 16, 1301–1307.

11. Kamitani, Y., & Tong, F. (2005). Decoding the visual and subjective contents of the human brain. Nature Neuroscience, 8, 679–685.

12. Kriegeskorte, N., Goebel, R., & Bandettini, P. (2006). Information-based functional brain mapping. Proceedings of the National Academy of Sciences USA, 103, 3863–3868.

13. Mitchell, T., Hutchinson, R., Niculescu, R., Pereira, F., Wang, X., Just, M., et al. (2004). Learning to decode cognitive states from brain images. Machine Learning, 57, 145–175.

14. Mitchell, T., Shinkareva, S., Carlson, A., Chang, K., Malave, V., Mason, R., et al. (2008). Predicting human brain activity associated with the meanings of nouns. Science, 320, 1191–1195.

15. Sharon, T. (2010). Bypassing the hippocampus: Rapid neocortical acquisition of long term arbitrary associations via Fast Mapping. PhD thesis, Department of Psychology, University of Haifa.

16. Sharon, T., Moscovitch, M., & Gilboa, A. (2011). Rapid neocortical acquisition of long-term arbitrary associations independent of the hippocampus. PNAS.

17. Shinkareva, S., Mason, R., Malave, V., Wang, W., Mitchell, T., & Just, M. (2008). Using fMRI brain activation to identify cognitive states associated with perception of tools and dwellings. PLoS ONE, 3(1).

18. SPM. (2011). Wellcome Department of Cognitive Neurology, London, UK. Retrieved from www.fil.ion.ucl.ac.uk/spm

19. Zheng, Y., Martindale, J., Johnston, D., Jones, M., Berwick, J., & Mayhew, J. (2002). A model of the hemodynamic response and oxygen delivery to brain. NeuroImage, 16(3), 617–637.