
Multivariate models and machine learning for fMRI

Methods and Models in fMRI, 15.11.2016

Jakob Heinzle
heinzle@biomed.ee.ethz.ch
Translational Neuromodeling Unit (TNU)
Institute for Biomedical Engineering (IBT)
University and ETH Zürich

Many thanks to Sudhir Raman and Kay Brodersen for material

Translational Neuromodeling Unit

Overview

fMRI Analysis and Classification

Motivation

Learning from data

Multivariate Bayes in SPM

Generative Embedding

Modelling Terminology

Why multivariate?

Univariate approaches are excellent for localizing activations in individual voxels.

[Figure: responses of voxels v1 and v2 for reward vs. no reward.]

Why multivariate?

Multivariate approaches can be used to examine responses that are jointly encoded in multiple voxels.

[Figure: responses of voxels v1 and v2 for orange juice vs. apple juice.]

A bit of history – Multidimensional scaling

Edelman et al, Psychobiology, 1998

Psychophysical rating fMRI

Two-dimensional projection of similarity measure for both psychophysical rating and fMRI response.

A bit of history – Classification Studies

Haxby et al, Science, 2001

A bit of history – Classification Studies

Kamitani and Tong, Nat Neurosci, 2005

Representational similarity analysis

Idea: Compare the similarity of representations (correlation between activation patterns) between different stimuli. Allows for a comparison between monkey (neural firing pattern) and human (fMRI activation patterns).

Kriegeskorte et al, Neuron, 2008
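The idea above can be sketched in a few lines. This is a minimal illustration on simulated data, not the pipeline of Kriegeskorte et al.: the function names `rdm` and `compare_rdms`, the toy "fMRI" and "neuron" patterns, and the use of Pearson correlation to compare the two dissimilarity matrices are all assumptions made for the example.

```python
import numpy as np

def rdm(patterns):
    """Representational dissimilarity matrix: 1 - Pearson correlation
    between the activation patterns (rows) of each pair of stimuli."""
    return 1.0 - np.corrcoef(patterns)

def compare_rdms(rdm_a, rdm_b):
    """Correlate the off-diagonal upper triangles of two RDMs."""
    iu = np.triu_indices_from(rdm_a, k=1)
    return np.corrcoef(rdm_a[iu], rdm_b[iu])[0, 1]

rng = np.random.default_rng(0)
n_stimuli = 6
# Toy assumption: both measurements reflect the same latent stimulus structure.
latent = rng.normal(size=(n_stimuli, 3))
fmri = latent @ rng.normal(size=(3, 50)) + 0.1 * rng.normal(size=(n_stimuli, 50))
neurons = latent @ rng.normal(size=(3, 20)) + 0.1 * rng.normal(size=(n_stimuli, 20))

rdm_fmri = rdm(fmri)        # stimuli x stimuli, from 50 simulated voxels
rdm_neurons = rdm(neurons)  # stimuli x stimuli, from 20 simulated neurons
similarity = compare_rdms(rdm_fmri, rdm_neurons)
print(f"RDM similarity: {similarity:.2f}")
```

Because both matrices are stimuli x stimuli, they can be compared even though the number of voxels and neurons differs.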

Overview

Motivation

Learning from data

Modelling Terminology

Analysis steps

Feature Extraction

Modelling

Classification

Clustering

Regression

Prediction

Model Selection

Cross validation

Performance

Inference

Feature space

      F1    F2    . . .   FP
S1    1     0.5
S2    0     5.7
.     1     5.3
SN    1     6.6

• Discrete
• Continuous

Data Points

Features

Feature selection for fMRI multivariate analysis

Different features answer different questions. Reducing the dimensionality might reduce noise, but could also reduce relevant information.

Model parameters
Mean values
Raw data

Model parameters, e.g. DCM

Correlations between regions

Model selection - Generalizability

Model Fit

Model Complexity

Bishop (2006), Pitt & Myung (2002), TICS

Encoding and decoding models

Context (cause or consequence): $X_t \in \mathbb{R}^d$, e.g. condition, stimulus, response, prediction error

BOLD signal: $Y_t \in \mathbb{R}^v$

Encoding model: $g: X_t \to Y_t$

Decoding model: $h: Y_t \to X_t$
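A minimal sketch of the two model directions, assuming linear maps and simulated data. The choice of ordinary least squares for the encoding model, ridge regression for the decoding model, the penalty `lam`, and all variable names are illustrative assumptions, not the method used in the lecture.

```python
import numpy as np

rng = np.random.default_rng(1)
n_scans, d, v = 200, 2, 30            # scans, context dimensions, voxels

X = rng.normal(size=(n_scans, d))     # context X_t (toy data)
W_true = rng.normal(size=(d, v))      # hypothetical ground-truth weights
Y = X @ W_true + 0.5 * rng.normal(size=(n_scans, v))  # simulated BOLD signal Y_t

train, test = slice(0, 150), slice(150, None)

# Encoding model g: X_t -> Y_t (ordinary least squares)
G, *_ = np.linalg.lstsq(X[train], Y[train], rcond=None)
Y_pred = X[test] @ G

# Decoding model h: Y_t -> X_t (ridge regression, since v can be large)
lam = 1.0
A = Y[train].T @ Y[train] + lam * np.eye(v)
H = np.linalg.solve(A, Y[train].T @ X[train])
X_pred = Y[test] @ H

def r2(true, pred):
    """Fraction of variance explained on held-out scans."""
    ss_res = ((true - pred) ** 2).sum()
    ss_tot = ((true - true.mean(axis=0)) ** 2).sum()
    return 1.0 - ss_res / ss_tot

encoding_r2 = r2(Y[test], Y_pred)
decoding_r2 = r2(X[test], X_pred)
print(f"encoding R^2: {encoding_r2:.2f}, decoding R^2: {decoding_r2:.2f}")
```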

Modelling goals

• Prediction

Predictive Density

Modelling goals

• Model Selection

Sparse Coding Distributed Coding

Model Evidence

Overview

Motivation

Learning From Data

Modelling Concepts

Learning from data

Reinforcement Learning

Semi-supervised Learning

Supervised Learning

Unsupervised Learning

Labels for training data are known!

Labels for training data are NOT known!

Supervised learning

Function - f

Independent variables X

Dependent variable Y (categorical or continuous)

Classification

Support Vector Machines

• Kernel function: $K(\mathbf{x}_i, \mathbf{x}_j) = \phi(\mathbf{x}_i) \cdot \phi(\mathbf{x}_j)$

Feature map: $\phi$

Function f: X → Y
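As a worked example of the kernel identity, the sketch below uses the homogeneous degree-2 polynomial kernel in two dimensions, chosen here only because its feature map is easy to write down. It checks that evaluating the kernel directly gives the same value as the dot product of the explicit feature vectors.

```python
def dot(a, b):
    """Plain dot product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))

def phi(x):
    """Explicit feature map of the degree-2 polynomial kernel for 2-D inputs."""
    x1, x2 = x
    return (x1 * x1, 2 ** 0.5 * x1 * x2, x2 * x2)

def poly2_kernel(xi, xj):
    """K(xi, xj) = (xi . xj)^2, computed without ever forming phi."""
    return dot(xi, xj) ** 2

xi, xj = (1.0, 2.0), (3.0, -1.0)
k_direct = poly2_kernel(xi, xj)       # works in the 2-D input space
k_feature = dot(phi(xi), phi(xj))     # works in the 3-D feature space
print(k_direct, k_feature)            # both equal 1 up to floating-point rounding
```

The kernel route never builds the feature vectors, which is what makes kernel methods practical when the feature space is very high dimensional.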

Kernel Methods

Shawe-Taylor & Cristianini, Kernel Methods for Pattern Analysis, 2004

Other popular classifiers

• Gaussian Processes

• Deep Belief networks

G.E. Hinton, S. Osindero, and Y. Teh, “A fast learning algorithm for deep belief nets”, Neural Computation, vol 18, 2006

http://deeplearning.net/tutorial/DBN.html

C. E. Rasmussen & C. K. I. Williams, Gaussian Processes for Machine Learning, MIT Press, 2006

Generative and Discriminative classifiers

• Generative classifiers
  – Learn the parameters for the functions p(Y) and p(X|Y), e.g. Naïve Bayes classifier

• Discriminative classifiers
  – Learn the parameters for p(Y|X), e.g. logistic regression, SVM
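The contrast can be illustrated with one classifier of each kind on the same toy data. This is a sketch under assumptions: Gaussian class-conditional densities for the Naïve Bayes model, plain gradient descent for logistic regression, and simulated two-dimensional features. The accuracies are computed on the training data and are therefore not estimates of generalization.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
y = rng.integers(0, 2, size=n)                   # labels Y in {0, 1}
means = np.array([[-1.0, -1.0], [1.0, 1.0]])     # toy class means (assumption)
X = means[y] + rng.normal(size=(n, 2))           # features X

# Generative: estimate p(Y) and p(X|Y), then classify with Bayes' rule.
def fit_naive_bayes(X, y):
    params = {}
    for c in (0, 1):
        Xc = X[y == c]
        params[c] = (len(Xc) / len(X), Xc.mean(axis=0), Xc.var(axis=0))
    return params

def predict_naive_bayes(params, X):
    scores = []
    for c in (0, 1):
        prior, mu, var = params[c]
        log_lik = -0.5 * (np.log(2 * np.pi * var) + (X - mu) ** 2 / var).sum(axis=1)
        scores.append(np.log(prior) + log_lik)   # log p(Y=c) + log p(X|Y=c)
    return np.argmax(np.column_stack(scores), axis=1)

# Discriminative: model p(Y|X) directly with logistic regression.
def fit_logistic(X, y, lr=0.1, n_iter=500):
    Xb = np.column_stack([np.ones(len(X)), X])   # add intercept column
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))        # p(Y=1|X)
        w -= lr * Xb.T @ (p - y) / len(y)        # gradient step on the log-loss
    return w

def predict_logistic(w, X):
    Xb = np.column_stack([np.ones(len(X)), X])
    return (Xb @ w > 0).astype(int)

acc_nb = (predict_naive_bayes(fit_naive_bayes(X, y), X) == y).mean()
acc_lr = (predict_logistic(fit_logistic(X, y), X) == y).mean()
print(f"naive Bayes: {acc_nb:.2f}, logistic regression: {acc_lr:.2f}")
```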

Cross-validation

The generalization ability of a classifier can be estimated using a resampling procedure known as cross-validation. One example is 2-fold cross-validation:

[Figure: 2-fold cross-validation. In each fold the examples are split into training examples and test examples, followed by performance evaluation.]

• Model selection
• Performance evaluation
  – Balanced accuracy
  – F1 score
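The two performance measures named above can be computed from the confusion counts. The sketch below uses the standard definitions (balanced accuracy as the mean of sensitivity and specificity, F1 as the harmonic mean of precision and recall); the helper names and the imbalanced toy example are assumptions for illustration.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, tn, fp, fn) for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return tp, tn, fp, fn

def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class accuracies (sensitivity and specificity)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))

def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall, written as 2TP / (2TP + FP + FN)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return 2 * tp / (2 * tp + fp + fn)

# Imbalanced toy data: 8 negatives, 2 positives; the classifier always says "negative".
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
bal_acc = balanced_accuracy(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
print(accuracy, bal_acc, f1)   # 0.8 0.5 0.0
```

Plain accuracy looks good here only because of the class imbalance; balanced accuracy and F1 both expose that the classifier has learned nothing about the positive class.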

Cross-validation

Another commonly used variant is leave-one-out cross-validation.

[Figure: leave-one-out cross-validation. In each fold a single test example is held out and the remaining examples are used for training, followed by performance evaluation.]

In fMRI, often leave-one-run-out.
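A minimal sketch of leave-one-run-out cross-validation on simulated trial patterns. The nearest-centroid classifier, the data dimensions, and the function names are assumptions made for the example; any classifier could be substituted.

```python
import numpy as np

def leave_one_run_out(runs):
    """Yield (train_idx, test_idx), holding out all trials of one run per fold."""
    runs = np.asarray(runs)
    for r in np.unique(runs):
        yield np.flatnonzero(runs != r), np.flatnonzero(runs == r)

def nearest_centroid_predict(X_train, y_train, X_test):
    """Assign each test pattern to the class with the closest mean training pattern."""
    classes = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    dists = ((X_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[np.argmin(dists, axis=1)]

rng = np.random.default_rng(3)
n_runs, trials_per_run, n_voxels = 4, 10, 20
runs = np.repeat(np.arange(n_runs), trials_per_run)       # run index per trial
y = np.tile([0, 1], n_runs * trials_per_run // 2)         # alternating conditions
X = rng.normal(size=(len(y), n_voxels)) + 0.8 * y[:, None]  # toy voxel patterns

fold_acc = []
for train_idx, test_idx in leave_one_run_out(runs):
    pred = nearest_centroid_predict(X[train_idx], y[train_idx], X[test_idx])
    fold_acc.append((pred == y[test_idx]).mean())

cv_accuracy = float(np.mean(fold_acc))
print(f"accuracy per fold: {fold_acc}, mean: {cv_accuracy:.2f}")
```

Splitting by run rather than by trial keeps training and test data from sharing a run.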

Performance – Single Subject

$p = P(X \ge k \mid H_0) = 1 - B(k \mid n, \pi_0)$

Brodersen et al. 2013, NeuroImage

Binomial Test
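As a worked example, the tail probability P(X ≥ k | H0) can be computed by summing the binomial probabilities from k to n. The function name and the example numbers (15 of 20 test trials correct, chance level 0.5) are assumptions for illustration. If B denotes the binomial cumulative distribution function, this tail sum equals 1 − B(k − 1 | n, π0).

```python
from math import comb

def binomial_p_value(k, n, pi0=0.5):
    """P(X >= k | H0) for X ~ Binomial(n, pi0): upper tail including k itself."""
    return sum(comb(n, i) * pi0 ** i * (1 - pi0) ** (n - i) for i in range(k, n + 1))

# Example: 15 of 20 test trials classified correctly, chance level 0.5.
p = binomial_p_value(k=15, n=20)
print(f"p = {p:.4f}")   # p = 0.0207
```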

!!! Cross-validated data are not necessarily binomially distributed. Permutation tests are better. !!!
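A permutation test builds the null distribution empirically: the labels are shuffled, the complete cross-validation is repeated, and the observed accuracy is compared with the accuracies obtained under shuffling. The sketch below assumes a one-dimensional nearest-class-mean classifier with leave-one-out cross-validation and freely exchangeable trials; for real fMRI data the permutation scheme may need to respect the run structure.

```python
import random

def loo_accuracy(x, labels):
    """Leave-one-out accuracy of a 1-D nearest-class-mean classifier."""
    correct = 0
    for i in range(len(x)):
        means = {}
        for c in (0, 1):
            vals = [x[j] for j in range(len(x)) if j != i and labels[j] == c]
            means[c] = sum(vals) / len(vals)
        pred = min(means, key=lambda c: abs(x[i] - means[c]))
        correct += pred == labels[i]
    return correct / len(x)

def permutation_p_value(x, labels, n_perm=1000, seed=0):
    """Return (observed accuracy, p-value) from label permutations."""
    rng = random.Random(seed)
    observed = loo_accuracy(x, labels)
    shuffled = list(labels)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)                     # break the label-data link
        if loo_accuracy(x, shuffled) >= observed:
            count += 1
    # The +1 terms count the observed labelling as one of the permutations.
    return observed, (count + 1) / (n_perm + 1)

data_rng = random.Random(1)
labels = [0, 1] * 10
x = [data_rng.gauss(1.5 * c, 1.0) for c in labels]   # toy data with a class effect
observed_acc, p_value = permutation_p_value(x, labels)
print(f"accuracy = {observed_acc:.2f}, permutation p = {p_value:.3f}")
```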

Performance – Multiple subjects

Brodersen et al. 2013, NeuroImage

Fixed effects

Random effects

http://www.translationalneuromodeling.org/tapas/

Confounds – GLM vs. MVPA

Todd et al. 2013, NeuroImage

Second level t-tests for accuracies?

True β-values are normally distributed.

True accuracies are not normal and truncated at chance.

A possible solution is given by Allefeld et al.

Allefeld et al. Neuroimage, 2016

Statistical testing with classification

• Within subjects:
  – Permutation statistics
  – Parametric tests are not valid (assumptions not met), e.g. binomial or t-test (cf. Schreiber and Krekelberg, 2013).

• Across subjects:
  – Assumptions for t-tests are not met
  – Full Bayesian model (Brodersen et al. 2013, but assumptions are not met for CV)
  – Use prevalence statistic proposed in Allefeld et al., 2016

Research questions for classification

• Overall classification accuracy: accuracy per classification task (example tasks: Truth or; Left or right button?; Healthy or ill?)
• Spatial deployment of discriminative regions
• Temporal evolution of discriminability: classification accuracy over within-trial time (accuracy rises above chance; participant indicates decision)
• Model-based classification: { group 1, group 2 }

Pereira et al. (2009) NeuroImage, Brodersen et al. (2009) The New Collection

Decoding «hidden» intentions – searchlight approach

Haynes et al., Current Biology, 2007

Decoding of free decisions

Soon et al., Nat Neurosci, 2008

Decoding of finger presses (red line). Participants freely choose timing and hand.

Earliest information about left vs. right is available long before execution – free will?

Decoding task preparation – connectivity-based decoding

Heinzle et al., J Neurosci, 2012

SV-Classifier on connectivity graph (correlation)
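One way to turn a connectivity graph into classifier input is to vectorize the region-by-region correlation matrix. The sketch below shows only this feature-extraction step on simulated time series; it is an assumption about the general approach and not the specific pipeline of Heinzle et al. (2012). The function name and data dimensions are hypothetical.

```python
import numpy as np

def connectivity_features(timeseries):
    """Map a (n_timepoints, n_regions) array to the vector of pairwise
    correlations between regions (upper triangle, diagonal excluded)."""
    corr = np.corrcoef(timeseries, rowvar=False)
    iu = np.triu_indices(corr.shape[0], k=1)
    return corr[iu]

rng = np.random.default_rng(4)
ts = rng.normal(size=(120, 5))                     # 120 scans, 5 regions (toy data)
ts[:, 1] = ts[:, 0] + 0.1 * rng.normal(size=120)   # make regions 0 and 1 strongly coupled

features = connectivity_features(ts)
print(features.shape)   # (10,) = 5 * 4 / 2 region pairs
```

One such feature vector per trial or per subject can then be passed to a classifier such as a support vector machine.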

Discriminative maps

Unsupervised learning

Building a representation of data

Dimensionality Reduction   Time series   Clustering

K-means Mixture models

K-means clustering

• Cost function

• Algorithm
  1. Initialize
  2. Estimate assignments
  3. Estimate cluster centroids
  4. Repeat 2, 3 until convergence
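The four steps can be written out directly. This sketch assumes the usual cost function, the sum of squared distances between each point and its assigned centroid, as in Bishop (2006). Initializing the centroids with randomly chosen data points is one common choice among several, and the helper names are hypothetical.

```python
import numpy as np

def assign(X, centroids):
    """Index of the nearest centroid for every data point."""
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # 1. Initialize: use k distinct data points as starting centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # 2. Estimate assignments.
        labels = assign(X, centroids)
        # 3. Estimate cluster centroids (an empty cluster keeps its old centroid).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # 4. Repeat 2 and 3 until the centroids stop moving.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    labels = assign(X, centroids)
    cost = ((X - centroids[labels]) ** 2).sum()   # sum of squared distances
    return labels, centroids, cost

rng = np.random.default_rng(5)
X = np.vstack([rng.normal(0.0, 0.5, size=(50, 2)),    # toy cluster around (0, 0)
               rng.normal(5.0, 0.5, size=(50, 2))])   # toy cluster around (5, 5)
labels, centroids, cost = kmeans(X, k=2)
print(centroids.round(1), round(float(cost), 1))
```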

Bishop PRML (2006)

Clustering – Mixture of Gaussians

Bishop PRML (2006)

Interpretation

• Cluster parameters

• Internal criterion – model evidence
• External criterion – purity

Inferred Labels

External Labels

Subjects

Cluster 1 Cluster 2
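Purity compares the inferred cluster labels with external labels: each cluster is credited with its most frequent external label, and purity is the fraction of subjects covered this way. The example labels below are made up for illustration.

```python
from collections import Counter

def purity(cluster_labels, external_labels):
    """Fraction of items whose external label is the majority label of their cluster."""
    total = 0
    for c in set(cluster_labels):
        members = [e for cl, e in zip(cluster_labels, external_labels) if cl == c]
        total += Counter(members).most_common(1)[0][1]   # size of the majority group
    return total / len(cluster_labels)

# Hypothetical example: 8 subjects, 2 inferred clusters, external diagnosis labels.
inferred = [1, 1, 1, 1, 2, 2, 2, 2]
external = ["patient", "patient", "patient", "control",
            "control", "control", "control", "patient"]
score = purity(inferred, external)
print(score)   # 0.75
```

Purity is 1 when every cluster contains subjects from a single external group.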

Motivation

Learning from Data

Generative Embedding

Modelling

Encoding vs. Decoding models

Coding Hypotheses

Spatial vectors Smooth vectors

Sparse vectors

Singular vectors of data Support vectors

Distributed vectors

$U = R Y^T$

$U D V^T = R Y^T$

Coding Hypotheses

Friston et al. 2008 NeuroImage

Solved with variational Bayes

Example – Decoding of motion.

Experimental factors:
1. Photic
2. Motion
3. Attention

Attention to motion dataset - Büchel & Friston 1999 Cerebral Cortex

Results

[Figure: log-evidence across partitions (maximum p = 100.00%) and distribution of voxel weights.]

Posterior probabilities at maxima (506 voxels; 360 scans)

p(|w| > 0)   location (x, y, z) in mm    weight (w)
0.993        -39.0, -90.0,  -3.0          0.0254
0.983        -33.0, -99.0,  -3.0         -0.0216
0.983        -30.0, -99.0,   3.0          0.0211
0.982        -42.0, -90.0,   9.0          0.0201
0.980        -45.0, -75.0,  -3.0          0.0168
0.979        -30.0, -84.0,   6.0         -0.0187
0.977        -39.0, -87.0,   3.0         -0.0196
0.973        -30.0, -84.0,  -6.0         -0.0204
0.972        -39.0, -81.0, -15.0          0.0166
0.946        -36.0, -84.0,  12.0         -0.0144
0.933        -48.0, -84.0,  -3.0         -0.0119
0.929        -39.0, -75.0,   3.0         -0.0160

[Figure: PPM: MVB_Motion (Motion); MVB_Motion (prior: sparse), target and prediction; observed and predicted contrast, SNR (variance) 0.64; SPM{T338} for the contrast Motion, SPMresults: .\SPM-practical\attention\GLM, height threshold T = 4.874226 {p<0.05 (FWE)}.]

Laminar activity related to novelty and episodic memory

Maass et al. 2014 Nature Communications

Motivation

Learning from Data

Modelling Principles

Classifying Groups of Subjects

Subject 1

Subject 2

Subject N

Voxel activity

Subject 1

Subject 2

Subject N

Connectivity

Dynamic causal model (DCM)

Classification   Clustering

Group 1 Group 2


• High dimensionality
• Unusual cluster distributions
• Lack of interpretation

Brodersen et al., PLoS Computational Biology, 2011.

DCM for speech processing

Working memory in Schizophrenia

• 41 schizophrenia patients (DSM-IV, ICD-10), 42 controls

• Visual numeric n-back working memory task

Deserno et al (2012) The Journal of Neuroscience

4 29 8

3 5900ms

Model based clustering

Brodersen et al 2014 Neuroimage

Results healthy vs. schizophrenia patients

Within patients clustering

Be aware

• Interpretation of decoding or classification results is difficult.

• The decoded information must be in the data, but in what features exactly is often hard to find out …

Summary

Learning from Data

Modelling Principles

Acknowledgments

Many thanks to K.E. Stephan, Sudhir S. Raman and K. Brodersen for sharing their teaching material.
