Dimension Reduction
Pamela K. Douglas
NITP Summer 2013
©2013 Pamela Douglas, UCLA NITP
Overview
! What is dimension reduction
! Motivation for performing reduction on your data
! Intuitive Description of Common Methods
! Applications in Neuroimaging
www.brainmapping.org ©2013 Pamela Douglas, UCLA NITP
Data Dimensions
! The number of dimensions in a given data set corresponds to the number of variables that are measured on each observation.
! In machine learning (ML) applications, the terms “feature” and/or “attribute” are often used to refer to the dimensions being classified
Examples
! fMRI voxels; EEG channels

Goal
! Given d-dimensional data, (x1, . . . , xd)^T, find a lower-dimensional representation s = (s1, . . . , sk) with k ≤ d that describes the original data
! Ideally this occurs with minimal information loss according to some criterion
Is Dimension Reduction Necessary?
! “In an application, whether it is classification or regression, observation data that we believe contain information are taken as inputs and fed to the system. Ideally, we should not need feature selection as a separate process; the classifier (or regressor) should be able to use whichever features are necessary, discarding the irrelevant. However, there are several reasons why we are interested in reducing dimensionality as a separate step.”
– Alpaydin
Sparse Representation Motivation
! Reduce complexity, which depends on the number of input dimensions, d, as well as N, the number of data exemplars or sample size
! Diminish computation (important beyond speed!)
! Simpler models tend to depend less on noise
! Potential for increased knowledge extraction/interpretability
Alpaydin, Machine Learning 2004
Douglas et al. 2012; Colby et al. 2012
Graph Theory Metrics for ASD classification prior to and post feature selection
ADHD 200 Initiative
! Public release of (n=973) subjects including structural, resting-state fMRI, and demographic information from ADHD subtypes
! Our team (3rd place) and others had 1,000s of neuroimaging features
! Winning team used only demographic features!
! Brief survey of Kaggle Big Data ML competitions – winning teams use feature selection
Judged on 200 held out samples
Overview
! What is dimension reduction
! Motivation for performing reduction on your data
! Intuitive Description of Common Methods
! Applications in Neuroimaging
Methods for Reducing Dimensions
! Two common approaches

Feature Selection / Feature Extraction (tend to be supervised)
- Extracting data from regions of interest (ROIs)
- A priori knowledge for ROI or EEG channel selection
- ML feature selection methods

Projection/Clustering Methods (tend to be unsupervised)
- NNMF & topic modeling
- SVD
- PCA
- ICA
Region of Interest Feature Extraction
! Reducing dimensions to selected ROIs or paths may be useful when the number of features is very large
! Applied commonly in functional connectivity analysis (e.g. rs-fcMRI)
! Historically, data are warped to a common canonical atlas and time courses are extracted from each ROI (Pro: easy to implement; see the sketch below)
! Example – Dosenbach et al. (2010) found that weakening of short-range connections was predictive of brain maturation
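A minimal numpy sketch (not from the talk) of ROI feature extraction: average the voxel time courses within each atlas label, then compute the pairwise ROI correlations commonly used as rs-fcMRI features. The `roi_time_courses` helper, the array shapes, and the random data are hypothetical stand-ins for preprocessed fMRI and an atlas volume.

```python
import numpy as np

def roi_time_courses(fmri_4d, atlas_3d):
    """Average the fMRI signal within each atlas label at every time point.

    fmri_4d  : ndarray, shape (X, Y, Z, T)
    atlas_3d : integer ndarray, shape (X, Y, Z); 0 = background
    Returns an (n_rois, T) array of mean ROI time courses.
    """
    labels = np.unique(atlas_3d)
    labels = labels[labels != 0]                      # drop background
    return np.array([fmri_4d[atlas_3d == lab].mean(axis=0) for lab in labels])

# Hypothetical example: random data standing in for preprocessed rs-fMRI
rng = np.random.default_rng(0)
fmri = rng.standard_normal((10, 10, 10, 120))         # (X, Y, Z, T)
atlas = rng.integers(0, 5, size=(10, 10, 10))         # 4 ROIs + background

ts = roi_time_courses(fmri, atlas)                    # (4, 120)
connectivity = np.corrcoef(ts)                        # Corr(ROI_i, ROI_j) feature matrix
print(connectivity.shape)                             # (4, 4)
```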
Rs-fcMRI Caution
! One should use caution: movement, slice timing, and data quality should be checked and corrected or “scrubbed”
! Power et al. (2012) found that long-range connections were specifically decreased with subject motion
! Alternative data-driven methods for defining ROIs may prove more successful
Pereira et al. 2013 PRNI
Feature Selection
! Reduces Complexity & diminishes risk of overfitting (e.g. Kohavi & John 1997)
Methods include (a code sketch follows below):
! Filtering – variable ranking, almost like a preprocessing step (e.g. t-test thresholding on training data)
! Wrapper Methods – “wrap” the induction algorithm within a nested cross-validation to assess the importance of a variable or subset of variables by removing it (e.g. SVM-RFE) (Maldonado & Weber 2009; Das et al. 2001; Guyon & Elisseeff 2003)
! Embedded Methods – usually use a two-part objective function, assessing goodness-of-fit with a penalty term for a large number of parameters
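The sketch below illustrates the three flavors of feature selection named above on synthetic data, using scikit-learn as an assumed toolbox (the talk does not prescribe an implementation): a univariate filter, SVM-RFE as a wrapper, and an l1-penalized model as an embedded method, each scored with cross-validation so selection happens inside the training folds.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif, RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Synthetic stand-in for a subjects-by-features matrix (e.g. voxelwise stats)
X, y = make_classification(n_samples=100, n_features=500, n_informative=10,
                           random_state=0)

# Filter: univariate ranking (ANOVA F-test, akin to t-test thresholding),
# placed in a pipeline so the ranking is fit on training folds only.
filter_clf = make_pipeline(SelectKBest(f_classif, k=50), LinearSVC(max_iter=5000))
print("filter CV accuracy:", cross_val_score(filter_clf, X, y, cv=5).mean())

# Wrapper: recursive feature elimination with a linear SVM (SVM-RFE).
rfe_clf = make_pipeline(RFE(LinearSVC(max_iter=5000), n_features_to_select=50, step=0.1),
                        LinearSVC(max_iter=5000))
print("SVM-RFE CV accuracy:", cross_val_score(rfe_clf, X, y, cv=5).mean())

# Embedded: an l1-penalized model selects features as part of fitting.
l1_clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
print("l1 CV accuracy:", cross_val_score(l1_clf, X, y, cv=5).mean())
```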
! Corr(ROI_i, ROI_j)
Douglas et al. 2010 OHBM
Parsimonious Dynamic Systems Models (state space models similar to DCM)
! Using physiologic knowledge to constrain a problem in state space modeling approaches is also useful
Example – Classic Linear State Space Model
! Classic linear model with two state variables (V1 and FFA), input u(t), and measured output Y(t):

$$\begin{bmatrix} \frac{dx_1}{dt} \\ \vdots \\ \frac{dx_n}{dt} \end{bmatrix} = \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{bmatrix} \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} + \begin{bmatrix} b_{11} & \cdots & b_{1n} \\ \vdots & \ddots & \vdots \\ b_{n1} & \cdots & b_{nn} \end{bmatrix} \begin{bmatrix} u_1 \\ \vdots \\ u_n \end{bmatrix}$$

(Diagram: two compartments, V1 and FFA, coupled by rates k_{v,f} and k_{f,v}, each with a leak term k_{0,v}, k_{0,f}; u(t) drives the system and Y(t) is measured)
! Analytically, the model is underdetermined or structurally unidentifiable
! Remove either leak term, and the model becomes uniquely identifiable
Inspired by work from Joe D. DiStefano III
! Adding nonlinearity to the model (like a bilinear term) can also make it identifiable
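A small, hypothetical numpy simulation of the two-state linear model above using forward-Euler integration; the coupling and leak values are made up for illustration and are not estimates from data.

```python
import numpy as np

# Two-state linear model dx/dt = A x + B u, with states x = [V1, FFA].
# Off-diagonal terms of A play the role of the coupling rates k_{f,v}, k_{v,f};
# the diagonal terms include the leak terms k_{0,v}, k_{0,f}. All values are
# illustrative only.
A = np.array([[-1.0, 0.3],
              [0.5, -0.8]])
B = np.array([[1.0, 0.0],
              [0.0, 0.0]])          # stimulus u drives V1 only

dt, T = 0.01, 1000
x = np.zeros(2)
u = np.zeros(2)
trajectory = np.zeros((T, 2))

for t in range(T):
    u[0] = 1.0 if (t * dt) % 2.0 < 1.0 else 0.0   # boxcar input
    x = x + dt * (A @ x + B @ u)                  # forward-Euler step
    trajectory[t] = x

print(trajectory[-1])   # final state of [V1, FFA]
```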
Sparsity & DCM for ML
! Depending on the goal, one may wish to use more or fewer variables for analysis
! Part I: Goal is to classify moderately aphasic patients from controls accurately
! Restricted analysis to ROIs selected from GLM analysis
! Tested a number of feature selection methods
! Accuracy result: DCM features > PCA > t-test > searchlight > all voxels in ROIs
! Part II: Investigate which features were informative
! Used an l1 approximation to the l0-norm regularizer to impose sparsity
! Using only 9 (highlighted below) out of 22 original features yielded the same balanced accuracy
Projection Methods
! We are interested in finding a mapping from original d-dimensional input space to a new (k<d) dimensional space with minimum loss of information
! Common linear projection methods include:
! Principal Component Analysis
! Independent Component Analysis
! Other close relatives: Multidimensional Scaling, Factor Analysis
A bit more intuition (matrix factorization & eigenvalues)
! Matrix factorization can be useful for a number of reasons
! A full-rank symmetric n × n matrix can be factored as

$$A = PDP^{-1} \quad \text{with } D \text{ diagonal, and } P^{-1} = P^T \text{ if } P \text{ has orthonormal columns}$$

$$D = \begin{bmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n \end{bmatrix} \quad \text{where} \quad \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n \ge 0$$

! But most matrices that we work with (e.g. the A in Ax = b) are not square and symmetric
! Use the Singular Value Decomposition instead: for A of size n × m with rank r,

$$A = U\Sigma V^T \quad \text{where} \quad \Sigma = \begin{bmatrix} D & 0 \\ 0 & 0 \end{bmatrix}, \quad D \text{ is } r \times r$$

(the remaining rows and columns of Σ are zero)
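A quick numpy check of the two factorizations on made-up matrices (my addition): `eigh` recovers A = PDP^T for a symmetric matrix, while `svd` handles the rectangular case.

```python
import numpy as np

rng = np.random.default_rng(1)

# Symmetric matrix: eigendecomposition A = P D P^T with orthonormal P
M = rng.standard_normal((4, 4))
A_sym = M + M.T
eigvals, P = np.linalg.eigh(A_sym)                      # ascending eigenvalues
D = np.diag(eigvals)
print(np.allclose(A_sym, P @ D @ P.T))                  # True

# Rectangular matrix: use the SVD instead, A = U Sigma V^T
A = rng.standard_normal((2, 3))
U, s, Vt = np.linalg.svd(A)                              # s holds singular values
Sigma = np.zeros_like(A)
Sigma[:len(s), :len(s)] = np.diag(s)
print(np.allclose(A, U @ Sigma @ Vt))                    # True
```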
Singular Value Decomposition (a simple example)
! The matrix A is a linear transformation that maps the unit sphere {x: ||x|| = 1} onto an ellipse
! Example:

$$Ax = b, \qquad A = \begin{bmatrix} 4 & 11 & 14 \\ 8 & 7 & -2 \end{bmatrix}$$

! Calculate the eigenvalues and eigenvectors of $A^TA$:

$$\lambda_1 = 360, \quad \lambda_2 = 90, \quad \lambda_3 = 0$$

$$v_1 = \begin{bmatrix} 1/3 \\ 2/3 \\ 2/3 \end{bmatrix}, \quad v_2 = \begin{bmatrix} -2/3 \\ -1/3 \\ 2/3 \end{bmatrix}, \quad v_3 = \begin{bmatrix} 2/3 \\ -2/3 \\ 1/3 \end{bmatrix}$$

! The singular values are the square roots of the nonzero eigenvalues:

$$D = \begin{bmatrix} 6\sqrt{10} & 0 \\ 0 & 3\sqrt{10} \end{bmatrix}, \qquad \Sigma = \begin{bmatrix} 6\sqrt{10} & 0 & 0 \\ 0 & 3\sqrt{10} & 0 \end{bmatrix}$$

! Apply SVD from here: $A = U\Sigma V^T$

$$A = \begin{bmatrix} 3/\sqrt{10} & 1/\sqrt{10} \\ 1/\sqrt{10} & -3/\sqrt{10} \end{bmatrix} \begin{bmatrix} 6\sqrt{10} & 0 & 0 \\ 0 & 3\sqrt{10} & 0 \end{bmatrix} \begin{bmatrix} 1/3 & 2/3 & 2/3 \\ -2/3 & -1/3 & 2/3 \\ 2/3 & -2/3 & 1/3 \end{bmatrix}$$
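The worked example can be checked numerically; a short numpy sketch (my addition, not part of the slides):

```python
import numpy as np

# The 2x3 matrix from the worked example
A = np.array([[4.0, 11.0, 14.0],
              [8.0, 7.0, -2.0]])

# Eigen-decomposition of A^T A reproduces lambda = 360, 90, 0
eigvals = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]
print(eigvals)                         # approx [360.  90.   0.]

# The singular values are their square roots: 6*sqrt(10) and 3*sqrt(10)
U, s, Vt = np.linalg.svd(A)
print(s)                               # approx [18.97  9.49]

# Reconstruct A = U Sigma V^T
Sigma = np.zeros_like(A)
Sigma[:2, :2] = np.diag(s)
print(np.allclose(A, U @ Sigma @ Vt))  # True
```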
Principal Component Analysis (if calculated by hand)
! Matrix of Observations, X
! Example: heights and weights observed for N subjects

$$X = \begin{bmatrix} w_1 & w_2 & \cdots & w_N \\ h_1 & h_2 & \cdots & h_N \end{bmatrix} = \begin{bmatrix} X_1 & \cdots & X_N \end{bmatrix}$$

! First center/demean the data: $\hat{X}_k = X_k - \mu$

! Compute the covariance matrix

$$S = \frac{1}{N-1} B B^T \quad \text{where} \quad B = \begin{bmatrix} \hat{X}_1 & \cdots & \hat{X}_N \end{bmatrix}$$

! Diagonal entries $s_{jj}$ are variances ($\sigma^2$); off-diagonal entries $s_{i \neq j}$ are covariances; the total variance is $\sum \sigma^2 = \mathrm{tr}(S)$

! Effectively perform SVD (an eigendecomposition of S) from here:

$$D = \begin{bmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n \end{bmatrix} \quad \text{where} \quad \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n \ge 0$$
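A hand-rolled PCA sketch following the steps above, on hypothetical height/weight-style data (center, form S = BB^T/(N−1), eigendecompose, sort):

```python
import numpy as np

# Hypothetical heights/weights-style data: 2 variables, N = 200 observations
rng = np.random.default_rng(2)
X = rng.multivariate_normal(mean=[170.0, 70.0],
                            cov=[[90.0, 40.0], [40.0, 30.0]],
                            size=200).T                # shape (2, N)

# 1) Center/demean each variable
B = X - X.mean(axis=1, keepdims=True)

# 2) Covariance matrix S = (1 / (N-1)) * B B^T
N = X.shape[1]
S = (B @ B.T) / (N - 1)

# 3) Eigendecomposition of S; sort eigenvalues in decreasing order
eigvals, eigvecs = np.linalg.eigh(S)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

print("fraction of variance explained:", eigvals / eigvals.sum())
scores = eigvecs.T @ B            # principal component scores, shape (2, N)
```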
Principal Component Analysis (in practice)
! In practice, iterative calculation of SVD is typically faster and more accurate than an eigenvalue decomposition of S.
! Typically proceeds as follows:

$$w^* = \underset{w:\,\|w\|=1}{\arg\max}\; w^T \Sigma w$$

! Subsequent components must be orthogonal
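In code, one would usually let a library do the SVD-based computation; a hedged sketch using scikit-learn's PCA on random stand-in data (the shapes are arbitrary):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(3)
X = rng.standard_normal((100, 5000))        # e.g. 100 scans x 5000 voxel features

pca = PCA(n_components=10)                  # keep the first 10 components
scores = pca.fit_transform(X)               # projected data, shape (100, 10)

print(scores.shape)
print(pca.explained_variance_ratio_[:3])    # fraction of variance per component
```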
Interpreting PCA & ICA Results
! Why is having this decomposition useful?
! PCA is useful when most of the variation or dynamic range of the data can be explained via a linear combination of only a few of the new orthogonal variables
! The fraction of the total variance “captured” or explained by a certain variable is

$$\frac{\lambda_i}{\mathrm{tr}(D)}, \qquad D = \begin{bmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n \end{bmatrix} \quad \text{where} \quad \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n \ge 0$$
! FSL MELODIC Output
The Scree Graph (Interpreting our PCA Results)
! Variance explained by a given eigenvalue, λ_i / tr(D), is generally monotonically decreasing and can be visualized with a scree graph
! Different methods exist for finding cut-off points (e.g. Levenberg-Marquardt)
(Figure: fraction of variance explained per component)
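A minimal matplotlib sketch of a scree graph, plotting the fraction of variance explained per component for random stand-in data (cut-off selection itself is not implemented here):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

rng = np.random.default_rng(4)
X = rng.standard_normal((100, 500))

pca = PCA(n_components=20).fit(X)

# Scree graph: fraction of variance explained (lambda_i / tr(D)) per component
plt.plot(np.arange(1, 21), pca.explained_variance_ratio_, "o-")
plt.xlabel("Component")
plt.ylabel("Fraction of variance explained")
plt.title("Scree graph")
plt.show()
```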
Independent Component Analysis
! Similar to PCA, but relaxes the orthogonality constraint
! Goal is to find a weight matrix W ≈ A^(-1) and components, s, of the data x
! Depending on the algorithm, weights may be initialized via PCA or probabilistic PCA
! Weights are learned according to a cost function
! Two common algorithms used in neuroimaging are:
! Infomax (Bell & Sejnowski 1995) – mutual information
! FastICA (Hyvarinen 1998) – negentropy as cost function (see the sketch below)
! Many others, including new work from Tülay Adali
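A small FastICA sketch using scikit-learn on a toy mixture (my illustration, not the talk's pipeline): two non-Gaussian sources are mixed into three channels and then unmixed; W ≈ A^(-1) is recovered up to sign, scale, and ordering.

```python
import numpy as np
from sklearn.decomposition import FastICA

# Hypothetical mixing of two non-Gaussian sources into three observed channels
rng = np.random.default_rng(5)
t = np.linspace(0, 8, 2000)
S_true = np.c_[np.sin(3 * t), np.sign(np.sin(5 * t))]       # sources, shape (2000, 2)
A_mix = np.array([[1.0, 0.5], [0.5, 2.0], [1.5, 1.0]])      # mixing matrix, shape (3, 2)
X = S_true @ A_mix.T                                         # observations, shape (2000, 3)

# FastICA estimates the unmixing weights W and the components s of the data x
ica = FastICA(n_components=2, random_state=0)
S_est = ica.fit_transform(X)          # estimated sources (up to sign/scale/order)
W = ica.components_                   # unmixing matrix
print(S_est.shape, W.shape)           # (2000, 2) (2, 3)
```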
Overview
! What is dimension reduction
! Motivation
! Intuitive Description of Common Methods
! Applications in Neuroimaging
ICA & Dual Regression
! ICA-based noise removal – Dual Regression Approach (a schematic sketch follows below)
! Group ICA – Identify RSN of interest
! DMN Subnetwork Identification
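A schematic numpy sketch of the dual-regression idea (this is not FSL's dual_regression tool, and the usual demeaning/normalization steps are omitted): stage 1 regresses the group IC spatial maps against a subject's data to obtain subject time courses; stage 2 regresses those time courses against the data to obtain subject-specific spatial maps. All sizes and variable names are hypothetical.

```python
import numpy as np

def dual_regression(subject_data, group_maps):
    """Schematic two-stage dual regression.

    subject_data : (T, V) array, one subject's time x voxel data
    group_maps   : (K, V) array, group-ICA spatial maps
    Returns (T, K) subject time courses and (K, V) subject-specific maps.
    """
    # Stage 1: regress group spatial maps onto the data -> subject time courses
    timecourses, *_ = np.linalg.lstsq(group_maps.T, subject_data.T, rcond=None)
    timecourses = timecourses.T                      # (T, K)

    # Stage 2: regress the time courses onto the data -> subject spatial maps
    subject_maps, *_ = np.linalg.lstsq(timecourses, subject_data, rcond=None)
    return timecourses, subject_maps                 # (T, K), (K, V)

# Hypothetical sizes: 150 time points, 3000 voxels, 10 group components
rng = np.random.default_rng(6)
data = rng.standard_normal((150, 3000))
maps = rng.standard_normal((10, 3000))
tcs, smaps = dual_regression(data, maps)
print(tcs.shape, smaps.shape)                        # (150, 10) (10, 3000)
```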
Clewef Notes on Dual Regression
Independent Components as Classifier Dimensions
Hypothesis: Brain cognitive states can be described as the superposition of current activity in IC basis images
Decision Tree & Interpreting Classifier Output
(Figure: decision tree over IC spatial masks – ICs 5, 13, 15, and 19 – separating Belief from Disbelief; columns: B > DB, common to B, DB, common to DB, IC spatial mask)
Douglas et al. (2010) NeuroImage; Douglas et al. (2013) Frontiers (in press)
Areas that were uniquely identified by ICs were the amygdala, right medial frontal gyrus, and cingulate cortex.
Newer Applications of ICA
! Multimodal Fusion
! Joint ICA
! Linked ICA
Thanks!