Dimension Reduction Pamela K. Douglas NITP Summer 2013 ©2013 Pamela Douglas, UCLA NITP


Page 1: Dimension Reduction

Dimension Reduction

Pamela K. Douglas

NITP Summer 2013

Page 2

Overview

- What is dimension reduction
- Motivation for performing reduction on your data
- Intuitive Description of Common Methods
- Applications in Neuroimaging

www.brainmapping.org  ©2013  Pamela  Douglas,  UCLA  NITP    

Page 3

Overview

- What is dimension reduction
- Motivation
- Intuitive Description of Common Methods
- Applications in Neuroimaging


Page 4

- The number of dimensions in a given data set corresponds to the number of variables measured on each observation.

Data Dimensions


- In machine learning (ML) applications, the terms "feature" and "attribute" are often used to refer to the dimensions being classified

Examples: fMRI voxels, EEG channels

Page 5

Goal


Examples: fMRI voxels, EEG channels

- Given d-dimensional data, x = (x1, . . . , xd)ᵀ, find a lower-dimensional representation s = (s1, . . . , sk)ᵀ with k ≤ d that describes the original data

- Ideally this occurs with minimal information loss according to some criterion
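As a minimal sketch of this goal (using NumPy; the projection matrix W here is an arbitrary placeholder standing in for whatever a specific method would supply, just to show the shapes involved):

```python
import numpy as np

rng = np.random.default_rng(0)

d, k, N = 100, 5, 50                  # original dims, reduced dims, observations
X = rng.standard_normal((N, d))       # N observations, each d-dimensional

# Every linear reduction method boils down to choosing a d x k matrix W
# (PCA takes the top-k covariance eigenvectors, ICA an unmixing matrix, ...).
# W below is random purely for illustration.
W = rng.standard_normal((d, k))

S = X @ W                             # lower-dimensional representation s
print(S.shape)                        # (50, 5): k numbers now describe each observation
```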

Page 6

Is Dimension Reduction Necessary?

- "In an application, whether it is classification or regression, observation data that we believe contain information are taken as inputs and fed to the system.

  Ideally, we should not need feature selection as a separate process; the classifier (or regressor) should be able to use whichever features are necessary, discarding the irrelevant. However, there are several reasons why we are interested in reducing dimensionality as a separate step."

Alpaydin


Page 7

Sparse Representation Motivation


- Reduce complexity, which depends on the number of input dimensions, d, as well as N, the number of data exemplars or sample size
- Diminish computation (important beyond speed!)
- Simpler models tend to depend less on noise
- Potential for increased knowledge extraction/interpretability

Alpaydin,  Machine  Learning  2004  


Douglas et al. 2012; Colby et al. 2012

Graph theory metrics for ASD classification prior to and post feature selection (figure: before / after)

Page 8

ADHD 200 Initiative

- Public release of (n=973) subjects including structural, resting state fMRI, and demographic information from ADHD subtypes
- Our team (3rd place) and others had 1,000s of neuroimaging features
- Winning team used only demographic features!
- Brief survey of Kaggle big-data ML competitions: winning teams use feature selection


Judged  on  200  held  out  samples  

Page 9

Overview

- What is dimension reduction
- Motivation for performing reduction on your data
- Intuitive Description of Common Methods
- Applications in Neuroimaging


Page 10

Methods for Reducing Dimensions


Two common approaches:

Feature Selection / Feature Extraction (tend to be supervised)
- Extracting data from regions of interest (ROIs)
- A priori knowledge for ROI or EEG channel selection
- ML feature selection methods

Projection/Clustering Methods (tend to be unsupervised)
- NNMF & topic modeling
- SVD
- PCA
- ICA

Page 11

Region of Interest Feature Extraction


- Reducing dimensions to selected ROIs or paths may be useful when the number of features is very large

- Applied commonly in functional connectivity analysis (e.g. rs-fcMRI)

- Historically, data are warped to a common canonical atlas and time courses are extracted from each ROI (Pro: easy to implement)

- Example: Dosenbach et al. (2010) found that weakening of short-range connections was predictive of brain maturation
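The ROI time-course extraction idea can be sketched with synthetic data (NumPy only; a real pipeline would load a 4D NIfTI volume and an atlas, e.g. with nibabel, rather than random arrays):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "fMRI" data: time points x voxels, plus a stand-in atlas labelling.
T, V, n_rois = 120, 500, 4
data = rng.standard_normal((T, V))
labels = rng.integers(0, n_rois, size=V)   # which ROI each voxel belongs to

# Feature extraction: mean time course per ROI (V features -> n_rois features)
roi_tc = np.stack([data[:, labels == r].mean(axis=1) for r in range(n_rois)], axis=1)

# Functional connectivity: ROI x ROI correlation matrix
conn = np.corrcoef(roi_tc.T)
```

The correlations in `conn` (or their off-diagonal entries) then serve as the reduced feature set.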

Page 12


- One should use caution: movement, slice timing, and data quality should be checked and corrected or "scrubbed"

- Power et al. (2012) found that long-range connections were specifically decreased with subject motion

- Alternative data-driven methods for defining ROIs may prove more successful

Rs-fcMRI Caution

Pereira et al. 2013 PRNI

Page 13

Feature Selection

- Reduces complexity & diminishes risk of overfitting (e.g. Kohavi & John 1997)

Methods include:

- Filtering: variable ranking, almost like a preprocessing step (e.g. t-test thresholding on training data)

- Wrapper methods: "wrap" the induction algorithm within a nested cross-validation to assess the importance of a variable or subset of variables by removing it (e.g. SVM-RFE) (Maldonado & Weber 2009; Das et al. 2001; Guyon & Elisseeff 2003)

- Embedded methods: usually use a two-part objective function, assessing goodness-of-fit with a penalty term for a large number of parameters


Example features: Corr(ROIi, ROIj) (figure; Douglas et al. 2010 OHBM)
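A minimal sketch of the filtering approach (t-statistic ranking on synthetic training data; the pooled-variance t formula is standard, everything else here is invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Two-class training data: 20 subjects x 1000 features; only the first 10
# features truly differ between groups.
X = rng.standard_normal((20, 1000))
y = np.array([0] * 10 + [1] * 10)
X[y == 1, :10] += 3.0

def tstat(X, y):
    """Two-sample t statistic per feature (pooled-variance form)."""
    a, b = X[y == 0], X[y == 1]
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * a.var(ddof=1, axis=0)
           + (nb - 1) * b.var(ddof=1, axis=0)) / (na + nb - 2)
    return (a.mean(axis=0) - b.mean(axis=0)) / np.sqrt(sp2 * (1 / na + 1 / nb))

t = tstat(X, y)
selected = np.argsort(-np.abs(t))[:10]   # keep the 10 highest-ranked features
```

Because the ranking uses only per-feature statistics, filtering is cheap but ignores interactions between features, which is what wrapper and embedded methods try to capture.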

Page 14

Parsimonious Dynamic Systems Models (state-space models similar to DCM)

- Using physiologic knowledge to constrain a problem in state-space modeling approaches is also useful


dx/dt = A x + B u, i.e.

⎡dx1/dt⎤   ⎡a11 … a1n⎤ ⎡x1⎤   ⎡b11 … b1n⎤ ⎡u1⎤
⎢  ⋮   ⎥ = ⎢ ⋮   ⋱  ⋮ ⎥ ⎢ ⋮ ⎥ + ⎢ ⋮   ⋱  ⋮ ⎥ ⎢ ⋮ ⎥
⎣dxn/dt⎦   ⎣an1 … ann⎦ ⎣xn⎦   ⎣bn1 … bnn⎦ ⎣un⎦

- Classic linear model with two state variables: V1 and FFA, driven by input u(t) with output Y(t), coupling terms k_v,f and k_f,v, and leak terms k_0,v and k_0,f (figure: classic linear state-space model)

- Analytically, the model is underdetermined, or structurally unidentifiable

- Remove either leak term, and the model becomes uniquely identifiable

- Adding nonlinearity to the model (like a bilinear term) can also make it identifiable

Inspired by work from Joe D. DiStefano III
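The classic linear model can be forward-simulated with a simple Euler step; the coupling and leak values below are invented purely for illustration, not fitted parameters:

```python
import numpy as np

# Forward-simulate dx/dt = A x + B u for a two-state (V1, FFA) example.
A = np.array([[-1.0,  0.4],    # row 1: -k_{0,v} leak on V1,  k_{v,f} coupling
              [ 0.6, -0.8]])   # row 2:  k_{f,v} coupling,   -k_{0,f} leak on FFA
B = np.array([[1.0], [0.0]])   # stimulus u(t) drives V1 only

dt, n_steps = 0.01, 1000
u = lambda t: 1.0 if t < 5.0 else 0.0    # 5 s boxcar input
x = np.zeros(2)
traj = []
for i in range(n_steps):
    x = x + dt * (A @ x + B @ [u(i * dt)])   # Euler integration step
    traj.append(x.copy())
traj = np.array(traj)   # FFA responds via the k_{f,v} coupling, then both decay
```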

Page 15

- Sparsity & DCM for ML

- Depending on the goal, one may wish to use more or fewer variables for analysis

- Part I: Goal is to classify moderately aphasic patients from controls accurately


Page 16

- Restricted analysis to ROIs selected from GLM analysis

- Tested a number of feature selection methods

- Accuracy result: DCM features > PCA > t-test > searchlight > all voxels in ROIs


Page 17

- Part II: Investigate which features were informative

- Used an l1 approximation to the l0-norm regularizer to impose sparsity

- Using only 9 (highlighted below) of the 22 original features yielded the same balanced accuracy
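A generic sketch of how an l1 penalty zeroes out features (iterative soft-thresholding on a synthetic least-squares problem; this is not the authors' actual classifier or data, just the mechanism):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in: 22 candidate features, only 3 truly informative.
n, d = 200, 22
X = rng.standard_normal((n, d))
w_true = np.zeros(d)
w_true[[0, 3, 7]] = [2.0, -1.5, 1.0]
y = X @ w_true + 0.1 * rng.standard_normal(n)

def ista(X, y, lam=0.1, n_iter=500):
    """Iterative soft-thresholding for min_w ||Xw - y||^2 / 2n + lam * ||w||_1."""
    n = len(y)
    L = np.linalg.norm(X, 2) ** 2 / n          # Lipschitz constant of the gradient
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        w = w - (X.T @ (X @ w - y) / n) / L    # gradient step
        w = np.sign(w) * np.maximum(np.abs(w) - lam / L, 0.0)  # soft threshold
    return w

w = ista(X, y)
support = np.flatnonzero(np.abs(w) > 1e-3)     # the surviving features
```

The soft-threshold step is what produces exact zeros, which is why an l1 penalty acts as embedded feature selection.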


Page 18

Projection Methods

- We are interested in finding a mapping from the original d-dimensional input space to a new k-dimensional space (k < d) with minimum loss of information

- Common linear projection methods include:

  - Principal Component Analysis
  - Independent Component Analysis
  - Other close relatives: Multidimensional Scaling, Factor Analysis


Page 19

A bit more intuition (matrix factorization & eigenvalues)

- Matrix factorization can be useful for a number of reasons (e.g. solving Ax = b)

- A full-rank symmetric n × n matrix can be factored as

  A = PDP⁻¹, with D diagonal, and P⁻¹ = Pᵀ if P has orthonormal columns

  D = diag(λ1, …, λn), where λ1 ≥ λ2 ≥ … ≥ λn ≥ 0

- But most matrices that we work with are not square symmetric.

- Use the Singular Value Decomposition instead: for A of size n × m,

  A = UΣVᵀ, where Σ = ⎡ D 0 ⎤ and D is r × r
                      ⎣ 0 0 ⎦

  (the zero blocks fill the remaining n − r rows and m − r columns)

Page 20

Singular Value Decomposition (a simple example)

- The matrix A is a linear transformation that maps the unit sphere {x : ||x|| = 1} onto an ellipse:

  A = ⎡ 4 11 14 ⎤
      ⎣ 8  7 −2 ⎦

- Calculate the eigenvalues and eigenvectors of AᵀA:

  λ1 = 360, λ2 = 90, λ3 = 0

  v1 = (1/3, 2/3, 2/3)ᵀ, v2 = (−2/3, −1/3, 2/3)ᵀ, v3 = (2/3, −2/3, 1/3)ᵀ

- Apply the SVD from here: the singular values are σi = √λi, so

  D = ⎡ 6√10    0  ⎤ ,  Σ = ⎡ 6√10    0   0 ⎤
      ⎣   0   3√10 ⎦        ⎣   0   3√10  0 ⎦

  A = UΣVᵀ = ⎡ 3/√10    1/√10 ⎤ ⎡ 6√10    0   0 ⎤ ⎡  1/3   2/3  2/3 ⎤
             ⎣ 1/√10   −3/√10 ⎦ ⎣   0   3√10  0 ⎦ ⎢ −2/3  −1/3  2/3 ⎥
                                                   ⎣  2/3  −2/3  1/3 ⎦
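The worked example can be checked numerically with NumPy:

```python
import numpy as np

A = np.array([[4., 11., 14.],
              [8.,  7., -2.]])

# Eigenvalues of A^T A are the squared singular values: 360, 90, 0
evals = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]

U, s, Vt = np.linalg.svd(A)   # s = [6*sqrt(10), 3*sqrt(10)] ~ [18.97, 9.49]

# Rebuild A from the factorization, keeping the two rows of V^T that matter
A_rebuilt = U @ np.diag(s) @ Vt[:2]
```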

Page 21

Principal Component Analysis (if calculated by hand)

- Matrix of observations, X = [X1 … XN]

  Example: heights and weights observed,

  X = ⎡ w1 w2 … wn−1 wn ⎤
      ⎣ h1 h2 … hn−1 hn ⎦

- First center/demean the data: X̂k = Xk − μ

- Compute the covariance matrix:

  S = (1/(N − 1)) BBᵀ, where B = [X̂1 … X̂N]

  The diagonal entries sjj of S are the variances (σ²), the off-diagonal entries si≠j are the covariances, and the total variance is tr(S) = Σσ²

- Effectively perform SVD from here:

  D = diag(λ1, …, λn), where λ1 ≥ λ2 ≥ … ≥ λn ≥ 0
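Those hand-calculation steps, run on synthetic height/weight data (NumPy; all the numbers are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy height/weight observations, stacked as columns X = [X_1 ... X_N]
N = 200
h = rng.normal(170, 10, N)                 # heights
w = 0.9 * h - 80 + rng.normal(0, 5, N)     # weights, correlated with height
X = np.vstack([w, h])                      # 2 x N

# 1) Center/demean each variable
B = X - X.mean(axis=1, keepdims=True)

# 2) Covariance matrix S = B B^T / (N - 1)
S = B @ B.T / (N - 1)

# 3) Eigendecomposition: columns of P are the principal directions
lam, P = np.linalg.eigh(S)
lam, P = lam[::-1], P[:, ::-1]             # sort so lambda_1 >= lambda_2
```

Because weight tracks height, almost all the variance lands on the first principal direction.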

Page 22

Principal Component Analysis (in practice)

- In practice, iterative calculation of the SVD is typically faster and more accurate than an eigenvalue decomposition of S.


- Typically proceeds as follows: find the direction of maximal variance,

  w* = arg max{w : ||w|| = 1} wᵀSw

- Subsequent components must be orthogonal
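A sketch of why the SVD route gives the same answer: the eigenvalues of S are the squared singular values of the centered data matrix, scaled by 1/(N − 1), and the left singular vectors are the variance-maximizing directions (synthetic data, NumPy):

```python
import numpy as np

rng = np.random.default_rng(0)

# Centered d x N data matrix B, as on the previous slide
B = rng.standard_normal((5, 300))
B -= B.mean(axis=1, keepdims=True)
N = B.shape[1]

# Route 1: eigendecomposition of the covariance matrix S
S = B @ B.T / (N - 1)
lam = np.sort(np.linalg.eigvalsh(S))[::-1]

# Route 2: SVD of B directly; S is never formed
U, s, _ = np.linalg.svd(B, full_matrices=False)
lam_svd = s ** 2 / (N - 1)        # same eigenvalues, from singular values

# The first principal direction w* = argmax_{||w||=1} w^T S w is U[:, 0]
w_star = U[:, 0]
```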

Page 23

Interpreting PCA & ICA Results

- Why is having this decomposition useful?

- PCA is useful when most of the variation or dynamic range of the data can be explained via a linear combination of only a few of the new orthogonal variables

- The fraction of the total variance "captured" or explained by a given variable is

  λi / tr(D), where D = diag(λ1, …, λn) and λ1 ≥ λ2 ≥ … ≥ λn ≥ 0

Page 24

The Scree Graph (Interpreting our PCA Results)

- FSL MELODIC output (figure: fraction of variance explained)

- The variance explained by a given eigenvalue, λi / tr(D), is generally monotonically decreasing and can be visualized with a scree graph

- Different methods exist for finding cut-off points (e.g. Levenberg-Marquardt)
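The arithmetic behind a scree graph, with a simple 90%-of-variance cut-off rule (the eigenvalues are invented, and this is not MELODIC's actual model-order selection, just an illustration):

```python
import numpy as np

# Eigenvalues from some PCA, sorted descending
lam = np.array([9.0, 4.0, 2.0, 0.7, 0.2, 0.1])

frac = lam / lam.sum()     # lambda_i / tr(D): fraction explained per component
cum = np.cumsum(frac)      # cumulative fraction, the curve a scree graph shows

# One simple cut-off rule: keep enough components to explain 90% of variance
k = int(np.searchsorted(cum, 0.90) + 1)
```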

Page 25

- Similar to PCA, but relaxes the orthogonality constraint
- Goal is to find a weight matrix W ≈ A⁻¹ and components, s, of the data x
- Depending on the algorithm, weights may be initialized via PCA or probabilistic PCA
- Weights are learned according to a cost function
- Two common algorithms used in neuroimaging are:
  - Infomax (Bell & Sejnowski 1995): mutual information as cost function
  - FastICA (Hyvärinen 1998): negentropy as cost function
  - Many others, including new work by Tülay Adalı

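A compact, illustrative FastICA (deflation with the tanh contrast) on two toy sources; this sketches the core fixed-point update only, not FSL's or any production implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two toy sources, mixed linearly: x = A s
t = np.linspace(0, 8, 2000)
S = np.vstack([np.sin(2 * t), np.sign(np.sin(3 * t))])   # smooth + square wave
A = np.array([[1.0, 0.6], [0.5, 1.0]])
X = A @ S

# Whiten the mixtures (the step where a PCA-based initialization happens)
Xc = X - X.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(np.cov(Xc))
Z = E @ np.diag(d ** -0.5) @ E.T @ Xc

# FastICA fixed-point iteration, tanh contrast, deflation over components
W = np.zeros((2, 2))
for i in range(2):
    w = rng.standard_normal(2)
    for _ in range(200):
        g = np.tanh(w @ Z)
        w = (Z * g).mean(axis=1) - (1 - g ** 2).mean() * w   # fixed-point update
        w -= W[:i].T @ (W[:i] @ w)   # decorrelate from components already found
        w /= np.linalg.norm(w)
    W[i] = w

S_hat = W @ Z    # recovered sources, up to sign/order/scale
```

Unlike PCA's variance criterion, the tanh contrast pushes each wᵀz toward maximal non-Gaussianity, which is what lets ICA undo the mixing.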

Page 26

Overview

- What is dimension reduction
- Motivation
- Intuitive Description of Common Methods
- Applications in Neuroimaging


Page 27

ICA & Dual Regression

- ICA-based noise removal: dual regression approach

- Group ICA: identify RSN of interest

- DMN subnetwork identification


Clewef  Notes  on  Dual  Regression  
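The two regression stages can be sketched on synthetic data (plain NumPy least squares; real dual regression, e.g. FSL's, adds variance normalization and runs across all subjects):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic single-subject data: T time points x V voxels built from C networks
T, V, C = 100, 400, 3
group_maps = rng.standard_normal((C, V))    # spatial maps from a group ICA
true_tc = rng.standard_normal((T, C))       # this subject's true time courses
Y = true_tc @ group_maps + 0.1 * rng.standard_normal((T, V))

# Stage 1: regress the group spatial maps onto the data -> subject time courses
tc = np.linalg.lstsq(group_maps.T, Y.T, rcond=None)[0].T   # T x C

# Stage 2: regress those time courses onto the data -> subject spatial maps
maps = np.linalg.lstsq(tc, Y, rcond=None)[0]               # C x V
```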

Page 28

Independent Components as Classifier Dimensions

Hypothesis: Brain cognitive states can be described as the superposition of current activity in IC basis images


Page 29

Decision Tree & Interpreting Classifier Output

(figure: Belief vs. Disbelief decision tree over IC 5, IC 13, IC 15, IC 19; legend: B > DB, common to B, DB, common to DB, IC spatial mask)

Douglas et al. (2010) NeuroImage; Douglas et al. (2013) Frontiers (in press)

Page 30

Decision Tree & Interpreting Classifier Output

(figure: Belief vs. Disbelief decision tree over IC 5, IC 13, IC 15, IC 19)

Areas that were uniquely identified by ICs were the amygdala, right medial frontal gyrus & cingulate cortex.

Page 31

Newer Applications of ICA

- Multimodal Fusion
- Joint ICA
- Linked ICA


Page 32

Thanks!