Data Analysis and Manifold Learning
Lecture 12: Some Examples of Manifold Learning Applications
Radu Horaud
INRIA Grenoble Rhone-Alpes, France
[email protected]
http://perception.inrialpes.fr/
Radu Horaud Data Analysis and Manifold Learning; Lecture 12
Outline of Lecture 12
Spectral embedding of a graph applied to fMRI data analysis
The Google PageRank algorithm
Material for this lecture
X. Shen and F. Meyer. Low-Dimensional Embedding of fMRI Datasets. NeuroImage 41 (2008). http://ecee.colorado.edu/~fmeyer/Pub/neuroimage08.pdf
T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. Springer (2009). The Google PageRank algorithm is briefly described on pages 576–578.
Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. The PageRank Citation Ranking: Bringing Order to the Web. Technical Report. Stanford University (1999). http://ilpubs.stanford.edu:8090/422/
Low-dimensional embedding of fMRI
fMRI provides large-scale measurements of neuronal activity.
fMRI data
Each voxel $v_i$ generates a time series:
$$\mathbf{x}_i = \left( x_i(1) \ \ldots \ x_i(T) \right)^\top \in \mathbb{R}^T$$
A network of functionally correlated voxels
A connectivity graph is formed with the standard approach:
$$W_{ij} = \begin{cases} \exp\left(-\|\mathbf{x}_i - \mathbf{x}_j\|^2 / \sigma^2\right) & \text{if } v_i \sim v_j \\ 0 & \text{otherwise} \end{cases}$$
The Euclidean distance between time series:
$$\|\mathbf{x}_i - \mathbf{x}_j\|^2 = \sum_{t=1}^{T} (x_i(t) - x_j(t))^2$$
Choice for $\sigma$: $\sigma = 2 \min_{i<j} \|\mathbf{x}_i - \mathbf{x}_j\|$
Choice for nn (number of nearest neighbors): user defined, varying from 7 to 100.
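This graph construction can be sketched in NumPy. The voxel time series below are random stand-ins, and `nn` plays the role of the user-defined neighbor count:

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 50, 20                  # stand-in sizes: N voxels, T time samples
X = rng.normal(size=(N, T))    # rows are voxel time series x_i in R^T

# Pairwise squared Euclidean distances between time series
D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)

# sigma = 2 * min_{i<j} ||x_i - x_j||  (the choice above)
iu = np.triu_indices(N, k=1)
sigma = 2.0 * np.sqrt(D2[iu].min())

# Symmetric nn-nearest-neighbor adjacency with a Gaussian kernel
nn = 7
W = np.zeros((N, N))
for i in range(N):
    neighbors = np.argsort(D2[i])[1:nn + 1]   # skip i itself
    W[i, neighbors] = np.exp(-D2[i, neighbors] / sigma**2)
W = np.maximum(W, W.T)                        # symmetrize: v_i ~ v_j
```

The symmetrization step makes the relation $v_i \sim v_j$ mutual, so $W$ is a valid weighted adjacency matrix.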
Matrices
Diagonal degree matrix: $D(i,i) = \sum_j W_{i,j}$
Transition matrix: $P = D^{-1} W$
Transition probabilities:
$$P_{i,j} = P(i,j) = \frac{W_{i,j}}{\sum_j W_{i,j}}$$
It is a row-stochastic matrix: $\sum_j P_{i,j} = 1$
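These matrices are a one-liner each in NumPy. The small adjacency matrix below is an assumed example; any symmetric nonnegative matrix without isolated vertices works the same way:

```python
import numpy as np

# Assumed toy adjacency matrix (symmetric, nonnegative, no isolated node)
W = np.array([[0., 1., 1., 0.],
              [1., 0., 1., 0.],
              [1., 1., 0., 1.],
              [0., 0., 1., 0.]])

d = W.sum(axis=1)              # degrees: D(i,i) = sum_j W_ij
P = np.diag(1.0 / d) @ W       # transition matrix P = D^{-1} W

# Each row of P sums to one: P is row-stochastic
print(P.sum(axis=1))           # -> [1. 1. 1. 1.]
```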
Random walk and commute-time
Consider a random walk on the graph, denoted by $Z_n$: if the walk is at $v_i$, it jumps to one of its neighbors $v_j$ with probability $P_{i,j}$.
If $v_i$ and $v_j$ are in the same functional area and $v_i$ and $v_k$ are in different functional areas, we expect that $P_{i,j} \gg P_{i,k}$.
The average hitting time measures the number of steps it takes for a random walk starting at $v_i$ to hit $v_j$ for the first time:
$$H(v_i, v_j) = \mathbb{E}_i[T_j] \quad \text{with} \quad T_j = \min\{n \geq 0;\ Z_n = j\}$$
The hitting time is not symmetric, so the commute time is used instead:
$$\kappa(v_i, v_j) = H(v_i, v_j) + H(v_j, v_i) = \mathbb{E}_i[T_j] + \mathbb{E}_j[T_i]$$
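On a small graph, hitting times can be computed exactly by solving a linear system: for a fixed target $v_j$, $H(v_i, v_j) = 1 + \sum_k P_{i,k} H(v_k, v_j)$ for $i \neq j$, with $H(v_j, v_j) = 0$. A sketch, where the 4-node graph (a triangle with a pendant node) is an assumed example:

```python
import numpy as np

# Assumed toy graph: triangle 0-1-2 with a pendant node 3 attached to 2
W = np.array([[0., 1., 1., 0.],
              [1., 0., 1., 0.],
              [1., 1., 0., 1.],
              [0., 0., 1., 0.]])
P = W / W.sum(axis=1, keepdims=True)
N = len(W)

def hitting_times(j):
    """Expected hitting times H(v_i, v_j), solving (I - P) h = 1 off-target."""
    idx = [i for i in range(N) if i != j]
    A = np.eye(N - 1) - P[np.ix_(idx, idx)]
    h = np.zeros(N)
    h[idx] = np.linalg.solve(A, np.ones(N - 1))
    return h

H = np.column_stack([hitting_times(j) for j in range(N)])  # H[i, j] = H(v_i, v_j)
kappa = H + H.T                                            # commute times

print(H[3, 2], H[2, 3], kappa[2, 3])   # -> 1.0 7.0 8.0
```

The asymmetry is visible directly: from the pendant node 3 the walk hits its neighbor 2 in one step, while the reverse trip takes 7 steps on average; the commute time symmetrizes the two.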
The commute time distance in the spectral domain
Let $(\lambda_k, \boldsymbol{\phi}_k)_{k=1}^{N}$ be the eigenvalue-eigenvector pairs of the matrix $P$, which can be easily computed because $D^{1/2} P D^{-1/2}$ is a real symmetric matrix. Moreover (see Lecture #3) we have:
$$-1 \leq \lambda_N \leq \ldots \leq \lambda_k \leq \ldots \leq \lambda_1 = 1$$
The commute time distance is:
$$\kappa(\mathbf{x}_i, \mathbf{x}_j)^2 = \sum_{k=2}^{K+1} \frac{1}{1 - \lambda_k} \left( \frac{\phi_k(i)}{\sqrt{\pi_i}} - \frac{\phi_k(j)}{\sqrt{\pi_j}} \right)^2$$
$\boldsymbol{\pi} = (\pi_1 \ldots \pi_i \ldots \pi_N)^\top$ is the eigenvector satisfying $P^\top \boldsymbol{\pi} = \boldsymbol{\pi}$, with:
$$\pi_i = \frac{d_i}{\sum_{i,j} W_{i,j}}$$
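This spectral formula can be checked numerically. Taking the $\phi_k$ as the orthonormal eigenvectors of the symmetric matrix $D^{1/2} P D^{-1/2} = D^{-1/2} W D^{-1/2}$ and summing over all $N-1$ nontrivial eigenpairs (i.e., $K = N - 1$), the formula reproduces the exact commute time. A sketch on an assumed 4-node graph (triangle plus a pendant node, whose commute time across the pendant edge is 8 steps):

```python
import numpy as np

W = np.array([[0., 1., 1., 0.],      # assumed toy graph
              [1., 0., 1., 0.],
              [1., 1., 0., 1.],
              [0., 0., 1., 0.]])
d = W.sum(axis=1)
pi = d / d.sum()                     # stationary distribution pi_i = d_i / sum_ij W_ij

# Eigenpairs of the real symmetric matrix D^{1/2} P D^{-1/2} = D^{-1/2} W D^{-1/2}
S = W / np.sqrt(np.outer(d, d))
lam, Phi = np.linalg.eigh(S)
order = np.argsort(lam)[::-1]        # reorder so that lambda_1 = 1 comes first
lam, Phi = lam[order], Phi[:, order]

def kappa2(i, j, K):
    """Spectral commute-time distance, summing over eigenpairs k = 2 .. K+1."""
    k = np.arange(1, K + 1)          # zero-based indices of eigenpairs 2..K+1
    diff = Phi[i, k] / np.sqrt(pi[i]) - Phi[j, k] / np.sqrt(pi[j])
    return np.sum(diff**2 / (1.0 - lam[k]))

print(kappa2(2, 3, K=3))             # full spectrum: ~8.0, the commute time
```

Note that $\lambda_1 = 1$ is excluded, which is why the sum starts at $k = 2$; for a connected, non-bipartite graph all remaining $\lambda_k$ are strictly below 1, so the divisions are safe.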
Embedding
The initial time series can now be embedded using the mapping $\mathbf{x}_i \longrightarrow \Psi(\mathbf{x}_i)$, i.e., $\mathbb{R}^T \rightarrow \mathbb{R}^K$:
$$\Psi(\mathbf{x}_i) = \frac{1}{\sqrt{\pi_i}} \left( \frac{\phi_2(i)}{\sqrt{1 - \lambda_2}} \ \ldots \ \frac{\phi_k(i)}{\sqrt{1 - \lambda_k}} \ \ldots \ \frac{\phi_{K+1}(i)}{\sqrt{1 - \lambda_{K+1}}} \right)^\top$$
This is strictly equivalent to the commute-time embedding based on the normalized graph Laplacian (see Lecture #3).
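The mapping can be sketched in NumPy; by construction, squared Euclidean distances between embedded points reproduce the (truncated) commute-time distance. The 4-node graph is an assumed stand-in for the voxel graph:

```python
import numpy as np

W = np.array([[0., 1., 1., 0.],      # assumed toy graph
              [1., 0., 1., 0.],
              [1., 1., 0., 1.],
              [0., 0., 1., 0.]])
d = W.sum(axis=1)
pi = d / d.sum()

S = W / np.sqrt(np.outer(d, d))      # symmetrized transition matrix
lam, Phi = np.linalg.eigh(S)
order = np.argsort(lam)[::-1]        # lambda_1 = 1 first
lam, Phi = lam[order], Phi[:, order]

K = 3                                # embedding dimension (here the full N - 1)
k = np.arange(1, K + 1)              # eigenpairs 2 .. K+1, zero-based

# Psi(x_i) = (1/sqrt(pi_i)) * (phi_k(i) / sqrt(1 - lambda_k))_{k=2..K+1}
Psi = Phi[:, k] / (np.sqrt(pi)[:, None] * np.sqrt(1.0 - lam[k])[None, :])

# Squared embedding distance = commute-time distance kappa^2
dist2 = np.sum((Psi[2] - Psi[3])**2)
print(dist2)                         # ~8.0: the commute time between nodes 2 and 3
```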
Choosing the dimension
Recall that each voxel in the brain corresponds to a graph node and there is a time series at each voxel:
$$X = \begin{pmatrix} x_1(1) & \ldots & x_1(t) & \ldots & x_1(T) \\ \vdots & & \vdots & & \vdots \\ x_i(1) & \ldots & x_i(t) & \ldots & x_i(T) \\ \vdots & & \vdots & & \vdots \\ x_N(1) & \ldots & x_N(t) & \ldots & x_N(T) \end{pmatrix}$$
Each column $\mathbf{x}(t)$ of this matrix is a scalar function defined over the graph's vertices. It can be decomposed using the eigenvectors:
$$\mathbf{x}(t) = \sum_{k=2}^{K+1} \langle \mathbf{x}(t), \boldsymbol{\phi}_k \rangle \boldsymbol{\phi}_k + \mathbf{r}(t)$$
Choosing the dimension
This can be written as:
$$\hat{\mathbf{x}}(t) = \sum_{k=2}^{K+1} \langle \mathbf{x}(t), \boldsymbol{\phi}_k \rangle \boldsymbol{\phi}_k = \sum_{k=2}^{K+1} x_k(t) \boldsymbol{\phi}_k$$
Therefore, each entry $i$ (at each brain location or graph vertex) of this approximated vector is:
$$\hat{x}_i(t) = \sum_{k=2}^{K+1} x_k(t) \phi_k(i)$$
The discrepancy between the initial observations and their approximate representation is:
$$\varepsilon_i(K) = \frac{\sum_{t=1}^{T} (x_i(t) - \hat{x}_i(t))^2}{\sum_{t=1}^{T} x_i^2(t)}$$
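A sketch of this error measure, assuming the $\boldsymbol{\phi}_k$ are an orthonormal eigenvector basis (here obtained from a random symmetric matrix standing in for $D^{-1/2} W D^{-1/2}$, with random data standing in for the fMRI time series):

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 30, 10
X = rng.normal(size=(N, T))          # rows: time series x_i; columns: x(t) over vertices

# Assumed orthonormal basis phi_k; in practice it comes from the voxel graph
M = rng.normal(size=(N, N))
M = (M + M.T) / 2
lam, Phi = np.linalg.eigh(M)
Phi = Phi[:, np.argsort(lam)[::-1]]  # largest eigenvalue first

def eps(K):
    """Relative residual energy eps_i(K) for every vertex i."""
    ks = np.arange(1, K + 1)                  # eigenvectors 2 .. K+1 (zero-based)
    Xhat = Phi[:, ks] @ (Phi[:, ks].T @ X)    # project each column x(t) onto the basis
    return ((X - Xhat)**2).sum(axis=1) / (X**2).sum(axis=1)

print(eps(5).shape)                  # one relative error per voxel
```

As $K$ grows, the total residual energy $\sum_i \sum_t (x_i(t) - \hat{x}_i(t))^2$ shrinks monotonically, since each added eigenvector removes an orthogonal component of every column.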
Choosing the dimension
The authors suggest averaging $\varepsilon_i(K)$ over the voxels lying in a "functional" region, and taking the maximum over all these average values.
It is not clear what constitutes a functional region, since this is precisely what is being searched for, nor why the average is chosen.
Segmentation using K-means
The authors notice that the embedding has a star-like shape.
Segmentation using K-means
The "arms" correspond to:
activated time series, or
strong physiological artifacts.
The center blob corresponds to "background activity".
The embedded data are projected onto a hyper-sphere of dimension K.
The background is spread over the sphere.
The K-means algorithm is applied to the spherical data.
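This segmentation step can be sketched with a minimal Lloyd's K-means on the sphere-projected embedding. The embedded points below are random stand-ins; the real inputs would be the $\Psi(\mathbf{x}_i)$:

```python
import numpy as np

rng = np.random.default_rng(1)
N, K = 200, 5
Y = rng.normal(size=(N, K))                  # stand-in for embedded points Psi(x_i)

# Project onto the unit hyper-sphere in R^K
Ysph = Y / np.linalg.norm(Y, axis=1, keepdims=True)

def kmeans(Z, n_clusters, n_iter=50, seed=0):
    """Plain Lloyd's algorithm on the rows of Z."""
    r = np.random.default_rng(seed)
    centers = Z[r.choice(len(Z), n_clusters, replace=False)]
    for _ in range(n_iter):
        d2 = ((Z[:, None, :] - centers[None, :, :])**2).sum(-1)
        labels = d2.argmin(axis=1)           # assign each point to nearest center
        for c in range(n_clusters):
            if np.any(labels == c):
                centers[c] = Z[labels == c].mean(axis=0)
    return labels

labels = kmeans(Ysph, n_clusters=4)
print(np.unique(labels))                     # cluster indices found
```

Normalizing to the sphere discards the radial coordinate, which is what spreads the central "background" blob out among the arms before clustering.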
Event related dataset
Study of age-related changes in functional anatomy
2050 time series.
Spectral embedding/clustering results
Results obtained with PCA and ISOMAP
Activation maps
The Google PageRank Algorithm
A web page is important if many other pages point to it.
$L_{ij} = 1$ if page $j$ points to page $i$, and zero otherwise.
$c_j = \sum_i L_{ij}$ is the number of pages pointed to by page $j$ (its number of outlinks).
The PageRanks $p_i$ are defined by the recursive relationship:
$$p_i = (1 - d) + d \sum_j \left( \frac{L_{ij}}{c_j} \right) p_j$$
with $d = 0.85$.
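The recursion can be iterated directly to a fixed point. A minimal sketch on an assumed 4-page link graph (every page has at least one outlink, so no dangling-node correction is needed):

```python
import numpy as np

# Assumed links: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0, 3 -> 0
# L[i, j] = 1 if page j points to page i
L = np.zeros((4, 4))
L[1, 0] = L[2, 0] = 1     # page 0's outlinks
L[2, 1] = 1               # page 1's outlink
L[0, 2] = 1               # page 2's outlink
L[0, 3] = 1               # page 3's outlink
c = L.sum(axis=0)         # outlink counts c_j
d = 0.85

p = np.ones(4)            # initial PageRanks
for _ in range(200):
    p = (1 - d) + d * (L / c) @ p   # p_i = (1 - d) + d * sum_j (L_ij / c_j) p_j

print(np.round(p, 3))
```

Since every column of $L/c$ sums to one, the fixed point satisfies $\sum_i p_i = N$, and the contraction factor $d < 1$ guarantees convergence.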
In matrix form
$$A = \frac{1 - d}{N} \mathbf{e}\mathbf{e}^\top + d L D_c^{-1}$$
$A$ has positive entries and each column sums to one: a column-stochastic matrix.
The algorithm becomes:
$$\mathbf{p} = A\mathbf{p}$$
with $\mathbf{e} = (1 \ldots 1)^\top$ and $D_c = \text{Diag}(c_j)$.
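The matrix form can be checked numerically with power iteration. Because $A$ is column stochastic, $\mathbf{p} \leftarrow A\mathbf{p}$ preserves $\sum_i p_i$; starting from $\mathbf{p} = \mathbf{e}$ keeps the sum at $N$, matching the scalar recursion. The link matrix below is an assumed example:

```python
import numpy as np

# Assumed link matrix: L[i, j] = 1 if page j points to page i (no dangling pages)
L = np.zeros((4, 4))
L[1, 0] = L[2, 0] = 1
L[2, 1] = 1
L[0, 2] = 1
L[0, 3] = 1
N, d = 4, 0.85
e = np.ones(N)
Dc_inv = np.diag(1.0 / L.sum(axis=0))            # D_c^{-1}: inverse outlink counts

A = (1 - d) / N * np.outer(e, e) + d * L @ Dc_inv
print(A.sum(axis=0))                             # each column sums to 1

# Power iteration: p <- A p converges to the stationary PageRank vector
p = e.copy()
for _ in range(200):
    p = A @ p
print(np.allclose(A @ p, p))                     # -> True
```

The uniform $(1-d)/N \, \mathbf{e}\mathbf{e}^\top$ term makes $A$ strictly positive, so by the Perron-Frobenius theorem the eigenvalue 1 is simple and power iteration converges from any positive start.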
Markov chain interpretation
PageRank as a model of user behavior: a random web surfer clicks on links at random, without regard to their content.
The surfer thus performs a random walk on the web, choosing among the available outgoing links uniformly at random.
The maximal (equal to 1) eigenvalue-eigenvector pair of $A$ corresponds to the stationary distribution of an irreducible, aperiodic Markov chain over the $N$ web pages.