Data Analysis and Manifold Learning
Lecture 12: Some Examples of Manifold Learning Applications
Radu Horaud
INRIA Grenoble Rhone-Alpes, France
[email protected]
http://perception.inrialpes.fr/
Radu Horaud Data Analysis and Manifold Learning; Lecture 12
Outline of Lecture 12
Spectral embedding of a graph applied to fMRI data analysis
The Google PageRank algorithm
Material for this lecture
X. Shen and F. Meyer. Low-Dimensional Embedding of fMRI Datasets. NeuroImage 41 (2008). http://ecee.colorado.edu/~fmeyer/Pub/neuroimage08.pdf
T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. Springer (2009). The Google PageRank algorithm is briefly described on pages 576–578.
Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. The PageRank Citation Ranking: Bringing Order to the Web. Technical Report. Stanford University (1999). http://ilpubs.stanford.edu:8090/422/
Low-dimensional embedding of fMRI
fMRI provides large-scale measurements of neuronal activity.
fMRI data
Each voxel $v_i$ generates a time series:
$$\mathbf{x}_i = \left( x_i(1) \ \ldots \ x_i(T) \right)^\top \in \mathbb{R}^T$$
A network of functionally correlated voxels
A connectivity graph is formed with the standard approach:
$$W_{ij} = \begin{cases} \exp\left(-\|\mathbf{x}_i - \mathbf{x}_j\|^2 / \sigma^2\right) & \text{if } v_i \sim v_j \\ 0 & \text{otherwise} \end{cases}$$
The Euclidean distance between time series:
$$\|\mathbf{x}_i - \mathbf{x}_j\|^2 = \sum_{t=1}^{T} (x_i(t) - x_j(t))^2$$
Choice for $\sigma$: $\sigma = 2 \min_{i<j} \|\mathbf{x}_i - \mathbf{x}_j\|$
Choice for nn (number of nearest neighbors): user defined, varying from 7 to 100.
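This graph construction can be sketched in NumPy. The voxel time series below are random stand-ins, and `nn` plays the role of the user-defined neighbor count:

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 50, 20                  # stand-in sizes: N voxels, T time samples
X = rng.normal(size=(N, T))    # rows are voxel time series x_i in R^T

# Pairwise squared Euclidean distances between time series
D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)

# sigma = 2 * min_{i<j} ||x_i - x_j||  (the choice above)
iu = np.triu_indices(N, k=1)
sigma = 2.0 * np.sqrt(D2[iu].min())

# Symmetric nn-nearest-neighbor adjacency with a Gaussian kernel
nn = 7
W = np.zeros((N, N))
for i in range(N):
    neighbors = np.argsort(D2[i])[1:nn + 1]   # skip i itself
    W[i, neighbors] = np.exp(-D2[i, neighbors] / sigma**2)
W = np.maximum(W, W.T)                        # symmetrize: v_i ~ v_j
```

The symmetrization step makes the relation $v_i \sim v_j$ mutual, so $W$ is a valid weighted adjacency matrix.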
Matrices
Diagonal degree matrix: $D(i,i) = \sum_j W_{i,j}$
Transition matrix: $P = D^{-1} W$
Transition probabilities:
$$P_{i,j} = P(i,j) = \frac{W_{i,j}}{\sum_j W_{i,j}}$$
It is a row-stochastic matrix: $\sum_j P_{i,j} = 1$
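These matrices are a one-liner each in NumPy. The small adjacency matrix below is an assumed example; any symmetric nonnegative matrix without isolated vertices works the same way:

```python
import numpy as np

# Assumed toy adjacency matrix (symmetric, nonnegative, no isolated node)
W = np.array([[0., 1., 1., 0.],
              [1., 0., 1., 0.],
              [1., 1., 0., 1.],
              [0., 0., 1., 0.]])

d = W.sum(axis=1)              # degrees: D(i,i) = sum_j W_ij
P = np.diag(1.0 / d) @ W       # transition matrix P = D^{-1} W

# Each row of P sums to one: P is row-stochastic
print(P.sum(axis=1))           # -> [1. 1. 1. 1.]
```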
Random walk and commute-time
Consider a random walk on the graph, denoted by $Z_n$: if the walk is at $v_i$, it jumps to one of its neighbors $v_j$ with probability $P_{i,j}$.
If $v_i$ and $v_j$ are in the same functional area and $v_i$ and $v_k$ are in different functional areas, we expect that $P_{i,j} \gg P_{i,k}$.
The average hitting time measures the number of steps it takes for a random walk starting at $v_i$ to hit $v_j$ for the first time:
$$H(v_i, v_j) = \mathbb{E}_i[T_j] \quad \text{with} \quad T_j = \min\{n \geq 0;\ Z_n = j\}$$
The hitting time is not symmetric, so the commute time is used instead:
$$\kappa(v_i, v_j) = H(v_i, v_j) + H(v_j, v_i) = \mathbb{E}_i[T_j] + \mathbb{E}_j[T_i]$$
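On a small graph, hitting times can be computed exactly by solving a linear system: for a fixed target $v_j$, $H(v_i, v_j) = 1 + \sum_k P_{i,k} H(v_k, v_j)$ for $i \neq j$, with $H(v_j, v_j) = 0$. A sketch, where the 4-node graph (a triangle with a pendant node) is an assumed example:

```python
import numpy as np

# Assumed toy graph: triangle 0-1-2 with a pendant node 3 attached to 2
W = np.array([[0., 1., 1., 0.],
              [1., 0., 1., 0.],
              [1., 1., 0., 1.],
              [0., 0., 1., 0.]])
P = W / W.sum(axis=1, keepdims=True)
N = len(W)

def hitting_times(j):
    """Expected hitting times H(v_i, v_j), solving (I - P) h = 1 off-target."""
    idx = [i for i in range(N) if i != j]
    A = np.eye(N - 1) - P[np.ix_(idx, idx)]
    h = np.zeros(N)
    h[idx] = np.linalg.solve(A, np.ones(N - 1))
    return h

H = np.column_stack([hitting_times(j) for j in range(N)])  # H[i, j] = H(v_i, v_j)
kappa = H + H.T                                            # commute times

print(H[3, 2], H[2, 3], kappa[2, 3])   # -> 1.0 7.0 8.0
```

The asymmetry is visible directly: from the pendant node 3 the walk hits its neighbor 2 in one step, while the reverse trip takes 7 steps on average; the commute time symmetrizes the two.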
The commute time distance in the spectral domain
Let $(\lambda_k, \boldsymbol{\phi}_k)_{k=1}^{N}$ be the eigenvalue-eigenvector pairs of the matrix $P$, which can be easily computed because $D^{1/2} P D^{-1/2}$ is a real symmetric matrix. Moreover (see Lecture #3) we have:
$$-1 \leq \lambda_N \leq \ldots \leq \lambda_k \leq \ldots \leq \lambda_1 = 1$$
The commute time distance is:
$$\kappa(\mathbf{x}_i, \mathbf{x}_j)^2 = \sum_{k=2}^{K+1} \frac{1}{1 - \lambda_k} \left( \frac{\phi_k(i)}{\sqrt{\pi_i}} - \frac{\phi_k(j)}{\sqrt{\pi_j}} \right)^2$$
$\boldsymbol{\pi} = (\pi_1 \ldots \pi_i \ldots \pi_N)^\top$ is the eigenvector satisfying $P^\top \boldsymbol{\pi} = \boldsymbol{\pi}$, with:
$$\pi_i = \frac{d_i}{\sum_{i,j} W_{i,j}}$$
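This spectral formula can be checked numerically. Taking the $\phi_k$ as the orthonormal eigenvectors of the symmetric matrix $D^{1/2} P D^{-1/2} = D^{-1/2} W D^{-1/2}$ and summing over all $N-1$ nontrivial eigenpairs (i.e., $K = N - 1$), the formula reproduces the exact commute time. A sketch on an assumed 4-node graph (triangle plus a pendant node, whose commute time across the pendant edge is 8 steps):

```python
import numpy as np

W = np.array([[0., 1., 1., 0.],      # assumed toy graph
              [1., 0., 1., 0.],
              [1., 1., 0., 1.],
              [0., 0., 1., 0.]])
d = W.sum(axis=1)
pi = d / d.sum()                     # stationary distribution pi_i = d_i / sum_ij W_ij

# Eigenpairs of the real symmetric matrix D^{1/2} P D^{-1/2} = D^{-1/2} W D^{-1/2}
S = W / np.sqrt(np.outer(d, d))
lam, Phi = np.linalg.eigh(S)
order = np.argsort(lam)[::-1]        # reorder so that lambda_1 = 1 comes first
lam, Phi = lam[order], Phi[:, order]

def kappa2(i, j, K):
    """Spectral commute-time distance, summing over eigenpairs k = 2 .. K+1."""
    k = np.arange(1, K + 1)          # zero-based indices of eigenpairs 2..K+1
    diff = Phi[i, k] / np.sqrt(pi[i]) - Phi[j, k] / np.sqrt(pi[j])
    return np.sum(diff**2 / (1.0 - lam[k]))

print(kappa2(2, 3, K=3))             # full spectrum: ~8.0, the commute time
```

Note that $\lambda_1 = 1$ is excluded, which is why the sum starts at $k = 2$; for a connected, non-bipartite graph all remaining $\lambda_k$ are strictly below 1, so the divisions are safe.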
Embedding
The initial time series can now be embedded using the mapping $\mathbf{x}_i \longrightarrow \Psi(\mathbf{x}_i)$, i.e., $\mathbb{R}^T \rightarrow \mathbb{R}^K$:
$$\Psi(\mathbf{x}_i) = \frac{1}{\sqrt{\pi_i}} \left( \frac{\phi_2(i)}{\sqrt{1 - \lambda_2}} \ \ldots \ \frac{\phi_k(i)}{\sqrt{1 - \lambda_k}} \ \ldots \ \frac{\phi_{K+1}(i)}{\sqrt{1 - \lambda_{K+1}}} \right)^\top$$
This is strictly equivalent to the commute-time embedding based on the normalized graph Laplacian (see Lecture #3).
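The mapping can be sketched in NumPy; by construction, squared Euclidean distances between embedded points reproduce the (truncated) commute-time distance. The 4-node graph is an assumed stand-in for the voxel graph:

```python
import numpy as np

W = np.array([[0., 1., 1., 0.],      # assumed toy graph
              [1., 0., 1., 0.],
              [1., 1., 0., 1.],
              [0., 0., 1., 0.]])
d = W.sum(axis=1)
pi = d / d.sum()

S = W / np.sqrt(np.outer(d, d))      # symmetrized transition matrix
lam, Phi = np.linalg.eigh(S)
order = np.argsort(lam)[::-1]        # lambda_1 = 1 first
lam, Phi = lam[order], Phi[:, order]

K = 3                                # embedding dimension (here the full N - 1)
k = np.arange(1, K + 1)              # eigenpairs 2 .. K+1, zero-based

# Psi(x_i) = (1/sqrt(pi_i)) * (phi_k(i) / sqrt(1 - lambda_k))_{k=2..K+1}
Psi = Phi[:, k] / (np.sqrt(pi)[:, None] * np.sqrt(1.0 - lam[k])[None, :])

# Squared embedding distance = commute-time distance kappa^2
dist2 = np.sum((Psi[2] - Psi[3])**2)
print(dist2)                         # ~8.0: the commute time between nodes 2 and 3
```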
Choosing the dimension
Recall that each voxel in the brain corresponds to a graph node and there is a time series at each voxel:
$$X = \begin{pmatrix} x_1(1) & \ldots & x_1(t) & \ldots & x_1(T) \\ \vdots & & \vdots & & \vdots \\ x_i(1) & \ldots & x_i(t) & \ldots & x_i(T) \\ \vdots & & \vdots & & \vdots \\ x_N(1) & \ldots & x_N(t) & \ldots & x_N(T) \end{pmatrix}$$
Each column $\mathbf{x}(t)$ of this matrix is a scalar function defined over the graph's vertices. It can be decomposed using the eigenvectors:
$$\mathbf{x}(t) = \sum_{k=2}^{K+1} \langle \mathbf{x}(t), \boldsymbol{\phi}_k \rangle \boldsymbol{\phi}_k + \mathbf{r}(t)$$
Choosing the dimension
This can be written as:
$$\hat{\mathbf{x}}(t) = \sum_{k=2}^{K+1} \langle \mathbf{x}(t), \boldsymbol{\phi}_k \rangle \boldsymbol{\phi}_k = \sum_{k=2}^{K+1} x_k(t) \boldsymbol{\phi}_k$$
Therefore, each entry $i$ (at each brain location or graph vertex) of this approximated vector is:
$$\hat{x}_i(t) = \sum_{k=2}^{K+1} x_k(t) \phi_k(i)$$
The discrepancy between the initial observations and their approximate representation is:
$$\varepsilon_i(K) = \frac{\sum_{t=1}^{T} (x_i(t) - \hat{x}_i(t))^2}{\sum_{t=1}^{T} x_i^2(t)}$$
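A sketch of this error measure, assuming the $\boldsymbol{\phi}_k$ are an orthonormal eigenvector basis (here obtained from a random symmetric matrix standing in for $D^{-1/2} W D^{-1/2}$, with random data standing in for the fMRI time series):

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 30, 10
X = rng.normal(size=(N, T))          # rows: time series x_i; columns: x(t) over vertices

# Assumed orthonormal basis phi_k; in practice it comes from the voxel graph
M = rng.normal(size=(N, N))
M = (M + M.T) / 2
lam, Phi = np.linalg.eigh(M)
Phi = Phi[:, np.argsort(lam)[::-1]]  # largest eigenvalue first

def eps(K):
    """Relative residual energy eps_i(K) for every vertex i."""
    ks = np.arange(1, K + 1)                  # eigenvectors 2 .. K+1 (zero-based)
    Xhat = Phi[:, ks] @ (Phi[:, ks].T @ X)    # project each column x(t) onto the basis
    return ((X - Xhat)**2).sum(axis=1) / (X**2).sum(axis=1)

print(eps(5).shape)                  # one relative error per voxel
```

As $K$ grows, the total residual energy $\sum_i \sum_t (x_i(t) - \hat{x}_i(t))^2$ shrinks monotonically, since each added eigenvector removes an orthogonal component of every column.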
Choosing the dimension
The authors suggest averaging $\varepsilon_i(K)$ over the voxels lying in a "functional" region, and taking the maximum over all these average values.
It is not clear what constitutes a functional region, since this is precisely what is being searched for, nor why the average is chosen.
Segmentation using K-means
The authors notice that the embedding has a star-like shape.
Segmentation using K-means
The "arms" correspond to:
activated time series, or
strong physiological artifacts.
The center blob corresponds to "background activity".
The embedded data are projected onto a hyper-sphere of dimension K.
The background is spread over the sphere.
The K-means algorithm is applied to the spherical data.
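This segmentation step can be sketched with a minimal Lloyd's K-means on the sphere-projected embedding. The embedded points below are random stand-ins; the real inputs would be the $\Psi(\mathbf{x}_i)$:

```python
import numpy as np

rng = np.random.default_rng(1)
N, K = 200, 5
Y = rng.normal(size=(N, K))                  # stand-in for embedded points Psi(x_i)

# Project onto the unit hyper-sphere in R^K
Ysph = Y / np.linalg.norm(Y, axis=1, keepdims=True)

def kmeans(Z, n_clusters, n_iter=50, seed=0):
    """Plain Lloyd's algorithm on the rows of Z."""
    r = np.random.default_rng(seed)
    centers = Z[r.choice(len(Z), n_clusters, replace=False)]
    for _ in range(n_iter):
        d2 = ((Z[:, None, :] - centers[None, :, :])**2).sum(-1)
        labels = d2.argmin(axis=1)           # assign each point to nearest center
        for c in range(n_clusters):
            if np.any(labels == c):
                centers[c] = Z[labels == c].mean(axis=0)
    return labels

labels = kmeans(Ysph, n_clusters=4)
print(np.unique(labels))                     # cluster indices found
```

Normalizing to the sphere discards the radial coordinate, which is what spreads the central "background" blob out among the arms before clustering.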
Event related dataset
Study of age-related changes in functional anatomy
2050 time series.
Spectral embedding/clustering results
Results obtained with PCA and ISOMAP
Activation maps
The Google PageRank Algorithm
A web page is important if many other pages point to it.
$L_{ij} = 1$ if page $j$ points to page $i$, and zero otherwise.
$c_j = \sum_i L_{ij}$ is the number of pages pointed to by page $j$ (its number of outlinks).
The PageRanks $p_i$ are defined by the recursive relationship:
$$p_i = (1 - d) + d \sum_j \left( \frac{L_{ij}}{c_j} \right) p_j$$
with $d = 0.85$.
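The recursion can be iterated directly to a fixed point. A minimal sketch on an assumed 4-page link graph (every page has at least one outlink, so no dangling-node correction is needed):

```python
import numpy as np

# Assumed links: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0, 3 -> 0
# L[i, j] = 1 if page j points to page i
L = np.zeros((4, 4))
L[1, 0] = L[2, 0] = 1     # page 0's outlinks
L[2, 1] = 1               # page 1's outlink
L[0, 2] = 1               # page 2's outlink
L[0, 3] = 1               # page 3's outlink
c = L.sum(axis=0)         # outlink counts c_j
d = 0.85

p = np.ones(4)            # initial PageRanks
for _ in range(200):
    p = (1 - d) + d * (L / c) @ p   # p_i = (1 - d) + d * sum_j (L_ij / c_j) p_j

print(np.round(p, 3))
```

Since every column of $L/c$ sums to one, the fixed point satisfies $\sum_i p_i = N$, and the contraction factor $d < 1$ guarantees convergence.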
In matrix form
$$A = \frac{1 - d}{N} \mathbf{e}\mathbf{e}^\top + d L D_c^{-1}$$
$A$ has positive entries and each column sums to one: a column-stochastic matrix.
The algorithm becomes:
$$\mathbf{p} = A\mathbf{p}$$
with $\mathbf{e} = (1 \ldots 1)^\top$ and $D_c = \text{Diag}(c_j)$.
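The matrix form can be checked numerically with power iteration. Because $A$ is column stochastic, $\mathbf{p} \leftarrow A\mathbf{p}$ preserves $\sum_i p_i$; starting from $\mathbf{p} = \mathbf{e}$ keeps the sum at $N$, matching the scalar recursion. The link matrix below is an assumed example:

```python
import numpy as np

# Assumed link matrix: L[i, j] = 1 if page j points to page i (no dangling pages)
L = np.zeros((4, 4))
L[1, 0] = L[2, 0] = 1
L[2, 1] = 1
L[0, 2] = 1
L[0, 3] = 1
N, d = 4, 0.85
e = np.ones(N)
Dc_inv = np.diag(1.0 / L.sum(axis=0))            # D_c^{-1}: inverse outlink counts

A = (1 - d) / N * np.outer(e, e) + d * L @ Dc_inv
print(A.sum(axis=0))                             # each column sums to 1

# Power iteration: p <- A p converges to the stationary PageRank vector
p = e.copy()
for _ in range(200):
    p = A @ p
print(np.allclose(A @ p, p))                     # -> True
```

The uniform $(1-d)/N \, \mathbf{e}\mathbf{e}^\top$ term makes $A$ strictly positive, so by the Perron-Frobenius theorem the eigenvalue 1 is simple and power iteration converges from any positive start.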
Markov chain interpretation
PageRank as a model of user behavior: a random web surfer clicks on links at random, without regard to their content.
The surfer thus performs a random walk on the web, choosing among the available outgoing links uniformly at random.
The maximal (equal to 1) eigenvalue-eigenvector pair of $A$ corresponds to the stationary distribution of an irreducible, aperiodic Markov chain over the $N$ web pages.