www.manchester.ac.uk IJCNN, August 2007 [email protected]
Connection between Self-Organising Maps and Multidimensional Scaling
Hujun Yin, The University of Manchester
Dimensionality Reduction and Data Visualisation
Self-Organising Maps and Multidimensional Scaling
PCA, MDS, Principal Curve/Surface
ISOMAP, LLE, Kernel PCA, Eigenmap
SOM & Data Visualisation
ViSOM & Principal Curve/Surface, MDS
Examples of SOM, ViSOM, MDS, Isomap, etc.
Conclusions
1. PCA, MDS & Principal Curve/Surface
PCA: a linear coordinate transformation
$\min \sum \big\| \mathbf{x} - \sum_{j=1}^{m} \mathbf{q}_j \mathbf{q}_j^T \mathbf{x} \big\|^2$
• x: n-dimensional vector, zero-mean
• qj: orthogonal eigenvectors of the covariance $C = E[\mathbf{x}\mathbf{x}^T]$
• m ≤ n
° To reduce the dimensionality of the data set
° To identify new "meaningful" (hidden) variables
PCA decomposition
$|C - \lambda_i I| = 0$, $(C - \lambda_i I)\,\mathbf{q}_i = 0$, $E[XX^T] = Q \Lambda Q^T$
• Q = [q1, q2, …, qn]
• Λ = diag[λ1, λ2, …, λn]
• λ1 ≥ λ2 ≥ … ≥ λn: eigenvalues or variances
Equivalently, each direction maximises variance: $\max \mathbf{q}_i^T C \mathbf{q}_i = \sigma_i^2$, subject to $\mathbf{q}_i \perp \mathbf{q}_j$, $i \neq j$
+ simple, direct visualisation; stable (fast) solution
− linear mapping; batch operation
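As a quick illustration of the decomposition above, a minimal PCA sketch in NumPy (function and variable names are my own, not from the talk):

```python
import numpy as np

# Minimal PCA sketch: eigendecomposition of the covariance C = E[x x^T],
# keeping the m leading eigenvectors and projecting onto them.
def pca_project(X, m):
    X = X - X.mean(axis=0)                   # zero-mean, as assumed above
    C = (X.T @ X) / len(X)                   # sample covariance
    eigvals, Q = np.linalg.eigh(C)           # eigh: symmetric eigenproblem
    order = np.argsort(eigvals)[::-1][:m]    # sort variances descending
    return X @ Q[:, order], eigvals[order]

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3)) * np.array([5.0, 1.0, 0.1])  # anisotropic cloud
Y, variances = pca_project(X, 2)
```

The variance of each projected coordinate equals the corresponding eigenvalue, which is the variance-maximisation view of PCA stated above.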
1. PCA, MDS & Principal Curve/Surface
MDS (Multidimensional Scaling)
• δij: dissimilarity between points i and j in the original space
• dij: inter-point distance in the original space
• Dij: inter-point distance in the projected plot
• Classical MDS: Dij ≈ δij
• Metric MDS: Dij ≈ f(δij) = dij
• Nonmetric MDS: Dij ≤ Dkl whenever δij ≤ δkl, ∀ i, j, k, l (only rank order is preserved)
Sammon stress:
$S_{Sammon} = \frac{1}{\sum_{i<j} d_{ij}} \sum_{i<j} \frac{[d_{ij} - D_{ij}]^2}{d_{ij}}$
Metric (least-squares) stress:
$S = \frac{1}{\sum_{i<j} d_{ij}^2} \sum_{i<j} [d_{ij} - D_{ij}]^2$
+ nonlinear, direct visualisation; stable solution
− point-point mapping (no function); computationally intensive
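The Sammon stress above can be minimised by steepest descent; a minimal sketch (the step size, iteration count and the deterministic 2-D initialisation are my own illustrative choices, not from the talk):

```python
import numpy as np

def pairwise(X):
    return np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))

def sammon_stress(d, D, c):
    iu = np.triu_indices_from(d, 1)
    return ((d[iu] - D[iu]) ** 2 / d[iu]).sum() / c

def sammon(X, n_iter=100, lr=0.05):
    # Gradient descent on S = (1/c) * sum_{i<j} (d_ij - D_ij)^2 / d_ij,
    # with dS/dY_i = (-2/c) * sum_j [(d_ij - D_ij)/(d_ij D_ij)] (Y_i - Y_j).
    n = len(X)
    d = pairwise(X)
    c = d[np.triu_indices(n, 1)].sum()
    Y = X[:, :2].copy()                      # simple deterministic start
    I = np.eye(n)
    for _ in range(n_iter):
        D = pairwise(Y)
        W = (d - D) / ((d + I) * (D + I))    # dummy 1s keep the diagonal zero
        grad = (-2.0 / c) * (W.sum(1, keepdims=True) * Y - W @ Y)
        Y -= lr * grad
    return Y, sammon_stress(d, pairwise(Y), c)

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 3))
Y, final_stress = sammon(X)
```

The `1/d_ij` weighting is what distinguishes Sammon mapping from plain metric stress: small original distances are emphasised, preserving local structure.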
1. PCA, MDS & Principal Curve/Surface
MDS: Sammon Mapping
[Table: a sample of the Iris data set, four measurements per flower (sepal length, sepal width, petal length, petal width) across the three species]
1. PCA, MDS & Principal Curve/Surface
Principal Curve/Surface (Hastie and Stuetzle, 1989)
A smooth and self-consistent curve passing through the "middle" of the data.
Projection index:
$\rho_f(\mathbf{x}) = \sup \big\{ \rho : \|\mathbf{x} - f(\rho)\| = \inf_{\vartheta} \|\mathbf{x} - f(\vartheta)\| \big\}$
Expectation (self-consistency):
$f(\rho) = E[X \mid \rho_f(X) = \rho]$
Kernel smoothing:
$f(\rho) = \frac{\sum_i^S \kappa(\rho, \rho_i)\, \mathbf{x}_i}{\sum_i^S \kappa(\rho, \rho_i)}$
+ principled nonlinear extension of PCA; smooth mapping function
− lacks a good algorithm, especially in 2-D; boundary problems
2. Isomap, LLE, Eigenmap & Kernel PCA
• Isomap: Tenenbaum, de Silva and Langford (2000), for nonlinear dimensional scaling (dimensionality reduction).
° Construct the neighbourhood graph: by dX(i, j) < ε or k nearest neighbours.
° Compute the shortest (geodesic) paths: $d_G(i,j) = \min\{d_G(i,j),\ d_G(i,k) + d_G(k,j)\}$
° Construct the low-dimensional embedding by applying MDS:
$\min E = \|D_G - D_Y\|^2$
+ nonlinear, direct visualisation; global geometric structure
− point-point mapping (no function); difficult choice of k or ε
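The three steps above can be sketched directly; a minimal illustration (the parameter choices are assumptions, and a production implementation would use Dijkstra rather than Floyd-Warshall for large n):

```python
import numpy as np

# Isomap sketch: kNN graph, geodesic distances by Floyd-Warshall,
# then classical MDS on the geodesic distance matrix.
def isomap(X, k=6, m=2):
    n = len(X)
    d = np.sqrt(((X[:, None] - X[None, :]) ** 2).sum(-1))
    # k-nearest-neighbour graph (symmetrised); non-edges are infinite
    G = np.full((n, n), np.inf)
    nn = np.argsort(d, axis=1)[:, 1:k + 1]
    for i in range(n):
        G[i, nn[i]] = d[i, nn[i]]
    G = np.minimum(G, G.T)
    np.fill_diagonal(G, 0.0)
    # Floyd-Warshall: d_G(i,j) = min(d_G(i,j), d_G(i,kk) + d_G(kk,j))
    for kk in range(n):
        G = np.minimum(G, G[:, kk:kk + 1] + G[kk:kk + 1, :])
    # Classical MDS: double-centre the squared geodesic distances
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (G ** 2) @ J
    vals, vecs = np.linalg.eigh(B)
    idx = np.argsort(vals)[::-1][:m]
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))

t = np.linspace(0, 3, 60)
X = np.c_[np.cos(t), np.sin(t), t]     # points along a helix arc
Y = isomap(X, k=6, m=2)
```

If k or ε is too small the graph disconnects (infinite geodesics); too large and "short-circuit" edges destroy the manifold structure, which is the difficult choice noted above.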
2. Isomap, LLE, Eigenmap & Kernel PCA
• LLE (Locally Linear Embedding): Roweis and Saul (2000), for nonlinear dimensionality reduction.
° Select the neighbourhood graph: k nearest neighbours.
° Reconstruct linear weights:
$\min \varepsilon(W) = \sum_i \big\| X_i - \sum_j W_{ij} X_j \big\|^2$
° Compute embedding coordinates Y:
$\min \Phi(Y) = \sum_i \big\| Y_i - \sum_j W_{ij} Y_j \big\|^2$
+ nonlinear, direct visualisation; stable, single-pass solution
− point-point mapping (no function); computationally intensive
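The two LLE steps above can be sketched as follows; the small regulariser `reg` on the local covariance is a standard implementation detail, not from the slides:

```python
import numpy as np

# LLE sketch: reconstruct each point from its k nearest neighbours with
# weights summing to one, then find coordinates Y minimising the same
# weighted reconstruction error.
def lle(X, k=10, m=2, reg=1e-3):
    n = len(X)
    d = ((X[:, None] - X[None, :]) ** 2).sum(-1)
    nn = np.argsort(d, axis=1)[:, 1:k + 1]
    W = np.zeros((n, n))
    for i in range(n):
        Z = X[nn[i]] - X[i]                   # neighbours centred on x_i
        C = Z @ Z.T
        C += reg * np.trace(C) * np.eye(k)    # regularise the local Gram matrix
        w = np.linalg.solve(C, np.ones(k))
        W[i, nn[i]] = w / w.sum()             # weights sum to one
    # Embedding: bottom eigenvectors of M = (I - W)^T (I - W),
    # skipping the constant (zero-eigenvalue) eigenvector.
    M = (np.eye(n) - W).T @ (np.eye(n) - W)
    vals, vecs = np.linalg.eigh(M)
    return vecs[:, 1:m + 1]

rng = np.random.default_rng(2)
X = rng.normal(size=(60, 3))
Y = lle(X, k=10, m=2)
```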
2. Isomap, LLE, Eigenmap & Kernel PCA
• Laplacian Eigenmap: Belkin and Niyogi (2003), for dimensionality reduction.
° Select the neighbourhood graph: k nearest neighbours or the ε rule.
° Construct the weightings (heat kernel):
$W_{ij} = e^{-\|X_i - X_j\|^2 / t}$
° Eigenmap: compute the embedding by solving
$L\mathbf{f} = \lambda D\mathbf{f}$, where $D_{ii} = \sum_j W_{ji}$ and $L = D - W$
$X_i \rightarrow (f_1(i), f_2(i), \ldots, f_m(i))$
Objective function: $\min \sum_{ij} (Y_i - Y_j)^2\, W_{ij}$
Spectral Clustering– Weiss 1999; Ng, Jordan and Weiss 2002
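The eigenmap steps above can be sketched as follows; solving $L\mathbf{f} = \lambda D\mathbf{f}$ through the symmetric normalisation $D^{-1/2} L D^{-1/2}$ is my implementation choice, and the parameters `k`, `t`, `m` are illustrative:

```python
import numpy as np

# Laplacian eigenmap sketch: heat-kernel weights on a kNN graph, then the
# generalised eigenproblem L f = lambda D f via symmetric normalisation.
def laplacian_eigenmap(X, k=8, t=2.0, m=2):
    n = len(X)
    d2 = ((X[:, None] - X[None, :]) ** 2).sum(-1)
    nn = np.argsort(d2, axis=1)[:, 1:k + 1]
    W = np.zeros((n, n))
    for i in range(n):
        W[i, nn[i]] = np.exp(-d2[i, nn[i]] / t)   # heat kernel
    W = np.maximum(W, W.T)                        # symmetrise the graph
    D = W.sum(axis=1)                             # degrees: D_ii = sum_j W_ji
    L = np.diag(D) - W
    Ds = 1.0 / np.sqrt(D)
    Lsym = Ds[:, None] * L * Ds[None, :]          # D^{-1/2} L D^{-1/2}
    vals, vecs = np.linalg.eigh(Lsym)
    F = Ds[:, None] * vecs                        # map back: f = D^{-1/2} u
    return F[:, 1:m + 1]                          # skip the trivial lambda = 0

rng = np.random.default_rng(3)
X = rng.normal(size=(70, 3))
Y = laplacian_eigenmap(X, k=8, t=2.0, m=2)
```

Minimising $\sum_{ij}(Y_i - Y_j)^2 W_{ij}$ keeps strongly connected points close together, which is exactly what the bottom non-trivial eigenvectors achieve.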
2. Isomap, LLE, Eigenmap & Kernel PCA
• Kernel PCA: Schölkopf, Smola and Müller (1998), for nonlinear PCA.
° The kernel method has become popular:
$\phi: X \rightarrow F$, $\kappa: X \times X \rightarrow \mathbb{R}$, $\mathbf{x} \mapsto \phi(\mathbf{x})$, $\kappa(\mathbf{x}, \mathbf{y}) = \langle \phi(\mathbf{x}), \phi(\mathbf{y}) \rangle$
° PCA:
$C\mathbf{q} = \lambda\mathbf{q}$, $C = \frac{1}{n} \sum_i \mathbf{x}_i \mathbf{x}_i^T$, $\mathbf{q} = \sum_i \alpha_i \mathbf{x}_i$
° Kernel PCA:
$K\boldsymbol{\alpha} = \lambda\boldsymbol{\alpha}$, $K_{ij} = \langle \phi(\mathbf{x}_i), \phi(\mathbf{x}_j) \rangle$, $\boldsymbol{\alpha} = [\alpha_1, \alpha_2, \ldots, \alpha_n]^T$
$\mathbf{q} = \sum_i \alpha_i \phi(\mathbf{x}_i)$, projection: $\langle \mathbf{q}, \phi(\mathbf{x}_k) \rangle = \sum_i \alpha_i \kappa(\mathbf{x}_i, \mathbf{x}_k)$
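A minimal kernel PCA sketch of the equations above, using a Gaussian kernel; the width `gamma` and the feature-space centring step are illustrative implementation choices:

```python
import numpy as np

# Kernel PCA sketch: solve K alpha = lambda alpha on the centred kernel
# matrix and project training points via sum_i alpha_i k(x_i, x).
def kernel_pca(X, m=2, gamma=0.5):
    n = len(X)
    d2 = ((X[:, None] - X[None, :]) ** 2).sum(-1)
    K = np.exp(-gamma * d2)                 # Gaussian kernel matrix
    J = np.eye(n) - np.ones((n, n)) / n
    Kc = J @ K @ J                          # centre phi(x) in feature space
    vals, vecs = np.linalg.eigh(Kc)
    idx = np.argsort(vals)[::-1][:m]
    # normalise alpha so that each feature-space direction q has unit norm
    alphas = vecs[:, idx] / np.sqrt(np.maximum(vals[idx], 1e-12))
    return Kc @ alphas                      # <q, phi(x_k)> for all training x_k

rng = np.random.default_rng(4)
X = rng.normal(size=(40, 3))
Y = kernel_pca(X, m=2, gamma=0.5)
```

As in linear PCA, the projected components come out in order of decreasing variance.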
3. Self-Organising Maps
SOM: Background - lateral inhibition
[Figure: on-centre, off-surround lateral connection profile; from V. Bruce & P.R. Green]
Lateral inhibition explains the Mach-band effect and serves an abstraction purpose (Hartline et al., 1960s).
3. Self-Organising Maps
SOM: Model
Hebbian learning (Hebb 1949): $\Delta w = \alpha x y$
von der Malsburg and Willshaw's model (1973, 1976), with input $\mathbf{x} = [x_1, x_2, \ldots, x_n]^T$:
$\frac{\partial w_{ij}(t)}{\partial t} = \alpha(t)\, y_i(t)\, x_j(t) - \beta(t)\, y_i(t)\, w_{ij}(t) = \begin{cases} \alpha(t)\, y_i(t)\, [x_j(t) - w_{ij}(t)], & \text{if } j \in \eta(t) \\ 0, & \text{if } j \notin \eta(t) \end{cases}$
Kohonen's model (1982) is an abstraction of von der Malsburg and Willshaw's model:
$\frac{\partial y_i(t)}{\partial t} + y_i(t) = \sum_j c_{ij}\, w_{ij}(t)\, x_j(t) + \sum_k e_{ik}\, y_k^*(t) - \sum_{k'} b_{ik'}\, y_{k'}^*(t)$
$\frac{\partial w_{ij}(t)}{\partial t} = \alpha(t)\, x_j(t)\, y_i^*(t)$, subject to $\sum_j w_{ij} = \text{constant}$
$y_j^*(t) = \begin{cases} y_j(t) - \theta, & \text{if } y_j(t) > \theta \\ 0, & \text{otherwise} \end{cases}$
$y_j(t+1) = \varphi\Big( \mathbf{w}_j^T \mathbf{x} + \sum_i h_{ij}\, y_i(t) \Big)$
3. Self-Organising Maps
SOM: Algorithm
• At each time t, present an input x(t) and select the winner:
$v(t) = \arg\min_{c \in \Omega} \| \mathbf{x}(t) - \mathbf{w}_c \|$
• Update the weights of the winner and its neighbours:
$\Delta \mathbf{w}_k(t) = \alpha(t)\, \eta(v, k, t)\, [\mathbf{x}(t) - \mathbf{w}_k(t)]$
• Repeat until the map converges.
Typical neighbourhood function:
$\eta(v, k, t) \propto \exp\Big[ -\frac{\|v - k\|^2}{2\sigma^2(t)} \Big]$
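The algorithm above fits in a short loop; a minimal SOM sketch where the grid size, learning-rate and neighbourhood decay schedules are illustrative choices, not from the talk:

```python
import numpy as np

# Minimal SOM: winner selection by nearest weight, then a Gaussian
# neighbourhood update with decaying alpha and sigma.
def train_som(X, rows=8, cols=8, n_iter=2000, a0=0.5, s0=3.0, seed=0):
    rng = np.random.default_rng(seed)
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], float)
    W = rng.normal(size=(rows * cols, X.shape[1]))
    for t in range(n_iter):
        frac = t / n_iter
        alpha = a0 * (1 - frac)                 # decaying learning rate
        sigma = s0 * np.exp(-3 * frac)          # shrinking neighbourhood
        x = X[rng.integers(len(X))]             # random input presentation
        v = np.argmin(((W - x) ** 2).sum(1))    # winner
        h = np.exp(-((grid - grid[v]) ** 2).sum(1) / (2 * sigma ** 2))
        W += alpha * h[:, None] * (x - W)       # neighbourhood update
    return W

def quantisation_error(X, W):
    return np.mean(np.sqrt(((X[:, None] - W[None]) ** 2).sum(-1)).min(1))

rng = np.random.default_rng(5)
X = rng.random((300, 2))
W0 = train_som(X, n_iter=0)     # initial random weights, for comparison
W = train_som(X, n_iter=2000)
```

Training pulls the randomly initialised weights into the data distribution, so the quantisation error drops well below that of the untrained map.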
3. Self-Organising Maps
SOM: Quantisation, topology & cost function
Topologically “ordered” map
"Error tolerant" coding - HVQ (Luttrell, NC 1994); cost function (Heskes, 1999):
$E(\mathbf{w}_1, \ldots, \mathbf{w}_N) = \sum_i \int_{V_i} \sum_k h_{ik}\, \|\mathbf{x} - \mathbf{w}_k\|^2\, p(\mathbf{x})\, d\mathbf{x}$
"Minimum wiring" (Mitchison, NC 1995; Durbin & Mitchison, Nature 1990)
3. Self-Organising Maps
SOM: Variants & extensions
° HVQ (Luttrell 1989)
° HSOM (Miikkulainen 1990), DISLEX (1990, 1997)
° PSOM (Ritter 1993), Hyperbolic SOM (1999), H2SOM
° Temporal Kohonen Map (Chappell & Taylor 1993)
° Neural Gas (Martinetz et al. 1991), Growing Grid (Fritzke 1995)
° ASSOM (Kohonen 1997)
° Recurrent SOM (Koskela, 1997)
° Bayesian SOM & SOMN (Yin & Allinson 1995, 1997; Utsugi 1997)
° GTM (Bishop et al. 1998)
° GHSOM (Merkl et al. 2000)
° PicSOM (Laaksonen, Oja, et al., 2000)
° ViSOM (Yin 2001, 2002)
3. Self-Organising Maps
SOM: Applications - snapshots
3. Self-Organising Maps
SOM: Applications - snapshots
A Temporal Shape Metric: the Co-Expression Coefficient (Möller-Levet & Yin, Int. J. Neural Systems, 15: 311-322, 2005)
$ce(x, y) = \frac{\int x' y'\, dt}{\sqrt{\int x'^2\, dt \int y'^2\, dt}}$
[Figure: SOM clustering of cell-cycle expression profiles; neurons labelled with phase and peak times, e.g. Neuron 1: S, 37.8 min, 101.6 min; Neuron 8: G2, 43.5 min, 107 min]
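The coefficient above correlates the time derivatives of two expression profiles; a minimal sketch, using `np.gradient` to approximate $x'$ and an explicit trapezoidal rule for the integrals (both are my implementation choices):

```python
import numpy as np

def _integrate(f, t):
    # trapezoidal rule, kept explicit for portability across NumPy versions
    return float(np.sum((f[1:] + f[:-1]) * np.diff(t)) / 2.0)

# Co-expression coefficient: ce(x, y) = int x'y' dt / sqrt(int x'^2 dt * int y'^2 dt)
def coexpression(x, y, t):
    dx, dy = np.gradient(x, t), np.gradient(y, t)
    return _integrate(dx * dy, t) / np.sqrt(_integrate(dx ** 2, t) * _integrate(dy ** 2, t))

t = np.linspace(0.0, 2 * np.pi, 200)
x = np.sin(t)
```

Because only derivatives enter, two profiles with the same temporal shape but different offsets score 1, while inverted profiles score −1.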
Foreign exchange modelling: SOM + local SVM (H. Ni & Yin, 2006)

                           SOM+MLP   HSOM   Neural Gas   GARCH   ARMA
Mean Square Error (e-005)     2.05   4.11         2.65    2.90   2.94
Correct Prediction (%)       73.62  50.55        65.38   51.11   52.2
3. Self-Organising Maps
SOM: Data visualisation & dimensionality reduction
+ topology-preserving mapping; (discrete) mapping function
− not distance-preserving; boundary problems
3. Self-Organising Maps
SOM: Data visualisation & knowledge management
courtesy of S. Kaski and T. Kohonen (1996)
Tree-View SOM (Freeman and Yin, TNN 2005)
4. ViSOM & Principal Curve/Surface, MDS
ViSOM: Visualisation-induced SOM (Yin, 2001, 2002)
Principle:
° To preserve distance/metric (locally) on the map
° To extrapolate smoothly
SOM update:
$\mathbf{w}_k(t+1) = \mathbf{w}_k(t) + \alpha(t)\, \eta(v, k, t)\, [\mathbf{x}(t) - \mathbf{w}_k(t)]$
[Figure: the updating force on a neighbour, Fkx = x(t) - wk(t), decomposes into Fvx = x(t) - wv(t) and a lateral force Fkv = wv(t) - wk(t); regulating the lateral part produces local contraction or expansion of the map]
4. ViSOM & Principal Curve/Surface, MDS
ViSOM: Algorithm
• Grid structure and winner selection are the same as in the SOM.
• Updating:
$\Delta \mathbf{w}_k(t) = \alpha(t)\, \eta(v, k, t) \Big\{ [\mathbf{x}(t) - \mathbf{w}_v(t)] + [\mathbf{w}_v(t) - \mathbf{w}_k(t)]\, \frac{d_{vk} - \Delta_{vk}\lambda}{\Delta_{vk}\lambda} \Big\}$
where $d_{vk} = \|\mathbf{w}_v - \mathbf{w}_k\|$ is the distance in the input space, $\Delta_{vk}$ the distance between v and k on the map grid, and $\lambda$ the resolution parameter.
• Refreshing: at certain iterations (e.g. every 20%), choose a neuron randomly and use its weight as an alternative input.
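The update above can be written as a single vectorised step; a minimal sketch of the reconstructed rule, where `alpha`, `sigma` and the tiny two-node example are illustrative:

```python
import numpy as np

# One ViSOM update: the force on neighbour k splits into (x - w_v) plus a
# lateral term scaled by (d_vk - Delta_vk*lam) / (Delta_vk*lam), so that map
# distances are driven towards lam times the grid distances.
def visom_update(W, grid, x, v, alpha, sigma, lam):
    d_vk = np.sqrt(((W - W[v]) ** 2).sum(1))            # input-space distances
    delta_vk = np.sqrt(((grid - grid[v]) ** 2).sum(1))  # map-grid distances
    eta = np.exp(-delta_vk ** 2 / (2 * sigma ** 2))     # neighbourhood values
    scale = np.zeros(len(W))
    mask = delta_vk > 0                                 # winner itself: no lateral term
    scale[mask] = (d_vk[mask] - delta_vk[mask] * lam) / (delta_vk[mask] * lam)
    dW = alpha * eta[:, None] * ((x - W[v]) + (W[v] - W) * scale[:, None])
    return W + dW

# Two-node example where the map is already at resolution lam = 1,
# so the lateral term vanishes and only the (x - w_v) force acts.
grid = np.array([[0.0, 0.0], [1.0, 0.0]])
W = np.array([[0.0, 0.0], [1.0, 0.0]])
x = np.array([0.5, 0.5])
W_new = visom_update(W, grid, x, v=0, alpha=0.1, sigma=1.0, lam=1.0)
```

When $d_{vk} > \Delta_{vk}\lambda$ the lateral term contracts the pair; when $d_{vk} < \Delta_{vk}\lambda$ it expands them, which is what regularises the map's resolution.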
4. ViSOM & Principal Curve/Surface, MDS
ViSOM: Examples
[Figures: mappings by SOM and ViSOM]
4. ViSOM & Principal Curve/Surface, MDS
ViSOM: Example: Iris dataset
[Figures: Iris data mapped by PCA, Sammon, SOM and ViSOM]
4. ViSOM & Principal Curve/Surface, MDS
ViSOM: Examples - Ranking table of UK universities (source: The Sunday Times, 18 September 2000)
Ranking  University      F1   F2   F3   F4  F5   F6   F7  Total
1        Cambridge       241  182  247  97  88  100   50  1005
2        Oxford          214  175  244  97  81  100   30   941
3        LSE             200  175  233  97  68  100   50   923
4        Imperial        203  154  232  98  67  100   10   864
5        York            206  143  208  94  63   76   60   850
6        UCL             172  152  210  95  71  100   30   830
7        St Andrews      139  131  194  96  73   91  100   824
8        Warwick         153  155  215  97  69   86   20   795
9        Bath            132  142  211  97  66   83   60   791
9        Nottingham      176  125  218  96  74   72   30   791
11       Bristol         145  131  218  96  75   94   20   779
11       Durham          163  132  207  91  64   72   50   779
11       Edinburgh       106  145  218  96  74  100   40   779
14       Lancaster       156  144  186  95  62   63   50   756
15       UMIST           135  144  188  97  58  100   30   752
16       Birmingham      146  127  204  96  67   87   20   747
17       Loughborough    162  115  177  95  57   66   60   732
18       Southampton     143  124  180  93  55   71   50   716
19       King's College  135  126  204  96  63  100  -10   714
20       Newcastle       134  117  193  97  60   87   20   708
21       Manchester      125  134  198  96  66   98  -10   707
22       Leeds           122  127  199  97  61   74   20   700
23       Sheffield       143  125  213  97  61   72  -20   691
24       East Anglia     125  127  176  96  63   60   40   687
24       Leicester       125  120  183  94  52   93   20   687

F1: Research, F2: Teaching, F3: A-levels, F4: Employment, F5: S/S ratio, F6: 1st/2:1s, F7: Dropout rate
4. ViSOM & Principal Curve/Surface, MDS
ViSOM: Examples - League "map"
4. ViSOM & Principal Curve/Surface, MDS
ViSOM: A discrete principal curve/surface (Yin, 2002)
SOM/ViSOM smoothing:
$\mathbf{w}_k = \frac{\sum_i^S h(k, i)\, \mathbf{x}_i}{\sum_i^S h(k, i)}$
SOM: $\|k - i\| \neq \|\mathbf{w}_k - \mathbf{w}_i\|$
ViSOM: $\|k - i\| \approx \|\mathbf{w}_k - \mathbf{w}_i\|$
Principal curve projection:
$\rho_f(\mathbf{x}) = \sup \big\{ \rho : \|\mathbf{x} - f(\rho)\| = \inf_{\vartheta} \|\mathbf{x} - f(\vartheta)\| \big\}$
Expectation:
$f(\rho) = E[X \mid \rho_f(X) = \rho]$
Kernel smoothing:
$f(\rho) = \frac{\sum_i^S \kappa(\rho, \rho_i)\, \mathbf{x}_i}{\sum_i^S \kappa(\rho, \rho_i)}$
4. ViSOM & Principal Curve/Surface, MDS
ViSOM Extensions:
° Resolution enhancement by Local Linear Projection - LLP (Yin, 2003): the input x is projected to x' in the locally linear patch around the winner wv
° PRSOM (Wu & Chow, 2005) = STVQ + ViSOM
° Neural Gas ViSOM (Estévez and Figueroa, 2006)
° Other extensions are readily applicable, e.g. hierarchical, growing, etc.
4. ViSOM & Principal Curve/Surface, MDS
[Figure: generating curve and data, with results of the HS algorithm, SOM and ViSOM]
Yin (2002)
4. ViSOM & Principal Curve/Surface, MDS
Other PC/S algorithms:
• SOM has been related to PC/S and termed a discrete PC/S by Ritter, Martinetz & Schulten in 1992. However, the differences are:
° Projection onto nodes instead of a curve/surface
° Smoothing is governed by indexes in the map space, not the input space
SOM/ViSOM smoothing: $\mathbf{w}_k = \frac{\sum_i^S h(k, i)\, \mathbf{x}_i}{\sum_i^S h(k, i)}$ vs. kernel smoothing: $f(\rho) = \frac{\sum_i^S \kappa(\rho, \rho_i)\, \mathbf{x}_i}{\sum_i^S \kappa(\rho, \rho_i)}$
SOM: $\|k - i\| \neq \|\mathbf{w}_k - \mathbf{w}_i\| = \|\rho - \rho_i\|$; ViSOM: $\|k - i\| \approx \|\mathbf{w}_k - \mathbf{w}_i\|$
More importantly, for the SOM one cannot obtain the curve/surface at any point other than the nodes, even with interpolation. GTM (generative topographic mapping) and PPS (probabilistic principal surfaces) are parametrised SOMs, with GTM using spherical Gaussians and PPS oriented Gaussians for the nodes.
4. ViSOM & Principal Curve/Surface, MDS
SOM, ViSOM & MDS:
• SOM is a qualitative (nonmetric) MDS.
• ViSOM is a metric MDS.
° MDS cost function:
$\min \sum_{i,j} (d_{ij} - D_{ij})^2 = \min \sum_{i,j} (d_{ij}^2 + D_{ij}^2 - 2 d_{ij} D_{ij})$
° SOM's cost function (sample cost of Voronoi region i):
$\min \sum_j \eta(i, j)\, \|\mathbf{x} - \mathbf{w}_j\|^2$
° SOM: $D_{ij} = \|i - j\|^2$
° ViSOM: $D_{ij} = \|\mathbf{w}_i - \mathbf{w}_j\| \cong \|i - j\|$
C Measure (Goodhill & Sejnowski, 1997):
$C = \sum_i \sum_j F(i, j)\, G[M(i), M(j)]$
5. Examples
Iris: [Figures: mappings of the Iris data by ViSOM, Sammon (metric MDS) and SOM]
5. Examples
S-curve: [Figures: the S-curve data set with embeddings by LLE, two-dimensional Isomap (with neighbourhood graph), and the ViSOM embedding and output]
5. Examples
Swissroll: [Figures: the Swiss-roll data set with embeddings by LLE, two-dimensional Isomap (with neighbourhood graph) and ViSOM]
Summary
• Objective functions of many mappings such as MDS, LLE (and eigenmap) and SOM (ViSOM) are closely linked, but a good algorithm or implementation is the key.
• SOM is a useful tool for data clustering, visualisation and management. ViSOM is particularly suited to direct, metric-preserving visualisation.
• ViSOM is a metric scaling, while SOM is a relational or ordinal (nonmetric) scaling.
• SOM-based methods produce a scaling (dimensionality reduction) and also provide a principal manifold.
• The SOM naturally approximates the kernel method.