TRANSCRIPT
The Self-Organizing Map
Jenny Hamer
University of California, San Diego
Paper by Teuvo Kohonen, IEEE 1990
November 8, 2018
Jenny Hamer (UCSD) Self-organizing map November 8, 2018 1 / 20
Overview
1 Competitive learning
2 Learning vector quantization (LVQ) methods: unsupervised vector quantization; supervised LVQ
3 Topological neighborhood
4 Self-organizing map algorithm
5 Applications and results
Competitive learning
Let x ∈ Rn be an input vector. Let d(a, b) be a similarity metric between two vectorial inputs, for example the Euclidean distance between a and b.
1. Initialize a set of “reference” vectors M = {mi | mi ∈ Rn, i = 1, . . . , k} (e.g. randomly)
2. Simultaneously compare x with each mi; compute the similarity between x and mi: d(x, mi)
3. The best-matching mi is declared the “winner”; let mc be this reference vector
4. Tune mc such that d(x, mc) decreases, while all other mi are kept unchanged.
Supposing that the x follow some distribution with probability density function p, then the tuned mi form an approximation of p.
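The four steps above can be sketched in Python with NumPy (an illustrative sketch, not from the paper; the sizes, seed, and gain value are assumptions):

```python
import numpy as np

def competitive_step(x, M, alpha):
    # Step 2: compare x with each m_i via Euclidean distance d(x, m_i)
    dists = np.linalg.norm(M - x, axis=1)
    # Step 3: the best-matching m_i is declared the "winner" m_c
    c = int(np.argmin(dists))
    # Step 4: tune m_c so that d(x, m_c) decreases; all other m_i stay unchanged
    M[c] += alpha * (x - M[c])
    return c

# Step 1: initialize k reference vectors in R^n at random
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 3))  # k = 5 reference vectors, n = 3
x = rng.standard_normal(3)
winner = competitive_step(x, M, alpha=0.5)
```

Because only the winner moves, and it moves toward x, it remains the closest reference vector after the update.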
Competitive learning visualization
The unit uj in the layer F2 is the “winner” for input pattern I: input I’s cluster is determined by this unit.
Unsupervised Vector Quantization: k-means
Goal: learn a finite, discrete approximation M of the continuous probability density function of vectorial data X = {x | x ∈ Rn}.
As in competitive learning, randomly initialize the codebook M. For each input vector x:
1. Assign x to the cluster of its “winner” mc, which satisfies ||x − mc|| = mini ||x − mi|| (in general w.r.t. an Lp norm; we use p = 2, i.e. Euclidean distance)
2. Tune the codebook vectors:
mc(t + 1) = mc(t) + α(t)[x(t) − mc(t)] where mc was the “winner”
mi(t + 1) = mi(t) for all i ≠ c
Originally used in signal processing and approximation:
often more economical to use a batched method: match and assign numerous x(t), then update in a single step.
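The online (per-sample) form of the update can be sketched as follows (a minimal sketch; the linearly decaying gain schedule and all sizes are assumptions, not from the paper):

```python
import numpy as np

def vq_step(x, M, alpha):
    """One online vector-quantization step: move only the winner toward x."""
    c = int(np.argmin(np.linalg.norm(M - x, axis=1)))  # winner w.r.t. the L2 norm
    M[c] += alpha * (x - M[c])  # m_c(t+1) = m_c(t) + alpha(t)[x(t) - m_c(t)]
    return c

rng = np.random.default_rng(1)
M = rng.standard_normal((4, 2))    # codebook of k = 4 vectors in R^2
X = rng.standard_normal((200, 2))  # stream of input vectors
for t, x in enumerate(X):
    alpha = 0.5 * (1.0 - t / len(X))  # a monotonically decreasing gain (assumed schedule)
    vq_step(x, M, alpha)
```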
Supervised LVQ
Suppose some set of the input data x have known class labels: assign several mi ∈ M as representatives for each class (e.g. randomly). The mi closest to x determines x’s class.
1. Assign class of mc to x
2. Update the mi as follows:
If x is correctly classified:
mc(t + 1) = mc(t) + α(t)[x−mc(t)]
If x is incorrectly classified:
mc(t + 1) = mc(t)− α(t)[x−mc(t)]
leaving all other weights unchanged: mi(t + 1) = mi(t) for i ≠ c. Here α(t), 0 < α(t) < 1, is a scalar gain which should decrease monotonically over time.
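The two update cases can be combined into a single signed step (an illustrative sketch; the codebook values and gain are made-up numbers for demonstration):

```python
import numpy as np

def lvq1_step(x, y, M, labels, alpha):
    """One LVQ step: attract the winner if its class matches y, repel it otherwise."""
    c = int(np.argmin(np.linalg.norm(M - x, axis=1)))
    sign = 1.0 if labels[c] == y else -1.0  # +alpha(t) if correctly classified, -alpha(t) if not
    M[c] += sign * alpha * (x - M[c])       # all other m_i are left unchanged
    return c

# Two codebook vectors representing classes 0 and 1 (illustrative values)
M = np.array([[0.0, 0.0], [3.0, 3.0]])
labels = np.array([0, 1])
x, y = np.array([2.5, 2.8]), 1
c = lvq1_step(x, y, M, labels, alpha=0.1)
```

Here the winner is m1, its label matches y, so it is pulled 10% of the way toward x.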
Topological neighborhood
Goal: partition the cells of the map such that each responds to a discrete set of features of the data. Like competitive learning, but with neighborhood-dependent updates of the cells in Nc to learn a spatial ordering.
This algorithm allows high-dimensional data to be visualized in a lower-dimensional (often 2D) space.
SOM architecture
Network: (most commonly) n × m cells positioned on a grid or lattice, where each cell mi ∈ Rn (the red nodes)
x1, x2, x3 are the input vectors or patterns ∈ Rn (the yellow nodes)
Each input xi is “connected” to every mi in the network (so that each xi can be compared with all mi)
Figure: An example of a 3× 3 network
(Snapshot from https://youtu.be/ Euwc9fWBJw)
SOM Learning algorithm
Forming the topological order of the map: begin with a large Nc, roughly 1/2 the number of neurons in the network. Iteratively:
1. Find cell c s.t. ||x − mc|| = mini ||x − mi||
2. Update the mi:
If cell i ∈ Nc(t): mi(t + 1) = mi(t) + α(t)[x(t) − mi(t)]
Otherwise: mi(t + 1) = mi(t)
The adaptation gain α, 0 < α < 1, is often replaced with a scalar “kernel function” hci(t), most commonly a Gaussian neighborhood:
hci(t) = h0(t) exp(−||ri − rc||²/σ(t)²)
where h0(t), σ(t) decrease over time, and ri, rc denote the lattice locations of cells i and c.
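A single SOM step with the Gaussian kernel can be sketched as follows (an illustrative sketch; the lattice size, seed, and the exponential decay of h0(t) and σ(t) are assumptions):

```python
import numpy as np

def som_step(x, M, grid, h0, sigma):
    """One SOM step: every cell moves toward x, weighted by the Gaussian kernel h_ci."""
    c = int(np.argmin(np.linalg.norm(M - x, axis=1)))  # best-matching cell
    d2 = np.sum((grid - grid[c]) ** 2, axis=1)         # squared lattice distances ||r_i - r_c||^2
    h = h0 * np.exp(-d2 / sigma ** 2)                  # h_ci = h0 exp(-||r_i - r_c||^2 / sigma^2)
    M += h[:, None] * (x - M)                          # neighborhood-weighted update
    return c

# A 3x3 lattice whose cells hold weight vectors in R^3 (e.g. RGB values)
grid = np.array([(i, j) for i in range(3) for j in range(3)], dtype=float)
rng = np.random.default_rng(2)
M = rng.random((9, 3))
for t in range(100):
    decay = np.exp(-t / 50.0)  # h0(t), sigma(t) shrink over time (assumed decay)
    som_step(rng.random(3), M, grid, h0=0.5 * decay, sigma=2.0 * decay)
```

Since each update is a convex combination of mi and x (the kernel weights stay below 1), the weights remain inside the range of the inputs.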
Simple demonstration with colors
Using 15 samples of RGB values, this 20 × 30 cell network learns a classification of the colors.
Effect of decrease in network size
Alternative clustering of the same 15 RGB vectors with a 10 × 15 network:
Effect of increase in network size
Now, with a 50 × 60 network:
Practical tips in applications
Initialization of mi(0):
May be arbitrarily or randomly chosen
The only restriction is that they are distinct
Number of iterations:“Rule of thumb”: 500× number of network cells
Learning rate α:
Use a high learning rate (close to 1) for the first 1000 iterations, then monotonically decrease α during fine-tuning
Neighborhood size: Nc
If Nc is too small → the map will not learn a global ordering
If Nc is too large → the map’s clustering will be very coarse-grained
Choose a generous Nc at the start of training: Nc(0) can be more than half the size of the network; reduce the size during training
Note that the updates to each cell’s weights tend to smooth out over learning, and the weight vectors tend to become ordered in value along the axes of the network.
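The tips above translate into simple schedules for the gain and the neighborhood radius (a sketch under assumed shapes; the paper prescribes only that both decrease, not these exact formulas):

```python
def alpha_schedule(t, t_ramp=1000, alpha_hi=0.9, alpha_lo=0.01, T=50000):
    """High gain (close to 1) for the first ~1000 iterations, then monotone decay."""
    if t < t_ramp:
        return alpha_hi
    return max(alpha_lo, alpha_hi * (1.0 - (t - t_ramp) / (T - t_ramp)))

def radius_schedule(t, r0, T=50000):
    """Start with a generous neighborhood radius and shrink it toward one cell."""
    return max(1.0, r0 * (1.0 - t / T))

# Rule of thumb for a 10x10 map: 500 x 100 cells = 50,000 iterations
T = 500 * 10 * 10
```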
Theory behind the SOM
Extremely difficult to capture the dynamics of the SOM in mathematical theorems and analyses
Many strict analyses have been attempted under simplifying constraints, though these are beyond the scope of this paper
Cottrell and Fort (1987) have proven that the SOM will converge to a globally ordered state in some constrained low-dimensional cases
Ritter, Martinetz, and Schulten (1992) have shown that, under the assumption that the x are discrete-valued, local order can be achieved for more general dimensionalities of the vectors
Several meaningful applications have been simulated using the SOM to demonstrate its utility and efficacy
Application: speech recognition
Building a “phonetic typewriter” to identify and recognize phonemes: represent continuous Finnish speech as spectral vectors x
x ∈ R15 spectral vectors computed every 9.83 ms using a 256-point FFT
No segmentation of speech and fully unsupervised: naturally occurring features contributed to self-organization
Figure: 8 × 12 Finnish phoneme map. The 21 phonemes of Finnish have been learned: most cells respond to a unique phoneme, but some respond to two.
Application: Semantic mapping
Goal: extract abstract, logically similar semantic information from symbolic data.
The meaning of a symbolic encoding requires consideration of the conditional probabilities of its occurrences (contexts) with other encodings
Represent a symbolic encoding as a vector x consisting of the symbol part xs and its context xc:
x = (xs, xc) = (xs, 0) + (0, xc)
Figure: Vocabulary used in the experiment. Semantic meaning should be inferred only from the context in which words occur; each word is encoded as a random, 7-dimensional unit vector.
Application: semantic maps
Presented 2,000 word-context pairs derived from 10,000 random sentences
Figure: 15× 10-cell learned semantic map
Concluding remarks
The SOM is a powerful neural model capable of creating localized, structured, clustered representations of input data
On its own, the SOM is not well suited for classification tasks unless fine-tuning methods are used to increase decision accuracy
Initialization of the map affects the locality of its learned responsesand ordering
Particularly useful for high-dimensional data: it reduces the dimensionality of the data, and the topological mapping allows underlying properties of the data to be visualized
References
Teuvo Kohonen
The Self-Organizing Map
Proceedings of the IEEE, Vol. 78, No. 9, September 1990.
Teuvo Kohonen
Essentials of the self-organizing map
Neural Networks, Vol. 37, January 2013, pp. 52–65.
Any questions?