
Page 1

Self Organizing Maps (SOM)

Unsupervised Learning

Page 2

Self Organizing Maps

T. Kohonen (1995), Self-Organizing Maps.

T. Kohonen, Dr. Eng., Emeritus Professor of the Academy of Finland

His research areas are the theory of self-organization, associative memories, neural networks, and pattern recognition, in which he has published over 300 research papers and four monographs.

Page 3

SOM – What is it?

• The most popular ANN algorithm in the unsupervised learning category

• Converts relationships between high-dimensional data items into simple geometric relationships on a low-dimensional display

• Compresses information while preserving the most important topological and metric relationships of the primary data items

• Applications: data visualization, feature extraction, pattern classification, adaptive control of robots, etc.

Page 4

Vector quantization (VQ)

A signal approximation method that forms an approximation to the probability density function p(x) of a stochastic variable x using a finite number of so-called codebook vectors (reference vectors / basis vectors) w_i, i = 1, 2, …, k.

Finding the closest reference vector w_c:

c = arg min_i ||x − w_i||,

where ||x − w_i|| is the Euclidean norm.

[Figure: reference vectors w_i and their Voronoi sets]
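As a minimal sketch of the winner search (the function name and array shapes are mine, not from the slides):

```python
import numpy as np

def nearest_codebook_index(x, W):
    """c = arg min_i ||x - w_i|| for a codebook W of shape (k, d)."""
    distances = np.linalg.norm(W - x, axis=1)  # Euclidean norm to each w_i
    return int(np.argmin(distances))
```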

Page 5

VQ: Optimization

Average expected square of the quantization error:

E = ∫ ||x − w_c||² p(x) dx

For every x, with occurrence probability given via p(x), we calculate the error of how well some w_c would approximate x and then integrate over all x to get the total error.

Gradient descent method:

dw_i/dt = α δ_ci (x − w_i), where δ_ci is the Kronecker delta (= 1 for c = i, 0 otherwise)

Gradient descent is used to find those w_c for which the error is minimal.
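A single stochastic update step of this rule might look as follows (a sketch; only the winner moves, since δ_ci zeroes out all other units):

```python
import numpy as np

def vq_update(W, x, alpha):
    """One gradient-descent step: dw_c = alpha * (x - w_c) for the winner c."""
    c = int(np.argmin(np.linalg.norm(W - x, axis=1)))  # closest codebook vector
    W[c] += alpha * (x - W[c])                         # move the winner toward x
    return W
```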

Page 6

SOM: Feed-forward network

[Figure: feed-forward network; the input vector X is fully connected through weights W to the map units]

Page 7

SOM: Components

Inputs: x

Weights: w

X = (R, G, B) is a vector, of which we have six here.

We use 16 codebook vectors (you can choose how many!)

Page 8

SOM: Algorithm

1. Initialize map (weights)

2. Select a sample (input)

3. Determine neighbors

4. Change weights

5. Repeat from 2 for a finite number of steps
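A compact sketch of these five steps in NumPy (the rectangular grid, Gaussian neighborhood, and linear decay of rate and radius are my choices for illustration, not prescribed by the slides):

```python
import numpy as np

def train_som(data, grid_h, grid_w, steps, alpha0=0.5, sigma0=None):
    """Train a SOM following steps 1-5: init, sample, neighbors, update, repeat."""
    rng = np.random.default_rng(0)
    d = data.shape[1]
    W = rng.random((grid_h * grid_w, d))                   # 1. initialize map (random)
    # grid coordinates r_i of each unit, for neighborhood distances
    coords = np.array([(i, j) for i in range(grid_h) for j in range(grid_w)], float)
    sigma0 = sigma0 or max(grid_h, grid_w) / 2.0
    for t in range(steps):
        x = data[rng.integers(len(data))]                  # 2. select a sample
        c = np.argmin(np.linalg.norm(W - x, axis=1))       #    winner unit c
        alpha = alpha0 * (1 - t / steps)                   # linearly decaying rate
        sigma = sigma0 * (1 - t / steps) + 1e-3            # shrinking radius
        dist2 = np.sum((coords - coords[c]) ** 2, axis=1)  # 3. determine neighbors
        h = np.exp(-dist2 / (2 * sigma ** 2))              #    Gaussian h_ci
        W += alpha * h[:, None] * (x - W)                  # 4. change weights
    return W                                               # 5. after all steps
```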

Page 9

SOM: possible weight initialization methods

• Random initialization

• Using initial samples

• Ordering

Page 10

SOM: determining neighbors

[Figure: hexagonal grid and rectangular grid map topologies]

Page 11

SOM: Gaussian neighborhood function

h_ci = exp(−||r_c − r_i||² / (2σ_t²))
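A direct transcription (a sketch; r_c and r_i are the grid coordinates of the winner and of unit i, and the names are mine):

```python
import numpy as np

def gaussian_neighborhood(r_c, r_i, sigma_t):
    """h_ci = exp(-||r_c - r_i||^2 / (2 * sigma_t^2))"""
    d2 = np.sum((np.asarray(r_c, float) - np.asarray(r_i, float)) ** 2)
    return float(np.exp(-d2 / (2.0 * sigma_t ** 2)))
```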

Page 12

SOM: Neighborhood functions

Page 13

SOM: Learning rule

SOM learning rule:

dw_i/dt = α_t h_ci (x − w_i), 0 < α_t < 1, h_ci – neighbourhood function

Gradient-descent method for VQ (for comparison):

dw_i/dt = α δ_ci (x − w_i), δ_ci – Kronecker delta (= 1 for c = i, 0 otherwise)
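In code the two rules differ only in the weighting factor: with h equal to a Kronecker delta at the winner, the SOM step below reduces to the VQ step (a sketch; h is the vector of h_ci values for the current winner c):

```python
import numpy as np

def som_step(W, x, h, alpha):
    """One SOM update: w_i += alpha * h_ci * (x - w_i) for every unit i."""
    return W + alpha * h[:, None] * (x - W)
```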

Page 14

SOM: Learning rate function

[Figure: learning rate α_t versus time (steps), decaying from 0.5 toward 0 over 1000 steps]

Linear: α_t = α_0 (1 − t/T)

Power series: α_t = α_0 (α_T/α_0)^(t/T)

Inverse-of-time: α_t = a/(t + b)

α_0 – initial learning rate, α_T – final learning rate, T – total number of steps, a, b – constants
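The three schedules as functions (a sketch; the argument names are mine):

```python
def lr_linear(t, T, alpha0):
    """Linear: alpha_t = alpha0 * (1 - t/T)"""
    return alpha0 * (1.0 - t / T)

def lr_power(t, T, alpha0, alphaT):
    """Power series: alpha_t = alpha0 * (alphaT/alpha0)**(t/T)"""
    return alpha0 * (alphaT / alpha0) ** (t / T)

def lr_inverse(t, a, b):
    """Inverse-of-time: alpha_t = a / (t + b)"""
    return a / (t + b)
```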

Page 15

SOM: Weight development ex. 1

[Figure: inputs X and weight lattice W; eight inputs, 40x40 codebook vectors]

Neighborhood relationships are usually preserved (+)

Final structure depends on the initial conditions and cannot be predicted (-)

Page 16

SOM: Weight development

[Figure: development of the weights w_i over time (steps)]

Page 18

SOM: Examples of maps

Good: all neighbors meet! Bad: some neighbors stay apart!

[Figure: one good map and several bad maps]

Bad cases could be avoided by non-random initialization!

Page 20

SOM: Calculating goodness of fit

Average distance to neighboring cells:

d_j = (1/r) Σ_{i=1..r} ||w_j − w_i||

where i = 1…r, r is the number of neighboring cells, and j = 1…N, N is the number of reference vectors w.

The "amount of grey" measures how well neighbors meet. The less grey, the better!
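Computed per unit, this gives the grey level of each map cell (a sketch; `neighbors`, a precomputed list of grid-neighbor indices per unit, is my assumption):

```python
import numpy as np

def grey_level(W, neighbors, j):
    """d_j = (1/r) * sum over the r grid neighbors i of ||w_j - w_i||."""
    idx = neighbors[j]  # indices of unit j's neighboring cells
    return float(np.mean(np.linalg.norm(W[idx] - W[j], axis=1)))
```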

Page 21

SOM: Examples of grey-level maps

[Figure: two grey-level maps, a worse one and a better one]

Page 22

SOM: Classification

1) Use input vector X

2) Do SOM (train the weights W)

3) Take an example

4) Look in the SOM map for who is close to the example!

5) Here is your cluster for classification!
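As a sketch of steps 4 and 5 (the `labels` array, mapping each unit to a cluster, is my assumption about how the trained map was labeled):

```python
import numpy as np

def classify(x, W, labels):
    """Assign x to the cluster of its best-matching unit in the map."""
    c = int(np.argmin(np.linalg.norm(W - x, axis=1)))  # who is close to x?
    return labels[c]                                   # that unit's cluster
```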

Some more examples

Page 23

Biological SOM model

Biological SOM equation:

dw_i/dt = α_t h_ci (x − w_i w_iᵀ x), 0 < α ≤ 1, h_ci – neighbourhood function

SOM learning rule (for comparison):

dw_i/dt = α_t h_ci (x − w_i), 0 < α < 1, h_ci – neighbourhood function

Page 24

Variants of SOM

• Neuron-specific learning rates and neighborhood sizes

• Adaptive or flexible neighborhood definitions

• Growing map structures

Page 25

“Batch map”

1. Initialize weights (the first K training samples, where K is the number of weights)

2. For each map unit i collect a list of copies of all those training samples x whose nearest reference vector belongs to the topological neighborhood set Ni of unit i

3. Update weights by taking weighted average of the respective list

4. Repeat from 2 a few times

Learning equation (cf. the K-means algorithm):

w_i = Σ_j h_ic(j) x_j / Σ_j h_ic(j)
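One pass of this batch rule might be sketched as follows (H, a precomputed matrix of neighborhood values h_ic between all unit pairs, is my assumption):

```python
import numpy as np

def batch_som_epoch(W, H, data):
    """w_i = sum_j h_ic(j) x_j / sum_j h_ic(j), with c(j) the winner for x_j."""
    winners = np.array([np.argmin(np.linalg.norm(W - x, axis=1)) for x in data])
    h = H[:, winners]                            # h_ic(j), shape (units, samples)
    return h @ data / h.sum(axis=1, keepdims=True)
```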

Page 26

LVQ-SOM

dw_i/dt = α_t h_ci (x − w_i), if x and w_i belong to the same class

dw_i/dt = −α_t h_ci (x − w_i), if x and w_i belong to different classes

0 < α_t < 1 is the learning rate and decreases monotonically with time
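A sketch of one such supervised step (the per-unit `unit_class` labels are my assumption about the bookkeeping):

```python
import numpy as np

def lvq_som_step(W, unit_class, x, x_class, h, alpha):
    """Attract units of the same class as x, repel units of other classes."""
    sign = np.where(unit_class == x_class, 1.0, -1.0)  # +1 same class, -1 otherwise
    return W + alpha * (sign * h)[:, None] * (x - W)
```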

Page 27

Orientation maps using SOM

[Figure: orientation map in the visual cortex of a monkey]

[Figure: combined ocular dominance and orientation map; the input consists of rotationally symmetric stimuli (Brockmann et al., 1997)]

Page 28

Image analysis

Learned basis vectors from natural images. The sampling vectors consisted of 15 by 15 pixels.

Kohonen, 1995

Page 29

Place cells developed by SOM

As SOMs have lateral connections, one gets a spatially ordered set of place fields (PFs), which is biologically unrealistic.

Page 30

Place cells developed by VQ

In VQ we do not have lateral connections. Thus one gets no ordering, which is biologically more realistic.

Page 31

Learning perception-action cycles

Page 32

Features

[Figure: input features x]

Page 33

Input and output signals

[Figure: the input signals x and the output steering signal (s), plotted over 1000 time steps]

Page 34

Learning procedure

[Figure: network with an Input Layer (x), an Output Layer, and an Associative Layer producing the steering signal (s_a)]

Page 35

Learning procedure

Initialization

For training we used 1000 data samples which contained input-output pairs: (a(t), x(t)) → s(t).

We initialize the weights and values for the SOM from our data set: a(k) = a(j), x(k) = x(j) and s_a(k) = s(j), where k = 1…250 and j denotes indices of random samples from the data set.

Learning

1. Select a random sample and present it to the network: X(i) = {a(i), x(i)}

2. Find the best matching unit by c = arg min_k ||X(i) − W(k)||

3. Update the weights W and the values of the associated output s_a by

dw_k/dt = α_t h_ck (x(i) − w_k), ds_a,k/dt = α_t h_ck (s(i) − s_a,k)

4. Repeat for a finite number of steps
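A sketch of step 3, jointly dragging the map weights and the associated steering values toward the current sample (the names are mine; h holds the h_ck values for the current winner):

```python
import numpy as np

def associative_som_step(W, s_a, X, s, h, alpha):
    """dw_k = alpha*h_ck*(X - w_k);  ds_a,k = alpha*h_ck*(s - s_a,k)."""
    W = W + alpha * h[:, None] * (X - W)  # move map units toward the input
    s_a = s_a + alpha * h * (s - s_a)     # move associated outputs toward the target
    return W, s_a
```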

Page 36

Generalization and smoothing

[Figure: training data (inputs x, steering s) compared with the learned weights and learned steering (s_a)]

Page 37

Learned steering actions

[Figure: steering versus time (steps); real steering (s) and learnt steering (s_a) over 1000 steps]

With 250 neurons we were able to approximate human steering behavior relatively well.