the dynamics of learning vector...

30
The Dynamics of Learning Vector Quantization, RUG, 10.01.2005 The Dynamics of Learning Vector Quantization The Dynamics of Learning Vector Quantization The Dynamics of Learning Vector Quantization The Dynamics of Learning Vector Quantization Rijksuniversiteit Groningen Mathematics and Computing Science Michael Biehl, Anarta Ghosh TU Clausthal-Zellerfeld Institute of Computing Science Barbara Hammer

Upload: hanhan

Post on 08-Aug-2019

231 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: The Dynamics of Learning Vector Quantizationlibvolume3.xyz/computers/btech/semester6/datacompression/vectorquantization...The Dynamics of Learning Vector Quantization, RUG, 10.01.2005

The Dynamics of Learning Vector Quantization RUG 10012005

The Dynamics of Learning Vector QuantizationThe Dynamics of Learning Vector QuantizationThe Dynamics of Learning Vector QuantizationThe Dynamics of Learning Vector Quantization

Rijksuniversiteit Groningen

Mathematics and Computing Science

Michael Biehl Anarta Ghosh

TU Clausthal-Zellerfeld

Institute of Computing Science

Barbara Hammer

The Dynamics of Learning Vector Quantization RUG 10012005

Vector Quantization (VQ)

Learning Vector Quantization (LVQ)Learning Vector Quantization (LVQ)Learning Vector Quantization (LVQ)Learning Vector Quantization (LVQ)

IntroductionIntroductionIntroductionIntroduction

The dynamics of learningThe dynamics of learningThe dynamics of learningThe dynamics of learning

a model situation randomized data

learning algorithms for VQ und LVQ

analysis and comparison dynamics success of learning

SummarySummarySummarySummary

OutlookOutlookOutlookOutlook

prototypeprototypeprototypeprototype----based learningbased learningbased learningbased learning from example datarepresentation classificationclassificationclassificationclassification

The Dynamics of Learning Vector Quantization RUG 10012005

Vector Quantization (VQ) Vector Quantization (VQ) Vector Quantization (VQ) Vector Quantization (VQ)

aim

representation of large amounts

of data by (few) prototype vectorsprototype vectorsprototype vectorsprototype vectors

example

identification and grouping

in clusters clusters clusters clusters of similar data

assignment of feature vector ξξξξto the closest closest closest closest prototypeprototypeprototypeprototype wwww

(similarity or distance measure

eg Euclidean distance )

The Dynamics of Learning Vector Quantization RUG 10012005

unsupervised competitive learningunsupervised competitive learningunsupervised competitive learningunsupervised competitive learning

bull initialize K prototype vectors

bull present a single example

bull identify the closest prototypeie the so-called winner winner winner winner

bull move the winner even closer towards the example

intuitively clear plausible procedure

- places prototypes in areas with high density of data

- identifies the most relevant combinations of features

- (stochastic) onononon----line gradient descent line gradient descent line gradient descent line gradient descent with respect to

the cost function cost function cost function cost function

The Dynamics of Learning Vector Quantization RUG 10012005

quantization errorquantization errorquantization errorquantization error

( ) ( )microj

microk

K

jk

P

1micro

jmicro

K

1j

VQ ddΘ2

wξH minusminus= prodsumsumne==

microjdprototypes data wj is the winner

here

Euclidean distance

aim faithful representation (in general ne clustering )

Result depends on - the number of prototype vectors

- the distance measure metric used

The Dynamics of Learning Vector Quantization RUG 10012005

Learning Vector Quantization (LVQ)Learning Vector Quantization (LVQ)Learning Vector Quantization (LVQ)Learning Vector Quantization (LVQ)

aim

classification classification classification classification of data

learning from examples

LearningLearningLearningLearning choice of prototypes according to example data

example situtation

3 classes3 classes3 classes3 classes

classification

assignment of a vector ξξξξto the class of the closest

prototype w w w w

3 prototypes 3 prototypes 3 prototypes 3 prototypes

aim generalization abilitygeneralization abilitygeneralization abilitygeneralization ability ie correct classification

of novel data after training

The Dynamics of Learning Vector Quantization RUG 10012005

prominent example [Kohonen] ldquo LVQ 21 rdquoldquo LVQ 21 rdquoldquo LVQ 21 rdquoldquo LVQ 21 rdquo

bull present a single example

bull initialize prototype vectors(for different classes)

bull identify the closest correctand the closest wrong prototype

bull move the corresponding winnertowards away from the example

known convergence stability problems

eg for infrequent classes

mostly heuristicallyheuristicallyheuristicallyheuristically motivated variations of competitive learningcompetitive learningcompetitive learningcompetitive learning

The Dynamics of Learning Vector Quantization RUG 10012005

LVQ algorithms LVQ algorithms LVQ algorithms LVQ algorithms

- are frequently applied in a variety of problems involving

the classification of structured data a few examples

- appear plausible intuitive flexible

- are fast easy to implement

- real time speech recognition

- medical diagnosis eg from histological data

- texture recognition and classification

- gene expression data analysis

-

The Dynamics of Learning Vector Quantization RUG 10012005

illustrationillustrationillustrationillustration microscopic images of (pig) semen cells after freezingand storage co Lidia Sanchez-Gonzalez LeonSpain

The Dynamics of Learning Vector Quantization RUG 10012005

healthy cells damaged cells

prototypes obtained by LVQ (1)

illustrationillustrationillustrationillustration microscopic images of (pig) semen cells after freezingand storage co Lidia Sanchez-Gonzalez LeonSpain

The Dynamics of Learning Vector Quantization RUG 10012005

LVQ algorithms LVQ algorithms LVQ algorithms LVQ algorithms

- are often based on purely heuristic arguments

or derived from a cost function with unclear

relation to the generalization ability

- almost exclusively use the Euclidean distance measure

inappropriate for heterogeneous data

- lack in general a thorough theoretical understanding of

dynamics convergence properties

performance wrt generalization etc

The Dynamics of Learning Vector Quantization RUG 10012005

In the following

analysis of LVQ algorithms wrt

- dynamics of the learning process

- performance ie generalization ability

- asymptotic behavior in the limit of many examples

typical behavior in a model situation

- randomized high-dimensional data

- essential features of LVQ learning

aim - contribute to the theoretical understanding- develop efficient LVQ schemes- test in applications

The Dynamics of Learning Vector Quantization RUG 10012005

model situationmodel situationmodel situationmodel situation two clusters of N-dimensional data

random vectors ξξξξ isin ℝN according to σ)P(p )P(

1σσ ξξξξξξξξ sum

plusmn=

=

( )( )

minus=

2

σN2-

2

1exp

1σ)P( ΒΒΒΒξξξξξξξξ lmixture of two Gaussians

orthonormal center vectors

BBBB+ BBBB- isin ℝN ( BBBBσ )2 =1 BBBB+ BBBB- =0

prior weights of classes p+ p-p+ + p- = 1

BBBB+

BBBB-

(p+)

(p-)

separation ℓℓ

jj Bσσξ l=

22222l Nξ1ξξ

N

1σσ

+==rarr=minus sum=j

jjj ξξξξ

independent components

The Dynamics of Learning Vector Quantization RUG 10012005

high-dimensional data (formally Nrarrinfin)

400 examples ξξξξmicro isinℝN N=200 ℓ=1 p+=06micro

By

ξξ ξξ sdot=

minusminus

(240)

(160)

projections into the plane of center vectors B+ B-

microBy ξξξξsdot= ++

micro2

2xξξ ξξ

ww wwsdot

=

(240)(160)

projections in two independent random directions wwww12

micro11x ξξξξwwww sdot=

model for studying typical behavior of LVQ algorithmsnot density-estimation based classification

NoteNoteNoteNote

The Dynamics of Learning Vector Quantization RUG 10012005

dynamics of ondynamics of ondynamics of ondynamics of on----line trainingline trainingline trainingline training

sequence of independent random data ( )123micromicro =ξξξξ acc to ( )microP ξξξξ

learning ratestep size

competitiondirection ofupdate etc

change of prototypetowards or away from the current data

above examples

unsupervisedunsupervisedunsupervisedunsupervised Vector QuantizationVector QuantizationVector QuantizationVector Quantization [ ] ( ) dd fmicros

micross minusΘ= minus

The Winner Takes It All (classes irrelevantunknown)

Learning Vector Quantization ldquo21rdquo Learning Vector Quantization ldquo21rdquo Learning Vector Quantization ldquo21rdquo Learning Vector Quantization ldquo21rdquo [ ] σS fs)(1)(1

classcorrectclasswrong

+minus=sdot=

here two prototypes noexplicit competition

[ ] ( ) 1-micros

micromicros-

micross

1-micros

micros σSddf

N

ηwwwwξξξξwwwwwwww minus+=

( )21minusminus=

plusmn=

micros

micromicrosd

1σS

wwwwξξξξ

update of prototype vectors

The Dynamics of Learning Vector Quantization RUG 10012005

[ ] ( )

[ ] ( ) [ ] ( ) [ ] [ ] ( )Ν

1Οffη QxfηQxfη1N

QQ

Ryfη1N

RR

ts1-micro

stmicrost

1-microst

microts

1-microst

microst

1-microsσ

microσs

1-microsσ

microsσ

++minus+minus=minus

minus=minus

2

[ ] ( ) 1-micros

micromicros-

micross

1-micros

micros σSddf

N

ηwwwwξξξξwwwwwwww minus+= rarrrarrrarrrarr recursionsrecursionsrecursionsrecursions

mathematical analysis of the learning dynamicsmathematical analysis of the learning dynamicsmathematical analysis of the learning dynamicsmathematical analysis of the learning dynamics

( ) ( ) 1221 -micross

micros

micromicros

micromicros Q2xd +minus=minus= minus ξξξξwwwwξξξξ

micromicromicro1-micros

micros ξByx sdot=sdot= ττξξξξwwwwprojections

distances

random vector ξmicro enters only in the form of

( )11 +minusisinsdot=sdot= σtsmicrot

micros

microstσ

micros

microsσ QBR wwwwwwwwwwww

projections in the (B+ B- )-plane

length and relativeposition of prototypes

1 description in terms of a few characteristic quantitities1 description in terms of a few characteristic quantitities1 description in terms of a few characteristic quantitities1 description in terms of a few characteristic quantitities

( here ℝ2N rarr ℝ7 )

The Dynamics of Learning Vector Quantization RUG 10012005

N

1jjσjsσ

N

1jjsσs R x ll === sumsum

==

Bww jξ

completely specified in terms of first and second moments (wo indices micro)

in the thermodynamic limit thermodynamic limit thermodynamic limit thermodynamic limit N rarrrarrrarrrarr infininfininfininfin

random vector acc to σ)|P( micro rarrξξξξmicromicro

micro1-micros

micros

By

wx

ξξξξ

ξξξξ

sdot=

sdot=

ττ

correlated Gaussianrandom quantities

stσtσsσt s Q xx- xx = τττ sσσsσ s R yx- yx =

ρττρτρ δ yy- yyσσσ

===

=

else

σ ifsσσ

y0

Sl

l δτ

2 average over the current example2 average over the current example2 average over the current example2 average over the current example

rarrrarrrarrrarr averaged recursionsaveraged recursionsaveraged recursionsaveraged recursions closed in Rsσ Qst pσ

1σσ LL sum

plusmn=

=

The Dynamics of Learning Vector Quantization RUG 10012005

characteristic quantities

- depend on the random sequence of example data

- their variance vanishes with N rarrrarrrarrrarr infininfininfininfin (here prop N-1)

microsσ

microst

R Q

learning dynamics is completely described in terms of averagesaveragesaveragesaverages

3 self3 self3 self3 self----averaging propertiesaveraging propertiesaveraging propertiesaveraging properties

4 continuous learning time4 continuous learning time4 continuous learning time4 continuous learning time

N

micro α =

of examples

of learning stepsper degree of freedom

) α (R ) α (Q sσst

recursions rarr coupled ordinary differential equations

rarr evolution of projections

The Dynamics of Learning Vector Quantization RUG 10012005

probability for misclassification of a novel example

( ) ( ) minusminusΘ++minusΘ= +minusminusminus++ ddpddp gεεεε

( ) ( )

Φminus+

Φ=

minusminus+

minus+minus

++

minus+minus

minusminusminus

++minus

minusminus

minusminus+

minus+minus

++

+minusminus

++minus

minusminusminus

+++

QQQ

RR2QQ

QQQ

RR2QQpp

22 2

1

2

1 ll

L

5 learning curve5 learning curve5 learning curve5 learning curve

generalization error generalization error generalization error generalization error εεεεgggg((((αααα)))) after training with α N examples

- repulsiveattractive fixed points of the dynamics

- asymptotic behavior for αrarrinfin- dependence on learning rate separation initialization

-

investigation and comparison of given algorithmsinvestigation and comparison of given algorithmsinvestigation and comparison of given algorithmsinvestigation and comparison of given algorithms

- time-dependent learning rate η(α)

- variational optimization wrt fs[]

-

optimization and development of new prescriptionsoptimization and development of new prescriptionsoptimization and development of new prescriptionsoptimization and development of new prescriptions

maximizeα

g

d

d εεεε

The Dynamics of Learning Vector Quantization RUG 10012005

optimal classificationoptimal classificationoptimal classificationoptimal classification with minimal generalization error

BBBB-

BBBB+

(p-gtp+ )

(p+)

separation of classes by the plane with 1)σP(p 1)σP(p +==minus= +minus ξξξξξξξξin the model situation (equal variances of clusters)

excess error

minimal εg as a function

of prior weights ℓ=2

εg

025

050

005 100 p+

ℓ=1

ℓ=0

The Dynamics of Learning Vector Quantization RUG 10012005

ldquoLVQ 21ldquo update the correct and wrong winner

( ) 1-micros

micro1-micros

micros Sσ

N

ηwwwwξξξξwwwwwwww minus+=

(analytical)integrationfor wwwws(0) = 0

( ) ( )

( ) ( ) KKll

Kll

αmηαmη

αmηαmη

e12

m1

mRe1

2

m1

mR

Qe12

m1

mRe1

2

m1

mR

++minusminusminus

++minus

minus+minus

++

minus+

=+minusminus

minus=

=minusminus

minus=minus+

=

pσ = (1+m σ ) 2 (mgt0)

[Seo Obermeyer] LVQ21 ս cost function

(likelihood ratios)

infinrarrinfinrarrminus+minusminus+minusminusminus

++minus+++

αQQRR

Q R R

with

finite remain

Q ++ R ++ R minus+

R +minus Q minus+

Q minusminus R minusminus

α 102 4 86

6-

0

6theory and simulation (N=100)p+=08 ℓ=1 η=05averages over 100 independent runs

The Dynamics of Learning Vector Quantization RUG 10012005

(p- )

(p+gt p-)

sssstrategiestrategiestrategiestrategies

- selection of dataselection of dataselection of dataselection of data in a window close to the current decision boundary

slows down the repulsion system remains instable

- Soft Robust Learning Vector QuantizationSoft Robust Learning Vector QuantizationSoft Robust Learning Vector QuantizationSoft Robust Learning Vector Quantization [Seo amp Obermayer]

density-estimation based cost function

limiting case Learning from mistakes Learning from mistakes Learning from mistakes Learning from mistakes LVQ21-step only

if the example is currently misclassified

slow learning poor generalization

problem instability of the algorithm

due to repulsion of wrong prototypes

trivial classification fuumlr αrarrinfin

εg = max p+p-

The Dynamics of Learning Vector Quantization RUG 10012005

ldquo The winner takes it all rdquoldquo The winner takes it all rdquoldquo The winner takes it all rdquoldquo The winner takes it all rdquo

numericalintegrationfor wwwws(0)=0

theory and simulation (N=200)p+=02 ℓ=12 η=12averaged over 100 indep runs

Q++

Q--

Q+-

α

wwww++++

wwww----

ℓℓℓℓ BBBB++++

ℓℓℓℓ BBBB----

trajectories in the (B+B- )-plane

(bull) α=2040140

optimal decision boundary____ asymptotic position

RS+

RS-

R--

R-+

R--

R++

winner wwwws plusmn1

I) LVQ 1LVQ 1LVQ 1LVQ 1 [Kohonen] [ ] ( ) 1-micros

micromicromicroS

microS

1-micros

micros Sσdd

N

ηwwwwξξξξwwwwwwww minusminusΘ+= minus

only the winner is updated according to the class membership

wwww-

The Dynamics of Learning Vector Quantization RUG 10012005

learning curvelearning curvelearning curvelearning curve

α

εg η=12

(p+=02 ℓ=12)

εg (αrarrinfin) grows lin with η

- stationary state

- role of the learning rate

α100 200 300

εg

026

022

018

0140

η

20

04

02

ηrarr0 - variable rate η(α)

- wellwellwellwell----defined asymptoticsdefined asymptoticsdefined asymptoticsdefined asymptotics

(ODE linear in η)

10

εg

20 30 40 500014

026

022

018

min εg

(η α)

ηrarr0

η rarr0 αrarrinfin

( η α ) rarr infin

suboptimal

The Dynamics of Learning Vector Quantization RUG 10012005

ldquo The winner takes it all ldquoldquo The winner takes it all ldquoldquo The winner takes it all ldquoldquo The winner takes it all ldquo

II ) LVQ+LVQ+LVQ+LVQ+ ( only positive steps without repulsion)

[ ] ( ) ( ) 1-micros

microS

microσ

microS

microS

1-micros

micros δdd

N

ηwwwwξξξξwwwwwwww minusminusΘ+= minus

winner correct

αrarrinfin asymptotic configuration

symmetric about ℓℓℓℓ (B(B(B(B+++++B+B+B+B----)2)2)2)2

wwww-

wwww+

ℓ ℓ ℓ ℓ BBBB+

ℓ ℓ ℓ ℓ BBBB-

p+=02 ℓ=12 η=12

classification scheme and the

achieved generalization error are

independent of the independent of the independent of the independent of the prior weights prior weights prior weights prior weights ppppplusmnplusmnplusmnplusmn

(and optimal for ppppplusmnplusmnplusmnplusmn = 12 )

LVQ+ asymp VQ within the classes

(ws updated only from class S)

The Dynamics of Learning Vector Quantization RUG 10012005

- LVQ 21

trivial assignment to the

more frequent class

optimal classification

εg

pppp++++

min p+p-

- LVQ 1

here close to optimal

classification

pppp++++

- LVQ+

min-max solution

pplusmn -independent classification

p+=02 ℓ=10 η=10εg

α

learning curveslearning curveslearning curveslearning curves

LVQ+

LVQ1

asymptotics ηrarr0 (ηα)rarrinfin

The Dynamics of Learning Vector Quantization RUG 10012005

Vector QuantizationVector QuantizationVector QuantizationVector Quantization

competitive learning [ ] ( ) 1-micros

micromicroS

microS

1-micros

micros dd

N

ηwwwwξξξξwwwwwwww minusminusΘ+= minus

wwwws winner

class membership is unknown

or identical for all data

numerical integration for wwwws(0)asymp0

( p+=02 ℓ=10 η=12 )

εg

α

VQ

LVQ+

LVQ1

αα

R++

R+-

R-+

R--

100 200 3000

0

10

system is invariant under

exchange of the prototypes

rarr weakly repulsive fixed points

The Dynamics of Learning Vector Quantization RUG 10012005

interpretations

- VQ unsupervised learningunlabelled data

- LVQ two prototypes of thesame class identical labels

- LVQ different classes butlabels are not used in training

εg

pppp++++

asymptotics (αrarrηrarr0 ηαrarrinfin)

pppp++++asymp0 asymp0 asymp0 asymp0

pppp----asymp1 asymp1 asymp1 asymp1

- low quantization error

- high gen error εg

The Dynamics of Learning Vector Quantization RUG 10012005

work in progress outlookwork in progress outlookwork in progress outlookwork in progress outlook

bull regularization of LVQ 21 Robust Soft LVQ [Seo Obermayer]

bull model different cluster variances more clustersprototypes

bull optimized procedures learning rate schedules

variational approach density estimation Bayes optimal on-line

bull several classes and prototypes

Summary

bullprototypeprototypeprototypeprototype----based learningbased learningbased learningbased learning

Vector Quantization and Learning Vector Quantization

bulla model scenarioa model scenarioa model scenarioa model scenario two clusters two prototypes

dynamics of online training

bullcomparison of algorithmscomparison of algorithmscomparison of algorithmscomparison of algorithms

LVQ 21 instability trivial (stationary) classification

LVQ 1 close to optimal asymptotic generalization

LVQ + min-max solution wrt asymptotic generalization

VQ symmetry breaking representation

The Dynamics of Learning Vector Quantization RUG 10012005

Perspectives

bullSelfSelfSelfSelf----Organizing Maps (SOM)Organizing Maps (SOM)Organizing Maps (SOM)Organizing Maps (SOM)

(many) N-dim prototypes form a (low) d-dimensional grid

representation of data in a topology preserving map

neighborhood preserving SOM Neural Gas (distance based)

bullGeneralized Relevance LVQ Generalized Relevance LVQ Generalized Relevance LVQ Generalized Relevance LVQ [Hammer amp Villmann]

adaptive metrics eg distance measure ( )sum=

minus=N

i

iii w

1

2)( sλ ξξwd λ

training

bullapplications applications applications applications

Page 2: The Dynamics of Learning Vector Quantizationlibvolume3.xyz/computers/btech/semester6/datacompression/vectorquantization...The Dynamics of Learning Vector Quantization, RUG, 10.01.2005

The Dynamics of Learning Vector Quantization RUG 10012005

Vector Quantization (VQ)

Learning Vector Quantization (LVQ)Learning Vector Quantization (LVQ)Learning Vector Quantization (LVQ)Learning Vector Quantization (LVQ)

IntroductionIntroductionIntroductionIntroduction

The dynamics of learningThe dynamics of learningThe dynamics of learningThe dynamics of learning

a model situation randomized data

learning algorithms for VQ und LVQ

analysis and comparison dynamics success of learning

SummarySummarySummarySummary

OutlookOutlookOutlookOutlook

prototypeprototypeprototypeprototype----based learningbased learningbased learningbased learning from example datarepresentation classificationclassificationclassificationclassification

The Dynamics of Learning Vector Quantization RUG 10012005

Vector Quantization (VQ) Vector Quantization (VQ) Vector Quantization (VQ) Vector Quantization (VQ)

aim

representation of large amounts

of data by (few) prototype vectorsprototype vectorsprototype vectorsprototype vectors

example

identification and grouping

in clusters clusters clusters clusters of similar data

assignment of feature vector ξξξξto the closest closest closest closest prototypeprototypeprototypeprototype wwww

(similarity or distance measure

eg Euclidean distance )

The Dynamics of Learning Vector Quantization RUG 10012005

unsupervised competitive learningunsupervised competitive learningunsupervised competitive learningunsupervised competitive learning

bull initialize K prototype vectors

bull present a single example

bull identify the closest prototypeie the so-called winner winner winner winner

bull move the winner even closer towards the example

intuitively clear plausible procedure

- places prototypes in areas with high density of data

- identifies the most relevant combinations of features

- (stochastic) onononon----line gradient descent line gradient descent line gradient descent line gradient descent with respect to

the cost function cost function cost function cost function

The Dynamics of Learning Vector Quantization RUG 10012005

quantization errorquantization errorquantization errorquantization error

( ) ( )microj

microk

K

jk

P

1micro

jmicro

K

1j

VQ ddΘ2

wξH minusminus= prodsumsumne==

microjdprototypes data wj is the winner

here

Euclidean distance

aim faithful representation (in general ne clustering )

Result depends on - the number of prototype vectors

- the distance measure metric used

The Dynamics of Learning Vector Quantization RUG 10012005

Learning Vector Quantization (LVQ)Learning Vector Quantization (LVQ)Learning Vector Quantization (LVQ)Learning Vector Quantization (LVQ)

aim

classification classification classification classification of data

learning from examples

LearningLearningLearningLearning choice of prototypes according to example data

example situtation

3 classes3 classes3 classes3 classes

classification

assignment of a vector ξξξξto the class of the closest

prototype w w w w

3 prototypes 3 prototypes 3 prototypes 3 prototypes

aim generalization abilitygeneralization abilitygeneralization abilitygeneralization ability ie correct classification

of novel data after training

The Dynamics of Learning Vector Quantization RUG 10012005

prominent example [Kohonen] ldquo LVQ 21 rdquoldquo LVQ 21 rdquoldquo LVQ 21 rdquoldquo LVQ 21 rdquo

bull present a single example

bull initialize prototype vectors(for different classes)

bull identify the closest correctand the closest wrong prototype

bull move the corresponding winnertowards away from the example

known convergence stability problems

eg for infrequent classes

mostly heuristicallyheuristicallyheuristicallyheuristically motivated variations of competitive learningcompetitive learningcompetitive learningcompetitive learning

The Dynamics of Learning Vector Quantization RUG 10012005

LVQ algorithms LVQ algorithms LVQ algorithms LVQ algorithms

- are frequently applied in a variety of problems involving

the classification of structured data a few examples

- appear plausible intuitive flexible

- are fast easy to implement

- real time speech recognition

- medical diagnosis eg from histological data

- texture recognition and classification

- gene expression data analysis

-

The Dynamics of Learning Vector Quantization RUG 10012005

illustrationillustrationillustrationillustration microscopic images of (pig) semen cells after freezingand storage co Lidia Sanchez-Gonzalez LeonSpain

The Dynamics of Learning Vector Quantization RUG 10012005

healthy cells damaged cells

prototypes obtained by LVQ (1)

illustrationillustrationillustrationillustration microscopic images of (pig) semen cells after freezingand storage co Lidia Sanchez-Gonzalez LeonSpain

The Dynamics of Learning Vector Quantization RUG 10012005

LVQ algorithms LVQ algorithms LVQ algorithms LVQ algorithms

- are often based on purely heuristic arguments

or derived from a cost function with unclear

relation to the generalization ability

- almost exclusively use the Euclidean distance measure

inappropriate for heterogeneous data

- lack in general a thorough theoretical understanding of

dynamics convergence properties

performance wrt generalization etc

The Dynamics of Learning Vector Quantization RUG 10012005

In the following

analysis of LVQ algorithms wrt

- dynamics of the learning process

- performance ie generalization ability

- asymptotic behavior in the limit of many examples

typical behavior in a model situation

- randomized high-dimensional data

- essential features of LVQ learning

aim - contribute to the theoretical understanding- develop efficient LVQ schemes- test in applications

The Dynamics of Learning Vector Quantization RUG 10012005

model situationmodel situationmodel situationmodel situation two clusters of N-dimensional data

random vectors ξξξξ isin ℝN according to σ)P(p )P(

1σσ ξξξξξξξξ sum

plusmn=

=

( )( )

minus=

2

σN2-

2

1exp

1σ)P( ΒΒΒΒξξξξξξξξ lmixture of two Gaussians

orthonormal center vectors

BBBB+ BBBB- isin ℝN ( BBBBσ )2 =1 BBBB+ BBBB- =0

prior weights of classes p+ p-p+ + p- = 1

BBBB+

BBBB-

(p+)

(p-)

separation ℓℓ

jj Bσσξ l=

22222l Nξ1ξξ

N

1σσ

+==rarr=minus sum=j

jjj ξξξξ

independent components

The Dynamics of Learning Vector Quantization RUG 10012005

high-dimensional data (formally Nrarrinfin)

400 examples ξξξξmicro isinℝN N=200 ℓ=1 p+=06micro

By

ξξ ξξ sdot=

minusminus

(240)

(160)

projections into the plane of center vectors B+ B-

microBy ξξξξsdot= ++

micro2

2xξξ ξξ

ww wwsdot

=

(240)(160)

projections in two independent random directions wwww12

micro11x ξξξξwwww sdot=

model for studying typical behavior of LVQ algorithmsnot density-estimation based classification

NoteNoteNoteNote

The Dynamics of Learning Vector Quantization RUG 10012005

dynamics of ondynamics of ondynamics of ondynamics of on----line trainingline trainingline trainingline training

sequence of independent random data ( )123micromicro =ξξξξ acc to ( )microP ξξξξ

learning ratestep size

competitiondirection ofupdate etc

change of prototypetowards or away from the current data

above examples

unsupervisedunsupervisedunsupervisedunsupervised Vector QuantizationVector QuantizationVector QuantizationVector Quantization [ ] ( ) dd fmicros

micross minusΘ= minus

The Winner Takes It All (classes irrelevantunknown)

Learning Vector Quantization ldquo21rdquo Learning Vector Quantization ldquo21rdquo Learning Vector Quantization ldquo21rdquo Learning Vector Quantization ldquo21rdquo [ ] σS fs)(1)(1

classcorrectclasswrong

+minus=sdot=

here two prototypes noexplicit competition

[ ] ( ) 1-micros

micromicros-

micross

1-micros

micros σSddf

N

ηwwwwξξξξwwwwwwww minus+=

( )21minusminus=

plusmn=

micros

micromicrosd

1σS

wwwwξξξξ

update of prototype vectors

The Dynamics of Learning Vector Quantization RUG 10012005

[ ] ( )

[ ] ( ) [ ] ( ) [ ] [ ] ( )Ν

1Οffη QxfηQxfη1N

QQ

Ryfη1N

RR

ts1-micro

stmicrost

1-microst

microts

1-microst

microst

1-microsσ

microσs

1-microsσ

microsσ

++minus+minus=minus

minus=minus

2

[ ] ( ) 1-micros

micromicros-

micross

1-micros

micros σSddf

N

ηwwwwξξξξwwwwwwww minus+= rarrrarrrarrrarr recursionsrecursionsrecursionsrecursions

mathematical analysis of the learning dynamicsmathematical analysis of the learning dynamicsmathematical analysis of the learning dynamicsmathematical analysis of the learning dynamics

( ) ( ) 1221 -micross

micros

micromicros

micromicros Q2xd +minus=minus= minus ξξξξwwwwξξξξ

micromicromicro1-micros

micros ξByx sdot=sdot= ττξξξξwwwwprojections

distances

random vector ξmicro enters only in the form of

( )11 +minusisinsdot=sdot= σtsmicrot

micros

microstσ

micros

microsσ QBR wwwwwwwwwwww

projections in the (B+ B- )-plane

length and relativeposition of prototypes

1 description in terms of a few characteristic quantitities1 description in terms of a few characteristic quantitities1 description in terms of a few characteristic quantitities1 description in terms of a few characteristic quantitities

( here ℝ2N rarr ℝ7 )

The Dynamics of Learning Vector Quantization RUG 10012005

N

1jjσjsσ

N

1jjsσs R x ll === sumsum

==

Bww jξ

completely specified in terms of first and second moments (wo indices micro)

in the thermodynamic limit thermodynamic limit thermodynamic limit thermodynamic limit N rarrrarrrarrrarr infininfininfininfin

random vector acc to σ)|P( micro rarrξξξξmicromicro

micro1-micros

micros

By

wx

ξξξξ

ξξξξ

sdot=

sdot=

ττ

correlated Gaussianrandom quantities

stσtσsσt s Q xx- xx = τττ sσσsσ s R yx- yx =

ρττρτρ δ yy- yyσσσ

===

=

else

σ ifsσσ

y0

Sl

l δτ

2 average over the current example2 average over the current example2 average over the current example2 average over the current example

rarrrarrrarrrarr averaged recursionsaveraged recursionsaveraged recursionsaveraged recursions closed in Rsσ Qst pσ

1σσ LL sum

plusmn=

=

The Dynamics of Learning Vector Quantization RUG 10012005

characteristic quantities

- depend on the random sequence of example data

- their variance vanishes with N rarrrarrrarrrarr infininfininfininfin (here prop N-1)

microsσ

microst

R Q

learning dynamics is completely described in terms of averagesaveragesaveragesaverages

3 self3 self3 self3 self----averaging propertiesaveraging propertiesaveraging propertiesaveraging properties

4 continuous learning time4 continuous learning time4 continuous learning time4 continuous learning time

N

micro α =

of examples

of learning stepsper degree of freedom

) α (R ) α (Q sσst

recursions rarr coupled ordinary differential equations

rarr evolution of projections

The Dynamics of Learning Vector Quantization RUG 10012005

probability for misclassification of a novel example

( ) ( ) minusminusΘ++minusΘ= +minusminusminus++ ddpddp gεεεε

( ) ( )

Φminus+

Φ=

minusminus+

minus+minus

++

minus+minus

minusminusminus

++minus

minusminus

minusminus+

minus+minus

++

+minusminus

++minus

minusminusminus

+++

QQQ

RR2QQ

QQQ

RR2QQpp

22 2

1

2

1 ll

L

5 learning curve5 learning curve5 learning curve5 learning curve

generalization error generalization error generalization error generalization error εεεεgggg((((αααα)))) after training with α N examples

- repulsiveattractive fixed points of the dynamics

- asymptotic behavior for αrarrinfin- dependence on learning rate separation initialization

-

investigation and comparison of given algorithmsinvestigation and comparison of given algorithmsinvestigation and comparison of given algorithmsinvestigation and comparison of given algorithms

- time-dependent learning rate η(α)

- variational optimization wrt fs[]

-

optimization and development of new prescriptionsoptimization and development of new prescriptionsoptimization and development of new prescriptionsoptimization and development of new prescriptions

maximizeα

g

d

d εεεε

The Dynamics of Learning Vector Quantization RUG 10012005

optimal classificationoptimal classificationoptimal classificationoptimal classification with minimal generalization error

BBBB-

BBBB+

(p-gtp+ )

(p+)

separation of classes by the plane with 1)σP(p 1)σP(p +==minus= +minus ξξξξξξξξin the model situation (equal variances of clusters)

excess error

minimal εg as a function

of prior weights ℓ=2

εg

025

050

005 100 p+

ℓ=1

ℓ=0

The Dynamics of Learning Vector Quantization RUG 10012005

ldquoLVQ 21ldquo update the correct and wrong winner

( ) 1-micros

micro1-micros

micros Sσ

N

ηwwwwξξξξwwwwwwww minus+=

(analytical)integrationfor wwwws(0) = 0

( ) ( )

( ) ( ) KKll

Kll

αmηαmη

αmηαmη

e12

m1

mRe1

2

m1

mR

Qe12

m1

mRe1

2

m1

mR

++minusminusminus

++minus

minus+minus

++

minus+

=+minusminus

minus=

=minusminus

minus=minus+

=

pσ = (1+m σ ) 2 (mgt0)

[Seo Obermeyer] LVQ21 ս cost function

(likelihood ratios)

infinrarrinfinrarrminus+minusminus+minusminusminus

++minus+++

αQQRR

Q R R

with

finite remain

Q ++ R ++ R minus+

R +minus Q minus+

Q minusminus R minusminus

α 102 4 86

6-

0

6theory and simulation (N=100)p+=08 ℓ=1 η=05averages over 100 independent runs

The Dynamics of Learning Vector Quantization RUG 10012005

(p- )

(p+gt p-)

sssstrategiestrategiestrategiestrategies

- selection of dataselection of dataselection of dataselection of data in a window close to the current decision boundary

slows down the repulsion system remains instable

- Soft Robust Learning Vector QuantizationSoft Robust Learning Vector QuantizationSoft Robust Learning Vector QuantizationSoft Robust Learning Vector Quantization [Seo amp Obermayer]

density-estimation based cost function

limiting case Learning from mistakes Learning from mistakes Learning from mistakes Learning from mistakes LVQ21-step only

if the example is currently misclassified

slow learning poor generalization

problem instability of the algorithm

due to repulsion of wrong prototypes

trivial classification fuumlr αrarrinfin

εg = max p+p-

The Dynamics of Learning Vector Quantization RUG 10012005

ldquo The winner takes it all rdquoldquo The winner takes it all rdquoldquo The winner takes it all rdquoldquo The winner takes it all rdquo

numericalintegrationfor wwwws(0)=0

theory and simulation (N=200)p+=02 ℓ=12 η=12averaged over 100 indep runs

Q++

Q--

Q+-

α

wwww++++

wwww----

ℓℓℓℓ BBBB++++

ℓℓℓℓ BBBB----

trajectories in the (B+B- )-plane

(bull) α=2040140

optimal decision boundary____ asymptotic position

RS+

RS-

R--

R-+

R--

R++

winner wwwws plusmn1

I) LVQ 1LVQ 1LVQ 1LVQ 1 [Kohonen] [ ] ( ) 1-micros

micromicromicroS

microS

1-micros

micros Sσdd

N

ηwwwwξξξξwwwwwwww minusminusΘ+= minus

only the winner is updated according to the class membership

wwww-

The Dynamics of Learning Vector Quantization RUG 10012005

learning curvelearning curvelearning curvelearning curve

α

εg η=12

(p+=02 ℓ=12)

εg (αrarrinfin) grows lin with η

- stationary state

- role of the learning rate

α100 200 300

εg

026

022

018

0140

η

20

04

02

ηrarr0 - variable rate η(α)

- wellwellwellwell----defined asymptoticsdefined asymptoticsdefined asymptoticsdefined asymptotics

(ODE linear in η)

10

εg

20 30 40 500014

026

022

018

min εg

(η α)

ηrarr0

η rarr0 αrarrinfin

( η α ) rarr infin

suboptimal

The Dynamics of Learning Vector Quantization RUG 10012005

ldquo The winner takes it all ldquoldquo The winner takes it all ldquoldquo The winner takes it all ldquoldquo The winner takes it all ldquo

II ) LVQ+LVQ+LVQ+LVQ+ ( only positive steps without repulsion)

[ ] ( ) ( ) 1-micros

microS

microσ

microS

microS

1-micros

micros δdd

N

ηwwwwξξξξwwwwwwww minusminusΘ+= minus

winner correct

αrarrinfin asymptotic configuration

symmetric about ℓℓℓℓ (B(B(B(B+++++B+B+B+B----)2)2)2)2

wwww-

wwww+

ℓ ℓ ℓ ℓ BBBB+

ℓ ℓ ℓ ℓ BBBB-

p+=02 ℓ=12 η=12

classification scheme and the

achieved generalization error are

independent of the independent of the independent of the independent of the prior weights prior weights prior weights prior weights ppppplusmnplusmnplusmnplusmn

(and optimal for ppppplusmnplusmnplusmnplusmn = 12 )

LVQ+ asymp VQ within the classes

(ws updated only from class S)

The Dynamics of Learning Vector Quantization RUG 10012005

- LVQ 21

trivial assignment to the

more frequent class

optimal classification

εg

pppp++++

min p+p-

- LVQ 1

here close to optimal

classification

pppp++++

- LVQ+

min-max solution

pplusmn -independent classification

p+=02 ℓ=10 η=10εg

α

learning curveslearning curveslearning curveslearning curves

LVQ+

LVQ1

asymptotics ηrarr0 (ηα)rarrinfin

The Dynamics of Learning Vector Quantization RUG 10012005

Vector QuantizationVector QuantizationVector QuantizationVector Quantization

competitive learning [ ] ( ) 1-micros

micromicroS

microS

1-micros

micros dd

N

ηwwwwξξξξwwwwwwww minusminusΘ+= minus

wwwws winner

class membership is unknown

or identical for all data

numerical integration for wwwws(0)asymp0

( p+=02 ℓ=10 η=12 )

εg

α

VQ

LVQ+

LVQ1

αα

R++

R+-

R-+

R--

100 200 3000

0

10

system is invariant under

exchange of the prototypes

rarr weakly repulsive fixed points

The Dynamics of Learning Vector Quantization RUG 10012005

interpretations

- VQ unsupervised learningunlabelled data

- LVQ two prototypes of thesame class identical labels

- LVQ different classes butlabels are not used in training

εg

pppp++++

asymptotics (αrarrηrarr0 ηαrarrinfin)

pppp++++asymp0 asymp0 asymp0 asymp0

pppp----asymp1 asymp1 asymp1 asymp1

- low quantization error

- high gen error εg

The Dynamics of Learning Vector Quantization RUG 10012005

work in progress outlookwork in progress outlookwork in progress outlookwork in progress outlook

bull regularization of LVQ 21 Robust Soft LVQ [Seo Obermayer]

bull model different cluster variances more clustersprototypes

bull optimized procedures learning rate schedules

variational approach density estimation Bayes optimal on-line

bull several classes and prototypes

Summary

bullprototypeprototypeprototypeprototype----based learningbased learningbased learningbased learning

Vector Quantization and Learning Vector Quantization

bulla model scenarioa model scenarioa model scenarioa model scenario two clusters two prototypes

dynamics of online training

bullcomparison of algorithmscomparison of algorithmscomparison of algorithmscomparison of algorithms

LVQ 21 instability trivial (stationary) classification

LVQ 1 close to optimal asymptotic generalization

LVQ + min-max solution wrt asymptotic generalization

VQ symmetry breaking representation

The Dynamics of Learning Vector Quantization RUG 10012005

Perspectives

bullSelfSelfSelfSelf----Organizing Maps (SOM)Organizing Maps (SOM)Organizing Maps (SOM)Organizing Maps (SOM)

(many) N-dim prototypes form a (low) d-dimensional grid

representation of data in a topology preserving map

neighborhood preserving SOM Neural Gas (distance based)

bullGeneralized Relevance LVQ Generalized Relevance LVQ Generalized Relevance LVQ Generalized Relevance LVQ [Hammer amp Villmann]

adaptive metrics eg distance measure ( )sum=

minus=N

i

iii w

1

2)( sλ ξξwd λ

training

bullapplications applications applications applications

Page 3: The Dynamics of Learning Vector Quantizationlibvolume3.xyz/computers/btech/semester6/datacompression/vectorquantization...The Dynamics of Learning Vector Quantization, RUG, 10.01.2005

The Dynamics of Learning Vector Quantization RUG 10012005

Vector Quantization (VQ) Vector Quantization (VQ) Vector Quantization (VQ) Vector Quantization (VQ)

aim

representation of large amounts

of data by (few) prototype vectorsprototype vectorsprototype vectorsprototype vectors

example

identification and grouping

in clusters clusters clusters clusters of similar data

assignment of feature vector ξξξξto the closest closest closest closest prototypeprototypeprototypeprototype wwww

(similarity or distance measure

eg Euclidean distance )

The Dynamics of Learning Vector Quantization RUG 10012005

unsupervised competitive learningunsupervised competitive learningunsupervised competitive learningunsupervised competitive learning

bull initialize K prototype vectors

bull present a single example

bull identify the closest prototypeie the so-called winner winner winner winner

bull move the winner even closer towards the example

intuitively clear plausible procedure

- places prototypes in areas with high density of data

- identifies the most relevant combinations of features

- (stochastic) onononon----line gradient descent line gradient descent line gradient descent line gradient descent with respect to

the cost function cost function cost function cost function

The Dynamics of Learning Vector Quantization RUG 10012005

quantization errorquantization errorquantization errorquantization error

( ) ( )microj

microk

K

jk

P

1micro

jmicro

K

1j

VQ ddΘ2

wξH minusminus= prodsumsumne==

microjdprototypes data wj is the winner

here

Euclidean distance

aim faithful representation (in general ne clustering )

Result depends on - the number of prototype vectors

- the distance measure metric used

The Dynamics of Learning Vector Quantization RUG 10012005

Learning Vector Quantization (LVQ)Learning Vector Quantization (LVQ)Learning Vector Quantization (LVQ)Learning Vector Quantization (LVQ)

aim

classification classification classification classification of data

learning from examples

LearningLearningLearningLearning choice of prototypes according to example data

example situtation

3 classes3 classes3 classes3 classes

classification

assignment of a vector ξξξξto the class of the closest

prototype w w w w

3 prototypes 3 prototypes 3 prototypes 3 prototypes

aim generalization abilitygeneralization abilitygeneralization abilitygeneralization ability ie correct classification

of novel data after training

The Dynamics of Learning Vector Quantization RUG 10012005

prominent example [Kohonen] ldquo LVQ 21 rdquoldquo LVQ 21 rdquoldquo LVQ 21 rdquoldquo LVQ 21 rdquo

bull present a single example

bull initialize prototype vectors(for different classes)

bull identify the closest correctand the closest wrong prototype

bull move the corresponding winnertowards away from the example

known convergence stability problems

eg for infrequent classes

mostly heuristicallyheuristicallyheuristicallyheuristically motivated variations of competitive learningcompetitive learningcompetitive learningcompetitive learning

The Dynamics of Learning Vector Quantization RUG 10012005

LVQ algorithms LVQ algorithms LVQ algorithms LVQ algorithms

- are frequently applied in a variety of problems involving

the classification of structured data a few examples

- appear plausible intuitive flexible

- are fast easy to implement

- real time speech recognition

- medical diagnosis eg from histological data

- texture recognition and classification

- gene expression data analysis

-

The Dynamics of Learning Vector Quantization RUG 10012005

illustrationillustrationillustrationillustration microscopic images of (pig) semen cells after freezingand storage co Lidia Sanchez-Gonzalez LeonSpain

The Dynamics of Learning Vector Quantization RUG 10012005

healthy cells damaged cells

prototypes obtained by LVQ (1)

illustrationillustrationillustrationillustration microscopic images of (pig) semen cells after freezingand storage co Lidia Sanchez-Gonzalez LeonSpain

The Dynamics of Learning Vector Quantization RUG 10012005

LVQ algorithms LVQ algorithms LVQ algorithms LVQ algorithms

- are often based on purely heuristic arguments

or derived from a cost function with unclear

relation to the generalization ability

- almost exclusively use the Euclidean distance measure

inappropriate for heterogeneous data

- lack in general a thorough theoretical understanding of

dynamics convergence properties

performance wrt generalization etc

The Dynamics of Learning Vector Quantization RUG 10012005

In the following

analysis of LVQ algorithms wrt

- dynamics of the learning process

- performance ie generalization ability

- asymptotic behavior in the limit of many examples

typical behavior in a model situation

- randomized high-dimensional data

- essential features of LVQ learning

aim - contribute to the theoretical understanding- develop efficient LVQ schemes- test in applications

The Dynamics of Learning Vector Quantization RUG 10012005

model situationmodel situationmodel situationmodel situation two clusters of N-dimensional data

random vectors ξξξξ isin ℝN according to σ)P(p )P(

1σσ ξξξξξξξξ sum

plusmn=

=

( )( )

minus=

2

σN2-

2

1exp

1σ)P( ΒΒΒΒξξξξξξξξ lmixture of two Gaussians

orthonormal center vectors

BBBB+ BBBB- isin ℝN ( BBBBσ )2 =1 BBBB+ BBBB- =0

prior weights of classes p+ p-p+ + p- = 1

BBBB+

BBBB-

(p+)

(p-)

separation ℓℓ

jj Bσσξ l=

22222l Nξ1ξξ

N

1σσ

+==rarr=minus sum=j

jjj ξξξξ

independent components

The Dynamics of Learning Vector Quantization RUG 10012005

high-dimensional data (formally Nrarrinfin)

400 examples ξξξξmicro isinℝN N=200 ℓ=1 p+=06micro

By

ξξ ξξ sdot=

minusminus

(240)

(160)

projections into the plane of center vectors B+ B-

microBy ξξξξsdot= ++

micro2

2xξξ ξξ

ww wwsdot

=

(240)(160)

projections in two independent random directions wwww12

micro11x ξξξξwwww sdot=

model for studying typical behavior of LVQ algorithmsnot density-estimation based classification

NoteNoteNoteNote

The Dynamics of Learning Vector Quantization RUG 10012005

dynamics of ondynamics of ondynamics of ondynamics of on----line trainingline trainingline trainingline training

sequence of independent random data ( )123micromicro =ξξξξ acc to ( )microP ξξξξ

learning ratestep size

competitiondirection ofupdate etc

change of prototypetowards or away from the current data

above examples

unsupervisedunsupervisedunsupervisedunsupervised Vector QuantizationVector QuantizationVector QuantizationVector Quantization [ ] ( ) dd fmicros

micross minusΘ= minus

The Winner Takes It All (classes irrelevantunknown)

Learning Vector Quantization ldquo21rdquo Learning Vector Quantization ldquo21rdquo Learning Vector Quantization ldquo21rdquo Learning Vector Quantization ldquo21rdquo [ ] σS fs)(1)(1

classcorrectclasswrong

+minus=sdot=

here two prototypes noexplicit competition

[ ] ( ) 1-micros

micromicros-

micross

1-micros

micros σSddf

N

ηwwwwξξξξwwwwwwww minus+=

( )21minusminus=

plusmn=

micros

micromicrosd

1σS

wwwwξξξξ

update of prototype vectors

The Dynamics of Learning Vector Quantization RUG 10012005

[ ] ( )

[ ] ( ) [ ] ( ) [ ] [ ] ( )Ν

1Οffη QxfηQxfη1N

QQ

Ryfη1N

RR

ts1-micro

stmicrost

1-microst

microts

1-microst

microst

1-microsσ

microσs

1-microsσ

microsσ

++minus+minus=minus

minus=minus

2

[ ] ( ) 1-micros

micromicros-

micross

1-micros

micros σSddf

N

ηwwwwξξξξwwwwwwww minus+= rarrrarrrarrrarr recursionsrecursionsrecursionsrecursions

mathematical analysis of the learning dynamicsmathematical analysis of the learning dynamicsmathematical analysis of the learning dynamicsmathematical analysis of the learning dynamics

( ) ( ) 1221 -micross

micros

micromicros

micromicros Q2xd +minus=minus= minus ξξξξwwwwξξξξ

micromicromicro1-micros

micros ξByx sdot=sdot= ττξξξξwwwwprojections

distances

random vector ξmicro enters only in the form of

( )11 +minusisinsdot=sdot= σtsmicrot

micros

microstσ

micros

microsσ QBR wwwwwwwwwwww

projections in the (B+ B- )-plane

length and relativeposition of prototypes

1 description in terms of a few characteristic quantitities1 description in terms of a few characteristic quantitities1 description in terms of a few characteristic quantitities1 description in terms of a few characteristic quantitities

( here ℝ2N rarr ℝ7 )

The Dynamics of Learning Vector Quantization RUG 10012005

N

1jjσjsσ

N

1jjsσs R x ll === sumsum

==

Bww jξ

completely specified in terms of first and second moments (wo indices micro)

in the thermodynamic limit thermodynamic limit thermodynamic limit thermodynamic limit N rarrrarrrarrrarr infininfininfininfin

random vector acc to σ)|P( micro rarrξξξξmicromicro

micro1-micros

micros

By

wx

ξξξξ

ξξξξ

sdot=

sdot=

ττ

correlated Gaussianrandom quantities

stσtσsσt s Q xx- xx = τττ sσσsσ s R yx- yx =

ρττρτρ δ yy- yyσσσ

===

=

else

σ ifsσσ

y0

Sl

l δτ

2 average over the current example2 average over the current example2 average over the current example2 average over the current example

rarrrarrrarrrarr averaged recursionsaveraged recursionsaveraged recursionsaveraged recursions closed in Rsσ Qst pσ

1σσ LL sum

plusmn=

=

The Dynamics of Learning Vector Quantization RUG 10012005

characteristic quantities

- depend on the random sequence of example data

- their variance vanishes with N rarrrarrrarrrarr infininfininfininfin (here prop N-1)

microsσ

microst

R Q

learning dynamics is completely described in terms of averagesaveragesaveragesaverages

3 self3 self3 self3 self----averaging propertiesaveraging propertiesaveraging propertiesaveraging properties

4 continuous learning time4 continuous learning time4 continuous learning time4 continuous learning time

N

micro α =

of examples

of learning stepsper degree of freedom

) α (R ) α (Q sσst

recursions rarr coupled ordinary differential equations

rarr evolution of projections

The Dynamics of Learning Vector Quantization RUG 10012005

probability for misclassification of a novel example

( ) ( ) minusminusΘ++minusΘ= +minusminusminus++ ddpddp gεεεε

( ) ( )

Φminus+

Φ=

minusminus+

minus+minus

++

minus+minus

minusminusminus

++minus

minusminus

minusminus+

minus+minus

++

+minusminus

++minus

minusminusminus

+++

QQQ

RR2QQ

QQQ

RR2QQpp

22 2

1

2

1 ll

L

5 learning curve5 learning curve5 learning curve5 learning curve

generalization error generalization error generalization error generalization error εεεεgggg((((αααα)))) after training with α N examples

- repulsiveattractive fixed points of the dynamics

- asymptotic behavior for αrarrinfin- dependence on learning rate separation initialization

-

investigation and comparison of given algorithmsinvestigation and comparison of given algorithmsinvestigation and comparison of given algorithmsinvestigation and comparison of given algorithms

- time-dependent learning rate η(α)

- variational optimization wrt fs[]

-

optimization and development of new prescriptionsoptimization and development of new prescriptionsoptimization and development of new prescriptionsoptimization and development of new prescriptions

maximizeα

g

d

d εεεε

The Dynamics of Learning Vector Quantization RUG 10012005

optimal classificationoptimal classificationoptimal classificationoptimal classification with minimal generalization error

BBBB-

BBBB+

(p-gtp+ )

(p+)

separation of classes by the plane with 1)σP(p 1)σP(p +==minus= +minus ξξξξξξξξin the model situation (equal variances of clusters)

excess error

minimal εg as a function

of prior weights ℓ=2

εg

025

050

005 100 p+

ℓ=1

ℓ=0

The Dynamics of Learning Vector Quantization RUG 10012005

ldquoLVQ 21ldquo update the correct and wrong winner

( ) 1-micros

micro1-micros

micros Sσ

N

ηwwwwξξξξwwwwwwww minus+=

(analytical)integrationfor wwwws(0) = 0

( ) ( )

( ) ( ) KKll

Kll

αmηαmη

αmηαmη

e12

m1

mRe1

2

m1

mR

Qe12

m1

mRe1

2

m1

mR

++minusminusminus

++minus

minus+minus

++

minus+

=+minusminus

minus=

=minusminus

minus=minus+

=

pσ = (1+m σ ) 2 (mgt0)

[Seo Obermeyer] LVQ21 ս cost function

(likelihood ratios)

infinrarrinfinrarrminus+minusminus+minusminusminus

++minus+++

αQQRR

Q R R

with

finite remain

Q ++ R ++ R minus+

R +minus Q minus+

Q minusminus R minusminus

α 102 4 86

6-

0

6theory and simulation (N=100)p+=08 ℓ=1 η=05averages over 100 independent runs

The Dynamics of Learning Vector Quantization RUG 10012005

(p- )

(p+gt p-)

sssstrategiestrategiestrategiestrategies

- selection of dataselection of dataselection of dataselection of data in a window close to the current decision boundary

slows down the repulsion system remains instable

- Soft Robust Learning Vector QuantizationSoft Robust Learning Vector QuantizationSoft Robust Learning Vector QuantizationSoft Robust Learning Vector Quantization [Seo amp Obermayer]

density-estimation based cost function

limiting case Learning from mistakes Learning from mistakes Learning from mistakes Learning from mistakes LVQ21-step only

if the example is currently misclassified

slow learning poor generalization

problem instability of the algorithm

due to repulsion of wrong prototypes

trivial classification fuumlr αrarrinfin

εg = max p+p-

The Dynamics of Learning Vector Quantization RUG 10012005

ldquo The winner takes it all rdquoldquo The winner takes it all rdquoldquo The winner takes it all rdquoldquo The winner takes it all rdquo

numericalintegrationfor wwwws(0)=0

theory and simulation (N=200)p+=02 ℓ=12 η=12averaged over 100 indep runs

Q++

Q--

Q+-

α

wwww++++

wwww----

ℓℓℓℓ BBBB++++

ℓℓℓℓ BBBB----

trajectories in the (B+B- )-plane

(bull) α=2040140

optimal decision boundary____ asymptotic position

RS+

RS-

R--

R-+

R--

R++

winner wwwws plusmn1

I) LVQ 1LVQ 1LVQ 1LVQ 1 [Kohonen] [ ] ( ) 1-micros

micromicromicroS

microS

1-micros

micros Sσdd

N

ηwwwwξξξξwwwwwwww minusminusΘ+= minus

only the winner is updated according to the class membership

wwww-

The Dynamics of Learning Vector Quantization RUG 10012005

learning curvelearning curvelearning curvelearning curve

α

εg η=12

(p+=02 ℓ=12)

εg (αrarrinfin) grows lin with η

- stationary state

- role of the learning rate

α100 200 300

εg

026

022

018

0140

η

20

04

02

ηrarr0 - variable rate η(α)

- wellwellwellwell----defined asymptoticsdefined asymptoticsdefined asymptoticsdefined asymptotics

(ODE linear in η)

10

εg

20 30 40 500014

026

022

018

min εg

(η α)

ηrarr0

η rarr0 αrarrinfin

( η α ) rarr infin

suboptimal

The Dynamics of Learning Vector Quantization RUG 10012005

ldquo The winner takes it all ldquoldquo The winner takes it all ldquoldquo The winner takes it all ldquoldquo The winner takes it all ldquo

II ) LVQ+LVQ+LVQ+LVQ+ ( only positive steps without repulsion)

[ ] ( ) ( ) 1-micros

microS

microσ

microS

microS

1-micros

micros δdd

N

ηwwwwξξξξwwwwwwww minusminusΘ+= minus

winner correct

αrarrinfin asymptotic configuration

symmetric about ℓℓℓℓ (B(B(B(B+++++B+B+B+B----)2)2)2)2

wwww-

wwww+

ℓ ℓ ℓ ℓ BBBB+

ℓ ℓ ℓ ℓ BBBB-

p+=02 ℓ=12 η=12

classification scheme and the

achieved generalization error are

independent of the independent of the independent of the independent of the prior weights prior weights prior weights prior weights ppppplusmnplusmnplusmnplusmn

(and optimal for ppppplusmnplusmnplusmnplusmn = 12 )

LVQ+ asymp VQ within the classes

(ws updated only from class S)

The Dynamics of Learning Vector Quantization RUG 10012005

- LVQ 21

trivial assignment to the

more frequent class

optimal classification

εg

pppp++++

min p+p-

- LVQ 1

here close to optimal

classification

pppp++++

- LVQ+

min-max solution

pplusmn -independent classification

p+=02 ℓ=10 η=10εg

α

learning curveslearning curveslearning curveslearning curves

LVQ+

LVQ1

asymptotics ηrarr0 (ηα)rarrinfin

The Dynamics of Learning Vector Quantization RUG 10012005

Vector QuantizationVector QuantizationVector QuantizationVector Quantization

competitive learning [ ] ( ) 1-micros

micromicroS

microS

1-micros

micros dd

N

ηwwwwξξξξwwwwwwww minusminusΘ+= minus

wwwws winner

class membership is unknown

or identical for all data

numerical integration for wwwws(0)asymp0

( p+=02 ℓ=10 η=12 )

εg

α

VQ

LVQ+

LVQ1

αα

R++

R+-

R-+

R--

100 200 3000

0

10

system is invariant under

exchange of the prototypes

rarr weakly repulsive fixed points

The Dynamics of Learning Vector Quantization RUG 10012005

interpretations

- VQ unsupervised learningunlabelled data

- LVQ two prototypes of thesame class identical labels

- LVQ different classes butlabels are not used in training

εg

pppp++++

asymptotics (αrarrηrarr0 ηαrarrinfin)

pppp++++asymp0 asymp0 asymp0 asymp0

pppp----asymp1 asymp1 asymp1 asymp1

- low quantization error

- high gen error εg

The Dynamics of Learning Vector Quantization RUG 10012005

work in progress outlookwork in progress outlookwork in progress outlookwork in progress outlook

bull regularization of LVQ 21 Robust Soft LVQ [Seo Obermayer]

bull model different cluster variances more clustersprototypes

bull optimized procedures learning rate schedules

variational approach density estimation Bayes optimal on-line

bull several classes and prototypes

Summary

bullprototypeprototypeprototypeprototype----based learningbased learningbased learningbased learning

Vector Quantization and Learning Vector Quantization

bulla model scenarioa model scenarioa model scenarioa model scenario two clusters two prototypes

dynamics of online training

bullcomparison of algorithmscomparison of algorithmscomparison of algorithmscomparison of algorithms

LVQ 21 instability trivial (stationary) classification

LVQ 1 close to optimal asymptotic generalization

LVQ + min-max solution wrt asymptotic generalization

VQ symmetry breaking representation

The Dynamics of Learning Vector Quantization RUG 10012005

Perspectives

bullSelfSelfSelfSelf----Organizing Maps (SOM)Organizing Maps (SOM)Organizing Maps (SOM)Organizing Maps (SOM)

(many) N-dim prototypes form a (low) d-dimensional grid

representation of data in a topology preserving map

neighborhood preserving SOM Neural Gas (distance based)

bullGeneralized Relevance LVQ Generalized Relevance LVQ Generalized Relevance LVQ Generalized Relevance LVQ [Hammer amp Villmann]

adaptive metrics eg distance measure ( )sum=

minus=N

i

iii w

1

2)( sλ ξξwd λ

training

bullapplications applications applications applications

Page 4: The Dynamics of Learning Vector Quantizationlibvolume3.xyz/computers/btech/semester6/datacompression/vectorquantization...The Dynamics of Learning Vector Quantization, RUG, 10.01.2005

The Dynamics of Learning Vector Quantization RUG 10012005

unsupervised competitive learningunsupervised competitive learningunsupervised competitive learningunsupervised competitive learning

bull initialize K prototype vectors

bull present a single example

bull identify the closest prototypeie the so-called winner winner winner winner

bull move the winner even closer towards the example

intuitively clear plausible procedure

- places prototypes in areas with high density of data

- identifies the most relevant combinations of features

- (stochastic) onononon----line gradient descent line gradient descent line gradient descent line gradient descent with respect to

the cost function cost function cost function cost function

The Dynamics of Learning Vector Quantization RUG 10012005

quantization errorquantization errorquantization errorquantization error

( ) ( )microj

microk

K

jk

P

1micro

jmicro

K

1j

VQ ddΘ2

wξH minusminus= prodsumsumne==

microjdprototypes data wj is the winner

here

Euclidean distance

aim faithful representation (in general ne clustering )

Result depends on - the number of prototype vectors

- the distance measure metric used

The Dynamics of Learning Vector Quantization RUG 10012005

Learning Vector Quantization (LVQ)Learning Vector Quantization (LVQ)Learning Vector Quantization (LVQ)Learning Vector Quantization (LVQ)

aim

classification classification classification classification of data

learning from examples

LearningLearningLearningLearning choice of prototypes according to example data

example situtation

3 classes3 classes3 classes3 classes

classification

assignment of a vector ξξξξto the class of the closest

prototype w w w w

3 prototypes 3 prototypes 3 prototypes 3 prototypes

aim generalization abilitygeneralization abilitygeneralization abilitygeneralization ability ie correct classification

of novel data after training

The Dynamics of Learning Vector Quantization RUG 10012005

prominent example [Kohonen] ldquo LVQ 21 rdquoldquo LVQ 21 rdquoldquo LVQ 21 rdquoldquo LVQ 21 rdquo

bull present a single example

bull initialize prototype vectors(for different classes)

bull identify the closest correctand the closest wrong prototype

bull move the corresponding winnertowards away from the example

known convergence stability problems

eg for infrequent classes

mostly heuristicallyheuristicallyheuristicallyheuristically motivated variations of competitive learningcompetitive learningcompetitive learningcompetitive learning

The Dynamics of Learning Vector Quantization RUG 10012005

LVQ algorithms LVQ algorithms LVQ algorithms LVQ algorithms

- are frequently applied in a variety of problems involving

the classification of structured data a few examples

- appear plausible intuitive flexible

- are fast easy to implement

- real time speech recognition

- medical diagnosis eg from histological data

- texture recognition and classification

- gene expression data analysis

-

The Dynamics of Learning Vector Quantization RUG 10012005

illustrationillustrationillustrationillustration microscopic images of (pig) semen cells after freezingand storage co Lidia Sanchez-Gonzalez LeonSpain

The Dynamics of Learning Vector Quantization RUG 10012005

healthy cells damaged cells

prototypes obtained by LVQ (1)

illustrationillustrationillustrationillustration microscopic images of (pig) semen cells after freezingand storage co Lidia Sanchez-Gonzalez LeonSpain

The Dynamics of Learning Vector Quantization RUG 10012005

LVQ algorithms LVQ algorithms LVQ algorithms LVQ algorithms

- are often based on purely heuristic arguments

or derived from a cost function with unclear

relation to the generalization ability

- almost exclusively use the Euclidean distance measure

inappropriate for heterogeneous data

- lack in general a thorough theoretical understanding of

dynamics convergence properties

performance wrt generalization etc

The Dynamics of Learning Vector Quantization RUG 10012005

In the following

analysis of LVQ algorithms wrt

- dynamics of the learning process

- performance ie generalization ability

- asymptotic behavior in the limit of many examples

typical behavior in a model situation

- randomized high-dimensional data

- essential features of LVQ learning

aim - contribute to the theoretical understanding- develop efficient LVQ schemes- test in applications

The Dynamics of Learning Vector Quantization RUG 10012005

model situationmodel situationmodel situationmodel situation two clusters of N-dimensional data

random vectors ξξξξ isin ℝN according to σ)P(p )P(

1σσ ξξξξξξξξ sum

plusmn=

=

( )( )

minus=

2

σN2-

2

1exp

1σ)P( ΒΒΒΒξξξξξξξξ lmixture of two Gaussians

orthonormal center vectors

BBBB+ BBBB- isin ℝN ( BBBBσ )2 =1 BBBB+ BBBB- =0

prior weights of classes p+ p-p+ + p- = 1

BBBB+

BBBB-

(p+)

(p-)

separation ℓℓ

jj Bσσξ l=

22222l Nξ1ξξ

N

1σσ

+==rarr=minus sum=j

jjj ξξξξ

independent components

The Dynamics of Learning Vector Quantization RUG 10012005

high-dimensional data (formally Nrarrinfin)

400 examples ξξξξmicro isinℝN N=200 ℓ=1 p+=06micro

By

ξξ ξξ sdot=

minusminus

(240)

(160)

projections into the plane of center vectors B+ B-

microBy ξξξξsdot= ++

micro2

2xξξ ξξ

ww wwsdot

=

(240)(160)

projections in two independent random directions wwww12

micro11x ξξξξwwww sdot=

model for studying typical behavior of LVQ algorithmsnot density-estimation based classification

NoteNoteNoteNote

The Dynamics of Learning Vector Quantization RUG 10012005

dynamics of ondynamics of ondynamics of ondynamics of on----line trainingline trainingline trainingline training

sequence of independent random data ( )123micromicro =ξξξξ acc to ( )microP ξξξξ

learning ratestep size

competitiondirection ofupdate etc

change of prototypetowards or away from the current data

above examples

unsupervisedunsupervisedunsupervisedunsupervised Vector QuantizationVector QuantizationVector QuantizationVector Quantization [ ] ( ) dd fmicros

micross minusΘ= minus

The Winner Takes It All (classes irrelevantunknown)

Learning Vector Quantization ldquo21rdquo Learning Vector Quantization ldquo21rdquo Learning Vector Quantization ldquo21rdquo Learning Vector Quantization ldquo21rdquo [ ] σS fs)(1)(1

classcorrectclasswrong

+minus=sdot=

here two prototypes noexplicit competition

[ ] ( ) 1-micros

micromicros-

micross

1-micros

micros σSddf

N

ηwwwwξξξξwwwwwwww minus+=

( )21minusminus=

plusmn=

micros

micromicrosd

1σS

wwwwξξξξ

update of prototype vectors

The Dynamics of Learning Vector Quantization RUG 10012005

[ ] ( )

[ ] ( ) [ ] ( ) [ ] [ ] ( )Ν

1Οffη QxfηQxfη1N

QQ

Ryfη1N

RR

ts1-micro

stmicrost

1-microst

microts

1-microst

microst

1-microsσ

microσs

1-microsσ

microsσ

++minus+minus=minus

minus=minus

2

[ ] ( ) 1-micros

micromicros-

micross

1-micros

micros σSddf

N

ηwwwwξξξξwwwwwwww minus+= rarrrarrrarrrarr recursionsrecursionsrecursionsrecursions

mathematical analysis of the learning dynamicsmathematical analysis of the learning dynamicsmathematical analysis of the learning dynamicsmathematical analysis of the learning dynamics

( ) ( ) 1221 -micross

micros

micromicros

micromicros Q2xd +minus=minus= minus ξξξξwwwwξξξξ

micromicromicro1-micros

micros ξByx sdot=sdot= ττξξξξwwwwprojections

distances

random vector ξmicro enters only in the form of

( )11 +minusisinsdot=sdot= σtsmicrot

micros

microstσ

micros

microsσ QBR wwwwwwwwwwww

projections in the (B+ B- )-plane

length and relativeposition of prototypes

1 description in terms of a few characteristic quantitities1 description in terms of a few characteristic quantitities1 description in terms of a few characteristic quantitities1 description in terms of a few characteristic quantitities

( here ℝ2N rarr ℝ7 )

The Dynamics of Learning Vector Quantization RUG 10012005

N

1jjσjsσ

N

1jjsσs R x ll === sumsum

==

Bww jξ

completely specified in terms of first and second moments (wo indices micro)

in the thermodynamic limit thermodynamic limit thermodynamic limit thermodynamic limit N rarrrarrrarrrarr infininfininfininfin

random vector acc to σ)|P( micro rarrξξξξmicromicro

micro1-micros

micros

By

wx

ξξξξ

ξξξξ

sdot=

sdot=

ττ

correlated Gaussianrandom quantities

stσtσsσt s Q xx- xx = τττ sσσsσ s R yx- yx =

ρττρτρ δ yy- yyσσσ

===

=

else

σ ifsσσ

y0

Sl

l δτ

2 average over the current example2 average over the current example2 average over the current example2 average over the current example

rarrrarrrarrrarr averaged recursionsaveraged recursionsaveraged recursionsaveraged recursions closed in Rsσ Qst pσ

1σσ LL sum

plusmn=

=

The Dynamics of Learning Vector Quantization RUG 10012005

characteristic quantities

- depend on the random sequence of example data

- their variance vanishes with N rarrrarrrarrrarr infininfininfininfin (here prop N-1)

microsσ

microst

R Q

learning dynamics is completely described in terms of averagesaveragesaveragesaverages

3 self3 self3 self3 self----averaging propertiesaveraging propertiesaveraging propertiesaveraging properties

4 continuous learning time4 continuous learning time4 continuous learning time4 continuous learning time

N

micro α =

of examples

of learning stepsper degree of freedom

) α (R ) α (Q sσst

recursions rarr coupled ordinary differential equations

rarr evolution of projections

The Dynamics of Learning Vector Quantization RUG 10012005

probability for misclassification of a novel example

( ) ( ) minusminusΘ++minusΘ= +minusminusminus++ ddpddp gεεεε

( ) ( )

Φminus+

Φ=

minusminus+

minus+minus

++

minus+minus

minusminusminus

++minus

minusminus

minusminus+

minus+minus

++

+minusminus

++minus

minusminusminus

+++

QQQ

RR2QQ

QQQ

RR2QQpp

22 2

1

2

1 ll

L

5 learning curve5 learning curve5 learning curve5 learning curve

generalization error generalization error generalization error generalization error εεεεgggg((((αααα)))) after training with α N examples

- repulsiveattractive fixed points of the dynamics

- asymptotic behavior for αrarrinfin- dependence on learning rate separation initialization

-

investigation and comparison of given algorithmsinvestigation and comparison of given algorithmsinvestigation and comparison of given algorithmsinvestigation and comparison of given algorithms

- time-dependent learning rate η(α)

- variational optimization wrt fs[]

-

optimization and development of new prescriptionsoptimization and development of new prescriptionsoptimization and development of new prescriptionsoptimization and development of new prescriptions

maximizeα

g

d

d εεεε

The Dynamics of Learning Vector Quantization RUG 10012005

optimal classificationoptimal classificationoptimal classificationoptimal classification with minimal generalization error

BBBB-

BBBB+

(p-gtp+ )

(p+)

separation of classes by the plane with 1)σP(p 1)σP(p +==minus= +minus ξξξξξξξξin the model situation (equal variances of clusters)

excess error

minimal εg as a function

of prior weights ℓ=2

εg

025

050

005 100 p+

ℓ=1

ℓ=0

The Dynamics of Learning Vector Quantization RUG 10012005

ldquoLVQ 21ldquo update the correct and wrong winner

( ) 1-micros

micro1-micros

micros Sσ

N

ηwwwwξξξξwwwwwwww minus+=

(analytical)integrationfor wwwws(0) = 0

( ) ( )

( ) ( ) KKll

Kll

αmηαmη

αmηαmη

e12

m1

mRe1

2

m1

mR

Qe12

m1

mRe1

2

m1

mR

++minusminusminus

++minus

minus+minus

++

minus+

=+minusminus

minus=

=minusminus

minus=minus+

=

pσ = (1+m σ ) 2 (mgt0)

[Seo Obermeyer] LVQ21 ս cost function

(likelihood ratios)

infinrarrinfinrarrminus+minusminus+minusminusminus

++minus+++

αQQRR

Q R R

with

finite remain

Q ++ R ++ R minus+

R +minus Q minus+

Q minusminus R minusminus

α 102 4 86

6-

0

6theory and simulation (N=100)p+=08 ℓ=1 η=05averages over 100 independent runs

The Dynamics of Learning Vector Quantization RUG 10012005

(p- )

(p+gt p-)

sssstrategiestrategiestrategiestrategies

- selection of dataselection of dataselection of dataselection of data in a window close to the current decision boundary

slows down the repulsion system remains instable

- Soft Robust Learning Vector QuantizationSoft Robust Learning Vector QuantizationSoft Robust Learning Vector QuantizationSoft Robust Learning Vector Quantization [Seo amp Obermayer]

density-estimation based cost function

limiting case Learning from mistakes Learning from mistakes Learning from mistakes Learning from mistakes LVQ21-step only

if the example is currently misclassified

slow learning poor generalization

problem instability of the algorithm

due to repulsion of wrong prototypes

trivial classification fuumlr αrarrinfin

εg = max p+p-

The Dynamics of Learning Vector Quantization RUG 10012005

ldquo The winner takes it all rdquoldquo The winner takes it all rdquoldquo The winner takes it all rdquoldquo The winner takes it all rdquo

numericalintegrationfor wwwws(0)=0

theory and simulation (N=200)p+=02 ℓ=12 η=12averaged over 100 indep runs

Q++

Q--

Q+-

α

wwww++++

wwww----

ℓℓℓℓ BBBB++++

ℓℓℓℓ BBBB----

trajectories in the (B+B- )-plane

(bull) α=2040140

optimal decision boundary____ asymptotic position

RS+

RS-

R--

R-+

R--

R++

winner wwwws plusmn1

I) LVQ 1LVQ 1LVQ 1LVQ 1 [Kohonen] [ ] ( ) 1-micros

micromicromicroS

microS

1-micros

micros Sσdd

N

ηwwwwξξξξwwwwwwww minusminusΘ+= minus

only the winner is updated according to the class membership

wwww-

The Dynamics of Learning Vector Quantization RUG 10012005

learning curvelearning curvelearning curvelearning curve

α

εg η=12

(p+=02 ℓ=12)

εg (αrarrinfin) grows lin with η

- stationary state

- role of the learning rate

α100 200 300

εg

026

022

018

0140

η

20

04

02

ηrarr0 - variable rate η(α)

- wellwellwellwell----defined asymptoticsdefined asymptoticsdefined asymptoticsdefined asymptotics

(ODE linear in η)

10

εg

20 30 40 500014

026

022

018

min εg

(η α)

ηrarr0

η rarr0 αrarrinfin

( η α ) rarr infin

suboptimal

The Dynamics of Learning Vector Quantization RUG 10012005

ldquo The winner takes it all ldquoldquo The winner takes it all ldquoldquo The winner takes it all ldquoldquo The winner takes it all ldquo

II ) LVQ+LVQ+LVQ+LVQ+ ( only positive steps without repulsion)

[ ] ( ) ( ) 1-micros

microS

microσ

microS

microS

1-micros

micros δdd

N

ηwwwwξξξξwwwwwwww minusminusΘ+= minus

winner correct

αrarrinfin asymptotic configuration

symmetric about ℓℓℓℓ (B(B(B(B+++++B+B+B+B----)2)2)2)2

wwww-

wwww+

ℓ ℓ ℓ ℓ BBBB+

ℓ ℓ ℓ ℓ BBBB-

p+=02 ℓ=12 η=12

classification scheme and the

achieved generalization error are

independent of the independent of the independent of the independent of the prior weights prior weights prior weights prior weights ppppplusmnplusmnplusmnplusmn

(and optimal for ppppplusmnplusmnplusmnplusmn = 12 )

LVQ+ asymp VQ within the classes

(ws updated only from class S)

The Dynamics of Learning Vector Quantization RUG 10012005

- LVQ 21

trivial assignment to the

more frequent class

optimal classification

εg

pppp++++

min p+p-

- LVQ 1

here close to optimal

classification

pppp++++

- LVQ+

min-max solution

pplusmn -independent classification

p+=02 ℓ=10 η=10εg

α

learning curveslearning curveslearning curveslearning curves

LVQ+

LVQ1

asymptotics ηrarr0 (ηα)rarrinfin

The Dynamics of Learning Vector Quantization RUG 10012005

Vector QuantizationVector QuantizationVector QuantizationVector Quantization

competitive learning [ ] ( ) 1-micros

micromicroS

microS

1-micros

micros dd

N

ηwwwwξξξξwwwwwwww minusminusΘ+= minus

wwwws winner

class membership is unknown

or identical for all data

numerical integration for wwwws(0)asymp0

( p+=02 ℓ=10 η=12 )

εg

α

VQ

LVQ+

LVQ1

αα

R++

R+-

R-+

R--

100 200 3000

0

10

system is invariant under

exchange of the prototypes

rarr weakly repulsive fixed points

The Dynamics of Learning Vector Quantization RUG 10012005

interpretations

- VQ unsupervised learningunlabelled data

- LVQ two prototypes of thesame class identical labels

- LVQ different classes butlabels are not used in training

εg

pppp++++

asymptotics (αrarrηrarr0 ηαrarrinfin)

pppp++++asymp0 asymp0 asymp0 asymp0

pppp----asymp1 asymp1 asymp1 asymp1

- low quantization error

- high gen error εg

The Dynamics of Learning Vector Quantization RUG 10012005

work in progress outlookwork in progress outlookwork in progress outlookwork in progress outlook

bull regularization of LVQ 21 Robust Soft LVQ [Seo Obermayer]

bull model different cluster variances more clustersprototypes

bull optimized procedures learning rate schedules

variational approach density estimation Bayes optimal on-line

bull several classes and prototypes

Summary

bullprototypeprototypeprototypeprototype----based learningbased learningbased learningbased learning

Vector Quantization and Learning Vector Quantization

bulla model scenarioa model scenarioa model scenarioa model scenario two clusters two prototypes

dynamics of online training

bullcomparison of algorithmscomparison of algorithmscomparison of algorithmscomparison of algorithms

LVQ 21 instability trivial (stationary) classification

LVQ 1 close to optimal asymptotic generalization

LVQ + min-max solution wrt asymptotic generalization

VQ symmetry breaking representation

The Dynamics of Learning Vector Quantization RUG 10012005

Perspectives

bullSelfSelfSelfSelf----Organizing Maps (SOM)Organizing Maps (SOM)Organizing Maps (SOM)Organizing Maps (SOM)

(many) N-dim prototypes form a (low) d-dimensional grid

representation of data in a topology preserving map

neighborhood preserving SOM Neural Gas (distance based)

bullGeneralized Relevance LVQ Generalized Relevance LVQ Generalized Relevance LVQ Generalized Relevance LVQ [Hammer amp Villmann]

adaptive metrics eg distance measure ( )sum=

minus=N

i

iii w

1

2)( sλ ξξwd λ

training

bullapplications applications applications applications

Page 5: The Dynamics of Learning Vector Quantizationlibvolume3.xyz/computers/btech/semester6/datacompression/vectorquantization...The Dynamics of Learning Vector Quantization, RUG, 10.01.2005

The Dynamics of Learning Vector Quantization RUG 10012005

quantization errorquantization errorquantization errorquantization error

( ) ( )microj

microk

K

jk

P

1micro

jmicro

K

1j

VQ ddΘ2

wξH minusminus= prodsumsumne==

microjdprototypes data wj is the winner

here

Euclidean distance

aim faithful representation (in general ne clustering )

Result depends on - the number of prototype vectors

- the distance measure metric used

The Dynamics of Learning Vector Quantization RUG 10012005

Learning Vector Quantization (LVQ)Learning Vector Quantization (LVQ)Learning Vector Quantization (LVQ)Learning Vector Quantization (LVQ)

aim

classification classification classification classification of data

learning from examples

LearningLearningLearningLearning choice of prototypes according to example data

example situtation

3 classes3 classes3 classes3 classes

classification

assignment of a vector ξξξξto the class of the closest

prototype w w w w

3 prototypes 3 prototypes 3 prototypes 3 prototypes

aim generalization abilitygeneralization abilitygeneralization abilitygeneralization ability ie correct classification

of novel data after training

The Dynamics of Learning Vector Quantization RUG 10012005

prominent example [Kohonen] ldquo LVQ 21 rdquoldquo LVQ 21 rdquoldquo LVQ 21 rdquoldquo LVQ 21 rdquo

bull present a single example

bull initialize prototype vectors(for different classes)

bull identify the closest correctand the closest wrong prototype

bull move the corresponding winnertowards away from the example

known convergence stability problems

eg for infrequent classes

mostly heuristicallyheuristicallyheuristicallyheuristically motivated variations of competitive learningcompetitive learningcompetitive learningcompetitive learning

The Dynamics of Learning Vector Quantization RUG 10012005

LVQ algorithms LVQ algorithms LVQ algorithms LVQ algorithms

- are frequently applied in a variety of problems involving

the classification of structured data a few examples

- appear plausible intuitive flexible

- are fast easy to implement

- real time speech recognition

- medical diagnosis eg from histological data

- texture recognition and classification

- gene expression data analysis

-

The Dynamics of Learning Vector Quantization RUG 10012005

illustrationillustrationillustrationillustration microscopic images of (pig) semen cells after freezingand storage co Lidia Sanchez-Gonzalez LeonSpain

The Dynamics of Learning Vector Quantization RUG 10012005

healthy cells damaged cells

prototypes obtained by LVQ (1)

illustrationillustrationillustrationillustration microscopic images of (pig) semen cells after freezingand storage co Lidia Sanchez-Gonzalez LeonSpain

The Dynamics of Learning Vector Quantization RUG 10012005

LVQ algorithms LVQ algorithms LVQ algorithms LVQ algorithms

- are often based on purely heuristic arguments

or derived from a cost function with unclear

relation to the generalization ability

- almost exclusively use the Euclidean distance measure

inappropriate for heterogeneous data

- lack in general a thorough theoretical understanding of

dynamics convergence properties

performance wrt generalization etc

The Dynamics of Learning Vector Quantization RUG 10012005

In the following

analysis of LVQ algorithms wrt

- dynamics of the learning process

- performance ie generalization ability

- asymptotic behavior in the limit of many examples

typical behavior in a model situation

- randomized high-dimensional data

- essential features of LVQ learning

aim - contribute to the theoretical understanding- develop efficient LVQ schemes- test in applications

The Dynamics of Learning Vector Quantization RUG 10012005

model situationmodel situationmodel situationmodel situation two clusters of N-dimensional data

random vectors ξξξξ isin ℝN according to σ)P(p )P(

1σσ ξξξξξξξξ sum

plusmn=

=

( )( )

minus=

2

σN2-

2

1exp

1σ)P( ΒΒΒΒξξξξξξξξ lmixture of two Gaussians

orthonormal center vectors

BBBB+ BBBB- isin ℝN ( BBBBσ )2 =1 BBBB+ BBBB- =0

prior weights of classes p+ p-p+ + p- = 1

BBBB+

BBBB-

(p+)

(p-)

separation ℓℓ

jj Bσσξ l=

22222l Nξ1ξξ

N

1σσ

+==rarr=minus sum=j

jjj ξξξξ

independent components

The Dynamics of Learning Vector Quantization RUG 10012005

high-dimensional data (formally Nrarrinfin)

400 examples ξξξξmicro isinℝN N=200 ℓ=1 p+=06micro

By

ξξ ξξ sdot=

minusminus

(240)

(160)

projections into the plane of center vectors B+ B-

microBy ξξξξsdot= ++

micro2

2xξξ ξξ

ww wwsdot

=

(240)(160)

projections in two independent random directions wwww12

micro11x ξξξξwwww sdot=

model for studying typical behavior of LVQ algorithmsnot density-estimation based classification

NoteNoteNoteNote

The Dynamics of Learning Vector Quantization RUG 10012005

dynamics of ondynamics of ondynamics of ondynamics of on----line trainingline trainingline trainingline training

sequence of independent random data ( )123micromicro =ξξξξ acc to ( )microP ξξξξ

learning ratestep size

competitiondirection ofupdate etc

change of prototypetowards or away from the current data

above examples

unsupervisedunsupervisedunsupervisedunsupervised Vector QuantizationVector QuantizationVector QuantizationVector Quantization [ ] ( ) dd fmicros

micross minusΘ= minus

The Winner Takes It All (classes irrelevantunknown)

Learning Vector Quantization ldquo21rdquo Learning Vector Quantization ldquo21rdquo Learning Vector Quantization ldquo21rdquo Learning Vector Quantization ldquo21rdquo [ ] σS fs)(1)(1

classcorrectclasswrong

+minus=sdot=

here two prototypes noexplicit competition

[ ] ( ) 1-micros

micromicros-

micross

1-micros

micros σSddf

N

ηwwwwξξξξwwwwwwww minus+=

( )21minusminus=

plusmn=

micros

micromicrosd

1σS

wwwwξξξξ

update of prototype vectors

The Dynamics of Learning Vector Quantization RUG 10012005

[ ] ( )

[ ] ( ) [ ] ( ) [ ] [ ] ( )Ν

1Οffη QxfηQxfη1N

QQ

Ryfη1N

RR

ts1-micro

stmicrost

1-microst

microts

1-microst

microst

1-microsσ

microσs

1-microsσ

microsσ

++minus+minus=minus

minus=minus

2

[ ] ( ) 1-micros

micromicros-

micross

1-micros

micros σSddf

N

ηwwwwξξξξwwwwwwww minus+= rarrrarrrarrrarr recursionsrecursionsrecursionsrecursions

mathematical analysis of the learning dynamicsmathematical analysis of the learning dynamicsmathematical analysis of the learning dynamicsmathematical analysis of the learning dynamics

( ) ( ) 1221 -micross

micros

micromicros

micromicros Q2xd +minus=minus= minus ξξξξwwwwξξξξ

micromicromicro1-micros

micros ξByx sdot=sdot= ττξξξξwwwwprojections

distances

random vector ξmicro enters only in the form of

( )11 +minusisinsdot=sdot= σtsmicrot

micros

microstσ

micros

microsσ QBR wwwwwwwwwwww

projections in the (B+ B- )-plane

length and relativeposition of prototypes

1 description in terms of a few characteristic quantitities1 description in terms of a few characteristic quantitities1 description in terms of a few characteristic quantitities1 description in terms of a few characteristic quantitities

( here ℝ2N rarr ℝ7 )

The Dynamics of Learning Vector Quantization RUG 10012005

N

1jjσjsσ

N

1jjsσs R x ll === sumsum

==

Bww jξ

completely specified in terms of first and second moments (wo indices micro)

in the thermodynamic limit thermodynamic limit thermodynamic limit thermodynamic limit N rarrrarrrarrrarr infininfininfininfin

random vector acc to σ)|P( micro rarrξξξξmicromicro

micro1-micros

micros

By

wx

ξξξξ

ξξξξ

sdot=

sdot=

ττ

correlated Gaussianrandom quantities

stσtσsσt s Q xx- xx = τττ sσσsσ s R yx- yx =

ρττρτρ δ yy- yyσσσ

===

=

else

σ ifsσσ

y0

Sl

l δτ

2 average over the current example2 average over the current example2 average over the current example2 average over the current example

rarrrarrrarrrarr averaged recursionsaveraged recursionsaveraged recursionsaveraged recursions closed in Rsσ Qst pσ

1σσ LL sum

plusmn=

=

The Dynamics of Learning Vector Quantization RUG 10012005

characteristic quantities

- depend on the random sequence of example data

- their variance vanishes with N rarrrarrrarrrarr infininfininfininfin (here prop N-1)

microsσ

microst

R Q

learning dynamics is completely described in terms of averagesaveragesaveragesaverages

3 self3 self3 self3 self----averaging propertiesaveraging propertiesaveraging propertiesaveraging properties

4 continuous learning time4 continuous learning time4 continuous learning time4 continuous learning time

N

micro α =

of examples

of learning stepsper degree of freedom

) α (R ) α (Q sσst

recursions rarr coupled ordinary differential equations

rarr evolution of projections

The Dynamics of Learning Vector Quantization RUG 10012005

probability for misclassification of a novel example

( ) ( ) minusminusΘ++minusΘ= +minusminusminus++ ddpddp gεεεε

( ) ( )

Φminus+

Φ=

minusminus+

minus+minus

++

minus+minus

minusminusminus

++minus

minusminus

minusminus+

minus+minus

++

+minusminus

++minus

minusminusminus

+++

QQQ

RR2QQ

QQQ

RR2QQpp

22 2

1

2

1 ll

L

5 learning curve5 learning curve5 learning curve5 learning curve

generalization error generalization error generalization error generalization error εεεεgggg((((αααα)))) after training with α N examples

- repulsiveattractive fixed points of the dynamics

- asymptotic behavior for αrarrinfin- dependence on learning rate separation initialization

-

investigation and comparison of given algorithmsinvestigation and comparison of given algorithmsinvestigation and comparison of given algorithmsinvestigation and comparison of given algorithms

- time-dependent learning rate η(α)

- variational optimization wrt fs[]

-

optimization and development of new prescriptionsoptimization and development of new prescriptionsoptimization and development of new prescriptionsoptimization and development of new prescriptions

maximizeα

g

d

d εεεε

The Dynamics of Learning Vector Quantization RUG 10012005

optimal classificationoptimal classificationoptimal classificationoptimal classification with minimal generalization error

BBBB-

BBBB+

(p-gtp+ )

(p+)

separation of classes by the plane with 1)σP(p 1)σP(p +==minus= +minus ξξξξξξξξin the model situation (equal variances of clusters)

excess error

minimal εg as a function

of prior weights ℓ=2

εg

025

050

005 100 p+

ℓ=1

ℓ=0

The Dynamics of Learning Vector Quantization RUG 10012005

ldquoLVQ 21ldquo update the correct and wrong winner

( ) 1-micros

micro1-micros

micros Sσ

N

ηwwwwξξξξwwwwwwww minus+=

(analytical)integrationfor wwwws(0) = 0

( ) ( )

( ) ( ) KKll

Kll

αmηαmη

αmηαmη

e12

m1

mRe1

2

m1

mR

Qe12

m1

mRe1

2

m1

mR

++minusminusminus

++minus

minus+minus

++

minus+

=+minusminus

minus=

=minusminus

minus=minus+

=

pσ = (1+m σ ) 2 (mgt0)

[Seo Obermeyer] LVQ21 ս cost function

(likelihood ratios)

infinrarrinfinrarrminus+minusminus+minusminusminus

++minus+++

αQQRR

Q R R

with

finite remain

Q ++ R ++ R minus+

R +minus Q minus+

Q minusminus R minusminus

α 102 4 86

6-

0

6theory and simulation (N=100)p+=08 ℓ=1 η=05averages over 100 independent runs

The Dynamics of Learning Vector Quantization RUG 10012005

(p- )

(p+gt p-)

sssstrategiestrategiestrategiestrategies

- selection of dataselection of dataselection of dataselection of data in a window close to the current decision boundary

slows down the repulsion system remains instable

- Soft Robust Learning Vector QuantizationSoft Robust Learning Vector QuantizationSoft Robust Learning Vector QuantizationSoft Robust Learning Vector Quantization [Seo amp Obermayer]

density-estimation based cost function

limiting case Learning from mistakes Learning from mistakes Learning from mistakes Learning from mistakes LVQ21-step only

if the example is currently misclassified

slow learning poor generalization

problem instability of the algorithm

due to repulsion of wrong prototypes

trivial classification fuumlr αrarrinfin

εg = max p+p-

The Dynamics of Learning Vector Quantization RUG 10012005

ldquo The winner takes it all rdquoldquo The winner takes it all rdquoldquo The winner takes it all rdquoldquo The winner takes it all rdquo

numericalintegrationfor wwwws(0)=0

theory and simulation (N=200)p+=02 ℓ=12 η=12averaged over 100 indep runs

Q++

Q--

Q+-

α

wwww++++

wwww----

ℓℓℓℓ BBBB++++

ℓℓℓℓ BBBB----

trajectories in the (B+B- )-plane

(bull) α=2040140

optimal decision boundary____ asymptotic position

RS+

RS-

R--

R-+

R--

R++

winner wwwws plusmn1

I) LVQ 1LVQ 1LVQ 1LVQ 1 [Kohonen] [ ] ( ) 1-micros

micromicromicroS

microS

1-micros

micros Sσdd

N

ηwwwwξξξξwwwwwwww minusminusΘ+= minus

only the winner is updated according to the class membership

wwww-

The Dynamics of Learning Vector Quantization RUG 10012005

learning curvelearning curvelearning curvelearning curve

α

εg η=12

(p+=02 ℓ=12)

εg (αrarrinfin) grows lin with η

- stationary state

- role of the learning rate

α100 200 300

εg

026

022

018

0140

η

20

04

02

ηrarr0 - variable rate η(α)

- wellwellwellwell----defined asymptoticsdefined asymptoticsdefined asymptoticsdefined asymptotics

(ODE linear in η)

10

εg

20 30 40 500014

026

022

018

min εg

(η α)

ηrarr0

η rarr0 αrarrinfin

( η α ) rarr infin

suboptimal

The Dynamics of Learning Vector Quantization RUG 10012005

ldquo The winner takes it all ldquoldquo The winner takes it all ldquoldquo The winner takes it all ldquoldquo The winner takes it all ldquo

II ) LVQ+LVQ+LVQ+LVQ+ ( only positive steps without repulsion)

[ ] ( ) ( ) 1-micros

microS

microσ

microS

microS

1-micros

micros δdd

N

ηwwwwξξξξwwwwwwww minusminusΘ+= minus

winner correct

αrarrinfin asymptotic configuration

symmetric about ℓℓℓℓ (B(B(B(B+++++B+B+B+B----)2)2)2)2

wwww-

wwww+

ℓ ℓ ℓ ℓ BBBB+

ℓ ℓ ℓ ℓ BBBB-

p+=02 ℓ=12 η=12

classification scheme and the

achieved generalization error are

independent of the independent of the independent of the independent of the prior weights prior weights prior weights prior weights ppppplusmnplusmnplusmnplusmn

(and optimal for ppppplusmnplusmnplusmnplusmn = 12 )

LVQ+ asymp VQ within the classes

(ws updated only from class S)

The Dynamics of Learning Vector Quantization RUG 10012005

- LVQ 21

trivial assignment to the

more frequent class

optimal classification

εg

pppp++++

min p+p-

- LVQ 1

here close to optimal

classification

pppp++++

- LVQ+

min-max solution

pplusmn -independent classification

p+=02 ℓ=10 η=10εg

α

learning curveslearning curveslearning curveslearning curves

LVQ+

LVQ1

asymptotics ηrarr0 (ηα)rarrinfin

The Dynamics of Learning Vector Quantization RUG 10012005

Vector QuantizationVector QuantizationVector QuantizationVector Quantization

competitive learning [ ] ( ) 1-micros

micromicroS

microS

1-micros

micros dd

N

ηwwwwξξξξwwwwwwww minusminusΘ+= minus

wwwws winner

class membership is unknown

or identical for all data

numerical integration for wwwws(0)asymp0

( p+=02 ℓ=10 η=12 )

εg

α

VQ

LVQ+

LVQ1

αα

R++

R+-

R-+

R--

100 200 3000

0

10

system is invariant under

exchange of the prototypes

rarr weakly repulsive fixed points

The Dynamics of Learning Vector Quantization RUG 10012005

interpretations

- VQ unsupervised learningunlabelled data

- LVQ two prototypes of thesame class identical labels

- LVQ different classes butlabels are not used in training

εg

pppp++++

asymptotics (αrarrηrarr0 ηαrarrinfin)

pppp++++asymp0 asymp0 asymp0 asymp0

pppp----asymp1 asymp1 asymp1 asymp1

- low quantization error

- high gen error εg

The Dynamics of Learning Vector Quantization RUG 10012005

work in progress outlookwork in progress outlookwork in progress outlookwork in progress outlook

bull regularization of LVQ 21 Robust Soft LVQ [Seo Obermayer]

bull model different cluster variances more clustersprototypes

bull optimized procedures learning rate schedules

variational approach density estimation Bayes optimal on-line

bull several classes and prototypes

Summary

bullprototypeprototypeprototypeprototype----based learningbased learningbased learningbased learning

Vector Quantization and Learning Vector Quantization

bulla model scenarioa model scenarioa model scenarioa model scenario two clusters two prototypes

dynamics of online training

bullcomparison of algorithmscomparison of algorithmscomparison of algorithmscomparison of algorithms

LVQ 21 instability trivial (stationary) classification

LVQ 1 close to optimal asymptotic generalization

LVQ + min-max solution wrt asymptotic generalization

VQ symmetry breaking representation

The Dynamics of Learning Vector Quantization RUG 10012005

Perspectives

bullSelfSelfSelfSelf----Organizing Maps (SOM)Organizing Maps (SOM)Organizing Maps (SOM)Organizing Maps (SOM)

(many) N-dim prototypes form a (low) d-dimensional grid

representation of data in a topology preserving map

neighborhood preserving SOM Neural Gas (distance based)

bullGeneralized Relevance LVQ Generalized Relevance LVQ Generalized Relevance LVQ Generalized Relevance LVQ [Hammer amp Villmann]

adaptive metrics eg distance measure ( )sum=

minus=N

i

iii w

1

2)( sλ ξξwd λ

training

bullapplications applications applications applications

Page 6: The Dynamics of Learning Vector Quantizationlibvolume3.xyz/computers/btech/semester6/datacompression/vectorquantization...The Dynamics of Learning Vector Quantization, RUG, 10.01.2005

The Dynamics of Learning Vector Quantization RUG 10012005

Learning Vector Quantization (LVQ)Learning Vector Quantization (LVQ)Learning Vector Quantization (LVQ)Learning Vector Quantization (LVQ)

aim

classification classification classification classification of data

learning from examples

LearningLearningLearningLearning choice of prototypes according to example data

example situtation

3 classes3 classes3 classes3 classes

classification

assignment of a vector ξξξξto the class of the closest

prototype w w w w

3 prototypes 3 prototypes 3 prototypes 3 prototypes

aim generalization abilitygeneralization abilitygeneralization abilitygeneralization ability ie correct classification

of novel data after training

The Dynamics of Learning Vector Quantization RUG 10012005

prominent example [Kohonen] ldquo LVQ 21 rdquoldquo LVQ 21 rdquoldquo LVQ 21 rdquoldquo LVQ 21 rdquo

bull present a single example

bull initialize prototype vectors(for different classes)

bull identify the closest correctand the closest wrong prototype

bull move the corresponding winnertowards away from the example

known convergence stability problems

eg for infrequent classes

mostly heuristicallyheuristicallyheuristicallyheuristically motivated variations of competitive learningcompetitive learningcompetitive learningcompetitive learning

The Dynamics of Learning Vector Quantization RUG 10012005

LVQ algorithms LVQ algorithms LVQ algorithms LVQ algorithms

- are frequently applied in a variety of problems involving

the classification of structured data a few examples

- appear plausible intuitive flexible

- are fast easy to implement

- real time speech recognition

- medical diagnosis eg from histological data

- texture recognition and classification

- gene expression data analysis

-

The Dynamics of Learning Vector Quantization RUG 10012005

illustrationillustrationillustrationillustration microscopic images of (pig) semen cells after freezingand storage co Lidia Sanchez-Gonzalez LeonSpain

The Dynamics of Learning Vector Quantization RUG 10012005

healthy cells damaged cells

prototypes obtained by LVQ (1)

illustrationillustrationillustrationillustration microscopic images of (pig) semen cells after freezingand storage co Lidia Sanchez-Gonzalez LeonSpain

The Dynamics of Learning Vector Quantization RUG 10012005

LVQ algorithms LVQ algorithms LVQ algorithms LVQ algorithms

- are often based on purely heuristic arguments

or derived from a cost function with unclear

relation to the generalization ability

- almost exclusively use the Euclidean distance measure

inappropriate for heterogeneous data

- lack in general a thorough theoretical understanding of

dynamics convergence properties

performance wrt generalization etc

The Dynamics of Learning Vector Quantization RUG 10012005

In the following

analysis of LVQ algorithms wrt

- dynamics of the learning process

- performance ie generalization ability

- asymptotic behavior in the limit of many examples

typical behavior in a model situation

- randomized high-dimensional data

- essential features of LVQ learning

aim - contribute to the theoretical understanding- develop efficient LVQ schemes- test in applications

The Dynamics of Learning Vector Quantization RUG 10012005

model situationmodel situationmodel situationmodel situation two clusters of N-dimensional data

random vectors ξξξξ isin ℝN according to σ)P(p )P(

1σσ ξξξξξξξξ sum

plusmn=

=

( )( )

minus=

2

σN2-

2

1exp

1σ)P( ΒΒΒΒξξξξξξξξ lmixture of two Gaussians

orthonormal center vectors

BBBB+ BBBB- isin ℝN ( BBBBσ )2 =1 BBBB+ BBBB- =0

prior weights of classes p+ p-p+ + p- = 1

BBBB+

BBBB-

(p+)

(p-)

separation ℓℓ

jj Bσσξ l=

22222l Nξ1ξξ

N

1σσ

+==rarr=minus sum=j

jjj ξξξξ

independent components

The Dynamics of Learning Vector Quantization RUG 10012005

high-dimensional data (formally Nrarrinfin)

400 examples ξξξξmicro isinℝN N=200 ℓ=1 p+=06micro

By

ξξ ξξ sdot=

minusminus

(240)

(160)

projections into the plane of center vectors B+ B-

microBy ξξξξsdot= ++

micro2

2xξξ ξξ

ww wwsdot

=

(240)(160)

projections in two independent random directions wwww12

micro11x ξξξξwwww sdot=

model for studying typical behavior of LVQ algorithmsnot density-estimation based classification

NoteNoteNoteNote

The Dynamics of Learning Vector Quantization RUG 10012005

dynamics of ondynamics of ondynamics of ondynamics of on----line trainingline trainingline trainingline training

sequence of independent random data ( )123micromicro =ξξξξ acc to ( )microP ξξξξ

learning ratestep size

competitiondirection ofupdate etc

change of prototypetowards or away from the current data

above examples

unsupervisedunsupervisedunsupervisedunsupervised Vector QuantizationVector QuantizationVector QuantizationVector Quantization [ ] ( ) dd fmicros

micross minusΘ= minus

The Winner Takes It All (classes irrelevantunknown)

Learning Vector Quantization ldquo21rdquo Learning Vector Quantization ldquo21rdquo Learning Vector Quantization ldquo21rdquo Learning Vector Quantization ldquo21rdquo [ ] σS fs)(1)(1

classcorrectclasswrong

+minus=sdot=

here two prototypes noexplicit competition

[ ] ( ) 1-micros

micromicros-

micross

1-micros

micros σSddf

N

ηwwwwξξξξwwwwwwww minus+=

( )21minusminus=

plusmn=

micros

micromicrosd

1σS

wwwwξξξξ

update of prototype vectors

The Dynamics of Learning Vector Quantization RUG 10012005

[ ] ( )

[ ] ( ) [ ] ( ) [ ] [ ] ( )Ν

1Οffη QxfηQxfη1N

QQ

Ryfη1N

RR

ts1-micro

stmicrost

1-microst

microts

1-microst

microst

1-microsσ

microσs

1-microsσ

microsσ

++minus+minus=minus

minus=minus

2

[ ] ( ) 1-micros

micromicros-

micross

1-micros

micros σSddf

N

ηwwwwξξξξwwwwwwww minus+= rarrrarrrarrrarr recursionsrecursionsrecursionsrecursions

mathematical analysis of the learning dynamicsmathematical analysis of the learning dynamicsmathematical analysis of the learning dynamicsmathematical analysis of the learning dynamics

( ) ( ) 1221 -micross

micros

micromicros

micromicros Q2xd +minus=minus= minus ξξξξwwwwξξξξ

micromicromicro1-micros

micros ξByx sdot=sdot= ττξξξξwwwwprojections

distances

random vector ξmicro enters only in the form of

( )11 +minusisinsdot=sdot= σtsmicrot

micros

microstσ

micros

microsσ QBR wwwwwwwwwwww

projections in the (B+ B- )-plane

length and relativeposition of prototypes

1 description in terms of a few characteristic quantitities1 description in terms of a few characteristic quantitities1 description in terms of a few characteristic quantitities1 description in terms of a few characteristic quantitities

( here ℝ2N rarr ℝ7 )

The Dynamics of Learning Vector Quantization RUG 10012005

N

1jjσjsσ

N

1jjsσs R x ll === sumsum

==

Bww jξ

completely specified in terms of first and second moments (wo indices micro)

in the thermodynamic limit thermodynamic limit thermodynamic limit thermodynamic limit N rarrrarrrarrrarr infininfininfininfin

random vector acc to σ)|P( micro rarrξξξξmicromicro

micro1-micros

micros

By

wx

ξξξξ

ξξξξ

sdot=

sdot=

ττ

correlated Gaussianrandom quantities

stσtσsσt s Q xx- xx = τττ sσσsσ s R yx- yx =

ρττρτρ δ yy- yyσσσ

===

=

else

σ ifsσσ

y0

Sl

l δτ

2 average over the current example2 average over the current example2 average over the current example2 average over the current example

rarrrarrrarrrarr averaged recursionsaveraged recursionsaveraged recursionsaveraged recursions closed in Rsσ Qst pσ

1σσ LL sum

plusmn=

=

The Dynamics of Learning Vector Quantization RUG 10012005

characteristic quantities

- depend on the random sequence of example data

- their variance vanishes with N rarrrarrrarrrarr infininfininfininfin (here prop N-1)

microsσ

microst

R Q

learning dynamics is completely described in terms of averagesaveragesaveragesaverages

3 self3 self3 self3 self----averaging propertiesaveraging propertiesaveraging propertiesaveraging properties

4 continuous learning time4 continuous learning time4 continuous learning time4 continuous learning time

N

micro α =

of examples

of learning stepsper degree of freedom

) α (R ) α (Q sσst

recursions rarr coupled ordinary differential equations

rarr evolution of projections

The Dynamics of Learning Vector Quantization RUG 10012005

probability for misclassification of a novel example

( ) ( ) minusminusΘ++minusΘ= +minusminusminus++ ddpddp gεεεε

( ) ( )

Φminus+

Φ=

minusminus+

minus+minus

++

minus+minus

minusminusminus

++minus

minusminus

minusminus+

minus+minus

++

+minusminus

++minus

minusminusminus

+++

QQQ

RR2QQ

QQQ

RR2QQpp

22 2

1

2

1 ll

L

5 learning curve5 learning curve5 learning curve5 learning curve

generalization error generalization error generalization error generalization error εεεεgggg((((αααα)))) after training with α N examples

- repulsiveattractive fixed points of the dynamics

- asymptotic behavior for αrarrinfin- dependence on learning rate separation initialization

-

investigation and comparison of given algorithmsinvestigation and comparison of given algorithmsinvestigation and comparison of given algorithmsinvestigation and comparison of given algorithms

- time-dependent learning rate η(α)

- variational optimization wrt fs[]

-

optimization and development of new prescriptionsoptimization and development of new prescriptionsoptimization and development of new prescriptionsoptimization and development of new prescriptions

maximizeα

g

d

d εεεε

The Dynamics of Learning Vector Quantization RUG 10012005

optimal classificationoptimal classificationoptimal classificationoptimal classification with minimal generalization error

BBBB-

BBBB+

(p-gtp+ )

(p+)

separation of classes by the plane with 1)σP(p 1)σP(p +==minus= +minus ξξξξξξξξin the model situation (equal variances of clusters)

excess error

minimal εg as a function

of prior weights ℓ=2

εg

025

050

005 100 p+

ℓ=1

ℓ=0

The Dynamics of Learning Vector Quantization RUG 10012005

ldquoLVQ 21ldquo update the correct and wrong winner

( ) 1-micros

micro1-micros

micros Sσ

N

ηwwwwξξξξwwwwwwww minus+=

(analytical)integrationfor wwwws(0) = 0

( ) ( )

( ) ( ) KKll

Kll

αmηαmη

αmηαmη

e12

m1

mRe1

2

m1

mR

Qe12

m1

mRe1

2

m1

mR

++minusminusminus

++minus

minus+minus

++

minus+

=+minusminus

minus=

=minusminus

minus=minus+

=

pσ = (1+m σ ) 2 (mgt0)

[Seo Obermeyer] LVQ21 ս cost function

(likelihood ratios)

infinrarrinfinrarrminus+minusminus+minusminusminus

++minus+++

αQQRR

Q R R

with

finite remain

Q ++ R ++ R minus+

R +minus Q minus+

Q minusminus R minusminus

α 102 4 86

6-

0

6theory and simulation (N=100)p+=08 ℓ=1 η=05averages over 100 independent runs

The Dynamics of Learning Vector Quantization RUG 10012005

(p- )

(p+gt p-)

sssstrategiestrategiestrategiestrategies

- selection of dataselection of dataselection of dataselection of data in a window close to the current decision boundary

slows down the repulsion system remains instable

- Soft Robust Learning Vector QuantizationSoft Robust Learning Vector QuantizationSoft Robust Learning Vector QuantizationSoft Robust Learning Vector Quantization [Seo amp Obermayer]

density-estimation based cost function

limiting case Learning from mistakes Learning from mistakes Learning from mistakes Learning from mistakes LVQ21-step only

if the example is currently misclassified

slow learning poor generalization

problem instability of the algorithm

due to repulsion of wrong prototypes

trivial classification fuumlr αrarrinfin

εg = max p+p-

The Dynamics of Learning Vector Quantization RUG 10012005

ldquo The winner takes it all rdquoldquo The winner takes it all rdquoldquo The winner takes it all rdquoldquo The winner takes it all rdquo

numericalintegrationfor wwwws(0)=0

theory and simulation (N=200)p+=02 ℓ=12 η=12averaged over 100 indep runs

Q++

Q--

Q+-

α

wwww++++

wwww----

ℓℓℓℓ BBBB++++

ℓℓℓℓ BBBB----

trajectories in the (B+B- )-plane

(bull) α=2040140

optimal decision boundary____ asymptotic position

RS+

RS-

R--

R-+

R--

R++

winner wwwws plusmn1

I) LVQ 1LVQ 1LVQ 1LVQ 1 [Kohonen] [ ] ( ) 1-micros

micromicromicroS

microS

1-micros

micros Sσdd

N

ηwwwwξξξξwwwwwwww minusminusΘ+= minus

only the winner is updated according to the class membership

wwww-

The Dynamics of Learning Vector Quantization RUG 10012005

learning curvelearning curvelearning curvelearning curve

α

εg η=12

(p+=02 ℓ=12)

εg (αrarrinfin) grows lin with η

- stationary state

- role of the learning rate

α100 200 300

εg

026

022

018

0140

η

20

04

02

ηrarr0 - variable rate η(α)

- wellwellwellwell----defined asymptoticsdefined asymptoticsdefined asymptoticsdefined asymptotics

(ODE linear in η)

10

εg

20 30 40 500014

026

022

018

min εg

(η α)

ηrarr0

η rarr0 αrarrinfin

( η α ) rarr infin

suboptimal

The Dynamics of Learning Vector Quantization RUG 10012005

ldquo The winner takes it all ldquoldquo The winner takes it all ldquoldquo The winner takes it all ldquoldquo The winner takes it all ldquo

II ) LVQ+LVQ+LVQ+LVQ+ ( only positive steps without repulsion)

[ ] ( ) ( ) 1-micros

microS

microσ

microS

microS

1-micros

micros δdd

N

ηwwwwξξξξwwwwwwww minusminusΘ+= minus

winner correct

αrarrinfin asymptotic configuration

symmetric about ℓℓℓℓ (B(B(B(B+++++B+B+B+B----)2)2)2)2

wwww-

wwww+

ℓ ℓ ℓ ℓ BBBB+

ℓ ℓ ℓ ℓ BBBB-

p+=02 ℓ=12 η=12

classification scheme and the

achieved generalization error are

independent of the independent of the independent of the independent of the prior weights prior weights prior weights prior weights ppppplusmnplusmnplusmnplusmn

(and optimal for ppppplusmnplusmnplusmnplusmn = 12 )

LVQ+ asymp VQ within the classes

(ws updated only from class S)

The Dynamics of Learning Vector Quantization RUG 10012005

- LVQ 21

trivial assignment to the

more frequent class

optimal classification

εg

pppp++++

min p+p-

- LVQ 1

here close to optimal

classification

pppp++++

- LVQ+

min-max solution

pplusmn -independent classification

p+=02 ℓ=10 η=10εg

α

learning curveslearning curveslearning curveslearning curves

LVQ+

LVQ1

asymptotics ηrarr0 (ηα)rarrinfin

The Dynamics of Learning Vector Quantization RUG 10012005

Vector QuantizationVector QuantizationVector QuantizationVector Quantization

competitive learning [ ] ( ) 1-micros

micromicroS

microS

1-micros

micros dd

N

ηwwwwξξξξwwwwwwww minusminusΘ+= minus

wwwws winner

class membership is unknown

or identical for all data

numerical integration for wwwws(0)asymp0

( p+=02 ℓ=10 η=12 )

εg

α

VQ

LVQ+

LVQ1

αα

R++

R+-

R-+

R--

100 200 3000

0

10

system is invariant under

exchange of the prototypes

rarr weakly repulsive fixed points

The Dynamics of Learning Vector Quantization RUG 10012005

interpretations

- VQ unsupervised learningunlabelled data

- LVQ two prototypes of thesame class identical labels

- LVQ different classes butlabels are not used in training

εg

pppp++++

asymptotics (αrarrηrarr0 ηαrarrinfin)

pppp++++asymp0 asymp0 asymp0 asymp0

pppp----asymp1 asymp1 asymp1 asymp1

- low quantization error

- high gen error εg

The Dynamics of Learning Vector Quantization RUG 10012005

work in progress outlookwork in progress outlookwork in progress outlookwork in progress outlook

bull regularization of LVQ 21 Robust Soft LVQ [Seo Obermayer]

bull model different cluster variances more clustersprototypes

bull optimized procedures learning rate schedules

variational approach density estimation Bayes optimal on-line

bull several classes and prototypes

Summary

bullprototypeprototypeprototypeprototype----based learningbased learningbased learningbased learning

Vector Quantization and Learning Vector Quantization

bulla model scenarioa model scenarioa model scenarioa model scenario two clusters two prototypes

dynamics of online training

bullcomparison of algorithmscomparison of algorithmscomparison of algorithmscomparison of algorithms

LVQ 21 instability trivial (stationary) classification

LVQ 1 close to optimal asymptotic generalization

LVQ + min-max solution wrt asymptotic generalization

VQ symmetry breaking representation

The Dynamics of Learning Vector Quantization RUG 10012005

Perspectives

bullSelfSelfSelfSelf----Organizing Maps (SOM)Organizing Maps (SOM)Organizing Maps (SOM)Organizing Maps (SOM)

(many) N-dim prototypes form a (low) d-dimensional grid

representation of data in a topology preserving map

neighborhood preserving SOM Neural Gas (distance based)

bullGeneralized Relevance LVQ Generalized Relevance LVQ Generalized Relevance LVQ Generalized Relevance LVQ [Hammer amp Villmann]

adaptive metrics eg distance measure ( )sum=

minus=N

i

iii w

1

2)( sλ ξξwd λ

training

bullapplications applications applications applications

Page 7: The Dynamics of Learning Vector Quantizationlibvolume3.xyz/computers/btech/semester6/datacompression/vectorquantization...The Dynamics of Learning Vector Quantization, RUG, 10.01.2005

The Dynamics of Learning Vector Quantization RUG 10012005

prominent example [Kohonen] ldquo LVQ 21 rdquoldquo LVQ 21 rdquoldquo LVQ 21 rdquoldquo LVQ 21 rdquo

bull present a single example

bull initialize prototype vectors(for different classes)

bull identify the closest correctand the closest wrong prototype

bull move the corresponding winnertowards away from the example

known convergence stability problems

eg for infrequent classes

mostly heuristicallyheuristicallyheuristicallyheuristically motivated variations of competitive learningcompetitive learningcompetitive learningcompetitive learning

The Dynamics of Learning Vector Quantization RUG 10012005

LVQ algorithms LVQ algorithms LVQ algorithms LVQ algorithms

- are frequently applied in a variety of problems involving

the classification of structured data a few examples

- appear plausible intuitive flexible

- are fast easy to implement

- real time speech recognition

- medical diagnosis eg from histological data

- texture recognition and classification

- gene expression data analysis

-

The Dynamics of Learning Vector Quantization RUG 10012005

illustrationillustrationillustrationillustration microscopic images of (pig) semen cells after freezingand storage co Lidia Sanchez-Gonzalez LeonSpain

The Dynamics of Learning Vector Quantization RUG 10012005

healthy cells damaged cells

prototypes obtained by LVQ (1)

illustrationillustrationillustrationillustration microscopic images of (pig) semen cells after freezingand storage co Lidia Sanchez-Gonzalez LeonSpain

The Dynamics of Learning Vector Quantization RUG 10012005

LVQ algorithms LVQ algorithms LVQ algorithms LVQ algorithms

- are often based on purely heuristic arguments

or derived from a cost function with unclear

relation to the generalization ability

- almost exclusively use the Euclidean distance measure

inappropriate for heterogeneous data

- lack in general a thorough theoretical understanding of

dynamics convergence properties

performance wrt generalization etc

The Dynamics of Learning Vector Quantization RUG 10012005

In the following

analysis of LVQ algorithms wrt

- dynamics of the learning process

- performance ie generalization ability

- asymptotic behavior in the limit of many examples

typical behavior in a model situation

- randomized high-dimensional data

- essential features of LVQ learning

aim - contribute to the theoretical understanding- develop efficient LVQ schemes- test in applications

The Dynamics of Learning Vector Quantization RUG 10012005

model situationmodel situationmodel situationmodel situation two clusters of N-dimensional data

random vectors ξξξξ isin ℝN according to σ)P(p )P(

1σσ ξξξξξξξξ sum

plusmn=

=

( )( )

minus=

2

σN2-

2

1exp

1σ)P( ΒΒΒΒξξξξξξξξ lmixture of two Gaussians

orthonormal center vectors

BBBB+ BBBB- isin ℝN ( BBBBσ )2 =1 BBBB+ BBBB- =0

prior weights of classes p+ p-p+ + p- = 1

BBBB+

BBBB-

(p+)

(p-)

separation ℓℓ

jj Bσσξ l=

22222l Nξ1ξξ

N

1σσ

+==rarr=minus sum=j

jjj ξξξξ

independent components

The Dynamics of Learning Vector Quantization RUG 10012005

high-dimensional data (formally Nrarrinfin)

400 examples ξξξξmicro isinℝN N=200 ℓ=1 p+=06micro

By

ξξ ξξ sdot=

minusminus

(240)

(160)

projections into the plane of center vectors B+ B-

microBy ξξξξsdot= ++

micro2

2xξξ ξξ

ww wwsdot

=

(240)(160)

projections in two independent random directions wwww12

micro11x ξξξξwwww sdot=

model for studying typical behavior of LVQ algorithmsnot density-estimation based classification

NoteNoteNoteNote

The Dynamics of Learning Vector Quantization RUG 10012005

dynamics of ondynamics of ondynamics of ondynamics of on----line trainingline trainingline trainingline training

sequence of independent random data ( )123micromicro =ξξξξ acc to ( )microP ξξξξ

learning ratestep size

competitiondirection ofupdate etc

change of prototypetowards or away from the current data

above examples

unsupervisedunsupervisedunsupervisedunsupervised Vector QuantizationVector QuantizationVector QuantizationVector Quantization [ ] ( ) dd fmicros

micross minusΘ= minus

The Winner Takes It All (classes irrelevantunknown)

Learning Vector Quantization ldquo21rdquo Learning Vector Quantization ldquo21rdquo Learning Vector Quantization ldquo21rdquo Learning Vector Quantization ldquo21rdquo [ ] σS fs)(1)(1

classcorrectclasswrong

+minus=sdot=

here two prototypes noexplicit competition

[ ] ( ) 1-micros

micromicros-

micross

1-micros

micros σSddf

N

ηwwwwξξξξwwwwwwww minus+=

( )21minusminus=

plusmn=

micros

micromicrosd

1σS

wwwwξξξξ

update of prototype vectors

The Dynamics of Learning Vector Quantization RUG 10012005

[ ] ( )

[ ] ( ) [ ] ( ) [ ] [ ] ( )Ν

1Οffη QxfηQxfη1N

QQ

Ryfη1N

RR

ts1-micro

stmicrost

1-microst

microts

1-microst

microst

1-microsσ

microσs

1-microsσ

microsσ

++minus+minus=minus

minus=minus

2

[ ] ( ) 1-micros

micromicros-

micross

1-micros

micros σSddf

N

ηwwwwξξξξwwwwwwww minus+= rarrrarrrarrrarr recursionsrecursionsrecursionsrecursions

mathematical analysis of the learning dynamicsmathematical analysis of the learning dynamicsmathematical analysis of the learning dynamicsmathematical analysis of the learning dynamics

( ) ( ) 1221 -micross

micros

micromicros

micromicros Q2xd +minus=minus= minus ξξξξwwwwξξξξ

micromicromicro1-micros

micros ξByx sdot=sdot= ττξξξξwwwwprojections

distances

random vector ξmicro enters only in the form of

( )11 +minusisinsdot=sdot= σtsmicrot

micros

microstσ

micros

microsσ QBR wwwwwwwwwwww

projections in the (B+ B- )-plane

length and relativeposition of prototypes

1 description in terms of a few characteristic quantitities1 description in terms of a few characteristic quantitities1 description in terms of a few characteristic quantitities1 description in terms of a few characteristic quantitities

( here ℝ2N rarr ℝ7 )

The Dynamics of Learning Vector Quantization RUG 10012005

N

1jjσjsσ

N

1jjsσs R x ll === sumsum

==

Bww jξ

completely specified in terms of first and second moments (wo indices micro)

in the thermodynamic limit thermodynamic limit thermodynamic limit thermodynamic limit N rarrrarrrarrrarr infininfininfininfin

random vector acc to σ)|P( micro rarrξξξξmicromicro

micro1-micros

micros

By

wx

ξξξξ

ξξξξ

sdot=

sdot=

ττ

correlated Gaussianrandom quantities

stσtσsσt s Q xx- xx = τττ sσσsσ s R yx- yx =

ρττρτρ δ yy- yyσσσ

===

=

else

σ ifsσσ

y0

Sl

l δτ

2 average over the current example2 average over the current example2 average over the current example2 average over the current example

rarrrarrrarrrarr averaged recursionsaveraged recursionsaveraged recursionsaveraged recursions closed in Rsσ Qst pσ

1σσ LL sum

plusmn=

=

The Dynamics of Learning Vector Quantization RUG 10012005

characteristic quantities

- depend on the random sequence of example data

- their variance vanishes with N rarrrarrrarrrarr infininfininfininfin (here prop N-1)

microsσ

microst

R Q

learning dynamics is completely described in terms of averagesaveragesaveragesaverages

3 self3 self3 self3 self----averaging propertiesaveraging propertiesaveraging propertiesaveraging properties

4 continuous learning time4 continuous learning time4 continuous learning time4 continuous learning time

N

micro α =

of examples

of learning stepsper degree of freedom

) α (R ) α (Q sσst

recursions rarr coupled ordinary differential equations

rarr evolution of projections

The Dynamics of Learning Vector Quantization RUG 10012005

probability for misclassification of a novel example

( ) ( ) minusminusΘ++minusΘ= +minusminusminus++ ddpddp gεεεε

( ) ( )

Φminus+

Φ=

minusminus+

minus+minus

++

minus+minus

minusminusminus

++minus

minusminus

minusminus+

minus+minus

++

+minusminus

++minus

minusminusminus

+++

QQQ

RR2QQ

QQQ

RR2QQpp

22 2

1

2

1 ll

L

5 learning curve5 learning curve5 learning curve5 learning curve

generalization error generalization error generalization error generalization error εεεεgggg((((αααα)))) after training with α N examples

- repulsiveattractive fixed points of the dynamics

- asymptotic behavior for αrarrinfin- dependence on learning rate separation initialization

-

investigation and comparison of given algorithmsinvestigation and comparison of given algorithmsinvestigation and comparison of given algorithmsinvestigation and comparison of given algorithms

- time-dependent learning rate η(α)

- variational optimization wrt fs[]

-

optimization and development of new prescriptionsoptimization and development of new prescriptionsoptimization and development of new prescriptionsoptimization and development of new prescriptions

maximizeα

g

d

d εεεε

The Dynamics of Learning Vector Quantization RUG 10012005

optimal classificationoptimal classificationoptimal classificationoptimal classification with minimal generalization error

BBBB-

BBBB+

(p-gtp+ )

(p+)

separation of classes by the plane with 1)σP(p 1)σP(p +==minus= +minus ξξξξξξξξin the model situation (equal variances of clusters)

excess error

minimal εg as a function

of prior weights ℓ=2

εg

025

050

005 100 p+

ℓ=1

ℓ=0

The Dynamics of Learning Vector Quantization RUG 10012005

ldquoLVQ 21ldquo update the correct and wrong winner

( ) 1-micros

micro1-micros

micros Sσ

N

ηwwwwξξξξwwwwwwww minus+=

(analytical)integrationfor wwwws(0) = 0

( ) ( )

( ) ( ) KKll

Kll

αmηαmη

αmηαmη

e12

m1

mRe1

2

m1

mR

Qe12

m1

mRe1

2

m1

mR

++minusminusminus

++minus

minus+minus

++

minus+

=+minusminus

minus=

=minusminus

minus=minus+

=

pσ = (1+m σ ) 2 (mgt0)

[Seo Obermeyer] LVQ21 ս cost function

(likelihood ratios)

infinrarrinfinrarrminus+minusminus+minusminusminus

++minus+++

αQQRR

Q R R

with

finite remain

Q ++ R ++ R minus+

R +minus Q minus+

Q minusminus R minusminus

α 102 4 86

6-

0

6theory and simulation (N=100)p+=08 ℓ=1 η=05averages over 100 independent runs

The Dynamics of Learning Vector Quantization RUG 10012005

(p- )

(p+gt p-)

sssstrategiestrategiestrategiestrategies

- selection of dataselection of dataselection of dataselection of data in a window close to the current decision boundary

slows down the repulsion system remains instable

- Soft Robust Learning Vector QuantizationSoft Robust Learning Vector QuantizationSoft Robust Learning Vector QuantizationSoft Robust Learning Vector Quantization [Seo amp Obermayer]

density-estimation based cost function

limiting case Learning from mistakes Learning from mistakes Learning from mistakes Learning from mistakes LVQ21-step only

if the example is currently misclassified

slow learning poor generalization

problem instability of the algorithm

due to repulsion of wrong prototypes

trivial classification fuumlr αrarrinfin

εg = max p+p-

The Dynamics of Learning Vector Quantization RUG 10012005

ldquo The winner takes it all rdquoldquo The winner takes it all rdquoldquo The winner takes it all rdquoldquo The winner takes it all rdquo

numericalintegrationfor wwwws(0)=0

theory and simulation (N=200)p+=02 ℓ=12 η=12averaged over 100 indep runs

Q++

Q--

Q+-

α

wwww++++

wwww----

ℓℓℓℓ BBBB++++

ℓℓℓℓ BBBB----

trajectories in the (B+B- )-plane

(bull) α=2040140

optimal decision boundary____ asymptotic position

RS+

RS-

R--

R-+

R--

R++

winner wwwws plusmn1

I) LVQ 1LVQ 1LVQ 1LVQ 1 [Kohonen] [ ] ( ) 1-micros

micromicromicroS

microS

1-micros

micros Sσdd

N

ηwwwwξξξξwwwwwwww minusminusΘ+= minus

only the winner is updated according to the class membership

wwww-

The Dynamics of Learning Vector Quantization RUG 10012005

learning curvelearning curvelearning curvelearning curve

α

εg η=12

(p+=02 ℓ=12)

εg (αrarrinfin) grows lin with η

- stationary state

- role of the learning rate

α100 200 300

εg

026

022

018

0140

η

20

04

02

ηrarr0 - variable rate η(α)

- wellwellwellwell----defined asymptoticsdefined asymptoticsdefined asymptoticsdefined asymptotics

(ODE linear in η)

10

εg

20 30 40 500014

026

022

018

min εg

(η α)

ηrarr0

η rarr0 αrarrinfin

( η α ) rarr infin

suboptimal

The Dynamics of Learning Vector Quantization RUG 10012005

ldquo The winner takes it all ldquoldquo The winner takes it all ldquoldquo The winner takes it all ldquoldquo The winner takes it all ldquo

II ) LVQ+LVQ+LVQ+LVQ+ ( only positive steps without repulsion)

[ ] ( ) ( ) 1-micros

microS

microσ

microS

microS

1-micros

micros δdd

N

ηwwwwξξξξwwwwwwww minusminusΘ+= minus

winner correct

αrarrinfin asymptotic configuration

symmetric about ℓℓℓℓ (B(B(B(B+++++B+B+B+B----)2)2)2)2

wwww-

wwww+

ℓ ℓ ℓ ℓ BBBB+

ℓ ℓ ℓ ℓ BBBB-

p+=02 ℓ=12 η=12

classification scheme and the

achieved generalization error are

independent of the independent of the independent of the independent of the prior weights prior weights prior weights prior weights ppppplusmnplusmnplusmnplusmn

(and optimal for ppppplusmnplusmnplusmnplusmn = 12 )

LVQ+ asymp VQ within the classes

(ws updated only from class S)

The Dynamics of Learning Vector Quantization RUG 10012005

- LVQ 21

trivial assignment to the

more frequent class

optimal classification

εg

pppp++++

min p+p-

- LVQ 1

here close to optimal

classification

pppp++++

- LVQ+

min-max solution

pplusmn -independent classification

p+=02 ℓ=10 η=10εg

α

learning curveslearning curveslearning curveslearning curves

LVQ+

LVQ1

asymptotics ηrarr0 (ηα)rarrinfin

The Dynamics of Learning Vector Quantization RUG 10012005

Vector QuantizationVector QuantizationVector QuantizationVector Quantization

competitive learning [ ] ( ) 1-micros

micromicroS

microS

1-micros

micros dd

N

ηwwwwξξξξwwwwwwww minusminusΘ+= minus

wwwws winner

class membership is unknown

or identical for all data

numerical integration for wwwws(0)asymp0

( p+=02 ℓ=10 η=12 )

εg

α

VQ

LVQ+

LVQ1

αα

R++

R+-

R-+

R--

100 200 3000

0

10

system is invariant under

exchange of the prototypes

rarr weakly repulsive fixed points

The Dynamics of Learning Vector Quantization RUG 10012005

interpretations

- VQ unsupervised learningunlabelled data

- LVQ two prototypes of thesame class identical labels

- LVQ different classes butlabels are not used in training

εg

pppp++++

asymptotics (αrarrηrarr0 ηαrarrinfin)

pppp++++asymp0 asymp0 asymp0 asymp0

pppp----asymp1 asymp1 asymp1 asymp1

- low quantization error

- high gen error εg

The Dynamics of Learning Vector Quantization RUG 10012005

work in progress outlookwork in progress outlookwork in progress outlookwork in progress outlook

bull regularization of LVQ 21 Robust Soft LVQ [Seo Obermayer]

bull model different cluster variances more clustersprototypes

bull optimized procedures learning rate schedules

variational approach density estimation Bayes optimal on-line

bull several classes and prototypes

Summary

bullprototypeprototypeprototypeprototype----based learningbased learningbased learningbased learning

Vector Quantization and Learning Vector Quantization

bulla model scenarioa model scenarioa model scenarioa model scenario two clusters two prototypes

dynamics of online training

bullcomparison of algorithmscomparison of algorithmscomparison of algorithmscomparison of algorithms

LVQ 21 instability trivial (stationary) classification

LVQ 1 close to optimal asymptotic generalization

LVQ + min-max solution wrt asymptotic generalization

VQ symmetry breaking representation

The Dynamics of Learning Vector Quantization RUG 10012005

Perspectives

bullSelfSelfSelfSelf----Organizing Maps (SOM)Organizing Maps (SOM)Organizing Maps (SOM)Organizing Maps (SOM)

(many) N-dim prototypes form a (low) d-dimensional grid

representation of data in a topology preserving map

neighborhood preserving SOM Neural Gas (distance based)

bullGeneralized Relevance LVQ Generalized Relevance LVQ Generalized Relevance LVQ Generalized Relevance LVQ [Hammer amp Villmann]

adaptive metrics eg distance measure ( )sum=

minus=N

i

iii w

1

2)( sλ ξξwd λ

training

bullapplications applications applications applications

Page 8: The Dynamics of Learning Vector Quantizationlibvolume3.xyz/computers/btech/semester6/datacompression/vectorquantization...The Dynamics of Learning Vector Quantization, RUG, 10.01.2005

The Dynamics of Learning Vector Quantization RUG 10012005

LVQ algorithms LVQ algorithms LVQ algorithms LVQ algorithms

- are frequently applied in a variety of problems involving

the classification of structured data a few examples

- appear plausible intuitive flexible

- are fast easy to implement

- real time speech recognition

- medical diagnosis eg from histological data

- texture recognition and classification

- gene expression data analysis

-

The Dynamics of Learning Vector Quantization RUG 10012005

illustrationillustrationillustrationillustration microscopic images of (pig) semen cells after freezingand storage co Lidia Sanchez-Gonzalez LeonSpain

The Dynamics of Learning Vector Quantization RUG 10012005

healthy cells damaged cells

prototypes obtained by LVQ (1)

illustrationillustrationillustrationillustration microscopic images of (pig) semen cells after freezingand storage co Lidia Sanchez-Gonzalez LeonSpain

The Dynamics of Learning Vector Quantization RUG 10012005

LVQ algorithms LVQ algorithms LVQ algorithms LVQ algorithms

- are often based on purely heuristic arguments

or derived from a cost function with unclear

relation to the generalization ability

- almost exclusively use the Euclidean distance measure

inappropriate for heterogeneous data

- lack in general a thorough theoretical understanding of

dynamics convergence properties

performance wrt generalization etc

The Dynamics of Learning Vector Quantization RUG 10012005

In the following

analysis of LVQ algorithms wrt

- dynamics of the learning process

- performance ie generalization ability

- asymptotic behavior in the limit of many examples

typical behavior in a model situation

- randomized high-dimensional data

- essential features of LVQ learning

aim - contribute to the theoretical understanding- develop efficient LVQ schemes- test in applications

The Dynamics of Learning Vector Quantization RUG 10012005

model situationmodel situationmodel situationmodel situation two clusters of N-dimensional data

random vectors ξξξξ isin ℝN according to σ)P(p )P(

1σσ ξξξξξξξξ sum

plusmn=

=

( )( )

minus=

2

σN2-

2

1exp

1σ)P( ΒΒΒΒξξξξξξξξ lmixture of two Gaussians

orthonormal center vectors

BBBB+ BBBB- isin ℝN ( BBBBσ )2 =1 BBBB+ BBBB- =0

prior weights of classes p+ p-p+ + p- = 1

BBBB+

BBBB-

(p+)

(p-)

separation ℓℓ

jj Bσσξ l=

22222l Nξ1ξξ

N

1σσ

+==rarr=minus sum=j

jjj ξξξξ

independent components

The Dynamics of Learning Vector Quantization RUG 10012005

high-dimensional data (formally Nrarrinfin)

400 examples ξξξξmicro isinℝN N=200 ℓ=1 p+=06micro

By

ξξ ξξ sdot=

minusminus

(240)

(160)

projections into the plane of center vectors B+ B-

microBy ξξξξsdot= ++

micro2

2xξξ ξξ

ww wwsdot

=

(240)(160)

projections in two independent random directions wwww12

micro11x ξξξξwwww sdot=

model for studying typical behavior of LVQ algorithmsnot density-estimation based classification

NoteNoteNoteNote

The Dynamics of Learning Vector Quantization RUG 10012005

dynamics of ondynamics of ondynamics of ondynamics of on----line trainingline trainingline trainingline training

sequence of independent random data ( )123micromicro =ξξξξ acc to ( )microP ξξξξ

learning ratestep size

competitiondirection ofupdate etc

change of prototypetowards or away from the current data

above examples

unsupervisedunsupervisedunsupervisedunsupervised Vector QuantizationVector QuantizationVector QuantizationVector Quantization [ ] ( ) dd fmicros

micross minusΘ= minus

The Winner Takes It All (classes irrelevantunknown)

Learning Vector Quantization ldquo21rdquo Learning Vector Quantization ldquo21rdquo Learning Vector Quantization ldquo21rdquo Learning Vector Quantization ldquo21rdquo [ ] σS fs)(1)(1

classcorrectclasswrong

+minus=sdot=

here two prototypes noexplicit competition

[ ] ( ) 1-micros

micromicros-

micross

1-micros

micros σSddf

N

ηwwwwξξξξwwwwwwww minus+=

( )21minusminus=

plusmn=

micros

micromicrosd

1σS

wwwwξξξξ

update of prototype vectors

The Dynamics of Learning Vector Quantization RUG 10012005

[ ] ( )

[ ] ( ) [ ] ( ) [ ] [ ] ( )Ν

1Οffη QxfηQxfη1N

QQ

Ryfη1N

RR

ts1-micro

stmicrost

1-microst

microts

1-microst

microst

1-microsσ

microσs

1-microsσ

microsσ

++minus+minus=minus

minus=minus

2

[ ] ( ) 1-micros

micromicros-

micross

1-micros

micros σSddf

N

ηwwwwξξξξwwwwwwww minus+= rarrrarrrarrrarr recursionsrecursionsrecursionsrecursions

mathematical analysis of the learning dynamicsmathematical analysis of the learning dynamicsmathematical analysis of the learning dynamicsmathematical analysis of the learning dynamics

( ) ( ) 1221 -micross

micros

micromicros

micromicros Q2xd +minus=minus= minus ξξξξwwwwξξξξ

micromicromicro1-micros

micros ξByx sdot=sdot= ττξξξξwwwwprojections

distances

random vector ξmicro enters only in the form of

( )11 +minusisinsdot=sdot= σtsmicrot

micros

microstσ

micros

microsσ QBR wwwwwwwwwwww

projections in the (B+ B- )-plane

length and relativeposition of prototypes

1 description in terms of a few characteristic quantitities1 description in terms of a few characteristic quantitities1 description in terms of a few characteristic quantitities1 description in terms of a few characteristic quantitities

( here ℝ2N rarr ℝ7 )

The Dynamics of Learning Vector Quantization RUG 10012005

N

1jjσjsσ

N

1jjsσs R x ll === sumsum

==

Bww jξ

completely specified in terms of first and second moments (wo indices micro)

in the thermodynamic limit thermodynamic limit thermodynamic limit thermodynamic limit N rarrrarrrarrrarr infininfininfininfin

random vector acc to σ)|P( micro rarrξξξξmicromicro

micro1-micros

micros

By

wx

ξξξξ

ξξξξ

sdot=

sdot=

ττ

correlated Gaussianrandom quantities

stσtσsσt s Q xx- xx = τττ sσσsσ s R yx- yx =

ρττρτρ δ yy- yyσσσ

===

=

else

σ ifsσσ

y0

Sl

l δτ

2 average over the current example2 average over the current example2 average over the current example2 average over the current example

rarrrarrrarrrarr averaged recursionsaveraged recursionsaveraged recursionsaveraged recursions closed in Rsσ Qst pσ

1σσ LL sum

plusmn=

=

The Dynamics of Learning Vector Quantization RUG 10012005

characteristic quantities

- depend on the random sequence of example data

- their variance vanishes with N rarrrarrrarrrarr infininfininfininfin (here prop N-1)

microsσ

microst

R Q

learning dynamics is completely described in terms of averagesaveragesaveragesaverages

3 self3 self3 self3 self----averaging propertiesaveraging propertiesaveraging propertiesaveraging properties

4 continuous learning time4 continuous learning time4 continuous learning time4 continuous learning time

N

micro α =

of examples

of learning stepsper degree of freedom

) α (R ) α (Q sσst

recursions rarr coupled ordinary differential equations

rarr evolution of projections

The Dynamics of Learning Vector Quantization RUG 10012005

probability for misclassification of a novel example

( ) ( ) minusminusΘ++minusΘ= +minusminusminus++ ddpddp gεεεε

( ) ( )

Φminus+

Φ=

minusminus+

minus+minus

++

minus+minus

minusminusminus

++minus

minusminus

minusminus+

minus+minus

++

+minusminus

++minus

minusminusminus

+++

QQQ

RR2QQ

QQQ

RR2QQpp

22 2

1

2

1 ll

L

5 learning curve5 learning curve5 learning curve5 learning curve

generalization error generalization error generalization error generalization error εεεεgggg((((αααα)))) after training with α N examples

- repulsiveattractive fixed points of the dynamics

- asymptotic behavior for αrarrinfin- dependence on learning rate separation initialization

-

investigation and comparison of given algorithmsinvestigation and comparison of given algorithmsinvestigation and comparison of given algorithmsinvestigation and comparison of given algorithms

- time-dependent learning rate η(α)

- variational optimization wrt fs[]

-

optimization and development of new prescriptionsoptimization and development of new prescriptionsoptimization and development of new prescriptionsoptimization and development of new prescriptions

maximizeα

g

d

d εεεε

The Dynamics of Learning Vector Quantization RUG 10012005

optimal classificationoptimal classificationoptimal classificationoptimal classification with minimal generalization error

BBBB-

BBBB+

(p-gtp+ )

(p+)

separation of classes by the plane with 1)σP(p 1)σP(p +==minus= +minus ξξξξξξξξin the model situation (equal variances of clusters)

excess error

minimal εg as a function

of prior weights ℓ=2

εg

025

050

005 100 p+

ℓ=1

ℓ=0

The Dynamics of Learning Vector Quantization RUG 10012005

ldquoLVQ 21ldquo update the correct and wrong winner

( ) 1-micros

micro1-micros

micros Sσ

N

ηwwwwξξξξwwwwwwww minus+=

(analytical)integrationfor wwwws(0) = 0

( ) ( )

( ) ( ) KKll

Kll

αmηαmη

αmηαmη

e12

m1

mRe1

2

m1

mR

Qe12

m1

mRe1

2

m1

mR

++minusminusminus

++minus

minus+minus

++

minus+

=+minusminus

minus=

=minusminus

minus=minus+

=

pσ = (1+m σ ) 2 (mgt0)

[Seo Obermeyer] LVQ21 ս cost function

(likelihood ratios)

infinrarrinfinrarrminus+minusminus+minusminusminus

++minus+++

αQQRR

Q R R

with

finite remain

Q ++ R ++ R minus+

R +minus Q minus+

Q minusminus R minusminus

α 102 4 86

6-

0

6theory and simulation (N=100)p+=08 ℓ=1 η=05averages over 100 independent runs

The Dynamics of Learning Vector Quantization RUG 10012005

(p- )

(p+gt p-)

sssstrategiestrategiestrategiestrategies

- selection of dataselection of dataselection of dataselection of data in a window close to the current decision boundary

slows down the repulsion system remains instable

- Soft Robust Learning Vector QuantizationSoft Robust Learning Vector QuantizationSoft Robust Learning Vector QuantizationSoft Robust Learning Vector Quantization [Seo amp Obermayer]

density-estimation based cost function

limiting case Learning from mistakes Learning from mistakes Learning from mistakes Learning from mistakes LVQ21-step only

if the example is currently misclassified

slow learning poor generalization

problem instability of the algorithm

due to repulsion of wrong prototypes

trivial classification fuumlr αrarrinfin

εg = max p+p-

The Dynamics of Learning Vector Quantization RUG 10012005

ldquo The winner takes it all rdquoldquo The winner takes it all rdquoldquo The winner takes it all rdquoldquo The winner takes it all rdquo

numericalintegrationfor wwwws(0)=0

theory and simulation (N=200)p+=02 ℓ=12 η=12averaged over 100 indep runs

Q++

Q--

Q+-

α

wwww++++

wwww----

ℓℓℓℓ BBBB++++

ℓℓℓℓ BBBB----

trajectories in the (B+B- )-plane

(bull) α=2040140

optimal decision boundary____ asymptotic position

RS+

RS-

R--

R-+

R--

R++

winner wwwws plusmn1

I) LVQ 1LVQ 1LVQ 1LVQ 1 [Kohonen] [ ] ( ) 1-micros

micromicromicroS

microS

1-micros

micros Sσdd

N

ηwwwwξξξξwwwwwwww minusminusΘ+= minus

only the winner is updated according to the class membership

wwww-

The Dynamics of Learning Vector Quantization RUG 10012005

learning curvelearning curvelearning curvelearning curve

α

εg η=12

(p+=02 ℓ=12)

εg (αrarrinfin) grows lin with η

- stationary state

- role of the learning rate

α100 200 300

εg

026

022

018

0140

η

20

04

02

ηrarr0 - variable rate η(α)

- wellwellwellwell----defined asymptoticsdefined asymptoticsdefined asymptoticsdefined asymptotics

(ODE linear in η)

10

εg

20 30 40 500014

026

022

018

min εg

(η α)

ηrarr0

η rarr0 αrarrinfin

( η α ) rarr infin

suboptimal

The Dynamics of Learning Vector Quantization RUG 10012005

ldquo The winner takes it all ldquoldquo The winner takes it all ldquoldquo The winner takes it all ldquoldquo The winner takes it all ldquo

II ) LVQ+LVQ+LVQ+LVQ+ ( only positive steps without repulsion)

[ ] ( ) ( ) 1-micros

microS

microσ

microS

microS

1-micros

micros δdd

N

ηwwwwξξξξwwwwwwww minusminusΘ+= minus

winner correct

αrarrinfin asymptotic configuration

symmetric about ℓℓℓℓ (B(B(B(B+++++B+B+B+B----)2)2)2)2

wwww-

wwww+

ℓ ℓ ℓ ℓ BBBB+

ℓ ℓ ℓ ℓ BBBB-

p+=02 ℓ=12 η=12

classification scheme and the

achieved generalization error are

independent of the independent of the independent of the independent of the prior weights prior weights prior weights prior weights ppppplusmnplusmnplusmnplusmn

(and optimal for ppppplusmnplusmnplusmnplusmn = 12 )

LVQ+ asymp VQ within the classes

(ws updated only from class S)

The Dynamics of Learning Vector Quantization RUG 10012005

- LVQ 21

trivial assignment to the

more frequent class

optimal classification

εg

pppp++++

min p+p-

- LVQ 1

here close to optimal

classification

pppp++++

- LVQ+

min-max solution

pplusmn -independent classification

p+=02 ℓ=10 η=10εg

α

learning curveslearning curveslearning curveslearning curves

LVQ+

LVQ1

asymptotics ηrarr0 (ηα)rarrinfin

The Dynamics of Learning Vector Quantization RUG 10012005

Vector QuantizationVector QuantizationVector QuantizationVector Quantization

competitive learning [ ] ( ) 1-micros

micromicroS

microS

1-micros

micros dd

N

ηwwwwξξξξwwwwwwww minusminusΘ+= minus

wwwws winner

class membership is unknown

or identical for all data

numerical integration for wwwws(0)asymp0

( p+=02 ℓ=10 η=12 )

εg

α

VQ

LVQ+

LVQ1

αα

R++

R+-

R-+

R--

100 200 3000

0

10

system is invariant under

exchange of the prototypes

rarr weakly repulsive fixed points

The Dynamics of Learning Vector Quantization RUG 10012005

interpretations

- VQ unsupervised learningunlabelled data

- LVQ two prototypes of thesame class identical labels

- LVQ different classes butlabels are not used in training

εg

pppp++++

asymptotics (αrarrηrarr0 ηαrarrinfin)

pppp++++asymp0 asymp0 asymp0 asymp0

pppp----asymp1 asymp1 asymp1 asymp1

- low quantization error

- high gen error εg

The Dynamics of Learning Vector Quantization RUG 10012005

work in progress outlookwork in progress outlookwork in progress outlookwork in progress outlook

bull regularization of LVQ 21 Robust Soft LVQ [Seo Obermayer]

bull model different cluster variances more clustersprototypes

bull optimized procedures learning rate schedules

variational approach density estimation Bayes optimal on-line

bull several classes and prototypes

Summary

bullprototypeprototypeprototypeprototype----based learningbased learningbased learningbased learning

Vector Quantization and Learning Vector Quantization

bulla model scenarioa model scenarioa model scenarioa model scenario two clusters two prototypes

dynamics of online training

bullcomparison of algorithmscomparison of algorithmscomparison of algorithmscomparison of algorithms

LVQ 21 instability trivial (stationary) classification

LVQ 1 close to optimal asymptotic generalization

LVQ + min-max solution wrt asymptotic generalization

VQ symmetry breaking representation

The Dynamics of Learning Vector Quantization RUG 10012005

Perspectives

bullSelfSelfSelfSelf----Organizing Maps (SOM)Organizing Maps (SOM)Organizing Maps (SOM)Organizing Maps (SOM)

(many) N-dim prototypes form a (low) d-dimensional grid

representation of data in a topology preserving map

neighborhood preserving SOM Neural Gas (distance based)

bullGeneralized Relevance LVQ Generalized Relevance LVQ Generalized Relevance LVQ Generalized Relevance LVQ [Hammer amp Villmann]

adaptive metrics eg distance measure ( )sum=

minus=N

i

iii w

1

2)( sλ ξξwd λ

training

bullapplications applications applications applications

Page 9: The Dynamics of Learning Vector Quantizationlibvolume3.xyz/computers/btech/semester6/datacompression/vectorquantization...The Dynamics of Learning Vector Quantization, RUG, 10.01.2005

The Dynamics of Learning Vector Quantization RUG 10012005

illustrationillustrationillustrationillustration microscopic images of (pig) semen cells after freezingand storage co Lidia Sanchez-Gonzalez LeonSpain

The Dynamics of Learning Vector Quantization RUG 10012005

healthy cells damaged cells

prototypes obtained by LVQ (1)

illustrationillustrationillustrationillustration microscopic images of (pig) semen cells after freezingand storage co Lidia Sanchez-Gonzalez LeonSpain

The Dynamics of Learning Vector Quantization RUG 10012005

LVQ algorithms LVQ algorithms LVQ algorithms LVQ algorithms

- are often based on purely heuristic arguments

or derived from a cost function with unclear

relation to the generalization ability

- almost exclusively use the Euclidean distance measure

inappropriate for heterogeneous data

- lack in general a thorough theoretical understanding of

dynamics convergence properties

performance wrt generalization etc

The Dynamics of Learning Vector Quantization RUG 10012005

In the following

analysis of LVQ algorithms wrt

- dynamics of the learning process

- performance ie generalization ability

- asymptotic behavior in the limit of many examples

typical behavior in a model situation

- randomized high-dimensional data

- essential features of LVQ learning

aim - contribute to the theoretical understanding- develop efficient LVQ schemes- test in applications

The Dynamics of Learning Vector Quantization RUG 10012005

model situationmodel situationmodel situationmodel situation two clusters of N-dimensional data

random vectors ξξξξ isin ℝN according to σ)P(p )P(

1σσ ξξξξξξξξ sum

plusmn=

=

( )( )

minus=

2

σN2-

2

1exp

1σ)P( ΒΒΒΒξξξξξξξξ lmixture of two Gaussians

orthonormal center vectors

BBBB+ BBBB- isin ℝN ( BBBBσ )2 =1 BBBB+ BBBB- =0

prior weights of classes p+ p-p+ + p- = 1

BBBB+

BBBB-

(p+)

(p-)

separation ℓℓ

jj Bσσξ l=

22222l Nξ1ξξ

N

1σσ

+==rarr=minus sum=j

jjj ξξξξ

independent components

The Dynamics of Learning Vector Quantization RUG 10012005

high-dimensional data (formally Nrarrinfin)

400 examples ξξξξmicro isinℝN N=200 ℓ=1 p+=06micro

By

ξξ ξξ sdot=

minusminus

(240)

(160)

projections into the plane of center vectors B+ B-

microBy ξξξξsdot= ++

micro2

2xξξ ξξ

ww wwsdot

=

(240)(160)

projections in two independent random directions wwww12

micro11x ξξξξwwww sdot=

model for studying typical behavior of LVQ algorithmsnot density-estimation based classification

NoteNoteNoteNote

The Dynamics of Learning Vector Quantization RUG 10012005

dynamics of ondynamics of ondynamics of ondynamics of on----line trainingline trainingline trainingline training

sequence of independent random data ( )123micromicro =ξξξξ acc to ( )microP ξξξξ

learning ratestep size

competitiondirection ofupdate etc

change of prototypetowards or away from the current data

above examples

unsupervisedunsupervisedunsupervisedunsupervised Vector QuantizationVector QuantizationVector QuantizationVector Quantization [ ] ( ) dd fmicros

micross minusΘ= minus

The Winner Takes It All (classes irrelevantunknown)

Learning Vector Quantization ldquo21rdquo Learning Vector Quantization ldquo21rdquo Learning Vector Quantization ldquo21rdquo Learning Vector Quantization ldquo21rdquo [ ] σS fs)(1)(1

classcorrectclasswrong

+minus=sdot=

here two prototypes noexplicit competition

[ ] ( ) 1-micros

micromicros-

micross

1-micros

micros σSddf

N

ηwwwwξξξξwwwwwwww minus+=

( )21minusminus=

plusmn=

micros

micromicrosd

1σS

wwwwξξξξ

update of prototype vectors

The Dynamics of Learning Vector Quantization RUG 10012005

[ ] ( )

[ ] ( ) [ ] ( ) [ ] [ ] ( )Ν

1Οffη QxfηQxfη1N

QQ

Ryfη1N

RR

ts1-micro

stmicrost

1-microst

microts

1-microst

microst

1-microsσ

microσs

1-microsσ

microsσ

++minus+minus=minus

minus=minus

2

[ ] ( ) 1-micros

micromicros-

micross

1-micros

micros σSddf

N

ηwwwwξξξξwwwwwwww minus+= rarrrarrrarrrarr recursionsrecursionsrecursionsrecursions

mathematical analysis of the learning dynamicsmathematical analysis of the learning dynamicsmathematical analysis of the learning dynamicsmathematical analysis of the learning dynamics

( ) ( ) 1221 -micross

micros

micromicros

micromicros Q2xd +minus=minus= minus ξξξξwwwwξξξξ

micromicromicro1-micros

micros ξByx sdot=sdot= ττξξξξwwwwprojections

distances

random vector ξmicro enters only in the form of

( )11 +minusisinsdot=sdot= σtsmicrot

micros

microstσ

micros

microsσ QBR wwwwwwwwwwww

projections in the (B+ B- )-plane

length and relativeposition of prototypes

1 description in terms of a few characteristic quantitities1 description in terms of a few characteristic quantitities1 description in terms of a few characteristic quantitities1 description in terms of a few characteristic quantitities

( here ℝ2N rarr ℝ7 )

The Dynamics of Learning Vector Quantization RUG 10012005

N

1jjσjsσ

N

1jjsσs R x ll === sumsum

==

Bww jξ

completely specified in terms of first and second moments (wo indices micro)

in the thermodynamic limit thermodynamic limit thermodynamic limit thermodynamic limit N rarrrarrrarrrarr infininfininfininfin

random vector acc to σ)|P( micro rarrξξξξmicromicro

micro1-micros

micros

By

wx

ξξξξ

ξξξξ

sdot=

sdot=

ττ

correlated Gaussianrandom quantities

stσtσsσt s Q xx- xx = τττ sσσsσ s R yx- yx =

ρττρτρ δ yy- yyσσσ

===

=

else

σ ifsσσ

y0

Sl

l δτ

2 average over the current example2 average over the current example2 average over the current example2 average over the current example

rarrrarrrarrrarr averaged recursionsaveraged recursionsaveraged recursionsaveraged recursions closed in Rsσ Qst pσ

1σσ LL sum

plusmn=

=

The Dynamics of Learning Vector Quantization RUG 10012005

characteristic quantities

- depend on the random sequence of example data

- their variance vanishes with N rarrrarrrarrrarr infininfininfininfin (here prop N-1)

microsσ

microst

R Q

learning dynamics is completely described in terms of averagesaveragesaveragesaverages

3 self3 self3 self3 self----averaging propertiesaveraging propertiesaveraging propertiesaveraging properties

4 continuous learning time4 continuous learning time4 continuous learning time4 continuous learning time

N

micro α =

of examples

of learning stepsper degree of freedom

) α (R ) α (Q sσst

recursions rarr coupled ordinary differential equations

rarr evolution of projections

The Dynamics of Learning Vector Quantization RUG 10012005

probability for misclassification of a novel example

( ) ( ) minusminusΘ++minusΘ= +minusminusminus++ ddpddp gεεεε

( ) ( )

Φminus+

Φ=

minusminus+

minus+minus

++

minus+minus

minusminusminus

++minus

minusminus

minusminus+

minus+minus

++

+minusminus

++minus

minusminusminus

+++

QQQ

RR2QQ

QQQ

RR2QQpp

22 2

1

2

1 ll

L

5 learning curve5 learning curve5 learning curve5 learning curve

generalization error generalization error generalization error generalization error εεεεgggg((((αααα)))) after training with α N examples

- repulsiveattractive fixed points of the dynamics

- asymptotic behavior for αrarrinfin- dependence on learning rate separation initialization

-

investigation and comparison of given algorithmsinvestigation and comparison of given algorithmsinvestigation and comparison of given algorithmsinvestigation and comparison of given algorithms

- time-dependent learning rate η(α)

- variational optimization wrt fs[]

-

optimization and development of new prescriptionsoptimization and development of new prescriptionsoptimization and development of new prescriptionsoptimization and development of new prescriptions

maximizeα

g

d

d εεεε

The Dynamics of Learning Vector Quantization RUG 10012005

optimal classificationoptimal classificationoptimal classificationoptimal classification with minimal generalization error

BBBB-

BBBB+

(p-gtp+ )

(p+)

separation of classes by the plane with 1)σP(p 1)σP(p +==minus= +minus ξξξξξξξξin the model situation (equal variances of clusters)

excess error

minimal εg as a function

of prior weights ℓ=2

εg

025

050

005 100 p+

ℓ=1

ℓ=0

The Dynamics of Learning Vector Quantization RUG 10012005

ldquoLVQ 21ldquo update the correct and wrong winner

( ) 1-micros

micro1-micros

micros Sσ

N

ηwwwwξξξξwwwwwwww minus+=

(analytical)integrationfor wwwws(0) = 0

( ) ( )

( ) ( ) KKll

Kll

αmηαmη

αmηαmη

e12

m1

mRe1

2

m1

mR

Qe12

m1

mRe1

2

m1

mR

++minusminusminus

++minus

minus+minus

++

minus+

=+minusminus

minus=

=minusminus

minus=minus+

=

pσ = (1+m σ ) 2 (mgt0)

[Seo Obermeyer] LVQ21 ս cost function

(likelihood ratios)

infinrarrinfinrarrminus+minusminus+minusminusminus

++minus+++

αQQRR

Q R R

with

finite remain

Q ++ R ++ R minus+

R +minus Q minus+

Q minusminus R minusminus

α 102 4 86

6-

0

6theory and simulation (N=100)p+=08 ℓ=1 η=05averages over 100 independent runs

The Dynamics of Learning Vector Quantization RUG 10012005

(p- )

(p+gt p-)

sssstrategiestrategiestrategiestrategies

- selection of dataselection of dataselection of dataselection of data in a window close to the current decision boundary

slows down the repulsion system remains instable

- Soft Robust Learning Vector QuantizationSoft Robust Learning Vector QuantizationSoft Robust Learning Vector QuantizationSoft Robust Learning Vector Quantization [Seo amp Obermayer]

density-estimation based cost function

limiting case Learning from mistakes Learning from mistakes Learning from mistakes Learning from mistakes LVQ21-step only

if the example is currently misclassified

slow learning poor generalization

problem instability of the algorithm

due to repulsion of wrong prototypes

trivial classification fuumlr αrarrinfin

εg = max p+p-

The Dynamics of Learning Vector Quantization RUG 10012005

ldquo The winner takes it all rdquoldquo The winner takes it all rdquoldquo The winner takes it all rdquoldquo The winner takes it all rdquo

numericalintegrationfor wwwws(0)=0

theory and simulation (N=200)p+=02 ℓ=12 η=12averaged over 100 indep runs

Q++

Q--

Q+-

α

wwww++++

wwww----

ℓℓℓℓ BBBB++++

ℓℓℓℓ BBBB----

trajectories in the (B+B- )-plane

(bull) α=2040140

optimal decision boundary____ asymptotic position

RS+

RS-

R--

R-+

R--

R++

winner wwwws plusmn1

I) LVQ 1LVQ 1LVQ 1LVQ 1 [Kohonen] [ ] ( ) 1-micros

micromicromicroS

microS

1-micros

micros Sσdd

N

ηwwwwξξξξwwwwwwww minusminusΘ+= minus

only the winner is updated according to the class membership

wwww-

The Dynamics of Learning Vector Quantization RUG 10012005

learning curvelearning curvelearning curvelearning curve

α

εg η=12

(p+=02 ℓ=12)

εg (αrarrinfin) grows lin with η

- stationary state

- role of the learning rate

α100 200 300

εg

026

022

018

0140

η

20

04

02

ηrarr0 - variable rate η(α)

- wellwellwellwell----defined asymptoticsdefined asymptoticsdefined asymptoticsdefined asymptotics

(ODE linear in η)

10

εg

20 30 40 500014

026

022

018

min εg

(η α)

ηrarr0

η rarr0 αrarrinfin

( η α ) rarr infin

suboptimal

The Dynamics of Learning Vector Quantization RUG 10012005

ldquo The winner takes it all ldquoldquo The winner takes it all ldquoldquo The winner takes it all ldquoldquo The winner takes it all ldquo

II ) LVQ+LVQ+LVQ+LVQ+ ( only positive steps without repulsion)

[ ] ( ) ( ) 1-micros

microS

microσ

microS

microS

1-micros

micros δdd

N

ηwwwwξξξξwwwwwwww minusminusΘ+= minus

winner correct

αrarrinfin asymptotic configuration

symmetric about ℓℓℓℓ (B(B(B(B+++++B+B+B+B----)2)2)2)2

wwww-

wwww+

ℓ ℓ ℓ ℓ BBBB+

ℓ ℓ ℓ ℓ BBBB-

p+=02 ℓ=12 η=12

classification scheme and the

achieved generalization error are

independent of the independent of the independent of the independent of the prior weights prior weights prior weights prior weights ppppplusmnplusmnplusmnplusmn

(and optimal for ppppplusmnplusmnplusmnplusmn = 12 )

LVQ+ asymp VQ within the classes

(ws updated only from class S)

The Dynamics of Learning Vector Quantization RUG 10012005

- LVQ 21

trivial assignment to the

more frequent class

optimal classification

εg

pppp++++

min p+p-

- LVQ 1

here close to optimal

classification

pppp++++

- LVQ+

min-max solution

pplusmn -independent classification

p+=02 ℓ=10 η=10εg

α

learning curveslearning curveslearning curveslearning curves

LVQ+

LVQ1

asymptotics ηrarr0 (ηα)rarrinfin

The Dynamics of Learning Vector Quantization RUG 10012005

Vector QuantizationVector QuantizationVector QuantizationVector Quantization

competitive learning [ ] ( ) 1-micros

micromicroS

microS

1-micros

micros dd

N

ηwwwwξξξξwwwwwwww minusminusΘ+= minus

wwwws winner

class membership is unknown

or identical for all data

numerical integration for wwwws(0)asymp0

( p+=02 ℓ=10 η=12 )

εg

α

VQ

LVQ+

LVQ1

αα

R++

R+-

R-+

R--

100 200 3000

0

10

system is invariant under

exchange of the prototypes

rarr weakly repulsive fixed points

The Dynamics of Learning Vector Quantization RUG 10012005

interpretations

- VQ unsupervised learningunlabelled data

- LVQ two prototypes of thesame class identical labels

- LVQ different classes butlabels are not used in training

εg

pppp++++

asymptotics (αrarrηrarr0 ηαrarrinfin)

pppp++++asymp0 asymp0 asymp0 asymp0

pppp----asymp1 asymp1 asymp1 asymp1

- low quantization error

- high gen error εg

The Dynamics of Learning Vector Quantization RUG 10012005

work in progress outlookwork in progress outlookwork in progress outlookwork in progress outlook

bull regularization of LVQ 21 Robust Soft LVQ [Seo Obermayer]

bull model different cluster variances more clustersprototypes

bull optimized procedures learning rate schedules

variational approach density estimation Bayes optimal on-line

bull several classes and prototypes

Summary

bullprototypeprototypeprototypeprototype----based learningbased learningbased learningbased learning

Vector Quantization and Learning Vector Quantization

bulla model scenarioa model scenarioa model scenarioa model scenario two clusters two prototypes

dynamics of online training

bullcomparison of algorithmscomparison of algorithmscomparison of algorithmscomparison of algorithms

LVQ 21 instability trivial (stationary) classification

LVQ 1 close to optimal asymptotic generalization

LVQ + min-max solution wrt asymptotic generalization

VQ symmetry breaking representation

The Dynamics of Learning Vector Quantization RUG 10012005

Perspectives

bullSelfSelfSelfSelf----Organizing Maps (SOM)Organizing Maps (SOM)Organizing Maps (SOM)Organizing Maps (SOM)

(many) N-dim prototypes form a (low) d-dimensional grid

representation of data in a topology preserving map

neighborhood preserving SOM Neural Gas (distance based)

bullGeneralized Relevance LVQ Generalized Relevance LVQ Generalized Relevance LVQ Generalized Relevance LVQ [Hammer amp Villmann]

adaptive metrics eg distance measure ( )sum=

minus=N

i

iii w

1

2)( sλ ξξwd λ

training

bullapplications applications applications applications

Page 10: The Dynamics of Learning Vector Quantizationlibvolume3.xyz/computers/btech/semester6/datacompression/vectorquantization...The Dynamics of Learning Vector Quantization, RUG, 10.01.2005

The Dynamics of Learning Vector Quantization RUG 10012005

healthy cells damaged cells

prototypes obtained by LVQ (1)

illustrationillustrationillustrationillustration microscopic images of (pig) semen cells after freezingand storage co Lidia Sanchez-Gonzalez LeonSpain

The Dynamics of Learning Vector Quantization RUG 10012005

LVQ algorithms LVQ algorithms LVQ algorithms LVQ algorithms

- are often based on purely heuristic arguments

or derived from a cost function with unclear

relation to the generalization ability

- almost exclusively use the Euclidean distance measure

inappropriate for heterogeneous data

- lack in general a thorough theoretical understanding of

dynamics convergence properties

performance wrt generalization etc

The Dynamics of Learning Vector Quantization RUG 10012005

In the following

analysis of LVQ algorithms wrt

- dynamics of the learning process

- performance ie generalization ability

- asymptotic behavior in the limit of many examples

typical behavior in a model situation

- randomized high-dimensional data

- essential features of LVQ learning

aim - contribute to the theoretical understanding- develop efficient LVQ schemes- test in applications

The Dynamics of Learning Vector Quantization RUG 10012005

model situationmodel situationmodel situationmodel situation two clusters of N-dimensional data

random vectors ξξξξ isin ℝN according to σ)P(p )P(

1σσ ξξξξξξξξ sum

plusmn=

=

( )( )

minus=

2

σN2-

2

1exp

1σ)P( ΒΒΒΒξξξξξξξξ lmixture of two Gaussians

orthonormal center vectors

BBBB+ BBBB- isin ℝN ( BBBBσ )2 =1 BBBB+ BBBB- =0

prior weights of classes p+ p-p+ + p- = 1

BBBB+

BBBB-

(p+)

(p-)

separation ℓℓ

jj Bσσξ l=

22222l Nξ1ξξ

N

1σσ

+==rarr=minus sum=j

jjj ξξξξ

independent components

The Dynamics of Learning Vector Quantization RUG 10012005

high-dimensional data (formally Nrarrinfin)

400 examples ξξξξmicro isinℝN N=200 ℓ=1 p+=06micro

By

ξξ ξξ sdot=

minusminus

(240)

(160)

projections into the plane of center vectors B+ B-

microBy ξξξξsdot= ++

micro2

2xξξ ξξ

ww wwsdot

=

(240)(160)

projections in two independent random directions wwww12

micro11x ξξξξwwww sdot=

model for studying typical behavior of LVQ algorithmsnot density-estimation based classification

NoteNoteNoteNote

The Dynamics of Learning Vector Quantization RUG 10012005

dynamics of ondynamics of ondynamics of ondynamics of on----line trainingline trainingline trainingline training

sequence of independent random data ( )123micromicro =ξξξξ acc to ( )microP ξξξξ

learning ratestep size

competitiondirection ofupdate etc

change of prototypetowards or away from the current data

above examples

unsupervisedunsupervisedunsupervisedunsupervised Vector QuantizationVector QuantizationVector QuantizationVector Quantization [ ] ( ) dd fmicros

micross minusΘ= minus

The Winner Takes It All (classes irrelevantunknown)

Learning Vector Quantization ldquo21rdquo Learning Vector Quantization ldquo21rdquo Learning Vector Quantization ldquo21rdquo Learning Vector Quantization ldquo21rdquo [ ] σS fs)(1)(1

classcorrectclasswrong

+minus=sdot=

here two prototypes noexplicit competition

[ ] ( ) 1-micros

micromicros-

micross

1-micros

micros σSddf

N

ηwwwwξξξξwwwwwwww minus+=

( )21minusminus=

plusmn=

micros

micromicrosd

1σS

wwwwξξξξ

update of prototype vectors

The Dynamics of Learning Vector Quantization RUG 10012005

[ ] ( )

[ ] ( ) [ ] ( ) [ ] [ ] ( )Ν

1Οffη QxfηQxfη1N

QQ

Ryfη1N

RR

ts1-micro

stmicrost

1-microst

microts

1-microst

microst

1-microsσ

microσs

1-microsσ

microsσ

++minus+minus=minus

minus=minus

2

[ ] ( ) 1-micros

micromicros-

micross

1-micros

micros σSddf

N

ηwwwwξξξξwwwwwwww minus+= rarrrarrrarrrarr recursionsrecursionsrecursionsrecursions

mathematical analysis of the learning dynamicsmathematical analysis of the learning dynamicsmathematical analysis of the learning dynamicsmathematical analysis of the learning dynamics

( ) ( ) 1221 -micross

micros

micromicros

micromicros Q2xd +minus=minus= minus ξξξξwwwwξξξξ

micromicromicro1-micros

micros ξByx sdot=sdot= ττξξξξwwwwprojections

distances

random vector ξmicro enters only in the form of

( )11 +minusisinsdot=sdot= σtsmicrot

micros

microstσ

micros

microsσ QBR wwwwwwwwwwww

projections in the (B+ B- )-plane

length and relativeposition of prototypes

1 description in terms of a few characteristic quantitities1 description in terms of a few characteristic quantitities1 description in terms of a few characteristic quantitities1 description in terms of a few characteristic quantitities

( here ℝ2N rarr ℝ7 )

The Dynamics of Learning Vector Quantization RUG 10012005

N

1jjσjsσ

N

1jjsσs R x ll === sumsum

==

Bww jξ

completely specified in terms of first and second moments (wo indices micro)

in the thermodynamic limit thermodynamic limit thermodynamic limit thermodynamic limit N rarrrarrrarrrarr infininfininfininfin

random vector acc to σ)|P( micro rarrξξξξmicromicro

micro1-micros

micros

By

wx

ξξξξ

ξξξξ

sdot=

sdot=

ττ

correlated Gaussianrandom quantities

stσtσsσt s Q xx- xx = τττ sσσsσ s R yx- yx =

ρττρτρ δ yy- yyσσσ

===

=

else

σ ifsσσ

y0

Sl

l δτ

2 average over the current example2 average over the current example2 average over the current example2 average over the current example

rarrrarrrarrrarr averaged recursionsaveraged recursionsaveraged recursionsaveraged recursions closed in Rsσ Qst pσ

1σσ LL sum

plusmn=

=

The Dynamics of Learning Vector Quantization RUG 10012005

characteristic quantities

- depend on the random sequence of example data

- their variance vanishes with N rarrrarrrarrrarr infininfininfininfin (here prop N-1)

microsσ

microst

R Q

learning dynamics is completely described in terms of averagesaveragesaveragesaverages

3 self3 self3 self3 self----averaging propertiesaveraging propertiesaveraging propertiesaveraging properties

4 continuous learning time4 continuous learning time4 continuous learning time4 continuous learning time

N

micro α =

of examples

of learning stepsper degree of freedom

) α (R ) α (Q sσst

recursions rarr coupled ordinary differential equations

rarr evolution of projections

The Dynamics of Learning Vector Quantization RUG 10012005

probability for misclassification of a novel example

( ) ( ) minusminusΘ++minusΘ= +minusminusminus++ ddpddp gεεεε

( ) ( )

Φminus+

Φ=

minusminus+

minus+minus

++

minus+minus

minusminusminus

++minus

minusminus

minusminus+

minus+minus

++

+minusminus

++minus

minusminusminus

+++

QQQ

RR2QQ

QQQ

RR2QQpp

22 2

1

2

1 ll

L

5 learning curve5 learning curve5 learning curve5 learning curve

generalization error generalization error generalization error generalization error εεεεgggg((((αααα)))) after training with α N examples

- repulsiveattractive fixed points of the dynamics

- asymptotic behavior for αrarrinfin- dependence on learning rate separation initialization

-

investigation and comparison of given algorithmsinvestigation and comparison of given algorithmsinvestigation and comparison of given algorithmsinvestigation and comparison of given algorithms

- time-dependent learning rate η(α)

- variational optimization wrt fs[]

-

optimization and development of new prescriptionsoptimization and development of new prescriptionsoptimization and development of new prescriptionsoptimization and development of new prescriptions

maximizeα

g

d

d εεεε

The Dynamics of Learning Vector Quantization RUG 10012005

optimal classificationoptimal classificationoptimal classificationoptimal classification with minimal generalization error

BBBB-

BBBB+

(p-gtp+ )

(p+)

separation of classes by the plane with 1)σP(p 1)σP(p +==minus= +minus ξξξξξξξξin the model situation (equal variances of clusters)

excess error

minimal εg as a function

of prior weights ℓ=2

εg

025

050

005 100 p+

ℓ=1

ℓ=0

The Dynamics of Learning Vector Quantization RUG 10012005

ldquoLVQ 21ldquo update the correct and wrong winner

( ) 1-micros

micro1-micros

micros Sσ

N

ηwwwwξξξξwwwwwwww minus+=

(analytical)integrationfor wwwws(0) = 0

( ) ( )

( ) ( ) KKll

Kll

αmηαmη

αmηαmη

e12

m1

mRe1

2

m1

mR

Qe12

m1

mRe1

2

m1

mR

++minusminusminus

++minus

minus+minus

++

minus+

=+minusminus

minus=

=minusminus

minus=minus+

=

pσ = (1+m σ ) 2 (mgt0)

[Seo Obermeyer] LVQ21 ս cost function

(likelihood ratios)

infinrarrinfinrarrminus+minusminus+minusminusminus

++minus+++

αQQRR

Q R R

with

finite remain

Q ++ R ++ R minus+

R +minus Q minus+

Q minusminus R minusminus

α 102 4 86

6-

0

6theory and simulation (N=100)p+=08 ℓ=1 η=05averages over 100 independent runs

The Dynamics of Learning Vector Quantization RUG 10012005

(p- )

(p+gt p-)

sssstrategiestrategiestrategiestrategies

- selection of dataselection of dataselection of dataselection of data in a window close to the current decision boundary

slows down the repulsion system remains instable

- Soft Robust Learning Vector QuantizationSoft Robust Learning Vector QuantizationSoft Robust Learning Vector QuantizationSoft Robust Learning Vector Quantization [Seo amp Obermayer]

density-estimation based cost function

limiting case Learning from mistakes Learning from mistakes Learning from mistakes Learning from mistakes LVQ21-step only

if the example is currently misclassified

slow learning poor generalization

problem instability of the algorithm

due to repulsion of wrong prototypes

trivial classification fuumlr αrarrinfin

εg = max p+p-

The Dynamics of Learning Vector Quantization RUG 10012005

ldquo The winner takes it all rdquoldquo The winner takes it all rdquoldquo The winner takes it all rdquoldquo The winner takes it all rdquo

numericalintegrationfor wwwws(0)=0

theory and simulation (N=200)p+=02 ℓ=12 η=12averaged over 100 indep runs

Q++

Q--

Q+-

α

wwww++++

wwww----

ℓℓℓℓ BBBB++++

ℓℓℓℓ BBBB----

trajectories in the (B+B- )-plane

(bull) α=2040140

optimal decision boundary____ asymptotic position

RS+

RS-

R--

R-+

R--

R++

winner wwwws plusmn1

I) LVQ 1LVQ 1LVQ 1LVQ 1 [Kohonen] [ ] ( ) 1-micros

micromicromicroS

microS

1-micros

micros Sσdd

N

ηwwwwξξξξwwwwwwww minusminusΘ+= minus

only the winner is updated according to the class membership

wwww-

The Dynamics of Learning Vector Quantization RUG 10012005

learning curvelearning curvelearning curvelearning curve

α

εg η=12

(p+=02 ℓ=12)

εg (αrarrinfin) grows lin with η

- stationary state

- role of the learning rate

α100 200 300

εg

026

022

018

0140

η

20

04

02

ηrarr0 - variable rate η(α)

- wellwellwellwell----defined asymptoticsdefined asymptoticsdefined asymptoticsdefined asymptotics

(ODE linear in η)

10

εg

20 30 40 500014

026

022

018

min εg

(η α)

ηrarr0

η rarr0 αrarrinfin

( η α ) rarr infin

suboptimal

The Dynamics of Learning Vector Quantization RUG 10012005

ldquo The winner takes it all ldquoldquo The winner takes it all ldquoldquo The winner takes it all ldquoldquo The winner takes it all ldquo

II ) LVQ+LVQ+LVQ+LVQ+ ( only positive steps without repulsion)

[ ] ( ) ( ) 1-micros

microS

microσ

microS

microS

1-micros

micros δdd

N

ηwwwwξξξξwwwwwwww minusminusΘ+= minus

winner correct

αrarrinfin asymptotic configuration

symmetric about ℓℓℓℓ (B(B(B(B+++++B+B+B+B----)2)2)2)2

wwww-

wwww+

ℓ ℓ ℓ ℓ BBBB+

ℓ ℓ ℓ ℓ BBBB-

p+=02 ℓ=12 η=12

classification scheme and the

achieved generalization error are

independent of the independent of the independent of the independent of the prior weights prior weights prior weights prior weights ppppplusmnplusmnplusmnplusmn

(and optimal for ppppplusmnplusmnplusmnplusmn = 12 )

LVQ+ asymp VQ within the classes

(ws updated only from class S)

The Dynamics of Learning Vector Quantization RUG 10012005

- LVQ 21

trivial assignment to the

more frequent class

optimal classification

εg

pppp++++

min p+p-

- LVQ 1

here close to optimal

classification

pppp++++

- LVQ+

min-max solution

pplusmn -independent classification

p+=02 ℓ=10 η=10εg

α

learning curveslearning curveslearning curveslearning curves

LVQ+

LVQ1

asymptotics ηrarr0 (ηα)rarrinfin

The Dynamics of Learning Vector Quantization RUG 10012005

Vector QuantizationVector QuantizationVector QuantizationVector Quantization

competitive learning [ ] ( ) 1-micros

micromicroS

microS

1-micros

micros dd

N

ηwwwwξξξξwwwwwwww minusminusΘ+= minus

wwwws winner

class membership is unknown

or identical for all data

numerical integration for wwwws(0)asymp0

( p+=02 ℓ=10 η=12 )

εg

α

VQ

LVQ+

LVQ1

αα

R++

R+-

R-+

R--

100 200 3000

0

10

system is invariant under

exchange of the prototypes

rarr weakly repulsive fixed points

The Dynamics of Learning Vector Quantization RUG 10012005

interpretations

- VQ unsupervised learningunlabelled data

- LVQ two prototypes of thesame class identical labels

- LVQ different classes butlabels are not used in training

εg

pppp++++

asymptotics (αrarrηrarr0 ηαrarrinfin)

pppp++++asymp0 asymp0 asymp0 asymp0

pppp----asymp1 asymp1 asymp1 asymp1

- low quantization error

- high gen error εg

The Dynamics of Learning Vector Quantization RUG 10012005

work in progress outlookwork in progress outlookwork in progress outlookwork in progress outlook

bull regularization of LVQ 21 Robust Soft LVQ [Seo Obermayer]

bull model different cluster variances more clustersprototypes

bull optimized procedures learning rate schedules

variational approach density estimation Bayes optimal on-line

bull several classes and prototypes

Summary

bullprototypeprototypeprototypeprototype----based learningbased learningbased learningbased learning

Vector Quantization and Learning Vector Quantization

bulla model scenarioa model scenarioa model scenarioa model scenario two clusters two prototypes

dynamics of online training

bullcomparison of algorithmscomparison of algorithmscomparison of algorithmscomparison of algorithms

LVQ 21 instability trivial (stationary) classification

LVQ 1 close to optimal asymptotic generalization

LVQ + min-max solution wrt asymptotic generalization

VQ symmetry breaking representation

The Dynamics of Learning Vector Quantization RUG 10012005

Perspectives

bullSelfSelfSelfSelf----Organizing Maps (SOM)Organizing Maps (SOM)Organizing Maps (SOM)Organizing Maps (SOM)

(many) N-dim prototypes form a (low) d-dimensional grid

representation of data in a topology preserving map

neighborhood preserving SOM Neural Gas (distance based)

bullGeneralized Relevance LVQ Generalized Relevance LVQ Generalized Relevance LVQ Generalized Relevance LVQ [Hammer amp Villmann]

adaptive metrics eg distance measure ( )sum=

minus=N

i

iii w

1

2)( sλ ξξwd λ

training

bullapplications applications applications applications

Page 11: The Dynamics of Learning Vector Quantizationlibvolume3.xyz/computers/btech/semester6/datacompression/vectorquantization...The Dynamics of Learning Vector Quantization, RUG, 10.01.2005

The Dynamics of Learning Vector Quantization RUG 10012005

LVQ algorithms LVQ algorithms LVQ algorithms LVQ algorithms

- are often based on purely heuristic arguments

or derived from a cost function with unclear

relation to the generalization ability

- almost exclusively use the Euclidean distance measure

inappropriate for heterogeneous data

- lack in general a thorough theoretical understanding of

dynamics convergence properties

performance wrt generalization etc

The Dynamics of Learning Vector Quantization RUG 10012005

In the following

analysis of LVQ algorithms wrt

- dynamics of the learning process

- performance ie generalization ability

- asymptotic behavior in the limit of many examples

typical behavior in a model situation

- randomized high-dimensional data

- essential features of LVQ learning

aim - contribute to the theoretical understanding- develop efficient LVQ schemes- test in applications

The Dynamics of Learning Vector Quantization RUG 10012005

model situationmodel situationmodel situationmodel situation two clusters of N-dimensional data

random vectors ξξξξ isin ℝN according to σ)P(p )P(

1σσ ξξξξξξξξ sum

plusmn=

=

( )( )

minus=

2

σN2-

2

1exp

1σ)P( ΒΒΒΒξξξξξξξξ lmixture of two Gaussians

orthonormal center vectors

BBBB+ BBBB- isin ℝN ( BBBBσ )2 =1 BBBB+ BBBB- =0

prior weights of classes p+ p-p+ + p- = 1

BBBB+

BBBB-

(p+)

(p-)

separation ℓℓ

jj Bσσξ l=

22222l Nξ1ξξ

N

1σσ

+==rarr=minus sum=j

jjj ξξξξ

independent components

The Dynamics of Learning Vector Quantization RUG 10012005

high-dimensional data (formally Nrarrinfin)

400 examples ξξξξmicro isinℝN N=200 ℓ=1 p+=06micro

By

ξξ ξξ sdot=

minusminus

(240)

(160)

projections into the plane of center vectors B+ B-

microBy ξξξξsdot= ++

micro2

2xξξ ξξ

ww wwsdot

=

(240)(160)

projections in two independent random directions wwww12

micro11x ξξξξwwww sdot=

model for studying typical behavior of LVQ algorithmsnot density-estimation based classification

NoteNoteNoteNote

The Dynamics of Learning Vector Quantization RUG 10012005

dynamics of ondynamics of ondynamics of ondynamics of on----line trainingline trainingline trainingline training

sequence of independent random data ( )123micromicro =ξξξξ acc to ( )microP ξξξξ

learning ratestep size

competitiondirection ofupdate etc

change of prototypetowards or away from the current data

above examples

unsupervisedunsupervisedunsupervisedunsupervised Vector QuantizationVector QuantizationVector QuantizationVector Quantization [ ] ( ) dd fmicros

micross minusΘ= minus

The Winner Takes It All (classes irrelevantunknown)

Learning Vector Quantization ldquo21rdquo Learning Vector Quantization ldquo21rdquo Learning Vector Quantization ldquo21rdquo Learning Vector Quantization ldquo21rdquo [ ] σS fs)(1)(1

classcorrectclasswrong

+minus=sdot=

here two prototypes noexplicit competition

[ ] ( ) 1-micros

micromicros-

micross

1-micros

micros σSddf

N

ηwwwwξξξξwwwwwwww minus+=

( )21minusminus=

plusmn=

micros

micromicrosd

1σS

wwwwξξξξ

update of prototype vectors

The Dynamics of Learning Vector Quantization RUG 10012005

[ ] ( )

[ ] ( ) [ ] ( ) [ ] [ ] ( )Ν

1Οffη QxfηQxfη1N

QQ

Ryfη1N

RR

ts1-micro

stmicrost

1-microst

microts

1-microst

microst

1-microsσ

microσs

1-microsσ

microsσ

++minus+minus=minus

minus=minus

2

[ ] ( ) 1-micros

micromicros-

micross

1-micros

micros σSddf

N

ηwwwwξξξξwwwwwwww minus+= rarrrarrrarrrarr recursionsrecursionsrecursionsrecursions

mathematical analysis of the learning dynamicsmathematical analysis of the learning dynamicsmathematical analysis of the learning dynamicsmathematical analysis of the learning dynamics

( ) ( ) 1221 -micross

micros

micromicros

micromicros Q2xd +minus=minus= minus ξξξξwwwwξξξξ

micromicromicro1-micros

micros ξByx sdot=sdot= ττξξξξwwwwprojections

distances

random vector ξmicro enters only in the form of

( )11 +minusisinsdot=sdot= σtsmicrot

micros

microstσ

micros

microsσ QBR wwwwwwwwwwww

projections in the (B+ B- )-plane

length and relativeposition of prototypes

1 description in terms of a few characteristic quantitities1 description in terms of a few characteristic quantitities1 description in terms of a few characteristic quantitities1 description in terms of a few characteristic quantitities

( here ℝ2N rarr ℝ7 )

The Dynamics of Learning Vector Quantization RUG 10012005

N

1jjσjsσ

N

1jjsσs R x ll === sumsum

==

Bww jξ

completely specified in terms of first and second moments (wo indices micro)

in the thermodynamic limit thermodynamic limit thermodynamic limit thermodynamic limit N rarrrarrrarrrarr infininfininfininfin

random vector acc to σ)|P( micro rarrξξξξmicromicro

micro1-micros

micros

By

wx

ξξξξ

ξξξξ

sdot=

sdot=

ττ

correlated Gaussianrandom quantities

stσtσsσt s Q xx- xx = τττ sσσsσ s R yx- yx =

ρττρτρ δ yy- yyσσσ

===

=

else

σ ifsσσ

y0

Sl

l δτ

2 average over the current example2 average over the current example2 average over the current example2 average over the current example

rarrrarrrarrrarr averaged recursionsaveraged recursionsaveraged recursionsaveraged recursions closed in Rsσ Qst pσ

1σσ LL sum

plusmn=

=

The Dynamics of Learning Vector Quantization RUG 10012005

characteristic quantities

- depend on the random sequence of example data

- their variance vanishes with N rarrrarrrarrrarr infininfininfininfin (here prop N-1)

microsσ

microst

R Q

learning dynamics is completely described in terms of averagesaveragesaveragesaverages

3 self3 self3 self3 self----averaging propertiesaveraging propertiesaveraging propertiesaveraging properties

4 continuous learning time4 continuous learning time4 continuous learning time4 continuous learning time

N

micro α =

of examples

of learning stepsper degree of freedom

) α (R ) α (Q sσst

recursions rarr coupled ordinary differential equations

rarr evolution of projections

The Dynamics of Learning Vector Quantization RUG 10012005

probability for misclassification of a novel example

( ) ( ) minusminusΘ++minusΘ= +minusminusminus++ ddpddp gεεεε

( ) ( )

Φminus+

Φ=

minusminus+

minus+minus

++

minus+minus

minusminusminus

++minus

minusminus

minusminus+

minus+minus

++

+minusminus

++minus

minusminusminus

+++

QQQ

RR2QQ

QQQ

RR2QQpp

22 2

1

2

1 ll

L

5 learning curve5 learning curve5 learning curve5 learning curve

generalization error generalization error generalization error generalization error εεεεgggg((((αααα)))) after training with α N examples

- repulsiveattractive fixed points of the dynamics

- asymptotic behavior for αrarrinfin- dependence on learning rate separation initialization

-

investigation and comparison of given algorithmsinvestigation and comparison of given algorithmsinvestigation and comparison of given algorithmsinvestigation and comparison of given algorithms

- time-dependent learning rate η(α)

- variational optimization wrt fs[]

-

optimization and development of new prescriptionsoptimization and development of new prescriptionsoptimization and development of new prescriptionsoptimization and development of new prescriptions

maximizeα

g

d

d εεεε

The Dynamics of Learning Vector Quantization RUG 10012005

optimal classificationoptimal classificationoptimal classificationoptimal classification with minimal generalization error

BBBB-

BBBB+

(p-gtp+ )

(p+)

separation of classes by the plane with 1)σP(p 1)σP(p +==minus= +minus ξξξξξξξξin the model situation (equal variances of clusters)

excess error

minimal εg as a function

of prior weights ℓ=2

εg

025

050

005 100 p+

ℓ=1

ℓ=0

The Dynamics of Learning Vector Quantization RUG 10012005

ldquoLVQ 21ldquo update the correct and wrong winner

( ) 1-micros

micro1-micros

micros Sσ

N

ηwwwwξξξξwwwwwwww minus+=

(analytical)integrationfor wwwws(0) = 0

( ) ( )

( ) ( ) KKll

Kll

αmηαmη

αmηαmη

e12

m1

mRe1

2

m1

mR

Qe12

m1

mRe1

2

m1

mR

++minusminusminus

++minus

minus+minus

++

minus+

=+minusminus

minus=

=minusminus

minus=minus+

=

pσ = (1+m σ ) 2 (mgt0)

[Seo Obermeyer] LVQ21 ս cost function

(likelihood ratios)

infinrarrinfinrarrminus+minusminus+minusminusminus

++minus+++

αQQRR

Q R R

with

finite remain

Q ++ R ++ R minus+

R +minus Q minus+

Q minusminus R minusminus

α 102 4 86

6-

0

6theory and simulation (N=100)p+=08 ℓ=1 η=05averages over 100 independent runs

The Dynamics of Learning Vector Quantization RUG 10012005

(p- )

(p+gt p-)

sssstrategiestrategiestrategiestrategies

- selection of dataselection of dataselection of dataselection of data in a window close to the current decision boundary

slows down the repulsion system remains instable

- Soft Robust Learning Vector QuantizationSoft Robust Learning Vector QuantizationSoft Robust Learning Vector QuantizationSoft Robust Learning Vector Quantization [Seo amp Obermayer]

density-estimation based cost function

limiting case Learning from mistakes Learning from mistakes Learning from mistakes Learning from mistakes LVQ21-step only

if the example is currently misclassified

slow learning poor generalization

problem instability of the algorithm

due to repulsion of wrong prototypes

trivial classification fuumlr αrarrinfin

εg = max p+p-

The Dynamics of Learning Vector Quantization RUG 10012005

ldquo The winner takes it all rdquoldquo The winner takes it all rdquoldquo The winner takes it all rdquoldquo The winner takes it all rdquo

numericalintegrationfor wwwws(0)=0

theory and simulation (N=200)p+=02 ℓ=12 η=12averaged over 100 indep runs

Q++

Q--

Q+-

α

wwww++++

wwww----

ℓℓℓℓ BBBB++++

ℓℓℓℓ BBBB----

trajectories in the (B+B- )-plane

(bull) α=2040140

optimal decision boundary____ asymptotic position

RS+

RS-

R--

R-+

R--

R++

winner wwwws plusmn1

I) LVQ 1LVQ 1LVQ 1LVQ 1 [Kohonen] [ ] ( ) 1-micros

micromicromicroS

microS

1-micros

micros Sσdd

N

ηwwwwξξξξwwwwwwww minusminusΘ+= minus

only the winner is updated according to the class membership

wwww-

The Dynamics of Learning Vector Quantization RUG 10012005

learning curvelearning curvelearning curvelearning curve

α

εg η=12

(p+=02 ℓ=12)

εg (αrarrinfin) grows lin with η

- stationary state

- role of the learning rate

α100 200 300

εg

026

022

018

0140

η

20

04

02

ηrarr0 - variable rate η(α)

- wellwellwellwell----defined asymptoticsdefined asymptoticsdefined asymptoticsdefined asymptotics

(ODE linear in η)

10

εg

20 30 40 500014

026

022

018

min εg

(η α)

ηrarr0

η rarr0 αrarrinfin

( η α ) rarr infin

suboptimal

The Dynamics of Learning Vector Quantization RUG 10012005

ldquo The winner takes it all ldquoldquo The winner takes it all ldquoldquo The winner takes it all ldquoldquo The winner takes it all ldquo

II ) LVQ+LVQ+LVQ+LVQ+ ( only positive steps without repulsion)

[ ] ( ) ( ) 1-micros

microS

microσ

microS

microS

1-micros

micros δdd

N

ηwwwwξξξξwwwwwwww minusminusΘ+= minus

winner correct

αrarrinfin asymptotic configuration

symmetric about ℓℓℓℓ (B(B(B(B+++++B+B+B+B----)2)2)2)2

wwww-

wwww+

ℓ ℓ ℓ ℓ BBBB+

ℓ ℓ ℓ ℓ BBBB-

p+=02 ℓ=12 η=12

classification scheme and the

achieved generalization error are

independent of the independent of the independent of the independent of the prior weights prior weights prior weights prior weights ppppplusmnplusmnplusmnplusmn

(and optimal for ppppplusmnplusmnplusmnplusmn = 12 )

LVQ+ asymp VQ within the classes

(ws updated only from class S)

The Dynamics of Learning Vector Quantization RUG 10012005

- LVQ 21

trivial assignment to the

more frequent class

optimal classification

εg

pppp++++

min p+p-

- LVQ 1

here close to optimal

classification

pppp++++

- LVQ+

min-max solution

pplusmn -independent classification

p+=02 ℓ=10 η=10εg

α

learning curveslearning curveslearning curveslearning curves

LVQ+

LVQ1

asymptotics ηrarr0 (ηα)rarrinfin

The Dynamics of Learning Vector Quantization RUG 10012005

Vector QuantizationVector QuantizationVector QuantizationVector Quantization

competitive learning [ ] ( ) 1-micros

micromicroS

microS

1-micros

micros dd

N

ηwwwwξξξξwwwwwwww minusminusΘ+= minus

wwwws winner

class membership is unknown

or identical for all data

numerical integration for wwwws(0)asymp0

( p+=02 ℓ=10 η=12 )

εg

α

VQ

LVQ+

LVQ1

αα

R++

R+-

R-+

R--

100 200 3000

0

10

system is invariant under

exchange of the prototypes

rarr weakly repulsive fixed points

The Dynamics of Learning Vector Quantization RUG 10012005

interpretations

- VQ unsupervised learningunlabelled data

- LVQ two prototypes of thesame class identical labels

- LVQ different classes butlabels are not used in training

εg

pppp++++

asymptotics (αrarrηrarr0 ηαrarrinfin)

pppp++++asymp0 asymp0 asymp0 asymp0

pppp----asymp1 asymp1 asymp1 asymp1

- low quantization error

- high gen error εg

The Dynamics of Learning Vector Quantization RUG 10012005

work in progress outlookwork in progress outlookwork in progress outlookwork in progress outlook

bull regularization of LVQ 21 Robust Soft LVQ [Seo Obermayer]

bull model different cluster variances more clustersprototypes

bull optimized procedures learning rate schedules

variational approach density estimation Bayes optimal on-line

bull several classes and prototypes

Summary

bullprototypeprototypeprototypeprototype----based learningbased learningbased learningbased learning

Vector Quantization and Learning Vector Quantization

bulla model scenarioa model scenarioa model scenarioa model scenario two clusters two prototypes

dynamics of online training

bullcomparison of algorithmscomparison of algorithmscomparison of algorithmscomparison of algorithms

LVQ 21 instability trivial (stationary) classification

LVQ 1 close to optimal asymptotic generalization

LVQ + min-max solution wrt asymptotic generalization

VQ symmetry breaking representation

The Dynamics of Learning Vector Quantization RUG 10012005

Perspectives

bullSelfSelfSelfSelf----Organizing Maps (SOM)Organizing Maps (SOM)Organizing Maps (SOM)Organizing Maps (SOM)

(many) N-dim prototypes form a (low) d-dimensional grid

representation of data in a topology preserving map

neighborhood preserving SOM Neural Gas (distance based)

bullGeneralized Relevance LVQ Generalized Relevance LVQ Generalized Relevance LVQ Generalized Relevance LVQ [Hammer amp Villmann]

adaptive metrics eg distance measure ( )sum=

minus=N

i

iii w

1

2)( sλ ξξwd λ

training

bullapplications applications applications applications

Page 12: The Dynamics of Learning Vector Quantizationlibvolume3.xyz/computers/btech/semester6/datacompression/vectorquantization...The Dynamics of Learning Vector Quantization, RUG, 10.01.2005

The Dynamics of Learning Vector Quantization RUG 10012005

In the following

analysis of LVQ algorithms wrt

- dynamics of the learning process

- performance ie generalization ability

- asymptotic behavior in the limit of many examples

typical behavior in a model situation

- randomized high-dimensional data

- essential features of LVQ learning

aim - contribute to the theoretical understanding- develop efficient LVQ schemes- test in applications

The Dynamics of Learning Vector Quantization RUG 10012005

model situationmodel situationmodel situationmodel situation two clusters of N-dimensional data

random vectors ξξξξ isin ℝN according to σ)P(p )P(

1σσ ξξξξξξξξ sum

plusmn=

=

( )( )

minus=

2

σN2-

2

1exp

1σ)P( ΒΒΒΒξξξξξξξξ lmixture of two Gaussians

orthonormal center vectors

BBBB+ BBBB- isin ℝN ( BBBBσ )2 =1 BBBB+ BBBB- =0

prior weights of classes p+ p-p+ + p- = 1

BBBB+

BBBB-

(p+)

(p-)

separation ℓℓ

jj Bσσξ l=

22222l Nξ1ξξ

N

1σσ

+==rarr=minus sum=j

jjj ξξξξ

independent components

The Dynamics of Learning Vector Quantization RUG 10012005

high-dimensional data (formally Nrarrinfin)

400 examples ξξξξmicro isinℝN N=200 ℓ=1 p+=06micro

By

ξξ ξξ sdot=

minusminus

(240)

(160)

projections into the plane of center vectors B+ B-

microBy ξξξξsdot= ++

micro2

2xξξ ξξ

ww wwsdot

=

(240)(160)

projections in two independent random directions wwww12

micro11x ξξξξwwww sdot=

model for studying typical behavior of LVQ algorithmsnot density-estimation based classification

NoteNoteNoteNote

The Dynamics of Learning Vector Quantization RUG 10012005

dynamics of ondynamics of ondynamics of ondynamics of on----line trainingline trainingline trainingline training

sequence of independent random data ( )123micromicro =ξξξξ acc to ( )microP ξξξξ

learning ratestep size

competitiondirection ofupdate etc

change of prototypetowards or away from the current data

above examples

unsupervisedunsupervisedunsupervisedunsupervised Vector QuantizationVector QuantizationVector QuantizationVector Quantization [ ] ( ) dd fmicros

micross minusΘ= minus

The Winner Takes It All (classes irrelevantunknown)

Learning Vector Quantization ldquo21rdquo Learning Vector Quantization ldquo21rdquo Learning Vector Quantization ldquo21rdquo Learning Vector Quantization ldquo21rdquo [ ] σS fs)(1)(1

classcorrectclasswrong

+minus=sdot=

here two prototypes noexplicit competition

[ ] ( ) 1-micros

micromicros-

micross

1-micros

micros σSddf

N

ηwwwwξξξξwwwwwwww minus+=

( )21minusminus=

plusmn=

micros

micromicrosd

1σS

wwwwξξξξ

update of prototype vectors

The Dynamics of Learning Vector Quantization RUG 10012005

[ ] ( )

[ ] ( ) [ ] ( ) [ ] [ ] ( )Ν

1Οffη QxfηQxfη1N

QQ

Ryfη1N

RR

ts1-micro

stmicrost

1-microst

microts

1-microst

microst

1-microsσ

microσs

1-microsσ

microsσ

++minus+minus=minus

minus=minus

2

[ ] ( ) 1-micros

micromicros-

micross

1-micros

micros σSddf

N

ηwwwwξξξξwwwwwwww minus+= rarrrarrrarrrarr recursionsrecursionsrecursionsrecursions

mathematical analysis of the learning dynamicsmathematical analysis of the learning dynamicsmathematical analysis of the learning dynamicsmathematical analysis of the learning dynamics

( ) ( ) 1221 -micross

micros

micromicros

micromicros Q2xd +minus=minus= minus ξξξξwwwwξξξξ

micromicromicro1-micros

micros ξByx sdot=sdot= ττξξξξwwwwprojections

distances

random vector ξmicro enters only in the form of

( )11 +minusisinsdot=sdot= σtsmicrot

micros

microstσ

micros

microsσ QBR wwwwwwwwwwww

projections in the (B+ B- )-plane

length and relativeposition of prototypes

1 description in terms of a few characteristic quantitities1 description in terms of a few characteristic quantitities1 description in terms of a few characteristic quantitities1 description in terms of a few characteristic quantitities

( here ℝ2N rarr ℝ7 )

The Dynamics of Learning Vector Quantization RUG 10012005

N

1jjσjsσ

N

1jjsσs R x ll === sumsum

==

Bww jξ

completely specified in terms of first and second moments (wo indices micro)

in the thermodynamic limit thermodynamic limit thermodynamic limit thermodynamic limit N rarrrarrrarrrarr infininfininfininfin

random vector acc to σ)|P( micro rarrξξξξmicromicro

micro1-micros

micros

By

wx

ξξξξ

ξξξξ

sdot=

sdot=

ττ

correlated Gaussianrandom quantities

stσtσsσt s Q xx- xx = τττ sσσsσ s R yx- yx =

ρττρτρ δ yy- yyσσσ

===

=

else

σ ifsσσ

y0

Sl

l δτ

2 average over the current example2 average over the current example2 average over the current example2 average over the current example

rarrrarrrarrrarr averaged recursionsaveraged recursionsaveraged recursionsaveraged recursions closed in Rsσ Qst pσ

1σσ LL sum

plusmn=

=

The Dynamics of Learning Vector Quantization RUG 10012005

characteristic quantities

- depend on the random sequence of example data

- their variance vanishes with N rarrrarrrarrrarr infininfininfininfin (here prop N-1)

microsσ

microst

R Q

learning dynamics is completely described in terms of averagesaveragesaveragesaverages

3 self3 self3 self3 self----averaging propertiesaveraging propertiesaveraging propertiesaveraging properties

4 continuous learning time4 continuous learning time4 continuous learning time4 continuous learning time

N

micro α =

of examples

of learning stepsper degree of freedom

) α (R ) α (Q sσst

recursions rarr coupled ordinary differential equations

rarr evolution of projections

The Dynamics of Learning Vector Quantization RUG 10012005

probability for misclassification of a novel example

( ) ( ) minusminusΘ++minusΘ= +minusminusminus++ ddpddp gεεεε

( ) ( )

Φminus+

Φ=

minusminus+

minus+minus

++

minus+minus

minusminusminus

++minus

minusminus

minusminus+

minus+minus

++

+minusminus

++minus

minusminusminus

+++

QQQ

RR2QQ

QQQ

RR2QQpp

22 2

1

2

1 ll

L

5 learning curve5 learning curve5 learning curve5 learning curve

generalization error generalization error generalization error generalization error εεεεgggg((((αααα)))) after training with α N examples

- repulsiveattractive fixed points of the dynamics

- asymptotic behavior for αrarrinfin- dependence on learning rate separation initialization

-

investigation and comparison of given algorithmsinvestigation and comparison of given algorithmsinvestigation and comparison of given algorithmsinvestigation and comparison of given algorithms

- time-dependent learning rate η(α)

- variational optimization wrt fs[]

-

optimization and development of new prescriptionsoptimization and development of new prescriptionsoptimization and development of new prescriptionsoptimization and development of new prescriptions

maximizeα

g

d

d εεεε

The Dynamics of Learning Vector Quantization RUG 10012005

optimal classificationoptimal classificationoptimal classificationoptimal classification with minimal generalization error

BBBB-

BBBB+

(p-gtp+ )

(p+)

separation of classes by the plane with 1)σP(p 1)σP(p +==minus= +minus ξξξξξξξξin the model situation (equal variances of clusters)

excess error

minimal εg as a function

of prior weights ℓ=2

εg

025

050

005 100 p+

ℓ=1

ℓ=0

The Dynamics of Learning Vector Quantization RUG 10012005

ldquoLVQ 21ldquo update the correct and wrong winner

( ) 1-micros

micro1-micros

micros Sσ

N

ηwwwwξξξξwwwwwwww minus+=

(analytical)integrationfor wwwws(0) = 0

( ) ( )

( ) ( ) KKll

Kll

αmηαmη

αmηαmη

e12

m1

mRe1

2

m1

mR

Qe12

m1

mRe1

2

m1

mR

++minusminusminus

++minus

minus+minus

++

minus+

=+minusminus

minus=

=minusminus

minus=minus+

=

pσ = (1+m σ ) 2 (mgt0)

[Seo Obermeyer] LVQ21 ս cost function

(likelihood ratios)

infinrarrinfinrarrminus+minusminus+minusminusminus

++minus+++

αQQRR

Q R R

with

finite remain

Q ++ R ++ R minus+

R +minus Q minus+

Q minusminus R minusminus

α 102 4 86

6-

0

6theory and simulation (N=100)p+=08 ℓ=1 η=05averages over 100 independent runs

The Dynamics of Learning Vector Quantization RUG 10012005

(p- )

(p+gt p-)

sssstrategiestrategiestrategiestrategies

- selection of dataselection of dataselection of dataselection of data in a window close to the current decision boundary

slows down the repulsion system remains instable

- Soft Robust Learning Vector QuantizationSoft Robust Learning Vector QuantizationSoft Robust Learning Vector QuantizationSoft Robust Learning Vector Quantization [Seo amp Obermayer]

density-estimation based cost function

limiting case Learning from mistakes Learning from mistakes Learning from mistakes Learning from mistakes LVQ21-step only

if the example is currently misclassified

slow learning poor generalization

problem instability of the algorithm

due to repulsion of wrong prototypes

trivial classification fuumlr αrarrinfin

εg = max p+p-

The Dynamics of Learning Vector Quantization RUG 10012005

ldquo The winner takes it all rdquoldquo The winner takes it all rdquoldquo The winner takes it all rdquoldquo The winner takes it all rdquo

numericalintegrationfor wwwws(0)=0

theory and simulation (N=200)p+=02 ℓ=12 η=12averaged over 100 indep runs

Q++

Q--

Q+-

α

wwww++++

wwww----

ℓℓℓℓ BBBB++++

ℓℓℓℓ BBBB----

trajectories in the (B+B- )-plane

(bull) α=2040140

optimal decision boundary____ asymptotic position

RS+

RS-

R--

R-+

R--

R++

winner wwwws plusmn1

I) LVQ 1LVQ 1LVQ 1LVQ 1 [Kohonen] [ ] ( ) 1-micros

micromicromicroS

microS

1-micros

micros Sσdd

N

ηwwwwξξξξwwwwwwww minusminusΘ+= minus

only the winner is updated according to the class membership

wwww-

The Dynamics of Learning Vector Quantization RUG 10012005

learning curvelearning curvelearning curvelearning curve

α

εg η=12

(p+=02 ℓ=12)

εg (αrarrinfin) grows lin with η

- stationary state

- role of the learning rate

α100 200 300

εg

026

022

018

0140

η

20

04

02

ηrarr0 - variable rate η(α)

- wellwellwellwell----defined asymptoticsdefined asymptoticsdefined asymptoticsdefined asymptotics

(ODE linear in η)

10

εg

20 30 40 500014

026

022

018

min εg

(η α)

ηrarr0

η rarr0 αrarrinfin

( η α ) rarr infin

suboptimal

The Dynamics of Learning Vector Quantization RUG 10012005

ldquo The winner takes it all ldquoldquo The winner takes it all ldquoldquo The winner takes it all ldquoldquo The winner takes it all ldquo

II ) LVQ+LVQ+LVQ+LVQ+ ( only positive steps without repulsion)

[ ] ( ) ( ) 1-micros

microS

microσ

microS

microS

1-micros

micros δdd

N

ηwwwwξξξξwwwwwwww minusminusΘ+= minus

winner correct

αrarrinfin asymptotic configuration

symmetric about ℓℓℓℓ (B(B(B(B+++++B+B+B+B----)2)2)2)2

wwww-

wwww+

ℓ ℓ ℓ ℓ BBBB+

ℓ ℓ ℓ ℓ BBBB-

p+=02 ℓ=12 η=12

classification scheme and the

achieved generalization error are

independent of the independent of the independent of the independent of the prior weights prior weights prior weights prior weights ppppplusmnplusmnplusmnplusmn

(and optimal for ppppplusmnplusmnplusmnplusmn = 12 )

LVQ+ asymp VQ within the classes

(ws updated only from class S)

The Dynamics of Learning Vector Quantization RUG 10012005

- LVQ 21

trivial assignment to the

more frequent class

optimal classification

εg

pppp++++

min p+p-

- LVQ 1

here close to optimal

classification

pppp++++

- LVQ+

min-max solution

pplusmn -independent classification

p+=02 ℓ=10 η=10εg

α

learning curveslearning curveslearning curveslearning curves

LVQ+

LVQ1

asymptotics ηrarr0 (ηα)rarrinfin

The Dynamics of Learning Vector Quantization RUG 10012005

Vector QuantizationVector QuantizationVector QuantizationVector Quantization

competitive learning [ ] ( ) 1-micros

micromicroS

microS

1-micros

micros dd

N

ηwwwwξξξξwwwwwwww minusminusΘ+= minus

wwwws winner

class membership is unknown

or identical for all data

numerical integration for wwwws(0)asymp0

( p+=02 ℓ=10 η=12 )

εg

α

VQ

LVQ+

LVQ1

αα

R++

R+-

R-+

R--

100 200 3000

0

10

system is invariant under

exchange of the prototypes

rarr weakly repulsive fixed points

The Dynamics of Learning Vector Quantization RUG 10012005

interpretations

- VQ unsupervised learningunlabelled data

- LVQ two prototypes of thesame class identical labels

- LVQ different classes butlabels are not used in training

εg

pppp++++

asymptotics (αrarrηrarr0 ηαrarrinfin)

pppp++++asymp0 asymp0 asymp0 asymp0

pppp----asymp1 asymp1 asymp1 asymp1

- low quantization error

- high gen error εg

The Dynamics of Learning Vector Quantization RUG 10012005

work in progress outlookwork in progress outlookwork in progress outlookwork in progress outlook

bull regularization of LVQ 21 Robust Soft LVQ [Seo Obermayer]

bull model different cluster variances more clustersprototypes

bull optimized procedures learning rate schedules

variational approach density estimation Bayes optimal on-line

bull several classes and prototypes

Summary

bullprototypeprototypeprototypeprototype----based learningbased learningbased learningbased learning

Vector Quantization and Learning Vector Quantization

bulla model scenarioa model scenarioa model scenarioa model scenario two clusters two prototypes

dynamics of online training

bullcomparison of algorithmscomparison of algorithmscomparison of algorithmscomparison of algorithms

LVQ 21 instability trivial (stationary) classification

LVQ 1 close to optimal asymptotic generalization

LVQ + min-max solution wrt asymptotic generalization

VQ symmetry breaking representation

The Dynamics of Learning Vector Quantization RUG 10012005

Perspectives

bullSelfSelfSelfSelf----Organizing Maps (SOM)Organizing Maps (SOM)Organizing Maps (SOM)Organizing Maps (SOM)

(many) N-dim prototypes form a (low) d-dimensional grid

representation of data in a topology preserving map

neighborhood preserving SOM Neural Gas (distance based)

bullGeneralized Relevance LVQ Generalized Relevance LVQ Generalized Relevance LVQ Generalized Relevance LVQ [Hammer amp Villmann]

adaptive metrics eg distance measure ( )sum=

minus=N

i

iii w

1

2)( sλ ξξwd λ

training

bullapplications applications applications applications

Page 13: The Dynamics of Learning Vector Quantizationlibvolume3.xyz/computers/btech/semester6/datacompression/vectorquantization...The Dynamics of Learning Vector Quantization, RUG, 10.01.2005

The Dynamics of Learning Vector Quantization RUG 10012005

model situationmodel situationmodel situationmodel situation two clusters of N-dimensional data

random vectors ξξξξ isin ℝN according to σ)P(p )P(

1σσ ξξξξξξξξ sum

plusmn=

=

( )( )

minus=

2

σN2-

2

1exp

1σ)P( ΒΒΒΒξξξξξξξξ lmixture of two Gaussians

orthonormal center vectors

BBBB+ BBBB- isin ℝN ( BBBBσ )2 =1 BBBB+ BBBB- =0

prior weights of classes p+ p-p+ + p- = 1

BBBB+

BBBB-

(p+)

(p-)

separation ℓℓ

jj Bσσξ l=

22222l Nξ1ξξ

N

1σσ

+==rarr=minus sum=j

jjj ξξξξ

independent components

The Dynamics of Learning Vector Quantization RUG 10012005

high-dimensional data (formally Nrarrinfin)

400 examples ξξξξmicro isinℝN N=200 ℓ=1 p+=06micro

By

ξξ ξξ sdot=

minusminus

(240)

(160)

projections into the plane of center vectors B+ B-

microBy ξξξξsdot= ++

micro2

2xξξ ξξ

ww wwsdot

=

(240)(160)

projections in two independent random directions wwww12

micro11x ξξξξwwww sdot=

model for studying typical behavior of LVQ algorithmsnot density-estimation based classification

NoteNoteNoteNote

The Dynamics of Learning Vector Quantization RUG 10012005

dynamics of ondynamics of ondynamics of ondynamics of on----line trainingline trainingline trainingline training

sequence of independent random data ( )123micromicro =ξξξξ acc to ( )microP ξξξξ

learning ratestep size

competitiondirection ofupdate etc

change of prototypetowards or away from the current data

above examples

unsupervisedunsupervisedunsupervisedunsupervised Vector QuantizationVector QuantizationVector QuantizationVector Quantization [ ] ( ) dd fmicros

micross minusΘ= minus

The Winner Takes It All (classes irrelevantunknown)

Learning Vector Quantization ldquo21rdquo Learning Vector Quantization ldquo21rdquo Learning Vector Quantization ldquo21rdquo Learning Vector Quantization ldquo21rdquo [ ] σS fs)(1)(1

classcorrectclasswrong

+minus=sdot=

here two prototypes noexplicit competition

[ ] ( ) 1-micros

micromicros-

micross

1-micros

micros σSddf

N

ηwwwwξξξξwwwwwwww minus+=

( )21minusminus=

plusmn=

micros

micromicrosd

1σS

wwwwξξξξ

update of prototype vectors

The Dynamics of Learning Vector Quantization RUG 10012005

[ ] ( )

[ ] ( ) [ ] ( ) [ ] [ ] ( )Ν

1Οffη QxfηQxfη1N

QQ

Ryfη1N

RR

ts1-micro

stmicrost

1-microst

microts

1-microst

microst

1-microsσ

microσs

1-microsσ

microsσ

++minus+minus=minus

minus=minus

2

[ ] ( ) 1-micros

micromicros-

micross

1-micros

micros σSddf

N

ηwwwwξξξξwwwwwwww minus+= rarrrarrrarrrarr recursionsrecursionsrecursionsrecursions

mathematical analysis of the learning dynamicsmathematical analysis of the learning dynamicsmathematical analysis of the learning dynamicsmathematical analysis of the learning dynamics

( ) ( ) 1221 -micross

micros

micromicros

micromicros Q2xd +minus=minus= minus ξξξξwwwwξξξξ

micromicromicro1-micros

micros ξByx sdot=sdot= ττξξξξwwwwprojections

distances

random vector ξmicro enters only in the form of

( )11 +minusisinsdot=sdot= σtsmicrot

micros

microstσ

micros

microsσ QBR wwwwwwwwwwww

projections in the (B+ B- )-plane

length and relativeposition of prototypes

1 description in terms of a few characteristic quantitities1 description in terms of a few characteristic quantitities1 description in terms of a few characteristic quantitities1 description in terms of a few characteristic quantitities

( here ℝ2N rarr ℝ7 )

The Dynamics of Learning Vector Quantization RUG 10012005

N

1jjσjsσ

N

1jjsσs R x ll === sumsum

==

Bww jξ

completely specified in terms of first and second moments (wo indices micro)

in the thermodynamic limit thermodynamic limit thermodynamic limit thermodynamic limit N rarrrarrrarrrarr infininfininfininfin

random vector acc to σ)|P( micro rarrξξξξmicromicro

micro1-micros

micros

By

wx

ξξξξ

ξξξξ

sdot=

sdot=

ττ

correlated Gaussianrandom quantities

stσtσsσt s Q xx- xx = τττ sσσsσ s R yx- yx =

ρττρτρ δ yy- yyσσσ

===

=

else

σ ifsσσ

y0

Sl

l δτ

2 average over the current example2 average over the current example2 average over the current example2 average over the current example

rarrrarrrarrrarr averaged recursionsaveraged recursionsaveraged recursionsaveraged recursions closed in Rsσ Qst pσ

1σσ LL sum

plusmn=

=

The Dynamics of Learning Vector Quantization RUG 10012005

characteristic quantities

- depend on the random sequence of example data

- their variance vanishes with N rarrrarrrarrrarr infininfininfininfin (here prop N-1)

microsσ

microst

R Q

learning dynamics is completely described in terms of averagesaveragesaveragesaverages

3 self3 self3 self3 self----averaging propertiesaveraging propertiesaveraging propertiesaveraging properties

4 continuous learning time4 continuous learning time4 continuous learning time4 continuous learning time

N

micro α =

of examples

of learning stepsper degree of freedom

) α (R ) α (Q sσst

recursions rarr coupled ordinary differential equations

rarr evolution of projections

The Dynamics of Learning Vector Quantization RUG 10012005

probability for misclassification of a novel example

( ) ( ) minusminusΘ++minusΘ= +minusminusminus++ ddpddp gεεεε

( ) ( )

Φminus+

Φ=

minusminus+

minus+minus

++

minus+minus

minusminusminus

++minus

minusminus

minusminus+

minus+minus

++

+minusminus

++minus

minusminusminus

+++

QQQ

RR2QQ

QQQ

RR2QQpp

22 2

1

2

1 ll

L

5 learning curve5 learning curve5 learning curve5 learning curve

generalization error generalization error generalization error generalization error εεεεgggg((((αααα)))) after training with α N examples

- repulsiveattractive fixed points of the dynamics

- asymptotic behavior for αrarrinfin- dependence on learning rate separation initialization

-

investigation and comparison of given algorithmsinvestigation and comparison of given algorithmsinvestigation and comparison of given algorithmsinvestigation and comparison of given algorithms

- time-dependent learning rate η(α)

- variational optimization wrt fs[]

-

optimization and development of new prescriptionsoptimization and development of new prescriptionsoptimization and development of new prescriptionsoptimization and development of new prescriptions

maximizeα

g

d

d εεεε

The Dynamics of Learning Vector Quantization RUG 10012005

optimal classificationoptimal classificationoptimal classificationoptimal classification with minimal generalization error

BBBB-

BBBB+

(p-gtp+ )

(p+)

separation of classes by the plane with 1)σP(p 1)σP(p +==minus= +minus ξξξξξξξξin the model situation (equal variances of clusters)

excess error

minimal εg as a function

of prior weights ℓ=2

εg

025

050

005 100 p+

ℓ=1

ℓ=0

The Dynamics of Learning Vector Quantization RUG 10012005

ldquoLVQ 21ldquo update the correct and wrong winner

( ) 1-micros

micro1-micros

micros Sσ

N

ηwwwwξξξξwwwwwwww minus+=

(analytical)integrationfor wwwws(0) = 0

( ) ( )

( ) ( ) KKll

Kll

αmηαmη

αmηαmη

e12

m1

mRe1

2

m1

mR

Qe12

m1

mRe1

2

m1

mR

++minusminusminus

++minus

minus+minus

++

minus+

=+minusminus

minus=

=minusminus

minus=minus+

=

pσ = (1+m σ ) 2 (mgt0)

[Seo Obermeyer] LVQ21 ս cost function

(likelihood ratios)

infinrarrinfinrarrminus+minusminus+minusminusminus

++minus+++

αQQRR

Q R R

with

finite remain

Q ++ R ++ R minus+

R +minus Q minus+

Q minusminus R minusminus

α 102 4 86

6-

0

6theory and simulation (N=100)p+=08 ℓ=1 η=05averages over 100 independent runs

The Dynamics of Learning Vector Quantization RUG 10012005

(p- )

(p+gt p-)

sssstrategiestrategiestrategiestrategies

- selection of dataselection of dataselection of dataselection of data in a window close to the current decision boundary

slows down the repulsion system remains instable

- Soft Robust Learning Vector QuantizationSoft Robust Learning Vector QuantizationSoft Robust Learning Vector QuantizationSoft Robust Learning Vector Quantization [Seo amp Obermayer]

density-estimation based cost function

limiting case Learning from mistakes Learning from mistakes Learning from mistakes Learning from mistakes LVQ21-step only

if the example is currently misclassified

slow learning poor generalization

problem instability of the algorithm

due to repulsion of wrong prototypes

trivial classification fuumlr αrarrinfin

εg = max p+p-

The Dynamics of Learning Vector Quantization RUG 10012005

ldquo The winner takes it all rdquoldquo The winner takes it all rdquoldquo The winner takes it all rdquoldquo The winner takes it all rdquo

numericalintegrationfor wwwws(0)=0

theory and simulation (N=200)p+=02 ℓ=12 η=12averaged over 100 indep runs

Q++

Q--

Q+-

α

wwww++++

wwww----

ℓℓℓℓ BBBB++++

ℓℓℓℓ BBBB----

trajectories in the (B+B- )-plane

(bull) α=2040140

optimal decision boundary____ asymptotic position

RS+

RS-

R--

R-+

R--

R++

winner wwwws plusmn1

I) LVQ 1LVQ 1LVQ 1LVQ 1 [Kohonen] [ ] ( ) 1-micros

micromicromicroS

microS

1-micros

micros Sσdd

N

ηwwwwξξξξwwwwwwww minusminusΘ+= minus

only the winner is updated according to the class membership

wwww-

The Dynamics of Learning Vector Quantization RUG 10012005

learning curvelearning curvelearning curvelearning curve

α

εg η=12

(p+=02 ℓ=12)

εg (αrarrinfin) grows lin with η

- stationary state

- role of the learning rate

α100 200 300

εg

026

022

018

0140

η

20

04

02

ηrarr0 - variable rate η(α)

- wellwellwellwell----defined asymptoticsdefined asymptoticsdefined asymptoticsdefined asymptotics

(ODE linear in η)

10

εg

20 30 40 500014

026

022

018

min εg

(η α)

ηrarr0

η rarr0 αrarrinfin

( η α ) rarr infin

suboptimal

The Dynamics of Learning Vector Quantization RUG 10012005

ldquo The winner takes it all ldquoldquo The winner takes it all ldquoldquo The winner takes it all ldquoldquo The winner takes it all ldquo

II ) LVQ+LVQ+LVQ+LVQ+ ( only positive steps without repulsion)

[ ] ( ) ( ) 1-micros

microS

microσ

microS

microS

1-micros

micros δdd

N

ηwwwwξξξξwwwwwwww minusminusΘ+= minus

winner correct

αrarrinfin asymptotic configuration

symmetric about ℓℓℓℓ (B(B(B(B+++++B+B+B+B----)2)2)2)2

wwww-

wwww+

ℓ ℓ ℓ ℓ BBBB+

ℓ ℓ ℓ ℓ BBBB-

p+=02 ℓ=12 η=12

classification scheme and the

achieved generalization error are

independent of the independent of the independent of the independent of the prior weights prior weights prior weights prior weights ppppplusmnplusmnplusmnplusmn

(and optimal for ppppplusmnplusmnplusmnplusmn = 12 )

LVQ+ asymp VQ within the classes

(ws updated only from class S)

The Dynamics of Learning Vector Quantization RUG 10012005

- LVQ 21

trivial assignment to the

more frequent class

optimal classification

εg

pppp++++

min p+p-

- LVQ 1

here close to optimal

classification

pppp++++

- LVQ+

min-max solution

pplusmn -independent classification

p+=02 ℓ=10 η=10εg

α

learning curveslearning curveslearning curveslearning curves

LVQ+

LVQ1

asymptotics ηrarr0 (ηα)rarrinfin

The Dynamics of Learning Vector Quantization RUG 10012005

Vector QuantizationVector QuantizationVector QuantizationVector Quantization

competitive learning [ ] ( ) 1-micros

micromicroS

microS

1-micros

micros dd

N

ηwwwwξξξξwwwwwwww minusminusΘ+= minus

wwwws winner

class membership is unknown

or identical for all data

numerical integration for wwwws(0)asymp0

( p+=02 ℓ=10 η=12 )

εg

α

VQ

LVQ+

LVQ1

αα

R++

R+-

R-+

R--

100 200 3000

0

10

system is invariant under

exchange of the prototypes

rarr weakly repulsive fixed points

The Dynamics of Learning Vector Quantization RUG 10012005

interpretations

- VQ unsupervised learningunlabelled data

- LVQ two prototypes of thesame class identical labels

- LVQ different classes butlabels are not used in training

εg

pppp++++

asymptotics (αrarrηrarr0 ηαrarrinfin)

pppp++++asymp0 asymp0 asymp0 asymp0

pppp----asymp1 asymp1 asymp1 asymp1

- low quantization error

- high gen error εg

The Dynamics of Learning Vector Quantization RUG 10012005

work in progress outlookwork in progress outlookwork in progress outlookwork in progress outlook

bull regularization of LVQ 21 Robust Soft LVQ [Seo Obermayer]

bull model different cluster variances more clustersprototypes

bull optimized procedures learning rate schedules

variational approach density estimation Bayes optimal on-line

bull several classes and prototypes

Summary

bullprototypeprototypeprototypeprototype----based learningbased learningbased learningbased learning

Vector Quantization and Learning Vector Quantization

bulla model scenarioa model scenarioa model scenarioa model scenario two clusters two prototypes

dynamics of online training

bullcomparison of algorithmscomparison of algorithmscomparison of algorithmscomparison of algorithms

LVQ 21 instability trivial (stationary) classification

LVQ 1 close to optimal asymptotic generalization

LVQ + min-max solution wrt asymptotic generalization

VQ symmetry breaking representation

The Dynamics of Learning Vector Quantization RUG 10012005

Perspectives

bullSelfSelfSelfSelf----Organizing Maps (SOM)Organizing Maps (SOM)Organizing Maps (SOM)Organizing Maps (SOM)

(many) N-dim prototypes form a (low) d-dimensional grid

representation of data in a topology preserving map

neighborhood preserving SOM Neural Gas (distance based)

bullGeneralized Relevance LVQ Generalized Relevance LVQ Generalized Relevance LVQ Generalized Relevance LVQ [Hammer amp Villmann]

adaptive metrics eg distance measure ( )sum=

minus=N

i

iii w

1

2)( sλ ξξwd λ

training

bullapplications applications applications applications

Page 14: The Dynamics of Learning Vector Quantizationlibvolume3.xyz/computers/btech/semester6/datacompression/vectorquantization...The Dynamics of Learning Vector Quantization, RUG, 10.01.2005

The Dynamics of Learning Vector Quantization RUG 10012005

high-dimensional data (formally Nrarrinfin)

400 examples ξξξξmicro isinℝN N=200 ℓ=1 p+=06micro

By

ξξ ξξ sdot=

minusminus

(240)

(160)

projections into the plane of center vectors B+ B-

microBy ξξξξsdot= ++

micro2

2xξξ ξξ

ww wwsdot

=

(240)(160)

projections in two independent random directions wwww12

micro11x ξξξξwwww sdot=

model for studying typical behavior of LVQ algorithmsnot density-estimation based classification

NoteNoteNoteNote

The Dynamics of Learning Vector Quantization RUG 10012005

dynamics of ondynamics of ondynamics of ondynamics of on----line trainingline trainingline trainingline training

sequence of independent random data ( )123micromicro =ξξξξ acc to ( )microP ξξξξ

learning ratestep size

competitiondirection ofupdate etc

change of prototypetowards or away from the current data

above examples

unsupervisedunsupervisedunsupervisedunsupervised Vector QuantizationVector QuantizationVector QuantizationVector Quantization [ ] ( ) dd fmicros

micross minusΘ= minus

The Winner Takes It All (classes irrelevantunknown)

Learning Vector Quantization ldquo21rdquo Learning Vector Quantization ldquo21rdquo Learning Vector Quantization ldquo21rdquo Learning Vector Quantization ldquo21rdquo [ ] σS fs)(1)(1

classcorrectclasswrong

+minus=sdot=

here two prototypes noexplicit competition

[ ] ( ) 1-micros

micromicros-

micross

1-micros

micros σSddf

N

ηwwwwξξξξwwwwwwww minus+=

( )21minusminus=

plusmn=

micros

micromicrosd

1σS

wwwwξξξξ

update of prototype vectors

The Dynamics of Learning Vector Quantization RUG 10012005

[ ] ( )

[ ] ( ) [ ] ( ) [ ] [ ] ( )Ν

1Οffη QxfηQxfη1N

QQ

Ryfη1N

RR

ts1-micro

stmicrost

1-microst

microts

1-microst

microst

1-microsσ

microσs

1-microsσ

microsσ

++minus+minus=minus

minus=minus

2

[ ] ( ) 1-micros

micromicros-

micross

1-micros

micros σSddf

N

ηwwwwξξξξwwwwwwww minus+= rarrrarrrarrrarr recursionsrecursionsrecursionsrecursions

mathematical analysis of the learning dynamicsmathematical analysis of the learning dynamicsmathematical analysis of the learning dynamicsmathematical analysis of the learning dynamics

( ) ( ) 1221 -micross

micros

micromicros

micromicros Q2xd +minus=minus= minus ξξξξwwwwξξξξ

micromicromicro1-micros

micros ξByx sdot=sdot= ττξξξξwwwwprojections

distances

random vector ξmicro enters only in the form of

( )11 +minusisinsdot=sdot= σtsmicrot

micros

microstσ

micros

microsσ QBR wwwwwwwwwwww

projections in the (B+ B- )-plane

length and relativeposition of prototypes

1 description in terms of a few characteristic quantitities1 description in terms of a few characteristic quantitities1 description in terms of a few characteristic quantitities1 description in terms of a few characteristic quantitities

( here ℝ2N rarr ℝ7 )

The Dynamics of Learning Vector Quantization RUG 10012005

N

1jjσjsσ

N

1jjsσs R x ll === sumsum

==

Bww jξ

completely specified in terms of first and second moments (wo indices micro)

in the thermodynamic limit thermodynamic limit thermodynamic limit thermodynamic limit N rarrrarrrarrrarr infininfininfininfin

random vector acc to σ)|P( micro rarrξξξξmicromicro

micro1-micros

micros

By

wx

ξξξξ

ξξξξ

sdot=

sdot=

ττ

correlated Gaussianrandom quantities

stσtσsσt s Q xx- xx = τττ sσσsσ s R yx- yx =

ρττρτρ δ yy- yyσσσ

===

=

else

σ ifsσσ

y0

Sl

l δτ

2 average over the current example2 average over the current example2 average over the current example2 average over the current example

rarrrarrrarrrarr averaged recursionsaveraged recursionsaveraged recursionsaveraged recursions closed in Rsσ Qst pσ

1σσ LL sum

plusmn=

=

The Dynamics of Learning Vector Quantization RUG 10012005

characteristic quantities

- depend on the random sequence of example data

- their variance vanishes with N rarrrarrrarrrarr infininfininfininfin (here prop N-1)

microsσ

microst

R Q

learning dynamics is completely described in terms of averagesaveragesaveragesaverages

3 self3 self3 self3 self----averaging propertiesaveraging propertiesaveraging propertiesaveraging properties

4 continuous learning time4 continuous learning time4 continuous learning time4 continuous learning time

N

micro α =

of examples

of learning stepsper degree of freedom

) α (R ) α (Q sσst

recursions rarr coupled ordinary differential equations

rarr evolution of projections

The Dynamics of Learning Vector Quantization RUG 10012005

probability for misclassification of a novel example

( ) ( ) minusminusΘ++minusΘ= +minusminusminus++ ddpddp gεεεε

( ) ( )

Φminus+

Φ=

minusminus+

minus+minus

++

minus+minus

minusminusminus

++minus

minusminus

minusminus+

minus+minus

++

+minusminus

++minus

minusminusminus

+++

QQQ

RR2QQ

QQQ

RR2QQpp

22 2

1

2

1 ll

L

5 learning curve5 learning curve5 learning curve5 learning curve

generalization error generalization error generalization error generalization error εεεεgggg((((αααα)))) after training with α N examples

- repulsiveattractive fixed points of the dynamics

- asymptotic behavior for αrarrinfin- dependence on learning rate separation initialization

-

investigation and comparison of given algorithmsinvestigation and comparison of given algorithmsinvestigation and comparison of given algorithmsinvestigation and comparison of given algorithms

- time-dependent learning rate η(α)

- variational optimization wrt fs[]

-

optimization and development of new prescriptionsoptimization and development of new prescriptionsoptimization and development of new prescriptionsoptimization and development of new prescriptions

maximizeα

g

d

d εεεε

The Dynamics of Learning Vector Quantization RUG 10012005

optimal classificationoptimal classificationoptimal classificationoptimal classification with minimal generalization error

BBBB-

BBBB+

(p-gtp+ )

(p+)

separation of classes by the plane with 1)σP(p 1)σP(p +==minus= +minus ξξξξξξξξin the model situation (equal variances of clusters)

excess error

minimal εg as a function

of prior weights ℓ=2

εg

025

050

005 100 p+

ℓ=1

ℓ=0

The Dynamics of Learning Vector Quantization RUG 10012005

ldquoLVQ 21ldquo update the correct and wrong winner

( ) 1-micros

micro1-micros

micros Sσ

N

ηwwwwξξξξwwwwwwww minus+=

(analytical)integrationfor wwwws(0) = 0

( ) ( )

( ) ( ) KKll

Kll

αmηαmη

αmηαmη

e12

m1

mRe1

2

m1

mR

Qe12

m1

mRe1

2

m1

mR

++minusminusminus

++minus

minus+minus

++

minus+

=+minusminus

minus=

=minusminus

minus=minus+

=

pσ = (1+m σ ) 2 (mgt0)

[Seo Obermeyer] LVQ21 ս cost function

(likelihood ratios)

infinrarrinfinrarrminus+minusminus+minusminusminus

++minus+++

αQQRR

Q R R

with

finite remain

Q ++ R ++ R minus+

R +minus Q minus+

Q minusminus R minusminus

α 102 4 86

6-

0

6theory and simulation (N=100)p+=08 ℓ=1 η=05averages over 100 independent runs

The Dynamics of Learning Vector Quantization RUG 10012005

(p- )

(p+gt p-)

sssstrategiestrategiestrategiestrategies

- selection of dataselection of dataselection of dataselection of data in a window close to the current decision boundary

slows down the repulsion system remains instable

- Soft Robust Learning Vector QuantizationSoft Robust Learning Vector QuantizationSoft Robust Learning Vector QuantizationSoft Robust Learning Vector Quantization [Seo amp Obermayer]

density-estimation based cost function

limiting case Learning from mistakes Learning from mistakes Learning from mistakes Learning from mistakes LVQ21-step only

if the example is currently misclassified

slow learning poor generalization

problem instability of the algorithm

due to repulsion of wrong prototypes

trivial classification fuumlr αrarrinfin

εg = max p+p-

The Dynamics of Learning Vector Quantization RUG 10012005

ldquo The winner takes it all rdquoldquo The winner takes it all rdquoldquo The winner takes it all rdquoldquo The winner takes it all rdquo

numericalintegrationfor wwwws(0)=0

theory and simulation (N=200)p+=02 ℓ=12 η=12averaged over 100 indep runs

Q++

Q--

Q+-

α

wwww++++

wwww----

ℓℓℓℓ BBBB++++

ℓℓℓℓ BBBB----

trajectories in the (B+B- )-plane

(bull) α=2040140

optimal decision boundary____ asymptotic position

RS+

RS-

R--

R-+

R--

R++

winner wwwws plusmn1

I) LVQ 1LVQ 1LVQ 1LVQ 1 [Kohonen] [ ] ( ) 1-micros

micromicromicroS

microS

1-micros

micros Sσdd

N

ηwwwwξξξξwwwwwwww minusminusΘ+= minus

only the winner is updated according to the class membership

wwww-

The Dynamics of Learning Vector Quantization RUG 10012005

learning curvelearning curvelearning curvelearning curve

α

εg η=12

(p+=02 ℓ=12)

εg (αrarrinfin) grows lin with η

- stationary state

- role of the learning rate

α100 200 300

εg

026

022

018

0140

η

20

04

02

ηrarr0 - variable rate η(α)

- wellwellwellwell----defined asymptoticsdefined asymptoticsdefined asymptoticsdefined asymptotics

(ODE linear in η)

10

εg

20 30 40 500014

026

022

018

min εg

(η α)

ηrarr0

η rarr0 αrarrinfin

( η α ) rarr infin

suboptimal

The Dynamics of Learning Vector Quantization RUG 10012005

ldquo The winner takes it all ldquoldquo The winner takes it all ldquoldquo The winner takes it all ldquoldquo The winner takes it all ldquo

II ) LVQ+LVQ+LVQ+LVQ+ ( only positive steps without repulsion)

[ ] ( ) ( ) 1-micros

microS

microσ

microS

microS

1-micros

micros δdd

N

ηwwwwξξξξwwwwwwww minusminusΘ+= minus

winner correct

αrarrinfin asymptotic configuration

symmetric about ℓℓℓℓ (B(B(B(B+++++B+B+B+B----)2)2)2)2

wwww-

wwww+

ℓ ℓ ℓ ℓ BBBB+

ℓ ℓ ℓ ℓ BBBB-

p+=02 ℓ=12 η=12

classification scheme and the

achieved generalization error are

independent of the independent of the independent of the independent of the prior weights prior weights prior weights prior weights ppppplusmnplusmnplusmnplusmn

(and optimal for ppppplusmnplusmnplusmnplusmn = 12 )

LVQ+ asymp VQ within the classes

(ws updated only from class S)

The Dynamics of Learning Vector Quantization RUG 10012005

- LVQ 21

trivial assignment to the

more frequent class

optimal classification

εg

pppp++++

min p+p-

- LVQ 1

here close to optimal

classification

pppp++++

- LVQ+

min-max solution

pplusmn -independent classification

p+=02 ℓ=10 η=10εg

α

learning curveslearning curveslearning curveslearning curves

LVQ+

LVQ1

asymptotics ηrarr0 (ηα)rarrinfin

The Dynamics of Learning Vector Quantization RUG 10012005

Vector QuantizationVector QuantizationVector QuantizationVector Quantization

competitive learning [ ] ( ) 1-micros

micromicroS

microS

1-micros

micros dd

N

ηwwwwξξξξwwwwwwww minusminusΘ+= minus

wwwws winner

class membership is unknown

or identical for all data

numerical integration for wwwws(0)asymp0

( p+=02 ℓ=10 η=12 )

εg

α

VQ

LVQ+

LVQ1

αα

R++

R+-

R-+

R--

100 200 3000

0

10

system is invariant under

exchange of the prototypes

rarr weakly repulsive fixed points

The Dynamics of Learning Vector Quantization RUG 10012005

interpretations

- VQ unsupervised learningunlabelled data

- LVQ two prototypes of thesame class identical labels

- LVQ different classes butlabels are not used in training

εg

pppp++++

asymptotics (αrarrηrarr0 ηαrarrinfin)

pppp++++asymp0 asymp0 asymp0 asymp0

pppp----asymp1 asymp1 asymp1 asymp1

- low quantization error

- high gen error εg

The Dynamics of Learning Vector Quantization RUG 10012005

work in progress outlookwork in progress outlookwork in progress outlookwork in progress outlook

bull regularization of LVQ 21 Robust Soft LVQ [Seo Obermayer]

bull model different cluster variances more clustersprototypes

bull optimized procedures learning rate schedules

variational approach density estimation Bayes optimal on-line

bull several classes and prototypes

Summary

bullprototypeprototypeprototypeprototype----based learningbased learningbased learningbased learning

Vector Quantization and Learning Vector Quantization

bulla model scenarioa model scenarioa model scenarioa model scenario two clusters two prototypes

dynamics of online training

bullcomparison of algorithmscomparison of algorithmscomparison of algorithmscomparison of algorithms

LVQ 21 instability trivial (stationary) classification

LVQ 1 close to optimal asymptotic generalization

LVQ + min-max solution wrt asymptotic generalization

VQ symmetry breaking representation

The Dynamics of Learning Vector Quantization RUG 10012005

Perspectives

bullSelfSelfSelfSelf----Organizing Maps (SOM)Organizing Maps (SOM)Organizing Maps (SOM)Organizing Maps (SOM)

(many) N-dim prototypes form a (low) d-dimensional grid

representation of data in a topology preserving map

neighborhood preserving SOM Neural Gas (distance based)

bullGeneralized Relevance LVQ Generalized Relevance LVQ Generalized Relevance LVQ Generalized Relevance LVQ [Hammer amp Villmann]

adaptive metrics eg distance measure ( )sum=

minus=N

i

iii w

1

2)( sλ ξξwd λ

training

bullapplications applications applications applications

Page 15: The Dynamics of Learning Vector Quantizationlibvolume3.xyz/computers/btech/semester6/datacompression/vectorquantization...The Dynamics of Learning Vector Quantization, RUG, 10.01.2005

The Dynamics of Learning Vector Quantization RUG 10012005

dynamics of ondynamics of ondynamics of ondynamics of on----line trainingline trainingline trainingline training

sequence of independent random data ( )123micromicro =ξξξξ acc to ( )microP ξξξξ

learning ratestep size

competitiondirection ofupdate etc

change of prototypetowards or away from the current data

above examples

unsupervisedunsupervisedunsupervisedunsupervised Vector QuantizationVector QuantizationVector QuantizationVector Quantization [ ] ( ) dd fmicros

micross minusΘ= minus

The Winner Takes It All (classes irrelevantunknown)

Learning Vector Quantization ldquo21rdquo Learning Vector Quantization ldquo21rdquo Learning Vector Quantization ldquo21rdquo Learning Vector Quantization ldquo21rdquo [ ] σS fs)(1)(1

classcorrectclasswrong

+minus=sdot=

here two prototypes noexplicit competition

[ ] ( ) 1-micros

micromicros-

micross

1-micros

micros σSddf

N

ηwwwwξξξξwwwwwwww minus+=

( )21minusminus=

plusmn=

micros

micromicrosd

1σS

wwwwξξξξ

update of prototype vectors

The Dynamics of Learning Vector Quantization RUG 10012005

[ ] ( )

[ ] ( ) [ ] ( ) [ ] [ ] ( )Ν

1Οffη QxfηQxfη1N

QQ

Ryfη1N

RR

ts1-micro

stmicrost

1-microst

microts

1-microst

microst

1-microsσ

microσs

1-microsσ

microsσ

++minus+minus=minus

minus=minus

2

[ ] ( ) 1-micros

micromicros-

micross

1-micros

micros σSddf

N

ηwwwwξξξξwwwwwwww minus+= rarrrarrrarrrarr recursionsrecursionsrecursionsrecursions

mathematical analysis of the learning dynamicsmathematical analysis of the learning dynamicsmathematical analysis of the learning dynamicsmathematical analysis of the learning dynamics

( ) ( ) 1221 -micross

micros

micromicros

micromicros Q2xd +minus=minus= minus ξξξξwwwwξξξξ

micromicromicro1-micros

micros ξByx sdot=sdot= ττξξξξwwwwprojections

distances

random vector ξmicro enters only in the form of

( )11 +minusisinsdot=sdot= σtsmicrot

micros

microstσ

micros

microsσ QBR wwwwwwwwwwww

projections in the (B+ B- )-plane

length and relativeposition of prototypes

1 description in terms of a few characteristic quantitities1 description in terms of a few characteristic quantitities1 description in terms of a few characteristic quantitities1 description in terms of a few characteristic quantitities

( here ℝ2N rarr ℝ7 )

The Dynamics of Learning Vector Quantization RUG 10012005

N

1jjσjsσ

N

1jjsσs R x ll === sumsum

==

Bww jξ

completely specified in terms of first and second moments (wo indices micro)

in the thermodynamic limit thermodynamic limit thermodynamic limit thermodynamic limit N rarrrarrrarrrarr infininfininfininfin

random vector acc to σ)|P( micro rarrξξξξmicromicro

micro1-micros

micros

By

wx

ξξξξ

ξξξξ

sdot=

sdot=

ττ

correlated Gaussianrandom quantities

stσtσsσt s Q xx- xx = τττ sσσsσ s R yx- yx =

ρττρτρ δ yy- yyσσσ

===

=

else

σ ifsσσ

y0

Sl

l δτ

2 average over the current example2 average over the current example2 average over the current example2 average over the current example

rarrrarrrarrrarr averaged recursionsaveraged recursionsaveraged recursionsaveraged recursions closed in Rsσ Qst pσ

1σσ LL sum

plusmn=

=

The Dynamics of Learning Vector Quantization RUG 10012005

characteristic quantities

- depend on the random sequence of example data

- their variance vanishes with N rarrrarrrarrrarr infininfininfininfin (here prop N-1)

microsσ

microst

R Q

learning dynamics is completely described in terms of averagesaveragesaveragesaverages

3 self3 self3 self3 self----averaging propertiesaveraging propertiesaveraging propertiesaveraging properties

4 continuous learning time4 continuous learning time4 continuous learning time4 continuous learning time

N

micro α =

of examples

of learning stepsper degree of freedom

) α (R ) α (Q sσst

recursions rarr coupled ordinary differential equations

rarr evolution of projections

The Dynamics of Learning Vector Quantization RUG 10012005

probability for misclassification of a novel example

( ) ( ) minusminusΘ++minusΘ= +minusminusminus++ ddpddp gεεεε

( ) ( )

Φminus+

Φ=

minusminus+

minus+minus

++

minus+minus

minusminusminus

++minus

minusminus

minusminus+

minus+minus

++

+minusminus

++minus

minusminusminus

+++

QQQ

RR2QQ

QQQ

RR2QQpp

22 2

1

2

1 ll

L

5 learning curve5 learning curve5 learning curve5 learning curve

generalization error generalization error generalization error generalization error εεεεgggg((((αααα)))) after training with α N examples

- repulsiveattractive fixed points of the dynamics

- asymptotic behavior for αrarrinfin- dependence on learning rate separation initialization

-

investigation and comparison of given algorithmsinvestigation and comparison of given algorithmsinvestigation and comparison of given algorithmsinvestigation and comparison of given algorithms

- time-dependent learning rate η(α)

- variational optimization wrt fs[]

-

optimization and development of new prescriptionsoptimization and development of new prescriptionsoptimization and development of new prescriptionsoptimization and development of new prescriptions

maximizeα

g

d

d εεεε

The Dynamics of Learning Vector Quantization RUG 10012005

optimal classificationoptimal classificationoptimal classificationoptimal classification with minimal generalization error

BBBB-

BBBB+

(p-gtp+ )

(p+)

separation of classes by the plane with 1)σP(p 1)σP(p +==minus= +minus ξξξξξξξξin the model situation (equal variances of clusters)

excess error

minimal εg as a function

of prior weights ℓ=2

εg

025

050

005 100 p+

ℓ=1

ℓ=0

The Dynamics of Learning Vector Quantization RUG 10012005

ldquoLVQ 21ldquo update the correct and wrong winner

( ) 1-micros

micro1-micros

micros Sσ

N

ηwwwwξξξξwwwwwwww minus+=

(analytical)integrationfor wwwws(0) = 0

( ) ( )

( ) ( ) KKll

Kll

αmηαmη

αmηαmη

e12

m1

mRe1

2

m1

mR

Qe12

m1

mRe1

2

m1

mR

++minusminusminus

++minus

minus+minus

++

minus+

=+minusminus

minus=

=minusminus

minus=minus+

=

pσ = (1+m σ ) 2 (mgt0)

[Seo Obermeyer] LVQ21 ս cost function

(likelihood ratios)

infinrarrinfinrarrminus+minusminus+minusminusminus

++minus+++

αQQRR

Q R R

with

finite remain

Q ++ R ++ R minus+

R +minus Q minus+

Q minusminus R minusminus

α 102 4 86

6-

0

6theory and simulation (N=100)p+=08 ℓ=1 η=05averages over 100 independent runs

The Dynamics of Learning Vector Quantization RUG 10012005

(p- )

(p+gt p-)

sssstrategiestrategiestrategiestrategies

- selection of dataselection of dataselection of dataselection of data in a window close to the current decision boundary

slows down the repulsion system remains instable

- Soft Robust Learning Vector QuantizationSoft Robust Learning Vector QuantizationSoft Robust Learning Vector QuantizationSoft Robust Learning Vector Quantization [Seo amp Obermayer]

density-estimation based cost function

limiting case Learning from mistakes Learning from mistakes Learning from mistakes Learning from mistakes LVQ21-step only

if the example is currently misclassified

slow learning poor generalization

problem instability of the algorithm

due to repulsion of wrong prototypes

trivial classification fuumlr αrarrinfin

εg = max p+p-

The Dynamics of Learning Vector Quantization RUG 10012005

ldquo The winner takes it all rdquoldquo The winner takes it all rdquoldquo The winner takes it all rdquoldquo The winner takes it all rdquo

numericalintegrationfor wwwws(0)=0

theory and simulation (N=200)p+=02 ℓ=12 η=12averaged over 100 indep runs

Q++

Q--

Q+-

α

wwww++++

wwww----

ℓℓℓℓ BBBB++++

ℓℓℓℓ BBBB----

trajectories in the (B+B- )-plane

(bull) α=2040140

optimal decision boundary____ asymptotic position

RS+

RS-

R--

R-+

R--

R++

winner wwwws plusmn1

I) LVQ 1LVQ 1LVQ 1LVQ 1 [Kohonen] [ ] ( ) 1-micros

micromicromicroS

microS

1-micros

micros Sσdd

N

ηwwwwξξξξwwwwwwww minusminusΘ+= minus

only the winner is updated according to the class membership

wwww-

The Dynamics of Learning Vector Quantization RUG 10012005

learning curvelearning curvelearning curvelearning curve

α

εg η=12

(p+=02 ℓ=12)

εg (αrarrinfin) grows lin with η

- stationary state

- role of the learning rate

α100 200 300

εg

026

022

018

0140

η

20

04

02

ηrarr0 - variable rate η(α)

- wellwellwellwell----defined asymptoticsdefined asymptoticsdefined asymptoticsdefined asymptotics

(ODE linear in η)

10

εg

20 30 40 500014

026

022

018

min εg

(η α)

ηrarr0

η rarr0 αrarrinfin

( η α ) rarr infin

suboptimal

The Dynamics of Learning Vector Quantization RUG 10012005

ldquo The winner takes it all ldquoldquo The winner takes it all ldquoldquo The winner takes it all ldquoldquo The winner takes it all ldquo

II ) LVQ+LVQ+LVQ+LVQ+ ( only positive steps without repulsion)

[ ] ( ) ( ) 1-micros

microS

microσ

microS

microS

1-micros

micros δdd

N

ηwwwwξξξξwwwwwwww minusminusΘ+= minus

winner correct

αrarrinfin asymptotic configuration

symmetric about ℓℓℓℓ (B(B(B(B+++++B+B+B+B----)2)2)2)2

wwww-

wwww+

ℓ ℓ ℓ ℓ BBBB+

ℓ ℓ ℓ ℓ BBBB-

p+=02 ℓ=12 η=12

classification scheme and the

achieved generalization error are

independent of the independent of the independent of the independent of the prior weights prior weights prior weights prior weights ppppplusmnplusmnplusmnplusmn

(and optimal for ppppplusmnplusmnplusmnplusmn = 12 )

LVQ+ asymp VQ within the classes

(ws updated only from class S)

The Dynamics of Learning Vector Quantization RUG 10012005

- LVQ 21

trivial assignment to the

more frequent class

optimal classification

εg

pppp++++

min p+p-

- LVQ 1

here close to optimal

classification

pppp++++

- LVQ+

min-max solution

pplusmn -independent classification

p+=02 ℓ=10 η=10εg

α

learning curveslearning curveslearning curveslearning curves

LVQ+

LVQ1

asymptotics ηrarr0 (ηα)rarrinfin

The Dynamics of Learning Vector Quantization RUG 10012005

Vector QuantizationVector QuantizationVector QuantizationVector Quantization

competitive learning [ ] ( ) 1-micros

micromicroS

microS

1-micros

micros dd

N

ηwwwwξξξξwwwwwwww minusminusΘ+= minus

wwwws winner

class membership is unknown

or identical for all data

numerical integration for wwwws(0)asymp0

( p+=02 ℓ=10 η=12 )

εg

α

VQ

LVQ+

LVQ1

αα

R++

R+-

R-+

R--

100 200 3000

0

10

system is invariant under

exchange of the prototypes

rarr weakly repulsive fixed points

The Dynamics of Learning Vector Quantization RUG 10012005

interpretations

- VQ unsupervised learningunlabelled data

- LVQ two prototypes of thesame class identical labels

- LVQ different classes butlabels are not used in training

εg

pppp++++

asymptotics (αrarrηrarr0 ηαrarrinfin)

pppp++++asymp0 asymp0 asymp0 asymp0

pppp----asymp1 asymp1 asymp1 asymp1

- low quantization error

- high gen error εg

The Dynamics of Learning Vector Quantization RUG 10012005

work in progress outlookwork in progress outlookwork in progress outlookwork in progress outlook

bull regularization of LVQ 21 Robust Soft LVQ [Seo Obermayer]

bull model different cluster variances more clustersprototypes

bull optimized procedures learning rate schedules

variational approach density estimation Bayes optimal on-line

bull several classes and prototypes

Summary

bullprototypeprototypeprototypeprototype----based learningbased learningbased learningbased learning

Vector Quantization and Learning Vector Quantization

bulla model scenarioa model scenarioa model scenarioa model scenario two clusters two prototypes

dynamics of online training

bullcomparison of algorithmscomparison of algorithmscomparison of algorithmscomparison of algorithms

LVQ 21 instability trivial (stationary) classification

LVQ 1 close to optimal asymptotic generalization

LVQ + min-max solution wrt asymptotic generalization

VQ symmetry breaking representation

The Dynamics of Learning Vector Quantization RUG 10012005

Perspectives

bullSelfSelfSelfSelf----Organizing Maps (SOM)Organizing Maps (SOM)Organizing Maps (SOM)Organizing Maps (SOM)

(many) N-dim prototypes form a (low) d-dimensional grid

representation of data in a topology preserving map

neighborhood preserving SOM Neural Gas (distance based)

bullGeneralized Relevance LVQ Generalized Relevance LVQ Generalized Relevance LVQ Generalized Relevance LVQ [Hammer amp Villmann]

adaptive metrics eg distance measure ( )sum=

minus=N

i

iii w

1

2)( sλ ξξwd λ

training

bullapplications applications applications applications

Page 16: The Dynamics of Learning Vector Quantizationlibvolume3.xyz/computers/btech/semester6/datacompression/vectorquantization...The Dynamics of Learning Vector Quantization, RUG, 10.01.2005

The Dynamics of Learning Vector Quantization RUG 10012005

[ ] ( )

[ ] ( ) [ ] ( ) [ ] [ ] ( )Ν

1Οffη QxfηQxfη1N

QQ

Ryfη1N

RR

ts1-micro

stmicrost

1-microst

microts

1-microst

microst

1-microsσ

microσs

1-microsσ

microsσ

++minus+minus=minus

minus=minus

2

[ ] ( ) 1-micros

micromicros-

micross

1-micros

micros σSddf

N

ηwwwwξξξξwwwwwwww minus+= rarrrarrrarrrarr recursionsrecursionsrecursionsrecursions

mathematical analysis of the learning dynamicsmathematical analysis of the learning dynamicsmathematical analysis of the learning dynamicsmathematical analysis of the learning dynamics

( ) ( ) 1221 -micross

micros

micromicros

micromicros Q2xd +minus=minus= minus ξξξξwwwwξξξξ

micromicromicro1-micros

micros ξByx sdot=sdot= ττξξξξwwwwprojections

distances

random vector ξmicro enters only in the form of

( )11 +minusisinsdot=sdot= σtsmicrot

micros

microstσ

micros

microsσ QBR wwwwwwwwwwww

projections in the (B+ B- )-plane

length and relativeposition of prototypes

1 description in terms of a few characteristic quantitities1 description in terms of a few characteristic quantitities1 description in terms of a few characteristic quantitities1 description in terms of a few characteristic quantitities

( here ℝ2N rarr ℝ7 )

The Dynamics of Learning Vector Quantization RUG 10012005

N

1jjσjsσ

N

1jjsσs R x ll === sumsum

==

Bww jξ

completely specified in terms of first and second moments (wo indices micro)

in the thermodynamic limit thermodynamic limit thermodynamic limit thermodynamic limit N rarrrarrrarrrarr infininfininfininfin

random vector acc to σ)|P( micro rarrξξξξmicromicro

micro1-micros

micros

By

wx

ξξξξ

ξξξξ

sdot=

sdot=

ττ

correlated Gaussianrandom quantities

stσtσsσt s Q xx- xx = τττ sσσsσ s R yx- yx =

ρττρτρ δ yy- yyσσσ

===

=

else

σ ifsσσ

y0

Sl

l δτ

2 average over the current example2 average over the current example2 average over the current example2 average over the current example

rarrrarrrarrrarr averaged recursionsaveraged recursionsaveraged recursionsaveraged recursions closed in Rsσ Qst pσ

1σσ LL sum

plusmn=

=

The Dynamics of Learning Vector Quantization RUG 10012005

characteristic quantities

- depend on the random sequence of example data

- their variance vanishes with N rarrrarrrarrrarr infininfininfininfin (here prop N-1)

microsσ

microst

R Q

learning dynamics is completely described in terms of averagesaveragesaveragesaverages

3 self3 self3 self3 self----averaging propertiesaveraging propertiesaveraging propertiesaveraging properties

4 continuous learning time4 continuous learning time4 continuous learning time4 continuous learning time

N

micro α =

of examples

of learning stepsper degree of freedom

) α (R ) α (Q sσst

recursions rarr coupled ordinary differential equations

rarr evolution of projections

The Dynamics of Learning Vector Quantization RUG 10012005

probability for misclassification of a novel example

( ) ( ) minusminusΘ++minusΘ= +minusminusminus++ ddpddp gεεεε

( ) ( )

Φminus+

Φ=

minusminus+

minus+minus

++

minus+minus

minusminusminus

++minus

minusminus

minusminus+

minus+minus

++

+minusminus

++minus

minusminusminus

+++

QQQ

RR2QQ

QQQ

RR2QQpp

22 2

1

2

1 ll

L

5 learning curve5 learning curve5 learning curve5 learning curve

generalization error generalization error generalization error generalization error εεεεgggg((((αααα)))) after training with α N examples

- repulsiveattractive fixed points of the dynamics

- asymptotic behavior for αrarrinfin- dependence on learning rate separation initialization

-

investigation and comparison of given algorithmsinvestigation and comparison of given algorithmsinvestigation and comparison of given algorithmsinvestigation and comparison of given algorithms

- time-dependent learning rate η(α)

- variational optimization wrt fs[]

-

optimization and development of new prescriptionsoptimization and development of new prescriptionsoptimization and development of new prescriptionsoptimization and development of new prescriptions

maximizeα

g

d

d εεεε

The Dynamics of Learning Vector Quantization RUG 10012005

optimal classificationoptimal classificationoptimal classificationoptimal classification with minimal generalization error

BBBB-

BBBB+

(p-gtp+ )

(p+)

separation of classes by the plane with 1)σP(p 1)σP(p +==minus= +minus ξξξξξξξξin the model situation (equal variances of clusters)

excess error

minimal εg as a function

of prior weights ℓ=2

εg

025

050

005 100 p+

ℓ=1

ℓ=0

The Dynamics of Learning Vector Quantization RUG 10012005

ldquoLVQ 21ldquo update the correct and wrong winner

( ) 1-micros

micro1-micros

micros Sσ

N

ηwwwwξξξξwwwwwwww minus+=

(analytical)integrationfor wwwws(0) = 0

( ) ( )

( ) ( ) KKll

Kll

αmηαmη

αmηαmη

e12

m1

mRe1

2

m1

mR

Qe12

m1

mRe1

2

m1

mR

++minusminusminus

++minus

minus+minus

++

minus+

=+minusminus

minus=

=minusminus

minus=minus+

=

pσ = (1+m σ ) 2 (mgt0)

[Seo Obermeyer] LVQ21 ս cost function

(likelihood ratios)

infinrarrinfinrarrminus+minusminus+minusminusminus

++minus+++

αQQRR

Q R R

with

finite remain

Q ++ R ++ R minus+

R +minus Q minus+

Q minusminus R minusminus

α 102 4 86

6-

0

6theory and simulation (N=100)p+=08 ℓ=1 η=05averages over 100 independent runs

The Dynamics of Learning Vector Quantization RUG 10012005

(p- )

(p+gt p-)

sssstrategiestrategiestrategiestrategies

- selection of dataselection of dataselection of dataselection of data in a window close to the current decision boundary

slows down the repulsion system remains instable

- Soft Robust Learning Vector QuantizationSoft Robust Learning Vector QuantizationSoft Robust Learning Vector QuantizationSoft Robust Learning Vector Quantization [Seo amp Obermayer]

density-estimation based cost function

limiting case Learning from mistakes Learning from mistakes Learning from mistakes Learning from mistakes LVQ21-step only

if the example is currently misclassified

slow learning poor generalization

problem instability of the algorithm

due to repulsion of wrong prototypes

trivial classification fuumlr αrarrinfin

εg = max p+p-

The Dynamics of Learning Vector Quantization RUG 10012005

ldquo The winner takes it all rdquoldquo The winner takes it all rdquoldquo The winner takes it all rdquoldquo The winner takes it all rdquo

numericalintegrationfor wwwws(0)=0

theory and simulation (N=200)p+=02 ℓ=12 η=12averaged over 100 indep runs

Q++

Q--

Q+-

α

wwww++++

wwww----

ℓℓℓℓ BBBB++++

ℓℓℓℓ BBBB----

trajectories in the (B+B- )-plane

(bull) α=2040140

optimal decision boundary____ asymptotic position

RS+

RS-

R--

R-+

R--

R++

winner wwwws plusmn1

I) LVQ 1LVQ 1LVQ 1LVQ 1 [Kohonen] [ ] ( ) 1-micros

micromicromicroS

microS

1-micros

micros Sσdd

N

ηwwwwξξξξwwwwwwww minusminusΘ+= minus

only the winner is updated according to the class membership

wwww-

The Dynamics of Learning Vector Quantization RUG 10012005

learning curvelearning curvelearning curvelearning curve

α

εg η=12

(p+=02 ℓ=12)

εg (αrarrinfin) grows lin with η

- stationary state

- role of the learning rate

α100 200 300

εg

026

022

018

0140

η

20

04

02

ηrarr0 - variable rate η(α)

- wellwellwellwell----defined asymptoticsdefined asymptoticsdefined asymptoticsdefined asymptotics

(ODE linear in η)

10

εg

20 30 40 500014

026

022

018

min εg

(η α)

ηrarr0

η rarr0 αrarrinfin

( η α ) rarr infin

suboptimal

The Dynamics of Learning Vector Quantization RUG 10012005

ldquo The winner takes it all ldquoldquo The winner takes it all ldquoldquo The winner takes it all ldquoldquo The winner takes it all ldquo

II ) LVQ+LVQ+LVQ+LVQ+ ( only positive steps without repulsion)

[ ] ( ) ( ) 1-micros

microS

microσ

microS

microS

1-micros

micros δdd

N

ηwwwwξξξξwwwwwwww minusminusΘ+= minus

winner correct

αrarrinfin asymptotic configuration

symmetric about ℓℓℓℓ (B(B(B(B+++++B+B+B+B----)2)2)2)2

wwww-

wwww+

ℓ ℓ ℓ ℓ BBBB+

ℓ ℓ ℓ ℓ BBBB-

p+=02 ℓ=12 η=12

classification scheme and the

achieved generalization error are

independent of the independent of the independent of the independent of the prior weights prior weights prior weights prior weights ppppplusmnplusmnplusmnplusmn

(and optimal for ppppplusmnplusmnplusmnplusmn = 12 )

LVQ+ asymp VQ within the classes

(ws updated only from class S)

The Dynamics of Learning Vector Quantization RUG 10012005

- LVQ 21

trivial assignment to the

more frequent class

optimal classification

εg

pppp++++

min p+p-

- LVQ 1

here close to optimal

classification

pppp++++

- LVQ+

min-max solution

pplusmn -independent classification

p+=02 ℓ=10 η=10εg

α

learning curveslearning curveslearning curveslearning curves

LVQ+

LVQ1

asymptotics ηrarr0 (ηα)rarrinfin

The Dynamics of Learning Vector Quantization RUG 10012005

Vector QuantizationVector QuantizationVector QuantizationVector Quantization

competitive learning [ ] ( ) 1-micros

micromicroS

microS

1-micros

micros dd

N

ηwwwwξξξξwwwwwwww minusminusΘ+= minus

wwwws winner

class membership is unknown

or identical for all data

numerical integration for wwwws(0)asymp0

( p+=02 ℓ=10 η=12 )

εg

α

VQ

LVQ+

LVQ1

αα

R++

R+-

R-+

R--

100 200 3000

0

10

system is invariant under

exchange of the prototypes

rarr weakly repulsive fixed points

The Dynamics of Learning Vector Quantization RUG 10012005

interpretations

- VQ unsupervised learningunlabelled data

- LVQ two prototypes of thesame class identical labels

- LVQ different classes butlabels are not used in training

εg

pppp++++

asymptotics (αrarrηrarr0 ηαrarrinfin)

pppp++++asymp0 asymp0 asymp0 asymp0

pppp----asymp1 asymp1 asymp1 asymp1

- low quantization error

- high gen error εg

The Dynamics of Learning Vector Quantization RUG 10012005

work in progress outlookwork in progress outlookwork in progress outlookwork in progress outlook

bull regularization of LVQ 21 Robust Soft LVQ [Seo Obermayer]

bull model different cluster variances more clustersprototypes

bull optimized procedures learning rate schedules

variational approach density estimation Bayes optimal on-line

bull several classes and prototypes

Summary

bullprototypeprototypeprototypeprototype----based learningbased learningbased learningbased learning

Vector Quantization and Learning Vector Quantization

bulla model scenarioa model scenarioa model scenarioa model scenario two clusters two prototypes

dynamics of online training

bullcomparison of algorithmscomparison of algorithmscomparison of algorithmscomparison of algorithms

LVQ 21 instability trivial (stationary) classification

LVQ 1 close to optimal asymptotic generalization

LVQ + min-max solution wrt asymptotic generalization

VQ symmetry breaking representation

The Dynamics of Learning Vector Quantization RUG 10012005

Perspectives

bullSelfSelfSelfSelf----Organizing Maps (SOM)Organizing Maps (SOM)Organizing Maps (SOM)Organizing Maps (SOM)

(many) N-dim prototypes form a (low) d-dimensional grid

representation of data in a topology preserving map

neighborhood preserving SOM Neural Gas (distance based)

bullGeneralized Relevance LVQ Generalized Relevance LVQ Generalized Relevance LVQ Generalized Relevance LVQ [Hammer amp Villmann]

adaptive metrics eg distance measure ( )sum=

minus=N

i

iii w

1

2)( sλ ξξwd λ

training

bullapplications applications applications applications

Page 17: The Dynamics of Learning Vector Quantizationlibvolume3.xyz/computers/btech/semester6/datacompression/vectorquantization...The Dynamics of Learning Vector Quantization, RUG, 10.01.2005

The Dynamics of Learning Vector Quantization RUG 10012005

N

1jjσjsσ

N

1jjsσs R x ll === sumsum

==

Bww jξ

completely specified in terms of first and second moments (wo indices micro)

in the thermodynamic limit thermodynamic limit thermodynamic limit thermodynamic limit N rarrrarrrarrrarr infininfininfininfin

random vector acc to σ)|P( micro rarrξξξξmicromicro

micro1-micros

micros

By

wx

ξξξξ

ξξξξ

sdot=

sdot=

ττ

correlated Gaussianrandom quantities

stσtσsσt s Q xx- xx = τττ sσσsσ s R yx- yx =

ρττρτρ δ yy- yyσσσ

===

=

else

σ ifsσσ

y0

Sl

l δτ

2 average over the current example2 average over the current example2 average over the current example2 average over the current example

rarrrarrrarrrarr averaged recursionsaveraged recursionsaveraged recursionsaveraged recursions closed in Rsσ Qst pσ

1σσ LL sum

plusmn=

=

The Dynamics of Learning Vector Quantization RUG 10012005

characteristic quantities

- depend on the random sequence of example data

- their variance vanishes with N rarrrarrrarrrarr infininfininfininfin (here prop N-1)

microsσ

microst

R Q

learning dynamics is completely described in terms of averagesaveragesaveragesaverages

3 self3 self3 self3 self----averaging propertiesaveraging propertiesaveraging propertiesaveraging properties

4 continuous learning time4 continuous learning time4 continuous learning time4 continuous learning time

N

micro α =

of examples

of learning stepsper degree of freedom

) α (R ) α (Q sσst

recursions rarr coupled ordinary differential equations

rarr evolution of projections

The Dynamics of Learning Vector Quantization RUG 10012005

probability for misclassification of a novel example

( ) ( ) minusminusΘ++minusΘ= +minusminusminus++ ddpddp gεεεε

( ) ( )

Φminus+

Φ=

minusminus+

minus+minus

++

minus+minus

minusminusminus

++minus

minusminus

minusminus+

minus+minus

++

+minusminus

++minus

minusminusminus

+++

QQQ

RR2QQ

QQQ

RR2QQpp

22 2

1

2

1 ll

L

5 learning curve5 learning curve5 learning curve5 learning curve

generalization error generalization error generalization error generalization error εεεεgggg((((αααα)))) after training with α N examples

- repulsiveattractive fixed points of the dynamics

- asymptotic behavior for αrarrinfin- dependence on learning rate separation initialization

-

investigation and comparison of given algorithmsinvestigation and comparison of given algorithmsinvestigation and comparison of given algorithmsinvestigation and comparison of given algorithms

- time-dependent learning rate η(α)

- variational optimization wrt fs[]

-

optimization and development of new prescriptionsoptimization and development of new prescriptionsoptimization and development of new prescriptionsoptimization and development of new prescriptions

maximizeα

g

d

d εεεε

The Dynamics of Learning Vector Quantization RUG 10012005

optimal classificationoptimal classificationoptimal classificationoptimal classification with minimal generalization error

BBBB-

BBBB+

(p-gtp+ )

(p+)

separation of classes by the plane with 1)σP(p 1)σP(p +==minus= +minus ξξξξξξξξin the model situation (equal variances of clusters)

excess error

minimal εg as a function

of prior weights ℓ=2

εg

025

050

005 100 p+

ℓ=1

ℓ=0

The Dynamics of Learning Vector Quantization RUG 10012005

ldquoLVQ 21ldquo update the correct and wrong winner

( ) 1-micros

micro1-micros

micros Sσ

N

ηwwwwξξξξwwwwwwww minus+=

(analytical)integrationfor wwwws(0) = 0

( ) ( )

( ) ( ) KKll

Kll

αmηαmη

αmηαmη

e12

m1

mRe1

2

m1

mR

Qe12

m1

mRe1

2

m1

mR

++minusminusminus

++minus

minus+minus

++

minus+

=+minusminus

minus=

=minusminus

minus=minus+

=

pσ = (1+m σ ) 2 (mgt0)

[Seo Obermeyer] LVQ21 ս cost function

(likelihood ratios)

infinrarrinfinrarrminus+minusminus+minusminusminus

++minus+++

αQQRR

Q R R

with

finite remain

Q ++ R ++ R minus+

R +minus Q minus+

Q minusminus R minusminus

α 102 4 86

6-

0

6theory and simulation (N=100)p+=08 ℓ=1 η=05averages over 100 independent runs

The Dynamics of Learning Vector Quantization RUG 10012005

(p- )

(p+gt p-)

sssstrategiestrategiestrategiestrategies

- selection of dataselection of dataselection of dataselection of data in a window close to the current decision boundary

slows down the repulsion system remains instable

- Soft Robust Learning Vector QuantizationSoft Robust Learning Vector QuantizationSoft Robust Learning Vector QuantizationSoft Robust Learning Vector Quantization [Seo amp Obermayer]

density-estimation based cost function

limiting case Learning from mistakes Learning from mistakes Learning from mistakes Learning from mistakes LVQ21-step only

if the example is currently misclassified

slow learning poor generalization

problem instability of the algorithm

due to repulsion of wrong prototypes

trivial classification fuumlr αrarrinfin

εg = max p+p-

The Dynamics of Learning Vector Quantization RUG 10012005

ldquo The winner takes it all rdquoldquo The winner takes it all rdquoldquo The winner takes it all rdquoldquo The winner takes it all rdquo

numericalintegrationfor wwwws(0)=0

theory and simulation (N=200)p+=02 ℓ=12 η=12averaged over 100 indep runs

Q++

Q--

Q+-

α

wwww++++

wwww----

ℓℓℓℓ BBBB++++

ℓℓℓℓ BBBB----

trajectories in the (B+B- )-plane

(bull) α=2040140

optimal decision boundary____ asymptotic position

RS+

RS-

R--

R-+

R--

R++

winner wwwws plusmn1

I) LVQ 1LVQ 1LVQ 1LVQ 1 [Kohonen] [ ] ( ) 1-micros

micromicromicroS

microS

1-micros

micros Sσdd

N

ηwwwwξξξξwwwwwwww minusminusΘ+= minus

only the winner is updated according to the class membership

wwww-

The Dynamics of Learning Vector Quantization RUG 10012005

learning curvelearning curvelearning curvelearning curve

α

εg η=12

(p+=02 ℓ=12)

εg (αrarrinfin) grows lin with η

- stationary state

- role of the learning rate

α100 200 300

εg

026

022

018

0140

η

20

04

02

ηrarr0 - variable rate η(α)

- wellwellwellwell----defined asymptoticsdefined asymptoticsdefined asymptoticsdefined asymptotics

(ODE linear in η)

10

εg

20 30 40 500014

026

022

018

min εg

(η α)

ηrarr0

η rarr0 αrarrinfin

( η α ) rarr infin

suboptimal

The Dynamics of Learning Vector Quantization RUG 10012005

ldquo The winner takes it all ldquoldquo The winner takes it all ldquoldquo The winner takes it all ldquoldquo The winner takes it all ldquo

II ) LVQ+LVQ+LVQ+LVQ+ ( only positive steps without repulsion)

[ ] ( ) ( ) 1-micros

microS

microσ

microS

microS

1-micros

micros δdd

N

ηwwwwξξξξwwwwwwww minusminusΘ+= minus

winner correct

αrarrinfin asymptotic configuration

symmetric about ℓℓℓℓ (B(B(B(B+++++B+B+B+B----)2)2)2)2

wwww-

wwww+

ℓ ℓ ℓ ℓ BBBB+

ℓ ℓ ℓ ℓ BBBB-

p+=02 ℓ=12 η=12

classification scheme and the

achieved generalization error are

independent of the independent of the independent of the independent of the prior weights prior weights prior weights prior weights ppppplusmnplusmnplusmnplusmn

(and optimal for ppppplusmnplusmnplusmnplusmn = 12 )

LVQ+ asymp VQ within the classes

(ws updated only from class S)

The Dynamics of Learning Vector Quantization RUG 10012005

- LVQ 21

trivial assignment to the

more frequent class

optimal classification

εg

pppp++++

min p+p-

- LVQ 1

here close to optimal

classification

pppp++++

- LVQ+

min-max solution

pplusmn -independent classification

p+=02 ℓ=10 η=10εg

α

learning curveslearning curveslearning curveslearning curves

LVQ+

LVQ1

asymptotics ηrarr0 (ηα)rarrinfin

The Dynamics of Learning Vector Quantization RUG 10012005

Vector QuantizationVector QuantizationVector QuantizationVector Quantization

competitive learning [ ] ( ) 1-micros

micromicroS

microS

1-micros

micros dd

N

ηwwwwξξξξwwwwwwww minusminusΘ+= minus

wwwws winner

class membership is unknown

or identical for all data

numerical integration for wwwws(0)asymp0

( p+=02 ℓ=10 η=12 )

εg

α

VQ

LVQ+

LVQ1

αα

R++

R+-

R-+

R--

100 200 3000

0

10

system is invariant under

exchange of the prototypes

rarr weakly repulsive fixed points

The Dynamics of Learning Vector Quantization RUG 10012005

interpretations

- VQ unsupervised learningunlabelled data

- LVQ two prototypes of thesame class identical labels

- LVQ different classes butlabels are not used in training

εg

pppp++++

asymptotics (αrarrηrarr0 ηαrarrinfin)

pppp++++asymp0 asymp0 asymp0 asymp0

pppp----asymp1 asymp1 asymp1 asymp1

- low quantization error

- high gen error εg

The Dynamics of Learning Vector Quantization RUG 10012005

work in progress outlookwork in progress outlookwork in progress outlookwork in progress outlook

bull regularization of LVQ 21 Robust Soft LVQ [Seo Obermayer]

bull model different cluster variances more clustersprototypes

bull optimized procedures learning rate schedules

variational approach density estimation Bayes optimal on-line

bull several classes and prototypes

Summary

bullprototypeprototypeprototypeprototype----based learningbased learningbased learningbased learning

Vector Quantization and Learning Vector Quantization

bulla model scenarioa model scenarioa model scenarioa model scenario two clusters two prototypes

dynamics of online training

bullcomparison of algorithmscomparison of algorithmscomparison of algorithmscomparison of algorithms

LVQ 21 instability trivial (stationary) classification

LVQ 1 close to optimal asymptotic generalization

LVQ + min-max solution wrt asymptotic generalization

VQ symmetry breaking representation

The Dynamics of Learning Vector Quantization RUG 10012005

Perspectives

bullSelfSelfSelfSelf----Organizing Maps (SOM)Organizing Maps (SOM)Organizing Maps (SOM)Organizing Maps (SOM)

(many) N-dim prototypes form a (low) d-dimensional grid

representation of data in a topology preserving map

neighborhood preserving SOM Neural Gas (distance based)

bullGeneralized Relevance LVQ Generalized Relevance LVQ Generalized Relevance LVQ Generalized Relevance LVQ [Hammer amp Villmann]

adaptive metrics eg distance measure ( )sum=

minus=N

i

iii w

1

2)( sλ ξξwd λ

training

bullapplications applications applications applications

Page 18: The Dynamics of Learning Vector Quantizationlibvolume3.xyz/computers/btech/semester6/datacompression/vectorquantization...The Dynamics of Learning Vector Quantization, RUG, 10.01.2005

The Dynamics of Learning Vector Quantization RUG 10012005

characteristic quantities

- depend on the random sequence of example data

- their variance vanishes with N rarrrarrrarrrarr infininfininfininfin (here prop N-1)

microsσ

microst

R Q

learning dynamics is completely described in terms of averagesaveragesaveragesaverages

3 self3 self3 self3 self----averaging propertiesaveraging propertiesaveraging propertiesaveraging properties

4 continuous learning time4 continuous learning time4 continuous learning time4 continuous learning time

N

micro α =

of examples

of learning stepsper degree of freedom

) α (R ) α (Q sσst

recursions rarr coupled ordinary differential equations

rarr evolution of projections

The Dynamics of Learning Vector Quantization RUG 10012005

probability for misclassification of a novel example

( ) ( ) minusminusΘ++minusΘ= +minusminusminus++ ddpddp gεεεε

( ) ( )

Φminus+

Φ=

minusminus+

minus+minus

++

minus+minus

minusminusminus

++minus

minusminus

minusminus+

minus+minus

++

+minusminus

++minus

minusminusminus

+++

QQQ

RR2QQ

QQQ

RR2QQpp

22 2

1

2

1 ll

L

5 learning curve5 learning curve5 learning curve5 learning curve

generalization error generalization error generalization error generalization error εεεεgggg((((αααα)))) after training with α N examples

- repulsiveattractive fixed points of the dynamics

- asymptotic behavior for αrarrinfin- dependence on learning rate separation initialization

-

investigation and comparison of given algorithmsinvestigation and comparison of given algorithmsinvestigation and comparison of given algorithmsinvestigation and comparison of given algorithms

- time-dependent learning rate η(α)

- variational optimization wrt fs[]

-

optimization and development of new prescriptionsoptimization and development of new prescriptionsoptimization and development of new prescriptionsoptimization and development of new prescriptions

maximizeα

g

d

d εεεε

The Dynamics of Learning Vector Quantization RUG 10012005

optimal classificationoptimal classificationoptimal classificationoptimal classification with minimal generalization error

BBBB-

BBBB+

(p-gtp+ )

(p+)

separation of classes by the plane with 1)σP(p 1)σP(p +==minus= +minus ξξξξξξξξin the model situation (equal variances of clusters)

excess error

minimal εg as a function

of prior weights ℓ=2

εg

025

050

005 100 p+

ℓ=1

ℓ=0

The Dynamics of Learning Vector Quantization RUG 10012005

ldquoLVQ 21ldquo update the correct and wrong winner

( ) 1-micros

micro1-micros

micros Sσ

N

ηwwwwξξξξwwwwwwww minus+=

(analytical)integrationfor wwwws(0) = 0

( ) ( )

( ) ( ) KKll

Kll

αmηαmη

αmηαmη

e12

m1

mRe1

2

m1

mR

Qe12

m1

mRe1

2

m1

mR

++minusminusminus

++minus

minus+minus

++

minus+

=+minusminus

minus=

=minusminus

minus=minus+

=

pσ = (1+m σ ) 2 (mgt0)

[Seo Obermeyer] LVQ21 ս cost function

(likelihood ratios)

infinrarrinfinrarrminus+minusminus+minusminusminus

++minus+++

αQQRR

Q R R

with

finite remain

Q ++ R ++ R minus+

R +minus Q minus+

Q minusminus R minusminus

α 102 4 86

6-

0

6theory and simulation (N=100)p+=08 ℓ=1 η=05averages over 100 independent runs

The Dynamics of Learning Vector Quantization RUG 10012005

(p- )

(p+gt p-)

sssstrategiestrategiestrategiestrategies

- selection of dataselection of dataselection of dataselection of data in a window close to the current decision boundary

slows down the repulsion system remains instable

- Soft Robust Learning Vector QuantizationSoft Robust Learning Vector QuantizationSoft Robust Learning Vector QuantizationSoft Robust Learning Vector Quantization [Seo amp Obermayer]

density-estimation based cost function

limiting case Learning from mistakes Learning from mistakes Learning from mistakes Learning from mistakes LVQ21-step only

if the example is currently misclassified

slow learning poor generalization

problem instability of the algorithm

due to repulsion of wrong prototypes

trivial classification fuumlr αrarrinfin

εg = max p+p-

The Dynamics of Learning Vector Quantization RUG 10012005

ldquo The winner takes it all rdquoldquo The winner takes it all rdquoldquo The winner takes it all rdquoldquo The winner takes it all rdquo

numericalintegrationfor wwwws(0)=0

theory and simulation (N=200)p+=02 ℓ=12 η=12averaged over 100 indep runs

Q++

Q--

Q+-

α

wwww++++

wwww----

ℓℓℓℓ BBBB++++

ℓℓℓℓ BBBB----

trajectories in the (B+B- )-plane

(bull) α=2040140

optimal decision boundary____ asymptotic position

RS+

RS-

R--

R-+

R--

R++

winner wwwws plusmn1

I) LVQ 1LVQ 1LVQ 1LVQ 1 [Kohonen] [ ] ( ) 1-micros

micromicromicroS

microS

1-micros

micros Sσdd

N

ηwwwwξξξξwwwwwwww minusminusΘ+= minus

only the winner is updated according to the class membership

wwww-

The Dynamics of Learning Vector Quantization RUG 10012005

learning curvelearning curvelearning curvelearning curve

α

εg η=12

(p+=02 ℓ=12)

εg (αrarrinfin) grows lin with η

- stationary state

- role of the learning rate

α100 200 300

εg

026

022

018

0140

η

20

04

02

ηrarr0 - variable rate η(α)

- wellwellwellwell----defined asymptoticsdefined asymptoticsdefined asymptoticsdefined asymptotics

(ODE linear in η)

10

εg

20 30 40 500014

026

022

018

min εg

(η α)

ηrarr0

η rarr0 αrarrinfin

( η α ) rarr infin

suboptimal

The Dynamics of Learning Vector Quantization RUG 10012005

ldquo The winner takes it all ldquoldquo The winner takes it all ldquoldquo The winner takes it all ldquoldquo The winner takes it all ldquo

II ) LVQ+LVQ+LVQ+LVQ+ ( only positive steps without repulsion)

[ ] ( ) ( ) 1-micros

microS

microσ

microS

microS

1-micros

micros δdd

N

ηwwwwξξξξwwwwwwww minusminusΘ+= minus

winner correct

αrarrinfin asymptotic configuration

symmetric about ℓℓℓℓ (B(B(B(B+++++B+B+B+B----)2)2)2)2

wwww-

wwww+

ℓ ℓ ℓ ℓ BBBB+

ℓ ℓ ℓ ℓ BBBB-

p+=02 ℓ=12 η=12

classification scheme and the

achieved generalization error are

independent of the independent of the independent of the independent of the prior weights prior weights prior weights prior weights ppppplusmnplusmnplusmnplusmn

(and optimal for ppppplusmnplusmnplusmnplusmn = 12 )

LVQ+ asymp VQ within the classes

(ws updated only from class S)

The Dynamics of Learning Vector Quantization RUG 10012005

- LVQ 21

trivial assignment to the

more frequent class

optimal classification

εg

pppp++++

min p+p-

- LVQ 1

here close to optimal

classification

pppp++++

- LVQ+

min-max solution

pplusmn -independent classification

p+=02 ℓ=10 η=10εg

α

learning curveslearning curveslearning curveslearning curves

LVQ+

LVQ1

asymptotics ηrarr0 (ηα)rarrinfin

The Dynamics of Learning Vector Quantization RUG 10012005

Vector QuantizationVector QuantizationVector QuantizationVector Quantization

competitive learning [ ] ( ) 1-micros

micromicroS

microS

1-micros

micros dd

N

ηwwwwξξξξwwwwwwww minusminusΘ+= minus

wwwws winner

class membership is unknown

or identical for all data

numerical integration for wwwws(0)asymp0

( p+=02 ℓ=10 η=12 )

εg

α

VQ

LVQ+

LVQ1

αα

R++

R+-

R-+

R--

100 200 3000

0

10

system is invariant under

exchange of the prototypes

rarr weakly repulsive fixed points

The Dynamics of Learning Vector Quantization RUG 10012005

interpretations

- VQ unsupervised learningunlabelled data

- LVQ two prototypes of thesame class identical labels

- LVQ different classes butlabels are not used in training

εg

pppp++++

asymptotics (αrarrηrarr0 ηαrarrinfin)

pppp++++asymp0 asymp0 asymp0 asymp0

pppp----asymp1 asymp1 asymp1 asymp1

- low quantization error

- high gen error εg

The Dynamics of Learning Vector Quantization RUG 10012005

work in progress outlookwork in progress outlookwork in progress outlookwork in progress outlook

bull regularization of LVQ 21 Robust Soft LVQ [Seo Obermayer]

bull model different cluster variances more clustersprototypes

bull optimized procedures learning rate schedules

variational approach density estimation Bayes optimal on-line

bull several classes and prototypes

Summary

bullprototypeprototypeprototypeprototype----based learningbased learningbased learningbased learning

Vector Quantization and Learning Vector Quantization

bulla model scenarioa model scenarioa model scenarioa model scenario two clusters two prototypes

dynamics of online training

bullcomparison of algorithmscomparison of algorithmscomparison of algorithmscomparison of algorithms

LVQ 21 instability trivial (stationary) classification

LVQ 1 close to optimal asymptotic generalization

LVQ + min-max solution wrt asymptotic generalization

VQ symmetry breaking representation

The Dynamics of Learning Vector Quantization RUG 10012005

Perspectives

bullSelfSelfSelfSelf----Organizing Maps (SOM)Organizing Maps (SOM)Organizing Maps (SOM)Organizing Maps (SOM)

(many) N-dim prototypes form a (low) d-dimensional grid

representation of data in a topology preserving map

neighborhood preserving SOM Neural Gas (distance based)

bullGeneralized Relevance LVQ Generalized Relevance LVQ Generalized Relevance LVQ Generalized Relevance LVQ [Hammer amp Villmann]

adaptive metrics eg distance measure ( )sum=

minus=N

i

iii w

1

2)( sλ ξξwd λ

training

bullapplications applications applications applications

Page 19: The Dynamics of Learning Vector Quantizationlibvolume3.xyz/computers/btech/semester6/datacompression/vectorquantization...The Dynamics of Learning Vector Quantization, RUG, 10.01.2005

The Dynamics of Learning Vector Quantization RUG 10012005

probability for misclassification of a novel example

( ) ( ) minusminusΘ++minusΘ= +minusminusminus++ ddpddp gεεεε

( ) ( )

Φminus+

Φ=

minusminus+

minus+minus

++

minus+minus

minusminusminus

++minus

minusminus

minusminus+

minus+minus

++

+minusminus

++minus

minusminusminus

+++

QQQ

RR2QQ

QQQ

RR2QQpp

22 2

1

2

1 ll

L

5 learning curve5 learning curve5 learning curve5 learning curve

generalization error generalization error generalization error generalization error εεεεgggg((((αααα)))) after training with α N examples

- repulsiveattractive fixed points of the dynamics

- asymptotic behavior for αrarrinfin- dependence on learning rate separation initialization

-

investigation and comparison of given algorithmsinvestigation and comparison of given algorithmsinvestigation and comparison of given algorithmsinvestigation and comparison of given algorithms

- time-dependent learning rate η(α)

- variational optimization wrt fs[]

-

optimization and development of new prescriptionsoptimization and development of new prescriptionsoptimization and development of new prescriptionsoptimization and development of new prescriptions

maximizeα

g

d

d εεεε

The Dynamics of Learning Vector Quantization RUG 10012005

optimal classificationoptimal classificationoptimal classificationoptimal classification with minimal generalization error

BBBB-

BBBB+

(p-gtp+ )

(p+)

separation of classes by the plane with 1)σP(p 1)σP(p +==minus= +minus ξξξξξξξξin the model situation (equal variances of clusters)

excess error

minimal εg as a function

of prior weights ℓ=2

εg

025

050

005 100 p+

ℓ=1

ℓ=0

The Dynamics of Learning Vector Quantization RUG 10012005

ldquoLVQ 21ldquo update the correct and wrong winner

( ) 1-micros

micro1-micros

micros Sσ

N

ηwwwwξξξξwwwwwwww minus+=

(analytical)integrationfor wwwws(0) = 0

( ) ( )

( ) ( ) KKll

Kll

αmηαmη

αmηαmη

e12

m1

mRe1

2

m1

mR

Qe12

m1

mRe1

2

m1

mR

++minusminusminus

++minus

minus+minus

++

minus+

=+minusminus

minus=

=minusminus

minus=minus+

=

pσ = (1+m σ ) 2 (mgt0)

[Seo Obermeyer] LVQ21 ս cost function

(likelihood ratios)

infinrarrinfinrarrminus+minusminus+minusminusminus

++minus+++

αQQRR

Q R R

with

finite remain

Q ++ R ++ R minus+

R +minus Q minus+

Q minusminus R minusminus

α 102 4 86

6-

0

6theory and simulation (N=100)p+=08 ℓ=1 η=05averages over 100 independent runs

The Dynamics of Learning Vector Quantization RUG 10012005

(p- )

(p+gt p-)

sssstrategiestrategiestrategiestrategies

- selection of dataselection of dataselection of dataselection of data in a window close to the current decision boundary

slows down the repulsion system remains instable

- Soft Robust Learning Vector QuantizationSoft Robust Learning Vector QuantizationSoft Robust Learning Vector QuantizationSoft Robust Learning Vector Quantization [Seo amp Obermayer]

density-estimation based cost function

limiting case Learning from mistakes Learning from mistakes Learning from mistakes Learning from mistakes LVQ21-step only

if the example is currently misclassified

slow learning poor generalization

problem instability of the algorithm

due to repulsion of wrong prototypes

trivial classification fuumlr αrarrinfin

εg = max p+p-

The Dynamics of Learning Vector Quantization RUG 10012005

ldquo The winner takes it all rdquoldquo The winner takes it all rdquoldquo The winner takes it all rdquoldquo The winner takes it all rdquo

numericalintegrationfor wwwws(0)=0

theory and simulation (N=200)p+=02 ℓ=12 η=12averaged over 100 indep runs

Q++

Q--

Q+-

α

wwww++++

wwww----

ℓℓℓℓ BBBB++++

ℓℓℓℓ BBBB----

trajectories in the (B+B- )-plane

(bull) α=2040140

optimal decision boundary____ asymptotic position

RS+

RS-

R--

R-+

R--

R++

winner wwwws plusmn1

I) LVQ 1LVQ 1LVQ 1LVQ 1 [Kohonen] [ ] ( ) 1-micros

micromicromicroS

microS

1-micros

micros Sσdd

N

ηwwwwξξξξwwwwwwww minusminusΘ+= minus

only the winner is updated according to the class membership

wwww-

The Dynamics of Learning Vector Quantization RUG 10012005

learning curvelearning curvelearning curvelearning curve

α

εg η=12

(p+=02 ℓ=12)

εg (αrarrinfin) grows lin with η

- stationary state

- role of the learning rate

α100 200 300

εg

026

022

018

0140

η

20

04

02

ηrarr0 - variable rate η(α)

- wellwellwellwell----defined asymptoticsdefined asymptoticsdefined asymptoticsdefined asymptotics

(ODE linear in η)

10

εg

20 30 40 500014

026

022

018

min εg

(η α)

ηrarr0

η rarr0 αrarrinfin

( η α ) rarr infin

suboptimal

The Dynamics of Learning Vector Quantization RUG 10012005

ldquo The winner takes it all ldquoldquo The winner takes it all ldquoldquo The winner takes it all ldquoldquo The winner takes it all ldquo

II ) LVQ+LVQ+LVQ+LVQ+ ( only positive steps without repulsion)

[ ] ( ) ( ) 1-micros

microS

microσ

microS

microS

1-micros

micros δdd

N

ηwwwwξξξξwwwwwwww minusminusΘ+= minus

winner correct

αrarrinfin asymptotic configuration

symmetric about ℓℓℓℓ (B(B(B(B+++++B+B+B+B----)2)2)2)2

wwww-

wwww+

ℓ ℓ ℓ ℓ BBBB+

ℓ ℓ ℓ ℓ BBBB-

p+=02 ℓ=12 η=12

classification scheme and the

achieved generalization error are

independent of the independent of the independent of the independent of the prior weights prior weights prior weights prior weights ppppplusmnplusmnplusmnplusmn

(and optimal for ppppplusmnplusmnplusmnplusmn = 12 )

LVQ+ asymp VQ within the classes

(ws updated only from class S)

The Dynamics of Learning Vector Quantization RUG 10012005

- LVQ 21

trivial assignment to the

more frequent class

optimal classification

εg

pppp++++

min p+p-

- LVQ 1

here close to optimal

classification

pppp++++

- LVQ+

min-max solution

pplusmn -independent classification

p+=02 ℓ=10 η=10εg

α

learning curveslearning curveslearning curveslearning curves

LVQ+

LVQ1

asymptotics ηrarr0 (ηα)rarrinfin

The Dynamics of Learning Vector Quantization RUG 10012005

Vector QuantizationVector QuantizationVector QuantizationVector Quantization

competitive learning [ ] ( ) 1-micros

micromicroS

microS

1-micros

micros dd

N

ηwwwwξξξξwwwwwwww minusminusΘ+= minus

wwwws winner

class membership is unknown

or identical for all data

numerical integration for wwwws(0)asymp0

( p+=02 ℓ=10 η=12 )

εg

α

VQ

LVQ+

LVQ1

αα

R++

R+-

R-+

R--

100 200 3000

0

10

system is invariant under

exchange of the prototypes

rarr weakly repulsive fixed points

The Dynamics of Learning Vector Quantization RUG 10012005

interpretations

- VQ unsupervised learningunlabelled data

- LVQ two prototypes of thesame class identical labels

- LVQ different classes butlabels are not used in training

εg

pppp++++

asymptotics (αrarrηrarr0 ηαrarrinfin)

pppp++++asymp0 asymp0 asymp0 asymp0

pppp----asymp1 asymp1 asymp1 asymp1

- low quantization error

- high gen error εg

The Dynamics of Learning Vector Quantization RUG 10012005

work in progress outlookwork in progress outlookwork in progress outlookwork in progress outlook

bull regularization of LVQ 21 Robust Soft LVQ [Seo Obermayer]

bull model different cluster variances more clustersprototypes

bull optimized procedures learning rate schedules

variational approach density estimation Bayes optimal on-line

bull several classes and prototypes

Summary

bullprototypeprototypeprototypeprototype----based learningbased learningbased learningbased learning

Vector Quantization and Learning Vector Quantization

bulla model scenarioa model scenarioa model scenarioa model scenario two clusters two prototypes

dynamics of online training

bullcomparison of algorithmscomparison of algorithmscomparison of algorithmscomparison of algorithms

LVQ 21 instability trivial (stationary) classification

LVQ 1 close to optimal asymptotic generalization

LVQ + min-max solution wrt asymptotic generalization

VQ symmetry breaking representation

The Dynamics of Learning Vector Quantization RUG 10012005

Perspectives

bullSelfSelfSelfSelf----Organizing Maps (SOM)Organizing Maps (SOM)Organizing Maps (SOM)Organizing Maps (SOM)

(many) N-dim prototypes form a (low) d-dimensional grid

representation of data in a topology preserving map

neighborhood preserving SOM Neural Gas (distance based)

bullGeneralized Relevance LVQ Generalized Relevance LVQ Generalized Relevance LVQ Generalized Relevance LVQ [Hammer amp Villmann]

adaptive metrics eg distance measure ( )sum=

minus=N

i

iii w

1

2)( sλ ξξwd λ

training

bullapplications applications applications applications

Page 20: The Dynamics of Learning Vector Quantizationlibvolume3.xyz/computers/btech/semester6/datacompression/vectorquantization...The Dynamics of Learning Vector Quantization, RUG, 10.01.2005

The Dynamics of Learning Vector Quantization RUG 10012005

optimal classificationoptimal classificationoptimal classificationoptimal classification with minimal generalization error

BBBB-

BBBB+

(p-gtp+ )

(p+)

separation of classes by the plane with 1)σP(p 1)σP(p +==minus= +minus ξξξξξξξξin the model situation (equal variances of clusters)

excess error

minimal εg as a function

of prior weights ℓ=2

εg

025

050

005 100 p+

ℓ=1

ℓ=0

The Dynamics of Learning Vector Quantization RUG 10012005

ldquoLVQ 21ldquo update the correct and wrong winner

( ) 1-micros

micro1-micros

micros Sσ

N

ηwwwwξξξξwwwwwwww minus+=

(analytical)integrationfor wwwws(0) = 0

( ) ( )

( ) ( ) KKll

Kll

αmηαmη

αmηαmη

e12

m1

mRe1

2

m1

mR

Qe12

m1

mRe1

2

m1

mR

++minusminusminus

++minus

minus+minus

++

minus+

=+minusminus

minus=

=minusminus

minus=minus+

=

pσ = (1+m σ ) 2 (mgt0)

[Seo Obermeyer] LVQ21 ս cost function

(likelihood ratios)

infinrarrinfinrarrminus+minusminus+minusminusminus

++minus+++

αQQRR

Q R R

with

finite remain

Q ++ R ++ R minus+

R +minus Q minus+

Q minusminus R minusminus

α 102 4 86

6-

0

6theory and simulation (N=100)p+=08 ℓ=1 η=05averages over 100 independent runs

The Dynamics of Learning Vector Quantization RUG 10012005

(p- )

(p+gt p-)

sssstrategiestrategiestrategiestrategies

- selection of dataselection of dataselection of dataselection of data in a window close to the current decision boundary

slows down the repulsion system remains instable

- Soft Robust Learning Vector QuantizationSoft Robust Learning Vector QuantizationSoft Robust Learning Vector QuantizationSoft Robust Learning Vector Quantization [Seo amp Obermayer]

density-estimation based cost function

limiting case Learning from mistakes Learning from mistakes Learning from mistakes Learning from mistakes LVQ21-step only

if the example is currently misclassified

slow learning poor generalization

problem instability of the algorithm

due to repulsion of wrong prototypes

trivial classification fuumlr αrarrinfin

εg = max p+p-

The Dynamics of Learning Vector Quantization RUG 10012005

ldquo The winner takes it all rdquoldquo The winner takes it all rdquoldquo The winner takes it all rdquoldquo The winner takes it all rdquo

numericalintegrationfor wwwws(0)=0

theory and simulation (N=200)p+=02 ℓ=12 η=12averaged over 100 indep runs

Q++

Q--

Q+-

α

wwww++++

wwww----

ℓℓℓℓ BBBB++++

ℓℓℓℓ BBBB----

trajectories in the (B+B- )-plane

(bull) α=2040140

optimal decision boundary____ asymptotic position

RS+

RS-

R--

R-+

R--

R++

winner wwwws plusmn1

I) LVQ 1LVQ 1LVQ 1LVQ 1 [Kohonen] [ ] ( ) 1-micros

micromicromicroS

microS

1-micros

micros Sσdd

N

ηwwwwξξξξwwwwwwww minusminusΘ+= minus

only the winner is updated according to the class membership

wwww-

The Dynamics of Learning Vector Quantization RUG 10012005

learning curvelearning curvelearning curvelearning curve

α

εg η=12

(p+=02 ℓ=12)

εg (αrarrinfin) grows lin with η

- stationary state

- role of the learning rate

α100 200 300

εg

026

022

018

0140

η

20

04

02

ηrarr0 - variable rate η(α)

- wellwellwellwell----defined asymptoticsdefined asymptoticsdefined asymptoticsdefined asymptotics

(ODE linear in η)

10

εg

20 30 40 500014

026

022

018

min εg

(η α)

ηrarr0

η rarr0 αrarrinfin

( η α ) rarr infin

suboptimal

The Dynamics of Learning Vector Quantization RUG 10012005

ldquo The winner takes it all ldquoldquo The winner takes it all ldquoldquo The winner takes it all ldquoldquo The winner takes it all ldquo

II ) LVQ+LVQ+LVQ+LVQ+ ( only positive steps without repulsion)

[ ] ( ) ( ) 1-micros

microS

microσ

microS

microS

1-micros

micros δdd

N

ηwwwwξξξξwwwwwwww minusminusΘ+= minus

winner correct

αrarrinfin asymptotic configuration

symmetric about ℓℓℓℓ (B(B(B(B+++++B+B+B+B----)2)2)2)2

wwww-

wwww+

ℓ ℓ ℓ ℓ BBBB+

ℓ ℓ ℓ ℓ BBBB-

p+=02 ℓ=12 η=12

classification scheme and the

achieved generalization error are

independent of the independent of the independent of the independent of the prior weights prior weights prior weights prior weights ppppplusmnplusmnplusmnplusmn

(and optimal for ppppplusmnplusmnplusmnplusmn = 12 )

LVQ+ asymp VQ within the classes

(ws updated only from class S)

The Dynamics of Learning Vector Quantization RUG 10012005

- LVQ 21

trivial assignment to the

more frequent class

optimal classification

εg

pppp++++

min p+p-

- LVQ 1

here close to optimal

classification

pppp++++

- LVQ+

min-max solution

pplusmn -independent classification

p+=02 ℓ=10 η=10εg

α

learning curveslearning curveslearning curveslearning curves

LVQ+

LVQ1

asymptotics ηrarr0 (ηα)rarrinfin

The Dynamics of Learning Vector Quantization RUG 10012005

Vector QuantizationVector QuantizationVector QuantizationVector Quantization

competitive learning [ ] ( ) 1-micros

micromicroS

microS

1-micros

micros dd

N

ηwwwwξξξξwwwwwwww minusminusΘ+= minus

wwwws winner

class membership is unknown

or identical for all data

numerical integration for wwwws(0)asymp0

( p+=02 ℓ=10 η=12 )

εg

α

VQ

LVQ+

LVQ1

αα

R++

R+-

R-+

R--

100 200 3000

0

10

system is invariant under

exchange of the prototypes

rarr weakly repulsive fixed points

The Dynamics of Learning Vector Quantization RUG 10012005

interpretations

- VQ unsupervised learningunlabelled data

- LVQ two prototypes of thesame class identical labels

- LVQ different classes butlabels are not used in training

εg

pppp++++

asymptotics (αrarrηrarr0 ηαrarrinfin)

pppp++++asymp0 asymp0 asymp0 asymp0

pppp----asymp1 asymp1 asymp1 asymp1

- low quantization error

- high gen error εg

The Dynamics of Learning Vector Quantization RUG 10012005

work in progress outlookwork in progress outlookwork in progress outlookwork in progress outlook

bull regularization of LVQ 21 Robust Soft LVQ [Seo Obermayer]

bull model different cluster variances more clustersprototypes

bull optimized procedures learning rate schedules

variational approach density estimation Bayes optimal on-line

bull several classes and prototypes

Summary

bullprototypeprototypeprototypeprototype----based learningbased learningbased learningbased learning

Vector Quantization and Learning Vector Quantization

bulla model scenarioa model scenarioa model scenarioa model scenario two clusters two prototypes

dynamics of online training

bullcomparison of algorithmscomparison of algorithmscomparison of algorithmscomparison of algorithms

LVQ 21 instability trivial (stationary) classification

LVQ 1 close to optimal asymptotic generalization

LVQ + min-max solution wrt asymptotic generalization

VQ symmetry breaking representation

The Dynamics of Learning Vector Quantization RUG 10012005

Perspectives

bullSelfSelfSelfSelf----Organizing Maps (SOM)Organizing Maps (SOM)Organizing Maps (SOM)Organizing Maps (SOM)

(many) N-dim prototypes form a (low) d-dimensional grid

representation of data in a topology preserving map

neighborhood preserving SOM Neural Gas (distance based)

bullGeneralized Relevance LVQ Generalized Relevance LVQ Generalized Relevance LVQ Generalized Relevance LVQ [Hammer amp Villmann]

adaptive metrics eg distance measure ( )sum=

minus=N

i

iii w

1

2)( sλ ξξwd λ

training

bullapplications applications applications applications

Page 21: The Dynamics of Learning Vector Quantizationlibvolume3.xyz/computers/btech/semester6/datacompression/vectorquantization...The Dynamics of Learning Vector Quantization, RUG, 10.01.2005

The Dynamics of Learning Vector Quantization RUG 10012005

ldquoLVQ 21ldquo update the correct and wrong winner

( ) 1-micros

micro1-micros

micros Sσ

N

ηwwwwξξξξwwwwwwww minus+=

(analytical)integrationfor wwwws(0) = 0

( ) ( )

( ) ( ) KKll

Kll

αmηαmη

αmηαmη

e12

m1

mRe1

2

m1

mR

Qe12

m1

mRe1

2

m1

mR

++minusminusminus

++minus

minus+minus

++

minus+

=+minusminus

minus=

=minusminus

minus=minus+

=

pσ = (1+m σ ) 2 (mgt0)

[Seo Obermeyer] LVQ21 ս cost function

(likelihood ratios)

infinrarrinfinrarrminus+minusminus+minusminusminus

++minus+++

αQQRR

Q R R

with

finite remain

Q ++ R ++ R minus+

R +minus Q minus+

Q minusminus R minusminus

α 102 4 86

6-

0

6theory and simulation (N=100)p+=08 ℓ=1 η=05averages over 100 independent runs

The Dynamics of Learning Vector Quantization RUG 10012005

(p- )

(p+gt p-)

sssstrategiestrategiestrategiestrategies

- selection of dataselection of dataselection of dataselection of data in a window close to the current decision boundary

slows down the repulsion system remains instable

- Soft Robust Learning Vector QuantizationSoft Robust Learning Vector QuantizationSoft Robust Learning Vector QuantizationSoft Robust Learning Vector Quantization [Seo amp Obermayer]

density-estimation based cost function

limiting case Learning from mistakes Learning from mistakes Learning from mistakes Learning from mistakes LVQ21-step only

if the example is currently misclassified

slow learning poor generalization

problem instability of the algorithm

due to repulsion of wrong prototypes

trivial classification fuumlr αrarrinfin

εg = max p+p-

The Dynamics of Learning Vector Quantization RUG 10012005

ldquo The winner takes it all rdquoldquo The winner takes it all rdquoldquo The winner takes it all rdquoldquo The winner takes it all rdquo

numericalintegrationfor wwwws(0)=0

theory and simulation (N=200)p+=02 ℓ=12 η=12averaged over 100 indep runs

Q++

Q--

Q+-

α

wwww++++

wwww----

ℓℓℓℓ BBBB++++

ℓℓℓℓ BBBB----

trajectories in the (B+B- )-plane

(bull) α=2040140

optimal decision boundary____ asymptotic position

RS+

RS-

R--

R-+

R--

R++

winner wwwws plusmn1

I) LVQ 1LVQ 1LVQ 1LVQ 1 [Kohonen] [ ] ( ) 1-micros

micromicromicroS

microS

1-micros

micros Sσdd

N

ηwwwwξξξξwwwwwwww minusminusΘ+= minus

only the winner is updated according to the class membership

wwww-

The Dynamics of Learning Vector Quantization RUG 10012005

learning curvelearning curvelearning curvelearning curve

α

εg η=12

(p+=02 ℓ=12)

εg (αrarrinfin) grows lin with η

- stationary state

- role of the learning rate

α100 200 300

εg

026

022

018

0140

η

20

04

02

ηrarr0 - variable rate η(α)

- wellwellwellwell----defined asymptoticsdefined asymptoticsdefined asymptoticsdefined asymptotics

(ODE linear in η)

10

εg

20 30 40 500014

026

022

018

min εg

(η α)

ηrarr0

η rarr0 αrarrinfin

( η α ) rarr infin

suboptimal

The Dynamics of Learning Vector Quantization RUG 10012005

ldquo The winner takes it all ldquoldquo The winner takes it all ldquoldquo The winner takes it all ldquoldquo The winner takes it all ldquo

II ) LVQ+LVQ+LVQ+LVQ+ ( only positive steps without repulsion)

[ ] ( ) ( ) 1-micros

microS

microσ

microS

microS

1-micros

micros δdd

N

ηwwwwξξξξwwwwwwww minusminusΘ+= minus

winner correct

αrarrinfin asymptotic configuration

symmetric about ℓℓℓℓ (B(B(B(B+++++B+B+B+B----)2)2)2)2

wwww-

wwww+

ℓ ℓ ℓ ℓ BBBB+

ℓ ℓ ℓ ℓ BBBB-

p+=02 ℓ=12 η=12

classification scheme and the

achieved generalization error are

independent of the independent of the independent of the independent of the prior weights prior weights prior weights prior weights ppppplusmnplusmnplusmnplusmn

(and optimal for ppppplusmnplusmnplusmnplusmn = 12 )

LVQ+ asymp VQ within the classes

(ws updated only from class S)

The Dynamics of Learning Vector Quantization RUG 10012005

- LVQ 21

trivial assignment to the

more frequent class

optimal classification

εg

pppp++++

min p+p-

- LVQ 1

here close to optimal

classification

pppp++++

- LVQ+

min-max solution

pplusmn -independent classification

p+=02 ℓ=10 η=10εg

α

learning curveslearning curveslearning curveslearning curves

LVQ+

LVQ1

asymptotics ηrarr0 (ηα)rarrinfin

The Dynamics of Learning Vector Quantization RUG 10012005

Vector QuantizationVector QuantizationVector QuantizationVector Quantization

competitive learning [ ] ( ) 1-micros

micromicroS

microS

1-micros

micros dd

N

ηwwwwξξξξwwwwwwww minusminusΘ+= minus

wwwws winner

class membership is unknown

or identical for all data

numerical integration for wwwws(0)asymp0

( p+=02 ℓ=10 η=12 )

εg

α

VQ

LVQ+

LVQ1

αα

R++

R+-

R-+

R--

100 200 3000

0

10

system is invariant under

exchange of the prototypes

rarr weakly repulsive fixed points

The Dynamics of Learning Vector Quantization RUG 10012005

interpretations

- VQ unsupervised learningunlabelled data

- LVQ two prototypes of thesame class identical labels

- LVQ different classes butlabels are not used in training

εg

pppp++++

asymptotics (αrarrηrarr0 ηαrarrinfin)

pppp++++asymp0 asymp0 asymp0 asymp0

pppp----asymp1 asymp1 asymp1 asymp1

- low quantization error

- high gen error εg

The Dynamics of Learning Vector Quantization RUG 10012005

work in progress outlookwork in progress outlookwork in progress outlookwork in progress outlook

bull regularization of LVQ 21 Robust Soft LVQ [Seo Obermayer]

bull model different cluster variances more clustersprototypes

bull optimized procedures learning rate schedules

variational approach density estimation Bayes optimal on-line

bull several classes and prototypes

Summary

bullprototypeprototypeprototypeprototype----based learningbased learningbased learningbased learning

Vector Quantization and Learning Vector Quantization

bulla model scenarioa model scenarioa model scenarioa model scenario two clusters two prototypes

dynamics of online training

bullcomparison of algorithmscomparison of algorithmscomparison of algorithmscomparison of algorithms

LVQ 21 instability trivial (stationary) classification

LVQ 1 close to optimal asymptotic generalization

LVQ + min-max solution wrt asymptotic generalization

VQ symmetry breaking representation

The Dynamics of Learning Vector Quantization RUG 10012005

Perspectives

bullSelfSelfSelfSelf----Organizing Maps (SOM)Organizing Maps (SOM)Organizing Maps (SOM)Organizing Maps (SOM)

(many) N-dim prototypes form a (low) d-dimensional grid

representation of data in a topology preserving map

neighborhood preserving SOM Neural Gas (distance based)

bullGeneralized Relevance LVQ Generalized Relevance LVQ Generalized Relevance LVQ Generalized Relevance LVQ [Hammer amp Villmann]

adaptive metrics eg distance measure ( )sum=

minus=N

i

iii w

1

2)( sλ ξξwd λ

training

bullapplications applications applications applications

Page 22: The Dynamics of Learning Vector Quantizationlibvolume3.xyz/computers/btech/semester6/datacompression/vectorquantization...The Dynamics of Learning Vector Quantization, RUG, 10.01.2005

The Dynamics of Learning Vector Quantization RUG 10012005

(p- )

(p+gt p-)

sssstrategiestrategiestrategiestrategies

- selection of dataselection of dataselection of dataselection of data in a window close to the current decision boundary

slows down the repulsion system remains instable

- Soft Robust Learning Vector QuantizationSoft Robust Learning Vector QuantizationSoft Robust Learning Vector QuantizationSoft Robust Learning Vector Quantization [Seo amp Obermayer]

density-estimation based cost function

limiting case Learning from mistakes Learning from mistakes Learning from mistakes Learning from mistakes LVQ21-step only

if the example is currently misclassified

slow learning poor generalization

problem instability of the algorithm

due to repulsion of wrong prototypes

trivial classification fuumlr αrarrinfin

εg = max p+p-

The Dynamics of Learning Vector Quantization RUG 10012005

ldquo The winner takes it all rdquoldquo The winner takes it all rdquoldquo The winner takes it all rdquoldquo The winner takes it all rdquo

numericalintegrationfor wwwws(0)=0

theory and simulation (N=200)p+=02 ℓ=12 η=12averaged over 100 indep runs

Q++

Q--

Q+-

α

wwww++++

wwww----

ℓℓℓℓ BBBB++++

ℓℓℓℓ BBBB----

trajectories in the (B+B- )-plane

(bull) α=2040140

optimal decision boundary____ asymptotic position

RS+

RS-

R--

R-+

R--

R++

winner wwwws plusmn1

I) LVQ 1LVQ 1LVQ 1LVQ 1 [Kohonen] [ ] ( ) 1-micros

micromicromicroS

microS

1-micros

micros Sσdd

N

ηwwwwξξξξwwwwwwww minusminusΘ+= minus

only the winner is updated according to the class membership

wwww-

The Dynamics of Learning Vector Quantization RUG 10012005

learning curvelearning curvelearning curvelearning curve

α

εg η=12

(p+=02 ℓ=12)

εg (αrarrinfin) grows lin with η

- stationary state

- role of the learning rate

α100 200 300

εg

026

022

018

0140

η

20

04

02

ηrarr0 - variable rate η(α)

- wellwellwellwell----defined asymptoticsdefined asymptoticsdefined asymptoticsdefined asymptotics

(ODE linear in η)

10

εg

20 30 40 500014

026

022

018

min εg

(η α)

ηrarr0

η rarr0 αrarrinfin

( η α ) rarr infin

suboptimal

The Dynamics of Learning Vector Quantization RUG 10012005

ldquo The winner takes it all ldquoldquo The winner takes it all ldquoldquo The winner takes it all ldquoldquo The winner takes it all ldquo

II ) LVQ+LVQ+LVQ+LVQ+ ( only positive steps without repulsion)

[ ] ( ) ( ) 1-micros

microS

microσ

microS

microS

1-micros

micros δdd

N

ηwwwwξξξξwwwwwwww minusminusΘ+= minus

winner correct

αrarrinfin asymptotic configuration

symmetric about ℓℓℓℓ (B(B(B(B+++++B+B+B+B----)2)2)2)2

wwww-

wwww+

ℓ ℓ ℓ ℓ BBBB+

ℓ ℓ ℓ ℓ BBBB-

p+=02 ℓ=12 η=12

classification scheme and the

achieved generalization error are

independent of the independent of the independent of the independent of the prior weights prior weights prior weights prior weights ppppplusmnplusmnplusmnplusmn

(and optimal for ppppplusmnplusmnplusmnplusmn = 12 )

LVQ+ asymp VQ within the classes

(ws updated only from class S)

The Dynamics of Learning Vector Quantization RUG 10012005

- LVQ 21

trivial assignment to the

more frequent class

optimal classification

εg

pppp++++

min p+p-

- LVQ 1

here close to optimal

classification

pppp++++

- LVQ+

min-max solution

pplusmn -independent classification

p+=02 ℓ=10 η=10εg

α

learning curveslearning curveslearning curveslearning curves

LVQ+

LVQ1

asymptotics ηrarr0 (ηα)rarrinfin

The Dynamics of Learning Vector Quantization RUG 10012005

Vector QuantizationVector QuantizationVector QuantizationVector Quantization

competitive learning [ ] ( ) 1-micros

micromicroS

microS

1-micros

micros dd

N

ηwwwwξξξξwwwwwwww minusminusΘ+= minus

wwwws winner

class membership is unknown

or identical for all data

numerical integration for wwwws(0)asymp0

( p+=02 ℓ=10 η=12 )

εg

α

VQ

LVQ+

LVQ1

αα

R++

R+-

R-+

R--

100 200 3000

0

10

system is invariant under

exchange of the prototypes

rarr weakly repulsive fixed points

The Dynamics of Learning Vector Quantization RUG 10012005

interpretations

- VQ unsupervised learningunlabelled data

- LVQ two prototypes of thesame class identical labels

- LVQ different classes butlabels are not used in training

εg

pppp++++

asymptotics (αrarrηrarr0 ηαrarrinfin)

pppp++++asymp0 asymp0 asymp0 asymp0

pppp----asymp1 asymp1 asymp1 asymp1

- low quantization error

- high gen error εg

The Dynamics of Learning Vector Quantization RUG 10012005

work in progress outlookwork in progress outlookwork in progress outlookwork in progress outlook

bull regularization of LVQ 21 Robust Soft LVQ [Seo Obermayer]

bull model different cluster variances more clustersprototypes

bull optimized procedures learning rate schedules

variational approach density estimation Bayes optimal on-line

bull several classes and prototypes

Summary

bullprototypeprototypeprototypeprototype----based learningbased learningbased learningbased learning

Vector Quantization and Learning Vector Quantization

bulla model scenarioa model scenarioa model scenarioa model scenario two clusters two prototypes

dynamics of online training

bullcomparison of algorithmscomparison of algorithmscomparison of algorithmscomparison of algorithms

LVQ 21 instability trivial (stationary) classification

LVQ 1 close to optimal asymptotic generalization

LVQ + min-max solution wrt asymptotic generalization

VQ symmetry breaking representation

The Dynamics of Learning Vector Quantization RUG 10012005

Perspectives

bullSelfSelfSelfSelf----Organizing Maps (SOM)Organizing Maps (SOM)Organizing Maps (SOM)Organizing Maps (SOM)

(many) N-dim prototypes form a (low) d-dimensional grid

representation of data in a topology preserving map

neighborhood preserving SOM Neural Gas (distance based)

bullGeneralized Relevance LVQ Generalized Relevance LVQ Generalized Relevance LVQ Generalized Relevance LVQ [Hammer amp Villmann]

adaptive metrics eg distance measure ( )sum=

minus=N

i

iii w

1

2)( sλ ξξwd λ

training

bullapplications applications applications applications

Page 23: The Dynamics of Learning Vector Quantizationlibvolume3.xyz/computers/btech/semester6/datacompression/vectorquantization...The Dynamics of Learning Vector Quantization, RUG, 10.01.2005

The Dynamics of Learning Vector Quantization RUG 10012005

ldquo The winner takes it all rdquoldquo The winner takes it all rdquoldquo The winner takes it all rdquoldquo The winner takes it all rdquo

numericalintegrationfor wwwws(0)=0

theory and simulation (N=200)p+=02 ℓ=12 η=12averaged over 100 indep runs

Q++

Q--

Q+-

α

wwww++++

wwww----

ℓℓℓℓ BBBB++++

ℓℓℓℓ BBBB----

trajectories in the (B+B- )-plane

(bull) α=2040140

optimal decision boundary____ asymptotic position

RS+

RS-

R--

R-+

R--

R++

winner wwwws plusmn1

I) LVQ 1LVQ 1LVQ 1LVQ 1 [Kohonen] [ ] ( ) 1-micros

micromicromicroS

microS

1-micros

micros Sσdd

N

ηwwwwξξξξwwwwwwww minusminusΘ+= minus

only the winner is updated according to the class membership

wwww-

The Dynamics of Learning Vector Quantization RUG 10012005

learning curvelearning curvelearning curvelearning curve

α

εg η=12

(p+=02 ℓ=12)

εg (αrarrinfin) grows lin with η

- stationary state

- role of the learning rate

α100 200 300

εg

026

022

018

0140

η

20

04

02

ηrarr0 - variable rate η(α)

- wellwellwellwell----defined asymptoticsdefined asymptoticsdefined asymptoticsdefined asymptotics

(ODE linear in η)

10

εg

20 30 40 500014

026

022

018

min εg

(η α)

ηrarr0

η rarr0 αrarrinfin

( η α ) rarr infin

suboptimal

The Dynamics of Learning Vector Quantization RUG 10012005

ldquo The winner takes it all ldquoldquo The winner takes it all ldquoldquo The winner takes it all ldquoldquo The winner takes it all ldquo

II ) LVQ+LVQ+LVQ+LVQ+ ( only positive steps without repulsion)

[ ] ( ) ( ) 1-micros

microS

microσ

microS

microS

1-micros

micros δdd

N

ηwwwwξξξξwwwwwwww minusminusΘ+= minus

winner correct

αrarrinfin asymptotic configuration

symmetric about ℓℓℓℓ (B(B(B(B+++++B+B+B+B----)2)2)2)2

wwww-

wwww+

ℓ ℓ ℓ ℓ BBBB+

ℓ ℓ ℓ ℓ BBBB-

p+=02 ℓ=12 η=12

classification scheme and the

achieved generalization error are

independent of the independent of the independent of the independent of the prior weights prior weights prior weights prior weights ppppplusmnplusmnplusmnplusmn

(and optimal for ppppplusmnplusmnplusmnplusmn = 12 )

LVQ+ asymp VQ within the classes

(ws updated only from class S)

The Dynamics of Learning Vector Quantization RUG 10012005

- LVQ 21

trivial assignment to the

more frequent class

optimal classification

εg

pppp++++

min p+p-

- LVQ 1

here close to optimal

classification

pppp++++

- LVQ+

min-max solution

pplusmn -independent classification

p+=02 ℓ=10 η=10εg

α

learning curveslearning curveslearning curveslearning curves

LVQ+

LVQ1

asymptotics ηrarr0 (ηα)rarrinfin

The Dynamics of Learning Vector Quantization RUG 10012005

Vector QuantizationVector QuantizationVector QuantizationVector Quantization

competitive learning [ ] ( ) 1-micros

micromicroS

microS

1-micros

micros dd

N

ηwwwwξξξξwwwwwwww minusminusΘ+= minus

wwwws winner

class membership is unknown

or identical for all data

numerical integration for wwwws(0)asymp0

( p+=02 ℓ=10 η=12 )

εg

α

VQ

LVQ+

LVQ1

αα

R++

R+-

R-+

R--

100 200 3000

0

10

system is invariant under

exchange of the prototypes

rarr weakly repulsive fixed points

The Dynamics of Learning Vector Quantization RUG 10012005

interpretations

- VQ unsupervised learningunlabelled data

- LVQ two prototypes of thesame class identical labels

- LVQ different classes butlabels are not used in training

εg

pppp++++

asymptotics (αrarrηrarr0 ηαrarrinfin)

pppp++++asymp0 asymp0 asymp0 asymp0

pppp----asymp1 asymp1 asymp1 asymp1

- low quantization error

- high gen error εg

The Dynamics of Learning Vector Quantization RUG 10012005

work in progress outlookwork in progress outlookwork in progress outlookwork in progress outlook

bull regularization of LVQ 21 Robust Soft LVQ [Seo Obermayer]

bull model different cluster variances more clustersprototypes

bull optimized procedures learning rate schedules

variational approach density estimation Bayes optimal on-line

bull several classes and prototypes

Summary

bullprototypeprototypeprototypeprototype----based learningbased learningbased learningbased learning

Vector Quantization and Learning Vector Quantization

bulla model scenarioa model scenarioa model scenarioa model scenario two clusters two prototypes

dynamics of online training

bullcomparison of algorithmscomparison of algorithmscomparison of algorithmscomparison of algorithms

LVQ 21 instability trivial (stationary) classification

LVQ 1 close to optimal asymptotic generalization

LVQ + min-max solution wrt asymptotic generalization

VQ symmetry breaking representation

The Dynamics of Learning Vector Quantization RUG 10012005

Perspectives

bullSelfSelfSelfSelf----Organizing Maps (SOM)Organizing Maps (SOM)Organizing Maps (SOM)Organizing Maps (SOM)

(many) N-dim prototypes form a (low) d-dimensional grid

representation of data in a topology preserving map

neighborhood preserving SOM Neural Gas (distance based)

bullGeneralized Relevance LVQ Generalized Relevance LVQ Generalized Relevance LVQ Generalized Relevance LVQ [Hammer amp Villmann]

adaptive metrics eg distance measure ( )sum=

minus=N

i

iii w

1

2)( sλ ξξwd λ

training

bullapplications applications applications applications

Page 24: The Dynamics of Learning Vector Quantizationlibvolume3.xyz/computers/btech/semester6/datacompression/vectorquantization...The Dynamics of Learning Vector Quantization, RUG, 10.01.2005

The Dynamics of Learning Vector Quantization RUG 10012005

learning curvelearning curvelearning curvelearning curve

α

εg η=12

(p+=02 ℓ=12)

εg (αrarrinfin) grows lin with η

- stationary state

- role of the learning rate

α100 200 300

εg

026

022

018

0140

η

20

04

02

ηrarr0 - variable rate η(α)

- wellwellwellwell----defined asymptoticsdefined asymptoticsdefined asymptoticsdefined asymptotics

(ODE linear in η)

10

εg

20 30 40 500014

026

022

018

min εg

(η α)

ηrarr0

η rarr0 αrarrinfin

( η α ) rarr infin

suboptimal

The Dynamics of Learning Vector Quantization RUG 10012005

ldquo The winner takes it all ldquoldquo The winner takes it all ldquoldquo The winner takes it all ldquoldquo The winner takes it all ldquo

II ) LVQ+LVQ+LVQ+LVQ+ ( only positive steps without repulsion)

[ ] ( ) ( ) 1-micros

microS

microσ

microS

microS

1-micros

micros δdd

N

ηwwwwξξξξwwwwwwww minusminusΘ+= minus

winner correct

αrarrinfin asymptotic configuration

symmetric about ℓℓℓℓ (B(B(B(B+++++B+B+B+B----)2)2)2)2

wwww-

wwww+

ℓ ℓ ℓ ℓ BBBB+

ℓ ℓ ℓ ℓ BBBB-

p+=02 ℓ=12 η=12

classification scheme and the

achieved generalization error are

independent of the independent of the independent of the independent of the prior weights prior weights prior weights prior weights ppppplusmnplusmnplusmnplusmn

(and optimal for ppppplusmnplusmnplusmnplusmn = 12 )

LVQ+ asymp VQ within the classes

(ws updated only from class S)

The Dynamics of Learning Vector Quantization RUG 10012005

- LVQ 21

trivial assignment to the

more frequent class

optimal classification

εg

pppp++++

min p+p-

- LVQ 1

here close to optimal

classification

pppp++++

- LVQ+

min-max solution

pplusmn -independent classification

p+=02 ℓ=10 η=10εg

α

learning curveslearning curveslearning curveslearning curves

LVQ+

LVQ1

asymptotics ηrarr0 (ηα)rarrinfin

The Dynamics of Learning Vector Quantization RUG 10012005

Vector QuantizationVector QuantizationVector QuantizationVector Quantization

competitive learning [ ] ( ) 1-micros

micromicroS

microS

1-micros

micros dd

N

ηwwwwξξξξwwwwwwww minusminusΘ+= minus

wwwws winner

class membership is unknown

or identical for all data

numerical integration for wwwws(0)asymp0

( p+=02 ℓ=10 η=12 )

εg

α

VQ

LVQ+

LVQ1

αα

R++

R+-

R-+

R--

100 200 3000

0

10

system is invariant under

exchange of the prototypes

rarr weakly repulsive fixed points

The Dynamics of Learning Vector Quantization RUG 10012005

interpretations

- VQ unsupervised learningunlabelled data

- LVQ two prototypes of thesame class identical labels

- LVQ different classes butlabels are not used in training

εg

pppp++++

asymptotics (αrarrηrarr0 ηαrarrinfin)

pppp++++asymp0 asymp0 asymp0 asymp0

pppp----asymp1 asymp1 asymp1 asymp1

- low quantization error

- high gen error εg

The Dynamics of Learning Vector Quantization RUG 10012005

work in progress outlookwork in progress outlookwork in progress outlookwork in progress outlook

bull regularization of LVQ 21 Robust Soft LVQ [Seo Obermayer]

bull model different cluster variances more clustersprototypes

bull optimized procedures learning rate schedules

variational approach density estimation Bayes optimal on-line

bull several classes and prototypes

Summary

bullprototypeprototypeprototypeprototype----based learningbased learningbased learningbased learning

Vector Quantization and Learning Vector Quantization

bulla model scenarioa model scenarioa model scenarioa model scenario two clusters two prototypes

dynamics of online training

bullcomparison of algorithmscomparison of algorithmscomparison of algorithmscomparison of algorithms

LVQ 21 instability trivial (stationary) classification

LVQ 1 close to optimal asymptotic generalization

LVQ + min-max solution wrt asymptotic generalization

VQ symmetry breaking representation

The Dynamics of Learning Vector Quantization RUG 10012005

Perspectives

bullSelfSelfSelfSelf----Organizing Maps (SOM)Organizing Maps (SOM)Organizing Maps (SOM)Organizing Maps (SOM)

(many) N-dim prototypes form a (low) d-dimensional grid

representation of data in a topology preserving map

neighborhood preserving SOM Neural Gas (distance based)

bullGeneralized Relevance LVQ Generalized Relevance LVQ Generalized Relevance LVQ Generalized Relevance LVQ [Hammer amp Villmann]

adaptive metrics eg distance measure ( )sum=

minus=N

i

iii w

1

2)( sλ ξξwd λ

training

bullapplications applications applications applications

Page 25: The Dynamics of Learning Vector Quantizationlibvolume3.xyz/computers/btech/semester6/datacompression/vectorquantization...The Dynamics of Learning Vector Quantization, RUG, 10.01.2005

The Dynamics of Learning Vector Quantization RUG 10012005

ldquo The winner takes it all ldquoldquo The winner takes it all ldquoldquo The winner takes it all ldquoldquo The winner takes it all ldquo

II ) LVQ+LVQ+LVQ+LVQ+ ( only positive steps without repulsion)

[ ] ( ) ( ) 1-micros

microS

microσ

microS

microS

1-micros

micros δdd

N

ηwwwwξξξξwwwwwwww minusminusΘ+= minus

winner correct

αrarrinfin asymptotic configuration

symmetric about ℓℓℓℓ (B(B(B(B+++++B+B+B+B----)2)2)2)2

wwww-

wwww+

ℓ ℓ ℓ ℓ BBBB+

ℓ ℓ ℓ ℓ BBBB-

p+=02 ℓ=12 η=12

classification scheme and the

achieved generalization error are

independent of the independent of the independent of the independent of the prior weights prior weights prior weights prior weights ppppplusmnplusmnplusmnplusmn

(and optimal for ppppplusmnplusmnplusmnplusmn = 12 )

LVQ+ asymp VQ within the classes

(ws updated only from class S)

The Dynamics of Learning Vector Quantization RUG 10012005

- LVQ 21

trivial assignment to the

more frequent class

optimal classification

εg

pppp++++

min p+p-

- LVQ 1

here close to optimal

classification

pppp++++

- LVQ+

min-max solution

pplusmn -independent classification

p+=02 ℓ=10 η=10εg

α

learning curveslearning curveslearning curveslearning curves

LVQ+

LVQ1

asymptotics ηrarr0 (ηα)rarrinfin

The Dynamics of Learning Vector Quantization RUG 10012005

Vector QuantizationVector QuantizationVector QuantizationVector Quantization

competitive learning [ ] ( ) 1-micros

micromicroS

microS

1-micros

micros dd

N

ηwwwwξξξξwwwwwwww minusminusΘ+= minus

wwwws winner

class membership is unknown

or identical for all data

numerical integration for wwwws(0)asymp0

( p+=02 ℓ=10 η=12 )

εg

α

VQ

LVQ+

LVQ1

αα

R++

R+-

R-+

R--

100 200 3000

0

10

system is invariant under

exchange of the prototypes

rarr weakly repulsive fixed points

The Dynamics of Learning Vector Quantization RUG 10012005

interpretations

- VQ unsupervised learningunlabelled data

- LVQ two prototypes of thesame class identical labels

- LVQ different classes butlabels are not used in training

εg

pppp++++

asymptotics (αrarrηrarr0 ηαrarrinfin)

pppp++++asymp0 asymp0 asymp0 asymp0

pppp----asymp1 asymp1 asymp1 asymp1

- low quantization error

- high gen error εg

The Dynamics of Learning Vector Quantization RUG 10012005

work in progress outlookwork in progress outlookwork in progress outlookwork in progress outlook

bull regularization of LVQ 21 Robust Soft LVQ [Seo Obermayer]

bull model different cluster variances more clustersprototypes

bull optimized procedures learning rate schedules

variational approach density estimation Bayes optimal on-line

bull several classes and prototypes

Summary

bullprototypeprototypeprototypeprototype----based learningbased learningbased learningbased learning

Vector Quantization and Learning Vector Quantization

bulla model scenarioa model scenarioa model scenarioa model scenario two clusters two prototypes

dynamics of online training

bullcomparison of algorithmscomparison of algorithmscomparison of algorithmscomparison of algorithms

LVQ 21 instability trivial (stationary) classification

LVQ 1 close to optimal asymptotic generalization

LVQ + min-max solution wrt asymptotic generalization

VQ symmetry breaking representation

The Dynamics of Learning Vector Quantization RUG 10012005

Perspectives

bullSelfSelfSelfSelf----Organizing Maps (SOM)Organizing Maps (SOM)Organizing Maps (SOM)Organizing Maps (SOM)

(many) N-dim prototypes form a (low) d-dimensional grid

representation of data in a topology preserving map

neighborhood preserving SOM Neural Gas (distance based)

bullGeneralized Relevance LVQ Generalized Relevance LVQ Generalized Relevance LVQ Generalized Relevance LVQ [Hammer amp Villmann]

adaptive metrics eg distance measure ( )sum=

minus=N

i

iii w

1

2)( sλ ξξwd λ

training

bullapplications applications applications applications

Page 26: The Dynamics of Learning Vector Quantizationlibvolume3.xyz/computers/btech/semester6/datacompression/vectorquantization...The Dynamics of Learning Vector Quantization, RUG, 10.01.2005

The Dynamics of Learning Vector Quantization RUG 10012005

- LVQ 21

trivial assignment to the

more frequent class

optimal classification

εg

pppp++++

min p+p-

- LVQ 1

here close to optimal

classification

pppp++++

- LVQ+

min-max solution

pplusmn -independent classification

p+=02 ℓ=10 η=10εg

α

learning curveslearning curveslearning curveslearning curves

LVQ+

LVQ1

asymptotics ηrarr0 (ηα)rarrinfin

The Dynamics of Learning Vector Quantization RUG 10012005

Vector QuantizationVector QuantizationVector QuantizationVector Quantization

competitive learning [ ] ( ) 1-micros

micromicroS

microS

1-micros

micros dd

N

ηwwwwξξξξwwwwwwww minusminusΘ+= minus

wwwws winner

class membership is unknown

or identical for all data

numerical integration for wwwws(0)asymp0

( p+=02 ℓ=10 η=12 )

εg

α

VQ

LVQ+

LVQ1

αα

R++

R+-

R-+

R--

100 200 3000

0

10

system is invariant under

exchange of the prototypes

rarr weakly repulsive fixed points

The Dynamics of Learning Vector Quantization RUG 10012005

interpretations

- VQ unsupervised learningunlabelled data

- LVQ two prototypes of thesame class identical labels

- LVQ different classes butlabels are not used in training

εg

pppp++++

asymptotics (αrarrηrarr0 ηαrarrinfin)

pppp++++asymp0 asymp0 asymp0 asymp0

pppp----asymp1 asymp1 asymp1 asymp1

- low quantization error

- high gen error εg

The Dynamics of Learning Vector Quantization RUG 10012005

work in progress outlookwork in progress outlookwork in progress outlookwork in progress outlook

bull regularization of LVQ 21 Robust Soft LVQ [Seo Obermayer]

bull model different cluster variances more clustersprototypes

bull optimized procedures learning rate schedules

variational approach density estimation Bayes optimal on-line

bull several classes and prototypes

Summary

bullprototypeprototypeprototypeprototype----based learningbased learningbased learningbased learning

Vector Quantization and Learning Vector Quantization

bulla model scenarioa model scenarioa model scenarioa model scenario two clusters two prototypes

dynamics of online training

bullcomparison of algorithmscomparison of algorithmscomparison of algorithmscomparison of algorithms

LVQ 21 instability trivial (stationary) classification

LVQ 1 close to optimal asymptotic generalization

LVQ + min-max solution wrt asymptotic generalization

VQ symmetry breaking representation

The Dynamics of Learning Vector Quantization RUG 10012005

Perspectives

bullSelfSelfSelfSelf----Organizing Maps (SOM)Organizing Maps (SOM)Organizing Maps (SOM)Organizing Maps (SOM)

(many) N-dim prototypes form a (low) d-dimensional grid

representation of data in a topology preserving map

neighborhood preserving SOM Neural Gas (distance based)

bullGeneralized Relevance LVQ Generalized Relevance LVQ Generalized Relevance LVQ Generalized Relevance LVQ [Hammer amp Villmann]

adaptive metrics eg distance measure ( )sum=

minus=N

i

iii w

1

2)( sλ ξξwd λ

training

bullapplications applications applications applications

Page 27: The Dynamics of Learning Vector Quantizationlibvolume3.xyz/computers/btech/semester6/datacompression/vectorquantization...The Dynamics of Learning Vector Quantization, RUG, 10.01.2005

The Dynamics of Learning Vector Quantization RUG 10012005

Vector QuantizationVector QuantizationVector QuantizationVector Quantization

competitive learning [ ] ( ) 1-micros

micromicroS

microS

1-micros

micros dd

N

ηwwwwξξξξwwwwwwww minusminusΘ+= minus

wwwws winner

class membership is unknown

or identical for all data

numerical integration for wwwws(0)asymp0

( p+=02 ℓ=10 η=12 )

εg

α

VQ

LVQ+

LVQ1

αα

R++

R+-

R-+

R--

100 200 3000

0

10

system is invariant under

exchange of the prototypes

rarr weakly repulsive fixed points

The Dynamics of Learning Vector Quantization RUG 10012005

interpretations

- VQ unsupervised learningunlabelled data

- LVQ two prototypes of thesame class identical labels

- LVQ different classes butlabels are not used in training

εg

pppp++++

asymptotics (αrarrηrarr0 ηαrarrinfin)

pppp++++asymp0 asymp0 asymp0 asymp0

pppp----asymp1 asymp1 asymp1 asymp1

- low quantization error

- high gen error εg

The Dynamics of Learning Vector Quantization RUG 10012005

work in progress outlookwork in progress outlookwork in progress outlookwork in progress outlook

bull regularization of LVQ 21 Robust Soft LVQ [Seo Obermayer]

bull model different cluster variances more clustersprototypes

bull optimized procedures learning rate schedules

variational approach density estimation Bayes optimal on-line

bull several classes and prototypes

Summary

bullprototypeprototypeprototypeprototype----based learningbased learningbased learningbased learning

Vector Quantization and Learning Vector Quantization

bulla model scenarioa model scenarioa model scenarioa model scenario two clusters two prototypes

dynamics of online training

bullcomparison of algorithmscomparison of algorithmscomparison of algorithmscomparison of algorithms

LVQ 21 instability trivial (stationary) classification

LVQ 1 close to optimal asymptotic generalization

LVQ + min-max solution wrt asymptotic generalization

VQ symmetry breaking representation

The Dynamics of Learning Vector Quantization RUG 10012005

Perspectives

bullSelfSelfSelfSelf----Organizing Maps (SOM)Organizing Maps (SOM)Organizing Maps (SOM)Organizing Maps (SOM)

(many) N-dim prototypes form a (low) d-dimensional grid

representation of data in a topology preserving map

neighborhood preserving SOM Neural Gas (distance based)

bullGeneralized Relevance LVQ Generalized Relevance LVQ Generalized Relevance LVQ Generalized Relevance LVQ [Hammer amp Villmann]

adaptive metrics eg distance measure ( )sum=

minus=N

i

iii w

1

2)( sλ ξξwd λ

training

bullapplications applications applications applications

Page 28: The Dynamics of Learning Vector Quantizationlibvolume3.xyz/computers/btech/semester6/datacompression/vectorquantization...The Dynamics of Learning Vector Quantization, RUG, 10.01.2005

The Dynamics of Learning Vector Quantization RUG 10012005

interpretations

- VQ unsupervised learningunlabelled data

- LVQ two prototypes of thesame class identical labels

- LVQ different classes butlabels are not used in training

εg

pppp++++

asymptotics (αrarrηrarr0 ηαrarrinfin)

pppp++++asymp0 asymp0 asymp0 asymp0

pppp----asymp1 asymp1 asymp1 asymp1

- low quantization error

- high gen error εg

The Dynamics of Learning Vector Quantization RUG 10012005

work in progress outlookwork in progress outlookwork in progress outlookwork in progress outlook

bull regularization of LVQ 21 Robust Soft LVQ [Seo Obermayer]

bull model different cluster variances more clustersprototypes

bull optimized procedures learning rate schedules

variational approach density estimation Bayes optimal on-line

bull several classes and prototypes

Summary

bullprototypeprototypeprototypeprototype----based learningbased learningbased learningbased learning

Vector Quantization and Learning Vector Quantization

bulla model scenarioa model scenarioa model scenarioa model scenario two clusters two prototypes

dynamics of online training

bullcomparison of algorithmscomparison of algorithmscomparison of algorithmscomparison of algorithms

LVQ 21 instability trivial (stationary) classification

LVQ 1 close to optimal asymptotic generalization

LVQ + min-max solution wrt asymptotic generalization

VQ symmetry breaking representation

The Dynamics of Learning Vector Quantization RUG 10012005

Perspectives

bullSelfSelfSelfSelf----Organizing Maps (SOM)Organizing Maps (SOM)Organizing Maps (SOM)Organizing Maps (SOM)

(many) N-dim prototypes form a (low) d-dimensional grid

representation of data in a topology preserving map

neighborhood preserving SOM Neural Gas (distance based)

bullGeneralized Relevance LVQ Generalized Relevance LVQ Generalized Relevance LVQ Generalized Relevance LVQ [Hammer amp Villmann]

adaptive metrics eg distance measure ( )sum=

minus=N

i

iii w

1

2)( sλ ξξwd λ

training

bullapplications applications applications applications

Page 29: The Dynamics of Learning Vector Quantizationlibvolume3.xyz/computers/btech/semester6/datacompression/vectorquantization...The Dynamics of Learning Vector Quantization, RUG, 10.01.2005

The Dynamics of Learning Vector Quantization RUG 10012005

work in progress outlookwork in progress outlookwork in progress outlookwork in progress outlook

bull regularization of LVQ 21 Robust Soft LVQ [Seo Obermayer]

bull model different cluster variances more clustersprototypes

bull optimized procedures learning rate schedules

variational approach density estimation Bayes optimal on-line

bull several classes and prototypes

Summary

bullprototypeprototypeprototypeprototype----based learningbased learningbased learningbased learning

Vector Quantization and Learning Vector Quantization

bulla model scenarioa model scenarioa model scenarioa model scenario two clusters two prototypes

dynamics of online training

bullcomparison of algorithmscomparison of algorithmscomparison of algorithmscomparison of algorithms

LVQ 21 instability trivial (stationary) classification

LVQ 1 close to optimal asymptotic generalization

LVQ + min-max solution wrt asymptotic generalization

VQ symmetry breaking representation

The Dynamics of Learning Vector Quantization RUG 10012005

Perspectives

bullSelfSelfSelfSelf----Organizing Maps (SOM)Organizing Maps (SOM)Organizing Maps (SOM)Organizing Maps (SOM)

(many) N-dim prototypes form a (low) d-dimensional grid

representation of data in a topology preserving map

neighborhood preserving SOM Neural Gas (distance based)

bullGeneralized Relevance LVQ Generalized Relevance LVQ Generalized Relevance LVQ Generalized Relevance LVQ [Hammer amp Villmann]

adaptive metrics eg distance measure ( )sum=

minus=N

i

iii w

1

2)( sλ ξξwd λ

training

bullapplications applications applications applications

Page 30: The Dynamics of Learning Vector Quantizationlibvolume3.xyz/computers/btech/semester6/datacompression/vectorquantization...The Dynamics of Learning Vector Quantization, RUG, 10.01.2005

The Dynamics of Learning Vector Quantization RUG 10012005

Perspectives

bullSelfSelfSelfSelf----Organizing Maps (SOM)Organizing Maps (SOM)Organizing Maps (SOM)Organizing Maps (SOM)

(many) N-dim prototypes form a (low) d-dimensional grid

representation of data in a topology preserving map

neighborhood preserving SOM Neural Gas (distance based)

bullGeneralized Relevance LVQ Generalized Relevance LVQ Generalized Relevance LVQ Generalized Relevance LVQ [Hammer amp Villmann]

adaptive metrics eg distance measure ( )sum=

minus=N

i

iii w

1

2)( sλ ξξwd λ

training

bullapplications applications applications applications