Dynamical Analysis of LVQ type algorithms, WSOM 2005

Dynamical analysis of LVQ type learning rules

Michael Biehl, Anarta Ghosh
Rijksuniversiteit Groningen, Mathematics and Computing Science
http://www.cs.rug.nl/~biehl
m.biehl@rug.nl

Barbara Hammer
Clausthal University of Technology, Institute of Computing Science

Learning Vector Quantization (LVQ)

- identification of prototype vectors from labelled example data
- parameterization of distance-based classification schemes

classification: assignment of a vector to the class of the closest prototype w

aim: generalization ability, i.e. classification of novel data after learning from examples

often: heuristically motivated variations of competitive learning

example: basic LVQ scheme [Kohonen], "LVQ 1"

• initialize prototype vectors for different classes
• present a single example
• identify the closest prototype, i.e. the so-called winner
• move the winner
  - closer towards the data (same class)
  - away from the data (different class)
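The basic LVQ 1 scheme can be sketched in a few lines of plain Python. This is only an illustration under simple assumptions: squared Euclidean distance, one prototype per class, and a plain step size `eta`. The function names and the toy values are hypothetical and not taken from the slides.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def find_winner(prototypes, xi):
    """Index of the prototype closest to xi (the so-called winner)."""
    return min(range(len(prototypes)),
               key=lambda k: squared_distance(prototypes[k], xi))


def classify(prototypes, labels, xi):
    """Assign xi to the class of the closest prototype."""
    return labels[find_winner(prototypes, xi)]


def lvq1_step(prototypes, labels, xi, label, eta):
    """Move the winner towards xi (same class) or away from it (different class)."""
    k = find_winner(prototypes, xi)
    sign = 1.0 if labels[k] == label else -1.0
    prototypes[k] = [w + sign * eta * (x - w) for w, x in zip(prototypes[k], xi)]
    return k


prototypes = [[0.0, 0.0], [1.0, 1.0]]   # toy initialization, one prototype per class
labels = [+1, -1]
lvq1_step(prototypes, labels, xi=[0.2, 0.0], label=+1, eta=0.5)
print(prototypes[0])                            # the winner moved halfway towards the example
print(classify(prototypes, labels, [0.9, 1.2]))
```

Only the winner changes in each step; all other prototypes are left untouched.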

LVQ algorithms

- frequently applied in a variety of practical problems
- plausible, intuitive, flexible
- fast, easy to implement
- often based on heuristic arguments or cost functions with unclear relation to generalization
- limited theoretical understanding of
  - dynamics and convergence properties
  - achievable generalization ability

here: analysis of LVQ algorithms w.r.t.

- dynamics of the learning process
- performance, i.e. generalization ability
- typical properties in a model situation

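Model situation: two clusters of N-dimensional data

random vectors $\xi \in \mathbb{R}^N$ according to

$$P(\xi) = \sum_{\sigma = \pm 1} p_\sigma \, P(\xi \mid \sigma)$$

mixture of two Gaussians:

$$P(\xi \mid \sigma) = \frac{1}{(2\pi v_\sigma)^{N/2}} \exp\left[ -\frac{1}{2 v_\sigma} \left( \xi - \ell B_\sigma \right)^2 \right]$$

orthonormal center vectors: $B_+, B_- \in \mathbb{R}^N$, $(B_\pm)^2 = 1$, $B_+ \cdot B_- = 0$

prior weights of classes $p_+$, $p_-$, with $p_+ + p_- = 1$

separation $\propto \ell$

independent components with mean $\langle \xi_j \rangle_\sigma = \ell \, (B_\sigma)_j$ and variance $\langle \xi_j^2 \rangle_\sigma - \langle \xi_j \rangle_\sigma^2 = v_\sigma$

[Figure: two clusters in $\mathbb{R}^N$ centered at $\ell B_+$ (weight $p_+$) and $\ell B_-$ (weight $p_-$)]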
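Drawing examples from this two-cluster model can be sketched as follows. The code is an illustration only: the function name, the choice of basis vectors as centers, and the sample size are assumptions made here, while the parameter values mirror those quoted later for the LVQ1 simulations.

```python
import math
import random


def sample_example(rng, ell, p_plus, v_plus, v_minus, b_plus, b_minus):
    """Draw (xi, sigma): sigma = +1 with probability p_plus, then
    xi ~ N(ell * B_sigma, v_sigma * identity)."""
    sigma = 1 if rng.random() < p_plus else -1
    center, v = (b_plus, v_plus) if sigma == 1 else (b_minus, v_minus)
    xi = [ell * c + math.sqrt(v) * rng.gauss(0.0, 1.0) for c in center]
    return xi, sigma


n = 20
b_plus = [1.0 if j == 0 else 0.0 for j in range(n)]    # orthonormal centers,
b_minus = [1.0 if j == 1 else 0.0 for j in range(n)]   # chosen as basis vectors here
rng = random.Random(0)
data = [sample_example(rng, 2.0, 0.8, 4.0, 9.0, b_plus, b_minus) for _ in range(2000)]

frac_plus = sum(1 for _, s in data if s == 1) / len(data)
plus_proj = [xi[0] for xi, s in data if s == 1]         # y_+ = B_+ . xi for class +1
mean_proj = sum(plus_proj) / len(plus_proj)
print(frac_plus, mean_proj)   # should be close to p_plus = 0.8 and ell = 2.0
```

Any pair of orthonormal vectors would serve as centers; basis vectors simply make the projections easy to read off.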

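Dynamics of on-line training

sequence of new, independent random examples $\{\xi^\mu, \sigma^\mu\}$, $\mu = 1, 2, 3, \ldots$, drawn according to $p_{\sigma^\mu} P(\xi^\mu \mid \sigma^\mu)$

update of two prototype vectors $w_+$, $w_-$:

$$w_s^\mu = w_s^{\mu-1} + \frac{\eta}{N} \, f_s\left[ d_+^\mu, d_-^\mu, \sigma^\mu, \ldots \right] \left( \xi^\mu - w_s^{\mu-1} \right), \qquad d_s^\mu = \left( \xi^\mu - w_s^{\mu-1} \right)^2, \quad s, \sigma^\mu = \pm 1$$

- $\eta$: learning rate, step size
- $f_s$: competition, direction of update, etc.
- $(\xi^\mu - w_s^{\mu-1})$: change of prototype towards or away from the current data

example: LVQ1, original formulation [Kohonen], Winner-Takes-All (WTA) algorithm

$$w_s^\mu = w_s^{\mu-1} + \frac{\eta}{N} \, \Theta\left( d_{-s}^\mu - d_s^\mu \right) s \, \sigma^\mu \left( \xi^\mu - w_s^{\mu-1} \right)$$

where $\Theta$ is the Heaviside step function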
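The generic on-line update, with the algorithm-specific part isolated in a modulation function, might look like the sketch below. The names `online_update` and `lvq1_modulation` and the dictionary layout (prototypes keyed by their label s = +1 or -1) are illustrative assumptions, not part of the original slides.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lvq1_modulation(s, d, sigma):
    """f_s for LVQ1: s * sigma if prototype s is the winner, 0 otherwise."""
    return float(s * sigma) if d[s] < d[-s] else 0.0


def online_update(w, xi, sigma, eta, modulation):
    """Apply w_s <- w_s + (eta / N) * f_s * (xi - w_s) to both prototypes.

    w maps the prototype label s in {+1, -1} to a list of N floats.
    """
    n = len(xi)
    d = {s: squared_distance(xi, w[s]) for s in (+1, -1)}   # distances before the update
    for s in (+1, -1):
        f_s = modulation(s, d, sigma)
        w[s] = [w_j + (eta / n) * f_s * (x_j - w_j) for w_j, x_j in zip(w[s], xi)]
    return w


w = {+1: [1.0, 0.0, 0.0, 0.0], -1: [0.0, 1.0, 0.0, 0.0]}
online_update(w, xi=[2.0, 0.0, 0.0, 0.0], sigma=-1, eta=1.0, modulation=lvq1_modulation)
print(w[+1], w[-1])   # w_+ wins but has the wrong label, so it is pushed away
```

Other algorithms of the same family differ only in the modulation function passed in.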

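Mathematical analysis of the learning dynamics

1. description in terms of a few characteristic quantities (here: $\mathbb{R}^{2N} \to \mathbb{R}^7$)

- $R_{s\sigma}^\mu = w_s^\mu \cdot B_\sigma$: projections into the $(B_+, B_-)$-plane
- $Q_{st}^\mu = w_s^\mu \cdot w_t^\mu$: length and relative position of prototypes

with $s, t, \sigma \in \{+1, -1\}$

random vector $\xi^\mu$ enters only through its length and the projections

$$x_s^\mu = w_s^{\mu-1} \cdot \xi^\mu, \qquad y_\sigma^\mu = B_\sigma \cdot \xi^\mu$$

recursions, following from the update $w_s^\mu = w_s^{\mu-1} + \frac{\eta}{N} f_s \left( \xi^\mu - w_s^{\mu-1} \right)$, e.g. with $f_s = \Theta(d_{-s}^\mu - d_s^\mu) \, s \, \sigma^\mu$ for LVQ1:

$$\frac{R_{s\sigma}^\mu - R_{s\sigma}^{\mu-1}}{1/N} = \eta \, f_s \left( y_\sigma^\mu - R_{s\sigma}^{\mu-1} \right)$$

$$\frac{Q_{st}^\mu - Q_{st}^{\mu-1}}{1/N} = \eta \, f_s \left( x_t^\mu - Q_{st}^{\mu-1} \right) + \eta \, f_t \left( x_s^\mu - Q_{st}^{\mu-1} \right) + \eta^2 f_s f_t \, \frac{(\xi^\mu)^2}{N} + O(1/N)$$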
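The reduction to characteristic quantities can be checked numerically: updating the prototype vectors and then projecting gives the same R as applying the recursion to R directly. The sketch below assumes fixed, arbitrary modulation values `f[s]` and toy vectors; all names are illustrative.

```python
def dot(a, b):
    """Scalar product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def order_parameters(w, b):
    """R[s, sigma] = w_s . B_sigma and Q[s, t] = w_s . w_t for s, t, sigma in {+1, -1}."""
    r = {(s, sig): dot(w[s], b[sig]) for s in (+1, -1) for sig in (+1, -1)}
    q = {(s, t): dot(w[s], w[t]) for s in (+1, -1) for t in (+1, -1)}
    return r, q


n, eta = 4, 0.5
b = {+1: [1.0, 0.0, 0.0, 0.0], -1: [0.0, 1.0, 0.0, 0.0]}
w = {+1: [0.3, 0.1, 0.2, 0.0], -1: [0.0, 0.4, -0.1, 0.2]}
xi = [1.5, -0.5, 0.7, 0.2]
f = {+1: 1.0, -1: -1.0}            # some fixed modulation values f_s (illustrative)

r_old, q_old = order_parameters(w, b)
y = {sig: dot(b[sig], xi) for sig in (+1, -1)}

# Route 1: update the N-dimensional vectors, then project.
w_new = {s: [w_j + (eta / n) * f[s] * (x_j - w_j) for w_j, x_j in zip(w[s], xi)]
         for s in (+1, -1)}
r_new, q_new = order_parameters(w_new, b)

# Route 2: apply the recursion for R directly.
r_pred = {(s, sig): r_old[s, sig] + (eta / n) * f[s] * (y[sig] - r_old[s, sig])
          for s in (+1, -1) for sig in (+1, -1)}

print(max(abs(r_new[k] - r_pred[k]) for k in r_new))   # agreement up to rounding
```

Since Q is symmetric, the four R values and three independent Q values give the seven quantities mentioned above.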

2. average over the current example

random vector $\xi^\mu$ according to $P(\xi^\mu \mid \sigma)$, average length $\langle (\xi^\mu)^2 \rangle_\sigma \approx v_\sigma N$

in the thermodynamic limit $N \to \infty$, the projections $x_s^\mu = w_s^{\mu-1} \cdot \xi^\mu$ and $y_\sigma^\mu = B_\sigma \cdot \xi^\mu$ are correlated Gaussian random quantities, completely specified in terms of first and second moments

averaged recursions are closed in $\{ R_{s\sigma}^\mu, Q_{st}^\mu \}$, with the average taken as $\sum_{\sigma = \pm 1} p_\sigma \langle \cdots \rangle_\sigma$

3. self-averaging property

characteristic quantities $R_{s\sigma}^\mu$, $Q_{st}^\mu$

- depend on the random sequence of example data
- their variance vanishes with $N \to \infty$ (here: $\propto N^{-1}$)

learning dynamics is completely described in terms of averages

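4. continuous learning time

$\alpha = \mu / N$: number of examples (number of learning steps) per degree of freedom

stochastic recursions become deterministic ODE:

$$\frac{R_{s\sigma}^\mu - R_{s\sigma}^{\mu-1}}{1/N} = \eta \, f_s \left( y_\sigma^\mu - R_{s\sigma}^{\mu-1} \right) \quad \longrightarrow \quad \frac{dR_{s\sigma}}{d\alpha} = \eta \left( \langle f_s \, y_\sigma \rangle - \langle f_s \rangle R_{s\sigma} \right)$$

integration yields evolution of projections $R_{s\sigma}(\alpha)$, $Q_{st}(\alpha)$

5. learning curve

probability for misclassification of a novel example:

$$\epsilon_g = p_+ \langle \Theta(d_+ - d_-) \rangle_+ + p_- \langle \Theta(d_- - d_+) \rangle_- = \sum_{\sigma = \pm 1} p_\sigma \, \Phi\left( \frac{Q_{\sigma\sigma} - Q_{-\sigma-\sigma} - 2 \ell \left( R_{\sigma\sigma} - R_{-\sigma\sigma} \right)}{2 \sqrt{v_\sigma} \sqrt{Q_{++} - 2 Q_{+-} + Q_{--}}} \right)$$

with $\Phi(z) = \int_{-\infty}^{z} \frac{dx}{\sqrt{2\pi}} \, e^{-x^2/2}$

generalization error $\epsilon_g(\alpha)$ after training with $\alpha N$ examples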
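The generalization error can be evaluated directly from the characteristic quantities. The sketch below assumes the standard closed form for two prototypes in this model, namely a sum over classes of the prior weight times a Gaussian error integral; the function names and the dictionary layout are illustrative.

```python
import math


def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def generalization_error(r, q, ell, p, v):
    """eps_g = sum over sigma of p_sigma * Phi(argument_sigma) for prototypes w_+ and w_-.

    r[s, sigma] = w_s . B_sigma and q[s, t] = w_s . w_t; p and v map sigma to the
    prior weight and the cluster variance.
    """
    diff_norm = math.sqrt(q[1, 1] - 2.0 * q[1, -1] + q[-1, -1])   # |w_+ - w_-|
    eps = 0.0
    for sigma in (+1, -1):
        numerator = (q[sigma, sigma] - q[-sigma, -sigma]
                     - 2.0 * ell * (r[sigma, sigma] - r[-sigma, sigma]))
        eps += p[sigma] * phi(numerator / (2.0 * math.sqrt(v[sigma]) * diff_norm))
    return eps


# Prototypes placed exactly at the cluster centers, w_s = ell * B_s.
ell = 1.0
r = {(1, 1): ell, (1, -1): 0.0, (-1, 1): 0.0, (-1, -1): ell}
q = {(1, 1): ell ** 2, (1, -1): 0.0, (-1, 1): 0.0, (-1, -1): ell ** 2}
eps = generalization_error(r, q, ell, p={1: 0.5, -1: 0.5}, v={1: 1.0, -1: 1.0})
print(eps, phi(-ell / math.sqrt(2.0)))   # the two values should agree
```

For equal priors and unit variances the decision boundary lies midway between the centers, at distance ell / sqrt(2) from each, which gives the comparison value printed above.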

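LVQ1: The winner takes it all

only the winner $w_s$ ($s = \pm 1$) is updated, according to the class label:

$$w_s^\mu = w_s^{\mu-1} + \frac{\eta}{N} \, \Theta\left( d_{-s}^\mu - d_s^\mu \right) s \, \sigma^\mu \left( \xi^\mu - w_s^{\mu-1} \right)$$

initialization $w_s(0) \approx 0$

theory and simulation ($N = 100$): $p_+ = 0.8$, $v_+ = 4$, $v_- = 9$, $\ell = 2.0$, $\eta = 1.0$, averaged over 100 indep. runs

[Figure: characteristic quantities $R_{s\sigma}$, $Q_{++}$, $Q_{--}$, $Q_{+-}$ versus $\alpha$]

Trajectories in the $(B_+, B_-)$-plane

[Figure: trajectories of $w_+$ and $w_-$ with axes $R_{S+}$ and $R_{S-}$ and cluster centers $\ell B_+$, $\ell B_-$; markers (•) at $\alpha = 20, 40, \ldots, 140$; optimal decision boundary; solid line: asymptotic position]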

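Learning curve

[Figure: $\epsilon_g$ versus $\alpha$ for $\eta = 2.0, 1.0, 0.2$, and $\epsilon_g$ versus $\eta$; $p_+ = 0.2$, $\ell = 1.0$, $v_+ = v_- = 1.0$]

- suboptimal, non-monotonic behavior for small $\eta$
- stationary state: $\epsilon_g(\alpha \to \infty)$ grows linearly with $\eta$
- well-defined asymptotics: $\eta \to 0$, $\alpha \to \infty$, $(\eta \alpha) \to \infty$

achievable generalization error

[Figure: $\epsilon_g$ versus $p_+$ for $v_+ = v_- = 1.0$ and for $v_+ = 0.25$, $v_- = 0.81$; best linear boundary compared with LVQ1 (solid line)]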

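"LVQ 2.1" [Kohonen]

here: update correct and wrong winner

$$w_s^\mu = w_s^{\mu-1} + \frac{\eta}{N} \, s \, \sigma^\mu \left( \xi^\mu - w_s^{\mu-1} \right)$$

[Figure: characteristic quantities $R_{s\sigma}$, $Q_{st}$ versus $\alpha$ (0 to 10), vertical range $-6$ to $6$; some quantities grow with $\alpha$ while others remain finite]

theory and simulation ($N = 100$): $p_+ = 0.8$, $\ell = 1$, $v_+ = v_- = 1$, $\eta = 0.5$, averages over 100 independent runs

problem: instability of the algorithm due to repulsion of wrong prototypes

trivial classification for $\alpha \to \infty$: $\epsilon_g = \min\{p_+, p_-\}$

[Figure: projected trajectories with axes $R_{S+}$ and $R_{S-}$]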
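The instability is easy to reproduce in a small simulation. The sketch below is an illustration, not the authors' simulation code: it uses a smaller dimension (N = 20) than the slide, basis vectors as cluster centers, and unit variances, and it only reports the squared lengths of the two prototypes after training.

```python
import random


def lvq21_step(w, xi, sigma, eta):
    """LVQ 2.1 as stated above: both prototypes move, w_s += (eta/N) * s * sigma * (xi - w_s)."""
    n = len(xi)
    for s in (+1, -1):
        w[s] = [w_j + (eta / n) * s * sigma * (x_j - w_j) for w_j, x_j in zip(w[s], xi)]


def run_lvq21(n=20, alpha_max=10.0, eta=0.5, ell=1.0, p_plus=0.8, seed=0):
    """Train from w_s(0) = 0 and return Q_ss = |w_s|^2 for s = +1, -1."""
    rng = random.Random(seed)
    w = {+1: [0.0] * n, -1: [0.0] * n}
    for _ in range(int(alpha_max * n)):          # alpha = (number of examples) / N
        sigma = 1 if rng.random() < p_plus else -1
        center = 0 if sigma == 1 else 1          # B_+ = e_0, B_- = e_1 (illustrative choice)
        xi = [rng.gauss(0.0, 1.0) + (ell if j == center else 0.0) for j in range(n)]
        lvq21_step(w, xi, sigma, eta)
    return {s: sum(c * c for c in w[s]) for s in (+1, -1)}


q = run_lvq21()
print(q[+1], q[-1])   # the prototype of the rarer class is repelled far more often
```

With unequal priors the prototype of the less frequent class is pushed away in most steps, so its length keeps growing with the training time while the other prototype stays bounded.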

suggested strategy: selection of data in a window close to the current decision boundary

slows down the repulsion, but the system remains unstable

Early stopping: end the training process at minimal $\epsilon_g$ (idealized)

- pronounced minimum in $\epsilon_g(\alpha)$, depends on initialization and cluster geometry
- lowest minimum assumed for $\eta \to 0$

[Figure: $\epsilon_g$ versus $\alpha$ for $\eta = 2.0, 1.0, 0.5$]

[Figure: $\epsilon_g$ versus $p_+$ for $v_+ = 0.25$, $v_- = 0.81$; LVQ1 compared with early stopping]

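"Learning From Mistakes (LFM)"

LVQ 2.1 update only if the current classification is wrong

crisp limit of Soft Robust LVQ [Seo and Obermayer 2003]

$$w_s^\mu = w_s^{\mu-1} + \frac{\eta}{N} \, \Theta\left( d_{\sigma^\mu}^\mu - d_{-\sigma^\mu}^\mu \right) s \, \sigma^\mu \left( \xi^\mu - w_s^{\mu-1} \right)$$

[Figure: projected trajectory with axes $R_{S+}$ and $R_{S-}$ and cluster centers $\ell B_+$, $\ell B_-$]

$p_+ = 0.8$, $\ell = 3.0$, $v_+ = 4.0$, $v_- = 9.0$

Learning curves

[Figure: $\epsilon_g$ versus $\alpha$ for $\eta = 2.0, 1.0, 0.5$]

$\eta$-independent asymptotic $\epsilon_g$; $p_+ = 0.8$, $\ell = 1.2$, $v_+ = v_- = 1.0$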
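A single LFM step can be sketched as below: the LVQ 2.1 update is applied to both prototypes, but only when the example is currently misclassified. The function name, the return value, and the tie-breaking convention (no update on equal distances) are assumptions made for this illustration.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lfm_step(w, xi, sigma, eta):
    """Learning From Mistakes: LVQ 2.1 update only if xi is misclassified.

    w maps s in {+1, -1} to a prototype; returns True if an update was made.
    """
    n = len(xi)
    d = {s: squared_distance(xi, w[s]) for s in (+1, -1)}
    if d[sigma] <= d[-sigma]:        # correct prototype is closest (or tied): no update
        return False
    for s in (+1, -1):
        w[s] = [w_j + (eta / n) * s * sigma * (x_j - w_j) for w_j, x_j in zip(w[s], xi)]
    return True


w = {+1: [1.0, 0.0], -1: [0.0, 1.0]}
print(lfm_step(w, [0.9, 0.1], sigma=+1, eta=1.0))   # correctly classified: nothing changes
print(lfm_step(w, [0.9, 0.1], sigma=-1, eta=1.0))   # misclassified: both prototypes move
print(w)
```

The correct prototype is attracted and the wrong one repelled, exactly as in LVQ 2.1, but correctly classified examples leave the system unchanged.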

Comparison: achievable generalization ability

[Figure, two panels: $\epsilon_g$ versus $p_+$ for equal cluster variances ($v_+ = v_- = 1.0$) and for unequal variances ($v_+ = 0.25$, $v_- = 0.81$); curves: best linear boundary, LVQ1 (solid), LVQ 2.1 with early stopping (dashed), LFM (dash-dotted)]

Summary

• prototype-based learning: Vector Quantization and Learning Vector Quantization
• a model scenario: two clusters, two prototypes; dynamics of online training
• comparison of algorithms:
  - LVQ 1: close to optimal asymptotic generalization
  - LVQ 2.1: instability, trivial (stationary) classification; with stopping: potentially very good performance
  - LFM: far from optimal generalization behavior

work in progress, outlook

• multi-class, multi-prototype problems
• optimized procedures: learning rate schedules, variational approach, Bayes optimal on-line

Perspectives

• Self-Organizing Maps (SOM)
  - (many) N-dim. prototypes form a (low) d-dimensional grid
  - representation of data in a topology preserving map
  - neighborhood preserving SOM, Neural Gas (distance based)
• Generalized Relevance LVQ [e.g. Hammer & Villmann]
  - adaptive metrics, e.g. distance measure $d_\lambda(w_s, \xi) = \sum_{i=1}^{N} \lambda_i \left( w_{s,i} - \xi_i \right)^2$, adapted in training
• applications
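The relevance-weighted distance mentioned for Generalized Relevance LVQ can be written as a one-line function. This is a minimal sketch, assuming non-negative relevance factors; the function name is illustrative, and the adaptation of the factors during training is not shown.

```python
def relevance_distance(w, xi, lam):
    """d_lambda(w, xi) = sum_i lam_i * (xi_i - w_i)^2 with relevance factors lam_i >= 0."""
    return sum(l_i * (x_i - w_i) ** 2 for l_i, w_i, x_i in zip(lam, w, xi))


w, xi = [0.0, 0.0], [1.0, 2.0]
print(relevance_distance(w, xi, lam=[1.0, 1.0]))   # plain squared Euclidean distance
print(relevance_distance(w, xi, lam=[1.0, 0.0]))   # second dimension ignored
```

Setting all factors to one recovers the squared Euclidean distance used in the rest of the talk.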

Outlook

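2. average over the current example

random vector $\xi^\mu$ according to $P(\xi^\mu \mid \sigma)$, average length $\langle (\xi^\mu)^2 \rangle_\sigma \approx v_\sigma N$

in the thermodynamic limit $N \to \infty$, the projections $x_s^\mu = w_s^{\mu-1} \cdot \xi^\mu$ and $y_\tau^\mu = B_\tau \cdot \xi^\mu$ are correlated Gaussian random quantities, completely specified in terms of first and second moments (without indices $\mu$):

$$\langle x_s \rangle_\sigma = \sum_{j=1}^{N} (w_s)_j \, \langle \xi_j \rangle_\sigma = \ell \, R_{s\sigma}, \qquad \langle y_\tau \rangle_\sigma = \ell \, \delta_{\tau\sigma}$$

$$\langle x_s x_t \rangle_\sigma - \langle x_s \rangle_\sigma \langle x_t \rangle_\sigma = v_\sigma \, Q_{st}$$

$$\langle x_s y_\tau \rangle_\sigma - \langle x_s \rangle_\sigma \langle y_\tau \rangle_\sigma = v_\sigma \, R_{s\tau}, \qquad \langle y_\rho y_\tau \rangle_\sigma - \langle y_\rho \rangle_\sigma \langle y_\tau \rangle_\sigma = v_\sigma \, \delta_{\rho\tau}$$

averaged recursions are closed in $\{ R_{s\sigma}^\mu, Q_{st}^\mu \}$, with the average taken as $\sum_{\sigma = \pm 1} p_\sigma \langle \cdots \rangle_\sigma$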
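The first and second moments of the projections can be checked by sampling. The sketch below compares the empirical mean and variance of x = w . xi within one cluster with the predicted values, the separation times R and the variance times Q. The prototype, the dimension, and the sample size are arbitrary illustrative choices.

```python
import math
import random


def dot(a, b):
    """Scalar product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def projection_moments(w, b_sigma, ell, v_sigma, samples=4000, seed=1):
    """Empirical mean and variance of x = w . xi for xi ~ N(ell * B_sigma, v_sigma * I)."""
    rng = random.Random(seed)
    xs = []
    for _ in range(samples):
        xi = [ell * c + math.sqrt(v_sigma) * rng.gauss(0.0, 1.0) for c in b_sigma]
        xs.append(dot(w, xi))
    mean = sum(xs) / samples
    var = sum((x - mean) ** 2 for x in xs) / samples
    return mean, var


n = 30
b_plus = [1.0 if j == 0 else 0.0 for j in range(n)]
w = [0.5 if j < 4 else 0.0 for j in range(n)]     # arbitrary fixed prototype
mean, var = projection_moments(w, b_plus, ell=2.0, v_sigma=4.0)
print(mean, 2.0 * dot(w, b_plus))    # empirical mean versus ell * R
print(var, 4.0 * dot(w, w))          # empirical variance versus v_sigma * Q
```

For a fixed prototype the projection is exactly Gaussian at any N; the thermodynamic limit is what allows the same description when the prototype itself depends on earlier examples.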

investigation and comparison of given algorithms

- repulsive/attractive fixed points of the dynamics
- asymptotic behavior for $\alpha \to \infty$
- dependence on learning rate, separation, initialization
- ...

optimization and development of new prescriptions

- time-dependent learning rate $\eta(\alpha)$
- variational optimization w.r.t. $f_s[\ldots]$
- ...

maximize the decrease of the generalization error, $-d\epsilon_g / d\alpha$

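LVQ1: The winner takes it all

only the winner $w_s$ ($s = \pm 1$) is updated, according to the class label:

$$w_s^\mu = w_s^{\mu-1} + \frac{\eta}{N} \, \Theta\left( d_{-s}^\mu - d_s^\mu \right) s \, \sigma^\mu \left( \xi^\mu - w_s^{\mu-1} \right)$$

initialization $w_s(0) = 0$

theory and simulation ($N = 100$): $p_+ = 0.8$, $v_+ = 4$, $v_- = 9$, $\ell = 2.0$, $\eta = 1.0$, averaged over 100 indep. runs

[Figure: characteristic quantities $R_{s\sigma}$, $Q_{++}$, $Q_{--}$, $Q_{+-}$ versus $\alpha$]

self-averaging property

[Figure: mean and variances of $R_{++}(\alpha = 10)$ versus $1/N$]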

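high-dimensional data (formally $N \to \infty$)

$\xi^\mu \in \mathbb{R}^N$, $N = 200$, $\ell = 1$, $p_+ = 0.4$, $v_+ = 0.44$, $v_- = 0.44$

[Figure, two panels showing 240 and 160 examples of the two classes]

- projections into the plane of center vectors $B_+$, $B_-$: $y_\pm = B_\pm \cdot \xi^\mu$
- projections on two independent random directions $w_1$, $w_2$: $x_1 = w_1 \cdot \xi^\mu$, $x_2 = w_2 \cdot \xi^\mu$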

Page 4: Dynamical analysis of LVQ type learning rules

Dynamical Analysis of LVQ type algorithms WSOM 2005

Model situation two clusters of N-dimensional data

random vectors isin ℝN according to σ)P(p )P(1σ

σ ξξ

σN2

σ

- v 2

1exp

v 2π

1σ)P( Βξξ mixture of two Gaussians

orthonormal center vectors

B+ B- isin ℝN ( B )2 =1 B+ B- =0

prior weights of classes p+ p-

p+ + p- = 1

B+

B-

(p+)

(p-)

separation prop ℓ ℓ

jj Bσσξ

σσσvξξ

22jj

independent components

with variance

ℝN

Dynamical Analysis of LVQ type algorithms WSOM 2005

Dynamics of on-line training

sequence of new independent random examples 123μσμμ ξ

drawn according to μμσ σPp μ ξ

learning ratestep size

competitiondirection ofupdate etc

change of prototypetowards or away from the current data

example

LVQ1 original formulation [Kohonen]

Winner-Takes-All (WTA) algorithm

μs

μs

μs d d σS f

1-μs

μμμs-

μss

1-μs

μs σSddf

N

ηwξww 21

μs

μμs

μ

d

1σS

update of two prototype vectors w+ w-

Dynamical Analysis of LVQ type algorithms WSOM 2005

Nξffη QxfηQxfη1N

QQ

Ryfη1N

RR

μts

1-μst

μst

1-μst

μts

1-μst

μst

1-μsσ

μσs

1-μsσ

μsσ

22

1-μs

μμμs-

μss

1-μs

μs σSddf

N

ηwξww

recursions

Mathematical analysis of the learning dynamics

μσ

μσ

μ1-μs

μs ξByx ξw

random vector ξμ enters only through

its length and the projections

11 σtsμt

μs

μstσ

μs

μsσ QBR www

projections into the (B+ B- )-plane

length and relativeposition of prototypes

1 description in terms of a few characteristic quantitities

( here ℝ2N ℝ7 )

Dynamical Analysis of LVQ type algorithms WSOM 2005

completely specified in terms of first and second moments

in the thermodynamic limit N μμ

μ1-μs

μs

By

wx

ξ

ξ

correlated Gaussian random quantities

2 average over the current example

averaged recursions closed in p σ1σ

σ

random vector according to avg lengthσ)|P( μξ 22 vN σσ

ξ

μsσ

μst R Q

characteristic quantities

- depend on the random sequence of example data

- their variance vanishes with N (here prop N-1)

μsσ

μst R Q

learning dynamics is completely described in terms of averages

3 self-averaging property

Dynamical Analysis of LVQ type algorithms WSOM 2005

4 continuous learning time

N

μ α of examples

of learning stepsper degree of freedom

) α (R ) α (Q sσst integration yields evolution of projections

stochastic recursions deterministic ODE

1-μsσ

μσs

sσ1-μsσ

μσs

1-μsσ

μsσ Ryfη

dRRyfη

1N

RR

probability for misclassification of a novel example

ddpddp gε

QQQv

RR2QQ

QQQv

RR2QQpp

22 2

1

2

1

5 learning curve

generalization error εg(α) after training with α N examples

Dynamical Analysis of LVQ type algorithms WSOM 2005

LVQ1 The winner takes it all

initialization ws(0)asymp0

theory and simulation (N=100)p+=08 v+=4 v+=9 ℓ=20 =10

averaged over 100 indep runs

Q++

Q--

Q+-

α

RSσ

winner ws 1

1-μs

μμμS

μS

1-μs

μs Sσdd

N

ηwξww

only the winner is updated according to the class label

w-

w+

ℓ B-

ℓ B+

RS- w+

RS+

Trajectories in the (B+B- )-plane

(bull) =2040140 optimal decision boundary ____ asymptotic position

Dynamical Analysis of LVQ type algorithms WSOM 2005

Learning curve

η= 201002

- suboptimal non-monotonic behavior for small η

εg (αinfin) grows linearly with η- stationary state

η 0 αinfin (η α ) infin

- well-defined asymptotics

η

εgp+ = 02 ℓ=10

v+ = v- = 10

achievable generalization error

εgεg

p+ p+

v+ = v- =10 v+ =025 v-=081

best linear boundary― LVQ1

Dynamical Analysis of LVQ type algorithms WSOM 2005

ldquoLVQ 21ldquo [Kohonen] here update correct and wrong winner

1-μs

μ1-μs

μs Sσ

N

ηwξww

αQQRR

Q R R

with

finite remain

Q R R

R Q

Q R

α 102 4 86

6-

0

6theory and simulation (N=100)p+=08 ℓ=1 v+=v-=1 =05 averages over 100 independent runs

problem instability of the algorithm

due to repulsion of wrong prototypes

trivial classification fuumlr αinfin

εg = min p+p- RS+

RS-

Dynamical Analysis of LVQ type algorithms WSOM 2005

suggested strategy

selection of data in a window close to the current decision boundary

slows down the repulsion system remains instable

Early stopping end training process at minimal εg (idealized)

εg

η= 20 10 05

η

- pronounced minimum in εg (α) depends on initialization and cluster geometry

- lowest minimum assumed for η0

v+ =025 v-=081εg

p+

― LVQ1__ early stopping

"Learning From Mistakes (LFM)"

LVQ 2.1 update only if the current classification is wrong:

$w_s^{\mu} = w_s^{\mu-1} + \frac{\eta}{N}\,\Theta\!\left(d_{\sigma^{\mu}}^{\mu} - d_{-\sigma^{\mu}}^{\mu}\right) s\,\sigma^{\mu} \left(\xi^{\mu} - w_s^{\mu-1}\right)$

crisp limit of Soft Robust LVQ [Seo and Obermayer, 2003]

[Figure: projected trajectory, axes R_s+ and R_s-, cluster centers at ℓ B+ and ℓ B-; p+ = 0.8, ℓ = 3.0, v+ = 4.0, v- = 9.0]

Learning curves

[Figure: ε_g versus α for η = 2.0, 1.0, 0.5; p+ = 0.8, ℓ = 1.2, v+ = v- = 1.0]

η-independent asymptotic ε_g
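A minimal Python sketch of the LFM step follows, assuming the same prototype layout as a dictionary keyed by the labels +1 and -1. The names `sq_dist` and `lfm_step` are hypothetical, and a tie between the two distances is treated here as a correct classification, which is a convention chosen for this sketch.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lfm_step(prototypes, xi, sigma, eta):
    """LVQ 2.1 update, applied only when xi is currently misclassified."""
    n = len(xi)
    d_correct = sq_dist(xi, prototypes[sigma])
    d_wrong = sq_dist(xi, prototypes[-sigma])
    if d_correct <= d_wrong:
        # Theta(d_sigma - d_{-sigma}) = 0: correct classification, no update
        return {s: list(w) for s, w in prototypes.items()}
    return {
        s: [wj + (eta / n) * s * sigma * (xj - wj) for wj, xj in zip(w, xi)]
        for s, w in prototypes.items()
    }


protos = {+1: [1.0, 0.0], -1: [0.0, 1.0]}
unchanged = lfm_step(protos, xi=[2.0, 0.0], sigma=+1, eta=1.0)
corrected = lfm_step(protos, xi=[2.0, 0.0], sigma=-1, eta=1.0)
print(unchanged, corrected)
```

The first example is closest to the prototype of its own class, so nothing happens; the second is misclassified, so w- is attracted and w+ is repelled.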

Comparison: achievable generalization ability

[Figure: ε_g versus p+ for equal cluster variances (v+ = v- = 1.0) and unequal variances (v+ = 0.25, v- = 0.81); curves: best linear boundary, LVQ1 (solid), LVQ 2.1 with early stopping (dashed), LFM (dash-dotted)]

Summary

• prototype-based learning: Vector Quantization and Learning Vector Quantization

• a model scenario: two clusters, two prototypes; dynamics of online training

• comparison of algorithms:

LVQ 1: close to optimal asymptotic generalization

LVQ 2.1: instability, trivial (stationary) classification

+ stopping: potentially very good performance

LFM: far from optimal generalization behavior

work in progress, outlook

• multi-class, multi-prototype problems

• optimized procedures: learning rate schedules, variational approach, Bayes optimal on-line

Perspectives

• Self-Organizing Maps (SOM)

(many) N-dim. prototypes form a (low) d-dimensional grid

representation of data in a topology-preserving map

neighborhood preserving SOM, Neural Gas (distance based)

• Generalized Relevance LVQ [e.g. Hammer & Villmann]

adaptive metrics, e.g. the distance measure

$d_{\lambda}(w_s, \xi) = \sum_{i=1}^{N} \lambda_i \left(w_{s,i} - \xi_i\right)^2$

with relevances λ_i adapted during training

• applications
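The relevance-weighted distance is easy to state in code. The sketch below only evaluates the distance for fixed relevances; how the λ_i are adapted during training is not covered, and the function name `relevance_distance` is an assumption made here.

```python
def relevance_distance(w, xi, lam):
    """Weighted squared distance: sum_i lam_i * (w_i - xi_i)^2."""
    return sum(l * (wi - x) ** 2 for l, wi, x in zip(lam, w, xi))


w = [1.0, 2.0, 3.0]
xi = [0.0, 0.0, 0.0]
print(relevance_distance(w, xi, lam=[1.0, 1.0, 1.0]))  # plain squared distance
print(relevance_distance(w, xi, lam=[1.0, 0.0, 0.0]))  # only feature 0 counts
```

With all relevances equal to 1 the measure reduces to the squared Euclidean distance used in the LVQ rules above; setting a relevance to 0 removes that feature from the comparison.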

Outlook

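correlated Gaussian random quantities (without indices μ)

in the thermodynamic limit N→∞, the projections

$x_s = w_s \cdot \xi, \qquad y_\sigma = B_\sigma \cdot \xi$

are correlated Gaussian random quantities, completely specified in terms of first and second moments:

$\langle x_s \rangle_\sigma = \ell \sum_{j=1}^{N} (w_s)_j (B_\sigma)_j = \ell\, R_{s\sigma}, \qquad \langle y_\tau \rangle_\sigma = \ell\, \delta_{\tau\sigma}$

$\langle x_s x_t \rangle_\sigma - \langle x_s \rangle_\sigma \langle x_t \rangle_\sigma = v_\sigma\, Q_{st}$

$\langle x_s y_\tau \rangle_\sigma - \langle x_s \rangle_\sigma \langle y_\tau \rangle_\sigma = v_\sigma\, R_{s\tau}$

$\langle y_\rho y_\tau \rangle_\sigma - \langle y_\rho \rangle_\sigma \langle y_\tau \rangle_\sigma = v_\sigma\, \delta_{\rho\tau}$

2. average over the current example

random vector ξ^μ drawn according to P(ξ^μ|σ), with average length $\langle \xi^2 \rangle_\sigma = v_\sigma N + \ell^2$

averaged recursions are closed in $\{R_{s\sigma}, Q_{st}\}$, with $\langle \cdots \rangle = \sum_{\sigma = \pm 1} p_\sigma \langle \cdots \rangle_\sigma$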

investigation and comparison of given algorithms

- repulsive/attractive fixed points of the dynamics
- asymptotic behavior for α→∞
- dependence on learning rate, separation, initialization
- ...

optimization and development of new prescriptions

- time-dependent learning rate η(α)
- variational optimization w.r.t. f_s[...]
- ...

maximize the decrease of the generalization error, $-\,\mathrm{d}\varepsilon_g / \mathrm{d}\alpha$

LVQ1: "The winner takes it all"

only the winner w_s is updated, according to the class label:

$w_s^{\mu} = w_s^{\mu-1} + \frac{\eta}{N}\,\Theta\!\left(d_{-s}^{\mu} - d_{s}^{\mu}\right) s\,\sigma^{\mu} \left(\xi^{\mu} - w_s^{\mu-1}\right), \qquad s = \pm 1$

initialization: w_s(0) = 0

theory and simulation (N=100): p+ = 0.8, v+ = 4, v- = 9, ℓ = 2.0, η = 1.0; averaged over 100 independent runs

[Figure: characteristic quantities Q++, Q--, Q+- and R_sσ versus α]

self-averaging property

[Figure: mean and variances of R++ (α=10) versus 1/N]

high-dimensional data (formally N→∞)

ξ^μ ∈ ℝ^N, N = 200, ℓ = 1, p+ = 0.4, v+ = 0.44, v- = 0.44; two classes with 240 and 160 examples

[Figure, first panel: projections into the plane of the center vectors B+, B-: $y_{\pm} = B_{\pm} \cdot \xi^{\mu}$]

[Figure, second panel: projections on two independent random directions w_1, w_2: $x_{1} = w_{1} \cdot \xi^{\mu}$, $x_{2} = w_{2} \cdot \xi^{\mu}$]
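The contrast between the two panels can be reproduced with a short simulation. The sketch assumes isotropic Gaussian clusters centered at ℓ B+ and ℓ B-, with B+ and B- taken as the first two unit vectors; the helper names, the seed and the choice of 400 examples are assumptions made for this illustration.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def unit_random_direction(n, rng):
    """Random direction of unit length (independent of the cluster centers)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = dot(v, v) ** 0.5
    return [x / norm for x in v]


def sample(n, ell, p_plus, v_plus, v_minus, rng):
    """Draw (xi, sigma) from the two-cluster mixture; B+ = e_0, B- = e_1."""
    sigma = 1 if rng.random() < p_plus else -1
    var = v_plus if sigma == 1 else v_minus
    xi = [rng.gauss(0.0, var ** 0.5) for _ in range(n)]
    xi[0 if sigma == 1 else 1] += ell
    return xi, sigma


rng = random.Random(0)
N = 200
b_plus = [1.0 if j == 0 else 0.0 for j in range(N)]
w1 = unit_random_direction(N, rng)
data = [sample(N, 1.0, 0.4, 0.44, 0.44, rng) for _ in range(400)]


def class_mean(direction, label):
    values = [dot(direction, xi) for xi, s in data if s == label]
    return sum(values) / len(values)


# Separation of the class means along B+ versus along a random direction.
gap_centers = class_mean(b_plus, 1) - class_mean(b_plus, -1)
gap_random = class_mean(w1, 1) - class_mean(w1, -1)
print(gap_centers, gap_random)
```

Along B+ the class means differ by roughly ℓ, whereas along a random direction the clusters overlap almost completely; this is why the structure is invisible in arbitrary low-dimensional projections.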

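Dynamics of on-line training

sequence of new, independent random examples $\{\xi^{\mu}, \sigma^{\mu}\}$, μ = 1, 2, 3, ..., drawn according to $P(\xi^{\mu}) = \sum_{\sigma = \pm 1} p_\sigma\, P(\xi^{\mu} \mid \sigma)$

update of two prototype vectors w+, w-:

$w_s^{\mu} = w_s^{\mu-1} + \frac{\eta}{N}\, f_s\!\left[d_{+}^{\mu}, d_{-}^{\mu}, \sigma^{\mu}, \ldots\right] \left(\xi^{\mu} - w_s^{\mu-1}\right), \qquad d_s^{\mu} = \left(\xi^{\mu} - w_s^{\mu-1}\right)^2$

- η: learning rate, step size
- f_s: competition, direction of update, etc.
- (ξ^μ - w_s^{μ-1}): change of prototype towards or away from the current data

example: LVQ1, original formulation [Kohonen], Winner-Takes-All (WTA) algorithm:

$f_s = \Theta\!\left(d_{-s}^{\mu} - d_{s}^{\mu}\right) s\,\sigma^{\mu}, \qquad s, \sigma^{\mu} = \pm 1$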

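Mathematical analysis of the learning dynamics

1. description in terms of a few characteristic quantities (here: ℝ^{2N} → ℝ^7)

$R_{s\sigma}^{\mu} = w_s^{\mu} \cdot B_\sigma$: projections into the (B+, B-)-plane

$Q_{st}^{\mu} = w_s^{\mu} \cdot w_t^{\mu}$: length and relative position of prototypes

(s, t, σ ∈ {+1, -1})

the random vector ξ^μ enters only through its length and the projections

$x_s^{\mu} = w_s^{\mu-1} \cdot \xi^{\mu}, \qquad y_\sigma^{\mu} = B_\sigma \cdot \xi^{\mu}$

recursions, obtained from $w_s^{\mu} = w_s^{\mu-1} + \frac{\eta}{N}\, f_s \left(\xi^{\mu} - w_s^{\mu-1}\right)$:

$\frac{R_{s\sigma}^{\mu} - R_{s\sigma}^{\mu-1}}{1/N} = \eta\, f_s \left(y_\sigma^{\mu} - R_{s\sigma}^{\mu-1}\right)$

$\frac{Q_{st}^{\mu} - Q_{st}^{\mu-1}}{1/N} = \eta\, f_s \left(x_t^{\mu} - Q_{st}^{\mu-1}\right) + \eta\, f_t \left(x_s^{\mu} - Q_{st}^{\mu-1}\right) + \eta^2 f_s f_t\, \frac{(\xi^{\mu})^2}{N} + O(1/N)$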
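The reduction to seven numbers, and the recursion for R, can be checked numerically on a toy configuration. In this sketch the dimension N = 4, the vectors, and the fixed values of f_s are arbitrary assumptions; `order_parameters` is a hypothetical helper name. The Q check uses the full increment, of which only the (ξ^μ)²/N part of the η² term survives for large N.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def order_parameters(prototypes, centers):
    """R[(s, sigma)] = w_s . B_sigma and Q[(s, t)] = w_s . w_t."""
    R = {(s, g): dot(prototypes[s], centers[g]) for s in prototypes for g in centers}
    Q = {(s, t): dot(prototypes[s], prototypes[t]) for s in prototypes for t in prototypes}
    return R, Q


N = 4
centers = {+1: [1.0, 0.0, 0.0, 0.0], -1: [0.0, 1.0, 0.0, 0.0]}
protos = {+1: [0.5, 0.1, 0.2, 0.0], -1: [0.0, 0.3, 0.0, 0.1]}
xi = [1.2, -0.4, 0.3, 0.8]
eta = 0.5
f = {+1: 1.0, -1: -1.0}  # arbitrary modulation values f_s for this check

R_old, Q_old = order_parameters(protos, centers)
new = {
    s: [wj + (eta / N) * f[s] * (xj - wj) for wj, xj in zip(w, xi)]
    for s, w in protos.items()
}
R_new, Q_new = order_parameters(new, centers)

x = {s: dot(protos[s], xi) for s in protos}    # x_s = w_s . xi
y = {g: dot(centers[g], xi) for g in centers}  # y_sigma = B_sigma . xi

# Q is symmetric, so 4 values of R plus 3 distinct values of Q remain.
n_quantities = len(R_old) + len({tuple(sorted(k)) for k in Q_old})
r_residuals = [
    N * (R_new[k] - R_old[k]) - eta * f[k[0]] * (y[k[1]] - R_old[k])
    for k in R_old
]
print(n_quantities, max(abs(r) for r in r_residuals))
```

The residuals vanish up to rounding because the R recursion is exact for any N; only the Q recursion relies on dropping terms of order 1/N.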

2. average over the current example

random vector ξ^μ drawn according to P(ξ^μ|σ), with average length $\langle \xi^2 \rangle_\sigma = v_\sigma N + \ell^2$

in the thermodynamic limit N→∞, the projections

$x_s^{\mu} = w_s^{\mu-1} \cdot \xi^{\mu}, \qquad y_\sigma^{\mu} = B_\sigma \cdot \xi^{\mu}$

are correlated Gaussian random quantities, completely specified in terms of first and second moments

averaged recursions are closed in $\{R_{s\sigma}^{\mu}, Q_{st}^{\mu}\}$, with $\langle \cdots \rangle = \sum_{\sigma = \pm 1} p_\sigma \langle \cdots \rangle_\sigma$

3. self-averaging property

characteristic quantities $\{R_{s\sigma}^{\mu}, Q_{st}^{\mu}\}$

- depend on the random sequence of example data
- their variance vanishes with N→∞ (here: ∝ N^{-1})

learning dynamics is completely described in terms of averages

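4. continuous learning time

$\alpha = \frac{\mu}{N}$: number of examples (number of learning steps) per degree of freedom

stochastic recursions → deterministic ODE:

$\frac{R_{s\sigma}^{\mu} - R_{s\sigma}^{\mu-1}}{1/N} = \eta\, f_s \left(y_\sigma^{\mu} - R_{s\sigma}^{\mu-1}\right) \;\longrightarrow\; \frac{\mathrm{d}R_{s\sigma}}{\mathrm{d}\alpha} = \eta \left( \langle f_s\, y_\sigma \rangle - \langle f_s \rangle R_{s\sigma} \right)$

integration yields the evolution of the projections R_sσ(α), Q_st(α)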
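The same limit applied to the recursion for Q suggests the companion equation below. It is a sketch under two assumptions that are not spelled out on the slide: that (ξ^μ)²/N can be replaced by v_σ for an example from cluster σ when N is large, and that this factor decouples from f_s f_t in the average.

```latex
\frac{\mathrm{d}Q_{st}}{\mathrm{d}\alpha}
  = \eta \left( \langle f_s\, x_t \rangle - \langle f_s \rangle Q_{st}
              + \langle f_t\, x_s \rangle - \langle f_t \rangle Q_{st} \right)
  + \eta^{2} \sum_{\sigma = \pm 1} p_\sigma\, v_\sigma\, \langle f_s f_t \rangle_\sigma
```

Together with the four equations for R_sσ this gives a closed system of seven coupled ODEs, since Q_st is symmetric in s and t.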

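5. learning curve

probability for misclassification of a novel example:

$\varepsilon_g = p_{+} \left\langle \Theta(d_{+} - d_{-}) \right\rangle_{+} + p_{-} \left\langle \Theta(d_{-} - d_{+}) \right\rangle_{-}$

$\varepsilon_g = \sum_{\sigma = \pm 1} p_\sigma\, \Phi\!\left( \frac{Q_{\sigma\sigma} - Q_{-\sigma\,-\sigma} - 2\ell \left( R_{\sigma\sigma} - R_{-\sigma\,\sigma} \right)}{2 \sqrt{v_\sigma}\, \sqrt{Q_{++} - 2 Q_{+-} + Q_{--}}} \right), \qquad \Phi(z) = \int_{-\infty}^{z} \frac{e^{-x^2/2}}{\sqrt{2\pi}}\, \mathrm{d}x$

generalization error ε_g(α) after training with αN examples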
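Given the characteristic quantities, the generalization error is a closed-form expression that can be evaluated directly. The sketch below assumes the Gaussian-cluster form with Φ the standard normal distribution function; the function names and the dictionary layout of R and Q are choices made here, and the example places the prototypes exactly at the cluster centers as a sanity check.

```python
import math


def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def generalization_error(R, Q, ell, p_plus, v):
    """eps_g from R[(s, sigma)] = w_s . B_sigma and Q[(s, t)] = w_s . w_t.

    v maps the class label (+1 or -1) to the cluster variance v_sigma.
    """
    p = {+1: p_plus, -1: 1.0 - p_plus}
    # length of the difference vector w_+ - w_-
    spread = math.sqrt(Q[(1, 1)] - 2.0 * Q[(1, -1)] + Q[(-1, -1)])
    eps = 0.0
    for s in (+1, -1):
        numerator = Q[(s, s)] - Q[(-s, -s)] - 2.0 * ell * (R[(s, s)] - R[(-s, s)])
        eps += p[s] * phi(numerator / (2.0 * math.sqrt(v[s]) * spread))
    return eps


# Prototypes placed at the cluster centers: w_s = ell * B_s with ell = 1.
ell = 1.0
R = {(1, 1): ell, (1, -1): 0.0, (-1, 1): 0.0, (-1, -1): ell}
Q = {(1, 1): ell ** 2, (1, -1): 0.0, (-1, 1): 0.0, (-1, -1): ell ** 2}
eps = generalization_error(R, Q, ell, p_plus=0.5, v={+1: 1.0, -1: 1.0})
print(eps)
```

For this symmetric case the decision boundary lies halfway between the centers, at distance ℓ/√2 from each, so the result equals Φ(-ℓ/√2), roughly 0.24 for ℓ = 1.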





Dynamical Analysis of LVQ type algorithms WSOM 2005

LVQ1 The winner takes it all

initialization ws(0)asymp0

theory and simulation (N=100)p+=08 v+=4 v+=9 ℓ=20 =10

averaged over 100 indep runs

Q++

Q--

Q+-

α

RSσ

winner ws 1

1-μs

μμμS

μS

1-μs

μs Sσdd

N

ηwξww

only the winner is updated according to the class label

w-

w+

ℓ B-

ℓ B+

RS- w+

RS+

Trajectories in the (B+B- )-plane

(bull) =2040140 optimal decision boundary ____ asymptotic position

Dynamical Analysis of LVQ type algorithms WSOM 2005

Learning curve

η= 201002

- suboptimal non-monotonic behavior for small η

εg (αinfin) grows linearly with η- stationary state

η 0 αinfin (η α ) infin

- well-defined asymptotics

η

εgp+ = 02 ℓ=10

v+ = v- = 10

achievable generalization error

εgεg

p+ p+

v+ = v- =10 v+ =025 v-=081

best linear boundary― LVQ1

Dynamical Analysis of LVQ type algorithms WSOM 2005

ldquoLVQ 21ldquo [Kohonen] here update correct and wrong winner

1-μs

μ1-μs

μs Sσ

N

ηwξww

αQQRR

Q R R

with

finite remain

Q R R

R Q

Q R

α 102 4 86

6-

0

6theory and simulation (N=100)p+=08 ℓ=1 v+=v-=1 =05 averages over 100 independent runs

problem instability of the algorithm

due to repulsion of wrong prototypes

trivial classification fuumlr αinfin

εg = min p+p- RS+

RS-

Dynamical Analysis of LVQ type algorithms WSOM 2005

suggested strategy

selection of data in a window close to the current decision boundary

slows down the repulsion system remains instable

Early stopping end training process at minimal εg (idealized)

εg

η= 20 10 05

η

- pronounced minimum in εg (α) depends on initialization and cluster geometry

- lowest minimum assumed for η0

v+ =025 v-=081εg

p+

― LVQ1__ early stopping

Dynamical Analysis of LVQ type algorithms WSOM 2005

ldquoLearning From Mistakes (LFM)rdquo

1-μs

μμμσ-

μσ

1-μs

μs Sσdd

N

ημμ wξww

LVQ21 updateonly if the current classification is wrong

crisp limit of Soft Robust LVQ [Seo and Obermayer 2003]

projected trajetory

ℓ B-

ℓ B+

RS+

RS-

εg

p+=08 ℓ=30

v+=40 v-=90

η= 20 10 05

Learning curves

η-independent asymptotic εg p+=08 ℓ= 12 v+=v=10

Dynamical Analysis of LVQ type algorithms WSOM 2005

εg

p+

equal cluster variances

p+

unequal variances

best linear boundary

― LVQ1

--- LVQ21 (early stopping)middot-middot LFM

Comparison achievable generalization ability

v+=025 v-=081v+=v-=10

Summary

• prototype-based learning: Vector Quantization and Learning Vector Quantization

• a model scenario: two clusters, two prototypes; dynamics of online training

• comparison of algorithms:

  LVQ 1: close to optimal asymptotic generalization

  LVQ 2.1: instability, trivial (stationary) classification

  + stopping: potentially very good performance

  LFM: far from optimal generalization behavior

Work in progress, outlook

• multi-class, multi-prototype problems

• optimized procedures: learning rate schedules, variational approach / Bayes optimal on-line

Perspectives

• Self-Organizing Maps (SOM)

  (many) N-dim. prototypes form a (low) d-dimensional grid

  representation of data in a topology preserving map

  neighborhood preserving SOM, Neural Gas (distance based)

• Generalized Relevance LVQ [e.g. Hammer & Villmann]

  adaptive metrics, e.g. distance measure

  $$ d_\lambda(w_S, \xi) = \sum_{i=1}^{N} \lambda_i \left( w_i^{S} - \xi_i \right)^2 $$

  training

• applications
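The adaptive distance measure mentioned for Generalized Relevance LVQ is a weighted squared Euclidean distance. The snippet below only illustrates that distance; how the relevance factors λ_i are adapted during training is not given on the slide and is not shown. The function name `relevance_distance` is an assumption.

```python
import numpy as np

def relevance_distance(w, xi, lam):
    """Weighted squared distance: sum_i lam_i * (w_i - xi_i)**2."""
    w, xi, lam = (np.asarray(a, dtype=float) for a in (w, xi, lam))
    return float(np.sum(lam * (w - xi) ** 2))

# With all relevances equal to 1 this reduces to the plain squared Euclidean distance.
print(relevance_distance([1.0, 0.0], [0.0, 2.0], [1.0, 1.0]))
# Down-weighting the second feature reduces its contribution to the distance.
print(relevance_distance([1.0, 0.0], [0.0, 2.0], [0.9, 0.1]))
```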

Outlook

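Projections of the current example:

$$ x_S^{\mu} = w_S^{\mu-1} \cdot \xi^{\mu}, \qquad y_\sigma^{\mu} = B_\sigma \cdot \xi^{\mu} $$

In the thermodynamic limit N → ∞: correlated Gaussian random quantities, completely specified in terms of first and second moments (w/o indices μ):

$$ \langle x_S \rangle_\sigma = \ell \sum_{j=1}^{N} (w_S)_j (B_\sigma)_j = \ell\, R_{S\sigma}, \qquad \langle y_\tau \rangle_\sigma = \ell\, \delta_{\tau\sigma} $$

$$ \langle x_S x_T \rangle_\sigma - \langle x_S \rangle_\sigma \langle x_T \rangle_\sigma = v_\sigma\, Q_{ST} $$

$$ \langle x_S y_\tau \rangle_\sigma - \langle x_S \rangle_\sigma \langle y_\tau \rangle_\sigma = v_\sigma\, R_{S\tau}, \qquad \langle y_\rho y_\tau \rangle_\sigma - \langle y_\rho \rangle_\sigma \langle y_\tau \rangle_\sigma = v_\sigma\, \delta_{\rho\tau} $$

2. Average over the current example

random vector $\xi^{\mu}$ according to $P(\xi) = \sum_{\sigma = \pm 1} p_\sigma P(\xi \mid \sigma)$, avg. length $\langle \xi^2 \rangle_\sigma = v_\sigma N + \ell^2$

averaged recursions closed in $\{ R_{S\sigma}^{\mu},\, Q_{ST}^{\mu} \}$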
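The conditional moments of the projections can be checked by sampling. The following Monte Carlo sketch assumes orthonormal center vectors, isotropic Gaussian clusters with mean ℓB_σ and variance v_σ, and arbitrary fixed prototypes; all variable names and parameter values are illustrative and not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
N, n_samples = 50, 40000
ell, v_sigma = 1.0, 1.5                  # illustrative values for cluster sigma = +1

B = np.zeros((2, N))
B[0, 0], B[1, 1] = 1.0, 1.0              # assumed orthonormal centers B_+, B_-
w = rng.standard_normal((2, N)) / np.sqrt(N)   # arbitrary fixed prototypes
R = w @ B.T                              # R[S, sigma] = w_S . B_sigma
Q = w @ w.T                              # Q[S, T] = w_S . w_T

# Examples drawn from the cluster sigma = +1 only (conditional averages).
xi = ell * B[0] + np.sqrt(v_sigma) * rng.standard_normal((n_samples, N))
x = xi @ w.T                             # projections on the prototypes
y = xi @ B.T                             # projections on the center vectors

mean_x = x.mean(axis=0)                  # expected: ell * R[:, 0]
cov_xx = np.cov(x, rowvar=False)         # expected: v_sigma * Q
cov_xy = (x - mean_x).T @ (y - y.mean(axis=0)) / (n_samples - 1)   # expected: v_sigma * R

print("mean x:", mean_x, "theory:", ell * R[:, 0])
print("cov xx:\n", cov_xx, "\ntheory:\n", v_sigma * Q)
print("cov xy:\n", cov_xy, "\ntheory:\n", v_sigma * R)
```

For Gaussian clusters these identities hold at any N; the thermodynamic limit matters for the argument that the projections are Gaussian for more general data and that the order parameters are self-averaging.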

Investigation and comparison of given algorithms

- repulsive/attractive fixed points of the dynamics

- asymptotic behavior for α → ∞

- dependence on learning rate, separation, initialization

- ...

Optimization and development of new prescriptions

- time-dependent learning rate η(α)

- variational optimization w.r.t. f_S[...]

- ...

maximize $-\,\dfrac{d\varepsilon_g}{d\alpha}$

LVQ1: "The winner takes it all"

Only the winner $w_S$ ($S = \pm 1$) is updated, according to the class label:

$$ w_S^{\mu} = w_S^{\mu-1} + \frac{\eta}{N}\, \Theta\!\left( d_{-S}^{\mu} - d_{+S}^{\mu} \right) S\,\sigma^{\mu} \left( \xi^{\mu} - w_S^{\mu-1} \right) $$

Initialization: $w_S(0) = 0$

[Figure: order parameters Q++, Q−−, Q+− and R_{Sσ} versus α. Theory and simulation (N=100), p+ = 0.8, v+ = 4, v− = 9, ℓ = 2.0, η = 1.0, averaged over 100 independent runs.]

Self-averaging property (mean and variances)

[Figure: R++ (α = 10) versus 1/N]
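The LVQ1 prescription and the order parameters it is analysed with can be sketched in a few lines. The code assumes squared Euclidean distances, orthonormal center vectors and a single simulation run (no averaging over 100 runs); `lvq1_step`, `order_parameters` and `simulate_lvq1` are hypothetical helper names.

```python
import numpy as np

LABELS = np.array([1, -1])     # prototype labels S for w[0], w[1]

def lvq1_step(w, xi, sigma, eta):
    """One LVQ1 update: only the closest prototype (the winner) is moved."""
    N = xi.size
    d = np.sum((xi - w) ** 2, axis=1)        # squared distances to both prototypes
    k = int(np.argmin(d))                    # index of the winner
    w_new = w.copy()
    # Attract the winner if its label matches sigma, repel it otherwise.
    w_new[k] += (eta / N) * LABELS[k] * sigma * (xi - w[k])
    return w_new

def order_parameters(w, B):
    """R[S, sigma] = w_S . B_sigma and Q[S, T] = w_S . w_T."""
    return w @ B.T, w @ w.T

def simulate_lvq1(N=100, alpha_max=20.0, eta=1.0, p_plus=0.8,
                  ell=2.0, v_plus=4.0, v_minus=9.0, seed=0):
    """Single LVQ1 run from w_S(0) = 0; returns the final order parameters."""
    rng = np.random.default_rng(seed)
    B = np.zeros((2, N))
    B[0, 0], B[1, 1] = 1.0, 1.0              # assumed orthonormal centers
    std = np.sqrt([v_plus, v_minus])
    w = np.zeros((2, N))
    for _ in range(int(alpha_max * N)):
        k = 0 if rng.random() < p_plus else 1
        xi = ell * B[k] + std[k] * rng.standard_normal(N)
        w = lvq1_step(w, xi, LABELS[k], eta)
    return order_parameters(w, B)

R, Q = simulate_lvq1()
print("R =\n", R)
print("Q =\n", Q)
```

Repeating the run for several seeds and for increasing N gives a rough impression of the self-averaging property: the spread of R++ between runs is expected to shrink as N grows.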

High-dimensional data (formally N → ∞)

$\xi^{\mu} \in \mathbb{R}^N$, N = 200, ℓ = 1, p+ = 0.4, v+ = 0.44, v− = 0.44 [legend counts: (240), (160)]

[Figure: projections into the plane of the center vectors B+, B−: $y_\pm = B_\pm \cdot \xi^{\mu}$]

[Figure: projections on two independent random directions w1, w2: $x_1 = w_1 \cdot \xi^{\mu}$, $x_2 = w_2 \cdot \xi^{\mu}$]
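The contrast between the two kinds of projection can be reproduced with a small data set. The sketch assumes orthonormal centers, isotropic Gaussian clusters, and that the 160 examples belong to the class with p+ = 0.4 and the 240 to the other class; these assignments and the helper name `sample_clusters` are assumptions.

```python
import numpy as np

def sample_clusters(n_plus, n_minus, N, ell, v_plus, v_minus, rng):
    """Draw labelled examples from two isotropic Gaussian clusters."""
    B = np.zeros((2, N))
    B[0, 0], B[1, 1] = 1.0, 1.0          # assumed orthonormal centers B_+, B_-
    xi_plus = ell * B[0] + np.sqrt(v_plus) * rng.standard_normal((n_plus, N))
    xi_minus = ell * B[1] + np.sqrt(v_minus) * rng.standard_normal((n_minus, N))
    xi = np.vstack([xi_plus, xi_minus])
    sigma = np.concatenate([np.ones(n_plus, dtype=int),
                            -np.ones(n_minus, dtype=int)])
    return xi, sigma, B

rng = np.random.default_rng(0)
xi, sigma, B = sample_clusters(160, 240, N=200, ell=1.0,
                               v_plus=0.44, v_minus=0.44, rng=rng)

y = xi @ B.T                             # columns: y_+ and y_- (plane of the centers)
W = rng.standard_normal((2, 200))
W /= np.linalg.norm(W, axis=1, keepdims=True)   # two random unit directions w_1, w_2
x = xi @ W.T                             # columns: x_1 and x_2

# Class means separate clearly in the (y_+, y_-) plane ...
print("mean y, class +1:", y[sigma == 1].mean(axis=0))
print("mean y, class -1:", y[sigma == -1].mean(axis=0))
# ... but almost coincide along random directions.
print("mean x, class +1:", x[sigma == 1].mean(axis=0))
print("mean x, class -1:", x[sigma == -1].mean(axis=0))
```

A scatter plot of `y` colored by `sigma` would show two separated clouds, whereas the same plot of `x` shows a single structureless cloud: the cluster structure is only visible in the low-dimensional subspace spanned by the centers.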


Learning curve

[Figure: learning curves εg(α) for η = 2.0, 1.0, 0.2; p+ = 0.2, ℓ = 1.0, v+ = v− = 1.0]

- suboptimal, non-monotonic behavior for small η

- stationary state: εg(α → ∞) grows linearly with η

- well-defined asymptotics: η → 0, α → ∞, (η α) → ∞

Achievable generalization error

[Figure: εg versus p+ in two panels: v+ = v− = 1.0 and v+ = 0.25, v− = 0.81; curves: best linear boundary, LVQ1 (solid)]












Page 21: Dynamical analysis of LVQ type learning rules

Dynamical Analysis of LVQ type algorithms WSOM 2005

N

- repulsiveattractive fixed points of the dynamics

- asymptotic behavior for - dependence on learning rate separation initialization-

investigation and comparison of given algorithms

- time-dependent learning rate η(α)

- variational optimization wrt fs[]

-

optimization and development of new prescriptions

maximizeα

g

d

d ε

Dynamical Analysis of LVQ type algorithms WSOM 2005

LVQ1 The winner takes it all

initialization ws(0)=0

theory and simulation (N=100)p+=08 v+=4 p+=9 ℓ=20 =10

averaged over 100 indep runs

Q++

Q--

Q+-

α

RSσ

winner ws 1

1-μs

μμμS

μS

1-μs

μs Sσdd

N

ηwξww

only the winner is updated according to the class label

self-averaging property

(mean and variances)

1N

R++ (α=10)

Dynamical Analysis of LVQ type algorithms WSOM 2005

high-dimensional data (formally Ninfin)

ξμ isinℝN N=200 ℓ=1 p+=04 v+=044 v-=044μ

B

( 240)( 160)

projections into the plane of center vectors B+ B-

μ By ξ

μ 2

2xξ

w

projections on two independent random directions w12

μ 11x ξw

Page 22: Dynamical analysis of LVQ type learning rules

Dynamical Analysis of LVQ type algorithms WSOM 2005

LVQ1 The winner takes it all

initialization ws(0)=0

theory and simulation (N=100)p+=08 v+=4 p+=9 ℓ=20 =10

averaged over 100 indep runs

Q++

Q--

Q+-

α

RSσ

winner ws 1

1-μs

μμμS

μS

1-μs

μs Sσdd

N

ηwξww

only the winner is updated according to the class label

self-averaging property

(mean and variances)

1N

R++ (α=10)

Dynamical Analysis of LVQ type algorithms WSOM 2005

high-dimensional data (formally Ninfin)

ξμ isinℝN N=200 ℓ=1 p+=04 v+=044 v-=044μ

B

( 240)( 160)

projections into the plane of center vectors B+ B-

μ By ξ

μ 2

2xξ

w

projections on two independent random directions w12

μ 11x ξw

Page 23: Dynamical analysis of LVQ type learning rules

Dynamical Analysis of LVQ type algorithms WSOM 2005

high-dimensional data (formally Ninfin)

ξμ isinℝN N=200 ℓ=1 p+=04 v+=044 v-=044μ

B

( 240)( 160)

projections into the plane of center vectors B+ B-

μ By ξ

μ 2

2xξ

w

projections on two independent random directions w12

μ 11x ξw