

Neural Networks 22 (2009) 415–424


Self organized mapping of data clusters to neuron groups

Dieter Müller

Leibniz Universität Hannover, Institut für Praktische Informatik, Welfengarten 1, D-30167 Hannover, Germany

Article info

Article history: Received 2 November 2006; Received in revised form 2 December 2007; Accepted 22 September 2008

Keywords: SOM; SOM variants; Self organization; Data clusters; Neuron groups; Time dependency; Adaptation procedure; Cortical mapping; Focal dystonia

Abstract

T. Kohonen's self organizing map (SOM) may be considered as a plausible structure for modelling pattern recognition processes in the brain. Neighborhood preservation corresponds closely to what is called somatotopy in the neurosciences, and the context specificity of mappings observed (e.g. in malfunctions of the brain) becomes easily explicable in the framework of the SOM. However, there are two features which impair the aptitude of the classical SOM for neurophysiological models: the adaptation procedure is explicitly time dependent, and the procedure consumes the whole set of disposable neurons. Because of the latter property, a SOM cannot learn different tasks, adapting one subset of neurons to a data set X^1 and another to a subsequently presented data set X^2.

The present paper describes a modified SOM which avoids the drawbacks mentioned above. Its adaptation procedure is time independent. When the training sequence consists of data from successive data clusters X^k, each cluster is mapped to a subset G^k of the neuron set G while the other neurons are left almost unchanged. The behavior of the resulting DCNG-SOM is demonstrated in several experiments.

© 2008 Elsevier Ltd. All rights reserved.

1. Introduction

Kohonen's famous paper on self-organized maps (Kohonen, 1982, 1995) has given rise to an enormous number of publications. The original Kohonen map is inspired by the observation that, in the mammalian brain, the mapping of sensory inputs to the cortex is somatotopic (i.e. that signals from neighboring body areas are mapped to neighboring cortex regions). This concept has proven very successful in practical applications. However, it is based on intuitive considerations, and not on a strict mathematical theory. Therefore, it suffers from certain deficiencies which are discussed in Kohonen (1995).

The original SOM and most of its variants require that the learning parameters (i.e. learning strength and range of adaptation within the neuron grid) are successively modified during the training phase. It is necessary to define a so-called annealing scheme controlling this time dependency. Unfortunately there is no firm theoretical basis for constructing such schemes, and often they are determined empirically. That the mapping changes with time is the purpose of the process. However, the time dependency is an explicit one: learning parameters vary with time, and thus the quantitative properties of the process itself change during the training phase. This prevents the classical SOM from learning a new mapping when the training is complete.



Meanwhile, there are a large number of papers which define new SOM types. Most of them have a mathematical basis, and some of them avoid the explicit time dependency.

Typical examples are the SOAN (self organization with adaptive neighborhood neural network) (Iglesias & Barro, 1999) and the parameterless PLSOM (Berglund & Sitte, 2006). Both use the current mapping error to adjust the internal parameters of the adaptation process. In the time-adaptive SOM (TASOM) (Shah-Hosseini & Safabakhsh, 2003), each node has its own variable learning strength and neighborhood size. As a consequence, the network as a whole keeps its ability to learn independently of time. Somewhat simplified, the TASOM approach may be considered as shifting the dependency on externally controlled learning parameters to a dependency on individual node properties, thus converting the explicit time dependency into an implicit one. The present paper pursues a similar strategy. A rather elaborate mathematical procedure is used in the auto-SOM (Haese, 1999; Haese & Goodhill, 2001): here the weight vectors are adapted by means of a Kalman filter, and the learning parameters are determined so as to minimize the prediction error variance. With regard to the result, the generative topographic mapping (GTM; Bishop, Svensén, & Williams, 1998) is also a SOM. The adaptation algorithm is probabilistic and does not need a decreasing learning strength or a shrinking range of adaptation. However, the mapping problem is formulated in terms of a latent variable model, and thus the connection to Kohonen's approach is weak. In some SOM variants, the number of neurons and the whole network structure change with time. Although this is a very strong time dependency, it is an implicit one: addition and removal of neurons are controlled solely by the stream of input signals.


Examples are the growing cell structure (Fritzke, 1994a), the growing neural gas (Fritzke, 1994b, 1995) and the plastic self-organizing map (Lang & Warwick, 2002).

Compared to the original Kohonen SOM, these variants are remarkable improvements. They do not need an external control of parameters, and the TASOM in particular is well suited for tasks with changing data sets. Above all, they achieve a better mapping quality, at least in certain applications.

On the other hand, mapping quality and performance become less important if SOM structures are used to model processes in the brain. Here we are confronted not only with excellent function, but also with malfunction occurring in certain situations. Therefore it seems useful to study another kind of SOM which is designed primarily for biological modelling purposes, and not for better performance.

Soon after the first SOM publications, Merzenich et al. demonstrated that the reorganization observed in the sensory cortex of mammals after peripheral nerve damage can be modeled as an adaptation process in a Kohonen map (Merzenich et al., 1984). Martinetz, Ritter, and Schulten (1988) showed that the auditory cortex of a bat may be considered as a neighborhood preserving map of the relevant space of ultrasonic signals. Meanwhile the SOM concept is accepted as a tool for understanding the processing of sensory signals in the brain (Kaas, 1991; Sirosh & Miikkulainen, 1995; Turrigiano & Nelson, 2004; Wiemer, Spengler, Joublin, Stagge, & Wacquant, 2000). Nevertheless, at least the classical SOM has two properties which hamper its use for the modelling task described above:

1. An explicit time dependence means that there exists a control mechanism outside the SOM. When the training for a task is completed, this mechanism would be responsible for restoring the plasticity if a new task is to be learned. The time independent SOM variants mentioned above avoid this problem, but their mathematical procedures cannot easily be transferred into a biological framework.

2. The training process of the classical SOM adapts the whole set of disposable neurons to a given set of signal vectors. There remain no neurons which can be used for a subsequent training process with a different set of signal vectors. This is in contradiction to the observed plasticity of the brain: we can learn many new tasks without forgetting those we have learned before.

In view of this, it appears useful to modify the Kohonen training procedure, retaining the principal properties of the original SOM and avoiding the undesirable properties pointed out above. The necessary modifications are the following:

• The adaptation procedure must not contain an explicit time dependency.
• Training with a certain data set X^1 should result in a mapping of X^1 to a certain neuron group G^1 ⊂ G, and not to the whole neuron set G. The bulk of neurons should be left disposable for subsequent training phases with different data sets X^1, X^2, . . .

The SOM introduced in the present paper fulfills these requirements. However, in contrast to the time independent SOMs mentioned above, it is constructed in a purely intuitive way. The goal is not an improved mapping, but a neurophysiologically plausible mechanism. Rather than from a mathematical theory, the design starts from the idea that a firing neuron remains for some time in an excited state with increased sensitivity and that its excitation spreads to its neighbors. As in the TASOM, a particular neuron has not only its own synaptic coupling vector w but, in addition, a state which determines its behavior. The state, however, is only a single quantity a, the activation. This is enough to allow for a time independent adaptation procedure which does not use all available neurons, but only a certain neuron group, to represent a data cluster. Because of this property, the resulting SOM is called DCNG-SOM. As in the growing neuron structures (Fritzke, 1994a, 1995; Lang & Warwick, 2002), the incoming stream of data controls the number of neurons involved in the mapping of a cluster. In contrast to these SOM variants, the neurons are not taken from a potentially infinite stock. Instead, the number of neurons is fixed, and they have a fixed position in physical space.

The aspect of performance in applications is not considered.

We simply study the properties of the DCNG-SOM resulting from the use of the modified neuron. This is because the long-term objective is not only to model neuroplasticity, but also to explain disorders in the mapping of sensory signals to the sensory cortex. Such disorders have been observed, e.g. in patients suffering from focal dystonia.

Focal dystonia is a movement disorder occurring in several forms, for example writer's cramp and musician's cramp. Apparently the cause is not an organic defect but rather something like an overtraining. This suggests that focal dystonia has to do with a misled self organization process. Sanger and Merzenich (2000) have hypothesized that it is a manifestation of an unstable sensorimotor control loop. They discuss several mechanisms which could possibly lead to a gain > 1 in the feedback loop. The mapping mentioned above is a link in this loop. As yet, theoretical studies on focal dystonia concentrate on the time behavior of sensory signals and on the control theoretic aspects of the problem. However, in some musicians with focal dystonia, the disordered cortical representation of the digits could be observed directly by functional magnetic resonance imaging (Elbert et al., 1998). The increasing empirical material in this field suggests studying the formation of disordered mappings with SOM based models.

It is a characteristic of focal dystonias that they are task specific. The focal dystonia of musicians affects the control of finger movements only in the context of instrument playing. It does not affect the function of the same fingers in other activities. This means that the disturbed mapping is effective only in a special context. The context specificity becomes understandable if a cortical region is described as a SOM. A sensory stimulus might be considered as a signal vector

x = (x_1, . . . , x_n) = (s_1, . . . , s_k, c_1, . . . , c_{n−k}) = (s, c)    (1)

where c describes the context. In this picture, it appears plausible that certain stimuli (s^(i), c) are mapped to a corrupted part of the map, while stimuli with the same s^(i) and a different context part c′ are processed in correctly working regions. Admittedly, it remains an open question as to whether sensory stimuli are actually encoded as high dimensional signal vectors.

In the literature, there are some context-aware SOM variants (e.g. the recursive self-organizing map by Voegtlin (2002)). They are designed to represent the temporal context of patterns, and in principle they allow the reconstruction of pattern sequences. In contrast, the present paper restricts itself to the simpler problem of constructing different maps for data clusters which are presented successively to the net.
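As an aside, Eq. (1) can be made concrete with a few lines of code. The following Python sketch is an editorial illustration and not part of the original presentation; the encoding of the context as a single constant component is an assumption, chosen to match the x_3 values used later in experiment EX1.

    import numpy as np

    def make_stimulus(s, context_value, n_context=1):
        """Concatenate a sensory part s with a context part c, as in Eq. (1).

        Encoding the context as n_context constant components is an assumed
        choice; the paper leaves the actual encoding open.
        """
        s = np.asarray(s, dtype=float)
        c = np.full(n_context, float(context_value))   # context part c
        return np.concatenate([s, c])                   # x = (s, c)

    # The same sensory pattern s in two different contexts yields two
    # different signal vectors, which can be mapped to different map regions.
    s = [0.3, 0.7]
    x_writing = make_stimulus(s, context_value=0.5)    # e.g. "writing"
    x_violin  = make_stimulus(s, context_value=1.5)    # e.g. "violin playing"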

2. Mapping of data clusters to neuron groups: The training procedure

Let us assume a data set

X = X^1 ∪ X^2 ∪ X^3 ∪ · · · ∪ X^k ⊂ R^m    (2)

with the property

∃ M ∈ (0, 1.0) : ∀ a, b ∈ X^s, u, v ∈ X^t : |a − b|, |u − v| < M |a − u|,  s ≠ t    (3)

i.e. X consists of distinct clusters X^s.
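For finite point sets, condition (3) can be checked by a brute-force comparison of distances. The following Python sketch is an editorial illustration (clusters are assumed to be given as lists of points); it is not part of the original procedure.

    import numpy as np
    from itertools import combinations

    def satisfies_separation(clusters, M):
        """Check condition (3): every within-cluster distance must be smaller
        than M times every between-cluster distance."""
        clusters = [np.asarray(c, dtype=float) for c in clusters]
        intra = [np.linalg.norm(a - b)
                 for c in clusters for a, b in combinations(c, 2)]
        inter = [np.linalg.norm(a - u)
                 for cs, ct in combinations(clusters, 2)
                 for a in cs for u in ct]
        return max(intra, default=0.0) < M * min(inter, default=np.inf)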


Further, assume a set G of Q neurons located in a finite region of three dimensional space. This region might be thought of as a cortical region with irregularly distributed neurons. For a first study of our adaptation procedure, however, we will consider the simple case of a two dimensional regular grid with Q = n × n neurons N[i, j]. This simplifies the description, the simulation and the visualization of the principle. As in Kohonen's SOM, the input signal x is given in parallel to all N[i, j], and the output produced by a specific N[i, j] depends on the distance between x and the neuron's internal vector w_ij. However, the situation assumed for the learning process is different:

The training starts with signals x from cluster X^r, corresponding to some task T^r. After a certain time, we switch to another task T^s with input signals x from X^s. Our goal is a learning rule which adapts a subset G^r of neighboring neurons to X^r and another subset G^s to X^s, leaving a large part of G almost unchanged. This can be accomplished by considering explicitly, for each neuron N[i, j], its activation N[i, j].a. Just as in Kohonen's adaptation process, for a given input x the neuron with maximal output y is determined and is called the winner neuron N_c. This winner neuron, however, does not only define the center of a neighborhood within which adaptation takes place. Instead it influences its neighbors N[i, j] by increasing their activation values N[i, j].a. These activations control the subsequent shifting of w-vectors towards x. In addition, however, they influence the response of all neurons when the next input x′ is processed.

It is to be kept in mind that the training sequence is assumed to consist of subsequences S^r containing exclusively signals from X^r. As a consequence, only the mapping of X^r to G^r will be neighborhood preserving. The location of G^r in grid space depends on the neurons available at the moment when the formation of G^r starts. The relative positions of the X^r in grid space are not preserved, and the map of a complicated data set X = X^1 ∪ X^2 ∪ · · · ∪ X^r ∪ · · · will in general not be a dimension reduced representation of the whole set X.

This means that the DCNG SOM is not an adequate tool for technical applications. It is designed for biological modelling purposes, and here mainly two different situations are to be considered:

1. A subsequence S contains signals from a single cluster X^1 or from several possibly overlapping clusters X^r within a region of diameter less than a certain distance D_Threshold. The DCNG SOM maps these signals to one and the same neuron group (i.e. the overlapping clusters are considered as a single cluster Y).

2. A subsequence S contains signals from different clusters X^r with mutual distances greater than D_Threshold. In principle this is a limiting case of the assumptions made above, because S can be considered as a sequence of possibly very small 'pure' subsequences. In this case, mappings to separate groups are generated (i.e. we get separate cortical representations). Indeed, the switching between different neuron groups works even with pure subsequences of length 5.

The quantity D_Threshold is defined in Eq. (5), and its meaning is demonstrated in the next section (experiment EX2, Fig. 5).

At first sight, the procedure described above seems to be only an alternative description of a Kohonen procedure starting with a very small neighborhood. Actually we get a different type of SOM because

• the activation a persists for a short time, and
• the output y depends not only on |w_ij − x| but also on the momentary activation N[i, j].a. In the original Kohonen procedure, only the strength of adaptation decreases with increasing distance from the winning neuron. In the DCNG SOM there is an additional neighborhood effect: the last winning neuron increases the activation a of its neighbors. When the next signal x arrives, neurons of this neighborhood have an increased chance to become the winner because, for equal values of |w_ij − x|, neurons with increased activation a produce a greater output y. Neurons from a certain neighborhood in grid space are favored as long as input signals do not leave a certain region of signal space. A quantity D_Threshold which characterizes this ''certain region of signal space'' more precisely is introduced at the end of this section (Eq. (5)).

Neurons located near a recently activated winner neuron have an increased sensitivity and form a group G^r for input signals x ∈ X^r. If, however, new inputs arrive from another cluster X^s, the effect of the increased sensitivity is no longer sufficient to compensate for the lack of similarity between x and the w-vectors within G^r. Supposing the initialization was adequate, there will be a neuron outside G^r with w not too far from x. This triggers the formation of a new group G^s to which X^s is mapped.

In the following description of the adaptation process, an object oriented notation will be used (i.e. a neuron N[i, j] has private variables N[i, j].w, N[i, j].y, and N[i, j].a).

DCNG (Data Clusters to Neuron Groups):

1. for i := 1 to n do for j := 1 to n do
       N[i, j].w := random vector from R^m;
   adaptationstep := 0;
   Choose the cluster index r;

2. while adaptationstep < MAXADAPTATIONSTEP do
   begin {adaptation loop}
   (a) Generate an input signal x from X^r;
   (b) for i := 1 to n do for j := 1 to n do
           N[i, j].y := ((1 + λ · N[i, j].a) / (1 + λ)) · e^(−β · distance(x, N[i, j].w));
   (c) Let N_c ≡ N[p, q] be the neuron with maximal output y:
       for i := 1 to n do for j := 1 to n do
       begin
           N[i, j].a := 0.8 · e^(−((i − p)^2 + (j − q)^2) / 4);
           if N[i, j].a < A0 then N[i, j].a := A0;
       end;
   (d) for i := 1 to n do for j := 1 to n do
           N[i, j].w := N[i, j].w + (N[i, j].a)^2 · α · (x − N[i, j].w);
   (e) for i := 1 to n do for j := 1 to n do
       begin
           N[i, j].a := N[i, j].a · ρ;
           if N[i, j].a < A0 then N[i, j].a := A0;
       end;
   (f) adaptationstep := adaptationstep + 1;
       if condition(adaptationstep) then choose another cluster index r;
   end; {adaptation loop, while . . . }

The dimension of the signal space is denoted by m, and n is the size of the neuron grid (n^2 neurons). The symbols A0, α, β, λ and ρ denote time independent parameters acting as follows:

A0 = 0.1 is the initial and minimum value of the activation a.
α = 0.05 is the constant learning strength. (In the experiments a Kohonen SOM with identical input x was simulated in parallel with the DCNG SOM. Naturally it had its own time dependent learning strength.)
β = 0.4 determines the dependence of the output y on the distance |w_ij − x|.
λ = 0.2 measures the influence of a neuron's activation a on its sensitivity.
ρ = 0.3 controls the relaxation of the activation a.


Fig. 1. Dependence of activation N[p_winner, q_winner + j].a on distance j within the grid.

At first sight the relaxation step 2(e) may appear unnecessary, since the activation N[i, j].a is overwritten in step 2(c) in any case. The relaxation, however, affects the state of the grid which is effective in the next pass when the outputs N[i, j].y are calculated in step 2(b). This, in turn, determines the next winner neuron. Only after this has been done does step 2(c) generate a new activation pattern centered at the new winner neuron, and this pattern specifies the range of the adaptation step 2(d). As a consequence, the adaptation region is broader than the region of increased input sensitivity. This is illustrated by Fig. 1. As long as training inputs x come from one and the same cluster X^r, the activations N[i, j].a within the corresponding neuron group G^r have increased values, while the other neurons have activation values near A0. Let g ∈ G^r and g′ ∉ G^r with

|w − x^r| ≈ |w′ − x^r|.    (4)

Then g has a better chance to become the winner neuron, since its output g.y is increased by the factor (1 + λ · a)/(1 + λ · A0).

Now consider an input x^s from a new cluster X^s centered at a distance D from X^r. The hitherto weakly affected neurons have w-vectors which are not far from the original random distribution. Hence there will be one, say g″, with w″ near to x^s. For all neurons in G^r the corresponding distance is approximately D. Their increased activation values can no longer compensate the effect of this large distance if D is greater than a certain threshold D_Threshold. As a consequence g″ will become the winner neuron.

An estimate for D_Threshold is obtained by a straightforward calculation:

D_Threshold = (1/β) · ln((1 + λ · a)/(1 + λ · A0)) ≈ (1/β) · ln((1 + λ · 0.5)/(1 + λ · A0)).    (5)

The validity of this estimate was experimentally verified (see Fig. 5).
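The "straightforward calculation" behind Eq. (5) can be made explicit; the following derivation is an editorial reconstruction from the output formula in step 2(b) and is not spelled out in the paper. A group neuron g ∈ G^r with elevated activation a and an outside neuron g″ with activation A0 produce the outputs

g.y = ((1 + λ · a)/(1 + λ)) · e^(−β · d_g),    g″.y = ((1 + λ · A0)/(1 + λ)) · e^(−β · d_{g″}),

where d_g and d_{g″} denote their respective distances to the current input. Setting g.y = g″.y shows that the activation advantage compensates a distance disadvantage of at most

d_g − d_{g″} = (1/β) · ln((1 + λ · a)/(1 + λ · A0)) ≡ D_Threshold,

which, with a ≈ 0.5, β = 0.4, λ = 0.2 and A0 = 0.1, gives D_Threshold ≈ 2.5 · ln(1.1/1.02) ≈ 0.19.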

3. Experiments

In what follows, we will restrict ourselves to examples with X ⊂ R^3. In this case, results can still be visualized easily, while on the other hand the dimension is sufficient to illustrate the motivation behind the mapping of clusters to neuron groups. Imagine for instance that components x_1, x_2 represent sensory signals from a finger, and that two distinct values of x_3 characterize two different contexts, e.g. writing and violin playing. One of the test cases studied (experiment EX1) is visualized in Figs. 2 and 3. In this experiment, the training set consisted of two clusters X^1 and X^2. The projections of X^1 and X^2 onto the 1–2-plane are one and the same ring structure. However, the clusters differ in x_3:

x ∈ X^1 ⇒ x_3 = 0.5;    x ∈ X^2 ⇒ x_3 = 1.5.    (6)
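Such a training set is easy to generate; the sketch below is an editorial illustration in which the ring radius, center and noise width are assumed values — only the x_3 levels of 0.5 and 1.5 come from Eq. (6). The two samplers can be passed directly to the train_dcng sketch given in Section 2.

    import numpy as np

    def ring_cluster_sample(x3, rng, radius=0.5, center=(1.0, 1.0), width=0.05):
        """Draw one point of a ring-shaped cluster in the x1-x2-plane at height x3."""
        phi = rng.uniform(0.0, 2.0 * np.pi)
        r = radius + rng.normal(0.0, width)
        return np.array([center[0] + r * np.cos(phi),
                         center[1] + r * np.sin(phi),
                         x3])

    clusters = [lambda rng: ring_cluster_sample(0.5, rng),   # X^1
                lambda rng: ring_cluster_sample(1.5, rng)]   # X^2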

This training set is depicted in the leftmost pane of Fig. 2.

In order to clarify the difference between Kohonen's SOM and the DCNG-mapping, the training sequence was applied in parallel to both types of nets. For the Kohonen SOM, the sequence of adaptation steps had to be subdivided into cycles with constant values of learning strength and neighborhood size. A cycle length with a mean of 3 adaptation steps per neuron appeared sufficient, i.e. cyclelength = 3n^2. As to the DCNG-SOM, the cycle length is meaningless since all parameters are constant. Nevertheless a cycle is a useful unit for the description of signal sequences. One of the applied sequences consisted of three cycles from X^1, alternating with three cycles from X^2. The results are shown in Fig. 2 for the DCNG-SOM and in Fig. 3 for the basic Kohonen SOM.

Remember that the ''location'' of a neuron has two different meanings:

• The location in grid space (G-location) is the fixed physical position within the grid, determined by the index pair i, j. In a more realistic neurophysiological model, this would correspond to the position of the neuron within a section of the cortex. The rightmost panes in Figs. 2 and 3 refer to the G-location.
• The location of N[i, j] in signal space (R^m-location) is a point in R^m and varies during the adaptation process. It is given by N[i, j].w, interpreted as a position vector. The center pane of Fig. 2 and the leftmost pane of Fig. 3 refer to the R^m-location and use a slightly slanted projection onto the 1–2-plane.

The Kohonen-SOM is visualized, as usual, with lines corresponding to rows and columns of the grid G. For the DCNG-SOM, no such lines are drawn, since a large part of the neurons remains near to its initial randomly distributed R^m-location. The connecting lines would obscure the picture. Instead the neurons are plotted as squares, with size proportional to their activation values at the moment of the snapshot.

It is evident from Fig. 2 that the adaptation process of the DCNG-SOM in fact establishes a mapping of data clusters X^1, X^2 to neuron groups G^1, G^2. Which neurons constitute these groups can be seen from the grid space representations in Figs. 2 and 3. Here the elements of G^1, G^2 are marked as '1' and '2' respectively. The groups are well separated.

Within the two groups of neurons, the neighborhood preserving property of the mapping was checked as follows: in any state of the adaptation process, a procedure can be started which generates a sequence of input signals x corresponding to closed polygons within X^1 and X^2. The winner neurons belonging to these signals are marked and connected by lines. This is done both for the DCNG-SOM and for the Kohonen-SOM. The mapping is roughly neighborhood preserving but not perfect. Keeping in mind, however, that G^1 and G^2 are ring-shaped sets in R^3, this was not to be expected. Naturally, the Kohonen mappings shown in the present paper are worse than those for similar data sets known from the literature. This is due to the fact that our experiments somewhat misuse the Kohonen procedure: after a sequence of signals from X^1, the Kohonen-SOM is confronted with signals from X^2, and this occurs at a moment when the time dependent range of adaptation is already reduced for a finer tuning.

In Fig. 4 the temporal development of both SOM types is traced in parallel.


Fig. 2. Behavior of the DCNG-SOM in experiment EX1 with two ring shaped data clusters X^1, X^2 (left pane) and n = 20. The training sequence consisted of 4 subsequences SX^1, SX^2, SX^1, SX^2, each with 3600 signals. Center pane: State in signal space representation at the end of this sequence. Right pane: State in grid space representation and pictures of two polygons P1 and P2 approximating X^1 and X^2 respectively.

Fig. 3. Behavior of a Kohonen SOM simulated in parallel with the DCNG SOM in experiment EX1. The two panes correspond to the center and right panes of Fig. 2.

Fig. 4. Development of the DCNG SOM (upper row) and the Kohonen SOM (lower row) traced in parallel during experiment EX1 (ring shaped clusters X^1 and X^2).

Although the total number of neurons is not very large (n^2 = 400), it is visible that in the DCNG-SOM a considerable number of neurons is almost unaffected by the training process. We will return to this point below.

Fig. 5 demonstrates the behavior of the DCNG-SOM for a more complicated training sequence (experiment EX2): a sequence of five data clusters X^0, . . . , X^4 is repeatedly presented to the net.


Fig. 5. Experiment EX2: Five clusters X^0, . . . , X^4 are presented in turn to a DCNG-SOM. The clusters have identical ring shaped projections onto the x1–x2-plane. However, their positions with respect to the x3-axis are different. In the small window, these positions are shown together with the distance D_Threshold computed according to Eq. (5). Center pane: Signal space representation of the SOM with two clearly separated neuron groups corresponding to X^0 and X^1. In contrast, X^2, X^3 and X^4 are mapped to overlapping neuron sets. This is more clearly visible in the grid space representation (right pane).

The clusters have identical, ring shaped projections onto the x1–x2-plane. However, they are shifted along the x3-axis by decreasing amounts. Their positions along the x3-axis are shown together with the distance D_Threshold computed from Eq. (5). The neuron groups G^r are clearly separated as long as the distance between the corresponding data clusters is greater than D_Threshold. When this distance becomes smaller, the neuron groups begin to merge. This can be seen in the grid-space representation too: the regions G^0 and G^1 are separated, while G^2, G^3 and G^4 overlap.

4. Neurons outside of groups

In neurobiology, learning of novel associations by adult humans is explained by synaptic plasticity and recruitment of 'unused' neurons (see for example Hogan and Diederich (1995)). The question as to whether and to what extent unused neurons exist and new neurons are generated is still under discussion (Draganski et al., 2004). In the literature on neural networks, the concept of unused neurons also occurs (Khosravi & Safabakhsh, 2005). Evidently, it has slightly different meanings in different fields. In the context of the present paper, however, the term 'unused' is unnecessary, since the adaptation procedure of the DCNG SOM does not refer to it. The procedure refers only to the clearly defined attributes w and a of an object N[i, j]. We consider a mapping of data clusters to neuron groups, and hence the membership of a neuron in a group is a clearly definable property: N[i, j] belongs to G^r if, and only if, it is the image of at least one element of X^r. This implies that the membership of a neuron can change during the training process. Neurons which do not occur as winners for at least one signal from a cluster will be called neurons outside of groups.

The DCNG SOM was constructed with the hope that the groups G^r would be rather stable and that the change of membership would be a relatively rare event. This can be checked by tracing the paths which are described by the end points of the w-vectors during a training sequence.

In experiment EX3 a DCNG SOM with n = 20 was trained with the two ring shaped signal sets X^1, X^2 already used in EX1. To obtain more detailed information, the signal sequence was subdivided as follows:

• 2000 signals x ∈ X^1 (SX^1c, coarse adaptation of a neuron group G^1)
• 1600 signals x ∈ X^1 (SX^1f, fine adjustment of G^1)
• 2000 signals x ∈ X^2 (SX^2c, coarse adaptation of a neuron group G^2)
• 1600 signals x ∈ X^2 (SX^2f, fine adjustment of G^2).

In addition, each neuron was provided with additional attributes to note its occurrence as a winner for X^1 or X^2 respectively. The paths described by the moving w-vectors are shown in Figs. 6 and 7. As expected, most movements during subsequence SX^1f are small compared to those during SX^1c (Fig. 6). Nevertheless, SX^1f too creates a number of longer paths for neurons which do not belong to G^1. Fig. 7 shows the paths created after switching to signal set X^2: there is a sufficient number of hitherto weakly affected neurons to establish a new group G^2 during SX^2c. However, it is visible that signals from X^2 at least influence members of G^1, and that possibly some neurons may even change their membership. For technical applications this would be a severe drawback of the DCNG SOM. For modelling brain structures it might be at least an interesting property.

During signal sequence SX^1c the group G^1 is established gradually. Therefore the membership of a neuron in G^1 was tested only during SX^1f, and analogously the membership in G^2 was tested during SX^2f. The results are shown in the table below:

Net size n   Length SX^1c, SX^2c   Length SX^1f, SX^2f   |G^1|   |G^2|   Swapped winners
20           2000                  1600                  90      90      2
25           3125                  2500                  149     122     8

At first sight, it appears surprising that for n = 25 there is a 20% difference between |G^1| and |G^2|. However, it is to be kept in mind that the quantities |G^1|, |G^2| refer to the end of the respective subsequences SX^1f, SX^2f. Within the respective fine adjustment phases, |G^1| increases from 92 to 149 while |G^2| rises from 69 to 122. When the experiment is continued, the group sizes increase further, although by decreasing amounts (see also Fig. 12).
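One way to obtain such counts is to record, for every neuron, the cluster for which it last became a winner; a neuron whose record changes is a 'swapped winner'. The following sketch is an editorial assumption about this bookkeeping (restricting the counting to the fine-adjustment subsequences, as in experiment EX3, is left to the caller); it is not the author's code.

    def update_membership(membership, swapped, winner, r):
        """Record that grid neuron `winner` (a tuple (p, q)) won for cluster index r.

        `membership` maps neuron -> cluster index of its last win;
        `swapped` collects neurons that changed their entry.
        """
        previous = membership.get(winner)
        if previous is not None and previous != r:
            swapped.add(winner)
        membership[winner] = r

    def group_sizes(membership, n_clusters):
        """Return |G^r| for r = 0, ..., n_clusters - 1."""
        sizes = [0] * n_clusters
        for r in membership.values():
            sizes[r] += 1
        return sizes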

5. Behavior for large signal space dimension m

If m is large, say m ≥ 20, the distances between an arbitrary signal vector x and all the weight vectors w are very close to each other. This is due to the fact that the spherical shell between r and r + dr, containing all points w with distance r from x, has a volume increasing with r^(m−1), and that on the other hand r is limited by the finite hypercube. Thus one should suspect that the winner cannot be uniquely identified.


Fig. 6. Experiment EX3: The paths described by the w-vectors during training with two ring shaped data clusters X^1, X^2. In the training sequence SX^1–SX^2, each SX^r was subdivided into SX^r c (coarse adaptation of group G^r) and SX^r f (fine adjustment). The figure shows the paths during training with X^1 (SX^1c–SX^1f).

Fig. 7. Experiment EX3: w-paths during training with X^2 (after training with X^1). The diagrams are the continuation of Fig. 6. It can be seen in the left pane that a large number of hitherto weakly affected neurons has remained and is now disposable for building the group G^2.

To investigate this effect, an experiment EX4 was carried out with m = 36 and a data set D ⊂ R^36 representing an E-shaped point set E ⊂ R^2. The grid size was n = 25. Fig. 8 shows how D was generated. E was subdivided into four subsets E_p, and for each E_p the corresponding winner neurons in grid space representation were labelled. It turned out that the mapping was clearly neighborhood preserving. This result can be explained: even in the worst case that all distances d(i, j; x) = |w_ij − x| are equal, the adaptation procedure necessarily selects one winner neuron. Although its w-vector is not better than the others, it is shifted more than all others towards x. Later signals x′ from the neighborhood of x will further increase this differentiation. Thus, after a sufficient number of adaptation steps, there is a subset of neurons with w-vectors near to D. This was verified by taking histograms of d(i, j; x) during the training process. The DCNG-SOM has an additional mechanism that facilitates the selection of the winner: as soon as a winner has been selected, the activation values of its immediate grid neighbors are increased. In the next adaptation step, more than one N[i, j].w may be very close to x. Nevertheless, a small subset of neurons is favored, because selection of the next winner depends on the output N[i, j].y. The output does not depend only on d(i, j; x), but also on the activation N[i, j].a.
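The distance concentration effect mentioned at the beginning of this section is easy to reproduce numerically; the following sketch is an editorial illustration (the sample sizes are arbitrary choices), comparing the relative spread of distances from one random query point to random weight vectors for m = 3 and m = 36.

    import numpy as np

    def relative_distance_spread(m, n_points=625, rng=None):
        """Return (d_max - d_min) / d_min for distances from one random point
        to n_points random vectors in the unit hypercube of dimension m."""
        rng = np.random.default_rng(0) if rng is None else rng
        w = rng.random((n_points, m))       # random weight vectors
        x = rng.random(m)                   # one random signal vector
        d = np.linalg.norm(w - x, axis=1)
        return (d.max() - d.min()) / d.min()

    for m in (3, 36):
        print(m, relative_distance_spread(m))
    # For m = 3 the spread is typically several times larger than for m = 36,
    # where the distances crowd together as described above.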

6. Initialization for large dimension m

For high-dimensional signal spaces, the adequate initialization of a SOM becomes difficult. In the experiments EX1–EX4, the DCNG SOM was initialized randomly, i.e. the w-vectors were distributed randomly within a hypercube enclosing the anticipated data sets. The idea behind this strategy is that for any signal x there should be a vector N[i, j].w not too far from x.


Fig. 8. Generation of a data cluster D ⊂ R^36 representing a two dimensional E-shaped point set E = E_1 ∪ E_2 ∪ E_3 ∪ E_4. The subdivision of E into subsets makes it possible to check the neighborhood preservation of the DCNG-mapping. Each sensor element within the 6×6-array emits a signal which is taken as a component of the 36-dimensional signal vector x. Its strength decreases with increasing distance from the stimulus s.
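A possible reading of this generation scheme in code is sketched below; it is an editorial illustration, and the Gaussian fall-off with width sigma is an assumption, since the paper only states that the signal strength decreases with distance from the stimulus.

    import numpy as np

    def sensor_response(stimulus, grid_size=6, sigma=1.0):
        """Encode a 2-D stimulus position as a 36-dimensional signal vector.

        Each element of the 6x6 sensor array responds with a strength that
        decreases with its distance from the stimulus.
        """
        sx, sy = stimulus
        ii, jj = np.meshgrid(np.arange(grid_size), np.arange(grid_size), indexing="ij")
        dist2 = (ii - sx) ** 2 + (jj - sy) ** 2
        return np.exp(-dist2 / (2.0 * sigma ** 2)).ravel()   # x in R^36

    # Points sampled along an E-shaped set in the plane yield the data cluster D.
    x = sensor_response((2.3, 4.1))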

Fig. 9. Experiment EX5, influence of an inadequate initialization: The DCNG SOM was initialized with w-vectors distributed randomly in the cubes C^A_init and C^B_init respectively. The data clusters X^1, X^2 lie outside of both cubes. The results presented in Figs. 10 and 11 were obtained with initialization cube C^A_init.

This can be passably fulfilled for m = 3, but not for large m. Let us assume for the moment that the n^2 initial w-vectors define a regular grid with n^2 points in signal space. With n^2 = 625 and m = 3, an edge of the hypercube comprises roughly 8 points. For m ≥ 7, however, a regular grid no longer exists. Thus it is surprising that the experiment with m = 36 nevertheless yielded a reasonable, neighborhood preserving mapping.

To study this phenomenon directly in a high-dimensional signal space is difficult, since the paths of the w-vectors cannot be visualized directly. For m = 3, however, it can be demonstrated that even a rather unreasonable initialization may result in a satisfactory mapping. For that purpose, experiment EX1 with two ring shaped data sets X^1, X^2 and m = 3 was repeated with a rather unusual initialization (experiment EX5): the n^2 = 625 initial w-vectors defined a set C_init of points distributed randomly in a small cube. EX5 was carried out with two such initialization sets C^A_init and C^B_init. Their position relative to the data clusters X^1, X^2 is depicted in Fig. 9. The training sequence alternated between X^1 and X^2, with 1875 adaptation steps for each data cluster. Fig. 10 (lower row) shows the temporal development for n = 25 and initialization set C^A_init. In the lower row, the w-paths traced during three representative intervals of 1875 steps are depicted. The upper row contains snapshots of the SOM's state at the end of the corresponding intervals. During the first subsequence with X^1, part of the neurons is used to form G^1 (Fig. 10, first column). When the signal input is switched to X^2, the group G^2 is formed mainly by neurons recruited from G^1, because in signal space this group is located more closely to the corresponding w-region than to C_init. The following X^1-signals restore group G^1, again using neurons from G^2 as well as from the initialization region. This interplay is repeated until the number of neurons moved from C_init to the G^1–G^2-region is sufficient to allow for a further tuning with minor mutual interferences (Fig. 10, third column). In the complete tracing of the experiment, this can be observed more clearly. Naturally, compared to experiments with a more typical initialization, a longer sequence is necessary to obtain a satisfactory mapping.

At the end of each subsequence, X^1-winners respectively X^2-winners and the neurons which had changed their group membership were counted. The results are plotted in Fig. 11. For comparison, the curves obtained with the usual initialization (random vectors distributed in a cube enclosing X^1 and X^2) are shown in Fig. 12.

The mechanism which repairs the consequences of an extremely bad initialization could be visualized only for m = 3. However, there is no obvious reason why it should not work for larger values of m, and thus it becomes understandable why the experiment with m = 36 was successful. In experiment EX5, a Kohonen SOM was trained in parallel with the DCNG SOM. This revealed that a similar mechanism is effective here.

7. Conclusions

As already explained, the intention of this paper is not to present an improved SOM for purposes of data analysis. However, the DCNG-SOM has some properties which might qualify it as a useful structure to model neuroplasticity, and especially those aspects of learning which are connected with focal dystonias. They can be summarized as follows:

• The adaptation procedure has no explicit time dependence.
• The DCNG-SOM does not exhaust the whole neuron grid for a single data cluster. This is important because cortical regions of the brain are able to adapt themselves to more than one task. Apparently there are no physiological or anatomical boundaries which limit the adaptation process for a single task to a subregion. Thus we must assume that the limitation is inherent to the adaptation process. In contrast to the Kohonen SOM, the adaptation procedure of the DCNG SOM recruits only a subset


Fig. 10. Experiment EX5: A DCNG SOM was initialized with the endpoints of the N[i, j].w in a small cube outside of the data clusters X^1 and X^2. The upper row shows the development of the groups G^1, G^2 (snapshots in signal space representation). In the lower row the w-paths are shown for intervals enclosing the corresponding snapshot in the upper row. The training sequence consisted of alternating subsequences with signals from X^1 and X^2 respectively.

Fig. 11. Experiment EX5, time behavior of group memberships: Alternating subsequences of 1875 steps with signals from X^1 and X^2 respectively were presented to the DCNG SOM with n^2 = 625 neurons. Within an X^1-subsequence, X^1-winners and former X^2-winners becoming X^1-winners are marked and counted (and vice versa for X^2-subsequences).

of the disposable neurons for the representation of a single data cluster X^r. Supposing the number of neurons is large enough, there will remain a sufficient stock of neurons to represent the next cluster X^s by a group G^s. The neuron group G^r previously trained to represent X^r is left almost unchanged.
• After a training sequence of sufficient length, the groups G^r show a remarkable stability. Depending on the initialization, presenting each data cluster X^r once can be sufficient to establish clearly visible groups G^r. This possibly explains a rather paradoxical observation made in neuroplasticity experiments: training a new task quickly produces new structures within the brain; on the other hand, it is difficult to erase an older, unfavorable structure (Bangert & Altenmüller, 2003; Jabusch & Altenmüller, 2006).
• The adaptation procedure automatically differentiates between learning a new structure and tuning an existing structure:

Fig. 12. Experiment EX5a (= EX5 with the usual initialization).

Training signals from a new cluster X^n sufficiently distant from the hitherto used clusters X^r (r ≠ n) trigger the formation of a new neuron group G^n. Signals from the neighborhood of an already used cluster mainly modify the mapping of this cluster.
• With respect to an inadequate initialization, which is practically inevitable for large signal space dimension m, the DCNG SOM is fairly robust. This, however, is not a particular property of this SOM variant. A similar robustness is observed in the Kohonen SOM.

References

Bangert, M., & Altenmüller, E. (2003). Apollos Gabe und Fluch – Funktionelle und Dysfunktionelle Plastizität bei Musikern. Neuroforum, 7, 4–14.

Berglund, E., & Sitte, J. (2006). The parameter-less self-organizing map algorithm. IEEE Transactions on Neural Networks, 17(2), 305–316.

Bishop, C. M., Svensén, M., & Williams, C. K. I. (1998). GTM: The generative topographic mapping. Neural Computation, 10(1), 215–235.

Draganski, B., Gaser, C., Busch, V., Schuierer, G., Bogdahn, U., & May, A. (2004). Neuroplasticity: Changes in grey matter induced by training. Nature, 427, 311–312.


Elbert, T., Candia, V., Altenmüller, E., Rau, H., Sterr, A., Rockstroh, B., et al. (1998). Alteration of digital representations in somatosensory cortex in focal hand dystonia. NeuroReport, 16, 3571–3575.

Fritzke, B. (1994a). Growing cell structures — a self-organizing network for unsupervised and supervised learning. Neural Networks, 7(9), 1441–1460.

Fritzke, B. (1994b). Growing cell structures — a self-organizing network for unsupervised and supervised learning. Neural Networks, 7(9), 1441–1460.

Fritzke, B. (1995). A growing neural gas network learns topologies. In G. Tesauro, D. S. Touretzky, & T. K. Leen (Eds.), Advances in neural information processing systems: Vol. 7 (pp. 625–632).

Haese, K. (1999). Kalman filter implementation of self-organizing feature maps. Neural Computation, 11(5), 1211–1233.

Haese, K., & Goodhill, G. J. (2001). Auto-SOM: Recursive parameter estimation for guidance of self-organizing feature maps. Neural Computation, 13(3), 595–619.

Hogan, J., & Diederich, J. (1995). Random neural networks of biologically plausible connectivity. Technical report. Australia: Queensland University of Technology.

Iglesias, R., & Barro, S. (1999). SOAN: Self organizing with adaptive neighbourhood neural network. In Proceedings of the IWANN (pp. 591–600).

Jabusch, H.-Chr., & Altenmüller, E. (2006). Focal dystonia in musicians: From phenomenology to therapy. Advances in Cognitive Psychology, 2(2–3), 207–220.

Kaas, J. H. (1991). Plasticity of sensory and motor maps in adult mammals. Annual Review of Neuroscience, 14, 137–167.

Khosravi, M. H., & Safabakhsh, R. (2005). Human eye inner bound detection using a modified time adaptive self-organizing map. In IEEE International Conference on Image Processing 2005: Vol. 2 (pp. II-802–805).

Kohonen, T. (1982). Self-organized formation of topologically correct feature maps. Biological Cybernetics, 43, 59–69.

Kohonen, T. (1995). Self-organizing maps. Berlin: Springer-Verlag.

Lang, R., & Warwick, K. (2002). The plastic self organizing map. In Proc. 2002 international joint conference on neural networks: Vol. 1 (pp. 727–732).

Martinetz, T., Ritter, H., & Schulten, K. (1988). Kohonen's self-organizing map for modeling the formation of the auditory cortex of a bat. In SGAICO proceedings connectionism in perspective (pp. 403–412).

Merzenich, M. M., Nelson, R. J., Stryker, M. P., Cynader, M., Schoppman, A., & Zook, J. M. (1984). Somatosensory cortical map changes following digit amputation in adult monkeys. Journal of Comparative Neurology, 224, 591–605.

Sanger, T. D., & Merzenich, M. M. (2000). Computational model of the role of sensory disorganization in focal task-specific dystonia. Journal of Neurophysiology, 84, 2458–2464.

Shah-Hosseini, H., & Safabakhsh, R. (2003). TASOM: A new time adaptive self-organizing map. IEEE Transactions on Systems, Man, and Cybernetics, Part B, 33(2), 271–282.

Sirosh, J., & Miikkulainen, R. (1995). Modeling cortical plasticity based on adapting lateral interaction. In J. M. Bower (Ed.), Neurobiology of computation. Proceedings of the third annual computation and neural systems conference (pp. 305–310). Norwell, MA: Kluwer Academic Publishers.

Turrigiano, G. G., & Nelson, S. B. (2004). Homeostatic plasticity in the developing nervous system. Nature Reviews Neuroscience, 5, 97–107.

Voegtlin, T. (2002). Recursive self-organizing maps. Neural Networks, 15(8–9), 979–991.

Wiemer, J., Spengler, F., Joublin, F., Stagge, P., & Wacquant, S. (2000). Learning cortical topography from spatiotemporal stimuli. Biological Cybernetics, 82, 173–187.