research article fusing swarm intelligence and self...

Research ArticleFusing Swarm Intelligence and Self-Assembly forOptimizing Echo State Networks

Charles E Martin1 and James A Reggia2

1HRL Laboratories LLC 3011 Malibu Canyon Road Malibu CA 90265 USA2Department of Computer Science University of Maryland College Park MD 20742 USA

Correspondence should be addressed to Charles E Martin martince444yahoocom

Received 26 December 2014 Revised 17 April 2015 Accepted 11 May 2015

Academic Editor Okyay Kaynak

Copyright copy 2015 C E Martin and J A ReggiaThis is an open access article distributed under the Creative Commons AttributionLicense which permits unrestricted use distribution and reproduction in anymedium provided the originalwork is properly cited

Optimizing a neural networkrsquos topology is a difficult problem for at least two reasons the topology space is discrete and the qualityof any given topology must be assessed by assigning many different sets of weights to its connections These two characteristicstend to cause very ldquoroughrdquo objective functions Here we demonstrate how self-assembly (SA) and particle swarm optimization(PSO) can be integrated to provide a novel and effective means of concurrently optimizing a neural networkrsquos weights and topologyCombining SA and PSO addresses two key challenges First it creates a more integrated representation of neural network weightsand topology so thatwe have just a single continuous search domain that permits ldquosmootherrdquo objective functions Second it extendsthe traditional focus of self-assembly from the growth of predefined target structures to functional self-assembly in which growthis driven by optimality criteria defined in terms of the performance of emerging structures on predefined computational problemsOur model incorporates a new way of viewing PSO that involves a population of growing interacting networks as opposed toparticles The effectiveness of our method for optimizing echo state network weights and topologies is demonstrated through itsperformance on a number of challenging benchmark problems

1 Introduction

In this paper we demonstrate how two very different nature-inspired methodologies self-assembly (SA) [1] and particleswarm optimization (PSO) [2] can be integrated to providea novel and effective means of concurrently optimizing aneural networkrsquos weights and topology Such an approachaddresses two important challenges The first challenge isfinding a more integrated representation of neural networkweights and topology so that rather than having to search inboth a continuousweight space and a discrete topology spacethere is just a single continuous search domain that permitsldquosmootherrdquo objective functions The second challenge isextending the traditional focus of self-assembly research fromthe growth of predefined target structures to functional self-assembly in which growth is driven by optimality criteriadefined in terms of the quality or performance of theemerging structures on predefined computational problems

Swarm intelligence systems which consist of autonomousagents interacting in a simple and local manner exhibit

complex global behavior that emerges as a consequence oflocal interactions among the agents [3ndash10] Researchers havecreated a wide range of new problem-solving algorithmsinspired by nature and based on the governing principles ofswarm intelligence [11ndash17] Of particular importance to theresearch presented in this paper is the powerful and broadlyapplicable swarm intelligence based optimization methodknown as particle swarm optimization (PSO) [18ndash23]

Particle swarm optimization has been applied exten-sively to train neural network weights A wide variety ofdifferent adaptations and hybridizations of PSO have beendeveloped for this purpose [24ndash30] Despite the success inusing PSO to optimize network weights there has beenonly limited success in applying it to topology optimiza-tion and such applications have largely been restricted tofeedforward networks The methods that do exist implementfairly complicated adaptations of the basic PSO algorithmor enforce stringent restrictions on the domain of feasiblenetwork topologies [31ndash33] While the method presented

Hindawi Publishing CorporationComputational Intelligence and NeuroscienceVolume 2015 Article ID 642429 15 pageshttpdxdoiorg1011552015642429

2 Computational Intelligence and Neuroscience

Input layer Reservoir Output layer

Figure 1 Schematic of an echo state network (ESN) The solidarrows represent connections with fixed weights and the dashedarrows represent connections with trainable weights Connectionsfrom the input layer to the output layer and from the output layerto the reservoir are optional As usual input to the network entersthrough the input layer and the networkrsquos output is generated in theoutput layerThe bias node and its connections with fixed weights tothe reservoir are not shown

in this paper does not adapt the number of nodes in thenetwork as some other algorithms do it optimizes overgeneral recurrent neural network topologies using one of themost basic forms of PSO which is a unique feature of ourapproach Furthermore the form of the underlying modelof network growth that we present here does not place anyconstraints on the type of PSO used to drive the optimizationand therefore the user may readily swap in whatever versionof PSO is deemed most suitable for the learning task athand Our methodrsquos incorporation of basic PSO would beparticularly advantageous in cases where the topology of aphysical network was being optimized

The work presented here involves the optimization ofa relatively recently developed class of recurrent neuralnetwork models known as echo state networks (ESNs) [34]Echo state networks have already been successfully applied toa wide range of different problems [35ndash45]They consist of aninput layer a hidden layer or ldquoreservoirrdquo and an output layer(shown in Figure 1) Typically each neuron in the input layerconnects to every neuron in the reservoir there is randomlygenerated sparse connectivity among reservoir neurons eachneuron in the reservoir connects to each neuron in the outputlayer a bias neuron may connect to the neurons in thereservoir and sometimes connections from the input layer tothe output layer and from the output layer to the reservoirare present The central innovation of the echo state networkapproach is that only the weights on the connections from thereservoir to the output neurons (output weights) are trainedand the activation functions of the output neurons are linearso all that is needed to train them is linear regression Theremaining weights are typically assigned random values Foran echo state network with 119873

119903reservoir neurons and 119873

119900

output neurons the output weights are trained as follows Asequence of training data of length 119871 + 119872 is chosen where119872 gt 119873

119903The first 119871 values of the sequence are passed through

the network in order to remove the effects of the initial stateof the reservoir Then the remaining119872 values are input intothe network and the resulting reservoir states 119890

119894isin R119873119903

for 119894 = 1 119872 are assigned to the rows of the matrixS isin R119872times119873119903 For each network input and resulting reservoirstate 119890

119894 there is a target network output 119889

119894The target network

outputs are assigned to the rows of the matrix D isin R119872times119873119900

such that the 119894th rows of S and D are the correspondingreservoir state and target output pair LetW isin R119873119903times119873119900 be theoutput weight matrix where the 119895th column ofW representsthe weights on the connections from the reservoir to the119895th output neuron Training the output weights amounts tofinding an approximate solution W

119886to the overdetermined

system

SW = D (1)

The output weights W119886are determined by solving (1) in a

ldquoleast squaresrdquo senseSelf-assembly involves the self-organization of discrete

components into a physical structure for example the growthof connections between nodes in a physical network Workin this area has traditionally focused on what we will refer toas the classic self-assembly problem which entails the designof local control mechanisms that enable a set of componentsto self-organize into a given target structure without indi-vidually preassigned component positions or central controlmechanisms Issues surrounding self-assembly have been avery active research area in swarm intelligence over the lastseveral years with recent work spanning computer simula-tions [46ndash51] physical robotics [52ndash57] and the modeling ofnatural systems [1 12] However to the best of our knowledgethere has been no work on using swarm intelligence methodsto extend the classic self-assembly problem to functional self-assembly in which components self-organize into a comput-ing structure that is optimized for a predefined computationalproblem

The research presented in this paper is concerned withthe self-assembly of neural network architectures Unlikemost past work on self-assembly a major aspect of thework presented here involves the growth of connectionsbetween discrete spatially separated nodescomponents [58]We recently demonstrated how swarm intelligence in theform of collective movements can increase the robustness ofnetwork self-assembly and enhance the ability to grow largetopologically complex neural networks [50] However thisearlier work focused on the classic self-assembly problemwhere a target network was prespecified which is in contrastto the more difficult problem of functional self-assembly thatwe consider here

Other related past works in computer science andengineering where research on artificial neural networksis concerned with application-related performance havelargely ignored the issues of neural network growth devel-opment and self-assembly with two exceptions First anumber of computational techniques have been created tooptimize neural network architectures by addingdeletingnodesconnections dynamically during learning [59ndash63]

Computational Intelligence and Neuroscience 3

Unlike the approach taken here these past network construc-tion methods do not involve growth or self-assembly in aphysical space and so are not considered further Second atechnique known as developmental encoding has been usedby researchers evolving neural network architectures withgenetic algorithmsprogramming [64ndash70] Unlike the workpresented here in these past techniques different individ-uals within a population do not directly interact with oneanother during the development process Such interactionoccurs only indirectly through the processes of selection andcrossover

Both weights and topology affect a neural networkrsquos per-formance To date substantially more focus has been placedon techniques for optimizing a neural networkrsquos weights asopposed to its topology (the number and connectedness ofits nodes) One of the primary reasons for this discrepancyis that the space searched by an optimization method for agood set of weights (the ldquoweight spacerdquo) is continuousThus agood set of weights can be found using one of awide variety ofpowerful and well-studied optimization techniques based onlocal gradient information [71] Additionally global optimiza-tion techniques such as particle swarm optimization (PSO)evolutionary computation (EC) and simulated annealinghave proven to be very effective at weight optimization

When optimizing in continuous domains specificallymultidimensional Euclidean space R119899 optimization algo-rithms tend to operate under the basic heuristic (referredto here as the continuity-heuristic) that given two points ina search space each of which represents a good solutionit is likely that a better solution exists somewhere betweenor around these points This heuristic has been found to begenerally useful in optimizing objective functions definedon R119899 The ldquotopology spacerdquo however is a discrete spaceThe discrete nature of this search domain coupled with theintrinsic interdependence between neural network weights(parameters) and topology (structure) results in a varietyof additional challenges not encountered when optimizingthe weights of a network with a fixed architecture Firstit is common for many of the nodes in a neural networkto be identical from a computational perspective such asthe nodes in the hidden layer which means that manypairs of points in the topology space that are far apart willrepresent identical or very similar topologies Second certainconnections may influence performance much more thanothers This effect depends on factors such as a networkrsquostopology the learning algorithmused and the computationalproblem a network is tasked with solving This meansthat in typical representations of the topology space wherea network that has an input-to-output connection wouldhave a nearly identical representation to one that does nothave such a connection (all other connections being thesame) many points (topologies) that are near each otherwill represent network architectures that are associated withvastly different fitness valuesThird the quality of a particulartopology is dependent on the set of weights associatedwith it and vice versa This interdependence means thatrather than having a fixed fitness value a point (topology)in the topology space has a distribution of fitness valuesgenerated by associating different sets of weights with its

connectionsThis fact increases the ldquoroughnessrdquo of the fitnesslandscape defined over the topology space The first of theabove characteristics implies that the distance between twopoints in the topology space often does not accurately reflectthe similaritydissimilarity of the topologies represented bythe points The second and third characteristics coupledwith the discrete nature of the topology space imply thatnearby points often represent topologies with very differentfitness values which produces very rough fitnessobjectivefunctionsTherefore the properties thatmake the continuity-heuristic useful in theweight space are largely absent from thetopology space

If we could find a more integrated means of represent-ing neural network weights and topology such that thesearch domain consisted of a single continuous ldquoweight-topology spacerdquo then this representation might preserve thecontinuity-heuristic and permit smoother objective func-tions This is precisely what we have done The integra-tion involves representing weights and topology using self-assembling neural networks that grow through a single con-tinuous three-dimensional space Our approach makes useof the fact that given two neural networks with differenttopologies if the connections that these networks do nothave in common have weights that are small enough inmagnitude then the networks will have roughly the sameeffective topologies in the sense that signals transmitted viathese connections will be highly attenuated and thus tend tohave very little influence on network dynamics Thereforefrom the perspective of network performance it is as ifthe connections are not actually there The idea of a neuralnetwork having an effective topology is a key concept inthe work presented here It explains why our approach cansimultaneously optimize both weights and topology whileoperating over a continuous space Specifically as long asthe weight threshold that triggers the removal (addition) ofa connection is small enough then traversing this thresholdin the direction that results in the removal (addition) ofa connection will yield a network with a new topologybut which has nearly identical dynamics compared to itscounterpart network This is the case because the removed(added) connection would have a weight with magnitudenear 0 Thus if the objective function depends only onnetwork dynamics then this newnetwork and its counterpartwill evaluate to nearly identical values under the objectivefunction

The primary contributions of the work presented herearise from addressing the challenges inherent in the simulta-neous optimization of neural network weights and topologyThe first contribution is the development of a means ofusing network self-assembly as a representation of the searchdomain encountered in this optimization problem therebysimplifying the domain to a single continuous space Secondour work puts forward a new way of viewing PSO thatinvolves a population of growing interacting networks asopposed to particles This adaptation is used to turn theself-assembly process into an optimization process and tothe best of our knowledge is the first demonstration ofsuch a complimentary relationship between self-assemblyand particle swarm optimization Third the version of PSO


that our work incorporates which is a particularly elegantform of the algorithm has not been previously used to simul-taneously optimize neural network weights and topologyLastly we demonstrate the effectiveness of a software-basedimplementation of the integration of self-assembly and PSOby using it to grow high-quality neural network solutionsto a variety of challenging benchmark problems from thedomains of time-series forecasting and control

2 Methods

In this section we introduce our model that represents anextension of the traditional self-assembly problem in that thegrowth of network structures is based on optimality criteriaand not on target structures that are specified a priori

21 Integrating Self-Assembly and Particle Swarm Optimiza-tion We now present the details of our model for simultane-ously optimizing neural network weights and topology Wecall this model SINOSA which stands for Swarm IntelligentNetwork Optimization through self-assembly In this modelgroups of growth cones that belong to different networkssimultaneously grow through the same three-dimensionalspace During the growth process the growth cones from dif-ferent networks interact with one another through a mecha-nism inspired by particle swarm optimization Concurrentlythe networks receive input derived from a computationalproblem that they must learn to solve The combination ofthis interaction and the activity run through the networksduring the development process leads to the self-assemblyof neural networks with weights and topologies that areoptimized for solving the problem at hand An animationthat exemplifies the growth process can be viewed at sup-plied URL (see Supplementary Material available online athttpdxdoiorg1011552015642429)

211 Objects and Relations Throughout this section theconcrete example of the model illustrated in Figure 2 isreferenced for clarification The SINOSA model consists of aset of cellsCwith fixed positions in 3D space that are assigneda priori The cells represent neuron cell bodies Each cell119888119894isin C has a set N

119888119894

sube C which may be empty of ldquoneighborrdquocells that it can connect to where 119894 = 1 2 |C| In Figure 2the three large grey spheres represent cells C and each cellis allowed to connect to any other cell including itself ThusN119888119894

= C for 119894 = 1 2 3Each growing network consists of the same set of cells C

and a unique set of growth cones that guide the networkrsquosaxons through the three-dimensional space Given 119899 simul-taneously growing networks each cell 119888

119894has 119899 sets of growth

cones G119894119895 where 119895 = 1 2 119899 Any given cell 119888

119894contributes

the same number of growth cones to each growing networkThat is for all 119895 and ℓ |G

119894119895| = |G

119894ℓ| ensuring that all of

the growth cone neighborhoods (explained below) amongthe growth cones in G

119894= ⋃119895G119894119895are of the same size

If N119888119894

is empty then so is G119894119895 for all 119895 The 119895th growing

network gnet119895consists of the set of cells C and the set

of growth cones G119895= ⋃119894G119894119895that produce the networkrsquos

growth That is gnet119895is defined by the ordered pair ⟨CG

119895⟩

Because each growing network consists of the same set of cellsC they all have exactly the same number of growth cones(|G119895| = |G

ℓ| where 119895 ℓ = 1 2 119899) In Figure 2 the small

circles represent growth cones and the lines that connectthe cells and growth cones are axons In this case there are119899 = 3 growing networks each having six growing axonsThe growing axons of any particular network are shown ina unique line-style (solid dotted or dash-dot) To clarifythis Figure 2(b) shows only the solid-line growing networkFigure 2(c) shows the static network that is derived fromthe solid-line growing network which is described in thenext section Figure 2(a) illustrates how all three networkssimultaneously grow through the same space and share thesame three cells

Let 119888119896isin N119888119894

be the 119896th neighbor cell of 119888119894 Then for

each 119888119896 the cell 119888

119894contributes exactly two growth cones (119892119904

119894119895119896

for 119904 isin ldquo + rdquo ldquo minus rdquo) to each of the growing networks119895 = 1 2 119899 When 119904 = ldquo+rdquo the growth cone representspositively weighted connections and when 119904 = ldquominusrdquo thegrowth cone represents negatively weighted connectionsThepositive-negative growth cone pair 119892+

119894119895119896and 119892minus119894119895119896

may establishconnections to exactly one target cell namely 119888

119896 Based on

these relations and since N119888119894

= C for 119894 = 1 2 3 in Figure 2each of the cells contributes three positive-negative growthcone pairs to each of the three growing networks Howeverfor the sake of clarity only 6 of the 18 growing axons pernetwork are shown

212 Interpreting Network Growth as Self-Assembly TheSINOSA model grows neural networks that in their com-pleted form have fixed connections Thus it is necessary tointerpret the positions of a growing networkrsquos growth conesrelative to their target cells so as to map a growing networkgnet119895 to a static network snet

119895 In particular if the positive-

negative growth cone pair 119892+119894119895119896

and 119892minus119894119895119896

from cell 119888119894and

growing network gnet119895are both positioned so as to be able

to establish a connection to cell 119888119896 then the weight on the

connection from 119888119894to 119888119896in the static network snet

119895is the

sum of the individual weights contributed by the growth conepair

In the SINOSA model the function from the space ofgrowing networks to the space of static networks is designedaround the need to create a neural network representationthat more closely integrates the concepts of topology andconnection weight so that the canonical PSO algorithm canbe used to optimize these network characteristics effectivelyThis function is implemented as follows Each growth coneis considered to be at the center of its own sphericallysymmetric ldquoweight fieldrdquo that is finite in extent and itscorresponding weight has a magnitude that decreases to zeroas the distance from the growth cone increases A growthcone establishes a connection with a target cell if the cellis within the boundary of its weight field otherwise noconnection is created The spacing between cells is such thatno more than one cell can be within a growth conersquos weightfield at any given time The weight on the connection is


+

++

+

+

+

+

+

+

minus

minus

minus

minus

minusminus

minus

minus

minus

(a)

13

10

13

22

15

05

07

++

+

minus

minus

minus

(b)

04

035

minus025

minus065

(c)

Figure 2 Three growing neural networks and their interpretations as static neural networks based on the SINOSA model The three largespheres represent cells the smaller circles represent growth cones and the lines between cells and growth cones denote connections (axons)The growth cones that are drawn with a ldquo+rdquo symbol have a positive weight field and those that are drawn with a ldquominusrdquo symbol have a negativeweight field (a)The growth cones and axons that belong to a particular growing network are all shown with the same line-style (solid dottedor dash-dot) The straight dashed lines between growth cones indicate two of the six growth cone neighborhoods Growth cones within aneighborhood interact with one another according to the canonical PSO algorithm All three growing networks share the same three cells(b) The solid-line growing network is shown without the other two (c) The corresponding static network to which the solid-line growingnetwork is mapped based on the proximity of its growth cones to their target cells The numbers represent connection weights

the value of the field at the target cellrsquos center Formally aweight field is a function from R to R with the form

119908 (119903) =

119886119903120572

+ 119887 if 119903 lt 1199030

0 if 119903 ge 1199030

(2)

where 119886 119887 isin R 119903 ge 0 is the distance of the target cell from thecenter of the growth cone 1199030 gt 0 is the extent of the weightfield and 120572 gt 0 In the SINOSA model it is assumed that119908(119903) rarr 0 as 119903 rarr 1199030 Thus 119908(1199030) = 119886119903

120572

0 + 119887 = 0 whichimplies that 119886 = minus1198871199031205720 = minus119908(0)119903

120572

0 Figures 2(b) and 2(c)

illustrate how one of the three interacting growing networksshown together in Figure 2(a) is mapped to a static networkbased on the weight field interpretation of the growth conesrsquopositions relative to their target cells The growth cones thatare drawn with a ldquo+rdquo symbol have a positive weight fieldrepresented by the function

119908+(119903) =

minus12119903 + 1 if 119903 lt 2

0 if 119903 ge 2(3)


where 119903 is the distance between a growth cone and one ofits target cells The growth cones that are drawn with a ldquominusrdquosymbol have a negative weight field expressed by the function

119908minus(119903) =

12119903 minus 1 if 119903 lt 2

0 if 119903 ge 2(4)

Thus in this scenario the weights are restricted to the interval[minus1 1] Figure 2(b) shows the solid-line networkrsquos growingaxons along with the distance between each growth cone andits nearest target cell Figure 2(c) shows the static networkderived from the solid-line growing network The numbershere are the connection weights This mapping occurs asfollows The cell shown in the lower left hand corner ofFigures 2(b) and 2(c) establishes a connection with weight119908minus(15) = minus025 to the upper cell but does not establish

a connection with the cell in the lower right hand cornerbecause 119908

+(22) = 0 The upper cell makes a connection

to the lower left hand cell with weight 119908+(05) + 119908

minus(13) =

04 The lower right hand cell makes a connection to thelower left hand cell with weight 119908

+(13) = 035 and it

makes a connection to the upper cell with weight 119908minus(07) =

minus065 The other two growing networks are mapped to theircorresponding static network representations in an analogousmanner

The function that maps growing networks to static net-works is formulated so that a small change in the positionof a growth cone produces a small change in the weighton a connection or if the change in position results in theaddition or removal of a connection then the added orremoved connection has a weight that is small in magnitudeIn other words a small change in the physical configurationof a growing network will produce a small change in theweights and topology of the static network to which itis mapped This characteristic coupled with the fact thatnetwork optimization via the SINOSA model occurs in asingle continuous weight-topology space results in muchsmoother objective functions

213 Incorporating PSO Using network self-assembly as ourrepresentation scheme yields a single continuous weight-topology space that needs to be searched during the opti-mization process In other words we need to extend theclassic self-assembly problem to functional self-assemblyGiven that we have a single continuous search domain awide variety of different optimization algorithms could beused for this purpose However we chose to use the PSOalgorithm because it is intuitive and effective and our modelincorporates a version of the algorithm for which applicationto the concurrent optimization of neural networkweights andtopology has not previously been explored Specifically weutilize one of the most basic formulations of PSO which willbe referred to as canonical PSO Canonical PSO specifies thatthe particles velocities are governed by

V119894(119905 + 1) larr997888 120594 (V

119894(119905) + 119886

1199011 otimes (119887119890119904119905119894 minus 119903119894) + 1198861198992

otimes ( 119899119887119890119904119905119894minus 119903119894))

(5)

where 119903119894(119905) is the position of the 119894th particle at time 119905 V

119894(119905)

is its velocity 119887119890119904119905119894

is the current best position of the 119894thparticle 119899

119887119890119904119905119894is the best position among any of its neighbor

particles 120594 is a scaling factor known as the constrictioncoefficient 119886

119901and 119886

119899are positive constants 1 and 2 are

vectors whose components are drawn from the uniformprobability distribution over the unit interval and the otimessymbol represents the component-wise vector product (ie[1198861 1198862] otimes [1198871 1198872] = [11988611198871 1198862b2]) It is standard practice toupdate the positions of the particles using a Forward Eulerstep with a step-size of 10 that is 119903

119894(119905 + 1) larr 119903

119894(119905) + V

119894(119905 + 1)

The appeal of this version of PSO lies in its simplicity andin its proven effectiveness on a wide range of optimizationproblems

In order to integrate particle swarm optimization andself-assembly the particles in PSO are viewed as being partof a larger structure Almost all implementations of PSOconsider the particle to be the fundamental type of objectcapable ofmovement and interaction during the optimizationprocess In the research presented here the growing networkplays the role of the fundamental type of object involvedin the optimization process That is instead of a populationof moving particles there is a population of growing net-works The transition from particles to networks is achievedby having growth cones play the role that particles do intraditional PSO Growth cones occur at the leading tips ofgrowing axons (connections) and guide their movementsthrough the physical space The growth conesrsquo movementsare dictated by the canonical PSO equation (5) and becausegrowth cones occur at the leading tips of growing axons theirmovements generate network growth Unlike in traditionalPSO the position of a growth cone (particle) however it isinterpreted is only meaningful when the axonneuron that itis a part of is taken into account

Since the growth cones from different growing networksinteract with one another according to the canonical PSOalgorithm during the self-assembly process each growth conemust be assigned a quality (fitness) value that indicates theusefulness of the best solution component (connection) thegrowth cone has found and it must remember its personalbest position which represents the best connection found bythe growth cone up to the current point in the growth processSpecifically at each discrete time-step 119905 isin N the performanceof each static network snet

119895(119905) on some set of training data is

determined where 119895 = 1 2 119899 For each growing networkgnet119895(119905) if the performance of snet

119895(119905) is better than the

performance of snet119895(119905 minus 120591) for all 120591 isin N such that 0 lt

120591 le 119905 then the fitness value of gnet119895(119905) or more specifically

its growth cones 119892119904119894119895119896isin G119895 is set to the performance value

of snet119895(119905) and the personal best position of each growth

cone 119892119904119894119895119896

is set to its current position In theory it is possibleto determine the fitness of each growth cone in a networkindividually rather than collectively at the network level Todo this one would need to determine a fitness value propor-tional to E[Network Performance | Growth Cone Weight]the expected value of the networkrsquos performance as a functionof the growth conersquos weight We chose the former approachfor two reasons First computing such an expectation value


requires averaging network performance over a very largenumber of possible growth cone positions that constituteinstantiations of different networks Second each individualconnection has only minimal influence on network perfor-mance and thus optimizing them individually tends to leadto convergence on suboptimal solutions

Any growth cone 119892119904119894119895119896

must have a set of neighbor growthcones N

119892119904

119894119895119896

that influence its movements As in most imple-mentations of PSO the research presented here adheres to thecondition that the neighbor relation is symmetric That is if119892119904

119894ℓ119896is a neighbor of119892119904

119894119895119896 then119892119904

119894119895119896is a neighbor of119892119904

119894ℓ119896There is

a wide variety of different ways that a growth conersquos neighborscould be selected However certain characteristics of the self-assemblyoptimization process limit the number of usefulchoices It is an underlying assumption of the PSO algorithmthat the closer two neighbor particles get to one another themore similar the solutions or solution components that theirpositions represent are It is essential for the effectivenessof the PSO algorithm that if two growth cones 119892119904

119894119895119896and

119892119904

119894ℓ119896are neighbors and they occupy the same position then

they represent exactly the same weighted connection in theirrespective static networks snet

119895and snet

ℓ

In the SINOSA model if two growth cones occupy thesame position but are guiding axons from different cells thenthey represent two completely different connections (solutioncomponents) Likewise if two growth cones occupy the sameposition but do not have exactly the same set of target cellsthen they may represent different connections These twoscenarios and the need for growth cones that are neighborsto avoid circumstances where they occupy the same positionand yet represent different weighted connections lead tothree necessary growth cone neighborhood properties Firstif a pair of growth cones are neighbors then they mustbe guiding axons from the same cell Second if a pair ofgrowth cones are neighbors then they must have exactlythe same set of target cells Third if a pair of growth conesare neighbors then their weight fields must be expressed bythe same function The following is a simple and effectiveway of choosing a growth conersquos neighbors such that theseproperties are satisfied For any cell 119888

119894and growing network

gnet119895 the neighbor growth cones of the growth cone 119892119904

119894119895119896isin

G119894119895with target cell 119888

119896and sign 119904 are members of the set

N119892119904

119894119895119896

sub 119892119904

119894ℓ119896isin G119894ℓ| ℓ = 1 2 119899 In Figure 2(a) the

dashed lines between growth cones explicitly show two ofthe 18 growth cone neighborhoods (only 6 of the 18 growingaxons per network are shown) Because each growth coneneighborhood consists of three growth cones connected ina ring topology N

119892119904

119894119895119896

= 119892119904

119894ℓ119896isin G119894ℓ| ℓ = 1 2 3 and ℓ = 119895

When the SINOSA model is used to grow a networkthat is optimized for a computational problem on everytime-step of the growth process the performance of eachstatic network is evaluated and used to update the fitnessvalues of the growth cones The positions of the growthcones are then updated according to the canonical PSOalgorithm Then on the next time-step the new physicalconfigurations of the three growing networks are mappedto their corresponding static networks and the evaluation

Mackey-Glass time-series

y

t

004

06

08

1

12

14

100 200 300 400 500 600 700 800 900 1000

Figure 3 An example of the time-series generated by (6) withparameters 120572 = 02 120573 = 100 120574 = 01 and 120591 = 17

process repeats The growth process terminates and the bestperforming static network found during the growth processis returned after a predefined number of time-steps or onceone of the static networks satisfies a prespecified performancecriterion

22 Experimental Methods Here we cover the implemen-tation details of the SINOSA model when it is used tooptimize neural networks for the Mackey-Glass time-seriesforecasting problem and the double pole balancing problemThese problems were selected because they are challengingand widely used benchmark tasks in the domains of time-series forecasting and control and a wide variety of differentneural network trainingoptimization algorithms have beenused in solving them

221 Computational Test Problems The first problem isforecasting the chaotic Mackey-Glass time-series [39 72 73]The time-series is generated by the delay differential equation

119889119910

119889119905=120572119910 (119905 minus 120591)

1 + 119910 (119905 minus 120591)120573minus 120574119910 (119905) (6)

where 120572 120573 120574 and 120591 isin R+ When 120591 gt 168 the time-series ischaotic Figure 3 shows a sample of the time-series

The second problem is the double pole balancing prob-lem (see Figure 4) which is a classic benchmark controlproblem particularly for neural network controllers (neuralcontrollers) [74ndash76] The double pole balancing problemconsists of using a controller to balance two poles withdifferent lengths that are hinged to the top of a cart thatmovesalong a track of finite length The controller attempts to keepthe poles up-right by applying a force 119865

119888to either side of the

cart in a direction parallel to the track To be successful thecontroller must keep the cart within a specified distance 119909limitfrom the center of the track and itmust keep each pole withina specified angular limit 120579limit from the verticalThe equationsgoverning the dynamics of a cart with119873 poles can be foundin [76]

222 Implementation Details The SINOSA model is imple-mented as a simulation environment written in Java Thecomputational experiments presented in Section 3 each ranon a computer with two quad-core 233GHz Intel Xeonprocessors 8 GB of shared RAM and 12MB of L2 cache


1205792

1205791

Fc

x

Figure 4 The cart-pole system used in the double pole balancingproblem The state of the system is defined by the position 119909 of thecart relative to the center of the track and the angular positions 1205791and 1205792 of the large and small poles relative to the verticalThe controlforce 119865

119888is applied to the side of the cart in a direction parallel to the

track

per processor The computational requirements listed hereare for the growth of echo state networks using collectivemovements unless stated otherwise The environment inwhich networks grow was unbounded and no particularunits were assigned to time and distance The components ofgrowing networks (cells axons and growth cones) were notable to collide with one anotherThe cellsrsquo positions remainedfixed throughout the growth process Unless stated otherwisein every experiment the positions of the cells were fixed on acentered rectangular lattice with 80 distance-units betweenadjacent lattice points there were 16 growing networks andthe growth cone neighborhoods adhered to a von Neumanntopology (square lattice with periodic boundary conditions)

The dynamics of the growth cones are governed by thecanonical PSO equation (5) where 119886

119901= 119886119899= 20120594 = 065 for

the Mackey-Glass experiments and 120594 = 0725 for the doublepole balancing experiments For all of the experiments eachgrowth conersquos weight field was linear (120572 = 1) and had a radius1199030 = 20 By convention one time-step in the SINOSAmodelis equivalent to 10 unit of time

Many of the experimental results presented in Section 3are compared to control cases that incorporated randomgrowth conemovements as opposed to collectivemovementsdriven by the canonical PSO equation generated as followsAt every time-step of the growth process each growth coneis placed at a unique randomly selected position that is lessthan a distance 1199030 from the center of its target cell where 1199030is the extent of the growth conersquos weight field This way theweights on the connections are randomly generated

For each of the computational problems discussed inSection 221 the SINOSA model was used to grow echostate networks as solutions The computational experimentsdescribed in the Results were designed to test the optimiza-tion capabilities of the SINOSA model

3 Results

31 Mackey-Glass Time-Series In all of the experiments thatinvolve the Mackey-Glass time-series the parameters of (6)were set to the following commonly used values 120572 = 02120573 = 10 120574 = 01 and 120591 = 17 These values yield aldquomildlyrdquo chaotic time-series The time-series was generatedby solving (6) using the Matlab delay differential equationsolver dde23 with a maximum step-size of 10 a relativeerror tolerance of 10minus4 and an absolute error tolerance of10minus16 For every time-series generated from (6) an initialsequence of data points was randomly generated from theuniform probability distribution over the interval [0 1] and(6) was integrated for 1000 time-steps before collection ofthe time-series data began This initial run-off period wasnecessary to remove the effects of the randomly generatedinitial condition Consecutive data points in the sequencesgenerated by the Mackey-Glass system were separated by10 units of time Data from the Mackey-Glass system wasmade more appropriate for processing by neural networksby mapping it into the interval [minus1 1] using the hyperbolic-tangent function tanh(119909) = (1198902119909minus1)(1198902119909+1) Network outputwas mapped back to the original range using the inverse oftanh(119909) for testing validation and analysis Each reservoirneuron used 119891(119909) = tanh(119909) as its transfer function and hadan internal state governed by the leaky integrator differentialequation [34] The output neuron used the linear transferfunction 119891(119909) = 119909 and did not have an internal state

The ESNs grown for the Mackey-Glass time-series con-sisted of a single input neuron a bias neuron a singleoutput neuron and 50 neurons in the reservoir The growthcones were permitted to establish connections from the inputneuron to the reservoir neurons from the bias neuron to thereservoir and from the reservoir back to the reservoir Addi-tionally each reservoir neuron had a permanent connectionto the output neuron because the weights on the reservoir-to-output connections were derived using linear regressionTheldquoEcho State Propertyrdquo which is a necessary condition on thereservoir dynamics for achieving good performance and inpast work has typically been attained by manually scaling thespectral radius of the reservoir weight matrix to be less thanone [34] was derived entirely through the network growthprocess

The input neuronrsquos set of neighbor neurons was the entirereservoir as was the case for the bias neuron Each reservoirneuronrsquos set of neighbor neurons consisted of 5 randomlyselected neurons in the reservoir The output neuron didnot have any neighbor neurons because it did not have anygrowing axons For each one of the 16 simultaneously growingnetworks each neuron contributed one positively weightedgrowth cone (119887 = 1 in (2)) and one negatively weightedgrowth cone (119887 = minus1) per neighbor neuron Each of thesepositive-negative growth cone pairs had the same singletarget neuron

On every time-step each growing networkwasmapped tothe static network represented by its current physical config-uration Before applying input to a network the internal stateandor output of each neuron was set to zero For each staticnetwork the weights on the connections from the reservoir


neurons to the output neuron were trained using teacher-forcing [39] with a sequence generated from the Mackey-Glass system that had 2100 data points The first 100 datapoints of this sequence were fed into a network prior to anytraining with the purpose of removing the effects of the initialstate of the reservoir The next 2000 data points were thenfed into the network and linear regression in the form of theleast squares method was performed between the resultingreservoir states (activities of the reservoir neurons) and thedesired network outputs The topology and weights of theconnections from the input neuron to the reservoir from thebias neuron to the reservoir and from the reservoir neuronsback to the reservoir were determined by the growth process

The performance of an ESN on the data used to trainthe output weights is typically not a good measure of thenetworkrsquos ability to generalize to new data [77] Thus duringthe growth process the generalization performance of eachstatic network was computed on every time-step and usedto update the fitness values of the growing networks Thegeneralization performance measure was the normalizedroot mean square error computed over a set of 84-steppredictions (NRMSE84) [34] Specifically on every time-step after training the output weights the NRMSE84 wascomputed for each static network on a group of 20 randomlyselected sequences from the Mackey-Glass system Each ofthese sequences consisted of 184 data points In order toprevent overgeneralization every 10 time-steps a new setof 20 sequences was randomly pulled from a pool of 200different sequences At the end of the growth process whichlasted for 3600 time-units the best positions of the growthcones in each growing network represent the best performingnetworks found over the course of the entire growth processThere is one best performing network per growing networkand it is instantiated by translating the best positions ofthe networkrsquos growth cones into the corresponding staticnetwork The performances of the best static networks werevalidated by computing the NRMSE84 of each network using100 new sequences each of a length of 2084 The bestperforming network on this validation data was taken as thesolution Towards the end of a growth process the growingnetworks tend to converge on a particular configuration andthis final validation step ensures that the solution network hasthe best generalization performance

For the SINOSA model 37 trials were run with collec-tive movements generated by canonical PSO and 38 trialswere run with random movements In the Mackey-Glassexperiments on average one epoch of growth (explainedin Section 31) of an ESN with 50 neurons in its reservoirrequires 2 hours of CPU time One epoch of growth of anESNwith 100 neurons in its reservoir requires 5 hours of CPUtime One epoch of growth of an ESN with 400 neurons in itsreservoir requires 3 days of CPU time The networks grownusing collective movements have a mean NRMSE84 that is68 smaller than those grown with randommovements with95 confidence

Once the growth process had finished (after 3600 time-units) the grown networks were further optimized by refiningthe search process and running it for an additional 3600time-units This refinement was implemented by continuing

Table 1 Mean NRMSE84 values on the Mackey-Glass time-seriesfor networks grown with the SINOSA model

Epoch Collective movements Random movements1 589 sdot 10minus3 plusmn 33 sdot 10minus4 184 sdot 10minus2 plusmn 8 sdot 10minus4

2 498 sdot 10minus3 plusmn 32 sdot 10minus4 148 sdot 10minus2 plusmn 7 sdot 10minus4

the growth (search) process with new growth cones thathad weight fields that were smaller in maximum magnitudeSpecifically for each connection in the best performing staticnetwork found during the first epoch except the connectionsfrom the reservoir to the output neuron a fixed connectionwith the same weight was created between the correspondingcells in the setCWhen a static networkwas instantiated froma growing network during the second epoch the weights onthe connections in the static network were the sum of theweight values contributed by the growth cones and the fixedconnections

The network growth process generated by the SINOSAmodel was extended by one epoch for a total of twoepochs of growth Table 1 compares the results obtained usingcollective movements with those obtained using randommovements when the growth process was extended Eachnumeric value represents the mean NRMSE84 and the 95confidence interval for the corresponding epoch of growthand class ofmovements It can be seen that for both collectiveand random movements there is a small but statisticallysignificant reduction in the mean NRMSE84 with each epochof growth Furthermore for each epoch the mean NRMSE84of the networks grown using collective movements is smallerthan that of the networks grown using randomly generatedmovements

Further evidence of the effectiveness of the SINOSAapproach can be gained through comparison with the studiespresented in [34 39] which represent state-of-the-art perfor-mance forMackey-Glass time-series prediction In [34] echostate networks with 400 neuron reservoirs were optimized toforecast the Mackey-Glass time-series (120591 = 170) The bestperforming of these networks which was hand-designed byan expert had an NRMSE84 of 28 sdot 10

minus4 In [39] using thesame parameter values for the Mackey-Glass time-series asused here an echo state networkwith a 1000-neuron reservoirwas hand-designed by an expert that had an NRMSE84 of63 sdot 10minus5 The SINOSA model was used to grow echo statenetworks with 400 neurons in their reservoirs to forecast theMackey-Glass time-series The grown networks produced anaverageNRMSE84 of 386sdot10

minus5 and the best of these networkshad an NRMSE84 of 273 sdot 10minus5 On average the grownnetworks with 400 neurons outperformed the best hand-designed 400-neuron ESN by about an order of magnitudeand they also performed better than the 1000-neuron ESNThese results provide strong evidence of the effectivenessof using the SINOSA model to grow echo state networksas opposed to the standard approach of optimizing themthrough trial and error

32 Double Pole Balancing Problem In all of the experi-ments that dealt with the double pole balancing problem


the parameters were set to the most commonly used values[75] as follows mass of the cart 119898

119888= 1 kg mass of the

1st pole 1198981 = 01 kg mass of the 2nd pole 1198982 = 001 kgcoefficient of friction between the cart and the track 120583

119888=

5 sdot 10minus4 Nsm coefficients of friction between the poles andtheir hinges 1205831 = 1205832 = 2 sdot 10minus6 Nms length of the 1st pole1198971 = 05m and length of the 2nd pole 1198972 = 005m Thecontrol force was restricted to the interval119865

119888isin [minus10N 10N]

The parameters defining the domain of successful controlwere set to 119909limit = 24m and 120579limit = 36∘ As is the casein most past work the equations governing the dynamicsof the system were solved numerically using a fourth-orderRunge-Kutta method with a step-size of 001 s During asimulation a portion of the state of the cart-pole systemwas given to a neural controller every 002 s at which pointthe control force was updated In the experiments presentedhere a neural controller was not given velocity informationas input rather it only received the current positions of thecart and two poles (119909 1205791 and 1205792) This was done in orderto make the task of controlling the cart more difficult Thesevalues were scaled to be in the interval [minus1 1] prior to beinginput into a neural controllerThis was done so that the valueswere in a range that was more appropriate for processingby neurons with hyperbolic-tangent transfer functions Thenetwork output (control signal) which was in the interval[minus1 1] was multiplied by 100N in order to produce thecontrol force Reservoir neurons and the output neuron usedthe hyperbolic-tangent function as their transfer functionNone of the neurons had an internal state

The SINOSAmodel was used to grow echo state networksas controllers for the double pole balancing problem Thesenetworks had three input neurons one for each type of infor-mation the network was given regarding the state of the cart-pole system (cart position position of pole 1 and position ofpole 2) The reservoir always consisted of 20 neurons Oneoutput neuron was present which produced the control sig-nal No bias neuronwas used due to the symmetry of the cart-pole system The growth cones were permitted to establishconnections from the input neurons to the reservoir and fromthe reservoir neurons back to the reservoir Additionally eachreservoir neuron had a permanent connection to the outputneuron The weights on the reservoir-to-output connectionswere fixed and drawn randomly with uniform probabilityfrom the interval [minus30 30] The network architecture wasotherwise identical to that of the Mackey-Glass network (seethird paragraph of Section 31) except that in this case eachreservoir neuron had only 4 neighbor neurons

During the growth process the performance of each staticnetworkwas computed on every time-stepThe function119891polewas evaluated to determine the performance of the echo statenetworks grown as controllers for the double pole balancingproblem and is given by

119891pole = 10minus4119899119868+ 09119891stable + 10

minus5119899119868119868+ 30

119899119878

625 (7)

Equation (7) was introduced in [75] and is based on per-formance (fitness) functions presented in past works on thedouble pole balancing problem To compute the first termin (7) the cart-pole system is set to the initial state (1205791(0) =

45∘ 1205791(0) = 1205792(0) = 1205792(0) = 119909(0) = (0) = 0) The networkis then allowed to control the system for up to 1000 time-steps The number of time-steps 119899

119868that the controller keeps

the cart and poles in the success domain (119909 isin [minus24m 24m]and 1205791 1205792 isin [minus36

∘

36∘]) is counted If the system leaves thesuccess domain at any time prior to time-step 1000 then thesimulation stopsThe second term is ameasure of the stabilityof the system during the last 100 time-steps while underneural network control and is expressed by the function

119891stable =

0 119899119868lt 100

075sum119899119868

119894=119899119868minus100 120588119894

119899119868ge 100

(8)

where

120588119894= (|119909 (119894)| + | (119894)| +

10038161003816100381610038161205791 (119894)1003816100381610038161003816 +100381610038161003816100381610038161205791 (119894)10038161003816100381610038161003816) (9)

The third and fourth terms are measures of a neural con-trollerrsquos ability to generalize If 119899

119868= 1000 after computing the

first term then the neural controller is allowed to control thesystem for up to additional 100000 time-steps The numberof additional time-steps 119899

119868119868that the controller keeps the

cart and poles in the success domain is counted and thesimulation stops if the system leaves the success domain or119899119868119868= 100 000 The fourth term is computed by putting

the cart-pole system in 625 different initial conditions andallowing the network to control it for up to 1000 time-stepsfrom each starting configuration The variable 119899

119878represents

the number of different initial conditions from which theneural controller was able to keep the system in the successdomain for 1000 consecutive time-steps The 625 uniqueinitial conditions are defined in [74]

On every time-step of the growth process each growingnetwork was mapped to the static network represented by itscurrent physical configuration so that its performance couldbe computed by evaluating (7) Before applying input to anetwork the output of each neuron was always set to zeroBefore a network was permitted to control the cart and polesthe dynamics of the cart-pole system were evolved for 02 sand the resulting sequences of 10 system states were inputinto the network The neural network growth process lastedfor 600 time-units after which the static network with thebest performance (largest value of 119891pole) was taken as thesolution

A total of 51 trials were run starting from differentrandomly generated initial conditions In the double polebalancing experiments 200 time-steps of growth requireapproximately 03 hours of CPU time 400 time-steps require14 hours and 600 time-steps require 33 hours Table 2compares the performance of networks grown using col-lective movements to the performance of networks grownusing random movements The comparison of performanceis made every 200 time-steps during the growth processEach of the numeric values in the tables is shown withits 95 confidence interval The values in Table 2 werecomputed as follows For each trial and at each of the threepredefined time-steps (200 400 and 600) two measures ofthe best performing network at that point in the growth


Table 2 Performance values on the double pole balancing problemfor networks grown with the SINOSA model

Time-step Collective MeasureII Random MeasureII200 0667 [0530 0780] 0026 [0005 0135]400 0961 [0868 0989] 0053 [0015 0173]600 10 [0930 10] 0053 [0015 0173]Time-step Collective Measure

119878Random Measure

119878

200 372 plusmn 29 10 plusmn 5400 462 plusmn 10 28 plusmn 15600 478 plusmn 7 41 plusmn 17

process were recorded The first measure was whether ornot the network succeeded in achieving 119899

119868119868= 100 000

when computing (7) The second measure was the value of119899119878 In Table 2 the term Measure

119868119868refers to the fraction of

best performing networks that achieved 119899119868119868= 100 000 The

term Measure119878refers to the average value of 119899

119878taken over

all of the best performing networks From these results itis clear that the networks grown with collective movementsvastly outperform those grown with randomly generatedmovements on both performances measures

A study that lends itself to comparison is presentedin [75] which represents state-of-the-art performance onthe double pole balancing problem In this case echo statenetworks were optimized as controllers for the double polebalancing problem via a state-of-the-art form of evolutionarystrategies that uses covariance matrix adaptation (CMA-ES)In this study CMA-ES was used to optimize the outputweights and the spectral radius of the reservoir weightmatrixThe experiments discussed in this section in which theSINOSA model was used to grow ESNs as controllers forthe double pole balancing problem adhered to the sameexperimental setup and methods used in [75] except thatin our study the grown neural controllers received only 10inputs from the cart-pole system prior to beginning controlinstead of 20 Because evaluating the fitnessperformanceof the networks during the optimization process is thecomputational bottleneck the number of such evaluationsduring an optimization run is a good measure of the overallcomputational cost of the process On average it required19796 evaluations for the CMA-ES approach to find a neuralcontroller capable of successfully controlling the cart for atleast 200 out of the 625 initial configurations (the average was224) and of these networks 914 were able to successfullycontrol the cart for the additional 100000 time-steps whenit was started in the standard initial configuration Theseresults are very good with respect to past work on the doublepole balancing problem The SINOSA model was able togrowmuch better performing neural controllers and at muchless computational expense After only 9600 evaluationson average the best performing grown networks were ableto successfully control the cart for 478 of the 625 initialconfigurations and of these networks 100 of them wereable to successfully control the cart for the additional 100000time-steps

4 Discussion

The SINOSA model incorporates an integrated representa-tion of a networkrsquos weights and topology The objects inthis representation are cells (neurons) axons and growthcones The cells have fixed positions but the growth conesare able to guide the cellsrsquo axons through a continuousthree-dimensional space producing a mature network withfixed connections and weights As a result of the integratedrepresentation it is possible to incorporate the simplestcanonical form of PSO into the model for the purpose ofsimultaneously optimizing network weights and topologiesIn effect the SINOSAmodel treats the network self-assemblyprocess as an optimization or search process in which thesimultaneous growth ofmultiple neural networks is driven bytheir interactions with one another and with problem relatednetwork input

The ability of the SINOSA model to optimize neuralnetworks for computational tasks was tested using two differ-ent very challenging and widely used benchmark problemsfrom the domains of time-series forecasting and control Foreach of the computational problems the echo state networksgrown using collectivemovements generated via PSOoutper-formed those grown using randomly generated movementsand in most circumstances the performance gap was verylarge Specifically compared to the networks grown withrandom movements those grown using the SINOSA modelwith collective movements performed 3 times better on theMackey-Glass time-series forecasting problem and 19 timesbetter and 12 times better on two generalization measures ofthe double pole balancing problem Furthermore the largeimprovements in network performance gained over randomsearch come at very little additional computational costbecause evaluation of network performance is the bottleneck

Comparison with control cases that involve randomsearch provides a base level of support for the optimizationcapabilities of the SINOSA model and demonstrates thefeasibility of functional self-assembly as a means of networkoptimization Further evidence of the effectiveness of themodel at optimizing networks can be found by comparingthe results presented here with studies that involve differentmethods of optimizing networks for the Mackey-Glass time-series forecasting problem and the double pole balancingproblem For example the 400-neuron echo state networksgrown using the SINOSA model perform nearly an orderof magnitude better than the best performing 400-neuronESN presented in [34] with the Mackey-Glass time-seriesFurthermore they even outperform the 1000-neuron ESNpresented in [39] by an average factor of 16 As a point offurther comparison the networks grown via the SINOSAapproach outperform those in [75] by an average factor of 21Moreover our ESNs were optimized with an average of 52less computational expense These results are also interestingin that they represent one of the comparatively small numberof studies where echo state networks have been successfullytrained as neural controllers using reinforcement learning

It is worth pointing out that the SINOSA model can becast in a more abstract representation by foregoing the self-assembly component Imagine we are optimizing a network


with 119872 possible weighted connections Then according to(2) there are two distinct one-dimensional Euclidean spacesassociated with each possible connection Furthermore thereis a unimodal function 119908

+(119903) that is nonnegative and sym-

metric defined over the first space and a function 119908minus(119903) =

minus119908+(119903) that is defined over the second space Each one

of these spaces would contain a set of particles (growthcones) that was restricted to move within it Only thoseparticles within a given space would interact according tothe PSO algorithm A network would be created based onthe positions of the particles in exactly the same mannerdescribed in Section 212We chose to integrate self-assemblyinto our model from a desire to illuminate the processes bywhich physical networks might optimize their own weightsand topology via self-assembly

5 Conclusions and Future Directions

The concurrent optimization of neural network weights andtopology is a challenging task due in large part to theroughness of the objective functions encountered when thesearch domain consists of both a continuousweight space anda discrete topology space Through the SINOSA model wehave demonstrated how network self-assembly can providea useful means of representing this search domain in that therepresentation simplifies the domain to a single continuoussearch space over which smoother objective functions canbe defined Furthermore by using swarm intelligence in theform of collective movements to drive the network growthprocess we were able to turn the self-assembly process intoan optimization process

The four primary contributions of our research are asfollows

(i) The SINOSA model constitutes a new and effectivemeans of simultaneously optimizing the weights andtopologies of neural networks The model was usedto grow echo state networks that performed substan-tially better on benchmark problems than networksoptimized via random search More importantly thegrown networks also outperformed the echo statenetworks presented in two different past studies onein which the networks were hand-designed by anexpert and the other in which they were optimizedusing a state-of-the-art form of evolutionary strate-gies (CMA-ES)

(ii) There has been little past work on using PSO for theconcurrent optimization of neural network weightsand topology The examples that do exist tend toinvolve fairly complicated adaptations of the methodsignificant constraints on permissible topologies orhybridizations with other classes of methods suchas evolutionary algorithms In contrast the SINOSAmodel uses the elegant canonical form of PSO togovern the growthoptimization process

(iii) In the vast majority of past work on PSO the par-ticles are embedded in a high dimensional abstractspace such as the domain of a function they arethe fundamental class of ldquoobjectsrdquo in the space and

the position of a particle represents a solution orsolution component to the problem being solved Incontrast the SINOSAmodel incorporates a novel wayof viewing PSO in which growth cones (particles) areembedded in a continuous three-dimensional spacethat is intended to model physical space and growingnetworks are the fundamental class of objects in thespace

(iv) Most past work on self-assembly has focused onthe classic self-assembly problem which entails thedesign of local controlmechanisms that enable a set ofcomponents to self-organize into a given target struc-ture The SINOSA model represents an extension ofthe classic self-assembly problem to functional self-assembly which includes the self-assembly of networkstructures with growth driven by optimality criteriadefined in terms of the quality or performance of theemerging structures as opposed to growth directedtowards assembling a prespecified target structure

There are a variety of potential future research directionsfor the SINOSA model Here we mention three possibilitiesFirst it would be useful to extend this work to allow thenumber of neurons in the physical space to be able toincrease or decrease during network assembly dependingon the computational requirements of the problem beingsolved Inspiration could likely be drawn from the fairly largenumber of past studies that involve dynamically modifyingthe number of nodes in a neural network Second furtherstudies are needed to determine towhat extent the parametersof the SINOSA model are problem dependent and whatvalues work well on a wide variety of different problemsLastly since its inception the canonical form of particleswarm optimization has undergone a vast array of adapta-tions and hybridizations Many of these enhancements couldbe incorporated into the SINOSA model without having toalter its fundamental constructs

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

References

[1] G M Whitesides and B Grzybowski ldquoSelf-assembly at allscalesrdquo Science vol 295 no 5564 pp 2418ndash2421 2002

[2] A Engelbrecht Fundamentals of Computational Swarm Intelli-gence John Wiley amp Sons West Sussex UK 2005

[3] J Buhl D J T Sumpter I D Couzin et al ldquoFrom disorder toorder in marching locustsrdquo Science vol 312 no 5778 pp 1402ndash1406 2006

[4] A Cavagna and I Giardina ldquoLarge-scale behaviour in animalgroupsrdquo Behavioural Processes vol 84 no 3 pp 653ndash656 2010

[5] A Dussutour V Fourcassie D Heibing and J-L DeneubourgldquoOptimal traffic organization in ants under crowded condi-tionsrdquo Nature vol 428 no 6978 pp 70ndash73 2004


[6] D Helbing I Farkas and T Vicsek ldquoSimulating dynamicalfeatures of escape panicrdquo Nature vol 407 no 6803 pp 487ndash490 2000

[7] H Hildenbrandt C Carere and C K Hemelrijk ldquoSelf-organized aerial displays of thousands of starlings a modelrdquoBehavioral Ecology vol 21 no 6 pp 1349ndash1359 2010

[8] H Kunz and C K Hemelrijk ldquoArtificial fish schools collectiveeffects of school size body size and body formrdquo Artificial Lifevol 9 no 3 pp 237ndash253 2003

[9] C Lett and V Mirabet ldquoModelling the dynamics of animalgroups inmotionrdquo South African Journal of Science vol 104 no5-6 pp 192ndash198 2008

[10] RWinder and J A Reggia ldquoUsing distributed partialmemoriesto improve self-organizing collective movementsrdquo IEEE Trans-actions on Systems Man and Cybernetics Part B Cyberneticsvol 34 no 4 pp 1697ndash1707 2004

[11] S Bitam M Batouche and E-G Talbi ldquoA survey on beecolony algorithmsrdquo in Proceedings of the IEEE InternationalSymposium on Parallel and Distributed Processing Workshopsand Phd Forum (IPDPSW rsquo10) pp 1ndash8 IEEE New York NYUSA April 2010

[12] E Bonabeau M Dorigo and G Theraulaz Swarm IntelligenceFrom Natural to Artificial Systems Oxford University PressNew York NY USA 1999

[13] M Dorigo G di Caro and L M Gambardella ldquoAnt algorithmsfor discrete optimizationrdquo Artificial Life vol 5 no 2 pp 137ndash172 1999

[14] C J A B Filho F B D L Neto A J C C Lins A I SNascimento and M P Lima ldquoA novel search algorithm basedon fish school behaviorrdquo in Proceedings of the IEEE InternationalConference on Systems Man and Cybernetics (SMC rsquo08) pp2646ndash2651 IEEE New York NY USA October 2008

[15] D Karaboga and B Basturk ldquoOn the performance of artificialbee colony (ABC) algorithmrdquo Applied Soft Computing vol 8no 1 pp 687ndash697 2008

[16] A Rodrıguez and J A Reggia ldquoExtending self-organizingparticle systems to problem solvingrdquo Artificial Life vol 10 no4 pp 379ndash395 2004

[17] X-S Yang ldquoMultiobjective firefly algorithm for continuousoptimizationrdquo Engineering with Computers vol 29 no 2 pp175ndash184 2013

[18] Z-L Gaing ldquoParticle swarm optimization to solving the eco-nomic dispatch considering the generator constraintsrdquo IEEETransactions on Power Systems vol 18 no 3 pp 1187ndash1195 2003

[19] M Omran A P Engelbrecht and A Salman ldquoParticle swarmoptimization method for image clusteringrdquo International Jour-nal of Pattern Recognition and Artificial Intelligence vol 19 no3 pp 297ndash322 2005

[20] Y Rahmat-Samii D Gies and J Robinson ldquoParticle swarmoptimization (PSO) a novel paradigm for antenna designsrdquoTheRadio Science Bulletin vol 305 pp 14ndash22 2003

[21] T Sousa A Silva and A Neves ldquoParticle swarm based datamining algorithms for classification tasksrdquo Parallel Computingvol 30 no 5-6 pp 767ndash783 2004

[22] S Kitagawa and Y Fukuyama ldquoComparison of particle swarmoptimizations for optimal operational planning of energyplantsrdquo in Proceedings of the IEEE swarm intelligence symposium(SISrsquo 03) pp 159ndash165 IEEE June 2005

[23] M Voss and J Howland III ldquoFinancial modelling using socialprogrammingrdquo in Proceedings of the IASTED InternationalConference on Financial Engineering amp Applications pp 1ndash10ACTA Press Calgary Canada 2003

[24] F van den Bergh ldquoParticle swarmweight initialization inmulti-layer perceptron artificial neural networksrdquo in Proceedings ofthe International Conference on Artificial Intelligence pp 42ndash45CSREA Press 1999

[25] X Cai N Zhang G K Venayagamoorthy and D C WunschII ldquoTime series prediction with recurrent neural networksusing a hybrid PSO-EA algorithmrdquo in Proceedings of the IEEEInternational Joint Conference on Neural Networks (IJCNN rsquo04)pp 1647ndash1652 IEEE July 2004

[26] A Engelbrecht and A Ismail ldquoTraining product unit neuralnetworksrdquo Stability and Control Theory and Applications vol2 pp 59ndash74 1999

[27] H-W Ge Y-C Liang and M Marchese ldquoA modified particleswarm optimization-based dynamic recurrent neural networkfor identifying and controlling nonlinear systemsrdquo Computersand Structures vol 85 no 21-22 pp 1611ndash1622 2007

[28] C-F Juang ldquoA hybrid of genetic algorithm and particle swarmoptimization for recurrent network designrdquo IEEE Transactionson Systems Man and Cybernetics Part B Cybernetics vol 34no 2 pp 997ndash1006 2004

[29] SMirjalili S ZMohdHashim andHM Sardroudi ldquoTrainingfeedforward neural networks using hybrid particle swarm opti-mization and gravitational search algorithmrdquo Applied Mathe-matics and Computation vol 218 no 22 pp 11125ndash11137 2012

[30] J R Zhang T M Lok andM R Lyu ldquoA hybrid particle swarmoptimization-back-propagation algorithm for feedforwardneu-ral network trainingrdquo Applied Mathematics and Computationvol 185 no 2 pp 1026ndash1037 2007

[31] M Carvalho and T B Ludermir ldquoParticle swarm optimizationof neural network architectures and weightsrdquo in Proceedings ofthe 7th International Conference on Hybrid Intelligent Systems(HIS rsquo07) pp 336ndash339 IEEE Kaiserlautern Germany Septem-ber 2007

[32] S Kiranyaz T Ince A Yildirim andMGabbouj ldquoEvolutionaryartificial neural networks by multi-dimensional particle swarmoptimizationrdquo Neural Networks vol 22 no 10 pp 1448ndash14622009

[33] C Zhang and H Shao ldquoAn ANNrsquos evolved by a new evolution-ary system and its applicationrdquo in Proceedings of the 39th IEEEConfernce on Decision and Control vol 4 pp 3562ndash3563 IEEESydney Australia December 2000

[34] H Jaeger ldquoThe echo state approach to analysing and trainingrecurrent neural networksrdquo GMD Report 148 FraunhoferInstitute for Autonomous Intelligent Systems (AIS) 2001

[35] E A Antonelo B Schrauwen and D Stroobandt ldquoEvent detec-tion and localization for small mobile robots using reservoircomputingrdquo Neural Networks vol 21 no 6 pp 862ndash871 2008

[36] C Hartland N Bredeche and M Sebag ldquoMemory-enhancedevolutionary robotics the echo state network approachrdquo inProceedings of the IEEE Congress on Evolutionary Computation(CEC rsquo09) pp 2788ndash2795 IEEENewYorkNYUSAMay 2009

[37] G Holzmann and H Hauser ldquoEcho state networks with filterneurons and a delayampsum readoutrdquo Neural Networks vol 23no 2 pp 244ndash256 2010

[38] Y Itoh and M Adachi ldquoChaotic time series prediction bycombining echo-state networks and radial basis function net-worksrdquo in Proceedings of the IEEE 20th International Workshopon Machine Learning for Signal Processing (MLSP rsquo10) pp 238ndash243 New York NY USA September 2010

[39] H Jaeger and H Haas ldquoHarnessing nonlinearity Predictingchaotic systems and saving energy in wireless communicationrdquoScience vol 304 no 5667 pp 78ndash80 2004


[40] D Jing G K Venayagamoorthy and R G Harley ldquoHarmonicidentification using an echo state network for adaptive controlof an active filter in an electric shiprdquo in Proceedings of theInternational Joint Conference on Neural Networks (IJCNN rsquo09)pp 634ndash640 IEEE Atlanta Ga USA June 2009

[41] X Lin Z Yang and Y Song ldquoShort-term stock price predictionbased on echo state networksrdquo Expert Systemswith Applicationsvol 36 no 3 pp 7313ndash7317 2009

[42] M C Ozturk and J C Principe ldquoAn associative memoryreadout for ESNs with applications to dynamical pattern recog-nitionrdquo Neural Networks vol 20 no 3 pp 377ndash390 2007

[43] L Qin and B Lei ldquoDistributed multiagent for NAO robot jointposition control based on echo state networkrdquo MathematicalProblems in Engineering In press

[44] M H Tong A D Bickett E M Christiansen and G WCottrell ldquoLearning grammatical structure with echo state net-worksrdquo Neural Networks vol 20 no 3 pp 424ndash432 2007

[45] Y Xue L Yang and S Haykin ldquoDecoupled echo state networkswith lateral inhibitionrdquoNeural Networks vol 20 no 3 pp 365ndash376 2007

[46] D Arbuckle and A A G Requicha ldquoActive self-assemblyrdquo inProceedings of the IEEE International Conference onRobotics andAutomation (ICRA rsquo04) pp 896ndash901 May 2004

[47] A Grushin and J A Reggia ldquoStigmergic self-assembly of pre-specified artificial structures in a constrained and continuousenvironmentrdquo Integrated Computer-Aided Engineering vol 13no 4 pp 289ndash312 2006

[48] A Grushin and J A Reggia ldquoAutomated design of distributedcontrol rules for the self-assembly of pre-specified artificialstructuresrdquo Robotics and Autonomous Systems vol 56 no 4 pp334ndash359 2008

[49] EKlavins ldquoProgrammable self-assemblyrdquo IEEEControl SystemsMagazine vol 27 no 4 pp 43ndash56 2007

[50] C EMartin and J A Reggia ldquoSelf-assembly of neural networksviewed as swarm intelligencerdquo Swarm Intelligence vol 4 no 1pp 1ndash36 2010

[51] J Werfel and R Nagpal ldquoExtended stigmergy in collectiveconstructionrdquo IEEE Intelligent Systems vol 21 no 2 pp 20ndash282006

[52] J Bishop S Burden E Klavins et al ldquoProgrammable partsa demonstration of the grammatical approach to selforganiza-tionrdquo in Proceedings of the IEEE International Conference onIntelligent Robots and Systems (IROS rsquo05) pp 3684ndash3691 IEEENew York NY USA 2005

[53] R GroszligM Bonani F Mondada andM Dorigo ldquoAutonomousself-assembly in swarm-botsrdquo IEEE Transactions on Roboticsvol 22 no 6 pp 1115ndash1130 2006

[54] E Klavins R Ghrist and D Lipsky ldquoGraph grammars forself assembling robotic systemsrdquo in Proceedings of the IEEEInternational Conference on Robotics and Automation (ICRArsquo04) pp 5293ndash5300 IEEE New Orleans La USA AprilndashMay2004

[55] J Nembrini N Reeves E Poncet A Martinoli and A Win-field ldquoMascarillons Flying swarm intelligence for architecturalresearchrdquo in Proceedings of the IEEE swarm intelligence sympo-sium (SISrsquo 05) pp 233ndash240 IEEE June 2005

[56] M Rubenstein A Cornejo andRNagpal ldquoProgrammable self-assembly in a thousand-robot swarmrdquo Science vol 345 no6198 pp 795ndash799 2014

[57] P White V Zykov J Bongard and H Lipson ldquoThree dimen-sional stochastic reconfiguration of modular robotsrdquo in Pro-ceedings of Robotics Science and Systems pp 161ndash168MITPressCambridge Mass USA 2005

[58] DH Gracias J Tien T L Breen CHsu andGMWhitesidesldquoForming electrical networks in three dimensions by self-assemblyrdquo Science vol 289 no 5482 pp 1170ndash1172 2000

[59] D A Elizondo R Birkenhead M Gongora E Taillard andP Luyima ldquoAnalysis and test of efficient methods for buildingrecursive deterministic perceptron neural networksrdquo NeuralNetworks vol 20 no 10 pp 1095ndash1108 2007

[60] S Fahlman and C Lebiere ldquoThe cascade-correlation learningarchitecturerdquo in Advances in Neural Information ProcessingSystems II SD Touretzky Ed pp 524ndash532MorganKaufmannSan Francisco Calif USA 1990

[61] M Frean ldquoThe upstart algorithm a method for constructingand training feedforward neural networksrdquo Neural Computa-tion vol 2 no 2 pp 198ndash209 1990

[62] B Hassibi D G Stork and G J Wolff ldquoOptimal brain surgeonand general network pruningrdquo in Proceedings of the IEEEInternational Conference on Neural Networks (ICNN rsquo93) vol1 pp 293ndash299 IEEE April 1993

[63] M Marchand M Golea and P Rujan ldquoA convergence theoremfor sequential learning in two-layer perceptronsrdquo EurophysicsLetters vol 11 no 6 pp 487ndash492 1990

[64] J C Astor and C Adami ldquoA developmental model for theevolution of artificial neural networksrdquo Artificial Life vol 6 no3 pp 189ndash218 2000

[65] A Cangelosi D Parisi and S Nolfi ldquoCell division and migra-tion in a lsquogenotypersquo for neural networksrdquoNetwork Computationin Neural Systems vol 5 no 4 pp 497ndash515 1994

[66] J Chval ldquoEvolving artificial neural networks by means of evo-lutionary algorithms with L-systems based encodingrdquo ResearchReport Czech Technical University Prague Czech Republic2002

[67] P Eggenberger ldquoCreation of neural networks based on devel-opmental and evolutionary principlesrdquo in Proceedings of theInternational Conference on Artificial Neural Networks (ICANNrsquo97) W Gerstner A Germond M Hasler and J Nicoud Edspp 337ndash342 Springer Lausanne Switzerland October 1997

[68] F Gruau ldquoGenetic synthesis of modular neural networksrdquoin Proceedings of the 5th International Conference on GeneticAlgorithms (ICGA rsquo93) S Forest Ed pp 318ndash325 MorganKaufmann San Francisco Calif USA 1993

[69] J-Y Jung and J A Reggia ldquoEvolutionary design of neuralnetwork architectures using a descriptive encoding languagerdquoIEEE Transactions on Evolutionary Computation vol 10 no 6pp 676ndash688 2006

[70] H Kitano ldquoDesigning neural networks using genetic algo-rithms with graph generation systemrdquo Complex Systems vol 4pp 461ndash476 1990

[71] Y Chauvin and D Rumelhart Eds Backpropagation TheoryArchitectures and Applications Lawrence Erlbaum AssociatesHillsdale NJ USA 1995

[72] B A Whitehead and T D Choate ldquoCooperative-competitivegenetic evolution of radial basis function centers and widths fortime series predictionrdquo IEEE Transactions on Neural Networksvol 7 no 4 pp 869ndash880 1996

[73] L Zhao and Y Yang ldquoPSO-based single multiplicative neuronmodel for time series predictionrdquo Expert Systems with Applica-tions vol 36 no 2 pp 2805ndash2812 2009


[74] F Gruau D Whitley and L Pyeatt ldquoA comparison betweencellular encoding and direct encoding for genetic neural net-worksrdquo in Proceedings of the 1st Annual Conference on GeneticProgramming (GECCO rsquo96) pp 81ndash89 MIT Press CambridgeMass USA 1996

[75] F Jiang H Berry and M Schoenauer ldquoSupervised and evo-lutionary learning of echo state networksrdquo in Parallel ProblemSolving from NaturemdashPPSN X vol 5199 of Lecture Notes inComputer Science pp 215ndash224 Springer Berlin Germany2008

[76] A PWieland ldquoEvolving neural network controllers for unstablesystemsrdquo in Proceedings of the International Joint Conference onNeural Networks (IJCNN rsquo91) vol 2 pp 667ndash673 IEEE NewYork NY USA July 1991

[77] D Prokhorov ldquoEcho state networks appeal and challengesrdquo inProceedings of the IEEE International Joint Conference on NeuralNetworks (IJCNN rsquo05) vol 3 pp 1463ndash1466 IEEE New YorkNY USA July-August 2005

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014


Distributed Sensor Networks


Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014


ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014


Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014


Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications


Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia


Biomedical Imaging


ArtificialNeural Systems

Advances in


RoboticsJournal of



Computational Intelligence and Neuroscience

Industrial EngineeringJournal of


Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014


Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in



Input layer Reservoir Output layer

Figure 1 Schematic of an echo state network (ESN) The solidarrows represent connections with fixed weights and the dashedarrows represent connections with trainable weights Connectionsfrom the input layer to the output layer and from the output layerto the reservoir are optional As usual input to the network entersthrough the input layer and the networkrsquos output is generated in theoutput layerThe bias node and its connections with fixed weights tothe reservoir are not shown

in this paper does not adapt the number of nodes in thenetwork as some other algorithms do it optimizes overgeneral recurrent neural network topologies using one of themost basic forms of PSO which is a unique feature of ourapproach Furthermore the form of the underlying modelof network growth that we present here does not place anyconstraints on the type of PSO used to drive the optimizationand therefore the user may readily swap in whatever versionof PSO is deemed most suitable for the learning task athand Our methodrsquos incorporation of basic PSO would beparticularly advantageous in cases where the topology of aphysical network was being optimized

The work presented here involves the optimization ofa relatively recently developed class of recurrent neuralnetwork models known as echo state networks (ESNs) [34]Echo state networks have already been successfully applied toa wide range of different problems [35ndash45]They consist of aninput layer a hidden layer or ldquoreservoirrdquo and an output layer(shown in Figure 1) Typically each neuron in the input layerconnects to every neuron in the reservoir there is randomlygenerated sparse connectivity among reservoir neurons eachneuron in the reservoir connects to each neuron in the outputlayer a bias neuron may connect to the neurons in thereservoir and sometimes connections from the input layer tothe output layer and from the output layer to the reservoirare present The central innovation of the echo state networkapproach is that only the weights on the connections from thereservoir to the output neurons (output weights) are trainedand the activation functions of the output neurons are linearso all that is needed to train them is linear regression Theremaining weights are typically assigned random values Foran echo state network with 119873

119903reservoir neurons and 119873

119900

output neurons the output weights are trained as follows Asequence of training data of length 119871 + 119872 is chosen where119872 gt 119873

119903The first 119871 values of the sequence are passed through

the network in order to remove the effects of the initial stateof the reservoir Then the remaining119872 values are input intothe network and the resulting reservoir states 119890

119894isin R119873119903

for 119894 = 1 119872 are assigned to the rows of the matrixS isin R119872times119873119903 For each network input and resulting reservoirstate 119890

119894 there is a target network output 119889

119894The target network

outputs are assigned to the rows of the matrix D isin R119872times119873119900

such that the 119894th rows of S and D are the correspondingreservoir state and target output pair LetW isin R119873119903times119873119900 be theoutput weight matrix where the 119895th column ofW representsthe weights on the connections from the reservoir to the119895th output neuron Training the output weights amounts tofinding an approximate solution W

119886to the overdetermined

system

SW = D (1)

The output weights W119886are determined by solving (1) in a

ldquoleast squaresrdquo senseSelf-assembly involves the self-organization of discrete

components into a physical structure for example the growthof connections between nodes in a physical network Workin this area has traditionally focused on what we will refer toas the classic self-assembly problem which entails the designof local control mechanisms that enable a set of componentsto self-organize into a given target structure without indi-vidually preassigned component positions or central controlmechanisms Issues surrounding self-assembly have been avery active research area in swarm intelligence over the lastseveral years with recent work spanning computer simula-tions [46ndash51] physical robotics [52ndash57] and the modeling ofnatural systems [1 12] However to the best of our knowledgethere has been no work on using swarm intelligence methodsto extend the classic self-assembly problem to functional self-assembly in which components self-organize into a comput-ing structure that is optimized for a predefined computationalproblem

The research presented in this paper is concerned withthe self-assembly of neural network architectures Unlikemost past work on self-assembly a major aspect of thework presented here involves the growth of connectionsbetween discrete spatially separated nodescomponents [58]We recently demonstrated how swarm intelligence in theform of collective movements can increase the robustness ofnetwork self-assembly and enhance the ability to grow largetopologically complex neural networks [50] However thisearlier work focused on the classic self-assembly problemwhere a target network was prespecified which is in contrastto the more difficult problem of functional self-assembly thatwe consider here

Other related past works in computer science andengineering where research on artificial neural networksis concerned with application-related performance havelargely ignored the issues of neural network growth devel-opment and self-assembly with two exceptions First anumber of computational techniques have been created tooptimize neural network architectures by addingdeletingnodesconnections dynamically during learning [59ndash63]










2 Methods




119888119894






119894contributes


119894119895| = |G




If N119888119894





119895⟩




Let 119888119896isin N119888119894


each 119888119896 the cell 119888


119894119895119896


119894119895119896and 119892minus119894119895119896


119896 Based on






and 119892minus119894119895119896





119895is the




+

++

+

+

+

+

+

+

minus

minus

minus

minus

minusminus

minus

minus

minus

(a)

13

10

13

22

15

05

07

++

+

minus

minus

minus

(b)

04

035

minus025

minus065

(c)



119908 (119903) =

119886119903120572

+ 119887 if 119903 lt 1199030

0 if 119903 ge 1199030

(2)


120572


120572



119908+(119903) =

minus12119903 + 1 if 119903 lt 2

0 if 119903 ge 2(3)



119908minus(119903) =

12119903 minus 1 if 119903 lt 2

0 if 119903 ge 2(4)





minus(13) =


+(13) = 035 and it





V119894(119905 + 1) larr997888 120594 (V

119894(119905) + 119886

1199011 otimes (119887119890119904119905119894 minus 119903119894) + 1198861198992

otimes ( 119899119887119890119904119905119894minus 119903119894))

(5)


119894(119905)

is its velocity 119887119890119904119905119894




119901and 119886



119894(119905 + 1) larr 119903

119894(119905) + V

119894(119905 + 1)











cone 119892119904119894119895119896




Any growth cone 119892119904119894119895119896


119892119904

119894119895119896


119894ℓ119896is a neighbor of119892119904

119894119895119896 then119892119904

119894119895119896is a neighbor of119892119904

119894ℓ119896There is


119894119895119896and

119892119904



119895and snet

ℓ




119894119895119896isin



N119892119904

119894119895119896

sub 119892119904



119892119904

119894119895119896

= 119892119904

119894ℓ119896isin G119894ℓ| ℓ = 1 2 3 and ℓ = 119895



y

t

004

06

08

1

12

14

100 200 300 400 500 600 700 800 900 1000





119889119910

119889119905=120572119910 (119905 minus 120591)

1 + 119910 (119905 minus 120591)120573minus 120574119910 (119905) (6)







1205792

1205791

Fc

x



track



119901= 119886119899= 20120594 = 065 for




3 Results























119888=







minus5119899119868119868+ 30

119899119878

625 (7)





∘


119891stable =

0 119899119868lt 100

075sum119899119868

119894=119899119868minus100 120588119894

119899119868ge 100

(8)

where

120588119894= (|119909 (119894)| + | (119894)| +

10038161003816100381610038161205791 (119894)1003816100381610038161003816 +100381610038161003816100381610038161205791 (119894)10038161003816100381610038161003816) (9)







119878represents








119878



119868119868= 100 000





119878taken over



4 Discussion






















References
























































































Advances in

FuzzySystems


Volume 2014












Journal of

Journal of





Advances in

Multimedia


Biomedical Imaging



Advances in


RoboticsJournal of










Advances in












2 Methods




119888119894






119894contributes


119894119895| = |G




If N119888119894





119895⟩




Let 119888119896isin N119888119894


each 119888119896 the cell 119888


119894119895119896


119894119895119896and 119892minus119894119895119896


119896 Based on






and 119892minus119894119895119896





119895is the




+

++

+

+

+

+

+

+

minus

minus

minus

minus

minusminus

minus

minus

minus

(a)

13

10

13

22

15

05

07

++

+

minus

minus

minus

(b)

04

035

minus025

minus065

(c)



119908 (119903) =

119886119903120572

+ 119887 if 119903 lt 1199030

0 if 119903 ge 1199030

(2)


120572


120572



119908+(119903) =

minus12119903 + 1 if 119903 lt 2

0 if 119903 ge 2(3)



119908minus(119903) =

12119903 minus 1 if 119903 lt 2

0 if 119903 ge 2(4)





minus(13) =


+(13) = 035 and it





V119894(119905 + 1) larr997888 120594 (V

119894(119905) + 119886

1199011 otimes (119887119890119904119905119894 minus 119903119894) + 1198861198992

otimes ( 119899119887119890119904119905119894minus 119903119894))

(5)


119894(119905)

is its velocity 119887119890119904119905119894




119901and 119886



119894(119905 + 1) larr 119903

119894(119905) + V

119894(119905 + 1)











cone 119892119904119894119895119896




Any growth cone 119892119904119894119895119896


119892119904

119894119895119896


119894ℓ119896is a neighbor of119892119904

119894119895119896 then119892119904

119894119895119896is a neighbor of119892119904

119894ℓ119896There is


119894119895119896and

119892119904



119895and snet

ℓ




119894119895119896isin



N119892119904

119894119895119896

sub 119892119904



119892119904

119894119895119896

= 119892119904

119894ℓ119896isin G119894ℓ| ℓ = 1 2 3 and ℓ = 119895



y

t

004

06

08

1

12

14

100 200 300 400 500 600 700 800 900 1000





119889119910

119889119905=120572119910 (119905 minus 120591)

1 + 119910 (119905 minus 120591)120573minus 120574119910 (119905) (6)







1205792

1205791

Fc

x



track



119901= 119886119899= 20120594 = 065 for




3 Results























119888=







minus5119899119868119868+ 30

119899119878

625 (7)





∘


119891stable =

0 119899119868lt 100

075sum119899119868

119894=119899119868minus100 120588119894

119899119868ge 100

(8)

where

120588119894= (|119909 (119894)| + | (119894)| +

10038161003816100381610038161205791 (119894)1003816100381610038161003816 +100381610038161003816100381610038161205791 (119894)10038161003816100381610038161003816) (9)







119878represents








119878



119868119868= 100 000





119878taken over



4 Discussion






















References
























































































Advances in

FuzzySystems


Volume 2014












Journal of

Journal of





Advances in

Multimedia


Biomedical Imaging



Advances in


RoboticsJournal of










Advances in




+

++

+

+

+

+

+

+

minus

minus

minus

minus

minusminus

minus

minus

minus

(a)

13

10

13

22

15

05

07

++

+

minus

minus

minus

(b)

04

035

minus025

minus065

(c)



119908 (119903) =

119886119903120572

+ 119887 if 119903 lt 1199030

0 if 119903 ge 1199030

(2)


120572


120572



119908+(119903) =

minus12119903 + 1 if 119903 lt 2

0 if 119903 ge 2(3)



119908minus(119903) =

12119903 minus 1 if 119903 lt 2

0 if 119903 ge 2(4)





minus(13) =


+(13) = 035 and it





V119894(119905 + 1) larr997888 120594 (V

119894(119905) + 119886

1199011 otimes (119887119890119904119905119894 minus 119903119894) + 1198861198992

otimes ( 119899119887119890119904119905119894minus 119903119894))

(5)


119894(119905)

is its velocity 119887119890119904119905119894




119901and 119886



119894(119905 + 1) larr 119903

119894(119905) + V

119894(119905 + 1)











cone 119892119904119894119895119896




Any growth cone 119892119904119894119895119896


119892119904

119894119895119896


119894ℓ119896is a neighbor of119892119904

119894119895119896 then119892119904

119894119895119896is a neighbor of119892119904

119894ℓ119896There is


119894119895119896and

119892119904



119895and snet

ℓ




119894119895119896isin



N119892119904

119894119895119896

sub 119892119904



119892119904

119894119895119896

= 119892119904

119894ℓ119896isin G119894ℓ| ℓ = 1 2 3 and ℓ = 119895



y

t

004

06

08

1

12

14

100 200 300 400 500 600 700 800 900 1000





119889119910

119889119905=120572119910 (119905 minus 120591)

1 + 119910 (119905 minus 120591)120573minus 120574119910 (119905) (6)







1205792

1205791

Fc

x



track



119901= 119886119899= 20120594 = 065 for




3 Results























119888=







minus5119899119868119868+ 30

119899119878

625 (7)





∘


119891stable =

0 119899119868lt 100

075sum119899119868

119894=119899119868minus100 120588119894

119899119868ge 100

(8)

where

120588119894= (|119909 (119894)| + | (119894)| +

10038161003816100381610038161205791 (119894)1003816100381610038161003816 +100381610038161003816100381610038161205791 (119894)10038161003816100381610038161003816) (9)







119878represents








119878



119868119868= 100 000





119878taken over



4 Discussion






















References
























































































Advances in

FuzzySystems


Volume 2014












Journal of

Journal of





Advances in

Multimedia


Biomedical Imaging



Advances in


RoboticsJournal of










Advances in





119908minus(119903) =

12119903 minus 1 if 119903 lt 2

0 if 119903 ge 2(4)





minus(13) =


+(13) = 035 and it





V119894(119905 + 1) larr997888 120594 (V

119894(119905) + 119886

1199011 otimes (119887119890119904119905119894 minus 119903119894) + 1198861198992

otimes ( 119899119887119890119904119905119894minus 119903119894))

(5)


119894(119905)

is its velocity 119887119890119904119905119894




119901and 119886



119894(119905 + 1) larr 119903

119894(119905) + V

119894(119905 + 1)











cone 119892119904119894119895119896




Any growth cone 119892119904119894119895119896


119892119904

119894119895119896


119894ℓ119896is a neighbor of119892119904

119894119895119896 then119892119904

119894119895119896is a neighbor of119892119904

119894ℓ119896There is


119894119895119896and

119892119904



119895and snet

ℓ




119894119895119896isin



N119892119904

119894119895119896

sub 119892119904



119892119904

119894119895119896

= 119892119904

119894ℓ119896isin G119894ℓ| ℓ = 1 2 3 and ℓ = 119895



y

t

004

06

08

1

12

14

100 200 300 400 500 600 700 800 900 1000





119889119910

119889119905=120572119910 (119905 minus 120591)

1 + 119910 (119905 minus 120591)120573minus 120574119910 (119905) (6)







1205792

1205791

Fc

x



track



119901= 119886119899= 20120594 = 065 for




3 Results























119888=







minus5119899119868119868+ 30

119899119878

625 (7)





∘


119891stable =

0 119899119868lt 100

075sum119899119868

119894=119899119868minus100 120588119894

119899119868ge 100

(8)

where

120588119894= (|119909 (119894)| + | (119894)| +

10038161003816100381610038161205791 (119894)1003816100381610038161003816 +100381610038161003816100381610038161205791 (119894)10038161003816100381610038161003816) (9)







119878represents








119878



119868119868= 100 000





119878taken over



4 Discussion






















References
























































































Advances in

FuzzySystems


Volume 2014












Journal of

Journal of





Advances in

Multimedia


Biomedical Imaging



Advances in


RoboticsJournal of










Advances in





Any growth cone 119892119904119894119895119896


119892119904

119894119895119896


119894ℓ119896is a neighbor of119892119904

119894119895119896 then119892119904

119894119895119896is a neighbor of119892119904

119894ℓ119896There is


119894119895119896and

119892119904



119895and snet

ℓ




119894119895119896isin



N119892119904

119894119895119896

sub 119892119904



119892119904

119894119895119896

= 119892119904

119894ℓ119896isin G119894ℓ| ℓ = 1 2 3 and ℓ = 119895



y

t

004

06

08

1

12

14

100 200 300 400 500 600 700 800 900 1000





119889119910

119889119905=120572119910 (119905 minus 120591)

1 + 119910 (119905 minus 120591)120573minus 120574119910 (119905) (6)







1205792

1205791

Fc

x



track



119901= 119886119899= 20120594 = 065 for




3 Results























119888=







minus5119899119868119868+ 30

119899119878

625 (7)





∘


119891stable =

0 119899119868lt 100

075sum119899119868

119894=119899119868minus100 120588119894

119899119868ge 100

(8)

where

120588119894= (|119909 (119894)| + | (119894)| +

10038161003816100381610038161205791 (119894)1003816100381610038161003816 +100381610038161003816100381610038161205791 (119894)10038161003816100381610038161003816) (9)







119878represents








119878



119868119868= 100 000





119878taken over



4 Discussion






















References
























































































Advances in

FuzzySystems


Volume 2014












Journal of

Journal of





Advances in

Multimedia


Biomedical Imaging



Advances in


RoboticsJournal of










Advances in




1205792

1205791

Fc

x



track



119901= 119886119899= 20120594 = 065 for




3 Results























119888=







minus5119899119868119868+ 30

119899119878

625 (7)





∘


119891stable =

0 119899119868lt 100

075sum119899119868

119894=119899119868minus100 120588119894

119899119868ge 100

(8)

where

120588119894= (|119909 (119894)| + | (119894)| +

10038161003816100381610038161205791 (119894)1003816100381610038161003816 +100381610038161003816100381610038161205791 (119894)10038161003816100381610038161003816) (9)







119878represents








119878



119868119868= 100 000





119878taken over



4 Discussion






















References
























































































Advances in

FuzzySystems


Volume 2014












Journal of

Journal of





Advances in

Multimedia


Biomedical Imaging



Advances in


RoboticsJournal of










Advances in





















119888=







minus5119899119868119868+ 30

119899119878

625 (7)





∘


119891stable =

0 119899119868lt 100

075sum119899119868

119894=119899119868minus100 120588119894

119899119868ge 100

(8)

where

120588119894= (|119909 (119894)| + | (119894)| +

10038161003816100381610038161205791 (119894)1003816100381610038161003816 +100381610038161003816100381610038161205791 (119894)10038161003816100381610038161003816) (9)







119878represents








119878



119868119868= 100 000





119878taken over



4 Discussion






















References
























































































Advances in

FuzzySystems


Volume 2014












Journal of

Journal of





Advances in

Multimedia


Biomedical Imaging



Advances in


RoboticsJournal of










Advances in







119878



119868119868= 100 000





119878taken over



4 Discussion






















References
























































































Advances in

FuzzySystems


Volume 2014












Journal of

Journal of





Advances in

Multimedia


Biomedical Imaging



Advances in


RoboticsJournal of










Advances in




















References
























































































Advances in

FuzzySystems


Volume 2014












Journal of

Journal of





Advances in

Multimedia


Biomedical Imaging



Advances in


RoboticsJournal of










Advances in





















































































Advances in

FuzzySystems


Volume 2014












Journal of

Journal of





Advances in

Multimedia


Biomedical Imaging



Advances in


RoboticsJournal of










Advances in



research article fusing swarm intelligence and self...

Documents