
AD-A264 802

REPORT DOCUMENTATION PAGE (Form Approved, OMB No. 0704-0188)

1. AGENCY USE ONLY (Leave blank)
2. REPORT DATE: March 1993
3. REPORT TYPE AND DATES COVERED: Professional Paper
4. TITLE AND SUBTITLE: EVOLVING NEURAL NETWORK ARCHITECTURE
5. FUNDING NUMBERS: In-House Funding
6. AUTHOR(S): J. R. McDonnell, D. Waagen
7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES): Naval Command, Control and Ocean Surveillance Center (NCCOSC), RDT&E Division, San Diego, CA 92152-5001
8. PERFORMING ORGANIZATION REPORT NUMBER
9. SPONSORING/MONITORING AGENCY NAME(S) AND ADDRESS(ES): Naval Command, Control and Ocean Surveillance Center (NCCOSC), RDT&E Division, San Diego, CA 92152-5001
10. SPONSORING/MONITORING AGENCY REPORT NUMBER
11. SUPPLEMENTARY NOTES
12a. DISTRIBUTION/AVAILABILITY STATEMENT: Approved for public release; distribution is unlimited.
12b. DISTRIBUTION CODE
13. ABSTRACT (Maximum 200 words):

This work investigates the application of a stochastic search technique, evolutionary programming, for developing self-organizing neural networks. The chosen stochastic search method is capable of simultaneously evolving both network architecture and weights. The number of synapses and neurons are incorporated into an objective function so that network parameter optimization is done with respect to computational costs as well as mean pattern error. Experiments are conducted using feedforward networks for simple binary mapping problems.

Published in Proceedings, SPIE 92 International Symposium on Neural and Stochastic Methods in Image and Signal Processing, Vol. 1766.

14. SUBJECT TERMS: neural networks, stochastic search technique, mean pattern error
15. NUMBER OF PAGES
16. PRICE CODE
17.-19. SECURITY CLASSIFICATION (OF REPORT / OF THIS PAGE / OF ABSTRACT): UNCLASSIFIED / UNCLASSIFIED / UNCLASSIFIED
20. LIMITATION OF ABSTRACT: SAME AS REPORT
21a. NAME OF RESPONSIBLE INDIVIDUAL: J. R. McDonnell
21b. TELEPHONE (include Area Code): (619) 553-5762
21c. OFFICE SYMBOL: Code 731



Evolving neural network architecture

John R. McDonnell and Don Waagen
NCCOSC, RDT&E Div.
San Diego, CA 92152-5000

ABSTRACT

This work investigates the application of a stochastic search technique, evolutionary programming, for developing self-organizing neural networks. The chosen stochastic search method is capable of simultaneously evolving both network architecture and weights. The number of synapses and neurons are incorporated into an objective function so that network parameter optimization is done with respect to computational costs as well as mean pattern error. Experiments are conducted using feedforward networks for simple binary mapping problems.

1. INTRODUCTION

The neural network design process is largely heuristic. The designer's previous experience (or the work of other researchers) oftentimes dictates an initial network configuration for the problem at hand. If the network can be trained to achieve the designer's goals, then the design process is terminated. If success is not attained, then a testing phase which is largely trial and error ensues. The end result can often be a network with excess parameters and little regard for computational costs.

This work attempts to incorporate the computational costs associated with neural network configurations into the optimization procedure. Potential benefits of a network which has an optimized architecture include increased throughput for real-time signal processing applications as well as decreased memory requirements. Determining both network parameters and structure simultaneously requires a search procedure which is amenable to combinatorial optimization tasks. The more successful algorithms for these types of problems have generally been stochastic, such as simulated annealing, genetic algorithms and simulated evolution. The simulated evolution [1], or evolutionary programming (EP), paradigm has been shown to have the desired attributes: combinatorial optimization capabilities [2], the ability to train neural networks [3], as well as the ability to determine model structure [4].

Many investigations have been undertaken in an effort to obtain desired I/O mappings with variable configuration networks. Ash [5] developed the dynamic node creation (DNC) algorithm, which created new nodes in the hidden layer when the training error rate fell below an arbitrarily chosen critical value. Hirose et al. [6] use the same approach for node creation, but also remove nodes when small error values are attained. Dow and Sietsma [7,8] have developed post-processing algorithms which remove or prune excess nodes from network hidden layers. These algorithms are implemented as a two-stage process. The first stage removes nodes which have constant output or are complementary to other nodes. The second stage removes nodes that are independent of the other nodes within the layer and whose output is not required to achieve a solution.

Li [9] has developed a generalization of the backpropagation algorithm which minimizes both the total system error and the norm of the weights. In simulation, this approach yielded null portions of the weight space, providing an implicit means of network optimization (weights and structure). Simulation results also yielded nearly invariant weight sets regardless of the initial weight values, whereas straight backpropagation did not. Bailey [10] generates neural network structure by evaluating synapse importance and system error (synapse effectiveness) criteria. If both synapse effectiveness and importance become low, then the synapse becomes eligible for removal. Tenorio and Lee [11] have applied simulated annealing to a group method of data handling (GMDH) scheme to obtain a self-organizing neural network (SONN). A hierarchical minimum description length (MDL) function called the structure estimation criterion is used as the objective function. The resulting network is sparsely connected, as each node is arbitrarily limited to two inputs. Since a low-order Kolmogorov-Gabor polynomial approximation is implemented for the node activation function, more weights than connections exist. Hertz et al. [12] give a concise overview of other work on neural network construction and pruning algorithms.


The premise of this work is that the designer can postulate an objective function which, if optimized, will yield desirable results for the task at hand. Further, the approach used in this study seeks to take advantage of computational resources during the design/training phase, thereby removing the burden of evaluation by trial-and-error from the designer. For purposes of discussion, Fig. 1 illustrates the structure of a hypothetically evolved neural network. The connectivity and number of nodes in the hidden layer are determined via a multi-agent stochastic search technique. The application of EP to this task is investigated in three distinct phases. Initially, the EP training algorithm for neural networks given in the next section is generalized so that connectivity between nodes can be randomly selected. In the next phase of the study, the number of hidden units is a random variable which is incorporated into the stochastic search. The final phase integrates methods employed in the previous phases into a single EP search algorithm to achieve a network such as that shown in Fig. 1.

Figure 1. A hypothetically evolved network structure with variable hidden units and connectivity.

The evolutionary programming paradigm is outlined in the next section along with applications of the EP search technique to fully-connected feedforward neural networks. Subsequent sections investigate evolving the connectivity structure of the network and the number of nodes in the hidden layer(s). These methods are then combined to yield self-organizing neural networks.

2. APPLYING EVOLUTIONARY PROGRAMMING TO NEURAL NETS

2.1. Evolutionary Programming

Evolutionary programming is a neo-Darwinian search paradigm first suggested by Fogel et al. [1]. This stochastic search method is typically applied as a global optimizer. EP has been successfully applied to a variety of optimization problems including the traveling salesman problem, parameter estimation and system identification, and neural net training.

The EP optimization algorithm for locating extrema of any general function can be described by the following steps:

1. Form an initial population P_1(x), ..., P_2N(x) of size 2N. The parameters x associated with each parent element P_i are randomly initialized from a user-specified search domain.
2. Assign a fitness score S_i(x) to each element P_i(x) in the population.
3. Reorder the population based on the number of wins generated from a stochastic competition process.
4. Generate offspring (P_N+1, ..., P_2N) from the N highest-ranked elements (P_1, ..., P_N) by perturbing their parameters.
5. Loop to step 2.
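For concreteness, the five steps above can be sketched in a few lines of Python. The details below (the Gaussian mutation, the tournament of ten opponents, and the names evolve and init_params) are illustrative assumptions rather than the paper's exact procedure; the mutation and competition rules actually used are given in Section 2.2.

    import random

    def evolve(fitness, init_params, n_parents=10, generations=100, sigma=0.1):
        """Minimal sketch of the five-step EP loop (lower fitness = better)."""
        # Step 1: initial population of size 2N with randomly initialized parameters.
        population = [init_params() for _ in range(2 * n_parents)]
        for _ in range(generations):
            # Step 2: assign a score to each element.
            scored = [(fitness(p), p) for p in population]
            # Step 3: rank by wins from a stochastic pairwise competition.
            ranked = []
            for err, p in scored:
                opponents = random.sample(scored, k=min(10, len(scored)))
                wins = sum(err < opp_err for opp_err, _ in opponents)
                ranked.append((wins, err, p))
            ranked.sort(key=lambda t: (-t[0], t[1]))
            parents = [p for _, _, p in ranked[:n_parents]]
            # Step 4: offspring are Gaussian perturbations of the N highest-ranked parents.
            offspring = [[x + random.gauss(0.0, sigma) for x in p] for p in parents]
            # Step 5: loop to step 2 with the new population.
            population = parents + offspring
        return min(population, key=fitness)

A call such as evolve(lambda w: sum(x * x for x in w), lambda: [random.uniform(-0.5, 0.5) for _ in range(4)]) locates a minimum of a simple quadratic and illustrates that no gradient information is required.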

The generality of this algorithm lends power to its implementation besides providing a systematic means of evaluating candidate solutions: the user is not bound to any particular coding structure nor mutation strategy. As previously stated, EP is used in this investigation since it is well suited for simultaneously evolving model structure and parameters.

2.2. Training Neural Networks with EP

2.2.1. Determining Network Weights

Since EP is a general-purpose optimization procedure, it can be readily used to determine neural network connection strengths. The objective function of the EP training approach for fully-connected feedforward networks is, as in backpropagation, the sum-squared error over all output nodes and training patterns, E = Σ_p Σ_k (t_pk - o_pk)², where t_pk and o_pk are the target and actual outputs of node k for pattern p. The mean sum-squared pattern error is obtained by normalizing


E by the number of training patterns presented. This section applies the algorithm given in the previous section to neural networks and then shows results for sample training runs using simple binary mappings. It should be noted that the examples given are by no means statistically significant and are primarily shown for exemplary purposes.

Each fully-connected feedforward network i in the population P of networks may be represented by the weight array

    w[i][layer][fromnode][tonode]

For this application, the initial population was instantiated with weights chosen from a U(-0.5, 0.5) distribution. Next, a cost is assigned to each network in the configuration. The N members of the population generate offspring (perturbed weight sets) according to the equation

    W' = W + δW

where typically δW ~ N(0, S_F·E) with a scaling coefficient S_F. The scaling factor is a probabilistic analog to the step size used in gradient descent methods and may also be treated as a random variable within the EP search strategy. The variance of the weight perturbations is bound by the total system error in this application. To emulate the probabilistic nature of survival, a pairwise competition is held where individual elements compete against randomly chosen members of the population. For example, if network i is randomly selected to compete against network j, then a win is awarded to network i if E_i < E_j. Another approach would be to let the probability of network i winning be E_j/(E_i + E_j). The N networks with the most wins are then used to generate offspring and the process is repeated.
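A minimal sketch of this mutation and of the probabilistic win rule follows; the nested-list weight layout and the default scale value are assumptions for illustration, not the paper's implementation.

    import math
    import random

    def mutate_weights(weights, total_error, scale=100.0):
        """W' = W + dW with dW ~ N(0, S_F * E): the perturbation variance is tied to
        the parent's error, so mutations shrink as training improves (sketch).
        weights[layer][tonode][fromnode] is a nested list of floats."""
        sigma = math.sqrt(scale * total_error)
        return [[[w + random.gauss(0.0, sigma) for w in row]
                 for row in layer] for layer in weights]

    def probabilistic_win(err_i, err_j):
        """Alternative win rule from the text: network i beats network j
        with probability E_j / (E_i + E_j)."""
        return random.random() < err_j / (err_i + err_j + 1e-12)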

Figures 2-4 show the effect of varying different parameters in training a 2-2-1 neural net for the XOR mapping. Fig. 2 shows the best mean sum-squared pattern error for various scaling factors from a population with 10 parent networks. As shown, modifying the scaling factor affects the convergence rate of the training. The smaller scaling factors tend to produce smoother curves (smaller changes) while the larger scaling factors yield more dramatic results due to larger step sizes. The number of networks over which the search is conducted is generally user defined. However, the population size need not be a static parameter as it can increase or decrease over the generations of the search. Training results with a 5x increase in population size are shown in Fig. 3 for a scaling factor of S_F = 100. Fig. 4 shows successful training is possible with a randomly chosen scaling factor for S_F ∈ U[0, 1000] and N = 10. The training run shown did not converge as fast as other runs, but it did produce a smoother scaling factor sequence.

Figure 2. EP training of a 2-2-1 XOR mapping network for various scaling factors, N = 10.


Figure 3. EP training of XOR mapping network with different population sizes, S_F = 100.

Figure 4. EP training of XOR mapping network with variable S_F.

2.2.2. Hard-limited Activations

Besides yielding only locally optimal solutions, gradient descent methods require that activation functions be continuously differentiable when they are employed for training multi-layer perceptrons (MLPs). The continuity condition is not a requirement when stochastic methods such as simulated annealing [13], genetic algorithms [14], or EP are used. An interesting method has been developed by Winter and Widrow [15] for training MLPs with bipolar activation functions. This method, termed the Madaline Rule II, consists of applying the minimal disturbance principle to weights associated with individual ADALINEs. If better results are obtained, then the new weight values are kept; otherwise, the new weights are ignored. If the training process exhausts trials involving a single ADALINE, pairwise (or higher) adaptations are attempted.

The 3-bit parity problem has been chosen as an example for training MLPs with hard-limited activation functions using EP. Fig. 5 compares training results when sigmoidal and binary activation functions are used in a 3-3-1 network. As shown, the discrete system error states which result when using a binary activation function are quite pronounced. The concept of a different activation function for different neurons can also be easily employed within an EP search. Each different activation function could be treated as a random variable and be discretely indexed within a neuron structure.
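The sketch below illustrates why hard-limited units pose no difficulty for EP: the error measure is evaluated by a forward pass only, so no derivative of the activation is ever needed. The weight layout and helper names are assumptions for illustration.

    def hard_limit(x):
        """Binary (hard-limited) activation; not differentiable, so gradient descent
        cannot be applied, but a derivative-free EP search is unaffected."""
        return 1.0 if x >= 0.0 else 0.0

    def forward(weights, biases, pattern, act=hard_limit):
        """Propagate one pattern through a fully-connected feedforward network.
        weights[l][k] holds the incoming weights of unit k in layer l."""
        signal = list(pattern)
        for layer_w, layer_b in zip(weights, biases):
            signal = [act(sum(w * s for w, s in zip(unit_w, signal)) + b)
                      for unit_w, b in zip(layer_w, layer_b)]
        return signal

    def mean_pattern_error(weights, biases, patterns, targets, act=hard_limit):
        """Mean sum-squared pattern error, used directly as the EP fitness score."""
        total = 0.0
        for x, t in zip(patterns, targets):
            out = forward(weights, biases, x, act)
            total += sum((ti - oi) ** 2 for ti, oi in zip(t, out))
        return total / len(patterns)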

2.2.3. Combining Backpropagation and EP

It is possible to do a local and global search in parallel. Such a strategy has been investigated by Waagen et al. [16] in the development of the Stochastic Direction Set (SDS) method. This method maintains the integrity of searching for a solution set of parameters without explicitly requiring the analytical function or empirically determined derivatives. The SDS method uses an augmented (with orthogonal decomposition) Hooke-Jeeves technique for the local search while EP is used for finding the global solution neighborhood. The variance of the stochastic step size is held constant over the duration of the search. Two offspring are generated for each base point: one locally and one globally. A local offspring replaces its parent if it achieves a better fitness score. EP and backpropagation training can be implemented together in a similar parallel fashion. Fig. 6 shows a serial implementation wherein a 3-3-1 network is first trained with EP and then refined using backpropagation for the 3-bit parity mapping. This sample run was generated with 10 parent networks and a scaling factor of S_F = 3. The backpropagation parameters included a step size of 0.9 and a momentum coefficient of 0.2.


Figure 5. EP training for 3-bit parity mapping with both sigmoidal and binary activation functions.

Figure 6. A serial EP-BP training session for the 3-bit parity mapping.

3. EVOLVING CONNECTIVITY

This portion of the investigation addresses reducing the number of synapses between the neurons. Typically, work in this area incorporates a function of the weight magnitudes in the objective function. Hertz et al. [12], for example, discuss optimizing an objective function J described by

    J = E + λ Σ_ij w_ij² / (1 + w_ij²)

where E is the mean sum-squared pattern error. Weigend et al. [17] implement a similar modification with the weights normalized by an arbitrary parameter w_0:

    J = E + λ Σ_ij (w_ij/w_0)² / (1 + (w_ij/w_0)²)

"This study investigates modifying the object function according to

J = +

whereby the cost associated with the number of connections, N,_,,_, between neurons is scaled and incorporated into thecost function. For these studies a value ofO fi 0.0001 was used with a = 1. A heuristic which might be employed wouldbe to letfl a cEY/N,,N thereby incorporating the desired training error and the maximum number of connections possibleover the specified domain to reasonably weight the cost associated with the number of connections.
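A sketch of this connection-penalized objective, together with the heuristic choice of β, might look as follows (function and argument names are illustrative):

    def structural_cost(mean_error, n_connections, alpha=1.0, beta=1e-4):
        """J = alpha*E + beta*N_connect: mean pattern error plus a scaled
        connection count (alpha = 1, beta = 0.0001 as used in the text)."""
        return alpha * mean_error + beta * n_connections

    def heuristic_beta(desired_error, max_connections, alpha=1.0):
        """Suggested heuristic beta = alpha * E_desired / N_max, tying the penalty
        to the target error and the largest possible number of connections."""
        return alpha * desired_error / max_connections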


Figure 7. Evolving connectivity for the XOR mapping. The final architecture is shown to the right.

Figure 8. The final evolved connectivity for the XOR mapping network.

To implement this capability, a connectivity array has been utilized where c[i][layer][fromnode][tonode] = 1 if a connection exists and 0 if no connection is present. At the start of the run, the connectivity array is set to 1, giving a fully-connected feedforward network. The designer must specify the number of neurons over which the search is conducted. This determines the maximum number of connections. From the range of connections, a synapse is chosen at random and modified based on its current state: it is connected if it has been disconnected, or disconnected if it is currently connected. Essentially, the bit is flipped. The number of connections which may be affected at each mutation is arbitrarily set by the designer or determined in a random fashion. The connection array is incorporated in the neuron input dot product term, thereby nulling any signals between disconnected neurons. As before, weights are continually modified in the event that a neuron pair is reconnected.
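A small sketch of the bit-flip structure mutation and the masked dot product is given below; the nested-list layout of the connectivity mask is an assumption made for illustration.

    import random

    def flip_random_connections(connectivity, n_flips=1):
        """Toggle randomly chosen entries of the connectivity mask
        (1 = connected, 0 = disconnected); connectivity[layer][tonode][fromnode]."""
        mutated = [[row[:] for row in layer] for layer in connectivity]
        for _ in range(n_flips):
            layer = random.randrange(len(mutated))
            to = random.randrange(len(mutated[layer]))
            frm = random.randrange(len(mutated[layer][to]))
            mutated[layer][to][frm] ^= 1  # flip the bit
        return mutated

    def masked_net_input(weights, mask, inputs):
        """Neuron input dot product with the connection mask folded in, so signals
        across disconnected synapses are nulled."""
        return sum(w * c * x for w, c, x in zip(weights, mask, inputs))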

The results of evolving connectivity using a network with 8 hidden units for the XOR mapping are shown in Figures 7 and 8. After 1000 generations, the network with the lowest cost function has 12 connections as shown in Fig. 8. Although the structure can be reduced to replicate a fully-connected 2-2-1 feedforward architecture, it must be done through post-processing.

In an effort to place greater emphasis on signal propagation through the network, an alternative strategy has been developed though not yet implemented. This strategy assigns a probability of connection from neuron k in layer j to subsequent layers contingent on the inputs to all the neurons in level j. This technique assumes a random connection process exists between the input layer and the first hidden layer. Poor solutions (such as excess emphasis on a single node) will not be sustained unless the fitness value is competitive with other members of the population.


4. EVOLVING HIDDEN NODES

The standard backpropagation approach to training neural networks entails specifying the number of layers and the number of nodes in each layer. This portion of the investigation allows the number of nodes in the hidden layer(s) to be treated as a random variable. The EP training technique given in section 2.2.1 is still used although other training methods (such as backpropagation) may be applicable.

The method outlined in section 2.2.1 is slightly modified to allow for a variable number of hidden nodes. The maximum number of hidden nodes specified in the program is the domain over which the search is conducted. This has the obvious shortcoming that it is still possible to underestimate the number of nodes which may be required to solve a particular problem. This portion of the investigation maintains a fully-connected feedforward architecture. The cost function used is given by

    J = αE + γN_nodes

where α = 1 and γ = 0.01. A scaling factor of γ = 0.01 allows smaller networks with larger mean sum-squared pattern error terms to be selected. At each weight perturbation, the number of hidden nodes is randomly selected from a user-specified domain.
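A sketch of this node-count penalty and of redrawing the hidden-node count at each mutation is shown below (names are illustrative):

    import random

    def node_cost(mean_error, n_hidden, alpha=1.0, gamma=0.01):
        """J = alpha*E + gamma*N_nodes, penalizing the hidden-node count so that
        smaller networks with slightly larger error can still be selected."""
        return alpha * mean_error + gamma * n_hidden

    def mutate_hidden_count(max_hidden):
        """Redraw the hidden-node count uniformly from the user-specified
        domain [1, max_hidden] at each weight perturbation."""
        return random.randint(1, max_hidden)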

Fig. 9 shows the results for the XOR mapping problem with varying hidden units. The maximum number of hidden units possible was arbitrarily set at 20. After 200 generations the network has evolved a 2-2-1 structure as well as an appropriate set of weight parameters. At the end of 500 generations a mean sum-squared pattern error of 0.0001 is attained. Fig. 10 shows the results for the XOR mapping problem if the search domain is increased to 50 hidden units. In this case, a 2-2-1 structure is also evolved simultaneously with the weight parameters. At 500 generations a mean sum-squared pattern error of 0.0001 is achieved. Both sets of experiments were conducted with S_F = 100 and 10 parents.

Figure 9. Evolving hidden units for the XOR mapping. The search was conducted over 20 possible hidden units.

Figure 10. Evolving hidden units for the XOR mapping. The search was conducted over 50 possible hidden units.


5. EVOLVING STRUCTURE: A SELF-ORGANIZING NETWORK

A nicety of using the EP search technique is that additional terms can be incorporated into the objective function without requiring a major programming modification. An example of this feature is the multi-dimensional path planning task done by Page et al. [18]. Wah and Kriplani [19] have incorporated training time into their objective function,

    J = A×E + B×TrainingTime + C×Cost

where A, B and C are constants. The Cost term is a function of the network size and number of weights.

An investigation very similar to this work has been conducted by Bornholdt and Graudenz [20]. A major difference between the methods is that Bornholdt and Graudenz used genetic algorithms for evolving the network structure. Further, the networks implemented in the referenced work can have asynchronous updates and recurrent structure in the hidden layers. Bornholdt and Graudenz define a dilution ratio D. For example, the dilution ratio for the network shown in Fig. 8 is D = 0.33. This result does not have the small dilution ratios found using recurrent hidden structure. For the simple XOR mapping, it does not take as many generations to train since the stability issues associated with recurrent structure do not affect the system.
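The exact definition of the dilution ratio is not legible in this copy; the sketch below assumes the common reading, the fraction of possible connections that are actually realized, and should be treated purely as an illustrative assumption.

    def dilution_ratio(connectivity):
        """Dilution ratio D, assumed here to be realized connections divided by
        possible connections of the chosen architecture (illustrative reading;
        the paper's exact definition is not legible in this copy)."""
        realized = sum(bit for layer in connectivity for row in layer for bit in row)
        possible = sum(len(row) for layer in connectivity for row in layer)
        return realized / possible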

Currently, the feedforward structure is maintained and the objective function is modified to increase with the number of connections and number of neurons:

    J = αE + βN_connect + γN_nodes

Initially, the weighting coefficients were kept the same: α = 1, β = 0.0001, γ = 0.01. The methods used in sections 3 and 4 were combined and tested on the XOR mapping. Fig. 11 shows the training results for a hidden layer domain of 50 hidden units. After 2500 generations, the network has evolved to a 2-2-1 structure; however, the mean sum-squared pattern error has not been sufficiently reduced as shown in Fig. 11(b). This is due to the manner in which the offspring are generated: each new offspring contains a new structure as well as a perturbed weight set before sufficient weight adaptation has been accomplished for any given structure.

The adaptation problem became more apparent when the same approach was taken with the 3-bit parity mapping. To overcome this limitation, two offspring are generated for each parent, as sketched below. The first offspring is generated with a perturbed weight set and new structure. The second offspring is generated with a perturbed weight set and the same structure as its parent. Results for the 3-bit parity mapping problem are shown in Fig. 12. This experiment was conducted with weighting coefficients of α = 1, β = 0.0001, γ = 0.001. The hidden layer domain consisted of 25 hidden units. After 500 generations, the mean sum-squared pattern error has been sufficiently reduced for 11 hidden units as shown in Fig. 12(b). Further optimization occurs by reducing the number of connections as shown in Fig. 12(c). The final network is shown in Fig. 13. Obviously, not all of the remaining hidden units are connected, and the weighting coefficients could have been manipulated to further reduce the node count. The statistics at the end of 5000 generations are J = 0.012852 and a dilution ratio of D = 0.15.
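A minimal sketch of the two-offspring rule follows: mutate_structure and perturb_weights stand for the structural and weight mutations of the earlier sections and are supplied by the caller (the names and the generic parent object are assumptions for illustration).

    import copy

    def two_offspring(parent, perturb_weights, mutate_structure):
        """Generate two offspring per parent: one with a new structure and perturbed
        weights, and one that keeps the parent's structure so weight adaptation can
        continue for a fixed architecture."""
        restructured = perturb_weights(mutate_structure(copy.deepcopy(parent)))
        same_structure = perturb_weights(copy.deepcopy(parent))
        return restructured, same_structure

Keeping one offspring at the parent's structure lets the weights settle for a given architecture before that architecture is judged against newly mutated ones.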

This modified approach was applied to the XOR mapping problem with 10 parent networks. Since the networks are fully connected from the onset, this search technique quickly found a fully satisfactory solution and very little optimization occurred over subsequent generations. To make the problem more challenging, the networks were initialized with a reduced probability of any given connection being present. The resulting structure is shown in Fig. 14 and the statistical data are shown in Fig. 15. This experiment was conducted with weighting coefficients of α = 1, β = 0.0001, and γ = 0.01. The hidden layer domain consisted of 50 hidden units. The statistics at the end of the run are E = 0.000000, J = 0.012300, and a dilution ratio of D = 0.11.



Figure 11. Evolving weights and structure for the XOR mapping: (a) cost function, (b) mean sum-squared pattern error, (c) number of connections, and (d) the number of neurons.


Figure 12. Evolving weights and structure for 3-bit parity: (a) objective function, (b) the mean sum-squared pattern error, (c) number of connections, and (d) number of neurons.


Figure 13. The final evolved architecture for the 3-bit parity mapping problem.

Figure 14. The final evolved architecture for the XOR mapping problem.

Figure 15. Evolving weights and structure for the XOR mapping: (a) cost function, (b) mean sum-squared pattern error, (c) number of connections, and (d) the number of neurons.


6. CONCLUSIONS

Experiments have shown that evolutionary programming can be used for simultaneously determining both network architecture and parameters. These results indicate that the EP search paradigm has high potential for automating the design of neural networks. It has been demonstrated that the general architecture shown in Fig. 1 can be achieved for simple binary mapping problems. However, some applications may be hindered by the excess memory requirements needed by multi-agent searches or the training time necessary to obtain adequate convergence. If the design process for a specific problem is not capable of being fully automated, then this technique may prove useful in providing a designer with insight into the task at hand as well as a reasonable starting point for further investigations.

Caution should be used in trying to obtain minimal-size networks. As Sietsma and Dow [8] point out, pruning networks reduces their generalization capabilities if noise is present. Experience has shown that adding a slight amount of noise to the training data will result in more robust classifiers. Since only binary mapping problems were investigated, it is not clear how the approach given in this study will work on classification or continuous mapping problems. This remains an area of further research.

Nevertheless, stochastic training techniques are becoming prevalent in neurocomputing (especially in hardware implementations [21]). Further research should allow for a more general connectivity structure including interconnections among the hidden units as well as connections between input and output neurons. Since model structure is continually being modified, it is incumbent that an information criterion I be addressed either explicitly or implicitly in the objective function, e.g.,

    J = αE + βN_connect + γN_nodes + I

providing a viable approach for automated model selection based not only on error and computation cost, but on the information content as well.
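The form of the information term I is left open here; the sketch below assumes an AIC-style penalty for Gaussian errors purely for illustration (the choice of penalty, the argument names, and the default coefficients are all assumptions, not part of the original formulation).

    import math

    def information_objective(mean_error, n_connections, n_nodes, n_weights,
                              n_patterns, alpha=1.0, beta=1e-4, gamma=1e-2):
        """J = alpha*E + beta*N_connect + gamma*N_nodes + I, with I taken here as an
        AIC-like term for Gaussian errors: I = n*ln(E) + 2*k (illustrative only)."""
        info = n_patterns * math.log(mean_error + 1e-12) + 2.0 * n_weights
        return alpha * mean_error + beta * n_connections + gamma * n_nodes + info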

REFERENCES

1. L. J. Fogel, A. J. Owens, and M. J. Walsh, Artificial Intelligence through Simulated Evolution, John Wiley and Sons, 1966.

2. D. B. Fogel, "An evolutionary approach to the traveling salesman problem", Biological Cybernetics, Vol. 60, No. 2, 1988.

3. D. B. Fogel, L. J. Fogel, and V. W. Porto, "Evolving neural networks", Biological Cybernetics, Vol. 63, 1990.

4. D. B. Fogel, System Identification through Simulated Evolution: a Machine Learning Approach to Modeling, Ginn Press, Needham, MA, 1991.

5. T. Ash, "Dynamic node creation in backpropagation networks", Int. Joint Conf. on Neural Networks, San Diego, 1988.

6. Y. Hirose, K. Yamashita, and S. Hijiya, "Back-propagation algorithm which varies the number of hidden units", Neural Networks, Vol. 4, 1991.

7. J. Sietsma and R. J. F. Dow, "Neural net pruning: why and how", IEEE Int. Conf. on Neural Networks, San Diego, 1988.

8. J. Sietsma and R. J. F. Dow, "Creating artificial neural networks that generalize", Neural Networks, Vol. 4, 1991.

9. S. Li, "An optimized backpropagation algorithm with minimum norm weights", Int. Joint Conf. on Neural Networks, San Diego, 1990.

10. A. W. Bailey, "Automatic evolution of neural net architectures", Int. Joint Conf. on Neural Networks, Washington, D.C., 1990.

11. M. F. Tenorio and W. T. Lee, "Self-organizing network for optimum supervised learning", IEEE Trans. on Neural Networks, Vol. 1, No. 1, March 1990.

12. J. Hertz, A. Krogh, and R. G. Palmer, Introduction to the Theory of Neural Computation, Addison-Wesley, 1991.

13. E. Aarts and J. Korst, Simulated Annealing and Boltzmann Machines: a Stochastic Approach to Combinatorial Optimization and Neural Computing, John Wiley & Sons, 1989.

14. D. H. Ackley, "Stochastic iterated genetic hill-climbing", CMU-CS-87-107, Ph.D. dissertation, Computer Science Dept., Carnegie Mellon University, Pittsburgh, PA, 1987.

15. R. Winter and B. Widrow, "Madaline rule II: a training algorithm for neural networks", Int. Joint Conf. on Neural Networks, 1988.

16. D. Waagen, P. Diercks, and J. R. McDonnell, "The stochastic direction set algorithm: a hybrid technique for finding function extrema", First Annual Conf. on Evolutionary Programming, San Diego, 1992.

17. A. S. Weigend, D. E. Rumelhart, and B. A. Huberman, "Back-propagation, weight elimination and time series prediction", Connectionist Models: Proceedings of the 1990 Summer School, D. S. Touretzky [ed.], Morgan Kaufmann, 1991.

18. W. C. Page, J. R. McDonnell, and I. L. Andersen, "An evolutionary programming approach to multi-dimensional path planning", First Annual Conf. on Evolutionary Programming, San Diego, 1992.

19. B. W. Wah and H. Kriplani, "Resource constrained design of artificial neural networks", Int. Joint Conf. on Neural Networks, San Diego, 1990.

20. S. Bornholdt and D. Graudenz, "General asymmetric neural networks and structure design by genetic algorithms", DESY 91-046, ISSN 0418-9833, Deutsches Elektronen-Synchrotron, Hamburg, May 1991.

21. B. W. Lee and B. J. Sheu, "Analysis and design of analog VLSI neural networks", in Neural Networks for Signal Processing, B. Kosko (Ed.), Prentice-Hall, 1992.