Applied Intelligence 8, 219–233 (1998). © 1998 Kluwer Academic Publishers. Manufactured in The Netherlands.

Neural Navigation Approach for Intelligent Autonomous Vehicles (IAV) in Partially Structured Environments

A. CHOHRA AND C. BENMEHREZ
Centre de Développement des Technologies Avancées, Laboratoire de Robotique et d'Intelligence Artificielle, 128, Chemin Mohamed Gacem El-Madania, BP 245 El-Madania, 16075, Algiers, Algeria
[email protected]

A. FARAH
Ecole Nationale Polytechnique, Laboratoire Techniques Digitales et Systèmes, 10, Avenue Hassen Badi, El-Harrach, Algiers, Algeria

Abstract. The use of Neural Networks (NN) is necessary to bring the behavior of Intelligent Autonomous Vehicles (IAV) near the human one in recognition, learning, decision-making, and action. First, current navigation approaches based on NN are discussed. Indeed, these current approaches remedy insufficiencies of classical approaches related to real-time, autonomy, and intelligence. Second, a neural navigation approach essentially based on pattern classification to acquire target localization and obstacle avoidance behaviors is suggested. This approach must provide vehicles with the capability, after supervised Gradient Backpropagation learning, to recognize six (06) target location and thirty (30) obstacle avoidance situations using the NN1 and NN2 Classifiers, respectively. Afterwards, the decision-making and action consist of two association stages, carried out by reinforcement Trial and Error learning, and their coordination using NN3. NN3 then decides among five (05) actions (move towards 30°, move towards 60°, move towards 90°, move towards 120°, and move towards 150°). Third, simulation results which display the ability of the neural approach to provide IAV with the capability to intelligently navigate in partially structured environments are presented. Finally, a discussion dealing with the suggested approach and how it relates to some other works is given.

Keywords: Intelligent Autonomous Vehicles, navigation, target localization, obstacle avoidance, partially structured environments, neural networks, supervised gradient backpropagation learning, reinforcement trial and error learning

1. Introduction

The primary goal of Intelligent Autonomous Vehicles (IAV) is related to the theory and applications of robotic systems capable of some degree of self-sufficiency. The focus is on the ability to move and be self-sufficient in partially structured environments. IAV have many applications in a large variety of domains, from spatial exploration to material handling, and from military tasks to help for the handicapped. Recent developments in autonomy requirements, intelligent components, multi-robot systems, and massively parallel computers have made IAV widely used, in particular in planetary exploration, the mining industry, and highways [7, 14, 17, 18, 21]. Thus, IAV designers seek to create dynamic systems able to navigate and achieve intelligent behaviors like humans in real dynamic environments where conditions are laborious. However, environment complexity is a specific problem to solve, since these environments can be imprecise, vast, dynamic, and


partially or not structured. IAV must then be able to understand the structure of these environments.

To reach their targets without collisions with possibly encountered obstacles, IAV must have the capability to achieve target localization and obstacle avoidance behaviors. Moreover, current IAV requirements with regard to these behaviors are real-time, autonomy, and intelligence. Thus, to acquire these behaviors while answering IAV requirements, IAV must be endowed with recognition, learning, decision-making, and action capabilities. To achieve this goal, classical approaches have rapidly been replaced by current approaches, in particular the Neural Networks (NN) based approaches [2, 11, 16, 23]. Indeed, the aim of NN is to bring machine behavior near the human one in recognition, learning, decision-making, and action [1, 12, 13, 19, 25].

The aim of this work is to develop a neural navigation approach able to provide IAV with more autonomy, intelligence, and real-time processing capabilities. In this paper, current NN based navigation approaches, which remedy insufficiencies of classical navigation approaches related to real-time, autonomy, and intelligence, are discussed first. Second, a neural navigation approach essentially based on pattern classification (or recognition) [24] to acquire target localization and obstacle avoidance behaviors is suggested. This approach was developed in [6] for three (03) possible movements of vehicles, while in the suggested approach this number is increased to five (05) possible movements. Third, simulation results of the neural navigation approach are presented. Finally, a discussion dealing with the suggested approach and how it relates to some other works is given.

2. Neural Networks Based Navigation Approaches

In classical navigation approaches, it has often been assumed that accurate a priori knowledge about the vehicle, its sensors, and moreover its environment is available. This has limited the kind of environments such vehicles can operate in, and consequently the kind of tasks such vehicles can solve. Indeed, these limitations arise mainly from difficulties for human designers to provide accurate models of vehicles and their environments. Even if such knowledge is available, making it computer-accessible, i.e., hand-coding explicit models,

has often been found to require unreasonable amounts of programming time. Moreover, computational tractability turned out to be a severe obstacle for designing control structures for IAV in complex environments [23].

Recent research on IAV has pointed out a promising direction for future research in mobile robotics, where real-time, autonomy, and intelligence have received considerably more attention than, for instance, optimality and completeness. Many navigation approaches have dropped the assumption that perfect environment knowledge is available. They have also dropped explicit knowledge representation for an implicit one based on the acquisition of intelligent behaviors that enable the vehicle to interact effectively with its environment [5]. Consequently, as IAV face less predictable and more complex environments, they have to orient themselves, explore their environments autonomously, recover from failures, and perform whole families of tasks in real-time. Moreover, if vehicles lack initial knowledge about themselves and their environments, learning becomes inevitable. Learning refers to a variety of algorithms that are characterized by their ability to replace missing or incorrect environment knowledge by experimentation, observation, and generalization. In order to reach a goal, learning vehicles rely on the interaction with their environment to extract information [23].

NN based navigation approaches answering these IAV requirements and based on the acquisition of intelligent behaviors exploit NN features such as implicit knowledge representation, learning and generalization from experience (from examples), robustness related to fault-tolerance with respect to defective sensors or noisy sensor data, and massively parallel processing (real-time). Thus, the approach developed in [2] is based on a neural control system which provides the Mobile Adaptive Visual Navigator (MAVIN) with the capability to be adaptive to its environment and to learn from experience. This system includes NN for early visual perception, eye motion control, pattern learning and object recognition, object associations and delayed expectation learning, emotional states, behavioral actions, and associative switching. This system has been successfully implemented on MAVIN. Also, NN are used to control a tracking behavior of an autonomous vehicle in [3], while they are used to control target localization and obstacle avoidance behaviors in an environment with dynamically changing obstacles in [11]. In addition, the NEURO-NAV system developed in [16]


for indoor vehicles is based on NN which are trained to interpret visual information and perform navigational behaviors such as hallway following and landmark detection.

Globally, these neural navigation approaches are essentially based on the manner in which humans reason, establish their own decisions, and take the appropriate action.

3. Neural Navigation Approach

To navigate in partially structured environments, IAV must reach their targets without collisions with possibly encountered obstacles, i.e., they must have the capability to achieve target localization and obstacle avoidance behaviors. In this approach, these two behaviors are acquired by supervised Gradient Backpropagation learning using neural pattern classifiers. Target localization is based on the NN1 Classifier, which must recognize six (06) target location situations, after learning, from data obtained by computing the distance and orientation of the vehicle-target using a temperature field strategy. Obstacle avoidance is based on the NN2 Classifier, which must recognize thirty (30) obstacle avoidance situations, after learning, from ultrasonic sensor data giving vehicle-obstacle distances. Afterwards, the decision-making and action consist of two association stages, carried out by reinforcement Trial and Error learning, and their coordination using NN3, allowing then to decide the appropriate action.

3.1. Vehicles and Sensors

Vehicles. The movements of vehicles are possible in five (05) directions and consequently five (05) possible actions Ai (i = 1, …, 5) are defined as actions to move towards 30°, 60°, 90°, 120°, and 150°, respectively, as shown in Fig. 1. They are expressed by the action vector A = [A1, …, Ai, …, A5].

Sensors. The perception system is essentially based on Polaroid ultrasonic transducers: each transducer is approximately 1.5 inches in diameter, yielding a 3 dB full-angle beam width of approximately 15 degrees at 50 kHz. The range of each transducer is 0.15–10.5 meters. Thus, to detect possibly encountered obstacles, five (05) UltraSonic sensors (US) are necessary to get distances (vehicle-obstacle) covering the area from 15° to 165°: US1 from 15° to 45°, US2 from 45° to 75°, US3 from 75° to 105°, US4 from 105° to 135°, and US5 from 135° to 165°, as shown in Fig. 1.

Figure 1. Vehicle and its sensors.
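As an illustrative sketch (not part of the original system), the sensor layout above can be expressed as a lookup from a bearing to the ultrasonic sensor covering it; the function name `sensor_for_bearing` is introduced here for illustration only:

```python
def sensor_for_bearing(bearing_deg):
    """Return which ultrasonic sensor US1..US5 covers a given bearing.
    Each USi spans a 30-degree sector, starting at 15 degrees; bearings
    outside [15, 165) are not covered by any sensor."""
    if not 15.0 <= bearing_deg < 165.0:
        return None  # outside the covered area
    return int((bearing_deg - 15.0) // 30.0) + 1
```

For example, a bearing of 100° falls in the 75°–105° sector, i.e., US3.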

3.2. Partially Structured Environments

Target Location Situations. To localize and reach targets, the temperature field strategy is used, leading to model the vehicle environment in six (06) areas corresponding to all target locations, called target location situations, as shown in Fig. 2. These situations are defined with six (06) Classes T1, …, Tj1, …, T6 where j1 = 1, …, 6:

If 15° ≤ γ < 45° (Class T1),
If 45° ≤ γ < 75° (Class T2),
If 75° ≤ γ < 105° (Class T3),        (1)
If 105° ≤ γ < 135° (Class T4),
If 135° ≤ γ < 165° (Class T5),
If 165° ≤ γ < 375° (Class T6),
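As a minimal sketch of the partition in Eq. (1) (an illustration, not part of the original system), note that Class T6 covers the whole rear half-plane, 165° up to 375° = 15° + 360°, so angles below 15° wrap into it:

```python
def target_class(gamma):
    """Return the target location class index j1 in 1..6 for the target
    angle gamma (degrees), following the partition of Eq. (1)."""
    g = gamma % 360.0
    if g < 15.0:          # [360, 375) wraps onto [0, 15): still Class T6
        return 6
    bounds = [(15, 45, 1), (45, 75, 2), (75, 105, 3),
              (105, 135, 4), (135, 165, 5), (165, 360, 6)]
    for lo, hi, j1 in bounds:
        if lo <= g < hi:
            return j1
```

For instance, a target at γ = 90° lies in Class T3, and a target at γ = 200° (behind the vehicle) in Class T6.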

where γ is the angle of the target direction.

Thus, in each step, a temperature field is defined in the vehicle environment with a temperature field vector XT = [XT1, …, XTi, …, XT5], where the XTi components are computed from temperatures t30, t60, t90, t120, t150 in the neighborhood of the vehicle defined in Section 3.3. The maximum component of XT, i.e., the maximum temperature of this field, corresponds to the situation Tj1 where the target is localized.

The task of the vehicle is then to recognize, from the input vector XT, in which situation Tj1 its target finds itself, to make its way towards this maximum, i.e., in the direction corresponding to this situation.


Figure 2. Target location situations T = [T1, …, Tj1, …, T6].

Obstacle Avoidance Situations. The uncertainty of real-world knowledge is the main problem of IAV [20]. This uncertainty is due to the fact that the environment representation is based essentially on the vehicle perception systems. It must be taken into account in the data representation. In the suggested approach, the perception system uses ultrasonic sensors to detect obstacles, necessary for obstacle avoidance. Indeed, the problem of correctly evaluating noisy and incorrect data for the interpretation of ultrasonic sensor signals is an often encountered one [3]. This problem is taken into account in this navigation approach by NN, with their inherent characteristics of adaptivity and high fault-tolerance with respect to defective sensors or noisy sensor data.

Additionally, the interest in mobile robotics, especially in partially structured environments such as factories, passenger stations, harbours, and airports, led first to a description of possibly encountered obstacles in such environments. Second, this interest has led to an observation of human obstacle avoidance behavior in such environments. Indeed, humans perceive obstacle avoidance situations as topological situations: corridors, rooms, right turns, etc. Thus, trying to capture the human obstacle avoidance behavior, it is necessary to structure such environments in topological situations.

a) Description of Possibly Encountered Obstacles. Partially structured environments are dynamic, with static, intelligent dynamic, and non-intelligent dynamic obstacles.

In reality, static obstacles, e.g., Obs1, Obs2, Obs3, and Obs4 in Fig. 3(a) (where Veh: Vehicle, Obs: Obstacle, and Tar: Target), of different shapes represent walls, pillars, machines, desks, tables, chairs, etc.

The intelligent dynamic obstacles (e.g., Veh1 with regard to Veh2, and conversely, in Fig. 3(a)) represent in reality IAV controlled by the same suggested neural navigation approach, where each one considers the others as obstacles.

The non-intelligent dynamic obstacles, oscillating horizontally (e.g., Obs5 in Fig. 3(a)) or vertically (e.g., Obs6 in Fig. 3(a)) between two fixed points, represent in reality preprogrammed, teleguided, or guided vehicles. This type of obstacle has neither a perception system nor an obstacle avoidance approach.

b) Possibly Encountered Obstacles Structured in Topological Situations. The possible movements of vehicles lead us to structure possibly encountered obstacles in thirty (30) topological situations, called obstacle avoidance situations, as shown in Fig. 3(b). These situations are defined with thirty (30) Classes O1, …, Oj2, …, O30 where j2 = 1, …, 30. In Fig. 3(b), the directions shown, in each situation, correspond to those where obstacles exist. To give an explicit representation of the situations shown in Fig. 3(b), the example of obstacle avoidance situation O23 is given in Fig. 3(c).

Thus, in each step, a distance vector XO = [XO1, …, XOi, …, XO5] is defined, where the XOi components are computed from vehicle-obstacle distances d30, d60, d90, d120, d150 in the neighborhood of the vehicle defined in Section 3.3.

The task of the vehicle is then to recognize, from the input vector XO, in which obstacle avoidance situation Oj2 it finds itself, to avoid possibly encountered obstacles.

3.3. Neural Navigation System

During the navigation, each vehicle must build an implicit internal map (i.e., target, obstacles, and free spaces) allowing recognition of both target location and obstacle avoidance situations. Then, it decides the appropriate action from two association stages and their coordination [6, 22]. To achieve this, the neural navigation system presented below is used, where the only known data are the initial and final (i.e., target) positions of the vehicle.

System Structure. The system structure allowing the development of the suggested approach is built of three phases, as shown in Fig. 4. During phase 1, the vehicle


Figure 3. Partially structured environments: (a) possibly encountered obstacles, (b) obstacle avoidance situations O = [O1, …, Oj2, …, O30], where the directions shown, in each situation, correspond to those where obstacles exist, and (c) the explicit representation of the obstacle avoidance situation O23, which makes the situations shown in (b) clear.


Figure 4. Neural navigation system synopsis.

learns to recognize target location situations Tj1 using the NN1 Classifier, while it learns to recognize obstacle avoidance situations Oj2 using the NN2 Classifier during phase 2. Phase 3 decides an action Ai from two association stages and their coordination using NN3.

Supervised Gradient Backpropagation Learning. The suggested approach must classify the different situation types of both target location and obstacle avoidance situations using multilayer feedforward networks NN1 and NN2, respectively. This classification is based on the supervised Gradient Backpropagation learning algorithm [1, 10, 12, 13, 15, 19, 24] presented below. Generally, for a simple classification, as in the suggested approach for both NN1 and NN2, three layers are sufficient (input layer, one hidden layer, and output layer), as shown in Fig. 5: architecture of both NN1 and NN2.

Input Layer. This layer is the input layer, with i input nodes receiving the components of the input vector X.

Figure 5. Architecture of both the NN1 Classifier (target location situations), where Xi = XTi (i = 1, …, 5), Yk (k = 1, …, 5), Cj = Tj1 (j = j1 = 1, …, 6), and the NN2 Classifier (obstacle avoidance situations), where Xi = XOi (i = 1, …, 5), Yk (k = 1, …, 15), Cj = Oj2 (j = j2 = 1, …, 30).

This layer transmits these inputs to all nodes of the nextlayer.

Hidden Layer. This layer is the hidden layer, with k hidden nodes. The output of each node is obtained using the sigmoid output function f as follows:

net_k = Σ_i Xi W2ki,   (2)

Yk = f(net_k),   (3)

where

f(x) = 1 / (1 + exp(−x)).   (4)

Output Layer. This layer is the output layer, with j sigmoidal output nodes, which are obtained by:

net_j = Σ_k Yk W1jk,   (5)

Cj = f(net_j).   (6)
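A minimal sketch of the forward pass of Eqs. (2)–(6), as an illustration only (the paper's notation W2 for input-to-hidden weights and W1 for hidden-to-output weights is kept; bias terms are omitted, as in the equations above):

```python
import numpy as np

def forward(X, W2, W1):
    """One forward pass through the three-layer classifier.
    X: input vector of shape (i,); W2: hidden weights (k, i);
    W1: output weights (j, k). Returns hidden outputs Y and outputs C."""
    f = lambda x: 1.0 / (1.0 + np.exp(-x))  # sigmoid, Eq. (4)
    net_k = W2 @ X        # Eq. (2)
    Y = f(net_k)          # Eq. (3)
    net_j = W1 @ Y        # Eq. (5)
    C = f(net_j)          # Eq. (6)
    return Y, C
```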

The steps in the used supervised Gradient Backpropagation learning algorithm are outlined as follows:

1. Random weight initialization in [−1, +1].
2. Apply an input vector X to the input layer.
3. Compute net_k and the outputs Yk of the hidden layer.
4. Compute net_j and the outputs Cj of the output layer.
5. Compute the error δj for the outputs of the output layer:

   δj = (DesiredCj − Cj) f′(net_j),   (7)

   since f is the sigmoid function ⇒ f′ = f(1 − f),

   δj = (DesiredCj − Cj) Cj (1 − Cj).   (8)

6. Compute the error δk for the outputs of the hidden layer:

   δk = f′(net_k) Σ_j δj W1jk,   (9)

   δk = Yk (1 − Yk) Σ_j δj W1jk.   (10)

7. Update the weights of the output layer:

   W1jk(t + 1) = W1jk(t) + ΔW1jk,   (11)

   with

   ΔW1jk = η δj Yk,   (12)

   where η is the learning rate, with 0 < η < 1.

8. Update the weights of the hidden layer:

   W2ki(t + 1) = W2ki(t) + ΔW2ki,   (13)

   with

   ΔW2ki = η δk Xi.   (14)

9. Compute the error e:

   e = (1/2) Σ_j (DesiredCj − Cj)².   (15)

10. Repeat 2 to 9 with the same input vector X (the same training example) until the error e is very close to the tolerance.
11. Repeat 2 to 10 for each input vector X (each training example).
12. Repeat 2 to 11 over several epochs, i.e., over several complete passes through all the training examples.
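The steps above can be sketched as a small training loop. This is an illustrative implementation under stated simplifications, not the authors' code: step 10 (iterating on a single example until its error reaches the tolerance) is replaced by repeated epochs over all examples with a tolerance check per epoch, and bias terms are omitted as in Eqs. (2)–(6); the function name `train` and its parameters are introduced here for illustration:

```python
import numpy as np

def train(examples, n_hidden, eta=0.5, tol=1e-3, epochs=500, seed=0):
    """Gradient Backpropagation for a 3-layer sigmoid classifier.
    `examples` is a list of (X, DesiredC) pairs. Returns (W1, W2)."""
    rng = np.random.default_rng(seed)
    n_in, n_out = len(examples[0][0]), len(examples[0][1])
    W2 = rng.uniform(-1, 1, (n_hidden, n_in))   # step 1: init in [-1, +1]
    W1 = rng.uniform(-1, 1, (n_out, n_hidden))
    f = lambda x: 1.0 / (1.0 + np.exp(-x))
    for _ in range(epochs):                     # step 12: several epochs
        worst = 0.0
        for X, D in examples:                   # step 11: each example
            X, D = np.asarray(X, float), np.asarray(D, float)
            Y = f(W2 @ X)                       # steps 2-3
            C = f(W1 @ Y)                       # step 4
            dj = (D - C) * C * (1 - C)          # step 5, Eq. (8)
            dk = Y * (1 - Y) * (W1.T @ dj)      # step 6, Eq. (10)
            W1 += eta * np.outer(dj, Y)         # step 7, Eqs. (11)-(12)
            W2 += eta * np.outer(dk, X)         # step 8, Eqs. (13)-(14)
            e = 0.5 * np.sum((D - C) ** 2)      # step 9, Eq. (15)
            worst = max(worst, e)
        if worst < tol:                         # error close to tolerance
            break
    return W1, W2
```

A toy usage: training on two one-hot examples makes the highest output coincide with the desired class.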

The connection weights are thus updated until network convergence: a state permitting the coding, i.e., the classification, of all the training examples or input space. This state is reached when the error e is very close to the tolerance, i.e., the error for all training examples is reduced to an acceptable value [10]. Practically, the tolerance is determined to end training while preventing inappropriate memorization, also called over-training.

Phase 1: Target Localization (NN1 Classifier). Target localization is based on the NN1 Classifier, which must recognize, after learning, each target location situation Tj1. NN1 is trained (see Section 4.1) from data obtained by computing the distance and orientation of the vehicle-target using a temperature field strategy [22]. This temperature field is defined in the vehicle environment, and the vehicle task is therefore to localize its target, corresponding to the unique maximum temperature of this field. Temperatures in the neighborhood of the vehicle are defined by: t30 in direction 30°, t60 in direction 60°, t90 in direction 90°, t120 in direction 120°, and t150 in direction 150°. These temperatures are computed using sine and cosine functions as follows:

If 15° ≤ γ < 45° (Class T1)
Then t30 = cos(γ), t60 = sin(γ), t90 = sin(γ)/2, t120 = sin(γ)/3, and t150 = sin(γ)/4

If 45° ≤ γ < 75° (Class T2)
Then t30 = cos(γ), t60 = sin(γ), t90 = cos(γ), t120 = cos(γ)/2, and t150 = cos(γ)/3

If 75° ≤ γ < 105° (Class T3)
Then t30 = sin(γ)/3, t60 = sin(γ)/2, t90 = sin(γ), t120 = sin(γ)/2, and t150 = sin(γ)/3

If 105° ≤ γ < 135° (Class T4)
Then t30 = −cos(γ)/3, t60 = −cos(γ)/2, t90 = −cos(γ), t120 = sin(γ), and t150 = −cos(γ)

If 135° ≤ γ < 165° (Class T5)
Then t30 = sin(γ)/4, t60 = sin(γ)/3, t90 = sin(γ)/2, t120 = sin(γ), and t150 = −cos(γ)

If 165° ≤ γ < 375° (Class T6)
  If 165° ≤ γ < 270°
  Then t30 = −cos(γ)/2, t60 = −cos(γ)/3, t90 = −cos(γ)/4, t120 = −cos(γ)/3, and t150 = −cos(γ)
  If γ = 270°
  Then t30 = −sin(γ), t60 = −sin(γ)/3, t90 = −sin(γ)/4, t120 = −sin(γ)/3, and t150 = −sin(γ)/2
  If 270° < γ < 375°
  Then t30 = cos(γ), t60 = cos(γ)/3, t90 = cos(γ)/4, t120 = cos(γ)/3, and t150 = cos(γ)/2

(16)
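The piecewise definitions of Eq. (16) can be sketched directly as a function returning the five temperatures; this is an illustrative transcription only (angles below 15° are mapped onto the 270° < γ < 375° branch of Class T6, an assumption consistent with the 375° upper bound):

```python
import math

def temperatures(gamma_deg):
    """Return (t30, t60, t90, t120, t150) for target angle gamma (degrees),
    following the piecewise definitions of Eq. (16)."""
    g = gamma_deg % 360.0
    if g < 15.0:
        g += 360.0  # map [0, 15) onto the 270 < gamma < 375 branch
    s = math.sin(math.radians(g))
    c = math.cos(math.radians(g))
    if 15 <= g < 45:     # Class T1
        return (c, s, s / 2, s / 3, s / 4)
    if 45 <= g < 75:     # Class T2
        return (c, s, c, c / 2, c / 3)
    if 75 <= g < 105:    # Class T3
        return (s / 3, s / 2, s, s / 2, s / 3)
    if 105 <= g < 135:   # Class T4
        return (-c / 3, -c / 2, -c, s, -c)
    if 135 <= g < 165:   # Class T5
        return (s / 4, s / 3, s / 2, s, -c)
    if 165 <= g < 270:   # Class T6, first branch
        return (-c / 2, -c / 3, -c / 4, -c / 3, -c)
    if g == 270:         # Class T6, gamma = 270
        return (-s, -s / 3, -s / 4, -s / 3, -s / 2)
    return (c, c / 3, c / 4, c / 3, c / 2)  # Class T6, 270 < gamma < 375
```

As expected from the text, the maximum temperature points towards the target: for γ = 90° the maximum is t90, and for γ = 120° it is t120.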

These components are preprocessed to constitute the input vector XT of NN1, built of input layer, hidden layer, and output layer as shown in Fig. 5: architecture of NN1, where Xi = XTi (i = 1, …, 5), Yk (k = 1, …, 5), Cj = Tj1 (j = j1 = 1, …, 6).

XT1 = (1/ρT) exp(−t30/aT),
XT2 = (1/ρT) exp(−t60/aT),
XT3 = (1/ρT) exp(−t90/aT),   (17)
XT4 = (1/ρT) exp(−t120/aT),
XT5 = (1/ρT) exp(−t150/aT),

where ρT is the norm of the input vector XT and aT is an input preprocessing factor, with aT > 1.
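A minimal sketch of the Eq. (17)-style preprocessing (the same form applies to Eq. (18)). One assumption is made explicit: the paper only says ρ is "the norm of the input vector", so here ρ is taken as the Euclidean norm of the raw exponential vector, which yields a unit-norm result:

```python
import math

def preprocess(values, a=2.0):
    """Map five raw components (temperatures or distances) to a unit-norm
    input vector: exp(-v/a) scaled by 1/rho, with rho the Euclidean norm
    of the raw exponential vector (assumption) and a > 1."""
    raw = [math.exp(-v / a) for v in values]
    rho = math.sqrt(sum(r * r for r in raw))
    return [r / rho for r in raw]
```

Note the design consequence for the distance case of Eq. (18): exp(−d/a) is decreasing in d, so the nearest obstacle yields the largest input component.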

After learning, for each input vector XT, NN1 provides the vehicle with the capability to decide its target localization, recognizing the target location situation Tj1 expressed by the highly activated output Tj1.

Phase 2: Obstacle Avoidance (NN2 Classifier). Obstacle avoidance is based on the NN2 Classifier, which must recognize, after learning, each obstacle avoidance situation Oj2. NN2 is trained (see Section 4.1) from ultrasonic sensor data obtained from the environment giving vehicle-obstacle distances. These distances are defined in the vehicle neighborhood by: d30 in direction 30°, d60 in direction 60°, d90 in direction 90°, d120 in direction 120°, and d150 in direction 150°. These components are preprocessed to constitute the input vector XO of NN2, built of input layer, hidden layer, and output layer as shown in Fig. 5: architecture of NN2, where Xi = XOi (i = 1, …, 5), Yk (k = 1, …, 15), Cj = Oj2 (j = j2 = 1, …, 30).

XO1 = (1/ρO) exp(−d30/aO),
XO2 = (1/ρO) exp(−d60/aO),
XO3 = (1/ρO) exp(−d90/aO),   (18)
XO4 = (1/ρO) exp(−d120/aO),
XO5 = (1/ρO) exp(−d150/aO),

where ρO is the norm of the vector XO and aO is an input preprocessing factor, with aO > 1.

After learning, for each input vector XO, NN2 provides the vehicle with the capability to decide its obstacle avoidance, recognizing the obstacle avoidance situation Oj2 expressed by the highly activated output Oj2.

Phase 3: Decision-Making and Action (NN3). In this phase, two association stages between each behavior and the favorable actions (among the possible actions) and the coordination of these association stages are carried out by a multilayer feedforward network NN3. Thus, both situations Tj1 and Oj2 are associated, by reinforcement Trial and Error learning, with the favorable actions separately. Afterwards, the coordination of the two association stages allows the decision-making of the appropriate action.

Figure 6. Architecture of NN3 (decision-making and action).

NN3 is built of two layers (input layer and output layer), as shown in Fig. 6: architecture of NN3.

Input Layer. This layer is the input layer, with thirty-six (36) input nodes receiving the components of the T and O vectors. This layer transmits these inputs to all nodes of the next layer. Each node Tj1 is connected to all nodes Ai with the connection weights Uij1, and each node Oj2 is connected to all nodes Ai with the connection weights Vij2, as shown in Fig. 6.

Output Layer. This layer is the output layer, with five (05) output nodes, which are obtained by:

Ai = g(Σ_j1 Tj1 Uij1 + Σ_j2 Oj2 Vij2) + Ni,   (19)

with

g(x) = x if x > 0, 0 otherwise,   (20)

and Ni a random distribution variable over [0, β], where β is a constant defined below.
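Eqs. (19)–(20) can be sketched as follows; this is an illustration only, with a uniform draw over [0, β] assumed for the noise Ni (the paper only says "random distribution variable over [0, β]") and the function name `nn3_action` introduced here:

```python
import numpy as np

def nn3_action(T, O, U, V, beta=0.0, rng=None):
    """Compute the five action activations Ai of Eqs. (19)-(20):
    rectified weighted sums of the situation vectors T (6,) and O (30,),
    through U (5, 6) and V (5, 30), plus noise Ni drawn from [0, beta]."""
    rng = rng or np.random.default_rng(0)
    net = U @ T + V @ O
    return np.maximum(net, 0.0) + rng.uniform(0.0, beta, size=net.shape)
```

With β = 0 the decision is deterministic; a positive β injects the exploration noise used during Trial and Error learning.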

a) Association Stages (Reinforcement Trial and Error Learning). Both situations Tj1 and Oj2 are associated separately, in two independent stages, by reinforcement Trial and Error learning, with the favorable actions, in order to fulfill the two vehicle behaviors.

The association between a situation and an action is usually carried out with the use of a signal provided by an outside process (e.g., a supervisor), giving the desired response. To achieve the correct association, the desired response is acquired through reinforcement Trial and Error learning. Learning, in this case, is guided only by a feedback process, i.e., guided by a signal P provided by the supervisor. This signal causes a reinforcement of the association between a given situation and a favorable action if the latter leads to a favorable consequence for the vehicle; if not, the signal P provokes a dissociation [22].

For this learning, the following equation is used to update the weights Mij in the two association stages, as detailed in [22]:

τ dMij(t)/dt = Ai Cj (−Mij(t) + α − P),   (21)

with τ a time constant and α a constant.

Equation (21) leads to the following first-order linear equation with constant coefficients:

dMij(t)/dt + (Ai Cj / τ) Mij(t) = (Ai Cj / τ)(α − P).   (22)

The general solution of Eq. (22) is given by:

Mij(t) = c exp(−(Ai Cj / τ) t) + (α − P),   (23)

where c is a constant determined at t = 0: Mij(0) = 0 and P = 0 ⇒ c = −α.

Thus, the updating of the weights Uij1 and Vij2 is achieved by the following equation, given for the weights Mij:

Mij(t) = −α exp(−(Ai Cj / τ) t) + (α − P),   (24)
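As a worked check of Eq. (24) (illustration only, with c = −α as determined above for P = 0 at t = 0): the weight starts at 0 and converges exponentially to (α − P) at rate Ai Cj / τ, so a favorable outcome (P < α) reinforces the association and an unfavorable one (P > α) drives it negative:

```python
import math

def M(t, A, C, alpha, P, tau):
    """Association weight trajectory of Eq. (24): exponential relaxation
    towards (alpha - P) with rate A*C/tau, with c = -alpha (P = 0 at t = 0)."""
    return -alpha * math.exp(-(A * C / tau) * t) + (alpha - P)
```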

• Target Localization Association: This learning (see Section 4.1) is guided only by a feedback process, i.e., guided by a signal P1 provided by the supervisor: a human expert. Target location situations are then associated with favorable actions in an obstacle-free environment (i.e., O = 0).

Favorable actions are defined, for each situation Tj1, by the human expert (supervisor), who has translated this fact into the vector Z = [Z1, Z2, Z3, Z4, Z5], where each Zi component is determined with regard to each possible action Ai. If Zi = 1 then Ai is a favorable action, while if Zi = 0 then Ai is an unfavorable action. For each situation Tj1, only favorable actions are represented in Fig. 7. For instance, in situation T3: Z1 = 0, Z2 = 1, Z3 = 1, Z4 = 1, and Z5 = 0, which means that only A2, A3, and A4 are considered favorable actions, while A1 and A5 are considered unfavorable actions.

Figure 7. Representation of favorable actions: for each situation Tj1, only the components Zi = 1 of Z = [Z1, Z2, Z3, Z4, Z5] corresponding to favorable actions are represented.

Page 10: Neural Navigation Approach for Intelligent Autonomous Vehicles (IAV) in Partially Structured Environments

P1: ABL

Applied Intelligence KL574-02-Chohra April 11, 1998 12:2

228 Chohra, Farah and Benmehrez

• Obstacle Avoidance Association: This learning (see Section 4.1) is guided only by a feedback process, i.e., guided by a signal P2 provided by the supervisor: data sensors from the environment. Obstacle avoidance situations are then associated with favorable actions without considering the temperature field (i.e., T = 0).

Favorable actions are defined, for each situation Oj2, by data sensors from the environment (supervisor). In each situation Oj2, favorable actions are those corresponding to directions where no obstacle is detected (no collision), while unfavorable actions are those corresponding to directions where an obstacle is detected (collision). For instance, in situation O23 shown in Fig. 3(c): only A1 and A3 are considered as favorable actions while A2, A4, and A5 are considered as unfavorable actions.
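As a sketch of how sensor data can play the supervisor role here, the hypothetical helper below (the action-to-direction mapping and the boolean detection vector are illustrative assumptions) marks an action favorable exactly when no obstacle is detected in its direction:

```python
# The five movement directions of the vehicle and their actions (assumed mapping).
ACTIONS = ("A1 (30)", "A2 (60)", "A3 (90)", "A4 (120)", "A5 (150)")

def favorable_actions(obstacle_detected):
    """An action is favorable iff no obstacle is detected in its direction."""
    return [a for a, hit in zip(ACTIONS, obstacle_detected) if not hit]

# Illustrative situation in the spirit of O23: obstacles detected in the
# A2, A4, and A5 directions, so only A1 and A3 remain favorable.
print(favorable_actions([False, True, False, True, True]))
# → ['A1 (30)', 'A3 (90)']
```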

b) Coordination. It must provide the vehicle with the capability to fulfill, at the same time, the two target localization and obstacle avoidance behaviors, giving the appropriate action. To ensure the coordination of the two association stages (see Section 4.1), the actions Ai are computed by Eq. (19).

After learning of the two association stages and their coordination, NN3 provides the vehicle with the capability to decide the appropriate action, expressed by the most highly activated output Ai.

4. Simulation Results

In this section, first the simulated training environments and training processes of NN1, NN2, and NN3 are described. Second, the simulated neural navigation approach is described and simulation results are presented. Thus, the vehicles, ultrasonic sensors, and partially structured environments are simulated.

4.1. Training of NN1, NN2, and NN3

Training of NN1. This training is achieved in the learning (training) environment shown in Fig. 8(a). The vehicle moves along the paths (1, ..., 10) in an obstacle-free environment, where the target is positioned in the environment center. This allows the vehicle to be in different positions and orientations, and consequently the target will be in different locations with regard to the vehicle. Thus, each particular position and orientation corresponds to one training example for a particular target location situation. Then, several training examples are used for each target location situation.

Figure 8. Learning (training) environments: (a) target location situations: the vehicle moves along the paths (1, ..., 10), represented by arrows, where the target is located in the environment center; and (b) obstacle avoidance situations: the vehicle is simulated in a given position and orientation, where the simulated configuration of obstacles corresponds to one training example for the obstacle avoidance situation O23.

Thus, the used training set is composed of one thousand two hundred and fifty (1250) patterns corresponding to the six (06) target location situations.

For a good discrimination among input vectors XT corresponding to different target location situations, aT = 5 is used. Also, the NN1 Classifier yields convergence to the tolerance Ea1 = 0.01 in well under EP1 = 43 epochs with the learning rate η1 = 0.1.


Training of NN2. The vehicle is simulated in a given position and orientation in the environment, where a configuration of obstacles is simulated, corresponding to one example of a particular obstacle avoidance situation (e.g., the training example shown in Fig. 8(b) corresponds to one training example of the obstacle avoidance situation O23). Then, several training examples are simulated for each obstacle avoidance situation.

Thus, the used training set is composed of one hundred and fifty (150) patterns corresponding to the thirty (30) obstacle avoidance situations, with five (05) examples for each situation.

For a good discrimination among input vectors XO corresponding to different obstacle avoidance situations, aO = 10 is used. Also, the NN2 Classifier yields convergence to the tolerance Ea2 = 0.01 in well under EP2 = 50 epochs with the learning rate η2 = 0.4.

Note that for NN1 and NN2, each training cycle consisted of feeding a pattern one time into the feedforward network, backpropagating the errors, and adjusting the weights. One training epoch consisted of processing all patterns in this manner.
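The training cycle just described can be sketched for a generic one-hidden-layer feedforward network trained by supervised Gradient Backpropagation; the layer sizes, data, initialization, and learning rate below are illustrative assumptions, not the actual NN1 or NN2 configurations:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_epoch(X, T, W1, W2, eta):
    """One epoch: feed each pattern once, backpropagate the error,
    and adjust the weights after every pattern."""
    sse = 0.0
    for x, t in zip(X, T):
        h = sigmoid(W1 @ x)               # hidden layer activations
        y = sigmoid(W2 @ h)               # output layer activations
        e = t - y                         # output error
        sse += float(e @ e)
        d2 = e * y * (1.0 - y)            # output deltas
        d1 = (W2.T @ d2) * h * (1.0 - h)  # hidden deltas (backpropagated)
        W2 += eta * np.outer(d2, h)       # weight adjustments
        W1 += eta * np.outer(d1, x)
    return sse

rng = np.random.default_rng(0)
X = rng.random((6, 5))                    # six illustrative input patterns
T = np.eye(6)                             # one-hot targets, one class per pattern
W1 = 0.5 * rng.standard_normal((8, 5))    # input -> hidden weights
W2 = 0.5 * rng.standard_normal((6, 8))    # hidden -> output weights
errors = [train_epoch(X, T, W1, W2, eta=0.4) for _ in range(200)]
assert errors[-1] < errors[0]             # the summed squared error decreases
```

In the paper's terms, training would continue until the error falls below the tolerance (Ea1 or Ea2) rather than for a fixed number of epochs.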

Training of NN3. This training is achieved with the training of the two association stages and their coordination.

a) Target Localization Association. In this stage, the updating of weights is achieved by Eq. (24), where Mij = Uij1, Cj = Tj1 (j = j1 = 1, ..., 6), and:

P = P1 if Zi = 0, and P = 0 if Zi = 1.   (25)

In this stage, the training to obtain Uij1, constituting the first training of NN3, is achieved in an obstacle-free environment (i.e., O = 0): in the learning (training) environment shown in Fig. 8(a). In this training, only one training example for each target location situation is used. Thus, the training set consists of only six (06) examples, using NN1 outputs as NN3 inputs as shown in Fig. 6.

The association process can be described by an example: at t = 0, Uij1(0) = 0 and P = 0, and the vehicle is positioned such that its target location corresponds to target location situation T1. This situation activates the corresponding NN1 output (i.e., T1 = 1). Therefore, Z = [1, 1, 1, 0, 0] as shown in Fig. 7, i.e., in this situation A1, A2, and A3 are favorable actions while A4 and A5 are unfavorable actions.

For favorable actions (P = 0), Eq. (24) saturates at α when t is incremented, with τ = 3 s as time constant. U11, U21, and U31 are thus increased and:

• the association between T1 and A1 is reinforced with t = t1 at saturation,
• the association between T1 and A2 is reinforced with t = t2, where t2 < t1,
• the association between T1 and A3 is reinforced with t = t3, where t3 < t2.

For unfavorable actions (P = P1), Eq. (24) saturates at (α − P1), which is a negative value, when t is incremented with the same time constant τ = 3 s. Thus, U41 and U51 become negative and:

• A4 is dissociated from T1 with t at saturation,
• A5 is dissociated from T1 with t at saturation.

Thus, the Uij1 are adjusted to obtain the reinforced actions among the favorable actions shown in Fig. 9(a). The matrix of target localization association is represented in this figure: solid circles correspond to positive weights, which represent favorable actions, indicating reinforced association, where values are proportional to the area of the circles and the most reinforced action is the one having the greatest positive weight. Hollow circles correspond to negative weights, which represent dissociated actions.

The choice of the most reinforced action is guided by the principle that the vehicle must take the straightest action towards its target.
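The updating in this stage can be sketched by applying Eq. (24) with the P selection of Eq. (25); the unit activations, the saturation time t_sat, and the constants (α = 5, P1 = 7) are illustrative assumptions here:

```python
import math

def u_weight(t, alpha=5.0, P=0.0, tau=3.0, A=1.0, C=1.0):
    """Eq. (24) for one weight, with assumed unit activations A_i = C_j = 1."""
    return -alpha * math.exp(-A * C * t / tau) + (alpha - P)

def train_association(Z, alpha=5.0, P1=7.0, t_sat=50.0):
    """One row of the association matrix: Eq. (24) evaluated at saturation,
    with P chosen by Eq. (25): P = 0 if Z_i = 1, else P = P1."""
    return [u_weight(t_sat, alpha=alpha, P=(0.0 if z == 1 else P1)) for z in Z]

# Situation T1: Z = [1, 1, 1, 0, 0], i.e., A1..A3 favorable, A4 and A5 unfavorable.
U_row = train_association([1, 1, 1, 0, 0])
assert all(w > 0 for w in U_row[:3])   # reinforced associations (positive weights)
assert all(w < 0 for w in U_row[3:])   # dissociated actions (negative weights)
```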

b) Obstacle Avoidance Association. In this stage, the updating of weights is achieved by Eq. (24), where Mij = Vij2, Cj = Oj2 (j = j2 = 1, ..., 30), and:

P = P2 if collision, and P = 0 if no collision.   (26)

In this stage, the training to obtain Vij2, constituting the second training of NN3, is achieved without considering the temperature field (i.e., T = 0). In this training, only one training example for each obstacle avoidance situation is simulated, as explained for the situation O23 in Fig. 8(b). Thus, the training set consists of only thirty (30) examples, using NN2 outputs as NN3 inputs as shown in Fig. 6.

The association process can be described by an example: at t = 0, Vij2(0) = 0 and P = 0, and the


Figure 9. Association matrices: (a) matrix of target localization association, and (b) matrix of obstacle avoidance association. In both matrices, solid circles correspond to positive weights, which represent favorable actions, indicating reinforced association; values are proportional to the area of the circles, and the most reinforced action is the one having the greatest positive weight. Hollow circles correspond to negative weights, which represent dissociated actions.

vehicle is positioned in obstacle avoidance situation O1. This situation activates the corresponding NN2 output (i.e., O1 = 1). Therefore, as explained before, in this situation A2, A3, A4, and A5 are favorable actions while A1 is an unfavorable action.

For favorable actions (P = 0), Eq. (24) saturates at α when t is incremented, with τ = 3 s as time constant. V21, V31, V41, and V51 are thus increased and:

• the association between O1 and A2 is reinforced with t = t1 at saturation,
• the association between O1 and A3 is reinforced with t = t2, where t2 < t1,
• the association between O1 and A4 is reinforced with t = t3, where t3 < t2,
• the association between O1 and A5 is reinforced with t = t4, where t4 < t3.

For unfavorable actions (P = P2), Eq. (24) saturates at (α − P2), which is a negative value, when t is incremented with the same time constant τ = 3 s. Thus, V11 becomes negative and:

• A1 is dissociated from O1 with t at saturation.

Thus, the Vij2 are adjusted to obtain the reinforced actions among the favorable actions, as shown in Fig. 9(b). The matrix of obstacle avoidance association is represented in this figure: solid circles correspond to positive weights, which represent favorable actions, indicating reinforced association, where values are proportional to the area of the circles and the most reinforced action is the one having the greatest positive weight. Hollow circles correspond to negative weights, which represent dissociated actions.

The choice of the most reinforced action is guided by the principle that the vehicle must avoid obstacles just enough to avoid collisions.

As explained before, Eq. (24) saturates at α in both association stages when actions are favorable; they must then be reinforced, i.e., given positive weights (α > 0). Otherwise, Eq. (24) saturates at (α − P) in both association stages when actions are unfavorable; they must then be dissociated, i.e., given negative weights (P > α). This leads to the following relations:

α > 0, P1 > α, and P2 > α.   (27)

c) Coordination. In this coordination of the two association stages, the detection of the maximum temperature must be interpreted as the goal of the vehicle, while the actions generated by the presence of obstacles must be interpreted as the reflex of the vehicle. Then, actions generated by obstacle avoidance must have precedence over those generated by target localization, i.e., the P1 and


P2 constants must be defined such that:

P2 > P1. (28)

The random distribution variable Ni, in Eq. (19), is used to decide between two (or among several) favorable actions in conflictual cases, e.g., in the case of the T3 and O19 situations (T3 = 1 and O19 = 1): A1 and A5 are dissociated from T3 (see Fig. 9(a)), while A1, A3, and A5 are dissociated from O19 (see Fig. 9(b)). Then, from Eq. (19), neither A1, A3, nor A5 could be the most highly activated output. Also, A2 and A4 are reinforced to T3 with the same value, while A2 and A4 are reinforced to O19 with the same value. In this case, the most highly activated Ai corresponds to the action having the greatest value Ni. Note that the Ni values must be different.

Thus, β and α must be coupled as follows:

0 < β < α.   (29)

The used values of the different constants in the association stages and their coordination are: β = 1, α = 5, P1 = 7, and P2 = 9.
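Since Eq. (19) itself is not reproduced in this section, the sketch below assumes a weighted-sum form, Ai = Σ Uij1 Tj1 + Σ Vij2 Oj2 + β Ni, to illustrate the tie-breaking role of Ni in the T3/O19 conflict; the exact form of Eq. (19), the weight values, and the Ni distribution are assumptions:

```python
import random

BETA = 1.0  # noise gain from the text, with 0 < beta < alpha (Eq. (29))

def coordinate(U_col, V_col, rng):
    """Hypothetical sketch of Eq. (19) with one active target situation and one
    active obstacle situation: A_i = U_i + V_i + beta * N_i, where the N_i are
    distinct random values in (0, 1) used to break ties."""
    N = rng.sample(range(1, 100), len(U_col))  # distinct N_i values
    A = [u + v + BETA * (n / 100.0) for u, v, n in zip(U_col, V_col, N)]
    return max(range(len(A)), key=A.__getitem__)  # index of the most activated output

# Conflict case T3 and O19: A2 and A4 are equally reinforced in both columns,
# while the remaining actions carry negative (dissociated) weights.
U_T3 = [-2.0, 5.0, 5.0, 5.0, -2.0]
V_O19 = [-2.0, 5.0, -2.0, 5.0, -2.0]
winner = coordinate(U_T3, V_O19, random.Random(0))
assert winner in (1, 3)  # only A2 (index 1) or A4 (index 3) can win; N_i decides
```

Because the dissociated actions carry large negative sums, no value of the bounded noise term β·Ni can make them win, which is exactly why β must stay below α.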

Note that actions considered as unfavorable in target location situations could be taken in some cases, while actions considered as unfavorable in obstacle avoidance situations are never taken.

4.2. Neural Navigation Approach

To reflect the vehicle behaviors acquired by learning and to demonstrate the generalization ability of the suggested neural approach, the vehicle navigation is simulated in different static and dynamic partially structured environments.

Each simulated vehicle has only two known data: its initial and final (target) positions. From these data, it must reach its target while avoiding possibly encountered obstacles using the suggested neural navigation approach. In this simulation the vehicle controls only its heading, and consequently when obstacles are detected, at the same time, in all five (05) of its movement directions, it must be stopped. Also, each non-intelligent dynamic obstacle is assumed to have a velocity inferior or equal to that of the vehicle.

Static Obstacles. Tested in an environment containing static obstacles, as illustrated in Fig. 10 (where Veh: Vehicle and Tar: Target), the vehicle succeeds in avoiding

Figure 10. Case of static obstacles.

Figure 11. Case of intelligent dynamic obstacles.

Figure 12. Case of non-intelligent dynamic obstacles.

the static obstacles while it makes its way towards the target.

Intelligent Dynamic Obstacles. In the case illustrated in Fig. 11, the four vehicles Veh1, Veh2, Veh3, and Veh4 try to reach their respective targets while each one avoids the others. Indeed, these vehicles navigate with the same neural navigation approach, where each one considers the others as intelligent dynamic obstacles.

Non-Intelligent Dynamic Obstacles. In the case of non-intelligent dynamic obstacles, oscillating vertically and horizontally between two fixed points, as illustrated in Fig. 12, the vehicle avoids them and reaches its target successfully.

Complex Environments. In the case illustrated in Fig. 13, the three vehicles reach their targets, without collisions, in a complex environment.


Figure 13. Case of a complex environment.

These simulation results illustrate the generalization capability of both the NN1 and NN2 Classifiers, and the decision-making and action capability of NN3.

5. Discussion and Conclusion

Ultrasonic sensors are widely used for obstacle avoidance, but their signals are often noisy, or the sensors are defective and give incorrect data [3, 9]. NN, with their inherent features of adaptivity and high fault and noise tolerance, handle this problem, making the NN based approaches robust. Indeed, the malfunctioning of one of the sensors does not strongly impair the obstacle avoidance behavior.

In the suggested neural approach, the two intelligent behaviors necessary for navigation are acquired by learning. They enable IAV to be more autonomous and intelligent in partially structured environments. Nevertheless, there are a number of issues that need to be further investigated in the perspective of an implementation on a real vehicle. First, the vehicle must be endowed with one or several actions to move backwards, and with a smooth trajectory generation system controlling its velocity. Also, it must be endowed with specific sensors to detect dynamic obstacles and specific processing of the data given by them. An interesting alternative is then to implement NN1, NN2, and NN3 on Xilinx's FPGA [4, 8].

As in the suggested neural approach, most of the neural navigation approaches, e.g., those developed in [2, 3, 11, 16], are based on the acquisition by learning of different behaviors such as eye motion control, object recognition, target localization, target tracking, obstacle avoidance, or others. Recently, a promising lifelong learning perspective for IAV was presented in [23]. In that paper, two lifelong vehicle learning approaches are developed for the navigation of HERO-2000 in unknown indoor environments. The authors claimed that learning becomes easier if it is embedded in a lifelong learning context. They suggested not studying vehicle learning problems in isolation, but in the context of the multitude of learning problems that a vehicle will face over its lifetime. Thus, lifelong vehicle learning opens the opportunity for the transfer of learned knowledge. Vehicles that memorize and transfer knowledge rely less on real-environment experimentation and thus learn faster.

Acknowledgments

The authors would like to thank Miss O. Azouaoui for her invaluable comments and discussions.

References

1. J.A. Anderson, An Introduction to Neural Networks, The MIT Press: Cambridge, MA; London, England, 1995.

2. A.A. Baloch and A.M. Waxman, "Visual learning, adaptive expectations, and behavioral conditioning of the mobile robot MAVIN," Neural Networks, vol. 4, pp. 271–302, 1991.

3. K. Berns and R. Dillmann, "A neural network approach for the control of a tracking behavior," in Proc. Int. IEEE Conf. —, 7803-0078/91/0600-0500, 1991, pp. 500–503.

4. N.M. Botros and M. Abdul-Aziz, "Hardware implementation of an artificial neural network," in Proc. Int. IEEE Conf. on Neural Networks, vol. III, San Francisco, CA, March 28–April 1, 1993, pp. 1252–1257.

5. S. Cherian and W. Troxell, "Intelligent behavior in machines emerging from a collection of interactive control structures," Computational Intelligence, vol. 11, no. 4, pp. 565–592, November 1995.

6. A. Chohra, F. Sif, and S. Talaoubrid, "Neural navigation approach of an autonomous mobile robot in a partially structured environment," in Proc. 2nd IFAC Conf. Intelligent Autonomous Vehicles, Helsinki, Finland, 1995, pp. 238–243.

7. A. Chohra and A. Farah, "Autonomy, behaviour, and evolution of intelligent vehicles," in Proc. Int. IMACS IEEE-SMC Multiconf. Computational Engineering in Systems Applications, Lille, France, 1996, pp. 36–41.

8. Y.I. El-Haffaf, A.K. Oudjida, and A. Chohra, "Digital artificial neural network architecture for obstacle avoidance suitable for FPGA," in Proc. Int. Conf. on Engineering Applications of Neural Networks, Stockholm, Sweden, 16–18 June, 1997, pp. 313–316.

9. D. Floreano and F. Mondada, "Automatic creation of an autonomous agent: Genetic evolution of a neural network-driven robot," in Proc. 3rd Int. Conf. on Simulation of Adaptive Behavior, Brighton, 1994.

10. J.A. Freeman and D.M. Skapura, Neural Networks: Algorithms, Applications, and Programming Techniques, Addison-Wesley: New York, 1992.

11. H. Herbstreith, L. Gmeiner, and P. Preuß, "A target-directed neurally controlled vehicle," in Proc. Int. IFAC Conf. Artificial Intelligence in Real-Time Control, The Netherlands, 1992, pp. 67–71.

12. T. Khanna, Foundations of Neural Networks, Addison-Wesley: New York, 1990.

13. B. Kosko, Neural Networks and Fuzzy Systems, Prentice-Hall, 1992.


14. C. Kujawski, "Deciding the behaviour of an autonomous mobile road vehicle," in Proc. 2nd Int. IFAC Conf. Intelligent Autonomous Vehicles, Helsinki, Finland, 1995, pp. 404–409.

15. S.Y. Kung, Digital Neural Networks, PTR Prentice-Hall: Englewood Cliffs, NJ, 1993.

16. M. Meng and A.C. Kak, "Mobile robot navigation using neural networks and nonmetrical environment models," IEEE Control Systems, pp. 30–39, October 1993.

17. L. Moreno, E.A. Puente, and M.A. Salichs, "World modelling and sensor data fusion in a non static environment: Application to mobile robots," in Proc. Int. IFAC Conf. Intelligent Components and Instruments for Control Applications, Malaga, Spain, 1992, pp. 433–436.

18. W. Niegel, "Methodical structuring of knowledge used in an intelligent driving system," in Proc. 2nd Int. IFAC Conf. Intelligent Autonomous Vehicles, Helsinki, Finland, 1995, pp. 398–403.

19. D.W. Patterson, Artificial Neural Networks: Theory and Applications, Prentice-Hall, Simon & Schuster (Asia): Singapore, 1996.

20. P. Pignon, T. Hasegawa, and J.P. Laumond, "Structuration de l'espace pour les robots mobiles," Revue Scientifique et Technique de la Défense, vol. 2, pp. 17–31, 1994.

21. K. Schilling and C. Jungius, "Mobile robots for planetary exploration," in Proc. 2nd Int. IFAC Conf. Intelligent Autonomous Vehicles, Helsinki, Finland, 1995, pp. 110–120.

22. E. Sorouchyari, "Mobile robot navigation: A neural network approach," in Proc. Art Coll. Neuro., Eco. Poly., Lausanne, 1989, pp. 159–175.

23. S. Thrun and T.M. Mitchell, "Lifelong robot learning," Robotics and Autonomous Systems, vol. 15, pp. 25–46, 1995.

24. S.T. Welstead, Neural Network and Fuzzy Logic Applications in C/C++, John Wiley & Sons: Toronto, 1994.

25. P.J. Werbos, "Neurocontrol and fuzzy logic: Connections and designs," Int. J. of Approximate Reasoning, vol. 6, pp. 185–219, 1992.

A. Chohra received the Electronics Engineer degree, option Control, from the Electronics Department of USTO (Université des Sciences et de la Technologie d'Oran) at Oran, Algeria in 1988. He received his Magister degree in Cybernetics, option Robotics, from the LRIA (Laboratoire de Robotique et d'Intelligence Artificielle) of the CDTA (Centre de Développement des Technologies Avancées) at Algiers, Algeria in 1991. He has been a researcher at the LRIA/CDTA since 1991, and his Doctorate ès-sciences degree has been in progress at the ENP (Ecole Nationale Polytechnique) at Algiers, Algeria since 1993. His research interests are navigation, intelligent autonomous vehicles, soft computing, hybrid intelligent systems, neural networks, fuzzy logic, evolutionary algorithms, and adaptive resonance theory.

C. Benmehrez received the Electronics Engineer degree in 1977 from the ENP (Ecole Nationale Polytechnique) at Algiers, Algeria. He received his Master of Engineering and Ph.D. degrees in Computers and Systems in 1979 and 1983, respectively, from Rensselaer Polytechnic Institute, Troy, NY, where he worked as a teaching assistant and then instructor. He returned to Algeria in 1983 and became fully involved with research and development at the CDTA (Centre de Développement des Technologies Avancées). His main interests include digital circuit simulation, verification and testing, pattern recognition, neural networks and fuzzy logic, and microprocessor based systems development. One of his main concerns is the transfer of technology to developing countries. Since 1994, he has been a faculty member at Ajman University of Science and Technology, United Arab Emirates.

A. Farah received the Electronics Engineer degree and the Magister degree in Electronics Applied to Information Processing from the ENP (Ecole Nationale Polytechnique) at Algiers, Algeria in 1977 and 1983, respectively, and the Doctorate ès-sciences degree from the INPL (Institut National Polytechnique de Lorraine) at Nancy, France in 1989. He worked as an assistant professor in 1978 at the ENITA (Ecole Nationale des Ingénieurs et Techniciens de l'Aéronautique) and from 1979 to 1984 at the ENP. Since 1989 he has been an associate professor at the ENP. His research interests include the design, verification, and test of digital circuits and systems, signal processing, neural networks, fuzzy systems, and interconnection networks.