A soft computing approach for recognition of occluded shapes

Mohammed Zaki a,*, Abdallah El-Ramsisi b, Rostom Omran c

a Faculty of Engineering, Computer Science Department, Al-Azhar University, Nasr City, Cairo, Egypt

b Air Force R & D Center, Cairo, Egypt

c Egypt Air Information Technology Sector, Cairo, Egypt

Received 3 March 1998; received in revised form 28 August 1998; accepted 28 October 1998

Abstract

An efficient pattern recognition system based on soft computing concepts has been developed. A new reliable genetic stereo vision algorithm is used to estimate the depth of objects without using any point-to-point correspondence. Instead, correspondence of the contours as a whole is required. Invariant breakpoints are located on a shape contour using the colinearity principle. Thus, a localized representation of a shape contour including 3-D moments as well as a chain code can be obtained. This representation is invariant to rotation, translation, scale, and starting point. The system is provided with a neural network classifier and a dynamic alignment procedure at its output. Combining the robustness of the neural network classifier with the capability of the genetic algorithm results in a reliable pattern recognition system which can tolerate high levels of noise and occlusion. The performance of the system has been demonstrated using five different types of aircraft and the experimental results are reported. © 2000 Elsevier Science Inc. All rights reserved.

1. Introduction

Soft computing is the branch of artificial intelligence that incorporates neural networks, fuzzy logic and genetic algorithms (Zadeh, 1994). Traditionally, many researchers have used various methods to recover original objects from the 2-D features embedded in the object image (Bergevin and Levine, 1993; Cohen and Wang, 1994; Wang and Cohen, 1994). However, these methods, in their classical form, fail to reconstruct the original object from its occluded shapes. Therefore, two approaches have emerged to tackle the problem of occluded shapes. The first approach depends upon providing similarity measures for modification and enhancement of the segmentation phase of the classical (2-D) recognition methods. The second one relies on using volumetric (3-D) features to attain accurate classification and recognition. In addition, Hof and Ahuja (1989) have developed a method for reconstructing surfaces from stereo algorithms by integrating feature matching, disparity estimation and contour detection.

This work is a combination of both approaches since it uses (3-D) features and utilizes the similarity measures to adapt the system for recognizing partially occluded shapes. The main achievements of the work are:
1. Finding a reliable set of invariant (3-D) features.
2. Deriving a novel genetic aggregate stereo algorithm (GNSA) for depth estimation.

In a typical aggregate stereo algorithm (AGSA) (Aloimonos and Shulman, 1992), shapes are divided into three separate parts in order to get three equations that can be solved to obtain the three parameters of the world contour. This method suffers from the following drawbacks:
· It has no solution when the centers of mass of the three regions of the shape are collinear.
· It can tolerate neither noise nor occlusion.
· It is actually overdeterministic.

Therefore we have exploited genetic algorithms (Goldberg, 1989) to develop a new AGSA for estimating the depth of the shape outer contour.

On the basis of both the set of invariant features and the GNSA, a recognition system has been developed. The proposed system is composed of four modules as shown in Fig. 1. The first module is the edge detector. This module not only detects the outer contour of the shape but also determines its breakpoints. The second module is responsible for estimating the depth map of the outer contour of the shape using a GNSA. The third module is responsible for calculating the features. Once the depth map is available, each segment on a shape contour can be represented by a set of three 3-D features including two 3-D moments as well as an information angle between its two straight lines. Thus, a localized representation of a shape contour including its chain code is obtained. This representation has proven to be invariant to rotation, translation, scaling, and starting point. Accordingly, these features form a reliable set of inputs for the classification stage. The fourth module includes both a multi-layer neural net classifier and a dynamic alignment procedure at its output.

Considering the partial occlusion problem, a natural choice is the use of either spatial temporal neural networks (Tung et al., 1993) or recurrent neural networks, especially the simple recurrent neural network proposed by Ku and Lee (1995). However, such an immediate choice is excluded since these networks, as such, perform classification on the assumption that an object shape is given at the initial state (similar to a sequential machine recognizer). This assumption is inconvenient and makes the recognizer impractical. For that reason, some advances in this area have recently been presented (Gupta and Upadhye, 1991). In this work, an alternative solution is employed in which the classifier consists of a set of discriminators (Lo and Don, 1989). The number of these discriminators is equal to the number of reference shapes to be classified. To compensate for temporal variations resulting from partial occlusion, a multi-neural network classification system with a counter-propagation algorithm is used. In this system each neural network is assigned to discriminate a particular class of objects. The outputs of all these discriminators are then input to the dynamic alignment procedure. The function of this procedure is to optimally align contour segments so that an accurate measure of shape similarity is obtained (Sadjadi and Hall, 1980).

The Journal of Systems and Software 51 (2000) 73–83, www.elsevier.com/locate/jss

* Corresponding author. E-mail address: [email protected] (M. Zaki).

0164-1212/00/$ - see front matter © 2000 Elsevier Science Inc. All rights reserved. PII: S0164-1212(99)00111-9

The rest of this paper is organized as follows. Section 2 deals with the set of underlying 3-D features. Section 3 describes the structure of the proposed system while Section 4 discusses the related work. Sections 5 and 6 present the obtained results and the salient conclusions.

2. The set of (3-D) features

We started with locating breakpoints on a shape contour using an invariant breakpoints detection algorithm (Gupta and Malakapali, 1990). Accordingly, a shape contour can be partitioned into a sequence of contour segments, each containing three consecutive breakpoints. As mentioned before, we have been able to estimate the depth of these breakpoints. Consequently, each segment on a shape contour can be represented by three 3-D features including two 3-D moments as well as an information angle between its two straight lines.

2.1. 3-D moments

3-D moments of order $(r + s + t)$ of a density function $f(x, y, z)$ are defined as

$$M_{rst} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x^{r} y^{s} z^{t} f(x, y, z)\, dx\, dy\, dz. \tag{1}$$

It is assumed that $f(x, y, z)$ is piecewise continuous and bounded in a finite range in space. Based on this assumption, it can be proved that the sequence $M_{rst}$ uniquely determines $f(x, y, z)$. The centroid of the density function can be determined from the zeroth- and first-order moments as follows:

$$x_0 = M_{100}/M_{000}, \qquad y_0 = M_{010}/M_{000}, \qquad z_0 = M_{001}/M_{000}.$$

Moments with their origin at the centroid of the density function are called central moments and are defined as follows:

$$\mu_{rst} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x - x_0)^{r} (y - y_0)^{s} (z - z_0)^{t} f(x, y, z)\, dx\, dy\, dz. \tag{2}$$

The second-order 3-D moments derived in Lo and Don (1989) are

$$J_1 = \mu_{200} + \mu_{020} + \mu_{002}, \tag{3}$$

$$J_2 = \mu_{200}\mu_{020} + \mu_{200}\mu_{002} + \mu_{020}\mu_{002} - \mu_{101}^{2} - \mu_{110}^{2} - \mu_{011}^{2}, \tag{4}$$

$$J_3 = \mu_{200}\mu_{020}\mu_{002} - \mu_{002}\mu_{110}^{2} + 2\mu_{110}\mu_{101}\mu_{011} - \mu_{020}\mu_{101}^{2} - \mu_{200}\mu_{011}^{2}. \tag{5}$$

In this work, we used two 3-D moment invariants defined as follows:

$$F_1 = J_1^{2}/J_2, \tag{6}$$

$$F_2 = J_3/J_1^{3}. \tag{7}$$

The two features ($F_1$, $F_2$) have been proven to be invariant to rotation, translation, and scale (Sadjadi and Hall, 1980). This property is highly significant in compressing the data required in world pattern recognition.
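As a rough illustration of Eqs. (2)-(7), the sketch below evaluates the central moments of a finite set of equally weighted 3-D points, replacing the integrals by sums. The discrete, unit-density setting and the function names `central_moment` and `moment_invariants` are assumptions made for this example and are not taken from the paper.

```python
import math

def central_moment(points, r, s, t):
    """Discrete analogue of Eq. (2) for equally weighted 3-D points (assumed setting)."""
    n = len(points)
    x0 = sum(p[0] for p in points) / n
    y0 = sum(p[1] for p in points) / n
    z0 = sum(p[2] for p in points) / n
    return sum((x - x0) ** r * (y - y0) ** s * (z - z0) ** t for x, y, z in points)

def moment_invariants(points):
    """Return (F1, F2) of Eqs. (6) and (7) from the second-order central moments."""
    m200 = central_moment(points, 2, 0, 0)
    m020 = central_moment(points, 0, 2, 0)
    m002 = central_moment(points, 0, 0, 2)
    m110 = central_moment(points, 1, 1, 0)
    m101 = central_moment(points, 1, 0, 1)
    m011 = central_moment(points, 0, 1, 1)
    j1 = m200 + m020 + m002                                  # Eq. (3)
    j2 = (m200 * m020 + m200 * m002 + m020 * m002
          - m101 ** 2 - m110 ** 2 - m011 ** 2)               # Eq. (4)
    j3 = (m200 * m020 * m002 - m002 * m110 ** 2
          + 2 * m110 * m101 * m011
          - m020 * m101 ** 2 - m200 * m011 ** 2)             # Eq. (5)
    return j1 ** 2 / j2, j3 / j1 ** 3

# Usage: the invariants should not change under rotation, scaling and translation.
pts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0), (1.0, 1.0, 1.0)]
c, s = math.cos(0.7), math.sin(0.7)
moved = [(2.5 * (c * x - s * y) + 5.0, 2.5 * (s * x + c * y) - 3.0, 2.5 * z + 2.0)
         for x, y, z in pts]
print(moment_invariants(pts), moment_invariants(moved))
```

In this discrete form $J_1$, $J_2$ and $J_3$ are the trace, the sum of principal 2×2 minors and the determinant of the second-order central moment matrix, which is why a rotation leaves them unchanged and the ratios $F_1$ and $F_2$ cancel the scale factor.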

Fig. 1. A block diagram of the proposed system.


2.2. Chain code representation

To enhance the performance of the system, another feature ($F_3$) representing the outer contour chain code is also included. This feature is calculated as follows. Let the angle between the two straight lines of a segment be $\theta$. The tangent of this angle can be expressed in terms of the slopes $S_1$, $S_2$ of the two straight lines as

$$\tan\theta = (S_1 - S_2)/(1 + S_1 S_2), \tag{8}$$

$$F_3 = \begin{cases} 1 & \text{if } \tan\theta < 0, \\ 0 & \text{otherwise.} \end{cases} \tag{9}$$

The new feature ($F_3$) is also invariant to rotation, translation, and scale. Thus, we have added $F_3$ to ($F_1$, $F_2$) to form a set of three 3-D invariant features which are directly related to the requirements of shape recognition since they carry the most important information about the spatial distribution of a shape.
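A minimal sketch of Eqs. (8) and (9) follows, assuming a segment is given by three consecutive breakpoints in image coordinates. To avoid infinite slopes for vertical lines, the ratio in Eq. (8) is rewritten with direction vectors, which is algebraically the same expression; the function name and the treatment of perpendicular lines (where $1 + S_1 S_2 = 0$) as the "otherwise" case are assumptions of this example.

```python
def chain_code_feature(p0, p1, p2):
    """F3 of Eq. (9) for the two straight lines p0-p1 and p1-p2 (hypothetical helper)."""
    dx1, dy1 = p1[0] - p0[0], p1[1] - p0[1]
    dx2, dy2 = p2[0] - p1[0], p2[1] - p1[1]
    # (S1 - S2) / (1 + S1*S2) with S = dy/dx, multiplied through by dx1*dx2.
    numerator = dy1 * dx2 - dy2 * dx1
    denominator = dx1 * dx2 + dy1 * dy2
    if denominator == 0:
        return 0  # tan(theta) undefined; treated as "otherwise" by assumption
    return 1 if numerator / denominator < 0 else 0

# Usage: slopes 0 then 1 give tan(theta) = -1; slopes 0 then -1 give tan(theta) = 1.
print(chain_code_feature((0, 0), (1, 0), (2, 1)))
print(chain_code_feature((0, 0), (1, 0), (2, -1)))
```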

3. The system structure

The proposed system is composed of three stages: an early processing stage, a feature extraction stage and a classification stage. The main function of the first stage is to estimate the depth of the breakpoints of the shape boundary. The main function of the second stage is to calculate an invariant set of (3-D) features required as an input to the third stage. The main function of the third stage is to determine a similarity measure between test and reference shapes and then take a decision. A block diagram of the proposed system is shown in Fig. 1. Brief explanations of the system modules are presented next.

3.1. Edges and breakpoints detection

We have decided to make use of feature-based methods in handling the partial occlusion problem. At first, the shape outer contour is obtained. In order to extract local features from a boundary representation, the boundary must be partitioned into distinct segments. Points connecting adjacent segments are called `breakpoints'. The most important information lies in the location of the breakpoints on the boundary. Therefore, the location of the breakpoints must not change for those parts of the shape that are not occluded. In addition, these breakpoints must be invariant to basic shape transformation and noise. Breakpoints satisfying these requirements are called `invariant breakpoints'.

An algorithm based on the colinearity test is used to identify invariant breakpoints on the boundary of a shape. A set of points are said to be collinear if they lie on a straight line. To check the colinearity of a sequence of boundary pixels $(v_a, y_a), (v_{a+1}, y_{a+1}), \ldots, (v_b, y_b)$, we proceed as follows. The length of the line joining the two extreme points $(v_a, y_a)$ and $(v_b, y_b)$ is calculated using the relation

$$L_d = \left[(y_a - y_b)^2 + (v_a - v_b)^2\right]^{1/2}. \tag{10}$$

The length of the arc along the pixel sequence is calculated using the relation

$$L_a = \sum_{i=a+1}^{b} \left\{(v_i - v_{i-1})^2 + (y_i - y_{i-1})^2\right\}^{1/2}. \tag{11}$$

Colinearity is checked by comparing the arc length $L_a$ to the length of the line $L_d$. If the ratio $\phi = L_a/L_d$ is large, the points cannot be collinear and the point $(i^*, j^*)$ furthest from the line joining the extreme points is marked as an intermediate breakpoint.

As shown in Fig. 2, the algorithm begins by checking the colinearity of any three consecutive boundary pixels. If the three points are collinear, the fourth consecutive pixel is added to the list and the sequence of four pixels is tested for colinearity. This process of extending the end point of the sequence of points is repeated until the colinearity test fails, indicating that there is a breakpoint in the sequence of points. The point farthest from the line is located and marked as a breakpoint. The colinearity test is restarted at the breakpoint to determine the next breakpoint in the sequence of boundary pixels. The procedure checks all points in the sequence and goes beyond the start point up to the second breakpoint to check whether the arbitrary start point is a breakpoint or not and also to check the validity of the first breakpoint.

Initially, the threshold $\phi$ is set to one so that all boundary points which do not lie on straight lines are identified as breakpoints. Then, the threshold is incremented fractionally and the colinearity test is applied to the set of breakpoints obtained. Breakpoints that now satisfy the colinearity condition are deleted. The deleted breakpoints fall on small angular changes and thus correspond to `false breakpoints'. By iteratively operating on the boundaries formed by intermediate breakpoints, all false breakpoints are eliminated. The algorithm terminates when small consecutive increments in the threshold do not affect the breakpoints. The number of breakpoints at termination is controlled by the shape of the boundary. Hence, different shapes may have a different number of final breakpoints.

Fig. 2. Breakpoint detection flowchart.
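The first pass of the colinearity test (growing a pixel run until the ratio of Eq. (11) to Eq. (10) exceeds the threshold, then restarting at the farthest point) can be sketched as follows. This is a simplified illustration under stated assumptions: the contour is treated as an open polyline, the wrap-around check of the start point and the iterative threshold increments are omitted, and the function names are hypothetical.

```python
import math

def arc_length(pts, a, b):
    """L_a of Eq. (11): summed distance between consecutive pixels a..b."""
    return sum(math.dist(pts[i - 1], pts[i]) for i in range(a + 1, b + 1))

def farthest_from_chord(pts, a, b):
    """Index strictly between a and b that lies farthest from the line pts[a]-pts[b]."""
    (xa, ya), (xb, yb) = pts[a], pts[b]
    def offset(i):
        x, y = pts[i]
        # Cross product magnitude; proportional to the distance from the chord.
        return abs((xb - xa) * (y - ya) - (yb - ya) * (x - xa))
    return max(range(a + 1, b), key=offset)

def find_breakpoints(pts, phi):
    """Single pass of the colinearity test with threshold phi on L_a / L_d."""
    breakpoints = []
    a, b = 0, 2                      # start with three consecutive pixels
    while b < len(pts):
        chord = math.dist(pts[a], pts[b])            # L_d of Eq. (10)
        ratio = arc_length(pts, a, b) / chord if chord > 0 else math.inf
        if ratio > phi:
            a = farthest_from_chord(pts, a, b)       # mark breakpoint, restart there
            breakpoints.append(a)
            b = a + 2
        else:
            b += 1                                   # extend the run by one pixel
    return breakpoints

# Usage: an L-shaped run of pixels has a single corner at index 3.
corner = [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2), (3, 3)]
print(find_breakpoints(corner, phi=1.05))
```

A threshold slightly above one is used in the usage line so that floating-point rounding on straight runs does not trigger spurious breakpoints.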

3.2. Depth estimation

To calculate the set of (3-D) features of contour segments, the depth should be calculated at the breakpoints. Traditionally, stereo algorithms have been developed to determine the depth from a stereo pair of images. The usual approach is to first identify corresponding features between the two images and estimate depths, then interpolate to obtain a complete depth map. Next, we present the newly developed GNSA which is used for the same purpose.

3.2.1. Aggregate stereo algorithm (AGSA)

A shape contour has been considered to be discrete, i.e. a collection of points. So

$$C_l = \{(x_{li}, y_{li}) \mid i = 1, 2, \ldots, n\}, \qquad C_r = \{(x_{ri}, y_{ri}) \mid i = 1, 2, \ldots, n\}.$$

Consider a point $(X_i, Y_i, Z_i)$ on the world plane and its projections $(x_{li}, y_{li})$, $(x_{ri}, y_{ri})$ on the left and right image frames, respectively. Then

$$y_{li} = y_{ri}, \tag{12}$$

$$Z_i = \frac{f\, b}{x_{li} - x_{ri}}, \tag{13}$$

where $f$ is the focal length of the lenses and $b$ is the baseline between the two lenses.
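For reference, the depth of a single matched point from its disparity, and the depth of a point on a planar world contour with parameters $(p, q, c)$, can be evaluated as in the short sketch below. The planar form is the one used later in the text, $1/Z_i = (f - p x_{li} - q y_{li})/(c f)$; the function names and the numbers in the usage lines are illustrative assumptions.

```python
def disparity_depth(f, b, x_left, x_right):
    """Depth from disparity: Z = f * b / (x_left - x_right)."""
    return f * b / (x_left - x_right)

def plane_depth(p, q, c, f, x_left, y_left):
    """Depth on a planar contour: 1/Z = (f - p*x - q*y) / (c*f)."""
    return c * f / (f - p * x_left - q * y_left)

# Usage with made-up values: focal length 2, baseline 0.5, disparity 2.
print(disparity_depth(2.0, 0.5, 3.0, 1.0))
# A fronto-parallel plane (p = q = 0) has constant depth c.
print(plane_depth(0.0, 0.0, 4.0, 2.0, 1.0, 1.0))
```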

We proceed with the following propositions:
1. Under the established configuration, the value of $\sum_{i=1}^{n} y_{li}^{k}/Z_i$ is directly computable (no correspondence is required).
2. Using the previous proposition, the parameters $p$, $q$, and $c$ of the world contour can be directly computed without any point-to-point correspondence between the two frames.

The second proposition can be proved as follows. The equation of the world planar contour, when expressed in terms of coordinates of the left frame, is

$$1/Z_i = (f - p\,x_{li} - q\,y_{li})/(c f), \qquad i = 1, 2, \ldots, n. \tag{14}$$

Multiplying both sides of Eq. (14) by $y_{li}^{k}$ and taking the sum, we get

$$\sum_{i=1}^{n} y_{li}^{k}/Z_i = (1/c)\sum_{i=1}^{n} y_{li}^{k} - (1/cf)\left\{\sum_{i=1}^{n} p\,x_{li}\,y_{li}^{k} + \sum_{i=1}^{n} q\,y_{li}\,y_{li}^{k}\right\}. \tag{15}$$

Substituting (13) into (15), we get the following linear system:

$$(1/fb)\left[\sum_{i=1}^{n} x_{li}\,y_{li}^{k} - \sum_{i=1}^{n} x_{ri}\,y_{li}^{k}\right] = (1/c)\sum_{i=1}^{n} y_{li}^{k} - (1/cf)\left\{\sum_{i=1}^{n} p\,x_{li}\,y_{li}^{k} + \sum_{i=1}^{n} q\,y_{li}\,y_{li}^{k}\right\}. \tag{16}$$

For three values of $k$, we get three equations in three unknowns $(p, q, c)$ which may be expressed briefly as follows:

$$a_{11}p + a_{12}q + a_{13}c = b_1, \tag{17}$$

$$a_{21}p + a_{22}q + a_{23}c = b_2, \tag{18}$$

$$a_{31}p + a_{32}q + a_{33}c = b_3. \tag{19}$$

The solution of the above system recovers the structure and depth of the world planar contour without any point-to-point correspondence. Practically, it has been shown that the exponent $k$ should be zero in order to get a robust linear system. In this case, the three equations obtained from (16) degenerate to just one equation as follows:

$$p\sum_{i=1}^{n} x_{li} + q\sum_{i=1}^{n} y_{li} + c\,(1/b)\left\{\sum_{i=1}^{n} x_{li} - \sum_{i=1}^{n} x_{ri}\right\} = f. \tag{20}$$

From the previous analysis, it is clear that we have considered the number of points in the contours on the left and right images to be the same. Practically, this may not be true. To account for this we divide the sums in Eq. (20) by the total number of points in the left and right images, $N_l$ and $N_r$, respectively. Then we get the following modified form:

$$(p/N_l)\sum_{i=1}^{n} x_{li} + (q/N_r)\sum_{i=1}^{n} y_{li} + (c/bN_l)\sum_{i=1}^{n} x_{li} - (c/bN_r)\sum_{i=1}^{n} x_{ri} = f/N_l. \tag{21}$$

This equation can be expressed simply as

$$a_1 p + a_2 q + a_3 c = b, \tag{22}$$

where

$$a_1 = (1/N_l)\sum_{i=1}^{n} x_{li}, \tag{23}$$

$$a_2 = (1/N_r)\sum_{i=1}^{n} y_{li}, \tag{24}$$

$$a_3 = (1/bN_l)\sum_{i=1}^{n} x_{li} - (1/bN_r)\sum_{i=1}^{n} x_{ri}, \tag{25}$$

$$b = f/N_l. \tag{26}$$

If matrix inversion is used as a tool to get the solution of the problem, there is a need to divide the contours into three separate parts. Practically, the obtained solution may differ according to the way of division, as shown in Table 1, which lists three different depth values for the same points on a vase outer contour using AGSA with the three different ways of division shown in Fig. 3. Moreover, no solution is given if the centers of mass of the three different areas are collinear or when the determinant of the matrix is equal to zero. Additionally, the error is large for high noise and occlusion levels.

In order to overcome the previously mentioned drawbacks, we have developed a GNSA which is used to get a solution of Eq. (22). Thus, the parameters of the world contour $(p, q, c)$ are obtained and depth can be estimated. In this case, there is no need to divide the image contours into three regions. Instead, the whole image or a part of it is sufficient to get the solution. Moreover, the system can tolerate high noise and occlusion levels. In the next part, we briefly present the new algorithm used for depth estimation.

Table 1
Depth obtained by AGSA using three ways of division

P.N.   Xleft   Yleft   Z1     Z2     Z3
1      2       2       0.78   1.00   1.00
2      4       5       0.37   0.28   0.56
3      11      5       0.33   0.25   0.50
4      14      1       1.00   0.89   0.69
5      15      11      0.15   0.09   0.25
6      11      7       0.26   0.17   0.38
7      4       7       0.26   0.19   0.50
8      2       9       0.22   0.15   0.44

Fig. 3. Three different ways of dividing a vase contour.

3.2.2. Genetic aggregate stereo algorithm (GNSA)

Generally, genetic algorithms (GA) are search algorithms used for the solution of optimization problems. The mechanics of a GNSA are very simple, involving nothing more complex than solving Eq. (22) by means of a GA. Thus, the parameters $(p, q, c)$ are obtained and depth can be estimated. The simplicity of operation and power of effect are two of the main attractions of this approach. For the solution of optimization problems by means of GAs, five components are required:

1. Encoding. This is a way of encoding the decision variables of the problem in a string of binary digits called a chromosome. If there are $m$ decision variables in an optimization problem and each variable is encoded as an $n$-digit binary number, then a chromosome is a string of $(m \times n)$ binary digits.

2. Initialization of population. A method of initializing the population of chromosomes is needed. Generally, the initial population is initialized at random.

3. Objective function. This function is used to evaluate the given decision variables and return a value. This value is used to determine whether a chromosome may be selected for reproduction or not.

4. A set of operators. A GA is composed of three operators: reproduction, crossover, and mutation. GAs use parent selection techniques that mimic the process of natural selection for selecting chromosomes to create a new generation, where the fittest members reproduce most often. After parent selection, crossover is applied to combine two chromosomes and generate two new ones as follows. An integer position $k$ is randomly generated, where $k$ lies between 1 and the string length minus 1, i.e. in $[1, (l - 1)]$. The two new strings are created by swapping all characters between positions $(k + 1)$ and $l$ inclusive. Reproduction and crossover give GAs the bulk of their processing power. However, mutation is needed because reproduction and crossover may lose some potentially useful genetic material (1s or 0s at particular locations). The mutation operator simply changes one bit from 0 to 1 or vice versa.

5. Working parameters. A set of parameters is predefined to guide the GA, such as the length of each decision variable encoded as a binary string, the number of chromosomes to be generated in each generation, the crossover rate, the mutation rate, and the stopping criterion.

With these components established, the proposed algorithm can be summarized as follows.

Step 1. Calculate the coefficients $a_1$, $a_2$, $a_3$, $b$ using Eqs. (23), (24), (25) and (26), respectively, to get a linear equation in the three unknowns $(p, q, c)$, which is represented by (22) (El-Ramsisi et al., 1998).

Step 2. Use a GA to get a solution for the parameters $p$, $q$, $c$ as follows:
1. Encode the decision variables as a chromosome.
2. Initialize a population of chromosomes randomly as the current generation.
3. Perform the following until a stopping criterion is met:
· Evaluate the objective function values $d$ of the current generation using the formula $d = (a_1 p + a_2 q + a_3 c - b)^2$.
· Sort the chromosomes of the current generation in ascending order of the objective function.
· Using the reproduction operator, select chromosomes from the first half of the population as parent chromosomes so as to produce new children chromosomes.
· Apply the crossover and mutation operators to the parent chromosomes selected previously.
· Replace the second half of the current generation (bad chromosomes) with the new children in order to create a new generation of the population.
· Go back to (3).

Step 3. If the minimum value of $d$ is less than 0.001, choose the chromosome of minimum $d$ as the one that gives the required solution. Get the values $p$, $q$, $c$ from this chromosome. Then, estimate a depth map of the shape contour in terms of the coordinates of the left frame using Eq. (14).

Step 4. If $d$ is not small enough, go back to step 1. Get new values of $a_1$, $a_2$, $a_3$, $b$ by considering some part of the image, not the whole image. Then, repeat the same procedure so as to get a more accurate depth map.
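Steps 2 and 3 can be sketched as a small binary-coded GA. This is only an illustrative sketch, not the authors' implementation: the 16-bit encoding, the search range $[-10, 10]$ for each parameter, the population size, the mutation rate and the coefficient values in the usage line are all assumptions, and the coefficients $a_1$, $a_2$, $a_3$, $b$ are taken as already computed in Step 1.

```python
import random

BITS = 16                  # bits per decision variable (assumed)
LOW, HIGH = -10.0, 10.0    # assumed search range for p, q and c

def decode(chrom):
    """Map a chromosome of 3*BITS binary digits to real values (p, q, c)."""
    values = []
    for i in range(3):
        bits = chrom[i * BITS:(i + 1) * BITS]
        n = int("".join(map(str, bits)), 2)
        values.append(LOW + (HIGH - LOW) * n / (2 ** BITS - 1))
    return tuple(values)

def objective(chrom, a1, a2, a3, b):
    """d = (a1*p + a2*q + a3*c - b)^2, as in Step 2."""
    p, q, c = decode(chrom)
    return (a1 * p + a2 * q + a3 * c - b) ** 2

def crossover(mum, dad, rng):
    """Single-point crossover: swap positions k+1..l for a random k in [1, l-1]."""
    k = rng.randint(1, len(mum) - 1)
    return mum[:k] + dad[k:], dad[:k] + mum[k:]

def mutate(chrom, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [bit ^ 1 if rng.random() < rate else bit for bit in chrom]

def solve(a1, a2, a3, b, pop_size=40, generations=200, rate=0.01, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(3 * BITS)] for _ in range(pop_size)]
    history = []                                   # best d of each generation
    for _ in range(generations):
        pop.sort(key=lambda ch: objective(ch, a1, a2, a3, b))
        history.append(objective(pop[0], a1, a2, a3, b))
        if history[-1] < 1e-3:                     # stopping criterion of Step 3
            break
        parents = pop[:pop_size // 2]              # better half survives unchanged
        children = []
        while len(children) < pop_size - len(parents):
            mum, dad = rng.sample(parents, 2)
            for child in crossover(mum, dad, rng):
                children.append(mutate(child, rate, rng))
        pop = parents + children[:pop_size - len(parents)]
    best = min(pop, key=lambda ch: objective(ch, a1, a2, a3, b))
    return decode(best), history

# Usage with made-up coefficients.
(p, q, c), history = solve(a1=1.5, a2=-0.7, a3=0.3, b=2.0)
print(p, q, c, history[-1])
```

Because the better half of each generation is carried over unchanged, the best objective value never increases from one generation to the next. Note that a single linear equation in three unknowns has many solutions, so the GA returns one parameter triple that satisfies Eq. (22) within the tolerance rather than a unique answer.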

3.3. The neural network classi®er

It is important to note that a feature vector obtained from the localized approach consists of a sequence of local features in a fixed order. Determining pattern similarity involves not only determining the presence of local features but also maintaining the inherent sequentiality of the local features in patterns. Due to these requirements, the classification of patterns which experience dimension and ordering changes is a major problem. In order to adapt to such temporal variations, a multi-neural network system which includes a non-linear alignment procedure at the outputs is proposed (Sperduti and Starita, 1997). As shown in Fig. 4, the neural network classifier consists of five discriminators, one discriminator for each class. Each discriminator consists of a counter-propagation neural network (CPNN) and a non-linear alignment procedure at its outputs as shown in Fig. 5.

CPNN is a combination of two other neural network models: the Kohonen neural network (KNN) and the Grossberg neural network (GNN). Thus, the study of CPNN highlights the advantages of these two models (Lippman, 1987). KNN demonstrates the generalization and look-up table capabilities of the self-organizing feature maps (Gorman et al., 1988) while GNN demonstrates the ability to act as a minimal pattern encoder. Moreover, CPNN has other capabilities beyond either of these. CPNN forms a good statistical model of its input vector environment. Compared to a back-propagation neural network (BPNN), it can reduce the training time by one hundredfold. It provides a good solution for those applications that cannot tolerate long training sessions. It is also useful for rapid prototyping of systems where a quick approximation is important. The generalization capability of CPNN allows it to handle incomplete or incorrect data well. Therefore, CPNN is very useful for pattern recognition, pattern completion, and signal enhancement applications.

On the other hand, the main drawback of CPNN is the need to have a hidden layer neuron for each output pattern to be learned. In other words, the capacity is only $K$ patterns, where $K$ is the number of neurons in the Kohonen layer. However, the capacity problem exists in most neural network models.
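To make the Kohonen/Grossberg combination concrete, here is a deliberately small sketch of a counter-propagation network: a winner-take-all Kohonen layer followed by a Grossberg layer that stores one output vector per Kohonen neuron. It follows the textbook form of the model rather than the network used in the paper; the learning rates, the initial weights and all function names are assumptions.

```python
def nearest(kohonen, x):
    """Index of the Kohonen neuron whose weight vector is closest to input x."""
    return min(range(len(kohonen)),
               key=lambda k: sum((w - xi) ** 2 for w, xi in zip(kohonen[k], x)))

def train_cpnn(samples, kohonen, grossberg, alpha=0.5, beta=0.5, epochs=20):
    """Move the winning neuron toward the input and its output weights toward the target."""
    for _ in range(epochs):
        for x, target in samples:
            k = nearest(kohonen, x)
            kohonen[k] = [w + alpha * (xi - w) for w, xi in zip(kohonen[k], x)]
            grossberg[k] = [g + beta * (t - g) for g, t in zip(grossberg[k], target)]

def predict(kohonen, grossberg, x):
    """Look up the stored output of the winning Kohonen neuron."""
    return grossberg[nearest(kohonen, x)]

# Usage: two input clusters, two Kohonen neurons (capacity K = 2 patterns).
samples = [([0.0, 0.0], [1.0, 0.0]), ([1.0, 1.0], [0.0, 1.0])]
kohonen = [[0.1, 0.2], [0.9, 0.8]]
grossberg = [[0.0, 0.0], [0.0, 0.0]]
train_cpnn(samples, kohonen, grossberg)
print([round(v, 3) for v in predict(kohonen, grossberg, [0.05, 0.1])])
```

The look-up in `predict` also shows the capacity limit mentioned above: the network can return at most as many distinct outputs as it has Kohonen neurons.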

3.4. The non-linear alignment procedure

As mentioned before, the neural network classifier outputs are input to a dynamic alignment procedure. The function of this procedure is to optimally align the samples of a test shape with the samples of a reference shape so as to determine the similarity measure between them and then make a decision (Ku and Lee, 1995; Sperduti and Starita, 1997). Let a reference shape $R^c$ be represented by a sequence of segment features $r^c_u$, $u = 1, 2, \ldots, U_c$, and a test shape $T$ be represented by a sequence of segment features $t_v$, $v = 1, 2, \ldots, V$. In order to optimally align $r^c_u$ and $t_v$, an alignment function $W_{u,v}$ which gives the best correspondence $(S^c_{R,T})$ between the shapes $R^c$ and $T$ has to be determined.

When comparing two entire contours, it is necessary to examine the similarity between every segment of both contours. This is most conveniently done with a look-up table. Let the columns represent the segments of one contour and the rows the segments of the other; each entry in the table is the similarity between the two segments corresponding to the given row and column. Fig. 6 shows an intersegment similarity table resulting from a comparison of the segments of a reference shape with themselves. The appearance of ones on the diagonal indicates that the segments match with themselves, and the values of the off-diagonal terms illustrate that the segments do not match with each other. Such an approach deals with object distortions regardless of the cause of the distortion, which may be either noise or partial occlusion. Fig. 7 shows an intersegment similarity table resulting from a comparison of an occluded contour to the original contour.

Fig. 4. Structure of a multi-layer neural network classifier.

Fig. 5. Structure of discriminator C.

The following constraints are assumed for the purpose of the dynamic alignment of the samples:

1. End-point constraints:

$$u(1) = 1 \;\&\; 1 \le v(1) \le V;$$

$$u(K_{u,v}) = U_c \;\&\; V \le v(K_{u,v}) \le (2V - 1).$$

2. Monotonic constraints:

$$u(k - 1) \le u(k); \qquad v(k - 1) \le v(k).$$

3. Continuity constraints:

$$u(k) - u(k - 1) \le 1; \qquad v(k) - v(k - 1) \le 1.$$

The end-point constraints require the first and last samples of $t_v$ to be aligned with the first and last samples of $r^c_u$. The starting point of the test shape is assumed to be unknown. Thus, by extending the test contour by $(V - 1)$ segments, each one of the first $V$ segments of the extended sequence of contour segments is a possible start segment. Here, the end-point constraints define windows for the start and end points. The monotonic and continuity constraints define the relationship between consecutive points $W_{u,v}(k - 1)$ and $W_{u,v}(k)$. Additional constraints such as a tolerance band can be imposed to restrict the search for the alignment function in a meaningful manner and also to reduce the computational requirements. Fig. 8 shows the similarity table resulting from a comparison of a reference shape versus a rotated and occluded shape of the same class.

Under the previously mentioned constraints, a similarity measure $S^c_{R,T}$ between the test shape and the reference shapes is computed for $c = 1, 2, \ldots, C$. The test shape $T$ is assigned to the class $c^*$ determined by

$$c^* = \arg\max_{c}\,\{S^c_{R,T}\}, \qquad c = 1, 2, \ldots, C. \tag{27}$$
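The alignment under the three constraints can be sketched as a dynamic program over the intersegment similarity table. The text does not spell out how the similarity along a path is accumulated, so the scoring used here (total similarity along the path, with shorter paths preferred on ties, then divided by the path length) is an assumption of this example, as are the function names. Indices are zero-based, so the start window is columns 0 to V-1 of row 0 and the end window is columns V-1 to 2V-2 of the last row.

```python
def align(sim, V):
    """Best normalized similarity for a table sim[u][v] with U rows and 2V-1 columns."""
    U, W = len(sim), 2 * V - 1
    NONE = (float("-inf"), 0)
    # best[u][v] = (total similarity, -path length) of the best path ending at (u, v)
    best = [[NONE] * W for _ in range(U)]
    for u in range(U):
        for v in range(W):
            candidates = []
            if u == 0 and v < V:
                candidates.append((0.0, 0))          # start window (end-point constraint)
            for du, dv in ((1, 0), (0, 1), (1, 1)):  # monotonic, continuous steps
                pu, pv = u - du, v - dv
                if pu >= 0 and pv >= 0 and best[pu][pv] != NONE:
                    candidates.append(best[pu][pv])
            if candidates:
                total, neg_len = max(candidates)
                best[u][v] = (total + sim[u][v], neg_len - 1)
    ends = [best[U - 1][v] for v in range(V - 1, W) if best[U - 1][v] != NONE]
    return max(total / -neg_len for total, neg_len in ends)

def classify(tables, V):
    """Eq. (27): index of the reference class with the highest similarity."""
    scores = [align(sim, V) for sim in tables]
    return scores.index(max(scores))

# Usage: a 3-segment reference against itself, against a copy with a shifted
# start segment, and against a shape with no matching segments.
same = [[1.0 if v % 3 == u else 0.0 for v in range(5)] for u in range(3)]
shifted = [[1.0 if v % 3 == (u + 1) % 3 else 0.0 for v in range(5)] for u in range(3)]
nothing = [[0.0] * 5 for _ in range(3)]
print(align(same, 3), align(shifted, 3), align(nothing, 3))
```

Extending the columns to $2V - 1$ is what lets the shifted copy score as well as the unshifted one: the best path simply starts at a later column inside the start window.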

4. Related work

Computational vision has been an interesting topicfor many researchers. Lalit Gupta and Krishna

Fig. 6. Intersegment similarity table: a reference shape versus itself ($S^c_{R,T} = 1$).

Fig. 7. Intersegment similarity table: a reference shape versus an occluded shape ($S^c_{R,T} = 0.8$).

Fig. 8. Intersegment similarity table: a reference shape versus a rotated occluded shape ($S^c_{R,T} = 0.7$).

M. Zaki et al. / The Journal of Systems and Software 51 (2000) 73–83

Malakapali have developed a system for classifying partially occluded noisy shapes in varying positions, orientations, and dimensions (Ku and Lee, 1995). In their approach, each contour segment is described by an ordered sequence that represents the Euclidean distance between the pixels of the contour segment and the centroid of the region formed by connecting the end points of the contour segment by a straight line (Sperduti and Starita, 1997).

The need for a solution of the 3-D parameter determination problem has been another interesting point of research. Stereo algorithms are developed in order to determine the depth from a stereo pair of images. Traditionally, finding the corresponding features has been considered to be the most difficult problem. Also, depth and orientation discontinuities may not be explicitly detected, which makes surface interpolation very difficult. Hof and Ahuja (1989) have developed an approach that integrates the processes of feature matching, contour detection and surface interpolation.

The need for solving the correspondence problem in stereo algorithms has been appreciated by Knatani, and has led to several methods. The most important of them is the AGSA (Aloimonos and Shulman, 1992). In this approach, depth is estimated without using point-to-point correspondence. Instead, correspondence of the contours as a whole is required. In this case, the only tool used for solving the problem is matrix inversion. This method has no solution when the centers of mass of the three different areas are collinear. Also, there is a need to divide the contours into three separate parts. Most importantly, the obtained solution may differ according to the way of division. In practice, this method cannot tolerate high noise and occlusion levels. To overcome these problems, we have developed a new AGSA based on genetics (Goldberg, 1989), where there is no need to divide the image contours into three separate parts. Instead, the whole image or a part of it is sufficient to obtain the solution. Moreover, it needs nothing more complex than copying and swapping strings. Thus, the system can tolerate high noise and occlusion levels.
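To make "copying and swapping strings" concrete, the sketch below shows a bare-bones genetic search that uses only selection (copying the fitter strings) and one-point crossover (swapping string tails). It is not the authors' stereo algorithm: the bit-string encoding, the toy Hamming-distance fitness, and the names `crossover` and `evolve` are assumptions, and mutation is omitted. Fitness is minimized, following the paper's use of minimum-fitness solutions.

```python
import random

def crossover(a, b, point):
    """Swap the tails of two equal-length strings at the given cut point."""
    return a[:point] + b[point:], b[:point] + a[point:]

def evolve(population, fitness, generations, rng):
    """Minimize `fitness` by copying fit strings and swapping their tails.

    Assumes at least four strings, each at least two characters long.
    """
    pop = list(population)
    for _ in range(generations):
        pop.sort(key=fitness)
        parents = pop[: len(pop) // 2]            # copy the fitter half unchanged
        children = []
        while len(parents) + len(children) < len(pop):
            a, b = rng.sample(parents, 2)
            point = rng.randrange(1, len(a))      # cut strictly inside the string
            children.extend(crossover(a, b, point))
        pop = (parents + children)[: len(pop)]
    return min(pop, key=fitness)

# Toy problem: approach a target bit string (illustrative only).
rng = random.Random(0)
target = "10110"

def fitness(s):
    return sum(x != y for x, y in zip(s, target))

population = ["".join(rng.choice("01") for _ in range(5)) for _ in range(6)]
best = evolve(population, fitness, generations=10, rng=rng)
```

Because the fitter half is always copied into the next generation, the best fitness in the population can never get worse from one generation to the next.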

5. Performance evaluation

In this section the system performance is presented. A key parameter that clearly affects the system behavior is the sensitivity of depth computation. To overcome depth variation, GNSA has been run several times, and the depth values of minimum fitness have been chosen and normalized with respect to the maximum value. Consequently, the set of three (3-D) invariant features (F1, F2, F3) has been calculated and used effectively in the classification process. An illustrative example of a pair of stereo images is shown in Fig. 9. Depth values of the breakpoints on the outer contour of this input are shown in Table 2.

The underlying tests are carried out by considering the classification of aircraft shapes belonging to five different classes. The reference aircraft shapes are shown in Fig. 10. Noise is introduced by shifting contour pixels to one of their eight neighbors. Partial shapes are generated by randomly deleting a part of the contour and replacing it with a straight line. The ratio of the length of the deleted part to the total contour length specifies the occlusion level $q$. Examples of test shapes are illustrated in Fig. 11.
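The two distortions used to generate test shapes can be sketched as follows. The details are assumptions rather than the authors' procedure: the contour is taken to be a closed list of pixel coordinates, contour length is measured in pixel count, the noise `rate` parameter is hypothetical, and the straight line across the gap is left implicit in the closed polyline.

```python
import random

# Offsets to the eight neighbors of a pixel.
NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def add_noise(contour, rate, rng):
    """Shift a fraction `rate` of the contour pixels to one of their eight neighbors."""
    noisy = []
    for x, y in contour:
        if rng.random() < rate:
            dx, dy = rng.choice(NEIGHBORS)
            x, y = x + dx, y + dy
        noisy.append((x, y))
    return noisy

def occlude(contour, q, rng):
    """Delete a contiguous run covering a fraction q of a closed contour.

    The remaining points are returned in order; closing the polyline joins the
    two ends of the gap with a straight line.
    """
    n = len(contour)
    k = int(round(q * n))
    if not 0 <= k < n:
        raise ValueError("occlusion level must leave at least one contour point")
    start = rng.randrange(n)                       # random position of the deleted run
    rotated = contour[start:] + contour[:start]    # deleted run now sits at the tail
    return rotated[: n - k]

# A 20-point square contour (illustrative only).
square = ([(i, 0) for i in range(5)] + [(5, i) for i in range(5)]
          + [(5 - i, 5) for i in range(5)] + [(0, 5 - i) for i in range(5)])
rng = random.Random(0)
partial = occlude(square, 0.4, rng)
noisy = add_noise(square, 1.0, rng)
```

With `q = 0.4` the 20-point square keeps 12 points, which corresponds to the highest occlusion level used in the experiments.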

Classification results are shown in Table 3, where $c_1, c_2, c_3, c_4, c_5$ refer to the five reference classes, and $R_c$ is the rate of correct classification for class $c$, which is defined as

$$R_c = m_c / M_c, \tag{28}$$

where $m_c$ is the number of correctly classified shapes in class $c$ and $M_c$ is the total number of tested shapes of class $c$.
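As a worked instance of Eq. (28), the rates can be recomputed from the counts reported in Table 3. The sketch assumes that each row of the table holds the 20 test shapes of one true class and each column an assigned class, which is consistent with the row sums and the reported $R_c$ values.

```python
# Counts from Table 3: rows D1..D5, columns Mirage, B 57, B Craft, F 15, TU 160.
confusion = [
    [15, 2, 0, 3, 0],
    [0, 16, 2, 1, 1],
    [0, 0, 16, 0, 4],
    [2, 2, 1, 14, 1],
    [1, 0, 3, 1, 15],
]

def correct_rates(matrix):
    """R_c = m_c / M_c: diagonal count divided by the row total, per class."""
    return [row[i] / sum(row) for i, row in enumerate(matrix)]

rates = correct_rates(confusion)
correct = sum(row[i] for i, row in enumerate(confusion))
total = sum(sum(row) for row in confusion)
overall = correct / total
```

The per-class rates reproduce the $R_c$ column of Table 3 (75, 80, 80, 70 and 75%), and 76 of the 100 test shapes are classified correctly overall.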

In this work, we have tested 100 airplanes with occlusion levels up to 40% (20 planes generated from each class). First, we have used the set of (2-D) moments

Fig. 9. Stereo images of a plane in 2-D space.


proposed in Duadani et al. (1977). Then, the system performance is investigated when we replace this set of features by another set of just two elements representing the (3-D) moments mentioned above (F1, F2). Further progress has been achieved by adding the third feature element representing the chain code of the shape contour (F3). The classification error for different occlusion levels in each case is shown in Table 4. As expected, the classification rate ($R_c$) decreases as the occlusion level ($q$) increases. However, the accuracy improvement is at the expense of time cost. In fact, the time needed to extract the (2-D) features is only 17% of that needed to extract the (3-D) features.
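The trend in Table 4 can be checked with a short calculation. The error probabilities below are copied from Table 4. Treating the classification rate as `1 - e` is an assumption made here for illustration.

```python
occlusion = [0, 10, 20, 30, 40]   # occlusion level q in percent
error = {
    "2-D moments": [0.06, 0.10, 0.23, 0.42, 0.66],
    "3-D moments": [0.03, 0.07, 0.10, 0.16, 0.30],
    "proposed": [0.01, 0.03, 0.07, 0.13, 0.24],
}

# Classification rate, assuming rate = 1 - error probability.
rate = {name: [round(1 - e, 2) for e in errs] for name, errs in error.items()}

# Error reduction obtained by replacing the 2-D moments with the 3-D moments.
gap = [round(a - b, 2) for a, b in zip(error["2-D moments"], error["3-D moments"])]
```

The gap grows from 0.03 at `q = 0` to 0.36 at `q = 40`, so the advantage of the (3-D) moments widens as occlusion increases.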

A comparison of these results with those obtained in Gupta and Upadhye (1991) indicates the superior accuracy of the proposed approach. This superiority is due to the use of a reliable set of (3-D) features instead of (2-D) features. Following Aloimonos and Shulman (1992), we have used a macroscopic analysis to find the depth map instead of using a microscopic correspondence to compute the stereo disparity as in Marr and Poggio (1976). Our proposed procedure (presented in Section 3.2.2) is based on using a genetic algorithm to find the best values of the parameters p, q, and c. Again, the use of a genetic algorithm improves the system accuracy at the expense of time cost. In practice, the time needed for GNSA is approximately four times the time required by the conventional AGSA (Aloimonos and Shulman, 1992) to solve the shape equations.

Fig. 10. The five reference shapes in 2-D space.

Fig. 11. Examples of partially occluded test shapes.

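Table 2

Depth of breakpoints on a plane contour

| P.N. | Xleft | Yleft | Z |
|------|-------|-------|------|
| 1 | 7 | 16 | 0.78 |
| 2 | 6 | 15 | 0.80 |
| 3 | 6 | 11 | 0.78 |
| 4 | 1 | 6 | 1 |
| 5 | 1 | 4 | 0.96 |
| 6 | 5 | 7 | 0.83 |
| 7 | 6 | 7 | 0.75 |
| 8 | 6 | 4 | 0.75 |
| 9 | 4 | 2 | 0.74 |
| 10 | 4 | 1 | 0.74 |
| 11 | 7 | 3 | 0.67 |
| 12 | 1 | 1 | 0.60 |
| 13 | 10 | 2 | 0.60 |
| 14 | 8 | 4 | 0.62 |
| 15 | 8 | 7 | 0.64 |
| 16 | 9 | 7 | 0.61 |
| 17 | 13 | 4 | 0.52 |
| 18 | 13 | 6 | 0.52 |
| 19 | 8 | 11 | 0.70 |
| 20 | 8 | 15 | 0.76 |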


6. Conclusion

An efficient pattern recognition system based on both neural networks and genetic algorithms has been developed for the purpose of accurate recognition of occluded shapes. The system depends on a localized representation of a shape contour which is invariant to rotation, translation, and scale. The system can work with either a single input image (2-D features) or a pair of stereo images (3-D features). For that reason, we have developed a new reliable GNSA in order to estimate the depth required for calculating the (3-D) features. Accordingly, each contour segment of a shape can be represented by a set of either (2-D) or (3-D) moments as well as information about the angle between its two straight lines. To compensate for temporal variations resulting from noise and/or occlusion, a multi-neural network classification system has been used. Each network is a counter-propagation neural network with a dynamic alignment procedure at its output. Such a combination benefits from the robustness of the classifier and creates a localized representation of a shape contour which is insensitive to the choice of the starting point.

The developed system has been tested with different parameters. It is shown that when the (3-D) moments are used, the recognition accuracy is considerably improved over that of the (2-D) features. An additional improvement is achieved when the chain code is combined with the (3-D) moments. The difference in accuracy between the recognizer based on the (2-D) features and that based on the (3-D) features increases with the occlusion level. In practice, the proposed pattern recognition system can be used with reasonable accuracy for distorted shapes regardless of the cause of distortion. For the proposed set of airplanes, the level of noise and/or occlusion can be increased up to 40%.

References

Aloimonos, J., Shulman, D., 1992. Integration of Visual Modules: An extension of Marr Paradigm. Academic Press, USA.

Bergevin, R., Levine, M., 1993. Generic object recognition: building and matching coarse description from line drawings. IEEE Trans. Pattern Anal. Mach. Intell. 15 (1), 10–36.

Cohen, J., Wang, F., 1994. Part I: Modeling image curves using invariant 3-D object curve models – a path to 3-D recognition and shape estimation from image contours. IEEE Trans. Pattern Anal. Mach. Intell. 16 (1), 1–12.

Duadani, S., Breeding, K., McGhee, R., 1977. Aircraft identification by moment invariants. IEEE Trans. Comput. 26 (1), 39–46.

El-Ramsisi, A., Zaki, M., Omran, R., 1998. 3-D Classification based on genetics and motion stereo. ICAIA, Cairo 1, 87–98.

Goldberg, D., 1989. Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley, Reading, MA.

Gorman, J., Mitcheli, O., Kuhl, F., 1988. Partial shape recognition using dynamic programming. IEEE Trans. PAMI 10 (2).

Gupta, L., Malakapali, K., 1990. Robust partial shape classification using invariant breakpoints and dynamic alignment. Pattern Recognition 23 (10), 1103–1111.

Gupta, L., Upadhye, A., 1991. Non-linear alignment of neural net outputs for partial shape classification. Pattern Recognition 24 (10), 943–948.

Hof, W., Ahuja, N., 1989. Surfaces from stereo: Integrating feature matching, disparity estimation, and contour detection. IEEE Trans. PAMI 11 (2), 121–136.

Ku, C., Lee, K., 1995. Diagonal recurrent neural network for dynamical system control. IEEE Trans. Neural Networks 6, 144–156.

Lippman, R., 1987. An introduction to computing with neural nets. IEEE Trans. ASSP.

Lo, C., Don, H., 1989. 3-D Moment forms: Their construction and application to object identification and positioning. IEEE Trans. PAMI 11 (10).

Marr, D., Poggio, D., 1976. Cooperative computation of stereo disparity. Science 194, 283–287.

Sadjadi, F., Hall, E., 1980. Three dimensional moment invariants. IEEE Trans. PAMI 2 (2).

Sperduti, A., Starita, A., 1997. Supervised neural networks for the classification of structures. IEEE Trans. Neural Networks 8 (3), 714–735.

Tung, D., Ligomenides, P., Dayhoff, J., 1993. Learning spatiotemporal topology using an adaptive time-delay neural network. World Congress on Neural Networks, USA.

Wang, J., Cohen, F., 1994. Part I: 3-D Recognition and shape estimation from image contours using B-Spline, shape invariant matching, and neural networks. IEEE Trans. Pattern Anal. Mach. Intell. 16, 13–23.

Zadeh, L., 1994. Fuzzy logic neural networks and soft computing. Commun. ACM 37 (3), 77–84.

Mohammed Zaki received his B.Sc. and M.Sc. in Electrical Engineering from Cairo University, Egypt, in 1968 and 1972, respectively. He received his Ph.D. in Computer Science from Warsaw Polytechnic in 1977. He is now Professor of Computer Science at Al-Azhar University, Egypt. His fields of interest include Artificial Intelligence Applications, Information Theory, Pattern Recognition and Vision Systems.

Abdallah El-Ramsisi was born in Egypt in 1952. He received his B.Sc. degree in Electrical Engineering from the Military Technical College in 1952. In 1985, he obtained his M.Sc. degree from the Faculty of Engineering, Cairo University, in bit synchronization in optical communications systems. In 1990, he obtained his Ph.D. degree in System Engineering from Oakland University, School of Engineering and Computer Science, in the field of Artificial Intelligence with major concentration on Computer Vision and Neural Network applications. He has a group of publications in the areas of Computer Vision,

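Table 4

Probability of error ($e$) for various occlusion levels ($q$)

| Features | q = 0% | q = 10% | q = 20% | q = 30% | q = 40% |
|----------|--------|---------|---------|---------|---------|
| (2-D) Moments | 0.06 | 0.10 | 0.23 | 0.42 | 0.66 |
| (3-D) Moments | 0.03 | 0.07 | 0.10 | 0.16 | 0.30 |
| Proposed features | 0.01 | 0.03 | 0.07 | 0.13 | 0.24 |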

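Table 3

Recognition rates of the five reference classes

| Class | Mirage | B 57 | B Craft | F 15 | TU 160 | Rc (%) |
|-------|--------|------|---------|------|--------|--------|
| D1 | 15 | 2 | 0 | 3 | 0 | 75 |
| D2 | 0 | 16 | 2 | 1 | 1 | 80 |
| D3 | 0 | 0 | 16 | 0 | 4 | 80 |
| D4 | 2 | 2 | 1 | 14 | 1 | 70 |
| D5 | 1 | 0 | 3 | 1 | 15 | 75 |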


Pattern Analysis, Image Analysis, Cryptography and system security, and GIS systems.

Rostom Omran was born in Egypt in 1951. He received his B.Sc. degree in Electrical Engineering from the Military Technical College in 1974. He obtained his M.Sc. degree in Pattern Recognition from the Faculty of Engineering, Cairo University, in 1989. In 1999, he obtained his Ph.D. degree in Artificial Intelligence from the Faculty of Engineering, Al-Azhar University. He has several publications in the field of Artificial Intelligence with major concentration on Pattern Recognition, Image Processing, Genetics and Neural Network applications.
