Pattern Classification for Handwritten Marathi Characters using Gradient Descent of Distributed Error with Genetic Algorithm for Multilayer Feedforward Neural Networks

1 Holkar Shrirang Raosaheb, Shri Venkateshwara University, Institute of Computer & Information Science, Gajraula, Amroha (Uttar Pradesh), India (e-mail: [email protected])
2 Dr. Manu Pratap Singh, Dr. B.R. Ambedkar University, Agra-282002, Uttar Pradesh, India (e-mail: [email protected])

Abstract

In this paper, the performance of a feedforward neural network trained with gradient descent of distributed error combined with a genetic algorithm is evaluated for the recognition of handwritten characters of the 'Marathi' script. The performance index of the multilayer feedforward network is defined over a distributed instantaneous error, i.e. a different error for each layer. The genetic algorithm is applied to make the search process more efficient in determining the optimal weight vector from a population of weights. It operates on the distributed error, and its fitness function is the mean square distributed error, which differs from layer to layer; convergence is therefore obtained only when the minimum of every layer's error is determined. The performance evaluation shows that the proposed method, gradient descent of distributed error with a genetic algorithm (a hybrid distributed evolutionary technique), performs better for the multilayer feedforward network in terms of accuracy, epochs, and number of optimal solutions for the given training and test pattern sets.

Keywords: Hybrid evolutionary distributed technique, multilayer feedforward neural network, gradient descent, genetic algorithm

1. Introduction

Pattern recognition is an emerging area of machine learning and intelligence. The problem has been posed in many ways, one of the most popular being pattern classification: the machine must sort different input stimuli into meaningful categories according to the features present in those inputs. This categorization may use predefined classes, depending on the nature of the problem. Pattern recognition and its applications have been studied for a long time, and various methods have been proposed to accomplish the classification task [1-10]. The recognition of handwritten cursive script, in the form of character classification and character association, has been a dominant area of pattern recognition with machine learning techniques [11, 12]. Soft computing techniques have been identified as a powerful tool for recognizing handwritten cursive script in the machine learning domain [13-15], and neural networks combined with evolutionary search methods have been used in various hybrid evolutionary algorithms to classify the handwritten cursive scripts of many languages [16-18]. The feedforward multilayer neural network with gradient descent of backpropagated error is widely used for generalized pattern classification [19]. Analysis of this architecture with the generalized delta learning rule (backpropagation learning) has highlighted both its performance and its limitation for handwritten character recognition, namely that too little information is available to the units of the output layer [20].


Therefore, the recurrent neural network in the form of the backpropagation-through-time (BPTT) model offers a suitable framework for reusing the output values of the network during training; it has shown promising performance, but only for dynamic patterns, and has proved inefficient for static patterns [21]. It was later found that the feedforward multilayer network with enhanced and extended versions of the backpropagation learning algorithm [22] is better suited to complex pattern classification and recognition tasks, in spite of its inherited problems of local minima, slow rate of convergence, and no guarantee of convergence [23-27].

To overcome the problems of gradient descent searching in the large search space of a complex pattern recognition task with a multilayer feedforward network, an evolutionary search algorithm, i.e. the genetic algorithm (GA), is a better alternative [28]. The reason is straightforward: this search technique is derivative-free, and it evolves a population of possible partial solutions, applying a natural selection process to filter them until the global optimal solution is found [29]. Prominent results have been reported in the literature for the generalized classification of handwritten English characters using the integration of a genetic algorithm with the backpropagation learning rule for the multilayer feedforward architecture [30, 11]. In that approach, the fitness of the weights is evaluated with the backpropagated error of the current input pattern vector, so the performance of the network still depends on a backpropagated, instantaneous, random, and unknown error.

In this paper, the performance of a feedforward neural network with gradient descent of distributed error and a genetic algorithm is evaluated for the recognition of handwritten characters of the 'Marathi' script. The performance index of the multilayer feedforward network is defined over a distributed instantaneous error, i.e. a different error for each layer. The genetic algorithm is applied to make the search more efficient in determining the optimal weight vector from the population of weights. It operates on the distributed error, and its fitness function is the mean square distributed error of each layer, so convergence is obtained only when the minimum of every layer's error is determined. The instantaneous square error is thus not the same for every layer but differs from layer to layer; it is treated as a distributed error for a multilayer feedforward network in which the hidden layers and the output layer contain equal numbers of units. The same desired output pattern for a presented input pattern is distributed to every unit of the hidden layers and the output layer, each of which produces a different actual output, so each layer has a different square error. The instantaneous error is therefore distributed rather than backpropagated. The proposed hybrid evolutionary technique, i.e. gradient descent of distributed error with a genetic algorithm, is used to train the multilayer network architecture for the generalized classification of handwritten 'Marathi' script.

The rest of the paper is organized as follows. Section 2 presents the generalized gradient descent method for the instantaneous distributed error and the implementation of the genetic algorithm with the distributed error. Section 3 describes the architecture and simulation design of the proposed method. Section 4 presents the results and discussion. Section 5 concludes the paper, followed by the references.


2. Generalized gradient descent learning for distributed square error

A multilayer feedforward neural network with at least two intermediate layers, commonly known as hidden layers, in addition to the input and output layers, can perform any complex generalized pattern classification task. The generalized delta learning rule [23] is a very common and widely used technique for training multilayer feedforward networks for pattern classification and pattern mapping. In this learning, the optimum weight vector for a given training set may be obtained if the weights are adjusted so that gradient descent is made along the total error surface in weight space. The error actually minimized is not the least mean square error over the entire training set but the instantaneous square error of each presented pattern: for every pattern, at each presentation, there is an unknown local error, and the weights are updated incrementally for each local error. Each time, the weights are updated to minimize this local error by propagating it back from the output layer to all hidden layers. The instantaneous error of a presented input pattern, i.e. the square difference between the desired pattern vector and the actual output at the units of the output layer, is thus backpropagated to the units of the hidden layers.

In the current work we consider a distributed error instead of a backpropagated one. The instantaneous square error is not the same for every layer, because each layer has its own actual output pattern vector; for each layer it is computed as the square difference between the desired output pattern vector of the given training sample and the actual output pattern vector of that layer. This distributed instantaneous square error imposes a constraint on the architecture of the multilayer feedforward network: the number of units in the output layer and in the hidden layers must be the same, so that the desired output pattern of the presented input pattern can be accommodated conveniently by each layer. Every hidden layer and the output layer therefore has a different square error, and the optimum weight vector of each layer can be obtained if its weights are adjusted so that gradient descent is made along the instantaneous square error of that layer. This means there is more than one objective function, or minimum error, one for each layer except the input layer, for the presented pattern, which makes this a multi-objective optimization problem: the objective is to minimize each instantaneous square error simultaneously to determine the optimum weight vector for the presented input pattern. The mean instantaneous square error of a layer is used to update that layer's weights, and the gradient descent of each layer's error is performed at the same time; there are thus several simultaneous gradient descents of individual errors for the presented input pattern, depending on the number of hidden layers. The update of the weight vector for the units of the hidden layers and the output layer is proportional to the corresponding gradient descent, which differs from layer to layer, so the optimal weight changes are proportional to the gradient descent of the distributed instantaneous mean square errors for the presented input pattern. The generalized method for obtaining the weight updates of the hidden layers and the output layer is formulated as follows.

Let $(a_l, d_l)$, for $l = 1, 2, \ldots, L$, be the current input pattern vector set of the training set of $L$ pattern samples presented to the multilayer feedforward neural network for formulating the generalized gradient descent of the instantaneous square distributed error. As already discussed, the architecture is constrained to keep the number of units in the hidden and output layers the same, as shown in Figure 1.

Fig. 1: Multilayer feedforward neural network architecture

The current random sample pattern $(a_l, d_l)$ of the training set defines the instantaneous squared error vector $e_l^O$ at the output layer and $e_l^H$ at the hidden layer as:

$e_l^O = (d_{l1} - S_1(y_{l1}^O), \ldots, d_{lk} - S_k(y_{lk}^O), \ldots, d_{lK} - S_K(y_{lK}^O))$   (1)

$e_l^H = (d_{l1} - S_1(y_{l1}^H), \ldots, d_{lk} - S_k(y_{lk}^H), \ldots, d_{lK} - S_K(y_{lK}^H))$   (2)

Therefore the instantaneous distributed mean square errors for the output and hidden layers are defined, respectively, as:

$E_l^O = \frac{1}{2} \sum_{k=1}^{K} [d_{lk} - S_k(y_{lk}^O)]^2$   (3)

and

$E_l^H = \frac{1}{2} \sum_{j=1}^{J} [d_{lj} - S_j(y_{lj}^H)]^2$   (4)
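To make the layer-wise errors concrete, the following minimal Python/NumPy sketch computes the two distributed errors of Equations (3) and (4) from a desired pattern d and the output signals of each layer. The function names are ours, not the paper's; it assumes sigmoid output signals as defined below.

import numpy as np

def sigmoid(y):
    """Logistic output signal S(y) = 1 / (1 + exp(-y))."""
    return 1.0 / (1.0 + np.exp(-y))

def distributed_errors(d, s_hidden, s_output):
    """Distributed instantaneous mean square errors of Eqs. (3) and (4):
    every layer is scored against the SAME desired pattern d, which is
    why the hidden and output layers must have equal numbers of units."""
    e_output = 0.5 * np.sum((d - s_output) ** 2)   # E_l^O, Eq. (3)
    e_hidden = 0.5 * np.sum((d - s_hidden) ** 2)   # E_l^H, Eq. (4)
    return e_output, e_hidden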

Hence, the update of the weight of the $k$-th unit of the output layer at iteration $t$ for the current input pattern vector is represented as:

$\Delta w_{kj}^l(t) = -\eta_{kj} \frac{\partial E_l^O}{\partial w_{kj}}$   (5)

and the update of the weight of the $j$-th unit of the hidden layer at iteration $t$ for the same current pattern is represented as:

$\Delta w_{ji}^l(t) = -\eta_{ji} \frac{\partial E_l^H}{\partial w_{ji}}$   (6)

Here $\eta_{kj}$ and $\eta_{ji}$ are the learning rates for the output and hidden layers, respectively.

Now, applying the chain rule to Equation (5), we have:

$\Delta w_{kj}^l(t) = -\eta_{kj} \frac{\partial E_l^O}{\partial w_{kj}} = -\eta_{kj} \frac{\partial E_l^O}{\partial y_{lk}^O} \frac{\partial y_{lk}^O}{\partial w_{kj}}$

Here the activation value is $y_{lk}^O = \sum_{j=1}^{J} w_{kj} S_j(y_{lj}^H)$ and the output signal is $S_k(y_{lk}^O) = f(y_{lk}^O) = \frac{1}{1 + e^{-y_{lk}^O}}$.

Or,

$\Delta w_{kj}^l(t) = -\eta_{kj} \frac{\partial E_l^O}{\partial y_{lk}^O} S_j(y_{lj}^H) = -\eta_{kj} \frac{\partial E_l^O}{\partial S_k(y_{lk}^O)} \frac{\partial S_k(y_{lk}^O)}{\partial y_{lk}^O} S_j(y_{lj}^H) = -\eta_{kj} \frac{\partial E_l^O}{\partial S_k(y_{lk}^O)} S_k(y_{lk}^O)(1 - S_k(y_{lk}^O)) S_j(y_{lj}^H)$

Now, from Equation (3) we have:

$\frac{\partial E_l^O}{\partial S_k(y_{lk}^O)} = -\sum_{k=1}^{K} [d_{lk} - S_k(y_{lk}^O)]$

Hence we have:

$\Delta w_{kj}^l(t) = \eta_{kj} \sum_{k=1}^{K} [d_{lk} - S_k(y_{lk}^O)] S_k(y_{lk}^O)(1 - S_k(y_{lk}^O)) S_j(y_{lj}^H)$   (7)

Thus, the weight at iteration $(t+1)$ for the units of the output layer, with the momentum term, is presented as:

$w_{kj}^l(t+1) = w_{kj}^l(t) + \eta_{kj} \sum_{k=1}^{K} [d_{lk} - S_k(y_{lk}^O)] S_k(y_{lk}^O)(1 - S_k(y_{lk}^O)) S_j(y_{lj}^H) + \alpha \Delta w_{kj}^l(t-1)$   (8)

Here the momentum rate constant is considered with $0 < \alpha < 1$ for the output layer.

Similarly, applying the chain rule to Equation (6), we have:

$\Delta w_{ji}^l(t) = -\eta_{ji} \frac{\partial E_l^H}{\partial w_{ji}} = -\eta_{ji} \frac{\partial E_l^H}{\partial y_{lj}^H} \frac{\partial y_{lj}^H}{\partial w_{ji}}$

Or,

$\Delta w_{ji}^l(t) = -\eta_{ji} \frac{\partial E_l^H}{\partial y_{lj}^H} a_i = -\eta_{ji} \frac{\partial E_l^H}{\partial S_j(y_{lj}^H)} \frac{\partial S_j(y_{lj}^H)}{\partial y_{lj}^H} a_i = -\eta_{ji} \frac{\partial E_l^H}{\partial S_j(y_{lj}^H)} S_j(y_{lj}^H)(1 - S_j(y_{lj}^H)) a_i$

Now, from Equation (4) we have:

$\frac{\partial E_l^H}{\partial S_j(y_{lj}^H)} = -\sum_{j=1}^{J} [d_{lj} - S_j(y_{lj}^H)]$

Hence we have:

$\Delta w_{ji}^l(t) = \eta_{ji} \sum_{j=1}^{J} [d_{lj} - S_j(y_{lj}^H)] S_j(y_{lj}^H)(1 - S_j(y_{lj}^H)) a_i$   (9)

Thus, the weight at iteration $(t+1)$ for the units of the hidden layer, with the momentum term, is presented as:

$w_{ji}^l(t+1) = w_{ji}^l(t) + \eta_{ji} \sum_{j=1}^{J} [d_{lj} - S_j(y_{lj}^H)] S_j(y_{lj}^H)(1 - S_j(y_{lj}^H)) a_i + \alpha' \Delta w_{ji}^l(t-1)$   (10)

Here the momentum rate constant is considered with $0 < \alpha' < 1$ for the hidden layer.
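As a sketch of how Equations (7)-(10) act on a single layer, the following Python/NumPy function (our own illustrative code, with a scalar learning rate eta and momentum rate alpha standing in for the per-connection rates) performs one update. Applying it to every hidden layer and the output layer with the same desired pattern d gives the simultaneous per-layer gradient descents described above.

import numpy as np

def sigmoid(y):
    return 1.0 / (1.0 + np.exp(-y))

def update_layer(w, b, x_in, d, eta, alpha, prev_dw):
    """One distributed-error update of a single layer, following
    Eqs. (7)-(10): the layer's own output signal is compared directly
    with the desired pattern d, so no error arrives from later layers.
    prev_dw is the previous weight change used by the momentum term."""
    s = sigmoid(w @ x_in + b)             # output signal S(y) of the layer
    delta = (d - s) * s * (1.0 - s)       # [d - S(y)] S(y)(1 - S(y))
    dw = eta * np.outer(delta, x_in) + alpha * prev_dw
    return w + dw, b + eta * delta, dw    # w(t+1), b(t+1), stored weight change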

An interesting observation concerns the number of terms appearing in the weight update expression for the hidden layer. Equation (9) involves fewer terms than the corresponding hidden-layer update of the backpropagation learning rule for the backpropagated instantaneous mean square error. Computing the weight update by gradient descent of the distributed instantaneous mean square error therefore has lower time complexity, and faster convergence can be expected compared with the conventional generalized delta learning rule for backpropagated error.

2.1 Genetic algorithm with gradient descent of distributed error

Most implementations of the GA derive from Holland's innovative specification. In our approach, the genetic algorithm is incorporated with gradient descent learning of the distributed instantaneous mean square error in the multilayer feedforward architecture for generalized pattern classification. An input pattern vector and its corresponding output pattern vector from the training set are presented to the network, which, with its current weight setting, produces the actual output of each unit of the hidden layers and the output layer. The distributed instantaneous mean square error is obtained, and the proposed gradient descent learning rule for the distributed error is applied for some fixed arbitrary number of iterations n. The weights between the layers and the bias values of the units are thus updated for n iterations for the given input pattern, improving on their initial state. The iterative weight update then stops, and the genetic algorithm is employed to evolve the population of modified weights and bias values. The genetic algorithm is applied to obtain the optimal weight vector from the large weight space for the given training set, using the following three elements:

(i) the genetic code for representing the weight vector as a chromosome;
(ii) the technique for evolving the population of weight vectors;
(iii) the fitness function for evaluating the performance of an evolved weight vector.

A lot of work has been reported on the evolution of neural networks with genetic algorithms [24]. Most of it integrates the genetic algorithm with the neural network at the following three levels [25]: (i) connection weights, (ii) architectures, and (iii) learning rules. The evolution of the weight vectors of a network is the level of interest considered in the current work. In our approach the genetic algorithm uses a different fitness evaluation function for each layer: the distributed instantaneous mean square error of a layer serves as that layer's fitness function. Generally the GA starts from a random initial solution and converges towards the optimal solution. In our approach the GA is applied after the weights have been updated for n iterations, so its initial population of solutions is not random: the initial population of weights is suboptimal, because the weights have already been updated in the direction of convergence. The GA thus searches from a suboptimal solution towards the multi-objective optimal solution of the given problem, where "multi-objective" reflects the fact that every layer except the input layer has its own error surface, i.e. its own objective function. A minimal end-to-end sketch of this hybrid scheme is given below.
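The following self-contained Python/NumPy sketch shows the two-phase scheme for a 16-5-5 network under stated simplifications: the paper's full GA (populations of size n+1 per layer, crossover, and elitism, sketched individually in the subsections below) is collapsed here to single-gene mutation with elitist acceptance, and all rates and iteration counts are illustrative rather than the paper's.

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(y):
    return 1.0 / (1.0 + np.exp(-y))

def layer_error(w, b, x, d):
    """Distributed instantaneous mean square error of one layer (Eqs. 3-4)."""
    return 0.5 * np.sum((d - sigmoid(w @ x + b)) ** 2)

def train_hybrid(x, d, n_grad=5000, n_ga=200, eta=0.1, width=5):
    """Two-phase hybrid sketch: gradient descent of the distributed error
    for n_grad iterations, then a GA seeded with the resulting
    suboptimal weights (one sub-population per layer)."""
    wh, bh = rng.uniform(0, 1, (width, x.size)), rng.uniform(0, 1, width)
    wo, bo = rng.uniform(0, 1, (width, width)), rng.uniform(0, 1, width)
    # Phase 1: each layer descends its OWN error against the same target d
    for _ in range(n_grad):
        sh = sigmoid(wh @ x + bh)
        so = sigmoid(wo @ sh + bo)
        gh = (d - sh) * sh * (1 - sh)          # hidden-layer gradient, Eq. (9)
        go = (d - so) * so * (1 - so)          # output-layer gradient, Eq. (7)
        wh += eta * np.outer(gh, x);  bh += eta * gh
        wo += eta * np.outer(go, sh); bo += eta * go
    # Phase 2: per-layer GA; fitness = that layer's distributed error
    for _ in range(n_ga):
        sh = sigmoid(wh @ x + bh)
        for w, b, xin in ((wh, bh, x), (wo, bo, sh)):
            cand = w.copy()
            i = tuple(rng.integers(s) for s in w.shape)
            cand[i] += rng.uniform(-1.0, 1.0)  # mutate one randomly chosen gene
            if layer_error(cand, b, xin, d) < layer_error(w, b, xin, d):
                w[...] = cand                  # keep the fitter sub-chromosome
    return (wh, bh), (wo, bo)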

Chromosome representation

A chromosome is a collection of genes, each representing a weight value or a bias value as a real number. In our method the initial population of weights and biases forming the basic (initial) chromosome is not random; instead, the initial chromosome consists of suboptimal weight and bias values. The chromosome is therefore represented as a matrix of real numbers holding the set of weight values and bias values. As already discussed, in our proposed multilayer architecture the error is the distributed instantaneous mean square error, i.e. a different error for each layer. The chromosome is therefore partitioned into sub-chromosomes, one for each hidden layer and one for the output layer. For the general network architecture of Figure 1 there are two sub-chromosomes: the first contains $(i \times j + j)$ genes and the second $(j \times k + k)$ genes. The number of sub-chromosomes thus depends on the number of hidden layers, but the number of genes in every sub-chromosome is the same, although the gene values may differ between sub-chromosomes.
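A small Python sketch of this encoding (function names are ours): flattening a layer's weight matrix and bias vector into one real-valued gene vector, and the inverse mapping back to (w, b).

import numpy as np

def encode_sub_chromosome(w, b):
    """Flatten one layer's weight matrix and bias vector into a real-valued
    sub-chromosome of (i x j + j) genes, matching the counts given above."""
    return np.concatenate([w.ravel(), b.ravel()])

def decode_sub_chromosome(genes, n_inputs, n_units):
    """Recover (w, b) from a gene vector of n_inputs*n_units + n_units genes."""
    w = genes[: n_inputs * n_units].reshape(n_units, n_inputs)
    b = genes[n_inputs * n_units:]
    return w, b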

The mutation operator

The mutation operator randomly selects a gene from a chromosome and modifies it with some random value to generate the next population of chromosomes. The probability of mutation is kept low to limit the randomness of the genetic algorithm. In our approach the mutation operator is applied to each sub-chromosome: it randomly selects a gene from the sub-chromosome and adds a small random value between -1 and +1 to generate the next population of that sub-chromosome. Let $C_N$ be the chromosome of the network, partitioned into the two sub-chromosomes $C_{N_H}$ and $C_{N_O}$ for the hidden layer and the output layer; $C_{N_H}$ contains $(i \times j + j) = m_H$ genes, while $C_{N_O}$ contains $(j \times k + k) = m_O$ genes. The sizes of the next generated populations are $N_H + 1$ and $N_O + 1$, respectively. If the mutation operator is applied n times to the old sub-chromosomes of the hidden layer and the output layer, we obtain the following new populations of sub-chromosomes [26]:

$C_{N_H}^{new} = C_{N_H}^{old} \cup \left[ \bigcup_{i=1}^{n} \left( C_{N_H}^{old} + \delta_H(\mu_H(C_{N_H,m}^{old})) \right) \right]$   (11)

and

$C_{N_O}^{new} = C_{N_O}^{old} \cup \left[ \bigcup_{i=1}^{n} \left( C_{N_O}^{old} + \delta_O(\mu_O(C_{N_O,m}^{old})) \right) \right]$   (12)

Here $\delta_H$ and $\delta_O$ are the small randomly generated values between -1 and +1 for the sub-chromosomes of the hidden and output layers, $\mu_H$ and $\mu_O$ are the randomly selected genes from the $C_{N_H}^{old}$ and $C_{N_O}^{old}$ sub-chromosomes, and $C_{N_H}^{new}$ and $C_{N_O}^{new}$ are the next populations of sub-chromosomes for the hidden and output layers, respectively. The inner operator prepares a new sub-chromosome at each mutation iteration, and the outer operator builds the new populations $C_{N_H}^{new}$ and $C_{N_O}^{new}$.
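A Python sketch of one reading of Equations (11)-(12), applicable to either layer's gene vector (the function name is ours):

import numpy as np

rng = np.random.default_rng()

def mutate_population(c_old, n):
    """Keep the old sub-chromosome itself and add n mutants, each formed
    by adding a small random value delta in (-1, +1) to one randomly
    selected gene mu of c_old, as in Eqs. (11)-(12)."""
    population = [c_old]                        # elitist copy of C^old
    for _ in range(n):
        child = c_old.copy()
        mu = rng.integers(len(child))           # randomly selected gene
        child[mu] += rng.uniform(-1.0, 1.0)     # small random delta
        population.append(child)
    return population                           # new population of size n + 1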

Elitism

Elitism is used at the creation of each new population to carry the good members of the old population into the next generation, so that good solutions of the previous population are not lost through the application of the genetic operators. It involves copying the best encoded network unchanged into the new population, as in Equations (11) and (12), by including $C_{N_H}^{old}$ and $C_{N_O}^{old}$ when creating $C_{N_H}^{new}$ and $C_{N_O}^{new}$.

Selection

The selection process of the genetic algorithm selects the good, or fit, members of the newly generated population. Here selection simultaneously considers the newly generated sub-chromosomes of the hidden layer and the output layer, i.e. $C_{N_H}^{new}$ and $C_{N_O}^{new}$, when selecting the population for the next cycle. A sub-chromosome $C_H^{Sel}$ is selected from $C_{N_H}^{new}$ for which the distributed instantaneous mean square error of the hidden layer, $E_l^H$, for pattern $l$ has reached its accepted minimum level. Likewise, a sub-chromosome $C_O^{Sel}$ is selected from $C_{N_O}^{new}$ for which the distributed instantaneous mean square error of the output layer, $E_l^O$, for the same pattern $l$ has reached its accepted minimum level.

Crossover

Crossover is a very important and useful operator of the genetic algorithm. Here the crossover operator takes the selected sub-chromosomes $C_H^{Sel}$ and $C_O^{Sel}$ and creates the next generation of the population separately for the hidden layer(s) and the output layer. We apply the uniform crossover operator n times to the selected sub-chromosomes at different crossover points to obtain the next generation; the uniform crossover of $C_H^{Sel}$ and $C_O^{Sel}$ is illustrated in Figs. 2-4 (Fig. 4 shows the result after applying the crossover operator). Applying the crossover operator n times to the selected sub-chromosomes $C_H^{Sel}$ and $C_O^{Sel}$ generates n+1 sub-chromosomes each, as [27]:

$C_H^{next} = C_H^{Sel} \cup \left[ \bigcup_{i=1}^{n} \left( C_H^{Sel} : C_{H,\mu}^{Sel} \leftrightarrow C_{H,\nu}^{Sel} \right) \right]$   (13)

and

$C_O^{next} = C_O^{Sel} \cup \left[ \bigcup_{i=1}^{n} \left( C_O^{Sel} : C_{O,\mu}^{Sel} \leftrightarrow C_{O,\nu}^{Sel} \right) \right]$   (14)

where $\mu$ and $\nu$ are the randomly selected gene positions in the sub-chromosomes $C_H^{Sel}$ and $C_O^{Sel}$, and $C_H^{next}$ and $C_O^{next}$ are the next generations of population, of size n+1. After the crossover operation we thus have 2(n+1) chromosomes in total for the network, i.e. n+1 each for the hidden layer and the output layer.
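A Python sketch of one reading of Equations (13)-(14), in which each child exchanges the genes at two randomly chosen positions of the selected sub-chromosome (the function name is ours, and this interpretation of the operator is an assumption):

import numpy as np

rng = np.random.default_rng()

def crossover_population(c_sel, n):
    """Generate n + 1 sub-chromosomes from the selected one: each child
    swaps the genes at two randomly chosen positions mu and nu of c_sel;
    the parent itself is retained, as in Eqs. (13)-(14)."""
    population = [c_sel]
    for _ in range(n):
        child = c_sel.copy()
        mu, nu = rng.choice(len(child), size=2, replace=False)
        child[mu], child[nu] = child[nu], child[mu]   # swap the two genes
        population.append(child)
    return population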

Fitness evaluation function

The fitness evaluation function of the genetic algorithm is used to evaluate the performance of the newly generated populations; it filters out the populations found suitable under the criterion of the fitness function. Here we use a separate fitness evaluation function for each layer, so, as per our network architecture, two fitness evaluation functions are used: one for the output layer and one for the hidden layer. The first estimates the performance of the hidden-layer sub-chromosomes $C_H^{next}$, and the second estimates the performance of the output-layer sub-chromosomes $C_O^{next}$. The fitness function used here is proportional to the distributed instantaneous mean squared error of the respective layer: the fitness function $f_H$ for the hidden layer uses the instantaneous mean square error of Equation (4) to evaluate $C_H^{next}$, and the fitness function $f_O$ for the output layer uses the instantaneous mean square error of Equation (3) to evaluate $C_O^{next}$. The genetic algorithm thus attempts to find weight vectors and bias values for the different layers that minimize the corresponding instantaneous mean squared errors. This procedure for evaluating the performance of the weight vectors of the hidden and output layers can be represented as:

min_error_H = 1.0 && min_error_O = 1.0
do for all n+1 chromosomes i
{
    if (error_H = E_l^H(C_{H,i}^{next}) < min_error_H) then C_H^{min} = C_{H,i}^{next}
    && if (error_O = E_l^O(C_{O,i}^{next}) < min_error_O) then C_O^{min} = C_{O,i}^{next}
    else (min_error_H = min(error_H)) && (min_error_O = min(error_O))
}

Here $C_H^{min}$ and $C_O^{min}$ represent the sub-chromosomes with the minimum error for the hidden and output layers, respectively. It is also possible to obtain more than one optimal weight vector for the given training set, because more than one sub-chromosome in the hidden and output layers may be evaluated as fit by the fitness evaluation functions of the respective layers.
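A compact Python counterpart of this per-layer filter (our own helper, taking any error function built from Eq. (3) or (4) and an accepted minimum level):

def fittest(population, error_fn, threshold):
    """Keep every sub-chromosome whose distributed error is below the
    accepted minimum level, and also return the single best one (C^min)."""
    fit = [c for c in population if error_fn(c) < threshold]
    best = min(population, key=error_fn)
    return fit, best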

3. Simulation design and implementation

In this simulation design and implementation, two of the proposed multilayer feedforward neural networks are considered. Both networks are trained with the proposed gradient descent of distributed instantaneous mean square error algorithm. Since every input pattern consists of 16 distinct features, each network architecture contains 16 processing units in the input layer. The first architecture consists of the input layer, two hidden layers with five units each, and an output layer with 5 units. The second architecture consists of the input layer, one hidden layer of 5 units, and an output layer also with 5 units.

Feature extraction

Five different samples of handwritten characters of the 'Marathi' script, from five different people, were collected in this simulation as input stimuli for the training pattern set. The scanned images of the distinct handwritten 'Marathi' characters are shown in Figure 5.

Fig. 5: Scanned images of distinct handwritten 'Marathi' characters

Each scanned image of a handwritten 'Marathi' character, as shown in Figure 5, was partitioned into sixteen equal parts; the pixel density of each part was calculated, and the centre of density gravity was obtained. For each scanned image we thus obtained sixteen values forming the input pattern vector of the training set. The training set therefore consists of sampled patterns of handwritten 'Marathi' characters, each a $16 \times 1$ pattern vector of real values; the output pattern vector corresponding to an input pattern vector is a $5 \times 1$ vector of binary values. The test input pattern set was constructed by the same method from sample patterns not used in the training set, and these test patterns were used to verify the performance of the trained networks.
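A minimal Python sketch of this zoning step, under the assumption of a binarized image array: it splits the image into a 4 x 4 grid of sixteen equal parts and takes each part's normalized pixel density, yielding the 16 x 1 input vector. The paper also derives the centre of density gravity per part; only the density value is shown here.

import numpy as np

def extract_features(img):
    """16-zone density features: one normalized pixel density per part of
    a 4 x 4 partition of the (binarized) character image."""
    h, w = img.shape
    zh, zw = h // 4, w // 4
    feats = [img[r * zh:(r + 1) * zh, c * zw:(c + 1) * zw].mean()
             for r in range(4) for c in range(4)]
    return np.array(feats)                       # shape (16,)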

Simulation design for the 16-5-5-5 network architecture

The simulation of the proposed feedforward multilayer architecture with two hidden layers of 5 units each and an output layer of 5 units (16-5-5-5) involves three different instantaneous mean square errors at the same time, i.e. $E^O$ for the output layer, $E^{H_1}$ for the first hidden layer, and $E^{H_2}$ for the second hidden layer, presented for pattern $l$ as:

$E_l^O = \frac{1}{2} \sum_{k=1}^{K} (d_{lk} - S_k(y_{lk}^O))^2$   (15)

$E_l^{H_1} = \frac{1}{2} \sum_{g=1}^{G} (d_{lg} - S_g(y_{lg}^{H_1}))^2$   (16)

and

$E_l^{H_2} = \frac{1}{2} \sum_{j=1}^{J} (d_{lj} - S_j(y_{lj}^{H_2}))^2$   (17)

The proposed gradient learning rule for the instantaneous mean square error updates the weight vector for t iterations. Weight updating then stops and the genetic algorithm is applied, with the updated weight and bias values taken as its initial population of chromosomes. As per our proposed architecture, in this simulation design there are three sub-chromosomes, one for each hidden layer and one for the output layer. The first sub-chromosome, shown in Figure 6, has 85 genes: 80 weight values on the connection links and 5 biases for the units of the hidden layer. The second and third sub-chromosomes have 30 genes each: 25 weight values on the connection links and 5 biases for the units of the second hidden layer and the output layer, respectively.

Fig. 6 (c): Sub-chromosome 3 for the output layer, of 30 genes

The mutation operator is applied simultaneously to all three sub-chromosomes, adding small random values between -1 and +1 to the selected genes to generate the new populations of these sub-chromosomes. Selection is then applied to all three sub-chromosomes to choose the better population of chromosomes for the next generation; this selection procedure uses the distributed instantaneous mean square errors of Equations (15), (16), and (17) as the fitness evaluation functions. The crossover operator is then applied simultaneously to all the selected sub-chromosomes to generate the larger next-generation populations of sub-chromosomes for the first hidden layer, the second hidden layer, and the output layer, of 85, 30, and 30 genes, respectively. The selected populations of weights and biases from each sub-chromosome determine the optimal solutions for the given training pattern set; a minimum of three optimal solutions is thus required for the convergence of the network. The sub-chromosome sizes are checked in the sketch below.
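A two-line Python check of the gene counts quoted above (one gene per connection weight plus one bias per unit):

def genes_per_layer(n_inputs, n_units):
    """Genes in one sub-chromosome: n_inputs * n_units weights + n_units biases."""
    return n_inputs * n_units + n_units

assert genes_per_layer(16, 5) == 85   # first hidden layer of 16-5-5-5
assert genes_per_layer(5, 5) == 30    # second hidden layer and output layer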

Simulation design for the 16-5-5 network architecture

The simulation of the proposed feedforward multilayer architecture with one hidden layer of 5 units and an output layer of 5 units (16-5-5) involves two different instantaneous mean square errors at the same time, i.e. $E^O$ for the output layer and $E^{H}$ for the hidden layer, presented for pattern $l$ as:

$E_l^O = \frac{1}{2} \sum_{k=1}^{K} (d_{lk} - S_k(y_{lk}^O))^2$   (18)

and

$E_l^H = \frac{1}{2} \sum_{j=1}^{J} (d_{lj} - S_j(y_{lj}^H))^2$   (19)

In this experiment the chromosome is divided into two sub-chromosomes, one for the hidden layer and one for the output layer. The first sub-chromosome, shown in Figure 7, has 85 genes: 80 weight values on the connection links and 5 biases for the units of the hidden layer. The second sub-chromosome has 30 genes: 25 weight values on the connection links and 5 biases for the units of the output layer.

Fig. 7 (b): Sub-chromosome 2 for the output layer, of 30 genes

The mutation operator is applied simultaneously to both sub-chromosomes, adding small random values between -1 and +1 to the selected genes to generate the new populations of these sub-chromosomes. Selection is then applied to both sub-chromosomes to choose the better population of chromosomes for the next generation, using the distributed instantaneous mean square errors of Equations (18) and (19) as the fitness evaluation functions. The crossover operator is then applied simultaneously to the selected sub-chromosomes to generate the larger next-generation populations for the hidden layer and the output layer, of 85 and 30 genes, respectively. The selected populations of weights and biases from each sub-chromosome determine the optimal solutions for the given training pattern set; a minimum of two optimal solutions is thus required for the convergence of the network.

3.3 Parameters used

The following parameters were used in the simulation of these two experiments for the given training set of handwritten 'Marathi' characters.

Genetic algorithm with backpropagated error: the parameters of the genetic algorithm with backpropagated error for the simulation of both experiments are as follows:

Parameter | Value
Learning rate for output layer ($\eta_O$) | 0.01
Learning rate for first hidden layer ($\eta_{H_1}$) | 0.01
Learning rate for second hidden layer ($\eta_{H_2}$) | 0.1
Momentum term ($\alpha$) | 0.9
Adaption rate ($K$) | 3.0
Mutation population size | 3
Crossover population size | 1000
Initial population | Randomly generated values between 0 and 1
Fitness evaluation function (one fitness function) | Backpropagated instantaneous squared error $E_l = \frac{1}{2} \sum_{k=1}^{K} (d_k - S_k(y_k^O))^2$
Minimum error (MAXE) | 0.00001

Table 1: Parameters used for the genetic algorithm with backpropagated error

Genetic algorithm with distributed error: the parameters used in the simulation of both experiments for the genetic algorithm with gradient descent learning of the distributed error are as follows:

Parameter | Value
Learning rate for output layer ($\eta_O$) | 0.01
Learning rate for hidden layers ($\eta_{H_1}$ & $\eta_{H_2}$) | 0.1
Momentum term for output layer ($\alpha$) | 0.9
Momentum term for hidden layers ($\alpha'$) | 0.7
Adaption rate ($K$) | 3.0
Minimum error for the output layer ($MAXE_O$) | 0.0001
Minimum error for the hidden layers ($MAXE_H$) | 0.001
Mutation probability | Smaller than 0.01
Mutation population size for sub-chromosome of output layer | 3
Mutation population size for sub-chromosomes of hidden layers | 3 each
Crossover population size for output layer | 1000
Crossover population size for first hidden layer (16-5-5-5 architecture) | 1000
Crossover population size for second hidden layer (16-5-5-5 architecture) | 500
Crossover population size for hidden layer (16-5-5 architecture) | 1000
Number of iterations prior to applying GA | 5000
Initial population | Values of weights & biases in each sub-chromosome after up to 5000 iterations of gradient descent for distributed error
Fitness evaluation functions (two for the 16-5-5 architecture, three for the 16-5-5-5 architecture) | Distributed instantaneous sums of squared errors: $E_l^O = \frac{1}{2} \sum_{k=1}^{K} (d_{lk} - S_k(y_k^O))^2$, $E_l^{H_1} = \frac{1}{2} \sum_{g=1}^{G} (d_{lg} - S_g(y_g^{H_1}))^2$, $E_l^{H_2} = \frac{1}{2} \sum_{j=1}^{J} (d_{lj} - S_j(y_j^{H_2}))^2$

Table 2: Parameters used for gradient descent learning with distributed error

4. Results and discussion

The results of the simulation design and implementation for both network architectures, 16-5-5-5 and 16-5-5, are considered for 65 training sample examples of handwritten 'Marathi' script with two hybrid techniques: the genetic algorithm with gradient descent of the backpropagated instantaneous mean square error, and the genetic algorithm with gradient descent of the distributed instantaneous mean square error. The performance of both architectures was evaluated with these two hybrid learning techniques on the given training set, and a performance analysis was carried out. The analysis shows that the 16-5-5-5 architecture performed more optimally in terms of convergence, number of epochs, and number of optimal solutions for the classification of the patterns in the training set; its performance was also found efficient and more generalized on the test pattern set. The results of the performance evaluation are shown in Tables 5 and 6, whose entries are the mean values, over five trials with each hybrid technique, of the iterations and of the number of converged weight matrices for the given training set.


Table 5: Performance evaluation of GA with gradient descent of distributed error and of backpropagated error for the 16-5-5 architecture


Table 6: Performance evaluation of GA with gradient descent of distributed error and of backpropagated error for the 16-5-5-5 architecture

The results tables contain information about counts, where a count is the number of optimum solutions, i.e. the number of weight matrices on which the network converges for the given training set. The integer value for the epoch in the tables is the number of iterations performed by each learning method to classify the given input pattern. No case of non-convergence was observed: the network was able to converge successfully to more than one optimum weight vector, or solution, for the given input pattern. Table 5 shows the performance evaluation between the GA with gradient descent of the instantaneous mean square distributed error and the GA with gradient descent of the backpropagated error for the 16-5-5 architecture, in terms of epochs (the number of iterations to convergence) and counts (the number of optimal converged weight vectors); the results of Table 5 are means of five trials on the same input pattern. Table 6 shows the same evaluation, on the same parameters, for the 16-5-5-5 architecture; its results are also means of five trials on the same input pattern.

An important observation from this simulation concerns the optimal solutions: an optimal solution is obtained only when more than one objective function is satisfied at the same time. For our 16-5-5 network there are two objective functions, one for the hidden layer and one for the output layer, and the network converges only when both objective functions reach their defined minimum error thresholds. Similarly, the 16-5-5-5 network has three different objective functions, and it converges only when all three reach their defined minimum error thresholds. The performance of the networks under gradient descent of the instantaneous mean square distributed error is therefore a case of multi-objective optimization. The GA with gradient descent of the instantaneous mean square backpropagated error, on the other hand, has only one objective, i.e. one common error function for all the layers, so its counts reflect only the converged (optimal) weight matrices for a single error minimum: the case of single-objective optimization. It can be seen from Tables 5 and 6 that, on the parameters of number of iterations and number of counts, the performance of the architecture with gradient descent of the distributed error (multi-objective optimization) is approximately the same as that of the GA with gradient descent of the backpropagated error (single-objective optimization).

5. Conclusion

In this work we simulated two neural network architectures to evaluate their performance with gradient descent of the instantaneous mean square distributed error with GA, and with gradient descent of the instantaneous mean square backpropagated error with GA, for the classification of handwritten 'Marathi' cursive script. The instantaneous mean square distributed error is the mean square difference between the target output pattern and the actual output pattern of each unit of each layer separately, for the presented input pattern; the common target pattern is thus compared, in each layer, with that layer's own computed actual output pattern. In this approach, convergence for the given training samples is achieved only when the different error functions are minimized simultaneously; the optimum solution is constrained by several objective functions, which makes this a case of multi-objective optimization rather than the single-objective optimization of gradient descent of the instantaneous mean square backpropagated error. On the basis of the simulation results and analysis, the following observations can be drawn:

1. The performance of the GA with gradient descent of distributed error, a multi-objective optimization, is better in most cases than that of the GA with gradient descent of backpropagated error, a single-objective optimization, in terms of the number of optimal solutions, or counts. The number of iterations for the GA with gradient descent of distributed error is understandably higher, because this method has three objective functions, all of which must be minimized to obtain an optimal solution.

2. The results also show that the behaviour of the GA with gradient descent of distributed error is more consistent and exhibits less randomness than the GA with gradient descent of backpropagated error. Another interesting observation concerns the counts and iterations for new pattern information versus further examples of the same pattern information: for further examples of the same pattern the counts are higher and the iterations fewer, while for new pattern information the counts are lower and the iterations higher. Moving from one unknown local error minimum to another therefore yields fewer optimum solutions and requires more iterations to converge.

3. Generally the GA starts from random solutions and converges towards the optimal solution; in multi-objective optimization its randomness increases and the possibility of obtaining the optimal solution decreases. In the proposed technique the GA does not start from a random population of solutions; it starts from suboptimal solutions, because it is applied after some iterations of gradient descent of the instantaneous mean square distributed error. These iterations establish the direction of convergence, from which the GA proceeds towards the optimal solutions.

4. Multi-objective optimization is a dominant thrust area of soft computing research, and many real-world problems require it. The proposed method may be explored for achieving optimal solutions in such problems. The performance of the GA with gradient descent of distributed error could be further improved with different image processing methods for feature extraction from handwritten cursive scripts. These aspects may be considered in future work to evaluate the proposed method on various problem domains.

References

[1] Kumar, S., "Neural Networks: A Classroom Approach", New Delhi: Tata McGraw-Hill (2004)
[2] Sun, Y., "Hopfield neural network based algorithms for image restoration and reconstruction - Part I: Algorithms and simulations", IEEE Transactions on Signal Processing, vol. 48(7), pp. 2105-2118 (2000)
[3] Szu, H., Yang, X., Telfer, B. and Sheng, Y., "Neural network and wavelet transform for scale invariant data classification", Phys. Rev. E, vol. 48, pp. 1497-1501 (1993)
[4] Nagy, G., "Classification algorithms in pattern recognition", IEEE Transactions on Audio and Electroacoustics, vol. 16(2), pp. 203-212 (1968)
[5] Hoppensteadt, F.C. and Izhikevich, E.M., "Synchronization of laser oscillators, associative memory, and optical neurocomputing", Phys. Rev. E, vol. 62, pp. 4010-4013 (2000)
[6] Keith, L.P., "Classification of Cm I energy levels using counterpropagation neural networks", Phys. Rev. A, vol. 41, pp. 2457-2461 (1990)
[7] Carlson, J.M., Langer, J.S. and Shaw, B.E., "Dynamics of earthquake faults", Reviews of Modern Physics, vol. 66(2), pp. 657-670 (1994)
[8] Palaniappan, R., "Method of identifying individuals using VEP signals and neural networks", IEE Proc. Science, Measurement and Technology, vol. 151(1), pp. 16-20 (2004)
[9] Zhao, H., "Designing asymmetric neural networks with associative memory", Phys. Rev. E, vol. 70(6), pp. 137-141 (2004)
[10] Schutzhold, R., "Pattern recognition on a quantum computer", Phys. Rev. A, vol. 67, pp. 311-316 (2003)
[11] Impedovo, S., "Fundamentals in Handwriting Recognition", NATO Advanced Study Institute, vol. 124, Springer-Verlag (1994)
[12] Mori, S., Suen, C.Y. and Yamamoto, K., "Historical review of OCR research and development", Proceedings of the IEEE, vol. 80(7), pp. 1029-1058 (1992)
[13] Fukushima, K. and Wake, N., "Handwritten alphanumeric character recognition by the neocognitron", IEEE Transactions on Neural Networks, vol. 2(3), pp. 355-365 (1991)
[14] Blackwell, K.T., Vogl, T.P., Hyman, S.D., Barbour, G.S. and Alkon, D.L., "A new approach to handwritten character recognition", Pattern Recognition, vol. 25, pp. 655-666 (1992)
[15] Le Cun, Y., Boser, B., Denker, J.S., Henderson, D., Howard, R.E., Hubbard, W. and Jackel, L.D., "Handwritten digit recognition with a back-propagation network", Advances in Neural Information Processing Systems, vol. 2, pp. 396-404 (1990)
[16] Kharma, N.N. and Ward, R.K., "A novel invariant mapping applied to hand-written Arabic character recognition", Pattern Recognition, vol. 34(11), pp. 2115-2120 (2001)
[17] Badi, K. and Shimura, M., "Machine recognition of Arabic cursive script", Trans. Inst. Electron. Commun. Eng., vol. 65(E), pp. 107-114 (1982)
[18] Suen, C.Y., Nadal, C., Legault, R., Mai, T.A. and Lam, L., "Computer recognition of unconstrained handwritten numerals", Proceedings of the IEEE, vol. 80(7), pp. 1162-1180 (1992)
[19] Knerr, S., Personnaz, L. and Dreyfus, G., "Handwritten digit recognition by neural networks with single-layer training", IEEE Transactions on Neural Networks, vol. 3, pp. 962-968 (1992)
[20] Lee, S.W. and Song, H.H., "A new recurrent neural network architecture for visual pattern recognition", IEEE Transactions on Neural Networks, vol. 8(2), pp. 331-340 (1997)
[21] Urbanczik, R., "A recurrent neural network inverting a deformable template model of handwritten digits", Proc. Int. Conf. Artificial Neural Networks, Sorrento, Italy, pp. 961-964 (1994)
[22] Hagan, M.T., Demuth, H.B. and Beale, M.H., "Neural Network Design", PWS Publishing Co., Boston, MA (1996)
[23] Rumelhart, D.E., Hinton, G.E. and Williams, R.J., "Learning internal representations by error propagation", MIT Press, Cambridge, vol. 1, pp. 318-362 (1986)
[24] Sprinkhuizen-Kuyper, I.G. and Boers, E.J.W., "The local minima of the error surface of the 2-2-1 XOR network", Annals of Mathematics and Artificial Intelligence, vol. 25(1-2), pp. 107-136 (1999)
[25] Zweiri, Y.H., Seneviratne, L.D. and Althoefer, K., "Stability analysis of a three-term backpropagation algorithm", Neural Networks, vol. 18(10), pp. 1341-1347 (2005)
[26] Abarbanel, H., Talathi, S., Gibb, L. and Rabinovich, M., "Synaptic plasticity with discrete state synapses", Phys. Rev. E, vol. 72, 031914 (2005)
[27] Shrivastava, S. and Singh, M.P., "Performance evaluation of feed-forward neural network with soft computing techniques for hand written English alphabets", Applied Soft Computing, vol. 11, pp. 1156-1182 (2011)