Recognition of Handwritten Arabic Literal Amounts Using a Hybrid Approach
Abdelhak Boukharouba • Abdelhak Bennia
Received: 9 February 2010 / Accepted: 2 December 2010 / Published online: 24 December 2010
© Springer Science+Business Media, LLC 2010
Abstract This paper describes a new approach to com-
bine a multilayer perceptron (MLP) and a hidden Markov
model for recognizing handwritten Arabic words. As a first
step, connected components (CCs) of black pixels are
detected, then the system determines which CCs are sub-
words and which are diacritics. The diacritics are then
isolated and identified separately, and the sub-words are
segmented into graphemes. The MLP is used as a labeller
(classifier) and probability estimator. We also introduce the
diacritics and their positions in our hybrid system; thus,
only one model including both grapheme and diacritic
states is built to represent the whole alphabet. Finally, we
consider a maximum likelihood classifier to decide about
the word class. The experiments that were performed show
promising results on Arabic word segmentation and
recognition.
Keywords Arabic word modeling · Segmentation · Feature extraction · Multilayer perceptron · Continuous hidden Markov models · Viterbi algorithm
Introduction
Hidden Markov models (HMMs) [1], which are widely
used in speech recognition techniques, have been suc-
cessfully used for the recognition of handwritten words [2,
3]. In the HMM word recognition paradigm, there are two
principal methods: the model discriminant method and the
path discriminant method [2]. In the model discriminant
approach, a separate HMM is used for each word class.
However, in the path discriminant approach, which is
analogous to the character based method, only one HMM is
used for all the word classes and different paths in the
model distinguish one word class from the others. Multi-
layer perceptrons have been applied to character recogni-
tion problems [4, 5], and a number of studies have shown
that incorporating the discriminating capabilities and
classification power of MLPs with the statistical modeling
of HMMs results in a system that is better than either MLPs
or HMMs [6]. Among the current efforts in incorporating
MLPs into HMM schemes, a popular approach has been to
replace HMM state observation probabilities with scaled
MLP probability estimates for each of the output neurons.
MLPs compute posterior (Bayesian) probabilities that are
then scaled by prior probabilities for each of the states
(classes) and incorporated into the Viterbi decoding
scheme as state observation probabilities.
The major challenge in Arabic writing recognition
systems comes from the cursive nature of the data [7]. A
leading paradigm for Arabic script recognition is the use of
implicit segmentation. Often this refers to the use of an
HMM for recognition and vertical image frames for fea-
tures, with each character represented by several states in
the model. For example, Makhoul et al. [8] proposed a
system to recognize typewritten Arabic script. The system
depends on the estimation of character models, a lexicon,
and grammar from training samples. The training phase
extracts statistical features from overlapping vertical win-
dows with the corresponding ground truth to estimate the
character model parameters. The recognition phase feeds
the feature vector of an input pattern to the HMM to find
the character sequence with the highest likelihood.
Dehghan et al. [9] presented a holistic system for the
recognition of handwritten Farsi/Arabic words using right-to-left
discrete hidden Markov models. The histogram of chain-
code directions of the image strips is used as features vector.
The Kohonen self-organizing feature map is used for con-
structing the codebook and also smoothing the observation
probability distribution. Khorsheed [10] implemented a
universal HMM that is composed of smaller interconnected
character models. Each character model is a left-to-right
HMM, and it represents one letter from the alphabet.
Amin and Mari [11] presented a method for automatic
recognition of multi-font Arabic text. The system is based
on segmentation of words into characters and identification
of characters. Finally, the word recognition is based on the
tree representation lexicon and the Viterbi algorithm.
Miled et al. [12] adopted an HMM model for Arabic words
recognition using an explicit segmentation of words into
graphemes. They used a K-NN classifier to assign to each
grapheme an observation. A maximum likelihood (ML)
classifier was considered to decide about the word class.
Menasri et al. [13] described an off-line handwritten
Arabic words recognition system based on explicit graph-
eme segmentation and hybrid HMM/NN recognition
scheme. Their approach introduces a new shape-based
alphabet for handwritten Arabic recognition, which is
intended to benefit from some specificities of Arabic
writing. Boukharouba and Bennia [14] presented a neuro-
fuzzy hybrid network for the recognition of handwritten
Arabic words. Fuzzy rules are extracted from training
examples by a hybrid learning scheme.
Farah et al. [15] presented an approach using classifiers
combination and syntax analysis for recognizing hand-
written Arabic literal amounts. Benouareth et al. [16]
described an off-line unconstrained handwritten Arabic
word recognition system based on a segmentation-free
approach and semi-continuous hidden Markov models
(SCHMMs) with explicit state duration.
This paper deals with the recognition of handwritten
Arabic literal amounts. An explicit segmentation with a
path discriminant continuous HMM is chosen as the rec-
ognition system. The development of a hybrid system using
HMM and MLP is not a new concept for word and
grapheme recognition, respectively [17]. However, the
main contribution of this work focuses on lexicon reduc-
tion using a sophisticated segmentation algorithm and word
modeling aspect. First, a new efficient segmentation algo-
rithm is presented. This algorithm incorporates an
evaluation function based on a running count of horizontal
white-black transitions in conjunction with a set of heu-
ristics to determine the decisive segmentation points.
Secondly, we introduce a shape-based alphabet that is
intended to reduce the redundancy in the shapes of Arabic
letters. Thirdly, this reduced alphabet is used to construct a
new model to incorporate more context by assigning the
diacritics to their nearest graphemes. Thus, only one model
that includes both grapheme classes and diacritics classes is
built to represent the whole alphabet. This allows the HMM
to easily handle stochastic variations of diacritics as well as
errors from the MLP grapheme recognizer.
This paper is organized as follows. ‘‘Arabic Handwriting
Characteristics’’ section describes the Arabic handwriting
characteristics. ‘‘Architecture and Description of Proposed
Recognition System’’ section presents an overall descrip-
tion of the proposed recognition system. ‘‘Segmentation of
Words’’ section details the pre-processing and segmenta-
tion steps. ‘‘Feature Extraction’’ section presents the fea-
ture extraction method. ‘‘Statistical Modeling of
Handwritten Words’’ section gives the justifications behind
the design of the model we proposed and details the steps
of learning and recognition of the neuro-markovian net-
work. ‘‘Application and Experimental Results’’ section
deals with the application of our model to handwritten
Arabic amounts and presents the experiments performed to
validate the approach. Finally, we present some concluding
remarks and perspectives.
Arabic Handwriting Characteristics
Since the characteristics of Arabic handwriting are differ-
ent from the Latin ones and some of the readers may be
unfamiliar with Arabic script, a brief description of the
important aspects of Arabic script will be presented. Arabic
text is inherently cursive both in handwritten and printed
forms and is written horizontally from right to left. The
alphabet contains 28 different characters. Different Arabic
characters may have exactly the same shape and are dis-
tinguished from each other only by the addition of one out
of five diacritics shown in Fig. 1a. These are normally one,
two, or three dots, ‘‘hamza’’ or ‘‘medda’’. However, in
handwriting, two dots can be written as two connected dots
(straight segment). For three dots, they can be drawn as two
connected dots and a single dot, triangle or an open curve.
Figure 1b shows some variations in handwritten dots. A
diacritic may be above, below, or even inside the charac-
ter’s main shape. For example, the three different charac-
ters ( ) have the same main shape but different
diacritics. Ambiguous writing of these diacritics sometimes
causes a word image to be read in various forms with
completely different meanings.
In contrast to Latin, Arabic characters are not divided
into upper and lower case categories. Instead, an Arabic
character might have several shapes depending on its rel-
ative position in a word (beginning, middle, end, or alone),
for example ( ). Table 1 shows a complete set of
Arabic characters in all their forms depending on their
position in the word. There are six characters that cannot
be connected to the left. These characters will be called last
characters of sub-word ( ).
Arabic writing is cursive, and words are separated by
spaces. However, a word can be divided into smaller units
called sub-words (a portion of a word including one or
more connected characters).
The vocabulary of Arabic amounts is larger than that
found in Latin languages, see Table 2 [15]. This is due to
three major factors. First, Arabic has three different forms:
singular, double, and plural as shown in Fig. 2. Secondly,
double and plural nouns have up to four different forms
according to their grammatical positions as shown in
Fig. 3. Thirdly, most numbers define two forms for femi-
nine and masculine countable things as shown in Fig. 4.
Architecture and Description of Proposed
Recognition System
Figure 5 shows a block diagram of the proposed recogni-
tion system for handwritten Arabic literal amounts. The
technique can be summarized as follows. The input image
is first smoothed and binarized. The connected components
(CCs) of black pixels are detected, then the system
Fig. 1 Diacritics: a different diacritic shapes (dot, two dots, three dots, hamza, medda), b some variations in handwritten dots
Table 1 Arabic alphabet in all its forms (end form EF, middle form
MF, beginning form BF, and isolated form IF)
Table 2 Arabic literal amounts vocabulary
Fig. 2 Singular, double, and plural forms for the word ‘‘thousand’’
Fig. 3 Four grammatical forms of the word ‘‘two-thousand’’
Fig. 4 Feminine and masculine forms of the word ‘‘three’’
determines which CCs are sub-words and which are dia-
critics. The diacritics are then isolated and identified sepa-
rately, and the sub-words are segmented into graphemes.
Different feature sets are extracted from each grapheme.
First, the MLP classifier is used as a labeller. After identi-
fying grapheme classes for input word image, the grapheme
labels (classes) are arranged from right to left according to
their appearance in the word. Then, we match the grapheme
and diacritic sequence against the candidate vocabulary
words. A word image is counted as correctly classified if all
graphemes and diacritics composing it are correctly classi-
fied. If this matching fails to recognize one of the
candidate words, the MLP outputs are used as observation
(posterior) probabilities of graphemes and the neuro-mar-
kovian system is then carried out. Next, we use the Viterbi
algorithm in order to find the optimal path representing the
recognized word, which is an ordered list of graphemes and
diacritics associated with the sequence of observations. Note
that matching means verifying if the input word exists in the
lexicon or not. Therefore, a lexicon is required in order to
accept only legal words.
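As an illustration of this matching step, the following minimal Python sketch (with hypothetical labels and helper names, not the authors' implementation) accepts a word only when its full ordered label sequence matches a lexicon entry exactly:

    # A minimal sketch of the matching step: the ordered grapheme/diacritic
    # label sequence of the input word is accepted only if it matches one
    # lexicon entry exactly; otherwise the neuro-markovian stage is carried out.
    def match_against_lexicon(label_sequence, lexicon):
        # lexicon maps label tuples (in right-to-left order) to word identities.
        return lexicon.get(tuple(label_sequence))  # None means "no match"

    lexicon = {
        ("g07", "dot_above", "g03", "g11"): "khamsoun",  # hypothetical entries
        ("g07", "g03", "g11"): "sabaa",
    }
    print(match_against_lexicon(["g07", "dot_above", "g03", "g11"], lexicon))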
Segmentation of Words
The word images are acquired through a digital scanner
and stored into a file. The words must be binarized and
smoothed before the segmentation. Here, the segmentation
is performed in two levels: word and sub-word segmenta-
tion, and character segmentation. The segmentation pro-
cedure of lines into words and sub-words consists of
identification and classification of connected components.
After a detection of connected components in the input text
line, these components are classified in two classes: sub-
words including isolated characters and diacritics including
dots, see Fig. 6.
Most existing segmentation methods classify the
connected components using information on their sizes and
positions. Any connected component whose size is less
than a threshold is regarded as a diacritical mark. The
algorithm used to determine the diacritics is presented in
[18]. To improve segmentation efficiency, we opted to
remove diacritics such as dots from the characters. Their original
position and number are stored and reintroduced only in the
Fig. 5 Block diagram of the proposed recognition system: dashed and solid arrows represent control and data flows, respectively. The main blocks are preprocessing, connected component analysis and segmentation, feature extraction, the MLP used as a labeller (classifier), matching of the grapheme and diacritic labels against the lexicon, and, if matching fails, continuous HMM decision (recognition) via the Viterbi algorithm using the observation probabilities of graphemes and diacritics and the transition probabilities estimated from training samples
Fig. 6 Different types of connected components forming the word ‘‘ninety’’: sub-word, isolated character, and dots
recognition phase. Only sub-words are considered by the
next segmentation phase.
The segmentation is carried out with the help of an
evaluation function based on a running count of horizontal
white-black transitions of the sub-word. We scan the bi-
narized image vertically column by column and locate each
white-black transition at the black pixel by retaining its
gray level as 0 and replacing the gray levels of the other
black pixels with 1. Figure 7 illustrates the steps for
extracting the primary segmentation points (PSPs). Fig-
ure 7b shows the result by locating the white-black tran-
sition vertically on Fig. 7a. The number of vertical white-
black transitions at every point in the sub-word is
calculated.
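The per-column transition count described above can be computed directly from the binary image. The sketch below is an illustration under the assumption that the sub-word is stored as a 0/1 array with 1 for black pixels; columns whose count changes from or to 1 are then candidates for the primary segmentation points of Step 1 below.

    import numpy as np

    def column_transition_counts(img):
        # img: binarized sub-word as a 2-D array, 1 = black, 0 = white.
        # A white-black transition is counted where a pixel is black and the
        # pixel just above it (or the image border) is white.
        padded = np.vstack([np.zeros((1, img.shape[1]), dtype=img.dtype), img])
        transitions = (padded[1:] == 1) & (padded[:-1] == 0)
        return transitions.sum(axis=0)  # one count per column

    sub_word = np.array([[0, 1, 1, 0, 0],
                         [0, 1, 0, 1, 0],
                         [1, 1, 1, 1, 0]])
    print(column_transition_counts(sub_word))  # -> [1 1 2 1 0]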
The steps of our algorithm are as follows:
Step 1: Scan the sub-word from right to left, and find the
points at which:
1. The white-black transition number changes from or to
1.
2. Upper contour of the sub-word parts, which have only
one white-black transition, changes quickly from high
to low or from low to high.
This procedure gives the primary segmentation points
(PSPs), which are represented by vertical white segments.
Thus, the sub-words are segmented into different parts; see
Fig. 7c.
Step 2: Use some rules to check whether the primary
segmentation points are real segmentation points (RSPs).
The rules are as follows:
Rule 1: The parts with short lengths are due to noise
(spurious pixels) and must not be considered. Each
redundant part introduces two primary segmentation
points, which must be removed; see Fig. 8. Notice that p1
occurs at the beginning/end of character and, p2 and p3
occur inside a character. Consequently, p2 and p3 must be
removed, and we retain only p1.
Rule 2: Remove the two last PSPs if the part between
them is of one black-white transition, see Fig. 9a.
Rule 3: Remove the last PSP in the last character, like
“ ”, “ ”, and “ ” as in Fig. 9b. In cases where the last
part is a long-vertical segment or a loop, the breakpoint is
retained as in Fig. 9c.
These rules are then applied to each PSP in order to
validate the real segmentation points (RSPs); the latter are
used to segment the input image. The connecting part, which is a useless horizontal segment between two successive RSPs, should be removed before recognition because it causes recognition errors. Figure 10 shows the
segmentation of the word “ ” depicted in Fig. 7 into
graphemes; the final RSPs are marked by vertical white
segments.
The segmentation algorithm is baseline independent, which makes it more robust than existing algorithms, especially for handwritten scripts.
In this explicit segmentation, words are segmented into
graphemes, which are then recognized individually. A
grapheme may be an entire character or portion of
Fig. 7 The main steps for extracting PSPs: a the main body of the word five ‘‘ ’’, b result of locating the white-black transitions vertically on (a), c primary segmentation points (PSPs)
Fig. 8 p2 and p3 are removed and only p1 is retained
Fig. 9 Filtering the redundant PSPs: (a, b) PSPs are removed to avoid splitting characters, c PSPs are retained
character. For example, the character is segmented into
three portions as shown in Fig. 10, and each portion of the
character has the same class as the main shape of
character “ ”. As a result, the number of graphemes to be
recognized is smaller than the number of characters in the
alphabet of Arabic literal amounts. Consequently, in this
way, we have introduced a new grapheme-based alphabet
for handwritten Arabic word recognition, which allows us
to benefit from some inherent properties of Arabic writing.
Feature Extraction
In our vocabulary of Arabic amounts, the alphabet can be divided into 18 classes; each class contains graphemes that share a main shape and are distinguished only by the number of dots and their positions. After the segmen-
tation into graphemes, the word image is represented as a
sequence of observations (graphemes). Each grapheme is
either only one character or a portion of a character.
At this level, each grapheme is described by different
sets of features:
The first feature set is the chain-code histogram (CCH),
which is a statistical measure for the directionality of the
contour of a character. Four directions (0°, 45°, 90°, 135°) are considered; thus, the directions between two successive pixels are encoded as 0, 1, 2, and 3 direction codes, respectively, as shown in Fig. 11. To extract features from contour directions, the bounding box of each digit is divided into four zones, and within each zone the 4-bin histogram of chain codes is computed, where each bin represents the frequency of the respective direction. Therefore, the contour feature vector is composed of 4 × 4 = 16 components f_i, i = 1, 2, …, 16, normalized between 0 and 1, where f_i is the ith directional feature. In addition, the density of the contour pixels of each zone is calculated.
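One possible implementation of this zoned chain-code histogram is sketched below (a sketch, not the authors' code). It assumes the contour is already available as an ordered list of (row, column) points, that opposite moves are folded onto the same one of the four orientations of Fig. 11, and that a global normalization is used, since the exact normalization is not specified.

    import numpy as np

    # Opposite moves between successive contour pixels are folded onto the same
    # one of the four orientations (0, 45, 90, 135 degrees) of Fig. 11.
    ORIENTATION = {(0, 1): 0, (0, -1): 0,
                   (-1, 1): 1, (1, -1): 1,
                   (-1, 0): 2, (1, 0): 2,
                   (-1, -1): 3, (1, 1): 3}

    def chain_code_features(contour, top, left, height, width):
        # contour: ordered list of (row, col) points of the grapheme contour;
        # (top, left, height, width): its bounding box.
        hist = np.zeros((2, 2, 4))    # 2 x 2 zones, 4 direction bins per zone
        density = np.zeros((2, 2))    # contour-pixel density per zone
        for (r0, c0), (r1, c1) in zip(contour, contour[1:]):
            zr = min(1, 2 * (r0 - top) // max(height, 1))
            zc = min(1, 2 * (c0 - left) // max(width, 1))
            d = ORIENTATION.get((r1 - r0, c1 - c0))
            if d is not None:
                hist[zr, zc, d] += 1
            density[zr, zc] += 1
        hist /= max(hist.sum(), 1.0)          # 16 values normalized to [0, 1]
        density /= max(len(contour), 1)
        return np.concatenate([hist.ravel(), density.ravel()])  # 20 components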
The second set is based on the white-black transition
information in the vertical and horizontal directions of a
grapheme image:
$$f_{hk} = \frac{l_{hk}}{w}, \qquad f_{vk} = \frac{l_{vk}}{h} \qquad \text{for } k = 1, 2$$

$$f_{h3} = \frac{\sum_{k \geq 3} l_{hk}}{w}, \qquad f_{v3} = \frac{\sum_{k \geq 3} l_{vk}}{h}$$

where $l_{hk}$ and $l_{vk}$ are the lengths of the parts that have k transitions in the horizontal and vertical directions, respectively, and w and h denote the width and the height of the grapheme.
The third set calculates the profiles of the grapheme
from the upper, lower, left, and right boundaries of the
image. The profile area is computed as the number of
pixels between the edges of the image (bounding box) and
the contour of the grapheme. Each profile feature is cal-
culated as the ratio between the area of each profile and the
grapheme’s area. Finally, relative size features are calcu-
lated with respect to pre-fixed size (w0, h0): ratio1 = w/w0
and ratio2 = h/h0 where w and h are the width and the
height of the grapheme, respectively.
As a result, we obtain a feature vector of 32 components
per grapheme.
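A sketch of the profile-area and relative-size features (the third set) may help fix the idea; the reference size (w0, h0) and the use of the black-pixel count as the grapheme's area are assumptions, not the authors' exact choices. Together with the 20 contour features and the 6 transition features, this would be consistent with the 32-component vector mentioned above.

    import numpy as np

    def profile_and_size_features(img, w0=64, h0=64):
        # img: binarized grapheme, 1 = black, 0 = white; (w0, h0) is the
        # pre-fixed reference size (64 x 64 is a hypothetical value).
        h, w = img.shape
        area = max(int(img.sum()), 1)  # grapheme area, here the black-pixel count
        cols = [c for c in range(w) if img[:, c].any()]
        rows = [r for r in range(h) if img[r, :].any()]
        # Profile area = background pixels between a bounding-box edge and the
        # first black pixel reached from that edge, summed over scan lines.
        upper = sum(int(np.argmax(img[:, c])) for c in cols)
        lower = sum(int(np.argmax(img[::-1, c])) for c in cols)
        left = sum(int(np.argmax(img[r, :])) for r in rows)
        right = sum(int(np.argmax(img[r, ::-1])) for r in rows)
        profiles = [p / area for p in (upper, lower, left, right)]
        return np.array(profiles + [w / w0, h / h0])  # 4 profiles + 2 size ratios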
Statistical Modeling of Handwritten Words
The hidden Markov model (HMM) theory has been suc-
cessfully used to model the writing variability [19]. The
theoretic formulation of HMM is beyond the scope of this
paper. Our interest in the HMM lies in its ability to effi-
ciently model different knowledge sources. It correctly
integrates different modeling levels (morphological, lexi-
cal, and syntactical) and also provides efficient algorithms
to determine an optimum value for the model parameters.
In this section, we give the justifications behind the design
of the model we propose and we detail the steps of learning
and recognition of the neuro-markovian network.
The Proposed Model
Markovian modeling assumes that a word image is repre-
sented by a sequence of observations. These observations
should be statistically independent once the underlying
hidden state sequence is known.
Fig. 10 Segmentation of the word “ ” (Fig. 7) into graphemes: the final RSPs are marked by vertical white segments (connecting parts, characters, character portions)

Fig. 11 The chain-code extraction: a contour of the digit “ ” divided into four zones, b the 4 chain-code directions

The task of the recognition problem is to find the word w maximizing the posterior probability that w has generated an unknown observation sequence (segments) $o_1, \ldots, o_n$:

$$p(\hat{w} \mid o_1, \ldots, o_n) = \max_{w} p(w \mid o_1, \ldots, o_n). \tag{1}$$
Applying Bayes' rule to this definition, we obtain the fundamental equation of pattern recognition:

$$p(w \mid o_1, \ldots, o_n) = \frac{p(o_1, \ldots, o_n \mid w)\, p(w)}{p(o_1, \ldots, o_n)} \tag{2}$$

Since $p(o_1, \ldots, o_n)$ does not depend on w, the decoding problem becomes equivalent to maximizing the joint probability

$$p(w, o_1, \ldots, o_n) = p(o_1, \ldots, o_n \mid w)\, p(w) \tag{3}$$

where p(w) is the a priori probability of the word w. In the HMM paradigm, we can write

$$p(o_1, \ldots, o_n \mid w) = \sum_{s_1 \cdots s_n} p(o_1, \ldots, o_n \mid s_1, \ldots, s_n, w)\, p(s_1, \ldots, s_n \mid w) \tag{4}$$
In the case of handwritten script, for each model a single succession (path) largely prevails [20], and we can therefore write:

$$p(o_1, \ldots, o_n \mid w) = p(o_1, \ldots, o_n \mid s_1, \ldots, s_n, w)\, p(s_1, \ldots, s_n \mid w). \tag{5}$$

In an HMM, each sequence element is assumed to depend only on the corresponding state:

$$p(o_1, \ldots, o_n \mid s_1, \ldots, s_n, w) = \prod_{j=1}^{n} p(o_j \mid s_j, w) \tag{6}$$

Our HMM is assumed to be of first order:

$$p(s_1, \ldots, s_n \mid w) = \prod_{j=2}^{n} p(s_j \mid s_{j-1}, w) \tag{7}$$

Under these two hypotheses, we can write:

$$p(o_1, \ldots, o_n, s_1, \ldots, s_n \mid w) = \prod_{j=1}^{n} p(o_j \mid s_j, w) \prod_{j=2}^{n} p(s_j \mid s_{j-1}, w). \tag{8}$$
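To make Eq. 8 concrete, the toy sketch below evaluates the joint probability of a short state path and observation sequence from given emission and transition tables; in the actual system the emission terms come from the MLP through Eq. 13 rather than from a fixed table, and the numbers here are illustrative only.

    import numpy as np

    # Toy model: 3 states and 2 observation symbols.
    emission = np.array([[0.7, 0.3],   # p(o | s = 0)
                         [0.2, 0.8],   # p(o | s = 1)
                         [0.5, 0.5]])  # p(o | s = 2)
    transition = np.array([[0.1, 0.8, 0.1],
                           [0.0, 0.2, 0.8],
                           [0.3, 0.0, 0.7]])  # p(s_j | s_{j-1})

    def joint_probability(states, observations):
        # Eq. 8: emissions over j = 1..n times transitions over j = 2..n.
        p = emission[states[0], observations[0]]
        for prev, cur, obs in zip(states, states[1:], observations[1:]):
            p *= transition[prev, cur] * emission[cur, obs]
        return p

    print(joint_probability([0, 1, 2], [0, 1, 1]))  # 0.7*0.8*0.8*0.8*0.5 = 0.1792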
For most applications, the observations are continuous
signals. Vector quantization of these continuous signals can
degrade the performance significantly. Therefore, it is
necessary to include continuous observation densities in
the Markov models by using neural networks. A popular
approach has been to replace HMM state observation
probabilities with scaled MLP probability estimates for
each of the output units, instead of Gaussian mixtures [21].
The advantage of such a hybrid scheme over traditional
HMM recognition is the discriminative nature of the MLP
training. In classical HMM training algorithms, the models
are trained to maximize the likelihood of producing their
training examples, but no training is done to minimize
some form of probability that other examples are produced
by the model. However, the MLP automatically
incorporates discrimination. When an MLP is trained for
grapheme classification, it is explicitly demanded that one
output is maximal and the other outputs are zero. This
provides a discriminating effect.
Multilayer Perceptron and Parameter Estimation
of the Neuro-Markovian Network
The proposed MLP is a three-layer network: an input layer,
a hidden layer, and an output layer as shown in Fig. 12.
The input layer is composed of the vector x of characteristics obtained from the input image; the second layer contains the hidden neurons. The output of hidden neuron m is

$$h_m = f\left(w_{0m} + \sum_{i=1}^{I} x_i w_{im}\right) = f\left(\sum_{i=0}^{I} x_i w_{im}\right),$$

where $x_0 = 1$ is the neuron corresponding to the bias $w_{0m}$.
Finally, the output layer is sized according to the num-
ber of classes to be distinguished.
The output of the output neuron j is:
$$p_j = f\left(z_{0j} + \sum_{m=1}^{M} h_m z_{mj}\right) = f\left(\sum_{m=0}^{M} h_m z_{mj}\right),$$

where $h_0 = 1$ is the neuron corresponding to the bias $z_{0j}$.
f is the sigmoid function:

$$f(a) = \frac{1}{1 + e^{-a}} \quad \text{and} \quad f'(a) = f(a)\left(1 - f(a)\right).$$
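With these definitions, the forward pass of the network can be written compactly as in the sketch below (randomly initialized weights; the bias terms correspond to x_0 = 1 and h_0 = 1, and the layer sizes follow the 32-25-18 configuration used later in the experiments):

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def mlp_forward(x, W, Z):
        # W: (I+1) x M hidden-layer weights, row 0 holding the biases w_0m.
        # Z: (M+1) x J output-layer weights, row 0 holding the biases z_0j.
        x = np.concatenate(([1.0], x))   # x_0 = 1 for the bias
        h = sigmoid(x @ W)
        h = np.concatenate(([1.0], h))   # h_0 = 1 for the bias
        p = sigmoid(h @ Z)
        return h, p

    rng = np.random.default_rng(0)
    W = rng.normal(scale=0.1, size=(33, 25))   # 32 inputs, 25 hidden neurons
    Z = rng.normal(scale=0.1, size=(26, 18))   # 18 grapheme classes
    h, p = mlp_forward(rng.random(32), W, Z)
    print(p.shape)  # (18,)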
MLPs learn through an iterative process of adjustments
applied to their weights. The most common learning
algorithm is the standard back-propagation algorithm [22].
The algorithm uses a gradient-based search technique to
minimize the instantaneous error between target t and
actual output p for input pattern k and J output neurons:
$$E_k = \frac{1}{2} \sum_{j=1}^{J} \left(t_j^k - p_j^k\right)^2,$$

where $p_j \approx p(s_j \mid o_j)$.
Fig. 12 Multilayer perceptron with one hidden layer
The weight update is performed repeatedly over all K training patterns until the total error $E_T = \sum_{k=1}^{K} E_k$ is smaller than a predefined threshold value. The updating rule with
Rumelhart’s momentum is defined as follows [22].
Compute error terms for output neurons and hidden
neurons, respectively:
$$\delta_j^k = \left(t_j^k - p_j^k\right) p_j^k \left(1 - p_j^k\right) \tag{9}$$

$$\delta_m^k = h_m^k \left(1 - h_m^k\right) \sum_{j=1}^{J} \delta_j^k z_{mj} \tag{10}$$
Update weights zmj and wim, respectively:
$$z_{mj}(t+1) = z_{mj}(t) + \eta\, \delta_j^k h_m^k + \mu \left(z_{mj}(t) - z_{mj}(t-1)\right) \tag{11}$$

$$w_{im}(t+1) = w_{im}(t) + \eta\, \delta_m^k x_i^k + \mu \left(w_{im}(t) - w_{im}(t-1)\right) \tag{12}$$

where $\eta$ and $\mu$ are the learning rate and the momentum, respectively.
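A minimal sketch of one such update, continuing the conventions of the forward-pass sketch above (W and Z carry the biases in row 0; eta and mu are illustrative values):

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def backprop_step(x, t, W, Z, prev_dW, prev_dZ, eta=0.1, mu=0.9):
        # One update of Eqs. 9-12 for a single pattern (x, t); W and Z carry
        # the biases in row 0, as in the forward-pass sketch above.
        xb = np.concatenate(([1.0], x))
        h = np.concatenate(([1.0], sigmoid(xb @ W)))
        p = sigmoid(h @ Z)
        delta_out = (t - p) * p * (1.0 - p)                       # Eq. 9
        delta_hid = h[1:] * (1.0 - h[1:]) * (Z[1:] @ delta_out)   # Eq. 10
        dZ = eta * np.outer(h, delta_out) + mu * prev_dZ          # Eq. 11
        dW = eta * np.outer(xb, delta_hid) + mu * prev_dW         # Eq. 12
        return W + dW, Z + dZ, dW, dZ

    # Typical use inside the epoch loop:
    # W, Z, dW, dZ = backprop_step(x, t, W, Z, dW, dZ)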
At the end of an optimal training, the posterior probability $p(s_j \mid o_j)$ is computed by the multilayer perceptron [23, 24]. On the other hand, the Markovian model uses the probability $p(o_j \mid s_j)$ of observing image $o_j$ given a state $s_j$. The two terms are related by the Bayes formula:

$$p(o_j \mid s_j) = \frac{p(s_j \mid o_j)\, p(o_j)}{p(s_j)}. \tag{13}$$

For a sequence $o_1, \ldots, o_n$ and a path $s_1, \ldots, s_n$:

$$p(o_1, \ldots, o_n, s_1, \ldots, s_n \mid w) = \prod_{j=1}^{n} p(s_j \mid o_j, w) \times \prod_{j=2}^{n} p(s_j \mid s_{j-1}, w) \times \frac{\prod_{j=1}^{n} p(o_j)}{\prod_{j=1}^{n} p(s_j)}. \tag{14}$$
Since the product of the image segment probabilities $p(o_j)$ does not depend on the word hypothesis w, we can write:

$$p(o_1, \ldots, o_n, s_1, \ldots, s_n \mid w) \propto \frac{\prod_{j=1}^{n} p(s_j \mid o_j, w) \times \prod_{j=2}^{n} p(s_j \mid s_{j-1}, w)}{\prod_{j=1}^{n} p(s_j)}. \tag{15}$$
In the above formulas, the terms of the type $p(s_j \mid s_{j-1})$ are transition probabilities that can be estimated using the Baum–Welch algorithm [1], and the terms of the type $p(s_j \mid o_j)$ are well estimated by the outputs of the neural network. The prior probability $p(s_j)$ is obtained by counting the number of occurrences of each state (class) in the entire training database. The term $p(o_j)$ is factored out in the Viterbi decoding [1], since it is common to all the states for a given input grapheme.
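The decoding step can therefore be sketched as follows: the MLP posteriors are divided by the state priors as in Eqs. 13-15, and a standard Viterbi pass (in the log domain to avoid underflow) returns the best state path over the single word model. The names, the initial-state vector, and the data layout are assumptions for illustration only.

    import numpy as np

    def viterbi_scaled(posteriors, priors, log_trans, log_init):
        # posteriors: n x S matrix of MLP outputs p(s | o_j), one row per grapheme.
        # priors: length-S vector of state priors p(s) counted on the training set.
        # log_trans: S x S log transition probabilities; log_init: initial log probs.
        log_emit = np.log(posteriors + 1e-12) - np.log(priors + 1e-12)  # Eq. 13 scaling
        n, S = log_emit.shape
        score = log_init + log_emit[0]
        back = np.zeros((n, S), dtype=int)
        for j in range(1, n):
            cand = score[:, None] + log_trans      # score of every (prev, cur) pair
            back[j] = cand.argmax(axis=0)
            score = cand.max(axis=0) + log_emit[j]
        path = [int(score.argmax())]
        for j in range(n - 1, 0, -1):              # backtracking
            path.append(int(back[j, path[-1]]))
        return path[::-1], float(score.max())      # best path and its log score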
Application and Experimental Results
In Arabic script, an Arabic word is generally composed of
sub-words and diacritics. Once the diacritics are separated
and identified, the segmentation module seg-
ments the sub-words into graphemes. Assume the seg-
mentation is successful and recognition is based on
segmentation of a word into graphemes and diacritics. For
recognition, diacritics must be assigned to specific letters.
However, the position of diacritics is variable, and conse-
quently, the recognition process is made more complicated.
The problem we faced is how to introduce the type of the
diacritics and their positions in our model. In fact, each
diacritic type is associated with the nearest grapheme.
Consequently, in the word model, the observation of each
diacritic comes just after the observation of the nearest
grapheme. Besides, each separation between two succes-
sive connected components is introduced as additional
class (state) in our model, which models the separation
between two successive separated graphemes. Figure 13
illustrates the observation sequence of the word “ ”.
Only the encircled parts symbolize the observation sequence;
the remaining ones are connecting parts and must be
removed.
In our application, a path discriminant continuous HMM
was chosen as the recognition engine. Only one HMM is
used for all the word classes, and different paths in the
model distinguish one word class from the others.
The training consists in estimating transition and
observation probabilities. The state transition probabilities
(inter-graphemes transitions) are estimated by counting
over the whole training database. The transitions “ ” of
the words “ ” and “ ” present, therefore, the same
transition “ ”. State observation probabilities of
graphemes are estimated by the outputs of the neural net-
work. This network has as many outputs as possible states.
Fig. 13 Observation sequence of the word “ ” (observations o1 to o7, including a separation between sub-words)
In all, 18 classes of graphemes are used in our vocabulary of Arabic literal amounts, so the network has 18 outputs. In addition, each detected diacritic can be seen as a state whose observation probability equals 1 if the diacritic is present and 0 otherwise. In the same way, the observation probability of the separation between graphemes equals 1 if it exists and 0 otherwise. Thus, the number of states in the HMM is 24.
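As an illustration of how the 24-state observation vector of one sequence item can be assembled (18 MLP outputs for the grapheme states, 5 diacritic states, and one separation state; the item encoding and the helper name are hypothetical):

    import numpy as np

    N_GRAPHEME, N_DIACRITIC = 18, 5          # 18 + 5 + 1 separation = 24 states
    N_STATES = N_GRAPHEME + N_DIACRITIC + 1

    def observation_row(item, mlp_posteriors=None):
        # item is one element of the right-to-left sequence, encoded here as
        # ("grapheme",), ("diacritic", type_index) or ("separation",).
        row = np.zeros(N_STATES)
        if item[0] == "grapheme":
            row[:N_GRAPHEME] = mlp_posteriors    # MLP outputs p(s | o_j)
        elif item[0] == "diacritic":
            row[N_GRAPHEME + item[1]] = 1.0      # probability 1 for its type, 0 elsewhere
        else:                                    # separation between sub-words
            row[N_STATES - 1] = 1.0
        return row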
During the word recognition process, each word is segmented into a set of graphemes. These resulting graphemes
are arranged from right to left according to their appear-
ance in the word. The rightmost grapheme corresponds to
the first one in the word, and so on. Then, the word is
described by an ordered list of graphemes and their asso-
ciated diacritics. After identifying grapheme classes for
each handwritten word image using MLP classifier, we
match the grapheme and diacritic sequence against the
candidate vocabulary words. A word image is counted as
correctly classified if all graphemes and diacritics com-
posing it are correctly classified. If this matching fails to recognize one of the candidate words, the neuro-
markovian system is then carried out. To recognize a word,
the HMM computes the likelihood of words by summing
the probabilities over all possible paths through the word
model. Next, we use the Viterbi algorithm in order to find
the optimal path representing the recognized word.
A database consisting of 7,200 images of 48 different
words is used for developing handwritten Arabic literal
amount recognition. The 48 words of the vocabulary were
written three times by 50 writers. Table 2 illustrates the 48
different classes of words, and Fig. 14 shows some word
samples extracted from the used database.
For the segmentation algorithm, the experimental results
show that the algorithm achieved about 94% correct seg-
mentation. However, exceptional cases lead to over- or
under-segmentations, so we need more rules to obtain more
precise segmentation results. Figure 15a illustrates some confusions due to over-segmentation caused by the existence of redundant parts such as spurious branches and small loops, where the rules used cannot correctly segment these sub-words. Here, only the PSPs p1 and p2 must be retained.
Furthermore, certain character combinations form new
ligature shapes where one letter is above another, see
Fig. 15b. Consequently, the real segmentation points are
not detected, which leads to under-segmentation. Besides, the strokes (graphemes) of some characters like “ ” are omitted in some handwriting styles and written as a horizontal segment, as depicted in Fig. 15c; consequently,
we should add more suitable rules to remove such
confusions.
For word recognition, the database was divided into
three parts, one part for training and the remaining two
parts for test. To train the MLP classifier, we used only
well-segmented graphemes. We have implemented an
MLP classifier. Then, we have studied the influence of the
number of hidden units M on the classifier performance. The best performance of the MLP classifier (94.60%) is obtained with 25 hidden neurons.
The most frequent confusions generated by MLP are
related to specific configurations of character-pairs, for
example “ ”-“ ”, “ ”-“ ”, “ ”-“ ”, “ ”-“ ”, “ ”-“ ”,
“ ”-“ ”, “ ”-“ ”, “ ”-“ ”, “ ”-“ ”, and “ ”-“ ” . In such
cases, many characters were clustered near each other in
the feature space, leaving no chance to recognize them
uniquely. After analyzing the system errors, we observed
that many of the misrecognitions occurred between visu-
ally similar character-pairs as might be expected.
Among the confusions of the neural system removed by the neuro-markovian system is the case of the word “ ”, where the MLP does not recognize the first character “ ” but successfully recognizes the other graphemes. However, the neuro-markovian system suggests the word “ ” in the first position, with an observation sequence probability of 1.654 exp(−15).
Another ambiguity, between the letters “ ” and “ ”, caused by the MLP is removed by the neuro-markovian system. The neural network suggests the letter “ ” with a posterior probability of 0.9903 as its first suggestion instead of “ ” of the word “ ”, which has a posterior probability of 0.1006. However, the neuro-markovian system correctly recognizes the word “ ” in the first position, with an observation sequence probability of 8.2742 exp(−8).

Fig. 14 Word samples from the used database

Fig. 15 Incorrect segmentation types: a over-segmentation (redundant loops, redundant branch): p1 and p2 are the RSPs and all other vertical segments must be removed, b under-segmentation (missed RSP), c the strokes of “ ” are omitted
To demonstrate the effectiveness of the HMM–MLP
technique applied, a comparison with the MLP classifier has been made. The MLP is a state-of-the-art classification technique, and it has been successfully applied to real-world problems such as character recognition. Moreover, the
literature has shown better results on digit recognition
using MLP [25–27].
One way to observe the contribution of the contextual
processing is to measure recognition rates at the grapheme
level (by the neural network) and at the word level (by the
neuro-markovian system), then we make the comparison.
Table 3 reports the improvements on word recognition
using the hybrid system. This table presents the results on
grapheme recognition using MLP and word recognition
using HMM–MLP network. We note that the hybrid
system brings an increase in the recognition rate by about
3%.
Although the hybrid system provided a remarkable improvement in terms of recognition rate at the
word level, it did not successfully recognize some words,
but it was able to correct some graphemes composing
them.
For example, for the words “ ” and “ ”, which are composed of five graphemes, the MLP recognizes only the first and the fourth graphemes “ , ” of the first word and the first and the second graphemes “ , ” of the second word. However, the HMM–MLP correctly recognizes
the second and the third graphemes for the first word and
the third and the fourth graphemes for the second, while the
end grapheme “ ” is recognized as “ ” for both of them.
Consequently, we achieve an improvement at the grapheme
level, which reduces the number of misrecognized graphemes composing the word, so that a simple Arabic spell-check applied to this
reduced lexicon can recognize the entire word.
Comparison with published methods is delicate due to
the use of different databases, different numbers of training
and testing samples, and also different features and rec-
ognition algorithms. For example, Farah et al. [15] pre-
sented an approach using a similar database containing
4,800 words for recognizing handwritten Arabic literal
amounts, and they claimed a recognition rate of 96%.
Menasri et al. [13] and Benouareth et al. [16] evaluated
their approaches on the IFN/ENIT benchmark database,
and the achieved results were 87.4 and 90.20%,
respectively.
Compared to these results, we have attained higher
levels of recognition accuracy evaluated on a database of
7,200 words. Moreover, the main strength of this approach lies in lexicon reduction using a sophisticated segmentation algorithm and the word modeling aspect. This approach,
to the best of our knowledge, has not been applied, as it is
presented in this paper for Arabic word segmentation and
recognition in restricted lexicons.
As we know, for HMM-based word recognition, there
are two main approaches: the first relies on an implicit
segmentation [8, 9], where the handwriting data are sam-
pled into a sequence of tiny frames (overlapped or not).
The second uses a more sophisticated explicit segmentation
technique [12, 13] to cut the words into more meaningful
units or graphemes, which are larger than the frames. Our
approach belongs to the second one. As described in [13],
we believe that explicit grapheme segmentation is well
adapted for Arabic writing. One of the reasons is that some
letters such as or have tails that go almost horizontally
under the baseline. If the tail is long, which often happens
in Arabic handwriting, the next letter of the word is likely
to be vertically overlapping the previous tail. Building a
sequence of graphemes intrinsically solves this problem,
while, on the other hand, vertical frames or sliding win-
dows will be forced to process a piece of image that con-
tains parts of two different letters at the same time. Besides,
diacritical marks are often not at the exact position on top
or under the main part of the letter. Consequently, the
vertical slicing approach will eventually split letters from
their diacritics, and thus reduces the character recognition
accuracy. However, our model incorporates these varia-
tions of diacritics by assigning them to their nearest
graphemes.
Compared to the work of Menasri et al. [13], their approach introduced a new shape-based alphabet, called the letter-body alphabet, in order to reduce the lexicon size. However, our alphabet is further reduced, as some characters like have been segmented into three portions (graphemes). Each portion has the same class as the main shape of the character . Besides, their recognizer builds a letter-body sequence by a concatenation of letter-body HMMs (each HMM describes one class of shape). However, in our method, each grapheme corresponds to only one HMM state, which considerably reduces the number of parameters
of our HMM. Furthermore, in our path discriminant
approach, only one HMM is used for all the word classes.
Thus, recognition can be performed more efficiently by matching
with all word classes simultaneously than by matching one
by one.
Table 3 Recognition rate and error rate over the whole database

                            Recognition rate (%)   Error rate (%)
MLP classifier                     94.60                5.40
HMM–MLP hybrid system              97.20                2.80
Conclusion
In this paper, we have presented a new system based on
explicit segmentation for recognizing handwritten Arabic
words. We used only one word HMM to model explicitly
segmented words, leading to a better discrimination
between them. The posterior probabilities are computed by
the multilayer perceptron, the transition probabilities can
be estimated using the Baum–Welch algorithm, and the
prior probabilities are obtained by computing the number
of occurrences of each state (class) in the entire training
database. First, the MLP is used as a classifier. If the MLP
cannot succeed to recognize one of candidate words, the
MLP outputs are used as observation (posterior) probabil-
ities of graphemes and the neuro-markovian system is then
carried out. Finally, the Viterbi algorithm is used in order
to find the optimal path representing the recognized word.
This system deals with the loss in terms of recognition
performance brought by the grapheme recognition module
and aims at improving the word recognition results and
reliability of the system.
To summarize, the main benefits of this work are:
1. Since classification in a small alphabet is both more efficient and more accurate than in a large alphabet, we have introduced a new grapheme-based alphabet for handwritten Arabic word recognition in which the number of graphemes to be recognized is smaller than the number of characters in the alphabet of Arabic literal amounts.
2. Using the MLP as a labeller (classifier) and probability estimator, we have only as many possible outputs as there are graphemes in the database; in our vocabulary, 18 grapheme classes are distinguished. Moreover, the MLP has only 32 input parameters. This implies that it also has fewer weights and can therefore be trained much faster. Consequently, fewer HMM parameters are needed, which is modest compared with the parameter sizes of the other approaches in the literature. In addition, most words are recognized efficiently (94.60%) using the MLP as a classifier without introducing the HMM; thus, this approach has a low computational cost and modest memory requirements.
3. The main argument for using the MLP to obtain output
probabilities is that it is trained discriminatively and that no assumption is made about the observation distribution, as opposed to mixtures of Gaussians.
Finally, the most interesting contribution in this work is
the segmentation algorithm. The advantage of this algo-
rithm is that it is easier to find a set of potential segmen-
tation points because it analyzes the structural shape of
characters as they have been scanned without any
transformation (projection, thinning) and it is baseline
independent, which makes it more robust than existing algorithms, especially for handwritten scripts.
The main drawback of this algorithm is over- and under-
segmentation, but we can remedy it by using more
appropriate rules at the segmentation level or by intro-
ducing insert and delete states at the hybrid model level.
For future study, we first plan to extend and adapt our approach to applications with a large lexicon, such as the IFN/ENIT database. We also plan to use some structural features, such as the position of the grapheme within the word (computed by the segmentation process), to efficiently remove some confusions. For example, we can remove the ambiguity between the letters “ ” and “ ” caused by the MLP above, since the letter “ ” is located in the middle of the sub-word whereas the letter “ ” is an isolated one. Secondly, we would like to extend the segmentation algorithm to segment Arabic text into characters, where the word is first segmented into graphemes, which are then recombined to form characters; we expect to achieve good performance thanks to the simplicity and efficiency of the approach.
References
1. Rabiner LR. A tutorial on hidden Markov models and selected
applications in speech recognition. Proc IEEE. 1989;77(2):257–86.
2. Chen MY, Kundu A, Srihari SN. Variable duration hidden
Markov and morphological segmentation for handwritten word
recognition. IEEE Trans Image Process. 1995;4(12):1675–88.
3. Senior AW, Robinson AJ. An off-line cursive handwriting rec-
ognition system. IEEE Trans Pattern Anal Mach Intell. 1998;
20(3):309–21.
4. Altuwaijri M, Bayoumi M. Arabic text recognition using neural
networks. In: Proceedings of international symposium on circuits
and systems—ISCAS’94; 1994. p. 415–8.
5. Amin A, Al-Sadoun H. Handprinted Arabic character recognition
system using an artificial neural network. Pattern Recognit.
1996;29:663–75.
6. Morgan N, Bourlard H. Continuous speech recognition using
multilayer perceptrons with hidden Markov models. In: Pro-
ceedings of ICASSP-90; 1990. p. 413–6.
7. Lorigo LM, Govindaraju V. Offline Arabic handwriting recogni-
tion: a survey. IEEE Trans Pattern Anal Mach Intell. 2006;
28(5):712–24.
8. Makhoul J, Schwartz R, Lapre C, Bazzi I. A script independent
methodology for optical character recognition. Pattern Recognit.
1998;31(9):1285–94.
9. Dehghan M, Faez K, Ahmadi M, Shridhar M. Handwritten Farsi
(Arabic) word recognition: a holistic approach using discrete
HMM. Pattern Recognit. 2001;34:1057–65.
10. Khorsheed MS. Recognising handwritten Arabic manuscripts
using a single hidden Markov model. Pattern Recognit Lett.
2003;24(14):2235–42.
11. Amin A, Mari JF. Machine recognition and correction of printed
Arabic text. IEEE Trans Man Cybern. 1989;9:1300–6.
12. Miled H, Olivier C, Cheriet M, Lecourtier Y. Coupling obser-
vations/letters for a markovian modeling applied to the
recognition of the Arabic handwriting. In: Proceedings of 4th
IAPR international conference on document analysis and recog-
nition, ICDAR’97, Ulm, Germany; 1997. p. 580–3.
13. Menasri F, Vincent N, Augustin E, Cheriet M. Shape-based
alphabet for off-line Arabic handwriting recognition. In: Pro-
ceedings of the 9th international conference on document anal-
ysis and recognition ICDAR, Curitiba, Brazil; 2007. p. 969–73.
14. Boukharouba A, Bennia A. Recognition of Handwritten Arabic
words using a neuro-fuzzy network. In: Proceedings of AIP 1st
mediterranean conference on intelligent systems and automation,
Annaba; 2008. p. 254–9.
15. Farah N, Souici L, Sellami M. Classifiers combination and syntax
analysis for Arabic literal amount recognition. Eng Appl Artif
Intell. 2006;19:29–39.
16. Benouareth A, Ennaji A, Sellami M. Semi-continuous HMMs
with explicit state duration for unconstrained Arabic word mod-
eling and recognition. Pattern Recognit Lett. 2008;29:1742–52.
17. Morita M, Sabourin R, Bortolozzi F, Suen CY. Segmentation and
recognition of handwritten dates: an HMM-MLP hybrid
approach. Int J Doc Anal Recognit. 2004;6:248–62.
18. Al-Yousefi H, Udpa SS. Recognition of Arabic characters. IEEE
Trans Pattern Anal Mach Intell. 1992;14:853–7.
19. El-Yacoubi A, Gilloux M, Sabourin R, Suen CY. An HMM-based
approach for off-line unconstrained handwritten word modeling
and recognition. IEEE Trans Pattern Anal Mach Intell. 1999;
21(8):752–60.
20. Lethelier E, Leroux M, Gilloux M. Traitement des montants
numeriques des cheques postaux, approche d’une methode de
segmentation basee sur la reconnaissance. Actes de CNED 94
(3eme Colloque National sur l’Ecrit et le Document), Rouen,
France; 1994. p. 315–23.
21. Naik JM, Lubensky DM. A hybrid HMM-MLP speaker verifi-
cation algorithm for telephone speech. In: Proceedings of IEEE
international conference on acoustics, speech, and signal pro-
cessing (ICASSP); 1994. p. 153–6.
22. Looney CG. Advances in feedforward neural networks: demys-
tifying knowledge acquiring black boxes. IEEE Trans Knowl
Data Eng. 1996;8(2):211–26.
23. Bourlard H, Wellekens CJ. Links between Markov models and
multilayer perceptrons. In: Touretzky D, editor. Proceedings of the IEEE conference on neural information processing systems, Denver, CO. Morgan Kaufmann; 1989. p. 502–10.
24. Hampshire JB, Pearlmutter H. Equivalence proofs for multi-layer
perceptron classifiers and the Bayesian discriminant function. In:
Connectionist models: proceedings of the summer school. Morgan Kaufmann; 1990. p. 159–72.
25. Ha TM, Bunke H. Off-line handwritten numeral recognition by
perturbation method. IEEE Trans Pattern Anal Mach Intell.
1997;19(5):535–9.
26. Oliveira LS, Sabourin R, Bortolozzi F, Suen CY. Automatic
recognition of handwritten numerical strings: a recognition and
verification strategy. IEEE Trans Pattern Anal Mach Intell.
2002;24(11):1438–54.
27. Liu J, Gader P. Neural networks with enhanced outlier rejection
ability for off-line handwritten word recognition. Pattern Rec-
ognit. 2002;35:2061–71.