Recognition of Handwritten Arabic Literal Amounts Using a Hybrid Approach
Abdelhak Boukharouba • Abdelhak Bennia
Received: 9 February 2010 / Accepted: 2 December 2010 / Published online: 24 December 2010
© Springer Science+Business Media, LLC 2010
Abstract This paper describes a new approach to com-
bine a multilayer perceptron (MLP) and a hidden Markov
model for recognizing handwritten Arabic words. As a first
step, connected components (CCs) of black pixels are
detected, then the system determines which CCs are sub-
words and which are diacritics. The diacritics are then
isolated and identified separately, and the sub-words are
segmented into graphemes. The MLP is used as a labeller
(classifier) and probability estimator. We also introduce the
diacritics and their positions in our hybrid system; thus,
only one model including both grapheme and diacritic
states is built to represent the whole alphabet. Finally, we
consider a maximum likelihood classifier to decide about
the word class. The experiments that were performed show
promising results on Arabic word segmentation and
recognition.
Keywords Arabic word modeling · Segmentation · Feature extraction · Multilayer perceptron · Continuous hidden Markov models · Viterbi algorithm
Introduction
Hidden Markov models (HMMs) [1], which are widely
used in speech recognition techniques, have been suc-
cessfully used for the recognition of handwritten words [2,
3]. In the HMM word recognition paradigm, there are two
principal methods: the model discriminant method and the
path discriminant method [2]. In the model discriminant
approach, a separate HMM is used for each word class.
However, in the path discriminant approach, which is
analogous to the character based method, only one HMM is
used for all the word classes and different paths in the
model distinguish one word class from the others. Multi-
layer perceptrons have been applied to character recogni-
tion problems [4, 5], and a number of studies have shown
that incorporating the discriminating capabilities and
classification power of MLPs with the statistical modeling
of HMMs results in a system that is better than either MLPs
or HMMs [6]. Among the current efforts in incorporating
MLPs into HMM schemes, a popular approach has been to
replace HMM state observation probabilities with scaled
MLP probability estimates for each of the output neurons.
MLPs compute posterior (Bayesian) probabilities that are
then scaled by prior probabilities for each of the states
(classes) and incorporated into the Viterbi decoding
scheme as state observation probabilities.
The major challenge in Arabic writing recognition
systems comes from the cursive nature of the data [7]. A
leading paradigm for Arabic script recognition is the use of
implicit segmentation. Often this refers to the use of an
HMM for recognition and vertical image frames for fea-
tures, with each character represented by several states in
the model. For example, Makhoul et al. [8] proposed a
system to recognize typewritten Arabic script. The system
depends on the estimation of character models, a lexicon,
and grammar from training samples. The training phase
extracts statistical features from overlapping vertical win-
dows with the corresponding ground truth to estimate the
character model parameters. The recognition phase feeds
the feature vector of an input pattern to the HMM to find
the character sequence with the highest likelihood.
Dehghan et al. [9] presented a holistic system for the
recognition of handwritten Farsi/Arabic words using right-to-left
discrete hidden Markov models. The histogram of chain-
code directions of the image strips is used as features vector.
The Kohonen self-organizing feature map is used for con-
structing the codebook and also smoothing the observation
probability distribution. Khorsheed [10] implemented a
universal HMM that is composed of smaller interconnected
character models. Each character model is a left-to-right
HMM, and it represents one letter from the alphabet.
Amin and Mari [11] presented a method for automatic
recognition of multi-font Arabic text. The system is based
on segmentation of words into characters and identification
of characters. Finally, the word recognition is based on the
tree representation lexicon and the Viterbi algorithm.
Miled et al. [12] adopted an HMM model for Arabic words
recognition using an explicit segmentation of words into
graphemes. They used a K-NN classifier to assign to each
grapheme an observation. A maximum likelihood (ML)
classifier was considered to decide about the word class.
Menasri et al. [13] described an off-line handwritten
Arabic words recognition system based on explicit graph-
eme segmentation and hybrid HMM/NN recognition
scheme. Their approach introduces a new shape-based
alphabet for handwritten Arabic recognition, which is
intended to benefit from some specificities of Arabic
writing. Boukharouba and Bennia [14] presented a neuro-
fuzzy hybrid network for the recognition of handwritten
Arabic words. Fuzzy rules are extracted from training
examples by a hybrid learning scheme.
Farah et al. [15] presented an approach using classifiers
combination and syntax analysis for recognizing hand-
written Arabic literal amounts. Benouareth et al. [16]
described an off-line unconstrained handwritten Arabic
word recognition system based on a segmentation-free
approach and semi-continuous hidden Markov models
(SCHMMs) with explicit state duration.
This paper deals with the recognition of handwritten
Arabic literal amounts. An explicit segmentation with a
path discriminant continuous HMM is chosen as the rec-
ognition system. The development of a hybrid system using
HMM and MLP is not a new concept for word and
grapheme recognition, respectively [17]. However, the
main contribution of this work focuses on lexicon reduc-
tion using a sophisticated segmentation algorithm and word
modeling aspect. First, a new efficient segmentation algo-
rithm is presented. This algorithm incorporates an
evaluation function based on a running count of horizontal
white-black transitions in conjunction with a set of heu-
ristics to determine the decisive segmentation points.
Secondly, we introduce a shape-based alphabet that is
intended to reduce the redundancy in the shapes of Arabic
letters. Thirdly, this reduced alphabet is used to construct a
new model to incorporate more context by assigning the
diacritics to their nearest graphemes. Thus, only one model
that includes both grapheme classes and diacritics classes is
built to represent the whole alphabet. This allows the HMM
to easily handle stochastic variations of diacritics as well as
errors from the MLP grapheme recognizer.
This paper is organized as follows. ‘‘Arabic Handwriting
Characteristics’’ section describes the Arabic handwriting
characteristics. ‘‘Architecture and Description of Proposed
Recognition System’’ section presents an overall descrip-
tion of the proposed recognition system. ‘‘Segmentation of
Words’’ section details the pre-processing and segmenta-
tion steps. ‘‘Feature Extraction’’ section presents the fea-
ture extraction method. ‘‘Statistical Modeling of
Handwritten Words’’ section gives the justifications behind
the design of the model we proposed and details the steps
of learning and recognition of the neuro-markovian net-
work. ‘‘Application and Experimental Results’’ section
deals with the application of our model to handwritten
Arabic amounts and presents the experiments performed to
validate the approach. Finally, we present some concluding
remarks and perspectives.
Arabic Handwriting Characteristics
Since the characteristics of Arabic handwriting are differ-
ent from the Latin ones and some of the readers may be
unfamiliar with Arabic script, a brief description of the
important aspects of Arabic script will be presented. Arabic
text is inherently cursive both in handwritten and printed
forms and is written horizontally from right to left. The
alphabet contains 28 different characters. Different Arabic
characters may have exactly the same shape and are dis-
tinguished from each other only by the addition of one out
of five diacritics shown in Fig. 1a. These are normally one,
two, or three dots, ‘‘hamza’’ or ‘‘medda’’. However, in
handwriting, two dots can be written as two connected dots
(straight segment). For three dots, they can be drawn as two
connected dots and a single dot, triangle or an open curve.
Figure 1b shows some variations in handwritten dots. A
diacritic may be above, below, or even inside the charac-
ter’s main shape. For example, the three different charac-
ters ( ) have the same main shape but different
diacritics. Ambiguous writing of these diacritics sometimes
causes a word image to be read in various forms with
completely different meanings.
In contrast to Latin, Arabic characters are not divided
into upper and lower case categories. Instead, an Arabic
character might have several shapes depending on its rel-
ative position in a word (beginning, middle, end, or alone),
for example ( ). Table 1 shows a complete set of
Arabic characters in all their forms depending on their
position in the word. There are six characters that cannot
be connected to the left. These characters will be called last
characters of sub-word ( ).
Arabic writing is cursive, and words are separated by
spaces. However, a word can be divided into smaller units
called sub-words (a portion of a word including one or
more connected characters).
The vocabulary of Arabic amounts is larger than that
found in Latin languages, see Table 2 [15]. This is due to
three major factors. First, Arabic has three different forms:
singular, double, and plural as shown in Fig. 2. Secondly,
double and plural nouns have up to four different forms
according to their grammatical positions as shown in
Fig. 3. Thirdly, most numbers define two forms for femi-
nine and masculine countable things as shown in Fig. 4.
Architecture and Description of Proposed
Recognition System
Figure 5 shows a block diagram of the proposed recogni-
tion system for handwritten Arabic literal amounts. The
technique can be summarized as follows. The input image
is first smoothed and binarized. The connected components
(CCs) of black pixels are detected, then the system
Fig. 1 Diacritics: a different diacritic shapes (dot, two dots, three dots, hamza, medda), b some variations in handwritten dots
Table 1 Arabic alphabet in all its forms (end form EF, middle form
MF, beginning form BF, and isolated form IF)
Table 2 Arabic literal amounts vocabulary
Fig. 2 Singular, double, and plural forms for the word ‘‘thousand’’
Fig. 3 Four grammatical forms of the word ‘‘two-thousand’’
Fig. 4 Feminine and masculine forms of the word ‘‘three’’
determines which CCs are sub-words and which are dia-
critics. The diacritics are then isolated and identified sepa-
rately, and the sub-words are segmented into graphemes.
Different feature sets are extracted from each grapheme.
First, the MLP classifier is used as a labeller. After identi-
fying grapheme classes for input word image, the grapheme
labels (classes) are arranged from right to left according to
their appearance in the word. Then, we match the grapheme
and diacritic sequence against the candidate vocabulary
words. A word image is counted as correctly classified if all
graphemes and diacritics composing it are correctly classi-
fied. If this matching fails to recognize one of the
candidate words, the MLP outputs are used as observation
(posterior) probabilities of graphemes and the neuro-mar-
kovian system is then carried out. Next, we use the Viterbi
algorithm in order to find the optimal path representing the
recognized word, which is an ordered list of graphemes and
diacritics associated with the sequence of observations. Note
that matching means verifying if the input word exists in the
lexicon or not. Therefore, a lexicon is required in order to
accept only legal words.
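As an illustration of this matching step, the following minimal Python sketch (with hypothetical labels and helper names, not the authors' implementation) accepts a word only when its full ordered label sequence matches a lexicon entry exactly:

    # A minimal sketch of the matching step: the ordered grapheme/diacritic
    # label sequence of the input word is accepted only if it matches one
    # lexicon entry exactly; otherwise the neuro-markovian stage is carried out.
    def match_against_lexicon(label_sequence, lexicon):
        # lexicon maps label tuples (in right-to-left order) to word identities.
        return lexicon.get(tuple(label_sequence))  # None means "no match"

    lexicon = {
        ("g07", "dot_above", "g03", "g11"): "khamsoun",  # hypothetical entries
        ("g07", "g03", "g11"): "sabaa",
    }
    print(match_against_lexicon(["g07", "dot_above", "g03", "g11"], lexicon))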
Segmentation of Words
The word images are acquired through a digital scanner
and stored into a file. The words must be binarized and
smoothed before the segmentation. Here, the segmentation
is performed in two levels: word and sub-word segmenta-
tion, and character segmentation. The segmentation pro-
cedure of lines into words and sub-words consists of
identification and classification of connected components.
After a detection of connected components in the input text
line, these components are classified in two classes: sub-
words including isolated characters and diacritics including
dots, see Fig. 6.
Most existing segmentation methods classify the
connected components using information on their sizes and
positions. Any connected component whose size is less
than a threshold is regarded as a diacritical mark. The
algorithm used to determine the diacritics is presented in
[18]. To improve segmentation efficiency, we opted to
remove diacritics such as dots from the characters. Their original
position and number are stored and reintroduced only in the
Fig. 5 Block diagram of the proposed recognition system: dashed and solid arrows represent control and data flows, respectively. The main blocks are preprocessing, connected component analysis and segmentation, feature extraction, the MLP used as a labeller (classifier), matching of the grapheme and diacritic labels against the lexicon, and, if matching fails, continuous HMM decision (recognition) via the Viterbi algorithm using the observation probabilities of graphemes and diacritics and the transition probabilities estimated from training samples
Fig. 6 Different types of connected components forming the word ‘‘ninety’’: sub-word, isolated character, and dots
recognition phase. Only sub-words are considered by the
next segmentation phase.
The segmentation is carried out with the help of an
evaluation function based on a running count of horizontal
white-black transitions of the sub-word. We scan the bi-
narized image vertically column by column and locate each
white-black transition at the black pixel by retaining its
gray level as 0 and replacing the gray levels of the other
black pixels with 1. Figure 7 illustrates the steps for
extracting the primary segmentation points (PSPs). Fig-
ure 7b shows the result by locating the white-black tran-
sition vertically on Fig. 7a. The number of vertical white-
black transitions at every point in the sub-word is
calculated.
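The per-column transition count described above can be computed directly from the binary image. The sketch below is an illustration under the assumption that the sub-word is stored as a 0/1 array with 1 for black pixels; columns whose count changes from or to 1 are then candidates for the primary segmentation points of Step 1 below.

    import numpy as np

    def column_transition_counts(img):
        # img: binarized sub-word as a 2-D array, 1 = black, 0 = white.
        # A white-black transition is counted where a pixel is black and the
        # pixel just above it (or the image border) is white.
        padded = np.vstack([np.zeros((1, img.shape[1]), dtype=img.dtype), img])
        transitions = (padded[1:] == 1) & (padded[:-1] == 0)
        return transitions.sum(axis=0)  # one count per column

    sub_word = np.array([[0, 1, 1, 0, 0],
                         [0, 1, 0, 1, 0],
                         [1, 1, 1, 1, 0]])
    print(column_transition_counts(sub_word))  # -> [1 1 2 1 0]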
The steps of our algorithm are as follows:
Step 1: Scan the sub-word from right to left, and find the
points at which:
1. The white-black transition number changes from or to
1.
2. Upper contour of the sub-word parts, which have only
one white-black transition, changes quickly from high
to low or from low to high.
This procedure gives the primary segmentation points
(PSPs), which are represented by vertical white segments.
Thus, the sub-words are segmented into different parts; see
Fig. 7c.
Step 2: Use some rules to check whether the primary
segmentation points are real segmentation points (RSPs).
The rules are as follows:
Rule 1: The parts with short lengths are due to noise
(spurious pixels) and must not be considered. Each
redundant part introduces two primary segmentation
points, which must be removed; see Fig. 8. Notice that p1
occurs at the beginning/end of character and, p2 and p3
occur inside a character. Consequently, p2 and p3 must be
removed, and we retain only p1.
Rule 2: Remove the two last PSPs if the part between
them is of one black-white transition, see Fig. 9a.
Rule 3: Remove the last PSP in the last character, like
“ ”, “ ”, and “ ” as in Fig. 9b. In cases where the last
part is a long-vertical segment or a loop, the breakpoint is
retained as in Fig. 9c.
These rules are then applied to each PSP in order to
validate the real segmentation points (RSPs); the latter are
used to segment the input image. The connecting part, which is a useless horizontal segment between two successive RSPs, should be removed before recognition because it causes recognition errors. Figure 10 shows the
segmentation of the word “ ” depicted in Fig. 7 into
graphemes; the final RSPs are marked by vertical white
segments.
The segmentation algorithm is baseline independent, which makes it more robust than existing algorithms, especially for handwritten scripts.
In this explicit segmentation, words are segmented into
graphemes, which are then recognized individually. A
grapheme may be an entire character or portion of
Fig. 7 The main steps for extracting PSPs: a the main body of the word five ‘‘ ’’, b result of locating the white-black transitions vertically on (a), c primary segmentation points (PSPs)
Fig. 8 p2 and p3 are removed and only p1 is retained
Fig. 9 Filtering the redundant PSPs: (a, b) PSPs are removed to avoid splitting characters, c PSPs are retained
character. For example, the character is segmented into
three portions as shown in Fig. 10, and each portion of the
character has the same class as the main shape of
character “ ”. As a result, the number of graphemes to be
recognized is smaller than the number of characters in the
alphabet of Arabic literal amounts. Consequently, in this
way, we have introduced a new grapheme-based alphabet
for handwritten Arabic word recognition, which allows us
to benefit from some inherent properties of Arabic writing.
Feature Extraction
In our vocabulary of Arabic amounts, the alphabet can be divided into 18 classes; each class contains graphemes that share a main shape and are distinguished only by the number of dots and their positions. After the segmen-
tation into graphemes, the word image is represented as a
sequence of observations (graphemes). Each grapheme is
either only one character or a portion of a character.
At this level, each grapheme is described by different
sets of features:
The first feature set is the chain-code histogram (CCH),
which is a statistical measure for the directionality of the
contour of a character. Four directions (0°, 45°, 90°, 135°) are considered; thus, the directions between two successive pixels are encoded as 0, 1, 2, and 3 direction codes, respectively, as shown in Fig. 11. To extract features from contour directions, the bounding box of each digit is divided into four zones, and within each zone the 4-bin histogram of chain codes is computed, where each bin represents the frequency of the respective direction. Therefore, the contour feature vector is composed of 4 × 4 = 16 components f_i, i = 1, 2, …, 16, normalized between 0 and 1, where f_i is the ith directional feature. In addition, the density of the contour pixels of each zone is calculated.
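One possible implementation of this zoned chain-code histogram is sketched below (a sketch, not the authors' code). It assumes the contour is already available as an ordered list of (row, column) points, that opposite moves are folded onto the same one of the four orientations of Fig. 11, and that a global normalization is used, since the exact normalization is not specified.

    import numpy as np

    # Opposite moves between successive contour pixels are folded onto the same
    # one of the four orientations (0, 45, 90, 135 degrees) of Fig. 11.
    ORIENTATION = {(0, 1): 0, (0, -1): 0,
                   (-1, 1): 1, (1, -1): 1,
                   (-1, 0): 2, (1, 0): 2,
                   (-1, -1): 3, (1, 1): 3}

    def chain_code_features(contour, top, left, height, width):
        # contour: ordered list of (row, col) points of the grapheme contour;
        # (top, left, height, width): its bounding box.
        hist = np.zeros((2, 2, 4))    # 2 x 2 zones, 4 direction bins per zone
        density = np.zeros((2, 2))    # contour-pixel density per zone
        for (r0, c0), (r1, c1) in zip(contour, contour[1:]):
            zr = min(1, 2 * (r0 - top) // max(height, 1))
            zc = min(1, 2 * (c0 - left) // max(width, 1))
            d = ORIENTATION.get((r1 - r0, c1 - c0))
            if d is not None:
                hist[zr, zc, d] += 1
            density[zr, zc] += 1
        hist /= max(hist.sum(), 1.0)          # 16 values normalized to [0, 1]
        density /= max(len(contour), 1)
        return np.concatenate([hist.ravel(), density.ravel()])  # 20 components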
The second set is based on the white-black transition
information in the vertical and horizontal directions of a
grapheme image:
$$f_{hk} = \frac{l_{hk}}{w}, \qquad f_{vk} = \frac{l_{vk}}{h} \qquad \text{for } k = 1, 2$$

$$f_{h3} = \frac{\sum_{k \geq 3} l_{hk}}{w}, \qquad f_{v3} = \frac{\sum_{k \geq 3} l_{vk}}{h}$$

where $l_{hk}$ and $l_{vk}$ are the lengths of the parts that have k transitions in the horizontal and vertical directions, respectively, and w and h denote the width and the height of the grapheme.
The third set calculates the profiles of the grapheme
from the upper, lower, left, and right boundaries of the
image. The profile area is computed as the number of
pixels between the edges of the image (bounding box) and
the contour of the grapheme. Each profile feature is cal-
culated as the ratio between the area of each profile and the
grapheme’s area. Finally, relative size features are calcu-
lated with respect to pre-fixed size (w0, h0): ratio1 = w/w0
and ratio2 = h/h0 where w and h are the width and the
height of the grapheme, respectively.
As a result, we obtain a feature vector of 32 components
per grapheme.
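A sketch of the profile-area and relative-size features (the third set) may help fix the idea; the reference size (w0, h0) and the use of the black-pixel count as the grapheme's area are assumptions, not the authors' exact choices. Together with the 20 contour features and the 6 transition features, this would be consistent with the 32-component vector mentioned above.

    import numpy as np

    def profile_and_size_features(img, w0=64, h0=64):
        # img: binarized grapheme, 1 = black, 0 = white; (w0, h0) is the
        # pre-fixed reference size (64 x 64 is a hypothetical value).
        h, w = img.shape
        area = max(int(img.sum()), 1)  # grapheme area, here the black-pixel count
        cols = [c for c in range(w) if img[:, c].any()]
        rows = [r for r in range(h) if img[r, :].any()]
        # Profile area = background pixels between a bounding-box edge and the
        # first black pixel reached from that edge, summed over scan lines.
        upper = sum(int(np.argmax(img[:, c])) for c in cols)
        lower = sum(int(np.argmax(img[::-1, c])) for c in cols)
        left = sum(int(np.argmax(img[r, :])) for r in rows)
        right = sum(int(np.argmax(img[r, ::-1])) for r in rows)
        profiles = [p / area for p in (upper, lower, left, right)]
        return np.array(profiles + [w / w0, h / h0])  # 4 profiles + 2 size ratios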
Statistical Modeling of Handwritten Words
The hidden Markov model (HMM) theory has been suc-
cessfully used to model the writing variability [19]. The
theoretic formulation of HMM is beyond the scope of this
paper. Our interest in the HMM lies in its ability to effi-
ciently model different knowledge sources. It correctly
integrates different modeling levels (morphological, lexi-
cal, and syntactical) and also provides efficient algorithms
to determine an optimum value for the model parameters.
In this section, we give the justifications behind the design
of the model we propose and we detail the steps of learning
and recognition of the neuro-markovian network.
The Proposed Model
Markovian modeling assumes that a word image is repre-
sented by a sequence of observations. These observations
should be statistically independent once the underlying
hidden state sequence is known.
Fig. 10 Segmentation of the word “ ” (Fig. 7) into graphemes: the final RSPs are marked by vertical white segments (connecting parts, characters, character portions)

Fig. 11 The chain-code extraction: a contour of the digit “ ” divided into four zones, b the 4 chain-code directions

The task of the recognition problem is to find the word w maximizing the posterior probability that w has generated an unknown observation sequence (segments) $o_1, \ldots, o_n$:

$$p(\hat{w} \mid o_1, \ldots, o_n) = \max_{w} p(w \mid o_1, \ldots, o_n). \tag{1}$$
Applying Bayes' rule to this definition, we obtain the fundamental equation of pattern recognition:

$$p(w \mid o_1, \ldots, o_n) = \frac{p(o_1, \ldots, o_n \mid w)\, p(w)}{p(o_1, \ldots, o_n)} \tag{2}$$

Since $p(o_1, \ldots, o_n)$ does not depend on w, the decoding problem becomes equivalent to maximizing the joint probability

$$p(w, o_1, \ldots, o_n) = p(o_1, \ldots, o_n \mid w)\, p(w) \tag{3}$$

where p(w) is the a priori probability of the word w. In the HMM paradigm, we can write

$$p(o_1, \ldots, o_n \mid w) = \sum_{s_1 \cdots s_n} p(o_1, \ldots, o_n \mid s_1, \ldots, s_n, w)\, p(s_1, \ldots, s_n \mid w) \tag{4}$$
In the case of handwritten script, for each model a single succession (path) largely prevails [20], and we can therefore write:

$$p(o_1, \ldots, o_n \mid w) = p(o_1, \ldots, o_n \mid s_1, \ldots, s_n, w)\, p(s_1, \ldots, s_n \mid w). \tag{5}$$

In an HMM, each sequence element is assumed to depend only on the corresponding state:

$$p(o_1, \ldots, o_n \mid s_1, \ldots, s_n, w) = \prod_{j=1}^{n} p(o_j \mid s_j, w) \tag{6}$$

Our HMM is assumed to be of first order:

$$p(s_1, \ldots, s_n \mid w) = \prod_{j=2}^{n} p(s_j \mid s_{j-1}, w) \tag{7}$$

Under these two hypotheses, we can write:

$$p(o_1, \ldots, o_n, s_1, \ldots, s_n \mid w) = \prod_{j=1}^{n} p(o_j \mid s_j, w) \prod_{j=2}^{n} p(s_j \mid s_{j-1}, w). \tag{8}$$
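To make Eq. 8 concrete, the toy sketch below evaluates the joint probability of a short state path and observation sequence from given emission and transition tables; in the actual system the emission terms come from the MLP through Eq. 13 rather than from a fixed table, and the numbers here are illustrative only.

    import numpy as np

    # Toy model: 3 states and 2 observation symbols.
    emission = np.array([[0.7, 0.3],   # p(o | s = 0)
                         [0.2, 0.8],   # p(o | s = 1)
                         [0.5, 0.5]])  # p(o | s = 2)
    transition = np.array([[0.1, 0.8, 0.1],
                           [0.0, 0.2, 0.8],
                           [0.3, 0.0, 0.7]])  # p(s_j | s_{j-1})

    def joint_probability(states, observations):
        # Eq. 8: emissions over j = 1..n times transitions over j = 2..n.
        p = emission[states[0], observations[0]]
        for prev, cur, obs in zip(states, states[1:], observations[1:]):
            p *= transition[prev, cur] * emission[cur, obs]
        return p

    print(joint_probability([0, 1, 2], [0, 1, 1]))  # 0.7*0.8*0.8*0.8*0.5 = 0.1792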
For most applications, the observations are continuous
signals. Vector quantization of these continuous signals can
degrade the performance significantly. Therefore, it is
necessary to include continuous observation densities in
the Markov models by using neural networks. A popular
approach has been to replace HMM state observation
probabilities with scaled MLP probability estimates for
each of the output units, instead of Gaussian mixtures [21].
The advantage of such a hybrid scheme over traditional
HMM recognition is the discriminative nature of the MLP
training. In classical HMM training algorithms, the models
are trained to maximize the likelihood of producing their
training examples, but no training is done to minimize
some form of probability that other examples are produced
by the model. However, the MLP automatically
incorporates discrimination. When an MLP is trained for
grapheme classification, it is explicitly demanded that one
output is maximal and the other outputs are zero. This
provides a discriminating effect.
Multilayer Perceptron and Parameter Estimation
of the Neuro-Markovian Network
The proposed MLP is a three-layer network: an input layer,
a hidden layer, and an output layer as shown in Fig. 12.
The input layer is composed of the vector x of characteristics obtained from the input image; the second layer contains the hidden neurons. The output of hidden neuron m is

$$h_m = f\left(w_{0m} + \sum_{i=1}^{I} x_i w_{im}\right) = f\left(\sum_{i=0}^{I} x_i w_{im}\right),$$

where $x_0 = 1$ is the neuron corresponding to the bias $w_{0m}$.
Finally, the output layer is sized according to the num-
ber of classes to be distinguished.
The output of the output neuron j is:
$$p_j = f\left(z_{0j} + \sum_{m=1}^{M} h_m z_{mj}\right) = f\left(\sum_{m=0}^{M} h_m z_{mj}\right),$$

where $h_0 = 1$ is the neuron corresponding to the bias $z_{0j}$.
f is the sigmoid function:

$$f(a) = \frac{1}{1 + e^{-a}} \quad \text{and} \quad f'(a) = f(a)\left(1 - f(a)\right).$$
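With these definitions, the forward pass of the network can be written compactly as in the sketch below (randomly initialized weights; the bias terms correspond to x_0 = 1 and h_0 = 1, and the layer sizes follow the 32-25-18 configuration used later in the experiments):

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def mlp_forward(x, W, Z):
        # W: (I+1) x M hidden-layer weights, row 0 holding the biases w_0m.
        # Z: (M+1) x J output-layer weights, row 0 holding the biases z_0j.
        x = np.concatenate(([1.0], x))   # x_0 = 1 for the bias
        h = sigmoid(x @ W)
        h = np.concatenate(([1.0], h))   # h_0 = 1 for the bias
        p = sigmoid(h @ Z)
        return h, p

    rng = np.random.default_rng(0)
    W = rng.normal(scale=0.1, size=(33, 25))   # 32 inputs, 25 hidden neurons
    Z = rng.normal(scale=0.1, size=(26, 18))   # 18 grapheme classes
    h, p = mlp_forward(rng.random(32), W, Z)
    print(p.shape)  # (18,)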
MLPs learn through an iterative process of adjustments
applied to their weights. The most common learning
algorithm is the standard back-propagation algorithm [22].
The algorithm uses a gradient-based search technique to
minimize the instantaneous error between target t and
actual output p for input pattern k and J output neurons:
$$E_k = \frac{1}{2} \sum_{j=1}^{J} \left(t_j^k - p_j^k\right)^2,$$

where $p_j \approx p(s_j \mid o_j)$.
Fig. 12 Multilayer perceptron with one hidden layer
The weight update is performed repeatedly over all K training patterns until the total error $E_T = \sum_{k=1}^{K} E_k$ is smaller than a predefined threshold value. The updating rule with
Rumelhart’s momentum is defined as follows [22].
Compute error terms for output neurons and hidden
neurons, respectively:
$$\delta_j^k = \left(t_j^k - p_j^k\right) p_j^k \left(1 - p_j^k\right) \tag{9}$$

$$\delta_m^k = h_m^k \left(1 - h_m^k\right) \sum_{j=1}^{J} \delta_j^k z_{mj} \tag{10}$$
Update weights zmj and wim, respectively:
$$z_{mj}(t+1) = z_{mj}(t) + \eta\, \delta_j^k h_m^k + \mu \left(z_{mj}(t) - z_{mj}(t-1)\right) \tag{11}$$

$$w_{im}(t+1) = w_{im}(t) + \eta\, \delta_m^k x_i^k + \mu \left(w_{im}(t) - w_{im}(t-1)\right) \tag{12}$$

where $\eta$ and $\mu$ are the learning rate and the momentum, respectively.
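A minimal sketch of one such update, continuing the conventions of the forward-pass sketch above (W and Z carry the biases in row 0; eta and mu are illustrative values):

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def backprop_step(x, t, W, Z, prev_dW, prev_dZ, eta=0.1, mu=0.9):
        # One update of Eqs. 9-12 for a single pattern (x, t); W and Z carry
        # the biases in row 0, as in the forward-pass sketch above.
        xb = np.concatenate(([1.0], x))
        h = np.concatenate(([1.0], sigmoid(xb @ W)))
        p = sigmoid(h @ Z)
        delta_out = (t - p) * p * (1.0 - p)                       # Eq. 9
        delta_hid = h[1:] * (1.0 - h[1:]) * (Z[1:] @ delta_out)   # Eq. 10
        dZ = eta * np.outer(h, delta_out) + mu * prev_dZ          # Eq. 11
        dW = eta * np.outer(xb, delta_hid) + mu * prev_dW         # Eq. 12
        return W + dW, Z + dZ, dW, dZ

    # Typical use inside the epoch loop:
    # W, Z, dW, dZ = backprop_step(x, t, W, Z, dW, dZ)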
At the end of an optimal training, the posterior probability $p(s_j \mid o_j)$ is computed by the multilayer perceptron [23, 24]. On the other hand, the Markovian model uses the probability $p(o_j \mid s_j)$ of observing image $o_j$ given a state $s_j$. The two terms are related by the Bayes formula:

$$p(o_j \mid s_j) = \frac{p(s_j \mid o_j)\, p(o_j)}{p(s_j)}. \tag{13}$$

For a sequence $o_1, \ldots, o_n$ and a path $s_1, \ldots, s_n$:

$$p(o_1, \ldots, o_n, s_1, \ldots, s_n \mid w) = \prod_{j=1}^{n} p(s_j \mid o_j, w) \times \prod_{j=2}^{n} p(s_j \mid s_{j-1}, w) \times \frac{\prod_{j=1}^{n} p(o_j)}{\prod_{j=1}^{n} p(s_j)}. \tag{14}$$
Since the product of the image segment probabilities $p(o_j)$ does not depend on the word hypothesis w, we can write:

$$p(o_1, \ldots, o_n, s_1, \ldots, s_n \mid w) \propto \frac{\prod_{j=1}^{n} p(s_j \mid o_j, w) \times \prod_{j=2}^{n} p(s_j \mid s_{j-1}, w)}{\prod_{j=1}^{n} p(s_j)}. \tag{15}$$
In the above formulas, the terms of the type $p(s_j \mid s_{j-1})$ are transition probabilities that can be estimated using the Baum–Welch algorithm [1], and the terms of the type $p(s_j \mid o_j)$ are well estimated by the outputs of the neural network. The prior probability $p(s_j)$ is obtained by counting the number of occurrences of each state (class) in the entire training database. The term $p(o_j)$ is factored out in the Viterbi decoding [1], since it is common to all the states for a given input grapheme.
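The decoding step can therefore be sketched as follows: the MLP posteriors are divided by the state priors as in Eqs. 13-15, and a standard Viterbi pass (in the log domain to avoid underflow) returns the best state path over the single word model. The names, the initial-state vector, and the data layout are assumptions for illustration only.

    import numpy as np

    def viterbi_scaled(posteriors, priors, log_trans, log_init):
        # posteriors: n x S matrix of MLP outputs p(s | o_j), one row per grapheme.
        # priors: length-S vector of state priors p(s) counted on the training set.
        # log_trans: S x S log transition probabilities; log_init: initial log probs.
        log_emit = np.log(posteriors + 1e-12) - np.log(priors + 1e-12)  # Eq. 13 scaling
        n, S = log_emit.shape
        score = log_init + log_emit[0]
        back = np.zeros((n, S), dtype=int)
        for j in range(1, n):
            cand = score[:, None] + log_trans      # score of every (prev, cur) pair
            back[j] = cand.argmax(axis=0)
            score = cand.max(axis=0) + log_emit[j]
        path = [int(score.argmax())]
        for j in range(n - 1, 0, -1):              # backtracking
            path.append(int(back[j, path[-1]]))
        return path[::-1], float(score.max())      # best path and its log score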
Application and Experimental Results
In Arabic script, an Arabic word is generally composed of
sub-words and diacritics. Once the diacritics are separated
and identified, the segmentation module seg-
ments the sub-words into graphemes. Assume the seg-
mentation is successful and recognition is based on
segmentation of a word into graphemes and diacritics. For
recognition, diacritics must be assigned to specific letters.
However, the position of diacritics is variable, and conse-
quently, the recognition process is made more complicated.
The problem we faced is how to introduce the type of the
diacritics and their positions in our model. In fact, each
diacritic type is associated with the nearest grapheme.
Consequently, in the word model, the observation of each
diacritic comes just after the observation of the nearest
grapheme. Besides, each separation between two succes-
sive connected components is introduced as additional
class (state) in our model, which models the separation
between two successive separated graphemes. Figure 13
illustrates the observation sequence of the word “ ”.
Only the encircled parts symbolize the observation sequence;
the remaining ones are connecting parts and must be
removed.
In our application, a path discriminant continuous HMM
was chosen as the recognition engine. Only one HMM is
used for all the word classes, and different paths in the
model distinguish one word class from the others.
The training consists in estimating transition and
observation probabilities. The state transition probabilities
(inter-graphemes transitions) are estimated by counting
over the whole training database. The transitions “ ” of
the words “ ” and “ ” present, therefore, the same
transition “ ”. State observation probabilities of
graphemes are estimated by the outputs of the neural net-
work. This network has as many outputs as possible states.
Fig. 13 Observation sequence of the word “ ” (observations o1 to o7, including a separation between sub-words)
In all, 18 classes of graphemes are used in our vocabulary of Arabic literal amounts, so the network has 18 outputs. In addition, each detected diacritic can be seen as a state whose observation probability equals 1 if the diacritic is present and 0 otherwise. In the same way, the observation probability of the separation between graphemes equals 1 if it exists and 0 otherwise. Thus, the number of states in the HMM is 24.
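As an illustration of how the 24-state observation vector of one sequence item can be assembled (18 MLP outputs for the grapheme states, 5 diacritic states, and one separation state; the item encoding and the helper name are hypothetical):

    import numpy as np

    N_GRAPHEME, N_DIACRITIC = 18, 5          # 18 + 5 + 1 separation = 24 states
    N_STATES = N_GRAPHEME + N_DIACRITIC + 1

    def observation_row(item, mlp_posteriors=None):
        # item is one element of the right-to-left sequence, encoded here as
        # ("grapheme",), ("diacritic", type_index) or ("separation",).
        row = np.zeros(N_STATES)
        if item[0] == "grapheme":
            row[:N_GRAPHEME] = mlp_posteriors    # MLP outputs p(s | o_j)
        elif item[0] == "diacritic":
            row[N_GRAPHEME + item[1]] = 1.0      # probability 1 for its type, 0 elsewhere
        else:                                    # separation between sub-words
            row[N_STATES - 1] = 1.0
        return row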
During the word recognition process, each word is segmented into a set of graphemes. These resulting graphemes
are arranged from right to left according to their appear-
ance in the word. The rightmost grapheme corresponds to
the first one in the word, and so on. Then, the word is
described by an ordered list of graphemes and their asso-
ciated diacritics. After identifying grapheme classes for
each handwritten word image using MLP classifier, we
match the grapheme and diacritic sequence against the
candidate vocabulary words. A word image is counted as
correctly classified if all graphemes and diacritics com-
posing it are correctly classified. If this matching fails to recognize one of the candidate words, the neuro-
markovian system is then carried out. To recognize a word,
the HMM computes the likelihood of words by summing
the probabilities over all possible paths through the word
model. Next, we use the Viterbi algorithm in order to find
the optimal path representing the recognized word.
A database consisting of 7,200 images of 48 different
words is used for developing handwritten Arabic literal
amount recognition. The 48 words of the vocabulary were
written three times by 50 writers. Table 2 illustrates the 48
different classes of words, and Fig. 14 shows some word
samples extracted from the used database.
For the segmentation algorithm, the experimental results
show that the algorithm achieved about 94% correct seg-
mentation. However, exceptional cases lead to over- or
under-segmentations, so we need more rules to obtain more
precise segmentation results. Figure 15a illustrates some confusions due to over-segmentation caused by the existence of redundant parts such as spurious branches and small loops, where the rules used cannot correctly segment these sub-words. Here, only the PSPs p1 and p2 must be retained.
Furthermore, certain character combinations form new
ligature shapes where one letter is above another, see
Fig. 15b. Consequently, the real segmentation points are
not detected, which leads to under-segmentation. Besides, the strokes (graphemes) of some characters like “ ” are omitted in some handwriting styles and written as a horizontal segment, as depicted in Fig. 15c; consequently,
we should add more suitable rules to remove such
confusions.
For word recognition, the database was divided into
three parts, one part for training and the remaining two
parts for test. To train the MLP classifier, we used only
well-segmented graphemes. We have implemented an
MLP classifier. Then, we have studied the influence of the
number of hidden units M on the classifier performance. The best performance of the MLP classifier (94.60%) is obtained with 25 hidden neurons.
The most frequent confusions generated by MLP are
related to specific configurations of character-pairs, for
example “ ”-“ ”, “ ”-“ ”, “ ”-“ ”, “ ”-“ ”, “ ”-“ ”,
“ ”-“ ”, “ ”-“ ”, “ ”-“ ”, “ ”-“ ”, and “ ”-“ ” . In such
cases, many characters were clustered near each other in
the feature space, leaving no chance to recognize them
uniquely. After analyzing the system errors, we observed
that many of the misrecognitions occurred between visu-
ally similar character-pairs as might be expected.
Among the confusions of the neural system removed by the neuro-markovian system is the case of the word “ ”, where the MLP does not recognize the first character “ ” but successfully recognizes the other graphemes. However, the neuro-markovian system suggests the word “ ” in the first position, with an observation sequence probability of 1.654 exp(−15).
Another ambiguity, between the letters “ ” and “ ”, caused by the MLP is removed by the neuro-markovian system. The neural network suggests the letter “ ” with a posterior probability of 0.9903 as its first suggestion instead of “ ” of the word “ ”, which has a posterior probability of 0.1006. However, the neuro-markovian system correctly recognizes the word “ ” in the first position, with an observation sequence probability of 8.2742 exp(−8).

Fig. 14 Word samples from the used database

Fig. 15 Incorrect segmentation types: a over-segmentation (redundant loops, redundant branch): p1 and p2 are the RSPs and all other vertical segments must be removed, b under-segmentation (missed RSP), c the strokes of “ ” are omitted
To demonstrate the effectiveness of the HMM–MLP
technique applied, a comparison with the MLP classifier has been made. The MLP is a state-of-the-art classification technique, and it has been successfully applied to real-world problems such as character recognition. Moreover, the
literature has shown better results on digit recognition
using MLP [25–27].
One way to observe the contribution of the contextual
processing is to measure recognition rates at the grapheme
level (by the neural network) and at the word level (by the
neuro-markovian system), then we make the comparison.
Table 3 reports the improvements on word recognition
using the hybrid system. This table presents the results on
grapheme recognition using MLP and word recognition
using HMM–MLP network. We note that the hybrid
system brings an increase in the recognition rate by about
3%.
Although the hybrid system provided a remarkable improvement in terms of recognition rate at the
word level, it did not successfully recognize some words,
but it was able to correct some graphemes composing
them.
For example, for the words “ ” and “ ”, which are composed of five graphemes, the MLP recognizes only the first and the fourth graphemes “ , ” of the first word and the first and the second graphemes “ , ” of the second word. However, the HMM–MLP correctly recognizes
the second and the third graphemes for the first word and
the third and the fourth graphemes for the second, while the
end grapheme “ ” is recognized as “ ” for both of them.
Consequently, we achieve an improvement at the grapheme
level, which reduces the number of misrecognized graphemes composing the word, so that a simple Arabic spell-check applied to this
reduced lexicon can recognize the entire word.
Comparison with published methods is delicate due to
the use of different databases, different numbers of training
and testing samples, and also different features and rec-
ognition algorithms. For example, Farah et al. [15] pre-
sented an approach using a similar database containing
4,800 words for recognizing handwritten Arabic literal
amounts, and they claimed a recognition rate of 96%.
Menasri et al. [13] and Benouareth et al. [16] evaluated
their approaches on the IFN/ENIT benchmark database,
and the achieved results were 87.4 and 90.20%,
respectively.
Compared to these results, we have attained higher
levels of recognition accuracy evaluated on a database of
7,200 words. Moreover, the main strength of this approach lies in lexicon reduction using a sophisticated segmentation algorithm and the word modeling aspect. This approach,
to the best of our knowledge, has not been applied, as it is
presented in this paper for Arabic word segmentation and
recognition in restricted lexicons.
As we know, for HMM-based word recognition, there
are two main approaches: the first relies on an implicit
segmentation [8, 9], where the handwriting data are sam-
pled into a sequence of tiny frames (overlapped or not).
The second uses a more sophisticated explicit segmentation
technique [12, 13] to cut the words into more meaningful
units or graphemes, which are larger than the frames. Our
approach belongs to the second one. As described in [13],
we believe that explicit grapheme segmentation is well
adapted for Arabic writing. One of the reasons is that some
letters such as or have tails that go almost horizontally
under the baseline. If the tail is long, which often happens
in Arabic handwriting, the next letter of the word is likely
to be vertically overlapping the previous tail. Building a
sequence of graphemes intrinsically solves this problem,
while, on the other hand, vertical frames or sliding win-
dows will be forced to process a piece of image that con-
tains parts of two different letters at the same time. Besides,
diacritical marks are often not at the exact position on top
or under the main part of the letter. Consequently, the
vertical slicing approach will eventually split letters from
their diacritics, and thus reduces the character recognition
accuracy. However, our model incorporates these varia-
tions of diacritics by assigning them to their nearest
graphemes.
Compared to the work of Menasri et al. [13], their approach introduced a new shape-based alphabet, called the letter-body alphabet, in order to reduce the lexicon size. However, our alphabet is further reduced, as some characters like have been segmented into three portions (graphemes). Each portion has the same class as the main shape of the character . Besides, their recognizer builds a letter-body sequence by a concatenation of letter-body HMMs (each HMM describes one class of shape). However, in our method, each grapheme corresponds to only one HMM state, which considerably reduces the number of parameters
of our HMM. Furthermore, in our path discriminant
approach, only one HMM is used for all the word classes.
Thus, recognition can be performed more efficiently by matching
with all word classes simultaneously than by matching one
by one.
Table 3 Recognition rate and error rate over the whole database

                            Recognition rate (%)   Error rate (%)
MLP classifier                     94.60                5.40
HMM–MLP hybrid system              97.20                2.80
Conclusion
In this paper, we have presented a new system based on
explicit segmentation for recognizing handwritten Arabic
words. We used only one word HMM to model explicitly
segmented words, leading to a better discrimination
between them. The posterior probabilities are computed by
the multilayer perceptron, the transition probabilities can
be estimated using the Baum–Welch algorithm, and the
prior probabilities are obtained by computing the number
of occurrences of each state (class) in the entire training
database. First, the MLP is used as a classifier. If the MLP
cannot succeed to recognize one of candidate words, the
MLP outputs are used as observation (posterior) probabil-
ities of graphemes and the neuro-markovian system is then
carried out. Finally, the Viterbi algorithm is used in order
to find the optimal path representing the recognized word.
This system deals with the loss in terms of recognition
performance brought by the grapheme recognition module
and aims at improving the word recognition results and
reliability of the system.
To summarize, the main benefits of this work are:
1. Since classification in a small alphabet is both more efficient and more accurate than in a large alphabet, we have introduced a new grapheme-based alphabet for handwritten Arabic word recognition in which the number of graphemes to be recognized is smaller than the number of characters in the alphabet of Arabic literal amounts.
2. Using the MLP as a labeller (classifier) and probability estimator, we have only as many possible outputs as there are graphemes in the database; in our vocabulary, 18 grapheme classes are distinguished. Moreover, the MLP has only 32 input parameters. This implies that it also has fewer weights and can therefore be trained much faster. Consequently, fewer HMM parameters are needed, which is modest compared with the parameter sizes of the other approaches in the literature. In addition, most words are recognized efficiently (94.60%) using the MLP as a classifier without introducing the HMM; thus, this approach has a low computational cost and modest memory requirements.
3. The main argument for using the MLP to obtain output
probabilities is that it is trained discriminatively and that no assumption is made about the observation distribution, as opposed to mixtures of Gaussians.
Finally, the most interesting contribution in this work is
the segmentation algorithm. The advantage of this algo-
rithm is that it is easier to find a set of potential segmen-
tation points because it analyzes the structural shape of
characters as they have been scanned without any
transformation (projection, thinning) and it is baseline
independent, which makes it more robust than existing algorithms, especially for handwritten scripts.
The main drawback of this algorithm is over- and under-
segmentation, but we can remedy it by using more
appropriate rules at the segmentation level or by intro-
ducing insert and delete states at the hybrid model level.
For future study, we first plan to extend and adapt our approach to applications with a large lexicon, such as the IFN/ENIT database. We also plan to use some structural features, such as the position of the grapheme within the word (computed by the segmentation process), to efficiently remove some confusions. For example, we can remove the ambiguity between the letters “ ” and “ ” caused by the MLP above, since the letter “ ” is located in the middle of the sub-word whereas the letter “ ” is an isolated one. Secondly, we would like to extend the segmentation algorithm to segment Arabic text into characters, where the word is first segmented into graphemes, which are then recombined to form characters; we expect to achieve good performance thanks to the simplicity and efficiency of the approach.
References
1. Rabiner LR. A tutorial on hidden Markov models and selected
applications in speech recognition. Proc IEEE. 1989;77(2):257–86.
2. Chen MY, Kundu A, Srihari SN. Variable duration hidden
Markov and morphological segmentation for handwritten word
recognition. IEEE Trans Image Process. 1995;4(12):1675–88.
3. Senior AW, Robinson AJ. An off-line cursive handwriting rec-
ognition system. IEEE Trans Pattern Anal Mach Intell. 1998;
20(3):309–21.
4. Altuwaijri M, Bayoumi M. Arabic text recognition using neural
networks. In: Proceedings of international symposium on circuits
and systems—ISCAS’94; 1994. p. 415–8.
5. Amin A, Al-Sadoun H. Handprinted Arabic character recognition
system using an artificial neural network. Pattern Recognit.
1996;29:663–75.
6. Morgan N, Bourlard H. Continuous speech recognition using
multilayer perceptrons with hidden Markov models. In: Pro-
ceedings of ICASSP-90; 1990. p. 413–6.
7. Lorigo LM, Govindaraju V. Offline Arabic handwriting recogni-
tion: a survey. IEEE Trans Pattern Anal Mach Intell. 2006;
28(5):712–24.
8. Makhoul J, Schwartz R, Lapre C, Bazzi I. A script independent
methodology for optical character recognition. Pattern Recognit.
1998;31(9):1285–94.
9. Dehghan M, Faez K, Ahmadi M, Shridhar M. Handwritten Farsi
(Arabic) word recognition: a holistic approach using discrete
HMM. Pattern Recognit. 2001;34:1057–65.
10. Khorsheed MS. Recognising handwritten Arabic manuscripts
using a single hidden Markov model. Pattern Recognit Lett.
2003;24(14):2235–42.
11. Amin A, Mari JF. Machine recognition and correction of printed
Arabic text. IEEE Trans Man Cybern. 1989;9:1300–6.
12. Miled H, Olivier C, Cheriet M, Lecourtier Y. Coupling obser-
vations/letters for a markovian modeling applied to the
recognition of the Arabic handwriting. In: Proceedings of 4th
IAPR international conference on document analysis and recog-
nition, ICDAR’97, Ulm, Germany; 1997. p. 580–3.
13. Menasri F, Vincent N, Augustin E, Cheriet M. Shape-based
alphabet for off-line Arabic handwriting recognition. In: Pro-
ceedings of the 9th international conference on document anal-
ysis and recognition ICDAR, Curitiba, Brazil; 2007. p. 969–73.
14. Boukharouba A, Bennia A. Recognition of Handwritten Arabic
words using a neuro-fuzzy network. In: Proceedings of AIP 1st
mediterranean conference on intelligent systems and automation,
Annaba; 2008. p. 254–9.
15. Farah N, Souici L, Sellami M. Classifiers combination and syntax
analysis for Arabic literal amount recognition. Eng Appl Artif
Intell. 2006;19:29–39.
16. Benouareth A, Ennaji A, Sellami M. Semi-continuous HMMs
with explicit state duration for unconstrained Arabic word mod-
eling and recognition. Pattern Recognit Lett. 2008;29:1742–52.
17. Morita M, Sabourin R, Bortolozzi F, Suen CY. Segmentation and
recognition of handwritten dates: an HMM-MLP hybrid
approach. Int J Doc Anal Recognit. 2004;6:248–62.
18. Al-Yousefi H, Udpa SS. Recognition of Arabic characters. IEEE
Trans Pattern Anal Mach Intell. 1992;14:853–7.
19. El-Yacoubi A, Gilloux M, Sabourin R, Suen CY. An HMM-based
approach for off-line unconstrained handwritten word modeling
and recognition. IEEE Trans Pattern Anal Mach Intell. 1999;
21(8):752–60.
20. Lethelier E, Leroux M, Gilloux M. Traitement des montants
numeriques des cheques postaux, approche d’une methode de
segmentation basee sur la reconnaissance. Actes de CNED 94
(3eme Colloque National sur l’Ecrit et le Document), Rouen,
France; 1994. p. 315–23.
21. Naik JM, Lubensky DM. A hybrid HMM-MLP speaker verifi-
cation algorithm for telephone speech. In: Proceedings of IEEE
international conference on acoustics, speech, and signal pro-
cessing (ICASSP); 1994. p. 153–6.
22. Looney CG. Advances in feedforward neural networks: demys-
tifying knowledge acquiring black boxes. IEEE Trans Knowl
Data Eng. 1996;8(2):211–26.
23. Bourlard H, Wellekens CJ. Links between Markov models and
multilayer perceptrons. In: Touretzky D, editor. Proceedings of the IEEE conference on neural information processing systems, Denver, CO. Morgan Kaufmann; 1989. p. 502–10.
24. Hampshire JB, Pearlmutter H. Equivalence proofs for multi-layer
perceptron classifiers and the Bayesian discriminant function. In:
Connectionist models: proceedings of the summer school. Morgan Kaufmann; 1990. p. 159–72.
25. Ha TM, Bunke H. Off-line handwritten numeral recognition by
perturbation method. IEEE Trans Pattern Anal Mach Intell.
1997;19(5):535–9.
26. Oliveira LS, Sabourin R, Bortolozzi F, Suen CY. Automatic
recognition of handwritten numerical strings: a recognition and
verification strategy. IEEE Trans Pattern Anal Mach Intell.
2002;24(11):1438–54.
27. Liu J, Gader P. Neural networks with enhanced outlier rejection
ability for off-line handwritten word recognition. Pattern Rec-
ognit. 2002;35:2061–71.