[ieee 2012 11th international conference on signal processing (icsp 2012) - beijing, china...
Arabic Handwriting Recognition Using Gabor Wavelet Transform and SVM
Moftah Elzobi, Ayoub Al-Hamadi, Anwar Saeed, and Laslo Dings
Institute for Electronics, Signal Processing and Communications (IESK)
Otto-von-Guericke-University Magdeburg
D-39016 Magdeburg, P.O. Box 4210 Germany
{Moftah.Elzobi,Ayoub.Al-Hamadi}@ovgu.de
Abstract—In this paper, we propose a segmentation-based recognition approach for handwritten Arabic text. The approach starts by segmenting word images into their constituent letter representatives by exploiting a set of structural features. For classification, Gabor-transform-based features are extracted from each letter and passed to an SVM classifier for recognition. For training and testing, we used the IESK-arDB database, an off-line handwritten Arabic database that contains the most common Arabic words as well as security-related Arabic terms. The database was developed at the Institute for Electronics, Signal Processing and Communications (IESK) at Otto-von-Guericke-University Magdeburg, Germany, and is freely available at http://www.iesk-ardb.ovgu.de/.
The approach achieved an average segmentation accuracy of 70% on 600 word images. A recognition rate of 74% was reached on a set of 5436 segmented letter images, according to a leave-one-out estimation method.
Index Terms—Optical character recognition, Arabic handwriting, character segmentation, Gabor-transform-based features.
I. INTRODUCTION
Arabic script is important for two reasons. First, it is the writing
medium of many languages (e.g., Arabic, Farsi, and Urdu). Second, it is
the script of a huge number of historical documents that cover several
centuries of the history of many present-day countries throughout the
Middle East, North Africa, Central Asia, and the Balkans. Over the past
three decades, many research papers and articles have appeared that
suggest various solutions to the problem of Arabic handwriting
recognition, yet with only minor advances.
Literature addressing the problem of handwriting recognition can be
classified into two broad categories:
(i) Segmentation-based recognition. This category includes all
approaches that perform segmentation (into letters or primitives)
prior to recognition. The advantage of such approaches is their
capability to cope with the high variability of the problem; the
disadvantage, however, is the complexity and error-prone nature of the
segmentation process. One of the earliest segmentation-based
approaches suggested for the recognition of Arabic handwritten text is
the one proposed by Almuallim and Yamaguchi [1]. In this approach,
words are over-segmented into their basic strokes, where a stroke is
the curve between any two structure points (end points or branch
points). Each stroke is then classified into one of five groups
according to its shape. Two of these groups contain what are called
secondary strokes, and the other three contain primary strokes.
Furthermore, a set of heuristics is proposed to assign secondary
strokes to their primary strokes, in an attempt to construct the
corresponding character. A recognition rate of 81.25% is reported.
In [2], Bushofa and Spann propose a segmentation-based recognition
methodology for off-line printed Arabic text. The segmentation
algorithm starts by locating the text baseline. Once the baseline is
found, a fixed-size window is moved along it to search for specific
types of angles that are expected to form where letters join. Multiple
heuristic rules are used to confirm the segmentation results. Finally,
features are extracted and a decision-tree-based classifier is used
for recognition, achieving a success rate of 94.17%. Abuhaiba et al.
[3] presented a segmentation-based recognition system for off-line
Arabic handwritten text. Thinned and smoothed images of the strokes
(sub-words) are processed and converted into 1-D representations
called direct straight-line approximations. The representation is
processed further to produce a loop-less graph called the reduced
graph, in which loops are replaced by vertices. Ultimately, the
reduced graph is segmented into tokens that are fed to a fuzzy
sequential machine for recognition. As a sub-word recognizer (no
segmentation), the proposed system achieved a 55.4% recognition rate;
when segmentation was involved, the recognition rate degraded to
51.1%. In [4], Xiu et al. proposed a probabilistic segmentation-based
recognition model, in which a tentative contour-based
over-segmentation is first performed on the text image. As a result, a
set of what they called graphemes is produced. The approach
differentiates among three types of graphemes. The confidence of each
character is calculated according to the probabilistic model, taking
into account other factors, e.g., recognition output, geometric
confidence, and logical constraints. The authors tested the proposed
methodology on five different test sets, achieving a 59.2% success
rate.
(ii) Segmentation-free recognition. As its name implies, this category
includes all approaches that treat each word image as a single entity
from which features are extracted. Even though such approaches have
proved successful in some application areas, they are completely
lexicon-dependent and are incapable of serving in an unconstrained
environment [5], [6]. One of the earliest works to follow a holistic
approach for the recognition of typewritten and printed Arabic words
is the approach proposed by Al-Badr and Haralick [7]. In their
approach, they first detect what they call "shape primitives", then
match the regions containing the shape primitives against a set of
pre-defined symbol models. A spatial arrangement of the best-fitting
symbol models is regarded as a description of the recognized word. The
system was tested with various types of samples; a rate of 73% is
achieved on scanned words. Pechwitz and Maergner [8] addressed the
problem of handwritten Arabic words. They proposed a holistic
HMM-based approach, which starts by normalizing word images and then
extracts features using a sliding window. Ultimately, the features are
passed to a semi-continuous 1-D HMM for recognition. The system
performance was tested on the IFN/ENIT database, achieving a maximal
recognition rate of 89%. Al-Hajj et al. [9] proposed an approach for
the recognition of handwritten Arabic city names. Two different types
of features, namely baseline-independent and baseline-dependent, are
extracted using a sliding window. For recognition, a right-left HMM
classifier is developed, and experiments are conducted on the IFN/ENIT
database, with recognition rates ranging from 85.45% to 87.20%.
978-1-4673-2197-6/12/$31.00 ©2012 IEEE, ICSP2012 Proceedings
The ability of a Gabor function to imitate the functionality of simple
cells in the human visual system has motivated its application in many
image processing fields [10]. Several works propose Gabor-filter-based
solutions for the recognition of Latin, Chinese, and numeral
handwriting [10], [11]. In the case of Arabic handwriting, Haboubi
et al. [12] conducted a comparative study of different kinds of
features, e.g., structural, statistical, Gabor-filter-based, and
pixel-based. Without detailed explanation, the Gabor-filter-based
features are reported to perform worst among the compared feature
sets. In contrast to [12], Chen et al. [11] reported very good results
when they used Gabor-filter-based features to recognize Arabic PAWs
(pieces of Arabic words). They tested their approach on the
AMA-Arabic-Dataset, reporting a recognition rate of 82.7%. Since the
set of Arabic PAWs is very large, putting such an approach into
service is unlikely.
In this work, we propose a segmentation-based, Gabor-filter-based
approach for the recognition of Arabic handwriting. Unlike [11], we
include segmentation as an integral part, since we believe it is an
intuitive prerequisite for reducing the unbounded domain of possible
words and/or PAWs to a limited number of classes that can be
accommodated and processed.
II. METHODOLOGY
A. Segmentation
1) Resolving the PAWs overlapping: To resolve overlapping PAWs, we
first identify the word baseline. Along the baseline, we differentiate
between two kinds of connected components (CCs): "Main CCs" (mostly
PAWs), which are all CCs that intersect the baseline, and "Auxiliary
CCs" (mostly diacritics), which do not. To resolve the overlapping of
Main CCs, and to relate Auxiliary CCs to their corresponding Main CCs,
we formulated a set of heuristics over the coordinates of the four
corners of each connected component's bounding box. Figure 1 shows an
example of PAW overlap resolution.
[Figure 1: panels (A) and (B), each showing the connected components of a word numbered 1 to 8, with the baseline marked.]
Fig. 1. PAWs' overlap resolving: (A) Word image with the Main CCs overlapping. (B) The Main CCs overlapping resolved.
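The Main/Auxiliary distinction can be sketched as follows. The paper does not list its bounding-box heuristics, so this is only an illustration under stated assumptions: a CC is treated as "Main" when its bounding box spans the baseline row (an approximation of pixel-level intersection), and each Auxiliary CC is attached to the Main CC with which it shares the most columns. The function names and the overlap rule are hypothetical.

```python
from typing import Dict, List, Tuple

# Bounding box as (top, left, bottom, right), inclusive pixel coordinates.
Box = Tuple[int, int, int, int]


def split_components(boxes: List[Box], baseline_row: int) -> Tuple[List[Box], List[Box]]:
    """Split CCs into Main (box spans the baseline row) and Auxiliary."""
    main, auxiliary = [], []
    for box in boxes:
        top, _, bottom, _ = box
        if top <= baseline_row <= bottom:
            main.append(box)
        else:
            auxiliary.append(box)
    return main, auxiliary


def horizontal_overlap(a: Box, b: Box) -> int:
    """Number of columns shared by two boxes; negative values measure the gap."""
    return min(a[3], b[3]) - max(a[1], b[1]) + 1


def attach_auxiliary(main: List[Box], auxiliary: List[Box]) -> Dict[int, int]:
    """Assumed rule: map each Auxiliary CC index to the Main CC it overlaps most.

    Requires at least one Main CC.
    """
    return {
        i: max(range(len(main)), key=lambda j: horizontal_overlap(aux, main[j]))
        for i, aux in enumerate(auxiliary)
    }


if __name__ == "__main__":
    boxes = [(5, 0, 20, 10), (8, 12, 18, 25), (0, 3, 2, 5), (22, 14, 24, 16)]
    main, aux = split_components(boxes, baseline_row=15)
    print(main, aux, attach_auxiliary(main, aux))
```

Because a negative overlap measures the horizontal gap, a diacritic that overlaps no Main CC is still attached to the nearest one.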
2) Word image segmentation: To segment the word image into letter
representatives, we first extract a thinned version of the
PAW-overlap-free word image. Thinning is necessary to reduce the
computational cost and to ease the extraction of the feature points
(FP), e.g., end points, branch points, and loop points, that we employ
for segmentation. Our approach is inspired by the approach presented
in [13], but instead of using only the set of contour local minima as
candidate segmentation points (SP), we include in SP all one-pixel
columns as candidates, since we observed that local minima occur very
often inside many of the letters (e.g., [glyphs illegible]). Then a
heuristic-based election operation is performed to exclude unlikely
candidates from SP, as follows: (i) exclude from SP all candidates
that contain any FP, since a column that contains an FP cannot be an
SP at the same time; (ii) starting from the rightmost candidate,
whenever two directly neighboring candidates are encountered, exclude
the one on the right. To handle over-segmentation (an SP inside a
letter), we further apply three other heuristics, as follows: (a) if
there are two consecutive SP candidates with no FP in between, delete
the one on the right; (b) if the directly neighboring column on the
left contains pixels of a diacritic (dot or Hamza pixels), delete the
candidate from SP; (c) if a column containing a branch-point or
loop-point pixel is encountered before reaching another SP candidate,
confirm the candidate as an SP. Finally, we insert SP candidates
before and after every Main CC. Figure 2 shows a word image segmented
according to the aforementioned approach.
[Figure 2: panels (A), (B), and (C).]
Fig. 2. Word segmentation: (A) The source word image. (B) Letters bounded by rectangles on the thinned version. (C) The segmented images.
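The candidate election can be illustrated with a small sketch. It covers only the one-pixel-column candidates and heuristics (i) and (ii); heuristics (a) to (c) are omitted. The thinned image is assumed to be a binary NumPy array, and the columns containing feature points are assumed to be given. Reading heuristic (ii) as "each run of adjacent candidates collapses to its leftmost column" is an interpretation, not something the text states explicitly.

```python
import numpy as np


def candidate_columns(skeleton: np.ndarray) -> list:
    """Columns of the thinned image that hold exactly one foreground pixel."""
    counts = skeleton.astype(bool).sum(axis=0)
    return [int(c) for c in np.flatnonzero(counts == 1)]


def filter_candidates(candidates, fp_columns) -> list:
    """Apply heuristics (i) and (ii) to the candidate segmentation columns."""
    fp = set(fp_columns)
    # (i) a column that contains a feature point cannot be a segmentation point
    kept = sorted(c for c in candidates if c not in fp)
    # (ii) scan right to left; of two directly adjacent candidates keep the left one
    result = []
    for c in reversed(kept):
        if result and result[-1] == c + 1:
            result[-1] = c  # drop the right neighbour, keep the current column
        else:
            result.append(c)
    return sorted(result)


if __name__ == "__main__":
    skeleton = np.array([
        [0, 0, 0, 0, 0, 0],
        [1, 1, 1, 0, 1, 1],
        [0, 0, 1, 0, 0, 1],
    ])
    cands = candidate_columns(skeleton)
    print(cands, filter_candidates(cands, fp_columns=[4]))
```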
B. Feature extraction
As a result of segmentation, we obtain multiple images of the letters
constituting the word image. Before feature extraction takes place,
the images should be normalized to a fixed size to reduce the
within-class variation of shape. While preserving the aspect ratio, we
normalize the letter images into a square of size 64 × 64 and then
extract Gabor-filter-based features.
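A minimal sketch of aspect-preserving normalization to 64 × 64 follows. The paper does not state the interpolation method or how the shorter side is padded; nearest-neighbour sampling and centred zero padding are assumptions made here for illustration, and `normalize_letter` is a hypothetical name.

```python
import numpy as np


def normalize_letter(img, size: int = 64) -> np.ndarray:
    """Scale a letter image so its longer side equals `size`, then centre it."""
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    scale = size / max(h, w)
    new_h = max(1, int(round(h * scale)))
    new_w = max(1, int(round(w * scale)))
    # Nearest-neighbour sampling: map each target row/column to a source index.
    rows = np.minimum((np.arange(new_h) / scale).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / scale).astype(int), w - 1)
    resized = img[np.ix_(rows, cols)]
    # Centre the resized letter on an empty (zero) square canvas.
    canvas = np.zeros((size, size))
    top = (size - new_h) // 2
    left = (size - new_w) // 2
    canvas[top:top + new_h, left:left + new_w] = resized
    return canvas


if __name__ == "__main__":
    out = normalize_letter(np.ones((20, 10)))
    print(out.shape, out.sum())
```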
1) Gabor filter based features: Motivated by its similarity to the
functionality of certain cells in the human primary visual cortex and
by its spatial-frequency localization property, the Gabor filter is
used extensively in image processing (e.g., face recognition,
segmentation, edge detection, OCR, and image compression) [11]. In 2D,
the Gabor filter (illustrated in Figure 3) is a Gaussian kernel
function (the envelope) modulated by a complex plane sinusoid of a
given frequency and orientation (the carrier). It is defined in the
spatial plane as follows:

$$ g(x, y; \lambda, \theta, \sigma_x, \sigma_y) = \exp\left\{-\frac{1}{2}\left(\frac{x'^2}{\sigma_x^2} + \frac{y'^2}{\sigma_y^2}\right)\right\} \times \exp\left\{i\,\frac{2\pi x'}{\lambda}\right\} \qquad (1) $$

where $\lambda$ is the wavelength of the sinusoid (in pixels), which
sets its spatial frequency $1/\lambda$, and $\theta$ is its
orientation. $\sigma_x$ and $\sigma_y$ are the standard deviations of
the Gaussian envelope along the x- and y-axes, and
$x' = x\cos\theta + y\sin\theta$, $y' = -x\sin\theta + y\cos\theta$.
Fig. 3. Gabor filter: The result of a sinusoidal plane modulated by a Gaussian kernel.
The real and/or imaginary components of the filter can be derived from
Eq. 1 and used instead. In our case, for feature extraction, we
convolve the normalized segmented letter images with a bank of Gabor
kernels, generated by choosing different values for $\theta$, namely
$(0, \pi/4, \pi/2, 3\pi/4)$, and for $\lambda$, namely 3 pixels and
5 pixels. The choice of the $\lambda$ values is inspired by a study
conducted in [10]. Additionally, it is observed that $\sigma_x$ and
$\sigma_y$ are actually functions of $\lambda$; we therefore set
$\sigma_x = \sigma_y = 0.5\lambda$, which has been found empirically
to work best. The Gabor kernels are then used to generate eight
different Gabor representations of the letter image, one per
combination of orientation and wavelength (4 × 2). Subsequently, we
divide each 64 × 64 representation into an 8 × 8 grid of feature
regions, resulting in 64 regions. From each region we extract one
value as an element of a 512-dimensional feature vector (8 × 64).
Figure 4 illustrates the process.
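The feature extraction can be sketched in NumPy as below. Several details are assumptions, since the text does not specify them: the kernel is truncated at three standard deviations, the magnitude of the complex response is used, and the single value taken from each region is its mean. The convolution is done with zero-padded FFTs to keep the example free of extra dependencies.

```python
import numpy as np

THETAS = (0.0, np.pi / 4, np.pi / 2, 3 * np.pi / 4)
LAMBDAS = (3, 5)


def gabor_kernel(lam: float, theta: float) -> np.ndarray:
    """Complex Gabor kernel of Eq. 1 with sigma_x = sigma_y = 0.5 * lam."""
    sigma = 0.5 * lam
    half = int(np.ceil(3 * sigma))  # assumed truncation radius
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-0.5 * (xr ** 2 + yr ** 2) / sigma ** 2)
    carrier = np.exp(1j * 2 * np.pi * xr / lam)
    return envelope * carrier


def convolve_same(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Linear 2-D convolution via zero-padded FFTs, cropped to the image size."""
    h, w = image.shape
    kh, kw = kernel.shape
    shape = (h + kh - 1, w + kw - 1)
    full = np.fft.ifft2(np.fft.fft2(image, shape) * np.fft.fft2(kernel, shape))
    top, left = (kh - 1) // 2, (kw - 1) // 2
    return full[top:top + h, left:left + w]


def gabor_features(letter: np.ndarray) -> np.ndarray:
    """512 features: 8 kernels x 64 regions, one mean magnitude per region."""
    assert letter.shape == (64, 64)
    feats = []
    for lam in LAMBDAS:
        for theta in THETAS:
            response = np.abs(convolve_same(letter, gabor_kernel(lam, theta)))
            # Split 64 x 64 into an 8 x 8 grid of 8 x 8-pixel regions.
            regions = response.reshape(8, 8, 8, 8).mean(axis=(1, 3))
            feats.append(regions.ravel())
    return np.concatenate(feats)


if __name__ == "__main__":
    letter = np.zeros((64, 64))
    letter[10:50, 32] = 1.0  # a vertical stroke
    print(gabor_features(letter).shape)
```

With this layout, features 0 to 63 belong to the kernel with $\lambda = 3$ and $\theta = 0$, features 64 to 127 to $\lambda = 3$ and $\theta = \pi/4$, and so on.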
[Figure 4: diagram of the feature extraction process, ending in the feature vector $f_1, f_2, \ldots, f_{512}$.]
Fig. 4. Gabor filter: The result of a sinusoidal plane modulated by a Gaussian kernel.
Before passing the features to a machine learning algorithm, the
feature vector $f = (f_1, f_2, \ldots, f_{512})$ is normalized to
$\hat{f} = (\hat{f}_1, \hat{f}_2, \ldots, \hat{f}_{512})$ as follows:

$$ \hat{f}_i = \frac{f_i - \mu_i}{4\sigma_i} + 0.5, \qquad i = 1, \ldots, 512, \qquad (2) $$

where $\mu_i$ and $\sigma_i$ are the mean and standard deviation of the $i$-th feature across the training data, respectively.
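Eq. 2 can be written in a few lines of NumPy. The sketch below estimates the statistics on the training matrix and applies them to any feature vector; the guard for constant features (zero standard deviation) is an added assumption that the paper does not mention.

```python
import numpy as np


def fit_normalizer(train: np.ndarray):
    """Per-feature mean and standard deviation over the training rows."""
    mu = train.mean(axis=0)
    sigma = train.std(axis=0)
    sigma = np.where(sigma == 0, 1.0, sigma)  # assumption: avoid division by zero
    return mu, sigma


def normalize(features: np.ndarray, mu: np.ndarray, sigma: np.ndarray) -> np.ndarray:
    """Eq. 2: shift by the mean, scale by four standard deviations, add 0.5."""
    return (features - mu) / (4.0 * sigma) + 0.5


if __name__ == "__main__":
    train = np.array([[0.0, 10.0], [2.0, 10.0], [4.0, 10.0]])
    mu, sigma = fit_normalizer(train)
    print(normalize(train, mu, sigma))
```

Under this mapping, a value two standard deviations below the mean maps to 0 and one two standard deviations above it maps to 1, so most normalized features fall in the unit interval.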
C. SVM classification
In order to reduce the number of classes, we started by grouping the
letters into six groups. Grouping according to the number of CCs
resulted in five groups (1CC with 25 classes, 2CCs with 20 classes,
3CCs with 8 classes, and 4CCs with 8 classes), and the sixth group
contains only letters with a loop (26 classes). Then, the handwriting
recognition problem is solved by formulating it as a supervised
multi-class learning process. We chose SVM for classification because
of its ability to avoid over-fitting and the local-minima problem, and
because its computational complexity does not depend on the
dimensionality of the input vector.
Given training data $D = \{(x_i, y_i) \mid x_i \in \mathbb{R}^d,\ y_i \in \{-1, +1\}\}$, the binary classification function is formulated as
follows:

$$ f(x) = \operatorname{sign}(w \cdot x - b). $$

Recall that a hyperplane can be written as the set of points $x$
satisfying $w \cdot x - b = 0$, where $w$ is a weight vector normal to
the hyperplane and $b$ determines the offset of the hyperplane from
the origin of the space (the offset along $w$ is $b / \|w\|$). Any two
vectors $x_1$, $x_2$ satisfying $w \cdot x_1 - b = 1$ and
$w \cdot x_2 - b = -1$, respectively, are called support vectors, and
the hyperplanes they lie on are called the standard hyperplanes; the
distance separating them is $2 / \|w\|$. Thus, the optimal hyperplane
separating the two classes can be found by minimizing $\|w\|$ subject
to the following condition:

$$ y_i (w \cdot x_i - b) \ge 1 \quad \forall i. $$

In this work, we created a class for each letter shape (ranging from 2
to 4 classes per letter, according to the shapes a letter may appear
in). Several one-vs-all SVM classifiers are trained. For the
experiments, we used the LIBSVM library [14], choosing the radial
basis function (RBF) as the kernel.
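The margin width $2 / \|w\|$ quoted above for the linear case can be checked numerically. The sketch picks an arbitrary $w$ and $b$ (illustrative values, not taken from the paper), constructs one point on each standard hyperplane along the normal direction, and compares their distance with $2 / \|w\|$. The helper names are hypothetical.

```python
import numpy as np


def decision(w: np.ndarray, b: float, x: np.ndarray) -> float:
    """Binary decision f(x) = sign(w . x - b)."""
    return float(np.sign(np.dot(w, x) - b))


def point_on_plane(w: np.ndarray, b: float, c: float) -> np.ndarray:
    """Point along the normal w that satisfies w . x - b = c."""
    return w * (c + b) / np.dot(w, w)


def margin_width(w: np.ndarray) -> float:
    """Distance between the hyperplanes w . x - b = +1 and w . x - b = -1."""
    return 2.0 / float(np.linalg.norm(w))


if __name__ == "__main__":
    w, b = np.array([3.0, 4.0]), 1.0
    x1 = point_on_plane(w, b, +1.0)
    x2 = point_on_plane(w, b, -1.0)
    print(np.linalg.norm(x1 - x2), margin_width(w))
```

Since the two points differ only along $w$, their distance is exactly the gap between the two hyperplanes, which is why minimizing $\|w\|$ maximizes the margin.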
TABLE I
DETAILED ANALYSIS OF OUR RESULTS FOR THE ENTIRE ALPHABET (RECOGNITION RATE IN %).
Columns 1 to 28 are the letters in the order of the original header row, whose glyphs are illegible.

Form  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28
B     X 68 86 71 55 44 77  X  X  X  X 81 85 77 77 91 92 65 70 62 72 72 63 77 89  X 75 75
E    91 53 82 68 81 65 85 75 62 51 61 88 92 58 53 83 89 71 88 81 84 92 53 83 73 51 81 83
I    93 79 88 77 80 72 63 66 85 49 73 83 86 81 81 87 90 57 93 88 81  X 81 81 91 81 87 91
M     X 57 82 50 59 49 63  X  X  X  X 71 61 85 64 83 94 51 55 56 79  X 43 68 69  X 72 74
D    87  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X
U    81  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X
TABLE II
OUR RESULTS COMPARED TO THOSE OBTAINED BY XIU ET AL. [4].

        Our method              Xiu et al. [4]
Set     O       U       S       O       U       S
S1      12%     18%     70%     14%     31%     55%
S2      15%     20%     65%     20%     28%     52%
S3      14%     21%     65%     23%     26%     51%
S4      9%      13%     78%     17%     25%     58%
S5      8%      21%     71%     20%     23%     57%
Avg.    11.6%   18.6%   69.8%   18.8%   26.6%   54.6%
III. EXPERIMENTAL RESULTS
We conducted two kinds of experiments. First, we performed a
comparative experiment on the segmentation method separately. Second,
we assessed the performance of our recognition approach, confirming
the suitability of Gabor-filter-based features for the recognition of
Arabic handwriting and contradicting the results published in [12].
For our recognition approach, no comparison to other approaches in the
literature is presented, for two reasons. First, the single work that
(to the best of our knowledge) follows a similar approach [11] targets
the recognition of handwritten Arabic PAWs rather than letter images
produced by a segmentation module. Second, the database used for
testing there is different from ours.
Given its crucial influence on the recognition results, we tested our
segmentation method on 600 word images, organized into 5 sets
according to similarity in writing style. Furthermore, we compared our
results to the results published in [4]. Table II details the results,
where "O", "U", and "S" stand for over-segmentation error,
under-segmentation error, and completely successful segmentation,
respectively. Overall, our approach achieved an average
successful-segmentation rate of 69.8%, compared to 54.6% for [4]
(Table II). The overall recognition performance is detailed in
Table I, where "B", "E", "I", and "M" indicate the various forms a
letter may take (see Subsection II-C), and "D" and "U" denote the
cases in which a letter appears with a "Hamza" below or above,
respectively. "X" marks a form that does not apply to the
corresponding letter.
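The averages in Table II can be re-derived from the per-set percentages with a few lines of Python. The dictionaries simply transcribe the table; the helper name `averages` is hypothetical.

```python
# Per-set percentages transcribed from Table II (sets S1 to S5).
OURS = {"O": [12, 15, 14, 9, 8], "U": [18, 20, 21, 13, 21], "S": [70, 65, 65, 78, 71]}
XIU = {"O": [14, 20, 23, 17, 20], "U": [31, 28, 26, 25, 23], "S": [55, 52, 51, 58, 57]}


def averages(table: dict) -> dict:
    """Mean of each column, rounded to one decimal place as in the table."""
    return {key: round(sum(values) / len(values), 1) for key, values in table.items()}


def rows_sum_to_100(table: dict) -> bool:
    """Each set's O, U and S percentages should account for all words."""
    return all(o + u + s == 100 for o, u, s in zip(table["O"], table["U"], table["S"]))


if __name__ == "__main__":
    print(averages(OURS), averages(XIU))
    print(rows_sum_to_100(OURS), rows_sum_to_100(XIU))
```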
Around 1000 word images, which resulted in 5436 letter images after
segmentation, were used to train an SVM classifier, and a
leave-one-out estimation method was used for testing. Recognition
rates ranged from 93% down to 43%, with a median of 77% and an average
of 74.3%. One letter in form I ([glyph illegible]), for example,
achieved a high rate of 93%. This can be attributed, first, to the
fact that it is a very frequent letter in Arabic script and is hence
represented by a large number of samples in the training set. Second,
it has a distinctive, simple shape, allowing the extraction of more
representative features. Another letter ([glyph illegible], form E)
achieved a rate of 93% despite being less frequent in the script. The
reason is the pre-classification step explained in Subsection II-C,
which groups the letter with just one other letter ([glyph illegible])
in a group with a total of only 8 classes, which in turn improves its
recognition rate significantly.
The low recognition rate of some other letters (e.g., [glyph
illegible] in form B and [glyph illegible] in form M) is caused by
their similarity in shape to multiple other letters, and also by the
fact that those letters belong to a group (1CC) with a high number of
classes (25 classes), which in turn degrades the recognition rate,
given the direct relationship between the number of classes and the
recognition error.
IV. CONCLUSION AND FUTURE WORKS
In this paper, we proposed a segmentation-based, Gabor-filter-based
approach for the recognition of Arabic handwriting. The results were
satisfactory: about 70% segmentation accuracy and a 74% letter
recognition rate. It was found that a simple pre-classification step
improves the recognition rate significantly; we therefore plan to
invest more effort in improving the pre-classification stage, so that
letters are distributed into groups with fewer classes each, which may
improve the results further.
REFERENCES
[1] H. Almuallim and S. Yamaguchi, “A method of recognition of Arabic cursive handwriting,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 9, pp. 715–722, September 1987.
[2] B Bushofa, “Segmentation and recognition of Arabic characters by structural classification,” Image and Vision Computing, vol. 15, no. 3, pp. 167–179, 1997.
[3] I. S. I. Abuhaiba, M. J. J. Holt, and S. Datta, “Recognition of off-line cursive handwriting,” Computer Vision and Image Understanding, vol. 71, no. 1, pp. 19–38, 1998.
[4] Pingping Xiu, Liangrui Peng, Xiaoqing Ding, and Hua Wang, Offline Handwritten Arabic Character Segmentation with Probabilistic Model, pp. 402–412, Number project 60472002. Springer, 2006.
[5] L.M. Lorigo and V. Govindaraju, “Offline Arabic handwriting recognition: a survey,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 28, no. 5, pp. 712–724, May 2006.
[6] Sriganesh Madhvanath and Venu Govindaraju, “The role of holistic paradigms in handwritten word recognition,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 23, no. 2, pp. 149–164, Feb. 2001.
[7] Badr H. Al-Badr, A segmentation-free approach to text recognition with application to Arabic text, Ph.D. thesis, Seattle, WA, USA, 1995, UMI Order No. GAX95-37297.
[8] M. Pechwitz and V. Maergner, “HMM based approach for handwritten Arabic word recognition using the IFN/ENIT - database,” in Document Analysis and Recognition, 2003. Proceedings. Seventh International Conference on, Aug. 2003, pp. 890–894.
[9] R. Al-Hajj Mohamad, L. Likforman-Sulem, and C. Mokbel, “Combining slanted-frame classifiers for improved HMM-based Arabic handwriting recognition,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 31, no. 7, pp. 1165–1177, July 2009.
[10] Yoshihiko Hamamoto, Shunji Uchimura, Masanori Watanabe, Tetsuya Yasuda, Yoshihiro Mitani, and Shingo Tomita, “A Gabor filter-based method for recognizing handwritten numerals,” Pattern Recognition, vol. 31, no. 4, pp. 395–400, 1998.
[11] Jin Chen, Huaigu Cao, Rohit Prasad, Anurag Bhardwaj, and Prem Natarajan, “Gabor features for offline Arabic handwriting recognition,” in Proceedings of the 9th IAPR International Workshop on Document Analysis Systems, New York, NY, USA, 2010, DAS ’10, pp. 53–58, ACM.
[12] S. Haboubi, S. Maddouri, N. Ellouze, and H. El-Abed, “Invariant primitives for handwritten Arabic script: A contrastive study of four feature sets,” in Document Analysis and Recognition, 2009. ICDAR ’09. 10th International Conference on, July 2009, pp. 691–697.
[13] A. Alper Atici and Fatos T. Yarman-Vural, “A heuristic algorithm for optical character recognition of Arabic script,” Signal Processing, vol. 62, no. 1, pp. 87–99, 1997.
[14] Chih-Chung Chang and Chih-Jen Lin, “LIBSVM: A library for support vector machines,” ACM Transactions on Intelligent Systems and Technology, vol. 2, pp. 27:1–27:27, 2011, Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.