[ieee 2012 11th international conference on signal processing (icsp 2012) - beijing, china...

Arabic Handwriting Recognition Using Gabor Wavelet Transform and SVM

Moftah Elzobi, Ayoub Al-Hamadi, Anwar Saeed, and Laslo Dings

Institute for Electronics, Signal Processing and Communications (IESK)

Otto-von-Guericke-University Magdeburg

D-39016 Magdeburg, P.O. Box 4210 Germany

{Moftah.Elzobi,Ayoub.Al-Hamadi}@ovgu.de

Abstract—In this paper, we propose a segmentation-based recognition approach for handwritten Arabic text. The approach starts by segmenting the word images into their constituent letter representatives by exploiting a set of structural features. For classification, Gabor transform-based features are extracted from each letter and passed to an SVM classifier for recognition. For training and testing, we used the IESK-arDB database, an Arabic off-line handwritten database that contains the most common Arabic words as well as security-related Arabic terms. The database was developed at the Institute for Electronics, Signal Processing and Communications (IESK) at Otto-von-Guericke-University Magdeburg, Germany, and is freely available at http://www.iesk-ardb.ovgu.de/.

The approach achieved an average segmentation accuracy of 70% on 600 word images. A recognition rate of 74% was reached on a set of 5436 segmented letter images, according to a leave-one-out estimation method.

Index Terms—Optical character recognition, Arabic handwriting, Character segmentation, Gabor transform-based features.

I. INTRODUCTION

Arabic script is important, first, as the writing medium of many languages (e.g., Arabic, Farsi, and Urdu) and, second, as the script used in a huge number of historical documents that cover several centuries of the history of many present-day countries throughout the Middle East, North Africa, Central Asia, and the Balkans. Over the past three decades, many research papers and articles have appeared that suggest various solutions to the problem of Arabic handwriting recognition, yet with only minor advances.

The literature addressing the problem of handwriting recognition can be classified into two broad categories. (i) Segmentation-based recognition: this category includes all approaches that perform segmentation (into letters or primitives) prior to recognition. The advantage of such approaches is their capability to cope with the high variability of the problem; the disadvantage is the complexity and error-proneness of the segmentation process. One of the earliest segmentation-based approaches suggested for the recognition of Arabic handwritten text is the one proposed by Almuallim and Yamaguchi [1]. In this approach, words are over-segmented into their basic strokes, where a stroke is the curve between any two structure points (end points or branch points). Each stroke is then classified into one of five groups according to its shape. Two of these groups contain what are called secondary strokes, and the other three contain primary strokes. Furthermore, a set of heuristics is proposed to assign secondary strokes to their primary strokes in an attempt to construct the corresponding character. A recognition rate of 81.25% is reported. In [2], Bushofa and Spann propose a segmentation-based recognition methodology for off-line printed Arabic text. The segmentation algorithm starts by locating the text baseline. Once the baseline is found, a fixed-size window is moved along it to search for specific types of angles that are expected to be formed where letters are joined together. Multiple heuristic rules are used to confirm the segmentation results. Finally, features are extracted and a decision-tree-based classifier is used for recognition, achieving a success rate of 94.17%. Abuhaiba et al. [3] presented a segmentation-based recognition system for off-line Arabic handwritten text. Thinned and smoothed images of the strokes (sub-words) are processed and converted into 1-D representations called direct straight-line approximations. The representation is processed further to produce a loop-less graph, called the reduced graph, in which loops are replaced by vertices. Ultimately, the reduced graph is segmented into tokens that are fed to a fuzzy sequential machine for recognition. As a sub-word recognizer (no segmentation), the proposed system achieved a 55.4% recognition rate; when segmentation was involved, the recognition rate degraded to 51.1%. In [4], Xiu et al. proposed a probabilistic segmentation-based recognition model, in which a tentative contour-based over-segmentation is first performed on the text image, producing a set of what they call graphemes. The approach differentiates among three types of graphemes. The confidence of each character is calculated according to the probabilistic model, taking into account other factors, e.g., recognition output, geometric confidence, and logical constraints. The authors evaluated the proposed methodology on five different test sets, achieving a 59.2% success rate.

(ii) Segmentation-free recognition: as its name implies, this category includes all approaches that treat each word image as a single entity from which features are extracted. Even though such approaches have proved successful in some application areas, they are completely lexicon-dependent and are incapable of serving in an unconstrained environment [5], [6]. One of the earliest works that followed a holistic approach for the recognition of type-written and printed Arabic words is the approach proposed by Al-Badr and Haralick [7]. In their approach, they first detect what they call "shape primitives" and then match the regions containing the shape primitives against a set of pre-defined symbol models. A spatial arrangement of the best-fitting symbol models is regarded as a description of the recognized word. The system was evaluated on various types of samples; a rate of 73% was achieved on scanned words. Pechwitz and Maergner [8] addressed the problem of handwritten Arabic words. They proposed a holistic HMM-based approach, which starts by normalizing word images and then extracts features using a sliding window. Ultimately, the features are passed to a semi-continuous 1-D HMM for recognition. The system performance was tested on the IFN/ENIT database, achieving a maximal recognition rate of 89%. Al-Hajj et al. [9] proposed an approach for the recognition of handwritten Arabic city names. Two different types of features, namely baseline-independent and baseline-dependent, are extracted using a sliding window. For recognition, a right-left HMM classifier was developed, and experiments were conducted on the IFN/ENIT database with recognition rates ranging from 85.45% to 87.20%.

978-1-4673-2197-6/12/$31.00 ©2012 IEEE, ICSP2012 Proceedings

The fact that a Gabor function can imitate the functionality of simple cells in the human visual system has motivated its application in many image processing fields [10]. Several works propose various Gabor-filter-based solutions for the problem of Latin, Chinese, and numeral handwriting recognition [10], [11]. For Arabic handwriting, Haboubi et al. [12] conducted a comparative study of different kinds of features, e.g., structural, statistical, Gabor-filter-based, and pixel-based. Without detailed explanation, the Gabor-filter-based features are reported to show the worst performance of the four. In contrast to [12], Chen et al. [11] reported very good results when they used Gabor-filter-based features to recognize Arabic PAWs (pieces of Arabic words). They tested their approach on the AMA Arabic dataset, reporting a recognition rate of 82.7%. Since the set of Arabic PAWs is very large, putting such an approach into service is unlikely.

In this work, we propose a segmentation-based, Gabor-filter-based approach for the recognition of Arabic handwriting. Unlike [11], we include segmentation as an integral part, since we believe it is an intuitive prerequisite for reducing the infinite domain of possible words and/or PAWs to a limited number of classes that can be accommodated and processed.

II. METHODOLOGY

A. Segmentation

1) Resolving the PAWs overlapping: To resolve the PAWs overlapping, we first identify the word baseline. Along the baseline, we differentiate between two kinds of connected components (CCs): "Main CCs" (mostly PAWs), which are all CCs that intersect the baseline, and "Auxiliary CCs" (mostly diacritics), which do not. To resolve the overlapping of Main CCs, and to relate Auxiliary CCs to their corresponding Main CCs, we formulated a set of heuristics over the coordinates of the four corners of each connected component's bounding box. Figure 1 shows an example of PAW overlap resolution.

Fig. 1. PAWs' overlap resolving: (A) Word image with the Main CCs overlapping. (B) The Main CCs overlapping resolved.
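The Main/Auxiliary distinction described above can be sketched as follows. This is only an illustration under stated assumptions: the `Box` type and both function names are hypothetical, a component is taken to intersect the baseline when its bounding box spans the baseline row, and the "largest horizontal overlap" rule for relating an Auxiliary CC to a Main CC is a stand-in, since the paper does not list its actual heuristics.

```python
from typing import List, NamedTuple, Tuple


class Box(NamedTuple):
    """Bounding box of a connected component (hypothetical type); rows grow downward."""
    top: int
    bottom: int
    left: int
    right: int


def split_components(boxes: List[Box], baseline_row: int) -> Tuple[List[Box], List[Box]]:
    """Split components into Main CCs (box spans the baseline row) and Auxiliary CCs."""
    main, auxiliary = [], []
    for box in boxes:
        if box.top <= baseline_row <= box.bottom:
            main.append(box)
        else:
            auxiliary.append(box)
    return main, auxiliary


def attach_auxiliary(aux: Box, main: List[Box]) -> Box:
    """Assumed rule: relate an Auxiliary CC to the Main CC it overlaps most horizontally."""
    def overlap(m: Box) -> int:
        return max(0, min(aux.right, m.right) - max(aux.left, m.left) + 1)
    return max(main, key=overlap)
```

A real implementation would test the component's pixels against the baseline rather than its bounding box, which can over-report intersections for slanted strokes.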

2) Word image segmentation: To segment the word image into letter representatives, we first extract a thinned version of the PAW-overlap-free word image. Thinning is necessary to reduce the computational cost and to ease the extraction of the feature points (FP), e.g., end points, branch points, and loop points, that we employ for segmentation. Our approach is inspired by the one presented in [13], but instead of using only the set of contour local minima as candidate segmentation points (SP), we include in SP all one-pixel columns as candidates, since we observed that local minima occur very often inside many of the letters. A heuristic-based selection is then performed to exclude unlikely candidates from SP, as follows: (i) exclude from SP all candidates that contain any FP, since a column that contains an FP cannot be an SP at the same time; (ii) starting from the rightmost candidate, whenever two directly neighboring candidates are encountered, exclude the one on the right. To handle over-segmentation (an SP inside a letter), we further apply three other heuristics: (a) if there are two consecutive SP candidates with no FP in between, delete the one on the right; (b) if the direct neighbor on the left is a column that contains pixels of a diacritic (dot or Hamza pixels), delete the candidate from SP; (c) if a column containing a branch-point or loop-point pixel is encountered before reaching another SP candidate, confirm the candidate as an SP. Finally, we insert SP candidates before and after every Main CC. Figure 2 shows a word image segmented according to this approach.

Fig. 2. Word segmentation: (A) The source word image. (B) Letters bounded by rectangles on the thinned version. (C) The segmented images.
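The candidate collection and the first two exclusion rules, (i) and (ii), can be sketched as below. The function names are hypothetical, the thinned image is assumed to be a list of rows of 0/1 pixels, and the set of columns containing feature points is assumed to be computed elsewhere; heuristics (a) to (c) are not covered.

```python
from typing import List, Set


def one_pixel_columns(image: List[List[int]]) -> List[int]:
    """Columns of a thinned binary image that contain exactly one foreground pixel."""
    n_cols = len(image[0])
    return [c for c in range(n_cols) if sum(row[c] for row in image) == 1]


def prune_candidates(candidates: List[int], fp_columns: Set[int]) -> List[int]:
    """Apply rules (i) and (ii) to the segmentation-point candidates."""
    # Rule (i): a column that contains a feature point cannot be a segmentation point.
    kept = {c for c in candidates if c not in fp_columns}
    # Rule (ii): of two directly neighboring candidates, drop the one on the right,
    # so each run of adjacent candidate columns collapses to its leftmost column.
    return sorted(c for c in kept if c - 1 not in kept)
```

For example, a horizontal stroke with a short vertical branch yields two runs of one-pixel columns, one on each side of the branch column, and each run is reduced to a single candidate.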

B. Features extraction

As a result of segmentation, we obtain multiple images of the letters that constitute the word image. Before feature extraction takes place, the images should be normalized to a fixed size to reduce the within-class variation in shape. While preserving the aspect ratio, we normalize the letter images to a square of size 64 × 64 and then extract Gabor-filter-based features.
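One possible way to perform the aspect-preserving normalization is sketched below. The paper does not state the interpolation or padding scheme, so nearest-neighbor scaling and centered zero padding are assumptions, and `normalize_letter` is a hypothetical name.

```python
from typing import List


def normalize_letter(image: List[List[int]], size: int = 64) -> List[List[int]]:
    """Scale a letter image to fit a size x size square, keeping its aspect ratio.

    Assumptions: nearest-neighbor sampling, background value 0, centered placement.
    """
    h, w = len(image), len(image[0])
    scale = size / max(h, w)                 # the longer side maps to `size`
    new_h = max(1, round(h * scale))
    new_w = max(1, round(w * scale))
    top = (size - new_h) // 2
    left = (size - new_w) // 2
    canvas = [[0] * size for _ in range(size)]
    for r in range(new_h):
        src_r = min(h - 1, int(r / scale))
        for c in range(new_w):
            src_c = min(w - 1, int(c / scale))
            canvas[top + r][left + c] = image[src_r][src_c]
    return canvas
```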

1) Gabor-filter-based features: Motivated by its similarity to the functionality of certain cells in the human primary visual cortex and by its spatial-frequency localization property, the Gabor filter is used extensively in image processing (e.g., face recognition, segmentation, edge detection, OCR, and image compression) [11]. In 2D, a Gabor filter (as illustrated in Figure 3) is the result of a Gaussian kernel function (the envelope) modulated by a complex plane sinusoidal function of a given frequency and orientation (the carrier). It is defined in the spatial plane as

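follows:

$$g(x, y; \lambda, \theta, \sigma_x, \sigma_y) = \exp\left\{-\frac{1}{2}\left(\frac{x'^2}{\sigma_x^2} + \frac{y'^2}{\sigma_y^2}\right)\right\} \times \exp\left\{i\,\frac{2\pi x'}{\lambda}\right\} \tag{1}$$

where λ is the wavelength (in pixels) of the sinusoidal function, i.e., the reciprocal of its spatial frequency, and θ is its orientation; σx and σy are the standard deviations along the x- and y-axis, and x' = x cos θ + y sin θ, y' = −x sin θ + y cos θ.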
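Eq. 1 can be evaluated on a pixel grid as in the following sketch. The function name and the truncation of the kernel at three standard deviations are assumptions; the default σx = σy = 0.5λ follows the setting the paper reports for its own filter bank.

```python
import cmath
import math
from typing import List, Optional


def gabor_kernel(lam: float, theta: float,
                 sigma_x: Optional[float] = None,
                 sigma_y: Optional[float] = None,
                 half_size: Optional[int] = None) -> List[List[complex]]:
    """Sample the complex Gabor function of Eq. 1 on a square grid centered at (0, 0)."""
    sigma_x = 0.5 * lam if sigma_x is None else sigma_x
    sigma_y = 0.5 * lam if sigma_y is None else sigma_y
    if half_size is None:
        # Assumption: truncate the Gaussian envelope at three standard deviations.
        half_size = math.ceil(3 * max(sigma_x, sigma_y))
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    kernel = []
    for y in range(-half_size, half_size + 1):
        row = []
        for x in range(-half_size, half_size + 1):
            xr = x * cos_t + y * sin_t       # rotated coordinate x'
            yr = -x * sin_t + y * cos_t      # rotated coordinate y'
            envelope = math.exp(-0.5 * (xr ** 2 / sigma_x ** 2 + yr ** 2 / sigma_y ** 2))
            carrier = cmath.exp(1j * 2 * math.pi * xr / lam)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel
```

The real and imaginary parts of each entry give the even and odd filters, respectively.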

Fig. 3. Gabor filter: the result of a sinusoidal plane wave modulated by a Gaussian kernel.

The real and/or imaginary components of the filter can be derived from Eq. 1 and used instead. In our case, for feature extraction, we convolve the normalized segmented letter images with a bank of Gabor kernels generated by choosing different values for θ, namely 0, π/4, π/2, and 3π/4, and for λ, namely 3 pixels and 5 pixels. The choice of λ values is inspired by a study conducted in [10]. Additionally, since σx and σy are effectively functions of λ, we set both to 0.5λ, which was found empirically to be optimal. The Gabor kernels are then used to generate eight different Gabor representations of the letter image, one for each combination of orientation and wavelength. Subsequently, we divide each 64 × 64 representation into an 8 × 8 grid of feature regions, resulting in 64 regions. From each region we extract one value as an element of a 512-dimensional feature vector (8 × 64). Figure 4 illustrates the process.
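The step from eight filter responses to a 512-element vector can be sketched as follows. The paper does not say which value is taken from each region, so the mean response magnitude used here is an assumption, and all names are hypothetical; the convolution itself is assumed to have been done already.

```python
import math
from itertools import product
from typing import List, Sequence

THETAS = (0.0, math.pi / 4, math.pi / 2, 3 * math.pi / 4)   # orientations from the text
LAMBDAS = (3.0, 5.0)                                        # wavelengths in pixels
BANK_PARAMS = list(product(LAMBDAS, THETAS))                # 2 x 4 = 8 kernels


def region_features(response: Sequence[Sequence[complex]], grid: int = 8) -> List[float]:
    """One value per region of a grid x grid partition (assumed: mean magnitude)."""
    step = len(response) // grid
    feats = []
    for gr in range(grid):
        for gc in range(grid):
            total = 0.0
            for r in range(gr * step, (gr + 1) * step):
                for c in range(gc * step, (gc + 1) * step):
                    total += abs(response[r][c])
            feats.append(total / (step * step))
    return feats


def feature_vector(responses: Sequence[Sequence[Sequence[complex]]]) -> List[float]:
    """Concatenate the region features of all responses (8 x 64 = 512 values)."""
    vec: List[float] = []
    for response in responses:
        vec.extend(region_features(response))
    return vec
```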

Fig. 4. Gabor filter: the result of a sinusoidal plane wave modulated by a Gaussian kernel.

Before the features are passed to a machine learning algorithm, the feature vector f = (f1, f2, ..., f512) is normalized to f̃ = (f̃1, f̃2, ..., f̃512) as follows:

$$\tilde{f}_i = \frac{f_i - \mu_i}{4\sigma_i} + 0.5, \quad i = 1, \ldots, 512, \tag{2}$$

where μi and σi are the mean and standard deviation of the i-th feature across the training data, respectively.
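A minimal sketch of this normalization is given below, reading Eq. 2 as (f − μ)/(4σ) + 0.5. The function names are hypothetical, the population standard deviation is an assumption (the paper does not say which estimator is used), and constant features are mapped to 0.5 to avoid division by zero.

```python
import statistics
from typing import List, Sequence, Tuple


def fit_normalizer(training_vectors: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Per-feature mean and standard deviation over the training set."""
    columns = list(zip(*training_vectors))
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]   # assumed: population std
    return means, stds


def normalize(vector: Sequence[float], means: Sequence[float], stds: Sequence[float]) -> List[float]:
    """Apply (f - mu) / (4 * sigma) + 0.5 to every feature."""
    return [
        (f - mu) / (4 * sd) + 0.5 if sd > 0 else 0.5    # constant feature: no spread
        for f, mu, sd in zip(vector, means, stds)
    ]
```

With this scaling, values within two standard deviations of the mean fall in the interval [0, 1].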

C. SVM classification

In order to reduce the number of classes, we start by grouping the letters according to their number of CCs, which results in the groups 1CC (25 classes), 2CCs (20 classes), 3CCs (8 classes), and 4CCs (8 classes); a further group contains only letters with a loop (26 classes). The handwriting recognition problem is then formulated as a supervised multi-class learning problem. We choose SVM for classification because of its ability to avoid over-fitting and local minima, and because its computational complexity does not depend on the dimensionality of the input vector.

Given training data D = {(xi, yi) | xi ∈ R^d, yi ∈ {−1, +1}}, the binary classification function is formulated as follows:

$$f(x) = \operatorname{sign}(w \cdot x - b)$$

Recall that a hyperplane can be written as the set of points x satisfying w · x − b = 0, where w is a weight vector normal to the hyperplane and b determines the offset of the hyperplane from the origin. Any two vectors x1 and x2 satisfying w · x1 − b = 1 and w · x2 − b = −1, respectively, are called support vectors; the hyperplanes they define are called the standard hyperplanes, and the distance separating them is 2/‖w‖. Thus, the optimal hyperplane separating the two classes can be found by minimizing ‖w‖ subject to the following condition:

$$y_i (w \cdot x_i - b) \ge 1 \quad \forall i$$

In this work, we created a class for each letter shape (from 2 to 4 classes per letter, according to the shapes in which a letter may appear). Several one-vs-all SVM classifiers are trained. For the experiments, we used the LIBSVM library [14], choosing the radial basis function (RBF) as the kernel.
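To make the one-vs-all decision concrete, the following hand-written sketch evaluates an RBF-kernel decision value per class and picks the largest. It is not LIBSVM's interface: the function names, the model layout (support vectors, signed coefficients αi·yi, offset b), and the value of gamma are all illustrative assumptions, and training is not shown.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical model layout: (support vectors, coefficients alpha_i * y_i, offset b).
Model = Tuple[List[Sequence[float]], List[float], float]


def rbf_kernel(u: Sequence[float], v: Sequence[float], gamma: float) -> float:
    """K(u, v) = exp(-gamma * ||u - v||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def decision_value(x: Sequence[float], support_vectors: List[Sequence[float]],
                   coeffs: List[float], b: float, gamma: float) -> float:
    """Kernelized counterpart of w . x - b."""
    return sum(c * rbf_kernel(sv, x, gamma) for sv, c in zip(support_vectors, coeffs)) - b


def one_vs_all_predict(x: Sequence[float], models: Dict[str, Model], gamma: float) -> str:
    """Return the label whose binary classifier gives the largest decision value."""
    return max(models, key=lambda label: decision_value(x, *models[label], gamma))
```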

TABLE I
DETAILED ANALYSIS OF OUR RESULTS FOR THE ENTIRE ALPHABET.

Rows: letter forms. Columns: the 28 letters, in the column order of the original table. Values: Rec. %.

B    X 68 86 71 55 44 77  X  X  X  X 81 85 77 77 91 92 65 70 62 72 72 63 77 89  X 75 75
E   91 53 82 68 81 65 85 75 62 51 61 88 92 58 53 83 89 71 88 81 84 92 53 83 73 51 81 83
I   93 79 88 77 80 72 63 66 85 49 73 83 86 81 81 87 90 57 93 88 81  X 81 81 91 81 87 91
M    X 57 82 50 59 49 63  X  X  X  X 71 61 85 64 83 94 51 55 56 79  X 43 68 69  X 72 74
D   87  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X
U   81  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X

TABLE II
OUR RESULTS COMPARED TO THOSE OBTAINED BY XIU ET AL. [4].

        Our method                Xiu et al. [4]
Set     O       U       S        O       U       S
S1      12%     18%     70%      14%     31%     55%
S2      15%     20%     65%      20%     28%     52%
S3      14%     21%     65%      23%     26%     51%
S4      9%      13%     78%      17%     25%     58%
S5      8%      21%     71%      20%     23%     57%
Avg.    11.6%   18.6%   69.8%    18.8%   26.6%   54.6%

III. EXPERIMENTAL RESULTS

We conducted two kinds of experiments. First, we performed a comparative experiment on the segmentation method separately. Second, we assessed the performance of our recognition approach, confirming the suitability of Gabor-filter-based features for the recognition of Arabic handwriting, contrary to the results published in [12]. For our recognition approach, no comparison with other approaches in the literature is presented, for two reasons. First, the only work (to the best of our knowledge) that follows a similar approach [11] targets the recognition of handwritten Arabic PAWs rather than letter images produced by a segmentation module. Second, the database used for testing there differs from ours.

Realizing its crucial influence on the recognition results, we tested our segmentation method on 600 word images, organized in 5 sets according to similarity in writing style. Furthermore, we compared our results with those published in [4]. Table II details the results, where "O", "U", and "S" stand for over-segmentation error, under-segmentation error, and completely successful segmentation, respectively. Overall, our approach achieved a successful-segmentation rate of approximately 69.8%, compared to 54.6% for [4]. The overall recognition performance is detailed in Table I, where "B", "E", "I", and "M" indicate the various forms a letter may take (beginning, end, isolated, and middle; see Subsection II-C), and "D" and "U" denote the cases in which a letter appears with a "Hamza" below or above, respectively. "X" marks a form that does not exist for the corresponding letter.
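As a quick arithmetic check, the "Avg." row of Table II can be recomputed from the per-set percentages. The dictionary layout and the function name are only illustrative; the numbers are copied from Table II.

```python
from typing import Dict, List

# Per-set percentages from Table II (sets S1..S5).
OURS: Dict[str, List[int]] = {
    "O": [12, 15, 14, 9, 8],
    "U": [18, 20, 21, 13, 21],
    "S": [70, 65, 65, 78, 71],
}
XIU: Dict[str, List[int]] = {
    "O": [14, 20, 23, 17, 20],
    "U": [31, 28, 26, 25, 23],
    "S": [55, 52, 51, 58, 57],
}


def column_averages(table: Dict[str, List[int]]) -> Dict[str, float]:
    """Mean of each column over the five sets, rounded to one decimal place."""
    return {name: round(sum(values) / len(values), 1) for name, values in table.items()}
```

Both sets of averages agree with the values printed in the last row of the table.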

Around 1000 word images, which resulted in 5436 letter images after segmentation, were used to train an SVM classifier, and a leave-one-out estimation method was used for testing. Recognition rates ranged from 93% down to 43%, with a median of 77% and an average of 74.3%. One letter in its isolated form (I), for example, achieved a high rate of 93%. This can be attributed, first, to its being a very frequent letter in Arabic script and hence represented by a large number of samples in the training set and, second, to its distinctive, simple shape, which allows the extraction of more representative features. Another letter, in its end form (E), achieved a rate of 93% despite being less frequent in the script. The reason is the pre-classification step explained in Subsection II-C, which groups the letter with just one other letter, for a total of only 8 classes, and in turn improves its recognition rate significantly.
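The leave-one-out estimate mentioned above can be sketched generically: each sample is held out once, a classifier is trained on the rest, and the held-out sample is classified. The nearest-neighbor classifier below is only a lightweight stand-in for the SVM used in the paper, and all function names are hypothetical.

```python
from typing import Any, Callable, List, Sequence


def leave_one_out_accuracy(samples: List[Sequence[float]], labels: List[Any],
                           train: Callable, predict: Callable) -> float:
    """Fraction of samples classified correctly when each is held out in turn."""
    correct = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        model = train(train_x, train_y)
        if predict(model, samples[i]) == labels[i]:
            correct += 1
    return correct / len(samples)


def train_1nn(xs: List[Sequence[float]], ys: List[Any]) -> list:
    """Stand-in classifier: a 1-nearest-neighbor 'model' is just the stored data."""
    return list(zip(xs, ys))


def predict_1nn(model: list, x: Sequence[float]) -> Any:
    """Label of the stored sample closest to x in squared Euclidean distance."""
    return min(model, key=lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], x)))[1]
```

With 5436 letter images this procedure trains 5436 classifiers, which is why leave-one-out is usually reserved for moderately sized data sets.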

The low recognition rate of some other letters (e.g., certain letters in their B and M forms) is caused by their similarity in shape to multiple other letters and by the fact that those letters belong to a group (1CC) with a high number of classes (25 classes), which in turn degrades the recognition rate, given the direct relationship between the number of classes and the recognition error.

IV. CONCLUSION AND FUTURE WORKS

In this paper, we proposed a segmentation-based, Gabor-filter-based approach for the recognition of Arabic handwriting. The results were satisfactory: about 70% of the word images were segmented correctly, and 74% of the segmented letter images were recognized. We found that a simple pre-classification step improves the recognition rate significantly; we therefore plan to invest more effort in improving the pre-classification stage so that letters are distributed into groups with fewer classes, which may improve the results further.

REFERENCES

[1] H. Almuallim and S. Yamaguchi, “A method of recognition of Arabic cursive handwriting,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 9, pp. 715–722, September 1987.

[2] B. Bushofa, “Segmentation and recognition of Arabic characters by structural classification,” Image and Vision Computing, vol. 15, no. 3, pp. 167–179, 1997.

[3] I. S. I. Abuhaiba, M. J. J. Holt, and S. Datta, “Recognition of off-line cursive handwriting,” Computer Vision and Image Understanding, vol. 71, no. 1, pp. 19–38, 1998.

[4] Pingping Xiu, Liangrui Peng, Xiaoqing Ding, and Hua Wang, Offline Handwritten Arabic Character Segmentation with Probabilistic Model, pp. 402–412, Number project 60472002. Springer, 2006.

[5] L. M. Lorigo and V. Govindaraju, “Offline Arabic handwriting recognition: a survey,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 28, no. 5, pp. 712–724, May 2006.

[6] Sriganesh Madhvanath and Venu Govindaraju, “The role of holistic paradigms in handwritten word recognition,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 23, no. 2, pp. 149–164, Feb. 2001.

[7] Badr H. Al-Badr, A segmentation-free approach to text recognition with application to Arabic text, Ph.D. thesis, Seattle, WA, USA, 1995, UMI Order No. GAX95-37297.

[8] M. Pechwitz and V. Maergner, “HMM based approach for handwritten Arabic word recognition using the IFN/ENIT - database,” in Document Analysis and Recognition, 2003. Proceedings. Seventh International Conference on, Aug. 2003, pp. 890–894.

[9] R. Al-Hajj Mohamad, L. Likforman-Sulem, and C. Mokbel, “Combining slanted-frame classifiers for improved HMM-based Arabic handwriting recognition,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 31, no. 7, pp. 1165–1177, July 2009.

[10] Yoshihiko Hamamoto, Shunji Uchimura, Masanori Watanabe, Tetsuya Yasuda, Yoshihiro Mitani, and Shingo Tomita, “A Gabor filter-based method for recognizing handwritten numerals,” Pattern Recognition, vol. 31, no. 4, pp. 395–400, 1998.

[11] Jin Chen, Huaigu Cao, Rohit Prasad, Anurag Bhardwaj, and Prem Natarajan, “Gabor features for offline Arabic handwriting recognition,” in Proceedings of the 9th IAPR International Workshop on Document Analysis Systems, New York, NY, USA, 2010, DAS ’10, pp. 53–58, ACM.

[12] S. Haboubi, S. Maddouri, N. Ellouze, and H. El-Abed, “Invariant primitives for handwritten Arabic script: A contrastive study of four feature sets,” in Document Analysis and Recognition, 2009. ICDAR ’09. 10th International Conference on, July 2009, pp. 691–697.

[13] A. Alper Atici and Fatos T. Yarman-Vural, “A heuristic algorithm for optical character recognition of Arabic script,” Signal Processing, vol. 62, no. 1, pp. 87–99, 1997.

[14] Chih-Chung Chang and Chih-Jen Lin, “LIBSVM: A library for support vector machines,” ACM Transactions on Intelligent Systems and Technology, vol. 2, pp. 27:1–27:27, 2011, Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
