[IEEE 2007 Third International IEEE Conference on Signal-Image Technologies and Internet-Based Systems]


Global Binarization of Document Images Using a Neural Network

Adnan Khashman
Electrical and Electronic Engineering
Near East University
Lefkosa, Mersin 10, Turkey
[email protected]

Boran Sekeroglu
Computer Engineering
Near East University
Lefkosa, Mersin 10, Turkey
[email protected]

Abstract

In degraded scanned documents, where considerable background noise or variation in contrast and illumination exists, pixels may not be easily classified as foreground or background pixels. Hence the need to perform document binarization in order to enhance the document image by separating the foreground (text) from the background. A new approach that combines a global thresholding method and a supervised neural network classifier is proposed to enhance scanned documents and to separate foreground and background layers. Thresholding is first applied using Mass-Difference thresholding to obtain various local optimum threshold values in an image. The neural network is then trained using these values at its input and a single global optimum threshold value for the entire image at its output. Compared with other methods, experimental results show that this combined approach is computationally cost-effective and is capable of enhancing degraded documents with superior foreground and background separation results.

1. Introduction

Document images, especially historical and handwritten document images, generally carry various levels of noise, which causes poor separation of the layers because of the influences of age, paper, pen and pencil on the documents [1] [2] [3]. One of the simplest and yet most efficient image processing techniques that can be used to separate the foreground and background layers of document images is thresholding. This is based on the assumption that object and background layers in the image can be distinguished by their gray level values [4]. Thresholding methods either consider a whole image (global thresholding), or divide an image into kernels to determine individual threshold values for each kernel (local thresholding). Several comparisons have been performed to evaluate an optimum thresholding method for document analysis [5] [6] [7]. The outcome of two comparative studies [5] [6] suggested that the Otsu method [8], the Niblack method [9], and the Kittler-Illingworth method [4] showed superior results in enhancing document images in comparison with other methods. More recently, the work in [7] investigated several efficient thresholding algorithms, including the above three methods, and proposed the Mass-Difference (MD) thresholding method for the enhancement of document images, which was shown to be superior to the previous methods.

While the above conventional enhancing and cleaning filters produce various efficient results, their algorithms can be complex and computationally expensive. The use of neural networks for such a purpose has been proposed in many recent works. Fu and Chi [10] combined a thresholding method and an artificial neural network classifier to extract leaf veins. Badekas and Papamarkos [11] proposed an integrated system for the binarization of normal and degraded printed documents for the purpose of visualization and recognition of text characters. Yang et al. [12] suggested using a text image file segmentation method and a neural network for document processing. Hidalgo et al. [13] trained a multi-layer perceptron using noisy document images to produce enhanced document images. However, these works dealt in particular with noisy handwritten document images or non-document images, and did not investigate the binarization of degraded historical documents.

This paper presents a novel approach to scanned document enhancement. Here, we combine Mass-Difference (MD) thresholding and supervised neural network arbitration for selecting a single global optimum threshold value, which is then used to binarize the degraded document image. MD thresholding, which is a global thresholding method [7], is first applied to segments of the document image, producing a number of different "local" optimum threshold values. The variation in these values is influenced by the degradation in the document, where considerable background noise or variation in contrast and illumination exists. The supervised neural network, which uses the back-propagation learning algorithm, is trained using the local optimum threshold values as its input patterns, and a single global optimum threshold value as its output target. The single global threshold value is calculated prior to training by applying MD thresholding to the entire document image without any segmentation. Our hypothesis is that there exists a nonlinear relationship between the local and global optimum threshold values of an image, and thus the selection of the global threshold value can be effectively made by a trained neural network. The advantage of using a neural network for the selection of a global thresholding value is the reduction of the complexity of the conventional thresholding methods. The proposed system is implemented using several historical, handwritten and specially designed words that contain letters with different gray levels. Furthermore, our combined method will be compared with three efficient and well-known thresholding methods, namely the Otsu, Kittler and Illingworth, and Niblack thresholding methods. The evaluation of our implementation and comparison uses the Peak Signal-to-Noise Ratio (PSNR) and visual inspection of the binarized document images.

Third International IEEE Conference on Signal-Image Technologies and Internet-Based System
978-0-7695-3122-9/08 $25.00 © 2008 IEEE. DOI 10.1109/SITIS.2007.58

The paper is organized as follows: Section 2 introduces our proposed combined method and describes the document image database. Section 3 presents the performed experiments and the results evaluation methods. A comparison is also drawn in this section between our combined method and three other thresholding methods. Finally, Section 4 concludes this paper and suggests further work.

2. The Proposed Combined Method for Document Binarization

The combined method implementation comprises two phases: an image pre-processing phase and a neural network arbitration phase. The first phase uses MD thresholding and yields a set of values for each document image, comprising input patterns (MDL) and an output value of the normalized global optimum threshold (MDG). The second phase uses a supervised neural network classifier based on the back-propagation learning algorithm and aims at selecting a single global optimum threshold value for the document image. This learning algorithm is chosen due to its implementation simplicity and the availability of a sufficient database for training and generalization.

2.1 Image Database

The database consists of 100 scanned historical documents, handwritten documents, and specially created text. These are organized into three sets:

The first set comprises 75 scanned historical documents which contain a total of 2321 words with different contrasts and brightness. The first set will be used for both training and testing the neural network. The second set comprises 5 specially created words which contain a total of 45 characters with different backgrounds and different grayscales. This second set will be used only for further experiments to test the trained neural network. The third set is also used only for testing the trained neural network and comprises 20 scanned handwritten documents. These documents were prepared in our laboratory by scanning the handwriting of 5 different persons, using two different writing tools (pencil and board-marker) on two different paper types (white paper and yellow envelope paper). Examples of the image sets can be seen in Fig. 1.

2.2 MD Thresholding

This is a global single-stage thresholding method that finds the optimum threshold value using the global maxima (highest pixel value) and the mass average (mean of the intensities) of an image. The relationship between pixel values of grayscale images provides a threshold point for the foreground and the background of the image. The highest pixel value represents the global maxima of the image. The averaged pixel values of a whole image represent the mass average (mean of intensities) of the image.

MD is different from the background-symmetry algorithm [14], which assumes a distinct and dominant peak for the background that is symmetric about its maximum. This maximum peak is found by searching for the maximum value in the histogram, whereas in MD the maximum value is the highest pixel value within the image. MD uses the deviations between the mass average and the global maxima. The mass of an image can be defined as:

M = ( Σ_{y=0}^{dimy} Σ_{x=0}^{dimx} I[x, y] ) / (dimy × dimx)   (1)

where M represents the mass of the image, dimx and dimy denote the x and y dimensions of the image respectively, and I represents the original grayscale image. The global maxima, or maximum brightness, of an image is defined in equation (2) as the maximum gray level value of the original grayscale image.

Gmax = Fmax(I) (2)

where Fmax(I) is the function that determines the maximum gray level value within the image.

After the calculation of the mass and the global maxima, the Local Deviation (D) of the Mass (M) from the Global Maxima (Gmax) is defined as:

D = Gmax − M   (3)


Figure 1. Image Set Examples: (a) Historical document image in Set 1, (b) Created word document image in Set 2, and (c) Handwritten document image in Set 3

The Total Deviation (T), which represents the optimum threshold value, is defined in equation (4) as the absolute difference between the mass of the image and the local deviation. The absolute value is taken to avoid negative threshold values in cases where the mass of the image is smaller than the local deviation.

T = |M − D|   (4)

The binarized image MI is then obtained by applying the threshold T to the original image:

MI(x, y) = 0 if I(x, y) ≤ T, 255 otherwise   (5)
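As a minimal sketch of equations (1) to (5), assuming NumPy and function names of our own choosing, MD thresholding and binarization can be written as:

```python
import numpy as np

def md_threshold(image):
    """Mass-Difference (MD) optimum threshold of a grayscale image.

    Implements equations (1)-(4): mass M (mean intensity), global
    maxima Gmax, local deviation D = Gmax - M, total deviation T.
    """
    m = image.mean()          # mass M, eq. (1)
    g_max = image.max()       # global maxima Gmax, eq. (2)
    d = g_max - m             # local deviation D, eq. (3)
    return abs(m - d)         # total deviation T, eq. (4)

def md_binarize(image):
    """Binarize per eq. (5): 0 where I(x, y) <= T, 255 elsewhere."""
    t = md_threshold(image)
    return np.where(image <= t, 0, 255).astype(np.uint8)
```

For instance, a 2 × 2 image with intensities [[100, 200], [50, 250]] gives M = 150, Gmax = 250, D = 100 and T = 50, so only the 50-valued pixel is mapped to background (0).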

2.3 Image Preprocessing Phase

During this phase, scanned document images are processed in order to obtain the training and generalization data required by the neural network in the second phase. The extraction of meaningful patterns when preparing the neural network input data is vital and can make the difference between efficient learning and pure memorization of input data. However, implementing neural networks using images can be computationally expensive, hence the need for careful extraction of meaningful patterns while keeping the computational cost to a minimum.

In order to achieve the above objectives, images of 75 scanned degraded historical documents from the database are resized to a uniform size of 512 × 512 pixels. Each of these gray level document images is then segmented using a 32 × 32 kernel. MD thresholding is then applied twice: firstly, to the entire image prior to segmentation in order to obtain the global threshold value (MDG), which is the output target value for the neural network; secondly, to each segment within the image in order to obtain the local threshold values (MDL), which form the input data for the neural network. Using 32 × 32 pixel segments yields 256 local threshold values (MDL) for each document image. A block diagram of the image pre-processing phase can be seen in Fig. 2.
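The pre-processing step above (a 512 × 512 image, one MD threshold per 32 × 32 kernel plus one global value) can be sketched as follows; the function names and the use of NumPy are our own assumptions:

```python
import numpy as np

def md_threshold(block):
    """MD threshold T = |M - (Gmax - M)| of a gray-level array."""
    m = block.mean()
    return abs(m - (block.max() - m))

def preprocess(image, kernel=32):
    """Return (MDL, MDG) for one document image.

    MDL: the (512/32)^2 = 256 local thresholds from the 32x32
         kernels, used as the network's input pattern.
    MDG: the single global threshold of the whole image, used as
         the network's output target (normalized to 0..1).
    """
    h, w = image.shape                      # assumed already 512x512
    mdl = np.array([md_threshold(image[y:y + kernel, x:x + kernel])
                    for y in range(0, h, kernel)
                    for x in range(0, w, kernel)])
    mdg = md_threshold(image) / 255.0       # normalized target
    return mdl, mdg
```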

2.4 Neural Network Arbitration Phase

The second phase of the combined method for document enhancement is training the supervised neural network to associate the input patterns containing local threshold values with a single optimum global threshold value. Once the network converges, or learns, this phase will only comprise generalizing the trained neural network using one forward pass. The neural network is trained using only 25 different scanned documents from the first image set. This leaves 50 documents from this set, 5 images from Set 2, and 20 images from Set 3 for testing the trained neural network. In summary, training the neural network uses 25 patterns, and testing it uses 75 patterns that we do not expose to the network during training. This will demonstrate the robustness of the trained neural network in determining global optimum threshold values for document binarization.

The neural network consists of an input layer with 256 neurons receiving the local threshold values, and one hidden layer with 22 neurons, which assures meaningful training while keeping the time cost to a minimum. The choice of 22 neurons in the hidden layer was a result of various training experiments. The output layer has only one neuron, which yields normalized values (0 to 1) representing the optimum global threshold value. Bias neurons are used to improve the modeling capacity of the neural network. Throughout the learning phase, the learning coefficient and the momentum rate were adjusted several times in various experiments in order to achieve the required minimum error value of 0.00017, which was considered sufficient for this application. Fig. 3 shows the topology of the neural network within a block diagram that illustrates the training phase, whereas Table 1 lists the final parameters of the trained neural network.

Figure 2. Block diagram of the image pre-processing phase in the combined method
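A minimal sketch of the 256-22-1 back-propagation network described above, using the paper's learning coefficient of 0.0091; the sigmoid activation, the omission of the momentum term, and all function names are our assumptions, as the paper does not specify them:

```python
import numpy as np

rng = np.random.default_rng(0)

# 256 input neurons, 22 hidden neurons, 1 output neuron, plus biases
W1, b1 = rng.normal(scale=0.1, size=(22, 256)), np.zeros(22)
W2, b2 = rng.normal(scale=0.1, size=(1, 22)), np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x):
    """One forward pass: 256 local thresholds in, one value (0..1) out."""
    h = sigmoid(W1 @ x + b1)
    y = sigmoid(W2 @ h + b2)
    return h, y

def train_step(x, target, lr=0.0091):
    """One back-propagation update; returns the squared output error."""
    global W1, b1, W2, b2
    h, y = forward(x)
    err = y - target
    d2 = err * y * (1.0 - y)              # output-layer delta
    d1 = (W2.T @ d2) * h * (1.0 - h)      # hidden-layer delta
    W2 -= lr * np.outer(d2, h); b2 -= lr * d2
    W1 -= lr * np.outer(d1, x); b1 -= lr * d1
    return float(err[0] ** 2)
```

After convergence, generalization is a single call to `forward`, which matches the one-forward-pass run time reported in Table 1.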

Table 1. Neural Network Final Training Parameters

Parameter                 Value
Input Nodes               256
Hidden Nodes              22
Output Nodes              1
Learning Coefficient      0.0091
Momentum Rate             0.85
Error                     0.00017
Iterations                7177
Training Time (seconds)   102
Run Time (seconds)        0.03

3. Experimental Results

The experiments involved applying four methods to binarize the scanned documents in the database. The four methods were: the proposed combined method, Otsu thresholding, Kittler and Illingworth thresholding, and Niblack thresholding.

The implementation of the combined method has two phases: firstly, the image pre-processing phase is carried out prior to training and/or testing the neural network. In the second phase, training the neural network is performed only once. When learning is achieved, the second phase consists only of running the trained neural network with one forward pass using the final training weights.

The neural network learnt and converged after 7177 iterations and within 102 seconds. The processing time for the trained neural network using one forward pass, in addition to the image pre-processing phase, was a fast 0.030 seconds for each image. The processing times of similar implementations using the Otsu, Kittler and Illingworth, and Niblack methods were computed as 0.68, 0.66 and 0.025 seconds, respectively.

3.1 Results Evaluation Methods

In order to evaluate the obtained results when applying the four considered methods, we use visual inspection of the enhanced documents, in addition to two metrics which we derived using the Peak Signal-to-Noise Ratio (PSNR) of the enhanced images.

Visual inspection of the enhanced documents was performed by 15 independent human analyzers, who were asked to consider the clarity and readability of the words within Set 1 documents, noise occurrence and continuity of characters to determine clear characters within Set 2 documents, and clearly recognized readable characters within handwritten words in Set 3 documents. The general results of visual inspection were categorized as: recognized or unrecognized words for Set 1, clear or unclear characters for Set 2, and recognized or unrecognized characters for Set 3. We believe that this method of evaluation is necessary, as one of our objectives is to provide clearly binarized document images that can improve human readability of degraded documents.

Figure 3. Neural network topology and the training phase

Using the PSNR of the enhanced images, we derived two metric parameters for each of the four binarization methods used in our experiments. These parameters are the Average PSNR Accuracy Rate (APAR) and the Average PSNR Deviation (APD) of the reconstructed images.

The average PSNR accuracy rate (APAR) for a particular method is calculated by considering the maximum PSNR value obtained using the four binarization methods, and the PSNR value obtained using only that particular method for a test image. The higher the APAR value, the more efficient the method.

The average PSNR deviation (APD) for a particular method is calculated by taking the difference between the maximum PSNR value obtained using the four methods and the PSNR value obtained using only that particular method for a test image; the differences are then summed and divided by the total number of test images. The lower the APD value, the more efficient the method. APAR and APD are defined as follows:

APARm = ( Σ_{i=1}^{x} (PSNRmi × 100) / max(PSNRi) ) / x   (6)

APDm = ( Σ_{i=1}^{x} (max(PSNRi) − PSNRmi) ) / x   (7)

where APARm is the average PSNR accuracy rate for method m, APDm is the average PSNR deviation for method m, PSNRmi denotes the PSNR value obtained for enhanced image i using enhancement method m, max(PSNRi) denotes the maximum PSNR value of enhanced image i obtained using the four methods, and x is the total number of test images.

Considering only the PSNR values of the enhanced images when evaluating the results is not always effective when comparing various thresholding methods on such a diverse document image database. This is because a particular method may produce high PSNR values for a reasonable number of images while producing low PSNR values for the rest, thus making it difficult to determine the stability and efficiency of the methods. Therefore, we derived the APAR and APD metrics, which provide a uniform indication of the efficiency of the compared methods.
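Under the definitions in equations (6) and (7), the two metrics can be sketched as follows; the function name and data layout are illustrative, not from the paper:

```python
import numpy as np

def apar_apd(psnr):
    """APAR and APD per equations (6) and (7).

    psnr maps each method name to an array of PSNR values, one per
    test image, in the same image order for every method.
    """
    best = np.max([v for v in psnr.values()], axis=0)  # max(PSNR_i)
    apar = {m: float(np.mean(v * 100.0 / best)) for m, v in psnr.items()}
    apd = {m: float(np.mean(best - v)) for m, v in psnr.items()}
    return apar, apd
```

For example, with two methods scoring [10, 20] dB and [20, 10] dB on two images, each method gets APAR = 75 % and APD = 5, showing how the metrics average per-image comparisons rather than raw PSNR.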

3.2 Implementation Results and Comparisons

The first method of evaluation was based on visual inspection. Tables 2, 3 and 4 show the obtained results for Set 1, Set 2 and Set 3, respectively. The highest recognition rates were achieved using our combined method, which demonstrates its efficiency in enhancing documents when compared to the other well-known conventional methods.

In Set 1, the Niblack and Otsu methods showed successful results in enhancing the degraded historical documents.


Figure 4. Example results of Image Set 1: (a) Original image, (b) Otsu method, (c) Proposed method, (d) Niblack method, (e) Kittler and Illingworth method

Table 2. Results of Test Set 1

Thresholding Method       Total Words   Recognized Words   Unrecognized Words   Recognition Rate
Otsu                      2321          1948               373                  83.93 %
Kittler and Illingworth   2321          1500               821                  64.63 %
Niblack                   2321          1977               344                  85.18 %
Combined                  2321          1998               323                  86.08 %

Table 3. Results of Test Set 2

Thresholding Method       Total Chars.   Recognized Chars.   Unrecognized Chars.   Recognition Rate
Otsu                      45             40                  5                     88.89 %
Kittler and Illingworth   45             45                  0                     100 %
Niblack                   45             37                  8                     82.22 %
Combined                  45             45                  0                     100 %

Table 4. Results of Test Set 3

Thresholding Method       Total Characters   Recognized Characters   Unrecognized Characters   Recognition Rate
Otsu                      240                240                     0                         100 %
Kittler and Illingworth   240                213                     27                        88.75 %
Niblack                   240                211                     29                        87.92 %
Combined                  240                240                     0                         100 %


Figure 5. Example results of Image Set 3: (a) Original image, (b) Otsu method, (c) Proposed method, (d) Niblack method, (e) Kittler and Illingworth method

However, both the Niblack and Otsu methods over-threshold the images, which causes minor loss of information. The Kittler and Illingworth method achieved the lowest rate in this set, as it added noise to the document images. Fig. 4 shows an example from Test Set 1. In Set 2, the Kittler and Illingworth method produced results as efficient as our combined method in enhancing the created words. This is due to the uniform illumination of the background within the images in this set. On the other hand, the Niblack and Otsu methods caused loss of certain characters within the words.

In Set 3, the Otsu method produced results as efficient as our combined method in enhancing the handwritten documents, whereas the Niblack and the Kittler and Illingworth methods failed to enhance some of the handwritten characters. This is due to the non-uniform background illumination of the documents in this set. Moreover, the efficiency of the Niblack method depends on the mask size; however, there is no exact rule to define the mask size. A small mask size adds noise to the documents, and larger mask sizes cause some loss of information. Fig. 5 shows examples of the experiments on the handwritten document set.

The second method in our evaluation used the two metrics (APAR and APD), which reflect the enhancement performance of each of the methods used in the experiments while considering the three image sets altogether. Table 5 shows the run time required for each method and the obtained results, where the highest APAR value was achieved using our combined method; this was closely followed by the Otsu and Niblack methods. The lowest APD value was also achieved using our combined method, followed by the Niblack method.

Table 5. Metric Evaluation Results

Method                    APAR      APD    Run Time
Otsu                      93.41 %   1.03   0.68 s
Kittler and Illingworth   91.92 %   1.23   0.66 s
Niblack                   93.99 %   0.99   0.025 s
Combined                  94.72 %   0.83   0.030 s

4. Conclusions

This paper presented a different approach to the determination of an optimum global thresholding value that can be used for the binarization of document images. The novelty in this paper is the use of a neural network to determine the single optimum value. The neural network is trained using local thresholding values of images of degraded, historical and created documents at its input, and predetermined global thresholding values at its target output. Both local and global training values are obtained using the conventional MD global thresholding method. The advantage of using a neural network rather than a direct conventional global thresholding method is to reduce the complexity of the thresholding algorithm and, therefore, the computational expense. Additionally, this work assumes that there is a non-linear relationship between the local thresholding values and the single global thresholding value of an image, and proposes the use of a supervised neural network to model this relationship.

The proposed combined method can be efficiently applied to separate the foreground and background layers of scanned document images, thus providing efficient binarization and clear enhancement of degraded document images. The combined method has two phases: firstly, image pre-processing, where local thresholding values are obtained using the MD thresholding method; these values are used as the input data for the second phase. Secondly, the neural network determines the single optimum global thresholding value.

The combined method was compared with three other known thresholding methods, namely the Otsu, Kittler and Illingworth, and Niblack methods. Experimental results show that the combined method is computationally inexpensive and outperforms the other methods in enhancing degraded documents, with superior foreground and background separation results. The combined method has not been compared to the MD thresholding method, since the latter is implemented as part of the combined method in its first phase, albeit locally rather than globally.

The proposed combined method and the three other thresholding methods were implemented using several degraded historical, handwritten and specially designed words that contained letters with different gray levels. The implementation results of the four methods were evaluated using visual inspection and two metrics that we derived based on peak signal-to-noise ratio (PSNR) values.

Using the evaluation results, a comparison between the combined method and the other three methods was drawn. The other three methods showed success in enhancing some of the documents in the database, but failed to clearly enhance other documents, either by over-thresholding, thus causing loss of information, or by adding noise to the document. This is due to the existence of uniform and non-uniform background illumination in the various image sets, which affected their performance.

The capability of our combined method in enhancing the various documents in all image sets also suggests that artificial neural networks can be successfully used as part of a comprehensive system for enhancing scanned documents. Further research includes investigating the application of our combined method to enhance magnetic resonance imaging scans.

References

[1] Y. Zheng, H. Li, and D. Doermann, "Machine Printed Text and Handwriting Identification in Noisy Document Images", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 3, pp. 337-353, 2004.

[2] E. Kavallieratou and H. Antonopoulou, "Cleaning and Enhancing Historical Document Images", Lecture Notes in Computer Science, vol. 3708, Springer-Verlag, Berlin Heidelberg New York, pp. 681-688, 2005.

[3] B. Gatos, I. Pratikakis, and S.J. Perantonis, "An Adaptive Binarization Technique for Low Quality Historical Documents", Lecture Notes in Computer Science, vol. 3163, Springer-Verlag, Berlin Heidelberg New York, pp. 102-113, 2004.

[4] J. Kittler and J. Illingworth, "Minimum Error Thresholding", Pattern Recognition, vol. 19, no. 4, pp. 41-47, 1986.

[5] O. D. Trier and A. K. Jain, "Goal-Directed Evaluation of Binarization Methods", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 17, pp. 1191-1201, 1995.

[6] M. Sezgin and B. Sankur, "Survey Over Image Thresholding Techniques and Quantitative Performance Evaluation", Journal of Electronic Imaging, vol. 13, no. 1, pp. 146-168, 2004.

[7] A. Khashman and B. Sekeroglu, "Novel Thresholding Method for Document Analysis", IEEE International Conference on Industrial Technology ICIT 2006, pp. 616-620, 2006.

[8] N. Otsu, "A Threshold Selection Method from Gray-Level Histograms", IEEE Transactions on Systems, Man, and Cybernetics, vol. 9, pp. 62-66, 1979.

[9] W. Niblack, An Introduction to Digital Image Processing, Prentice Hall, pp. 115-116, 1986.

[10] H. Fu and Z. Chi, "Combined Thresholding and Neural Network Approach for Vein Pattern Extraction from Leaf Images", IEE Proc. Vis. Image Signal Process., vol. 153, no. 6, pp. 881-892, 2006.

[11] E. Badekas and N. Papamarkos, "Optimal Combination of Document Binarization Techniques Using a Self-Organizing Map Neural Network", Engineering Applications of Artificial Intelligence, vol. 20, no. 1, pp. 11-24, 2007.

[12] Y. Yang, K. Summers, and M. Turner, "A Text Image Enhancement System Based on Segmentation and Classification Method", Proceedings of the 1st ACM Workshop on Hardcopy Document Processing, ACM Press, New York, NY, USA, pp. 33-40, 2004.

[13] J.L. Hidalgo, S. Espana, M.J. Castro, and J.A. Perez, "Enhancement and Cleaning of Handwritten Data by Using Neural Networks", Lecture Notes in Computer Science, vol. 3522, Springer-Verlag, Berlin Heidelberg New York, pp. 376-383, 2005.

[14] R.C. Gonzalez and R.E. Woods, Digital Image Processing, Addison-Wesley, Reading, Massachusetts, 2002.
