

Document binarisation using Kohonen SOM

E. Badekas and N. Papamarkos

Abstract: An integrated system for the binarisation of normal and degraded printed documents for the purpose of visualisation and recognition of text characters is proposed. In degraded documents, where considerable background noise or variation in contrast and illumination exists, there are many pixels that cannot be easily classified as foreground or background pixels. For this reason, it is necessary to perform document binarisation by combining and taking into account the results of a set of binarisation techniques, especially for document pixels that have high vagueness. The proposed binarisation technique takes advantage of the benefits of a set of selected binarisation algorithms by combining their results using a Kohonen self-organising map neural network. To further improve the binarisation results, significant improvements are proposed for two of the most powerful document binarisation techniques used, namely the adaptive logical level technique and the improvement of integrated function algorithm. The proposed binarisation technique is extensively tested with a variety of degraded documents. Several experimental and comparative results, demonstrating the performance of the proposed technique, are presented.

1 Introduction

In general, scanned documents include text, line-drawing and graphics regions and can be considered as mixed-type documents. In many practical applications, we need to recognise or improve mainly the text content of the documents. In such cases, it is preferable to convert the documents into binary form in a way that transforms the text regions into a suitable binary representation. In doing so, we can recognise, store, retrieve and transmit the documents more efficiently than the original grey-scale ones. This procedure is even more advantageous in the case of degraded documents. For many years, the binarisation of grey-scale documents was based on standard bilevel techniques, also called global thresholding algorithms [1–6]. These statistical methods, which can be considered as clustering approaches, are suitable for converting any grey-scale image into binary form but are inappropriate for complex documents and, even more so, for degraded documents. In these special cases, it is important to take into account the nature and the spatial structure of the document images. Based on this assumption, specialised binarisation techniques have been developed for complex document images.

In a first category, local thresholding techniques have been proposed for document binarisation. These techniques estimate a different threshold for each pixel according to the grey-scale information of the neighbouring pixels. The techniques of Bernsen [7], Chow and Kaneko [8], Eikvil [9], Mardia and Hainsworth [10], Niblack [11], Taxt [12], Yanowitz and Bruckstein [13] and Sauvola and Pietikainen [14, 15] belong to this category. The hybrid techniques, which combine information from global and local thresholds, belong to another category. The most famous techniques in this category are the methods of O'Gorman [16] and Liu and Li [17].

© The Institution of Engineering and Technology 2007

doi:10.1049/iet-ipr:20050311

Paper first received 10th October 2005 and in final revised form 7th July 2006

The authors are with the Image Processing and Multimedia Laboratory, Department of Electrical and Computer Engineering, Democritus University of Thrace, Xanthi 67100, Greece

E-mail: [email protected]

IET Image Process., 2007, 1, (1), pp. 67–85

For document binarisation, probably the most powerful techniques are those that take into account not only the image grey-scale values, but also the structural characteristics of the characters. Techniques that are based on stroke analysis, such as the stroke width (SW) and character geometry properties, belong to this category of document binarisation. The most powerful techniques in this category are the logical level technique (LLT) [18] and its improved adaptive logical level technique (ALLT) [19], and the integrated function algorithm technique (IFA) [20] and its advanced 'improvement of integrated function algorithm' (IIFA) [21]. Recently, Papamarkos [22] proposed a new neuro-fuzzy technique for binarisation and grey-level (or colour) reduction of mixed-type documents. In this technique, a neuro-fuzzy classifier is fed not only with the image pixel values, but also with additional spatial information extracted in the neighbourhood of the pixels.

Despite the existence of all these binarisation techniques, the evaluations that have been made [23–26] prove that there is no single technique that can be applied effectively to all types of digital documents. Each of them has its own advantages, but also disadvantages.

The proposed binarisation system takes advantage of binarisation results obtained by a set of the most powerful binarisation techniques from all categories. These techniques are incorporated into one system and are considered as components of it. We have included document binarisation techniques that gave the highest scores in the evaluation tests that have been made so far. Trier and Taxt [23] found that Niblack's and Bernsen's techniques are the fastest of the best performing binarisation techniques.


Furthermore, Trier's evaluation tests [24] identify Niblack's technique, with a post-processing step, as the best. Kamel and Zhao [18] use six evaluation aspects (subjective evaluation, memory, speed, SW restriction, number of parameters of each technique) to evaluate and analyse seven character/graphics extraction based binarisation techniques. The best technique in this test is the LLT. Recently, Sezgin and Sankur [26] compared 40 binarisation techniques and concluded that the local-based techniques of Sauvola and Pietikainen [14, 15], as well as the technique of White and Rohrer (IIFA) [21], are the best performing document binarisation techniques. In addition to the above techniques, we include in the proposed system the powerful global thresholding technique of Otsu [1] and the fuzzy C-means (FCM) technique [6].

The main aim of the proposed document binarisation technique is to build a system that takes advantage of the benefits of a set of selected binarisation techniques by combining their results. This is important especially for the fuzzy pixels of the documents, that is, for the pixels that cannot be easily classified. The techniques incorporated in the proposed system are the following: Otsu [1], FCM [6], Bernsen [7], Niblack [11], Sauvola and Pietikainen [14, 15] and improved versions of ALLT [19] and IIFA [21]. It is noticed that, in most cases, the simultaneous application of all binarisation techniques leads to satisfactory results due to their complementarity. However, according to the type of document images and degradation, and in order to decrease the computational cost, only a subset of these techniques may be combined. In order to combine the binarisation results of the selected independent binarisation techniques (IBT), the Kohonen self-organising map (KSOM) neural network [27–30] is used as the final stage. We choose the KSOM neural network because it is fast, has guaranteed convergence and, owing to its use of a spatial neighbourhood, differs beneficially from other competitive neural networks. Specifically, the neural network classifier is fed with the binarisation results obtained from the IBT. After the training stage, the output neurons specify the classes obtained. Then, using a mapping procedure, these classes are categorised as classes of foreground and background pixels.

As mentioned above, two of the most powerful document binarisation techniques are the ALLT and the IIFA. Especially for these two techniques, we propose significant improvements that make them even more powerful. Specifically, to overcome the drawbacks of the ALLT, which are associated mainly with the extraction of the characters' SWs by considering the image as a whole, we propose a local SW extraction technique. For this reason, the document image is divided into N × M non-overlapping sub-images and the SWs are then locally extracted in each sub-image. To do this, a linear histogram approximation (LHA) procedure is applied [31, 32], which estimates the modality of the local histograms, that is, it specifies whether the histogram of each sub-image can be considered as bilevel or not. In the sub-images with bilevel histograms, the SWs are accurately determined. Next, applying an interpolation procedure, the SWs of the non-bilevel sub-images are suitably estimated. In the stage of run-length histogram construction, the Otsu binarisation technique is applied locally.

For the IIFA, the proposed improvements refer to: (a) the automatic calculation of the proper value for the threshold TA; (b) the application of a threshold value that controls the filling of closed regions. These improvements give better binarisation results, especially for documents having small-size characters and badly illuminated regions.

The proposed binarisation system was extensively tested using a variety of documents, most of which come from the old Greek Parliamentary Proceedings and from the Mediateam Oulu Document Database [33]. Characteristic examples and comparative results are presented to confirm the effectiveness of the proposed technique. The entire system has been implemented in a visual environment using Delphi 7.

2 System description

The proposed binarisation system performs document binarisation by combining, using the KSOM neural network, the results of a set of binarisation algorithms, most of which were developed for document binarisation. This procedure is important especially for the fuzzy pixels of the documents, that is, for the pixels that cannot be easily classified. The following seven IBTs are considered as components of the proposed system:

† Otsu [1]
† FCM [6]
† Bernsen [7]
† Niblack [11]
† Sauvola and Pietikainen [14, 15]
† an improved version of ALLT [19]
† an improved version of IIFA [21]

The selection of the above seven IBTs has been made due to their complementarity. That is, in most cases, the simultaneous application of all IBTs and their combination with the KSOM leads to satisfactory results. However, according to the type of documents and degradation, and in order to reduce the computational cost, it is not necessary to select and apply all seven IBTs; only a subset of them is needed. For example, in documents with uniform noise, the global binarisation techniques of Otsu [1] and FCM [6] give satisfactory binarisation results and therefore should be selected. For document images with non-uniform background having shadows and different types of illumination, the local binarisation techniques of Bernsen [7], Niblack [11] and Sauvola and Pietikainen [14, 15] lead to better binarisation results. Especially for more complex document images, and in order to exploit structural characteristics of the characters, the ALLT [19] and IIFA [21] should be included in the system.

It should be noted that, in the case of a large document database where the document images have similar noise, the selection of the proper IBTs can be made once for all document images. This selection can be made automatically, according to the importance of each of the IBTs, obtained by applying the procedure described in Experiment 5 to a small and representative subset of the document images.

2.1 Description of the KSOM

The results obtained from the application of the above IBTs are combined in the final stage of the binarisation technique by using a KSOM. As depicted in Fig. 1, the KSOM has one input and one output layer. The number of neurons in the input layer is taken equal to the number of independent binarisation results that feed the KSOM. The number of neurons in the output layer must be at least two in order to distinguish the foreground and background pixels. On the other hand, the shape of the classes obtained by the KSOM neural network is spherical because of the use of the Euclidean distance. However, in a document binarisation procedure, the foreground and background classes do not always have spherical shapes. For this reason, as in other classifiers such as the LVQ, it is preferable for an output class (foreground, background) to be associated with more than one neuron. In contrast, the use of a large number of output neurons increases the computational cost. The application of our technique to a large number of document images, using different numbers of output neurons, shows that the proper number of output neurons is four. Usually, two of these neurons are associated with the foreground class and the other two with the background. However, there are some cases where three neurons are associated with one class and the remaining one with the other. The decision about the associations of the neurons to the output classes is performed by an adaptive procedure described in the following.

The KSOM neural network is fed with the binarisation results obtained from the IBTs and, after the training stage, the output neurons specify the classes obtained. Then, using a mapping procedure, these classes are categorised as classes of foreground and background pixels.

The learning algorithm of the KSOM is as follows:

Stage 1: Initialise with small random values the weights w_{ij} of the connections between the neurons in the input and the output layer. Also define the initial value of the learning rate a = a_0 (usually a_0 = 0.05). A rectangular neighbourhood with initial size d = d_0 is used. Typically, d_0 is taken equal to the number of output neurons.

Stage 2: Obtain the winner neuron of the output layer in the presence of an input vector x = (x_1, x_2, \ldots, x_n)^T. The winner neuron is obtained by calculating the degree of mismatch of the input vector with the weights of each output neuron, according to the relationship

c_j = \sum_i (w_{ij} - x_i)^2    (1)

The neuron with the lowest mismatch value represents the winner neuron.

Stage 3: Update the weights w_{ij} according to the winner and its neighbouring neurons. The weights are updated according to the relationship

w_{ij}(k + 1) = w_{ij}(k) + \Delta w_{ij}    (2)

where k indicates the number of the iteration and

\Delta w_{ij} = \begin{cases} a (x_i - w_{ij}) & \text{if } |c - j| < d \\ 0 & \text{otherwise} \end{cases}    (3)

where x_i is the i-th element of the input vector, c is the winner neuron and i and j are the indices of the input and output neurons, respectively.

Stage 4: Repeat Stages 2 and 3 for all input vectors (samples).

Stage 5: At the end of each epoch, decrease the learning rate a and the neighbourhood size d. This can be done using the following relations

d = d_0 (1 - k/T)    (4)

a = a_0 (1 - k/T)    (5)

where k is the current epoch and T is the total number of epochs to be done.

Stage 6: The training of the KSOM stops when one of the following holds: (a) the number of epochs exceeds an upper bound, or (b) after an epoch the changes of the weights are not significant, that is, \Delta w_{ij} \to 0. Usually, the KSOM needs T = 300 epochs for its convergence.

Fig. 1 Structure of the Kohonen SOM neural network
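As an illustration only, the training loop of Stages 1–6 might be sketched in Python as below; the function name, the NumPy weight layout and the strictly one-dimensional rectangular neighbourhood are assumptions of this sketch, not details given in the paper:

```python
import numpy as np

def train_ksom(samples, n_outputs=4, a0=0.05, T=300, rng=None):
    """One-dimensional KSOM trained as in Stages 1-6 (sketch)."""
    rng = np.random.default_rng(rng)
    n_inputs = samples.shape[1]
    # Stage 1: small random weights; initial neighbourhood size d0
    w = rng.random((n_inputs, n_outputs)) * 0.1
    d0 = n_outputs
    for k in range(T):                          # one pass = one epoch
        a = a0 * (1 - k / T)                    # eq. (5)
        d = d0 * (1 - k / T)                    # eq. (4)
        for x in samples:                       # Stage 4: all samples
            # Stage 2: winner = output neuron with lowest mismatch, eq. (1)
            c = int(np.argmin(((w - x[:, None]) ** 2).sum(axis=0)))
            # Stage 3: update winner and rectangular neighbourhood, eqs. (2)-(3)
            for j in range(n_outputs):
                if abs(c - j) < d:
                    w[:, j] += a * (x - w[:, j])
    return w  # column k is the centre O_k of output class k
```

With binary input vectors from the IBTs, the columns of the returned matrix settle near the cluster centres, which Stage 6 of Section 2.2 later labels as background or foreground.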

2.2 The proposed document binarisation technique

The proposed binarisation technique using the KSOM hasthe following stages:

Stage 1: Initially choose the N IBTs that will participate in the binarisation system, according to the type of degradation of the processed document image.

Stage 2: Apply the IBTs to obtain I_n(x, y), n = 1, \ldots, N binary images.

Stage 3: Obtain the set of pixels S_p, the binary values of which will be used to feed the KSOM. If N_p is the number of pixels obtained, and N IBTs are used in the system, then we will have N_T = N \cdot N_p training samples. The S_p set of pixels must include mainly the 'fuzzy' pixels, that is, the pixels that cannot be easily classified as background or foreground pixels. For this reason, the S_p set of pixels is usually obtained by using the FCM classifier [6]. Thus, these pixels can be defined as the pixels with high vagueness, as they come up from the application of the FCM method. To achieve this, the image is first binarised using the FCM method and then the fuzzy pixels are defined as those pixels having membership function (MF) values close to 0.5. That is, the pixels with MF values close to 0.5 are the vague ones, and their degree of vagueness depends on how close to 0.5 these values are.

According to the above analysis, the proper S_p set is obtained by the following procedure:

(a) Apply the FCM globally to the original image for only two classes. After this, each pixel has two MF values: MF_1 and MF_2.


(b) Scale the MF values of each pixel (x, y) to the range [0, 255] and produce a new grey-scale image I_v, using the relation

I_v(x, y) = \operatorname{round}\left( 255 \, \frac{\max(MF_1, MF_2) - \min(MF_1, MF_2)}{1 - \min(MF_1, MF_2)} \right)    (6)

(c) Apply the global binarisation technique of Otsu to the image I_v and obtain a new binary image I_p. The set of pixels S_p is now defined as the pixels of I_p that have a value equal to zero.

Instead of using the FCM, the S_p set of pixels can alternatively be obtained as:

(i) Edge pixels extracted from the original image by the application of an edge extraction mask. This is a good choice because the fuzzy pixels belong to, or are close to, the edge pixels. The automatic extraction of edge pixels is made by initially applying Sobel's technique, whereas the final threshold value is obtained by Otsu's technique.
(ii) Random pixels sampled from the entire image. This option is used in order to adjust the number of training pixels as a percentage of the total number of image pixels.
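A minimal sketch of the FCM-based selection of S_p (steps (a)–(c)) follows; it assumes the two membership maps MF_1 and MF_2 have already been produced by a two-class FCM run, and the small Otsu helper and the epsilon guarding the division in eq. (6) are additions of this sketch:

```python
import numpy as np

def otsu_threshold(img):
    """Minimal Otsu threshold on an 8-bit grey-scale array."""
    hist = np.bincount(img.ravel(), minlength=256).astype(float)
    total = hist.sum()
    cum = np.cumsum(hist)
    cum_mean = np.cumsum(hist * np.arange(256))
    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        w0, w1 = cum[t - 1], total - cum[t - 1]
        if w0 == 0 or w1 == 0:
            continue
        m0 = cum_mean[t - 1] / w0
        m1 = (cum_mean[-1] - cum_mean[t - 1]) / w1
        var = w0 * w1 * (m0 - m1) ** 2   # between-class variance
        if var > best_var:
            best_var, best_t = var, t
    return best_t

def fuzzy_pixel_set(mf1, mf2):
    """Steps (a)-(c): map the FCM memberships to a vagueness image I_v,
    eq. (6), threshold it with Otsu; S_p are the zero-valued (vague) pixels."""
    hi, lo = np.maximum(mf1, mf2), np.minimum(mf1, mf2)
    iv = np.round(255 * (hi - lo) / (1 - lo + 1e-12)).astype(np.uint8)
    t = otsu_threshold(iv)
    ip = (iv > t).astype(np.uint8)        # 1 = confidently classified pixel
    return np.argwhere(ip == 0)           # S_p: coordinates of vague pixels
```

A pixel with memberships (0.5, 0.5) maps to I_v = 0 and lands in S_p, while a pixel with memberships (0.95, 0.05) maps near 242 and is excluded.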

Stage 4: Define as the training set S_T the binary values of the N binary images, obtained by the N independent techniques, that correspond to the positions of the S_p set of pixels.

Stage 5: Define the number of the output neurons of the KSOM and feed the neural network with the values of the S_T set. Let K be the proper number of the output neurons. After training, the centres of the output classes obtained correspond to vectors O_k, k = 1, \ldots, K, having N elements.

Stage 6: Classify each output class (neuron) as background or foreground by examining the Euclidean distances of their centres from the [0, \ldots, 0]^T and [1, \ldots, 1]^T vectors, which represent the background and foreground in the feature space, respectively. That is

O_k = \begin{cases} \text{background class} & \text{if } \sqrt{\sum_{i=1}^{N} O_k(i)^2} < \sqrt{\sum_{i=1}^{N} [O_k(i) - 1]^2} \\ \text{foreground class} & \text{otherwise} \end{cases}    (7)

Stage 7: This is the mapping stage. Each pixel corresponds to a vector of N elements whose values are the binary results obtained by the N IBTs. During the mapping process, each pixel is classified by the KSOM to one of the obtained classes, and consequently as a background or foreground pixel.

Stage 8: A post-processing step can be applied to improve and emphasise the final binarisation results. This step usually includes size filtering or the application of modified directional morphological filters (MDMF) [34].
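Stages 6 and 7 can be sketched as follows; the array shapes and function names are assumptions of this sketch, with the trained class centres taken as the columns of a matrix (one column per output neuron):

```python
import numpy as np

def label_classes(centres):
    """Eq. (7): a class centre O_k is background if it is closer
    (Euclidean) to the all-zeros vector than to the all-ones vector."""
    d0 = np.linalg.norm(centres, axis=0)         # distance to [0,...,0]
    d1 = np.linalg.norm(centres - 1.0, axis=0)   # distance to [1,...,1]
    return np.where(d0 < d1, "background", "foreground")

def map_pixels(ibt_stack, centres):
    """Stage 7: classify every pixel vector to its nearest class centre.
    ibt_stack has shape (N, H, W): one binary image per technique."""
    n, h, w = ibt_stack.shape
    vecs = ibt_stack.reshape(n, -1)              # (N, H*W) pixel vectors
    # squared distance of each pixel vector to each centre
    d = ((vecs[:, :, None] - centres[:, None, :]) ** 2).sum(axis=0)
    winners = d.argmin(axis=1)                   # nearest centre per pixel
    labels = label_classes(centres)
    return (labels[winners] == "foreground").reshape(h, w).astype(np.uint8)
```

For instance, a pixel on which all N techniques agree on foreground maps to the centre nearest to [1, …, 1] and is labelled 1.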

3 Adaptive logical level technique

The ALLT [19] is one of the most powerful binarisation techniques. It is an extension of the LLT proposed by Kamel and Zhao [18]. The technique is based on the pre-calculation of the characters' SW value (average width of the characters' strokes) of the document image. Using the SW, the LLT processes each pixel (x, y) of the image I(x, y) by simultaneously comparing its grey-level, or its smoothed grey-level value g(x, y), with four local averages in the (2SW+1) × (2SW+1) windows centred at the four points P_i, P_{i+1}, P'_i, P'_{i+1} shown in Fig. 2. It uses 1 to represent character/object and 0 to represent background in the resulting binary image. To binarise an image using the LLT, three steps are followed:

1. Calculate the local mean value of each pixel in the (2SW+1) × (2SW+1) sub-region

\operatorname{mean}(P) = \frac{\sum_{-SW \le i \le SW} \sum_{-SW \le j \le SW} I(P_x - i, P_y - j)}{(2SW + 1)^2}    (8)

where P_x, P_y are the coordinates of pixel P.

2. Calculate L(P) for each pixel, according to the formula

L(P) = \begin{cases} \text{true} & \text{if } |\operatorname{mean}(P) - g(x, y)| > T \\ \text{false} & \text{otherwise} \end{cases}    (9)

where g(x, y) is the grey-value of the centre pixel or its smoothed grey-value and T is a pre-determined threshold.

3. Finally, each pixel is binarised as follows

I_b(x, y) = \begin{cases} 1 & \text{if } \bigvee_{i=0}^{3} [L(P_i) \land L(P'_i) \land L(P_{i+1}) \land L(P'_{i+1})] \text{ is true} \\ 0 & \text{otherwise} \end{cases}    (10)

where I_b(x, y) is the final binary image, and

P'_i = P_{(i+4) \bmod 8} \quad \text{for } i = 0, \ldots, 7    (11)
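Under the assumption that the eight points P_0, …, P_7 lie on a ring of radius SW around the processed pixel (cf. Fig. 2) — an assumption of this sketch rather than a detail stated here — the LLT decision of eqs. (8)–(11) might be sketched as:

```python
import numpy as np

def local_mean(img, p, sw):
    """Mean of the (2SW+1) x (2SW+1) window centred at p, eq. (8),
    clipped at the image borders."""
    y, x = p
    h, w = img.shape
    return img[max(0, y - sw):min(h, y + sw + 1),
               max(0, x - sw):min(w, x + sw + 1)].mean()

def llt_binarise(img, sw, t):
    """Sketch of the per-pixel LLT decision, eqs. (8)-(11)."""
    h, w = img.shape
    out = np.zeros((h, w), dtype=np.uint8)
    # assumed layout of P_0..P_7: a ring of radius SW around the pixel
    ring = [(-sw, 0), (-sw, sw), (0, sw), (sw, sw),
            (sw, 0), (sw, -sw), (0, -sw), (-sw, -sw)]
    for y in range(h):
        for x in range(w):
            g = img[y, x]
            # L(P) at each ring point, eq. (9)
            L = [abs(local_mean(img, (y + dy, x + dx), sw) - g) > t
                 for dy, dx in ring]
            # eq. (10), with P'_i = P_{(i+4) mod 8} from eq. (11)
            out[y, x] = int(any(L[i] and L[(i + 4) % 8]
                                and L[i + 1] and L[(i + 5) % 8]
                                for i in range(4)))
    return out
```

On a light background with a dark stroke of width comparable to SW, the ring means around a stroke pixel all differ strongly from its grey-level, so the pixel is set to 1.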

Yang and Yan [19] proposed an improved version of the LLT, which can be considered as an adaptive technique for the automatic calculation of the local threshold T. This technique performs binarisation by using a unique SW for the entire image. Specifically, in order to find some bilevel local histograms, the image is divided into N × N (N ∈ [4, 8]) sub-images. Afterwards, the global SW is calculated by constructing a run-length histogram using only the bilevel regions. This SW value is then used for the whole image. The final document binarisation is obtained by applying the procedure defined by (9), and the proper threshold values for T are obtained locally according to the following steps:

Step 1: For each pixel, the local maximum and minimum values are calculated in a window W of size (2SW+1) × (2SW+1) centred at the processed pixel

f_{max}(x, y) = \max_{(x_i, y_i) \in W} f(x_i, y_i)    (12)

f_{min}(x, y) = \min_{(x_i, y_i) \in W} f(x_i, y_i)    (13)

Fig. 2 Arrangement of the logical levels P_i in the (2SW+1) × (2SW+1) window, centred at the (x, y) pixel

Step 2: In the window W, calculate the absolute differences of the average value from the maximum and minimum grey-level values in the window

d_{min} = | f_{min}(x, y) - \operatorname{ave}(P) |    (14)

d_{max} = | f_{max}(x, y) - \operatorname{ave}(P) |    (15)

Step 3: If d_{max} > d_{min}, then the window W contains more low grey-levels. Therefore

T = a \left( \frac{2}{3} f_{min}(x, y) + \frac{1}{3} \operatorname{ave}(P) \right)    (16)

where a is a fixed value between 0.3 and 0.8.

Step 4: If d_{max} < d_{min}, then the window W contains more high grey-levels. Therefore

T = a \left( \frac{1}{3} f_{min}(x, y) + \frac{2}{3} \operatorname{ave}(P) \right)    (17)

Step 5: If d_{max} = d_{min}, then

(a) If f_{max}(x, y) = f_{min}(x, y), expand the window size to (2SW+3) × (2SW+3). Then repeat from Step 1 using the new window size. If in the new window still f_{max}(x, y) = f_{min}(x, y), then the processed pixel I_b(x, y) is considered a background pixel.

(b) If f_{max}(x, y) ≠ f_{min}(x, y), then the window W tends to contain the same quantity of low and high grey-levels. In this case, expand the size of window W to (2SW+3) × (2SW+3) and repeat all the steps from the beginning (Step 1). If still d_{max} = d_{min} and f_{max}(x, y) ≠ f_{min}(x, y), then T = a · mean(P).
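Steps 1–5 above can be sketched for a single pixel as follows; for brevity this sketch omits the window re-expansion of Step 5 and falls back directly to the T = a · mean(P) case:

```python
import numpy as np

def adaptive_threshold(window, a=0.5):
    """Local threshold T from one (2SW+1) x (2SW+1) window (sketch).
    Returns None for a flat window (background pixel, Step 5a)."""
    fmax, fmin = window.max(), window.min()
    ave = window.mean()
    dmin = abs(fmin - ave)   # eq. (14)
    dmax = abs(fmax - ave)   # eq. (15)
    if dmax > dmin:          # more low grey-levels, eq. (16)
        return a * (2 * fmin / 3 + ave / 3)
    if dmax < dmin:          # more high grey-levels, eq. (17)
        return a * (fmin / 3 + 2 * ave / 3)
    # Step 5 (window expansion omitted in this sketch)
    return a * ave if fmax != fmin else None
```

A mostly dark window takes the eq. (16) branch, weighting f_min heavily, while a mostly bright window takes the eq. (17) branch.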

3.1 Improving the ALLT

The most important parameter of the LLT and ALLT is the SW of the characters. As mentioned above, the SW defines the size of the local processing window. It is obvious that the SW must be adapted to the local geometric characteristics of the characters and must not be the same for all pixels, as it is in the ALLT. In our approach, and in order to estimate the SW of the characters better and locally, the document image is divided into sub-images. Using the LHA technique (the description of the LHA technique is given in Section 3.2), it can easily be determined whether a sub-image corresponds to a bilevel histogram, that is, whether the sub-image contains mainly text and background pixels. Then, the bilevel sub-images are converted into binary form by locally applying Otsu's technique. After this, for each sub-image, a run-length histogram is constructed and the proper SW values for the specific sub-images are locally obtained. The proper SW values for the rest of the sub-images are then estimated by using a linear interpolation procedure. Unlike the ALLT, the proposed new approach can be used with most types of digital documents, even those that include characters of non-uniform sizes, boldface text parts and, in general, characters that obviously have different SW sizes.


In summary, the proposed technique for background analysis and proper SW estimation has the following steps:

Step 1: Divide the original image into N × M sub-images. Usually, (N, M) ∈ [1, 20].

Step 2: Calculate the local histogram of each sub-image.

Step 3: Apply the LHA technique to define which sub-images have bimodal histograms. If the number of bilevel sub-images is < 0.3 N · M, increase N or M and go to Step 1.

Step 4: Convert the bimodal sub-images to binary form by locally applying Otsu's technique.

Step 5: Calculate the local run-length histogram of each binary image obtained in the previous steps. A run-length histogram is defined as a one-dimensional array R(i), i = 1, \ldots, L, where L is the longest run to be counted. R(i) is the frequency of the runs of length i. We count black runs in the horizontal and vertical directions.

Step 6: Define the SW value of each sub-image as the highest peak of the local run-length histogram. At the end of this step, the proper SW values for the sub-images having bimodal histograms have been obtained.

Step 7: For those sub-images whose SW value was not obtained in the previous step (without a bimodal histogram), define the SW value as the average value of the SWs in their 3 × 3 sub-image neighbourhood.

Step 8: If there are still sub-images with SW values equal to 0, define their SW values as the average of all the SW values calculated in the previous steps.

Step 9: Apply (9) to all pixels by using the calculated SWs and the adaptive threshold values for T.
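Steps 5 and 6 (the run-length histogram and the SW peak) can be sketched as follows, assuming black pixels are coded as 0 in the binarised sub-image:

```python
import numpy as np

def stroke_width(binary, max_run=50):
    """Steps 5-6: estimate the SW of one binary sub-image as the peak of
    its black-run-length histogram, with runs counted in the horizontal
    and vertical directions (sketch)."""
    hist = np.zeros(max_run + 1, dtype=int)   # hist[i] = R(i)
    for img in (binary, binary.T):            # horizontal, then vertical runs
        for row in img:
            run = 0
            for v in row:
                if v == 0:                    # extend the current black run
                    run += 1
                else:                         # run ended: record its length
                    if 0 < run <= max_run:
                        hist[run] += 1
                    run = 0
            if 0 < run <= max_run:            # run reaching the row end
                hist[run] += 1
    return int(hist[1:].argmax()) + 1         # highest peak = SW
```

On a sub-image containing strokes of width 3, the horizontal runs of length 3 dominate the histogram and the function returns 3, even though the vertical runs along each stroke are much longer but far fewer.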

The main stages of the improved ALLT are depicted in Fig. 3. An example that demonstrates the usefulness of the proposed improvements of the ALLT is given in Figs. 4a-f. First, the original ALLT is applied without image subdivision and a global SW value is used (Figs. 4b and c). In the second case, the original image is divided into 5 × 3 sub-images and the SW of the characters is calculated as the mean value of the local SWs (Figs. 4d and e). In both cases, the large letters are considered as noise and are therefore removed. In the last case, the binarisation was performed by dividing the image into 5 × 3 sub-images. Each sub-image got its own SW and, as can be observed from Fig. 4f, the final binarisation is satisfactory. The noise reduction is achieved with a size-filtering technique, during which all objects that have fewer pixels than a threshold (90 in this example) are removed from the image.

Another characteristic example is shown in Figs. 5 and 6. Fig. 7 depicts the procedure for the calculation of the SW values for the document of Fig. 5. It is noted that for this document, the technique of Yang and Yan gives a global SW = 4. The binary images obtained by the application of the ALLT and the proposed improved ALLT are shown in Fig. 6.

3.2 The LHA technique

As mentioned before, the proposed improved version of the ALLT uses the LHA technique. This is a critical stage for the entire binarisation procedure. Accurate estimation of the bimodality of the sub-image histograms leads to accurate calculation of the SW values. The LHA used is a technique that approximates the histogram curve with line segments [31, 32]. If h(n), n = 0, ..., 255, is the image histogram, then the LHA algorithm starts by



Fig. 3 Main stages of the improved ALLT

considering as a first histogram approximation the line segment with end-points (0, h(0)) and (255, h(255)). Then, the histogram point h(k) with the largest distance from that line is obtained, and the histogram is now approximated by the line segments (0, h(0))-(k, h(k)) and (k, h(k))-(255, h(255)). This procedure is repeated for each new line segment, independently, until the maximum distance obtained on all line segments is ≤0.002 of the local peak. The LHA procedure can be considered as a histogram-smoothing technique that can be used for the estimation of the number of main histogram peaks. The LHA technique is simple, fast and has been shown to give very good results [31, 32]. In our case, the LHA technique is used to decide whether a


sub-image is bimodal or not, by checking whether its local histogram has two well-distinguished hills. Thus, the shape of such a bilevel histogram is always well approximated by the LHA technique. An example that demonstrates the application of the LHA technique is given in Fig. 8.
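The recursive splitting at the heart of the LHA can be sketched as follows. This is a simplification: it uses the vertical rather than the perpendicular point-to-chord distance, a fixed toy tolerance instead of the 0.002-of-the-local-peak criterion, and hypothetical function names:

```python
# Sketch of the LHA idea: approximate a histogram curve with line
# segments, splitting recursively at the point farthest from the
# current chord until the maximum deviation falls below a tolerance.
# The surviving vertices reveal the number of main hills.

def split_points(h, lo, hi, tol, out):
    """Recursively add the index with the largest vertical deviation
    from the chord (lo, h[lo])-(hi, h[hi])."""
    best_i, best_d = None, tol
    for i in range(lo + 1, hi):
        chord = h[lo] + (h[hi] - h[lo]) * (i - lo) / (hi - lo)
        d = abs(h[i] - chord)
        if d > best_d:
            best_i, best_d = i, d
    if best_i is not None:
        out.append(best_i)
        split_points(h, lo, best_i, tol, out)
        split_points(h, best_i, hi, tol, out)

def approximate(h, tol):
    pts = [0, len(h) - 1]
    split_points(h, 0, len(h) - 1, tol, pts)
    return sorted(pts)

# A toy bimodal "histogram": two hills around bins 2 and 7
h = [0, 1, 8, 1, 0, 1, 2, 9, 2, 0]
vertices = approximate(h, tol=0.5)
peaks = [i for i in vertices if 0 < i < len(h) - 1
         and h[i] >= h[i - 1] and h[i] >= h[i + 1]]
print(len(peaks))  # two hills -> the sub-image would be flagged as bimodal
```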

4 Improved integrated function algorithm

The IIFA [20, 21] is based on the extraction of a three-level image (+, −, 0). The main stages of the technique are as follows:

Fig. 4 Example demonstrating the usefulness of the proposed improvements of the ALLT

a Original image
b Application of the LLT with a global SW
c Size filtering of image b
d Application of the ALLT using a mean SW value calculated from the 5 × 3 sub-images
e Size filtering of image d
f Final image obtained using the improved ALLT with local SWs



Stage 1: Apply a smoothing mask to the original image to obtain its smoothed version Z(x, y).
Stage 2: Calculate the activity values A(x, y) of the smoothed image using the relation

$$A(x, y) = \sum_{i=-1}^{1} \sum_{j=-1}^{1} a(x+i,\, y+j) \qquad (18)$$

where

$$a(x, y) = \left| \frac{\partial Z(x, y)}{\partial x} \right| + \left| \frac{\partial Z(x, y)}{\partial y} \right| \qquad (19)$$

Fig. 5 Document image under badly illuminated conditions


Stage 3: Define the threshold value TA, which separates the '0' pixels from the '+' and '−' pixels, and then calculate the Laplacian ∇²Z(x, y) for all pixels of the smoothed image that have A(x, y) ≥ TA. Calculate the three-level image as follows

$$L(x, y) = \begin{cases} 0 & \text{if } A(x, y) < T_A \\ - & \text{if } A(x, y) \ge T_A \text{ and } \nabla^2 Z(x, y) < 0 \\ + & \text{if } A(x, y) \ge T_A \text{ and } \nabla^2 Z(x, y) \ge 0 \end{cases} \qquad (20)$$
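Stages 2 and 3 can be sketched as follows, assuming forward differences for the derivatives in (19) and a 4-neighbour discrete Laplacian; the function names are ours, and borders are skipped for brevity:

```python
# Sketch of Stages 2-3: activity image and three-level labelling (20).
# Z is an already-smoothed grey-level image given as a 2-D list.

def grad_abs(Z):
    """a(x, y) = |dZ/dx| + |dZ/dy| with forward differences (19)."""
    h, w = len(Z), len(Z[0])
    a = [[0] * w for _ in range(h)]
    for y in range(h - 1):
        for x in range(w - 1):
            a[y][x] = abs(Z[y][x + 1] - Z[y][x]) + abs(Z[y + 1][x] - Z[y][x])
    return a

def laplacian(Z, x, y):
    """Discrete 4-neighbour Laplacian of the smoothed image."""
    return (Z[y][x + 1] + Z[y][x - 1] + Z[y + 1][x] + Z[y - 1][x]
            - 4 * Z[y][x])

def three_level(Z, TA):
    """Stage 3, (20): label each interior pixel '0', '+' or '-'."""
    h, w = len(Z), len(Z[0])
    a = grad_abs(Z)
    L = [['0'] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # activity A(x, y): sum of a over the 3x3 neighbourhood (18)
            A = sum(a[y + j][x + i] for j in (-1, 0, 1) for i in (-1, 0, 1))
            if A < TA:
                L[y][x] = '0'
            elif laplacian(Z, x, y) < 0:
                L[y][x] = '-'
            else:
                L[y][x] = '+'
    return L

# Toy image with a dark 2x2 stroke on a bright background
Z = [[200, 200, 200, 200],
     [200,  50,  50, 200],
     [200,  50,  50, 200],
     [200, 200, 200, 200]]
labels = three_level(Z, TA=100)
print(labels[1][1], labels[2][2])  # both dark pixels are labelled '+'
```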

Fig. 7 Calculation of the SWs for the image shown in Fig. 5

a Initially, SW values are obtained only for the regions with bimodal local histograms
b Calculation of the SW values through a 3 × 3 neighbourhood mask
c Calculation of the SW values of the sub-images as the average values of their neighbourhood

Fig. 6 Binary images obtained

a Binarisation using the ALLT
b Binarisation using the proposed technique



Stage 4: For each closed area in L ('0' and '−' pixels surrounded by '+' pixels), calculate the number of '+' (N+) and '−' (N−) pixels. In the areas where N+ > N−, relabel all the pixels as '+' pixels. Calculate the binary image by converting all '+' pixels to foreground pixels and all '0' and '−' pixels to background pixels.
Stage 5: Remove false print objects, using the gradient of the smoothed version of the original image and a predetermined threshold Tp. Objects whose edge pixels have an average gradient value below the threshold Tp are removed.

4.1 Additional improvements for the IIFA

One of the most important disadvantages of the IIFA is the difficulty in defining the proper value for the TA threshold. To overcome this, we initially use Otsu's technique, applied to the activity version A(x, y) of the document image, so as to determine a threshold value for TA automatically from the A(x, y) image. This procedure is applied initially, before the application of the remaining steps of the IIFA.

We also propose the use of a new threshold value Tf, which defines whether a closed area must be filled with foreground pixels. Specifically, Tf defines the ratio of '+' to '−' pixels in the area that must be satisfied in order for the area to be filled. The binarisation result obtained by the application of the original IIFA to the document image of Fig. 9 is depicted in Fig. 10. As can be observed in Fig. 10, the binary image contains a few small areas that have been filled with wrong object pixels (e.g. the internal area of character 'A'). This happens because the perimeter pixels of such an area, which are normally '+', outnumber the '−' pixels within the area, because of its small surface. In the original IIFA, an area can be filled with object pixels if it contains just one '+' pixel more than the '−' pixels of the area. This is what produces the specific problem with the small areas. In order to overcome this, we introduce the new threshold Tf, which controls the decision that must be taken for the

Fig. 8 Example of the application of the LHA technique

Fig. 9 Original image


filling of the areas. Practically, through this threshold, we determine the ratio of '+' pixels in relation to '−' pixels that a closed uniform area of the three-level image should have, so that all of its pixels are characterised as foreground pixels. As depicted in Fig. 11, using a threshold value equal to 0.6, the filling of the small areas (such as that of character 'A') is avoided. From the experiments, we have found that a proper threshold value lies between 0.5 (for large characters) and 0.65 (for small characters).
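The fill rule can be sketched as follows (the function name is ours; `Tf` as in the text):

```python
# Sketch of the proposed fill rule: a closed area of the three-level
# image is filled with foreground only if the fraction of '+' pixels
# among its '+' and '-' pixels exceeds the threshold Tf (the original
# IIFA effectively uses Tf = 0.5).

def should_fill(n_plus, n_minus, Tf=0.6):
    total = n_plus + n_minus
    return total > 0 and n_plus / total > Tf

# A small hole (e.g. the inside of 'A'): perimeter '+' against inner '-'
print(should_fill(5, 3))   # 5/8 = 0.625 > 0.6  -> still filled
print(should_fill(5, 4))   # 5/9 ~ 0.56 <= 0.6  -> left as background
```

With the original N+ > N− rule, both cases would be filled; the stricter ratio is what spares the small internal areas.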

5 Bernsen’s technique

This technique, proposed by Bernsen [7], is a local binarisation technique based on the estimation of a local threshold value for each processed pixel. The local threshold value for each pixel (x, y) is calculated by the relation

$$T(x, y) = \frac{P_{\text{low}} + P_{\text{high}}}{2} \qquad (21)$$

where Plow and Phigh are the lowest and the highest grey-level values in an N × N window centred at the pixel (x, y), respectively. This value is assigned as the local threshold value only if the difference between the lowest and the highest grey-level values is greater than a threshold L. Otherwise, it is assumed that the window region contains pixels of one class (foreground or background). In our approach, in order to classify the pixels of these regions, the global threshold value, as calculated by the technique of Otsu, is used to binarise them locally. Summarising, the proper local threshold values are calculated as follows

$$T(x, y) = \begin{cases} \dfrac{P_{\text{low}} + P_{\text{high}}}{2} & \text{if } P_{\text{high}} - P_{\text{low}} \ge L \\[1ex] GT & \text{if } P_{\text{high}} - P_{\text{low}} < L \end{cases} \qquad (22)$$

where GT is the global threshold value calculated from the application of the technique of Otsu to the entire image.
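A sketch of (22), assuming a border-clipped window and an externally supplied Otsu threshold `GT`; the function name is hypothetical:

```python
# Sketch of Bernsen's local threshold with the proposed global fallback.

def bernsen_threshold(img, x, y, N=15, L=15, GT=128):
    h, w = len(img), len(img[0])
    r = N // 2
    window = [img[j][i]
              for j in range(max(0, y - r), min(h, y + r + 1))
              for i in range(max(0, x - r), min(w, x + r + 1))]
    p_low, p_high = min(window), max(window)
    if p_high - p_low >= L:
        return (p_low + p_high) / 2      # local contrast is high enough
    return GT                            # homogeneous region: use Otsu's GT

# High-contrast window: local mid-range threshold
print(bernsen_threshold([[10, 200], [20, 210]], 0, 0, N=3, L=15, GT=100))  # 110.0
# Nearly flat window: falls back to the global threshold
print(bernsen_threshold([[100, 101], [100, 100]], 1, 1, N=3, L=15, GT=77))  # 77
```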

Fig. 10 Binarisation result of the image of Fig. 9 using the original IIFA

Fig. 11 Final binary image obtained by the application of the IIFA using Tf = 0.6



Fig. 12 Binarisation of a noisy document using all the techniques included in the proposed binarisation system

a Original grey-level image
b Binarisation obtained by the application of Otsu's technique
c Binarisation obtained by the application of the FCM technique
d Binarisation obtained by the application of Bernsen's technique
e Binarisation obtained by the application of Niblack's technique
f Binarisation obtained by the application of the ALLT
g Binarisation obtained by the application of the IIFA
h Binarisation obtained by the application of Sauvola and Pietikainen's technique
i Final binary image obtained by the proposed system

6 Niblack’s technique

This is also a powerful local thresholding binarisation technique that is included in the proposed system. In Niblack's technique, the estimation of the local threshold value T(x, y) is based on the calculation of the local mean and standard deviation values. Thus, the proper value of the threshold for each pixel is calculated as

$$T(x, y) = m(x, y) + k \cdot s(x, y) \qquad (23)$$

where m(x, y) and s(x, y) are the local mean and standard deviation values, respectively. The value of k is used to adjust how much of the entire object boundary is taken as part of the given object. In Trier and Jain [24], a window size of 15 × 15 and a bias setting of k = −0.2 were found satisfactory. However, we have found that the proper value of k depends very much on the image


content and noise and, as demonstrated in our experiments, its value can differ significantly from the above value. The size of the local window should be small enough to preserve local details and large enough to suppress noise.
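A per-pixel sketch of (23), with the window statistics computed naively (a real implementation would use integral images); names are ours:

```python
# Sketch of Niblack's threshold for one pixel, with the local mean and
# (population) standard deviation computed over an N x N window
# clipped at the image borders.

import math

def niblack_threshold(img, x, y, N=15, k=-0.2):
    h, w = len(img), len(img[0])
    r = N // 2
    window = [img[j][i]
              for j in range(max(0, y - r), min(h, y + r + 1))
              for i in range(max(0, x - r), min(w, x + r + 1))]
    m = sum(window) / len(window)
    s = math.sqrt(sum((v - m) ** 2 for v in window) / len(window))
    return m + k * s

# On a perfectly flat window s = 0, so T collapses to the local mean
print(niblack_threshold([[100, 100], [100, 100]], 0, 0, N=3, k=-0.2))  # 100.0
```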

7 Sauvola and Pietikainen’s technique

This technique [14, 15] is a local binarisation technique. The calculation of the local threshold value is based, as in the previous technique, on the estimation of the local mean and standard deviation of the grey-scale values. Specifically, the proper threshold value T(x, y) at pixel (x, y) is taken equal to

$$T(x, y) = m(x, y) \left[ 1 + k \left( \frac{s(x, y)}{R} - 1 \right) \right] \qquad (24)$$



Fig. 13 Application of the proposed system to a complex document image

a Original image
b Binarisation obtained by the application of the ALLT
c Binarisation obtained by the application of the IIFA
d Binarisation obtained by the application of the FCM technique
e Binarisation obtained by the application of Niblack's technique
f Binarisation obtained by the application of Bernsen's technique
For results obtained by other techniques, see Fig. 14



Fig. 14 Application of the proposed system to a complex document image

a Binarisation obtained by the application of Otsu's technique
b Binarisation obtained by the application of Sauvola and Pietikainen's technique
c Final image obtained by the application of the proposed system
For the original image and for results obtained by other techniques, see Fig. 13

The parameter R is the dynamic range of the standard deviation, and the parameter k takes positive values. According to our experiments, we have found that the proper values for these parameters are k = 0.1 and R = 128. It should be noted that Sauvola and Pietikainen use k = 0.5 and R = 128 in their experiments.
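A matching sketch of (24) with the parameter values reported above (k = 0.1, R = 128); the window statistics are computed as in the Niblack sketch:

```python
# Sketch of Sauvola and Pietikainen's threshold for one pixel.

import math

def sauvola_threshold(img, x, y, N=15, k=0.1, R=128):
    h, w = len(img), len(img[0])
    r = N // 2
    window = [img[j][i]
              for j in range(max(0, y - r), min(h, y + r + 1))
              for i in range(max(0, x - r), min(w, x + r + 1))]
    m = sum(window) / len(window)
    s = math.sqrt(sum((v - m) ** 2 for v in window) / len(window))
    return m * (1 + k * (s / R - 1))

# On a flat background (s = 0) the threshold drops to m * (1 - k),
# which is what keeps smooth background regions from being speckled.
print(sauvola_threshold([[100, 100], [100, 100]], 0, 0, N=3))  # ~90.0
```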

8 Experimental results

The proposed system was tested with a variety of document images, most of which come from the old Greek Parliamentary Proceedings and from the Mediateam Oulu Document Database [33]. Seven characteristic experiments that include comparative results are presented in this section.

8.1 Experiment 1

This example demonstrates the application of the proposed binarisation system to the noisy document of Fig. 12a, which comes from the old Greek Parliamentary Proceedings. For comparison, the same document is binarised using a number of independent techniques that are included in


the proposed binarisation system. Specifically, the application of Otsu's technique obtains a threshold value T = 167, which gives the document shown in Fig. 12b. The binarisation result of the application of the FCM is shown in Fig. 12c. The technique of Bernsen, applied with a window size N = 15 and a threshold L = 100, results in the document of Fig. 12d. Niblack's technique is applied with the same window size N = 15 and k = −1, and gives the binary image shown in Fig. 12e. Fig. 12f depicts the result obtained by the application of the ALLT to the whole image (without sub-images), using a value for a equal to 0.15. The determined SW value is equal to 5. In the IIFA (Fig. 12g), a 3 × 3 mask is used for smoothing. The value for the threshold TA is automatically calculated as proposed in this paper, and Tf = 0.6. Finally, the result of the application of Sauvola and Pietikainen's technique (window size N = 15 and k = 0.1) is shown in Fig. 12h.

The binarisation results obtained by the above binarisation techniques feed the KSOM with five output neurons, three of which are finally associated with the foreground pixels and two with the background pixels. Specifically, 2000 random samples from each one of the seven binary images, obtained by



Fig. 15 Proposed binarisation technique applied to a hand-written document

a Original image
b Binarisation obtained by the application of the ALLT
c Binarisation obtained by the application of the IIFA
d Binarisation obtained by the application of the FCM technique
e Binarisation obtained by the application of Niblack's technique
f Binarisation obtained by the application of Bernsen's technique
g Binarisation obtained by the application of Otsu's technique
h Binarisation obtained by the application of Sauvola and Pietikainen's technique
i Final image obtained by the application of the proposed system



Fig. 16 Samples of the images used in Experiment 4

a Ground-truth images
b Noisy images

the IBT (Figs. 12b-h), were used for the training of the KSOM. To improve the final document image, a size filter is applied as a post-processing procedure. With this filter, objects larger than 500 or smaller than 20 pixels are removed. As can be observed in Fig. 12i, the final binarisation result obtained by the proposed system is superior to the results of each one of the independent techniques. It can be observed that background noise has been correctly removed, whereas the proper character strokes are efficiently obtained.
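The combination stage can be sketched with a minimal 1-D Kohonen SOM: each pixel contributes a seven-dimensional vector of IBT decisions, the SOM clusters these vectors into a few output neurons, and each neuron is then tagged foreground or background from its converged weights. The learning-rate/neighbourhood schedule and the majority-vote tagging here are plain simplifications of the paper's procedure, not its actual implementation:

```python
# Minimal 1-D Kohonen SOM sketch of the combination stage.
# Each sample is a vector of IBT results at one pixel (1 = foreground).

import random

def train_som(samples, n_neurons=5, epochs=20, seed=0):
    rnd = random.Random(seed)
    dim = len(samples[0])
    w = [[rnd.random() for _ in range(dim)] for _ in range(n_neurons)]
    for e in range(epochs):
        lr = 0.5 * (1 - e / epochs)                       # decaying learning rate
        radius = max(0, int((n_neurons // 2) * (1 - e / epochs)))
        for v in samples:
            # best-matching unit (BMU) by squared Euclidean distance
            bmu = min(range(n_neurons),
                      key=lambda n: sum((w[n][i] - v[i]) ** 2 for i in range(dim)))
            # pull the BMU and its chain neighbours towards the sample
            for n in range(max(0, bmu - radius), min(n_neurons, bmu + radius + 1)):
                for i in range(dim):
                    w[n][i] += lr * (v[i] - w[n][i])
    return w

def classify(w, v):
    """Map a pixel's IBT vector to foreground (1) or background (0)."""
    bmu = min(range(len(w)),
              key=lambda n: sum((w[n][i] - v[i]) ** 2 for i in range(len(v))))
    return 1 if sum(w[bmu]) / len(w[bmu]) > 0.5 else 0

# Toy training set: mostly-agreeing foreground and background vectors
fg = [1, 1, 1, 0, 1, 1, 1]
bg = [0, 0, 0, 1, 0, 0, 0]
w = train_som([fg, bg] * 100)
print(classify(w, fg), classify(w, bg))
```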


8.2 Experiment 2

In this experiment, the proposed binarisation system is applied for the binarisation of the complex document image shown in Fig. 13a. The binarisation results of each technique are shown in Figs. 13b-f and Figs. 14a and b. The proposed system was applied with four output neurons, and 2000 samples were used in the training. The final step was chosen to be the application of the MDMF [34]. It can be easily observed that in the final

Fig. 17 Histogram of the mean value of PSNR calculated in Experiment 4 (ALLT: 17.64, IIFA: 13.94, FCM: 20.52, Niblack: 10.79, Bernsen: 19.43, Otsu: 20.32, Sauvola: 11.67, proposed system: 21.37)



Fig. 18 PSNR mean values obtained for the IBT in Experiment 5; each bar corresponds to the excluded IBT (ALLT: 19.80, IIFA: 20.22, FCM: 20.18, Niblack: 21.07, Bernsen: 19.85, Otsu: 19.96, Sauvola: 20.85)

image (Fig. 14c), the noisy pixels in the top region of the document were sufficiently removed (in contrast to the results of most of the other techniques). Furthermore, the character strokes were strongly emphasised in comparison with the results of the application of each independent technique.

8.3 Experiment 3

In this experiment, the proposed binarisation technique is applied to the handwritten document shown in Fig. 15a. The number of output neurons of the KSOM is taken equal to four, and the number of samples in the training is taken equal to 2000. No post-processing technique is applied in this experiment. It must be noted that the initial document image is of low quality and presents significant difficulties for binarisation, which all seven independent techniques failed to overcome. Specifically, in the initial document image:

(i) There is a shadow at the bottom of the document. As can be seen, the techniques of Otsu (Fig. 15g), FCM (Fig. 15d) and Bernsen (Fig. 15f) failed to binarise this region of the document efficiently.
(ii) The document has a non-uniform background, which is difficult to separate from the characters without leaving noise. The binarisation techniques that failed to remove noise efficiently are Niblack's (Fig. 15e) and Sauvola and Pietikainen's (Fig. 15h).
(iii) Broken characters. This problem appears mostly in the binary results of the ALLT (Fig. 15b) and of Bernsen's technique (Fig. 15f).
(iv) Finally, in the binary result of the IIFA (Fig. 15c), it can be seen that the characters have an SW greater than the character SW of the original image. Furthermore, there are closed areas in many characters that are wrongly filled and classified as part of the character.


Fig. 15i depicts the final image obtained by the proposed binarisation technique. It can easily be observed that it leads to superior binarisation results and that all of the above difficulties are efficiently overcome.

8.4 Experiment 4

The above experiments present comparative results using the difference perceived by a human observer as the criterion. However, in many cases, statistical criteria can be used to evaluate the results of a binarisation technique. In this experiment, the well-known peak signal-to-noise ratio (PSNR) statistical criterion is used for the evaluation of the proposed system. PSNR is a measure of how close one image is to another. Therefore, the higher the value of the PSNR, the higher the similarity of the two images. The definition of the PSNR is

$$\mathrm{PSNR} = 10 \log \left( \frac{255^2}{\mathrm{MSE}} \right) \qquad (25)$$

where MSE is the mean squared error calculated from the relation

$$\mathrm{MSE} = \frac{\sum_{x=1}^{M} \sum_{y=1}^{N} \left( I(x, y) - I'(x, y) \right)^2}{MN} \qquad (26)$$

Fig. 19 Captured image which is used as the ground-truth image in Experiment 6



Fig. 20 Noisy images produced from the image of Fig. 19 and their binarisation results obtained by the application of the proposed system

a Texture image
b Light image
c Texture-light image
d Texture-neon glow image
e Texture-blur image

where I(x, y) and I′(x, y) are the original and the processed version of the same image, and M and N are the dimensions of the images.
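A direct sketch of (25) and (26), taking both images as 2-D lists of grey values in [0, 255]:

```python
# Sketch of the PSNR criterion between a binarisation result and its
# ground-truth image (base-10 logarithm, as is standard for PSNR).

import math

def psnr(I, I2):
    M, N = len(I), len(I[0])
    mse = sum((I[y][x] - I2[y][x]) ** 2
              for y in range(M) for x in range(N)) / (M * N)
    return float('inf') if mse == 0 else 10 * math.log10(255 ** 2 / mse)

gt    = [[0, 255], [255, 0]]
noisy = [[0, 200], [255, 0]]
print(round(psnr(gt, noisy), 2))  # one pixel off by 55 -> 19.34 dB
```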

In order to evaluate the proposed binarisation technique, we use clean document images to produce 20 noisy grey-scale images. The original clean documents are considered as the ground-truth images for the evaluation tests. The noisy images are produced using the software developed by Sezgin and Sankur [26]. Three types of noise are imposed according to the following parameters:

• 'Not much noise': blurring variance equal to 0.25 and speckle variance 1 and variance 2 equal to 0.05 and 0.06, respectively.
• 'Medium noise': blurring variance equal to 0.85 and speckle variance 1 and variance 2 equal to 0.15 and 0.06, respectively.


• 'Much noise': blurring variance equal to 1.25 and speckle variance 1 and variance 2 equal to 0.15 and 0.06, respectively.

Samples of the noisy images used in this evaluation test are shown in Fig. 16. Having 20 noisy grey-scale images, we binarise them using each one of the IBT and also with the proposed binarisation technique using four output neurons. It should be noted that the independent techniques are applied using their proper parameter values, as given in the literature, which are the same as in Experiment 1. Each one of the binary images obtained is compared with the corresponding ground-truth image. The PSNR values for every binarisation result and also for the proposed technique are calculated. The histogram shown in Fig. 17 is constructed using the PSNR mean values. As can be observed from this histogram, the proposed binarisation technique leads to the best mean PSNR value.



PSNR values calculated for the five noisy images of Fig. 20:

                  Texture   Light   Texture-light   Texture-neon glow   Texture-blur
ALLT               18.62    25.32       18.99             16.34             14.82
IIFA               16.57    16.73       16.80             14.75             12.43
FCM                21.25     3.73        3.04             14.00             16.66
Niblack             9.76    20.52       13.64              9.38              8.08
Bernsen            20.97     4.93        4.70             16.22             16.38
Otsu               21.11     3.78        3.16             18.11             14.82
Sauvola            18.43    16.29       18.45             13.39             10.91
Proposed system    28.26    26.52       25.97             24.50             18.42

Fig. 21 Analytical histogram of the PSNR values calculated for the images shown in Fig. 20

8.5 Experiment 5

In order to evaluate the importance of each IBT, the proposed binarisation system is applied to the 20 noisy images produced in the previous experiment, combining each time only six instead of the seven IBT. That is, for each document image, seven applications are performed, where in each application a different IBT is excluded from the system. In this way, seven binarisation results are obtained for each processed image. Comparing these binarisation


results, we can state that the binarisation technique that is most important to the system is the one which, when it is not combined in the system, produces the worst binarisation result. Conversely, the least important technique of the system is the one whose absence leads to the best binarisation result.

The comparison of the binarisation results is performed, as in the previous experiment, using the PSNR mean values. The histogram shown in Fig. 18 is constructed using these values, obtained by the application of the

Fig. 22 Histogram of the mean value of PSNR calculated for the images shown in Fig. 20 (ALLT: 18.82, IIFA: 15.46, FCM: 11.74, Niblack: 12.27, Bernsen: 12.64, Otsu: 12.19, Sauvola: 15.49, proposed system: 24.73)



proposed system to the 20 noisy images. As can be observed from this histogram, the result of the proposed system produced when Niblack's technique is not included achieves the best PSNR mean value. This means that this binarisation technique affects the performance of the proposed system less than any other technique, in contrast to the ALLT, the absence of which significantly decreases the performance of the system (worst PSNR mean value in the histogram).

Fig. 23 Sample of document images used in Experiment 7

a Ground-truth image
b Noisy image
c Binarisation result obtained


8.6 Experiment 6

In this experiment, in order to examine the proposed binarisation technique under different kinds of noise (not only speckle noise), we use the clean document image of Fig. 19 to produce five noisy grey-scale images using the effects tools of PaintShop Pro 7 (textures, lights, neon glow, blur and combinations of them).

As in the previous experiment, the binarisation results obtained by the application of the proposed system and the IBT are compared using the PSNR values. The proposed system uses four output neurons and a size filter (removing objects smaller than 5 pixels or larger than 600 pixels) as a post-processing procedure. The five initial noisy document images, with the binarisation results obtained by the application of the proposed binarisation technique, are shown in Fig. 20.

Each one of the binarisation results is compared with the ground-truth image shown in Fig. 19. The analytical

Table 1: Scores of the five criteria obtained for the document image of Fig. 23 during the evaluation test in Experiment 7

                   MHD      ME      EMM     NU      RAE     AVE
ALLT             3.6050  0.3037  0.7800  0.2191  0.5260  1.0867
IIFA             0.9680  0.2496  0.1360  0.1671  0.2520  0.3545
FCM              0.8820  0.2479  0.1060  0.1657  0.2270  0.3257
Niblack          3.7550  0.3254  0.8240  0.1981  0.5660  1.1337
Bernsen          1.7190  0.2930  0.4740  0.2154  0.4860  0.6375
Otsu             1.3530  0.2507  0.1490  0.1748  0.2710  0.4397
Sauvola          3.7170  0.3832  0.8580  0.3137  0.7140  1.1972
Proposed system  0.8770  0.2461  0.0990  0.1570  0.1900  0.3138

Mean values of the scores obtained during Experiment 7:

                   MHD     ME     EMM     NU     RAE    AVE
ALLT             4.739  0.136  0.851  0.213  0.528  1.293
IIFA             0.839  0.073  0.115  0.168  0.211  0.281
FCM              0.466  0.070  0.075  0.166  0.181  0.192
Niblack          4.886  0.185  0.890  0.200  0.606  1.353
Bernsen          3.795  0.147  0.730  0.235  0.577  1.097
Otsu             0.625  0.072  0.091  0.171  0.206  0.237
Sauvola          4.833  0.230  0.905  0.313  0.733  1.403
Proposed system  0.463  0.069  0.069  0.160  0.155  0.183

Fig. 24 Analytical histogram of the mean values of the scores obtained during Experiment 7



histogram of all PSNR values calculated in this evaluation test is shown in Fig. 21. Fig. 22 shows the histogram constructed using the mean PSNR values. The score achieved by the proposed binarisation technique is significantly better than the scores of the other binarisation techniques.

8.7 Experiment 7

In this last experiment, the proposed system is evaluated using the five criteria proposed by Sezgin and Sankur [26]. These criteria are the modified Hausdorff distance (MHD), the misclassification error (ME), the edge mismatch (EMM), the region non-uniformity (NU) and the relative foreground area error (RAE). All of these criteria, except NU, need the ground-truth binary image in order to be calculated. The values of the above criteria lie in [0, 1], except the MHD values. In all cases, a measure closer to zero corresponds to a better binarisation result. In order to obtain an average performance score from the previous criteria, the average value (AVE) of the five criteria is calculated. Fig. 23 presents a sample of the document images used in this specific experiment. The analytical score values of the five criteria obtained for this document image are shown in Table 1. The score values of these five criteria are calculated for 30 document images, and the histogram shown in Fig. 24 is constructed using the mean score values obtained. As in the previous experiments, the proposed system achieved the best score in each one of the five independent criteria.
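As an illustration, the simplest of the five criteria, ME, can be sketched as the fraction of pixels whose class disagrees with the ground truth (the function name is ours):

```python
# Sketch of the misclassification error (ME) between two binary images,
# given as 2-D lists of 0/1 class labels; 0 means a perfect match.

def misclassification_error(gt, result):
    total = sum(len(row) for row in gt)
    wrong = sum(1 for grow, rrow in zip(gt, result)
                  for g, r in zip(grow, rrow) if g != r)
    return wrong / total

gt  = [[0, 1, 1], [0, 0, 1]]
res = [[0, 1, 0], [0, 1, 1]]
print(misclassification_error(gt, res))   # 2 of 6 pixels differ
```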

9 Conclusions

The authors have proposed a new binarisation technique suitable for degraded digital documents. The main aim of the proposed binarisation technique is to build a system that takes advantage of the benefits of a set of IBT by combining their results using the KSOM neural network. In order to improve the binarisation results further, significant improvements are proposed for the ALLT and the IIFA, which are powerful techniques for document binarisation. The proposed binarisation technique was extensively tested with a variety of degraded document images. Many of them came from standard databases such as the Mediateam Oulu Document Database and the old Greek Parliamentary Proceedings. Several experimental results are presented that confirm the effectiveness of the proposed system. The entire system is implemented in a visual environment using Delphi 7.

10 Acknowledgment

The work reported in this paper was partially supported by the project Archimedes of TEI Serron.

11 References

1 Otsu, N.: ‘A thresholding selection method from gray-levelhistogram’, IEEE Trans. Syst. Man Cybern., 1979, 8, pp. 62–66

2 Kittler, J., and Illingworth, J.: ‘Minimum error thresholding’, PatternRecognit., 1986, 19, (1), pp. 41–47

3 Reddi, S.S., Rudin, S.F., and Keshavan, H.R.: ‘An optimal multiplethreshold scheme for image segmentation’, IEEE Trans. Syst. ManCybern., 1984, 14, (4), pp. 661–665

4 Kapur, J.N., Sahoo, P.K., and Wong, A.K.: ‘A new method forgray-level picture thresholding using the entropy of the histogram’,Comput. Vis. Graph. Image Process., 1985, 29, pp. 273–285

5 Papamarkos, N., and Gatos, B.: ‘A new approach for multilevelthreshold selection’, CVGIP, Graph. Models Image Process., 1994,56, (5), pp. 357–370

84

6 Chi, Z., Yan, H., and Pham, T.: ‘Fuzzy algorithms: with applicationsto image processing and pattern recognition’ (World ScientificPublishing, 1996)

7 Bernsen, J.: ‘Dynamic thresholding of grey-level images’. Proc. 8thInt. Conf. on Pattern Recognition, Paris, 1986, pp. 1251–1255

8 Chow, C.K., and Kaneko, T.: ‘Automatic detection of the left ventriclefrom cineangiograms’, Comput. Biomed. Res., 1972, 5, pp. 388–410

9 Eikvil, L., Taxt, T., and Moen, K.: ‘A fast adaptive method forbinarization of document images’. Proc. ICDAR, France, 1991,pp. 435–443

10 Mardia, K.V., and Hainsworth, T.J.: ‘A spatial thresholding methodfor image segmentation’, IEEE Trans. Pattern Anal. Mach. Intell.,1988, 10, (8), pp. 919–927

11 Niblack, W.: ‘An introduction to digital image processing’ (Prentice-Hall, Englewood Cliffs, NJ, 1986), pp. 115–116

12 Taxt, T., Flynn, P.J., and Jain, A.K.: ‘Segmentation of documentimages’, IEEE Trans. Pattern Anal. Mach. Intell., 1989, 11, (12),pp. 1322–1329

13 Yanowitz, S.D., and Bruckstein, A.M.: ‘A new method for imagesegmentation’, Comput. Vis. Graph. Image Process., 1989, 46, (1),pp. 82–95

14 Sauvola, J., Seppanen, T., Haapakoski, S., and Pietikainen, M.:‘Adaptive document binarization’. Proc. 4th Int. Conf. on DocumentAnalysis and Recognition, Ulm Germany, 1997, pp. 147–152

15 Sauvola, J., and Pietikainen, M.: ‘Adaptive document imagebinarization’, Pattern Recognit., 2000, 33, pp. 225–236

16 Gorman, L.O.: ‘Binarization and multithresholding of document images using connectivity’, CVGIP, Graph. Models Image Process., 1994, 56, (6), pp. 494–506

17 Liu, Y., and Srihari, S.N.: ‘Document image binarization based on texture features’, IEEE Trans. Pattern Anal. Mach. Intell., 1997, 19, (5), pp. 540–544

18 Kamel, M., and Zhao, A.: ‘Extraction of binary character/graphics images from gray-scale document images’, CVGIP, Graph. Models Image Process., 1993, 55, (3), pp. 203–217

19 Yang, Y., and Yan, H.: ‘An adaptive logical method for binarisation of degraded document images’, Pattern Recognit., 2000, 33, pp. 787–807

20 White, J.M., and Rohrer, G.D.: ‘Image segmentation for optical character recognition and other applications requiring character image extraction’, IBM J. Res. Dev., 1983, 27, (4), pp. 400–411

21 Trier, O.D., and Taxt, T.: ‘Improvement of ‘integrated function algorithm’ for binarisation of document images’, Pattern Recognit. Lett., 1995, 16, pp. 277–283

22 Papamarkos, N.: ‘A neuro-fuzzy technique for document binarisation’, Neural Comput. Appl., 2003, 12, (3–4), pp. 190–199

23 Trier, O.D., and Taxt, T.: ‘Evaluation of binarization methods for document images’, IEEE Trans. Pattern Anal. Mach. Intell., 1995, 17, (3), pp. 312–315

24 Trier, O.D., and Jain, A.K.: ‘Goal-directed evaluation of binarization methods’, IEEE Trans. Pattern Anal. Mach. Intell., 1995, 17, (12), pp. 1191–1201

25 Leedham, G., Yan, C., Takru, K., and Mian, J.H.: ‘Comparison of some thresholding algorithms for text/background segmentation in difficult document images’. Proc. 7th Int. Conf. on Document Analysis and Recognition, 2003, pp. 859–865

26 Sezgin, M., and Sankur, B.: ‘Survey over image thresholding techniques and quantitative performance evaluation’, J. Electron. Imaging, 2004, 13, (1), pp. 146–165

27 Haykin, S.: ‘Neural networks: a comprehensive foundation’ (Macmillan College Publishing Company, New York, 1994)

28 Kohonen, T.: ‘The self-organizing map’, Proc. IEEE, 1990, 78, (9),pp. 1464–1480

29 Kohonen, T.: ‘Self-organizing maps’ (Springer Verlag, Berlin, 1997, 2nd edn.)

30 Strouthopoulos, C., Papamarkos, N., and Atsalakis, A.: ‘Text extraction in complex color documents’, Pattern Recognit., 2002, 35, (8), pp. 1743–1758

31 Atsalakis, A., Papamarkos, N., and Andreadis, I.: ‘On estimation of the number of image principal colors and color reduction through self-organized neural networks’, Int. J. Imaging Syst. Technol., 2002, 12, (3), pp. 117–127

32 Atsalakis, A., Andreadis, I., and Papamarkos, N.: ‘Histogram based color reduction through self-organized neural networks’, Lect. Notes Comput. Sci., 2001, 2130, pp. 470–476

33 Sauvola, J., and Kauniskangas, H.: ‘MediaTeam Document Database II, a CD-ROM collection of document images’, University of Oulu, Finland, 1999

34 Ping, Z., and Lihui, C.: ‘Document filters using morphological and geometrical features of characters’, Image Vis. Comput., 2001, 19, pp. 847–855

IET Image Process., Vol. 1, No. 1, March 2007