

Self-generation Voting Based Method for Handwritten

Chinese Character Recognition

Yun-Xue SHAO1 Chun-Heng WANG1 Bai-Hua XIAO1 Lin-Bo ZHANG2

Abstract: The voting strategy is very useful in pattern recognition. Many methods based on this strategy, such as boosting and bagging, have been proposed and successfully applied. However, these methods are infeasible or unsuitable for handwritten Chinese character recognition because of the characteristics of the problem. In this paper, a self-generation voting method is proposed to further improve the recognition rate in handwritten Chinese character recognition. The method first learns a set of parameters for generating a set of samples from the test sample, then classifies the generated samples with a base-line classifier, and finally gives the recognition result by voting. Experimental results on two databases show that the proposed method is effective and useful in handwritten Chinese character recognition systems.

Keywords: Handwritten Chinese character recognition, self-generation voting, modified quadratic discriminant func-

tion, line density equalization

The problem of off-line handwritten Chinese character recognition (HCCR) has been investigated by many researchers over a long time, and great improvements have been achieved[1−10]. Some important techniques include: 1) nonlinear normalization based on line density equalization[1−2] for character image normalization; 2) the local stroke direction feature[3−5] for feature extraction; 3) linear discriminant analysis (LDA) for feature dimensionality reduction; 4) the modified quadratic discriminant function (MQDF)[3] for character recognition. These techniques are used in this paper for the base-line classifier.

For classifying a large category set, many techniques such as artificial neural networks (ANNs), support vector machines (SVMs), bagging, and boosting become infeasible because either the training time or the classification time is unacceptable. However, some good ideas in these methods are inspiring and can be used in HCCR. For example, the voting strategy is used in the proposed method to further improve the recognition accuracy of the base-line classifier. The proposed self-generation voting based method learns a set of parameters for generating a set of test samples, and a set of weights for the final voting. Experimental results on two databases show that the proposed method is effective and useful in practice.

The rest of the paper is organized as follows. Section 1 introduces the framework of self-generation voting. Section 2 presents the self-generation method and the parameter learning method for HCCR. Section 3 presents the experimental setup and results. Finally, Section 4 concludes the paper.

Manuscript received March 9, 2012; revised June 15, 2012
Supported by National Natural Science Foundation of China (61172103, 60933010, 60835001)
Recommended by Associate Editor Cheng-Lin LIU
Citation: Yun-Xue Shao, Chun-Heng Wang, Bai-Hua Xiao, Lin-Bo Zhang. Self-generation voting based method for handwritten Chinese character recognition. Acta Automatica Sinica, 2013, 39(4): 450−454
1. State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China 2. China Academy of Transportation Sciences, Beijing 100029, China

1 Self-generation voting

Bootstrap aggregation (bagging) and boosting are general, effective techniques for improving prediction rules. Bagging was proposed by Breiman[11] to improve classification by combining the classifications of randomly generated training sets. Given a standard training set D of size n, bagging generates m new training sets Di, each of size n′, by sampling examples from D uniformly and with replacement. The m models are fitted on the m bootstrap training sets and combined by voting. Boosting is a committee based approach that can be used to improve the accuracy of classification methods. It is based on the question: can a set of weak learners create a single strong learner? A weak learner is defined to be a classifier that is only slightly better than random guessing, while a strong learner is well-correlated with the true classification. Most boosting algorithms consist of iteratively learning weak classifiers with respect to a distribution and adding them to the final strong classifier. After a weak learner is added, the data are reweighted so that future weak learners focus on the examples that previous weak learners misclassified. Unlike bagging, which uses a simple averaging of results to obtain an overall prediction, boosting uses a weighted average of the results obtained from the weak learners, where each weight is usually related to the corresponding weak learner′s accuracy.
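The bootstrap-and-vote procedure just described can be sketched as follows. This is a minimal illustration, not code from the paper: model fitting is abstracted away, and the already-fitted models are passed in as plain functions.

```python
import random

def bootstrap_sets(data, m, n_prime, seed=0):
    """Generate m bootstrap training sets of size n_prime by sampling
    from `data` uniformly and with replacement."""
    rng = random.Random(seed)
    return [[rng.choice(data) for _ in range(n_prime)] for _ in range(m)]

def bagged_predict(classifiers, x):
    """Combine the m fitted models by an (unweighted) majority vote."""
    votes = [h(x) for h in classifiers]
    return max(set(votes), key=votes.count)

# Toy usage: three bootstrap sets drawn from a five-element training set.
D = [1, 2, 3, 4, 5]
boots = bootstrap_sets(D, m=3, n_prime=5)
```

Each model would then be fitted on its own bootstrap set; at test time `bagged_predict` aggregates their predictions.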

In the following, a classifier trained on the bootstrap samples is treated as a weak classifier, and the strong classifier is the vote of the weak classifiers. These voting methods learn a weak classifier ht(x) and its corresponding weight αt from a training set generated from the original training set by some resampling method. The final classifier is the vote of the weak classifiers:

H(x) = Σ_{t=1}^{T} αt ht(x).    (1)

This kind of voting strategy is referred to as multi-classifier voting in this paper. Given a test sample x, each weak classifier gives its prediction, and the final prediction is the vote over the weak classifiers′ predictions. Fig. 1 illustrates the multi-classifier voting strategy.
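Equation (1) is simply a weighted sum of the weak classifiers' outputs. A minimal sketch, with toy ±1 classifiers and hypothetical weights αt (not values from the paper):

```python
def multi_classifier_vote(classifiers, weights, x):
    """Equation (1): H(x) = sum_t alpha_t * h_t(x). Each pre-trained
    weak classifier votes on x, scaled by its learned weight."""
    return sum(a * h(x) for h, a in zip(classifiers, weights))

# Three toy +/-1 weak classifiers with hypothetical weights alpha_t.
hs = [lambda v: 1 if v > 0 else -1,
      lambda v: 1 if v > -1 else -1,
      lambda v: -1]
alphas = [0.5, 0.3, 0.2]
score = multi_classifier_vote(hs, alphas, 0.5)   # 0.5 + 0.3 - 0.2
label = 1 if score > 0 else -1
```

For a binary ±1 problem the sign of the weighted sum is the final prediction, as in AdaBoost-style committees.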

In handwritten Chinese character recognition, it is difficult to learn a set of weak classifiers using bagging or boosting because of the large number of classes, the similar character pairs existing in this problem, and the lack of sufficient training samples. How, then, can we use the voting strategy to further improve the performance of a single classifier? We have one test sample x and one classifier h(x) in hand. Since it is difficult to generate multiple classifiers, the only way is to generate multiple test samples {x1, · · · , xT } from x, classify each generated sample with h(x), and use the voting strategy to give the final classification result. This is the motivation of the proposed method. Formally,

H(x) = Σ_{t=1}^{T} αt h(xt).    (2)

This kind of voting strategy, which uses multiple generated samples and one classifier, is referred to as the self-generation voting strategy. Fig. 2 illustrates this strategy. For an input sample x, a set of samples {x1, · · · , xT } is generated from it. The generating method and the importance weights αt are pre-learned.
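In contrast to (1), self-generation voting keeps one classifier and varies the sample. A schematic version, with the generator and the base-line classifier abstracted as plug-in functions (all names and the toy 1-D setup are hypothetical):

```python
from collections import defaultdict

def self_generation_vote(h, generate, x, weights):
    """Equation (2) in voting form: generate T variants x_1..x_T of the
    test sample, classify each with the SAME base-line classifier h,
    and accumulate the weight alpha_t on the predicted label."""
    votes = defaultdict(float)
    for t, alpha in enumerate(weights):
        x_t = generate(x, t)              # t-th pre-learned distortion
        votes[h(x_t)] += alpha
    return max(votes, key=votes.get)      # label with the largest total

# Toy 1-D check: threshold classifier plus mild deterministic "distortions".
label = self_generation_vote(
    h=lambda v: 'A' if v >= 0 else 'B',
    generate=lambda v, t: v - 0.05 * t,
    x=0.2,
    weights=[0.5, 0.3, 0.2])
```

Because every vote comes from the same classifier, only the generation step and the final weighted tally add cost at test time.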

Fig. 1 The multi-classifier voting strategy

Fig. 2 The self-generation voting strategy

2 Handwritten Chinese character recognition based on self-generation voting strategy

To use the self-generation voting strategy, a method for generating a set of samples must be created first. The generated samples must belong to the same class as the original sample, and the differences among the generated samples should be neither too small nor too large. Then, a parameter learning method should be designed for selecting a small but complete generation set and its corresponding weights. Finally, the voting strategy gives the recognition result.

2.1 Self-generation method

A generation model is proposed in [12] for generating a large number of virtual training samples from existing ones. The experimental results there show that the generated samples help alleviate the lack of training samples. This model is used as the generating method in our experiments.

Let f(x, y) be an original character image and g(u, v) be the generated image, where (x, y) and (u, v) are the coordinates on the original image and the generated image, respectively. The generating method ensures that g(u, v) belongs to the same class as f(x, y). The variation of the generated samples is controlled by the parameters in the mapping functions, which are

u = u(x, y) = wn(d1, b1(x)) + k1 b2(y) + c1,
v = v(x, y) = wn(d2, b2(y)) + k2 b1(x) + c2,    (3)

where k1 and k2 are shearing slopes, d1 and d2 control the extent of local resizing, c1 and c2 are constants that align the centroid of the generated image with the original centroid, b1 and b2 are functions that linearly scale the original coordinates to the interval [0, 1], and wn is a nonlinear warping function that produces local variations in the size of sub-patterns.
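The mapping (3) can be written out directly. The sketch below is a simplified illustration, not the authors' implementation: it uses the exponential warp w1 of (4) for wn, assumes the character occupies the full image so that b1 and b2 are plain linear scalings, and sets the centering constants c1, c2 to zero.

```python
import math

def w1(d, t):
    """Warping function (4); requires d != 0. Maps [0, 1] onto [0, 1]."""
    return (1 - math.exp(-d * t)) / (1 - math.exp(-d))

def generate_coords(x, y, width, height, d1, d2, k1, k2, c1=0.0, c2=0.0):
    """Mapping functions (3) with wn = w1. b1 and b2 linearly scale the
    original coordinates to [0, 1]; c1, c2 would afterwards be chosen to
    re-align the centroid (left at 0 here for simplicity)."""
    b1 = x / (width - 1)
    b2 = y / (height - 1)
    u = w1(d1, b1) + k1 * b2 + c1
    v = w1(d2, b2) + k2 * b1 + c2
    return u, v
```

With zero shear (k1 = k2 = 0) the image corners map to the corners of the unit square, and only the interior is warped.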

In our experiments, all the parameters are initialized as in [12]: k1 and k2 are randomly picked from [−0.17, 0.17] and [−0.20, 0.20], respectively, and d1 ≠ 0 and d2 ≠ 0 are randomly taken from the interval [−1.6, 1.6]. The following two nonlinear warping functions are used to produce more kinds of variation. The probabilities of using w1 and w2 are fixed at 0.8 and 0.2, respectively.

w1(d, t) = (1 − e^{−dt}) / (1 − e^{−d}),    (4)

w2(d, t) = { 0.5 w1(d, 2t),                    0 ≤ t ≤ 0.5
           { 0.5 + 0.5 w1(−d, 2(t − 0.5)),     0.5 < t ≤ 1.    (5)

These value ranges are chosen subjectively, based on the assumption that the variation between characters of the same class is not too wide. For example, Fig. 3 shows the samples generated with different d1 (d2 = 0.01, k1 = 0, k2 = 0, wn = w1). The character image in the center is the original image. From the top-left image to the bottom-right image, d1 increases from −1.6 to 1.6 in steps of 0.1. We can see that the bigger |d1| is, the larger the change in the character image; too extreme a parameter would not help us design a better classifier. Fig. 3 also shows that tiny changes of a parameter do not affect the shape of the generated images much.

Fig. 3 Generated samples with different d1

Fig. 4 gives some generated examples for samples from the two databases used in our experiments. The first column shows the original images and the other columns show the generated samples. We can see that the generated samples belong to the same character class as the original image while their shapes differ. However, some generated samples are similar to each other while others differ substantially. Therefore, learning a subset of the generated samples would make the voting faster and better.


Fig. 4 Some generated samples

2.2 Parameter learning method

For the generation model described in Section 2.1, given an input character image f(x, y), a parameter set Pi = {d1, d2, k1, k2, wn} corresponds to a generated character image gi(u, v):

Pi → gi(u, v).    (6)

Different combinations of the parameter sets affect the final performance; how to determine the best combination of these parameters will be considered in our future work. In this paper, just for verifying the effectiveness of the proposed method, we randomly generated 100 parameter sets Sp = {P1{d1^1, d2^1, k1^1, k2^1, wn^1}, · · · , P100{d1^100, d2^100, k1^100, k2^100, wn^100}} for training and parameter learning.

The goal of the learning method is to learn a set of parameter sets P = {P1, · · · , PT } from the candidate set Sp, together with the corresponding weights α = {α1, · · · , αT }, that performs best on the validation set. Algorithm 1 gives the learning method used in our experiments. It selects Pi and αi greedily from Sp and Sα, respectively, where Sα is the candidate set of voting weights. In Algorithm 1, T is the number of parameter sets we want to select for voting, and CT0 = {ϕ1^0, · · · , ϕn^0} is the validation set, where ϕi^0 is the feature vector extracted from fi. The local variable best_p is the best parameter set selected in iteration t; best_p = ∅ means that no extra parameter set improves on the already learned parameter sets.

Algorithm 1. Parameter sets and voting weights learning method

Input: candidate parameter sets Sp, candidate weights Sα, validation set CT0 = {ϕ1^0, · · · , ϕn^0}
Output: P, α
Initialize P = ∅, α = ∅, best_acc = 0
for t = 1 : T do
    best_p = ∅
    for p ∈ Sp do
        1) Generate gi from fi using p
        2) Extract feature ϕi^t from gi
        3) Put ϕi^t into CT(t−1), getting CTt = {(ϕ1^0, · · · , ϕ1^t), · · · , (ϕn^0, · · · , ϕn^t)}
        for α ∈ Sα do
            1) Set αt = α, getting α = {α1, · · · , αt}
            2) Calculate the current recognition rate cur_acc on the validation set using H(fj) = h(ϕj^0) + Σ_{i=1}^{t} αi h(ϕj^i)
            3) if cur_acc > best_acc then best_acc = cur_acc, best_p = p, best_α = α
        end
    end
    if best_p = ∅ then
        break
    else
        1) Extract ϕi^t using best_p and put it into CT(t−1), getting CTt = {(ϕ1^0, · · · , ϕ1^t), · · · , (ϕn^0, · · · , ϕn^t)}
        2) Set αt = best_α and put it into α = {α1, · · · , αt}
        3) Set Pt = best_p and put it into P = {P1, · · · , Pt}
    end
end
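Algorithm 1 is a greedy forward selection. The skeleton below abstracts the generation, feature-extraction, and MQDF steps into precomputed per-class distance tables; `base_scores`, `cand_scores`, and the toy data are hypothetical stand-ins, not the paper's pipeline.

```python
def greedy_select(base_scores, cand_scores, labels, weight_cands, T):
    """Greedy skeleton of Algorithm 1. base_scores[j] holds the per-class
    distances of validation sample j under the base-line classifier;
    cand_scores[p][j] holds the distances of its p-th generated variant.
    Smaller distance = better match. Returns the selected (p, alpha)
    pairs and the best validation accuracy reached."""
    def accuracy(chosen):
        correct = 0
        for j, y in enumerate(labels):
            total = list(base_scores[j])
            for p, a in chosen:           # accumulate weighted distances
                total = [t + a * s for t, s in zip(total, cand_scores[p][j])]
            correct += min(range(len(total)), key=total.__getitem__) == y
        return correct / len(labels)

    selected, best_acc = [], accuracy([])
    for _ in range(T):
        best = None
        for p in range(len(cand_scores)):
            for a in weight_cands:
                acc = accuracy(selected + [(p, a)])
                if acc > best_acc:        # strict improvement only
                    best_acc, best = acc, (p, a)
        if best is None:
            break                         # no candidate helps: stop early
        selected.append(best)
    return selected, best_acc
```

The early break mirrors the best_p = ∅ case in Algorithm 1: once no candidate pair improves validation accuracy, selection stops before reaching T.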

3 Experimental results

We evaluate the proposed method on the CASIA database and the CASIA-HWDB1.1 database[13]. The CASIA database, collected by the Institute of Automation, Chinese Academy of Sciences, contains 3755 Chinese character classes with 300 samples per class; 250 samples per class are chosen for training and the remaining 50 samples for testing. The CASIA-HWDB1.1 database was built by the National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences (CASIA). It contains 3755 Chinese character classes and 171 alphanumeric characters and symbols, with almost 300 samples per class. In our experiments, we only use the 3755 Chinese character classes; the first 250 samples per class are chosen for training and the remaining samples for testing. Some samples from the two databases are given in Fig. 5; the first three rows are taken from CASIA and the remaining rows from CASIA-HWDB1.1.

Fig. 5 Some samples in the two databases

In the parameter learning stage, the training samples are divided into two parts: the first 240 samples per class are used for training and the remaining 10 samples per class for validation. The candidate set of weights Sα is set to {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1}. Small changes of the weights do not affect the final results much, and most compound-distance based methods[14] use a candidate set like this.

Each character image and its generated images are normalized to 64 × 64 using the line-density equalization[1] based normalization method. Then the 8-direction gradient features[5] (512 dimensions) are extracted from each normalized image, and by LDA each feature vector is compressed to 256 dimensions. The base-line classifier used in our experiments is MQDF2[3], whose main parameter is the number of principal vectors Nv; the Nv used in h(x) in Algorithm 1 is set to 50. For each test sample x, classifying it with the base-line classifier yields a set of candidate classes, whose size is set to 10 in our experiments. The proposed method first generates {x1, · · · , xT } from x using the pre-learned parameter sets P = {P1, · · · , PT }, then uses the base-line classifier to calculate the distance from each xi to the j-th candidate class. Finally, the distance from x to the j-th class is calculated by summing the weighted distances between the xi and the j-th class, and the recognition result is the candidate class with the minimal distance. Denote by TIMEn the time needed for generating and pre-processing one sample, by TIMEc the time needed for candidate class selection, and by TIMEv the time needed for the final voting. The recognition time of the proposed method is (T × TIMEn + TIMEc + TIMEv), while the original method needs (TIMEn + TIMEc). In our experiments, TIMEn is about 1.4 ms, TIMEc about 4.8 ms, and TIMEv about 3.7 ms (10 candidate classes and T = 20), clocked on an ordinary PC with 2 CPUs and 4 GB of memory. The most time-consuming part of the proposed method is the pre-processing of the generated samples; speeding this up will be addressed in future work. This part is, however, very suitable for parallel processing.

We first use the original training samples to train the base-line classifier and evaluate it on the two databases with varying values of Nv. The results are given in Table 1. The best recognition rate is achieved at Nv = 50 on both databases; as Nv increases further, the recognition rate decreases. This is caused by the lack of sufficient training samples.

Table 1 The recognition rates of the base-line classifier without using generated samples (%)

Nv    CASIA    CASIA-HWDB1.1
30    97.87    87.24
50    97.97    87.41
70    97.95    87.07
90    97.90    87.07

Then we use all the generated samples for training the base-line classifier. The results are given in Table 2. Now the recognition rate generally increases as Nv increases. From this table, we can see that the generated samples are useful for training.

Table 2 The recognition rates of the base-line classifier using generated samples for training (%)

Nv    CASIA    CASIA-HWDB1.1
30    97.75    87.04
50    97.98    87.74
70    98.07    88.13
90    98.10    88.06

Finally, we evaluate the proposed method on the two databases. Tables 3 and 4 give the results of the proposed method with varying values of T and Nv on the two databases, respectively. T = 0 means that no generated samples are used for the final voting. From Tables 3 and 4, we can see that the proposed method performs much better than the base-line classifier. The recognition rates increase with increasing Nv, which is consistent with the results in Table 2.

Fig. 6 shows the recognition rate with varying values of T at Nv = 50 on the two databases; the left sub-figure shows the results on the CASIA database and the right one the results on the CASIA-HWDB1.1 database. From Fig. 6, we can see that the recognition rate increases only slowly once T reaches 20, which tells us that we do not need to generate very many samples to obtain better performance.

4 Conclusion

A self-generation voting method has been proposed for handwritten Chinese character recognition. The method learns a set of parameters for generating a set of samples from the test sample and a set of weights for the final voting. Experimental results on two databases show its effectiveness, and it is observed that fewer than 20 votes already give a large improvement.

Table 3 The recognition rates of the proposed method on database CASIA (%)

Nv    T = 0    T = 5    T = 10    T = 15    T = 20    T = 25
30    97.75    98.16    98.41     98.56     98.62     98.64
50    97.98    98.28    98.48     98.62     98.67     98.69
70    98.07    98.31    98.49     98.63     98.68     98.70
90    98.10    98.27    98.48     98.61     98.65     98.66

Table 4 The recognition rates of the proposed method on database CASIA-HWDB1.1 (%)

Nv    T = 0    T = 5    T = 10    T = 15    T = 20    T = 25
30    87.04    88.47    89.68     90.24     90.38     90.50
50    87.74    88.91    89.83     90.36     90.52     90.65
70    88.13    88.89    89.83     90.40     90.51     90.59
90    88.06    88.95    89.85     90.39     90.49     90.63

Fig. 6 The recognition rate with varying value of T (Nv = 50) on the two databases


References

[1] Yamada H, Yamamoto K, Saito T. A nonlinear normalization method for handprinted Kanji character recognition: line density equalization. Pattern Recognition, 1990, 23(9): 1023−1029

[2] Tsukumo J, Tanaka H. Classification of handprinted Chinese characters using nonlinear normalization and correlation methods. In: Proceedings of the 9th International Conference on Pattern Recognition. Rome, Italy: IEEE, 1988. 168−171

[3] Kimura F, Takashina K, Tsuruoka S, Miyake Y. Modified quadratic discriminant functions and the application to Chinese character recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1987, 9(1): 149−153

[4] Shi M, Fujisawa Y, Wakabayashi T, Kimura F. Handwritten numeral recognition using gradient and curvature of gray scale image. Pattern Recognition, 2002, 35(10): 2051−2059

[5] Liu C L, Nakashima K, Sako H, Fujisawa H. Handwritten digit recognition: investigation of normalization and feature extraction techniques. Pattern Recognition, 2004, 37(2): 265−279

[6] Gao T F, Liu C L. High accuracy handwritten Chinese character recognition using LDA-based compound distances. Pattern Recognition, 2008, 41(11): 3442−3451

[7] Liu C L. High accuracy handwritten Chinese character recognition using quadratic classifiers with discriminative feature extraction. In: Proceedings of the 18th International Conference on Pattern Recognition. Washington DC: IEEE, 2006. 942−945

[8] Liu C L, Sako H, Fujisawa H. Handwritten Chinese character recognition: alternatives to nonlinear normalization. In: Proceedings of the 7th International Conference on Document Analysis and Recognition. Edinburgh, UK: IEEE, 2003. 524−528

[9] Leung K C, Leung C H. Recognition of handwritten Chinese characters by critical region analysis. Pattern Recognition, 2010, 43(3): 949−961

[10] Xu B, Huang K Z, Liu C L. Similar handwritten Chinese characters recognition by critical region selection based on average symmetric uncertainty. In: Proceedings of the 2010 International Conference on Frontiers in Handwriting Recognition. Kolkata, India: IEEE, 2010. 527−532

[11] Breiman L. Bagging predictors. Machine Learning, 1996, 24(2): 123−140

[12] Leung K C, Leung C H. Recognition of handwritten Chinese characters by combining regularization, Fisher's discriminant and distorted sample generation. In: Proceedings of the 10th International Conference on Document Analysis and Recognition. Barcelona, Spain: IEEE, 2009. 1026−1030

[13] Liu C L, Yin F, Wang D H, Wang Q F. CASIA online and offline Chinese handwriting databases. In: Proceedings of the 2011 International Conference on Document Analysis and Recognition. Beijing, China: IEEE, 2011. 37−41

[14] Suzuki M, Ohmachi S, Kato N, Aso H, Nemoto Y. A discrimination method of similar characters using compound Mahalanobis function. Transactions of the IEICE of Japan, 1997, (10): 2752−2760

Yun-Xue SHAO Ph.D. candidate at the Institute of Automation, Chinese Academy of Sciences. His research interest covers image processing and pattern recognition. Corresponding author of this paper. E-mail: [email protected]

Chun-Heng WANG Professor at the Institute of Automation, Chinese Academy of Sciences. His research interest covers image processing and pattern recognition. E-mail: [email protected]

Bai-Hua XIAO Professor at the Institute of Automation, Chinese Academy of Sciences. His research interest covers image processing and pattern recognition. E-mail: [email protected]

Lin-Bo ZHANG Research assistant at China Academy of Transportation Sciences. His research interest covers computer vision and pattern recognition. E-mail: [email protected]