

Character Independent Font Recognition on a Single Chinese Character

Xiaoqing Ding, Member, IEEE, Li Chen, Member, IEEE, and Tao Wu, Student Member, IEEE

Abstract—A novel algorithm for font recognition on a single unknown Chinese character, independent of the identity of the character, is proposed in this paper. We employ a wavelet transform on the character image and extract wavelet features from the transformed image. After a Box-Cox transformation and LDA (Linear Discriminant Analysis) process, the discriminating features for font recognition are extracted and classified through an MQDF (Modified Quadratic Discriminant Function) classifier with only one prototype for each font class. Our experiments show that our algorithm can achieve a recognition rate of 90.28 percent on a single unknown character and 99.01 percent if five characters are used for font recognition. Compared with existing methods, all of which are based on a text block, our method provides a higher recognition rate and is more flexible and robust, since it is based on a single unknown character. Additionally, our method demonstrates that it is possible to extract subtle yet discriminative signals embedded in a much larger noisy background.

Index Terms—Font recognition, character independent, single character, wavelet features, LDA, MQDF.


1 INTRODUCTION

MUCH research has been dedicated to optical character recognition. However, only a few projects take font recognition into account, in spite of its importance. Character recognition was the main focus in the early years of document processing research. As character recognition accuracy improved over time, scholars began to pay more attention to the acquisition of font information. Font information is fundamental to document processing and can be used in several ways. For example, it is essential for layout reconstruction. Modern OCR systems need to output not only the content of the document, but also the document layout structure, including the fonts of characters. It is also helpful in layout analysis and understanding: different parts of the document, such as the title, abstract, and body text, often employ different fonts. Finally, it is useful in constructing high-performance recognition systems. When font information is available, a mono-font character recognition system can be employed instead of a multifont recognition system. It is known that the former usually has a higher accuracy than the latter. Hence, a recognition system may achieve better performance if it utilizes font information.

Most past research on font recognition has been carried out on text in western languages. Shi and Pavlidis [1] used page properties such as a histogram of word length and stroke slopes to extract font features. Zramdini and Ingold [2] used typographical features for font recognition. They employed a statistical approach to obtain font features by means of a multivariate Bayesian classifier. Similar work was carried out by Jung et al. [3]. All the above projects are based on alphabetic attributes of letters that Asian characters do not possess, such as baseline position, the presence or absence of serifs, spacing between characters, etc. As a result, these methods cannot be used for font recognition on Chinese characters. Zhu and Tan [4] presented an algorithm that can handle Chinese documents. They used a group of Gabor filters to extract texture features from a text block and employed a weighted Euclidean distance classifier to recognize the font. Their method cannot, however, recognize the font of a single Chinese character.

It can be noted that, in many Chinese documents, more than one font is used in a given text block. Selected characters in a sentence may employ a different font for emphasis. For example, certain characters may be printed in boldface while others in the same sentence appear in the regular typeface. Fig. 1 gives an example, which is scanned from a real document. In form recognition, many cells have only one to three characters, such as those to be filled in with gender, nationality, etc.

The problem of font recognition on a single unknown Chinese character, which can provide greater flexibility, has arisen in order to deal with these situations. Our previous paper [5] used stroke property features and stroke distribution features to recognize the font, the only paper to date that presents a solution to this problem, but the recognition rate was only 69.7 percent. In this paper, a new algorithm is proposed for font recognition that can recognize the font of a single unknown Chinese character effectively.

Font recognition can adopt two complementary approaches: the a priori approach [12] and the a posteriori approach [2]. The first approach is applied when the characters of the analyzed text are not yet known, and the second approach can be adopted when the content of the given text is known and is used to recognize the font. The image of a single Chinese character contains information about both the character and the font, but the former is dominant in the character image. Images of the same character rendered using different fonts differ only slightly from each other, while different characters rendered using the same font look completely distinct.


. X. Ding is with the Electronics Engineering Department, Tsinghua University, Beijing, P.R. China. E-mail: [email protected].

. L. Chen is with Huawei Corp., Building 3, 27 New Jinqiao Road, Pudong District, Shanghai, 201206, P.R. China. E-mail: [email protected].

. T. Wu is with the Electrical Engineering Department, Tsinghua University, Beijing, 100084, P.R. China. E-mail: [email protected].

Manuscript received 24 Aug. 2005; revised 26 Feb. 2006; accepted 22 June 2006; published online 13 Dec. 2006. Recommended for acceptance by D. Lopresti. For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference IEEECS Log Number TPAMI-0458-0805.

0162-8828/07/$20.00 © 2007 IEEE. Published by the IEEE Computer Society.


In a posteriori font recognition from a single Chinese character, the character is constant and the font varies. Under this condition, font recognition on a single Chinese character is not very difficult because the differences between images reflect only the differences in the font. But, for the a priori approach, both the character and the font vary. The slight differences between fonts are subsumed by the significant differences between the characters. Therefore, single character font recognition becomes very difficult when the identity of the character is unknown. A method that can extract a useful signal from an intensely noisy background is crucial.

The method proposed in this paper is an example of an a priori approach. Our font recognition is carried out on a single unknown Chinese character. It is a challenge to extract effective features for a priori font recognition because of the sheer size of the Chinese character set (exceeding 3,755 characters), which can be considered a noisy background with faint font signals embedded within the much more substantial character signals. This phenomenon arises from the fact that the differences between any two characters are dominant at the bitmap level, while the differences between fonts are subordinate.

First, we apply a wavelet transform to the character image and extract wavelet features from the transformed image. After that, we employ the Box-Cox transformation, which makes the feature distribution more Gaussian-like. Next, we utilize the LDA (Linear Discriminant Analysis) process, a crucial step towards extracting discriminating features from the wavelet features for font recognition. Finally, we use an MQDF (Modified Quadratic Discriminant Function) font classifier with only one prototype for each font to determine the font of a test input character. Our experiments show the effectiveness of the method, with excellent recognition rates achieved. Moreover, it is important to note that our work supports the notion that it is possible to extract a faint signal embedded in a noisy background. That is a significant result in the field of pattern recognition.

The rest of the paper is organized as follows: In Section 2, we describe the features for font recognition in detail. Section 3 discusses the classifier. Experimental results and comparisons between this method and other existing methods are shown in Section 4. The conclusion is finally drawn in Section 5.

2 FONT FEATURES

Chinese characters are composed of different stroke substructures with different sizes. These stroke substructures contain the character information along with embedded font information. The font information appears more frequently in certain local substructures of a character. The wavelet transform is localized in both the frequency (scale) domain and the space domain and naturally leads to a multiresolution analysis (MRA). Consequently, the wavelet transform is a suitable candidate for analyzing the properties of different stroke substructures and, hence, a good tool for font recognition on a single Chinese character.

2.1 Wavelet Transform

It is well known that the two-dimensional wavelet decomposition of a discrete image $f(m,n)$ represents the image in terms of $3J+1$ subimages [6]:

$A_{2^{-J}}f,\ \{D^{(1)}_{2^{-j}}f,\ D^{(2)}_{2^{-j}}f,\ D^{(3)}_{2^{-j}}f\}_{j=1,2,\ldots,J},$   (1)

where $A_{2^{-J}}f$ is the approximation of the image at resolution $2^{-J}$; $D^{(1)}_{2^{-j}}f$, $D^{(2)}_{2^{-j}}f$, and $D^{(3)}_{2^{-j}}f$ are the wavelet subimages containing the image details at resolution $2^{-j}$. Wavelet coefficients of large amplitude in $D^{(1)}_{2^{-j}}f$, $D^{(2)}_{2^{-j}}f$, and $D^{(3)}_{2^{-j}}f$ correspond, respectively, to vertical high frequencies (horizontal edges), horizontal high frequencies (vertical edges), and high frequencies in both directions.

The wavelet transform can employ different basic wavelets. For font recognition purposes, it is imperative to employ a basic wavelet with compact support and symmetry properties. To analyze the characters' local information, it is necessary for the basic wavelet to have compact support. To accommodate the strokes' symmetry properties, it is necessary for the basic wavelet to be symmetric. A biorthogonal spline2 [7] wavelet, illustrated in Fig. 2, meets all of these requirements and is our choice for font recognition purposes.
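As a concrete illustration of this choice, the following sketch (not the authors' code) performs a multi-level 2D wavelet decomposition of a normalized character image with PyWavelets. Here 'bior2.2' is assumed to be an acceptable stand-in for the biorthogonal spline2 wavelet, the 48 × 48 input size anticipates Section 2.3, and the 'periodization' boundary mode is our assumption to keep the dyadic subimage sizes (24, 12, 6).

```python
# Sketch, not the authors' implementation: 3-level 2D wavelet decomposition
# of a normalized character image using PyWavelets. 'bior2.2' is assumed to
# stand in for the biorthogonal spline2 wavelet; 'periodization' keeps the
# dyadic subimage sizes 24x24, 12x12, 6x6 for a 48x48 input.
import numpy as np
import pywt

def decompose_character(img, levels=3, wavelet="bior2.2"):
    """Return the approximation A and, per level, the detail subimages (D1, D2, D3)."""
    coeffs = pywt.wavedec2(img, wavelet=wavelet, mode="periodization", level=levels)
    approx = coeffs[0]        # A at the coarsest resolution 2^-J
    details = coeffs[1:]      # [(D1_J, D2_J, D3_J), ..., (D1_1, D2_1, D3_1)]
    return approx, details

if __name__ == "__main__":
    img = np.random.rand(48, 48)             # stand-in for a 48x48 character image
    approx, details = decompose_character(img)
    print("A:", approx.shape)                # (6, 6)
    for level_details in details:
        print([d.shape for d in level_details])
```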

2.2 Wavelet Features for One Subimage

The two-dimensional wavelet decomposition of a discrete image is shown in (1). After the wavelet decomposition, we get $3J+1$ subimages from the original image.

For each subimage of size $N \times N$, we divide it into $M \times M$ nonoverlapping blocks, where $N$ can be divided exactly by $M$. Let $K = N/M$, so that each block has size $K \times K$. We expand each block to size $(K+2) \times (K+2)$ for stability, keeping the center fixed. That is to say, after expansion each block has two rows (or columns) overlapping with the adjacent block. To ensure that the outermost blocks also have size $(K+2) \times (K+2)$ after expansion, we replicate the image's outermost pixels as shown in Fig. 3, which illustrates the expansion of an image of size $8 \times 8$. The dashed outlines in the figure denote the newly added pixels.


Fig. 1. Characters in a sentence are printed in different fonts.

Fig. 2. The biorthogonal spline2 wavelet.


We sample each block as follows to get one feature:

$z = \sum_{(x,y) \in B} |f(x,y)| \cdot w(x,y),$   (2)

where $B$ denotes a block and $f(x,y)$ denotes the pixel value at point $(x,y)$, i.e., the wavelet coefficient at point $(x,y)$. The weighting function $w(x,y)$ is

$w(x,y) = \alpha \cdot \exp\left(-\beta\left((x - x_{\mathrm{center}})^2 + (y - y_{\mathrm{center}})^2\right)\right),$   (3)

where $x_{\mathrm{center}}$ and $y_{\mathrm{center}}$ denote the coordinates of the center point of block $B$, while $\alpha$ and $\beta$ are constants determined by the size of the block.

Finally, we obtain $M \times M$ features from one subimage, one feature per block.
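A minimal sketch of this block-feature computation follows: each subimage is divided into M × M blocks, each block is expanded by one pixel on every side with edge replication, and a Gaussian weight centered on the block is applied as in (2) and (3). The specific values of alpha and beta below are placeholders, since the paper only states that they depend on the block size.

```python
# Sketch of the block features of (2) and (3); alpha and beta are placeholder
# values, as the paper only says they are decided by the block size.
import numpy as np

def block_features(sub, M, alpha=1.0, beta=0.1):
    """One weighted feature per block of an N x N subimage divided into M x M blocks."""
    N = sub.shape[0]
    assert N % M == 0, "N must be divisible by M"
    K = N // M
    padded = np.pad(sub, 1, mode="edge")      # replicate outermost pixels (Fig. 3)
    feats = np.empty((M, M))
    ys, xs = np.mgrid[0:K + 2, 0:K + 2]
    xc = yc = (K + 1) / 2.0                   # center of the expanded (K+2)x(K+2) block
    w = alpha * np.exp(-beta * ((xs - xc) ** 2 + (ys - yc) ** 2))   # equation (3)
    for bi in range(M):
        for bj in range(M):
            block = padded[bi * K: bi * K + K + 2, bj * K: bj * K + K + 2]
            feats[bi, bj] = np.sum(np.abs(block) * w)               # equation (2)
    return feats.ravel()                      # M*M features for this subimage
```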

2.3 Wavelet Features

The wavelet transform can be used to analyze the character image at different resolutions. Subimages at different resolutions reflect the information of strokes with different widths; blocks of different sizes within one subimage reflect the information of substructures with different dimensions. To obtain effective features for font recognition, we extract features from subimages at different resolutions and utilize blocks of different sizes within each subimage.

The two-dimensional wavelet decomposition of a discrete image is shown in (1). If the original image has a size of $n \times n$, then the subimage $A_{2^{-J}}f$ has a size of $(2^{-J} \cdot n) \times (2^{-J} \cdot n)$ and the subimages $D^{(1)}_{2^{-j}}f$, $D^{(2)}_{2^{-j}}f$, and $D^{(3)}_{2^{-j}}f$ have a size of $(2^{-j} \cdot n) \times (2^{-j} \cdot n)$.

We employ a 3-level wavelet transform on the normalized image of a single Chinese character; the wavelet basis we use is the Spline2 wavelet [7]. That is to say, the $J$ in (1) is 3. After the wavelet transform, we have 10 subimages: $A_{2^{-3}}f$ and $\{D^{(1)}_{2^{-j}}f, D^{(2)}_{2^{-j}}f, D^{(3)}_{2^{-j}}f\}_{j=1,2,3}$. In our method, the image of a single Chinese character is normalized to a $48 \times 48$ pixel matrix. Hence, the subimages $A_{2^{-3}}f$, $D^{(1)}_{2^{-3}}f$, $D^{(2)}_{2^{-3}}f$, and $D^{(3)}_{2^{-3}}f$ have a size of $6 \times 6$; $D^{(1)}_{2^{-2}}f$, $D^{(2)}_{2^{-2}}f$, and $D^{(3)}_{2^{-2}}f$ have a size of $12 \times 12$; and $D^{(1)}_{2^{-1}}f$, $D^{(2)}_{2^{-1}}f$, and $D^{(3)}_{2^{-1}}f$ have a size of $24 \times 24$. For each subimage, we extract its features as introduced in Section 2.2. We combine all of the features extracted from these 10 subimages to obtain the complete set of wavelet features for font recognition.

Subimages $D^{(1)}_{2^{-1}}f$, $D^{(2)}_{2^{-1}}f$, and $D^{(3)}_{2^{-1}}f$ have a size of $24 \times 24$, so the $N$ in Section 2.2 is 24. Let $M$ be 4, 6, and 8, with the corresponding $K$ equal to 6, 4, and 3 ($K = N/M$, see Section 2.2 for details), respectively. For each of these three subimages, we get $4 \times 4 = 16$ features when $M$ is 4, $6 \times 6 = 36$ features when $M$ is 6, and $8 \times 8 = 64$ features when $M$ is 8, respectively. We thus get $16 + 36 + 64 = 116$ features from one subimage and altogether $116 \times 3 = 348$ features from these three subimages.

A similar process is employed with subimages $D^{(1)}_{2^{-2}}f$, $D^{(2)}_{2^{-2}}f$, and $D^{(3)}_{2^{-2}}f$. These three subimages have a size of $12 \times 12$, so the $N$ in Section 2.2 is 12. Let $M$ be 3, 4, and 6, with the corresponding $K$ equal to 4, 3, and 2. For each of these three subimages, we get $3 \times 3 = 9$ features when $M$ is 3, $4 \times 4 = 16$ features when $M$ is 4, and $6 \times 6 = 36$ features when $M$ is 6, respectively. We get $9 + 16 + 36 = 61$ features from one subimage and altogether $61 \times 3 = 183$ features from these three subimages.

Subimages $D^{(1)}_{2^{-3}}f$, $D^{(2)}_{2^{-3}}f$, and $D^{(3)}_{2^{-3}}f$ have a size of $6 \times 6$. Let $M$ be 3 and the corresponding $K$ be 2. For each subimage, we get $3 \times 3 = 9$ features. In addition, we employ the absolute value of each pixel as one feature; since each subimage has a size of $6 \times 6$, this yields 36 features. So, we get $9 + 36 = 45$ features from one subimage and a total of $45 \times 3 = 135$ features from these three subimages.

For subimage $A_{2^{-3}}f$, we employ the absolute value of each pixel as one feature and get 36 features.

We combine all the features extracted as above and obtain an $n = 702$ dimension feature vector ($348 + 183 + 135 + 36 = 702$). These features contain not only information about fonts, but also information about characters. Since the latter is dominant in character images, it severely impacts the extraction of font features. Therefore, a further discriminative feature extraction method should be employed to suppress the character information. Here, we utilize an LDA process, which is detailed in the next section.
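The 702-dimension total can be checked directly from the block configuration just described; the short sketch below enumerates the combinations of subimage count, M values, and extra per-pixel features stated in this section and sums the contributions.

```python
# Sketch: verify the 702-dimension feature count of Section 2.3.
# Each entry: (number of subimages, list of M values, extra per-pixel features).
config = [
    (3, [4, 6, 8], 0),    # D1, D2, D3 at resolution 2^-1 (24 x 24)
    (3, [3, 4, 6], 0),    # D1, D2, D3 at resolution 2^-2 (12 x 12)
    (3, [3],       36),   # D1, D2, D3 at resolution 2^-3 (6 x 6), plus |pixel| features
    (1, [],        36),   # approximation A at 2^-3 (6 x 6), |pixel| features only
]

total = sum(count * (sum(M * M for M in Ms) + extra) for count, Ms, extra in config)
print(total)              # 348 + 183 + 135 + 36 = 702
```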

Since the LDA process and the MQDF classifier require features with a Gaussian distribution, we employ a Box-Cox transformation [8] to bring the distributions of the features closer to Gaussian. The Box-Cox transformation is a parametric power transformation technique designed to reduce anomalies such as nonadditivity, nonnormality, and heteroscedasticity. The formula is as follows:

$y = \begin{cases} \dfrac{x^{\lambda} - 1}{\lambda}, & \text{if } \lambda \neq 0 \\ \ln(x), & \text{if } \lambda = 0. \end{cases}$   (4)

Each dimension of the features is transformed by (4) separately. After transformation, the features are called wavelet features.
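A sketch of applying (4) to each feature dimension is given below. Using scipy.stats.boxcox, which fits the parameter by maximum likelihood, is our assumption, since the paper does not state how the per-dimension parameter is chosen; the shift that keeps the inputs strictly positive is likewise ours.

```python
# Sketch: per-dimension Box-Cox transform of a feature matrix, under the
# assumption that lambda is fit by maximum likelihood (the paper does not say).
import numpy as np
from scipy import stats

def boxcox_per_dimension(X, eps=1e-6):
    """X: (num_samples, num_dims). Returns the transformed features and the
    fitted lambda for each dimension."""
    Y = np.empty_like(X, dtype=float)
    lambdas = np.empty(X.shape[1])
    for d in range(X.shape[1]):
        col = X[:, d].astype(float)
        shift = eps - col.min() if col.min() <= 0 else 0.0   # Box-Cox needs x > 0
        Y[:, d], lambdas[d] = stats.boxcox(col + shift)
    return Y, lambdas
```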

2.4 Font Discrimination Feature Extraction from Wavelet Features

The wavelet features contain not only information about fonts, but also information about characters. Inter-character differences are so great that they may overwhelm the features which reflect the distinctions between fonts, severely impacting the accuracy of font recognition. Therefore, some further font-discriminative feature extraction method should be considered. We employ an LDA (Linear Discriminant Analysis) process [9] to suppress character-specific information and enhance that which is font-specific.

In PCA (Principal Component Analysis), the most important PCA components are the character components with the biggest variation. Compared to PCA, the most important LDA components are the components with the maximum value of $|S_w^{-1}(S_w + S_b)|$, where $S_w$ and $S_b$ denote the within-class covariance matrix of fonts and the between-class covariance matrix of fonts, respectively. This LDA process concerns the font differences, driving the within-class variation of fonts to be as small as possible and the between-class variation of fonts to be as large as possible. Such LDA components are beneficial to the extraction of discriminative features for font recognition.

Fig. 3. Expansion of the outermost pixels.

The LDA components are obtained as follows: Let $\{\{V^{(j)}_i,\ 1 \le i \le N_j\},\ 1 \le j \le C\}$ be the set of wavelet feature vectors, where $V^{(j)}_i$ denotes the wavelet feature vector extracted from the $i$th sample belonging to the $j$th class, $N_j$ denotes the number of samples belonging to the $j$th class, and $C$ denotes the number of classes. Assuming all classes have equal prior probabilities, we calculate the mean vector of each class $\mu_j$ and the mean vector of all classes $\mu$ as follows:

$\mu_j = \frac{1}{N_j} \sum_{i=1}^{N_j} V^{(j)}_i,$   (5)

$\mu = \frac{1}{C} \sum_{j=1}^{C} \mu_j.$   (6)

Then, we compute $S_w$ and $S_b$, the within-class covariance matrix of fonts and the between-class covariance matrix of fonts, respectively:

$S_w = \frac{1}{C} \sum_{j=1}^{C} \left( \frac{1}{N_j} \sum_{i=1}^{N_j} \left(V^{(j)}_i - \mu_j\right)\left(V^{(j)}_i - \mu_j\right)^T \right),$   (7)

$S_b = \frac{1}{C} \sum_{j=1}^{C} (\mu_j - \mu)(\mu_j - \mu)^T,$   (8)

where $C$, $N_j$, $V^{(j)}_i$, $\mu_j$, and $\mu$ have the same meanings as in (5) and (6); $S_w$ denotes the average within-class covariance matrix of fonts, and $S_b$ denotes the covariance of the $\mu_j$.

Since the optimization criterion in LDA maximizes $|S_w^{-1}(S_w + S_b)|$, we compute the eigenvalue decomposition of the matrix $S_w^{-1}(S_w + S_b)$ and obtain the eigenvalues $\{\lambda_i,\ i = 1, 2, \ldots, n\}$ ranked in descending order and the corresponding eigenvectors $\{\phi_i,\ i = 1, 2, \ldots, n\}$. The transformation matrix $W$ is composed of the first $m$ eigenvectors, i.e., $W = [\phi_1, \phi_2, \ldots, \phi_m]$, where $m$ is a positive integer smaller than $n$. The feature extraction process is

$Y^{(j)}_i = W^T \cdot V^{(j)}_i,$   (9)

where $V^{(j)}_i$ is an $n$-dimension wavelet feature vector and $Y^{(j)}_i$ is the extracted $m$-dimension feature vector. After the LDA process, an $m$-dimension feature vector is extracted from the $n$-dimension feature vector. Parameter $m$ is determined by experiment, as detailed in the Appendix.
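The projection of (5)-(9) can be sketched as follows: S_w and S_b are estimated with equal class priors, the eigenvectors of S_w^{-1}(S_w + S_b) are sorted by descending eigenvalue, and the first m of them form W. The small ridge added before inversion is our own numerical safeguard, not part of the paper.

```python
# Sketch of the LDA transform of Section 2.4. The ridge term 'reg' is our
# numerical safeguard for inverting Sw; it is not described in the paper.
import numpy as np

def lda_transform(X, labels, m, reg=1e-6):
    """X: (num_samples, n) wavelet features, labels: class index per sample.
    Returns W (n x m) so that Y = X @ W realizes (9)."""
    classes = np.unique(labels)
    n = X.shape[1]
    mu_j = np.array([X[labels == c].mean(axis=0) for c in classes])      # (5)
    mu = mu_j.mean(axis=0)                                               # (6)

    Sw = np.zeros((n, n))
    for c, mj in zip(classes, mu_j):
        D = X[labels == c] - mj
        Sw += (D.T @ D) / len(D)                                         # (7)
    Sw /= len(classes)

    diff = mu_j - mu
    Sb = (diff.T @ diff) / len(classes)                                  # (8)

    # Eigen-decomposition of Sw^{-1}(Sw + Sb), keeping the top-m eigenvectors.
    M = np.linalg.solve(Sw + reg * np.eye(n), Sw + Sb)
    eigvals, eigvecs = np.linalg.eig(M)
    order = np.argsort(eigvals.real)[::-1]
    return eigvecs[:, order[:m]].real

# Usage: W = lda_transform(X, labels, m=672); Y = X @ W
```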

3 FONT CLASSIFIER

An MQDF (modified quadratic discriminant function) classifier [10] with only one prototype for each font class is employed to classify the font of an unknown single Chinese character. First, we introduce the QDF (quadratic discriminant function) classifier. The QDF classifier is the optimal classifier for Gaussian distributed features, and the discriminant function for the $j$th class is as follows:

$g_j(y) = \sum_{r=1}^{m} \frac{\left((y - \mu_j)^T \phi^{(j)}_r\right)^2}{\lambda^{(j)}_r} + \sum_{r=1}^{m} \log \lambda^{(j)}_r,$   (10)

where $y$ is the feature vector to be classified, $m$ is the dimension of the feature vector, $\lambda^{(j)}_r$ denotes the $r$th eigenvalue of the $j$th class' covariance matrix sorted in descending order, and $\phi^{(j)}_r$ denotes the eigenvector corresponding to $\lambda^{(j)}_r$. $\mu_j$ denotes the mean vector of the features extracted by LDA from the original wavelet feature vectors of all the samples in the $j$th class:

$\mu_j = \frac{1}{N_j} \sum_{i=1}^{N_j} Y^{(j)}_i, \quad j = 1, 2, \ldots, C,$

where $N_j$ and $Y^{(j)}_i$ have the same meanings as in Section 2.4. Thus, the $j$th font's unique prototype $\mu_j$ is trained from many different characters of the $j$th font (usually about 3,755 different Chinese characters). Accordingly, there is only one prototype $\mu_j$ for the $j$th font in the template, yielding content-independent font recognition. The QDF classifier uses the following decision rule:

$C(y) = C_j \quad \text{if } g_j(y) = \min_{1 \le j \le N} g_j(y),$   (11)

where $C(\cdot)$ denotes a classification operation, $C_i$ the $i$th class, $g_j(y)$ the discriminant function of $C_j$, and $N$ is the number of classes.

In real-world applications, the parameters of the Gaussian distributions are unknown, and maximum likelihood estimates are used as the parameters of the QDF classifier. Estimation errors in the parameters will lower the performance of the QDF classifier and make it nonoptimal. To overcome this problem, we instead employ an MQDF classifier, which is less sensitive to estimation errors [10], to classify the font. The discriminant function of an MQDF classifier is

$g_j(y) = \sum_{r=1}^{k} \frac{\left((y - \mu_j)^T \phi^{(j)}_r\right)^2}{\lambda^{(j)}_r} + \sum_{r=k+1}^{m} \frac{\left((y - \mu_j)^T \phi^{(j)}_r\right)^2}{\bar{\lambda}} + \sum_{r=1}^{k} \log \lambda^{(j)}_r + \sum_{r=k+1}^{m} \log \bar{\lambda},$   (12)

where $y$, $m$, $\mu_j$, $\lambda^{(j)}_r$, and $\phi^{(j)}_r$ have the same meanings as in (10), $k$ is a positive integer smaller than $m$, and $\bar{\lambda}$ is a constant. In our experiment, $\bar{\lambda}$ is calculated as follows:

$\bar{\lambda} = \frac{1}{C} \sum_{j=1}^{C} \lambda^{(j)}_{k+1},$   (13)

where $C$ is the number of font classes and $\lambda^{(j)}_{k+1}$ denotes the $(k+1)$th eigenvalue of the $j$th class' covariance matrix ranked in descending order.

The decision rule of an MQDF classifier is the same as that of a QDF classifier, shown in (11). Parameter $k$ is determined by experiment, as detailed in the Appendix.
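A compact sketch of the one-prototype-per-font MQDF classifier of (12)-(13) follows: each class keeps its LDA-feature mean, the eigenvalues and eigenvectors of its covariance matrix, and the shared bias is the average (k+1)th eigenvalue as in (13). The class and method names are ours, and the sketch assumes enough samples per class for a well-conditioned covariance estimate.

```python
# Sketch of the MQDF classifier of (12) and (13) with one prototype per font.
import numpy as np

class MQDFClassifier:
    def fit(self, Y, labels, k):
        self.k, self.classes = k, np.unique(labels)
        self.means, self.eigvals, self.eigvecs = [], [], []
        for c in self.classes:
            Yc = Y[labels == c]                        # assumes many samples per class
            vals, vecs = np.linalg.eigh(np.cov(Yc, rowvar=False))
            order = np.argsort(vals)[::-1]             # descending eigenvalues
            self.means.append(Yc.mean(axis=0))         # the single font prototype
            self.eigvals.append(vals[order])
            self.eigvecs.append(vecs[:, order])
        # Shared bias: mean of the (k+1)th eigenvalues over classes, as in (13).
        self.bias = np.mean([ev[k] for ev in self.eigvals])
        return self

    def discriminant(self, y, j):                      # equation (12)
        mu, vals, vecs = self.means[j], self.eigvals[j], self.eigvecs[j]
        proj = (y - mu) @ vecs                         # (y - mu)^T phi_r for all r
        g = np.sum(proj[:self.k] ** 2 / vals[:self.k])
        g += np.sum(proj[self.k:] ** 2) / self.bias
        g += np.sum(np.log(vals[:self.k])) + (len(y) - self.k) * np.log(self.bias)
        return g

    def predict(self, y):                              # decision rule (11)
        scores = [self.discriminant(y, j) for j in range(len(self.classes))]
        return self.classes[int(np.argmin(scores))]
```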



4 EXPERIMENTS AND COMPARISON WITH EXISTING METHODS

4.1 Experiment Results on a Single Unknown Character

Our experiments are carried out on two independent databases: Database A and Database B. Database A is the same one used in [5], including the seven most popular Chinese typefaces (Song, Fang, Hei, Kai, Lishu, Weibei, and Yuan). There are 730 sets of characters in Database A, collected from everyday documents, with each set including 3,755 different Chinese characters (the Chinese GB-1 character set, so as to test the character-independent capability). Six hundred thirty-five of the sets are used for training and the remaining 95 sets are used for testing. Database B includes seven Chinese typefaces (the same as Database A) combined with four font styles (normal, bold, italic, and bold-italic). Each font has 3,755 different Chinese characters (the Chinese GB-1 character set). The first 3,000 characters are used for training and the remaining 755 for testing. We note that, although our databases consist of all the characters in Chinese GB-1, the algorithm needs only one prototype for each font class, which is an average over all the characters in the class. These details were presented in Section 3.

Samples in Database A are scanned from real-world documents under different scanner settings, while samples in Database B are generated directly by computer. There are four differences between the two databases. First, Database A has many more samples than Database B: there are 2,741,150 samples in Database A, while Database B has only 105,140 samples. Second, samples in Database A have much poorer quality than those in Database B. Database A includes both broken and touching samples, as shown in Fig. 4 and Fig. 5. In addition, characters in the same font in Database A have variable stroke widths, as shown in Fig. 6. Third, there are different subtypefaces for a particular typeface in Database A, while Database B does not suffer from this problem. Fig. 7 shows some subtypefaces that all belong to typeface Lishu and should be classified as the same class. Moreover, some subtypefaces look more like those of another typeface rather than of the true typeface, as shown in Fig. 8. Finally, samples in Database A are all normal style samples, while samples in Database B have four font styles (normal, bold, italic, and bold-italic). There are seven font classes (seven typefaces) in Database A and 28 font classes (seven typefaces combined with four styles) in Database B.

In Database B, the normal style samples are visually similar to the bold style samples, as shown in Fig. 9. A similar overlap also exists between italic style samples and bold-italic style samples.

The input of the font recognition system is the image of a single unknown character. As mentioned above, we employ a wavelet transform on the normalized character image and extract a 702-dimension feature vector from the transformed image.


Fig. 4. Some touching samples in Database A.

Fig. 5. Some broken samples in Database A.

Fig. 6. Variable stroke widths in one typeface in Database A. (a) Typeface Song with thin strokes. (b) Typeface Song with thick strokes.

Fig. 7. Subtypefaces in Database A. (a) Subtypeface 1 belonging to typeface Lishu. (b) Subtypeface 2 belonging to typeface Lishu. (c) Subtypeface 3 belonging to typeface Lishu.

Fig. 8. Subtypefaces in Database A ((b) looks more similar to (c) instead of (a), even though (b) and (c) belong to different typefaces while (b) and (a) belong to the same typeface). (a) Subtypeface 1 belonging to typeface Yuan. (b) Subtypeface 2 belonging to typeface Yuan. (c) One subtypeface belonging to typeface Hei.

Fig. 9. Bold style samples and normal style samples in Database B. (a) Bold style samples. (b) Normal style samples.


The wavelet basis we used is a Spline2 wavelet. After the Box-Cox transform and LDA process, we have an m-dimension feature vector and employ an MQDF classifier to identify the font. For the experiment on Database A, we obtain a 672-dimension feature vector from the 702-dimension wavelet features after the LDA process. The k in the MQDF classifier (see (12) for details) is set to 448. Our experiment on Database A achieves a recognition rate of 91.29 percent; the typeface confusion matrix is shown in Table 1.

Another experiment was carried out on a subset of Database B with the same system. We use only the normal style samples in Database B; that is to say, only seven typefaces need to be classified. The typeface confusion matrix is shown in Table 2. (Each font has 3,755 different Chinese characters. The first 3,000 characters are used for training and the remaining 755 for testing.)

The result in Table 2 is much better than the result in Table 1. The reason lies in the high quality of the samples in Database B.

Adjusting the system parameters for the experiment on Database B, we obtain a 400-dimension feature vector from the 702-dimension wavelet features after the LDA process. The k in the MQDF classifier (see (12) for details) is set to 320. Our experiment on Database B achieves a recognition rate of 90.28 percent, as shown in Table 3.

Table 4 shows the recognition results for font attributes on Database B, i.e., the typeface, weight, and slope. The recognition rate for "font" treats the font as a whole, where any attribute misclassification leads to a recognition error. The recognition rate for "weight" corresponds to the discriminating power between normal/italic style and bold/bold-italic style. The recognition rate for "slope" corresponds to the discriminating power between normal/bold style and italic/bold-italic style.
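To make this attribute-level scoring concrete, the sketch below scores a few hypothetical (typeface, weight, slope) predictions against ground truth: "font" is counted correct only when all three attributes match, while "typeface", "weight", and "slope" are each scored on their own attribute. The labels are illustrative only.

```python
# Sketch: attribute-level recognition rates as used in Table 4. The sample
# labels below are hypothetical and for illustration only.
def attribute_rates(truth, pred):
    """truth/pred: lists of (typeface, weight, slope) tuples."""
    n = len(truth)
    return {
        "typeface": sum(t[0] == p[0] for t, p in zip(truth, pred)) / n,
        "weight":   sum(t[1] == p[1] for t, p in zip(truth, pred)) / n,
        "slope":    sum(t[2] == p[2] for t, p in zip(truth, pred)) / n,
        "font":     sum(t == p for t, p in zip(truth, pred)) / n,   # all attributes
    }

truth = [("Song", "normal", "upright"), ("Hei", "bold", "italic"),
         ("Kai", "normal", "upright"), ("Yuan", "bold", "upright")]
pred  = [("Song", "normal", "upright"), ("Hei", "bold", "italic"),
         ("Kai", "bold", "upright"), ("Yuan", "bold", "upright")]
print(attribute_rates(truth, pred))   # one weight error: font 0.75, weight 0.75
```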

In Table 4, we can see that the average recognition rate for weight is 92.27 percent, much lower than the average recognition rates for typeface and slope. An important reason lies in the small differences between normal/italic style samples and bold/bold-italic style samples in Database B, as shown in Fig. 9. Since the recognition rate for weight is only 92.27 percent, the average recognition rate for font as a whole is reduced to only 90.28 percent. It should be pointed out that our method is a text-independent approach based on a single character; the recognition rate can be improved by using multiple characters.

4.2 Experiment Results on Multiple Characters

If more than one character can be employed for font recognition, a much better result can be achieved. If, for example, N characters from the same font are used, we can recognize the font for each individual character first, then combine these results by a ranked majority voting method [11] (we use a modified Borda count method) to get the final result. For font F, we calculate the score as follows:

$\mathrm{Score}(F) = \sum_{n=1}^{N} S_F(\mathrm{Char}_n),$   (14)

where $N$ indicates the number of characters used for font recognition, $S_F(\mathrm{Char}_n)$ indicates the score contributed by the $n$th character for font $F$, and $\mathrm{Score}(F)$ indicates the total score for font $F$. If font $F$ is in first, second, or third place in the font candidate table of the $n$th character, $S_F(\mathrm{Char}_n)$ takes the value 3, 2, or 1, respectively. If font $F$ is in fourth place or lower in the candidate table, $S_F(\mathrm{Char}_n)$ takes the value 0. We take the font with the highest score computed by (14) as the final result of font recognition, as shown in (15).


TABLE 1. Result for Font Recognition on Database A

TABLE 2. Results for Font Recognition on a Subset of Database B

TABLE 3. Results for Font Recognition on Database B

TABLE 4. Recognition Results for Font Attributes on Database B


$C(\mathrm{font}) = F_i \quad \text{if } \mathrm{Score}(F_i) = \max_{1 \le j \le C} \mathrm{Score}(F_j),$   (15)

where $C(\cdot)$ denotes a classification operation, $F_i$ denotes the $i$th font class, $\mathrm{Score}(F_i)$ denotes the score for font $F_i$, and $C$ is the number of font classes.

Font recognition results using more than one character are shown in Table 5. A more intuitive graph is shown in Fig. 10.
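The modified Borda count of (14)-(15) can be sketched as below; candidate_tables is an assumed data structure holding, for each character, the font classes ordered from best to worst by the single-character classifier, and only the top three places score.

```python
# Sketch of the ranked voting of (14) and (15). 'candidate_tables' is an
# assumed data structure: one ranked list of font labels per character.
from collections import defaultdict

def vote_font(candidate_tables):
    scores = defaultdict(int)
    for ranking in candidate_tables:
        for place, font in enumerate(ranking[:3]):
            scores[font] += 3 - place            # first/second/third scores 3/2/1
    return max(scores, key=scores.get)           # decision rule (15)

# Example with five characters and hypothetical candidate rankings:
print(vote_font([["Song", "Hei", "Kai"],
                 ["Song", "Kai", "Hei"],
                 ["Hei", "Song", "Kai"],
                 ["Song", "Hei", "Kai"],
                 ["Kai", "Song", "Hei"]]))       # -> Song
```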

4.3 Performances on Noisy Images

In real-world applications, the robustness of an algorithm is a key factor. Our method has obtained a recognition rate of 98.20 percent on good quality samples (Database B) and a recognition rate of 91.29 percent on real-world samples (Database A). This shows that our method is effective on both good quality samples and real-world samples.

In addition, we have investigated the performance of our method under different noise levels through further experiments. For each sample from set A, we add white noise to simulate real-world contamination. Fig. 11 shows the degraded samples used for testing robustness. The signal-to-noise ratio (SNR) is defined as

$\mathrm{SNR} = 10 \log \frac{\sum_{m,n} I_{m,n}^2}{\sum_{m,n} \left(I_{m,n} - \hat{I}_{m,n}\right)^2},$

where $I_{m,n}$ and $\hat{I}_{m,n}$ represent the original and the noisy image, respectively.
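The degradation experiment can be outlined as follows: zero-mean Gaussian white noise is added to an image and the SNR of the result is computed from the definition above. The Gaussian noise model, the base-10 logarithm, and the example noise level are our assumptions; the paper only states that white noise is added.

```python
# Sketch: add white noise to a character image and measure the SNR as defined
# above. The Gaussian noise model and base-10 log are assumptions.
import numpy as np

def add_white_noise(img, sigma, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    return img + rng.normal(0.0, sigma, size=img.shape)

def snr_db(original, noisy):
    original = original.astype(float)
    signal = np.sum(original ** 2)
    noise = np.sum((original - noisy) ** 2)
    return 10.0 * np.log10(signal / noise)

img = (np.random.rand(48, 48) > 0.7).astype(float)   # stand-in binary character image
noisy = add_white_noise(img, sigma=0.1)
print(round(snr_db(img, noisy), 1))
```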

Table 6 and Fig. 12 show that, at noise level SNR = 14, the algorithm can still perform effectively when only a single unknown character is involved, with an accuracy of 78.5 percent, and performs much better when five unknown characters are involved, with an accuracy of 90.5 percent. It is understandable that our method achieves a much higher accuracy when more characters are used. This shows that our method is capable of achieving high accuracy on contaminated images by employing more characters, as long as the performance on a single character does not drop too severely.

4.4 Performances on Different Characters

This section addresses the relationship between the performance of font recognition and the complexity of characters in our method. Since font information is embedded in the stroke substructures, using characters of various complexities, i.e., ones consisting of different numbers of strokes, may result in different accuracies.


TABLE 5. Results Using More Than One Character on Database B (over 28 Font Classes)

Fig. 10. Results using more than one character on Database B (over 28 font classes).

Fig. 11. Single character images at different noise levels. (a) SNR = 22. (b) SNR = 17. (c) SNR = 14. (d) SNR = 12. (e) SNR = 10.

TABLE 6. Some Results under Different Noise Levels

Fig. 12. Result under different noise levels.

Fig. 13. A Chinese character.

TABLE 7. The Relationship between the Accuracy (%) and the Complexity, Tested on Database B Samples


The complexity of a character is represented by its horizontal and vertical complexity, defined as the maximum number of horizontal and vertical transitions in its binary image, respectively. For instance, the character in Fig. 13 has a horizontal complexity of 6 and a vertical complexity of 4.
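The complexity measure can be computed directly from the binary image, as in the sketch below, which takes the maximum number of background-to-stroke transitions over all rows (horizontal complexity) and over all columns (vertical complexity). Counting each 0-to-1 change as one transition is our reading of the definition.

```python
# Sketch: horizontal/vertical complexity as the maximum number of 0 -> 1
# (background to stroke) transitions over rows / columns; this counting rule
# is our interpretation of the definition above.
import numpy as np

def complexity(binary_img):
    img = (np.asarray(binary_img) > 0).astype(int)
    row_trans = np.sum(np.diff(img, axis=1) == 1, axis=1)   # transitions within each row
    col_trans = np.sum(np.diff(img, axis=0) == 1, axis=0)   # transitions within each column
    return int(row_trans.max()), int(col_trans.max())       # (horizontal, vertical)

# Example: three horizontal bars -> each row crosses at most one stroke,
# each column crosses three strokes.
bars = np.zeros((7, 9), dtype=int)
bars[[1, 3, 5], 1:8] = 1
print(complexity(bars))   # -> (1, 3)
```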

We tested our method on all the normal samples in Database B; the result is given in Table 7. It shows that our method can achieve a stable accuracy of approximately 99 percent from a single character when both the horizontal and vertical complexity of the image are above six. It also indicates that characters with different complexities will result in different accuracies. Generally, a rise in either horizontal or vertical complexity brings about an increase in accuracy. Further, Table 7 implies that our method is capable of recognizing the font of a segment of a character, as long as the image contains sufficient strokes.

Additionally, we tested our method on some segments of characters. These segments are obtained by splitting the characters naturally according to the gaps among the components of the character, to simulate segmentation errors. Because our algorithm was not developed to work at the segment level, we verified the effectiveness of the algorithm through several segment examples rather than a comprehensive test. Table 8 shows some examples of segments and the experimental results. The capability of recognizing the font of an isolated segment not only illustrates the robustness of our method in the presence of segmentation errors, but also further demonstrates its advantage of being independent of the character identity.

4.5 Comparison with Existing Methods

There are many papers that discuss the font recognition problem, but few of them are based on a single unknown character. Our previous paper [5] used stroke property features and stroke distribution features to recognize the font and achieved a recognition rate of 69.7 percent. Our new method achieves a recognition rate of 91.29 percent on the same database (Database A). The high recognition rate shows the effectiveness of our method.

Zhu and Tan used a group of Gabor filters to extract texture features for font recognition and employed a weighted Euclidean distance classifier to recognize the font [4]. We compare our method with theirs because their method is a representative font recognition method for Chinese characters, although their method is based on a block of text (a uniform text block identified through preprocessing) rather than a single unknown Chinese character.

We test their method on multiple characters with the samples in set B. Since their method requires a block of text consisting of several characters, we select $N \times N$ samples in set B randomly and assemble them into a block as the input sample. Fig. 14 gives some examples.

Table 9 shows that Zhu's method can get a fairly good result when only seven classes are to be identified. According to Section 4.1 and Section 4.2, our method can achieve an average accuracy of up to 98.2 percent with only one character. When 25 characters are involved, our method can achieve nearly 100 percent accuracy, which is much higher than Zhu's 96.34 percent.

When the situation gets more complex because there are 28 classes to be identified, the performance of Zhu's method drops significantly. Table 10 shows the result.

Fig. 15 shows the experiment results on 28 font classes (seven Chinese typefaces combined with four styles). There are 500 samples for training and another 500 for testing for each class. According to the results, our algorithm is better at font recognition, especially with few characters, and achieves a much better result, up to 100 percent, when more characters are involved. Viewed another way, our method can achieve performance similar to theirs using far fewer characters: when the recognition accuracy reaches 90 percent, our method requires only one character while theirs needs 25.

TABLE 8. Some Examples of Segments and Experimental Results (seven fonts taken into account, one sample for each font)

Fig. 14. Samples used to test the algorithm based on blocks of text. (a) 4 × 4 bold style of Fangsong. (b) 5 × 5 italic style of Hei. (c) 8 × 8 normal style of Kai.

TABLE 9. Results for Zhu's Method in Set B (Seven Classes) with 25 Characters

TABLE 10. Results for Zhu's Method in Set B (28 Classes) with 25 Characters

5 CONCLUSION

Since character information is dominant in the image of a character, differences between fonts are submerged by the differences between characters. As a result, it is a great challenge to accurately recognize the font of a single unknown character in a large-scale Chinese character set. In this paper, we have explored this problem and presented a new algorithm for font recognition on a single unknown Chinese character. Our method provides a higher recognition rate compared to existing methods and much greater flexibility, since it is based on a single unknown character while existing methods require a text block. We first employ a wavelet transform on the character image and extract wavelet features from the transformed image. After a Box-Cox transformation and LDA process, the discriminating features for font recognition are extracted. An MQDF classifier with only one prototype for each font class is employed to recognize the font and achieves satisfactory results. Experiments show that our method is flexible, robust, and effective. Additionally, our method demonstrates that it is possible to extract a faint signal embedded in a noisy background. That is a significant result and is instructive for many other problems in the field of pattern recognition.

APPENDIX

THE DETERMINATION OF PARAMETERS m AND k

Parameters m and k are determined experimentally. For the algorithm to achieve its best performance, the values of m and k should be determined in concert, which would require changing the values of m and k simultaneously to identify the best combination. To simplify the procedure, we determine the values of m and k separately, although the result might only be locally optimal.

We determine m first. It is obvious that the accuracy would rise as m increases if there were infinitely many samples that were normally distributed. Unfortunately, these two conditions are so idealized that they cannot be met in the real world. Therefore, the accuracy does not rise continuously once m increases beyond a threshold. Hence, we choose the value of m that makes the accuracy reach its experimental maximum. We set the value of k to m - 32, change m, and test the algorithm with set A. Table 11 shows the result. According to Fig. 16, the algorithm achieves its best performance when m = 672.

Next, we determine the value of k. Similarly, the accuracy does not keep rising as k increases, because of the constraints of the idealized model and the finite number of samples. So, we let m = 672 and test the relationship between k and accuracy on set A. Table 12 and Fig. 17 show the result; k = 448 is the point at which the accuracy reaches its peak value of 90.85 percent.

TABLE 11. The Relationship between m and Accuracy

Fig. 16. The relationship between m and accuracy.

TABLE 12. The Relationship between k and Accuracy

Fig. 17. The relationship between k and accuracy.

Fig. 15. Comparison between our method and Zhu's (28 classes).

REFERENCES

[1] H. Shi and T. Pavlidis, "Font Recognition and Contextual Processing for More Accurate Text Recognition," Proc. Fourth Int'l Conf. Document Analysis and Recognition, pp. 39-44, Aug. 1997.

[2] A. Zramdini and R. Ingold, "Optical Font Recognition Using Typographical Features," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 20, no. 8, pp. 877-882, Aug. 1998.

[3] M.-C. Jung, Y.-C. Shin, and S.N. Srihari, "Multifont Classification Using Typographical Attributes," Proc. Sixth Int'l Conf. Document Analysis and Recognition, pp. 353-356, 1999.

[4] Y. Zhu and T. Tan, "Font Recognition Based on Global Texture Analysis," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 23, no. 10, pp. 1192-1200, Oct. 2001.

[5] L. Chen and X. Ding, "Optical Font Recognition of Single Chinese Character," Proc. IS&T/SPIE Document Recognition and Retrieval X, Electronic Imaging, Jan. 2003.

[6] S. Mallat, "A Theory for Multiresolution Signal Decomposition: The Wavelet Representation," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 11, no. 7, pp. 674-693, July 1989.

[7] K.R. Castleman, Digital Image Processing. Prentice Hall, 1996.

[8] R.M. Sakia, "The Box-Cox Transformation Technique: A Review," The Statistician, vol. 41, pp. 169-178, 1992.

[9] K. Fukunaga, Introduction to Statistical Pattern Recognition, second ed. Academic Press, 1990.

[10] F. Kimura et al., "Modified Quadratic Discriminant Functions and the Application to Chinese Character Recognition," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 9, no. 1, pp. 149-153, Jan. 1987.

[11] A.F.R. Rahman, H. Alam, and M.C. Fairhurst, "Multiple Classifier Combination for Character Recognition: Revisiting the Majority Voting System and Its Variations," Proc. Fifth Int'l Workshop Document Analysis Systems, 2002.

[12] L. Chen and X. Ding, "A Universal Method for Single Character Type Recognition," Proc. 17th Int'l Conf. Pattern Recognition, vol. 1, pp. 413-416, Aug. 2004.

Xiaoqing Ding graduated from the Department of Electronic Engineering, Tsinghua University, and won the Gold Medal for the best graduating student in 1962. Since then, she has been teaching at Tsinghua University, where she is now a principal professor and PhD supervisor. For many years, she has done research in image processing, pattern recognition, and computer vision, and has achieved notable results in various areas: character recognition, biometrics, and video surveillance. She has received many honors, including the National Scientific and Technical Progress Award three times. She has published more than 300 papers and is the coauthor of four books. She has served as a program committee member of many international conferences and as an editor of international journals. She is a member of the IEEE and a fellow of the IAPR.

Li Chen received the BE degree from Tsinghua University, China, in 1996, and the PhD degree in electronic engineering from Tsinghua University in 2003. He is now a WCDMA senior engineer at Huawei Corp., Shanghai, China. He is a member of the IEEE.

Tao Wu received the BS degree in electronics engineering from Tsinghua University, Beijing, People's Republic of China, in 2005. He joined the State Key Laboratory of Intelligent Technology and Systems of China in 2005, where he is currently a candidate for the master's degree. His research interests include computer vision, pattern recognition, machine learning, and image and video processing. He is a student member of the IEEE.

