

Markov Random Field-Based Statistical Character Structure Modeling for Handwritten Chinese Character Recognition

Jia Zeng, Member, IEEE, and Zhi-Qiang Liu

Abstract—This paper proposes a statistical-structural character modeling method based on Markov random fields (MRFs) for handwritten Chinese character recognition (HCCR). The stroke relationships of a Chinese character reflect its structure, which can be statistically represented by the neighborhood system and clique potentials within the MRF framework. Based on the prior knowledge of character structures, we design the neighborhood system that accounts for the most important stroke relationships. We penalize the structurally mismatched stroke relationships with MRFs using the prior clique potentials and derive the likelihood clique potentials from Gaussian mixture models, which encode the large variations of stroke relationships statistically. In the proposed HCCR system, we use the single-site likelihood clique potentials to extract many candidate strokes from character images and use the pair-site clique potentials to determine the best structural match between the input candidate strokes and the MRF-based character models by relaxation labeling. The experiments on the Korea Advanced Institute of Science and Technology (KAIST) character database demonstrate that MRFs can statistically model character structures, and work well in the HCCR system.

Index Terms—Markov random fields, handwritten Chinese character recognition, statistical-structural character modeling.


1 INTRODUCTION

The Chinese character structure is hierarchical: many straight-line strokes constitute independent radicals, which in turn constitute characters. According to Biederman’s [1] recognition-by-components (RBC) theory, the visual input is matched against the objects’ structural representations in the brain, which consist of primitive shapes and their interrelations. Character shapes can be represented by fragmental features (for example, strokes) and configurational features for relationships among fragmental features. The human visual system uses mostly configurational features rather than fragmental features to recognize characters during reading. Therefore, character structures play important roles in recognition, especially for characters very similar in shape. Because Chinese characters have hierarchical parts with complicated shape information, modeling character structures becomes one of the most challenging topics in pattern recognition [2].

In the past, the statistical and the structural methods have been two major strategies in modeling Chinese characters [3]. The first method is based on feature statistics for the holistic shape information, where standard statistical methodologies are used to recognize characters (for example, city block distance and Mahalanobis distance [4], k-nearest-neighborhood classifier [5], K-Means clustering and Gaussian distribution selector [6], contextual vector quantization [7], nonlinear active shape models [8], and invariant support vector machines [9]). The statistical method can efficiently build a large-vocabulary character recognition system because it has a systematic learning process from training samples. However, it reflects character structures only indirectly [2], [3] and, thus, has difficulty differentiating characters with similar shapes such as “ ” and “ .”

Inspired by Biederman’s [1] RBC theory, the second method represents fine details of character structures by a character model composed of many stroke models corresponding to real strokes. Character recognition proceeds by finding the best structural match between the input strokes and the stroke models. In contrast to the statistical method, the structural method extracts feature points and line segments from character images and represents their spatial relationships by a relational graph, in which a node denotes a feature point or line segment, and an edge between two nodes denotes their relationship (for example, constraint graph model [10], attributed relational graph [11], and hierarchical random graph [12]). Despite its excellent descriptive ability for fine details of character structures, two major problems remain to be solved. The first is the stroke extraction problem: because strokes are often ambiguous and degraded, it is unclear how to extract the stable ones for modeling their spatial relationships. This problem becomes much more difficult if the thinning preprocessing techniques cause junction distortions in character skeletons. The second problem is that the structural method usually depends heavily on the developer’s heuristic knowledge, so that it has neither a rigorous matching algorithm nor an automatic learning scheme from training samples [2]. Therefore, a hybrid statistical-structural method is needed for modeling character structures.

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 30, NO. 5, MAY 2008 767

• J. Zeng is with the Department of Electronic Engineering, City University of Hong Kong, Tat Chee Ave. 83, Kowloon Tong, Hong Kong, P.R. China. E-mail: [email protected].

• Z.-Q. Liu is with the School of Creative Media, City University of Hong Kong, Tat Chee Ave. 83, Kowloon Tong, Hong Kong, P.R. China. E-mail: [email protected].

Manuscript received 26 Sept. 2006; revised 7 Mar. 2007; accepted 11 June 2007; published online 28 June 2007. Recommended for acceptance by S.-C. Zhu. For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference IEEECS Log Number TPAMI-0684-0906. Digital Object Identifier no. 10.1109/TPAMI.2007.70734.

0162-8828/08/$25.00 © 2008 IEEE Published by the IEEE Computer Society


In this paper, Markov random fields (MRFs), with the Markov property on undirected graphs [13], fulfill the need to represent both statistical and structural information of characters within a unified framework. Their great success in pattern recognition, image processing, and computer vision over the past decades has been largely due to their ability to reflect the local statistical dependencies that exist universally in patterns, images, and video frames [14], [15], [16]. MRFs can model two-dimensional (2D) patterns in a statistical-structural manner. Within the MRF framework, statistical interactions at adjacent sites in a pattern or image are reflected by two fundamental concepts: the neighborhood system $\partial_i$ and the clique potentials $V_c$. The neighborhood system $\partial_i$ defines the set of neighbors of site $i$, provided that $i' \in \partial_i \Leftrightarrow i \in \partial_{i'}$ and $i \notin \partial_i$. A clique $c$ is a subset of sites that are all pairwise neighbors. To encourage or penalize various local interactions, we assign the costs $V_c$ to the cliques in the neighborhood system. According to the Hammersley-Clifford theorem [17], the joint probability distribution of the random variables at all sites in the MRF is a Gibbs distribution whose energy function is the sum of the clique potentials over all cliques. The desirable global configuration is reached by maximum a posteriori (MAP) estimation of the MRF, which is equivalent to minimizing the corresponding joint energy function of the Gibbs distribution.
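The two concepts above can be made concrete with a minimal Python sketch. It is illustrative only: the function names, the four-site example, and the toy potentials `v1` and `v2` are assumptions, not part of the paper. It checks the two neighborhood conditions, enumerates single-site and pair-site cliques, and sums clique potentials into a Gibbs energy.

```python
from itertools import combinations

def is_valid_neighborhood(nbrs):
    """True if no site is its own neighbor and the relation is symmetric."""
    for i, ns in nbrs.items():
        if i in ns:                              # violates i not in N(i)
            return False
        if any(i not in nbrs[k] for k in ns):    # violates symmetry
            return False
    return True

def cliques_up_to_pairs(nbrs):
    """Enumerate single-site cliques and pair-site cliques."""
    sites = sorted(nbrs)
    singles = [(i,) for i in sites]
    pairs = [(i, k) for i, k in combinations(sites, 2) if k in nbrs[i]]
    return singles, pairs

def gibbs_energy(labels, nbrs, v1, v2):
    """Sum of clique potentials over all single-site and pair-site cliques."""
    singles, pairs = cliques_up_to_pairs(nbrs)
    return (sum(v1(labels[i]) for (i,) in singles)
            + sum(v2(labels[i], labels[k]) for i, k in pairs))

# Hypothetical example: sites 0-1-2 form a chain, site 3 has no neighbors.
nbrs = {0: {1}, 1: {0, 2}, 2: {1}, 3: set()}
```

A lower energy corresponds to a higher Gibbs probability, so a toy `v2` that charges a cost when neighboring sites disagree favors smooth labelings.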

Indeed, character recognition based on the hidden MRF [18] or contextual stochastic modeling [19] concerns only the causal dependencies among sites in the neighborhood system and thus reduces the practical computational cost by dynamic programming. However, noncausal stroke relationships are more reasonable because the temporal stroke-order information is unknown during offline character recognition. To approximate the joint distribution of the strokes, the statistical character structure modeling (SCSM) method [2] represents all kinds of stroke relationships by the conditional probability of the neighbors selected by minimizing the Kullback-Leibler (KL) divergence. With such a representation, it adopts a heuristic search algorithm to find the best correspondence between the input strokes and the stroke models. In the SCSM, the KL divergence defines a neighborhood system that minimizes the information loss in approximating the joint probability distribution of the strokes. In this sense, the SCSM may be viewed as a special case of MRFs if the joint probability has the Gibbs distribution form. Likewise, the stochastic modeling of stroke relationships (SMSR) [20] may also be considered within the MRF framework if it directly uses the Gibbs distribution to model stroke relationships of Hangul characters. Therefore, we believe that the theoretically well-founded MRF may shed more light on building a salient framework for statistical-structural character modeling and recognition.

We now reexamine the MRF-based SCSM. Different stroke relationships identify different character structures. At all sites, the strokes are random with respect to their direction, position, and length, and their spatial relationships can be represented by the joint probability distribution (or density). In the MRF-based character model, the joint probability of the strokes has the Gibbs distribution form, which transforms a large variety of stroke relationships into local interactions with neighboring strokes (cliques) in the neighborhood system. High-order cliques may represent high-order stroke relationships, but they also lead to a high computational cost. In practice, we consider only the single-site cliques, $C_1 = \{i\}$, and pair-site cliques, $C_2 = \{(i, i')\}$. To encourage or penalize various stroke relationships, we assign different clique potentials $V_c$ to the neighboring strokes. In principle, we may design $V_c$ arbitrarily, provided that the energy value decreases as the matching degree between the observed strokes and the stroke models increases. Furthermore, normalizing the resulting Gibbs measure into a probability distribution requires the partition function, which involves a complex combinatorial computation. To avoid the normalization, we derive $V_c$ directly from probability density functions, which here are Gaussian mixture models (GMMs) estimated automatically from training samples. In addition to the unary feature representing a single stroke’s direction, position, and length, we also use the binary feature to encode structural information such as the relative direction, position, and length between two strokes. Since the binary feature is related to the unary feature, the estimated Gibbs distribution will only reproduce the marginal statistics of the observed strokes [21]. After building the MRF-based character model for each category of characters, recognition proceeds by finding the best structural match between the input strokes and the character models in the sense of minimizing the MRF energy with relaxation labeling (RL) [22]. Fig. 1 summarizes the MRF-based SCSM. Not only does the MRF framework improve the structural matching accuracy, but it also offers a rational learning scheme from training samples rather than ad hoc heuristics. In the MRF-based SCSM, we have three key issues: 1) define the neighborhood system that accounts for the most important stroke relationships, 2) design the clique potentials that evaluate the local statistical dependencies among strokes, and 3) extract reliable strokes from character images.

Character recognition is a special case of visual object recognition. The psychological RBC theory [1] proposes a structural decomposition model for object representation, where many categories of objects are naturally represented by local image patches and pairwise spatial relationships between those patches. Specifically, “parts” are image patches and “shape” describes the geometry of mutual positions between parts. As far as character structures are concerned, “parts” are decomposed strokes, and their spatial configuration implies a certain “shape.” In object recognition, deformable models capture the large shape variations of objects from training samples and, thus, can classify handwritten Chinese radicals with complex shape information (for example, nonlinear active shape models [8]). Another example is the constellation model [23], which represents each category of objects as flexible constellations of rigid parts and further describes intraclass variability by a joint probability density function on the shape of the constellations and the output of part detectors. The constellation model focuses on three problems, that is, segmentation of training images, part selection, and estimation of model parameters. From this perspective, it can be directly applied to character recognition, where the part selection problem is equivalent to searching for the most distinctive and stable strokes in character images. However, it may have difficulty achieving an adequate structural representation of characters for the following reasons. First, it lacks the vocabulary to describe fine details of character structures except for the joint probability distribution of the parts. Second, it does not have an explicit mechanism to incorporate the prior structural information of characters.


In contrast, Grenander and Miller’s [24] pattern theory states that the variables describing the structures in the world are typically related in a graphical fashion, and finding the right graph or class of graphs is a crucial step in setting up a satisfactory model for any patterns. Because most spatial interactions among strokes occur within a certain neighborhood system, we assume the Markov property on the stroke models (nodes of the graph) that constitute the MRF-based character model. As shown in Fig. 1, the observed strokes compose one undirected graph, and the MRF-based character model is the other undirected graph. The structural match between two graphs relies on a graph matching approach such as the RL. Since handwritten Chinese characters contain rich structural information, there are several advantages of using the MRF-based character models. First, the MRF employs a much more enriched vocabulary to describe character structures. For example, the neighborhood system defines the interactive range of the strokes, and the clique potentials directly measure the structural similarity between the observed strokes and the stroke models. Second, graphs encode the prior knowledge of stroke relationships, where the edge between two nodes represents a certain prior relationship between the strokes. Third, the special design of neighborhood systems and clique potentials can emphasize the subtle difference of character structures.

In the next section, we introduce the MRF for statistical-structural character modeling. Treating recognition as a labeling problem, we describe character structures by the neighborhood system and clique potentials within the MRF framework. We define the global and connected neighborhood systems to account for the most important stroke relationships. The likelihood clique potentials are derived from the GMMs, and the prior clique potentials are penalties for mismatched stroke relationships. Section 3 builds an MRF-based handwritten Chinese character recognition (HCCR) system, including the stroke extraction algorithm, the structural matching algorithm, and the learning algorithm. First, the MRF-guided stroke extraction algorithm finds all possible candidate strokes. Second, based on the pair-site clique potentials, the RL algorithm determines the best stroke relationships from the extracted candidate strokes. Finally, the learning algorithm estimates the parameters of the MRF-based character model that give the maximum-likelihood (ML) description of the training samples. Section 4 shows experiments on the Korea Advanced Institute of Science and Technology (KAIST) database, and Section 5 draws conclusions.

2 STATISTICAL-STRUCTURAL CHARACTER MODELING

This section is devoted to the MRF-based SCSM method. For reference, we list all important notations as follows:

• $\mathcal{I} = \{1, \ldots, i, \ldots, I\}$: a set of sites.
• $\mathcal{J} = \{1, \ldots, j, \ldots, J\}$: a set of labels.
• $\mathcal{O} = \{\mathbf{o}_1, \ldots, \mathbf{o}_i, \ldots, \mathbf{o}_I\}$: a collection of observations at all sites.
• $\mathcal{F} = \{f_1, \ldots, f_i, \ldots, f_I\}$: a labeling configuration at all sites.
• $\partial_i$: the neighborhood system of site $i$.
• $c$: a clique, in which all sites are pairwise neighbors.
• $C_1$: a set of single-site cliques.
• $C_2$: a set of pair-site cliques.
• $V_{C_1}$: a single-site clique potential.
• $V_{C_2}$: a pair-site clique potential.
• $\lambda$: a set of parameters defining an MRF-based character model.
• $U(\cdot)$: the energy function, which is a sum of clique potentials over all possible $c$.

Many pattern recognition problems can be posed as a labeling problem, whose solution is a set of linguistic labels, $\mathcal{J} = \{1, \ldots, j, \ldots, J\}$, assigned to a set of sites, $\mathcal{I} = \{1, \ldots, i, \ldots, I\}$, to explain the observations, $\mathcal{O} = \{\mathbf{o}_1, \ldots, \mathbf{o}_I\}$, at all sites. The sites may be successive times, image pixels, or image patches, whereas the labels reflect any relations, regularities, or structures inherent in the sites. At each site, the random observation $\mathbf{o}_i$ may represent symbols, feature vectors, or image pixel values. For simplicity, we assume that the observations are i.i.d. at all sites. The labels may be viewed as hidden random variables that generate the observations, and the number of labels $J$ usually does not equal the number of sites $I$. The labeling strength, $f_i(j) \in [0, 1]$, measures the degree to which the label $j$ is assigned to the site $i$, where $f_i(j) = 1$ denotes that $j$ is definitely assigned to $i$, that is, $f_i = j$. The null label is not assigned to any site, denoted by $\sum_i f_i(j) = 0$, and the null site is not associated with any label, denoted by $\sum_j f_i(j) = 0$. The labeling configuration at all sites, $\mathcal{F} = \{f_1, f_2, \ldots, f_I\}$, is a stochastic process. According to Mumford’s [25] pattern theory of perception, the class-conditional joint probability, $P(\mathcal{O}, \mathcal{F} | \lambda)$, can describe the underlying pattern structure for each class $\lambda$. For example, in hidden Markov model (HMM)-based speech recognition, the labels represent phonemes, and the label set for the word “cat” would have labels for the phonemes /k/, /a/, and /t/ in Fig. 2a; in MRF-based HCCR, the labels represent stroke models, and the label set for the character “ ” would have labels for all decomposed straight-line strokes in Fig. 2b.

Fig. 1. The MRF-based statistical character structure modeling.
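As a small illustration of the labeling strengths defined above, the following Python sketch stores $f_i(j)$ as a list of rows (one row per site, one column per label) and finds the null labels and null sites. The function names and the example matrix are hypothetical and chosen only for illustration.

```python
def null_labels(f):
    """Labels j with sum over sites of f[i][j] equal to 0."""
    n_labels = len(f[0])
    return [j for j in range(n_labels) if sum(row[j] for row in f) == 0]

def null_sites(f):
    """Sites i with sum over labels of f[i][j] equal to 0."""
    return [i for i, row in enumerate(f) if sum(row) == 0]

# Hypothetical strengths for 3 sites (rows) and 3 labels (columns).
f = [
    [1.0, 0.0, 0.0],   # site 0 is definitely assigned label 0
    [0.7, 0.3, 0.0],   # site 1 is ambiguous between labels 0 and 1
    [0.0, 0.0, 0.0],   # site 2 is a null site
]
```

In this example, label 2 is a null label because no site supports it, and site 2 is a null site because it supports no label.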

In MRF-based HCCR, we would have a number of MRF-based character models $\lambda$, one for each category, and use $P(\lambda | \mathcal{O})$ to score each character model given the test observation $\mathcal{O}$. According to Bayesian decision theory [26, p. 20], we assign a test character to the character model with the highest score. This graph matching score evaluates how well an MRF matches the observed candidate strokes. Since in isolated HCCR we have little prior knowledge about dependencies among character models, we assume equal prior probabilities $P(\lambda)$. Thus, by Bayes’ rule, we obtain $P(\lambda | \mathcal{O}) \propto P(\mathcal{O} | \lambda)$ and use $P(\mathcal{O} | \lambda)$ to score each character model instead. Because computing the matching score, $P(\mathcal{O} | \lambda) = \sum_{\mathcal{F}} P(\mathcal{O}, \mathcal{F} | \lambda)$, is an intractable combinatorial problem, we instead find the single best labeling configuration $\mathcal{F}^*$ to explain the observation $\mathcal{O}$. Given $\lambda$, the MAP estimation yields the best labeling configuration

$$\mathcal{F}^* = \arg\max_{\mathcal{F}} P(\mathcal{F} | \mathcal{O}, \lambda). \tag{1}$$

Again, by Bayes’ rule, we obtain $P(\mathcal{F} | \mathcal{O}, \lambda) \propto P(\mathcal{O}, \mathcal{F} | \lambda)$, so that the MAP estimation in (1) becomes the maximization of $P(\mathcal{O}, \mathcal{F} | \lambda)$, which is usually factored into the likelihood function of $\mathcal{F}$ with respect to $\mathcal{O}$ and the prior probability of $\mathcal{F}$:

$$P(\mathcal{O}, \mathcal{F} | \lambda) = p(\mathcal{O} | \mathcal{F}, \lambda) P(\mathcal{F} | \lambda). \tag{2}$$

Indeed, the labeling problem is a compound Bayesian decision problem [26, p. 62].
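The MAP labeling in (1) and (2) can be illustrated with a brute-force Python sketch over a tiny problem. Everything here is an assumption made for illustration: the per-site likelihood table, the unnormalized prior, and the function name `map_labeling` are hypothetical, and exhaustive enumeration is only feasible because the example has $J^I = 2^3$ configurations. Since the posterior is proportional to the joint probability, neither factor needs to be normalized to find the maximizer.

```python
from itertools import product

def map_labeling(likelihood, prior, n_sites, n_labels):
    """Exhaustively maximize p(O|F) * P(F) over all J**I labelings F."""
    best, best_score = None, -1.0
    for F in product(range(n_labels), repeat=n_sites):
        score = prior(F)
        for i, j in enumerate(F):
            score *= likelihood[i][j]   # i.i.d. observations: product over sites
        if score > best_score:
            best, best_score = F, score
    return best, best_score

# Hypothetical likelihoods p(o_i | label j) for 3 sites and 2 labels.
likelihood = [[0.9, 0.1], [0.4, 0.6], [0.2, 0.8]]

def prior(F):
    """Toy unnormalized prior that favors label changes between adjacent sites."""
    changes = sum(a != b for a, b in zip(F, F[1:]))
    return 2.0 ** changes

best_F, best_score = map_labeling(likelihood, prior, n_sites=3, n_labels=2)
```

The enumeration cost grows as $J^I$, which is why the MRF assumption of neighbor-only dependence and an approximate optimizer are needed for realistic numbers of strokes.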

2.1 MRFs

To avoid computing $P(\mathcal{O}, \mathcal{F} | \lambda)$ in (2) for all $J^I$ possible labeling configurations, the MRF constrains the label interdependence by assuming that the labels depend only on their neighbors. According to the Hammersley-Clifford theorem [17], we rewrite (2) in the Gibbs distribution form

$$\frac{e^{-U(\mathcal{O}, \mathcal{F} | \lambda)}}{Z_1(\lambda)} = \frac{e^{-U(\mathcal{O} | \mathcal{F}, \lambda)}}{Z_2(\lambda)} \cdot \frac{e^{-U(\mathcal{F} | \lambda)}}{Z_3(\lambda)}, \tag{3}$$

where $U(\mathcal{O}, \mathcal{F} | \lambda)$, $U(\mathcal{O} | \mathcal{F}, \lambda)$, and $U(\mathcal{F} | \lambda)$ are the joint energy, the likelihood energy, and the prior energy, respectively, and $Z_1(\lambda)$, $Z_2(\lambda)$, and $Z_3(\lambda)$ are the partition functions that normalize the corresponding Gibbs measures into probability distributions.

From (3), we see that $Z_1(\lambda) = 1$ if both terms on the right-hand side have been normalized. As far as the likelihood function is concerned, if we assume $p(\mathcal{O} | \mathcal{F}, \lambda)$ to be a GMM, then we can take $Z_2(\lambda) = 1$ and, at the same time, derive the likelihood energy from the GMM. In this way, the derived $U(\mathcal{O} | \mathcal{F}, \lambda)$ is comparable among different character models. Generally, we use the prior energy to encode the prior structural information and design the prior clique potentials to penalize inconsistent structures. Hence, we have to evaluate $Z_3(\lambda)$ to ensure the Gibbs distribution $P(\mathcal{F} | \lambda)$, but different character models $\lambda$ have different normalization factors $Z_3(\lambda)$. To compare fairly among all the character models, we shall design the prior energy based on the likelihood energy with the following intuitive interpretation. When the observed strokes contradict the prior structural information encoded in the MRF, we decrease the likelihood of the character model by reducing a certain proportion of that likelihood. Such a design balances the likelihood and prior energy functions to achieve a desirable global configuration. Furthermore, since $U(\mathcal{F} | \lambda)$ depends on $U(\mathcal{O} | \mathcal{F}, \lambda)$, it is also comparable among character models even if we ignore $Z_3(\lambda)$.

Therefore, through the above design of the energy functions, maximizing (2) is equivalent to minimizing the joint energy

$$U(\mathcal{O}, \mathcal{F} | \lambda) \approx U(\mathcal{O} | \mathcal{F}, \lambda) + U(\mathcal{F} | \lambda), \tag{4}$$

where

$$U(\mathcal{O} | \mathcal{F}, \lambda) = \sum_{c \in C_1, C_2} V_c(\mathcal{O} | \mathcal{F}, \lambda), \tag{5}$$

$$U(\mathcal{F} | \lambda) = \sum_{c \in C_1, C_2} V_c(\mathcal{F} | \lambda). \tag{6}$$

Fig. 2. Many pattern recognition problems can be posed as the labeling problem.

The energy function equals the sum of clique potentials over all possible cliques $c$. The single-site likelihood clique potential, $V_{C_1}(\mathbf{o}_i | j, \lambda)$, describes the statistical information of observation $\mathbf{o}_i$ given the label $j$, and the pair-site likelihood clique potential, $V_{C_2}(\mathbf{o}_i, \mathbf{o}_{i'} | j, j', \lambda)$, statistically describes the relationships between $\mathbf{o}_i$ and $\mathbf{o}_{i'}$ given the labels $j$ and $j'$. Both the single-site and pair-site prior clique potentials, $V_{C_1}(j | \lambda)$ and $V_{C_2}(j, j' | \lambda)$, encode the prior structural information of the neighboring labels so that the MRF depends on proper structural priors.
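The sum of likelihood and prior clique potentials over single-site and pair-site cliques can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the four potential functions are passed in as hypothetical callables, the labeling is a list indexed by site, and the toy potentials in the example are invented and are not the potentials designed in the paper.

```python
def joint_energy(F, nbr_pairs, v1_like, v2_like, v1_prior, v2_prior):
    """Likelihood energy plus prior energy for a labeling F.

    F[i] is the label of site i; nbr_pairs lists neighboring site pairs (i, k).
    """
    sites = range(len(F))
    # Likelihood energy: single-site and pair-site likelihood potentials.
    u_like = sum(v1_like(i, F[i]) for i in sites)
    u_like += sum(v2_like(i, k, F[i], F[k]) for i, k in nbr_pairs)
    # Prior energy: single-site and pair-site prior potentials.
    u_prior = sum(v1_prior(F[i]) for i in sites)
    u_prior += sum(v2_prior(F[i], F[k]) for i, k in nbr_pairs)
    return u_like + u_prior

# Hypothetical toy potentials for two sites and two labels.
v1_like = lambda i, j: 1.0 if i == j else 3.0        # site i fits label i best
v2_like = lambda i, k, j, jp: 0.5                    # constant pair cost
v1_prior = lambda j: 0.0                             # no single-site prior cost
v2_prior = lambda j, jp: 2.0 if j == jp else 0.0     # penalize duplicate labels
```

With these toy potentials, the labeling `[0, 1]` has a lower energy than `[0, 0]`, so it would be preferred by energy minimization.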

The MRF framework weights the likelihood energy (5) and prior energy (6) and combines them to form the joint energy (4). If the two energy distributions overlap significantly, this mathematical combination produces a desirable result. Otherwise, the joint energy may fall into a region unsupported by either the likelihood or the prior.

The direction, position, and length contain the complete spatial information of strokes. At each site $i$, there is a candidate stroke, $\mathbf{o}_i = [o_i^D, o_i^P, o_i^L]^T$, where $o_i^D$, $o_i^P$, and $o_i^L$ are the unary features of direction, position, and length. The relationships between two neighboring strokes are represented by the binary features, $\mathbf{o}_{ii'} = [o_{ii'}^D, o_{ii'}^P, o_{ii'}^L]^T$, where $o_{ii'}^D = o_{i'}^D - o_i^D$, $o_{ii'}^P = o_{i'}^P - o_i^P$, and $o_{ii'}^L = o_{i'}^L - o_i^L$. Within the MRF framework, we describe feature statistics by the neighborhood system and clique potentials. In the following sections, we shall focus on two issues of the MRF-based character models: 1) define the neighborhood system that accounts for the most important stroke relationships and 2) design the clique potentials that measure the various stroke relationships.
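To make the unary and binary features concrete, here is a minimal Python sketch. The text does not say how direction, position, and length are measured, so the following choices are assumptions made purely for illustration: a stroke is given by its two endpoints, direction is the angle of the segment, position is its midpoint (two coordinates), and length is the Euclidean distance between the endpoints. The binary feature is the element-wise difference, as in the definitions above; angle wrap-around is ignored in this sketch.

```python
import math

def unary_feature(p0, p1):
    """[direction, x, y, length] of a straight-line stroke from p0 to p1."""
    (x0, y0), (x1, y1) = p0, p1
    direction = math.atan2(y1 - y0, x1 - x0)    # assumed: segment angle
    mid_x, mid_y = (x0 + x1) / 2.0, (y0 + y1) / 2.0   # assumed: midpoint
    length = math.hypot(x1 - x0, y1 - y0)
    return [direction, mid_x, mid_y, length]

def binary_feature(o_i, o_k):
    """Relative feature o_k - o_i between two neighboring strokes."""
    return [b - a for a, b in zip(o_i, o_k)]

# Hypothetical strokes: a horizontal stroke and a vertical stroke crossing it.
horizontal = unary_feature((0, 0), (4, 0))
vertical = unary_feature((2, 0), (2, 3))
relation = binary_feature(horizontal, vertical)
```

Because the binary feature is a deterministic function of two unary features, the two are statistically related, which is the dependence the text refers to.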

2.2 Neighborhood Systems

For high-level vision problems, the neighborhood system is usually defined at irregular sites such as image patches. Currently, we have three choices to define the neighborhood system.

The first is the global neighborhood system [27], in which all sites are neighbors of each other, as in Fig. 3. It accounts for all types of stroke relationships completely, but it is the most computationally expensive because all relationships between two sites have to be calculated in the structural-matching algorithm. In practice, we often extract no more than forty candidate strokes, $I \le 40$, from a Chinese character, which makes the global neighborhood system computationally tractable.

The second is the connected neighborhood system. Two image patches $i$ and $i'$ are neighbors if they are connected, as in Fig. 3. Chinese characters can be decomposed into image patches referred to as strokes, and the connected strokes often reflect the stable structures such as the stable relative directions, positions, and lengths. Thus, the connected neighborhood system may cover most important stroke relationships. Because no more than two strokes are connected in most characters, practically, we consider up to pair-site cliques and ignore other high-order cliques.

The third is the KL divergence neighborhood system [2], in which the most important stroke relationships are selected by minimizing the KL divergence among the stroke distributions.

In comparison, the connected neighborhood system is natural but can only reflect some fixed types of stroke relationships. The global neighborhood system is complete but complex. The KL divergence neighborhood system is a compromise between the above two neighborhood systems. In this paper, we consider only the global and connected neighborhood systems. Because of the stroke ambiguity, in practice we need to extract many candidate strokes. Repetitive or overlapping candidate strokes are not neighbors of each other in the neighborhood system.
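The global and connected neighborhood systems can be sketched in Python as follows. This is an illustration under assumptions: how connectivity and overlap between candidate strokes are detected is not described here, so both are supplied as precomputed inputs (`connected_pairs` and `excluded`), and the function names and the four-stroke example are hypothetical.

```python
def global_neighborhood(n_sites, excluded=frozenset()):
    """Every pair of distinct sites is neighboring, except excluded pairs."""
    return {i: {k for k in range(n_sites)
                if k != i and frozenset((i, k)) not in excluded}
            for i in range(n_sites)}

def connected_neighborhood(n_sites, connected_pairs, excluded=frozenset()):
    """Only connected site pairs are neighboring, except excluded pairs."""
    nbrs = {i: set() for i in range(n_sites)}
    for i, k in connected_pairs:
        if i != k and frozenset((i, k)) not in excluded:
            nbrs[i].add(k)   # add both directions to keep the relation symmetric
            nbrs[k].add(i)
    return nbrs

# Hypothetical example: 4 candidate strokes; strokes 2 and 3 overlap,
# so they are excluded from being neighbors of each other.
overlap = {frozenset((2, 3))}
g = global_neighborhood(4, overlap)
c = connected_neighborhood(4, [(0, 1), (1, 2), (2, 3)], overlap)
```

The global system has up to $I(I-1)/2$ pair-site cliques, which is 780 pairs for $I = 40$, whereas the connected system keeps only the pairs that actually touch.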

2.3 Clique Potentials

The likelihood potentials encode both statistical and structural information of strokes from training samples, and the prior potentials encode the prior structural information of characters.

We derive the single-site and pair-site likelihood clique potentials from the GMMs [27]. According to the Gibbs distribution and the i.i.d. assumption, we obtain the following single-site likelihood clique potential:

$$V_{C_1}(\mathbf{o}_i \mid j, \lambda) = -\log\left[\sum_{m=1}^{M_s} w_{jm}\, N(\mathbf{o}_i; \boldsymbol{\mu}_{jm}, \Sigma_{jm})\right] \tag{7}$$

and the pair-site likelihood clique potential

$$V_{C_2}(\mathbf{o}_i, \mathbf{o}_{i'} \mid j, j', \lambda) = -\log\left[\sum_{m=1}^{M_s} w_{jj'm}\, N(\mathbf{o}_{ii'}; \boldsymbol{\mu}_{jj'm}, \Sigma_{jj'm})\right], \tag{8}$$

where $\mathbf{o}_i$ and $\mathbf{o}_{ii'}$ are the unary and binary features, respectively, $M_s$ is the number of mixture components, and $w_{jm}$ and $w_{jj'm}$ are the weights of the mixture components. $N(\cdot; \boldsymbol{\mu}, \Sigma)$ is a multivariate Gaussian distribution

$$N(\mathbf{o}; \boldsymbol{\mu}, \Sigma) = \frac{1}{\sqrt{(2\pi)^d |\Sigma|}}\, e^{-\frac{1}{2}(\mathbf{o} - \boldsymbol{\mu})' \Sigma^{-1} (\mathbf{o} - \boldsymbol{\mu})}, \tag{9}$$

where $d$ is the dimensionality of $\mathbf{o}$. Because the unary and binary features are related, the final Gibbs distribution (31) will not reproduce the stroke feature statistics governed by the GMM. Instead, the feature statistics are the marginal distribution of the observed strokes [21].

Fig. 3. The global neighborhood system defines all sites $i$ as neighbors in (b) and (c). The connected neighborhood system defines only connected sites $i$ and $i'$ as neighbors, such as in (d), (e), and (f).
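As an illustration of (7) and (9), the following minimal sketch evaluates a GMM-based likelihood clique potential. It assumes diagonal covariance matrices (stored as per-dimension variances) for simplicity, and the function names are hypothetical. The same function serves the pair-site potential (8) when it is given the binary feature and the pair-site mixture parameters.

```python
import math


def gaussian_diag(o, mean, var):
    """Multivariate Gaussian density with a diagonal covariance.
    `var` holds the diagonal entries (an assumption for this sketch)."""
    d = len(o)
    log_norm = -0.5 * (d * math.log(2.0 * math.pi)
                       + sum(math.log(v) for v in var))
    quad = sum((x - m) ** 2 / v for x, m, v in zip(o, mean, var))
    return math.exp(log_norm - 0.5 * quad)


def likelihood_potential(o, weights, means, variances):
    """Negative log of the Gaussian mixture density, as in (7)."""
    mixture = sum(w * gaussian_diag(o, mu, var)
                  for w, mu, var in zip(weights, means, variances))
    return -math.log(mixture)


# Hypothetical two-component stroke model over a 2D feature.
weights = [0.7, 0.3]
means = [[0.5, 0.5], [0.6, 0.4]]
variances = [[0.01, 0.01], [0.02, 0.02]]
print(likelihood_potential([0.5, 0.5], weights, means, variances))
print(likelihood_potential([0.9, 0.1], weights, means, variances))
```

A feature close to the component means yields a lower potential (energy) than a distant one. For features very far from all components the mixture density can underflow to zero, so a practical implementation would work in the log domain.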

The long strokes and short strokes of characters play different roles in character structures. The long strokes usually constitute the basic character structure, but some short strokes are crucial to differentiate similar characters, as shown in Fig. 4. Therefore, during the structural match, we assign the labels for the long stroke models to candidate strokes before the short ones and penalize the null labels, especially for those crucial short stroke models in Fig. 4. To this end, we incorporate the length information of the stroke models

$$\rho_j = \frac{\mu^L_j}{\sum_{j=1}^{J} \mu^L_j} \tag{10}$$

into the likelihood clique potentials, where the mean vector

$$\boldsymbol{\mu}_j = \sum_{m=1}^{M_s} w_{jm}\, \boldsymbol{\mu}_{jm}. \tag{11}$$

Thus, we use the single-site likelihood potential, $\rho_j V_{C_1}(\mathbf{o}_i \mid j, \lambda)$, and the pair-site likelihood potential, $\rho_j \rho_{j'} V_{C_2}(\mathbf{o}_i, \mathbf{o}_{i'} \mid j, j', \lambda)$, in the structural matching algorithm.

The single-site prior clique potential penalizes the null label according to the length $\rho_j$:

$$V_{C_1}(j \mid \lambda) = \begin{cases} 0, & \text{if } \sum_i f^*_i(j) \ne 0, \\ \rho_j \nu_1, & \text{if } \sum_i f^*_i(j) = 0, \end{cases} \tag{12}$$

where $\nu_1 > 0$. The pair-site prior clique potentials penalize the mismatch between two input candidate strokes and their labeling configuration in terms of connection. We denote two connected labels $j$ and $j'$ by $a_{jj'} = P(j' \mid j) > 0$, and otherwise, $a_{jj'} = 0$. The label $j$ is always disconnected from itself, that is, $a_{jj} = 0$. The connection between labels $j$ and $j'$, denoted by an edge, is fixed according to the initial character model. As shown in Fig. 5d, label 4 is connected with labels 1, 2, and 3. This connection reflects the prior local structure of the character “ ” in (a). During the structural match between the candidate strokes, $\mathbf{o}_1$, $\mathbf{o}_2$, $\mathbf{o}_3$, $\mathbf{o}_4$, and the labels, 1, 2, 3, and 4, we penalize the inconsistent relationship if two disconnected labels 1 and 2 are assigned to two connected strokes $\mathbf{o}_1$ and $\mathbf{o}_4$ by the following pair-site prior clique potential:

$$V_{C_2}(j, j' \mid \lambda) = \begin{cases} 0, & \text{if } j, j' \text{ are consistent with } i, i', \\ a_{jj'} \nu_2, & \text{if } j, j' \text{ are inconsistent with } i, i', \end{cases} \tag{13}$$

where $\nu_2 > 0$. To balance the likelihood and prior clique potentials, as explained in Section 2.1, we design the penalties $\nu_1$, $\nu_2$ as

$$\nu_1 = \min_{i = 1, \ldots, I} V_{C_1}(\mathbf{o}_i \mid j, \lambda), \tag{14}$$

$$\nu_2 = 0.05\, V_{C_2}(\mathbf{o}_i, \mathbf{o}_{i'} \mid j, j', \lambda). \tag{15}$$

To differentiate characters that are similar in shape, as in Fig. 4, we have to carefully assign weights to some clique potentials in order to emphasize the subtle structural difference between similar characters. For example, we have to assign large weights to the clique potential for the dot stroke in the character “ ” to emphasize its difference from “ .” These weights for a pair of ambiguous Chinese characters can be automatically obtained by the neural network learning algorithm proposed in [28]. Alternatively, we can also manually set the weights of these clique potentials based on prior knowledge. For simplicity, we do not differentiate these similar characters in this paper. More details about similar character recognition can be found in [28].
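The length weighting in (10) and the prior clique potentials in (12) and (13) can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, and "consistent" is read as "the two labels are connected exactly when the two strokes are connected", which is one possible interpretation of the text.

```python
def length_weights(mean_lengths):
    """Normalized stroke-model lengths, following (10)."""
    total = sum(mean_lengths)
    return [length / total for length in mean_lengths]


def single_site_prior(label_strengths, weight_j, penalty_1):
    """Eq. (12): penalize a null label in proportion to the model length.
    `label_strengths` holds f*_i(j) over all sites i for one label j."""
    return 0.0 if sum(label_strengths) != 0 else weight_j * penalty_1


def pair_site_prior(a_jj, strokes_connected, penalty_2):
    """Eq. (13) under the assumed reading of consistency.
    As written in (13), a_jj = 0 makes the penalty vanish
    whenever the two labels are disconnected."""
    labels_connected = a_jj > 0
    if labels_connected == strokes_connected:
        return 0.0
    return a_jj * penalty_2


weights = length_weights([3.0, 1.0])             # one long and one short stroke model
print(weights)                                   # [0.75, 0.25]
print(single_site_prior([0, 0, 0], 0.25, 2.0))   # null label: 0.5
print(pair_site_prior(0.8, False, 0.5))          # connected labels, disconnected strokes: 0.4
```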

3 MRF-BASED HCCR SYSTEM

An HCCR system based on stroke analysis usually has five components: a handwritten Chinese character database, the character model, the language model, the stroke extraction algorithm, and the structural matching algorithm, in which the character model is a crucial part. Fig. 6 shows the hierarchical structure of the HCCR system proposed in this paper. We use the MRF-based character model to describe the stroke relationships. A structural matching algorithm such as the RL bridges the character models and the candidate strokes. Because of the stroke ambiguity, we propose an MRF-guided method to extract all possible candidate strokes from character images by the single-site likelihood clique potential. The contextual information between character models is governed by the language model, which is not considered in this paper.

3.1 MRF-Guided Stroke Extraction

Extracting reliable strokes is an indispensable prerequisite for modeling stroke relationships. The preprocessing of character images has three steps:

1. We normalize the slant and moment [29] of the character images with the aspect ratio preserved and then apply Euclidean distance transform (EDT)-based thinning [30] to the input characters, which can recover the jam-packed holes and remove loosely touching strokes.

2. From the character skeleton, we extract the end and junction points, called the feature points. Meanwhile, we trace the consecutive pixels connecting the feature points, referred to as substrokes, and remove spurious substrokes whose lengths are short [11].

3. We use corner detection [31] to break each substroke at high-curvature points.

Fig. 4. Three long horizontal strokes and one vertical stroke constitute the main body of the character “ ,” but the short stroke is crucial to differentiate it from the character “ .”

Fig. 5. The connection between labels is prior knowledge about the character structure. In (d), the four connected labels represent prior stroke relationships of the character “ ” in terms of connection. We penalize the inconsistent relationship if two disconnected labels 1 and 2 are assigned to two connected strokes $\mathbf{o}_1$ and $\mathbf{o}_4$.

In [11], the directional feature $\mathbf{o}^D$ defined in the interval $[-90^\circ, 270^\circ]$ is not cyclic because different values may represent almost the same direction, such as $-89^\circ$ and $269^\circ$. To obtain a cyclic $\mathbf{o}^D$, we extract the following Gabor filter-based [32], [33] directional features.

1. We use eight Gabor filters with orientations $0^\circ$, $22.5^\circ$, $45^\circ$, $67.5^\circ$, $90^\circ$, $112.5^\circ$, $135^\circ$, and $157.5^\circ$ to convolve with the thinned character image and obtain eight uncorrelated gray images.

2. At each pixel on the character skeleton, we use the eight gray values from the eight gray images as $\mathbf{o}^D$, normalized by setting its maximum and minimum values to one and zero, for example, $\mathbf{o}^D = [1.00, 0.65, 0.05, 0.02, 0, 0.04, 0.09, 0.58]^T$.

3. The $\mathbf{o}^D$ of a substroke is the average $\mathbf{o}^D$ of its component pixels. For example, if a substroke has $\mathbf{o}^D = [0.81, 0.83, 0.28, 0.06, 0.12, 0.10, 0.36, 0.59]^T$, it is a near-horizontal line because its Gabor filter responses are largest at the $0^\circ$ and $22.5^\circ$ orientations.

In summary, Fig. 7 illustrates the directional features based on Gabor filters. From any start point, the cyclic $\mathbf{o}^D$ continues clockwise and counterclockwise. Furthermore, the Euclidean distance between two vectors measures the directional similarity because the same directions have the smallest distance, whereas the perpendicular directions have the largest distance.
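The cyclic property of the eight-dimensional directional feature can be checked with a small sketch. No actual Gabor filtering is performed here: the filter responses are replaced by a hypothetical $\cos^2$ tuning curve around each orientation, which is an assumption made purely for illustration, and the function names are likewise hypothetical.

```python
import math

ORIENTATIONS = [k * 22.5 for k in range(8)]  # 0, 22.5, ..., 157.5 degrees


def synthetic_responses(theta_deg):
    """Hypothetical stand-in for the eight Gabor responses of a line
    at angle theta_deg (cos^2 tuning, period 180 degrees)."""
    return [math.cos(math.radians(theta_deg - phi)) ** 2
            for phi in ORIENTATIONS]


def normalize(responses):
    """Rescale so that the maximum is one and the minimum is zero."""
    lo, hi = min(responses), max(responses)
    if hi == lo:
        return [0.0] * len(responses)
    return [(r - lo) / (hi - lo) for r in responses]


def direction_distance(a, b):
    """Euclidean distance between two directional feature vectors."""
    return math.dist(a, b)


horizontal = normalize(synthetic_responses(0.0))
almost_horizontal = normalize(synthetic_responses(179.0))
vertical = normalize(synthetic_responses(90.0))

print(direction_distance(horizontal, almost_horizontal))  # small
print(direction_distance(horizontal, vertical))           # large
```

Under this stand-in, lines at $0^\circ$ and $179^\circ$ are close in feature space even though their angles differ by almost $180^\circ$, while perpendicular lines are far apart, which matches the behavior described above.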

Due to complicated character shapes and ambiguous character strokes, traditional stroke extraction methods often produce erroneous broken strokes, leading to a problematic structural description. The difficulty here stems from the possibility of assigning multiple structural descriptions to the same character. According to the interactive cascade model of segmentation [34], partial bottom-up information is sent to a higher level object representation that, in turn, feeds back to guide the segmentation process. Therefore, we use the MRF-based character model to guide the substroke merging to produce perceptually meaningful candidate strokes for recognition. Because the character model does not encode the relationships between substrokes, we can only use the single-site likelihood clique potentials (7) to search all possible candidate strokes having the lower energy for each label.

Fig. 8 illustrates the MRF-guided stroke extraction process.

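Algorithm 1 MRF-guided stroke extraction.
input: OPEN, CLOSED, $G = (O, E)$, $\lambda$.
output: CLOSED.
initialize: OPEN, CLOSED $\leftarrow \emptyset$.
begin
    for $j \leftarrow 1$ to $J$ do
        for $i \leftarrow 1$ to $I$ do
            if $\|\mathbf{o}^P_i - \boldsymbol{\mu}^P_j\| \le$ TH3, $\|\mathbf{o}^D_i - \boldsymbol{\mu}^D_j\| \le$ TH4 then
                OPEN $\leftarrow \mathbf{o}_i$
            end
        end
        for $i \leftarrow 1$ to $|$OPEN$|$ do
            $\mathbf{o}_{new} \leftarrow \mathbf{o}_i \in$ OPEN
            repeat
                $\mathbf{o}_{old} \leftarrow \mathbf{o}_{new}$;
                CLOSED $\leftarrow \mathbf{o}_{old}$;
                $\mathbf{o}_{new} \leftarrow \mathbf{o}_{old}, \mathbf{o}_{i'}, E$
            until $V_{C_1}(\mathbf{o}_{new} \mid j, \lambda) - V_{C_1}(\mathbf{o}_{old} \mid j, \lambda) \ge$ TH5;
        end
    end
end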

First, we build a graph $G = (O, E)$ for all the substrokes, where $E$ is a matrix holding the connectable information of the substrokes. Two substrokes $\mathbf{o}_i$ and $\mathbf{o}_{i'}$ are connectable if they satisfy the following conditions:

1. $\mathbf{o}_i$ and $\mathbf{o}_{i'}$ share the same junction region, like $\mathbf{o}_2$ and $\mathbf{o}_3$ in Fig. 8, or the distance between their end points is less than a threshold TH1, where TH1 = 3 pixels.

2. $\|\mathbf{o}^D_i - \mathbf{o}^D_{i'}\| \le$ TH2, where TH2 = 0.5.

The first condition ensures that the two substrokes come from one continuous line, and the second condition checks the linearity of the two substrokes. Since the normalized character image size is $64 \times 64$ pixels, TH1 = 3 is a proper distance for the gap between substrokes. We set TH2 = 0.5 to prevent perpendicular substrokes from being connectable. For each substroke, $\mathbf{o}^P_i$ is its centroid $(x, y)$ coordinates, and $o^L_i$ is its number of pixels, both of which are normalized with respect to the character size.

Fig. 6. The hierarchical structure of the HCCR system.

Fig. 7. Cyclic directional features $\mathbf{o}^D$ based on Gabor filters.

Fig. 8. We build a graph $G = (O, E)$ for all substrokes, where $E$ contains the connectable information. Algorithm 1 searches the possible substroke concatenations decreasing the energy for each label. The numbers in brackets are the corresponding single-site likelihood potentials for possible concatenations. All candidate strokes are stored in the CLOSED set.

Second, we use the greedy search in Algorithm 1 to find the possible candidate strokes for each label $j$. The OPEN set stores the initial substrokes that satisfy the positional and directional constraints with label $j$. The thresholds TH3 = 0.5 and TH4 = 0.5 are loose enough to include initial candidates, as explained in [11]. For each initial substroke in the OPEN set, we concatenate it with its connectable substroke by

$$\mathbf{o}^P_{new} = \frac{\mathbf{o}^P_i\, o^L_i + \mathbf{o}^P_{i'}\, o^L_{i'}}{o^L_i + o^L_{i'}}, \tag{16}$$

$$\mathbf{o}^D_{new} = \frac{\mathbf{o}^D_i\, o^L_i + \mathbf{o}^D_{i'}\, o^L_{i'}}{o^L_i + o^L_{i'}}, \tag{17}$$

$$o^L_{new} = o^L_i + o^L_{i'}. \tag{18}$$

We put the new substroke in the CLOSED set if it decreases the single-site likelihood potential, as shown in Fig. 8. The threshold TH5 is a small positive value that controls the greedy search.

Finally, the CLOSED set stores all candidate strokes that decrease the single-site likelihood clique potential (7). For each label $j$, we may extract multiple candidate strokes. A null label indicates that no candidate strokes are extracted. Fig. 9 shows some MRF-guided stroke extraction results, where only the candidate stroke having the minimum single-site likelihood potential for each label is shown. After stroke extraction, we also obtain the connection information of all candidate strokes.
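The greedy concatenation described in this section can be sketched as follows. This is a simplified illustration rather than the implementation used in the experiments: substrokes are plain dictionaries with keys `P`, `D`, and `L`, the energy is passed in as an arbitrary callable standing in for the single-site likelihood potential (7), and the stopping rule (stop when the best merge lowers the energy by less than `th5`) is an assumption about how TH5 is applied.

```python
def merge(a, b):
    """Length-weighted concatenation of two substrokes, following (16)-(18)."""
    la, lb = a["L"], b["L"]
    total = la + lb
    position = tuple((pa * la + pb * lb) / total for pa, pb in zip(a["P"], b["P"]))
    direction = tuple((da * la + db * lb) / total for da, db in zip(a["D"], b["D"]))
    return {"P": position, "D": direction, "L": total}


def greedy_extend(start, substrokes, connectable, energy, th5=1e-6):
    """Grow a candidate stroke from substroke `start`, keeping every
    intermediate stroke (the role of the CLOSED set for one seed)."""
    current = substrokes[start]
    members = {start}
    candidates = [current]
    while True:
        frontier = set().union(*(connectable.get(m, set()) for m in members)) - members
        trials = [(energy(merge(current, substrokes[k])), k) for k in sorted(frontier)]
        if not trials:
            break
        best_energy, best_k = min(trials)
        if energy(current) - best_energy < th5:
            break  # no sufficiently large decrease: stop the search
        current = merge(current, substrokes[best_k])
        members.add(best_k)
        candidates.append(current)
    return candidates


# Hypothetical example: two collinear horizontal pieces and one vertical piece.
substrokes = [
    {"P": (0.2, 0.5), "D": (1.0, 0.0), "L": 0.2},
    {"P": (0.4, 0.5), "D": (1.0, 0.0), "L": 0.2},
    {"P": (0.5, 0.7), "D": (0.0, 1.0), "L": 0.2},
]
connectable = {0: {1}, 1: {0, 2}, 2: {1}}
model = {"P": (0.3, 0.5), "D": (1.0, 0.0), "L": 0.4}


def energy(stroke):
    """Toy squared-distance energy to a single stroke model."""
    return (sum((x - y) ** 2 for x, y in zip(stroke["P"], model["P"]))
            + sum((x - y) ** 2 for x, y in zip(stroke["D"], model["D"]))
            + (stroke["L"] - model["L"]) ** 2)


result = greedy_extend(0, substrokes, connectable, energy)
print(len(result), result[-1])
```

In this toy case the two horizontal pieces are merged because the merge lowers the energy, and the vertical piece is rejected because adding it would raise the energy.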

3.2 Relaxation Labeling

Character recognition is equivalent to finding the best structural match $F^*$ in (1), and the joint energy $U(O, F^* \mid \lambda)$ in (4) is the structural matching cost. Each character category is associated with an MRF-based character model. After the structural match with all character models, we classify the test observation to the one with the minimum cost. In this section, we use the RL algorithm [22] to find the best labeling configuration $F^*$.

First, we convert the minimization of the joint energy into the maximization of a corresponding gain function

$$g(O, F \mid \lambda) = \sum_{j=1}^{J} \sum_{i=1}^{I} \left[ K_i(j) f_i(j) + \sum_{j'=1}^{J} \max_{i' \in \partial i} K_{i,i'}(j, j') f_i(j) f_{i'}(j') \right], \tag{19}$$

which is the sum of compatibility functions defined by the clique potentials

$$K_i(j) = \mathrm{CONST}_1 - V_{C_1}(j \mid \lambda) - V_{C_1}(\mathbf{o}_i \mid j, \lambda), \tag{20}$$

$$K_{i,i'}(j, j') = \mathrm{CONST}_2 - V_{C_2}(j, j' \mid \lambda) - V_{C_2}(\mathbf{o}_i, \mathbf{o}_{i'} \mid j, j', \lambda), \tag{21}$$

where the constants $\mathrm{CONST}_1$ and $\mathrm{CONST}_2$ ensure that both compatibility functions are nonnegative. There is a one-to-one correspondence between the maxima of $g(O, F \mid \lambda)$ and the minima of $U(O, F \mid \lambda)$ because of the relationship $U(O, F \mid \lambda) = \mathrm{CONST} - g(O, F \mid \lambda)$, where CONST is some constant that depends on the character model. Therefore, we obtain the minimum structural matching cost $U(O, F^* \mid \lambda)$ according to the final labeling configuration $F^*$ and the maximum $g(O, F^* \mid \lambda)$. We use the max operator rather than the $\sum$ operator of the classical gain function [22] because we consider only the maximum compatibility among distinct local labeling constraints in the neighborhood system. The compatibility $K_{i,i'}(j, j) = 0$ prevents the same label from being assigned to neighboring sites.

Fig. 9. The MRF-guided stroke extraction results. The first column shows the corresponding MRF-based character models.

Second, we update the labeling strength $f^t_i(j)$, except for labels with $\sum_i f^1_i(j) = 0$, by the gradient $q_i(j)$ of the gain function until $t$ reaches the fixed number $T$ in Algorithm 2.

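Algorithm 2 Relaxation Labeling.
input: $O = \{\mathbf{o}_1, \ldots, \mathbf{o}_I\}$, $\lambda$.
output: $F^*$, $U(O, F^* \mid \lambda)$.
initialize: $f^1_i(j) \leftarrow$ initial labeling,
    $K_i(j) \leftarrow \mathrm{CONST}_1 - V_{C_1}(\mathbf{o}_i \mid j, \lambda) - V_{C_1}(j \mid \lambda)$,
    $K_{i,i'}(j, j') \leftarrow \mathrm{CONST}_2 - V_{C_2}(\mathbf{o}_i, \mathbf{o}_{i'} \mid j, j', \lambda) - V_{C_2}(j, j' \mid \lambda)$,
    $1 \le j, j' \le J$, $1 \le i \le I$, $i' \in \partial i$.
begin
    for $t \leftarrow 1$ to $T$ do
        for $j \leftarrow 1$ to $J$ do
            for $i \leftarrow 1$ to $I$ do
                $q_i(j) \leftarrow K_i(j) + \sum_{j'} \max_{i'} K_{i,i'}(j, j') f_{i'}(j')$;
                $f^{t+1}_i(j) \leftarrow \dfrac{f^t_i(j)\, q^t_i(j)}{\sum_i f^t_i(j)\, q^t_i(j)}$;
            end
        end
    end
    for $j \leftarrow 1$ to $J$ do
        $i \leftarrow \arg\max_i f^T_i(j)$;
        $f^*_i(j) \leftarrow 1$, $\sum_i f^*_i(j) \leftarrow 1$;
    end
    $g(O, F^* \mid \lambda) \leftarrow \sum_j \sum_i \left[ K_i(j) f^*_i(j) + \sum_{j'} \max_{i'} K_{i,i'}(j, j') f^*_i(j) f^*_{i'}(j') \right]$;
    $U(O, F^* \mid \lambda) \leftarrow \mathrm{CONST} - g(O, F^* \mid \lambda)$;
end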

Finally, we assign the label $j$ to the site $i$ with the maximum $f^T_i(j)$. This labeling assignment strategy ensures that each label $j$ corresponds to only one site $i$.
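The update and assignment steps of the relaxation labeling can be sketched in a few lines. This is a minimal illustration under stated assumptions: the compatibilities are supplied directly (a nested list `K1` and a callable `K2`, both hypothetical names) instead of being computed from clique potentials, and all labeling strengths are updated synchronously within each iteration, which is one possible reading of Algorithm 2.

```python
def relaxation_labeling(K1, K2, neighbors, f_init, iterations=10):
    """K1[i][j]: single-site compatibility; K2(i, k, j, jp): pair-site
    compatibility; neighbors[i]: neighboring sites of site i;
    f_init[i][j]: initial labeling strengths."""
    num_sites, num_labels = len(K1), len(K1[0])
    f = [row[:] for row in f_init]
    for _ in range(iterations):
        # Gradient of the gain function with the max operator over neighbors.
        q = [[K1[i][j] + sum(max((K2(i, k, j, jp) * f[k][jp] for k in neighbors[i]),
                                 default=0.0)
                             for jp in range(num_labels))
              for j in range(num_labels)]
             for i in range(num_sites)]
        # Normalize over sites for each label; all-zero labels stay null.
        for j in range(num_labels):
            z = sum(f[i][j] * q[i][j] for i in range(num_sites))
            if z > 0:
                for i in range(num_sites):
                    f[i][j] = f[i][j] * q[i][j] / z
    # Assign each label to the site with the maximum labeling strength.
    assignment = {}
    for j in range(num_labels):
        column = [f[i][j] for i in range(num_sites)]
        assignment[j] = max(range(num_sites), key=column.__getitem__) if sum(column) > 0 else None
    return assignment, f


# Hypothetical example with two sites and two labels.
K1 = [[2.0, 1.0], [1.0, 2.0]]


def K2(i, k, j, jp):
    """Zero compatibility for identical labels on neighboring sites."""
    return 0.0 if j == jp else 1.0


neighbors = {0: [1], 1: [0]}
f_init = [[1.0, 1.0], [1.0, 1.0]]
assignment, strengths = relaxation_labeling(K1, K2, neighbors, f_init)
print(assignment)
```

Because each label is assigned by an arg max over sites, every non-null label ends up on exactly one site, as stated above.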

Because the compatibility functions contain both prior and likelihood information, the RL does not heavily depend on the initial labeling. By normalization, we may view the labeling strength $f_i(j)$ as an approximation to the posterior probability $p(j \mid \mathbf{o}_i, \lambda)$ used in the later generalized expectation-maximization (EM) algorithm for learning. The neighborhood system determines the RL computational complexity. In the simplest case, if each site has only one neighbor, the RL has $O(JI^2)$ complexity. In the most complex case, if each site takes all other sites as neighbors, the RL has $O(JI^2 + IJ^2)$ complexity.

At each site $i$, there is a candidate stroke. We set the initial labeling, $f^1_i(j) = 1$, for the candidate strokes extracted by the label $j$ and zero for others in Algorithm 2. By stroke extraction, we also obtain the single-site compatibility $K_i(j)$ in (20). In this sense, the stroke extraction and the structural match have been put into the unified MRF framework in terms of minimizing the joint energy (4). Hence, we only need to calculate $K_{i,i'}(j, j')$ in Algorithm 2 to search for the best matching candidate stroke for each label $j$. As a result, each label is assigned to only one best candidate stroke. Some other model-based stroke extraction methods can be found in [2] and [11]. The advantage of the MRF-guided stroke extraction is that it uses the single-site likelihood potential to reduce the total number of candidate strokes for each label. In the structural match, two labels may be assigned to two overlapped candidate strokes. In this case, we retain the label with the higher single-site compatibility and assign the other label to a candidate stroke that does not overlap with other labeled strokes. The label is null if there are no such candidate strokes, and the null labels are penalized according to (12). Unlabeled candidate strokes, $\sum_j f_i(j) = 0$, are regarded as noise and penalized in proportion to their length [2].

3.3 Learning

The learning algorithm includes three steps: setting up MRF prototypes, initializing MRF parameters, and EM parameter estimation.

First, we set up MRF prototypes for each category of characters using the observation, $O^* = \{\mathbf{o}^*_1, \ldots, \mathbf{o}^*_I\}$, from a well-segmented standard character, where the number of sites $I$ of the standard character equals the number of labels $J$ of the MRF for each category. The initial mean vectors $\boldsymbol{\mu}_j$ and $\boldsymbol{\mu}_{jj'}$ in (7) and (8) are the unary and binary features $\mathbf{o}_i$ and $\mathbf{o}_{ii'}$, respectively. The initial covariance matrices $\Sigma_j$ and $\Sigma_{jj'}$ in (7) and (8) are set to the diagonal matrix $\mathrm{diag}(\sigma^2_1, \ldots, \sigma^2_d)$ due to statistical independence. The initial conditional probability is $a_{jj'} = 1$ if the two labels $j$ and $j'$ are connected; otherwise, $a_{jj'} = 0$. Thus, we get all initial information about character models from the observed strokes of those well-segmented standard characters.

Second, for each training character image, we use the MRF-guided stroke extraction method to extract the observation set $O$. Suppose a set of training observations $O^r$, $1 \le r \le R$, is used to estimate the MRF parameters with $M_s$ mixture components. We use Algorithm 2 to assign labels to the training observation $O^r$. The best labeling configuration $F^*$ implies an alignment of the observation with labels. Furthermore, we use the K-Means [26] algorithm to cluster the observations associated with the same label into different mixture components. As a result, every observation is associated with a single unique mixture component. This association is represented by the indicator function

$$\delta^r_i(jm) = \begin{cases} 1, & \text{if } \mathbf{o}^r_i \text{ is with the } m\text{th mixture component of the label } j, \\ 0, & \text{otherwise.} \end{cases} \tag{22}$$

Therefore, the mean vector, covariance matrix, mixture weight, and $a_{jj'}$ of the single-site likelihood clique potential can be estimated via simple averages in Algorithm 3. The estimation of the pair-site likelihood clique potential is almost the same except that we use the binary features $\mathbf{o}_{ii'}$ and the indicator function

$$\delta^r_{ii'}(jj'm) = \begin{cases} 1, & \text{if } \mathbf{o}^r_i, \mathbf{o}^r_{i'} \text{ is with the } m\text{th mixture component of the label } j, j', \\ 0, & \text{otherwise.} \end{cases} \tag{23}$$

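Algorithm 3 Learning.
input: $O^r = \{\mathbf{o}^r_1, \ldots, \mathbf{o}^r_I\}$, $1 \le r \le R$, $O^* = \{\mathbf{o}^*_1, \ldots, \mathbf{o}^*_I\}$, $M_s$.
output: $\lambda = \{a_{jj'}, \boldsymbol{\mu}_{jm}, \Sigma_{jm}, w_{jm}, \boldsymbol{\mu}_{jj'm}, \Sigma_{jj'm}, w_{jj'm}\}$.
initialize: $\delta^r_i(jm)$, $\delta^r_{ii'}(jj'm) \leftarrow 0$.
begin
    $J$, $\boldsymbol{\mu}_j$, $\boldsymbol{\mu}_{jj'}$, $\Sigma_j$, $\Sigma_{jj'}$, $a_{jj'} \leftarrow O^* = \{\mathbf{o}^*_1, \ldots, \mathbf{o}^*_I\}$;
    for $r \leftarrow 1$ to $R$ do
        $U(O^r, F^* \mid \lambda)$, $F^* \leftarrow \mathrm{RL}(O^r, \lambda)$;
        if $f^r_i(j) = 1$, $f^r_{i'}(j') = 1$ then
            $\delta^r_i(j)$, $\delta^r_{ii'}(jj') \leftarrow 1$;
        end
        $A_{jj'} \leftarrow$ total number of connected $i$ and $i'$ labeled with connected $j$ and $j'$;
    end
    $\delta^r_i(jm)$, $\delta^r_{ii'}(jj'm) \leftarrow \text{k-means}(O^r, \delta^r_i(j), \delta^r_{ii'}(jj'), M_s)$, $\forall r$;
    $a_{jj'} \leftarrow \dfrac{A_{jj'}}{R}$;
    $\boldsymbol{\mu}_{jm} \leftarrow \dfrac{\sum_{r=1}^{R} \sum_{i=1}^{I_r} \delta^r_i(jm)\, \mathbf{o}^r_i}{\sum_{r=1}^{R} \sum_{i=1}^{I_r} \delta^r_i(jm)}$;
    $\boldsymbol{\mu}_{jj'm} \leftarrow \dfrac{\sum_{r=1}^{R} \sum_{i=1}^{I_r} \delta^r_{ii'}(jj'm)\, \mathbf{o}^r_{ii'}}{\sum_{r=1}^{R} \sum_{i=1}^{I_r} \delta^r_{ii'}(jj'm)}$;
    $\Sigma_{jm} \leftarrow \dfrac{\sum_{r=1}^{R} \sum_{i=1}^{I_r} \delta^r_i(jm)\, (\mathbf{o}^r_i - \boldsymbol{\mu}_{jm})(\mathbf{o}^r_i - \boldsymbol{\mu}_{jm})'}{\sum_{r=1}^{R} \sum_{i=1}^{I_r} \delta^r_i(jm)}$;
    $\Sigma_{jj'm} \leftarrow \dfrac{\sum_{r=1}^{R} \sum_{i=1}^{I_r} \delta^r_{ii'}(jj'm)\, (\mathbf{o}^r_{ii'} - \boldsymbol{\mu}_{jj'm})(\mathbf{o}^r_{ii'} - \boldsymbol{\mu}_{jj'm})'}{\sum_{r=1}^{R} \sum_{i=1}^{I_r} \delta^r_{ii'}(jj'm)}$;
    $w_{jm} \leftarrow \dfrac{\sum_{r=1}^{R} \sum_{i=1}^{I_r} \delta^r_i(jm)}{\sum_{r=1}^{R} \sum_{i=1}^{I_r} \sum_{m=1}^{M_s} \delta^r_i(jm)}$;
    $w_{jj'm} \leftarrow \dfrac{\sum_{r=1}^{R} \sum_{i=1}^{I_r} \delta^r_{ii'}(jj'm)}{\sum_{r=1}^{R} \sum_{i=1}^{I_r} \sum_{m=1}^{M_s} \delta^r_{ii'}(jj'm)}$;
end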
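The "simple averages" used for the single-site parameters in Algorithm 3 can be sketched as below. This is a minimal illustration: it assumes that the hard assignment of each observation to a label and a mixture component is already given (in the paper this comes from the RL alignment followed by K-Means), it estimates only per-dimension variances rather than the full covariance matrix, and the function name and data layout are hypothetical.

```python
from collections import defaultdict


def estimate_single_site(assignments):
    """assignments: iterable of (label j, mixture m, feature vector) triples,
    pooled over all training observations. Returns the mean, per-dimension
    variance, and mixture weight for each (j, m) pair."""
    groups = defaultdict(list)
    for j, m, o in assignments:
        groups[(j, m)].append(o)

    # Number of observations per label, used to normalize mixture weights.
    label_counts = defaultdict(int)
    for (j, _), observations in groups.items():
        label_counts[j] += len(observations)

    params = {}
    for (j, m), observations in groups.items():
        n, d = len(observations), len(observations[0])
        mean = [sum(o[k] for o in observations) / n for k in range(d)]
        var = [sum((o[k] - mean[k]) ** 2 for o in observations) / n for k in range(d)]
        params[(j, m)] = {"mean": mean, "var": var, "weight": n / label_counts[j]}
    return params


# Hypothetical training data: label 0 has two mixture components.
data = [
    (0, 0, [1.0, 2.0]),
    (0, 0, [3.0, 4.0]),
    (0, 1, [10.0, 10.0]),
]
params = estimate_single_site(data)
print(params[(0, 0)])
print(params[(0, 1)])
```

Note that a component with a single observation, such as component 1 here, gets zero variance, which illustrates why components with very few associated observations cause trouble for the Gaussian density.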

Finally, we use the generalized EM algorithm to refine all MRF parameters according to the ML criterion [26]. Given a set of training observations, the EM algorithm can iteratively and automatically adjust the parameters $\boldsymbol{\mu}$ and $\Sigma$ in the $m$th mixture component of the MRF. After several iterations, the RL terminates and associates each observation $\mathbf{o}_i$ with the $j$th label and $m$th mixture by the labeling strength, $f_i(jm) \in [0, 1]$, which represents the contribution of $\mathbf{o}_i$ to computing the ML parameter values for label $j$ and mixture $m$. In other words, rather than assigning a label to one specific site, we assign the label to each site in proportion to the labeling strength $f^r_i(jm)$. The EM algorithm is almost the same as Algorithm 3, except that we use $f_i(jm)$ as the updating weight. To get accurate character models, we need a large number of training observations. When the number of training observations is small, certain mixture components will have very few associated training observations, so the variances or the corresponding mixture weight will be very small. Hence, we delete those mixture components provided that at least one component in that label is left. Due to limited training observations, the covariance matrix may be singular and therefore non-invertible. In this case, the EM algorithm updates only the mean vectors and leaves the covariance matrix unchanged.

4 EXPERIMENTAL RESULTS

We evaluated the MRF-based HCCR system on the KAIST [2] Hanja1 and Hanja2 databases. The Hanja1 database has 783 classes with 200 samples for each class. The Hanja2 database has 1,309 samples from real documents and is used only for testing. The image quality of Hanja1 is good, whereas that of Hanja2 is poor. Fig. 10 shows some typical samples in Hanja1 and Hanja2.

Fig. 11 shows the MRF-guided stroke extraction and the structural matching results. The first column shows the input character images. The second column shows the slant and moment normalization of the character skeleton. The third column shows the MRF-based character models, where the labels are numbered. In the second column, the RL algorithm assigns the best labels to the extracted candidate strokes.

To validate different MRF-based character models, we tested their classification performance on the Hanja1 and Hanja2 databases. In our experiment, we selected 783 categories of characters from the Hanja1 database as the recognition vocabulary. For each category, we used 10 even-numbered samples for testing and the remaining 190 samples for training. To evaluate the MRF-based character model for cursive Chinese characters, we also used the test samples from the Hanja2 database. Tables 1 and 2 compare different neighborhood systems and numbers of mixture components in the MRF-based character model. The global neighborhood system performs better than the connected neighborhood system. Although more mixture components lead to better results, some mixtures would be associated with few observations for estimation due to limited training samples. Furthermore, more mixture components do not enhance the performance very much in the global neighborhood system.

We used a Matlab implementation on a PC with a 2 GHz CPU and 1 GB of memory. The average preprocessing time is 0.003 seconds, and the stroke extraction (0.031 seconds) and the RL algorithm (0.016 seconds) consume a total of 0.047 seconds per character image in the connected neighborhood system. The global neighborhood system costs 0.078 seconds in the RL algorithm per character image. Although the structural match with one character model is efficient, requiring less than a second in our implementation, in practice we have to repeat the structural match with all categories of character models, such as the 783 categories in the Hanja1 database, to recognize one input character image. When the number of categories increases, the total time cost to recognize one character image increases. Currently, there are two commonly adopted strategies to expedite the recognition process. The first simultaneously uses several computers to perform the structural match with all the character models in parallel. The second is the hierarchical classification system that uses a fast algorithm to select a few candidate character models and then performs the structural match between the input strokes and these models to determine the best one.
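As a rough worked estimate of the sequential cost implied by these timings, suppose that the stroke extraction and the RL are both repeated for each of the 783 character models and that the stroke extraction time is the same under both neighborhood systems. The second point is an assumption, since only the RL time is reported for the global system. The one-off preprocessing time of 0.003 seconds is negligible and is omitted.

```latex
\begin{align*}
t_{\text{connected}} &\approx 783 \times (0.031 + 0.016)\,\text{s} = 783 \times 0.047\,\text{s} \approx 36.8\,\text{s},\\
t_{\text{global}} &\approx 783 \times (0.031 + 0.078)\,\text{s} = 783 \times 0.109\,\text{s} \approx 85.3\,\text{s}.
\end{align*}
```

Under these assumptions, recognizing a single character image takes on the order of tens of seconds on one machine, which is why parallel matching or a hierarchical preselection of candidate models is attractive.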

We compared our method with the SCSM [2] and the attributed relational graph [11]. The SCSM used the first 100 odd-numbered samples of each category for training and the first 10 even-numbered samples for testing on the Hanja1 database. By handling degraded regions, the baseline recognition rate was 98.45 percent [2]. The Hanja2 database was used only for testing, with a recognition rate of 83.14 percent.

Fig. 10. Samples in Hanja1 and Hanja2 databases.

Based on the SCSM, a binary classifier was proposed to differentiate similar characters, which improved the overall recognition rate from 98.45 percent to 99.46 percent on the Hanja1 database. The attributed relational graph used the first 80 odd-numbered samples in Hanja1 for training and the first 20 even-numbered samples for testing. To compare with the baseline recognizer, we did not specially differentiate similar characters as was done in [28]. Table 3 shows the comparison with the SCSM and the attributed relational graph on the Hanja1 and Hanja2 databases. It compares the best result of the MRF-based character models (global neighborhood system with three mixture components) with other recognizers. The recognition rate of the MRF-based HCCR system on Hanja1 was 0.48 percent and 1.04 percent higher than those reported in [2] and [11], respectively, though we used more training samples. When the recognition rate is over 95 percent, even a one percent improvement of the recognition rate is significant. In addition, the recognition rate on the Hanja2 database was 1.81 percent higher than that reported in [2]. Since we did not specially design and optimize clique potentials to differentiate similar characters as was done in [28], the overall recognition rate is 0.53 percent lower than that reported in [28].

Fig. 12 shows some misclassified samples, which have two causes: 1) character image degradation and 2) similar characters. The first may be alleviated by proper preprocessing and pseudostrokes [2]. The second can be solved by optimizing the weights of the clique potentials to emphasize the subtle difference between similar characters [28].

Fig. 11. The MRF-guided stroke extraction and structural matching results.

TABLE 1. Comparison of MRF-Based Character Models on Hanja1

TABLE 2. Comparison of MRF-Based Character Models on Hanja2

TABLE 3. Recognition Rate Comparison on KAIST Database

5 CONCLUSIONS

The psychological theory of RBC [1] proposes a structural representation of objects for recognition and supports the recent research on structural methods for HCCR problems [3]. The pattern theory of perception [24], [25] suggests the graphical model as a general pattern representation method because it has an excellent expressive power to encode the feature statistics, as well as the geometric relations, through the underlying graph structures. Thus, pattern recognition proceeds by a graph matching process that searches for the best structural match between the observations and the graphical models. Based on both theories, this paper has improved the traditional structural method using statistically well-founded MRFs. This new strategy can represent Chinese character structures in terms of three issues:

1. The neighborhood system that accounts for the most important stroke relationships.

2. The clique potentials that measure the similarity between the input strokes and the stroke models.

3. The structural matching algorithm that searches for the best labeling configuration.

Correspondingly, we have proposed 1) the global and the connected neighborhood systems, 2) the clique potentials derived from the GMM and prior knowledge, and 3) the RL algorithm.

Indeed, the SCSM [2] and the attributed relational graph [11] can be considered within the MRF framework. For example, in [2], 1) the neighborhood system was defined by the KL divergence; 2) the Gaussian joint distribution of strokes and models was used as the matching distance; and 3) the heuristic search algorithm found the optimal match. In [11], 1) the neighborhood system was manually defined by categorizing strokes into several relational types; 2) the matching distance between the input strokes and the models was designed heuristically; and 3) the matching algorithm was a heuristic search strategy. Compared with the above methods, the MRF-based SCSM is theoretically well founded and provides us with many choices to design the neighborhood system and clique potentials. Another major difference is that we design the pair-site likelihood clique potential for binary features to encode stroke relationships statistically from training samples. Furthermore, the RL algorithm is a fast parallel minimizer of the energy function for the structural match, which may be more robust than the heuristic search strategies in [2], [11].

Based on 2D Gabor filters, the stroke directional feature has a cyclic representation. After normalization, the direction, position, and length of the stroke can be put into a vector with dimensionality $d = 11$.

We have built the MRF-based HCCR system that can reliably extract candidate strokes and systematically estimate parameters from training samples. Recent psychological results [34] support the view that the higher-level object representation can feed back to guide the image segmentation process. Hence, we propose the MRF-guided stroke extraction method to extract all possible candidate strokes. Some other HCCR systems [2], [11] have also adopted model-based stroke extraction strategies, which exhaustively search for all possible substroke concatenations. Here, we improve on them with the greedy search based on the single-site likelihood clique potential. Further, the RL algorithm uses the pair-site clique potentials to determine the best structural match. In this sense, we incorporate the stroke extraction process into optimizing the MRF-based character model. The recognition rate on the Hanja1 database was 0.48 percent and 1.04 percent higher than those reported in [2] and [11], respectively, and the recognition rate on the Hanja2 database was 1.81 percent higher than that reported in [2]. The performance of the MRF-based HCCR system demonstrates the effectiveness of MRFs for statistical-structural character modeling and recognition.

REFERENCES

[1] I. Biederman, “Recognition-by-Components: A Theory of Human Image Understanding,” Psychological Rev., vol. 94, no. 2, pp. 115-147, Apr. 1987.

[2] I.-J. Kim and J.-H. Kim, “Statistical Character Structure Modeling and Its Application to Handwritten Chinese Character Recognition,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 25, no. 11, pp. 1422-1436, Nov. 2003.

[3] C.-L. Liu, S. Jaeger, and M. Nakagawa, “Online Recognition of Chinese Characters: The State-of-the-Art,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 26, no. 2, pp. 198-213, Feb. 2004.

[4] N. Kato, M. Suzuki, S. Omachi, H. Aso, and Y. Nemoto, “A Handwritten Character Recognition System Using Directional Element Feature and Asymmetric Mahalanobis Distance,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 21, no. 3, pp. 258-262, Mar. 1999.

[5] C.-L. Liu and M. Nakagawa, “Evaluation of Prototype Learning Algorithms for Nearest-Neighbor Classifier in Application to Handwritten Character Recognition,” Pattern Recognition, vol. 34, no. 3, pp. 601-615, 2001.

[6] Y.Y. Tang, L.-T. Tu, J. Liu, S.-W. Lee, and W.-W. Lin, “Off-Line Recognition of Chinese Handwriting by Multifeature and Multilevel Classification,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 20, no. 5, pp. 556-561, May 1998.

[7] P.-K. Wong and C. Chan, “Off-Line Handwritten Chinese Character Recognition as a Compound Bayes Decision Problem,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 20, no. 9, pp. 1016-1023, Sept. 1998.

[8] D. Shi, S.R. Gunn, and R.I. Damper, “Handwritten Chinese Radical Recognition Using Nonlinear Active Shape Models,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 25, no. 2, pp. 277-280, Feb. 2003.

[9] X. Wang, X. Ding, and C. Liu, “Gabor Filters-Based Feature Extraction for Character Recognition,” Pattern Recognition, vol. 38, no. 3, pp. 369-379, 2005.

[10] X. Huang, J. Gu, and Y. Wu, “A Constrained Approach to Multifont Chinese Character Recognition,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 15, no. 8, pp. 838-843, Aug. 1993.

Fig. 12. The misclassified samples.

[11] C.-L. Liu, I.-J. Kim, and J.H. Kim, “Model-Based Stroke Extraction and Matching for Handwritten Chinese Character Recognition,” Pattern Recognition, vol. 34, no. 12, pp. 2339-2352, 2001.

[12] H.Y. Kim and J.H. Kim, “Hierarchical Random Graph Representation of Handwritten Characters and Its Application to Hangul Recognition,” Pattern Recognition, vol. 34, no. 2, pp. 187-201, 2001.

[13] R.G. Cowell, A.P. Dawid, S.L. Lauritzen, and D.J. Spiegelhalter, Probabilistic Networks and Expert Systems. Springer, 1999.

[14] Markov Random Fields: Theory and Application, R. Chellappa and A. Jain, eds. Academic Press, 1993.

[15] S. Geman and D. Geman, “Stochastic Relaxation, Gibbs Distribution and the Bayesian Restoration of Images,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 6, no. 6, pp. 721-741, 1984.

[16] S.Z. Li, Markov Random Field Modeling in Image Analysis. Springer, 2001.

[17] J.M. Hammersley and P. Clifford, Markov Field on Finite Graphs and Lattices, unpublished, 1971.

[18] Q. Wang, Z. Chi, D. Feng, and R. Zhao, “Hidden Markov Random Field Based Approach for Offline Handwritten Chinese Character Recognition,” Proc. Int’l Conf. Pattern Recognition, vol. 2, pp. 347-350, 2000.

[19] Y. Xiong, Q. Huo, and C. Chan, “A Discrete Contextual Stochastic Model for the Offline Recognition of Handwritten Chinese Characters,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 23, no. 7, pp. 774-782, July 2001.

[20] K.-W. Kang and J.H. Kim, “Utilization of Hierarchical, Stochastic Relationship Modeling for Hangul Character Recognition,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 26, no. 9, pp. 1185-1196, Sept. 2004.

[21] S.C. Zhu, Y.N. Wu, and D. Mumford, “Minimax Entropy Principle and Its Application to Texture Modeling,” Neural Computation, vol. 9, no. 8, pp. 1627-1660, 1997.

[22] S.Z. Li, H. Wang, and K.L. Chan, “Minimization of MRF Energy with Relaxation Labeling,” J. Math. Imaging and Vision, vol. 7, no. 2, pp. 149-161, 1997.

[23] M. Weber, M. Welling, and P. Perona, “Unsupervised Learning of Models for Recognition,” Proc. European Conf. Computer Vision, pp. 18-32, 2000.

[24] Pattern Theory: From Representation to Inference, U. Grenander and M.I. Miller, eds. Oxford Univ. Press, 2007.

[25] D. Mumford, “Pattern Theory: The Mathematics of Perception,” Proc. Int’l Congress of Math., pp. 401-422, 2002.

[26] R.O. Duda, P.E. Hart, and D.G. Stork, Pattern Classification, second ed. John Wiley & Sons, 2001.

[27] J. Zeng and Z.-Q. Liu, “Markov Random Fields for Handwritten Chinese Character Recognition,” Proc. Int’l Conf. Document Analysis and Recognition, pp. 101-105, 2005.

[28] I.-J. Kim and J.-H. Kim, “Pair-Wise Discrimination Based on a Stroke Importance Measure,” Pattern Recognition, vol. 35, no. 10, pp. 2259-2266, 2002.

[29] C.-L. Liu, K. Nakashima, H. Sako, and H. Fujisawa, “Handwritten Digit Recognition: Investigation of Normalization and Feature Extraction Techniques,” Pattern Recognition, vol. 37, no. 2, pp. 265-279, 2004.

[30] H.-H. Chang and H. Yan, “Analysis of Stroke Structures of Handwritten Chinese Characters,” IEEE Trans. Systems, Man, and Cybernetics B, vol. 29, no. 1, pp. 47-61, Feb. 1999.

[31] X.C. He and N.H.C. Yung, “Curvature Scale Space Corner Detector with Adaptive Threshold and Dynamic Region of Support,” Proc. Int’l Conf. Pattern Recognition, vol. 2, pp. 791-794, 2004.

[32] Y.-M. Su and J.-F. Wang, “A Novel Stroke Extraction Method for Chinese Character Using Gabor Filters,” Pattern Recognition, vol. 36, no. 3, pp. 635-647, 2003.

[33] J. Zeng and Z.-Q. Liu, “Stroke Segmentation of Chinese Characters Using Markov Random Fields,” Proc. Int’l Conf. Pattern Recognition, pp. 868-871, 2006.

[34] S.P. Vecera and M.J. Farah, “Is Visual Image Segmentation a Bottom-Up or an Interactive Process,” Perception and Psychophysics, vol. 59, no. 8, pp. 1280-1296, 1997.

Jia Zeng received the BEng degree in electrical engineering from Wuhan University of Technology, P.R. China, in 2002 and the PhD degree from the School of Creative Media, City University of Hong Kong, P.R. China, in 2006. In 2003, he was a research assistant in the Center for Media Technology, School of Creative Media, City University of Hong Kong. He is currently a research fellow in the Department of Electronic Engineering, City University of Hong Kong. His research interests are graphical models, pattern recognition, and bioinformatics. He was awarded first place and second place at the 2005 and 2006 IEEE Region 10 Postgraduate Student Paper Competition, respectively. He is a member of the IEEE.

Zhi-Qiang Liu received the MASc degree in aerospace engineering from the Institute for Aerospace Studies, University of Toronto and the PhD degree in electrical engineering from the University of Alberta, Canada. He is currently with the School of Creative Media, City University of Hong Kong. He has taught computer architecture, computer networks, artificial intelligence, programming languages, machine learning, pattern recognition, computer graphics, and art and technology. His research interests include neural-fuzzy systems, machine learning, media computing, and computer vision.

780 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 30, NO. 5, MAY 2008