Transcript mapping for handwritten Chinese documents by integrating character recognition model and geometric context

Pattern Recognition 46 (2013) 2807–2818

Fei Yin*, Qiu-Feng Wang, Cheng-Lin Liu
National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, 95 Zhongguan East Road, Beijing 100190, PR China

* Corresponding author. Tel.: +86 10 8261 4560.

Article history: Received 20 October 2012; received in revised form 12 February 2013; accepted 18 March 2013; available online 28 March 2013.

Keywords: Transcript mapping; Minimum classification error (MCE); Geometric context; Dynamic time warping (DTW); Confidence transformation (CT)

Abstract

Creating document image datasets with ground-truths of regions, text lines and characters is a prerequisite for document analysis research. However, ground-truthing large datasets is not only laborious and time consuming but also prone to errors due to the difficulty of character segmentation and the large variability of character shape, size and position. This paper describes an effective recognition-based annotation approach for ground-truthing handwritten Chinese documents. Under the Bayesian framework, the alignment of text line images with text transcript, which is the crucial step of annotation, is formulated as an optimization problem by incorporating geometric context of characters and character recognition model. We evaluated the alignment performance on a Chinese handwriting database CASIA-HWDB, which contains nearly four million character samples of 7356 classes and 5091 pages of unconstrained handwritten texts. The experimental results demonstrate the superiority of recognition-based text line alignment and the benefit of integrating geometric context. On a test set of 1015 handwritten pages (10,449 text lines), the proposed approach achieved character level alignment accuracy 92.32% when involving under-segmentation errors and 99.04% when excluding under-segmentation errors. The tool based on the proposed approach has been practically used for labeling handwritten Chinese documents.

© 2013 Elsevier Ltd. All rights reserved.

1. Introduction

Handwriting recognition research has been pursued for more than 40 years and many effective methods have been proposed [1–6]. For all the research works, standard datasets of characters and document images play crucial roles in system design and evaluation. Particularly, with the increasing use of statistical learning-based algorithms for handwritten document recognition and retrieval, large datasets of unconstrained handwritten documents become more demanding. However, to the best of our knowledge, no offline handwritten Chinese document datasets ground-truthed to text lines and characters have existed except the one released recently by the authors [7]. Therefore, ground-truthing document images, i.e., annotating the regions, text lines, words and characters, becomes a prerequisite for handwritten Chinese document analysis research. On the one hand, the design of segmentation and recognition algorithms needs a large number of labeled data for training. On the other hand, the performance of segmentation and recognition should again be evaluated and refined on labeled data. By ground-truthing, the text regions, text lines and words/characters need to be segmented and tagged very accurately. Unfortunately, such ground-truth was typically created manually and therefore its creation is tedious, expensive and prone to human errors. To a large extent, automatic annotation tools can alleviate the above difficulties.

Aligning text line images with their text transcripts is the crucial step of handwriting annotation. However, even with the transcript available (especially for character level alignment), the alignment is not a trivial problem. First of all, the characters cannot be reliably segmented because the size and position of characters are variable and the strokes are often touching and overlapping in handwritten documents. For unconstrained Chinese handwritten documents, there is no extra gap between words than characters (Fig. 1 shows an example of Chinese document image and its character level transcript mapping). Automatic segmentation may result in errors of over-segmentation (one character segmented into two or more segments) and under-segmentation (two or more characters merged into one segment), thus the total number of segmented character images usually is not equal to the total number of characters in the transcript. Therefore, a simple linear alignment does not work.

In recent years, many efforts have been devoted to this difficult problem, and some ground-truthing tools have been developed for automatic annotation of document images generated with less restriction [8–30]. For printed document images generated by text editing, printing and scanning, the text description (transcript) can be matched with the scanned image to generate ground-truth data automatically [8–13]. Handwritten document images can be similarly matched with the transcript (existing a priori or edited a posteriori), but the matching process is much more complicated due to the irregularity of document layout and written word/character shapes [14]. Usually, for handwritten documents, a dynamic search algorithm optimizing a match score between the text line image and its transcript is used to align the sequence of image segments and the words/characters. These methods can be roughly categorized into two groups depending on whether word/character recognition models are used or not. In the first group without recognition models, outline features are extracted from text line image for transcript alignment [15–23]. Recognition models can better measure the similarity between image segment and word/character and thus can improve the alignment accuracy. The methods in [24–26] use character prototype-based word models, the ones in [27–30] use HMM-based word recognizers. Taking into account the salient effects of character recognizers in word/string recognition, the recognizer largely benefits the alignment of text lines.

The alignment accuracy is still not sufficient despite the promise of recognition models. In Chinese documents, the mixed alphanumeric characters and punctuation marks are prone to segmentation and labeling errors because they have distinct geometric features of size, aspect ratio and position in text line. The misalignment of Chinese characters is mainly due to character touching and the gaps within characters composed of multiple radicals. Fig. 2 shows typical annotation errors caused by a punctuation mark and a radical of Chinese character. Geometric context features would be helpful to reduce such errors.

Geometric context has been used in handwriting recognition to reduce character segmentation and recognition errors, by using various geometric features (such as character size, inter-character and between-character gaps) and statistical models (geometric class means, Gaussian density models, discriminative classifiers, etc.) for handwritten English word and Japanese character string recognition [31–35]. These methods cannot sufficiently describe the geometric features of handwritten Chinese characters. For example, the peripheral features in [32] are inadequate to grasp the complex structures and shapes of Chinese characters; in [33–35], some stroke based features were designed for online handwritten characters, and these features are not readily applicable to offline characters.

This paper describes a practical annotation system for ground-truthing handwritten Chinese documents. Unlike most previous works, we integrate geometric context to improve the performance of text alignment in Chinese handwriting annotation. We use four statistical models to evaluate the single-character geometric features and between-character relationships. Under the Bayesian framework, the geometric models are combined with the character recognizer to evaluate the text-to-handwriting matching score, and the best match searched by dynamic time warping (DTW) gives the alignment result (character segmentation and labeling). The combining parameters are optimized by string-level training and simplex search. In our system, we also provide proper pre-processing (text line segmentation and character over-segmentation) and post-processing modules for guaranteeing high accuracy of text line and character segmentation and labeling. After annotation, a document image is represented as a sequence of text lines, each line as an ordered sequence of connected components, which are partitioned into characters. Such data can be used for design and evaluation of text line segmentation, character segmentation and recognition algorithms, etc.

This paper is an extension to our conference papers [36,37]. The extension is in several respects: Bayesian formulation of alignment problem, confidence transformation (CT), more details of parameters optimization and extended experimental results and discussions. The rest of this paper is organized as follows. Section 2 reviews the related works; Section 3 describes the outline of our annotation system; Section 4 formulates the text line alignment problem and describes the technical details; Section 5 describes the geometric models; Section 6 presents the experimental results. Concluding remarks are offered in Section 7.

Fig. 1. An example of transcript mapping. (a) Text line image and its transcript. (b) Character level alignment.

Fig. 2. Annotation errors caused by punctuation mark (red box) and radical of Chinese character (green box). (For interpretation of the references to color in this figure caption, the reader is referred to the web version of this article.)

2. Related works

There have been many efforts in developing methods and systems for minimizing the human labors in document images ground-truthing. The published methods can be roughly categorized into three classes: geometric transformation-based, word matching-based and recognition-based methods.

Geometric transformation-based methods are mainly designed for printed document alignment. The underlying idea is to estimate a global geometric transformation and then perform a robust local bitmap match between the text description image (generated using a word-processing tool) and document image (printed, photocopied, or scanned from text description). For such matching, Kanungo and Haralick [8,9], Zi and Doermann [10] used four corner points of the images as corresponding pairs to find a linear transformation. These methods are not robust when an image part is missing or there are extra feature points. Hobby [11] improved this method by considering all feature points and using a direct search optimization algorithm [38] to search for the affine transform parameters by minimizing the matching cost between the text description and the character bounding boxes in the image. This method, however, suffers from heavy computational burden and local minimum. Kim and Kanungo [12,13] improved the method by grouping the feature points to reduce the complexity and employing a modified attributed branch-and-bound matching algorithm to guarantee global minimum. Though these methods do not consider the deformation of layout and character shapes, Viard-Gaudin et al. [14] tried this methodology to create ground truth for handwritten documents. They designed a database of online and offline handwritten data by manually locating corresponding points in online and offline domain, but the matching process is much more complex.

Annotating handwritten document images is usually addressed by formulating the matching between text line image and its transcription as an alignment of two ordered sequences. Depending on whether word/character recognition models are used or not, these methods also can be grouped into word matching-based methods and recognition-based methods.

In word matching-based methods, Zinger et al. [15] noticed that the relative lengths of ASCII and handwritten words are highly correlated. They proposed to find the word level alignment by sequentially adjusting the word image boundaries from right to left with the cost function based on relative length. Stamatopoulos et al. [16] adopted a greedy search algorithm which recursively separates the text line according to the between-component gaps until the word number equals that of the transcription. In the approach of Kornfield et al. [17,18], the document was automatically segmented into a list of word images, which were matched with text using DTW based on global geometric properties (word-box features) extracted from both the handwritten word image and the transcription with a special font. Except using different matching features and replacing transcript with synthesized handwritten image based on writer's handwriting model in [21,22], the DTW was similarly employed in [19–22]. The hidden Markov model (HMM) is another effective tool to solve the alignment problem. It was used in the work of Rothfeder et al. [23], where the word images were treated as the hidden variables, and the HMM models the probability of observing the word image given the word text. The Viterbi algorithm was used to decode the sequence of word images.

Among the recognition-based methods, the one of Tomai and Zhang et al. [24,25] formulates word alignment as an optimization problem involving multiple word segmentation hypotheses and word recognition, and uses dynamic programming to find word alignment in several coarse-to-fine stages. The word recognizer provides lexicon words as well as the character boundaries aligned with a word. The extended work of Huang and Srihari [26] used a similar approach. Zimmermann and Bunke [27] proposed an automatic word segmentation method with HMM-based word recognizer for cursive handwritten text lines alignment, in which the word HMM was concatenated by character models, and the word models were concatenated to align with the transcription of text line. It worked very well on the IAM off-line datasets. A similar method was employed in [28]. Liwicki et al. [29] proposed to improve the performance by combining the alignment results given by multiple different HMM-based word recognizers. Recently, Toselli et al. [30] also combined the human transcriber and a HMM based handwritten text recognition system to generate the transcription of text images.

Our proposed approach uses a character recognizer for handwritten Chinese text lines alignment. Character-level alignment by DTW is suitable for Chinese documents because of the large category set (over 5000 characters are frequently used) and the rich shape information of single characters. We also utilize geometric context information to improve the alignment accuracy. HMM is an effective model for text line alignment and has been widely used to annotate the western text document databases. However, for handwritten Chinese text recognition, HMM has not shown superior performance (e.g., only 39% character correct rate was reported in [40]). The proposed DTW-based algorithm is simpler than HMM in implementation. On the other hand, the state sequence decoding process in HMM is similar to DTW, both follow the principle of dynamic programming.

Fig. 3. Block diagram of the annotation system.

Fig. 4. An example of input document image and its corresponding transcription. (a) Input document image. (b) Transcription for the document image.

3. System overview

The block diagram of our Chinese handwritten document annotation system is shown in Fig. 3. The input to the system is a handwritten document image and its corresponding transcript (examples given in Fig. 4), the output is the ground-truth data (annotated image). First, the number of text lines and the number of characters in every line are detected from the transcript of the document. The document image is segmented into text lines guided by the given number, and then each text line image is aligned with its transcript for character segmentation and labeling. In our practical tool, line segmentation and alignment are automatic and the remaining errors can be corrected by human intervention. In this paper, however, we focus on the automatic alignment and the manual corrections are not involved in the performance evaluation.

Separation of text lines in handwritten documents is not a trivial task because the text lines are often un-uniformly skewed and curved, and the inter-line space is not prominent. We have designed a text line segmentation algorithm based on minimal spanning tree (MST) clustering with distance metric learning [39], in which the connected components (CCs) are grouped by MST into a tree structure under a learned metric, and then the edges of the tree are dynamically cut by using a new hypervolume reduction objective to get the final text lines. This algorithm performs very well, but is not perfect. For example, sometimes a line is split into several sub-lines or several lines are merged into one. Therefore, we designed a series of post-processing operations to correct mis-segmented text lines manually.
The design of human corrections is trivial and thus is not detailed in this paper.

After separating the document image into the same number of text lines as the text transcript, each text line image is to be segmented into characters and aligned with its corresponding transcript. Since the characters cannot be reliably segmented before recognition, we solve it using the over-segmentation strategy, which can take advantage of the character shape, overlapping and touching characteristics to better separate the characters at their boundaries. The result of over-segmentation is a sequence of primitive segments, each corresponding to a character or a part of a character, such that candidate characters can be generated by concatenating consecutive segments. We use the over-segmentation technique of [41], where connected components (CCs) that are wide enough or have a large width to height ratio are examined to split at single-touched valley, stroke crossing or long ligature strokes. An example of over-segmentation is shown in Fig. 5.

Fig. 5. A text line image and its primitive segments after over-segmentation.

After over-segmentation, the ordered sequence of primitive segments of each text line is dynamically partitioned into character patterns by matching with the character recognition model. The optimal match can be found by dynamic programming (DP) search (also called dynamic time warping (DTW) or Viterbi algorithm in special contexts) to minimize an edit distance, and the alignment result largely depends on the edit costs defined for segment-to-character match. The character recognizer measures the shape similarity (or distance) between a candidate pattern (composed of one or more consecutive primitive segments) and a character, which is desired to assign high score to correct character classes and low scores to incorrect classes. On a non-character pattern, the classification score is desired to be low for all character classes [42]. We use geometric models to measure the similarity of character outline and between-character compatibility. The best alignment, corresponding to a path in a grid space, is searched for by DTW. Fig. 6 shows an example, where a string of seven characters is aligned with 10 primitive segments, which are correctly segmented and labeled by the path of the red line. After alignment, mis-segmentation and mis-labeling of characters (such as the path of blue line in Fig. 6, and such errors are inevitable) can also be corrected manually in post-processing [36].

Fig. 6. An example of text line alignment. (For interpretation of the references to color in this figure caption, the reader is referred to the web version of this article.)

4. Text line alignment

Text line alignment is the crucial step of document annotation. In this section, we first give a statistical formulation of this problem and a DTW-based solution, then describe the techniques for the involved confidence transformation (CT) and parameters optimization.

4.1. Problem formulation

We formulate the problem of text line alignment from Bayesian decision view so as to integrate geometric context and character recognition model in a principled way. Under the 0/1 loss, the optimal criterion for alignment is to maximize the posterior probability of the character segmentation given the text line image ($X$) and its transcript containing $n$ characters ($T = t_1 \cdots t_n$). This posterior probability can be formulated by

$$P(s \mid X, T) = \frac{P(s, T \mid X)}{P(T)} = \frac{P(s \mid X)\, P(T \mid X, s)}{P(T)} = \frac{P(s \mid X)\, P(T \mid X^s)}{P(T)} = P(s \mid X)\, \frac{P(X^s \mid T)}{P(X^s)}, \qquad (1)$$

where $X^s$ denotes the sequence of candidate characters corresponding to segmentation $s$. Hence, the optimal segmentation is decided by

$$s^* = \arg\max_{|s| = |T|} P(s \mid X)\, \frac{P(X^s \mid T)}{P(X^s)}, \qquad (2)$$

where the segmentation is also constrained to have the same length as the transcript ($|s| = |T|$).

In formulation (2), $P(s \mid X)$ denotes the posterior probability of the $s$-th segmentation path given the text line image $X$. It can be decomposed into

$$P(s \mid X) = \prod_{i=1}^{n} p(z_i^p = 1 \mid g_i^{ui})\, p(z_i^g = 1 \mid g_i^{bi}), \qquad (3)$$

where $n$ is the character number of transcript, $z_i^p = 1$ indicates that the $i$-th candidate pattern is a valid character, and $z_i^g = 1$ indicates that the gap between the $(i-1)$-th and $i$-th candidate patterns is a valid between-character gap; $g_i^{ui}$ and $g_i^{bi}$ denote the class-independent geometric features extracted from the $i$-th candidate pattern, and from the pair of the $(i-1)$-th and $i$-th candidate patterns, respectively. The two probabilistic terms in (3) correspond to the unary and binary class-independent geometric models (see Section 5.2).

The likelihood function $P(X^s \mid T)$ can be decomposed into the product of character-dependent terms since the segmentation $X^s$ is a sequence of candidate character patterns

$$P(X^s \mid T) = \prod_{i=1}^{n} p(x_i^s \mid t_i)\, p(g_i^{uc} \mid t_i)\, p(g_i^{bc} \mid t_{i-1} t_i), \qquad (4)$$

where $x_i^s$, $g_i^{uc}$ and $g_i^{bc}$ denote the shape features used in character recognition, unary and binary geometric features (see Section 5.1), respectively. Similarly, we can decompose $p(X^s)$ as

$$p(X^s) = \prod_{i=1}^{n} p(x_i^s)\, p(g_i^{uc})\, p(g_i^{bc}). \qquad (5)$$

Combining (4) and (5), $P(X^s \mid T)/P(X^s)$ is obtained by

$$\frac{P(X^s \mid T)}{P(X^s)} = \prod_{i=1}^{n} \frac{p(x_i^s \mid t_i)}{p(x_i^s)}\, \frac{p(g_i^{uc} \mid t_i)}{p(g_i^{uc})}\, \frac{p(g_i^{bc} \mid t_{i-1} t_i)}{p(g_i^{bc})}. \qquad (6)$$

Since the probabilities $p(x_i^s)$, $p(g_i^{uc})$ and $p(g_i^{bc})$ are not trivial to estimate, we convert (6) to

$$\frac{P(X^s \mid T)}{P(X^s)} = \prod_{i=1}^{n} \frac{p(t_i \mid x_i^s)}{p_1(t_i)}\, \frac{p(t_i \mid g_i^{uc})}{p_2(t_i)}\, \frac{p(t_{i-1} t_i \mid g_i^{bc})}{p_3(t_{i-1} t_i)}, \qquad (7)$$

where the three posterior probabilities can be approximated by confidence transformation of classifier outputs (see Section 4.3), and the three corresponding prior probabilities $p_1(t_i)$, $p_2(t_i)$ and $p_3(t_{i-1} t_i)$ are viewed as constants in classifier design, denoted by $p_1$, $p_2$, $p_3$, respectively ($p_1$ and $p_2$ are different due to character class clustering in geometric model, see Section 5.3).

Combining (3) and (7), the optimal segmentation of (2) is given by

$$s^* = \arg\max_{|s| = |T| = n} \frac{1}{P^n} \prod_{i=1}^{n} p(t_i \mid x_i^s)\, p(t_i \mid g_i^{uc})\, p(t_{i-1} t_i \mid g_i^{bc})\, p(z_i^p = 1 \mid g_i^{ui})\, p(z_i^g = 1 \mid g_i^{bi}), \qquad (8)$$

where $P = p_1 p_2 p_3$. Though the formulation (8) approximates the posterior probability of optimal segmentation fairly well, it is still insufficient because the geometric models and character recognition model do not always meet the assumptions. To consider the effects of different models and achieve a better performance, we take the logarithm of probability and incorporate the weights of different models to get a generalized likelihood function $f(X^s, T)$ for segmentation path evaluation

$$f(X^s, T) = \lambda_0 \sum_{i=1}^{n} \log p(t_i \mid x_i^s) + \lambda_1 \sum_{i=1}^{n} \log p(t_i \mid g_i^{uc}) + \lambda_2 \sum_{i=1}^{n} \log p(t_{i-1} t_i \mid g_i^{bc}) + \lambda_3 \sum_{i=1}^{n} \log p(z_i^p = 1 \mid g_i^{ui}) + \lambda_4 \sum_{i=1}^{n} \log p(z_i^g = 1 \mid g_i^{bi}), \qquad (9)$$

where the term $\sum_{i=1}^{n} \log(1/P)$ has been omitted because it is a constant for all segmentation paths given $T$. Hence, the optimal segmentation can be defined as

$$s^* = \arg\max_{|s| = |T| = n} f(X^s, T). \qquad (10)$$

4.2. String alignment with dynamic time warping

After over-segmentation, a text line image is represented as a sequence of primitive segments ordered from left to right

$$X = \{x_1, x_2, \ldots, x_m\}, \qquad (11)$$

where $x_j$ is the $j$-th segment and $m$ is the number of primitive segments in this line. A primitive segment is a character or a part of a character and the concatenation of adjacent segments forms candidate character patterns. The transcript $T$ is a character string

$$T = \{t_1, t_2, \ldots, t_n\}, \qquad (12)$$

where $t_j$ is the $j$-th character and $n$ is the number of characters in the transcript ($n \le m$). An example of text line image and its text transcript is shown in Fig. 7, where seven characters are to be aligned with 10 segments.

Fig. 7. An example of text line image and its text transcript.

A possible mapping $X^s$ between a text line image $X$ and its corresponding transcript $T$ is defined as

$$X^s = \{(t_1, x_1^s), \ldots, (t_i, x_i^s), \ldots, (t_n, x_n^s)\} = \{(t_1, x_1 \cdots x_{k_1}), \ldots, (t_i, x_{j-k_i+1} \cdots x_j), \ldots\}, \qquad (13)$$

where $x_{j-k_i+1} \cdots x_j$ is an ordered sequence of primitive segments, which are concatenated into a character pattern $x_i^s$ to match with character $t_i$. $k_i$ is the number of primitive segments that form the candidate character pattern for $t_i$.

The definition of alignment in (13) corresponds to a path from bottom left to top right in the grid of Fig. 6. The alignment cost in (9) can be rewritten as

$$f(X^s, T) = \sum_{h=0}^{4} \lambda_h \cdot F_h, \qquad (14)$$

where $\lambda_h$ ($h = 0, \ldots, 4$) are the weight coefficients, $F_0$ is the character recognition score, and $F_1$, $F_2$, $F_3$ and $F_4$ are four geometric model scores, respectively. Each term is the sum of scores over the segmentation path

$$F_0 = \sum_{i=1}^{n} \log p(t_i \mid x_i^s) = \sum_{i=1}^{n} f_0(t_i \mid x_{j-k_i+1} \cdots x_j), \qquad (15)$$

$$F_1 = \sum_{i=1}^{n} \log p(t_i \mid g_i^{uc}) = \sum_{i=1}^{n} f_1(t_i \mid g(x_{j-k_i+1} \cdots x_j)), \qquad (16)$$

$$F_2 = \sum_{i=2}^{n} \log p(t_{i-1} t_i \mid g_i^{bc}) = \sum_{i=2}^{n} f_2(t_{i-1} t_i \mid g(x_p \cdots x_{j-k_i},\, x_{j-k_i+1} \cdots x_j)), \qquad (17)$$

$$F_3 = \sum_{i=1}^{n} \log p(z_i^p = 1 \mid g_i^{ui}) = \sum_{i=1}^{n} f_3(z_i^p = 1 \mid g(x_{j-k_i+1} \cdots x_j)), \qquad (18)$$

$$F_4 = \sum_{i=2}^{n} \log p(z_i^g = 1 \mid g_i^{bi}) = \sum_{i=2}^{n} f_4(z_i^g = 1 \mid g(x_{j-k_i},\, x_{j-k_i+1})), \qquad (19)$$

where $f_0$, $f_1$, $f_2$, $f_3$, $f_4$ are the logarithms of confidence of character recognition, class-dependent single character and between-character geometry, class-independent single character and between-character geometry, respectively. Note that the class-independent between-character geometry ($f_4$) considers two adjacent primitive segments only, while the class-dependent between-character geometry ($f_2$) considers two adjacent candidate characters.

As the alignment cost is the summation of multiple stages, the optimization problem (10) can be solved by searching for the minimum negative cost path ($-f(X^s, T)$) with DTW. To do this, we define $D(i, j)$ as the accumulated cost of optimal alignment between a partial string $\{t_1 \cdots t_i\}$ and partial image $\{x_1 \cdots x_j\}$. $D(i, j)$ can be updated from the preceding partial alignments by

$$D(i, j) = \min_{k_i} \begin{cases} D(i, j-k_i) + \mathrm{penalty}(x_{j-k_i+1} \cdots x_j) + \varphi_p, \\ D(i-1, j) + \mathrm{penalty}(t_i), \\ D(i-1, j-k_i) + \varphi, \end{cases} \qquad (20)$$

where $\mathrm{penalty}(x_{j-k_i+1} \cdots x_j)$ is the cost of deleting primitive segments $x_{j-k_i+1} \cdots x_j$, and $\mathrm{penalty}(t_i)$ is the cost of skipping character $t_i$. Moreover, the terms $\varphi_p$ and $\varphi$ are defined as

$$\varphi_p = -\lambda_3 \cdot f_3(z_i^p \neq 1 \mid g(x_{j-k_i+1} \cdots x_j)) - \lambda_4 \cdot f_4(z_i^g \neq 1 \mid g(x_{j-k_i},\, x_{j-k_i+1})), \qquad (21)$$

$$\varphi = -\lambda_0 \cdot f_0(t_i \mid x_{j-k_i+1} \cdots x_j) - \lambda_1 \cdot f_1(t_i \mid g(x_{j-k_i+1} \cdots x_j)) - \lambda_2 \cdot f_2(t_i^{pre} t_i \mid g(x_p \cdots x_{j-k_i},\, x_{j-k_i+1} \cdots x_j)) - \lambda_3 \cdot f_3(z_i^p = 1 \mid g(x_{j-k_i+1} \cdots x_j)) - \lambda_4 \cdot f_4(z_i^g = 1 \mid g(x_{j-k_i},\, x_{j-k_i+1})), \qquad (22)$$

where $t_i^{pre}$ represents the preceding non-skipped character of $t_i$. That is, $t_i^{pre} = t_l$, $l = i-1$ if $t_{i-1}$ is not skipped, otherwise recursively set $l = l-1$ until $t_l$ is not skipped.

The DTW search starts with $D(0, 0) = 0$, then for $i = 1, \ldots, n$ and $j = 1, \ldots, m$, $D(i, j)$ are iteratively updated according to (20) and the optimal number $k_i$ (number of primitive segments concatenated) is stored for $(i, j)$. Finally, $D(n, m)$ gives the total cost of optimal alignment. According to our over-segmentation technique, we allow a candidate pattern to be formed by at most four primitive segments (namely, $1 \le k_i \le 4$).

After we get the optimal alignment, the partition of primitive segments can be retrieved by backtracking $k_i$ from $(n, m)$ to the start. This partition gives the segmentation of text line image into characters.

4.3. Confidence transformation

For probabilistic fusion of classifier outputs, the transformed confidence measures are desired to approximate the class posterior probability $p(\omega \mid x)$ ($\omega$ refers to a class and $x$ is the feature vector). Though the posterior probability can be directly obtained by the Bayes formula given a priori probability and the conditional probability density of each class, the probability density functions are not trivial to estimate.
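The DTW recursion of Section 4.2 can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several simplifying assumptions: the match cost is supplied by a hypothetical callback `match_cost(i, start, end)` that stands in for the weighted sum of model scores and ignores the dependence on the preceding character and pattern; the skip and delete penalties are plain constants; and segments are deleted one at a time, so no geometric term is added on the deletion branch. The function name `dtw_align` and all parameter names are illustrative.

```python
from math import inf


def dtw_align(n, m, match_cost, skip_penalty=10.0, delete_penalty=10.0, max_k=4):
    """Align n transcript characters with m primitive segments.

    match_cost(i, start, end) is an assumed callback returning the cost of
    matching character i (1-based) with segments start..end (1-based, inclusive).
    Returns (total_cost, ops), where ops lists the operations in order:
    ("match", i, start, end), ("skip", i) or ("delete", j).
    """
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            candidates = []
            if j >= 1:
                # Delete primitive segment j (character index unchanged).
                candidates.append((D[i][j - 1] + delete_penalty, ("delete", i, j - 1)))
            if i >= 1:
                # Skip character i (segment index unchanged).
                candidates.append((D[i - 1][j] + skip_penalty, ("skip", i - 1, j)))
                # Match character i with the last k segments, 1 <= k <= max_k.
                for k in range(1, min(max_k, j) + 1):
                    cost = D[i - 1][j - k] + match_cost(i, j - k + 1, j)
                    candidates.append((cost, ("match", i - 1, j - k)))
            D[i][j], back[i][j] = min(candidates, key=lambda c: c[0])

    # Backtrack from (n, m) to (0, 0) to recover the partition.
    ops = []
    i, j = n, m
    while (i, j) != (0, 0):
        op, pi, pj = back[i][j]
        if op == "match":
            ops.append(("match", i, pj + 1, j))
        elif op == "skip":
            ops.append(("skip", i))
        else:
            ops.append(("delete", j))
        i, j = pi, pj
    ops.reverse()
    return D[n][m], ops


# Toy example: two characters, three segments; all costs are made up.
toy_costs = {(1, 1, 2): 0.1, (2, 3, 3): 0.2}
total, ops = dtw_align(2, 3, lambda i, a, b: toy_costs.get((i, a, b), 5.0))
```

In the toy example the cheapest path matches the first character with segments 1 and 2 and the second character with segment 3. A fuller version would pass the preceding non-skipped character into the cost callback so that the between-character terms can be evaluated.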
Instead, there are many ways to approximate the posterior probability from classifier outputs, such as sigmoidal function, soft-max function and Dempster–Shafer theory of evidence [43]. The formulation (9) shows that the posterior probabilities indicate whether a candidate character or geometric feature belongs to a specific class or not. Therefore, the sigmoidal function (23), which is often taken for binary posterior probability, is a good choice for our problem

$$p(\omega_j \mid x) = \frac{\exp[-\alpha d_j(x) + \beta]}{1 + \exp[-\alpha d_j(x) + \beta]}, \quad j = 1, \ldots, M, \qquad (23)$$

where $M$ is the number of defined classes considered by the classifier, $d_j(x)$ is the dissimilarity score of class $\omega_j$, and $\alpha$ and $\beta$ are the confidence parameters.

We optimize the confidence parameters by minimizing the cross entropy (CE) loss function (24), which is commonly used in logistic regression and neural network training. On a validation dataset (preferably different from the dataset for training the classifier) of $N$ samples, the CE is

$$\min\; CE = -\sum_{n=1}^{N} \sum_{j=1}^{M} \left[ t_j^n \log P_j + (1 - t_j^n) \log(1 - P_j) \right], \qquad (24)$$

where $P_j = P(\omega_j \mid x^n)$, $t_j^n = \delta(c^n, j) \in \{0, 1\}$ and $c^n \in \{1, 2, \ldots, M\}$ is the class label of the $n$-th sample. The CE is minimized by stochastic gradient descent to update the confidence parameters.

4.4. Parameters optimization

The objective of training is to tune the combining weights of (9) and the penalty parameters of (20) to optimize the alignment performance. However, there exist two types of trained parameters in our alignment model. The differentiable combining weights ($\lambda_0, \ldots, \lambda_4$) can be optimized by stochastic gradient descent; the penalty parameters ($\mathrm{penalty}(t_i)$, $\mathrm{penalty}(x_{j-k_i+1} \cdots x_j)$) are not differentiable and hence do not allow optimization by gradient search. Fortunately, we can optimize them with the direct search method which has been used in [44] to search for optimal parameters for page segmentation algorithm. Therefore, we design a two-stage optimization strategy.
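The confidence transformation of Section 4.3 can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' code: the function names, the learning rate, the number of epochs and the toy dissimilarity scores are all made up, and the update is plain per-term stochastic gradient descent on the cross entropy.

```python
import math


def sigmoid_confidence(d, alpha, beta):
    """Sigmoidal confidence for a dissimilarity score d."""
    z = -alpha * d + beta
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # numerically stable branch for negative z
    return e / (1.0 + e)


def cross_entropy(samples, alpha, beta, eps=1e-12):
    """Cross entropy over samples given as (scores, label) pairs,
    where scores[j] is the dissimilarity of class j."""
    total = 0.0
    for scores, label in samples:
        for j, d in enumerate(scores):
            p = min(max(sigmoid_confidence(d, alpha, beta), eps), 1.0 - eps)
            t = 1.0 if j == label else 0.0
            total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total


def fit_confidence(samples, alpha=1.0, beta=0.0, lr=0.01, epochs=200):
    """Fit (alpha, beta) by stochastic gradient descent on the cross entropy."""
    for _ in range(epochs):
        for scores, label in samples:
            for j, d in enumerate(scores):
                p = sigmoid_confidence(d, alpha, beta)
                t = 1.0 if j == label else 0.0
                grad_z = p - t  # derivative of the CE term w.r.t. z = -alpha*d + beta
                alpha -= lr * grad_z * (-d)
                beta -= lr * grad_z
    return alpha, beta


# Toy validation set: three classes, made-up dissimilarity scores.
toy_samples = [
    ([0.5, 3.0, 4.0], 0),
    ([3.5, 0.4, 2.8], 1),
    ([4.2, 3.1, 0.6], 2),
]
ce_before = cross_entropy(toy_samples, 1.0, 0.0)
alpha_hat, beta_hat = fit_confidence(toy_samples)
ce_after = cross_entropy(toy_samples, alpha_hat, beta_hat)
```

Because each class is treated as a separate binary decision, the fitted confidences of different classes need not sum to one, which matches the binary reading of the posterior probabilities described above.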
In the first stage, we disable the deleting and skipping operations by fixing the penalty parameters to a maximum, and train the combining weights on a training set which contains neither deleted primitive segments nor skipped characters. In the second stage, the penalty parameters are optimized by fixing the combining weights on another training set containing all types of text line samples.

4.4.1. String-level MCE training for weight parameters

The aim of this training stage is to tune the combining weights so as to promote correct alignment and depress incorrect alignment. String-level training with the minimum classification error (MCE) criterion has been widely used in speech recognition and handwriting recognition and has reported superior performance [45,46].

In string-level training, the weights are estimated on a dataset of string samples $\{(XT^n, S_c^n) \mid n = 1, \ldots, N\}$, where $S_c^n$ denotes the correct alignment (i.e., correct segmentation) of the sample $XT^n = (X^n, T^n)$, by optimizing an objective function related to text line alignment performance. Denote the cost between a sample $XT^n$ and its alignment $S^n$ as $g(XT^n, S^n, \Lambda)$, where $\Lambda = \{\lambda_0, \ldots, \lambda_4\}$ is the set of weights. In string-level MCE training described in [47], the misclassification measure for correct alignment $S_c^n$ is approximated by

$$d(XT^n, \Lambda) = -g(XT^n, S_c^n, \Lambda) + g(XT^n, S_r^n, \Lambda), \qquad (25)$$

where $g(XT^n, S_r^n, \Lambda)$ is the cost of the closest rival (the minimum cost alignment excluding the correct one)

$$g(XT^n, S_r^n, \Lambda) = \max_{k \neq c} g(XT^n, S_k^n, \Lambda). \qquad (26)$$

The misclassification measure is transformed to loss by the sigmoidal function

$$l(XT^n, \Lambda) = \frac{1}{1 + e^{-\xi d(XT^n, \Lambda)}}, \qquad (27)$$

where $\xi$ is a parameter to control the hardness of sigmoidal nonlinearity. On the training sample set, the empirical loss is regularized (with coefficient $\gamma$) to overcome the ill-posedness

$$L(\Lambda) = \frac{1}{N} \sum_{n=1}^{N} l(XT^n, \Lambda) + \frac{\gamma}{2} \|\Lambda\|^2. \qquad (28)$$

By stochastic gradient descent, the parameters are updated on each training sample by

$$\Lambda(t+1) = \Lambda(t) - \varepsilon(t)\, \nabla l(XT^t, \Lambda)\big|_{\Lambda = \Lambda(t)}, \qquad (29)$$

where $\varepsilon(t)$ is the learning step.
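One step of the string-level MCE training above can be sketched as follows. This is a hedged illustration rather than the authors' code: it assumes each alignment is summarised by a list of per-character score vectors (one log-confidence per model), that the closest rival has already been selected, and that the regularizer contributes a simple weight-decay factor. The names `path_score` and `mce_step`, the step size and the toy numbers are all hypothetical.

```python
import math


def path_score(weights, feats):
    """Weighted sum over models h of the per-character scores f_h."""
    return sum(w * sum(f[h] for f in feats) for h, w in enumerate(weights))


def mce_step(weights, correct, rival, xi=1.0, lr=0.1, reg=0.0):
    """One stochastic gradient step on the sigmoidal MCE loss.

    correct, rival: lists of per-character score vectors [f_0, ..., f_4]
    for the correct alignment and for its closest rival.
    Returns (loss_before_update, updated_weights).
    """
    # Misclassification measure: rival score minus correct score.
    d = -path_score(weights, correct) + path_score(weights, rival)
    loss = 1.0 / (1.0 + math.exp(-xi * d))
    updated = []
    for h, w in enumerate(weights):
        # Gradient of d w.r.t. weight h: summed rival minus summed correct scores.
        diff = sum(f[h] for f in rival) - sum(f[h] for f in correct)
        updated.append((1.0 - lr * reg) * w - lr * xi * loss * (1.0 - loss) * diff)
    return loss, updated


# Toy sample with two characters and made-up log-confidences.
toy_correct = [[-1.0, -0.2, -0.3, -0.1, -0.2], [-1.2, -0.1, -0.2, -0.2, -0.1]]
toy_rival = [[-0.5, -0.9, -0.8, -0.7, -0.6], [-0.6, -1.0, -0.9, -0.8, -0.9]]
w0 = [1.0] * 5
loss_before, w1 = mce_step(w0, toy_correct, toy_rival)
loss_after, _ = mce_step(w1, toy_correct, toy_rival)
```

In this toy sample the recognizer score (index 0) favours the rival while the geometric scores favour the correct alignment, so the step lowers the first weight and raises the others.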
Accordingly, the updating formulaof each weight parameter can be derived ast1h 1thtl1l Li 1f hti; Strf hti; Stc; 30where L is the string length of sample XTt (the number ofcharacters in the text transcript), h 0;;4, and ti; St denotesthe alignment of character ti.In implementation, the weights are initialized as equal valuesand then iteratively updated on each string sample based on (30).And by retaining several alternative alignments in dynamicprogramming, we obtain the rival alignment which is mostconfusable with the correct one.4.4.2. Simplex search for penalty parametersThough the penalty parameters have important impact on thealignment performance, they are typically manually selected andno training method is explicitly sought [18,19,22,26]. Fortunately,this optimization problem is a multivariate nonsmooth nonlinearfunction optimization problem as those in [44] and can be solvedusing the simplex method. We use a public domain implementa-tion of downhill simplex method proposed by Nelder and Mead[38], and use the standard choice for reection (1), contraction(0.5), expansion (2), shrinkage (0.5) and stopping threshold (106)coefcients. The downhill simplex method is a local optimizationalgorithm. Thus, for each (different) starting point, the optimiza-tion algorithm could converge to a different optimal solution.To overcome the local optimum, we constrain the parametervalues to lie within a reasonable range and randomly choose somestarting locations within this range. The solution corresponding tothe lowest optimal value is chosen as the best optimal set ofparameters.5. 
5. Geometric context modeling

Since the distinct outline features of alphanumeric characters and punctuation marks can be exploited to improve the alignment accuracy, we design four statistical models for the class-dependent and class-independent single-character geometry (unary geometric context), and the class-dependent and class-independent between-character relationships (binary geometric context), respectively. In the following, we first describe the geometric features, and then describe the statistical models scoring these features.

5.1. Class-dependent geometric features

For modeling single-character geometry, we first estimate the average height of the text line image as in [41], because some geometric features must be normalized w.r.t. this height (i.e., divided by it) so as to be invariant to text line height. We extract 42 geometric features from a candidate character pattern, which are grouped into three categories: (1) 10 scalar features related to the bounding box of the character, listed as No. 1–10 in Table 1. (2) Four scalar features related to the vertical center of the text line, listed as No. 11–14 in Table 1. (3) 28 profile-based features inspired by the methods of [48,49], listed as No. 15–42 in Table 1.

We also extract 24 features for binary class-dependent geometric context, which are grouped into two categories: (1) 16 scalar features between the bounding boxes of two consecutive character patterns, listed as No. 1–16 in Table 2. (2) Eight features between the profiles of two consecutive character patterns, listed as No. 17–24 in Table 2.

Table 1
Class-dependent single-character geometric features (the last column denotes whether the feature is normalized w.r.t. the text line height or not).
No. | Feature | Norm
1–2 | Height and width of bounding box | Y
3 | Sum of inner gaps | Y
4–5 | Distances of horizontal/vertical gravity center to left/upper bound | Y
6 | Logarithm of aspect ratio | N
7 | Square root of bounding box area | Y
8 | Diagonal length of bounding box | Y
9–10 | Distances of horizontal/vertical gravity center to horizontal/vertical geometric center | Y
11–12 | Distances of vertical gravity/geometric center to text line vertical gravity/geometric center | Y
13–14 | Distances of upper/lower bound to text line vertical geometric center | Y
15–16 | Means of horizontal/vertical projection profiles | Y
17–22 | Normalized amplitude deviations, coefficients of skewness and kurtosis of the horizontal and vertical projection profiles | N
23–30 | Means and deviations of the upper, lower, left and right outline profiles | Y
31–42 | Normalized amplitude deviations, coefficients of skewness and kurtosis of the upper, lower, left and right outline profiles | N

Table 2
Class-dependent between-character geometric features.
No. | Feature | Norm
1–6 | Distances between the upper bounds, lower bounds, upper-lower bounds, lower-upper bounds, left bounds and right bounds | Y
7–8 | Distances between the horizontal gravity centers and the vertical gravity centers | Y
9–10 | Distances between the horizontal geometric centers and the vertical geometric centers | Y
11–12 | Height and width of the box enclosing two consecutive characters | Y
13 | Gap between the bounding boxes | Y
14 | Ratio of heights of the bounding boxes | N
15 | Ratio of widths of the bounding boxes | N
16 | Square root of the common area of the bounding boxes | Y
17–20 | Differences between the means of upper, lower, left and right outline profiles | Y
21–24 | Differences between the deviations of upper, lower, left and right outline profiles | Y

5.2. Class-independent geometric features

To measure whether a candidate pattern is a valid character or not, we extract 12 class-independent geometric features. They are the same features as No. 1–10 and No. 15–16 in Table 1. The class-independent geometry between two consecutive candidate patterns describes whether an over-segmentation gap is a valid between-character gap or not. For this we extract 14 class-independent features from two adjacent primitive segments; 13 of them are the same as No. 1–13 in Table 2, and the last is the convex hull-based distance as calculated in [50].

5.3. Statistical models

For modeling the class-dependent geometry, we should first reduce the number of character geometry classes, since the number of Chinese characters is very large and many different characters have similar outline geometry. In particular, the binary class-dependent geometric model considers pairs of characters, and it is formidable to store M×M models (M is the number of character classes) and to get enough training samples for so many models. Hence, we cluster the character classes into six super-classes (based on our prior knowledge of character shapes) using the EM algorithm. After clustering, each single character is assigned to one of six super-classes and a pair of successive characters thus belongs to one of 36 binary super-classes.

For estimating the statistical geometric models, training character samples are relabeled to six super-classes. We use a quadratic discriminant function (QDF) for both the unary and binary class-dependent geometric models. For unary class-dependent geometry, the 42-dimensional feature vector is reduced to a 5-dimensional subspace by Fisher linear discriminant analysis (FLDA), and the projected samples are used to estimate the parameters of the six-class QDF. For binary class-dependent geometry, the 36-class QDF is estimated using samples of 24-dimensional geometric features.

The unary class-independent geometry indicates whether a candidate pattern is a valid character or not. For this two-class problem, we use a linear support vector machine (SVM) trained with character and non-character samples. The class-independent binary geometry indicates whether a segmentation point between two adjacent primitive segments is a between-character gap or not. We similarly use a linear SVM for this two-class problem, trained with two-class labeled samples.

The four geometric models in our system are summarized in Table 3.

Table 3
Summary of geometric models in our system.
Model | Dimension | Classifier
Unary class-dependent | 42→5 | QDF
Binary class-dependent | 24 | QDF
Unary class-independent | 12 | SVM
Binary class-independent | 14 | SVM

6. Experimental results

We evaluated the performance of our approach on a large database of unconstrained Chinese handwriting, CASIA-HWDB [7], and on a small dataset, HIT-MW [51]. Because we focused on the performance of the automatic alignment algorithm, the following experiments did not involve any manual correction steps.

6.1. Database and experimental setting

The CASIA-HWDB database contains both isolated characters and unconstrained handwritten texts, and is divided into a training set of 816 writers and a test set of 204 writers. The training set contains 3,118,477 isolated character samples of 7356 classes (7185 Chinese characters, 109 frequently used symbols, 10 digits, and 52 English letters) and 4076 pages of handwritten texts. The text pages have a few mis-written characters and characters beyond the 7356 classes, which we call non-characters and outlier characters, respectively. We evaluated the text line alignment performance on 1015 handwritten pages of 204 test writers, which were segmented into 10,449 text lines containing 268,628 characters (including 723 non-characters and 368 outlier characters).

To validate the effectiveness of our alignment algorithm, we also tested on the dataset HIT-MW, from which a test set of 386 text lines contains 8448 characters (7405 Chinese characters, 780 symbols, 230 digits, 8 English letters, 16 non-characters and 9 outlier characters).

Note that there are under-segmentation errors after over-segmentation [41], and under-segmented characters obviously cannot be aligned correctly. To evaluate the alignment performance more fairly, we also selected the text lines without under-segmentation errors. There are 8462 such lines in CASIA-HWDB and 304 lines in HIT-MW.

To build the character classifier, we first converted the gray-scale character images to binary images, then extracted features using the continuous NCFE (normalization-cooperated feature extraction) method combined with the MCBA (modified centroid-boundary alignment) normalization method [52]. The resulting 512-dimensional feature vector is projected onto a 160-dimensional subspace learned by Fisher linear discriminant analysis (FLDA), and then the 160-dimensional vector is input into a classifier. The classifier parameters were learned on the training samples of isolated characters in CASIA-HWDB.

For estimating the parameters of the geometric models, we extracted geometric features from 41,781 text lines of training text pages.

For training the alignment model, we first filtered the training text lines by removing the strings with under-segmentation errors (cases where correct segmentation points are not included in the candidate search grid), then selected 5000 text lines which contain neither deleted primitive segments nor skipped characters to train the weight parameters. Finally, the penalty parameters were learned on 500 text lines (of which 250 contain deleted primitive segments, and the others were constructed by inserting skipped characters randomly) selected from the remaining samples.

Though the annotation system (Fig. 3) involves text line separation, we focus on the performance of text line alignment (character segmentation and labeling), whereas the performance of text line separation has been evaluated in [39].

Using a state-of-the-art over-segmentation algorithm [41] on the test pages of the CASIA-HWDB database, we observed that 4.46% of characters were not correctly separated (a character is correctly over-segmented when it is separated from other characters despite the within-character splits), i.e., they were under-segmented. These characters cannot be correctly segmented and aligned by combining consecutive primitive segments. This implies that the over-segmentation of characters is still a challenge. Some examples of under-segmentation errors are shown in Fig. 8.

Fig. 8. Some examples of over-segmentation.

Our system was implemented on a PC with an Intel Quad Core 2.83 GHz CPU and 4 GB RAM, and the algorithms were programmed in MS Visual C++ 2008.

6.2. Performance metrics

We evaluate the alignment performance of text lines using an accuracy metric based on bounding boxes of characters [24–27], which is called alignment rate (AR):

$$AR = N_c / N_t,$$

where $N_c$ is the number of correctly aligned characters and $N_t$ is the total number of characters in the transcript.

In our experiments, a match between a transcript character and primitive segments, $(t_i, x_{j-k_i+1} \cdots x_j)$, is judged as correct if the bounding box of the primitive segments and the bounding box of the true character image overlap sufficiently (the differences of the top, bottom, left and right bounds do not exceed 1.5 times the estimated stroke width of the test text line).

6.3. Text line alignment performance

We first evaluate the performance of confidence transformation and geometric models on the test text line data of CASIA-HWDB and HIT-MW; then discuss the effects of different geometric models; third, we compare our recognition-based method with word matching-based methods; and last, the alignment errors are analyzed in detail.

In addition, to evaluate the effects of different character classifiers combined with geometric models, we tested two types of classifiers: the MQDF (modified quadratic discriminant function) [53] classifier and the NPC (nearest prototype classifier), which have been widely used in handwritten character recognition as well as handwritten document retrieval [54]. The parameters of the MQDF classifier are estimated by maximum likelihood estimation of class means and covariance matrices, eigenvalue decomposition of covariance matrices, and replacing the minor eigenvalues with a constant. We use 10 principal eigenvectors per class for the MQDF classifier. The parameters of the NPC, the prototype vectors, are trained using a one-vs-all cross-entropy (CE) criterion. This training objective favors character retrieval as well as text line alignment. We use an NPC with one prototype per class.

6.3.1. Effects of geometric models and confidence transformation

Table 4 shows the text line alignment results of our proposed method on all test data (ATData) and the reduced test data without under-segmentation errors (RTData). From the results, we observe that: (1) The geometric models (GMs) can significantly improve the alignment performance with different character classifiers. (2) The confidence transformation (CT) can effectively benefit the combination of geometric models with the character classifier. (3) The two character classifiers, MQDF and NPC, perform comparably well in text line alignment, though the NPC has much lower complexity than the MQDF. (4) After removing the text lines with under-segmentation, the proposed method achieves very high alignment accuracy on RTData. The highest accuracies on the two databases CASIA-HWDB and HIT-MW are 99.04% and 98.01%, respectively.

Table 4
Results with/without GMs and CT on test datasets. Alignment accuracy (%).
Method | CASIA-HWDB ATData/RTData | HIT-MW ATData/RTData
MQDF (without CT) | 91.47/98.37 | 91.76/97.19
MQDF (with CT) | 91.46/98.36 | 91.77/97.13
MQDF+GM (without CT) | 92.13/99.02 | 92.37/97.99
MQDF+GM (with CT) | 92.32/99.04 | 92.73/98.01
NPC (without CT) | 91.34/98.23 | 91.85/97.31
NPC (with CT) | 91.27/98.17 | 91.82/97.33
NPC+GM (without CT) | 92.16/98.97 | 92.36/97.83
NPC+GM (with CT) | 92.31/99.03 | 92.66/98.00

6.3.2. Comparing geometric models

To investigate the effects of geometric contexts, we selected 2000 text lines from the test data of CASIA-HWDB. These selected text lines have no under-segmentation errors, but are difficult to align using the character recognizer only.

Table 5 shows the effects of the geometric models (f1–f4) on text line alignment when combined with different character classifiers (f0 = MQDF or f0 = NPC). We can see that the incorporation of geometric contexts improves the alignment accuracies remarkably when using any one geometric model or a combination of them. The binary geometric models (f2 and f4) perform better than the unary geometric models (f1 and f3). This justifies the importance of the between-character relationship. Comparing the class-dependent models (f1 and f2) and the class-independent models (f3 and f4), the results show that the class-independent geometric models perform better. The best alignment performance, obtained by combining the four geometric models, justifies the complementariness of class-dependent and class-independent geometric models. Moreover, the fact that the geometric models exhibit almost consistent results with different character classifiers also demonstrates the universality of our geometric models.

Table 5
Effects of geometric contexts (with CT). Alignment accuracy (%).
Flags for f0 f1 f2 f3 f4 (column positions not recoverable) | f0 = MQDF | f0 = NPC
1 | 94.05 | 93.77
1 1 | 94.86 | 94.88
1 1 | 95.96 | 95.96
1 1 | 95.86 | 95.61
1 1 | 96.12 | 96.06
1 1 1 | 96.14 | 96.03
1 1 1 | 96.10 | 96.05
1 1 1 | 96.50 | 96.49
1 1 1 | 96.44 | 96.29
1 1 1 1 1 | 96.79 | 96.71

6.3.3. Comparing with word matching-based methods

There had been few efforts devoted to Chinese handwritten text line alignment before our work; in particular, there had not been character recognition-based methods for this purpose. To justify the superiority of recognition-based alignment, we identified two word matching-based methods that are applicable to Chinese handwritten documents. Obviously, the geometric transformation-based method is not suitable because it was designed for printed documents and can only tolerate rigid deformations. Therefore, in this section, we implemented two word matching methods based on [15,16]. The method of [15] investigates the relationship of word lengths between image and text, and is analogous to the unary geometric models in our work. The method of [16] calculates the distances between connected components, which is similar to the binary geometric models in our work. We adapted the two methods to suit Chinese text line alignment based on character matching instead of word matching. For the method based on [15], we assume that the punctuation marks, English letters, digits and Chinese characters have widths of 1/10, 1/2, 1/2 and 1 standard character, respectively, and use DTW for exhaustive search to find the best alignment. For the method based on [16], we compute the distance between two overlapped components as in [50], and use greedy search to find the best alignment.

The alignment accuracies of these two methods and our proposed method (using the NPC character classifier) are shown in Table 6. We can see that the proposed character recognition-based method performs far better. From Table 4, we can see that the character recognition-based alignment without geometric models also outperforms the word matching-based methods remarkably. The superiority of character recognition-based alignment is attributed to the fact that the character recognizer models the variation of character shapes far more accurately than the simple character width feature, and the geometric models characterize the variations of character outline and between-character relationships more accurately than the simple between-component distance.

Table 6
Comparison of the proposed method and word matching-based methods. Alignment accuracy (%).
Method | CASIA-HWDB ATData/RTData | HIT-MW ATData/RTData
Proposed method | 92.31/99.03 | 92.66/98.00
Method based on [15] | 63.31/72.89 | 66.55/71.18
Method based on [16] | 43.19/46.12 | 46.07/48.41

6.3.4. Error analysis

Due to the imperfection of pre-segmentation (over-segmentation) and the imprecision of character recognition and geometric models, some errors of character segmentation and labeling may remain, and all these remaining errors should be corrected manually. The errors can be categorized into three types: miss error, alignment error, and insertion error. A miss error refers to the case that a character in the transcript has no corresponding image segments, due to missed writing or mis-merging the segment of the character with other characters by under-segmentation. Fig. 9 shows two examples of miss errors.

Fig. 9. Two types of miss errors. (a) A Chinese character is merged into the preceding character. (b) A period is merged into the preceding Chinese character.

An insertion error refers to the case that a primitive segment has no corresponding transcript character, i.e., it is aligned with a non-character (denoted as #). This implies that an extra image segment is inserted into the transcript text. Fig. 10 shows two examples of insertion errors. The case in Fig. 10(b) is actually a correct deletion because the deleted characters are mis-written and redundant.

Fig. 10. Two types of insertion errors. (a) A dot (part of a Chinese character) is deleted (aligned with non-character #). (b) Two mis-written characters are aligned with two non-characters #; this is a correct deletion though the two mis-written characters were labeled as one non-character in the ground-truth.

The dominant error, alignment error, includes mis-split of a character into multiple ones and mis-merge of multiple characters into one.
In our experiments, we observed three types of alignment errors: (1) Segmentation error between characters (Fig. 11(a)). (2) Heavy overlap of bounding boxes (Fig. 11(b)). (3) Touching strokes between characters that fail to be split in pre-segmentation (Fig. 11(c)). To improve alignment for such cases, we need to improve the over-segmentation method.

Fig. 11. Three types of alignment errors.

7. Conclusion

We proposed a recognition-based ground-truthing approach for annotating Chinese handwritten document images. In this approach, we model the single-character and between-character geometric contexts and combine them with a character recognizer to evaluate the score of candidate segmentations (alignments) of a text line image matched with the text transcript under the Bayesian framework. The optimal alignment is found by dynamic time warping (DTW). Our experimental results demonstrated the superiority of character recognition-based alignment and the benefits of geometric models. Although some alignment errors remain, the tool based on the proposed approach can be used for annotating practical handwritten documents, with remaining errors corrected by human operators. The large database CASIA-HWDB was annotated using the preliminary version of this tool,1 and the ground-truth data has been used for the design and evaluation of our works in text line segmentation and character string recognition. Further improvements can be expected using a better character recognizer and over-segmentation method.

1 Downloadable at …

Conflict of interest statement

The authors declare no conflicts of interest.

Acknowledgment

The authors would like to thank Tonghua Su for authorizing us to use the HIT-MW database. This work was supported by the National Natural Science Foundation of China (NSFC) under Grants 61175021 and 60933010.

References

[1] R. Plamondon, S.N. Srihari, On-line and off-line handwriting recognition: a comprehensive survey, IEEE Transactions on Pattern Analysis and Machine Intelligence 22 (1) (2000) 63–84.
[2] H. Fujisawa, Forty years of research in character and document recognition - an industrial perspective, Pattern Recognition 41 (8) (2008) 2435–2446.
[3] H. Bunke, K. Riesen, Recent advances in graph-based pattern recognition with applications in document analysis, Pattern Recognition 44 (5) (2011) 1057–1067.
[4] M.T. Parvez, S.A. Mahmoud, Arabic handwriting recognition using structural and syntactic pattern attributes, Pattern Recognition 46 (1) (2013) 141–154.
[5] R.M. Udrea, N. Vizireanu, Iterative generalization of morphological skeleton, Journal of Electronic Imaging 16 (1) (2007) 010501.
[6] C.-L. Liu, F. Yin, D.-H. Wang, Q.-F. Wang, Online and offline handwritten Chinese character recognition: benchmarking on new databases, Pattern Recognition 46 (1) (2013) 155–162.
[7] C.-L. Liu, F. Yin, D.-H. Wang, Q.-F. Wang, CASIA online and offline Chinese handwriting database, in: Proceedings of the 11th International Conference on Document Analysis and Recognition, 2011, pp. 37–41.
[8] T. Kanungo, R.M. Haralick, Automatic generation of character groundtruth for scanned documents: a closed-loop approach, in: Proceedings of the International Conference on Pattern Recognition, Vienna, Austria, vol. 3, 1996, pp. 669–675.
[9] T. Kanungo, R.M. Haralick, An automatic closed-loop methodology for generating character groundtruth for scanned documents, IEEE Transactions on Pattern Analysis and Machine Intelligence 21 (2) (1999) 179–183.
[10] G. Zi, D. Doermann, Document image ground truth generation from electronic text, in: Proceedings of the 17th International Conference on Pattern Recognition, vol. 4, 2004, pp. 663–666.
[11] J.D. Hobby, Matching document images with ground truth, International Journal on Document Analysis and Recognition 1 (1) (1998) 52–61.
[12] D.W. Kim, T. Kanungo, Attributed point matching for automatic groundtruth generation, International Journal on Document Analysis and Recognition 5 (1) (2002) 47–66.
[13] D.W. Kim, T. Kanungo, A point matching algorithm for automatic generation of groundtruth for document images, in: Proceedings of the 4th International Workshop on Document Analysis Systems, Rio de Janeiro, Brazil, 2000, pp. 475–485.
[14] C. Viard-Gaudin, P.M. Lallican, S. Knerr, P. Binter, The IRESTE on/off (IRONOFF) dual handwriting database, in: Proceedings of the 5th International Conference on Document Analysis and Recognition, Bangalore, India, 1999, pp. 455–458.
[15] S. Zinger, J. Nerbonne, L. Schomaker, Text-image alignment for historical handwritten documents, in: Proceedings of the SPIE, San Jose, CA, USA, vol. 7247, 2009, pp. 1–8.
[16] N. Stamatopoulos, G. Louloudis, B. Gatos, Efficient transcript mapping to ease the creation of document image segmentation ground truth with text-image alignment, in: Proceedings of the 12th International Conference on Frontiers in Handwriting Recognition, 2010, pp. 226–231.
[17] E.M. Kornfield, R. Manmatha, J. Allan, Text alignment with handwritten documents, in: Proceedings of the International Workshop on Document Image Analysis for Libraries, 2004, pp. 195–209.
[18] E.M. Kornfield, R. Manmatha, J. Allan, Further explorations in text alignment with handwritten documents, International Journal on Document Analysis and Recognition 10 (1) (2007) 39–52.
[19] T.M. Rath, R. Manmatha, Word image matching using dynamic time warping, in: Proceedings of the 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, 2003, pp. 521–527.
[20] L.M. Lorigo, V. Govindaraju, Transcript mapping for handwritten Arabic documents, in: Proceedings of the SPIE, San Jose, CA, USA, vol. 6500-65000W, 2007.
[21] C.V. Jawahar, A. Kumar, Content-level annotation of large collection of printed document images, in: Proceedings of the 9th International Conference on Document Analysis and Recognition, 2007, pp. 799–803.
[22] A. Kumar, A. Balasubramanian, A. Namboodiri, C.V. Jawahar, Model-based annotation of online handwritten datasets, in: Proceedings of the 10th International Workshop on Frontiers in Handwriting Recognition, 2006, pp. 9–14.
[23] J. Rothfeder, R. Manmatha, T.M. Rath, Aligning transcripts to automatically segmented handwritten manuscripts, Document Analysis Systems VII, 2006, pp. 84–95.
[24] C.I. Tomai, B. Zhang, V. Govindaraju, Transcript mapping for historic handwritten document images, in: Proceedings of the 8th International Workshop on Frontiers in Handwriting Recognition, 2002, pp. 6–8.
[25] B. Zhang, C. Tomai, S. Srihari, V. Govindaraju, Construction of handwriting databases using transcript-based mapping, in: Proceedings of the 1st International Workshop on Document Image Analysis for Libraries, 2004, pp. 288–298.
[26] C. Huang, S.N. Srihari, Mapping transcripts to handwritten text, in: Proceedings of the 10th International Workshop on Frontiers in Handwriting Recognition, 2006, pp. 15–20.
[27] M. Zimmermann, H. Bunke, Automatic segmentation of the IAM off-line database for handwritten English text, in: Proceedings of the 16th International Conference on Pattern Recognition, vol. 4, 2002, pp. 35–39.
[28] A. Toselli, V. Romero, E. Vidal, Viterbi based alignment between text images and their transcripts, in: Proceedings of the Workshop on Language Technology for Cultural Heritage Data, 2007, pp. 9–16.
[29] E. Indermuhle, M. Liwicki, H. Bunke, Combining alignment results for historical handwritten document analysis, in: Proceedings of the 10th International Conference on Document Analysis and Recognition, 2009, pp. 1186–1190.
[30] A.H. Toselli, V. Romero, M. Pastor, E. Vidal, Multimodal interactive transcription of text images, Pattern Recognition 43 (5) (2010) 1814–1825.
[31] H. Xue, V. Govindaraju, Incorporating contextual character geometry in word recognition, in: Proceedings of the 8th International Workshop on Frontiers in Handwriting Recognition, Ontario, Canada, 2002, pp. 123–127.
[32] M. Koga, T. Kagehiro, H. Sako, H. Fujisawa, Segmentation of Japanese handwritten characters using peripheral feature analysis, in: Proceedings of the 14th International Conference on Pattern Recognition, Brisbane, Australia, vol. 2, 1998, pp. 1137–1141.
[33] T. Fukushima, M. Nakagawa, On-line writing-box-free recognition of handwritten Japanese text considering character size variations, in: Proceedings of the 15th International Conference on Pattern Recognition, Barcelona, Spain, vol. 2, 2000, pp. 359–363.
[34] B. Zhu, M. Nakagawa, Online handwritten Japanese text recognition by improving segmentation quality, in: Proceedings of the 11th International Conference on Frontiers in Handwriting Recognition, Montreal, Canada, 2008, pp. 379–384.
[35] X.-D. Zhou, J.-L. Yu, C.-L. Liu, T. Nagasaki, K. Marukawa, Online handwritten Japanese character string recognition incorporating geometric context, in: Proceedings of the 9th International Conference on Document Analysis and Recognition, Curitiba, Brazil, vol. 1, 2007, pp. 48–52.
[36] F. Yin, Q.-F. Wang, C.-L. Liu, A tool for ground-truthing text lines and characters in off-line handwritten Chinese documents, in: Proceedings of the 10th International Conference on Document Analysis and Recognition, 2009, pp. 951–955.
[37] F. Yin, Q.-F. Wang, C.-L. Liu, Integrating geometric context for text alignment of handwritten Chinese documents, in: Proceedings of the 12th International Conference on Frontiers in Handwriting Recognition, 2010, pp. 7–12.
[38] J.A. Nelder, R. Mead, A simplex method for function minimization, The Computer Journal 7 (4) (1965) 308–313.
[39] F. Yin, C.-L. Liu, Handwritten Chinese text line segmentation by clustering with distance metric learning, Pattern Recognition 42 (12) (2009) 3146–3157.
[40] T.-H. Su, T.-W. Zhang, D.-J. Guan, H.-J. Huang, Off-line recognition of realistic Chinese handwriting using segmentation-free strategy, Pattern Recognition 42 (1) (2009) 167–182.
[41] C.-L. Liu, M. Koga, H. Fujisawa, Lexicon-driven segmentation and recognition of handwritten character strings for Japanese address reading, IEEE Transactions on Pattern Analysis and Machine Intelligence 24 (11) (2002) 1425–1437.
[42] C.-L. Liu, H. Sako, H. Fujisawa, Effects of classifier structures and training regimes on integrated segmentation and recognition of handwritten numeral strings, IEEE Transactions on Pattern Analysis and Machine Intelligence 26 (11) (2004) 1395–1407.
[43] Q.-F. Wang, F. Yin, C.-L. Liu, Improving handwritten Chinese text recognition by confidence transformation, in: Proceedings of the 11th International Conference on Document Analysis and Recognition, 2011, pp. 518–522.
[44] S. Mao, T. Kanungo, Empirical performance evaluation methodology and its application to page segmentation algorithm, IEEE Transactions on Pattern Analysis and Machine Intelligence 23 (3) (2001) 242–256.
[45] W. Chou, Discriminant-function-based minimum recognition error pattern recognition approach to speech recognition, Proceedings of the IEEE 88 (8) (2000) 1201–1223.
[46] C.-L. Liu, K. Marukawa, Handwritten numeral string recognition: character-level training vs. string-level training, in: Proceedings of the 17th International Conference on Pattern Recognition, Cambridge, UK, vol. 1, 2004, pp. 405–408.
[47] M. Cheriet, N. Kharma, C.-L. Liu, C.Y. Suen, Character Recognition Systems: A Guide for Students and Practitioners, John Wiley & Sons, NJ, 2007.
[48] V. Lavrenko, T.M. Rath, R. Manmatha, Holistic word recognition for handwritten historical documents, in: Proceedings of the International Workshop on Document Image Analysis for Libraries (DIAL'04), 2004, pp. 278–287.
[49] Y.Y. Chung, M.T. Wong, High accuracy handwritten character recognition system using contour sequence moments, in: Proceedings of the 4th International Conference on Signal Processing, Beijing, China, vol. 2, 1998, pp. 1249–1252.
[50] G. Louloudis, B. Gatos, I. Pratikakis, C. Halatsis, Text line and word segmentation of handwritten documents, Pattern Recognition 42 (12) (2009) 3169–3183.
[51] T.H. Su, T.-W. Zhang, D.-J. Guan, Corpus-based HIT-MW database for offline recognition of general-purpose Chinese handwritten text, International Journal on Document Analysis and Recognition 10 (1) (2007) 27–38.
[52] C.-L. Liu, K. Marukawa, Global shape normalization for handwritten Chinese character recognition: a new method, in: Proceedings of the 9th International Workshop on Frontiers in Handwriting Recognition, Tokyo, Japan, 2004, pp. 300–305.
[53] F. Kimura, K. Takashina, S. Tsuruoka, Y. Miyake, Modified quadratic discriminant functions and the application to Chinese character recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence 9 (1) (1987) 149–153.
[54] H. Zhang, D.-H. Wang, C.-L. Liu, Keyword spotting from online Chinese handwritten document using one-vs-all trained character classifier, in: Proceedings of the 12th International Conference on Frontiers in Handwriting Recognition, Kolkata, India, 2010, pp. 271–276.

Fei Yin is an Assistant Professor at the National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences, Beijing, China. He received the B.S. degree in Computer Science from Xidian University of Posts and Telecommunications, Xi'an, China, the M.E. degree in Pattern Recognition and Intelligent Systems from Huazhong University of Science and Technology, Wuhan, China, and the Ph.D. degree in Pattern Recognition and Intelligent Systems from the Institute of Automation, Chinese Academy of Sciences, Beijing, China, in 1999, 2002 and 2010, respectively. His research interests include document image analysis, handwritten character recognition and image processing. He has published over 20 papers in international journals and conferences.

Qiu-Feng Wang received the B.S. degree in Computer Science from Nanjing University of Science and Technology, Nanjing, China, and the Ph.D. degree in Pattern Recognition and Intelligent Systems from the Institute of Automation, Chinese Academy of Sciences, Beijing, China, in 2006 and 2012, respectively. He is an Assistant Professor at the Institute of Automation, Chinese Academy of Sciences, Beijing, China. His research interests include handwriting recognition, sequence pattern recognition, and language modeling.

Cheng-Lin Liu is a Professor at the National Laboratory of Pattern Recognition (NLPR), Institute of Automation of Chinese Academy of Sciences, Beijing, China, and is now the deputy director of the laboratory. He received the B.S. degree in electronic engineering from Wuhan University, Wuhan, China, the M.E. degree in electronic engineering from Beijing Polytechnic University, Beijing, China, and the Ph.D. degree in pattern recognition and intelligent control from the Chinese Academy of Sciences, Beijing, China, in 1989, 1992 and 1995, respectively. He was a Postdoctoral Fellow at Korea Advanced Institute of Science and Technology (KAIST) and later at Tokyo University of Agriculture and Technology from March 1996 to March 1999. From 1999 to 2004, he was a Research Staff Member and later a Senior Researcher at the Central Research Laboratory, Hitachi, Ltd., Tokyo, Japan. His research interests include pattern recognition, image processing, neural networks, machine learning, and especially the applications to character recognition and document analysis. He has published over 140 technical papers in prestigious international journals and conferences.

