Transcript mapping for handwritten Chinese documents by integrating character recognition model and geometric context

Download Transcript mapping for handwritten Chinese documents by integrating character recognition model and geometric context

Post on 14-Dec-2016




0 download

Embed Size (px)


<ul><li><p>intr</p><p>of Sc</p><p>Keywords:Transcript mappingMinimum classication error (MCE)</p><p>Geometric context</p><p>e dt aingofionhe a</p><p>of annotation, is formulated as an optimization problem by incorporating geometric context of charactersand character recognition model. We evaluated the alignment performance on a Chinese handwritingdatabase CASIA-HWDB, which contains nearly four million character samples of 7356 classes and 5091</p><p>h has bthods hard datin systg use</p><p>rened on labeled data. By ground-truthing, the text regions, text</p><p>weendocu-matic(onender-</p><p>equal to the total number of characters in the transcript. Therefore, a</p><p>Contents lists available at SciVerse ScienceDirect</p><p>w.</p><p>Pattern Rec</p><p>Pattern Recognition 46 (2013) 28072818restriction [830]. For printed document images (C.-L. Liu).simple linear alignment does not work.In recent years, many efforts have been devoted to this difcult</p><p>problem, and some ground-truthing tools have been developed forautomatic annotation of document images generated with less</p><p>0031-3203/$ - see front matter &amp; 2013 Elsevier Ltd. All rights reserved.</p><p>n Corresponding author. Tel.: +86 10 8261 4560.E-mail addresses: (F. Yin), (Q.-F. Wang),of labeled data for training. On the other hand, the performance ofsegmentation and recognition should been again evaluated and</p><p>segmentation (two or more characters merged into one segment),thus the total number of segmented character images usually is notreleased recently by the authors [7]. Therefore, ground-truthingdocument images, i.e., annotating the regions, text lines, wordsand characters, become a prerequisite for handwritten Chinesedocument analysis research. On the one hand, the design ofsegmentation and recognition algorithms needs a large number</p><p>Chinese handwritten documents, there is no extra gap betwords than characters (Fig. 1 shows an example of Chinesement image and its character level transcript mapping). Autosegmentation may result in errors of over-segmentationcharacter segmented into two or more segments) and ubased algorithms for handwritten document recognition andretrieval, large datasets of unconstrained handwritten documentsbecome more demanding. However, to the best of our knowledge,no ofine handwritten Chinese document datasets ground-truthed to text lines and characters have existed except the one</p><p>the transcript available (especially for character level alignment),the alignment is not a trivial problem. First of all, the characterscannot be reliably segmented because the size and position ofcharacters are variable and the strokes are often touchingand overlapping in handwritten documents. For unconstrained1. Introduction</p><p>Handwriting recognition researcthan 40 years and many effective me6]. For all the research works, standdocument images play crucial rolestion. Particularly, with the increasinpages of unconstrained handwritten texts. The experimental results demonstrate the superiority ofrecognition-based text line alignment and the benet of integrating geometric context. On a test set of1015 handwritten pages (10,449 text lines), the proposed approach achieved character level alignmentaccuracy 92.32% when involving under-segmentation errors and 99.04% when excluding under-segmentation errors. The tool based on the proposed approach has been practically used for labelinghandwritten Chinese documents.</p><p>&amp; 2013 Elsevier Ltd. All rights reserved.</p><p>een pursued for moreave been proposed [1asets of characters andem design and evalua-of statistical learning-</p><p>lines and words/characters need to be segmented and tagged veryaccurately. Unfortunately, such ground-truth was typically createdmanually and therefore its creation is tedious, expensive andprone to human errors. To a large extent, automatic annotationtools can alleviate above difculties.</p><p>Aligning text line images with their text transcripts is thecrucial step of handwriting annotation. However, even withDynamic time warping (DTW)Condence transformation (CT)Transcript mapping for handwritten Chcharacter recognition model and geome</p><p>Fei Yin n, Qiu-Feng Wang, Cheng-Lin LiuNational Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy</p><p>a r t i c l e i n f o</p><p>Article history:Received 20 October 2012Received in revised form12 February 2013Accepted 18 March 2013Available online 28 March 2013</p><p>a b s t r a c t</p><p>Creating document imagprerequisite for documenlaborious and time consumand the large variabilityrecognition-based annotatthe Bayesian framework, t</p><p>journal homepage: wwese documents by integratingic context</p><p>iences, 95 Zhongguan East Road, Beijing 100190, PR China</p><p>atasets with ground-truths of regions, text lines and characters is analysis research. However, ground-truthing large datasets is not onlybut also prone to errors due to the difculty of character segmentationcharacter shape, size and position. This paper describes an effectiveapproach for ground-truthing handwritten Chinese documents. Underlignment of text line images with text transcript, which is the crucial step</p><p></p><p>ognition</p></li><li><p>about parameters optimization and extended experimental results</p><p>F. Yin et al. / Pattern Recognition 46 (2013) 280728182808by text editing, printing and scanning, the text description(transcript) can be matched with the scanned image to generateground-truth data automatically [813]. Handwritten documentimages can be similarly matched with the transcript (existinga priori or edited a posteriori), but the matching process is muchmore complicated due to the irregularity of document layout andwritten word/character shapes [14]. Usually, for handwrittendocuments, a dynamic search algorithm optimizing a match scorebetween the text line image and its transcript is used to align thesequence of image segments and the words/characters. Thesemethods can be roughly categorized into two groups dependingon whether word/character recognition models are used or not.In the rst group without recognition models, outline features areextracted from text line image for transcript alignment [1523].Recognition models can better measure the similarity betweenimage segment and word/character and thus can improve thealignment accuracy. The methods in [2426] use characterprototype-based word models, the ones in [2730] use HMM-based word recognizers. Taking into account the salient effects ofcharacter recognizers in word/string recognition, the recognizerlargely benets the alignment of text lines.</p><p>The alignment accuracy is still not sufcient despite the promiseof recognition models. In Chinese documents, the mixed alphanu-meric characters and punctuation marks are prone to segmentationand labeling errors because they have distinct geometric features ofsize, aspect ratio and position in text line. The misalignment ofChinese characters is mainly due to character touching and the gapswithin characters composed of multiple radicals. Fig. 2 shows typicalannotation errors caused by a punctuation mark and a radical ofChinese character. Geometric context features would be helpful toreduce such errors.</p><p>Geometric context has been used in handwriting recognition toreduce character segmentation and recognition errors, by usingvarious geometric features (such as character size, inter-characterand between-character gaps) and statistical models (geometric</p><p>Fig. 1. An example of transcript mapping. (a) Text line image and its transcript.(b) Character level alignment.class means, Gaussian density models, discriminative classiers,etc.) for handwritten English word and Japanese character stringrecognition [3135]. These methods cannot sufciently describe thegeometric features of handwritten Chinese characters. For example,the peripheral features in [32] are inadequate to grasp the complexstructures and shapes of Chinese characters; in [3335], somestroke based features were designed for online handwrittencharacters, these features are not readily applicable for ofinecharacters.</p><p>This paper describes a practical annotation system for ground-truthing handwritten Chinese documents. Unlike most previousworks, we integrate geometric context to improve the perfor-mance of text alignment in Chinese handwriting annotation.We use four statistical models to evaluate the single-charactergeometric features and between-character relationships. Underthe Bayesian framework, the geometric models are combined withand discussions. The rest of this paper is organized as follows.Section 2 reviews the related works; Section 3 describes theoutline of our annotation system; Section 4 formulates the textline alignment problem and describes the technical details;Section 5 describes the geometric models; Section 6 presents theexperimental results. Concluding remarks are offered in Section 7.</p><p>2. Related works</p><p>There have been many efforts in developing methods and systemsfor minimizing the human labors in document images ground-truthing. The published methods can be roughly categorized intothree classes: geometric transformation-based, word matching-basedand recognition-based methods.</p><p>Geometric transformation-based methods are mainly designedfor printed document alignment. The underlying idea is to estimatea global geometric transformation and then perform a robust localbitmap match between the text description image (generated usinga word-processing tool) and document image (printed, photocopied,or scanned from text description). For such matching, Kanungo andHaralick [8,9], Zi and Doermann [10] used four corner points of theimages as corresponding pairs to nd a linear transformation. Thesemethods are not robust when an image part is missing or therethe character recognizer to evaluate the text-to-handwriting match-ing score and the best match searched by dynamic time warping(DTW) gives the alignment result (character segmentation andlabeling). The combining parameters are optimized by string-leveltraining and simplex search. In our system, we also provide properpre-processing (text line segmentation and character over-segmen-tation) and post-processing modules for guaranteeing high accuracyof text line and character segmentation and labeling. After annota-tion, a document image is represented as a sequence of text lines,each line as an ordered sequence of connected components, whichare partitioned into characters. Such data can be used for design andevaluation of text line segmentation, character segmentation andrecognition algorithms, etc.</p><p>This paper is an extension to our conference papers [36,37].The extension is in several respects: Bayesian formulation ofalignment problem, condence transformation (CT), more details</p><p>Fig. 2. Annotation errors caused by punctuation mark (red box) and radical ofChinese character (green box). (For interpretation of the references to color in thisgure caption, the reader is referred to the web version of this article.)are extra feature points. Hobby [11] improved this method byconsidering all feature points and using a direct search optimizationalgorithm [38] to search for the afne transform parameters byminimizing the matching cost between the text description and thecharacter bounding boxes in the image. This method, however,suffers from heavy computational burden and local minimum. Kimand Kanungo [12,13] improved the method by grouping the featurepoints to reduce the complexity and employing a modied attrib-uted branch-and-bound matching algorithm to guarantee globalminimum. Though these methods do not consider the deformationof layout and character shapes, Viard-Gaudin [14] et al. tried thismethodology to create ground truth for handwritten documents.They designed a database of online and ofine handwritten data by</p></li><li><p>manually locating corresponding points in online and ofinedomain, but the matching process is much complex.</p><p>Annotating handwritten document images is usually addressedby formulating the matching between text line image and itstranscription as an alignment of two ordered sequences. Dependingon whether word/character recognition models are used or not,these methods also can be grouped into word matching-basedmethods and recognition-based methods.</p><p>In word matching-based methods, Zinger et al. [15] noticedthat the relative lengths of ASCII and handwritten words arehighly correlated. They proposed to nd the word level alignmentby sequentially adjusting the word image boundaries from right toleft with the cost function based on relative length. Stamatopouloset al. [16] adopted a greedy search algorithm which recursivelyseparates the text line according to the between-component gapsuntil the word number equals that of the transcription. In theapproach of Korneld et al. [17,18], the document was automati-cally segmented into a list of word images, which were matchedwith text using DTW based on global geometric properties (word-box features) extracted from both the handwritten word image</p><p>to improve the performance by combining the alignment resultsgiven by multiple different HMM-based word recognizers. Recently,Toselli et al. [30] also combined the human transcriber and a HMMbased handwritten text recognition system to generate the tran-scription of text images.</p><p>Our proposed approach uses a character recognizer for hand-written Chinese text lines alignment. Character-level alignment byDTW is suitable for Chinese documents because of the largecategory set (over 5000 characters are frequently used) andthe rich shape information of single characters. We also utilizegeometric context information to improve the alignment accuracy.HMM is an effective model for text line alignment and has beenwidely used to annotate the western text document databases.However, for handwritten Chinese text recognition, HMM has notshown superior performance (e.g., only 39% character correct ratewas reported in [40]). The proposed DTW-based algorithm issimpler than HMM in implementation. On the other hand, thestate sequence decoding process in HMM is similar to DTW, bothfollow the principle of dynamic programming.</p><p>of th</p><p>F. Yin et al. / Pattern Recognition 46 (2013) 28072818 2809and the transcription with a special font. Except using differentmatching features and replacing transcript with synthesized hand-written image based on writer's handwriting model in [21,22],the DTW was similarly employed in [1922]. The hidden Markovmodel (HMM) is another effective tool to solve the alignmentproblem. It was used in the work of Rothfeder et al. [23], where theword images were treated as the hidden variables, and the HMMmodels the probability of observing the word image given theword text. The Viterbi algorithm was used to decode the sequenceof word images.</p><p>Among the recognition-based methods, the one of Tomai andZhang et al. [24,25] formulates word alignment as an optimizationproblem involving multiple word segmentation hypotheses andword recognition, and uses dynamic programming to nd wordalignment in several coarse-to-ne stages. The word recognizerprovid...</p></li></ul>


View more >