Farsi Handwritten Word Recognition Using Continuous Hidden Markov Models and Structural Features
M. M. Haji
CSE Department
Shiraz University
January 2005
© M. M. Haji, 2005
Outline
- Introduction
- Preprocessing
  - Text Segmentation
  - Document Image Binarization
  - Skew and Slant Correction
  - Skeletonization
- Structural Feature Extraction
- Multi-CHMM Recognition
- Conclusion and Discussion
Introduction
- One of the most challenging problems in Artificial Intelligence.
- Words are rather complex patterns, with much variability in handwriting style.
- The performance of handwriting recognition systems is still far from human performance, in terms of both accuracy and speed.
Introduction
Previous research:
- Dehghan et al. (2001). "Handwritten Farsi (Arabic) Word Recognition: A Holistic Approach Using Discrete HMM", Pattern Recognition, vol. 34, pp. 1057-1065.
- Dehghan et al. (2001). "Unconstrained Farsi Handwritten Word Recognition Using Fuzzy Vector Quantization and Hidden Markov Models", Pattern Recognition Letters, vol. 22, pp. 209-214.

A maximum recognition rate of 65% for a 198-word lexicon!
Methodology
- Holistic Strategies
- Analytical Strategies
  - Implicit Segmentation
  - Explicit Segmentation
Holistic Strategies
- Recognition operates on the representation of the word as a whole.
- No attempt is made to segment a word into its individual characters.
- It is still necessary to segment the text lines into words.
- Intra-word space is sometimes greater than inter-word space!
Holistic Strategies
- Uses a lexicon: a list of the allowed interpretations of the input word image.
- The error rate increases with the lexicon size.
- Successful for postal address recognition or bank check reading, where the lexicon is limited and small.
Analytical Strategies
Explicit Segmentation:
- Isolates single letters, which are then recognized separately, usually by neural networks.
- Successful for English machine-printed text.
- Arabic/Farsi texts, whether machine-printed or handwritten, are cursive.
- Cursiveness and character overlapping are the main challenges.
Analytical Strategies
Implicit Segmentation:
- Converts the text (line or word) image into a sequence of small-size units.
- Recognition is done at this intermediate level rather than at the word or character level, usually by a Hidden Markov Model (HMM).
- Each unit may be a part of a letter, so a number of successive units can belong to a single letter.
Text Segmentation
Text Segmentation
- Detecting text regions in an image (removing non-text components).
- Applications in document image analysis and understanding, image compression and content-based image retrieval.
- Document image binarization and skew correction algorithms usually require a predominant text area in order to obtain an accurate estimate of text characteristics.
- Numerous methods have been proposed (an extensive literature), but there is no general method to detect arbitrary text strings.
- In its most general form, detection must be insensitive to noise, background model and lighting conditions, and invariant to text language, color, size, font and orientation, even within the same image!
Text Segmentation
- We believe that a text segmentation algorithm should have adaptation and learning capability.
- A learner usually needs much time and training data to achieve satisfactory results, which restricts its practicality.
- A simple procedure was developed for generating training data from manually segmented images.
- A Naive Bayes Classifier (NBC) was utilized, which is fast in both the training and the application phase.
- Surprisingly excellent results were obtained by this simple classifier!
Text Segmentation
- DCT: 18 features
- 10,000 training instances
- Naive Bayes classification (MAP decision rule):

  v_MAP = argmax_{v_j in V} P(v_j | a_1, a_2, ..., a_n)
        = argmax_{v_j in V} P(a_1, a_2, ..., a_n | v_j) P(v_j) / P(a_1, a_2, ..., a_n)
        = argmax_{v_j in V} P(a_1, a_2, ..., a_n | v_j) P(v_j)

  with the naive conditional-independence assumption:

  P(a_1, a_2, ..., a_n | v_j) = prod_i P(a_i | v_j)
Text Segmentation
Naive Bayes classification:

v_NB = argmax_{v_j in V} P(v_j) prod_i P(a_i | v_j)

Equal priors are assumed: P(Text) = P(Non-text) = 0.5. With v_1 = Text and v_2 = Non-text, the posterior is

P(Text | a_1, ..., a_18) = [P(a_1|v_1) P(a_2|v_1) ... P(a_18|v_1)] / [P(a_1|v_1) P(a_2|v_1) ... P(a_18|v_1) + P(a_1|v_2) P(a_2|v_2) ... P(a_18|v_2)]
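Under equal priors, the classifier above reduces to comparing products of per-feature likelihoods, best done in log space. A minimal sketch, assuming Gaussian class-conditional densities for the 18 DCT features (the slides do not specify the density model, so `GaussianNB` here is illustrative, not the thesis code):

```python
import numpy as np

class GaussianNB:
    """Minimal Gaussian Naive Bayes for two classes (Text / Non-text)."""

    def fit(self, X, y):
        self.classes = np.unique(y)
        # per-class, per-feature mean and variance (naive independence)
        self.mu = {c: X[y == c].mean(axis=0) for c in self.classes}
        self.var = {c: X[y == c].var(axis=0) + 1e-9 for c in self.classes}
        return self

    def log_likelihood(self, x, c):
        # log prod_i P(a_i | v_j) = sum of per-feature Gaussian log densities
        return float(np.sum(-0.5 * np.log(2 * np.pi * self.var[c])
                            - (x - self.mu[c]) ** 2 / (2 * self.var[c])))

    def predict(self, x):
        # equal priors P(Text) = P(Non-text) = 0.5, so the priors cancel
        return max(self.classes, key=lambda c: self.log_likelihood(x, c))
```

Summing log densities instead of multiplying 18 small likelihoods avoids numerical underflow.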
Binarization
Binarization
- Converting gray-scale images into two-level images.
- Many vision algorithms and operators only handle two-level images.
- Applied in the primary steps of a vision algorithm.
- The core problem is selecting a proper threshold surface.
- Challenging for images with poor contrast, strong noise and variable modalities in histograms.
- Global vs. local (adaptive) algorithms; general vs. special-purpose algorithms.
Binarization
Four different algorithms for document image binarization were compared and contrasted:
- Otsu, N. (Jan. 1979). "A Threshold Selection Method from Gray Level Histograms", IEEE Trans. on Systems, Man and Cybernetics, vol. 9, pp. 62-66. [global, general-purpose]
- Niblack, W. (1989). An Introduction to Digital Image Processing, Prentice Hall, Englewood Cliffs, pp. 115-116. [local, general-purpose]
- Wu, V. and Manmatha, R. (Jan. 1998). "Document Image Clean-Up and Binarization", Proceedings of the SPIE Conference on Document Recognition. [local, special-purpose]
- Liu, Y. and Srihari, S. N. (May 1997). "Document Image Binarization Based on Texture Features", IEEE Trans. on PAMI, vol. 19(5), pp. 540-544. [global, special-purpose]
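Of the four, Otsu's global method is compact enough to sketch: choose the threshold that maximizes the between-class variance of the gray-level histogram. `otsu_threshold` below is an illustrative reimplementation, not the code used in the thesis:

```python
import numpy as np

def otsu_threshold(gray):
    """Otsu threshold of a uint8 image: the level t maximizing the
    between-class variance of the classes {0..t} and {t+1..255}."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()                  # gray-level probabilities
    omega = np.cumsum(p)                   # class-0 probability w0(t)
    mu = np.cumsum(p * np.arange(256))     # cumulative mean
    mu_total = mu[-1]
    denom = omega * (1.0 - omega)
    denom[denom == 0] = np.inf             # guard against empty classes
    sigma_b2 = (mu_total * omega - mu) ** 2 / denom
    return int(np.argmax(sigma_b2))
```

Pixels at or below the returned level go to one class, the rest to the other.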
Binarization
[Figure: a sample input image and its histogram, binarized by the Otsu, Niblack, Wu-Manmatha and Liu-Srihari algorithms.]
Binarization
Quality improvement by preprocessing and postprocessing:
- Preprocessing: Taylor, M. J. and Dance, C. R. (Sep. 1998). "Enhancement of Document Images from Cameras", Proceedings of the SPIE Conference on Document Recognition, pp. 230-241.
- Postprocessing: Trier, D. and Taxt, T. (March 1995). "Evaluation of Binarization Methods for Document Images", IEEE Trans. on PAMI, vol. 17(3), pp. 312-315.

[Block diagram: Input, super-resolution, unsharp masking, binarization, Output]
Skew Correction
Skew Correction
- Skew: the angle by which text lines deviate from the x-axis.
- Page decomposition techniques require properly aligned images as input.
- Three types: global skew, multiple skew, non-uniform skew.
- "Skew correction" is applied by a rotation after "skew detection".
Skew Correction
Categories based on the underlying techniques:
- Projection Profile
- Correlation
- Hough Transform
- Mathematical Morphology
- Fourier Transform
- Artificial Neural Networks
- Nearest-Neighbor Clustering
Skew Correction
The projection profile at the global skew angle of the document has narrow peaks and deep valleys.
Skew Correction
Projection profile technique:

globalSkewAngle = argmax_{theta_min <= theta <= theta_max} f(horizontalProjectionProfile(rotate(I, theta)))

where the goodness measure f is

SD = sum_i (h(i) - h(i-1))^2
Skew Correction
Speed-ups:
- Limiting the range of skew angles.
- Binary search for finding the maximizer of the function.
- Computing the sum of pixels along parallel lines at an angle, instead of rotating the image by that angle.
- Reducing the size of the input image, as long as the structure of the text lines is preserved (MIN/MAX downsampling).
- Local skew correction, after line segmentation, by robust line fitting.
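The "parallel lines instead of rotation" speed-up above can be sketched by shifting each column vertically by x*tan(theta) before accumulating the horizontal profile. `detect_skew` and this small-angle column-shift approximation are illustrative, not the thesis implementation:

```python
import numpy as np

def profile_at_angle(img, theta):
    """Horizontal projection profile of a binary image as if rotated by
    theta (radians): each column x is shifted vertically by
    round(x * tan(theta)) instead of actually rotating the image."""
    h, w = img.shape
    shifts = np.round(np.arange(w) * np.tan(theta)).astype(int)
    offset = max(0, -int(shifts.min()))
    profile = np.zeros(h + offset + max(0, int(shifts.max())), dtype=float)
    for x in range(w):
        ys = np.nonzero(img[:, x])[0]
        np.add.at(profile, ys + shifts[x] + offset, 1.0)
    return profile

def goodness(profile):
    # SD = sum_i (h(i) - h(i-1))^2: narrow peaks and deep valleys score high
    return float(np.sum(np.diff(profile) ** 2))

def detect_skew(img, theta_min=-0.2, theta_max=0.2, steps=41):
    """Exhaustive scan over a limited angle range; the binary-search
    speed-up mentioned above could replace the scan."""
    angles = np.linspace(theta_min, theta_max, steps)
    return float(max(angles, key=lambda t: goodness(profile_at_angle(img, t))))
```

On an image whose text lines are already horizontal, the detected angle is (near) zero.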
Slant Correction
[Figure: examples of uniform and non-uniform slant.]
Slant Correction
- Slant: the deviation of the average near-vertical stroke from the vertical direction (example word: اراک).
- Occurs in both handwritten and machine-printed texts.
- Slant is non-informative.
- The average slant angle is estimated first, and then a shear transformation in the horizontal direction is applied to the word (or line) image to correct its slant.
Slant Correction
- The most effective methods are based on the analysis of vertical projection profiles (histograms) at various angles.
- Identical to the projection-profile-based methods for skew correction, except that:
  - The histograms are computed in the vertical rather than the horizontal direction.
  - A shear transformation is used instead of rotation.
- Accurate results for handwritten words with uniform slant. Robust to noise.
Slant Correction
Slant Correction
Projection profile technique:

slantAngle = argmax_{theta_min <= theta <= theta_max} f(verticalProjectionProfile(horizontalShear(I, theta)))

where the goodness measure f is, as before,

SD = sum_i (h(i) - h(i-1))^2
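The same SD measure drives slant detection; only the profile direction and the transform change. A sketch using row shifts of y*tan(theta) as the horizontal shear (illustrative, not the thesis code):

```python
import numpy as np

def vertical_profile_sheared(img, theta):
    """Vertical projection profile after a horizontal shear by theta
    (radians): each row y is shifted by round(y * tan(theta))."""
    h, w = img.shape
    shifts = np.round(np.arange(h) * np.tan(theta)).astype(int)
    offset = max(0, -int(shifts.min()))
    profile = np.zeros(w + offset + max(0, int(shifts.max())), dtype=float)
    for y in range(h):
        xs = np.nonzero(img[y, :])[0]
        np.add.at(profile, xs + shifts[y] + offset, 1.0)
    return profile

def detect_slant(img, theta_min=-0.7, theta_max=0.7, steps=57):
    """Return the shear angle whose vertical profile has the sharpest
    peaks, i.e. the shear that best verticalizes the strokes."""
    angles = np.linspace(theta_min, theta_max, steps)
    sd = lambda p: float(np.sum(np.diff(p) ** 2))
    return float(max(angles, key=lambda t: sd(vertical_profile_sheared(img, t))))
```

Applying the detected shear to the word image then removes the slant.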
Slant Correction
Postprocessing: smoothing jagged edges with 3x3 masks such as

1 1 1    1 0 0
1 p 0    1 p 0
1 1 1    1 0 0
...

[Figure: a part of a slanted word, after slant correction and after smoothing.]
Skeletonization
Skeletonization
- Skeletonization, or the medial axis transform (MAT), of a shape has been one of the most studied problems in image processing and machine vision.
- A skeletonization (thinning) algorithm transforms a shape into arcs and curves of unit thickness, which is called the skeleton.
- An ideal skeleton has the following properties:
  - retains the basic structural properties of the original shape
  - well-centered
  - well-connected
  - precisely reconstructable
  - robust
Skeletonization
- Simplifies classification: diminishes the variability and distortion among instances of one class, and reduces the amount of data to be handled.
- Proven effective in pattern recognition problems: character recognition, fingerprint recognition, chromosome recognition, ...
- Provides compact representations and structural analysis of objects.
Skeletonization
Five different skeletonization algorithms were compared and contrasted, with the main focus on preserving text characteristics:
- Naccache, N. J. and Shinghal, R. (1984). "SPTA: A Proposed Algorithm for Thinning Digital Pictures", IEEE Trans. on Systems, Man and Cybernetics, vol. SMC-14(3), pp. 409-418.
- Zhang, T. Y. and Suen, C. Y. (1984). "A Fast Parallel Algorithm for Thinning Digital Patterns", Comm. ACM, vol. 27(3), pp. 236-239.
- Ji, L. and Piper, J. (1992). "Fast Homotopy-Preserving Skeletons Using Mathematical Morphology", IEEE Trans. on PAMI, vol. 14(6), pp. 653-664.
- Sajjadi, M. R. (Oct. 1996). "Skeletonization of Persian Characters", M.Sc. Thesis, Computer Science and Engineering Department, Shiraz University, Iran.
- Huang, L., Wan, G. and Liu, C. (2003). "An Improved Parallel Thinning Algorithm", Proceedings of the Seventh International Conference on Document Analysis and Recognition (ICDAR 2003), pp. 780-783.
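The Zhang-Suen algorithm from the list is short enough to sketch in full: two alternating parallel sub-iterations delete border pixels that pass neighbour-count, connectivity and contour tests.

```python
import numpy as np

def zhang_suen(img):
    """Zhang-Suen parallel thinning of a 0/1 uint8 array."""
    img = np.pad(img.astype(np.uint8), 1)

    def neighbours(y, x):
        # P2..P9, clockwise starting from the pixel above
        return [img[y-1, x], img[y-1, x+1], img[y, x+1], img[y+1, x+1],
                img[y+1, x], img[y+1, x-1], img[y, x-1], img[y-1, x-1]]

    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_delete = []
            for y, x in zip(*np.nonzero(img)):
                P = neighbours(y, x)
                B = sum(P)                                  # non-zero neighbours
                A = sum(P[k] == 0 and P[(k + 1) % 8] == 1   # 0 -> 1 transitions
                        for k in range(8))
                if not (2 <= B <= 6 and A == 1):
                    continue
                if step == 0:
                    ok = P[0] * P[2] * P[4] == 0 and P[2] * P[4] * P[6] == 0
                else:
                    ok = P[0] * P[2] * P[6] == 0 and P[0] * P[4] * P[6] == 0
                if ok:
                    to_delete.append((y, x))
            for y, x in to_delete:
                img[y, x] = 0
            changed = changed or bool(to_delete)
    return img[1:-1, 1:-1]
```

Collecting deletions per sub-iteration and applying them at once is what makes the algorithm "parallel": every pixel is judged against the same previous image.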
Skeletonization
[Figure: a sample input and its skeletons produced by the Homotopy-Preserving, Zhang-Suen, SPTA, DTSA and Huang et al. algorithms.]
Skeletonization
[Figure: a second sample input and its skeletons produced by the same five algorithms.]
Skeletonization
[Figure: robustness to border noise, shown on an input skeletonized by the SPTA, DTSA and Huang et al. algorithms.]
Skeletonization
Postprocessing: Removing spurious branches
Skeletonization
Modification: removing 4-connectivity while preserving 8-connectivity of the pattern, using masks such as

0 1 x    x 1 x
1 p 0    1 p 1
x 0 0    x 0 x
...
Structural Feature Extraction
The connectivity number Cn:
- Cn = 0: dot (isolated point)
- Cn = 1: end-point
- Cn = 2: ordinary (connection) point
- Cn = 3: branch-point
- Cn = 4: cross-point
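Using the crossing-number formulation, Cn can be computed directly from a skeleton pixel's 3x3 neighbourhood; a sketch (the exact Cn definition used in the thesis may differ):

```python
import numpy as np

def connectivity_number(patch):
    """Crossing number of the centre pixel of a 3x3 0/1 patch: half the
    number of 0/1 changes while walking the 8 neighbours in a circle.
    0 = dot, 1 = end-point, 2 = ordinary point, 3 = branch, 4 = cross."""
    p = [patch[0, 1], patch[0, 2], patch[1, 2], patch[2, 2],
         patch[2, 1], patch[2, 0], patch[1, 0], patch[0, 0]]
    return int(sum(abs(int(p[k]) - int(p[(k + 1) % 8])) for k in range(8)) // 2)
```

Scanning the skeleton with this function yields the end-points and branch-points needed for graph construction.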
Structural Feature Extraction
Structural features:
- Capable of tolerating much variation, but not robust to noise and hard to extract.
- A 1D HMM needs a 1D observation sequence, so the 2D word image must be converted into a 1D signal.
- Speech recognition and online handwriting recognition work on 1D signals; offline handwriting recognition works on a 2D signal.
Structural Feature Extraction
- Converting the word skeleton into a graph.
- Tracing the edges in a canonical order.

[Figure: a word skeleton with its end-points and branch-point marked, and its edges numbered 1-7 in tracing order.]
Structural Feature Extraction
Loop extraction:
- Loops are important, distinctive features.
- Extraction makes the number of strokes smaller: easier modeling and lower computational cost.
- Different types of loops: simple-loop, multi-link-loop, double-loop.
- A DFS algorithm was written to find complex loops in the word graph.
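A DFS cycle search over the word graph can be sketched as below. `find_loops` is a hypothetical simplification over an adjacency-set graph; real multi-link-loops (parallel edges between the same pair of points) would additionally need edge identities:

```python
def find_loops(adj):
    """Enumerate the simple cycles (loops) of an undirected graph given
    as {node: set_of_neighbours}, by depth-first search. Each loop is
    reported once, as a tuple of its nodes."""
    loops, seen = [], set()

    def dfs(node, parent, path):
        for nxt in sorted(adj[node]):
            if nxt == parent:                # skip the edge we arrived on
                continue
            if nxt in path:                  # back edge closes a cycle
                cycle = tuple(path[path.index(nxt):])
                if frozenset(cycle) not in seen:
                    seen.add(frozenset(cycle))
                    loops.append(cycle)
            else:
                dfs(nxt, node, path + [nxt])

    for start in sorted(adj):
        dfs(start, None, [start])
    return loops
```

The frozenset of a cycle's nodes deduplicates the many traversal orders in which DFS rediscovers the same loop.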
Structural Feature Extraction
[Figure: Farsi letter shapes illustrating simple-loops, multi-link-loops and double-loops, e.g. صـ ص ف مـ و, ـصـ ـطـ ـمـ ـو ـه, and هـ ـهـ.]
Structural Feature Extraction
Each edge is transformed into a 10D feature vector:
- Normalized length (f1)
- Curvature (f2)
- Slope (f3)
- Connection type (f4)
- Endpoint distance (f5)
- Number of segments (f6)
- Curved features (f7-f10)

The features are independent of the baseline location, and invariant to scaling, translation and rotation.
Structural Feature Extraction
1: [0.68, 1.00, 6, 0 , 0.05, 1, 0.0, 0.0, 0.7, 0.0]
2: [0.11, 1.01, 6, 1 , 0.23, 1, 0.0, 0.0, 0.0, 0.0]
3: [2.00, 3.00, 8, 10, 0.00, 0, 0.0, 0.0, 0.0, 0.0]
...
Hidden Markov Models
Signal modeling:
- Deterministic
- Stochastic: characterizing the signal by a parametric random process.

HMM is a widely used statistical (stochastic) model:
- The most widely used technique in modern ASR systems.
- Speech and handwritten text are similar: symbols with ambiguous boundaries, and symbols with variations in appearance.
- Rather than modeling the whole pattern as a single feature vector, the HMM explores the relationship between consecutive segments.
Hidden Markov Models
Nondeterministic finite state machines:
- Probabilistic state transitions.
- Each state is associated with a random function.
- The state sequence is unknown; only some probabilistic function of the state sequence can be observed.

[Figure: a three-state weather example with states Sunny, Cloudy and Rainy and transition probabilities between them.]
Hidden Markov Models
- N: the number of states of the model
- S = {s1, s2, ..., sN}: the set of states
- Π = {πi = P(si at t = 1)}: the initial state probabilities
- A = {aij = P(sj at t+1 | si at t)}: the state transition probabilities
- M: the number of observation symbols
- V = {v1, v2, ..., vM}: the set of possible observation symbols
- B = {bi(vk) = P(vk at t | si at t)}: the symbol emission probabilities
- Ot: the observed symbol at time t
- T: the length of the observation sequence
- λ = (A, B, Π): the compact notation for the HMM
Left-to-Right HMMs
[Figure: a 5-state left-to-right HMM (S1 ... S5), and a 5-state left-to-right HMM with a maximum relative forward jump of 2.]
Hidden Markov Models
The three fundamental problems:
1. Given a model λ = (A, B, Π), how do we compute P(O | λ), the probability of occurrence of the observation sequence O = O1, O2, ..., OT?
   → The Forward-Backward Algorithm
2. Given the observation sequence O and a model λ, how do we choose a state sequence S = s1, s2, ..., sT so that P(O, S | λ) is maximized, i.e. find the state sequence that best explains the observations?
   → The Viterbi Algorithm
3. Given the observation sequence O, how do we adjust the model parameters λ = (A, B, Π) so that P(O | λ) or P(O, S | λ) is maximized, i.e. find the model that best explains the observed data?
   → The Baum-Welch Algorithm, the Segmental K-means Algorithm
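For Problem 2, the Viterbi algorithm can be sketched for a discrete HMM in log space (an illustrative implementation, not the system's code):

```python
import numpy as np

def viterbi(pi, A, B, obs):
    """Most likely state path and its log-probability for a discrete HMM.
    pi: (N,) initial probs; A: (N, N) transitions; B: (N, M) emissions;
    obs: list of observation symbol indices."""
    with np.errstate(divide='ignore'):          # log(0) -> -inf is fine here
        log_pi, log_A, log_B = np.log(pi), np.log(A), np.log(B)
    T, N = len(obs), len(pi)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    delta[0] = log_pi + log_B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_A  # scores[i, j]: i -> j
        psi[t] = np.argmax(scores, axis=0)      # best predecessor of each j
        delta[t] = scores[psi[t], np.arange(N)] + log_B[:, obs[t]]
    path = [int(np.argmax(delta[-1]))]
    for t in range(T - 1, 0, -1):               # backtrack
        path.append(int(psi[t][path[-1]]))
    return path[::-1], float(delta[-1].max())
```

Working in log space keeps long observation sequences from underflowing.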
Hidden Markov Models
Discrete HMM:
- Discrete observation sequences: V = {v1, v2, ..., vM}.
- A codebook obtained by Vector Quantization (VQ). Codebook size?
- Distortion: information loss due to the quantization error!

Continuous Hidden Markov Model (CHMM):
- Overcomes the distortion problem.
- Requires more parameters → more memory.
- Requires more deliberate initialization techniques: training may diverge with randomly selected initial parameters!
Hidden Markov Models
Multivariate Gaussian mixture:

b_i(o_t) = sum_{m=1}^{M} c_im N(o_t; mu_im, Sigma_im)

N(o_t; mu_im, Sigma_im) = 1 / ((2π)^{K/2} |Sigma_im|^{1/2}) exp(-(1/2) (o_t - mu_im)^T Sigma_im^{-1} (o_t - mu_im))

- c_im: the mth mixture gain coefficient in state i
- mu_im: the mean of the mth mixture in state i
- Sigma_im: the covariance of the mth mixture in state i
- M: the number of mixtures used
- K: the dimensionality of the observation space
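Evaluating this emission density is a direct translation of the formula; a sketch for one state i, with assumed array shapes noted in the docstring:

```python
import numpy as np

def gmm_emission(o, c, mu, cov):
    """b_i(o) = sum_m c[m] * N(o; mu[m], cov[m]) for one state i.
    Shapes: o (K,), c (M,), mu (M, K), cov (M, K, K)."""
    K = o.shape[0]
    total = 0.0
    for m in range(len(c)):
        d = o - mu[m]
        norm = 1.0 / np.sqrt((2 * np.pi) ** K * np.linalg.det(cov[m]))
        total += c[m] * norm * np.exp(-0.5 * d @ np.linalg.inv(cov[m]) @ d)
    return float(total)
```

In practice the log of this density is accumulated inside Viterbi scoring, for the same underflow reason as in the discrete case.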
The Block Diagram of the Recognition System
[Block diagram: Input Word Image → Normalization → Feature Extraction → Observation Sequence → evaluate the likelihoods P(O | λ1), P(O | λ2), ..., P(O | λn) of the observation sequence by the Viterbi algorithm against all models → Ranked Word List]
The Class Diagram of the Experimental Recognition System
[Class diagram: WordClassifier is specialized by HMMWordClassifier and NNWordClassifier; NNWordClassifier by MLPWordClassifier; HMMWordClassifier by CHMMWordClassifier and DHMMWordClassifier. DHMMWordClassifier holds one CodeBook and 2..* DHMMWordModels; CHMMWordClassifier holds 2..* CHMMWordModels. The classifiers use a FeatureExtractor, specialized by FixedSizeFeatureExt., StructuralFeatureExt. and FourierFeatureExt.]
An Overview of the Complete System

[Block diagram: Input Image → Text Segmentation → Global Skew Correction → Line Extraction → Local Skew Correction → Binarization → Word Segmentation → Denoising and Smoothing → Slant Correction → Height Normalization → Skeletonization → Feature Extraction → Multi-CHMM Recognition → Output Text]

- Two-stage skew correction
- Postponed binarization
Training Data
- The recognition system was trained and evaluated on a dataset of 100 city names of Iran, i.e. a pattern recognition problem with 100 classes.
- Most samples in the dataset were automatically generated by a Java program that draws the input string with different fonts, sizes and orientations on an output image.
- The dataset contains 150 samples for each word.
Training Data

[Figure: sample training images.]
Experimental Results

1-best recognized

[Figures: five sample word images whose correct label was the top-ranked candidate.]
Experimental Results
3-best recognized: 1. دامغان  2. زنجان  3. اصفهان
Experimental Results
4-best recognized: 1. قشم  2. قم  3. مرند  4. مشهد
Experimental Results

Not N-best recognized, for N ≤ 20

[Figures: three sample word images that were not recognized among the top 20 candidates.]
Conclusion
- The first work to use CHMMs with structural features to recognize Farsi handwritten words.
- A complete offline recognition system for Farsi handwritten words.
- A new machine learning approach, based on the NBC, for text segmentation.
- Compared and contrasted different algorithms for binarization, skew and slant correction, and skeletonization.
- Excellent generalization performance: a maximum recognition rate of 82% on our dataset of 100 words.
Thanks for your attention.
Please feel free to ask any questions.