![Page 1: Design and Comparison of Segmentation Driven and Recognition](https://reader036.vdocuments.net/reader036/viewer/2022070222/613d2558736caf36b759dbd1/html5/thumbnails/1.jpg)
Design and Comparison of Segmentation Driven and Recognition Driven
Devanagari OCR
Suryaprakash Kompalli, Srirangaraj Setlur, Venu Govindaraju
Department of Computer Science and Engineering, University at Buffalo
![Page 2: Design and Comparison of Segmentation Driven and Recognition](https://reader036.vdocuments.net/reader036/viewer/2022070222/613d2558736caf36b759dbd1/html5/thumbnails/2.jpg)
Outline
• Background
• Segmentation driven OCR
• Recognition driven OCR
• Character recognition results
• Post processing
• Word recognition results
• Contributions
• Work in progress
![Page 3: Design and Comparison of Segmentation Driven and Recognition](https://reader036.vdocuments.net/reader036/viewer/2022070222/613d2558736caf36b759dbd1/html5/thumbnails/3.jpg)
Background (Alphabet and terminology)
Figure: the Devanagari alphabet (glyphs), and how words are formed from characters, glyphs and components.
Figure labels: ascenders, core, descenders, head line (shirorekha), base line; word, characters, glyphs, components.
![Page 4: Design and Comparison of Segmentation Driven and Recognition](https://reader036.vdocuments.net/reader036/viewer/2022070222/613d2558736caf36b759dbd1/html5/thumbnails/4.jpg)
Background (Segmentation level vs Class space)
Holistic techniques may be used to recognize words without segmentation
• Character: Segmentation is rarely dependent on font. Class space: ~1000 characters [CEDAR-ILT]
• Glyph/Alphabet: Segmentation needs to address font variations. Class space: ~129
• Component: Segmentation is not as tough as character to glyph. Class space: ~82
![Page 5: Design and Comparison of Segmentation Driven and Recognition](https://reader036.vdocuments.net/reader036/viewer/2022070222/613d2558736caf36b759dbd1/html5/thumbnails/5.jpg)
Background (Character distribution in Devanagari)
• 12% of all characters need complex segmentation, especially in multi-font OCR [CEDAR-ILT data set, Pal 2002, Bansal 2002]
– Conjuncts (two consonants fused, 6%)
– Vowel modifiers (6%)
• 88% of all characters may be segmented by removing the shirorekha
– Vowels/consonants (45%)
– Vowels/consonants with modifiers (43%)
• Goal of an ideal system should be to prevent:
– Over-segmentation of the 88%
– Under-segmentation in the 12%
![Page 6: Design and Comparison of Segmentation Driven and Recognition](https://reader036.vdocuments.net/reader036/viewer/2022070222/613d2558736caf36b759dbd1/html5/thumbnails/6.jpg)
Background (Recognition paradigms)
• OCR paradigms: [Casey 96]
– Dissection (Segmentation driven OCR): Input word → Segmentation → Classification → Post-processing
– Recognition driven: Input word → Segmentation → Classification → Post-processing, with classification used to rank or modify the segmentation
– Holistic: Input word → Feature extraction → Classification → Post-processing
![Page 7: Design and Comparison of Segmentation Driven and Recognition](https://reader036.vdocuments.net/reader036/viewer/2022070222/613d2558736caf36b759dbd1/html5/thumbnails/7.jpg)
Background (Goals and achievements)
• Study level of segmentation in Devanagari
– We compare component level and character level classifiers
• Prevent under-segmentation and over-segmentation in multi-font Devanagari OCR
– We outline a new representation scheme to enable non-linear, multi-font segmentation
– We design a recognition driven OCR framework
• Design a suitable language model to enhance classifier results
![Page 8: Design and Comparison of Segmentation Driven and Recognition](https://reader036.vdocuments.net/reader036/viewer/2022070222/613d2558736caf36b759dbd1/html5/thumbnails/8.jpg)
![Page 9: Design and Comparison of Segmentation Driven and Recognition](https://reader036.vdocuments.net/reader036/viewer/2022070222/613d2558736caf36b759dbd1/html5/thumbnails/9.jpg)
Segmentation driven OCR (Segmentation)
Figure: (a) shirorekha and ascender separation, (b) character separation, (c) descender separation using the average height. The resulting ascender, core and descender component images are the input to the classifier.
• Shirorekha and ascender separation done using horizontal profile
• Vertical profile used for character separation
• Average height of a line of text used to separate descenders
• Component images are normalized to 32 × 32
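The profile-based steps can be illustrated with a small sketch. This is not the authors' implementation: it assumes a binary word image (1 = ink), treats the single row with the most ink as the shirorekha, and uses hypothetical function names `find_shirorekha_row` and `split_characters`.

```python
import numpy as np

def find_shirorekha_row(word: np.ndarray) -> int:
    """Return the row with the most ink; assumed here to be the shirorekha."""
    horizontal_profile = word.sum(axis=1)  # ink pixels per row
    return int(np.argmax(horizontal_profile))

def split_characters(word: np.ndarray, shirorekha_row: int) -> list:
    """Return (start, end) column spans of ink below the shirorekha."""
    body = word[shirorekha_row + 1:, :]
    vertical_profile = body.sum(axis=0)  # ink pixels per column
    spans, start = [], None
    for col, ink in enumerate(vertical_profile):
        if ink > 0 and start is None:
            start = col                      # a character begins
        elif ink == 0 and start is not None:
            spans.append((start, col))       # a character ends at a blank column
            start = None
    if start is not None:
        spans.append((start, len(vertical_profile)))
    return spans

# Toy word: a head line across the top and two character bodies below it.
word = np.zeros((6, 9), dtype=int)
word[1, :] = 1        # shirorekha spanning the word
word[2:5, 1:3] = 1    # first character body
word[2:5, 5:8] = 1    # second character body

row = find_shirorekha_row(word)
spans = split_characters(word, row)
print(row, spans)
```

A real shirorekha is several pixels thick and fused characters leave no blank column, which is the under-segmentation case the recognition driven approach targets.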
![Page 10: Design and Comparison of Segmentation Driven and Recognition](https://reader036.vdocuments.net/reader036/viewer/2022070222/613d2558736caf36b759dbd1/html5/thumbnails/10.jpg)
Segmentation driven OCR (Classifier design)
• Some core components are placed in more than one neural network
– E.g.: a core component (glyph shown on the slide) is placed in both the no bar and right bar neural networks
• Cumulative accuracy of core recognizer: 74%
Figure: classifier pipeline
• Ascender (7 classes): Feature extraction → 4 class nearest neighbor
• Descender (2 classes): Feature extraction → 2 class nearest neighbor
• Core (68 classes): Feature extraction → Identify location and number of vertical bars (no bar, center/left bar, right bar, multiple bars) → 20 class, 6 class, 46 class and 11 class neural networks
• Classifier outputs go to post-processing
• Accuracies shown in the figure: 85%, 93%, 89%, 91%, 95%, 72%, 92%
![Page 11: Design and Comparison of Segmentation Driven and Recognition](https://reader036.vdocuments.net/reader036/viewer/2022070222/613d2558736caf36b759dbd1/html5/thumbnails/11.jpg)
![Page 12: Design and Comparison of Segmentation Driven and Recognition](https://reader036.vdocuments.net/reader036/viewer/2022070222/613d2558736caf36b759dbd1/html5/thumbnails/12.jpg)
Recognition driven OCR (BAG creation)
• Build a Line Adjacency Graph (LAG) for each word (character shown for clarity)
• Identify curves, merging or splitting runs to create a Block Adjacency Graph (BAG)
• Remove noisy elements, combine small blocks with neighbors
Figure labels: merging runs, split runs, curve
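As a rough illustration of the first step, the sketch below builds a line adjacency graph from horizontal runs of ink. It assumes row-wise runs and strict column overlap between adjacent rows; the slides do not state the run direction or the overlap rule, and `horizontal_runs` and `build_lag` are hypothetical names.

```python
def horizontal_runs(row):
    """Return (start, end) column spans of consecutive ink pixels in one row."""
    runs, start = [], None
    for col, pixel in enumerate(row):
        if pixel and start is None:
            start = col
        elif not pixel and start is not None:
            runs.append((start, col))
            start = None
    if start is not None:
        runs.append((start, len(row)))
    return runs

def build_lag(image):
    """Nodes are (row, start, end) runs; edges join overlapping runs in adjacent rows."""
    nodes = [(r, s, e) for r, row in enumerate(image) for s, e in horizontal_runs(row)]
    edges = []
    for a in nodes:
        for b in nodes:
            # b lies one row below a and their column spans overlap
            if b[0] == a[0] + 1 and a[1] < b[2] and b[1] < a[2]:
                edges.append((a, b))
    return nodes, edges

image = [
    [0, 1, 1, 1, 0],  # one run
    [0, 1, 0, 1, 0],  # the run splits into two
    [0, 1, 1, 1, 0],  # the two runs merge again
]
nodes, edges = build_lag(image)

# A run with more than one neighbor below is a splitting run;
# a run with more than one neighbor above is a merging run.
splits = [a for a in nodes if sum(1 for x, _ in edges if x == a) > 1]
merges = [b for b in nodes if sum(1 for _, y in edges if y == b) > 1]
print(len(nodes), len(edges), splits, merges)
```

Chains of runs between such split and merge points are natural candidates to be grouped into the blocks of a block adjacency graph.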
![Page 13: Design and Comparison of Segmentation Driven and Recognition](https://reader036.vdocuments.net/reader036/viewer/2022070222/613d2558736caf36b759dbd1/html5/thumbnails/13.jpg)
Recognition driven OCR (BAG creation)
Branching
Merging
![Page 14: Design and Comparison of Segmentation Driven and Recognition](https://reader036.vdocuments.net/reader036/viewer/2022070222/613d2558736caf36b759dbd1/html5/thumbnails/14.jpg)
Recognition driven OCR (Conjunct segmentation using BAG)
Figure: a conjunct character (half consonant followed by full consonant) and its block adjacency graph of 11 blocks
• Combinations of blocks give core component hypotheses (11 in this case), for example:
– 1 left block + 10 right blocks
– 6 left + 5 right blocks
– 11 left + 0 right blocks
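The count of 11 hypotheses is consistent with cutting an ordered list of 11 blocks at every possible point. A minimal sketch, assuming the blocks are ordered left to right and each hypothesis is a contiguous left/right split (`conjunct_hypotheses` is a hypothetical helper):

```python
def conjunct_hypotheses(blocks):
    """Split an ordered block list into (left, right) parts at every cut point."""
    return [(blocks[:k], blocks[k:]) for k in range(1, len(blocks) + 1)]

blocks = list(range(11))  # stand-ins for the 11 BAG blocks
hypotheses = conjunct_hypotheses(blocks)
sizes = [(len(left), len(right)) for left, right in hypotheses]
print(sizes)
```

Each left part would be classified as a half-consonant candidate and each right part as a full-consonant candidate; the last split (11 left + 0 right) treats the whole image as a single component.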
![Page 15: Design and Comparison of Segmentation Driven and Recognition](https://reader036.vdocuments.net/reader036/viewer/2022070222/613d2558736caf36b759dbd1/html5/thumbnails/15.jpg)
Recognition driven OCR (Descender segmentation using BAG)
• Blocks corresponding to vowel modifiers occur at the bottom or side
• Core components can be selected from top to bottom or left to right
![Page 16: Design and Comparison of Segmentation Driven and Recognition](https://reader036.vdocuments.net/reader036/viewer/2022070222/613d2558736caf36b759dbd1/html5/thumbnails/16.jpg)
Recognition driven OCR (Component classifier)
• Receiver operating characteristics (ROC) are analyzed and the confidence at the equal error rate is selected as the threshold
Figure: component classifier flow
• Ascender (7 classes): GSC features → 7 class nearest neighbor
• Descender hypotheses: GSC features → 5 class nearest neighbor → top 3 results
• Core hypotheses: GSC features → 42 class nearest neighbor → top 3 results
• Component hypotheses: is top choice confidence > threshold?
– Yes: Post-processing
– No: Reject the hypothesis
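One way to pick such a threshold is to scan candidate confidences and keep the one where the false reject rate (FRR) and false accept rate (FAR) are closest. The sketch below assumes two lists of top-choice confidences, one for correct hypotheses and one for incorrect ones; the data and the name `equal_error_threshold` are illustrative only.

```python
def equal_error_threshold(genuine, impostor):
    """Pick the confidence threshold where FRR and FAR are closest.

    genuine: confidences of correct hypotheses.
    impostor: confidences of incorrect hypotheses.
    A hypothesis is accepted when its confidence is strictly above the threshold.
    """
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        frr = sum(c <= t for c in genuine) / len(genuine)     # correct but rejected
        far = sum(c > t for c in impostor) / len(impostor)    # incorrect but accepted
        gap = abs(frr - far)
        if best is None or gap < best[0]:
            best = (gap, t, frr, far)
    return best[1], best[2], best[3]

genuine = [0.9, 0.8, 0.7, 0.6, 0.3]
impostor = [0.5, 0.4, 0.2, 0.1, 0.65]
threshold, frr, far = equal_error_threshold(genuine, impostor)
print(threshold, frr, far)
```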
![Page 17: Design and Comparison of Segmentation Driven and Recognition](https://reader036.vdocuments.net/reader036/viewer/2022070222/613d2558736caf36b759dbd1/html5/thumbnails/17.jpg)
Recognition driven OCR (Component classifier)
• 512 Gradient, Structural and Concavity (GSC) features [Favata et al 96]:
– 192 gradient features with gradients quantized in 12 directions
– 192 structural features: horizontal, vertical, diagonal and corner mini-strokes
– 128 concavity features: pixel density, horizontal, vertical and concavity features
• Classifier:
– K-nearest neighbor with k=3
– Top-3 choices are returned
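A k-nearest-neighbor classifier that returns ranked choices can be sketched as follows. The feature vectors here are 2-dimensional toy values rather than 512 GSC features, the class labels are made up, and the confidence (normalized inverse-distance votes) is an assumption, since the slides do not define how confidences are computed.

```python
import numpy as np

def knn_top_choices(query, prototypes, labels, k=3):
    """Return up to k (label, confidence) pairs from the k nearest prototypes."""
    distances = np.linalg.norm(prototypes - query, axis=1)
    nearest = np.argsort(distances)[:k]
    votes = {}
    for i in nearest:
        # closer prototypes contribute larger votes to their class
        votes[labels[i]] = votes.get(labels[i], 0.0) + 1.0 / (1.0 + distances[i])
    total = sum(votes.values())
    ranked = sorted(votes.items(), key=lambda item: item[1], reverse=True)
    return [(label, weight / total) for label, weight in ranked]

prototypes = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [5.0, 5.0]])
labels = ["ka", "ka", "kha", "ga"]
choices = knn_top_choices(np.array([0.0, 0.1]), prototypes, labels)
print(choices)
```

With k=3 at most three distinct classes can be returned, and fewer when several of the nearest prototypes share a class, as in this example.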
![Page 18: Design and Comparison of Segmentation Driven and Recognition](https://reader036.vdocuments.net/reader036/viewer/2022070222/613d2558736caf36b759dbd1/html5/thumbnails/18.jpg)
Recognition driven OCR (BAG creation)
• Identify ascenders by removing shirorekha (header line)
• Use average height of core components to obtain baseline
• Retain shirorekha after obtaining core components
Figure labels: ascender, shirorekha, retained shirorekha, baseline
![Page 19: Design and Comparison of Segmentation Driven and Recognition](https://reader036.vdocuments.net/reader036/viewer/2022070222/613d2558736caf36b759dbd1/html5/thumbnails/19.jpg)
Recognition driven OCR (Details: Consonant/vowel and ascender)
Figure: flowchart (labels: ascender, core, shirorekha, baseline)
• Start processing words
• Obtain BAG (B0-m) from word image
• Obtain shirorekha and baseline
• Ascenders found?
– Yes: Classify and remove ascenders
– No: Continue with the core
• Classify consonants/vowels
• Confidence above threshold?
– Yes: Post-processing
– No: Seg (segmentation of the character)
![Page 20: Design and Comparison of Segmentation Driven and Recognition](https://reader036.vdocuments.net/reader036/viewer/2022070222/613d2558736caf36b759dbd1/html5/thumbnails/20.jpg)
Recognition driven OCR (Details: Conjunct, consonant-descender and half-consonant processing)
Figure: flowchart, continuing from Seg
• Are any blocks below baseline?
– Yes: Descender character. Segment character from top to bottom
– No: Large aspect ratio/block count?
• Large aspect ratio/block count?
– Yes: Conjunct character. Segment character from left to right, classify half-consonants
– No: No further segmentation
• Post-processing
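The two decisions in this flowchart can be read as a small routing function. The sketch below is one interpretation of the slide: image coordinates grow downward, and the limits `max_aspect` and `max_blocks` are placeholder values, since the slide gives no numeric thresholds.

```python
def route_character(block_bottoms, baseline, width, height,
                    max_aspect=1.2, max_blocks=8):
    """Decide which segmentation a low-confidence character should receive.

    block_bottoms: bottom y-coordinate of each BAG block (y grows downward).
    max_aspect and max_blocks are hypothetical limits, not values from the slides.
    """
    if any(bottom > baseline for bottom in block_bottoms):
        return "descender: segment top to bottom"
    if width / height > max_aspect or len(block_bottoms) > max_blocks:
        return "conjunct: segment left to right"
    return "no further segmentation"

print(route_character([10, 25], baseline=20, width=30, height=30))
print(route_character([10, 12], baseline=20, width=60, height=30))
print(route_character([10], baseline=20, width=20, height=30))
```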
![Page 21: Design and Comparison of Segmentation Driven and Recognition](https://reader036.vdocuments.net/reader036/viewer/2022070222/613d2558736caf36b759dbd1/html5/thumbnails/21.jpg)
Recognition driven OCR (Results of each stage)
Input word with 5 types of components: ascenders, characters w/o modifiers, conjuncts, descenders, fragmented characters
Figure: stage-by-stage pipeline with results
• Stages:
– Identify and remove ascenders; classify ascenders (6 subclasses)
– Identify and remove characters w/o modifiers; classify consonants/vowels (40 subclasses)
– Identify and remove characters with descenders; segment and classify character with descender
– Identify conjunct characters; segment and classify conjunct character
– Classify half-characters
• Identification errors shown: FRR = 0, FAR = 0; FRR = 4.93% (character w/o modifier); FAR = 8.28% (conjuncts), 4.38% (descender characters)
• Classification results shown: 99.38% top 1; 99.75% accuracy top 1; 94.12% top 5; 85.57% top 5; Accuracy: 83%; Work in progress
![Page 22: Design and Comparison of Segmentation Driven and Recognition](https://reader036.vdocuments.net/reader036/viewer/2022070222/613d2558736caf36b759dbd1/html5/thumbnails/22.jpg)
![Page 23: Design and Comparison of Segmentation Driven and Recognition](https://reader036.vdocuments.net/reader036/viewer/2022070222/613d2558736caf36b759dbd1/html5/thumbnails/23.jpg)
Character recognition results (Descender recognition example)
• Segmentation driven OCR: Average height used to obtain descender
Figure: segmentation, classifier output and truth
• Recognition driven OCR: Shirorekha and baseline, core component separation, classification, threshold confidences
Figure: segmentation and classifier result, with confidences 0.68, 0.23, 0.42, 0.36, …, 0.49, 0.31, …
![Page 24: Design and Comparison of Segmentation Driven and Recognition](https://reader036.vdocuments.net/reader036/viewer/2022070222/613d2558736caf36b759dbd1/html5/thumbnails/24.jpg)
Character recognition results (Descender recognition results)
• Segmentation driven OCR:
– Over-segmentation error: 5.73%
– Under-segmentation error: 73%
• Recognition driven OCR:
– Over-segmentation error: 4.93%
– Under-segmentation error: ~17%
![Page 25: Design and Comparison of Segmentation Driven and Recognition](https://reader036.vdocuments.net/reader036/viewer/2022070222/613d2558736caf36b759dbd1/html5/thumbnails/25.jpg)
Character recognition results (Conjunct recognition example)
• Segmentation driven OCR has fixed class space
– E.g.: a fused character is misrecognized (glyphs shown on the slide)
– E.g.: a conjunct is not present in the class space (glyph shown on the slide)
• Recognition driven OCR attempts partial results
– Recognition driven OCR gives the consonants at different segmentation points
– Recognition driven OCR gives correct results
Figure: segmentation hypotheses and classifier result for each example
![Page 26: Design and Comparison of Segmentation Driven and Recognition](https://reader036.vdocuments.net/reader036/viewer/2022070222/613d2558736caf36b759dbd1/html5/thumbnails/26.jpg)
Character recognition results (Conjunct recognition results)
• Segmentation driven:
– Only 32 classes present, covering 60.32% of conjuncts
• Recognition driven:
– Handles additional 65 classes, covering 87.60% of all conjuncts
– Lends itself to post-processing
![Page 27: Design and Comparison of Segmentation Driven and Recognition](https://reader036.vdocuments.net/reader036/viewer/2022070222/613d2558736caf36b759dbd1/html5/thumbnails/27.jpg)
![Page 28: Design and Comparison of Segmentation Driven and Recognition](https://reader036.vdocuments.net/reader036/viewer/2022070222/613d2558736caf36b759dbd1/html5/thumbnails/28.jpg)
Post processing (OCR framework)
• Segmentation driven OCR gives one result for each component (example shown on the slide)
• Recognition driven OCR gives a lattice of components (example shown on the slide)
Figure: lattice containing component hypotheses
![Page 29: Design and Comparison of Segmentation Driven and Recognition](https://reader036.vdocuments.net/reader036/viewer/2022070222/613d2558736caf36b759dbd1/html5/thumbnails/29.jpg)
Post processing (Possible approaches)
• Prune classifier results using rules of “script writing grammar” [Sinha 87]:
– E.g.: Vowel modifiers must be preceded by a consonant
• Use Devanagari phonetic properties: [Ohala 83]
– Breathy voiced stops (BVS) do not follow each other
– Very few consonants occur twice in the same word
– BVS rarely co-occur with vowel modifiers in between
• Stochastic language models can be used before dictionary lookup
![Page 30: Design and Comparison of Segmentation Driven and Recognition](https://reader036.vdocuments.net/reader036/viewer/2022070222/613d2558736caf36b759dbd1/html5/thumbnails/30.jpg)
Post processing (Implementation)
• Stochastic FSA can represent rules and statistical measures.
• Example: a simplified FSA to reject and accept component sequences (sequences shown on the slide)
– S: Start/Accept state
– hC: State after accepting half-consonant
– C: State after accepting full-consonant
– CV1, CV2: States after accepting vowel modifiers
– Trigger: P( , ) = 0.5
Figure: FSA with states S, hC, C, CV1, CV2
![Page 31: Design and Comparison of Segmentation Driven and Recognition](https://reader036.vdocuments.net/reader036/viewer/2022070222/613d2558736caf36b759dbd1/html5/thumbnails/31.jpg)
Post processing (Implementation)
• Example:
– Trigger: Same consonant in a word
– Transition probabilities of the FSA favor one candidate word over the other (words shown on the slide)
Figure: FSA, state labels CV2 C CS E
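To make the idea concrete, here is a toy stochastic FSA over component types. It is only a sketch: the transition table, every probability, the single vowel-modifier state `CV`, the set of states from which a word may end, and the `repeat_penalty` trigger are all assumptions made for illustration, not values from the slides. It encodes the rule that a vowel modifier must be preceded by a consonant and lowers the score when the same consonant occurs twice in a word.

```python
# (state, symbol) -> (next_state, probability); all numbers are illustrative.
TRANSITIONS = {
    ("S", "half"): ("hC", 0.2),
    ("S", "full"): ("C", 0.8),
    ("hC", "full"): ("C", 1.0),
    ("C", "full"): ("C", 0.4),
    ("C", "half"): ("hC", 0.1),
    ("C", "vm"): ("CV", 0.5),
    ("CV", "full"): ("C", 0.7),
    ("CV", "half"): ("hC", 0.3),
}
# Assumed states from which the word may end (i.e. return to S).
ACCEPTING = {"C", "CV"}

def score(sequence, repeat_penalty=0.1):
    """sequence: list of (kind, name) pairs, kind in {"half", "full", "vm"}."""
    state, probability, seen = "S", 1.0, set()
    for kind, name in sequence:
        step = TRANSITIONS.get((state, kind))
        if step is None:
            return 0.0                      # rule violation: prune this hypothesis
        state, p = step
        probability *= p
        if kind == "full":
            if name in seen:                # trigger: same consonant again in the word
                probability *= repeat_penalty
            seen.add(name)
    return probability if state in ACCEPTING else 0.0

print(score([("vm", "i")]))                        # modifier without a consonant
print(score([("full", "ka"), ("vm", "i")]))        # consonant then modifier
print(score([("full", "ka"), ("full", "ka")]))     # repeated consonant
print(score([("full", "ka"), ("full", "ma")]))     # two different consonants
```

Hypotheses with a zero score are dropped from the lattice; the rest can be ranked by score before dictionary lookup.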
![Page 32: Design and Comparison of Segmentation Driven and Recognition](https://reader036.vdocuments.net/reader036/viewer/2022070222/613d2558736caf36b759dbd1/html5/thumbnails/32.jpg)
![Page 33: Design and Comparison of Segmentation Driven and Recognition](https://reader036.vdocuments.net/reader036/viewer/2022070222/613d2558736caf36b759dbd1/html5/thumbnails/33.jpg)
Word recognition results (Example)
• A word with a fused character yields ~25 word options
• 5 words are left after FSA based pruning
![Page 34: Design and Comparison of Segmentation Driven and Recognition](https://reader036.vdocuments.net/reader036/viewer/2022070222/613d2558736caf36b759dbd1/html5/thumbnails/34.jpg)
Word recognition results (Example)
Table: input word, segmentation driven output, recognition driven output and string edit distance for three cases:
• Input word with no descender, conjunct or fused characters
• Input word with descender
• Input word with conjunct and fused character
![Page 35: Design and Comparison of Segmentation Driven and Recognition](https://reader036.vdocuments.net/reader036/viewer/2022070222/613d2558736caf36b759dbd1/html5/thumbnails/35.jpg)
Word recognition results (Segmentation driven vs Recognition driven)
• Average string edit distance decreased by 50%
– Number of errors cut by almost half
• Number of words at edit distance 4 decreased by 50%
• Edit distance 1 results nearly doubled
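String edit distance here is taken to be the standard Levenshtein distance: the minimum number of insertions, deletions and substitutions needed to turn the OCR output into the truth. A minimal sketch follows; it works on any sequences, so a word can be passed as a list of component labels (the labels below are made up).

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences, using two DP rows."""
    previous = list(range(len(b) + 1))       # distance from empty prefix of a
    for i, x in enumerate(a, start=1):
        current = [i]                        # distance to empty prefix of b
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            current.append(min(previous[j] + 1,          # delete x
                               current[j - 1] + 1,       # insert y
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]

print(edit_distance("kitten", "sitting"))
print(edit_distance(["ka", "ma", "la"], ["ka", "la"]))
```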
![Page 36: Design and Comparison of Segmentation Driven and Recognition](https://reader036.vdocuments.net/reader036/viewer/2022070222/613d2558736caf36b759dbd1/html5/thumbnails/36.jpg)
Word recognition results(Comparison with prior work)
• Most reported results are on font-specific systems
• Recognition driven OCR is superior for multi-font data
![Page 37: Design and Comparison of Segmentation Driven and Recognition](https://reader036.vdocuments.net/reader036/viewer/2022070222/613d2558736caf36b759dbd1/html5/thumbnails/37.jpg)
![Page 38: Design and Comparison of Segmentation Driven and Recognition](https://reader036.vdocuments.net/reader036/viewer/2022070222/613d2558736caf36b759dbd1/html5/thumbnails/38.jpg)
Contributions
• New representation scheme for nonlinear, multi-font character segmentation
• Framework for recognition driven Devanagari OCR
– Recognition results are better than segmentation driven OCR
• Stochastic language model to prune OCR results before dictionary lookup
• 75.28% word recognition on multi-font documents
![Page 39: Design and Comparison of Segmentation Driven and Recognition](https://reader036.vdocuments.net/reader036/viewer/2022070222/613d2558736caf36b759dbd1/html5/thumbnails/39.jpg)
Work in progress (Enhancing the Devanagari language model)
• Adding additional rules into the language model
• Comparison with studies in entropy-reduction
– Word level trigger pairs reduce cross-entropy of English by 17-24% [Rosenfeld 96]
  • Application: Speech recognition results improved by 10-14% with this model
– Character n-grams:
  • Classing used to improve bi-gram probabilities $P(x_i \mid x_{i-1})$
    – E.g.: All digits placed in one class
  • Linear combination of history used to obtain probability
    – $P_{\mathrm{combined}}(x_i \mid h) = \sum_{j=1}^{k} \lambda_j \, P(x_i \mid h_j)$, where $j \in \{1, \dots, k\}$
• Using all 3 top choices of the classifier (only the top choice is being used currently)
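The linear combination of history-specific estimates amounts to a weighted sum. A small sketch, assuming the weights are non-negative and sum to 1 so that the result stays a probability (the slides do not state this constraint, and the numbers are made up):

```python
def interpolate(estimates, weights):
    """Weighted sum of history-specific estimates P(x | h_j)."""
    if len(estimates) != len(weights):
        raise ValueError("one weight per estimate is required")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * p for w, p in zip(weights, estimates))

# P(x | h_j) for one candidate character x under k = 3 histories,
# e.g. no history, previous character, class of previous character.
estimates = [0.10, 0.30, 0.50]
weights = [0.2, 0.3, 0.5]
p_combined = interpolate(estimates, weights)
print(p_combined)
```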
![Page 40: Design and Comparison of Segmentation Driven and Recognition](https://reader036.vdocuments.net/reader036/viewer/2022070222/613d2558736caf36b759dbd1/html5/thumbnails/40.jpg)
Work in progress (Enhancing the Devanagari language model)
• Classing done using phonetic properties of characters
• Obtain a lower entropy using proposed language model and compare with:
– Random classing
– Reduction in number of classes (reducing the number of classes inherently decreases the entropy)
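The remark that fewer classes inherently means lower entropy can be checked on a toy distribution: merging two symbols into one class never increases Shannon entropy. The distribution below is made up for illustration.

```python
import math

def entropy(probabilities):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# Four equally likely symbols, then the first two merged into one class.
original = [0.25, 0.25, 0.25, 0.25]
merged = [original[0] + original[1], original[2], original[3]]

h_original = entropy(original)
h_merged = entropy(merged)
print(h_original, h_merged)
```

This is why a phonetic classing has to be compared against random classing with the same number of classes: only the part of the entropy reduction that exceeds the random baseline can be credited to the language model.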