Dr. István Marosi
Scansoft-Recognita, Inc., Hungary
SSIP 2005, Szeged
Character Recognition Internals
04 Jul 2005 Istvan Marosi

OCR Internals
Main tasks of an OCR system:
- Image acquisition
  - Get image
    - B/W scanning
    - Gray scanning
    - Color scanning
    - Load from image file
  - Preprocess image
    - Color separation
    - Thresholding
    - Despeckling
    - Rotation
    - Deskewing
- Layout recognition
- Text recognition
- User assisted correction
- Result exportation

[Figures: color separation; de-speckle, de-skew]
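
Two of the preprocessing steps listed above, thresholding and despeckling, can be sketched in a few lines. This is only an illustration, not the method of any particular engine: the fixed threshold of 128, the 8-neighbour isolation test, and the function names `threshold` and `despeckle` are assumptions made for the example.

```python
def threshold(gray, t=128):
    """Turn a grayscale image (rows of 0..255 values) into a bilevel image.

    1 means black (ink), 0 means white. The fixed cut-off t is an assumption;
    real systems usually choose it adaptively.
    """
    return [[1 if v < t else 0 for v in row] for row in gray]


def despeckle(bw):
    """Remove black pixels that have no black pixel among their 8 neighbours."""
    h, w = len(bw), len(bw[0])
    out = [row[:] for row in bw]
    for y in range(h):
        for x in range(w):
            if bw[y][x] != 1:
                continue
            has_black_neighbour = any(
                bw[ny][nx] == 1
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
                if (ny, nx) != (y, x)
            )
            if not has_black_neighbour:
                out[y][x] = 0  # isolated speck: erase it
    return out


gray = [
    [255, 255, 255, 255],
    [255,   0, 255, 255],   # a lone dark pixel (speck)
    [255, 255, 255,  20],
    [255, 255, 255,  30],   # two dark pixels that touch each other
]
clean = despeckle(threshold(gray))
```

Here the lone dark pixel is erased while the two touching dark pixels survive.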

The Preprocessed Image
[Figures: examples of joined chars and of broken chars]

OCR Internals
Main tasks of an OCR system:
- Image acquisition
- Layout recognition
  - Text zones
    - Columns of flowed text
    - Tables
    - Inverse text
  - Graphic zones
    - Line art
    - Photo
- Text recognition
- User assisted correction
- Result exportation

OCR Internals
Main tasks of an OCR system:
- Image acquisition
- Layout recognition
- Text recognition
  - Segmentation
  - Calculation of Feature Vector Elements
  - Classification
  - Language Analysis
  - Voting
- User assisted correction
- Result exportation

Segmentation
Which groups of pixels belong to a single letter?
[Figure sequence: segmentation examples]
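
A common first guess at the segmentation question is to treat every connected group of black pixels as one letter. The sketch below labels 8-connected components with a breadth-first search and returns their bounding boxes. The function name, the 8-connectivity, and the `(left, top, right, bottom)` box format are assumptions for illustration. Joined characters (one component, several letters) and broken characters (several components, one letter) are exactly the cases where this guess fails.

```python
from collections import deque


def connected_components(bw):
    """Return bounding boxes (left, top, right, bottom) of 8-connected black blobs.

    bw is a list of rows; 1 means black, 0 means white.
    Boxes are reported in row-major order of each blob's first pixel.
    """
    h, w = len(bw), len(bw[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if bw[y][x] != 1 or seen[y][x]:
                continue
            queue = deque([(y, x)])
            seen[y][x] = True
            top, left, bottom, right = y, x, y, x
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                # visit the 8 neighbours that are black and not yet seen
                for ny in range(max(0, cy - 1), min(h, cy + 2)):
                    for nx in range(max(0, cx - 1), min(w, cx + 2)):
                        if bw[ny][nx] == 1 and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((left, top, right, bottom))
    return boxes


page = [
    [1, 1, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [0, 0, 0, 0, 1],
]
boxes = connected_components(page)
```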

Calculation of FV Elements: Contour Tracing
- Find a (new) white-black transition
- Follow the “edge” of the pixels using the MIN or MAX rule
- Keep a record of the white-black transitions already traced
- Collect information while going around
- And repeat the process on new shapes ...

The two rules, each shown on the slides with a diagram of the two probe pixels a and b:

    if black(a) then turn(ccw)
    else if black(b) then forward
    else turn(cw)

    if white(b) then turn(cw)
    else if white(a) then forward
    else turn(ccw)
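
The tracing rules can be tried out with a small sketch. Several details are assumptions, because the transcript does not show the diagrams: the tracer walks along pixel edges with black on its right, `a` is taken to be the pixel ahead on the left and `b` the pixel ahead on the right, every decision is followed by one step along an edge, and pixels outside the image count as white. The transcript also does not say which rule is called MIN and which MAX, so the code only distinguishes them by the flag `a_first`. The image must contain at least one black pixel.

```python
EAST, SOUTH, WEST, NORTH = 0, 1, 2, 3   # clockwise on screen (y grows downward)
STEP = {EAST: (1, 0), SOUTH: (0, 1), WEST: (-1, 0), NORTH: (0, -1)}


def is_black(img, x, y):
    """Pixels outside the image are treated as white."""
    return 0 <= y < len(img) and 0 <= x < len(img[0]) and img[y][x] == 1


def probe_pixels(vx, vy, heading):
    """Coordinates of pixel a (ahead-left) and b (ahead-right) at corner (vx, vy)."""
    if heading == EAST:
        return (vx, vy - 1), (vx, vy)
    if heading == SOUTH:
        return (vx, vy), (vx - 1, vy)
    if heading == WEST:
        return (vx - 1, vy), (vx - 1, vy - 1)
    return (vx - 1, vy - 1), (vx, vy - 1)   # NORTH


def trace(img, a_first=True):
    """Trace the contour of the first black pixel found in row-major order.

    Returns (edges, turns): edges are (corner, heading) pairs, turns are
    +1 for cw, -1 for ccw and 0 for forward.
    """
    x0, y0 = next((x, y) for y, row in enumerate(img)
                  for x, v in enumerate(row) if v == 1)
    # Start on the left edge of that pixel, heading up: white on the left,
    # black on the right, i.e. a white-black transition.
    start = ((x0, y0 + 1), NORTH)
    (vx, vy), heading = start
    edges, turns = [], []
    while True:
        edges.append(((vx, vy), heading))
        dx, dy = STEP[heading]
        vx, vy = vx + dx, vy + dy
        a, b = probe_pixels(vx, vy, heading)
        a_black, b_black = is_black(img, *a), is_black(img, *b)
        if a_first:   # first rule: test black(a), then black(b)
            turn = -1 if a_black else (0 if b_black else +1)
        else:         # second rule: test white(b), then white(a)
            turn = +1 if not b_black else (0 if not a_black else -1)
        turns.append(turn)
        heading = (heading + turn) % 4
        if ((vx, vy), heading) == start:
            return edges, turns


diagonal = [[1, 0],
            [0, 1]]
edges_a, turns_a = trace(diagonal, a_first=True)
edges_b, turns_b = trace(diagonal, a_first=False)
```

On two pixels that touch only at a corner, the two rules differ: under the stated assumptions the first rule walks around both pixels as one shape (8 edges), while the second walks around the first pixel alone (4 edges).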

Some Easily Calculable Data
Problem #1 and Problem #2
- Turning CW: $I_n = I_{n-1} + 1$
- Turning CCW: $I_n = I_{n-1} - 1$
- Going forward: $I_n = I_{n-1}$
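
The turning recurrence is a running count of quarter turns. A minimal sketch, assuming the moves of a closed contour are available as the strings `"CW"`, `"CCW"` and `"F"` (a hypothetical encoding): a contour that returns to its starting edge ends at +4 or -4, depending on the direction in which it was walked. What the slides use the value for is not stated in the transcript. Telling outer contours from the contours of holes is one plausible use.

```python
DELTA = {"CW": +1, "CCW": -1, "F": 0}   # I_n = I_{n-1} + delta


def running_turn_count(moves):
    """Return the list I_1, I_2, ... for a sequence of moves, starting from I_0 = 0."""
    i, history = 0, []
    for move in moves:
        i += DELTA[move]
        history.append(i)
    return history


# Turn sequence of a closed contour around two diagonally touching pixels.
moves = ["CW", "CW", "CCW", "CW", "CW", "CW", "CCW", "CW"]
history = running_turn_count(moves)
```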

Some Easily Calculable Data
Problem #3 and Problem #4
- Going up: $I_n = I_{n-1} - X_n$
- Going down: $I_n = I_{n-1} + X_n$
- Going right: $I_n = I_{n-1}$
- Going left: $I_n = I_{n-1}$
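
If $X_n$ is read as the x-coordinate of the tracer while it moves one unit up or down, the recurrence accumulates the area enclosed by the contour, a discrete form of Green's theorem. That the slides intend the area is an assumption, as are the move letters `U`, `D`, `L`, `R` and the function name below. The sign of the result depends on the direction of travel, so the sketch returns the signed sum.

```python
def signed_area(start_x, moves):
    """Accumulate I along a closed path of unit moves.

    moves is a string of 'U', 'D', 'L', 'R'. Going up subtracts the current
    x-coordinate, going down adds it, horizontal moves only change x.
    """
    x, i = start_x, 0
    for move in moves:
        if move == "U":
            i -= x
        elif move == "D":
            i += x
        elif move == "R":
            x += 1
        elif move == "L":
            x -= 1
        else:
            raise ValueError(f"unknown move {move!r}")
    return i


unit_pixel = signed_area(0, "URDL")              # around a single pixel
rectangle = signed_area(0, "UUURRDDDLL")         # around a 2 x 3 rectangle
two_diagonal = signed_area(0, "URDRDLUL")        # around two corner-touching pixels
```

Walking the same paths in the opposite direction gives the same magnitudes with a negative sign.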

Classification; Training models
Restricted Coulomb Energy (RCE) Network (Dr. Leon Cooper, Dr. Charles Elbaum)
[Figure: regions labeled A, B and AB]

Classification; Training models
Nestor Learning System (NLS)
[Figure sequence; captions in order of appearance:]
- Default radius $R_{\max}$
- Decreased radius
- Decreased radius $R_{\min}$
- Pass 2: decreased radius
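
The captions above (default radius, decreased radius, a lower bound, a second pass) match the usual textbook description of RCE-style training, which can be sketched as follows. This is a generic illustration under stated assumptions, not the NLS implementation: prototypes are hyperspheres, a new prototype starts with radius `r_max`, a prototype that fires for a sample of another class is shrunk to just exclude that sample but never below `r_min`, and passes repeat until nothing changes. The class name `RCE` and its method names are hypothetical.

```python
import math


class RCE:
    def __init__(self, r_max, r_min):
        self.r_max, self.r_min = r_max, r_min
        self.prototypes = []            # each entry: [center, radius, label]

    def firing(self, x):
        """Prototypes whose sphere strictly contains x."""
        return [p for p in self.prototypes if math.dist(p[0], x) < p[1]]

    def train_pass(self, samples):
        """One pass over the training set; returns True if anything changed."""
        changed = False
        for x, label in samples:
            covered = False
            for p in self.firing(x):
                if p[2] == label:
                    covered = True
                else:
                    # wrong class fires: decrease its radius, bounded by r_min
                    new_radius = max(self.r_min, math.dist(p[0], x))
                    if new_radius < p[1]:
                        p[1] = new_radius
                        changed = True
            if not covered:
                # no prototype of the right class fires: commit a new one
                self.prototypes.append([x, self.r_max, label])
                changed = True
        return changed

    def train(self, samples, max_passes=10):
        """Repeat passes until stable; returns the number of passes used."""
        for n in range(1, max_passes + 1):
            if not self.train_pass(samples):
                return n
        return max_passes

    def classify(self, x):
        """Label if exactly one class fires, otherwise None (unknown or ambiguous)."""
        labels = {p[2] for p in self.firing(x)}
        return labels.pop() if len(labels) == 1 else None


samples = [((0.0,), "A"), ((1.0,), "A"), ((3.0,), "B"), ((4.0,), "B")]
net = RCE(r_max=2.5, r_min=0.1)
passes = net.train(samples)
```

In this toy run the B prototype is created with the default radius in the first pass, shrunk in the second pass because it fires for an A sample, and the third pass confirms that nothing changes any more. A point covered by both classes is reported as ambiguous, which corresponds to a region labeled with two classes.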

Voting: Text recognition in OmniPage Pro
OCR engines available:
- Caere’s engine (codename: Salt & Pepper)
  - Uses a Matrix Matching based algorithm
  - Feature set: 40 cells of an 8x5 grid
  - Good overall description of a shape
  - Weaker at detailed structure
- Recognita’s engine (codename: Paprika)
  - Uses a Contour Tracing based algorithm
  - Feature set: convex and concave arcs on the contour
  - Good detailed description of a shape
  - Weaker at overall structure
- ScanSoft’s engine (codename: Fireworx)
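
A 40-element feature vector over an 8x5 grid can be illustrated as follows. The sketch lays a grid of 8 rows by 5 columns over the glyph bitmap and uses the fraction of black pixels in each cell as the feature. Both the orientation of the grid and the use of ink density as the cell value are assumptions; the slides only state the grid size.

```python
def grid_features(glyph, rows=8, cols=5):
    """Return rows*cols ink densities (0.0 .. 1.0), row by row.

    glyph is a list of pixel rows with 1 for black and 0 for white.
    Cells that receive no pixels (glyph smaller than the grid) get 0.0.
    """
    h, w = len(glyph), len(glyph[0])
    features = []
    for r in range(rows):
        y0, y1 = r * h // rows, (r + 1) * h // rows
        for c in range(cols):
            x0, x1 = c * w // cols, (c + 1) * w // cols
            cell = [glyph[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            features.append(sum(cell) / len(cell) if cell else 0.0)
    return features


# A 16 x 10 test glyph whose left half is black.
glyph = [[1 if x < 5 else 0 for x in range(10)] for _ in range(16)]
features = grid_features(glyph)
```

Such a vector captures where the ink is overall, but fine details inside a cell are averaged away, which is consistent with the strengths and weaknesses listed for the matrix matching approach.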

Voting: Text recognition in OmniPage Pro
Segmentation algorithms:
- Developed by independent groups
- Have different strengths and weaknesses
Conclusion:
- They are complementary
- Let’s create a voting system

Voting: Voting strategies
- External "black box" voting: ~20% gain
[Diagram: the Image is fed to Paprika, Salt & Pepper and Fireworx; their outputs Txt 1, Txt 2 and Txt 3 are combined by Vote, together with Dict, into the Final Txt]
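
External voting treats each engine as a black box and only compares their text outputs. A minimal sketch, assuming the three outputs are already aligned word by word (real outputs would first need alignment) and that ties are broken by dictionary lookup. The function names and the tie-breaking policy are illustrative assumptions, not the OmniPage logic.

```python
from collections import Counter


def vote_word(candidates, dictionary):
    """Pick one word from the engines' candidates for the same position."""
    counts = Counter(candidates)
    best = max(counts.values())
    tied = [w for w in counts if counts[w] == best]   # keeps engine order
    if len(tied) == 1:
        return tied[0]                                # clear majority
    for word in tied:
        if word.lower() in dictionary:
            return word                               # dictionary breaks the tie
    return tied[0]                                    # fall back to the first engine


def vote_text(outputs, dictionary):
    """Vote position by position over equally long word lists."""
    return [vote_word(words, dictionary) for words in zip(*outputs)]


txt1 = ["sorne", "text", "here"]
txt2 = ["some", "text", "hcre"]
txt3 = ["some", "tcxt", "herc"]
final = vote_text([txt1, txt2, txt3], dictionary={"some", "text", "here"})
```

The first two words are decided by majority; for the third, all engines disagree and the dictionary selects the valid word.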

Voting: Voting strategies
- External "black box" voting
- Internal "shape" voting
[Diagram labels: Image, Paprika, Fireworx, Salt & Pepper, Bronze, Txt 1, Txt 2, Txt 3, Dict, Final Txt]

Voting: Paprika
Original segmentation: every independent connected component is a character
- Good segmentation: recognize
- Bad segmentation: reject
Processing steps, in the order the slides introduce them:
- Recognize original segmentation
- Train adaptive classifier from original shapes
- Recognize broken and joined shapes (try several segmentations, loop if unrecognizable)
- Train adaptive classifier from ‘ugly’ shapes
- Recognize more broken and joined shapes (try several segmentations, loop if unrecognizable)
[Diagram labels: Image, K.B., Adaptive K.B., Dict, Txt 1, Txt 2, Txt 3]

Voting: Voting strategies
~60% gain
[Diagram: same components as the internal "shape" voting diagram: Image, Paprika, Fireworx, Salt & Pepper, Bronze, Dict, Final Txt]

OCR Internals
Main tasks of an OCR system:
- Image acquisition
- Layout recognition
- Text recognition
- User assisted correction
  - By the user’s random editing...
    - Pop-up verifier
    - Manual Training
  - By proofreading of doubtful words
    - Correct: User dictionary
    - Changed: IntelliTrain
      - Remember trained characters
      - Apply them on following pages
- Result exportation

IntelliTrain
- Recognized word: sorneUüng
- Fixed word: something
- Substitutions found: m (recognized as rn), thi (recognized as Uü)
Perform automatically:
- Learn image pattern and substitution info
- Find similar substituted (‘blue’) text on the current page
- Match against pattern of substitution and correct
- Find such errors on following pages, too
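
The text side of this idea can be sketched with Python's standard `difflib`: compare the recognized word with the user's fix, keep the differing spans as substitutions, and try them on other words. This covers only the string comparison. Matching the learned image pattern, which the slide also mentions, is not modelled, and the function names and the dictionary check are assumptions for illustration.

```python
from difflib import SequenceMatcher


def find_substitutions(recognized, fixed):
    """Return (correct, misrecognized) pairs for the spans that differ."""
    matcher = SequenceMatcher(None, recognized, fixed, autojunk=False)
    return [
        (fixed[j1:j2], recognized[i1:i2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def apply_substitutions(word, substitutions, dictionary):
    """Try each learned substitution; accept a result only if it is a known word."""
    for correct, wrong in substitutions:
        if wrong and wrong in word:
            candidate = word.replace(wrong, correct)
            if candidate in dictionary:
                return candidate
    return word


subs = find_substitutions("sorneUüng", "something")
dictionary = {"something", "time", "corner"}
fixed_word = apply_substitutions("tirne", subs, dictionary)
untouched = apply_substitutions("corner", subs, dictionary)
```

The dictionary check keeps a legitimate word such as "corner" from being rewritten just because it contains "rn".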

OCR Internals
Main tasks of an OCR system:
- Image acquisition
- Layout recognition
- Text recognition
- User assisted correction
- Result exportation
  - Combine pages into a Document
    - Header / Footer recognition
    - Page numbers
    - Hyperlinks (e.g. "See Table 20")
  - Save results
    - doc file
    - e-mail
    - Speech synthesizer