Imaged Document Text Retrieval without OCR
IEEE Trans. on PAMI vol.24, no.6
June, 2002
Presenter: 周遵儒
Outline
  Introduction
  HTD and VTD
  Class of Character Objects
  Similarity Measure of Documents
  Experimental Results
  Conclusions
Introduction
  Retrieval of imaged documents
  Processing with OCR vs. without OCR
  Language dependence vs. language independence
Procedure
  Image preprocessing
  Feature extraction of character objects
    Horizontal Traverse Density (HTD)
    Vertical Traverse Density (VTD)
  Clustering
    To identify classes of character objects
  Document representation
    Hash table
  N-gram
    To construct indexes for imaged document retrieval
Features: HTD and VTD
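
The slide names the two features but its definition did not survive extraction. The following is a minimal sketch, assuming that a traverse density is the number of ink runs a scan line crosses: HTD holds one count per row and VTD one count per column. The function names `crossings`, `htd` and `vtd`, the 0/1 list-of-lists glyph format, and this reading of "traverse density" are all assumptions for illustration, not taken from the slides.

```python
from typing import List, Sequence

Glyph = List[List[int]]  # assumed format: 1 = ink pixel, 0 = background


def crossings(line: Sequence[int]) -> int:
    """Count the ink runs met while scanning one line (0 -> 1 transitions)."""
    count, prev = 0, 0
    for pixel in line:
        if pixel and not prev:
            count += 1
        prev = pixel
    return count


def htd(glyph: Glyph) -> List[int]:
    """Assumed HTD: one crossing count per row (horizontal scan lines)."""
    return [crossings(row) for row in glyph]


def vtd(glyph: Glyph) -> List[int]:
    """Assumed VTD: one crossing count per column (vertical scan lines)."""
    return [crossings(col) for col in zip(*glyph)]


# A toy 5x4 bitmap shaped like the letter "H".
letter_h = [
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
]
print(htd(letter_h))  # [2, 2, 1, 2, 2]
print(vtd(letter_h))  # [1, 1, 1, 1]
```

Under this assumption both vectors describe stroke structure rather than character identity, which is what lets the later steps work without recognizing any character.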
Class of Character Objects
  Unsupervised clustering with HTD and VTD
  Distance measure of character objects
Distance Measure of Character Objects
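
The formula on this slide was an image and is not recoverable from the transcript, so the sketch below does not reproduce the paper's measure. It only illustrates how a distance over (HTD, VTD) pairs could drive unsupervised clustering. The resampling to a fixed length, the mean-absolute-difference distance, the leader-style clustering and the `threshold` value are all assumptions chosen for brevity.

```python
from typing import List, Sequence, Tuple

# Assumed representation of one character object: (HTD vector, VTD vector).
Feature = Tuple[Sequence[float], Sequence[float]]


def resample(vec: Sequence[float], n: int = 8) -> List[float]:
    """Nearest-neighbour resampling to n entries, so glyphs of different
    sizes can be compared. Expects a non-empty vector."""
    return [vec[i * len(vec) // n] for i in range(n)]


def distance(a: Feature, b: Feature, n: int = 8) -> float:
    """Assumed measure: mean absolute difference of the resampled HTD
    vectors plus that of the resampled VTD vectors."""
    total = 0.0
    for va, vb in zip(a, b):  # first the HTD pair, then the VTD pair
        ra, rb = resample(va, n), resample(vb, n)
        total += sum(abs(x - y) for x, y in zip(ra, rb)) / n
    return total


def cluster(features: Sequence[Feature], threshold: float = 0.5) -> List[int]:
    """Leader clustering: assign each object to the first class whose
    representative is within `threshold`, otherwise open a new class."""
    leaders: List[Feature] = []
    labels: List[int] = []
    for f in features:
        for k, leader in enumerate(leaders):
            if distance(f, leader) <= threshold:
                labels.append(k)
                break
        else:
            leaders.append(f)
            labels.append(len(leaders) - 1)
    return labels


# Toy features: an "H", an "I"-like bar, and a larger "H".
h_small = ([2, 2, 1, 2, 2], [1, 1, 1, 1])
bar = ([1, 1, 1, 1, 1], [1])
h_large = ([2, 2, 2, 1, 2, 2, 2], [1, 1, 1, 1, 1])
print(cluster([h_small, bar, h_large]))  # [0, 1, 0]
```

The class labels produced this way stand in for character codes: two occurrences of the same shape get the same label even though no character is ever recognized.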
Examples of Character Objects
Similarity Measure of Documents
  N-gram algorithm
  Cosine angle between two documents
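
As a rough illustration of the two bullets above: if each document is reduced to a sequence of character-class labels, its n-grams can be counted in a hash table and two documents compared by the cosine of the angle between their count vectors. The sketch assumes integer class labels, raw n-gram counts as weights and `n=2`; the function names are hypothetical and any weighting used in the paper is not shown on the slide.

```python
import math
from collections import Counter
from typing import Sequence, Tuple


def ngram_counts(labels: Sequence[int], n: int = 2) -> "Counter[Tuple[int, ...]]":
    """Hash table mapping each n-gram of class labels to its frequency."""
    return Counter(tuple(labels[i:i + n]) for i in range(len(labels) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse n-gram count vectors."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Toy documents given as sequences of class labels.
doc_a = ngram_counts([0, 1, 2, 0, 1, 2])
doc_b = ngram_counts([0, 1, 2, 0, 1, 3])  # differs from doc_a in the last label
doc_c = ngram_counts([5, 6, 7, 8])        # shares no n-gram with doc_a

print(cosine(doc_a, doc_b) > cosine(doc_a, doc_c))  # True
```

Storing only the n-grams that occur keeps the vectors sparse, and the dot product only needs to visit keys present in both tables.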
Corpus
  UW1 database (600 dpi)
Experimental Results
  Corpus I
  E01-E26
Experimental Results
  Corpus II
Experimental Results
Experimental Results
Experimental Results
Experimental Results
Conclusion and Future Work
  A new method for imaged document retrieval without OCR
  Language-independent retrieval
  Improvement of robustness for different fonts and noisy documents