
Document Image Retrieval using Bag of Visual Words Model

Ravi Shekhar, CVIT, IIIT Hyderabad

Advisor: Prof. C.V. Jawahar


Motivation

• A large number of printed books have been digitized
• Digital libraries such as the Universal Digital Library (UDL), the Digital Library of India (DLI) and Google Books
• An efficient and effective methodology is needed for content-level access

[Figure: digital library database]


Process Overview

[Figure: retrieval pipeline with the stages Documents, Scanning, Database, Index, Input Query, Processing, Matching and Retrieved Documents]

• Matching can be done at two levels: “text” and “image”


Matching Approaches

• Recognition-based approach (text-level matching)
  • Optical Character Recognition (OCR)
• Recognition-free approach (image-level matching)
  • Word spotting


Recognition-Based Approach

• Optical Character Recognition (OCR)
  • Binarization of the document
  • Segmentation using connected components
    • Line level
    • Word level
    • Character level
  • Character recognition using features such as patch and profile features
  • Classification using ANN or SVM
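The binarization and connected-component segmentation steps can be sketched in plain Python. This is a minimal illustration, not the pipeline of any particular OCR: it assumes a grayscale image given as a list of rows with values 0 to 255, a fixed global threshold of 128 (real systems usually pick the threshold adaptively, e.g. with Otsu's method) and 8-connectivity. The function names are hypothetical.

```python
from collections import deque


def binarize(gray, threshold=128):
    """Map dark pixels (ink) to 1 and light pixels (background) to 0."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def connected_components(binary):
    """Label 8-connected ink regions; return one bounding box
    (top, left, bottom, right) per component, in scan order."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for r in range(h):
        for c in range(w):
            if binary[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill starting from this unvisited ink pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and binary[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


gray = [
    [255,   0, 255, 255,  30],
    [255,   0, 255, 255,  30],
    [255, 255, 255, 255, 255],
]
print(connected_components(binarize(gray)))  # [(0, 1, 1, 1), (0, 4, 1, 4)]
```

Each bounding box becomes one segmentation unit. A character broken by degradation yields two boxes, and two touching characters yield one, which is one way the cuts and merges of recognition-based pipelines arise.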


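Limitations of Recognition-Based Approach

• Cuts (a single character broken into several pieces)
• Merges (adjacent characters touching, so they are segmented as one)
• Variation in script
• Variation in font and typesetting
• Underlined and overwritten text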


Recognition-Free Approach

• Word spotting
  • Word images are represented using global (profile) features
  • Features are matched using distance measures such as L1 and L2
  • Word images of different sizes are compared using Dynamic Time Warping (DTW)
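Dynamic Time Warping aligns two feature sequences of different lengths by finding the cheapest monotonic alignment between them. The sketch below assumes each word image has already been reduced to a sequence of per-column profile feature vectors (lists of floats) and uses the L1 distance as the local cost; both are illustrative assumptions rather than the exact setup of any particular word-spotting system.

```python
def l1(a, b):
    """L1 distance between two equal-length feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def dtw_distance(seq_a, seq_b, dist=l1):
    """Classic O(n*m) DTW between two sequences of feature vectors."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] = best alignment cost of seq_a[:i] with seq_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # seq_b[j-1] also matches seq_a[i-2]
                                 cost[i][j - 1],      # seq_a[i-1] also matches seq_b[j-2]
                                 cost[i - 1][j - 1])  # advance in both sequences
    return cost[n][m]


word = [[0.0], [1.0], [2.0], [1.0]]
stretched = [[0.0], [1.0], [1.0], [2.0], [2.0], [1.0]]  # same shape, wider image
print(dtw_distance(word, stretched))  # 0.0
```

Filling the table costs O(nm) distance computations per pair of word images; repeated against every word in a collection, this is one reason word spotting is considered too slow for real-time use.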


Why a Recognition-Free Approach?

• Robust OCRs are unavailable for many non-Latin languages
• These languages have a rich heritage, and there is a need for content-level search
• Word-spotting-based methods are too slow for real-time systems
• Most existing retrieval methods are memory intensive
• Scalability is an immediate challenge


Word Image Retrieval using Bag of Visual Words


Bag of Visual Words (BoVW)

• Bag of Words (BoW) is the most popular representation for text retrieval
• Efficient BoW-based systems such as Lucene are publicly available
• Bag of Visual Words (BoVW) performs very well for image and video retrieval
• BoVW-based systems are flexible, powerful and scalable to billions of images
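Text engines such as Lucene answer BoW queries through an inverted index, which stores for each term the list of documents containing it; once every word image is a bag of visual-word IDs, the same structure applies. The toy index below illustrates the idea with tf-idf weights and cosine scoring. The class, the weighting scheme and the integer IDs are assumptions for illustration; this is neither Lucene's API nor the scoring used in this work.

```python
import math
from collections import Counter, defaultdict


class InvertedIndex:
    """Toy inverted index over bags of visual-word IDs (not Lucene's API)."""

    def __init__(self):
        self.postings = defaultdict(dict)  # visual word -> {doc_id: term frequency}
        self.doc_count = 0
        self.norms = {}                    # doc_id -> L2 norm of its tf-idf vector

    def add(self, doc_id, visual_words):
        for vw, tf in Counter(visual_words).items():
            self.postings[vw][doc_id] = tf
        self.doc_count += 1

    def idf(self, vw):
        return math.log(self.doc_count / len(self.postings[vw]))

    def finalize(self):
        """Precompute document norms; call again after adding documents."""
        squares = defaultdict(float)
        for vw, docs in self.postings.items():
            w = self.idf(vw)
            for doc_id, tf in docs.items():
                squares[doc_id] += (tf * w) ** 2
        self.norms = {d: math.sqrt(s) for d, s in squares.items()}

    def search(self, visual_words, top_k=10):
        """Cosine similarity, visiting only the postings of the query's words."""
        scores = defaultdict(float)
        q_norm_sq = 0.0
        for vw, qtf in Counter(visual_words).items():
            if vw not in self.postings:
                continue
            w = self.idf(vw)
            q_norm_sq += (qtf * w) ** 2
            for doc_id, tf in self.postings[vw].items():
                scores[doc_id] += (qtf * w) * (tf * w)
        q_norm = math.sqrt(q_norm_sq)
        ranked = [(d, s / (self.norms[d] * q_norm)) for d, s in scores.items() if s > 0]
        ranked.sort(key=lambda pair: pair[1], reverse=True)
        return ranked[:top_k]


index = InvertedIndex()
index.add("w1", [3, 3, 7, 9])
index.add("w2", [3, 5, 5])
index.add("w3", [1, 2, 4])
index.finalize()
print([doc for doc, _ in index.search([3, 7, 9])])  # ['w1', 'w2']
```

Only the postings of visual words that occur in the query are visited, so query cost depends largely on how many images share those words rather than on the total number of images.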


BoVW Representation

• Word images are represented as histograms of visual words
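Concretely, each local descriptor extracted from a word image is assigned to its nearest codebook centre (hard assignment), and the word image becomes the normalised count of those assignments. A minimal sketch with plain lists, assuming squared Euclidean distance, an already-learned codebook and toy 2-D descriptors in place of real ones such as SIFT; the function names are hypothetical.

```python
def nearest_word(descriptor, codebook):
    """Index of the codebook centre closest to the descriptor (squared L2)."""
    return min(range(len(codebook)),
               key=lambda k: sum((d - c) ** 2 for d, c in zip(descriptor, codebook[k])))


def bovw_histogram(descriptors, codebook):
    """Fixed-length histogram with one bin per visual word, L1-normalised."""
    hist = [0.0] * len(codebook)
    for desc in descriptors:
        hist[nearest_word(desc, codebook)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


codebook = [[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]]
short_word = [[1.0, 0.0], [9.0, 1.0]]
long_word = [[0.5, 0.5], [9.5, 0.0], [1.0, 9.0], [0.0, 11.0]]
print(bovw_histogram(short_word, codebook))  # [0.5, 0.5, 0.0]
print(bovw_histogram(long_word, codebook))   # [0.25, 0.25, 0.5]
```

The two word images have 2 and 4 descriptors, yet both map to vectors with one entry per visual word; this is what makes the representation fixed-size regardless of word length.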


BoVW Representation

• Codebook generation
  • A subset of the images is used
  • Clustering is done using Hierarchical K-Means (HKM)
  • HKM is faster than flat K-Means both in building the tree and in finding nearest neighbours
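A compact sketch of hierarchical k-means: cluster the descriptors into `branch` groups, recurse into each group until `depth` levels are built, and quantise a new descriptor by walking down the tree. The branching factor, depth, fixed iteration count and the deterministic farthest-point seeding are illustrative assumptions, not the settings used in this work.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(p, centres):
    return min(range(len(centres)), key=lambda i: sq_dist(p, centres[i]))


def init_centres(points, k):
    """Deterministic farthest-point seeding (a simple stand-in for k-means++)."""
    centres = [points[0]]
    while len(centres) < k:
        centres.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centres)))
    return centres


def kmeans(points, k, iters=10):
    """Plain Lloyd iterations; an empty cluster keeps its previous centre."""
    centres = init_centres(points, k)
    for _ in range(iters):
        groups = [[] for _ in centres]
        for p in points:
            groups[nearest(p, centres)].append(p)
        centres = [[sum(col) / len(g) for col in zip(*g)] if g else centres[i]
                   for i, g in enumerate(groups)]
    return centres


def build_hkm(points, branch, depth):
    """Return a list of (centre, children) nodes; an empty children list marks a leaf."""
    if depth == 0 or len(points) < branch:
        return []
    centres = kmeans(points, branch)
    groups = [[] for _ in centres]
    for p in points:
        groups[nearest(p, centres)].append(p)
    return [(c, build_hkm(g, branch, depth - 1)) for c, g in zip(centres, groups)]


def lookup(tree, descriptor):
    """Descend greedily; the path of child indices identifies the visual word."""
    path, nodes = [], tree
    while nodes:
        i = nearest(descriptor, [centre for centre, _ in nodes])
        path.append(i)
        nodes = nodes[i][1]
    return tuple(path)


points = [[0.0, 0.0], [0.2, 0.1], [5.0, 0.0], [5.1, 0.2],
          [20.0, 0.0], [20.2, 0.1], [25.0, 0.0], [25.1, 0.1]]
tree = build_hkm(points, branch=2, depth=2)
print(lookup(tree, [5.2, 0.0]))  # (0, 1)
```

Each lookup here compares the descriptor with 2 centres at each of 2 levels. In general a tree with branching factor b and depth L has up to b^L leaves but needs only b·L distance computations per descriptor, versus b^L for a flat k-means codebook of the same size.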


BoVW-Based Representation

[Figure: example word images, including one with cuts, and their histograms of visual words]


[Figure: a word image with merges and its histogram of visual words]


Proposed Architecture


Advantages of BoVW-Based Representation

• Fixed-size representation
• Robust against degradation such as cuts and merges
• Scalable to billions of images
• Language independent


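Spatial Verification

• Lost geometry: a histogram of visual words discards where in the word image each visual word occurs
• Spatial verification is used to compensate for this lost geometry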


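Re-ranking

• SIFT-based re-ranking
• The higher the total score, the better the match

$$\mathrm{Score}(I_i, I_j) = \frac{\#\,\text{Match Points}}{\#\,\text{SIFT in } I_i + \#\,\text{SIFT in } I_j}$$

$$\mathrm{Score}_{\text{Total}}(I_i, I_j) = \mathrm{Score}(I_i, I_j) + \sum_{k=1}^{3} \mathrm{Score}(I_i^k, I_j^k)$$

where $\mathrm{Score}(I_i, I_j)$ is the score for the entire image and $\mathrm{Score}(I_i^k, I_j^k)$ is the score for the $k$-th part of the image.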
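A sketch of the score computation. It assumes the pair score is the number of matched keypoints divided by the sum of the two images' SIFT keypoint counts, and that the total adds the whole-image score to the scores of three corresponding parts. The slides do not say how matches are found; Lowe's nearest-neighbour ratio test (threshold 0.8) is used here as a stand-in, and splitting each image into parts is left to the caller. Function names are hypothetical.

```python
import math


def count_matches(desc_a, desc_b, ratio=0.8):
    """Count descriptors in desc_a whose nearest neighbour in desc_b passes
    Lowe's ratio test (the test needs at least two candidates in desc_b)."""
    if len(desc_b) < 2:
        return 0
    matches = 0
    for d in desc_a:
        nearest_two = sorted(math.dist(d, e) for e in desc_b)[:2]
        if nearest_two[0] < ratio * nearest_two[1]:
            matches += 1
    return matches


def pair_score(desc_a, desc_b):
    """Score(I_i, I_j) = #matches / (#SIFT in I_i + #SIFT in I_j)."""
    total = len(desc_a) + len(desc_b)
    return count_matches(desc_a, desc_b) / total if total else 0.0


def total_score(whole_a, whole_b, parts_a, parts_b):
    """Whole-image score plus the scores of the k = 1..3 corresponding parts."""
    return pair_score(whole_a, whole_b) + sum(
        pair_score(pa, pb) for pa, pb in zip(parts_a, parts_b))


query = [[0.0, 0.0], [5.0, 5.0], [2.5, 2.55]]
candidate = [[0.1, 0.0], [5.0, 5.1], [20.0, 20.0]]
print(count_matches(query, candidate))  # 2
```

The third query descriptor sits almost halfway between two candidate descriptors, so the ratio test rejects it as ambiguous.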


Experiments

Books Used in the Experiments

Language | #Books | #Pages | #Words
Hindi | 4 | 427 | 112677
Malayalam | 6 | 610 | 108767
Telugu | 5 | 742 | 131156
Bangla | 3 | 363 | 124584
Hindi | 32 | 3992 | 1008138


Quantitative Results

Performance Statistics (mAP)

Language | #Images | #Queries | mAP | mAP after Re-ranking | mAP after Spatial Verification
Hindi | 112677 | 138 | 0.6808 | 0.7820 | 0.7865
Malayalam | 108767 | 101 | 0.6962 | 0.7991 | 0.8188
Telugu | 131156 | 131 | 0.6483 | 0.7328 | 0.7495
Bangla | 124584 | 125 | 0.7806 | 0.8766 | 0.8947
Hindi | 1008138 | 138 | 0.5895 | 0.7022 | 0.7062
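For reference, mean average precision (mAP) averages, over queries, the precision measured at the rank of each relevant result. A minimal sketch of the standard definition, assuming each query's ranked list is given as booleans (relevant or not) and that the total number of relevant items per query is known; the slides do not say which mAP variant was used.

```python
def average_precision(relevance, num_relevant):
    """AP of one ranked list; relevance[r] is True if the item at rank r+1 is relevant."""
    if num_relevant == 0:
        return 0.0
    hits, precision_sum = 0, 0.0
    for rank, is_rel in enumerate(relevance, start=1):
        if is_rel:
            hits += 1
            precision_sum += hits / rank
    # Relevant items never retrieved contribute zero precision.
    return precision_sum / num_relevant


def mean_average_precision(runs):
    """runs: list of (relevance_list, num_relevant) pairs, one per query."""
    return sum(average_precision(r, n) for r, n in runs) / len(runs)


runs = [
    ([True, False, True, False], 2),   # AP = (1/1 + 2/3) / 2
    ([False, True, False, False], 1),  # AP = (1/2) / 1
]
print(round(mean_average_precision(runs), 4))  # 0.6667
```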


Quantitative Results

Performance Statistics (Prec@10)

Language | #Images | #Queries | Prec@10 | Prec@10 after Re-ranking | Prec@10 after Spatial Verification
Hindi | 112677 | 138 | 0.8437 | 0.8719 | 0.8770
Malayalam | 108767 | 101 | 0.7668 | 0.8328 | 0.8581
Telugu | 131156 | 131 | 0.8507 | 0.8668 | 0.883
Bangla | 124584 | 125 | 0.8498 | 0.9022 | 0.9182
Hindi | 1008138 | 138 | 0.8059 | 0.8509 | 0.8543
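Prec@10 is the fraction of relevant items among the first 10 results of a query, averaged over queries. A minimal sketch, assuming each ranked list is given as booleans (relevant or not) and that a list shorter than k counts its missing ranks as non-relevant; the function names are hypothetical.

```python
def precision_at_k(relevance, k=10):
    """Fraction of the top-k results that are relevant."""
    return sum(1 for is_rel in relevance[:k] if is_rel) / k


def mean_precision_at_k(runs, k=10):
    """Average Prec@k over queries; runs holds one relevance list per query."""
    return sum(precision_at_k(r, k) for r in runs) / len(runs)


ranked = [True, True, False, True, True, True, False, True, True, True, False, True]
print(precision_at_k(ranked))  # 0.8
```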


Quantitative Results

• mAP vs. query length: the more characters a query contains, the better the results

[Figure: mAP plotted against query length]


Quantitative Results

Retrieval Time and Index Size

#Images | Retrieval Time | Index Size
25K | 50 ms | 28 MB
100K | 209 ms | 130 MB
0.5M | 411 ms | 550 MB
1M | 700 ms | 1.2 GB


Qualitative Results

• Sample output for noisy images on which a commercial OCR fails

[Figure: query word images and their retrieved results]


Enhancement over Bag of Visual Words based Word Image Retrieval


Query Expansion

• Observation: the top-ranked results are usually correct
• The top-k results are used to form a new query
• This improves the precision of the retrieved list
• Modified average query expansion
  • Instead of giving equal weight to each of the top-k results, a rank-based weight (1/2^rank) is given
• Improves mAP and Prec@10 by 2%
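A sketch of the modified average query expansion, reading the slide's rank-based weight as 1/2^rank with ranks starting at 1. It further assumes BoVW histograms are equal-length lists of floats and that the original query histogram is kept with weight 1 alongside the top-k results; these choices and the function name are hypothetical.

```python
def expand_query(query_hist, ranked_hists, k=5):
    """Rank-weighted average of the query and its top-k results (weight 1/2**rank)."""
    weights = [1.0] + [1.0 / 2 ** rank for rank in range(1, k + 1)]
    hists = [query_hist] + ranked_hists[:k]
    weights = weights[:len(hists)]  # fewer than k results may be available
    total = sum(weights)
    return [
        sum(w * h[i] for w, h in zip(weights, hists)) / total
        for i in range(len(query_hist))
    ]


q = [1.0, 0.0, 0.0]
results = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print([round(v, 4) for v in expand_query(q, results, k=2)])  # [0.5714, 0.2857, 0.1429]
```

Compared with a plain average, the geometric decay means a result at rank 5 contributes 1/32 as much as the query itself, which limits drift when lower-ranked results are wrong.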


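Query Expansion

[Figure: the query image's histogram is used to query the index; the top-ranked results (ranks 1 to 6) are combined into a refined, expanded query histogram, which is used to query the index again; the previous results and the modified results (ranks 1 to 6) are compared]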


Text Query Support

• The system was originally formulated in a “query by example” setting, but users would prefer a textual interface to document image collections
• We propose a novel and simple framework for text query support
  • A small subset of the data, with ground truth covering all possible characters of a particular language, is used
  • Visual words are learnt specific to each character and averaged across its different variations
  • Given a textual query, we synthesize its BoVW histogram
• Text query results are comparable to word image query results
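One way to picture the synthesis step: if a histogram of visual words has been learned for each character (averaged over that character's variations in the annotated subset), a text query's histogram can be approximated by summing its characters' histograms and normalising. The sketch assumes exactly that additive model, ignores descriptors that span character boundaries and uses a toy alphabet; the names and data are hypothetical, not the method's exact formulation.

```python
def synthesize_histogram(text, char_hists):
    """Sum per-character BoVW histograms and L1-normalise.
    Raises KeyError for characters without a learned histogram."""
    size = len(next(iter(char_hists.values())))
    hist = [0.0] * size
    for ch in text:
        for i, v in enumerate(char_hists[ch]):
            hist[i] += v
    total = sum(hist)
    return [v / total for v in hist] if total else hist


char_hists = {  # toy per-character histograms over a 4-word vocabulary
    "c": [2.0, 0.0, 1.0, 0.0],
    "a": [0.0, 3.0, 0.0, 1.0],
    "t": [1.0, 0.0, 0.0, 1.0],
}
print([round(v, 3) for v in synthesize_histogram("cat", char_hists)])  # [0.333, 0.333, 0.111, 0.222]
```

Because the sum ignores character order, "cat" and "act" get identical histograms; plain BoVW has the same order-insensitivity.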


Text Query Support

• Query-by-example setting: input query image → histogram
• Text query support: input text query → synthesized text query histogram


Qualitative Results

Sample output for queries using different techniques


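Vector Quantization

• In Vector Quantization (VQ), each feature vector is mapped to a single visual word (VW), i.e., hard assignment

$$\arg\min_{C} \sum_{i=1}^{N} \lVert x_i - B c_i \rVert^2 \quad \text{s.t.} \quad \lVert c_i \rVert_{\ell_0} = 1,\; \lVert c_i \rVert_{\ell_1} = 1,\; c_i \succeq 0,\; \forall i$$

where $x_i$ is a descriptor, $B$ is the codebook and $c_i$ is the code of $x_i$, with $C = [c_1, \dots, c_N]$.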


• Problems with VQ
  • Visual word uncertainty: a single VW is chosen even though two or more VWs are plausible candidates


  • Visual word plausibility: a descriptor is mapped to a VW even when the vocabulary has no suitable candidate for it


• Solution: soft assignment
  • Map each feature vector to two or more possible VWs


Soft Assignment

• Map each feature vector to two or more possible VWs
• Approaches to soft assignment
  • Distance based
    • Equal weight
    • Weight based on distance in feature space
    • Gaussian distance
    • Does not minimize reconstruction error
  • Through learning an optimal reconstruction
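The distance-based variant with Gaussian weights can be sketched as follows: each descriptor spreads one unit of mass over its n nearest visual words in proportion to exp(-d²/2σ²). The choice of n, the bandwidth sigma and the normalisation to a unit total are assumptions. These weights come from distances alone and are not chosen to minimise reconstruction error.

```python
import math


def soft_assign(descriptor, codebook, n=2, sigma=1.0):
    """Return {visual_word: weight} over the n nearest words, using Gaussian weights."""
    sq = [sum((d - c) ** 2 for d, c in zip(descriptor, centre)) for centre in codebook]
    nearest = sorted(range(len(codebook)), key=lambda k: sq[k])[:n]
    best = sq[nearest[0]]
    # Subtracting the smallest distance leaves the normalised weights unchanged
    # but stops exp() from underflowing to zero for far-away descriptors.
    raw = {k: math.exp(-(sq[k] - best) / (2 * sigma ** 2)) for k in nearest}
    total = sum(raw.values())
    return {k: w / total for k, w in raw.items()}


def soft_histogram(descriptors, codebook, n=2, sigma=1.0):
    """Accumulate soft weights instead of hard counts."""
    hist = [0.0] * len(codebook)
    for desc in descriptors:
        for k, w in soft_assign(desc, codebook, n, sigma).items():
            hist[k] += w
    return hist


codebook = [[0.0], [2.0], [10.0]]
print(soft_assign([1.0], codebook))  # {0: 0.5, 1: 0.5}
```

A descriptor lying exactly between two visual words splits its vote evenly instead of being forced onto one of them, which addresses visual word uncertainty.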


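Locality-constrained Linear Coding (LLC)

• Similar patches should have similar codes
• The locality of visual words is used to describe a feature vector

$$\arg\min_{C} \sum_{i=1}^{N} \lVert x_i - B c_i \rVert^2 + \lambda \lVert d_i \odot c_i \rVert^2 \quad \text{s.t.} \quad \mathbf{1}^\top c_i = 1,\; \forall i$$

where $\odot$ denotes element-wise multiplication and $d_i = \exp\!\left(\frac{\operatorname{dist}(x_i, B)}{\sigma}\right)$, with $\operatorname{dist}(x_i, B)$ the vector of Euclidean distances from $x_i$ to each codeword and $\sigma$ controlling how quickly the penalty grows with distance.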


• LLC coding process
  • Find the K nearest neighbours of $x_i$ in the codebook, denoted $B_i$
  • Reconstruct $x_i$ using $B_i$
  • Replace the input $x_i$ with the non-zero code obtained in the previous step
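These three steps correspond to the fast approximated LLC encoder described by Wang et al. (CVPR 2010): restricted to the K nearest codewords, the constrained least-squares problem reduces to a small linear system, followed by normalisation so the weights sum to one. A pure-Python sketch, assuming squared Euclidean distance for the neighbour search, a hand-picked regulariser `beta` (hypothetical value) and a naive Gaussian-elimination solver in place of a linear-algebra library.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting (small systems only)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def llc_code(x, codebook, k=2, beta=1e-4):
    """Approximated LLC: a code with k non-zero entries that sum to one."""
    sq = [sum((a - b) ** 2 for a, b in zip(x, centre)) for centre in codebook]
    nn = sorted(range(len(codebook)), key=lambda j: sq[j])[:k]  # step 1: K nearest bases
    z = [[b - a for a, b in zip(x, codebook[j])] for j in nn]  # bases shifted so x is the origin
    cov = [[sum(p * q for p, q in zip(zi, zj)) for zj in z] for zi in z]
    trace = sum(cov[i][i] for i in range(len(nn)))
    for i in range(len(nn)):
        cov[i][i] += beta * trace  # regularise: cov can be singular, e.g. when k > dimension
    w = solve(cov, [1.0] * len(nn))  # step 2: constrained least-squares reconstruction
    s = sum(w)
    code = [0.0] * len(codebook)  # step 3: sparse code that replaces x
    for j, wj in zip(nn, w):
        code[j] = wj / s
    return code


codebook = [[0.0, 0.0], [4.0, 0.0], [0.0, 4.0], [10.0, 10.0]]
code = llc_code([1.0, 0.0], codebook)
print([round(c, 3) for c in code])  # [0.75, 0.25, 0.0, 0.0]
```

The point (1, 0) is reconstructed as roughly 0.75 of the codeword at the origin plus 0.25 of the codeword at (4, 0); the two distant codewords receive zero weight.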


Re-ranking

• SIFT-based re-ranking [1]
• Longest common subsequence (LCS) based re-ranking [2]
  • The length of the LCS of the visual words, ordered by their projection onto the x-axis
  • The longer the LCS, the better the match
• Linear combination [2]: Final Score = λ · Index_Score + (1 − λ) · Re-ranking_Score, where λ is a weighting parameter

[Figure: visual words V1, V2, V6, V4, V4, V8, V9 of a word image plotted along the x-axis]

1. Ravi Shekhar, C. V. Jawahar: Word Image Retrieval Using Bag of Visual Words. DAS 2012.
2. Ismet Zeki Yalniz, R. Manmatha: An Efficient Framework for Searching Text in Noisy Document Images. DAS 2012.
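A sketch of LCS re-ranking followed by the linear combination. Assumptions: each word image is given as (x, visual word) pairs, the LCS length is normalised by the longer sequence so it falls in [0, 1], index scores are already on a comparable scale, and λ defaults to 0.5. These choices and the function names are illustrative, not taken from the cited papers.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def x_ordered_words(keypoints):
    """keypoints: (x, visual_word) pairs; project onto the x-axis and read left to right."""
    return [vw for _, vw in sorted(keypoints)]


def rerank(query_kps, candidates, lam=0.5):
    """candidates: (id, index_score, keypoints) triples; returns ids by the combined score."""
    q = x_ordered_words(query_kps)
    scored = []
    for cid, index_score, kps in candidates:
        c = x_ordered_words(kps)
        lcs_score = lcs_length(q, c) / max(len(q), len(c), 1)  # normalised to [0, 1]
        scored.append((lam * index_score + (1 - lam) * lcs_score, cid))
    return [cid for _, cid in sorted(scored, reverse=True)]


query_kps = [(0.5, "V1"), (1.0, "V2"), (1.2, "V6"), (1.6, "V4"),
             (2.0, "V4"), (2.4, "V8"), (3.0, "V9")]
candidates = [
    ("a", 0.9, [(float(i), vw) for i, vw in enumerate(["V9", "V8", "V4", "V4", "V6", "V2", "V1"])]),
    ("b", 0.8, [(float(i), vw) for i, vw in enumerate(["V1", "V2", "V6", "V4", "V8", "V9"])]),
]
print(rerank(query_kps, candidates))  # ['b', 'a']
```

Candidate "a" contains the query's visual words in reverse order, so a histogram comparison cannot tell it apart from the query and it enters with the higher index score; the LCS term promotes "b", whose words follow the query's left-to-right order. With `lam=1.0` the original index ranking is kept.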


Dataset Used

Books Used for the Experiments

Book | #Pages | #Words
Telugu-1716 | 120 | 4121
Telugu-1718 | 100 | 21345
English-1601 | 363 | 113008


Quantitative Results

LLC-Based Statistics (mAP)

Book | BoVW | BoVW + SIFT Re-ranking | BoVW + LCS Re-ranking | LLC | LLC + LCS Re-ranking
Telugu-1716 | 0.8173 | 0.8645 | 0.9036 | 0.91 | 0.95
Telugu-1718 | 0.7834 | 0.8861 | 0.918 | 0.92 | 0.96
English-1601 | 0.8015 | 0.8531 | 0.92 | 0.8765 | 0.9451


Quantitative Results

Text Query Based Statistics

Book | Method | mAP
Telugu-1716 | Text Query | 0.8413
Telugu-1718 | Text Query | 0.90
English-1601 | Text Query | 0.87


Patch Based Word Image Retrieval


• A feature designed on image patches
• Each patch is represented using profile features
• Profile features
  • Projection profile: measures the ink distribution of the word image


  • Ink transition: measures the internal shape of the image


  • Upper word profile: distance from the upper boundary of the word image


  • Lower word profile: distance from the lower boundary of the word image


Overview of Feature Calculation

[Figure: input word image → patches → calculate the 4 profile features (projection profile, ink transition, upper word profile, lower word profile) → concatenate the 4 profile features → descriptor]
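A sketch of the four profile features for one binary patch. It assumes, as in common word-spotting practice, that each profile is computed per column: the ink count (projection profile), the number of background-to-ink transitions, and the distances from the top and from the bottom boundary to the nearest ink pixel. The per-column layout, the value used for empty columns and the concatenation order are assumptions.

```python
def profile_features(patch):
    """patch: list of rows of 0/1 (1 = ink). Returns the 4 profiles concatenated."""
    height, width = len(patch), len(patch[0])
    projection, transitions, upper, lower = [], [], [], []
    for x in range(width):
        column = [patch[y][x] for y in range(height)]
        projection.append(sum(column))
        # Count 0 -> 1 changes, treating the area above the patch as background.
        transitions.append(sum(1 for prev, cur in zip([0] + column, column)
                               if prev == 0 and cur == 1))
        ink_rows = [y for y, v in enumerate(column) if v]
        # Empty columns get the full height for both distances (an assumption).
        upper.append(ink_rows[0] if ink_rows else height)
        lower.append(height - 1 - ink_rows[-1] if ink_rows else height)
    return projection + transitions + upper + lower


patch = [
    [0, 1, 0],
    [0, 1, 1],
    [0, 0, 1],
    [0, 1, 0],
]
print(profile_features(patch))  # [0, 3, 2, 0, 2, 1, 4, 0, 1, 4, 0, 1]
```

The result has 4 × width entries, so patches of a fixed size always yield descriptors of the same length.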


Fast Pre-Processing

• Each input patch is converted into its corresponding patch vector
• Is the patch vector present in the lookup table?
  • Yes: retrieve the corresponding visual word
  • No: find the corresponding visual word (one of V1, V2, V3, …, Vk) and update the lookup table
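The lookup table amounts to memoisation: binarised patches repeat often in printed text, so the nearest-visual-word search only has to run the first time a given patch vector is seen. A sketch, assuming binary patch vectors that can serve as dictionary keys once converted to tuples, and a brute-force nearest-centre search in the same space as the patch vector; the class and method names are hypothetical.

```python
class PatchVocabulary:
    """Map binary patch vectors to visual words, caching every lookup."""

    def __init__(self, centres):
        self.centres = centres  # visual words V1..Vk as feature vectors
        self.table = {}         # lookup table: patch vector -> visual-word index
        self.searches = 0       # how many times the slow path ran

    def _nearest(self, vec):
        self.searches += 1
        return min(range(len(self.centres)),
                   key=lambda k: sum((a - b) ** 2 for a, b in zip(vec, self.centres[k])))

    def visual_word(self, patch_vector):
        key = tuple(patch_vector)
        if key not in self.table:  # "No": find the visual word and update the table
            self.table[key] = self._nearest(key)
        return self.table[key]     # "Yes": retrieve it directly


vocab = PatchVocabulary([[0, 0, 0, 0], [1, 1, 1, 1]])
patches = [[0, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0], [0, 0, 1, 0]]
words = [vocab.visual_word(p) for p in patches]
print(words, vocab.searches)  # [0, 1, 0, 0] 2
```

Four patches are quantised but only two searches run, because the repeated patch is answered from the table.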


Dataset Used

Book | #Pages | #Words
Telugu-1718 | 100 | 21345
English-1601 | 363 | 113008


Quantitative Results

Baseline Statistics

Book | Method | mAP
Telugu-1718 | SIFT | 0.7834
Telugu-1718 | Patch | 0.53
Telugu-1718 | Patch Feature | 0.6183
Telugu-1718 | Patch Feature with Overlap | 0.7214


Quantitative Results

Enhancements over Baseline Statistics

Enhancement Method | SIFT | Patch Feature
Query Expansion | 0.7920 | 0.75
Spatial Verification | 0.8571 | 0.83
LCS Re-ranking | 0.8798 | 0.8481


Quantitative Results

Results with Split Features

Book | SIFT | Patch Feature
Telugu-1718 | 0.94 | 0.954
English-1601 | 0.93 | 0.90


Qualitative Results


Contributions

• Language-independent system
  • Tested on 4 different languages
• Scalable to huge datasets
  • Tested on 1 million word images
• Handles noisy document images
  • Demonstrated performance on a dataset where a commercial OCR fails
• Enhancements over the baseline results
  • Query expansion
  • Text query support
  • Document-specific sparse coding
• A document-specific descriptor is proposed


Future Work

• Test on datasets with different fonts
• Apply similar methods to handwritten and camera-based datasets
• Learn character-level visual words automatically using annotated data
• Multi-keyword support
• Combine recognition-based and recognition-free methods
• Improve the patch-based descriptor


Related Publications

• Ravi Shekhar and C. V. Jawahar, “Word Image Retrieval using Bag of Visual Words”, In Proceedings of the 10th IAPR International Workshop on Document Analysis Systems (DAS), 2012.

• Praveen Krishnan, Ravi Shekhar and C. V. Jawahar, “Content Level Access to Digital Library of India Pages”, In Proceedings of the 8th Indian Conference on Vision, Graphics and Image Processing (ICVGIP), 2012.

• Ravi Shekhar and C. V. Jawahar, “Document Specific Sparse Coding for Word Retrieval”, In Proceedings of the 12th International Conference on Document Analysis and Recognition (ICDAR), 2013.


Thanks!