document image retrieval using bag of visual words model
Document Image Retrieval using Bag of Visual Words Model
Ravi Shekhar, CVIT, IIIT Hyderabad
Advisor : Prof. C.V. Jawahar
Motivation
• A large number of printed books have been digitized
• Digital libraries such as the Universal Digital Library (UDL), the Digital Library of India (DLI), and Google Books
• Need to design an efficient and effective methodology for content-level access
[Figure: Digital Library Database]
Process Overview
[Figure: process pipeline with the blocks Scanning, Documents, Processing, Index Database, Input Query, Matching, and Retrieved Documents]
• Matching can be done at two levels: “Text” and “Image”
Matching Approaches
• Recognition Based Approach (Text Level Matching)
  • Optical Character Recognition (OCR)
• Recognition Free Approach (Image Level Matching)
  • Word Spotting
Recognition Based Approach
• Optical Character Recognition (OCR)
  • Binarization of the document
  • Segmentation using connected components
    • Line level
    • Word level
    • Character level
  • Character recognition using different features such as patch, profile, etc.
  • Classification using ANN or SVM
Limitations of Recognition Based Approach
• Cuts
• Merges
• Variation in script
• Variation in font and typesetting
• Underlined and overwritten text
Recognition Free Approach
• Word Spotting
  • Representation of the word image using global (profile) features
  • Matching features using different distance measures such as L1, L2, etc.
  • Comparison of word images of different sizes using Dynamic Time Warping (DTW)
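The DTW comparison mentioned above can be sketched as follows. This is a minimal illustration, not the system's implementation: it assumes each word image has already been turned into a sequence of per-column feature vectors, and the names `dtw_distance`, `narrow`, `wide`, and `other` are hypothetical.

```python
def dtw_distance(seq_a, seq_b):
    """Dynamic time warping cost between two sequences of feature vectors."""
    def l2(u, v):
        # Euclidean (L2) distance between two feature vectors.
        return sum((p - q) ** 2 for p, q in zip(u, v)) ** 0.5

    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] = best alignment cost of the first i and first j columns.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = l2(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch seq_b
                                 cost[i][j - 1],      # stretch seq_a
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]


# Toy one-dimensional column features for three word images (assumed data).
narrow = [[0.0], [1.0], [2.0], [1.0]]
wide = [[0.0], [0.0], [1.0], [2.0], [2.0], [1.0]]  # same shape, wider
other = [[2.0], [2.0], [0.0], [0.0]]

print(dtw_distance(narrow, wide), dtw_distance(narrow, other))
```

The quadratic table is what makes this comparison costly when it has to be repeated for every word image in a large collection.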
Why Recognition Free Approach?
• Robust OCRs are unavailable for many non-Latin languages
• These languages have a rich heritage, and there is a need for content-level search
• Word-spotting-based methods are too slow for real-time systems
• Most existing retrieval methods are memory intensive
• Scalability is an immediate challenge
Word Image Retrieval using Bag of Visual Words
Bag of Visual Words (BoVW)
• Bag of Words (BoW) is the most popular representation for text retrieval
• Efficient BoW-based systems such as Lucene are publicly available
• Bag of Visual Words (BoVW) performs excellently for image and video retrieval
• BoVW-based systems are flexible, powerful, and scalable to billions of images
BoVW Representation
• Word images are represented using a histogram of visual words
• Codebook generation
  • A subset of the images is used
  • Clustering is done using Hierarchical K-Means (HKM)
  • HKM is faster than K-Means both in building the tree and in finding nearest neighbours
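As a rough illustration of the histogram representation, the sketch below assigns each local descriptor to its nearest codebook entry and counts the assignments. It is only a sketch under stated assumptions: it uses a brute-force nearest-neighbour search instead of an HKM tree, toy 2-D descriptors, and L1 normalization; `nearest_word` and `bovw_histogram` are hypothetical names.

```python
def nearest_word(descriptor, codebook):
    """Index of the closest visual word (brute force, not an HKM tree)."""
    best_idx, best_dist = -1, float("inf")
    for idx, word in enumerate(codebook):
        dist = sum((p - q) ** 2 for p, q in zip(descriptor, word))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx


def bovw_histogram(descriptors, codebook):
    """Fixed-size, L1-normalized histogram of visual words for one image."""
    hist = [0.0] * len(codebook)
    for descriptor in descriptors:
        hist[nearest_word(descriptor, codebook)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


# Toy codebook with three visual words and four descriptors (assumed data).
codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
descriptors = [[0.1, 0.0], [0.9, 1.0], [1.1, 0.9], [0.0, 0.2]]
histogram = bovw_histogram(descriptors, codebook)
print(histogram)
```

The histogram length depends only on the codebook size, so word images of any width map to vectors of the same dimension.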
BoVW based Representation
[Figures: word images and their histograms of visual words, including examples with cuts and with merges]
Proposed Architecture
Advantages of BoVW based Representation
• Fixed-size representation
• Robust against degradation
• Scalable to billions of images
• Language independent
[Figures: clean word images compared with word images containing cuts and merges]
Spatial Verification
• Lost geometry
• Spatial verification
[Figures: pairs of clean word images illustrating spatial verification]
Re-ranking
• SIFT-based re-ranking
• The higher the total score, the better the match

$$
\text{Total Score}(I_i, I_j) = \text{Score}(I_i, I_j) + \sum_{k=1}^{3} \text{Score}(I_i^k, I_j^k)
$$

where Score(I_i, I_j) is the score for the entire image and Score(I_i^k, I_j^k) is the score for the k-th part of the image. Each score is the number of match points, normalized by the number of SIFT keypoints in I_i and in I_j.
Experimentations
Books Used in Experimentations
Language #Books #Pages #Words
Hindi 4 427 112677
Malayalam 6 610 108767
Telugu 5 742 131156
Bangla 3 363 124584
Hindi 32 3992 1008138
Quantitative Results
Performance Statistics (mAP)
Language    #Images   #Query   mAP      mAP after Re-ranking   mAP after Spatial Verification
Hindi       112677    138      0.6808   0.7820                 0.7865
Malayalam   108767    101      0.6962   0.7991                 0.8188
Telugu      131156    131      0.6483   0.7328                 0.7495
Bangla      124584    125      0.7806   0.8766                 0.8947
Hindi       1008138   138      0.5895   0.7022                 0.7062
Quantitative Results
Performance Statistics (Prec@10)
Language    #Images   #Query   Prec@10   Prec@10 after Re-ranking   Prec@10 after Spatial Verification
Hindi       112677    138      0.8437    0.8719                     0.8770
Malayalam   108767    101      0.7668    0.8328                     0.8581
Telugu      131156    131      0.8507    0.8668                     0.883
Bangla      124584    125      0.8498    0.9022                     0.9182
Hindi       1008138   138      0.8059    0.8509                     0.8543
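For readers unfamiliar with the two measures reported in these tables, the sketch below uses their standard definitions. The slides do not state which exact variants were used, so this is an assumption; the function names and the toy relevance lists are hypothetical.

```python
def precision_at_k(relevance, k=10):
    """Fraction of the top-k retrieved items that are relevant."""
    return sum(relevance[:k]) / k


def average_precision(relevance, num_relevant):
    """Mean of the precision values taken at each relevant rank."""
    hits, total = 0, 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            total += hits / rank
    return total / num_relevant if num_relevant else 0.0


def mean_average_precision(runs):
    """runs: list of (relevance list, number of relevant items) per query."""
    return sum(average_precision(rel, n) for rel, n in runs) / len(runs)


# Toy ranked lists: 1 marks a correct word image, 0 an incorrect one.
query_a = ([1, 0, 1, 0], 2)
query_b = ([0, 1], 1)
print(mean_average_precision([query_a, query_b]))
print(precision_at_k([1, 0, 1, 0, 0, 0, 0, 0, 0, 0]))
```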
Quantitative Results
• mAP vs. query length
• The more characters in the query, the better the results
Quantitative Results
Retrieval Time and Index Size
#Images Retrieval Time Index Size
25K 50ms 28 MB
100K 209ms 130 MB
0.5M 411ms 550 MB
1M 700ms 1.2 GB
Qualitative Results
[Figures: queries and their retrieved results]
• Sample output for noisy images where commercial OCR fails
Enhancement over Bag of Visual Words based Word Image Retrieval
Query Expansion
• Observation: top-ranked results are correct
• Top-k results are used to form a new query
• Improves the precision of the retrieved list
• Modified average query expansion
  ─ Instead of an equal weight for each of the top-k results, a rank-based weight (1/2^rank) is given
• Improves mAP and Prec@10 by 2%
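The rank-weighted expansion described above might look like the following sketch. It assumes the weight is read as 1/2^rank, that the original query keeps weight 1, and that the result is renormalized by the total weight; none of these details are confirmed by the slides, and `expand_query` is a hypothetical name.

```python
def expand_query(query_hist, ranked_hists, k=5):
    """Blend the query histogram with its top-k results, weighted by rank."""
    expanded = list(query_hist)  # the query itself gets weight 1 (assumed)
    total_weight = 1.0
    for rank, hist in enumerate(ranked_hists[:k], start=1):
        weight = 1.0 / 2 ** rank  # rank 1 -> 1/2, rank 2 -> 1/4, ...
        total_weight += weight
        for i, value in enumerate(hist):
            expanded[i] += weight * value
    # Renormalize so the expanded histogram stays on the same scale.
    return [value / total_weight for value in expanded]


# Toy 2-bin histograms: the query and its two top-ranked results.
query = [1.0, 0.0]
top_results = [[0.0, 1.0], [0.0, 1.0]]
expanded = expand_query(query, top_results, k=2)
print(expanded)
```

The expanded histogram is then submitted to the index again in place of the original query.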
Query Expansion
[Figure: the query image histogram is matched against the index, and the results at Rank 1 to Rank 6 are used to refine it into an expanded query histogram]
[Figure: the expanded query histogram is matched against the index again; previous results (Rank 1 to Rank 6) are shown next to the modified results (Rank 1 to Rank 6)]
Text Query Support
• Originally formulated in a “query by example” setting, but users would prefer a textual interface for document image collections
• We propose a novel and simple framework for text query support
  • A small subset of the data with ground truth, covering all possible characters in a particular language, is used
  • Visual words are learnt specific to each character and averaged across its different variations
  • Given a textual query, we synthesize its BoVW histogram
• Text query results are comparable to word image results
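One way to picture the synthesis step is sketched below. The slides only say that per-character visual words are averaged across variations and that a histogram is synthesized for the text query; summing the per-character histograms and L1-normalizing the result is an assumption made here for illustration, and `build_char_models`, `synthesize_query`, and the toy data are hypothetical.

```python
def build_char_models(examples):
    """Average the example histograms of each character across its variations."""
    models = {}
    for char, hists in examples.items():
        dim = len(hists[0])
        models[char] = [sum(h[i] for h in hists) / len(hists) for i in range(dim)]
    return models


def synthesize_query(text, char_models):
    """Build a BoVW histogram for a text query from per-character models."""
    dim = len(next(iter(char_models.values())))
    hist = [0.0] * dim
    for char in text:
        for i, value in enumerate(char_models[char]):  # KeyError if unseen
            hist[i] += value
    total = sum(hist)
    return [v / total for v in hist] if total else hist


# Toy ground truth: two variations of "a" and one of "b", 3 visual words.
examples = {"a": [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0]], "b": [[0.0, 0.0, 4.0]]}
char_models = build_char_models(examples)
text_hist = synthesize_query("ab", char_models)
print(text_hist)
```

The synthesized histogram can then be used against the index exactly like the histogram of a query image.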
Text Query Support
• Query by example setting
  [Figure: input query image and its histogram]
• Text query support
  [Figure: input text query and its text query histogram]
Qualitative Results
Sample output for queries using different techniques
Vector Quantization
• In Vector Quantization (VQ), each feature vector is mapped to a single visual word (VW), i.e., hard assignment

$$
\arg\min_{C} \sum_{i=1}^{N} \lVert x_i - B c_i \rVert^2 \quad \text{s.t.}\ \lVert c_i \rVert_{\ell^0} = 1,\ \lVert c_i \rVert_{\ell^1} = 1,\ c_i \succeq 0,\ \forall i
$$

where B is the codebook, c_i is the code, and x_i is the descriptor.
[Figure: (a) input descriptor mapped to a single visual word]
• Problems with VQ
  • Visual word uncertainty
    • A single VW is chosen out of 2 or more possible candidates
  • Visual word plausibility
    • A visual word is assigned even when the vocabulary has no suitable candidate
• Solution: soft assignment
  • Map each feature vector to 2 or more possible VWs
Soft Assignment
• Map each feature vector to 2 or more possible VWs
• Approaches to soft assignment
  • Distance based
    • Equal weight
    • Based on distance in feature space
    • Gaussian distance
    • Does not minimize reconstruction error
  • Through learning optimal reconstruction
[Figure: input descriptor assigned to several visual words]
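A distance-based soft assignment with Gaussian weights can be sketched as below. The choice of k nearest visual words, the kernel width `sigma`, and the fallback to equal weights when every weight underflows are illustrative assumptions; `soft_assign` is a hypothetical name.

```python
import math


def soft_assign(descriptor, codebook, k=2, sigma=1.0):
    """Spread one descriptor over its k nearest visual words (Gaussian weights)."""
    def euclid(u, v):
        return math.sqrt(sum((p - q) ** 2 for p, q in zip(u, v)))

    nearest = sorted(
        (euclid(descriptor, word), idx) for idx, word in enumerate(codebook)
    )[:k]
    weights = [(idx, math.exp(-d * d / (2.0 * sigma ** 2))) for d, idx in nearest]
    total = sum(w for _, w in weights)
    if total == 0.0:
        # All weights underflowed: fall back to equal weights.
        return {idx: 1.0 / len(weights) for idx, _ in weights}
    return {idx: w / total for idx, w in weights}


# A descriptor halfway between two visual words, far from the third.
codebook = [[0.0, 0.0], [1.0, 0.0], [10.0, 10.0]]
assignment = soft_assign([0.5, 0.0], codebook, k=2)
print(assignment)
```

The weights depend only on distances, which is why this family of methods does not minimize the reconstruction error of the descriptor.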
Locality-constrained Linear Coding (LLC)
• Similar patches should have similar codes
• Locality of visual words is used to describe the feature vector

$$
\arg\min_{C} \sum_{i=1}^{N} \lVert x_i - B c_i \rVert^2 + \lambda \lVert d_i \odot c_i \rVert^2 \quad \text{s.t.}\ \mathbf{1}^{T} c_i = 1,\ \forall i
$$

where \odot is element-wise multiplication and d_i = \exp(\mathrm{dist}(x_i, B) / \sigma).
• LLC coding process
  • Find the K nearest neighbours of x_i, denoted as B
  • Reconstruct x_i using B
  • Replace the input x_i with the non-zero code obtained from the previous step
[Figure: input descriptor reconstructed from its nearest visual words]
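The three-step coding process can be sketched with the commonly used approximated LLC solution: select the K nearest visual words, solve a small regularized least-squares system under the sum-to-one constraint, and keep only those K coefficients. This is a pure-Python illustration under assumptions, not the authors' code; `beta`, the tiny stabilizing constant, and the helper names `solve` and `llc_code` are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def llc_code(x, codebook, k=2, beta=1e-4):
    """Approximate LLC code of descriptor x over the codebook."""
    def sq_dist(u, v):
        return sum((p - q) ** 2 for p, q in zip(u, v))

    # Step 1: indices of the k nearest visual words.
    order = sorted(range(len(codebook)), key=lambda j: sq_dist(x, codebook[j]))[:k]
    # Step 2: reconstruct x from them (shifted bases, local covariance).
    z = [[b - xi for b, xi in zip(codebook[j], x)] for j in order]
    cov = [[sum(p * q for p, q in zip(zr, zc)) for zc in z] for zr in z]
    reg = beta * sum(cov[i][i] for i in range(len(order))) + 1e-12
    for i in range(len(order)):
        cov[i][i] += reg  # regularization keeps the system well-conditioned
    w = solve(cov, [1.0] * len(order))
    scale = sum(w)
    # Step 3: sparse code with non-zero entries only at the neighbours.
    code = [0.0] * len(codebook)
    for j, wj in zip(order, w):
        code[j] = wj / scale  # enforces the sum-to-one constraint
    return code


codebook = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]]
code = llc_code([0.25, 0.0], codebook, k=2)
print(code)
```

In this toy case the descriptor lies a quarter of the way between the first two visual words, so the code places roughly three quarters of the weight on the first and one quarter on the second.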
Re-ranking
• SIFT-based re-ranking [1]
• Longest common subsequence (LCS) based re-ranking [2]
  • Size of the LCS of visual words projected on the x-axis
  • The larger the size, the better the match
• Linear combination [2]
  Final Score = λ * Index_Score + (1 - λ) * Re-ranking_Score, where λ is a weighting parameter
[Figure: visual words V1, V2, V6, V4, V4, V8, V9 plotted on x-y axes and projected on the x-axis]
1. Ravi Shekhar, C. V. Jawahar: Word Image Retrieval Using Bag of Visual Words. DAS 2012
2. Ismet Zeki Yalniz, R. Manmatha: An Efficient Framework for Searching Text in Noisy Document Images. DAS 2012
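The LCS-based re-ranking and the linear combination can be sketched as follows. The sketch assumes each keypoint is given as an (x, y, visual word) triple and that the two scores being combined are on comparable scales; `project_on_x`, `lcs_length`, and `final_score` are hypothetical names.

```python
def project_on_x(keypoints):
    """Order visual words by the x-coordinate of their keypoints."""
    return [word for _, _, word in sorted(keypoints, key=lambda p: p[0])]


def lcs_length(a, b):
    """Length of the longest common subsequence of two word sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def final_score(index_score, rerank_score, lam=0.5):
    """Linear combination of the index score and the re-ranking score."""
    return lam * index_score + (1.0 - lam) * rerank_score


# Visual word ids read left to right in a query and in a candidate.
query_words = [1, 2, 6, 4, 4, 8, 9]
candidate_words = [1, 6, 4, 8, 3, 9]
common = lcs_length(query_words, candidate_words)
print(common, final_score(0.8, common / len(query_words)))
```

Dividing the LCS length by the query length, as in the usage line, is one possible way to bring the re-ranking score into the range 0 to 1 before combining.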
Dataset Used
Books Used for the Experiments
Book           #Pages   #Words
Telugu-1716    120      4121
Telugu-1718    100      21345
English-1601   363      113008
Quantitative Results
LLC Based Statistics (mAP)
Book           BoVW     BoVW + SIFT Re-ranking   BoVW + LCS Re-ranking   LLC      LLC + LCS Re-ranking
Telugu-1716    0.8173   0.8645                   0.9036                  0.91     0.95
Telugu-1718    0.7834   0.8861                   0.918                   0.92     0.96
English-1601   0.8015   0.8531                   0.92                    0.8765   0.9451
Quantitative Results
Text Query Based Statistics
Book           Method       mAP
Telugu-1716    Text Query   0.8413
Telugu-1718    Text Query   0.90
English-1601   Text Query   0.87
Patch Based Word Image Retrieval
• Designed a feature based on patches
• Representation of each patch using profile features
• Profile features
  • Projection profile
    • Measures the ink distribution of the word image
  • Ink transition
    • Measures the internal shape of the image
  • Upper word profile
    • Distance from the upper boundary of the word image
  • Lower word profile
    • Distance from the lower boundary of the word image
Overview of Feature Calculation
[Figure: input word image → calculate 4 profile features (projection profile, ink transition, upper word profile, lower word profile) → concatenate 4 profile features → descriptor]
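The feature calculation outlined above can be sketched for a small binary patch as follows. This is a minimal illustration under assumptions: the patch is a list of rows with 1 for ink and 0 for background, each feature is computed per column, and columns without ink get the patch height as their upper and lower profile values; `profile_features` is a hypothetical name.

```python
def profile_features(patch):
    """Concatenate four per-column profile features of a binary patch."""
    height, width = len(patch), len(patch[0])
    projection, transitions, upper, lower = [], [], [], []
    for c in range(width):
        column = [patch[r][c] for r in range(height)]
        ink_rows = [r for r, value in enumerate(column) if value]
        # Projection profile: amount of ink in the column.
        projection.append(len(ink_rows))
        # Ink transitions: ink/background changes going down the column.
        transitions.append(sum(1 for a, b in zip(column, column[1:]) if a != b))
        # Upper profile: distance from the top boundary to the first ink pixel.
        upper.append(ink_rows[0] if ink_rows else height)
        # Lower profile: distance from the bottom boundary to the last ink pixel.
        lower.append(height - 1 - ink_rows[-1] if ink_rows else height)
    return projection + transitions + upper + lower


# Toy 4x3 patch (1 = ink, 0 = background).
patch = [
    [0, 1, 0],
    [1, 1, 0],
    [0, 0, 0],
    [1, 1, 0],
]
descriptor = profile_features(patch)
print(descriptor)
```

For a patch of width w the descriptor has 4w entries, one block per profile feature.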
Fast Pre-Processing
[Figure: flowchart. Input patch → corresponding patch vector → is the patch vector present in the lookup table? Yes: retrieve the corresponding visual word. No: find the corresponding visual word among V1, V2, V3, ..., Vk and update the lookup table]
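The lookup-table idea in the flowchart can be sketched as a small cache in front of the nearest-visual-word search. It assumes patch vectors are discrete (for example integer profile features), so identical patches produce identical keys; `VisualWordCache` and its `misses` counter are hypothetical.

```python
class VisualWordCache:
    """Lookup table from patch vectors to visual word indices."""

    def __init__(self, codebook):
        self.codebook = codebook
        self.table = {}
        self.misses = 0  # number of full nearest-neighbour searches

    def _find_visual_word(self, vector):
        # Brute-force nearest visual word (squared Euclidean distance).
        return min(
            range(len(self.codebook)),
            key=lambda j: sum((p - q) ** 2 for p, q in zip(vector, self.codebook[j])),
        )

    def lookup(self, patch_vector):
        key = tuple(patch_vector)
        if key not in self.table:  # "No" branch: search, then update table
            self.misses += 1
            self.table[key] = self._find_visual_word(key)
        return self.table[key]     # "Yes" branch: retrieve directly


cache = VisualWordCache(codebook=[[0, 0], [4, 4]])
words = [cache.lookup(v) for v in ([1, 0], [3, 4], [1, 0])]
print(words, cache.misses)
```

Repeated patches, which are frequent in printed text, then cost a dictionary lookup instead of a codebook search.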
Dataset Used
Book           #Pages   #Words
Telugu-1718    100      21345
English-1601   363      113008
Quantitative Results
Baseline Statistics
Book          Method                       mAP
Telugu-1718   SIFT                         0.7834
Telugu-1718   Patch                        0.53
Telugu-1718   Patch Feature                0.6183
Telugu-1718   Patch Feature with Overlap   0.7214
Quantitative Results
Enhancement on Baseline Statistics
Enhancement Method     SIFT     Patch Feature
Query Expansion        0.7920   0.75
Spatial Verification   0.8571   0.83
LCS Re-ranking         0.8798   0.8481
Quantitative Results
Results with Split Features
Book           SIFT   Patch Feature
Telugu-1718    0.94   0.954
English-1601   0.93   0.90
Qualitative Results
Contributions
• Language-independent system
  • Tested on 4 different languages
• Scalable to huge datasets
  • Tested on 1 million word images
• Handles noisy document images
  • Demonstrated performance on a dataset where commercial OCR fails
• Enhancements over baseline results
  • Query expansion
  • Text query support
  • Document-specific sparse coding
• A document-specific descriptor is proposed
Future Work
• Test on datasets with different fonts
• Similar methods for handwritten and camera-based datasets
• Learning character-level visual words automatically using annotated data
• Multi-keyword support
• Combine recognition-based and recognition-free methods
• Improve the patch-based descriptor
Related Publications
• Ravi Shekhar and C. V. Jawahar, “Word Image Retrieval using Bag of Visual Words”, In Proceedings of 10th IAPR International Workshop on Document Analysis Systems (DAS), 2012.
• Praveen Krishnan, Ravi Shekhar and C. V. Jawahar, “Content Level Access to Digital Library of India Pages”, In Proceedings of 8th Indian Conference on Vision, Graphics and Image Processing (ICVGIP), 2012.
• Ravi Shekhar and C. V. Jawahar, “Document Specific Sparse Coding for Word Retrieval”, In Proceedings of 12th International Conference on Document Analysis and Recognition (ICDAR), 2013.
Thanks !!!