
IMAGE MATCHING

- Aditya Pabbaraju, Srujankumar Puchakayala

G. Schindler, M. Brown, and R. Szeliski, “City-scale location recognition,” CVPR 2007

D. M. Chen, S. S. Tsai, V. Chandrasekhar, G. Takacs, J. Singh, and B. Girod, “Tree histogram coding for mobile image matching,” in DCC 2009

OUTLINE

Scalable Vocabulary Tree

City-Scale Location Recognition
a) Greedy N-Best Paths
b) Informative Features
c) Voting Scheme
d) Results
e) Conclusions

Tree Histogram Coding for Mobile Image Matching
a) Motivation & Previous Literature
b) Developing Tree Histogram
c) Compression & Coding of Tree Histogram
d) Mathematics behind the scenes
e) Experimental Setup
f) Image Matching Results
g) Conclusions & Future Work

SCALABLE VOCABULARY TREE

[Nistér and Stewénius, CVPR 2006]

Picture Credit: David Chen


DEVELOPING TREE HISTOGRAM

Picture Credit: David Chen

BACKGROUND ON SCALABLE VOCABULARY TREE

Nodes of an SVT are centroids obtained by hierarchical k-means clustering of sample feature descriptors.

We classify an image I by quantizing its feature descriptors: each descriptor traverses the SVT from top to bottom, greedily choosing the nearest node at each level.

This search is relatively efficient on a mobile device, as it requires O(KD) distance computations per feature.
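To make the descent concrete, here is a minimal Python sketch of this greedy quantization; the Node class, its centroid and children fields, and the use of numpy for distances are illustrative assumptions, not the structure used in the papers.

import numpy as np

class Node:
    """Hypothetical SVT node: a cluster centroid plus a list of child nodes (empty at a leaf)."""
    def __init__(self, centroid, children=None):
        self.centroid = np.asarray(centroid, dtype=float)
        self.children = children or []

def quantize_descriptor(root, descriptor):
    """Greedy top-to-bottom quantization: at every level, step to the child whose centroid
    is nearest the descriptor. With branch factor K and depth D this costs O(K*D)
    distance computations per feature."""
    d = np.asarray(descriptor, dtype=float)
    node, path = root, []
    while node.children:
        node = min(node.children, key=lambda c: np.linalg.norm(d - c.centroid))
        path.append(node)
    return node, path  # the leaf (visual word) and the nodes visited on the way down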

A FULLY BALANCED SVT WITH DEPTH D = 3 AND MAXIMUM BRANCH FACTOR K = 3

Each node i carries an entropy weight w_i = ln(N / N_i), where N is the total number of database images and N_i is the total number of database images with at least one descriptor having visited node i.

Picture Credit: David Chen

OUTLINE

Scalable Vocabulary Tree

City-Scale Location Recognition
a) Greedy N-Best Paths
b) Informative Features
c) Voting Scheme
d) Results
e) Conclusions

Tree Histogram Coding for Mobile Image Matching
a) Motivation & Previous Literature
b) Developing Tree Histogram
c) Compression & Coding of Tree Histogram
d) Mathematics behind the scenes
e) Experimental Setup
f) Image Matching Results
g) Conclusions & Future Work

CITY-SCALE LOCATION RECOGNITION

The focus in this context is on building trees for specific databases to be used for location recognition, rather than generic trees for object recognition.

The paper shows how to exploit the natural structure of vocabulary trees to define a feature’s information.

Rather than reducing the size of the feature database, the paper proposes using feature information to guide the building of the vocabulary tree.

G. Schindler, M. Brown, and R. Szeliski, “City-scale location recognition,” CVPR 2007

The authors demonstrate results on an automatically captured 30,000-image database, consisting of over 100 million SIFT features and covering a continuous 20-kilometer stretch of roads through commercial, residential, and industrial areas.

The structure of the vocabulary tree in the 128-dimensional SIFT space can be visualized as a nested set of Voronoi cells.

Trees are constructed with hierarchical k-means, where Gonzalez’s algorithm is used to initialize the cluster centers with points that are as far apart from each other as possible.

VORONOI DIAGRAMS

GREEDY N-BEST PATHS SEARCH

The Greedy N-Best Paths (GNP) algorithm follows multiple branches at each level rather than just the branch whose parent is closest to the query feature.

Given query feature q, and level l = 1
  Compute distance from q to all k children of the root node
  While (l < L) {
      l = l + 1
      Candidates = children of the closest N nodes at level l - 1
      Compute distance from q to all kN candidates
  }
  Return all features quantized under the closest candidate

For branching factor k and depth L, the normal search algorithm for a metric tree performs k comparisons between the query feature and the nodes of the tree at each of L levels, for a total of kL comparisons.

The GNP algorithm performs k comparisons at the top level and kN comparisons at each of the remaining L - 1 levels, for a total of k + kN(L - 1) comparisons.

This allows us to specify the amount of computation per search by varying the number of paths followed N. Note that traditional search is just the specific case in which N = 1.
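A minimal Python sketch of GNP follows, reusing the hypothetical Node structure from the earlier sketch and assuming a fully balanced tree; in practice the inverted file attached to the returned leaf supplies the matched database features.

import numpy as np

def greedy_n_best_paths(root, q, n_paths):
    """Follow the N closest branches at every level instead of only the single closest one:
    k comparisons at the top level, then at most k*N comparisons per remaining level."""
    q = np.asarray(q, dtype=float)
    frontier = list(root.children)                    # the k candidates at the top level
    while frontier and frontier[0].children:          # stop once the candidates are leaves
        ranked = sorted(frontier, key=lambda nd: np.linalg.norm(q - nd.centroid))
        # expand only the N best nodes; their children (at most k*N) are the next candidates
        frontier = [child for nd in ranked[:n_paths] for child in nd.children]
    # the closest leaf candidate; the database features quantized under it are the matches for q
    return min(frontier, key=lambda nd: np.linalg.norm(q - nd.centroid))

Setting n_paths = 1 reduces this to the traditional single-path search.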


COMMENTS ON PREVIOUS WORK

Previous work on improving the efficiency of vocabulary trees showed that increasing the branching factor for a fixed vocabulary size tended to improve the quality of search results.

Much of this improvement is not due to the fact that increasing the branching factor produces better-structured trees, but to the fact that more nodes are being considered when traversing a tree with a higher branching factor.

Changing the branching factor of a vocabulary tree requires time-consuming offline retraining via hierarchical k-means.

ADVANTAGES OF GNP

The number of nodes searched is a decision that can be varied at search time based on available computational power.

It provides a more direct relationship between performance and the number of comparisons per query feature than the traditional approach of relating performance to branching factor.

INFORMATIVE FEATURES

If the database consists of a fixed set of images, the aim should be to build the vocabulary tree which maximizes performance of queries on the database.

Training data needs to be chosen such that the capacity of the tree is spent modeling the parts of SIFT space occupied by those features which are most informative about the locations of the database images.

Intuitively, we want to find features which occur in all images of some specific location, but rarely or never occur anywhere outside of that single location.

This intuition is captured well by the formal concept of information gain.

Information gain I(X|Y) is a measure of how much uncertainty is removed from a distribution given some specific additional knowledge. It is defined with respect to the entropy H(X) and conditional entropy H(X|Y) of the distributions P(X) and P(X|Y), i.e. I(X|Y) = H(X) - H(X|Y).

INFORMATIVE FEATURES

In this case, information gain I(Li|Wj) is always computed with respect to a specific location li and a specific visual word wj.

Li is a binary variable that is true when we are at location li. Wj is a binary variable that is true when the visual word wj is in view.

Thus, the information gain of visual word wj at location li is I(Li|Wj) = H(Li) - H(Li|Wj).

INFORMATIVE FEATURES

Since the entropy H(Li) is constant across all visual words at location li, the visual word that maximizes the information gain I(Li|Wj) also minimizes the conditional entropy H(Li|Wj).

NDB is the number of images in the database.
NL is the number of images at each location.
a is the number of times visual word wj occurs at location li.
b is the number of times visual word wj occurs at other database locations.

INFORMATIVE FEATURES

The significance of the final equation in the previous slide is that the information gain of a visual word is captured by a simple function of the values a and b.
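As an illustration of that simple function, here is a small Python sketch that computes I(Li|Wj) directly from the quantities defined above; it treats a and b as image counts (an approximation, since the slides define them as occurrence counts) and the helper names are hypothetical.

from math import log2

def binary_entropy(p):
    """Entropy of a binary variable with P(true) = p, in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * log2(p) - (1.0 - p) * log2(1.0 - p)

def information_gain(a, b, n_l, n_db):
    """I(Li|Wj) = H(Li) - H(Li|Wj).
    a: images at location li containing word wj; b: images elsewhere containing wj;
    n_l: images per location; n_db: total database images."""
    p_li = n_l / n_db                      # P(Li = 1)
    p_wj = (a + b) / n_db                  # P(Wj = 1)
    h_li = binary_entropy(p_li)
    h_wj_present = binary_entropy(a / (a + b)) if a + b > 0 else h_li
    h_wj_absent = binary_entropy((n_l - a) / (n_db - a - b)) if n_db > a + b else 0.0
    h_cond = p_wj * h_wj_present + (1.0 - p_wj) * h_wj_absent   # H(Li|Wj)
    return h_li - h_cond

A word that appears in every image of the location and nowhere else (a = NL, b = 0) drives H(Li|Wj) to zero and attains the maximum possible gain H(Li), matching the intuition stated earlier.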

VOTING SCHEME

The paper matches each feature in the query image to a number of features in the database using a vocabulary tree, with a simple voting scheme in which matched features from the database vote for the images from which they originate.

Ni - the number of features in a given database image
NNk - the number of near neighbors returned for a given query feature fk

To achieve better performance, vote tallies are normalized by Ni and NNk. In addition, the tallies are averaged over a local neighborhood of NL images.

The number of votes Cd for a database image d can be computed by looping over every feature in each image in a local neighborhood, and comparing it against each of the Nq features in the query image.
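The following Python sketch shows the shape of such a voting scheme; lookup_matches (a stand-in for the tree lookup that returns the database image ids of the near neighbors of a query feature), consecutively numbered images along the route, and the centering of the NL-image neighborhood are all assumptions made for illustration, not the paper's exact tally.

from collections import defaultdict

def vote_for_images(query_features, lookup_matches, image_num_features, n_l):
    """Tally votes per database image, normalized by NNk and Ni, then average the
    tallies over a local neighborhood of n_l images."""
    votes = defaultdict(float)
    for f in query_features:
        matched_ids = lookup_matches(f)          # image ids of the NNk near-neighbor features
        if not matched_ids:
            continue
        nn_k = len(matched_ids)
        for image_id in matched_ids:
            # each matched database feature votes for its source image
            votes[image_id] += 1.0 / (nn_k * image_num_features[image_id])
    smoothed = {}
    for d in votes:
        window = range(d - n_l // 2, d - n_l // 2 + n_l)   # n_l images around image d
        smoothed[d] = sum(votes.get(j, 0.0) for j in window) / n_l
    return smoothed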


RESULTS

Two experiments are performed: one to evaluate the effectiveness of using informative features to build vocabulary trees, and the other to evaluate the performance of the Greedy N-Best Paths algorithm for vocabulary trees of varying branching factor.

CONCLUSIONS

The performance of a vocabulary tree on recognition tasks can be significantly affected by the specific vocabulary chosen.

In particular, using the features that are most informative about specific locations to build the vocabulary tree can greatly improve performance results as the database increases in size.

The performance of a given vocabulary tree can be improved by controlling the number of nodes considered during search, rather than by increasing the branching factor of the vocabulary tree.

OUTLINE

Scalable Vocabulary Tree

City-Scale Location Recognition
a) Greedy N-Best Paths
b) Informative Features
c) Voting Scheme
d) Results
e) Conclusions

Tree Histogram Coding for Mobile Image Matching
a) Motivation & Previous Literature
b) Developing Tree Histogram
c) Compression & Coding of Tree Histogram
d) Mathematics behind the scenes
e) Experimental Setup
f) Image Matching Results
g) Conclusions & Future Work

MOTIVATION

Picture Credit: David Chen

D. M. Chen, S. S. Tsai, V. Chandrasekhar, G. Takacs, J. Singh, and B. Girod, “Tree histogram coding for mobile image matching,” in DCC 2009.

APPLICATIONS OF MOBILE IMAGE MATCHING

Picture Credit: Bernd Girod

OTHER APPLICATIONS

Picture Credit: Bernd Girod

PREVIOUS SYSTEMS

An emerging class of CBIR applications requires query feature descriptors to be transmitted from a mobile device to a remote server holding the database.

The codec developed by C. Yeo et al. generates a hash for each descriptor using random projections.

TRANSFORM CODING OF IMAGE FEATURE DESCRIPTORS

[Chandrasekhar et al., VCIP 2009]

COMPARISON

Retrieval pipeline for compressing features and then classifying them at the server – used by the previous two methods

Retrieval pipeline for classifying features on the mobile device and then compressing the tree histogram – used in the paper

Picture Credit: David Chen

DRAWBACKS

These methods are suited for pair-wise image matching.

Neither of these methods exploits the fact that the tree histogram alone is sufficient for accurate classification in tree-based retrieval.

The histogram can be sent instead of the descriptors if the classification tree can be stored at the encoder.

A FULLY BALANCED SVT WITH DEPTH D = 3 AND MAXIMUM BRANCH FACTOR K = 3

IDEA USED FOR COMPRESSION

A parent node is visited whenever one of its child nodes is visited.

This idea is used to calculate the visit counts at level L from the visit counts at level L+1.

v_i(I) = Σ_{j ∈ C(i)} v_j(I), where v_i(I) is the visit count, i.e. the number of descriptors of I which are quantized to node i, and C(i) is the children set of node i.
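A minimal Python sketch of this roll-up for a fully balanced tree is given below; the flat leaf_counts list (ordered so that consecutive groups of K leaves share a parent) is an assumed representation.

def build_tree_histogram(leaf_counts, K, D):
    """Roll leaf-level visit counts up a fully balanced SVT with branch factor K and depth D.
    A parent's count is the sum of its children's counts, so the whole tree histogram is
    recoverable from the K**D leaf counts alone (the root's count is just the total
    number of quantized descriptors)."""
    assert len(leaf_counts) == K ** D
    levels = [list(leaf_counts)]                 # level D (the leaves) first
    current = list(leaf_counts)
    for _ in range(D - 1):
        current = [sum(current[i * K:(i + 1) * K]) for i in range(len(current) // K)]
        levels.append(current)                   # next level up
    levels.reverse()                             # levels[0] = level 1, ..., levels[-1] = leaves
    return levels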

DISSIMILARITY SCORE

Therefore the entire tree histogram can be constructed from the counts at the leaf level.

The nodes that are visited by only a few database images carry the most discriminative value.

The vector of all entropy-weighted visit counts through the entire SVT forms the weighted tree histogram for image I.

The mutual dissimilarity score between a query image Q and the mth database image Dm is computed as the distance between their normalized weighted tree histograms.
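A short sketch of this scoring, assuming the entropy weights w_i = ln(N / N_i) introduced earlier and an L1 distance between L1-normalized weighted histograms (the standard vocabulary-tree scoring of Nistér and Stewénius; the paper's exact normalization may differ):

import numpy as np

def weighted_histogram(visit_counts, node_image_counts, n_database_images):
    """Entropy-weight each node's visit count by w_i = ln(N / N_i)."""
    v = np.asarray(visit_counts, dtype=float)
    n_i = np.maximum(np.asarray(node_image_counts, dtype=float), 1.0)  # guard never-visited nodes
    return v * np.log(n_database_images / n_i)

def dissimilarity(hist_q, hist_d):
    """L1 distance between L1-normalized weighted tree histograms."""
    q = hist_q / (np.sum(np.abs(hist_q)) + 1e-12)
    d = hist_d / (np.sum(np.abs(hist_d)) + 1e-12)
    return float(np.sum(np.abs(q - d)))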

COMPRESSION OF TREE HISTOGRAM

The maximum number of nodes for a fully balanced tree with branch factor K and depth D is K + K^2 + ... + K^D, of which K^D are leaves.

Since SURF descriptors are 64-dimensional and require half the memory of 128-dimensional SIFT descriptors, they are used for the retrieval system.

Each node is represented with 8-bit precision per dimension.

Trees built for other image databases which are not required for the current application can be stored on a memory card.

The SVT can be trained on the server using a large image database and then sent to the mobile phone.

Incremental modifications can be sent to the mobile device for synchronization.

MEMORY REQUIREMENTS FOR SVTS

Picture Credit: David Chen

ENCODING OF TREE HISTOGRAM

Picture Credit: David Chen

SYMBOLS ENCODED

There are many zero counts because most of the leaf nodes are not visited by the query image.

Two sequences of symbols are transmitted:
a) Symbols giving the zero-runs between positive-count nodes.
b) A sequence of the same length as the first, encoding the actual counts at these positive-count nodes.
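A small Python sketch of this symbol extraction step (the subsequent arithmetic coding of the two streams is not shown):

def histogram_to_symbols(leaf_counts):
    """Split a sparse leaf histogram into the two symbol streams described above:
    zero-run lengths before each positive-count node, and the positive counts themselves."""
    runs, counts, zero_run = [], [], 0
    for c in leaf_counts:
        if c == 0:
            zero_run += 1
        else:
            runs.append(zero_run)    # number of unvisited leaves since the last visited one
            counts.append(c)         # visit count at this positive-count leaf
            zero_run = 0
    return runs, counts

# Example: histogram_to_symbols([0, 0, 3, 0, 1, 0, 0, 0, 2]) returns ([2, 1, 3], [3, 1, 2]).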

ENCODING OF TREE HISTOGRAM

Picture Credit: David Chen

RATE SAVINGS

Rate savings happen in two ways:

a) From quantization: tree-structured vector quantization is used instead of scalar quantization of the descriptors.

b) From not coding order: the order among the N features can be discarded because the tree histogram is a bag-of-words representation, which saves approximately log2(N!) bits.
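To put a rough number on the second saving: there are N! possible orderings of the N features, so a bag-of-words representation discards about log2(N!) ≈ N·log2(N) - N·log2(e) bits (Stirling's approximation). For a query image with roughly 220 SURF features, as in the ZuBuD setup described later, this is on the order of 1.4 kbits per image.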

PROBABILITY OF VISITING EACH LEAF NODE IN A SURF SVT

(Figure panels: ZuBuD and CDD)

Picture Credit: David Chen

DISTRIBUTIONS FOR RUNS

Two assumptions are made in the statistical analysis:

a) Leaf nodes are visited uniformly (each leaf is equally likely to be visited).
b) Image features are classified independently.

Runs of zero-count nodes are modeled as a geometric random variable.
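A sketch of the model these assumptions imply: with M leaf nodes and N features classified independently and uniformly, the probability that a given leaf receives a positive count is q = 1 - (1 - 1/M)^N, and the length R of a zero-run between positive-count leaves is then geometric, P(R = r) = (1 - q)^r · q for r = 0, 1, 2, ... (the paper's exact parameterization may differ).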

PROBABILITY DISTRIBUTIONS FOR RUNS

Picture Credit: David Chen

DISTRIBUTIONS FOR COUNTS

Counts are modeled as non-zero binomial random variables.
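Under the same assumptions, the raw count at a leaf is Binomial(N, 1/M); conditioning on the leaf being visited gives the non-zero binomial model P(C = c) = [C(N, c) · (1/M)^c · (1 - 1/M)^(N-c)] / [1 - (1 - 1/M)^N] for c = 1, ..., N.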

PROBABILITY DISTRIBUTIONS FOR COUNTS

Picture Credit: David Chen

EXPERIMENTAL DATABASE

Two image databases are used:

a) Zurich Building Database (ZuBuD): 1,005 database images at 640 × 480 resolution, representing five views of 201 different building facades in Zurich.
Query images for ZuBuD: 115 images at 320 × 240 resolution.
# of descriptors (SURF features) = 220

b) CD Database (CDD): 10,597 database images at 500 × 500 resolution, representing 10,597 different clean CD covers.
Query images for CDD: 50 images at 640 × 480 resolution.
# of descriptors (SURF features) = 370

ZURICH BUILDING DATABASE

Picture Credit: David Chen

CD COVER DATABASE

Picture Credit: David Chen

EXPERIMENTAL SETUP

For the databases taken, an SVT of depth D = 7 and maximum branch factor K = 10 is trained using SURF descriptors obtained from the database images.

The leaf histogram is then obtained by classifying the SURF features of the query image, and is encoded and transmitted.

For each query image, the database image which has the minimum dissimilarity score is sent back to the mobile.

The rates are forced to be equal among the different techniques for fair comparison.

COMPARISON OF DIFFERENT CODECS

Picture Credit: David Chen

DIFFERENT BIT RATES

The bit rate in the case of descriptor coding is the number of bits used to send the entropy-coded, scalar-quantized transform coefficients.

The bit rate in the case of TSVQ is N · D · log2(K) bits per image: each of the N features sends the index of its leaf node, which requires log2(K^D) = D · log2(K) bits.

The bit rate in the case of tree histogram coding is the number of bits in the arithmetic-coded bit stream for the counts and runs symbols.
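As a rough sanity check of the TSVQ figure under the setup above (D = 7, K = 10), each feature needs D · log2(K) ≈ 23.3 bits to index its leaf node, so a query image with N ≈ 220 features (the figure quoted for ZuBuD queries) would cost about 5.1 kbits before any further entropy coding.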

ADVANTAGES

The runs symbols occupy most of the bit budget compared to the counts symbols in the tree histogram encoding.

Descriptor coding requires 5x the bit rate of tree histogram coding.

Since the order among the nodes is not transmitted, approximately log2(N!) bits per image are saved by the tree histogram technique.

IMAGE MATCHING RESULTS

Picture Credit: David Chen

IMAGE MATCHING RESULTS

Picture Credit: David Chen

VARIATION IN TREE DEPTH

Picture Credit: David Chen

EFFECTS OF DEPTH VARIATION

To conserve memory, the tree depth needs to be reduced.

Reducing the number of leaf nodes decreases the entropy of the tree histogram. It can be observed that matching accuracy improves with a deeper tree.

CDD is a larger database than ZuBuD, so classification of CDD query images suffers more from a decrease in tree size.

Picture Credit: David Chen

CONCLUSIONS

It has been found that rate efficiency can be enhanced by transmitting the tree histogram rather than the individual feature descriptors.

Since it dramatically reduces the amount of data transmitted over a slow wireless link, this approach is of great interest for mobile image matching applications.

FUTURE WORK

Making tree-based classification more memory efficient.

Checking the tree histogram matches for geometric consistency.

Questions?