Large Scale Online Learning of Image Similarity Through Ranking
DESCRIPTION
Presentation of the paper "Large Scale Online Learning of Image Similarity Through Ranking" for the Synchromedia Seminar on September 15, 2012.
TRANSCRIPT
Large Scale Online Learning of Image Similarity Through Ranking
by G. Chechik, V. Sharma, U. Shalit, S. Bengio – JMLR 2010
Presented by Lukas Tencer
Motivation
• Needed for applications that compare any kind of data:
  – image, video, web page, document
• Two levels of similarity:
  – features (visual, for images)
  – semantic
• Large-scale learning: limited by computational cost, not by availability of data
• Which similarity does the user want to express, visual or semantic?
• The presented approach learns semantic similarity once visual similarity is available
• Similarity learning usually requires pairwise distances, which are not always available
• Instead of pairwise distances, use relative distances: two images are close
  – if they are returned by the same query
  – if they have the same label
Example of a query
• Especially a problem in QVE (Query by Visual Example)
• Query: "mount royal park"
• Images retrieved for the query vs. visually similar images
Motivation II
• Relationship to classification:
  – a similarity measure can be used as a metric for classification
  – good classification infers labels, which induce similarity across images
• Constraint of a positive semidefinite similarity matrix:
  – for small data, it prevents overfitting
  – for big data, with enough samples, it can be removed to reduce computational cost
Problem Statement
• Learn a pairwise similarity function S on given data from relative comparisons of image pairs
• Given data P and relative similarities r_ij = r(p_i, p_j); where r is not available, it is taken to be 0
• The goal: for every query image p_i, a more relevant image p_i+ and a less relevant image p_i-,

    S(p_i, p_i+) > S(p_i, p_i-)   for all p_i, p_i+, p_i- ∈ P such that r(p_i, p_i+) > r(p_i, p_i-)

• S is parameterized as a bilinear form:

    S_W(p_i, p_j) = p_i^T W p_j,   where W ∈ R^(d×d)
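The bilinear form above is just a matrix product; a minimal numpy sketch (function name is mine, not from the paper's code):

```python
import numpy as np

def bilinear_similarity(W, p_i, p_j):
    """S_W(p_i, p_j) = p_i^T W p_j for a (not necessarily symmetric) W."""
    return p_i @ W @ p_j

d = 4
W = np.eye(d)                          # with W = I this reduces to the dot product
p, q = np.ones(d), np.arange(d, dtype=float)
print(bilinear_similarity(W, p, q))    # 0 + 1 + 2 + 3 = 6.0
```

Initializing W to the identity, as the paper does, means training starts from plain visual (dot-product) similarity.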
Online Algorithm
• The Passive-Aggressive (PA) family of online (iterative) learning algorithms
• PA-1:

    w_{t+1} = argmin_{w ∈ R^n} (1/2) ||w − w_t||²   such that   l(w; (x_t, y_t)) = 0

  – passive if the loss is 0: w is left unchanged
  – aggressive if the loss is positive: the update enforces l(w; (x_t, y_t)) = 0 regardless of the step size
• PA-2: trades off proximity to the previous solution against the desired margin – a constrained optimization problem
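A minimal sketch of the PA-1 update for a linear classifier with hinge loss (the closed-form step size is the standard one for this family; names are illustrative):

```python
import numpy as np

def pa1_step(w, x, y):
    """One Passive-Aggressive (PA-1) update for a linear classifier.

    Passive when the hinge loss is zero; otherwise takes the smallest
    step that makes the new w satisfy l(w; (x, y)) = 0.
    """
    loss = max(0.0, 1.0 - y * (w @ x))
    if loss > 0.0:                   # aggressive branch
        tau = loss / (x @ x)         # closed-form step size
        w = w + tau * y * x
    return w

w = pa1_step(np.zeros(2), np.array([2.0, 0.0]), +1)
print(w @ np.array([2.0, 0.0]))      # margin on this example is now exactly 1.0
```

With w = 0 the loss is 1, so tau = 1/4 and w becomes [0.5, 0]; a second call on the same example would be passive.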
Online Algorithm II
• We search for S_W with a safety margin of 1:

    S_W(p_i, p_i+) > S_W(p_i, p_i-) + 1

• The hinge loss over a triplet is defined as:

    l_W(p_i, p_i+, p_i-) = max{0, 1 − S_W(p_i, p_i+) + S_W(p_i, p_i-)}

  and the global loss L_W sums it over all triplets:

    L_W = Σ_{(p_i, p_i+, p_i-) ∈ P} l_W(p_i, p_i+, p_i-)

• The PA-2 constrained optimization problem at step i is then:

    W_i = argmin_W (1/2) ||W − W_{i−1}||²_Fro + C ξ
    such that l_W(p_i, p_i+, p_i-) ≤ ξ and ξ ≥ 0

  where C is the parameter that controls the trade-off between margin enforcement and proximity to the previous solution
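This per-triplet problem has a closed-form solution, which gives a simple update rule; a numpy sketch under that assumption (variable names are mine):

```python
import numpy as np

def oasis_step(W, p, p_pos, p_neg, C=0.1):
    """One OASIS-style update step (sketch of the closed-form PA solution).

    Hinge loss on the triplet: l_W = max(0, 1 - p^T W p_pos + p^T W p_neg).
    If positive, move W along V = p (p_pos - p_neg)^T with step size
    tau = min(C, l_W / ||V||_Fro^2); otherwise leave W unchanged.
    """
    loss = max(0.0, 1.0 - p @ W @ p_pos + p @ W @ p_neg)
    if loss > 0.0:
        V = np.outer(p, p_pos - p_neg)        # gradient of the loss w.r.t. W
        tau = min(C, loss / np.sum(V * V))    # clipped PA step size
        W = W + tau * V
    return W

# One toy step starting from the identity:
rng = np.random.default_rng(0)
p, p_pos, p_neg = rng.normal(size=(3, 5))
W1 = oasis_step(np.eye(5), p, p_pos, p_neg, C=10.0)
```

After a step, the margin S_W(p, p+) − S_W(p, p−) on the sampled triplet can only grow, since the update adds tau · ||p||² · ||p+ − p−||² to it.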
Online Algorithm III
• A loss bound can be derived by rewriting the problem as a linear classification problem
Sampling strategy
• Uniformly sample p_i from P
• Uniformly sample p_i+ from images in the same category
• Uniformly sample p_i- from images that do not share a category with p_i
  – p_i- can be chosen at random from all images if the number of categories and the number of queries is very large
• If the relevance feedback r(p_i, p_j) is not just a binary function, the sampling of positive examples can be changed to prioritize samples with higher relevance
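The binary-relevance case of this strategy can be sketched as follows (stdlib only; the helper name is mine):

```python
import random
from collections import defaultdict

def sample_triplet(labels):
    """Uniformly sample indices (p_i, p_i+, p_i-) from a list of category
    labels, following the binary-relevance sampling strategy above."""
    by_class = defaultdict(list)
    for idx, c in enumerate(labels):
        by_class[c].append(idx)
    # anchor: any image whose class has at least two members, so a positive exists
    i = random.choice([k for k, c in enumerate(labels) if len(by_class[c]) > 1])
    pos = random.choice([j for j in by_class[labels[i]] if j != i])
    neg = random.choice([j for j, c in enumerate(labels) if c != labels[i]])
    return i, pos, neg

labels = ["cat", "cat", "dog", "dog", "bird"]
i, pos, neg = sample_triplet(labels)
```

For web-scale data, the negative branch would be replaced by a uniform draw over all images, as the slide notes.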
Image representation
• Bag-of-words approach (bag of local descriptors):
  – get regions of interest
  – calculate local descriptors
  – treat them independently
• Divide the image into overlapping square blocks
• Extract color and edge descriptors:
  – Edge: uniform Local Binary Patterns – differences of intensities over a circular neighborhood
    • 2^8 possible sequences = 256-bin histogram
    • non-uniform sequences can be merged into one bin, giving a 59-bin histogram
  – Color: histograms from k-means clustering
    • train a color codebook and map each block pixel to the closest value in the codebook
  – Concatenate the two descriptors in the end
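The 59-bin count comes from the standard "uniform" LBP criterion (at most two 0/1 transitions around the circle); a quick check:

```python
def is_uniform(pattern, bits=8):
    """A binary pattern is 'uniform' if its circular 0/1 transition count <= 2."""
    transitions = sum(
        ((pattern >> i) & 1) != ((pattern >> ((i + 1) % bits)) & 1)
        for i in range(bits)
    )
    return transitions <= 2

uniform = [p for p in range(256) if is_uniform(p)]
print(len(uniform))   # 58 uniform patterns -> 58 bins + 1 shared non-uniform bin = 59
```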
Image representation II
• Aim for a high-dimensional sparse vector representation
• Each local descriptor is thus treated as a visual term, and the image is represented as a binary vector indicating the presence/absence of each visual term
• Visual terms are weighted according to term frequency and inverse document frequency
• Parameters of the setup:
  – 20 bins for colors
  – vocabulary of 10,000 visterms (approx. 70 nonzero values per image)
  – blocks of 64×64 pixels, overlapping every 32 pixels
  – blocks extracted at different scales, by downscaling the image by a factor of 1.25 until fewer than 10 blocks remain
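The tf-idf weighting of visterms can be sketched as below (a minimal stdlib version; small integer ids stand in for the 10,000-word codebook, and the exact weighting formula in the paper may differ):

```python
import math
from collections import Counter

def tfidf_visterm_vector(doc_visterms, all_docs):
    """Sparse tf-idf weights for the visterms present in one image,
    stored as a dict {visterm_id: weight} (absent terms are implicitly 0)."""
    n_docs = len(all_docs)
    tf = Counter(doc_visterms)
    vec = {}
    for term, count in tf.items():
        df = sum(1 for d in all_docs if term in d)       # document frequency
        vec[term] = (count / len(doc_visterms)) * math.log(n_docs / df)
    return vec

docs = [[1, 2, 2, 7], [2, 3], [7, 7, 5]]
vec = tfidf_visterm_vector(docs[0], docs)
```

The dict-of-weights form matches the slide's point: with ~70 nonzero entries out of 10,000, a sparse representation is essential.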
Experiments and evaluation
• Tested in 2 settings:
  – Caltech256 dataset (30k images)
  – Web-scale experiment (2.7M images)
  – (other datasets used for image-retrieval testing: MIRFLICKR-1M, Corel5k, Corel30k, UCID)
• Web-scale experiment:
  – queries from Google Image Search, with relevance feedback
  – the stopping condition for training is the value of mean average precision (160M iterations, ~4000 min on a single CPU)
  – evaluation criteria: mAP and precision at top k
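As a reminder of what these two criteria measure, a minimal sketch (function names are mine; the input is a ranked list of binary relevance flags):

```python
def precision_at_k(ranked_relevance, k):
    """Fraction of relevant items among the top-k retrieved."""
    return sum(ranked_relevance[:k]) / k

def average_precision(ranked_relevance):
    """Mean of precision@k taken at the positions k of the relevant items;
    mAP is this quantity averaged over all queries."""
    hits, total = 0, 0.0
    for k, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            total += hits / k
    return total / hits if hits else 0.0

print(precision_at_k([1, 0, 1, 1, 0], 3))    # 2/3
print(average_precision([1, 0, 1, 1, 0]))    # (1/1 + 2/3 + 3/4) / 3
```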
Failure cases
Scalability
• Comparison with Large Margin Nearest Neighbor (LMNN)
• Scales linearly with the number of images
Caltech 256 test
Discussion
• Metric learning can help capture semantic relationships once visual similarity is available
• Relevance feedback or a semantic similarity measure (class modeling) is required to capture semantic similarity
• Compared to raw visual similarity, precision at top k and mAP increase
• Recall is hard to measure for databases that are not fully annotated
• Online metric learning is an ongoing research problem (Davis 2007) (Jain 2008) (Chechik 2010); although applied here to images, it can be used in other fields to capture semantic similarity:
  – images: object semantics vs. visual features
  – documents: topics vs. textual features (tf, tf-idf)
  – SBIR: relative object mapping vs. sketch features
Thank you for your attention
Available at: http://www.slideshare.net/lukastencer