Large Scale Online Learning of Image Similarity Through Ranking
DESCRIPTION
Presentation of the paper "Large Scale Online Learning of Image Similarity Through Ranking" for the Synchromedia Seminar on September 15, 2012.
TRANSCRIPT
Large Scale Online Learning of Image Similarity Through Ranking
by G. Chechik, V. Sharma, U. Shalit, S. Bengio – JMLR 2010
Presented by Lukas Tencer
Motivation
• Needed for applications that compare any kind of data:
  – image, video, web page, document
• Two levels of similarity:
  – features (visual, for images)
  – semantic
• Large-scale learning: limited by computational cost, not by availability of data
• Which similarity does the user want to express, visual or semantic?
• The presented approach learns semantic similarity once visual similarity is available
• Similarity learning usually requires pairwise distances, which are not always available
• Instead of pairwise distances, use relative distances: two images are close
  – if they are returned by the same query
  – if they have the same label
Example of a query
• Especially a problem in QVE (Query by Visual Example)
• Query: "mount royal park"
• Images retrieved for the query vs. visually similar images
Motivation II
• Relationship to classification:
  – a similarity measure can be used as a metric for classification
  – good classification infers labels, which induce similarity across images
• Constraint of a positive semidefinite similarity matrix:
  – for small data, it prevents overfitting
  – for big data, with enough samples, it can be removed to reduce computational cost
Problem Statement
• Learn a pairwise similarity function S on given data from relative comparisons of image pairs
• Given data P and relative similarities r_ij = r(p_i, p_j); where r is not available, it is taken to be 0
• The goal: for every query image p_i, a more relevant image p_i+ and a less relevant image p_i-,

    S(p_i, p_i+) > S(p_i, p_i-)   for all p_i, p_i+, p_i- ∈ P such that r(p_i, p_i+) > r(p_i, p_i-)

• S is parameterized as a bilinear form:

    S_W(p_i, p_j) = p_i^T W p_j,   where W ∈ R^(d×d)
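The bilinear form above is just a matrix product; a minimal numpy sketch (function name is mine, not from the paper's code):

```python
import numpy as np

def bilinear_similarity(W, p_i, p_j):
    """S_W(p_i, p_j) = p_i^T W p_j for a (not necessarily symmetric) W."""
    return p_i @ W @ p_j

d = 4
W = np.eye(d)                          # with W = I this reduces to the dot product
p, q = np.ones(d), np.arange(d, dtype=float)
print(bilinear_similarity(W, p, q))    # 0 + 1 + 2 + 3 = 6.0
```

Initializing W to the identity, as the paper does, means training starts from plain visual (dot-product) similarity.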
Online Algorithm
• The Passive-Aggressive (PA) family of online (iterative) learning algorithms
• PA-1:

    w_{t+1} = argmin_{w ∈ R^n} (1/2) ||w − w_t||²   such that   l(w; (x_t, y_t)) = 0

  – passive if the loss is 0: w is left unchanged
  – aggressive if the loss is positive: the update enforces l(w; (x_t, y_t)) = 0 regardless of the step size
• PA-2: trades off proximity to the previous solution against the desired margin – a constrained optimization problem
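A minimal sketch of the PA-1 update for a linear classifier with hinge loss (the closed-form step size is the standard one for this family; names are illustrative):

```python
import numpy as np

def pa1_step(w, x, y):
    """One Passive-Aggressive (PA-1) update for a linear classifier.

    Passive when the hinge loss is zero; otherwise takes the smallest
    step that makes the new w satisfy l(w; (x, y)) = 0.
    """
    loss = max(0.0, 1.0 - y * (w @ x))
    if loss > 0.0:                   # aggressive branch
        tau = loss / (x @ x)         # closed-form step size
        w = w + tau * y * x
    return w

w = pa1_step(np.zeros(2), np.array([2.0, 0.0]), +1)
print(w @ np.array([2.0, 0.0]))      # margin on this example is now exactly 1.0
```

With w = 0 the loss is 1, so tau = 1/4 and w becomes [0.5, 0]; a second call on the same example would be passive.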
Online Algorithm II
• We search for S_W with a safety margin of 1:

    S_W(p_i, p_i+) > S_W(p_i, p_i-) + 1

• The hinge loss over a triplet is defined as:

    l_W(p_i, p_i+, p_i-) = max{0, 1 − S_W(p_i, p_i+) + S_W(p_i, p_i-)}

  and the global loss L_W sums it over all triplets:

    L_W = Σ_{(p_i, p_i+, p_i-) ∈ P} l_W(p_i, p_i+, p_i-)

• The PA-2 constrained optimization problem at step i is then:

    W_i = argmin_W (1/2) ||W − W_{i−1}||²_Fro + C ξ
    such that l_W(p_i, p_i+, p_i-) ≤ ξ and ξ ≥ 0

  where C is the parameter that controls the trade-off between margin enforcement and proximity to the previous solution
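This per-triplet problem has a closed-form solution, which gives a simple update rule; a numpy sketch under that assumption (variable names are mine):

```python
import numpy as np

def oasis_step(W, p, p_pos, p_neg, C=0.1):
    """One OASIS-style update step (sketch of the closed-form PA solution).

    Hinge loss on the triplet: l_W = max(0, 1 - p^T W p_pos + p^T W p_neg).
    If positive, move W along V = p (p_pos - p_neg)^T with step size
    tau = min(C, l_W / ||V||_Fro^2); otherwise leave W unchanged.
    """
    loss = max(0.0, 1.0 - p @ W @ p_pos + p @ W @ p_neg)
    if loss > 0.0:
        V = np.outer(p, p_pos - p_neg)        # gradient of the loss w.r.t. W
        tau = min(C, loss / np.sum(V * V))    # clipped PA step size
        W = W + tau * V
    return W

# One toy step starting from the identity:
rng = np.random.default_rng(0)
p, p_pos, p_neg = rng.normal(size=(3, 5))
W1 = oasis_step(np.eye(5), p, p_pos, p_neg, C=10.0)
```

After a step, the margin S_W(p, p+) − S_W(p, p−) on the sampled triplet can only grow, since the update adds tau · ||p||² · ||p+ − p−||² to it.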
Online Algorithm III
• A loss bound can be derived by rewriting the problem as a linear classification problem
Sampling strategy
• Uniformly sample p_i from P
• Uniformly sample p_i+ from images in the same category
• Uniformly sample p_i- from images that do not share a category with p_i
  – p_i- can be chosen at random from all images if the number of categories and the number of queries is very large
• If the relevance feedback r(p_i, p_j) is not just a binary function, the sampling of positive examples can be changed to prioritize samples with higher relevance
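The binary-relevance case of this strategy can be sketched as follows (stdlib only; the helper name is mine):

```python
import random
from collections import defaultdict

def sample_triplet(labels):
    """Uniformly sample indices (p_i, p_i+, p_i-) from a list of category
    labels, following the binary-relevance sampling strategy above."""
    by_class = defaultdict(list)
    for idx, c in enumerate(labels):
        by_class[c].append(idx)
    # anchor: any image whose class has at least two members, so a positive exists
    i = random.choice([k for k, c in enumerate(labels) if len(by_class[c]) > 1])
    pos = random.choice([j for j in by_class[labels[i]] if j != i])
    neg = random.choice([j for j, c in enumerate(labels) if c != labels[i]])
    return i, pos, neg

labels = ["cat", "cat", "dog", "dog", "bird"]
i, pos, neg = sample_triplet(labels)
```

For web-scale data, the negative branch would be replaced by a uniform draw over all images, as the slide notes.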
Image representation
• Bag-of-words approach (bag of local descriptors):
  – get regions of interest
  – calculate local descriptors
  – treat them independently
• Divide the image into overlapping square blocks
• Extract color and edge descriptors:
  – Edge: uniform Local Binary Patterns – differences of intensities over a circular neighborhood
    • 2^8 possible sequences = 256-bin histogram
    • non-uniform sequences can be merged into one bin, giving a 59-bin histogram
  – Color: histograms from k-means clustering
    • train a color codebook and map each block pixel to the closest value in the codebook
  – Concatenate the two descriptors in the end
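The 59-bin count comes from the standard "uniform" LBP criterion (at most two 0/1 transitions around the circle); a quick check:

```python
def is_uniform(pattern, bits=8):
    """A binary pattern is 'uniform' if its circular 0/1 transition count <= 2."""
    transitions = sum(
        ((pattern >> i) & 1) != ((pattern >> ((i + 1) % bits)) & 1)
        for i in range(bits)
    )
    return transitions <= 2

uniform = [p for p in range(256) if is_uniform(p)]
print(len(uniform))   # 58 uniform patterns -> 58 bins + 1 shared non-uniform bin = 59
```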
Image representation II
• Aim for a high-dimensional sparse vector representation
• Each local descriptor is thus treated as a visual term, and the image is represented as a binary vector indicating the presence/absence of each visual term
• Visual terms are weighted according to term frequency and inverse document frequency
• Parameters of the setup:
  – 20 bins for colors
  – vocabulary of 10,000 visterms (approx. 70 nonzero values per image)
  – blocks of 64×64 pixels, overlapping every 32 pixels
  – blocks extracted at different scales, by downscaling the image by a factor of 1.25 until fewer than 10 blocks remain
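The tf-idf weighting of visterms can be sketched as below (a minimal stdlib version; small integer ids stand in for the 10,000-word codebook, and the exact weighting formula in the paper may differ):

```python
import math
from collections import Counter

def tfidf_visterm_vector(doc_visterms, all_docs):
    """Sparse tf-idf weights for the visterms present in one image,
    stored as a dict {visterm_id: weight} (absent terms are implicitly 0)."""
    n_docs = len(all_docs)
    tf = Counter(doc_visterms)
    vec = {}
    for term, count in tf.items():
        df = sum(1 for d in all_docs if term in d)       # document frequency
        vec[term] = (count / len(doc_visterms)) * math.log(n_docs / df)
    return vec

docs = [[1, 2, 2, 7], [2, 3], [7, 7, 5]]
vec = tfidf_visterm_vector(docs[0], docs)
```

The dict-of-weights form matches the slide's point: with ~70 nonzero entries out of 10,000, a sparse representation is essential.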
Experiments and evaluation
• Tested in 2 settings:
  – Caltech256 dataset (30k images)
  – Web-scale experiment (2.7M images)
  – (other datasets used for image-retrieval testing: MIRFLICKR-1M, Corel5k, Corel30k, UCID)
• Web-scale experiment:
  – queries from Google Image Search, with relevance feedback
  – the stopping condition for training is the value of mean average precision (160M iterations, ~4000 min on a single CPU)
  – evaluation criteria: mAP and precision at top k
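As a reminder of what these two criteria measure, a minimal sketch (function names are mine; the input is a ranked list of binary relevance flags):

```python
def precision_at_k(ranked_relevance, k):
    """Fraction of relevant items among the top-k retrieved."""
    return sum(ranked_relevance[:k]) / k

def average_precision(ranked_relevance):
    """Mean of precision@k taken at the positions k of the relevant items;
    mAP is this quantity averaged over all queries."""
    hits, total = 0, 0.0
    for k, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            total += hits / k
    return total / hits if hits else 0.0

print(precision_at_k([1, 0, 1, 1, 0], 3))    # 2/3
print(average_precision([1, 0, 1, 1, 0]))    # (1/1 + 2/3 + 3/4) / 3
```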
Failure cases
Scalability
• Comparison with Large Margin Nearest Neighbor (LMNN)
• Scales linearly with the number of images
Caltech 256 test
Discussion
• Metric learning can help capture semantic relationships once visual similarity is available
• Relevance feedback or a semantic similarity measure (class modeling) is required to capture semantic similarity
• Compared to raw visual similarity, precision at top k and mAP increase
• Recall is hard to measure for databases that are not fully annotated
• Online metric learning is an ongoing research problem (Davis 2007) (Jain 2008) (Chechik 2010); although applied here to images, it can be used in other fields to capture semantic similarity:
  – images: object semantics vs. visual features
  – documents: topics vs. textual features (tf, tf-idf)
  – SBIR: relative object mapping vs. sketch features
Thank you for your attention
Available at: http://www.slideshare.net/lukastencer