ACCOUNTING FOR THE RELATIVE IMPORTANCE OF OBJECTS IN IMAGE RETRIEVAL
Sung Ju Hwang and Kristen Grauman, University of Texas at Austin
Image retrieval
[Figure: a query image is matched against an image database to return Image 1, Image 2, …, Image k]
Content-based retrieval from an image database.
Relative importance of objects
[Figure: a query image and database images annotated with object labels, including cow, bird, water, fence, mud, and sky]
Which image is more relevant to the query?
Relative importance of objects
An image can contain many different objects, but some are more “important” than others.
[Figure: example image with objects labeled sky, water, mountain, architecture, bird, cow]
Relative importance of objects
Some objects are background
Relative importance of objects
Some objects are less salient
Relative importance of objects
Some objects are more prominent or perceptually define the scene
Our goal
Goal: Retrieve those images that share important objects with the query image.
How to learn a representation that accounts for this?
Idea: image tags as importance cue
TAGS: Cow, Birds, Architecture, Water, Sky
The order in which a person assigns tags provides implicit cues about each object's importance to the scene.
Learn this connection to improve cross-modal retrieval and content-based image retrieval (CBIR).
Related work
Previous work using tagged images focuses on the noun ↔ object correspondence.
Duygulu et al. 02, Fergus et al. 05, Li et al. 09, Berg et al. 04
Lavrenko et al. 2003, Monay & Gatica-Perez 2003, Barnard et al. 2004, Schroff et al. 2007, Gupta & Davis 2008, …
Related work building richer image representations from “two-view” text+image data:
Bekkerman & Jeon 07, Qi et al. 09, Quack et al. 08, Quattoni et al. 07, Yakhnenko & Honavar 09, …
Gupta et al. 08, Blaschko & Lampert 08, Hardoon et al. 04
Approach overview: Building the image database
Tagged training images [Figure: tag-lists such as Cow, Grass; Horse, Grass; Car, House, Grass, Sky; …]
Extract visual and tag-based features.
Learn projections from each feature space into a common “semantic space”.
Approach overview: Retrieval from the database
• Image-to-image retrieval
• Image-to-tag auto annotation
• Tag-to-image retrieval
[Figure: an untagged query image or a tag-list query (e.g., Cow, Tree, Grass) is matched against the image database, returning retrieved images or a retrieved tag-list (e.g., Cow, Tree)]
Dual-view semantic space
Visual features and tag-lists are two views generated by the same concept.
Learning mappings to semantic space
Canonical Correlation Analysis (CCA): choose projection directions that maximize the correlation of views projected from the same instance.
Semantic space: new common feature space
[Figure: View 1 and View 2 projected into the semantic space]
Kernel Canonical Correlation Analysis
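Linear CCA: given paired data $\{(x_i, y_i)\}_{i=1}^{N}$, select directions $w_x, w_y$ so as to maximize
$$\rho = \max_{w_x, w_y} \frac{w_x^\top C_{xy} w_y}{\sqrt{(w_x^\top C_{xx} w_x)(w_y^\top C_{yy} w_y)}}$$
where $C_{xx}, C_{yy}$ are the within-view covariance matrices and $C_{xy}$ is the between-view covariance.
Kernel CCA: the same objective, but with projections in kernel space, $w_x = \sum_i \alpha_i \phi_x(x_i)$ and $w_y = \sum_i \beta_i \phi_y(y_i)$. Given a pair of kernel functions $k_x(x_i, x_j) = \phi_x(x_i)^\top \phi_x(x_j)$ and $k_y(y_i, y_j) = \phi_y(y_i)^\top \phi_y(y_j)$, with kernel matrices $K_x, K_y$ over the training pairs:
$$\rho = \max_{\alpha, \beta} \frac{\alpha^\top K_x K_y \beta}{\sqrt{(\alpha^\top K_x^2 \alpha)(\beta^\top K_y^2 \beta)}}$$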
[Akaho 2001, Fyfe et al. 2001, Hardoon et al. 2004]
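Unregularized KCCA can trivially reach a correlation of 1, so practical solvers add a regularizer. The sketch below is a minimal NumPy version, assuming a regularized denominator of the form (K + κI)^2 in the spirit of Hardoon et al. 2004 and solving the resulting symmetric generalized eigenproblem via a Cholesky reduction. The function names and the value of `reg` are illustrative assumptions, not the authors' code.

```python
import numpy as np

def center_kernel(K):
    """Center a training kernel matrix in feature space: H K H with H = I - 11^T / N."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def kcca(Kx, Ky, reg=0.1, n_components=2):
    """Regularized kernel CCA posed as a symmetric generalized eigenproblem.

    Solves [[0, Kx Ky], [Ky Kx, 0]] v = rho * blockdiag(Rx, Ry) v,
    with Rx = (Kx + reg*I)^2, Ry = (Ky + reg*I)^2 and v = [alpha; beta].
    Returns the top canonical correlations and the dual coefficients
    alpha (N x d) for view 1 and beta (N x d) for view 2.
    """
    n = Kx.shape[0]
    Kx, Ky = center_kernel(Kx), center_kernel(Ky)
    I, Z = np.eye(n), np.zeros((n, n))
    A = np.block([[Z, Kx @ Ky], [Ky @ Kx, Z]])
    Rx = (Kx + reg * I) @ (Kx + reg * I)
    Ry = (Ky + reg * I) @ (Ky + reg * I)
    B = np.block([[Rx, Z], [Z, Ry]])
    # Reduce A v = rho B v to a standard symmetric problem using B = L L^T.
    L = np.linalg.cholesky(B)
    Linv = np.linalg.inv(L)
    C = Linv @ A @ Linv.T
    C = (C + C.T) / 2            # symmetrize against round-off
    vals, U = np.linalg.eigh(C)  # eigenvalues in ascending order
    vecs = Linv.T @ U            # back-transform: v = L^{-T} u
    order = np.argsort(vals)[::-1][:n_components]
    return vals[order], vecs[:n, order], vecs[n:, order]
```

Training images land in the semantic space as `center_kernel(Kx) @ alpha` (visual view) or `center_kernel(Ky) @ beta` (tag view). Kernel rows for new queries would need to be centered with the training statistics, which this sketch omits.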
Building the kernels for each view
[Figure: semantic space built from visual kernels and word frequency/rank kernels]
Visual features
• Color Histogram: captures the HSV color distribution
• Gist [Torralba et al.]: captures the total scene structure
• Visual Words: captures local appearance (k-means on DoG+SIFT)
Average the component χ2 kernels to build a single visual kernel.
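The slide does not state the exact χ2 kernel form. A common choice is the exponentiated χ2 kernel exp(-γ · χ²(h, h')), with χ²(h, h') = ½ Σ (h_i − h'_i)² / (h_i + h'_i). The sketch below builds one such kernel per feature type and averages them. Both the kernel form and the bandwidth rule γ = 1 / mean χ² distance are assumptions, and the function names are hypothetical.

```python
import numpy as np

def chi2_distances(H1, H2, eps=1e-10):
    """Pairwise chi-squared distances 0.5 * sum((h - h')^2 / (h + h')) between rows."""
    diff = H1[:, None, :] - H2[None, :, :]
    total = H1[:, None, :] + H2[None, :, :]
    return 0.5 * np.sum(diff ** 2 / (total + eps), axis=2)

def chi2_kernel(H1, H2, gamma):
    """Exponentiated chi-squared kernel exp(-gamma * chi2(h, h'))."""
    return np.exp(-gamma * chi2_distances(H1, H2))

def average_chi2_kernel(feature_sets):
    """Average per-feature chi2 kernels over the training images.

    feature_sets: list of (N x D_f) non-negative matrices, one per feature type
    (e.g. color histogram, Gist, visual words).
    Assumption: gamma = 1 / mean chi2 distance, chosen separately per feature type.
    Returns the averaged N x N kernel and the gammas (reuse them for test rows).
    """
    kernels, gammas = [], []
    for H in feature_sets:
        D = chi2_distances(H, H)
        gamma = 1.0 / max(D.mean(), 1e-10)
        kernels.append(np.exp(-gamma * D))
        gammas.append(gamma)
    return sum(kernels) / len(kernels), gammas
```

For a query, `chi2_kernel(H_query, H_train, gamma)` with the stored per-feature gammas gives the kernel rows, which are averaged the same way. The tag kernel can be assembled identically from the word-frequency, absolute-rank, and relative-rank vectors.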
Tag features
Word Frequency: traditional bag-of-(text)words
Tag-list: Cow, Bird, Water, Architecture, Mountain, Sky
tag count: Cow 1, Bird 1, Water 1, Architecture 1, Mountain 1, Sky 1, Car 0, Person 0
Tag features
Absolute Rank: the absolute rank of each tag in this image’s tag-list
Tag-list: Cow, Bird, Water, Architecture, Mountain, Sky
tag value: Cow 1, Bird 0.63, Water 0.50, Architecture 0.43, Mountain 0.39, Sky 0.36, Car 0, Person 0
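The absolute-rank values on this slide are consistent with weighting a tag at 1-based rank r by 1/log2(1 + r): ranks 1 to 6 give 1, 0.63, 0.50, 0.43, 0.39, 0.36. The sketch below reproduces both the word-frequency and the absolute-rank vectors under that assumption. The fixed vocabulary and helper names are illustrative.

```python
import math

VOCAB = ["Cow", "Bird", "Water", "Architecture", "Mountain", "Sky", "Car", "Person"]

def word_frequency(tags, vocab):
    """Bag-of-words feature: how many times each vocabulary word appears in the tag-list."""
    return [tags.count(w) for w in vocab]

def absolute_rank(tags, vocab):
    """Assumed absolute-rank feature: 1 / log2(1 + r) for a tag first mentioned at rank r, else 0."""
    first_rank = {}
    for r, t in enumerate(tags, start=1):
        first_rank.setdefault(t, r)
    return [1.0 / math.log2(1 + first_rank[w]) if w in first_rank else 0.0 for w in vocab]

tags = ["Cow", "Bird", "Water", "Architecture", "Mountain", "Sky"]
print(word_frequency(tags, VOCAB))                         # [1, 1, 1, 1, 1, 1, 0, 0]
print([round(v, 2) for v in absolute_rank(tags, VOCAB)])  # [1.0, 0.63, 0.5, 0.43, 0.39, 0.36, 0.0, 0.0]
```

The log discount keeps early tags clearly ahead while the gap between later ranks shrinks, which matches the intuition that the first few tags carry most of the importance signal.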
Tag features
Relative Rank: the percentile rank obtained from the rank distribution of that word in all tag-lists
Tag-list: Cow, Bird, Water, Architecture, Mountain, Sky
tag value: Cow 0.9, Bird 0.6, Water 0.8, Architecture 0.5, Mountain 0.8, Sky 0.8, Car 0, Person 0
Average the component χ2 kernels to build a single tag kernel.
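The slide does not spell out how the percentile is computed. One hypothetical reading: for each word, collect the ranks at which it appears across all training tag-lists, then score a tag by the share of those occurrences ranked at or after its rank here, so a word mentioned earlier than usual scores high. The definition and names below are assumptions, and the toy corpus is not meant to reproduce the slide's values.

```python
from collections import defaultdict

def rank_distributions(tag_lists):
    """For every word, collect the 1-based ranks at which it appears across all tag-lists."""
    dist = defaultdict(list)
    for tags in tag_lists:
        for r, t in enumerate(tags, start=1):
            dist[t].append(r)
    return dist

def relative_rank(tags, vocab, dist):
    """Hypothetical percentile feature: share of the word's corpus occurrences
    ranked at or after its rank in this tag-list (0 if the word is absent)."""
    feats = []
    for w in vocab:
        ranks = dist.get(w, [])
        if w not in tags or not ranks:
            feats.append(0.0)
            continue
        r = tags.index(w) + 1
        feats.append(sum(1 for x in ranks if x >= r) / len(ranks))
    return feats

corpus = [["Cow", "Grass"], ["Grass", "Cow", "Tree"], ["Tree", "Grass"], ["Cow", "Tree", "Grass"]]
dist = rank_distributions(corpus)
feats = relative_rank(["Cow", "Tree", "Grass"], ["Cow", "Grass", "Tree", "Sky"], dist)
print([round(v, 2) for v in feats])  # [1.0, 0.25, 0.67, 0.0]
```

Here Grass, usually tagged early in this corpus, scores low when it comes last, while Cow in first place scores 1.0.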
Recap: Building the image database
[Figure: the visual feature space and the tag feature space are both projected into the semantic space]
Experiments
We compare the retrieval performance of our method with two baselines:
• Visual-Only Baseline [Figure: query image and 1st retrieved image]
• Words+Visual Baseline [Hardoon et al. 2004, Yakhnenko et al. 2009] [Figure: query image and 1st retrieved image, KCCA semantic space]
Evaluation
We use Normalized Discounted Cumulative Gain at top K (NDCG@K) to evaluate retrieval performance:
$$\mathrm{NDCG@}K = \frac{1}{Z} \sum_{p=1}^{K} \frac{2^{s(p)} - 1}{\log(1 + p)}$$
• s(p): reward term, the score for the pth ranked example.
• log(1 + p): discount, because doing well in the top ranks is more important.
• Z: sum of all the scores for the perfect ranking (normalization).
[Kekalainen & Jarvelin, 2002]
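A minimal sketch of NDCG@K, assuming the common gain 2^{s(p)} − 1 and discount log(1 + p), with the normalizer Z taken as the DCG of the perfect ordering. The log base cancels in the ratio, so log2 is used. Function names and the optional `all_scores` pool are illustrative assumptions.

```python
import math

def dcg_at_k(scores, k):
    """Discounted cumulative gain: sum of (2^s - 1) / log2(1 + p) over ranks p = 1..k."""
    return sum((2 ** s - 1) / math.log2(1 + p) for p, s in enumerate(scores[:k], start=1))

def ndcg_at_k(ranked_scores, k, all_scores=None):
    """NDCG@k for retrieved items listed in ranked order.

    ranked_scores: reward s(p) of the item returned at rank p (p = 1, 2, ...).
    all_scores: rewards of every database item; the normalizer Z is the DCG of
    the perfect ranking of this pool (defaults to ranked_scores itself).
    """
    pool = ranked_scores if all_scores is None else all_scores
    z = dcg_at_k(sorted(pool, reverse=True), k)
    return dcg_at_k(ranked_scores, k) / z if z > 0 else 0.0

print(ndcg_at_k([3, 2, 1], 3))  # 1.0 (already the perfect ranking)
print(ndcg_at_k([1, 2, 3], 3))  # about 0.68: best item buried at rank 3
```

Passing the full set of database rewards as `all_scores` penalizes a method that misses highly relevant images entirely, not just one that orders its own results badly.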
Evaluation
We present the NDCG@K score using two different reward terms:
• Object presence/scale: rewards similarity of the query’s objects/scales and those in the retrieved image(s).
• Ordered tag similarity: rewards similarity of the query’s ground-truth tag ranks and those in the retrieved image(s).
[Figure: example tag-lists (Cow, Tree, Grass and Person, Cow, Tree, Fence, Grass) with reward components: scale, presence, relative rank, absolute rank]
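The exact ordered-tag reward is not given on the slide. As a purely hypothetical illustration of a rank-aware reward, the sketch scores two ordered tag-lists by the cosine similarity of their rank-weighted vectors (weight 1/log2(1 + r) at rank r). The formula and names are assumptions, not the authors' reward.

```python
import math

def rank_weights(tags):
    """Hypothetical rank-weighted tag vector: 1 / log2(1 + r) for the tag at rank r."""
    w = {}
    for r, t in enumerate(tags, start=1):
        w.setdefault(t, 1.0 / math.log2(1 + r))
    return w

def ordered_tag_similarity(query_tags, retrieved_tags):
    """Cosine similarity of rank-weighted tag vectors (illustrative reward only)."""
    q, r = rank_weights(query_tags), rank_weights(retrieved_tags)
    dot = sum(q[t] * r[t] for t in q if t in r)
    nq = math.sqrt(sum(v * v for v in q.values()))
    nr = math.sqrt(sum(v * v for v in r.values()))
    return dot / (nq * nr) if nq and nr else 0.0

sim = ordered_tag_similarity(["Cow", "Tree", "Grass"],
                             ["Person", "Cow", "Tree", "Fence", "Grass"])
```

Unlike a plain set overlap, this gives a lower score when shared tags appear in a different order, which is the property a rank-based reward needs.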
Dataset
LabelMe
• 6352 images (Database: 3799 images; Query: 2553 images)
• Scene-oriented
• Contains the ordered tag-lists via labels added
• 56 unique taggers
• ~23 tags/image
Pascal
• 9963 images (Database: 5011 images; Query: 4952 images)
• Object-central
• Tag-lists obtained on Mechanical Turk
• 758 unique taggers
• ~5.5 tags/image
Image-to-image retrieval
We want to retrieve the images most similar to the given query image in terms of object importance.
[Figure: an untagged query image is matched through the visual kernel space and tag-list kernel space against the image database to return retrieved images]
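A minimal sketch of retrieval in the learned semantic space, assuming dual coefficients (such as KCCA α for the visual view or β for the tag view) are already available and that items are compared by cosine similarity. Both the function names and the similarity choice are assumptions.

```python
import numpy as np

def project(K_rows, coeffs):
    """Map items into the semantic space.

    K_rows: (M x N) kernel values between M items and the N training images.
    coeffs: (N x d) dual coefficients for that view (e.g. KCCA alpha or beta).
    """
    return K_rows @ coeffs

def retrieve(query_vec, db_vecs, top_k=5):
    """Indices of the top_k database items by cosine similarity to the query."""
    q = query_vec / (np.linalg.norm(query_vec) + 1e-12)
    D = db_vecs / (np.linalg.norm(db_vecs, axis=1, keepdims=True) + 1e-12)
    return np.argsort(-(D @ q))[:top_k]
```

For tag-to-image retrieval, the same ranking step would apply with the query given as a tag-kernel row projected by β and the database images projected by α.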
Image-to-image retrieval results
[Figure: for a query image, top retrieved images from our method, Words + Visual, and Visual only]
Image-to-image retrieval results
[Figure: another query image, with top retrieved images from our method, Words + Visual, and Visual only]
Image-to-image retrieval results
Our method better retrieves images that share the query’s important objects, by both measures.
[Figure: retrieval accuracy measured by object+scale similarity and by ordered tag-list similarity]
39% improvement
Tag-to-image retrieval
We want to retrieve the images that are best described by the given tag list.
[Figure: query tags (Cow, Person, Tree, Grass) are matched through the tag-list kernel space and visual kernel space against the image database to return retrieved images]
Tag-to-image retrieval results
Our method better respects the importance cues implied by the user’s keyword query.
31% improvement
Image-to-tag auto annotation
We want to annotate the query image with the ordered tags that best describe the scene.
[Figure: an untagged query image is matched through the visual kernel space and tag-list kernel space against the image database to output tag-lists such as Cow, Tree, Grass; Cow, Grass; Field, Cow, Fence]
Image-to-tag auto annotation results
[Figure: example output tag-lists: Boat, Person, Water, Sky, Rock; Bottle, Knife, Napkin, Light, Fork; Person, Tree, Car, Chair, Window; Tree, Boat, Grass, Water, Person]
Method k=1 k=3 k=5 k=10
Visual-only 0.0826 0.1765 0.2022 0.2095
Word+Visual 0.0818 0.1712 0.1992 0.2097
Ours 0.0901 0.1936 0.2230 0.2335
k = number of nearest neighbors used
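In the table, k is the number of nearest neighbors used. A hypothetical way to merge the ordered tag-lists of the k nearest database images (found in the semantic space) into one ordered annotation is rank-weighted voting, sketched below. The voting rule and names are assumptions, not the authors' procedure.

```python
import math
from collections import defaultdict

def transfer_tags(neighbor_tag_lists, n_tags=3):
    """Hypothetical kNN tag transfer: each neighbor votes for its tags with
    weight 1 / log2(1 + r) at rank r; tags are returned by total vote (ties by name)."""
    votes = defaultdict(float)
    for tags in neighbor_tag_lists:
        for r, t in enumerate(tags, start=1):
            votes[t] += 1.0 / math.log2(1 + r)
    ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, _ in ranked[:n_tags]]

neighbors = [["Cow", "Tree", "Grass"], ["Cow", "Grass"], ["Field", "Cow", "Fence"]]
print(transfer_tags(neighbors))  # ['Cow', 'Grass', 'Field']
```

Rank-weighting lets a tag that several neighbors mention first (Cow) dominate one that appears often but late, so the output order reflects importance rather than frequency alone.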
Implicit tag cues as localization prior
[Figure: example tag-lists: Woman, Table, Mug, Ladder; Mug, Key, Keyboard, Toothbrush, Pen, Photo, Post-it; Computer, Poster, Desk, Screen, Mug, Poster; Mug, Eiffel; Desk, Mug, Office; Mug, Coffee]
Training: Learn an object-specific connection between localization parameters and implicit tag features.
Testing: Given a novel image, localize objects based on both tags and appearance.
[Figure: object detector + implicit tag features → P(location, scale | tags)]
[Hwang & Grauman, CVPR 2010]
Conclusion
• We want to learn what is implied (beyond the objects present) by how a human provides tags for an image.
• The approach requires minimal supervision to learn the connection between the importance conveyed by tags and visual features.
• Consistent gains over:
  • content-based visual search
  • a tag+visual approach that disregards importance
THANK YOU