Automatic Image Annotation and Retrieval using Cross-Media Relevance Models
Automatic Image Annotation and Retrieval using Cross-Media Relevance Models
J. Jeon, V. Lavrenko and R. Manmatha
Center for Intelligent Information Retrieval, Computer Science Department, University of Massachusetts Amherst
1096304144 鄭志毅
Introduction(1)
What is Image Retrieval? Given a database of images and a query string (e.g. words), what are the images that are described by the words?
Query String: "jet"
Introduction(2)
Query by example: QBIC (IBM), PhotoBook (MIT), VisualSEEK (Columbia)
Introduction(3)
What is Image Annotation (1)? (Object recognition): each region has a word describing it.
Introduction(4)
Given an image, what are the words that describe the image (2)? (Use a set of words to annotate the whole image.)
Outline
Preprocessing
Cross-Media Relevance Model
Experiment
Conclusions
Preprocessing(1)_segment
Notation: x_i = vector of image features for one region; x = {x_1, x_2, ...} = vector of feature vectors; w_i = one word; w = {w_1, w_2, ...} = vector of words.
Two segmentation styles: local / region-based (normalized-cuts segmentation) and global / grid-based (grid segmentation). Either way, each segment b_i is described by a vector of image features.
[Figure: example image with regions labeled tiger, grass, green; b_1 and b_2 are each a vector of image features]
Preprocessing(2)_feature extraction
Extract all 30 features for each region [22]:
shape (6): area, x, y, boundary_len^2/area, convexity, moment-of-inertia
color moments (12):
ave RGB (3) (mean)
RGB stdev (3) (standard deviation)
ave L*a*b (3) (mean)
L*a*b stdev (3) (standard deviation)
texture (12): oriented energy, 30-degree increments
[Figure: table of blobs b_1, b_2, b_3, ..., b_i against the 30 features f_1 ... f_30, one 30-feature vector per blob]
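As a small illustration of the color-moment part of this feature set, the sketch below computes the mean and standard deviation of each RGB channel over one region. The function name `color_moments` and the toy 2x2 image are illustrative, not from the paper.

```python
import numpy as np

def color_moments(image, mask):
    """Mean and standard deviation of each RGB channel over one region.

    image: (H, W, 3) float array; mask: (H, W) boolean region mask.
    Returns 6 of the 30 per-region features (ave RGB + RGB stdev).
    """
    pixels = image[mask]              # (N, 3) pixels inside the region
    return np.concatenate([pixels.mean(axis=0), pixels.std(axis=0)])

# toy example: a 2x2 image whose left column is one region
img = np.array([[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
                [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]])
region = np.array([[True, False], [True, False]])
feats = color_moments(img, region)    # -> [1. 0. 0. 0. 0. 0.]
```

The L*a*b moments would be computed the same way after a color-space conversion.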
Preprocessing(3)_Clustering to blobs
Use k-means to cluster the region feature vectors (k = 500). This yields a cluster map; each cluster in the map is called a "blob".
[Figure: segments quantized into blob_1, blob_2, blob_3, ... with k = 500]
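The quantization step above can be sketched with a plain k-means loop. The paper uses k = 500 over all training-region feature vectors; here `kmeans_blobs` and the tiny toy data are illustrative.

```python
import numpy as np

def kmeans_blobs(features, k, iters=20, seed=0):
    """Quantize region feature vectors into k 'blob' clusters (plain k-means).

    features: (n, d) array of per-region feature vectors.
    Returns (centroids, labels); each label is a region's blob id.
    """
    rng = np.random.default_rng(seed)
    centroids = features[rng.choice(len(features), k, replace=False)]
    for _ in range(iters):
        # assign each region to its nearest centroid
        dists = np.linalg.norm(features[:, None] - centroids[None], axis=2)
        labels = dists.argmin(axis=1)
        # move each centroid to the mean of its assigned regions
        for j in range(k):
            if (labels == j).any():
                centroids[j] = features[labels == j].mean(axis=0)
    return centroids, labels

# toy run: two well-separated groups of "regions", k=2
feats = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
_, blob_ids = kmeans_blobs(feats, k=2)
# the two left regions share one blob id, the two right regions another
```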
Preprocessing(4)_final
Each test image is a set of blobs: I = {b_1, b_2, b_3, ...} (no annotation).
Each image in the training set has one to five keywords: J = {b_1, b_2, ..., b_m; w_1, w_2, ..., w_n}, where each word's count is its term frequency (tf).
Cross-Media Relevance Models
Estimating the relevance model: the joint distribution of words and blobs.
Find the probability of observing a word w and the image regions b_1, ..., b_m together, P(w, b_1, ..., b_m) (a technique from information retrieval, where language models of this kind are called relevance models).
To annotate an image with blobs (e.g. regions showing grass, tiger, water, road), compute P(w | b_1, b_2, b_3, b_4) for every word w. If the top three probabilities are for the words grass, water, and tiger, then annotate the image with grass, water, tiger.
[Figure: relevance model R generating the words Tiger, Water, Grass]
Relevance Models: Annotation
The joint distribution is computed as an expectation over the training set T; given a training image J, the word and blob events are independent:

P(w \mid I) \approx P(w \mid b_1, \ldots, b_m)

P(w, b_1, \ldots, b_m) = \sum_{J \in T} P(J)\, P(w, b_1, \ldots, b_m \mid J)

P(w, b_1, \ldots, b_m) = \sum_{J \in T} P(J)\, P(w \mid J) \prod_{i=1}^{m} P(b_i \mid J)

where T is the set of all training images, I is an image in the test set, and b_1, ..., b_m are the blobs of the (unannotated) image I.
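A minimal sketch of this expectation, assuming a uniform prior P(J) = 1/|T| and a simple linear smoothing of the per-image word/blob frequencies toward collection frequencies. The name `cmrm_scores`, the toy training set, and the constants `alpha`/`beta` are illustrative, not the paper's tuned values.

```python
from collections import Counter

def cmrm_scores(test_blobs, training, alpha=0.1, beta=0.1):
    """P(w, b_1..b_m) = sum_J P(J) * P(w|J) * prod_i P(b_i|J).

    training: list of (words, blobs) pairs; uniform prior P(J) = 1/|T|.
    Per-image estimates are smoothed toward collection frequencies.
    """
    # collection-wide counts used for smoothing
    all_w = Counter(w for ws, _ in training for w in ws)
    all_b = Counter(b for _, bs in training for b in bs)
    n_w, n_b = sum(all_w.values()), sum(all_b.values())

    scores = Counter()
    for words, blobs in training:
        cw, cb = Counter(words), Counter(blobs)
        size = len(words) + len(blobs)        # stand-in for |J|

        def p_w(w):  # smoothed P(w|J)
            return (1 - alpha) * cw[w] / size + alpha * all_w[w] / n_w

        def p_b(b):  # smoothed P(b|J)
            return (1 - beta) * cb[b] / size + beta * all_b[b] / n_b

        joint = 1.0
        for b in test_blobs:                  # prod_i P(b_i|J)
            joint *= p_b(b)
        for w in all_w:                       # accumulate over vocabulary
            scores[w] += joint * p_w(w) / len(training)
    return scores  # unnormalized P(w, b_1..b_m); rank words by it

# toy training set: blob 1 co-occurs with "tiger", blob 2 with "water"
T = [(["tiger", "grass"], [1, 1, 3]),
     (["water", "sky"],   [2, 2, 3])]
print(cmrm_scores([1, 3], T).most_common(2))  # "tiger" ranks highest
```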
Image Annotation
Compute P(w|I) for different words w.
Probabilistic annotation: annotate the image with every possible w in the vocabulary, with the associated probabilities.
Fixed annotation: take the top 3 or 4 words for every image and annotate the image with them.
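The fixed-length variant amounts to a top-n selection over the word scores. The `annotate` helper and the toy score table below are illustrative:

```python
def annotate(word_scores, n=4):
    """Fixed-length annotation: keep the n highest-probability words."""
    ranked = sorted(word_scores, key=word_scores.get, reverse=True)
    return ranked[:n]

# toy P(w|I) values for one image
scores = {"sky": 0.30, "plane": 0.25, "jet": 0.20, "clouds": 0.15, "car": 0.10}
print(annotate(scores))  # ['sky', 'plane', 'jet', 'clouds']
```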
Image Retrieval
• Language modeling approach: given a query Q = w_1, ..., w_k, the probability of drawing Q from image I is

P(Q \mid I) = \prod_{j=1}^{k} P(w_j \mid I)

• Expanding P(w_j|I) with the relevance model (or using the probabilistic annotation):

P(Q \mid I) = \prod_{j=1}^{k} \sum_{J \in T} P(J)\, P(w_j \mid J) \prod_{i=1}^{m} P(b_i \mid J)

• Rank images according to this probability.
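A sketch of the ranking step, assuming per-image word probabilities P(w|I) have already been computed in the annotation step. The log-space product and the tiny probability floor for unseen words are implementation conveniences, not from the paper.

```python
import math

def rank_images(query_words, annotations):
    """Rank images by P(Q|I) = prod_j P(w_j|I), computed in log space.

    annotations: {image_id: {word: P(w|I)}} from the annotation step;
    a small floor (1e-9) stands in for smoothing of unseen words.
    """
    def log_score(probs):
        return sum(math.log(probs.get(w, 1e-9)) for w in query_words)
    return sorted(annotations, key=lambda i: log_score(annotations[i]),
                  reverse=True)

# toy P(w|I) tables for three images
ann = {"img1": {"tiger": 0.4, "grass": 0.3},
       "img2": {"water": 0.5, "sky": 0.2},
       "img3": {"tiger": 0.1, "water": 0.2}}
print(rank_images(["tiger"], ann))  # img1 ranks first
```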
Experiment
Dataset [22]: 5,000 images from 50 Corel Stock Photo CDs (4,500 training set, 500 test set).
Segmentation using normalized cuts followed by quantization ensures that there are 1-10 blobs for each image.
Each image was also assigned 1-5 keywords.
371 words and 500 blobs
Evaluation for a single word (comparing CMRM with two other image-annotation models):
Nc = the number of correctly predicted test images
N = the number of test images predicted by the word
Nr = the number of test images actually annotated with the word
precision = Nc / N
recall = Nc / Nr
Comparison of 3 models: the graph shows mean precision and recall of the 3 models for 70 one-word queries.
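These two metrics follow directly from the three counts; the set-valued arguments below are an illustrative encoding.

```python
def precision_recall(predicted, relevant):
    """precision = Nc/N, recall = Nc/Nr for one single-word query.

    predicted: test images the system annotated with the word (N of them);
    relevant: test images actually annotated with the word (Nr of them).
    """
    nc = len(predicted & relevant)          # correctly predicted images
    return nc / len(predicted), nc / len(relevant)

p, r = precision_recall({"a", "b", "c"}, {"b", "c", "d", "e"})
# p = 2/3, r = 2/4
```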
• Annotation examples - CMRM
Original annotation → Automatic annotation:
PEOPLE, POOL, SWIMMERS, WATER → WATER, PEOPLE, SWIMMERS, POOL
CARS, FORMULA, TRACKS, WALL → CARS, TRACKS, WALL, FORMULA
CLOUDS, MOUNTAIN, SKY, WATER → SKY, MOUNTAIN, CLOUDS, PARK
FIELD, FOALS, HORSES, MARE → FIELD, HORSES, FOALS, MARE
JET, PLANE, SKY → SKY, PLANE, JET, CLOUDS
BIRDS, NEST, TREE → BIRDS, NEST, GRASS, TREE
• Retrieval examples - Top 4 images, CMRM
Query: Tiger
Query: Pillar
Conclusions
The approach requires large amounts of labeled training and test data.
Better feature extraction or the use of continuous features will probably improve the results.