Automatic Image Annotation and Retrieval using Cross-Media Relevance Models
Automatic Image Annotation and Retrieval using Cross-Media Relevance Models
J. Jeon, V. Lavrenko and R. Manmatha
Center for Intelligent Information Retrieval, Computer Science Department, University of Massachusetts Amherst
1096304144 鄭志毅
Introduction(1)
What is Image Retrieval? Given a database of images and a query string (e.g. words), what are the images that are described by the words?
Query String: "jet"
Introduction(2)
Query by example: QBIC (IBM), PhotoBook (MIT), VisualSEEK (Columbia)
Introduction(3)
What is Image Annotation (1)? (Object recognition): each region has a word describing it.
Introduction(4)
Given an image, what are the words that describe the image (2)? (Use a set of words to annotate the whole image.)
Outline
Preprocessing
Cross-Media Relevance Model
Experiment
Conclusions
Preprocessing(1)_segment
Notation: x_i = vector of image features for one region; x = {x_1, x_2, ...} = vector of feature vectors; w_i = one word; w = {w_1, w_2, ...} = vector of words.
Two segmentation styles: local / region-based (normalized-cuts segmentation) and global / grid-based (grid segmentation). Either way, each segment b_i is described by a vector of image features.
[Figure: example image with regions labeled tiger, grass, green; b_1 and b_2 are each a vector of image features]
Preprocessing(2)_feature extraction
Extract all 30 features for each region [22]:
shape (6): area, x, y, boundary_len^2/area, convexity, moment-of-inertia
color moments (12):
ave RGB (3) (mean)
RGB stdev (3) (standard deviation)
ave L*a*b (3) (mean)
L*a*b stdev (3) (standard deviation)
texture (12): oriented energy, 30-degree increments
[Figure: table of blobs b_1, b_2, b_3, ..., b_i against the 30 features f_1 ... f_30, one 30-feature vector per blob]
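As a small illustration of the color-moment part of this feature set, the sketch below computes the mean and standard deviation of each RGB channel over one region. The function name `color_moments` and the toy 2x2 image are illustrative, not from the paper.

```python
import numpy as np

def color_moments(image, mask):
    """Mean and standard deviation of each RGB channel over one region.

    image: (H, W, 3) float array; mask: (H, W) boolean region mask.
    Returns 6 of the 30 per-region features (ave RGB + RGB stdev).
    """
    pixels = image[mask]              # (N, 3) pixels inside the region
    return np.concatenate([pixels.mean(axis=0), pixels.std(axis=0)])

# toy example: a 2x2 image whose left column is one region
img = np.array([[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
                [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]])
region = np.array([[True, False], [True, False]])
feats = color_moments(img, region)    # -> [1. 0. 0. 0. 0. 0.]
```

The L*a*b moments would be computed the same way after a color-space conversion.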
Preprocessing(3)_Clustering to blobs
Use k-means to cluster the region feature vectors (k = 500). This yields a cluster map; each cluster in the map is called a "blob".
[Figure: segments quantized into blob_1, blob_2, blob_3, ... with k = 500]
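The quantization step above can be sketched with a plain k-means loop. The paper uses k = 500 over all training-region feature vectors; here `kmeans_blobs` and the tiny toy data are illustrative.

```python
import numpy as np

def kmeans_blobs(features, k, iters=20, seed=0):
    """Quantize region feature vectors into k 'blob' clusters (plain k-means).

    features: (n, d) array of per-region feature vectors.
    Returns (centroids, labels); each label is a region's blob id.
    """
    rng = np.random.default_rng(seed)
    centroids = features[rng.choice(len(features), k, replace=False)]
    for _ in range(iters):
        # assign each region to its nearest centroid
        dists = np.linalg.norm(features[:, None] - centroids[None], axis=2)
        labels = dists.argmin(axis=1)
        # move each centroid to the mean of its assigned regions
        for j in range(k):
            if (labels == j).any():
                centroids[j] = features[labels == j].mean(axis=0)
    return centroids, labels

# toy run: two well-separated groups of "regions", k=2
feats = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
_, blob_ids = kmeans_blobs(feats, k=2)
# the two left regions share one blob id, the two right regions another
```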
Preprocessing(4)_final
Each test image is a set of blobs: I = {b_1, b_2, b_3, ...} (no annotation).
Each image in the training set has one to five keywords: J = {b_1, b_2, ..., b_m; w_1, w_2, ..., w_n}, where each word's count is its term frequency (tf).
Cross-Media Relevance Models
Estimating the relevance model: the joint distribution of words and blobs.
Find the probability of observing a word w and the image regions b_1, ..., b_m together, P(w, b_1, ..., b_m) (a technique from information retrieval, where language models of this kind are called relevance models).
To annotate an image with blobs (e.g. regions showing grass, tiger, water, road), compute P(w | b_1, b_2, b_3, b_4) for every word w. If the top three probabilities are for the words grass, water, and tiger, then annotate the image with grass, water, tiger.
[Figure: relevance model R generating the words Tiger, Water, Grass]
Relevance Models: Annotation
The joint distribution is computed as an expectation over the training set T; given a training image J, the word and blob events are independent:

P(w \mid I) \approx P(w \mid b_1, \ldots, b_m)

P(w, b_1, \ldots, b_m) = \sum_{J \in T} P(J)\, P(w, b_1, \ldots, b_m \mid J)

P(w, b_1, \ldots, b_m) = \sum_{J \in T} P(J)\, P(w \mid J) \prod_{i=1}^{m} P(b_i \mid J)

where T is the set of all training images, I is an image in the test set, and b_1, ..., b_m are the blobs of the (unannotated) image I.
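A minimal sketch of this expectation, assuming a uniform prior P(J) = 1/|T| and a simple linear smoothing of the per-image word/blob frequencies toward collection frequencies. The name `cmrm_scores`, the toy training set, and the constants `alpha`/`beta` are illustrative, not the paper's tuned values.

```python
from collections import Counter

def cmrm_scores(test_blobs, training, alpha=0.1, beta=0.1):
    """P(w, b_1..b_m) = sum_J P(J) * P(w|J) * prod_i P(b_i|J).

    training: list of (words, blobs) pairs; uniform prior P(J) = 1/|T|.
    Per-image estimates are smoothed toward collection frequencies.
    """
    # collection-wide counts used for smoothing
    all_w = Counter(w for ws, _ in training for w in ws)
    all_b = Counter(b for _, bs in training for b in bs)
    n_w, n_b = sum(all_w.values()), sum(all_b.values())

    scores = Counter()
    for words, blobs in training:
        cw, cb = Counter(words), Counter(blobs)
        size = len(words) + len(blobs)        # stand-in for |J|

        def p_w(w):  # smoothed P(w|J)
            return (1 - alpha) * cw[w] / size + alpha * all_w[w] / n_w

        def p_b(b):  # smoothed P(b|J)
            return (1 - beta) * cb[b] / size + beta * all_b[b] / n_b

        joint = 1.0
        for b in test_blobs:                  # prod_i P(b_i|J)
            joint *= p_b(b)
        for w in all_w:                       # accumulate over vocabulary
            scores[w] += joint * p_w(w) / len(training)
    return scores  # unnormalized P(w, b_1..b_m); rank words by it

# toy training set: blob 1 co-occurs with "tiger", blob 2 with "water"
T = [(["tiger", "grass"], [1, 1, 3]),
     (["water", "sky"],   [2, 2, 3])]
print(cmrm_scores([1, 3], T).most_common(2))  # "tiger" ranks highest
```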
Image Annotation
Compute P(w|I) for different words w.
Probabilistic annotation: annotate the image with every possible w in the vocabulary, with the associated probabilities.
Fixed annotation: take the top 3 or 4 words for every image and annotate the image with them.
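The fixed-length variant amounts to a top-n selection over the word scores. The `annotate` helper and the toy score table below are illustrative:

```python
def annotate(word_scores, n=4):
    """Fixed-length annotation: keep the n highest-probability words."""
    ranked = sorted(word_scores, key=word_scores.get, reverse=True)
    return ranked[:n]

# toy P(w|I) values for one image
scores = {"sky": 0.30, "plane": 0.25, "jet": 0.20, "clouds": 0.15, "car": 0.10}
print(annotate(scores))  # ['sky', 'plane', 'jet', 'clouds']
```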
Image Retrieval
• Language modeling approach: given a query Q = w_1, ..., w_k, the probability of drawing Q from image I is

P(Q \mid I) = \prod_{j=1}^{k} P(w_j \mid I)

• Expanding P(w_j|I) with the relevance model (or using the probabilistic annotation):

P(Q \mid I) = \prod_{j=1}^{k} \sum_{J \in T} P(J)\, P(w_j \mid J) \prod_{i=1}^{m} P(b_i \mid J)

• Rank images according to this probability.
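A sketch of the ranking step, assuming per-image word probabilities P(w|I) have already been computed in the annotation step. The log-space product and the tiny probability floor for unseen words are implementation conveniences, not from the paper.

```python
import math

def rank_images(query_words, annotations):
    """Rank images by P(Q|I) = prod_j P(w_j|I), computed in log space.

    annotations: {image_id: {word: P(w|I)}} from the annotation step;
    a small floor (1e-9) stands in for smoothing of unseen words.
    """
    def log_score(probs):
        return sum(math.log(probs.get(w, 1e-9)) for w in query_words)
    return sorted(annotations, key=lambda i: log_score(annotations[i]),
                  reverse=True)

# toy P(w|I) tables for three images
ann = {"img1": {"tiger": 0.4, "grass": 0.3},
       "img2": {"water": 0.5, "sky": 0.2},
       "img3": {"tiger": 0.1, "water": 0.2}}
print(rank_images(["tiger"], ann))  # img1 ranks first
```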
Experiment
Dataset [22]: 5,000 images from 50 Corel Stock Photo CDs (4,500 training set, 500 test set).
Segmentation using normalized cuts followed by quantization ensures that there are 1-10 blobs for each image.
Each image was also assigned 1-5 keywords.
371 words and 500 blobs
Evaluation for a single word (comparing CMRM with two other image-annotation models):
Nc = the number of correctly predicted test images
N = the number of test images predicted by the word
Nr = the number of test images actually annotated with the word
precision = Nc / N
recall = Nc / Nr
Comparison of 3 models: the graph shows mean precision and recall of the 3 models for 70 one-word queries.
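These two metrics follow directly from the three counts; the set-valued arguments below are an illustrative encoding.

```python
def precision_recall(predicted, relevant):
    """precision = Nc/N, recall = Nc/Nr for one single-word query.

    predicted: test images the system annotated with the word (N of them);
    relevant: test images actually annotated with the word (Nr of them).
    """
    nc = len(predicted & relevant)          # correctly predicted images
    return nc / len(predicted), nc / len(relevant)

p, r = precision_recall({"a", "b", "c"}, {"b", "c", "d", "e"})
# p = 2/3, r = 2/4
```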
• Annotation examples - CMRM
Original annotation → Automatic annotation:
PEOPLE, POOL, SWIMMERS, WATER → WATER, PEOPLE, SWIMMERS, POOL
CARS, FORMULA, TRACKS, WALL → CARS, TRACKS, WALL, FORMULA
CLOUDS, MOUNTAIN, SKY, WATER → SKY, MOUNTAIN, CLOUDS, PARK
FIELD, FOALS, HORSES, MARE → FIELD, HORSES, FOALS, MARE
JET, PLANE, SKY → SKY, PLANE, JET, CLOUDS
BIRDS, NEST, TREE → BIRDS, NEST, GRASS, TREE
• Retrieval examples - Top 4 images, CMRM
Query: Tiger
Query: Pillar
Conclusions
The approach requires large amounts of labeled training and test data.
Better feature extraction or the use of continuous features will probably improve the results.