
[IEEE 2008 IEEE International Conference on Research, Innovation and Vision for the Future in Computing and Communication Technologies - Ho Chi Minh City, Vietnam (2008.07.13-2008.07.17)]

Factorial Correspondence Analysis for Image Retrieval

Nguyen-Khang Pham∗†, Annie Morin†, Patrick Gros† and Quyet-Thang Le∗

∗College of Information Technology, Cantho University, Cantho, Vietnam
{pnkhang,lqthang}@cit.ctu.edu.vn

†TEXMEX Group, IRISA, Rennes, France
{pnguyenk,amorin,pgros}@irisa.fr

Abstract—We are concerned with the use of Factorial Correspondence Analysis (FCA) for image retrieval. FCA is designed for analyzing contingency tables. In Textual Data Analysis (TDA), FCA analyzes a contingency table crossing terms/words and documents. To adapt FCA to images, we first define “visual words” computed from Scale-Invariant Feature Transform (SIFT) descriptors and use them for image quantization. At this step, we can build a contingency table crossing “visual words” as terms/words and images as documents. The method was tested on the Caltech4 and Stewénius and Nistér datasets, on which it provides better results (quality of results and execution time) than classical methods such as tf∗idf and Probabilistic Latent Semantic Analysis (PLSA). To scale up and improve retrieval quality, we propose a new retrieval scheme using inverted files based on the relevant indicators of Correspondence Analysis (representation quality of images on axes and contribution of images to the inertia of the axes). Numerical experiments show that our algorithm performs faster than the exhaustive method without losing precision.

Index Terms—Bag of words, Content-based Image Retrieval, Factorial Correspondence Analysis, Inverted file, SIFT

I. INTRODUCTION

The use of local descriptors in images has proven to be a good choice for image analysis. There are many successful applications using local descriptors, such as image recognition, image classification and image retrieval. Recently, methods developed initially for Textual Data Analysis such as LSA (Latent Semantic Analysis) [1], PLSA (Probabilistic Latent Semantic Analysis) [2] and LDA (Latent Dirichlet Allocation) [3] have been used for image analysis, e.g. image classification [4], image topic discovery [5], scene classification [6] and image retrieval [7]. These methods try to model the corpora and to reduce dimensions. Among the disadvantages of these methods, we find the use of an ad hoc model, an EM algorithm that finds only a local optimum, and the difficulty of interpreting the results. Most works use such methods as black boxes.

Here, we focus on the use of Factorial Correspondence Analysis (FCA) [8] for the retrieval of images. Given an image query, the system must return the images in the collection that are most similar to the query. FCA reduces the space representing images and defines the similarity among images in a smaller space.

To deal with large databases, many techniques have been developed. Most of them suffer from the high-dimension problem due to the curse of dimensionality [9]. To overcome this problem, we propose a new retrieval scheme using inverted files constructed from relevant indicators of Correspondence Analysis.

The article is organized as follows: we briefly describe the PLSA and FCA methods in Section 2. Section 3 presents the construction of “visual words” and image representation. Image retrieval by FCA is presented in Section 4. Section 5 shows some numerical results. In the last section, we present some perspectives for this work.

II. METHODS

A. Probabilistic Latent Semantic Analysis

Proposed by Thomas Hofmann, PLSA is a statistical technique for the analysis of contingency tables. PLSA evolved from Latent Semantic Analysis (LSA), a purely geometric method mapping documents to a reduced vector space, the so-called latent semantic space. The mapping is restricted to be linear and is based on a Singular Value Decomposition of the co-occurrence table. In contrast to LSA, PLSA is based on a mixture decomposition derived from a latent variable model for co-occurrence data, which associates an unobservable variable z ∈ Z = {z1, z2, . . . , zK} with each observation. The joint probability P(d, w) over documents and words is defined by the mixture

P(d, w) = P(d) P(w|d),  where  (1)

P(w|d) = ∑_{z∈Z} P(w|z) P(z|d).  (2)

The log-likelihood of the corpus is computed by the formula:

L = ∑_{d∈D} ∑_{w∈W} F(d, w) log P(d, w)  (3)

where

978-1-4244-2379-8/08/$25.00 ©2008 IEEE


• D = {d1, d2, . . . , dM}: set of documents;
• W = {w1, w2, . . . , wN}: set of words; and
• F: contingency table.

The model is fitted by an Expectation Maximization (EM) algorithm. EM alternates two steps:

(i) E-step:

P(z|d, w) = P(z) P(d|z) P(w|z) / ∑_{z′} P(z′) P(d|z′) P(w|z′)  (4)

(ii) M-step:

P(d|z) = ∑_w F(d, w) P(z|d, w) / ∑_{d′,w} F(d′, w) P(z|d′, w)  (5)

P(w|z) = ∑_d F(d, w) P(z|d, w) / ∑_{d,w′} F(d, w′) P(z|d, w′)  (6)

P(z) = (1/R) ∑_{d,w} F(d, w) P(z|d, w),  where  (7)

R ≡ ∑_{d,w} F(d, w).  (8)
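The EM recursion (4)-(8) can be sketched in a few lines of NumPy on a toy contingency table. This is only an illustration under stated assumptions (random data, names F, K, and the iteration count are ours), not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
F = rng.integers(1, 6, size=(8, 12)).astype(float)  # toy M x N counts F(d, w)
M, N = F.shape
K = 3                                               # number of latent topics z

# random initialization of the model parameters
Pz = np.full(K, 1.0 / K)
Pd_z = rng.random((M, K)); Pd_z /= Pd_z.sum(axis=0)
Pw_z = rng.random((N, K)); Pw_z /= Pw_z.sum(axis=0)
R = F.sum()                                         # eq. (8)

def loglik(Pz, Pd_z, Pw_z):
    # eq. (3), with the joint P(d, w) = sum_z P(z) P(d|z) P(w|z)
    Pdw = np.einsum('k,dk,wk->dw', Pz, Pd_z, Pw_z)
    return (F * np.log(Pdw)).sum()

ll0 = loglik(Pz, Pd_z, Pw_z)
for _ in range(50):
    # E-step, eq. (4): posterior P(z|d, w), normalized over z
    post = Pz[None, None, :] * Pd_z[:, None, :] * Pw_z[None, :, :]
    post /= post.sum(axis=2, keepdims=True)          # shape (M, N, K)
    # M-step, eqs. (5)-(7): reweight by the counts F(d, w)
    weighted = F[:, :, None] * post
    Pd_z = weighted.sum(axis=1); Pd_z /= Pd_z.sum(axis=0)
    Pw_z = weighted.sum(axis=0); Pw_z /= Pw_z.sum(axis=0)
    Pz = weighted.sum(axis=(0, 1)) / R
ll1 = loglik(Pz, Pd_z, Pw_z)
```

As the text notes, EM only guarantees a monotone increase of the likelihood toward a local optimum; the final model depends on the random initialization.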

B. Factorial Correspondence Analysis

FCA is a classical exploratory method for the analysis of contingency tables. It was proposed by J. P. Benzécri [8] in the linguistic context, i.e. textual data analysis. The first study was performed on the tragedies of Racine. FCA on a table crossing words and documents allows answering the following questions: Is there any proximity among certain words? Is there any proximity among certain documents? Is there any link between certain words and certain documents? FCA, like most factorial methods, uses a singular value decomposition of a particular matrix and allows viewing words and documents in a reduced space. This reduced space has a particular property: the points (words and/or documents) are projected onto it with maximum inertia. In addition, FCA provides relevant indicators for the interpretation of the axes, such as the contribution of a word or a document to the inertia of an axis, or the representation quality of a word and/or document on an axis [10], [11]. We now briefly describe the method:

Given a contingency table F = {fij}M×N (N < M), we normalize F to X by:

s = ∑_{i=1}^{M} ∑_{j=1}^{N} fij,  xij = fij / s,  ∀i = 1..M, j = 1..N

and note:

pi = ∑_{j=1}^{N} xij, ∀i = 1..M;  qj = ∑_{i=1}^{M} xij, ∀j = 1..N

P = diag(p1, . . . , pM),  Q = diag(q1, . . . , qN)

To determine the best sub-space for data projection, we calculate the eigenvalues and eigenvectors of the matrix V = Xᵀ P⁻¹ X Q⁻¹ of size N × N, where Xᵀ is the transpose of X.

We then obtain the eigenvalues λ = (λ1, λ2, . . . , λN) and the eigenvector matrix μ = (μjk), j, k = 1..N, of the matrix V.

We keep only the first K (K < N) eigenvalues and their corresponding eigenvectors. These K eigenvectors constitute an orthonormal basis of the reduced space (also called the factor space). The number of dimensions passes from N to K. The documents (images) are projected into the reduced space by the following formula:

Z = P⁻¹ X A,  where A = Q⁻¹ μ  (9)

In this formula, P⁻¹X represents the row profiles and A is the transition matrix associated with the FCA. The new coordinates of the terms/words are computed by:

W = Q⁻¹ Xᵀ Z λ⁻¹ᐟ²  (10)

An unseen document (i.e. a query) r = [r1 r2 . . . rN] is projected into the reduced space by the transition formula (9):

Z_r = r̄ A,  where r̄i = ri / ∑_{j=1}^{N} rj  (11)
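The computation above can be sketched with NumPy on a small random contingency table. The names (F, K) and the table itself are illustrative; one standard detail the derivation leaves implicit is the trivial eigenvalue 1 of V (with eigenvector q), which is dropped before keeping the K leading axes:

```python
import numpy as np

rng = np.random.default_rng(1)
F = rng.integers(1, 10, size=(20, 6)).astype(float)  # M x N table, N < M
X = F / F.sum()                                      # normalized table x_ij
p = X.sum(axis=1)                                    # row masses p_i
q = X.sum(axis=0)                                    # column masses q_j

# V = X^T P^{-1} X Q^{-1}, an N x N matrix
V = X.T @ (X / p[:, None]) / q[None, :]
lam, mu = np.linalg.eig(V)
order = np.argsort(-lam.real)
lam, mu = lam.real[order], mu.real[:, order]

# lam[0] is the trivial eigenvalue 1 of correspondence analysis;
# keep the K leading non-trivial axes.
K = 3
lamK, muK = lam[1:1 + K], mu[:, 1:1 + K]

# row (image) coordinates, eq. (9): Z = P^{-1} X A with A = Q^{-1} mu
A = muK / q[:, None]
Z = (X / p[:, None]) @ A

# column (word) coordinates, eq. (10): W = Q^{-1} X^T Z lam^{-1/2}
Wc = (X.T / q[:, None]) @ Z / np.sqrt(lamK)
```

A query histogram r would then be normalized to r̄ and mapped by `r̄ @ A`, exactly as in the fold-in formula (11).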

III. IMAGE REPRESENTATION

In order to adapt textual methods (e.g. tf∗idf, PLSA, FCA) to images, we must first represent the image corpora in the form of a contingency table. Here images are treated as documents and the “visual words” (to be defined) as terms/words.

Words in the images, called “visual words”, must be calculated to form a vocabulary of N words. Each image will then be represented by a word histogram. The construction of visual words proceeds in two steps: (i) computation of local descriptors for an image set; (ii) classification (clustering) of the obtained descriptors. Each cluster corresponds to a visual word. Local descriptors in an image are also computed in two stages: we first detect the interest points in the image. These points are either maxima of the Laplacian of Gaussian [12], 3D local extrema of the Difference of Gaussians [13], or points detected by a Hessian-Affine detector [14]. Fig. 1 shows some interest points detected by a Hessian-Affine detector. A descriptor is then computed on the gray-level gradient of the region around each interest point. The Scale-Invariant Feature Transform descriptor, SIFT [15], is often preferred. Each SIFT descriptor is a 128-dimensional vector. An example of SIFT is shown in Fig. 2. The second step is to form visual words from the local descriptors computed in the previous step. Most works perform a k-means on the descriptors and take the average of each cluster as a visual word [6], [7], [5], [4], [16]. After building the visual vocabulary, each descriptor



Fig. 1. Interest points detected by Hessian-Affine detector

Fig. 2. A SIFT descriptor computed from the region around the interest point (the circle): gradient of the image (left), descriptor of the interest point (right)

is assigned to its nearest cluster. For this, we compute, in ℝ¹²⁸, the distances from each descriptor to the representatives of the previously defined clusters. Thus an image is characterized by the frequencies of its descriptors, and the image corpora are represented in the form of a contingency table crossing images and clusters (visual words).
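The two-step quantization described above can be sketched as follows. Synthetic 128-D vectors stand in for real SIFT descriptors, and the plain Lloyd's k-means below is only illustrative (the vocabulary size k and all names are ours):

```python
import numpy as np

rng = np.random.default_rng(2)
descriptors = rng.random((500, 128))     # pooled training descriptors (fake SIFT)
k = 20                                   # toy vocabulary size

# step (ii): plain k-means (Lloyd's algorithm) over the descriptor pool
centers = descriptors[rng.choice(len(descriptors), k, replace=False)]
for _ in range(10):
    d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    labels = d2.argmin(axis=1)           # nearest center for each descriptor
    for j in range(k):
        pts = descriptors[labels == j]
        if len(pts):
            centers[j] = pts.mean(axis=0)

def word_histogram(img_desc, centers):
    """Assign each descriptor of one image to its nearest visual word
    and return the word-count histogram (one row of the contingency table)."""
    d2 = ((img_desc[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    words = d2.argmin(axis=1)
    return np.bincount(words, minlength=len(centers))

# an image with 40 descriptors becomes a k-dimensional count vector
hist = word_histogram(rng.random((40, 128)), centers)
```

Stacking one such histogram per image gives the images × visual-words contingency table that FCA analyzes.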

In our experiments, we use the method described in [14] to detect interest points. On the Caltech4 database the vocabulary is built using k-means on about 300,000 descriptors drawn randomly (one third for each category: faces, motorbikes, airplanes, cars and background). The obtained vocabulary consists of 2224 words from 4090 images. The number of words in the vocabulary was chosen following Sivic [5].

IV. FCA FOR IMAGE RETRIEVAL

A. FCA for reduction of dimension

One of the advantages of FCA is to reduce the dimensionality of the problem. A tree-based index such as a kd-tree [17] is desirable after dimension reduction by FCA. However, such a structure becomes inefficient when the number of dimensions is greater than 16 because of the curse of dimensionality [9]. In addition, tree-based indexing often uses the Euclidean distance for nearest-neighbor search, so we encounter difficulties when working with other distances.

B. Advanced search with FCA

In the context of k-nearest-neighbor queries, many approaches have been developed to overcome the curse of dimensionality. They are usually classified into five categories: (1) tree-based indexing; (2) space-filling curves; (3) dimension reduction; (4) approximate algorithms; and (5) filtering-based (i.e. approximation) approaches. Our approach belongs to the last category. Filtering-based approaches filter the points so that only a small fraction of the database is scanned during a search. The main idea of our approach is that two similar images share certain common properties. Given a query, the images which share nothing with the query are filtered out.

There are two important indicators for the interpretation and evaluation of FCA: on the one hand, the contribution of an image to the inertia of an axis (factor), and on the other, the representation quality of an image on an axis. We will use these indicators as relevant properties for filtering. This reduces the number of images to be considered when calculating their similarity with the query, and sometimes improves the quality of the results. In practice, we associate two inverted files with each axis (one for the positive part and another for the negative part). Each part of an axis corresponds to a relevant property that allows distinguishing images. The definitions of the inverted files are given in the following.

1) Definition 1 (contribution): The contribution of the image i to the inertia of the axis j is defined by:

Cr_j(i) = p_i Z²_ij / λ_j  (12)

where

• p_i: mass of the image i;
• λ_j: jth eigenvalue; and
• Z_ij: coordinate of the image i on the axis j.

2) Definition 2 (representation quality): The representation quality of the image i on the axis j is the squared cosine of the angle between the axis j and the vector joining the gravity center G to the point i:

Cos²_j(i) = d²_j(i, G) / d²(i, G) = Z²_ij / ∑_{j′} Z²_ij′  (13)

3) Definition 3 (contribution-based inverted file): Given a threshold ε > 0, the two contribution-based inverted files associated to the axis j are defined by:

CF⁺_j = {i | Cr_j(i) > ε and Z_ij > 0}  (14)

CF⁻_j = {i | Cr_j(i) > ε and Z_ij < 0}  (15)

4) Definition 4 (representation-quality-based inverted file): Given a threshold ε > 0, the two representation-quality-based inverted files associated to the axis j are defined by:

QF⁺_j = {i | Cos²_j(i) > ε and Z_ij > 0}  (16)

QF⁻_j = {i | Cos²_j(i) > ε and Z_ij < 0}  (17)
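Definitions 1-4 translate directly into NumPy. The coordinates, masses and eigenvalues below are toy random stand-ins for the output of an FCA, and the 1/4-of-the-mean threshold mirrors the choice reported later in Section V-C:

```python
import numpy as np

rng = np.random.default_rng(3)
M, K = 100, 7
Z = rng.standard_normal((M, K))          # image coordinates on K axes (fake)
p = np.full(M, 1.0 / M)                  # image masses p_i
lam = np.sort(rng.random(K))[::-1]       # eigenvalues of the axes (fake)

Cr = p[:, None] * Z**2 / lam[None, :]                # eq. (12), contribution
Cos2 = Z**2 / (Z**2).sum(axis=1, keepdims=True)      # eq. (13), repr. quality

def inverted_files(indicator, Z, j, eps):
    """Eqs. (14)-(17): the positive and negative inverted files of axis j."""
    pos = np.where((indicator[:, j] > eps) & (Z[:, j] > 0))[0]
    neg = np.where((indicator[:, j] > eps) & (Z[:, j] < 0))[0]
    return pos, neg

# thresholds set to 1/4 of the mean indicator value, as in Section V-C
CFp, CFn = inverted_files(Cr, Z, 0, Cr[:, 0].mean() / 4)
QFp, QFn = inverted_files(Cos2, Z, 0, Cos2[:, 0].mean() / 4)
```

Each image is thus indexed only under the axes (and signs) where it carries enough inertia or is well enough represented.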

5) Image retrieval algorithm using inverted files: First the query is folded in using formula (11). After obtaining the coordinates of the query in the reduced space (factor space), we choose some relevant properties and take their associated inverted files. The inverted files are merged to form a list of candidate images. Finally, the k nearest neighbors of the query



TABLE I
RETRIEVAL ALGORITHM USING INVERTED FILES

Input:
  q: query
  k: number of nearest neighbors
Output:
  k nearest neighbors of q
Algorithm:
  1. Project q into the factor space (fold in the query) using formula (11) with r ≡ q
  2. Sort the axes by contribution (or representation quality)
  3. Choose the first n axes and take their associated inverted files
  4. Merge the inverted files ⇒ L
  5. Filter images to form a list of candidate images C:
       C = ∅
       FOR each image i in L DO
         IF i appears at least n_thres times in L THEN
           C = C ∪ {i}
  6. Find the k nearest neighbors of q in C
  7. Return the k nearest neighbors of q

are searched in the list of candidates. The algorithm is given in Table I.

In this algorithm, there are two parameters to tune: the number n of inverted files to take in step 3, and n_thres in step 5. A naive solution is to take any odd number for n (e.g. 3) and set n_thres to ⌈n/2⌉ (e.g. 2).

We propose here a heuristics-based approach which can automatically determine n and n_thres depending on the image query. It is based on the following observation: if a point is well represented on some axes, the representation quality on these axes will be large and the representation quality on the other axes will be small, because the sum of the representation qualities equals one. So we can take the first n axes such that their representation quality is greater than the threshold ε used in the construction of the inverted files, and/or the sum of their representation qualities is greater than a threshold α (e.g. α = 0.75). To determine n_thres, we rely on the fact that if n_thres is too large, the constraint is too restrictive and the list of candidate images can be empty; if n_thres is too small, the number of images in the list will be large and the response time long. So n_thres should be the largest integer such that the number of candidate images is greater than a given number (e.g. 500).
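The retrieval steps of Table I can be sketched as below. The database coordinates, the folded-in query and the inverted files are toy random data, and the fixed n = 3, n_thres = 2 correspond to the naive setting rather than the automatic heuristics:

```python
import numpy as np

rng = np.random.default_rng(4)
M, K = 200, 15
Z = rng.standard_normal((M, K))          # database image coordinates (fake)
zq = rng.standard_normal(K)              # query folded in via eq. (11) (fake)

# toy inverted files: for each axis, its positive part (s=1) and negative
# part (s=-1); a real index would also apply the threshold of Defs. 3-4
files = {(j, s): np.where(Z[:, j] * s > 0)[0]
         for j in range(K) for s in (1, -1)}

def retrieve(zq, Z, files, n=3, n_thres=2, k=5):
    cos2 = zq**2 / (zq**2).sum()                   # query representation quality
    axes = np.argsort(-cos2)[:n]                   # steps 2-3: best n axes
    # step 4: take the file matching the sign of the query on each axis
    merged = np.concatenate(
        [files[(j, 1 if zq[j] > 0 else -1)] for j in axes])
    ids, counts = np.unique(merged, return_counts=True)
    cand = ids[counts >= n_thres]                  # step 5: majority filter
    # steps 6-7: cosine-similarity k-NN restricted to the candidates
    sims = (Z[cand] @ zq) / (np.linalg.norm(Z[cand], axis=1)
                             * np.linalg.norm(zq))
    return cand[np.argsort(-sims)[:k]]

top = retrieve(zq, Z, files)
```

Only the candidate list, a small fraction of the database, is scanned in the final nearest-neighbor step, which is where the speedup reported in Section V comes from.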

V. NUMERICAL RESULTS

We implemented the FCA algorithm in C++ using the CLAPACK library [18] for matrix manipulation and eigenproblem solving. The tests were carried out on two datasets: Caltech4 [5] and the Stewénius and Nistér dataset [19].

A. Datasets

1) Caltech4 dataset: The Caltech4 image database contains 4090 images extracted from the Caltech11 database [20], distributed into 5 categories. The size of the vocabulary is 2224. Table II describes the database.

TABLE II
DESCRIPTION OF THE CALTECH4 DATABASE

Category      Number of images
faces                435
motorbikes           800
airplanes            800
backgrounds          900
cars (rear)         1155
Total               4090

Fig. 3. Images drawn from the Caltech4 database

2) Stewénius and Nistér dataset: This set consists of 2550 groups of 4 images each, which gives 10200 images in total. All the images are 640×480. Fig. 4 shows some images from this database. We experimented with a vocabulary of 5000 visual words constructed from a subsample of SIFT descriptors extracted from the Corel image database.

B. FCA versus other methods

1) tf∗idf: This method performs a matrix transformation. Each element of the co-occurrence table is normalized to tf(i, j) and weighted by idf(j), where tf(i, j) is the number of times the word j appears in the document i divided by the length of the document i, and idf(j) = log(N/N_j), where N_j is the number of documents containing the word j and N is the number of documents in the corpora. The tf∗idf weighting scheme is often used in the vector space model together with cosine similarity to determine the similarity between two documents [21]. In our experiments, we use



Fig. 4. Images drawn from the Stewénius and Nistér image database

cosine similarity to compute the similarity between the query and the images in the database. The inverted-file technique of [19] is also used to accelerate the search.
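The tf∗idf weighting and cosine similarity described above can be sketched as follows, on a toy document-by-word count table (all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
F = rng.integers(0, 4, size=(30, 50)).astype(float)  # toy count table

tf = F / F.sum(axis=1, keepdims=True)      # tf(i, j): counts / document length
Nj = (F > 0).sum(axis=0)                   # number of documents containing word j
idf = np.log(len(F) / np.maximum(Nj, 1))   # idf(j) = log(N / N_j)
W = tf * idf[None, :]                      # tf*idf weighted document vectors

def cosine(a, b):
    """Cosine similarity between two weighted document vectors."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# similarity of document 0 to every document in the toy corpus;
# a document is always most similar to itself
sims = np.array([cosine(W[0], W[i]) for i in range(len(W))])
```

Treating the query histogram as one more document vector gives the baseline ranking that FCA is compared against in Tables III and IV.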

2) PLSA: We used a PLSA model with 7 modalities, which gives the best results on the Caltech4 base [5]. Each image in the database is represented by its distribution P(z|d). The dimension of the problem is reduced to 7. The class-specific distributions P(z|d) are used to compute the similarity between the query and the images in the database.

3) FCA: To compare with PLSA, we kept only the first 7 axes after applying FCA to the images. The projection of images onto some of the first axes is drawn in Fig. 5. With only 2 or 3 axes, the categories are well separated after FCA. This demonstrates that FCA performs dimension reduction well without losing discriminative power.

4) Discussion: The number of topics (PLSA) and the number of axes kept (FCA) are parameters to tune. They are difficult to choose because the eigenvalues of FCA decrease slowly. The number 7 was chosen for comparison with PLSA. The performance increases when we keep more axes (e.g. 15 or 30 axes). Fig. 6 shows the performance (precision-recall curves) of the different methods. It is clear that FCA performs better than the other methods with the cosine similarity measure.

In this experiment, FCA performs slightly better than PLSA. Generally, it is difficult to say which method is better. Here we choose FCA because FCA reduces dimension and gives two important indicators for the interpretation of the results. We use these indicators to construct inverted files for accelerating the retrieval. In addition, since PLSA uses an EM algorithm, it can only find a local solution, not a global one; the result strongly depends on the initial configuration. Moreover, it is difficult (even impossible) to deal with large databases with PLSA due to time and memory constraints, while it is easy to perform FCA incrementally in the case of large databases.

Fig. 5. Projection of images in the Caltech4 database onto some of the first axes after FCA: 4 categories (top), 5 categories (bottom)

C. FCA with inverted file

1) Caltech4 dataset: To compare the performance of the new retrieval method with the exhaustive method, which scans the entire database, we computed the precision on the 5, 10, 50 and 100 top images returned. The number of kept axes, the parameter K, is set to 7, 15 and 30. Cosine similarity is used to measure the similarity between two images. The threshold for inverted files is set to 1/4 of the mean contribution in the case of contribution-based inverted files, and to 1/4 of the mean representation quality in the case of representation-quality-based inverted files; n is set to 3 or is computed automatically according to



Fig. 6. Precision-Recall curves: performance comparison of tf∗idf, PLSA and FCA with Euclidean distance and cosine similarity

TABLE III
PERFORMANCE COMPARISON: PRECISION AT 5, 10, 50 AND 100 TOP RETURNED IMAGES ON THE CALTECH4 DATABASE

Methods       #imgs    5      10     50     100    Time
(1) K = 7       –    94.87  93.48  90.54  89.08   0.42
(1) K = 15      –    95.92  94.61  91.35  89.64   0.50
(1) K = 30      –    96.30  95.03  91.22  89.23   0.66
(2) K = 7      836   94.62  93.30  90.30  88.70   0.11
(2) K = 15     984   95.97  94.63  91.23  89.43   0.15
(2) K = 30    1137   96.29  94.97  91.05  88.87   0.22
(3) K = 7      888   94.55  93.23  90.24  88.76   0.12
(3) K = 15     791   95.85  94.47  91.01  88.93   0.13
(3) K = 30     741   96.18  94.88  90.38  87.05   0.15
(4) auto 1     783   95.94  94.58  91.23  89.41   0.13
(5) auto 2     922   95.82  94.52  91.12  89.23   0.14
(6) tf∗idf      –    88.24  84.81  77.52  73.72   2.94

the heuristics described in Section IV-B5 above; an image is added to the list of candidates if it appears in at least n_thres = ⌈n/2⌉ inverted files (majority vote). The results are shown in Table III. The “#imgs” column gives the average size of the candidate list. The columns 5, 10, 50 and 100 show the precision at the 5, 10, 50 and 100 first returned images. The “Time” column gives the response time per image in milliseconds. The parameter K is the number of kept axes. Method (1) scans the entire database. Method (2) uses representation-quality-based inverted files while method (3) uses contribution-based inverted files; both set n to 3 and n_thres to 2. Methods (4) and (5) use the heuristics described in Section IV-B5 to determine n, with representation-quality-based inverted files for method (4) and contribution-based inverted files for method (5). In these two experiments, K is set to 15. In comparison to method (1), the new algorithm performed about 4 times faster than the exhaustive one while losing less than 1% in terms of precision. In all cases, FCA (with or without inverted files) gives better results than tf∗idf (method (6)).

2) Stewénius and Nistér database: For this database, we compute the precision on the 4 top returned images and use it

TABLE IV
PERFORMANCE COMPARISON OF METHODS ON THE STEWÉNIUS AND NISTÉR DATABASE

Methods                                                   #imgs  Perf. (%)  Time (ms)
FCA                                                       10200    79.82      12.01
Advanced FCA, n = 21, n_thres = ⌈n/2⌉ = 11                  650    79.75       1.04
Advanced FCA, n: auto (36), (sum > 0.75), n_thres = n/2     332    79.62       0.99
Advanced FCA, n: auto (36), (sum > 0.75), n_thres: auto     377    79.72       1.03
Advanced FCA, n: auto (55), (sum > 0.85), n_thres: auto     397    79.96       1.33
Advanced FCA, n: auto (70), (sum > 0.9), n_thres: auto      407    80.01       1.55
tf∗idf                                                    10200    73.04      36.45

for performance comparison. The number of axes kept is set to 200. Table IV shows the results of the FCA, FCA-with-inverted-files and tf∗idf methods. n is computed following the heuristic described in Section IV-B5. We obtained a speedup of about 10 times compared to the exhaustive method (and 30 times compared to tf∗idf) without losing precision, even slightly improving the results.

VI. CONCLUSION

We have presented in this paper the use of Factorial Correspondence Analysis for content-based image retrieval. We also proposed a new algorithm for improving search quality (response time and precision). The numerical experiments show that FCA gives much better results than tf∗idf and slightly better results than PLSA. The new retrieval algorithm using inverted files considerably improves the response time without losing precision.

While studying the impact of the parameter n_thres, we found that with a candidate list of 1/100 of the database, the list contains about 90% of the relevant images. This means that we can achieve a precision of 90% by scanning only 1/100 of the database if an appropriate similarity measure is used. This motivates us to plan to combine our indexing technique with a relevant similarity measure such as the CDM (Contextual Dissimilarity Measure) [16] in future work.

ACKNOWLEDGMENT

The authors would like to thank A. Zisserman for the Caltech4 data and H. Jegou for preparing the data of the Stewénius and Nistér database.

REFERENCES

[1] S. Deerwester, S. Dumais, G. Furnas, T. Landauer, and R. Harshman, “Indexing by latent semantic analysis,” Journal of the American Society for Information Science, vol. 41, no. 6, pp. 391–407, 1990.

[2] T. Hofmann, “Probabilistic latent semantic analysis,” in Proceedings of the 15th Conference on Uncertainty in Artificial Intelligence (UAI’99), 1999, pp. 289–296.

[3] D. M. Blei, A. Y. Ng, and M. I. Jordan, “Latent dirichlet allocation,” Journal of Machine Learning Research, vol. 3, pp. 993–1022, 2003.

[4] J. Willamowski, D. Arregui, G. Csurka, C. Dance, and L. Fan, “Categorizing nine visual classes using local appearance descriptors,” in Workshop Learning for Adaptable Visual Systems (ICPR 2004), Cambridge, United Kingdom, 2004.

[5] J. Sivic, B. C. Russell, A. A. Efros, A. Zisserman, and W. T. Freeman, “Discovering objects and their location in image collections,” in Proceedings of the International Conference on Computer Vision, 2005, pp. 370–377.

[6] A. Bosch, A. Zisserman, and X. Munoz, “Scene classification via pLSA,” in Proceedings of the European Conference on Computer Vision, 2006.

[7] R. Lienhart and M. Slaney, “pLSA on large scale image databases,” in IEEE International Conference on Acoustics, Speech and Signal Processing, 2007, pp. 1217–1220.

[8] J. P. Benzécri, L’analyse des correspondances. Paris: Dunod, 1973.

[9] R. Bellman, Adaptive Control Processes: A Guided Tour, 1961.

[10] M. J. Greenacre, Correspondence Analysis in Practice, 2nd ed. Chapman and Hall, 2007.

[11] A. Morin, “Intensive use of correspondence analysis for information retrieval,” in Proceedings of the 26th International Conference on Information Technology Interfaces (ITI 2004), 2004, pp. 255–258.

[12] T. Lindeberg, “Feature detection with automatic scale selection,” International Journal of Computer Vision, vol. 30, no. 2, pp. 79–116, 1998.

[13] D. G. Lowe, “Object recognition from local scale-invariant features,” in Proceedings of the 7th International Conference on Computer Vision, Kerkyra, Greece, 1999, pp. 1150–1157.

[14] K. Mikolajczyk and C. Schmid, “Scale and affine invariant interest point detectors,” International Journal of Computer Vision, vol. 60, no. 1, pp. 63–86, 2004.

[15] D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” International Journal of Computer Vision, vol. 60, no. 2, pp. 91–110, 2004.

[16] H. Jegou, H. Harzallah, and C. Schmid, “A contextual dissimilarity measure for accurate and efficient image search,” in Proceedings of CVPR’07, 2007, pp. 1–8.

[17] J. L. Bentley, “Multidimensional binary search trees used for associative searching,” Communications of the ACM, vol. 18, no. 9, pp. 509–517, 1975.

[18] E. Anderson, Z. Bai, C. Bischof, S. Blackford, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, and D. Sorensen, LAPACK Users’ Guide, 3rd ed. Philadelphia, PA: Society for Industrial and Applied Mathematics, 1999.

[19] D. Nister and H. Stewenius, “Scalable recognition with a vocabulary tree,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), vol. 2, June 2006, pp. 2161–2168.

[20] R. Fergus, P. Perona, and A. Zisserman, “Object class recognition by unsupervised scale-invariant learning,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, vol. 2, June 2003, pp. 264–271.

[21] G. Salton and C. Buckley, “Term-weighting approaches in automatic text retrieval,” Information Processing & Management, vol. 24, no. 5, pp. 513–523, 1988.
