
Int J Digit Libr (2000) 2: 236–250
International Journal on Digital Libraries, Springer-Verlag 2000

ImAge: an extensible agent-based architecture for image retrieval

Hiranmay Ghosh1, Santanu Chaudhury2,∗, Chetan Arora2, Paramjeet Nirankari2

1 Centre for Development of Telematics, 9th floor, Akbar Bhawan, New Delhi 110021, India; E-mail: [email protected]
2 Department of Electrical Engineering, Indian Institute of Technology, New Delhi 110016, India; E-mail: [email protected]

Abstract. We present an open and extensible architecture, ImAge, for content-based image retrieval in a distributed environment. The architecture proposes the use of system components with standard public interfaces for implementing retrieval functionality. The standardization of the components and their encapsulation in autonomous software agents result in functional stratification and easy extensibility. Collaboration of the independent retrieval resources in ImAge results in enhanced system capability. Reuse of existing retrieval resources is achieved by encapsulating them in agents with standard interfaces. The addition of independent agents with domain knowledge adds the capability of processing conceptual queries, while reusing the existing system components for feature-based retrieval. A communication protocol allows the declaration of the capabilities of the system components and negotiations for optimal resource selection for solving a retrieval problem. The use of mobile agents alleviates network bottlenecks. This paper describes a prototype implementation that validates the architecture.

Key words: Content-based image retrieval – Digital library – Multi-agent system – Distributed architecture – Conceptual query interpretation

1 Introduction

The availability of digital images for different application domains calls for effective retrieval tools. An image, which is a two-dimensional array of image pixels, encodes an enormous amount of information. Research in content-based image retrieval investigates new ways to interpret image data (pattern recognition algorithms) and to establish similarities between images using such an interpretation. The effectiveness of different retrieval algorithms depends on the application. The same set of images needs to be interpreted differently by different retrieval methods to meet diverse user requirements. In a networked world, the image collections, the retrieval tools and the users are expected to be distributed across multiple locations. This paper addresses the problem of designing an image retrieval system that can fulfil the needs of a distributed environment and its disparate constraints.

∗ Correspondence to: Santanu Chaudhury

The existing image repositories adopt different retrieval paradigms and implement a few retrieval methods. Some of them use aggregate image features such as the color histogram and texture [5, 10], some use segmentation information, i.e., image regions with relatively homogeneous properties [4], while some others associate semantic meaning to the image segments using some domain knowledge [6, 15, 18]. An image repository adopts some data model for representation of the image data. The images are indexed with one or more entities in the data model and are retrieved using a combination of these indices in the context of a query. The nature of the queries that can be satisfied by a repository is limited by the data model implemented in the collection. For example, a retrieval system like WebSeek [5] that supports some semantic categorization of images and indexing based on aggregate image features in its data model cannot support a query requiring segmentation. The different image repositories on the Internet exhibit heterogeneity with respect to the data models and hence, with respect to their access mechanisms.

An integrated framework for a multimedia digital library reusing the existing heterogeneous network resources has been attempted in UMDL [2] and the Stanford University Digital Library [19] projects. These systems use some mediator software to coordinate retrieval from the multiple repositories, which may have different organizations and different built-in retrieval methods. The loose coupling between the producers and consumers of information and the mechanism of dynamic resource discovery make the systems amenable to easy extension. The systems direct a query transparently to a set of capable repositories. However, the architecture does not enhance the capabilities of the individual repositories. As a result, retrieval is restricted to the repositories having a built-in capability to process a query. Moreover, there can be heterogeneity in the local interpretations, resulting in an inconsistent set of documents being retrieved.

There are some examples of extensible image databases, where the data model of a repository can be enhanced by the action of external agencies. In MOODS [12], the system stores a set of low-level media features, while a user can provide the rules for their interpretation using a script language. Thus, the data model of the system can be extended by adding new user scripts. In Mirror [7], some demons visit the database to extract new media features to augment the capability of the system. In either case, the extension becomes a permanent feature of the system, is done in anticipation, and cannot be dynamically tailored to the needs of a specific query.

In this paper, we present ImAge, an open and extensible architecture for a digital image library, where the retrieval functionality is implemented through the interaction of standard reusable components. These components represent various entities required for a retrieval system, for example, the query interface of a repository, the data entities that populate a repository, and the pattern recognition routines that transform a data object into another. The components may have different internal structures but are encapsulated with standard public interface definitions. The different repositories may support different sets of the data and query objects, thereby having their own individual character.

The approach followed in ImAge has quite a few advantages. The definition of standard component interfaces allows separation of the different functional units of the retrieval system, such as query interpretation, classification and pattern recognition methods. These independent modules can be encapsulated into autonomous software agents. The agents collaborate with each other during retrieval using the methods defined in their public interfaces. New agents conforming to the interface specifications can be dynamically incorporated in the system, resulting in its extensibility. The agents can declare their capability set, which is used for negotiation in the context of a retrieval. The architecture includes a mechanism for benchmarking these agents against some common benchmark data to ascertain their relative merits. Components encapsulating semantic knowledge can also be added to the system, resulting in the capability to process conceptual queries. The standardization of the interfaces makes it possible for independent research teams to contribute image analysis routines and domain knowledge to the system independently of the underlying repository structures. These routines can be used with any image repository, resulting in effective resource sharing. They can also build upon one another using public interfaces. It is also possible to include the existing image retrieval resources (e.g., WebSeek [5], QBIC [10], BlobWorld [4], etc.) in the architecture, by encapsulating them into autonomous agents conforming to the standard interface definitions.

The ImAge architecture, which is motivated by UMDL, proposes a new communication framework which allows the autonomous agents encapsulating the different system components to collaborate during a retrieval. The different retrieval resources can be contributed by independent research groups and may exist anywhere in the network. We encapsulate the pattern recognition routines as mobile agents, so that they can travel across a wide area network to the repository sites and analyze the images at their source.

We have implemented a prototype image retrieval system, ImAge, based on this architecture. The basic system supports query by example using the extracted image features. Though the implementation is generic, we have experimented with the system on a collection of tourism-related images. An extended implementation includes conceptual knowledge in the domain of tourism and supports conceptual queries. The system can be easily extended to other applications by incorporating appropriate domain knowledge.

The aim of our research is the development of an architecture that will support content-based image retrieval from a multitude of distributed repositories which support a standard retrieval protocol. We explore the possibility of encapsulating the retrieval resources to realize standard interfaces, so that they can collaborate during retrieval. We do not consider the development of specific retrieval algorithms, such as data models and pattern recognition algorithms, as part of this research. There is currently a strong research interest in multimedia retrieval methods, and adequate availability of the retrieval resources has been assumed.

The rest of this paper is organized as follows: Section 2 presents an overview of the multi-agent architecture and describes the various roles played by the agents in the system. Section 3 describes the protocol for capability negotiation and selection of agent teams. Section 4 describes the communication architecture for the agents constituting the system. Sections 5 and 6 describe some global policies for formulating the search strategy. Section 7 describes the vertical extension of the basic feature-based retrieval system for conceptual query processing. Finally, we conclude (Sect. 9) with a summary of our contribution and the scope of future work.


2 Architecture

ImAge has been modeled as an open society of autonomous and communicating software agents. Each agent in the society implements an independent unit of retrieval functionality. The collaboration of these agents results in solving a retrieval problem. New agents can dynamically join the society and contribute to its growth. In order that an autonomous agent can contribute in an open society, we define some definite roles in the system. An agent participates in the system in one of these predefined roles. We have followed an object-oriented approach. An agent class has been associated with each of the roles in the system. Every agent is viewed as an object belonging to an agent class. Each agent class is characterized by a public interface definition, which defines its functional behavior. Different agents in an agent class implement the public interface in their own way. Each agent class has a generic implementation that implements its public interface. Every agent belonging to a class extends the generic agent and implements its technology-specific methods. For example, the generic Search Agent (SA) defines an abstract method similarity(), which returns the similarity value between two images. It is extended by every agent of that class with a feature-specific algorithm. ImAge puts no restrictions on the internal design or knowledge representation techniques of an agent.
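As an illustration of this object-oriented pattern, a minimal Java sketch of a generic agent class with an abstract similarity() method and one feature-specific specialization might look as follows; only similarity() is named in the paper, all other class and method names are assumptions made here for the example.

    // Minimal sketch of the agent-class pattern described above. Only the
    // similarity() method comes from the text; all other names are assumptions.
    public abstract class GenericSearchAgent {

        // Every concrete Search Agent supplies a feature-specific similarity
        // measure; the result is expected in the range [0, 1].
        public abstract double similarity(byte[] imageA, byte[] imageB);

        // Shared behaviour (registration, capability declaration, messaging)
        // would live in the generic implementation.
        public String describe() {
            return getClass().getSimpleName();
        }
    }

    // A hypothetical specialization: a color-histogram based Search Agent.
    class ColorHistogramSearchAgent extends GenericSearchAgent {
        @Override
        public double similarity(byte[] imageA, byte[] imageB) {
            // A real agent would extract and compare color histograms here;
            // the stub returns a neutral placeholder score.
            return 0.5;
        }
    }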

The agent classes in ImAge and their interactions are shown in Fig. 1. A User Interface Agent (UIA) provides the human-machine interface of the system. It encapsulates the knowledge about the users, e.g., a user's preferences, feedback, history, etc. A UIA can implement any type of user interface, e.g., a natural language interface, query by example, etc. However, it must communicate the query to the rest of the system in a standard form. A Search Coordination Agent (SCA) encapsulates the knowledge and the heuristic methods for solving a retrieval problem and its optimization. It accepts the user query from a UIA, interprets it and interacts with the other agents for planning and scheduling the retrieval subproblems. A Collection Agent (CA) forms a layer of abstraction over an image repository. It encapsulates the repository structure and produces a standard view of the different data elements available with the repository. It declares the capabilities of a repository in terms of its query and data services to the external world. A Search Agent (SA) encapsulates a specific image retrieval algorithm. It is developed independently of any repository structure and is made available in the network for public use. These agents can build upon one another to derive a complex data model. They are designed as mobile agents so that they can travel to the collection sites and analyze the documents at their sources.

Fig. 1. Agent interaction (diagram: User Interface Agent, Search Coordinator, Collection Agents, Search Agents, Registrars and a Meta-Registrar)

Since ImAge allows dynamic growth, the agents in the system cannot be aware of each other's existence. The Registration Agents (RAs) maintain a list of the agents available in the system with their capability descriptions and provide a mechanism for dynamic resource discovery. Since the architecture encourages agents to be freely installed in the system, the system may be populated with a number of agents with similar capabilities but with different performance figures. The Benchmark Agents (BAs) benchmark the agents against a common set of data, which enables optimal choice of agents for solving a retrieval problem. The agent classes are described in more detail in the following subsections.

2.1 User Interface Agent

A UIA provides the human-machine interface of ImAge. It is possible to have different types of user interfaces in ImAge that incorporate different forms of input, such as query by example, keywords, natural language input, etc. A user can select an appropriate UIA depending on his/her convenience. Every UIA should, however, translate the query to a standard form which is understood by the rest of the system (see Sect. 4.2). Besides this, a UIA should be able to handle a few other functions, such as the convenient display of results, user registration, maintenance of history, and accepting user feedback.

Since the functionality of a UIA largely depends on the nature of the supported interface, there is no generic implementation for this agent class. The complexity of the actual retrieval mechanism, which involves the interaction of many retrieval resources, is transparent to a UIA. It views the SCA as a complete search engine and submits the user queries to the latter in interactive or non-interactive modes.

2.2 Search Coordinator Agent

A Search Coordinator Agent (SCA) coordinates the retrieval process utilizing the available resources in the system. Collaboration is achieved using a two-phase protocol as in [14]. In the planning stage, an SCA identifies an optimal set of image classes in the available repositories and an optimal set of search algorithms for each of the classes. The choice of image classes is guided by two factors:

1. The features in the data model of a repository and the available public PR routines may not match the query requirements, in which case the repository cannot take part in solving the retrieval problem and is not selected for retrieval.

2. A user is usually not interested in finding out an exhaustive set of images that satisfy the query. Thus, it is possible to improve the performance of retrieval by selecting a subset of document classes where the probability of finding the relevant documents is better than that in the others.

The selection of the search agents for a selected document class depends on the query and the data model supported in the repository. In general, while the breadth of search (the selection of the image classes) is determined by the recall requirement, the selection of the search algorithms, where alternatives exist, is determined by the precision requirements and the real-time constraints. These aspects are further elaborated in Sects. 3 and 6.

During the execution stage, an SCA contacts the selected CAs and requests them to schedule the selected SAs on the selected image categories. At the end of the retrieval, the SCA sorts the images in descending order of their relevance and presents to the user a subset of the images that meets the desired precision requirement.

A generic implementation of the search coordinator implements its overall operational logic, which can be specialized using specific policies and heuristics to select an optimal set of resources during the planning stage. It is also possible to implement some specific strategies, e.g., for duplicate removal, while combining the search results from multiple sources.

2.3 Collection Agent

Individual image collections in a distributed environment are characterized by their private data models and retrieval methods. Generally, it is not possible for a software entity to utilize the data unless it is tightly coupled to the repository. In ImAge, we propose an interface that allows the repositories to export a part of their functionality, depending on their implementation. The public view of a repository can be used by software agents that are developed independently of the underlying repository structure.

A CA forms a layer of abstraction over a repository. It declares the public interface of a repository in terms of its public data and query models and exports the data elements as standard objects when required by a retrieval algorithm. Retrieval routines external to the repository can complement the native indexing supported in a repository and thereby augment the retrieval capability offered by the collection.

We view different image representations as reusable objects when encapsulated with standard interfaces.

For example, the raw image data can be abstracted toa bitmap format, by supporting public methods likegetHeight(), getWidth() and getP ixel(x, y) regardlessof its physical format. Similarly, a color histogram canbe abstracted to 3-D bin values with public methods likegetBinV al(red, blue, green). We assume a set of suchstandard component definitions exist in a developmentlibrary and the public interface of a repository to bedeveloped using such interfaces. The library can be ex-tended by including new interfaces with innovation ofnew image data representations.

The existence of standard data objects in a repository does not preclude it from having a private data model, exclusive to the repository. However, the query capability arising out of the private data model must be encapsulated with a standard public interface. For example, the semantic categorization of the images in WebSeek can be implemented using a private data model, while the query interface can be encapsulated as a standard keyword-based query. The images retrieved from one or more semantic categories of WebSeek can be subjected to external image analysis routines to augment its retrieval capability.

The repositories in a heterogeneous environment can implement different data models. However, the data items need to be encapsulated to a standard form for the SAs to build upon them. The static data model of the repository may be built by some manual or automatic process, or generated by some of the available SAs. The features extracted by the SAs in the context of a query can also become a part of the data model of a repository, or can be cached for possible future use, depending on the policy of the CA. However, such dynamic capability enhancements increase the complexity of the capability declaration of the CA.

2.4 Search Agent

In ImAge, the capability of a repository is complemented by the action of some public¹ pattern recognition (PR) routines in the context of a query. A PR routine accepts raw image data or extracted features through the public interface of a repository and incorporates methods to interpret them in a novel way. These routines are developed independently of any specific repository structure and can be contributed to the system by independent research teams. Moreover, these PR routines can build on one another and can produce a complex view of the image data.

An SA encapsulates a specific pattern recognition algorithm. Besides a feature extraction algorithm, it must implement a method to compare the feature object with a similar one to determine a similarity score for a pair of images.² Like a CA, an SA can allow public access to its data model, so that other SAs can build over it. In such cases, the data elements must be encapsulated as standard objects. The interface specifications for such data objects are made publicly available in ImAge.

¹ The word public does not have any commercial connotation. The PR routines may be used for a fee or free of cost, depending on the policy of its provider.

We consider two types of SAs in ImAge. Generic SAs implement some feature-based retrieval algorithms, like colors [22], textures [23] or simple shapes [11]. These agents can be used for many different applications. Specialized SAs are application specific and incorporate a specific intelligence and training set for recognition of complex image objects, e.g., the face of an important personality. They can build upon low-level image features extracted by Generic SAs.

The SAs are designed independently of any repository structure. They complement the native capabilities of the repositories during a retrieval process. It may be necessary to use more than one SA in succession to satisfy a query. The SAs are implemented as mobile agents, so that they can travel to the collection sites and process the media data or meta-data at their source. This feature alleviates network traffic when a large number of multimedia documents from different repositories are to be processed using a series of SAs. It also alleviates the computational bottlenecks by distributing the processing to multiple repository sites.

2.5 Registration Agent

The agents in the open-ended system do not have a priori knowledge of each other's capabilities and communication addresses. The Registration Agents (RAs) aid the agent community in dynamic resource discovery. Every agent which implements some services to be utilized by others registers itself with an RA. A capability-based search by a client with the RAs yields the set of prospective agents with the required capabilities. There can be several RAs on the network. An agent can get registered with any of these RAs. The existence of multiple RAs distributes the workload and reduces the network traffic. The RAs themselves register with a meta-registrar, which helps in their discovery. The address of the meta-registrar is universally known in the system.

A generic RA defines methods to register and to withdraw an agent. An agent is uniquely identified by its URI in a global network. An RA maintains a list of the capabilities of an agent in quantitative terms. It defines a lookup method, which produces a ranked list of agents with respect to the desired capability set.
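A minimal Java sketch of such an RA interface is given below; register, withdraw and the ranked lookup correspond to operations named in the text, whereas the Capability type and all signatures are assumptions of this example.

    import java.util.List;

    // Sketch of the Registration Agent interface implied above.
    public interface RegistrationAgent {

        // An agent registers its URI together with a quantitative capability list.
        void register(String agentUri, List<Capability> capabilities);

        void withdraw(String agentUri);

        // Returns agent URIs ranked by how well they match the requested capabilities.
        List<String> lookup(List<Capability> requested);
    }

    // A quantitative capability entry, e.g., ("color", 0.95); a hypothetical helper type.
    record Capability(String attribute, double score) {}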

² There may be different similarity computation algorithms associated with the same media feature, for example, histogram intersection and vector space distance for color histograms. We assume exactly one of the methods to be implemented in an SA. Another similarity measure may be implemented in another agent which is known in the system using a different URI.

2.6 Benchmark Agents

Since ImAge encourages uncontrolled addition of PR algorithms in the form of SAs, it is possible that the system will be populated with many SAs with similar capabilities but different performance figures. It is necessary to compare their performances against a common set of representative media data, so that the most suitable agent can be selected for a retrieval. A Benchmark Agent (BA) benchmarks a set of similar SAs against a common set of benchmark data. It typically contains a reasonably large set of sample images drawn from multiple domains. A BA is designed to support benchmarking of a set of SAs, all of which can work with similar data models. A number of BAs can co-exist in the system, having different data models, and hence capable of benchmarking different sets of SAs.

An RA looks for suitable BAs that can benchmark an SA at the time of its registration. If no such agent is found (which is more likely in the case of the Special SAs), the agent cannot be benchmarked and the performance declaration by the provider of the agent is relied on. If more than one BA is found, the performance data for the agent is computed as a union of all benchmark results. The performance data are stored with the RA for future reference.

The retrieval algorithm in ImAge relies on fusion of information from many SAs. The algorithms employed for similarity computation may produce results in different ranges with different semantic interpretations. It is therefore required to normalize the results to a common scale before combining them. We normalize the values to the range [0,1] using Gaussian normalization as in MARS [17].
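A minimal sketch of one common form of Gaussian normalization is shown below (subtract the mean, divide by three standard deviations, shift into [0,1] and clamp); whether this exactly matches the MARS formulation used in the prototype is an assumption.

    // Gaussian (z-score based) normalization of raw similarity scores to [0,1].
    // Assumed formulation: x' = ((x - mean) / (3 * sigma) + 1) / 2, clamped to [0,1].
    public final class GaussianNormalizer {
        public static double[] normalize(double[] raw) {
            int n = raw.length;
            double mean = 0.0;
            for (double v : raw) mean += v;
            mean /= n;
            double var = 0.0;
            for (double v : raw) var += (v - mean) * (v - mean);
            double sigma = Math.sqrt(var / n);
            double[] out = new double[n];
            for (int i = 0; i < n; i++) {
                double z = (sigma == 0.0) ? 0.0 : (raw[i] - mean) / (3 * sigma);
                out[i] = Math.min(1.0, Math.max(0.0, (z + 1.0) / 2.0));
            }
            return out;
        }
    }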

3 Capability negotiation for agent team formation

The different image repositories in a heterogeneous environment, in general, implement different data models and have different capabilities. A repository can be encapsulated in a CA to conform to some standard query interface definitions. The SAs can exploit these standard interfaces to retrieve image data (or metadata). The standardization of the interface, however, does not imply uniformity. A repository may support a few of the several different interfaces defined within the system. For example, while WebSeek provides keyword-based semantic retrieval, a system like NetView [26] provides color-based classification. Therefore, the SAs required for solving a retrieval problem depend not only on the query but also on the public interface of the repository. The problem can be modeled as a search problem where the objective is to reach any of a set of destinations (available access methods of the repository) from a source (query requirement) through the intermediate states that can be generated by the set of available SAs. The situation is pictorially depicted in Fig. 2. In the figure, q represents a query object, {A, B, ..., I} represent a set of intermediate metadata objects, and {P, Q, R, S} represent the public data interfaces in a repository. The edges Aq, BA, etc., represent the action of SAs in transforming one form of metadata into another. The graph is generated dynamically in the context of a query by a planning module, encapsulated in an SCA with the knowledge of the capabilities of the CAs and the SAs in the system. In the example, we find two paths qCDQ and qEFGR that connect the query to the public data interfaces in the repository. The least-cost path satisfying the performance constraints is chosen for satisfying the query. If no such path is found, the problem is intractable, and the repository cannot take part in the retrieval process. A capability description language (Sect. 4.3) helps the SCA in this reasoning process.

Fig. 2. Agent cooperation (diagram: a query object q, intermediate metadata objects A–I, and a repository with raw image data, its native data model and public data interfaces P, Q, R, S)
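The least-cost planning step can be sketched as a uniform-cost (Dijkstra-style) search over metadata types, as in the Java example below; the graph is a toy version of Fig. 2 with made-up costs, and none of the class or method names come from the paper.

    import java.util.*;

    // Illustrative least-cost planning over SA transformations.
    // Nodes are metadata types; an edge means a Search Agent can transform one
    // type into another at some cost; targets are the repository's public interfaces.
    public final class PlannerSketch {
        record Edge(String to, double cost) {}

        // Returns the cheapest cost of reaching any exported repository interface
        // from the query object, or +infinity if the query is intractable here.
        static double leastCost(Map<String, List<Edge>> graph, String source, Set<String> targets) {
            Map<String, Double> best = new HashMap<>();
            PriorityQueue<Map.Entry<String, Double>> pq =
                    new PriorityQueue<>(Map.Entry.comparingByValue());
            pq.add(Map.entry(source, 0.0));
            while (!pq.isEmpty()) {
                var cur = pq.poll();
                String node = cur.getKey();
                double cost = cur.getValue();
                if (cost > best.getOrDefault(node, Double.POSITIVE_INFINITY)) continue;
                if (targets.contains(node)) return cost;
                for (Edge e : graph.getOrDefault(node, List.of())) {
                    double next = cost + e.cost();
                    if (next < best.getOrDefault(e.to(), Double.POSITIVE_INFINITY)) {
                        best.put(e.to(), next);
                        pq.add(Map.entry(e.to(), next));
                    }
                }
            }
            return Double.POSITIVE_INFINITY;
        }

        public static void main(String[] args) {
            // Toy version of Fig. 2: q -> C -> D -> Q and q -> E -> F -> G -> R.
            Map<String, List<Edge>> g = Map.of(
                    "q", List.of(new Edge("C", 1), new Edge("E", 1)),
                    "C", List.of(new Edge("D", 2)),
                    "D", List.of(new Edge("Q", 1)),
                    "E", List.of(new Edge("F", 1)),
                    "F", List.of(new Edge("G", 1)),
                    "G", List.of(new Edge("R", 1)));
            System.out.println(leastCost(g, "q", Set.of("P", "Q", "R", "S"))); // prints 4.0
        }
    }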

4 Agent communication

The retrieval system proposed in this paper has been modeled as an open-ended multi-agent system, where problem solving is achieved through coordination of a set of autonomous agents, and where new agents can be dynamically added to the system. This necessitates the adoption of a standard agent communication language (ACL) throughout the system. There have been some initiatives on the development of ACLs for heterogeneous environments, namely KQML³ and Arcol⁴. FIPA⁵ has released a draft for a standardized version. The motivation behind the development of these languages has been to enable a set of independently designed agents to communicate with each other. The languages focus on the transport and the language levels, i.e., they deal with the mechanism of sending and receiving messages and their interpretation at the performative level. Since these languages are quite generic, their use requires some further understandings between the communicating agents. In order to make a meaningful communication, a sender agent needs to make some assumptions about the receiver agent and vice versa. Besides this, these languages assume that the agents are implemented using some belief states, resulting in restrictions on the agent design. Finally, these generic ACLs do not provide for any validation of communication policies and communication architecture. As a result, a complex protocol is required to notify the capabilities of the agents to each other and to recover from a situation when an agent receives a message that it cannot interpret.

In object-oriented design, it is possible to define the communication between objects in terms of the public methods defined for these objects. One object can communicate some information to another by invoking a public method defined for the latter. The semantics of the communication is established by defining the meanings of the objects exchanged, either as a parameter or as the return value, in the method invocation. Distributed computing platforms, such as Java or CORBA, support Remote Method Invocation (RMI), which hides the underlying transport mechanism and enables an object to invoke a method defined in a remote object transparently, as if they were collocated [8]. We have used RMI as the underlying message transport mechanism. Each agent is designed as an object and extends its services to others through a set of public methods, which can be remotely invoked. The use of RMI considerably simplifies the communication architecture. Since the messages that can be interpreted by an agent are defined in its public interface, it is possible to validate most of the messages at compile time, eliminating the need for a runtime protocol. Stronger semantics for the communication are established by the communication objects, since the interpretation of a communication object is largely built into the object itself. Besides, RMI eliminates the need for the development of a communication subsystem, such as the KQML router interface library (KRIL).
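Assuming the Java RMI transport described above, a public interface for one of the agent roles could be declared as in the following sketch; the two methods and their signatures are illustrative assumptions, not the actual interface of the prototype.

    import java.rmi.Remote;
    import java.rmi.RemoteException;

    // Sketch of an RMI-style public interface for a Collection Agent.
    public interface CollectionAgentRemote extends Remote {

        // Capability declaration, encoded in the content language (Sects. 4.1, 4.3).
        String describeCapabilities() throws RemoteException;

        // Ask the repository to run the named (mobile) Search Agents on a category
        // and return the identifiers of matching images with their scores.
        String scheduleSearch(String category, String refinedQuery) throws RemoteException;
    }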

In a proprietary system, where there are a handful of agents, it is possible to define an interface for every agent individually and for those interfaces to be globally known to every other agent in the system. However, in an open-ended system where agents can be dynamically added, there will be a large diversity of messages and the communication will break down if every agent were allowed to define its own communication language, i.e., its own public interface. To overcome this difficulty, we have developed an ACL based on social commitments [21]. In this model, an agent-based system is viewed as a community of agents with some defined social roles. Every agent in the system participates in a problem-solving exercise in one (or more) of these roles. The communication between any two agents in such a system is guided by the roles they perform. With a finite and predefined number of roles in the system, the diversity of communication needs can be contained without either limiting the number of agents or imposing any restrictions on their internal design. The definition of the messages in the system forms the functional specification for the roles, and any agent can participate in a role by adhering to those specifications. In the proposed architecture, the agents are classified into a few broad functional categories as described in Sect. 2. Each of these agent classes corresponds to a role in the system required for solving a retrieval problem. A public communication interface, which includes a set of methods, is defined for every class of agents. Every agent in an agent class implements the corresponding public interface.

³ University of Michigan: www.cs.umbc.edu/kqml
⁴ France Telecom: www.arcol.asso.fr
⁵ Foundation for Intelligent Physical Agents: www.fipa.org

The standardization of the communication interface does not restrict the diversity of operations of the agents belonging to an agent class. The methods defined in the public interfaces of an agent class exhibit polymorphism, since every agent in the class has the flexibility to implement a method in its own way. While some of the communication objects have a fixed interpretation in the system, many are defined in a flexible way with the rules of interpretation encoded in those objects themselves, so that they can be customized to the needs of some specific agents using a content language. The content language is further extended to a query language and to a capability description language, which are used to represent the queries at various stages of refinement and the capabilities of the different agents in the system. These languages are described in the following subsections.

4.1 Content language

In order to offer flexibility in agent communication, some of the message fields are loosely defined as generic objects and no type checking is enforced on them at compile time. Examples of such fields include agent capability descriptions, query specifications, etc. These objects are polymorphic in nature, i.e., they can be overloaded with different types of information depending on the context. We encode these objects in a descriptive way, similar to ASN.1 [1], so that they may be unambiguously interpreted by the recipient agents. The complex data structures are built recursively using fundamental data types (integer, float and String) and construction mechanisms (fields, repetition and choice). We have extended the repetition construct to include the logical operations AND, OR and NOT. Every data type is represented by a name. Each data item is encapsulated in a class TypeValue, which is an ordered pair of the type (name) and the value of the data item. A value field can contain an elementary value, fields, repetition, or another TypeValue specification.

The vocabulary of the content language is not restricted to a finite set but is left open-ended. Thus, it is possible to extend the language to include new elementary or complex data types which are relevant for some new agents. As a result, it may not be possible for an agent to interpret every message fully. In the distributed architecture, where an agent solves a part of the problem, such a capability is not required. An agent interprets only that part of a message that is relevant for it, i.e., for which it is designed. In fact, the selection of an agent team at any stage of solving a retrieval problem is based on the capability of the agents to interpret the different parts of the (refined) query.

In the following sections, we use the notation t = v to denote a TypeValue pair. A more complex type (a recursive buildup of TypeValue pairs) is represented as t1 = (t2 = v), where any depth of nesting is allowed. The fundamental data types are expressed directly, e.g., by "string1" instead of string = "string1", or true for boolean = true. The absence of a value is represented by ∅ (null). Repetition is denoted by [v1, v2, ...] and logical operations by op[v1, v2, ...], where op denotes a logical operator. For example,

    keyword : OR["tiger", "panther", "leopard"]

indicates a logical disjunction of the three keywords. Fields are represented by {f1, f2, ..., fn}; for example, quality : {cost, performance} indicates that the quality information includes cost and performance fields.⁶ Comments are enclosed between /∗ ... ∗/.
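One way such a TypeValue pair could be represented in code is sketched below in Java; the class layout and the main() example are assumptions made for illustration, not the actual implementation.

    import java.util.List;

    // Minimal sketch of the TypeValue encoding: an ordered (type, value) pair whose
    // value may be an elementary value, a list (repetition), or another TypeValue.
    public final class TypeValue {
        public final String type;   // the data-type name
        public final Object value;  // elementary value, List, or nested TypeValue

        public TypeValue(String type, Object value) {
            this.type = type;
            this.value = value;
        }

        public static void main(String[] args) {
            // Encodes: keyword : OR["tiger", "panther", "leopard"]
            TypeValue keywords = new TypeValue("keyword",
                    new TypeValue("OR", List.of("tiger", "panther", "leopard")));
            System.out.println(keywords.type); // prints "keyword"
        }
    }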

4.2 Query language

ImAge has been designed as an open-ended system, where different types of retrieval mechanisms may co-exist. This motivates the development of a generic query language which is expressive enough to integrate a variety of query objects. At the same time, the language should not be limited by a static set of vocabulary but should be easily extensible to yet unforeseen image features as well as conceptual query objects.

The query language is defined on top of the generic content language, and a query object is defined as a repetition of smaller query objects. Each of these sub-queries, in turn, can be a repetition of even smaller query objects or can be a choice of specifications, such as image categories, search specifications, performance specifications, and time constraints. This choice can be extended to include other types of specifications, such as conceptual specifications. Every specification object can be an elementary specification, or a logical combination of elementary specifications. Every elementary specification is associated with a score, indicating its importance in the query.

⁶ A TypeValue pair could also be expressed as fields, but we use a different notation for its special significance.

As an example, consider a query to retrieve a yellow flower from the image category nature, with a precision level of 0.8. A "flower" is specified as a blob with yellow color and a shape similar to that of any of a set of alternate sample blobs, sample1, ..., samplen. The query can be expressed as:

    query = {
        category = (keyword = "nature"),
        feature = (blob = [
            {color = {255, 255, 0} /∗ yellow ∗/, weight = 0.4},
            {shape = (blobData = OR[sample1, ..., samplen]), weight = 0.6}]),
        performance = [(precision = 0.8)]
    }

Since the architecture is intended to be open-ended, the vocabulary of the query language is not restricted to a closed set of terminology. Thus, an agent in the system may not be able to interpret an entire query specification. An agent is designed to interpret only the part of the query language required to perform its role successfully in the system. For example, while the field category can be interpreted by a CA, a feature with a color attribute is interpreted by an SA incorporating a histogram comparison algorithm.

4.3 Agent Capability Description Language

In ImAge, the SCA forms a team of CAs and SAs for solving a retrieval problem using a capability negotiation process. In order to form an agent team that can effectively solve a retrieval problem, the capabilities of the individual agents need to be known. The Capability Description Language (CDL) allows the CAs and the SAs to declare their capabilities. The language is an extension of the generic content language.

Fig. 3. Data model for a Blobworld-like repository (image data, blobs with spatial relationships, color histogram, texture, wavelets)

The capability specification for an SA includes the different types of input objects it can accept, the cost of processing each of the inputs (to extract the metadata of interest) and its performance. For the generic agents, the performance is computed in terms of the axes of the perceptual space through a benchmarking process described in Sect. 5. For special agents, their performance towards their specific functionality is declared by the designer of the algorithm. The capability description for an SA that implements color matching through histograms is as follows:

    capability = {
        input = [
            {data = bitmapImage, cost = 2.0},
            {data = colorHistogram, cost = 0}],
        performance = [
            {attribute = color, score = 0.95},
            {attribute = texture, score = 0.80}]
    }

In the above example, the SA can accept either bitmap image data or color histogram data, with costs 2 and 0 (zero) units, respectively. Since the SA uses histograms for image similarity computation, the feature extraction cost is zero when histograms are directly available. The agent has the performance scores 0.95 towards color and 0.80 towards texture, as judged by a benchmark process.

The CAs need to specify their query services and their data services, i.e., the image features used for indexing the images, and the views of their data model that are available to the external world. The CDL used in ImAge is similar to the Resource Description Framework (RDF) [20] in semantics, but is based on our generic content language rather than XML. To illustrate the capability description of the CAs, let us consider a repository like BlobWorld [4], where every image is segmented into blobs of relatively uniform color and texture. Let us assume that the repository indexes the images based on the color, texture and the relative position of the blobs, i.e., it is possible to support a query like "Find the images comprising (at least) two blobs, blob1 and blob2, where blob1 and blob2 are similar in color and texture to two given samples, and blob1 is to the left of blob2 in the image". The data model required to support such a query is pictorially shown in Fig. 3. Let us also assume that the repository provides public access to the raw image data and to the blob representation of the images for the SAs to build upon, but that the histogram and texture data are private to the repository. The capability description of the CA is as follows:

    capability = [
        {data = "bitmapImage", parent = ∅, access = "public", key = false},
        {data = "blobImage", parent = bitmapImage, access = "public", key = false},
        {data = "colorHistogram", parent = blobImage, access = "private", key = true},
        {data = "gaborTexture", parent = blobImage, access = "private", key = true}
    ]

In the above example, the parent field indicates the relationship between the different elements of the data model. The access field indicates whether the data item is available for public access or is private to the repository. The key field indicates whether the images are indexed by this data. It is assumed that instances of bitmapImage, blobImage, colorHistogram and gaborTexture (texture as determined using Gabor functions [23]) are encapsulated in objects with standard interfaces, and that these strings serve as the Uniform Resource Identifiers (URIs) for those classes.

5 Search Agent characterization and selection

The planning process in ImAge relies on an optimal selection of the search agents. It is therefore necessary to evaluate the SAs against a common set of benchmark data. An agent may be benchmarked for many of its attributes. We have implemented a simple benchmarking scheme for Generic SAs that operate on raw media data.

The content of an image is, in general, characterized by several independent features [17], and the performance of retrieval is determined by the quality of the SAs in comparing the images with respect to these features. We use the subjective judgment of a sample of users as the basis for benchmarking the agents for their performance. We consider an orthogonal perceptual space comprising a set of features, each of which has a definite meaning to human beings. It is possible for a person to judge the degree of similarity between two images in terms of these features. We have chosen four orthogonal features to define a perceptual space:

1. Color. It is judged by the color contents (RGB components and their combinations) of an image.

2. Texture. It is judged by the dominant textural pattern of the images.

3. Shape. Shape is judged by the contours of the objects contained in the images.

4. Distinguishing feature. It is judged by other image characteristics which cannot be expressed as a combination of color, texture and shape. For example, two images may be considered similar if they contain a common emblem though they are different in all other aspects. Similarly, the signature of an artist on his paintings may be a distinguishing feature to identify the artist.

Fig. 4. Perceptual space and similarity matrix (the reference similarity matrix holds, for every pair of sample images, a similarity vector over the perceptual dimensions color, texture, overall shape and distinguishing feature)

We assume that the similarity value between two images i1 and i2 in a perceptual dimension j can be represented by a number sim_j(i1, i2), normalized in the range [0,1], where 1 represents maximum similarity and 0 represents minimum similarity.

The similarity measure between two images in the perceptual space is expressed as a similarity vector, each element of which represents a similarity measure in a perceptual dimension. The benchmarking data comprises a set of sample images and the similarity vectors between every pair of images. The sample images are selected at random from a large collection on various themes, believed to exhibit a spectrum of values in the perceptual dimensions. The similarity values of every pair of images in each of the perceptual dimensions are determined as the average of the subjective judgments from a sample population of users on a normalized scale [0,1]. The data is represented as a matrix, where every element corresponds to a pair of images and comprises a similarity vector in the perceptual space. This matrix is referred to as the reference similarity matrix (Fig. 4).

When an SA m is registered with a registrar, the latter runs it to evaluate the similarity vectors between the image pairs in the benchmarking database. The similarity score awarded by the search agent, which is also in the range [0,1], is then compared with the corresponding score in every perceptual dimension, wherever available, and an error function ε_mji = s_mi − s_rji is computed. Here, s_mi represents the similarity score awarded by the search agent m to the i-th image pair, and s_rji represents the similarity score in the reference matrix for the same pair of images in the perceptual dimension j. The score of the agent m in a perceptual dimension j is then given by

    σ_mj = 1 − ( (1/κ_j) Σ_i ε_mji² )^(1/2) ,

where the similarity measure for the j-th dimension is available for κ_j pairs of images in the reference similarity matrix. The benchmark data b_m for a search agent m comprises its scores in the dimensions of the perceptual space and its average time of execution for comparing a pair of images, i.e., b_m = 〈σ_m1, σ_m2, ..., σ_mn, τ_m〉, where there are n dimensions in the perceptual space and τ_m represents the average time taken by an agent to compare an image pair. This processing time is measured under some ideal conditions and is weighted by a penalty factor depending on the target execution environment.
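The per-dimension score σ_mj defined above can be computed directly from the two score vectors, as in the following Java sketch; the use of NaN to mark image pairs without a reference judgment is an assumption of this example.

    // Computes sigma_mj = 1 - sqrt( (1/kappa_j) * sum_i (s_mi - s_rji)^2 ) over the
    // image pairs for which the reference matrix has a value in dimension j.
    public final class BenchmarkScore {
        static double score(double[] agentScores, double[] referenceScores) {
            double sumSq = 0.0;
            int kappa = 0;
            for (int i = 0; i < agentScores.length; i++) {
                if (Double.isNaN(referenceScores[i])) continue; // no reference judgment for this pair
                double err = agentScores[i] - referenceScores[i];
                sumSq += err * err;
                kappa++;
            }
            return kappa == 0 ? 0.0 : 1.0 - Math.sqrt(sumSq / kappa);
        }

        public static void main(String[] args) {
            double[] agent = {0.9, 0.2, 0.6};
            double[] refColor = {1.0, 0.1, Double.NaN}; // third pair not judged for color
            System.out.printf("%.3f%n", score(agent, refColor)); // prints 0.900
        }
    }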

In order to select the optimal search agent(s) in the context of a query, we need a ranked list of SAs with respect to a requested search criterion. The different search criteria are interpreted as different points in the perceptual space with different projections on the feature dimensions. For example, the criterion appearance maps to 0.5 in each of the color and shape dimensions.

The score of a search agent m with respect to a criterion c is computed as σ_mc = Σ_j p_cj σ_mj, where p_cj is the projection of the criterion c on feature j and σ_mj is the score of the agent m with respect to the feature j as a result of benchmarking. The agent with the highest value of σ_mc is normally selected for retrieval.
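As a worked example with illustrative numbers only: if the criterion appearance projects as p_color = 0.5 and p_shape = 0.5, an agent benchmarked with σ_color = 0.9 and σ_shape = 0.6 obtains σ_mc = 0.5 × 0.9 + 0.5 × 0.6 = 0.75.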

While many pattern recognition algorithms are designed to deliver good performance for one or more features in this perceptual space, there could be some agents that encapsulate algorithms for specialized applications, for example, identifying the face of an important personality. Benchmarking on the perceptual space is insufficient to capture these capabilities. To account for these agents, we have defined a distinct class of SAs, namely the special search agents. Rather than benchmarking, the RA records the feature(s) and score(s) of the agent as declared by the supplier of the agent using the capability description language described in Sect. 4.3. These scores are processed in the same way as the benchmark scores in the context of a retrieval.

6 Collection expertise

In principle, it is possible to analyze every image in every possible repository online using a collaboration of CAs and SAs in response to a query. Such an endeavor will be prohibitively costly on the internet, where millions of image documents exist. Besides, searching a general collection of images with a limited set of media features is likely to result in poor precision.⁷

⁷ Try any feature-based image retrieval engine on the web, e.g., QBIC, WebSeek or Blobworld.

A CA encapsulates an image repository and provides a standard query interface if supported by an underlying data model of the repository. The query interface can be used to identify some document classes (subsets of documents in a repository produced as a result of a priori classification) where the query is more likely to be satisfied. In the absence of any classification information, all images in the repository may be considered to belong to a single image class. ImAge utilizes available classification information to prune the search space. For example, consider a query like yellow flower, which would be very difficult to satisfy on a general collection of images. However, a collection like WebSeek organizes its collection in several semantic categories, including "nature/flowers". A search for a predominance of yellow color in this image class is likely not only to reduce computational overheads⁸ but also to yield more satisfactory results.

In general, the different repositories may classify the documents with different perspectives, independently of the other system components. Therefore, the image category requested in a query may not directly map to one or more categories of a repository. The different categories in the repository will rather have different degrees of similarity with the query specification. The degree of similarity between a query and the available categories can be measured in many ways. We have implemented a vector-space based method. Every image class is associated with a set of keywords. Some relevant keywords are also associated with a query, either by the user or by some domain knowledge (see Sect. 7). The independently supplied keywords are mapped to some controlled vocabulary using a thesaurus, so that a document class and a query represent two points in a defined vector space. A similarity value in the range [0,1] is computed as a function of the angle between the two vectors in the vector space. An image category having some non-zero similarity value (where at least one of the keywords matches) with respect to a query qualifies for participating in the retrieval process. Ideally, we need to select a subset of the qualifying image categories which satisfies the required recall value with minimum computational overheads for optimal retrieval.

We use the following heuristic method to find a subset of qualifying image categories that satisfies the requested recall at a low computational cost. Let n_j denote the total number of images in a qualifying image category σ_j, and s_j the similarity value of the category with respect to a query q. An estimate of the number of relevant images in σ_j is given by n̂_j = s_j × n_j. Let {m_1, ..., m_κj} be the set of SAs selected for query q. Let τ_k represent the standard cost of execution for an agent m_k and p_jk the penalty function for that agent for image category σ_j. The total retrieval cost for σ_j in the context of the query is given by

    c_j = n_j × Σ_{k=1..κj} (τ_k × p_jk) .

⁸ The nature/flowers category of WebSeek contains 853 images against a total of 65,000 images in the whole collection.


We define a figure of merit for an image category σ_j as f_j = n̂_j / c_j, and prepare a ranked list of categories using this figure of merit. From the top of the list, we select the least number of image categories so that the requested recall is satisfied. The estimated number of relevant documents in a set of image categories S_κ = {σ_1, ..., σ_κ} is given by

    N̂_κ = ( Σ_{j=1..κ} s_j × n_j / Σ_{j=1..κ} n_j ) × N_κ ,

where N_κ is the total number of image documents in ⋃_{j=1..κ} σ_j. The estimated recall when S_κ is selected for retrieval is given by r_κ = N̂_κ / N̂, where the denominator denotes the estimated number of relevant documents in the set of all qualifying image categories.
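A minimal Java sketch of this greedy selection is given below, under the simplifying assumption that the qualifying categories are disjoint, so that the recall estimate reduces to the selected Σ s_j × n_j over the same sum taken across all qualifying categories; all category names, sizes and costs are hypothetical.

    import java.util.*;

    // Rank qualifying categories by f_j = (s_j * n_j) / c_j and pick greedily
    // until the estimated recall reaches the requested value.
    public final class CategorySelection {
        record Category(String name, int n, double s, double cost) {}

        static List<Category> select(List<Category> qualifying, double requestedRecall) {
            double totalRelevant = qualifying.stream().mapToDouble(c -> c.s() * c.n()).sum();
            List<Category> ranked = new ArrayList<>(qualifying);
            ranked.sort(Comparator.comparingDouble((Category c) -> c.s() * c.n() / c.cost()).reversed());
            List<Category> chosen = new ArrayList<>();
            double relevantSoFar = 0.0;
            for (Category c : ranked) {
                chosen.add(c);
                relevantSoFar += c.s() * c.n();
                if (relevantSoFar / totalRelevant >= requestedRecall) break;
            }
            return chosen;
        }

        public static void main(String[] args) {
            List<Category> cats = List.of(
                    new Category("nature/flowers", 853, 0.9, 500),
                    new Category("nature/landscape", 4000, 0.3, 2500),
                    new Category("people", 9000, 0.05, 6000));
            // Picks nature/flowers and nature/landscape for a requested recall of 0.8.
            select(cats, 0.8).forEach(c -> System.out.println(c.name()));
        }
    }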

7 Vertical extension: semantic interpretation of query

A user is usually interested in a semantic category of images, for example, images depicting medieval monuments or snow peaks, rather than low-level image features like color and texture. A concept is an abstract entity and cannot be directly "observed" in an image. However, it leads to some definite patterns in the image forms, the recognition of which leads to the belief in the presence of the object, under some underlying assumptions. For example, a medieval monument can be identified in an image by recognizing a combination of media objects representing its domes, minarets and facade, assuming that the image depicts a place of tourist interest (other assumptions are also possible). A combination of media objects that can be used to identify a concept is called its observation model. The recognition of the media objects often requires specialized pattern recognition algorithms, called recognition functions. Experience gained through many observations of the concept allows the association of a number of alternative observation models with a concept in a specific domain. A feature-based retrieval system requires the user to specify a few image features, which is an oversimplification of an observation model and hence rarely produces good results. Semantic classification of an image requires the combination of evidence from a number of media features and some domain-specific assumptions.

Knowledge-assisted interpretation of image documents has been attempted in specific domains [6, 25]. However, the knowledge representations in these systems are tightly coupled with the underlying data model of the repositories. In an “open” architecture, an independent agent can interpret a query into a set of alternative observation models, comprising some standard, possibly qualified, image patterns. For example, the media object “dome” can be represented as a set of alternative blobs, each characterized by a specific sample shape. Since the query refinement is undertaken by an independent agent, it is not guaranteed that all the specified image properties will be supported at any particular repository. Once the observation models are available, the SCA negotiates with the other agents (see Sect. 3) to identify teams of CAs and SAs which can participate in the retrieval.

In ImAge, a concept is interpreted by an autonomous agent, called a Thematic Agent (TA), having the requisite domain knowledge. A TA is designed to operate in a closed domain of knowledge. It associates a finite number of concepts in the domain with some property values that can be observed in image documents. It captures the specialization and containment relationships between the concepts that imply property inheritance. A TA can generate the observation model for a concept using its encoded knowledge.
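The following sketch illustrates one way such domain knowledge could be organized; the classes are illustrative, not the prototype's actual representation:

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Illustrative sketch of a TA's encoded knowledge. A concept may specialize a
    // parent concept, inheriting its observable property values, and carries
    // alternative observation models (combinations of media objects).
    public class Concept {
        private final String name;
        private final Concept parent;                       // specialization relationship
        private final Map<String, String> properties = new HashMap<>();
        private final List<List<String>> observationModels = new ArrayList<>();

        public Concept(String name, Concept parent) {
            this.name = name;
            this.parent = parent;
        }

        public void setProperty(String key, String value) { properties.put(key, value); }

        public void addObservationModel(List<String> mediaObjects) {
            observationModels.add(mediaObjects);
        }

        // Own property values override inherited ones; specialization and
        // containment imply property inheritance, as described above.
        public Map<String, String> effectiveProperties() {
            Map<String, String> all = (parent == null)
                    ? new HashMap<>() : new HashMap<>(parent.effectiveProperties());
            all.putAll(properties);
            return all;
        }

        // Alternative observation models for the concept, e.g.
        // [dome, minaret, facade] or [dome, archway] for "medieval monument".
        public List<List<String>> observationModels() { return observationModels; }

        public String name() { return name; }
    }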

The process of query refinement by a TA results in the selection of a set of observation models from a specified conceptual query. A query intends to express one or more concepts using some descriptors (e.g., keywords). The descriptors are viewed as special observation models which are observed in a query. The concepts and the observation models have a many-to-many relationship. Therefore, it is not possible to uniquely identify a concept using a set of descriptors. We have modeled the relationship as a cause-effect relationship, where (the intention of) a concept can cause a descriptor in the query with some non-zero probability. Similarly, a concept can cause an observation model to materialize in an image. The probability values are assigned to the cause-effect relationships by a domain expert, based on the experience of numerous actual observations. The TA constructs a belief network [16] with the states of the root node representing the set of concepts in its knowledge domain, and the descriptors and the observation models as the leaf nodes. Without any assumption about the user behavior, the states of the root node (concepts) are initialized with equal a priori probabilities. The observation of the descriptors in a query provides virtual evidence for the leaf nodes. This evidence is propagated in the belief network, and the set of observation models that provides the most probable explanation of the observation set is selected for retrieval.
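As an illustration of the inference step, the sketch below computes the posterior over concepts (the root-node states) from observed descriptors under the simplifying assumptions of equal priors and conditionally independent leaf nodes; a full implementation would use a proper belief-network package and the expert-supplied conditional probabilities:

    import java.util.HashMap;
    import java.util.Map;

    // Minimal sketch: posterior over concepts given observed descriptors, assuming
    // equal priors and conditionally independent leaves. The likelihood tables
    // P(descriptor | concept) are supplied by a domain expert.
    public final class ConceptPosterior {

        // likelihoods.get(concept).get(descriptor) = P(descriptor observed | concept)
        public static Map<String, Double> posterior(
                Map<String, Map<String, Double>> likelihoods,
                Iterable<String> observedDescriptors) {

            Map<String, Double> unnormalized = new HashMap<>();
            for (Map.Entry<String, Map<String, Double>> e : likelihoods.entrySet()) {
                double p = 1.0; // equal a priori probabilities cancel out
                for (String d : observedDescriptors) {
                    p *= e.getValue().getOrDefault(d, 1e-6); // small value for unseen descriptors
                }
                unnormalized.put(e.getKey(), p);
            }
            double z = unnormalized.values().stream().mapToDouble(Double::doubleValue).sum();
            Map<String, Double> posterior = new HashMap<>();
            unnormalized.forEach((concept, p) -> posterior.put(concept, p / z));
            return posterior;
        }
    }

The observation models attached to the most probable concepts would then be handed to the SCA for feature-based retrieval.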

The open architecture has several advantages for conceptual retrieval. The existing SAs can be reused in the context of conceptual retrieval. Thus, the developer of a semantic recognition method can focus on the knowledge domain and does not have to worry about the feature recognition algorithms. The separation of the semantic knowledge and the media knowledge allows mixed-mode queries, where a concept can be qualified by one or more media features. For example, in a query like “white medieval monument”, the semantic entity “medieval monument” requires a knowledge-based approach, while the media feature “white” can be detected by a histogram evaluation method. Thus, such a query can be evaluated in a repository like Blobworld (see footnote 9) with the support of suitable TAs and SAs.

Contemporary semantic databases are restricted to a specific knowledge domain. The open architecture of ImAge allows several semantic classification engines, each specializing in one (possibly overlapping) knowledge domain, to co-exist in the system and to allow retrieval from a common set of repositories. New agents, encoding new descriptions of knowledge domains, can be dynamically added to the system. The knowledge of an individual TA may be limited to a tiny domain and be hand-coded by some domain experts. Many such agents collectively provide a scalable and non-trivial knowledge base. This feature is particularly useful when the same image can be interpreted from multiple semantic perspectives; for example, a photograph of a festival has sociological, cultural, as well as tourism-related implications.

The extension to the ImAge architecture for processing a conceptual query is depicted in Fig. 5. The UIA interacts with a TA for query refinement before submitting the feature-based query to the SCA.

9 Assuming that it has been designed with an open architecture and can export its data model.

Fig. 5. Extension of agent architecture for conceptual query (agents shown: User Interface Agent, Search Coordinator, Registrar and one or more Thematic Agents)

Fig. 6. A snow peak and the Lotus Temple: violation of closed domain assumption

The several TAs that may co-exist in the system register themselves with the RAs, specifying their domains of expertise. The UIA selects one or more of the TAs for query refinement depending on the domain of the user's interest.

The decomposition of a conceptual entity into alternative observation models has another advantage. Even if a particular observation model can be realized at only a few repositories, the retrieval can be supported on a larger set of repositories, since several alternatives exist.

We have extended the query language to incorporate specification of the conceptual entities. The simplest representation of a concept in our system is through one or more concept descriptors. We have used simple keywords, e.g., monument, as the concept descriptors. Visual descriptors are more expressive and can be incorporated at the cost of added computational complexity. More complex concepts can be represented by associating concept modifiers with a concept descriptor. We envisage different types of concept modifiers.

1. A concept descriptor may be modified by another to represent a specialization of the concept, for example, monument.medieval represents a special type of monument.

2. A concept may also be specialized by associating some property attributes with it. For example, 〈monument, [color = white]〉. This construct represents a mixed-mode query, where conceptual specifications are complemented with feature specifications.

3. A third type of concept modifier associates two concepts with a connective to indicate a set of new concepts, for example, of(hills, India) indicates a set of mountain ranges, e.g., {Himalayas, Vindhyas, Nilgiri, ...}.

The concept specification can have a combination of modifiers, for example, 〈monument.medieval, [color = white]〉. The concept specification in the query is complemented with a specification for the knowledge domain.


This is necessary because the same descriptor can have different semantic connotations in different domains. For example, the keyword monument has different connotations in the domains of architecture and tourism. The selection of the TA is guided by this domain specification.
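A hypothetical in-memory form of such a concept specification, covering the modifiers above, might look as follows (the field names are ours, not the prototype's query syntax):

    import java.util.LinkedHashMap;
    import java.util.Map;

    // Hypothetical container for a concept specification such as
    // <monument.medieval, [color = white]> in the tourism domain.
    public final class ConceptSpec {
        final String domain;                 // knowledge domain, e.g. "tourism"
        final String descriptor;             // base concept descriptor, e.g. "monument"
        final String specialization;         // e.g. "medieval"; may be null
        final Map<String, String> features = new LinkedHashMap<>(); // mixed-mode feature qualifiers

        ConceptSpec(String domain, String descriptor, String specialization) {
            this.domain = domain;
            this.descriptor = descriptor;
            this.specialization = specialization;
        }

        public static void main(String[] args) {
            ConceptSpec spec = new ConceptSpec("tourism", "monument", "medieval");
            spec.features.put("color", "white"); // <monument.medieval, [color = white]>
            System.out.println(spec.domain + ": " + spec.descriptor + "."
                    + spec.specialization + " " + spec.features);
        }
    }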

Content-based retrieval for conceptual queries is based on some assumptions about the domain. For example, a white object of triangular shape can be interpreted as a snow peak, assuming that the images being analyzed pertain to natural scenery. The existence of an image like that of the Lotus Temple violates this assumption and will produce unsatisfactory results (see Fig. 6). The TA associates some descriptors (e.g., keywords) with a concept domain, which are used to select a subset of image classes (see Sect. 6) for retrieval. We implicitly assume that the underlying assumptions for the content-based interpretation are satisfied in the selected image classes.

8 Implementation

We have implemented a prototype system to validate the architecture. Our emphasis has been on the development of a set of standard interfaces to achieve separation of the retrieval functionality and encapsulation of available retrieval resources.

A shell for the CAs declares and exports the supported retrieval capability and the data model of a repository. A small private collection in the tourism domain has been encapsulated in this shell. The collection contains about 100 image documents. We have also encapsulated an available retrieval resource on the Internet, namely WebSeek, to demonstrate the reuse capability. In the private collection, the images are referenced by some HTML documents. The image file names and the text in the HTML documents serve as annotations to the images. We have manually categorized the image collection into a few semantic categories based on these annotations, from the perspective of tourist interest, and have associated a set of keywords as descriptors with each of the image categories. Some of the categories are as follows (see footnote 10):

architecture
  /fort (fort, quila ...)
  /temple (god, mandir ...)
  /tomb (chhatri, maqbara ...)

nature
  /beach (beach, shore ...)
  /flora (flower, green ...)
  /mountain (hill, snow ...)
  /waterfall (fall, jhora ...)
  /wildlife (fauna, tiger ...)

The CA caches the image features extracted by an SA in the context of a query and thus dynamically augments its data model (though such increments are not declared to the external world).

10 The keywords are enclosed in brackets. Some of the keywords are synonyms in Indian languages.

The cached features are utilized in any future query requiring the same feature description.
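A minimal sketch of such a cache is shown below; the key format and the double[] feature representation are assumptions made for illustration:

    import java.util.HashMap;
    import java.util.Map;

    // Sketch of the kind of cache a CA could keep for features computed by SAs.
    public class FeatureCache {
        // Key: imageId plus feature descriptor, e.g. "img042|rgb-histogram".
        private final Map<String, double[]> cache = new HashMap<>();

        public double[] lookup(String imageId, String featureDescriptor) {
            return cache.get(imageId + "|" + featureDescriptor);
        }

        // Called after an SA has extracted a feature in the context of a query,
        // so that future queries needing the same description can reuse it.
        public void store(String imageId, String featureDescriptor, double[] feature) {
            cache.put(imageId + "|" + featureDescriptor, feature);
        }
    }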

In order to encapsulate WebSeek, we have downloaded its semantic classification scheme and encoded it with associated thesaurus terms. In response to the keywords in a query, one or more semantic categories are selected using the method described in Sect. 6. The CGI commands of WebSeek are emulated, and the resultant HTML files are parsed to find the URLs of the images in a semantic category.
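WebSeek's actual CGI commands are not reproduced here; the sketch below only illustrates the general mechanism of fetching a category page over HTTP and scanning the returned HTML for image URLs:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Illustrative only: the page URL passed in is assumed to be the result of
    // emulating a category-browsing CGI command of the remote repository.
    public final class HtmlImageScraper {
        private static final Pattern IMG_SRC =
                Pattern.compile("<img[^>]+src=[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);

        // Fetches an HTML page and returns the image URLs found in <img> tags.
        public static List<String> imageUrls(String pageUrl) throws Exception {
            StringBuilder html = new StringBuilder();
            try (BufferedReader in = new BufferedReader(new InputStreamReader(
                    new URL(pageUrl).openStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = in.readLine()) != null) {
                    html.append(line).append('\n');
                }
            }
            List<String> urls = new ArrayList<>();
            Matcher m = IMG_SRC.matcher(html);
            while (m.find()) {
                urls.add(m.group(1));
            }
            return urls;
        }
    }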

We have so far encapsulated five general-purpose image analysis algorithms as SAs. These algorithms include color matching (using the RGB histogram intersection method) [22], chromaticity (using the illumination-invariant YV histogram) [9], Funt and Finlayson's method [11], color distribution (correlogram) [13], and texture matching (using Gabor filters) [23]. We have also integrated two special agents employing Eigenspace-based techniques [3, 24], one for recognition of faces and the other for recognition of emblems and miniature flags. These two agents use the same code but different training sets.

A Benchmark Agent (BA) that can benchmark Generic SAs has been implemented. These SAs operate on raw image data. The BA maintains a set of images and some perceptual similarity values (see Sect. 5) for every pair of images in its collection. It schedules an SA on its dataset and computes the normalization and performance parameters for the SA.
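The sketch below shows one way a BA could derive normalization parameters by running an SA over all image pairs in its benchmark set; the SimilarityAgent interface and the min/max normalization are assumptions made for illustration, and the stored perceptual similarity values would additionally drive the performance estimates:

    import java.util.List;

    // Illustrative BA sketch: records the range of an SA's raw scores over the
    // benchmark pairs so that they can later be mapped to [0,1].
    public final class BenchmarkAgent {

        public interface SimilarityAgent {
            double similarity(String imageA, String imageB); // raw, unnormalized score
        }

        public static final class Normalization {
            public final double min, max;
            Normalization(double min, double max) { this.min = min; this.max = max; }
            public double normalize(double raw) { return (raw - min) / (max - min); }
        }

        // pairs: each element is {imageA, imageB} drawn from the BA's image set.
        public static Normalization benchmark(SimilarityAgent sa, List<String[]> pairs) {
            double min = Double.POSITIVE_INFINITY, max = Double.NEGATIVE_INFINITY;
            for (String[] p : pairs) {
                double s = sa.similarity(p[0], p[1]);
                min = Math.min(min, s);
                max = Math.max(max, s);
            }
            return new Normalization(min, max);
        }
    }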

A UIA implements Query By Example (QBE). A sample image can be selected from a set maintained by the UIA for that purpose. The user can also supply an image sample from a collection external to the system by providing its URL. The other query parameters include: (a) a combination of search criteria (e.g., color, texture, appearance, etc.); (b) a few semantic keywords, which are used to select the image classes at the CA; and (c) performance criteria (desired recall and precision values).
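For illustration, these parameters could be bundled into a simple query object such as the following (the field names are ours, not the prototype's actual query format):

    import java.util.List;

    // Hypothetical container for the QBE parameters listed above.
    public final class QbeQuery {
        final String sampleImageUrl;       // example image, local or external URL
        final List<String> searchCriteria; // e.g. "color", "texture", "appearance"
        final List<String> keywords;       // used to select image classes at the CA
        final double desiredRecall;
        final double desiredPrecision;

        QbeQuery(String sampleImageUrl, List<String> searchCriteria,
                 List<String> keywords, double desiredRecall, double desiredPrecision) {
            this.sampleImageUrl = sampleImageUrl;
            this.searchCriteria = searchCriteria;
            this.keywords = keywords;
            this.desiredRecall = desiredRecall;
            this.desiredPrecision = desiredPrecision;
        }
    }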

We have implemented a prototype TA in the tourism domain to demonstrate the capability of vertical extension. The TA encodes a handful of concepts of tourist interest, e.g., historic relics, places of scenic beauty, etc. The concepts are hand-coded in this TA and are associated with standard image properties and descriptors. The prototype can be extended with proper domain expertise.

To support the distributed system design on heterogeneous computing platforms, we considered the use of Java and CORBA, so that we could focus on system development rather than sorting out networking and communication issues. Implementations of Java and CORBA are available on most standard machines. While the Java distributed environment is language-specific, CORBA allows multi-language development. The use of multiple programming languages and integration with relational databases are, however, possible in the Java architecture through JNI and JDBC. The mobile code for the SAs requires platform-independent execution, which Java provides.


The Java IDL for CORBA was not yet available when we started the project, and public-domain implementations of CORBA were rather slow compared to Java RMI. Considering all these factors, we chose Java as our implementation language and RMI as the distributed computing infrastructure.

We have developed a mobile agent framework over the Java platform that caters to the specific needs of the retrieval system. At the minimum, the mobile agents require an environment that supports portability by hiding the heterogeneity of the physical nodes, and a facility for agent migration. The Java Virtual Machine (JVM) provides platform-independent executables and thereby achieves portability. Agent migration is implemented as a layer over the JVM using a dynamic class loader and remote method invocation (RMI).
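A minimal sketch of the two interfaces such a layer needs is shown below: a serializable agent and a remote host that accepts it over RMI. The names are illustrative, not the prototype's, and the dynamic loading of the agent's classes (e.g., through the RMI codebase mechanism) is not shown.

    import java.io.Serializable;
    import java.rmi.Remote;
    import java.rmi.RemoteException;

    // Illustrative interfaces only; the prototype's actual agent framework is
    // not reproduced here.
    public interface MobileAgentHost extends Remote {

        // A mobile agent carries its state via Java serialization; its bytecode
        // is resolved at the destination by a dynamic class loader, so the host
        // need not know the agent's class in advance.
        interface MobileAgent extends Serializable {
            void run(MobileAgentHost host);
        }

        // Called by the sender (over RMI) to migrate an agent to this host,
        // which then resumes it locally by invoking run().
        void accept(MobileAgent agent) throws RemoteException;
    }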

We have implemented the UIA with a thin-client architecture, where a back-end (server) module runs continuously in the system and provides the various user and query management functions. It has a persistent memory to remember the history of usage of the system. The human interface is built through a small client module. Multiple users can use the same server concurrently. The server is implemented as a Java application, while the client is implemented as a Java applet. The applet is downloaded into a browser on the user's machine when a user wants to interact with the system.

9 Conclusions

We have developed an open and extensible architecture for retrieval where the retrieval logic and the data model are decomposed into well-defined functional components. Each of the components implements a standard interface and thus can interact with the others. They are encapsulated into autonomous communicating agents. A collaboration of these agents realizes the retrieval capability of the system. The architecture proposes a communication protocol based on the social role models of the agents. The various communication needs of the agents are encoded using a generic content language that rides over the protocol. An existing retrieval resource can be encapsulated in an autonomous agent conforming to the communication protocol and be reused in the system. The component-oriented architecture leads to easy extensibility of the system. The use of mobile agents leads to economy in the use of the network infrastructure.

A limitation of the architecture is that repositories that do not have a public interface compliant with the defined protocol cannot participate in the retrieval. Even with a public interface, the participation of a repository depends on the query, the capability of the repository and the available public image analysis expertise. ImAge makes a best effort to make the independent system components interwork, without guaranteed success.

Whenever possible, retrieval takes place with optimal resource utilization.

Currently, there are not enough standards for the development of reusable system components to support a full-scale digital library. Some standardization efforts for multimedia data representation have been initiated in the MPEG-7 forum. We propose that some interface definitions be included in a library that serves as a development kit for system builders and that can be extended with new inclusions. De facto standards will emerge from such activity.

Though the prototype system has been developed for image retrieval, the architecture is quite general and can be easily extended to other media forms, and in general to multimedia documents, by the addition of appropriate knowledge-base and pattern-recognition agents. Multimedia retrieval has a specific advantage for conceptual queries: the domain knowledge can produce observation models in alternative media forms, which can be exploited for retrieval. For example, a railway steam engine may be identified either by its body shape and smoke cloud or by its characteristic whistle and huff-and-puff. We plan to incorporate textual and audio SAs into the system and build thematic agents that can produce observation models in all these media forms. Another area of research is global planning and optimization policies for multimedia search methods, which will result in overall performance improvement of the system.

References

1. Recommendation X.680 (07/94): Information technology – Abstract Syntax Notation One (ASN.1): Specification of basic notation. www.itu.int/itudoc/itut/rec/x/x500up/x680_27252.html, July 1994

2. Atkins, D.E., Birmingham, W.P., Durfee, E.H., Glover, E.J., Mullen, T., Rundensteiner, E.A., Soloway, E., Vidal, J.M., Wallace, R., Wellman, M.P.: Towards inquiry-based education through interacting software agents. IEEE Comput. 29(5):69–76, 1996

3. Carey, S., Diamond, R.: From piecemeal to configurational representation of faces. Sci. 195:312–313, 1977

4. Carson, C., Belongie, S., Greenspan, H., Malik, J.: Blobworld: image segmentation using expectation-maximization and its application to image querying. IEEE Trans. Pattern Anal. Mach. Intell., submitted

5. Chang, S.-F., Smith, J.R., Beigi, M., Benitez, A.: Visual information retrieval from large distributed on-line repositories. Comm. ACM 40(12), 1997. ftp://ftp.ee.columbia.edu/pub/CTR-Research/advent/public/papers97/CACMdec97.pdf

6. Chu, W.W., Hsu, C.-C., Cardenas, A.F., Taira, R.K.: Knowledge-based image retrieval with spatial and temporal constructs. IEEE Trans. Knowl. Data Eng. 10(6):872–888, 1998

7. de Vries, A.P., Blanken, H.M.: Database technology and the management of multimedia data in Mirror. In: Proc. SPIE, November 1998, pp. 443–455

8. Downing, T.B.: Java RMI: Remote Method Invocation. IDG Books Worldwide, 1998

9. Drew, M., Wei, J., Li, Z.N.: Illumination-invariant color object recognition via compressed chromaticity histograms of color-channel-normalized images. In: Proc. 6th Int. Conf. on Computer Vision, 1998, pp. 533–540


10. Flickner, M., Sawhney, H., Niblack, W., Ashley, J., Huang, Q., Dom, B., Gorkani, M., Hafner, J., Lee, D., Petkovic, D., Steele, D., Yanker, P.: Query by image and video content: the QBIC system. IEEE Comput. 28(9):23–32, 1995

11. Funt, B., Finlayson, G.: Color constant color indexing. IEEE Trans. Pattern Anal. Mach. Intell. 17(5):122–129, 1995

12. Griffioen, J., Yavatkar, R., Adams, R.: A framework for developing content-based retrieval systems. In: Maybury, M.T. (ed.): Intelligent Multimedia Information Retrieval. AAAI Press, 1997, pp. 295–311

13. Huang, J., Kumar, S.R., Mitra, M., Zhu, W.J., Zabih, R.: Image indexing using color correlograms. In: Proc. Computer Vision and Pattern Recognition, 1997, pp. 762–768

14. Jennings, N.: Controlling co-operative problem solving in industrial multi-agent systems using joint intentions. Artif. Intell. 75:195–240, 1995

15. Manmatha, R., Croft, W.B.: Word spotting: indexing handwritten manuscripts. In: Maybury, M.T. (ed.): Intelligent Multimedia Information Retrieval. AAAI Press, 1997, pp. 43–64

16. Neapolitan, R.E.: Probabilistic Reasoning in Expert Systems: Theory and Algorithms. Wiley, 1990

17. Ortega, M., Rui, Y., Chakrabarti, K., Porkaew, K., Mehrotra, S., Huang, T.S.: Supporting ranked Boolean similarity queries in MARS. IEEE Trans. Knowl. Data Eng. 10(6):905–925, November–December 1998

18. Ozkarahan, E.: Multimedia document retrieval. Inf. Process. Manage. 31:113–131, 1995

19. Paepcke, A., Cousins, S.B., Garcia-Molina, H., Hassan, S.W., Ketchpel, S.P., Roscheisen, M., Winograd, T.: Using distributed objects for digital library interoperability. IEEE Comput. 29(5):61–68, 1996

20. Resource Description Framework (RDF) model and syntax specification (REC-rdf-syntax-19990222). www.w3.org/TR/REC-rdf-syntax/, February 1999

21. Singh, M.P.: Agent communication languages: rethinking the principles. IEEE Comput. 31(12):40–47, 1998

22. Swain, M., Ballard, D.: Color indexing. Int. J. Comput. Vision 7(1):11–32, 1991

23. Swapp, D.: Estimation of Visual Textural Gradient using Gabor Function. PhD thesis, University of Aberdeen, Aberdeen, UK. www.csd.abdn.ac.uk/publications/theses/swapp.html, 1996

24. Turk, M., Pentland, A.: Eigenfaces for recognition. J. Cogn. Neurosci. 3(1):71–86, 1991

25. Yoshitaka, A., Kishida, S., Hirakawa, M., Ichikawa, T.: Knowledge-assisted content-based retrieval for multimedia databases. IEEE Multimedia 1(4):12–21, 1994

26. Zhang, A., Chang, W., Sheikholeslami, G.: Netview: integrating large-scale distributed visual databases. IEEE Multimedia 7(3):47–59, 1998