Supporting efficient multimedia database exploration



  • The VLDB Journal (2001) 9: 312–326 / Digital Object Identifier (DOI) 10.1007/s007780100040

    Supporting efficient multimedia database exploration

    Wen-Syan Li, K. Selcuk Candan, Kyoji Hirata, Yoshinori Hara

    C&C Research Laboratories, NEC USA, 110 Rio Robles, M/S SJ100, San Jose, CA 95134, USA; E-mail: {wen,candan,hirata,hara}

    Edited by S. Christodoulakis. Received: 9 June 1998 / Accepted: 21 July 2000 / Published online: 4 May 2001. © Springer-Verlag 2001

    Abstract. Due to the fuzziness of query specification and media matching, multimedia retrieval is conducted by way of exploration. It is essential to provide feedback so that users can visualize query reformulation alternatives and database content distribution. Since media matching is an expensive task, another issue is how to efficiently support exploration so that the system is not overloaded by perpetual query reformulation. In this paper, we present a uniform framework to represent statistical information of both semantics and visual metadata for images in the databases. We propose the concept of query verification, which evaluates queries using statistics, and provides users with feedback, including the strictness and reformulation alternatives of each query condition as well as estimated numbers of matches. With query verification, the system increases the efficiency of the multimedia database exploration for both users and the system. Such statistical information is also utilized to support progressive query processing and query relaxation.

    Key words: Multimedia database, Exploration, Query relaxation, Progressive processing, Selectivity statistics, Human computer interaction

    1 Introduction

    Query processing in multimedia databases is different from query processing in traditional database systems. Contents stored in traditional database systems are generally precise and, as a result, query processing answers are deterministic. On the other hand, in both document retrieval and image retrieval, results are based on similarity calculations. In document retrieval, documents are represented as keyword lists. To retrieve a document, information systems compare keywords specified by users with the documents' keyword lists. Images, in a similar manner, are usually represented as media features. However, image matching is carried out by comparing these feature vectors. A comparison of these three types of query processing is illustrated at the top of Fig. 1. We see that multimedia database query processing is truly an integration of these three types of query processing.

    Correspondence to: K.S. Candan, Computer Science and Engineering Department, College of Engineering and Applied Sciences, Arizona State University, Box 875406, Tempe, AZ 85287-5406, USA; E-mail:

    This work was performed when K.S. Candan visited NEC, CCRL.

    An image consists of three types of information representing its contents: visual features, structural layout (spatial relationships), and semantics of image objects, as shown at the bottom of Fig. 1. A multimedia database query requires similarity measures in all these aspects. For example, a query may be posed as "retrieve images in which there is a woman to the right of an object and the object is visually similar to the provided image". This query results in a list of candidate images ranked by their aggregated scores, with degrees of uncertainty based on all three of the above aspects. Thus, we view multimedia database query processing as a combination of: (1) information retrieval notions described in [1] (exploratory querying, inexact match, query refinement); and (2) ORDBMS or OODBMS database notions (recognition of specific concepts, variety of data types).

    In most multimedia applications, supporting partial match capability, in contrast to supporting only the exact match functionality of relational DBMSs, is essential and desirable. There are two major reasons:

    1. There may not be a reasonable number of images which match the user query. Figure 2 (Query) shows the conceptual representation of a query. Figures 2a–d show examples of candidate images that may match this query. The numbers next to the objects in these candidate images denote the similarity values for the object-level matching. The candidate image in Fig. 2a satisfies the object matching conditions, but its layout does not match the user specification. Figures 2b and 2d satisfy the image layout conditions, but their objects do not perfectly match the specification. Figure 2c has structural and object matching with low scores. Note that in Fig. 2a, the spatial constraint, and in Fig. 2c, the image similarity constraint for the lake completely fail (i.e., the match is 0.0). Such candidate images in general would not be returned by SQL/DBMS-based image retrieval systems. Query relaxation is needed to include such types of partially matched images.

  • W.-S. Li et al.: Supporting efficient multimedia database exploration 313

    Fig. 1. Multimedia databases: integration of traditional query processing, IR, and image matching

    2. The users may not be able to specify queries precisely or correctly due to the so-called "word mismatch problem" described and studied in the field of information retrieval [2]. Another reason for such misspecified image queries is that it is difficult for users to describe a color or a shape without ambiguity.
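
    The contrast between exact and partial matching can be illustrated with a minimal Python sketch. The per-condition similarity values, the candidate names, and the plain averaging rule are all assumptions made for illustration; they are not the values shown in Fig. 2, and the text does not say how the aggregated score is actually computed.

```python
from statistics import mean

def exact_match(scores, threshold=1.0):
    """Boolean (SQL-style) matching: every condition must be fully satisfied."""
    return all(s >= threshold for s in scores.values())

def relaxed_score(scores):
    """Partial matching: aggregate per-condition similarities into one rank score.

    Plain averaging is an assumption made for this sketch.
    """
    return mean(scores.values())

# Hypothetical candidates: similarity per query condition, in [0.0, 1.0].
candidates = {
    "layout_fails": {"object_1": 1.0, "object_2": 1.0, "spatial": 0.0},
    "objects_approximate": {"object_1": 0.8, "object_2": 0.5, "spatial": 1.0},
}

# Neither candidate survives exact matching, yet both can still be ranked.
ranked = sorted(candidates, key=lambda name: relaxed_score(candidates[name]), reverse=True)
```

    Under these assumptions, an exact-match system returns nothing, while the relaxed scoring still produces a ranked candidate list for the user to explore.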

    Due to the large volume of data, the unstructured or semi-structured nature of information, and the vagueness in the concept of a match, querying multimedia databases should be conducted in an exploratory fashion guided by system feedback. In other words, image retrieval should be performed through computer human interaction (CHI). We define a CHI exploration cycle as a user query followed by a sequence of query reformulations guided by the system for honing target images. In the design of a multimedia database system, along with the optimization of query processing, we concentrate on the improvement of the speed of the image retrieval in a whole CHI exploration cycle.

    We view media retrieval as transitions of the corresponding query result space. By reformulating a query, the result space that a user sees is shifted from one to another until the target result space is reached. We believe that the media retrieval process can be improved through well-designed computer human interaction. Li et al. [4] introduced the concept of system facilitated exploration, in which users interact with the multimedia database system to reformulate queries. This approach goes beyond browsing, which is supported by most existing systems. However, some drawbacks observed are that the system can only provide feedback after query processing and that feedback is limited to semantics-based query criteria.

    In this paper, we present a new approach to media retrieval that systematically helps users reformulate queries, in the scope of the SEMCOG Multimedia Database System [3] developed at NEC C&C Research Laboratories in San Jose. SEMCOG provides query verification functionalities, which present users with various types of feedback regarding the query and the database content to assist users in exploring multimedia databases more efficiently. The feedback includes: (1) query reformulation alternatives for both semantics- and visual characteristics-based query criteria; (2) the estimated number of matched images; and (3) database content distribution related to query conditions. With this information, users are helped to "navigate" from the current result space toward the target result space, rather than relying on ad hoc trial and error on an image-by-image basis. The facilitation allows the users to reformulate queries interactively based on system feedback and query analysis.

    Fig. 2. Query image and candidates of partial matches

    Fig. 3. Using multimedia statistics for facilitating query and exploration

    To support such facilitation efficiently, the system utilizes multimedia content statistics. Using content statistics for result estimation and query optimization has been studied for decades in the scope of traditional databases. However, extending it to multimedia contents is not an easy task. We develop a uniform framework for representing statistics for both textual and media data. SEMCOG classifies media objects into clusters based on semantics and visual characteristics. The number of images in each cluster is viewed as the selectivity for such criteria, which can be color, shape, or object position. In Fig. 3, a user issues a query image and specifies one of the image objects as Pro Golfer. Based on the selectivity values of the media and semantics specified in the query, SEMCOG estimates the number of matching images and provides alternative conditions, such as related semantic terms or similar colors, for the user to reformulate the query. Based on the indexing scheme, SEMCOG also supports automated query relaxation and progressive processing to generate the top ranked results incrementally.
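
    The idea of treating cluster sizes as selectivity values can be sketched in Python as follows. The database size, the cluster sizes, and the criterion names are hypothetical, and combining selectivities by multiplication assumes the criteria are statistically independent; that assumption is made here for illustration and is not stated in the text.

```python
from math import prod

TOTAL_IMAGES = 1000  # hypothetical database size

# Hypothetical cluster sizes: number of images falling into each cluster.
CLUSTER_SIZES = {
    ("semantics", "golfer"): 50,
    ("color", "green"): 200,
    ("shape", "round"): 100,
}

def selectivity(criterion):
    """Fraction of the database that falls into the cluster for one criterion."""
    return CLUSTER_SIZES[criterion] / TOTAL_IMAGES

def estimate_matches(criteria):
    """Estimated match count, assuming the criteria are statistically independent."""
    return TOTAL_IMAGES * prod(selectivity(c) for c in criteria)

# Estimated number of images that are both "golfer" and "green": about 10.
estimate = estimate_matches([("semantics", "golfer"), ("color", "green")])
```

    Because the estimate uses only stored counts, it can be shown to the user before any expensive media matching is performed.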

    The rest of the paper is organized as follows: we first present background information about our system. In Sects. 3 and 4, we describe the schemes to compute the selectivity for semantics and visual characteristics of multimedia contents. In Sect. 5, we present the usage of these indices and selectivity values for: (1) estimation of matching images; (2) automated query relaxation; and (3) progressive processing. We also present experimental results to estimate the efficiency of image retrieval by clustering. In Sect. 6, we review related work. In Sect. 7, we offer our concluding remarks.

    2 Object-based modeling and query language

    In this section, we give a summary of the modeling and language design in SEMCOG. We only describe the material related to the work presented in this paper. Additional information about the system architecture, language syntax, and user interface can be found in [3].

    We view images as compound objects containing multiple component objects and their spatial relationships. The semantic interpretation of an object is its real-world meaning. The visual interpretation of an object, on the other hand, is what it looks like based on the perception of humans or image-matching engines. Our object extraction technique is based on region division. A region is defined as a homogeneous (in terms of colors) and continuous segment identified in an image. SEMCOG stores and retrieves atomic and compound objects. This enables the query verifier to create finer granularity feedback: instead of returning feedback that encompasses the whole image, the query verifier can now return feedback that describes alternatives for semantics, visual contents, and spatial relationships of the constituent objects. In addition, the hierarchical structure of the image content allows the system to choose the level of feedback granularity depending on the needs and inputs of the user. Furthermore, since the data model captures alternative semantic and visual representations and the corresponding confidence (or matching) values, the system can manipulate the objects for query reformulation very naturally.
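
    A minimal Python sketch of such a hierarchical object model is given below. The class names, field names, and sample values are hypothetical and are not taken from SEMCOG; the sketch only shows how compound objects, alternative interpretations with confidence values, and level-based access could fit together.

```python
from dataclasses import dataclass, field

@dataclass
class Interpretation:
    """One alternative semantic reading of an object, with its confidence."""
    label: str
    confidence: float

@dataclass
class MediaObject:
    """An atomic object (no components) or a compound object (with components)."""
    name: str
    semantics: list = field(default_factory=list)   # alternative Interpretations
    components: list = field(default_factory=list)  # nested MediaObjects
    spatial: list = field(default_factory=list)     # (relation, name_a, name_b) tuples

    def best_semantics(self):
        """Label of the most confident interpretation."""
        return max(self.semantics, key=lambda i: i.confidence).label

    def at_depth(self, depth):
        """Objects at a given level of the hierarchy (0 is the object itself)."""
        if depth == 0:
            return [self]
        return [o for c in self.components for o in c.at_depth(depth - 1)]

# Hypothetical image with two regions and one spatial relationship.
image = MediaObject(
    name="image_1",
    components=[
        MediaObject("region_1", semantics=[Interpretation("man", 0.9),
                                           Interpretation("statue", 0.3)]),
        MediaObject("region_2", semantics=[Interpretation("car", 0.8)]),
    ],
    spatial=[("to_the_right_of", "region_1", "region_2")],
)
```

    Choosing a depth in `at_depth` corresponds to choosing the granularity at which feedback is generated, and the lower-confidence interpretations are natural candidates for reformulation alternatives.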

    When an image is registered in SEMCOG, content-independent metadata, such as size, format, etc., are automatically extracted, while some content-dependent metadata, such as semantics of images, cannot be extracted automatically. Figure 4 illustrates media object extraction and semantics specification in our system. On the left of the figure there is a set of images. Regions in these images are extracted based on color homogeneities and segment continuities. The region extraction results are shown to the user for verification. In this example, for the second and third images, the user overrides the system's recommendation and specifies proper region segmentations. After confirmation or modification of the segments, the user specifies the semantics of each region using the tool shown in Fig. 5. The segmented regions and the corresponding semantics and visual characteristics are then stored in the database. Note that the number of regions identified depends on the image content and the level of detail the user requests. In SEMCOG, we set a constraint that the number of regions for one level of the hierarchy must be less than or equal to 8.

    Fig. 4. Region and semantics specification

    Like the data model used in SEMCOG, the Cognition and Semantics-based Query Language, CSQL, used for image retrieval by SEMCOG, is inherently suitable for query reformulation. CSQL consists of predicates which correspond to different levels of query strictness:

    Semantics-based selection predicates. The following three predicates can be used to describe different levels of semantic relationships:

    is: The is predicate returns true if and only if both arguments have identical semantics (man versus man).

    is_a: The is_a predicate returns true if and only if the second argument is a generalization of the first one (car is_a transportation).

    s_like: The s_like (i.e., semantically like) predicate returns true if and only if both arguments are semantically similar (man versus male).

    Cognition-based selection predicate: The cognition-based predicate introduced in CSQL is i_like (i.e., image like). In fact, i_like is a combination of a group of feature-specific predicates, such as i_like_hist and i_like_shape. If the user specifies i_like, then SEMCOG uses all possible features and returns a single, merged similarity value. The corresponding weights of the features are assumed to be set by the user earlier. Alternatively, users can specify a conjunction of feature-specific predicates along with the corresponding weights. SEMCOG uses only the specified features along with the specified weights to identify the final similarity value.

    Spatial relationship-based selection predicate: CSQL supports the following spatial predicates: above, below, to the right of, and to the left of.

    By definition, the s_like and i_like predicates act as implicit query relaxation tools: they admit non-exact matches into the result. Furthermore, the query verifier can explicitly relax the query by changing the constant terms in a given query, as done by many relaxation systems, as well as by changing semantic, visual, and spatial predicates. The complete syntax definitions of the CSQL query language are described in [3].
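
    The different strictness levels of the semantic predicates, and the merging of feature-specific similarities, can be illustrated with a small Python sketch, written here as the functions `is_`, `is_a`, `s_like`, and `i_like`. The toy taxonomy, the similarity pairs, the choice to let identical terms count as semantically similar, and the normalized weighted average are all assumptions for illustration; the text does not specify how SEMCOG stores term relationships or merges feature scores.

```python
# Hypothetical term hierarchy (child -> parent) and similarity pairs.
GENERALIZATIONS = {"car": "transportation", "man": "human", "woman": "human"}
SIMILAR = {frozenset({"man", "male"})}

def is_(a, b):
    """Strictest level: identical semantics."""
    return a == b

def is_a(specific, general):
    """True if `general` is an ancestor of `specific` in the term hierarchy."""
    node = GENERALIZATIONS.get(specific)
    while node is not None:
        if node == general:
            return True
        node = GENERALIZATIONS.get(node)
    return False

def s_like(a, b):
    """Semantic similarity; identical terms count as similar (an assumption)."""
    return a == b or frozenset({a, b}) in SIMILAR

def i_like(feature_scores, weights):
    """Merge feature-specific similarities into one value.

    A normalized weighted average is assumed for this sketch.
    """
    total = sum(weights[f] for f in feature_scores)
    return sum(feature_scores[f] * weights[f] for f in feature_scores) / total

# Histogram similarity weighted three times as heavily as shape: about 0.7.
merged = i_like({"hist": 0.8, "shape": 0.4}, {"hist": 3, "shape": 1})
```

    Replacing `is_` with `s_like` or `is_a` in a query condition widens the set of accepted terms, which is the kind of predicate-level relaxation described above.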

    3 Semantics indexing and selectivity construction

    As is the case in most database systems, SEMCOG periodically collects database statistics for query optimization purposes. SEMCOG also uses these statistics to recommend query reformulation alternatives to users, to estimate the numbers of matches, and to automate query relaxation. In this section, we describe the indexing schemes for semantics selectivity values.

    3.1 Semantics selectivity statistics

    In SEMCOG, semantics are indexed within a hierarchical structure as shown in Fig. 6. This structure benefits from the relationships between different predicates and their results in improving the utilization of database statistics. Figure 6 shows a hierarchical structure fo...