
One Size Does Not Fit All: Toward User- and Query-Dependent Ranking for Web Databases

Aditya Telang, Chengkai Li, and Sharma Chakravarthy

Abstract—With the emergence of the deep web, searching web databases in domains such as vehicles, real estate, etc., has become a routine task. One of the problems in this context is ranking the results of a user query. Earlier approaches for addressing this problem have used frequencies of database values, query logs, and user profiles. A common thread in most of these approaches is that ranking is done in a user- and/or query-independent manner. This paper proposes a novel query- and user-dependent approach for ranking query results in web databases. We present a ranking model, based on two complementary notions of user and query similarity, to derive a ranking function for a given user query. This function is acquired from a sparse workload comprising several such ranking functions derived for various user-query pairs. The model is based on the intuition that similar users display comparable ranking preferences over the results of similar queries. We define these similarities formally in alternative ways and discuss their effectiveness analytically and experimentally over two distinct web databases.

Index Terms—Automated ranking, web databases, user similarity, query similarity, workload.


1 INTRODUCTION

The emergence of the deep web [7], [9] has led to the proliferation of a large number of web databases for a variety of applications (e.g., airline reservations, vehicle search, real estate scouting). These databases are typically searched by formulating query conditions on their schema attributes. When the number of results returned is large, it is time-consuming to browse and choose the most useful answer(s) for further investigation. Currently, web databases simplify this task by displaying query results sorted on the values of a single attribute (e.g., Price, Mileage, etc.). However, most web users would prefer an ordering derived using multiple attribute values, which would be closer to their expectation.

Consider Google Base's [15] Vehicle database that comprises a table with attributes Make, Price, Mileage, Location, Color, etc., where each tuple represents a vehicle for sale. We use the following two scenarios as our running examples.

Example 1. Two users, a company executive (U1) and a student (U2), seek answers to the same query (Q1): "Make = Honda AND Location = Dallas, TX," for which more than 18,000 tuples are typically returned in response. Intuitively, U1 would typically search for new vehicles with specific color choices (e.g., only red colored vehicles), and hence would prefer vehicles with "Condition = New AND Color = Red" to be ranked and displayed higher than the others. In contrast, U2 would most likely search for old vehicles priced under a specific amount (e.g., "Price < $5,000"); hence, for U2, vehicles with "Condition = Old AND Price < $5,000" should be displayed before the rest.

Example 2. The same student user (U2) moves to Google for an internship and asks a different query (say Q4): "Make = Pontiac AND Location = Mountain View, CA." We can presume (since he has procured an internship) that he may be willing to pay a slightly higher price for a lesser-mileage vehicle (e.g., "Mileage < 100,000"), and hence would prefer vehicles with "Condition = Old AND Mileage < 100,000" to be ranked higher than others.

Example 1 illustrates that different web users may have contrasting ranking preferences toward the results of the same query. Example 2 emphasizes that the same user may display different ranking preferences for the results of different queries. Thus, it is evident that in the context of web databases, where a large set of queries given by varied classes of users is involved, the corresponding results should be ranked in a user- and query-dependent manner.

The current sorting-based mechanisms used by web databases do not perform such ranking. While some extensions to SQL allow manual specification of attribute weights [30], [21], [23], [33], this approach is cumbersome for most web users. Automated ranking of database results has been studied in the context of relational databases, and although a number of techniques [10], [11], [27], [34] perform a query-dependent ranking, they do not differentiate between users and hence provide a single ranking order for a given query across all users. In contrast, techniques for building extensive user profiles [20] as well as requiring users to order data tuples [18], proposed for user-dependent ranking, do not distinguish between queries and provide a single ranking order for any query given by the same user. Recommendation (i.e., collaborative [17], [5], [8] and content filtering [4], [6], [14]) as well as information retrieval systems use the notions of user- and object/item-similarity for recommending objects to users.




Although our work is inspired by this idea, there are differences that prevent its direct applicability to database ranking (elaborated in Section 2).

In this paper, we propose a user- and query-dependent approach for ranking the results of web database queries. For a query Qj given by a user Ui, a relevant ranking function (F_xy) is identified from a workload of ranking functions (inferred for a number of user-query pairs) to rank Qj's results. The choice of an appropriate function is based on a novel similarity-based ranking model proposed in the paper. The intuition behind our approach is: 1) for the results of a given query, similar users display comparable ranking preferences, and 2) a user displays analogous ranking preferences over results of similar queries.

We decompose the notion of similarity into: 1) query similarity, and 2) user similarity. While the former is estimated using either of the proposed metrics (query-condition or query-result), the latter is calculated by comparing individual ranking functions over a set of common queries between users. Although each model can be applied independently, we also propose a unified model to determine an improved ranking order. The ranking function used in our framework is a linear weighted-sum function comprising: 1) attribute-weights denoting the significance of individual attributes, and 2) value-weights representing the importance of attribute values.

In order to make our approach practically useful, a minimal workload is important. One way to acquire such a workload is to adapt relevance feedback techniques [16] used in document retrieval systems. However (as elaborated in Section 6), there exist several challenges in applying these techniques to web databases directly. Although the focus of this paper is on the usage, instead of the acquisition, of such workloads, we discuss and compare some potential approaches (in Section 6) for establishing such workloads and elaborate on a learning method for deriving individual ranking functions.

Contributions. The contributions of this paper are:

1. We propose a user- and query-dependent approach for ranking query results of web databases.

2. We develop a ranking model, based on two complementary measures of query similarity and user similarity, to derive functions from a workload containing ranking functions for several user-query pairs.

3. We present experimental results over two web databases supported by Google Base to validate our approach in terms of efficiency as well as quality for real-world use.

4. We present a discussion on the approaches for acquiring/generating a workload, and propose a learning method for the same with experimental results.

Roadmap. Section 2 discusses the related work and Section 3 formally defines the ranking problem. Sections 4 and 5 explain the similarity-based ranking model and discuss our experimental results. In Section 6, we highlight the challenges in generating appropriate workloads and present a learning method with preliminary results for deriving a ranking function. Section 7 concludes the paper.

2 RELATED WORK

Although there was no notion of ranking in traditional databases, it has existed in the context of information retrieval for quite some time. With the advent of the web, ranking gained prominence due to the volume of information being searched/browsed. Currently, ranking has become ubiquitous and is used in document retrieval systems, recommendation systems, web search/browsing, and traditional databases as well. Below, we relate our effort to earlier work in these areas.

Ranking in recommendation systems. Given the notion of user- and query-similarity, it appears that our proposal is similar to the techniques of collaborative [17], [5], [8] and content filtering [4], [6], [14] used in recommendation systems. However, there are some important differences (between ranking tuples for database queries versus recommending items in a specific order) that distinguish our work. For instance, each cell in the user-item matrix of recommendation systems represents a single scalar value that indicates the rating/preference of a particular user toward a specific item. Similarly, in the context of recommendations for social tagging [2], [24], [35], each cell in the corresponding user-URL/item-tag matrix indicates the presence or absence of a tag provided by a user for a given URL/item. In contrast, each cell in the user-query matrix (used for database ranking) contains an ordered set of tuples (represented by a ranking function). Further, although the rating/relevance given to each tuple (in the results of a given query) by a user can be considered similar to a rating given for an item in recommendation systems, if the same tuple occurs in the results of distinct queries, it may receive different ratings from the same user. This aspect of the same item receiving varied ratings by the same user in different contexts is not addressed by current recommendation systems, to the best of our knowledge.

Another important distinction that sets our work apart from recommendation systems is the notion of similarity. In content filtering, the similarity between items is established either using a domain expert, or user profiles [14], or by using a feature recognition algorithm [4] over the different features of an item (e.g., author and publisher of a book, director and actor in a movie, etc.). In contrast, since our framework requires establishing similarity between actual SQL queries (instead of simple keyword queries), the direct application of these techniques does not seem to be appropriate. To the best of our knowledge, a model for establishing similarity between database queries (expressed in SQL) has not received attention. In addition, a user profile is unlikely to reveal the kind of queries a user might be interested in. Further, since we assume that the same user may have different preferences for different queries, capturing this information via profiles will not be a suitable alternative.

The notion of user similarity used in our framework is identical to the one adopted in collaborative filtering; however, the technique used for determining this similarity is different. In collaborative filtering, users are compared based on the ratings given to individual items (i.e., if two users have given a positive/negative rating for the same items, then the two users are similar). In the context of database ranking, we propose a rigorous definition of user similarity based on the similarity between users' respective ranking functions, and hence ranked orders. Furthermore, this work extends user personalization using context information based on user and query similarity instead of static profiles and data analysis.

Ranking in databases. Although ranking query results for relational and web databases has received significant attention over the past years, simultaneous support for an automated user- and query-dependent ranking has not been addressed in this context. For instance, Chaudhuri et al. [10] address the problem of query-dependent ranking by adapting the vector model from information retrieval, whereas Chaudhuri et al. [11] and Su et al. [27] do the same by adapting the probabilistic model. However, for a given query, these techniques provide the same ordering of tuples across all users.

Employing user personalization by considering the context and profiles of users for user-dependent ranking in databases has been proposed in [20]. Similarly, the work proposed in [18] requires the user to specify an ordering across the database tuples, without posing any specific query, from which a global ordering is obtained for each user. A drawback in all these works is that they do not consider that the same user may have varied ranking preferences for different queries.

The closest form of query- and user-dependent ranking in relational databases involves manual specification of the ranking function/preferences as part of SQL queries [30], [21], [23], [33]. However, this technique is unsuitable for web users who are not proficient with query languages and ranking functions. In contrast, our framework provides an automated query- as well as user-dependent ranking solution without requiring users to possess knowledge about query languages, data models, and ranking mechanisms.

Ranking in information retrieval. Ranking has been extensively investigated in the domain of information retrieval. The cosine-similarity metric [3] is very successful in practice, and we employ a variant of it [1] for establishing similarities between attribute-value pairs as well as query results in our framework. The problem of integrating information retrieval systems and database systems has been attempted [13] with a view to applying the ranking models (devised for the former) to the latter; however, the intrinsic difference between their underlying models is a major problem.

Relevance feedback. Inferring a ranking function by analyzing the user's interaction with the query results originates from the concepts of relevance feedback [16], [25], [31], [32], [26], [22] in the domain of document and image retrieval systems. However, the direct application of either explicit or implicit feedback mechanisms for inferring database ranking functions faces several challenges (Section 6.2) that, to the best of our knowledge, have not been addressed in the literature.

3 PROBLEM DEFINITION AND ARCHITECTURE

3.1 Problem Definition

Consider a web database table D over a set of M attributes, A = {A1, A2, ..., AM}. A user Ui asks a query Qj of the form: SELECT * FROM D WHERE A1 = a1 AND ... AND As = as, where each Ai is in A and ai is a value in its domain. Let Nj = {t1, t2, ..., tn} be the set of result tuples for Qj, and let W be a workload of ranking functions derived across several user-query pairs (refer to Tables 1 and 2 for an example).

The ranking problem can be stated as: "For the query Qj given by the user Ui, determine a ranking function F_UiQj from W." Given the scale of web users and the large number of queries that can be posed on D, W will not possess a function for every user-query pair; hence the need for a similarity-based method to find an acceptable function (F_UxQy) in place of the missing F_UiQj. The ranking problem, thus, can be split into:

1. Identifying a ranking function using the similarity model: Given W, determine a user Ux similar to Ui and a query Qy similar to Qj such that the function F_UxQy exists in W.

2. Generating a workload of ranking functions: Given a user Ux asking query Qy, based on Ux's preferences toward Qy's results, determine, explicitly or implicitly, a ranking function F_UxQy. W is then established as a collection of such ranking functions learned over different user-query pairs.

The above description refers to point queries with conjunctive conditions. Queries may, however, contain range/IN conditions and several Boolean operators (AND, OR, NOT); our focus here is on the problem of point queries over a single table. Extensions are being explored as future work.

3.2 Ranking Architecture

The Similarity model (shown in Fig. 1) forms the core component of our ranking framework. When the user Ui poses the query Qj, the query-similarity model determines the set of queries ({Qj, Q1, Q2, ..., Qp}) most similar to Qj.


Table 1. Sample Workload-A

Table 2. Sample Workload-B

Fig. 1. Similarity model for ranking.


Likewise, the user-similarity model determines the set of users ({Ui, U1, U2, ..., Ur}) most similar to Ui. Using these two ordered sets of similar queries and users, it searches the workload to identify the function F_UxQy such that the combination of Ux and Qy is most similar to Ui and Qj. F_UxQy is then used to rank Qj's results for Ui.

The workload used in our framework comprises ranking functions for several user-query pairs. Fig. 2 shows the high-level view of deriving an individual ranking function for a user-query pair (Ux, Qy). By analyzing Ux's preferences (in terms of a selected set of tuples (Rxy)) over the results (Ny), an approximate ranking function (F_UxQy) can be derived.

As our ranking function is of the linear weighted-sum type, it is important that the mechanism used for deriving this function captures: 1) the significance the user associates with each attribute, i.e., an attribute-weight, and 2) the user's emphasis on individual values of an attribute, i.e., a value-weight. These weights can then be integrated into a ranking function F_xy that assigns a tuple score to every tuple t in Ny using

$$\mathrm{tuple\_score}(t) = \sum_{i=1}^{m} w_i \times v_i, \qquad (1)$$

where w_i represents the attribute-weight of A_i and v_i represents the value-weight for A_i's value in tuple t.

The workload W is populated using such ranking functions. Tables 1 and 2 show two instances of the workload (represented in the form of a matrix of users and queries). Cell [x, y] in the workload, if defined, consists of the ranking function F_xy for the user-query pair (Ux, Qy).
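To make the scoring in (1) and the sparse workload concrete, the following is a minimal sketch in Python (the paper's own implementation was in Java); the attribute names, weights, and dictionary-based layout are illustrative assumptions, not the authors' actual data structures.

```python
# A minimal sketch (not the paper's implementation): a ranking function is a pair of
# attribute-weights and value-weights, and the workload W is a sparse mapping from
# (user, query) cells to such functions. All names and numbers are invented.

def tuple_score(t, attr_weights, value_weights):
    """Equation (1): sum over attributes of attribute-weight * value-weight."""
    return sum(attr_weights[a] * value_weights[a].get(t[a], 0.0)
               for a in attr_weights)

# One hypothetical ranking function F_xy for a user-query pair (Ux, Qy).
F_xy = {
    "attr_weights": {"Condition": 0.6, "Color": 0.3, "Price": 0.1},
    "value_weights": {
        "Condition": {"New": 1.0, "Old": 0.2},
        "Color": {"Red": 1.0, "Blue": 0.4},
        "Price": {"<5000": 0.3, "5000-10000": 0.6, ">10000": 1.0},
    },
}

# Sparse workload: most (user, query) cells are simply absent.
W = {("U2", "Q1"): F_xy}

results = [
    {"Condition": "New", "Color": "Red", "Price": ">10000"},
    {"Condition": "Old", "Color": "Blue", "Price": "<5000"},
]
f = W[("U2", "Q1")]
ranked = sorted(results,
                key=lambda t: tuple_score(t, f["attr_weights"], f["value_weights"]),
                reverse=True)
print(ranked)
```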

4 SIMILARITY MODEL FOR RANKING

The concept of similarity-based ranking is aimed at situations when the ranking functions are known for a small (and hopefully representative) set of user-query pairs. At the time of answering a query asked by a user, if no ranking function is available for this user-query pair, the proposed query- and user-similarity models can effectively identify a suitable function to rank the corresponding results.

4.1 Query Similarity

For the user U1 from Example 1, a ranking function does not exist for ranking Q1's results (N1). From the sample workload-A shown in Table 1 (see footnote 1), ranking functions over queries Q2, Q5, and Q7 (shown in Table 3) have been derived, thus forming U1's workload. It would be useful to analyze if any of F_12, F_15, or F_17 can be used for ranking Q1's results for U1. However, from Example 2, we know that a user is likely to have displayed different ranking preferences for different query results. Consequently, a randomly selected function from U1's workload is not likely to give a desirable ranking order over N1. On the other hand, the ranking functions are likely to be comparable for queries similar to each other.

We advance the hypothesis that if Q1 is most similar to a query Qy (in U1's workload), U1 would display similar ranking preferences over the results of both queries; thus, the ranking function (F_1y) derived for Qy can be used to rank N1. Similar to recommendation systems, our framework can utilize an aggregate function, composed from the functions corresponding to the top-k most similar queries to Q1, to rank N1. Although the results of our experiments showed that an aggregate function works well for certain individual instances of users asking particular queries, on average, across all users asking a number of queries, using an individual function proved better than an aggregate function. Hence, for the remainder of the section, we only consider the most similar query (to Q1). We translate this proposal of query similarity into a principled approach via two alternative models: 1) query-condition similarity, and 2) query-result similarity.

4.1.1 Query-Condition Similarity

In this model, the similarity between two queries is determined by comparing the attribute values in the query conditions. Consider Example 1 and the queries from Table 3. Intuitively, "Honda" and "Toyota" are vehicles with similar characteristics, i.e., they have similar prices, mileage ranges, and so on. In contrast, "Honda" is very different from "Lexus." Similarly, "Dallas" and "Atlanta," both being large metropolitan cities, are more similar to each other than to "Basin," a small town.

From the above analysis, Q1 appears more similar to Q2 than to Q5. In order to validate this intuitive similarity, we examine the relationship between the different values for each attribute in the query conditions. For this, we assume independence of schema attributes, since appropriate knowledge of functional dependencies and/or attribute correlations is not assumed to be available.

Fig. 2. Workload generation for similarity-based ranking.

Table 3. Input Query (Q1) and U1's Workload

1. For web databases, although the workload matrix can be extremely large, it is very sparse, as obtaining preferences for a large number of user-query pairs is practically difficult. We have purposely shown a dense matrix to make our model easily understandable.

2. The value "any" represents a union of all values for the domain of the particular attribute. For example, a value of "any" for the Transmission attribute retrieves cars with "manual" as well as "auto" transmission.

Definition. Given two queries Q and Q', each with conjunctive selection conditions, respectively, of the form "WHERE A1 = a1 AND ... AND Am = am" and "WHERE A1 = a'1 AND ... AND Am = a'm" (where ai or a'i is "any" (see footnote 2) if Ai is not specified), the query-condition similarity between Q and Q' is given as the conjunctive similarity between the values ai and a'i for every attribute Ai, as in (2):

$$\text{similarity}(Q, Q') = \prod_{i=1}^{m} \text{sim}\big(Q[A_i = a_i],\; Q'[A_i = a'_i]\big). \qquad (2)$$

In order to determine the right-hand side (RHS) of the above equation, it is necessary to translate the intuitive similarity between values (e.g., "Honda" is more similar to "Toyota" than it is to "Lexus") into a formal model. This is achieved by determining the similarity between database tuples corresponding to point queries with these attribute values. For instance, consider the values "Honda," "Toyota," and "Lexus" for the attribute "Make." The model generates three distinct queries (QH, QT, and QL) with the conditions "Make = Honda," "Make = Toyota," and "Make = Lexus," respectively, and obtains the individual sets of results NH, NT, and NL (shown in Tables 4, 5, and 6; see footnote 3). It can be observed that the tuples for "Toyota" and "Honda" display a high degree of similarity over multiple attributes as compared to the tuples for "Lexus," indicating that the former two attribute values are more similar to each other than to the latter. The similarity between each pair of query results (i.e., [NH, NT], [NH, NL], [NT, NL]) is then translated as the similarity between the respective pairs of attribute values (see footnote 4).

Formally, we define the similarity between any two values a1 and a2 of an attribute Ai as follows. Two queries Q_a1 and Q_a2, with the respective selection conditions "WHERE Ai = a1" and "WHERE Ai = a2," are generated. Let N_a1 and N_a2 be the sets of results obtained from the database for these two queries. The similarity between a1 and a2 is then given as the similarity between N_a1 and N_a2, and is determined using a variant of the cosine-similarity model [1]. Given two tuples T = <t1, t2, ..., tm> in N_a1 and T' = <t'1, t'2, ..., t'm> in N_a2, the similarity between T and T' is

$$\text{sim}(T, T') = \sum_{i=1}^{m} \text{sim}(t_i, t'_i), \qquad (3)$$

where

$$\text{sim}(t_i, t'_i) = \begin{cases} 1 & \text{if } t_i = t'_i, \\ 0 & \text{if } t_i \neq t'_i. \end{cases} \qquad (4)$$

It is obvious that (4) will work improperly for numerical attributes, where exact matches are difficult to find across tuple comparisons. In this paper, we assume that numerical data have been discretized in the form of histograms (as done for query processing) or other meaningful schemes (as done by Google Base; shown for the values of "Price" and "Mileage" in Tables 4, 5, and 6). Our model only requires that a (reasonable) discretization exists; justifying a particular scheme is beyond the scope of this paper.
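As a purely illustrative example of such a discretization (the bin boundaries and labels below are assumptions, not the scheme Google Base actually uses), a numeric attribute can be bucketed into labelled ranges before the match test in (4) is applied:

```python
import pandas as pd

# A minimal sketch of one possible discretization of a numeric attribute into
# labelled ranges; the boundaries and labels are illustrative assumptions.
prices = pd.Series([3500, 8200, 14999, 27500, 41000])
bins = [0, 5000, 10000, 20000, 30000, float("inf")]
labels = ["<5000", "5000-10000", "10000-20000", "20000-30000", ">30000"]
print(pd.cut(prices, bins=bins, labels=labels).tolist())
```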

Using (3), the similarity between the two sets N_a1 and N_a2 (which, in turn, corresponds to the similarity between the values a1 and a2) is estimated as the average pairwise similarity between the tuples in N_a1 and N_a2:

$$\text{sim}(N_{a_1}, N_{a_2}) = \frac{\sum_{i=1}^{|N_{a_1}|} \sum_{j=1}^{|N_{a_2}|} \text{sim}(T_i, T'_j)}{|N_{a_1}| \times |N_{a_2}|}. \qquad (5)$$

These similarities between attribute values can then be substituted into (2) to estimate query-condition similarity.
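Equations (3)-(5) translate directly into code. The sketch below (in Python, not the authors' implementation) assumes tuples are dictionaries over the same discretized attributes; the miniature result sets are invented.

```python
# A sketch of Equations (3)-(5): tuple similarity counts matching (discretized)
# attribute values, and the similarity of two result sets (and hence of the two
# attribute values that produced them) is the average pairwise tuple similarity.

def tuple_sim(t1, t2, attributes):
    # Equations (3) and (4)
    return sum(1 for a in attributes if t1.get(a) == t2.get(a))

def result_set_sim(N1, N2, attributes):
    # Equation (5): average over all tuple pairs
    if not N1 or not N2:
        return 0.0
    total = sum(tuple_sim(t1, t2, attributes) for t1 in N1 for t2 in N2)
    return total / (len(N1) * len(N2))

# Hypothetical mini result sets for "Make = Honda" and "Make = Toyota".
attrs = ["Condition", "Price", "Mileage"]
N_honda = [{"Condition": "Old", "Price": "<5000", "Mileage": ">100000"},
           {"Condition": "New", "Price": ">10000", "Mileage": "<50000"}]
N_toyota = [{"Condition": "Old", "Price": "<5000", "Mileage": ">100000"}]
print(result_set_sim(N_honda, N_toyota, attrs))
```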

4.1.2 Query-Result Similarity

In this model, similarity between a pair of queries is evaluated as the similarity between the tuples in the respective query results. The intuition behind this model is that if two queries are similar, their results are likely to exhibit greater similarity.

For the queries in Table 3, let the results shown in Tables 7, 8, and 9, respectively, correspond to a sample set (again, top-3 results and five attributes displayed) for Q1, Q2, and Q5. We observe that there exists a certain similarity between the results of Q1 and Q2 for attributes such as "Price" and "Mileage" (and even "Color" to a certain extent). In contrast, the results of Q5 are substantially different, thus allowing us to infer that Q1 is more similar to Q2 than to Q5.


Table 4. Sample Results (N_H) for Query "Make = Honda"

Table 5. Sample Results (N_T) for Query "Make = Toyota"

Table 6. Sample Results (N_L) for Query "Make = Lexus"

Table 7. Sample Results of Q1 from Table 3

3. In the interest of space, we have displayed only the top-3 query results returned by Google Base. For the sake of readability, only five out of the 11 attributes from the Vehicle database schema are shown.

4. The similarity between an attribute value (e.g., Honda) and the value "any" is estimated as the average similarity between Honda and every other value in the domain of the corresponding attribute.

Table 8. Sample Results of Q2 from Table 3


Formally, we define query-result similarity below.

Definition. Given two queries Q and Q', let N and N' be their query results. The query-result similarity between Q and Q' is then computed as the similarity between the result sets N and N', given by

$$\text{similarity}(Q, Q') = \text{sim}(N, N'). \qquad (6)$$

The similarity between the pair of result sets (N and N') is estimated using (3), (4), and (5).

4.1.3 Analysis of Query Similarity Models

The computation of similarity for the two models discussed above is summarized in Fig. 3. While the query-condition similarity uses the conjunctive equivalence between individual attribute values, the query-result similarity performs a holistic comparison between the results. Below, we discuss the accuracy and computational efficiency of the two models.

Accuracy. Intuitively, similarity between queries depends on the proximity between their respective query conditions. For instance, "Honda" and "Toyota" are cars with a lot of common features, which is reflected in the tuples representing these values in the database; hence, queries that search for these vehicles are more similar to each other than to queries asking for "Lexus." In contrast, two queries may return very similar results although their conditions could be quite dissimilar. For example, the following two queries on Google Base, "Make = Mercedes AND Color = Lilac" and "Location = Anaheim, CA AND Price > $35,000," end up returning exactly the same set of results. Although these are very similar by the query-result similarity definition, the queries, in fact, are intuitively not similar. In general, similar queries are likely to generate similar results; however, the converse is not necessarily true. Hence, the query-condition similarity model is expected to be more accurate and consistent than the query-result similarity model, which is also borne out by the experiments.

Computational efficiency. Both query similarities can be computed by applying queries over the web database, i.e., direct access to the data is not needed, which can be difficult for a web database such as Google Base (a collection of multiple databases). One alternative for efficiently implementing these similarities would be to precompute them and use the respective values at query time (see footnote 5).

In order to distinguish the two models, consider the workload W and assume a schema of M attributes. For the sake of simplicity, let the domain of each attribute have n values. The query-condition model relies purely on the pairwise similarities between attribute values, and these can hence be precomputed independently (generating M × n queries, one for each value of every attribute, and performing M × n² computations). At query time, the similarity between the input query (Qi) and every query in W can be estimated by a lookup of the attribute-value similarities. In contrast, the query-result model requires a comparison between the query results. Since an input query cannot be predicted, precomputation is not possible (unless every possible query on the database can be generated and compared, a combinatorial explosion scenario); hence, at query time, a costly computation of similarities between Qi and every query in W would be required.

Thus, intuitively, the query-condition model is superior to the query-result model. Also, computation of similarity by the query-condition model is tractable as compared to the query-result model, an observation substantiated by our experiments.
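With the precomputation strategy above, query-condition similarity at query time reduces to dictionary lookups followed by the product in (2). The sketch below is illustrative only: the value-similarity numbers are invented, and the paper's treatment of the "any" value (footnote 4) is omitted.

```python
# A sketch of Equation (2) at query time, assuming the pairwise attribute-value
# similarities have been precomputed offline (the numbers below are invented).
value_sim = {
    "Make": {("Honda", "Toyota"): 0.8, ("Honda", "Lexus"): 0.3},
    "Location": {("Dallas, TX", "Atlanta, GA"): 0.7, ("Dallas, TX", "Basin, WY"): 0.2},
}

def lookup(attr, a, b):
    if a == b:
        return 1.0
    table = value_sim.get(attr, {})
    return table.get((a, b), table.get((b, a), 0.0))

def query_condition_similarity(q, q_prime):
    # Equation (2): product over attributes of the value-pair similarities.
    # Note: the averaging rule for "any" (footnote 4) is not handled here.
    sim = 1.0
    for attr in set(q) | set(q_prime):
        sim *= lookup(attr, q.get(attr, "any"), q_prime.get(attr, "any"))
    return sim

q1 = {"Make": "Honda", "Location": "Dallas, TX"}
q2 = {"Make": "Toyota", "Location": "Atlanta, GA"}
q5 = {"Make": "Lexus", "Location": "Basin, WY"}
print(query_condition_similarity(q1, q2), query_condition_similarity(q1, q5))
```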

4.2 User Similarity

For Example 1, we need to determine a function (F_11) to rank the results (N1) for U1. Workload-A (Table 1) shows that for users U2 and U3, functions F_21 and F_31 are available for Q1. It would be useful to determine if either of these functions can be used in place of F_11. However, we know from Example 1 that different users may display different ranking preferences toward the same query. Thus, to determine a function that would provide a reasonable ranking of Q1's results for U1, instead of randomly picking one from W, we propose the notion of user similarity.

We put forward the hypothesis that if U1 is similar to an existing user Ux, then, for the results of a given query (say Q1), both users will show similar ranking preferences; therefore, Ux's ranking function (F_x1) can be used to rank Q1's results for U1 as well. Again, as explained in Section 4.1, instead of using the single most similar user (to U1), our framework can be extended to determine the top-k set of most similar users to establish user similarity. However, as with query similarity, an aggregate ranking function did not provide a significant improvement in the ranking quality; hence, we only consider the most similar user (to U1) in our discussion.

In order to translate the hypothesis of user similarity into a model, we need to understand how to compute similarity between a given pair of users. In this paper, we estimate it based on the similarity of the users' individual ranking functions over the different queries common to them in the workload.

Definition. Given two users Ui and Uj with the set of common queries {Q1, Q2, ..., Qr} (see footnote 6), for which ranking functions ({F_i1, F_i2, ..., F_ir} and {F_j1, F_j2, ..., F_jr}) exist in W, the user similarity between Ui and Uj is expressed as the average similarity between their individual ranking functions for each query Qp, as shown in (7):

$$\text{similarity}(U_i, U_j) = \frac{\sum_{p=1}^{r} \text{sim}(F_{ip}, F_{jp})}{r}. \qquad (7)$$

In order to determine the right-hand side of (7), it is necessary to quantify a measure that establishes the similarity between a given pair of ranking functions. We use Spearman's rank correlation coefficient (ρ) to compute the similarity between the ranked sets obtained by applying these ranking functions on the query results. We choose the Spearman coefficient based on the observations [12] regarding its usefulness, with respect to other metrics, in comparing ranked lists.

Consider two functions F_i1 and F_j1 derived for a pair of users for the same query Q1 that has results N1. We apply these two functions individually on N1 to obtain two ranked sets of results, NR_i1 and NR_j1. If the number of tuples in the result sets is N, and d_i is the difference between the ranks of the same tuple (t_i) in NR_i1 and NR_j1, then we express the similarity between F_i1 and F_j1 as Spearman's rank correlation coefficient, given by

$$\text{sim}(F_{i1}, F_{j1}) = 1 - \frac{6 \times \sum_{i=1}^{N} d_i^2}{N \times (N^2 - 1)}. \qquad (8)$$

5. Precomputation assumes that the underlying data do not change significantly over time. Considering the size of these databases, a small percentage of changes is unlikely to affect the similarity computations; however, it is possible to recompute the similarities periodically.

6. Without loss of generality, we assume {Q1, Q2, ..., Qr} are the common queries for Ui and Uj, although they can be any queries.

In our method for estimating user similarity, we have so far considered all the queries that are common to a given pair of users. This assumption forms one of our models for user similarity, termed query-independent user similarity. However, it might be useful to estimate user similarity based on only those queries that are similar to the input query Q1. In other words, under this hypothesis, two users who may not be very similar to each other over the entire workload (comprising similar and dissimilar queries) may, in fact, be very similar to each other over a smaller set of similar queries. We formalize this hypothesis using two different models for determining user similarity: 1) clustered, and 2) top-K.

Before explaining these models, we would like to point out that, given a workload, if no function exists for any query for a user, estimating similarity between that user and any other user is not possible. Consequently, no ranking is possible in such a scenario. For the rest of the discussion, we assume that all users have at least one ranking function in the workload.

4.2.1 Query-Independent User Similarity

This model follows the simplest paradigm and estimates the similarity between a pair of users based on all the queries common to them. For instance, given workload-A in Table 1, this model determines the similarity between U1 and U2 using the ranking functions of Q2 and Q7. From the queries in Table 3, let the query-similarity model indicate that Q2 is most similar to Q1 whereas Q7 is least similar to Q1, and let the user-similarity results be as shown in Table 10.

This model will pick U3 as the most similar user to U1. However, if only Q2 (which is most similar to Q1) is used, U2 is more similar to U1. Based on our premise that similar users display similar ranking preferences over the results of similar queries, it is reasonable to assume that employing F_21 to rank Q1's results would lead to a better ranking order (from U1's viewpoint) than the one obtained using F_31. The failure to distinguish between queries is thus a potential drawback of this model, which the following models aim to overcome.

4.2.2 Cluster-Based User Similarity

In order to meaningfully restrict the number of queries that are similar to each other, one alternative is to cluster the queries in the workload based on query similarity. This can be done using a simple K-means clustering method [19]. Given an existing workload of m queries (Q1, Q2, ..., Qm), each query Qj is represented as an m-dimensional vector of the form <s_j1, ..., s_jm>, where s_jp represents the query-condition similarity score between the queries Qj and Qp (by (2)). Using K-means, we cluster the m queries into K clusters based on a predefined K and number of iterations.
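The clustering step can be sketched with scikit-learn's KMeans over these similarity vectors; the library choice and the similarity matrix below are our own illustrative assumptions, not part of the paper.

```python
import numpy as np
from sklearn.cluster import KMeans

# A sketch (not the authors' implementation) of clustering workload queries.
# Row j of S is query Qj's m-dimensional vector of query-condition similarity
# scores to every workload query (Equation (2)); the numbers are invented.
S = np.array([
    [1.0, 0.9, 0.8, 0.7, 0.2, 0.1],   # Q1
    [0.9, 1.0, 0.8, 0.7, 0.2, 0.1],   # Q2
    [0.8, 0.8, 1.0, 0.7, 0.3, 0.2],   # Q3
    [0.7, 0.7, 0.7, 1.0, 0.3, 0.2],   # Q4
    [0.2, 0.2, 0.3, 0.3, 1.0, 0.9],   # Q5
    [0.1, 0.1, 0.2, 0.2, 0.9, 1.0],   # Q7
])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(S)
print(labels)  # queries sharing a label fall in the same cluster
```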

Consider Example 1 and the queries in Table 3. Assuming the similarities specified in Section 4.2.1 (Q2 and Q7 are most and least similar to Q1, respectively), for a value of K = 2, the simple K-means algorithm will generate two clusters: C1 containing Q1 and Q2 (along with other similar queries), and C2 containing Q7 (in addition to other queries not similar to Q1). We then estimate the similarity between U1 and every other user only for the cluster C1 (since it contains the queries most similar to the input query). Using the scenario from Table 10, U2 would be chosen as the most similar user and F_21 would be used to rank the corresponding query results.

The above model assumes that ranking functions are available for a reasonable number of queries in each cluster. However, as the workload is likely to be sparse for most web databases, it is possible that no ranking functions are available in the cluster most similar to the incoming query. For example, considering workload-B in Table 2 and assuming a cluster C1 of queries Q1, Q2, Q3, and Q4, due to the lack of ranking functions, no similarity can be established between U1 and the other users. Consequently, the similarities would then be estimated in other clusters, thus hampering the quality of the ranking achieved, due to the dissimilarity between the input query and the queries in the corresponding clusters.

A well-established drawback of a cluster-based alternative is the choice of K. In a web database, a small value of K would lead to a large number of queries in every cluster, some of which may not be very similar to the rest, thus affecting the overall user similarity. In contrast, a large value of K would generate clusters with few queries, and in such cases, the probability that no user has any function in the cluster increases significantly.

4.2.3 Top-K User Similarity

Instead of finding a reasonable K for clustering, we propose a refinement termed top-K user similarity. We propose three measures that determine the top-K queries most similar to an input query (say Q1 from Example 1) and estimate the similarity between the user (U1) and every other user over these queries.

Strict top-K user similarity. Given an input query Q1 by U1, only the top-K most similar queries to Q1 are selected. However, the model does not check the presence (or absence) of ranking functions in the workload for these K queries. For instance, based on the assumption in Section 4.2.4, and using K = 3, Q2, Q3, and Q4 are the three queries most similar to Q1, and hence would be selected by this model.

Table 9. Sample Results of Q5 from Table 3

Fig. 3. Summarized view of the query-similarity models.

In the case of workload-A, similarity between U1 and U2, as well as between U1 and U3, will be estimated using Q2. However, in the case of workload-B, similar to the problem in the clustering alternative, there is no query common between U1 and U2 (or U3). Consequently, similarity cannot be established, and hence no ranking is possible.

User-based top-K user similarity. In this model, we calculate user similarity for a given query Q1 by U1 by selecting the top-K most similar queries to Q1, each of which has a ranking function for U1. Consequently, for workload-A, using K = 3, the queries Q2, Q5, and Q7 would be selected. Likewise, in the case of workload-B, this measure would select Q3, Q5, and Q6 using the "top-3" selection. However, since there exist no functions for users U2 and U3 (in workload-B) over these queries, no similarity can be determined, and consequently no ranking would be possible.

Workload-based top-K user similarity. In order to address the problems in the previous two models, we propose a workload-based top-K model that provides the stability of the query-independent model (in terms of ensuring that ranking is always possible, assuming there is at least one nonempty cell in the workload for that user) and ensures that similarity between users can be computed in a query-dependent manner.

Given a query Q1 by U1, the top-K most similar queries to Q1 are selected such that for each of these queries there exists: 1) a ranking function for U1 in the workload, and 2) a ranking function for at least one other user (Ui) in the workload. Considering K = 3, this model will select Q2, Q5, and Q7 in the case of workload-A and the queries Q7 and Q8 for workload-B, and it ensures a ranking of results in every case.
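A small sketch of this selection rule, under the same illustrative workload layout used earlier (a dictionary keyed by (user, query) cells; the helper name and data are assumptions):

```python
# A sketch of workload-based top-K query selection. W maps (user, query) cells to
# ranking functions; sorted_queries is assumed to be ordered by decreasing
# query-condition similarity to the input query.

def workload_based_top_k(W, ui, sorted_queries, k):
    selected = []
    for q in sorted_queries:
        has_own = (ui, q) in W                              # condition 1
        has_other = any(u != ui for (u, q2) in W if q2 == q)  # condition 2
        if has_own and has_other:
            selected.append(q)
        if len(selected) == k:
            break
    return selected

# Tiny invented workload: the string values stand in for ranking functions.
W = {("U1", "Q2"): "F12", ("U1", "Q5"): "F15", ("U1", "Q7"): "F17",
     ("U2", "Q2"): "F22", ("U3", "Q5"): "F35", ("U2", "Q7"): "F27"}
print(workload_based_top_k(W, "U1", ["Q2", "Q3", "Q4", "Q5", "Q7"], 3))
```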

4.2.4 Summary of User Similarity Models

Intuitively, a query-dependent estimation of user similarity is likely to yield a better ranking as compared to the query-independent model. Also, barring the workload-based top-K model, ranking may not always be possible with the other query-dependent models. In order to overcome this, the framework can always choose the top-K queries dynamically at runtime in the following order: strict, user-based, and workload-based. The strict top-K gives the best results, as can be understood intuitively and has been ascertained experimentally; the alternatives extend the availability of ranking functions with reduced accuracy. Additionally, as our experiments show, the choice of K in top-K is not as critical as the selection of K in the K-means algorithm. Thus, the top-K model seems to be the best alternative in a query-dependent environment.

We would also like to point out that, since this framework is based on the notion of similarity, we could have used the similarity notion adopted in recommendation systems, which is typically based on user profiles. However, our premise was that profile-based systems provide static or context-independent similarity (i.e., a user's behavior does not change across all items/objects), whereas we believe that ranking of results (unlike recommending) varies substantially for the same user based on the query type (i.e., is context-dependent) and the attributes specified (as elaborated by Example 2 in Section 1). This work can be treated as a generalization that uses information beyond profiles and seems appropriate for the web database context.

4.3 The Composite Similarity Model

In order to derive a user's (Ui) ranking function for a query (Qj), we have proposed two independent approaches based on user and query similarity. However, given the scale of web users and queries, and the sparseness of the workload, applying only one model may not be the best choice at all times.

Considering Example 1 and workload-B (Table 2), we want to identify a ranking function to rank Q1's results for U1. Using only the query-similarity model, F_13 will be selected, since Q3 is most similar to Q1. In contrast, applying only the user-similarity model will yield F_21, as U2 is most similar to U1. It would be meaningful to rank these functions (F_13 and F_21) to choose the most appropriate one. Furthermore, in a more practical setting, the workload is likely to have a ranking function for a similar query (to Q1) derived for a similar user (to U1); for instance, the likelihood of F_22 existing in the workload would be higher than the occurrence of either F_13 or F_21. Hence, it would be meaningful to combine the two measures into a single similarity model.

The goal of this composite model is to determine a ranking function (F_xy), derived for the query (Qy) most similar to Qj and given by the user (Ux) most similar to Ui, to rank Qj's results. The process for finding such an appropriate ranking function is given by Algorithm 1.

Algorithm 1. Deriving Ranking Functions from Workload

INPUT: Ui, Qj, Workload W (M queries, N users)
OUTPUT: Ranking function F_xy to be used for (Ui, Qj)

STEP ONE:
for p = 1 to M do
    Calculate Query-Condition Similarity(Qj, Qp)    %% using Equation (2) %%
end for
Sort (Q1, Q2, ..., QM) in descending order of similarity with Qj
Select QKset, i.e., the top-K queries from the above sorted set

STEP TWO:
for r = 1 to N do
    Calculate User Similarity(Ui, Ur) over QKset    %% using Equation (7) %%
end for
Sort (U1, U2, ..., UN) in descending order of similarity with Ui to yield Uset

STEP THREE:
for each Qs in QKset do
    for each Ut in Uset do
        Rank(Ut, Qs) = Rank(Ut in Uset) + Rank(Qs in QKset)
    end for
end for
F_xy = Get-RankingFunction()


Table 10. Drawbacks of Query-Independent User Similarity


The input to the algorithm is a user (Ui) and a query (Qj), along with the workload matrix (W) containing ranking functions. The algorithm begins by determining the query-condition similarity (STEP ONE) between Qj and every query in the workload. It then sorts all these queries (in descending order) based on their similarity with Qj and selects the set (QKset) of the top-K most similar queries to Qj that satisfy the conditions for the top-K user similarity model. Based on these selected queries, the algorithm determines the user similarity (STEP TWO) between Ui and every user in the workload. All the users are then sorted (again, in descending order) based on their similarity to Ui. We then generate a list of all the user-query pairs (by combining the elements from the two sorted sets) and linearize these pairs by assigning to each pair a rank that is the sum of its query- and user-similarity ranks (STEP THREE). For instance, if Ux and Qy occur as the xth and yth elements in their respective orderings, the pair (Ux, Qy) is assigned an aggregate rank of x + y. The Get-RankingFunction method then selects the pair (Ux, Qy) that has the lowest combined rank and contains a ranking function (F_xy) in the workload. Then, in order to rank the results (Nj), the attribute-weights and value-weights of F_xy are applied to each tuple in Nj (using (1)), from which an overall ordering of all tuples is obtained.

Algorithm 1 only displays a high-level view of finding a desired function using the similarity model. However, the process of estimating query similarities and user similarities, as well as the matrix traversal, can be costly with respect to time. Applying adequate indexing techniques so that estimating query similarity reduces to a simple lookup of similarities between attribute-value pairs, precomputing user similarity to maintain an indexed list of similar users for every user, and maintaining appropriate data structures to model the matrix traversal as a mere lookup operation are some of the preliminary approaches that we adopted to establish a workable ranking framework. We discuss some preliminary results on efficiently using our framework in Section 5. Although there is great scope for devising techniques to make the system scalable and efficient, the focus of this work was to propose techniques that establish good ranking quality.

5 EXPERIMENTAL EVALUATION

We have evaluated each proposed model (query similarity and user similarity) in isolation, and then compared both of these models with the combined model for quality/accuracy. We also evaluated the efficiency of our ranking framework.

Ideally, we would have preferred to compare our approach against existing ranking schemes for databases. However, what has been addressed in the literature is the use of exclusive profiles for user-based ranking (the corresponding techniques do not distinguish between queries) or the analysis of the database in terms of frequencies of attribute values for query-dependent ranking (which does not differentiate between users). In the context of web databases like Google Base, the data are obtained on-the-fly from a collection of data sources; thus, obtaining the entire database for determining the individual ranking functions, in order to compare with query-dependent ranking techniques, is difficult. Even if we obtained ranking functions for different queries, all users would see the same ranking order for a given query. Thus, comparing such a static ordering of tuples against our approach (which determines a distinct ranking of tuples for each user and query separately) would not be a meaningful or fair comparison. Similarly, we felt that comparing static user profiles (which ignore the different preferences of the same user for different queries) to our proposed definition of user similarity for user-dependent ranking would not be fair. Hence, we have tried to compare the proposed user, query, and combined similarities to indicate the effectiveness of each model with respect to the other two.

Further, the context of recommendation systems is different from the one considered in this paper; hence, the direct application of these techniques for comparison against our framework was not feasible. We would also like to point out that, unlike in information retrieval, there are no standard benchmarks available, and hence we had to rely on controlled user studies for evaluating our framework.

5.1 Setup

We used two real web databases provided by Google Base. The first is a vehicle database comprising eight categorical/discretized attributes (Make, Vehicle-Type, Mileage, Price, Color, etc.). Although Price and Mileage are numeric, they are discretized a priori by Google into meaningful ranges. In addition, for every attribute, the domain of values (e.g., "Chevrolet," "Honda," "Toyota," "Volkswagen," ... for the Make attribute) is provided. The second is a real estate database with 12 categorical/discretized attributes (Location, Price, House Area, Bedrooms, Bathrooms, etc.). Google provides APIs for querying its databases and returns a maximum of 5,000 results for every API query. Our experiments were performed on a Windows XP machine with a 2.6 GHz Pentium 4 processor and 4 GB RAM. All algorithms were implemented in Java.

5.2 Workload Generation and User Studies

Our framework utilizes a workload comprising users, queries, and ranking functions (derived over a reasonable set of user queries). Currently, user and query statistics are not publicly available for any existing web database. Furthermore, establishing a real workload of users and queries for a web database would require significant support from portals hosting web databases, such as Yahoo or Google, a task beyond the scope of this paper. Hence, to experimentally validate the quality of our framework, we relied on controlled user studies to generate the workload. For each database, we initially generated a pool of 60 random queries (comprising conditions on randomly selected attributes and their values), and manually selected 20 representative queries that are likely to be formulated by real users. Tables 11 and 12 show three such queries over each database.

We then conducted two separate surveys (one for each database) where every user was shown the 20 queries (one at a time) and asked to input, for each query, a ranking function by assigning weights to the schema attributes (on a scale of 1 to 10). To aid the user in expressing these preferences, we also displayed the set of results returned for each query.


In reality, collecting functions explicitly from users to form the workload is far from ideal; however, the focus of this paper is on using, rather than establishing, a workload for performing similarity-based ranking. Although a larger set of queries would have been preferable for testing the framework, asking users to interact with more than 20 queries would have been difficult. We also note that, as with most user studies in the domain of databases, recruiting a large number of users and queries for the survey is difficult; hence, these numbers are typically small (as seen in the user surveys conducted for database ranking in [11], [27], [20]).

Each explicit ranking provided by a user for a particular query was then stored in the associated workload W. The vehicle database survey was taken by 55 users, whereas the real estate database survey was taken by 75 users (graduate students and faculty members) who provided ranking functions for all the queries displayed to them. Thus, we generated a workload of 1,100 ranking functions for the vehicle database and 1,500 functions for the real estate database.

It is evident that the more functions the workload contains, the better the quality of ranking that can be achieved (since more similar queries and users for whom functions exist in the workload would be found). This hypothesis was further validated when we achieved a better ranking quality with 50 percent of the workload filled (i.e., 50 percent of the functions masked out) than with 25 or 10 percent of the total ranking functions. However, for most web databases, the workload would generally be very sparse, since obtaining ranking functions across a large number of user-query pairs is practically difficult. In order to show the effectiveness of our model for such scenarios, in this paper we present the results when ranking functions exist for only 10 percent of the workload (i.e., 90 percent of the functions are masked out). Hence, for the rest of this section, we consider a workload for the vehicle database consisting of only 110 (10 percent of 1,100) ranking functions, and a workload for the real estate database comprising 150 ranking functions.

5.3 Quality Evaluation

Query similarity. Based on the two proposed models of query similarity (Sections 4.1.1 and 4.1.2), in the absence of a function $F_{ij}$ for a user-query pair $(U_i, Q_j)$, the most similar query ($Q_c$ and $Q_r$ under the query-condition and query-result model, respectively) asked by $U_i$, for which a function ($F_{ic}$ and $F_{ir}$, respectively) exists in the workload, is selected, and the corresponding function is used to rank $Q_j$'s results.

We test the quality of both query similarity models as follows: We rank $Q_j$'s results ($N_j$) using $F_{ic}$ and $F_{ir}$, respectively, and obtain two sets of ranked results ($R'$ and $R''$). We then use the original (masked) function $F_{ij}$ to rank $N_j$ and obtain the set $R$. Since $R$ represents the true ranking order provided by $U_i$ for $Q_j$, we determine the quality of this model by computing the Spearman rank correlation coefficient (8) between $R$ and $R'$, and between $R$ and $R''$. If the coefficients obtained are high (nearing 1.0), our hypothesis (that for similar queries, the same user displays similar ranking preferences) is validated. Furthermore, if the coefficient between $R$ and $R'$ is greater than the one between $R$ and $R''$, our understanding that the query-condition model performs better than the query-result model is validated.
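For reference, the sketch below computes the Spearman rank correlation coefficient of (8) for two rankings of the same result set; it is a minimal version that assumes every tuple appears exactly once in each ranking (no ties), and the tuple identifiers in the example are purely illustrative.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SpearmanCorrelation {

    /**
     * Spearman rank correlation between two rankings of the same n tuples.
     * Each list holds tuple identifiers ordered from rank 1 to rank n; ties are assumed absent.
     */
    public static double spearman(List<String> rankingA, List<String> rankingB) {
        int n = rankingA.size();
        if (n != rankingB.size()) {
            throw new IllegalArgumentException("Rankings must cover the same tuples");
        }
        // Map each tuple identifier to its (1-based) rank in rankingB.
        Map<String, Integer> rankInB = new HashMap<>();
        for (int i = 0; i < n; i++) {
            rankInB.put(rankingB.get(i), i + 1);
        }
        // Sum of squared rank differences.
        double sumSquaredDiff = 0.0;
        for (int i = 0; i < n; i++) {
            Integer rankB = rankInB.get(rankingA.get(i));
            if (rankB == null) {
                throw new IllegalArgumentException("Tuple missing from second ranking: " + rankingA.get(i));
            }
            double d = (i + 1) - rankB;
            sumSquaredDiff += d * d;
        }
        // Classical formula: rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)).
        return 1.0 - (6.0 * sumSquaredDiff) / (n * ((double) n * n - 1));
    }

    public static void main(String[] args) {
        List<String> r = List.of("t1", "t2", "t3", "t4", "t5");      // "true" ranking R
        List<String> rPrime = List.of("t1", "t3", "t2", "t4", "t5"); // ranking via a similar query
        System.out.println("Spearman coefficient: " + spearman(r, rPrime)); // prints 0.9
    }
}
```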

We performed the above process for each user asking every query. Fig. 4 shows, for both databases, the average query-condition similarity (as well as the average query-result similarity) obtained for every query. The horizontal axis represents the queries, whereas the vertical axis represents the average value of the resulting Spearman coefficient. As the graph shows, over both domains the query-condition model outperforms the query-result model. The graphs also indicate that the comparative loss of quality (the highest Spearman coefficient being 0.95, for query 5) is due to the restricted number of queries in the workload. Although finding a similar query (for which a ranking function is available) is difficult in a workload comprising 20 queries and only 10 percent of the ranking functions, the results are very encouraging. Based on these experiments, we believe that the query similarity model would perform at an acceptable level of quality even for large, sparse workloads.

We further tested this model by comparing the quality produced by an aggregate function (i.e., selecting the top-K similar queries and combining their respective functions) against that of using the function of the single most similar query. We varied K over 2, 4, 5, and 10. For K = 5, the query-condition-similarity model produced an overall average Spearman coefficient of 0.86 (versus the 0.91 obtained when the single function of the most similar query is used) for the vehicle database. Similarly, a value of 0.83 was obtained for the query-result-similarity model (versus 0.86 for a single function). A similar trend was observed for the real estate database as well. In the interest of space, we do not show the graphs for these results; the reader is directed to [29] for the complete set of detailed results of these experiments.

TABLE 11. Sample Queries from the Vehicle Database

TABLE 12. Sample Queries from the Real Estate Database

Fig. 4. Ranking quality of query similarity models.
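For concreteness, one simple way to realize such an aggregate function is a similarity-weighted average of the attribute weights of the top-K candidate functions. The sketch below implements that assumed combination rule; it is not necessarily the exact aggregation used in our experiments, and the attribute names and weights in the example are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch: combine the ranking functions of the top-K most similar queries (or users). */
public class TopKAggregation {

    /**
     * topKWeights holds the attribute weights of the top-K candidate functions;
     * similarities holds each candidate's similarity to the input query.
     * The result is a similarity-weighted average of the attribute weights.
     */
    public static Map<String, Double> combine(List<Map<String, Double>> topKWeights,
                                              List<Double> similarities) {
        double totalSimilarity = 0.0;
        for (double s : similarities) totalSimilarity += s;

        Map<String, Double> combined = new HashMap<>();
        for (int k = 0; k < topKWeights.size(); k++) {
            double share = similarities.get(k) / totalSimilarity;
            for (Map.Entry<String, Double> e : topKWeights.get(k).entrySet()) {
                combined.merge(e.getKey(), share * e.getValue(), Double::sum);
            }
        }
        return combined;
    }

    public static void main(String[] args) {
        // Hypothetical attribute weights from the two most similar queries' functions.
        Map<String, Double> f1 = Map.of("Price", 0.8, "Mileage", 0.3);
        Map<String, Double> f2 = Map.of("Price", 0.6, "Color", 0.9);
        // Similarities 0.9 and 0.6 yield Price = 0.72, Mileage = 0.18, Color = 0.36.
        System.out.println(combine(List.of(f1, f2), List.of(0.9, 0.6)));
    }
}
```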

User similarity. We validate the user similarity model as follows: Using the original function given by a user $U_i$ for $Q_j$, we obtain a ranked set of results $R$. Then, for the query-independent, clustered, and the three top-K models, we determine the corresponding user $U_l$ most similar to $U_i$ who has a function $F_{lj}$ in the workload. Using the corresponding function in each case, we obtain the ranked set of results and compute the Spearman coefficient between this set and $R$. In our experiments, we set $K = 5$ for the K-means algorithm. A smaller value of $K = 2$ was chosen for the top-K models, as our workload (in the number of queries) is small.

Fig. 5 shows the average ranking quality individually achieved for the vehicle as well as the real estate database, across all queries for all users taking the survey. Our results clearly show that the strict top-K model performs consistently better than the rest. However, as Fig. 5 also shows, the strict top-K (as well as the clustered and user top-K) models fail to find functions for several queries (shown for a randomly selected user $U_{27}$). Fig. 6 further confirms this by comparing the different models, in terms of their ability to determine ranking functions, across the entire workload. In spite of this, the accuracy of the strict top-K model is superior to that of all other models whenever a ranking function can be identified; only when a ranking function cannot be found does using strict top-K not make sense. We have proposed a suite of models (clustered, user top-K, workload-based top-K, as well as query-independent) precisely for this reason; together they cover all the scenarios encountered in a sparse workload and provide a meaningful ranking function.

As with the query similarity model, using an aggregate function (i.e., one derived by combining the functions of the top-K most similar users) did not improve the quality of ranking over that achieved by using the single function of the most similar user. Given the lack of space, we do not provide the details of these results (they can be found in [29]).

Combined similarity. We evaluated the quality of the combined similarity model (Algorithm 1). Fig. 7 shows the average quality of the combined model for both databases. We observe that the composite model performs better than the individual models as the sparseness of the workload decreases (i.e., as the number of ranking functions increases). This matches the intuition that, with more ranking functions in the workload, one is more likely to find both individual and composite functions with better similarity values.

For instance, in the vehicle database, the composite model achieved an average Spearman coefficient (across all users asking all queries) of 0.89, versus the 0.85 achieved by the user similarity model and the 0.82 achieved by the query similarity model, when 90 percent of the functions were masked (i.e., out of the 5,000 results for a query, the combined similarity model correctly ranked 4,450 tuples, versus 4,250 and 4,100 tuples for the user and query similarity models, respectively). However, when 99 percent of the functions were masked, the composite model achieved a Spearman coefficient of 0.77, versus the 0.72 and 0.71 achieved by the user and query similarity models, respectively (i.e., the combined model correctly ranked 3,850 tuples, whereas the user and query similarity models correctly ranked 3,600 and 3,550 tuples, respectively). A similar trend is seen for the real estate database as well. Furthermore, user similarity shows better quality than query similarity. Since averages are presented, one reason user similarity fares better is that, in most cases, a similar user is found, owing to the larger number of users than queries in our experimental setting.

Fig. 5. Ranking quality of user similarity models.

Fig. 6. User similarity: Ranking functions derived.

Fig. 7. Ranking quality of a combined similarity model.

5.4 Efficiency Evaluation

The goal of this study was to determine whether our framework can be incorporated into a real-world application. We generated a workload comprising 1 million queries and 1 million users, and randomly masked out ranking functions such that only 0.001 percent of the workload was filled. We then generated 20 additional queries and selected 50 random users from the workload. We measure the efficiency of our system in terms of the average time, taken across all users, to perform ranking over the results of these queries (using Algorithm 1).

If the workload is simply stored in main memory and no precomputation or indexing is used for retrieval, determining the similarities (Steps One and Two of Algorithm 1) becomes the computational bottleneck. To reduce the time for estimating query similarities, we precompute pairwise similarities between all values of every attribute in the schema. Furthermore, to avoid looking up every query in the workload and evaluating its similarity with the input query at query time, we use a value-based hashing technique [11] to store all the queries in the workload. Likewise, all users are stored using a similar technique, where the values corresponding to a user refer to various properties of the user profile.
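The sketch below illustrates the precomputed value-similarity table (the value-based hashing of [11] is not reproduced here). The aggregation in querySimilarity, which averages the precomputed value-pair similarities over the attributes both queries specify, is an assumption made for illustration; the precise query-condition similarity is the one defined in Section 4.1.1.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch: precomputed attribute-value similarities used at query time as plain lookups. */
public class ValueSimilarityTable {

    // attribute -> ("value1|value2" -> similarity), precomputed offline for every attribute.
    private final Map<String, Map<String, Double>> table = new HashMap<>();

    private static String key(String v1, String v2) {
        // Store each unordered value pair exactly once.
        return (v1.compareTo(v2) <= 0) ? v1 + "|" + v2 : v2 + "|" + v1;
    }

    public void put(String attribute, String v1, String v2, double similarity) {
        table.computeIfAbsent(attribute, a -> new HashMap<>()).put(key(v1, v2), similarity);
    }

    public double lookup(String attribute, String v1, String v2) {
        if (v1.equals(v2)) return 1.0;
        Map<String, Double> perAttribute = table.get(attribute);
        return (perAttribute == null) ? 0.0 : perAttribute.getOrDefault(key(v1, v2), 0.0);
    }

    /**
     * Illustrative query-condition similarity: average the precomputed similarities of
     * the value pairs over the attributes that both queries specify (assumed aggregation).
     */
    public double querySimilarity(Map<String, String> q1, Map<String, String> q2) {
        double sum = 0.0;
        int shared = 0;
        for (Map.Entry<String, String> condition : q1.entrySet()) {
            String otherValue = q2.get(condition.getKey());
            if (otherValue != null) {
                sum += lookup(condition.getKey(), condition.getValue(), otherValue);
                shared++;
            }
        }
        return (shared == 0) ? 0.0 : sum / shared;
    }
}
```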

Fig. 8 shows the performance of our framework using the naive approach (where all steps are done in main memory at query time) and the precomputed/hash-based approach (where Steps One and Two are precomputed and the values are stored using hashing) for both databases. As expected, the naive approach is an order of magnitude slower than the other approach. For the vehicle database, Google Base provides unranked results in approximately 1.5 seconds. Our hash-based precomputed system takes an average of 2.84 seconds (including the Google Base response time) to rank and return the results; that is, it adds, on average, about 1.3 seconds to return results using query- and user-dependent ranking. Likewise, for the real estate database, our system takes an average of 2.91 seconds (compared to 2 seconds for Google Base) to show a ranked set of results, adding less than a second for the improved ranking.

Although our system adds approximately a second over Google to display the results, the user gets a customized, ranked result instead of a "one size fits all" ranking. We believe that if the ranking produced is desirable to the users, the extra overhead would still be tolerable compared to displaying unranked results. Finally, as indicated earlier, the performance can be improved further.

6 WORKLOAD OF RANKING FUNCTIONS

Our proposed framework uses a workload of ranking functions derived across several user-query pairs. Since a ranking function captures a user's specific preferences toward individual query results, obtaining such a function (especially for queries returning a large number of results) is not a trivial task in the context of web databases. Furthermore, since obtaining ranking functions from users on the web is difficult (given the effort and time a user needs to spend in providing his/her preferences), determining the exact set of ranking functions to derive for establishing the workload is important.

6.1 Alternatives for Obtaining a Ranking Function

Given a user $U$, a query $Q$, and its results $N$, the ranking function ($F_{UQ}$) corresponds to the preference associated by $U$ with each tuple in $N$. A straightforward alternative for deriving $F_{UQ}$ would be to ask $U$ to manually specify the preferences as part of $Q$, a technique adopted in relational databases [30], [21], [23], [33]. However, web users are typically unaware of the query language, the data model, and the workings of a ranking mechanism, rendering this solution unsuitable for web databases.

Since web applications allow users to select results of their choice through interaction (clicking, cursor placement, etc.) with the webpage, one solution for obtaining $F_{UQ}$ would be to analyze $U$'s interaction over $N$, an approach similar to relevance feedback [16] in document retrieval systems. However, requiring $U$ to iteratively mark a set of tuples as relevant (and nonrelevant), i.e., obtaining explicit feedback, may be difficult, since web users are typically reluctant to engage in lengthy interactions. Although providing various incentives to users (as done in routine surveys conducted by portals such as Yahoo, Amazon, etc.) is possible, an explicit feedback approach appears impractical in the context of web databases.

Fig. 8. Performance comparison: Naive versus hash-based precomputation.

In contrast, since most users voluntarily choose a few tuples for further investigation (again by clicking, cursor placement, etc.), applying implicit feedback (where these selected tuples are analyzed without $U$'s knowledge) to obtain $F_{UQ}$ seems a more practical alternative. However, determining the number of tuples that must be selected to generate an accurate function is difficult. This problem is further compounded by the fact that most users typically select very few tuples from the displayed results. An alternative would be to display an appropriately chosen sample, instead of the entire set of tuples, and determine $F_{UQ}$ based on the tuples selected from this sample (extrapolating it over the entire set of results). Although determining an appropriate sample and validating the corresponding function is difficult, we believe there is scope to investigate this problem further and derive relevant solutions. In this paper, we present a preliminary solution, in the form of a learning model, for obtaining the requisite function $F_{UQ}$ using implicit techniques.

6.1.1 Learning Model for Deducing a Ranking Function

For a query $Q$ given by user $U$, we have at our disposal the set of query results $N$ generated by the database. Let $R$ ($\subseteq N$) be the set of tuples selected implicitly by $U$. For an attribute $A_i$, the relationship between its values in $N$ and $R$ can capture the significance (or weight) that $U$ associates with $A_i$. As discussed in Section 3, our ranking function is a linear combination of attribute- and value-weights. We propose a learning technique called the Probabilistic Data Distribution Difference method for estimating these weights.

Learning attribute-weights. From Example 1, we know that $U_1$ is interested in "red" colored "Honda" vehicles. Let us focus on the preferences for the "Color" and "Mileage" attributes, and consider the individual probability distributions (shown in Fig. 9) of their respective values in the sets $N$ and $R$. Since the set $R$ will contain only "red" colored vehicles, there is a significant difference between the distributions (for "Color") in $R$ and $N$. In contrast, assuming that $U_1$ is not interested in any particular mileage, the difference between the distributions of "Mileage" over the sets $R$ (which will contain cars with different mileages) and $N$ is small.

Based on this observation, we hypothesize that, for an attribute $A_i$, if the difference between the probability distributions in $N$ and $R$ is large, then $A_i$ is important to the user and should be assigned a higher weight, and vice versa. The attribute weights are estimated formally using the popular Bhattacharyya distance ($D_B$) measure. If the probability distributions of the values of attribute $A_i$ in the sets $N$ and $R$ are $N_p$ and $R_p$, respectively, the attribute-weight ($W_{A_i}$) of $A_i$ is given as

$$W_{A_i} = D_B(N_p, R_p) = -\ln\big(BC(N_p, R_p)\big), \qquad (9)$$

where the Bhattacharyya coefficient ($BC$) for categorical and numerical attributes is given in (10) and (11), respectively, as

$$BC(N_p, R_p) = \sum_{x \in X} \sqrt{N_x \, R_x}, \qquad (10)$$

$$BC(N_p, R_p) = \int \sqrt{N_x \, R_x} \; dx. \qquad (11)$$

Determining value-weights. In order to rank the results of a query (using (1)), the score associated with every tuple must be on a canonical scale. Although the attribute-scores estimated using the Bhattacharyya distance lie within [0.0, 1.0], the values in the tuples may have different types and ranges. Hence, we need to normalize them to obtain the necessary value-weights.

Since we have assumed that numerical attribute values are discretized using an appropriate scheme (Section 4.1.1), we normalize only categorical attributes (e.g., "Make," "Color," etc.) using a frequency-based approach. For the query $Q$ by $U$, let $a$ be a value of a categorical attribute $A_i$. The value-weight ($a_{weight}$) is given by (12) as the ratio of the frequency of $a$ in $R$ ($a_r$) to its frequency in $N$ ($a_n$), each normalized by the respective set size:

$$a_{weight} = \frac{a_r / |R|}{a_n / |N|}. \qquad (12)$$
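The sketch below puts (9), (10), and (12) together for a single categorical attribute; the class, the helper names, and the toy "Color" data are illustrative, and numerical attributes (equation (11)) are not covered since they are assumed to be discretized.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch of the Probabilistic Data Distribution Difference weights for a categorical attribute. */
public class DistributionDifferenceWeights {

    /** Relative frequency of each value of one attribute over a list of tuple values. */
    static Map<String, Double> distribution(List<String> values) {
        Map<String, Double> p = new HashMap<>();
        for (String v : values) p.merge(v, 1.0, Double::sum);
        for (Map.Entry<String, Double> e : p.entrySet()) e.setValue(e.getValue() / values.size());
        return p;
    }

    /** Equation (10): Bhattacharyya coefficient of the value distributions in N and R. */
    static double bhattacharyyaCoefficient(Map<String, Double> np, Map<String, Double> rp) {
        double bc = 0.0;
        for (Map.Entry<String, Double> e : np.entrySet()) {
            bc += Math.sqrt(e.getValue() * rp.getOrDefault(e.getKey(), 0.0));
        }
        return bc;
    }

    /** Equation (9): attribute-weight as the Bhattacharyya distance between N and R. */
    static double attributeWeight(List<String> valuesInN, List<String> valuesInR) {
        double bc = bhattacharyyaCoefficient(distribution(valuesInN), distribution(valuesInR));
        return -Math.log(bc);
    }

    /** Equation (12): value-weight as the ratio of a value's relative frequency in R and in N. */
    static double valueWeight(String a, List<String> valuesInN, List<String> valuesInR) {
        double freqInR = valuesInR.stream().filter(a::equals).count() / (double) valuesInR.size();
        double freqInN = valuesInN.stream().filter(a::equals).count() / (double) valuesInN.size();
        return freqInR / freqInN;
    }

    public static void main(String[] args) {
        // Hypothetical "Color" values over all results N and over the selected tuples R.
        List<String> n = List.of("Red", "Blue", "Red", "Green", "Blue", "Red");
        List<String> r = List.of("Red", "Red", "Red");
        System.out.println("W_Color = " + attributeWeight(n, r));    // large shift -> higher weight
        System.out.println("w(Red)  = " + valueWeight("Red", n, r)); // (3/3) / (3/6) = 2.0
    }
}
```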

6.1.2 Learning Model Evaluation

We test the quality of our proposed learning method (Probabilistic Data Distribution Difference) in deriving attribute-weights for a user query. In an ideal scenario, a ranking function would be deduced by the learning model from the feedback (i.e., the tuples selected) provided by a user over a query's results. The quality of this function, however, can only be evaluated the next time the same user asks the same query, and would be estimated as the percentage of the top-k results generated by this function that match the user's preferences. In an experimental setting, asking a user to select tuples (from a large set) once, and then to validate the function by asking the same query again, would be difficult.

Consequently, we test the quality of the proposed learning model as follows: From the workload in Section 5, we have at our disposal ranking functions provided by users over 20 distinct queries on two databases. Consider a query $Q_1$ (with results $N_1$) for which a user $U_1$ has provided a ranking function ($F_{11}$). Using this function, we rank $N_1$ to obtain $N_{r1}$, and select a set of top-K tuples (from this ranked set) as the set ($R_1$) chosen by $U_1$. Using our learning model, we derive a ranking function ($F'_{11}$) by comparing $N_1$ and $R_1$. We then use $F'_{11}$ to rank $Q_1$'s results and obtain $N'_{r1}$. The quality of the learning model is then estimated as the Spearman rank correlation coefficient (8) between $N_{r1}$ and $N'_{r1}$. The higher the value of this coefficient, the better the quality of the ranking function (and the model), and vice versa.

In our experiments, we chose $K = 25$, i.e., we take the top-25 tuples generated by the user's original ranking function as the set $R$, since a web user, in general, selects only a very small percentage of the top-K tuples shown to him/her. The model was also tested with $R$ varying from 10 to 50 tuples; since the results are similar for different sizes of $R$, we omit the details in the interest of space. We compare the ranking quality of our proposed learning model with the quality achieved by two established and widely used learning models: linear regression and the naive Bayesian classifier. Fig. 10 compares the average ranking quality achieved by the models in deriving ranking functions for each individual query, over all the users in both databases.

Fig. 9. Probability distribution of sets N and R.

In order to demonstrate the general effectiveness of our model, the set $R$ is next chosen using different sampling techniques (instead of the top-25 tuples). It is natural that, based on the users' preferences, higher ranked tuples (from $N_{r1}$) should have a higher probability of being sampled than lower ranked ones. Hence, the sampling technique selects the required 25 tuples using the following power-law distributions: Zipf, Zeta, and Pareto. Fig. 10 compares the quality of our learning model (shown by the vertical bar in the graph) with the other models across both databases using the different sampling schemes. The results clearly indicate that our proposed method performs on par with the popular learning models. These results also validate our claim that learning from feedback is a viable alternative for obtaining ranking functions when generating workloads.
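As an example of how such a rank-biased sample can be drawn, the sketch below samples 25 distinct ranks under a Zipf-like distribution over rank positions; the exponent and the rejection-style de-duplication are illustrative choices rather than the exact samplers used in our experiments.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

/** Sketch: draw a rank-biased sample of tuples using a Zipf distribution over ranks. */
public class ZipfSampler {

    /**
     * Samples 'sampleSize' distinct ranks from 1..n, where rank i is drawn with
     * probability proportional to 1 / i^exponent (duplicates are simply redrawn).
     */
    public static List<Integer> sampleRanks(int n, int sampleSize, double exponent, Random rnd) {
        // Cumulative (unnormalized) distribution over ranks 1..n.
        double[] cumulative = new double[n];
        double total = 0.0;
        for (int i = 0; i < n; i++) {
            total += 1.0 / Math.pow(i + 1, exponent);
            cumulative[i] = total;
        }
        Set<Integer> chosen = new LinkedHashSet<>();
        while (chosen.size() < Math.min(sampleSize, n)) {
            double u = rnd.nextDouble() * total;
            int lo = 0, hi = n - 1;                  // binary search for the sampled rank
            while (lo < hi) {
                int mid = (lo + hi) / 2;
                if (cumulative[mid] < u) lo = mid + 1; else hi = mid;
            }
            chosen.add(lo + 1);                       // 1-based rank; repeats are skipped
        }
        return new ArrayList<>(chosen);
    }

    public static void main(String[] args) {
        // Draw 25 ranks out of 5,000 results, skewed toward the top of the ranking.
        System.out.println(sampleRanks(5000, 25, 1.0, new Random(42)));
    }
}
```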

6.2 Challenges in Establishing a Workload

At the time of answering a user query, if a ranking function for this user-query pair or a similar pair is not available, the Similarity model is forced to use a ranking function corresponding to a user and/or query that may not be very similar to the input pair, thus affecting the final quality of ranking achieved by the function. Therefore, in order to achieve an acceptable quality of ranking, the workload should be established such that, for any user asking a given query, there exists a ranking function corresponding to at least one similar pair.

Consequently, for the workload W comprising M queries and N users, the goal is to determine a set S of user-query pairs such that, for each user-query pair in W, there exists at least one user-query pair in S that occurs in the list of pairs most similar to the former. In order to determine such a set, two important questions must be answered: 1) what is the minimum size of this set S? and 2) which user-query pairs (or cells) should be selected from W to represent this set S?

The first question is critical as obtaining ranking functions for web database queries requires the user to spend a considerable amount of time and effort. Likewise, the second question is crucial since the quality of ranking obtained depends largely on the functions derived for an appropriate set of user-query pairs. We are currently working on addressing these challenges for generating an appropriate workload [28].

7 CONCLUSION

In this paper, we proposed a user- and query-dependent solution for ranking query results for web databases. We formally defined the similarity models (user, query, and combined) and presented experimental results over two web databases to corroborate our analysis. We demonstrated the practicality of our implementation for real-life databases. Further, we discussed the problem of establishing a workload, and presented a learning method for inferring individual ranking functions.

Our work brings forth several additional challenges. In the context of web databases, an important challenge is the design and maintenance of an appropriate workload that satisfies the properties of similarity-based ranking. Devising techniques for inferring ranking functions over web databases is an interesting challenge as well. Another interesting problem would be to combine the notion of user similarity proposed in our work with existing user profiles to analyze whether ranking quality can be improved further. Accommodating range queries and exploiting functional dependencies and attribute correlations also need to be examined, and the applicability of this model to other domains and applications needs to be explored.

ACKNOWLEDGMENTS

The authors thank the anonymous referees for their extremely useful comments and suggestions on an earlier draft of this paper. This material is based upon work partially supported by US National Science Foundation Grants IIS-1018865 and CCF-1117369, and by 2011 and 2012 HP Labs Innovation Research Awards. Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the author(s) and do not necessarily reflect the views of the funding agencies.

REFERENCES

[1] S. Agrawal, S. Chaudhuri, G. Das, and A. Gionis, "Automated Ranking of Database Query Results," Proc. Conf. Innovative Data Systems Research (CIDR), 2003.

[2] S. Amer-Yahia, A. Galland, J. Stoyanovich, and C. Yu, "From del.icio.us to x.qui.site: Recommendations in Social Tagging Sites," Proc. ACM SIGMOD Int'l Conf. Management of Data, pp. 1323-1326, 2008.

[3] R. Baeza-Yates and B. Ribeiro-Neto, Modern Information Retrieval. ACM Press, 1999.

[4] M. Balabanovic and Y. Shoham, "Content-Based Collaborative Recommendation," Comm. ACM, vol. 40, no. 3, pp. 66-72, 1997.

[5] J. Basilico and T. Hofmann, "A Joint Framework for Collaborative and Content Filtering," Proc. 27th Ann. Int'l ACM SIGIR Conf. Research and Development in Information Retrieval, pp. 550-551, 2004.

[6] C. Basu, H. Hirsh, and W.W. Cohen, "Recommendation as Classification: Using Social and Content-Based Information in Recommendation," Proc. 15th Nat'l Conf. Artificial Intelligence (AAAI/IAAI), pp. 714-720, 1998.

[7] M.K. Bergman, "The Deep Web: Surfacing Hidden Value," J. Electronic Publishing, vol. 7, no. 1, pp. 41-50, 2001.

[8] D. Billsus and M.J. Pazzani, "Learning Collaborative Information Filters," Proc. Int'l Conf. Machine Learning (ICML), pp. 46-54, 1998.

[9] K.C.-C. Chang, B. He, C. Li, M. Patil, and Z. Zhang, "Structured Databases on the Web: Observations and Implications," SIGMOD Record, vol. 33, no. 3, pp. 61-70, 2004.

[10] S. Chaudhuri, G. Das, V. Hristidis, and G. Weikum, "Probabilistic Ranking of Database Query Results," Proc. 30th Int'l Conf. Very Large Data Bases (VLDB), pp. 888-899, 2004.

Fig. 10. Ranking quality of learning models.

[11] S. Chaudhuri, G. Das, V. Hristidis, and G. Weikum, "Probabilistic Information Retrieval Approach for Ranking of Database Query Results," ACM Trans. Database Systems, vol. 31, no. 3, pp. 1134-1168, 2006.

[12] C. Dwork, R. Kumar, M. Naor, and D. Sivakumar, "Rank Aggregation Methods for the Web," Proc. Int'l Conf. World Wide Web (WWW), pp. 613-622, 2001.

[13] N. Fuhr, "A Probabilistic Framework for Vague Queries and Imprecise Information in Databases," Proc. 16th Int'l Conf. Very Large Data Bases (VLDB), pp. 696-707, 1990.

[14] S. Gauch and M. Speretta, "User Profiles for Personalized Information Access," Adaptive Web, pp. 54-89, 2007.

[15] Google, Google Base, http://www.google.com/base, 2012.

[16] B. He, "Relevance Feedback," Encyclopedia of Database Systems, pp. 2378-2379, Springer, 2009.

[17] T. Hofmann, "Collaborative Filtering via Gaussian Probabilistic Latent Semantic Analysis," Proc. 26th Ann. Int'l ACM SIGIR Conf. Research and Development in Information Retrieval, pp. 259-266, 2003.

[18] S.-W. Hwang, "Supporting Ranking for Data Retrieval," PhD thesis, Univ. of Illinois, Urbana-Champaign, 2005.

[19] T. Kanungo and D. Mount, "An Efficient K-Means Clustering Algorithm: Analysis and Implementation," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 24, no. 7, pp. 881-892, July 2002.

[20] G. Koutrika and Y.E. Ioannidis, "Constrained Optimalities in Query Personalization," Proc. ACM SIGMOD Int'l Conf. Management of Data, pp. 73-84, 2005.

[21] C. Li, K.C.-C. Chang, I.F. Ilyas, and S. Song, "RankSQL: Query Algebra and Optimization for Relational Top-k Queries," Proc. ACM SIGMOD Int'l Conf. Management of Data, pp. 131-142, 2005.

[22] X. Luo, X. Wei, and J. Zhang, "Guided Game-Based Learning Using Fuzzy Cognitive Maps," IEEE Trans. Learning Technologies, vol. 3, no. 4, pp. 344-357, Oct.-Dec. 2010.

[23] A. Marian, N. Bruno, and L. Gravano, "Evaluating Top-k Queries over Web-Accessible Databases," ACM Trans. Database Systems, vol. 29, no. 2, pp. 319-362, 2004.

[24] A. Penev and R.K. Wong, "Finding Similar Pages in a Social Tagging Repository," Proc. 17th Int'l Conf. World Wide Web (WWW), pp. 1091-1092, 2008.

[25] Y. Rui, T.S. Huang, and S. Mehrotra, "Content-Based Image Retrieval with Relevance Feedback in MARS," Proc. IEEE Int'l Conf. Image Processing, pp. 815-818, 1997.

[26] X. Shi and C.C. Yang, "Mining Related Queries from Web Search Engine Query Logs Using an Improved Association Rule Mining Model," J. Am. Soc. Information Science and Technology, vol. 58, pp. 1871-1883, Oct. 2007.

[27] W. Su, J. Wang, Q. Huang, and F. Lochovsky, "Query Result Ranking over E-Commerce Web Databases," Proc. Conf. Information and Knowledge Management (CIKM), pp. 575-584, 2006.

[28] A. Telang, S. Chakravarthy, and C. Li, "Establishing a Workload for Ranking in Web Databases," technical report, UT Arlington, http://cse.uta.edu/research/Publications/Downloads/CSE-2010-3.pdf, 2010.

[29] A. Telang, C. Li, and S. Chakravarthy, "One Size Does Not Fit All: Towards User- and Query-Dependent Ranking for Web Databases," technical report, UT Arlington, http://cse.uta.edu/research/Publications/Downloads/CSE-2009-6.pdf, 2009.

[30] K. Werner, "Foundations of Preferences in Database Systems," Proc. 28th Int'l Conf. Very Large Data Bases (VLDB), pp. 311-322, 2002.

[31] L. Wu et al., "Falcon: Feedback Adaptive Loop for Content-Based Retrieval," Proc. Int'l Conf. Very Large Data Bases (VLDB), pp. 297-306, 2000.

[32] Z. Xu, X. Luo, and W. Lu, "Association Link Network: An Incremental Semantic Data Model on Organizing Web Resources," Proc. Int'l Conf. Parallel and Distributed Systems (ICPADS), pp. 793-798, 2010.

[33] H. Yu, S.-w. Hwang, and K.C.-C. Chang, "Enabling Soft Queries for Data Retrieval," Information Systems, vol. 32, no. 4, pp. 560-574, 2007.

[34] H. Yu, Y. Kim, and S.-w. Hwang, "RV-SVM: An Efficient Method for Learning Ranking SVM," Proc. Pacific-Asia Conf. Knowledge Discovery and Data Mining (PAKDD), pp. 426-438, 2009.

[35] T.C. Zhou, H. Ma, M.R. Lyu, and I. King, "UserRec: A User Recommendation Framework in Social Tagging Systems," Proc. 24th AAAI Conf. Artificial Intelligence, 2010.

Aditya Telang received the bachelor's degree in computer science from Fr. Conceicao Rodrigues College of Engineering, Mumbai University, India, the master's degree in computer science from the State University of New York (SUNY) at Buffalo in 2004, and is currently working toward the PhD degree in the Department of Computer Science and Engineering at the University of Texas at Arlington. His research interests include ranking, querying, data mining, and information integration over web databases.

Chengkai Li received the BS and ME degrees in computer science from Nanjing University and the PhD degree in computer science from the University of Illinois at Urbana-Champaign. He is an assistant professor in the Department of Computer Science and Engineering at the University of Texas at Arlington. His research interests are in the areas of databases, web data management, data mining, and information retrieval. He works on ranking and top-k queries, web search/mining/integration, database exploration, query processing and optimization, social networks and user-generated content, OLAP, and data warehousing. His current focus is on both bringing novel retrieval and mining facilities into database systems and providing query functionality on top of web data and information.

Sharma Chakravarthy is a professor in the Computer Science and Engineering Department at The University of Texas at Arlington (UTA). He is the founder of the Information Technology Laboratory (IT Lab) at UTA. His research has spanned semantic and multiple query optimization, complex event processing, and web databases. He is the coauthor of the book Stream Data Processing: A Quality of Service Perspective (2009). His current research includes information integration, web databases, recommendation systems, integration of stream and complex event processing, and knowledge discovery. He has published more than 150 papers in refereed international journals and conference proceedings.

