extending faceted search with automated object...

12
Extending Faceted Search with Automated Object Ranking Kostas Manioudakis [0000-0002-6510-9889] and Yannis Tzitzikas [0000-0001-8847-2130] Institute of Computer Science, FORTH-ICS, Heraklion, Greece, and Computer Science Department, University of Crete, {manioudaki, tzitzik}@ics.forth.gr Abstract. Faceted Search is a widely used interaction scheme in digital libraries, e-commerce, and recently also in Linked Data. Nevertheless, object ranking in the context of Faceted Search is not well studied. In this paper we propose an extended version of the model enriched with parameters that enable specifying the characteristics of the sought object ranking. Then we provide an algorithm for producing an object ranking that satisfies these parameters. For doing so various sources are exploited including preferences and statistical properties of the dataset. Finally we present an implementation of the model, the GUI extensions that were required, as well as simulation-based evaluation results that provide evidence about the reduction of the user’s cost. 1 Introduction Faceted Search (FS) is the de facto query paradigm in e-commerce for more than one decade [14,16]. It is widely used in digital libraries, in the semantic web and in Linked Data. FS is essentially a session-based interactive method for gradual query formulation (commonly over a multidimensional information space) through simple clicks that offers to the user an overview of the result set (groups and count information) and never leads to empty result sets. At each state of the interaction the user explores the focus, i.e. the set of objects that satisfy the various constraints/filters that the user has specified up to that point. These objects are unranked, e.g. when the user explores a catalogue for buying a new laptop, or ranked, e.g. when the user explores the available hotels which are ordered with respect to price, user ratings or other criteria (default or specified by the user in the form of preferences as in the case of PFS [19]). The focus is ranked also in cases FS is applied after a keyword search query, e.g. as in Google Scholar. Although object ranking in FS is already used in commercial systems, the scientific literature on this topic is relatively short. Most of the research, has focused on methods only for facet ranking, i.e. for deciding in what order to place the facets. In this paper we focus on the ranking of objects. We propose an extension of the FS model that is enriched with parameters for specifying the desired properties of object ranking. These parameters allow tackling the problem of too big or too small answers and can specify how refined the sought ranking should be, as well as how long the answer should be. Then we describe Preprint of: K. Manioudakis and Y. Tzitzikas, Extending Faceted Search with Automated Object Ranking, 13th Metadata and Semantics Research Conference (MTSR 2019), Rome, Italy, Oct 2019

Upload: others

Post on 29-Sep-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Extending Faceted Search with Automated Object Rankingusers.ics.forth.gr/...FacetedSearchAutoRanking.pdf · Faceted Search (FS) is the de facto query paradigm in e-commerce for more

Extending Faceted Search withAutomated Object Ranking

Kostas Manioudakis [0000-0002-6510-9889] and Yannis Tzitzikas [0000-0001-8847-2130]

Institute of Computer Science, FORTH-ICS, Heraklion, Greece, andComputer Science Department, University of Crete,

{manioudaki, tzitzik}@ics.forth.gr

Abstract. Faceted Search is a widely used interaction scheme in digitallibraries, e-commerce, and recently also in Linked Data. Nevertheless,object ranking in the context of Faceted Search is not well studied. Inthis paper we propose an extended version of the model enriched withparameters that enable specifying the characteristics of the sought objectranking. Then we provide an algorithm for producing an object rankingthat satisfies these parameters. For doing so various sources are exploitedincluding preferences and statistical properties of the dataset. Finallywe present an implementation of the model, the GUI extensions thatwere required, as well as simulation-based evaluation results that provideevidence about the reduction of the user’s cost.

1 Introduction

Faceted Search (FS) is the de facto query paradigm in e-commerce for morethan one decade [14,16]. It is widely used in digital libraries, in the semanticweb and in Linked Data. FS is essentially a session-based interactive methodfor gradual query formulation (commonly over a multidimensional informationspace) through simple clicks that offers to the user an overview of the result set(groups and count information) and never leads to empty result sets. At eachstate of the interaction the user explores the focus, i.e. the set of objects thatsatisfy the various constraints/filters that the user has specified up to that point.These objects are unranked, e.g. when the user explores a catalogue for buying anew laptop, or ranked, e.g. when the user explores the available hotels which areordered with respect to price, user ratings or other criteria (default or specifiedby the user in the form of preferences as in the case of PFS [19]). The focus isranked also in cases FS is applied after a keyword search query, e.g. as in GoogleScholar. Although object ranking in FS is already used in commercial systems,the scientific literature on this topic is relatively short. Most of the research,has focused on methods only for facet ranking, i.e. for deciding in what order toplace the facets. In this paper we focus on the ranking of objects. We proposean extension of the FS model that is enriched with parameters for specifyingthe desired properties of object ranking. These parameters allow tackling theproblem of too big or too small answers and can specify how refined the soughtranking should be, as well as how long the answer should be. Then we describe

Preprint of:K. Manioudakis and Y. Tzitzikas, Extending Faceted Search with Automated Object Ranking, 13th Metadata and Semantics Research Conference (MTSR 2019), Rome, Italy, Oct 2019

Page 2: Extending Faceted Search with Automated Object Rankingusers.ics.forth.gr/...FacetedSearchAutoRanking.pdf · Faceted Search (FS) is the de facto query paradigm in e-commerce for more

2 Authors Suppressed Due to Excessive Length

methods and algorithms for deriving such object rankings. In the sequent wedescribe how we have realized the model. To grasp the idea, the left side ofFigure 1 sketches the GUI of a typical FS containing three facets and a focuscomprising 8 objects partitioned into 2 buckets, the first containing 6 objects andthe second 2 objects. The right side sketches the GUI according to the extendedmodel that we introduce, where the user has asked for 10 objects, and a morerefined ranking, specifically that no bucket should contain more that 3 objects.We can see that more objects have appeared (approximate results) and the focusrespects the Maximum Block (MB) constraint of 3.

Fig. 1: Impact of automatic ranking on search results. The approximate resultssatisfy the preference and one of the 2 filters.

In a nutshell, the main contributions of this paper are: (a) the extension ofthe model with object ranking-related parameters, (b) the formulation of thecorresponding object ranking problem, (c) the discussion on the ability to solvethe problem, (d) the algorithm SmartFSRank for providing a solution, (e) theproposal of how to extend the GUI of FS systems for making evident and clearthe object ranking, (f) the description of the model’s implementation, and (g)some promising simulation-based evaluation results. The rest of this paper isorganized as follows: Section 2 describes the context and related work. Section3 introduces the extended model. Section 4 provides the algorithms for realizingthe extended model. Section 5 describes the extensions of the GUI and theimplementation, Section 6 discusses related systems and some evaluation results.Finally, Section 7 concludes the paper.

2 Context and Related Work

2.1 Background

Faceted Search Faceted Exploration (or Faceted Search) is a widely used in-teraction scheme for Exploratory Search. It is the de facto query paradigm in

Page 3: Extending Faceted Search with Automated Object Rankingusers.ics.forth.gr/...FacetedSearchAutoRanking.pdf · Faceted Search (FS) is the de facto query paradigm in e-commerce for more

Extending Faceted Search with Automated Object Ranking 3

e-commerce [14,16]. It is also used for exploring RDF Data (e.g. see [18] for arecent survey, and [10] for a recent system). In a short (and rather informal)way we could define it as a session-based interactive method for query formu-lation (commonly over a multidimensional information space) through simpleclicks that offers an overview of the result set (groups and count information),never leading to empty result sets .

PFS: Preference-enriched Faceted Search Preference-enriched FacetedSearch [19], for short PFS, is an extension of FS that supports preferences.PFS offers actions that allow the user to order facets, values, and objects usingbest, worst, preferTo actions (i.e. relative preferences), aroundTo actions (overa specific value), and other criteria. Furthermore, the user is able to composeobject related preference actions, using Priority, Pareto, Pareto Optimal (i.e.skyline) and other. The distinctive features of PFS is that it allows expressingpreferences over attributes whose values can be hierarchically organized (and/ormulti-valued), it supports preference inheritance, and it offers scope-based rulesfor resolving automatically the conflicts that may arise. As a result the usercan restrict his focus by using the faceted interaction scheme (hard constraints)that lead to non-empty results, and rank according to preference the objects ofhis focus. PFS has been used in various domains, also in the context of spokendialogue systems [12]. Recent extensions of SPARQL with preferences ([13,15])will facilitate the implementation of PFS over RDF data.

2.2 Related Work

There is related work from several areas including FS systems, Databases, andLearning to Rank approaches.

In Faceted Search Systems. The idea of automatic ranking in faceted searchis not new, and there are various approaches for automatic ranking in facetedsearch systems, discussed in the survey [18]. Most of the research, has focused onmethods only for facet ranking, i.e. for deciding in what order to place the facets.In most cases (e.g [8], [7]), the proposed methods are based on the frequency offacet values. The approach in [6] dynamically ranks the properties depending onthe query. Moreover, they define different measures on qualitative and numericfacets, and also suggest ranking the values of each property by their frequencyin descending order. Other metrics have also been utilized such as navigationalcost [5]. In [3] a set-cover ranking method is used. Ranking the facets can bealso useful for ordering the objects.

Object ranking, is already used in commercial systems. For instance, the ho-tel platform booking.com offers a ranking method based on various properties.However, the scientific literature on this topic is relatively short. In [14] (Chapter9) a method based on property values is briefly described through an e-shoppingexample. According to PFS, and the system Hippalus [11], the user can specifythe ranking of objects through preference actions. In the context of PFS, [17]introduced a method for quantifying the degree of match between an object andthe user’s preference actions. This is mainly for making evident to the user how

Page 4: Extending Faceted Search with Automated Object Rankingusers.ics.forth.gr/...FacetedSearchAutoRanking.pdf · Faceted Search (FS) is the de facto query paradigm in e-commerce for more

4 Authors Suppressed Due to Excessive Length

good the top-ranked objects are. In the same context of PFS, [12] introducesselectivity and entropy, that can be used for ranking the objects.In Databases. The problem of automated ranking has also been studied inthe context of databases. For instance, [1] aims at tackling the Many AnswersProblem, which arises when a query returns too many result tuples without anyordering. That work proposes a framework that adopts the popular InformationRetrieval ranking method of cosine similarity on TF-IDF weights. An alternativeapproach for the Many Answers Problem was proposed in [2] that is based onProbabilistic Information Retrieval. In that work, they investigate attributesnot specified in the query and calculate two scores: a global score which capturesthe global importance of unspecified attribute values and a conditional scorewhich captures the strengths of dependencies (or correlations) between specifiedand unspecified attribute values. These scores are estimated automatically froma workload of past queries and with data analysis. A user evaluation showedbetter quality in comparison to methods of [1].Learning to Rank Approaches. In the field of Information Retrieval apartfrom the classical retrieval models, lately various methods that utilize MachineLearning have been developed [9]. These processes involve creating a rankingfunction by training a machine learning model on user data, collected from pastqueries (e.g. number of clicks on each item, explicit user ratings, etc). A Learningto Rank approach for faceted search is proposed in [20]. In that work, a weightedTF-IDF scoring formula calculated on facets is adopted. Queries and documentsare represented as sets of facet-value pairs, and learning methods are employedto determine the optimal weights of the formula, given a user query and a set ofpreviously judged documents (from explicit and implicit user feedback).

3 The Extended Model

Representation of Data. We assume a set of objects Obj described by a setof facets F1, . . . , Fk. The values of these facets can be categorical, numerical, aswell as values from a taxonomy. Moreover facets can be multi-valued. Overallthis representation framework captures to most widely (and successful) applica-tions of FS (for e-commerce, booking application, bibliographic search, etc).Plain FS Interaction. The user during the FS formulates interactively a setof hard constraints (filters), denoted by hc. The focus is actually the extensionof the hc, denoted by E(hc) (E(hc) ⊆ Obj).FS with Preferences. If PFS is supported and the user has expressed a set ofpreferences, i.e. a set of soft constraints denoted by sc, then they are exploitedfor ranking the elements of E(hc) with respect to the preference relation �sc.The Focus. In general we can consider that the focus at each state of the inter-action is a bucket order, L = 〈b1, . . . , bz〉, i.e. a sequence of blocks b1 . . . bz, eachbeing a subset of Obj which are pairwise disjoint (bi∩ bj = ∅). Note that z couldbe 1, meaning that the focus consists of a single block with objects not ranked(or ranked arbitrarily for reasons of presentation). Note that this formulationcaptures both FS and PFS, which uses preferences to rank the objects.

Page 5: Extending Faceted Search with Automated Object Rankingusers.ics.forth.gr/...FacetedSearchAutoRanking.pdf · Faceted Search (FS) is the de facto query paradigm in e-commerce for more

Extending Faceted Search with Automated Object Ranking 5

User Session. A user session us is a series of actions, s = 〈a1, . . . , an〉 whereeach ai is a hard or soft constraint.Parameters of the Extended Model. We propose extending the model withtwo parameters– MB: Maximum Block size. E.g. if MB = 1 then the system should return

a linear order of objects, if MB = 2 the answer should not contain tiesbetween more than 2 objects.

– R: number of requested objects.With these two parameters several requirements can be tackled:– Too many objects: R forces the system to rank the objects for returning the

best R objects.– Too few objects: R forces the system to also return approximate objects– Arbitrary order: MB forces the system to rank the objects so that no block

has more than MB objects, and in this way the rank is not arbitraryCharacterizing a bucket order L. Let L be a bucket order L = 〈b1, . . . , bz〉,i.e. a possible ranked answer, and let objects(L) denote the set of objects thatoccur in L. We could characterize L according to various criteria:

– HCsat. We could say that L satisfies the hard constraints hc, if objects(L)are exactly those that satisfy hc, i.e. objects(L) = E(hc).

– SCsat. We could say that L satisfies a soft constraint sc (i.e. it respectsthe preference order), if L ⊇�sc|E(hc). This means that autoranking is usedonly for ranking the objects in each block of the preference order (it never“violates” the blocks, it just add relationships, and these relationships donot create any cycle).

– MBsat. We could say that L satisfies a maximum allowable block size MB,if |bi| ≤MB for each 1 ≤ i ≤ z.

– Rsat. We could say that L satisfies R, if L contains exactly R objects.

It is not hard to see that it is not always possible to find an L that satisfiesall of the above constraints. For instance if the objects that satisfy the hc areless than R, i.e. |E(hc)| < R, then the system should either return less objects(sacrificing Rsat), or should try to return R objects by extending E(hc) withthe R− |E(hc)| in number “closest” objects (sacrificing HCsat).

On the other extreme, if those that satisfy the hc are more than R, then thesystem should rank them and return the best of them. This is another case ofan L that is not HCsat, since it contains less objects than those satisfying hc. Aproblem statement that includes more requirements follows:

Definition 1 (Problem Statement). Given a user session us with hard andsoft constraints, a parameter R specifying the number of desired objects, a pa-rameter MB specifying the maximum allowable block size, compute and returnto the user the “best” (wrt HCsat, SCsat, MBSat, Rsat) answer L. �

What remains is to clarify what “best” means. A first objective is to produceone or more L that satisfy all the criteria if possible. In case there are morethan one, then we need a method to select one of them. To this end, datasetstatistics and other metrics (as we have seen in the related work) can be used.For example we could promote frequently or rarely occurring values. If there is

Page 6: Extending Faceted Search with Automated Object Rankingusers.ics.forth.gr/...FacetedSearchAutoRanking.pdf · Faceted Search (FS) is the de facto query paradigm in e-commerce for more

6 Authors Suppressed Due to Excessive Length

no L that satisfies all hard constraints, then we need to find one that “betterapproximates” a bucket order that satisfies them all. We will elaborate on thisissue in the next section.

4 The SmartFSRank Ranking Method

Here we define an algorithm that provides a solution to the problem statement(as defined in Def. 1). In general the algorithm exploits PFS, if supported, andit tries to satisfy R by ranking and approximate matching, and MB throughranking based on statistical properties of the data. Note that the algorithm canbe applied even if PFS is not supported (in that case we have one block, i.e.z = 1). In brief, the algorithm SmartFSRank, Alg. 1, first tries to satisfy hc (line1), and then sc (Part 1, line 2) by exploiting the PFS-based ranking method.Then it tries to satisfy R (Part 2, lines 5-7), and finally MB (Part 3, lines 8-13).In Part 2, if R is greater than the size of the current focus, then more objects(not satisfying hc) have to be added and this should be based on the approximatesatisfaction of the hard constraints. This selection is done by AppendBlocksthat is analyzed in 4.1. In Part 3, the block breaking for satisfying MB can bebased on frequency, and similarity to soft constraints, as defined in [17], and itis done by BreakBlock(b) that is analyzed in 4.2.

Algorithm 1 SmartRankInput: Obj, hc, sc,MB,ROutput: A ranked answer that satisfies hc, sc,MB and R.

1: A = E(hc); . The objects satisfying the hc2: /** Part (1): Apply the PFS method to satisfy sc*/3: Compute (A,�sc |A) which is a series of blocks = 〈b1, . . . , bZ〉.4: /** Part (2): Satisfy R */5: if |A| < R then . need to add more objects6: Result = Result.AppendBlocks(R− |A|); . Add new blocks to the answer7: end if8: /** Part (3): Satisfy MB*/9: for each b ∈ Result do

10: if |b| > MB then . If block b does not satisfy MB11: Replace b by BreakBlock(b,MB, sc, hc)12: end if13: end for

4.1 AppendBlocks

The idea is to score each object according to its distance from the hc. Havingsuch a scoring function, a method to realize AppendBlocks is to score eachobject not in E(hc) and return the R − |E(hc)| objects that have the highestscore. This method guarantees that it will return the objects which maximizethe score, i.e. those that better approximate the information need, as expressed

Page 7: Extending Faceted Search with Automated Object Rankingusers.ics.forth.gr/...FacetedSearchAutoRanking.pdf · Faceted Search (FS) is the de facto query paradigm in e-commerce for more

Extending Faceted Search with Automated Object Ranking 7

by the hc. As regards the scoring, below we detail a method based on facettypes. Let hc = c1 ∧ . . . ∧ cn, where each ci is a constraint like Fi = ti, e.gStars = 4. If oj is an object, with oji we denote the value of oj on the facet Fi.In Table 1 we define the score per conjunct based on the facet type. The firstcolumn corresponds to the case where a conjunct is satisfied, while the secondpresents the formulas used when a conjunct is not satisfied. In both cases thescores are in the interval [0, 1]. In the last row, corresponding to the case wherethe terminology is structured as a taxonomy, we define a similarity measure thatreflects the distance in the taxonomy. For a term x ∈ Fi, let up(x) = { t ∈Fi | x ≤i t}. We define the similarity between two terms x and y, as the Jaccard

similarity of their greater nodes, specifically: simtax(x, y) = |up(x)∩up(y)||up(x)∪up(y)| and

then define scoretaxonomy(Fi = ti, oj) = simtax(ti, oji).

Table 1: Formulas for calculating scorehc per conjunctscore type oj satisfies ci oj does not satisfy ci

scoreflatterminology(Fi = ti, oj) 1 0

scorenumeric(Fi = ti, oj) 1 1− |ti−oji|maxom∈Obj{|ti−omi|}

scoreinterval(Fi ∈ [a, b], oj) 1 scorenumeric(Fi = a+b2, oj)

scoretaxonomy(Fi = ti, oj) simtax(ti, oji) simtax(ti, oji)

Back to the running example of Fig. 1, we can now see how the scores of theapproximate results were calculated, regarding the constraint Stars = 4. Theyboth have Stars = 5, so their score 1 according to the numeric case of the above

table is: scorenumeric(Stars = 4, 5) = 1− |4−5|4−0 = 1− 14 = 0.75

Definition 2 (HCscore). We can define the consolidated score of an object owith respect to hc = c1 ∧ . . . ∧ cn, as: scorehc(oj) = 1

n

∑ni=1 scoretype(ci)(oj)

where type(ci) ∈ {flatterminology, numeric, interval, taxonomy} .

This formula can be considered as the baseline. It expresses how close oj iswith respect to a point that satisfies hc. The exact algorithm for AppendBlocksis Alg. 2. It returns the bucket order to be appended.

4.2 BreakBlock

For breaking each block that does not satisfy MB an idea is to score eachobject of the block according to its discrimination value. Rare elements areharder to find, therefore it could be reasonable to promote objects that have rarevalues, i.e. those with higher discrimination value. On the other hand, frequentvalues may correspond to popular values, therefore it could be also reasonableto promote frequent values, i.e. those that appear in several objects. As we shallsee in the section about GUI, the GUI allows the user to specify whether rareor frequent values are preferred. In any case we have to define and compute thediscrimination value. Having such a formula we can apply it to each object of

1 Assuming that the range of the facet Stars, in the dataset, is the set {0,1,2,3,4,5}.

Page 8: Extending Faceted Search with Automated Object Rankingusers.ics.forth.gr/...FacetedSearchAutoRanking.pdf · Faceted Search (FS) is the de facto query paradigm in e-commerce for more

8 Authors Suppressed Due to Excessive Length

Algorithm 2 AppendBlocksInput: Num, hcOutput: A bucket order to be appended

1: CO = Obj \ E(hc); . The candidate objects2: for each o ∈ CO do3: o.score = scorehc(o); . compute the score of o wrt hc4: end for5: Sort(CO, score, descending); . sort CO wrt score attribute in desc. order6: Let B = 〈b1, . . . , bF 〉 the resulting blocks . after the previous sorting7: c = 0; i = 1; A = ε;8: while c < Num do . While the objective of Num objects is not reached9: A = A.appendBlock(bi); i+ +; c = c+ |bi|;

10: end while11: Return A;

any block that does not satisfy MB to order its objects. Then, such blocks willbreak to smaller ones satisfying MB. We can define the discrimination value(dV ) of a tuple oj = (oj1, . . . , ojk) by taking the average inverse frequency, i.e.:dVw(oj) = 1

k ∗∑i=1,k

1freqw(oji)

. Note that frequency can be defined in various

ways, this is why the above formula uses freqw where w ∈ {g, ga,E}. Specificallythe frequency of a value t ∈ Fi, can be defined globally (freqg), or with respectto the objects that have value in facet Fi (freqga), or in the current focus E

(freqE). Formally: freqg(oji) =|{ox∈Obj|oxi=oji}|

|Obj| (1),

freqga(oji) =|{ox∈Obj|oxi=oji}||{oy∈Obj|oyi 6=ε}| (2), freqE(oji) =

|{ox∈E|oxi=oji}||E| (3).

Note that if oji = ε, i.e. null, then freq{g,E}(oji) = number of objects havingnull (just like an ordinary value). In general it makes sense to consider a seriesof “tie breaking” methods, for making sure that all ties can be broken so thatMB is eventually satisfied. Each such method can be assigned a level, meaningthat if the application of the level i method does not break a tie, then the leveli+ 1 method is applied. The exact algorithm for BreakBlock is Alg. 3.

It first breaks the block b with respect to the discrimination value in thefocus (i.e. freqE). If MB is still not satisfied, it uses freqg (recursively onlyfor that block). Specifically we assume the following series of levels: Levels =〈dvE , dvG, lexicographic〉. At level 3 we assume the lexicographic order wrt thename of the object. This ensures that the algorithm terminates. The algorithmreturns a bucket order that certainly satisfies MB. One key point is that a costis paid only if needed i.e. only for the blocks that do not satisfy MB.

5 Extensions of the Graphical User Interface

In general we have identified the following GUI issues: (a) how to make evi-dent the automatic ranking, (b) how to enable the user to change the ranking(e.g. frequent vs rare), (c) how to make clear the objects that do not satisfythe hard constraints, (d) how to provide ranking explanation (both for hc and

Page 9: Extending Faceted Search with Automated Object Rankingusers.ics.forth.gr/...FacetedSearchAutoRanking.pdf · Faceted Search (FS) is the de facto query paradigm in e-commerce for more

Extending Faceted Search with Automated Object Ranking 9

Algorithm 3 BreakBlockInput: b,MB, level, hc, sc, where b does not satisfy MB.Output: A bucket order of the objects of b that satisfies MB.

1: A = objects(b); . the objects occuring in b2: for each o ∈ A do3: o.dv = DV (o, level); . compute the discrimination value of o at level4: end for5: Sort(A, dv, descending); . sort A wrt dv attribute in desc. order6: Let B = 〈b1, . . . , bF 〉 the resulting blocks . after the previous sorting7: for each bi ∈ B do8: if |bi| > MB then . if bi still does not satisfy MB9: Bnew = BreakBlock(bi,MB, level + 1, sc, hc); . recursive call with +1

level10: end if11: Replace in B the block bi by the series of blocks Bnew12: end for13: Return B;

sc). For the implementation of the extended FS model that we propose in thispaper, we decided to use Hippalus which is a publicly accessible web systemthat implements the PFS interaction model. The information base that feedsHippalus is represented in RDF/S2 using a schema adequate for representingobjects described according to dimensions with hierarchically organized values.Below we describe how we tackled the above questions by showing screenshotsfrom our implementation. The bucket order is presented by separating bucketswith a line label “Top-ranked”, “Second-ranked”, etc. so that the preference-based ranking is made clear to the user. The objects within a preference-basedbucket are ordered based on the automatic method presented in this paper (in-stead of an arbitrary one). The settings provided are following (see also Fig. 2):

1. Enable / disable Rsat

2. Specify the value of R parameter

3. Enable / disable inner bucket order-ing

4. Enable / disable MBsat

5. Specify the value of MB parameter

6. Select policy about discriminationvalue: prefer rare values, commonvalues or no preference

Fig. 2: The automatic rankingsettings as provided in the GUI

Figure 2 displays the settings used in the running example (Fig. 1). As regardsranking explanation, (a) the GUI shows the scores for each object (consolidated

2 http://www.w3.org/TR/rdf-schema/

Page 10: Extending Faceted Search with Automated Object Rankingusers.ics.forth.gr/...FacetedSearchAutoRanking.pdf · Faceted Search (FS) is the de facto query paradigm in e-commerce for more

10 Authors Suppressed Due to Excessive Length

scorehc, dvE , dvG, and similarity to soft constraints, as a percentage) as shownin the right side of Fig. 1, and (b) the GUI provides a button labeled “explain”on each object, that when pressed, it displays the scorehc per facet, as well asthe soft constraints and which of them are satisfied by the object.

6 Discussion

Comparison with Related Systems. In comparison to the existing work, atfirst we should say that the data space we assume supports hierarchically or-ganized values. As regards the proposed ranking framework, it is the first thatconsiders hard constraints (the typical functionality of FS), soft constraints (in-cluding preference inheritance in the hierarchically organized values), as well asstatistical-based object ranking. It does not require log or training data, there-fore it can be widely applied. Probably the closest work to ours is [20] howeverthey rely on machine learning techniques and user data, while we focus on thequery constraints and the statistical properties of the dataset. To make clearthe key differences between our system and the most related systems that werementioned in §2, Table 2 provides a list of features and marks those systems thatprovide them.

Table 2: Comparison with Related SystemsOur Tool GRAFA [10] Basu et al. [4] van Belle [20]

Facet Hierarchies Yes No Yes NoApproximate Matching Yes No No YesBlocks of desired size Yes No No No

Preferences Yes No No NoNeeds training data No No No Yes

Simulation-based Evaluation. The main purpose of automatic ranking, isto assist the user in finding the desired object. One way to measure this gain,is to consider the number of constraints required, until the desired objects isranked in the top-K positions of the focus (for various values of K). The betterthe automatic ranking is, the less constraints are required. To this end, we havestarted a simulation based evaluation, since this can be more objective and lesslaborious than evaluation with users. So far we have made experiments usingone dataset, which contains information about 382 hotels and has 18 facets. Weconsidered each of these 382 hotels as a possible ideal top result. We run 2 tests.In the first one, we simulated users that try to find that hotel by sequentiallyapplying hard constraints that match the object’s description (the conjunctscorrespond to the object’s facet-value pairs). In the second one, instead of hardconstraints the user formulated only soft constraints. Specifically, for each facet-value pair of the target object a soft constraint of type “best” was applied (e.gprefer Stars...4 best). The initial ordering of the objects was a random one, butthe same in each simulated session. Below we report the average and maximumnumber of constraints needed in each of three cases: (1) No automatic ranking,(2) Automatic ranking based on discrimination value, preferring rare values, (3)Automatic ranking based on discrimination value, preferring common values.

Page 11: Extending Faceted Search with Automated Object Rankingusers.ics.forth.gr/...FacetedSearchAutoRanking.pdf · Faceted Search (FS) is the de facto query paradigm in e-commerce for more

Extending Faceted Search with Automated Object Ranking 11

For cases (2) and (3) the maximum block size (MB) is 1, so that there aren’tany ties in the ranking. Tables 3 and 4 show the simulation results.

Table 3: Hard constraints session simulation resultsAutomaticRanking

Preferenceon DV

Avg. HC(K=1)

Max HC(K=1)

Avg. HC(K=2)

Max HC(K=2)

Avg. HC(K=3)

Max HC(K=3)

Disabled - 4.55 19 3.08 15 2.51 14Enabled Rare 4.85 19 3.29 18 2.62 15Enabled Common 4.11 18 2.92 10 2.45 10

Table 4: Soft constraints session simulation resultsAutomaticRanking

Preferenceon DV

Avg. SC(K=1)

Max SC(K=1)

Avg. SC(K=2)

Max SC(K=2)

Avg. SC(K=3)

Max SC(K=3)

Disabled - 4.26 16 2.99 14 2.46 14Enabled Rare 4.43 16 3.14 14 2.57 14Enabled Common 3.99 14 2.89 11 2.43 7

Although we plan to continue evaluation on more datasets and with morecomplex simulated user sessions, the results so far show that preferring commonvalues is better than searching without automatic ranking. This holds for allmetrics that are measured in Tables 3 and 4. On the contrary, if preference isgiven on rare values, we get worse results than those with arbitrary ranking.The above provide positive evidence for the benefits of MB and of BreakBlocks,specifically they reduce the average cost up to to 9.18% (HC, K=1) and themaximum cost up to 50% (SC, K=3). As regards the benefits from R, it isnot hard to see that whenever a new approximate object is added, it reducesthe number of constraints the user would have to formulate (for getting thatobject). Therefore the gain from adding R−|A| objects is the number of distinctdescriptions of these objects, so the gain ranges [1, R− |A|].Efficiency. Although scalability is not currently our focus, we should note thatthe time complexity of the ranking methods is O(N ∗ K) where N,K are thenumber of objects and facets. For the dataset we used, which contained 382objects, the average time to apply a hard constraint increased from 2 ms (withoutautomated ranking) to approximately 20 ms (with automated ranking), whichis not noticeable by the user.

7 Concluding Remarks

We proposed an extended model for FS for improving the exploration experi-ence of the users in various contexts. Specifically, we proposed a set of param-eters for specifying the desired properties of object ranking, and then, throughSmartFSRank, we factorized the problem to sub-tasks that can be tackled moreeasily. Finally, we showed the model’s implementation and the required GUI ex-tensions. The evaluation results of the extended model through simulation arepromising, i.e. they provide evidence that the proposed model reduces the usercost for finding the desired object, specifically it reduces the average cost up to9.18% and the maximum cost up to 50%. Apart from continuing the simulation-based evaluation, an interesting extension would be to consider diversification

Page 12: Extending Faceted Search with Automated Object Rankingusers.ics.forth.gr/...FacetedSearchAutoRanking.pdf · Faceted Search (FS) is the de facto query paradigm in e-commerce for more

12 Authors Suppressed Due to Excessive Length

requirements, as well as to investigate indexes and algorithms for scalability i.e.for enabling FS with automated ranking over very big datasets.

Acknowledgement. This work was partially supported by the project AI4EU(EU H2020, Grant agreement No 825619).

References

1. S. Agrawal, S. Chaudhuri, G. Das, and A. Gionis. Automated ranking of databasequery results. In CIDR, 2003.

2. S. Chaudhuri, G. Das, V. Hristidis, and G. Weikum. Probabilistic ranking ofdatabase query results. In Proceedings of the Thirtieth VLDB, 2004.

3. W. Dakka, P. Ipeirotis, and K. Wood. Automatic construction of multifacetedbrowsing interfaces. In Proceedings of the 14th CIKM, 2005.

4. Basu et al. Minimum-effort driven dynamic faceted search in structured databases.In Proceedings of the 17th CIKM. ACM, 2008.

5. Li Chengkai et al. Facetedpedia: Dynamic generation of query-dependent facetedinterfaces for wikipedia. In Proceedings of the 19th ICWWW. ACM, 2010.

6. Vandic et al. Dynamic facet ordering for faceted product search engines. IEEETransactions on Knowledge and Data Engineering, 2017.

7. R. Hahn, C. Bizer, C. Sahnwaldt, C. Herta, S. Robinson, M. Burgle, H. Duwiger,and U. Scheel. Faceted wikipedia search. In Business Information Systems, 2010.

8. Andreas Harth. Visinav: Visual web data search and navigation. In Database andExpert Systems Applications, 2009.

9. Tie-Yan Liu. Learning to rank for information retrieval. Foundations and Trendsin Information Retrieval, 3, 2009.

10. J. Moreno-Vega and A. Hogan. Grafa: Scalable faceted browsing for RDF graphs.In ISWC, 2018.

11. P. Papadakos and Y. Tzitzikas. Hippalus: Preference-enriched faceted exploration.In EDBT/ICDT Workshops, volume 172, 2014.

12. A. Papangelis, P. Papadakos, Y Stylianou, and Y. Tzitzikas. Spoken dialogue forinformation navigation. In SIGDial, 2018.

13. O. Pivert, O. Slama, and V. Thion. SPARQL Extensions with Preferences: aSurvey. In ACM Symposium on Applied Computing, 2016.

14. G. M. Sacco and Y. Tzitzikas. Dynamic taxonomies and faceted search: theory,practice, and experience. Springer Science & Business Media, 2009.

15. A. Troumpoukis, S. Konstantopoulos, and A. Charalambidis. An extension ofsparql for expressing qualitative preferences. In ISWC, 2017.

16. D. Tunkelang. Faceted search. Synthesis lectures on information concepts, retrieval,and services, 2009.

17. Y. Tzitzikas and E. Dimitrakis. Preference-enriched faceted search for voting aidapplications. IEEE Transactions on Emerging Topics in Computing, 7(2), 2019.

18. Y. Tzitzikas, N. Manolis, and P. Papadakos. ”Faceted Exploration of RDF/SDatasets: A Survey”. Journal of Intelligent Information Systems, 2017.

19. Y. Tzitzikas and P. Papadakos. Interactive exploration of multidimensional andhierarchical information spaces with real-time preference elicitation. FundamentaInformaticae, 20:1–42, 2012.

20. Agnes van Belle. Learning to rank for faceted search: Bridging the gap betweentheory and practice, 2017.