
K. Bauknecht, S.K. Madria, and G. Pernul (Eds.): EC-Web 2001, LNCS 2115, pp. 295-304, 2001. © Springer-Verlag Berlin Heidelberg 2001

Faceted Preference Matching in Recommender Systems

Fred N. Loney

Spirited Software, Inc.
[email protected]

Abstract. A recommender system assists customers in product selection by matching client preferences to suitable items. This paper describes a preference matching technique for products categorized by a faceted feature classification scheme. Individual ratings of features and products are used to identify a customer's predictive neighborhood. A recommendation is obtained by an inferred ranking of candidate products drawn from the neighborhood. The technique addresses the problem of sparse customer activity databases characteristic of e-commerce. Product search is conducted in a controlled, effective manner based on customer similarity. The inference mechanism evaluates the probability that a candidate product satisfies a customer query. The inference algorithm is presented and illustrated by a practical example.

1 Introduction

E-commerce sites can attract and retain customers by fostering a virtual community of users sharing a common interest. A community focus has two complementary aspects:

• collaboration in integrating member-generated content
• personalization based on client preference

A recommender system enables a community focus by matching personal preferences to shared product evaluations. The collaborative ratings are used as a source of evidence to extrapolate a user's observations to an unfamiliar item.

Recommender systems exhibit the following characteristics:

• active participation – users contribute product ratings and feature preferences
• sparse coverage – the number of products purchased or rated by an individual is a small proportion of the total number of available products
• probabilistic inference – the recommendation is understood to be an approximate match based on the available information
• selectivity bias – it is more important to avoid a false positive than a false negative

User contributions take a variety of forms. The least informative contribution is a record of purchase activity. Suggestions are based on prior purchases, on the assumption that a purchase is an implicit recommendation. Confidence is increased if the user provides feedback by rating the purchased product. Capturing user preferences of product features offers additional useful information.

Sparse databases present a special challenge for building a recommender system. The critical function of a recommender system is to find a predictive neighborhood of users who have enough in common to make a reasonably accurate prediction of


shared likes and dislikes. A recommendation is essentially an inference based on uncertain supporting evidence. However, the selectivity bias implies that some errors are worse than others. Overlooking a good product is less egregious than recommending a bad product.

This paper presents a technique for making recommendations for active systems that takes into account both product ratings and feature preferences. Particular attention is paid to finding a suitable neighborhood and factoring features into the inference process. The paper is organized as follows: Section 2 is a cursory review of related work. Section 3 presents an algorithm for identifying a predictive neighborhood. Section 4 incorporates feature preferences into the technique. The final section summarizes the results and discusses future work.

2 Related Work

Recommender systems have been implemented for such diverse applications as office systems [4], news groups [12], music [17] and movies [8]. A comprehensive review of e-commerce recommender systems is presented in [15].

Some algorithms for identifying comparable individuals include the LikeMinds closeness function [16], the constrained Pearson coefficient [17], the Spearman rank correlation [7], "cosine" vector similarity [2] and the Intelligent Recommendation Algorithm (IRA) [1]. The constrained Pearson coefficient is chosen as the basis for the technique described here because it offers good results in most situations [7] and is easily adapted to the normalized form presented in Section 3. The correlation is derived from the ratings of two users a and b over a common set of products P. For ratings r_ap and r_bp of a product p ∈ P, the general form of the constrained Pearson coefficient is given by:

$$\mathrm{corr}_{ab} = \frac{\sum_{p \in P} (r_{ap} - \bar{r}_a)(r_{bp} - \bar{r}_b)}{\sqrt{\sum_{p \in P} (r_{ap} - \bar{r}_a)^2 \sum_{p \in P} (r_{bp} - \bar{r}_b)^2}} \qquad (1)$$

where \bar{r}_a is the mean of a's observations and \bar{r}_b is the mean of b's observations.

These algorithms treat an item as an opaque entity without elaborating item features. The recommendation encapsulates the item as a package of features without considering the contribution of any particular feature to that item's appeal. Facets [11] offer a way to categorize features and a framework to rank and compare features as well as products. Facets have been used to index and retrieve relevant software reuse components [11,3]. They are adapted to recommender systems in this paper as a feature classification mechanism.

3 Neighborhood Formation

A neighborhood is formed by discovering users who have similar ratings. This section presents a technique to identify a neighborhood based on standardized product ratings. Section 4 extends this basic model to incorporate feature preferences into the user comparisons as well.


3.1 Normalization

Typically, a rating is an integer in the range [1, v], where v is a small positive integer. We can, without loss of generality, normalize the set of ratings to the interval [-1, 1] with zero mean. Normalization standardizes and simplifies subsequent rating comparisons and recommendation inference. For a set of ratings R where 1 ≤ r ≤ v for all r ∈ R, the normalization transformation t is defined by:

$$t(r) = \frac{2r - v - 1}{v - 1} \qquad (2)$$

This is a well-formed linear transformation that maps the corresponding bounds and mean, since:

$$t(1) = \frac{2 - v - 1}{v - 1} = \frac{-(v - 1)}{v - 1} = -1, \qquad
t\!\left(\frac{v + 1}{2}\right) = \frac{(v + 1) - v - 1}{v - 1} = 0, \qquad
t(v) = \frac{2v - v - 1}{v - 1} = \frac{v - 1}{v - 1} = 1 \qquad (3)$$

Furthermore, t is strongly order-preserving, i.e. for r_1, r_2, r_3 ∈ R:

$$r_1 < r_2 \Leftrightarrow t(r_1) < t(r_2), \qquad r_1 = r_2 \Leftrightarrow t(r_1) = t(r_2), \qquad r_1 > r_2 \Leftrightarrow t(r_1) > t(r_2)$$

and

$$\frac{r_1 - r_2}{r_1 - r_3} = \frac{t(r_1) - t(r_2)}{t(r_1) - t(r_3)} \qquad (4)$$

[10] presents the generalized normalization procedure and proof of correctness. Henceforth, it is assumed that all rating sets are so normalized.
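The normalization of Equation (2) is straightforward to implement. The following Python sketch is illustrative only (the function name and the 5-point scale are assumptions, not part of the paper); it also checks the bound and midpoint mappings verified in Equation (3).

```python
# A minimal sketch of the normalization transformation of Equation (2).
# The function name and the example 5-point scale are illustrative assumptions.
def normalize(r, v):
    """Map a rating r in [1, v] onto [-1, 1] with zero mean (Equation 2)."""
    return (2 * r - v - 1) / (v - 1)

# For a 5-point scale the bounds and midpoint map as verified in Equation (3):
assert normalize(1, 5) == -1.0
assert normalize(3, 5) == 0.0
assert normalize(5, 5) == 1.0
```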

3.2 Similarity

The normalized Pearson correlation corr_ab is given by Equation (5):

$$\mathrm{corr}_{ab} = \frac{\sum_i r_{ai}\, r_{bi}}{\sqrt{\sum_i r_{ai}^2 \sum_i r_{bi}^2}} \qquad (5)$$

The Pearson correlation has the desirable property that a user whose ratings are consistently inflated or deflated by a positive constant multiplier with respect to a base user preserves a high correlation with the base user, i.e. if r_bi = c·r_ai and c > 0 then

$$\mathrm{corr}_{ab} = \frac{\sum_i r_{ai}\,(c\, r_{ai})}{\sqrt{\sum_i r_{ai}^2 \sum_i (c\, r_{ai})^2}} = \frac{c \sum_i r_{ai}^2}{|c| \sum_i r_{ai}^2} = 1 \qquad (6)$$
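A small Python sketch of the normalized correlation of Equation (5) follows; it is illustrative only, with assumed names, and represents each user's normalized ratings as a dictionary keyed by product identifier. The final assertion exercises the scaling property of Equation (6).

```python
import math

# Illustrative sketch of the normalized Pearson correlation of Equation (5),
# computed over the products rated by both users. Names are assumptions.
def correlation(ratings_a, ratings_b):
    common = set(ratings_a) & set(ratings_b)
    if not common:
        return 0.0
    num = sum(ratings_a[p] * ratings_b[p] for p in common)
    den = math.sqrt(sum(ratings_a[p] ** 2 for p in common) *
                    sum(ratings_b[p] ** 2 for p in common))
    return num / den if den else 0.0

# A positively scaled copy of the same ratings correlates perfectly (Equation 6):
a = {"wine1": 0.5, "wine2": -0.5, "wine3": 1.0}
b = {p: 0.6 * r for p, r in a.items()}
assert abs(correlation(a, b) - 1.0) < 1e-9
```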

Page 4: [Lecture Notes in Computer Science] Electronic Commerce and Web Technologies Volume 2115 || Faceted Preference Matching in Recommender Systems

298 F.N. Loney

On the other hand, the Pearson correlation lacks a useful feature of the IRA algorithm for detecting inverse correlations, whereby the multiplier c approaches –1. A user with a strongly negative rating correlation is conferred a high positive predictive value. The IRA algorithm encodes this negative correlation inversion such that the similarity metric approaches 1 as the correlation approaches –1. In effect, a negative correlation carries the same weight as a comparably strong positive correlation.

The motivation is to encode the predictive value of consistently different individuals. While a negative correlation may be a useful predictor, it is unclear that it applies in all domains. Furthermore, it is unlikely that a negative correlation should carry the same weight as a comparably positive correlation. Intuitively, most customers would prefer a recommendation of an item highly rated by consistently like-minded individuals to a recommendation of an item poorly rated by consistently different individuals.

Given the selectivity bias mentioned in the introduction, positive recommendations based on poorly rated items should be made with discretion. The approach taken here is to allow for this and other judgements of user credibility as an explicit scaling factor. A user b is assigned a credibility with respect to user a that signifies the credence placed in user b's judgement. The credibility, denoted cred_ab, is a scaling factor of the correlation in the range [-1, 1]. It is typically calculated based on simple heuristics, user feedback or self-assessments. For example, a professional wine critic might be accorded higher credibility than other evaluators of wine products.

The conventional Pearson correlation assumes cred_ab = 1 for all users a, b. The IRA algorithm assumes the two-valued credibility assignment:

$$\mathrm{cred}_{ab} = \begin{cases} 1, & \mathrm{corr}_{ab} \ge 0 \\ -1, & \mathrm{corr}_{ab} < 0 \end{cases}$$

Other heuristics are possible, and can be discovered and adjusted with experience in the application domain.

Given the correlation and credibility, the confidence imputed to user a in the ratings of user b is given by:

$$\mathrm{conf}_{ab} = \mathrm{cred}_{ab} \cdot \mathrm{corr}_{ab} \qquad (7)$$

Finally, a predictor neighborhood of user a with confidence K is the set of users

$$N_a = \{\, b \in U \mid \mathrm{conf}_{ab} > K \,\} \qquad (8)$$

The value of K is chosen to balance predictive support with an adequate neighborhood size. In practice, it is convenient to set a standard threshold value of K subject to a minimum neighborhood size. The experimental evidence in [13] is useful to gauge an appropriate neighborhood size based on the size and dimensionality of the observation set.
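The neighborhood construction of Equations (7) and (8) can be sketched as follows. This is illustrative only: the helper names, the default threshold K, and the minimum-size fallback are assumptions added for the example, not prescriptions from the paper.

```python
# Sketch of Equations (7) and (8): confidence scales the correlation by an
# application-supplied credibility; the neighborhood keeps users above K.
# Names, the default K, and the minimum-size fallback are illustrative assumptions.
def confidence(corr_ab, cred_ab=1.0):
    return cred_ab * corr_ab                           # Equation (7)

def neighborhood(a, users, corr, cred, K=0.5, min_size=3):
    scored = {b: confidence(corr[a, b], cred.get((a, b), 1.0))
              for b in users if b != a}
    N = {b for b, c in scored.items() if c > K}        # Equation (8)
    if len(N) < min_size:                              # enforce a minimum neighborhood size
        N = set(sorted(scored, key=scored.get, reverse=True)[:min_size])
    return N
```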


4 Facets

Establishing a feature set with a predictive capacity similar to observer coherence lends additional credence to a recommendation. Features serve three purposes in recommender systems: neighborhood formation, query specification and recommendation inference. After introducing a facet classification scheme, this section describes how to use feature correlations to form neighborhoods and feature templates to formulate queries.

4.1 Faceted Classification

A useful technique for assigning features to items is to partition the features into pre-defined categories, or facets. A simple faceted classification of two wines is presented in Table 1.

Table 1. Faceted Wine Classification

Winery          | Vintage | Grape      | Flavor
Arbor Springs   | 1999    | Chardonnay | oak, vanilla
Chloe Vineyards | 1996    | Pinot Noir | black pepper

There are no a priori restrictions on feature cardinality besides those imposed by the application domain. A wine may be required to have one winery, but may have any number of flavor descriptors. Furthermore, a feature can take a range of values indicating the degree to which a product manifests the feature, as in Table 2.

Table 2. Feature Valuation

Winery        | Vintage | Grape      | Flavor
Arbor Springs | 1999    | Chardonnay | oak=0.8, vanilla=0.7

Additional features may be derived from assigned features. For example, the Winery facet determines a Region that is common to all wines from that winery. Similarly, Grape can be grouped into the Color features white or red¹. Thus, a facet is either assigned if its features are directly dependent on the product, or derived if its features are dependent, directly or indirectly, on an assigned facet.

The hierarchical classification induces a subsumption ordering: facet F1 subsumes facet F2 if the value of F2 determines the value of F1. This ordering is reflexive, transitive and antisymmetric. There is a natural extension of the facet subsumption ordering to tuples:

$$\prod_{i=1}^{m} F_i \ \text{subsumes}\ \prod_{i=1}^{n} F'_i \quad \text{if and only if} \quad m \le n \ \text{and}\ F_i \ \text{subsumes}\ F'_i,\ 1 \le i \le m.$$

The subsumption ordering induces a lattice, as in Figure 1.

¹ Even this simple grouping hides subtle data modeling issues. The common assortment of wines into the three colors red, white and rosé invalidates the Grape → Color dependency, since rosé is a wine-making style that is a feature of the particular wine rather than the grape. The potential for ambiguity and anomaly motivates a predefined, stable facet classification by a domain expert.


Fig. 1. Facet tuple space for wine grape and origin (a lattice over the facets Grape, Color and Country and the tuples <Grape,Country> and <Color,Country>)
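The tuple subsumption test can be sketched directly from this definition. The following Python fragment is illustrative only: the parent map encoding the facet hierarchy and the function names are assumptions, and a real system would take the single-facet subsumption relation from the domain model.

```python
# Sketch of the tuple subsumption test: <F1,...,Fm> subsumes <F'1,...,F'n>
# iff m <= n and Fi subsumes F'i for 1 <= i <= m. The parent map below is an
# assumed, illustrative encoding of the derived-facet dependencies.
FACET_PARENT = {"Grape": "Color", "Winery": "Region", "Region": "Country"}

def facet_subsumes(f1, f2):
    """True if facet f1 subsumes facet f2, i.e. the value of f2 determines f1."""
    while f2 is not None:
        if f1 == f2:
            return True
        f2 = FACET_PARENT.get(f2)
    return False

def tuple_subsumes(t1, t2):
    return len(t1) <= len(t2) and all(facet_subsumes(a, b) for a, b in zip(t1, t2))

# Consistent with the lattice of Figure 1:
assert tuple_subsumes(("Color", "Country"), ("Grape", "Country"))
```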

A value in facet tuple space constitutes a feature template representing a unique combination of features. A customer expresses the desirability of a feature or combination of features by assigning a rating to a feature template. Table 3 shows ratings of feature templates taken from the facet tuple spaces Region × Grape, Country × Color and Region.

Table 3. Feature rating

User    | Feature Template    | Rating
Mel     | <Oregon,Pinot Noir> | 0.8
Mel     | <Germany,White>     | 0.7
Maurice | <Mosel>             | 0.9

4.2 Feature Correlation

Facets facilitate neighborhood formation by permitting additional opportunities for discovering predictors. The user correlation defined in Equation (5) is readily adapted to compare feature preferences. However, a feature rating must first be tempered by the generality of the feature, since a generic feature has less predictive value than a specific product in determining customer similarity. For example, a match on ratings of a particular wine has higher predictive value than a match on the feature template <Oregon,PinotNoir>, which in turn has higher predictive value than a match on the feature <Red>.

The predictive value is captured by the coherence of a feature template, coh(f_1, ..., f_n), where 0 ≤ coh(f_1, ..., f_n) ≤ 1. Coherence can be assigned by a domain expert or inferred by a heuristic. There are three types of heuristics:

• an aggregation heuristic determines the coherence of a composite feature template as the joint product

$$\mathrm{coh}(f_1, \ldots, f_n) = 1 - \prod_{i=1}^{n} (1 - \mathrm{coh}(f_i)) \qquad (9)$$

(9)


• a classification heuristic applies a default facet coherence, e.g. coh(Germany) = coh(Country).
• an inheritance heuristic uses the value of a subsumption facet, e.g. coh(PinotNoir) = coh(Red).

The adjusted feature rating is then given by

$$\hat{r}_{af} - \bar{r}_a = \mathrm{coh}(f)\,(r_{af} - \bar{r}_a) \qquad (10)$$

This has the effect of attenuating the strength of the feature rating by the coherence. The feature rating is thus weaker than a product rating to the extent that the feature coherence is weaker. The adjusted feature ratings are then included in the overall correlation that determines user similarity.
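A short Python sketch of the coherence aggregation of Equation (9) and the adjusted rating of Equation (10) follows. The names and the example coherence values are illustrative assumptions; in practice the individual facet coherences would come from a domain expert or one of the heuristics above.

```python
# Sketch of Equations (9) and (10); names and example values are assumptions.
def aggregate_coherence(coherences):
    """coh(f1,...,fn) = 1 - prod(1 - coh(fi))  (Equation 9)."""
    prod = 1.0
    for c in coherences:
        prod *= (1.0 - c)
    return 1.0 - prod

def adjusted_feature_rating(r_af, r_bar_a, coh_f):
    """Attenuate the feature rating toward the user's mean rating (Equation 10)."""
    return r_bar_a + coh_f * (r_af - r_bar_a)

# Example: a composite template of two moderately coherent facets.
coh = aggregate_coherence([0.6, 0.5])            # 1 - 0.4 * 0.5 = 0.8
print(adjusted_feature_rating(0.9, 0.0, coh))    # approximately 0.72 for a zero-mean rating set
```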

4.3 Query Specification

A query specification expresses user preference for features in the recommendation. The feature ratings described in Section 4.1 act as a default query specification, since they capture a user's customary preferences. Alternatively, an ad hoc query can be formulated explicitly by assigning preference values to feature templates. Query templates can be combined using the logical "and", "or" and "not" operators to form a logical expression. The general form of a query Q with n terms, then, is Q = { q_i = v_i, i = 1, ..., n }, where each condition q_i is a feature template or a logical expression whose operands are query subconditions.

The value v assigned to a condition measures the desirability of the feature combination in the query result. v = 1 if a candidate product is required to match the condition. v = -1 if the feature combination is disallowed in the result. v = 0 if the requestor is indifferent about the occurrence of the feature combination. The query specification of Equation (11) below indicates a slight preference for Oregon red wine and a strong preference to show Pinot Noir in the result.

Q = <Oregon,Red>=0.2, <PinotNoir>=0.8 (11)

Given a query condition q = v, define an adjusted value \hat{v}(q) = v · (1 - freq(q)), where

$$\mathrm{freq}(q) = \frac{\mathrm{card}(\{p \in P \mid p \text{ matches } q\})}{\mathrm{card}(P)}$$

is the frequency of occurrence of products satisfying q. Similarly, the adjusted value for a product that does not satisfy the condition is \hat{v}(\neg q) = -v · freq(q). The probability that a product will be preferred over another product is then given by:

$$\mathrm{prob}(q \mid p) = \begin{cases} \dfrac{\hat{v}(q) + 1}{2}, & \text{if } p \text{ matches } q \\[2ex] \dfrac{\hat{v}(\neg q) + 1}{2}, & \text{otherwise} \end{cases} \qquad (12)$$

For the query template Q_n = { q_i = v_i, i = 1, ..., n }, the joint adjusted value is defined recursively as:


$$\hat{v}(Q_1) = \hat{v}(q_1), \qquad \hat{v}(Q_n) = \hat{v}(q_n) + \hat{v}(Q_{n-1}) - \hat{v}(q_n)\,\hat{v}(Q_{n-1}) \qquad (13)$$

Table 4 shows an example valuation for the query specification in Equation (11). The joint valuation for a product satisfying both query conditions in the example is \hat{v}(Q) = .19 + .72 - (.19)(.72) = 0.7732 and the joint probability is prob(Q) = 0.8866.

Table 4. Example query distribution and valuation

q            | freq(q) | v   | v̂(q) | prob(q) | v̂(¬q) | prob(¬q)
<Oregon,Red> | .05     | .20 | .19  | 0.595   | -.01  | 0.495
<PinotNoir>  | .10     | .80 | .72  | 0.86    | -.08  | 0.46

Query condition valuations are assumed to be independent, e.g. the preference expressed for an Oregon red wine in the subcondition <Oregon,Red> of query Q in Equation (11) above is 0.2, regardless of whether the wine is a Pinot Noir. Cross-dependencies can be factored into the query by use of logical operators, e.g.

<Oregon,Red> and not <PinotNoir>=0.2
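The query valuation of Equations (12) and (13) can be sketched as follows, reproducing the values of Table 4. The function names are illustrative assumptions; freq(q) would normally be computed from the product catalog rather than passed in.

```python
# Sketch of Equations (12) and (13) using the values of Table 4.
# Names are assumptions; freq(q) is supplied directly for illustration.
def adjusted_value(v, freq, matches):
    return v * (1 - freq) if matches else -v * freq     # v̂(q) or v̂(¬q)

def prob_condition(v, freq, matches):
    return (adjusted_value(v, freq, matches) + 1) / 2   # Equation (12)

def joint_value(values):
    """Combine adjusted values recursively: v + v' - v*v'  (Equation 13)."""
    total = values[0]
    for v in values[1:]:
        total = v + total - v * total
    return total

# Product matching both conditions of the example query Q of Equation (11):
v_or = adjusted_value(0.20, 0.05, True)    # 0.19
v_pn = adjusted_value(0.80, 0.10, True)    # 0.72
vQ = joint_value([v_or, v_pn])             # approximately 0.7732
prob_Q = (vQ + 1) / 2                      # approximately 0.8866
```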

4.4 Recommendation Inference

The task of recommendation inference is to use the neighborhood and feature ratings to assess the probability that a product will match a user query. This is evaluated by calculating the probability that each candidate product would be preferred over another product. The neighborhood is restricted to users who rate the product. The probability prob(u|p) that user u would prefer product p maps the user rating to the interval [0, 1]:

$$\mathrm{prob}(u \mid p) = \frac{r_{up} + 1}{2}, \qquad \mathrm{prob}(\neg u \mid p) = 1 - \mathrm{prob}(u \mid p) \qquad (14)$$

The probability of a neighborhood recommendation is the average confidence in the users rating the product, expressed as a probability in the range [0, 1]:

$$\mathrm{prob}(N \mid p) = \frac{\mathrm{card}(N) + \sum_{u \in N} \mathrm{conf}_{au} \cdot r_{up}}{2 \cdot \mathrm{card}(N)} \qquad (15)$$

The aggregate probability that the product p is the preferred product to satisfy query condition q is the weighted combination of the probabilities of the neighborhood recommendation and the facet recommendation:

$$\mathrm{prob}(q \mid p) = w \cdot \mathrm{prob}(N \mid p) + (1 - w) \cdot \mathrm{prob}(Q \mid p) \qquad (16)$$

The weight w is a value in the range [0, 1] that expresses the relative contribution of the neighborhood assessment vis-à-vis facet template matching.
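The inference step of Equations (14) through (16) can be sketched as follows. This is illustrative only: the function names are assumptions, conf_a maps each neighbor to the confidence of Equation (7), and the ratings are assumed to be normalized as in Section 3.1.

```python
# Sketch of Equations (14)-(16); names are illustrative assumptions.
# ratings maps each neighbor to its normalized rating of candidate product p;
# conf_a maps each neighbor to the confidence of Equation (7).
def prob_user(r_up):
    return (r_up + 1) / 2                                  # Equation (14)

def prob_neighborhood(ratings, conf_a):
    N = list(ratings)
    s = sum(conf_a[u] * ratings[u] for u in N)
    return (len(N) + s) / (2 * len(N))                     # Equation (15)

def prob_recommendation(prob_N, prob_Q, w=0.5):
    return w * prob_N + (1 - w) * prob_Q                   # Equation (16)

# The worked example of Section 4.4:
pN = prob_neighborhood({"u1": 0.6, "u2": 0.8}, {"u1": 1.0, "u2": 1.0})  # 0.85
print(prob_recommendation(pN, 0.8866))                     # approximately 0.87
```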


An example inference is presented graphically in Figure 2. The nodes represent predictive sources, annotated with the probability that the subject product p is preferred over other products at that node. The query specification Q is given by Table 4 and the neighborhood N consists of two users u1 and u2 who rate p at 0.6 and 0.8, respectively, with a confidence value of 1 for each user. The probability assigned to a user node is given by Equation (14) as prob(u|p) = (r_up + 1)/2, or 0.8 for u1 and 0.9 for u2. The joint probability for neighborhood N is given by Equation (15):

$$\mathrm{prob}(N \mid p) = \frac{\mathrm{card}(N) + \sum_{u \in N} \mathrm{conf}_{au} \cdot r_{up}}{2 \cdot \mathrm{card}(N)} = \frac{2 + 0.6 + 0.8}{4} = 0.85 \qquad (17)$$

Fig. 2. Example query evaluation (inference graph: query conditions <Oregon,Red> = .60 and <PinotNoir> = .86 feed the query node Q = .89; users u1 = .80 and u2 = .90 feed the neighborhood node N = .85; Q and N combine to give q = .87 for product p)

The contribution of the product feature templates is given in Equations (12) and (13). The final probability for q is given by Equation (16) with a nominal weight of 0.5:

$$\mathrm{prob}(q \mid p) = 0.5 \cdot \mathrm{prob}(N \mid p) + 0.5 \cdot \mathrm{prob}(Q \mid p) = (0.5)(0.85) + (0.5)(0.89) = 0.87 \qquad (18)$$

This is the probability that p will be preferred over another wine, given the available evidence. Each candidate product would then be evaluated and ranked according to the resulting probability.

5 Conclusion

In this paper, a technique was presented for recommending products based on product and feature ratings. The technique is appropriate for an active recommender system, which entails the application of domain expertise in devising a suitable facet classification scheme. The execution is intentionally dependent on domain-specific judgements and is sensitive to the values chosen. Faceted recommendation is useful to the extent that the application domain can be structured by accepted, well-understood facets.

Future work is required to assist the domain expert in building the inference model in two respects: i) sensitivity analysis of the value assignments and ii) using observation sets to learn the relative contribution of features to user preference. A fruitful avenue of investigation is to represent the model as a Bayesian network [6,13] and apply techniques of sensitivity analysis [9] and learning [5].


References

1. Aggarwal, C., Wolf, J., Wu, K. and Yu, P. Horting Hatches an Egg: a New Graph-theoretic Approach to Collaborative Filtering. In Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Diego, CA (1999) 201-212

2. Breese, J., Heckerman, D. and Kadie, C. Empirical Analysis of Predictive Algorithms for Collaborative Filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence (1998) 43-52

3. Damiani, E., Fugini, E. and Bellettini, C. A Hierarchy-Aware Approach to Faceted Classification of Object-Oriented Components. ACM Transactions on Software Engineering and Methodology 8:3 (1999) 215-262

4. Goldberg, D., Nichols, D., Oki, B. and Terry, D. Using Collaborative Filtering to Weave an Information Tapestry. Communications of the ACM 35:12 (1992) 61-70

5. Heckerman, D. A Tutorial on Learning with Bayesian Networks. Microsoft Technical Report MSR-TR-95-06 (1995)

6. Heckerman, D. and Wellman, M. Bayesian Networks. Communications of the ACM 38:3 (1995) 27-30

7. Herlocker, J., Konstan, J., Borchers, A. and Riedl, J. An Algorithmic Framework for Performing Collaborative Filtering. In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Berkeley, CA (1999) 230-237

8. Hill, W., Stead, L., Rosenstein, M. and Furnas, G. Recommending and Evaluating Choices in a Virtual Community of Use. In Proceedings of the ACM Conference on Human Factors in Computing Systems, Denver, CO (1995) 194-201

9. Howard, R. and Matheson, J. (eds.). The Principles and Applications of Decision Analysis. Strategic Decisions Group, Menlo Park, CA (1983)

10. Loney, F. Normalization of a Bounded Observation Set (or How to Compare Apples and Oranges). Spirited Software Technical Report SSI-TR 2001-01, available at www.spiritedsw.com/pub/techreports/tr2001-01.pdf (2001)

11. Prieto-Díaz, R. Implementing Faceted Classification for Software Reuse. Communications of the ACM 34:5 (1991) 88-97

12. Resnick, P., Iacovou, N., Suchak, M., Bergstrom, P. and Riedl, J. GroupLens: an Open Architecture for Collaborative Filtering of Netnews. In Proceedings of the Fifth ACM Conference on Computer Supported Cooperative Work, Chapel Hill, NC (1994) 175-186

13. Ribeiro, B. and Muntz, R. A Belief Network Model for IR. In Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Zurich, Switzerland (1996) 253-260

14. Sarwar, B., Karypis, G., Konstan, J. and Riedl, J. Analysis of Recommendation Algorithms for E-Commerce. In Proceedings of the Second ACM Conference on Electronic Commerce, Minneapolis, MN (2000) 158-167

15. Schafer, J.B., Konstan, J. and Riedl, J. Recommender Systems in E-Commerce. In Proceedings of the First ACM Conference on Electronic Commerce, Denver, CO (1999) 158-166

16. Schapire, R. and Singer, Y. Improved Boosting Algorithms Using Confidence-rated Predictions. Machine Learning 37:3 (1999) 297-336

17. Shardanand, U. and Maes, P. Social Information Filtering: Algorithms for Automating "Word of Mouth". In Proceedings of the ACM Conference on Human Factors in Computing Systems, Denver, CO (1995) 210-217