
Proceedings of Machine Learning Research 81:1–15, 2018 Conference on Fairness, Accountability, and Transparency

Recommendation Independence

Toshihiro Kamishima [email protected]

Shotaro Akaho [email protected]

Hideki Asoh [email protected]

National Institute of Advanced Industrial Science and Technology (AIST),

AIST Tsukuba Central 2, Umezono 1–1–1, Tsukuba, Ibaraki, Japan 305–8568

Jun Sakuma [email protected]

University of Tsukuba, 1–1–1 Tennodai, Tsukuba, Ibaraki, Japan 305–8577; and

RIKEN Center for Advanced Intelligence Project, 1–4–1 Nihonbashi, Chuo-ku, Tokyo, Japan 103-0027

Editors: Sorelle A. Friedler and Christo Wilson

Abstract

This paper studies a recommendation algorithm whose outcomes are not influenced by specified information. It is useful in contexts where potentially unfair decisions should be avoided, such as job-applicant recommendations that are not influenced by socially sensitive information. An algorithm that could exclude the influence of sensitive information would thus be useful for job-matching with fairness. We call the condition between a recommendation outcome and a sensitive feature Recommendation Independence, which is formally defined as statistical independence between the outcome and the feature. Our previous independence-enhanced algorithms simply matched the means of predictions between sub-datasets consisting of the same sensitive value. However, this approach could not remove the sensitive information represented by the second or higher moments of distributions. In this paper, we develop new methods that can deal with the second moment, i.e., variance, of recommendation outcomes without increasing the computational complexity. These methods can more strictly remove the sensitive information, and experimental results demonstrate that our new algorithms can more effectively eliminate the factors that undermine fairness. Additionally, we explore potential applications for independence-enhanced recommendation, and discuss its relation to other concepts, such as recommendation diversity.

* Our experimental codes are available at http://www.kamishima.net/iers/

Keywords: Recommender System, Fairness, Independence

1. Introduction

A recommender system searches for items or information predicted to be useful to users, and its influence on users' decision-making has been growing. For example, online-retail-store customers check recommendation lists, and are more likely to decide to buy highly-rated items. Recommender systems have thus become an indispensable tool in support of decision-making.

Such decision-making support tools must be fair and unbiased, because users can make poor decisions if recommendations are influenced by specific information that does not match their needs. Hence, a recommendation algorithm that can exclude the influence of such information from its outcome would be very valuable. There are several representative scenarios in which the exclusion of specific information would be necessary. First, there are contexts in which recommendation services must be managed in adherence to laws and regulations. Sweeney presented an example of dubious advertisement placement that appeared to exhibit racial discrimination (Sweeney, 2013). In this case, the advertisers needed to generate personalized advertisements that were independent of racial information. Another concern is the fair treatment of information providers. An example in this context is the Federal Trade Commission's investigation of Google to determine whether the search engine ranks its own services higher than those of competitors (Forden, 2012). Algorithms that

© 2018 T. Kamishima, S. Akaho, H. Asoh & J. Sakuma.


can explicitly exclude information, whether or not content providers are competitors, would be helpful for alleviating competitors' doubts that their services are being unfairly underrated. Finally, a user sometimes needs to exclude the influence of unwanted information. Popularity bias, which is the tendency for popular items to be recommended more frequently (Celma and Cano, 2008), is a well-known drawback of recommendation algorithms. If information on the popularity of items could be excluded, users could acquire information free from unwanted popularity bias.

To fulfill the need for techniques to exclude the influence of specific information, several methods for fairness-aware data mining have been developed (for a review, see Hajian et al., 2016). In these approaches, a classifier is designed to predict labels so that they are independent of specified sensitive information. By introducing this idea, we proposed the concept of recommendation independence. This is formally defined as unconditional statistical independence between a recommendation outcome and specified information. We call a recommendation that maintains recommendation independence an independence-enhanced recommendation. We developed two types of approaches to these recommendations. The first is a regularization approach, which adopts an objective function with a constraint term for imposing independence (Kamishima et al., 2012a, 2013; Kamishima and Akaho, 2017). The second is a model-based approach, which adopts a generative model in which an outcome and a sensitive feature are independent (Kamishima et al., 2016).

In this paper, we propose new methods for making independence-enhanced recommendations. Our previous model (Kamishima et al., 2013) took a regularization approach and combined probabilistic matrix factorization and a constraint term. However, because the constraint term was heuristically designed so that it matched means by shifting predicted ratings, it could not remove the sensitive information represented by the second or higher moments of distributions. Further, the approach could not control the range of predicted ratings, and thus would skew the rating distribution. For example, if all predicted ratings were shifted toward +1, the lowest ratings would not appear in the predictions. To remove these drawbacks without sacrificing computational efficiency, we developed two new types of constraint terms exploiting statistical measures: Bhattacharyya distance and mutual information.

We performed more extensive experiments than in our previous studies in order to achieve more reliable verification. Here, we examine algorithms on three datasets and six types of sensitive features to confirm the effects of independence-enhancement. To verify the improvements that derive from considering the second moments, we quantitatively compared the quality of rating distributions using an independence measure.

Moreover, we explore scenarios in which independence-enhanced recommendation would be useful, and clarify the relation between recommendation independence and other recommendation research topics. We provide more examples of three types of scenarios in which enhancement of recommendation independence would be useful. As in the discussion in the RecSys 2011 panel (Resnick et al., 2011), rich recommendation diversity has been considered beneficial for making recommendations fair. We discuss the differences in the definitions of recommendation diversity and independence. We also note the relation to transparency and privacy in a recommendation context.

Our contributions are as follows.

• We develop new independence-enhanced recommendation models that can deal with the second moment of distributions without sacrificing computational efficiency.

• Our more extensive experiments reveal the effectiveness of enhancing recommendation independence and of considering the second moments.

• We explore applications in which recommendation independence would be useful, and reveal the relation of independence to the other concepts in recommendation research.

This paper is organized as follows. In section 2, we present the concept of recommendation independence, and discuss how the concept would be useful for solving real-world problems. Methods for independence-enhanced recommendation are proposed in section 3, and the experimental results are presented in section 4. Section 5 contains a discussion about recommendation independence and related recommendation issues, and section 6 concludes our paper.


Figure 1: Distributions of the predicted ratings for each sensitive value. Panels: (a) standard; (b) independence-enhanced. The X-axes range from dislike to like.

2. Recommendation Independence

This section provides a formal definition of recommendation independence, and we show applications of this concept for solving real problems.

2.1. Formal Definition

Before formalizing the concept of recommendation independence, we will give a more intuitive description of this concept. Recommendation independence is defined as the condition that the generation of a recommendation outcome is statistically independent of a specified sensitive feature. This implies that information about the feature has no impact on the recommendation outcomes. We will take the example of a movie recommendation. In this example, we select the movie-release year as a sensitive feature, in order to prevent the release year from influencing the recommendation of individual movies. Therefore, assuming that there are two movies whose features are all the same except for their release year, this independence-enhanced recommender makes identical predictions for these two movies.

To illustrate the effect of enhancing recommendation independence, Figure 1 shows the distributions of predicted ratings for each sensitive value for the ML1M-Year dataset. The details will be shown in section 4; here, we briefly note that ratings for movies are predicted and the sensitive feature represents whether or not movies were released before 1990. Black and gray bars show the distributions of ratings for older and newer movies, respectively. In Figure 1(a), ratings are predicted by a standard algorithm, and older movies are highly rated (note the large gaps between the two bars indicated by arrowheads). When recommendation independence is enhanced (η=100) as in Figure 1(b), the distributions of ratings for older and newer movies become much closer (the large gaps are lessened); that is to say, the predicted ratings are less affected by the sensitive feature. It follows from this figure that the enhancement of recommendation independence reduces the influence of a specified sensitive feature on the outcome.

We can now formalize the above intuitive definition of recommendation independence. Consider an event in which all the information required to make a recommendation, such as the specifications of a user and item and all features related to them, is provided and a recommendation result is inferred from this information. This event is represented by a triplet of three random variables: R, S, and F. R represents a recommendation outcome, which is typically a rating value or an indicator of whether or not a specified item is relevant. We call S a sensitive feature, using the terminology in the fairness-aware data mining literature. Finally, F represents all ordinary features (or features) related to this event other than those represented by R and S. The recommendation is made by inferring the value of R given the values of S and F based on a probabilistic recommendation model, Pr[R|S, F].

Based on information theory, the statement "generation of a recommendation outcome is statistically independent of a specified sensitive feature" describes the condition in which the mutual information between R and S is zero, I(R;S) = 0. This means that we know nothing about R even if we obtain information about S, because information on S is excluded from R. This condition is equivalent to statistical independence between R and S, i.e., Pr[R] = Pr[R|S], as denoted by R ⊥⊥ S.
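While the paper's experiments quantify independence with a Kolmogorov-Smirnov statistic (section 4.1), the defining quantity I(R;S) can also be estimated directly from a set of predictions. The following sketch is our illustration rather than the authors' code: it discretizes R into histogram bins and evaluates the mutual information of the empirical joint distribution. All function names here are hypothetical.

```python
import numpy as np

def empirical_mutual_information(r, s, n_bins=20):
    """Estimate I(R;S) from predicted ratings r and binary sensitive values s.

    R is discretized into histogram bins; a value near 0 suggests that the
    recommendation outcomes carry no information about the sensitive feature.
    """
    r = np.asarray(r, dtype=float)
    s = np.asarray(s, dtype=int)
    edges = np.histogram_bin_edges(r, bins=n_bins)
    r_bin = np.clip(np.digitize(r, edges) - 1, 0, n_bins - 1)

    # Empirical joint distribution Pr[R=b, S=v] and its marginals.
    joint = np.zeros((n_bins, 2))
    np.add.at(joint, (r_bin, s), 1.0)
    joint /= joint.sum()
    p_r = joint.sum(axis=1, keepdims=True)
    p_s = joint.sum(axis=0, keepdims=True)

    mask = joint > 0  # skip empty cells to avoid log(0)
    return float(np.sum(joint[mask] * np.log(joint[mask] / (p_r @ p_s)[mask])))
```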

2.2. Applications

We here consider recommendation applications for which independence should be enhanced.

2.2.1. Adherence to Laws and Regulations

Recommendation services must be managed while adhering to laws and regulations. We will consider the example of a suspicious advertisement placement based on keyword-matching (Sweeney, 2013). In this case, users whose names are more popular among individuals of African descent than European descent


were more frequently shown advertisements implying arrest records. According to an investigation, however, no deliberate manipulation was responsible; rather, the bias arose simply as a side-effect of algorithms to optimize the click-through rate. Because similar algorithms of Web content optimization are used for online recommendations, such as online news recommendations, similar discriminative recommendations can be provided in these contexts. For example, independence-enhanced recommendation would be helpful for matching an employer and a job applicant based not on gender or race, but on other factors, such as the applicant's skill level at the tasks required for the job.

Recommendation independence is also helpful for avoiding the use of information that is restricted by law or regulation. For example, privacy policies prohibit the use of certain types of information for the purpose of making recommendations. In such cases, by treating the prohibited information as a sensitive feature, the information can be successfully excluded from the prediction process of recommendation outcomes.

2.2.2. Fair Treatment of Content Providers

Recommendation independence can be used to ensure the fair treatment of content providers or product suppliers. The Federal Trade Commission has been investigating Google to determine whether the search engine ranks its own services higher than those of competitors (Forden, 2012). The removal of deliberate manipulation is currently considered to ensure the fair treatment of content providers. However, algorithms that can explicitly exclude information on whether or not content providers are competitors would be helpful for dismissing the competitors' doubts that their services may be unfairly underrated.

Though this case is about information retrieval, the treatment of content providers in the course of generating recommendations can also be problematic. Consider the example of an online retail store that directly sells items in addition to renting a portion of its Web site to tenants. On the retail Web site, if directly sold items are overrated in comparison to items sold by tenants, then the trade conducted between the site owner and the tenants is considered unfair. To carry on a fair trade, the information on whether an item is sold by the owner or the tenants should be ignored. An independence-enhanced recommender would be helpful for this purpose. Note that enhancing this type of recommendation independence is not disadvantageous to retail customers, because items are equally rated if they are equivalent and sold under equivalent conditions.

2.2.3. Exclusion of Unwanted Information

Users may want recommenders to exclude the influence of specific information. We give several examples. Enhancing independence is useful for correcting a popularity bias, which is the tendency for popular items to be recommended more frequently (Celma and Cano, 2008). If users are already familiar with the popular items and are seeking minor and long-tail items that are novel to them, this popularity bias will be unfavorable to them. In this case, the users can specify the volume of consumption of items as a sensitive feature, and the algorithm will provide recommendations that are independent of information about the popularity of items.

Deviations in preference data can also be adjusted. As is well known, preference data are affected by their elicitation interface. For example, the response of users can change depending on whether or not predicted ratings are displayed (Cosley et al., 2003). By making a recommendation independent of the distinction between preference-elicitation interfaces, such unwanted deviations can be canceled.

When users explicitly wish to ignore specific information, such information can be excluded by enhancing the recommendation independence. Pariser recently introduced the concept of the filter bubble problem, which is the concern that personalization technologies narrow and bias the topics of interest provided to technology consumers, who do not notice this phenomenon (Pariser, 2011). If a user of a social network service wishes to converse with people having a wide variety of political opinions, a friend recommendation that is independent of the friends' political convictions will provide an opportunity to meet people with a wide range of views.


3. Independence-Enhanced Recommendation

In this section, after formalizing a task of independence-enhanced recommendation, we show an independence-enhanced variant of a probabilistic matrix factorization model. Finally, we show the previous and our new types of penalty terms required for enhancing the recommendation independence.

3.1. Task Formalization

We here formalize a task of independence-enhanced recommendation. Recommendation tasks can be classified into three types: finding good items that meet a user's interest, optimizing the utility of users, and predicting ratings of items for a user (Gunawardana and Shani, 2009). We here focus on the following predicting-ratings task. X ∈ {1, …, n} and Y ∈ {1, …, m} denote random variables for the user and item, respectively. R denoted a random variable for the recommendation outcome in the previous section, but this R is now restricted to the rating of Y given by X, and we hereafter refer to this R as a rating variable. The instances of X, Y, and R are denoted by x, y, and r, respectively.

As described in the previous section, we additionally introduce a random sensitive variable, S, which indicates the sensitive feature with respect to which the independence is enhanced. This variable is specified by a user or manager of recommenders, and its value depends on various aspects of an event, as in the examples in section 2.2. In this paper, we restrict the domain of a sensitive feature to a binary type, {0, 1}. A training datum consists of an event, (x, y), a sensitive value for the event, s, and a rating value for the event, r. A training dataset is a set of N training data, D = {(x_i, y_i, s_i, r_i)}, i = 1, …, N. We define D^(s) as the subset consisting of all data in D whose sensitive value is s.

Given a new event, (x, y), and its corresponding sensitive value, s, a rating prediction function, r(x, y, s), predicts the rating of the item y by the user x, and satisfies r(x, y, s) = E_{Pr[R|x,y,s]}[R]. This rating prediction function is estimated by using a regularization approach developed in the context of fairness-aware data mining (Kamishima et al., 2012b). This approach is to optimize an objective function having three components: a loss function, loss(r*, r), an independence term, ind(R, S), and a regularization term, reg. The loss function represents the dissimilarity between a true rating value, r*, and a predicted rating value, r. The independence term quantifies the expected degree of independence between the predicted rating values and sensitive values; a larger value of this term indicates a higher level of independence. The aim of the regularization term is to avoid over-fitting. Given a training dataset, D, the goal of the independence-enhanced recommendation is to acquire a rating prediction function, r(x, y, s), such that the expected value of the loss function is as small as possible and the independence term is as large as possible. The goal can be accomplished by finding a rating prediction function, r, that minimizes the following objective function:

$$\sum_{(x_i,y_i,s_i,r_i)\in D} \mathrm{loss}\bigl(r_i, r(x_i,y_i,s_i)\bigr) - \eta\,\mathrm{ind}(R,S) + \lambda\,\mathrm{reg}(\Theta), \tag{1}$$

where η > 0 is an independence parameter to balance the loss and the independence, λ > 0 is a regularization parameter, and Θ is the set of model parameters.
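As a purely illustrative rendering of objective (1), the sketch below assembles the three components from pluggable callables; the names `predict` and `ind_term` are our own placeholders, not the authors' API. Any of the independence terms defined in section 3.3 can be passed as `ind_term`.

```python
import numpy as np

def objective(theta, data, predict, ind_term, eta=1.0, lam=0.01):
    """Objective (1): sum of losses - eta * ind(R,S) + lambda * reg(Theta).

    theta    : list of parameter arrays
    data     : iterable of (x, y, s, r) training tuples
    predict  : function (theta, x, y, s) -> predicted rating
    ind_term : function (predictions, sensitive values) -> independence
               measure; larger values indicate a higher level of independence
    """
    preds = np.array([predict(theta, x, y, s) for x, y, s, _ in data])
    truth = np.array([r for _, _, _, r in data])
    svals = np.array([s for _, _, s, _ in data])

    loss = np.sum((truth - preds) ** 2)       # squared loss
    reg = sum(np.sum(p ** 2) for p in theta)  # L2 regularizer
    return loss - eta * ind_term(preds, svals) + lam * reg
```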

3.2. An Independence-Enhanced Recommendation Model

We adopt a probabilistic matrix factorization (PMF) model (Salakhutdinov and Mnih, 2008) to predict ratings. Though there are several minor variants of this model, we here use the following model, defined as equation (3) in (Koren, 2008):

$$r(x, y) = \mu + b_x + c_y + \mathbf{p}_x^\top \mathbf{q}_y, \tag{2}$$

where μ, b_x, and c_y are global, per-user, and per-item bias parameters, respectively, and p_x and q_y are K-dimensional parameter vectors, which represent the cross effects between users and items. The parameters of the model are estimated by minimizing the squared loss function with an L2 regularizer term. This model is proved to be equivalent to assuming that true rating values are generated from a normal distribution whose mean is equation (2). Unfortunately, in the case that not all entries of a rating matrix are observed, the objective function


of this model is non-convex, and only local optima can be found. However, it is empirically known that a simple gradient method succeeds in finding a good solution in most cases (Koren, 2008).

The PMF model was then extended to enhance the recommendation independence. First, the prediction function (2) was modified so that it is dependent on the sensitive value, s.

For each value of s, 0 and 1, parameter sets, μ^(s), b_x^(s), c_y^(s), p_x^(s), and q_y^(s), are prepared. One of the parameter sets is chosen according to the sensitive value, and the rating prediction function becomes:

$$r(x, y, s) = \mu^{(s)} + b_x^{(s)} + c_y^{(s)} + \mathbf{p}_x^{(s)\top} \mathbf{q}_y^{(s)}. \tag{3}$$

By using this prediction function (3), the objective function of an independence-enhanced recommendation model becomes:

$$\sum_{(x_i,y_i,s_i,r_i)\in D} \bigl(r_i - r(x_i, y_i, s_i)\bigr)^2 - \eta\,\mathrm{ind}(R,S) + \lambda\,\mathrm{reg}(\Theta), \tag{4}$$

where the regularization term is the sum of L2 regularizers of the parameter sets for each value of s, except for the global biases, μ^(s). Model parameters, Θ^(s) = {μ^(s), b_x^(s), c_y^(s), p_x^(s), q_y^(s)}, for s ∈ {0, 1}, are estimated so as to minimize this objective. Once we learn the parameters of the rating prediction function, we can predict a rating value for any event by applying equation (3).

3.3. Independence Terms

Now, all that remains is to define the independence terms. Having previously introduced our independence term of mean matching (Kamishima et al., 2013), we here propose two new independence terms: distribution matching and mutual information.

3.3.1. Mean Matching

Our previous independence term in (Kamishima et al., 2013) was designed to match the means of the two distributions Pr[R|S=0] and Pr[R|S=1], because these means match if R and S are statistically independent. The independence term is the negated squared norm between the means of these distributions:

$$-\left(\frac{S^{(0)}}{N^{(0)}} - \frac{S^{(1)}}{N^{(1)}}\right)^2, \tag{5}$$

where N^(s) = |D^(s)| and S^(s) is the sum of predicted ratings over the set D^(s),

$$S^{(s)} = \sum_{(x_i,y_i,s_i)\in D^{(s)}} r(x_i, y_i, s_i). \tag{6}$$

We refer to this independence term as mean matching, which we abbreviate as mean-m.
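In code, the mean-m term of equation (5) is a one-liner over the predictions of the two sensitive groups. This hedged sketch (our own naming) also fixes the signature that the two new terms below will share.

```python
import numpy as np

def mean_match(preds, svals):
    """mean-m term (5): negated squared gap between the group means,
    i.e., -(S^(0)/N^(0) - S^(1)/N^(1))^2."""
    m0 = preds[svals == 0].mean()
    m1 = preds[svals == 1].mean()
    return -(m0 - m1) ** 2
```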

3.3.2. Distribution Matching

To remedy the drawback that the mean-m term ignores the second moment, we propose a new independence term. Techniques for handling the independence between a continuous target variable and a sensitive feature have not been fully discussed. The method in (Calders et al., 2013) is basically the same as the approach of the mean-m. Perez-Suay et al. (2017) proposed the use of the Hilbert-Schmidt independence criterion. However, this approach requires a computational complexity of O(N²) for computing a kernel matrix, and it is not scalable.

We therefore create our new independence term, distribution matching with Bhattacharyya distance, which we abbreviate as bdist-m. This bdist-m term can deal with the second moment of distributions in addition to the first moment. For this purpose, each of the two distributions, Pr[R|S=0] and Pr[R|S=1], is modeled by a normal distribution, and the similarity between them is quantified by a negative Bhattacharyya distance:

$$-\left(-\ln \int_{-\infty}^{\infty} \sqrt{\Pr[r|S{=}0]\,\Pr[r|S{=}1]}\,dr\right) = \frac{1}{2}\ln\frac{2\sqrt{V^{(0)}V^{(1)}}}{V^{(0)}+V^{(1)}} - \frac{\left(\frac{S^{(0)}}{N^{(0)}} - \frac{S^{(1)}}{N^{(1)}}\right)^2}{4\left(V^{(0)}+V^{(1)}\right)}, \tag{7}$$

where V^(s) is the variance of predicted ratings over the training set D^(s). To estimate V^(s), we use the expectation of a posterior distribution of a variance parameter, derived from equation (2.149) in (Bishop, 2006), to avoid a zero variance:

$$V^{(s)} = \frac{2b_0 + Q^{(s)} - (S^{(s)})^2/N^{(s)}}{2a_0 + N^{(s)}}, \tag{8}$$

where S^(s) is equation (6), and Q^(s) is the squared sum of predicted ratings:

$$Q^{(s)} = \sum_{(x_i,y_i,s_i)\in D^{(s)}} r(x_i, y_i, s_i)^2. \tag{9}$$

a_0 and b_0 are hyper-parameters of a prior Gamma distribution. We use 2a_0 = 10^{-8} and 2b_0 = 10^{-24}.
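Putting equations (7)-(9) together, the bdist-m term only needs the counts, sums, and squared sums of each group's predictions, which is what keeps it O(N). A hedged sketch, using the hyper-parameter values above:

```python
import numpy as np

A0 = 1e-8 / 2    # a0, so that 2*a0 = 1e-8 as in the text
B0 = 1e-24 / 2   # b0, so that 2*b0 = 1e-24

def posterior_variance(preds):
    """Variance estimate (8) with a Gamma prior, avoiding zero variance."""
    n = len(preds)
    s = preds.sum()            # S^(s), equation (6)
    q = (preds ** 2).sum()     # Q^(s), equation (9)
    return (2 * B0 + q - s ** 2 / n) / (2 * A0 + n)

def bdist_match(preds, svals):
    """bdist-m term (7): negative Bhattacharyya distance between normal
    models of Pr[R|S=0] and Pr[R|S=1]."""
    r0, r1 = preds[svals == 0], preds[svals == 1]
    v0, v1 = posterior_variance(r0), posterior_variance(r1)
    mean_gap = (r0.mean() - r1.mean()) ** 2
    return (0.5 * np.log(2 * np.sqrt(v0 * v1) / (v0 + v1))
            - mean_gap / (4 * (v0 + v1)))
```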

6

Page 7: Recommendation Independence - Kamishima › archive › 2018-p-fat-print.pdf · Proceedings of Machine Learning Research 81:1–15,2018 ConferenceonFairness,Accountability,andTransparency

Recommendation Independence

3.3.3. Mutual Information

We next propose our second new independence term, mutual information with normal distributions, abbreviated as mi-normal. We employ mutual information for quantifying the degree of statistical independence. The distributions, Pr[R|S], are modeled by normal distributions. Our new mi-normal term is defined as the negative mutual information between R and S:

$$-I(R;S) = -\bigl(H(R) - H(R|S)\bigr) = -\left(H(R) - \sum_s \Pr[S{=}s]\, H(R|S{=}s)\right), \tag{10}$$

where H(·) is a differential entropy function. We start with the second term of this equation. Because S is a binary variable, Pr[S=s] can be easily estimated by N^(s)/N. To estimate H(R|S=s), we model Pr[R|S=s] by a normal distribution. By using the formula for the differential entropy of a normal distribution (e.g., see example 8.1.2 in Cover and Thomas, 2006), we get

$$H(R|S{=}s) = \tfrac{1}{2} \ln 2\pi e V^{(s)}. \tag{11}$$

We next turn to the first term of equation (10). Pr[R] is a mixture of two normal distributions:

$$\Pr[R] = \sum_s \Pr[S{=}s] \Pr[R|S{=}s]. \tag{12}$$

We approximate Pr[R] by a single normal distribution with the same mean and variance as this mixture distribution. We consider this approximation proper, because it captures the first two moments of the mixture. Further, when R and S are completely independent, so that the means of the two normal distributions are equal, the mixture is precisely equivalent to a single normal distribution. This property is desirable, because we are trying to make R and S independent. By using the entropy of a normal distribution, we get

$$H(R) = \tfrac{1}{2} \ln 2\pi e V, \tag{13}$$

$$V = \frac{4b_0 + (Q^{(0)} + Q^{(1)}) - (S^{(0)} + S^{(1)})^2/N}{4a_0 + N}. \tag{14}$$

Finally, by substituting equations (11) and (13) into equation (10), we obtain the mi-normal independence term.

These new independence terms, bdist-m and mi-normal, are computationally efficient. Because these terms are smooth and analytically differentiable like the mean-m term, the objective function can be optimized efficiently. Their computational complexity is dominated by the sum and the squared sum of the data, and is thus O(N). Because this complexity is of the same order as that of the loss function of the original PMF model, the total computational complexity of an independence-enhanced variant is the same as that of the original algorithm. Additionally, since these new terms take the first two moments of distributions into account, they give a much better approximation than the mean-m term. Finally, we note the difference between the bdist-m and mi-normal terms: a bdist-m term does not approximate Pr[R], unlike a mi-normal term, while a mi-normal term can be straightforwardly extended to the case where a sensitive feature is of a categorical type.
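The mi-normal term likewise reduces to sums and squared sums. The sketch below reuses `posterior_variance`, `A0`, and `B0` from the bdist-m sketch above, and is again our own illustrative rendering rather than the authors' implementation.

```python
import numpy as np

def mi_normal(preds, svals):
    """mi-normal term (10): -I(R;S) = -(H(R) - H(R|S)) under normal models."""
    n = len(preds)
    groups = [preds[svals == v] for v in (0, 1)]

    # H(R|S) = sum_s Pr[S=s] * (1/2) ln(2 pi e V^(s))   -- equation (11)
    h_cond = sum(len(g) / n
                 * 0.5 * np.log(2 * np.pi * np.e * posterior_variance(g))
                 for g in groups)

    # H(R) from a single normal matching the mixture's first two moments,
    # with variance V as in equation (14).
    s, q = preds.sum(), (preds ** 2).sum()
    v = (4 * B0 + q - s ** 2 / n) / (4 * A0 + n)
    h_marg = 0.5 * np.log(2 * np.pi * np.e * v)

    return -(h_marg - h_cond)
```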

Table 1: Summary of experimental conditions

    data                       ML1M          Flixster       Sushi
    # of users                 6,040         147,612        5,000
    # of items                 3,706         48,794         100
    # of ratings               1,000,209     8,196,077      50,000
    rating scale               1, 2, ..., 5  .5, 1, ..., 5  0, 1, ..., 4
    mean rating                3.58          3.61           1.27
    # of latent factors, K     7             20             5
    regularization param., λ   1             30             10
    non-personalized MAE       0.934         0.871          1.081
    standard MAE               0.685         0.655          0.906

4. Experiments

We implemented independence-enhanced recommenders and applied them to benchmark datasets. Below, we present the details regarding these datasets and experimental conditions, then compare the three independence terms: mean-m, bdist-m, and mi-normal. These comparative experiments confirm the effectiveness of independence-enhanced recommenders and show the advantages gained by considering the second moments.

4.1. Datasets and Experimental Conditions

We can now consider the experimental conditions in detail, including the datasets, methods, and evaluation indexes. To confirm the effectiveness of independence-enhancement more clearly, we tested our methods on the three datasets summarized in Tables 1 and 2. The first dataset was the


Table 2: Sizes, means, and variances of data subsets for each sensitive value

                      size                   mean           variance
    Datasets          S=0        S=1         S=0    S=1     S=0    S=1
    ML1M-Year         456,683    543,526     3.72   3.46    1.15   1.30
    ML1M-Gender       753,769    246,440     3.57   3.62    1.25   1.23
    Flixster          3,868,842  4,327,235   3.71   3.53    1.18   1.18
    Sushi-Age         3,150      46,850      2.39   2.75    1.75   1.60
    Sushi-Gender      23,730     26,270      2.80   2.66    1.43   1.77
    Sushi-Seafood     43,855     6,145       2.76   2.46    1.61   1.58

Movielens 1M dataset (ML1M) (Maxwell Harper and Konstan, 2015). We tested two types of sensitive features for this set. The first, Year, represented whether a movie's release year was later than 1990. We selected this feature because it has been proven to influence preference patterns, as described in section 2.1. The second feature, Gender, represented the user's gender. The movie rating depended on the user's gender, and our recommender enhanced the independence of this factor. The second dataset was the larger Flixster dataset (Jamali and Ester, 2010). Because neither the user nor the item features were available, we adopted the popularity of items as a sensitive feature. Candidate movies were first sorted by the number of users who rated the movie in descending order, and a sensitive feature represented whether or not a movie was in the top 1% of this list. A total of 47.2% of ratings were assigned to this top 1% of items. The third dataset was the Sushi dataset¹ (Kamishima, 2003), for which we adopted three types of sensitive features: Age (whether a user was a teen or older), Gender (whether a user was male or female), and Seafood (whether or not a type of sushi was seafood). We selected these features because the means of ratings between the two subsets, D^(0) and D^(1), diverged.

We tested the three independence terms in section 3.3: mean-m, bdist-m, and mi-normal. The objective function (4) was optimized by the conjugate gradient method. We used the hyperparameters, the number of latent factors K and the regularization parameter λ, shown in Table 1. For each dataset, D^(0) and D^(1), the parameters were initialized by minimizing an objective function of a standard PMF model without an independence term. For convenience, in the experiments the loss term of an objective was rescaled by dividing it by the number of training examples. We performed a five-fold cross-validation procedure to obtain evaluation indexes of the prediction errors and independence measures.

1. http://www.kamishima.net/sushi/

We evaluated our experimental results in terms of prediction errors and the degree of independence. Prediction errors were measured by the mean absolute error (MAE) (Gunawardana and Shani, 2009). This index is defined as the mean of the absolute difference between the observed ratings and predicted ratings. A smaller value of this index indicates better prediction accuracy. To measure the degree of independence, we checked the equality between the two distributions of ratings predicted on the D^(0) and D^(1) datasets. As the measure of equality, we adopted the statistic of the two-sample Kolmogorov-Smirnov test (KS), which is a nonparametric test for the equality of two distributions. The KS statistic is defined as the area between the two empirical cumulative distributions of predicted ratings for D^(0) and D^(1). A smaller KS indicates that R and S are more independent.
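Note that the KS statistic is used here in its area form (the area between the two empirical CDFs) rather than the more common supremum form. A hedged sketch of this evaluation measure, with our own function name:

```python
import numpy as np

def ks_area(preds, svals):
    """KS measure as defined above: the area between the empirical CDFs of
    predicted ratings for D^(0) and D^(1). Smaller = more independent."""
    r0 = np.sort(preds[svals == 0])
    r1 = np.sort(preds[svals == 1])
    grid = np.sort(np.concatenate([r0, r1]))
    cdf0 = np.searchsorted(r0, grid, side="right") / len(r0)
    cdf1 = np.searchsorted(r1, grid, side="right") / len(r1)
    # Both ECDFs are constant between consecutive pooled points, so the
    # rectangle rule integrates |ECDF0 - ECDF1| exactly.
    return float(np.sum(np.abs(cdf0 - cdf1)[:-1] * np.diff(grid)))
```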

4.2. Experimental Results

In this section, we empirically examine the following two questions.

1. We consider whether independence-enhanced recommenders definitely enhance the recommendation independence. Such enhancement was not theoretically guaranteed for several reasons: the objective function (4) was not convex, the rating distributions were modeled by normal distributions, and we adopted an approximation in equation (13).

2. We compare the behaviors of the three independence terms. We specifically examine whether the bdist-m and mi-normal terms can take into account the second moment of distributions, which was ignored in the mean-m case.

To answer these questions, we generate two types of experimental results. First, we show the quantitative measures to evaluate the prediction accuracy and the degree of independence. Second, we qualitatively visualize the distributions of predicted ratings for the ML1M-Year dataset.


4.2.1. Evaluation by Quantitative Measures

To examine the two questions, we quantitatively analyze the evaluation measures. We focus on the first question — that is, the effectiveness of independence-enhancement. The evaluation measures, MAE and KS, for the six datasets are shown in Figure 2. We tested a standard PMF model and three independence-enhanced PMF models. We first investigate the KSs, which measure the degree of independence, as shown in the right column of Figure 2. We can observe a decreasing trend of KSs along with the increase of the independence parameter, η. In addition, all independence-enhanced recommenders could achieve better levels of independence than those obtained by a standard PMF model, if the independence parameter is sufficiently large. These trends verified that recommendation independence could be enhanced successfully. We then analyzed the MAEs, a measurement of prediction accuracy, in the left column of Figure 2. The MAEs derived from independence-enhanced PMF models were slightly increased when compared with those derived from a standard PMF model. Such losses in accuracy are inevitable, as we will discuss in section 5.1. However, the losses were slight, and these independence-enhanced models could accomplish good trade-offs between accuracy and independence. In summary, independence-enhanced recommenders could enhance recommendation independence successfully.

We then move on to the second question. To examine the influence of considering the second moments, we checked the means and standard deviations of the distributions for the ML1M-Year dataset. Test data were divided into two sub-datasets, D^(0) and D^(1), and the means and standard deviations of predicted ratings were computed for each pair of sub-datasets. These pairs of means and standard deviations are depicted in Figure 3. For all types of independence terms, we found that the pairs of means approached each other as the independence parameter increased. Note that this trend was observed for all datasets. On the other hand, the pairs of standard deviations became closer as the value of η increased in the bdist-m and mi-normal cases, but not in the mean-m case.

Figure 2: Changes in the MAE and KS measures

NOTE: The subfigure rows sequentially show the results for the ML1M-Year, ML1M-Gender, Flixster, Sushi-Age, Sushi-Gender, and Sushi-Seafood datasets. The X-axes represent the independence parameter, η, in a logarithmic scale (10⁻² to 10²). The Y-axes of the subfigures in the first and second columns represent MAEs and KSs, respectively, in a linear scale; note that the ranges are changed to emphasize the differences. Each subfigure plots curves for the standard, mean-m, bdist-m, and mi-normal methods; the results for the bdist-m and mi-normal terms overlap in some subfigures.


Figure 3: Changes of means and standard deviations of predicted ratings according to the parameter η

NOTE: The subfigures sequentially show the means and standard deviations of predicted ratings derived by the (a) mean-m, (b) bdist-m, and (c) mi-normal methods for the ML1M-Year dataset. The X-axes represent the independence parameter, η, in a logarithmic scale. Means and standard deviations for the two groups based on sensitive values are represented by the scales at the left and right sides of the subfigures, respectively. Pairs of means and standard deviations are depicted by red solid and blue dotted lines, respectively.

Similar trends were observed for all the datasets except Sushi-Age and Sushi-Seafood. Note that, according to our analysis, the failure to control the second moments for these two datasets was due to the imbalanced distributions of sensitive values shown in Table 2. We can conclude from these results that the methods using our new independence terms, bdist-m and mi-normal, can control the second moments of rating distributions, which our previous mean-m method could not control.

4.2.2. Qualitative Visualization of Predictions

To provide intuitive illustrations of independence-enhancement, we visualize the distributions of predicted ratings for the ML1M-Year dataset. In terms of the first research question, we show a comparison of the rating distributions predicted by the standard PMF model with those predicted by the mi-normal independence-enhanced PMF model in Figure 1. As described in section 2.1, the figure illustrates the effect of independence enhancement.

We here demonstrate the improvement gained by considering the second moments. By considering the second moments, our new models can remove the sensitive information more strictly, and can produce less skewed predicted ratings by adjusting their range. For this purpose, we visually compare the distributions of ratings predicted

Figure 4: Distributions of the ratings predicted by the mean-m and mi-normal methods for each sensitive value

NOTE: Panels: (a) mean-m; (b) mi-normal; the X-axes range from dislike to like. In the upper charts, black and gray bars show the histograms of ratings for test data in D^(0) and D^(1), respectively. In the lower charts, we show the values for the black bins minus those for the gray bins.

by ignoring the second moments, i.e., mean-m, with those predicted by considering them, i.e., mi-normal. Figure 4 shows the distributions of predicted ratings for each sensitive value for the ML1M-Year dataset, as in Figure 1. Figures 4(a) and 4(b) show the distributions of ratings predicted by the mean-m and mi-normal methods (η=100), respectively. First, the distributions for the datasets, D^(0) and D^(1), became much closer in the mi-normal case than in the mean-m case (see the smaller bars in the lower charts). This indicates that the mi-normal method could more strictly remove sensitive information by considering the second moments of distributions. Second, we examine the skew of the rating distributions. We concentrate on the rightmost bins in these histograms. The differences in these bins were larger than those obtained by a standard PMF model. This is because the distributions for D^(1) (gray) were shifted toward the plus side, and this bin contained all the test data whose ratings were predicted to be larger than the maximum of the rating range. The mi-normal method could achieve a smaller difference than the mean-m method by scaling the range of ratings, because the mi-normal method could control the second moments of distributions. However, such scaling was impossible in the mean-m case, because the mean-m could merely shift the distributions. This observation implies that the mi-normal method could produce a less skewed


distribution of predicted ratings. Note that we observed similar trends with the bdist-m method.

Finally, we comment on the difference between mi-normal and bdist-m. The differences in performance between these two methods in terms of accuracy and independence were very slight. Though the mi-normal method adopts an approximation, it can be straightforwardly extended to the case where a sensitive feature is a categorical discrete variable. Therefore, it would be better to use the mi-normal method in general.

From the above, it may be concluded that all independence terms could successfully enhance recommendation independence, and that our new independence terms, mi-normal and bdist-m, can enhance independence more strictly than the previous mean-m term.

5. Discussion and Related Work

Finally, we will discuss the characteristics of recommendation independence, explore the relation between recommendation independence and diversity, and present related topics.

5.1. Discussion

In this section, we discuss three topics with respect to recommendation independence. First, let us consider why a sensitive feature must be specified in the definition of recommendation independence. In brief, a sensitive feature must be selected because it is intrinsically impossible to personalize recommendation outcomes if the outcomes are independent of any feature. This is due to the ugly duckling theorem, a theorem in the pattern recognition literature that asserts the impossibility of classification without weighing certain features of objects as more important than others (Watanabe, 1969). Because recommendation is considered a task for classifying whether or not items are preferred, certain features inevitably must be weighed when making a recommendation. Consequently, it is impossible to treat all features equally. In the RecSys 2011 panel (Resnick et al., 2011), a panelist also pointed out that no information is neutral, and thus individuals are always influenced by information that is biased in some manner.

Second, we will consider the indirect influences of sensitive features. In section 2.1, we incorporated a sensitive feature into a prediction model, Pr[R|S, F]. It might appear that the model could be made independent by simply removing S, but this is not the case. By removing the sensitive information, the model satisfies the condition Pr[R|S, F] = Pr[R|F]. Using this equation, the probability distribution over (R, S, F) becomes

$$\Pr[R,S,F] = \Pr[R|S,F]\,\Pr[S|F]\,\Pr[F] = \Pr[R|F]\,\Pr[S|F]\,\Pr[F]. \tag{15}$$

This is the conditional independence between R and S given F, i.e., R ⊥⊥ S | F, which is different from the unconditional independence between R and S, i.e., R ⊥⊥ S. Under a condition of conditional independence, if there are features in F that are not independent of S, the outcomes will be influenced by S through the correlated features. This phenomenon was observed in the example of online advertising in section 2.2.1. Even though no information on the individuals' races was explicitly exploited, such an incident could arise through the influence of other features that indirectly contain information about their races. Note that, in fairness-aware data mining, such an indirect influence is called a red-lining effect (Calders and Verwer, 2010).

With respect to fairness-aware data mining, Kamiran et al. (2013) discussed a more complicated situation, which they called conditional fairness. In this case, some of the features in F are explainable even if they are correlated with a sensitive feature, in addition to being considered fair. For example, in the case of job-matching, if special skills are required for a target job, considering whether or not an applicant has these skills is socially fair even if it is correlated with the applicant's gender. Variables expressing such skills are treated as explainable variables, E, and the conditional independence between R and S given E, i.e., R ⊥⊥ S | E, is maintained. If E is a simple categorical variable, our method is applicable with a small modification: an independence term is computed for each subset of data having the same explainable value, e ∈ dom(E), and the sum of these terms weighted by Pr[e] is used as a constraint term, as sketched below. However, it would not be as easy for cases in which the domain of E is large or E is a continuous variable.
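The modification described above can be rendered concretely as follows. This is our hedged sketch of the idea, not an implementation from the paper; it reuses the independence-term signature from section 3.3 (e.g., mean_match or mi_normal).

```python
import numpy as np

def conditional_ind_term(preds, svals, evals, ind_term):
    """Conditional-fairness constraint: per-stratum independence terms
    weighted by Pr[e], for a categorical explainable variable E."""
    evals = np.asarray(evals)
    total = 0.0
    for e in np.unique(evals):
        mask = evals == e
        # A stratum containing only one sensitive group contributes nothing.
        if len(np.unique(svals[mask])) < 2:
            continue
        total += mask.mean() * ind_term(preds[mask], svals[mask])
    return total
```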


Finally, we will discuss the relation between accuracy and recommendation independence. Fundamentally, as recommendation independence is further enhanced, prediction accuracy tends to worsen. This is due to the decrease in the information available for inferring recommendation outcomes. When information about S is not excluded, the available information is the mutual information between R and (S, F), i.e., I(R;S,F). The information becomes I(R;F) after excluding the information about S. Because

$$I(R;S,F) - I(R;F) = I(R;S|F) \ge 0,$$

the available information is non-increasing when excluding the information on S. Hence, the trade-off for enhancing the independence generally worsens the prediction accuracy.

5.2. Relation to Recommendation Diversity

We will briefly discuss recommendation diversity (Kunaver and Pozrl, 2017), which is an attempt to recommend a set of items that are mutually less similar. McNee et al. (2006) pointed out that recommendation diversity is important, because users become less satisfied with recommended items if similar items are repeatedly shown. To our knowledge, Ziegler et al. (2005) were the first to propose an algorithm to diversify recommendation lists by selecting items less similar to those already selected. Lathia et al. (2010) discussed the concept of temporal diversity, which is not defined over a single recommendation list, but over temporally successive recommendations. Adomavicius and Kwon (2012) discussed aggregate diversity and individual diversity, defined over the items that are recommended to a whole population of users and to a specific user, respectively.

We wish to emphasize that independence is distinct from recommendation diversity. There are three major differences between recommendation diversity and independence. First, while diversity is the property of a set of recommendations, as described above, independence is a relation between each recommendation and a specified sensitive feature. Hence, it is impossible to diversify a single recommendation, but a single recommendation can be independent if its prediction of ratings or its determination of

Figure 5: Enhancement of recommendation diversity. Panels: (a) standard; (b) diversified; candidate movies are grouped into older and newer ones, with recommended items shaded.

whether to recommend or not is statistically independent of a given sensitive feature.

Second, recommendation independence depends on the specification of a sensitive feature, which is a function of an item and a user. On the other hand, recommendation diversity basically depends on the specification of how items are similar. Even when similarity metrics are not explicitly taken into account, similarities are implicitly considered in some form, for example, whether or not a pair of items are the same. Therefore, independence and diversity are applicable to different situations. Because a sensitive feature can represent the characteristics of users, independence is applicable for coping with the factors of users, such as the ML1M-Gender or Sushi-Age feature. Conversely, the relative difference between item properties is hard to process by using independence. For example, when using a similarity for diversity, one can represent whether or not two items are the same color, but it is not easy to capture such a feature by using a sensitive feature. In this way, diversity and independence are directed at different aspects of a recommendation context.

Third, while diversity seeks to provide a wider range of topics, independence seeks to provide unbiased information. Consider a case like that in Figure 1, where there are two types of candidate movies, older and newer, and older movies tend to be highly rated. As shown in Figure 5(a), a standard recommender selects top-rated movies, which are illustrated by shading. Newer movies are recommended less than older movies, because newer ones are rated lower. To enhance diversity, newer movies are added to a recommendation list instead of removing the older movies, as shown in Figure 5(b). As a result, both older and newer movies are recommended, and a wider range of movie topics is


provided to users. However, the predicted ratings are still affected by whether a movie is newer or older. In other words, this diversified recommendation list is biased in the sense that it is influenced by the release year of movies. This is highly contrasted with the case of recommendation independence shown in Figure 1(b). However, in this latter case, the inverse situation applies: even though the recommendation independence is enhanced, a recommender might select movies having highly skewed topics with respect to sensitive features other than the one specified. Therefore, as shown in this example, the purposes of recommendation diversity and independence are different.

5.3. Other Related Topics

In addition to the recommendation diversity, theconcept of recommendation independence hasconnection with the following research topics.We adopted techniques for fairness-aware data

mining to enhance the independence. Fairness-aware data mining is a general term for min-ing techniques designed so that sensitive infor-mation does not influence the mining outcomes.Pedreschi et al. (2008) first advocated such min-ing techniques, which emphasized the unfair-ness in association rules whose consequents in-clude serious determinations. Datta et al. (2016)quantified the influence of a sensitive future bya surrogate data test. Another technique offairness-aware data mining focuses on classifi-cations designed so that the influence of sen-sitive information on the predicted class is re-duced (Kamishima et al., 2012b; Calders andVerwer, 2010; Kamiran et al., 2012). Thesetechniques would be directly useful in the de-velopment of an independence-enhanced vari-ant of content-based recommender systems, be-cause content-based recommenders can be im-plemented by standard classifiers. Specifically,class labels indicate whether or not a target userprefers a target item, and the features of objectscorrespond to features of item contents.The concept behind recommendation trans-

The concept behind recommendation transparency is that it might be advantageous to explain the reasoning underlying individual recommendations. Indeed, such transparency has been proven to improve the satisfaction of users (Sinha and Swearingen, 2002), and different methods of explanation have been investigated (Herlocker et al., 2000). In the case of recommendation transparency, the system convinces users of its objectivity by demonstrating that the recommendations were not made with any malicious intention. In the case of independence, on the other hand, objectivity is guaranteed by mathematically defined principles. It would nevertheless be useful to design interfaces that quantify the degree of recommendation independence so as to improve the user experience. Just as it can be helpful to show the confidence of the prediction accuracy, it would be helpful to display measures of independence. Comparing independence-enhanced recommendations with non-enhanced recommendations would also be beneficial.
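For instance, the same statistic we use in our experiments could be surfaced to users. A small sketch (the report format is ours):

```python
import numpy as np
from scipy.stats import ks_2samp

def independence_report(pred, s):
    """Summarize how independent the predicted ratings `pred` are of
    the binary sensitive values `s`, using the Kolmogorov-Smirnov
    statistic between the two groups; a smaller KS means the two
    rating distributions are closer, i.e., more independent."""
    pred, s = np.asarray(pred), np.asarray(s)
    ks, _ = ks_2samp(pred[s == 0], pred[s == 1])
    return f"KS between sensitive groups: {ks:.3f} (0 = fully independent)"
```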

Because independence-enhanced recommendation can be used to avoid the exposure of private information when that information is represented by a sensitive feature, these techniques are related to privacy-preserving data mining. To protect the private information contained in rating information, dummy ratings can be added (Weinsberg et al., 2012). In addition, privacy attack strategies have been used as a tool for discrimination discovery (Ruggieri et al., 2014).

6. Conclusions

We proposed the concept of recommendation independence to exclude the influence of specified information from a recommendation outcome. Our previous attempts to enhance recommendation independence merely shifted the predicted ratings. In this paper, we developed new algorithms that can deal with the second moments of rating distributions, so that sensitive information can be removed more strictly. The advantages of the new algorithms were demonstrated by the experimental results. In addition, we explored applications of independence enhancement, and clarified the relations between recommendation independence and other recommendation topics, such as diversity.

Many capabilities are still required for independence-enhanced recommendation. While our current technique is mainly applicable to the task of predicting ratings, we plan to develop another algorithm for the find-good-items task. We also plan to explore other types of independence terms and approaches. Because sensitive features are currently restricted to binary types, we will try to develop independence terms that can deal with a categorical or continuous sensitive feature. Methods for handling more complicated conditional fairness, like that described in section 5.1, also need to be developed.

Acknowledgments

We gratefully acknowledge the valuable comments and suggestions of Dr. Paul Resnick. We would like to thank the GroupLens research lab and Dr. Mohsen Jamali for providing datasets. This work is supported by MEXT/JSPS KAKENHI Grant Numbers JP24500194, JP15K00327, and JP16H02864.

References

G. Adomavicius and Y. Kwon. Improving aggregate recommendation diversity using ranking-based techniques. IEEE Trans. on Knowledge and Data Engineering, 24(5):896–911, 2012.

C. M. Bishop. Pattern Recognition and Machine Learning. Information Science and Statistics. Springer, 2006.

T. Calders and S. Verwer. Three naive Bayes approaches for discrimination-free classification. Data Mining and Knowledge Discovery, 21:277–292, 2010.

T. Calders, A. Karim, F. Kamiran, W. Ali, and X. Zhang. Controlling attribute effect in linear regression. In Proc. of the 13th IEEE Int'l Conf. on Data Mining, pages 71–80, 2013.

O. Celma and P. Cano. From hits to niches?: or how popular artists can bias music recommendation and discovery. In Proc. of the 2nd KDD Workshop on Large-Scale Recommender Systems and the Netflix Prize Competition, 2008.

D. Cosley, S. K. Lam, I. Albert, J. A. Konstan, and J. Riedl. Is seeing believing? how recommender interfaces affect users' opinions. In Proc. of the SIGCHI Conf. on Human Factors in Computing Systems, pages 585–592, 2003.

T. M. Cover and J. A. Thomas. Elements of Information Theory. Wiley Series in Telecommunications and Signal Processing. Wiley, second edition, 2006.

A. Datta, S. Sen, and Y. Zick. Algorithmic transparency via quantitative input influence. In IEEE Symposium on Security and Privacy, 2016.

S. Forden. Google said to face ultimatum from FTC in antitrust talks. Bloomberg, Nov. 13, 2012. http://bloom.bg/PPNEaS

A. Gunawardana and G. Shani. A survey of accuracy evaluation metrics of recommendation tasks. Journal of Machine Learning Research, 10:2935–2962, 2009.

S. Hajian, F. Bonchi, and C. Castillo. Algorithmic bias: from discrimination discovery to fairness-aware data mining. The 22nd ACM SIGKDD Int'l Conf. on Knowledge Discovery and Data Mining, Tutorial, 2016.

J. L. Herlocker, J. A. Konstan, and J. Riedl. Explaining collaborative filtering recommendations. In Proc. of the Conf. on Computer Supported Cooperative Work, pages 241–250, 2000.

M. Jamali and M. Ester. A matrix factorization technique with trust propagation for recommendation in social networks. In Proc. of the 4th ACM Conf. on Recommender Systems, pages 135–142, 2010.

F. Kamiran, A. Karim, and X. Zhang. Decision theory for discrimination-aware classification. In Proc. of the 12th IEEE Int'l Conf. on Data Mining, pages 924–929, 2012.

F. Kamiran, I. Zliobaite, and T. Calders. Quantifying explainable discrimination and removing illegal discrimination in automated decision making. Knowledge and Information Systems, 35:613–644, 2013.

T. Kamishima. Nantonac collaborative filtering: Recommendation based on order responses. In Proc. of the 9th Int'l Conf. on Knowledge Discovery and Data Mining, pages 583–588, 2003.

T. Kamishima and S. Akaho. Considerations on recommendation independence for a find-good-items task. In Workshop on Responsible Recommendation, 2017.

T. Kamishima, S. Akaho, H. Asoh, and J. Sakuma. Enhancement of the neutrality in recommendation. In The 2nd Workshop on Human Decision Making in Recommender Systems, 2012a.

T. Kamishima, S. Akaho, H. Asoh, and J. Sakuma. Fairness-aware classifier with prejudice remover regularizer. In Proc. of the ECML PKDD 2012, Part II, pages 35–50, 2012b. [LNCS 7524].

T. Kamishima, S. Akaho, H. Asoh, and J. Sakuma. Efficiency improvement of neutrality-enhanced recommendation. In The 3rd Workshop on Human Decision Making in Recommender Systems, 2013.

T. Kamishima, S. Akaho, H. Asoh, and I. Sato. Model-based approaches for independence-enhanced recommendation. In Proc. of the IEEE 16th Int'l Conf. on Data Mining Workshops, pages 860–867, 2016.

Y. Koren. Factorization meets the neighborhood: A multifaceted collaborative filtering model. In Proc. of the 14th ACM SIGKDD Int'l Conf. on Knowledge Discovery and Data Mining, pages 426–434, 2008.

M. Kunaver and T. Pozrl. Diversity in recommender systems — a survey. Knowledge-Based Systems, 123:154–162, 2017.

N. Lathia, S. Hailes, L. Capra, and X. Amatriain. Temporal diversity in recommender systems. In Proc. of the 33rd Annual ACM SIGIR Conf. on Research and Development in Information Retrieval, pages 210–217, 2010.

F. Maxwell Harper and J. A. Konstan. The MovieLens datasets: History and context. ACM Trans. on Interactive Intelligent Systems, 5(4), 2015.

S. M. McNee, J. Riedl, and J. A. Konstan. Accurate is not always good: How accuracy metrics have hurt recommender systems. In Proc. of the SIGCHI Conf. on Human Factors in Computing Systems, pages 1097–1101, 2006.

E. Pariser. The Filter Bubble: What The Internet Is Hiding From You. Viking, 2011.

D. Pedreschi, S. Ruggieri, and F. Turini. Discrimination-aware data mining. In Proc. of the 14th ACM SIGKDD Int'l Conf. on Knowledge Discovery and Data Mining, pages 560–568, 2008.

A. Pérez-Suay, V. Laparra, G. Mateo-García, J. Muñoz-Marí, L. Gómez-Chova, and G. Camps-Valls. Fair kernel learning. In Proc. of the ECML PKDD 2017, 2017.

P. Resnick, J. Konstan, and A. Jameson. Panel on the filter bubble. The 5th ACM Conf. on Recommender Systems, 2011. http://acmrecsys.wordpress.com/2011/10/25/panel-on-the-filter-bubble/

S. Ruggieri, S. Hajian, F. Kamiran, and X. Zhang. Anti-discrimination analysis using privacy attack strategies. In Proc. of the ECML PKDD 2014, Part II, pages 694–710, 2014. [LNCS 8725].

R. Salakhutdinov and A. Mnih. Probabilistic matrix factorization. In Advances in Neural Information Processing Systems 20, pages 1257–1264, 2008.

R. Sinha and K. Swearingen. The role of transparency in recommender systems. In Proc. of the SIGCHI Conf. on Human Factors in Computing Systems, pages 830–831, 2002.

L. Sweeney. Discrimination in online ad delivery. Communications of the ACM, 56(5):44–54, 2013.

S. Watanabe. Knowing and Guessing – Quantitative Study of Inference and Information. John Wiley & Sons, 1969.

U. Weinsberg, S. Bhagat, S. Ioannidis, and N. Taft. BlurMe: Inferring and obfuscating user gender based on ratings. In Proc. of the 6th ACM Conf. on Recommender Systems, pages 195–202, 2012.

C. N. Ziegler, S. M. McNee, J. A. Konstan, and G. Lausen. Improving recommendation lists through topic diversification. In Proc. of the 14th Int'l Conf. on World Wide Web, pages 22–32, 2005.


Recommendation Independence [Supplementary Materials]


Appendix A. Changes of MAEs, Means and Standard Deviations of Predicted Ratings

Figure 6 is the full version of Figure 3. It shows the changes of MAEs, means, and standard deviations of predicted ratings according to the parameter η for all datasets. We focus on the pairs of standard deviations depicted by blue dotted lines in the right three columns of the subfigures. While the mean-m term is designed to ignore the second moments of distributions, these moments can be taken into account by the bdist-m and mi-normal terms. Hence, the behaviors of the standard deviations, which are the square roots of the second moments, disclose the distinctions among the three independence terms. The observations of the standard deviations may be summarized as:

• ML1M-Gender: all three independence terms could make the pairs of standard deviations converge.

• ML1M-Year, Flixster, Sushi-Gender: the bdist-m and mi-normal terms could make the standard deviations converge, but the mean-m term could not.

• Sushi-Age, Sushi-Seafood: for all three independence terms, the standard deviations did not converge.

In summary, there were no cases for which the mean-m term could make the standard deviations converge but the mi-normal or bdist-m term could not. From this fact, we can conclude that our new independence terms, bdist-m and mi-normal, enhanced recommendation independence more strictly than the mean-m term.
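The statistics behind these observations are simple to reproduce from a model's predictions; a minimal sketch:

```python
import numpy as np

def group_statistics(pred, s):
    """Mean and standard deviation of predicted ratings for each
    sensitive value; convergence of the paired standard deviations
    indicates that the second moments have been matched."""
    pred, s = np.asarray(pred), np.asarray(s)
    return {v: (pred[s == v].mean(), pred[s == v].std())
            for v in (0, 1)}
```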

Appendix B. The Analysis of the Failure to Control the Second Moments for Some Datasets

In section 4.2.1, we briefly described that the failure to control the standard deviations for Sushi-Age and Sushi-Seafood was due to the instability of the predictions. Here we provide more evidence. Table 3 shows MAEs for the two groups, D(0) and D(1), under the condition that recommendation independence is fully enhanced (η = 100). The errors for the Sushi-Age dataset where S=0, and for the Sushi-Seafood dataset where S=1, were relatively large (see the corresponding entries in Table 3). From Table 2, it may be seen that the numbers of data for these two groups were very small, and as a result, the predictions became unstable. This instability made it difficult to control the shapes of the distributions, and thus the standard deviations failed to converge. Despite these difficult conditions, we emphasize that all independence-enhanced recommenders succeeded in modifying the first moments of the distributions appropriately.
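The per-group errors in Table 3 are computed in the obvious way; a sketch:

```python
import numpy as np

def mae_per_sensitive_value(true_r, pred_r, s):
    """MAE computed separately for the groups S=0 and S=1, as in
    Table 3; a large gap between the two suggests that predictions
    for the smaller group are unstable."""
    true_r, pred_r, s = map(np.asarray, (true_r, pred_r, s))
    return {v: np.abs(true_r[s == v] - pred_r[s == v]).mean()
            for v in (0, 1)}
```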


[Figure 6 appears here: a grid of subfigures with rows ML1M-Year, ML1M-Gender, Flixster, Sushi-Age, Sushi-Gender, and Sushi-Seafood, and columns MAE, mean-m, bdist-m, and mi-normal, each plotted against η from 10⁻² to 10².]

Figure 6: Changes of MAEs, means, and standard deviations of predicted ratings
NOTE: The subfigure rows sequentially show the results for the ML1M-Year, ML1M-Gender, Flixster, Sushi-Age, Sushi-Gender, and Sushi-Seafood datasets, respectively. The X-axes of these subfigures represent the independence parameter, η, in a logarithmic scale. The Y-axes of the subfigures in the first column represent MAEs in a linear scale. Red solid lines with points, green broken lines, and blue dotted lines show the results of the mean-m, bdist-m, and mi-normal models, respectively. Black solid lines without points show the non-personalized MAEs in Table 1, which are the errors of an original probabilistic matrix factorization model. Note that the results for the bdist-m and mi-normal terms overlap in some subfigures. The Y-axes of the subfigures in the other three columns represent the means and standard deviations of predicted ratings in a linear scale. The means and standard deviations for the two groups based on sensitive values are represented by the scales at the left and right sides of these subfigures, respectively. Pairs of means and standard deviations are depicted by red solid and blue dotted lines, respectively.


Table 3: Mean Absolute Errors per Sensitive Value

Datasets        mean-m          mi-normal       bdist-m
                S=0    S=1      S=0    S=1      S=0    S=1
ML1M-Year       0.683  0.709    0.684  0.712    0.685  0.712
ML1M-Gender     0.678  0.742    0.680  0.743    0.680  0.744
Flixster        0.681  0.628    0.684  0.631    0.687  0.631
Sushi-Age       1.039  0.919    1.152  0.908    1.156  0.903
Sushi-Gender    0.881  0.965    0.872  0.973    0.878  0.974
Sushi-Seafood   0.909  1.038    0.895  1.059    0.894  1.058

[Figure 7 appears here: two panels, (a) mean-m and (b) bdist-m, each showing rating distributions over an axis from dislike to like.]

Figure 7: Distributions of the ratings predicted by the mean-m and bdist-m methods for each sensitive value

Appendix C. Comparison of Rating Distributions Predicted by mean-m and bdist-m Methods

Figure 7 is the same as Figure 4, except that it compares the mean-m and bdist-m methods. Trends similar to those observed in the comparison with the mi-normal method can be seen.

Appendix D. Efficiency of an Accuracy-Independence Trade-Off

We next examined the efficiency of the trade-off between accuracy and independence. Before discussing this trade-off, we first consider the following two baseline errors. The first baseline is the non-personalized MAE, defined as the MAE when the mean ratings are always offered. This corresponds to the expected MAE when randomly recommending items. This type of non-personalized recommendation can be considered completely independent, because R is statistically independent of all the other variables, including S. The second baseline is the standard MAE, defined as the MAE when ratings are predicted by an original probabilistic matrix factorization model without an independence term. Due to the above trade-off, the error in ratings predicted by an independence-enhanced recommender would theoretically be larger than or equal to the standard MAE. We show these two baselines for the three datasets in Table 1. Compared to the MAEs in Figure 2, the MAEs produced by our independence-enhanced recommenders were substantially smaller than their corresponding non-personalized MAEs, and were nearly equal to or slightly worse than their corresponding standard MAEs.
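A sketch of the two baselines follows; the use of per-item mean ratings for the non-personalized predictor is our assumption about the aggregation:

```python
import numpy as np

def non_personalized_mae(true_r, item_ids, item_means):
    """MAE when every rating is replaced by the item's mean rating;
    such predictions are independent of S by construction."""
    pred = np.array([item_means[i] for i in item_ids])
    return np.abs(np.asarray(true_r) - pred).mean()

def standard_mae(true_r, pred_r):
    """MAE of an original matrix factorization model with no
    independence term; the accuracy side of the trade-off."""
    return np.abs(np.asarray(true_r) - np.asarray(pred_r)).mean()
```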

To analyze the accuracy-independence trade-off, we compared the independence-enhanced recommenders with another baseline, a partially random recommender. This partially random recommender offers the ratings of a standard recommender, but a fixed percentage of items are exchanged with randomly selected items. This replacement can be simulated by replacing that percentage of the predicted ratings with the corresponding mean ratings, which are the ratings of a non-personalized recommender. We show the comparison with the results obtained by the mi-normal term in Figure 8. Note that the results of the mean-m and bdist-m terms were similar to those of the mi-normal term. The accuracies of the partially random recommenders were much worse than those of our mi-normal recommender at the same level of independence, though the results were rather unstable in the Sushi-Age and Sushi-Seafood cases.
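A minimal sketch of this simulation (the argument `fraction` plays the role of the mixture ratio):

```python
import numpy as np

def partially_random(pred_r, mean_r, fraction, rng=None):
    """Simulate the partially random recommender: replace a randomly
    chosen `fraction` of the predicted ratings with the corresponding
    non-personalized mean ratings."""
    rng = np.random.default_rng() if rng is None else rng
    pred = np.asarray(pred_r, dtype=float).copy()
    n_swap = int(fraction * len(pred))
    idx = rng.choice(len(pred), size=n_swap, replace=False)
    pred[idx] = np.asarray(mean_r)[idx]
    return pred
```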

[Figure 8 appears here: six panels, (a) ML1M-Year, (b) ML1M-Gender, (c) Flixster, (d) Sushi-Age, (e) Sushi-Gender, and (f) Sushi-Seafood, each plotting MAE against KS.]

Figure 8: Comparison of independence-enhanced and partially random recommendations
NOTE: The curves in the charts show the changes in KS and MAE. The X-axes of these subfigures represent the Kolmogorov-Smirnov statistic (KS) in a linear scale. The Y-axes represent the prediction errors measured by the mean absolute error (MAE) in a linear scale. Because a smaller KS and a smaller MAE respectively indicate greater independence and better accuracy, the bottom left corner of each chart is preferable. Red lines with circles show the indices derived by a partially random recommender whose mixture ratio was changed from 0% to 90%. Blue lines with squares show the results of a mi-normal model when the independence parameter, η, is increased from 10⁻² to 10².

Based on these results, we conclude that independence-enhanced recommenders can increase independence with a smaller sacrifice in accuracy than random replacement.

In summary, these experimental results suggest that independence-enhanced recommenders efficiently exclude sensitive information.

Appendix E. Changes in Preference to Movie Genres

To show how the patterns of recommendations changed, we show the genre-related differences of mean ratings in Table 4. The ML1M-Year and ML1M-Gender data were first divided according to the eighteen movie genres provided in the original data. Each genre-related dataset was further divided into two sets according to the sensitive values, the mean ratings were computed for each set, and we show the differences of these mean ratings. We targeted four types of ratings: the original true ratings, and the ratings predicted by the mean-m, bdist-m, and mi-normal methods (η = 100). We selected the six genres for which the absolute differences between the original mean ratings of the two subsets were largest. In Table 4(a), the genres in which newer movies were highly rated are presented in the upper three rows, and the lower three rows show the genres in which older movies were favored. In Table 4(b), the genres preferred by females are listed in the upper three rows, and the lower three rows show the genres preferred by males.
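The table entries can be reproduced by a group-by over genre and sensitive value; a sketch with pandas (the column names are our assumption):

```python
import pandas as pd

def genre_mean_differences(df):
    """For each genre, the difference between the mean ratings of the
    two sensitive groups (S=0 minus S=1), given a DataFrame with
    hypothetical columns 'genre', 's', and 'rating'."""
    means = df.groupby(["genre", "s"])["rating"].mean().unstack("s")
    return (means[0] - means[1]).sort_values()
```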

In this table, because the mi-normal method showed a very similar trend to the mean-m and bdist-m methods, we hereafter focus on the mi-normal method. It can be seen that the absolute differences in the mi-normal columns were generally reduced compared to the corresponding original differences when the latter were large. For example, in the case of the ML1M-Year data, the large absolute difference for Fantasy, 0.593, was reduced to 0.308, while the small absolute difference for Animation, 0.040, was widened to 0.305. This means that the independence-enhanced recommenders did not merely shift the predicted ratings according to the sensitive values to enhance independence.


Table 4: Genre-related differences of mean ratings

(a) ML1M-Year data set: old − new

genre          original   mean-m    bdist-m   mi-normal
Animation      −0.040     −0.302    −0.301    −0.305
Documentary     0.113     −0.120    −0.117    −0.121
Film-Noir       0.238     −0.037     0.025     0.007
Western         0.524      0.254     0.231     0.232
Mystery         0.563      0.295     0.303     0.297
Fantasy         0.593      0.331     0.308     0.308

(b) ML1M-Gender data set: male − female

genre          original   mean-m    bdist-m   mi-normal
Children's     −0.214     −0.160    −0.160    −0.146
Musical        −0.213     −0.154    −0.154    −0.146
Romance        −0.100     −0.046    −0.047    −0.036
Crime           0.024      0.081     0.089     0.094
Film-Noir       0.074      0.125     0.145     0.134
Western         0.103      0.155     0.147     0.163

By excluding the information about sensitive features, the differences between mean ratings were occasionally widened. This is because the balance between independence and accuracy is considered in equation (1), and thus the ratings of events that are more helpful for enhancing independence are drastically changed.
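For reference, equation (1) of the main text is not reproduced here; schematically, and up to the exact regularizer and sign conventions used there, the objective balances a squared prediction loss against an independence term weighted by η:

```latex
% Schematic reconstruction of the trade-off in equation (1);
% the regularizer R(Theta) and sign conventions are assumptions.
\min_{\Theta}\;
  \sum_{(u,i) \in \mathcal{D}} \bigl(r_{ui} - \hat{r}_{ui}(\Theta)\bigr)^2
  \;-\; \eta \,\mathrm{indep}\bigl(\hat{R}, S\bigr)
  \;+\; \lambda \,\mathcal{R}(\Theta)
```

A large η therefore pushes the optimizer to alter exactly those predictions that contribute most to the dependence between the outcome and S, which is why some genre-level gaps widen.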
