achieving optimal privacy in trust-aware collaborative filtering recommender systems

Achieving Optimal Privacy in Trust-Aware Social Recommender Systems

Nima Dokoohaki, Cihan Kaleli, Huseyin Polat, Mihhail Matskin

The Second International Conference on Social Informatics (SocInfo’10) 27-29 October, 2010, Laxenburg, Austria

Emergence of Trust in Social Recommender Systems

• Most successful recommenders employ well-known collaborative filtering (CF) techniques.
  - Social Recommender Systems (SRS): CF-based recommenders that use a social network as their backbone.

• CF automates the word-of-mouth process.
  - It finds users similar to the user receiving the recommendation and suggests to her items that these like-minded users rated highly in the past.

• Shortcoming: sparsity of the user-rating matrix.
  - There are always numerous items and the ratings users give are sparse, so the step of finding similar users often fails. Trust has been proposed as a remedy.
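To make the word-of-mouth process and its failure under sparsity concrete, here is a minimal user-based CF sketch in Python. It uses a generic textbook formulation (Pearson similarity over co-rated items and a mean-centred, similarity-weighted average), not a method taken from these slides; the function names and toy ratings are hypothetical.

```python
from math import sqrt
from typing import Dict, Optional

# user -> {item: rating}; a missing key means the item is unrated.
Ratings = Dict[str, Dict[str, float]]

def pearson(ra: Dict[str, float], rb: Dict[str, float]) -> Optional[float]:
    """Pearson correlation over co-rated items; None when the overlap is too small."""
    common = ra.keys() & rb.keys()
    if len(common) < 2:
        return None  # sparsity: not enough co-rated items to measure similarity
    ma = sum(ra[i] for i in common) / len(common)
    mb = sum(rb[i] for i in common) / len(common)
    num = sum((ra[i] - ma) * (rb[i] - mb) for i in common)
    den = sqrt(sum((ra[i] - ma) ** 2 for i in common)) * sqrt(
        sum((rb[i] - mb) ** 2 for i in common))
    return num / den if den else None

def predict(ratings: Ratings, active: str, item: str) -> Optional[float]:
    """Mean-centred, similarity-weighted prediction of `active`'s rating for `item`."""
    ra = ratings[active]
    mean_a = sum(ra.values()) / len(ra)
    num = den = 0.0
    for user, ru in ratings.items():
        if user == active or item not in ru:
            continue
        w = pearson(ra, ru)
        if w is None or w <= 0:
            continue  # keep only positively correlated neighbours
        mean_u = sum(ru.values()) / len(ru)
        num += w * (ru[item] - mean_u)
        den += w
    return mean_a + num / den if den else None  # None: no similar user was found

ratings = {
    "alice": {"m1": 5, "m2": 3, "m3": 4},
    "bob": {"m1": 4, "m2": 2, "m3": 5, "m4": 4},
    "carol": {"m4": 1},
}
alice_m4 = predict(ratings, "alice", "m4")
carol_m1 = predict(ratings, "carol", "m1")
```

Here `carol` shares too few rated items with anyone, so no similar user can be found and `carol_m1` is `None`: the sparsity problem described above.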

Extending Social Recommender Systems with Trust Metric

• Extend CF recommenders as follows:
  - Utilize a trust metric, which enables a trust-based heuristic to propagate trust and find users who are trustworthy with respect to the active user, i.e. the user we are generating recommendations for.

• Trust has been shown to improve the accuracy of recommenders (Golbeck, Ziegler, Massa, ...).

• For a complete list of problems addressed by trust recommenders, see:
  - Massa, P., & Avesani, P. Trust Metrics in Recommender Systems. In Computing with Social Trust (pp. 259-285), 2009.
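A trust metric lets the system reach users that plain similarity cannot. The sketch below shows one simple way to propagate trust: a breadth-first walk over an explicit trust graph with a propagation horizon, where estimated trust decays linearly with distance. This decay rule is an illustrative assumption in the spirit of local trust metrics such as MoleTrust, not necessarily the metric used in this work (the evaluation uses T-index); all names are hypothetical.

```python
from collections import deque
from typing import Dict, List

def propagate_trust(graph: Dict[str, List[str]], source: str,
                    horizon: int) -> Dict[str, float]:
    """Estimate trust from `source` to users reachable within `horizon` hops.

    Assumption (illustrative only): trust decays linearly with the shortest
    distance d in the trust graph, t = (horizon - d + 1) / horizon.
    """
    dist = {source: 0}
    queue = deque([source])
    while queue:
        user = queue.popleft()
        if dist[user] == horizon:
            continue  # do not expand past the propagation horizon
        for trustee in graph.get(user, []):
            if trustee not in dist:
                dist[trustee] = dist[user] + 1
                queue.append(trustee)
    return {u: (horizon - d + 1) / horizon for u, d in dist.items() if u != source}

web = {"alice": ["bob"], "bob": ["carol"], "carol": ["dave"]}
estimated = propagate_trust(web, "alice", horizon=2)
```

With `horizon=2`, `bob` gets trust 1.0, `carol` 0.5, and `dave` (three hops away) receives no estimated trust.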

Problems with Existing Trust-aware Recommenders

• Privacy and lack of decentralization ...

• Growing concern about vulnerability to shilling attacks:
  - Current implementations are centralized or have not been tested in a decentralized fashion.

• Current research has paid little attention to clearly addressing the privacy issues surrounding the architecture and components of trust recommenders.

Privacy issues with Social Recommender Systems

• CF systems, including social-network-based ones, have several advantages. However, they fail to protect users’ privacy.

• … Also, data collected for CF can be used for unsolicited marketing, government surveillance, user profiling, etc.

• Users who remain concerned about their privacy, …
  - might decide to give false data, which affects the production of truthful recommendations.

• This in turn decreases the accuracy of the recommender system.

Motivation and Contributions

• Emphasizing the importance of dealing with privacy issues surrounding the architecture and components of trust-aware recommender systems:
  - Extending the architecture of trust recommenders with a privacy-preserving module.
  - Proposing the use of data perturbation techniques to protect users’ privacy while still providing accurate recommendations.

• Dealing with the conflict between privacy goals and trust goals through agent mechanisms:
  - Utilizing Pareto efficiency.

Trust-Aware Recommender System Architecture

Taken from Massa, P., & Avesani, P. “Trust Metrics in Recommender Systems”. In Computing with Social Trust (pp. 259-285), 2009.

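[Figure: trust-aware recommender architecture. INPUT: Trust [N×N] and Rating [N×M], where N is the number of users and M the number of items. First step: the Trust Metric turns Trust into Estimated Trust [N×N]. Pure collaborative filtering path: the Similarity Metric produces User Similarity [N×N]. Second step: the Rating Predictor combines them into the OUTPUT, Predicted Rating [N×M].]

Private Trust-Aware Recommender Architecture

[Figure: private trust-aware recommender architecture. Ratings are replaced by Disguised Ratings [N×M]; a Private Trust-Metric produces Estimated Private Trust [N×N]; Private Trust-Aware Collaborative Filtering, through a Private Rating Predictor, produces the OUTPUT, Predicted Rating [N×M].]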
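The Rating Predictor block can be sketched as a mean-centred average in which estimated trust, rather than similarity, weights each neighbour. A predictor of this shape appears in Massa and Avesani's trust-aware CF work, but treat the exact form here as an assumption. Under that assumption, the private architecture would pass disguised ratings and estimated private trust to the same function; names and data are hypothetical.

```python
from typing import Dict, Optional

def trust_weighted_predict(
    ratings: Dict[str, Dict[str, float]],  # user -> {item: rating}
    trust: Dict[str, float],               # estimated trust from the active user to others
    active: str,
    item: str,
) -> Optional[float]:
    """Mean-centred prediction weighted by estimated trust instead of similarity."""
    ra = ratings[active]
    mean_a = sum(ra.values()) / len(ra)
    num = den = 0.0
    for user, t in trust.items():
        ru = ratings.get(user, {})
        if t <= 0 or item not in ru:
            continue  # skip untrusted users and users who have not rated the item
        mean_u = sum(ru.values()) / len(ru)
        num += t * (ru[item] - mean_u)
        den += t
    return mean_a + num / den if den else None

ratings = {
    "alice": {"m1": 4, "m2": 2},
    "bob": {"m1": 5, "m3": 3},
    "carol": {"m3": 5, "m4": 1},
}
trust = {"bob": 0.5, "carol": 1.0}
prediction = trust_weighted_predict(ratings, trust, "alice", "m3")
```

`carol` shares no rated item with `alice`, so similarity-based CF could not use her, yet her trust value lets her rating contribute to the prediction.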

Privacy Protection Methodology: Data Normalization with z-score

• Normalizing the data is critical for increasing the privacy level.

• To protect their privacy, users employ data perturbation techniques. We propose using a normalized version of the actual ratings to improve the privacy level.

• Accordingly, z-score values are used.

• *The z-score of an item indicates how far, and in what direction, that item deviates from its distribution's mean, expressed in units of its distribution's standard deviation.

*W. Du and H. Polat. Privacy-preserving collaborative filtering. International Journal of Electronic Commerce, 9(4):9-36, 2005.
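The slides cite Du and Polat for the normalization step without showing the formula. Below is a minimal sketch using the standard per-user z-score, z = (r − mean) / std over the items the user has rated. Using the population standard deviation, and mapping constant profiles to 0, are assumptions.

```python
from statistics import mean, pstdev
from typing import Dict

def z_scores(user_ratings: Dict[str, float]) -> Dict[str, float]:
    """z-score each rated item against the user's own rating distribution."""
    values = list(user_ratings.values())
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    if sigma == 0:
        return {item: 0.0 for item in user_ratings}  # constant ratings: no spread
    return {item: (r - mu) / sigma for item, r in user_ratings.items()}

normalized = z_scores({"m1": 4, "m2": 2})
```

Because z-scores are centred at 0 with unit spread, they no longer reveal a user's personal rating scale, which is the property the perturbation step builds on.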

Privacy Protection Methodology: Random Perturbations

• To disguise their data, users add random numbers to the z-scores. They draw these random numbers from one of two distributions: Gaussian or uniform.

• Adding random numbers hides the values of rated items but not which items are unrated, so users also insert random ratings into some unrated items to hide them.

• After disguising their private data, users compute trust with one another.
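The disguising step can be sketched as follows. These assumptions are not taken from the slides: β is the standard deviation of zero-mean Gaussian noise, δ is the half-width of zero-mean uniform noise on [−δ, δ], the hypothetical parameter `f` is the number of unrated items to fill, and filled items receive a fresh draw from the same noise distribution (a plausible choice because z-scores are centred at 0).

```python
import random
from typing import Dict, List

def disguise(
    z: Dict[str, float],   # z-scores of the user's rated items
    unrated: List[str],    # items the user has not rated
    dist: str,             # "gaussian" or "uniform"
    level: float,          # assumed: beta = std dev (Gaussian), delta = half-width (uniform)
    f: int,                # hypothetical: number of unrated items to fill
    rng: random.Random,
) -> Dict[str, float]:
    def noise() -> float:
        if dist == "gaussian":
            return rng.gauss(0.0, level)
        if dist == "uniform":
            return rng.uniform(-level, level)
        raise ValueError(f"unknown distribution: {dist}")

    disguised = {item: v + noise() for item, v in z.items()}  # hide the rated values
    for item in rng.sample(unrated, min(f, len(unrated))):     # hide which items are unrated
        disguised[item] = noise()
    return disguised

rng = random.Random(0)
z = {"m1": 1.0, "m2": -1.0}
out = disguise(z, ["m3", "m4", "m5"], "uniform", 1.0, 2, rng)
```

Only the disguised profile `out` leaves the user; the server sees noisy values for every listed item and cannot tell which of them were genuinely rated.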

Private Trust Estimation: Trust Formalization

• Assume there are two users, ua and ub.

We formalize the trust between them as follows:

Private Trust Estimation: Trust Estimation

Private Recommendation Process: Producing Referrals

Mutual Effects of Trust and Privacy: Notion of Conflict

• Privacy and accuracy are conflicting goals.

• Conflict
  - Trust metrics aim, at each step of trust estimation, to increase or maintain the accuracy of predictions.
  - Increasing the amount of perturbation leads to further information loss.

• Dealing with conflict through optimization
  - We can argue that an optimal setting can be defined in which privacy and accuracy are both maintained at the same time.

Optimization design space

• PCS (privacy configuration set)

• TCS (trust configuration set)

• The problem space consists of all possible configurations:

Mapping Design Space to a Pareto Optimization Space
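Pareto efficiency can be made concrete with a small dominance check over configurations. In this sketch each configuration carries a hypothetical privacy score (higher is better, e.g. the perturbation level) and its MAE (lower is better); the front keeps configurations that no other configuration beats on both objectives. The labels and numbers below are toy values, not results from the experiments.

```python
from typing import List, Tuple

# (configuration label, privacy score, MAE): higher privacy is better, lower MAE is better.
Point = Tuple[str, float, float]

def dominates(a: Point, b: Point) -> bool:
    """a dominates b if it is no worse on both objectives and strictly better on one."""
    no_worse = a[1] >= b[1] and a[2] <= b[2]
    strictly_better = a[1] > b[1] or a[2] < b[2]
    return no_worse and strictly_better

def pareto_front(points: List[Point]) -> List[Point]:
    """Keep the configurations that no other configuration dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points if q is not p)]

# Toy values only.
points = [("A", 1.0, 0.90), ("B", 2.0, 0.95), ("C", 2.0, 1.10), ("D", 0.5, 0.92)]
front = pareto_front(points)
```

`C` is dropped because `B` offers the same privacy at lower error, and `D` because `A` is better on both axes; `A` and `B` remain as genuine trade-offs.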

Inferring The Optimal Privacy Set

• Heuristic: to infer the optimal privacy set (OPS), the following heuristic is used:

1. Perturbing the overall user data using different PCS settings;

2. Observing the framework under variations of TCS (steps 2 and 3 are interchangeable depending on the goals at hand);

3. Perturbing the sparse user data with the PCS inferred from step 2, which allows inferring the OPS and finalizing the Pareto-optimal setting.
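The heuristic can be sketched as a staged parameter sweep: vary the privacy settings, then the trust settings, then the amount of filled sparse data, carrying the best configuration forward. As a simplification, each stage here keeps the lowest-MAE candidate, whereas the slides read the final choice off the Pareto front. `evaluate` stands in for running the recommender and is hypothetical, as is the toy error function below.

```python
from itertools import product
from typing import Callable, Dict, Iterable

Config = Dict[str, float]

def sweep(base: Config, grid: Dict[str, Iterable[float]],
          evaluate: Callable[[Config], float]) -> Config:
    """Try every combination of the `grid` parameters on top of `base`; keep the lowest-MAE one."""
    names = list(grid)
    best, best_mae = dict(base), float("inf")
    for values in product(*(grid[name] for name in names)):
        cfg = {**base, **dict(zip(names, values))}
        mae = evaluate(cfg)
        if mae < best_mae:
            best, best_mae = cfg, mae
    return best

def staged_search(start: Config, pcs: Dict[str, Iterable[float]],
                  tcs: Dict[str, Iterable[float]], fill: Iterable[float],
                  evaluate: Callable[[Config], float]) -> Config:
    cfg = sweep(start, pcs, evaluate)          # step 1: vary privacy settings (PCS)
    cfg = sweep(cfg, tcs, evaluate)            # step 2: vary trust settings (TCS)
    return sweep(cfg, {"f": fill}, evaluate)   # step 3: vary the filled sparse data

def toy_mae(c: Config) -> float:
    # Hypothetical stand-in for running the recommender and measuring MAE.
    return 0.1 * c["beta"] + 0.01 * abs(c["n"] - 3) + abs(c["t"] - 100) / 1000 + 0.001 * c["f"]

chosen = staged_search({"beta": 1, "n": 2, "t": 0, "f": 0},
                       pcs={"beta": [1, 2, 3, 4]},
                       tcs={"n": [2, 3, 5], "t": [0, 100]},
                       fill=[0, 1, 2], evaluate=toy_mae)
```

Sweeping one group at a time keeps the number of recommender runs at the sum, rather than the product, of the group sizes, at the cost of possibly missing interactions between groups.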

Evaluating the Recommendation Framework: Dataset

• Two sets of experiments:
  - The first set demonstrates the effect of inserting random data on the accuracy of the predictions output by the recommendation system.
  - The second set demonstrates how filling unrated items with varying f values affects the overall accuracy of the recommender system.

• MovieLens dataset (http://www.grouplens.org/node/73)
  - 943 user rating profiles with 100,000 rating values, on a 5-point scale.

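Accuracy in these experiments is reported as mean absolute error (MAE): the average of |predicted − actual| over the held-out ratings that were predicted. A minimal sketch with hypothetical names follows.

```python
from typing import Dict, Tuple

Pair = Tuple[str, str]  # (user, item)

def mean_absolute_error(predicted: Dict[Pair, float], actual: Dict[Pair, float]) -> float:
    """MAE over the (user, item) pairs that have both a prediction and a held-out rating."""
    pairs = predicted.keys() & actual.keys()
    if not pairs:
        raise ValueError("no overlapping (user, item) pairs to score")
    return sum(abs(predicted[p] - actual[p]) for p in pairs) / len(pairs)

predicted = {("u1", "m1"): 3.5, ("u1", "m2"): 4.0}
actual = {("u1", "m1"): 4.0, ("u1", "m2"): 5.0, ("u2", "m1"): 2.0}
mae = mean_absolute_error(predicted, actual)
```

Lower MAE means more accurate predictions; on a 5-point scale an MAE of 0.8 means predictions miss by 0.8 stars on average.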

Evaluating the Recommendation Framework: Recommender

• We used the trust recommender from:
  - S. Fazeli, A. Zarghami, N. Dokoohaki, and M. Matskin, "Elevating Prediction Accuracy in Trust-aware Collaborative filtering Recommenders through T-index Metric and TopTrustee lists," the Journal of Emerging Technologies in Web Intelligence (JETWI), 2010.

• A decentralized trust-aware recommender:
  - T-index is used as a trust metric for filtering trust between users.
  - Unlike previous approaches, a trust network between users can be built automatically from existing ratings.
  - A Distributed Hash Table (DHT)-like list of trustees, the TopTrusteeList (TTL) [19], wraps around the items rated similarly to those of the current user.

MAE of the recommendation framework without adding any perturbations

[Figure: MAE (axis from 0.85 to 0.92) versus T = 0, 25, 50, 100, 200, 500, 1000, one curve per n = 2, 3, 5, 10, 20, 50.]

Zarghami, A., Fazeli, S., Dokoohaki, N., & Matskin, M. (2009). Social Trust-Aware Recommendation System: A T-Index Approach. In Web Intelligence and Intelligent Agent Technology, IEEE/WIC/ACM International Conference on (Vol. 3, pp. 85-90). IEEE Computer Society. doi: 10.1109/WI-IAT.2009.237.

MAE with perturbations drawn from a Gaussian distribution added to user data

[Figure: MAE (axis from 0.42 to 2.92) versus β = 1, 2, 3, 4, for (N, T) = (2, 0), (2, 100), (3, 0), (3, 100), (5, 0).]

MAE with perturbations drawn from a uniform distribution added to user data

[Figure: MAE (axis from 0.62 to 1.62) versus δ = 1, 2, 3, 4, for (N, T) = (2, 0), (2, 100), (3, 0), (3, 100), (5, 0).]

[Figure: the Gaussian (β = 1 to 4) and uniform (δ = 1 to 4) MAE plots for (N, T) = (2, 0), (2, 100), (3, 0), (3, 100), (5, 0), annotated with (δ, β) = (1, 1).]

Perturbing the overall user data using Gaussian and Uniform distributions (δ, β)

[Figure: the Gaussian (β = 1 to 4) and uniform (δ = 1 to 4) MAE plots restricted to N = 3, T = 100, annotated with (δ, β) = (1, 1).]

Comparing MAE results of the framework under masked data

(n, t) = (3,100)

Filling sparse data with random values from a Gaussian distribution, with respect to f

[Figure: MAE (axis from 0.74 to 0.79) at half density, full density and double density.]

Fine-tuning the privacy

• Perturb the sparse user data with the (δ, β, n, t) inferred in the previous step to fine-tune privacy.

• We observe a consistent increase across intervals of f, which finalizes the choice of n, t, δ, β.

• We finalize the results: the ordered set n = 3, t = 100, δ = 1, β = 1 and f = [0, d] will be the Pareto front.

Inferring Optimality Set: Comparison with Non-Masked Results

• Optimality holds under masked data.

• Comparing the MAE of the non-masked framework with that of the framework under masked data:
  - We inferred the optimum values β = 1, n = 3 and t = 100; for these parameters MAE = 0.7994, while for similar parameters without adding perturbations we achieve MAE = 0.881.

• MAE results under masking are still lower, and therefore better, than MAE results without perturbations.
  - Without perturbations, the best result is MAE = 0.863 for (n, t) = (50, 100), which is still greater than our optimum value.

Conclusions

• We proposed a framework for addressing the problem of privacy in trust recommenders.

• It addresses the conflicting goals of privacy and accuracy.

• Through experiments, we showed that we can infer such a setting, and that it holds even when compared with the trust recommender without privacy measures.

• As a result, privacy can be introduced into trust recommenders and optimized to avoid private data loss while still producing accurate recommendations.

Thank you

Nima Dokoohaki http://web.it.kth.se/~nimad/

http://web.it.kth.se/~nimad/contact.html
