Achieving Optimal Privacy in Trust-Aware Social Recommender Systems
Nima Dokoohaki, Cihan Kaleli, Huseyin Polat, Mihhail Matskin
The Second International Conference on Social Informatics (SocInfo’10) 27-29 October, 2010, Laxenburg, Austria
Emergence of Trust in Social Recommender Systems
• Most successful recommenders employ well-known collaborative filtering (CF) techniques
 - Social Recommender Systems (SRS): CF-based recommenders that use a social network as their backbone.
• CF automates the word-of-mouth process
 - It finds users similar to the user receiving the recommendation and suggests to her items rated highly in the past by users with similar taste.
• Shortcoming: sparsity of the user-rating matrix
 - There are always numerous items and the ratings given by users are sparse, so the step of finding similar users often fails. Trust is proposed as a remedy.
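A minimal sketch of the user-based CF step described above, and of how sparsity makes it fail. It assumes Pearson correlation as the similarity measure and a similarity-weighted average as the predictor; the function names and the toy ratings are illustrative and are not taken from the slides.

```python
from math import sqrt


def pearson(ratings_a, ratings_b):
    """Pearson correlation over co-rated items; 0.0 when it is undefined."""
    common = sorted(set(ratings_a) & set(ratings_b))
    if len(common) < 2:
        return 0.0  # too little overlap: the sparsity problem
    mean_a = sum(ratings_a[i] for i in common) / len(common)
    mean_b = sum(ratings_b[i] for i in common) / len(common)
    num = sum((ratings_a[i] - mean_a) * (ratings_b[i] - mean_b) for i in common)
    den_a = sqrt(sum((ratings_a[i] - mean_a) ** 2 for i in common))
    den_b = sqrt(sum((ratings_b[i] - mean_b) ** 2 for i in common))
    if den_a == 0 or den_b == 0:
        return 0.0
    return num / (den_a * den_b)


def predict(ratings, active, item):
    """Similarity-weighted average of positively correlated users' ratings."""
    num = den = 0.0
    for user, profile in ratings.items():
        if user == active or item not in profile:
            continue
        weight = pearson(ratings[active], profile)
        if weight > 0:
            num += weight * profile[item]
            den += weight
    return num / den if den else None  # None: no similar user was found


# Toy user-rating matrix (hypothetical data, 5-point scale).
ratings = {
    "alice": {"m1": 5, "m2": 3, "m3": 4},
    "bob": {"m1": 4, "m2": 2, "m3": 5, "m4": 5},
    "carol": {"m1": 1, "m2": 5, "m4": 2},
    "dave": {"m5": 3},
}

print(predict(ratings, "alice", "m4"))  # driven by bob, the only similar user
print(predict(ratings, "alice", "m5"))  # None: dave shares no items with alice
```

The second call shows the shortcoming in miniature: the only user who rated m5 has no co-rated items with alice, so no similarity can be computed and no prediction is produced.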
Extending Social Recommender Systems with Trust Metric
• Extend CF recommenders as follows:
 - Utilize a trust metric, which enables a trust-based heuristic to propagate trust and find users who are trustworthy with respect to the active user for whom we are gathering/generating recommendations.
• Trust has been shown to improve the accuracy of recommenders (Golbeck, Ziegler, Massa, ...).
• Complete list of problems addressed by trust recommenders:
 - Massa, P., & Avesani, P. Trust Metrics in Recommender Systems. In Computing with Social Trust (pp. 259-285), 2009.
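To make the idea of propagating trust concrete, here is a generic sketch. It is not the metric of any of the cited authors: it assumes that trust along a path is the product of the edge weights, that propagation stops after `max_depth` hops, and that the first path found in breadth-first order decides the estimate. The graph and all names are hypothetical.

```python
from collections import deque


def propagate_trust(trust, source, max_depth=2):
    """Estimate trust from `source` to users reachable within `max_depth` hops.

    Assumption: path trust is the product of edge weights, and the first path
    reached in breadth-first order (fewest hops) fixes the estimate.
    """
    estimated = {}
    seen = {source}
    queue = deque([(source, 1.0, 0)])
    while queue:
        user, value, depth = queue.popleft()
        if depth == max_depth:
            continue  # horizon reached: do not expand this user further
        for neighbour, weight in trust.get(user, {}).items():
            if neighbour in seen:
                continue
            seen.add(neighbour)
            estimated[neighbour] = value * weight
            queue.append((neighbour, value * weight, depth + 1))
    return estimated


# Hypothetical directed trust statements with weights in [0, 1].
trust = {
    "a": {"b": 0.9, "c": 0.5},
    "b": {"d": 0.8},
    "c": {"d": 1.0},
    "d": {"e": 1.0},
}

estimates = propagate_trust(trust, "a", max_depth=2)
print(sorted(estimates))  # users that "a" can reach within two hops
```

User "a" never stated trust in "d", yet "d" receives an estimate through "b". This is the sense in which a trust metric can find usable neighbours where rating overlap alone finds none.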
Problems with Existing Trust-aware Recommenders
• Privacy and lack of decentralization ...
• Growing concern about the vulnerability to shilling attacks:
 - Current implementations are centralized or not tested in a decentralized fashion
• Current research has paid little attention to clearly addressing the privacy issues surrounding the architecture and components of trust recommenders.
Privacy issues with Social Recommender Systems
• CF systems, including social network-based ones, have several advantages. However, they fail to protect users’ privacy.
• … Also, data collected for CF can be used for unsolicited marketing, government surveillance, profiling users, etc.
• Users who remain concerned about their privacy, …
 - users might decide to give false data, which hinders producing truthful recommendations.
• This in turn leads to a decrease in the accuracy of the recommender system
Motivation and Contributions
• Emphasizing the importance of dealing with privacy issues surrounding the architecture and components of trust-aware recommender systems.
 - Extending the architecture of trust recommenders with a Privacy Preserving Module
 - Proposing the use of data perturbation techniques to protect users’ privacy while still providing accurate recommendations.
• Dealing with the conflict between privacy goals and trust goals through agent mechanisms
 - Utilizing Pareto efficiency
Trust-Aware Recommender System Architecture
Taken from Massa, P., & Avesani, P. “Trust Metrics in Recommender Systems”. In Computing with Social Trust (pp. 259-285), 2009.
[Figure: architecture diagrams. INPUT (N users, M items): Trust [NxN], Rating [NxM]. First step: Trust Metric, Estimated Trust [NxN]; Trust-aware Similarity Metric, User Similarity [NxN]; Pure Collaborative Filtering. Second step: Rating Predictor. OUTPUT: Predicted Rating [NxM]. Private variant: Disguised Ratings [NxM], Private Trust-Metric, Estimated Private Trust [NxN], Private Trust-Aware Collaborative Filtering, Private Rating Predictor.]
Private Trust-Aware Recommender Architecture
Privacy Protection Methodology: Data Normalization with z-score
• Normalization of data is critical to increasing the privacy level.
• For privacy protection, users employ data perturbation techniques. We propose using a normalized version of the actual ratings to improve the privacy level.
• To this end, z-score values are utilized.
• *The z-score of an item indicates how far, and in what direction, that item deviates from its distribution's mean, expressed in units of its distribution's standard deviation.
*W. Du and H. Polat. Privacy-preserving collaborative filtering. International Journal of Electronic Commerce, 9(4):9-36, 2005.
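The z-score definition above can be sketched as follows. The sketch assumes that normalization is done per user over that user's own ratings and that the population standard deviation is used; neither choice is stated on the slide, and the function name is illustrative.

```python
from statistics import mean, pstdev


def to_z_scores(profile):
    """Convert one user's ratings {item: rating} into z-scores.

    z = (rating - mean) / standard deviation, computed over this user's
    own ratings (assumption: per-user normalization, population std).
    """
    values = list(profile.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All ratings identical: nothing deviates from the mean.
        return {item: 0.0 for item in profile}
    return {item: (rating - mu) / sigma for item, rating in profile.items()}


# Hypothetical profile on a 5-point scale.
z = to_z_scores({"m1": 5, "m2": 3, "m3": 4})
print(z)  # m1 lies above the user's mean, m2 below, m3 exactly on it
```

After normalization the values no longer reveal the rating scale or the user's average directly, which is what makes them a convenient basis for the perturbation step.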
Privacy Protection Methodology: Random Perturbations
• To disguise data, users add random numbers to their z-scores. They select these random numbers from one of two distributions: Gaussian or uniform.
• Adding random numbers hides the ratings of rated items; to hide unrated items as well, users insert random ratings for them.
• After disguising their private data, users compute trust between each other.
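A minimal sketch of the disguising step, under stated assumptions: Gaussian noise is drawn with mean 0 and standard deviation `spread`, uniform noise from [-spread, spread], and a share `fill_fraction` of the unrated items receives a purely random value. The parameter names are hypothetical, and how `spread` and `fill_fraction` map to the β, δ and f values used in the experiments is not specified here.

```python
import random


def disguise(z_scores, all_items, dist="gaussian", spread=1.0,
             fill_fraction=0.5, rng=None):
    """Return a disguised copy of one user's z-score profile.

    dist="gaussian": noise ~ N(0, spread^2)
    dist="uniform":  noise ~ U[-spread, spread]
    fill_fraction:   share of unrated items given a random value, so an
                     observer cannot tell which items were really rated.
    """
    rng = rng or random.Random()

    def noise():
        if dist == "gaussian":
            return rng.gauss(0.0, spread)
        if dist == "uniform":
            return rng.uniform(-spread, spread)
        raise ValueError(f"unknown distribution: {dist}")

    # 1. Mask the values of the rated items.
    disguised = {item: z + noise() for item, z in z_scores.items()}

    # 2. Fill a random subset of the unrated items with noise only.
    unrated = [item for item in all_items if item not in z_scores]
    n_fill = int(fill_fraction * len(unrated))
    for item in rng.sample(unrated, n_fill):
        disguised[item] = noise()
    return disguised


items = ["m1", "m2", "m3", "m4", "m5", "m6"]
z_profile = {"m1": 1.2, "m2": -1.2}
masked = disguise(z_profile, items, dist="uniform", spread=1.0,
                  fill_fraction=0.5, rng=random.Random(0))
print(len(masked))  # 2 rated items plus 2 of the 4 unrated items
```

A larger `spread` or `fill_fraction` hides more but distorts the data further, which is the privacy/accuracy tension the later slides address.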
Private Trust Estimation: Trust Formalization
• Assume there are two users, ua and ub.
We formalize the trust between them as follows:
Private Trust Estimation: Trust Estimation
Private Recommendation Process: Producing Referrals
Mutual Effects of Trust and Privacy: Notion of Conflict
• Privacy and accuracy are conflicting goals
• Conflict
 - Trust metrics, at each step of trust estimation, increase or maintain the accuracy of predictions.
 - Increasing the amount of perturbation leads to further information loss.
• Dealing with Conflict through Optimization
 - We can argue that an optimal setting can be defined in which privacy and accuracy are both maintained at the same time
Optimization design space
• PCS (privacy configuration set)
• TCS(trust configuration set)
• Problem space consists of all possible configurations:
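One way to picture "all possible configurations" is as a Cartesian product. The sketch below assumes that a privacy configuration is a (δ, β) pair, that a trust configuration is an (n, t) pair, and that the problem space is the product PCS × TCS; these are assumptions for illustration. The parameter values are the ones that appear as labels in the result charts of this deck.

```python
from itertools import product

# Parameter values as labelled in the result charts.
deltas = [1, 2, 3, 4]                   # uniform perturbation levels
betas = [1, 2, 3, 4]                    # Gaussian perturbation levels
ns = [2, 3, 5, 10, 20, 50]              # trust parameter n
ts = [0, 25, 50, 100, 200, 500, 1000]   # trust parameter t

# Assumption: PCS = {(delta, beta)}, TCS = {(n, t)}.
pcs = list(product(deltas, betas))
tcs = list(product(ns, ts))

# Assumption: the design space pairs every privacy configuration
# with every trust configuration.
design_space = list(product(pcs, tcs))

print(len(pcs), len(tcs), len(design_space))
```

Even with these few values the space holds several hundred configurations, which is why a heuristic search over it is attractive compared with exhaustive evaluation.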
Mapping Design Space to a Pareto Optimization Space
Inferring The Optimal Privacy Set
• Heuristic. To infer the OPS (optimal privacy set), the following heuristic is used:
1. Perturb the overall user data using different PCS settings;
2. Observe the framework under variations of TCS;
 • (steps 2 & 3 are interchangeable depending on the goals at hand)
3. Perturb the sparse user data with the PCS inferred from step 2, which allows for inferring the OPS and finalizing the Pareto-optimal setting.
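The notion of a Pareto-optimal setting can be illustrated with a small dominance filter. This is a generic sketch, not the heuristic above: it assumes each configuration has already been scored with a privacy level (higher is better) and an MAE (lower is better). The scores below are invented purely for illustration and are not experimental results.

```python
def pareto_front(candidates):
    """Return the labels of candidates that no other candidate dominates.

    Each candidate is (label, privacy, mae). One candidate dominates another
    when it is at least as good on both objectives and strictly better on one.
    """
    front = []
    for label, privacy, mae in candidates:
        dominated = any(
            (p2 >= privacy and m2 <= mae) and (p2 > privacy or m2 < mae)
            for _, p2, m2 in candidates
        )
        if not dominated:
            front.append(label)
    return front


# Invented (privacy, MAE) scores for five hypothetical configurations.
candidates = [
    ("A", 1.0, 0.80),
    ("B", 2.0, 0.95),
    ("C", 1.0, 0.90),  # same privacy as A but worse MAE
    ("D", 3.0, 1.40),
    ("E", 2.0, 1.50),  # same privacy as B but worse MAE
]

print(pareto_front(candidates))  # ['A', 'B', 'D']
```

Every configuration left on the front represents a different trade-off; choosing among them depends on how much accuracy one is willing to give up for additional privacy.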
Evaluating the Recommendation Framework: Dataset
• Two sets of experiments:
 - The first set demonstrates the effect of inserting random data on the accuracy of predictions generated as output of the recommendation system.
 - The second set demonstrates how filling unrated items with varying f values affects the overall accuracy of the recommender system.
• MovieLens dataset, http://www.grouplens.org/node/73
 - 943 user rating profiles, with more than 100000 rating values. Rating values are on a 5-point scale.
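Accuracy in the result slides is reported as MAE (mean absolute error): the average absolute difference between predicted and actual ratings, so lower is better. A minimal sketch, with hypothetical function and variable names and made-up ratings:

```python
def mean_absolute_error(predicted, actual):
    """Average |prediction - true rating| over the pairs present in both."""
    keys = [key for key in actual if key in predicted]
    if not keys:
        raise ValueError("no overlapping (user, item) pairs to evaluate")
    return sum(abs(predicted[key] - actual[key]) for key in keys) / len(keys)


# Made-up held-out ratings and predictions on a 5-point scale.
actual = {("u1", "m1"): 4, ("u1", "m2"): 2, ("u2", "m1"): 5}
predicted = {("u1", "m1"): 4.5, ("u1", "m2"): 3.0, ("u2", "m1"): 5.0}

mae = mean_absolute_error(predicted, actual)
print(mae)  # errors of 0.5, 1.0 and 0.0 average to 0.5
```

On a 5-point scale an MAE of 0.5 means predictions are off by half a rating point on average.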
Evaluating the Recommendation Framework: Recommender
• We used the Trust Recommender from:
 - S. Fazeli, A. Zarghami, N. Dokoohaki, and M. Matskin, "Elevating Prediction Accuracy in Trust-aware Collaborative filtering Recommenders through T-index Metric and TopTrustee lists," the Journal of Emerging Technologies in Web Intelligence (JETWI), 2010.
• A decentralized trust-aware recommender
 - T-index, a trust metric for filtering trust between users.
 - Unlike previous approaches, a trust network between users can be built automatically from existing ratings.
 - A Distributed Hash Table (DHT)-like list of trustees, TopTrusteeList (TTL) [19], which wraps around the items that are tasted similarly to those of the current user.
MAE of recommendation framework, without adding any perturbations
[Figure: MAE (axis ticks from 0.85 to 0.92) for n=2, 3, 5, 10, 20, 50 and T=0, 25, 50, 100, 200, 500, 1000.]
Zarghami, A., Fazeli, S., Dokoohaki, N., & Matskin, M. (2009). Social Trust-Aware Recommendation System: A T-Index Approach. In Web Intelligence and Intelligent Agent Technology, IEEE/WIC/ACM International Conference on (Vol. 3, pp. 85-90). IEEE Computer Society. doi: 10.1109/WI-IAT.2009.237.
MAE with added perturbations to user data, having Gaussian distribution
[Figure: MAE (axis ticks from 0.42 to 2.92) for β=1, 2, 3, 4 and the settings N=2, T=0; N=2, T=100; N=3, T=0; N=3, T=100; N=5, T=0.]
MAE with added perturbations to user data, having Uniform distribution
[Figure: MAE (axis ticks from 0.62 to 1.62) for δ=1, 2, 3, 4 and the settings N=2, T=0; N=2, T=100; N=3, T=0; N=3, T=100; N=5, T=0.]
[Figure: two panels. MAE for β=1, 2, 3, 4 (axis ticks from 0.42 to 2.92) and MAE for δ=1, 2, 3, 4 (axis ticks from 0.62 to 1.62), each for the settings N=2, T=0; N=2, T=100; N=3, T=0; N=3, T=100; N=5, T=0. Annotation: (δ, β) = (1,1).]
Perturbing the overall user data using Gaussian and Uniform distributions (δ, β)
[Figure: two panels. MAE for β=1, 2, 3, 4 (axis ticks from 0.42 to 2.92) and MAE for δ=1, 2, 3, 4 (axis ticks from 0.62 to 1.62), each for the setting N=3, T=100. Annotation: (δ, β) = (1,1).]
Compare the results from MAE of framework under masked data
(n, t) = (3,100)
Filling Sparse Data with Random Gaussian distribution with respect to f
[Figure: MAE (axis ticks from 0.74 to 0.79) for Half Density, Full Density and Double Density.]
Fine-tuning the privacy
• Perturb the sparse user data with the (δ, β, n, t) inferred from the previous step to fine-tune the privacy.
• We observe a consistent increase in intervals of f, which finalizes the choice of n, t, δ, β.
• We finalize the results:
 - The ordered set n=3, t=100, δ=1, β=1 and f = [0, d] will be the Pareto front.
Inferring Optimality Set: Comparison with Non-Masked Results
• Optimality holds under Masked Data.
• Comparison of MAE results of the non-masked framework with those of the framework under masked data:
 - We inferred the optimum values β=1, n=3 and t=100; for these parameters MAE = 0.7994, while for similar parameters without adding perturbations we achieve MAE = 0.881.
• MAE results under masking are still lower than MAE results without adding perturbations.
 - Without perturbations, we achieve the best result, MAE = 0.863, for (n,t)=(50,100), and this value is still greater than our optimum value.
MAE results are still better than results of MAE without adding perturbations.
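The size of the gaps quoted above can be checked with a few lines of arithmetic. The three MAE values are the ones reported on this slide; the variable names and the relative-gap helper are illustrative.

```python
# MAE values as reported on this slide.
masked_optimum = 0.7994   # beta=1, n=3, t=100, with perturbations
unmasked_similar = 0.881  # similar parameters, without perturbations
unmasked_best = 0.863     # (n, t) = (50, 100), without perturbations


def relative_gap(baseline, value):
    """Fraction by which `value` is lower than `baseline`."""
    return (baseline - value) / baseline


gap_similar = unmasked_similar - masked_optimum
gap_best = unmasked_best - masked_optimum

print(f"absolute gaps: {gap_similar:.4f}, {gap_best:.4f}")
print(f"relative gaps: {relative_gap(unmasked_similar, masked_optimum):.1%}, "
      f"{relative_gap(unmasked_best, masked_optimum):.1%}")
```

The masked optimum is lower by about 0.08 MAE against the similar-parameter run and by about 0.06 against the best non-masked run, which is roughly 9% and 7% in relative terms.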
Conclusions
• A framework for addressing the problem of privacy in trust recommenders is proposed,
• Privacy and accuracy are conflicting goals,
• Through experiments we showed that we can infer a setting that holds even when the trust recommender is not under privacy measures,
• As a result, privacy can be introduced in trust recommenders and can be optimized to avoid private data loss and at the same time produce accurate recommendations
Thank you
Nima Dokoohaki
http://web.it.kth.se/~nimad/