improving customer decision in e-commerce: a collaborative filtering approach
TRANSCRIPT
Recap: Online Movie Shop The problem statement:
Too many movies presented to the customer to choose from. Naïve users easily make poor decisions.
Users have divided attention and therefore may not have enough time to make better decisions.
The proposed solution: Provide MORE ACCURATE AND QUALITY MOVIE RECOMMENDATIONS to the
customers.
By inculcating a recommender agent that considers implicit user feedback and temporal effects.
The agent should be able to learn the motivational factors behind customer preferences.
Recap: Online Movie Shop This literature review begs to answer the first four questions asked in the
research proposal document but first builds a general understanding of the recommender systems.
Literature Review Recommender Systems in the context of e-Commerce
Recommendation techniques Content–based
Collaborative Filtering (our interest) Neighbourhood Algorithms
Model–based Algorithms Matrix Factorization Algorithms
SVD
SVD++
timeSVD++
Comparison of the Matrix Factorization Algorithms
Presentation, browsing, explanation and visualization of the recommendations, and techniques that make the recommendation process more structured and conversational.
Recommender Systems Definition:
software agents that elicit the interests and preferences of individual consumers, and make recommendations accordingly
they have the potential to support and improve the quality of the decisions consumers make while searching for and selecting products online.
Data used refers to 3 kinds of objects: items, users and transactions Items refer to the recommended objects. In our case, items correspond to movies.
Users refer to the diverse range of customers to which personalised recommendations are made
Transaction is a recorded interaction between a user and the system Ratings are the most used form of response (implicit feedback): scalar, binary and unary
The Structure of the Learning System Known user preferences are
represented as a matrix of users and items
Each cell corresponds to the rating of user for item .
The task is to predict the missing rating for the active user .
So the attribute a user that can sufficiently be represented in the user-item matrix is the rating for a particular movie.
Recommendation Techniques Content-based, Collaborative Filtering, Hybrid Approaches
Content-based Approaches System learns to recommend items that are similar to the ones that the user liked
in the past.
The similarity of items is calculated based on the features associated with the compared items.
For example, if a user has positively rated a movie that belongs to the comedy genre, then the system can learn to recommend other movies from this genre.
Recommendation Techniques Collaborative Filtering Approaches
This approach recommends to the active user the items that other users with similar tastes liked in the past.
The similarity in taste of two users is calculated based on the similarity in the rating history of the users.
The techniques here are broadly categorised into: Neighbourhood Methods (e.g. kNN)
Model-based Methods (e.g. Matrix Factorization)
Neighbourhood Methods Focus on relationships between items or, alternatively, between users
Summary of the Algorithms Assign a weight to all users with respect to similarity with the active user.
Select users that have the highest similarity with the active user – commonly called the neighbourhood.
Compute a prediction from a weighted combination of the selected neighbours’ ratings.
Neighbourhood Methods The similarity metric can be computed using the Pearson correlation
coefficient as
Where is the set of items rated by both users i.e. , is the rating given to item by user , and is the mean rating given by user .
Then, we predict the rating for item by
Where is the prediction for the active user for item , is the similarity between users and , and is the neighbourhood or set of most similar users
Neighbourhood Methods
Limitations Limited coverage
Users can be neighbours only if they have rated common items
But users having rated a few or no common items may still have similar preferences
Also only items rated by neighbours can be recommended
Sensitivity to Sparse data The accuracy of neighbourhood-based recommendation methods suffers from
the lack of available ratings
Users typically rate only a small proportion of the available items
This is aggravated by the fact that users or items newly added to the system may have no ratings at all, a problem known as cold-start.
Improvements to CF Methods Item – based Collaborative Filtering
where is the set of all users who have rated both items and , is the rating of user on item , and is the average rating of the th item across users.
Then, the rating for item for user a can be predicted using a simple weighted average, as in:
where is the neighbourhood set of the items rated by a that are most similar to . For item-based collaborative filtering too, one may use alternative similarity metrics such as adjusted cosine similarity.
Improvements to CF Methods Significance Weighting
Here, we multiply the similarity weight by significance weighting factor, which devalues the correlations based on few co-rated items (Herlocker et al., 1999).
Default Voting Assume a default value for the rating for items that have not been explicitly rated
then compute correlation using the union of items rated by users being matched as opposed to the intersection
Inverse User Frequency Universally liked items or items that have been rated by all users are not as
useful as less common items
It is applied to similarity based CF by multiplying the original rating of item with the factor ; where is the number of users who have rated item out of the total number of users
Model – based Methods Provide recommendations by estimating parameters of statistical models for
user ratings
These methods transform both items and users to the same latent factor space.
The latent space is then used to explain ratings by characterizing both items and users in terms of factors automatically inferred from user feedback.
Models for the rating data are learnt by fitting the previously observed ratings.
Baseline Predictors (Biases) Much of the observed rating values are due to effects associated with either users
or items, independently of their interaction.
Where the parameters and indicate the observed deviations of user and item, respectively, from the average
Model – based Methods Baseline Predictors We estimate and, by solving the least square problem:
Where the first term strives to find’s and’s that fit the given ratings.
The regularization term, avoids over-fitting by penalizing the magnitudes of the parameters. is a constant that controls regularization and is determined by cross validation
This least square problem can be solved fairly efficiently by the method of stochastic gradient descent.
Matrix Factorization Algorithms Matrix Factorization techniques fall under latent factor models.
These techniques approach collaborative filtering with the holistic goal to uncover latent features that explain observed ratings
They assume that the similarity between users and items is simultaneously induced by some hidden lower-dimensional structure in the data
For example, the rating that a user gives to a movie might be assumed to depend on few implicit factors such as the user’s taste across various movie genres.
The algorithm include Singular Value Decomposition (SVD)
SVD++
Time-Aware SVD++ (timeSVD++)
Singular Value Decomposition (SVD) This algorithm is based on a model maps both users and items to a joint latent
factor space of dimensionality, such that user-item interactions are modeled as inner products in that space.
The latent space tries to explain ratings by characterizing both products and users on factors automatically inferred from user feedback
Accordingly, each item is associated with a vector, and each user is associated with a vector.
For a given item, the elements of of measure the extent to which the item possesses those factors, positive or negative.
For a given user, the elements of measure the extent of interest the user has in items that are high on the corresponding factors (again, these may be positive or negative).
Singular Value Decomposition (SVD) The resulting dot product,, captures the interaction between user and item
—i.e., the overall interest of the user in characteristics of the item.
The final rating is created by also adding in the aforementioned baseline predictors that depend only on the user or item.
Thus, a rating is predicted by the rule:
In order to learn the model parameters (,, and ) the algorithm minimizes the regularized squared error:
The constant, which controls the extent of regularization, is usually determined by cross validation. Minimization is typically performed by either stochastic gradient descent or alternating least squares.
Singular Value Decomposition (SVD) The algorithm then loops over all known ratings in, computing:
is a constant representing the learning rate.
SVD++ This algorithm improves on SVD by incorporating implicit user feedback in
learning the user preferences.
It possible to capture a significant signal by accounting for which items the users rate, regardless of their rating value
To this end, a second set of item factors is added, relating each item to a factor vector.
Those new item factors are used to characterize users based on the set of items that they rated. The exact model is represented as:
The set contains the items rated by user.
SVD++ Now, a user is modeled as.
The algorithm uses a free user-factors vector, , which is learnt from the given explicit rating.
This vector is complemented by the sum, which represents the perspective of implicit feedback. Since the’s are centred on zero by the regularization, the sum is normalized by, in order to stabilize its variance across the range of observed value
The algorithm determines the model parameters by minimizing the associated regularized squared error function through stochastic gradient descent
SVD++ Several types of implicit feedback can be simultaneously introduced into the
model by using extra sets of item factors.
For example, if a user has a certain kind of implicit preference to the items in (e.g., she/he bought them), and a different type of implicit feedback to the items in (e.g., she/he browsed them), we could use the model:
The relative importance of each source of implicit feedback will be automatically learned by the algorithm by its setting of the respective values of model parameters.
timeSVD++ This algorithm lends itself well to modeling temporal effects, which
significantly improves prediction accuracy.
It defines a template for a time sensitive baseline predictor for user’s rating of item at day or time:
Here and are real valued functions that change over time, parameterized by temporal effects that either span extended periods of time or more transient effects.
→ split the item biases into time-based bins
→ the linear function to capture a possible gradual drift of user bias
→ smooth functions for modeling the user bias
timeSVD++ Our baseline predictor for modeling the drifting user bias becomes:
To learn the involved parameters, ,, , and, the algorithm solves
Here, the first term strives to construct parameters that fit the given ratings. The regularization term, , avoid over-fitting by penalizing the magnitude of the parameters, assuming a neutral 0 prior.
timeSVD++ Following the above modeling,
Here captures the stationary portion of the factor, approximates a possible portion that changes linearly over time, and absorbs the very local, day-specific variability.
At this point, we can tie all pieces together and extend the SVD++ factor model by incorporating the time changing parameters. The resulting model will be denoted as, where the prediction rule is as follows:
Comparison of Matrix Factorization Algorithms
ModelSVD 0.9140 0.9074 0.9046 0.9025 0.9009SVD++ 0.9131 0.9032 0.8952 0.8924 0.8911timeSVD++
0.8971 0.8891 0.8824 0.8805 0.8799
Conclusion Matrix factorization can be applied in e-commerce to provide quality
recommendations which can significantly improve customer decisions.
timeSVD++ is the best Matrix Factorization algorithm since it provides more accurate recommendations since it considers user and item biases, user’s implicit feedback and temporal effects.
With this algorithm, we can significantly provide more accurate movie recommendations that will translate to improved customer decision our Online Movie Shop.
References Bennet, J., & Lanning, S. (2007). The Netflix Prize. KDD Cup and Workshop. Retrieved from www.netflixprize.com
Berkovsky, S. (2009). Mediation of User Models: for Enhanced Personalization in Recommender. VDM Verlag.
Mulang', O. I. (2014). Social Media Temporal Absence Recommendation: a Collaborative Filtering Perspective. Jomo Kenyatta University of Agriculture and Technology.
Burke, R. (2007). Hybrid Recommender Systems: Survey and Experiments. In User Modeling and User-Adapted Interaction (pp. 331-370).
Christian, D., & George, K. (2011). A Comprehensive Survey of Neighborhood-based. In R. Francesco, R. Lior, & S. Bracha, Recommender Systems Handbook (pp. 107-140). Springer.
Francesco, R., Lior, R., & Bracha, S. (2011). Introduction to Recommender Systems Handbook. In R. Francesco, Recommender Systems Handbook (pp. 1-35). Springer Science + Business Media, LLC.
Friedrich, G., & Zanker, M. (2011). A taxonomy for generating explanations in recommender systems. AI Magazine, 32(3), 90-98.
Grouplens. (1994). An open architecture for collaborative filtering of netnews. CSCW, ACM SIG Computer Supported Cooperative Work,.
Koren, Y., & Robert, B. (2011). Advances in Collaborative Filtering. In R. Francesco, R. Lior, & S. Bracha, Recommender Systems (pp. 145-184). Springer.
Mahmood, T., Ricci, F., Venturini, A., & Hpken, W. (2008). Adaptive recommender systems for travel. In P. O. Connor, & U. G. W. Hpken, Information and Communication (pp. 1-11). Springer.
Melville, P., & Sindhwani, V. (2010). Recommender Systems. In Encyclopedia of Machine Learning. Heidelberg: Springer-Verlag Berlin.