Item Graph Based Recommendation System for Bookopolis
David Z Liu ([email protected]) Gurbir Singh ([email protected])
Introduction: This paper introduces a novel graph-based collaborative filtering (CF) recommendation system. A directed, weighted item-based graph is constructed using similarity scores calculated from book rating data gathered from Bookopolis, an online reading website for children between ages seven and twelve. The weight of an edge is determined by the similarity between two items, and the direction of the edge is defined by the relative popularity of the items. The graph-based approach yields better recommendations even when the rating data is relatively sparse. The directed nature of the graph improves the quality of the search by recommending items that not only suit users' tastes but are also more popular. We compared recommendation results generated from a standard item-based CF recommendation system, an undirected item-graph CF recommendation system, and a directed item-graph CF recommendation system. On the data we collected, the item-graph CF recommendation systems outperform the standard item-based CF recommendation system.

Motivation: As online educational materials quickly become a principal source of learning, accurate recommendation systems in an educational context will be central to the success of any online learning system. The objective of this paper is to outline a book recommendation system for seven- to twelve-year-old students. Such a recommendation system can enhance student learning by providing just-in-time, personalized retrieval of vetted content. Various approaches to recommendation systems have been studied extensively, and among them item-based Collaborative Filtering (CF) is generally considered one of the most successful ways to build recommendation systems [3]. A CF recommendation system provides recommendations for a specific user based on his/her previous opinions
and the opinions of other like-minded users. One of the key challenges in collaborative filtering is the inherent sparsity of the rating data: a typical user rates only a very small portion of the items in a popular website's overall collection. It is important for CF to make accurate rating predictions in the presence of limited data, and most CF algorithms are not adopted universally because they produce inaccurate ratings when the rating data is highly sparse [1]. In this paper, we describe a novel implementation of a directed, weighted item-based graph structure that addresses the problem of inherent data sparsity and enhances recommendation quality. We used data collected from Bookopolis, a popular online book-reading site for children aged seven to twelve. Each node in the item-based graph represents a specific book in the Bookopolis data set, and a pair of nodes is linked based on their similarity; the direction of the link is decided by the popularity of the items, with the more popular item being the in-node and the less popular item being the out-node. The weight of each link is the similarity score between the two items. The underlying idea of our approach is to recommend books that are not only similar but also better. By devising a recommendation scheme that exploits transitive relationships between multiple nodes, we address the issue of inherent data sparsity. We compared recommendation results generated from three different implementations: an item-based CF implementation, an undirected, weighted item-graph implementation, and a directed, weighted item-graph implementation. Based on the results obtained so far, we observed that the standard item-based CF is highly susceptible to data sparsity, while the item-graph structure showed promising potential to address it. Augmenting the user-item matrix with additional attributes further improves recommendation accuracy.
Related Work: In this section we briefly present some of the research literature related to item-based collaborative filtering recommendation systems and graph-based approaches. In [1], Aggarwal et al. introduced one of the first graph-based techniques to overcome the problems that sparsity causes in neighborhood collaborative filtering. The basic idea is to form and maintain a directed graph whose nodes are users and whose directed edges encode degrees of similarity between pairs of users. Recommendations are induced by connecting a user through multiple nodes to find another user who rated the item in question. As the graph walk passes through users who have not rated that item, a linear transformation is constructed that translates ratings from one user to the next. The exploitation of transitive relationships between multiple users was not considered in the traditional k-nearest-neighbor approach, and it is this feature that effectively combats the inherent sparsity of rating data. For the collaborative filtering process to be effective, the directed path needs to be short, and it can be found efficiently by breadth-first search. If a path is not found within some predefined small distance, the algorithm terminates and returns an empty recommendation result. Synthetic simulation experiments used to evaluate the graph-based CF algorithm showed that it produced better predictions than two well-established CF approaches, LikeMinds and Firefly. B. M. Sarwar et al. introduced one of the first item-based CF recommendation systems to address the need to compute millions of recommendations at run time [2]. The item-based approach first looks at the set of items the target user has rated and computes how similar they are to other items in the system. The final prediction value is computed by taking a weighted average of the target user's ratings on the top k most similar items.
The static nature of items on a typical e-commerce site allowed B. M. Sarwar et al. to pre-compute the item similarities and perform a quick table look-up at run time to retrieve the required similarity value. The
experiments conducted by B. M. Sarwar et al. showed that item-based algorithms provide better quality and performance than the best available user-based algorithms. In [3], Wang et al. constructed a CF system based on the concepts of the item-based and graph-based approaches; this system has the potential to combat the data sparsity that the item-based approach of [2] still suffers from, while achieving high scalability. They built a weighted, undirected graph where nodes are items and edges represent pairwise item relationships. Associated with each edge is a nonnegative weight that encodes the pairwise similarity, computed with the standard cosine similarity formula. Two items are declared similar if they are connected by an edge whose weight is larger than some predefined threshold, or if there exists a path connecting them in which every edge weight exceeds the threshold. Experimental evaluation conducted by Wang et al. showed that the item-graph-based recommendation system produced better recommendation results than traditional item-based methods.

Model: Bookopolis Data Set Analysis: The data was provided by Bookopolis; it contains information for 3648 active users and is stored in a relational database with 36 relations. Roughly 7782 books have been read by the users. The main table we used to build the recommendation systems is the "bookshelfitem" table. Figure 1 shows all of the table names in the Bookopolis database and the "bookshelfitem" table's attributes. The "bookshelfitem" table records users and the books they are interested in; the interest is captured by the "readingStatus" attribute, which for every user contains ALREADY_READ, READING, or WANT_TO_READ. This table contains about 34k user-book interactions.
We processed our data set into a user-item matrix as shown in Table 1; this matrix forms the central structure for all of the similarity computations. While constructing the user-item matrix for the recommendation engine, we used the "readingStatus" field as the basic rating by the users, represented as a 1 in the basic cosine calculation.
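As a rough sketch of this preprocessing step (plain-Python and dictionary-based; the tuple layout and function name are ours, not the actual Bookopolis schema), the user-item matrix can be built like this:

```python
from collections import defaultdict

def build_user_item_matrix(rows):
    """Build a binary user-item matrix from (user_id, book_id, reading_status)
    rows of the "bookshelfitem" table. Any reading status (ALREADY_READ,
    READING, WANT_TO_READ) counts as an implicit rating of 1."""
    matrix = defaultdict(dict)  # user_id -> {book_id: 1}
    for user_id, book_id, reading_status in rows:
        if reading_status in ("ALREADY_READ", "READING", "WANT_TO_READ"):
            matrix[user_id][book_id] = 1
    return matrix

# Hypothetical sample rows for illustration.
rows = [
    (1, "book_a", "ALREADY_READ"),
    (1, "book_b", "READING"),
    (2, "book_a", "WANT_TO_READ"),
]
matrix = build_user_item_matrix(rows)
```

Each column of this matrix (one column per book) is the item vector used in the cosine computations below.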
Figure 1: Bookopolis Data Set
Table 1. Item-user matrix

Web Crawling: To combat the sparsity (shown in Figure 3 in the results section) inherent in the Bookopolis data set, we crawled the web to augment our data with additional attributes. We used the public API of the popular book review website Goodreads (https://www.goodreads.com/api). The crawler
retrieves a given book's rating, title, author, and the number of users who rated the book. We gathered this information for every book on users' bookshelves and incorporated the normalized rating values into our user-item matrix as an additional row.

Algorithm and Implementation: The similarity score between items is obtained by computing the cosine similarity (EQ 1) between two books using the user-item matrix described in the model section. Cosine similarity between two items i and j:
s_{ij} = \frac{\vec{i} \cdot \vec{j}}{\|\vec{i}\| \, \|\vec{j}\|}  (EQ 1)

where \vec{i} and \vec{j} are the rating column vectors of items i and j.
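With binary (0/1) ratings, EQ 1 reduces to the user overlap of two books divided by the geometric mean of their reader counts. A minimal sketch over the dictionary-style matrix described above (names are ours):

```python
import math

def cosine_similarity(matrix, item_i, item_j):
    """Cosine similarity (EQ 1) between two items, each represented by the
    set of users who shelved it. With 0/1 entries the dot product is simply
    the size of the user overlap."""
    users_i = {u for u, items in matrix.items() if item_i in items}
    users_j = {u for u, items in matrix.items() if item_j in items}
    if not users_i or not users_j:
        return 0.0
    overlap = len(users_i & users_j)
    return overlap / (math.sqrt(len(users_i)) * math.sqrt(len(users_j)))
```

For example, a book shelved by users {1, 2} and a book shelved only by user {1} have similarity 1/sqrt(2).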
Item-based CF System: Using EQ 1 and Table 1, we computed the similarity score for each item pair and stored the results. We implemented a recommendation system using the standard item-based CF method described in [2]. Each user-item entry gets a value of 1 if the user has read the book and 0 otherwise. For each item we kept the top K most similar items as recommendation candidates based on the similarity scores. Figure 2 in the results section shows the precision and recall of this recommendation system as we vary K.

Undirected Weighted Item-Graph CF System: Using the similarities computed with EQ 1, we created the undirected item-based graph using SNAP. The item graph is an undirected weighted graph G = (V, E) where
- V is the book set, where each book is regarded as a node of the graph G
- E is the edge set. Associated with each edge e_{qp} is a weight w_{qp} subject to w_{qp} ≥ 0 and w_{qp} = w_{pq}; w_{qp} is the similarity between the two nodes
- Two nodes q and p are connected if w_{qp} > δ, where δ is a tunable parameter
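The paper builds this graph with SNAP; a library-free sketch of the same construction (dictionary adjacency lists, names ours) makes the thresholding rule concrete:

```python
def build_item_graph(similarities, delta):
    """Build the undirected weighted item graph G = (V, E): nodes are books,
    and books q, p are joined by an edge of weight w_qp whenever their
    cosine similarity exceeds the threshold delta.
    similarities: {(q, p): w} over unordered book pairs."""
    graph = {}
    for (q, p), w in similarities.items():
        if w > delta:
            graph.setdefault(q, {})[p] = w
            graph.setdefault(p, {})[q] = w
    return graph
```

Books whose every pairwise similarity falls below δ simply never enter the graph, which is why only ~92% of the books appear in it (see the results section).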
To improve the performance of the recommendation engine, we can sparsify the graph by keeping the similarities between each item i ∈ V and its k most similar items, and setting the similarities between i and the rest of the items to zero, i.e.

W_{i_1 i_2} = \begin{cases} w_{i_1 i_2}, & i_2 \in \kappa(i_1) \text{ or } i_1 \in \kappa(i_2) \\ 0, & \text{otherwise} \end{cases}

Here κ(i_1) and κ(i_2) denote the sets containing the k most similar items of i_1 and i_2. Since we only have 7782 books, sparsification was not needed. To make recommendations for a given user, we applied the following heuristics:
-‐ Pick one book from the evaluation set and get the node in the graph that corresponds to it.
-‐ Get all of its direct neighboring nodes and add them to the recommendation set.
- Because the data is sparse, it is likely that we will not get enough recommendations. In this case, we perform a breadth-first search to traverse neighbors of neighbors. To minimize computational complexity, we allow a maximum of three hops per recommendation.
- As every edge has a weight associated with it, we compute an overall score for each recommendation pair using EQ 3 (the popularity term P is set to 0; it is only relevant for the directed graph). We model the strength of similarity as an exponentially decaying function of distance, capturing the fact that the further a node is from the source node, the less similar it is.
- The top k scored items are the final recommendations presented to the user.
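The heuristics above can be sketched as a bounded BFS with decayed path scores. This is a plain-Python reading of the procedure (function names and the exact score accumulation are our interpretation, with the popularity term of EQ 3 set to 0 as the text specifies for the undirected graph):

```python
from collections import deque

def recommend_undirected(graph, source, k=10, max_hops=3, lam=2.5):
    """BFS from the source book up to max_hops; each reached node is scored
    by summing edge weights along its discovery path, each hop damped by an
    exponential decay factor lam**hop.
    graph: {node: {neighbor: weight}} (undirected adjacency)."""
    scores = {}
    visited = {source}
    queue = deque([(source, 0, 0.0)])  # (node, hops so far, path score)
    while queue:
        node, hops, score = queue.popleft()
        if hops == max_hops:
            continue
        for neigh, w in graph.get(node, {}).items():
            if neigh in visited:
                continue
            visited.add(neigh)
            s = score + w / lam ** (hops + 1)
            scores[neigh] = s
            queue.append((neigh, hops + 1, s))
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

Direct neighbors are always collected first; only when the frontier is exhausted within three hops does the traversal stop, matching the termination rule above.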
Directed Weighted Item-Graph CF System: The directed graph's edge structure is the same as the undirected graph's, except that each connected pair of nodes has a direction. In the directed item-graph, two nodes are connected if their similarity score is greater than some predetermined threshold (w_{qp} > δ). The direction of the link is determined by the popularity of the items, with the more popular item being the in-node and the less popular item being the out-node. The popularity of each node is obtained by computing a linear combination of weighted, normalized attributes using EQ 2, into which we incorporated the data crawled from the Goodreads site. If P_A > P_B, then the edge between books A and B is B → A. The intuition behind this graph is to recommend books that are not only similar but also better/more popular. Popularity of item i:

P_i = \beta \cdot \frac{r^{bp}_i \, n^{bp}_i}{\max_j \left( r^{bp}_j \, n^{bp}_j \right)} + \gamma \cdot \frac{r^{gr}_i \, n^{gr}_i}{\max_j \left( r^{gr}_j \, n^{gr}_j \right)}  (EQ 2)

where r^{bp}_i and n^{bp}_i are the Bookopolis rating of i and the number of users who rated i on Bookopolis, r^{gr}_i and n^{gr}_i are the corresponding Goodreads values, and each term is normalized over all items j. β and γ are parameterization values for each component; the optimized values in our case are β = 0.45 and γ = 0.70. We found recommendations using the following process:
- For a given item i, first collect all of its in- and out-nodes. This keeps the most similar items regardless of their popularity.
- Nodes can only be traversed along edge directions in our graph. To make the problem more tractable, we limit traversal to 3 hops; in this case, we only allow more popular nodes to reach out to the next two connected layers of the network.
- We use EQ 3 to keep a score for each selected recommendation candidate. We model the strength of similarity and popularity of a node as an exponentially decaying function of distance, capturing the fact that the further a node is from i, the less similar it is.
- The top k scored items are the final recommendations presented to the user.
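The popularity score of EQ 2 above can be sketched as follows. The field names are illustrative (not the actual Bookopolis or Goodreads schema), and the normalization by the per-source maximum reflects our reading of "weighted normalized attributes":

```python
def popularity(book_stats, beta=0.45, gamma=0.70):
    """Popularity (EQ 2) for every book: a linear combination of the
    Bookopolis and Goodreads signals, where each signal is
    (average rating x number of raters) normalized by its maximum
    over all books."""
    bp = {b: s["bp_rating"] * s["bp_raters"] for b, s in book_stats.items()}
    gr = {b: s["gr_rating"] * s["gr_raters"] for b, s in book_stats.items()}
    bp_max = max(bp.values()) or 1  # guard against an all-zero signal
    gr_max = max(gr.values()) or 1
    return {b: beta * bp[b] / bp_max + gamma * gr[b] / gr_max
            for b in book_stats}
```

With β = 0.45 and γ = 0.70, the most popular book on both sites scores 1.15; edge directions then point from the lower-scored to the higher-scored book of each similar pair.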
Link score for a pair of nodes i, j connected by w links:

Score_{ij} = \sum_{h=1}^{w} \frac{\text{similarity}_h + P_j}{\lambda^h}  (EQ 3)

where similarity_h is the weight of the h-th link on the path from i to j and P_j is the popularity (EQ 2) of the candidate node j (set to 0 for the undirected graph).
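A minimal sketch of this scoring rule (the exact placement of the popularity term is our reconstruction of the garbled equation; names are ours):

```python
def link_score(path_weights, popularity_j, lam=2.5):
    """EQ 3 sketch: score for reaching candidate j over a path of w links,
    summing (similarity + popularity of j) per hop, damped exponentially
    by lam**h so that farther hops contribute less."""
    return sum((w + popularity_j) / lam ** (h + 1)
               for h, w in enumerate(path_weights))
```

With λ = 2.5 (the optimized value reported below), a direct neighbor with similarity 0.9 and candidate popularity 0.5 scores (0.9 + 0.5)/2.5 = 0.56.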
λ is the parameterization value; the optimized value in our case is 2.5.

Evaluation: Data: We reserved 30% of our data set for evaluation purposes. The items on a user's bookshelf represent the
gold standard to measure the correctness of item recommendations. We used 70% of the users' bookshelf data to train our model.

Precision and Recall: We started with precision and recall to evaluate the quality of the recommendation engine. Precision (P) is the fraction of retrieved items that are relevant:

Precision = #(relevant items retrieved) / #(retrieved items)  (EQ 5)

Recall (R) is the fraction of relevant items that are retrieved:

Recall = #(relevant items retrieved) / #(relevant items)  (EQ 6)
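Both metrics follow directly from the overlap between the recommendation set and the held-out bookshelf; a minimal sketch (names are ours):

```python
def precision_recall(recommended, relevant):
    """Precision (EQ 5) and recall (EQ 6) for one recommendation set.
    recommended: book ids returned by the engine;
    relevant: book ids on the user's held-out bookshelf."""
    hits = len(set(recommended) & set(relevant))
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

For instance, 2 hits among 4 recommendations against a 3-book shelf gives precision 0.5 and recall 2/3.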
Computation Procedure:
-‐ Take a book from the user shelf (evaluation set).
-‐ Generate recommendations using cosine similarity or weighted undirected graph or weighted directed graph.
- Calculate precision by comparing the recommendations with the user's bookshelf; more matches yield a higher precision. We use the total number of retrieved books as the denominator in the precision calculation (EQ 5).
- Calculate recall by comparing the recommendations with the user's bookshelf again, but instead of the total number of retrieved books, we use the size of the user's bookshelf as the denominator in the recall calculation (EQ 6). (The user's bookshelf is a subset of the training set.)
Limitations of Precision and Recall: Precision and recall provide very limited insight into the effectiveness of our approach because we are treating the user's bookshelf as the ultimate source of truth. This may be valid for some users, but most users have only a few books on their bookshelf, and an exact match with the recommended book set is unlikely, which leads to poor precision and recall numbers. After closely examining our data set, we observed that precision and recall measurements are not sufficient to evaluate the effectiveness of our methods. A set of book recommendations may not exactly match the user's bookshelf items; they could nonetheless be very similar to the given user's interests. To truly capture the quality of the recommendation set, we need to evaluate it based on its similarity to the books on the user's bookshelf.

Alternate approach using cosine similarity: To overcome the above limitation, we devised an alternate evaluation scheme. The basic intuition behind the new scheme is to measure the similarity between the books retrieved by our algorithm and the books in the evaluation set. Since the books in the evaluation set represent a particular user's interests, this directly measures whether the books returned by the recommendation engines suit the interests of a given user. The procedure below lists the mathematical details of the new evaluation metric.
-‐ Take a book from the user shelf (evaluation set).
- Generate recommendations using item-based CF, the weighted undirected graph, or the weighted directed graph.
-‐ Compare every recommended book with the items on users’ bookshelf and retain the maximum similarity score.
-‐ Take the average similarity score over all the recommended books. Similarity of a recommendation set with the bookshelf can be calculated as:
\frac{1}{k} \sum_{i=1}^{k} \frac{1}{1 + (1 - \lambda_i)}  (EQ 7)

where k is the number of recommended books and λ_i is the maximum cosine similarity between recommended book i and the books in the evaluation set.
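The metric above can be sketched directly (the pairwise similarity function is passed in, so the same code works with EQ 1 or any other similarity; names are ours):

```python
def closeness(recommended, shelf, similarity):
    """EQ 7: average over recommended books of 1 / (1 + (1 - lambda_i)),
    where lambda_i is the maximum similarity between recommended book i
    and any book on the user's evaluation shelf. An exact match
    (lambda_i = 1) contributes 1; a total mismatch (lambda_i = 0)
    contributes 0.5."""
    k = len(recommended)
    if k == 0:
        return 0.0
    total = 0.0
    for i in recommended:
        lam = max(similarity(i, j) for j in shelf)
        total += 1.0 / (1.0 + (1.0 - lam))
    return total / k
```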
EQ 7 is maximized to 1 when the recommended books exactly match books on the user's bookshelf, i.e. λ_i = 1. It is minimized when we have a total mismatch, i.e. λ_i = 0. Therefore, EQ 7 indicates how close our recommendation result is to the best case.

Baseline: To determine the relevance and accuracy of our recommendation methods, we compared the results generated by EQ 7 to random recommendation selection: we randomly picked k books from our set and fed them into EQ 7. The result of the random selection serves as the baseline for evaluating the effectiveness of our algorithms.

Results: We implemented three variations of the recommendation engine: an item-based CF recommendation system, an undirected weighted item-graph recommendation system, and a directed weighted item-graph recommendation system. Item-based CF system: The recommendation accuracy for both recall and precision is relatively low for small K; recall increases as we increase K while precision decreases, in line with the nature of EQ 5 and EQ 6 (Figure 2). This result indicates that the top-10 results from the cosine calculation do not reliably produce the top candidates. Detailed examination of the cosine score distribution, plotted in Figure 3, reveals that three types of cosine scores dominate: most books have only one, two, or three valid ratings, and thus most books have the same cosine scores. This indicates that our user rating data is sparse, which could explain the low accuracy in the low-K region. To enhance our similarity computation, we incorporated rating data crawled from the Goodreads API into the user-item matrix. We computed similarity using the following heuristic: for two given books, if the difference between their rating scores is greater than a certain threshold, we assign a 0 to the rating entry for the similarity calculation; otherwise we assign a 1. The intuition behind this approach is to give more similarity weight to books that received similar reviews. We found that a threshold value of 0.35 gives the best result. Figure 4 shows precision and recall vs. K using this new similarity computation; this approach slightly improved the accuracy for small K. We used this expanded matrix for the cosine score calculation in both item-graphs.
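The thresholding heuristic for the extra Goodreads row reduces to a single comparison on normalized ratings (a sketch; the name is ours):

```python
def rating_similarity_entry(rating_i, rating_j, threshold=0.35):
    """Heuristic for the Goodreads row of the user-item matrix: two books
    whose normalized ratings differ by more than the threshold get a 0 in
    the similarity calculation; otherwise a 1 (books with similar reviews
    are treated as agreeing)."""
    return 0 if abs(rating_i - rating_j) > threshold else 1
```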
Figure 2
Figure 3
Figure 4
Undirected Weighted Item-Graph CF system: We experimented with different values of δ for both the undirected and directed graphs and found the optimal value of δ to be around 0.00165. We observed a high clustering coefficient (0.507) in the Bookopolis graph, in line with other real-world graphs, as triadic closures tend to occur often. Roughly 92% of the books are present in the graph, and the expected degree after δ optimization is around 6.18. We evaluated the recommendation results using recall and precision, and we also measured their effectiveness using the closeness-of-similarity measure captured by EQ 7. We divided our users into subgroups who have read 1-5 books, 6-11 books, 12-17 books, and so on, and measured the average EQ 7 value for each subgroup. The precision and recall results are shown in Figure 5 and Figure 6 respectively. The results obtained using EQ 7 (which represents the closeness in similarity of the result to the evaluation data) are shown in Figure 7. For this calculation we fixed K at 10. Additional discussion of the results is presented in the "analysis and future work" section. Directed Weighted Item-Graph CF system: The directed graph follows the undirected graph, with the addition of edge directions. For the given data, we found that the directed graph has one SCC comprising around 92% of the total nodes. For the directed graph, we measured precision, recall, and the closeness-of-similarity measure captured by EQ 7. The precision and recall results are shown in Figure 5 and Figure 6 respectively; the result obtained using EQ 7 is shown in Figure 7. For this calculation we fixed k at 10. Additional discussion of the results is presented in the "analysis and future work" section.
Figure 5
Figure 6
Figure 7

Analysis and Future Work: Result Analysis: The precision and recall of the graph approaches saturate for large K because we limit the maximum traversal depth to 3 and the graph contains 92% of the books. For large K, the total number of books retrieved by the graph approach reaches an upper limit regardless of the value of K, because our algorithm forces termination at 3 hops. In the low-K region, the item-graph approach performs slightly better than the standard cosine similarity recommendation method. In the large-K region, the directed graph saturates at a higher precision value because traversal is further limited by the direction of the graph; therefore the maximum number of books that can be retrieved from the directed graph is less than from the undirected graph. The same argument also explains the difference in the high-K region of the recall graph. In general, the precision and recall of all three methods exhibit similar behavior. The closeness-of-similarity measure captured by EQ 7 is a better way to observe the effectiveness of the different methods. In Figure 7, we clearly see that all three similarity-based recommendation methods return books that are closer to the user's interest than random recommendation. In addition, Figure 7 shows that both graph approaches produce better recommendations (measured in terms of closeness to user interest rather than exact match with the user's bookshelf) than the standard cosine similarity method. From the directed-graph curve in Figure 7, we can see that beginner/infrequent readers tend to read more popular books, but a well-read user is not swayed by popularity as much. The directed graph matches users' interests better than all other methods when users have fewer than 11 books on their bookshelf, and as the bookshelf grows beyond 17 items the undirected graph produces the best results. These results give us insight into the general reading behavior of a typical user based on their reading habits.
Furthermore, all recommendation methods in Figure 7, including the random method, exhibit an increasing pattern as the number of items on the user's bookshelf increases. This behavior is expected: the more books a user has on his/her bookshelf, the more likely the user has a wide range of interests, and therefore the more likely a retrieved book matches one aspect of the user's interests. When a user's bookshelf is large enough, a randomly selected book is just as likely to match a specific book on the bookshelf as a book selected from the undirected graph.

Future Work: One of the key limitations in implementing the recommendation engine was the data size. We have only 3648 active users and 7782 books, which is very small compared with real-world data sets. It will be interesting to see how our approaches scale when applied to bigger data sets. Furthermore, the small data size also eliminates the concerns associated with computational efficiency; finishing graph traversal at run time is a challenge when the data size is large. There are many other opportunities to expand the user-item matrix further to reduce sparsity. For instance, we could include author information as a row in the matrix, and we could include the titles of the books as an additional row; however, a somewhat involved process of linguistic preprocessing of the titles is needed to match similar titles.

References:
[1] Aggarwal, C.C., Wolf, J.L., Wu, K., Yu, P.S.: Horting hatches an egg: A new graph-theoretic approach to collaborative filtering. In: Proc. of the 5th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, KDD 1999, pp. 201-212. ACM, New York (1999)
[2] Sarwar, B., Karypis, G., Konstan, J., Riedl, J.: Item-based collaborative filtering recommendation algorithms. In: Proc. of the WWW Conf. (2001)
[3] Wang, F., Ma, S., Yang, L., Li, T.: Recommendation on item graphs. In: Proc. of the Sixth Int. Conf. on Data Mining, ICDM 2006, pp. 1119-1123. IEEE Computer Society, Washington, DC (2006)
[4] Liben-Nowell, D., Kleinberg, J.: The link prediction problem for social networks. In: Proc. of the 12th Int. Conf. on Information and Knowledge Management, CIKM 2003.