Social Recommender System

Post on 10-May-2015





A brief overview of the mechanism behind recommender systems; YouTube and Amazon use this technology.


1. Tag Based Social Recommender System (RS)
  Project mentor: Ms Pragya Dwivedi
  By: Aditi Gupta, Anirudh Kanjani, Abhinav Vasu Rawat, Kapil Kumar, Ashutosh Singh

2. Agenda
  • Recommender systems: overview
  • Usefulness of recommender systems (RS)
  • Types of RS
  • Relation with information architecture
  • Limitations and possible improvements
  • Relation with social networking

3. What are they and why are they?
Recommender systems provide a way of filtering information that attempts to present items likely to be of interest to the user. Their advantages are:
  • Enhances user experience
  • Assists users in finding information
  • Reduces search and navigation time
  • Increases productivity
  • Increases credibility
  • Mutually beneficial proposition

4. Types of Recommender Systems (RS)

5. Content-based RS: Highlights
  • Recommend items similar to those the user preferred in the past
  • User profiling is the key
  • Items/content usually denoted by keywords
  • Matching user preferences with item characteristics works for textual information
  • Vector space model widely used

6. Content-based RS: Limitations
  • Not all content is well represented by keywords, e.g. images
  • Items represented by the same set of features are indistinguishable
  • Overspecialization: unrated items are not shown
  • Users with thousands of purchases are a problem
  • New user: no history available
  • Shouldn't show items that are too different or too similar

7. Collaborative RS: Highlights
  • Use other users' recommendations (ratings) to judge an item's utility
  • Key is to find users/user groups whose interests match those of the current user
  • Vector space model widely used (directions of vectors are user-specified ratings)
  • More users, more ratings: better results
  • Can account for items dissimilar to the ones seen in the past

8. Collaborative RS: Limitations
  • Different users might use different scales. Possible solution: weighted ratings, i.e. deviations from the average rating
  • Finding similar users/user groups isn't very easy
  • New user: no preferences available
  • New item: no ratings available

9. Hybrid RS
  • Uses both content-based and collaborative filtering
  • Introduced to avoid the limitations found in both content-based and collaborative methods
  • Example: Netflix makes recommendations by comparing the watching and searching habits of similar users (collaborative filtering) as well as by offering movies that share characteristics with films that a user has rated highly (content-based filtering).

10. Other Variations of RS
Cluster models:
  • Create clusters or groups
  • Put a customer into a category
  • Classification simplifies the task of user matching
  • More scalability and performance
  • Lower accuracy than the normal collaborative filtering method

11. Possible Improvement in RS
  • Better understanding of users and items
  • Social network (social RS)
    1. User level: highlighting interests, hobbies, and keywords people have in common
    2. Item level: link the keywords to e-commerce (by RS algorithms)

12. What is a tag?
A tag is a piece of information that describes the data or content it is assigned to. Tags are non-hierarchical keywords used for Internet bookmarks, digital images, videos, files and so on. A tag itself doesn't carry any information or semantics. Tagging serves many functions, including:
  • Classification
  • Marking ownership
  • Describing content type
  • Online identity

13. About tagging
Labeling and tagging are done to aid in classification, marking ownership, noting boundaries and indicating online identity. Tags may take the form of words, images or marks. Online and Internet databases deploy them as a way for publishers to help users find content.

14. Where are they used?
  • Social bookmarking: lets users add tags to their bookmarks.
  • Flickr: allows users to add their own text tags to each of their pictures, constructing flexible and easy metadata that makes pictures highly searchable.
  • YouTube: also implements tagging, categorising content using simple keywords. Users add tags, which are visible and themselves link to other items that share that keyword tag.

15. Examples
  • Within a blog: many blog systems allow authors to add free-form tags to a post.
    For example, a post may display that it has been tagged with "baseball" and "tickets".
  • For an event: an official tag is a keyword adopted by an event for use in its web applications, such as blog entries, photos of the event and presentation slides.
  • In research: associate an item with a small number of themes; a group of tags for these themes can then be attached. In this way, free-form classification allows an author to manage large amounts of information.

16. Tag types
  • Triple tag: a triple tag or machine tag uses a special syntax to define extra semantic information about the tag, making it more meaningful for interpretation. A triple tag comprises a namespace, a predicate and a value.

17. Tag types
  • Hashtag: a word or phrase prefixed with "#"; a form of metadata tag. Short messages on social networking services such as Twitter and Facebook may be tagged by putting "#" before important words. Hashtags provide a means of grouping such messages, since one can search for a hashtag and get the set of messages that contain it.
  • Knowledge tag: a type of meta-information that describes or defines some aspect of an information resource. Knowledge tags are a type of metadata that captures knowledge in the form of descriptions, classifications, comments, notes, hyperlinks, etc.

18. Information Retrieval Systems
Information retrieval is the activity of obtaining information resources relevant to an information need from a collection of information resources. Searches can be based on metadata or on full text.

19. The Information Retrieval Cycle
[Figure: flow diagram with the stages Source Selection, Query Formulation, Search and Selection; intermediate outputs Resource, Query, Ranked List and Documents; a feedback loop labeled "query reformulation, relevance feedback"; final output "result".]
11/27/2013, Introduction to Information Retrieval

20. Search Process
[Figure: flow diagram with the labels Source Selection, Resource, Query Formulation, Query, Search, Indexing, Index, Ranked List, Selection, Documents, Results and Document Collection.]
Slide is from Jimmy Lin's tutorial.

21. Implementation: How the Recommender System Works
In case we use content-based filtering, the cosine similarity formula is used, which in its standard form is

$$\cos(\vec{w}_c, \vec{w}_s) = \frac{\vec{w}_c \cdot \vec{w}_s}{\lVert \vec{w}_c \rVert \, \lVert \vec{w}_s \rVert}$$

where $\vec{w}_c$ and $\vec{w}_s$ are TF-IDF weight vectors.

22. Implementation: How the Recommender System Works
In case we use collaborative filtering, the Pearson similarity formula is used, which in its standard form is

$$\mathrm{sim}(x, y) = \frac{\sum_{s} (r_{x,s} - \bar{r}_x)(r_{y,s} - \bar{r}_y)}{\sqrt{\sum_{s} (r_{x,s} - \bar{r}_x)^2}\,\sqrt{\sum_{s} (r_{y,s} - \bar{r}_y)^2}}$$

with the sums taken over the items $s$ rated by both users, and
  • $\mathrm{sim}(x, y)$: similarity between users $x$ and $y$
  • $r_{x,s}$: rating for item $s$ given by user $x$
  • $r_{y,s}$: rating for item $s$ given by user $y$
  • $\bar{r}_y$: mean of all ratings by user $y$
  • $\bar{r}_x$: mean of all ratings by user $x$

23. Implementation: How the Recommender System Works

24. Similarity Model: Vector-space model
This is a model that allows us to retrieve documents based on the tags a user gives in a query. The vector-space model uses TF-IDF weights to categorise the documents into relevant and non-relevant ones. The end result is the document(s) with the highest similarity to the tags given in the query.

25. The Vector-Space Model
  • Assume $t$ distinct terms remain after preprocessing; call them index terms or the vocabulary.
  • These orthogonal terms form a vector space. Dimension = $t$ = |vocabulary|
  • Each term $i$ in a document or query $j$ is given a real-valued weight $w_{ij}$.
  • Both documents and queries are expressed as $t$-dimensional vectors: $d_j = (w_{1j}, w_{2j}, \ldots, w_{tj})$

26. Document Collection
  • A collection of $n$ documents can be represented in the vector space model by a term-document matrix.
  • An entry in the matrix corresponds to the weight of a term in the document; zero means the term has no significance in the document or simply doesn't exist in the document.

$$
\begin{array}{c|cccc}
 & T_1 & T_2 & \cdots & T_t \\ \hline
D_1 & w_{11} & w_{21} & \cdots & w_{t1} \\
D_2 & w_{12} & w_{22} & \cdots & w_{t2} \\
\vdots & \vdots & \vdots & & \vdots \\
D_n & w_{1n} & w_{2n} & \cdots & w_{tn}
\end{array}
$$

27. Issues for Vector Space Model
  • How to determine important words in a document? Word sense? Word n-grams (and phrases, idioms, ...) as terms?
  • How to determine the degree of importance of a term within a document and within the entire collection?
  • How to determine the degree of similarity between a document and the query?
  • In the case of the web, what is a collection, and what are the effects of links, formatting information, etc.?

28. Term Weights: Term Frequency
  • More frequent terms in a document are more important, i.e. more indicative of the topic.
    $f_{ij}$ = frequency of term $i$ in document $j$
  • May want to normalize term frequency (TF) by dividing by the frequency of the most common term in the document:
    $TF_{ij} = f_{ij} / \max_i\{f_{ij}\}$

29. Term Weights: Inverse Document Frequency
  • Terms that appear in many different documents are less indicative of the overall topic.
    $df_i$ = document frequency of term $i$ = number of documents containing term $i$
    $IDF_i$ = inverse document frequency of term $i$ = $\log_2(N / df_i)$ ($N$: total number of documents)
  • An indication of a term's discrimination power.
  • Log used to dampen the effect relative to TF.

30. TF-IDF Weighting
  • A typical combined term importance indicator is TF-IDF weighting:
    $w_{ij} = TF_{ij} \cdot IDF_i = TF_{ij} \cdot \log_2(N / df_i)$
  • A term occurring frequently in the document but rarely in the rest of the collection is given high weight.
  • Many other ways of determining term weights have been proposed.
  • Experimentally, TF-IDF has been found to work well.

31. Computing TF-IDF: An Example
Given a document containing terms with given frequencies: A(3), B(2), C(1)
Assume the collection contains 10,000 documents and the document frequencies of these terms are: A(50), B(1300), C(250)
Then:
  A: TF = 3/3; IDF = $\log_2(10000/50)$ = 7.6; TF-IDF = 7.6
  B: TF = 2/3; IDF = $\log_2(10000/1300)$ = 2.9; TF-IDF = 2.0
  C: TF = 1/3; IDF = $\log_2(10000/250)$ = 5.3; TF-IDF = 1.8

32. Performance and Correction Measures
  • Precision: the fraction of retrieved documents that are relevant to the user's information need.
  • Recall: the fraction of the documents relevant to the query that are successfully retrieved.
  • F-measure
  • Mean absolute error (MAE)

33. Precision vs. Recall
[Figure: diagram with the labels "All docs" and "Retrieved"; the text beginning "Recall" is cut off in the source.]
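The measures listed on slide 32 can be sketched in a few lines of Python. This is a minimal illustration, not the project's own code: the function names are hypothetical, the document ids and ratings are made up, and since the slides do not define the F-measure, the balanced F1 (harmonic mean of precision and recall) is assumed.

```python
def precision_recall_f1(retrieved, relevant):
    """Return (precision, recall, F1) for two collections of document ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    # Assumption: F-measure means the balanced F1 score.
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and actual ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty rating lists of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


# Made-up example: 4 documents retrieved, 3 relevant, 2 in common.
p, r, f = precision_recall_f1(["d1", "d2", "d3", "d4"], ["d2", "d4", "d7"])
print(round(p, 3), round(r, 3), round(f, 3))  # 0.5 0.667 0.571
print(mean_absolute_error([4.0, 3.5, 2.0], [5.0, 3.0, 2.0]))  # 0.5
```

Precision and recall judge a ranked list of retrieved items, while MAE judges predicted ratings, so a recommender that predicts ratings and then shows a top-N list can reasonably report both.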
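The TF-IDF weighting of slides 28 to 31 and the cosine similarity of slide 21 can be tried out with the short Python sketch below. It reproduces the figures of the worked example on slide 31; the function names and the dictionary-based sparse vectors are illustrative choices, and returning 0.0 for an all-zero vector is a convention assumed here rather than something the slides specify.

```python
import math


def tf_idf_weights(term_counts, doc_freqs, n_docs):
    """Max-normalised TF times log2 IDF for one document."""
    max_count = max(term_counts.values())  # frequency of the most common term
    return {
        term: (count / max_count) * math.log2(n_docs / doc_freqs[term])
        for term, count in term_counts.items()
    }


def cosine_similarity(w_c, w_s):
    """Cosine of the angle between two sparse weight vectors (term -> weight)."""
    dot = sum(weight * w_s.get(term, 0.0) for term, weight in w_c.items())
    norm_c = math.sqrt(sum(w * w for w in w_c.values()))
    norm_s = math.sqrt(sum(w * w for w in w_s.values()))
    if norm_c == 0.0 or norm_s == 0.0:
        return 0.0  # assumed convention for empty or all-zero vectors
    return dot / (norm_c * norm_s)


# Slide 31 example: term counts A(3), B(2), C(1); 10,000 documents in total.
weights = tf_idf_weights(
    {"A": 3, "B": 2, "C": 1},
    {"A": 50, "B": 1300, "C": 250},
    10_000,
)
print({term: round(w, 1) for term, w in weights.items()})
# {'A': 7.6, 'B': 2.0, 'C': 1.8}

# A query tagged only with "A" points almost in the same direction.
print(cosine_similarity(weights, {"A": 1.0}) > 0.9)  # True
```

Because cosine similarity depends only on the direction of the vectors, a short query and a long document can still score close to 1 when they emphasise the same terms.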
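For the collaborative-filtering case of slide 22, the sketch below computes a Pearson similarity between two users in Python. It follows the slide's definitions (each mean is taken over all ratings by that user) and assumes, as is usual for this measure, that the sums run over the items both users have rated. The function name, the sample ratings and the choice to return 0.0 when the similarity is undefined are illustrative assumptions.

```python
import math


def pearson_similarity(ratings_x, ratings_y):
    """Pearson similarity between two users given as dicts item -> rating."""
    common = set(ratings_x) & set(ratings_y)  # items rated by both users
    if not common:
        return 0.0  # assumed convention: no overlap means no evidence
    # Means over all ratings by each user, as defined on slide 22.
    mean_x = sum(ratings_x.values()) / len(ratings_x)
    mean_y = sum(ratings_y.values()) / len(ratings_y)
    num = sum((ratings_x[s] - mean_x) * (ratings_y[s] - mean_y) for s in common)
    sq_x = sum((ratings_x[s] - mean_x) ** 2 for s in common)
    sq_y = sum((ratings_y[s] - mean_y) ** 2 for s in common)
    if sq_x == 0.0 or sq_y == 0.0:
        return 0.0  # assumed convention: undefined when a user shows no variation
    return num / math.sqrt(sq_x * sq_y)


# Made-up ratings on a 1-5 scale.
alice = {"m1": 5, "m2": 3, "m3": 4, "m4": 1}
bob = {"m1": 4, "m2": 2, "m3": 5, "m5": 3}
carol = {"m1": 1, "m2": 3, "m3": 2}

print(pearson_similarity(alice, bob) > 0)    # True: similar tastes
print(pearson_similarity(alice, carol) < 0)  # True: opposite tastes
```

Subtracting each user's mean is what addresses the "different users might use different scales" limitation from slide 8: a generous rater and a strict rater with the same preferences still come out as similar.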