Personalizing & Recommender Systems


Post on 05-Jan-2016


Bamshad Mobasher
Center for Web Intelligence
DePaul University, Chicago, Illinois, USA

Personalization

The Problem
- Dynamically serve customized content (books, movies, pages, products, tags, etc.) to users based on their profiles, preferences, or expected interests

Why do we need it?
- Information spaces are becoming much more complex for users to navigate (huge online repositories, social networks, mobile applications, blogs, ...)
- For businesses: need to grow customer loyalty / increase sales
- Industry research: successful online retailers are generating as much as 35% of their business from recommendations

Recommender Systems
- Most common type of personalization: recommender systems
- [Figure: user profile and recommendation algorithm]

Common Approaches
- Collaborative Filtering
  - Give recommendations to a user based on preferences of "similar" users
  - Preferences on items may be explicit or implicit
  - Includes recommendation based on social / collaborative content
- Content-Based Filtering
  - Give recommendations to a user based on items with "similar" content in the user's profile
- Rule-Based (Knowledge-Based) Filtering
  - Provide recommendations to users based on predefined (or learned) rules
  - age(x, 25-35) and income(x, 70-100K) and children(x, >=3) → recommend(x, Minivan)
- Hybrid Approaches
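The rule-based example above can be written down almost literally. The following is a minimal sketch, assuming a hypothetical `User` record with the three attributes the rule mentions and income expressed in thousands; it illustrates the idea and is not the deck's implementation.

```python
from dataclasses import dataclass


@dataclass
class User:
    age: int
    income_k: float  # annual income in thousands (assumed unit)
    children: int


def recommend_by_rule(user: User) -> list[str]:
    """Apply the single predefined rule from the slide to one user."""
    recommendations = []
    if 25 <= user.age <= 35 and 70 <= user.income_k <= 100 and user.children >= 3:
        recommendations.append("Minivan")
    return recommendations


print(recommend_by_rule(User(age=30, income_k=85, children=3)))  # ['Minivan']
print(recommend_by_rule(User(age=30, income_k=85, children=1)))  # []
```

A real rule-based system would hold many such rules, predefined or learned, and evaluate all of them against the profile.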

Content-Based Recommender Systems

Content-Based Recommenders: Personalized Search Agents
- How can the search engine determine the user's context?
- Query: "Madonna and Child"
- Need to "learn" the user profile:
  - User is an art historian?
  - User is a pop music fan?

Content-Based Recommenders :: more examples
- Music recommendations
- Playlist generation
- Example: Pandora

Collaborative Recommender Systems
- Example: http://movielens.umn.edu

Social / Collaborative Tags

Example: Tags describe the Resource
- Tags can describe
  - The resource (genre, actors, etc.)
  - Organizational (toRead)
  - Subjective (awesome)
  - Ownership (abc)
  - etc.

Tag Recommendation
- These systems are "collaborative."
- Recommendation / analytics based on the "wisdom of crowds."

Tags describe the user
- Example: Rai Aren's profile (co-author of "Secret of the Sands")

Social Recommendation
- A form of collaborative filtering using social network data
- Users' profiles are represented as sets of links to other nodes (users or items) in the network
- Prediction problem: infer a currently non-existent link in the network
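One simple way to make the link-prediction framing concrete is to score each missing link by the number of neighbors two nodes share. The sketch below assumes a small hypothetical friendship graph and uses the common-neighbors heuristic, which is only one of many possible scoring functions and is not named in the slides.

```python
def common_neighbor_scores(links: dict[str, set[str]], user: str) -> list[tuple[str, int]]:
    """Score every node not yet linked to `user` by the number of shared neighbors."""
    own = links[user]
    scores = {}
    for other, neighbors in links.items():
        if other == user or other in own:
            continue  # skip the user and links that already exist
        shared = len(own & neighbors)
        if shared:
            scores[other] = shared
    # Highest score first; ties broken alphabetically for a stable order
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical undirected graph: node -> set of linked nodes
links = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "cat", "dan"},
    "cat": {"ann", "bob", "dan"},
    "dan": {"bob", "cat", "eve"},
    "eve": {"dan"},
}
print(common_neighbor_scores(links, "ann"))  # [('dan', 2)]
```

The top-scored nodes are the candidate "currently non-existent" links to recommend.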

Example: Using Tags for Recommendation

Aggregation & Personalization across social, collaborative, and content channels

Possible Interesting Project Ideas
- Build a content-based recommender for
  - News stories (requires basic text processing and indexing of documents)
  - Blog posts, tweets
  - Music (based on features such as genre, artist, etc.)
- Build a collaborative or social recommender
  - Movies (using movie ratings), e.g., movielens.org
  - Music, e.g., pandora.com, last.fm
    - Recommend songs or albums based on collaborative ratings, tags, etc.
    - Recommend whole playlists based on playlists from other users
  - Recommend users (other raters, friends, followers, etc.), based on similar interests

The Recommendation Task
- Basic formulation as a prediction problem
  - Given a profile P_u for a user u, and a target item i_t, predict the preference score of user u on item i_t
- Typically, the profile P_u contains preference scores by u on some other items, {i_1, ..., i_k}, different from i_t
  - Preference scores on i_1, ..., i_k may have been obtained explicitly (e.g., movie ratings) or implicitly (e.g., time spent on a product page or a news article)

Collaborative Recommender Systems
- Collaborative filtering recommenders
  - Predictions for unseen (target) items are computed based on the other users with similar interest scores on items in user u's profile
    - i.e., users with similar tastes (aka "nearest neighbors")
    - Requires computing correlations between user u and other users according to interest scores or ratings
    - k-nearest-neighbor (kNN) strategy

- Can we predict Karen's rating on the unseen item Independence Day?

Basic Collaborative Filtering Process
[Figure: two phases. Neighborhood formation phase: current user record, historical user records (user, item, rating), neighborhood formation, nearest neighbors. Recommendation phase: recommendation engine, combination function, recommendations.]

Collaborative Filtering: Measuring Similarities
- Pearson correlation
  - Weight by degree of correlation between user U and user J
  - 1 means very similar, 0 means no correlation, -1 means dissimilar
  - Uses the average rating of user J on all items
  - Works well in case of user ratings (where there is at least a range of 1-5)
  - Not always possible (in some situations we may only have implicit binary values, e.g., whether a user did or did not select a document)
- Alternatively, a variety of distance or similarity measures can be used
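For reference, the usual form of the Pearson correlation between two users is written out below. The symbol names are assumptions chosen for this note: c_{U,J} is the correlation, I_{UJ} is the set of items rated by both users, r_{U,i} is the rating of user U on item i, and \bar{r}_U and \bar{r}_J are the users' average ratings. Taking each average over all of a user's rated items follows the slide's remark about user J; some implementations instead average over the co-rated items only.

```latex
\[
c_{U,J} \;=\;
\frac{\sum_{i \in I_{UJ}} (r_{U,i} - \bar{r}_U)\,(r_{J,i} - \bar{r}_J)}
     {\sqrt{\sum_{i \in I_{UJ}} (r_{U,i} - \bar{r}_U)^2}\;
      \sqrt{\sum_{i \in I_{UJ}} (r_{J,i} - \bar{r}_J)^2}}
\]
```

The value lies between -1 and 1, matching the interpretation given above.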

Collaborative Recommender Systems
[Figure: ratings table with each user's correlation to Karen, and the predictions for Karen on Indep. Day based on the K nearest neighbors]

Collaborative Filtering: Making Predictions
- When generating predictions from the nearest neighbors, neighbors can be weighted based on their distance to the target user
- To generate predictions for a target user a on an item i:
  - r̄_a = mean rating for user a
  - u_1, ..., u_k are the k nearest neighbors to a
  - r_u,i = rating of user u on item i
  - sim(a,u) = Pearson correlation between a and u
- This is a weighted average of deviations from the neighbors' mean ratings (and closer neighbors count more)
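A common way to write the prediction described above is p(a,i) = r̄_a + Σ sim(a,u) · (r_u,i − r̄_u) / Σ |sim(a,u)|, with both sums taken over the k nearest neighbors. The sketch below implements that form on hypothetical toy ratings. Two choices are assumptions, not taken from the slides: the absolute value in the denominator, and user means computed over all of a user's rated items.

```python
from math import sqrt

Ratings = dict[str, float]  # item -> rating for one user


def mean(r: Ratings) -> float:
    return sum(r.values()) / len(r)


def pearson(a: Ratings, b: Ratings) -> float:
    """Pearson correlation summed over co-rated items, using each user's overall mean."""
    common = set(a) & set(b)
    mean_a, mean_b = mean(a), mean(b)
    num = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    den = sqrt(sum((a[i] - mean_a) ** 2 for i in common)) * sqrt(
        sum((b[i] - mean_b) ** 2 for i in common))
    return num / den if den else 0.0


def predict(target: Ratings, others: dict[str, Ratings], item: str, k: int = 2) -> float:
    """Mean rating of the target plus the similarity-weighted neighbor deviations."""
    rated = [(pearson(target, r), r) for r in others.values() if item in r]
    neighbors = sorted(rated, key=lambda pair: pair[0], reverse=True)[:k]
    norm = sum(abs(s) for s, _ in neighbors)
    if norm == 0:
        return mean(target)  # no usable neighbors: fall back to the user's mean
    offset = sum(s * (r[item] - mean(r)) for s, r in neighbors) / norm
    return mean(target) + offset


# Hypothetical ratings; "X" is the unseen target item
alice = {"A": 5, "B": 3, "C": 1}
others = {
    "u1": {"A": 4, "B": 2, "C": 1, "X": 5},
    "u2": {"A": 1, "B": 2, "C": 4, "X": 1},
    "u3": {"A": 5, "B": 4, "C": 3, "X": 4},
}
print(round(predict(alice, others, "X", k=2), 2))  # 3.93
```

Here u3 (similarity 1.0) and u1 (about 0.87) are the two nearest neighbors, and u1's above-average rating of X pulls the prediction above Alice's mean of 3.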

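Example Collaborative System
Ratings on Item 1 to Item 6, listed in item order for the items each user has rated ("?" marks the rating to predict):

User     Ratings      Correlation with Alice
Alice    5 2 3 3 ?
User 1   2 4 4 1      -1.00
User 2   2 1 3 1 2     0.33
User 3   4 2 3 2 1     0.90
User 4   3 3 2 3 1     0.19
User 5   3 2 2 2      -1.00
User 6   5 3 1 3 2     0.65
User 7   5 1 5 1      -1.00

- Best match: User 3 (correlation 0.90)
- Prediction using k-nearest neighbor with k = 1

Collaborative Recommenders :: problems of scale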

Item-based Collaborative Filtering
- Find similarities among the items based on ratings across users
  - Often measured based on a variation of the cosine measure
- Prediction of item i for user a is based on the past ratings of user a on items similar to i
- Suppose: sim(Star Wars, Indep. Day) > sim(Jur. Park, Indep. Day) > sim(Termin., Indep. Day)
- Predicted rating for Karen on Indep. Day will be 7, because she rated Star Wars 7
  - That is, if we only use the most similar item
  - Otherwise, we can use the k most similar items and again use a weighted average
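The item-based idea can be sketched as follows. The ratings are hypothetical, chosen so that Star Wars is the item most similar to Indep. Day and Karen rated Star Wars 7, as in the slide. Plain cosine over co-rating users is used for brevity, whereas the slide only says "a variation" of the cosine measure.

```python
from math import sqrt


def cosine(x: dict[str, float], y: dict[str, float]) -> float:
    """Cosine similarity of two item columns over the users who rated both."""
    common = set(x) & set(y)
    num = sum(x[u] * y[u] for u in common)
    den = sqrt(sum(x[u] ** 2 for u in common)) * sqrt(sum(y[u] ** 2 for u in common))
    return num / den if den else 0.0


def predict_item_based(ratings: dict[str, dict[str, float]], user: str,
                       target: str, k: int = 1) -> float:
    """Weighted average of the user's own ratings on the k items most similar to target."""
    columns: dict[str, dict[str, float]] = {}  # item -> {user: rating}
    for name, profile in ratings.items():
        for item, value in profile.items():
            columns.setdefault(item, {})[name] = value
    sims = [(cosine(columns[item], columns[target]), value)
            for item, value in ratings[user].items() if item != target]
    top = sorted(sims, key=lambda pair: pair[0], reverse=True)[:k]
    norm = sum(s for s, _ in top)
    return sum(s * v for s, v in top) / norm if norm else 0.0


# Hypothetical ratings
ratings = {
    "u1": {"Star Wars": 7, "Jur. Park": 2, "Indep. Day": 7},
    "u2": {"Star Wars": 2, "Jur. Park": 6, "Indep. Day": 2},
    "u3": {"Star Wars": 6, "Jur. Park": 3, "Indep. Day": 6},
    "Karen": {"Star Wars": 7, "Jur. Park": 3},
}
print(round(predict_item_based(ratings, "Karen", "Indep. Day", k=1), 2))  # 7.0
print(round(predict_item_based(ratings, "Karen", "Indep. Day", k=2), 2))  # 5.4
```

With k = 1 the prediction is simply Karen's rating of the most similar item. With k = 2 the less similar Jur. Park rating is blended in with a smaller weight.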

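Item-Based Collaborative Filtering
Ratings on Item 1 to Item 6, listed in item order for the items each user has rated ("?" marks the rating to predict):

User     Ratings
Alice    5 2 3 3 ?
User 1   2 4 4 1
User 2   2 1 3 1 2
User 3   4 2 3 2 1
User 4   3 3 2 3 1
User 5   3 2 2 2
User 6   5 3 1 3 2
User 7   5 1 5 1

- Item similarity to the target item, for Item 1 to Item 5: 0.76, 0.79, 0.60, 0.71, 0.75
- Best match: Item 2 (similarity 0.79), used for the prediction

Collaborative Filtering: Evaluation
- Split users into train/test sets
- For each user a in the test set:
  - Split a's votes into observed (I) and to-predict (P)
  - Measure the average absolute deviation between predicted and actual votes in P
  - MAE = mean absolute error
- Average over all test users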
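The evaluation protocol above can be sketched in a few lines. The data is hypothetical, and a user-mean baseline stands in for the recommender under test; any function with the same signature could be plugged in as `predict_fn`.

```python
def mean_absolute_error(predicted: dict[str, float], actual: dict[str, float]) -> float:
    """Average absolute deviation between predicted and actual votes."""
    items = predicted.keys() & actual.keys()
    if not items:
        raise ValueError("no items to compare")
    return sum(abs(predicted[i] - actual[i]) for i in items) / len(items)


def evaluate(test_users, predict_fn) -> float:
    """Per-user MAE on the to-predict votes P, averaged over all test users."""
    per_user = []
    for observed, to_predict in test_users:
        predictions = {item: predict_fn(observed, item) for item in to_predict}
        per_user.append(mean_absolute_error(predictions, to_predict))
    return sum(per_user) / len(per_user)


def user_mean_baseline(observed: dict[str, float], item: str) -> float:
    """Stand-in predictor: always predict the user's mean observed rating."""
    return sum(observed.values()) / len(observed)


# Each test user is a pair: (observed votes I, to-predict votes P)
test_users = [
    ({"a": 4, "b": 2}, {"c": 5, "d": 3}),
    ({"a": 1, "b": 2}, {"c": 2}),
]
print(evaluate(test_users, user_mean_baseline))  # 0.75
```

The first user contributes an MAE of 1.0 (errors 2 and 0) and the second 0.5, so the average over test users is 0.75.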

Data sparsity problems
- Cold start problem
  - How to recommend new items? What to recommend to new users?
- Straightforward approaches
  - Ask/force users to rate a set of items
  - Use another method (e.g., content-based, demographic or simply non-personalized) in the initial phase
- Alternatives
  - Use better algorithms (beyond nearest-neighbor approaches)
  - In nearest-neighbor approaches, the set of sufficiently similar neighbors might be too small to make good predictions
  - Use model-based approaches (clustering, dimensionality reduction, etc.)

Example algorithms for sparse datasets
- Recursive CF
  - Assume there is a very close neighbor n of u who has not yet rated the target item i
  - Apply the CF method recursively and predict a rating for item i for the neighbor
  - Use this predicted rating instead of the rating of a more distant direct neighbor

         Item1  Item2  Item3  Item4  Item5
  Alice    5      3      4      4      ?
  User1    3      1      2      3      ?
  User2    4      3      4      3      5
  User3    3      3      1      5      4
  User4    1      5      5      2      1

  - sim = 0.85 (Alice and User1)
  - Predict rating for User1

More model-based approaches
- Many approaches
  - Matrix factorization techniques, statistics
    - Singular value decomposition, principal component analysis
  - Approaches based on clustering
  - Association rule mining
    - Compare: shopping basket analysis
  - Probabilistic models
    - Clustering models, Bayesian networks, probabilistic Latent Semantic Analysis
  - Various other machine learning approaches
- Costs of pre-processing
  - Usually not discussed
  - Incremental updates possible?

Dimensionality Reduction
- Basic idea: trade more complex offline model building for faster online prediction generation
- Singular Value Decomposition for dimensionality reduction of rating matrices
  - Captures important factors/aspects and their weights in the data
  - Factors can be genre, actors, but also non-understandable ones
  - Assumption that k dimensions capture the signals and filter out noise (k = 20 to 100)
- Constant time to make recommendations
- Approach also popular in IR (Latent Semantic Indexing), data compression, ...

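A picture says ...
[Figure: Bob, Mary, Alice, and Sue plotted in the two-dimensional factor space]

Matrix factorization
- SVD: M_k = U_k × Σ_k × V_k^T

U_k      Dim1    Dim2
Alice    0.47   -0.30
Bob     -0.44    0.23
Mary     0.70   -0.06
Sue      0.31    0.93

V_k^T    (one column per item)
Dim1    -0.44   -0.57    0.06    0.38    0.57
Dim2     0.58   -0.66    0.26    0.18   -0.36

Σ_k      Dim1    Dim2
Dim1     5.63    0
Dim2     0       3.23

- Prediction: user's mean rating + U_k(user) × Σ_k × V_k^T(item) = 3 + 0.84 = 3.84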
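The prediction arithmetic can be checked directly from the printed factors. The sketch below assumes that the prediction is for Alice and for the fourth item column (index 3). The item names are not given in the text, and this is the user and column combination whose value comes closest to the 0.84 shown. It also assumes the baseline 3 is the user's mean rating.

```python
# Rounded factor values as printed in the slide
U_K = {
    "Alice": (0.47, -0.30),
    "Bob": (-0.44, 0.23),
    "Mary": (0.70, -0.06),
    "Sue": (0.31, 0.93),
}
SIGMA_K = (5.63, 3.23)  # diagonal of the singular-value matrix
VT_K = [
    (-0.44, -0.57, 0.06, 0.38, 0.57),  # Dim1, one entry per item
    (0.58, -0.66, 0.26, 0.18, -0.36),  # Dim2, one entry per item
]


def factor_offset(user: str, item_index: int) -> float:
    """U_k(user) x Sigma_k x V_k^T(item), summed over the k latent dimensions."""
    return sum(U_K[user][d] * SIGMA_K[d] * VT_K[d][item_index]
               for d in range(len(SIGMA_K)))


offset = factor_offset("Alice", 3)
print(round(offset, 2))      # 0.83
print(round(3 + offset, 2))  # 3.83
```

The rounded factors give about 0.83 rather than 0.84; the small gap is consistent with the printed values having been rounded to two decimals.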

Content-based recommendation
- Collaborative filtering does NOT require any information about the items
  - However, it might be reasonable to exploit such information
  - E.g., recommend fantasy novels to people who liked fantasy novels in the past
- What do we need:
  - Some information about the available items, such as the genre ("content")
  - Some sort of user profile describing what the user likes (the preferences)
- The task:
  - Learn user preferences
  - Locate/recommend items that are "similar" to the user preferences

Content-Based Recommenders
- Predictions for unseen (target) items are computed based on their similarity (in terms of content) to items in the user profile
- [Figure: example user profile P_u, with one item recommended highly and another recommended mildly]

Content representation & item similarities
- Represent items as vectors over features
  - Features may be item attributes, keywords, tags, etc.
  - Often items are represented as keyword vectors based on textual descriptions, with TFxIDF or other weighting approaches
    - Has the advantage of being applicable to any type of item (images, products, news stories, tweets), as long as a textual description is available or can be constructed
- Items (and users) can then be compared using standard vector space similarity measures
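The keyword-vector representation and vector-space comparison can be sketched with a bare-bones TFxIDF weighting and cosine similarity. The item descriptions are hypothetical, and the weighting (term frequency times log inverse document frequency, whitespace tokenization) is one simple variant among many.

```python
from collections import Counter
from math import log, sqrt


def tfidf_vectors(docs: dict[str, str]) -> dict[str, dict[str, float]]:
    """Map each item description to a sparse {term: TFxIDF weight} vector."""
    tokens = {name: text.lower().split() for name, text in docs.items()}
    doc_freq = Counter(term for words in tokens.values() for term in set(words))
    n_docs = len(docs)
    vectors = {}
    for name, words in tokens.items():
        term_freq = Counter(words)
        vectors[name] = {term: (count / len(words)) * log(n_docs / doc_freq[term])
                         for term, count in term_freq.items()}
    return vectors


def cosine(x: dict[str, float], y: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors."""
    num = sum(weight * y.get(term, 0.0) for term, weight in x.items())
    den = sqrt(sum(w * w for w in x.values())) * sqrt(sum(w * w for w in y.values()))
    return num / den if den else 0.0


# Hypothetical item descriptions: one liked item and two candidates
items = {
    "liked": "dragon wizard quest magic",
    "cand_fantasy": "wizard magic castle dragon",
    "cand_news": "election budget vote senate",
}
vecs = tfidf_vectors(items)
ranked = sorted(["cand_fantasy", "cand_news"],
                key=lambda name: cosine(vecs["liked"], vecs[name]), reverse=True)
print(ranked)  # ['cand_fantasy', 'cand_news']
```

The candidate sharing keywords with the liked item ranks first; the unrelated one has similarity 0. A user profile could be represented the same way, for example as the sum of the vectors of liked items.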

Content-based recommendation
[Figure: "Intersection"]

Combining Content-Based and Collaborative Recommendation
- Example: Semantically Enhanced CF
  - Extend item-based collaborative filtering to incorporate both similarity based on ratings (or usage) and semantic similarity based on content / semantic information
- Semantic knowledge about items
  - Can be extracted automatically from the Web based on domain-specific reference ontologies
  - Used in conjunction with user-item mappings to create a combined similarity measure for item comparisons
  - Singular value decomposition used to reduce noise in the content data
- Semantic combination threshold
  - Used to determine the proportion of semantic and rating (or usage) similarities in the combined measure

Semantically Enhanced Hybrid Recommendation
- An extension of the item-based algorithm
  - Use a combined similarity measure to compute item similarities:
    CombinedSim(i_p, i_q) = (1 - α) · SemSim(i_p, i_q) + α · RateSim(i_p, i_q)
  - where
    - SemSim is the similarity of items i_p and i_q based on semantic features (e.g., keywords, attributes, etc.); and
    - RateSim is the similarity of items i_p and i_q based on user ratings (as in the standard item-based CF)
  - α is the semantic combination parameter:
    - α = 1: only user ratings; no semantic similarity
    - α = 0: only semantic features; no collaborative similarity
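The role of the combination parameter is easy to see in code. This sketch assumes the combined measure is the linear blend (1 − alpha) · SemSim + alpha · RateSim, which is the form consistent with the two boundary cases stated above. The function name and the similarity values are illustrative.

```python
def combined_sim(sem_sim: float, rate_sim: float, alpha: float) -> float:
    """Blend semantic and rating-based item similarity; alpha in [0, 1]."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return (1.0 - alpha) * sem_sim + alpha * rate_sim


# Hypothetical similarities for one pair of items
sem, rate = 0.8, 0.2
print(combined_sim(sem, rate, alpha=1.0))  # 0.2 (only user ratings)
print(combined_sim(sem, rate, alpha=0.0))  # 0.8 (only semantic features)
print(round(combined_sim(sem, rate, alpha=0.5), 2))  # 0.5
```

For a new item with few or no ratings, RateSim is unreliable, so a smaller alpha lets the semantic features carry the comparison.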

Semantically Enhanced CF
- Movie data set
  - Movie ratings from the movielens data set
  - Semantic info. extracted from IMDB based on a movie ontology [figure]
- Used 10-fold cross-validation on randomly selected test and training data sets
- Each user in the training set has at least 20 ratings (scale 1-5)
- Dealing with new items and sparse data sets
  - For new items, select all movies with only one rating as the test data
  - Degrees of sparsity simulated using different ratios for training data

Data Mining Approach to Personalization
- Basic idea
  - Generate aggregate user models (usage profiles) by discovering user access patterns through Web usage mining (offline process)
    - Clustering user transactions
    - Clustering items
    - Association rule mining
    - Sequential pattern discovery
  - Match a user's active session against the discovered models to provide dynamic content (online process)
- Advantages
  - No explicit user ratings or interaction with users
  - Enhance the effectiveness and scalability of collaborative filtering

Example Domain: Web Usage Mining
- Web usage mining
  - Discovery of meaningful patterns from data generated by user access to resources on one or more Web/application servers
- Typical sources of data:
  - Automatically generated Web/application server access logs
  - E-commerce and product-oriented user events (e.g., shopping cart changes, product clickthroughs, etc.)
  - User profiles and/or user ratings
  - Meta-data, page content, site structure
- User transactions
  - Sets or sequences of pageviews, possibly with associated weights
  - A pageview is a set of page files and associated objects that contribute to a single display in a Web browser

Personalization Based on Web Usage Mining: Offline Process
[Figure: offline process in two phases.
 Data preparation phase: Web & application server logs; data preprocessing (data cleaning, pageview identification, sessionization, data integration, data transformation); user transaction database. Other inputs: site content & structure, domain knowledge.
 Pattern discovery phase: usage mining (transaction clustering, pageview clustering, correlation analysis, association rule mining, sequential pattern mining); patterns; pattern analysis (pattern filtering, aggregation, characterization); aggregate usage profiles.]

Personalization Based on Web Usage Mining: Online Process
[Figure: online process, with components: client browser, Web server, active session, recommendation engine, recommendations, integrated user profile, aggregate usage profiles, stored user profile, domain knowledge]

Conceptual Representation of User Transactions or Sessions
[Figure: matrix of session/user data by pageviews/objects]
- Raw weights are usually based on time spent on a page, but in practice they need to be normalized and transformed

Web Usage Mining: clustering example
- Transaction clusters: clustering similar user transactions and using the centroid of each cluster as a usage profile (representative for a user segment)

Support   URL                                                          Pageview Description
1.00      /courses/syllabus.asp?course=450-96-303&q=3&y=2002&id=290    SE 450 Object-Oriented Development class syllabus
0.97      /people/facultyinfo.asp?id=290                               Web page of a lecturer who taught the above course
0.88      /programs/Curre...
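The offline and online steps can be sketched together: cluster centroids serve as aggregate usage profiles, and an active session is matched against them to pick pageviews to recommend. In this sketch the cluster assignments are assumed to be given (the clustering step itself is not shown), and the URLs, binary weights, and the 0.5 weight threshold are all hypothetical.

```python
from math import sqrt

Session = dict[str, float]  # pageview URL -> weight


def centroid(sessions: list[Session]) -> Session:
    """Mean weight of each pageview across the sessions of one cluster."""
    pages = {page for session in sessions for page in session}
    return {page: sum(s.get(page, 0.0) for s in sessions) / len(sessions)
            for page in pages}


def cosine(x: Session, y: Session) -> float:
    num = sum(weight * y.get(page, 0.0) for page, weight in x.items())
    den = sqrt(sum(w * w for w in x.values())) * sqrt(sum(w * w for w in y.values()))
    return num / den if den else 0.0


def recommend(active: Session, profiles: dict[str, Session],
              min_weight: float = 0.5) -> tuple[str, list[str]]:
    """Pick the best-matching profile and return its strong, not-yet-visited pageviews."""
    best = max(profiles, key=lambda name: cosine(active, profiles[name]))
    profile = profiles[best]
    pages = [p for p, w in profile.items() if w >= min_weight and p not in active]
    return best, sorted(pages, key=lambda p: (-profile[p], p))


# Hypothetical transaction clusters (offline step)
profiles = {
    "courses": centroid([
        {"/courses": 1.0, "/syllabus": 1.0},
        {"/courses": 1.0, "/syllabus": 1.0, "/faculty": 1.0},
    ]),
    "admissions": centroid([
        {"/apply": 1.0, "/tuition": 1.0},
        {"/apply": 1.0},
    ]),
}
# Active session (online step)
print(recommend({"/courses": 1.0}, profiles))
# ('courses', ['/syllabus', '/faculty'])
```

The centroid weights play the same role as the support values in the table above: pageviews seen in most sessions of a segment get weights near 1 and are recommended first.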
