Crab: A Python Framework for Building Recommender Systems

Crab: A Python Framework for Building Recommendation Engines. Marcel Caraciolo (@marcelcaraciolo), Bruno Melo (@brunomelo), Ricardo Caspirro (@ricardocaspirro). PythonBrasil 2011, São Paulo, SP

Upload: marcel-caraciolo

Post on 06-May-2015


DESCRIPTION

Keynote at VII PythonBrasil (Brazilian Python Users Meeting) in São Paulo, SP, on 01/10/2011.

TRANSCRIPT

Page 1: Crab: A Python Framework for Building Recommender Systems

Crab: A Python Framework for Building Recommendation Engines

Marcel Caraciolo (@marcelcaraciolo)

Bruno Melo (@brunomelo)

Ricardo Caspirro (@ricardocaspirro)

PythonBrasil 2011, São Paulo, SP

Page 2: Crab: A Python Framework for Building Recommender Systems

What is Crab?

A Python framework for building recommendation engines

A Scikit module for collaborative, content-based and hybrid filtering

A Mahout alternative for Python developers :D

Open-Source under the BSD license

https://github.com/muricoca/crab

Page 3: Crab: A Python Framework for Building Recommender Systems

When did it start?

It began one year ago

Community-driven, 4 members

In April 2011 the open-source lab Muriçoca incorporated it

https://github.com/muricoca/

Since April 2011 it is being rewritten as a Scikit

Page 4: Crab: A Python Framework for Building Recommender Systems

Knowing Scikits

Scikits are SciPy Toolkits: independent projects hosted under a common namespace.

Scikits Image

Scikits MlabWrap

Scikits AudioLab

Scikit Learn

...

http://scikits.appspot.com/scikits

Page 5: Crab: A Python Framework for Building Recommender Systems

Knowing Scikits

Scikit-Learn

Machine learning algorithms + scientific Python packages (NumPy, SciPy and Matplotlib)

http://scikit-learn.sourceforge.net/

Our goal: turn Crab into a Scikit and contribute some parts of it to Scikit-learn

Page 6: Crab: A Python Framework for Building Recommender Systems

Why Recommendations ?

The world is an over-crowded place

Page 7: Crab: A Python Framework for Building Recommender Systems

Why Recommendations ?

We are overloaded

Thousands of news articles and blog posts each day

Millions of movies, books and music tracks online

In Hanoi, [number unreadable] TV channels, thousands of programs each day

In New York, several thousands of ad messages sent to us per day

Sometimes we are overloaded even by friends!

Several places, offers and events

Page 8: Crab: A Python Framework for Building Recommender Systems

Why Recommendations ?

We really need and consume only a few of them!

“A lot of times, people don’t know what they want until you show it to them.”

Steve Jobs

“We are leaving the Information age, and entering into the Recommendation age.”

Chris Anderson, from the book The Long Tail

Page 9: Crab: A Python Framework for Building Recommender Systems

Why Recommendations ?

Can Google help?

Yes, but only when we really know what we are looking for

Can Facebook help?

Yes, I tend to find my friends’ stuff interesting

But what does “interesting” mean?

What if I have only a few friends, and what they like does not always appeal to me?

Can experts help?

Yes, but it won’t scale well.

But it is what they like, not what I like! Exactly the same advice!

Page 10: Crab: A Python Framework for Building Recommender Systems

Why Recommendations ?

Recommendation Systems

Systems designed to recommend to me something I may like

Page 11: Crab: A Python Framework for Building Recommender Systems

Why Recommendations ?

Recommendation Systems

Graph Representation

[Graph: items Titanic and Taken, among others; users Me, My friend, You and Another guy]

Page 12: Crab: A Python Framework for Building Recommender Systems

The current Crab

Collaborative Filtering algorithms

User-Based, Item-Based and Matrix Factorization (SVD)

Evaluation of the Recommender Algorithms

Precision, Recall, F1-Score, RMSE

Precision-Recall Charts

Page 13: Crab: A Python Framework for Building Recommender Systems

The current Crab

Precision-Recall Charts

Page 14: Crab: A Python Framework for Building Recommender Systems

Collaborative Filtering

[Diagram: items O Vento Levou, Thor, Armagedon and ToyStore; users Marcel, Rafael and Amanda; edge labels: like, similar, recommends]

Page 15: Crab: A Python Framework for Building Recommender Systems

The current Crab

Page 20: Crab: A Python Framework for Building Recommender Systems

The current Crab

>>> # load the dataset
>>> from crab.datasets import load_sample_movies
>>> data = load_sample_movies()
>>> data
{'DESCR': 'sample_movies data set was collected by the book called \nProgramming the Collective Intelligence by Toby Segaran \n\nNotes\n----- \nThis data set consists of\n\t* n ratings with (1-5) from n users to n movies.',
 'data': {1: {1: 3.0, 2: 4.0, 3: 3.5, 4: 5.0, 5: 3.0},
          2: {1: 3.0, 2: 4.0, 3: 2.0, 4: 3.0, 5: 3.0, 6: 2.0},
          3: {2: 3.5, 3: 2.5, 4: 4.0, 5: 4.5, 6: 3.0},
          4: {1: 2.5, 2: 3.5, 3: 2.5, 4: 3.5, 5: 3.0, 6: 3.0},
          5: {2: 4.5, 3: 1.0, 4: 4.0},
          6: {1: 3.0, 2: 3.5, 3: 3.5, 4: 5.0, 5: 3.0, 6: 1.5},
          7: {1: 2.5, 2: 3.0, 4: 3.5, 5: 4.0}},
 'item_ids': {1: 'Lady in the Water',
              2: 'Snakes on a Planet',
              3: 'You, Me and Dupree',
              4: 'Superman Returns',
              5: 'The Night Listener',
              6: 'Just My Luck'},
 'user_ids': {1: 'Jack Matthews',
              2: 'Mick LaSalle',
              3: 'Claudia Puig',
              4: 'Lisa Rose',
              5: 'Toby',
              6: 'Gene Seymour',
              7: 'Michael Phillips'}}

Page 21: Crab: A Python Framework for Building Recommender Systems

The current Crab

Page 24: Crab: A Python Framework for Building Recommender Systems

The current Crab

>>> from crab.models import MatrixPreferenceDataModel
>>> m = MatrixPreferenceDataModel(data.data)
>>> print m
MatrixPreferenceDataModel (7 by 6)
         1          2          3          4          5      ...
1    3.000000   4.000000   3.500000   5.000000   3.000000
2    3.000000   4.000000   2.000000   3.000000   3.000000
3    ---        3.500000   2.500000   4.000000   4.500000
4    2.500000   3.500000   2.500000   3.500000   3.000000
5    ---        4.500000   1.000000   4.000000   ---
6    3.000000   3.500000   3.500000   5.000000   3.000000
7    2.500000   3.000000   ---        3.500000   4.000000
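As a rough illustration of what such a data model holds, the nested dictionary of ratings can be laid out as a dense user-by-item table with gaps where a rating is missing. This is a minimal sketch in plain Python, not Crab's implementation: `to_matrix` is a hypothetical helper, and the ratings are the first three users of the sample data set.

```python
# Hypothetical helper (not part of Crab): lay out {user: {item: rating}} as rows.
ratings = {
    1: {1: 3.0, 2: 4.0, 3: 3.5, 4: 5.0, 5: 3.0},
    2: {1: 3.0, 2: 4.0, 3: 2.0, 4: 3.0, 5: 3.0, 6: 2.0},
    3: {2: 3.5, 3: 2.5, 4: 4.0, 5: 4.5, 6: 3.0},
}


def to_matrix(prefs):
    """Return (user_ids, item_ids, rows); a missing rating becomes None."""
    user_ids = sorted(prefs)
    item_ids = sorted({item for items in prefs.values() for item in items})
    rows = [[prefs[user].get(item) for item in item_ids] for user in user_ids]
    return user_ids, item_ids, rows


user_ids, item_ids, rows = to_matrix(ratings)
for user, row in zip(user_ids, rows):
    # Print "---" for a missing rating, as in the model's own printout.
    cells = ["---" if rating is None else "%.1f" % rating for rating in row]
    print(user, " ".join(cells))
```

A dense layout like this makes row-wise (user) and column-wise (item) comparisons straightforward, at the cost of storing the gaps explicitly.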

Page 25: Crab: A Python Framework for Building Recommender Systems

The current Crab

Page 33: Crab: A Python Framework for Building Recommender Systems

The current Crab

>>> # import pairwise distance
>>> from crab.metrics.pairwise import euclidean_distances
>>> # import similarity
>>> from crab.similarities import UserSimilarity
>>> similarity = UserSimilarity(m, euclidean_distances)
>>> similarity[1]
[(1, 1.0), (6, 0.66666666666666663), (4, 0.34054242658316669), (3, 0.32037724101704074), (7, 0.32037724101704074), (2, 0.2857142857142857), (5, 0.2674788903885893)]

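The similarity scores shown for user 1 are consistent with 1 / (1 + d), where d is the Euclidean distance over the items that both users rated. The sketch below reproduces that calculation in plain Python. It illustrates that assumption rather than Crab's own code, and `euclidean_similarity` is a hypothetical name.

```python
import math

# Ratings for three users of the sample data set ({user: {item: rating}}).
ratings = {
    1: {1: 3.0, 2: 4.0, 3: 3.5, 4: 5.0, 5: 3.0},
    2: {1: 3.0, 2: 4.0, 3: 2.0, 4: 3.0, 5: 3.0, 6: 2.0},
    6: {1: 3.0, 2: 3.5, 3: 3.5, 4: 5.0, 5: 3.0, 6: 1.5},
}


def euclidean_similarity(prefs, a, b):
    """Similarity in (0, 1]: 1 / (1 + Euclidean distance over co-rated items)."""
    common = set(prefs[a]) & set(prefs[b])
    if not common:
        return float("nan")  # no co-rated items, similarity is undefined
    dist = math.sqrt(sum((prefs[a][i] - prefs[b][i]) ** 2 for i in common))
    return 1.0 / (1.0 + dist)


print(euclidean_similarity(ratings, 1, 6))  # users 1 and 6 differ on one item only
print(euclidean_similarity(ratings, 1, 2))
```

Identical users get a similarity of 1.0, and the value decays toward 0 as the distance grows.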

Page 34: Crab: A Python Framework for Building Recommender Systems

The current Crab

Page 39: Crab: A Python Framework for Building Recommender Systems

The current Crab

>>> from crab.recommenders.knn import UserBasedRecommender
>>> recsys = UserBasedRecommender(model=m, similarity=similarity, capper=True, with_preference=True)
>>> recsys.recommend(5)
array([[ 5. , 3.45712869],
       [ 1. , 2.78857832],
       [ 6. , 2.38193068]])
>>> recsys.recommended_because(user_id=5, item_id=1)
array([[ 2. , 3. ],
       [ 1. , 3. ],
       [ 6. , 3. ],
       [ 7. , 2.5],
       [ 4. , 2.5]])

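A user-based recommender can be sketched as a similarity-weighted average of other users' ratings for each item the target user has not rated. The code below is a simplified illustration that treats every other user as a neighbour and assumes the 1 / (1 + distance) similarity. It is not Crab's implementation, and its scores need not match the recommender output in the transcript.

```python
import math

# Sample ratings ({user: {item: rating}}).
ratings = {
    1: {1: 3.0, 2: 4.0, 3: 3.5, 4: 5.0, 5: 3.0},
    2: {1: 3.0, 2: 4.0, 3: 2.0, 4: 3.0, 5: 3.0, 6: 2.0},
    3: {2: 3.5, 3: 2.5, 4: 4.0, 5: 4.5, 6: 3.0},
    4: {1: 2.5, 2: 3.5, 3: 2.5, 4: 3.5, 5: 3.0, 6: 3.0},
    5: {2: 4.5, 3: 1.0, 4: 4.0},
    6: {1: 3.0, 2: 3.5, 3: 3.5, 4: 5.0, 5: 3.0, 6: 1.5},
    7: {1: 2.5, 2: 3.0, 4: 3.5, 5: 4.0},
}


def similarity(prefs, a, b):
    """1 / (1 + Euclidean distance) over co-rated items; 0.0 if none."""
    common = set(prefs[a]) & set(prefs[b])
    if not common:
        return 0.0
    dist = math.sqrt(sum((prefs[a][i] - prefs[b][i]) ** 2 for i in common))
    return 1.0 / (1.0 + dist)


def recommend(prefs, user):
    """Rank unrated items by the similarity-weighted mean of others' ratings."""
    totals, weights = {}, {}
    for other in prefs:
        if other == user:
            continue
        sim = similarity(prefs, user, other)
        if sim <= 0:
            continue
        for item, rating in prefs[other].items():
            if item not in prefs[user]:
                totals[item] = totals.get(item, 0.0) + sim * rating
                weights[item] = weights.get(item, 0.0) + sim
    scores = [(item, totals[item] / weights[item]) for item in totals]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


for item, score in recommend(ratings, 5):
    print(item, round(score, 3))
```

Restricting the loop to the k most similar users instead of all of them would turn this into a nearest-neighbour variant.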

Page 40: Crab: A Python Framework for Building Recommender Systems

The current Crab

Using REST APIs to deploy the recommender

django-piston, django-rest, django-tastypie

Page 41: Crab: A Python Framework for Building Recommender Systems

Crab is already in production

Abril Publisher news recommendations!

Collecting over 10 magazines, 20 books and 100+ articles

Running on Python + SciPy + Django

Easy-to-use interface

Content-Based-Filtering

Still in development

Page 42: Crab: A Python Framework for Building Recommender Systems

Content Based Filtering

[Diagram: items O Vento Levou, Duro de Matar, Armagedon and ToyStore; user Marcel; edge labels: likes, similar, recommend]

Page 43: Crab: A Python Framework for Building Recommender Systems

Crab is already in production

PythonBrasil keynotes Recommender

Recommending keynotes based on a hybrid approach

Running on Python + SciPy + Django

Schedule your keynotes

Content-Based Filtering + Collaborative Filtering

Still in development

Page 44: Crab: A Python Framework for Building Recommender Systems

source, the recommendation architecture that we propose will aggregate the results of such filtering techniques.

We aim at integrating the previously mentioned hybrid product recommendation approach in a mobile application so the users could benefit from useful and logical recommendations. Moreover, we aim at providing a suitable explanation for each recommendation to the user, since the current approaches only deliver product recommendations with an overall score without pointing out the appropriateness of such recommendation [13]. Besides the basic information provided by the suppliers, the system will deliver the explanation, providing relevant reviews of similar users; we believe that it will increase the confidence in the buying decision process and the product acceptance rate. In the mobile context this approach could help the users in this process, and showing the user opinions could contribute to achieving this task.

[Figure: Restaurant Repository, Reviews Repository, Content Based Filtering, Collaborative Based Filtering, Meta Recommender, Recommendations]

Fig. 1. Meta Recommender Architecture

Since one of the goals of this work is to incorporate different data sources of user opinions and descriptions, we have adopted a meta recommendation architecture. By using a meta recommender architecture, the system would provide a personalized control over the generated recommendation list formed by the combination of rich data [16]. The influence of the specific data sources could be explicitly controlled by evaluating the past user interaction with the recommender to decide how to balance the different knowledge sources. For instance, if the product or service to be recommended has a rich structured description (e.g. restaurant), then the system tends to rely more on the content-based filtering approach. Otherwise, if the product is poorly described (in its attributes), then the system would rely more on collaborative-filtering techniques, that is, the reviews from similar users.

Figure 1 shows an overview of our meta recommender approach. By combining the content-based filtering and the collaborative-based one into a hybrid recommender system, it would use the services/products repositories, which catalogue the services to be recommended, and the review repository, which contains the user opinions about those services. All this data can be extracted from data source containers on the web such as the location-based social network Foursquare [17], as displayed in Figure 2, and the location recommendation engine from Google: Google HotPot [18].

Fig. 2. User Reviews from Foursquare Social Network

The content-based filtering approach will be used to filter the product/service repository, while the collaborative-based approach will derive the product review recommendations. In addition, we will use text mining techniques to distinguish the polarity of each user review as positive or negative. This summarized information would contribute to the computation of the product recommendation score. The final product recommendation score is computed by integrating the results of both recommenders. For now, we are considering different options for this integration; one in particular is the symbolic data analysis (SDA) approach [19], in which each product description and the user ratings/reviews are modeled as a set of modal symbolic descriptions that summarize the information provided by the corresponding data sources. It is a novel approach in hybrid recommender systems which, in our domain, can encapsulate in entities the levels of influence of both user reviews and product descriptions.
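One simple way to picture the integration of the two recommenders is a weighted blend whose weight shifts toward content-based filtering when an item is richly described. The sketch below only illustrates that balancing idea; it is not the SDA approach the paper considers. `hybrid_score`, the weight formula and the sample numbers are all hypothetical.

```python
def hybrid_score(content_score, collaborative_score,
                 described_attributes, max_attributes=10):
    """Blend two scores; richer item descriptions favour the content score.

    Hypothetical weighting: the fraction of described attributes, capped at 1.
    """
    weight = min(described_attributes, max_attributes) / float(max_attributes)
    return weight * content_score + (1.0 - weight) * collaborative_score


# A richly described item leans on the content-based score ...
print(hybrid_score(4.0, 2.0, described_attributes=8))
# ... while a poorly described one leans on the collaborative score.
print(hybrid_score(4.0, 2.0, described_attributes=1))
```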

B. Symbolic Recommendation Approach

Symbolic Data Analysis (SDA) is a research field that provides suitable tools to manage aggregated data detailed by multi-valued variables, where data table entries are sets of categories, ordered lists of categories, intervals or weight histograms [19]. It also provides approaches for information filtering algorithms such as content-based, collaborative-based and hybrid ones. The main idea is to represent the user profile (in our domain, the product to be recommended) through symbolic data structures; the user and item correlations are computed through dissimilarity functions adapted from the SDA domain. The advantage of using SDA-based information-filtering methods in the context of recommendations is that the user description synthesizes the entire body of information taken from the item descriptions belonging to the user profile. Therefore, items are described by histogram-valued symbolic data, so they can be compared through a dissimilarity function. Modeling the user reviews and the product descriptions as histogram-valued symbolic data would meet our requirements, since our recommendations would be balanced by both structures. Bezerra and Carvalho proposed approaches whose results were very promising [19].

III. SYSTEM DESIGN

The application data in our mobile recommender system can be divided into two parts: the product description (such as location, description and its attributes) and the user reviews or ratings provided by users (such as rating, comments, tags, etc.). Figure 3 gives the system’s architecture and its components.

[Figure: User, Mobile Interface, Web, Recommendation Engine, Product Repository, Review Repository. Labeled flows: (Q) Queries; (P) Preferences, Location, etc.; (C) Connected; (L) List of Scored Products; (R) Ranked Products; Results]

Fig. 3. Mobile Recommender System Architecture

In our mobile product/service recommender, the user could filter some products or services and get a list of recommendations. The user can also enter his preferences or give feedback on an offered product recommendation.

Other functionalities are the retrieval of the next five best recommendations and the search for reviews satisfying some given constraints (text length, date, review attitude) or containing some keywords. Let us illustrate a typical use scenario of this recommender with a restaurant query example.

For instance, a user wants to find good restaurants for eating Chinese food around his current location (the system could also integrate location-based services). Refining the query, the user also searches for places with highly positive reviews. Using the coordinate information to calculate the distance from the user's current location, the system would provide a list of restaurants ranked by highly positive reviews, together with summarized reviews written by the most similar users who reviewed the restaurant services. According to personal preferences, the user continues to review the recommended restaurant list and selects the places he wishes to go to. After visiting the places he selected, the user could enter his feedback, in the form of wishes or critiques, on the service recommendation. This information would be added to the reviews repository for those places, where it would be processed and summarized by our recommender, extracting useful information such as the polarity and adding new keywords to the product feature vector.

IV. METHODOLOGY AND EXPECTED RESULTS

A. Methodology

Our research focuses on methodologies and techniques for improving the user acceptance of product and service recommendations and explaining these suggestions in the mobile context. To achieve this, we have used a new approach where both product descriptions and user reviews are incorporated into the mobile recommendation process. We believe this approach will give the user more confidence in the recommendations and a better understanding of the products, especially supporting him in the buying decision process. To accomplish this task, we will give an overview of the context in which mobile recommender systems are included and the main research concerns about the topic. We will also analyze and investigate approaches for filtering algorithms, so that we can base our design and implementation choices on previous failures or successes and reuse and adapt successful solutions. We will also propose our meta-recommender system by providing a detailed explanation of our recommender architecture, implementation, main processes, and crucial features. Finally, to validate it, we will use standard measures of recommendation systems, comparing our suggestions to mobile users on a real dataset extracted from the Web and discussing the results.

B. Expected Results

We believe that our mobile product recommender incorporating user reviews will increase the user's trust in the recommended products, since he can read opinions, both positive and negative, from a group of similar users. Moreover, viewing other reviews, the user may feel more stimulated to share his own experiences and also obtain recognition from other users. Reviews also provide a better product understanding, since there will be more information for the user to decide whether the product is (or is not) suitable for him. Finally, presenting a list of recommendations along with explanations may provide ’local hidden knowledge’. Local hidden knowledge can be described as knowledge that you gain only after purchasing a product or visiting a place. It cannot be found using

Crab is already in production

Hybrid Meta Approach

Page 45: Crab: A Python Framework for Building Recommender Systems

Crab is already in production

Brazilian social network called Atepassar.com

Educational network with more than 60,000 students and 120 video classes

Running on Python + NumPy + SciPy and Django

Backend for recommendations: MongoDB (mongoengine)

Daily Recommendations with Explanations

Page 46: Crab: A Python Framework for Building Recommender Systems

Evaluating your recommender

Crab implements the most used recommender metrics.

Precision, Recall, F1-Score, RMSE

Using matplotlib for a plotter utility

Implement new metrics

Simulation support, maybe (??)
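The metrics named on this slide have compact standard definitions. The sketch below computes them in plain Python on made-up inputs. It shows the formulas only and is not Crab's evaluator; both function names are hypothetical.

```python
import math


def precision_recall_f1(recommended, relevant):
    """Set-based precision, recall and F1 for one recommendation list."""
    recommended, relevant = set(recommended), set(relevant)
    hits = len(recommended & relevant)
    precision = hits / float(len(recommended)) if recommended else 0.0
    recall = hits / float(len(relevant)) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


def rmse(predicted, actual):
    """Root mean squared error between predicted and actual ratings."""
    errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(errors) / len(errors))


# Made-up example: 3 recommended items, 4 relevant ones, 2 in common.
print(precision_recall_f1([1, 5, 6], [5, 6, 2, 3]))
print(rmse([3.5, 2.0, 4.0], [3.0, 2.0, 5.0]))
```

Precision, recall and F1 judge the ranked list, while RMSE judges the predicted rating values, which is why the evaluator output reports the two groups separately.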

Page 47: Crab: A Python Framework for Building Recommender Systems

Evaluating your recommender

Page 53: Crab: A Python Framework for Building Recommender Systems

Evaluating your recommender

>>> from crab.metrics.classes import CfEvaluator
>>> evaluator = CfEvaluator()
>>> evaluator.evaluate(recommender=recsys, metric='rmse')
{'rmse': 0.69467177857026907}
>>> evaluator.evaluate_on_split(recommender=recsys, at=2)
({'error': [{'mae': 0.345, 'nmae': 0.4567, 'rmse': 0.568},
            {'mae': 0.456, 'nmae': 0.356778, 'rmse': 0.6788},
            {'mae': 0.456, 'nmae': 0.356778, 'rmse': 0.6788}],
  'ir': [{'f1score': 0.456, 'precision': 0.78557, 'recall': 0.55677},
         {'f1score': 0.64567, 'precision': 0.67865, 'recall': 0.785955},
         {'f1score': 0.45070, 'precision': 0.74744, 'recall': 0.858585}]},
 {'final_score': {'avg': {'f1score': 0.495955,
                          'mae': 0.429292,
                          'nmae': 0.373739,
                          'precision': 0.63932929,
                          'recall': 0.729939393,
                          'rmse': 0.3466868},
                  'stdev': {'f1score': 0.09938383,
                            'mae': 0.0593933,
                            'nmae': 0.03393939,
                            'precision': 0.0192929,
                            'recall': 0.031293939,
                            'rmse': 0.234949494}}})

Page 54: Crab: A Python Framework for Building Recommender Systems

Distributing the recommendation computations

Use Hadoop and Map-Reduce intensively

Investigating the Yelp mrjob framework

https://github.com/pfig/mrjob

Develop the state-of-the-art techniques used for Netflix and newer ones

Matrix Factorization, Singular Value Decomposition (SVD), Boltzmann machines

The most commonly used is the Slope One technique.

Simple algebra: a linear predictor y = a*x + b with slope a = 1, i.e. y = x + b
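Slope One predicts a rating as y = x + b, where b is the average difference between the ratings of two items. The sketch below implements the weighted variant in plain Python on a made-up data set. It is an illustration, not code from Crab; the user and item names are hypothetical.

```python
from collections import defaultdict

# Made-up ratings ({user: {item: rating}}).
ratings = {
    "u1": {"a": 5.0, "b": 3.0, "c": 2.0},
    "u2": {"a": 3.0, "b": 4.0},
    "u3": {"b": 2.0, "c": 5.0},
}


def slope_one_predict(prefs, user, target):
    """Weighted Slope One prediction of `user`'s rating for item `target`."""
    diffs = defaultdict(float)   # summed (target - item) rating differences
    counts = defaultdict(int)    # number of users who rated both items
    for other_ratings in prefs.values():
        if target not in other_ratings:
            continue
        for item, rating in other_ratings.items():
            if item != target:
                diffs[item] += other_ratings[target] - rating
                counts[item] += 1
    numerator, denominator = 0.0, 0
    for item, rating in prefs[user].items():
        if item != target and counts[item]:
            average_diff = diffs[item] / counts[item]   # the "b" in y = x + b
            numerator += (rating + average_diff) * counts[item]
            denominator += counts[item]
    return numerator / denominator if denominator else None


print(slope_one_predict(ratings, "u2", "c"))
```

Each item pair contributes one offset b, weighted by how many users rated both items, so pairs backed by more evidence count for more.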

Page 61: Crab: A Python Framework for Building Recommender Systems

Cache/Parallelism with joblib

from joblib import Memory
memory = Memory(cachedir='', verbose=0)

class UserSimilarity(BaseSimilarity):
    ...

    @memory.cache
    def get_similarity(self, source_id, target_id):
        source_preferences = self.model.preferences_from_user(source_id)
        target_preferences = self.model.preferences_from_user(target_id)

        # NaN when either user has no preferences to compare
        return self.distance(source_preferences, target_preferences) \
            if not source_preferences.shape[1] == 0 \
                and not target_preferences.shape[1] == 0 else np.array([[np.nan]])

    ...

    def get_similarities(self, source_id):
        return [(other_id, self.get_similarity(source_id, other_id))
                for other_id, v in self.model]

>>> # Without memory.cache
>>> timeit similarity.get_similarities('marcel_caraciolo')
100 loops, best of 3: 978 ms per loop

>>> # With memory.cache
>>> timeit similarity.get_similarities('marcel_caraciolo')
100 loops, best of 3: 434 ms per loop

http://packages.python.org/joblib/index.html
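The caching idea on this slide can be tried in isolation. The sketch below is not Crab's code: the preference data, the `similarity` function and the `CALLS` list are hypothetical, and it assumes a recent joblib release, where the cache directory is passed as `location` (the slide uses the older `cachedir` argument). It wraps a plain function rather than decorating a method, since joblib's documentation advises against decorating methods in the class body.

```python
import tempfile

from joblib import Memory

# Assumption: a throwaway cache directory; a real project would pick a stable path.
cache_dir = tempfile.mkdtemp()
memory = Memory(location=cache_dir, verbose=0)

# Hypothetical toy data: user -> {item: rating}.
PREFERENCES = {
    "marcel": {"item_a": 5.0, "item_b": 3.0, "item_c": 4.0},
    "bruno": {"item_a": 4.0, "item_b": 3.0, "item_c": 5.0},
    "ricardo": {"item_b": 2.0, "item_d": 5.0},
}

CALLS = []  # records every real (non-cached) computation


def similarity(source_id, target_id):
    """Inverse Euclidean distance over co-rated items (0.0 if there are none)."""
    CALLS.append((source_id, target_id))
    source, target = PREFERENCES[source_id], PREFERENCES[target_id]
    shared = set(source) & set(target)
    if not shared:
        return 0.0
    distance = sum((source[i] - target[i]) ** 2 for i in shared) ** 0.5
    return 1.0 / (1.0 + distance)


# Wrap the plain function; results are stored on disk, keyed by the arguments.
cached_similarity = memory.cache(similarity)

first = cached_similarity("marcel", "bruno")   # computed, then written to the cache
second = cached_similarity("marcel", "bruno")  # served from the cache
```

After both calls `CALLS` holds a single entry, which is the effect the timings above are measuring: the second lookup skips the computation.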

Page 62: Crab: A Python Framework for Building Recommender Systems

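Cache/Parallelism with joblib

http://packages.python.org/joblib/index.html

Investigate how to use the multiprocessing and parallel packages for similarity computation

from joblib import Parallel, delayed
...

def get_similarities(self, source_id):
    other_ids = [other_id for other_id, v in self.model]
    # Parallel expects an iterable of delayed(...) calls and returns results in order
    similarities = Parallel(n_jobs=3)(
        delayed(self.get_similarity)(source_id, other_id) for other_id in other_ids)
    return list(zip(other_ids, similarities))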
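A self-contained version of the same pattern, as a sketch only: the data and the `similarity` function are hypothetical stand-ins for Crab's model and distance. `prefer="threads"` is an assumption made here to keep the example light; removing it lets joblib use its default process-based backend, which is what pays off for CPU-bound similarity code.

```python
from joblib import Parallel, delayed

# Hypothetical toy data: user -> {item: rating}.
PREFERENCES = {
    "marcel": {"item_a": 5.0, "item_b": 3.0, "item_c": 4.0},
    "bruno": {"item_a": 4.0, "item_b": 3.0, "item_c": 5.0},
    "ricardo": {"item_b": 2.0, "item_d": 5.0},
}


def similarity(source_id, target_id):
    """Inverse Euclidean distance over co-rated items (0.0 if there are none)."""
    source, target = PREFERENCES[source_id], PREFERENCES[target_id]
    shared = set(source) & set(target)
    if not shared:
        return 0.0
    distance = sum((source[i] - target[i]) ** 2 for i in shared) ** 0.5
    return 1.0 / (1.0 + distance)


def get_similarities(source_id, n_jobs=2):
    """Score source_id against every other user, one delayed call per user."""
    other_ids = [uid for uid in sorted(PREFERENCES) if uid != source_id]
    scores = Parallel(n_jobs=n_jobs, prefer="threads")(
        delayed(similarity)(source_id, other_id) for other_id in other_ids
    )
    # Parallel preserves input order, so ids and scores line up.
    return list(zip(other_ids, scores))


parallel_result = get_similarities("marcel")
sequential_result = [
    (uid, similarity("marcel", uid)) for uid in sorted(PREFERENCES) if uid != "marcel"
]
```

Keeping the ids outside the `delayed(...)` call and zipping them back afterwards is the design choice that matters: each element handed to `Parallel` has to be a delayed call by itself.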

Page 66: Crab: A Python Framework for Building Recommender Systems

Distributed Computing with mrJob

https://github.com/Yelp/mrjob

It supports Amazon’s Elastic MapReduce (EMR) service, your own Hadoop cluster or local (for testing)

"""The classic MapReduce job: count the frequency of words."""
from mrjob.job import MRJob
import re

WORD_RE = re.compile(r"[\w']+")

class MRWordFreqCount(MRJob):

    def mapper(self, _, line):
        for word in WORD_RE.findall(line):
            yield (word.lower(), 1)

    def reducer(self, word, counts):
        yield (word, sum(counts))

if __name__ == '__main__':
    MRWordFreqCount.run()

Page 67: Crab: A Python Framework for Building Recommender Systems

Distributed Computing with mrJob

https://github.com/Yelp/mrjob

Elsayed et al: Pairwise Document Similarity in Large Collections with MapReduce
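The two-phase idea from that paper (build an inverted index, then sum partial products for every pair that shares a term) can be sketched without a cluster. The code below is an illustration, not Crab's or mrjob's implementation: `run_mapreduce` is a hypothetical in-memory stand-in for a MapReduce run, items play the role of terms, users play the role of documents, and raw ratings are used as weights without any normalization.

```python
from collections import defaultdict
from itertools import combinations


def run_mapreduce(records, mapper, reducer):
    """Tiny in-memory stand-in for a MapReduce run: map, group by key, reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    output = []
    for key in sorted(groups):
        output.extend(reducer(key, groups[key]))
    return output


# Phase 1: build an inverted index, item -> [(user, rating), ...].
def index_mapper(record):
    user, ratings = record
    for item, rating in ratings.items():
        yield item, (user, rating)


def index_reducer(item, postings):
    yield item, sorted(postings)


# Phase 2: every pair of users sharing an item contributes a partial product.
def pair_mapper(record):
    _item, postings = record
    for (user_a, rating_a), (user_b, rating_b) in combinations(postings, 2):
        yield (user_a, user_b), rating_a * rating_b


def pair_reducer(pair, products):
    yield pair, sum(products)


# Hypothetical ratings, one record per user.
RATINGS = [
    ("bruno", {"item_a": 4, "item_b": 3}),
    ("marcel", {"item_a": 5, "item_b": 3, "item_c": 4}),
    ("ricardo", {"item_b": 2, "item_c": 5}),
]

inverted_index = run_mapreduce(RATINGS, index_mapper, index_reducer)
pair_scores = dict(run_mapreduce(inverted_index, pair_mapper, pair_reducer))
```

Only pairs that co-occur in at least one posting list are ever emitted, which is why this approach suits sparse data. The two mapper/reducer pairs have the same shape as the methods of an `MRJob` subclass, so each phase could be ported to mrjob as its own job.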

Page 71: Crab: A Python Framework for Building Recommender Systems

Future studies with Sparse Matrices

Real datasets come with lots of empty values

Apontador Reviews Dataset

http://aimotion.blogspot.com/2011/05/evaluating-recommender-systems.html

Solutions:

scipy.sparse package

Sharding operations

Matrix Factorization techniques (SVD)

Crab implements a Matrix Factorization with Expectation Maximization algorithm

scikits.crab.svd package
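As a rough illustration of the first and third solutions, the sketch below stores a small, hypothetical ratings matrix with `scipy.sparse` and factorizes it with SciPy's truncated SVD. It is not the `scikits.crab.svd` implementation (which the slide describes as Expectation Maximization based), and it treats unrated cells as zeros, which is a simplification.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import svds

# Hypothetical (user_index, item_index, rating) triples; all other cells are unrated.
rows = np.array([0, 0, 1, 1, 2, 3, 3])
cols = np.array([0, 2, 0, 1, 3, 2, 4])
vals = np.array([5.0, 3.0, 4.0, 2.0, 5.0, 4.0, 1.0])

# 4 users x 5 items, storing only the 7 known ratings.
ratings = csr_matrix((vals, (rows, cols)), shape=(4, 5))

# Fraction of cells that actually hold a rating.
density = ratings.nnz / (ratings.shape[0] * ratings.shape[1])

# Rank-2 truncated SVD; k must be smaller than both matrix dimensions.
u, s, vt = svds(ratings, k=2)

# Dense low-rank approximation; previously empty cells now carry estimated scores.
approximation = u @ np.diag(s) @ vt
```

On real data the density is usually far lower than in this toy matrix, which is where the memory savings of the sparse format come from.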

Page 74: Crab: A Python Framework for Building Recommender Systems

Optimizations with Cython

http://cython.org/

Cython is a Python extension that lets developers annotate functions so they can be compiled to C.

http://aimotion.blogspot.com/2011/09/high-performance-computation-with_17.html

# setup.py
from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext

# for notes on compiler flags see:
# http://docs.python.org/install/index.html

setup(
    cmdclass={'build_ext': build_ext},
    ext_modules=[Extension("spearman_correlation_cython", ["spearman_correlation_cython.pyx"])]
)
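The slides do not show the contents of `spearman_correlation_cython.pyx`. For orientation, here is a plain-Python Spearman rank correlation of the kind such a module would presumably accelerate. It is a hypothetical baseline written for this note, not the authors' code; a pure-Python version like this is what one would time before and after adding Cython type annotations.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2.0 + 1.0
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (NaN if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / (var_x * var_y) ** 0.5


def spearman_correlation(xs, ys):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))
```

The nested loops over plain Python numbers are the kind of code that tends to benefit most from being compiled.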

Page 78: Crab: A Python Framework for Building Recommender Systems

Benchmarks

Dataset           Pure Python w/ dicts    Python w/ Scipy and Numpy
MovieLens 100k    15.32 s                 9.56 s

http://www.grouplens.org/node/73

[Bar chart: Time elapsed (Recommend 5 items), Old Crab vs. New Crab, axis from 0 to 16]

Page 79: Crab: A Python Framework for Building Recommender Systems

Why migrate ?

Old Crab ran on pure Python only

Recommendations demand heavy mathematical computation and lots of processing

Compatible with the Numpy and Scipy libraries

High-standard, popular libraries optimized for scientific computing in Python

Scikits projects are amazing! Active communities, scientific conferences and up-to-date projects (e.g. scikit-learn)

Make the Crab framework visible to the community

Join the scientific researchers and machine learning developers around the globe coding with Python to help us in this project

Be Fast and Furious

Page 80: Crab: A Python Framework for Building Recommender Systems

Why migrate ?

http://morepypy.blogspot.com/2011/05/numpy-in-pypy-status-and-roadmap.html

Numpy optimized with PyPy

2x - 48x faster

Page 81: Crab: A Python Framework for Building Recommender Systems

How are we working ?

Sprints, Online Discussions and Issues

https://github.com/muricoca/crab/wiki/UpcomingEvents

Page 82: Crab: A Python Framework for Building Recommender Systems

How are we working ?

Our Project’s Home Page

http://muricoca.github.com/crab

Page 83: Crab: A Python Framework for Building Recommender Systems

Future Releases

Planned Release 0.1
Collaborative Filtering Algorithms working, sample datasets to load and test

Planned Release 0.11
Sparse Matrices and Database Models support

Planned Release 0.12
Slope One Algorithm, new factorization techniques implemented

....

Page 84: Crab: A Python Framework for Building Recommender Systems

Join us!

1. Read our Wiki Page
https://github.com/muricoca/crab/wiki/Developer-Resources

2. Check out our current sprints and open issues
https://github.com/muricoca/crab/issues

3. Forks and Pull Requests are mandatory

4. Join us at irc.freenode.net #muricoca or at our discussion list
http://groups.google.com/group/scikit-crab

Page 85: Crab: A Python Framework for Building Recommender Systems

Recommended Books

Satnam Alag, Collective Intelligence in Action, Manning Publications, 2009

Toby Segaran, Programming Collective Intelligence, O'Reilly, 2007

ACM RecSys, KDD, SBSC...

Page 86: Crab: A Python Framework for Building Recommender Systems

Crab
A Python Framework for Building Recommendation Engines

Marcel Caraciolo @marcelcaraciolo

Bruno Melo @brunomelo

Ricardo Caspirro @ricardocaspirro

{marcel, ricardo, bruno}@muricoca.com

https://github.com/muricoca/crab