A recommendation engine for your PHP application
Post on 13-Apr-2017
A recommendation engine for your PHP apps
1) Intro to recommender systems
2) PredictionIO
3) Case Study
Definition: a system that helps people find things when finding what they need is hard because there are too many choices or alternatives
So… it’s a search engine!
Search Engines
Document base is (almost) static
Queries are dynamic
Search Engines
Create an index analyzing the documents
Calculate relevance for a query: tf*idf
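As a rough illustration of the tf*idf relevance mentioned above, here is a minimal Python sketch. The toy documents, the whitespace tokenizer and the smoothed idf formula are assumptions made for the example; real search engines use more refined variants.

```python
import math
from collections import Counter

# Toy corpus (hypothetical documents, for illustration only).
DOCS = {
    "d1": "php recommendation engine",
    "d2": "php search engine for php apps",
    "d3": "cooking recipes",
}


def tf_idf_score(query, doc_id, docs=DOCS):
    """Sum of tf*idf over the query terms for one document."""
    tokens = docs[doc_id].split()
    counts = Counter(tokens)
    score = 0.0
    for term in query.split():
        # Term frequency: share of the document made up of this term.
        tf = counts[term] / len(tokens)
        # Document frequency: number of documents containing the term.
        df = sum(term in text.split() for text in docs.values())
        # Smoothed inverse document frequency (one of several common variants).
        idf = math.log((1 + len(docs)) / (1 + df)) + 1
        score += tf * idf
    return score


def rank(query, docs=DOCS):
    """Document ids ordered from most to least relevant for the query."""
    return sorted(docs, key=lambda d: tf_idf_score(query, d, docs), reverse=True)
```

The index (here just the tokenized documents) is built once, while `rank` runs for every new query. That is the "static documents, dynamic queries" split described above.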
Recommender systems
Document base is growing (eg: Netflix)
Query is static: find something I like
Classification
Domain: news, products, …
Helps define what can be suggested
Purpose: sales, information, education, build a community
What is TripAdvisor's purpose?
Personalization levels
• Non personalized: best sellers
• Demographic: age, location
• Ephemeral: based on current activities
• Persistent
Types of input
• Explicit: ask user to rate something
• Implicit: inferred from user behaviour
Output
• Prediction: predicted rating, evaluation
• Recommendations: suggestion list, top-n, offers, promotion
• Filtering: email filters, news articles
A model for comparison
Users: people with preferences
Items: subjects of rating
Rating: expression of opinion
(Community: space where opinions make sense)
Non-personalized
Best seller
Most popular
Trending
Summary of community ratings: eg best hotel in town
Visitor  Hotel A  Hotel B  Hotel C
John        3        5        -
Jane        -        3        -
Fred        -        1        0
Tom         4        -        -
AVG        3.5       3        0
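A minimal Python sketch of this non-personalized summary: average each hotel's ratings and suggest the best one. The transcript lost the table's column alignment, so the placement of each rating below is inferred from the AVG row and should be read as an assumption.

```python
# Ratings from the hotel table. Which hotel each rating belongs to is
# inferred from the AVG row (3.5, 3, 0), so treat it as an assumption.
RATINGS = {
    "John": {"Hotel A": 3, "Hotel B": 5},
    "Jane": {"Hotel B": 3},
    "Fred": {"Hotel B": 1, "Hotel C": 0},
    "Tom": {"Hotel A": 4},
}


def average_ratings(ratings):
    """Mean rating per item, ignoring visitors who did not rate it."""
    totals, counts = {}, {}
    for user_ratings in ratings.values():
        for item, value in user_ratings.items():
            totals[item] = totals.get(item, 0) + value
            counts[item] = counts.get(item, 0) + 1
    return {item: totals[item] / counts[item] for item in totals}


def best_items(ratings, n=1):
    """The n items with the highest community average."""
    averages = average_ratings(ratings)
    return sorted(averages, key=averages.get, reverse=True)[:n]
```

Every visitor gets the same answer here, which is what makes the suggestion non-personalized.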
Content based
Users rate items
We build a model of user preference
Look for similar items based on the model
Action 0.7
Sci Fi 3.2
Vin Diesel 1.2
… …
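One way to use such a preference model, sketched in Python: score each item by adding up the user's weights for the features it has. The three weights come from the slide; the catalogue entries and the additive scoring rule are hypothetical choices made for the example.

```python
# User preference model: feature -> weight (values from the slide).
PROFILE = {"Action": 0.7, "Sci Fi": 3.2, "Vin Diesel": 1.2}

# Hypothetical catalogue: item -> set of content features.
CATALOGUE = {
    "Movie A": {"Action", "Vin Diesel"},
    "Movie B": {"Sci Fi"},
    "Movie C": {"Romance"},
}


def score(profile, features):
    """Sum of the user's weights over the item's features (unknown features count 0)."""
    return sum(profile.get(feature, 0.0) for feature in features)


def recommend(profile, catalogue, n=2):
    """The n items whose content best matches the profile."""
    ranked = sorted(
        catalogue,
        key=lambda item: score(profile, catalogue[item]),
        reverse=True,
    )
    return ranked[:n]
```

Note that the sketch needs the feature set of every item, which is the "need to know items content" limitation.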
https://www.amazon.com/Relevant-Search-applications-Solr-Elasticsearch/dp/161729277X
http://www.slideshare.net/treygrainger/building-a-real-time-solrpowered-recommendation-engine
Problems/Limitations
Need to know items content
User cold start: time to learn important features for the user
What if the user's interests change?
Lack of serendipity: accidentally discover something you like
Collaborative filtering
No need to analyze (index) content
Can capture more subtle things
Serendipity
User-User
Select the people in my neighborhood, i.e. those with similar taste. If other people share my taste, I want their opinions combined
[Figure: ratings matrix with users (Joe, Tom, …) as rows and movies (E.T, …) as columns; one of Joe's ratings is unknown (?)]
User-User: which users have similar tastes?
Item-Item
Find items on which I have expressed an opinion and look at how other people felt about them. Similarities between items can be precomputed
[Figure: the same users-by-movies ratings matrix, this time compared column by column]
Item-Item: which items are similar?
Problems/Limitations
Sparsity
When recommending from a large item set, users will have rated only some of the items
User Cold start
Not enough known about new user to decide who is similar
Item cold start
Cannot predict ratings for new item till some similar users have rated it [No problem for content-based]
Scalability
With millions of ratings, computations become slow
Dimensionality reduction
Express my opinions as a set of tastes
Compact representation of the matrix with relevant features
Rogue One
1 3 5
Joe 1 2 3
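The transcript only preserves the numbers from this slide. One common reading, which is an assumption here, is that the item and the user are each described by a short vector of latent "taste" features and that their affinity is the dot product of the two vectors. A minimal Python sketch under that assumption:

```python
# Assumption: each row on the slide is a vector of three latent taste features.
ROGUE_ONE = [1, 3, 5]  # how strongly the movie expresses each feature
JOE = [1, 2, 3]        # how much Joe cares about each feature


def affinity(user_vector, item_vector):
    """Dot product of two latent feature vectors of equal length."""
    if len(user_vector) != len(item_vector):
        raise ValueError("vectors must have the same number of features")
    return sum(u * i for u, i in zip(user_vector, item_vector))
```

With a handful of features per user and per item, the full ratings matrix never has to be stored or scanned, which is what makes the representation compact.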
An example
        Item1  Item2  Item3  Item4  Item5
Joe       8      1      ?      2      7
Tom       2      ?      5      7      5
Alice     5      4      7      4      7
Bob       7      1      7      3      8
How similar are Joe and Tom? How similar are Joe and Bob?
Only consider items both users have rated
For each item:
- Calculate the difference in the users' ratings
- Take the average of this difference over the items
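Sim(Joe, Tom) = (|8-2| + |2-7| + |7-5|)/3 = 13/3 ≈ 4.33
Sim(Joe, Bob) = (|8-7| + |1-1| + |2-3| + |7-8|)/4 = 3/4 = 0.75
Sim(Alice, Bob) = (|5-7| + |4-1| + |7-7| + |4-3| + |7-8|)/5 = 7/5 = 1.4
(this is an average difference, so a lower value means more similar tastes)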
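The procedure above can be sketched in a few lines of Python. The ratings are the ones from the example table, with unknown ratings (?) simply left out; the function name and the dictionary layout are choices made for this sketch.

```python
# Ratings from the example table; a "?" in the table is an absent key here.
RATINGS = {
    "Joe": {"Item1": 8, "Item2": 1, "Item4": 2, "Item5": 7},
    "Tom": {"Item1": 2, "Item3": 5, "Item4": 7, "Item5": 5},
    "Alice": {"Item1": 5, "Item2": 4, "Item3": 7, "Item4": 4, "Item5": 7},
    "Bob": {"Item1": 7, "Item2": 1, "Item3": 7, "Item4": 3, "Item5": 8},
}


def mean_abs_diff(ratings, user_a, user_b):
    """Average absolute rating difference over the items both users rated.

    Returns None when the two users have no rated item in common.
    """
    common = ratings[user_a].keys() & ratings[user_b].keys()
    if not common:
        return None
    total = sum(abs(ratings[user_a][i] - ratings[user_b][i]) for i in common)
    return total / len(common)
```

For Joe and Tom the function averages over Item1, Item4 and Item5 and returns 13/3. Alice and Bob share all five items (including Item3, where the difference is 0), so their value is 7/5 = 1.4.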
Now we have a score or weight for each user
Recommend what similar users have rated highly
To calculate rating of an item to recommend, give weight to each user’s recommendations based on how similar they are to you.
use the entire matrix, or
use a k-NN algorithm: people who historically have the same tastes as me
aggregate using a weighted sum
weights depend on similarity
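A minimal Python sketch of the weighted aggregation, predicting Joe's missing rating for Item3. The slides do not say how the average difference is turned into a weight; the rule `1 / (1 + difference)` used below is an assumption chosen only so that closer users weigh more.

```python
# Ratings from the example table; a "?" in the table is an absent key here.
RATINGS = {
    "Joe": {"Item1": 8, "Item2": 1, "Item4": 2, "Item5": 7},
    "Tom": {"Item1": 2, "Item3": 5, "Item4": 7, "Item5": 5},
    "Alice": {"Item1": 5, "Item2": 4, "Item3": 7, "Item4": 4, "Item5": 7},
    "Bob": {"Item1": 7, "Item2": 1, "Item3": 7, "Item4": 3, "Item5": 8},
}


def distance(ratings_a, ratings_b):
    """Average absolute difference over co-rated items, or None if there are none."""
    common = ratings_a.keys() & ratings_b.keys()
    if not common:
        return None
    return sum(abs(ratings_a[i] - ratings_b[i]) for i in common) / len(common)


def predict(ratings, user, item, k=None):
    """Weighted average of other users' ratings for item.

    k=None uses every user who rated the item (the entire matrix);
    otherwise only the k most similar ones are kept (k-NN).
    """
    neighbours = []
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        d = distance(ratings[user], other_ratings)
        if d is None:
            continue
        weight = 1.0 / (1.0 + d)  # assumption: smaller difference, larger weight
        neighbours.append((weight, other_ratings[item]))
    neighbours.sort(reverse=True)  # most similar users first
    if k is not None:
        neighbours = neighbours[:k]
    total_weight = sum(w for w, _ in neighbours)
    if total_weight == 0:
        return None  # nobody comparable has rated the item
    return sum(w * r for w, r in neighbours) / total_weight
```

Using the whole matrix, Tom's rating of 5 pulls the prediction slightly below 7. With `k=2` only Bob and Alice are kept, both of whom rated Item3 with 7.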
        Item1  Item2  Item3  Item4  Item5
Joe       8      1      ?      2      7
Tom       2      ?      5      7      5
Alice     5      4      7      4      7
Bob       7      1      7      3      8
How similar are Item1 and Item2? How similar are Item1 and Item3?
Only consider users who have rated both items
For each user:
- Calculate the difference between their ratings for the 2 items
- Take the average of this difference over the users
Sim(I1, I2) = (|8-1| + |5-4| + |7-1|)/3 = 14/3 ≈ 4.67
Sim(I1, I3) = (|2-5| + |5-7| + |7-7|)/3 = 5/3 ≈ 1.67
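The item-item version of the calculation, as a small Python sketch. It walks the same example table column by column instead of row by row; the function name and the data layout are choices made for the sketch.

```python
# Ratings from the example table; a "?" in the table is an absent key here.
RATINGS = {
    "Joe": {"Item1": 8, "Item2": 1, "Item4": 2, "Item5": 7},
    "Tom": {"Item1": 2, "Item3": 5, "Item4": 7, "Item5": 5},
    "Alice": {"Item1": 5, "Item2": 4, "Item3": 7, "Item4": 4, "Item5": 7},
    "Bob": {"Item1": 7, "Item2": 1, "Item3": 7, "Item4": 3, "Item5": 8},
}


def item_distance(ratings, item_a, item_b):
    """Average absolute difference between two items' ratings.

    Only users who rated both items are counted; returns None if there are none.
    """
    diffs = [
        abs(user_ratings[item_a] - user_ratings[item_b])
        for user_ratings in ratings.values()
        if item_a in user_ratings and item_b in user_ratings
    ]
    return sum(diffs) / len(diffs) if diffs else None
```

Item1 and Item3 come out much closer (5/3) than Item1 and Item2 (14/3). Since items change less often than users, these values can be computed ahead of time.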
As with user-user, use the whole matrix or identify neighbors
Cosine similarity
[3,5]
[2,7]
[0,0]
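Cosine similarity measures the angle between two rating vectors rather than the difference between their values. A minimal Python sketch using the vectors from the slide; returning 0.0 for an all-zero vector is a convention chosen here, since the cosine is undefined in that case.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention for this sketch: no direction, no similarity
    return dot / (norm_a * norm_b)


# The two non-zero vectors from the slide point in almost the same direction.
similarity = cosine_similarity([3, 5], [2, 7])
```

Unlike the average difference used earlier, a higher value here means more similar, and scaling a vector (for example a user who rates everything twice as high) does not change the result.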
Our domain
Domain: online book shop, both paper and digital
Recommend titles, old and new
- Who bought this also bought
- You might like
Choosing the tool
PredictionIO
Under the Apache umbrella
Based on solid open source stack
Customizable engine templates
SDK for PHP
Installation
http://actionml.com/docs/pio_by_actionml
Pre-baked Amazon AMIs
Installation via source code
http://predictionio.incubator.apache.org/install/install-sourcecode/
You can choose storage
mysql/postgres vs elasticsearch+hbase
The event server
Pattern: user -- action -- item
User 1 purchased product X
User 2 viewed product Y
User 1 added product Z to the cart
$ pio app new MyApp1
[INFO] [App$] Initialized Event Store for this app ID: 1.
[INFO] [App$] Created new app:
[INFO] [App$] Name: MyApp1
[INFO] [App$] ID: 1
[INFO] [App$] Access Key: 3mZWDzci2D5YsqAnqNnXH9SB6Rg3dsTBs8iHkK6X2i54IQsIZI1eEeQQyMfs7b3F
$ pio eventserver
Server runs on port 7070 by default
$ curl -i -X GET http://localhost:7070
{"status":"alive"}
$ curl -i -X GET "http://localhost:7070/events.json?accessKey=$ACCESS_KEY"
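To make the user -- action -- item pattern concrete, here is a Python sketch that builds the JSON body for one event and the POST request for the event server. The field names follow the PredictionIO Event API as I understand it (`event`, `entityType`, `entityId`, `targetEntityType`, `targetEntityId`). The helper names are hypothetical, and `send_event` assumes an event server is actually listening on the default port.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

EVENT_SERVER = "http://localhost:7070"  # default port mentioned above


def build_event(event, user_id, item_id, event_time=None):
    """JSON body for "user <event> item", e.g. user 1 purchased product X."""
    payload = {
        "event": event,
        "entityType": "user",
        "entityId": str(user_id),
        "targetEntityType": "item",
        "targetEntityId": str(item_id),
    }
    if event_time is not None:
        # event_time is expected to be a datetime; it is sent as ISO 8601.
        payload["eventTime"] = event_time.isoformat()
    return payload


def build_request(payload, access_key, base_url=EVENT_SERVER):
    """POST request for /events.json, authenticated with the app access key."""
    url = f"{base_url}/events.json?{urlencode({'accessKey': access_key})}"
    return Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_event(payload, access_key):
    """Send the event and return the server's JSON reply (needs a running server)."""
    with urlopen(build_request(payload, access_key)) as response:
        return json.load(response)
```

In a PHP application the SDK methods listed below wrap this kind of request, so the raw HTTP call is mostly useful for debugging and for batch imports.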
Events modeling
what can/should we model?
rate, like, buy, view, depending on the algorithm
$set , $unset and $delete
_pio* are reserved
setUser($uid, array $properties=array(), $eventTime=null)
unsetUser($uid, array $properties, $eventTime=null)
deleteUser($uid, $eventTime=null)
setItem($iid, array $properties=array(), $eventTime=null)
unsetItem($iid, array $properties, $eventTime=null)
deleteItem($iid, $eventTime=null)
recordUserActionOnItem($event, $uid, $iid, array $properties=array(), $eventTime=null)
createEvent(array $data)
getEvent($eventId)
Engines
D.A.S.E Architecture
Data Source and Preparation
Algorithm
Serving
Evaluation
$ pio template get apache/incubator-predictionio-template-recommender MyRecommendation
$ cd MyRecommendation
engine.json
"datasource": {
  "params": {
    "appName": "MyApp1",
    "eventNames": ["buy", "view"]
  }
},
$ pio build --verbose
$ pio train
$ pio deploy
Getting recommendations
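Once the engine is deployed it answers queries over HTTP. The Python sketch below assumes the recommendation template's conventions as I understand them: the engine listens on port 8000, accepts a body like `{"user": "1", "num": 4}` on `/queries.json` and replies with an `itemScores` list. Check these against the template you actually deploy; the helper names are hypothetical.

```python
import json
from urllib.request import Request, urlopen

ENGINE_URL = "http://localhost:8000"  # assumption: default `pio deploy` port


def build_query(user_id, num=4):
    """Query body asking for `num` recommendations for one user."""
    return {"user": str(user_id), "num": num}


def parse_item_scores(body):
    """Turn the engine's JSON reply into a list of (item id, score) pairs."""
    data = json.loads(body)
    return [(entry["item"], entry["score"]) for entry in data.get("itemScores", [])]


def get_recommendations(user_id, num=4, base_url=ENGINE_URL):
    """Query a deployed engine (needs `pio deploy` to be running)."""
    request = Request(
        f"{base_url}/queries.json",
        data=json.dumps(build_query(user_id, num)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urlopen(request) as response:
        return parse_item_scores(response.read().decode("utf-8"))
```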
Implementation
2 kinds of suggestions
- who bought this also bought (similarities)
- you may like (recommendation)
View
Like (add to basket, add to wishlist)
Conversion (buy)
Recorded in batch
4 engines
2 for books, 2 for ebooks
(not needed now)
Retrained every night with new data
recordLike($user, array $item)
recordConversion($user, array $item)
recordView($user, array $item)
createUser($uid)
getRecommendation($uid, $itype, $n = self::N_SUGGESTION)
getSimilarity($iid, $itype, $n = self::N_SUGGESTION)
user cold start/item cold start
if we don't get enough suggestions, switch to non-personalized ones (best sellers); the same applies to users who are not logged in
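The fallback can be sketched as a small Python function: take what the engine returned and top it up with best sellers until there are enough suggestions. The function name and the way duplicates are handled are assumptions made for this sketch.

```python
def with_fallback(personalized, best_sellers, n):
    """Return n suggestions, padding personalized ones with best sellers.

    For a new or anonymous user, `personalized` is simply an empty list.
    """
    # Drop duplicates while keeping the engine's ordering, then cap at n.
    result = list(dict.fromkeys(personalized))[:n]
    for item in best_sellers:
        if len(result) >= n:
            break
        if item not in result:
            result.append(item)
    return result
```

For example, `with_fallback([], best_sellers, 5)` returns the top five best sellers, which covers the user cold start and the visitor who is not logged in.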
Michele Orselli CTO@Ideato
_orso_
micheleorselli / ideatosrl
mo@ideato.it
https://joind.in/talk/93d2d
Links
• http://www.slideshare.net/NYCPredictiveAnalytics/building-a-recommendation-engine-an-example-of-a-product-recommendation-engine?next_slideshow=1
• https://www.coursera.org/learn/recommender-systems-introduction
• http://actionml.com/
• https://github.com/grahamjenson/ger