indicthreads pune12 recommenders apache mahout
Post on 04-Apr-2018
7/30/2019 IndicThreads Pune12 Recommenders Apache Mahout
Contents
A recommendation problem
What is a recommender
Building a recommender using Mahout
Tips and tweaks
Recommender considerations
A book store
Sells books:
By various authors
Of various categories
On different subjects
From various publishers
Readers/buyers are asked to rate
Readers/buyers can provide reviews
You walk into the store
(buy something for a friend)
The store owner
Asks you what:
your friend reads (already owns)
your friend usually likes more
Has data on what:
his customers buy
his customers rate and review
Uses a few strategies
1 - Find similar books
Depending on which books your friend has, pick books:
by the same author
on the same or similar subjects
in the same category
from the same publisher
(those with highest sales numbers)
2 - Find books with similar readership
Define some similarity
e.g. two books are as similar as the number of readers rating both of them
Define some limit of relevance
e.g. only consider books that share more than 4 readers
Look for all books which are similar to books your friend owns
Pick books from this set that your friend doesn't own
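A minimal sketch of this strategy in plain Python (not Mahout code). The reader and book names are made-up illustration data, and the function names are hypothetical. Similarity is the count of readers who rated both books, and only pairs sharing more than `min_readers` readers are considered.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical data: reader -> set of books that reader has rated.
ratings = {
    "r1": {"A", "B", "C"},
    "r2": {"A", "B", "C"},
    "r3": {"A", "B", "D"},
    "r4": {"A", "C", "D"},
    "r5": {"A", "B", "C", "D"},
    "r6": {"B", "E"},
    "r7": {"A", "B"},
}


def co_rating_counts(ratings):
    """Similarity of two books = number of readers who rated both."""
    counts = defaultdict(int)
    for books in ratings.values():
        for a, b in combinations(sorted(books), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts


def recommend(owned, ratings, min_readers=4):
    """Books not owned that share more than min_readers readers with an owned book."""
    scores = defaultdict(int)
    for (a, b), n in co_rating_counts(ratings).items():
        if a in owned and b not in owned and n > min_readers:
            scores[b] = max(scores[b], n)
    # Most similar first; ties broken by book name.
    return sorted(scores, key=lambda book: (-scores[book], book))


print(recommend({"A"}, ratings))
```

Lowering `min_readers` widens the candidate set at the cost of weaker evidence per recommendation.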
3 - Find people with similar tastes
Define some similarity
e.g. two people are as similar as the number of books they like from the same category
Define some limit of relevance
e.g. only consider the top 3 people when ordered according to how similar they are to your friend
Look for users similar to your friend and see what they read
Pick books which these people like and your friend doesn't own
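The same idea as a small Python sketch (not Mahout code). The people, books and categories are invented for illustration. The slide's similarity is read here as "per category, the number of liked books the two people can be matched on", which is only one possible interpretation. The sketch also treats the books a person likes as the books they own.

```python
from collections import Counter

# Hypothetical data: book -> category, and person -> books they like.
category = {
    "b1": "fiction", "b2": "fiction", "b3": "history",
    "b4": "science", "b5": "fiction", "b6": "history",
}
likes = {
    "friend": {"b1", "b3"},
    "ann": {"b1", "b2", "b3"},
    "bob": {"b2", "b5"},
    "cat": {"b4"},
    "dan": {"b3", "b5", "b6"},
}


def similarity(a, b):
    """Sum over categories of min(liked books of a, liked books of b)."""
    ca = Counter(category[book] for book in likes[a])
    cb = Counter(category[book] for book in likes[b])
    return sum((ca & cb).values())  # Counter '&' keeps the per-key minimum


def top_neighbors(user, n=3):
    """The n people most similar to user (ties broken by name)."""
    others = [u for u in likes if u != user]
    others.sort(key=lambda u: (-similarity(user, u), u))
    return others[:n]


def recommend(user, n=3):
    """Books liked by the top-n neighbors that user does not have yet."""
    scores = Counter()
    for neighbor in top_neighbors(user, n):
        weight = similarity(user, neighbor)
        for book in likes[neighbor] - likes[user]:
            scores[book] += weight
    return sorted(scores, key=lambda book: (-scores[book], book))


print(top_neighbors("friend"), recommend("friend"))
```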
Example data (user ID, book ID, rating)
1,101,5.0 3,101,2.5 4,106,4.0
1,102,3.0 3,104,4.0 5,101,4.0
1,103,2.5 3,105,4.5 5,102,3.0
2,101,2.0 3,107,5.0 5,103,2.0
2,102,2.5 4,101,5.0 5,104,4.0
2,103,5.0 4,103,3.0 5,105,3.5
2,104,2.0 4,104,4.5 5,106,4.0
Your friend owns three books:
Gave 5 stars to book 101 (likes hugely and talks about it all the time)
Gave 3 stars to book 102 (has shown some liking to it)
Gave 2.5 stars to book 103 (has read it, but didn't say bad things about it)
Now we need to recommend books your friend hasn't seen
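As a starting point, the example data can be loaded into a per-user preference map and the candidate books for the friend (user 1) listed. This is a plain-Python sketch rather than Mahout code. The names `load` and `unseen_books` are hypothetical, and the column order (user ID, book ID, rating) is assumed from the slide's description of the friend's ratings.

```python
import csv
import io
from collections import defaultdict

# The example data from the slide, one "user,book,rating" triple per line.
DATA = """\
1,101,5.0
1,102,3.0
1,103,2.5
2,101,2.0
2,102,2.5
2,103,5.0
2,104,2.0
3,101,2.5
3,104,4.0
3,105,4.5
3,107,5.0
4,101,5.0
4,103,3.0
4,104,4.5
4,106,4.0
5,101,4.0
5,102,3.0
5,103,2.0
5,104,4.0
5,105,3.5
5,106,4.0
"""


def load(text):
    """Parse CSV triples into {user: {book: rating}}."""
    prefs = defaultdict(dict)
    for row in csv.reader(io.StringIO(text)):
        if row:  # skip blank lines
            user, book, rating = row
            prefs[int(user)][int(book)] = float(rating)
    return dict(prefs)


def unseen_books(prefs, user):
    """All books rated by anyone, minus those the user has already rated."""
    everything = {book for rated in prefs.values() for book in rated}
    return sorted(everything - set(prefs[user]))


prefs = load(DATA)
print(unseen_books(prefs, 1))
```

Any recommender then only has to rank these candidates for user 1.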
A pictorial representation
[Figure: users 1-5 plotted against books 101-107]
Visualize
[Figure: users 1-5 plotted against books 101-107]
A (slightly) bigger example
1,101,5.0 3,111,2.5 6,103,2.0
1,102,3.0 4,101,5.0 6,106,4.0
1,103,2.5 4,103,3.0 6,113,3.0
1,109,3.5 4,104,4.5 6,115,5.0
1,112,4.0 4,106,4.0 7,103,4.5
2,101,2.0 4,109,2.0 7,104,2.5
2,102,2.5 4,111,2.5 7,108,4.0
2,103,5.0 5,101,4.0 7,109,3.5
2,104,2.0 5,102,3.0 7,110,3.5
2,107,4.5 5,103,2.0 7,112,2.5
2,113,3.5 5,104,4.0 8,101,2.0
3,101,2.5 5,105,3.5 8,105,4.0
3,104,4.0 5,106,4.0 8,106,4.5
3,105,4.5 5,109,3.0 8,110,3.0
3,107,5.0 5,112,4.0 8,114,5.0
3,115,4.0 6,101,4.5 8,115,3.5
A pictorial representation
[Figure: users 1-8 plotted against books 101-115]
Clearly, not a viable option
Mahout to the rescue
What is Apache Mahout
Apache Mahout
A machine learning library
Works with Apache Hadoop
Use cases:
Recommenders
Clustering
Classification
Recommenders in Mahout
Recommenders use data culled from user behavior
Recommending using Mahout
Similarity between users or items
Expressed as a number between 0 and 1
Neighborhood of users/items
Recommendation using this info and an algorithm
Generic
Specialized
Similarity
Various algorithms:
Euclidean distance
Pearson correlation
Cosine measure
Spearman correlation
Tanimoto coefficient
Log-likelihood
Effectiveness dependent on the input data
Influences running time and memory
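To make a few of these measures concrete, here is a plain-Python sketch using textbook definitions. These are not Mahout's implementations, which may differ in details such as weighting and how undefined cases are handled. The two users are users 1 and 5 from the small example data set. Returning 0.0 for undefined cases is a choice made for this sketch.

```python
import math

# Ratings of users 1 and 5 from the small example data set: {book: rating}.
user1 = {101: 5.0, 102: 3.0, 103: 2.5}
user5 = {101: 4.0, 102: 3.0, 103: 2.0, 104: 4.0, 105: 3.5, 106: 4.0}


def pearson(a, b):
    """Pearson correlation over co-rated items; ranges from -1 to 1."""
    common = sorted(set(a) & set(b))
    n = len(common)
    if n < 2:
        return 0.0  # undefined with fewer than two co-rated items
    mean_a = sum(a[i] for i in common) / n
    mean_b = sum(b[i] for i in common) / n
    cov = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    var_a = sum((a[i] - mean_a) ** 2 for i in common)
    var_b = sum((b[i] - mean_b) ** 2 for i in common)
    if var_a == 0 or var_b == 0:
        return 0.0  # undefined when one user rates everything the same
    return cov / math.sqrt(var_a * var_b)


def euclidean_similarity(a, b):
    """1 / (1 + distance) over co-rated items; ranges from 0 to 1."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    distance = math.sqrt(sum((a[i] - b[i]) ** 2 for i in common))
    return 1.0 / (1.0 + distance)


def tanimoto(a, b):
    """Overlap of the rated item sets, ignoring rating values."""
    union = set(a) | set(b)
    return len(set(a) & set(b)) / len(union) if union else 0.0


print(pearson(user1, user5), euclidean_similarity(user1, user5), tanimoto(user1, user5))
```

Pearson and Euclidean use the rating values, while Tanimoto only looks at which items were rated. This is one reason effectiveness depends on the input data.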
Neighborhood
Nearest N neighborhood (say, 4):
Threshold neighborhood (say, > 0.8):
[Figure: two diagrams of user U and users 1-5, one per neighborhood type]
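The two neighborhood types can be sketched in a few lines of Python (not Mahout code). The similarity values of users 1-5 to user U are invented for illustration, and the function names are hypothetical.

```python
# Hypothetical similarities of users 1-5 to the target user U.
similarities = {1: 0.95, 2: 0.85, 3: 0.60, 4: 0.90, 5: 0.30}


def nearest_n(similarities, n):
    """The n most similar users, most similar first (ties broken by ID)."""
    ranked = sorted(similarities, key=lambda user: (-similarities[user], user))
    return ranked[:n]


def threshold(similarities, minimum):
    """Every user whose similarity is strictly greater than minimum."""
    return sorted(user for user, s in similarities.items() if s > minimum)


print(nearest_n(similarities, 4))
print(threshold(similarities, 0.8))
```

Nearest-N always returns a fixed number of neighbors, even weak ones. A threshold keeps only strong neighbors but may return very few.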
Recommender
Recommenders
Generic recommender
User based
Item based
Slope-one recommender
Singular Value Decomposition based
Linear interpolation based
Cluster-based
Recommender rescorer
Recommender evaluator
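As an illustration of one entry in this list, the following is a minimal sketch of the weighted Slope One scheme in plain Python. It follows the general published algorithm, not Mahout's implementation. The ratings are illustrative, and `slope_one_predict` is a hypothetical name.

```python
# Illustrative ratings: {user: {item: rating}}.
prefs = {
    "john": {"A": 5.0, "B": 3.0, "C": 2.0},
    "mark": {"A": 3.0, "B": 4.0},
    "lucy": {"B": 2.0, "C": 5.0},
}


def slope_one_predict(prefs, user, target):
    """Weighted Slope One estimate of user's rating for target.

    For each item the user rated, take the average difference
    (target - item) over all users who rated both, add it to the user's
    own rating of that item, and weight by the number of such users.
    """
    numerator = 0.0
    denominator = 0
    for item, rating in prefs[user].items():
        if item == target:
            continue
        diffs = [r[target] - r[item] for r in prefs.values()
                 if target in r and item in r]
        if diffs:
            average_diff = sum(diffs) / len(diffs)
            numerator += (rating + average_diff) * len(diffs)
            denominator += len(diffs)
    return numerator / denominator if denominator else None


print(slope_one_predict(prefs, "lucy", "A"))
```

The function returns `None` when no other user has rated both the target and any of the user's items.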
A real-life Web application
News aggregator-cum-reader
Fetches news from a news service
Shows the news in a uniform UI
Lets readers read, like/dislike and comment on news
Link social networks and share
Make this a personalized newspaper
Track user actions
Derive and store preferences
Generate recommendations
Leverage social accounts, etc.
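One way the "track user actions, derive and store preferences" step could look, as a hedged Python sketch. The action names, their weights and the event tuples are all assumptions made for illustration. The talk does not state how preferences were actually scored. The output uses "user,item,score" lines, the same shape as the example data earlier in the deck.

```python
from collections import defaultdict

# Assumed weights per tracked action (hypothetical values).
ACTION_WEIGHTS = {
    "read": 1.0,
    "like": 2.0,
    "dislike": -2.0,
    "comment": 1.5,
    "share": 2.5,
}


def derive_preferences(events):
    """Sum action weights per (user, article); unknown actions count as 0."""
    scores = defaultdict(float)
    for user, article, action in events:
        scores[(user, article)] += ACTION_WEIGHTS.get(action, 0.0)
    return dict(scores)


def to_csv_rows(preferences):
    """Render preferences as 'user,item,score' lines, sorted by user and item."""
    return [f"{user},{article},{score:.1f}"
            for (user, article), score in sorted(preferences.items())]


# Hypothetical tracked events: (user ID, article ID, action).
events = [
    (1, 501, "read"),
    (1, 501, "like"),
    (2, 501, "read"),
    (2, 502, "read"),
    (2, 502, "dislike"),
]
print("\n".join(to_csv_rows(derive_preferences(events))))
```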
Overall design
[Architecture diagram; components shown:]
User, application data (MySQL), over REST
News aggregation, storage (HBase), over REST
Preferences, Recommender (Mahout), over REST
Controller API (REST)
Web application
Phone/tablet applications
Third party applications
Recommender
[Architecture diagram; components shown:]
REST service (Grizzly, Tomcat): fetch recommendations, input user actions
Recommender (offline, run periodically)
MySQL database
Input: table dump
How to extract data: one dimension
[Chart: news article readership. x-axis: number of news articles (1 to 9); y-axis: logarithmic, 1 to 10000; data labels: 4299, 511, 128, 51, 13, 4, 4, 1, 2]
How to extract data: add dimensions
[Chart: x-axis: number of news articles / topics (1 to 57); y-axis: logarithmic, 1 to 10000; series: news article readership, topic readership]
How more data helps
[Chart: x-axis: number of news articles/topics (0 to 800); y-axis: 0 to 40; series: no. of readers with x articles each, no. of readers with x topics each]
How more data helps
[Chart: x-axis: number of news articles/topics (5 to 85); y-axis: 0 to 9; series: no. of readers with x articles each, no. of readers with x topics each]
How more data helps
[Chart: x-axis: number of news articles/topics (95 to 395); y-axis: 0 to 3.5; series: no. of readers with x articles each, no. of readers with x topics each]
Learnings
Know thy user
Frequency of visits
Preference logic wrt user
Know thy items
Should have enough items per user
Maximize items per action
Should have enough intersections
Should not be transient
Use tweaking abilities
Sharpen the saw
Questions
?
Thank you
viraj@gslab.com
viraj.paripatyadar@gmail.com