Recommender System, hellojinjie, 2013-06-19

Uploaded by jie-jin; posted on 15-Jan-2015


Page 1: Recommender system

Recommender System

hellojinjie, 2013-06-19

Page 2: Recommender system

We will talk about

◦ Netflix Prize

◦ Major challenges

◦ Definitions of subjects and problems

◦ Recommendation methods

◦ Mahout

◦ CNTV 5+ VIP Recommendation

Page 3: Recommender system

We will not talk about

◦ Architecture of a recommender system

◦ How to make it robust and scalable

Page 4: Recommender system

Netflix Prize

◦ Netflix, Inc. is an American provider of on-demand Internet streaming media and flat-rate DVD-by-mail rental.

◦ 60% of the DVDs rented from Netflix are selected based on personalized recommendations.

Page 5: Recommender system

Netflix Prize

◦ In October 2006, Netflix released a dataset of approximately 100 million anonymous movie ratings and challenged researchers and practitioners to develop recommender systems that could beat the accuracy of the company's own recommendation system, Cinematch.

◦ On 21 September 2009, the $1,000,000 grand prize was awarded to a team whose system improved on Cinematch's accuracy, measured by RMSE, by 10%.

Page 6: Recommender system

Major challenges

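◦ Data sparsity – the data volume is huge, yet ratings are unevenly distributed.

◦ Scalability – the data volume is huge, and the system must support incremental updates.

◦ Cold start – newly arrived users.

◦ Diversity vs. accuracy – don't recommend to me what everyone already knows.

◦ Vulnerability to attacks – wherever there is a ranking, someone will try to game it.

◦ The value of time – people like different things at different times.

◦ Evaluation of recommendations – which recommendation methods are better, and which are worse?

◦ User interface – present recommendations in an optimized way so that users are happy to accept them.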
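Data sparsity can be quantified as the share of user-item pairs that carry no rating: sparsity = 1 - |ratings| / (|users| x |items|). A minimal sketch; the Sparsity class is a hypothetical helper, not part of Mahout.

```java
/** Hypothetical helper: sparsity of a user-item rating matrix. */
final class Sparsity {
    private Sparsity() {}

    /** Fraction of user-item pairs with no rating: 1 - ratings / (users * items). */
    static double sparsity(long numRatings, long numUsers, long numItems) {
        if (numUsers <= 0 || numItems <= 0) {
            throw new IllegalArgumentException("users and items must be positive");
        }
        double cells = (double) numUsers * numItems; // total user-item pairs
        return 1.0 - numRatings / cells;
    }
}
```

For instance, 6 ratings among 4 users and 5 items fill 6 of 20 cells, so sparsity(6, 4, 5) is 0.7 up to floating-point rounding.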

Page 7: Recommender system

Evaluation Metrics for Recommendation

◦ The training set E^T

-- The training set is treated as known information.

◦ The probe set E^P

-- No information from the probe set is allowed to be used for recommendation.
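To keep the probe set E^P unseen, the observed ratings can be split at random before any model is built. A minimal sketch, assuming a uniform random split; ProbeSplit is a hypothetical name and the 10% probe fraction in the note below is an illustrative choice, not from the slides.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical random split of observed ratings into E^T (training) and E^P (probe). */
final class ProbeSplit<T> {
    final List<T> training;
    final List<T> probe;

    private ProbeSplit(List<T> training, List<T> probe) {
        this.training = training;
        this.probe = probe;
    }

    /** Shuffles a copy of the ratings and moves probeFraction of them into E^P; the input is not modified. */
    static <T> ProbeSplit<T> of(List<T> ratings, double probeFraction, long seed) {
        if (probeFraction < 0 || probeFraction > 1) {
            throw new IllegalArgumentException("probeFraction must be in [0, 1]");
        }
        List<T> shuffled = new ArrayList<>(ratings);
        Collections.shuffle(shuffled, new Random(seed)); // fixed seed: reproducible split
        int probeSize = (int) Math.round(shuffled.size() * probeFraction);
        return new ProbeSplit<>(
            new ArrayList<>(shuffled.subList(probeSize, shuffled.size())),
            new ArrayList<>(shuffled.subList(0, probeSize)));
    }
}
```

For example, ProbeSplit.of(ratings, 0.1, 42L) puts 10% of the ratings in probe and the other 90% in training; every rating lands in exactly one of the two sets.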

Page 8: Recommender system

Evaluation Metrics for Recommendation

◦ Accuracy Metrics

◦ Mean Absolute Error (MAE)

◦ Root Mean Squared Error (RMSE)
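Both metrics compare each known rating r_ui in the probe set with the predicted rating r'_ui: MAE = (1/|E^P|) * sum of |r_ui - r'_ui|, and RMSE = sqrt((1/|E^P|) * sum of (r_ui - r'_ui)^2). A minimal sketch, assuming the actual and predicted ratings arrive as parallel arrays; AccuracyMetrics is a hypothetical name.

```java
/** Hypothetical accuracy metrics over the probe set E^P. */
final class AccuracyMetrics {
    private AccuracyMetrics() {}

    /** MAE: mean absolute difference between actual and predicted ratings. */
    static double mae(double[] actual, double[] predicted) {
        check(actual, predicted);
        double sum = 0;
        for (int k = 0; k < actual.length; k++) sum += Math.abs(actual[k] - predicted[k]);
        return sum / actual.length;
    }

    /** RMSE: square root of the mean squared difference; large errors weigh more than in MAE. */
    static double rmse(double[] actual, double[] predicted) {
        check(actual, predicted);
        double sum = 0;
        for (int k = 0; k < actual.length; k++) {
            double d = actual[k] - predicted[k];
            sum += d * d;
        }
        return Math.sqrt(sum / actual.length);
    }

    private static void check(double[] a, double[] p) {
        if (a.length == 0 || a.length != p.length) {
            throw new IllegalArgumentException("need equal-length, non-empty arrays");
        }
    }
}
```

With actual ratings {4, 3, 5} and predictions {3.5, 3, 3}, the errors are 0.5, 0 and 2: MAE is 2.5/3 while RMSE is sqrt(4.25/3), so the single large error raises RMSE more than MAE.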

Page 9: Recommender system

Evaluation Metrics for Recommendation

Page 10: Recommender system

Evaluation Metrics for Recommendation

◦ Precision is the proportion of top recommendations that are good.

◦ Recall is the proportion of good recommendations that appear in top recommendations.
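For one user, "top recommendations" is the first L items of the ranked list and "good" items are those the user actually has in the probe set. A minimal sketch, assuming distinct item IDs and a precision denominator of L; RankingMetrics is a hypothetical name.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical precision@L and recall@L for a single user. */
final class RankingMetrics {
    private RankingMetrics() {}

    /** Number of the first topL recommended items that are relevant (good). */
    private static int hits(List<Long> recommended, Set<Long> relevant, int topL) {
        int hits = 0;
        int n = Math.min(topL, recommended.size());
        for (int k = 0; k < n; k++) {
            if (relevant.contains(recommended.get(k))) hits++;
        }
        return hits;
    }

    /** Precision@L: share of the top-L recommendations that are good. */
    static double precisionAt(List<Long> recommended, Set<Long> relevant, int topL) {
        if (topL <= 0) throw new IllegalArgumentException("topL must be positive");
        return (double) hits(recommended, relevant, topL) / topL;
    }

    /** Recall@L: share of the good items that appear in the top-L recommendations. */
    static double recallAt(List<Long> recommended, Set<Long> relevant, int topL) {
        if (relevant.isEmpty()) return 0.0;
        return (double) hits(recommended, relevant, topL) / relevant.size();
    }
}
```

If the list is [1, 2, 3, 4, 5] and the good items are {2, 5, 9}, precision@5 is 2/5 and recall@5 is 2/3; shrinking L to 3 leaves one hit, so precision@3 is 1/3.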

Page 11: Recommender system

Evaluation Metrics for Recommendation

Page 12: Recommender system

Classifications of recommender systems

◦ Content-based recommendations

◦ Collaborative recommendations

    ◦ Memory-based collaborative filtering

        ◦ Standard similarity-based methods

        ◦ Methods employing social filtering

    ◦ Model-based collaborative filtering

        ◦ Dimensionality reduction methods

        ◦ Diffusion-based methods

◦ Hybrid approaches

Page 13: Recommender system

Similarity-based methods

◦ User-based recommender

for every other user w
    compute a similarity s between u and w
retain the top users, ranked by similarity, as a neighborhood n
for every item i that some user in n has a preference for,
        but that u has no preference for yet
    for every other user v in n that has a preference for i
        compute a similarity s between u and v
        incorporate v's preference for i, weighted by s, into a running average
return the top items, ranked by weighted average
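The user-based steps can be sketched in plain Java without Mahout. This is a minimal illustration, not Mahout's implementation: it assumes cosine similarity over preference vectors (missing preferences count as 0; the Mahout example uses Pearson correlation instead), caches each similarity rather than recomputing it, and skips neighbors with zero similarity. UserBasedSketch and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical user-based CF sketch: cosine similarity + similarity-weighted average. */
final class UserBasedSketch {
    private final Map<Long, Map<Long, Double>> prefs; // userID -> (itemID -> preference)

    UserBasedSketch(Map<Long, Map<Long, Double>> prefs) {
        this.prefs = prefs;
    }

    /** Cosine similarity of two users' preference vectors; a missing preference counts as 0. */
    double similarity(long u, long w) {
        Map<Long, Double> a = prefs.get(u), b = prefs.get(w);
        double dot = 0, na = 0, nb = 0;
        for (Map.Entry<Long, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other != null) dot += e.getValue() * other;
            na += e.getValue() * e.getValue();
        }
        for (double v : b.values()) nb += v * v;
        return (na == 0 || nb == 0) ? 0 : dot / Math.sqrt(na * nb);
    }

    /** Up to howMany item IDs for user u, ranked by weighted average, best first. */
    List<Long> recommend(long u, int neighborhoodSize, int howMany) {
        if (!prefs.containsKey(u)) return Collections.emptyList();
        // Similarity between u and every other user w; the most similar form neighborhood n.
        Map<Long, Double> sims = new HashMap<>();
        for (long w : prefs.keySet()) {
            if (w != u) sims.put(w, similarity(u, w));
        }
        List<Long> n = new ArrayList<>(sims.keySet());
        n.sort(Comparator.comparing((Long w) -> sims.get(w)).reversed());
        n = n.subList(0, Math.min(neighborhoodSize, n.size()));
        // Items some neighbor has a preference for but u does not: weighted running average.
        Map<Long, Double> mine = prefs.get(u);
        Map<Long, Double> weightedSum = new HashMap<>();
        Map<Long, Double> weightTotal = new HashMap<>();
        for (long v : n) {
            double s = sims.get(v);
            if (s <= 0) continue; // zero similarity: v shares no rated items with u
            for (Map.Entry<Long, Double> e : prefs.get(v).entrySet()) {
                if (mine.containsKey(e.getKey())) continue;
                weightedSum.merge(e.getKey(), s * e.getValue(), Double::sum);
                weightTotal.merge(e.getKey(), s, Double::sum);
            }
        }
        List<Long> items = new ArrayList<>(weightedSum.keySet());
        items.sort(Comparator.comparing((Long i) -> weightedSum.get(i) / weightTotal.get(i)).reversed());
        return items.subList(0, Math.min(howMany, items.size()));
    }
}
```

With user 1 {101: 5, 102: 3}, user 2 {101: 5, 102: 3, 103: 4} and user 3 {101: 1, 104: 5}, recommend(1, 2, 10) returns [104, 103]: item 104 ranks first on a single, weakly similar neighbor's 5.0, a known weakness of plain weighted averages on sparse data.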

Page 14: Recommender system

Similarity-based methods

◦ User-based recommender

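import java.io.File;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

public class UserBasedExample {
  public static void main(String[] args) throws Exception {
    // intro.csv holds one userID,itemID,preference triple per line
    DataModel model = new FileDataModel(new File("intro.csv"));
    // Pearson correlation between two users' preference values
    UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
    // Neighborhood: the 100 users most similar to the target user
    UserNeighborhood neighborhood =
        new NearestNUserNeighborhood(100, similarity, model);
    Recommender recommender =
        new GenericUserBasedRecommender(model, neighborhood, similarity);
    // Top 3 recommendations for user 1 (example user ID and count)
    for (RecommendedItem item : recommender.recommend(1, 3)) {
      System.out.println(item);
    }
  }
}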

Page 15: Recommender system

Similarity-based methods

◦ User-based recommender

• Data model, implemented via DataModel

• User-user similarity metric, implemented via UserSimilarity

• User neighborhood definition, implemented via UserNeighborhood

• Recommender engine, implemented via a Recommender (here, GenericUserBasedRecommender)

Page 16: Recommender system

Similarity-based methods

◦ Item-based recommender

for every item i that u has no preference for yet
    for every item j that u has a preference for
        compute a similarity s between i and j
        add u's preference for j, weighted by s, to a running average
return the top items, ranked by weighted average
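The item-based loop can likewise be sketched in plain Java. This is a minimal illustration, not Mahout's GenericItemBasedRecommender: it assumes cosine similarity between item columns (the Mahout example uses Pearson correlation) and drops items whose similarities to everything u has rated are all zero. ItemBasedSketch and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical item-based CF sketch: cosine item-item similarity + weighted average. */
final class ItemBasedSketch {
    private final Map<Long, Map<Long, Double>> byUser = new HashMap<>(); // user -> (item -> pref)
    private final Map<Long, Map<Long, Double>> byItem = new HashMap<>(); // item -> (user -> pref)

    void add(long user, long item, double pref) {
        byUser.computeIfAbsent(user, k -> new HashMap<>()).put(item, pref);
        byItem.computeIfAbsent(item, k -> new HashMap<>()).put(user, pref);
    }

    /** Cosine similarity of two items' preference columns; a missing preference counts as 0. */
    double similarity(long i, long j) {
        Map<Long, Double> a = byItem.get(i), b = byItem.get(j);
        double dot = 0, na = 0, nb = 0;
        for (Map.Entry<Long, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other != null) dot += e.getValue() * other;
            na += e.getValue() * e.getValue();
        }
        for (double v : b.values()) nb += v * v;
        return (na == 0 || nb == 0) ? 0 : dot / Math.sqrt(na * nb);
    }

    /** Up to howMany item IDs for user u, ranked by weighted average, best first. */
    List<Long> recommend(long u, int howMany) {
        Map<Long, Double> mine = byUser.getOrDefault(u, new HashMap<>());
        Map<Long, Double> score = new HashMap<>();
        for (long i : byItem.keySet()) {                         // every item i u has no preference for yet
            if (mine.containsKey(i)) continue;
            double sum = 0, weights = 0;
            for (Map.Entry<Long, Double> e : mine.entrySet()) {  // every item j u has a preference for
                double s = similarity(i, e.getKey());
                sum += s * e.getValue();                         // u's preference for j, weighted by s
                weights += s;
            }
            if (weights > 0) score.put(i, sum / weights);
        }
        List<Long> items = new ArrayList<>(score.keySet());
        items.sort(Comparator.comparing((Long i) -> score.get(i)).reversed());
        return items.subList(0, Math.min(howMany, items.size()));
    }
}
```

On the same three users as in the user-based case (user 1 {101: 5, 102: 3}, user 2 {101: 5, 102: 3, 103: 4}, user 3 {101: 1, 104: 5}), recommend(1, 10) returns [104, 103]. Unlike the user-based version, no neighborhood is needed, which is why Mahout's item-based recommender takes only a data model and an item similarity.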

Page 17: Recommender system

Similarity-based methods

◦ Item-based recommender

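import java.io.File;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.ItemSimilarity;

public class ItemBasedExample {
  public static void main(String[] args) throws Exception {
    DataModel model = new FileDataModel(new File("intro.csv"));
    // Pearson correlation between two items' preference values
    ItemSimilarity similarity = new PearsonCorrelationSimilarity(model);
    // Item-based recommenders need no user neighborhood
    Recommender recommender =
        new GenericItemBasedRecommender(model, similarity);
    for (RecommendedItem item : recommender.recommend(1, 3)) {
      System.out.println(item);
    }
  }
}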

Page 18: Recommender system

Summary of available recommender implementations in Mahout

Page 19: Recommender system

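CNTV 5+ VIP Recommendation

passport_260676's preference

◦ (First half 11:00) 9-Atlético Madrid-Radamel Falcao scores a goal | lfp | 3.0
◦ (1st quarter 08:59) 6-EAST-LeBron James dunk | nba | 5.0
◦ MV: 即刻出发 ("Set Off Now"), sung by Jike Junyi | nba | 3.0
◦ (2nd quarter 11:00) 24-EAST-Paul George dunk | nba | 5.0

userBasedBooleanPref

◦ (4th quarter 00:47) 32-WEST-Blake Griffin dunk | nba | 20.860504
◦ (2nd quarter 02:33) 32-WEST-Blake Griffin dunks off a pass from 24-WEST-Kobe Bryant | nba | 17.332127
◦ wings | nba | 9.839406
◦ (First half 22:00) 7-Real Madrid-Cristiano Ronaldo own goal | lfp | 8.962188
◦ Tony Parker shows off his Chinese live | nba | 7.3381634
◦ Evans gets creative again: mid-air hand switch + leap over a poster | nba | 7.2042103
◦ (Second half 58:00) 10-Barcelona-Messi scores a goal | lfp | 7.201148
◦ Singer NE-YO sings and dances, leading the East All-Stars onto the court | nba | 7.151176
◦ Ross recreates the "Dunk of the Century" + tribute to Vince Carter | nba | 7.0464416
◦ (3rd quarter 01:25) 34-Nuggets-JaVale McGee dunk | nba | 6.4302483

http://172.16.0.237:10008/recommend/userID/260676/howMany/10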
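The userBasedBooleanPref scores (20.86, 17.33, ...) lie far above the 3.0 to 5.0 preference values, so they are not predicted ratings. One plausible reading, assumed here and not confirmed by the slides, is a boolean-preference user-based recommender in the style of Mahout's GenericBooleanPrefUserBasedRecommender, which ignores preference values and scores an item by summing the similarities of neighbors who have it. A minimal sketch under that assumption, using Tanimoto (Jaccard) similarity between users' item sets; BooleanPrefSketch and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical boolean-preference scorer: score(i) = sum of sim(u, v) over neighbors v who have i. */
final class BooleanPrefSketch {
    private final Map<Long, Set<Long>> items = new HashMap<>(); // user -> items seen (no values)

    void add(long user, long item) {
        items.computeIfAbsent(user, k -> new HashSet<>()).add(item);
    }

    /** Tanimoto coefficient: size of the intersection over size of the union of two item sets. */
    double tanimoto(long u, long v) {
        Set<Long> a = items.getOrDefault(u, new HashSet<>());
        Set<Long> b = items.getOrDefault(v, new HashSet<>());
        Set<Long> common = new HashSet<>(a);
        common.retainAll(b);
        int union = a.size() + b.size() - common.size();
        return union == 0 ? 0 : (double) common.size() / union;
    }

    /** Scores for items u has not seen, from the neighborhoodSize most similar users. */
    Map<Long, Double> scores(long u, int neighborhoodSize) {
        List<Long> others = new ArrayList<>(items.keySet());
        others.remove(Long.valueOf(u)); // remove by value, not by index
        others.sort(Comparator.comparing((Long v) -> tanimoto(u, v)).reversed());
        Set<Long> mine = items.getOrDefault(u, new HashSet<>());
        Map<Long, Double> score = new HashMap<>();
        for (Long v : others.subList(0, Math.min(neighborhoodSize, others.size()))) {
            double s = tanimoto(u, v);
            if (s <= 0) continue; // no shared items: no evidence
            for (Long i : items.get(v)) {
                if (!mine.contains(i)) score.merge(i, s, Double::sum); // summed, not averaged
            }
        }
        return score;
    }
}
```

Because each score is a sum of similarities, an item seen by many similar users can exceed 1, and with a large neighborhood it could plausibly reach values like those listed above.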

Page 23: Recommender system

Q&A

Page 24: Recommender system

Architecture of NeuRecommendation

[Diagram: IMS etc., Dispatcher, two Recommenders, Data Feeder]

◦ IMS etc. → Dispatcher: request for recommendation

◦ Dispatcher → Recommenders: dispatch request using round robin

◦ Data Feeder: fetching users’ preferences
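The Dispatcher's round-robin step can be sketched as follows, assuming the Recommender instances are interchangeable (any of them can serve any user). The slides do not show the Dispatcher's code; RoundRobinDispatcher and next() are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical round-robin dispatcher: request k goes to backend k mod N. */
final class RoundRobinDispatcher<B> {
    private final List<B> backends;
    private final AtomicInteger counter = new AtomicInteger();

    RoundRobinDispatcher(List<B> backends) {
        if (backends.isEmpty()) throw new IllegalArgumentException("need at least one backend");
        this.backends = new ArrayList<>(backends); // defensive copy
    }

    /** Thread-safe: each call takes a distinct ticket; floorMod stays non-negative after int overflow. */
    B next() {
        int ticket = counter.getAndIncrement();
        return backends.get(Math.floorMod(ticket, backends.size()));
    }
}
```

With two recommenders, successive calls to next() return recommender 1, recommender 2, recommender 1, and so on. An AtomicInteger keeps the rotation correct when many request threads call the Dispatcher concurrently, without a lock.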

Page 25: Recommender system

Architecture of NeuRecommendation

[Diagram: Recommender internals, with Data Store, Mahout, and RPC]

Recommender:

1. Serve recommendation requests

2. Fetch users’ preferences

Page 26: Recommender system

END