Transcript

Beer Recommender Systems

Hsiang-Hsuan Hung (HHH) [email protected]

https://github.com/HsiangHung/BI-analysis/Recommender

http://www.hsianghung.tech

Beer Recommendation

"I like Budweiser!!!"

"I don't like Budweiser"

[Figure: two drinkers, with the numbers 5, 1, 3 and 1, 3.5, 5 shown beside them; the beers to recommend to each are marked "?".]

• Supervised learning, regression problems.

• Central concept: Similarity

1.) item-item recommendation (Amazon)

2.) user-item recommendation

Collaborative filtering (Netflix, Spotify)

Personalized Recommendation

Recommender Pipeline

e-commerce data, rating data

web server

browse

reco

Hybrid recommender

Recommendation Engine Dashboard

item-item reco user-item reco

When Mr. Simpsons (id=7) is browsing beer-144:


Collaborative Filtering (CF)

Beer Vectors: rating table (DB: BeHoppy)

[Rating table: rows are users 7, 33, 34, 42, 45, 47, 48; columns are beers 5 to 12; "-" marks a beer the user has not rated.]

b8 = (1, -, 5, -, 1, -, 4)

b9 = (5, -, 1, -, 5, 4, -)

b10 = (5, 1, 2, -, 5, 4, 1)

Beer Vector Space and Cosine Similarity

$b \in \mathbb{R}^{\text{num of users}}$

$$s_{A,B} = \frac{b_A \cdot b_B}{|b_A|\,|b_B|}$$

[Figure: beer vectors b1, b2, b3 and b4 drawn in this space; the smaller the angle between two vectors, the more similar the two beers.]
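A minimal sketch of the cosine similarity between beer vectors, using b8, b9 and b10 from the rating table. The slides do not say how missing ratings are treated; this sketch assumes they are skipped, so only users who rated both beers contribute. The function name `cosine_similarity` and the use of `None` for a missing rating are choices made here for illustration.

```python
import math

# Beer vectors over users (7, 33, 34, 42, 45, 47, 48); None marks a missing rating.
b8 = [1, None, 5, None, 1, None, 4]
b9 = [5, None, 1, None, 5, 4, None]
b10 = [5, 1, 2, None, 5, 4, 1]


def cosine_similarity(a, b):
    """Cosine similarity over the positions where both vectors have a rating.

    Skipping missing entries is an assumption of this sketch.
    """
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        return 0.0
    dot = sum(x * y for x, y in pairs)
    norm_a = math.sqrt(sum(x * x for x, _ in pairs))
    norm_b = math.sqrt(sum(y * y for _, y in pairs))
    return dot / (norm_a * norm_b)


s_9_10 = cosine_similarity(b9, b10)
s_8_10 = cosine_similarity(b8, b10)
print(f"s(9,10) = {s_9_10:.3f}, s(8,10) = {s_8_10:.3f}")
```

With these vectors, beer 9 comes out more similar to beer 10 than beer 8 does, because the users who rated beer 9 highly also rated beer 10 highly.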

When Mr. Simpsons (id=7) is browsing beer-144:

user-item reco

Recommendation Engine Dashboard

item-item reco

Neighborhood Models

Model A: item-based, $b \in \mathbb{R}^m$

$r_{u,2} = 3$, $r_{u,3} = 4.5$, $r_{u,4} = 1$, $r_{u,5} = 4$, $\hat{R}_{u,1} = ?$

Model B: user-based, $u \in \mathbb{R}^n$

$r_{H,x} = 1$, $r_{M,x} = 4.5$, $r_{T,x} = 5$, $\hat{R}_{S,x} = ?$

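Model C: Latent-Factor Model (Easily Scale Up)

[Figure: the m × n rating matrix (m users, n beers; shown for users 7, 33, 34, 42, 45, 47, 48 and beers 5 to 11) is approximated by the product of an m × f matrix of user preferences, with rows $u_S$, $u_H$, $u_K$, ..., and an f × n matrix of beer features, with columns $b_1$, $b_2$, $b_3$, ... Source: Koren, Bell and Volinsky, Computer (2009).]

Prediction: $r_{u,i} \simeq u_u^T b_i$

User-Beer Vector Space

$u, b \in \mathbb{R}^f$, where f is the number of latent factors

$\hat{R}_{S,1} = u_S^T b_1$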

Models Performance

$$\mathrm{MAE} = \frac{1}{N} \sum_{u,i} \left| \hat{R}_{u,i} - r_{u,i} \right|$$

k = 10-20
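As a small illustration of the MAE metric, the sketch below averages the absolute errors over N held-out ratings. The function name `mean_absolute_error` and the sample numbers are hypothetical, not taken from the talk.

```python
def mean_absolute_error(predicted, actual):
    """MAE = (1/N) * sum of |R_hat - r| over N held-out (user, beer) pairs."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(actual)


# Hypothetical held-out ratings and the corresponding model predictions.
actual = [5.0, 3.0, 1.0, 4.0]
predicted = [4.5, 3.5, 2.0, 4.0]
mae = mean_absolute_error(predicted, actual)
print(mae)
```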

Challenges

• Cold Start (need more ratings).

• Integrate implicit data: e-commerce data.

• Define confidence for each customer.

Explicit rating: $r_{u,b} = 1$ to $5$

Implicit purchase frequency: $r_{u,b} \in I$ (Hu, Koren and Volinsky, 2008)

My Background

• Physics PhD @ UCSD (2011)

• Computational Physicist @ UT Austin and UIUC (2012-2015)

• Computational physics and materials science

• Data Engineering Fellow @ Insight (2016)

• Data Scientist/Engineer @ Anheuser-Busch (2016)

Thank you!

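Model A: Item-based Neighborhood

$b \in \mathbb{R}^m$

$r_{u,2} = 3$, $r_{u,3} = 4.5$, $r_{u,4} = 1$, $r_{u,5} = 4$, $\hat{R}_{u,1} = ?$

kNN + weight:

$$\hat{R}_{u,i} = \frac{\sum_{j \in S_k} s_{ij}\, r_{u,j}}{\sum_{j \in S_k} |s_{ij}|}$$

where the similarities $s_{ij}$ act as the weights (Sarwar, Karypis, Konstan, and Riedl, 2001).

$$\hat{R}_{u,1} = \frac{0.8 \times 4.5 + 0.7 \times 4 + 0.2 \times 3}{0.8 + 0.7 + 0.2} \approx 4.12$$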
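The item-based prediction can be sketched in a few lines of Python. This reproduces the slide's numbers under two assumptions: the neighborhood $S_k$ holds the k beers with the largest |similarity|, and k = 3, which is what the three weights 0.8, 0.7 and 0.2 imply. The similarity 0.1 given to beer 4 is hypothetical, and `predict_weighted` is a name chosen here.

```python
def predict_weighted(neighbors, k):
    """Similarity-weighted average over the k neighbors with the largest |similarity|.

    `neighbors` is a list of (similarity, rating) pairs.
    """
    top_k = sorted(neighbors, key=lambda pair: abs(pair[0]), reverse=True)[:k]
    denominator = sum(abs(s) for s, _ in top_k)
    if denominator == 0:
        raise ValueError("no neighbor with non-zero similarity")
    return sum(s * r for s, r in top_k) / denominator


# (similarity to beer 1, rating by user u) for beers 3, 5, 2 and 4.
# The 0.1 for beer 4 is a made-up value; it is dropped when k = 3.
neighbors = [(0.8, 4.5), (0.7, 4.0), (0.2, 3.0), (0.1, 1.0)]
r_hat = predict_weighted(neighbors, k=3)
print(round(r_hat, 2))
```

The result is about 4.12: the two most similar beers were rated highly, so they dominate the estimate.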

Ratings as Features of User Vectors

[Rating table: rows are users 7, 33, 34, 42, 45, 47, 48; columns are beers 5 to 12; "-" marks a beer the user has not rated.]

u7 = (5, 4, -, 1, 5, 5, 4, -)

u34 = (1, 1, -, 5, 1, 2, 2, -)

u45 = (4, 5, -, 1, 5, 5, 3, 1)

$s_{45,7} > s_{45,34}$

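Model B: User-based Neighborhood

$u \in \mathbb{R}^n$

Find similar users:

$r_{H,x} = 1$, $r_{M,x} = 4.5$, $r_{T,x} = 5$, $\hat{R}_{S,x} = ?$

$$\hat{R}_{u,x} = \frac{\sum_{v \in S_k} s_{uv}\, r_{v,x}}{\sum_{v \in S_k} |s_{uv}|}$$

where the similarities $s_{uv}$ act as the weights (Herlocker, Konstan, Borchers and Riedl, 1999).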
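A rough end-to-end sketch of the user-based model: compute user-user cosine similarities, keep the k most similar users who rated beer x, and take the weighted average. The ratings of beer "x" by H, M and T are the ones on the slide; the beers "a", "b", "c" and all ratings of them are invented so that similarities can be computed. Skipping beers that are not co-rated is also an assumption.

```python
import math

# Hypothetical ratings: beers "a", "b", "c" are made up; the ratings of beer "x"
# by H, M and T are the slide's values (1, 4.5 and 5).
ratings = {
    "S": {"a": 5, "b": 4, "c": 1},
    "H": {"a": 1, "b": 2, "c": 5, "x": 1},
    "M": {"a": 5, "b": 5, "c": 1, "x": 4.5},
    "T": {"a": 4, "b": 4, "c": 2, "x": 5},
}


def cosine(u, v):
    """Cosine similarity over the beers rated by both users."""
    common = [beer for beer in u if beer in v]
    if not common:
        return 0.0
    dot = sum(u[beer] * v[beer] for beer in common)
    norm_u = math.sqrt(sum(u[beer] ** 2 for beer in common))
    norm_v = math.sqrt(sum(v[beer] ** 2 for beer in common))
    return dot / (norm_u * norm_v)


def predict_user_based(target, beer, k):
    """Predict target's rating of `beer` from the k most similar users who rated it."""
    candidates = [
        (cosine(ratings[target], other), other[beer])
        for name, other in ratings.items()
        if name != target and beer in other
    ]
    top_k = sorted(candidates, key=lambda pair: abs(pair[0]), reverse=True)[:k]
    denominator = sum(abs(s) for s, _ in top_k)
    if denominator == 0:
        raise ValueError("no similar user has rated this beer")
    return sum(s * r for s, r in top_k) / denominator


r_hat = predict_user_based("S", "x", k=2)
print(round(r_hat, 2))
```

With k = 2 the neighbors are M and T, whose tastes match S on the made-up beers, so the prediction lands between their ratings of 4.5 and 5 and H's low rating is ignored.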

What is the ALS?

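More Detail: Normal Equations

Rating matrix ≈ (user preferences) × (beer features): $R = U B^T$

[Figure: the rating matrix with users 7, 33, 34, 42, 45, 47, 48 as rows and beers 5 to 11 as columns, factorized into user vectors $u_S$, $u_H$, $u_K$, ... and beer vectors $b_1$, $b_2$, $b_3$, ...; the row $r_S$ holds the ratings of user S and the column $r_i$ holds the ratings of beer i.]

$$u_S = \begin{pmatrix} u_{S,1} \\ u_{S,2} \\ \vdots \\ u_{S,f} \end{pmatrix} = \left( B^T B + \lambda I \right)^{-1} B^T \begin{pmatrix} r_{S,1} \\ r_{S,2} \\ \vdots \\ r_{S,n} \end{pmatrix}$$

$$b_i = \begin{pmatrix} b_{i,1} \\ b_{i,2} \\ \vdots \\ b_{i,f} \end{pmatrix} = \left( U^T U + \lambda I \right)^{-1} U^T \begin{pmatrix} r_{1,i} \\ r_{2,i} \\ \vdots \\ r_{m,i} \end{pmatrix}$$

(Hu, Koren and Volinsky, 2008)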
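To make the normal equations concrete, here is a dependency-free sketch that solves $(B^T B + \lambda I)\, u_S = B^T r_S$ for a single user. The helper names `solve_linear_system` and `solve_user_vector`, the toy beer vectors and the value of `lam` are all assumptions made for illustration; in practice a linear-algebra library or Spark's ALS would do this step. The sketch also assumes that only beers the user has actually rated are passed in.

```python
def solve_linear_system(a, y):
    """Solve a x = y by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Augmented matrix [a | y], copied so the inputs are left untouched.
    m = [row[:] + [y[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def solve_user_vector(beer_vectors, user_ratings, lam):
    """Return u_S = (B^T B + lam I)^{-1} B^T r_S for one user.

    beer_vectors: rows of B, one length-f vector per rated beer.
    user_ratings: the user's ratings r_S, in the same order.
    """
    f = len(beer_vectors[0])
    # Gram matrix B^T B with the ridge term lam added on the diagonal.
    gram = [
        [sum(b[p] * b[q] for b in beer_vectors) + (lam if p == q else 0.0)
         for q in range(f)]
        for p in range(f)
    ]
    rhs = [sum(b[p] * r for b, r in zip(beer_vectors, user_ratings)) for p in range(f)]
    return solve_linear_system(gram, rhs)


# Toy example with f = 2 latent factors and three rated beers (made-up numbers).
beer_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
r_s = [5.0, 1.0, 6.0]
u_s = solve_user_vector(beer_vectors, r_s, lam=0.1)
print([round(value, 3) for value in u_s])
```

The update for a beer vector $b_i$ has the same form with the roles of users and beers swapped, so the same function can be reused with user vectors as the rows.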

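Implementing ALS in Spark

Solving Matrix-Factorization LR

• Alternating least squares (ALS)

$$\min_{u,b,\xi} \sum_{(u,i):\, r_{u,i} \neq 0} \left( r_{u,i} - u_u^T b_i - \xi_{u,i} \right)^2 + \lambda \left( \sum_u |u_u|^2 + \sum_i |b_i|^2 \right)$$

The term multiplied by $\lambda$ is the regularization.

• ALS: at each step, fix one variable and solve the minimization for the other: fix u, solve b; fix b, solve u; fix u, solve b; and so on.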
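The alternating scheme can be illustrated with a deliberately tiny case. The sketch below assumes a single latent factor (f = 1), so each ridge solve reduces to a scalar division, and it leaves out the bias term $\xi_{u,i}$; the ratings, `lam` and the iteration count are made up. It is not the Spark implementation, only a picture of "fix one side, solve the other".

```python
# Observed ratings as (user, beer) -> rating; hypothetical toy data.
ratings = {
    (0, 0): 5.0, (0, 1): 4.0, (0, 2): 1.0,
    (1, 0): 4.0, (1, 2): 1.0,
    (2, 1): 1.0, (2, 2): 5.0,
}
n_users, n_beers, lam = 3, 3, 0.1


def objective(u, b):
    """Squared error on observed ratings plus lam * (|u|^2 + |b|^2)."""
    error = sum((r - u[x] * b[i]) ** 2 for (x, i), r in ratings.items())
    return error + lam * (sum(v * v for v in u) + sum(v * v for v in b))


def solve_users(b):
    """Fix the beer factors b and solve the ridge problem for each user factor."""
    u = []
    for x in range(n_users):
        rated = [(i, r) for (y, i), r in ratings.items() if y == x]
        numerator = sum(b[i] * r for i, r in rated)
        denominator = sum(b[i] ** 2 for i, _ in rated) + lam
        u.append(numerator / denominator)
    return u


def solve_beers(u):
    """Fix the user factors u and solve the ridge problem for each beer factor."""
    b = []
    for i in range(n_beers):
        rated = [(x, r) for (x, j), r in ratings.items() if j == i]
        numerator = sum(u[x] * r for x, r in rated)
        denominator = sum(u[x] ** 2 for x, _ in rated) + lam
        b.append(numerator / denominator)
    return b


u = [1.0] * n_users
b = [1.0] * n_beers
history = [objective(u, b)]
for _ in range(10):
    b = solve_beers(u)  # fix u, solve b
    u = solve_users(b)  # fix b, solve u
    history.append(objective(u, b))

print([round(value, 3) for value in history])
```

Each half-step minimizes the objective exactly over one set of factors, so the recorded objective never increases from one iteration to the next.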

Grid Search Using Cross-Validation

$\lambda \left( |u|^2 + |b|^2 \right)$

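CF Using Implicit Data

• Logistic regression + confidence weight

$$\min_{u,b,\xi} \sum_{(u,i)} c_{u,i} \left( p_{u,i} - u_u^T b_i - \xi_{u,i} \right)^2 + \lambda \left( \sum_u |u_u|^2 + \sum_i |b_i|^2 \right)$$

Annotated terms: confidence ($c_{u,i}$), user-item interaction, bias ($\xi_{u,i}$), regularization (the $\lambda$ term). (Hu, Koren and Volinsky, 2008)

$p_{u,i} = 0/1$

$c_{u,i} = 1 + \alpha\, r_{u,i}$

$c_{u,i} = 1 + \alpha \log\left(1 + r_{u,i}/\epsilon\right)$

$c_{u,i} = 1 + \alpha \log\left(1 + r_{u,i}/\epsilon\right) + \beta\, r_{u,i}$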
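A short sketch of the preference and confidence definitions, written as plain functions. The rule "p = 1 when the purchase count is positive" follows Hu, Koren and Volinsky (2008); the function names, the coefficient name `beta` for the extra linear term, and the sample values of `alpha`, `eps` and `beta` are assumptions for illustration.

```python
import math


def preference(r):
    """Binary preference p: 1 if the user bought the beer at least once, else 0."""
    return 1 if r > 0 else 0


def confidence_linear(r, alpha):
    """c = 1 + alpha * r."""
    return 1.0 + alpha * r


def confidence_log(r, alpha, eps):
    """c = 1 + alpha * log(1 + r / eps)."""
    return 1.0 + alpha * math.log(1.0 + r / eps)


def confidence_log_plus_linear(r, alpha, eps, beta):
    """c = 1 + alpha * log(1 + r / eps) + beta * r."""
    return confidence_log(r, alpha, eps) + beta * r


# Hypothetical purchase counts and hyperparameters.
alpha, eps, beta = 40.0, 1.0, 0.5
for purchases in (0, 1, 10):
    print(
        purchases,
        preference(purchases),
        confidence_linear(purchases, alpha),
        round(confidence_log(purchases, alpha, eps), 2),
        round(confidence_log_plus_linear(purchases, alpha, eps, beta), 2),
    )
```

In all three variants an unobserved pair keeps the minimum confidence of 1, and the logarithmic forms grow more slowly than the linear one for frequent buyers.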

Implicit Data CF Performance

• Metric: percentile-ranking

$$\overline{\mathrm{rank}} = \frac{\sum_{u,i} r_{u,i}\, \mathrm{rank}_{u,i}}{\sum_{u,i} r_{u,i}}$$

• Random: rank = 50%

• CF (f=20):

Baseline: rank ~ 29%

rank ~ 16%
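The percentile-ranking metric can be sketched as follows, assuming the definition in Hu, Koren and Volinsky (2008): $\mathrm{rank}_{u,i}$ is the percentile position of beer i in user u's recommendation list (0% for the top recommendation, 100% for the last), weighted by the observed $r_{u,i}$. Lower is better. The function name and the toy data are hypothetical.

```python
def expected_percentile_rank(scores, observed):
    """Weighted mean percentile rank.

    scores: {user: {beer: predicted score}}, higher score = stronger recommendation.
    observed: {(user, beer): r_ui}, e.g. purchase counts in a held-out period.
    """
    weighted, total = 0.0, 0.0
    for user, beer_scores in scores.items():
        ordered = sorted(beer_scores, key=beer_scores.get, reverse=True)
        n = len(ordered)
        for position, beer in enumerate(ordered):
            r = observed.get((user, beer), 0.0)
            percentile = position / (n - 1) if n > 1 else 0.0
            weighted += r * percentile
            total += r
    if total == 0:
        raise ValueError("no observed interactions to evaluate")
    return weighted / total


# One hypothetical user with five scored beers; the top beer was bought three
# times and the bottom beer once in the held-out period.
scores = {"u1": {"a": 0.9, "b": 0.7, "c": 0.5, "d": 0.3, "e": 0.1}}
observed = {("u1", "a"): 3, ("u1", "e"): 1}
rank_bar = expected_percentile_rank(scores, observed)
print(f"{rank_bar:.0%}")
```

Here the metric is 25%: three of the four purchases sit at the very top of the list (0%) and one at the very bottom (100%).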

