Introduction to Core Science Models
Yahoo! Labs, 2011/19/11

TRANSCRIPT

Page 1: Introduction to core science models

Introduction to Core Science Models
Yahoo! Labs

2011/19/11

Page 2: Introduction to core science models

Agenda

Basic Counting Models: EMP

Feature Based Models: OLR

RLFM: Feature Model + Collaborative Filtering

Bonus: Tutorial on Collaborative Filtering

Note:

› Will focus on the science framework

› Will not focus on the optimization problem

Page 3: Introduction to core science models

EMP + OLR:

Basic Counting Models: EMP

› Simple CTR model based on counting clicks/views

Feature Based Models: OLR

RLFM: Feature Model + Collaborative Filtering

Bonus: Tutorial on Collaborative Filtering

Page 4: Introduction to core science models

Today Module on Yahoo FP:

Page 5: Introduction to core science models

Counting Models: CTR

Estimate CTR for each article independently

CTR = Click-Thru-Rate = Total Clicks / Total Views

Online Model: Update every 5 mins:

$$\mathrm{CTR}_t = \frac{C_t + C_{t-1} + \cdots + C_1}{V_t + V_{t-1} + \cdots + V_1} = \frac{\sum_{s \le t} C_s}{\sum_{s \le t} V_s}$$

$C_t$ = clicks during period “t”, $V_t$ = views during period “t”
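A minimal sketch of this counting model in Python (the class and batch interface are illustrative, not the production system):

```python
from collections import defaultdict

class CTRCounter:
    """Cumulative clicks/views per article; CTR = total clicks / total views."""
    def __init__(self):
        self.clicks = defaultdict(float)
        self.views = defaultdict(float)

    def update(self, article_id, batch_clicks, batch_views):
        # Called once per period (e.g. every 5 minutes) with that period's counts
        self.clicks[article_id] += batch_clicks
        self.views[article_id] += batch_views

    def ctr(self, article_id):
        v = self.views[article_id]
        return self.clicks[article_id] / v if v > 0 else 0.0
```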

Page 6: Introduction to core science models

CTR Curves for Two Days

Traffic obtained from a controlled randomized experiment. Things to note: (a) short lifetimes, (b) temporal effects, (c) often breaking news stories

Each curve is the CTR of an item in the Today Module over time

Page 7: Introduction to core science models

Counting Models: Most Popular

EMP: Estimated Most Popular (aka GMP):
› Decay = forget about old clicks and views (γ ≈ 0.95-0.99)

Segmented Most Popular:
› Separate model for each segment of the population

$$\mathrm{CTR}^{EMP}_t = \frac{C_t + \gamma\, C_{t-1} + \gamma^2 C_{t-2} + \cdots}{V_t + \gamma\, V_{t-1} + \gamma^2 V_{t-2} + \cdots}$$

$$\mathrm{CTR}^{EMP}_{Male,t} = \frac{C^{Male}_t + \gamma\, C^{Male}_{t-1} + \gamma^2 C^{Male}_{t-2} + \cdots}{V^{Male}_t + \gamma\, V^{Male}_{t-1} + \gamma^2 V^{Male}_{t-2} + \cdots}$$
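A sketch of the decayed update, assuming counts arrive in fixed periods; the segmented variant simply keeps one such counter per population segment:

```python
class EMPCounter:
    """Exponentially decayed click/view counts; one instance per article (or per segment)."""
    def __init__(self, gamma=0.99):          # gamma roughly 0.95-0.99 per the slides
        self.gamma = gamma
        self.clicks = 0.0
        self.views = 0.0

    def update(self, batch_clicks, batch_views):
        # Old counts shrink by gamma each period, so old clicks/views are gradually forgotten
        self.clicks = self.gamma * self.clicks + batch_clicks
        self.views = self.gamma * self.views + batch_views

    def ctr(self):
        return self.clicks / self.views if self.views > 0 else 0.0
```

With gamma = 1 this reduces to the plain counting model of the previous slide.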

Page 8: Introduction to core science models

Tracking behavior of Estimated Most Popular model

Low click-rate articles: more temporal smoothing

Page 9: Introduction to core science models

OLR: Online Logistic Regression

Basic Counting Models: EMP

Feature Based Models: OLR

› Motivation for using regression

› Logistic Regression framework

› Online Logistic Regression: general case

› Per item-OLR Use Case: Today Module

› Improving Model

RLFM: Feature Model + Collaborative Filtering

Affinity Models: Log Odds

Bonus: Tutorial on Collaborative Filtering

Page 10: Introduction to core science models

Motivation for using Regression:

Logistic Regression:
› Natural framework to include more features: Age, Gender, Location, User Interests, …
› $X_{k,u}$ = value of feature k for user u, e.g. the age of the user
› $W_k$ = weight parameter to be learned for each feature

• EMP breaks down if the segment is too small, e.g. 40-year-old Males in New York:

$$\mathrm{CTR}^{EMP}_{Male\_40\_NY} = \frac{C^{Male\_40\_NY}_t + \gamma\, C^{Male\_40\_NY}_{t-1} + \cdots}{V^{Male\_40\_NY}_t + \gamma\, V^{Male\_40\_NY}_{t-1} + \cdots}$$

• Logistic Regression instead learns one weight per feature:

$$\log\!\left(\frac{P_{click}}{1 - P_{click}}\right) = b + \sum_{k \in \{features\}} W_k \cdot X_{k,u}$$

Page 11: Introduction to core science models

Linear Regression: One Dimension

$$SSE = \sum_{i \in \{examples\}} (Y_i - a \cdot X_i - b)^2$$

• Find the values of “a” and “b” that minimize the Sum of Squared Errors (SSE)

• Take the derivative of SSE with respect to “a” and “b” and set it equal to 0

[Figure: Linear Fit Y = a * X + b, with X = Height and Y = Weight; the gap between a data point and the fit line is labeled ERROR.]
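Setting those derivatives to zero gives the usual closed form; a small illustrative check (the data values below are made up):

```python
def fit_line(xs, ys):
    """Closed-form least squares for Y = a * X + b (derivatives of SSE set to 0)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    b = my - a * mx
    return a, b

# e.g. heights (X) vs weights (Y)
a, b = fit_line([60, 70, 80, 90, 100], [100, 115, 130, 148, 160])
```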

Page 12: Introduction to core science models

Can’t Apply Linear Model to Click Prediction

For example: Probability of Click for an article on Retirement as a function of Age

[Figure: “Linear Doesn't Represent the Data Well”: Probability of Click vs Age (0-100), data points with a linear model overlaid.]

Page 13: Introduction to core science models

Logistic Model for Click Prediction: Probability of Click for an article on Retirement as a function of Age

$$P(Click) = \frac{1}{1 + \exp(-(a \cdot Age + b))}$$

[Figure: “Logistic Model is much better”: Probability of Click vs Age, data points with a logistic model overlaid.]

Page 14: Introduction to core science models

Logistic Regression: One Dimension

• How to find the parameters “a” and “b” from many training examples $(Y_i, Age_i)$:

$$P(Y_i) = \frac{1}{1 + \exp(-Y_i \cdot (a \cdot Age_i + b))}$$

• $Y_i = +1 \Rightarrow$ P(Y_i = +1) = probability the user Clicked on the article
• $Y_i = -1 \Rightarrow$ P(Y_i = -1) = probability the user Didn’t Click

• Maximize the Product of Probabilities (the Likelihood):

$$Likelihood = P(Y_1) \cdot P(Y_2) \cdot P(Y_3) \cdots P(Y_n)$$

• “Hard” to solve
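A sketch of maximizing this likelihood numerically, via gradient ascent on the log-likelihood (the learning rate and step count are arbitrary illustrative choices):

```python
import math

def sigmoid(t):
    # numerically safe logistic function
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def fit_logistic(examples, lr=1e-4, steps=10000):
    """examples: list of (Y_i, Age_i) with Y_i in {+1, -1}; returns (a, b)."""
    a = b = 0.0
    for _ in range(steps):
        ga = gb = 0.0
        for y, age in examples:
            # gradient of log P(Y_i) = -log(1 + exp(-Y_i * (a*Age_i + b)))
            g = y * (1.0 - sigmoid(y * (a * age + b)))
            ga += g * age
            gb += g
        a += lr * ga
        b += lr * gb
    return a, b
```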

Page 15: Introduction to core science models

Optimize the Logistic Likelihood for 4 Data Points:

$$Likelihood(1..4) = P(Y_1) \cdot P(Y_2) \cdot P(Y_3) \cdot P(Y_4) \qquad P(Y_i) = \frac{1}{1 + \exp(-Y_i \cdot (a \cdot Age_i + b))}$$

[Figure: the Likelihood and the individual Prob(Y_i) curves as functions of the parameter “a”.]

For simplicity: I assume that I know the value of “b”

Page 16: Introduction to core science models

Optimize Logistic Likelihood for 40 Data Points:

$$Likelihood(1..40) = P(Y_1) \cdot P(Y_2) \cdots P(Y_{40})$$

[Figure: the rescaled Likelihood as a function of parameter “a”, for 4 data points vs 40 data points.]

For simplicity: I assume that I know the value of “b”

Page 17: Introduction to core science models

Gaussian Approximation to Likelihood:

$$\exp\!\left(-\frac{(a - m_{40})^2}{2\sigma_{40}^2}\right) \approx Likelihood_{40}(a)$$

[Figure: the Gaussian approximation (Gaussian_Max) overlaid on Likelihood40, as a function of parameter “a”.]

• Replace the Likelihood with a simple Gaussian with two hyperparameters:
* Mean $m_{40}$: the average value for “a”
* Standard deviation $\sigma_{40}$: the error around the mean

Page 18: Introduction to core science models

The Gaussian approximation allows updating one data point at a time:

$$\exp\!\left(-\frac{(a - m_{40})^2}{2\sigma_{40}^2}\right) \approx Likelihood_{40}(a) \approx P(Y_{40}) \cdot \{P(Y_{39}) \cdot P(Y_{38}) \cdots P(Y_1)\}$$

$$\underbrace{\exp\!\left(-\frac{(a - m_{40})^2}{2\sigma_{40}^2}\right)}_{Posterior} \approx \underbrace{P(Y_{40})}_{Likelihood} \cdot \underbrace{\exp\!\left(-\frac{(a - m_{39})^2}{2\sigma_{39}^2}\right)}_{Prior}$$

• Note: for simplicity I ignored all normalizations

Page 19: Introduction to core science models

OLR: Online Logistic Regression: one parameter

• Solve the Bayesian update for each new event (Y, Age):

$$P(Y) = \frac{1}{1 + \exp(-Y \cdot (a \cdot Age + b))}$$

$$\underbrace{\exp\!\left(-\frac{(a - m_t)^2}{2\sigma_t^2}\right)}_{Posterior} \approx \underbrace{P(Y)}_{Likelihood} \cdot \underbrace{\exp\!\left(-\frac{(a - m_{t-1})^2}{2\sigma_{t-1}^2}\right)}_{Prior}$$

• Yrank approximate solution: Scott Roy talk: http://twiki.corp.yahoo.com/pub/Personalization/YRank/YRankLearning.ppt

• Yrank update formulas (per-event updates of the precision and the mean):

$$1/\sigma_t^2 = 1/\sigma_{t-1}^2 + \cdots \qquad\qquad m_t = m_{t-1} + \cdots$$
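The increment terms are in the linked talk; as a stand-in, here is a minimal sketch of one standard way to do this per-event update (a Gaussian/Laplace approximation of the posterior), offered as an assumption rather than the exact Yrank formulas:

```python
import math

def sigmoid(t):
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def olr_update(m, var, y, age, b):
    """One Bayesian update of the Gaussian N(m, var) over parameter 'a'
    after observing event (y, age), y in {+1, -1}.
    Illustrative approximation, not the exact Yrank math."""
    p = sigmoid(y * (m * age + b))       # likelihood of the event at a = m
    grad = y * age * (1.0 - p)           # d/da log P(Y) at a = m
    curv = (age ** 2) * p * (1.0 - p)    # -d^2/da^2 log P(Y) at a = m
    new_prec = 1.0 / var + curv          # 1/sigma_t^2 = 1/sigma_{t-1}^2 + curvature
    new_var = 1.0 / new_prec
    new_m = m + new_var * grad           # one Newton step from the prior mean
    return new_m, new_var
```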

Page 20: Introduction to core science models

OLR: Online Logistic Regression: General Case

• Replace the single parameter “a” by a set of parameters $\{w_f\}$
• Replace the single feature “Age” by a set of features $\{X_f\}$

• Solve the Bayesian update for each new event $(Y, \{X_f\})$:

$$P(Y) = \frac{1}{1 + \exp(-Y \cdot \sum_f w_f X_f)}$$

$$\underbrace{\prod_f \exp\!\left(-\frac{(w_f - m_{f,t})^2}{2\sigma_{f,t}^2}\right)}_{Posterior} \approx \underbrace{P(Y)}_{Likelihood} \cdot \underbrace{\prod_f \exp\!\left(-\frac{(w_f - m_{f,t-1})^2}{2\sigma_{f,t-1}^2}\right)}_{Prior}$$

• Yrank update formulas, applied per feature:

$$1/\sigma_{f,t}^2 = 1/\sigma_{f,t-1}^2 + \cdots \qquad\qquad m_{f,t} = m_{f,t-1} + \cdots$$

Page 21: Introduction to core science models

OLR: General Case: Features

• Multi-dimensional logistic regression model:

$$P(Y) = \frac{1}{1 + \exp(-Y \cdot \sum_{f \in \{features\}} w_f X_f)}$$

$$\sum_f w_f X_f = w_1 \cdot 1 \qquad \Leftarrow \text{Baseline}$$
$$+\; w_2 X_{u=Male} + w_3 X_{u=Age40s} + w_4 X_{u=SanJose} + w_5 X_{u=likeSports} \qquad \Leftarrow \text{User Features}$$
$$+\; w_6 X_{i=about\_Sports} + w_7 X_{i=about\_NBA} \qquad \Leftarrow \text{Article Features}$$
$$+\; w_8 X_{(u=likeSports\;\&\;i=about\_Sports)} \qquad \Leftarrow \text{User*Article Features}$$

• More on Features:
http://twiki.corp.yahoo.com/view/SRelevance/NewsRecommendationFeatures
http://twiki.corp.yahoo.com/view/SRelevance/COREUserProfilesSparsePolarity
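A sketch of how such a sparse feature vector and the score $\sum_f w_f X_f$ might be assembled (the dict representation and helper names are illustrative assumptions; the feature names are the slide's examples):

```python
def make_features(user_attrs, article_topics):
    """Sparse binary features as {name: value}."""
    x = {"baseline": 1.0}
    for a in user_attrs:                  # e.g. "Male", "Age40s", "SanJose", "likeSports"
        x["u=" + a] = 1.0
    for t in article_topics:              # e.g. "about_Sports", "about_NBA"
        x["i=" + t] = 1.0
    if "likeSports" in user_attrs and "about_Sports" in article_topics:
        x["u=likeSports & i=about_Sports"] = 1.0   # user*article cross feature
    return x

def score(w, x):
    # sum_f w_f * X_f over the active features only
    return sum(w.get(f, 0.0) * v for f, v in x.items())
```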

Page 22: Introduction to core science models

OLR: Online Logistic Regression

Basic Counting Models: EMP

Feature Based Models: OLR

› Motivation for using regression

› Logistic Regression framework

› Online Logistic Regression: General Case

› Per item-OLR Use Case: Today Module

› Improving Model

RLFM: Feature Model + Collaborative Filtering

Affinity Models: Log Odds

Bonus: Tutorial on Collaborative Filtering

Page 23: Introduction to core science models

Per item-OLR use Case: Yahoo FP Today Module

Page 24: Introduction to core science models

Per item-OLR use Case: Yahoo FP Today Module

• Front Page Module:
• Articles don’t live very long (< 1 day)
• Many clicks/views for each article

• Each Article treated independently:
• A new OLR model for each new Article

• Trying to predict CTR for each user & article pair: u,i

$$P(Y_{ui}=1) = \frac{1}{1 + \exp(-\sum_{f \in \{user\_features\}} w_{i,f} \cdot X_{u,f})}$$

$$\sum_f w_{i,f} X_{u,f} = w_1 \cdot 1 \qquad \Leftarrow \text{Baseline}$$
$$+\; w_2 X_{u=Male} + w_3 X_{u=Age20s} + w_4 X_{u=NewYork} + w_5 X_{u=likeSports} + w_6 X_{u=likeNFL} + w_7 X_{u=likeMusic} \qquad \Leftarrow \text{User Features}$$

Page 25: Introduction to core science models

Per item-OLR use Case: Yahoo FP Today Module

$$P(Y_{ui}=1) = \frac{1}{1 + \exp(-\sum_{f \in \{user\_features\}} w_{i,f} \cdot X_{u,f})}$$

[Figure: the sigmoid curve P(Y_ui = 1) as a function of the score Σ w·X, rising from 0 to 1.]

Page 26: Introduction to core science models

Per item-OLR use Case: Yahoo FP Today Module

• Each Article has its own OLR model and its own set of weights $\{w_{i,f}\}$:

$$P(Y_{ui}=1) = \frac{1}{1 + \exp(-\sum_{f \in \{user\_features\}} w_{i,f} \cdot X_{u,f})}$$

• Each Article has its own prior:

$$Prior \sim \exp\!\left(-\sum_{f \in \{features\}} \frac{(w_{i,f} - m_{i,f})^2}{2\sigma_{i,f}^2}\right)$$

• For each event $(Y_{ui}, \{X_{u,f}\})$ update the hyperparameters for that article with the Yrank update formula:

$$1/\sigma_{i,f,t}^2 = 1/\sigma_{i,f,t-1}^2 + \cdots \qquad\qquad m_{i,f,t} = m_{i,f,t-1} + \cdots$$

Page 27: Introduction to core science models

Per item-OLR use Case: Yahoo FP Today Module

• How to use the OLR model:

• Choose a candidate pool:
• Roughly 50-100 articles picked by editors

• Explore:
• In a small bucket: try all 50-100 articles randomly
• Modeling: for each event (click/view) apply Yrank for that article

• Exploit:
• For the remainder (larger bucket)
• Scoring: predict the article CTR, and order by decreasing CTR:

$$CTR = P(Y_{ui}=1) = \frac{1}{1 + \exp(-\sum_{f \in \{user\_features\}} m_{i,f} \cdot X_{u,f})}$$
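A sketch of this serving scheme; the bucket split and the ranking by predicted CTR follow the slide, while the function shapes and the 5% explore share are illustrative assumptions:

```python
import math, random

def predict_ctr(m, x):
    """CTR from posterior means m_{i,f}: 1 / (1 + exp(-sum_f m_f * x_f))."""
    z = sum(m.get(f, 0.0) * v for f, v in x.items())
    return 1.0 / (1.0 + math.exp(-z))

def choose_article(candidates, user_x, article_means, explore_share=0.05):
    if random.random() < explore_share:
        # Explore bucket: serve one of the 50-100 candidates at random
        return random.choice(candidates)
    # Exploit bucket: serve the article with the highest predicted CTR
    return max(candidates, key=lambda i: predict_ctr(article_means[i], user_x))
```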

Page 28: Introduction to core science models

Improving Online Learning:

• Correlated OLR: include interactions between the hyperparameters: an improvement

$$Prior \sim \exp\!\left(-\tfrac{1}{2} \sum_{f1,f2} (w_{f1} - m_{f1})\, A_{f1,f2}\, (w_{f2} - m_{f2})\right)$$

• Mini-Batch: Update multiple data points at once: no gain in CTR

• TechPulse 2011: Taesup Moon, Pradheep Elango, Su-Lin Wuhttp://twiki.corp.yahoo.com/pub/YResearch/CokeLabDiary/techpulse.pdf

$$Likelihood(\text{mini-batch}) = P(Y_1) \cdots P(Y_n)$$

Page 29: Introduction to core science models

Improving Explore/Exploit: UCB

• UCB: improved Explore/Exploit strategy

• Old strategy (ε-greedy):
• Explore: update OLR only from events in a small random bucket
• Exploit: order articles in decreasing value of predicted CTR

• New strategy: UCB (aka Upper Confidence Bound):
• Single bucket
• Explore: update OLR with all events
• Exploit: order articles in decreasing value of the “optimistic” $CTR_{UCB}$

• TechPulse 2011: Taesup Moon, Pradheep Elango, Su-Lin Wuhttp://twiki.corp.yahoo.com/pub/YResearch/CokeLabDiary/techpulse.pdf

Page 30: Introduction to core science models

Improving Explore/Exploit: UCB

• Upper Confidence Bound strategy: improvement
• Exploit: order articles in decreasing value of the “optimistic” $CTR_{UCB}$

• ONE-DIMENSION EXAMPLE:

• Replace the normal CTR:

$$CTR = \frac{1}{1 + \exp(-\,m \cdot X)}$$

• with the optimistic CTR:

$$CTR_{UCB} = \frac{1}{1 + \exp(-(m \cdot X + z \cdot \sigma^2 \cdot X))} \qquad z = \text{tunable parameter}$$

[Figure: the CTR and CTR_UCB curves as a function of the score.]
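A sketch of UCB scoring with these formulas; the per-feature variance bonus mirrors the one-dimensional example above, and treating the feature posteriors independently is an assumption:

```python
import math

def ucb_ctr(means, variances, x, z=1.0):
    """Optimistic CTR: inflate the score m.X by z * sigma^2 * X before squashing."""
    score = sum(means.get(f, 0.0) * v for f, v in x.items())
    bonus = sum(variances.get(f, 1.0) * v for f, v in x.items())
    return 1.0 / (1.0 + math.exp(-(score + z * bonus)))

def rank_articles(candidates, user_x, models, z=1.0):
    # Serve in decreasing order of optimistic CTR; a larger z explores more
    return sorted(candidates, reverse=True,
                  key=lambda i: ucb_ctr(models[i]["m"], models[i]["var"], user_x, z))
```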

Page 31: Introduction to core science models

RLFM: Regression based Latent Factor Model

Basic Counting Models: EMP

Feature Based Models: OLR

RLFM: Feature Model + Collaborative Filtering
› RLFM components

› Using RLFM: Offline & Online update

Bonus: Tutorial on Collaborative Filtering

Page 32: Introduction to core science models

RLFM: Regression based Latent Factor Model

• RLFM: basic idea
* Build a single logistic regression model for all users “u” and articles “i”
* Add Collaborative Filtering using Matrix Factorization

• Modeling:
• Most of it is done offline in big batch mode (millions of events)
• One part of the model is also updated online (one event at a time, using the Yrank update)

⇒ Latent Factor Models are a work in progress:

• Original Y Labs Paper: Deepak Agarwal, Bee-Chung Chenhttp://twiki.corp.yahoo.com/pub/YResearch/CokeLabDiary/featfact.pdf

• Implementation for Coke:http://twiki.corp.yahoo.com/view/YResearch/RLFMForCoke

Page 33: Introduction to core science models

RLFM: Regression based Latent Factor Model

RLFM components:

1) Build a logistic regression model for all users “u” and articles “i”

2) Add user bias and article bias

3) Collaborative Filtering using Matrix Factorization

4) Predict factors for new user/article: Cold Start

5) Add Logistic Regression + Bias + Matrix Factorization

Page 34: Introduction to core science models

1) Build logistic regression for all users/articles:

• Build a single logistic regression model for all users {u}, articles {i}:

$$P(Y_{ui}=1) = \frac{1}{1 + \exp(-\sum_{f \in \{all\_features\}} w_f \cdot X_{ui,f})}$$

$$\sum_f w_f X_{ui,f} = w_1 \cdot 1 \qquad \Leftarrow \text{Baseline}$$
$$+\; w_2 X_{u=Male} + w_3 X_{u=Age40s} + w_4 X_{u=SanJose} + w_5 X_{u=likeSports} \qquad \Leftarrow \text{User Features}$$
$$+\; w_6 X_{i=about\_Sports} + w_7 X_{i=about\_NBA} \qquad \Leftarrow \text{Article Features}$$
$$+\; w_8 X_{(u=likeSports\;\&\;i=about\_Sports)} \qquad \Leftarrow \text{User*Article Features}$$

• A single set of parameters $\{w_f\}$ for all users, articles
• Learned offline in batch mode

Page 35: Introduction to core science models

2) Add per user and per article baseline:

• Add bias parameters:
● Some articles are more/less popular than others
● Some users read more/fewer stories than others

$$P(Y_{ui}=1) = \frac{1}{1 + \exp(-(\alpha_u + \beta_i + \sum_{f \in \{all\_features\}} w_f \cdot X_{ui,f}))}$$

• The baseline is not the same for every user/article:
• Old baseline: $w_1$
• New baseline: $w_1 + \alpha_u + \beta_i$

• More parameters to optimize: $\{w_f\}, \{\alpha_u\}, \{\beta_i\}$
• Better with some priors (to be described later)

Page 36: Introduction to core science models

3) Matrix Factorization Motivation

• How to deal with:
• An article about disaster preparedness:
• Hurricanes: need users from the coastline: Texas => Northeast
• Earthquakes: need users from the West coast

• Would need: X_user_WestCoast * X_about_earthquakes
• I don’t have that feature …

• But if I have many views/clicks over many such articles I can discover that pattern !!!

Page 37: Introduction to core science models

3) Matrix Factorization Motivation

• I can discover patterns within clicks:
• SIMPLE EXAMPLE: a 0/1 clicks matrix, users (SanJose, Oakland, NewYork, CDC) × articles (Earthquake, …, Politics)

• Clicks mostly explained by $U_1 * V_1 + U_2 * V_2$:

$$Clicks \approx \begin{pmatrix} 0 \\ 0 \\ 1 \\ 1 \end{pmatrix} (0\;0\;0\;0\;1\;1\;1\;1) \;+\; \begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix} (1\;1\;1\;1\;0\;0\;0\;0)$$

Page 38: Introduction to core science models

3) Matrix Factorization Motivation

• Most clicks explained by $U_1 * V_1 + U_2 * V_2$:

$$P(Click_{ui}) = P(Y_{ui}=1) = \frac{1}{1 + \exp(-(U_{u,1} V_{i,1} + U_{u,2} V_{i,2}))}$$

• The general case:

$$P(Y_{ui}=1) = \frac{1}{1 + \exp(-\sum_{k \in \{factors\}} U_{uk} V_{ik})}$$

• Note:
• Number of factors ~ 50-200 << N_users & N_articles

Page 39: Introduction to core science models

3) Matrix Factorization Model

• Matrix Factorization Model: aka Collaborative Filtering

• Obtain U’s and V’s: maximize the following likelihood

$$P(Y_{ui}=1) = \frac{1}{1 + \exp(-\sum_{k \in \{factors\}} U_{uk} V_{ik})}$$

$$Likelihood = \prod_{ui \in \{examples\}} \frac{1}{1 + \exp(-Y_{ui} \sum_{k \in \{factors\}} U_{uk} V_{ik})}$$

• Π => product over all past events (clicks/views)
• $Y_{ui} = +1$ for clicks and $Y_{ui} = -1$ for views
• Better with some priors …
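A sketch of maximizing this likelihood by stochastic gradient ascent, with the zero-mean Gaussian priors of the next slide folded in as L2 regularization (dimensions, step size, and initialization are illustrative):

```python
import math, random

def _sigmoid(t):
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def fit_factors(events, n_users, n_items, k=50, lr=0.05, reg=0.01, epochs=10):
    """events: list of (u, i, y) with y = +1 for a click, -1 for a view. Returns U, V."""
    U = [[random.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    V = [[random.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        random.shuffle(events)
        for u, i, y in events:
            z = sum(U[u][f] * V[i][f] for f in range(k))
            g = y * (1.0 - _sigmoid(y * z))          # d log P(Y_ui) / dz
            for f in range(k):
                uu, vv = U[u][f], V[i][f]
                U[u][f] += lr * (g * vv - reg * uu)  # likelihood gradient + prior pull to 0
                V[i][f] += lr * (g * uu - reg * vv)
    return U, V
```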

Page 40: Introduction to core science models

3) Matrix Factorization Model

• Get U’s and V’s: Maximize Likelihood * Prior

• Better with some priors:

prior for each $U_{uk} \sim \exp(-(U_{uk} - m_a)^2 / 2\sigma_a^2)$
prior for each $V_{ik} \sim \exp(-(V_{ik} - m_b)^2 / 2\sigma_b^2)$

Choose: $m_a = 0$ and $m_b = 0$
$\sigma_a$ is the same for all U’s
$\sigma_b$ is the same for all V’s

• Note: the above priors are uncorrelated
• The original RLFM paper used correlated priors

Page 41: Introduction to core science models

4) Matrix Factorization Model – Cold Start Problem

• Matrix Factorization Model:

• Cold start problem => for a new user U = 0, or for a new article V = 0:

$$P(Y_{ui}) = \frac{1}{1 + \exp(-\sum_{k \in \{factors\}} U_{uk} V_{ik})}$$

Page 42: Introduction to core science models

4) Matrix Factorization Model – Cold Start Problem

• Matrix Factorization Model:

• Cold start problem => for a new user U = 0, or for a new article V = 0:

$$P(Y_{ui}) = \frac{1}{1 + \exp(-\sum_{k \in \{factors\}} U_{uk} V_{ik})}$$

• Solution: choose a different prior:

for each $U_{uk}$: $\exp\!\left(-\big(U_{uk} - \sum_{a \in \{user\_features\}} G_{k,a} X_{u,a}\big)^2 / 2\sigma_a^2\right)$

for each $V_{ik}$: $\exp\!\left(-\big(V_{ik} - \sum_{b \in \{item\_features\}} D_{k,b} X_{i,b}\big)^2 / 2\sigma_b^2\right)$

• Parameters G’s & D’s obtained from maximizing: Likelihood * Prior

Page 43: Introduction to core science models

5) RLFM: Regression based Latent Factor Model

• Putting it back together: Bias + Regression + Matrix Factorization:

$$P(Y_{ui}=1) = \frac{1}{1 + \exp\!\big(-(\alpha_u + \beta_i + \sum_{f \in \{features\}} w_f X_{ui,f} + \sum_{k \in \{factors\}} U_{uk} V_{ik})\big)}$$

• Priors:

for each $\alpha_u$: $\exp\!\left(-\big(\alpha_u - \sum_{a \in \{user\_features\}} g_a X_{u,a}\big)^2 / 2\sigma_a^2\right)$
for each $\beta_i$: $\exp\!\left(-\big(\beta_i - \sum_{b \in \{item\_features\}} d_b X_{i,b}\big)^2 / 2\sigma_b^2\right)$
for each $U_{uk}$: $\exp\!\left(-\big(U_{uk} - \sum_{a \in \{user\_features\}} G_{k,a} X_{u,a}\big)^2 / 2\sigma_a^2\right)$
for each $V_{ik}$: $\exp\!\left(-\big(V_{ik} - \sum_{b \in \{item\_features\}} D_{k,b} X_{i,b}\big)^2 / 2\sigma_b^2\right)$
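Putting the pieces into a single scoring function (a sketch; the dict/list parameter containers are assumptions, and the biases and factors would come from the offline fit, or from the g, d, G, D regressions for cold-start users/articles):

```python
import math

def rlfm_score(alpha_u, beta_i, w, x_ui, U_u, V_i):
    """P(Y_ui = 1) = sigmoid(alpha_u + beta_i + sum_f w_f X_{ui,f} + sum_k U_uk V_ik)."""
    z = alpha_u + beta_i
    z += sum(w.get(f, 0.0) * v for f, v in x_ui.items())   # regression part
    z += sum(u * v for u, v in zip(U_u, V_i))              # latent factor part
    # numerically safe sigmoid
    return 1.0 / (1.0 + math.exp(-z)) if z >= 0 else math.exp(z) / (1.0 + math.exp(z))
```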

Page 44: Introduction to core science models

RLFM: Regression based Latent Factor Model

Basic Counting Models: EMP

Feature Based Models: OLR

RLFM: Feature Model + Collaborative Filtering
› RLFM components

› Using RLFM: Offline & Online update

Bonus: Tutorial on Collaborative Filtering

Page 45: Introduction to core science models

Using RLFM: Offline Modeling:

• Offline Modeling:

• Batch mode: Maximize: Likelihood * Prior

• Millions to Billions of examples processed at once

• Input: {Y’s, X’s} all events and features

• Output:
• factors: $\{\alpha_u\}, \{\beta_i\}, \{U_{uk}\}, \{V_{ik}\}$
• parameters: $\{w_f\}, \{g_a\}, \{d_b\}, \{G_{k,a}\}, \{D_{k,b}\}$

Page 46: Introduction to core science models

Using RLFM: Online Modeling and Scoring:

• Online Scoring + some Modeling

• For a new user or new article: compute the factors from g, d, G, D
• e.g. new user bias:

$$\alpha_u = \sum_{a \in \{user\_features\}} g_a X_{u,a}$$

• For an old user or old article: get the factors from the offline batch mode

• For each event (click/view) on article “i”:
• Update $V_{ik}$ using the per-item OLR approach
• Predict the score using the updated $V_{ik}$:

$$P(Y_{ui}=1) = \frac{1}{1 + \exp\!\big(-(\alpha_u + \beta_i + \sum_{f \in \{features\}} w_f X_{ui,f} + \sum_{k \in \{factors\}} U_{uk} V_{ik})\big)}$$

Page 47: Introduction to core science models

RLFM: Offline Results on Coke Data: Today Module

• RLFM results on an Offline experiment
• Y! Front Page - Today Module
• CTR relative lift for RLFM vs Feature-Only as a function of clicks/user

http://twiki.corp.yahoo.com/view/YResearch/RLFMReplayExperiments

Page 48: Introduction to core science models

Q & A

Contributors:

Pradheep Elango, Su-Lin Wu, Taesup Moon, Pranam Kolari

Deepak Agarwal, Bee-Chung Chen, Scott Roy

Jean-Marc Langlois

• Coke Science Papers:

http://twiki.corp.yahoo.com/view/YResearch/CokeLabDiary

Page 49: Introduction to core science models

Tutorial on Collaborative Filtering

Based on following Chapter

http://research.yahoo.com/files/korenBellChapterSpringer.pdf

By two of the Netflix winners

Page 50: Introduction to core science models

Collaborative Filtering: Introduction

Goal: predict the rating r_ui for a movie “i” that a user “u” hasn’t seen yet
› Prediction based on the Matrix of User/Movie Ratings:
● r_ui = 1 through 5 stars
› Prediction equations for integer Ratings are simpler than for binary Clicks
› The rating matrix is a large, very sparse matrix:
● 10M-100M users and 10k-100k movies, but with ~99% blank entries

Based on: http://research.yahoo.com/files/korenBellChapterSpringer.pdf
› This talk: focus on the most relevant models & ignore some improvements:
● Baseline adjustment: user bias, movie bias and overall average rating
● Time-aware model, binary features (rated, rented)

This talk:
› Adjusted Ratings:

$$r^{adjusted}_{ui} \Leftarrow r^{raw}_{ui} - Baseline_{ui}$$

Page 51: Introduction to core science models

Collaborative Filtering: the models

Correlated Neighborhood Model
› Predict new ratings based on the ratings of similar movies

Global Neighborhood Model
› Enlarge the Neighborhood to be “global”
› Introduce adjustable weight parameters

Factorized Neighborhood Model
› Apply matrix factorization to the weight parameters

SVD Model
› Apply matrix factorization to the rating matrix itself

Page 52: Introduction to core science models

Collaborative Filtering: Correlated Neighborhood Model

• Define a movie-movie Similarity measure:
● $S_{ij}$ based on correlation:

$$S_{ij} \propto \sum_{u \in Union(i,j)} r_{ui} * r_{uj} \;/\; \text{Normalization}$$

• Define the Correlated Neighborhood:
• the set of ~20 movies with the largest $S_{ij}$ that are rated by “u”

• Define the Weight: the normalized $S_{ij}$

[Figure: movie “i” surrounded by its neighbors j1 … j6, linked by similarities S_ij.]

• Predict the unknown $r_{ui}$ based on the known ratings $r_{uj}$ of similar movies
• “You will like movie ‘i’ because you liked movies ‘j’”

Page 53: Introduction to core science models

Collaborative Filtering: Correlated Neighborhood Model

• Movies:
• i=1 Star Trek
• i=2 Star Wars
• i=3 Action movie
• i=4 Horror movie

• Ratings $r_{ui}$: a users × movies matrix with entries ±1

• Movie-Movie Similarity $S_{ij}$ (movies × movies):

$$S_{ij} \propto \sum_{u \in Union(i,j)} r_{ui} * r_{uj} \;/\; \text{Normalization} \qquad \Rightarrow \qquad S = \begin{pmatrix} 1 & 1 & 0.5 & 0 \\ 1 & 1 & 0.5 & 0 \\ 0.5 & 0.5 & 1 & 0.5 \\ 0 & 0 & 0.5 & 1 \end{pmatrix}$$

Page 54: Introduction to core science models

Collaborative Filtering: Correlated Neighborhood Model

• Similarity measure:

$$S_{ij} \propto \sum_{u \in Union(i,j)} r_{ui} * r_{uj}$$

• Correlated Neighborhood:
• the set of ~20 movies with the largest $S_{ij}$ that are rated by “u”

• Weight:
• the normalized $S_{ij}$

• Scoring:

$$\tilde r_{ui} = \sum_{j \in \{correlated\_neighbors\}} r_{uj}\, S_{ij} \;\Big/\; \sum_j S_{ij}$$

• Predict the unknown $r_{ui}$ based on the known ratings $r_{uj}$ of similar movies
• “You will like movie ‘i’ because you liked movies ‘j’”
• Simple, intuitive model with the ability to explain why we predict a new movie

• Modeling:
• Need to precompute and store $S_{ij}$: 10k * 10k = 100M
• Weights are fixed to the normalized value of $S_{ij}$
• The optimal neighborhood is small
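A sketch of this scoring step in plain Python (the neighborhood size of ~20 follows the slide; everything else is illustrative):

```python
def predict_rating(i, user_ratings, S, k=20):
    """user_ratings: {movie j: adjusted rating r_uj}; S[i][j]: similarity of movies i, j.
    Weighted average over the k movies most similar to i that the user has rated."""
    rated = [(S[i][j], r) for j, r in user_ratings.items() if j != i]
    neighbors = sorted(rated, reverse=True)[:k]
    norm = sum(s for s, _ in neighbors)
    return sum(s * r for s, r in neighbors) / norm if norm > 0 else 0.0
```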

Page 55: Introduction to core science models

Collaborative Filtering: Global Neighborhood Model

• Extend the Neighborhood to All Known Ratings for user “u”: $R(u) = \{j : r_{uj}\ \mathrm{known}\}$
• Let the weights $w_{ij}$ be free parameters

• Scoring:

$$\tilde r_{ui} = \sum_{j \in R(u)} r_{uj}\, w_{ij} \;/\; \sqrt{|R(u)|}$$

• Modeling: pick the $w_{ij}$ to minimize the regularized Sum of Squared Errors:

$$SSE = \sum_{\{past\_ratings\}} \Big(r_{ui} - \sum_{j \in R(u)} r_{uj}\, w_{ij} / \sqrt{|R(u)|}\Big)^{2} + \lambda \sum_{ij} w_{ij}^2 \qquad \lambda = \text{regularization parameter}$$

• Better predictive power than the previous model
• Not easy to explain recommendations

• Expensive Modeling, Scoring and Storage of $w_{ij}$: size = 100M
• Could try to limit it based on $S_{ij}$, but there is a better approach

Page 56: Introduction to core science models

Reduce Number of Free Parameters: Matrix Factorization

• Want to reduce the number of free parameters in $w_{ij}$:
• Current size: 10k * 10k = 100M

• Matrix factorization:
• Goal: reduce the number of free parameters to ~1M

• Toy example #1:
• The weight matrix is uniform:

$$Weight = \begin{pmatrix} 1 & 1 & 1 & \cdots \\ 1 & 1 & 1 & \cdots \\ 1 & 1 & 1 & \cdots \\ \cdots \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \\ 1 \\ \cdots \end{pmatrix} \begin{pmatrix} 1 & 1 & 1 & \cdots \end{pmatrix}$$

• Replace the matrix(10k,10k) with the outer product of two vectors:
• each 10k long

Page 57: Introduction to core science models

Reduce Number of Free Parameters: Matrix Factorization

• Want to reduce the number of free parameters in $w_{ij}$:
• Current size: 10k * 10k = 100M

• Matrix factorization:
• Goal: reduce the number of free parameters to ~1M

• Toy example #1:
• The weight matrix is uniform:

$$Weight = \begin{pmatrix} 1 & 1 & 1 & \cdots \\ 1 & 1 & 1 & \cdots \\ 1 & 1 & 1 & \cdots \\ \cdots \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \\ 1 \\ \cdots \end{pmatrix} \begin{pmatrix} 1 & 1 & 1 & \cdots \end{pmatrix} = U \, (V)$$

• Replace the matrix(10k,10k) with the outer product of two vectors:
• each 10k long: U(10k), V(10k)
• U & V are called factors

Page 58: Introduction to core science models

Reduce Number of Free Parameters: Matrix Factorization

• Toy example #2:
• The weight matrix is almost uniform:

$$Weight = \begin{pmatrix} 1.0 & 0.8 & 1.0 & 0.8 \\ 0.8 & 1.0 & 0.8 & 1.0 \\ 1.0 & 0.8 & 1.0 & 0.8 \\ 0.8 & 1.0 & 0.8 & 1.0 \end{pmatrix} = 0.9 \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix} \begin{pmatrix} 1 & 1 & 1 & 1 \end{pmatrix} + 0.1 \begin{pmatrix} +1 \\ -1 \\ +1 \\ -1 \end{pmatrix} \begin{pmatrix} +1 & -1 & +1 & -1 \end{pmatrix}$$

$$= d_1\, U_1 (V_1) + d_2\, U_2 (V_2)$$

• Weights: $w_{ij} = \sum_{k \in \{1,2\}} d_k \cdot U_{ik} \cdot V_{jk}$

Page 59: Introduction to core science models

Reduce Number of Free Parameters: Matrix Factorization

• Toy example #3:
• An arbitrary 4×4 weight matrix decomposed into four rank-one factors:

$$Weight = d_1\, U_1 (V_1) + d_2\, U_2 (V_2) + d_3\, U_3 (V_3) + d_4\, U_4 (V_4)$$

[Numeric example: a 4×4 matrix of arbitrary entries and its four factors, with amplitudes decreasing rapidly from d1 = 2.18 down to d4 = 0.04.]

• Notice that:
• An arbitrary N*N matrix can be decomposed using N sets of factors
• The amplitudes are decreasing: d1 = 2.18 >> d4 = 0.04

• Can approximate weight matrix with a small set of factors

Page 60: Introduction to core science models

Note on convention for Matrix Factorization:

• The last equation is the definition of the SVD (Singular Value Decomposition):

$$w_{ij} = \sum_k U_{ik}\, d_k\, V_{jk}$$

• where the factors U’s, V’s are chosen to be normalized and independent from each other:

$$\sum_i U_{ik} U_{ik'} = 1 \ \ \text{if } k = k' \qquad\qquad \sum_i U_{ik} U_{ik'} = 0 \ \ \text{if } k \ne k'$$

• In this talk and in Koren & Bell’s chapter:
• The $d_k$’s are incorporated inside the $U_k, V_k$ (just a convention difference):

$$w_{ij} = \sum_k U_{ik} V_{jk} \qquad \text{with the factors now normalized as} \quad \sum_i U_{ik} U_{ik'} = d_k \ \ \text{if } k = k'$$

Page 61: Introduction to core science models

Collaborative Filtering: Factorized Neighborhood Model

• Recall the Global Neighborhood Model, where the $w_{ij}$ are free parameters:

$$\tilde r_{ui} = \sum_{j \in R(u)} r_{uj}\, w_{ij} \;/\; \sqrt{|R(u)|}$$

• Apply Matrix Factorization to $w_{ij}$:

$$w_{ij} = \sum_{k \in \{factors\}} U_{ik} V_{jk}$$

Choose: $N_k$ (number of factors) << N (number of movies): ~200 << 10k-100k

• Scoring: Factorized Neighborhood Model:

$$\tilde r_{ui} = \sum_{k \in \{factors\}} U_{ik} \Big( \sum_{j \in R(u)} r_{uj} V_{jk} \Big) \;/\; \sqrt{|R(u)|}$$

• Modeling: the free parameters are $U_{ik}$ and $V_{jk}$:

$$SSE = \sum_{\{past\_ratings\}} \Big(r_{ui} - \sum_{k} U_{ik} \sum_{j \in R(u)} r_{uj} V_{jk} / \sqrt{|R(u)|}\Big)^{2} + \lambda \sum_{ik} U_{ik}^2 + \lambda \sum_{jk} V_{jk}^2$$

• Cheaper computation with the same predictive power

Page 62: Introduction to core science models

Collaborative Filtering: SVD Model

SVD: the historical name for Matrix Factorization applied to the Rating matrix

• Matrix Factorization applied to $r_{ui}$:

$$r_{ui} \Rightarrow \sum_{k \in \{factors\}} U_{uk} V_{ik}$$

Choose: $N_k$ (number of factors) << N (number of movies): ~200 << 10k-100k

• Scoring:

$$\tilde r_{ui} = \sum_{k \in \{factors\}} U_{uk} * V_{ik}$$

• Modeling: the free parameters are $U_{uk}$ and $V_{ik}$:

$$SSE = \sum_{\{past\_ratings\}} \Big(r_{ui} - \sum_{k \in \{factors\}} U_{uk} V_{ik}\Big)^{2} + \lambda \sum_{uk} U_{uk}^2 + \lambda \sum_{ik} V_{ik}^2$$

• Same predictive power
• Not easy to explain recommendations
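A sketch of fitting this model by stochastic gradient descent on the regularized SSE, the classic recipe for this kind of factorization (rates and sizes are illustrative):

```python
import random

def fit_svd(ratings, n_users, n_movies, k=200, lr=0.005, lam=0.02, epochs=20):
    """ratings: list of (u, i, r) with baseline-adjusted ratings r_ui. Returns U, V."""
    U = [[random.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    V = [[random.gauss(0, 0.1) for _ in range(k)] for _ in range(n_movies)]
    for _ in range(epochs):
        random.shuffle(ratings)
        for u, i, r in ratings:
            err = r - sum(U[u][f] * V[i][f] for f in range(k))   # r_ui - sum_k U_uk V_ik
            for f in range(k):
                uu, vv = U[u][f], V[i][f]
                U[u][f] += lr * (err * vv - lam * uu)   # gradient of squared error + L2 term
                V[i][f] += lr * (err * uu - lam * vv)
    return U, V

# Usage: predict with sum(U[u][f] * V[i][f] for f in range(k))
```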

Page 63: Introduction to core science models

The End