convolutional neural network based recommender systemstat.snu.ac.kr/idea/seminar/20171128/cnn based...

Post on 08-Sep-2020


Convolutional Neural Network based Recommender System

Deep Learning based Recommender System (Zhang et al. 2017)

Presented by Jiin Seo

November 28, 2017

Outline

1. Attention based CNN

2. Personalized CNN (CNN-PerMLP)

3. Deep Cooperative Neural Network (DeepCoNN)

4. Convolutional Matrix Factorization (ConvMF)

5. CNN for Image Feature Extraction(VPOI)

6. CNN for Audio Feature Extraction(WMF)

7. CNN for Text Feature Extraction


1. Attention based CNN

Attention based CNN (Gong et al. 2016)

• Hashtag recommendation in microblog

• Multi-class classification problem

• (Global channel + Local channel) ⇒ Convolutional layer

• We adopt an attention mechanism to scan the input microblog and select trigger words. It focuses on only a small subset of the words for each tag.

1. Attention based CNN

Architecture

Figure: The architecture of the attention-based Convolutional Neural Network

1. Attention based CNN

Notations

• Given an input microblog $m$ of length $n$, we take $w_i \in \mathbb{R}^d$ for each word in the microblog ($d$: dimension of the word vector).

• $w_{i:i+j}$: the concatenation of words $w_i, w_{i+1}, \cdots, w_{i+j}$

1. Attention based CNN

Local Attention Channel . 1) Local attention layer

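• The attention layer generates a sequence of trigger words $(w_i, \cdots, w_j)$ by scanning the microblog with a small window (window size: $h$).

• The score of the central word $w_{(2i+h-1)/2}$ of a window is

$$s_{(2i+h-1)/2} = g(M^l * w_{i:i+h} + b)$$

$g$: non-linear function, $M^l \in \mathbb{R}^{h \times d}$: parameter matrix, $b$: bias

• Extract the trigger words:

$$\hat{w}_i = \begin{cases} w_i & \text{if } s_i > \eta, \\ 0 & \text{if } s_i \le \eta, \end{cases} \qquad 0 \le i \le n$$

• The threshold: $\eta = \delta \cdot \min\{s\} + (1-\delta) \cdot \max\{s\}$, where $s$ is the sequence of scores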
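The scoring and thresholding steps above can be sketched in NumPy. This is only an illustration under stated assumptions: $g$ is taken to be tanh, the window is centred on each word with zero padding at the borders, and the function name `trigger_words` and the default `delta` are hypothetical rather than taken from the paper.

```python
import numpy as np

def trigger_words(words, M_l, b, delta=0.5):
    """words: (n, d) word vectors; M_l: (h, d) window parameters, h odd (assumed)."""
    n, d = words.shape
    h = M_l.shape[0]
    pad = h // 2
    # Assumption: zero-pad so that every word can be the centre of a window.
    padded = np.vstack([np.zeros((pad, d)), words, np.zeros((pad, d))])
    # Score of the central word of each window; g = tanh is an assumption.
    scores = np.array([
        np.tanh(np.sum(M_l * padded[i:i + h]) + b) for i in range(n)
    ])
    # Threshold between the smallest and largest score.
    eta = delta * scores.min() + (1 - delta) * scores.max()
    keep = scores > eta
    # Words at or below the threshold are replaced by the zero vector.
    return words * keep[:, None], scores, eta
```

With `delta` close to 0 the threshold approaches the maximum score, so fewer words survive as trigger words.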

1. Attention based CNN

Local Attention Channel . 2) Folding layer

• Abstract the features of the trigger words ($\hat{w}$).

$$z = g(M^l * \mathrm{folding}(\hat{w}) + b)$$

where $g$: non-linear function, $M^l \in \mathbb{R}^{d \times r}$ and $b \in \mathbb{R}^r$

• folding: the sum over all the trigger words, taken separately for each dimension

$$f_i = \sum_j \hat{w}_{j,i}$$

• Output: a fixed-length vector, which represents the embeddings of the trigger words $\hat{w}$.

1. Attention based CNN

Global Channel . 1) Convolutional Layer

• All the words for each tag will be encoded.

• We use a CNN architecture to model whole microblog.

• Abstract the features.

$$z_i = g(M^g \cdot w_{i:i+l-1} + b)$$

$g$: non-linear function, $M^g \in \mathbb{R}^{l \times d}$ ($l$: window size) and $b \in \mathbb{R}$

• We apply this filter to every window of $l$ consecutive words in the microblog: $\{w_{1:l}, w_{2:l+1}, \cdots, w_{n-l+1:n}\}$

• Feature map:

$$z = [z_1, z_2, \cdots, z_{n-l+1}]$$

1. Attention based CNN

Global Channel . 2) Pooling Layer

• A max-overtime pooling operation is applied.

• We can extract the most important feature for each feature map.

• To obtain multiple features, we use multiple filters with varying window sizes in the model.

• Output: a fixed-length vector, which represents the embeddings of the input microblog $m$.
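A minimal sketch of the global channel (one convolution per filter followed by max-over-time pooling) may help make the "fixed-length output" point concrete. The non-linearity is assumed to be tanh, and `global_channel` is a hypothetical helper name, not code from the paper.

```python
import numpy as np

def global_channel(words, filters, biases):
    """words: (n, d); filters: list of (l, d) arrays whose window size l may differ."""
    n = words.shape[0]
    features = []
    for M_g, b in zip(filters, biases):
        l = M_g.shape[0]
        # z_i = g(M_g . w_{i:i+l-1} + b) for each of the n - l + 1 windows.
        z = np.array([
            np.tanh(np.sum(M_g * words[i:i + l]) + b) for i in range(n - l + 1)
        ])
        # Max-over-time pooling keeps one value per feature map.
        features.append(z.max())
    return np.array(features)
```

The output length equals the number of filters, independent of the microblog length $n$, which is why the pooled vector can feed a fixed-size layer.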

1. Attention based CNN

Combining the Outputs of both channels

• The outputs of the local attention channel and the global channel are fed into a simple convolutional layer.

• Combine the information as follows:

$$h = \tanh(M * v[h_g; h_l] + b)$$

$h_g, h_l$: the feature vectors extracted from the global and local channels, $M$: filter matrix for the convolutional operation, $b$: bias

1. Attention based CNN

Training

• Parameters: $\Theta = \{W, M^l, M^g\}$, where $W$: word embeddings, $M^l, M^g$: the parameters of the two channels

• Training objective function:

$$J = \sum_{(m,a) \in D} -\log p(a \mid m)$$

where $D$ is the training corpus and $a$ is the hashtag for microblog $m$.

• To minimize the objective function, we use AdaDelta.

1. Attention based CNN

Hashtag Recommendation

• Given an unlabelled dataset, we first train the model on the training data and save the model that performs best on the validation set.

• Encode each microblog through the local attention channel and the global channel with the saved model.

• Combine the features generated from both channels.

• Compute the scores of the hashtags for the d-th microblog with a fully connected layer:

$$P(y_d = a \mid h_d; \beta) = \frac{\exp(\beta^{(a)T} h_d)}{\sum_{j \in A} \exp(\beta^{(j)T} h_d)}$$

$A$: set of candidate hashtags, $\beta$: parameters, $h$: feature vector

• Rank the hashtags for each microblog and recommend the top-ranked hashtags.
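The scoring-and-ranking step can be sketched as a softmax over candidate hashtags followed by a sort. The function name `recommend_hashtags` and the cut-off `k` are hypothetical; `beta` is assumed to hold one parameter row per candidate hashtag.

```python
import numpy as np

def recommend_hashtags(h, beta, k=3):
    """h: (r,) combined feature vector; beta: (|A|, r), one row per candidate hashtag."""
    logits = beta @ h
    logits = logits - logits.max()  # shift for numerical stability; softmax is unchanged
    probs = np.exp(logits) / np.exp(logits).sum()
    top = np.argsort(-probs)[:k]    # indices of the k most probable hashtags
    return top, probs
```

Because the softmax is monotone in the logits, ranking by probability and ranking by $\beta^{(a)T} h_d$ give the same order.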

1. Attention based CNN

Result

• The attention-based CNN outperforms state-of-the-art methods.

• Using trigger words improves the performance.

• Multiple channels achieve better performance than a single channel.


2. Personalized CNN (CNN-PerMLP)

Personalized CNN for Tag Recommendation (Nguyen et al. 2016)

• Image tag recommender system

• A personalized content-aware tag recommender suggests a ranked list of relevant tags ($\hat{T}_{u,i}$).

• CNN-PerMLP employs

  • Convolutional Neural Networks
  • Personalized Fully-Connected Layer
  • Multilayer Perceptron as the Predictor

2. CNN-PerMLP

Architecture

Figure: The architecture of CNN-PerMLP

2. CNN-PerMLP

Notations

• $U$: users, $I$: images, $T$: tags

• $A = (a_{u,i,t}) \in \mathbb{R}^{|U| \times |I| \times |T|}$, where

$$a_{u,i,t} = \begin{cases} 1 & \text{if } u \text{ assigns the tag } t \text{ to the image } i, \\ 0 & \text{otherwise.} \end{cases}$$

• $S := \{(u,i,t) \mid a_{u,i,t} \in A \wedge a_{u,i,t} = 1\}$: the observed tagging set

• $T_{u,i} := \{t \in T \mid (u,i,t) \in S\}$: the set of relevant tags of a user-image pair

• $P_S := \{(u,i) \mid \exists t \in T : (u,i,t) \in S\}$: all observed posts

2. CNN-PerMLP

Notations

• The collection of all square RGB images: $R = \{R_{i,q} \mid R_{i,q} \in \mathbb{R}^{d \times d \times 3} \wedge i \in I \wedge q \in Q\}$, where $z_i \in \mathbb{R}^m$: the visual features of the i-th image $R_i$, and $Q$: the patches

• The final scores of tags are calculated as follows:

$$y(u,i,t) = \underset{R_{i,q},\, q \in Q}{\mathrm{avg}}\; y'(u, R_{i,q}, t)$$

• Top-K tag list:

$$\hat{T}_{u,i} := \underset{t \in T,\ |\hat{T}_{u,i}| = K}{\arg\max}\; y(u,i,t)$$
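Averaging the per-patch scores and selecting the top-K tags can be sketched as below. The helper name `top_k_tags` is hypothetical, and the per-patch scores $y'$ are assumed to be already computed for one user-image pair.

```python
import numpy as np

def top_k_tags(patch_scores, k):
    """patch_scores: (|Q|, |T|) array of y'(u, R_{i,q}, t) for one (user, image) pair."""
    y = patch_scores.mean(axis=0)   # average over the patches q in Q
    top = np.argsort(-y)[:k]        # indices of the K highest-scoring tags
    return top, y
```

For example, two patches scoring three tags as `[0.1, 0.9, 0.5]` and `[0.3, 0.7, 0.9]` average to `[0.2, 0.8, 0.7]`, so the top-2 list contains tags 1 and 2.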

2. CNN-PerMLP

Convolution Neural Networks

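• The visual features are obtained by passing a patch $q$ of the image $i$ through the CNN feature extractor.

• Convolutional layer

$$\tau^k_{ij} = \phi\Big(b^k + \sum_{a=1}^{p_1} (W^k_a * \xi^a)_{ij}\Big)$$

$\tau^k$: k-th feature map (layer output), $\xi^a$: a-th feature map (layer input), $*$: convolution operator, $\phi$: activation function, $W^k \in \mathbb{R}^{p_1} \times \mathbb{R}^{p_2} \times \mathbb{R}^{p_2}$, $b^k$: weights and bias of the filter for $\tau^k$

• Max pooling operator

$$\tau^k_{ij} = \max_{a,b}\, (\xi^k)_{a,b}, \qquad \xi^k: \text{k-th feature map}$$

• Output: $z^q_i = f_{cnn}(R^q_i) : \mathbb{R}^{d \times d \times 3} \to \mathbb{R}^m$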

2. CNN-PerMLP

Personalized Fully-Connected Layer

• To personalize the visual features of an image, the user’s information (ID) has to be combined with the features from the CNN.

• This layer captures the interaction between the user and each visual feature.

• Input:

  • $z^q_i$: the visual feature vector
  • $\kappa_u \in \{0,1\}^{|U|}$: the sparse vector (user’s features)

• Output (user-aware features):

$$\psi_j(u, z^q_i) = \phi\big(b_j + w^{per}_j \cdot (z^q_i)_j + V_j \kappa_u\big)$$

$w^{per} \in \mathbb{R}^m$: the weights of the visual features, $V \in \mathbb{R}^{m \times |U|}$: the weights of the user features, $\phi$: activation function
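A small sketch of the personalized layer, assuming $\kappa_u$ is a one-hot encoding of the user ID and $\phi$ is ReLU (both assumptions; the slide fixes neither). The name `personalized_layer` is hypothetical.

```python
import numpy as np

def personalized_layer(z, u, w_per, V, b):
    """z: (m,) visual features; u: user index; w_per, b: (m,); V: (m, |U|)."""
    kappa = np.zeros(V.shape[1])
    kappa[u] = 1.0                       # one-hot user vector (assumed encoding)
    pre = b + w_per * z + V @ kappa      # element-wise visual term plus user term
    return np.maximum(0.0, pre)          # phi = ReLU (assumed)
```

With a one-hot $\kappa_u$, the product $V \kappa_u$ simply selects column $u$ of $V$, so each user contributes a learned offset to every visual feature.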

2. CNN-PerMLP

Multilayer Perceptron as the Predictor

• To compute the scores of the tags, MLP is adopted.

• The network has one hidden layer.

• The Neural Network Score ftn :

$$y'(u, R_{i,q}, t_j) = \phi\big(w^{out}_j \cdot \phi(W^{hidden} \psi + b^{hidden}) + b^{out}_j\big)$$

$W^{hidden}, b^{hidden}$: the weights and the biases of the hidden layer; $w^{out}_j \in W^{out}, b^{out}$: the weights and the biases of the output layer

2. CNN-PerMLP

Optimization

• We adopt the Bayesian Personalized Ranking (BPR) optimization criterion.

• BPR finds the model parameters that maximize the difference between the scores of the relevant and irrelevant tags.

Figure: The algorithm of BPR
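Since the BPR algorithm appears here only as a figure, the following is a hedged sketch of the standard BPR criterion (the negative log-sigmoid of the score difference between a relevant and an irrelevant tag). It is not a transcription of the figure, and `bpr_loss` is a hypothetical name.

```python
import numpy as np

def bpr_loss(pos_scores, neg_scores):
    """Mean of -log(sigmoid(pos - neg)) over sampled (relevant, irrelevant) tag pairs."""
    diff = np.asarray(pos_scores, dtype=float) - np.asarray(neg_scores, dtype=float)
    # -log(sigmoid(x)) = log(1 + exp(-x)), computed stably with logaddexp.
    return float(np.mean(np.logaddexp(0.0, -diff)))
```

The loss shrinks as relevant tags are scored further above irrelevant ones, which is the "maximize the difference" behaviour described in the bullets.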


3. Deep Cooperative Neural Network (DeepCoNN)

DeepCoNN (Zheng et al. 2017)

• Joint Deep Modeling of Users and Items using Reviews

• DeepCoNN adopts two parallel CNNs to model user behaviors and item properties from review texts.

• In the shared layer, a Factorization Machine (FM) is applied to capture their interactions for rating prediction.

3. DeepCoNN

DeepCoNN

• DeepCoNN alleviates the sparsity problem and enhances the model interpretability.

• DeepCoNN represents review text using a pre-trained word-embedding technique.

3. DeepCoNN

Architecture

Figure: The architecture of DeepCoNN

3. DeepCoNN

Notations

• Each tuple $(u, i, r_{ui}, w_{ui})$ denotes a review written by user $u$ for item $i$ with rating $r_{ui}$ and review text $w_{ui}$.

• A network for users ($Net_u$): user reviews → $x_u$ (rates)

• A network for items ($Net_i$): item reviews → $y_i$ (rates)

• We focus on $Net_u$ in detail. The same process is applied to $Net_i$.

3. DeepCoNN

Word Representation(Look-up Layer)

• A word embedding $f : M \to \mathbb{R}^n$

• Matrix of word vectors for user $u$:

$$V^u_{1:n} = \phi(d^u_1) \oplus \phi(d^u_2) \oplus \cdots \oplus \phi(d^u_n)$$

$d^u_k$: k-th word of the single document $d^u_{1:n}$, consisting of $n$ words; $\phi(d^u_k) \in \mathbb{R}^c$: look-up function; $\oplus$: the concatenation operator

• The order of words is preserved in the matrix $V^u_{1:n}$.

3. DeepCoNN

CNN Layers . 1) Convolution Layer

• The convolution layer consists of $m$ neurons.

• Each neuron $j$ in the convolutional layer uses a filter $K_j \in \mathbb{R}^{c \times t}$.

• Convolution operation:

$$z_j = f(V^u_{1:n} * K_j + b_j)$$

$*$: convolution operator; $f(x) = \max\{0, x\}$: activation function (ReLU)

3. DeepCoNN

CNN Layers . 2) Max Pooling Layer

• Max pooling captures the most important feature of each feature map.

• The convolutional results are reduced to a fixed-size vector:

$$o_j = \max\{z_1, z_2, \cdots, z_{n-t+1}\}$$

• Output vector of the convolutional layer, using multiple filters:

$$O = \{o_1, o_2, \cdots, o_{n_1}\}, \quad n_1: \text{number of kernels in the convolutional layer}$$

3. DeepCoNN

CNN Layers . 3) Fully Connected Layer

• Output (rates for user u) :

$$x_u = f(W \times O + g), \quad x_u \in \mathbb{R}^{n_2 \times 1}$$

$W$: weight matrix

• $y_i$ can be obtained with the same process.

• Dropout is also applied to prevent overfitting.

3. DeepCoNN

Shared Layer

• This layer maps the features of users and items into the same feature space.

• Concatenate $x_u$ and $y_i$ into a single vector:

$$z = (x_u, y_i)$$

• A Factorization Machine (FM) models all nested variable interactions in $z$.

• The objective function:

$$J = w_0 + \sum_{i=1}^{|z|} w_i z_i + \sum_{i=1}^{|z|} \sum_{j=i+1}^{|z|} \langle v_i, v_j \rangle z_i z_j$$

$w_0$, $w_i$: the global bias and the strength of the i-th variable in $z$

$$\langle v_i, v_j \rangle = \sum_{f=1}^{|z|} v_{i,f}\, v_{j,f}$$
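The FM expression can be evaluated without the explicit double sum by using the identity $\sum_{i<j} \langle v_i, v_j \rangle z_i z_j = \tfrac{1}{2} \sum_f \big[(\sum_i v_{i,f} z_i)^2 - \sum_i v_{i,f}^2 z_i^2\big]$. The sketch below assumes the factor vectors are stored as rows of a matrix `V` with an arbitrary factor dimension; `fm_score` is a hypothetical name.

```python
import numpy as np

def fm_score(z, w0, w, V):
    """z: (n,) concatenated features; w0: scalar; w: (n,); V: (n, k) factor vectors."""
    linear = w0 + w @ z
    s = V.T @ z                                   # (k,): sum_i v_{i,f} z_i for each factor f
    pairwise = 0.5 * np.sum(s ** 2 - (V ** 2).T @ (z ** 2))
    return linear + pairwise
```

This form costs time linear in the number of features per factor instead of quadratic, which matters when $z$ concatenates two dense feature vectors.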


4. Convolutional Matrix Factorization (ConvMF)

ConvMF (Kim et al. 2016)

• Document context-aware recommendation model

• CNN (Convolutional neural network)+ PMF (Probabilistic matrix factorization)

• The CNN output for an item's document serves as the mean of that item's latent vector in PMF, so document context informs rating prediction.

4. ConvMF

Architecture

Figure: The architecture of ConvMF

4. ConvMF

Convolutional neural network(CNN)

• Convolution layer for generating local features

• Pooling layer for representing data as more concise representation

4. ConvMF

Matrix Factorization(MF)

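• Goal: find latent models of users and items in a shared latent space.

• $R \in \mathbb{R}^{N \times M}$: rating matrix ($N$ users, $M$ items)

• $u_i \in \mathbb{R}^k$, $v_j \in \mathbb{R}^k$: latent models of user $i$ and item $j$

• The rating $r_{ij}$ of user $i$ on item $j$ is approximated by the inner product of the corresponding latent models:

$$r_{ij} \approx \hat{r}_{ij} = u_i^T v_j$$

• Minimize the loss function:

$$L = \sum_i^N \sum_j^M I_{ij} (r_{ij} - u_i^T v_j)^2 + \lambda_u \sum_i^N \|u_i\|^2 + \lambda_v \sum_j^M \|v_j\|^2$$

where $I_{ij}$ is 1 if user $i$ rated item $j$ and 0 otherwise.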

4. ConvMF

Probabilistic Model of ConvMF

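• Goal: find user and item latent models $U \in \mathbb{R}^{k \times N}$, $V \in \mathbb{R}^{k \times M}$.

• $U^T V$ reconstructs the rating matrix $R$.

• The conditional distribution over observed ratings is given by

$$p(R \mid U, V, \sigma^2) = \prod_i^N \prod_j^M \mathcal{N}(r_{ij} \mid u_i^T v_j, \sigma^2)^{I_{ij}}$$

where $\mathcal{N}(x \mid \mu, \sigma^2)$ is the p.d.f. of the normal distribution.

• User latent models are given a zero-mean Gaussian prior:

$$p(U \mid \sigma_U^2) = \prod_i^N \mathcal{N}(u_i \mid 0, \sigma_U^2 I)$$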

4. ConvMF

Probabilistic Model of ConvMF

• The item latent model is generated from three variables:

  • internal weights $W$ in the CNN
  • $X_j$ representing the document of item $j$
  • Gaussian noise

• Item latent model:

$$v_j = \mathrm{cnn}(W, X_j) + \epsilon_j, \qquad \epsilon_j \sim \mathcal{N}(0, \sigma_V^2 I)$$

• For each weight $w_k$ in $W$, we place a zero-mean Gaussian prior:

$$p(W \mid \sigma_W^2) = \prod_k \mathcal{N}(w_k \mid 0, \sigma_W^2)$$

• The conditional distribution over item latent models:

$$p(V \mid W, X, \sigma_V^2) = \prod_j^M \mathcal{N}(v_j \mid \mathrm{cnn}(W, X_j), \sigma_V^2 I)$$

where $X$ is the set of description documents of items.

4. ConvMF

CNN

• Goal : Generating document latent vectors from documents of items

• 1) embedding layer, 2) convolution layer, 3) pooling layer, and 4) output layer

Figure: CNN architecture for ConvMF

4. ConvMF

CNN . 1) Embedding Layer

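• A raw document → a dense numeric matrix

• Document: a sequence of $l$ words

• Document matrix:

$$D = \begin{bmatrix} & | & | & | & \\ \cdots & w_{i-1} & w_i & w_{i+1} & \cdots \\ & | & | & | & \end{bmatrix}, \quad D \in \mathbb{R}^{p \times l} \tag{1}$$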

4. ConvMF

CNN . 2) Convolutional Layer

• Convolutional Layer extracts contextual features.

• A contextual feature is extracted by the j-th shared weight $W^j_c \in \mathbb{R}^{p \times ws}$:

$$c^j_i = f(W^j_c * D_{(:,\, i:(i+ws-1))} + b^j_c)$$

$*$: convolution operator, $ws$: window size, $f$: activation function (ReLU)

• Contextual feature vector with $W^j_c$:

$$c^j = [c^j_1, c^j_2, \cdots, c^j_i, \cdots, c^j_{l-ws+1}] \in \mathbb{R}^{l-ws+1}$$

• We use multiple shared weights to capture multiple types of contextual features:

$$W^j_c, \quad j = 1, 2, \cdots, n_c$$

4. ConvMF

CNN . 3) Pooling Layer

• Max-pooling

$$d_f = [\max(c^1), \max(c^2), \cdots, \max(c^j), \cdots, \max(c^{n_c})]$$

4. ConvMF

CNN . 4) Output Layer

• We project $d_f$ onto the k-dimensional space of user and item latent models.

• Document latent vector using a nonlinear projection:

$$s = \tanh(W_{f_2}\{\tanh(W_{f_1} d_f + b_{f_1})\} + b_{f_2})$$

where $W_{f_1} \in \mathbb{R}^{f \times n_c}$, $W_{f_2} \in \mathbb{R}^{k \times f}$ are projection matrices and $b_{f_1} \in \mathbb{R}^f$, $b_{f_2} \in \mathbb{R}^k$ are bias vectors for $W_{f_1}$, $W_{f_2}$, with $s \in \mathbb{R}^k$

• Output (document latent vector of item $j$):

$$s_j = \mathrm{cnn}(W, X_j)$$

$X_j$: the raw document of item $j$, $W$: all the weight and bias variables

4. ConvMF

Optimization

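• To optimize the variables, we use maximum a posteriori (MAP) estimation:

$$\max_{U,V,W}\; p(U,V,W \mid R, X, \sigma^2, \sigma_U^2, \sigma_V^2, \sigma_W^2) = \max_{U,V,W}\; \big[p(R \mid U,V,\sigma^2)\, p(U \mid \sigma_U^2)\, p(V \mid W,X,\sigma_V^2)\, p(W \mid \sigma_W^2)\big]$$

• Taking the negative logarithm gives the loss to minimize:

$$L(U,V,W) = \sum_i^N \sum_j^M \frac{I_{ij}}{2} (r_{ij} - u_i^T v_j)^2 + \frac{\lambda_U}{2} \sum_i^N \|u_i\|^2 + \frac{\lambda_V}{2} \sum_j^M \|v_j - \mathrm{cnn}(W, X_j)\|^2 + \frac{\lambda_W}{2} \sum_k^{|w_k|} \|w_k\|^2$$

where $\lambda_U = \sigma^2/\sigma_U^2$, $\lambda_V = \sigma^2/\sigma_V^2$, and $\lambda_W = \sigma^2/\sigma_W^2$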

4. ConvMF

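Optimization

• We adopt coordinate descent to optimize the variables iteratively:

$$u_i \leftarrow (V I_i V^T + \lambda_U I_K)^{-1} V R_i$$

$$v_j \leftarrow (U I_j U^T + \lambda_V I_K)^{-1} (U R_j + \lambda_V\, \mathrm{cnn}(W, X_j))$$

where $I_i = \mathrm{diag}(I_{ij})$, $j = 1, \cdots, M$, and $R_i$ is the vector $(r_{ij})_{j=1}^M$ for user $i$.

• To optimize $W$, we use the back-propagation algorithm:

$$E(W) = \frac{\lambda_V}{2} \sum_j^M \|v_j - \mathrm{cnn}(W, X_j)\|^2 + \frac{\lambda_W}{2} \sum_k^{|w_k|} \|w_k\|^2 + \text{constant}$$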
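The two closed-form updates can be sketched with NumPy as below. This is a minimal illustration: the observation indicator is applied explicitly to the rating vector (an assumption equivalent to storing unobserved ratings as zeros), `cnn_j` stands for a precomputed $\mathrm{cnn}(W, X_j)$, and the function names are hypothetical.

```python
import numpy as np

def update_user(V, r_i, observed_i, lam_u):
    """V: (k, M) item factors; r_i: (M,) ratings of user i; observed_i: (M,) 0/1 indicator."""
    k = V.shape[0]
    A = (V * observed_i) @ V.T + lam_u * np.eye(k)   # V I_i V^T + lambda_U I_K
    return np.linalg.solve(A, V @ (observed_i * r_i))

def update_item(U, r_j, observed_j, lam_v, cnn_j):
    """U: (k, N) user factors; r_j: (N,) ratings of item j; cnn_j: (k,) CNN output for item j."""
    k = U.shape[0]
    A = (U * observed_j) @ U.T + lam_v * np.eye(k)   # U I_j U^T + lambda_V I_K
    return np.linalg.solve(A, U @ (observed_j * r_j) + lam_v * cnn_j)
```

Solving the linear system with `np.linalg.solve` avoids forming the matrix inverse explicitly; the item update is pulled towards `cnn_j` with strength `lam_v`.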

4. ConvMF

Optimization

• With the optimized $U$, $V$, and $W$, we can finally predict unknown ratings of users on items:

$$r_{ij} \approx E[r_{ij} \mid u_i^T v_j, \sigma^2] = u_i^T v_j = u_i^T (\mathrm{cnn}(W, X_j) + \epsilon_j)$$

4. ConvMF

Result

• ConvMF significantly outperforms the state-of-the-art competitors

• ConvMF deals well with the sparsity problem and skewed data by using contextual information.

• A pre-trained word embedding model increases the performance of ConvMF when the number of ratings is insufficient.

• ConvMF can distinguish subtle contextual differences of the same word via different shared weights.


5. CNN for Image Feature Extraction(VPOI)

Visual Content Enhanced POI recommendation (VPOI) (Wang et al. 2016)

• Goal: recommend $k$ unvisited POIs to each user.

• VPOI incorporates visual contents for POI recommendation.

• Photos reflect users’ interests and provide informative descriptions of locations.

Figure: Example of Images Posted by Users

5. VPOI

Architecture

Figure: The architecture of VPOI

5. VPOI

POI Recommender

• POI recommendation is also called location recommendation.

• POI recommendation focuses on

  • geographical influence
  • social correlations
  • temporal patterns
  • textual content indications

5. VPOI

Notations

• $U = \{u_1, u_2, \cdots, u_n\}$, $L = \{l_1, l_2, \cdots, l_m\}$, $\mathcal{P} = \{p_1, p_2, \cdots, p_N\}$: the sets of users, locations and photos

• $X \in \mathbb{R}^{n \times m}$: user-POI check-in matrix, $X_{ij}$ = frequency or rating of $u_i$ on $l_j$

• $R \in \mathbb{R}^{n \times m}$: normalized version of $X$

Rij = g(Xij), g(x) =1

1 + exp−1

• Pui : the set of images uploaded by user i

• Plj : the set of images that are tagged lj

5. VPOI

Basic POI Recommender

• Probabilistic Matrix Factorization (PMF)

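• A POI recommender is a one-class CF problem, where only positive samples are given.

• The conditional distribution over observed ratings is

$$P(R \mid U, V, \sigma) = \prod_{i=1}^n \prod_{j=1}^m [\mathcal{N}(R_{ij} \mid u_i^T v_j, \sigma^2)]^{Y_{ij}}$$

where $U \in \mathbb{R}^{K \times n}$ and $V \in \mathbb{R}^{K \times m}$ are the latent feature matrices of users and POIs, respectively, and $Y$ is the indicator matrix ($Y_{ij} = 1$ if $R_{ij} > 0$ and 0 otherwise).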

5. VPOI

Basic POI Recommender

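• The user check-in data model is

$$P(U, V \mid R) \propto \prod_{i=1}^n \mathcal{N}(u_i \mid 0, \sigma_u^2 I) \prod_{j=1}^m \mathcal{N}(v_j \mid 0, \sigma_v^2 I) \prod_{i=1}^n \prod_{j=1}^m [\mathcal{N}(R_{ij} \mid u_i^T v_j, \sigma^2)]^{Y_{ij}}$$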

5. VPOI

Extracting and Modeling

• The VGG16 model is chosen.

• For an input image pk , the visual contents are the output of VGG16.We denote it as cnn(pk) .

Figure: The architecture of VGG16 model

5. VPOI

Extracting and Modeling

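• Probability that $p_s$ belongs to $u_i$:

$$P(f_{is} = 1 \mid u_i, p_s) = \frac{\exp(u_i^T \cdot P \cdot CNN(p_s))}{\sum_{p_k \in \mathcal{P}} \exp(u_i^T \cdot P \cdot CNN(p_k))}$$

where $P \in \mathbb{R}^{K \times d}$ is the interaction matrix between the visual contents and latent user features, $\mathcal{P}$ is the set of photos, and $f_{is}$ denotes whether $p_s$ is posted by $u_i$ or not.

• By maximizing $P(f_{is} = 1 \mid u_i, p_s)$ for $p_s \in P_{u_i}$, we force $u_i$ to be similar to the visual contents.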

5. VPOI

Extracting and Modeling

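• Probability that $p_t$ is associated with $l_j$:

$$P(g_{jt} = 1 \mid l_j, p_t) = \frac{\exp(v_j^T \cdot Q \cdot CNN(p_t))}{\sum_{p_k \in \mathcal{P}} \exp(v_j^T \cdot Q \cdot CNN(p_k))}$$

where $Q \in \mathbb{R}^{K \times d}$ is the interaction matrix between the visual contents and latent POI features, $\mathcal{P}$ is the set of photos, and $g_{jt}$ denotes whether $p_t$ is associated with $l_j$ or not.

• By maximizing $P(g_{jt} = 1 \mid l_j, p_t)$ for $p_t \in P_{l_j}$, we force $v_j$ to be similar to the visual contents.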

5. VPOI

Extracting and Modeling

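• The image features:

$$P(F, G \mid \mathcal{P}, U, V, P, Q) = \Big[\prod_{i=1}^n \prod_{p_s \in P_{u_i}} P(f_{is} = 1 \mid u_i, p_s)\Big] \cdot \Big[\prod_{j=1}^m \prod_{p_t \in P_{l_j}} P(g_{jt} = 1 \mid l_j, p_t)\Big]$$

where $F = \{f_{is} : p_s \in P_{u_i}, \forall u_i \in U\}$, $G = \{g_{jt} : p_t \in P_{l_j}, \forall l_j \in L\}$, and $\mathcal{P}$ is the set of photos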

5. VPOI

VPOI Framework

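$$\max_{U,V,P,Q,CNN}\; P(U, V, P, Q \mid R, F, G, \mathcal{P})$$

• The posterior distribution is

$$\begin{aligned} P(U, V, P, Q \mid R, F, G, \mathcal{P}) &\propto P(R, F, G \mid U, V, P, Q, \mathcal{P})\, P(U, V, P, Q \mid \mathcal{P}) \\ &= P(R \mid U, V)\, P(F, G \mid \mathcal{P}, U, V, P, Q)\, P(P)\, P(Q)\, P(U)\, P(V) \end{aligned}$$

where $\mathcal{P}$ is the set of photos and $P$, $Q$ are the interaction matrices.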

5. VPOI

VPOI Framework

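• The VPOI framework can be written as

$$\begin{aligned} \max_{U,V,P,Q,CNN}\; & -\|Y \odot (R - U^T V)\|_F^2 - \lambda_1 (\|U\|_F^2 + \|V\|_F^2) \\ & + \alpha \sum_{i=1}^n \sum_{p_k \in P_{u_i}} \log P(f_{ik} = 1 \mid u_i, p_k) - \lambda_2 \|P\|_F^2 \\ & + \alpha \sum_{j=1}^m \sum_{p_k \in P_{l_j}} \log P(g_{jk} = 1 \mid l_j, p_k) - \lambda_2 \|Q\|_F^2 \end{aligned}$$

where $\lambda_1 = \frac{\sigma^2}{\sigma_u^2} = \frac{\sigma^2}{\sigma_v^2}$, $\lambda_2 = \frac{\sigma^2}{\sigma_p^2} = \frac{\sigma^2}{\sigma_q^2}$ and $\alpha = 2\sigma^2$. $\odot$ is the Hadamard product.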

5. VPOI

Algorithm

Figure: The architecture of VGG16 model

5. VPOI

Result

• VPOI outperforms representative state-of-the-art POI recommender systems.

• The proposed framework alleviates the cold-start problem for recommendation by incorporating images.


6. CNN for Audio Feature Extraction

Deep Content-based Music recommendation (van den Oord et al. 2013)

• We propose to use a latent factor model for recommendation, and to predict the latent factors from music audio when they cannot be obtained from usage data.

6. CNN for Audio Feature Extraction(WMF)

Weighted Matrix Factorization(WMF)

• The Taste Profile Subset contains play counts per song and per user.

• To learn latent factor representations of all users and items, we use WMF.

• $r_{ui}$: play count for user $u$ and song $i$

• Define a preference variable and a confidence variable:

$$p_{ui} = I(r_{ui} > 0),$$

$$c_{ui} = 1 + \alpha \log(1 + \epsilon^{-1} r_{ui}).$$

• Assume the user enjoys the song, if pui = 1.

• cui measures how certain we are about this particular preference.

6. CNN for Audio Feature Extraction

Weighted Matrix Factorization(WMF) (Kim et al. 2016)

• WMF objective function:

$$\min_{x_*, y_*} \sum_{u,i} c_{ui} (p_{ui} - x_u^T y_i)^2 + \lambda \Big(\sum_u \|x_u\|^2 + \sum_i \|y_i\|^2\Big)$$

where $x_u$ is the latent factor vector for user $u$, and $y_i$ is the latent factor vector for song $i$

• It consists of a confidence-weighted MSE and an L2 regularization term.

• ALS optimization method is used.
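To make the preference and confidence definitions concrete, the sketch below evaluates the WMF objective for given factor matrices. It does not implement the ALS solver; the default values of `alpha`, `eps` and `lam` are placeholders chosen for illustration, and `wmf_objective` is a hypothetical name.

```python
import numpy as np

def wmf_objective(R, X, Y, alpha=2.0, eps=1e-6, lam=1e-5):
    """R: (U, I) play counts; X: (U, k) user factors; Y: (I, k) song factors."""
    P = (R > 0).astype(float)              # preference p_ui = I(r_ui > 0)
    C = 1.0 + alpha * np.log1p(R / eps)    # confidence c_ui = 1 + alpha * log(1 + r_ui / eps)
    err = P - X @ Y.T
    return np.sum(C * err ** 2) + lam * (np.sum(X ** 2) + np.sum(Y ** 2))
```

Unplayed songs still contribute with confidence 1, so the model is penalised, weakly, for predicting a preference where no play was observed.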

6. CNN for Audio Feature Extraction

Predicting latent factors from music audio

• Regression problem

• Two methods (to convert music audio signals into a fixed-size representation):

  • Bag-of-words representation
  • Deep CNN

6. CNN for Audio Feature Extraction

Objective functions

• $y_i$: the latent factor vector for song $i$, obtained with WMF

• $y'_i$: the corresponding prediction by the model

• Minimize the MSE:

$$\min_\theta \sum_i \|y_i - y'_i\|^2$$

• Minimize the WPE (weighted prediction error):

$$\min_\theta \sum_{u,i} c_{ui} (p_{ui} - x_u^T y'_i)^2$$

6. CNN for Audio Feature Extraction

Result

• Predicting latent factors from music audio is a viable method for recommending new and unpopular music.

• The deep CNN significantly outperforms the traditional approaches.


7. CNN for Text Feature Extraction

e-Learning Resources Recommendation (Shen et al. 2016)

• Automatic Recommendation Technology for e-Learning Resources with CNN

• Text information: the course introduction or the classroom content, the abstract or full content of the learning resources.

• CNN can be used to predict the latent factors from the text information.

• We predict the rating scores between students and learning resources.

7. CNN for Text Feature Extraction

Architecture

Figure: The architecture of the recommendation algorithm

7. CNN for Text Feature Extraction

Training process

• Language model is employed for the input of CNN.

• LFM(Latent Factor Model) is employed for the output of CNN.

• CNN bridges the semantic gap between text information and the vectors of latent factors.

7. CNN for Text Feature Extraction

Recommendation process

• CNN: the input text information → the features of the learning resource

• We combine it with the student’s preferences.

• The rating score between a student and a learning resource can be predicted.

7. CNN for Text Feature Extraction

Model

• The CNN can be used to predict the latent factors from the text information.

• The input is obtained by a language model from the text information.

• The output is obtained by a latent factor model from the historical rating scores.

7. CNN for Text Feature Extraction

Model - CNN

• Four layers of the CNN

  • a convolutional layer with multiple feature maps
  • a mean-over-time pooling layer
  • an over-time convolutional layer
  • a fully connected layer

Figure: The Construction of CNN

7. CNN for Text Feature Extraction

Model - CNN . 1) convolutional layer

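• $x_i \in \mathbb{R}^k$: k-dimensional word representation of the i-th word

• $x = [x_1, x_2, \cdots, x_n] \in \mathbb{R}^{k \times n}$

$$c_i = f(w \cdot x_i + b)$$

where $w \in \mathbb{R}^k$ is a filter, $b \in \mathbb{R}$ is a bias and $f$ is a non-linear function.

• Feature map: $c = [c_1, c_2, \cdots, c_n] \in \mathbb{R}^n$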

7. CNN for Text Feature Extraction

Model - CNN . 2) mean-overtime pooling layer

• We apply a mean-overtime region pooling operation over the feature map.

• Pooling operation in $\lambda$ regions:

$$b_i = \max\{c_{(i-1) \times (n/\lambda) + 1}, \cdots, c_{i \times (n/\lambda)}\}, \quad i \in [1, \lambda]$$

$$\mathbf{b} = [b_1, b_2, \cdots, b_\lambda]$$
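Region pooling can be sketched as a reshape followed by a reduction. The slide names the layer mean-over-time while the formula takes a maximum, so the reducer is left as a parameter here; `region_pool` is a hypothetical helper and assumes that $n$ is divisible by $\lambda$.

```python
import numpy as np

def region_pool(c, num_regions, reduce=np.max):
    """Split feature map c (length n, divisible by num_regions) into equal regions and reduce each."""
    c = np.asarray(c, dtype=float)
    regions = c.reshape(num_regions, -1)   # row i holds the values of region i
    return reduce(regions, axis=1)
```

Passing `reduce=np.mean` gives the mean-pooled variant; either way the output has length $\lambda$ regardless of the text length $n$.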

7. CNN for Text Feature Extraction

Model - CNN . 3) convolutional layer

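• Feature value: $a = f(w \cdot \mathbf{b} + b)$

where $w \in \mathbb{R}^\lambda$ is a filter, $\mathbf{b}$ is the pooled vector from the previous layer, $b \in \mathbb{R}$ is a bias and $f$ is a non-linear function.

• The process extracts one feature from one filter. The model uses multiple filters to obtain multiple features.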

7. CNN for Text Feature Extraction

Model - CNN . 4) Fully Connected Layer

• Input : The features from previous layer.

• Output is the predicted latent factors


7. CNN for Text Feature Extraction

Model - CNN

• Minimize the mean squared error (MSE) of the predictions:

$$\arg\min_{w,b} \sum_i \|y'_i - y_i\|^2$$

where $y'_i$ is the latent factor vector for article $i$ and $y_i$ is the output of the CNN.

7. CNN for Text Feature Extraction

Model - LFM

• The LFM results represent the features of students’ preferences and learning resources.

Figure: The Process of LFM

7. CNN for Text Feature Extraction

Model - LFM L1R

• We propose a modified matrix factorization method with L1-norm based regularization.

$$J(U, V) = \sum_{ij} (U_{i*} \cdot V_{*j} - r_{ij})^2 + \lambda_1 \|U\|_1 + \lambda_2 \|V\|_1$$

• $U$: the relationship between the students and the latent factors

• $V$: the relationship between the learning resources and the latent factors

• $r_{ij}$: the rating score given by the i-th student to the j-th learning resource

• To minimize it, the split Bregman iteration method is used.
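The L1-regularized objective can be evaluated as below. This only computes $J(U, V)$ and does not implement split Bregman iteration; the optional `observed` mask for missing ratings is an assumption not stated on the slide, and `l1_mf_objective` is a hypothetical name.

```python
import numpy as np

def l1_mf_objective(R, U, V, lam1, lam2, observed=None):
    """R: (n, m) ratings; U: (n, k) student factors; V: (k, m) resource factors."""
    if observed is None:
        observed = np.ones_like(R, dtype=float)   # assumption: sum over all (i, j) pairs
    err = observed * (U @ V - R)                  # U_{i*} . V_{*j} - r_ij on observed entries
    return np.sum(err ** 2) + lam1 * np.abs(U).sum() + lam2 * np.abs(V).sum()
```

Compared with the squared L2 penalty used in the earlier matrix factorization slides, the L1 terms push many entries of $U$ and $V$ towards exactly zero.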

7. CNN for Text Feature Extraction

Model - Language Model

• Topic Model is employed.

• The Latent Dirichlet Allocation (LDA) method is used to train the topic model.

7. CNN for Text Feature Extraction

Result

• It achieves significant improvements over conventional methods.

• It can also work well when the existing recommendation algorithms suffer from the cold-start problem.
