trailmix recsys2018 43 -...
![Page 1: TrailMix RecSys2018 43 - people.tamu.edupeople.tamu.edu/~zhaoxing623/slides/TrailMix_RecSys2018_43.pdf · DatasetStatistics Items Quantity Proportion Playlists 1,000,000 UniqueTracks](https://reader034.vdocuments.net/reader034/viewer/2022051807/60049b3df140345a334fc5dd/html5/thumbnails/1.jpg)
Xing Zhao, Qingquan Song, James Caverlee and Xia Hu
Department of Computer Science and Engineering
Texas A&M University, USA
Dataset Statistics

| Items | Quantity | Proportion of positive samples |
|---|---|---|
| Playlists | 1,000,000 | |
| Unique tracks | 2,262,292 | 100% |
| Unique tracks (freq ≥ 5) | 599,341 | 96.05% |
| Unique tracks (freq ≥ 100) | 70,229 | 80.67% |
| Unique albums | 734,684 | |
| Unique artists | 295,860 | |
Figure: number of remaining tracks (×10^6) and cumulative share of positive samples taken up, versus the number of times a track appears in the training data (axis from 1 to 40,000).
Therefore, in some parts of our methods, we only consider these frequent tracks for training.
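The frequency cut-offs in the table can be reproduced on any playlist collection by counting how often each track occurs. The sketch below is a minimal illustration on hypothetical toy data: for each threshold it reports how many unique tracks survive and what share of all playlist-track occurrences (positive samples) they cover. The function name `frequency_coverage` is our own.

```python
from collections import Counter


def frequency_coverage(playlists, thresholds):
    """For each threshold k, return (number of unique tracks that occur at
    least k times, share of all playlist-track occurrences they cover)."""
    counts = Counter(track for playlist in playlists for track in playlist)
    total = sum(counts.values())
    result = {}
    for k in thresholds:
        kept = [c for c in counts.values() if c >= k]
        result[k] = (len(kept), sum(kept) / total)
    return result


# Hypothetical toy playlists (track IDs only).
toy = [["a", "b", "c"], ["a", "b"], ["a", "d"], ["a", "b", "e"]]
print(frequency_coverage(toy, [1, 2, 4]))
# {1: (5, 1.0), 2: (2, 0.7), 4: (1, 0.4)}
```

Raising the threshold shrinks the track set much faster than it shrinks the covered positive samples, which is the pattern the table shows at full scale.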
Our Method - TrailMix
DNCF and C-Tree: Playlist Continuation, for Tasks 2 to 10
CC-Title: Cold Start, for Task 1
CC-Title: Context Clustering using Title
Figure: a word-track matrix with 9,817 words as rows and 2,262,292 tracks as columns. Each cell counts playlists; e.g., a value of 6 for track i and word j means that track i appears in 6 playlists whose titles contain word j. The matrix is normalized and the words are clustered, giving pairs Word list 1: Track list 1, Word list 2: Track list 2, Word list 3: Track list 3, …. A new title (e.g., Pop Punk 2018 Summer) is pre-processed, and the tracks of the matching clusters are recommended.
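A minimal sketch of how the counts in the word-track matrix could be built, assuming playlists are given as (title, track list) pairs. The tiny stop-word list and the regex tokenizer are hypothetical simplifications (there is no stemming here), and the matrix is stored sparsely as a dict of `Counter`s rather than as a dense 9,817 x 2,262,292 array.

```python
import re
from collections import Counter, defaultdict

# Hypothetical, tiny stop-word list; a real pipeline would use a fuller list
# plus a stemmer, as the pre-processing step describes.
STOP_WORDS = {"the", "a", "my", "of", "and"}


def normalize_title(title):
    """Lowercase, keep only ASCII letters/digits (drops emoji and
    punctuation), and remove stop words."""
    tokens = re.findall(r"[a-z0-9]+", title.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def word_track_counts(playlists):
    """playlists: iterable of (title, track_ids).
    Returns {word: Counter({track: number of playlists that contain the
    track and whose title contains the word})}."""
    matrix = defaultdict(Counter)
    for title, tracks in playlists:
        unique_tracks = set(tracks)  # count each playlist once per track
        for word in set(normalize_title(title)):
            matrix[word].update(unique_tracks)
    return matrix


toy = [
    ("Pop Punk!!", ["t1", "t2"]),
    ("my pop playlist", ["t1", "t3"]),
    ("Punk rock", ["t2", "t4"]),
]
m = word_track_counts(toy)
print(m["pop"]["t1"], m["punk"]["t2"], m["rock"]["t4"])  # 2 2 1
```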
| Items | Quantity |
|---|---|
| Unique titles | 92,944 |
| Unique normalized titles | 17,381 |
| Unique non-stop normalized words | 9,817 |
| Playlists without title after processing | 22,921 |
Steps:
1. Preprocessing: stemming and removing stop words, emoji, punctuation, etc.
2. Building a word-track matrix of size 9,817 x 2,262,292.
3. Normalizing cells using IDF.
4. Clustering words based on row similarity.
5. Recommending the tracks in each cluster for a new title.
CC-Title: Cont.
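Steps 2 to 5 can be sketched in plain Python on a toy count matrix. Several assumptions are not stated on the slides: IDF treats each word row as a document and each track as a term, word similarity is the cosine similarity of the IDF-weighted rows, and clustering is a simple greedy threshold scheme standing in for whatever clustering algorithm the authors used. All function names are hypothetical.

```python
import math
from collections import Counter


def idf_normalize(matrix):
    """matrix: {word: {track: count}}. Weight each track by
    log(#words / #words whose row contains the track)."""
    n_words = len(matrix)
    df = Counter(track for row in matrix.values() for track in row)
    return {word: {t: c * math.log(n_words / df[t]) for t, c in row.items()}
            for word, row in matrix.items()}


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def cluster_words(rows, threshold=0.5):
    """Greedy clustering: a word joins the first cluster whose seed word's
    row has cosine similarity >= threshold, else it starts a new cluster."""
    clusters = []  # clusters[i][0] is the seed word of cluster i
    for word in sorted(rows):
        for cluster in clusters:
            if cosine(rows[word], rows[cluster[0]]) >= threshold:
                cluster.append(word)
                break
        else:
            clusters.append([word])
    return clusters


def recommend_for_title(title_words, rows, clusters, k=3):
    """Sum the rows of all words in clusters sharing a word with the
    (already normalized) new title; return the top-k tracks."""
    scores = Counter()
    for cluster in clusters:
        if set(cluster) & set(title_words):
            for word in cluster:
                scores.update(rows[word])
    return [t for t, _ in scores.most_common(k)]


matrix = {"pop": {"t1": 3, "t2": 2}, "dance": {"t1": 2, "t2": 2},
          "punk": {"t3": 4, "t4": 1}, "rock": {"t3": 2, "t4": 2}}
rows = idf_normalize(matrix)
clusters = cluster_words(rows)
print(clusters)  # [['dance', 'pop'], ['punk', 'rock']]
print(recommend_for_title(["pop", "summer"], rows, clusters, k=2))  # ['t1', 't2']
```

Because clustering happens once over words, answering a new title only touches the rows of its clusters, which is in line with the efficiency claim in the highlights.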
Highlights:
1. CC-Title handles large-scale matrix computation with high efficiency.
2. In some cases (clusters), the performance is very good.
CC-Title: Cont.
Pros:
1. Simple and generic.
2. Ensembles the advantages of the basic matrix factorization model and an MLP.
Cons: Computationally too inefficient to be applied directly to the target problem, due to the huge item scope and the sparsity of the matrix.
DNCF: Decorated Neural Collaborative Filtering
He et al., “Neural Collaborative Filtering”, WWW, 2017.
Neural Collaborative Filtering
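For reference, the NCF fusion model (NeuMF in He et al., 2017) combines a generalized matrix factorization (GMF) branch with an MLP branch. The NumPy sketch below shows only the forward pass, with random, untrained weights; playlists play the role of users and tracks the role of items. Layer sizes and the initialization scale are arbitrary choices, and the DNCF modifications are not included.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class TinyNeuMF:
    """Forward pass of a NeuMF-style model: a GMF branch (element-wise product
    of playlist and track embeddings) and an MLP branch (concatenated
    embeddings through ReLU layers), fused by a linear layer and a sigmoid.
    Weights are random and untrained."""

    def __init__(self, n_playlists, n_tracks, dim=8, hidden=(16, 8), seed=0):
        rng = np.random.default_rng(seed)
        self.P_gmf = rng.normal(0.0, 0.1, (n_playlists, dim))
        self.Q_gmf = rng.normal(0.0, 0.1, (n_tracks, dim))
        self.P_mlp = rng.normal(0.0, 0.1, (n_playlists, dim))
        self.Q_mlp = rng.normal(0.0, 0.1, (n_tracks, dim))
        sizes = [2 * dim, *hidden]
        self.layers = [
            (rng.normal(0.0, 0.1, (n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])
        ]
        # Output weights act on [GMF vector, last MLP layer].
        self.h = rng.normal(0.0, 0.1, dim + hidden[-1])

    def predict(self, playlists, tracks):
        """Return an interaction probability for each (playlist, track) pair."""
        gmf = self.P_gmf[playlists] * self.Q_gmf[tracks]
        x = np.concatenate([self.P_mlp[playlists], self.Q_mlp[tracks]], axis=1)
        for W, b in self.layers:
            x = np.maximum(x @ W + b, 0.0)  # ReLU
        return sigmoid(np.concatenate([gmf, x], axis=1) @ self.h)


model = TinyNeuMF(n_playlists=3, n_tracks=5)
scores = model.predict(np.array([0, 0, 1]), np.array([1, 4, 2]))
print(scores.shape)  # (3,)
```

Scoring every playlist against all 2,262,292 tracks with such a model is what makes plain NCF expensive here.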
DNCF: Cont.
Two modifications to address the efficiency issue:
Training Phase: Constrained Negative Sampling.
Testing Phase: Constrained Recommendation with Reordering.
1. Constrain the negative sampling space to the tracks that appear at least 100 times in the training data.
2. Positive samples still come from the whole dataset during training, to preserve feasible embeddings and predictions for all the test data (Tasks 2 to 10).
Training Phase: Constrained Negative Sampling.
DNCF: Cont.
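A minimal sketch of the constrained negative sampling, on hypothetical toy data: positives are all (playlist, track) pairs, while negatives are drawn only from tracks meeting the frequency threshold (100 on the slide; lowered to 2 in the toy example so that some tracks qualify). The number of negatives per positive (`n_neg`) and uniform sampling are assumptions.

```python
import random
from collections import Counter


def build_training_pairs(playlists, min_count=100, n_neg=4, seed=0):
    """Return (playlist_index, track, label) triples.

    Positives (label 1): every (playlist, track) pair, however rare the track.
    Negatives (label 0): sampled uniformly, with replacement, only from tracks
    occurring at least `min_count` times and absent from the playlist."""
    rnd = random.Random(seed)
    counts = Counter(t for pl in playlists for t in pl)
    popular = sorted(t for t, c in counts.items() if c >= min_count)
    samples = []
    for pid, pl in enumerate(playlists):
        in_playlist = set(pl)
        candidates = [t for t in popular if t not in in_playlist]
        for track in pl:
            samples.append((pid, track, 1))
            if candidates:
                for neg in rnd.choices(candidates, k=n_neg):
                    samples.append((pid, neg, 0))
    return samples


# Hypothetical toy data; threshold lowered to 2 so some tracks qualify.
toy = [["a", "b", "x"], ["a", "c"], ["a", "b"], ["b", "c", "y"]]
pairs = build_training_pairs(toy, min_count=2, n_neg=2)
print(sum(label for _, _, label in pairs))  # 10
```

Filtering candidates per playlist is fine for a toy, but at the scale of 1,000,000 playlists a cheaper scheme (for example rejection sampling from the popular set) would be needed.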
Testing Phase: Constrained Recommendation with Reordering.
1. Constrain the recommendation space by recommending only popular tracks (appearing at least 100 times) during the testing phase, for a more targeted prediction.
2. Reorder the predicted 500 tracks with an ensemble trick that leverages two types of predictions provided by the Word2Vec embedding.
Figure: three columns, DNCF, Word2Vec (1), and Word2Vec (2), labeled L1, L2, L3 and φ1, φ2, φ3; the figure also shows the expression φ1 \ L1 ∪ L2 ∪ L3.
DNCF: Cont.
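The testing-phase constraint can be sketched as a filter plus a top-n cut. The reordering step below is a hypothetical stand-in, not the authors' exact ensemble trick: candidates that also appear in more of the Word2Vec-based lists move forward, and ties keep the DNCF order because Python's `sorted` is stable. Scores, lists, and function names are illustrative.

```python
def constrained_top_n(scores, popular, seeds, n=500):
    """Keep only popular, non-seed tracks and return the n best by model score."""
    allowed = [t for t in scores if t in popular and t not in seeds]
    return sorted(allowed, key=lambda t: scores[t], reverse=True)[:n]


def rerank_by_support(candidates, other_lists):
    """Hypothetical re-ranking: tracks that appear in more of the other lists
    move forward; ties keep the incoming order (sorted() is stable)."""
    other_sets = [set(lst) for lst in other_lists]
    support = {t: sum(t in s for s in other_sets) for t in candidates}
    return sorted(candidates, key=lambda t: -support[t])


# Illustrative DNCF-style scores and two Word2Vec-style candidate lists.
scores = {"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.6, "rare": 0.95}
top = constrained_top_n(scores, popular={"a", "b", "c", "d"}, seeds={"d"}, n=3)
print(top)  # ['a', 'b', 'c']
print(rerank_by_support(top, [["c", "b"], ["c"]]))  # ['c', 'b', 'a']
```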
Highlights:
1. Results increase steadily with the number of seed tracks, with maximum performance at 25 seed tracks;
2. It performs better for playlists with random seed tracks (R) than for playlists with sequential seed tracks.
DNCF: Result
C-Tree: Constructed Tree
A playlist has:
1. A natural tree structure: a playlist consists of different tracks, and each track always belongs to a specific album by an artist;
2. Meaningful clusters: the tracks in a playlist always share some latent similarity, such as genre, style, listening context, etc.
Phylogenetic Tree. (Source: https://www.creative-biostructure.com/custom-phylogenetic-tree-construction-service-399.htm)
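The tree view of a playlist can be represented as a nested mapping with the playlist as the root, artists and albums as internal nodes, and tracks as leaves. The sketch assumes each track comes with hypothetical album and artist IDs; it only illustrates the data structure.

```python
from collections import defaultdict


def build_playlist_tree(tracks):
    """tracks: iterable of (track_id, album_id, artist_id).
    Returns {artist: {album: set of tracks}}: the playlist is the root,
    artists and albums are internal nodes, and tracks are the leaves."""
    tree = defaultdict(lambda: defaultdict(set))
    for track, album, artist in tracks:
        tree[artist][album].add(track)
    return tree


def tree_size(tree):
    """Return (number of artists, albums, tracks) in the tree."""
    n_albums = sum(len(albums) for albums in tree.values())
    n_tracks = sum(len(ts) for albums in tree.values() for ts in albums.values())
    return len(tree), n_albums, n_tracks


# Hypothetical IDs.
playlist = [
    ("t1", "album_a", "band_x"), ("t2", "album_a", "band_x"),
    ("t3", "album_b", "band_x"), ("t4", "album_c", "band_y"),
]
print(tree_size(build_playlist_tree(playlist)))  # (2, 3, 4)
```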
A Real Example (PID: 11548):
• Playlist Title: Pop Puck
• 48 tracks belonging to 12 albums by 5 artists (2 rock bands and 3 pop punk bands)
Figure labels: Pop punk band; Rock band
How do we compare the internal relationship?
How do we compare it with another tree (external)?
C-Tree: Cont.
Training Data: Complete Tree
Testing Data: Incomplete Tree
External comparison
Incomplete Tree: a playlist that contains only part of its tracks (the seeds) and is waiting for recommendations.
C-Tree: Cont.
Steps:
1. Building the forest: 1 million complete trees;
2. Comparing and normalizing the distance between the incomplete tree T-test and each complete tree T-train;
3. Recommending the tracks (leaves) from each T-train to the incomplete tree T-test, based on the score of each leaf.
Figure: a forest of playlist trees (Playlist 1, Playlist 2, Playlist 3, Playlist 4, …, Playlist n).
C-Tree: Cont.
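The three steps can be sketched as follows. The slides do not give the distance function, so this uses a hypothetical one: a weighted overlap between the seed tree and each training tree at the track, album, and artist levels (the values in `WEIGHTS` are arbitrary), normalized over the forest so the similarities sum to 1. Each unseen leaf is then scored by the total normalized similarity of the trees that contain it.

```python
from collections import Counter

# Hypothetical level weights: a shared track counts more than a shared album,
# which counts more than a shared artist.
WEIGHTS = {"track": 1.0, "album": 0.5, "artist": 0.25}


def levels(tracks, meta):
    """meta maps track -> (album, artist); return the node set at each level."""
    return {
        "track": set(tracks),
        "album": {meta[t][0] for t in tracks},
        "artist": {meta[t][1] for t in tracks},
    }


def similarity(test_lv, train_lv):
    """Weighted overlap per level, each divided by the test tree's level size."""
    return sum(
        w * len(test_lv[lvl] & train_lv[lvl]) / len(test_lv[lvl])
        for lvl, w in WEIGHTS.items()
        if test_lv[lvl]
    )


def recommend_from_forest(seed_tracks, forest, meta, k=3):
    """Score every unseen leaf by the normalized similarity of its trees."""
    test_lv = levels(seed_tracks, meta)
    sims = [similarity(test_lv, levels(pl, meta)) for pl in forest]
    total = sum(sims) or 1.0
    scores = Counter()
    for sim, pl in zip(sims, forest):
        if sim <= 0:
            continue  # unrelated tree: contributes nothing
        for leaf in set(pl) - set(seed_tracks):
            scores[leaf] += sim / total
    return [t for t, _ in scores.most_common(k)]


meta = {"t1": ("al1", "ar1"), "t2": ("al1", "ar1"), "t3": ("al2", "ar1"),
        "t4": ("al3", "ar2"), "t5": ("al4", "ar3")}
forest = [["t1", "t2", "t3"], ["t4", "t5"], ["t1", "t2", "t4"]]
print(recommend_from_forest(["t1"], forest, meta, k=1))  # ['t2']
```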
C-Tree: Result
Highlights:
1. Results increase steadily with the number of seed tracks, with maximum performance at 25 seed tracks;
2. It performs better for playlists with random seed tracks (R) than for playlists with sequential seed tracks.
TrailMix: Ensemble Model
Figure labels: CC-Title; A DNCF, B DNCF; A C-Tree, B C-Tree; Num_handout; Method 1, Method 2; Final Recommendation.
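The slide does not explain how Num_handout is used. One plausible reading, assumed here purely for illustration, is that it sets how many top positions one method's ranked list receives before a second method fills the remaining slots; duplicates are skipped, and the first list back-fills if the second runs out. The 500-track cap mirrors the 500 predicted tracks mentioned for DNCF; the function name is hypothetical.

```python
def merge_with_handout(first, second, num_handout, total=500):
    """Hypothetical merge: the top `num_handout` slots come from `first`, the
    rest from `second`, and `first` back-fills if `second` runs out.
    Duplicates are skipped; at most `total` tracks are returned."""
    merged, seen = [], set()

    def take(source, limit):
        for track in source:
            if len(merged) >= limit:
                break
            if track not in seen:
                seen.add(track)
                merged.append(track)

    take(first, min(num_handout, total))
    take(second, total)
    take(first, total)
    return merged


print(merge_with_handout(["a", "b", "c", "d"], ["b", "x", "y"], num_handout=2, total=5))
# ['a', 'b', 'x', 'y', 'c']
```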
Experiment and Result
Experiment Setting:
• Training 80%, testing 20%: cross-validation for hyperparameter tuning
• Testing data strictly follows the rules designed by the RecSys Challenge 2018
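A minimal sketch of an 80/20 playlist split and of turning a held-out playlist into a seeded test case with either sequential (first-n) or random seed tracks, matching the R and sequential variants discussed in the results. The function names and fixed random seeds are assumptions; the full challenge rules (e.g., which titles and seed counts appear in each test category) are not reproduced here.

```python
import random


def split_playlists(playlists, test_frac=0.2, seed=0):
    """Shuffle playlists and return (train, test) with about test_frac held out."""
    rnd = random.Random(seed)
    order = list(range(len(playlists)))
    rnd.shuffle(order)
    n_test = int(len(order) * test_frac)
    test = [playlists[i] for i in order[:n_test]]
    train = [playlists[i] for i in order[n_test:]]
    return train, test


def make_seeded(playlist, n_seeds, randomly=False, seed=0):
    """Split one test playlist into (seed_tracks, held_out_tracks).
    Sequential seeds are the first n_seeds tracks; random seeds are n_seeds
    positions sampled without replacement. Both keep playlist order."""
    if randomly:
        picked = set(random.Random(seed).sample(range(len(playlist)), n_seeds))
    else:
        picked = set(range(n_seeds))
    seeds = [t for i, t in enumerate(playlist) if i in picked]
    held_out = [t for i, t in enumerate(playlist) if i not in picked]
    return seeds, held_out


train, test = split_playlists([[f"t{i}"] for i in range(10)])
print(len(train), len(test))  # 8 2
print(make_seeded(["a", "b", "c", "d", "e"], 2))  # (['a', 'b'], ['c', 'd', 'e'])
```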
Thank you!
Q&A