Cross-domain Recommender
System Through Tag-based
Models
Peng Hao
A thesis submitted for the Degree of
Doctor of Philosophy
Faculty of Engineering and Information Technology
University of Technology Sydney
March 2018
CERTIFICATE OF
AUTHORSHIP/ORIGINALITY
I certify that the work in this thesis has not previously been submitted for a
degree nor has it been submitted as part of the requirements for a degree except as
fully acknowledged within the text.
I also certify that the thesis has been written by me. Any help that I have received
in my research work and the preparation of the thesis itself has been acknowledged.
In addition, I certify that all information sources and literature used are indicated
in the thesis.
Signature of candidate:
Date:
05/04/2018
Production Note:
Signature removed prior to publication.
Acknowledgements
This thesis is the result of four years' hard work, during which I received a great
deal of help from many people. Here I would like to express my gratitude to all of
them.
First, I would like to thank my principal supervisor, A/Prof. Guangquan
Zhang, for offering me an opportunity to conduct my research in the Decision
Systems and e-Service Intelligence (DeSI) Lab, University of Technology Sydney
(UTS), Australia. I can still remember the excitement I had when I got the official
offer from UTS. At the beginning of the research, he encouraged me to choose
the topic that I am interested in. He also taught me how to approach a research
problem in general, and was always enthusiastic to help solve the difficulties in
my life. I would also like to express my thanks to my co-supervisor, Distinguished
Professor Jie Lu. I have learned a lot from her over these years, not only the
methodology of doing research but also the skills of writing a scientific paper.
Her comments and suggestions have strengthened this thesis significantly. Her
continuous hard work and generous personality have influenced me deeply, and
will be a great treasure in my future research and work. I feel very lucky to
have both of them as my supervisors. Without their excellent supervision and
continuous encouragement, this research could not be finished on time.
Special thanks also go to Prof. Luis Martinez for welcoming me and providing
all the necessary help during my visit to the University of Jaen. I spent a
wonderful time collaborating with him. The beautiful scenery and quiet life
there make me want to visit again in the future.
I would also like to express my appreciation to all the members of the DeSI Lab in the
Centre of Artificial Intelligence (CAI), for their active participation and valuable
comments in every presentation I made during my study.
I also wish to acknowledge the financial support I received from the China
Scholarship Council (CSC) and UTS, which enabled me to complete my study.
Finally, I would like to thank my family members, including my wife, father,
mother and mother-in-law. Without their great love, constant encouragement
and infinite compassion, my dream of pursuing a Ph.D. degree would not have been achieved.
Very special thanks to them all!
Abstract
Nowadays, data pertaining to clients are generated at such a rapid rate that
processing them is completely beyond human ability, a problem known as
information explosion. How to quickly and automatically provide personalized
choices for someone from a large collection of resources has become a key
factor in determining the success of many commercial activities. In this context,
recommender systems have been developed as a type of software that aims to
predict and suggest items which are relevant to a specific user by analyzing the
user’s previous interaction data with certain items. Recommender systems have
broad applications in our daily life, such as product recommendation on Amazon,
video and movie recommendation on YouTube, and music recommendation on Spotify.
A fundamental brick in building most recommender systems is the collaborative
filtering-based model, which has been widely adopted due to its outstanding
performance and flexible deployment. However, this model and its
variations suffer from the so-called data sparsity problem, which results when
users only rate a limited number of items. With the development of the transfer
learning technique in recent years, cross-domain recommendation has emerged as
an effective way to address data sparsity in recommender systems. The principle
of cross-domain recommendation is to exploit knowledge from auxiliary source
domains to assist recommendation making in a sparse target domain.
In the development of cross-domain recommender systems, the most important
step is to build a bridge between the domains in order to transfer knowledge. This
task becomes more challenging in disjoint domains where users and items in both
domains are completely non-overlapping. In this respect, tags are studied and
utilized to establish explicit correspondence between domains. However, how to
effectively exploit tags to increase domain overlap and ultimately recommendation
quality remains an open challenge which needs to be addressed.
This thesis aims to develop novel tag-based cross-domain recommendation
models in disjoint domains. First, it reviews the existing state-of-the-art techniques
related to this research. It then provides three solutions by exploiting domain-
specific tags, tag-inferred structural knowledge and tag semantics, respectively.
To evaluate the proposed models, this thesis conducts a series of experiments
on public datasets and compares them with state-of-the-art baseline approaches.
The experimental results show the superior performance achieved by our models
in different recommendation tasks under sparse settings. The findings of this
research not only contribute to the state-of-the-art on cross-domain recommender
systems, but also provide practical guidance for handling unstructured tag data
in recommendation tasks.
Table of contents
CERTIFICATE OF AUTHORSHIP/ORIGINALITY i
Acknowledgements ii
Abstract iv
List of figures xi
List of tables xiii
1 Introduction 1
1.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Research questions and objectives . . . . . . . . . . . . . . . . . . 5
1.3 Research contributions . . . . . . . . . . . . . . . . . . . . . . . . 9
1.4 Thesis structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.5 Publications related to this thesis . . . . . . . . . . . . . . . . . . 15
2 Research literature 17
2.1 Recommender systems . . . . . . . . . . . . . . . . . . . . . . . . 18
2.1.1 Recommendation problem . . . . . . . . . . . . . . . . . . 18
2.1.2 Classification of recommender systems . . . . . . . . . . . 20
2.1.2.1 Collaborative filtering . . . . . . . . . . . . . . . 22
2.1.2.2 Content-based recommender systems . . . . . . . 25
2.1.2.3 Hybrid recommender systems . . . . . . . . . . . 27
2.1.2.4 Context-aware recommender systems . . . . . . . 27
2.1.2.5 Deep learning based recommender systems . . . . 28
2.2 Transfer learning . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
2.2.1 Definition of transfer learning . . . . . . . . . . . . . . . . 31
2.2.2 Classification of transfer learning techniques . . . . . . . . 32
2.2.2.1 Inductive transfer learning . . . . . . . . . . . . . 32
2.2.2.2 Transductive transfer learning . . . . . . . . . . . 34
2.2.2.3 Unsupervised transfer learning . . . . . . . . . . 36
2.3 Cross-domain recommender system . . . . . . . . . . . . . . . . . 37
2.3.1 Definition of cross-domain recommender system . . . . . . 38
2.3.2 Classification of cross-domain recommendation approaches 39
2.3.2.1 Cross-domain recommendation for partially/fully
overlapping domains . . . . . . . . . . . . . . . . 40
2.3.2.2 Cross-domain recommendation for non-overlapping
domains . . . . . . . . . . . . . . . . . . . . . . . 41
3 Exploiting Domain Specific Tags for Cross-domain Recommen-
dation 45
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
3.2 Preliminary knowledge . . . . . . . . . . . . . . . . . . . . . . . . 47
3.3 Enhanced tag-induced cross domain collaborative filtering . . . . 49
3.3.1 The alignment of domain-specific tags . . . . . . . . . . . . 49
3.3.2 Cross-domain similarities refinement . . . . . . . . . . . . 52
3.3.3 Model and inference . . . . . . . . . . . . . . . . . . . . . 53
3.4 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
3.4.1 Description of dataset and experimental settings . . . . . . 58
3.4.2 Impact of parameters . . . . . . . . . . . . . . . . . . . . . 60
3.4.3 Performance comparison . . . . . . . . . . . . . . . . . . . 60
3.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
4 Exploiting Tag-induced Structural Information for Cross-domain
Recommendation 68
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
4.2 Notations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
4.3 Complete Tag Induced Cross-domain Recommendation . . . . . . 71
4.3.1 Step 1: Building basic inter-domain correlations using shared
tags . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
4.3.2 Step 2: Enhancing inter-domain correlations using domain-
specific tag clusters . . . . . . . . . . . . . . . . . . . . . . 76
4.3.3 Step 3: Inferring intra-domain correlations from tags in
individual domains . . . . . . . . . . . . . . . . . . . . . . 81
4.3.4 Step 4: Aggregation and Integration of Inter- and intra-
domain knowledge . . . . . . . . . . . . . . . . . . . . . . 84
4.4 Complexity analysis . . . . . . . . . . . . . . . . . . . . . . . . . . 87
4.5 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
4.5.1 Datasets . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
4.5.2 Experiment Setup . . . . . . . . . . . . . . . . . . . . . . . 90
4.5.2.1 Evaluation Methodology . . . . . . . . . . . . . . 90
4.5.2.2 Evaluation Metric . . . . . . . . . . . . . . . . . 92
4.5.2.3 Experimental Protocol . . . . . . . . . . . . . . . 93
4.6 Parameter Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . 94
4.7 Impact of latent factors . . . . . . . . . . . . . . . . . . . . . . . . 95
4.8 Sensitivity analysis on Top-k Recommendation . . . . . . . . . . . 98
4.9 Performance Comparison . . . . . . . . . . . . . . . . . . . . . . . 103
4.10 Performance under Different Sparsity Level . . . . . . . . . . . . . 106
4.11 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
5 Exploiting Tag Semantic for Cross-domain Recommendation 110
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
5.2 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
5.2.1 Notations and Problem Formulation . . . . . . . . . . . . 112
5.2.2 Tag-induced Cross Domain Collaborative Filtering Model . 114
5.3 Tag Semantically-boosted Cross-domain Recommendation . . . . 115
5.3.1 Joint Topic Mining . . . . . . . . . . . . . . . . . . . . . . 116
5.3.2 Topic Alignment . . . . . . . . . . . . . . . . . . . . . . . 121
5.3.3 Embedding space Learning . . . . . . . . . . . . . . . . . . 123
5.3.4 Optimization . . . . . . . . . . . . . . . . . . . . . . . . . 126
5.4 Experiments and analysis . . . . . . . . . . . . . . . . . . . . . . 127
5.4.1 Dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
5.4.2 Experiment Setup . . . . . . . . . . . . . . . . . . . . . . . 128
5.4.3 Evaluation Metrics . . . . . . . . . . . . . . . . . . . . . . 130
5.4.4 Baselines . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
5.4.5 Experiment Results and Analysis . . . . . . . . . . . . . . 133
5.4.5.1 The Effect of Regularization Parameters . . . . . 133
5.4.5.2 The Effect of Tag Clusters . . . . . . . . . . . . . 136
5.4.5.3 Comparison with the Baselines . . . . . . . . . . 136
5.4.5.4 The Impact of Recommendation List Size . . . . 138
5.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
6 Conclusions and future work 142
6.1 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
6.2 Future work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
Bibliography 147
Abbreviations 188
List of figures
1.1 Thesis structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.1 A graphical illustration of the recommendation problem . . . . . . 19
3.1 A scenario for tag-based cross-domain recommendation. In this fig-
ure, we aim to exploit knowledge from a movie domain to bootstrap
book recommendation. The unobserved rating score is denoted by
? and each tag text starts with #. . . . . . . . . . . . . . . . . . . 48
3.2 MAE and RMSE variations via changing α and β . . . . . . . . . 61
4.1 Workflow and components of CTagCDR model. . . . . . . . . . . 74
4.2 Example of tag tripartite graph constructed based on user-tag
relationship. Red squares denote shared tags in both domains,
while the green triangles and blue circles denote the filtered domain-
specific tags from both source and target domains, respectively.
The edge weight reflects the similarity between the connected tags. 80
4.3 Impact of λu and λv on the recommendation performance of CTagCDR 96
4.4 Impact of λα on the recommendation performance of CTagCDR . 97
4.5 Performance of RMSE and NDCG@10 on LT vs ML and ML vs
LT w.r.t. the number of latent factors . . . . . . . . . . . . . . . . 99
4.5 Performance of RMSE and NDCG@10 on FM vs LT and FM vs
ML w.r.t. the number of latent factors . . . . . . . . . . . . . . . 100
4.5 Performance of RMSE and NDCG@10 on LT vs FM and ML vs
FM w.r.t. the number of latent factors . . . . . . . . . . . . . . . 101
4.6 Performance of NDCG@k w.r.t. the ranking position k of ranking list 102
4.7 Change of recommendation performance on ML vs LT during the
increment of train data size . . . . . . . . . . . . . . . . . . . . . 107
5.1 An example of ambiguous, redundant and non-identical but seman-
tically equivalent tags . . . . . . . . . . . . . . . . . . . . . . . . . 115
5.2 Graphical illustration of joint topic mining and topic alignment . 117
5.3 Modelling tagging data for word2vec. The tag marked by red color
denotes an overlapping tag in both domains. . . . . . . . . . . . . 124
5.4 Performance of HR@10 and NDCG@10 w.r.t. λu and λi . . . . . . 134
5.5 Performance of HR@10 and NDCG@10 w.r.t. number of tag clusters 135
5.6 Performance of top-N recommendation in terms of HR@N where
N ranges from 10 to 50 . . . . . . . . . . . . . . . . . . . . . . . . 139
5.7 Performance of top-N recommendation in terms of NDCG@N where
N ranges from 10 to 50 . . . . . . . . . . . . . . . . . . . . . . . . 140
List of tables
3.1 Statistics of the datasets used in Chapter 3 . . . . . . . . . . . . . 59
3.2 MAE comparison with other baselines (mean ± std) . . . . . . . . 64
3.3 RMSE comparison with other baselines (mean ± std) . . . . . . . 64
4.1 Notations and corresponding descriptions used in Chapter 4 . . . 72
4.2 Statistics of datasets used in Chapter 4 . . . . . . . . . . . . . . . 90
4.3 Overall performance on six domain pairs . . . . . . . . . . . . . . 104
5.1 Symbols and corresponding descriptions used in Chapter 5 . . . . 113
5.2 The tag filtering quality constraints (Gedikli and Jannach, 2013) . 129
5.3 Dataset Variations for ML, LT and FM . . . . . . . . . . . . . . 131
5.4 Comparison of TSCDR with other baselines . . . . . . . . . . . . 137
Chapter 1
Introduction
This chapter presents the introduction of the thesis. Section 1.1 describes the
background for conducting this research. Section 1.2 presents the research ques-
tions and the corresponding objectives we aim to achieve. Section 1.3 highlights
the contributions of this research. Section 1.4 introduces the structure of the
thesis and Section 1.5 lists the publications related to this thesis.
1.1 Background
On 6 August 1991, 26 years ago, the World Wide Web (WWW) became publicly
available. After several decades of development, the Internet has now become
the main way by which people around the world share information globally.
However, everything has two sides. The Internet itself offers great convenience
in terms of accessing various sources of information, but the increasing amount
of complex and heterogeneous data generated every second has become a serious
burden on human processing abilities. To address the problem of information
explosion, search engines (Brin and Page, 2012; Page et al., 1999) and recommender
systems (Adomavicius and Tuzhilin, 2005; Beel et al., 2016; Lu et al., 2015b; Ricci
et al., 2011) have been developed as two alternative solutions. As opposed to a
search engine which searches and identifies items in a database according to the
keywords or characters specified by a user, a recommender system is a software
tool that generates recommendations to users based on their past preferences.
The goal of recommender systems is to provide the right information on
products/services to the right customers at the right time. This can be achieved
by automatically filtering out unrelated products and suggesting only relevant ones.
There are many successful applications of recommender systems. Representative
examples include but are not limited to: product recommendation in E-commerce
websites (e.g., Amazon, eBay), movie and video recommendation (e.g., YouTube),
news recommendation (e.g., Yahoo), music recommendation (e.g., Spotify), social
recommendation (e.g., Facebook), App recommendation (e.g., Apple).
In general, existing recommendation techniques are mainly classified into three
categories, collaborative filtering (Koren and Bell, 2015; Sarwar et al., 2001; Su
and Khoshgoftaar, 2009), content-based (Lops et al., 2011; Mooney and Roy, 2000;
Pazzani and Billsus, 2007) and hybrid (Burke, 2002, 2007; Ghazanfar and Prugel-
Bennett, 2010) recommendation. Collaborative filtering (CF) is the most successful
and widely used technique for building a recommender system. It helps people
make choices based on the opinions of other people who share similar interests. The
content-based (CB) recommendation technique recommends items that are similar
to the ones preferred by a specific user in the past. Each recommendation technique
has its own merits and drawbacks. Therefore, hybrid recommendation is proposed
to gain higher performance and to avoid the drawbacks of pure recommendation
techniques. The most common practice in hybrid recommendation is to combine
CF with other recommendation techniques in an attempt to solve cold-start,
sparseness and/or scalability problems (Kim et al., 2006).
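The collaborative filtering principle described above can be illustrated with a minimal user-based sketch: a missing rating is predicted as a similarity-weighted average of the ratings given by the most similar users. The rating matrix and all values below are invented toy data, not drawn from any dataset used in this thesis.

```python
import numpy as np

# Toy user-item rating matrix (0 = unrated); rows are users, columns are items.
R = np.array([
    [5.0, 3.0, 0.0, 1.0],
    [4.0, 0.0, 0.0, 1.0],
    [1.0, 1.0, 0.0, 5.0],
    [1.0, 0.0, 4.0, 4.0],
])

def cosine_sim(a, b):
    """Cosine similarity computed over co-rated items only."""
    mask = (a > 0) & (b > 0)
    if not mask.any():
        return 0.0
    a, b = a[mask], b[mask]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def predict(R, user, item, k=2):
    """Predict a missing rating as the similarity-weighted average of the
    ratings given to `item` by the k most similar users who rated it."""
    sims = np.array([cosine_sim(R[user], R[v])
                     if v != user and R[v, item] > 0 else 0.0
                     for v in range(R.shape[0])])
    top = np.argsort(sims)[::-1][:k]
    if sims[top].sum() == 0:
        return 0.0
    return float(sims[top] @ R[top, item] / sims[top].sum())

print(round(predict(R, user=1, item=1), 2))  # → 2.4
```

Data sparsity shows up directly in this sketch: when too few neighbours have rated the target item, the similarity weights collapse and no reliable prediction can be made, which is the failure mode cross-domain recommendation aims to alleviate.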
Although great progress has been achieved in the area of recommender systems,
most systems are restricted to offering recommendations of items belonging to a
single domain. There is a strong demand for joint recommendation in our daily life. For
example, instead of suggesting only similar movies to a user browsing a movie
on Netflix, other types of items provided by different websites, such as music, books,
and video games related to the movie, should be recommended to the
user as well. It would be beneficial to exploit user preferences in different domains
in order to build a general model to better capture user interests. Furthermore, it
is well known that the data sparsity problem, which is caused by the fact that users
generally rate a limited number of items, has posed a key challenge for CF-based
recommender systems (Hwangbo and Kim, 2017; Sarwar, 2001). Analyses show
that there could be dependencies and correlations between user preferences in
different domains, so that user knowledge acquired in one domain can be transferred and
exploited in several other related domains. The data sparsity problem together
with the desire to offer joint recommendations give us a strong motivation to
develop novel recommendation techniques.
In this context, the cross-domain recommender system has received much atten-
tion from both research and industry communities in the last few years (Cremonesi
et al., 2011; Fernández-Tobías, 2017; Fernández-Tobías et al., 2012; Khan et al.,
2017; Li, 2011). It can be considered as a practical application of transfer learning
techniques (Lu et al., 2015a; Pan and Yang, 2010; Weiss et al., 2016) in the field
of recommender systems, which aims to exploit existing knowledge from auxiliary
source domains to facilitate recommendation making in a sparse target domain.
As shown in (Fernández-Tobías, 2017), a major challenge in the development of
cross-domain recommender systems is how to identify the correspondence between
the involved domains in order to build a bridge to support knowledge transfer. If
the bridge is not built correctly, two serious consequences follow: insufficient
transfer of positive knowledge and, at the same time, negative knowledge
transfer (Pan and Yang, 2010), both of which lead to a decline in recommendation
performance.
One intuitive solution for addressing the above challenge has been explored by
assuming users or items are fully or partially shared in both source and target
domains (Jiang et al., 2016; Pan et al., 2011a; Shi et al., 2013b; Singh and Gordon,
2008), so that the knowledge can be transferred through those overlapping users or
items. However, due to the privacy settings in different companies or platforms, it is
not common to have overlapping users and items between heterogeneous domains.
With respect to the situation of disjoint domains, where the correspondence
between users and between items is not known in advance, most existing research
only exploits user preferences in the form of numerical ratings to learn an implicit
rating pattern to share between domains (Gao et al., 2013; Li et al., 2009a,b,
2015). The rating pattern represents the average ratings that a user group would
give to an item group. In addition to linking heterogeneous domains by implicit
rating patterns, rich side information concerning users and items is exploited
to build an explicit domain link (Pan, 2016; Tan et al., 2014), such as through
tags (Fang et al., 2015; Fernández-Tobías and Cantador, 2014; Shi et al., 2011),
and reviews (Cao et al., 2017; Chakraverty and Saraswat, 2017; Song et al., 2017).
Explicit domain links built by tags are more effective than implicit rating
patterns in bridging heterogeneous domains (Shi et al., 2011), but the only efforts
devoted to this direction use overlapping tags to align different feature
spaces (Fang et al., 2015; Fernández-Tobías and Cantador, 2014; Shi et al., 2011).
Additional information in tagging data needs to be explored and integrated into
cross-domain recommendation, such as: 1) the distinct information encoded in
domain specific tags; 2) the knowledge structure inferred by tags and 3) tag
semantics. Therefore, this thesis fills the gap by developing a set of novel tag-
based cross-domain recommendation models, which exploit the aforementioned
information to improve recommendation performance.
1.2 Research questions and objectives
The purpose of this research is to use user-generated tags to automatically establish
the correspondence between heterogeneous domains, so that knowledge from
the source domain can be transferred to the target domain with determined
correspondence. To this end, this research will answer the following questions:
Research Question 1: How to integrate domain specific tags into the framework
of cross-domain recommendation?
To eliminate domain heterogeneity, overlapping tags are generally exploited as
common features to align different domains. However, there are two drawbacks in
the above approach. First, only a limited number of overlapping tags are shared
between heterogeneous domains (Jiang et al., 2016). In this context, a weak
domain correlation will be established because most of the users and items in
both domains are not covered by the overlapping tags. As a result, suboptimal
results will be achieved due to inadequate knowledge transfer between domains.
Second, it is a waste to discard abundant domain-specific tags, which correspond
to the unshared parts of tags in the individual domains. Moreover, domain-specific
tags are capable of capturing the unique features of individual domains compared
to the generality of overlapping tags. If abundant domain-specific tags can be
integrated into cross-domain recommendation, more knowledge transfer bridges
will be established to promote knowledge transfer.
Research Question 2: How to integrate the structural knowledge inferred by
tags into the framework of cross-domain recommendation?
The basic assumption in transfer learning is that the involved domains may
share a certain knowledge structure, which can be extracted by preserving the
important properties of the original data (Long et al., 2014b). In this respect, intra-
and inter-domain knowledge structures, which reflect the relationships within a
domain and between domains respectively, need to be extracted as constraints to
regularize knowledge transfer. In terms of inter-domain knowledge structure, the
similarity between cross-domain users and between cross-domain items by profiling
with tags is a valuable source for discovering the direct correspondence between the
involved domains. Furthermore, the intra-domain similarity to alleviate negative
transfer by preserving the geometric structure in each domain should also be
refined with tags. However, there is little research in the literature that exploits
tags to infer both the intra- and inter-domain knowledge structure to improve
recommendation performance.
Research Question 3: How to integrate the semantic information of tags into
the framework of cross-domain recommendation?
Tags are keywords or short phrases that are assigned to items by users. Cap-
tured in these tags is a great deal of information that is highly relevant to the
items. However, the uncontrolled vocabulary used by users has resulted in sparse,
redundant and ambiguous tag information (Gedikli and Jannach, 2013; Yang et al.,
2014). Handling unstructured tagging data to address the uncontrolled vocabulary
problem is still an open challenge when exploiting tags to establish an explicit
domain linkage. In this context, semantics from tags can be harvested to correlate
those non-identical but semantically related tags to remove irrelevant information
and noise in tags (Jabeen et al., 2016).
According to the above research questions, this thesis proposes to achieve the
following research objectives:
Research Objective 1: To develop a new tag-based cross-domain recommen-
dation model by exploiting domain specific tags.
This objective corresponds to Research Question 1. Existing studies focus
on utilizing overlapping tags as common features to bridge different domains (Fang
et al., 2015; Fernández-Tobías and Cantador, 2014; Shi et al., 2011); however,
domain-specific tags are discarded due to unaligned feature spaces. A possible
solution to this problem is to adapt the spectral feature alignment algorithm
in (Pan et al., 2010a) to align domain-specific tags from different domains into
unified clusters, with the help of overlapping tags. In this way, the clusters
of domain-specific tags can be used to reduce the gap between heterogeneous
domains, which can be further applied as new features to profile users and items
for the refinement of cross-domain similarity. Therefore, this study will develop a
new cross-domain recommendation model by exploiting domain specific tags to
increase links between heterogeneous domains.
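A minimal sketch of this alignment idea follows. Domain-specific tags from two hypothetical domains are connected through the overlapping tags they co-occur with, and a spectral bipartition of the induced graph groups same-concept tags across domains. All tags, counts and the two-cluster setup are invented for illustration; the actual model adapts the spectral feature alignment algorithm of Pan et al. (2010a), which this toy example only approximates.

```python
import numpy as np

# Hypothetical tags: two domain-specific tags per domain, plus two tags
# shared by both domains. M[i, j] counts how often domain-specific tag i
# co-occurs with overlapping tag j on the same items.
specific = ["thriller-book", "plot-twist-film",   # same concept, different domains
            "cookbook", "food-documentary"]        # same concept, different domains
overlap = ["suspense", "cooking"]
M = np.array([
    [8, 0],   # thriller-book    mostly co-occurs with "suspense"
    [7, 1],   # plot-twist-film  mostly co-occurs with "suspense"
    [0, 9],   # cookbook         mostly co-occurs with "cooking"
    [1, 6],   # food-documentary mostly co-occurs with "cooking"
], dtype=float)

# Affinity between domain-specific tags, induced through the overlapping
# tags they share: A = M M^T (one-mode projection of the bipartite graph).
A = M @ M.T
np.fill_diagonal(A, 0.0)

# Normalized graph Laplacian L = I - D^{-1/2} A D^{-1/2}.
d = A.sum(axis=1)
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
L = np.eye(len(A)) - D_inv_sqrt @ A @ D_inv_sqrt

# The second-smallest eigenvector (Fiedler vector) bipartitions the graph;
# its sign assigns tags from different domains to unified clusters.
eigvals, eigvecs = np.linalg.eigh(L)
labels = (eigvecs[:, 1] > 0).astype(int)
for tag, lab in zip(specific, labels):
    print(tag, "-> cluster", lab)
```

With these weights the weak cross-concept edges are cut, so the book and film "suspense" tags land in one cluster and the two cooking tags in the other, giving cross-domain features that cover users and items untouched by the overlapping tags alone.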
Research Objective 2: To develop a new tag-based cross-domain recommen-
dation model by exploiting tag-inferred structural knowledge.
This objective corresponds to Research Question 2. To promote positive
knowledge transfer and avoid negative knowledge transfer in cross-domain recom-
mendation, both intra- and inter-domain correlations that preserve the geometric
property of a domain and between domains should be learned in order to guide
knowledge transfer. Most studies exploit either overlapping tags (Fang et al.,
2015; Fernández-Tobías and Cantador, 2014; Shi et al., 2011) or domain-specific
tags (Hao et al., 2016) in building an inter-domain correlation. The complemen-
tary role of different types of tags needs to be explored to build a comprehensive
inter-domain correlation. With respect to the intra-domain correlation, tag-based
user similarity (Zhen et al., 2009) that preserves the distance between users has
been seamlessly integrated into the framework of CF to generate more precise
recommendations. However, such an extension suffers from two drawbacks: 1) the
relationship between items and tags has not been studied; 2) the learned intra-
domain correlation only boosts knowledge transfer in a single domain. Therefore,
this study will exploit the relationship between users and tags and between items
and tags to infer both intra- and inter-domain correlations, which are designed to
work together for more positive knowledge transfer.
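How tag-inferred intra-domain structure can regularize knowledge transfer may be sketched as follows: a tag-based user similarity graph yields a Laplacian regularizer tr(UᵀLU) that penalizes distant latent factors for users with similar tag profiles. The profiles and latent factors below are invented toy values, and the sketch shows only the regularizer, not the full model of Chapter 4.

```python
import numpy as np

# Hypothetical user-tag profiles within one domain: T[u, t] counts how
# often user u applied tag t.
T = np.array([
    [3, 0, 2, 0],
    [2, 1, 3, 0],
    [0, 4, 0, 3],
], dtype=float)

# Intra-domain user similarity from tag profiles (cosine).
Tn = T / np.linalg.norm(T, axis=1, keepdims=True)
S = Tn @ Tn.T
np.fill_diagonal(S, 0.0)

# Graph Laplacian of the similarity graph.
L = np.diag(S.sum(axis=1)) - S

# Laplacian regularizer tr(U^T L U) over user latent factors U: it is small
# when users with similar tag profiles have nearby latent vectors, so adding
# it to a matrix-factorization loss preserves the domain's geometric structure.
U = np.array([[0.9, 0.1],
              [0.8, 0.2],
              [0.1, 0.9]])
reg = np.trace(U.T @ L @ U)

# Equivalent form: similarity-weighted sum of pairwise squared distances.
pairwise = 0.5 * sum(S[i, j] * np.sum((U[i] - U[j]) ** 2)
                     for i in range(3) for j in range(3))
print(np.isclose(reg, pairwise))
```

The same construction applies to item-tag profiles, and an analogous cross-domain similarity matrix plays the inter-domain role; minimizing the regularizer alongside the reconstruction loss is what keeps the transfer consistent with the tag-inferred structure.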
Research Objective 3: To develop a new tag-based cross-domain recommen-
dation model by exploiting tag semantics.
This objective corresponds to Research Question 3. Some efforts have been
made in relation to tag-aware personalized recommendation using content-based
filtering (Cantador et al., 2010) or CF (Bouadjenek et al., 2013; Tso-Sutter et al.,
2008). However, as users can freely choose their own vocabulary in an arbitrary
language, this has resulted in redundant and ambiguous tag information, which is
a further obstacle to the capture of the relationships between cross-domain tags.
In this respect, some studies proposed to utilize auxiliary information sources,
such as ontology (Fernández-Tobías et al., 2011), or a knowledge graph (Yang
et al., 2014), for the semantic matching of cross-domain tags. However, the
success of such approaches heavily depends on the construction of an external
knowledge base. Other solutions use auto-encoders (Zuo et al., 2016) or deep
learning techniques (Xu et al., 2016) to map a tag-based user/item profile to
an abstract space for abstract matching. Although these methods achieved better
performance, they failed to explore the explicit semantic relationship between the
respective tags of the domains. Therefore, this study will use natural language
processing (NLP) techniques to explicitly determine the semantic matching for
cross-domain tags and integrate the semantic information of tags into cross-domain
recommendation.
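The intended semantic matching can be sketched as grouping non-identical but semantically related tags by the cosine similarity of their embedding vectors. The model in Chapter 5 trains such embeddings with word2vec on tagging data; in the self-contained sketch below, hand-crafted toy vectors stand in for trained ones, and the 0.85 threshold is an arbitrary illustrative choice.

```python
import numpy as np

# Toy 3-d embeddings standing in for word2vec vectors trained on tagging
# data (tags and vectors are invented for illustration).
emb = {
    "scary":     np.array([0.9, 0.1, 0.0]),
    "horror":    np.array([0.8, 0.2, 0.1]),
    "funny":     np.array([0.1, 0.9, 0.1]),
    "hilarious": np.array([0.0, 0.8, 0.2]),
}

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def group_tags(emb, threshold=0.85):
    """Greedy single-link grouping: a tag joins the first group containing
    a tag whose cosine similarity exceeds the threshold."""
    groups = []
    for tag, v in emb.items():
        for g in groups:
            if any(cos(v, emb[t]) >= threshold for t in g):
                g.append(tag)
                break
        else:
            groups.append([tag])
    return groups

print(group_tags(emb))  # → [['scary', 'horror'], ['funny', 'hilarious']]
```

When the embeddings come from tag co-occurrence contexts across both domains, each resulting group acts as one merged feature, so semantically equivalent tags from different domains increase the effective domain overlap instead of being treated as unrelated vocabulary.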
1.3 Research contributions
Overall, this thesis makes several contributions as follows:
• The development of an enhanced tag-induced cross-domain collaborative
filtering model by exploiting abundant domain-specific tags.
An enhanced tag-induced cross-domain collaborative filtering model has been
presented in which abundant domain-specific tags, not just the limited overlapping
tags, are utilized to increase the connections between heterogeneous domains.
To align diverse domain-specific tags, spectral clustering together with tag
co-occurrence patterns have been exploited to group domain-specific tags.
Based on the tag clusters, a new user and item profile can be defined
and utilized to compute cross-domain similarity for regularizing knowledge
transfer. The experimental results demonstrate that the proposed model is
capable of establishing a strong domain connection to support knowledge
transfer when overlapping tags are scarce. Furthermore, domain-specific tags
are shown to be beneficial for adding more information about user preferences
into the recommendation. Details can be found in Chapter 3.
• The development of a complete tag-induced cross-domain recommendation
model by exploiting structural knowledge inferred with tags.
A complete tag-induced cross-domain recommendation model is proposed in
which both inter- and intra-domain correlations are considered as structural
knowledge to promote knowledge transfer. In this model, not only overlap-
ping tags but also domain-specific tags are exploited to play complementary
roles in the establishment of inter-domain correlation. Additionally, intra-domain
similarity between users and between items has also been introduced
by distinguishing the tag distribution in each individual domain, which amounts
to building a compact intra-domain correlation to support knowledge
transfer at a group level. Experiments on three public datasets and with five
state-of-the-art baseline approaches demonstrate that the proposed model
performs well in both rating prediction and item recommendation tasks.
Details can be found in Chapter 4.
• The development of a tag semantically-boosted cross-domain recommenda-
tion model by exploiting semantic information of tags.
A tag semantically-boosted cross-domain recommendation model is developed
which aims to automatically group those non-identical but semantically
related tags to increase the domain overlap and ultimately the recommen-
dation quality. In this model, the word2vec technique is utilized to learn
a semantic representation of tags. Then, semantically equivalent tags are
successfully merged into the same group according to the learned semantic
representation. The derived tag clusters, which span across domains, have been
exploited as a joint embedding space for aligning heterogeneous domains. By
mapping users and items from both source and target domains to the same
embedding space, similar users and items across domains can be identified
and connected. As a result, knowledge is transferred from the source domain
to the target domain via matched users and items to improve recommenda-
tion performance. Experimental results on multiple datasets demonstrate
that our proposed model outperforms other state-of-the-art baselines in the
top-N recommendation task. Details can be found in Chapter 5.
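The grouping step just described can be sketched as follows; the toy embedding vectors, tag names and similarity threshold below are illustrative assumptions, not values from the thesis experiments.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def group_tags(tag_vectors, threshold=0.8):
    """Greedily merge tags whose embedding cosine similarity exceeds the threshold."""
    clusters = []  # list of (seed tag, [member tags])
    for tag, vec in tag_vectors.items():
        for seed, members in clusters:
            if cosine(tag_vectors[seed], vec) >= threshold:
                members.append(tag)
                break
        else:
            clusters.append((tag, [tag]))
    return [members for _, members in clusters]

# Toy word2vec-style vectors (hypothetical values for illustration).
vecs = {
    "funny":    [0.90, 0.10, 0.00],
    "humorous": [0.88, 0.15, 0.05],
    "scary":    [0.00, 0.90, 0.30],
}
print(group_tags(vecs))  # "funny" and "humorous" fall into one cluster
```

In practice, the thesis uses learned word2vec representations and the clusters then serve as the shared embedding space for both domains.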
1.4 Thesis structure
The content of all the chapters is described in more detail next and the structure
of the whole thesis is shown in Figure 1.1.
• Chapter 2 provides an overview of the state-of-the-art related to this re-
search. In particular, we focus on a survey of three specific fields, which
are recommender systems, transfer learning and cross-domain recommender
systems, respectively. In the introduction to recommender systems, first the
general definition of a recommender system is described and later a formal
formulation of the recommendation problem is given. Then a categorization
and description of existing recommendation techniques is presented,
namely content-based, collaborative filtering, hybrid-based, context-aware
and deep learning based recommendations. In the introduction to transfer
learning, a mathematical definition of the transfer learning problem and
the classification of existing approaches based on knowledge transfer type
are provided. Finally, in the introduction to cross-domain recommender
systems, a general definition of the cross-domain recommendation problem is
given. Next, a categorization of cross-domain recommendation techniques
is presented, distinguishing implicit and explicit knowledge transfer
linkages.
• Chapter 3 develops an enhanced tag-induced cross-domain collaborative
filtering model. In this chapter, abundant domain-specific tags are first
investigated to increase the domain correspondence between heterogeneous
domains. To this end, spectral clustering together with a defined tag co-
occurrence pattern are utilized to group domain-specific tags into clusters,
which can be used as aligned features to reduce the gap between domains.
The empirical results of the proposed model are reported and discussed in
relation to the rating prediction task using the well-known MovieLens
(https://grouplens.org/datasets/movielens/) and LibraryThing
(http://www.macle.nl/tud/LT) datasets.
• Chapter 4 develops a complete tag-induced cross-domain recommendation
model by exploiting structural knowledge inferred with tags. Instead of
exploiting overlapping tags or domain-specific tags separately in building
Fig. 1.1 Thesis structure: Chapter 1 (Introduction) and Chapter 2 (Literature Review) lead to RQ 1 (exploration of domain-specific tags, addressed in Chapter 3), RQ 2 (consolidation of tag-inferred structural knowledge, addressed in Chapter 4) and RQ 3 (exploitation of tag semantics, addressed in Chapter 5), followed by Chapter 6 (Conclusions and Future Work).
an inter-domain correlation, this chapter applies both overlapping tags
and domain-specific tags in the establishment of a comprehensive inter-
domain correlation. In particular, on the basis of a weak domain connection
built by overlapping tags, the clusters of domain-specific tags are further
exploited to profile both users and items in order to refine cross-domain
similarity. This part is similar to the work in Chapter 3. However, to
overcome the problem that the number of tag clusters must be fixed in advance,
a more advanced clustering approach, namely Affinity Propagation (Frey and
Dueck, 2007), is utilized to learn the number of clusters adaptively from
the data. Furthermore, intra-domain correlation in the form of tag-based
user-similarity and item-similarity is also integrated into the framework of
cross-domain recommendation in order to promote more knowledge transfer.
To evaluate the proposed model, extensive experiments are conducted for
both rating prediction and item ranking tasks, with datasets composed of
MovieLens, LibraryThing and LastFM (https://grouplens.org/datasets/hetrec-2011/).
• Chapter 5 develops a tag semantically-boosted cross-domain recommendation
model by exploiting tag semantics. The previous two chapters only consider
the similarity between cross-domain tags at the lexical level, whereas this
chapter uses the word2vec technique to determine an explicit semantic
matching for tags. As a result, tags that are semantically related but
use different words are captured and grouped together to eliminate noise
in the tagging data. Specifically, in this chapter, word2vec is first used
to learn a semantic representation of tags. Then, semantically equivalent
tags are merged into the same group according to the learned semantic
representation.
1.5 Publications related to this thesis
2. Lu, J., Behbood, V., Hao, P., Hua Z., Xue S. and Zhang, G. 2015,
‘Transfer learning using computational intelligence: a survey’, Knowledge-
Based Systems, vol. 80, pp. 14-23. (ERA Rank: B)
3. Hao, P., Lu, J. and Zhang, G. 2017, ‘Tag Semantically-boosted Cross-
domain Recommendation’, submitted to Decision Support Systems.
(ERA Rank: A*)
4. Hao, P., Zhang, G. and Lu, J. 2016, ‘Enhancing cross domain rec-
ommendation with domain dependent tags’, in proceedings of the
2016 IEEE International Conference on Fuzzy Systems, Canada, pp.
1266-1273. (ERA Rank: A)
5. Hao, P., Zhang, G., Behbood, V. and Zheng, Z. 2014, ‘A fuzzy domain
adaptation method based on self-constructing fuzzy neural network’, in
proceedings of the 11th International FLINS Conference on Decision
Making and Soft Computing, Brazil, pp. 676-681. (ERA Rank: B)
Chapter 2
Research literature
This chapter presents a discussion of relevant studies associated with this research.
From the perspective of its working principle, cross-domain recommendation is
considered a practical application of transfer learning in recommender systems
for addressing the data sparsity problem. In this chapter, we provide an in-depth
overview of the involved techniques: recommender systems, transfer
learning and cross-domain recommender systems. First, in Section 2.1,
the recommendation problem and the most popular techniques in recommender systems
are introduced. Then Section 2.2 reviews the definition and categorization of
transfer learning techniques. Finally, in Section 2.3 we describe the general
formalization of the cross-domain recommendation problem and present the
categorization of cross-domain recommendation approaches.
2.1 Recommender systems
With the growing amount of information on the internet provided by companies to
meet the needs of customers, it becomes harder for a person to process the large
amount of information available to him/her. In this context, recommender
systems (Adomavicius and Tuzhilin, 2005; Beel et al., 2016; Burke, 2002; Fernández-
Tobías, 2017; Lu et al., 2015b; Ricci et al., 2011; Su and Khoshgoftaar, 2009;
Zhang et al., 2017a) have been developed as an effective information filtering tool
that helps people discover the information they favor most from a large space of options.
Recommender systems are beneficial for both customers and product or service
providers: they not only save time for customers by automatically
filtering relevant items but also increase user engagement and promote sales for
businesses (Jannach and Adomavicius, 2016). Therefore, recommender systems
are widely applied in our daily life, including but not limited to: product
recommendation (e.g., Amazon, eBay), video, TV or movie recommendation (e.g.,
YouTube, Netflix), music recommendation (e.g., Spotify), friend recommendation
(e.g., Facebook), news recommendation (e.g., Yahoo) and job recommendation
(e.g., LinkedIn).
2.1.1 Recommendation problem
In general terms, recommender systems are a special kind of software tool
designed to estimate a user's preference for items with which he/she has never
interacted before (Ricci et al., 2011).
The inputs to a recommender system include user features (e.g., age, gender
and occupation), item features (e.g., content description, metadata), user-item
interactions (e.g., past ratings, purchase data and clicking/browsing history) and
other additional context information (e.g., time, place and mood). The output
varies with the recommendation task: it can be predicted rating scores for
the rating prediction task, a ranked list of items for the top-N recommendation
task, or the correct categories of candidate items for the classification task (Zhang
et al., 2017a). A graphical representation of the recommendation problem is shown
in Figure 2.1.
Fig. 2.1 A graphical illustration of the recommendation problem
More formally, Adomavicius and Tuzhilin (2005) presented a mathematical
formulation of recommender systems, which describes a recommender system as
a utility function that finds the unseen items maximizing a user's expected utility.
Specifically, the utility function f estimates the preference score r_{ui} of user u on
item i, and the item recommended to user u is the unseen one with the highest
estimated score:

r_{ui} = f(u, i), \qquad i_u^{*} = \arg\max_{i \in I \setminus I_u} f(u, i) \qquad (2.1)
where U and I denote the user and item sets in a system, respectively, and I_u
denotes the set of items for which user u previously expressed preferences. Based
on the estimated score r_{ui}, a recommender system can rank candidate items and
decide whether to recommend item i to user u. Note that in recommender systems,
user-item interactions are usually represented as a matrix R of size |U| × |I|,
where rows correspond to users and columns correspond to items. The value
r_{ui} ∈ R refers to the level of preference or usefulness of item i to user u; it can
be an explicit rating score within a certain range (e.g., 0-5) or implicit feedback
(e.g., clicking/listening counts). Since only a small fraction of items are exposed to
users, most entries in matrix R are missing and need to be estimated by the utility
function f defined in Eq. 2.1.
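As a minimal, self-contained illustration of Eq. 2.1, the sketch below fills missing entries of a toy rating matrix with a placeholder utility function; the data and the averaging heuristic are purely illustrative, standing in for a real prediction model.

```python
# R holds observed ratings (None = missing entry of the |U| x |I| matrix).
R = {
    "u1": {"i1": 5, "i2": None, "i3": 1},
    "u2": {"i1": 4, "i2": 2, "i3": None},
}

def f(user, item):
    """Toy utility: the user's average observed rating (placeholder for a real model)."""
    observed = [r for r in R[user].values() if r is not None]
    return sum(observed) / len(observed)

def recommend(user):
    """Eq. 2.1: pick the unseen item with the highest estimated score."""
    unseen = [i for i, r in R[user].items() if r is None]
    return max(unseen, key=lambda i: f(user, i))

print(recommend("u1"))  # the only unseen item for u1 is "i2"
```

Any of the models reviewed below (memory-based CF, matrix factorization, etc.) can be substituted for `f` without changing the surrounding recommendation logic.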
2.1.2 Classification of recommender systems
With the rapid development of recommender system research, numerous and
diverse recommendation approaches have been proposed. Recommender systems
are generally classified into three categories: collaborative filtering, content-based
and hybrid recommender systems (Adomavicius and Tuzhilin, 2005; Lu et al.,
2015b; Park et al., 2012). More recently, context has been recognized as an
important information source for improving recommendation performance, and as
a result context-aware recommender systems (Adomavicius and Tuzhilin, 2011) are
studied as new models in recommender systems. In addition, since deep learning
has gained successful applications in many fields, such as computer vision, speech
recognition and natural language processing, more and more researchers have
devoted themselves to applying deep learning techniques to recommendation and
developing deep learning based recommender systems (Zhang et al., 2017a).
In the next subsections, we will review representative works for each category
listed as follows:
• Collaborative filtering (see 2.1.2.1), which only exploits users' past preference
data to predict items that a user might like in the future. Collaborative
filtering methods can be further divided into two groups: memory-based and
model-based collaborative filtering;
• Content-based recommender systems (see 2.1.2.2), which build an item profile
or representation by extracting content features from item descriptions and
suggest items similar to the ones that the user liked in the past based on the
derived representation. Content-based recommender systems are commonly
used to recommend text-based items;
• Hybrid recommender systems (see 2.1.2.3), which are implemented by combining
the advantages of both collaborative filtering and content-based
recommender systems. The recommendation performance of hybrid methods is
demonstrated to be better than that of pure collaborative filtering and content-based
methods;
• Context-aware recommender systems (see 2.1.2.4), which incorporate con-
textual information to deliver more accurate and relevant recommendations.
• Deep learning based recommender systems (see 2.1.2.5), which introduce
deep learning techniques to build advanced recommender systems.
2.1.2.1 Collaborative filtering
As one of most popular and well-known methods in building recommender systems,
collaborative filtering (CF) (Herlocker et al., 1999; Koren and Bell, 2015; Sarwar
et al., 2001; Schafer et al., 2007; Su and Khoshgoftaar, 2009; Wang et al., 2006)
predicts interests of a user based on the analysis of tastes and preferences of other
users in the system. A key advantage of CF-based approaches is that they do not
rely on additional content information about items beyond users' past preference
data, either in the form of explicit rating scores or implicit indications. As a
result, CF-based recommender systems can be deployed in a wide range of
recommendation scenarios. There are two main types of CF-based approaches:
Memory-based CF
In memory-based CF, the prediction is made on the basis of similarity between
users or items, which further refers to user-user CF (Resnick et al., 1994) and
item-item CF (Sarwar et al., 2001), respectively.
User-user CF assumes that users with similar tastes in the past will share
the same preferences on items in the future. This algorithm is effective but incurs
a high computation cost for computing the user-user similarity matrix. The problem
becomes more serious when a user is dynamically added to a large system, since the
similarity must be re-computed for every user pair. Similar to user-user CF, item-item CF
also needs to compute a similarity matrix but proceeds in an item-centric manner.
The underlying assumption of item-item CF is that a user who bought item x will
enjoy a similar item y as well. This algorithm takes less time than user-user CF
due to the relatively static nature of items. In practice, Amazon adopts item-item
CF as its main recommendation technique.
To compute the similarity, Pearson correlation and cosine-based similarity are
two commonly used metrics. For example, the Pearson correlation between two
users u and v is defined as follows:

s_{uv} = \frac{\sum_{i \in I_{uv}} (r_{ui} - \bar{r}_u)(r_{vi} - \bar{r}_v)}{\sqrt{\sum_{i \in I_{uv}} (r_{ui} - \bar{r}_u)^2} \, \sqrt{\sum_{i \in I_{uv}} (r_{vi} - \bar{r}_v)^2}} \qquad (2.2)

where I_{uv} denotes the set of items liked by both users u and v,
\bar{r}_u = \frac{\sum_{i \in I_u} r_{ui}}{|I_u|}, \bar{r}_v = \frac{\sum_{i \in I_v} r_{vi}}{|I_v|},
and I_u and I_v are the sets of items preferred (or rated) by user u and
user v, respectively. Similarly, the cosine-based similarity between users u and v
is defined as:

s_{uv} = \frac{\sum_{i \in I_{uv}} r_{ui} r_{vi}}{\sqrt{\sum_{i \in I_u} r_{ui}^2} \, \sqrt{\sum_{i \in I_v} r_{vi}^2}} \qquad (2.3)
The aforementioned similarity metrics can also be applied to measure the similarity
between items.
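The two metrics can be sketched directly from Eqs. 2.2 and 2.3; the rating dictionaries below are illustrative.

```python
import math

def pearson(ratings_u, ratings_v):
    """Pearson correlation over co-rated items (Eq. 2.2)."""
    common = set(ratings_u) & set(ratings_v)
    mean_u = sum(ratings_u.values()) / len(ratings_u)
    mean_v = sum(ratings_v.values()) / len(ratings_v)
    num = sum((ratings_u[i] - mean_u) * (ratings_v[i] - mean_v) for i in common)
    den_u = math.sqrt(sum((ratings_u[i] - mean_u) ** 2 for i in common))
    den_v = math.sqrt(sum((ratings_v[i] - mean_v) ** 2 for i in common))
    return num / (den_u * den_v)

def cosine_sim(ratings_u, ratings_v):
    """Cosine similarity (Eq. 2.3): numerator over co-rated items, norms over each user's own items."""
    common = set(ratings_u) & set(ratings_v)
    num = sum(ratings_u[i] * ratings_v[i] for i in common)
    norm_u = math.sqrt(sum(r * r for r in ratings_u.values()))
    norm_v = math.sqrt(sum(r * r for r in ratings_v.values()))
    return num / (norm_u * norm_v)

u = {"i1": 5, "i2": 3, "i3": 4}
v = {"i1": 4, "i2": 2, "i4": 5}
print(round(pearson(u, v), 3), round(cosine_sim(u, v), 3))
```

Note that both functions assume at least one co-rated item; production code would also guard against zero denominators.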
The performance of memory-based CF relies on the amount of co-rated items
or co-rating users, which decreases when user-item interactions are sparse. This
hinders the application of memory-based CF to large and sparse
datasets.
Model-based CF
In the implementation of model-based CF, various machine learning and data
mining techniques are applied to learn a pattern from the training data,
under the assumption that the observed data, such as users, items and ratings, are
generated by that pattern. Examples include Bayesian networks (Yang et al., 2013; Zhang and Koren,
2007), clustering methods (George and Merugu, 2005; Shepitsen et al., 2008),
classification based methods (Zhang and Iyengar, 2002), latent semantic based
models (Hofmann, 2004), latent factor based models (Koren and Bell, 2015), latent
Dirichlet allocation (Marlin, 2004) and Markov decision process based models (Su
and Khoshgoftaar, 2009).
Due to its superior performance in the Netflix competition, latent factor based
models have gradually replaced memory-based CF in building the most effective recommender
systems. A latent factor based model utilizes matrix factorization (MF) (Koren
et al., 2009) as its core technique and models every user or item with a vector of
latent factors, so that the preference of users for items can be measured in the
latent space. These latent factors are plain numbers without inherent meaning;
their physical explanation depends on the recommendation scenario. For example, in movie
recommendation, the factors can correspond to actors, genres or other related information
when describing a movie, and can be interpreted as age, gender or preference
style when characterizing a person. More specifically, MF approximates the user-item
interaction matrix R by multiplying two low-rank matrices U and V that represent
user latent factors and item latent factors, respectively,
R ≈ U × V (2.4)
where row i of U denotes the latent factors of user i, and column j of V denotes
the latent factors of item j.
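A minimal sketch of learning the two low-rank matrices in Eq. 2.4 with stochastic gradient descent on the observed entries; the toy ratings, rank k and hyper-parameters (learning rate, regularization, epochs) are illustrative assumptions, not values from the thesis.

```python
import random

def factorize(ratings, n_users, n_items, k=2, lr=0.01, reg=0.05, epochs=200, seed=0):
    """Approximate R ≈ U × V by minimizing squared error on observed entries."""
    rnd = random.Random(seed)
    U = [[rnd.uniform(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    V = [[rnd.uniform(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            pred = sum(U[u][f] * V[i][f] for f in range(k))
            err = r - pred
            for f in range(k):
                u_f, v_f = U[u][f], V[i][f]
                # Gradient step with L2 regularization on both factor vectors.
                U[u][f] += lr * (err * v_f - reg * u_f)
                V[i][f] += lr * (err * u_f - reg * v_f)
    return U, V

# Observed (user, item, rating) triples; entry (1, 1) is missing.
obs = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0)]
U, V = factorize(obs, n_users=2, n_items=2)
pred = sum(U[1][f] * V[1][f] for f in range(2))
print(round(pred, 2))  # estimated rating for the missing entry
```

After training, the inner product U[u]·V[i] recovers the observed ratings closely and fills in the missing cell, which is exactly how MF-based recommenders estimate unseen entries.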
24
2.1 Recommender systems
Later, probabilistic matrix factorization (PMF) (Mnih and Salakhutdinov,
2008) was proposed to extend MF into a probabilistic framework and was shown
to perform better than MF. However, both MF and PMF learn a global perspective
of all users and all items without considering their individual characteristics. For
example, some users tend to give higher ratings than others, and some items tend
to receive higher ratings than other items. To cope with this systematic tendency,
Koren et al. (2009) proposed biasedSVD on the basis of singular
value decomposition (SVD) by introducing user and item bias terms. Further, to
integrate implicit feedback with explicit rating scores, SVD++ (Koren, 2008)
was proposed to enhance the biasedSVD model.
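For reference, the biasedSVD predictor just described augments the plain inner product of Eq. 2.4 with a global average and per-user/per-item bias terms (following Koren et al., 2009):

```latex
\hat{r}_{ui} = \mu + b_u + b_i + \mathbf{q}_i^{\top}\mathbf{p}_u
```

where \mu is the overall average rating, b_u and b_i are the observed deviations of user u and item i from that average, and \mathbf{p}_u, \mathbf{q}_i are their latent factor vectors. SVD++ further adds a term summarizing the items with which the user has implicitly interacted.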
Although model-based CF has achieved great success, it still suffers from the
data sparsity problem (Grčar et al., 2005): inaccurate recommendations are
generated for users and items that have few ratings.
2.1.2.2 Content-based recommender systems
Content-based recommender systems (Lops et al., 2011; Mooney and Roy, 2000;
Pazzani and Billsus, 2007) recommend items that are similar to the ones preferred
by the users in the past. The principle of content-based recommender systems
includes two steps:
• It first analyses the description of the preferred items by a particular user
in order to find out common attributes, which can be used to distinguish
items. These attributes are kept in the user profile;
25
2.1 Recommender systems
• It then compares attributes of each item with the user profile, and as a
result only the items that have a higher degree of similarity with the user
profile would be recommended.
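The two steps above can be sketched with simple keyword profiles; the item attributes, the intersection-based profile and the overlap threshold are illustrative simplifications of real content-based feature extraction.

```python
def build_user_profile(liked_items, item_attrs):
    """Step 1: collect the attributes common to the items the user liked."""
    profile = set(item_attrs[liked_items[0]])
    for item in liked_items[1:]:
        profile &= set(item_attrs[item])
    return profile

def rank_candidates(user_profile, candidates, item_attrs, min_overlap=1):
    """Step 2: keep candidates whose attributes overlap the profile enough."""
    scored = [(len(user_profile & set(item_attrs[i])), i) for i in candidates]
    return [i for score, i in sorted(scored, reverse=True) if score >= min_overlap]

attrs = {
    "m1": ["sci-fi", "space"],
    "m2": ["sci-fi", "space", "drama"],
    "m3": ["sci-fi", "aliens"],
    "m4": ["romance"],
}
profile = build_user_profile(["m1", "m2"], attrs)
print(rank_candidates(profile, ["m3", "m4"], attrs))
```

Real systems replace the raw keyword sets with weighted feature vectors (e.g., TF-IDF) and the overlap count with a similarity measure, but the two-step structure is the same.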
The advantage of content-based recommender systems is that they exploit the semantic
content of items and recommend to a specific user items that are similar to the
preferred items in his/her profile. As a result, content-based recommender systems
are able to recommend new items and unpopular items. Furthermore, they can
provide an explanation of recommended items by listing the content features based on
which an item is recommended. They do not need information about the
preferences of other users in making recommendations, so they do not suffer from
the sparseness problem associated with collaborative filtering.
One of the main limitations of content-based recommender systems is the
overspecialization problem. They can only recommend items to a user according
to the preferred items in his/her user profile; thus, they cannot recommend items
outside the user's profile. Additionally, in some particular cases, it may not be
desirable for a recommender system to recommend overly similar items to users,
such as different news articles that describe the same event. Another limitation
of content-based recommender systems is the item content dependency problem.
As content-based recommender systems generate recommendations according to
the content of items, it is hard to use content-based methods to recommend items
which cannot be represented as keywords, such as images and movies. Lastly,
recommendations cannot be provided reliably when there is not enough
information to build a solid profile of a user.
26
2.1 Recommender systems
2.1.2.3 Hybrid recommender systems
Each recommendation technique has its own strengths and
drawbacks. Hybrid recommender systems are developed to gain higher performance
and to avoid the drawbacks of individual recommender systems (Burke, 2002, 2007;
Ghazanfar and Prugel-Bennett, 2010).
The most common practice for developing hybrid recommender systems is
to combine CF with other recommendation approaches in an attempt to avoid
cold-start, sparseness and/or scalability problems (Kim et al., 2006; Shambour
and Lu, 2012; Zhang et al., 2013). Several combination methods have been
employed, such as weighting (i.e., combine the scores of several recommendation
techniques) (Burke, 2002), switching (i.e., switch between recommendation
techniques depending on the current situation) (Lekakos and Caravelas, 2008), mixed
(i.e., present recommendations from several recommendation techniques simultaneously)
(Barragáns-Martínez et al., 2010), feature augmentation (i.e., use the
output from one recommender system as an input feature to another) (Burke,
2005), and cascade (i.e., refine the recommendations of one recommender system by
another) (Lampropoulos et al., 2012).
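As a sketch of the weighting strategy, the scores of two component recommenders can be combined linearly; the component scores and the weight alpha below are illustrative.

```python
def weighted_hybrid(cf_scores, content_scores, alpha=0.7):
    """Combine two recommenders' scores: alpha * CF + (1 - alpha) * content-based."""
    items = set(cf_scores) | set(content_scores)
    return {
        i: alpha * cf_scores.get(i, 0.0) + (1 - alpha) * content_scores.get(i, 0.0)
        for i in items
    }

cf = {"i1": 4.0, "i2": 2.0}   # scores from a collaborative filtering component
cb = {"i1": 3.0, "i3": 5.0}   # scores from a content-based component
combined = weighted_hybrid(cf, cb)
best = max(combined, key=combined.get)
print(best, round(combined["i1"], 2))
```

The other strategies (switching, mixed, feature augmentation, cascade) differ only in how the component outputs are routed rather than in this scoring arithmetic.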
2.1.2.4 Context-aware recommender systems
In general, users interact with the system within a particular context, and
preferences for items in one context may differ from those in another
context. Context-aware recommender systems (CARS) take into account contextual factors,
such as location, time and company, in generating more relevant recommendations
(Adomavicius and Tuzhilin, 2011; Verbert et al., 2010, 2012). In contrast
to traditional recommender system models, the rating function of CARS can
be viewed as:

R : Users × Items × Contexts → Ratings \qquad (2.5)
Three representative approaches have been designed to deal with contextual
preferences: contextual prefiltering, contextual postfiltering and contextual
modelling. In a contextual prefiltering approach, the
contextual information is used to filter out irrelevant information before applying
a traditional recommender system model (Adomavicius et al., 2005; Codina et al.,
2013a,b; Zheng et al., 2013, 2014a). In a contextual postfiltering approach, the
recommendation results from traditional recommender system models are further
filtered using contextual information (Panniello et al., 2009; Ramirez-Garcia and
García-Valdez, 2014). As opposed to contextual prefiltering and contextual postfil-
tering approaches which rely on traditional models to generate recommendations,
contextual modelling approaches model the contextual information directly in
a recommendation function (Hariri et al., 2011; Hidasi and Tikk, 2012; Rendle
et al., 2011; Zheng et al., 2014b).
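The contextual prefiltering idea can be sketched as a filter applied to the rating data before any traditional model is trained; the data and context labels below are illustrative.

```python
def prefilter(ratings, target_context):
    """Keep only the ratings observed in the target context (e.g., 'weekend')."""
    return [(u, i, r) for (u, i, r, c) in ratings if c == target_context]

# (user, item, rating, context) tuples.
data = [
    ("u1", "i1", 5, "weekend"),
    ("u1", "i2", 2, "weekday"),
    ("u2", "i1", 4, "weekend"),
]
filtered = prefilter(data, "weekend")
print(filtered)  # a traditional CF model is then trained on these triples only
```

Postfiltering would instead run the traditional model on all of `data` and adjust the resulting recommendation list using the context, while contextual modelling builds the context directly into the rating function of Eq. 2.5.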
Overall, the field of context-aware recommender systems is promising, but
much work is needed to explore it comprehensively.
2.1.2.5 Deep learning based recommender systems
Deep learning (DL) (Bengio et al., 2013, 2012, 2009; Deng et al., 2014; Hinton
et al., 2006; Hinton and Salakhutdinov, 2006; LeCun et al., 2015; Schmidhuber,
2015) is a hot and emerging topic in both data mining and machine learning
areas. It learns multiple levels of representation and abstraction from data for
28
2.1 Recommender systems
supervised or unsupervised learning tasks. Initially DL techniques were applied to
computer vision (He et al., 2016; Jia et al., 2014; Krizhevsky et al., 2012; LeCun
et al., 1990; Simonyan and Zisserman, 2014; Szegedy et al., 2015) and speech
recognition (Graves and Jaitly, 2014; Graves et al., 2013; Hinton et al., 2012;
Xiong et al., 2016). Later, deep models were applied to natural language processing
(NLP) tasks, such as semantic parsing (Kim, 2014; Socher et al., 2011a; Weston
et al., 2012), machine translation (Cho et al., 2014a,b; Deselaers et al., 2009;
Sutskever et al., 2014; Wu et al., 2016) and sentiment classification (Glorot et al.,
2011; Kim, 2014; Maas et al., 2011; Socher et al., 2011b).
The tremendous success of DL in other research fields, together with its
capability to capture non-linear relationships from abundant accessible
data sources such as contextual and visual information, has brought about
revolutions in the design of recommendation architectures. The first attempt at using
DL techniques for recommender systems involved the restricted Boltzmann machine
(RBM) (Salakhutdinov et al., 2007). Several recent approaches use autoencoders
(AE) (Kuchaiev and Ginsburg, 2017; Sedhain et al., 2015; Strub and Mary, 2015),
feedforward neural networks (He et al., 2017), recurrent neural networks (RNN) (Wu
et al., 2017), convolutional neural networks (CNN) (Nguyen et al., 2017; Wang
et al., 2017), deep semantic similarity models (DSSM) (Elkahky et al., 2015; Xu
et al., 2016) and neural autoregressive distribution estimation (NADE) (Zheng
et al., 2016a,b).
In addition to developing deep recommendation models by exploiting a single DL
technique, there are also studies that integrate traditional recommendation
models with DL in a loosely coupled or tightly coupled manner. The difference
lies in whether the parameters of the recommendation model and the DL component
are optimized simultaneously. For instance, Zhang et al. (2017b) proposed to learn
item feature representations via AE and then integrated them into the classic
recommendation model SVD++. In contrast, a general Bayesian deep learning
framework consisting of two tightly hinged components, a perception component
(implemented by a deep neural network) and a task-specific component (implemented
by PMF), was proposed in (Wang and Yeung, 2016) to seamlessly combine deep learning and
recommendation models. A comprehensive investigation of the integration of
DL with traditional recommender systems is presented in (Zhang et al., 2017a).
2.2 Transfer learning
Although machine learning technologies have attracted a remarkable level of
attention from researchers in different computational fields, most of these technologies
work under the common assumption that the training data (source domain)
and the test data (target domain) have identical feature spaces and underlying
distributions. As a result, once the feature space or the feature distribution of the
test data changes, the prediction models cannot be reused and must be rebuilt and
retrained from scratch using newly-collected training data, which is very expensive
and sometimes not practically possible. Similarly, since learning-based models
need adequate labeled data for training, it is nearly impossible to establish a
learning-based model for a target domain with very little labeled data available
for supervised learning.
To address the aforementioned problems, transfer learning (TL) (Cook et al.,
2013; Long et al., 2014a; Lu et al., 2015a; Pan and Yang, 2010; Shao et al., 2015;
Weiss et al., 2016) has been proposed as a new learning paradigm that utilizes
previously-acquired knowledge to solve new but similar problems more
quickly and effectively.
2.2.1 Definition of transfer learning
To have a better understanding of the definition of transfer learning, two important
terms need to be introduced first, which are Domain and Task.
Definition 2.1 (Domain) (Pan and Yang, 2010) A domain, which is denoted
by D = {χ, P (X)}, consists of two components:
(1) Feature space χ; and
(2) Marginal probability distribution P (X), where X = {x1, x2, · · · , xn} ∈ χ.
Definition 2.2 (Task) (Pan and Yang, 2010) A task, which is denoted by
T = {Y, f(·)}, consists of two components:
(1) A label space Y = {y1, y2, · · · , yn}; and
(2) An objective predictive function f(·), which is not observed and is to be
learned from pairs {xi, yi}.
The function f(·) can be used to predict the corresponding label, f(xi), of a new
instance xi. From a probabilistic viewpoint, f(xi) can be written as P(yi|xi). More
specifically, the source domain can be denoted as Ds = {(x_{s1}, y_{s1}), · · · , (x_{sn}, y_{sn})},
where x_{si} ∈ χs is a source instance and y_{si} ∈ Ys is the corresponding class label.
Similarly, the target domain can be denoted as Dt = {(x_{t1}, y_{t1}), · · · , (x_{tn}, y_{tn})},
where x_{ti} ∈ χt and y_{ti} ∈ Yt.
According to the definitions of domain and task, the transfer learning problem
can be defined as follows.
Definition 2.3 (Transfer Learning) (Pan and Yang, 2010) Given a source
domain Ds with learning task Ts, and a target domain Dt with learning task Tt,
transfer learning aims to improve the learning of the target predictive function
ft(·) in Dt using the knowledge in Ds and Ts, where Ds ≠ Dt or Ts ≠ Tt.
In the above definition, the condition Ds ≠ Dt implies that either χs ≠ χt or
Ps(X) ≠ Pt(X). Similarly, the condition Ts ≠ Tt implies that either Ys ≠ Yt or
fs(·) ≠ ft(·).
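For illustration, these definitions can be expressed in a short Python sketch. The dataclass names and the example feature spaces below are our own hypothetical choices, not part of the formal definitions; the sketch only checks the conditions Ds ≠ Dt and Ts ≠ Tt at the level of χ and Y (the distributions P(X) and the functions f(·) are not directly observable).

```python
from dataclasses import dataclass

@dataclass
class Domain:
    feature_space: frozenset   # χ: the names of the features
    samples: tuple             # X = {x1, ..., xn} drawn from P(X)

@dataclass
class Task:
    label_space: frozenset     # Y; f(·) would be learned from {x_i, y_i} pairs

def domains_differ(ds: Domain, dt: Domain) -> bool:
    """Ds != Dt when the feature spaces differ; a full check would also
    have to compare the marginal distributions P(Xs) and P(Xt)."""
    return ds.feature_space != dt.feature_space

def tasks_differ(ts: Task, tt: Task) -> bool:
    """Ts != Tt when the label spaces differ (fs, ft are not observed,
    so only the label spaces can be compared directly)."""
    return ts.label_space != tt.label_space

# Example: movie reviews (source) vs. book reviews (target)
source = Domain(frozenset({"movie_genre", "review_text"}), samples=())
target = Domain(frozenset({"book_genre", "review_text"}), samples=())
print(domains_differ(source, target))  # feature spaces differ -> True
```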
2.2.2 Classification of transfer learning techniques
According to the unified definition of transfer learning introduced in Section 2.2.1,
transfer learning techniques can be divided into three main categories (Pan and
Yang, 2010): 1) inductive transfer learning, in which the learning task in the
target domain is different from the task in the source domain (i.e., Ts ≠ Tt);
2) unsupervised transfer learning, which is similar to inductive transfer learning
but focuses on solving unsupervised learning tasks in the target domain, such
as clustering, dimensionality reduction and density estimation; 3) transductive
transfer learning, in which the learning tasks are the same in both domains while
the source and target domains are different (i.e., Ts = Tt, Ds ≠ Dt).
2.2.2.1 Inductive transfer learning
In inductive transfer learning, the learning task in the target domain is different
from the one in the source domain but the domains are the same (i.e., Ts ≠ Tt and
Ds = Dt) (Pan and Yang, 2010). Inductive transfer learning is similar to multi-task
learning (Argyriou et al., 2007) when labelled data are available in the source
domain, or to self-taught learning (Kemker and Kanan, 2017; Raina et al., 2007)
when no labelled data are provided in the source domain.
According to (Pan and Yang, 2010; Rohrbach et al., 2013), existing research
in inductive transfer learning can be classified into four categories: instance-based
transfer learning, feature-based transfer learning, parameter-based transfer learning
and relation-based transfer learning.
Instance-based transfer learning assumes that some labelled source domain
data can be reused to train a new model in the target domain. To this end,
Dai et al. (Dai et al., 2007) proposed TrAdaBoost, which iteratively re-weights
the source domain data in order to pick out useful samples while down-weighting
less useful ones for training a classifier. Based on the same idea of removing less
useful samples from the source domain, different strategies have been adopted and
various algorithms have been developed (Huang et al., 2007; Jiang and Zhai, 2007;
Li and Principe, 2017; Yao and Doretto, 2010).
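The re-weighting idea behind TrAdaBoost can be sketched as a single update step, shown below with numpy. This is a simplified illustration, not the cited implementation: the function name and the 0/1 per-instance error encoding are our own, and the full algorithm wraps this step in a boosting loop around a weak learner.

```python
import numpy as np

def tradaboost_reweight(w_src, w_tgt, err_src, err_tgt, n_rounds):
    """One TrAdaBoost-style weight update (simplified sketch).

    w_src, w_tgt : current instance weights for source / target data
    err_src, err_tgt : per-instance 0/1 errors of the current weak learner
    n_rounds : total number of boosting iterations N
    """
    n = len(w_src)
    beta_src = 1.0 / (1.0 + np.sqrt(2.0 * np.log(n) / n_rounds))
    # the weighted error on the *target* data sets the AdaBoost factor
    eps = np.sum(w_tgt * err_tgt) / np.sum(w_tgt)
    eps = min(max(eps, 1e-6), 0.499)          # keep the factor well-defined
    beta_tgt = eps / (1.0 - eps)
    # misclassified source instances are down-weighted (judged less useful),
    # misclassified target instances are up-weighted (classic AdaBoost)
    new_src = w_src * beta_src ** err_src
    new_tgt = w_tgt * beta_tgt ** (-err_tgt)
    z = new_src.sum() + new_tgt.sum()
    return new_src / z, new_tgt / z

w = np.ones(4) / 8
ws, wt = tradaboost_reweight(w, w.copy(), np.array([1, 0, 0, 0]),
                             np.array([0, 1, 0, 0]), n_rounds=10)
print(ws, wt)   # source error 0 loses weight; target error 1 gains weight
```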
Feature-based transfer learning aims to find a common feature representation for
both the source and target domains so that the mismatch between the two domains
can be reduced. Argyriou et al. (Argyriou et al., 2007) proposed to learn a common
mapping function for both the source and target domains, so that a classifier can be
constructed by solving an optimization problem in the low-dimensional feature
space. Lee et al. (Lee et al., 2007) proposed to combine related learning tasks
to learn meta-priors that can be transferred across domains and to add weights
to features for representation learning. Raina et al. (Raina et al., 2007) proposed
to apply sparse coding techniques to learn high-level features when no labelled
data are provided in the source domain. However, in some conditions, the high-level
features learned in the source domain may not perform well in the target domain.
Under the setting of unsupervised feature learning, manifold learning has also been
exploited to develop inductive transfer learning approaches (Wang and Mahadevan,
2008).
Parameter-based transfer learning assumes that models for related domains
may share common parameters or priors. In parameter-based transfer learning,
a larger weight is usually assigned to the loss function of the target domain
rather than equal weights for the source and target domains. In this direction,
Gao et al. (Gao et al., 2008) proposed a locally weighted ensemble learning
framework that combines multiple models for transfer learning, where the weights
are dynamically assigned according to each model's predictive power on each test
example in the target domain.
Relation-based transfer learning is mainly used for transferring knowledge
among multiple relational domains, such as social networks, where the data are
not independent and identically distributed. To solve this problem, the TAMAR
algorithm, which transfers relational knowledge across relational domains with
Markov logic networks, was proposed. Later, the authors also extended TAMAR
to the single-entity setting.
2.2.2.2 Transductive transfer learning
In transductive transfer learning, the learning tasks are the same but the
source and target domains are different (i.e., Ts = Tt and Ds ≠ Dt) (Pan and
Yang, 2010).
Transductive transfer learning has often been used interchangeably with domain
adaptation (Jiang, 2008; Li et al., 2014; Long et al., 2014a; Shi et al., 2010;
Tso-Sutter et al., 2008; Wang and Mahadevan, 2011). Under the framework of
domain adaptation, the discrepancy between the source and target domains can be
caused by a different marginal distribution (i.e., P(Xs) ≠ P(Xt)), a different
conditional distribution (i.e., P(Ys|Xs) ≠ P(Yt|Xt)), or both. To overcome the
marginal distribution discrepancy, sampling methods can be applied to estimate
P(Xs) and P(Xt) separately from the observed data. Fan et
al. (Fan et al., 2005) proposed to estimate the probability ratio by using various
classifiers. A kernel mean matching (KMM) algorithm was developed to learn
P (Xs) and P (Xt) directly by matching the means between the source domain data
and the target domain data in a reproducing-kernel Hilbert space (Huang et al.,
2006). Pan et al. (Pan et al., 2008) exploited the maximum mean discrepancy
embedding (MMDE) method, originally designed for dimensionality reduction, to
learn a low-dimensional space to reduce the marginal difference between different
domains for transductive transfer learning. However, MMDE may suffer from its
computational burden. Thus, Pan et al. (Pan et al., 2011a) further proposed an
efficient feature extraction algorithm, called transfer component analysis (TCA) to
overcome the drawback of MMDE. With respect to a different conditional
distribution, Zhong et al. (Zhong et al., 2009) proposed an adaptive kernel
approach that maps the marginal distributions of the target and source domain
data into a common kernel space, and utilized a sample selection strategy to draw
the conditional probabilities of the two domains closer. For the last case, Sun et
al. (Sun et al., 2011) proposed to tackle the marginal and conditional discrepancies
in two separate steps. First, they weighted the source data to reduce the marginal
distance between the source and target data. They then computed weights for the
source data to reduce the conditional distribution difference based on a smoothness
assumption.
Finally, a classifier was learned for the target domain on those re-weighted source
data. Recently, Behbood et al. (Behbood et al., 2011, 2014) developed a fuzzy
domain adaptation method for a real-world banking application. Gong et al. (Gong
et al., 2014) developed a novel approach for unsupervised domain adaptation
with applications to visual recognition. Specifically, they tried to learn robust
features with which to construct classifiers. First, they applied a geodesic flow
kernel (GFK) to summarize the inner products in an infinite sequence of feature
subspaces that smoothly interpolates between the source and target domains.
Second, they leveraged kernels to combine multiple base GFKs to model both the
source and target domains at fine-grained granularities.
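The maximum mean discrepancy that underlies KMM, MMDE and TCA can be illustrated with a minimal numpy sketch. This is our own toy estimator, not the code of any of the cited methods: it computes a biased estimate of the squared MMD between a source sample and a target sample under a Gaussian kernel, so that a larger value indicates a larger marginal-distribution discrepancy.

```python
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    """Gaussian kernel matrix K[i, j] = exp(-gamma * ||a_i - b_j||^2)."""
    sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def mmd2(xs, xt, gamma=1.0):
    """Biased estimate of the squared maximum mean discrepancy:
    MMD^2 = E[k(xs, xs')] - 2 E[k(xs, xt)] + E[k(xt, xt')]."""
    return (rbf_kernel(xs, xs, gamma).mean()
            - 2.0 * rbf_kernel(xs, xt, gamma).mean()
            + rbf_kernel(xt, xt, gamma).mean())

rng = np.random.default_rng(0)
same = mmd2(rng.normal(0, 1, (200, 2)), rng.normal(0, 1, (200, 2)))
shifted = mmd2(rng.normal(0, 1, (200, 2)), rng.normal(2, 1, (200, 2)))
print(same, shifted)   # the mean-shifted pair has the larger discrepancy
```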
2.2.2.3 Unsupervised transfer learning
In unsupervised transfer learning, both the learning tasks and the domains are
different (i.e., Ts ≠ Tt and Ds ≠ Dt). Additionally, no labelled data are observed
in either the source or the target domain during training (Pan and Yang, 2010).
This problem setting remains an open challenge in transfer learning.
In (Dai et al., 2008), a new approach called self-taught clustering (STC) is
proposed, which aims at clustering a small collection of unlabelled data in the target
domain with the help of a large amount of unlabelled data in the source domain.
In particular, self-taught clustering tries to learn a common feature space across
domains, which benefits clustering in the target domain. Similarly, Wang et
al. (Wang et al., 2008) first applied clustering methods to generate pseudo class
labels for the unlabelled target data, and then applied dimensionality reduction
methods to the target data and the labelled source data to reduce the dimensionality.
In (Jiang and Chung, 2012), a transfer spectral clustering (TSC) algorithm was
proposed that involves not only the data manifold information of the individual
task but also the feature manifold information shared between tasks. Compared
to STC, TSC is built on graphs.
2.3 Cross-domain recommender system
Nowadays, the majority of recommender systems only offer recommendations of
items belonging to a single domain. For example, Netflix offers movie
recommendations and Spotify provides music recommendations. Although these
recommender systems have been applied successfully in their corresponding
domains, there are cases where providing multiple and diverse item
recommendations could be beneficial for exploring users' unique preferences. For
instance, a user could receive relevant music and book recommendations after
showing interest in a specific movie. Furthermore, instead of treating each domain
independently, exploiting knowledge from relevant auxiliary domains is helpful for
improving recommendation performance, especially in situations of data sparsity.
Considering the above two motivations, research on cross-domain recommender
systems (CDRS) has grown into a challenging but largely under-explored topic.
It has been studied as the problem of cross-system personalization in user
modelling (Abel et al., 2013; Shapira et al., 2013), as a potential solution to cold
start and data sparsity in recommender systems (Shi et al., 2011; Tiroshi et al.,
2013), and as a practical application of TL in the area of recommender systems
(Li et al., 2009a; Pan et al., 2010b).
Although CDRS has been studied from various perspectives in different research
areas, a unified definition of the CDRS problem has not yet emerged. Several
survey papers summarize the development of CDRS approaches. A
brief survey (Li, 2011) introduces cross-domain collaborative filtering (CDCF)
along two dimensions: collaborative filtering domains and knowledge transfer
styles. With respect to collaborative filtering domains, it considers three
representative domains in practice: the system domain, the data domain and the
temporal domain. For knowledge transfer styles, it introduces three ways of
transferring knowledge, namely rating-pattern sharing, latent-feature sharing and
domain correlating.
An extended survey (Fernández-Tobías et al., 2012) mainly focuses on relations
between domains, including content-based relations and collaborative filtering-based
relations. Cremonesi et al. (Cremonesi et al., 2011) consider four types of
data overlap (no overlap, user overlap, item overlap, and full overlap) to
distinguish the literature on CDRS. Shi et al. (Shi et al., 2014) introduce
CDRS from the perspective of how to improve user-based and model-based CF
techniques by exploiting various kinds of auxiliary data. Recently, a more
comprehensive survey (Cantador et al., 2015) defines the notion of domain at four
levels (i.e., attribute level, type level, item level and system level) and addresses
three recommendation tasks (i.e., multi-domain recommendation, linked-domain
recommendation and cross-domain recommendation).
2.3.1 Definition of cross-domain recommender system
Let Us and Is be the sets of users and items in the source domain Ds, and let Ut
and It be the sets of users and items in the target domain Dt. A cross-domain
recommender system aims to use the knowledge in the source user-item interaction
matrix Xs ∈ R^{|Us|×|Is|} to predict the missing values in the target user-item
interaction matrix Xt ∈ R^{|Ut|×|It|}, so that the recommendation performance
can be greatly improved when the data in Xt are sparse.
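The prediction task in this definition amounts to low-rank matrix completion. The following toy sketch is our own illustration rather than any specific CDRS model: it fits latent factors to the observed entries of a small target matrix Xt by gradient descent and uses them to fill in the missing values.

```python
import numpy as np

def factorize(x, mask, rank=2, lr=0.01, reg=0.01, epochs=5000, seed=0):
    """Fit X ≈ U V^T using only the observed entries (mask[i, j] = 1)."""
    rng = np.random.default_rng(seed)
    m, n = x.shape
    u = 0.1 * rng.standard_normal((m, rank))
    v = 0.1 * rng.standard_normal((n, rank))
    for _ in range(epochs):
        err = mask * (x - u @ v.T)        # error on observed cells only
        u += lr * (err @ v - reg * u)
        v += lr * (err.T @ u - reg * v)
    return u @ v.T

# A toy target matrix Xt; 0 marks a missing rating.
xt = np.array([[5.0, 4.0, 0.0],
               [4.0, 0.0, 1.0],
               [0.0, 4.0, 1.0]])
mask = (xt > 0).astype(float)
pred = factorize(xt, mask)
print(np.round(pred, 1))                  # missing cells are now predicted
```

In a cross-domain setting the same machinery is augmented with knowledge from Xs, for example by sharing or regularizing the latent factors, as the approaches reviewed below do.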
2.3.2 Classification of cross-domain recommendation approaches
As discussed in (Cremonesi et al., 2011), there are four scenarios of data overlap
between the source and target domains. Each scenario is briefly introduced below:
• No overlap: there are no overlapping users or items between the domains,
i.e., Ust = Us ∩ Ut = ∅ and Ist = Is ∩ It = ∅.
• User overlap: there are common users in both domains, i.e.,
Ust = Us ∩ Ut ≠ ∅.
• Item overlap: there are common items in both domains, i.e.,
Ist = Is ∩ It ≠ ∅.
• User and item overlap: there are overlapping users and items between the
domains, i.e., Ust ≠ ∅ and Ist ≠ ∅.
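These scenarios can be checked mechanically from the user and item sets; a small Python sketch (the function name and the example identifiers are our own):

```python
def overlap_scenario(users_s, users_t, items_s, items_t):
    """Classify the data-overlap scenario of (Cremonesi et al., 2011)."""
    u_st = set(users_s) & set(users_t)   # Ust = Us ∩ Ut
    i_st = set(items_s) & set(items_t)   # Ist = Is ∩ It
    if u_st and i_st:
        return "user and item overlap"
    if u_st:
        return "user overlap"
    if i_st:
        return "item overlap"
    return "no overlap"

print(overlap_scenario({"u1", "u2"}, {"u2", "u3"}, {"m1"}, {"b1"}))
# a shared user but disjoint item sets -> "user overlap"
```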
According to the settings of data overlap, existing research on cross-domain
recommendation can generally be classified into two categories: cross-domain
recommendation for partially/fully overlapping domains and cross-domain
recommendation for non-overlapping domains.
2.3.2.1 Cross-domain recommendation for partially/fully overlapping
domains
These approaches assume that the same latent features are shared by the source
and target domains. The latent features can be either user-specific or item-specific.
Under this assumption, different kinds of models have been developed by fusing
various types of user-side or item-side information.
The collective matrix factorization (CMF) model (Singh and Gordon, 2008)
was proposed to collectively factorize a user-item rating matrix and an item-content
matrix, sharing the same item-specific latent features to enable knowledge
transfer between the two domains. Similar to CMF, Ma et al. (Ma et al., 2008)
factorize a user-item rating matrix and a user-user social network matrix
simultaneously in order to find the shared user-specific latent features. Later, the
weighted nonnegative matrix co-tri-factorization (WNMCTF) approach (Yoo
and Choi, 2009) exploited the nonnegative matrix factorization (NMF) technique
to collectively factorize a user-item rating matrix, a user demographic matrix
and an item-content matrix, sharing both user-specific and item-specific latent
features to enhance the knowledge transfer. MCF-LF (Zhang et al., 2012),
CLP-GP (Cao et al., 2010) and NB-MCF (Chatzis, 2013) study multiple user-side
auxiliary data matrices and learn users' preferences and similarities, and are
shown to be more effective than approaches that only share the latent features.
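The shared-latent-feature idea behind CMF can be sketched in a few lines of numpy. The code below is a simplified gradient-descent version of collective factorization (our own illustration, not the cited algorithm): a rating matrix R ≈ U Vᵀ and an item-content matrix C ≈ V Wᵀ are factorized jointly, with the item factors V shared between the two factorizations.

```python
import numpy as np

def collective_mf(r, mask, c, rank=2, lr=0.01, reg=0.01,
                  alpha=0.5, epochs=4000, seed=0):
    """Jointly factorize R ≈ U V^T and C ≈ V W^T with shared item factors V."""
    rng = np.random.default_rng(seed)
    u = 0.1 * rng.standard_normal((r.shape[0], rank))
    v = 0.1 * rng.standard_normal((r.shape[1], rank))
    w = 0.1 * rng.standard_normal((c.shape[1], rank))
    for _ in range(epochs):
        e_r = mask * (r - u @ v.T)       # rating error on observed entries
        e_c = c - v @ w.T                # item-content reconstruction error
        u += lr * (e_r @ v - reg * u)
        v += lr * (e_r.T @ u + alpha * (e_c @ w) - reg * v)
        w += lr * (alpha * (e_c.T @ v) - reg * w)
    return u @ v.T

r = np.array([[5.0, 0.0, 1.0],           # user 1 has not rated item 2
              [4.0, 5.0, 0.0]])
mask = (r > 0).astype(float)
c = np.array([[1.0, 0.0],                # items 1 and 2 share the same
              [1.0, 0.0],                # content features; item 3 differs
              [0.0, 1.0]])
pred = collective_mf(r, mask, c)
print(np.round(pred, 1))
```

Because items 1 and 2 share the same content features, the shared item factors pull user 1's unknown rating on item 2 toward that user's observed rating on item 1; this coupling is the knowledge-transfer effect.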
Instead of using the auxiliary data directly, some researchers propose to
explore more hidden information in the auxiliary data. Shi et al. (Shi et al.,
2013b) collectively factorize a user-item rating matrix and an item-item similarity
matrix mined from movies' mood descriptions. Tang et al. (Tang et al.,
2013) collectively factorize a user-item rating matrix weighted by users' global
reputations and a user-user social matrix. Further, they add constraints that
enforce sharing the same user-specific latent features. In addition to transferring
all the knowledge of the source domain, Lu et al. (Lu et al., 2013) selectively
transfer high-quality knowledge from multiple user-aligned data sources, which
was shown to be more accurate than transferring without selection.
Considering the heterogeneity of user feedback, Pan et al. (Pan et al.,
2010b) propose CST, which transfers knowledge from auxiliary implicit feedback
(browsing records) to target explicit feedback (rating scores). Specifically, it
incorporates the coordinate systems (or latent features) extracted from the
auxiliary data into the target factorization system via two regularization terms.
This work provides a way to deal with heterogeneous data in cross-domain
recommendation. In addition to sharing both user-specific and item-specific latent
features, the transfer by collective factorization (TCF) model (Pan et al., 2011b)
also uses two inner matrices to represent the data-dependent information. Later,
the authors proposed a more effective model, called iTCF (Pan and Ming, 2014),
by extending the TCF model.
2.3.2.2 Cross-domain recommendation for non-overlapping domains
Without overlapping users or items, building correspondences between cross-domain
users and between cross-domain items remains an open challenge. To
address this problem, researchers propose to bridge two heterogeneous domains
in either an implicit or an explicit way. Representative approaches are summarized
as follows.
Implicit rating-based cross-domain recommender systems
Methods that handle two domains without overlapping users or items usually
transfer knowledge between the domains at a group level. Additionally, they use
only user preference data.
Codebook transfer (CBT) (Li et al., 2009a) bridges two domains by clustering
rating matrices and finding user-item patterns at the cluster level. Specifically,
it first extracts a codebook from the source rating matrix, and then transfers
the codebook to the sparse target domain as shared knowledge. However, this
method assumes that the source rating matrix is fully observed. Later, the rating
matrix generative model (RMGM) (Li et al., 2009b) extended CBT by combining
codebook construction and codebook expansion into a single step, and relaxed
the constraints on the source rating matrix. Further, Gao et al. (Gao et al.,
2013) generalize the codebook to include a data-independent rating pattern
and a data-dependent rating pattern, which is shown to be more accurate than
sharing only the data-independent common knowledge. To better capture the
interactions between domain-specific user factors and item factors, Hu et al. (Hu
et al., 2013) propose the CDTF model, which uses users' explicit and implicit
feedback, respectively. Additionally, the techniques of multi-task learning (Elkahky
et al., 2015; Moreno et al., 2012) and active learning (Zhao et al., 2013, 2017) have
also been studied to promote knowledge transfer in cross-domain recommendation.
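CBT's first step, building the codebook, can be sketched as follows. This is a simplified illustration that assumes numpy and substitutes a plain k-means for the co-clustering used in the original paper; the second step, which learns target cluster memberships to expand the codebook into the sparse target matrix, is omitted.

```python
import numpy as np

def kmeans_labels(x, k, iters=20, seed=0):
    """Minimal k-means (rows of x are the points) returning cluster labels."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), k, replace=False)]
    for _ in range(iters):
        d = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = x[labels == j].mean(0)
    return labels

def codebook(xs, p=2, q=2):
    """Build the cluster-level rating pattern B from a dense source matrix:
    B[g, h] = average rating of user-cluster g on item-cluster h."""
    ur = kmeans_labels(xs, p)            # cluster users by their rating rows
    ir = kmeans_labels(xs.T, q)          # cluster items by their rating columns
    b = np.zeros((p, q))
    for g in range(p):
        for h in range(q):
            b[g, h] = xs[np.ix_(ur == g, ir == h)].mean()
    return b

xs = np.array([[5.0, 5.0, 1.0],
               [5.0, 4.0, 1.0],
               [1.0, 1.0, 5.0],
               [1.0, 2.0, 5.0]])
b = codebook(xs)
print(np.round(b, 2))   # one high-rating and one low-rating block per cluster pair
```

The full CBT then fixes B and alternately updates binary user and item cluster memberships in the target domain so as to minimize the reconstruction error on the observed target ratings.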
Explicit tag-based cross-domain recommender systems
Instead of linking domains through implicit rating patterns, tags have been widely
studied as a way to build an explicit knowledge-transfer bridge between
heterogeneous domains.
In this respect, Shi et al. (Shi et al., 2011) applied overlapping tags to profile
cross-domain users and items, so that cross-domain user-to-user and item-to-item
similarities can be inferred from the tagging data. These similarities can
then be exploited as prior knowledge to regularize the joint matrix factorization
process. Since more unique information about the individual domains is encoded
in the non-overlapping tags, Hao et al. (Hao et al., 2016) extended this work by
considering domain-specific tags. Enrich et al. (Enrich et al., 2013) developed
three tag-based rating prediction models using both the rating and tagging data
in the auxiliary domains. These models transfer knowledge through overlapping
tags on the assumption that tagging behavior in one domain can be exploited in a
completely different domain. Fernández et al. (Fernández-Tobías and Cantador,
2014) improved one of these models by introducing an additional set of tag factors
to better capture the effect of tags on rating estimation. Wang et al. (Wang et al.,
2012) proposed a two-step tag transfer learning model, which applies a tag
clustering approach to the tags in the source domain and uses the learned tag
clusters to group the tags in the target domain; in their model, the transferred
knowledge is represented as tag clusters. Fang et al. (Fang et al., 2015) learned
and shared a tag-occurrence matrix to help knowledge transfer across multiple
domains.
To semantically correlate the tags of different domains, most studies rely on
the construction of an external knowledge base. Kumar et al. (Kumar et al., 2014)
proposed to measure the semantic relatedness between tags with a WordNet-based
ontology. Yang et al. (Yang et al., 2015) proposed to capture the semantic
relatedness between two different tags/keywords through concept vectors
distilled from online encyclopedias. In (Yang et al., 2014), a Chinese knowledge
graph consisting of billions of concepts was built to find explicit correlations
between tags used in different social media.
Chapter 3
Exploiting Domain Specific Tags
for Cross-domain
Recommendation
3.1 Introduction
The performance of CDCF relies on whether an effective domain link can be
established as a bridge for knowledge transfer. In this direction, most existing
CDCF approaches assume that either users or items are fully or partially shared
between the source and target domains (Pan et al., 2012; Pan and Yang, 2013;
Singh and Gordon, 2008). The shared users or items then become a bridge to
support knowledge transfer. However, due to companies' differing privacy policies,
it is more common that the users and items in the two domains are completely
non-overlapping, i.e., the correspondence is unknown. For this setting, (Fang et
al., 2015; Fernández-Tobías and Cantador, 2014; Shi et al., 2011) proposed to
build an explicit domain
relationship through user-generated tags. The underlying assumption of these
models is that users with similar tagging behaviors are likely to share similar
interests.
Although an explicit domain relationship has been shown to be more effective than
an implicit one (Shi et al., 2011), the above assumption has two limitations. First,
the ratio of overlapping tags decreases with domain heterogeneity, and most users
or items are not covered by the limited overlapping tags; inaccurate
recommendations will be generated on the basis of such a weak domain connection.
Second, it is wasteful to discard the abundant domain-specific tags, which
correspond to the unshared parts of the tag sets of the individual domains.
Moreover, domain-specific tags are more effective for capturing the distinctive
characteristics of an individual domain. If the domain-specific tags are taken into
account, more links between heterogeneous domains can be established to promote
knowledge transfer.
To explore the role of domain-specific tags, this chapter proposes a novel
tag-based CDCF algorithm, called enhanced tag-induced cross domain
collaborative filtering (ETagCDCF), which instead builds the correspondence
between cross-domain users or cross-domain items by exploiting domain-specific
tags. Nevertheless, the diverse formats of domain-specific tags present a challenge
in generating a uniform feature space for aligning heterogeneous domains. To
address this problem, ETagCDCF applies spectral clustering to group the
domain-specific tags based on their tag co-occurrence patterns. As a result,
domain-specific tags that are used in the same pattern are grouped together,
and a new tag representation is generated to represent those non-identical tags.
By modeling on the new tag representation, user and item profiles can be greatly
enriched by adding more information encoded in domain-specific tags and
helpful for inferring more accurate cross-domain similarities. Experiments on
two public datasets show that the proposed model achieves state-of-the-art
performance.
The remainder of this chapter is organized as follows. Section 3.2 introduces
the basic definitions and notation used in this chapter; Section 3.4 presents
the proposed model in detail; Section 3.5 conducts a set of experiments on two
public datasets to evaluate the performance of the proposed model; and Section
3.6 summarizes this chapter.
3.2 Preliminary knowledge
In this section, some preliminary definitions and notations are presented to help
understand the problem setting of this chapter.
Definition 3-1 (Domain): A domain D is a category of items that we are
interested in recommending to different users.
In the above definition, a domain is defined from the perspective of item type.
Different recommended item types, for example book, movie and music, can be
regarded as different domains. In each domain, the sets of users and items are
represented by U = {u1, u2, . . . , um} and I = {i1, i2, . . . , in}, respectively. All
the ratings given by the m users to the n items are denoted by the matrix
R ∈ R^{m×n} with entries rij. In addition, the unique tag set used in a domain is
denoted by T = {t1, t2, . . . , tl}, where l is the number of unique tags. Furthermore,
the tag assignments are organized in the form TR = {ui : {ij : tk}},
i = 1, 2, . . . , m; j = 1, 2, . . . , n; k = 1, 2, . . . , l.
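For illustration, the tag assignments TR can be held in a nested dictionary, from which the unique tag set T and per-item tag profiles are recovered. This is a minimal Python sketch with made-up user, item and tag names.

```python
# Tag assignments TR organized as {user: {item: [tags]}}, following the
# notation above (the user/item/tag names are purely illustrative).
TR = {
    "u1": {"i1": ["#sci-fi", "#hero"], "i2": ["#romantic"]},
    "u2": {"i1": ["#sci-fi"]},
}

# Recover the unique tag set T = {t_k} and the per-item tag profiles from TR.
T = sorted({t for items in TR.values() for tags in items.values() for t in tags})
item_tags = {}
for items in TR.values():
    for item, tags in items.items():
        item_tags.setdefault(item, set()).update(tags)

print(T)                # ['#hero', '#romantic', '#sci-fi']
print(item_tags["i1"])  # the tags assigned to item i1 (set order may vary)
```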
Definition 3-2 (Overlapping tags): Overlapping tags are the set of tags that
appear in both the source and target domains.
Fig. 3.1 A scenario for tag-based cross-domain recommendation. In this figure, we
aim to exploit knowledge from a movie domain to bootstrap book recommendation.
Unobserved rating scores are denoted by ? and each tag text starts with #.
[Figure: example users (jack, jamie, leo, lily, rosa, suzy, amy) with their rating
scores, and tags such as #romantic, #sci-fi, #not real, #hero in the movie domain
and #romantic, #sci-fi, #good book, #love, #fantasy in the book domain.]
The overlapping tags can be denoted by Tc = {ti | ti ∈ Ts ∩ Tt, i = 1, 2, . . . , lc},
where lc is the number of common tags. Taking the recommendation scenario
presented in Figure 3.1 as an example, the tags #romantic and #sci-fi, which
appear in both domains, are overlapping tags.
Definition 3-3 (Domain-specific tags): Domain-specific tags refer to the tags
that are unique to an individual domain.
According to the above definition, two sets of domain-specific tags can be
obtained. They are denoted by Tds = {ti | ti ∈ Ts − Tc, i = 1, 2, . . . , ls} and
Tdt = {tj | tj ∈ Tt − Tc, j = 1, 2, . . . , lt}, where ls and lt denote the numbers of
domain-specific tags in the source and target domains, respectively. In Figure 3.1,
the domain-specific tags in the source domain are {#not real, #hero}, and the
domain-specific tags in the target domain are {#good book, #love, #fantasy}.
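Using the tag sets of the Figure 3.1 scenario, the three tag sets defined above reduce to simple set operations:

```python
# Tag sets from the Figure 3.1 scenario: Ts (movie domain), Tt (book domain).
Ts = {"#romantic", "#sci-fi", "#not real", "#hero"}
Tt = {"#romantic", "#sci-fi", "#good book", "#love", "#fantasy"}

Tc = Ts & Tt        # overlapping tags, Tc = Ts ∩ Tt
Tds = Ts - Tc       # domain-specific tags of the source domain
Tdt = Tt - Tc       # domain-specific tags of the target domain

print(sorted(Tc))   # ['#romantic', '#sci-fi']
print(sorted(Tds))  # ['#hero', '#not real']
print(sorted(Tdt))  # ['#fantasy', '#good book', '#love']
```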
3.3 Enhanced tag-induced cross domain collaborative filtering
Given the domains Ds and Dt, the task of ETagCDCF is to infer implicit
relationships between Us and Ut and between Is and It through domain-specific
tags, so that the recommendation performance in the sparse target domain Dt can
be greatly improved by transferring rating knowledge from the source domain Ds.
Generally, ETagCDCF consists of the following three steps:
(1) Map the domain-specific tags into k predefined clusters to form a new tag
representation, so that the heterogeneity between domain-specific tags is
greatly reduced.
(2) Refine the user and item profiles with the new tag representation and then
compute cross-domain user-to-user and item-to-item similarities.
(3) Integrate the learned cross-domain similarities into matrix factorization
to serve as a tie between the source and target domains for transferring
knowledge.
3.3.1 The alignment of domain-specific tags
Inspired by the idea in (Pan et al., 2010a), the domain-specific tags are grouped by
applying a spectral clustering technique. Specifically, the clustering is based on
the co-occurrence pattern that relies on the relationships between overlapping
tags and domain-specific tags. Since the tagging data has a bi-directional nature,
in which a user can assign his or her favorite tags to any items,
Algorithm 3.1 Alignment of domain-specific tags
Input: original domain-specific tags Tds and Tdt, common tags Tc, number of
clusters k.
Output: user-specific tags DSU, item-specific tags DSV, user-specific tag
membership matrix RU, item-specific tag membership matrix RV.
1: Apply Equations 3.1 and 3.2 on the union set Tds ∪ Tdt to select K specific
tags for users, denoted by DSU = {ti | i = 1, 2, . . . , K}, and L specific tags
for items, denoted by DSV = {ti | i = 1, 2, . . . , L}.
2: Based on DSU and Tc, calculate the (user-specific tag)-(common tag)
co-occurrence matrix MU ∈ R^{K×lc}, where MU_{i,j} = 1 if CU(ti, tj) = 1 and
MU_{i,j} = 0 otherwise. Similarly, based on DSV and Tc, calculate the
(item-specific tag)-(common tag) co-occurrence matrix MV ∈ R^{L×lc}.
3: for each z ∈ {U, V} do
4:   Construct the matrix Lz = (Dz)^{-1/2} Az (Dz)^{-1/2}, where
Az = [[0, Mz], [(Mz)^T, 0]] and Dz is a diagonal matrix with
Dz_{ii} = Σ_j Az_{i,j}.
5:   Find the k largest eigenvectors u1, u2, . . . , uk of Lz and form the matrix
Ez = [u1, u2, . . . , uk].
6: end for
7: Extract the first K rows of the matrix EU ∈ R^{(K+lc)×k} to obtain
RU ∈ R^{K×k}, and the first L rows of EV ∈ R^{(L+lc)×k} to obtain
RV ∈ R^{L×k}.
8: return DSU, DSV, RU and RV
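Steps 4-5 of Algorithm 3.1 can be sketched directly in numpy. The code below is our own minimal implementation of the normalized bipartite affinity matrix and its top-k eigenvectors; the input M is a toy co-occurrence matrix.

```python
import numpy as np

def tag_embedding(m, k):
    """Steps 4-5 of Algorithm 3.1: build the bipartite affinity matrix
    A = [[0, M], [M^T, 0]] from the (specific tag)-(common tag)
    co-occurrence matrix M, normalize it as L = D^{-1/2} A D^{-1/2},
    and keep the k largest eigenvectors as the new tag representation."""
    big_k, lc = m.shape
    a = np.zeros((big_k + lc, big_k + lc))
    a[:big_k, big_k:] = m
    a[big_k:, :big_k] = m.T
    d = a.sum(axis=1)
    d[d == 0] = 1.0                          # guard against isolated tags
    d_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    lap = d_inv_sqrt @ a @ d_inv_sqrt
    vals, vecs = np.linalg.eigh(lap)         # eigenvalues in ascending order
    e = vecs[:, np.argsort(vals)[::-1][:k]]  # the k largest eigenvectors
    return e[:big_k]                         # rows for the K specific tags

m = np.array([[1.0, 0.0],    # specific tag 1 co-occurs with common tag 1
              [1.0, 0.0],    # specific tag 2 co-occurs with the same common tag
              [0.0, 1.0]])   # specific tag 3 co-occurs with common tag 2
r = tag_embedding(m, k=2)
print(np.round(r, 2))        # tags 1 and 2 receive identical embedding rows
```

Tags with the same co-occurrence pattern end up with identical rows in the embedding, which is exactly what allows non-identical domain-specific tags to be grouped.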
and an item receives tags from arbitrary users, the tag co-occurrence is modelled
on the user side and the item side, respectively.
For users, the tag co-occurrence is defined as follows:

CU(ti, tj) = 1 if UN(ti, tj) ≥ 1, and CU(ti, tj) = 0 otherwise,     (3.1)

where UN(ti, tj) denotes the number of users who have assigned both tags ti and
tj.
Based on Equation 3.1, the domain-specific tags for users are given by
DSU = {ti | ∀ti ∈ Tds ∪ Tdt, ∃tj ∈ Tc, CU(ti, tj) = 1, i = 1, 2, . . . , (ls + lt),
j = 1, 2, . . . , lc}.
Similarly, on the item side, the tag co-occurrence is defined as:

CI(ti, tj) = 1 if IN(ti, tj) ≥ 1, and CI(ti, tj) = 0 otherwise,     (3.2)

where IN(ti, tj) denotes the number of items that are labeled with both tags ti
and tj.
According to Equation 3.2, the domain-specific tags for items are given by
DSV = {ti | ∀ti ∈ Tds ∪ Tdt, ∃tj ∈ Tc, CI(ti, tj) = 1, i = 1, 2, . . . , (ls + lt),
j = 1, 2, . . . , lc}.
After filtering specific tags for both users and items, the goal is to partition
those candidate tags into k clusters, where k is a predefined parameter. The
complete procedure for the alignment of domain-specific tags is described in
Algorithm 3.1.
As shown in (Ding and He, 2004), the k principal components, which correspond
to the k largest eigenvectors u1, u2, . . . , uk in step 5 of Algorithm 3.1, can be
used to represent the original data in the subspace they span. In our problem,
these k principal components serve as a high-level representation of tags obtained
by clustering the domain-specific tags. In the next section, a mapping function
is developed to map users and items into this new subspace so that cross-domain
similarities can be computed.
3.3.2 Cross-domain similarities refinement
Before re-defining user and item profiles with tag clusters, the tag-based profiles
are first defined as follows:
Definition 4 (User-specific tag indicator matrix): The user-specific tag
indicator matrix X reflects the relationships between users (either from Us or Ut)
and the domain-specific tags DSU. It is denoted by X = {xij | i = 1, 2, . . . , m;
j = 1, 2, . . . , K}, where xij = 1 if user i has assigned tag tj ∈ DSU, and xij = 0
otherwise.
Definition 5 (Item-specific tag indicator matrix): The item-specific tag
indicator matrix Y reflects the relationships between items (either from Is or It)
and the domain-specific tags DSV. It is denoted by Y = {yij | i = 1, 2, . . . , n;
j = 1, 2, . . . , L}, where yij = 1 if item i has been tagged with tj ∈ DSV, and
yij = 0 otherwise.
In addition to using a binary value to denote whether a tag has been assigned by
a user or to an item, the tagging frequency could also be used to estimate xij/yij.
Here, however, we only consider binary values.
Then the alignment of user and item profiles is implemented by the mapping
functions defined in Equation 3.3 and 3.4, respectively.
ΦU(Xi) = Xi × RU (3.3)
ΦV (Yi) = Yi × RV (3.4)
where Xi denotes user-specific tag indicator matrix from domain i (i = s or t, which
represents source or target domain) and Xi ∈ Rmi×K . Yi denotes item-specific tag
indicator matrix from domain i (i = s or t) and Yi ∈ Rni×L.
Once user and item profiles are converted to vectors over tag clusters, the cross-domain user-to-user and item-to-item similarities can be computed with the cosine similarity metric. The overall procedure for refining the cross-domain similarity is summarized in Algorithm 3.2.
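The mapping functions (3.3)-(3.4) and the similarity computation of Algorithm 3.2 can be sketched for the user side as follows. The function name and the ε guard are hypothetical, and a standard cosine similarity (with square roots in the denominator) is assumed; the item side is symmetric.

```python
import numpy as np

def cross_domain_user_similarity(Xs, Xt, RU):
    """Sketch of Algorithm 3.2, user side (hypothetical names).

    Xs, Xt : binary user-specific tag indicator matrices (Definition 4),
             shapes (ms, K) and (mt, K) for source and target domains.
    RU     : user-specific tag-cluster membership matrix, shape (K, k).
    Returns SU, the (ms, mt) cross-domain user-to-user cosine similarity.
    """
    # Eq. (3.3): map both domains into the shared tag-cluster subspace.
    UTs, UTt = Xs @ RU, Xt @ RU
    # Cosine similarity between every source/target user pair.
    norm_s = np.linalg.norm(UTs, axis=1, keepdims=True)
    norm_t = np.linalg.norm(UTt, axis=1, keepdims=True)
    eps = 1e-12  # guard against users with empty tag profiles
    return (UTs @ UTt.T) / (norm_s * norm_t.T + eps)

# Toy check: a source user and a target user with identical profiles.
Xs = np.array([[1, 0, 1], [0, 1, 0]], dtype=float)
Xt = np.array([[1, 0, 1]], dtype=float)
RU = np.array([[1, 0], [0, 1], [1, 0]], dtype=float)
SU = cross_domain_user_similarity(Xs, Xt, RU)
```

Here SU[0, 0] is close to 1 (identical cluster profiles) while SU[1, 0] is 0 (disjoint clusters).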
3.3.3 Model and inference
Matrix factorization is one of the most popular approaches in making recommender
systems, which is based on the low-dimensional factor model. It transforms both
users and items to the same latent factor space and tries to explain the preferences
of users by linearly combining the latent factors of users and items. This low-rank
approximation model performs well in single-domain recommendation. To extend
Algorithm 3.2 Refinement of cross-domain similarity
Input: source domain tag assignments TRs, target domain tag assignments TRt, user-specific tags DSU, item-specific tags DSV, user-specific tag membership matrix RU, item-specific tag membership matrix RV.
Output: cross-domain user-to-user similarity matrix SU, cross-domain item-to-item similarity matrix SV.
1: Apply Definitions 4 and 5 on DSU, DSV, TRs and TRt to form 4 matrices, Xs ∈ R^{ms×K}, Ys ∈ R^{ns×L}, Xt ∈ R^{mt×K} and Yt ∈ R^{nt×L}.
2: Apply mapping function (3.3) to generate the source domain user-tag cluster relation matrix UTs ∈ R^{ms×k} and the target domain user-tag cluster relation matrix UTt ∈ R^{mt×k}.
3: Apply mapping function (3.4) to generate the source domain item-tag cluster relation matrix ITs ∈ R^{ns×k} and the target domain item-tag cluster relation matrix ITt ∈ R^{nt×k}.
4: for user i ∈ [1, 2, ..., ms] do
5:   for user p ∈ [1, 2, ..., mt] do
6:     SU_{i,p} = Σ(UTs[i,:] ⊙ UTt[p,:]) / (√Σ(UTs[i,:] ⊙ UTs[i,:]) × √Σ(UTt[p,:] ⊙ UTt[p,:])), where ⊙ denotes the element-wise product of two vectors.
7:   end for
8: end for
9: for item j ∈ [1, 2, ..., ns] do
10:   for item q ∈ [1, 2, ..., nt] do
11:     SV_{j,q} = Σ(ITs[j,:] ⊙ ITt[q,:]) / (√Σ(ITs[j,:] ⊙ ITs[j,:]) × √Σ(ITt[q,:] ⊙ ITt[q,:]))
12:   end for
13: end for
14: return SU and SV
the model to cross-domain recommendation, users and items in both domains
should be mapped to the same latent space to support knowledge transfer.
The two cross-domain similarity matrices SU and SV , which reflect the im-
plicit relationships between the source and target domains, are further added
as constraints for regularizing joint matrix factorization (Shi et al., 2011). The
objective function of ETagCDCF is formulated as:
F = (1/2) Σ_{i=1}^{m_s} Σ_{j=1}^{n_s} I^s_{ij} (R^s_{ij} − (U^s_i)^T V^s_j)^2
  + (1/2) Σ_{p=1}^{m_t} Σ_{q=1}^{n_t} I^t_{pq} (R^t_{pq} − (U^t_p)^T V^t_q)^2
  + (α/2) Σ_{i=1}^{m_s} Σ_{p=1}^{m_t} (SU_{ip} − (U^s_i)^T U^t_p)^2
  + (β/2) Σ_{j=1}^{n_s} Σ_{q=1}^{n_t} (SV_{jq} − (V^s_j)^T V^t_q)^2
  + (λ/2) (‖U_s‖²_F + ‖V_s‖²_F + ‖U_t‖²_F + ‖V_t‖²_F)   (3.5)
where R_s contains the ratings from m_s users on n_s items in the source domain. I^s is an indicator matrix ensuring that all calculations are only conducted on the observed ratings. In the target domain, the rating matrix is denoted by R_t and the corresponding indicator matrix by I^t. The latent factors of users in the source domain are denoted by the matrix U_s ∈ R^{d×m_s}, whose i-th column is the d-dimensional latent factor vector of user i. Similarly, the matrix V_s ∈ R^{d×n_s} denotes the latent factors of items in the source domain, whose j-th column is the d-dimensional latent factor vector of item j. The latent factors of users and items in the target domain are denoted by U_t ∈ R^{d×m_t} and V_t ∈ R^{d×n_t}, respectively. α and β are two trade-off parameters, which control the relative importance of the cross-domain user-to-user and item-to-item similarities, respectively.
λ is the regularization parameter used to penalize the model complexity in order
to avoid over-fitting.
In Equation 3.5, the first part represents the matrix factorization in the source
domain, while the second part corresponds to the matrix factorization in the
target domain. Both factorization processes are regularized by the third and
fourth parts.
The goal is to estimate the four variables U_s, V_s, U_t, V_t so that the rating matrix R_t can be approximated by R_t ≈ U_t^T V_t with minimum error. A local minimum of Equation 3.5 can be found by performing gradient descent on the four variables U_s, V_s, U_t, V_t alternately. Specifically, the gradients with respect to each variable are computed as follows:
∂F/∂U^s_i = Σ_{j=1}^{n_s} I^s_{ij}((U^s_i)^T V^s_j − R^s_{ij}) V^s_j + α Σ_{p=1}^{m_t} ((U^s_i)^T U^t_p − SU_{ip}) U^t_p + λ U^s_i   (3.6)

∂F/∂V^s_j = Σ_{i=1}^{m_s} I^s_{ij}((U^s_i)^T V^s_j − R^s_{ij}) U^s_i + β Σ_{q=1}^{n_t} ((V^s_j)^T V^t_q − SV_{jq}) V^t_q + λ V^s_j   (3.7)

∂F/∂U^t_p = Σ_{q=1}^{n_t} I^t_{pq}((U^t_p)^T V^t_q − R^t_{pq}) V^t_q + α Σ_{i=1}^{m_s} ((U^s_i)^T U^t_p − SU_{ip}) U^s_i + λ U^t_p   (3.8)

∂F/∂V^t_q = Σ_{p=1}^{m_t} I^t_{pq}((U^t_p)^T V^t_q − R^t_{pq}) U^t_p + β Σ_{j=1}^{n_s} ((V^s_j)^T V^t_q − SV_{jq}) V^s_j + λ V^t_q   (3.9)
In the training phase, these four variables are updated according to the following rules:

U_s ← U_s − ε ∂F/∂U_s,  V_s ← V_s − ε ∂F/∂V_s,
U_t ← U_t − ε ∂F/∂U_t,  V_t ← V_t − ε ∂F/∂V_t   (3.10)

The learning rate ε determines how far each variable moves during an iteration. The learning rate can be adjusted automatically by applying binary search. The initial learning rate ε is set to 0.001 in the experiments.
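A minimal NumPy sketch of this alternating training loop, vectorizing Equations (3.6)-(3.10) over all users and items, might look as follows. The fixed learning rate (rather than the binary-search adjustment), the initialization scale, and the function signature are illustrative assumptions.

```python
import numpy as np

def train_etagcdcf(Rs, Rt, SU, SV, d=10, alpha=0.01, beta=0.01,
                   lam=0.01, eps=0.001, iters=200, seed=0):
    """Sketch of optimizing Eq. (3.5) by alternating gradient descent.

    Rs, Rt : source/target rating matrices (0 = unobserved entry).
    SU     : (ms, mt) cross-domain user-to-user similarity matrix.
    SV     : (ns, nt) cross-domain item-to-item similarity matrix.
    Returns Us (d, ms), Vs (d, ns), Ut (d, mt), Vt (d, nt).
    """
    rng = np.random.default_rng(seed)
    ms, ns = Rs.shape
    mt, nt = Rt.shape
    Us = rng.standard_normal((d, ms)) * 0.1
    Vs = rng.standard_normal((d, ns)) * 0.1
    Ut = rng.standard_normal((d, mt)) * 0.1
    Vt = rng.standard_normal((d, nt)) * 0.1
    Is, It = (Rs > 0).astype(float), (Rt > 0).astype(float)
    for _ in range(iters):
        Es = Is * (Us.T @ Vs - Rs)   # masked source rating residuals
        Et = It * (Ut.T @ Vt - Rt)   # masked target rating residuals
        Eu = Us.T @ Ut - SU          # user-similarity residuals
        Ev = Vs.T @ Vt - SV          # item-similarity residuals
        gUs = Vs @ Es.T + alpha * Ut @ Eu.T + lam * Us   # Eq. (3.6)
        gVs = Us @ Es + beta * Vt @ Ev.T + lam * Vs      # Eq. (3.7)
        gUt = Vt @ Et.T + alpha * Us @ Eu + lam * Ut     # Eq. (3.8)
        gVt = Ut @ Et + beta * Vs @ Ev + lam * Vt        # Eq. (3.9)
        Us, Vs = Us - eps * gUs, Vs - eps * gVs          # Eq. (3.10)
        Ut, Vt = Ut - eps * gUt, Vt - eps * gVt
    return Us, Vs, Ut, Vt
```

Predictions for the target domain are then read off as R_t ≈ U_t^T V_t.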
3.4 Experiments
In this section, a series of experiments is conducted to evaluate the proposed model ETagCDCF under the setting of limited overlapping tags. First, the datasets used in the experiments are described, and then the experimental settings are introduced. Next, the impact of parameters on the final recommendation performance is studied. Finally, comparisons with single-domain and cross-domain recommendation approaches are performed to validate the effectiveness of ETagCDCF.
3.4.1 Description of dataset and experimental settings
ETagCDCF is evaluated on two publicly available datasets: the MovieLens 10M dataset [1] and the LibraryThing dataset [2]. The MovieLens 10M (ML) dataset contains over 10 million ratings and 100,000 tag applications applied to 10,681 movies by 71,567 users. The LibraryThing (LT) dataset contains over 700,000 ratings and 2 million tag applications assigned by 7,564 users to 39,519 books. Ratings in both datasets are given on a 1-5 scale with interval steps of 0.5.
There are three principles in designing the experiments. First, the goal is to exploit domain-specific tags to improve recommendation performance when the number of overlapping tags is limited. Second, limited ratings should be provided for each individual user to simulate the cold-start context. Finally, all the results should be reproducible in future. Considering all three factors, a compromise is reached by extracting the first 1,000 users and the first 1,000 items from both original datasets. Specifically, we only keep ratings and tag assignments whose user and item identifiers are both within the range of 1,000. Based on this criterion, a small subset of each original dataset is kept. Due to the characteristics of the original datasets, the tag assignments in our ML dataset are too limited to share enough common tags with the LT dataset. In addition, only a small number of ratings remain in our LT dataset, which further decreases the number of ratings contributed by each user in the LT dataset. Table 3.1
shows the statistics of the final datasets. The rating sparsity is calculated as 1 − #ratings/(#users × #items), which measures the proportion of unobserved entries in the whole
[1] http://www.grouplens.org/node/73
[2] http://www.macle.nl/tud/LT/
Table 3.1 Statistics of the datasets used in Chapter 3

                                     MovieLens 10M   LibraryThing
Users                                946             726
Items                                857             256
Ratings                              41,894          2,779
Rating sparsity                      94.83%          98.50%
Unique tags                          138             548
Tag assignments                      256             2,779
Ratio of overlapping (shared) tags   13.04%          3.28%
rating space. In addition, there are only 18 overlapping tags shared between ML
and LT datasets.
For ETagCDCF, the number of tag clusters k and the dimensionality of the latent factors d are both set to 10, because experiments on the validation set reveal that this combination of parameters achieves the best performance. The regularization parameter λ in Equation 3.5 is set to 0.01 after tuning on the validation set. The selection of the two trade-off parameters α and β is further discussed in Section 3.4.2.
For each dataset, 5-fold cross validation is conducted, and the averaged result
is reported as the final result. All the comparisons are evaluated by Mean Absolute
Error (MAE) and Root Mean Square Error (RMSE), which are widely applied to
measure the performance of rating prediction task. The definitions of MAE and
RMSE are shown below:
MAE = (Σ_{i,j} |r̂_{ij} − r_{ij}|) / N   (3.11)

RMSE = √( Σ_{i,j} (r̂_{ij} − r_{ij})² / N )   (3.12)

where r̂_{ij} denotes the predicted rating of user i on item j, r_{ij} is the real rating, and N is the total number of test ratings.
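For concreteness, a small NumPy helper implementing Equations (3.11)-(3.12) on a flat list of test ratings:

```python
import numpy as np

def mae_rmse(pred, true):
    """Eqs. (3.11)-(3.12): rating-prediction error over N test ratings."""
    err = np.asarray(pred, dtype=float) - np.asarray(true, dtype=float)
    mae = np.mean(np.abs(err))           # mean absolute error
    rmse = np.sqrt(np.mean(err ** 2))    # root mean square error
    return mae, rmse

mae, rmse = mae_rmse([3.0, 4.5], [3.5, 4.5])
# MAE = 0.25, RMSE ≈ 0.354: RMSE penalizes large errors more heavily.
```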
3.4.2 Impact of parameters
In this subsection, experiments are conducted to investigate the impact of the two trade-off parameters α and β in ETagCDCF, which respectively control the contributions of the cross-domain user-to-user and item-to-item similarities to the objective function in Equation 3.5.
The same strategy as in (Shi et al., 2011) is adopted to tune these parameters on the validation set. First, β = 0 is kept fixed and the value of α is varied within the range [0.0001, 0.001, 0.01, 0.1] to check the resulting MAE and RMSE. These results are shown in Figure 3.2. Based on the results, α is set to 0.01 as it achieves the lowest error on both the LT and ML datasets. Next, α = 0.01 is kept fixed and the value of β is varied within the range [0.0001, 0.001, 0.01, 0.1] to check the variation of MAE and RMSE. The corresponding results are also shown in Figure 3.2. According to the above results, α = 0.01 and β = 0.01 are set as the optimal values.
3.4.3 Performance comparison
Several single-domain and cross-domain recommendation approaches are chosen as baselines in the experiments; they are described as follows:
Fig. 3.2 MAE and RMSE variations via changing α and β
UCF (Herlocker et al., 1999): User-based collaborative filtering is a conventional memory-based single-domain recommendation approach. It looks for users who share similar tastes with the active user to calculate a prediction for that user. The key challenge is to compute similarities between all pairs of users. In our implementation, the similarity is computed by the Pearson correlation coefficient and the neighborhood size is set to 50.
ICF (Sarwar et al., 2001): Item-based collaborative filtering is a model-based single-domain recommendation approach. It generates recommendations for a user by finding items that are similar to the items the user has liked before. Since the relationships between items are relatively static, the item-item similarity model does not have to be rebuilt often; as a result, it reduces computation and usually performs better than UCF on sparse datasets. In our implementation, the adjusted cosine similarity is adopted to compute item similarities.
SVD (Sarwar et al., 2000): Singular Value Decomposition is another well-known model-based single-domain recommendation approach, which relies on a matrix factorization technique and decomposes a rating matrix into three matrices while reducing the dimensionality of the product space. It maps users and items into a low-dimensional space and discovers the intrinsic relationships among the latent feature vectors of users and items for making recommendations. In our implementation, the dimensionality of the latent feature space is set to 10.
TagCDCF (Shi et al., 2011): Tag-induced cross-domain collaborative filtering
is a recently proposed tag-based cross-domain recommendation approach, which
utilizes overlapping tags to connect cross-domain users and cross-domain items so
that knowledge can be transferred between domains through those similar users
and items.
The performance of ETagCDCF and the baselines is shown in Table 3.2 and Table 3.3. A smaller value of MAE or RMSE means better performance. The experimental results reveal several interesting findings, which are discussed in the following.
To study whether the knowledge obtained from the auxiliary domain is useful for improving recommendation performance in the target domain, this study compares the results of the cross-domain recommendation models (i.e., TagCDCF, ETagCDCF) with the single-domain recommendation benchmarks (i.e., UCF, ICF, SVD). Based on the results, specifically when LT is set as the source domain and ML as the target domain, TagCDCF and ETagCDCF significantly outperform all the single-domain baselines in both MAE and RMSE. For example, the improvement achieved by ETagCDCF is up to 20.97% in MAE and 20.88% in RMSE when compared with ICF. The poor performance of ICF on the ML dataset can be explained by the characteristics of the dataset: the ratings made by the 946 users are widely distributed among the 857 movies, which makes it difficult to collect enough co-rated movies as the foundation for computing similarities between pairs of items. Inaccurate item-item similarities in ICF lead to incorrect recommendations. A similar analysis also applies to UCF. As a representative approach for single-domain recommendation, SVD overcomes the above problem and performs better than UCF and ICF, but it still fails to outperform the cross-domain recommendation approaches (i.e., TagCDCF, ETagCDCF). This indicates that the knowledge learned from the LT dataset (source domain) indeed helps to facilitate recommendation in the ML dataset (target domain). In the opposite situation, with ML as the source domain and LT as the target domain, the cross-domain recommendation algorithms are
Table 3.2 MAE comparison with other baselines (mean ± std)

Models      ML                      LT
UCF         0.901994 (0.006238)     0.762814 (0.023162)
ICF         0.888476 (0.004812)     0.861590 (0.004147)
SVD         0.899647 (0.005595)     0.763140 (0.011803)
TagCDCF     0.779907 (0.005225)     0.891171 (0.017739)
ETagCDCF    0.702188 (0.003401)     0.818152 (0.017481)

Table 3.3 RMSE comparison with other baselines (mean ± std)

Models      ML                      LT
UCF         1.158626 (0.005768)     1.030641 (0.021892)
ICF         1.138212 (0.016996)     1.110459 (0.026747)
SVD         1.102908 (0.005104)     0.962341 (0.015199)
TagCDCF     1.027675 (0.007242)     1.137431 (0.022205)
ETagCDCF    0.900501 (0.005620)     1.035668 (0.028353)
inferior to the single-domain recommendation algorithms. Specifically, SVD obtains the best results in both MAE and RMSE. There are two main reasons for this phenomenon. First, only 2,779 ratings are available in the LT dataset. The collected rating data do not reach a sufficient scale for matrix factorization to work well, and since matrix factorization is a fundamental building block of ETagCDCF, this explains the poor performance of ETagCDCF. In the implementation of SVD, however, the same pre-processing as in (Sarwar et al., 2000) is applied to fill the sparse rating matrix, so that data sparseness does not cause a sharp drop in performance. Second, there are only 256 tag assignments (138 unique tags) in the ML dataset. Such limited tagging data do not provide enough information for inferring accurate cross-domain similarities, which in turn misleads the knowledge transfer between the domains.
To identify the role of social tags in improving recommendation performance, the performance of SVD is compared with that of TagCDCF and ETagCDCF, because they are all built on the matrix factorization model. The only difference lies in the fact that SVD relies solely on ratings to make recommendations, while TagCDCF and ETagCDCF also integrate tag information into their recommendation models. The results show that both ETagCDCF and TagCDCF perform better than SVD under some conditions. For example, comparing SVD with ETagCDCF on the ML dataset, the improvement achieved by ETagCDCF is up to 21.95% in MAE and 18.35% in RMSE. Based on these results, we can conclude that social tags indeed offer additional information beyond ratings to the factorization.
To check the effectiveness of the domain-specific tags in linking heterogeneous
domains when there is only a limited number of overlapping tags, ETagCDCF
is compared with its counterpart TagCDCF. ETagCDCF achieves better results than TagCDCF on both the LT and ML datasets. The improvement is up to 8.19% in MAE and 8.95% in RMSE on LT. Similarly, on ML the improvement is up to 9.97% in MAE and 12.37% in RMSE. In the experimental setting, there are only 18 overlapping tags between the source and target domains. The weak connection between the domains established by such limited overlapping tags results in the poor performance of TagCDCF; this observation has also been reported in (Shi et al., 2011). In contrast, ETagCDCF is developed to utilize abundant domain-specific tags to bridge the two domains in this situation. The significant improvement achieved by ETagCDCF clearly supports the feasibility of linking heterogeneous domains using domain-specific tags when only a limited number of overlapping tags is available.
3.5 Summary
Compared with the limited overlapping tags shared by both domains, domain-specific tags are abundant and contain information unique to the individual domains. This chapter has proposed a novel tag-based cross-domain collaborative filtering model, which exploits abundant domain-specific tags to bridge disjoint domains. To reduce the distance between non-identical tags, this chapter adopts spectral clustering with a tag co-occurrence pattern to group domain-specific tags. As a result, a new tag representation in the form of tag clusters can be learned to model user and item profiles across domains. With the derived tag representation, more accurate cross-domain user-to-user and item-to-item similarities can be calculated and integrated into the joint matrix factorization process to guide knowledge transfer.
The experimental results demonstrate that ETagCDCF is capable of establishing
a strong domain link to help transfer more knowledge between domains.
Chapter 4
Exploiting Tag-induced
Structural Information for
Cross-domain Recommendation
4.1 Introduction
The explicit domain link learned in Chapter 3 by utilizing abundant domain-specific tags is useful for increasing the correspondence between heterogeneous domains. However, ETagCDCF (see Chapter 3) has some limitations. First, the similarity between domain-specific tags is measured by their co-occurrence relationships with overlapping tags. Isolated domain-specific tags that never co-occur with overlapping tags cannot be grouped accurately, and thus the derived tag clusters add noise to the learning of the inter-domain correlation. Second, although the number of overlapping tags is limited, overlapping tags are the most intuitive features for building a weak inter-domain correlation that aligns domain heterogeneity to some extent. Therefore, it is desirable to exploit the complementary roles of overlapping tags and domain-specific tags in establishing a tighter inter-domain correlation.
Furthermore, according to the principle of domain adaptation, not only the inter-domain similarity but also the intra-domain similarity needs to be maximized to promote knowledge transfer. In single-domain recommendation, learning the intra-domain similarity between users or between items from tag distributions has attracted much attention, and many state-of-the-art techniques have been proposed for improving item recommendation (De Gemmis et al., 2008; Zhao et al., 2008; Zhen et al., 2009). However, the intra-domain correlation, represented in the form of intra-domain similarity, has not yet been considered in the development of cross-domain recommender systems.
In this chapter, we consider the challenge of integrating structural knowledge inferred from tags, including both intra- and inter-domain correlations, into the recommendation framework. Specifically, users and items are first profiled with overlapping tags, and a basic inter-domain correlation is built in the form of cross-domain similarities (i.e., cross-domain user-to-user similarity and cross-domain item-to-item similarity). On the basis of the correspondence established by overlapping tags, further connections between the involved domains are added by clustering domain-specific tags, and the derived tag clusters are used as a new representation to refine the cross-domain similarities. Finally, tagging information is also exploited to compute the intra-domain similarities between users and between items, which are linked to build a compact intra-domain correlation. By adopting both inter- and intra-domain correlations as structural knowledge to regularize joint matrix factorization, a complete tag-induced cross-domain recommendation model, called CTagCDR, is proposed in this chapter to fully explore the complementary roles of tags in promoting knowledge transfer. The experimental results demonstrate that CTagCDR performs well in both rating prediction and item recommendation tasks.
This chapter is organized as follows: Section 4.2 introduces some basic notations
and states the problem formally; Section 4.3 introduces details of the proposed
model and presents the parameter estimation process; Section 4.5 evaluates the
proposed model through a series of experiments; and lastly Section 4.6 summarizes
this chapter.
4.2 Notations
Before describing the proposed model CTagCDR, the notations commonly used in this chapter need to be explained. Without loss of generality, two domains are considered, although CTagCDR can easily be generalized to multiple domains. In this chapter, boldface uppercase letters, such as A, denote matrices. The i-th row and j-th column of matrix A are denoted by A_{i*} and A_{*j}, respectively. The (i, j)-th entry of matrix A is denoted by A_{ij}.
Given a sparse target domain D1 and a dense source domain D2, suppose that the π-th domain (π = 1, 2) has a set of n_π users U^π, m_π items V^π and l_π tags T^π. The tags in the π-th domain are divided into two parts according to their distributions across domains: the shared tags T_c and the domain-specific tags T^π_s. The shared tags are the domain-independent tags that appear in both D1 and D2, denoted by T_c = {t^c_m | t^c_m ∈ T^1 ∩ T^2, 1 ≤ m ≤ l_c}. The domain-specific tags are the domain-dependent tags that are exclusive to an individual domain, denoted by T^π_s = {t^s_n | t^s_n ∈ T^π − T_c, 1 ≤ n ≤ l_π − l_c}.
Let R^π ∈ R^{n_π×m_π} be the sparse user-item interaction matrix for the users U^π and items V^π, in which r^π_{ij} represents the rating score given by user i to item j in the π-th domain. To mark the observable values in the rating matrix R^π, an indicator matrix I^{R^π} is used, where I^{R^π}_{ij} = 1 if user i rated item j and I^{R^π}_{ij} = 0 otherwise.
By analyzing the tag assignment triplets T^π_{ijk}, each represented in the form {u^π_i, v^π_j, t^π_k} and denoting the action that user i assigned tag k to item j, different kinds of user/item tagging matrices are generated. Specifically, the symbols X^π_u, Y^π_u, Z^π_u are used to denote the user tagging matrices built from the shared tags, the domain-specific tags and all the tags in domain D^π, respectively. Similarly, the symbols X^π_v, Y^π_v, Z^π_v denote the corresponding item tagging matrices. Frequently used notations and their descriptions are summarized in Table 4.1.
4.3 Complete Tag Induced Cross-domain Recommendation
In cross-domain recommendation tasks, where neither users nor items overlap,
the challenge can be formulated by employing the relationships between users
and tags, and between items and tags, to build a reliable domain connection, so
that knowledge from source domain can be confidently transferred to improve
prediction in the target domain.
Table 4.1 Notations and corresponding descriptions used in Chapter 4

Symbols          Descriptions
π                domain index, π = 1, 2
D^π              domain π
n_π, m_π, l_π    number of users, items, tags in D^π, respectively
l_c, l^π_s       number of shared tags, domain-specific tags in D^π, respectively
f                number of latent factors
U^π              set of users in D^π, U^π = {u^π_i | 1 ≤ i ≤ n_π}
V^π              set of items in D^π, V^π = {v^π_j | 1 ≤ j ≤ m_π}
T^π              set of tags in D^π, T^π = {t^π_k | 1 ≤ k ≤ l_π}
T_c              set of shared tags, T_c = {t^c_m | 1 ≤ m ≤ l_c}
T^π_d            set of domain-specific tags in D^π, T^π_d = {t^s_n | 1 ≤ n ≤ l^π_s}
T^π_{ijk}        tag assignment made by user i on item j with tag k in D^π
R^π              n_π × m_π rating matrix in D^π
U^π              n_π × f latent feature matrix of users in D^π
V^π              m_π × f latent feature matrix of items in D^π
X^π_u            n_π × l_c user tagging matrix based on shared tags
Y^π_u            n_π × (l_π − l_c) user tagging matrix based on domain-specific tags in D^π
Z^π_u            n_π × l_π user tagging matrix based on complete tags in D^π
X^π_v            m_π × l_c item tagging matrix based on shared tags
Y^π_v            m_π × (l_π − l_c) item tagging matrix based on domain-specific tags in D^π
Z^π_v            m_π × l_π item tagging matrix based on complete tags in D^π
I^A              indicator matrix for matrix A
A_{i*}           the i-th row of matrix A
A_{*j}           the j-th column of matrix A
‖A‖              Frobenius norm of matrix A
To exploit the full potential of tagging information, the proposed CTagCDR model aims to infer a strong inter-domain correlation and a compact intra-domain correlation from the tagging data. Specifically, CTagCDR is composed of the following four major steps:
Step 1: Building basic inter-domain correlations using shared tags;
Step 2: Enhancing inter-domain correlations using domain-specific tag clusters;
Step 3: Inferring intra-domain correlations from tags in individual domains;
Step 4: Aggregating and integrating inter- and intra-domain structural knowledge.
The workflow of the CTagCDR model is illustrated in Figure 4.1.
4.3.1 Step 1: Building basic inter-domain correlations using shared tags
In traditional CF approaches (Konstan et al., 1997; Sarwar et al., 2001), a user is
represented by a vector defined over the entire item space. This reflects a user’s
preference for items that s/he is interested in. Similarly, an item is represented by
a vector defined over the entire user space, which indicates the users that have
shown an interest in this item. Due to the heterogeneity of disjoint domains, this
way of modelling fails to characterize cross-domain users and items in a unified
way. Considering the property that social tags can encode both user preferences
and item attributes, (Shi et al., 2011) proposes to break {user, item, tag} ternary
relationships into two binary relationships: {user, tag} and {item, tag}, and build
user and item profiles through shared tags. As a result, cross-domain users and
items could be mapped to the same space built by shared tags for comparison.
Fig. 4.1 Workflow and components of CTagCDR model.
Nevertheless, only binary information is taken into account when constructing the user and item tagging matrices in (Shi et al., 2011), which loses the ability to distinguish the different tag distributions on the user and item sides. To fully exploit the quantitative information encoded in the shared tags T_c, TF-IDF weighting is applied to build a user tagging matrix X^π_u. In particular, the (i, m)-th element is defined as the tf-idf value between user i and shared tag m, as shown below,
[X^π_u]_{im} = tf_u(i, m) × log₂(n_π / df_u(m)) if user i used tag m, and [X^π_u]_{im} = 0 otherwise.   (4.1)

where tf_u(i, m) denotes the normalized frequency of tag m in user i's tagging history over all items, and df_u(m) denotes the number of users who have used tag m. Note that if user i has never used tag m, then [X^π_u]_{im} = 0.
Similarly, the distribution of shared tags over items can be modelled and represented in an item tagging matrix X^π_v. The (j, m)-th element is defined as follows,

[X^π_v]_{jm} = tf_v(j, m) × log₂(m_π / df_v(m)) if item j was labelled with tag m, and [X^π_v]_{jm} = 0 otherwise.   (4.2)
where tf_v(j, m) denotes the normalized occurrence frequency of tag m on item j over all users, and df_v(m) denotes the number of items that have been attached to the shared tag m. If item j has never been attached to shared tag m, then [X^π_v]_{jm} = 0.
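Equation (4.1) can be sketched in plain Python as follows; the triplet-list input format and the dict-of-dicts output are illustrative assumptions, and the item-side matrix of Eq. (4.2) would be built symmetrically.

```python
import math
from collections import Counter

def user_tagging_matrix(assignments, users, shared_tags):
    """Sketch of Eq. (4.1): TF-IDF user tagging matrix X_u.

    assignments : list of (user, item, tag) triplets T^pi_ijk.
    Returns X[u][t] = tf_u(u, t) * log2(n_pi / df_u(t)) for shared tags.
    """
    n = len(users)
    # tf: tag counts per user, later normalized by the user's total
    # number of tag assignments.
    per_user = {u: Counter() for u in users}
    for u, _, t in assignments:
        per_user[u][t] += 1
    # df: number of distinct users who used each tag.
    df = Counter()
    for u in users:
        for t in per_user[u]:
            df[t] += 1
    X = {}
    for u in users:
        total = sum(per_user[u].values())
        if total == 0:          # user with no tagging history
            X[u] = {}
            continue
        X[u] = {t: (per_user[u][t] / total) * math.log2(n / df[t])
                for t in shared_tags if per_user[u][t] > 0}
    return X

# Toy example: user 'a' used tag 'x' twice out of three assignments,
# and 'x' was used by 1 of 2 users, so X['a']['x'] = (2/3) * log2(2).
triplets = [('a', 1, 'x'), ('a', 2, 'x'), ('a', 1, 'y'), ('b', 1, 'y')]
X = user_tagging_matrix(triplets, ['a', 'b'], {'x', 'y'})
```

A tag used by every user gets idf = log₂(n/n) = 0, so fully generic shared tags contribute nothing to the profile, which is exactly the discriminative weighting the TF-IDF scheme is meant to provide.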
Once the user and item profiles are generated from the shared tags and vectorized, the cross-domain user and item similarities can be computed with different similarity metrics. For simplicity, cosine similarity is used to compute the cross-domain user-to-user similarity matrix S^u ∈ R^{n_1×n_2} and the item-to-item similarity matrix S^v ∈ R^{m_1×m_2} as:

S^u_{ip} = Σ_{d=1}^{l_c} (X^1_u)_{id} (X^2_u)_{pd} / ( √(Σ_{d=1}^{l_c} (X^1_u)_{id}²) × √(Σ_{d=1}^{l_c} (X^2_u)_{pd}²) )

S^v_{jq} = Σ_{d=1}^{l_c} (X^1_v)_{jd} (X^2_v)_{qd} / ( √(Σ_{d=1}^{l_c} (X^1_v)_{jd}²) × √(Σ_{d=1}^{l_c} (X^2_v)_{qd}²) )   (4.3)
These similarity matrices encode the information of shared tags and act as basic
inter-domain correlation between the source and target domains. This process is
presented in Algorithm 4.1.
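Equation (4.3) can be computed in vectorized form by row-normalizing the two profile matrices; the ε guard against empty profiles is an added assumption beyond the thesis's formula.

```python
import numpy as np

def basic_inter_domain_similarity(X1, X2, eps=1e-12):
    """Eq. (4.3): row-wise cosine similarity between two TF-IDF profile
    matrices built over the lc shared tags (works for users or items).

    X1 : (n1, lc) profiles in the target domain D1.
    X2 : (n2, lc) profiles in the source domain D2.
    Returns S of shape (n1, n2) with S[i, p] = cos(X1[i], X2[p]).
    """
    n1 = X1 / (np.linalg.norm(X1, axis=1, keepdims=True) + eps)
    n2 = X2 / (np.linalg.norm(X2, axis=1, keepdims=True) + eps)
    return n1 @ n2.T

# Toy check: identical profiles give similarity 1, orthogonal give 0.
X1 = np.array([[1.0, 0.0], [0.0, 1.0]])
X2 = np.array([[1.0, 0.0]])
S = basic_inter_domain_similarity(X1, X2)
```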
4.3.2 Step 2: Enhancing inter-domain correlations using
domain-specific tag clusters
Shared tags help to address domain heterogeneity when modelling cross-domain users and items. However, it is usually difficult to collect enough shared tags for disjoint domains, and the resulting loose coupling between the domains leads to inaccurate predictions. Furthermore, it is wasteful to abandon domain-specific tags, which are abundant and able to reflect the intrinsic properties of the individual domains. By using domain-specific tags to connect the cross-domain users and
Algorithm 4.1 Basic inter-domain correlation construction
Input: tag assignment triplets T^π_tri (π = 1, 2).
Output: basic cross-domain similarity matrices S^u_b and S^v_b.
1: Get the shared tag set T_c = T^1 ∩ T^2.
2: Initialize X^π_u ∈ R^{n_π×l_c} and X^π_v ∈ R^{m_π×l_c} with zeros.
3: for π = 1, 2 do
4:   for i = 1, 2, ..., n_π do
5:     for m = 1, 2, ..., l_c do
6:       Given the tag assignment triplets T^π_tri, fill element [X^π_u]_{im} in X^π_u by Eq. (4.1).
7:     end for
8:   end for
9:   for j = 1, 2, ..., m_π do
10:     for m = 1, 2, ..., l_c do
11:       Given the tag assignment triplets T^π_tri, fill element [X^π_v]_{jm} in X^π_v by Eq. (4.2).
12:     end for
13:   end for
14: end for
15: Compute S^u_b and S^v_b by Eq. (4.3)
items, more domain linkages can be established to enhance the inter-domain correlation.
However, a number of obstacles hinder the application of domain-specific tags. First, it is not trivial to employ heterogeneous domain-specific tags as features to link different domains, even though assembling tags from different domains into a single pool helps to address domain heterogeneity. In addition, the pairwise interactions, such as those between users and tags and between items and tags, are generally sparse due to the power-law phenomenon; modelling them directly poses two further problems: poor scalability and heavy computational cost. Second, tags are arbitrary words generated by users from an uncontrolled vocabulary, so ambiguity and redundancy exist in the tagging data; the recommendation performance will be undermined if this problem is not addressed. In this context, tag clustering provides a natural solution to these challenges. By clustering domain-specific tags, the derived tag clusters reduce the ambiguity and redundancy in the data. Furthermore, the clusters serve as high-level, compact representations of the domain-specific tags, which provide a way to establish unified user and item profiles across different domains.
To cluster the diverse domain-specific tags, a tag co-occurrence pattern is designed
to consider the relationships between the shared and domain-specific tags (Pan
et al., 2010a). Specifically, if domain-specific tags from different domains occur
with the same shared tag in the user or item tagging histories, they will be grouped
into the same tag cluster. To avoid focusing on the tags that are associated
with the most users and items, domain-specific tags are filtered based on information
entropy instead of directly selecting those with the highest usage frequency. Given a
shared tag m, the way to measure the importance of a domain-specific tag n on the
user side is defined as follows:
\theta(m, n) =
\begin{cases}
-\dfrac{\alpha(m, n)}{\beta(m, n)} \log_2 \dfrac{\alpha(m, n)}{\beta(m, n)}, & \text{if } \alpha(m, n) \neq 0 \\
0, & \text{otherwise}
\end{cases}
\qquad (4.4)
where α(m, n) denotes the number of users who have used both the shared tag m and
the domain-specific tag n in their tagging histories, and β(m, n) denotes the number
of users who used either the shared tag m or the domain-specific tag n to describe
their interests. If θ(m, n) > θ(m), we keep the domain-specific tag n; otherwise
we abandon it. Here θ(m) denotes the filtering threshold with regard to the shared
tag m, which is set as the average importance over all domain-specific tags in
our experiments. The counterpart filtering on the item side is obtained in a
similar manner.
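The filtering rule above can be sketched in a few lines; the counts and tag names below are hypothetical, not drawn from the thesis datasets:

```python
import math

def theta(alpha, beta):
    """Entropy-style importance of a domain-specific tag w.r.t. a shared tag (Eq. 4.4)."""
    if alpha == 0:
        return 0.0
    p = alpha / beta
    return -p * math.log2(p)

# For one shared tag m: alpha = co-usage count, beta = either-usage count (toy values).
pairs = {"tag_a": (5, 10), "tag_b": (0, 7), "tag_c": (9, 10)}
scores = {n: theta(a, b) for n, (a, b) in pairs.items()}
threshold = sum(scores.values()) / len(scores)   # average importance as theta(m)
kept = [n for n, s in scores.items() if s > threshold]
print(scores["tag_a"])  # -0.5 * log2(0.5) = 0.5
```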
Filtering the user-tag relationship results in a tripartite graph Gu =
{V, E}, as shown in Figure 4.2, which represents the relationship between the domain-
specific tags and the shared tags, where V is the set of nodes in the graph
and E denotes the set of undirected edges. Let lu denote the number of filtered
domain-specific tags. There are then two types of nodes in V: lu nodes for
the domain-specific tags and lc nodes for the shared tags. Note that an edge in E
only exists between nodes of different types, i.e. between a shared tag m and
a domain-specific tag n, and the edge weight indicates the similarity of the
connected nodes. Taking the nodes m and n in Figure 4.2 as an example, the edge
Fig. 4.2 Example of the tag tripartite graph constructed from the user-tag relationship. Red squares denote shared tags present in both domains, while the green triangles and blue circles denote the filtered domain-specific tags from the source and target domains, respectively. The edge weight reflects the similarity between the connected tags.
weight is set as the Jaccard similarity:

sim(m, n) = \frac{\alpha(m, n)}{\beta(m, n)}

Then we can define a (l_u + l_c) \times (l_u + l_c) affinity matrix A_u for this graph:

A_u = \begin{bmatrix} I_{l_u \times l_u} & B \\ B^\top & I_{l_c \times l_c} \end{bmatrix}

where B is the l_u \times l_c connectivity matrix whose elements denote the similarity
between the filtered domain-specific tags and the shared tags, and I denotes the
identity matrix. Similarly, we can derive a tripartite graph Gv and an affinity
matrix Av from the item side.
Once we have an affinity matrix A_u (A_v) representing the tag co-occurrence
pattern, any clustering technique that takes a similarity measure as its input can
be applied. Specifically, Affinity Propagation (Frey and Dueck, 2007) is adopted
as our clustering method, since it automatically determines the number of clusters
from the data provided, which saves us from tuning this parameter during
validation. The k_u (k_v) tag clusters generated from the user-tag (item-tag)
relation are used as new features to profile cross-domain users (items), so that
users and items from different domains are mapped to the same space spanned by
the domain-specific tag clusters.
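Since Affinity Propagation accepts a precomputed similarity matrix, the clustering step can be sketched with scikit-learn (one common implementation; the affinity values below are hypothetical toy data):

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

# Toy affinity matrix A_u over (l_u + l_c) = 4 tags: two well-separated pairs.
A = np.array([
    [1.0, 0.8, 0.1, 0.0],
    [0.8, 1.0, 0.0, 0.1],
    [0.1, 0.0, 1.0, 0.7],
    [0.0, 0.1, 0.7, 1.0],
])
# affinity="precomputed" takes similarities directly; the number of clusters
# k_u is chosen automatically, as noted in the text.
ap = AffinityPropagation(affinity="precomputed", random_state=0)
labels = ap.fit_predict(A)   # one cluster label per tag
print(labels.shape)          # (4,)
```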
In the new subspace, the user and item vectors are aligned and used to
compute new cross-domain user-to-user and item-to-item similarity matrices,
whose construction takes only the domain-specific tags into account. In this way,
we can separately study the impact of the domain-specific tags in linking
different domains, and exploit their encoded information to regularize matrix
factorization. We describe this process in Algorithm 4.2.
4.3.3 Step 3: Inferring intra-domain correlations from tags
in individual domains
Existing tag-based CDCF approaches (Fang et al., 2015; Shi et al., 2011, 2013a)
mainly focus on using tags directly as aligned features to build a bridge between
different domains. This way of modelling helps to build an inter-domain connection
for knowledge transfer. However, their models ignore adding constraints to each
Algorithm 4.2 Complementary inter-domain correlation construction
Input: Tag assignment triplets T^π_tri (π = 1, 2).
Output: Complementary cross-domain similarity matrices S^u_c and S^v_c.
1: Combine the original domain-specific tags from the source and target domains: T_d = T^1_d + T^2_d.
2: Filter T_d by Eq. (4.4) to get l_u domain-specific tags T^u_d based on the user-tag relationship; similarly, filter T_d again to get l_v domain-specific tags T^v_d based on the item-tag relationship.
3: Based on the filtered domain-specific tags T^u_d and T^v_d, build tagging matrices Y^π_u and Y^π_v using Eqs. (4.1) and (4.2), respectively.
4: Construct affinity matrices A_u and A_v as explained in Section 4.3.2.
5: Apply Affinity Propagation on A_u (A_v) to get k_u (k_v) tag clusters, then form the tag membership matrices M_u ∈ R^{(l_u+l_c)×k_u} (M_v ∈ R^{(l_v+l_c)×k_v}), where M_u and M_v take only binary values {0, 1} and each row contains exactly one '1'.
6: for π = 1, 2 do
7:   Y^π_u ← Y^π_u × [M_u]_{[1:l_u,∗]};  Y^π_v ← Y^π_v × [M_v]_{[1:l_v,∗]}
8: end for
9: Given Y^π_u and Y^π_v, compute S^u_c and S^v_c by Eq. (4.3).
user or item involved in the domain. The resulting loose internal coupling of each
individual domain will inevitably have side effects on the knowledge transfer.
Moreover, the implicit user and item relationships within an individual domain
can also be elicited from tagging data and exploited to add valuable information
that improves recommendation performance.
Following this thread, we adopt the idea of TagiCoFi (Zhen et al., 2009),
which employs user similarities extracted from user tagging histories to regularize
the matrix factorization procedure. Specifically, it adds a regularization
term tr(UᵀLU) to the objective function of the PMF model, where tr(·) denotes
the trace of a matrix, L = D − S is the Laplacian matrix, S is the
tag-based user similarity matrix, and D is a diagonal matrix whose diagonal
elements are D_{ii} = ∑_j S_{ij}. The tag-based user similarity matrix S can be
computed using equations (5-6) in (Zhen et al., 2009). Due to space limitations,
we omit the details here.
The regularization term drives the latent factors of the users with similar
tagging behaviors to be similar as well. Such an extension can be considered as
adding an intra-domain correlation from the perspective of users. Similarly, we
can explore the tagging histories of items within the individual domain and define
another type of intra-domain correlation in the context of items.
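The regularizer tr(UᵀLU) equals ½ Σ_{i,j} S_{ij} ‖U_{i∗} − U_{j∗}‖², so it grows when users with similar tagging behaviour have distant latent factors. A toy illustration (hypothetical values, not the thesis data):

```python
import numpy as np

def laplacian_reg(U, S):
    """tr(Uᵀ L U) with L = D − S: larger when similar users' factors drift apart."""
    D = np.diag(S.sum(axis=1))
    L = D - S
    return np.trace(U.T @ L @ U)

S = np.array([[0.0, 1.0], [1.0, 0.0]])       # two users, fully similar tagging
U_close = np.array([[1.0, 0.0], [1.0, 0.0]]) # identical latent factors
U_far   = np.array([[1.0, 0.0], [0.0, 1.0]]) # orthogonal latent factors
print(laplacian_reg(U_close, S), laplacian_reg(U_far, S))  # 0.0 2.0
```

Minimizing the objective therefore pulls the factors of similar users together, which is the "internal control" the text describes.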
Since in our problem we have two disjoint domains, to add an internal control
for each domain and keep knowledge flowing among similar users or items, we add
regularizations from both user and item perspectives by exploiting intra-domain
user and item tagging histories. The inferred intra-domain correlations are added
as regularization terms in Eq. 4.6.
4.3.4 Step 4: Aggregation and integration of inter- and
intra-domain knowledge
The roles of shared tags and domain-specific tags have been studied separately
in linking different domains. Both types of tags have advantages and
disadvantages: shared tags serve as aligned, domain-independent features but are
often insufficient in quantity, while clustering diverse domain-specific tags
accurately is nontrivial and inevitably introduces noise into the similarity
calculations. As a result, we propose aggregating their contributions to fully
exploit the complementary roles of shared and domain-specific tags in transferring
knowledge. Therefore, we define the cross-domain user-to-user and item-to-item
similarities as:
S^u = I_{S^u_b} \circ S^u_b + \left[ \mathbf{1} - I_{S^u_b} \right] \circ S^u_c
S^v = I_{S^v_b} \circ S^v_b + \left[ \mathbf{1} - I_{S^v_b} \right] \circ S^v_c
\qquad (4.5)
where ◦ denotes element-wise multiplication. By assembling the cross-domain
similarities generated with shared and domain-specific tags, more user and item
connections between the domains are built. In addition, we also integrate
the intra-domain correlations inferred from the user and item tagging histories
of the individual domains, so that knowledge is transferred not only
between domains but also within them.
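Reading I_{S^u_b} as the indicator of entries already covered by shared tags (our interpretation of Eq. (4.5)), the aggregation falls back to the cluster-based similarity wherever the shared-tag similarity is empty. A toy sketch:

```python
import numpy as np

# Toy basic (shared-tag) and complementary (cluster-based) similarities.
Su_b = np.array([[0.9, 0.0], [0.0, 0.4]])
Su_c = np.array([[0.5, 0.7], [0.6, 0.3]])
I_b = (Su_b != 0).astype(float)         # indicator of shared-tag coverage
Su = I_b * Su_b + (1.0 - I_b) * Su_c    # Eq. (4.5): clusters fill the gaps
print(Su)   # [[0.9 0.7]
            #  [0.6 0.4]]
```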
To this end, we propose to extend joint matrix factorization by imposing
structural constraints on the inter- and intra-domain correlations. Specifically, we
minimize the following objective function:

f(R^1, R^2 \mid U^1, V^1, U^2, V^2)
  = \frac{1}{2} \sum_{i=1}^{n_1} \sum_{j=1}^{m_1} I^{R^1}_{ij} \big( R^1_{ij} - g(U^1_{i*} V^{1\top}_{j*}) \big)^2
  + \frac{1}{2} \sum_{p=1}^{n_2} \sum_{q=1}^{m_2} I^{R^2}_{pq} \big( R^2_{pq} - g(U^2_{p*} V^{2\top}_{q*}) \big)^2
  + \frac{\lambda_u}{2} \sum_{i=1}^{n_1} \sum_{p=1}^{n_2} I^{S^u}_{ip} \big( S^u_{ip} - g(U^1_{i*} U^{2\top}_{p*}) \big)^2
  + \frac{\lambda_v}{2} \sum_{j=1}^{m_1} \sum_{q=1}^{m_2} I^{S^v}_{jq} \big( S^v_{jq} - g(V^1_{j*} V^{2\top}_{q*}) \big)^2
  + \frac{\lambda_\alpha}{2} \big( \mathrm{tr}(U^{1\top} L^1_u U^1) + \mathrm{tr}(V^{1\top} L^1_v V^1) + \mathrm{tr}(U^{2\top} L^2_u U^2) + \mathrm{tr}(V^{2\top} L^2_v V^2) \big)
  + \frac{\lambda_\beta}{2} \big( \|U^1\|^2 + \|V^1\|^2 + \|U^2\|^2 + \|V^2\|^2 \big)
  \qquad (4.6)
where g(·) is a logistic normalization function, which is set as the sigmoid
function in our experiments, and λ_u, λ_v, λ_α, λ_β are hyper-parameters
controlling the inter-domain user similarity, the inter-domain item similarity,
the intra-domain similarity, and the regularization of the latent factors,
respectively. The details of how the parameters are tuned and the effects of
different values are presented in the following section.
To optimize the proposed model, we apply stochastic gradient descent to
update U^1, V^1, U^2, V^2 alternately. The derivative with respect to each
variable is computed
as follows:

\frac{\partial f}{\partial U^1_{i*}}
  = \sum_{j=1}^{m_1} I^{R^1}_{ij} \big( g(U^1_{i*} V^{1\top}_{j*}) - R^1_{ij} \big) g'(U^1_{i*} V^{1\top}_{j*}) V^1_{j*}
  + \lambda_u \sum_{p=1}^{n_2} I^{S^u}_{ip} \big( g(U^1_{i*} U^{2\top}_{p*}) - S^u_{ip} \big) g'(U^1_{i*} U^{2\top}_{p*}) U^2_{p*}
  + \lambda_\alpha (L^1_u U^1)_{i*} + \lambda_\beta U^1_{i*}
  \qquad (4.7)

\frac{\partial f}{\partial V^1_{j*}}
  = \sum_{i=1}^{n_1} I^{R^1}_{ij} \big( g(U^1_{i*} V^{1\top}_{j*}) - R^1_{ij} \big) g'(U^1_{i*} V^{1\top}_{j*}) U^1_{i*}
  + \lambda_v \sum_{q=1}^{m_2} I^{S^v}_{jq} \big( g(V^1_{j*} V^{2\top}_{q*}) - S^v_{jq} \big) g'(V^1_{j*} V^{2\top}_{q*}) V^2_{q*}
  + \lambda_\alpha (L^1_v V^1)_{j*} + \lambda_\beta V^1_{j*}
  \qquad (4.8)

\frac{\partial f}{\partial U^2_{p*}}
  = \sum_{q=1}^{m_2} I^{R^2}_{pq} \big( g(U^2_{p*} V^{2\top}_{q*}) - R^2_{pq} \big) g'(U^2_{p*} V^{2\top}_{q*}) V^2_{q*}
  + \lambda_u \sum_{i=1}^{n_1} I^{S^u}_{ip} \big( g(U^1_{i*} U^{2\top}_{p*}) - S^u_{ip} \big) g'(U^1_{i*} U^{2\top}_{p*}) U^1_{i*}
  + \lambda_\alpha (L^2_u U^2)_{p*} + \lambda_\beta U^2_{p*}
  \qquad (4.9)

\frac{\partial f}{\partial V^2_{q*}}
  = \sum_{p=1}^{n_2} I^{R^2}_{pq} \big( g(U^2_{p*} V^{2\top}_{q*}) - R^2_{pq} \big) g'(U^2_{p*} V^{2\top}_{q*}) U^2_{p*}
  + \lambda_v \sum_{j=1}^{m_1} I^{S^v}_{jq} \big( g(V^1_{j*} V^{2\top}_{q*}) - S^v_{jq} \big) g'(V^1_{j*} V^{2\top}_{q*}) V^1_{j*}
  + \lambda_\alpha (L^2_v V^2)_{q*} + \lambda_\beta V^2_{q*}
  \qquad (4.10)
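For intuition, a simplified full-batch sketch of the gradient of Eq. (4.7) with respect to U¹ (toy shapes and random data; the observed-entry indicators are taken as nonzero entries, and ratings are assumed normalized to [0, 1] so the sigmoid applies):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def grad_U1(U1, V1, U2, R1, Su, L1u, lam_u, lam_a, lam_b):
    """Dense, full-batch version of Eq. (4.7): gradient of f w.r.t. U1."""
    P = sigmoid(U1 @ V1.T)                       # predicted (normalized) ratings
    I_R = (R1 != 0).astype(float)                # observed-rating indicator
    g_rate = (I_R * (P - R1) * P * (1 - P)) @ V1 # rating term; P*(1-P) is g'
    Q = sigmoid(U1 @ U2.T)                       # predicted cross-domain similarity
    I_S = (Su != 0).astype(float)
    g_sim = lam_u * (I_S * (Q - Su) * Q * (1 - Q)) @ U2
    return g_rate + g_sim + lam_a * (L1u @ U1) + lam_b * U1

# Toy shapes: n1=3 users, m1=4 items, n2=2 target users, k=2 factors.
rng = np.random.default_rng(0)
U1, V1, U2 = rng.normal(size=(3, 2)), rng.normal(size=(4, 2)), rng.normal(size=(2, 2))
R1 = rng.uniform(0, 1, size=(3, 4)) * (rng.random((3, 4)) < 0.5)  # sparse ratings
Su = rng.uniform(0, 1, size=(3, 2))
L1u = np.eye(3)                                  # placeholder Laplacian
step = grad_U1(U1, V1, U2, R1, Su, L1u, 0.1, 0.1, 0.01)
U1 -= 0.05 * step                                # one gradient descent update
print(step.shape)  # (3, 2)
```

The thesis updates the four factor matrices alternately with stochastic (per-entry) steps; the dense version above only illustrates the shape of each term.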
After updating the latent factors of both users and items, we approximate the
target rating matrix R^1 by g(U^1 V^{1\top}) to verify the performance of our
proposed model.
4.4 Complexity analysis
We analyze the time complexity of each of the four major steps of the CTagCDR
model. Step 1 mainly takes O(n_1 n_2 l_c + m_1 m_2 l_c) time, the sum of computing
the cross-domain user-to-user and item-to-item similarity matrices, respectively.
For Step 2, suppose there are Ω_u (Ω_v) nonzero values in the affinity matrix
A_u (A_v); the complexity of applying Affinity Propagation on A_u and A_v is then
O(Ω_u² + Ω_v²). After obtaining k_u (k_v) tag clusters for modelling cross-domain
users (items), it takes an additional O(n_1 n_2 k_u + m_1 m_2 k_v) time to compute
the cross-domain similarity matrices based on the domain-specific tag clusters.
For Step 3, we model users and items with the tags used within each individual
domain and construct user and item similarity matrices for both the source and
target domains, which takes O(n_1² l_1 + n_2² l_2 + m_1² l_1 + m_2² l_2) time.
For Step 4, the core step of CTagCDR, updating the latent factors takes
O(Ω_{R^1} + Ω_{R^2} + Ω_{S^u} + Ω_{S^v} + Ω_{L^1_u} + Ω_{L^1_v} + Ω_{L^2_u} + Ω_{L^2_v})
time in total.
We find that most of the time is spent computing the similarity matrices.
However, on the one hand, parallel computing frameworks such as Spark¹ can be
used to speed up this computation. On the other hand, all the similarity matrices
can be pre-computed as inputs to the algorithm, since they are constructed only
once. Therefore, we believe CTagCDR can scale to large datasets.

¹https://spark.apache.org/
4.5 Experiments
In this section, we conduct a series of experiments to study the performance of
CTagCDR and to test the effectiveness of exploiting both shared and domain-specific
tags to build bridges between disjoint domains for knowledge transfer. We first
describe the datasets used in these experiments, and then explain the experimental
setting, including the benchmark methods and the metrics applied to evaluate the
performance of each approach. This is followed by experiments focused on setting
appropriate parameters for CTagCDR, especially the trade-off parameters
λ_u, λ_v and λ_α. We also study the impact of the number of latent factors and
of the ranking position on the recommendation performance before making an
overall comparison. Finally, we examine how our proposed method behaves under
different configurations of tag sparsity. Through these experiments, we aim to
answer the following questions:
Q1: How does CTagCDR perform on different datasets when compared to both
single and cross-domain recommendation methods?
Q2: How does the tag-induced inter-domain correlation improve recommendation
performance compared with methods that do not bridge different domains with
tags?
Q3: How effective are the domain-specific tags in promoting knowledge transfer?
Q4: How will CTagCDR behave in different tag sparsity configurations?
4.5.1 Datasets
To evaluate our proposed model fairly, we conduct thorough experiments on three
publicly available datasets: the MovieLens 10M dataset², the LibraryThing
dataset³ and the LastFM dataset⁴. All three datasets include information on
both user preferences and tagging, as required by our model.
MovieLens 10M (ML): A user-movie rating dataset containing over 10
million ratings (on a 0.5-5 scale) and 95,580 tag applications, applied to 10,681
movies by 71,567 users. In ML-10M, not every movie that has been rated by a user
has also been tagged with at least one distinct tag. In other words, some movies
only have a rating score without a tag assignment, and vice versa. Since we focus
on improving recommendation performance by taking tagging information into
account, we discard records that lack either type of information, resulting in
24,564 remaining ratings and tag assignments.
LibraryThing (LT): A user-book rating dataset containing over 700,000
ratings (on a 1-5 scale) and 2 million tag applications used by 7,564 users on
39,515 books. Each user gives both a rating score and a tag assignment to a book
in the LT dataset. We observed inconsistencies in the rating scores of the
original LT dataset, where the same user-book pair has multiple different rating
scores. To avoid these inconsistencies, we filtered duplicate user-book pairs and
kept only the first record from the original dataset. To preserve a moderate size
for evaluation, we then selected the top 24,564 records, in the original dataset
order, as our final dataset.

²http://www.grouplens.org/node/73
³http://www.macle.nl/tud/LT/
⁴http://grouplens.org/datasets/hetrec-2011/
Table 4.2 Statistics of datasets used in Chapter 4

                MovieLens-10M   LibraryThing   LastFM
# of users               2026            244     1524
# of items               5088          12809     6854
# of tags                9529           4596     7927
# of ratings            24564          24564    20665
rating ratio            0.24%          0.79%    0.17%
LastFM (FM): A user-artist listening-count dataset released at HetRec 2011,
containing social networking, tagging, and music artist listening information
from a set of 2K users of the Last.fm online music system. The same pruning
strategy was applied to the FM dataset to select user-artist pairs with both
listening counts and tag information. We normalize each user's listening counts
by the average over all the artists he/she has listened to. Unlike the two
aforementioned datasets with explicit rating scores, the FM dataset contains only
listening counts, which are treated as implicit feedback. We use this dataset in
our experiments to test the performance of our proposed model on a different form
of feedback. The statistics of the final datasets used in this chapter are listed
in Table 4.2.
4.5.2 Experiment Setup
4.5.2.1 Evaluation Methodology
We compared the performance of CTagCDR with the state-of-the-art single-domain
and cross-domain recommendation approaches listed below:
PMF (Mnih and Salakhutdinov, 2008): Probabilistic matrix factorization is a
popular basic matrix factorization method. It models user preferences as the dot
product of the latent factors of users and items. PMF is a state-of-the-art
single-domain recommendation model but exploits only rating data. We apply PMF
as a single-domain benchmark to evaluate the benefit of integrating tagging
information to improve recommendation performance.
TagiCoFi (Zhen et al., 2009): A tag-induced collaborative filtering method that
builds on the PMF model and captures user relationships from user tagging
behaviours to regularize matrix factorization. Compared to PMF, TagiCoFi exploits
both rating and tagging data in a single domain.
TagCDCF (Shi et al., 2011): Tag-induced cross-domain collaborative filtering
is a tag-based cross-domain recommendation approach. It exploits overlapping
tags to link disjoint domains; the relationship between two domains is encoded
in cross-domain user-to-user and item-to-item similarity matrices. By building a
domain connection with overlapping tags, useful knowledge can be transferred
from the source domain to the target domain. However, TagCDCF exploits
user/item-tag relations with only binary indicators. We apply TagCDCF as a
cross-domain benchmark to test our idea of exploiting domain-specific tags to
promote knowledge transfer.
GTagCDCF (Shi et al., 2013a): General tag-induced cross-domain collaborative
filtering improves TagCDCF by taking the tagging frequency into account. It
captures more of the information represented by tags and can handle multi-domain
cases. In addition, GTagCDCF does not rely on the computation of cross-domain
similarities. Like TagCDCF, GTagCDCF exploits only overlapping tags to connect
multiple domains.
TMT (Fang et al., 2015): Cross-domain recommendation via tag matrix transfer
is a recently proposed model that establishes a tag co-occurrence pattern from
the tag collections of both source and target domains, and transfers knowledge
across domains via the learned pattern. TMT is similar to our proposed model in
exploiting multiple types of tags, including both overlapping and domain-specific
tags. However, instead of assuming a shared common pattern between domains,
our model clusters domain-specific tags as features to profile users and items,
so that more implicit domain links can be discovered as bridges for transferring
knowledge.
4.5.2.2 Evaluation Metric
In datasets with explicit rating scores, such as ML-10m and LT, our task is to
predict a user preference score. To be consistent with existing work in evaluating
rating prediction, we adopt the mean absolute error (MAE) and root mean square
error (RMSE) as evaluation metrics, defined as:
MAE = \frac{\sum_{(i,j) \in T_E} |\hat{r}_{ij} - r_{ij}|}{|T_E|}, \qquad
RMSE = \sqrt{\frac{\sum_{(i,j) \in T_E} (\hat{r}_{ij} - r_{ij})^2}{|T_E|}}
\qquad (4.11)
where \hat{r}_{ij} denotes the predicted rating score that user i gives to item j,
and r_{ij} is the corresponding ground truth. T_E denotes the set of ratings to be
predicted in the test set and |T_E| is the number of test cases. A lower MAE or
RMSE means better prediction performance.
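The two metrics in Eq. (4.11) can be computed directly from the prediction and ground-truth vectors; the ratings below are hypothetical:

```python
import numpy as np

def mae_rmse(pred, truth):
    """MAE and RMSE over the test ratings (Eq. 4.11)."""
    err = np.asarray(pred) - np.asarray(truth)
    return np.abs(err).mean(), np.sqrt((err ** 2).mean())

mae, rmse = mae_rmse([3.5, 4.0, 2.0], [4.0, 4.0, 3.0])
print(round(mae, 4), round(rmse, 4))  # 0.5 0.6455
```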
In datasets with implicit feedback, such as FM, our task is to provide each user
with a ranked list of a limited number of items. For this task, we adopt
Normalized Discounted Cumulative Gain (NDCG@k) to evaluate the ranking
performance. First, for each test user u, the DCG over the first k recommended
items is defined as:
DCG@k = \sum_{i=1}^{k} \frac{2^{rel_{ui}} - 1}{\log_2(i + 1)} \qquad (4.12)
where rel_{ui} denotes the relevance score of the item recommended at position i
for user u. NDCG@k is the normalized version of DCG@k, averaged over all N test
users:

NDCG@k = \frac{1}{N} \sum_{u=1}^{N} \frac{DCG@k}{IDCG@k} \qquad (4.13)

where IDCG@k is the DCG@k of the ideal ranking order, i.e., the ranking induced
by the actual ratings in the test set. Higher values of NDCG@k are more
desirable, as they indicate that users' favoured items appear in their predicted
lists.
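Eqs. (4.12) and (4.13) for a single user can be sketched as follows (the relevance scores are hypothetical):

```python
import math

def dcg_at_k(rels, k):
    """DCG@k for a list of relevance scores in ranked order (Eq. 4.12)."""
    return sum((2 ** r - 1) / math.log2(i + 2) for i, r in enumerate(rels[:k]))

def ndcg_at_k(rels, k):
    """NDCG@k for one user; averaging over users gives Eq. (4.13)."""
    ideal = dcg_at_k(sorted(rels, reverse=True), k)
    return dcg_at_k(rels, k) / ideal if ideal > 0 else 0.0

# Relevance of the items at positions 1..5 of one predicted list.
print(round(ndcg_at_k([3, 2, 3, 0, 1], k=5), 4))  # 0.9575
```

A perfectly ordered list (relevance non-increasing) gives NDCG@k = 1.0.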
4.5.2.3 Experimental Protocol
We examine the compared models on the cross-domain recommendation task. For
that, we use ML vs LT, LT vs ML, FM vs ML, FM vs LT, LT vs FM and ML
vs FM as different pairs of related domains (the former is treated as the source
domain and the latter as the target domain).
For each pair of domains, all the ratings and tag assignments from the source
domain were used as training data, while for each user in the target domain,
we randomly selected 20% of his/her ratings, together with the corresponding tag
assignments, as the test data for evaluation; the remaining ratings and tags
were combined with the source domain ratings and tags to train the model. We
repeated each experiment 10 times and report the average results.
For a fair comparison, we set a uniform number of latent factors for all methods;
the impact of the number of latent factors on recommendation performance is
further studied in Section 4.7. In our implementation, the maximum number of
iterations was set to 100, and we adopted the best parameters reported in the
corresponding papers when implementing each benchmark method.
For CTagCDR, the regularization parameter λ_β was tuned to 0.01 for the
rating prediction task and 0.1 for the ranking task, while the selection of the
other parameters λ_u, λ_v, λ_α is described in the following section.
4.6 Parameter Analysis
There are two main outputs from the first three steps in CTagCDR: the inter-
domain and the intra-domain correlations (see Figure 4.1). They are learned
from tagging relations and are incorporated into matrix factorization to improve
recommendation performance. In this section, we will study the role of those
two tag-induced structures separately and describe the hyper-parameter tuning
process for controlling their contributions. For the sake of simplicity, we fixed
the number of latent factors of the CTagCDR model to 10 in this experiment.
To analyze the impact of the inter-domain correlations, we first set λ_α = 0,
which means that no structural constraint is imposed within the individual domains.
The trade-off parameters λu and λv reflect the influence of cross-domain user-
to-user and item-to-item similarities, respectively, in regularizing mutual matrix
factorization. We adjusted the values of two trade-off parameters through a grid
search, and measured the recommendation performance in terms of RMSE on
the ML and LT datasets and NDCG@10 on the FM dataset. Due to limited space,
we present only LT vs ML, ML vs LT and ML vs FM as examples to describe
the parameter tuning process; the results are shown in Figure 4.3. From these
observations, we find that CTagCDR achieves good results when λ_u and λ_v
vary over a wide range. This also indicates that the CTagCDR model is not prone
to falling into local optima when searching for a global solution.
By adopting the optimal parameters for controlling the contribution of
the inter-domain correlation, we then investigated the impact of the intra-domain
correlations on recommendation performance. We varied the value of λ_α within
the range {0.001, 0.01, 0.1, 1, 10}; the results are shown in Figure 4.4. Due
to the different tag configurations in the three cases, CTagCDR achieved its best
performance at different values of λ_α.
Next, adopting the optimal parameters, we investigate the effect of the number
of latent factors and of the ranking position on the recommendation performance,
and then make a comprehensive comparison with all benchmarks.
4.7 Impact of latent factors
To study the impact of latent factors on the recommendation performance, we
evaluated factor sizes of {5, 10, 15, 20}. Figures 4.5(a) to 4.5(f) show the
RMSE and NDCG@10 performance with respect to the number of latent factors.
Based on the results in Figure 4.5, we find that CTagCDR achieves the
best performance on five pairs of domains, and is comparable to the best in the
case of LT vs FM, when the number of latent factors is set to 10, outperforming
the other state-of-the-art approaches by a large margin. However, we also notice
that the performance of CTagCDR degrades when increasing the number of latent
factors
Fig. 4.3 Impact of λ_u and λ_v on the recommendation performance of CTagCDR ((a) LT vs ML, (b) ML vs LT, (c) ML vs FM)
4.7 Impact of latent factors
(a) LT vs ML
(b) ML vs LT
(c) ML vs FM
Fig. 4.4 Impact of λα on the recommendation performance of CTagCDR
97
to some extent. As widely reported in previous works, larger numbers of latent
factors may cause overfitting and result in high computational complexity.
Therefore, considering the trade-off between recommendation performance and
computation cost, we set the number of latent factors to 10 for the following
experiments.
4.8 Sensitivity analysis on Top-k Recommendation
To determine whether the item ranking performance is sensitive to the ranking
position k, we conducted another group of experiments, varying k from
1 to 10. Figure 4.6 shows the ranking performance of all methods in terms of
NDCG@k.
Fig. 4.5 Performance of RMSE and NDCG@10 w.r.t. the number of latent factors ((a) LT vs ML, (b) ML vs LT, (c) FM vs LT, (d) FM vs ML, (e) LT vs FM, (f) ML vs FM)
In Figure 4.6, we find that all the compared methods follow the same trend
with respect to small variations in the value of k. That is, the recommendation
Fig. 4.6 Performance of NDCG@k w.r.t. the ranking position k ((a) ML vs FM, (b) LT vs FM)
performance gradually improves when moving from the top-1 item to the top-10
items in a recommendation list. The top-k recommendation results also highlight
our method's superiority in the item ranking task.
To be consistent with the literature in selecting a proper ranking list size, we
truncated the ranking list at 10 and applied NDCG@10 as the main metric for
evaluating ranking performance.
4.9 Performance Comparison
Table 4.3 shows the performance of CTagCDR and the other approaches over
the six domain pairs. The overall results demonstrate that CTagCDR is an
effective recommendation approach for both the rating prediction and item ranking
tasks. For the rating prediction task, first, in ML vs LT, CTagCDR
outperforms the second-best method, GTagCDCF, by a large margin: the average
improvement is up to 12.15% in terms of MAE and 10.99% in terms of RMSE.
Second, in FM vs LT, the improvement over TMT is 12.39% in MAE and 14.17%
in RMSE. Lastly, in FM vs ML, CTagCDR achieves a relatively small
improvement over TMT of 1.14% in MAE and 0.81% in RMSE. With regard to the
item ranking task, CTagCDR outperforms TMT by 0.4% in terms of NDCG@10 in
ML vs FM. We also notice that in the cases of LT vs ML and LT vs FM, CTagCDR
slightly underperforms TMT. A possible explanation for this underperformance is
that the sparse tagging data in the source domain LT cannot provide enough
overlapping tags to share with the target domain. Since CTagCDR groups
domain-specific tags based on overlapping tags and utilizes their co-occurrence
pattern, the less reliable domain connection built from overlapping tags will
introduce noise into the clustering of
Table 4.3 Overall performance on six domain pairs

            ML→LT           LT→ML           FM→LT           FM→ML           LT→FM     ML→FM
            MAE     RMSE    MAE     RMSE    MAE     RMSE    MAE     RMSE    NDCG@10   NDCG@10
PMF         0.9573  1.1962  1.0343  1.3150  0.9526  1.1863  1.0343  1.3150  0.886     0.738
TagiCoFi    1.2037  1.4292  1.2399  1.4684  1.1841  1.4106  1.2399  1.4684  0.884     0.757
TagCDCF     1.0404  1.3714  1.1017  1.4346  1.1555  1.5098  1.1559  1.5105  0.898     0.740
GTagCDCF    0.8092  1.0043  0.8281  1.0535  0.8629  1.0688  0.8440  1.0721  0.902     0.760
TMT         0.8109  1.0401  0.7106  0.9512  0.8041  1.0315  0.7127  0.9524  0.884     0.761
CTagCDR     0.7108  0.8939  0.7262  0.9867  0.7045  0.8853  0.7046  0.9447  0.902     0.764
domain-specific tags, which will further harm the cross-domain similarity
calculation and mislead the joint matrix factorization. The overall results in
Table 4.3 help to address question Q1: we can conclude that CTagCDR is superior
to the competing methods on most datasets.
For the single-domain recommendation approaches, to our surprise, PMF
outperformed TagiCoFi on both the LT and ML datasets, even though TagiCoFi takes
tagging information into account to improve recommendation performance. One
possible explanation for the underperformance of TagiCoFi is that it lacks
sufficient latent factors to match the model complexity introduced by adding
tag-induced user similarity as a regularizer in matrix factorization. Indeed,
from Figure 4.5 we observed that the performance of TagiCoFi improves as the
number of latent factors increases.
For the tag-based cross-domain recommendation approaches, all except
TagCDCF outperform the single-domain recommendation methods, showing the
success of transferring knowledge from a relevant domain to compensate for data
sparsity in the target domain. TagCDCF exploits only binary tag information
and underperforms PMF, which indicates that loose domain coupling can result in
negative knowledge transfer and thus deteriorate recommendation performance.
We also observed, consistent with (Shi et al., 2013a), that GTagCDCF is superior
to TagCDCF, providing evidence that modelling tagging frequency helps to improve
recommendation quality. Going beyond exploiting shared tags only, TMT and
CTagCDR integrate more types of tags into matrix factorization, and both show
strong performance. The difference lies in the way tagging information is
modelled: TMT builds a tag co-occurrence matrix from the mixed tags of different
domains and transfers it as a shared pattern across domains, while CTagCDR
clusters domain-specific tags as features to augment the cross-domain
connections, so that more knowledge-transfer bridges can be built. CTagCDR
consistently outperforms TMT on most pairs of domains, confirming the
effectiveness of building independent intra- and inter-domain correlations to
regularize knowledge transfer.
To address question Q2, we specifically focused on the comparison between
CTagCDR and TagiCoFi for two reasons. First, both models integrate tagging
information into matrix factorization to improve recommendation performance.
Second, they both establish intra-domain correlation by learning a user-user simi-
larity matrix from user tagging behaviors in the single domain, but CTagCDR
additionally connects two different domains for knowledge transfer by exploit-
ing tags from both domains. The outperformance of CTagCDR illustrates the
effectiveness of integrating tag-induced intra-domain correlation for promoting
knowledge transfer.
With regard to question Q3, we expected to find a positive role for domain-
specific tags in strengthening cross-domain connections. This conclusion can be
confirmed by the multiple comparisons between CTagCDR and TagCDCF, and
between CTagCDR and GTagCDCF.
4.10 Performance under Different Sparsity Level
We further evaluate how different components (shared-tag induced inter-domain
correlation, specific-tag-cluster induced inter-domain correlation, and tag-induced
intra-domain correlation) contribute to recommendation performance when
handling different levels of data sparsity. Specifically, we compare the performance of four
Fig. 4.7 Change of recommendation performance on ML vs LT during the increment of training data size
relevant models to check their robustness and behaviors under different data
sparsity configurations. These four relevant models are as follows: the base model is
a pure TagCDCF model with the binary information in the tagging matrix replaced
by tf*idf values; base+intra further adds tag-induced intra-domain correlations to
the base model; base+inter adds only the domain-specific-tag induced
inter-domain correlation to the base model; and base+inter+intra refers to the
model that adds all of the above tag-induced correlations.
We chose ML vs LT as an example to evaluate these models by incrementally
increasing the training data in LT from 20% to 80% with a step size of 20%. The
results are shown in Figure 4.7. Note that in our experiment setting each rating
score is accompanied by at least one tag assignment. If we decrease the rating
data, the size of the tagging data is reduced correspondingly. This helps to simulate
sparse conditions in both rating and tagging data.
As we see in Figure 4.7, the recommendation performance, evaluated
in terms of RMSE, gradually improves as we incrementally increase
the size of the training data. In addition, the base+inter model outperforms the base model
at all sparsity levels, confirming the effectiveness of integrating contributions from
domain-specific tags in enhancing domain connection. This advantage becomes more
significant when more training data is given, since more overlapping tags are
shared by both domains. To our surprise, compared to the base and base+inter
models, the base+intra model shows that tag-induced intra-domain correlation is more
helpful in promoting knowledge transfer for improving recommendation performance.
This can be explained by the fact that the compact structural constraint
automatically groups similar users and items together in each individual domain.
As a result, knowledge can be transferred between groups of users and
items, which avoids introducing noise at the individual level. We also noticed that the
base+inter+intra model performs slightly better than the competitive
base+intra model. This observation shows, to some extent, that adding the
contribution from domain-specific tags is effective in promoting knowledge transfer.
However, how to model domain-specific tags while avoiding the introduction of noise remains a
challenging problem for future study.
4.11 Summary
In order to explore the complementary role of different types of tags in linking
disjoint domains, this chapter proposes a complete tag-induced cross-domain
recommendation model, called CTagCDR, which infers inter- and intra-domain
correlations from tags as structural knowledge to regularize joint matrix
factorization. Compared to existing tag-based cross-domain recommendation models,
CTagCDR is able to capture the complete information encoded in both overlapping
and domain-specific tags. The experimental results on three public datasets
and with five state-of-the-art baseline approaches demonstrate that CTagCDR
performs well in both rating prediction and item recommendation tasks and
can effectively improve recommendation performance when the rating data in the
target domain is sparse.
Chapter 5
Exploiting Tag Semantic for
Cross-domain Recommendation
5.1 Introduction
Despite the success achieved in Chapter 3 by exploiting abundant domain-specific
tags to increase domain correlation, and in Chapter 4 by taking advantage of
different types of tags to infer structural constraints for regularizing knowledge
transfer, there are still some limitations in the above methods. Overall, we applied the
bag-of-words model and utilized tags as features to build user and item profiles. The
similarity between cross-domain users or cross-domain items is solely based on the
lexical similarity of tags. However, the uncontrolled vocabularies have resulted in
numerous ambiguous, redundant and non-identical tags. If two tags are semantically
related but use different words, the above methods may not consider those two tags to be
similar. In addition, each tag used in building a profile is treated independently.
Therefore, the context of the tag distribution on the user or item side has not been
preserved.
To address the aforementioned problems, in this chapter a new tag semantic-
boosted cross-domain recommendation algorithm, called TSCDR, is proposed to
improve cross-domain recommendation performance by considering tag semantics.
First, the word2vec technique (Mikolov et al., 2013) is applied to a data structure
designed to encode tagging context. The output is a latent representation of
tags that reflects the semantic similarity among tags. Based on the learned tag
representations, k-means clustering is then used to merge those non-identical but
semantically equivalent tags into the same group. The resulting tag clusters are
further exploited as a joint embedding space to span across domains. By mapping
users and items to the same embedding space, we can identify more accurate cross-
domain user-to-user and item-to-item similarities to regularize knowledge
transfer. Extensive experiments conducted on three public datasets with different
sparsity settings have justified the promising performance of TSCDR on the top-N
recommendation task.
The rest of this chapter is organized as follows: Section 5.2 introduces the
preliminary knowledge for understanding this chapter. Section 5.3 is dedicated
to the presentation of the proposed model. In Section 5.4 extensive experiments
are performed to demonstrate the effectiveness of our approach. Conclusions are
summarized in Section 5.5.
5.2 Preliminaries
This section begins with some frequently used notations, followed by the formu-
lation of the recommendation problem this study aims to address. An overview
of the tag-induced cross-domain collaborative filtering model, which forms the
building block for our proposed approach, concludes this section.
5.2.1 Notations and Problem Formulation
Bold uppercase letters, such as Z, denote matrices. The i-th row and the j-th
column of Z are denoted as Z_{i*} and Z_{*j}, respectively. The (i, j)-th element of matrix
Z is denoted as Z_{i,j}. Given a source domain s and a target domain τ, in a single
domain π (π ∈ {s, τ}) there are n^π users U^π and m^π items I^π. The interaction
between users and items is represented by the user-item rating matrix R^π, where
r^π_{ui} is the rating score given by user u to item i. A binary weight matrix I^{R^π} masks
the missing entries in R^π, where (I^{R^π})_{ij} = 1 if R^π_{ij} is observed and (I^{R^π})_{ij} = 0
otherwise. In addition to the rating data, there are l^π tags T^π annotating items.
A^π ⊆ U^π × I^π × T^π is the set of tag assignments, where an element (u, i, t) denotes
that tag t has been attached to item i by user u. Some frequently used symbols
along with their definitions are summarized in Table 5.1.
Our cross-domain recommendation task is to estimate the unobserved rat-
ing scores in R^τ in order to rank items. Formally, the source domain Φ^s =
{U^s, I^s, R^s, A^s} contains a dense training set and the target domain Φ^τ =
{U^τ, I^τ, R^τ, A^τ} contains a sparse one. The goal is to learn a function f with pa-
rameters Θ that predicts the most likely rating r^τ_{ui} in the test set {(u, i) | u ∈
Table 5.1 Symbols and corresponding descriptions used in Chapter 5
Symbols Descriptions
π ∈ {s, τ} domain indicator (s for source domain, τ for target domain)
u, i, t individual user u, item i and tag t
Uπ A set of users in domain π, and |Uπ| = nπ
Iπ A set of items in domain π, and |Iπ| = mπ
T π A set of tags in domain π, and |T π| = lπ
Aπ tag assignments in domain π, Aπ ⊆ Uπ × Iπ × T π
Rπ user-item rating matrix in domain π, Rπ ∈ Rnπ×mπ
IRπ indicator matrix for Rπ, IRπ ∈ Rnπ×mπ
rπui rating score given by user u on item i in domain π
r̂πui predicted rating score for user u on item i in domain π
d dimension size for latent factors
P π latent factor matrix for users in domain π, P π ∈ Rnπ×d
Qπ latent factor matrix for items in domain π, Qπ ∈ Rmπ×d
SU cross-domain user-user similarity matrix
SI cross-domain item-item similarity matrix
‖Z‖F Frobenius norm of matrix Z
λ regularization parameter for �2-norm
λu, λi regularization weights for cross-domain user similarity and cross-domain item similarity, respectively
U^τ, i ∈ I^τ}. This recommendation task can be formulated as:

$$\arg\min_{\Theta}\left(\hat{r}^\tau_{ui} - r_{ui}\right) = \arg\min_{\Theta}\left(f(u, i \mid \Phi^s, \Phi^\tau, \Theta) - r_{ui}\right)
\quad \text{s.t. } U^s \cap U^\tau = \emptyset,\; I^s \cap I^\tau = \emptyset \tag{5.1}$$
5.2.2 Tag-induced Cross Domain Collaborative Filtering
Model
The TagCDCF model (Shi et al., 2011) uses overlapping tags as common features to
model both user and item profiles, and infers the similarities between cross-domain
users and cross-domain items as prior knowledge to regularize the joint matrix
factorization. TagCDCF is formulated as a minimization problem:
$$\begin{aligned}
\mathcal{L} ={} & \tfrac{1}{2}\left\|R^s - I^{R^s} \odot \left(P^s (Q^s)^\top\right)\right\|_F^2
+ \tfrac{1}{2}\left\|R^\tau - I^{R^\tau} \odot \left(P^\tau (Q^\tau)^\top\right)\right\|_F^2 \\
& + \tfrac{\lambda_u}{2}\left\|S^U - I^{S^U} \odot \left(P^s (P^\tau)^\top\right)\right\|_F^2
+ \tfrac{\lambda_i}{2}\left\|S^I - I^{S^I} \odot \left(Q^s (Q^\tau)^\top\right)\right\|_F^2 \\
& + \lambda\left(\|P^s\|_F^2 + \|Q^s\|_F^2 + \|P^\tau\|_F^2 + \|Q^\tau\|_F^2\right)
\end{aligned} \tag{5.2}$$
where ⊙ denotes the element-wise (Hadamard) product. Knowledge in the source
domain is transferred to the target domain through two key components: the
cross-domain user similarity matrix S^U and the cross-domain item similarity
matrix S^I. TagCDCF considers only overlapping tags, not non-overlapping tags,
when building domain connections, and it does not exploit the semantic
information in tags. Hence, TagCDCF does not achieve optimal results
because S^U and S^I are inaccurate. There are several ways to infer S^U and S^I by
integrating the semantic information in tags. These are introduced next.
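Before moving on, the masked objective of Equation 5.2 can be sketched in a few lines of numpy. This is a minimal illustration under a zero-as-missing convention for the input matrices; the function name and argument layout are assumptions for exposition, not part of the original model description:

```python
import numpy as np

def tagcdcf_loss(Rs, Rt, SU, SI, Ps, Qs, Pt, Qt, lam_u, lam_i, lam):
    """Illustrative TagCDCF-style objective; zero entries mark missing values."""
    IRs, IRt = (Rs != 0).astype(float), (Rt != 0).astype(float)
    ISU, ISI = (SU != 0).astype(float), (SI != 0).astype(float)
    sq = lambda A: float(np.sum(A ** 2))               # squared Frobenius norm
    return (0.5 * sq(Rs - IRs * (Ps @ Qs.T))           # source-domain rating fit
            + 0.5 * sq(Rt - IRt * (Pt @ Qt.T))         # target-domain rating fit
            + 0.5 * lam_u * sq(SU - ISU * (Ps @ Pt.T)) # user-similarity coupling
            + 0.5 * lam_i * sq(SI - ISI * (Qs @ Qt.T)) # item-similarity coupling
            + lam * (sq(Ps) + sq(Qs) + sq(Pt) + sq(Qt)))  # l2 regularization
```

When the factor matrices reconstruct all observed entries exactly, only the regularization term remains, which matches the structure of Equation 5.2.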
Fig. 5.1 An example of ambiguous, redundant and non-identical but semantically equivalent tags
5.3 Tag Semantically-boosted Cross-domain Recommendation
The uncontrolled vocabularies used when assigning tags have resulted in
numerous ambiguous, redundant and non-identical tags (see Figure 5.1). We
therefore propose to utilize semantics to correlate tags and eliminate noise in the
tagging data. The goal is to infer more accurate S^U and S^I on this basis. We begin by
introducing two simple baseline solutions. The details of our approach follow.
The most intuitive idea is to apply a topic modelling technique to a tag collection
consisting of tags from both the source and target domains to learn joint topics, as
shown in Figure 5.2a. Using this method, cross-domain users and items can be
linked by mapping tag-based user and item profiles to the joint topic space. This
model is called Joint Topic Mining; details are introduced in Subsection 5.3.1.
Joint topic mining connects different domains through a subset of joint topics.
However, it may be difficult to find enough reliable joint topics to fully represent
tagging behaviors of different domains. Therefore, our second model performs
topic modelling on individual domains separately to ensure the accuracy of the
tag topics. The challenge lies in how to align topics in different domains. We
propose adding a link between two topic nodes if the same tags are distributed
across two topics, as shown in Figure 5.2b. As such, an implicit path is inferred
and built to connect cross-domain users and items. This method is called Topic
Alignment, and the details are described in Subsection 5.3.2.
The topic alignment method is able to reveal distinctive topics in each domain,
but its performance suffers from two drawbacks. First, building topic links by
identical tags tends to result in sparse connections, especially for the topics
unique to each domain. Second, this method treats each tag independently, while
ignoring the surrounding tags used by the same user or annotated on the same
item. Therefore, our model also considers the usage context in which a tag is
used when extracting the tag semantic information. The details are presented in
Subsection 5.3.3.
5.3.1 Joint Topic Mining
The first step in determining a set of joint topics for two heterogeneous domains is
to construct a tag corpus by combining tags from both the source and target domains.
This is specifically defined as G = T^s ∪ T^τ, where |G| = g denotes the number
(a) Joint Topic Mining
(b) Topic Alignment
Fig. 5.2 Graphical illustration of joint topic mining and topic alignment
of unique tags in both domains. Based on the tag corpus, a tag weight vector
$x^s_u = (w^s_{u1}, w^s_{u2}, \cdots, w^s_{ug})$ is created for source domain user u to represent his or
her preferences over all candidate tags. The tag weight $w^s_{ut}$ is defined as follows:

$$w^s_{ut} = \begin{cases} f^s(u, t) & \text{if tag } t \text{ was assigned by user } u \\ 0 & \text{otherwise} \end{cases} \tag{5.3}$$

where $f^s(u, t)$ denotes the frequency with which source domain user u used tag t to label
items in I^s. Given the source domain user-tag weight vectors, an n^s × g user-tag
matrix X^s is created:
$$X^s = \begin{bmatrix} x^s_1 \\ \vdots \\ x^s_u \\ \vdots \\ x^s_{n^s} \end{bmatrix}
= \begin{bmatrix}
w^s_{11} & w^s_{12} & \cdots & w^s_{1g} \\
\vdots & \vdots & & \vdots \\
w^s_{u1} & w^s_{u2} & \cdots & w^s_{ug} \\
\vdots & \vdots & & \vdots \\
w^s_{n^s 1} & w^s_{n^s 2} & \cdots & w^s_{n^s g}
\end{bmatrix} \tag{5.4}$$
A g-dimensional tag weight vector $y^s_i = (\delta^s_{i1}, \delta^s_{i2}, \cdots, \delta^s_{ig})$ can also be defined
for source domain item i. Each tag weight $\delta^s_{it}$ is calculated as follows:

$$\delta^s_{it} = \begin{cases} f^s(i, t) & \text{if tag } t \text{ was labeled on item } i \\ 0 & \text{otherwise} \end{cases} \tag{5.5}$$
where $f^s(i, t)$ denotes the frequency with which tag t was attached to source domain
item i by users in U^s. An m^s × g item-tag matrix Y^s is created for all items in
the source domain,

$$Y^s = \begin{bmatrix} y^s_1 \\ \vdots \\ y^s_i \\ \vdots \\ y^s_{m^s} \end{bmatrix}
= \begin{bmatrix}
\delta^s_{11} & \delta^s_{12} & \cdots & \delta^s_{1g} \\
\vdots & \vdots & & \vdots \\
\delta^s_{i1} & \delta^s_{i2} & \cdots & \delta^s_{ig} \\
\vdots & \vdots & & \vdots \\
\delta^s_{m^s 1} & \delta^s_{m^s 2} & \cdots & \delta^s_{m^s g}
\end{bmatrix} \tag{5.6}$$
which represents the tag distributions over items in the source domain. Similarly, an
n^τ × g matrix X^τ is created for the target domain to denote the relationships between
its users and the tag corpus, and an m^τ × g matrix Y^τ is created to denote the
relationships between its items and the tag corpus.
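As a concrete sketch of Equations 5.3-5.6, the frequency matrices can be built directly from the (u, i, t) triples in the tag assignment set. The function and variable names below are illustrative assumptions, and pure-Python lists stand in for the matrices:

```python
from collections import Counter

def build_weight_matrix(assignments, rows, tags, by="user"):
    # assignments: iterable of (user, item, tag) triples from the tag assignment set
    # rows: ordered users (for Eq. 5.4) or items (for Eq. 5.6); tags: ordered corpus
    key = (lambda a: (a[0], a[2])) if by == "user" else (lambda a: (a[1], a[2]))
    freq = Counter(key(a) for a in assignments)   # f(u, t) or f(i, t)
    r_idx = {r: i for i, r in enumerate(rows)}
    t_idx = {t: j for j, t in enumerate(tags)}
    W = [[0] * len(tags) for _ in rows]
    for (r, t), f in freq.items():
        W[r_idx[r]][t_idx[t]] = f
    return W
```

Calling it with `by="user"` yields X^s (or X^τ), and with `by="item"` yields Y^s (or Y^τ).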
Next, we concatenate the tagging matrices of both domains to construct mixed
tagging matrices:

$$X = \begin{bmatrix} X^s \\ X^\tau \end{bmatrix} \qquad Y = \begin{bmatrix} Y^s \\ Y^\tau \end{bmatrix} \tag{5.7}$$
where $X \in \mathbb{R}^{(n^s+n^\tau) \times g}$ and $Y \in \mathbb{R}^{(m^s+m^\tau) \times g}$. Then we apply topic modelling
techniques to learn latent features for cross-domain users and items in the form of
topics. We chose Latent Dirichlet Allocation (LDA) (Hoffman et al., 2010) as our
basic topic modelling method because of its strong performance in many NLP
tasks. In the problem at hand, non-overlapping tags are considered to be related
across both domains when they co-occur with the same overlapping tags. Further,
the more frequent the co-occurrence, the stronger the relationship among the
non-overlapping tags.
The topic distributions of tags over users can be learned by feeding the combined
matrix X to the LDA model, and are represented as a user-topic matrix
$N = \begin{bmatrix} N^s \\ N^\tau \end{bmatrix}$, where $N \in \mathbb{R}^{(n^s+n^\tau) \times k}$ and k is the number of joint topics. N is divided
into two parts: the source domain user-topic matrix $N^s \in \mathbb{R}^{n^s \times k}$ and the target
domain user-topic matrix $N^\tau \in \mathbb{R}^{n^\tau \times k}$. Similarly, the item-topic matrix
$M = \begin{bmatrix} M^s \\ M^\tau \end{bmatrix}$ represents the item topic distributions, which can be learned by feeding
Y into the LDA model, where $M^s \in \mathbb{R}^{m^s \times k}$ denotes the source domain item-topic
matrix and $M^\tau \in \mathbb{R}^{m^\tau \times k}$ denotes the target domain item-topic matrix.
The similarity between a source domain user/item and a target domain
user/item is measured by mapping the cross-domain users and cross-domain
items into the same topic space and using the cosine similarity between the latent
representations of their profiles over the joint topics. This is formally defined as:

$$sim(z^s, z^\tau) = \frac{z^s \cdot z^\tau}{|z^s||z^\tau|} \tag{5.8}$$

where $z^s$ is a row vector from N^s (M^s), and correspondingly $z^\tau$ is a row vector
from N^τ (M^τ). In this way, we obtain a refined cross-domain user similarity
matrix S^U and a cross-domain item similarity matrix S^I, evaluated on the dense
topics rather than the sparse tagging data.
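Given the user-topic matrices from any LDA implementation, the pairwise cosine similarities of Equation 5.8 can be computed in one vectorized step. This is a minimal numpy sketch with illustrative names; the small epsilon guards against all-zero rows and is an added assumption:

```python
import numpy as np

def cross_domain_similarity(Ns, Nt, eps=1e-12):
    # Ns: n_s x k source user-topic matrix; Nt: n_t x k target user-topic matrix
    # returns SU with SU[u, v] = cosine(Ns[u], Nt[v]) as in Eq. 5.8
    A = Ns / (np.linalg.norm(Ns, axis=1, keepdims=True) + eps)
    B = Nt / (np.linalg.norm(Nt, axis=1, keepdims=True) + eps)
    return A @ B.T
```

The same function applied to the item-topic matrices M^s and M^τ yields S^I.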
5.3.2 Topic Alignment
The joint topic mining method relies on general concepts, namely joint topics,
to link different domains. However, as previously mentioned, this method is not
able to identify the unique discriminative topics expressed by domain-specific tags.
Therefore, rather than building a joint topic space, we propose extracting topics
in each domain separately. With diverse topics belonging to multiple domains,
the challenge is finding matching relationships between the topics.
We chose a topic matching model (Tang et al., 2012) that combines the topic
model with a random walk to explore the implicit relationships among different
objects (e.g. authors, papers, publication venues) in cross-domain research col-
laboration. However, the tags in our problem are different from the keywords in
scientific papers, and the extracted topics do not represent collaborative depen-
dencies between users or between users and items. Therefore, we again applied the
basic LDA model to estimate the topic distributions of the tags associated with users
and items. For simplicity, assume there are k^s topics in the source domain and k^τ
topics in the target domain.
To build a path from users in the source domain to users in the target
domain, the LDA model obtains two sets of topic distributions for users, extracted
from the two domains respectively. The topic distribution from the source
domain is denoted as $K^s_U \in \mathbb{R}^{n^s \times k^s}$; $K^\tau_U \in \mathbb{R}^{n^\tau \times k^\tau}$ denotes the distribution in the
target domain. Then, the user-topic graphs generated by the LDA model are extended
by linking the topic nodes from both domains. The relevance between user topic
nodes $\phi^s_j$ and $\phi^\tau_{j'}$ is defined as:

$$Rel(\phi^s_j, \phi^\tau_{j'}) = \begin{cases}
\dfrac{|\mathcal{P}(\phi^s_j) \cap \mathcal{P}(\phi^\tau_{j'})|}{|\mathcal{P}(\phi^s_j) \cup \mathcal{P}(\phi^\tau_{j'})|} & \text{if } \mathcal{P}(\phi^s_j) \cap \mathcal{P}(\phi^\tau_{j'}) \neq \emptyset \\
0 & \text{otherwise}
\end{cases} \tag{5.9}$$

where $\mathcal{P}(\phi^s_j)$ denotes the subset of tags distributed over source domain user topic
$\phi^s_j$, and $\mathcal{P}(\phi^\tau_{j'})$ denotes the corresponding tag set distributed over target domain
user topic $\phi^\tau_{j'}$. This generates the topic linkage matrix $L^U \in \mathbb{R}^{k^s \times k^\tau}$, which
represents the connections between the user topics in both domains.
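Treating each topic as the set of tags distributed over it, the linkage matrix of Equation 5.9 reduces to pairwise Jaccard overlaps. A small pure-Python sketch follows; the names are illustrative:

```python
def topic_linkage(source_topics, target_topics):
    # source_topics[j] / target_topics[j']: sets of tags P(phi) for each topic node
    # returns L with L[j][j'] = Jaccard overlap of the two tag sets (Eq. 5.9)
    L = []
    for Ps in source_topics:
        row = []
        for Pt in target_topics:
            inter = Ps & Pt
            row.append(len(inter) / len(Ps | Pt) if inter else 0.0)
        L.append(row)
    return L
```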
Based on the new graph representation, a random walk with restart algo-
rithm (Tong et al., 2008) is used to suggest relevant user nodes in the target domain
for a given user node in the source domain, i.e.,

$$\alpha^{(t+1)} = (1 - \varepsilon) T \alpha^{(t)} + \varepsilon \beta \tag{5.10}$$

where $\alpha^{(t)} \in \mathbb{R}^{n^s+n^\tau}$ is a vector denoting the probability that the random walk arrives
at the corresponding user nodes at step t, and $\beta \in \mathbb{R}^{n^s+n^\tau}$ is a vector of zeros, except that
the element corresponding to the start node is set to 1. $\varepsilon$ denotes the probability
that the walk returns to the start node at each step. T defines the transition
probabilities of the random walk, with $T = \begin{bmatrix} X & I \\ I^\top & X^\top \end{bmatrix}$, where $X = K^s_U L^U (K^\tau_U)^\top$
and I is an identity matrix. Since we are only interested in finding relevant users
in the target domain for a given user in the source domain, the part of α corresponding
to the target-domain users is sectioned off to compose a row in S^U
once the iteration in Equation 5.10 reaches a stable state.
Similarly, the above processes are also applied to items to extend the item-topic
graphs and update the cross-domain item similarity matrix S^I.
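The restart iteration of Equation 5.10 can be run as a simple fixed-point loop. The sketch below assumes T has already been normalized (e.g. made column-stochastic) so that the iteration converges, which is an assumption on top of the text; the function name and tolerances are likewise illustrative:

```python
import numpy as np

def random_walk_with_restart(T, start, eps=0.15, tol=1e-10, max_iter=1000):
    # T: (n_s + n_t) x (n_s + n_t) transition matrix; start: source user node index
    n = T.shape[0]
    beta = np.zeros(n)
    beta[start] = 1.0            # restart distribution concentrated on start node
    alpha = beta.copy()
    for _ in range(max_iter):
        new = (1 - eps) * (T @ alpha) + eps * beta   # Eq. 5.10
        if np.abs(new - alpha).sum() < tol:
            break
        alpha = new
    return alpha
```

The target-domain block of the converged α then forms one row of S^U.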
5.3.3 Embedding Space Learning
While the topic alignment method is able to correlate tags between domains
through topics, it is not able to identify and match tags that share similar
semantic meanings. Inspired by word2vec (Mikolov et al., 2013), which maps
words and phrases to a low-dimensional continuous vector space to capture the
semantic and syntactic information in words, the tagging data is projected into
a unified neural word embedding framework to learn the semantic
relationships between the tags.
There are two main similarities between text data and tagging data:
(1) The tagging data is organized in a similar way to the text data in word2vec.
In the tagging data, users tag specific items, and a list of relevant tags can be
collected for each user-item pair. These user-item pairs can be mapped
to documents in word2vec, and the tags associated with the
user-item pairs are equivalent to the words in word2vec.
(2) The tagging data has a contextual setting similar to that of words in text data.
Tags are text-based features, and the set of tags used in the same user-item pair
serves as the context for finding relationships between a current tag and its
surrounding tags. Further, tags with the same context are interchangeable
Fig. 5.3 Modelling tagging data for word2vec. The tag marked in red denotes an overlapping tag shared by both domains.
if the time at which they were generated is discarded. In other words, the order of
tags does not influence the prediction result.
In this case, we can apply word2vec to the tagging data to produce tag embeddings.
Since the tag set associated with each user-item pair is usually small, we used the
skip-gram model with negative sampling (SGNS), which is more effective on
small datasets. In particular, we employed the word2vec implementation in the
gensim toolbox¹ to process the tagging data. To prepare the input for word2vec,
the <user-item-tag> triplet data was arranged such that the tags used on the same
user-item pair are collected into a list, and all the tag lists together build the
dictionary (corpus). The presentation of the tagging data as input
to word2vec is shown in Figure 5.3. Within the word2vec model, we set the vector
dimensionality to 300 and the context window size to 5; the context window size
indicates how many tags around the target word are considered as context during
training. Word2vec outputs continuous vector representations for the unique
tags in the dictionary, which can be used to decide which tags are semantically or
contextually close to each other.

¹https://radimrehurek.com/gensim/
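Preparing the word2vec input amounts to grouping the tags by user-item pair. The sketch below does this grouping in pure Python; the commented gensim call shows how SGNS might then be invoked with the settings described above, and is an illustrative, untested assumption:

```python
from collections import defaultdict

def tagging_to_sentences(assignments):
    # each (user, item) pair acts as one "document"; its tags are the "words"
    docs = defaultdict(list)
    for u, i, t in assignments:
        docs[(u, i)].append(t)
    return list(docs.values())

# With the tag sentences prepared, gensim's skip-gram with negative sampling
# could be trained roughly as follows (illustrative, not executed here):
# from gensim.models import Word2Vec
# model = Word2Vec(sentences, vector_size=300, window=5, min_count=5,
#                  sg=1, negative=5)
```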
Although word2vec produces uniform-length tag vectors, the challenge in
our problem is handling user and item profiles with variable-length tag lists.
Hence, the individual tag vectors need to be transformed into a feature set of the
same length for both cross-domain users and cross-domain items. One possible
way to accomplish this is to exploit tag clusters as new features to build the user
and item profiles. The k-means clustering method can then be used to group
semantically related tags based on the learned tag vectors. We selected k-means
clustering for our approach because it is simple and computationally
efficient; however, other clustering methods could also be applied. The only
issue with k-means clustering is choosing the right number of clusters. The effect
of the number of chosen clusters on the final recommendation performance is
further discussed in Section 5.4.5.2.
Suppose there are θ tag clusters. A simple function converts the tag-based
user profiles into a bag-of-centroids, which works just like bag-of-words but uses
clusters instead. The function is defined as:

$$f(\mathcal{P}(u)) = \left(p(c_1|u), \cdots, p(c_j|u), \cdots, p(c_\theta|u)\right)$$
$$p(c_j|u) = \begin{cases}
\dfrac{|\mathcal{P}(u) \cap \mathcal{P}(c_j)|}{\sum_{j=1}^{\theta}|\mathcal{P}(c_j)|} & \text{if } \mathcal{P}(u) \cap \mathcal{P}(c_j) \neq \emptyset \\
0 & \text{otherwise}
\end{cases} \tag{5.11}$$
where $p(c_j|u)$ measures the probability that user u tends to use tags in cluster $c_j$
based on his/her tagging history. $\mathcal{P}(u)$ denotes the tags used by user u and $\mathcal{P}(c_j)$
denotes the tags belonging to the cluster with centroid $c_j$.
Once the user profiles have been refined into tag clusters, we can compute S^U,
which reflects user relationships according to their tagging behaviors. Similarly,
we can apply Equation 5.11 to build profiles for cross-domain items and calculate
S^I.
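Equation 5.11 can be sketched as a small pure-Python function. The denominator follows the formula as printed above, and the function and variable names are illustrative:

```python
def bag_of_centroids(profile_tags, clusters):
    # profile_tags: set of tags P(u) (or P(i)); clusters: list of tag sets P(c_j)
    # returns the theta-dimensional bag-of-centroids profile (Eq. 5.11)
    denom = sum(len(c) for c in clusters)
    return [len(profile_tags & c) / denom if profile_tags & c else 0.0
            for c in clusters]
```

Cosine similarity between such profiles across domains then gives the entries of S^U and S^I.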
5.3.4 Optimization
To minimize the objective function in Equation 5.2, we apply gradient descent to
alternately update P^s, Q^s, P^τ, Q^τ. The derivative with respect to each variable is computed
as:

$$\frac{\partial \mathcal{L}}{\partial P^s} = -\left(R^s - I^{R^s} \odot (P^s (Q^s)^\top)\right) Q^s - \lambda_u \left(S^U - I^{S^U} \odot (P^s (P^\tau)^\top)\right) P^\tau + \lambda P^s \tag{5.12}$$

$$\frac{\partial \mathcal{L}}{\partial Q^s} = -\left(R^s - I^{R^s} \odot (P^s (Q^s)^\top)\right)^\top P^s - \lambda_i \left(S^I - I^{S^I} \odot (Q^s (Q^\tau)^\top)\right) Q^\tau + \lambda Q^s \tag{5.13}$$

$$\frac{\partial \mathcal{L}}{\partial P^\tau} = -\left(R^\tau - I^{R^\tau} \odot (P^\tau (Q^\tau)^\top)\right) Q^\tau - \lambda_u \left(S^U - I^{S^U} \odot (P^s (P^\tau)^\top)\right)^\top P^s + \lambda P^\tau \tag{5.14}$$

$$\frac{\partial \mathcal{L}}{\partial Q^\tau} = -\left(R^\tau - I^{R^\tau} \odot (P^\tau (Q^\tau)^\top)\right)^\top P^\tau - \lambda_i \left(S^I - I^{S^I} \odot (Q^s (Q^\tau)^\top)\right)^\top Q^s + \lambda Q^\tau \tag{5.15}$$
Once the optimal parameters have been learned, our approach predicts ratings in the
target domain by $\hat{R}^\tau = P^\tau (Q^\tau)^\top$. To evaluate top-N recommendation performance,
unobserved items are ranked according to the predicted ratings for each test user.
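One alternating gradient-descent step over the four factor matrices can be sketched in numpy. The signs follow the standard derivative of the squared loss (Equations 5.12-5.15), zeros are treated as missing entries, and the learning rate and function name are illustrative assumptions:

```python
import numpy as np

def gradient_step(Rs, Rt, SU, SI, Ps, Qs, Pt, Qt, lam_u, lam_i, lam, lr=0.005):
    IRs, IRt = (Rs != 0).astype(float), (Rt != 0).astype(float)
    ISU, ISI = (SU != 0).astype(float), (SI != 0).astype(float)
    Es = IRs * (Rs - Ps @ Qs.T)   # masked source-domain rating residual
    Et = IRt * (Rt - Pt @ Qt.T)   # masked target-domain rating residual
    Eu = ISU * (SU - Ps @ Pt.T)   # cross-domain user-similarity residual
    Ei = ISI * (SI - Qs @ Qt.T)   # cross-domain item-similarity residual
    gPs = -Es @ Qs - lam_u * Eu @ Pt + lam * Ps        # Eq. 5.12
    gQs = -Es.T @ Ps - lam_i * Ei @ Qt + lam * Qs      # Eq. 5.13
    gPt = -Et @ Qt - lam_u * Eu.T @ Ps + lam * Pt      # Eq. 5.14
    gQt = -Et.T @ Pt - lam_i * Ei.T @ Qs + lam * Qt    # Eq. 5.15
    return Ps - lr * gPs, Qs - lr * gQs, Pt - lr * gPt, Qt - lr * gQt
```

Iterating such steps until convergence yields the parameters used for the prediction $\hat{R}^\tau = P^\tau (Q^\tau)^\top$.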
5.4 Experiments and analysis
This section presents a series of experiments that compare the performance of
TSCDR with other state-of-the-art single domain and cross-domain recommenda-
tion approaches.
5.4.1 Dataset
We experimented on three publicly accessible datasets: the MovieLens 10M dataset²,
the LibraryThing dataset³ and the LastFM dataset⁴. To the best of our knowledge, these
three datasets are the only ones that include both rating data and tagging
data.
MovieLens 10M (ML) is a movie rating dataset containing 95,580 tags and
over 10 million ratings on a scale of 0.5 to 5, provided by 71,567 users on 10,681
movies.

²http://www.grouplens.org/node/73
³http://www.macle.nl/tud/LT/
⁴http://grouplens.org/datasets/hetrec-2011/
LibraryThing (LT) is a book rating dataset that contains over 700,000
ratings (on a scale of 1-5) and 2 million tags by 7,564 users on 39,515 books.
As highlighted in (Fernández-Tobías and Cantador, 2014), we also observed
inconsistent rating scores on the same user-book pairs in the original LT dataset.
To overcome these inconsistencies, we corrected the ratings by duplicating the rating
in the first record and placing the duplicate pairs in the original order.
LastFM (FM) is a user-song dataset released by HetRec in 2011, which
contains social networking, tagging, and music listening counts from a set of 2000
users sampled from the Last.fm online music system. Unlike the other two datasets,
which contain explicit rating scores, the FM dataset contains the number of times
a song was listened to, which is considered a type of implicit feedback.
5.4.2 Experiment Setup
Following the strategy adopted in (Fernández-Tobías and Cantador, 2014) for
preprocessing the datasets, we filtered the original datasets by removing records
lacking either rating scores or tags. To reduce redundancy and ambiguity in the
tagging data, we further stemmed the tags and removed meaningless tags consisting
only of numbers or non-alphabetic characters.
Note that the scope of this study focuses on learning the relatedness between
heterogeneous domains through tags. Therefore, we used the constraints in (Gedikli
and Jannach, 2013) to vary the quality of the tag information and investigate its
effect on recommendation performance. Specifically, we created two different
versions of each dataset by adjusting the thresholds of the constraints listed in
Table 5.2. The resulting dataset variations are shown in Table 5.3. Based on
Table 5.2 The tag filtering quality constraints (Gedikli and Jannach, 2013)
Constraint Description
Min Users/Tag (U/T) Minimum number of users per tag.
Min Items/Tag (I/T) Minimum number of items per tag.
Min Tags/User (T/U) Minimum number of tags a user has specified.
Min Items/User (I/U) Minimum number of items rated by a user.
Min Tags/Item (T/I) Minimum number of tags applied to an item.
Min Users/Item (U/I) Minimum number of users that rated an item.
sparsity conditions, we generated the following six related domain pairs to conduct
experiments: ML-high vs LT-high, ML-high vs FM-high, ML-high vs LT-low,
ML-high vs FM-low, LT-high vs ML-low, LT-high vs FM-low. In each domain
pair, the former was considered to be the source domain, and the latter was
treated as the target domain.
To measure the performance of the different approaches on the top-N recommendation
task, we applied the leave-one-out method for validation. For each user, we randomly
chose one of his/her interactions as the test data, with the remainder used
for training. Additionally, to tune the hyper-parameters for each baseline, we
randomly chose one interaction per user from the training data to
create the validation data. During the evaluation, we followed (He et al., 2017)
by generating negative items to estimate ranking performance. Specifically, we
ranked each user's test item alongside 100 negative items that had not been
seen by the user. We ran all experiments five times and report the average
results.
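The evaluation protocol just described (hold out one interaction per user and sample 100 unseen negatives) can be sketched as follows. The value 100 echoes the setting above, while the function name and data layout are illustrative assumptions:

```python
import random

def leave_one_out_split(user_items, all_items, n_neg=100, seed=0):
    # user_items: dict mapping each user to the list of items they interacted with
    rng = random.Random(seed)
    train, test = {}, {}
    for u, items in user_items.items():
        held = rng.choice(items)                      # held-out test interaction
        train[u] = [i for i in items if i != held]
        unseen = [i for i in all_items if i not in items]
        test[u] = (held, rng.sample(unseen, n_neg))   # test item + sampled negatives
    return train, test
```

At evaluation time, each model ranks the held-out item against its user's sampled negatives.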
We set the latent factors to a uniform size for all methods to ensure a fair comparison.
In our implementation, the size of the latent factors is set to 20 as a trade-off between
computational cost and accuracy. The other methods were set to the best
parameters reported in the corresponding literature.
We set the tag vector dimensionality to 100 and discarded tags used
by fewer than five user-item pairs so as to learn effective tag embeddings
with the word2vec technique. Further, we defined the context of the current tag
by considering ten surrounding tags (five tags before and five tags after the
current tag). The selection of the regularization parameters λu and λi is discussed in
Subsection 5.4.5.1.
5.4.3 Evaluation Metrics
Hit ratio (HR) and normalized discounted cumulative gain (NDCG) were selected
as evaluation metrics to judge the quality of a ranked recommendation list.
HR measures whether the test item is present in the top-N recommendation
list, and is defined as:

$$HR = \frac{\#hits}{\#users} \tag{5.16}$$

where #hits counts the number of users whose test items are successfully recalled
in the top-N recommendation list and #users is the total number of test users.
NDCG penalizes the position of a test item in the recommendation list. It
assumes that the lower the position of a test item, the less useful it is to the user.
This metric is defined as:
NDCG = DCG / IDCG,   DCG = Σ_{j=1}^{N} (2^{rel_j} − 1) / log2(j + 1)    (5.17)
Table 5.3 Dataset Variations for ML, LT and FM

Name     | Minimum constraints           | Resulting dataset
         | U/T I/T T/U I/U T/I U/I       | Users  Items  Ratings  Tags  Sparsity
ML-high  |  3   4  11  12   7   5        |  183    531     6142   1076   93.7%
ML-low   |  1   2   5   6   3   2        |  369   2177    14735   3100   98.2%
LT-high  | 45  83  53 102  19  20        |  835   1768    83216    850   94.4%
LT-low   | 20  40  25  50   9  10        | 2897  10917   356430   1700   98.9%
FM-high  |  4   7  14  11   7   3        |  349    590     5386    478   97.4%
FM-low   |  2   3   7   5   3   1        |  747   3367    12641   1127   99.5%
where rel_j is the relevance score of the test item at position j, and rel_j ∈ {0, 1} depending on whether the item appears in the top-N ranked list.
We truncated the recommendation list at 10 for each metric, which is a
commonly accepted size in the literature (He et al., 2017). However, we also
varied the length of the ranked list and tested how that affects recommendation
performance (see Subsection 5.4.5.4).
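Under this single-test-item protocol, the ideal DCG equals 1, so both metrics reduce to short per-user computations. A sketch with illustrative names:

```python
import math

def hit(ranked, test_item):
    # HR contribution of one user: 1 if the test item was recalled in top-N
    return 1.0 if test_item in ranked else 0.0

def ndcg(ranked, test_item):
    # With a single relevant item, IDCG = 1 and NDCG reduces to the
    # discounted gain at the item's position (Eq. 5.17 with rel_j in {0, 1}).
    for j, item in enumerate(ranked, start=1):
        if item == test_item:
            return 1.0 / math.log2(j + 1)
    return 0.0

# average over two toy users: (ranked list, held-out test item)
lists = [(["a", "b", "c"], "b"), (["x", "y", "z"], "q")]
hr = sum(hit(r, t) for r, t in lists) / len(lists)
nd = sum(ndcg(r, t) for r, t in lists) / len(lists)
```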
5.4.4 Baselines
The following methods were chosen as baselines to evaluate TSCDR:
ContextWalk (Bogers, 2010) is a graph-based approach. It models ternary
relationships <user-item-tag> in an undirected graph. Nodes represent users,
items and tags and the edges between the nodes represent their corresponding
relationships.
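To illustrate the idea behind ContextWalk, here is a toy walk over an undirected <user-item-tag> graph; the adjacency structure and names are invented for the example, and ContextWalk itself derives recommendation scores from visit probabilities rather than single walks:

```python
import random

def random_walk(adj, start, steps=4, seed=0):
    # Take `steps` uniform-random hops over an undirected graph whose
    # nodes are users, items and tags.
    rnd = random.Random(seed)
    node = start
    for _ in range(steps):
        node = rnd.choice(adj[node])
    return node

adj = {
    "u1": ["i1", "t1"],
    "i1": ["u1", "t1", "u2"],
    "t1": ["u1", "i1", "i2"],
    "u2": ["i1"],
    "i2": ["t1"],
}
end = random_walk(adj, "u1")
```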
BPR (Rendle et al., 2009) is a personalized ranking approach that optimizes
matrix factorization with a pairwise loss function. It is a highly competitive
method for the item recommendation task.
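The pairwise objective of BPR can be illustrated with a single stochastic gradient step (a minimal sketch with our own variable names, not the authors' implementation): given that user u interacted with item i but not with item j, the update pushes the score difference x_uij = U_u · (V_i − V_j) upward.

```python
import math

def bpr_step(U, V, u, i, j, lr=0.05, reg=0.01):
    # One SGD ascent step on ln sigmoid(x_uij) with L2 regularization.
    x_uij = sum(U[u][k] * (V[i][k] - V[j][k]) for k in range(len(U[u])))
    g = 1.0 / (1.0 + math.exp(x_uij))  # d/dx ln sigmoid(x) = sigmoid(-x)
    for k in range(len(U[u])):
        uk, ik, jk = U[u][k], V[i][k], V[j][k]
        U[u][k] += lr * (g * (ik - jk) - reg * uk)
        V[i][k] += lr * (g * uk - reg * ik)
        V[j][k] += lr * (-g * uk - reg * jk)

# toy factors: one user, two items, two latent dimensions
U = [[0.1, 0.2]]
V = [[0.1, 0.1], [0.2, 0.0]]
before = sum(U[0][k] * (V[0][k] - V[1][k]) for k in range(2))
bpr_step(U, V, 0, 0, 1)
after = sum(U[0][k] * (V[0][k] - V[1][k]) for k in range(2))
```

After the step, `after > before`: the observed item is ranked more strongly above the unobserved one, which is exactly the pairwise preference BPR optimizes.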
TagiCoFi (Zhen et al., 2009) is a tag-based single-domain recommendation
approach. It learns similarities between users based on tags, and then adds the
user similarity matrix to the PMF (Mnih and Salakhutdinov, 2008) model to
regularize matrix factorization.
TagCDCF (Shi et al., 2011) is a tag-induced cross-domain recommendation
approach. It links two disjoint domains through overlapping tags. Based on the
profiles defined with overlapping tags, it learns cross-domain user-to-user and
item-to-item similarity matrices as prior knowledge to regularize the joint matrix
factorization.
In addition to the above state-of-the-art methods, we also experimented with two
schemes proposed in this chapter:
joint topic mining (JTM), which uses joint topics to link users and items
across domains so that cross-domain similarity can be calculated based on topics
rather than tags, as described in Subsection 5.3.1; and
topic alignment (TA), which combines topic modelling with random walks
to learn implicit relationships among cross-domain users and among cross-domain
items, as described in Subsection 5.3.2.
5.4.5 Experiment Results and Analysis
Here, we illustrate the performance of the TSCDR model from different perspectives.
5.4.5.1 The Effect of Regularization Parameters
Given that the regularization parameters λu and λi control the contributions of
cross-domain similarities in regularizing the joint matrix factorization, it is important
to test their impact on recommendation performance. Due to space limitations,
we use LT-high vs ML-low as a representative example to show how
recommendation performance is affected by λu and λi. The evaluation is measured
by HR@10 and NDCG@10.
We followed the parameter tuning process in (Shi et al., 2011) by first fixing
λi = 0 and adjusting the value of λu within the range [0.001, 0.01, 0.1, 1] to study
the impact of λu, as shown in Figure 5.4a. TSCDR achieved the best result when
λu = 0.1. Fixing λu = 0.1, we then gradually varied the value of λi to test
its impact on recommendation performance. The results are shown in Figure 5.4b.
HR@10 and NDCG@10 reached peak performance when λi = 0.1, and the
peak values in Figure 5.4b are higher than those in Figure 5.4a, indicating that both
the user and item similarities play an active role in transferring knowledge
across domains to improve recommendation performance.

(a) Effect of λu on LT-high VS ML-low
(b) Effect of λi on LT-high VS ML-low
Fig. 5.4 Performance of HR@10 and NDCG@10 w.r.t. λu and λi

Fig. 5.5 Performance of HR@10 and NDCG@10 w.r.t. number of tag clusters
From these two observations, we find that TSCDR achieves good results
as λu and λi vary over a wide range, suggesting that the TSCDR model does
not easily become trapped in local optima when searching for global solutions.
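The fix-one-then-sweep-the-other procedure above can be expressed as a short search loop. Here `evaluate` is a hypothetical stand-in for training TSCDR and measuring HR@10 on validation data; its peak at 0.1 is hard-coded purely for illustration.

```python
def evaluate(lam_u, lam_i):
    # toy surrogate for validation HR@10; peaks at lam_u = lam_i = 0.1
    return -((lam_u - 0.1) ** 2 + (lam_i - 0.1) ** 2)

grid = [0.001, 0.01, 0.1, 1]
best_u = max(grid, key=lambda lam: evaluate(lam, 0.0))     # sweep lambda_u with lambda_i fixed at 0
best_i = max(grid, key=lambda lam: evaluate(best_u, lam))  # then sweep lambda_i with the best lambda_u
```

This sequential search is cheaper than a full grid search (|grid| + |grid| evaluations instead of |grid|^2), at the cost of assuming the two parameters interact weakly.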
5.4.5.2 The Effect of Tag Clusters
One of the goals of this study is to explore the role of tag clusters as features to
differentiate users and items across domains. Again, we chose LT-high vs ML-low
as a representative example. We set λu = 0.1 and λi = 0.1 as optimal parameters.
Figure 5.5 shows the effect of varying the number of tag clusters.
It is clear that recommendation performance is correlated with the clustering
of the tag embeddings generated by the word2vec technique. Setting too small
or too large a number of clusters does not help to group similar tags, which
in turn yields less discriminative weights for distinguishing users and items across
domains. As Figure 5.5 shows, as the number of clusters increased from 2 to 10,
the best result was achieved with 4 tag clusters, because users and items are
partitioned more accurately according to the tags in their profiles. Additionally,
we found that recommendation performance does not change significantly as
the number of clusters increases further from 10 to 20. This indicates
that the model may be approaching the point of overfitting the training data with
such a large number of tag clusters.
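The tag-clustering step can be sketched with a plain k-means over toy 2-D "embeddings"; this is our simplification for illustration, since the thesis clusters the word2vec tag embeddings and the exact clustering algorithm is not restated in this subsection:

```python
import random

def kmeans(points, k, iters=20, seed=0):
    # Minimal k-means: assign each embedding to its nearest centre,
    # then move each centre to the mean of its assigned embeddings.
    rnd = random.Random(seed)
    centers = rnd.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            groups[nearest].append(p)
        centers = [tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[c]
                   for c, g in enumerate(groups)]
    return groups

# two well-separated blobs of toy 2-D "tag embeddings"
tags = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
groups = kmeans(tags, 2)
```

Each resulting group plays the role of one tag cluster: the cluster memberships become the shared features used to profile users and items across domains.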
5.4.5.3 Comparison with the Baselines
Table 5.4 shows the recommendation performance of TSCDR and the other baselines
in terms of HR@10 and NDCG@10.
TSCDR achieves the best recommendation performance on most domain
pairs in terms of both metrics. It outperformed the next best performing baseline,
BPR, by a large margin (e.g., an 11.8% HR@10 and 23.1% NDCG@10
improvement on the domain pair LT-high vs ML-low). However, we also noticed
Table 5.4 Comparison of TSCDR with other baselines (HR@10 / NDCG@10)

Method       ML-high vs LT-high  ML-high vs FM-high  ML-high vs LT-low  ML-high vs FM-low  LT-high vs ML-low  LT-high vs FM-low
ContextWalk  0.196 / 0.093       0.289 / 0.133       0.217 / 0.102      0.185 / 0.079      0.142 / 0.065      0.19 / 0.083
TagiCoFi     0.008 / 0.003       0.286 / 0.154       0.01 / 0.003       0.475 / 0.287      0.004 / 0.001      0.427 / 0.268
BPR          0.534 / 0.311       0.382 / 0.209       0.527 / 0.307      0.457 / 0.268      0.297 / 0.156      0.431 / 0.257
TagCDCF      0.141 / 0.068       0.079 / 0.034       0.136 / 0.068      0.05 / 0.021       0.076 / 0.035      0.059 / 0.027
JTM          0.194 / 0.113       0.08 / 0.038        0.181 / 0.108      0.038 / 0.018      0.171 / 0.077      0.125 / 0.066
TA           0.14 / 0.076        0.07 / 0.037        0.143 / 0.068      0.044 / 0.02       0.143 / 0.069      0.07 / 0.032
TSCDR        0.478 / 0.292       0.527 / 0.327       0.48 / 0.294       0.617 / 0.438      0.332 / 0.192      0.597 / 0.417
that BPR performed better than TSCDR on the domain pairs ML-high vs
LT-high and ML-high vs LT-low, where the source and target domains share
the same format of explicit feedback. This may be due to the size of the dataset in
the source domain, which is much smaller than that of the target domain, even
though the source domain's ratings are denser. Without sufficient
data in the source domain, there is not enough knowledge to transfer to the
target domain to improve recommendation performance. However, TSCDR still
outperformed the other cross-domain and single-domain recommendation methods
in these cases, indicating the effectiveness of building more knowledge transfer
bridges by correlating tags with deep semantic information.
JTM and TA performed slightly better than TagCDCF on most domain
pairs. The resulting abstract features, in the form of topics, are much more
effective representations for personalized recommendation. However, maintaining
performance across different parameter settings remains a challenge to be further
investigated.
5.4.5.4 The Impact of Recommendation List Size
To determine whether item ranking performance is sensitive to the size of a
recommendation list, we varied the size of the top-N lists from 10 to 50 in steps
of 10. Figures 5.6 and 5.7 show the evaluation rankings of all methods in terms of
HR@N and NDCG@N.
Figures 5.6 and 5.7 show similar trends for all the methods as N varies:
performance gradually improves from the top 10 to the top 50 items in a
recommendation list. JTM outperforms TagCDCF on all domain pairs, indicating
that modelling topics, rather than tags, is more effective in bridging the domains.
To our surprise, there was no significant difference between TA and TagCDCF
in most cases. This may be because the sparse connection between topic layers
is only built at the tag level. For most domain pairs, TSCDR showed a consistent
improvement over the baselines at different positions, highlighting the superior
performance of our method in top-N recommendation tasks.

(a) ML-high VS LT-high (b) ML-high VS FM-high (c) LT-high VS ML-low
(d) ML-high VS LT-low (e) ML-high VS FM-low (f) LT-high VS FM-low
Fig. 5.6 Performance of top-N recommendation in terms of HR@N, where N ranges from 10 to 50

(a) ML-high VS LT-high (b) ML-high VS FM-high (c) LT-high VS ML-low
(d) ML-high VS LT-low (e) ML-high VS FM-low (f) LT-high VS FM-low
Fig. 5.7 Performance of top-N recommendation in terms of NDCG@N, where N ranges from 10 to 50
5.5 Summary
In this chapter, we aimed to exploit semantic information to correlate semantically
equivalent but non-identical tags, so that a close relation between
heterogeneous domains can be established by linking tags across domains
and the knowledge in the source domain can be transferred to the target domain to
address the data sparsity problem. To this end, we have proposed a new tag-based
cross-domain recommendation algorithm, namely TSCDR, which unifies word2vec
and matrix factorization in a simple, extensible framework. We devised a new
feature space spanning disjoint domains by grouping semantically equivalent
tags. TSCDR exploits the learned tag clusters to infer a more accurate cross-domain
similarity and utilizes it to regularize joint matrix factorization. Extensive
experiments have been conducted to demonstrate the promising performance of TSCDR
in the top-N recommendation task.
Chapter 6
Conclusions and future work
This chapter presents the conclusions derived from the entire thesis. In Section 6.1
the main contributions of this thesis are summarized, and in Section 6.2, some
possible research directions are provided for future work.
6.1 Conclusions
Cross-domain recommender systems are a new research topic that has attracted much
attention in recent years due to their effectiveness in alleviating the data sparsity
problem in recommender systems. In this thesis, one of the major challenges in
the development of cross-domain recommender systems has been addressed:
automatically building a bridge (domain link) between the involved domains
for transferring knowledge. In the problem setting of two heterogeneous domains,
the correspondence between cross-domain users and between cross-domain items
is not provided. In this context, user-generated tags are studied to link different
domains explicitly. However, how to exploit tags in establishing correspondence
between heterogeneous domains to improve recommendation performance
remains an open challenge that needs to be investigated extensively. The main
contributions of this thesis are summarized as follows:
(1) The development of an enhanced tag-induced cross-domain collaborative
filtering model (Chapter 3) by exploiting abundant domain-specific tags.
(Research Objective 1 is achieved)
An enhanced tag-induced cross-domain collaborative filtering model is presented
in which abundant domain-specific tags, not just the limited overlapping
tags, are utilized to increase the connections between heterogeneous domains.
To align diverse domain-specific tags, spectral clustering together with tag
co-occurrence patterns is exploited to group domain-specific tags. Based
on the tag clusters, new user and item profiles can be defined and utilized
to compute cross-domain similarity for regularizing knowledge transfer. The
experimental results demonstrate that the proposed model is capable of
establishing a strong domain connection to support knowledge transfer when
overlapping tags are scarce. Furthermore, domain-specific
tags are shown to be beneficial for adding more information about user
preferences into recommendations.
(2) The development of a complete tag-induced cross-domain recommendation
model (Chapter 4) by exploiting structural knowledge inferred with tags.
(Research Objective 2 is achieved)
A complete tag-induced cross-domain recommendation model is proposed in
which both inter- and intra-domain correlations are considered as structural
knowledge to promote knowledge transfer. In this model, not only overlapping
tags but also domain-specific tags are exploited to play complementary
roles in establishing the inter-domain correlation. Additionally, intra-domain
similarity between users and between items has also been introduced
by distinguishing the tag distribution in each individual domain, which
amounts to building a compact intra-domain correlation to support knowledge
transfer at a group level. Experiments on three public datasets against five
state-of-the-art baseline approaches demonstrate that the proposed model
performs well in both rating prediction and item recommendation tasks.
(3) The development of a tag semantically-boosted cross-domain recommendation
model (Chapter 5) by exploiting the semantic information of tags.
(Research Objective 3 is achieved)
A tag semantically-boosted cross-domain recommendation model is developed
which aims to automatically group non-identical but semantically
related tags to increase the domain overlap and, ultimately, the recommendation
quality. In this model, the word2vec technique is utilized to learn
a semantic representation of tags. Then, semantically equivalent tags are
merged into the same group according to the learned semantic
representation. The derived tag clusters spanning domains are exploited
as a joint embedding space for aligning heterogeneous domains. By mapping
users and items from both the source and target domains into the same embedding
space, similar users and items across domains can be identified and
connected. As a result, knowledge is transferred from the source domain to
the target domain via matched users and items to improve recommendation
performance. Experimental results on multiple datasets demonstrate that
our proposed model outperforms other state-of-the-art baselines in the top-N
recommendation task.
6.2 Future work
Future directions identified in this research can be summarized as follows:
• In Chapter 3, the overlap between disjoint domains was increased by
grouping domain-specific tags into tag clusters. As shown in (Enrich et al.,
2013), irrelevant tags may hinder the improvement of recommendation
performance. A preprocessing step that filters domain-specific tags by
taking tag relevance into account is not considered in our proposed model.
An improved result might be achieved if irrelevant tags were
discarded to avoid introducing noise into the clustering.
• Regarding the model proposed in Chapter 4, a simple similarity integration
strategy was designed for establishing the inter-domain correlation, in
which the contributions of overlapping tags and domain-specific tags are treated
in the same way. According to the work of (Shambour and Lu, 2012; Slokom
and Ayachi, 2017), a more delicate similarity fusion method may be more useful
for exploring tag-induced similarity.
• With respect to Chapter 5, the semantic information of tags was exploited
to improve recommendation performance. It would be interesting to extend
the TSCDR framework to integrate other auxiliary data sources, such
as reviews (Song et al., 2017) and images (McAuley et al., 2015), which
also contain plentiful semantic information that could be mined to develop a more
effective recommendation approach.
• Applying the proposed models to other recommendation applications and
testing them on more datasets are other attractive research directions. Such
experiments are a strong indicator by which to evaluate the generality and
accuracy of the proposed models.
• It is also planned to integrate the proposed tag-based cross-domain
recommendation models into Smart BizSeeker in order to develop a prototype of
an advanced recommendation engine for application.
• Finally, all the models developed in this research are based on matrix
factorization, which only deals with user-item interactions in the form of numeric
ratings. In addition to the common user and item dimensions, there
are many other valuable dimensions in real recommendation scenarios, such
as time, inquiry and price. They are useful for understanding the specific
needs of users if the potential relationships among them can be discovered.
To handle such multidimensional data, tensor-based models provide a
straightforward way of integrating context information into recommendation (Frolov
and Oseledets, 2017; Symeonidis, 2016). Considering tensor factorization in
our proposed models is a promising alternative.
Bibliography
Abel, F., Herder, E., Houben, G.-J., Henze, N., and Krause, D. (2013). Cross-
system user modeling and personalization on the social web. User Modeling
and User-Adapted Interaction, pages 1–41.
Adomavicius, G., Sankaranarayanan, R., Sen, S., and Tuzhilin, A. (2005). Incorpo-
rating contextual information in recommender systems using a multidimensional
approach. ACM Transactions on Information Systems, 23(1):103–145.
Adomavicius, G. and Tuzhilin, A. (2005). Toward the next generation of recom-
mender systems: A survey of the state-of-the-art and possible extensions. IEEE
Transactions on Knowledge and Data Engineering, 17(6):734–749.
Adomavicius, G. and Tuzhilin, A. (2011). Context-aware recommender systems.
In Recommender systems handbook, pages 217–253. Springer.
Argyriou, A., Evgeniou, T., and Pontil, M. (2007). Multi-task feature learning.
In Advances in Neural Information Processing Systems, pages 41–48.
Barragáns-Martínez, A. B., Costa-Montenegro, E., Burguillo, J. C., Rey-López,
M., Mikic-Fonte, F. A., and Peleteiro, A. (2010). A hybrid content-based and
item-based collaborative filtering approach to recommend tv programs enhanced
with singular value decomposition. Information Sciences, 180(22):4290–4311.
Beel, J., Gipp, B., Langer, S., and Breitinger, C. (2016). Paper recommender
systems: A literature survey. International Journal on Digital Libraries, 17(4):305–
338.
Behbood, V., Lu, J., and Zhang, G. (2011). Long term bank failure prediction
using fuzzy refinement-based transductive transfer learning. In 2011 IEEE
International Conference on Fuzzy Systems, pages 2676–2683.
Behbood, V., Lu, J., and Zhang, G. (2014). Fuzzy refinement domain adaptation
for long term prediction in banking ecosystem. IEEE Transactions on Industrial
Informatics, 10(2):1637–1646.
Bengio, Y., Courville, A., and Vincent, P. (2013). Representation learning: A
review and new perspectives. IEEE Transactions on Pattern Analysis and
Machine Intelligence, 35(8):1798–1828.
Bengio, Y., Courville, A. C., and Vincent, P. (2012). Unsupervised feature learning
and deep learning: A review and new perspectives. CoRR, abs/1206.5538, pages
1–30.
Bengio, Y. et al. (2009). Learning deep architectures for AI. Foundations and
Trends® in Machine Learning, 2(1):1–127.
Bogers, T. (2010). Movie recommendation using random walks over the contextual
graph. In Proceedings of the 2nd International Workshop on Context-Aware
Recommender Systems, pages 1–5.
Bouadjenek, M. R., Hacid, H., Bouzeghoub, M., and Vakali, A. (2013). Using
social annotations to enhance document representation for personalized search.
In Proceedings of the 36th international ACM SIGIR Conference on Research
and Development in Information Retrieval, pages 1049–1052.
Brin, S. and Page, L. (2012). Reprint of: The anatomy of a large-scale hypertextual
web search engine. Computer Networks, 56(18):3825–3833.
Burke, R. (2002). Hybrid recommender systems: Survey and experiments. User
Modeling and User-Adapted Interaction, 12(4):331–370.
Burke, R. (2005). Hybrid systems for personalized recommendations. Intelligent
Techniques for Web Personalization, pages 133–152.
Burke, R. (2007). Hybrid web recommender systems. In The Adaptive Web,
pages 377–408. Springer.
Cantador, I., Bellogín, A., and Vallet, D. (2010). Content-based recommenda-
tion in social tagging systems. In Proceedings of the 4th ACM conference on
Recommender systems, pages 237–240.
Cantador, I., Fernández-Tobías, I., Berkovsky, S., and Cremonesi, P. (2015).
Cross-domain recommender systems. In Recommender Systems Handbook,
pages 919–959. Springer.
Cao, B., Liu, N. N., and Yang, Q. (2010). Transfer learning for collective link
prediction in multiple heterogenous domains. In Proceedings of the 27th Inter-
national Conference on Machine Learning, pages 159–166.
Cao, D., He, X., Nie, L., Wei, X., Hu, X., Wu, S., and Chua, T.-S. (2017). Cross-
platform app recommendation by jointly modeling ratings and texts. ACM
Transactions on Information Systems, 35(4):37.
Chakraverty, S. and Saraswat, M. (2017). Review based emotion profiles for cross
domain recommendation. Multimedia Tools and Applications, pages 1–24.
Chatzis, S. (2013). Nonparametric bayesian multitask collaborative filtering.
In Proceedings of the 22nd ACM international conference on Conference on
Information and Knowledge Management, pages 2149–2158.
Cho, K., Van Merriënboer, B., Bahdanau, D., and Bengio, Y. (2014a). On the
properties of neural machine translation: Encoder-decoder approaches. arXiv
preprint arXiv:1409.1259.
Cho, K., Van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk,
H., and Bengio, Y. (2014b). Learning phrase representations using rnn encoder-
decoder for statistical machine translation. arXiv preprint arXiv:1406.1078.
Codina, V., Ricci, F., and Ceccaroni, L. (2013a). Exploiting the semantic similarity
of contextual situations for pre-filtering recommendation. In International
Conference on User Modeling, Adaptation, and Personalization, pages 165–177.
Springer.
Codina, V., Ricci, F., and Ceccaroni, L. (2013b). Semantically-enhanced pre-
filtering for context-aware recommender systems. In Proceedings of the 3rd
Workshop on Context-awareness in Retrieval and Recommendation, pages 15–18.
ACM.
Cook, D., Feuz, K. D., and Krishnan, N. C. (2013). Transfer learning for activity
recognition: A survey. Knowledge and Information Systems, 36(3):537–556.
Cremonesi, P., Tripodi, A., and Turrin, R. (2011). Cross-domain recommender
systems. In Proceedings of the 11th IEEE International Conference on Data
Mining Workshops, pages 496–503.
Dai, W., Yang, Q., Xue, G.-R., and Yu, Y. (2007). Boosting for transfer learning.
In Proceedings of the 24th International Conference on Machine Learning, pages
193–200.
Dai, W., Yang, Q., Xue, G.-R., and Yu, Y. (2008). Self-taught clustering. In
Proceedings of the 25th International Conference on Machine Learning, pages
200–207.
De Gemmis, M., Lops, P., Semeraro, G., and Basile, P. (2008). Integrating tags
in a semantic content-based recommender. In Proceedings of the 2nd ACM
conference on Recommender Systems, pages 163–170.
Deng, L., Yu, D., et al. (2014). Deep learning: methods and applications. Foun-
dations and Trends® in Signal Processing, 7(3):197–387.
Deselaers, T., Hasan, S., Bender, O., and Ney, H. (2009). A deep learning approach
to machine transliteration. In Proceedings of the 4th Workshop on Statistical
Machine Translation, pages 233–241.
Ding, C. and He, X. (2004). K-means clustering via principal component analysis.
In Proceedings of the twenty-first international conference on Machine learning,
pages 29–35.
Elkahky, A. M., Song, Y., and He, X. (2015). A multi-view deep learning approach
for cross domain user modeling in recommendation systems. In Proceedings of
the 24th International Conference on World Wide Web, pages 278–288.
Enrich, M., Braunhofer, M., and Ricci, F. (2013). Cold-start management with
cross-domain collaborative filtering and tags. In Proceedings of the International
Conference on Electronic Commerce and Web Technologies, pages 101–112.
Fan, W., Davidson, I., Zadrozny, B., and Yu, P. S. (2005). An improved cate-
gorization of classifier’s sensitivity on sample selection bias. In Data Mining,
Fifth IEEE International Conference on, pages 4–11.
Fang, Z., Gao, S., Li, B., Li, J., and Liao, J. (2015). Cross-domain recommendation
via tag matrix transfer. In Proceedings of IEEE International Conference on
Data Mining Workshop, pages 1235–1240.
Fernández-Tobías, I. (2017). Matrix factorization models for cross-domain recom-
mendation: Addressing the cold start in collaborative filtering. PhD thesis.
Fernández-Tobías, I. and Cantador, I. (2014). Exploiting social tags in matrix
factorization models for cross-domain collaborative filtering. In Proceedings of
the Workshop on New Trends in Content-based Recommender System in RecSys,
pages 34–41.
Fernández-Tobías, I., Cantador, I., Kaminskas, M., and Ricci, F. (2011). A generic
semantic-based framework for cross-domain recommendation. In Proceedings of
the 2nd International Workshop on Information Heterogeneity and Fusion in
Recommender Systems, pages 25–32.
Fernández-Tobías, I., Cantador, I., Kaminskas, M., and Ricci, F. (2012). Cross-
domain recommender systems: A survey of the state of the art. In Proceedings
of Spanish Conference on Information Retrieval, pages 24–36.
Frey, B. J. and Dueck, D. (2007). Clustering by passing messages between data
points. Science, 315(5814):972–976.
Frolov, E. and Oseledets, I. (2017). Tensor methods and recommender systems.
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 7(3):1–
41.
Gao, J., Fan, W., Jiang, J., and Han, J. (2008). Knowledge transfer via multiple
model local structure mapping. In Proceedings of the 14th ACM SIGKDD
International Conference on Knowledge Discovery and Data Mining, pages
283–291.
Gao, S., Luo, H., Chen, D., Li, S., Gallinari, P., and Guo, J. (2013). Cross-
domain recommendation via cluster-level latent factor model. In Joint European
Conference on Machine Learning and Knowledge Discovery in Databases, pages
161–176.
Gedikli, F. and Jannach, D. (2013). Improving recommendation accuracy based
on item-specific tag preferences. ACM Transactions on Intelligent Systems and
Technology, 4(1):11–19.
George, T. and Merugu, S. (2005). A scalable collaborative filtering framework
based on co-clustering. In Proceedings of the 5th IEEE international conference
on Data Mining, pages 4–12.
Ghazanfar, M. A. and Prugel-Bennett, A. (2010). A scalable, accurate hybrid
recommender system. In Proceedings of the 3d International Conference on
Knowledge Discovery and Data Mining, pages 94–98.
Glorot, X., Bordes, A., and Bengio, Y. (2011). Domain adaptation for large-scale
sentiment classification: A deep learning approach. In Proceedings of the 28th
International Conference on Machine Learning, pages 513–520.
Gong, B., Grauman, K., and Sha, F. (2014). Learning kernels for unsupervised
domain adaptation with applications to visual object recognition. International
Journal Computer Vision, 109(1-2):3–27.
Graves, A. and Jaitly, N. (2014). Towards end-to-end speech recognition with
recurrent neural networks. In Proceedings of the 31st International Conference
on Machine Learning, pages 1764–1772.
Graves, A., Mohamed, A.-r., and Hinton, G. (2013). Speech recognition with
deep recurrent neural networks. In Proceedings of the 2013 IEEE International
Conference on Acoustics, Speech and Signal Processing, pages 6645–6649.
Grčar, M., Mladenič, D., Fortuna, B., and Grobelnik, M. (2005). Data sparsity
issues in the collaborative filtering framework. In International Workshop on
Knowledge Discovery on the Web, pages 58–76.
Hao, P., Zhang, G., and Lu, J. (2016). Enhancing cross domain recommendation
with domain dependent tags. In Proceedings of the 2016 IEEE International
Conference on Fuzzy Systems, pages 1266–1273.
Hariri, N., Mobasher, B., Burke, R., and Zheng, Y. (2011). Context-aware
recommendation based on review mining. In Proceedings of the 9th Workshop
on Intelligent Techniques for Web Personalization and Recommender Systems,
page 30.
He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep residual learning for image
recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision
and Pattern Recognition, pages 770–778.
He, X., Liao, L., Zhang, H., Nie, L., Hu, X., and Chua, T.-S. (2017). Neural
collaborative filtering. In Proceedings of the 26th International Conference on
World Wide Web, pages 173–182.
Herlocker, J. L., Konstan, J. A., Borchers, A., and Riedl, J. (1999). An algorithmic
framework for performing collaborative filtering. In Proceedings of the 22nd
annual international ACM SIGIR conference on Research and development in
information retrieval, pages 230–237.
Hidasi, B. and Tikk, D. (2012). Fast als-based tensor factorization for context-
aware recommendation from implicit feedback. Machine Learning and Knowledge
Discovery in Databases, pages 67–82.
Hinton, G., Deng, L., Yu, D., Dahl, G. E., Mohamed, A.-r., Jaitly, N., Senior, A.,
Vanhoucke, V., Nguyen, P., Sainath, T. N., et al. (2012). Deep neural networks
for acoustic modeling in speech recognition: The shared views of four research
groups. IEEE Signal Processing Magazine, 29(6):82–97.
Hinton, G. E., Osindero, S., and Teh, Y.-W. (2006). A fast learning algorithm for
deep belief nets. Neural computation, 18(7):1527–1554.
Hinton, G. E. and Salakhutdinov, R. R. (2006). Reducing the dimensionality of
data with neural networks. Science, 313(5786):504–507.
Hoffman, M. D., Blei, D. M., and Bach, F. (2010). Online learning for latent
dirichlet allocation. In Proceedings of the 23rd International Conference on
Neural Information Processing Systems, pages 856–864.
Hofmann, T. (2004). Latent semantic models for collaborative filtering. ACM
Transactions on Information System, 22(1):89–115.
Hu, L., Cao, J., Xu, G., Cao, L., Gu, Z., and Zhu, C. (2013). Personalized
recommendation via cross-domain triadic factorization. In Proceedings of the
22nd International Conference on World Wide Web, pages 595–606.
Huang, J., Gretton, A., Borgwardt, K. M., Schölkopf, B., and Smola, A. J. (2007).
Correcting sample selection bias by unlabeled data. In Advances in Neural
Information Processing Systems, pages 601–608.
Huang, J., Smola, A. J., Gretton, A., Borgwardt, K. M., and Scholkopf, B. (2006).
Correcting sample selection bias by unlabeled data. In Proceedings of the 19th
International Conference on Neural Information Processing Systems, pages
601–608.
Hwangbo, H. and Kim, Y. (2017). An empirical study on the effect of data sparsity
and data overlap on cross domain collaborative filtering performance. Expert
Systems with Applications, 89:254–265.
Jabeen, F., Khusro, S., Majid, A., and Rauf, A. (2016). Semantics discovery in
social tagging systems: A review. Multimedia Tools and Applications, 75(1):573–
605.
Jannach, D. and Adomavicius, G. (2016). Recommendations with a purpose. In
Proceedings of the 10th ACM Conference on Recommender Systems, pages 7–10.
Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., Guadar-
rama, S., and Darrell, T. (2014). Caffe: Convolutional architecture for fast
feature embedding. In Proceedings of the 22nd ACM International Conference
on Multimedia, pages 675–678.
Jiang, J. (2008). A literature survey on domain adaptation of statistical classifiers.
URL: http://sifaka.cs.uiuc.edu/jiang4/domainadaptation/survey, 3:1–12.
Jiang, J. and Zhai, C. (2007). Instance weighting for domain adaptation in NLP.
In Proceedings of the Association for Computational Linguistics, pages 264–271.
Jiang, M., Cui, P., Yuan, N. J., Xie, X., and Yang, S. (2016). Little is much:
Bridging cross-platform behaviors through overlapped crowds. In Proceedings
of the 30th AAAI Conference on Artificial Intelligence, pages 13–19.
Jiang, W. and Chung, F.-l. (2012). Transfer spectral clustering. In Joint European
Conference on Machine Learning and Knowledge Discovery in Databases, pages
789–803.
Kemker, R. and Kanan, C. (2017). Self-taught feature learning for hyperspectral
image classification. IEEE Transactions on Geoscience and Remote Sensing,
55(5):2693–2705.
Khan, M. M., Ibrahim, R., and Ghani, I. (2017). Cross domain recommender
systems: A systematic literature review. ACM Computing Surveys, 50(3):36.
Kim, B. M., Li, Q., Park, C. S., Kim, S. G., and Kim, J. Y. (2006). A new
approach for combining content-based and collaborative filters. Journal of
Intelligent Information Systems, 27(1):79–91.
Kim, Y. (2014). Convolutional neural networks for sentence classification. arXiv
preprint arXiv:1408.5882.
Konstan, J. A., Miller, B. N., Maltz, D., Herlocker, J. L., Gordon, L. R., and
Riedl, J. (1997). Grouplens: applying collaborative filtering to usenet news.
Communications of the ACM, 40(3):77–87.
Koren, Y. (2008). Factorization meets the neighborhood: A multifaceted collabo-
rative filtering model. In Proceedings of the 14th ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining, pages 426–434.
Koren, Y. and Bell, R. (2015). Advances in collaborative filtering. In Recommender
Systems Handbook, pages 77–118.
Koren, Y., Bell, R., and Volinsky, C. (2009). Matrix factorization techniques for
recommender systems. Computer, 42(8):30–37.
Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2012). Imagenet classification
with deep convolutional neural networks. In Advances in Neural Information
Processing Systems, pages 1097–1105.
Kuchaiev, O. and Ginsburg, B. (2017). Training deep autoencoders for collabora-
tive filtering. arXiv preprint arXiv:1708.01715.
Kumar, A., Kumar, N., Hussain, M., Chaudhury, S., and Agarwal, S. (2014).
Semantic clustering-based cross-domain recommendation. In IEEE Symposium
on Computational Intelligence and Data Mining, pages 137–141.
Lampropoulos, A. S., Lampropoulou, P. S., and Tsihrintzis, G. A. (2012). A
cascade-hybrid music recommender system for mobile services based on musical
genre classification and personality diagnosis. Multimedia Tools and Applications,
59(1):241–258.
LeCun, Y., Bengio, Y., and Hinton, G. (2015). Deep learning. Nature,
521(7553):436–444.
LeCun, Y., Boser, B. E., Denker, J. S., Henderson, D., Howard, R. E., Hubbard,
W. E., and Jackel, L. D. (1990). Handwritten digit recognition with a back-
propagation network. In Proceedings of the Advances in Neural Information
Processing Systems, pages 396–404.
Lekakos, G. and Caravelas, P. (2008). A hybrid approach for movie recommenda-
tion. Multimedia Tools and Applications, 36(1):55–70.
Li, B. (2011). Cross-domain collaborative filtering: A brief survey. In Proceedings
of the 23rd IEEE International Conference on Tools with Artificial Intelligence,
pages 1085–1086.
Li, B., Yang, Q., and Xue, X. (2009a). Can movies and books collaborate?
cross-domain collaborative filtering for sparsity reduction. In Proceedings of
21st International Joint Conference on Artificial Intelligence, pages 2052–2057.
Li, B., Yang, Q., and Xue, X. (2009b). Transfer learning for collaborative
filtering via a rating-matrix generative model. In Proceedings of the 26th Annual
International Conference on Machine Learning, pages 617–624.
Li, B., Zhu, X., Li, R., and Zhang, C. (2015). Rating knowledge sharing in cross-
domain collaborative filtering. IEEE Transactions on Cybernetics, 45(5):1068–
1082.
Li, K. and Principe, J. C. (2017). Transfer learning in adaptive filters: The
nearest-instance-centroid-estimation kernel least-mean-square algorithm. IEEE
Transactions on Signal Processing, 65(24):6520–6535.
Li, W., Duan, L., Xu, D., and Tsang, I. W. (2014). Learning with augmented
features for supervised and semi-supervised heterogeneous domain adaptation.
IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(6):1134–
1148.
Long, M., Wang, J., Ding, G., Pan, S. J., and Philip, S. Y. (2014a). Adaptation
regularization: A general framework for transfer learning. IEEE Transactions
on Knowledge and Data Engineering, 26(5):1076–1089.
Long, M., Wang, J., Ding, G., Shen, D., and Yang, Q. (2014b). Transfer learning
with graph co-regularization. IEEE Transactions on Knowledge and Data
Engineering, 26(7):1805–1818.
Lops, P., De Gemmis, M., and Semeraro, G. (2011). Content-based recommender
systems: State of the art and trends. In Recommender Systems Handbook, pages
73–105.
Lu, J., Behbood, V., Hao, P., Zuo, H., Xue, S., and Zhang, G. (2015a). Transfer
learning using computational intelligence: a survey. Knowledge-Based Systems,
80:14–23.
Lu, J., Wu, D., Mao, M., Wang, W., and Zhang, G. (2015b). Recommender
system application developments: A survey. Decision Support Systems, 74:12 –
32.
Lu, Z., Zhong, E., Zhao, L., Xiang, E. W., Pan, W., and Yang, Q. (2013). Selective
transfer learning for cross domain recommendation. In Proceedings of the 2013
SIAM International Conference on Data Mining, pages 641–649.
Ma, H., Yang, H., Lyu, M. R., and King, I. (2008). Sorec: social recommendation
using probabilistic matrix factorization. In Proceedings of the 17th ACM
Conference on Information and Knowledge Management, pages 931–940.
Maas, A. L., Daly, R. E., Pham, P. T., Huang, D., Ng, A. Y., and Potts, C. (2011).
Learning word vectors for sentiment analysis. In Proceedings of the 49th Annual
Meeting of the Association for Computational Linguistics: Human Language
Technologies, pages 142–150.
Marlin, B. M. (2004). Modeling user rating profiles for collaborative filtering. In
Advances in Neural Information Processing Systems, pages 627–634.
McAuley, J., Targett, C., Shi, Q., and Van Den Hengel, A. (2015). Image-based
recommendations on styles and substitutes. In Proceedings of the 38th Interna-
tional ACM SIGIR Conference on Research and Development in Information
Retrieval, pages 43–52.
Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., and Dean, J. (2013). Dis-
tributed representations of words and phrases and their compositionality. In
Advances in Neural Information Processing Systems, pages 3111–3119.
Mnih, A. and Salakhutdinov, R. (2008). Probabilistic matrix factorization. In
Advances in Neural Information Processing Systems, pages 1257–1264.
Mooney, R. J. and Roy, L. (2000). Content-based book recommending using
learning for text categorization. In Proceedings of the 5th ACM Conference on
Digital Libraries, pages 195–204.
Moreno, O., Shapira, B., Rokach, L., and Shani, G. (2012). Talmud: transfer
learning for multiple domains. In Proceedings of the 21st ACM International
Conference on Information and Knowledge Management, pages 425–434.
Nguyen, H. T., Wistuba, M., Grabocka, J., Drumond, L. R., and Schmidt-Thieme,
L. (2017). Personalized deep learning for tag recommendation. In Proceedings
of the Pacific-Asia Conference on Knowledge Discovery and Data Mining, pages
186–197.
Page, L., Brin, S., Motwani, R., and Winograd, T. (1999). The pagerank citation
ranking: Bringing order to the web. Technical report, Stanford InfoLab.
Pan, S. J., Kwok, J. T., and Yang, Q. (2008). Transfer learning via dimensionality
reduction. In Proceedings of the 23rd AAAI Conference on Artificial Intelligence,
pages 677–682.
Pan, S. J., Ni, X., Sun, J.-T., Yang, Q., and Chen, Z. (2010a). Cross-domain
sentiment classification via spectral feature alignment. In Proceedings of the
19th International Conference on World Wide Web, pages 751–760.
Pan, S. J., Tsang, I. W., Kwok, J. T., and Yang, Q. (2011a). Domain adaptation
via transfer component analysis. IEEE Transactions on Neural Networks,
22(2):199–210.
Pan, S. J. and Yang, Q. (2010). A survey on transfer learning. IEEE Transactions
on Knowledge and Data Engineering, 22(10):1345–1359.
Pan, W. (2016). A survey of transfer learning for collaborative recommendation
with auxiliary data. Neurocomputing, 177:447–453.
Pan, W., Liu, N. N., Xiang, E. W., and Yang, Q. (2011b). Transfer learning to
predict missing ratings via heterogeneous user feedbacks. In Proceedings of the
23rd International Joint Conference on Artificial Intelligence, pages 2318–2323.
Pan, W. and Ming, Z. (2014). Interaction-rich transfer learning for collaborative
filtering with heterogeneous user feedback. IEEE Intelligent Systems, 29(6):48–
54.
Pan, W., Xiang, E. W., Liu, N. N., and Yang, Q. (2010b). Transfer learning in
collaborative filtering for sparsity reduction. In Proceedings of the 24th AAAI
Conference on Artificial Intelligence, volume 10, pages 230–235.
Pan, W., Xiang, E. W., and Yang, Q. (2012). Transfer learning in collaborative
filtering with uncertain ratings. In Proceedings of the 26th AAAI Conference
on Artificial Intelligence, pages 662–668.
Pan, W. and Yang, Q. (2013). Transfer learning in heterogeneous collaborative
filtering domains. Artificial Intelligence, 197(0):39–55.
Panniello, U., Tuzhilin, A., Gorgoglione, M., Palmisano, C., and Pedone, A.
(2009). Experimental comparison of pre- vs. post-filtering approaches in context-
aware recommender systems. In Proceedings of the 3rd ACM Conference on
Recommender Systems, pages 265–268.
Park, D. H., Kim, H. K., Choi, I. Y., and Kim, J. K. (2012). A literature
review and classification of recommender systems research. Expert Systems with
Applications, 39(11):10059–10072.
Pazzani, M. J. and Billsus, D. (2007). Content-based recommendation systems.
In The Adaptive Web, pages 325–341.
Raina, R., Battle, A., Lee, H., Packer, B., and Ng, A. Y. (2007). Self-taught
learning: transfer learning from unlabeled data. In Proceedings of the 24th
International Conference on Machine Learning, pages 759–766.
Ramirez-Garcia, X. and García-Valdez, M. (2014). Post-filtering for a restaurant
context-aware recommender system. In Recent Advances on Hybrid Approaches
for Designing Intelligent Systems, pages 695–707.
Rendle, S., Freudenthaler, C., Gantner, Z., and Schmidt-Thieme, L. (2009). Bpr:
Bayesian personalized ranking from implicit feedback. In Proceedings of the
25th Conference on Uncertainty in Artificial Intelligence, pages 452–461.
Rendle, S., Gantner, Z., Freudenthaler, C., and Schmidt-Thieme, L. (2011). Fast
context-aware recommendations with factorization machines. In Proceedings of
the 34th International ACM SIGIR Conference on Research and Development
in Information Retrieval, pages 635–644.
Resnick, P., Iacovou, N., Suchak, M., Bergstrom, P., and Riedl, J. (1994). Grou-
plens: An open architecture for collaborative filtering of netnews. In Proceedings
of the 1994 ACM Conference on Computer Supported Cooperative Work, pages
175–186.
Ricci, F., Rokach, L., and Shapira, B. (2011). Introduction to recommender
systems handbook. In Recommender systems handbook, pages 1–35.
Rohrbach, M., Ebert, S., and Schiele, B. (2013). Transfer learning in a transductive
setting. In Advances in Neural Information Processing Systems, pages 46–54.
Salakhutdinov, R., Mnih, A., and Hinton, G. (2007). Restricted boltzmann
machines for collaborative filtering. In Proceedings of the 24th International
Conference on Machine Learning, pages 791–798.
Sarwar, B., Karypis, G., Konstan, J., and Riedl, J. (2000). Application of
dimensionality reduction in recommender system-a case study. Technical report,
DTIC Document.
Sarwar, B., Karypis, G., Konstan, J., and Riedl, J. (2001). Item-based collaborative
filtering recommendation algorithms. In Proceedings of the 10th International
Conference on World Wide Web, pages 285–295.
Sarwar, B. M. (2001). Sparsity, Scalability, and Distribution in Recommender
Systems. PhD thesis. AAI9994525.
Schafer, J. B., Frankowski, D., Herlocker, J., and Sen, S. (2007). Collaborative
Filtering Recommender Systems, pages 291–324.
Schmidhuber, J. (2015). Deep learning in neural networks: An overview. Neural
Networks, 61:85–117.
Sedhain, S., Menon, A. K., Sanner, S., and Xie, L. (2015). Autorec: Autoencoders
meet collaborative filtering. In Proceedings of the 24th International Conference
on World Wide Web, pages 111–112.
Shambour, Q. and Lu, J. (2012). A trust-semantic fusion-based recommendation
approach for e-business applications. Decision Support Systems, 54(1):768–780.
Shao, L., Zhu, F., and Li, X. (2015). Transfer learning for visual categorization:
A survey. IEEE Transactions on Neural Networks and Learning Systems,
26(5):1019–1034.
Shapira, B., Rokach, L., and Freilikhman, S. (2013). Facebook single and cross
domain data for recommendation systems. User Modeling and User-Adapted
Interaction, pages 1–37.
Shepitsen, A., Gemmell, J., Mobasher, B., and Burke, R. (2008). Personalized
recommendation in social tagging systems using hierarchical clustering. In
Proceedings of the 2008 ACM Conference on Recommender Systems, pages 259–
266.
Shi, X., Liu, Q., Fan, W., Philip, S. Y., and Zhu, R. (2010). Transfer learning on
heterogenous feature spaces via spectral transformation. In Proceedings of the
10th International Conference on Data Mining, pages 1049–1054.
Shi, Y., Larson, M., and Hanjalic, A. (2011). Tags as bridges between domains:
Improving recommendation with tag-induced cross-domain collaborative filter-
ing. In Proceedings of the 19th International Conference on User Modeling,
Adaption and Personalization, pages 305–316.
Shi, Y., Larson, M., and Hanjalic, A. (2013a). Exploiting social tags for cross-
domain collaborative filtering. arXiv preprint arXiv:1302.4888.
Shi, Y., Larson, M., and Hanjalic, A. (2013b). Mining contextual movie similarity
with matrix factorization for context-aware recommendation. ACM Transactions
on Intelligent Systems and Technology, 4(1):1–19.
Shi, Y., Larson, M., and Hanjalic, A. (2014). Collaborative filtering beyond the
user-item matrix: A survey of the state of the art and future challenges. ACM
Computing Surveys, 47(1):1–45.
Simonyan, K. and Zisserman, A. (2014). Very deep convolutional networks for
large-scale image recognition. arXiv preprint arXiv:1409.1556.
Singh, A. P. and Gordon, G. J. (2008). Relational learning via collective matrix
factorization. In Proceedings of the 14th ACM SIGKDD International Conference
on Knowledge Discovery and Data Mining, pages 650–658.
Slokom, M. and Ayachi, R. (2017). A hybrid user and item based collaborative
filtering approach by possibilistic similarity fusion. In Advances in Combining
Intelligent Methods, pages 125–147.
Socher, R., Lin, C. C., Manning, C., and Ng, A. Y. (2011a). Parsing natural
scenes and natural language with recursive neural networks. In Proceedings of
the 28th International Conference on Machine Learning, pages 129–136.
Socher, R., Pennington, J., Huang, E. H., Ng, A. Y., and Manning, C. D. (2011b).
Semi-supervised recursive autoencoders for predicting sentiment distributions.
In Proceedings of the Conference on Empirical Methods in Natural Language
Processing, pages 151–161.
Song, T., Peng, Z., Wang, S., Fu, W., Hong, X., and Philip, S. Y. (2017). Review-
based cross-domain recommendation through joint tensor factorization. In
International Conference on Database Systems for Advanced Applications, pages
525–540.
Strub, F. and Mary, J. (2015). Collaborative filtering with stacked denoising
autoencoders and sparse inputs. In NIPS workshop on Machine Learning for
E-commerce, pages 1–9.
Su, X. and Khoshgoftaar, T. M. (2009). A survey of collaborative filtering
techniques. Advances in Artificial Intelligence, 2009:1–19.
Sun, Q., Chattopadhyay, R., Panchanathan, S., and Ye, J. (2011). A two-stage
weighting framework for multi-source domain adaptation. In Proceedings of the
Advances in Neural Information Processing Systems, pages 505–513.
Sutskever, I., Vinyals, O., and Le, Q. V. (2014). Sequence to sequence learning
with neural networks. In Proceedings of the Advances in Neural Information
Processing Systems, pages 3104–3112.
Symeonidis, P. (2016). Matrix and tensor decomposition in recommender systems.
In Proceedings of the 10th ACM Conference on Recommender Systems, pages
429–430.
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D.,
Vanhoucke, V., and Rabinovich, A. (2015). Going deeper with convolutions.
In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern
Recognition, pages 1–9.
Tan, S., Bu, J., Qin, X., Chen, C., and Cai, D. (2014). Cross domain recommen-
dation based on multi-type media fusion. Neurocomputing, 127:124–134.
Tang, J., Hu, X., Gao, H., and Liu, H. (2013). Exploiting local and global social
context for recommendation. In Proceedings of the 23rd International Joint
Conference on Artificial Intelligence, pages 2712–2718.
Tang, J., Wu, S., Sun, J., and Su, H. (2012). Cross-domain collaboration recom-
mendation. In Proceedings of the 18th ACM SIGKDD International Conference
on Knowledge Discovery and Data Mining, pages 1285–1293.
Tiroshi, A., Berkovsky, S., Kaafar, M. A., Chen, T., and Kuflik, T. (2013). Cross
social networks interests predictions based on graph features. In Proceedings of
the 7th ACM Conference on Recommender Systems, pages 319–322.
Tong, H., Faloutsos, C., and Pan, J.-Y. (2008). Random walk with restart: fast
solutions and applications. Knowledge and Information Systems, 14(3):327–346.
Tso-Sutter, K. H., Marinho, L. B., and Schmidt-Thieme, L. (2008). Tag-aware rec-
ommender systems by fusion of collaborative filtering algorithms. In Proceedings
of the 2008 ACM Symposium on Applied Computing, pages 1995–1999.
Verbert, K., Duval, E., Lindstaedt, S., and Gillet, D. (2010). Context-aware
recommender systems. Journal of Universal Computer Science, 16(16):2175–
2178.
Verbert, K., Manouselis, N., Ochoa, X., Wolpers, M., Drachsler, H., Bosnic, I.,
and Duval, E. (2012). Context-aware recommender systems for learning: a
survey and future challenges. IEEE Transactions on Learning Technologies,
5(4):318–335.
Wang, C. and Mahadevan, S. (2008). Manifold alignment using procrustes analysis.
In Proceedings of the 25th International Conference on Machine Learning, pages
1120–1127.
Wang, C. and Mahadevan, S. (2011). Heterogeneous domain adaptation using
manifold alignment. In Proceedings of the 22nd International Joint Conference
on Artificial Intelligence, pages 1541–1546.
Wang, H. and Yeung, D.-Y. (2016). Towards bayesian deep learning: A frame-
work and some existing methods. IEEE Transactions on Knowledge and Data
Engineering, 28(12):3395–3408.
Wang, J., de Vries, A. P., and Reinders, M. J. T. (2006). Unifying user-based and
item-based collaborative filtering approaches by similarity fusion. In Proceedings
of the 29th Annual International ACM SIGIR Conference on Research and
Development in Information Retrieval, pages 501–508.
Wang, W., Chen, Z., Liu, J., Qi, Q., and Zhao, Z. (2012). User-based collaborative
filtering on cross domain by tag transfer learning. In Proceedings of the 1st
International Workshop on Cross Domain Knowledge Discovery in Web and
Social Network Mining, pages 10–17.
Wang, X., Yu, L., Ren, K., Tao, G., Zhang, W., Yu, Y., and Wang, J. (2017).
Dynamic attention deep model for article recommendation by learning human
editors’ demonstration. In Proceedings of the 23rd ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining, pages 2051–2059.
Wang, Z., Song, Y., and Zhang, C. (2008). Transferred dimensionality reduction.
Machine Learning and Knowledge Discovery in Databases, pages 550–565.
Weiss, K., Khoshgoftaar, T. M., and Wang, D. (2016). A survey of transfer
learning. Journal of Big Data, 3(1):1–40.
Weston, J., Ratle, F., Mobahi, H., and Collobert, R. (2012). Deep learning via
semi-supervised embedding. In Neural Networks: Tricks of the Trade, pages
639–655.
Wu, C.-Y., Ahmed, A., Beutel, A., Smola, A. J., and Jing, H. (2017). Recur-
rent recommender networks. In Proceedings of the Tenth ACM International
Conference on Web Search and Data Mining, pages 495–503.
Wu, Y., Schuster, M., Chen, Z., Le, Q. V., Norouzi, M., Macherey, W., Krikun,
M., Cao, Y., Gao, Q., Macherey, K., et al. (2016). Google’s neural machine
translation system: Bridging the gap between human and machine translation.
arXiv preprint arXiv:1609.08144.
Xiong, W., Droppo, J., Huang, X., Seide, F., Seltzer, M., Stolcke, A., Yu, D., and
Zweig, G. (2016). Achieving human parity in conversational speech recognition.
arXiv preprint arXiv:1610.05256.
Xu, Z., Chen, C., Lukasiewicz, T., Miao, Y., and Meng, X. (2016). Tag-aware
personalized recommendation using a deep-semantic similarity model with
negative sampling. In Proceedings of the 25th ACM International on Conference
on Information and Knowledge Management, pages 1921–1924.
Yang, D., He, J., Qin, H., Xiao, Y., and Wang, W. (2015). A graph-based
recommendation across heterogeneous domains. In Proceedings of the 24th
ACM International on Conference on Information and Knowledge Management,
pages 463–472.
Yang, D., Xiao, Y., Song, Y., Zhang, J., Zhang, K., and Wang, W. (2014). Tag
propagation based recommendation across diverse social media. In Proceedings
of the 23rd International Conference on World Wide Web, pages 407–408.
Yang, X., Guo, Y., and Liu, Y. (2013). Bayesian-inference-based recommendation
in online social networks. IEEE Transactions on Parallel and Distributed
Systems, 24(4):642–651.
Yao, Y. and Doretto, G. (2010). Boosting for transfer learning with multiple
sources. In Proceedings of 2010 IEEE Conference on Computer Vision and
Pattern Recognition, pages 1855–1862.
Yoo, J. and Choi, S. (2009). Weighted nonnegative matrix co-tri-factorization for
collaborative prediction. Advances in Machine Learning, pages 396–411.
Zhang, S., Yao, L., and Sun, A. (2017a). Deep learning based recommender
system: A survey and new perspectives. arXiv preprint arXiv:1707.07435.
Zhang, S., Yao, L., and Xu, X. (2017b). Autosvd++: An efficient hybrid
collaborative filtering model via contractive auto-encoders. arXiv preprint
arXiv:1704.00551.
Zhang, T. and Iyengar, V. S. (2002). Recommender systems using linear classifiers.
Journal of Machine Learning Research, 2:313–334.
Zhang, Y., Cao, B., and Yeung, D.-Y. (2012). Multi-domain collaborative filtering.
arXiv preprint arXiv:1203.3535.
Zhang, Y. and Koren, J. (2007). Efficient bayesian hierarchical user modeling for
recommendation system. In Proceedings of the 30th Annual International ACM
SIGIR Conference on Research and Development in Information Retrieval, pages
47–54.
Zhang, Z., Lin, H., Liu, K., Wu, D., Zhang, G., and Lu, J. (2013). A hybrid
fuzzy-based personalized recommender system for telecom products/services.
Information Sciences, 235:117–129.
Zhao, L., Pan, S. J., Xiang, E. W., Zhong, E., Lu, Z., and Yang, Q. (2013). Active
transfer learning for cross-system recommendation. In Proceedings of the 27th
AAAI Conference on Artificial Intelligence, pages 1205–1211.
Zhao, L., Pan, S. J., and Yang, Q. (2017). A unified framework of active transfer
learning for cross-system recommendation. Artificial Intelligence, 245:38–55.
Zhao, S., Du, N., Nauerz, A., Zhang, X., Yuan, Q., and Fu, R. (2008). Improved
recommendation based on collaborative tagging behaviors. In Proceedings of
the 13th International Conference on Intelligent User Interfaces, pages 413–416.
Zhen, Y., Li, W.-J., and Yeung, D.-Y. (2009). Tagicofi: tag informed collaborative
filtering. In Proceedings of the 3rd ACM Conference on Recommender Systems,
pages 69–76.
Zheng, Y., Burke, R., and Mobasher, B. (2013). Recommendation with differential
context weighting. In Proceedings of the 21st International Conference on User
Modeling, Adaptation, and Personalization, pages 152–164.
Zheng, Y., Burke, R., and Mobasher, B. (2014a). Splitting approaches for context-
aware recommendation: An empirical study. In Proceedings of the 29th Annual
ACM Symposium on Applied Computing, pages 274–279.
Zheng, Y., Liu, C., Tang, B., and Zhou, H. (2016a). Neural autoregressive
collaborative filtering for implicit feedback. In Proceedings of the 1st Workshop
on Deep Learning for Recommender Systems, pages 2–6.
Zheng, Y., Mobasher, B., and Burke, R. (2014b). Cslim: Contextual slim recom-
mendation algorithms. In Proceedings of the 8th ACM Conference on Recom-
mender Systems, pages 301–304.
Zheng, Y., Tang, B., Ding, W., and Zhou, H. (2016b). A neural autoregressive
approach to collaborative filtering. In Proceedings of the 33rd International
Conference on Machine Learning, pages 764–773.
Zhong, E., Fan, W., Peng, J., Zhang, K., Ren, J., Turaga, D., and Verscheure, O.
(2009). Cross domain distribution adaptation via kernel mapping. In Proceedings
of the 15th ACM SIGKDD International Conference on Knowledge Discovery
and Data Mining, pages 1027–1036.
Zuo, Y., Zeng, J., Gong, M., and Jiao, L. (2016). Tag-aware recommender systems
based on deep neural networks. Neurocomputing, 204:51–60.
Abbreviations
AE AutoEncoder
BPR Bayesian Personalized Ranking
CBT CodeBook Transfer
CDCF Cross-domain Collaborative Filtering
CDRS Cross-domain Recommender System
CDTF Cross Domain Triadic Factorization
CF Collaborative Filtering
CMF Collective Matrix Factorization
CNN Convolutional Neural Network
CST Coordinate System Transfer
CTagCDR Complete Tag-induced Cross Domain Recommendation
DL Deep Learning
DSSM Deep Semantic Similarity Model
ETagCDCF Enhanced Tag-induced Cross Domain Collaborative Filtering
GFK Geodesic Flow Kernel
GTagCDCF General Tag-induced Cross Domain Collaborative Filtering
ICF Item-based Collaborative Filtering
JTM Joint Topic Mining
KMM Kernel Mean Matching
MAE Mean Absolute Error
MF Matrix Factorization
MMDE Maximum Mean Discrepancy Embedding
NADE Neural Autoregressive Distribution Estimation
NLP Natural Language Processing
NMF Nonnegative Matrix Factorization
PMF Probabilistic Matrix Factorization
RBM Restricted Boltzmann Machine
RMGM Rating Matrix Generative Model
RMSE Root Mean Square Error
RNN Recurrent Neural Network
SGNS Skip-Gram with Negative Sampling
STC Self-taught Clustering
SVD Singular Value Decomposition
TA Topic Alignment
TagCDCF Tag-induced Cross Domain Collaborative Filtering
TagiCoFi Tag informed Collaborative Filtering
TCA Transfer Component Analysis
TCF Transfer by Collective Factorization
TL Transfer Learning
TSC Transfer Spectral Clustering
TSCDR Tag Semantic-boosted Cross Domain Recommendation
UCF User-based Collaborative Filtering
WNMCTF Weighted Nonnegative Matrix Co-Tri-Factorization