Cross-domain Recommender
System Through Tag-based
Models
Peng Hao
A thesis submitted for the Degree of
Doctor of Philosophy
Faculty of Engineering and Information Technology
University of Technology Sydney
March 2018
CERTIFICATE OF
AUTHORSHIP/ORIGINALITY
I certify that the work in this thesis has not previously been submitted for a
degree nor has it been submitted as part of the requirements for a degree except as
fully acknowledged within the text.
I also certify that the thesis has been written by me. Any help that I have received
in my research work and the preparation of the thesis itself has been acknowledged.
In addition, I certify that all information sources and literature used are indicated
in the thesis.
Signature of candidate:
Date:
05/04/2018
Production Note:
Signature removed prior to publication.
Acknowledgements
This thesis is the result of four years' hard work, during which I received a great
deal of help from many people. Here I would like to express my gratitude to all of
them.
First, I would like to thank my principal supervisor, A/Prof. Guangquan
Zhang, for offering me an opportunity to conduct my research in the Decision
Systems and e-Service Intelligence (DeSI) Lab, University of Technology Sydney
(UTS), Australia. I can still remember the excitement I had when I got the official
offer from UTS. At the beginning of the research, he encouraged me to choose
the topic that I am interested in. He also taught me how to approach a research
problem in general, and was always enthusiastic to help solve the difficulties in
my life. I would also like to express my thanks to my co-supervisor, Distinguished
Professor Jie Lu. I have learned a lot from her over these years, not only the
methodology of doing research but also the skills of writing a scientific paper.
Her comments and suggestions have strengthened this thesis significantly. Her
continuous hard work and generous personality have influenced me deeply, and
will be a great treasure in my future research and work. I feel very lucky to
have both of them as my supervisors. Without their excellent supervision and
continuous encouragement, this research could not be finished on time.
Special thanks also go to Prof. Luis Martinez for welcoming me and providing
all the necessary help during my visit to the University of Jaen. I spent a
wonderful time collaborating with him. The beautiful scenery and quiet life
there make me want to visit again in the future.
I would also like to express my appreciation to all the members of the DeSI Lab in the
Centre of Artificial Intelligence (CAI), for their active participation and valuable
comments in every presentation I made during my study.
I also wish to acknowledge the financial support I received from the China
Scholarship Council (CSC) and UTS, which enabled me to complete my study.
Finally, I would like to thank my family members, including my wife, father,
mother and mother-in-law. Without their great love, constant encouragement
and infinite compassion, my dream of pursuing a Ph.D. degree would not have been achieved.
Very special thanks to them all!
Abstract
Nowadays, data pertaining to clients are generated at such a rapid rate that
processing them is completely beyond human ability, a problem known as
information explosion. How to quickly and automatically provide personalized
choices for someone from a large collection of resources has become a key
factor in determining the success of many commercial activities. In this context,
recommender systems have been developed as a type of software that aims to
predict and suggest items which are relevant to a specific user by analyzing the
user’s previous interaction data with certain items. Recommender systems have
broad applications in our daily life, such as product recommendation on Amazon,
video and movie recommendation on YouTube, and music recommendation on Spotify.
A fundamental brick in building most recommender systems is the collaborative
filtering-based model, which has been widely adopted due to its outstanding
performance and flexible deployment. However, this model and its
variations suffer from the so-called data sparsity problem, which results when
users only rate a limited number of items. With the development of the transfer
learning technique in recent years, cross-domain recommendation has emerged as
an effective way to address data sparsity in recommender systems. The principle
of cross-domain recommendation is to exploit knowledge from auxiliary source
domains to assist recommendation making in a sparse target domain.
In the development of cross-domain recommender systems, the most important
step is to build a bridge between the domains in order to transfer knowledge. This
task becomes more challenging in disjoint domains where users and items in both
domains are completely non-overlapping. In this respect, tags are studied and
utilized to establish explicit correspondence between domains. However, how to
effectively exploit tags to increase domain overlap and ultimately recommendation
quality remains an open challenge which needs to be addressed.
This thesis aims to develop novel tag-based cross-domain recommendation
models in disjoint domains. First, it reviews the existing state-of-the-art techniques
related to this research. It then provides three solutions by exploiting domain-
specific tags, tag-inferred structural knowledge and tag semantics, respectively.
To evaluate the proposed models, this thesis conducts a series of experiments
on public datasets and compares them with state-of-the-art baseline approaches.
The experimental results show the superior performance achieved by our models
in different recommendation tasks under sparse settings. The findings of this
research not only contribute to the state-of-the-art on cross-domain recommender
systems, but also provide practical guidance for handling unstructured tag data
in recommendation tasks.
Table of contents
CERTIFICATE OF AUTHORSHIP/ORIGINALITY i
Acknowledgements ii
Abstract iv
List of figures xi
List of tables xiii
1 Introduction 1
1.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Research questions and objectives . . . . . . . . . . . . . . . . . . 5
1.3 Research contributions . . . . . . . . . . . . . . . . . . . . . . . . 9
1.4 Thesis structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.5 Publications related to this thesis . . . . . . . . . . . . . . . . . . 15
2 Research literature 17
2.1 Recommender systems . . . . . . . . . . . . . . . . . . . . . . . . 18
2.1.1 Recommendation problem . . . . . . . . . . . . . . . . . . 18
2.1.2 Classification of recommender systems . . . . . . . . . . . 20
2.1.2.1 Collaborative filtering . . . . . . . . . . . . . . . 22
2.1.2.2 Content-based recommender systems . . . . . . . 25
2.1.2.3 Hybrid recommender systems . . . . . . . . . . . 27
2.1.2.4 Context-aware recommender systems . . . . . . . 27
2.1.2.5 Deep learning based recommender systems . . . . 28
2.2 Transfer learning . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
2.2.1 Definition of transfer learning . . . . . . . . . . . . . . . . 31
2.2.2 Classification of transfer learning techniques . . . . . . . . 32
2.2.2.1 Inductive transfer learning . . . . . . . . . . . . . 32
2.2.2.2 Transductive transfer learning . . . . . . . . . . . 34
2.2.2.3 Unsupervised transfer learning . . . . . . . . . . 36
2.3 Cross-domain recommender system . . . . . . . . . . . . . . . . . 37
2.3.1 Definition of cross-domain recommender system . . . . . . 38
2.3.2 Classification of cross-domain recommendation approaches 39
2.3.2.1 Cross-domain recommendation for partially/fully
overlapping domains . . . . . . . . . . . . . . . . 40
2.3.2.2 Cross-domain recommendation for non-overlapping
domains . . . . . . . . . . . . . . . . . . . . . . . 41
3 Exploiting Domain Specific Tags for Cross-domain Recommen-
dation 45
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
3.2 Preliminary knowledge . . . . . . . . . . . . . . . . . . . . . . . . 47
3.3 Enhanced tag-induced cross domain collaborative filtering . . . . 49
3.3.1 The alignment of domain-specific tags . . . . . . . . . . . . 49
3.3.2 Cross-domain similarities refinement . . . . . . . . . . . . 52
3.3.3 Model and inference . . . . . . . . . . . . . . . . . . . . . 53
3.4 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
3.4.1 Description of dataset and experimental settings . . . . . . 58
3.4.2 Impact of parameters . . . . . . . . . . . . . . . . . . . . . 60
3.4.3 Performance comparison . . . . . . . . . . . . . . . . . . . 60
3.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
4 Exploiting Tag-induced Structural Information for Cross-domain
Recommendation 68
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
4.2 Notations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
4.3 Complete Tag Induced Cross-domain Recommendation . . . . . . 71
4.3.1 Step 1: Building basic inter-domain correlations using shared
tags . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
4.3.2 Step 2: Enhancing inter-domain correlations using domain-
specific tag clusters . . . . . . . . . . . . . . . . . . . . . . 76
4.3.3 Step 3: Inferring intra-domain correlations from tags in
individual domains . . . . . . . . . . . . . . . . . . . . . . 81
4.3.4 Step 4: Aggregation and Integration of Inter- and intra-
domain knowledge . . . . . . . . . . . . . . . . . . . . . . 84
4.4 Complexity analysis . . . . . . . . . . . . . . . . . . . . . . . . . . 87
4.5 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
4.5.1 Datasets . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
4.5.2 Experiment Setup . . . . . . . . . . . . . . . . . . . . . . . 90
4.5.2.1 Evaluation Methodology . . . . . . . . . . . . . . 90
4.5.2.2 Evaluation Metric . . . . . . . . . . . . . . . . . 92
4.5.2.3 Experimental Protocol . . . . . . . . . . . . . . . 93
4.6 Parameter Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . 94
4.7 Impact of latent factors . . . . . . . . . . . . . . . . . . . . . . . . 95
4.8 Sensitivity analysis on Top-k Recommendation . . . . . . . . . . . 98
4.9 Performance Comparison . . . . . . . . . . . . . . . . . . . . . . . 103
4.10 Performance under Different Sparsity Level . . . . . . . . . . . . . 106
4.11 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
5 Exploiting Tag Semantic for Cross-domain Recommendation 110
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
5.2 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
5.2.1 Notations and Problem Formulation . . . . . . . . . . . . 112
5.2.2 Tag-induced Cross Domain Collaborative Filtering Model . 114
5.3 Tag Semantically-boosted Cross-domain Recommendation . . . . 115
5.3.1 Joint Topic Mining . . . . . . . . . . . . . . . . . . . . . . 116
5.3.2 Topic Alignment . . . . . . . . . . . . . . . . . . . . . . . 121
5.3.3 Embedding space Learning . . . . . . . . . . . . . . . . . . 123
5.3.4 Optimization . . . . . . . . . . . . . . . . . . . . . . . . . 126
5.4 Experiments and analysis . . . . . . . . . . . . . . . . . . . . . . 127
5.4.1 Dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
5.4.2 Experiment Setup . . . . . . . . . . . . . . . . . . . . . . . 128
5.4.3 Evaluation Metrics . . . . . . . . . . . . . . . . . . . . . . 130
5.4.4 Baselines . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
5.4.5 Experiment Results and Analysis . . . . . . . . . . . . . . 133
5.4.5.1 The Effect of Regularization Parameters . . . . . 133
5.4.5.2 The Effect of Tag Clusters . . . . . . . . . . . . . 136
5.4.5.3 Comparison with the Baselines . . . . . . . . . . 136
5.4.5.4 The Impact of Recommendation List Size . . . . 138
5.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
6 Conclusions and future work 142
6.1 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
6.2 Future work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
Bibliography 147
Abbreviations 188
List of figures
1.1 Thesis structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.1 A graphical illustration of the recommendation problem . . . . . . 19
3.1 A scenario for tag-based cross-domain recommendation. In this fig-
ure, we aim to exploit knowledge from a movie domain to bootstrap
book recommendation. The unobserved rating score is denoted by
? and each tag text starts with #. . . . . . . . . . . . . . . . . . . 48
3.2 MAE and RMSE variations via changing α and β . . . . . . . . . 61
4.1 Workflow and components of CTagCDR model. . . . . . . . . . . 74
4.2 Example of tag tripartite graph constructed based on user-tag
relationship. Red squares denote shared tags in both domains,
while the green triangles and blue circles denote the filtered domain-
specific tags from both source and target domains, respectively.
The edge weight reflects the similarity between the connected tags. 80
4.3 Impact of λu and λv on the recommendation performance of CTagCDR 96
4.4 Impact of λα on the recommendation performance of CTagCDR . 97
4.5 Performance of RMSE and NDCG@10 on LT vs ML and ML vs
LT w.r.t. the number of latent factors . . . . . . . . . . . . . . . . 99
4.5 Performance of RMSE and NDCG@10 on FM vs LT and FM vs
ML w.r.t. the number of latent factors . . . . . . . . . . . . . . . 100
4.5 Performance of RMSE and NDCG@10 on LT vs FM and ML vs
FM w.r.t. the number of latent factors . . . . . . . . . . . . . . . 101
4.6 Performance of NDCG@k w.r.t. the ranking position k of ranking list 102
4.7 Change of recommendation performance on ML vs LT during the
increment of train data size . . . . . . . . . . . . . . . . . . . . . 107
5.1 An example of ambiguous, redundant and non-identical but seman-
tically equivalent tags . . . . . . . . . . . . . . . . . . . . . . . . . 115
5.2 Graphical illustration of joint topic mining and topic alignment . 117
5.3 Modelling tagging data for word2vec. The tag marked by red color
denotes an overlapping tag in both domains. . . . . . . . . . . . . 124
5.4 Performance of HR@10 and NDCG@10 w.r.t. λu and λi . . . . . . 134
5.5 Performance of HR@10 and NDCG@10 w.r.t. number of tag clusters 135
5.6 Performance of top-N recommendation in terms of HR@N where
N ranges from 10 to 50 . . . . . . . . . . . . . . . . . . . . . . . . 139
5.7 Performance of top-N recommendation in terms of NDCG@N where
N ranges from 10 to 50 . . . . . . . . . . . . . . . . . . . . . . . . 140
List of tables
3.1 Statistics of the datasets used in Chapter 3 . . . . . . . . . . . . . 59
3.2 MAE comparison with other baselines (mean ± std) . . . . . . . . 64
3.3 RMSE comparison with other baselines (mean ± std) . . . . . . . 64
4.1 Notations and corresponding descriptions used in Chapter 4 . . . 72
4.2 Statistics of datasets used in Chapter 4 . . . . . . . . . . . . . . . 90
4.3 Overall performance on six domain pairs . . . . . . . . . . . . . . 104
5.1 Symbols and corresponding descriptions used in Chapter 5 . . . . 113
5.2 The tag filtering quality constraints (Gedikli and Jannach, 2013) . 129
5.3 Dataset Variations for ML, LT and FM . . . . . . . . . . . . . . 131
5.4 Comparison of TSCDR with other baselines . . . . . . . . . . . . 137
Chapter 1
Introduction
This chapter presents the introduction of the thesis. Section 1.1 describes the
background for conducting this research. Section 1.2 presents the research ques-
tions and the corresponding objectives we aim to achieve. Section 1.3 highlights
the contributions of this research. Section 1.4 introduces the structure of the
thesis and Section 1.5 lists the publications related to this thesis.
1.1 Background
On 6 August 1991, 26 years ago, the World Wide Web (WWW) became publicly
available. After several decades of development, the Internet has now become
the main way by which people around the world share information globally.
However, everything has two sides. The Internet itself offers great convenience
in terms of accessing various sources of information, but the increasing amount
of complex and heterogeneous data generated every second has become a serious
burden on human processing abilities. To address the problem of information
explosion, search engines (Brin and Page, 2012; Page et al., 1999) and recommender
systems (Adomavicius and Tuzhilin, 2005; Beel et al., 2016; Lu et al., 2015b; Ricci
et al., 2011) have been developed as two alternative solutions. As opposed to a
search engine which searches and identifies items in a database according to the
keywords or characters specified by a user, a recommender system is a software
tool that generates recommendations to users based on their past preferences.
The goal of recommender systems is to provide the right information on
products/services to the right customers at the right time. This can be achieved
by automatically filtering out unrelated products and suggesting only relevant ones.
There are many successful applications of recommender systems. Representative
examples include but are not limited to: product recommendation in E-commerce
websites (e.g., Amazon, eBay), movie and video recommendation (e.g., YouTube),
news recommendation (e.g., Yahoo), music recommendation (e.g., Spotify), social
recommendation (e.g., Facebook), App recommendation (e.g., Apple).
In general, existing recommendation techniques are mainly classified into three
categories, collaborative filtering (Koren and Bell, 2015; Sarwar et al., 2001; Su
and Khoshgoftaar, 2009), content-based (Lops et al., 2011; Mooney and Roy, 2000;
Pazzani and Billsus, 2007) and hybrid (Burke, 2002, 2007; Ghazanfar and Prugel-
Bennett, 2010) recommendation. Collaborative filtering (CF) is the most successful
and widely used technique for building a recommender system. It helps people
make choices based on the opinions of other people who share similar interests. The
content-based (CB) recommendation technique recommends items that are similar
to the ones preferred by a specific user in the past. Each recommendation technique
has its own merits and drawbacks. Therefore, hybrid recommendation is proposed
to gain higher performance and to avoid the drawbacks of pure recommendation
techniques. The most common practice in hybrid recommendation is to combine
CF with other recommendation techniques in an attempt to solve cold-start,
sparseness and/or scalability problems (Kim et al., 2006).
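The collaborative filtering principle described above can be illustrated with a minimal user-based sketch: a missing rating is predicted as a similarity-weighted average of the ratings given by the most similar users. The rating matrix and all values below are invented toy data, not drawn from any dataset used in this thesis.

```python
import numpy as np

# Toy user-item rating matrix (0 = unrated); rows are users, columns are items.
R = np.array([
    [5.0, 3.0, 0.0, 1.0],
    [4.0, 0.0, 0.0, 1.0],
    [1.0, 1.0, 0.0, 5.0],
    [1.0, 0.0, 4.0, 4.0],
])

def cosine_sim(a, b):
    """Cosine similarity computed over co-rated items only."""
    mask = (a > 0) & (b > 0)
    if not mask.any():
        return 0.0
    a, b = a[mask], b[mask]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def predict(R, user, item, k=2):
    """Predict a missing rating as the similarity-weighted average of the
    ratings given to `item` by the k most similar users who rated it."""
    sims = np.array([cosine_sim(R[user], R[v])
                     if v != user and R[v, item] > 0 else 0.0
                     for v in range(R.shape[0])])
    top = np.argsort(sims)[::-1][:k]
    if sims[top].sum() == 0:
        return 0.0
    return float(sims[top] @ R[top, item] / sims[top].sum())

print(round(predict(R, user=1, item=1), 2))  # → 2.4
```

Data sparsity shows up directly in this sketch: when too few neighbours have rated the target item, the similarity weights collapse and no reliable prediction can be made, which is the failure mode cross-domain recommendation aims to alleviate.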
Although great progress has been achieved in the area of recommender systems,
most systems are restricted to offering recommendations of items belonging to a
single domain. There is a strong demand for joint recommendation in our daily life. For
example, instead of suggesting only similar movies to a user browsing a movie
on Netflix, other types of items provided by different websites, such as music, books,
and video games related to the movie, should be recommended to the
user as well. It would be beneficial to exploit user preferences in different domains
in order to build a general model to better capture user interests. Furthermore, it
is well known that the data sparsity problem, which is caused by the fact that users
generally rate a limited number of items, has posed a key challenge for CF-based
recommender systems (Hwangbo and Kim, 2017; Sarwar, 2001). Analyses show
that there could be dependencies and correlations between user preferences in
different domains, so that user knowledge acquired in one domain can be transferred and
exploited in several other related domains. The data sparsity problem together
with the desire to offer joint recommendations give us a strong motivation to
develop novel recommendation techniques.
In this context, the cross-domain recommender system has received much atten-
tion from both research and industry communities in the last few years (Cremonesi
et al., 2011; Fernández-Tobías, 2017; Fernández-Tobías et al., 2012; Khan et al.,
2017; Li, 2011). It can be considered as a practical application of transfer learning
techniques (Lu et al., 2015a; Pan and Yang, 2010; Weiss et al., 2016) in the field
of recommender systems, which aims to exploit existing knowledge from auxiliary
source domains to facilitate recommendation making in a sparse target domain.
As shown in (Fernández-Tobías, 2017), a major challenge in the development of
cross-domain recommender systems is how to identify the correspondence between
the involved domains in order to build a bridge to support knowledge transfer. If
the bridge is not built correctly, two serious consequences follow: insufficient
transfer of positive knowledge and, at the same time, negative knowledge
transfer (Pan and Yang, 2010), both of which lead to a decline in recommendation
performance.
One intuitive solution for addressing the above challenge has been explored by
assuming users or items are fully or partially shared in both source and target
domains (Jiang et al., 2016; Pan et al., 2011a; Shi et al., 2013b; Singh and Gordon,
2008), so that the knowledge can be transferred through those overlapping users or
items. However, due to the privacy settings in different companies or platforms, it is
not common to have overlapping users and items between heterogeneous domains.
With respect to the situation of disjoint domains, where the correspondence
between users and between items is not known in advance, most existing research
only exploits user preferences in the form of numerical ratings to learn an implicit
rating pattern to share between domains (Gao et al., 2013; Li et al., 2009a,b,
2015). The rating pattern represents the average ratings that a user group would
give to an item group. In addition to linking heterogeneous domains by implicit
rating patterns, rich side information concerning users and items is exploited
to build an explicit domain link (Pan, 2016; Tan et al., 2014), such as through
tags (Fang et al., 2015; Fernández-Tobías and Cantador, 2014; Shi et al., 2011),
and reviews (Cao et al., 2017; Chakraverty and Saraswat, 2017; Song et al., 2017).
Explicit domain links built by tags are more effective than implicit rating
patterns in bridging heterogeneous domains (Shi et al., 2011), but the only efforts
devoted to this direction use overlapping tags to align different feature
spaces (Fang et al., 2015; Fernández-Tobías and Cantador, 2014; Shi et al., 2011).
Additional information in tagging data needs to be explored and integrated into
cross-domain recommendation, such as: 1) the distinct information encoded in
domain specific tags; 2) the knowledge structure inferred by tags and 3) tag
semantics. Therefore, this thesis fills the gap by developing a set of novel tag-
based cross-domain recommendation models, which exploit the aforementioned
information to improve recommendation performance.
1.2 Research questions and objectives
The purpose of this research is to use user-generated tags to automatically establish
the correspondence between heterogeneous domains, so that knowledge from
the source domain can be transferred to the target domain with determined
correspondence. To this end, this research will answer the following questions:
Research Question 1: How to integrate domain specific tags into the framework
of cross-domain recommendation?
To eliminate domain heterogeneity, overlapping tags are generally exploited as
common features to align different domains. However, there are two drawbacks in
the above approach. First, only a limited number of overlapping tags are shared
between heterogeneous domains (Jiang et al., 2016). In this context, a weak
domain correlation will be established because most of the users and items in
both domains are not covered by the overlapping tags. As a result, suboptimal
results will be achieved due to inadequate knowledge transfer between domains.
Second, it is a waste to discard abundant domain-specific tags, which correspond
to the unshared parts of tags in the individual domains. Moreover, domain-specific
tags are capable of capturing the unique features of individual domains compared
to the generality of overlapping tags. If abundant domain-specific tags can be
integrated into cross-domain recommendation, more knowledge transfer bridges
will be established to promote knowledge transfer.
Research Question 2: How to integrate the structural knowledge inferred by
tags into the framework of cross-domain recommendation?
The basic assumption in transfer learning is that the involved domains may
share a certain knowledge structure, which can be extracted by preserving the
important properties of the original data (Long et al., 2014b). In this respect, intra-
and inter-domain knowledge structures, which reflect the relationships within a
domain and between domains respectively, need to be extracted as constraints to
regularize knowledge transfer. In terms of inter-domain knowledge structure, the
similarity between cross-domain users and between cross-domain items by profiling
with tags is a valuable source for discovering the direct correspondence between the
involved domains. Furthermore, the intra-domain similarity to alleviate negative
transfer by preserving the geometric structure in each domain should also be
refined with tags. However, there is little research in the literature that exploits
tags to infer both the intra- and inter-domain knowledge structure to improve
recommendation performance.
Research Question 3: How to integrate the semantic information of tags into
the framework of cross-domain recommendation?
Tags are keywords or short phrases that are assigned to items by users. Cap-
tured in these tags is a great deal of information that is highly relevant to the
items. However, the uncontrolled vocabulary used by users has resulted in sparse,
redundant and ambiguous tag information (Gedikli and Jannach, 2013; Yang et al.,
2014). Handling unstructured tagging data to address the uncontrolled vocabulary
problem is still an open challenge when exploiting tags to establish an explicit
domain linkage. In this context, semantics from tags can be harvested to correlate
those non-identical but semantically related tags to remove irrelevant information
and noise in tags (Jabeen et al., 2016).
According to the above research questions, this thesis proposes to achieve the
following research objectives:
Research Objective 1: To develop a new tag-based cross-domain recommen-
dation model by exploiting domain specific tags.
This objective corresponds to Research Question 1. Existing studies focus
on utilizing overlapping tags as common features to bridge different domains (Fang
et al., 2015; Fernández-Tobías and Cantador, 2014; Shi et al., 2011); however,
domain-specific tags are discarded due to unaligned feature spaces. A possible
solution to this problem is to adapt the spectral feature alignment algorithm
in (Pan et al., 2010a) to align domain-specific tags from different domains into
unified clusters, with the help of overlapping tags. In this way, the clusters
of domain-specific tags can be used to reduce the gap between heterogeneous
domains, which can be further applied as new features to profile users and items
for the refinement of cross-domain similarity. Therefore, this study will develop a
new cross-domain recommendation model by exploiting domain specific tags to
increase links between heterogeneous domains.
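A minimal sketch of this alignment idea follows. Domain-specific tags from two hypothetical domains are connected through the overlapping tags they co-occur with, and a spectral bipartition of the induced graph groups same-concept tags across domains. All tags, counts and the two-cluster setup are invented for illustration; the actual model adapts the spectral feature alignment algorithm of Pan et al. (2010a), which this toy example only approximates.

```python
import numpy as np

# Hypothetical tags: two domain-specific tags per domain, plus two tags
# shared by both domains. M[i, j] counts how often domain-specific tag i
# co-occurs with overlapping tag j on the same items.
specific = ["thriller-book", "plot-twist-film",   # same concept, different domains
            "cookbook", "food-documentary"]        # same concept, different domains
overlap = ["suspense", "cooking"]
M = np.array([
    [8, 0],   # thriller-book    mostly co-occurs with "suspense"
    [7, 1],   # plot-twist-film  mostly co-occurs with "suspense"
    [0, 9],   # cookbook         mostly co-occurs with "cooking"
    [1, 6],   # food-documentary mostly co-occurs with "cooking"
], dtype=float)

# Affinity between domain-specific tags, induced through the overlapping
# tags they share: A = M M^T (one-mode projection of the bipartite graph).
A = M @ M.T
np.fill_diagonal(A, 0.0)

# Normalized graph Laplacian L = I - D^{-1/2} A D^{-1/2}.
d = A.sum(axis=1)
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
L = np.eye(len(A)) - D_inv_sqrt @ A @ D_inv_sqrt

# The second-smallest eigenvector (Fiedler vector) bipartitions the graph;
# its sign assigns tags from different domains to unified clusters.
eigvals, eigvecs = np.linalg.eigh(L)
labels = (eigvecs[:, 1] > 0).astype(int)
for tag, lab in zip(specific, labels):
    print(tag, "-> cluster", lab)
```

With these weights the weak cross-concept edges are cut, so the book and film "suspense" tags land in one cluster and the two cooking tags in the other, giving cross-domain features that cover users and items untouched by the overlapping tags alone.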
Research Objective 2: To develop a new tag-based cross-domain recommen-
dation model by exploiting tag-inferred structural knowledge.
This objective corresponds to Research Question 2. To promote positive
knowledge transfer and avoid negative knowledge transfer in cross-domain recom-
mendation, both intra- and inter-domain correlations that preserve the geometric
property of a domain and between domains should be learned in order to guide
knowledge transfer. Most studies exploit either overlapping tags (Fang et al.,
2015; Fernández-Tobías and Cantador, 2014; Shi et al., 2011) or domain-specific
tags (Hao et al., 2016) in building an inter-domain correlation. The complemen-
tary role of different types of tags needs to be explored to build a comprehensive
inter-domain correlation. With respect to the intra-domain correlation, tag-based
user similarity (Zhen et al., 2009) that preserves the distance between users has
been seamlessly integrated into the framework of CF to generate more precise
recommendations. However, such an extension suffers from two drawbacks: 1) the
relationship between items and tags has not been studied; 2) the learned intra-
domain correlation only boosts knowledge transfer in a single domain. Therefore,
this study will exploit the relationship between users and tags and between items
and tags to infer both intra- and inter-domain correlations, which are designed to
work together for more positive knowledge transfer.
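How tag-inferred intra-domain structure can regularize knowledge transfer may be sketched as follows: a tag-based user similarity graph yields a Laplacian regularizer tr(UᵀLU) that penalizes distant latent factors for users with similar tag profiles. The profiles and latent factors below are invented toy values, and the sketch shows only the regularizer, not the full model of Chapter 4.

```python
import numpy as np

# Hypothetical user-tag profiles within one domain: T[u, t] counts how
# often user u applied tag t.
T = np.array([
    [3, 0, 2, 0],
    [2, 1, 3, 0],
    [0, 4, 0, 3],
], dtype=float)

# Intra-domain user similarity from tag profiles (cosine).
Tn = T / np.linalg.norm(T, axis=1, keepdims=True)
S = Tn @ Tn.T
np.fill_diagonal(S, 0.0)

# Graph Laplacian of the similarity graph.
L = np.diag(S.sum(axis=1)) - S

# Laplacian regularizer tr(U^T L U) over user latent factors U: it is small
# when users with similar tag profiles have nearby latent vectors, so adding
# it to a matrix-factorization loss preserves the domain's geometric structure.
U = np.array([[0.9, 0.1],
              [0.8, 0.2],
              [0.1, 0.9]])
reg = np.trace(U.T @ L @ U)

# Equivalent form: similarity-weighted sum of pairwise squared distances.
pairwise = 0.5 * sum(S[i, j] * np.sum((U[i] - U[j]) ** 2)
                     for i in range(3) for j in range(3))
print(np.isclose(reg, pairwise))
```

The same construction applies to item-tag profiles, and an analogous cross-domain similarity matrix plays the inter-domain role; minimizing the regularizer alongside the reconstruction loss is what keeps the transfer consistent with the tag-inferred structure.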
Research Objective 3: To develop a new tag-based cross-domain recommen-
dation model by exploiting tag semantics.
This objective corresponds to Research Question 3. Some efforts have been
made in relation to tag-aware personalized recommendation using content-based
filtering (Cantador et al., 2010) or CF (Bouadjenek et al., 2013; Tso-Sutter et al.,
2008). However, as users can freely choose their own vocabulary in an arbitrary
language, this has resulted in redundant and ambiguous tag information, which is
a further obstacle to the capture of the relationships between cross-domain tags.
In this respect, some studies proposed to utilize auxiliary information sources,
such as ontology (Fernández-Tobías et al., 2011), or a knowledge graph (Yang
et al., 2014), for the semantic matching of cross-domain tags. However, the
success of such approaches heavily depends on the construction of an external
knowledge base. Other solutions use auto-encoders (Zuo et al., 2016) or deep
learning techniques (Xu et al., 2016) to map a tag-based user/item profile to
an abstract space for abstract matching. Although these methods achieved better
performance, they failed to explore the explicit semantic relationship between the
respective tags of the domains. Therefore, this study will use natural language
processing (NLP) techniques to explicitly determine the semantic matching for
cross-domain tags and integrate the semantic information of tags into cross-domain
recommendation.
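The intended semantic matching can be sketched as grouping non-identical but semantically related tags by the cosine similarity of their embedding vectors. The model in Chapter 5 trains such embeddings with word2vec on tagging data; in the self-contained sketch below, hand-crafted toy vectors stand in for trained ones, and the 0.85 threshold is an arbitrary illustrative choice.

```python
import numpy as np

# Toy 3-d embeddings standing in for word2vec vectors trained on tagging
# data (tags and vectors are invented for illustration).
emb = {
    "scary":     np.array([0.9, 0.1, 0.0]),
    "horror":    np.array([0.8, 0.2, 0.1]),
    "funny":     np.array([0.1, 0.9, 0.1]),
    "hilarious": np.array([0.0, 0.8, 0.2]),
}

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def group_tags(emb, threshold=0.85):
    """Greedy single-link grouping: a tag joins the first group containing
    a tag whose cosine similarity exceeds the threshold."""
    groups = []
    for tag, v in emb.items():
        for g in groups:
            if any(cos(v, emb[t]) >= threshold for t in g):
                g.append(tag)
                break
        else:
            groups.append([tag])
    return groups

print(group_tags(emb))  # → [['scary', 'horror'], ['funny', 'hilarious']]
```

When the embeddings come from tag co-occurrence contexts across both domains, each resulting group acts as one merged feature, so semantically equivalent tags from different domains increase the effective domain overlap instead of being treated as unrelated vocabulary.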
1.3 Research contributions
Overall, this thesis makes several contributions as follows:
• The development of an enhanced tag-induced cross-domain collaborative
filtering model by exploiting abundant domain-specific tags.
An enhanced tag-induced cross-domain collaborative filtering model has been
presented in which abundant domain-specific tags, not just the limited overlapping
tags, are utilized to increase the connections between heterogeneous domains.
To align diverse domain-specific tags, spectral clustering together with tag
co-occurrence patterns have been exploited to group domain-specific tags.
Based on the tag clusters, a new user and item profile can be defined
and utilized to compute cross-domain similarity for regularizing knowledge
transfer. The experimental results demonstrate that the proposed model is
capable of establishing a strong domain connection to support knowledge
transfer when overlapping tags are scarce. Furthermore, domain-specific tags
are shown to be beneficial for adding more information about user preferences
into the recommendation. Details can be found in Chapter 3.
• The development of a complete tag-induced cross-domain recommendation
model by exploiting structural knowledge inferred with tags.
A complete tag-induced cross-domain recommendation model is proposed in
which both inter- and intra-domain correlations are considered as structural
knowledge to promote knowledge transfer. In this model, not only overlap-
ping tags but also domain-specific tags are exploited to play complementary
roles in the establishment of inter-domain correlation. Additionally, intra-domain
similarity between users and between items has also been introduced
by distinguishing the tag distribution in each individual domain, which amounts
to building a compact intra-domain correlation to support knowledge
transfer at a group level. Experiments on three public datasets and with five
state-of-the-art baseline approaches demonstrate that the proposed model
performs well in both rating prediction and item recommendation tasks.
Details can be found in Chapter 4.
• The development of a tag semantically-boosted cross-domain recommenda-
tion model by exploiting semantic information of tags.
A tag semantically-boosted cross-domain recommendation model is developed
which aims to automatically group those non-identical but semantically
related tags to increase the domain overlap and ultimately the recommen-
dation quality. In this model, the word2vec technique is utilized to learn
a semantic representation of tags. Then, semantically equivalent tags are
successfully merged into the same group according to the learned semantic
representation. The derived tag clusters, which span across domains, have been
exploited as a joint embedding space for aligning heterogeneous domains. By
mapping users and items from both source and target domains to the same
embedding space, similar users and items across domains can be identified
and connected. As a result, knowledge is transferred from the source domain
to the target domain via matched users and items to improve recommenda-
tion performance. Experimental results on multiple datasets demonstrate
that our proposed model outperforms other state-of-the-art baselines in the
top-N recommendation task. Details can be found in Chapter 5.
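The grouping step just described can be sketched as follows; the toy embedding vectors, tag names and similarity threshold below are illustrative assumptions, not values from the thesis experiments.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def group_tags(tag_vectors, threshold=0.8):
    """Greedily merge tags whose embedding cosine similarity exceeds the threshold."""
    clusters = []  # list of (seed tag, [member tags])
    for tag, vec in tag_vectors.items():
        for seed, members in clusters:
            if cosine(tag_vectors[seed], vec) >= threshold:
                members.append(tag)
                break
        else:
            clusters.append((tag, [tag]))
    return [members for _, members in clusters]

# Toy word2vec-style vectors (hypothetical values for illustration).
vecs = {
    "funny":    [0.90, 0.10, 0.00],
    "humorous": [0.88, 0.15, 0.05],
    "scary":    [0.00, 0.90, 0.30],
}
print(group_tags(vecs))  # "funny" and "humorous" fall into one cluster
```

In practice, the thesis uses learned word2vec representations and the clusters then serve as the shared embedding space for both domains.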
1.4 Thesis structure
The content of all the chapters is described in more detail next and the structure
of the whole thesis is shown in Figure 1.1.
• Chapter 2 provides an overview of the state-of-the-art related to this re-
search. In particular, we focus on a survey of three specific fields, which
are recommender systems, transfer learning and cross-domain recommender
systems, respectively. In the introduction to recommender systems, first the
general definition of a recommender system is described and later a formal
formulation of the recommendation problem is given. Then a categorization
and description of existing recommendation techniques is presented,
namely content-based, collaborative filtering, hybrid-based, context-aware
and deep learning based recommendations. In the introduction to transfer
learning, a mathematical definition of the transfer learning problem and
the classification of existing approaches based on knowledge transfer type
are provided. Finally, in the introduction to cross-domain recommender
systems, a general definition of the cross-domain recommendation problem is
given. Next, a categorization of cross-domain recommendation techniques
is presented, distinguishing implicit and explicit knowledge transfer
linkages.
• Chapter 3 develops an enhanced tag-induced cross-domain collaborative
filtering model. In this chapter, abundant domain-specific tags are first
investigated to increase the domain correspondence between heterogeneous
domains. To this end, spectral clustering together with a defined tag co-
occurrence pattern are utilized to group domain-specific tags into clusters,
which can be used as aligned features to reduce the gap between domains.
The empirical results of the proposed model are reported and discussed in
relation to the rating prediction task using the well-known MovieLens
(https://grouplens.org/datasets/movielens/) and LibraryThing
(http://www.macle.nl/tud/LT) datasets.
• Chapter 4 develops a complete tag-induced cross-domain recommendation
model by exploiting structural knowledge inferred with tags. Instead of
exploiting overlapping tags or domain-specific tags separately in building
Fig. 1.1 Thesis structure: Chapter 1 (Introduction) and Chapter 2 (Literature Review) lead to RQ 1 (exploration of domain-specific tags, addressed in Chapter 3), RQ 2 (consolidation of tag-inferred structural knowledge, addressed in Chapter 4) and RQ 3 (exploitation of tag semantics, addressed in Chapter 5), followed by Chapter 6 (Conclusions and Future Work).
an inter-domain correlation, this chapter applies both overlapping tags
and domain-specific tags in the establishment of a comprehensive inter-
domain correlation. In particular, on the basis of a weak domain connection
built by overlapping tags, the clusters of domain-specific tags are further
exploited to profile both users and items in order to refine cross-domain
similarity. This part is similar to the work in Chapter 3. However, to
overcome the problem that the number of tag clusters must be fixed in advance,
a more advanced clustering approach, namely Affinity Propagation (Frey and
Dueck, 2007), is utilized to learn the number of clusters adaptively from
the data. Furthermore, intra-domain correlation in the form of tag-based
user-similarity and item-similarity is also integrated into the framework of
cross-domain recommendation in order to promote more knowledge transfer.
To evaluate the proposed model, extensive experiments are conducted for
both rating prediction and item ranking tasks, with datasets composed of
MovieLens, LibraryThing and LastFM (https://grouplens.org/datasets/hetrec-2011/).
• Chapter 5 develops a tag semantically-boosted cross-domain recommendation
model by exploiting tag semantics. The previous two chapters only consider
the similarity between cross-domain tags at the lexical level, whereas this
chapter uses the word2vec technique to determine an explicit semantic
matching for tags. As a result, tags that are semantically related but
use different words are captured and grouped together to eliminate noise
in the tagging data. Specifically, in this chapter, word2vec is first used
to learn a semantic representation of tags. Then, semantically equivalent
tags are merged into the same group according to the learned semantic
representation.
1.5 Publications related to this thesis
2. Lu, J., Behbood, V., Hao, P., Hua Z., Xue S. and Zhang, G. 2015,
‘Transfer learning using computational intelligence: a survey’, Knowledge-
Based Systems, vol. 80, pp. 14-23. (ERA Rank: B)
3. Hao, P., Lu, J. and Zhang, G. 2017, ‘Tag Semantically-boosted Cross-
domain Recommendation’, submitted to Decision Support Systems.
(ERA Rank: A*)
4. Hao, P., Zhang, G. and Lu, J. 2016, ‘Enhancing cross domain rec-
ommendation with domain dependent tags’, in proceedings of the
2016 IEEE International Conference on Fuzzy Systems, Canada, pp.
1266-1273. (ERA Rank: A)
5. Hao, P., Zhang, G., Behbood, V. and Zheng, Z. 2014, ‘A fuzzy domain
adaptation method based on self-constructing fuzzy neural network’, in
proceedings of the 11th International FLINS Conference on Decision
Making and Soft Computing, Brazil, pp. 676-681. (ERA Rank: B)
Chapter 2
Research literature
This chapter presents a discussion of relevant studies associated with this research.
From the perspective of its working principle, cross-domain recommendation is
considered a practical application of transfer learning in recommender systems
for addressing the data sparsity problem. In this chapter, we provide an in-depth
overview of the involved techniques: recommender systems, transfer
learning and cross-domain recommender systems. First, in Section 2.1,
the recommendation problem and the most popular techniques in recommender systems
are introduced. Then Section 2.2 reviews the definition and categorization of
transfer learning techniques. Finally, in Section 2.3 we describe the general
formalization of the cross-domain recommendation problem and present the
categorization of cross-domain recommendation approaches.
2.1 Recommender systems
With the growing amount of information on the internet provided by companies to
meet the needs of customers, it becomes harder for a person to process the large
amount of information available to him/her. In this context, recommender
systems (Adomavicius and Tuzhilin, 2005; Beel et al., 2016; Burke, 2002; Fernández-
Tobías, 2017; Lu et al., 2015b; Ricci et al., 2011; Su and Khoshgoftaar, 2009;
Zhang et al., 2017a) have been developed as an effective information filtering tool
that helps people discover the information they favor most from a large space of options.
Recommender systems are beneficial for both customers and product or service
providers: they not only save time for customers by automatically
filtering relevant items but also increase user engagement and promote sales for
businesses (Jannach and Adomavicius, 2016). Therefore, recommender systems
are widely applied in our daily life, including but not limited to: product
recommendation (e.g., Amazon, eBay), video, TV or movie recommendation (e.g.,
YouTube, Netflix), music recommendation (e.g., Spotify), friend recommendation
(e.g., Facebook), news recommendation (e.g., Yahoo) and job recommendation
(e.g., LinkedIn).
2.1.1 Recommendation problem
In general terms, recommender systems are a special kind of software tool
designed to estimate a user's preference for items with which he/she has never
interacted before (Ricci et al., 2011).
The inputs to a recommender system include user features (e.g., age, gender
and occupation), item features (e.g., content description, metadata), user-item
interactions (e.g., past ratings, purchase data and clicking/browsing history) and
other additional context information (e.g., time, place and mood). The output
varies with the recommendation task: it can be predicted rating scores for
the rating prediction task, a ranked list of items for the top-N recommendation
task, or the correct categories of candidate items for the classification task (Zhang
et al., 2017a). A graphical representation of the recommendation problem is shown
in Figure 2.1.
Fig. 2.1 A graphical illustration of the recommendation problem
More formally, Adomavicius and Tuzhilin (2005) presented a mathematical
formulation of recommender systems, which describes a recommender system as
a utility function that finds the unseen items maximizing a user's expected utility.
Specifically, the utility function f estimates the preference score r_{ui} of user u on
item i, and the item recommended to user u is the unseen one with the highest
estimated score:

r_{ui} = f(u, i), \qquad i_u^{*} = \arg\max_{i \in I \setminus I_u} f(u, i) \qquad (2.1)
where U and I denote the user and item sets in a system, respectively, and I_u
denotes the set of items for which user u previously expressed preferences. Based
on the estimated score r_{ui}, a recommender system can rank candidate items and
decide whether to recommend item i to user u. Note that in recommender systems,
user-item interactions are usually represented as a matrix R of size |U| × |I|,
where rows correspond to users and columns correspond to items. The value
r_{ui} ∈ R refers to the level of preference or usefulness of item i to user u; it can
be an explicit rating score within a certain range (e.g., 0-5) or implicit feedback
(e.g., clicking/listening counts). Since only a small fraction of items are exposed to
users, most entries in matrix R are missing and need to be estimated by the utility
function f defined in Eq. 2.1.
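As a minimal, self-contained illustration of Eq. 2.1, the sketch below fills missing entries of a toy rating matrix with a placeholder utility function; the data and the averaging heuristic are purely illustrative, standing in for a real prediction model.

```python
# R holds observed ratings (None = missing entry of the |U| x |I| matrix).
R = {
    "u1": {"i1": 5, "i2": None, "i3": 1},
    "u2": {"i1": 4, "i2": 2, "i3": None},
}

def f(user, item):
    """Toy utility: the user's average observed rating (placeholder for a real model)."""
    observed = [r for r in R[user].values() if r is not None]
    return sum(observed) / len(observed)

def recommend(user):
    """Eq. 2.1: pick the unseen item with the highest estimated score."""
    unseen = [i for i, r in R[user].items() if r is None]
    return max(unseen, key=lambda i: f(user, i))

print(recommend("u1"))  # the only unseen item for u1 is "i2"
```

Any of the models reviewed below (memory-based CF, matrix factorization, etc.) can be substituted for `f` without changing the surrounding recommendation logic.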
2.1.2 Classification of recommender systems
With the rapid development of recommender system research, numerous and
diverse recommendation approaches have been proposed. Recommender systems
are generally classified into three categories: collaborative filtering, content-based
and hybrid recommender systems (Adomavicius and Tuzhilin, 2005; Lu et al.,
2015b; Park et al., 2012). More recently, context has been recognized as an
important information source for improving recommendation performance, and as
a result context-aware recommender systems (Adomavicius and Tuzhilin, 2011) are
studied as new models in recommender systems. In addition, since deep learning
has gained successful applications in many fields, such as computer vision, speech
recognition and natural language processing, more and more researchers have
devoted themselves to applying deep learning techniques to recommendation and
developing deep learning based recommender systems (Zhang et al., 2017a).
In the next subsections, we will review representative works for each category
listed as follows:
• Collaborative filtering (see 2.1.2.1), which only exploits users' past preference
data to predict items that a user might like in the future. Collaborative
filtering methods can be further divided into two groups: memory-based and
model-based collaborative filtering;
• Content-based recommender systems (see 2.1.2.2), which build an item profile
or representation by extracting content features from item descriptions and
suggest items similar to the ones that the user liked in the past based on the
derived representation. Content-based recommender systems are commonly
used to recommend text-based items;
• Hybrid recommender systems (see 2.1.2.3), which are implemented by combining
the advantages of both collaborative filtering and content-based
recommender systems. The recommendation performance of hybrid methods is
demonstrated to be better than that of pure collaborative filtering and content-based
methods;
• Context-aware recommender systems (see 2.1.2.4), which incorporate con-
textual information to deliver more accurate and relevant recommendations.
• Deep learning based recommender systems (see 2.1.2.5), which introduce
deep learning techniques to build advanced recommender systems.
2.1.2.1 Collaborative filtering
As one of most popular and well-known methods in building recommender systems,
collaborative filtering (CF) (Herlocker et al., 1999; Koren and Bell, 2015; Sarwar
et al., 2001; Schafer et al., 2007; Su and Khoshgoftaar, 2009; Wang et al., 2006)
predicts interests of a user based on the analysis of tastes and preferences of other
users in the system. A key advantage of CF-based approaches is that they do not
rely on additional content information about items beyond users' past preference
data, either in the form of explicit rating scores or implicit indications. As a
result, CF-based recommender systems can be deployed in a wide range of
recommendation scenarios. There are two main types of CF-based approaches:
Memory-based CF
In memory-based CF, the prediction is made on the basis of similarity between
users or items, which further refers to user-user CF (Resnick et al., 1994) and
item-item CF (Sarwar et al., 2001), respectively.
User-user CF assumes that users with similar tastes in the past will share
the same preferences on items in the future. This algorithm is effective but incurs
a high computation cost for computing the user-user similarity matrix. The problem
becomes more serious when a user is dynamically added to a large system, since the
similarity must be re-computed for every user pair. Similar to user-user CF, item-item CF
also needs to compute a similarity matrix but proceeds in an item-centric manner.
The underlying assumption of item-item CF is that a user who bought item x will
enjoy a similar item y as well. This algorithm takes less time than user-user CF
due to the relatively static nature of items. In practice, Amazon adopts item-item
CF as its main recommendation technique.
To compute the similarity, Pearson correlation and cosine-based similarity are
two commonly used metrics. For example, the Pearson correlation between two
users u and v is defined as follows:

s_{uv} = \frac{\sum_{i \in I_{uv}} (r_{ui} - \bar{r}_u)(r_{vi} - \bar{r}_v)}{\sqrt{\sum_{i \in I_{uv}} (r_{ui} - \bar{r}_u)^2} \, \sqrt{\sum_{i \in I_{uv}} (r_{vi} - \bar{r}_v)^2}} \qquad (2.2)

where I_{uv} denotes the set of items liked by both users u and v,
\bar{r}_u = \frac{\sum_{i \in I_u} r_{ui}}{|I_u|}, \bar{r}_v = \frac{\sum_{i \in I_v} r_{vi}}{|I_v|},
and I_u and I_v are the sets of items preferred (or rated) by user u and
user v, respectively. Similarly, the cosine-based similarity between users u and v
is defined as:

s_{uv} = \frac{\sum_{i \in I_{uv}} r_{ui} r_{vi}}{\sqrt{\sum_{i \in I_u} r_{ui}^2} \, \sqrt{\sum_{i \in I_v} r_{vi}^2}} \qquad (2.3)
The aforementioned similarity metrics can also be applied to measure the similarity
between items.
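The two metrics can be sketched directly from Eqs. 2.2 and 2.3; the rating dictionaries below are illustrative.

```python
import math

def pearson(ratings_u, ratings_v):
    """Pearson correlation over co-rated items (Eq. 2.2)."""
    common = set(ratings_u) & set(ratings_v)
    mean_u = sum(ratings_u.values()) / len(ratings_u)
    mean_v = sum(ratings_v.values()) / len(ratings_v)
    num = sum((ratings_u[i] - mean_u) * (ratings_v[i] - mean_v) for i in common)
    den_u = math.sqrt(sum((ratings_u[i] - mean_u) ** 2 for i in common))
    den_v = math.sqrt(sum((ratings_v[i] - mean_v) ** 2 for i in common))
    return num / (den_u * den_v)

def cosine_sim(ratings_u, ratings_v):
    """Cosine similarity (Eq. 2.3): numerator over co-rated items, norms over each user's own items."""
    common = set(ratings_u) & set(ratings_v)
    num = sum(ratings_u[i] * ratings_v[i] for i in common)
    norm_u = math.sqrt(sum(r * r for r in ratings_u.values()))
    norm_v = math.sqrt(sum(r * r for r in ratings_v.values()))
    return num / (norm_u * norm_v)

u = {"i1": 5, "i2": 3, "i3": 4}
v = {"i1": 4, "i2": 2, "i4": 5}
print(round(pearson(u, v), 3), round(cosine_sim(u, v), 3))
```

Note that both functions assume at least one co-rated item; production code would also guard against zero denominators.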
The performance of memory-based CF relies on the amount of co-rated items
or co-rating users, which decreases when user-item interactions are sparse. This
hinders the application of memory-based CF to large and sparse
datasets.
Model-based CF
In the implementation of model-based CF, various machine learning and data
mining techniques are applied to learn a pattern from the training data,
under the assumption that the observed data, such as users, items and ratings, are
generated by that pattern. Examples include Bayesian networks (Yang et al., 2013; Zhang and Koren,
2007), clustering methods (George and Merugu, 2005; Shepitsen et al., 2008),
classification based methods (Zhang and Iyengar, 2002), latent semantic based
models (Hofmann, 2004), latent factor based models (Koren and Bell, 2015), latent
Dirichlet allocation (Marlin, 2004) and Markov decision process based models (Su
and Khoshgoftaar, 2009).
Due to its superior performance in the Netflix competition, latent factor based
models have gradually replaced memory-based CF in building the most effective recommender
systems. A latent factor based model utilizes matrix factorization (MF) (Koren
et al., 2009) as its core technique and models every user or item with a vector of
latent factors, so that the preference of users for items can be measured in the
latent space. These latent factors are plain numbers without inherent meaning;
their physical explanation depends on the recommendation scenario. For example, in movie
recommendation, the factors can correspond to actors, genres or other related information
when describing a movie, and can be interpreted as age, gender or preference
style when characterizing a person. More specifically, MF approximates the user-item
interaction matrix R by multiplying two low-rank matrices U and V that represent
user latent factors and item latent factors, respectively,
R ≈ U × V (2.4)
where row i of U denotes the latent factors of user i, and column j of V denotes
the latent factors of item j.
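A minimal sketch of learning the two low-rank matrices in Eq. 2.4 with stochastic gradient descent on the observed entries; the toy ratings, rank k and hyper-parameters (learning rate, regularization, epochs) are illustrative assumptions, not values from the thesis.

```python
import random

def factorize(ratings, n_users, n_items, k=2, lr=0.01, reg=0.05, epochs=200, seed=0):
    """Approximate R ≈ U × V by minimizing squared error on observed entries."""
    rnd = random.Random(seed)
    U = [[rnd.uniform(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    V = [[rnd.uniform(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            pred = sum(U[u][f] * V[i][f] for f in range(k))
            err = r - pred
            for f in range(k):
                u_f, v_f = U[u][f], V[i][f]
                # Gradient step with L2 regularization on both factor vectors.
                U[u][f] += lr * (err * v_f - reg * u_f)
                V[i][f] += lr * (err * u_f - reg * v_f)
    return U, V

# Observed (user, item, rating) triples; entry (1, 1) is missing.
obs = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0)]
U, V = factorize(obs, n_users=2, n_items=2)
pred = sum(U[1][f] * V[1][f] for f in range(2))
print(round(pred, 2))  # estimated rating for the missing entry
```

After training, the inner product U[u]·V[i] recovers the observed ratings closely and fills in the missing cell, which is exactly how MF-based recommenders estimate unseen entries.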
24
2.1 Recommender systems
Later, probabilistic matrix factorization (PMF) (Mnih and Salakhutdinov,
2008) was proposed to extend MF into a probabilistic framework and was shown
to perform better than MF. However, both MF and PMF learn a global perspective
of all users and all items without considering their individual characteristics. For
example, some users tend to give higher ratings than others, and some items tend
to receive higher ratings than other items. To cope with this systematic tendency,
Koren et al. (2009) proposed biasedSVD on the basis of singular
value decomposition (SVD) by introducing user and item bias terms. Further, to
integrate implicit feedback with explicit rating scores, SVD++ (Koren, 2008)
was proposed to enhance the biasedSVD model.
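For reference, the biasedSVD predictor just described augments the plain inner product of Eq. 2.4 with a global average and per-user/per-item bias terms (following Koren et al., 2009):

```latex
\hat{r}_{ui} = \mu + b_u + b_i + \mathbf{q}_i^{\top}\mathbf{p}_u
```

where \mu is the overall average rating, b_u and b_i are the observed deviations of user u and item i from that average, and \mathbf{p}_u, \mathbf{q}_i are their latent factor vectors. SVD++ further adds a term summarizing the items with which the user has implicitly interacted.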
Although model-based CF has achieved great success, it still suffers from the
data sparsity problem (Grčar et al., 2005): inaccurate recommendations are
generated for users and items that have few ratings.
2.1.2.2 Content-based recommender systems
Content-based recommender systems (Lops et al., 2011; Mooney and Roy, 2000;
Pazzani and Billsus, 2007) recommend items that are similar to the ones preferred
by the users in the past. The principle of content-based recommender systems
includes two steps:
• It first analyses the description of the preferred items by a particular user
in order to find out common attributes, which can be used to distinguish
items. These attributes are kept in the user profile;
25
2.1 Recommender systems
• It then compares attributes of each item with the user profile, and as a
result only the items that have a higher degree of similarity with the user
profile would be recommended.
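The two steps above can be sketched with simple keyword profiles; the item attributes, the intersection-based profile and the overlap threshold are illustrative simplifications of real content-based feature extraction.

```python
def build_user_profile(liked_items, item_attrs):
    """Step 1: collect the attributes common to the items the user liked."""
    profile = set(item_attrs[liked_items[0]])
    for item in liked_items[1:]:
        profile &= set(item_attrs[item])
    return profile

def rank_candidates(user_profile, candidates, item_attrs, min_overlap=1):
    """Step 2: keep candidates whose attributes overlap the profile enough."""
    scored = [(len(user_profile & set(item_attrs[i])), i) for i in candidates]
    return [i for score, i in sorted(scored, reverse=True) if score >= min_overlap]

attrs = {
    "m1": ["sci-fi", "space"],
    "m2": ["sci-fi", "space", "drama"],
    "m3": ["sci-fi", "aliens"],
    "m4": ["romance"],
}
profile = build_user_profile(["m1", "m2"], attrs)
print(rank_candidates(profile, ["m3", "m4"], attrs))
```

Real systems replace the raw keyword sets with weighted feature vectors (e.g., TF-IDF) and the overlap count with a similarity measure, but the two-step structure is the same.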
The advantage of content-based recommender systems is that they exploit the semantic
content of items and recommend to a specific user items that are similar to the
preferred items in his/her profile. As a result, content-based recommender systems
are able to recommend new items and unpopular items. Furthermore, they can
provide an explanation of recommended items by listing the content features based on
which an item is recommended. They do not need information about the
preferences of other users in making recommendations, so they do not suffer from
the sparseness problem associated with collaborative filtering.
One of the main limitations of content-based recommender systems is the
overspecialization problem. They can only recommend items to a user according
to the preferred items in his/her user profile; thus, they cannot recommend items
outside the user's profile. Additionally, in some particular cases, it may not be
desirable for a recommender system to recommend overly similar items to users,
such as different news articles that describe the same event. Another limitation
of content-based recommender systems is the item content dependency problem.
As content-based recommender systems generate recommendations according to
the content of items, it is hard to use content-based methods to recommend items
which cannot be represented as keywords, such as images and movies. Lastly,
recommendations cannot be provided reliably when there is not enough
information to build a solid profile of a user.
26
2.1 Recommender systems
2.1.2.3 Hybrid recommender systems
Each recommendation technique has its own strengths and
drawbacks. Hybrid recommender systems are developed to gain higher performance
and to avoid the drawbacks of individual recommender systems (Burke, 2002, 2007;
Ghazanfar and Prugel-Bennett, 2010).
The most common practice for developing hybrid recommender systems is
to combine CF with other recommendation approaches in an attempt to avoid
cold-start, sparseness and/or scalability problems (Kim et al., 2006; Shambour
and Lu, 2012; Zhang et al., 2013). Several combination methods have been
employed, such as weighting (i.e., combine the scores of several recommendation
techniques) (Burke, 2002), switching (i.e., switch between recommendation
techniques depending on the current situation) (Lekakos and Caravelas, 2008), mixed
(i.e., present recommendations from several recommendation techniques simultaneously)
(Barragáns-Martínez et al., 2010), feature augmentation (i.e., use the
output from one recommender system as an input feature to another) (Burke,
2005), and cascade (i.e., refine the recommendations of one recommender system by
another) (Lampropoulos et al., 2012).
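As a sketch of the weighting strategy, the scores of two component recommenders can be combined linearly; the component scores and the weight alpha below are illustrative.

```python
def weighted_hybrid(cf_scores, content_scores, alpha=0.7):
    """Combine two recommenders' scores: alpha * CF + (1 - alpha) * content-based."""
    items = set(cf_scores) | set(content_scores)
    return {
        i: alpha * cf_scores.get(i, 0.0) + (1 - alpha) * content_scores.get(i, 0.0)
        for i in items
    }

cf = {"i1": 4.0, "i2": 2.0}   # scores from a collaborative filtering component
cb = {"i1": 3.0, "i3": 5.0}   # scores from a content-based component
combined = weighted_hybrid(cf, cb)
best = max(combined, key=combined.get)
print(best, round(combined["i1"], 2))
```

The other strategies (switching, mixed, feature augmentation, cascade) differ only in how the component outputs are routed rather than in this scoring arithmetic.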
2.1.2.4 Context-aware recommender systems
In general, users interact with the system within a particular context, and
preferences for items in one context may differ from those in another
context. Context-aware recommender systems (CARS) take into account contextual factors,
such as location, time and company, in generating more relevant recommendations
(Adomavicius and Tuzhilin, 2011; Verbert et al., 2010, 2012). In contrast
to traditional recommender system models, the rating function of CARS can
be viewed as:

R : Users × Items × Contexts → Ratings \qquad (2.5)
Three representative approaches have been designed to deal with contextual
preferences: contextual prefiltering, contextual postfiltering and contextual
modelling. In a contextual prefiltering approach, the
contextual information is used to filter out irrelevant information before applying
a traditional recommender system model (Adomavicius et al., 2005; Codina et al.,
2013a,b; Zheng et al., 2013, 2014a). In a contextual postfiltering approach, the
recommendation results from traditional recommender system models are further
filtered using contextual information (Panniello et al., 2009; Ramirez-Garcia and
García-Valdez, 2014). As opposed to contextual prefiltering and contextual postfil-
tering approaches which rely on traditional models to generate recommendations,
contextual modelling approaches model the contextual information directly in
a recommendation function (Hariri et al., 2011; Hidasi and Tikk, 2012; Rendle
et al., 2011; Zheng et al., 2014b).
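The contextual prefiltering idea can be sketched as a filter applied to the rating data before any traditional model is trained; the data and context labels below are illustrative.

```python
def prefilter(ratings, target_context):
    """Keep only the ratings observed in the target context (e.g., 'weekend')."""
    return [(u, i, r) for (u, i, r, c) in ratings if c == target_context]

# (user, item, rating, context) tuples.
data = [
    ("u1", "i1", 5, "weekend"),
    ("u1", "i2", 2, "weekday"),
    ("u2", "i1", 4, "weekend"),
]
filtered = prefilter(data, "weekend")
print(filtered)  # a traditional CF model is then trained on these triples only
```

Postfiltering would instead run the traditional model on all of `data` and adjust the resulting recommendation list using the context, while contextual modelling builds the context directly into the rating function of Eq. 2.5.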
Overall, the field of context-aware recommender systems is promising, but
much work is needed to explore it comprehensively.
2.1.2.5 Deep learning based recommender systems
Deep learning (DL) (Bengio et al., 2013, 2012, 2009; Deng et al., 2014; Hinton
et al., 2006; Hinton and Salakhutdinov, 2006; LeCun et al., 2015; Schmidhuber,
2015) is a hot and emerging topic in both data mining and machine learning
areas. It learns multiple levels of representation and abstraction from data for
28
2.1 Recommender systems
supervised or unsupervised learning tasks. Initially DL techniques were applied to
computer vision (He et al., 2016; Jia et al., 2014; Krizhevsky et al., 2012; LeCun
et al., 1990; Simonyan and Zisserman, 2014; Szegedy et al., 2015) and speech
recognition (Graves and Jaitly, 2014; Graves et al., 2013; Hinton et al., 2012;
Xiong et al., 2016). Later, deep models were applied to natural language processing
(NLP) tasks, such as semantic parsing (Kim, 2014; Socher et al., 2011a; Weston
et al., 2012), machine translation (Cho et al., 2014a,b; Deselaers et al., 2009;
Sutskever et al., 2014; Wu et al., 2016) and sentiment classification (Glorot et al.,
2011; Kim, 2014; Maas et al., 2011; Socher et al., 2011b).
The tremendous success of DL in other research fields, together with its
capability to capture non-linear relationships from abundant accessible
data sources such as contextual and visual information, has brought about
revolutions in the design of recommendation architectures. The first attempt at using
DL techniques for recommender systems involved the restricted Boltzmann machine
(RBM) (Salakhutdinov et al., 2007). Several recent approaches use autoencoders
(AE) (Kuchaiev and Ginsburg, 2017; Sedhain et al., 2015; Strub and Mary, 2015),
feedforward neural networks (He et al., 2017), recurrent neural networks (RNN) (Wu
et al., 2017), convolutional neural networks (CNN) (Nguyen et al., 2017; Wang
et al., 2017), deep semantic similarity models (DSSM) (Elkahky et al., 2015; Xu
et al., 2016) and neural autoregressive distribution estimation (NADE) (Zheng
et al., 2016a,b).
In addition to developing deep recommendation models by exploiting a single DL
technique, there are also studies that integrate traditional recommendation
models with DL in a loosely coupled or tightly coupled manner. The difference
lies in whether the parameters of the recommendation model and the DL component
are optimized simultaneously. For instance, Zhang et al. (2017b) proposed to learn
item feature representations via AE and then integrated them into the classic
recommendation model SVD++. In contrast, a general Bayesian deep learning
framework consisting of two tightly hinged components, a perception component
(implemented by a deep neural network) and a task-specific component (implemented
by PMF), was proposed in (Wang and Yeung, 2016) to seamlessly combine deep learning and
recommendation models. A comprehensive investigation of the integration of
DL with traditional recommender systems is presented in (Zhang et al., 2017a).
2.2 Transfer learning
Although machine learning technologies have attracted a remarkable level of
attention from researchers in different computational fields, most of these technologies
work under the common assumption that the training data (source domain)
and the test data (target domain) have identical feature spaces and underlying
distributions. As a result, once the feature space or the feature distribution of the
test data changes, the prediction models cannot be reused and must be rebuilt and
retrained from scratch using newly-collected training data, which is very expensive
and sometimes not practically possible. Similarly, since learning-based models
need adequate labeled data for training, it is nearly impossible to establish a
learning-based model for a target domain with very little labeled data available
for supervised learning.
To address the aforementioned problems, transfer learning (TL) (Cook et al.,
2013; Long et al., 2014a; Lu et al., 2015a; Pan and Yang, 2010; Shao et al., 2015;
Weiss et al., 2016) has been proposed as a new learning paradigm that utilizes
previously-acquired knowledge to solve new but similar problems more
quickly and effectively.
2.2.1 Definition of transfer learning
To have a better understanding of the definition of transfer learning, two important
terms need to be introduced first, which are Domain and Task.
Definition 2.1 (Domain) (Pan and Yang, 2010) A domain, which is denoted
by D = {χ, P (X)}, consists of two components:
(1) Feature space χ; and
(2) Marginal probability distribution P (X), where X = {x1, x2, · · · , xn} ∈ χ.
Definition 2.2 (Task) (Pan and Yang, 2010) A task, which is denoted by
T = {Y, f(·)}, consists of two components:
(1) A label space Y = {y1, y2, · · · , yn}; and
(2) An objective predictive function f(·), which is not observed and is to be
learned from pairs {xi, yi}.
The function f(·) can be used to predict the corresponding label, f(xi), of a new
instance xi. From a probabilistic viewpoint, f(xi) can be written as P(yi|xi). More
specifically, the source domain can be denoted as Ds = {(x_{s1}, y_{s1}), · · · , (x_{sn}, y_{sn})},
where x_{si} ∈ χs is a source instance and y_{si} ∈ Ys is the corresponding class label.
Similarly, the target domain can be denoted as Dt = {(x_{t1}, y_{t1}), · · · , (x_{tn}, y_{tn})},
where x_{ti} ∈ χt and y_{ti} ∈ Yt.
According to the definitions of domain and task, the transfer learning problem
can be defined as follows.
Definition 2.3 (Transfer Learning) (Pan and Yang, 2010) Given a source
domain Ds with learning task Ts, and a target domain Dt with learning task Tt,
transfer learning aims to improve the learning of the target predictive function
ft(·) in Dt using the knowledge in Ds and Ts, where Ds ≠ Dt or Ts ≠ Tt.
In the above definition, the condition Ds ≠ Dt implies that either χs ≠ χt or
Ps(X) ≠ Pt(X). Similarly, the condition Ts ≠ Tt implies that either Ys ≠ Yt or
fs(·) ≠ ft(·).
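For illustration, these definitions can be expressed in a short Python sketch. The dataclass names and the example feature spaces below are our own hypothetical choices, not part of the formal definitions; the sketch only checks the conditions Ds ≠ Dt and Ts ≠ Tt at the level of χ and Y (the distributions P(X) and the functions f(·) are not directly observable).

```python
from dataclasses import dataclass

@dataclass
class Domain:
    feature_space: frozenset   # χ: the names of the features
    samples: tuple             # X = {x1, ..., xn} drawn from P(X)

@dataclass
class Task:
    label_space: frozenset     # Y; f(·) would be learned from {x_i, y_i} pairs

def domains_differ(ds: Domain, dt: Domain) -> bool:
    """Ds != Dt when the feature spaces differ; a full check would also
    have to compare the marginal distributions P(Xs) and P(Xt)."""
    return ds.feature_space != dt.feature_space

def tasks_differ(ts: Task, tt: Task) -> bool:
    """Ts != Tt when the label spaces differ (fs, ft are not observed,
    so only the label spaces can be compared directly)."""
    return ts.label_space != tt.label_space

# Example: movie reviews (source) vs. book reviews (target)
source = Domain(frozenset({"movie_genre", "review_text"}), samples=())
target = Domain(frozenset({"book_genre", "review_text"}), samples=())
print(domains_differ(source, target))  # feature spaces differ -> True
```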
2.2.2 Classification of transfer learning techniques
According to the unified definition of transfer learning introduced in Section 2.2.1,
transfer learning techniques can be divided into three main categories (Pan and
Yang, 2010): 1) inductive transfer learning, in which the learning task in the
target domain is different from the task in the source domain (i.e., Ts ≠ Tt);
2) unsupervised transfer learning, which is similar to inductive transfer learning
but focuses on solving unsupervised learning tasks in the target domain, such
as clustering, dimensionality reduction and density estimation; 3) transductive
transfer learning, in which the learning tasks are the same in both domains while
the source and target domains are different (i.e., Ts = Tt, Ds ≠ Dt).
2.2.2.1 Inductive transfer learning
In inductive transfer learning, the learning task in the target domain is different
from the one in the source domain but the domains are the same (i.e., Ts ≠ Tt and
Ds = Dt) (Pan and Yang, 2010). Inductive transfer learning is similar to multi-task
learning (Argyriou et al., 2007) when labelled data are available in the source
domain, or to self-taught learning (Kemker and Kanan, 2017; Raina et al., 2007)
when no labelled data are provided in the source domain.
According to (Pan and Yang, 2010; Rohrbach et al., 2013), existing research
in inductive transfer learning can be classified into four categories: instance-based
transfer learning, feature-based transfer learning, parameter-based transfer learning
and relation-based transfer learning.
Instance-based transfer learning assumes that some labelled source domain
data can be reused to train a new model in the target domain. To this end,
Dai et al. (Dai et al., 2007) proposed TrAdaBoost, which iteratively re-weights
the source domain data in order to pick out useful samples while down-weighting
less useful ones for training a classifier. Based on the same idea of removing less
useful samples from the source domain, different strategies have been adopted and
various algorithms have been developed (Huang et al., 2007; Jiang and Zhai, 2007;
Li and Principe, 2017; Yao and Doretto, 2010).
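The re-weighting idea behind TrAdaBoost can be sketched as a single update step, shown below with numpy. This is a simplified illustration, not the cited implementation: the function name and the 0/1 per-instance error encoding are our own, and the full algorithm wraps this step in a boosting loop around a weak learner.

```python
import numpy as np

def tradaboost_reweight(w_src, w_tgt, err_src, err_tgt, n_rounds):
    """One TrAdaBoost-style weight update (simplified sketch).

    w_src, w_tgt : current instance weights for source / target data
    err_src, err_tgt : per-instance 0/1 errors of the current weak learner
    n_rounds : total number of boosting iterations N
    """
    n = len(w_src)
    beta_src = 1.0 / (1.0 + np.sqrt(2.0 * np.log(n) / n_rounds))
    # the weighted error on the *target* data sets the AdaBoost factor
    eps = np.sum(w_tgt * err_tgt) / np.sum(w_tgt)
    eps = min(max(eps, 1e-6), 0.499)          # keep the factor well-defined
    beta_tgt = eps / (1.0 - eps)
    # misclassified source instances are down-weighted (judged less useful),
    # misclassified target instances are up-weighted (classic AdaBoost)
    new_src = w_src * beta_src ** err_src
    new_tgt = w_tgt * beta_tgt ** (-err_tgt)
    z = new_src.sum() + new_tgt.sum()
    return new_src / z, new_tgt / z

w = np.ones(4) / 8
ws, wt = tradaboost_reweight(w, w.copy(), np.array([1, 0, 0, 0]),
                             np.array([0, 1, 0, 0]), n_rounds=10)
print(ws, wt)   # source error 0 loses weight; target error 1 gains weight
```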
Feature-based transfer learning aims to find a common feature representation for
both the source and target domains so that the mismatch between the two domains
can be reduced. Argyriou et al. (Argyriou et al., 2007) proposed to learn a common
mapping function for both the source and target domains, so that a classifier can be
constructed by solving an optimization problem in the low-dimensional feature
space. Lee et al. (Lee et al., 2007) proposed to combine related learning tasks
to learn meta-priors that can be transferred across domains and to add weights
to features for representation learning. Raina et al. (Raina et al., 2007) proposed
to apply sparse coding techniques to learn high-level features when no labelled
data are provided in the source domain. However, in some conditions, the high-level
features learned in the source domain may not perform well in the target domain.
Under the setting of unsupervised feature learning, manifold learning has also been
exploited to develop inductive transfer learning approaches (Wang and Mahadevan,
2008).
Parameter-based transfer learning assumes that models for related domains
may share common parameters or priors. In parameter-based transfer learning,
a larger weight is usually assigned to the loss function of the target domain
rather than equal weights for the source and target domains. In this direction,
Gao et al. (Gao et al., 2008) proposed a locally weighted ensemble learning
framework that combines multiple models for transfer learning, where the weights
are dynamically assigned according to each model's predictive power on each test
example in the target domain.
Relation-based transfer learning is mainly used for transferring knowledge
among multiple relational domains, such as social networks, where the data are
not independent and identically distributed. To solve this problem, the TAMAR
algorithm, which transfers relational knowledge across relational domains with
Markov logic networks, was proposed. Later, the authors also extended TAMAR
to the single-entity setting.
2.2.2.2 Transductive transfer learning
In transductive transfer learning, the learning tasks are the same but the
source and target domains are different (i.e., Ts = Tt and Ds ≠ Dt) (Pan and
Yang, 2010).
Transductive transfer learning has often been used interchangeably with domain
adaptation (Jiang, 2008; Li et al., 2014; Long et al., 2014a; Shi et al., 2010;
Tso-Sutter et al., 2008; Wang and Mahadevan, 2011). Under the framework of
domain adaptation, the discrepancy between the source and target domains can be
caused by a different marginal distribution (i.e., P(Xs) ≠ P(Xt)), a different
conditional distribution (i.e., P(Ys|Xs) ≠ P(Yt|Xt)), or both. To overcome the
marginal distribution discrepancy, sampling methods can be applied to estimate
P(Xs) and P(Xt) separately from the observed data. Fan et
al. (Fan et al., 2005) proposed to estimate the probability ratio by using various
classifiers. A kernel mean matching (KMM) algorithm was developed to learn
P (Xs) and P (Xt) directly by matching the means between the source domain data
and the target domain data in a reproducing-kernel Hilbert space (Huang et al.,
2006). Pan et al. (Pan et al., 2008) exploited the maximum mean discrepancy
embedding (MMDE) method, originally designed for dimensionality reduction, to
learn a low-dimensional space to reduce the marginal difference between different
domains for transductive transfer learning. However, MMDE may suffer from its
computational burden. Thus, Pan et al. (Pan et al., 2011a) further proposed an
efficient feature extraction algorithm, called transfer component analysis (TCA) to
overcome the drawback of MMDE. With respect to a different conditional
distribution, Zhong et al. (Zhong et al., 2009) proposed an adaptive kernel
approach that maps the marginal distributions of the target and source domain
data into a common kernel space, and utilized a sample selection strategy to draw
the conditional probabilities of the two domains closer. For the last case, Sun et
al. (Sun et al., 2011) proposed to tackle the marginal and conditional discrepancies
in two separate steps. First, they weighted the source data to reduce the marginal
distance between the source and target data. They then computed weights for the
source data to reduce the conditional distribution difference based on a smoothness
assumption.
Finally, a classifier was learned for the target domain on those re-weighted source
data. Recently, Behbood et al. (Behbood et al., 2011, 2014) developed a fuzzy
domain adaptation method for a real-world banking application. Gong et al. (Gong
et al., 2014) developed a novel approach for unsupervised domain adaptation
with applications to visual recognition. Specifically, they tried to learn robust
features with which to construct classifiers. First, they applied a geodesic flow
kernel (GFK) to summarize the inner products in an infinite sequence of feature
subspaces that smoothly interpolates between the source and target domains.
Second, they leveraged kernels to combine multiple base GFKs to model both the
source and target domains at fine-grained granularities.
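The maximum mean discrepancy that underlies KMM, MMDE and TCA can be illustrated with a minimal numpy sketch. This is our own toy estimator, not the code of any of the cited methods: it computes a biased estimate of the squared MMD between a source sample and a target sample under a Gaussian kernel, so that a larger value indicates a larger marginal-distribution discrepancy.

```python
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    """Gaussian kernel matrix K[i, j] = exp(-gamma * ||a_i - b_j||^2)."""
    sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def mmd2(xs, xt, gamma=1.0):
    """Biased estimate of the squared maximum mean discrepancy:
    MMD^2 = E[k(xs, xs')] - 2 E[k(xs, xt)] + E[k(xt, xt')]."""
    return (rbf_kernel(xs, xs, gamma).mean()
            - 2.0 * rbf_kernel(xs, xt, gamma).mean()
            + rbf_kernel(xt, xt, gamma).mean())

rng = np.random.default_rng(0)
same = mmd2(rng.normal(0, 1, (200, 2)), rng.normal(0, 1, (200, 2)))
shifted = mmd2(rng.normal(0, 1, (200, 2)), rng.normal(2, 1, (200, 2)))
print(same, shifted)   # the mean-shifted pair has the larger discrepancy
```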
2.2.2.3 Unsupervised transfer learning
In unsupervised transfer learning, both the learning tasks and the domains are
different (i.e., Ts ≠ Tt and Ds ≠ Dt). Additionally, no labelled data are observed
in either the source or the target domain during training (Pan and Yang, 2010).
This problem setting remains an open challenge in transfer learning.
In (Dai et al., 2008), a new approach called self-taught clustering (STC) is
proposed, which aims at clustering a small collection of unlabelled data in the target
domain with the help of a large amount of unlabelled data in the source domain.
In particular, self-taught clustering tries to learn a common feature space across
domains, which benefits clustering in the target domain. Similarly, Wang et
al. (Wang et al., 2008) first applied clustering methods to generate pseudo class
labels for the unlabelled target data, and then applied dimensionality reduction
methods to the target data and the labelled source data to reduce the dimensionality.
In (Jiang and Chung, 2012), a transfer spectral clustering (TSC) algorithm was
proposed that involves not only the data manifold information of the individual
task but also the feature manifold information shared between tasks. Compared
to STC, TSC is built on graphs.
2.3 Cross-domain recommender system
Nowadays, the majority of recommender systems only offer recommendations of
items belonging to a single domain. For example, Netflix offers movie
recommendations and Spotify provides music recommendations. Although these
recommender systems have been applied successfully in their corresponding
domains, there are cases where providing multiple and diverse item
recommendations could be beneficial for exploring users' unique preferences. For
instance, a user could receive relevant music and book recommendations after
showing interest in a specific movie. Furthermore, instead of treating each domain
independently, exploiting knowledge from relevant auxiliary domains is helpful for
improving recommendation performance, especially in situations of data sparsity.
Considering the above two motivations, research on cross-domain recommender
systems (CDRS) has grown into a challenging but largely under-explored topic.
It has been studied as the problem of cross-system personalization in user
modelling (Abel et al., 2013; Shapira et al., 2013), as a potential solution to cold
start and data sparsity in recommender systems (Shi et al., 2011; Tiroshi et al.,
2013), and as a practical application of TL in the area of recommender systems
(Li et al., 2009a; Pan et al., 2010b).
Although CDRS has been studied from various perspectives in different research
areas, a unified definition of the CDRS problem has not yet emerged. Several
survey papers summarize the development of CDRS approaches. A
brief survey (Li, 2011) introduces cross-domain collaborative filtering (CDCF)
along two dimensions: collaborative filtering domains and knowledge transfer
styles. With respect to collaborative filtering domains, it considers three
representative domains in practice: the system domain, the data domain and the
temporal domain. For knowledge transfer styles, it introduces three ways of
transferring knowledge, namely rating-pattern sharing, latent-feature sharing and
domain correlating.
An extended survey (Fernández-Tobías et al., 2012) mainly focuses on relations
between domains, including content-based relations and collaborative filtering-based
relations. Cremonesi et al. (Cremonesi et al., 2011) consider four types of
data overlap (no overlap, user overlap, item overlap, and full overlap) to
distinguish the literature on CDRS. Shi et al. (Shi et al., 2014) introduce
CDRS from the perspective of how to improve user-based and model-based CF
techniques by exploiting various kinds of auxiliary data. Recently, a more
comprehensive survey (Cantador et al., 2015) defines the notion of domain at four
levels (i.e., attribute level, type level, item level and system level) and addresses
three recommendation tasks (i.e., multi-domain recommendation, linked-domain
recommendation and cross-domain recommendation).
2.3.1 Definition of cross-domain recommender system
Let Us and Is be the sets of users and items in the source domain Ds, and let Ut
and It be the sets of users and items in the target domain Dt. A cross-domain
recommender system aims to use the knowledge in the source user-item interaction
matrix Xs ∈ R^{|Us|×|Is|} to predict the missing values in the target user-item
interaction matrix Xt ∈ R^{|Ut|×|It|}, so that the recommendation performance
can be greatly improved when the data in Xt are sparse.
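The prediction task in this definition amounts to low-rank matrix completion. The following toy sketch is our own illustration rather than any specific CDRS model: it fits latent factors to the observed entries of a small target matrix Xt by gradient descent and uses them to fill in the missing values.

```python
import numpy as np

def factorize(x, mask, rank=2, lr=0.01, reg=0.01, epochs=5000, seed=0):
    """Fit X ≈ U V^T using only the observed entries (mask[i, j] = 1)."""
    rng = np.random.default_rng(seed)
    m, n = x.shape
    u = 0.1 * rng.standard_normal((m, rank))
    v = 0.1 * rng.standard_normal((n, rank))
    for _ in range(epochs):
        err = mask * (x - u @ v.T)        # error on observed cells only
        u += lr * (err @ v - reg * u)
        v += lr * (err.T @ u - reg * v)
    return u @ v.T

# A toy target matrix Xt; 0 marks a missing rating.
xt = np.array([[5.0, 4.0, 0.0],
               [4.0, 0.0, 1.0],
               [0.0, 4.0, 1.0]])
mask = (xt > 0).astype(float)
pred = factorize(xt, mask)
print(np.round(pred, 1))                  # missing cells are now predicted
```

In a cross-domain setting the same machinery is augmented with knowledge from Xs, for example by sharing or regularizing the latent factors, as the approaches reviewed below do.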
2.3.2 Classification of cross-domain recommendation approaches
As discussed in (Cremonesi et al., 2011), there are four scenarios of data overlap
between the source and target domains. Each scenario is briefly introduced below:
• No overlap: there are no overlapping users or items between the domains,
i.e., Ust = Us ∩ Ut = ∅ and Ist = Is ∩ It = ∅.
• User overlap: there are common users in both domains, i.e.,
Ust = Us ∩ Ut ≠ ∅.
• Item overlap: there are common items in both domains, i.e.,
Ist = Is ∩ It ≠ ∅.
• User and item overlap: there are overlapping users and items between the
domains, i.e., Ust ≠ ∅ and Ist ≠ ∅.
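These scenarios can be checked mechanically from the user and item sets; a small Python sketch (the function name and the example identifiers are our own):

```python
def overlap_scenario(users_s, users_t, items_s, items_t):
    """Classify the data-overlap scenario of (Cremonesi et al., 2011)."""
    u_st = set(users_s) & set(users_t)   # Ust = Us ∩ Ut
    i_st = set(items_s) & set(items_t)   # Ist = Is ∩ It
    if u_st and i_st:
        return "user and item overlap"
    if u_st:
        return "user overlap"
    if i_st:
        return "item overlap"
    return "no overlap"

print(overlap_scenario({"u1", "u2"}, {"u2", "u3"}, {"m1"}, {"b1"}))
# a shared user but disjoint item sets -> "user overlap"
```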
According to the settings of data overlap, existing research on cross-domain
recommendation can generally be classified into two categories: cross-domain
recommendation for partially/fully overlapping domains and cross-domain
recommendation for non-overlapping domains.
2.3.2.1 Cross-domain recommendation for partially/fully overlapping
domains
These approaches assume that the same latent features are shared by the source
and target domains. The latent features can be either user-specific or item-specific.
Under this assumption, different kinds of models have been developed by fusing
various types of user-side or item-side information.
The collective matrix factorization (CMF) model (Singh and Gordon, 2008)
was proposed to collectively factorize a user-item rating matrix and an item-content
matrix, sharing the same item-specific latent features to enable knowledge
transfer between the two domains. Similar to CMF, Ma et al. (Ma et al., 2008)
factorize a user-item rating matrix and a user-user social network matrix
simultaneously in order to find the shared user-specific latent features. Later, the
weighted nonnegative matrix co-tri-factorization (WNMCTF) approach (Yoo
and Choi, 2009) exploited the nonnegative matrix factorization (NMF) technique
to collectively factorize a user-item rating matrix, a user demographic matrix
and an item-content matrix, sharing both user-specific and item-specific latent
features to enhance the knowledge transfer. MCF-LF (Zhang et al., 2012),
CLP-GP (Cao et al., 2010) and NB-MCF (Chatzis, 2013) study multiple user-side
auxiliary data matrices and learn users' preferences and similarities, and are
shown to be more effective than approaches that only share the latent features.
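The shared-latent-feature idea behind CMF can be sketched in a few lines of numpy. The code below is a simplified gradient-descent version of collective factorization (our own illustration, not the cited algorithm): a rating matrix R ≈ U Vᵀ and an item-content matrix C ≈ V Wᵀ are factorized jointly, with the item factors V shared between the two factorizations.

```python
import numpy as np

def collective_mf(r, mask, c, rank=2, lr=0.01, reg=0.01,
                  alpha=0.5, epochs=4000, seed=0):
    """Jointly factorize R ≈ U V^T and C ≈ V W^T with shared item factors V."""
    rng = np.random.default_rng(seed)
    u = 0.1 * rng.standard_normal((r.shape[0], rank))
    v = 0.1 * rng.standard_normal((r.shape[1], rank))
    w = 0.1 * rng.standard_normal((c.shape[1], rank))
    for _ in range(epochs):
        e_r = mask * (r - u @ v.T)       # rating error on observed entries
        e_c = c - v @ w.T                # item-content reconstruction error
        u += lr * (e_r @ v - reg * u)
        v += lr * (e_r.T @ u + alpha * (e_c @ w) - reg * v)
        w += lr * (alpha * (e_c.T @ v) - reg * w)
    return u @ v.T

r = np.array([[5.0, 0.0, 1.0],           # user 1 has not rated item 2
              [4.0, 5.0, 0.0]])
mask = (r > 0).astype(float)
c = np.array([[1.0, 0.0],                # items 1 and 2 share the same
              [1.0, 0.0],                # content features; item 3 differs
              [0.0, 1.0]])
pred = collective_mf(r, mask, c)
print(np.round(pred, 1))
```

Because items 1 and 2 share the same content features, the shared item factors pull user 1's unknown rating on item 2 toward that user's observed rating on item 1; this coupling is the knowledge-transfer effect.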
Instead of using the auxiliary data directly, some researchers propose to
explore more hidden information in the auxiliary data. Shi et al. (Shi et al.,
2013b) collectively factorize a user-item rating matrix and an item-item similarity
matrix mined from movies' mood descriptions. Tang et al. (Tang et al.,
2013) collectively factorize a user-item rating matrix weighted by users' global
reputations and a user-user social matrix. Further, they add constraints that
enforce sharing the same user-specific latent features. In addition to transferring
all the knowledge of the source domain, Lu et al. (Lu et al., 2013) selectively
transfer high-quality knowledge from multiple user-aligned data sources, which
was shown to be more accurate than transferring without selection.
Considering the heterogeneity of user feedback, Pan et al. (Pan et al.,
2010b) propose CST, which transfers knowledge from auxiliary implicit feedback
(browsing records) to target explicit feedback (rating scores). Specifically, it
incorporates the coordinate systems (or latent features) extracted from the
auxiliary data into the target factorization system via two regularization terms.
This work provides a way to deal with heterogeneous data in cross-domain
recommendation. In addition to sharing both user-specific and item-specific latent
features, the transfer by collective factorization (TCF) model (Pan et al., 2011b)
also uses two inner matrices to represent the data-dependent information. Later,
the authors proposed a more effective model, called iTCF (Pan and Ming, 2014),
by extending the TCF model.
2.3.2.2 Cross-domain recommendation for non-overlapping domains
Without overlapping users or items, building correspondences between cross-domain
users and between cross-domain items remains an open challenge. To
address this problem, researchers propose to bridge two heterogeneous domains
in either an implicit or an explicit way. Representative approaches are summarized
as follows.
Implicit rating-based cross-domain recommender systems
Methods that handle two domains without overlapping users or items usually
transfer knowledge between the domains at a group level. Additionally, they use
only user preference data.
Codebook transfer (CBT) (Li et al., 2009a) bridges two domains by clustering
rating matrices and finding user-item patterns at the cluster level. Specifically,
it first extracts a codebook from the source rating matrix, and then transfers
the codebook to the sparse target domain as shared knowledge. However, this
method assumes that the source rating matrix is fully observed. Later, the rating
matrix generative model (RMGM) (Li et al., 2009b) extended CBT by combining
codebook construction and codebook expansion into a single step, and relaxed
the constraints on the source rating matrix. Further, Gao et al. (Gao et al.,
2013) generalize the codebook to include a data-independent rating pattern
and a data-dependent rating pattern, which is shown to be more accurate than
sharing only the data-independent common knowledge. To better capture the
interactions between domain-specific user factors and item factors, Hu et al. (Hu
et al., 2013) propose the CDTF model, which uses users' explicit and implicit
feedback, respectively. Additionally, the techniques of multi-task learning (Elkahky
et al., 2015; Moreno et al., 2012) and active learning (Zhao et al., 2013, 2017) have
also been studied to promote knowledge transfer in cross-domain recommendation.
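CBT's first step, building the codebook, can be sketched as follows. This is a simplified illustration that assumes numpy and substitutes a plain k-means for the co-clustering used in the original paper; the second step, which learns target cluster memberships to expand the codebook into the sparse target matrix, is omitted.

```python
import numpy as np

def kmeans_labels(x, k, iters=20, seed=0):
    """Minimal k-means (rows of x are the points) returning cluster labels."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), k, replace=False)]
    for _ in range(iters):
        d = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = x[labels == j].mean(0)
    return labels

def codebook(xs, p=2, q=2):
    """Build the cluster-level rating pattern B from a dense source matrix:
    B[g, h] = average rating of user-cluster g on item-cluster h."""
    ur = kmeans_labels(xs, p)            # cluster users by their rating rows
    ir = kmeans_labels(xs.T, q)          # cluster items by their rating columns
    b = np.zeros((p, q))
    for g in range(p):
        for h in range(q):
            b[g, h] = xs[np.ix_(ur == g, ir == h)].mean()
    return b

xs = np.array([[5.0, 5.0, 1.0],
               [5.0, 4.0, 1.0],
               [1.0, 1.0, 5.0],
               [1.0, 2.0, 5.0]])
b = codebook(xs)
print(np.round(b, 2))   # one high-rating and one low-rating block per cluster pair
```

The full CBT then fixes B and alternately updates binary user and item cluster memberships in the target domain so as to minimize the reconstruction error on the observed target ratings.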
Explicit tag-based cross-domain recommender systems
Instead of linking domains through implicit rating patterns, tags have been widely
studied as a way to build an explicit knowledge-transfer bridge between
heterogeneous domains.
In this respect, Shi et al. (Shi et al., 2011) applied overlapping tags to profile
cross-domain users and items, so that cross-domain user-to-user and item-to-item
similarities can be inferred from the tagging data. These similarities can
then be exploited as prior knowledge to regularize the joint matrix factorization
process. Since more unique information about the individual domains is encoded
in the non-overlapping tags, Hao et al. (Hao et al., 2016) extended this work by
considering domain-specific tags. Enrich et al. (Enrich et al., 2013) developed
three tag-based rating prediction models using both the rating and tagging data
in the auxiliary domains. These models transfer knowledge through overlapping
tags on the assumption that tagging behavior in one domain can be exploited in a
completely different domain. Fernández et al. (Fernández-Tobías and Cantador,
2014) improved one of these models by introducing an additional set of tag factors
to better capture the effect of tags on rating estimation. Wang et al. (Wang et al.,
2012) proposed a two-step tag transfer learning model, which applies a tag
clustering approach to the tags in the source domain and uses the learned tag
clusters to group the tags in the target domain; in their model, the transferred
knowledge is represented as tag clusters. Fang et al. (Fang et al., 2015) learned
and shared a tag-occurrence matrix to help knowledge transfer across multiple
domains.
To semantically correlate the tags of different domains, most studies rely on
the construction of an external knowledge base. Kumar et al. (Kumar et al., 2014)
proposed to measure the semantic relatedness between tags with a WordNet-based
ontology. Yang et al. (Yang et al., 2015) proposed to capture the semantic
relatedness between two different tags/keywords through concept vectors
distilled from online encyclopedias. In (Yang et al., 2014), a Chinese knowledge
graph consisting of billions of concepts was built to find explicit correlations
between tags used in different social media.
Chapter 3
Exploiting Domain Specific Tags
for Cross-domain
Recommendation
3.1 Introduction
The performance of CDCF relies on whether an effective domain link can be
established as a bridge for knowledge transfer. In this direction, most existing
CDCF approaches assume that either users or items are fully or partially shared
between the source and target domains (Pan et al., 2012; Pan and Yang, 2013;
Singh and Gordon, 2008). The shared users or items then become a bridge to
support knowledge transfer. However, due to companies' differing privacy policies,
it is more common that the users and items in the two domains are completely
non-overlapping, i.e., the correspondence is unknown. For this setting, (Fang et
al., 2015; Fernández-Tobías and Cantador, 2014; Shi et al., 2011) proposed to
build an explicit domain
relationship through user-generated tags. The underlying assumption of these
models is that users with similar tagging behaviors are likely to share similar
interests.
Although an explicit domain relationship has been shown to be more effective than
an implicit one (Shi et al., 2011), the above assumption has two limitations. First,
the ratio of overlapping tags decreases with domain heterogeneity, and most users
or items are not covered by the limited overlapping tags; inaccurate
recommendations will be generated on the basis of such a weak domain connection.
Second, it is wasteful to discard the abundant domain-specific tags, which
correspond to the unshared parts of the tag sets of the individual domains.
Moreover, domain-specific tags are more effective for capturing the distinctive
characteristics of an individual domain. If the domain-specific tags are taken into
account, more links between heterogeneous domains can be established to promote
knowledge transfer.
To explore the role of domain-specific tags, this chapter proposes a novel
tag-based CDCF algorithm, called enhanced tag-induced cross domain
collaborative filtering (ETagCDCF), which instead builds the correspondence
between cross-domain users or cross-domain items by exploiting domain-specific
tags. Nevertheless, the diverse formats of domain-specific tags present a challenge
in generating a uniform feature space for aligning heterogeneous domains. To
address this problem, ETagCDCF applies spectral clustering to group the
domain-specific tags based on their tag co-occurrence patterns. As a result,
domain-specific tags that are used in the same pattern are grouped together,
and a new tag representation is generated to represent those non-identical tags.
By modeling on the new tag representation, user and item profiles can be greatly
enriched by adding more information encoded in domain-specific tags and
helpful for inferring more accurate cross-domain similarities. Experiments on
two public datasets show that the proposed model achieves state-of-the-art
performance.
The remainder of this chapter is organized as follows. Section 3.2 introduces
the basic definitions and notation used in this chapter; Section 3.4 presents
the proposed model in detail; Section 3.5 conducts a set of experiments on two
public datasets to evaluate the performance of the proposed model; and Section
3.6 summarizes this chapter.
3.2 Preliminary knowledge
In this section, some preliminary definitions and notations are presented to help
understand the problem setting of this chapter.
Definition 3-1 (Domain): A domain D is a category of items that we are
interested in recommending to different users.
In the above definition, a domain is defined from the perspective of item type.
Different recommended item types, for example book, movie and music, can be
regarded as different domains. In each domain, the sets of users and items are
represented by U = {u1, u2, . . . , um} and I = {i1, i2, . . . , in}, respectively. All
the ratings given by the m users to the n items are denoted by the matrix
R ∈ R^{m×n} with entries rij. In addition, the unique tag set used in a domain is
denoted by T = {t1, t2, . . . , tl}, where l is the number of unique tags. Furthermore,
the tag assignments are organized in the form TR = {ui : {ij : tk}},
i = 1, 2, . . . , m; j = 1, 2, . . . , n; k = 1, 2, . . . , l.
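For illustration, the tag assignments TR can be held in a nested dictionary, from which the unique tag set T and per-item tag profiles are recovered. This is a minimal Python sketch with made-up user, item and tag names.

```python
# Tag assignments TR organized as {user: {item: [tags]}}, following the
# notation above (the user/item/tag names are purely illustrative).
TR = {
    "u1": {"i1": ["#sci-fi", "#hero"], "i2": ["#romantic"]},
    "u2": {"i1": ["#sci-fi"]},
}

# Recover the unique tag set T = {t_k} and the per-item tag profiles from TR.
T = sorted({t for items in TR.values() for tags in items.values() for t in tags})
item_tags = {}
for items in TR.values():
    for item, tags in items.items():
        item_tags.setdefault(item, set()).update(tags)

print(T)                # ['#hero', '#romantic', '#sci-fi']
print(item_tags["i1"])  # the tags assigned to item i1 (set order may vary)
```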
Definition 3-2 (Overlapping tags): Overlapping tags are the set of tags that
appear in both the source and target domains.
Fig. 3.1 A scenario for tag-based cross-domain recommendation. In this figure, we
aim to exploit knowledge from a movie domain to bootstrap book recommendation.
Unobserved rating scores are denoted by ? and each tag text starts with #.
[Figure: example users (jack, jamie, leo, lily, rosa, suzy, amy) with their rating
scores, and tags such as #romantic, #sci-fi, #not real, #hero in the movie domain
and #romantic, #sci-fi, #good book, #love, #fantasy in the book domain.]
The overlapping tags can be denoted by Tc = {ti | ti ∈ Ts ∩ Tt, i = 1, 2, . . . , lc},
where lc is the number of common tags. Taking the recommendation scenario
presented in Figure 3.1 as an example, the tags #romantic and #sci-fi, which
appear in both domains, are overlapping tags.
Definition 3-3 (Domain-specific tags): Domain-specific tags refer to the tags
that are unique to an individual domain.
According to the above definition, two sets of domain-specific tags can be
obtained. They are denoted by Tds = {ti | ti ∈ Ts − Tc, i = 1, 2, . . . , ls} and
Tdt = {tj | tj ∈ Tt − Tc, j = 1, 2, . . . , lt}, where ls and lt denote the numbers of
domain-specific tags in the source and target domains, respectively. In Figure 3.1,
the domain-specific tags in the source domain are {#not real, #hero}, and the
domain-specific tags in the target domain are {#good book, #love, #fantasy}.
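Using the tag sets of the Figure 3.1 scenario, the three tag sets defined above reduce to simple set operations:

```python
# Tag sets from the Figure 3.1 scenario: Ts (movie domain), Tt (book domain).
Ts = {"#romantic", "#sci-fi", "#not real", "#hero"}
Tt = {"#romantic", "#sci-fi", "#good book", "#love", "#fantasy"}

Tc = Ts & Tt        # overlapping tags, Tc = Ts ∩ Tt
Tds = Ts - Tc       # domain-specific tags of the source domain
Tdt = Tt - Tc       # domain-specific tags of the target domain

print(sorted(Tc))   # ['#romantic', '#sci-fi']
print(sorted(Tds))  # ['#hero', '#not real']
print(sorted(Tdt))  # ['#fantasy', '#good book', '#love']
```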
3.3 Enhanced tag-induced cross domain collaborative filtering
Given the domains Ds and Dt, the task of ETagCDCF is to infer implicit
relationships between Us and Ut and between Is and It through domain-specific
tags, so that the recommendation performance in the sparse target domain Dt can
be greatly improved by transferring rating knowledge from the source domain Ds.
Generally, ETagCDCF consists of the following three steps:
(1) Map the domain-specific tags into k predefined clusters to form a new tag
representation, so that the heterogeneity between domain-specific tags is
greatly reduced.
(2) Refine the user and item profiles with the new tag representation and then
compute cross-domain user-to-user and item-to-item similarities.
(3) Integrate the learned cross-domain similarities into matrix factorization
to serve as a tie between the source and target domains for transferring
knowledge.
3.3.1 The alignment of domain-specific tags
Inspired by the idea in (Pan et al., 2010a), the domain-specific tags are grouped by
applying a spectral clustering technique. Specifically, the clustering is based on
the co-occurrence pattern that relies on the relationships between overlapping
tags and domain-specific tags. Since the tagging data has a bi-directional nature,
in which a user can assign his or her favorite tags to any items,
Algorithm 3.1 Alignment of domain-specific tags
Input: original domain-specific tags Tds and Tdt, common tags Tc, number of
clusters k.
Output: user-specific tags DSU, item-specific tags DSV, user-specific tag
membership matrix RU, item-specific tag membership matrix RV.
1: Apply Equations 3.1 and 3.2 on the union set Tds ∪ Tdt to select K specific
tags for users, denoted by DSU = {ti | i = 1, 2, . . . , K}, and L specific tags
for items, denoted by DSV = {ti | i = 1, 2, . . . , L}.
2: Based on DSU and Tc, calculate the (user-specific tag)-(common tag)
co-occurrence matrix MU ∈ R^{K×lc}, where MU_{i,j} = 1 if CU(ti, tj) = 1 and
MU_{i,j} = 0 otherwise. Similarly, based on DSV and Tc, calculate the
(item-specific tag)-(common tag) co-occurrence matrix MV ∈ R^{L×lc}.
3: for each z ∈ {U, V} do
4:   Construct the matrix Lz = (Dz)^{-1/2} Az (Dz)^{-1/2}, where
Az = [[0, Mz], [(Mz)^T, 0]] and Dz is a diagonal matrix with
Dz_{ii} = Σ_j Az_{i,j}.
5:   Find the k largest eigenvectors u1, u2, . . . , uk of Lz and form the matrix
Ez = [u1, u2, . . . , uk].
6: end for
7: Extract the first K rows of the matrix EU ∈ R^{(K+lc)×k} to obtain
RU ∈ R^{K×k}, and the first L rows of EV ∈ R^{(L+lc)×k} to obtain
RV ∈ R^{L×k}.
8: return DSU, DSV, RU and RV
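Steps 4-5 of Algorithm 3.1 can be sketched directly in numpy. The code below is our own minimal implementation of the normalized bipartite affinity matrix and its top-k eigenvectors; the input M is a toy co-occurrence matrix.

```python
import numpy as np

def tag_embedding(m, k):
    """Steps 4-5 of Algorithm 3.1: build the bipartite affinity matrix
    A = [[0, M], [M^T, 0]] from the (specific tag)-(common tag)
    co-occurrence matrix M, normalize it as L = D^{-1/2} A D^{-1/2},
    and keep the k largest eigenvectors as the new tag representation."""
    big_k, lc = m.shape
    a = np.zeros((big_k + lc, big_k + lc))
    a[:big_k, big_k:] = m
    a[big_k:, :big_k] = m.T
    d = a.sum(axis=1)
    d[d == 0] = 1.0                          # guard against isolated tags
    d_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    lap = d_inv_sqrt @ a @ d_inv_sqrt
    vals, vecs = np.linalg.eigh(lap)         # eigenvalues in ascending order
    e = vecs[:, np.argsort(vals)[::-1][:k]]  # the k largest eigenvectors
    return e[:big_k]                         # rows for the K specific tags

m = np.array([[1.0, 0.0],    # specific tag 1 co-occurs with common tag 1
              [1.0, 0.0],    # specific tag 2 co-occurs with the same common tag
              [0.0, 1.0]])   # specific tag 3 co-occurs with common tag 2
r = tag_embedding(m, k=2)
print(np.round(r, 2))        # tags 1 and 2 receive identical embedding rows
```

Tags with the same co-occurrence pattern end up with identical rows in the embedding, which is exactly what allows non-identical domain-specific tags to be grouped.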
and an item receives tags from arbitrary users, the tag co-occurrence is modelled
on the user side and the item side, respectively.
For users, the tag co-occurrence is defined as follows:

CU(ti, tj) = 1 if UN(ti, tj) ≥ 1, and CU(ti, tj) = 0 otherwise,     (3.1)

where UN(ti, tj) denotes the number of users who have assigned both tags ti and
tj.
Based on Equation 3.1, the domain-specific tags for users are given by
DSU = {ti | ∀ti ∈ Tds ∪ Tdt, ∃tj ∈ Tc, CU(ti, tj) = 1, i = 1, 2, . . . , (ls + lt),
j = 1, 2, . . . , lc}.
Similarly, on the item side, the tag co-occurrence is defined as:

CI(ti, tj) = 1 if IN(ti, tj) ≥ 1, and CI(ti, tj) = 0 otherwise,     (3.2)

where IN(ti, tj) denotes the number of items that are labeled with both tags ti
and tj.
According to Equation 3.2, the domain-specific tags for items are given by
DSV = {ti | ∀ti ∈ Tds ∪ Tdt, ∃tj ∈ Tc, CI(ti, tj) = 1, i = 1, 2, . . . , (ls + lt),
j = 1, 2, . . . , lc}.
After filtering specific tags for both users and items, the goal is to partition
those candidate tags into k clusters, where k is a predefined parameter. The
complete procedure for the alignment of domain-specific tags is described in
Algorithm 3.1.
As shown in (Ding and He, 2004), the k principal components, which correspond
to the k largest eigenvectors u1, u2, . . . , uk in step 5 of Algorithm 3.1, can be
used to represent the original data in the subspace they span. In our problem,
these k principal components serve as a high-level representation of tags obtained
by clustering the domain-specific tags. In the next section, a mapping function
is developed to map users and items into this new subspace so that cross-domain
similarities can be computed.
3.3.2 Cross-domain similarities refinement
Before re-defining user and item profiles with tag clusters, the tag-based profiles
are first defined as follows:
Definition 4 (User-specific tag indicator matrix): The user-specific tag
indicator matrix X reflects the relationships between users (either from Us or Ut)
and the domain-specific tags DSU. It is denoted by X = {xij | i = 1, 2, . . . , m;
j = 1, 2, . . . , K}, where xij = 1 if user i has assigned tag tj ∈ DSU, and xij = 0
otherwise.
Definition 5 (Item-specific tag indicator matrix): The item-specific tag
indicator matrix Y reflects the relationships between items (either from Is or It)
and the domain-specific tags DSV. It is denoted by Y = {yij | i = 1, 2, . . . , n;
j = 1, 2, . . . , L}, where yij = 1 if item i has been tagged with tj ∈ DSV, and
yij = 0 otherwise.
In addition to using a binary value to denote whether a tag has been assigned by
a user or to an item, the tagging frequency could also be used to estimate xij/yij.
Here, however, we only consider binary values.
Then the alignment of user and item profiles is implemented by the mapping
functions defined in Equation 3.3 and 3.4, respectively.
ΦU(Xi) = Xi × RU (3.3)
ΦV (Yi) = Yi × RV (3.4)
where Xi denotes user-specific tag indicator matrix from domain i (i = s or t, which
represents source or target domain) and Xi ∈ Rmi×K . Yi denotes item-specific tag
indicator matrix from domain i (i = s or t) and Yi ∈ Rni×L.
Once user and item profiles are converted to vectors over tag clusters, the cross-domain user-to-user and item-to-item similarities can be computed with the cosine similarity metric. The overall procedure for refining the cross-domain similarity is summarized in Algorithm 3.2.
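The mapping functions (3.3)-(3.4) and the similarity computation of Algorithm 3.2 can be sketched for the user side as follows. The function name and the ε guard are hypothetical, and a standard cosine similarity (with square roots in the denominator) is assumed; the item side is symmetric.

```python
import numpy as np

def cross_domain_user_similarity(Xs, Xt, RU):
    """Sketch of Algorithm 3.2, user side (hypothetical names).

    Xs, Xt : binary user-specific tag indicator matrices (Definition 4),
             shapes (ms, K) and (mt, K) for source and target domains.
    RU     : user-specific tag-cluster membership matrix, shape (K, k).
    Returns SU, the (ms, mt) cross-domain user-to-user cosine similarity.
    """
    # Eq. (3.3): map both domains into the shared tag-cluster subspace.
    UTs, UTt = Xs @ RU, Xt @ RU
    # Cosine similarity between every source/target user pair.
    norm_s = np.linalg.norm(UTs, axis=1, keepdims=True)
    norm_t = np.linalg.norm(UTt, axis=1, keepdims=True)
    eps = 1e-12  # guard against users with empty tag profiles
    return (UTs @ UTt.T) / (norm_s * norm_t.T + eps)

# Toy check: a source user and a target user with identical profiles.
Xs = np.array([[1, 0, 1], [0, 1, 0]], dtype=float)
Xt = np.array([[1, 0, 1]], dtype=float)
RU = np.array([[1, 0], [0, 1], [1, 0]], dtype=float)
SU = cross_domain_user_similarity(Xs, Xt, RU)
```

Here SU[0, 0] is close to 1 (identical cluster profiles) while SU[1, 0] is 0 (disjoint clusters).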
3.3.3 Model and inference
Matrix factorization is one of the most popular approaches in making recommender
systems, which is based on the low-dimensional factor model. It transforms both
users and items to the same latent factor space and tries to explain the preferences
of users by linearly combining the latent factors of users and items. This low-rank
approximation model performs well in single-domain recommendation. To extend
Algorithm 3.2 Refinement of cross-domain similarity
Input: source domain tag assignments TRs, target domain tag assignments TRt, user-specific tags DSU, item-specific tags DSV, user-specific tag membership matrix RU, item-specific tag membership matrix RV.
Output: cross-domain user-to-user similarity matrix SU, cross-domain item-to-item similarity matrix SV.
1: Apply Definitions 4 and 5 on DSU, DSV, TRs and TRt to form 4 matrices, Xs ∈ R^{ms×K}, Ys ∈ R^{ns×L}, Xt ∈ R^{mt×K} and Yt ∈ R^{nt×L}.
2: Apply mapping function (3.3) to generate the source domain user-tag cluster relation matrix UTs ∈ R^{ms×k} and the target domain user-tag cluster relation matrix UTt ∈ R^{mt×k}.
3: Apply mapping function (3.4) to generate the source domain item-tag cluster relation matrix ITs ∈ R^{ns×k} and the target domain item-tag cluster relation matrix ITt ∈ R^{nt×k}.
4: for user i ∈ [1, 2, ..., ms] do
5:   for user p ∈ [1, 2, ..., mt] do
6:     SU_{i,p} = Σ(UTs[i,:] ⊙ UTt[p,:]) / (√Σ(UTs[i,:] ⊙ UTs[i,:]) × √Σ(UTt[p,:] ⊙ UTt[p,:])), where ⊙ denotes the element-wise product of two vectors.
7:   end for
8: end for
9: for item j ∈ [1, 2, ..., ns] do
10:   for item q ∈ [1, 2, ..., nt] do
11:     SV_{j,q} = Σ(ITs[j,:] ⊙ ITt[q,:]) / (√Σ(ITs[j,:] ⊙ ITs[j,:]) × √Σ(ITt[q,:] ⊙ ITt[q,:]))
12:   end for
13: end for
14: return SU and SV
the model to cross-domain recommendation, users and items in both domains
should be mapped to the same latent space to support knowledge transfer.
The two cross-domain similarity matrices SU and SV , which reflect the im-
plicit relationships between the source and target domains, are further added
as constraints for regularizing joint matrix factorization (Shi et al., 2011). The
objective function of ETagCDCF is formulated as:
F = (1/2) Σ_{i=1}^{m_s} Σ_{j=1}^{n_s} I^s_{ij} (R^s_{ij} − (U^s_i)^T V^s_j)^2
  + (1/2) Σ_{p=1}^{m_t} Σ_{q=1}^{n_t} I^t_{pq} (R^t_{pq} − (U^t_p)^T V^t_q)^2
  + (α/2) Σ_{i=1}^{m_s} Σ_{p=1}^{m_t} (SU_{ip} − (U^s_i)^T U^t_p)^2
  + (β/2) Σ_{j=1}^{n_s} Σ_{q=1}^{n_t} (SV_{jq} − (V^s_j)^T V^t_q)^2
  + (λ/2) (‖U_s‖²_F + ‖V_s‖²_F + ‖U_t‖²_F + ‖V_t‖²_F)   (3.5)
where R_s contains the ratings from m_s users on n_s items in the source domain. I^s is an indicator matrix ensuring that all calculations are only conducted on the observed ratings. In the target domain, the rating matrix is denoted by R_t and the corresponding indicator matrix by I^t. The latent factors of users in the source domain are denoted by the matrix U_s ∈ R^{d×m_s}, whose i-th column is the d-dimensional latent factor vector of user i. Similarly, the matrix V_s ∈ R^{d×n_s} denotes the latent factors of items in the source domain, whose j-th column is the d-dimensional latent factor vector of item j. The latent factors of users and items in the target domain are denoted by U_t ∈ R^{d×m_t} and V_t ∈ R^{d×n_t}, respectively. α and β are two trade-off parameters, which control the relative importance of the cross-domain user-to-user and item-to-item similarities, respectively.
λ is the regularization parameter used to penalize the model complexity in order
to avoid over-fitting.
In Equation 3.5, the first part represents the matrix factorization in the source
domain, while the second part corresponds to the matrix factorization in the
target domain. Both factorization processes are regularized by the third and
fourth parts.
The goal is to estimate the four variables U_s, V_s, U_t, V_t so that the rating matrix R_t can be approximated by R_t ≈ U_t^T V_t with minimum error. A local minimum of Equation 3.5 can be found by performing gradient descent on the four variables U_s, V_s, U_t, V_t alternately. Specifically, the gradients with respect to each variable are computed as follows:
∂F/∂U^s_i = Σ_{j=1}^{n_s} I^s_{ij}((U^s_i)^T V^s_j − R^s_{ij}) V^s_j + α Σ_{p=1}^{m_t} ((U^s_i)^T U^t_p − SU_{ip}) U^t_p + λ U^s_i   (3.6)

∂F/∂V^s_j = Σ_{i=1}^{m_s} I^s_{ij}((U^s_i)^T V^s_j − R^s_{ij}) U^s_i + β Σ_{q=1}^{n_t} ((V^s_j)^T V^t_q − SV_{jq}) V^t_q + λ V^s_j   (3.7)

∂F/∂U^t_p = Σ_{q=1}^{n_t} I^t_{pq}((U^t_p)^T V^t_q − R^t_{pq}) V^t_q + α Σ_{i=1}^{m_s} ((U^s_i)^T U^t_p − SU_{ip}) U^s_i + λ U^t_p   (3.8)

∂F/∂V^t_q = Σ_{p=1}^{m_t} I^t_{pq}((U^t_p)^T V^t_q − R^t_{pq}) U^t_p + β Σ_{j=1}^{n_s} ((V^s_j)^T V^t_q − SV_{jq}) V^s_j + λ V^t_q   (3.9)
In the training phase, these four variables are updated according to the following rules:

U_s ← U_s − ε ∂F/∂U_s,  V_s ← V_s − ε ∂F/∂V_s,
U_t ← U_t − ε ∂F/∂U_t,  V_t ← V_t − ε ∂F/∂V_t   (3.10)

The learning rate ε determines how far each variable moves during an iteration. The learning rate can be adjusted automatically by applying binary search. The initial learning rate ε is set to 0.001 in the experiments.
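A minimal NumPy sketch of this alternating training loop, vectorizing Equations (3.6)-(3.10) over all users and items, might look as follows. The fixed learning rate (rather than the binary-search adjustment), the initialization scale, and the function signature are illustrative assumptions.

```python
import numpy as np

def train_etagcdcf(Rs, Rt, SU, SV, d=10, alpha=0.01, beta=0.01,
                   lam=0.01, eps=0.001, iters=200, seed=0):
    """Sketch of optimizing Eq. (3.5) by alternating gradient descent.

    Rs, Rt : source/target rating matrices (0 = unobserved entry).
    SU     : (ms, mt) cross-domain user-to-user similarity matrix.
    SV     : (ns, nt) cross-domain item-to-item similarity matrix.
    Returns Us (d, ms), Vs (d, ns), Ut (d, mt), Vt (d, nt).
    """
    rng = np.random.default_rng(seed)
    ms, ns = Rs.shape
    mt, nt = Rt.shape
    Us = rng.standard_normal((d, ms)) * 0.1
    Vs = rng.standard_normal((d, ns)) * 0.1
    Ut = rng.standard_normal((d, mt)) * 0.1
    Vt = rng.standard_normal((d, nt)) * 0.1
    Is, It = (Rs > 0).astype(float), (Rt > 0).astype(float)
    for _ in range(iters):
        Es = Is * (Us.T @ Vs - Rs)   # masked source rating residuals
        Et = It * (Ut.T @ Vt - Rt)   # masked target rating residuals
        Eu = Us.T @ Ut - SU          # user-similarity residuals
        Ev = Vs.T @ Vt - SV          # item-similarity residuals
        gUs = Vs @ Es.T + alpha * Ut @ Eu.T + lam * Us   # Eq. (3.6)
        gVs = Us @ Es + beta * Vt @ Ev.T + lam * Vs      # Eq. (3.7)
        gUt = Vt @ Et.T + alpha * Us @ Eu + lam * Ut     # Eq. (3.8)
        gVt = Ut @ Et + beta * Vs @ Ev + lam * Vt        # Eq. (3.9)
        Us, Vs = Us - eps * gUs, Vs - eps * gVs          # Eq. (3.10)
        Ut, Vt = Ut - eps * gUt, Vt - eps * gVt
    return Us, Vs, Ut, Vt
```

Predictions for the target domain are then read off as R_t ≈ U_t^T V_t.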
3.4 Experiments
In this section, a series of experiments is conducted to evaluate the proposed model ETagCDCF under the setting of limited overlapping tags. First, the datasets used in the experiments are described, and then the experimental settings are introduced. Next, the impact of parameters on the final recommendation performance is studied. Finally, comparisons with single-domain and cross-domain recommendation approaches are performed to validate the effectiveness of ETagCDCF.
3.4.1 Description of dataset and experimental settings
ETagCDCF is evaluated on two publicly available datasets: the MovieLens 10M dataset [1] and the LibraryThing dataset [2]. The MovieLens 10M (ML) dataset contains over 10 million ratings and 100,000 tag applications applied to 10,681 movies by 71,567 users. The LibraryThing (LT) dataset contains over 700,000 ratings and 2 million tag applications assigned by 7,564 users to 39,519 books. Ratings in both datasets are given on a 1-5 scale with interval steps of 0.5.
There are three principles in designing the experiments. First, the goal is to exploit domain-specific tags to improve recommendation performance when the number of overlapping tags is limited. Second, limited ratings should be provided for each individual user to simulate the cold-start context. Finally, all the results should be reproducible in future. Considering all three factors, a compromise is reached by extracting the first 1,000 users and the first 1,000 items from both original datasets. Specifically, we only keep ratings and tag assignments whose user and item identifiers are both within the range of 1,000. Based on this criterion, a small subset of each original dataset is kept. Due to the characteristics of the original datasets, the tag assignments in our ML dataset are too limited to share enough common tags with the LT dataset. In addition, only a small number of ratings remain in our LT dataset, which further decreases the number of ratings contributed by each user in the LT dataset. Table 3.1
shows the statistics of the final datasets. The rating sparsity is calculated as 1 − #ratings/(#users × #items), which measures the proportion of unobserved entries in the whole
[1] http://www.grouplens.org/node/73
[2] http://www.macle.nl/tud/LT/
Table 3.1 Statistics of the datasets used in Chapter 3

                                     MovieLens 10M   LibraryThing
Users                                946             726
Items                                857             256
Ratings                              41,894          2,779
Rating sparsity                      94.83%          98.50%
Unique tags                          138             548
Tag assignments                      256             2,779
Ratio of overlapping (shared) tags   13.04%          3.28%
rating space. In addition, there are only 18 overlapping tags shared between ML
and LT datasets.
For ETagCDCF, the number of tag clusters k and the dimensionality of the latent factors d are both set to 10, because experiments on the validation set reveal that this combination of parameters achieves the best performance. The regularization parameter λ in Equation 3.5 is set to 0.01 after tuning on the validation set. The selection of the two trade-off parameters α and β is further discussed in Section 3.4.2.
For each dataset, 5-fold cross validation is conducted, and the averaged result
is reported as the final result. All the comparisons are evaluated by Mean Absolute
Error (MAE) and Root Mean Square Error (RMSE), which are widely applied to
measure the performance of rating prediction task. The definitions of MAE and
RMSE are shown below:
MAE = (Σ_{i,j} |r̂_{ij} − r_{ij}|) / N   (3.11)

RMSE = √( Σ_{i,j} (r̂_{ij} − r_{ij})² / N )   (3.12)

where r̂_{ij} denotes the predicted rating of user i on item j, r_{ij} is the real rating, and N is the total number of test ratings.
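For concreteness, a small NumPy helper implementing Equations (3.11)-(3.12) on a flat list of test ratings:

```python
import numpy as np

def mae_rmse(pred, true):
    """Eqs. (3.11)-(3.12): rating-prediction error over N test ratings."""
    err = np.asarray(pred, dtype=float) - np.asarray(true, dtype=float)
    mae = np.mean(np.abs(err))           # mean absolute error
    rmse = np.sqrt(np.mean(err ** 2))    # root mean square error
    return mae, rmse

mae, rmse = mae_rmse([3.0, 4.5], [3.5, 4.5])
# MAE = 0.25, RMSE ≈ 0.354: RMSE penalizes large errors more heavily.
```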
3.4.2 Impact of parameters
In this subsection, experiments are conducted to investigate the impact of the two trade-off parameters α and β in ETagCDCF, which respectively control the contributions of the cross-domain user-to-user and item-to-item similarities to the objective function in Equation 3.5.
The same strategy as in (Shi et al., 2011) is adopted to tune these parameters on the validation set. First, β = 0 is kept fixed and the value of α is varied within the range [0.0001, 0.001, 0.01, 0.1] to check the resulting MAE and RMSE. These results are shown in Figure 3.2. Based on the results, α is set to 0.01 as it achieves the lowest error on both the LT and ML datasets. Next, α = 0.01 is kept fixed and the value of β is varied within the range [0.0001, 0.001, 0.01, 0.1] to check the variation of MAE and RMSE. The corresponding results are also shown in Figure 3.2. According to the above results, α = 0.01 and β = 0.01 are set as the optimal values.
3.4.3 Performance comparison
Several single-domain and cross-domain recommendation approaches are chosen as baselines in the experiments; they are described as follows:
Fig. 3.2 MAE and RMSE variations via changing α and β
UCF (Herlocker et al., 1999): User-based collaborative filtering is a conventional memory-based single-domain recommendation approach. It looks for users who share similar tastes with the active user to calculate a prediction for that user. The key challenge is to compute similarities between all pairs of users. In our implementation, the similarity is computed by the Pearson correlation coefficient and the neighborhood size is set to 50.
ICF (Sarwar et al., 2001): Item-based collaborative filtering is a model-based single-domain recommendation approach. It generates recommendations for a user by finding items that are similar to the items the user has liked before. Since the relationships between items are relatively static, the item-item similarity model does not have to be rebuilt often; as a result, it reduces computation and usually performs better than UCF on sparse datasets. In our implementation, the adjusted cosine similarity is adopted to compute item similarities.
SVD (Sarwar et al., 2000): Singular Value Decomposition is another well-known model-based single-domain recommendation approach, which relies on a matrix factorization technique and decomposes a rating matrix into three matrices while reducing the dimensionality of the product space. It maps users and items into a low-dimensional space and discovers the intrinsic relationships among the latent feature vectors of users and items for making recommendations. In our implementation, the dimensionality of the latent feature space is set to 10.
TagCDCF (Shi et al., 2011): Tag-induced cross-domain collaborative filtering
is a recently proposed tag-based cross-domain recommendation approach, which
utilizes overlapping tags to connect cross-domain users and cross-domain items so
that knowledge can be transferred between domains through those similar users
and items.
The performance of ETagCDCF and the baselines is shown in Table 3.2 and Table 3.3. A smaller value of MAE or RMSE means better performance. The experimental results reveal several interesting findings, which are discussed in the following.
To study whether the knowledge obtained from the auxiliary domain is useful for improving recommendation performance in the target domain, this study compares the results of the cross-domain recommendation models (i.e., TagCDCF, ETagCDCF) with the single-domain recommendation benchmarks (i.e., UCF, ICF, SVD). Based on the results, specifically when LT is set as the source domain and ML as the target domain, TagCDCF and ETagCDCF significantly outperform all the single-domain baselines in both MAE and RMSE. For example, the improvement achieved by ETagCDCF is up to 20.97% in MAE and 20.88% in RMSE when compared with ICF. The poor performance of ICF on the ML dataset can be explained by the characteristics of the dataset: the ratings made by the 946 users are widely distributed among the 857 movies, which makes it difficult to collect enough co-rated movies as the foundation for computing similarities between pairs of items. Inaccurate item-item similarities in ICF lead to incorrect recommendations. A similar analysis also applies to UCF. As a representative approach for single-domain recommendation, SVD overcomes the above problem and performs better than UCF and ICF, but it still fails to outperform the cross-domain recommendation approaches (i.e., TagCDCF, ETagCDCF). This indicates that the knowledge learned from the LT dataset (source domain) indeed helps to facilitate recommendation in the ML dataset (target domain). In the opposite situation, with ML as the source domain and LT as the target domain, the cross-domain recommendation algorithms are
Table 3.2 MAE comparison with other baselines (mean ± std)

Models      ML                      LT
UCF         0.901994 (0.006238)     0.762814 (0.023162)
ICF         0.888476 (0.004812)     0.861590 (0.004147)
SVD         0.899647 (0.005595)     0.763140 (0.011803)
TagCDCF     0.779907 (0.005225)     0.891171 (0.017739)
ETagCDCF    0.702188 (0.003401)     0.818152 (0.017481)

Table 3.3 RMSE comparison with other baselines (mean ± std)

Models      ML                      LT
UCF         1.158626 (0.005768)     1.030641 (0.021892)
ICF         1.138212 (0.016996)     1.110459 (0.026747)
SVD         1.102908 (0.005104)     0.962341 (0.015199)
TagCDCF     1.027675 (0.007242)     1.137431 (0.022205)
ETagCDCF    0.900501 (0.005620)     1.035668 (0.028353)
inferior to the single-domain recommendation algorithms. Specifically, SVD obtains the best results in both MAE and RMSE. There are two main reasons for this phenomenon. First, only 2,779 ratings are available in the LT dataset. The collected rating data do not reach a sufficient scale for matrix factorization to work well, and since matrix factorization is a fundamental building block of ETagCDCF, this explains the poor performance of ETagCDCF. In the implementation of SVD, however, the same pre-processing as in (Sarwar et al., 2000) is applied to fill the sparse rating matrix, so that data sparseness does not cause a sharp drop in performance. Second, there are only 256 tag assignments (138 unique tags) in the ML dataset. Such limited tagging data do not provide enough information for inferring accurate cross-domain similarities, which in turn misleads the knowledge transfer between the domains.
To identify the role of social tags in improving recommendation performance, the performance of SVD is compared with that of TagCDCF and ETagCDCF, because they are all built on the matrix factorization model. The only difference lies in the fact that SVD relies solely on ratings to make recommendations, while TagCDCF and ETagCDCF also integrate tag information into their recommendation models. The results show that both ETagCDCF and TagCDCF perform better than SVD under some conditions. For example, comparing SVD with ETagCDCF on the ML dataset, the improvement achieved by ETagCDCF is up to 21.95% in MAE and 18.35% in RMSE. Based on these results, we can conclude that social tags indeed offer additional information beyond ratings to the factorization.
To check the effectiveness of the domain-specific tags in linking heterogeneous
domains when there is only a limited number of overlapping tags, ETagCDCF
is compared with its counterpart TagCDCF. ETagCDCF achieves better results than TagCDCF on both the LT and ML datasets. The improvement is up to 8.19% in MAE and 8.95% in RMSE on LT. Similarly, on ML the improvement is up to 9.97% in MAE and 12.37% in RMSE. In the experimental setting, there are only 18 overlapping tags between the source and target domains. The weak connection between the domains established by such limited overlapping tags results in the poor performance of TagCDCF; this observation has also been reported in (Shi et al., 2011). In contrast, ETagCDCF is developed to utilize abundant domain-specific tags to bridge the two domains in this situation. The significant improvement achieved by ETagCDCF clearly supports the feasibility of linking heterogeneous domains using domain-specific tags when only a limited number of overlapping tags is available.
3.5 Summary
Compared with the limited overlapping tags shared by both domains, domain-specific tags are abundant and contain information unique to the individual domains. This chapter has proposed a novel tag-based cross-domain collaborative filtering model, which exploits abundant domain-specific tags to bridge disjoint domains. To reduce the distance between non-identical tags, this chapter adopts spectral clustering with a tag co-occurrence pattern to group domain-specific tags. As a result, a new tag representation in the form of tag clusters can be learned to model user and item profiles across domains. With the derived tag representation, more accurate cross-domain user-to-user and item-to-item similarities can be calculated and integrated into the joint matrix factorization process to guide knowledge transfer.
The experimental results demonstrate that ETagCDCF is capable of establishing
a strong domain link to help transfer more knowledge between domains.
Chapter 4
Exploiting Tag-induced
Structural Information for
Cross-domain Recommendation
4.1 Introduction
The explicit domain link learned in Chapter 3 by utilizing abundant domain-specific tags is useful for increasing the correspondence between heterogeneous domains. However, ETagCDCF (see Chapter 3) has some limitations. First, the similarity between domain-specific tags is measured by their co-occurrence relationships with overlapping tags. Isolated domain-specific tags that never co-occur with overlapping tags cannot be grouped accurately, and thus the derived tag clusters add noise to the learning of the inter-domain correlation. Second, although the number of overlapping tags is limited, overlapping tags are the most intuitive features for building a weak inter-domain correlation that aligns domain heterogeneity to some extent. Therefore, it is desirable to exploit the complementary roles of overlapping tags and domain-specific tags in establishing a tighter inter-domain correlation.
Furthermore, according to the principle of domain adaptation, not only the inter-domain similarity but also the intra-domain similarity needs to be maximized to promote knowledge transfer. In single-domain recommendation, learning the intra-domain similarity between users or between items from tag distributions has attracted much attention, and many state-of-the-art techniques have been proposed for improving item recommendation (De Gemmis et al., 2008; Zhao et al., 2008; Zhen et al., 2009). However, the intra-domain correlation, represented in the form of intra-domain similarity, has not yet been considered in the development of cross-domain recommender systems.
In this chapter, we consider the challenge of integrating structural knowledge inferred from tags, including both intra- and inter-domain correlations, into the recommendation framework. Specifically, users and items are first profiled with overlapping tags, and a basic inter-domain correlation is built in the form of cross-domain similarities (i.e., cross-domain user-to-user similarity and cross-domain item-to-item similarity). On the basis of the correspondence established by overlapping tags, further connections between the involved domains are added by clustering domain-specific tags, and the derived tag clusters are used as a new representation to refine the cross-domain similarities. Finally, tagging information is also exploited to compute the intra-domain similarities between users and between items, which are linked to build a compact intra-domain correlation. By adopting both inter- and intra-domain correlations as structural knowledge to regularize joint matrix factorization, a complete tag-induced cross-domain recommendation model, called CTagCDR, is proposed in this chapter to fully explore the complementary roles of tags in promoting knowledge transfer. The experimental results demonstrate that CTagCDR performs well in both rating prediction and item recommendation tasks.
This chapter is organized as follows: Section 4.2 introduces some basic notations
and states the problem formally; Section 4.3 introduces details of the proposed
model and presents the parameter estimation process; Section 4.5 evaluates the
proposed model through a series of experiments; and lastly Section 4.6 summarizes
this chapter.
4.2 Notations
Before describing the proposed model CTagCDR, the notations commonly used in this chapter need to be explained. Without loss of generality, two domains are considered, although CTagCDR can easily be generalized to multiple domains. In this chapter, boldface uppercase letters, such as A, denote matrices. The i-th row and j-th column of matrix A are denoted by A_{i*} and A_{*j}, respectively. The (i, j)-th entry of matrix A is denoted by A_{ij}.
Given a sparse target domain D1 and a dense source domain D2, suppose that the π-th domain (π = 1, 2) has a set of n_π users U^π, m_π items V^π and l_π tags T^π. The tags in the π-th domain are divided into two parts according to their distributions across domains: the shared tags T_c and the domain-specific tags T^π_s. The shared tags are the domain-independent tags that appear in both D1 and D2, denoted by T_c = {t^c_m | t^c_m ∈ T^1 ∩ T^2, 1 ≤ m ≤ l_c}. The domain-specific tags are the domain-dependent tags that are exclusive to an individual domain, denoted by T^π_s = {t^s_n | t^s_n ∈ T^π − T_c, 1 ≤ n ≤ l_π − l_c}.
Let R^π ∈ R^{n_π×m_π} be the sparse user-item interaction matrix for the users U^π and items V^π, in which r^π_{ij} represents the rating score given by user i to item j in the π-th domain. To mark the observable values in the rating matrix R^π, an indicator matrix I^{R^π} is used, where I^{R^π}_{ij} = 1 if user i rated item j and I^{R^π}_{ij} = 0 otherwise.
By analyzing the tag assignment triplets T^π_{ijk}, each represented in the form {u^π_i, v^π_j, t^π_k} and denoting the action that user i assigned tag k to item j, different kinds of user/item tagging matrices are generated. Specifically, the symbols X^π_u, Y^π_u, Z^π_u are used to denote the user tagging matrices built from the shared tags, the domain-specific tags and all the tags in domain D^π, respectively. Similarly, the symbols X^π_v, Y^π_v, Z^π_v denote the corresponding item tagging matrices. Frequently used notations and their descriptions are summarized in Table 4.1.
4.3 Complete Tag Induced Cross-domain Recommendation
In cross-domain recommendation tasks, where neither users nor items overlap,
the challenge can be formulated by employing the relationships between users
and tags, and between items and tags, to build a reliable domain connection, so
that knowledge from source domain can be confidently transferred to improve
prediction in the target domain.
Table 4.1 Notations and corresponding descriptions used in Chapter 4

Symbols          Descriptions
π                domain index, π = 1, 2
D^π              domain π
n_π, m_π, l_π    number of users, items, tags in D^π, respectively
l_c, l^π_s       number of shared tags, domain-specific tags in D^π, respectively
f                number of latent factors
U^π              set of users in D^π, U^π = {u^π_i | 1 ≤ i ≤ n_π}
V^π              set of items in D^π, V^π = {v^π_j | 1 ≤ j ≤ m_π}
T^π              set of tags in D^π, T^π = {t^π_k | 1 ≤ k ≤ l_π}
T_c              set of shared tags, T_c = {t^c_m | 1 ≤ m ≤ l_c}
T^π_d            set of domain-specific tags in D^π, T^π_d = {t^s_n | 1 ≤ n ≤ l^π_s}
T^π_{ijk}        tag assignment made by user i on item j with tag k in D^π
R^π              n_π × m_π rating matrix in D^π
U^π              n_π × f latent feature matrix of users in D^π
V^π              m_π × f latent feature matrix of items in D^π
X^π_u            n_π × l_c user tagging matrix based on shared tags
Y^π_u            n_π × (l_π − l_c) user tagging matrix based on domain-specific tags in D^π
Z^π_u            n_π × l_π user tagging matrix based on complete tags in D^π
X^π_v            m_π × l_c item tagging matrix based on shared tags
Y^π_v            m_π × (l_π − l_c) item tagging matrix based on domain-specific tags in D^π
Z^π_v            m_π × l_π item tagging matrix based on complete tags in D^π
I^A              indicator matrix for matrix A
A_{i*}           the i-th row of matrix A
A_{*j}           the j-th column of matrix A
‖A‖              Frobenius norm of matrix A
To exploit the full potential of tagging information, the proposed CTagCDR model aims to infer a strong inter-domain correlation and a compact intra-domain correlation from the tagging data. Specifically, CTagCDR is composed of the following four major steps:
Step 1: Building basic inter-domain correlations using shared tags;
Step 2: Enhancing inter-domain correlations using domain-specific tag clusters;
Step 3: Inferring intra-domain correlations from tags in individual domains;
Step 4: Aggregating and integrating inter- and intra-domain structural knowledge.
The workflow of the CTagCDR model is illustrated in Figure 4.1.
4.3.1 Step 1: Building basic inter-domain correlations using shared tags
In traditional CF approaches (Konstan et al., 1997; Sarwar et al., 2001), a user is
represented by a vector defined over the entire item space. This reflects a user’s
preference for items that s/he is interested in. Similarly, an item is represented by
a vector defined over the entire user space, which indicates the users that have
shown an interest in this item. Due to the heterogeneity of disjoint domains, this
way of modelling fails to characterize cross-domain users and items in a unified
way. Considering the property that social tags can encode both user preferences
and item attributes, (Shi et al., 2011) proposes to break {user, item, tag} ternary
relationships into two binary relationships: {user, tag} and {item, tag}, and build
user and item profiles through shared tags. As a result, cross-domain users and
items could be mapped to the same space built by shared tags for comparison.
Fig. 4.1 Workflow and components of CTagCDR model.
Nevertheless, only binary information is taken into account when constructing the user and item tagging matrices in (Shi et al., 2011), which loses the ability to distinguish the different tag distributions on the user and item sides. To fully exploit the quantitative information encoded in the shared tags T_c, TF-IDF weighting is applied to build a user tagging matrix X^π_u. In particular, the (i, m)-th element is defined as the tf-idf value between user i and shared tag m, as shown below,
[X^π_u]_{im} = tf_u(i, m) × log₂(n_π / df_u(m)) if user i used tag m, and [X^π_u]_{im} = 0 otherwise.   (4.1)

where tf_u(i, m) denotes the normalized frequency of tag m in user i's tagging history over all items, and df_u(m) denotes the number of users who have used tag m. Note that if user i has never used tag m, then [X^π_u]_{im} = 0.
Similarly, the distribution of shared tags over items can be modelled and represented in an item tagging matrix X^π_v. The (j, m)-th element is defined as follows,

[X^π_v]_{jm} = tf_v(j, m) × log₂(m_π / df_v(m)) if item j was labelled with tag m, and [X^π_v]_{jm} = 0 otherwise.   (4.2)
where tf_v(j, m) denotes the normalized occurrence frequency of tag m on item j over all users, and df_v(m) denotes the number of items that have been attached to the shared tag m. If item j has never been attached to shared tag m, then [X^π_v]_{jm} = 0.
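Equation (4.1) can be sketched in plain Python as follows; the triplet-list input format and the dict-of-dicts output are illustrative assumptions, and the item-side matrix of Eq. (4.2) would be built symmetrically.

```python
import math
from collections import Counter

def user_tagging_matrix(assignments, users, shared_tags):
    """Sketch of Eq. (4.1): TF-IDF user tagging matrix X_u.

    assignments : list of (user, item, tag) triplets T^pi_ijk.
    Returns X[u][t] = tf_u(u, t) * log2(n_pi / df_u(t)) for shared tags.
    """
    n = len(users)
    # tf: tag counts per user, later normalized by the user's total
    # number of tag assignments.
    per_user = {u: Counter() for u in users}
    for u, _, t in assignments:
        per_user[u][t] += 1
    # df: number of distinct users who used each tag.
    df = Counter()
    for u in users:
        for t in per_user[u]:
            df[t] += 1
    X = {}
    for u in users:
        total = sum(per_user[u].values())
        if total == 0:          # user with no tagging history
            X[u] = {}
            continue
        X[u] = {t: (per_user[u][t] / total) * math.log2(n / df[t])
                for t in shared_tags if per_user[u][t] > 0}
    return X

# Toy example: user 'a' used tag 'x' twice out of three assignments,
# and 'x' was used by 1 of 2 users, so X['a']['x'] = (2/3) * log2(2).
triplets = [('a', 1, 'x'), ('a', 2, 'x'), ('a', 1, 'y'), ('b', 1, 'y')]
X = user_tagging_matrix(triplets, ['a', 'b'], {'x', 'y'})
```

A tag used by every user gets idf = log₂(n/n) = 0, so fully generic shared tags contribute nothing to the profile, which is exactly the discriminative weighting the TF-IDF scheme is meant to provide.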
Once the user and item profiles are generated from the shared tags and vectorized, the cross-domain user and item similarities can be computed with different similarity metrics. For simplicity, cosine similarity is used to compute the cross-domain user-to-user similarity matrix S^u ∈ R^{n_1×n_2} and the item-to-item similarity matrix S^v ∈ R^{m_1×m_2} as:

S^u_{ip} = Σ_{d=1}^{l_c} (X^1_u)_{id} (X^2_u)_{pd} / ( √(Σ_{d=1}^{l_c} (X^1_u)_{id}²) × √(Σ_{d=1}^{l_c} (X^2_u)_{pd}²) )

S^v_{jq} = Σ_{d=1}^{l_c} (X^1_v)_{jd} (X^2_v)_{qd} / ( √(Σ_{d=1}^{l_c} (X^1_v)_{jd}²) × √(Σ_{d=1}^{l_c} (X^2_v)_{qd}²) )   (4.3)
These similarity matrices encode the information of shared tags and act as basic
inter-domain correlation between the source and target domains. This process is
presented in Algorithm 4.1.
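Equation (4.3) can be computed in vectorized form by row-normalizing the two profile matrices; the ε guard against empty profiles is an added assumption beyond the thesis's formula.

```python
import numpy as np

def basic_inter_domain_similarity(X1, X2, eps=1e-12):
    """Eq. (4.3): row-wise cosine similarity between two TF-IDF profile
    matrices built over the lc shared tags (works for users or items).

    X1 : (n1, lc) profiles in the target domain D1.
    X2 : (n2, lc) profiles in the source domain D2.
    Returns S of shape (n1, n2) with S[i, p] = cos(X1[i], X2[p]).
    """
    n1 = X1 / (np.linalg.norm(X1, axis=1, keepdims=True) + eps)
    n2 = X2 / (np.linalg.norm(X2, axis=1, keepdims=True) + eps)
    return n1 @ n2.T

# Toy check: identical profiles give similarity 1, orthogonal give 0.
X1 = np.array([[1.0, 0.0], [0.0, 1.0]])
X2 = np.array([[1.0, 0.0]])
S = basic_inter_domain_similarity(X1, X2)
```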
4.3.2 Step 2: Enhancing inter-domain correlations using
domain-specific tag clusters
Shared tags help to address domain heterogeneity when modelling cross-domain users and items. However, it is usually difficult to collect enough shared tags for disjoint domains, and the resulting loose coupling between the domains leads to inaccurate predictions. Furthermore, it is wasteful to abandon domain-specific tags, which are abundant and able to reflect the intrinsic properties of the individual domains. By using domain-specific tags to connect the cross-domain users and
Algorithm 4.1 Basic inter-domain correlation construction
Input: tag assignment triplets T^π_tri (π = 1, 2).
Output: basic cross-domain similarity matrices S^u_b and S^v_b.
1: Get the shared tag set T_c = T^1 ∩ T^2.
2: Initialize X^π_u ∈ R^{n_π×l_c} and X^π_v ∈ R^{m_π×l_c} with zeros.
3: for π = 1, 2 do
4:   for i = 1, 2, ..., n_π do
5:     for m = 1, 2, ..., l_c do
6:       Given the tag assignment triplets T^π_tri, fill element [X^π_u]_{im} in X^π_u by Eq. (4.1).
7:     end for
8:   end for
9:   for j = 1, 2, ..., m_π do
10:     for m = 1, 2, ..., l_c do
11:       Given the tag assignment triplets T^π_tri, fill element [X^π_v]_{jm} in X^π_v by Eq. (4.2).
12:     end for
13:   end for
14: end for
15: Compute S^u_b and S^v_b by Eq. (4.3)
items, more domain linkages can be established to enhance the inter-domain correlation.
However, a number of obstacles hinder the application of domain-specific tags. First, it is not trivial to employ heterogeneous domain-specific tags as features to link different domains, even though assembling tags from different domains into a single pool helps to address domain heterogeneity. In addition, the pairwise interactions, such as those between users and tags and between items and tags, are generally sparse due to the power-law phenomenon; modelling them directly poses two further problems: poor scalability and heavy computational cost. Second, tags are arbitrary words generated by users from an uncontrolled vocabulary, so ambiguity and redundancy exist in the tagging data; the recommendation performance will be undermined if this problem is not addressed. In this context, tag clustering provides a natural solution to these challenges. By clustering domain-specific tags, the derived tag clusters reduce the ambiguity and redundancy in the data. Furthermore, the clusters serve as high-level, compact representations of the domain-specific tags, which provide a way to establish unified user and item profiles across different domains.
To cluster the diverse domain-specific tags, a tag co-occurrence pattern is designed
to consider the relationships between the shared and domain-specific tags (Pan
et al., 2010a). Specifically, if domain-specific tags from different domains occur
with the same shared tag in the user or item tagging histories, they will be grouped
into the same tag cluster. To avoid focusing on the tags that are associated
with the most users and items, domain-specific tags are filtered based on information
entropy instead of directly selecting those with the highest usage frequency. Given a
shared tag m, the way to measure the importance of a domain-specific tag n on the
user side is defined as follows:
\theta(m, n) =
\begin{cases}
-\dfrac{\alpha(m, n)}{\beta(m, n)} \log_2 \dfrac{\alpha(m, n)}{\beta(m, n)}, & \text{if } \alpha(m, n) \neq 0 \\
0, & \text{otherwise}
\end{cases}
\qquad (4.4)
where α(m, n) denotes the number of users who have used both the shared tag m and
the domain-specific tag n in their tagging histories, and β(m, n) denotes the number
of users who used either the shared tag m or the domain-specific tag n to describe
their interests. If θ(m, n) > θ(m), we keep the domain-specific tag n; otherwise
we abandon it. Here θ(m) denotes the filtering threshold with regard to the shared
tag m, which is set as the average importance over all domain-specific tags in
our experiments. The counterpart filtering on the item side is obtained in a
similar manner.
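The filtering rule above can be sketched in a few lines; the counts and tag names below are hypothetical, not drawn from the thesis datasets:

```python
import math

def theta(alpha, beta):
    """Entropy-style importance of a domain-specific tag w.r.t. a shared tag (Eq. 4.4)."""
    if alpha == 0:
        return 0.0
    p = alpha / beta
    return -p * math.log2(p)

# For one shared tag m: alpha = co-usage count, beta = either-usage count (toy values).
pairs = {"tag_a": (5, 10), "tag_b": (0, 7), "tag_c": (9, 10)}
scores = {n: theta(a, b) for n, (a, b) in pairs.items()}
threshold = sum(scores.values()) / len(scores)   # average importance as theta(m)
kept = [n for n, s in scores.items() if s > threshold]
print(scores["tag_a"])  # -0.5 * log2(0.5) = 0.5
```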
Filtering the user-tag relationship results in a tripartite graph Gu =
{V, E}, as shown in Figure 4.2, which represents the relationship between the domain-
specific tags and the shared tags, where V is the set of nodes in the graph
and E denotes the set of undirected edges. Let lu denote the number of filtered
domain-specific tags. There are then two types of nodes in V: lu nodes for
the domain-specific tags and lc nodes for the shared tags. Note that an edge in E
only exists between nodes of different types, i.e. between a shared tag m and
a domain-specific tag n, and the edge weight indicates the similarity of the
connected nodes. Taking the nodes m and n in Figure 4.2 as an example, the edge
Fig. 4.2 Example of the tag tripartite graph constructed from the user-tag relationship. Red squares denote shared tags present in both domains, while the green triangles and blue circles denote the filtered domain-specific tags from the source and target domains, respectively. The edge weight reflects the similarity between the connected tags.
weight is set as the Jaccard similarity:

sim(m, n) = \frac{\alpha(m, n)}{\beta(m, n)}

Then we can define a (l_u + l_c) \times (l_u + l_c) affinity matrix A_u for this graph:

A_u = \begin{bmatrix} I_{l_u \times l_u} & B \\ B^\top & I_{l_c \times l_c} \end{bmatrix}

where B is the l_u \times l_c connectivity matrix whose elements denote the similarity
between the filtered domain-specific tags and the shared tags, and I denotes the
identity matrix. Similarly, we can derive a tripartite graph Gv and an affinity
matrix Av from the item side.
Once we have an affinity matrix A_u (A_v) representing the tag co-occurrence
pattern, any clustering technique that takes a similarity measure as its input can
be applied. Specifically, Affinity Propagation (Frey and Dueck, 2007) is adopted
as our clustering method, since it automatically determines the number of clusters
from the data provided, which saves us from tuning this parameter during
validation. The k_u (k_v) tag clusters generated from the user-tag (item-tag)
relation are used as new features to profile cross-domain users (items), so that
users and items from different domains are mapped to the same space spanned by
the domain-specific tag clusters.
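Since Affinity Propagation accepts a precomputed similarity matrix, the clustering step can be sketched with scikit-learn (one common implementation; the affinity values below are hypothetical toy data):

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

# Toy affinity matrix A_u over (l_u + l_c) = 4 tags: two well-separated pairs.
A = np.array([
    [1.0, 0.8, 0.1, 0.0],
    [0.8, 1.0, 0.0, 0.1],
    [0.1, 0.0, 1.0, 0.7],
    [0.0, 0.1, 0.7, 1.0],
])
# affinity="precomputed" takes similarities directly; the number of clusters
# k_u is chosen automatically, as noted in the text.
ap = AffinityPropagation(affinity="precomputed", random_state=0)
labels = ap.fit_predict(A)   # one cluster label per tag
print(labels.shape)          # (4,)
```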
In the new subspace, the user and item vectors are aligned and used to
compute new cross-domain user-to-user and item-to-item similarity matrices,
whose construction takes only the domain-specific tags into account. In this way,
we can separately study the impact of the domain-specific tags in linking
different domains, and exploit their encoded information to regularize matrix
factorization. We describe this process in Algorithm 4.2.
4.3.3 Step 3: Inferring intra-domain correlations from tags
in individual domains
Existing tag-based CDCF approaches (Fang et al., 2015; Shi et al., 2011, 2013a)
mainly focus on using tags directly as aligned features to build a bridge between
different domains. This way of modelling helps to build an inter-domain connection
for knowledge transfer. However, their models ignore adding constraints to each
Algorithm 4.2 Complementary inter-domain correlation construction
Input: Tag assignment triplets T^π_tri (π = 1, 2).
Output: Complementary cross-domain similarity matrices S^u_c and S^v_c.
1: Combine the original domain-specific tags from the source and target domains: T_d = T^1_d + T^2_d.
2: Filter T_d by Eq. (4.4) to get l_u domain-specific tags T^u_d based on the user-tag relationship; similarly, filter T_d again to get l_v domain-specific tags T^v_d based on the item-tag relationship.
3: Based on the filtered domain-specific tags T^u_d and T^v_d, build tagging matrices Y^π_u and Y^π_v using Eqs. (4.1) and (4.2), respectively.
4: Construct affinity matrices A_u and A_v as explained in Section 4.3.2.
5: Apply Affinity Propagation on A_u (A_v) to get k_u (k_v) tag clusters, then form the tag membership matrices M_u ∈ R^{(l_u+l_c)×k_u} (M_v ∈ R^{(l_v+l_c)×k_v}), where M_u and M_v take only binary values {0, 1} and each row contains exactly one '1'.
6: for π = 1, 2 do
7:   Y^π_u ← Y^π_u × [M_u]_{[1:l_u,∗]};  Y^π_v ← Y^π_v × [M_v]_{[1:l_v,∗]}
8: end for
9: Given Y^π_u and Y^π_v, compute S^u_c and S^v_c by Eq. (4.3).
user or item involved in the domain. The resulting loose internal coupling of each
individual domain will inevitably have side effects on the knowledge transfer.
Moreover, the implicit user and item relationships within an individual domain
can also be elicited from tagging data and exploited to add valuable information
that improves recommendation performance.
Following this thread, we adopt the idea of TagiCoFi (Zhen et al., 2009),
which employs user similarities extracted from user tagging histories to regularize
the matrix factorization procedure. Specifically, it adds a regularization
term tr(UᵀLU) to the objective function of the PMF model, where tr(·) denotes
the trace of a matrix, L = D − S is the Laplacian matrix, S is the
tag-based user similarity matrix, and D is a diagonal matrix whose diagonal
elements are D_{ii} = ∑_j S_{ij}. The tag-based user similarity matrix S can be
computed using equations (5-6) in (Zhen et al., 2009). Due to space limitations,
we omit the details here.
The regularization term drives the latent factors of the users with similar
tagging behaviors to be similar as well. Such an extension can be considered as
adding an intra-domain correlation from the perspective of users. Similarly, we
can explore the tagging histories of items within the individual domain and define
another type of intra-domain correlation in the context of items.
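The regularizer tr(UᵀLU) equals ½ Σ_{i,j} S_{ij} ‖U_{i∗} − U_{j∗}‖², so it grows when users with similar tagging behaviour have distant latent factors. A toy illustration (hypothetical values, not the thesis data):

```python
import numpy as np

def laplacian_reg(U, S):
    """tr(Uᵀ L U) with L = D − S: larger when similar users' factors drift apart."""
    D = np.diag(S.sum(axis=1))
    L = D - S
    return np.trace(U.T @ L @ U)

S = np.array([[0.0, 1.0], [1.0, 0.0]])       # two users, fully similar tagging
U_close = np.array([[1.0, 0.0], [1.0, 0.0]]) # identical latent factors
U_far   = np.array([[1.0, 0.0], [0.0, 1.0]]) # orthogonal latent factors
print(laplacian_reg(U_close, S), laplacian_reg(U_far, S))  # 0.0 2.0
```

Minimizing the objective therefore pulls the factors of similar users together, which is the "internal control" the text describes.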
Since in our problem we have two disjoint domains, to add an internal control
for each domain and keep knowledge flowing among similar users or items, we add
regularizations from both user and item perspectives by exploiting intra-domain
user and item tagging histories. The inferred intra-domain correlations are added
as regularization terms in Eq. 4.6.
4.3.4 Step 4: Aggregation and integration of inter- and
intra-domain knowledge
The roles of shared tags and domain-specific tags have been studied separately
in linking different domains. Both types of tags have advantages and
disadvantages: shared tags serve as aligned, domain-independent features but are
often insufficient in quantity, while clustering diverse domain-specific tags
accurately is nontrivial and inevitably introduces noise into the similarity
calculations. As a result, we propose aggregating their contributions to fully
exploit the complementary roles of shared and domain-specific tags in transferring
knowledge. Therefore, we define the cross-domain user-to-user and item-to-item
similarities as:
S^u = I_{S^u_b} \circ S^u_b + \left[ \mathbf{1} - I_{S^u_b} \right] \circ S^u_c
S^v = I_{S^v_b} \circ S^v_b + \left[ \mathbf{1} - I_{S^v_b} \right] \circ S^v_c
\qquad (4.5)
where ◦ denotes element-wise multiplication. By assembling the cross-domain
similarities generated with shared and domain-specific tags, more user and item
connections between the domains are built. In addition, we also integrate
the intra-domain correlations inferred from the user and item tagging histories
of the individual domains, so that knowledge is transferred not only
between domains but also within them.
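Reading I_{S^u_b} as the indicator of entries already covered by shared tags (our interpretation of Eq. (4.5)), the aggregation falls back to the cluster-based similarity wherever the shared-tag similarity is empty. A toy sketch:

```python
import numpy as np

# Toy basic (shared-tag) and complementary (cluster-based) similarities.
Su_b = np.array([[0.9, 0.0], [0.0, 0.4]])
Su_c = np.array([[0.5, 0.7], [0.6, 0.3]])
I_b = (Su_b != 0).astype(float)         # indicator of shared-tag coverage
Su = I_b * Su_b + (1.0 - I_b) * Su_c    # Eq. (4.5): clusters fill the gaps
print(Su)   # [[0.9 0.7]
            #  [0.6 0.4]]
```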
To this end, we propose to extend joint matrix factorization by imposing
structural constraints on the inter- and intra-domain correlations. Specifically, we
minimize the following objective function:

f(R^1, R^2 \mid U^1, V^1, U^2, V^2)
  = \frac{1}{2} \sum_{i=1}^{n_1} \sum_{j=1}^{m_1} I^{R^1}_{ij} \big( R^1_{ij} - g(U^1_{i*} V^{1\top}_{j*}) \big)^2
  + \frac{1}{2} \sum_{p=1}^{n_2} \sum_{q=1}^{m_2} I^{R^2}_{pq} \big( R^2_{pq} - g(U^2_{p*} V^{2\top}_{q*}) \big)^2
  + \frac{\lambda_u}{2} \sum_{i=1}^{n_1} \sum_{p=1}^{n_2} I^{S^u}_{ip} \big( S^u_{ip} - g(U^1_{i*} U^{2\top}_{p*}) \big)^2
  + \frac{\lambda_v}{2} \sum_{j=1}^{m_1} \sum_{q=1}^{m_2} I^{S^v}_{jq} \big( S^v_{jq} - g(V^1_{j*} V^{2\top}_{q*}) \big)^2
  + \frac{\lambda_\alpha}{2} \big( \mathrm{tr}(U^{1\top} L^1_u U^1) + \mathrm{tr}(V^{1\top} L^1_v V^1) + \mathrm{tr}(U^{2\top} L^2_u U^2) + \mathrm{tr}(V^{2\top} L^2_v V^2) \big)
  + \frac{\lambda_\beta}{2} \big( \|U^1\|^2 + \|V^1\|^2 + \|U^2\|^2 + \|V^2\|^2 \big)
  \qquad (4.6)
where g(·) is a logistic normalization function, which is set as the sigmoid
function in our experiments, and λ_u, λ_v, λ_α, λ_β are hyper-parameters
controlling the inter-domain user similarity, the inter-domain item similarity,
the intra-domain similarity, and the regularization of the latent factors,
respectively. The details of how the parameters are tuned and the effects of
different values are presented in the following section.
To optimize the proposed model, we apply stochastic gradient descent to
update U^1, V^1, U^2, V^2 alternately. The derivative with respect to each
variable is computed
as follows:

\frac{\partial f}{\partial U^1_{i*}}
  = \sum_{j=1}^{m_1} I^{R^1}_{ij} \big( g(U^1_{i*} V^{1\top}_{j*}) - R^1_{ij} \big) g'(U^1_{i*} V^{1\top}_{j*}) V^1_{j*}
  + \lambda_u \sum_{p=1}^{n_2} I^{S^u}_{ip} \big( g(U^1_{i*} U^{2\top}_{p*}) - S^u_{ip} \big) g'(U^1_{i*} U^{2\top}_{p*}) U^2_{p*}
  + \lambda_\alpha (L^1_u U^1)_{i*} + \lambda_\beta U^1_{i*}
  \qquad (4.7)

\frac{\partial f}{\partial V^1_{j*}}
  = \sum_{i=1}^{n_1} I^{R^1}_{ij} \big( g(U^1_{i*} V^{1\top}_{j*}) - R^1_{ij} \big) g'(U^1_{i*} V^{1\top}_{j*}) U^1_{i*}
  + \lambda_v \sum_{q=1}^{m_2} I^{S^v}_{jq} \big( g(V^1_{j*} V^{2\top}_{q*}) - S^v_{jq} \big) g'(V^1_{j*} V^{2\top}_{q*}) V^2_{q*}
  + \lambda_\alpha (L^1_v V^1)_{j*} + \lambda_\beta V^1_{j*}
  \qquad (4.8)

\frac{\partial f}{\partial U^2_{p*}}
  = \sum_{q=1}^{m_2} I^{R^2}_{pq} \big( g(U^2_{p*} V^{2\top}_{q*}) - R^2_{pq} \big) g'(U^2_{p*} V^{2\top}_{q*}) V^2_{q*}
  + \lambda_u \sum_{i=1}^{n_1} I^{S^u}_{ip} \big( g(U^1_{i*} U^{2\top}_{p*}) - S^u_{ip} \big) g'(U^1_{i*} U^{2\top}_{p*}) U^1_{i*}
  + \lambda_\alpha (L^2_u U^2)_{p*} + \lambda_\beta U^2_{p*}
  \qquad (4.9)

\frac{\partial f}{\partial V^2_{q*}}
  = \sum_{p=1}^{n_2} I^{R^2}_{pq} \big( g(U^2_{p*} V^{2\top}_{q*}) - R^2_{pq} \big) g'(U^2_{p*} V^{2\top}_{q*}) U^2_{p*}
  + \lambda_v \sum_{j=1}^{m_1} I^{S^v}_{jq} \big( g(V^1_{j*} V^{2\top}_{q*}) - S^v_{jq} \big) g'(V^1_{j*} V^{2\top}_{q*}) V^1_{j*}
  + \lambda_\alpha (L^2_v V^2)_{q*} + \lambda_\beta V^2_{q*}
  \qquad (4.10)
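For intuition, a simplified full-batch sketch of the gradient of Eq. (4.7) with respect to U¹ (toy shapes and random data; the observed-entry indicators are taken as nonzero entries, and ratings are assumed normalized to [0, 1] so the sigmoid applies):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def grad_U1(U1, V1, U2, R1, Su, L1u, lam_u, lam_a, lam_b):
    """Dense, full-batch version of Eq. (4.7): gradient of f w.r.t. U1."""
    P = sigmoid(U1 @ V1.T)                       # predicted (normalized) ratings
    I_R = (R1 != 0).astype(float)                # observed-rating indicator
    g_rate = (I_R * (P - R1) * P * (1 - P)) @ V1 # rating term; P*(1-P) is g'
    Q = sigmoid(U1 @ U2.T)                       # predicted cross-domain similarity
    I_S = (Su != 0).astype(float)
    g_sim = lam_u * (I_S * (Q - Su) * Q * (1 - Q)) @ U2
    return g_rate + g_sim + lam_a * (L1u @ U1) + lam_b * U1

# Toy shapes: n1=3 users, m1=4 items, n2=2 target users, k=2 factors.
rng = np.random.default_rng(0)
U1, V1, U2 = rng.normal(size=(3, 2)), rng.normal(size=(4, 2)), rng.normal(size=(2, 2))
R1 = rng.uniform(0, 1, size=(3, 4)) * (rng.random((3, 4)) < 0.5)  # sparse ratings
Su = rng.uniform(0, 1, size=(3, 2))
L1u = np.eye(3)                                  # placeholder Laplacian
step = grad_U1(U1, V1, U2, R1, Su, L1u, 0.1, 0.1, 0.01)
U1 -= 0.05 * step                                # one gradient descent update
print(step.shape)  # (3, 2)
```

The thesis updates the four factor matrices alternately with stochastic (per-entry) steps; the dense version above only illustrates the shape of each term.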
After updating the latent factors of both users and items, we approximate the
target rating matrix R^1 by g(U^1 V^{1\top}) to verify the performance of our
proposed model.
4.4 Complexity analysis
We analyze the time complexity of each of the four major steps of the CTagCDR
model. Step 1 mainly takes O(n_1 n_2 l_c + m_1 m_2 l_c) time, the sum of computing
the cross-domain user-to-user and item-to-item similarity matrices, respectively.
For Step 2, suppose there are Ω_u (Ω_v) nonzero values in the affinity matrix
A_u (A_v); the complexity of applying Affinity Propagation on A_u and A_v is then
O(Ω_u² + Ω_v²). After obtaining k_u (k_v) tag clusters for modelling cross-domain
users (items), it takes an additional O(n_1 n_2 k_u + m_1 m_2 k_v) time to compute
the cross-domain similarity matrices based on the domain-specific tag clusters.
For Step 3, we model users and items with the tags used within each individual
domain and construct user and item similarity matrices for both the source and
target domains, which takes O(n_1² l_1 + n_2² l_2 + m_1² l_1 + m_2² l_2) time.
For Step 4, the core step of CTagCDR, updating the latent factors takes
O(Ω_{R^1} + Ω_{R^2} + Ω_{S^u} + Ω_{S^v} + Ω_{L^1_u} + Ω_{L^1_v} + Ω_{L^2_u} + Ω_{L^2_v})
time in total.
We find that most of the time is spent computing the similarity matrices.
However, on the one hand, parallel computing frameworks such as Spark¹ can be
used to speed up this computation. On the other hand, all the similarity matrices
can be pre-computed as inputs to the algorithm, since they are constructed only
once. Therefore, we believe CTagCDR can scale to large datasets.

¹https://spark.apache.org/
4.5 Experiments
In this section, we conduct a series of experiments to study the performance of
CTagCDR and to test the effectiveness of exploiting both shared and domain-specific
tags to build bridges between disjoint domains for knowledge transfer. We first
describe the datasets used in these experiments, and then explain the experimental
setting, including the benchmark methods and the metrics applied to evaluate the
performance of each approach. This is followed by experiments focused on setting
appropriate parameters for CTagCDR, especially the trade-off parameters
λ_u, λ_v and λ_α. We also study the impact of the number of latent factors and
of the ranking position on the recommendation performance before making an
overall comparison. Finally, we examine how our proposed method behaves under
different configurations of tag sparsity. Through these experiments, we aim to
answer the following questions:
Q1: How does CTagCDR perform on different datasets when compared to both
single and cross-domain recommendation methods?
Q2: How does the tag-induced inter-domain correlation improve recommendation
performance compared with methods that do not bridge different domains with
tags?
Q3: How effective are the domain-specific tags in promoting knowledge transfer?
Q4: How will CTagCDR behave in different tag sparsity configurations?
4.5.1 Datasets
To evaluate our proposed model fairly, we conduct thorough experiments on three
publicly available datasets: the MovieLens 10M dataset², the LibraryThing
dataset³ and the LastFM dataset⁴. All three datasets include information on
both user preferences and tagging, as required by our model.
MovieLens 10M (ML): A user-movie rating dataset containing over 10
million ratings (on a 0.5-5 scale) and 95,580 tag applications, applied to 10,681
movies by 71,567 users. In ML-10M, not every movie that has been rated by a user
has also been tagged with at least one distinct tag. In other words, some movies
only have a rating score without a tag assignment, and vice versa. Since we focus
on improving recommendation performance by taking tagging information into
account, we discard records that lack either type of information, resulting in
24,564 remaining ratings and tag assignments.
LibraryThing (LT): A user-book rating dataset containing over 700,000
ratings (on a 1-5 scale) and 2 million tag applications used by 7,564 users on
39,515 books. Each user gives both a rating score and a tag assignment to a book
in the LT dataset. We observed inconsistencies in the rating scores of the
original LT dataset, where the same user-book pair has multiple different rating
scores. To avoid these inconsistencies, we filtered duplicate user-book pairs and
kept only the first record from the original dataset. To preserve a moderate size
for evaluation, we then selected the top 24,564 records, in the original dataset
order, as our final dataset.

²http://www.grouplens.org/node/73
³http://www.macle.nl/tud/LT/
⁴http://grouplens.org/datasets/hetrec-2011/
Table 4.2 Statistics of datasets used in Chapter 4

                MovieLens-10M   LibraryThing   LastFM
# of users               2026            244     1524
# of items               5088          12809     6854
# of tags                9529           4596     7927
# of ratings            24564          24564    20665
rating ratio            0.24%          0.79%    0.17%
LastFM (FM): A user-artist listening-count dataset released at HetRec 2011,
containing social networking, tagging, and music artist listening information
from a set of 2K users of the Last.fm online music system. The same pruning
strategy was applied to the FM dataset to select user-artist pairs with both
listening counts and tag information. We normalize each user's listening counts
by the average over all the artists he/she has listened to. Unlike the two
aforementioned datasets with explicit rating scores, the FM dataset contains only
listening counts, which are treated as implicit feedback. We use this dataset in
our experiments to test the performance of our proposed model on a different form
of feedback. The statistics of the final datasets used in this chapter are listed
in Table 4.2.
4.5.2 Experiment Setup
4.5.2.1 Evaluation Methodology
We compared the performance of CTagCDR with the state-of-the-art single-domain
and cross-domain recommendation approaches listed below:
PMF (Mnih and Salakhutdinov, 2008): Probabilistic matrix factorization is a
popular basic matrix factorization method. It models user preferences as the dot
product of the latent factors of users and items. PMF is a state-of-the-art
single-domain recommendation model but exploits only rating data. We apply PMF
as a single-domain benchmark to evaluate the benefit of integrating tagging
information to improve recommendation performance.
TagiCoFi (Zhen et al., 2009): A tag-induced collaborative filtering method that
builds on the PMF model and captures user relationships from user tagging
behaviours to regularize matrix factorization. Compared to PMF, TagiCoFi exploits
both rating and tagging data in a single domain.
TagCDCF (Shi et al., 2011): Tag-induced cross-domain collaborative filtering
is a tag-based cross-domain recommendation approach. It exploits overlapping
tags to link disjoint domains; the relationship between two domains is encoded
in cross-domain user-to-user and item-to-item similarity matrices. By building a
domain connection with overlapping tags, useful knowledge can be transferred
from the source domain to the target domain. However, TagCDCF exploits
user/item-tag relations with only binary indicators. We apply TagCDCF as a
cross-domain benchmark to test our idea of exploiting domain-specific tags to
promote knowledge transfer.
GTagCDCF (Shi et al., 2013a): General tag-induced cross-domain collaborative
filtering improves TagCDCF by taking the tagging frequency into account. It
captures more of the information represented by tags and can handle multi-domain
cases. In addition, GTagCDCF does not rely on the computation of cross-domain
similarities. Like TagCDCF, GTagCDCF exploits only overlapping tags to connect
multiple domains.
TMT (Fang et al., 2015): Cross-domain recommendation via tag matrix transfer
is a recently proposed model that establishes a tag co-occurrence pattern from
the tag collections of both source and target domains, and transfers knowledge
across domains via the learned pattern. TMT is similar to our proposed model in
exploiting multiple types of tags, including both overlapping and domain-specific
tags. However, instead of assuming a shared common pattern between domains,
our model clusters domain-specific tags as features to profile users and items,
so that more implicit domain links can be discovered as bridges for transferring
knowledge.
4.5.2.2 Evaluation Metric
In datasets with explicit rating scores, such as ML-10m and LT, our task is to
predict a user preference score. To be consistent with existing work in evaluating
rating prediction, we adopt the mean absolute error (MAE) and root mean square
error (RMSE) as evaluation metrics, defined as:
MAE = \frac{\sum_{(i,j) \in T_E} |\hat{r}_{ij} - r_{ij}|}{|T_E|}, \qquad
RMSE = \sqrt{\frac{\sum_{(i,j) \in T_E} (\hat{r}_{ij} - r_{ij})^2}{|T_E|}}
\qquad (4.11)
where \hat{r}_{ij} denotes the predicted rating score that user i gives to item j,
and r_{ij} is the corresponding ground truth. T_E denotes the set of ratings to be
predicted in the test set and |T_E| is the number of test cases. A lower MAE or
RMSE means better prediction performance.
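The two metrics in Eq. (4.11) can be computed directly from the prediction and ground-truth vectors; the ratings below are hypothetical:

```python
import numpy as np

def mae_rmse(pred, truth):
    """MAE and RMSE over the test ratings (Eq. 4.11)."""
    err = np.asarray(pred) - np.asarray(truth)
    return np.abs(err).mean(), np.sqrt((err ** 2).mean())

mae, rmse = mae_rmse([3.5, 4.0, 2.0], [4.0, 4.0, 3.0])
print(round(mae, 4), round(rmse, 4))  # 0.5 0.6455
```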
In datasets with implicit feedback, such as FM, our task is to provide each user
with a ranked list of a limited number of items. For this task, we adopt
Normalized Discounted Cumulative Gain (NDCG@k) to evaluate the ranking
performance. First, for each test user u, the DCG over the first k recommended
items is defined as:
DCG@k = \sum_{i=1}^{k} \frac{2^{rel_{ui}} - 1}{\log_2(i + 1)} \qquad (4.12)
where rel_{ui} denotes the relevance score of the item recommended at position i
for user u. NDCG@k is the normalized version of DCG@k, averaged over all N test
users:

NDCG@k = \frac{1}{N} \sum_{u=1}^{N} \frac{DCG@k}{IDCG@k} \qquad (4.13)

where IDCG@k is the DCG@k of the ideal ranking order, i.e., the ranking induced
by the actual ratings in the test set. Higher values of NDCG@k are more
desirable, as they indicate that users' favoured items appear in their predicted
lists.
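Eqs. (4.12) and (4.13) for a single user can be sketched as follows (the relevance scores are hypothetical):

```python
import math

def dcg_at_k(rels, k):
    """DCG@k for a list of relevance scores in ranked order (Eq. 4.12)."""
    return sum((2 ** r - 1) / math.log2(i + 2) for i, r in enumerate(rels[:k]))

def ndcg_at_k(rels, k):
    """NDCG@k for one user; averaging over users gives Eq. (4.13)."""
    ideal = dcg_at_k(sorted(rels, reverse=True), k)
    return dcg_at_k(rels, k) / ideal if ideal > 0 else 0.0

# Relevance of the items at positions 1..5 of one predicted list.
print(round(ndcg_at_k([3, 2, 3, 0, 1], k=5), 4))  # 0.9575
```

A perfectly ordered list (relevance non-increasing) gives NDCG@k = 1.0.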
4.5.2.3 Experimental Protocol
We examine the compared models on the cross-domain recommendation task. For
that, we use ML vs LT, LT vs ML, FM vs ML, FM vs LT, LT vs FM and ML
vs FM as different pairs of related domains (the former is treated as the source
domain and the latter as the target domain).
For each pair of domains, all the ratings and tag assignments from the source
domain were used as training data, while for each user in the target domain,
we randomly selected 20% of his/her ratings, together with the corresponding tag
assignments, as the test data for evaluation; the remaining ratings and tags
were combined with the source domain ratings and tags to train the model. We
repeated each experiment 10 times and report the average results.
For a fair comparison, we set a uniform number of latent factors for all methods;
the impact of the number of latent factors on recommendation performance is
further studied in Section 4.7. In our implementation, the maximum number of
iterations was set to 100, and we adopted the best parameters reported in the
corresponding papers when implementing each benchmark method.
For CTagCDR, the regularization parameter λ_β was tuned to 0.01 for the
rating prediction task and 0.1 for the ranking task, while the selection of the
other parameters λ_u, λ_v, λ_α is described in the following section.
4.6 Parameter Analysis
There are two main outputs from the first three steps in CTagCDR: the inter-
domain and the intra-domain correlations (see Figure 4.1). They are learned
from tagging relations and are incorporated into matrix factorization to improve
recommendation performance. In this section, we will study the role of those
two tag-induced structures separately and describe the hyper-parameter tuning
process for controlling their contributions. For the sake of simplicity, we fixed
the number of latent factors of the CTagCDR model to 10 in this experiment.
To analyze the impact of the inter-domain correlations, we first set λ_α = 0,
which means that no structural constraint is imposed within the individual domains.
The trade-off parameters λu and λv reflect the influence of cross-domain user-
to-user and item-to-item similarities, respectively, in regularizing mutual matrix
factorization. We adjusted the values of two trade-off parameters through a grid
search, and measured the recommendation performance in terms of RMSE on
the ML and LT datasets and NDCG@10 on the FM dataset. Due to limited space,
we present only LT vs ML, ML vs LT and ML vs FM as examples to describe
the parameter tuning process; the results are shown in Figure 4.3. From these
observations, we find that CTagCDR achieves good results when λ_u and λ_v
vary over a wide range. This also indicates that the CTagCDR model is not prone
to falling into local optima when searching for a global solution.
By adopting the optimal parameters for controlling the contribution of
the inter-domain correlation, we then investigated the impact of the intra-domain
correlations on recommendation performance. We varied the value of λ_α within
the range {0.001, 0.01, 0.1, 1, 10}; the results are shown in Figure 4.4. Due
to the different tag configurations in the three cases, CTagCDR achieved its best
performance at different values of λ_α.
Next, adopting the optimal parameters, we investigate the effect of the number
of latent factors and of the ranking position on the recommendation performance,
and then make a comprehensive comparison with all benchmarks.
4.7 Impact of latent factors
To study the impact of latent factors on the recommendation performance, we
evaluated factor sizes of {5, 10, 15, 20}. Figures 4.5(a) to 4.5(f) show the
RMSE and NDCG@10 performance with respect to the number of latent factors.
Based on the results in Figure 4.5, we find that CTagCDR achieves the
best performance on five pairs of domains, and is comparable to the best in the
case of LT vs FM, when the number of latent factors is set to 10, outperforming
the other state-of-the-art approaches by a large margin. However, we also notice
that the performance of CTagCDR degrades when increasing the number of latent
factors
Fig. 4.3 Impact of λ_u and λ_v on the recommendation performance of CTagCDR ((a) LT vs ML, (b) ML vs LT, (c) ML vs FM)
4.7 Impact of latent factors
(a) LT vs ML
(b) ML vs LT
(c) ML vs FM
Fig. 4.4 Impact of λα on the recommendation performance of CTagCDR
97
to some extent. As widely reported in previous works, larger numbers of latent
factors may cause overfitting and result in high computational complexity.
Therefore, considering the trade-off between recommendation performance and
computation cost, we set the number of latent factors to 10 for the following
experiments.
4.8 Sensitivity analysis on Top-k Recommendation
To determine whether the item ranking performance is sensitive to the ranking
position k, we conducted another group of experiments, varying k from
1 to 10. Figure 4.6 shows the ranking performance of all methods in terms of
NDCG@k.
Fig. 4.5 Performance of RMSE and NDCG@10 w.r.t. the number of latent factors ((a) LT vs ML, (b) ML vs LT, (c) FM vs LT, (d) FM vs ML, (e) LT vs FM, (f) ML vs FM)
In Figure 4.6, we find that all the compared methods follow the same trend
with respect to small variations in the value of k. That is, the recommendation
Fig. 4.6 Performance of NDCG@k w.r.t. the ranking position k ((a) ML vs FM, (b) LT vs FM)
performance gradually improves when moving from the top-1 item to the top-10
items in a recommendation list. The top-k recommendation results also highlight
our method's superiority in the item ranking task.
To be consistent with the literature in selecting a proper ranking list size, we
truncated the ranking list at 10 and applied NDCG@10 as the main metric for
evaluating ranking performance.
4.9 Performance Comparison
Table 4.3 shows the performance of CTagCDR and the other approaches over
the six domain pairs. The overall results demonstrate that CTagCDR is an
effective recommendation approach for both the rating prediction and item ranking
tasks. For the rating prediction task, first, in ML vs LT, CTagCDR
outperforms the second-best method, GTagCDCF, by a large margin: the average
improvement is up to 12.15% in terms of MAE and 10.99% in terms of RMSE.
Second, in FM vs LT, the improvement over TMT is 12.39% in MAE and 14.17%
in RMSE. Lastly, in FM vs ML, CTagCDR achieves a relatively small
improvement over TMT of 1.14% in MAE and 0.81% in RMSE. With regard to the
item ranking task, CTagCDR outperforms TMT by 0.4% in terms of NDCG@10 in
ML vs FM. We also notice that in the cases of LT vs ML and LT vs FM, CTagCDR
slightly underperforms TMT. A possible explanation for this underperformance is
that the sparse tagging data in the source domain LT cannot provide enough
overlapping tags to share with the target domain. Since CTagCDR groups
domain-specific tags based on overlapping tags and utilizes their co-occurrence
pattern, the less reliable domain connection built from overlapping tags will
introduce noise into the clustering of
Table 4.3 Overall performance on six domain pairs

            ML→LT           LT→ML           FM→LT           FM→ML           LT→FM     ML→FM
            MAE     RMSE    MAE     RMSE    MAE     RMSE    MAE     RMSE    NDCG@10   NDCG@10
PMF         0.9573  1.1962  1.0343  1.3150  0.9526  1.1863  1.0343  1.3150  0.886     0.738
TagiCoFi    1.2037  1.4292  1.2399  1.4684  1.1841  1.4106  1.2399  1.4684  0.884     0.757
TagCDCF     1.0404  1.3714  1.1017  1.4346  1.1555  1.5098  1.1559  1.5105  0.898     0.740
GTagCDCF    0.8092  1.0043  0.8281  1.0535  0.8629  1.0688  0.8440  1.0721  0.902     0.760
TMT         0.8109  1.0401  0.7106  0.9512  0.8041  1.0315  0.7127  0.9524  0.884     0.761
CTagCDR     0.7108  0.8939  0.7262  0.9867  0.7045  0.8853  0.7046  0.9447  0.902     0.764
domain-specific tags, which will further harm the cross-domain similarity
calculation and mislead the joint matrix factorization. The overall results in
Table 4.3 help to address question Q1: we can conclude that CTagCDR is superior
to the competing methods on most datasets.
For the single-domain recommendation approaches, to our surprise, PMF
outperformed TagiCoFi on both the LT and ML datasets, even though TagiCoFi takes
tagging information into account to improve recommendation performance. One
possible explanation for the underperformance of TagiCoFi is that it lacks
sufficient latent factors to match the model complexity introduced by adding
tag-induced user similarity as a regularizer in matrix factorization. Indeed,
from Figure 4.5 we observed that the performance of TagiCoFi improves as the
number of latent factors increases.
For the tag-based cross-domain recommendation approaches, all except
TagCDCF outperform the single-domain recommendation methods, showing the
success of transferring knowledge from a relevant domain to compensate for data
sparsity in the target domain. TagCDCF exploits only binary tag information
and underperforms PMF, which indicates that loose domain coupling can result in
negative knowledge transfer and thus deteriorate recommendation performance.
We also observed, consistent with (Shi et al., 2013a), that GTagCDCF is superior
to TagCDCF, providing evidence that modelling tagging frequency helps to improve
recommendation quality. Going beyond exploiting shared tags only, TMT and
CTagCDR integrate more types of tags into matrix factorization, and both show
strong performance. The difference lies in the way tagging information is
modelled: TMT builds a tag co-occurrence matrix from the mixed tags of different
domains and transfers it as a shared pattern across domains, while CTagCDR
clusters domain-specific tags as features to augment the cross-domain
connections, so that more knowledge-transfer bridges can be built. CTagCDR
consistently outperforms TMT on most pairs of domains, confirming the
effectiveness of building independent intra- and inter-domain correlations to
regularize knowledge transfer.
To address question Q2, we specifically focused on the comparison between
CTagCDR and TagiCoFi for two reasons. First, both models integrate tagging
information into matrix factorization to improve recommendation performance.
Second, they both establish intra-domain correlation by learning a user-user simi-
larity matrix from user tagging behaviors in the single domain, but CTagCDR
additionally connects two different domains for knowledge transfer by exploit-
ing tags from both domains. The outperformance of CTagCDR illustrates the
effectiveness of integrating tag-induced intra-domain correlation for promoting
knowledge transfer.
With regard to question Q3, we expected to find a positive role for domain-
specific tags in strengthening cross-domain connections. This conclusion can be
confirmed by the multiple comparisons between CTagCDR and TagCDCF, and
between CTagCDR and GTagCDCF.
4.10 Performance under Different Sparsity Level
We further evaluate how different components (shared-tag induced inter-domain
correlation, specific-tag-cluster induced inter-domain correlation, and tag-induced
intra-domain correlation) contribute to recommendation performance when
handling different levels of data sparsity. Specifically, we compare the performance of four
Fig. 4.7 Change of recommendation performance on ML vs LT during the increment of training data size
relevant models to check their robustness and behaviors under different data
sparsity configurations. These four relevant models are as follows: the base model is
a pure TagCDCF model with the binary information in the tagging matrix replaced
by tf*idf values; base+intra further adds tag-induced intra-domain correlations to
the base model; base+inter adds only the domain-specific-tag induced
inter-domain correlation to the base model; and base+inter+intra refers to the
model that adds all of the above tag-induced correlations.
We chose ML vs LT as an example to evaluate these models by incrementally
increasing the training data in LT from 20% to 80% with a step size of 20%. The
results are shown in Figure 4.7. Note that in our experiment setting each rating
score is accompanied by at least one tag assignment. If we decrease the rating
data, the size of the tagging data is reduced correspondingly. This helps to simulate
sparse conditions in both rating and tagging data.
As we see in Figure 4.7, the recommendation performance, evaluated
in terms of RMSE, gradually improves as we incrementally increase
the size of the training data. In addition, the base+inter model outperforms the base model
at all sparsity levels, confirming the effectiveness of integrating contributions from
domain-specific tags in enhancing domain connection. This advantage becomes more
significant when more training data is given, since more overlapping tags are
shared by both domains. To our surprise, compared to the base and base+inter
models, the base+intra model shows that tag-induced intra-domain correlation is more
helpful in promoting knowledge transfer for improving recommendation performance.
This can be explained by the fact that the compact structural constraint
automatically groups similar users and items together in each individual domain.
As a result, knowledge can be transferred between groups of users and
items, which avoids introducing noise at the individual level. We also noticed that the
base+inter+intra model performs slightly better than the competitive
base+intra model. This observation shows, to some extent, that adding the
contribution from domain-specific tags is effective in promoting knowledge transfer.
However, how to model domain-specific tags while avoiding the introduction of noise remains a
challenging problem for future study.
4.11 Summary
In order to explore the complementary role of different types of tags in linking
disjoint domains, this chapter proposes a complete tag-induced cross-domain
recommendation model, called CTagCDR, which infers inter- and intra-domain
correlations from tags as structural knowledge to regularize joint matrix
factorization. Compared to existing tag-based cross-domain recommendation models,
CTagCDR is able to capture the complete information encoded in both overlapping
and domain-specific tags. The experimental results on three public datasets
and with five state-of-the-art baseline approaches demonstrate that CTagCDR
performs well in both rating prediction and item recommendation tasks and
can effectively improve recommendation performance when the rating data in the
target domain is sparse.
Chapter 5
Exploiting Tag Semantic for
Cross-domain Recommendation
5.1 Introduction
Despite the success achieved in Chapter 3 by exploiting abundant domain-specific
tags to increase domain correlation, and in Chapter 4 by taking advantage of
different types of tags to infer structural constraints for regularizing knowledge
transfer, there are still some limitations in the above methods. Overall, we applied the
bag-of-words model and utilized tags as features to build user and item profiles. The
similarity between cross-domain users or cross-domain items is solely based on the
lexical similarity of tags. However, the uncontrolled vocabularies have resulted in
numerous ambiguous, redundant and non-identical tags. If two tags are semantically
related but use different words, the above methods may not consider those two tags to be
similar. In addition, each tag used in building a profile is treated independently.
Therefore, the context of the tag distribution on the user or item side has not been
preserved.
To address the aforementioned problems, in this chapter a new tag semantic-
boosted cross-domain recommendation algorithm, called TSCDR, is proposed to
improve cross-domain recommendation performance by considering tag semantics.
First, the word2vec technique (Mikolov et al., 2013) is applied to a data structure
designed to encode tagging context. The output is a latent representation of
tags that reflects the semantic similarity among tags. Based on the learned tag
representations, k-means clustering is then used to merge those non-identical but
semantically equivalent tags into the same group. The resulting tag clusters are
further exploited as a joint embedding space to span across domains. By mapping
users and items to the same embedding space, we can identify more accurate cross-
domain user-to-user and item-to-item similarities to regularize knowledge
transfer. Extensive experiments conducted on three public datasets with different
sparsity settings have justified the promising performance of TSCDR on the top-N
recommendation task.
The rest of this chapter is organized as follows: Section 5.2 introduces the
preliminary knowledge for understanding this chapter. Section 5.3 is dedicated
to the presentation of the proposed model. In Section 5.4 extensive experiments
are performed to demonstrate the effectiveness of our approach. Conclusions are
summarized in Section 5.5.
5.2 Preliminaries
This section begins with some frequently used notations, followed by the formu-
lation of the recommendation problem this study aims to address. An overview
of the tag-induced cross-domain collaborative filtering model, which forms the
building block for our proposed approach, concludes this section.
5.2.1 Notations and Problem Formulation
Bold uppercase letters, such as Z, denote matrices. The i-th row and the j-th
column of Z are denoted as Z_{i*} and Z_{*j}, respectively. The (i, j)-th element of matrix
Z is denoted as Z_{i,j}. Given a source domain s and a target domain τ, in a single
domain π (π ∈ {s, τ}) there are n^π users U^π and m^π items I^π. The interaction
between users and items is represented by the user-item rating matrix R^π, where
r^π_{ui} is the rating score given by user u to item i. A binary weight matrix I^{R^π} masks
the missing entries in R^π, where (I^{R^π})_{ij} = 1 if R^π_{ij} is observed and (I^{R^π})_{ij} = 0
otherwise. In addition to the rating data, there are l^π tags T^π annotating items.
A^π ⊆ U^π × I^π × T^π is the set of tag assignments, where an element (u, i, t) denotes
that tag t has been attached to item i by user u. Some frequently used symbols
along with their definitions are summarized in Table 5.1.
Our cross-domain recommendation task is to estimate the unobserved rat-
ing scores in R^τ in order to rank items. Formally, the source domain Φ^s =
{U^s, I^s, R^s, A^s} contains a dense training set and the target domain Φ^τ =
{U^τ, I^τ, R^τ, A^τ} contains a sparse one. The goal is to learn a function f with pa-
rameters Θ that predicts the most likely rating r^τ_{ui} in the test set {(u, i) | u ∈
Table 5.1 Symbols and corresponding descriptions used in Chapter 5
Symbols Descriptions
π ∈ {s, τ} domain indicator (s for source domain, τ for target domain)
u, i, t individual user u, item i and tag t
Uπ A set of users in domain π, and |Uπ| = nπ
Iπ A set of items in domain π, and |Iπ| = mπ
T π A set of tags in domain π, and |T π| = lπ
Aπ tag assignments in domain π, Aπ ⊆ Uπ × Iπ × T π
Rπ user-item rating matrix in domain π, Rπ ∈ Rnπ×mπ
IRπ indicator matrix for Rπ, IRπ ∈ Rnπ×mπ
rπui rating score given by user u on item i in domain π
r̂πui predicted rating score for user u on item i in domain π
d dimension size for latent factors
P π latent factor matrix for users in domain π, P π ∈ Rnπ×d
Qπ latent factor matrix for items in domain π, Qπ ∈ Rmπ×d
SU cross-domain user-user similarity matrix
SI cross-domain item-item similarity matrix
‖Z‖F Frobenius norm of matrix Z
λ regularization parameter for �2-norm
λu, λi regularization weights for cross-domain user similarity and cross-domain item similarity, respectively
U^τ, i ∈ I^τ}. This recommendation task can be formulated as:

$$\arg\min_{\Theta}\left(\hat{r}^\tau_{ui} - r_{ui}\right) = \arg\min_{\Theta}\left(f(u, i \mid \Phi^s, \Phi^\tau, \Theta) - r_{ui}\right)
\quad \text{s.t. } U^s \cap U^\tau = \emptyset,\; I^s \cap I^\tau = \emptyset \tag{5.1}$$
5.2.2 Tag-induced Cross Domain Collaborative Filtering
Model
The TagCDCF model (Shi et al., 2011) uses overlapping tags as common features to
model both user and item profiles, and infers the similarities between cross-domain
users and cross-domain items as prior knowledge to regularize the joint matrix
factorization. TagCDCF is formulated as a minimization problem:
$$\begin{aligned}
\mathcal{L} ={} & \tfrac{1}{2}\left\|R^s - I^{R^s} \odot \left(P^s (Q^s)^\top\right)\right\|_F^2
+ \tfrac{1}{2}\left\|R^\tau - I^{R^\tau} \odot \left(P^\tau (Q^\tau)^\top\right)\right\|_F^2 \\
& + \tfrac{\lambda_u}{2}\left\|S^U - I^{S^U} \odot \left(P^s (P^\tau)^\top\right)\right\|_F^2
+ \tfrac{\lambda_i}{2}\left\|S^I - I^{S^I} \odot \left(Q^s (Q^\tau)^\top\right)\right\|_F^2 \\
& + \lambda\left(\|P^s\|_F^2 + \|Q^s\|_F^2 + \|P^\tau\|_F^2 + \|Q^\tau\|_F^2\right)
\end{aligned} \tag{5.2}$$
where ⊙ denotes the element-wise (Hadamard) product. Knowledge in the source
domain is transferred to the target domain through two key components: the
cross-domain user similarity matrix S^U and the cross-domain item similarity
matrix S^I. TagCDCF considers only overlapping tags, not non-overlapping tags,
when building domain connections, and it does not exploit the semantic
information in tags. Hence, TagCDCF does not achieve optimal results
because S^U and S^I are inaccurate. There are several ways to infer S^U and S^I by
integrating the semantic information in tags. These are introduced next.
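Before moving on, the masked objective of Equation 5.2 can be sketched in a few lines of numpy. This is a minimal illustration under a zero-as-missing convention for the input matrices; the function name and argument layout are assumptions for exposition, not part of the original model description:

```python
import numpy as np

def tagcdcf_loss(Rs, Rt, SU, SI, Ps, Qs, Pt, Qt, lam_u, lam_i, lam):
    """Illustrative TagCDCF-style objective; zero entries mark missing values."""
    IRs, IRt = (Rs != 0).astype(float), (Rt != 0).astype(float)
    ISU, ISI = (SU != 0).astype(float), (SI != 0).astype(float)
    sq = lambda A: float(np.sum(A ** 2))               # squared Frobenius norm
    return (0.5 * sq(Rs - IRs * (Ps @ Qs.T))           # source-domain rating fit
            + 0.5 * sq(Rt - IRt * (Pt @ Qt.T))         # target-domain rating fit
            + 0.5 * lam_u * sq(SU - ISU * (Ps @ Pt.T)) # user-similarity coupling
            + 0.5 * lam_i * sq(SI - ISI * (Qs @ Qt.T)) # item-similarity coupling
            + lam * (sq(Ps) + sq(Qs) + sq(Pt) + sq(Qt)))  # l2 regularization
```

When the factor matrices reconstruct all observed entries exactly, only the regularization term remains, which matches the structure of Equation 5.2.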
Fig. 5.1 An example of ambiguous, redundant and non-identical but semantically equivalent tags
5.3 Tag Semantically-boosted Cross-domain Recommendation
The uncontrolled vocabularies used when assigning tags have resulted in
numerous ambiguous, redundant and non-identical tags (see Figure 5.1). We
therefore propose to utilize semantics to correlate tags and eliminate noise in the
tagging data. The goal is to infer more accurate S^U and S^I on this basis. We begin by
introducing two simple baseline solutions. The details of our approach follow.
The most intuitive idea is to apply a topic modelling technique to a tag collection
consisting of tags from both the source and target domains to learn joint topics, as
shown in Figure 5.2a. Using this method, cross-domain users and items can be
linked by mapping tag-based user and item profiles to the joint topic space. This
model is called Joint Topic Mining; details are introduced in Subsection 5.3.1.
Joint topic mining connects different domains through a subset of joint topics.
However, it may be difficult to find enough reliable joint topics to fully represent
tagging behaviors of different domains. Therefore, our second model performs
topic modelling on individual domains separately to ensure the accuracy of the
tag topics. The challenge lies in how to align topics in different domains. We
propose adding a link between two topic nodes if the same tags are distributed
across two topics, as shown in Figure 5.2b. As such, an implicit path is inferred
and built to connect cross-domain users and items. This method is called Topic
Alignment, and the details are described in Subsection 5.3.2.
The topic alignment method is able to reveal distinctive topics in each domain,
but its performance suffers from two drawbacks. First, building topic links by
identical tags tends to result in sparse connections, especially for the topics
unique to each domain. Second, this method treats each tag independently, while
ignoring the surrounding tags used by the same user or annotated on the same
item. Therefore, our model also considers the usage context in which a tag is
used when extracting the tag semantic information. The details are presented in
Subsection 5.3.3.
5.3.1 Joint Topic Mining
The first step in determining a set of joint topics for two heterogeneous domains is
to construct a tag corpus by combining tags from both the source and target domains.
This is specifically defined as G = T^s ∪ T^τ, where |G| = g denotes the number
(a) Joint Topic Mining
(b) Topic Alignment
Fig. 5.2 Graphical illustration of joint topic mining and topic alignment
of unique tags in both domains. Based on the tag corpus, a tag weight vector
$x^s_u = (w^s_{u1}, w^s_{u2}, \cdots, w^s_{ug})$ is created for source domain user u to represent his or
her preferences over all candidate tags. The tag weight $w^s_{ut}$ is defined as follows:

$$w^s_{ut} = \begin{cases} f^s(u, t) & \text{if tag } t \text{ was assigned by user } u \\ 0 & \text{otherwise} \end{cases} \tag{5.3}$$

where $f^s(u, t)$ denotes the frequency with which source domain user u used tag t to label
items in I^s. Given the source domain user-tag weight vectors, an n^s × g user-tag
matrix X^s is created:
$$X^s = \begin{bmatrix} x^s_1 \\ \vdots \\ x^s_u \\ \vdots \\ x^s_{n^s} \end{bmatrix}
= \begin{bmatrix}
w^s_{11} & w^s_{12} & \cdots & w^s_{1g} \\
\vdots & \vdots & & \vdots \\
w^s_{u1} & w^s_{u2} & \cdots & w^s_{ug} \\
\vdots & \vdots & & \vdots \\
w^s_{n^s 1} & w^s_{n^s 2} & \cdots & w^s_{n^s g}
\end{bmatrix} \tag{5.4}$$
A g-dimensional tag weight vector $y^s_i = (\delta^s_{i1}, \delta^s_{i2}, \cdots, \delta^s_{ig})$ can also be defined
for source domain item i. Each tag weight $\delta^s_{it}$ is calculated as follows:

$$\delta^s_{it} = \begin{cases} f^s(i, t) & \text{if tag } t \text{ was labeled on item } i \\ 0 & \text{otherwise} \end{cases} \tag{5.5}$$
where $f^s(i, t)$ denotes the frequency with which tag t was attached to source domain
item i by users in U^s. An m^s × g item-tag matrix Y^s is created for all items in
the source domain,

$$Y^s = \begin{bmatrix} y^s_1 \\ \vdots \\ y^s_i \\ \vdots \\ y^s_{m^s} \end{bmatrix}
= \begin{bmatrix}
\delta^s_{11} & \delta^s_{12} & \cdots & \delta^s_{1g} \\
\vdots & \vdots & & \vdots \\
\delta^s_{i1} & \delta^s_{i2} & \cdots & \delta^s_{ig} \\
\vdots & \vdots & & \vdots \\
\delta^s_{m^s 1} & \delta^s_{m^s 2} & \cdots & \delta^s_{m^s g}
\end{bmatrix} \tag{5.6}$$
which represents the tag distributions over items in the source domain. Similarly, an
n^τ × g matrix X^τ is created for the target domain to denote the relationships between
its users and the tag corpus, and an m^τ × g matrix Y^τ is created to denote the
relationships between its items and the tag corpus.
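As a concrete sketch of Equations 5.3-5.6, the frequency matrices can be built directly from the (u, i, t) triples in the tag assignment set. The function and variable names below are illustrative assumptions, and pure-Python lists stand in for the matrices:

```python
from collections import Counter

def build_weight_matrix(assignments, rows, tags, by="user"):
    # assignments: iterable of (user, item, tag) triples from the tag assignment set
    # rows: ordered users (for Eq. 5.4) or items (for Eq. 5.6); tags: ordered corpus
    key = (lambda a: (a[0], a[2])) if by == "user" else (lambda a: (a[1], a[2]))
    freq = Counter(key(a) for a in assignments)   # f(u, t) or f(i, t)
    r_idx = {r: i for i, r in enumerate(rows)}
    t_idx = {t: j for j, t in enumerate(tags)}
    W = [[0] * len(tags) for _ in rows]
    for (r, t), f in freq.items():
        W[r_idx[r]][t_idx[t]] = f
    return W
```

Calling it with `by="user"` yields X^s (or X^τ), and with `by="item"` yields Y^s (or Y^τ).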
Next, we concatenate the tagging matrices of both domains to construct mixed
tagging matrices:

$$X = \begin{bmatrix} X^s \\ X^\tau \end{bmatrix} \qquad Y = \begin{bmatrix} Y^s \\ Y^\tau \end{bmatrix} \tag{5.7}$$
where $X \in \mathbb{R}^{(n^s+n^\tau) \times g}$ and $Y \in \mathbb{R}^{(m^s+m^\tau) \times g}$. Then we apply topic modelling
techniques to learn latent features for cross-domain users and items in the form of
topics. We chose Latent Dirichlet Allocation (LDA) (Hoffman et al., 2010) as our
basic topic modelling method because of its strong performance in many NLP
tasks. In the problem at hand, non-overlapping tags are considered to be related
across both domains when they co-occur with the same overlapping tags. Further,
the more frequent the co-occurrence, the stronger the relationship among the
non-overlapping tags.
The topic distributions of tags over users can be learned by feeding the combined
matrix X to the LDA model, and are represented as a user-topic matrix
$N = \begin{bmatrix} N^s \\ N^\tau \end{bmatrix}$, where $N \in \mathbb{R}^{(n^s+n^\tau) \times k}$ and k is the number of joint topics. N is divided
into two parts: the source domain user-topic matrix $N^s \in \mathbb{R}^{n^s \times k}$ and the target
domain user-topic matrix $N^\tau \in \mathbb{R}^{n^\tau \times k}$. Similarly, the item-topic matrix
$M = \begin{bmatrix} M^s \\ M^\tau \end{bmatrix}$ represents the item topic distributions, which can be learned by feeding
Y into the LDA model, where $M^s \in \mathbb{R}^{m^s \times k}$ denotes the source domain item-topic
matrix and $M^\tau \in \mathbb{R}^{m^\tau \times k}$ denotes the target domain item-topic matrix.
The similarity between a source domain user/item and a target domain
user/item is measured by mapping the cross-domain users and cross-domain
items into the same topic space and using the cosine similarity between the latent
representations of their profiles over the joint topics. This is formally defined as:

$$sim(z^s, z^\tau) = \frac{z^s \cdot z^\tau}{|z^s||z^\tau|} \tag{5.8}$$

where $z^s$ is a row vector from N^s (M^s), and correspondingly $z^\tau$ is a row vector
from N^τ (M^τ). In this way, we obtain a refined cross-domain user similarity
matrix S^U and a cross-domain item similarity matrix S^I, evaluated on the dense
topics rather than the sparse tagging data.
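Given the user-topic matrices from any LDA implementation, the pairwise cosine similarities of Equation 5.8 can be computed in one vectorized step. This is a minimal numpy sketch with illustrative names; the small epsilon guards against all-zero rows and is an added assumption:

```python
import numpy as np

def cross_domain_similarity(Ns, Nt, eps=1e-12):
    # Ns: n_s x k source user-topic matrix; Nt: n_t x k target user-topic matrix
    # returns SU with SU[u, v] = cosine(Ns[u], Nt[v]) as in Eq. 5.8
    A = Ns / (np.linalg.norm(Ns, axis=1, keepdims=True) + eps)
    B = Nt / (np.linalg.norm(Nt, axis=1, keepdims=True) + eps)
    return A @ B.T
```

The same function applied to the item-topic matrices M^s and M^τ yields S^I.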
5.3.2 Topic Alignment
The joint topic mining method relies on general concepts, namely joint topics,
to link different domains. However, as previously mentioned, this method is not
able to identify the unique discriminative topics expressed by domain-specific tags.
Therefore, rather than building a joint topic space, we propose extracting topics
in each domain separately. With diverse topics belonging to multiple domains,
the challenge is finding matching relationships between the topics.
We chose a topic matching model (Tang et al., 2012) that combines the topic
model with a random walk to explore the implicit relationships among different
objects (e.g. authors, papers, publication venues) in cross-domain research col-
laboration. However, the tags in our problem are different from the keywords in
scientific papers, and the extracted topics do not represent collaborative depen-
dencies between users or between users and items. Therefore, we again applied the
basic LDA model to estimate the topic distributions of the tags associated with users
and items. For simplicity, assume there are k^s topics in the source domain and k^τ
topics in the target domain.
To build a path from users in the source domain to users in the target
domain, the LDA model obtains two sets of topic distributions for users, extracted
from the two domains respectively. The topic distribution from the source
domain is denoted as $K^s_U \in \mathbb{R}^{n^s \times k^s}$; $K^\tau_U \in \mathbb{R}^{n^\tau \times k^\tau}$ denotes the distribution in the
target domain. Then, the user-topic graphs generated by the LDA model are extended
by linking the topic nodes from both domains. The relevance between user topic
nodes $\phi^s_j$ and $\phi^\tau_{j'}$ is defined as:

$$Rel(\phi^s_j, \phi^\tau_{j'}) = \begin{cases}
\dfrac{|\mathcal{P}(\phi^s_j) \cap \mathcal{P}(\phi^\tau_{j'})|}{|\mathcal{P}(\phi^s_j) \cup \mathcal{P}(\phi^\tau_{j'})|} & \text{if } \mathcal{P}(\phi^s_j) \cap \mathcal{P}(\phi^\tau_{j'}) \neq \emptyset \\
0 & \text{otherwise}
\end{cases} \tag{5.9}$$

where $\mathcal{P}(\phi^s_j)$ denotes the subset of tags distributed over source domain user topic
$\phi^s_j$, and $\mathcal{P}(\phi^\tau_{j'})$ denotes the corresponding tag set distributed over target domain
user topic $\phi^\tau_{j'}$. This generates the topic linkage matrix $L^U \in \mathbb{R}^{k^s \times k^\tau}$, which
represents the connections between the user topics in both domains.
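Treating each topic as the set of tags distributed over it, the linkage matrix of Equation 5.9 reduces to pairwise Jaccard overlaps. A small pure-Python sketch follows; the names are illustrative:

```python
def topic_linkage(source_topics, target_topics):
    # source_topics[j] / target_topics[j']: sets of tags P(phi) for each topic node
    # returns L with L[j][j'] = Jaccard overlap of the two tag sets (Eq. 5.9)
    L = []
    for Ps in source_topics:
        row = []
        for Pt in target_topics:
            inter = Ps & Pt
            row.append(len(inter) / len(Ps | Pt) if inter else 0.0)
        L.append(row)
    return L
```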
Based on the new graph representation, a random walk with restart algo-
rithm (Tong et al., 2008) is used to suggest relevant user nodes in the target domain
for a given user node in the source domain, i.e.,

$$\alpha^{(t+1)} = (1 - \varepsilon) T \alpha^{(t)} + \varepsilon \beta \tag{5.10}$$

where $\alpha^{(t)} \in \mathbb{R}^{n^s+n^\tau}$ is a vector denoting the probability that the random walk arrives
at the corresponding user nodes at step t, and $\beta \in \mathbb{R}^{n^s+n^\tau}$ is a vector of zeros, except that
the element corresponding to the start node is set to 1. $\varepsilon$ denotes the probability
that the walk returns to the start node at each step. T defines the transition
probabilities of the random walk, with $T = \begin{bmatrix} X & I \\ I^\top & X^\top \end{bmatrix}$, where $X = K^s_U L^U (K^\tau_U)^\top$
and I is an identity matrix. Since we are only interested in finding relevant users
in the target domain for a given user in the source domain, the part of α corresponding
to the target-domain users is sectioned off to compose a row in S^U
once the iteration in Equation 5.10 reaches a stable state.
Similarly, the above processes are also applied to items to extend the item-topic
graphs and update the cross-domain item similarity matrix S^I.
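The restart iteration of Equation 5.10 can be run as a simple fixed-point loop. The sketch below assumes T has already been normalized (e.g. made column-stochastic) so that the iteration converges, which is an assumption on top of the text; the function name and tolerances are likewise illustrative:

```python
import numpy as np

def random_walk_with_restart(T, start, eps=0.15, tol=1e-10, max_iter=1000):
    # T: (n_s + n_t) x (n_s + n_t) transition matrix; start: source user node index
    n = T.shape[0]
    beta = np.zeros(n)
    beta[start] = 1.0            # restart distribution concentrated on start node
    alpha = beta.copy()
    for _ in range(max_iter):
        new = (1 - eps) * (T @ alpha) + eps * beta   # Eq. 5.10
        if np.abs(new - alpha).sum() < tol:
            break
        alpha = new
    return alpha
```

The target-domain block of the converged α then forms one row of S^U.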
5.3.3 Embedding Space Learning
While the topic alignment method is able to correlate tags between domains
through topics, it is not able to identify and match tags that share similar
semantic meanings. Inspired by word2vec (Mikolov et al., 2013), which maps
words and phrases to a low-dimensional continuous vector space to capture the
semantic and syntactic information in words, the tagging data is projected into
a unified neural word embedding framework to learn the semantic
relationships between the tags.
There are two main similarities between text data and tagging data:
(1) The tagging data is organized in a similar way to the text data in word2vec.
In the tagging data, users tag specific items, and a list of relevant tags can be
collected for each user-item pair. These user-item pairs can be mapped
to documents in word2vec, and the tags associated with the
user-item pairs are equivalent to the words in word2vec.
(2) The tagging data has a contextual setting similar to that of words in text data.
Tags are text-based features, and the set of tags used in the same user-item pair
serves as the context for finding relationships between a current tag and its
surrounding tags. Further, tags with the same context are interchangeable
Fig. 5.3 Modelling tagging data for word2vec. The tag marked in red denotes an overlapping tag shared by both domains.
if the time at which they were generated is discarded. In other words, the order of
tags does not influence the prediction result.
In this case, we can apply word2vec to the tagging data to produce tag embeddings.
Since the tag set associated with each user-item pair is usually small, we used the
skip-gram model with negative sampling (SGNS), which is more effective on
small datasets. In particular, we employed the word2vec implementation in the
gensim toolbox¹ to process the tagging data. To prepare the input for word2vec,
the <user-item-tag> triplet data was arranged such that the tags used on the same
user-item pair are collected into a list, and all the tag lists together build the
dictionary (corpus). The presentation of the tagging data as input
to word2vec is shown in Figure 5.3. Within the word2vec model, we set the vector
dimensionality to 300 and the context window size to 5; the context window size
indicates how many tags around the target word are considered as context during
training. Word2vec outputs continuous vector representations for the unique
tags in the dictionary, which can be used to decide which tags are semantically or
contextually close to each other.

¹https://radimrehurek.com/gensim/
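Preparing the word2vec input amounts to grouping the tags by user-item pair. The sketch below does this grouping in pure Python; the commented gensim call shows how SGNS might then be invoked with the settings described above, and is an illustrative, untested assumption:

```python
from collections import defaultdict

def tagging_to_sentences(assignments):
    # each (user, item) pair acts as one "document"; its tags are the "words"
    docs = defaultdict(list)
    for u, i, t in assignments:
        docs[(u, i)].append(t)
    return list(docs.values())

# With the tag sentences prepared, gensim's skip-gram with negative sampling
# could be trained roughly as follows (illustrative, not executed here):
# from gensim.models import Word2Vec
# model = Word2Vec(sentences, vector_size=300, window=5, min_count=5,
#                  sg=1, negative=5)
```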
Although word2vec produces uniform-length tag vectors, the challenge in
our problem is handling user and item profiles with variable-length tag lists.
Hence, the individual tag vectors need to be transformed into a feature set of the
same length for both cross-domain users and cross-domain items. One possible
way to accomplish this is to exploit tag clusters as new features to build the user
and item profiles. The k-means clustering method can then be used to group
semantically related tags based on the learned tag vectors. We selected k-means
clustering for our approach because it is simple and computationally
efficient; however, other clustering methods could also be applied. The only
issue with k-means clustering is choosing the right number of clusters. The effect
of the number of chosen clusters on the final recommendation performance is
further discussed in Section 5.4.5.2.
Suppose there are θ tag clusters. A simple function converts the tag-based
user profiles into a bag-of-centroids, which works just like bag-of-words but uses
clusters instead. The function is defined as:

$$f(\mathcal{P}(u)) = \left(p(c_1|u), \cdots, p(c_j|u), \cdots, p(c_\theta|u)\right)$$
$$p(c_j|u) = \begin{cases}
\dfrac{|\mathcal{P}(u) \cap \mathcal{P}(c_j)|}{\sum_{j=1}^{\theta}|\mathcal{P}(c_j)|} & \text{if } \mathcal{P}(u) \cap \mathcal{P}(c_j) \neq \emptyset \\
0 & \text{otherwise}
\end{cases} \tag{5.11}$$
where $p(c_j|u)$ measures the probability that user u tends to use tags in cluster $c_j$
based on his/her tagging history. $\mathcal{P}(u)$ denotes the tags used by user u and $\mathcal{P}(c_j)$
denotes the tags belonging to the cluster with centroid $c_j$.
Once the user profiles have been refined into tag clusters, we can compute S^U,
which reflects user relationships according to their tagging behaviors. Similarly,
we can apply Equation 5.11 to build profiles for cross-domain items and calculate
S^I.
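Equation 5.11 can be sketched as a small pure-Python function. The denominator follows the formula as printed above, and the function and variable names are illustrative:

```python
def bag_of_centroids(profile_tags, clusters):
    # profile_tags: set of tags P(u) (or P(i)); clusters: list of tag sets P(c_j)
    # returns the theta-dimensional bag-of-centroids profile (Eq. 5.11)
    denom = sum(len(c) for c in clusters)
    return [len(profile_tags & c) / denom if profile_tags & c else 0.0
            for c in clusters]
```

Cosine similarity between such profiles across domains then gives the entries of S^U and S^I.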
5.3.4 Optimization
To minimize the objective function in Equation 5.2, we apply gradient descent to
alternately update P^s, Q^s, P^τ, Q^τ. The derivative with respect to each variable is computed
as:

$$\frac{\partial \mathcal{L}}{\partial P^s} = -\left(R^s - I^{R^s} \odot (P^s (Q^s)^\top)\right) Q^s - \lambda_u \left(S^U - I^{S^U} \odot (P^s (P^\tau)^\top)\right) P^\tau + \lambda P^s \tag{5.12}$$

$$\frac{\partial \mathcal{L}}{\partial Q^s} = -\left(R^s - I^{R^s} \odot (P^s (Q^s)^\top)\right)^\top P^s - \lambda_i \left(S^I - I^{S^I} \odot (Q^s (Q^\tau)^\top)\right) Q^\tau + \lambda Q^s \tag{5.13}$$

$$\frac{\partial \mathcal{L}}{\partial P^\tau} = -\left(R^\tau - I^{R^\tau} \odot (P^\tau (Q^\tau)^\top)\right) Q^\tau - \lambda_u \left(S^U - I^{S^U} \odot (P^s (P^\tau)^\top)\right)^\top P^s + \lambda P^\tau \tag{5.14}$$

$$\frac{\partial \mathcal{L}}{\partial Q^\tau} = -\left(R^\tau - I^{R^\tau} \odot (P^\tau (Q^\tau)^\top)\right)^\top P^\tau - \lambda_i \left(S^I - I^{S^I} \odot (Q^s (Q^\tau)^\top)\right)^\top Q^s + \lambda Q^\tau \tag{5.15}$$
Once the optimal parameters have been learned, our approach predicts ratings in the
target domain by $\hat{R}^\tau = P^\tau (Q^\tau)^\top$. To evaluate top-N recommendation performance,
unobserved items are ranked according to the predicted ratings for each test user.
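One alternating gradient-descent step over the four factor matrices can be sketched in numpy. The signs follow the standard derivative of the squared loss (Equations 5.12-5.15), zeros are treated as missing entries, and the learning rate and function name are illustrative assumptions:

```python
import numpy as np

def gradient_step(Rs, Rt, SU, SI, Ps, Qs, Pt, Qt, lam_u, lam_i, lam, lr=0.005):
    IRs, IRt = (Rs != 0).astype(float), (Rt != 0).astype(float)
    ISU, ISI = (SU != 0).astype(float), (SI != 0).astype(float)
    Es = IRs * (Rs - Ps @ Qs.T)   # masked source-domain rating residual
    Et = IRt * (Rt - Pt @ Qt.T)   # masked target-domain rating residual
    Eu = ISU * (SU - Ps @ Pt.T)   # cross-domain user-similarity residual
    Ei = ISI * (SI - Qs @ Qt.T)   # cross-domain item-similarity residual
    gPs = -Es @ Qs - lam_u * Eu @ Pt + lam * Ps        # Eq. 5.12
    gQs = -Es.T @ Ps - lam_i * Ei @ Qt + lam * Qs      # Eq. 5.13
    gPt = -Et @ Qt - lam_u * Eu.T @ Ps + lam * Pt      # Eq. 5.14
    gQt = -Et.T @ Pt - lam_i * Ei.T @ Qs + lam * Qt    # Eq. 5.15
    return Ps - lr * gPs, Qs - lr * gQs, Pt - lr * gPt, Qt - lr * gQt
```

Iterating such steps until convergence yields the parameters used for the prediction $\hat{R}^\tau = P^\tau (Q^\tau)^\top$.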
5.4 Experiments and analysis
This section presents a series of experiments that compare the performance of
TSCDR with other state-of-the-art single domain and cross-domain recommenda-
tion approaches.
5.4.1 Dataset
We experimented on three publicly accessible datasets: the MovieLens 10M dataset²,
the LibraryThing dataset³ and the LastFM dataset⁴. To the best of our knowledge, these
three datasets are the only ones that include both rating data and tagging
data.
MovieLens 10M (ML) is a movie rating dataset containing 95,580 tags and
over 10 million ratings on a scale of 0.5 to 5, provided by 71,567 users on 10,681
movies.

²http://www.grouplens.org/node/73
³http://www.macle.nl/tud/LT/
⁴http://grouplens.org/datasets/hetrec-2011/
LibraryThing (LT) is a book rating dataset that contains over 700,000
ratings (on a scale of 1-5) and 2 million tags by 7,564 users on 39,515 books.
As highlighted in (Fernández-Tobías and Cantador, 2014), we also observed
inconsistent rating scores on the same user-book pairs in the original LT dataset.
To overcome these inconsistencies, we corrected the ratings by duplicating the rating
in the first record and placing the duplicate pairs in the original order.
LastFM (FM) is a user-song dataset released by HetRec in 2011, which
contains social networking, tagging, and music listening counts from a set of 2000
users sampled from the Last.fm online music system. Unlike the other two datasets,
which contain explicit rating scores, the FM dataset contains the number of times
a song was listened to, which is considered a type of implicit feedback.
5.4.2 Experiment Setup
Following the strategy adopted in (Fernández-Tobías and Cantador, 2014) for
preprocessing the datasets, we filtered the original datasets by removing records
lacking either rating scores or tags. To reduce redundancy and ambiguity in the
tagging data, we further stemmed the tags and removed meaningless tags consisting
only of numbers or non-alphabetic characters.
Note that the scope of this study focuses on learning the relatedness between
heterogeneous domains through tags. Therefore, we used the constraints in (Gedikli
and Jannach, 2013) to vary the quality of the tag information and investigate its
effect on recommendation performance. Specifically, we created two different
versions of each dataset by adjusting the thresholds of the constraints listed in
Table 5.2. The resulting dataset variations are shown in Table 5.3. Based on
Table 5.2 The tag filtering quality constraints (Gedikli and Jannach, 2013)
Constraint Description
Min Users/Tag (U/T) Minimum number of users per tag.
Min Items/Tag (I/T) Minimum number of items per tag.
Min Tags/User (T/U) Minimum number of tags a user has specified.
Min Items/User (I/U) Minimum number of items rated by a user.
Min Tags/Item (T/I) Minimum number of tags applied to an item.
Min Users/Item (U/I) Minimum number of users that rated an item.
sparsity conditions, we generated the following six related domain pairs to conduct
experiments: ML-high vs LT-high, ML-high vs FM-high, ML-high vs LT-low,
ML-high vs FM-low, LT-high vs ML-low, LT-high vs FM-low. In each domain
pair, the former was considered to be the source domain, and the latter was
treated as the target domain.
To measure the performance of the different approaches on the top-N recommendation
task, we applied the leave-one-out method for validation. For each user, we randomly
chose one of his/her interactions as the test data, with the remainder used
for training. Additionally, to tune the hyper-parameters for each baseline, we
randomly chose one interaction per user from the training data to
create the validation data. During the evaluation, we followed (He et al., 2017)
by generating negative items to estimate ranking performance. Specifically, we
ranked each user's test item alongside 100 negative items that had not been
seen by the user. We ran all experiments five times and report the average
results.
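The evaluation protocol just described (hold out one interaction per user and sample 100 unseen negatives) can be sketched as follows. The value 100 echoes the setting above, while the function name and data layout are illustrative assumptions:

```python
import random

def leave_one_out_split(user_items, all_items, n_neg=100, seed=0):
    # user_items: dict mapping each user to the list of items they interacted with
    rng = random.Random(seed)
    train, test = {}, {}
    for u, items in user_items.items():
        held = rng.choice(items)                      # held-out test interaction
        train[u] = [i for i in items if i != held]
        unseen = [i for i in all_items if i not in items]
        test[u] = (held, rng.sample(unseen, n_neg))   # test item + sampled negatives
    return train, test
```

At evaluation time, each model ranks the held-out item against its user's sampled negatives.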
We set the latent factors to a uniform size for all methods to ensure a fair comparison.
In our implementation, the size of the latent factors is set to 20 as a trade-off between
computational cost and accuracy. The other methods were set to the best
parameters reported in the corresponding literature.
We set the tag vector dimensionality to 100 and discarded tags used
by fewer than five user-item pairs so as to learn effective tag embeddings
with the word2vec technique. Further, we defined the context of the current tag
by considering ten surrounding tags (five tags before and five tags after the
current tag). The selection of the regularization parameters λu and λi is discussed in
Subsection 5.4.5.1.
5.4.3 Evaluation Metrics
Hit ratio (HR) and normalized discounted cumulative gain (NDCG) were selected
as evaluation metrics to judge the quality of a ranked recommendation list.
HR measures whether the test item is present in the top-N recommendation
list, and is defined as:

$$HR = \frac{\#hits}{\#users} \tag{5.16}$$

where #hits counts the number of users whose test items are successfully recalled
in the top-N recommendation list and #users is the total number of test users.
NDCG penalizes the position of a test item in the recommendation list. It
assumes that the lower the position of a test item, the less useful it is to the user.
This metric is defined as:
NDCG = DCG / IDCG,   DCG = Σ_{j=1}^{N} (2^{rel_j} − 1) / log2(j + 1)    (5.17)
Table 5.3 Dataset Variations for ML, LT and FM

Name     | Minimum constraints           | Resulting dataset
         | U/T I/T T/U I/U T/I U/I       | Users  Items  Ratings  Tags  Sparsity
ML-high  |  3   4  11  12   7   5        |  183    531     6142   1076   93.7%
ML-low   |  1   2   5   6   3   2        |  369   2177    14735   3100   98.2%
LT-high  | 45  83  53 102  19  20        |  835   1768    83216    850   94.4%
LT-low   | 20  40  25  50   9  10        | 2897  10917   356430   1700   98.9%
FM-high  |  4   7  14  11   7   3        |  349    590     5386    478   97.4%
FM-low   |  2   3   7   5   3   1        |  747   3367    12641   1127   99.5%
where rel_j is the relevance score of the test item at position j, and rel_j ∈ {0, 1} depending on whether the item appears in the top-N ranked list.
We truncated the recommendation list at 10 for each metric, which is a
commonly accepted size in the literature (He et al., 2017). However, we also
varied the length of the ranked list and tested how that affects recommendation
performance (see Subsection 5.4.5.4).
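Under this single-test-item protocol, the ideal DCG equals 1, so both metrics reduce to short per-user computations. A sketch with illustrative names:

```python
import math

def hit(ranked, test_item):
    # HR contribution of one user: 1 if the test item was recalled in top-N
    return 1.0 if test_item in ranked else 0.0

def ndcg(ranked, test_item):
    # With a single relevant item, IDCG = 1 and NDCG reduces to the
    # discounted gain at the item's position (Eq. 5.17 with rel_j in {0, 1}).
    for j, item in enumerate(ranked, start=1):
        if item == test_item:
            return 1.0 / math.log2(j + 1)
    return 0.0

# average over two toy users: (ranked list, held-out test item)
lists = [(["a", "b", "c"], "b"), (["x", "y", "z"], "q")]
hr = sum(hit(r, t) for r, t in lists) / len(lists)
nd = sum(ndcg(r, t) for r, t in lists) / len(lists)
```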
5.4.4 Baselines
The following methods were chosen as baselines to evaluate TSCDR:
ContextWalk (Bogers, 2010) is a graph-based approach. It models ternary
relationships <user-item-tag> in an undirected graph. Nodes represent users,
items and tags and the edges between the nodes represent their corresponding
relationships.
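To illustrate the idea behind ContextWalk, here is a toy walk over an undirected <user-item-tag> graph; the adjacency structure and names are invented for the example, and ContextWalk itself derives recommendation scores from visit probabilities rather than single walks:

```python
import random

def random_walk(adj, start, steps=4, seed=0):
    # Take `steps` uniform-random hops over an undirected graph whose
    # nodes are users, items and tags.
    rnd = random.Random(seed)
    node = start
    for _ in range(steps):
        node = rnd.choice(adj[node])
    return node

adj = {
    "u1": ["i1", "t1"],
    "i1": ["u1", "t1", "u2"],
    "t1": ["u1", "i1", "i2"],
    "u2": ["i1"],
    "i2": ["t1"],
}
end = random_walk(adj, "u1")
```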
BPR (Rendle et al., 2009) is a personalized ranking approach that optimizes
matrix factorization with a pairwise loss function. It is a highly competitive
method for the item recommendation task.
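The pairwise objective of BPR can be illustrated with a single stochastic gradient step (a minimal sketch with our own variable names, not the authors' implementation): given that user u interacted with item i but not with item j, the update pushes the score difference x_uij = U_u · (V_i − V_j) upward.

```python
import math

def bpr_step(U, V, u, i, j, lr=0.05, reg=0.01):
    # One SGD ascent step on ln sigmoid(x_uij) with L2 regularization.
    x_uij = sum(U[u][k] * (V[i][k] - V[j][k]) for k in range(len(U[u])))
    g = 1.0 / (1.0 + math.exp(x_uij))  # d/dx ln sigmoid(x) = sigmoid(-x)
    for k in range(len(U[u])):
        uk, ik, jk = U[u][k], V[i][k], V[j][k]
        U[u][k] += lr * (g * (ik - jk) - reg * uk)
        V[i][k] += lr * (g * uk - reg * ik)
        V[j][k] += lr * (-g * uk - reg * jk)

# toy factors: one user, two items, two latent dimensions
U = [[0.1, 0.2]]
V = [[0.1, 0.1], [0.2, 0.0]]
before = sum(U[0][k] * (V[0][k] - V[1][k]) for k in range(2))
bpr_step(U, V, 0, 0, 1)
after = sum(U[0][k] * (V[0][k] - V[1][k]) for k in range(2))
```

After the step, `after > before`: the observed item is ranked more strongly above the unobserved one, which is exactly the pairwise preference BPR optimizes.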
TagiCoFi (Zhen et al., 2009) is a tag-based single-domain recommendation
approach. It learns similarities between users based on tags, and then adds the
user similarity matrix to the PMF (Mnih and Salakhutdinov, 2008) model to
regularize matrix factorization.
TagCDCF (Shi et al., 2011) is a tag-induced cross-domain recommendation
approach. It links two disjoint domains through overlapping tags. Based on the
profiles defined with overlapping tags, it learns cross-domain user-to-user and
item-to-item similarity matrices as prior knowledge to regularize the joint matrix
factorization.
In addition to the above state-of-the-art methods, we also experimented with two
schemes proposed in this chapter:
joint topic mining (JTM), which uses joint topics to link users and items
across domains so that cross-domain similarity can be calculated based on topics
rather than tags, as described in Subsection 5.3.1; and
topic alignment (TA), which combines topic modelling with random walks
to learn implicit relationships among cross-domain users and among cross-domain
items, as described in Subsection 5.3.2.
5.4.5 Experiment Results and Analysis
Here, we illustrate the performance of the TSCDR model from different perspectives.
5.4.5.1 The Effect of Regularization Parameters
Given that the regularization parameters λu and λi control the contributions of
cross-domain similarities in regularizing the joint matrix factorization, it is important
to test their impact on recommendation performance. Due to space limitations,
we use LT-high vs ML-low as a representative example to show how
recommendation performance is affected by λu and λi. The evaluation is measured
by HR@10 and NDCG@10.
We followed the parameter tuning process in (Shi et al., 2011) by first fixing
λi = 0 and adjusting the value of λu within the range [0.001, 0.01, 0.1, 1] to study
the impact of λu, as shown in Figure 5.4a. TSCDR achieved the best result when
λu = 0.1. Fixing λu = 0.1, we then gradually varied the value of λi to test
its impact on recommendation performance. The results are shown in Figure 5.4b.
HR@10 and NDCG@10 reached peak performance when λi = 0.1, and the
peak values in Figure 5.4b are higher than those in Figure 5.4a, indicating that both
the user and item similarities play an active role in transferring knowledge
across domains to improve recommendation performance.

(a) Effect of λu on LT-high VS ML-low
(b) Effect of λi on LT-high VS ML-low
Fig. 5.4 Performance of HR@10 and NDCG@10 w.r.t. λu and λi

Fig. 5.5 Performance of HR@10 and NDCG@10 w.r.t. number of tag clusters
From these two observations, we find that TSCDR achieves good results
as λu and λi vary over a wide range, suggesting that the TSCDR model does
not easily become trapped in local optima when searching for global solutions.
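The fix-one-then-sweep-the-other procedure above can be expressed as a short search loop. Here `evaluate` is a hypothetical stand-in for training TSCDR and measuring HR@10 on validation data; its peak at 0.1 is hard-coded purely for illustration.

```python
def evaluate(lam_u, lam_i):
    # toy surrogate for validation HR@10; peaks at lam_u = lam_i = 0.1
    return -((lam_u - 0.1) ** 2 + (lam_i - 0.1) ** 2)

grid = [0.001, 0.01, 0.1, 1]
best_u = max(grid, key=lambda lam: evaluate(lam, 0.0))     # sweep lambda_u with lambda_i fixed at 0
best_i = max(grid, key=lambda lam: evaluate(best_u, lam))  # then sweep lambda_i with the best lambda_u
```

This sequential search is cheaper than a full grid search (|grid| + |grid| evaluations instead of |grid|^2), at the cost of assuming the two parameters interact weakly.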
5.4.5.2 The Effect of Tag Clusters
One of the goals of this study is to explore the role of tag clusters as features to
differentiate users and items across domains. Again, we chose LT-high vs ML-low
as a representative example. We set λu = 0.1 and λi = 0.1 as optimal parameters.
Figure 5.5 shows the effect of varying the number of tag clusters.
It is clear that recommendation performance is correlated with the clustering
of the tag embeddings generated by the word2vec technique. Setting too small
or too large a number of clusters does not help to group similar tags, which
in turn yields less discriminative weights for distinguishing users and items across
domains. As Figure 5.5 shows, as the number of clusters increased from 2 to 10,
the best result was achieved with 4 tag clusters, because users and items are
partitioned more accurately according to the tags in their profiles. Additionally,
we found that recommendation performance does not change significantly as
the number of clusters increases further from 10 to 20. This indicates
that the model may be approaching the point of overfitting the training data with
such a large number of tag clusters.
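The tag-clustering step can be sketched with a plain k-means over toy 2-D "embeddings"; this is our simplification for illustration, since the thesis clusters the word2vec tag embeddings and the exact clustering algorithm is not restated in this subsection:

```python
import random

def kmeans(points, k, iters=20, seed=0):
    # Minimal k-means: assign each embedding to its nearest centre,
    # then move each centre to the mean of its assigned embeddings.
    rnd = random.Random(seed)
    centers = rnd.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            groups[nearest].append(p)
        centers = [tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[c]
                   for c, g in enumerate(groups)]
    return groups

# two well-separated blobs of toy 2-D "tag embeddings"
tags = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
groups = kmeans(tags, 2)
```

Each resulting group plays the role of one tag cluster: the cluster memberships become the shared features used to profile users and items across domains.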
5.4.5.3 Comparison with the Baselines
Table 5.4 shows the recommendation performance of TSCDR and the other baselines
in terms of HR@10 and NDCG@10.
TSCDR achieves the best recommendation performance on most domain
pairs in terms of both metrics. It outperformed the next best performing baseline,
BPR, by a large margin (e.g., an 11.8% HR@10 and 23.1% NDCG@10
improvement on the domain pair LT-high vs ML-low). However, we also noticed
Table 5.4 Comparison of TSCDR with other baselines (HR@10 / NDCG@10)

Method       ML-high vs LT-high  ML-high vs FM-high  ML-high vs LT-low  ML-high vs FM-low  LT-high vs ML-low  LT-high vs FM-low
ContextWalk  0.196 / 0.093       0.289 / 0.133       0.217 / 0.102      0.185 / 0.079      0.142 / 0.065      0.19 / 0.083
TagiCoFi     0.008 / 0.003       0.286 / 0.154       0.01 / 0.003       0.475 / 0.287      0.004 / 0.001      0.427 / 0.268
BPR          0.534 / 0.311       0.382 / 0.209       0.527 / 0.307      0.457 / 0.268      0.297 / 0.156      0.431 / 0.257
TagCDCF      0.141 / 0.068       0.079 / 0.034       0.136 / 0.068      0.05 / 0.021       0.076 / 0.035      0.059 / 0.027
JTM          0.194 / 0.113       0.08 / 0.038        0.181 / 0.108      0.038 / 0.018      0.171 / 0.077      0.125 / 0.066
TA           0.14 / 0.076        0.07 / 0.037        0.143 / 0.068      0.044 / 0.02       0.143 / 0.069      0.07 / 0.032
TSCDR        0.478 / 0.292       0.527 / 0.327       0.48 / 0.294       0.617 / 0.438      0.332 / 0.192      0.597 / 0.417
that BPR performed better than TSCDR on the domain pairs ML-high vs
LT-high and ML-high vs LT-low, where the source and target domains share
the same format of explicit feedback. This may be due to the size of the dataset in
the source domain, which is much smaller than that of the target domain, even
though the source domain's ratings are denser. Without sufficient
data in the source domain, there is not enough knowledge to transfer to the
target domain to improve recommendation performance. However, TSCDR still
outperformed the other cross-domain and single-domain recommendation methods
in these cases, indicating the effectiveness of building more knowledge transfer
bridges by correlating tags with deep semantic information.
JTM and TA performed slightly better than TagCDCF on most domain
pairs. The resulting abstract features, in the form of topics, are much more
effective representations for personalized recommendation. However, maintaining
performance across different parameter settings remains a challenge to be further
investigated.
5.4.5.4 The Impact of Recommendation List Size
To determine whether item ranking performance is sensitive to the size of a
recommendation list, we varied the size of the top-N lists from 10 to 50 in steps
of 10. Figures 5.6 and 5.7 show the evaluation rankings of all methods in terms of
HR@N and NDCG@N.
Figures 5.6 and 5.7 show similar trends for all the methods as N varies:
performance gradually improves from the top 10 to the top 50 items in a
recommendation list. JTM outperforms TagCDCF on all domain pairs, indicating
that modelling topics, rather than tags, is more effective in bridging the domains.
To our surprise, there was no significant difference between TA and TagCDCF
in most cases. This may be because the sparse connection between topic layers
is only built at the tag level. For most domain pairs, TSCDR showed a consistent
improvement over the baselines at different positions, highlighting the superior
performance of our method in top-N recommendation tasks.

(a) ML-high VS LT-high (b) ML-high VS FM-high (c) LT-high VS ML-low
(d) ML-high VS LT-low (e) ML-high VS FM-low (f) LT-high VS FM-low
Fig. 5.6 Performance of top-N recommendation in terms of HR@N, where N ranges from 10 to 50

(a) ML-high VS LT-high (b) ML-high VS FM-high (c) LT-high VS ML-low
(d) ML-high VS LT-low (e) ML-high VS FM-low (f) LT-high VS FM-low
Fig. 5.7 Performance of top-N recommendation in terms of NDCG@N, where N ranges from 10 to 50
5.5 Summary
In this chapter, we aimed to exploit semantic information to correlate semantically
equivalent but non-identical tags, so that a close relation between
heterogeneous domains can be established by linking tags across domains
and the knowledge in the source domain can be transferred to the target domain to
address the data sparsity problem. To this end, we have proposed a new tag-based
cross-domain recommendation algorithm, namely TSCDR, which unifies word2vec
and matrix factorization in a simple, extensible framework. We devised a new
feature space spanning disjoint domains by grouping semantically equivalent
tags. TSCDR exploits the learned tag clusters to infer a more accurate cross-domain
similarity and utilizes it to regularize joint matrix factorization. Extensive
experiments have been conducted to demonstrate the promising performance of TSCDR
in the top-N recommendation task.
Chapter 6
Conclusions and future work
This chapter presents the conclusions derived from the entire thesis. In Section 6.1
the main contributions of this thesis are summarized, and in Section 6.2, some
possible research directions are provided for future work.
6.1 Conclusions
Cross-domain recommender systems are a new research topic that has attracted much
attention in recent years due to their effectiveness in alleviating the data sparsity
problem in recommender systems. In this thesis, one of the major challenges in
the development of cross-domain recommender systems has been addressed:
automatically building a bridge (domain link) between the involved domains
for transferring knowledge. In the problem setting of two heterogeneous domains,
the correspondence between cross-domain users and between cross-domain items
is not provided. In this context, user-generated tags are studied to link different
domains explicitly. However, how to exploit tags in establishing correspondence
between heterogeneous domains to improve recommendation performance
remains an open challenge that needs to be investigated extensively. The main
contributions of this thesis are summarized as follows:
(1) The development of an enhanced tag-induced cross-domain collaborative
filtering model (Chapter 3) by exploiting abundant domain-specific tags.
(Research Objective 1 is achieved)
An enhanced tag-induced cross-domain collaborative filtering model is presented
in which abundant domain-specific tags, not just the limited overlapping
tags, are utilized to increase the connections between heterogeneous domains.
To align diverse domain-specific tags, spectral clustering together with tag
co-occurrence patterns is exploited to group domain-specific tags. Based
on the tag clusters, new user and item profiles can be defined and utilized
to compute cross-domain similarity for regularizing knowledge transfer. The
experimental results demonstrate that the proposed model is capable of
establishing a strong domain connection to support knowledge transfer when
overlapping tags are scarce. Furthermore, domain-specific
tags are shown to be beneficial for adding more information about user
preferences into recommendations.
(2) The development of a complete tag-induced cross-domain recommendation
model (Chapter 4) by exploiting structural knowledge inferred with tags.
(Research Objective 2 is achieved)
A complete tag-induced cross-domain recommendation model is proposed in
which both inter- and intra-domain correlations are considered as structural
knowledge to promote knowledge transfer. In this model, not only overlapping
tags but also domain-specific tags are exploited to play complementary
roles in establishing the inter-domain correlation. Additionally, intra-domain
similarity between users and between items has also been introduced
by distinguishing the tag distribution in each individual domain, which
amounts to building a compact intra-domain correlation to support knowledge
transfer at a group level. Experiments on three public datasets against five
state-of-the-art baseline approaches demonstrate that the proposed model
performs well in both rating prediction and item recommendation tasks.
(3) The development of a tag semantically-boosted cross-domain recommendation
model (Chapter 5) by exploiting the semantic information of tags.
(Research Objective 3 is achieved)
A tag semantically-boosted cross-domain recommendation model is developed
which aims to automatically group non-identical but semantically
related tags to increase the domain overlap and, ultimately, the recommendation
quality. In this model, the word2vec technique is utilized to learn
a semantic representation of tags. Then, semantically equivalent tags are
merged into the same group according to the learned semantic
representation. The derived tag clusters spanning domains are exploited
as a joint embedding space for aligning heterogeneous domains. By mapping
users and items from both the source and target domains into the same embedding
space, similar users and items across domains can be identified and
connected. As a result, knowledge is transferred from the source domain to
the target domain via matched users and items to improve recommendation
performance. Experimental results on multiple datasets demonstrate that
our proposed model outperforms other state-of-the-art baselines in the top-N
recommendation task.
6.2 Future work
Future directions identified in this research can be summarized as follows:
• In Chapter 3, the overlap between disjoint domains was increased by
grouping domain-specific tags into tag clusters. As shown in (Enrich et al.,
2013), irrelevant tags may hinder the improvement of recommendation
performance. A preprocessing step that filters domain-specific tags by
taking tag relevance into account is not considered in our proposed model.
An improved result might be achieved if irrelevant tags were
discarded to avoid introducing noise into the clustering.
• Regarding the model proposed in Chapter 4, a simple similarity integration
strategy was designed for establishing the inter-domain correlation, in
which the contributions of overlapping tags and domain-specific tags are treated
in the same way. According to the work of (Shambour and Lu, 2012; Slokom
and Ayachi, 2017), a more delicate similarity fusion method may be more useful
for exploring tag-induced similarity.
• With respect to Chapter 5, the semantic information of tags was exploited
to improve recommendation performance. It would be interesting to extend
the TSCDR framework to integrate other auxiliary data sources, such
as reviews (Song et al., 2017) and images (McAuley et al., 2015), which
also contain plentiful semantic information that could be mined to develop a more
effective recommendation approach.
• Applying the proposed models to other recommendation applications and
testing them on more datasets are other attractive research directions. Such
experiments are a strong indicator by which to evaluate the generality and
accuracy of the proposed models.
• It is also planned to integrate the proposed tag-based cross-domain
recommendation models into Smart BizSeeker in order to develop a prototype of
an advanced recommendation engine for application.
• Finally, all the models developed in this research are based on matrix
factorization, which only deals with user-item interactions in the form of numeric
ratings. In addition to the common user and item dimensions, there
are many other valuable dimensions in real recommendation scenarios, such
as time, inquiry and price. They are useful for understanding the specific
needs of users if the potential relationships among them can be discovered.
To handle such multidimensional data, tensor-based models provide a
straightforward way of integrating context information into recommendation (Frolov
and Oseledets, 2017; Symeonidis, 2016). Considering tensor factorization in
our proposed models is a promising alternative.
Bibliography
Abel, F., Herder, E., Houben, G.-J., Henze, N., and Krause, D. (2013). Cross-
system user modeling and personalization on the social web. User Modeling
and User-Adapted Interaction, pages 1–41.
Adomavicius, G., Sankaranarayanan, R., Sen, S., and Tuzhilin, A. (2005). Incorpo-
rating contextual information in recommender systems using a multidimensional
approach. ACM Transactions on Information Systems, 23(1):103–145.
Adomavicius, G. and Tuzhilin, A. (2005). Toward the next generation of recom-
mender systems: A survey of the state-of-the-art and possible extensions. IEEE
Transactions on Knowledge and Data Engineering, 17(6):734–749.
Adomavicius, G. and Tuzhilin, A. (2011). Context-aware recommender systems.
In Recommender systems handbook, pages 217–253. Springer.
Argyriou, A., Evgeniou, T., and Pontil, M. (2007). Multi-task feature learning.
In Advances in Neural Information Processing Systems, pages 41–48.
Barragáns-Martínez, A. B., Costa-Montenegro, E., Burguillo, J. C., Rey-López,
M., Mikic-Fonte, F. A., and Peleteiro, A. (2010). A hybrid content-based and
item-based collaborative filtering approach to recommend tv programs enhanced
with singular value decomposition. Information Sciences, 180(22):4290–4311.
Beel, J., Gipp, B., Langer, S., and Breitinger, C. (2016). Paper recommender
systems: A literature survey. International Journal on Digital Libraries, 17(4):305–
338.
Behbood, V., Lu, J., and Zhang, G. (2011). Long term bank failure prediction
using fuzzy refinement-based transductive transfer learning. In 2011 IEEE
International Conference on Fuzzy Systems, pages 2676–2683.
Behbood, V., Lu, J., and Zhang, G. (2014). Fuzzy refinement domain adaptation
for long term prediction in banking ecosystem. IEEE Transactions on Industrial
Informatics, 10(2):1637–1646.
Bengio, Y., Courville, A., and Vincent, P. (2013). Representation learning: A
review and new perspectives. IEEE Transactions on Pattern Analysis and
Machine Intelligence, 35(8):1798–1828.
Bengio, Y., Courville, A. C., and Vincent, P. (2012). Unsupervised feature learning
and deep learning: A review and new perspectives. CoRR, abs/1206.5538, pages
1–30.
Bengio, Y. et al. (2009). Learning deep architectures for AI. Foundations and
Trends® in Machine Learning, 2(1):1–127.
Bogers, T. (2010). Movie recommendation using random walks over the contextual
graph. In Proceedings of the 2nd International Workshop on Context-Aware
Recommender Systems, pages 1–5.
Bouadjenek, M. R., Hacid, H., Bouzeghoub, M., and Vakali, A. (2013). Using
social annotations to enhance document representation for personalized search.
In Proceedings of the 36th international ACM SIGIR Conference on Research
and Development in Information Retrieval, pages 1049–1052.
Brin, S. and Page, L. (2012). Reprint of: The anatomy of a large-scale hypertextual
web search engine. Computer Networks, 56(18):3825–3833.
Burke, R. (2002). Hybrid recommender systems: Survey and experiments. User
Modeling and User-Adapted Interaction, 12(4):331–370.
Burke, R. (2005). Hybrid systems for personalized recommendations. Intelligent
Techniques for Web Personalization, pages 133–152.
Burke, R. (2007). Hybrid web recommender systems. In The Adaptive Web,
pages 377–408. Springer.
Cantador, I., Bellogín, A., and Vallet, D. (2010). Content-based recommenda-
tion in social tagging systems. In Proceedings of the 4th ACM conference on
Recommender systems, pages 237–240.
Cantador, I., Fernández-Tobías, I., Berkovsky, S., and Cremonesi, P. (2015).
Cross-domain recommender systems. In Recommender Systems Handbook,
pages 919–959. Springer.
Cao, B., Liu, N. N., and Yang, Q. (2010). Transfer learning for collective link
prediction in multiple heterogenous domains. In Proceedings of the 27th Inter-
national Conference on Machine Learning, pages 159–166.
Cao, D., He, X., Nie, L., Wei, X., Hu, X., Wu, S., and Chua, T.-S. (2017). Cross-
platform app recommendation by jointly modeling ratings and texts. ACM
Transactions on Information Systems, 35(4):37.
Chakraverty, S. and Saraswat, M. (2017). Review based emotion profiles for cross
domain recommendation. Multimedia Tools and Applications, pages 1–24.
Chatzis, S. (2013). Nonparametric bayesian multitask collaborative filtering.
In Proceedings of the 22nd ACM international conference on Conference on
Information and Knowledge Management, pages 2149–2158.
Cho, K., Van Merriënboer, B., Bahdanau, D., and Bengio, Y. (2014a). On the
properties of neural machine translation: Encoder-decoder approaches. arXiv
preprint arXiv:1409.1259.
Cho, K., Van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk,
H., and Bengio, Y. (2014b). Learning phrase representations using rnn encoder-
decoder for statistical machine translation. arXiv preprint arXiv:1406.1078.
Codina, V., Ricci, F., and Ceccaroni, L. (2013a). Exploiting the semantic similarity
of contextual situations for pre-filtering recommendation. In International
Conference on User Modeling, Adaptation, and Personalization, pages 165–177.
Springer.
Codina, V., Ricci, F., and Ceccaroni, L. (2013b). Semantically-enhanced pre-
filtering for context-aware recommender systems. In Proceedings of the 3rd
Workshop on Context-awareness in Retrieval and Recommendation, pages 15–18.
ACM.
Cook, D., Feuz, K. D., and Krishnan, N. C. (2013). Transfer learning for activity
recognition: A survey. Knowledge and Information Systems, 36(3):537–556.
Cremonesi, P., Tripodi, A., and Turrin, R. (2011). Cross-domain recommender
systems. In Proceedings of the 11th IEEE International Conference on Data
Mining Workshops, pages 496–503.
Dai, W., Yang, Q., Xue, G.-R., and Yu, Y. (2007). Boosting for transfer learning.
In Proceedings of the 24th International Conference on Machine Learning, pages
193–200.
Dai, W., Yang, Q., Xue, G.-R., and Yu, Y. (2008). Self-taught clustering. In
Proceedings of the 25th International Conference on Machine Learning, pages
200–207.
De Gemmis, M., Lops, P., Semeraro, G., and Basile, P. (2008). Integrating tags
in a semantic content-based recommender. In Proceedings of the 2nd ACM
conference on Recommender Systems, pages 163–170.
Deng, L., Yu, D., et al. (2014). Deep learning: methods and applications. Foun-
dations and Trends® in Signal Processing, 7(3):197–387.
Deselaers, T., Hasan, S., Bender, O., and Ney, H. (2009). A deep learning approach
to machine transliteration. In Proceedings of the 4th Workshop on Statistical
Machine Translation, pages 233–241.
Ding, C. and He, X. (2004). K-means clustering via principal component analysis.
In Proceedings of the twenty-first international conference on Machine learning,
pages 29–35.
Elkahky, A. M., Song, Y., and He, X. (2015). A multi-view deep learning approach
for cross domain user modeling in recommendation systems. In Proceedings of
the 24th International Conference on World Wide Web, pages 278–288.
Enrich, M., Braunhofer, M., and Ricci, F. (2013). Cold-start management with
cross-domain collaborative filtering and tags. In Proceedings of the International
Conference on Electronic Commerce and Web Technologies, pages 101–112.
Fan, W., Davidson, I., Zadrozny, B., and Yu, P. S. (2005). An improved cate-
gorization of classifier’s sensitivity on sample selection bias. In Data Mining,
Fifth IEEE International Conference on, pages 4–11.
Fang, Z., Gao, S., Li, B., Li, J., and Liao, J. (2015). Cross-domain recommendation
via tag matrix transfer. In Proceedings of IEEE International Conference on
Data Mining Workshop, pages 1235–1240.
Fernández-Tobías, I. (2017). Matrix factorization models for cross-domain recom-
mendation: Addressing the cold start in collaborative filtering. PhD thesis.
Fernández-Tobías, I. and Cantador, I. (2014). Exploiting social tags in matrix
factorization models for cross-domain collaborative filtering. In Proceedings of
the Workshop on New Trends in Content-based Recommender System in RecSys,
pages 34–41.
Fernández-Tobías, I., Cantador, I., Kaminskas, M., and Ricci, F. (2011). A generic
semantic-based framework for cross-domain recommendation. In Proceedings of
the 2nd International Workshop on Information Heterogeneity and Fusion in
Recommender Systems, pages 25–32.
Fernández-Tobías, I., Cantador, I., Kaminskas, M., and Ricci, F. (2012). Cross-
domain recommender systems: A survey of the state of the art. In Proceedings
of Spanish Conference on Information Retrieval, pages 24–36.
Frey, B. J. and Dueck, D. (2007). Clustering by passing messages between data
points. Science, 315(5814):972–976.
Frolov, E. and Oseledets, I. (2017). Tensor methods and recommender systems.
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 7(3):1–
41.
Gao, J., Fan, W., Jiang, J., and Han, J. (2008). Knowledge transfer via multiple
model local structure mapping. In Proceedings of the 14th ACM SIGKDD
International Conference on Knowledge Discovery and Data Mining, pages
283–291.
Gao, S., Luo, H., Chen, D., Li, S., Gallinari, P., and Guo, J. (2013). Cross-
domain recommendation via cluster-level latent factor model. In Joint European
Conference on Machine Learning and Knowledge Discovery in Databases, pages
161–176.
Gedikli, F. and Jannach, D. (2013). Improving recommendation accuracy based
on item-specific tag preferences. ACM Transactions on Intelligent Systems and
Technology, 4(1):11–19.
George, T. and Merugu, S. (2005). A scalable collaborative filtering framework
based on co-clustering. In Proceedings of the 5th IEEE international conference
on Data Mining, pages 4–12.
Ghazanfar, M. A. and Prugel-Bennett, A. (2010). A scalable, accurate hybrid
recommender system. In Proceedings of the 3d International Conference on
Knowledge Discovery and Data Mining, pages 94–98.
Glorot, X., Bordes, A., and Bengio, Y. (2011). Domain adaptation for large-scale
sentiment classification: A deep learning approach. In Proceedings of the 28th
International Conference on Machine Learning, pages 513–520.
Gong, B., Grauman, K., and Sha, F. (2014). Learning kernels for unsupervised
domain adaptation with applications to visual object recognition. International
Journal Computer Vision, 109(1-2):3–27.
Graves, A. and Jaitly, N. (2014). Towards end-to-end speech recognition with
recurrent neural networks. In Proceedings of the 31st International Conference
on Machine Learning, pages 1764–1772.
Graves, A., Mohamed, A.-r., and Hinton, G. (2013). Speech recognition with
deep recurrent neural networks. In Proceedings of the 2013 IEEE International
Conference on Acoustics, Speech and Signal Processing, pages 6645–6649.
Grčar, M., Mladenič, D., Fortuna, B., and Grobelnik, M. (2005). Data sparsity
issues in the collaborative filtering framework. In International Workshop on
Knowledge Discovery on the Web, pages 58–76.
Hao, P., Zhang, G., and Lu, J. (2016). Enhancing cross domain recommendation
with domain dependent tags. In Proceedings of the 2016 IEEE International
Conference on Fuzzy Systems, pages 1266–1273.
Hariri, N., Mobasher, B., Burke, R., and Zheng, Y. (2011). Context-aware
recommendation based on review mining. In Proceedings of the 9th Workshop
on Intelligent Techniques for Web Personalization and Recommender Systems,
page 30.
He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep residual learning for image
recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision
and Pattern Recognition, pages 770–778.
He, X., Liao, L., Zhang, H., Nie, L., Hu, X., and Chua, T.-S. (2017). Neural
collaborative filtering. In Proceedings of the 26th International Conference on
World Wide Web, pages 173–182.
Herlocker, J. L., Konstan, J. A., Borchers, A., and Riedl, J. (1999). An algorithmic
framework for performing collaborative filtering. In Proceedings of the 22nd
annual international ACM SIGIR conference on Research and development in
information retrieval, pages 230–237.
Hidasi, B. and Tikk, D. (2012). Fast als-based tensor factorization for context-
aware recommendation from implicit feedback. Machine Learning and Knowledge
Discovery in Databases, pages 67–82.
Hinton, G., Deng, L., Yu, D., Dahl, G. E., Mohamed, A.-r., Jaitly, N., Senior, A.,
Vanhoucke, V., Nguyen, P., Sainath, T. N., et al. (2012). Deep neural networks
for acoustic modeling in speech recognition: The shared views of four research
groups. IEEE Signal Processing Magazine, 29(6):82–97.
Hinton, G. E., Osindero, S., and Teh, Y.-W. (2006). A fast learning algorithm for
deep belief nets. Neural computation, 18(7):1527–1554.
Hinton, G. E. and Salakhutdinov, R. R. (2006). Reducing the dimensionality of
data with neural networks. Science, 313(5786):504–507.
Hoffman, M. D., Blei, D. M., and Bach, F. (2010). Online learning for latent
dirichlet allocation. In Proceedings of the 23rd International Conference on
Neural Information Processing Systems, pages 856–864.
Hofmann, T. (2004). Latent semantic models for collaborative filtering. ACM
Transactions on Information System, 22(1):89–115.
Hu, L., Cao, J., Xu, G., Cao, L., Gu, Z., and Zhu, C. (2013). Personalized
recommendation via cross-domain triadic factorization. In Proceedings of the
22nd International Conference on World Wide Web, pages 595–606.
Huang, J., Gretton, A., Borgwardt, K. M., Schölkopf, B., and Smola, A. J. (2007).
Correcting sample selection bias by unlabeled data. In Advances in Neural
Information Processing Systems, pages 601–608.
Huang, J., Smola, A. J., Gretton, A., Borgwardt, K. M., and Scholkopf, B. (2006).
Correcting sample selection bias by unlabeled data. In Proceedings of the 19th
International Conference on Neural Information Processing Systems, pages
601–608.
Hwangbo, H. and Kim, Y. (2017). An empirical study on the effect of data sparsity
and data overlap on cross domain collaborative filtering performance. Expert
Systems with Applications, 89:254–265.
Jabeen, F., Khusro, S., Majid, A., and Rauf, A. (2016). Semantics discovery in
social tagging systems: A review. Multimedia Tools and Applications, 75(1):573–
605.
Jannach, D. and Adomavicius, G. (2016). Recommendations with a purpose. In
Proceedings of the 10th ACM Conference on Recommender Systems, pages 7–10.
Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., Guadar-
rama, S., and Darrell, T. (2014). Caffe: Convolutional architecture for fast
feature embedding. In Proceedings of the 22nd ACM International Conference
on Multimedia, pages 675–678.
Jiang, J. (2008). A literature survey on domain adaptation of statistical classifiers.
URL: http://sifaka.cs.uiuc.edu/jiang4/domainadaptation/survey, 3:1–12.
Jiang, J. and Zhai, C. (2007). Instance weighting for domain adaptation in NLP.
In Proceedings of the Association for Computational Linguistics, pages 264–271.
Jiang, M., Cui, P., Yuan, N. J., Xie, X., and Yang, S. (2016). Little is much:
Bridging cross-platform behaviors through overlapped crowds. In Proceedings
of the 30th AAAI Conference on Artificial Intelligence, pages 13–19.
Jiang, W. and Chung, F.-l. (2012). Transfer spectral clustering. In Joint European
Conference on Machine Learning and Knowledge Discovery in Databases, pages
789–803.
Kemker, R. and Kanan, C. (2017). Self-taught feature learning for hyperspectral
image classification. IEEE Transactions on Geoscience and Remote Sensing,
55(5):2693–2705.
Khan, M. M., Ibrahim, R., and Ghani, I. (2017). Cross domain recommender
systems: A systematic literature review. ACM Computing Surveys, 50(3):36.
Kim, B. M., Li, Q., Park, C. S., Kim, S. G., and Kim, J. Y. (2006). A new
approach for combining content-based and collaborative filters. Journal of
Intelligent Information Systems, 27(1):79–91.
Kim, Y. (2014). Convolutional neural networks for sentence classification. arXiv
preprint arXiv:1408.5882.
Konstan, J. A., Miller, B. N., Maltz, D., Herlocker, J. L., Gordon, L. R., and
Riedl, J. (1997). Grouplens: applying collaborative filtering to usenet news.
Communications of the ACM, 40(3):77–87.
Koren, Y. (2008). Factorization meets the neighborhood: A multifaceted collabo-
rative filtering model. In Proceedings of the 14th ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining, pages 426–434.
Koren, Y. and Bell, R. (2015). Advances in collaborative filtering. In Recommender
Systems Handbook, pages 77–118.
Koren, Y., Bell, R., and Volinsky, C. (2009). Matrix factorization techniques for
recommender systems. Computer, 42(8):30–37.
Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2012). Imagenet classification
with deep convolutional neural networks. In Advances in Neural Information
Processing Systems, pages 1097–1105.
Kuchaiev, O. and Ginsburg, B. (2017). Training deep autoencoders for collabora-
tive filtering. arXiv preprint arXiv:1708.01715.
Kumar, A., Kumar, N., Hussain, M., Chaudhury, S., and Agarwal, S. (2014).
Semantic clustering-based cross-domain recommendation. In IEEE Symposium
on Computational Intelligence and Data Mining, pages 137–141.
Lampropoulos, A. S., Lampropoulou, P. S., and Tsihrintzis, G. A. (2012). A
cascade-hybrid music recommender system for mobile services based on musical
genre classification and personality diagnosis. Multimedia Tools and Applications,
59(1):241–258.
LeCun, Y., Bengio, Y., and Hinton, G. (2015). Deep learning. Nature,
521(7553):436–444.
LeCun, Y., Boser, B. E., Denker, J. S., Henderson, D., Howard, R. E., Hubbard,
W. E., and Jackel, L. D. (1990). Handwritten digit recognition with a back-
propagation network. In Proceedings of the Advances in Neural Information
Processing Systems, pages 396–404.
Lekakos, G. and Caravelas, P. (2008). A hybrid approach for movie recommenda-
tion. Multimedia Tools and Applications, 36(1):55–70.
Li, B. (2011). Cross-domain collaborative filtering: A brief survey. In Proceedings
of the 23rd IEEE International Conference on Tools with Artificial Intelligence,
pages 1085–1086.
Li, B., Yang, Q., and Xue, X. (2009a). Can movies and books collaborate?
cross-domain collaborative filtering for sparsity reduction. In Proceedings of
21st International Joint Conference on Artificial Intelligence, pages 2052–2057.
Li, B., Yang, Q., and Xue, X. (2009b). Transfer learning for collaborative
filtering via a rating-matrix generative model. In Proceedings of the 26th Annual
International Conference on Machine Learning, pages 617–624.
Li, B., Zhu, X., Li, R., and Zhang, C. (2015). Rating knowledge sharing in cross-
domain collaborative filtering. IEEE Transactions on Cybernetics, 45(5):1068–
1082.
Li, K. and Principe, J. C. (2017). Transfer learning in adaptive filters: The
nearest-instance-centroid-estimation kernel least-mean-square algorithm. IEEE
Transactions on Signal Processing, 65(24):6520–6535.
Li, W., Duan, L., Xu, D., and Tsang, I. W. (2014). Learning with augmented
features for supervised and semi-supervised heterogeneous domain adaptation.
IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(6):1134–
1148.
Long, M., Wang, J., Ding, G., Pan, S. J., and Philip, S. Y. (2014a). Adaptation
regularization: A general framework for transfer learning. IEEE Transactions
on Knowledge and Data Engineering, 26(5):1076–1089.
Long, M., Wang, J., Ding, G., Shen, D., and Yang, Q. (2014b). Transfer learning
with graph co-regularization. IEEE Transactions on Knowledge and Data
Engineering, 26(7):1805–1818.
Lops, P., De Gemmis, M., and Semeraro, G. (2011). Content-based recommender
systems: State of the art and trends. In Recommender Systems Handbook, pages
73–105.
Lu, J., Behbood, V., Hao, P., Zuo, H., Xue, S., and Zhang, G. (2015a). Transfer
learning using computational intelligence: a survey. Knowledge-Based Systems,
80:14–23.
Lu, J., Wu, D., Mao, M., Wang, W., and Zhang, G. (2015b). Recommender
system application developments: A survey. Decision Support Systems, 74:12 –
32.
Lu, Z., Zhong, E., Zhao, L., Xiang, E. W., Pan, W., and Yang, Q. (2013). Selective
transfer learning for cross domain recommendation. In Proceedings of the 2013
SIAM International Conference on Data Mining, pages 641–649.
Ma, H., Yang, H., Lyu, M. R., and King, I. (2008). Sorec: social recommendation
using probabilistic matrix factorization. In Proceedings of the 17th ACM
Conference on Information and Knowledge Management, pages 931–940.
Maas, A. L., Daly, R. E., Pham, P. T., Huang, D., Ng, A. Y., and Potts, C. (2011).
Learning word vectors for sentiment analysis. In Proceedings of the 49th Annual
Meeting of the Association for Computational Linguistics: Human Language
Technologies, pages 142–150.
Marlin, B. M. (2004). Modeling user rating profiles for collaborative filtering. In
Advances in Neural Information Processing Systems, pages 627–634.
McAuley, J., Targett, C., Shi, Q., and Van Den Hengel, A. (2015). Image-based
recommendations on styles and substitutes. In Proceedings of the 38th Interna-
tional ACM SIGIR Conference on Research and Development in Information
Retrieval, pages 43–52.
Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., and Dean, J. (2013). Dis-
tributed representations of words and phrases and their compositionality. In
Advances in Neural Information Processing Systems, pages 3111–3119.
Mnih, A. and Salakhutdinov, R. (2008). Probabilistic matrix factorization. In
Advances in Neural Information Processing Systems, pages 1257–1264.
Mooney, R. J. and Roy, L. (2000). Content-based book recommending using
learning for text categorization. In Proceedings of the 5th ACM Conference on
Digital Libraries, pages 195–204.
Moreno, O., Shapira, B., Rokach, L., and Shani, G. (2012). Talmud: transfer
learning for multiple domains. In Proceedings of the 21st ACM International
Conference on Information and Knowledge Management, pages 425–434.
Nguyen, H. T., Wistuba, M., Grabocka, J., Drumond, L. R., and Schmidt-Thieme,
L. (2017). Personalized deep learning for tag recommendation. In Proceedings
of the Pacific-Asia Conference on Knowledge Discovery and Data Mining, pages
186–197.
Page, L., Brin, S., Motwani, R., and Winograd, T. (1999). The pagerank citation
ranking: Bringing order to the web. Technical report, Stanford InfoLab.
Pan, S. J., Kwok, J. T., and Yang, Q. (2008). Transfer learning via dimensionality
reduction. In Proceedings of the 23rd AAAI Conference on Artificial Intelligence,
pages 677–682.
Pan, S. J., Ni, X., Sun, J.-T., Yang, Q., and Chen, Z. (2010a). Cross-domain
sentiment classification via spectral feature alignment. In Proceedings of the
19th International Conference on World Wide Web, pages 751–760.
Pan, S. J., Tsang, I. W., Kwok, J. T., and Yang, Q. (2011a). Domain adaptation
via transfer component analysis. IEEE Transactions on Neural Networks,
22(2):199–210.
Pan, S. J. and Yang, Q. (2010). A survey on transfer learning. IEEE Transactions
on Knowledge and Data Engineering, 22(10):1345–1359.
Pan, W. (2016). A survey of transfer learning for collaborative recommendation
with auxiliary data. Neurocomputing, 177:447–453.
Pan, W., Liu, N. N., Xiang, E. W., and Yang, Q. (2011b). Transfer learning to
predict missing ratings via heterogeneous user feedbacks. In Proceedings of the
23rd International Joint Conference on Artificial Intelligence, pages 2318–2323.
Pan, W. and Ming, Z. (2014). Interaction-rich transfer learning for collaborative
filtering with heterogeneous user feedback. IEEE Intelligent Systems, 29(6):48–
54.
Pan, W., Xiang, E. W., Liu, N. N., and Yang, Q. (2010b). Transfer learning in
collaborative filtering for sparsity reduction. In Proceedings of the 24th AAAI
Conference on Artificial Intelligence, volume 10, pages 230–235.
Pan, W., Xiang, E. W., and Yang, Q. (2012). Transfer learning in collaborative
filtering with uncertain ratings. In Proceedings of the 26th AAAI Conference
on Artificial Intelligence, pages 662–668.
Pan, W. and Yang, Q. (2013). Transfer learning in heterogeneous collaborative
filtering domains. Artificial Intelligence, 197(0):39–55.
Panniello, U., Tuzhilin, A., Gorgoglione, M., Palmisano, C., and Pedone, A.
(2009). Experimental comparison of pre- vs. post-filtering approaches in context-
aware recommender systems. In Proceedings of the 3rd ACM Conference on
Recommender Systems, pages 265–268.
Park, D. H., Kim, H. K., Choi, I. Y., and Kim, J. K. (2012). A literature
review and classification of recommender systems research. Expert Systems with
Applications, 39(11):10059–10072.
Pazzani, M. J. and Billsus, D. (2007). Content-based recommendation systems.
In The Adaptive Web, pages 325–341.
Raina, R., Battle, A., Lee, H., Packer, B., and Ng, A. Y. (2007). Self-taught
learning: transfer learning from unlabeled data. In Proceedings of the 24th
International Conference on Machine Learning, pages 759–766.
Ramirez-Garcia, X. and García-Valdez, M. (2014). Post-filtering for a restaurant
context-aware recommender system. In Recent Advances on Hybrid Approaches
for Designing Intelligent Systems, pages 695–707.
Rendle, S., Freudenthaler, C., Gantner, Z., and Schmidt-Thieme, L. (2009). Bpr:
Bayesian personalized ranking from implicit feedback. In Proceedings of the
25th Conference on Uncertainty in Artificial Intelligence, pages 452–461.
Rendle, S., Gantner, Z., Freudenthaler, C., and Schmidt-Thieme, L. (2011). Fast
context-aware recommendations with factorization machines. In Proceedings of
the 34th International ACM SIGIR Conference on Research and Development
in Information Retrieval, pages 635–644.
Resnick, P., Iacovou, N., Suchak, M., Bergstrom, P., and Riedl, J. (1994). Grou-
plens: An open architecture for collaborative filtering of netnews. In Proceedings
of the 1994 ACM Conference on Computer Supported Cooperative Work, pages
175–186.
Ricci, F., Rokach, L., and Shapira, B. (2011). Introduction to recommender
systems handbook. In Recommender systems handbook, pages 1–35.
Rohrbach, M., Ebert, S., and Schiele, B. (2013). Transfer learning in a transductive
setting. In Advances in Neural Information Processing Systems, pages 46–54.
Salakhutdinov, R., Mnih, A., and Hinton, G. (2007). Restricted boltzmann
machines for collaborative filtering. In Proceedings of the 24th International
Conference on Machine Learning, pages 791–798.
Sarwar, B., Karypis, G., Konstan, J., and Riedl, J. (2000). Application of
dimensionality reduction in recommender system-a case study. Technical report,
DTIC Document.
Sarwar, B., Karypis, G., Konstan, J., and Riedl, J. (2001). Item-based collaborative
filtering recommendation algorithms. In Proceedings of the 10th International
Conference on World Wide Web, pages 285–295.
Sarwar, B. M. (2001). Sparsity, Scalability, and Distribution in Recommender
Systems. PhD thesis. AAI9994525.
Schafer, J. B., Frankowski, D., Herlocker, J., and Sen, S. (2007). Collaborative
Filtering Recommender Systems, pages 291–324.
Schmidhuber, J. (2015). Deep learning in neural networks: An overview. Neural
Networks, 61:85–117.
Sedhain, S., Menon, A. K., Sanner, S., and Xie, L. (2015). Autorec: Autoencoders
meet collaborative filtering. In Proceedings of the 24th International Conference
on World Wide Web, pages 111–112.
Shambour, Q. and Lu, J. (2012). A trust-semantic fusion-based recommendation
approach for e-business applications. Decision Support Systems, 54(1):768–780.
Shao, L., Zhu, F., and Li, X. (2015). Transfer learning for visual categorization:
A survey. IEEE Transactions on Neural Networks and Learning Systems,
26(5):1019–1034.
Shapira, B., Rokach, L., and Freilikhman, S. (2013). Facebook single and cross
domain data for recommendation systems. User Modeling and User-Adapted
Interaction, pages 1–37.
Shepitsen, A., Gemmell, J., Mobasher, B., and Burke, R. (2008). Personalized
recommendation in social tagging systems using hierarchical clustering. In
Proceedings of the 2008 ACM Conference on Recommender Systems, pages 259–
266.
Shi, X., Liu, Q., Fan, W., Philip, S. Y., and Zhu, R. (2010). Transfer learning on
heterogenous feature spaces via spectral transformation. In Proceedings of the
10th International Conference on Data Mining, pages 1049–1054.
Shi, Y., Larson, M., and Hanjalic, A. (2011). Tags as bridges between domains:
Improving recommendation with tag-induced cross-domain collaborative filter-
ing. In Proceedings of the 19th International Conference on User Modeling,
Adaption and Personalization, pages 305–316.
Shi, Y., Larson, M., and Hanjalic, A. (2013a). Exploiting social tags for cross-
domain collaborative filtering. arXiv preprint arXiv:1302.4888.
Shi, Y., Larson, M., and Hanjalic, A. (2013b). Mining contextual movie similarity
with matrix factorization for context-aware recommendation. ACM Transactions
on Intelligent Systems and Technology, 4(1):1–19.
Shi, Y., Larson, M., and Hanjalic, A. (2014). Collaborative filtering beyond the
user-item matrix: A survey of the state of the art and future challenges. ACM
Computing Surveys, 47(1):1–45.
Simonyan, K. and Zisserman, A. (2014). Very deep convolutional networks for
large-scale image recognition. arXiv preprint arXiv:1409.1556.
Singh, A. P. and Gordon, G. J. (2008). Relational learning via collective matrix
factorization. In Proceedings of the 14th ACM SIGKDD International Conference
on Knowledge Discovery and Data Mining, pages 650–658.
Slokom, M. and Ayachi, R. (2017). A hybrid user and item based collaborative
filtering approach by possibilistic similarity fusion. In Advances in Combining
Intelligent Methods, pages 125–147.
Socher, R., Lin, C. C., Manning, C., and Ng, A. Y. (2011a). Parsing natural
scenes and natural language with recursive neural networks. In Proceedings of
the 28th International Conference on Machine Learning, pages 129–136.
Socher, R., Pennington, J., Huang, E. H., Ng, A. Y., and Manning, C. D. (2011b).
Semi-supervised recursive autoencoders for predicting sentiment distributions.
In Proceedings of the Conference on Empirical Methods in Natural Language
Processing, pages 151–161.
Song, T., Peng, Z., Wang, S., Fu, W., Hong, X., and Philip, S. Y. (2017). Review-
based cross-domain recommendation through joint tensor factorization. In
International Conference on Database Systems for Advanced Applications, pages
525–540.
Strub, F. and Mary, J. (2015). Collaborative filtering with stacked denoising
autoencoders and sparse inputs. In NIPS workshop on Machine Learning for
E-commerce, pages 1–9.
Su, X. and Khoshgoftaar, T. M. (2009). A survey of collaborative filtering
techniques. Advances in Artificial Intelligence, 2009:1–19.
Sun, Q., Chattopadhyay, R., Panchanathan, S., and Ye, J. (2011). A two-stage
weighting framework for multi-source domain adaptation. In Proceedings of the
Advances in Neural Information Processing Systems, pages 505–513.
Sutskever, I., Vinyals, O., and Le, Q. V. (2014). Sequence to sequence learning
with neural networks. In Proceedings of the Advances in Neural Information
Processing Systems, pages 3104–3112.
Symeonidis, P. (2016). Matrix and tensor decomposition in recommender systems.
In Proceedings of the 10th ACM Conference on Recommender Systems, pages
429–430.
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D.,
Vanhoucke, V., and Rabinovich, A. (2015). Going deeper with convolutions.
In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern
Recognition, pages 1–9.
Tan, S., Bu, J., Qin, X., Chen, C., and Cai, D. (2014). Cross domain recommen-
dation based on multi-type media fusion. Neurocomputing, 127:124–134.
Tang, J., Hu, X., Gao, H., and Liu, H. (2013). Exploiting local and global social
context for recommendation. In Proceedings of the 23rd International Joint
Conference on Artificial Intelligence, pages 2712–2718.
Tang, J., Wu, S., Sun, J., and Su, H. (2012). Cross-domain collaboration recom-
mendation. In Proceedings of the 18th ACM SIGKDD International Conference
on Knowledge Discovery and Data Mining, pages 1285–1293.
Tiroshi, A., Berkovsky, S., Kaafar, M. A., Chen, T., and Kuflik, T. (2013). Cross
social networks interests predictions based on graph features. In Proceedings of
the 7th ACM Conference on Recommender Systems, pages 319–322.
Tong, H., Faloutsos, C., and Pan, J.-Y. (2008). Random walk with restart: fast
solutions and applications. Knowledge and Information Systems, 14(3):327–346.
Tso-Sutter, K. H., Marinho, L. B., and Schmidt-Thieme, L. (2008). Tag-aware rec-
ommender systems by fusion of collaborative filtering algorithms. In Proceedings
of the 2008 ACM Symposium on Applied Computing, pages 1995–1999.
Verbert, K., Duval, E., Lindstaedt, S., and Gillet, D. (2010). Context-aware
recommender systems. Journal of Universal Computer Science, 16(16):2175–
2178.
Verbert, K., Manouselis, N., Ochoa, X., Wolpers, M., Drachsler, H., Bosnic, I.,
and Duval, E. (2012). Context-aware recommender systems for learning: a
survey and future challenges. IEEE Transactions on Learning Technologies,
5(4):318–335.
Wang, C. and Mahadevan, S. (2008). Manifold alignment using procrustes analysis.
In Proceedings of the 25th International Conference on Machine Learning, pages
1120–1127.
Wang, C. and Mahadevan, S. (2011). Heterogeneous domain adaptation using
manifold alignment. In Proceedings of the 22nd International Joint Conference
on Artificial Intelligence, pages 1541–1546.
Wang, H. and Yeung, D.-Y. (2016). Towards bayesian deep learning: A frame-
work and some existing methods. IEEE Transactions on Knowledge and Data
Engineering, 28(12):3395–3408.
Wang, J., de Vries, A. P., and Reinders, M. J. T. (2006). Unifying user-based and
item-based collaborative filtering approaches by similarity fusion. In Proceedings
of the 29th Annual International ACM SIGIR Conference on Research and
Development in Information Retrieval, pages 501–508.
Wang, W., Chen, Z., Liu, J., Qi, Q., and Zhao, Z. (2012). User-based collaborative
filtering on cross domain by tag transfer learning. In Proceedings of the 1st
International Workshop on Cross Domain Knowledge Discovery in Web and
Social Network Mining, pages 10–17.
Wang, X., Yu, L., Ren, K., Tao, G., Zhang, W., Yu, Y., and Wang, J. (2017).
Dynamic attention deep model for article recommendation by learning human
editors’ demonstration. In Proceedings of the 23rd ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining, pages 2051–2059.
Wang, Z., Song, Y., and Zhang, C. (2008). Transferred dimensionality reduction.
Machine Learning and Knowledge Discovery in Databases, pages 550–565.
Weiss, K., Khoshgoftaar, T. M., and Wang, D. (2016). A survey of transfer
learning. Journal of Big Data, 3(1):1–40.
Weston, J., Ratle, F., Mobahi, H., and Collobert, R. (2012). Deep learning via
semi-supervised embedding. In Neural Networks: Tricks of the Trade, pages
639–655.
Wu, C.-Y., Ahmed, A., Beutel, A., Smola, A. J., and Jing, H. (2017). Recur-
rent recommender networks. In Proceedings of the Tenth ACM International
Conference on Web Search and Data Mining, pages 495–503.
Wu, Y., Schuster, M., Chen, Z., Le, Q. V., Norouzi, M., Macherey, W., Krikun,
M., Cao, Y., Gao, Q., Macherey, K., et al. (2016). Google’s neural machine
translation system: Bridging the gap between human and machine translation.
arXiv preprint arXiv:1609.08144.
Xiong, W., Droppo, J., Huang, X., Seide, F., Seltzer, M., Stolcke, A., Yu, D., and
Zweig, G. (2016). Achieving human parity in conversational speech recognition.
arXiv preprint arXiv:1610.05256.
Xu, Z., Chen, C., Lukasiewicz, T., Miao, Y., and Meng, X. (2016). Tag-aware
personalized recommendation using a deep-semantic similarity model with
negative sampling. In Proceedings of the 25th ACM International on Conference
on Information and Knowledge Management, pages 1921–1924.
Yang, D., He, J., Qin, H., Xiao, Y., and Wang, W. (2015). A graph-based
recommendation across heterogeneous domains. In Proceedings of the 24th
ACM International on Conference on Information and Knowledge Management,
pages 463–472.
Yang, D., Xiao, Y., Song, Y., Zhang, J., Zhang, K., and Wang, W. (2014). Tag
propagation based recommendation across diverse social media. In Proceedings
of the 23rd International Conference on World Wide Web, pages 407–408.
Yang, X., Guo, Y., and Liu, Y. (2013). Bayesian-inference-based recommendation
in online social networks. IEEE Transactions on Parallel and Distributed
Systems, 24(4):642–651.
Yao, Y. and Doretto, G. (2010). Boosting for transfer learning with multiple
sources. In Proceedings of 2010 IEEE Conference on Computer Vision and
Pattern Recognition, pages 1855–1862.
Yoo, J. and Choi, S. (2009). Weighted nonnegative matrix co-tri-factorization for
collaborative prediction. Advances in Machine Learning, pages 396–411.
Zhang, S., Yao, L., and Sun, A. (2017a). Deep learning based recommender
system: A survey and new perspectives. arXiv preprint arXiv:1707.07435.
Zhang, S., Yao, L., and Xu, X. (2017b). Autosvd++: An efficient hybrid
collaborative filtering model via contractive auto-encoders. arXiv preprint
arXiv:1704.00551.
Zhang, T. and Iyengar, V. S. (2002). Recommender systems using linear classifiers.
Journal of Machine Learning Research, 2:313–334.
Zhang, Y., Cao, B., and Yeung, D.-Y. (2012). Multi-domain collaborative filtering.
arXiv preprint arXiv:1203.3535.
Zhang, Y. and Koren, J. (2007). Efficient bayesian hierarchical user modeling for
recommendation system. In Proceedings of the 30th Annual International ACM
SIGIR Conference on Research and Development in Information Retrieval, pages
47–54.
Zhang, Z., Lin, H., Liu, K., Wu, D., Zhang, G., and Lu, J. (2013). A hybrid
fuzzy-based personalized recommender system for telecom products/services.
Information Sciences, 235:117–129.
Zhao, L., Pan, S. J., Xiang, E. W., Zhong, E., Lu, Z., and Yang, Q. (2013). Active
transfer learning for cross-system recommendation. In Proceedings of the 27th
AAAI Conference on Artificial Intelligence, pages 1205–1211.
Zhao, L., Pan, S. J., and Yang, Q. (2017). A unified framework of active transfer
learning for cross-system recommendation. Artificial Intelligence, 245:38–55.
Zhao, S., Du, N., Nauerz, A., Zhang, X., Yuan, Q., and Fu, R. (2008). Improved
recommendation based on collaborative tagging behaviors. In Proceedings of
the 13th International Conference on Intelligent User Interfaces, pages 413–416.
Zhen, Y., Li, W.-J., and Yeung, D.-Y. (2009). Tagicofi: tag informed collaborative
filtering. In Proceedings of the 3rd ACM Conference on Recommender Systems,
pages 69–76.
Zheng, Y., Burke, R., and Mobasher, B. (2013). Recommendation with differential
context weighting. In Proceedings of the 21st International Conference on User
Modeling, Adaptation, and Personalization, pages 152–164.
Zheng, Y., Burke, R., and Mobasher, B. (2014a). Splitting approaches for context-
aware recommendation: An empirical study. In Proceedings of the 29th Annual
ACM Symposium on Applied Computing, pages 274–279.
Zheng, Y., Liu, C., Tang, B., and Zhou, H. (2016a). Neural autoregressive
collaborative filtering for implicit feedback. In Proceedings of the 1st Workshop
on Deep Learning for Recommender Systems, pages 2–6.
Zheng, Y., Mobasher, B., and Burke, R. (2014b). Cslim: Contextual slim recom-
mendation algorithms. In Proceedings of the 8th ACM Conference on Recom-
mender Systems, pages 301–304.
Zheng, Y., Tang, B., Ding, W., and Zhou, H. (2016b). A neural autoregressive
approach to collaborative filtering. In Proceedings of the 33rd International
Conference on Machine Learning, pages 764–773.
Zhong, E., Fan, W., Peng, J., Zhang, K., Ren, J., Turaga, D., and Verscheure, O.
(2009). Cross domain distribution adaptation via kernel mapping. In Proceedings
of the 15th ACM SIGKDD International Conference on Knowledge Discovery
and Data Mining, pages 1027–1036.
Zuo, Y., Zeng, J., Gong, M., and Jiao, L. (2016). Tag-aware recommender systems
based on deep neural networks. Neurocomputing, 204:51–60.
Abbreviations
AE AutoEncoder
BPR Bayesian Personalized Ranking
CBT CodeBook Transfer
CDCF Cross-domain Collaborative Filtering
CDRS Cross-domain Recommender System
CDTF Cross Domain Triadic Factorization
CF Collaborative Filtering
CMF Collective Matrix Factorization
CNN Convolutional Neural Network
CST Coordinate System Transfer
CTagCDR Complete Tag-induced Cross Domain Recommendation
DL Deep Learning
DSSM Deep Semantic Similarity Model
ETagCDCF Enhanced Tag-induced Cross Domain Collaborative Filtering
GFK Geodesic Flow Kernel
GTagCDCF General Tag-induced Cross Domain Collaborative Filtering
ICF Item-based Collaborative Filtering
JTM Joint Topic Mining
KMM Kernel Mean Matching
MAE Mean Absolute Error
MF Matrix Factorization
MMDE Maximum Mean Discrepancy Embedding
NADE Neural Autoregressive Distribution Estimation
NLP Natural Language Processing
NMF Nonnegative Matrix Factorization
PMF Probabilistic Matrix Factorization
RBM Restricted Boltzmann Machine
RMGM Rating Matrix Generative Model
RMSE Root Mean Square Error
RNN Recurrent Neural Network
SGNS Skip-Gram with Negative Sampling
STC Self-taught Clustering
SVD Singular Value Decomposition
TA Topic Alignment
TagCDCF Tag-induced Cross Domain Collaborative Filtering
TagiCoFi Tag informed Collaborative Filtering
TCA Transfer Component Analysis
TCF Transfer by Collective Factorization
TL Transfer Learning
TSC Transfer Spectral Clustering
TSCDR Tag Semantic-boosted Cross Domain Recommendation
UCF User-based Collaborative Filtering
WNMCTF Weighted Nonnegative Matrix Co-Tri-Factorization