Heterogeneous Cross Domain Ranking in Latent Space Bo Wang Joint work with Jie Tang, Wei Fan and Songcan Chen


Page 1: Heterogeneous Cross Domain Ranking in Latent Space

Heterogeneous Cross Domain Ranking in Latent Space

Bo Wang

Joint work with Jie Tang, Wei Fan and Songcan Chen

Page 2: Heterogeneous Cross Domain Ranking in Latent Space

Framework of Learning to Rank

Page 3: Heterogeneous Cross Domain Ranking in Latent Space

Example: Academic Network

Page 4: Heterogeneous Cross Domain Ranking in Latent Space

Ranking over Web 2.0

Traditional Web: standard (long) documents; relevance measures such as BM25 and the PageRank score may play a key role

Web 2.0: shorter, non-standard documents; users' click-through data and users' comments might be much more important

Page 5: Heterogeneous Cross Domain Ranking in Latent Space

Heterogeneous transfer ranking

If there isn't sufficient supervision in the domain of interest, how can one borrow labeled information from a related but heterogeneous domain to build an accurate model?

Differences from transfer learning:
What to transfer: instance type
What we care about: feature extraction

Page 6: Heterogeneous Cross Domain Ranking in Latent Space

Main Challenges

How to formalize the problem in a unified framework, given that both the feature distributions and the objects' types in the source domain and the target domain may be different?

How to transfer the knowledge of heterogeneous objects across domains?

How to preserve the preference relationships between instances across heterogeneous data sources?

Page 7: Heterogeneous Cross Domain Ranking in Latent Space

Outline

Motivation
Problem Formulation
Transfer Ranking
  Basic Idea
  The proposed algorithm
  Generalization bound
Experiment
  Ranking on Homogeneous data
  Ranking on Heterogeneous data
Conclusion

Page 8: Heterogeneous Cross Domain Ranking in Latent Space

Problem Formulation

Source domain: instance space and rank level set
Target domain: instance space and rank level set
The two domains are heterogeneous but related

Problem Definition: given the source-domain and target-domain training data, the goal is to learn a ranking function for predicting the rank levels of the test set

Page 9: Heterogeneous Cross Domain Ranking in Latent Space
Page 10: Heterogeneous Cross Domain Ranking in Latent Space

Outline

Motivation
Problem Formulation
Transfer Ranking
  Basic Idea
  The proposed algorithm
  Generalization bound
Experiment
  Ranking on Homogeneous data
  Ranking on Heterogeneous data
Conclusion

Page 11: Heterogeneous Cross Domain Ranking in Latent Space

Basic Idea

Because the feature distributions or even the objects' types may be different across domains, we resort to finding a common latent space in which the preference relationships in the source and target domains are all preserved

We can directly use a ranking loss function to evaluate how well the preferences are preserved in that latent space

We optimize the two ranking loss functions simultaneously in order to find the best latent space

Page 12: Heterogeneous Cross Domain Ranking in Latent Space

The Proposed Algorithm

Given the labeled data in the source domain, we aim to learn a ranking function which satisfies:

The ranking loss function can be defined as:

The latent space can be described by:

The Framework:
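Since the framework's formulas appear only as figures in the original slides, below is a minimal sketch of one plausible shape for the objective: two pairwise hinge losses evaluated in a shared latent space, plus a regularizer. The projection W, score weights w, and trade-off parameters lam and mu are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def pairwise_hinge_loss(X, pairs, W, w):
    """Hinge loss over preference pairs (i preferred to j) after
    projecting the instances into the shared latent space via W."""
    Z = X @ W                      # map instances into the latent space
    scores = Z @ w                 # linear ranking scores in that space
    loss = 0.0
    for i, j in pairs:             # each pair encodes score[i] > score[j]
        loss += max(0.0, 1.0 - (scores[i] - scores[j]))
    return loss

def joint_objective(Xs, pairs_s, Xt, pairs_t, W, w, lam=1.0, mu=0.1):
    """Weighted sum of the source- and target-domain ranking losses
    plus an L2 regularizer, optimized jointly over the latent space."""
    return (pairwise_hinge_loss(Xs, pairs_s, W, w)
            + lam * pairwise_hinge_loss(Xt, pairs_t, W, w)
            + mu * (np.sum(W**2) + np.sum(w**2)))
```

Minimizing this joint objective over W and w is what ties the two heterogeneous domains together: both domains' preference pairs constrain the same latent space.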

Page 13: Heterogeneous Cross Domain Ranking in Latent Space

Ranking SVM
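As background on this baseline, a linear Ranking SVM can be trained by subgradient descent on the pairwise hinge loss. This is a generic textbook-style sketch (the toy data, learning rate, and epoch count are assumptions), not the exact solver used in the paper.

```python
import numpy as np

def ranking_svm(X, y, C=1.0, lr=0.01, epochs=200):
    """Train a linear Ranking SVM by subgradient descent on the
    pairwise hinge loss max(0, 1 - w.(x_i - x_j)) over all pairs
    with y_i > y_j, plus an L2 regularizer on w."""
    pairs = [(i, j) for i in range(len(y)) for j in range(len(y)) if y[i] > y[j]]
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        grad = w.copy()                      # from the (1/2)||w||^2 term
        for i, j in pairs:
            d = X[i] - X[j]
            if 1.0 - w @ d > 0:              # hinge active: margin violated
                grad -= C * d
        w -= lr * grad
    return w

# toy use: rank 4 items with graded relevance labels
X = np.array([[1.0, 0.0], [2.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
y = [1, 2, 0, 0]
w = ranking_svm(X, y)
scores = X @ w                               # higher score = higher rank
```

Note that the learned model scores single instances, but the training constraints are defined only on pairs, which is what makes the loss a ranking loss rather than a classification loss.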

Page 14: Heterogeneous Cross Domain Ranking in Latent Space
Page 15: Heterogeneous Cross Domain Ranking in Latent Space
Page 16: Heterogeneous Cross Domain Ranking in Latent Space
Page 17: Heterogeneous Cross Domain Ranking in Latent Space

Generalization Bound

Page 18: Heterogeneous Cross Domain Ranking in Latent Space

Scalability
Let d be the total number of distinct features in the two domains; then matrix D is d×d and W is d×2, so the algorithm can be applied to very large-scale data as long as there are not too many features.

Complexity
Ranking SVM training has O((n1 + n2)^3) time and O((n1 + n2)^2) space complexity. In our algorithm Tr2SVM, with T the maximal number of iterations, training has O((2T + 1)(n1 + n2)^3) time and O((n1 + n2)^2) space complexity.
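To make the scalability argument concrete, here is a quick back-of-the-envelope check with hypothetical sizes (d, n1 and n2 are made-up numbers, not from the paper):

```python
# Per the slide, with d total features across the two domains,
# D is d x d and W is d x 2, independent of the instance count n1 + n2.
d = 10_000                    # hypothetical total feature count
n1, n2 = 500, 400             # hypothetical instances per domain

mem_D_gb = d * d * 8 / 1e9    # D at 8 bytes per float: 0.8 GB
pairs = (n1 + n2) ** 2        # pairwise terms behind the O((n1+n2)^2) space
```

So memory for D grows with the feature count, not the instance count, while the pairwise training cost grows quadratically (and the SVM solve cubically) in n1 + n2.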

Page 19: Heterogeneous Cross Domain Ranking in Latent Space

Outline

Motivation
Problem Formulation
Transfer Ranking
  Basic Idea
  The proposed algorithm
  Generalization bound
Experiment
  Ranking on Homogeneous data
  Ranking on Heterogeneous data
Conclusion

Page 20: Heterogeneous Cross Domain Ranking in Latent Space

Data Set

LETOR 2.0
Three sub-datasets: TREC2003, TREC2004, and OHSUMED (query-document pair collections)
TREC data: a topic distillation task which aims to find good entry points principally devoted to a given topic
OHSUMED data: a collection of records from medical journals

LETOR_TR
Three sub-datasets: TREC2003_TR, TREC2004_TR, and OHSUMED_TR

Page 21: Heterogeneous Cross Domain Ranking in Latent Space

Data Set (Cont’d)

Page 22: Heterogeneous Cross Domain Ranking in Latent Space

Data Set (Cont’d)

Page 23: Heterogeneous Cross Domain Ranking in Latent Space

Experiment Setting

Baselines:

Measures: MAP (mean average precision) and NDCG (normalized discounted cumulative gain)

Three transfer ranking tasks: from S1 to T1, from S2 to T2, and from S3 to T3
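For reference, the two measures can be computed as follows. This is the standard textbook formulation (binary-relevance AP, and NDCG with the 2^rel - 1 gain), not code from the paper; MAP is then the mean of the per-query average precisions.

```python
import numpy as np

def average_precision(rels):
    """AP for one ranked list of binary relevance judgments (1 = relevant)."""
    rels = np.asarray(rels)
    if rels.sum() == 0:
        return 0.0
    hits = np.cumsum(rels)                         # relevant docs seen so far
    precisions = hits[rels == 1] / (np.flatnonzero(rels) + 1)
    return float(precisions.mean())

def dcg_at_k(rels, k):
    """DCG@k with the 2^rel - 1 gain and log2 position discount."""
    rels = np.asarray(rels, dtype=float)[:k]
    return float(((2.0**rels - 1.0) / np.log2(np.arange(2, rels.size + 2))).sum())

def ndcg_at_k(rels, k):
    """NDCG@k: DCG normalized by the DCG of the ideal ordering."""
    ideal = dcg_at_k(sorted(rels, reverse=True), k)
    return dcg_at_k(rels, k) / ideal if ideal > 0 else 0.0
```

For example, a list ranked in perfect order of relevance gets NDCG@k = 1, and any misordering lowers the score more the higher up it occurs.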

Page 24: Heterogeneous Cross Domain Ranking in Latent Space
Page 25: Heterogeneous Cross Domain Ranking in Latent Space
Page 26: Heterogeneous Cross Domain Ranking in Latent Space

Why effective?

Why is transfer ranking effective on the LETOR_TR dataset? Because the features used in ranking already contain relevance information between queries and documents.

Page 27: Heterogeneous Cross Domain Ranking in Latent Space

Outline

Motivation
Problem Formulation
Transfer Ranking
  Basic Idea
  The proposed algorithm
  Generalization bound
Experiment
  Ranking on Homogeneous data
  Ranking on Heterogeneous data
Conclusion

Page 28: Heterogeneous Cross Domain Ranking in Latent Space

Data Set

A subset of ArnetMiner: 14,134 authors, 10,716 papers, and 1,434 conferences
The 8 most frequent queries from the log file: 'information extraction', 'machine learning', 'semantic web', 'natural language processing', 'support vector machine', 'planning', 'intelligent agents' and 'ontology alignment'

Author collection: for each query, we gathered authors from Libra, Rexa and ArnetMiner
Conference collection: for each query, we gathered conferences from Libra and ArnetMiner

Evaluation: one faculty member and two graduate students judged the relevance between queries and authors/conferences

Page 29: Heterogeneous Cross Domain Ranking in Latent Space

Feature Definition

All the features are defined between queries and virtual documents

Conference: use all the titles of papers published at a conference to form the conference's "document"

Author: use all the titles of papers authored by an expert as the expert's "document"
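A minimal sketch of how such a virtual document and one query-document feature might be built; the sample titles and the simple term-frequency feature are illustrative assumptions, not the paper's actual feature set.

```python
from collections import Counter

def virtual_document(titles):
    """Concatenate paper titles into one bag-of-words 'document'."""
    return Counter(" ".join(titles).lower().split())

def term_frequency_feature(query, doc):
    """A simple query-document feature: summed frequency of the
    query's words in the virtual document."""
    return sum(doc[w] for w in query.lower().split())

# e.g. a conference represented by the titles of its papers
conf_doc = virtual_document([
    "Learning to Rank for Information Retrieval",
    "Transfer Learning for Ranking",
])
score = term_frequency_feature("learning", conf_doc)  # 'learning' occurs twice
```

Because both authors and conferences are reduced to such word-count "documents", the same query-document features can be computed for both object types, which is what lets the two heterogeneous domains share a feature space.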

Page 30: Heterogeneous Cross Domain Ranking in Latent Space

Feature Definition (Cont’d)

Page 31: Heterogeneous Cross Domain Ranking in Latent Space

Experimental Results

Page 32: Heterogeneous Cross Domain Ranking in Latent Space

Why effective?

Why can our approach be effective on the heterogeneous network? Because of the latent dependencies between the objects, some common features can still be extracted.

Page 33: Heterogeneous Cross Domain Ranking in Latent Space

Conclusion

Page 34: Heterogeneous Cross Domain Ranking in Latent Space

Conclusion (Cont’d)

We formally define the transfer ranking problem and propose a general framework

We provide a preferred solution under the regularized framework by simultaneously minimizing two ranking loss functions in the two domains, and derive the generalization bound

The experimental results on LETOR and a heterogeneous academic network verify the effectiveness of the proposed algorithm

Page 35: Heterogeneous Cross Domain Ranking in Latent Space

Future Work

Develop new algorithms under the framework

Reduce the time complexity for online usage

Negative transfer: measure the similarity between queries and actively select similar queries

Page 36: Heterogeneous Cross Domain Ranking in Latent Space

Thanks!

Your Question. Our Passion.